Title: | Data to Accompany Applied Linear Regression 4th Edition |
---|---|
Description: | Datasets to Accompany S. Weisberg (2014, ISBN: 978-1-118-38608-8), "Applied Linear Regression," 4th edition. Many data files in this package are included in the `alr3` package as well, so only one of them should be used. |
Authors: | Sanford Weisberg <[email protected]> |
Maintainer: | Sanford Weisberg <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.6 |
Built: | 2024-11-01 03:54:19 UTC |
Source: | https://github.com/cran/alr4 |
Data on 102 male and 100 female athletes collected at the Australian Institute of Sport.
This data frame contains the following columns:
(0 = male or 1 = female)
height (cm)
weight (kg)
lean body mass
red cell count
white cell count
Hematocrit
Hemoglobin
plasma ferritin concentration
body mass index, weight/(height)**2
sum of skin folds
Percent body fat
Case Labels
Sport
Ross Cunningham and Richard Telford
S. Weisberg (2014). Applied Linear Regression, 4th edition. New York: Wiley.
head(ais)
Bland's Apple Shoot data. allshoots includes all the data, shortshoots just the short shoot data, and longshoots includes long shoots only.
This data frame contains the following columns:
days from dormancy
number of shoots sampled
average number of stem units
within-day standard deviation
1 if long shoots, 0 if short shoots.
Bland, J. (1978). A comparison of certain aspects of ontogeny in the long and short shoots of McIntosh apple during one annual growth cycle. Unpublished Ph. D. dissertation, University of Minnesota, St. Paul, Minnesota.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(longshoots)
This function accesses the website for Applied Linear Regression, 3rd and 4th editions.
alr4Web(page = c("webpage", "errata", "primer", "solutions"))
page |
A character string indicating what page to open. The default "webpage" will open the main webpage, "errata" displays the Errata sheet for the third edition of the book, "primer" fetches and displays the primer for R, and "solutions" gives solutions to odd-numbered problems. |
Either a webpage or a pdf document is displayed. This function gives quick access to the website for the book and in particular to the R primer and solutions to odd-numbered problems. The pdf files are formatted for viewing on a computer screen. With Adobe Reader, view the pdf files with the bookmarks showing at the left, using single page view, which is selected by View -> Page Display -> Single Page View.
Sanford Weisberg, based on the function UsingR in the UsingR package by John Verzani
## Not run: alr4Web("primer")
The data in the file were collected in a study of the effect of dissolved sulfur on the surface tension of liquid copper (Baes and Kellogg, 1953).
This data frame contains the following columns:
Weight percent sulfur
Decrease in surface tension, dynes/cm
Baes, C. and Kellogg, H. (1953). Effect of dissolved sulphur on the surface tension of liquid copper. J. Metals, 5, 643-648.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(baeskel)
Data from the Berkeley guidance study of children born in 1928-29 in Berkeley, CA. BGSall contains all the data, BGSboys the boys only, and BGSgirls the girls only.
This data frame contains the following columns:
0 = males, 1 = females
Age 2 weight (kg)
Age 2 height (cm)
Age 9 weight (kg)
Age 9 height (cm)
Age 9 leg circumference (cm)
Age 9 strength (kg)
Age 18 weight (kg)
Age 18 height (cm)
Age 18 leg circumference (cm)
Age 18 strength (kg)
Body Mass Index, WT18/(HT18/100)^2, rounded to one decimal.
Somatotype, a 1 to 7 scale of body type.
Tuddenham, R. D. and Snyder, M. M. (1954). Physical Growth of California Boys and Girls from Birth to Eighteen years. Univ. of Calif. Publications in Child Development, 1, 183-364.
S. Weisberg (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(BGSall)
head(BGSboys)
head(BGSgirls)
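The body mass index definition above can be reproduced directly. This is a minimal sketch, assuming the age-18 weight and height columns are named WT18 and HT18 as in the formula; the helper name `bmi18` is hypothetical, and the two-row data frame is made up for illustration rather than taken from BGSall.

```r
# Body mass index from weight in kg and height in cm, rounded to one decimal.
# bmi18 is a hypothetical helper name, not a function from the package.
bmi18 <- function(wt_kg, ht_cm) {
  round(wt_kg / (ht_cm / 100)^2, 1)
}

# Made-up values for illustration; these are not rows from BGSall.
toy <- data.frame(WT18 = c(60, 72), HT18 = c(160, 180))
toy$BMI18 <- bmi18(toy$WT18, toy$HT18)
print(toy)
```

Dividing the height by 100 converts centimeters to meters, so the result is in the usual kg/m^2 units.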
Prices in many world cities from a 2003 Union Bank of Switzerland report.
This data frame uses the name of the city as row names, and contains the following columns:
Minutes of labor to purchase a Big Mac
Minutes of labor to purchase 1 kg of bread
Minutes of labor to purchase 1 kg of rice
Food price index (Zurich=100)
Cost in US dollars for a one-way 10 km ticket
Normal rent (US dollars) of a 3 room apartment
Primary teacher's gross income, 1000s of US dollars
Primary teacher's net income, 1000s of US dollars
Tax rate paid by a primary teacher
Primary teacher's hours of work per week
Union Bank of Switzerland report, Prices and Earnings Around the Globe (2003 version).
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(BigMac2003)
Data from the Boundary Waters Canoe Area Wilderness Blowdown. The data frame Blowdown includes nine species of trees, but this file only includes black spruce, grouped by diameter.
This data frame contains the following columns:
Tree diameter, in cm
Number of trees of this value of d that died (blowdown)
number of trees of this size class measured
Roy Rich
S. Weisberg (2014). Applied Linear Regression, fourth edition. New York: Wiley.
head(BlowBS)
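Because each row gives a count of trees that died out of a number measured, the data are grouped binomial counts. The following is a rough sketch of one way such data might be modeled, not necessarily the analysis used in the book. The column names `d`, `died` and `m` are assumptions (only `d` is named in the text above), and the counts are made up for illustration.

```r
# Hypothetical grouped counts; column names d, died and m are assumed,
# and the numbers are invented for illustration only.
toy <- data.frame(
  d    = c(5, 10, 15, 20, 25, 30),
  died = c(2, 8, 15, 22, 20, 12),
  m    = c(40, 60, 50, 45, 30, 15)
)

# Binomial logistic regression: the response is a two-column matrix of
# (number died, number survived) for each diameter class.
fit <- glm(cbind(died, m - died) ~ log(d), family = binomial, data = toy)

# Fitted blowdown probability for each diameter class.
toy$fitted <- fitted(fit)
print(round(coef(fit), 3))
```

With the real data loaded, the same call with `data = BlowBS` would apply if the column names match the assumed ones.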
Data from the Boundary Waters Canoe Area Wilderness Blowdown. The data frame Blowdown includes nine species of trees. The data for balsam fir, summarized by diameter class, are given in BlowBF.
This data frame contains the following columns:
Tree diameter, in cm
Proportion of basal area killed for the four species balsam fir, cedar, paper birch and blue spruce, a measure of local severity of the storm.
Tree species, a factor with 9 levels
1 if the tree died, 0 if it survived
Roy Rich
S. Weisberg (2014). Applied Linear Regression, fourth edition. New York: Wiley.
head(Blowdown)
The data provided gives the average body weight in kilograms and the average brain weight in grams for sixty-two species of mammals.
This data frame uses species names as row labels and contains the following columns:
Brain weight, grams
Body weight, kg
Allison, T. and Cicchetti, D. (1976). Sleep in mammals: Ecology and constitutional correlates. Science, 194, 732-734.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(brains)
Oehlert (2000, Example 19.3) provides data from a small experiment on baking packaged cake mixes.
A data frame with 14 observations on the following 4 variables.
a factor
Baking time, minutes
Baking temperature, degrees F
Palatability score
Oehlert, G. W. (2000). A First Course in Design and Analysis of Experiments. New York: Freeman.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(cakes)
lm(Y~X1+X2+I(X1^2)+I(X2^2)+X1:X2, data=cakes)
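The model formula in the example fits a full second-order model in X1 and X2: linear terms, pure quadratic terms, and the X1:X2 interaction. The sketch below illustrates this on simulated data so that it runs without the package. The design points and response are invented, and only the variable names X1, X2 and Y are taken from the example above.

```r
# Simulated data on a 3 x 3 grid; values are invented for illustration.
set.seed(1)
toy <- expand.grid(X1 = c(32, 35, 38), X2 = c(340, 350, 360))
toy$Y <- 8 - 0.1 * (toy$X1 - 35)^2 - 0.01 * (toy$X2 - 350)^2 +
  rnorm(nrow(toy), sd = 0.2)

# Full second-order model, written as in the package example.
fit <- lm(Y ~ X1 + X2 + I(X1^2) + I(X2^2) + X1:X2, data = toy)

# The same model written with the (X1 + X2)^2 shorthand for main effects
# plus the two-factor interaction.
fit2 <- lm(Y ~ (X1 + X2)^2 + I(X1^2) + I(X2^2), data = toy)

print(names(coef(fit)))
```

I() is needed around X1^2 because ^ has a special meaning inside a formula; without it, X1^2 would reduce to X1.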
Heights and lengths of Gothic and Romanesque cathedrals.
This data frame uses cathedral names as row labels and contains the following columns:
Romanesque or Gothic
Total height, feet
Total length, feet
Stephen Jay Gould
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(cathedral)
Artificial data to illustrate problems with residual plots.
This data frame contains the following columns:
Artificial data item.
Artificial data item.
Artificial data item.
R. D. Cook and S. Weisberg (1999), Graphs in statistical analysis: Is the medium the message? American Statistician, 53, 29-37.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(caution)
Contains data from the performance of O-rings in 23 U.S. space shuttle flights prior to the Challenger disaster of January 28, 1986.
This data frame uses dates as row names and contains the following columns:
Air Temp at launch (degrees F)
Leak check pressure
Number of O-rings that failed
6, number of O-rings in launch
Number of erosion incidents
Number of blowby incidents
Total Damage Index
Dalal, S., Fowlkes, E. B. and Hoadley, B. (1989), Risk analysis of the space shuttle: Pre-Challenger prediction of failure, Journal of the American Statistical Association, 84, 945-957. See also Tufte, E. R. (1997), Visual and Statistical Thinking: Displays of Evidence for Making Decisions, Cheshire, CT: Graphics Press.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Challeng)
The data summarize the results of the first Florida Area Cumulus Experiment, or FACE-1, designed to study the effectiveness of cloud seeding to increase rainfall in a target area (Woodley, Simpson, Biondini, and Berkley, 1977).
This data frame contains the following columns:
Action, 1=seed, 0=do not seed
Day after June 16, 1975
Suitability for seeding
percent cloud cover in experimental area, measured using radar in Coral Gables, Florida
prewetness
echo motion category, either 1 or 2, a measure for type of cloud
in target area
Woodley, W.L., Simpson, J., Biondini, R., and Berkley, J. (1977). Rainfall results 1970-75: Florida area cumulus experiment. Science, 195, 735-742.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(cloud)
These files give the results of two experiments to see if manipulating the air conditioning fans in the Minneapolis Metrodome can affect the distance travelled by a baseball. The data in domedata were collected in April 2003. The experiment was repeated in May 2003 and domedata1 gives the combined data from the two experiments.
A data frame with 96 observations on the following 7 variables.
a factor with levels March, May
a factor with levels Headwind, Tailwind
the actual angle
in feet per second
weight of ball in grams
diameter of ball in inches
distance in feet of the flight of the ball
Ivan Marusic
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(domedata1)
The Donner Party was the most famous tragedy in the history of the westward migration in the United States, perhaps because its members were ordinary people: farmers, merchants, parents, children. In the winter of 1846-47, about ninety wagon train emigrants were unable to cross the Sierra Nevada Mountains of California before winter, and almost one-half starved to death. These data include some information about each of the members of the party from Johnson (1996).
This data frame uses the person's name as row labels and contains the following columns:
Approximate age in 1846
died or survived, a factor
Male or Female
Either a family name, hired or single
A factor with levels Family, Single or Hired
Johnson, K. (1996). Unfortunate Emigrants: Narratives of the Donner Party. Logan, UT: Utah State University Press, http://www.metrogourmet.com/crossroads/KJhome.htm.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Donner)
For unknown reasons, some dairy cows become recumbent: they lie down. This condition can be serious, and may lead to death of the cow. These data are from a study of blood samples of over 500 cows studied at the Ruakura (N.Z.) Animal Health Laboratory during 1983-84. A variety of blood tests were performed, and for many of the animals the outcome (survived, died, or animal was killed) was determined. The goal is to see if survival can be predicted from the blood measurements. Case numbers 12607 and 11630 were noted as having exceptional care, and they survived.
This data frame contains the following columns:
a factor with levels before and after
Days recumbent
Serum creatine phosphokinase (U/l at 30C)
serum aspartate amino transferase (U/l at 30C)
serum urea (mmol/l)
Packed Cell Volume (Haematocrit)
Inflammation, 0=no, 1=yes
Muscle disorder, a factor with levels present and absent
a factor with levels died and survived
Clark, R. G., Henderson, H. V., Hoggard, G. K., Ellison, R. S. and Young, B. J. (1987). The ability of biochemical and haematological tests to predict recovery in periparturient recumbent cows. NZ Veterinary Journal, 35, 126-133.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Downer)
These data are used to study the effect of health plan characteristics on drug costs. Health plans vary in size, given as member months. Some plans use generic drugs more than others. All differ on copayments. Some have strong restrictions on which drugs can be dispensed: a value of RI=0 means that all drugs are dispensed, while RI=100 means that only one per category is available. The goal is to determine the terms that are related to cost, and in particular to understand the role of GS and RI in determining cost.
This data frame uses a short code name for the drug plan as row labels and contains the following columns:
Ave. cost to plan for 1 prescription for 1 day
Number of prescriptions per member per year
Percent generic substitution, number between 0 (no substitution) to 100 (always use generic substitute)
Restrictiveness index (0=none, 100=total)
Average Rx copayment
Average age of member
Percent female members
Member months, a measure of the size of the plan
Mark Siracuse
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(drugcost)
An experiment was conducted to study the O2UP, oxygen uptake in milligrams of oxygen per minute, given five chemical measurements: biological oxygen demand (BOD), total Kjeldahl nitrogen (TKN), total solids (TS), total volatile solids (TVS), which is a component of TS, and chemical oxygen demand (COD), each measured in milligrams per liter (Moore, 1975).
This data frame contains the following columns:
Day number
Biological oxygen demand
Total Kjeldahl nitrogen
Total Solids
Total volatile solids
Chemical oxygen demand
Oxygen uptake
Moore, J. (1975). Total Biomedical Oxygen Demand of Animal Manures. Unpublished Ph. D. dissertation, University of Minnesota.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(dwaste)
County-by-county vote for president in Florida in 2000 for Bush, Gore and Buchanan.
A data frame with three variables for each of Florida's 67 counties.
Vote for Gore
Vote for Bush
Vote for Buchanan
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(florida) ## maybe str(florida) ; plot(florida) ...
The data consists of 17 pairs of numbers corresponding to observed boiling point and corrected barometric pressure, at locations in the Alps.
This data frame contains three columns. The first two columns are identical to the data set named forbes in the MASS package.
Adjusted boiling point of water in degrees F.
Atmospheric pressure, in inches of Mercury
100 times log10(pres), rounded to two decimals
Forbes, J. (1857). Further experiments and remarks on the measurement of heights by boiling point of water. Transactions of the Royal Society of Edinburgh, 21, 235-243.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Forbes)
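The third column is a deterministic transformation of the pressure column. A minimal sketch of that transformation follows. The helper name `lpres_from_pres` is hypothetical, and the input values are arbitrary pressures chosen for illustration, not observations from Forbes.

```r
# 100 times the base-10 logarithm of pressure, rounded to two decimals,
# following the column description above.
# lpres_from_pres is a hypothetical helper name.
lpres_from_pres <- function(pres) {
  round(100 * log10(pres), 2)
}

# Arbitrary illustrative pressures in inches of mercury.
example_pres <- c(10, 30)
print(lpres_from_pres(example_pres))
```

With the package loaded, comparing this result for the pressure column against the stored third column is a quick way to confirm the definition.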
Monthly snowfall data for Fort Collins, CO, 1900-01 to 1992-93.
This data frame contains the following columns:
Year corresponding to the September to December data
September to December snowfall, inches
January to June snowfall, inches
http://ccc.atmos.colostate.edu/cgi-bin/monthlydata.pl
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(ftcollinssnow)
Monthly average temperature data for Fort Collins, CO weather station 53005, 1900-01 to 2010-11.
This data frame contains the following columns:
Year corresponding to the September to November data
September to November mean temperature, degrees F
December to February mean temperature, degrees F
http://ccc.atmos.colostate.edu/cgi-bin/monthlydata.pl
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(ftcollinstemp)
Data on motor fuel consumption and related variables, for the year 2001. The unit is a state in the United States or the District of Columbia. Data are for 2001, unless noted.
This data frame contains the following columns. Row labels are the two-letter US Postal abbreviations for the US states.
Number of Licensed drivers in the state
Gasoline sold for road use (1000s of gal.)
Per capita personal income (year 2000)
Miles of Federal-aid highway in the state
Estimated miles driven per capita
Population age 16 and over
Gasoline state tax rate, cents per gallon
http://www.fhwa.dot.gov/ohim/hs01/index.htm
Weisberg, S. (2014). Applied Linear Regression, third edition. New York: Wiley.
head(fuel2001)
# Most of the examples in ALR3 that use these data first
# transform several of the columns
fuel2001 <- transform(fuel2001, Dlic=1000 * Drivers/Pop,
                      Fuel=1000 * FuelC/Pop, Income=Income/1000)
pairs(Fuel~Tax + Dlic + Income + log2(Miles), data=fuel2001)
Johnson and Raven (1973) have presented data giving the number of species and related variables for 29 different islands in the Galapagos Archipelago.
This data frame uses the island name as row labels and contains the following columns:
Number of Species
Number of endemic species (occur only on that island)
Surface area of island, hectares
Area of closest island, hectares
Distance to closest island, km
Distance from Santa Cruz Island, km
Elevation in m, missing values given as zero
1 if elevation is observed, 0 if missing
Johnson, M.P., and Raven, P.H. (1973). Species number and endemism: The Galapagos Archipelago revisited. Science, 179, 893-895.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(galapagos)
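Because missing elevations are stored as zero, it is usually safer to recode them as NA before fitting anything. The sketch below shows one way to do that with the indicator column. The column names `Elevation` and `EM` are assumptions, and the rows are invented for illustration rather than taken from galapagos.

```r
# Invented rows; column names Elevation and EM are assumed, with
# EM = 1 when elevation was observed and EM = 0 when it is missing.
toy <- data.frame(
  Elevation = c(346, 0, 109, 0),
  EM        = c(1, 0, 1, 0),
  row.names = c("IslandA", "IslandB", "IslandC", "IslandD")
)

# Replace the zero placeholders with NA so that they cannot be mistaken
# for real elevations in summaries or models.
toy$Elevation[toy$EM == 0] <- NA

print(mean(toy$Elevation, na.rm = TRUE))
```

Using the indicator rather than testing `Elevation == 0` keeps the recoding tied to the documented missingness flag.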
In a paper presented to the Royal Institute on February 9, 1877, Sir Francis Galton discussed his experiments on sweet peas in which he compared the sweet peas produced by parent plants to those produced by offspring plants. In these experiments he could observe inheritance from one generation to the next. Galton categorized the parent plants according to the typical diameter of the peas they produced.
This data frame contains the following columns:
mean diameter of parent
mean diameter of offspring
offspring standard deviation
Pearson, K. (1930). Life and Letters and Labours of Francis Galton, Vol IIIa. Cambridge: Cambridge University Press.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(galtonpeas)
Karl Pearson organized the collection of data on over 1100 families in England in the period 1893 to 1898. This particular data set gives the heights in inches of mothers and their daughters, with up to two daughters per mother. All daughters are at least age 18, and all mothers are younger than 65. Data were given in the source as a frequency table to the nearest inch. Rounding error has been added to remove discreteness from graphs.
This data frame contains the following columns:
Mother's ht, in.
Daughter's ht, in.
K. Pearson and A. Lee (1903), On the laws of inheritance in man, Biometrika, 2, 357–463, Table 31.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Heights)
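The description says that rounding error was added to heights recorded to the nearest inch. The exact procedure is not given, so the following is only a sketch of one plausible approach, assuming uniform noise on the interval from -0.5 to 0.5. The heights are invented for illustration.

```r
# Invented heights recorded to the nearest inch.
set.seed(42)
rounded <- c(62, 63, 63, 64, 65)

# Assumed procedure: add uniform noise within half an inch, so that each
# jittered value still lies within half an inch of its recorded value.
jittered <- rounded + runif(length(rounded), min = -0.5, max = 0.5)

print(cbind(rounded, jittered))
```

Jittering of this kind separates overplotted points in a scatterplot while keeping every value within the resolution of the original table.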
The data come from an unpublished master's paper by Carl Hoffstedt. They relate the automobile accident rate, in accidents per million vehicle miles, to several potential terms. The data include 39 sections of large highways in the state of Minnesota in 1973. The goal of this analysis was to understand the impact of design variables, acpts, slim, Sig, and shld, that are under the control of the highway department, on accidents.
This data frame contains the following columns:
average daily traffic count in thousands
truck volume as a percent of the total volume
total number of lanes of traffic
number of access points per mile
number of signalized interchanges per mile
number of freeway-type interchanges per mile
speed limit in 1973
length of the highway segment in miles
lane width, in feet
width in feet of outer shoulder on the roadway
An indicator of the type of roadway or the source of funding for the road; "mc" for major collector, "fai" for Federal interstate highways, "pa" for principal arterial highway, and "ma" for major arterial highways
1973 accident rate per million vehicle miles
Carl Hoffstedt
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Highway)
In his original paper, Forbes provided additional data collected by the botanist Dr. Joseph Hooker on temperatures and boiling points measured often at higher altitudes in the Himalaya Mountains.
This data frame contains the following columns:
Measured boiling temperature, degrees F.
Measured air pressure, inches of Mercury.
100 times pres rounded to two decimals.
Forbes, J. (1857). Further experiments and remarks on the measurement of heights by boiling point of water. Transactions of the Royal Society of Edinburgh, 21, 235-243.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Hooker)
The data for this table are a sample of ten 18-year-old girls taken from the study conducted by Tuddenham and Snyder (1954).
This data frame contains the following columns:
Height (cm) at age 18
Weight (kg) at age 18
Tuddenham, R., and Snyder, M. (1954). Physical growth of California boys and girls from birth to age 18. California Publications on Child Development, 1, 183-364.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Htwt)
In a study of coinage, W. Stanley Jevons weighed 274 gold sovereigns that he had collected from circulation in Manchester, England. For each coin, he recorded the weight, after cleaning, to the nearest 0.001 gram, and the date of issue. The age classes are coded 1 to 5, roughly corresponding to the age of the coin in decades. The standard weight of a gold sovereign was supposed to be 7.9876 grams; the minimum legal weight was 7.9379 grams.
This data frame contains the following columns:
Age of coins, decades
Number of coins
Average weight, grams
Standard deviation.
Minimum weight
Maximum weight
Stephen Stigler
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(jevons)
78 bluegills were captured from Lake Mary, Minnesota. On each fish, a key scale was removed. The age of a fish is determined by counting the number of annular rings on the scale. The goal is to relate length at capture to the radius of the scale.
This data frame contains the following columns:
Years
mm
Collected by Richard Frie, and discussed in S. Weisberg (1986), A linear model approach to the backcalculation of fish length, J. Amer. Statist. Assoc., 81, 922-929.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(lakemary)
These data give the number of known crustacean zooplankton species for 69 world lakes. Also included are a number of characteristics of each lake. There are missing values.
This data frame uses lake name as row label and contains the following columns:
Number of zooplankton species
Maximum lake depth, m
Mean lake depth, m
Specific conductance, micro Siemens
Elevation, m
N latitude, degrees
W longitude, degrees
distance to nearest lake, km
number of lakes within 20 km
Rate of photosynthesis, mostly by the 14C method
Lake area, in hectares
Dodson, S. (1992), Predicting crustacean zooplankton species richness, Limnology and Oceanography, 37, 848–856.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(lakes)
The data were collected by Douglas Tiffany to study the variation in rent paid in 1977 for agricultural land planted to alfalfa.
This data frame contains the following columns:
average rent for all tillable land
density of dairy cows (number per square mile)
proportion of farmland used for pasture
1 if liming required to grow alfalfa; 0 otherwise
average rent per acre planted to alfalfa
Douglas Tiffany
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(landrent)
These data are the results of an experiment to study the performance of cutting-tool material in cutting steel on a lathe. The two factors are revolution speed and feed rate. The response is tool life in minutes.
This data frame contains the following columns:
Coded feed rate, coded as (actual feed rate -13)/6. Feed is in thousandths of an inch per revolution.
Coded speed, coded as (actual speed -900)/300. Speed is in feet per minute.
Life of tool until failure, minutes
M. R. Delozier
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(lathe1)
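The coded factors are linear rescalings of the actual machine settings, so they can be converted in either direction. A minimal sketch of the two codings described above follows; the four helper names are hypothetical and are not functions from the package.

```r
# Coding described above: feed in thousandths of an inch per revolution,
# speed in feet per minute. Helper names are hypothetical.
code_feed    <- function(feed)  (feed - 13) / 6
code_speed   <- function(speed) (speed - 900) / 300

# Inverse transformations, from coded units back to actual settings.
decode_feed  <- function(x) 13 + 6 * x
decode_speed <- function(x) 900 + 300 * x

# A feed of 19 and a speed of 600 correspond to coded values +1 and -1.
print(c(feed = code_feed(19), speed = code_speed(600)))
```

Coding centers each factor at zero and puts both on a comparable scale, which makes coefficients in a second-order model easier to compare.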
An artificial data set suggested by N. Mantel to illustrate stepwise regression methods.
A data frame with 5 observations on the following 4 variables.
the response
predictor 1
predictor 2
predictor 3
Mantel, N. (1970). Why stepdown procedures in variable selection? Technometrics, 12, 621–625.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(mantel)
World record times for the mile run, 1861–2003.
A data frame with 46 observations:
Year in which the record was set
Running time, in seconds
Name of person setting the record
Country of residence of the record setter
Place the record was set
Gender of the record holder
Data source: http://www.saunalahti.fi/~sut/eng/
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(mile)
These data include nearly every farm sale in 6 economic regions in Minnesota from 2002-2011 that either has land enrolled in the federal Conservation Reserve Program, or CRP, or has no restrictions. A few sales with non-CRP land easements were excluded. CRP enrollment is for a fixed period during which farmers agree not to grow crops for a fixed payment. This can affect the sale price of land since buyers have fewer choices on use of land that could lower values, but also have guaranteed income for a fixed period that could raise values.
data(MinnLand)
A data frame with 18700 observations on the following 10 variables.
acrePrice
sale price in dollars per acre. Sale prices were adjusted to a common date within the year. No inflation adjustment is made between years.
region
a factor with levels giving the geographic names of six economic regions of Minnesota. Excluded economic regions had few farm sales.
improvements
percentage of property value due to improvements. Minnesota assessors estimate values separately for land and buildings. This variable is the ratio of the building value to the total value.
year
year of sale, as a continuous variable, not as a factor. Most uses of this variable would require converting it to a factor.
acres
size of the farm in acres
tillable
percentage of farm acreage that is rated arable by the assessor
financing
a factor with levels title transfer and seller finance
crpPct
the percentage of all farm acres enrolled in CRP
productivity
average agronomic productivity scaled 1 to 100, with larger numbers for more productive land. This score is based on University of Minnesota soil studies. This value is frequently missing because some counties never had the study done, and some county assessors are inconsistent in including this value in the record of the sale.
Data is collected from Minnesota counties. Some counties do not include the productivity value in sales records, accounting for most of the missing values. The variable tillable is also frequently missing.
S. J. Taff
Taff, S. J. and Weisberg, S. (2007). Compensated short-term conservation restrictions may reduce sale prices. The Appraisal Journal, 75(1), 45.
head(MinnLand)
## Not run: 
require(mice)
md.pattern(MinnLand)
## End(Not run)
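The description of year notes that it is stored as a continuous variable and that most uses require a factor. The sketch below contrasts the two treatments on a small invented data frame. Only the column names acrePrice and year come from the documentation above; the name `fyear`, the prices, and the choice of a log response are illustrative assumptions.

```r
# Invented sales; only the column names acrePrice and year are taken
# from the documentation.
toy <- data.frame(
  acrePrice = c(1500, 1650, 1800, 2100, 2050, 2400),
  year      = c(2002, 2002, 2003, 2003, 2004, 2004)
)

# Convert the numeric year to a factor with one level per year.
toy$fyear <- factor(toy$year)

# Numeric year: a single slope, that is, a linear trend over time.
fit_trend <- lm(log(acrePrice) ~ year, data = toy)

# Factor year: a separate mean for each year, with no trend assumed.
fit_levels <- lm(log(acrePrice) ~ fyear, data = toy)

print(levels(toy$fyear))
```

The factor version uses one coefficient per year beyond the first, so it can follow year-to-year changes that a straight line would miss.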
Yearly water consumption in Minnesota from 1988-2011.
data(MinnWater)
A data frame with 24 observations on the following variables.
year
total ground water consumption, statewide, in billions of gallons
total municipal water consumption, statewide, in billions of gallons
consumption for irrigation in 13 counties, in billions of gallons
average growing season June to August precipitation (inches) for the 13 Minnesota counties that use the most irrigation
average May to September precipitation (inches) for the 10 Minnesota counties with highest municipal water pumping
estimated state population
estimated 10 county urban population
Is water usage increasing? How fast?
These data were provided by the Freshwater Society. They collected the data from the Minnesota Department of Natural Resources and from the Minnesota Climatology Working Group. Thanks to Tom Burk.
data(MinnWater) ## maybe str(MinnWater) ; plot(MinnWater) ...
Data collected by Kenneth G. Hubbard on soil temperature at 20 cm depth in Mitchell, Nebraska for 17 years (1976-1992). The variable month is the month number.
This data frame contains the following columns:
Months beginning Jan, 1976
Average soil temperature, degrees C
Kenneth G. Hubbard
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Mitchell)
The data give the frequencies of words in works from four different sources: the political writings of eighteenth century American political figures Alexander Hamilton, James Madison, and John Jay, and the book Ulysses by twentieth century Irish writer James Joyce.
This data frame uses the word as row labels and contains the following columns:
Hamilton frequency
Hamilton rank
Madison frequency
Madison rank
Jay frequency
Jay rank
Word frequency in Ulysses
Word rank in Ulysses
Mosteller, F. and Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(MWwords)
Catch per unit effort data for 16 Minnesota lakes.
A data frame with 16 observations on the following 4 variables.
Estimated catch per unit effort
Estimated standard error of CPUE
Estimated fish density
Estimated standard error of Density
R. Pierce, Minnesota Dept. of Natural Resources
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(npdata)
Data on eruptions of Old Faithful Geyser, October 1980. Variables are the duration in seconds of the current eruption, and the time in minutes to the next eruption. Collected by volunteers, and supplied by the Yellowstone National Park Geologist. Data was not collected between approximately midnight and 6 AM.
This data frame contains the following columns:
Duration in seconds
Time to next eruption
R. Hutchinson
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(oldfaith)
The file physics contains results for meson as input and
meson as
output. physics1 is for
to
.
This data frame contains the following columns:
Inverse total energy
Scattering cross-section/sec
Standard deviation
Weisberg, H., Beier, H., Brody, H., Patton, R., Raychaudhari, K., Takeda, H., Thern, R. and Van Berg, R. (1978). s-dependence of proton fragmentation by hadrons. II. Incident laboratory momenta, 30–250 GeV/c. Physical Review D, 17, 2875–2887.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(physics1)
The Alaska pipeline data consists of in-field ultrasonic measurements of the depths of defects in the Alaska pipeline. The depths of the defects were then re-measured in the laboratory. These measurements were performed in six different batches. The data were analyzed to calibrate the bias of the field measurements relative to the laboratory measurements. In this analysis, the field measurement is the response variable and the laboratory measurement is the predictor variable.
These data were originally provided by Harry Berger, who was at the time a scientist for the Office of the Director of the Institute of Materials Research (now the Materials Science and Engineering Laboratory) of NIST. These data were used for a study conducted for the Materials Transportation Bureau of the U.S. Department of Transportation.
This data frame contains the following columns:
Depth of defect measured in the field.
Depth of defect measured in the laboratory.
Batch number
http://www.itl.nist.gov/div898/handbook/pmd/section6/pmd621.htm
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(pipeline)
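As a rough illustration of the calibration described above, the sketch below regresses the field measurement on the laboratory measurement. The helper name `fit_calibration` is hypothetical, and the columns are taken by position on the assumption that the data frame stores them in the order listed (field, laboratory, batch).

```r
# Hypothetical helper: regress field measurements on laboratory measurements.
fit_calibration <- function(field, lab) {
  lm(field ~ lab)
}

# Usage with the packaged data, if alr4 is installed.
# Assumption: columns are stored in the order field, laboratory, batch.
if (requireNamespace("alr4", quietly = TRUE)) {
  pipeline <- alr4::pipeline
  fit <- fit_calibration(pipeline[[1]], pipeline[[2]])
  print(coef(fit))  # intercept and slope of the calibration line
}
```

An intercept near 0 and a slope near 1 would suggest little systematic bias in the field measurements; anything else points to a bias worth calibrating.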
Soil productivity scores for farms in townships in four counties in the Minneapolis-St. Paul metropolitan area, 1981-82. The goal is to see if the productivity score is a good predictor of the assessed value of the farmland. If so, then the productivity score could be used to set the assessed value for farms enrolled in the “green acres” program, which requires that urban farmland be taxed at its agricultural value only, without regard to development potential.
This data frame contains the following columns:
Name of the county
Assessed value in dollars per acre.
Productivity score, a number between 1 and 100.
Tax year, either 1981 or 1982.
Douglas Tiffany
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(prodscore)
Data collected in an experiment in which rats were injected with a dose of a drug approximately proportional to body weight. At the end of the experiment, the animal's liver was weighed, and the fraction of the drug recovered in the liver was recorded. The experimenter expected the response to be independent of the predictors.
This data frame contains the following columns:
BodyWt of the rat
LiverWt measured after sacrifice
Dose, roughly proportional to body weight
dose of drug recovered after sacrifice of the animal
Dennis Cook
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(rat)
These data include summaries of the ratings of 364 instructors at one large campus in the Midwest from Bleske-Rechek and Fritsch (2011). Each instructor included in the data had at least 10 ratings over a several year period. Students provided ratings on 5 point scales. The data file provides the average ratings and additional characteristics of the instructors.
A data frame with 364 observations on the following 17 variables.
gender
instructor gender, a factor with levels female and male
numYears
a numeric vector, number of years in which this instructor had ratings between 1999 and 2009.
numRaters
number of ratings
numCourses
number of different course titles included in the rating for this instructor
pepper
a factor with levels no and yes. In addition to rating for quality, instructors are rated as attractive or not. A value of yes means that the consensus is that the instructor is attractive.
discipline
a factor with levels Hum for humanities, SocSci for social sciences, STEM for science, technology, engineering and mathematics, and Pre-prof for professional training
dept
a factor with department names Accounting, Anthropology, Art, Art and design, Art History, Astronomy/Physics, Biology, Business, Chemistry, Communication, Communication Disorders, Computer Science, Criminal Justice, Curriculum and Instruction, Dance, Economics, English, Environmental Public Health, Finance, FLTR, French, Geography, Geology, German, History, Information Systems, Japanese, Kins, Library Science, Management, Managerial Science, Marketing, Math, Music, Nursing, Philosophy, Physics, Physics & Astronomy, Physics and Astronomy, Political Science, Psychology, Religious Studies, Social Work, Sociology, Spanish, Special Education, Theater, Womens Studies
quality
Average quality rating, between 1, worst, to 5, best
helpfulness
Average helpfulness rating, between 1, worst, to 5, best
clarity
Average clarity rating, between 1, worst, to 5, best
easiness
Average easiness rating, between 1, worst, to 5, best
raterInterest
Average rater interest, between 1, lowest, to 5, highest
sdQuality
SD of quality rating
sdHelpfulness
SD of helpfulness rating
sdClarity
SD of clarity rating
sdEasiness
SD of easiness rating
sdRaterInterest
SD of rater interest
Provided by April Bleske-Rechek.
Bleske-Rechek, A. and Fritsch, A. (2011). Student Consensus on RateMyProfessors.com. Practical Assessment, Research & Evaluation, 16(18), http://pareonline.net/getvn.asp?v=16&n=18
data(Rateprof)
This example with artificial data is designed to demonstrate the importance of plotting residuals.
data(Rpdata)
A data frame with 990 observations on the following 7 variables.
y
a numeric vector
x1
a numeric vector
x2
a numeric vector
x3
a numeric vector
x4
a numeric vector
x5
a numeric vector
x6
a numeric vector
Data generated using programs from http://www4.stat.ncsu.edu/~stefanski/NSF_Supported/Hidden_Images/stat_res_plots.html
Stefanski, L. A. (2007). Residual (sur)Realism. The American Statistician, 61, 163-177. https://www.amstat.org/about/pdfs/NCSUStatsProfSurpriseHomework.pdf
data(Rpdata)
## Not run:
require(car)
residualPlot(lm(Rpdata))
## End(Not run)
Salary of faculty in a small Midwestern college in the early 1980s.
This data frame contains the following columns:
Factor with levels "PhD" or "Masters"
Factor, "Asst", "Assoc" or "Prof"
Factor, "Male" or "Female"
Years in current rank
Years since highest degree earned
dollars per year
Sanford Weisberg
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(salary)
Data on non-unionized job classes in a US county in 1986. Included are the job class difficulty score, the number of employees in the class, number of female employees, and the name of the class.
This data frame contains the following columns:
Name of job class
Number of women employees
Total number of employees in a job class
Difficulty score for job class
Maximum salary for job class
Sanford Weisberg
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(salarygov)
Data on electricity consumption (KWH) and mean temperature (degrees F) for one building on the University of Minnesota's Twin Cities campus for 39 months in 1988-92. The goal is to model consumption as a function of temperature. Higher temperature causes the use of air conditioning, so high temperatures should mean high consumption. This building is steam heated, so electricity is not used for heating.
This data frame contains the following columns:
Monthly mean temperature, degrees F.
Electricity consumption in KWH/day
Charles Ng
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(segreg)
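Because electricity is not used for heating, consumption might be roughly flat at low temperatures and rise once air conditioning starts. One way to sketch this is a "hinge" regression with the changepoint chosen from a grid by least squares. This is an illustrative model, not necessarily the analysis used in the book; the helper name `fit_hinge`, the knot grid, and the positional column order (temperature first, consumption second) are all assumptions.

```r
# Hypothetical helper: fit kwh = b0 + b1 * max(0, temp - knot) for each
# candidate knot and keep the knot with the smallest residual sum of squares.
fit_hinge <- function(temp, kwh, knots) {
  rss <- vapply(
    knots,
    function(k) deviance(lm(kwh ~ pmax(0, temp - k))),
    numeric(1)
  )
  best <- knots[which.min(rss)]
  list(knot = best, fit = lm(kwh ~ pmax(0, temp - best)))
}

# Usage with the packaged data, if alr4 is installed.
# Assumption: column 1 is temperature and column 2 is consumption.
if (requireNamespace("alr4", quietly = TRUE)) {
  segreg <- alr4::segreg
  res <- fit_hinge(segreg[[1]], segreg[[2]], knots = seq(30, 70, by = 1))
  print(res$knot)       # estimated temperature at which consumption starts rising
  print(coef(res$fit))  # baseline level and slope above the knot
}
```

A grid search keeps the sketch within ordinary `lm()` calls; a nonlinear least squares fit that estimates the knot directly would be an alternative.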
Results of a small experiment to learn about the effects of small electric shocks on dairy cows.
A data frame with 6 observations on the following 3 variables.
Shock level, milliamps
Number of trials
Number of times a positive reaction was observed
R. Norell
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(shocks)
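Counts of positive reactions out of a known number of trials are commonly modeled with binomial (logistic) regression. The sketch below shows one such fit; the helper name `fit_shock_response` is hypothetical, and the columns are taken by position on the assumption that they are stored in the order listed above (shock level, number of trials, number of positive reactions).

```r
# Hypothetical helper: logistic regression of the proportion of positive
# reactions on shock level, using a two-column (successes, failures) response.
fit_shock_response <- function(level, trials, positives) {
  glm(cbind(positives, trials - positives) ~ level, family = binomial)
}

# Usage with the packaged data, if alr4 is installed.
# Assumption: columns are stored as shock level, trials, positive reactions.
if (requireNamespace("alr4", quietly = TRUE)) {
  shocks <- alr4::shocks
  fit <- fit_shock_response(shocks[[1]], shocks[[2]], shocks[[3]])
  print(coef(fit))  # intercept and slope on the logit scale
}
```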
Includes species averages for 62 mammals.
This data frame uses species as row labels and contains the following columns:
Slow wave nondreaming sleep, hrs/day
Paradoxical dreaming sleep, hrs/day
Total sleep, hrs/day
Body weight in kg
Brain weight in g
Maximum life span, years
Gestation time, days
Predation index, 1 = low, 5 = high
Sleep exposure index, 1 = exposed, 5 = protected
Danger index, 1 = least, 5 = most
Allison, T. and Cicchetti, D. (1976). Sleep in Mammals: Ecological and Constitutional Correlates. Science, 194, 732-734.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(sleep1)
The data give the water content of snow and the water yield in inches in the Snake River watershed in Wyoming.
This data frame contains the following columns:
water content of snow
water yield from April to July
Wilm, H. G. (1950). Statistical control in hydrologic forecasting. “Res. Notes”, 61, Pacific Northwest Forest Range Experiment Station, Oregon.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(snake)
When gasoline is pumped into a tank, hydrocarbon vapors are forced out and into the atmosphere. To reduce this significant source of air pollution, devices are installed to capture the vapor. In testing these vapor recovery systems, a "sniffer" measures the amount recovered. John Rice provided the data for the file sniffer.txt.
This data frame contains the following columns:
Initial tank temperature (degrees F)
Temperature of the dispensed gasoline (degrees F)
Initial vapor pressure in the tank (psi)
Vapor pressure of the dispensed gasoline (psi)
Hydrocarbons emitted (grams)
John Rice
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(sniffer)
This experiment was apparently done by S. S. Stevens and colleagues in March 1962, although the exact reference is lost. 10 subjects were played tones at each of 5 loudnesses, presumably in random order. Subjects were asked to draw a line on paper whose length matched the loudness of the tone. Each subject repeated each loudness 3 times, for a total of 30 trials per subject. The original data are lost; reported here is the mean of the 3 log-lengths for each loudness, the sd of the three log-lengths, and the number of replications, which is always 3.
data(Stevens)
A data frame with 50 observations on the following 5 variables.
subject
a factor with unique values for each subject
loudness
either 50, 60, 70, 80 or 90 dB. Decibels are a logarithmic scale
y
a numeric vector giving the mean of the log-lengths of three lines drawn. Exponentiating these values would give the geometric mean of the three lengths in cm.
sd
a numeric vector, giving the sd of the three log lengths
n
a numeric vector, equal to the constant value 3
This is a classic example of a psychophysics experiment pioneered by S. S. Stevens. The basic idea is that the psychological response y to a physical stimulus x should be proportional to x to a power. Since both the response and the loudness are already in log-scale, linear fits should be expected.
These data were obtained in the early 1970s from the data library in the Harvard University Statistics Department.
Stevens, S. S. (1966). A metric for social consensus, Science, 151, 530-541, http://www.jstor.org/stable/1717034
head(Stevens)
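The power-law idea in the details above can be sketched as a straight-line fit of `y` on `loudness`. This is only an illustration: the helper name `fit_power_law` is hypothetical, and the conversion of the slope to an exponent assumes that `y` is a natural logarithm and that decibels are defined as 10 times the base-10 logarithm of sound intensity.

```r
# Hypothetical helper: fit mean log-length on loudness (dB) by least squares
# and convert the slope to a power-law exponent.
fit_power_law <- function(d) {
  fit <- lm(y ~ loudness, data = d)
  slope <- coef(fit)[["loudness"]]
  # Assumption: y is a natural log and dB = 10 * log10(intensity), so
  # log(length) = a + slope * 10 * log(intensity) / log(10).
  list(fit = fit, exponent = 10 * slope / log(10))
}

# Usage with the packaged data, if alr4 is installed.
if (requireNamespace("alr4", quietly = TRUE)) {
  res <- fit_power_law(alr4::Stevens)
  print(res$exponent)  # estimated exponent under the assumptions above
}
```

This fit pools all subjects and ignores the `sd` and `n` columns; a fuller analysis might allow subject-specific lines.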
Ezekiel and Fox (1959) data on auto stopping distances.
This data frame contains the following columns:
Speed (mph)
Stopping distance (in feet)
Ezekiel, M. and Fox, K. A. (1959). Methods of Correlation and Regression Analysis: Linear and Curvilinear. Hoboken NJ: Wiley.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(stopping)
Log catch per unit effort of 200 mm or longer black crappies was recorded 27 times over the course of 1996 on Swan Lake, Minnesota.
A data frame with 27 observations on the following 2 variables.
Number of days after June 16, 1996
log of the catch of 200 mm or longer black crappies per unit effort (the base of the logarithm is not stated)
Minnesota Department of Natural Resources
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(swan96)
Turkey weight increase in an experiment in which the supplementation with methionine was varied.
This data frame contains the following columns:
Amount of methionine supplement (percent of diet)
Pen weight increase (g)
Cook, R. D. and Witmer, J. (1985). A note on parameter-effects curvature. Journal of the American Statistical Association, 80, 872-878.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(turk0)
Data from an experiment on the growth of turkeys. 60 pens of turkeys were grown with a similar diet, supplemented with a dose of methionine from one of three sources. The response is average pen weight. Recorded are the dose, the source, the number of pens m (always 5 except when dose = 0), the average weight gain, and the within group SS.
This data frame contains the following columns:
Dose: Amount of supplement as a percent of the total diet
Ave. weight gain, over all replications
A factor for the source of methionine, with three levels numbered 1, 2 and 3.
Number of replications or pens
SD of the m pens with the same values of S and A.
Cook, R. D. and Witmer, J. (1985). A note on parameter-effects curvature. Journal of the American Statistical Association, 80, 872-878.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(turkey)
The given data are IQ scores from identical twins; one raised in a foster home, and the other raised by birth parents.
This data frame contains the following columns:
Social class, C1=high, C2=medium, C3=low, a factor
biological
foster
Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotic twins reared together and apart. Br. J. Psych., 57, 147-153.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(twins)
The international bank UBS produces a report on prices in major world cities every three years. This data.frame includes price data for a 1 kg loaf of bread, 1 kg of rice and for a Big Mac hamburger, for the years 2003 and 2009. All these prices are measured in the minutes of labor required by the typical worker in that country to buy the product, so it adjusts for currency, wages and price levels.
data(UBSprices)
A data frame with 54 observations on the following 6 variables.
bigmac2009
2009 Big Mac price, in minutes of labor
bread2009
2009 Bread price, in minutes of labor
rice2009
2009 Rice price, in minutes of labor
bigmac2003
2003 Big Mac price, in minutes of labor
bread2003
2003 Bread price, in minutes of labor
rice2003
2003 Rice price, in minutes of labor
City names are the row labels.
Union Bank of Switzerland
head(UBSprices)
These data are forest inventory measures from the Upper Flat Creek stand of the University of Idaho Experimental Forest, dated 1991.
The file ufc contains all the data. ufcwc contains only Western red cedar. ufcgf contains only grand fir.
A data frame with the following 5 variables.
Plot number
Tree within plot
a factor with levels DF = Douglas-fir, GF = Grand fir, SF = Subalpine fir, WL = Western larch, WC = Western red cedar, WP = White pine
Diameter at 137 cm, perpendicular to the bole, in mm
Height of the tree, in decimeters
Andrew Robinson
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(ufcgf)
Demographic data for 193 places, mostly UN members, but also other areas like Hong Kong that are not independent countries.
This data frame uses the locality name as a row label. In some cases the geographic area is smaller than a country; for example Hong Kong. The file contains the following columns:
Expected number of live births per female, 2000
Per capita 2001 GDP, in US $
These data were collected and published by the UN from a variety of sources. See the original source for additional footnotes concerning values for individual countries. Country names are given in the first column of the data file.
http://unstats.un.org/unsd/demographic
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(UN1)
National health, welfare, and education statistics for 210 places, mostly UN members, but also other areas like Hong Kong that are not independent countries.
data(UN11)
A data frame with 237 observations on the following 32 variables.
region
region of the world
group
a factor with levels oecd for countries that are members of the OECD, the Organization for Economic Co-operation and Development, as of May 2012, africa for countries on the African continent, and other for all other countries. No OECD countries are located in Africa
fertility
number of children per woman
ppgdp
Per capita gross domestic product in US dollars
lifeExpF
Female life expectancy, years
pctUrban
Percent Urban
Similar data, from the period 2000-2003, appear in the alr3 package under the name UN3.
All data were collected from UN tables accessed at http://unstats.un.org/unsd/demographic/products/socind/ on April 23, 2012. OECD membership is from www.oecd.org, accessed May 25, 2012.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(UN11)
These data give length and age for over 3000 walleye (a type of fish) captured in Butternut Lake, Wisconsin, in three periods with different management methods in place.
A data frame with 3198 observations on the following 3 variables.
Age of the fish, years
Length, mm
1 = pre 1990, 2 = 1991-1996, 3 = 1997-2000
Michelle LeBeau
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(walleye)
Can Southern California's water supply in future years be predicted from past data? One factor affecting water availability is stream runoff. If runoff could be predicted, engineers, planners and policy makers could do their jobs more efficiently. Multiple linear regression models have been used in this regard. This dataset contains 43 years' worth of precipitation measurements taken at six sites in the Owens Valley (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and OPSLAKE), and stream runoff volume at a site near Bishop, California.
This data frame contains the following columns:
collection year
Snowfall in inches at measurement site APMAM
Snowfall in inches at measurement site APSAB
Snowfall in inches at measurement site APSLAKE
Snowfall in inches at measurement site OPBPC
Snowfall in inches at measurement site OPRC
Snowfall in inches at measurement site OPSLAKE
Stream runoff near Bishop, CA, in acre-feet
Source: http://www.stat.ucla.edu.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(water)
Data on samples of small mouth bass collected in West Bearskin Lake, Minnesota, in 1991. The file wblake includes only fish of ages 8 or younger.
This data frame contains the following columns:
Age at capture (yrs)
Length at capture (mm)
radius of a key scale, mm
Minnesota Department of Natural Resources
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(wblake) # excludes fish age 9 or older
For each person on board the fatal maiden voyage of the ocean liner Titanic, this dataset records sex, age (adult/child), economic status (first/second/third class, or crew) and whether or not that person survived. The name of the company that owned the Titanic was White Star. Several versions of these data exist in the R universe.
This data frame contains the following columns:
Number of survivors
survivors + deaths
Crew or passenger class
adult or child
male or female
Report on the Loss of the ‘Titanic’ (S.S.) (1990), British Board of Trade Inquiry Report (reprint), Gloucester, UK: Allan Sutton Publishing. Taken from the Journal on Statistical Education Archive, submitted by [email protected].
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(Whitestar)
Windspeed data collected at a test site for a windmill, and also at a nearby long-term weather site, in Northern South Dakota. Data collected every six hours for all of 2002, except that all of the month of May and a few other observations are missing.
A data frame with 1116 observations on the following 3 variables.
A text variable with values like "2002/1/2/6" meaning the reading at 6AM on January 2, 2002
Windspeed in m/s at the candidate site
Windspeed for the reference site
Mark Ahlstrom and Rolf Miller, WindLogics, Inc.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(wm1)
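The date strings described above, such as "2002/1/2/6", can be converted to date-times before plotting or modeling. The sketch below is one way to do this; the helper name `parse_wm_date` is hypothetical, the UTC time zone is an arbitrary choice because the documentation does not state one, and the date variable is assumed to be the first column of the data frame.

```r
# Hypothetical helper: convert "year/month/day/hour" strings to POSIXct.
parse_wm_date <- function(x, tz = "UTC") {
  # Split each string on "/" and stack the pieces into a 4-column matrix.
  parts <- do.call(rbind, strsplit(as.character(x), "/", fixed = TRUE))
  storage.mode(parts) <- "integer"
  # Minutes and seconds are not recorded, so they are set to 0.
  ISOdatetime(parts[, 1], parts[, 2], parts[, 3], parts[, 4], 0, 0, tz = tz)
}

# Usage with the packaged data, if alr4 is installed.
# Assumption: the date variable is the first column.
if (requireNamespace("alr4", quietly = TRUE)) {
  wm1 <- alr4::wm1
  when <- parse_wm_date(wm1[[1]])
  print(range(when))
}
```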
Windspeed data collected at a test site for a windmill, and also at a nearby long-term weather site, in Northern South Dakota. Data collected every six hours for all of 2002, except that all of the month of May and a few other observations are missing.
A data frame with 1116 observations on the following 5 variables.
A text variable with values like "2002/1/2/6" meaning the reading at 6AM on January 2, 2002
Windspeed in m/s at the candidate site
Windspeed for the reference site
Wind direction, in degrees, at the reference site
Wind direction binned into 16 equal width bins
Mark Ahlstrom and Rolf Miller, WindLogics, Inc.
Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
head(wm2)