Title: Methods for Dimension Reduction for Regression
Description: Functions, methods, and datasets for fitting dimension reduction regression, using slicing (methods SAVE and SIR), principal Hessian directions (phd, using residuals or the response), and an iterative method, IRE. Partial methods, which condition on categorical predictors, are also available. A variety of tests, and stepwise deletion of predictors, are also included, as is code for computing permutation tests of dimension. Adding further methods of estimating dimension is straightforward. For documentation, see the vignette in the package. As of version 3.0.4, the arguments for dr.step have been modified.
Authors: Sanford Weisberg <[email protected]>
Maintainer: Sanford Weisberg <[email protected]>
License: GPL (>= 2)
Version: 3.0.10
Built: 2024-11-19 04:09:43 UTC
Source: https://github.com/cran/dr
Data on 102 male and 100 female athletes collected at the Australian Institute of Sport.
This data frame contains the following columns:
Sex: 0 = male, 1 = female
Ht: height (cm)
Wt: weight (kg)
LBM: lean body mass
RCC: red cell count
WCC: white cell count
Hc: hematocrit
Hg: hemoglobin
Ferr: plasma ferritin concentration
BMI: body mass index, weight/(height)**2
SSF: sum of skin folds
Bfat: percent body fat
Label: case labels
Sport: sport
Source: Ross Cunningham and Richard Telford.
Reference: Weisberg, S. (2005). Applied Linear Regression, 3rd edition. New York: Wiley, Section 6.4.
data(ais)
Six measurements made on 100 genuine Swiss banknotes and 100 counterfeit ones.
This data frame contains the following columns:
Length: length of bill (mm)
Left: width of left edge (mm)
Right: width of right edge (mm)
Bottom: bottom margin width (mm)
Top: top margin width (mm)
Diagonal: length of image diagonal (mm)
Y: 0 = genuine, 1 = counterfeit
Flury, B. and Riedwyl, H. (1988). Multivariate Statistics: A practical approach. London: Chapman & Hall.
Weisberg, S. (2005). Applied Linear Regression, 3rd edition. New York: Wiley, Problem 12.5.
data(banknote)
This is the main function in the dr package. It creates objects of class dr to estimate the central (mean) subspace and perform tests concerning its dimension. Several helper functions that require a dr object can then be applied to the output from this function.
dr(formula, data, subset, group=NULL, na.action=na.fail, weights, ...)

dr.compute(x, y, weights, group=NULL, method="sir", chi2approx="bx", ...)
formula: a two-sided formula like y ~ x1 + x2 + x3. The left-hand side of the formula will generally be a single vector, but it can also be a matrix, such as cbind(y1, y2).

data: an optional data frame containing the variables in the model. By default the variables are taken from the environment from which dr is called.

subset: an optional vector specifying a subset of observations to be used in the fitting process.

group: if used, this argument specifies a grouping variable so that dimension reduction is done separately for each distinct level. This is implemented only when method is one of "sir", "save", or "ire".

weights: an optional vector of weights to be used where appropriate. In the context of dimension reduction methods, weights are used to obtain elliptical symmetry, not constant variance.

na.action: a function which indicates what should happen when the data contain NAs. The default is na.fail, which will stop calculations. The option na.omit is also permitted, but it may not work correctly when weights are used.

x: the design matrix. This will be computed from the formula by dr and then passed to dr.compute.

y: the response vector or matrix.

method: this character string specifies the method of fitting. The options include "sir", "save", "phdy", "phdres", and "ire".

chi2approx: several dr methods compute significance levels using statistics that are asymptotically distributed as a linear combination of Chi-squared(1) random variables. This argument selects the approximation used, either "bx", the default, for the Bentler-Xie approximation, or "wood" for the Wood approximation; see dr.pvalue.

...: for dr, additional arguments are passed to dr.compute. For dr.compute, these are arguments required by particular methods, such as nslices, the number of slices used by "sir" and "save", and numdir, the maximum number of directions to compute.
The general regression problem studies F(y|x), the conditional distribution of a response y given a set of p predictors x. This function provides methods for estimating the dimension and central subspace of a general regression problem. That is, we want to find a p x d matrix B of minimal rank d such that F(y|x) = F(y|B'x). Both the dimension d and the subspace R(B) are unknown. These methods make few assumptions. Many methods are based on the inverse distribution, F(x|y).
For the methods "sir", "save", "phdy" and "phdres", a kernel matrix M is estimated such that the column space of M should be close to the central subspace R(B). The eigenvectors corresponding to the d largest eigenvalues of M provide an estimate of R(B). For the method "ire", subspaces are estimated by minimizing an objective function.
Categorical predictors can be included using the group argument, with the methods "sir", "save" and "ire", using the ideas from Chiaromonte, Cook and Li (2002).

The primary output from this method is (1) a set of vectors whose span estimates R(B); and (2) various tests concerning the dimension d.
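As a minimal sketch of this output, the following fits a dr object and inspects the stored decomposition; the component names follow the Value section below, and the model choice here is illustrative only.

data(ais)
# Fit SIR; the kernel matrix M and its decomposition are stored in the object.
s0 <- dr(LBM ~ Wt + Ht + RCC + WCC, method = "sir", nslices = 8, data = ais)
s0$evalues          # eigenvalues of the kernel matrix M
s0$evectors[, 1:2]  # leading eigenvectors, estimating the central subspace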
Weights can be used, essentially to specify the relative frequency of each case in the data. Empirical weights that make the contours of the weighted sample closer to elliptical can be computed using dr.weights. This will usually result in zero weight for some cases; the function will set zero estimated weights to missing.
dr returns an object that inherits from dr (the name of the class is the value of the method argument), with components:
x: the design matrix.

y: the response vector.

weights: the weights used, normalized to add to n.

qr: the QR factorization of x.

cases: the number of cases used.

call: the initial call to dr.

M: a matrix that depends on the method of computing. The column space of M should be close to the central subspace.

evalues: the eigenvalues of M (or squared singular values if M is not symmetric).

evectors: the eigenvectors of M (or of M'M if M is not square and symmetric), ordered according to the eigenvalues.

chi2approx: the value of the input argument of this name.

numdir: the maximum number of directions to be found. The output value of numdir may be smaller than the input value.

slice.info: output from dr.slices, used by sir and save.

method: the dimension reduction method used.

terms: same as the terms attribute in lm or glm; needed to make update work correctly.

A: if method="save", a three-dimensional array needed to compute test statistics.
Sanford Weisberg, <[email protected]>.
Bentler, P. M. and Xie, J. (2000). Corrections to test statistics in principal Hessian directions. Statistics and Probability Letters, 47, 381-389. Approximate p-values.

Cook, R. D. (1998). Regression Graphics. New York: Wiley. This book provides the basic results for dimension reduction methods, including detailed discussion of the methods "sir", "phdy" and "phdres".

Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Annals of Statistics, 32, 1062-1092. Introduced marginal coordinate tests.

Cook, R. D. and Nachtsheim, C. (1994). Reweighting to achieve elliptically contoured predictors in regression. Journal of the American Statistical Association, 89, 592-599. Describes the weighting scheme used by dr.weights.

Cook, R. D. and Ni, L. (2004). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. Journal of the American Statistical Association, 100, 410-428. The "ire" method is described in this paper.

Cook, R. D. and Weisberg, S. (1999). Applied Regression Including Computing and Graphics. New York: Wiley, http://www.stat.umn.edu/arc. The program Arc described in this book also computes most of the dimension reduction methods described here.

Chiaromonte, F., Cook, R. D. and Li, B. (2002). Sufficient dimension reduction in regressions with categorical predictors. Annals of Statistics, 30, 475-497. Introduced grouping, or conditioning on factors.

Shao, Y., Cook, R. D. and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika. Describes the tests used for "save".

Wen, X. and Cook, R. D. (2007). Optimal sufficient dimension reduction in regressions with categorical predictors. Journal of Statistical Planning and Inference. This paper extends the "ire" method to grouping.

Wood, A. T. A. (1989). An approximation to the distribution of a linear combination of chi-squared variables. Communications in Statistics: Simulation and Computation, 18, 1439-1456. Approximations for p-values.
data(ais)
# The default fitting method is "sir".
s0 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) + log(RCC) +
         log(Hc) + log(Ferr), data = ais)
# Refit, using a different function for slicing to agree with Arc.
summary(s1 <- update(s0, slice.function = dr.slices.arc))
# Refit again, using save, with 10 slices; the default is max(8, ncol+3).
summary(s2 <- update(s1, nslices = 10, method = "save"))
# Refit using phdres; output is similar for phdy, but the phdy tests are not justifiable.
summary(s3 <- update(s1, method = "phdres"))
# Fit using ire.
summary(s4 <- update(s1, method = "ire"))
# Fit using Sex as a grouping variable.
s5 <- update(s4, group = ~Sex)
Functions to compute various tests concerning the dimension of a central subspace.
dr.test(object, numdir, ...)

dr.coordinate.test(object, hypothesis, d, chi2approx, ...)

## S3 method for class 'ire'
dr.joint.test(object, hypothesis, d = NULL, ...)
object: the name of an object returned by a call to dr.

hypothesis: a specification of the null hypothesis to be tested by the coordinate test. See details below for options.

d: for conditional coordinate hypotheses, specify the dimension of the central (mean) subspace, typically 1, 2 or possibly 3. If left at the default, tests are unconditional.

numdir: the maximum dimension to consider. If not set, defaults to 4.

chi2approx: approximation method for p.values of linear combinations of Chi-squared(1) random variables, either "bx" or "wood"; see dr.pvalue.

...: additional arguments. None are currently available.
dr.test returns marginal dimension tests. dr.coordinate.test returns marginal coordinate tests (Cook, 2004) if d=NULL, or conditional coordinate tests if d is a positive integer giving the assumed dimension of the central subspace. The function dr.joint.test tests the coordinate hypothesis and dimension simultaneously; it is defined only for ire, and is used to compute the conditional coordinate test.

As an example, suppose we have created a dr object using the formula y ~ x1 + x2 + x3 + x4. The marginal coordinate hypothesis defined by Cook (2004) tests the hypothesis that y is independent of some of the predictors given the other predictors. For example, one could test whether x4 could be dropped from the problem by testing y independent of x4 given x1, x2, x3.

The hypothesis to be tested is determined by the argument hypothesis. The argument hypothesis = ~.-x4 would test the hypothesis of the last paragraph. Alternatively, hypothesis = ~x1+x2+x3 would specify the same hypothesis.
More generally, if H is a p x q matrix of rank q, and P(H) is the projection on the column space of H, then specifying hypothesis = H will test the hypothesis that y is independent of (I - P(H))x given P(H)x.
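As a concrete sketch with the ais data (the model is illustrative), the two calls below specify the same null hypothesis in the two equivalent ways just described:

data(ais)
s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht), data = ais,
         method = "sir", nslices = 8)
# Test whether log(Ht) can be dropped, written two equivalent ways.
dr.coordinate.test(s1, ~ . - log(Ht))
dr.coordinate.test(s1, ~ log(SSF) + log(Wt) + log(Hg))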
Returns a list giving the value of the test statistic and an asymptotic p.value computed from the test statistic. For SIR objects, the p.value is computed in two ways. The general test, indicated by p.val(Gen) in the output, assumes only that the predictors are linearly related. The restricted test, indicated by p.val(Res) in the output, assumes in addition to the linearity condition that a constant covariance condition holds; see Cook (2004) for more information on these assumptions. In either case, the asymptotic distribution is a linear combination of Chi-squared random variables. The function specified by the chi2approx argument approximates this linear combination by a single Chi-squared variable.

For SAVE objects, two p.values are also returned. p.val(Nor) assumes predictors are normally distributed, in which case the test statistic is asymptotically Chi-squared with the number of df shown. Assuming general linearly related predictors we again get an asymptotic linear combination of Chi-squares that leads to p.val(Gen).

For IRE and PIRE, the test statistics have an asymptotic Chi-squared distribution, so the value of chi2approx is not relevant.
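A minimal sketch of dr.test itself (the model and numdir value are illustrative):

data(ais)
s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(WCC) + log(RCC), data = ais,
         method = "sir", nslices = 8)
dr.test(s1, numdir = 3)  # marginal dimension tests up to dimension 3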
Yongwu Shao for SIR and SAVE and Sanford Weisberg for all methods, <[email protected]>
Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Annals of Statistics, 32, 1062-1092.
Cook, R. D. and Ni, L. (2004). Sufficient dimension reduction via inverse regression: A minimum discrepancy approach. Journal of the American Statistical Association, 100, 410-428.
Cook, R. D. and Weisberg, S. (1999). Applied Regression Including Computing and Graphics. Hoboken NJ: Wiley.
Shao, Y., Cook, R. D. and Weisberg, S. (2007, in press). Marginal tests with sliced average variance estimation. Biometrika.
See also drop1.dr, coord.hyp.basis, dr.step, and dr.pvalue.
# This will match Table 5 in Cook (2004).
data(ais)
# To make this identical to Arc (Cook and Weisberg, 1999), need to modify slices to match.
summary(s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
                 log(RCC) + log(Hc) + log(Ferr),
                 data = ais, method = "sir",
                 slice.function = dr.slices.arc, nslices = 8))
dr.coordinate.test(s1, ~ . - log(Hg))
# The following nearly reproduces Table 5 in Cook (2004).
drop1(s1, chi2approx = "wood", update = FALSE)
drop1(s1, d = 2, chi2approx = "wood", update = FALSE)
drop1(s1, d = 3, chi2approx = "wood", update = FALSE)
Dimension reduction regression returns a set of up to p orthogonal direction vectors, each of length n, the first d of which are estimates of a basis of a d-dimensional central subspace. These functions return the estimated directions in the original n-dimensional space for plotting.
dr.direction(object, which, x)
dr.directions(object, which, x)

## Default S3 method:
dr.direction(object, which = NULL, x = dr.x(object))

dr.basis(object, numdir)

## S3 method for class 'ire'
dr.basis(object, numdir = length(object$result))
object: a dimension reduction regression object created by dr.

which: select the directions wanted; the default is all directions. If the method is "ire", the directions depend on the dimension of the solution used; see details.

numdir: the number of basis vectors to return.

x: select the X matrix; the default is dr.x(object).
Dimension reduction regression is used to estimate a basis of the central subspace or central mean subspace of a regression. If there are p predictors, the dimension d of the central subspace is less than or equal to p. These two functions, dr.basis and dr.direction, return vectors that describe the central subspace in various ways.

Consider dr.basis first. If you set numdir=3, for example, this method will return a p by 3 matrix whose columns span the estimated three-dimensional central subspace. For all methods except ire, this simply returns the first three columns of object$evectors. For the ire method, this returns the three vectors determined by a three-dimensional solution. Call this matrix B. The basis is determined by back-transforming from centered and scaled predictors to the scale of the original predictors, and then renormalizing the vectors to have length one. These vectors are orthogonal in the inner product determined by Var(X).

The dr.direction method returns XB, spanning the same space but now as a subspace of the original n-dimensional observation space. These vectors are appropriate for plotting.

Both functions return a matrix: for dr.direction, the matrix has n rows and numdir columns, and for dr.basis it has p rows and numdir columns.
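A minimal sketch confirming these shapes (the model is illustrative):

data(ais)
m0 <- dr(LBM ~ Wt + Ht + RCC + WCC, method = "sir", nslices = 8, data = ais)
dim(dr.basis(m0, numdir = 2))       # p x 2: basis in predictor scale
dim(dr.direction(m0, which = 1:2))  # n x 2: directions for plotting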
Sanford Weisberg <[email protected]>
See R. D. Cook (1998). Regression Graphics. New York: Wiley.
data(ais)
# Fit dimension reduction using sir.
m1 <- dr(LBM ~ Wt + Ht + RCC + WCC, method = "sir", nslices = 8, data = ais)
summary(m1)
dr.basis(m1)       # basis vectors in the scale of the predictors
dr.directions(m1)  # directions in the n-dimensional space, for plotting
Approximates marginal dimension test significance levels for sir, save, and phd by sampling from the permutation distribution.
dr.permutation.test(object, npermute = 50, numdir = object$numdir)
object: a dimension reduction regression object created by dr.

npermute: the number of permutations to compute; the default is 50.

numdir: the maximum permitted value of the dimension, with the default taken from the object.
The method approximates significance levels of the marginal dimension tests based on a permutation test. The algorithm: (1) permutes the rows of the predictor but not the response; (2) computes marginal dimension tests for the permuted data; (3) obtains significance levels by comparing the observed statistics to the permutation distribution.

The method is not implemented for ire.
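The following is a conceptual sketch of a single permutation replicate, not the package's internal code; the model is illustrative:

data(ais)
# Permute the rows of the predictors, keep the response fixed, then refit
# and recompute the marginal dimension tests.
X <- with(ais, cbind(Wt, Ht, RCC, WCC))
Xperm <- X[sample(nrow(X)), ]
m.perm <- dr(ais$LBM ~ Xperm, method = "sir", nslices = 8)
dr.test(m.perm, numdir = 3)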
Returns an object of type ‘dr.permutation.test’ that can be printed or summarized to give the summary of the test.
Sanford Weisberg, [email protected]
See www.stat.umn.edu/arc/addons.html, and then select the article on dimension reduction regression or inverse regression.
data(ais)  # the Australian athletes data
# Fit dimension reduction regression using sir.
m1 <- dr(LBM ~ Wt + Ht + RCC + WCC, method = "sir", nslices = 8, data = ais)
summary(m1)
dr.permutation.test(m1, npermute = 100)
plot(m1)
Returns an approximate p.value for a weighted sum of independent Chi-squared(1) random variables.
dr.pvalue(coef, f, chi2approx = c("bx", "wood"), ...)

bentlerxie.pvalue(coef, f)

wood.pvalue(coef, f, tol = 0.0, print = FALSE)
coef: a vector of nonnegative weights.

f: the observed value of the statistic.

chi2approx: which approximation should be used, "bx" for Bentler-Xie or "wood" for Wood's method.

tol: tolerance for Wood's method.

print: printed output for Wood's method.

...: arguments passed from dr.pvalue to the selected approximation function.
For Bentler-Xie, we approximate the statistic f by c * Chi-squared(d), for values of c and d computed by the function. The Wood approximation is more complicated.
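A minimal sketch comparing the two approximations; the weights and statistic below are hypothetical values chosen only for illustration:

coef <- c(2.5, 1.0, 0.4)  # hypothetical nonnegative weights
f <- 6.2                  # hypothetical observed statistic
dr.pvalue(coef, f, chi2approx = "bx")    # Bentler-Xie approximation
dr.pvalue(coef, f, chi2approx = "wood")  # Wood approximation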
Returns a data.frame with four named components:

test: the input argument f.

test.adj: for Bentler-Xie, the adjusted test statistic; not used by Wood's method.

df.adj: for Bentler-Xie, the adjusted degrees of freedom; not used by Wood's method.

pval.adj: the approximate p.value.
Sanford Weisberg <[email protected]>
Peter M. Bentler and Jun Xie (2000), Corrections to test statistics in principal Hessian directions. Statistics and Probability Letters, 47, 381-389.
Wood, Andrew T. A. (1989). An approximation to the distribution of a linear combination of chi-squared variables. Communications in Statistics: Simulation and Computation, 18, 1439-1456.
Divides a vector into slices of approximately equal size.
dr.slices(y, nslices)

dr.slices.arc(y, nslices)
y: a vector of length n, or a matrix with n rows and p columns.

nslices: the number of slices, no larger than n; if y is a matrix, a vector of p slice counts, one for each column.
If y is an n-vector, first order y. The goal for the number of observations per slice is m, the largest integer no bigger than n/nslices. Allocate the first m observations to slice 1. If there are duplicates in y, keep adding observations to the first slice until the next value of y is not equal to the largest value in the first slice. Allocate the next m values to the next slice, and again check for ties. Continue until all values are allocated to a slice. This does not guarantee that nslices will be obtained, nor does it guarantee an equal number of observations per slice. This method of choosing slices is invariant under rescaling, but not under multiplication by -1, so the slices of y will not be the same as the slices of -y. This function was rewritten for version 2.0.4 of this package, and will no longer give exactly the same results as the program Arc. If you want to duplicate Arc, use the function dr.slices.arc, as illustrated in the example below.

If y is a matrix with p columns, slice the first column as described above. Then, within each of the slices determined for the first column, slice based on the second column, so that each of the "cells" has approximately the same number of observations. Continue through all the columns. This method is not invariant under reordering of the columns, or under multiplication by -1.
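A minimal sketch of the returned structure, using a small hypothetical response with ties; the component names follow the Value section below:

y <- c(1, 1, 1, 2, 3, 4, 5, 6, 7, 8)  # hypothetical response with ties
info <- dr.slices(y, nslices = 3)
info$slice.indicator  # slice number assigned to each observation
info$nslices          # number of slices actually produced
info$slice.sizes      # observations per slice; ties can make these unequal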
Returns a named list with three elements as follows:

slice.indicator: a vector of length n giving the slice number of each observation.

nslices: the actual number of slices produced, which may be smaller than the number requested.

slice.sizes: the number of observations in each slice.
Sanford Weisberg, <[email protected]>
R. D. Cook and S. Weisberg (1999), Applied Regression Including Computing and Graphics, New York: Wiley.
data(ais)
summary(s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
                 log(RCC) + log(Hc) + log(Ferr),
                 data = ais, method = "sir", nslices = 8))
# To make this identical to Arc, need to modify slices to match.
summary(s2 <- update(s1, slice.info = dr.slices.arc(ais$LBM, 8)))
This function estimates weights to apply to the rows of a data matrix to make the resulting weighted matrix as close to elliptically symmetric as possible.
dr.weights(formula, data = list(), subset, na.action = na.fail, sigma = 1, nsamples = NULL, ...)
formula: a one-sided or two-sided formula. The right-hand side is used to define the design matrix.

data: an optional data frame.

subset: a list of cases to be used in computing the weights.

na.action: the default is na.fail, to prohibit computations. If set to na.omit, the function will return a list of weights of the wrong length for use with dr.

nsamples: the weights are determined by random sampling from a data-determined normal distribution. This controls the number of samples; the default is 10 times the number of cases.

sigma: scale factor, set to one by default; see the paper by Cook and Nachtsheim (1994) for more information on choosing this parameter.

...: additional arguments passed to the (possibly robust) routine used to estimate the mean m and covariance matrix S; see details.
The basic outline is: (1) estimate a mean m and covariance matrix S using a possibly robust method; (2) for each of nsamples iterations, draw a random vector from N(m, sigma*S) and add 1 to a counter for observation i if the i-th row of the data matrix is closest to the random vector; (3) return as weights the sample fraction allocated to each observation. If you set the argument weights.only to TRUE on the call to dr, then only the list of weights will be returned.
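For example, per the note above, the weights can also be requested directly from dr (a sketch; the model is illustrative):

data(ais)
# Compute the re-weighting directly...
w1 <- dr.weights(~ Ht + Wt + RCC, data = ais)
# ...or ask dr to return only the weights via weights.only.
w2 <- dr(LBM ~ Ht + Wt + RCC, data = ais, weights.only = TRUE)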
Returns a list of weights, some of which may be zero.
Sanford Weisberg, [email protected]
R. D. Cook and C. Nachtsheim (1994), Reweighting to achieve elliptically contoured predictors in regression. Journal of the American Statistical Association, 89, 592–599.
data(ais)
w1 <- dr.weights(~ Ht + Wt + RCC, data = ais)
m1 <- dr(LBM ~ Ht + Wt + RCC, data = ais, weights = w1)
This function implements backward elimination using a dr object for which dr.coordinate.test is defined, currently for SIR, SAVE, IRE and PIRE.
dr.step(object, scope = NULL, d = NULL, minsize = 2, stop = 0, trace = 1, ...)

## S3 method for class 'dr'
drop1(object, scope = NULL, update = TRUE, test = "general", trace = 1, ...)
object: a dr object for which dr.coordinate.test is defined.

scope: a one-sided formula specifying predictors that will never be removed.

d: to use conditional coordinate tests, specify the dimension of the central (mean) subspace. The default is NULL, giving marginal (unconditional) tests.

minsize: minimum subset size, must be greater than or equal to 2.

stop: set the stopping criterion: continue removing variables until the p-value for the next variable to be removed is less than stop. The default is stop = 0.

update: if TRUE, return a dr object with the predictor having the largest p.value removed; if FALSE, return only the tests.

test: the type of test to be used for selecting the next predictor to remove. This is relevant for SAVE objects only, where "general" uses the general test and "normal" assumes normally distributed predictors; see dr.coordinate.test.

trace: if positive, print informative output at each step, the default. If trace is 0 or FALSE, suppress all printing.

...: additional arguments passed to dr.coordinate.test.
Suppose a dr object has p predictors, with q predictors specified in the scope statement. drop1 will compute either marginal coordinate tests (if d=NULL) or conditional marginal coordinate tests (if d is a positive integer) for dropping each of the p - q predictors not in the scope, and return p.values. The result is an object created from the original object with the predictor with the largest p.value removed. dr.step will call drop1.dr repeatedly, removing one predictor at a time, until the stopping criterion is met, until only the predictors in the scope remain, or until only minsize predictors remain.

As a side effect, a data frame of labels, tests, df, and p.values is printed. If update=TRUE, a dr object is returned with the predictor with the largest p.value removed.
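A small sketch of the stopping rule (the threshold 0.10 and the model are illustrative):

data(ais)
s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) + log(RCC) +
         log(Hc) + log(Ferr), data = ais, method = "sir", nslices = 8)
# Remove predictors one at a time, stopping once the p.value for the next
# predictor to be removed falls below 0.10.
s.final <- dr.step(s1, stop = 0.10)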
Sanford Weisberg, <[email protected]>, based on the
drop1
generic function in the
base R. The dr.step
function is also similar to step
in
base R.
Cook, R. D. (2004). Testing predictor contributions in sufficient dimension reduction. Annals of Statistics, 32, 1062-1092.
Shao, Y., Cook, R. D. and Weisberg, S. (2007). Marginal tests with sliced average variance estimation. Biometrika.
data(ais)
# To make this identical to Arc, need to modify the slices to match by
# using slice.function=dr.slices.arc rather than the default slicing.
summary(s1 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) +
                 log(RCC) + log(Hc) + log(Ferr),
                 data = ais, method = "sir",
                 slice.function = dr.slices.arc, nslices = 8))
# The following will almost duplicate information in Table 5 of Cook (2004).
# Slight differences occur because a different approximation for the
# sum of independent chi-squared(1) random variables is used.
ans1 <- drop1(s1)
ans2 <- drop1(s1, d = 2)
ans3 <- drop1(s1, d = 3)
# Remove predictors stepwise until we run out of variables to drop.
dr.step(s1, scope = ~ log(Wt) + log(Ht))
Data were furnished by Mike Camden, Wellington Polytechnic, Wellington, New Zealand. Horse mussels (Atrina) were sampled from the Marlborough Sounds. The response is the mussels' muscle mass.
This data frame contains the following columns:
H: shell height in mm
L: shell length in mm
M: muscle mass in g
S: shell mass in g
W: shell width in mm
R. D. Cook and S. Weisberg (1999). Applied Regression Including Computing and Graphics. New York: Wiley.
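Usage, assuming the data set is named mussels, matching the naming pattern of the other data sets in this package:

data(mussels)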
Plots selected direction vectors determined by a dimension reduction regression fit.
By default, the pairs function is used for plotting, but the user can substitute any other appropriate graphics function.
## S3 method for class 'dr'
plot(x, which = 1:x$numdir, mark.by.y = FALSE, plot.method = pairs, ...)
x: the name of an object of class dr, a dimension reduction regression object.

which: selects the directions to be plotted.

mark.by.y: if TRUE, color points according to the value of the response; otherwise, do not color points but include the response as a variable in the plot.

plot.method: the name of a function to use for the plotting; the default is pairs.

...: arguments passed to the plot.method.
Returns a graph.
Sanford Weisberg, <[email protected]>.
data(ais)
# The default fitting method is "sir".
s0 <- dr(LBM ~ log(SSF) + log(Wt) + log(Hg) + log(Ht) + log(WCC) + log(RCC) +
         log(Hc) + log(Ferr), data = ais)
plot(s0)
plot(s0, mark.by.y = TRUE)