Chapter 10
Multicollinearity: What Happens if Explanatory Variables Are Correlated?
One of the CLRM assumptions is that there is no perfect multicollinearity, that is, no exact linear relationships among the explanatory variables, the Xs, in a multiple regression.
In practice, one rarely encounters perfect multicollinearity, but cases of near or very high multicollinearity, where the explanatory variables are approximately linearly related, arise frequently in applications.
The objectives of this chapter:
● The nature of multicollinearity;
● Is multicollinearity really a problem?
● The theoretical consequences of multicollinearity;
● How to detect multicollinearity;
● The remedial measures that can be used to eliminate multicollinearity.
10.1 The Nature of Multicollinearity: The Case of Perfect Multicollinearity
In the case of a perfect linear relationship, or perfect multicollinearity, among the explanatory variables, we cannot obtain unique estimates of all the parameters. And since we cannot obtain their unique estimates, we cannot draw any statistical inferences (i.e., hypothesis tests) about them from a given sample.
Yi = A1 + A2 X2i + A3 X3i + ui
Suppose X3 is an exact linear function of X2:  X3i = 300 - 2 X2i
Substituting this into the model:
Yi = A1 + A2 X2i + A3 (300 - 2 X2i) + ui
   = (A1 + 300 A3) + (A2 - 2 A3) X2i + ui
   = C1 + C2 X2i + ui
Estimating this equation by OLS gives estimators of
C1 = A1 + 300 A3 and C2 = A2 - 2 A3.
So from the estimators of C1 and C2 we cannot recover the estimators of A1, A2, and A3 individually.
That is, in the case of perfect multicollinearity, estimation and hypothesis testing about the individual regression coefficients in a multiple regression are not possible. We can obtain estimates only of a linear combination of the original coefficients, not of each of them individually.
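A minimal numerical sketch of this point in Python/NumPy (the data are simulated and the coefficient values illustrative; only the relation X3 = 300 - 2 X2 comes from the example above): the design matrix is rank-deficient, so OLS can identify only the composite coefficients C1 and C2.

import numpy as np

rng = np.random.default_rng(0)
n = 50
X2 = rng.uniform(10, 100, n)
X3 = 300 - 2 * X2                      # exact linear dependence on X2
y = 5 + 1.5 * X2 + 0.8 * X3 + rng.normal(0, 2, n)   # hypothetical A1, A2, A3

X = np.column_stack([np.ones(n), X2, X3])
print("rank of X:", np.linalg.matrix_rank(X))   # 2, not 3 -> X'X is singular

# Only the composite coefficients C1 = A1 + 300*A3 and C2 = A2 - 2*A3
# are identified, via the regression of Y on X2 alone:
Xc = np.column_stack([np.ones(n), X2])
c, *_ = np.linalg.lstsq(Xc, y, rcond=None)
print("C1, C2 estimates:", c)          # close to 5 + 300*0.8 = 245 and 1.5 - 2*0.8 = -0.1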
10.2 The Case of Near, or Imperfect, or High Multicollinearity
When we talk about multicollinearity, we usually refer to imperfect multicollinearity, in which one explanatory variable is approximately, but not exactly, a linear function of the others, for example:
X3i = B1 + B2 X2i + ei
where ei is a stochastic error term.
If there are just two explanatory variables, the coefficient of correlation r can be used as a measure of the degree or strength of collinearity. But if more than two explanatory variables are involved, as we will show later, the coefficient of correlation may not be an adequate measure of collinearity.
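As a small sketch of the two-variable case (Python/NumPy; the data are simulated), the sample correlation between the two explanatory variables can be read directly off the correlation matrix:

import numpy as np

rng = np.random.default_rng(1)
n = 100
X2 = rng.normal(50, 10, n)
X3 = 2 * X2 + rng.normal(0, 5, n)      # X3 closely, but not exactly, related to X2

r = np.corrcoef(X2, X3)[0, 1]
print(f"correlation between X2 and X3: r = {r:.3f}")   # close to 1 -> high collinearity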
10.3 Theoretical Consequences of Multicollinearity
Note: we consider only the case of imperfect multicollinearity.
When collinearity is not perfect, the OLS estimators still remain BLUE, even though one or more of the partial regression coefficients in a multiple regression may be individually statistically insignificant.
1. OLS estimators are unbiased. But unbiasedness is a repeated-sampling property; in reality, we rarely have the luxury of replicating samples.
2. OLS estimators have minimum variance. This does not mean, however, that the variance of an OLS estimator will be small in any given sample: minimum variance does not mean that every numerical value of the variance will be small.
3. Multicollinearity is essentially a sample (regression) phenomenon.
10.4 Practical Consequences of
Multicollinearity
● Large variances and standard errors of the OLS estimators, which lead to a fall in the precision of the OLS estimators.
● Wider confidence intervals.
● "Insignificant" or small t ratios, which make it easier to accept the null hypothesis.
● A high R2 value and a high F ratio, but few significant t ratios.
● The OLS estimators and their standard errors become very sensitive to small changes in the data; that is, they tend to be unstable. This is because small changes in the variables can change the degree of collinearity between them.
● Wrong signs for regression coefficients.
● It is difficult to assess the individual contributions of the explanatory variables to the explained sum of squares (ESS) or R2, because the explanatory variables are so highly collinear that when one moves, the other moves with it almost automatically.
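The sketch below (Python/NumPy, simulated data; all coefficient and noise values are illustrative) reproduces the classic symptom pattern: with highly collinear regressors the overall fit is good, but the individual standard errors are large and the t ratios small.

import numpy as np

def ols(X, y):
    """OLS coefficients, standard errors, and R^2 for y on X (X includes a constant)."""
    n, k = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return b, se, r2

rng = np.random.default_rng(2)
n = 40
X2 = rng.normal(0, 1, n)
X3 = X2 + rng.normal(0, 0.05, n)        # nearly collinear with X2
y = 1 + 2 * X2 + 3 * X3 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), X2, X3])
b, se, r2 = ols(X, y)
print("R^2:", round(r2, 3))              # high overall fit
print("t ratios:", np.round(b / se, 2))  # t ratios on X2 and X3 are typically small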
10.5 Detection of Multicollinearity
Note:
1. Multicollinearity is a question of degree and not of kind.
2. Since multicollinearity refers to the condition of the explanatory variables, which are assumed to be nonstochastic, it is a feature of the sample and not of the population.
Rules of thumb for detecting multicollinearity:
1. High R2 and a high F ratio, but few significant t ratios.
2. High pairwise correlations among the explanatory variables. (This criterion is often not reliable.)
3. Examination of partial correlations via the partial correlation coefficients. (In the context of several explanatory variables, relying on partial correlations can be misleading.)
4. Subsidiary, or auxiliary, regressions: regress each X variable on the remaining X variables and compute the corresponding R2. Each of these regressions is called a subsidiary, or auxiliary, regression, auxiliary to the main regression of Y on all the Xs.
Consider a regression model with six explanatory variables: Y ~ X2, X3, X4, X5, X6, X7.
① Regress each of X2, X3, X4, X5, X6, X7 on the remaining Xs and obtain the corresponding coefficients of determination, say R²_2, R²_3, R²_4, ..., R²_7.
② Test whether a particular coefficient of determination is statistically equal to zero. For X2, for example:
H0: R²_2 = 0, i.e., X2 is not collinear with the remaining five Xs.
The test uses the F statistic of (7.50):
F = [R² / (k - 1)] / [(1 - R²) / (n - k)]
where R² is the coefficient of determination of the auxiliary regression, k is the number of parameters estimated in that regression (including the intercept), and n is the sample size.
If F > the critical F value, reject H0; if F < the critical F value, do not reject H0.
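A sketch of this auxiliary-regression check in Python/NumPy (the data are simulated and the variable names hypothetical; only three Xs are used to keep it short): regress X2 on the remaining Xs, compute R²_2, and form the F statistic described above.

import numpy as np
from scipy.stats import f as f_dist

def r_squared(y, X):
    """R^2 from regressing y on X (X includes a constant column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(3)
n = 60
X3 = rng.normal(size=n)
X4 = rng.normal(size=n)
X2 = 0.9 * X3 + 0.5 * X4 + rng.normal(0, 0.2, n)   # X2 nearly a linear combination of X3, X4

# Auxiliary regression: X2 on a constant and the remaining Xs
Z = np.column_stack([np.ones(n), X3, X4])
R2_aux = r_squared(X2, Z)

k = Z.shape[1]                       # parameters in the auxiliary regression (incl. intercept)
F = (R2_aux / (k - 1)) / ((1 - R2_aux) / (n - k))
p_value = f_dist.sf(F, k - 1, n - k)
print(f"auxiliary R^2 = {R2_aux:.3f}, F = {F:.1f}, p = {p_value:.4f}")
# A large F (small p) rejects H0: R^2 = 0, i.e., X2 is collinear with the other Xs.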
5. The variance inflation factor (VIF).
For the model Yi = B1 + B2 X2i + B3 X3i + ui with two explanatory variables, the variances of the OLS estimators can be written as
var(b2) = σ² / [Σx²_2i (1 - r²_23)] = (σ² / Σx²_2i) · VIF        (10.12)
var(b3) = σ² / [Σx²_3i (1 - r²_23)] = (σ² / Σx²_3i) · VIF        (10.13)
where r²_23 is the squared coefficient of correlation between X2 and X3, the lowercase x's are deviations from their sample means, and
VIF = 1 / (1 - r²_23)                                            (10.14)
① As r²_23 increases, the VIF increases, and hence the variances, and therefore the standard errors, of both b2 and b3 increase, or inflate.
② But the variance of b2 depends not only on the VIF; it also depends on the variance of ui, σ², as well as on the variation in X2, Σx²_2i. A high r²_23 can be counterbalanced by a low σ² or a high Σx²_2i, or both. So a high R²_i (and hence a high VIF) is neither necessary nor sufficient for high standard errors; multicollinearity by itself need not cause high standard errors.
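A sketch computing VIFs directly from the auxiliary R² of each regressor (Python/NumPy, continuing with simulated data; statsmodels also offers a variance_inflation_factor helper, but the plain formula VIF_j = 1/(1 - R²_j) is used here):

import numpy as np

def vif(X):
    """VIF for each column of X, computed as 1/(1 - R^2) of the auxiliary regressions.
    X is the matrix of explanatory variables (no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ b
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(4)
n = 80
X3 = rng.normal(size=n)
X4 = rng.normal(size=n)
X2 = 0.9 * X3 + 0.5 * X4 + rng.normal(0, 0.2, n)

print(np.round(vif(np.column_stack([X2, X3, X4])), 1))
# A common (rough) rule of thumb treats a VIF above about 10 as a sign of serious collinearity.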
10.6 Is Multicollinearity Necessarily
Bad?
① If the goal of the study is to use the model to predict or forecast the future mean value of the dependent variable, collinearity may not be bad. If the same collinear relationship is expected to continue into the future, the model can be used for forecasting purposes.
② If the objective of the study is not only prediction but also reliable estimation of the individual parameters of the chosen model, then serious collinearity may be "bad," because we have seen that it leads to large standard errors of the estimators.
③ If the objective of the study is to estimate a group of coefficients fairly accurately, e.g., the sum or difference of two coefficients, this can be done even in the presence of multicollinearity (see the sketch below).
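A small simulation sketch of point ③ (Python/NumPy; all numbers are illustrative): even when b2 individually has a large sampling variance, the sum b2 + b3 can be estimated quite precisely.

import numpy as np

rng = np.random.default_rng(5)
n, reps = 40, 2000
sums, b2s = [], []
for _ in range(reps):
    X2 = rng.normal(0, 1, n)
    X3 = X2 + rng.normal(0, 0.05, n)       # highly collinear pair
    y = 1 + 2 * X2 + 3 * X3 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), X2, X3])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    b2s.append(b[1])
    sums.append(b[1] + b[2])

print("std of b2 across samples:       ", round(np.std(b2s), 2))   # large
print("std of (b2 + b3) across samples:", round(np.std(sums), 2))  # much smaller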
10.8 What to Do about Multicollinearity: Remedial Measures
1. Dropping a variable (or variables) from the model.
If the model is based on theoretical considerations, dropping variables from it leads to what is known as model specification error: the estimated parameters of the reduced model may turn out to be biased.
Thus, in reducing the severity of the collinearity problem, we may be obtaining biased estimates of the coefficients retained in the model, as the sketch below illustrates. So the best practical advice is not to drop a variable from an economically viable model just because the collinearity problem is serious.
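A sketch of this specification-bias risk (Python/NumPy, simulated data with illustrative coefficients): dropping X3 "cures" the collinearity but biases the coefficient on X2, because X2 picks up part of X3's effect.

import numpy as np

rng = np.random.default_rng(6)
n, reps = 100, 2000
full, reduced = [], []
for _ in range(reps):
    X2 = rng.normal(0, 1, n)
    X3 = 0.8 * X2 + rng.normal(0, 0.3, n)     # collinear with X2
    y = 1 + 2 * X2 + 3 * X3 + rng.normal(0, 1, n)

    Xf = np.column_stack([np.ones(n), X2, X3])
    full.append(np.linalg.solve(Xf.T @ Xf, Xf.T @ y)[1])

    Xr = np.column_stack([np.ones(n), X2])     # X3 dropped
    reduced.append(np.linalg.solve(Xr.T @ Xr, Xr.T @ y)[1])

print("mean b2, full model: ", round(np.mean(full), 2))     # close to the true 2
print("mean b2, X3 dropped: ", round(np.mean(reduced), 2))  # biased, roughly 2 + 3*0.8 = 4.4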
2. (1) Acquiring additional data or a new sample (since multicollinearity is a sample feature). But obtaining another sample can be costly.
(2) Sometimes simply acquiring additional data, i.e., increasing the sample size, can reduce the severity of the collinearity problem. Recall from (10.13) that
var(b3) = σ² / [Σx²_3i (1 - r²_23)]
For a given σ² and r²_23, if the sample size increases, Σx²_3i will generally increase, so var(b3) will decrease.
But getting additional data on variables already in the sample may not be feasible because of cost and other considerations.
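A quick simulation sketch of this point (Python/NumPy; the coefficient values and collinearity pattern are illustrative): holding σ² and the degree of collinearity fixed, a larger sample increases Σx²_3i and shrinks the sampling variability of b3.

import numpy as np

rng = np.random.default_rng(7)

def sampling_std_b3(n, reps=2000):
    b3s = []
    for _ in range(reps):
        X2 = rng.normal(0, 1, n)
        X3 = 0.9 * X2 + rng.normal(0, 0.3, n)   # same collinearity pattern at every n
        y = 1 + 2 * X2 + 3 * X3 + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), X2, X3])
        b3s.append(np.linalg.solve(X.T @ X, X.T @ y)[2])
    return np.std(b3s)

for n in (30, 120, 480):
    print(n, round(sampling_std_b3(n), 3))      # the spread of b3 falls as n grows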
3. Rethinking the model.
Model misspecification might be the cause of collinearity in the (wrongly) fitted model. Maybe some important variables are omitted, or maybe the functional form of the model is incorrectly chosen.
4. Prior information about some parameters.
(1) Using prior information about parameters in the model can resolve collinearity, but such extraneous, or prior, information is not always available.
(2) If you can obtain prior information, you also have to assume that it continues to hold in the sample under study (a hypothetical sketch of this approach follows).
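A minimal sketch of using prior information (Python/NumPy; the restriction A3 = 0.1 · A2, the income/wealth labels, and all numbers are purely hypothetical): substituting the prior restriction into the model collapses the two collinear regressors into one combined regressor, which can then be estimated without difficulty.

import numpy as np

rng = np.random.default_rng(8)
n = 60
X2 = rng.normal(100, 10, n)                 # e.g., income
X3 = 10 * X2 + rng.normal(0, 5, n)          # e.g., wealth, highly collinear with income
y = 20 + 0.6 * X2 + 0.06 * X3 + rng.normal(0, 2, n)

# Suppose prior information says A3 = 0.1 * A2.  Then
#   y = A1 + A2*X2 + 0.1*A2*X3 + u = A1 + A2*(X2 + 0.1*X3) + u
W = X2 + 0.1 * X3                            # single combined regressor
Z = np.column_stack([np.ones(n), W])
a1, a2 = np.linalg.lstsq(Z, y, rcond=None)[0]
a3 = 0.1 * a2                                # recovered from the prior restriction
print(round(a1, 2), round(a2, 3), round(a3, 3))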
5. Transformation of the variables.
6. Other remedies.