Chapter 13
Model Selection: Criteria and Tests
One CLRM assumption is:
The model used in empirical analysis is “correctly specified.”
“Correct specification” of a model means:
No theoretically relevant variable has been excluded from the model.
No unnecessary or irrelevant variables are included in the model.
The functional form of the model is correct.
13.1 The Attributes of a Good Model
——Criteria to judge a model:
1. Principle of parsimony
A model should be kept as simple as possible.
2. Identifiability
For a given set of data, the estimated parameters must have unique values.
3. Goodness of fit
A model is judged good if its adjusted R² ($\bar{R}^2$) is high.
4. Theoretical consistency
In constructing a model, we should have some theoretical underpinning.
5. Predictive power
Choose the model whose theoretical predictions are borne out by actual experience.
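As a rough illustration of the goodness-of-fit criterion, the sketch below computes the adjusted R² from an ordinary R². It assumes the usual textbook definition, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k), with k counting the intercept; the function name and the numbers are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k estimated parameters
    (intercept included). Assumes the standard textbook definition."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)

# Hypothetical numbers: two models with the same R^2 but different sizes.
small = adjusted_r_squared(0.90, n=30, k=3)
large = adjusted_r_squared(0.90, n=30, k=8)
print(round(small, 4), round(large, 4))
```

With the same R², the model with more parameters gets the lower adjusted R², which is how this measure rewards parsimony.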
13.2 Types of Specification Errors
1. Omitting a Relevant Variable: “Underfitting” or “Underspecifying” a Model
True model: Yt = B1 + B2X2t + B3X3t + μt (13.1)
Misspecified model: Yt = A1 + A2X2t + μt (13.2)
The consequences of omitting a relevant variable (X3):
(1) If X2 and X3 are correlated:
◎ a1 and a2 are biased; the bias can be upward or downward:
E(a1) ≠ B1: $E(a_1) = B_1 + B_3(\bar{X}_3 - b_{32}\bar{X}_2)$ (13.4)
E(a2) ≠ B2: $E(a_2) = B_2 + B_3 b_{32}$
where b32 is the slope coefficient in the regression of X3 on X2.
◎ a1 and a2 are inconsistent.
(2) If X2 and X3 are not correlated:
a2 is unbiased and consistent, because b32 is zero.
a1 is still biased unless $\bar{X}_3$ is zero in (13.4).
(3) The error variance estimated from the misspecified model is a biased estimator of the true error variance σ².
The conventionally estimated variance of a2 is a biased estimator of the variance of the true estimator b2:
∵ $E[\mathrm{var}(a_2)] = \mathrm{var}(b_2) + \dfrac{B_3^2 \sum x_{3i}^2}{(n-2)\sum x_{2i}^2}$
where lowercase x denotes deviations from the sample mean.
∴ On average, var(a2) overestimates the true variance of b2; that is, it has a positive bias.
(4) The usual confidence-interval and hypothesis-testing procedures are unreliable.
The confidence interval will be wider.
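The bias formulas have an exact in-sample counterpart: the OLS estimates satisfy a2 = b2 + b3·b32 and a1 = b1 + b3(X̄3 − b32·X̄2) in any data set. The sketch below checks this on simulated data. The parameter values, the sample size, and the helper names `dev` and `sxy` are assumptions made only for illustration.

```python
import random

def dev(v):
    """Deviations of a list from its mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def sxy(a, b):
    """Sum of cross products of two deviation vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
n = 200
B1, B2, B3 = 1.0, 2.0, 3.0                        # hypothetical true parameters
X2 = [random.gauss(0, 1) for _ in range(n)]
X3 = [0.5 * x + random.gauss(0, 1) for x in X2]   # X3 is correlated with X2
Y = [B1 + B2 * u + B3 * w + random.gauss(0, 1) for u, w in zip(X2, X3)]

y, x2, x3 = dev(Y), dev(X2), dev(X3)
mY, m2, m3 = sum(Y) / n, sum(X2) / n, sum(X3) / n

# Misspecified model (13.2): regress Y on X2 only.
a2 = sxy(y, x2) / sxy(x2, x2)
a1 = mY - a2 * m2

# True model (13.1): two-regressor OLS formulas in deviation form.
den = sxy(x2, x2) * sxy(x3, x3) - sxy(x2, x3) ** 2
b2 = (sxy(y, x2) * sxy(x3, x3) - sxy(y, x3) * sxy(x2, x3)) / den
b3 = (sxy(y, x3) * sxy(x2, x2) - sxy(y, x2) * sxy(x2, x3)) / den
b1 = mY - b2 * m2 - b3 * m3

# Auxiliary regression of X3 on X2 gives the slope b32.
b32 = sxy(x3, x2) / sxy(x2, x2)

print(a2, b2 + b3 * b32)                  # the two numbers coincide
print(a1, b1 + b3 * (m3 - b32 * m2))      # so do these
```

Because B3 and the X2-X3 correlation are both positive in this setup, a2 overstates the effect of X2. Flipping either sign would produce a downward bias instead.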
2. Inclusion of Irrelevant Variables: “Overfitting” a Model
Including irrelevant variables never lowers R² and usually raises it, which might seem to increase the predictive power of the model.
True model: Yi = B1 + B2X2i + μi (13.9)
Misspecified model: Yi = A1 + A2X2i + A3X3i + vi (13.10)
Consequences of including irrelevant variables in a model:
(1) The OLS estimators are unbiased and consistent:
E(a1) = B1
E(a2) = B2
E(a3) = 0
(2) The error variance σ² is correctly estimated.
(3) The standard confidence-interval and hypothesis-testing procedures based on the t and F tests remain valid.
(4) The estimators a1, a2, and a3 are inefficient:
The variances of the a's are generally larger than those of the b's estimated from the true model.
The confidence intervals based on the standard errors of the a's are wider than those based on the standard errors of the b's of the true model.
∴ The OLS estimators are linear unbiased estimators (LUE) but not best linear unbiased estimators (BLUE).
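A small Monte Carlo experiment can make these consequences concrete. The sketch below holds X2 and an irrelevant but correlated X3 fixed, redraws the error term many times, and compares the X2 slope from the true model (13.9) with the one from the overfitted model (13.10). All parameter values, the correlation of 0.8, and the number of replications are assumptions chosen for illustration.

```python
import random
import statistics

def dev(v):
    """Deviations of a list from its mean."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def sxy(a, b):
    """Sum of cross products of two deviation vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(1)
n, reps = 50, 2000
B1, B2 = 1.0, 2.0                       # true model (13.9); X3 plays no role
X2 = [random.gauss(0, 1) for _ in range(n)]
X3 = [0.8 * x + 0.6 * random.gauss(0, 1) for x in X2]   # correlated with X2
x2, x3 = dev(X2), dev(X3)
s22, s33, s23 = sxy(x2, x2), sxy(x3, x3), sxy(x2, x3)
den = s22 * s33 - s23 ** 2

b2_draws, a2_draws, a3_draws = [], [], []
for _ in range(reps):
    Y = [B1 + B2 * x + random.gauss(0, 1) for x in X2]
    y = dev(Y)
    sy2, sy3 = sxy(y, x2), sxy(y, x3)
    b2_draws.append(sy2 / s22)                       # true model: slope on X2
    a2_draws.append((sy2 * s33 - sy3 * s23) / den)   # overfitted: slope on X2
    a3_draws.append((sy3 * s22 - sy2 * s23) / den)   # overfitted: slope on X3

mean_a2 = statistics.mean(a2_draws)     # close to B2: unbiased
mean_a3 = statistics.mean(a3_draws)     # close to 0
var_b2 = statistics.variance(b2_draws)
var_a2 = statistics.variance(a2_draws)  # larger than var_b2: inefficient
print(mean_a2, mean_a3, var_b2, var_a2)
```

The stronger the correlation between X2 and the irrelevant X3, the larger the loss of precision in a2.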
Compare the two types of specification errors:
(1) Excluding a relevant variable (the case of underfitting)
☆ The coefficients of the included variables are generally biased as well as inconsistent.
☆ The error variance is incorrectly estimated.
☆ The standard errors of the estimators are biased.
☆ The usual hypothesis-testing procedure becomes invalid.
(2) Including an irrelevant variable in the model (the case of overfitting)
☆ The estimators are unbiased as well as consistent estimators of the coefficients of the true model.
☆ The error variance is correctly estimated.
☆ The usual hypothesis-testing procedure is still valid.
But the coefficient estimates are less precise: their standard errors are larger and the confidence intervals tend to be wider.
Conclusion:
It is better to include irrelevant variables than to exclude relevant ones, but overfitting still costs precision. The best approach is therefore to include only explanatory variables that, on theoretical grounds, directly influence the dependent variable and are not accounted for by other included variables.
3. Incorrect Functional Form
The estimated coefficients may be biased.
13.4 Detecting Specification Errors: Tests of Specification Errors
(1) Test whether the control variable X4i is an irrelevant variable: t test
Possibly misspecified model:
Yi = B1 + B2X2i + B3X3i + B4X4i + μi (13.14)
① Include the control variable (X4i) in the model.
② Estimate the regression and test the significance of b4, the estimator of B4.
H0: B4 = 0 (t test)
If the control variable is not statistically significant and dropping it does not substantially alter our point estimates or hypothesis-test results, then dropping it may clarify the model.
(2) Test whether both X3 and X4 are irrelevant variables.
H0: B3 = B4 = 0 (F test)
But it is very important to remember that in carrying out these specification tests we have a specific model in mind, which we accept as the “true” model.
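The joint test of H0: B3 = B4 = 0 can be sketched with the standard restricted versus unrestricted form of the F statistic, F = [(RSS_r − RSS_ur)/m] / [RSS_ur/(n − k)]. That form is assumed here rather than taken from the slides. The helper `ols_rss` and the simulated data are hypothetical, and the resulting F would still have to be compared with a critical value from an F table with (m, n − k) degrees of freedom.

```python
import random

def ols_rss(columns, y):
    """Fit y on an intercept plus the given regressor columns by solving the
    normal equations with Gaussian elimination; return the residual sum of squares."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    for p in range(k):                      # forward elimination with pivoting
        piv = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[piv] = A[piv], A[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k + 1):
                A[r][c] -= f * A[p][c]
    beta = [0.0] * k
    for r in reversed(range(k)):            # back substitution
        s = A[r][k] - sum(A[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = s / A[r][r]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

random.seed(2)
n = 100
X2 = [random.gauss(0, 1) for _ in range(n)]
X3 = [random.gauss(0, 1) for _ in range(n)]
X4 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical data-generating process in which X3 and X4 are truly irrelevant.
Y = [1.0 + 2.0 * x + random.gauss(0, 1) for x in X2]

rss_ur = ols_rss([X2, X3, X4], Y)   # unrestricted model (13.14)
rss_r = ols_rss([X2], Y)            # restricted model under H0: B3 = B4 = 0
m, k = 2, 4                         # number of restrictions; parameters in (13.14)
F = ((rss_r - rss_ur) / m) / (rss_ur / (n - k))
print(F)
```

A small F means that dropping X3 and X4 costs little in fit, which is consistent with H0.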
2. Tests for Omitted Variables and Incorrect Functional Forms
(1) Based on theory or introspection and prior empirical work, develop a model that we believe captures the essence of the subject under study.
(2) Subject the model to empirical testing:
◎ R² and adjusted R² ($\bar{R}^2$)
◎ The estimated t ratios
◎ Signs of the estimated coefficients in relation to prior expectations
◎ Durbin-Watson d or the runs statistic
◎ Forecasting or prediction error
If all these diagnostics are good, we accept this model.
Methods to determine the specific problem of the model:
① Examination of residuals
Used to detect autocorrelation, heteroscedasticity, or model specification errors.
Plot the residuals from the misspecified model against time (et against t).
Plot the residuals from the true model against time (et against t).
Check whether the residuals are randomly distributed; a systematic pattern points to a problem with the model.
② The Durbin-Watson d statistic
Used to detect autocorrelation or specification errors.
Detecting autocorrelation (time-series data): if the d statistic is significant, autocorrelation is present.
Detecting specification errors: respecify the original model. If a significant d statistic (original model) becomes insignificant (transformed model), this usually indicates a specification error in the original model.
③ Other specification error tests
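For reference, the Durbin-Watson d statistic can be computed directly from the residuals as d = Σ(e_t − e_{t−1})² / Σe_t². The sketch below assumes this standard definition. The two residual series are made up for illustration and do not come from any fitted model.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series.
trending = [1.0, 0.8, 0.6, 0.4, -0.4, -0.6, -0.8, -1.0]      # long runs of one sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every period

d_pos = durbin_watson(trending)
d_neg = durbin_watson(alternating)
print(d_pos, d_neg)
```

Values of d near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. Whether a given d is significant still has to be judged against the tabulated Durbin-Watson bounds.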
13.4 Model Selection Criteria for
Forecasting Purposes
——Judge the out-of-sample performance of a
regression model.
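AIC = $e^{2k/n} \dfrac{\sum e_t^2}{n}$ (13.18)
SIC = $n^{k/n} \dfrac{\sum e_t^2}{n}$ (13.19)
where k is the number of estimated parameters (including the intercept), n is the number of observations, and $\sum e_t^2$ is the residual sum of squares; the e in $e^{2k/n}$ is the base of the natural logarithm.
Both values are estimates of the out-of-sample forecast error variance: the lower the value, the better the forecasting performance of the model.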
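The two criteria are straightforward to compute once the residual sum of squares is known. The sketch below assumes the multiplicative forms AIC = e^(2k/n)·RSS/n and SIC = n^(k/n)·RSS/n, with k counting the intercept. The two candidate models and their RSS values are hypothetical.

```python
import math

def aic(rss, n, k):
    """Akaike information criterion in the form e^(2k/n) * RSS / n."""
    return math.exp(2 * k / n) * rss / n

def sic(rss, n, k):
    """Schwarz information criterion in the form n^(k/n) * RSS / n."""
    return n ** (k / n) * rss / n

# Hypothetical comparison: the larger model fits slightly better in sample.
n = 50
aic_small, sic_small = aic(120.0, n, k=2), sic(120.0, n, k=2)
aic_large, sic_large = aic(115.0, n, k=5), sic(115.0, n, k=5)
print(aic_small, aic_large)
print(sic_small, sic_large)
```

In this made-up case the three extra parameters lower the RSS only slightly, so both criteria favor the smaller model. SIC penalizes additional parameters more heavily than AIC whenever n exceeds e², about 7.4.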