Ch. 7 Violations of the Ideal Conditions
1 Specification
1.1 Selection of Variables
Consider an initial model, which we assume to be
$$Y = X_1\beta_1 + \varepsilon.$$
It is not unusual to begin with some such formulation and then contemplate adding more variables (regressors) to the model:
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon.$$
Let $R_1^2$ be the $R^2$ of the model with fewer regressors, and $R_{12}^2$ be the $R^2$ of the model with more regressors. As we have shown earlier, $R_{12}^2 \ge R_1^2$, so it would be possible to push $R^2$ as high as desired simply by adding regressors. This problem motivates the use of the adjusted $R^2$,
$$\bar{R}^2 = 1 - \frac{T-1}{T-k}\,(1 - R^2).$$
It has been suggested that the adjusted $R^2$ does not penalize the loss of degrees of freedom heavily enough. Two alternatives that have been proposed for comparing models are
$$\tilde{R}_j^2 = \frac{T+k_j}{T-k_j}\,(1 - R_j^2)$$
(to be minimized across models) and Akaike's information criterion,
$$AIC_j = \ln\frac{e_j'e_j}{T} + \frac{2k_j}{T} = \ln\hat{\sigma}_j^2 + \frac{2k_j}{T}.$$
Although intuitively appealing, these measures are a bit unorthodox in that they have no firm basis in theory (unless they are used in a time-series modelling context). Perhaps a somewhat more palatable alternative is the method of stepwise regression; however, economists have tended to avoid stepwise regression because it breaks down the usual inference procedures.
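As a quick numerical illustration (not part of the original notes), the following Python sketch computes $R^2$, $\bar{R}^2$, $\tilde{R}_j^2$, and $AIC_j$ as defined above for two nested specifications; the data-generating values are made-up assumptions, and only numpy is assumed.

```python
import numpy as np

def fit_criteria(y, X):
    """Return R^2, adjusted R^2, R-tilde^2, and AIC for an OLS fit of y on X."""
    T, k = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    sse = e @ e
    sst = ((y - y.mean()) ** 2).sum()
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (T - 1) / (T - k) * (1.0 - r2)
    r2_tilde = (T + k) / (T - k) * (1.0 - r2)   # smaller is better
    aic = np.log(sse / T) + 2.0 * k / T          # smaller is better
    return r2, adj_r2, r2_tilde, aic

rng = np.random.default_rng(0)
T = 100
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)                  # an irrelevant regressor
y = 1.0 + 2.0 * x1 + rng.normal(size=T)

X_small = np.column_stack([np.ones(T), x1])
X_big = np.column_stack([np.ones(T), x1, x2])
print(fit_criteria(y, X_small))
print(fit_criteria(y, X_big))  # R^2 always rises; AIC typically worsens here
```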
1.2 Omission of Relevant Variables
Suppose that the correctly specified regression model is
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$
where the two parts of $X$ have $k_1$ and $k_2$ columns, respectively. If we regress $Y$ on $X_1$ without including $X_2$, that is, if we estimate the model
$$Y = X_1\beta_1 + \varepsilon,$$
we obtain the estimator
$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'Y = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2 + \varepsilon) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2 + (X_1'X_1)^{-1}X_1'\varepsilon.$$
Taking the expectation, we see that unless $X_1'X_2 = 0$ or $\beta_2 = 0$, $\hat{\beta}_1$ is biased:
$$E(\hat{\beta}_1) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2.$$
The variance of $\hat{\beta}_1$ is
$$Var(\hat{\beta}_1) = \sigma^2(X_1'X_1)^{-1}.$$
If we had computed the correct regression, including $X_2$, then the slope estimator on $X_1$, denoted $\hat{\beta}_{12}$, would have a covariance matrix equal to the upper left block of $\sigma^2(X'X)^{-1}$, i.e.
$$Var(\hat{\beta}) = \begin{bmatrix} Var(\hat{\beta}_{12}) & \cdot \\ \cdot & Var(\hat{\beta}_{22}) \end{bmatrix} = \sigma^2(X'X)^{-1} = \sigma^2 \begin{bmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{bmatrix}^{-1},$$
so that, by the partitioned inverse formula,
$$Var(\hat{\beta}_{12}) = \sigma^2\left[X_1'X_1 - X_1'X_2(X_2'X_2)^{-1}X_2'X_1\right]^{-1}.$$
We can compare the covariance matrices of $\hat{\beta}_1$ and $\hat{\beta}_{12}$ more easily by comparing their inverses:
$$Var(\hat{\beta}_1)^{-1} - Var(\hat{\beta}_{12})^{-1} = (1/\sigma^2)\,X_1'X_2(X_2'X_2)^{-1}X_2'X_1,$$
which is nonnegative definite. We conclude that although $\hat{\beta}_1$ is biased, its variance is never larger than that of $\hat{\beta}_{12}$.
Lemma:
Let $A$ be a positive definite $(n \times n)$ matrix and let $B$ denote any nonzero $(n \times m)$ matrix. Then $B'AB$ is nonnegative definite.
Proof:
Let $x$ be any nonzero vector and define $\tilde{x} \equiv Bx$. Then $\tilde{x}$ can be any vector, including the zero vector, and
$$x'B'ABx = \tilde{x}'A\tilde{x} \ge 0$$
by the positive definiteness of the matrix $A$.
For statistical inference it would be necessary to estimate $\sigma^2$. Proceeding as usual, we would use
$$s^2 = \frac{e_1'e_1}{T-k_1}.$$
But
$$e_1 = M_1Y = M_1(X_1\beta_1 + X_2\beta_2 + \varepsilon) = M_1X_2\beta_2 + M_1\varepsilon.$$
Thus,
$$E[e_1'e_1] = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2\,tr(M_1) = \beta_2'X_2'M_1X_2\beta_2 + \sigma^2(T-k_1).$$
It is simple to see that $\beta_2'X_2'M_1X_2\beta_2$ is positive (how? $M_1$ is symmetric and idempotent, so the quadratic form equals $(M_1X_2\beta_2)'(M_1X_2\beta_2) \ge 0$), so $s^2$ is biased upward. The conclusion is that if we omit relevant variables from the regression, then our estimates of both $\beta_1$ and $\sigma^2$ are biased, although it is possible that $\hat{\beta}_1$ is more precise than $\hat{\beta}_{12}$.
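A small Monte Carlo makes the bias formula concrete. This Python sketch is not from the original notes; the design (correlated $X_1$ and $X_2$) and all parameter values are illustrative assumptions.

```python
import numpy as np

# Omitted-variable bias: the true model is y = b1*x1 + b2*x2 + eps, but we
# run the short regression of y on x1 alone.
rng = np.random.default_rng(1)
T, reps = 200, 2000
b1, b2 = 1.0, 0.5
est = np.empty(reps)
for r in range(reps):
    x2 = rng.normal(size=T)
    x1 = 0.8 * x2 + rng.normal(size=T)   # x1 and x2 are correlated
    y = b1 * x1 + b2 * x2 + rng.normal(size=T)
    est[r] = (x1 @ y) / (x1 @ x1)        # short-regression slope
# Mean is well above b1 = 1.0, reflecting the (X1'X1)^{-1}X1'X2 b2 term.
print(est.mean())
```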
1.3 Inclusion of Irrelevant Variables
If the correct regression model is
$$Y = X_1\beta_1 + \varepsilon,$$
and we estimate it by
$$Y = X_1\beta_1 + X_2\beta_2 + \varepsilon,$$
then from the partitioned regression estimator we obtain
$$\hat{\beta}_1 = (X_1'M_2X_1)^{-1}X_1'M_2Y = (X_1'M_2X_1)^{-1}X_1'M_2(X_1\beta_1 + \varepsilon) = \beta_1 + (X_1'M_2X_1)^{-1}X_1'M_2\varepsilon,$$
and, since $M_1X_1 = 0$,
$$\hat{\beta}_2 = (X_2'M_1X_2)^{-1}X_2'M_1Y = (X_2'M_1X_2)^{-1}X_2'M_1(X_1\beta_1 + \varepsilon) = 0 + (X_2'M_1X_2)^{-1}X_2'M_1\varepsilon.$$
Therefore, $E(\hat{\beta}_1) = \beta_1$ and $E(\hat{\beta}_2) = 0$.
Exercise:
Show that $s^2$ is unbiased:
$$E\left[\frac{e'e}{T-k_1-k_2}\right] = \sigma^2.$$
Then what is the problem? It would seem that one would generally want to "overfit" the model. However, the cost is a reduction in the precision of the estimates: as we have seen, the covariance matrix of the estimator from the shorter regression is never larger than the covariance matrix of the estimator obtained in the presence of the superfluous variables.
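The trade-off can be seen in a simulation (again an illustrative sketch with made-up values, not from the original notes): including an irrelevant regressor that is correlated with $X_1$ leaves $\hat{\beta}_1$ unbiased but inflates its sampling variance.

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 100, 2000
short, long_ = np.empty(reps), np.empty(reps)
for r in range(reps):
    x2 = rng.normal(size=T)
    x1 = 0.8 * x2 + rng.normal(size=T)   # correlated regressors
    y = 1.0 * x1 + rng.normal(size=T)    # x2 is irrelevant (beta2 = 0)
    short[r] = (x1 @ y) / (x1 @ x1)      # fit y on x1 only
    X = np.column_stack([x1, x2])        # fit y on x1 and x2
    long_[r] = np.linalg.lstsq(X, y, rcond=None)[0][0]
print(short.mean(), long_.mean())   # both close to 1.0 (unbiased)
print(short.var(), long_.var())     # the overfitted estimator is less precise
```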
2 Functional Form
2.1 Dummy Variables
One of the most useful devices in regression analysis is the binary, or dummy, variable, which takes values of only 0 and 1.
2.1.1 Comparing Two Means
Suppose a model describes the salary function by
$$y = \alpha + x'\beta + \varepsilon,$$
where $\alpha$ can be regarded as the "initial pay" to anyone, even individuals with different academic degrees. The model can be made more realistic by splitting the "initial pay" into two categories: individuals attending college and individuals not attending college. Formally,
$$y = \alpha + \gamma d_i + x'\beta + \varepsilon, \qquad d_i = \begin{cases} 1 & \text{if attending college,} \\ 0 & \text{if not attending college.} \end{cases}$$
Logically, $\gamma > 0$; $d_i$ is the dummy variable. The above model can be written equivalently as
$$y = \gamma_1 d_{1i} + \gamma_2 d_{2i} + x'\beta + \varepsilon,$$
where
$$d_{1i} = \begin{cases} 1 & \text{if attending college,} \\ 0 & \text{if not attending college,} \end{cases} \qquad d_{2i} = \begin{cases} 0 & \text{if attending college,} \\ 1 & \text{if not attending college,} \end{cases}$$
but not as
$$y = \alpha + \gamma_1 d_{1i} + \gamma_2 d_{2i} + x'\beta + \varepsilon,$$
which would fall into the dummy variable trap (the constant and the two dummies are perfectly collinear). By the same logic, to remove seasonal effects we need 4 dummies without a common constant term, or 3 dummies together with a constant term (see eq. 7-1 at p. 118).
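The equivalence of the two parameterizations, and the collinearity behind the dummy trap, can be checked numerically. This Python sketch is an added illustration with simulated data; all values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 200
d = rng.integers(0, 2, size=T)          # d = 1 if attending college
x = rng.normal(size=T)
y = 1.0 + 0.7 * d + 0.5 * x + rng.normal(size=T)

# (a) common constant plus one dummy: y = alpha + gamma*d + beta*x + eps
Xa = np.column_stack([np.ones(T), d, x])
# (b) two group constants, no common constant: y = g1*d1 + g2*d2 + beta*x + eps
Xb = np.column_stack([d, 1 - d, x])
print(np.linalg.lstsq(Xa, y, rcond=None)[0])  # [alpha, gamma, beta]
print(np.linalg.lstsq(Xb, y, rcond=None)[0])  # [alpha+gamma, alpha, beta]

# Constant AND both dummies: columns are collinear (the dummy trap).
Xc = np.column_stack([np.ones(T), d, 1 - d, x])
print(np.linalg.matrix_rank(Xc))              # 3, although Xc has 4 columns
```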
2.2 Nonlinearity in the Variables
The linear model we have proposed is not as "limited" as it might seem at first glance. By using logarithms, exponentials, reciprocals, transcendental functions, polynomials, and so on, the "linear" model accommodates the general form
$$g(y) = \beta_1 f_1(z) + \beta_2 f_2(z) + \dots + \beta_k f_k(z) + \varepsilon = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon = x'\beta + \varepsilon,$$
which can be tailored to any number of situations.
2.2.1 Log-Linear Model
A commonly used form of regression model is the log-linear model:
$$y = \alpha \prod_k z_k^{\beta_k} e^{\varepsilon},$$
or
$$\ln y = \ln\alpha + \sum_k \beta_k \ln z_k + \varepsilon = \beta_1 + \sum_k \beta_k x_k + \varepsilon.$$
All one has to do is take natural logarithms of the data before running the regression.
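For instance (an added sketch with assumed Cobb-Douglas-style parameter values), OLS on the logged data recovers the elasticities directly:

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500
z1 = rng.uniform(1.0, 10.0, size=T)
z2 = rng.uniform(1.0, 10.0, size=T)
# y = alpha * z1^b1 * z2^b2 * exp(eps), with alpha=2, b1=0.6, b2=0.3
y = 2.0 * z1**0.6 * z2**0.3 * np.exp(0.1 * rng.normal(size=T))

X = np.column_stack([np.ones(T), np.log(z1), np.log(z2)])
b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
print(np.exp(b[0]), b[1], b[2])   # roughly (2.0, 0.6, 0.3)
```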
3 Stochastic Regressors
This section considers the linear regression model $Y = X\beta + \varepsilon$. It will be assumed that the full ideal conditions hold except that the regressor matrix $X$ is random.
3.1 Independent Stochastic Linear Regression Model
First, consider the case in which $X$ and $\varepsilon$ are independent. In this case the distribution of $\varepsilon$ conditional on $X$ is the same as its marginal distribution; specifically, $f(\varepsilon|X) = f(\varepsilon) \sim N(0, \sigma^2 I)$ and $E(\varepsilon|X) = \int \varepsilon f(\varepsilon|X)\,d\varepsilon = 0$. We now investigate the statistical properties of the OLS estimator under this assumption.
3.1.1 Unbiasedness?
Using the law of iterated expectations, the expected value of $\hat{\beta}$ is
$$E(\hat{\beta}) = E_X\{E[\beta + (X'X)^{-1}X'\varepsilon \mid X]\} = E_X[\beta + (X'X)^{-1}X'E(\varepsilon \mid X)] = E_X(\beta) = \beta.$$
The variance-covariance matrix of $\hat{\beta}$ is slightly different from the previous model, however:
$$\begin{aligned}
Var(\hat{\beta}) &= E[(\hat{\beta}-\beta)(\hat{\beta}-\beta)'] \\
&= E_X\{E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X]\} \\
&= E_X\{(X'X)^{-1}X'E[\varepsilon\varepsilon' \mid X]\,X(X'X)^{-1}\} \\
&= E_X\{(X'X)^{-1}X'(\sigma^2 I)X(X'X)^{-1}\} \\
&= \sigma^2 E_X[(X'X)^{-1}] = \sigma^2 E[(X'X)^{-1}],
\end{aligned}$$
provided, of course, that $\sigma^2 E[(X'X)^{-1}]$ exists. The variance-covariance matrix of $\hat{\beta}$ is $\sigma^2$ times the expected value of $(X'X)^{-1}$, since $(X'X)^{-1}$ takes different values with each new random sample.
The OLS estimator of the disturbance variance,
$$s^2 = \frac{e'e}{T-k},$$
remains unbiased, since
$$E(e'e) = E_X[E(\varepsilon'M\varepsilon \mid X)] = E_X[\sigma^2(T-k)] = \sigma^2(T-k);$$
therefore $E(s^2) = \sigma^2$.
3.1.2 Efficiency?
The Gauss-Markov theorem follows logically from the results of the preceding paragraph. We have shown that
$$Var(\hat{\beta} \mid X) \le Var(\tilde{\beta} \mid X)$$
(in the nonnegative definite sense) for any linear unbiased $\tilde{\beta} \neq \hat{\beta}$ and for the specific $X$ in our sample. But if this inequality holds for every particular $X$, then it must hold for
$$Var(\hat{\beta}) = E_X[Var(\hat{\beta} \mid X)].$$
That is, if it holds for every particular $X$, then it holds on average over $X$.
Theorem (Gauss-Markov Theorem with Stochastic Regressors):
In the classical linear regression model, the least squares estimator $\hat{\beta}$ is the minimum variance linear unbiased estimator of $\beta$, whether $X$ is stochastic or nonstochastic.
3.1.3 Consistency?
From the notation
$$X = \begin{bmatrix} X_1' \\ X_2' \\ \vdots \\ X_T' \end{bmatrix},$$
we have
$$\frac{1}{T}X'X = \frac{1}{T}\sum_{t=1}^T X_tX_t'.$$
If we assume that
$$\text{plim}\,\frac{1}{T}X'X = \text{plim}\,\frac{1}{T}\sum_{t=1}^T X_tX_t' = Q$$
is finite and nonsingular, then by a law of large numbers $Q = E(X_tX_t')$; that is, the second moments of the regressors are finite (this assumption is violated when $X$ is I(1), i.e. a unit root process).
The independence assumption implies that $\text{plim}\,\frac{X'\varepsilon}{T} = 0$. This follows from the fact that $E\!\left(\frac{X'\varepsilon}{T}\right) = 0$ and
$$E\left[\left(\frac{X'\varepsilon}{T}\right)\left(\frac{X'\varepsilon}{T}\right)'\right] = \frac{\sigma^2}{T}\,\frac{E(X'X)}{T} = \frac{\sigma^2}{T}\left(\frac{E\!\left(\sum_{t=1}^T X_tX_t'\right)}{T}\right) = \frac{\sigma^2}{T}\left(\frac{\sum_{t=1}^T E(X_tX_t')}{T}\right) = \frac{\sigma^2}{T}\,\frac{TQ}{T} = \frac{\sigma^2}{T}\,Q,$$
so that
$$\lim_{T\to\infty} E\left[\left(\frac{X'\varepsilon}{T}\right)\left(\frac{X'\varepsilon}{T}\right)'\right] = \lim_{T\to\infty}\frac{\sigma^2}{T}\,Q = 0.$$
But the facts that $E\!\left(\frac{X'\varepsilon}{T}\right) = 0$ and $\lim_{T\to\infty} E\!\left[\left(\frac{X'\varepsilon}{T}\right)\left(\frac{X'\varepsilon}{T}\right)'\right] = 0$ imply that $\text{plim}\,\frac{X'\varepsilon}{T} = 0$, since convergence in mean square implies convergence in probability.
Recall that
$$\hat{\beta} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon = \beta + \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T};$$
therefore
$$\text{plim}\,\hat{\beta} = \beta + Q^{-1}\,\text{plim}\,\frac{X'\varepsilon}{T} = \beta.$$
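A quick check of consistency by simulation (an added sketch; the single-regressor design and values are assumptions): with $X$ independent of $\varepsilon$, the OLS estimate tightens around the truth as $T$ grows.

```python
import numpy as np

rng = np.random.default_rng(5)
beta = 2.0
for T in (50, 500, 5000, 50000):
    x = rng.normal(size=T)                # stochastic regressor
    y = beta * x + rng.normal(size=T)     # eps drawn independently of x
    print(T, (x @ y) / (x @ x))           # OLS slope approaches beta = 2.0
```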
Remark: To prove the consistency of OLS under stochastic regressors, we need to show that $\text{plim}\,\frac{1}{T}X'\varepsilon = 0$. However, to obtain $\text{plim}\,\frac{1}{T}X'\varepsilon = 0$ we require only that $E(X'\varepsilon) = \sum_{t=1}^T E(X_t\varepsilon_t) = 0$, i.e. $E(X_{ti}\varepsilon_t) = 0$ for $i = 1, 2, \dots, k$. That is, each regressor must be uncorrelated with the disturbance of the same period: there must be no contemporaneous correlation. There are three common circumstances in which a stochastic regressor at time $t$ is correlated with the disturbance of the same period, i.e. $E(X_{ti}\varepsilon_t) \neq 0$: lagged dependent variables with serially correlated disturbances, models with unobservable (mismeasured) regressors, and simultaneous equations models. Under these circumstances OLS is not a consistent estimator, and the instrumental variables (IV) estimator is proposed to deal with these problems.
3.1.4 Distribution of the Estimators
Since
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon,$$
the distribution of $\hat{\beta}$ depends on the stochastic properties of $X$. It would be pessimistic to conclude that the usual hypothesis tests are no longer valid; as we will see in the following, that is not the case.
3.1.5 Hypothesis Tests
We now consider the validity of our usual test statistics and inference procedures when $X$ is stochastic. First consider the conventional $t$ statistic for testing $H_0: \beta_i = \beta_i^0$. Under the null hypothesis,
$$t \mid X = \frac{\hat{\beta}_i - \beta_i^0}{\left[s^2\,(X'X)^{-1}_{ii}\right]^{1/2}} \sim t_{T-k}.$$
However, what interests us is the marginal, that is, the unconditional, distribution of $t$. Remember that if $W \sim t(n)$, then the density function of $W$ is
$$f(w; n) = \frac{1}{\sqrt{n\pi}}\,\frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}\left(1 + \frac{w^2}{n}\right)^{-(n+1)/2}, \qquad n > 0,\; w \in \mathbb{R}.$$
Therefore we see that the conditional density $f(t \mid x)$ of the random variable $(t \mid X)$ is not a function of $X$.
Let $g(x)$ be the density function of $X$. The joint pdf of $X$ and $t$ is
$$f(t, x) = f(t \mid x)\,g(x),$$
so the marginal density of $t$ is
$$f(t) = \int f(t, x)\,dx = \int f(t \mid x)\,g(x)\,dx = f(t \mid x)\int g(x)\,dx = f(t \mid x),$$
since $f(t \mid x)$ is not a function of $x$. We have the surprising result that, regardless of the distribution of $X$, and even of whether $X$ is stochastic or nonstochastic, the marginal distribution of the $t$ statistic is still the $t$ distribution. The same reasoning can be used to deduce that the usual $F$ ratio for testing linear restrictions is valid whether $X$ is stochastic or not.
Remark: This conclusion hinges on the assumption that the disturbances are normally distributed; without normality, we can only deduce that $\hat{\beta}$ is asymptotically normal.
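This invariance is easy to verify by simulation (an added sketch; the exponential design for $X$ and all values are assumptions, and scipy is assumed available alongside numpy): with normal disturbances but a decidedly non-normal random regressor, the simulated $t$ statistics are still indistinguishable from $t_{T-k}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
T, reps = 30, 5000
tstats = np.empty(reps)
for r in range(reps):
    x = rng.exponential(size=T)             # deliberately non-normal regressor
    X = np.column_stack([np.ones(T), x])
    y = 1.0 + 0.0 * x + rng.normal(size=T)  # H0: slope = 0 is true
    b, res = np.linalg.lstsq(X, y, rcond=None)[:2]
    s2 = res[0] / (T - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    tstats[r] = b[1] / se
# Kolmogorov-Smirnov test against t(T-2): large p-value, no evidence against it.
print(stats.kstest(tstats, stats.t(df=T - 2).cdf))
```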
4 Non-Normal Disturbances
In this section we suppose that all of the ideal conditions hold except that the disturbances are not normally distributed. In particular, we still suppose that the disturbances $\varepsilon_t$ are independent and identically distributed with zero mean and finite variance $\sigma^2$; however, we no longer suppose that their distribution is normal.
4.1 Unbiasedness, Efficiency, and Consistency?
The following OLS properties are easy to show.
Theorem:
$\hat{\beta}$ is unbiased, BLUE, and consistent, and has covariance matrix $\sigma^2(X'X)^{-1}$; $s^2$ is unbiased and consistent.
4.2 Hypothesis Testing
Since $\varepsilon$ is not normally distributed, $\hat{\beta}$ is not normally distributed, and therefore $(T-k)s^2/\sigma^2$ does not have a $\chi^2$ distribution.
Theorem:
The hypothesis tests developed in Section 4 of Chapter 6 are not valid when $\varepsilon$ is not normally distributed.
4.3 Asymptotic Normality
The following results show that the usual test procedures are asymptotically justified whether or not the disturbances are normal.
Lemma:
Assume that $\varepsilon_t$, $t = 1, 2, \dots, T$, are independent and identically distributed with mean 0 and variance $\sigma^2 < \infty$, and that $\lim_{T\to\infty}(X'X/T) = Q$, a finite positive definite matrix. Then
$$\frac{1}{\sqrt{T}}X'\varepsilon \xrightarrow{d} N(0, \sigma^2 Q).$$
Proof:
Write
$$\frac{1}{\sqrt{T}}X'\varepsilon = \sqrt{T}\,(\bar{w} - E(\bar{w})),$$
where
$$\bar{w} = \frac{1}{T}\sum_{t=1}^T X_t\varepsilon_t$$
is the average of $T$ independent random vectors $X_t\varepsilon_t$ with mean $E(\bar{w}) = 0$ and variance
$$Var(X_t\varepsilon_t) = \sigma^2 X_tX_t' = \sigma^2 Q_t.$$
The variance of $\sqrt{T}\,\bar{w}$ is
$$\sigma^2\bar{Q}_T = \sigma^2\,\frac{1}{T}\,[Q_1 + Q_2 + \dots + Q_T] = \sigma^2\,\frac{1}{T}\sum_{t=1}^T X_tX_t' = \sigma^2\,\frac{X'X}{T}.$$
Assuming that $\lim_{T\to\infty}(X'X/T) = Q$, a positive definite matrix, we have
$$\lim_{T\to\infty}\sigma^2\bar{Q}_T = \sigma^2 Q.$$
Applying the Lindeberg-Feller CLT to the vector $\sqrt{T}\,\bar{w}$ gives the result.
Theorem:
The asymptotic distribution of $\sqrt{T}(\hat{\beta} - \beta)$ is $N(0, \sigma^2 Q^{-1})$, where $Q = \lim_{T\to\infty}(X'X/T)$.
Proof:
Note that
$$\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon = \beta + \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T},$$
so that
$$\hat{\beta} - \beta = \left(\frac{X'X}{T}\right)^{-1}\frac{X'\varepsilon}{T}, \qquad \sqrt{T}(\hat{\beta} - \beta) = \left(\frac{X'X}{T}\right)^{-1}\sqrt{T}\,\frac{X'\varepsilon}{T} = \left(\frac{X'X}{T}\right)^{-1}\frac{1}{\sqrt{T}}X'\varepsilon.$$
Then as $T \to \infty$,
$$\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} Q^{-1}N(0, \sigma^2 Q) \sim N(0, \sigma^2 Q^{-1}QQ^{-1}) = N(0, \sigma^2 Q^{-1}).$$
The above result shows that $\hat{\beta}$ has the same asymptotic distribution whether or not the disturbances are normal, as long as they are iid with mean zero and finite variance. The following result shows that this implies the usual $t$ tests are asymptotically valid; the $t$ distribution is, of course, replaced by the $N(0, 1)$ distribution.
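To see the lemma at work (an added sketch with assumed values, using centered exponential disturbances so the errors are iid, mean zero, and markedly non-normal):

```python
import numpy as np

rng = np.random.default_rng(7)
T, reps = 400, 5000
z = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=T)
    eps = rng.exponential(size=T) - 1.0   # iid, mean 0, variance 1, skewed
    y = 2.0 * x + eps
    b = (x @ y) / (x @ x)
    z[r] = (b - 2.0) * np.sqrt(x @ x)     # standardized: ~ N(0, sigma^2) = N(0, 1)
print(z.mean(), z.var())                  # approximately 0 and 1
```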
Theorem:
Suppose that the elements of $\varepsilon$ are iid with zero mean and finite variance, and that $Q$ is finite and nonsingular. Let $R$ be a known $1 \times k$ vector and $r$ a known scalar. Then under the null hypothesis that $R\beta = r$, the test statistic
$$\frac{R\hat{\beta} - r}{\sqrt{s^2 R(X'X)^{-1}R'}} \xrightarrow{d} N(0, 1).$$
Proof:
Under the null hypothesis, $R\hat{\beta} - r = R(\hat{\beta} - \beta)$, so the test statistic can be written as
$$\frac{\sqrt{T}\,R(\hat{\beta} - \beta)}{\sqrt{s^2 R(X'X/T)^{-1}R'}}.$$
Now, since $\sqrt{T}(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$, it is clear that
$$\sqrt{T}\,R(\hat{\beta} - \beta) \xrightarrow{d} N(0, \sigma^2 RQ^{-1}R').$$
Also, the denominator of the test statistic has probability limit
$$\sqrt{\sigma^2 RQ^{-1}R'},$$
since $s^2 \to \sigma^2$ and $(X'X/T)^{-1} \to Q^{-1}$. But this is a scalar, which is in fact the standard deviation of the asymptotic distribution of the numerator random variable $\sqrt{T}\,R(\hat{\beta} - \beta)$. Hence the test statistic does indeed converge in distribution to $N(0, 1)$.
5 Instrumental Variables Estimator
We again consider the model $Y = X\beta + \varepsilon$. However, we are now concerned with the case in which the regressor matrix $X$ is random and
$$\text{plim}\,\frac{1}{T}X'\varepsilon \neq 0,$$
so that the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ is inconsistent.
Example (Lagged Dependent Variable with Autocorrelated Disturbances):
Let the regression be
$$y_t = \beta y_{t-1} + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + u_t.$$
In this model the regressor and the disturbance are correlated:
$$Cov(y_{t-1}, \varepsilon_t) = Cov(y_{t-1}, \rho\varepsilon_{t-1} + u_t) = \rho\,Cov(y_{t-1}, \varepsilon_{t-1}) = \frac{\rho\,\sigma_u^2}{(1-\beta\rho)(1-\rho^2)}.$$
Since
$$\text{plim}\,\hat{\beta} = \beta + \frac{Cov(y_{t-1}, \varepsilon_t)}{Var(y_t)},$$
we have
$$\text{plim}\,\hat{\beta} = \beta + \frac{\rho\,(1-\beta^2)}{1+\beta\rho}.$$
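A simulation (an added sketch; $\beta = 0.5$ and $\rho = 0.4$ are assumed values) confirms that the bias does not vanish even in very large samples:

```python
import numpy as np

rng = np.random.default_rng(8)
beta, rho, T = 0.5, 0.4, 200_000
u = rng.normal(size=T)
eps = np.empty(T); eps[0] = u[0]
y = np.empty(T); y[0] = 0.0
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + u[t]      # AR(1) disturbance
    y[t] = beta * y[t - 1] + eps[t]       # lagged dependent variable model
b_ols = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
print(b_ols)                                          # near 0.75, not 0.5
print(beta + rho * (1 - beta**2) / (1 + beta * rho))  # the plim: 0.75
```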
Theorem:
Suppose that there exists a set of variables $Z_t$ such that
$$Q_{ZX} = \text{plim}\,\frac{1}{T}Z'X$$
is finite and nonsingular, and such that
$$\frac{Z'\varepsilon}{\sqrt{T}}$$
converges in distribution to $N(0, \Phi)$. Then the estimator
$$\tilde{\beta} = (Z'X)^{-1}Z'Y$$
is consistent, and the asymptotic distribution of $\sqrt{T}(\tilde{\beta} - \beta)$ is $N(0, Q_{ZX}^{-1}\Phi Q_{XZ}^{-1})$.
Proof:
Note that
$$\tilde{\beta} = \beta + \left(\frac{Z'X}{T}\right)^{-1}\frac{Z'\varepsilon}{T}.$$
Since $Z'\varepsilon/\sqrt{T}$ has a well-defined asymptotic distribution, it follows that
$$\text{plim}\,\frac{Z'\varepsilon}{T} = 0.$$
Hence $\text{plim}\,\tilde{\beta} = \beta$.
To prove the second part of the theorem, note that
$$\sqrt{T}(\tilde{\beta} - \beta) = \left(\frac{Z'X}{T}\right)^{-1}\frac{Z'\varepsilon}{\sqrt{T}}.$$
The result follows immediately.
Definition:
The above estimator $\tilde{\beta}$ is the instrumental variables (IV) estimator of $\beta$; the matrix $Z$ is the set of instruments for $X$.
Comment:
The typical case is the one in which
$$\frac{Z'\varepsilon}{\sqrt{T}}$$
has asymptotic distribution $N(0, \sigma^2 Q_{ZZ})$, where
$$Q_{ZZ} = \text{plim}\,\frac{1}{T}Z'Z.$$
If $Z$ contains more variables than $X$, we may choose as instruments the projection of the columns of $X$ onto the column space of $Z$:
$$\hat{X} = Z(Z'Z)^{-1}Z'X.$$
With this choice of instruments, $\hat{X}$ in place of $Z$, we have
$$\tilde{\beta} = (\hat{X}'X)^{-1}\hat{X}'Y = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'Y.$$
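A minimal sketch of the IV and 2SLS computations (an added illustration; the scalar design, instrument strength, and error correlation are all assumed values):

```python
import numpy as np

rng = np.random.default_rng(9)
T, beta = 10_000, 1.0
z = rng.normal(size=T)                     # instrument: correlated with x, not eps
v = rng.normal(size=T)
eps = 0.8 * v + 0.6 * rng.normal(size=T)   # eps shares v with x => endogeneity
x = 0.7 * z + v
y = beta * x + eps

X = x[:, None]; Z = z[:, None]
b_ols = np.linalg.solve(X.T @ X, X.T @ y)        # inconsistent
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)         # (Z'X)^{-1} Z'Y
# 2SLS form; identical to b_iv when Z has as many columns as X:
Xhat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)     # projection of X on span(Z)
b_2sls = np.linalg.solve(Xhat.T @ X, Xhat.T @ y)
print(b_ols, b_iv, b_2sls)                       # OLS is off; IV/2SLS near 1.0
```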