Chapter 10 Generalized Least Squares Estimation
10.1 Model
\[
y = X\beta + \varepsilon, \qquad E[\varepsilon \mid X] = 0, \qquad E[\varepsilon\varepsilon' \mid X] = \sigma^2\Omega = \Sigma \quad (\Omega > 0).
\]
1. Heteroskedasticity
\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
w_{11} & & & 0 \\
& w_{22} & & \\
& & \ddots & \\
0 & & & w_{nn}
\end{pmatrix}
=
\begin{pmatrix}
\sigma_1^2 & & 0 \\
& \ddots & \\
0 & & \sigma_n^2
\end{pmatrix}.
\]
2. Autocorrelation
\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
1 & \rho_1 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \cdots & \rho_{n-2} \\
\vdots & & \ddots & \vdots \\
\rho_{n-1} & \cdots & \cdots & 1
\end{pmatrix}.
\]
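As a concrete illustration (my addition, with hypothetical sizes and values), both covariance structures are easy to build numerically in Python:

    import numpy as np

    n, sigma2 = 5, 2.0                          # hypothetical size and scale

    # Heteroskedasticity: sigma^2 * Omega = diag(sigma_1^2, ..., sigma_n^2)
    w = np.array([0.5, 1.0, 1.5, 2.0, 2.5])     # hypothetical weights w_ii
    Sigma_het = sigma2 * np.diag(w)

    # Autocorrelation: sigma^2 * Omega has rho_k on the k-th off-diagonal
    rho = 0.6 ** np.arange(1, n)                # hypothetical rho_1, ..., rho_{n-1}
    row = np.concatenate(([1.0], rho))
    Sigma_auto = sigma2 * np.array([[row[abs(i - j)] for j in range(n)]
                                    for i in range(n)])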
10.2 OLS and IV estimation
• OLS estimation
The OLS estimator can be written as
\[
b = \beta + (X'X)^{-1}X'\varepsilon.
\]
1. Unbiasedness
\[
E[b] = E_X\big[E[b \mid X]\big] = \beta.
\]
2. Variance-Covariance Matrix
\begin{align*}
\mathrm{Var}[b \mid X] &= E\big[(b - \beta)(b - \beta)' \mid X\big] \\
&= E\big[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\big] \\
&= (X'X)^{-1}X'\big(\sigma^2\Omega\big)X(X'X)^{-1}.
\end{align*}
The unconditional variance is
\[
E_X\big[\mathrm{Var}[b \mid X]\big],
\]
since $\mathrm{Var}_X\big[E[b \mid X]\big] = 0$ when $E[b \mid X] = \beta$.
If $\varepsilon$ is normally distributed,
\[
b \mid X \sim N\Big(\beta,\ \sigma^2 (X'X)^{-1}X'\Omega X(X'X)^{-1}\Big).
\]
3. Consistency
Suppose that
\[
\frac{X'X}{n} \xrightarrow{p} Q > 0, \qquad \frac{X'\Omega X}{n} \xrightarrow{p} P > 0.
\]
Then
\[
\mathrm{Var}[b \mid X] = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1} \frac{\sigma^2 X'\Omega X}{n} \left(\frac{X'X}{n}\right)^{-1} \xrightarrow{p} 0
\]
and
\[
\mathrm{Var}[b] \to 0.
\]
Using this and Chebyshev's inequality, we have, for any $\alpha \in \mathbb{R}^k \setminus \{0\}$ and $\epsilon > 0$,
\[
P\big[|\alpha'(b - \beta)| > \epsilon\big]
\le \frac{\alpha' E\big[(b - \beta)(b - \beta)'\big]\alpha}{\epsilon^2}
= \frac{\alpha' \mathrm{Var}(b)\alpha}{\epsilon^2}
\to 0 \quad \text{as } n \to \infty,
\]
which implies
\[
b \xrightarrow{p} \beta.
\]
4. Asymptotic distribution of $b$
Assume $(X_i, \varepsilon_i)$ is a sequence of independent observations with
\[
E(\varepsilon\varepsilon') = \mathrm{diag}\big(\sigma_1^2, \cdots, \sigma_n^2\big) = \Sigma.
\]
In addition, assume for any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$,
\[
E|\lambda' X_i \varepsilon_i|^{2+\delta} \le B \quad \text{for all } i.
\]
Then we can apply the CLT for a sequence of independent random variables, which gives
\[
\frac{\sum \lambda' X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d}
N\left(0,\ \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} E\big(\lambda' X_i \varepsilon_i^2 X_i' \lambda\big)\right).
\]
But
\begin{align*}
\frac{1}{n} \sum E\big(\lambda' X_i \varepsilon_i^2 X_i' \lambda\big)
&= \frac{1}{n} \sum E\,E\big(\lambda' X_i \varepsilon_i^2 X_i' \lambda \mid X\big) \\
&= \frac{1}{n} \sum \sigma_i^2\, \lambda' E(X_i X_i') \lambda \\
&= \frac{1}{n} \lambda' \sum \sigma_i^2 E(X_i X_i') \lambda \\
&= \operatorname{plim} \frac{1}{n} \lambda' \sum \sigma_i^2 X_i X_i' \lambda \\
&= \operatorname{plim} \frac{1}{n} \lambda' (X'\Sigma X) \lambda.
\end{align*}
Thus
\[
\frac{\sum X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d} N(0, P),
\]
where
\[
P = \operatorname{plim} \frac{1}{n} X'\Sigma X,
\]
and we obtain
\[
\sqrt{n}(b - \beta) \xrightarrow{d} N\big(0,\ Q^{-1} P Q^{-1}\big).
\]
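A small Monte Carlo sketch (my own, with a made-up design) illustrates the sandwich form: under heteroskedasticity, the empirical variance of $\sqrt{n}(b - \beta)$ matches $Q^{-1}PQ^{-1}$ rather than $\sigma^2 Q^{-1}$.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps, beta = 400, 2000, np.array([1.0, 2.0])

    draws = []
    for _ in range(reps):
        x = rng.normal(size=n)
        X = np.column_stack([np.ones(n), x])
        eps = np.sqrt(0.5 + x**2) * rng.normal(size=n)   # sigma_i^2 varies with x_i
        b = np.linalg.solve(X.T @ X, X.T @ (X @ beta + eps))
        draws.append(np.sqrt(n) * (b - beta))

    print(np.cov(np.array(draws).T))   # compare with the analytic Q^{-1} P Q^{-1}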
When $\varepsilon_i$ are serially correlated, we need a different set of conditions and CLT
to derive the asymptotic normality result. See White’s “Asymptotic Theory
for Econometricians” for this.
• IV estimation
Assume
\[
\frac{Z'Z}{n} \xrightarrow{p} Q_{ZZ}\ (>0), \qquad
\frac{Z'X}{n} \xrightarrow{p} Q_{ZX}\ (\neq 0), \qquad
\frac{X'X}{n} \xrightarrow{p} Q_{XX}\ (>0), \qquad
\frac{Z'\Sigma Z}{n} \xrightarrow{p} Q_{Z\Sigma Z}\ (>0).
\]
Also assume $(Z_i', \varepsilon_i)'$ is a sequence of independent random vectors with $E(\varepsilon\varepsilon' \mid X) = \mathrm{diag}(\sigma_1^2, \cdots, \sigma_n^2) = \Sigma$, and that $E|\lambda' Z_i \varepsilon_i|^{2+\delta} \le B$ for all $i$, for any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$. Then, letting
\[
Q_{XXZ} = \big(Q_{XZ} Q_{ZZ}^{-1} Q_{ZX}\big)^{-1} Q_{XZ} Q_{ZZ}^{-1},
\]
\[
\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\big(0,\ Q_{XXZ} Q_{Z\Sigma Z} Q_{XXZ}'\big).
\]
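In the just-identified case, $Q_{XXZ}$ collapses to $Q_{ZX}^{-1}$, and the estimator and its robust variance can be sketched in Python as follows (function and variable names are mine):

    import numpy as np

    def iv_estimate(y, X, Z):
        """Just-identified IV: b_IV = (Z'X)^{-1} Z'y, with the robust variance
        estimate (Z'X)^{-1} (sum_i e_i^2 Z_i Z_i') (X'Z)^{-1} for Var(b_IV)."""
        b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
        e = y - X @ b_iv
        meat = (Z * (e**2)[:, None]).T @ Z     # sum_i e_i^2 Z_i Z_i'
        A = np.linalg.inv(Z.T @ X)
        return b_iv, A @ meat @ A.T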
When $\varepsilon_i$ are serially correlated, as before, we need more assumptions and a different CLT.
10.3 Robust estimation of asymptotic covariance matrices
We can still use OLS for inference if its variance-covariance matrix
\[
(X'X)^{-1} X'\Sigma X (X'X)^{-1}
\]
can be estimated.
Suppose that
\[
\Sigma = \mathrm{diag}\big(\sigma_1^2, \cdots, \sigma_n^2\big).
\]
Obviously, $\sigma_1^2, \cdots, \sigma_n^2$ cannot all be estimated consistently, since there are $n$ of them. But what we need is to estimate $X'\Sigma X$, not $\Sigma$. We may write
\[
\frac{1}{n} X'\Sigma X = \frac{1}{n} \sum \sigma_i^2 X_i X_i'.
\]
This and
\[
\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2 X_i X_i'
\]
have the same probability limit by the LLN. We replace $\varepsilon_i^2$ with $e_i^2$ and then have
\begin{align*}
\frac{1}{n} \sum e_i^2 X_i X_i'
&= \frac{1}{n} \sum \Big[\varepsilon_i - X_i'\big(\hat\beta - \beta\big)\Big]^2 X_i X_i' \\
&= \frac{1}{n} \sum \varepsilon_i^2 X_i X_i' + o_p(1).
\end{align*}
(See White (1980, Econometrica) for details)
Thus $\frac{1}{n} \sum e_i^2 X_i X_i'$ consistently estimates $\frac{1}{n} X'\Sigma X$, and the estimated asymptotic variance-covariance matrix of $b$ is
\[
\left(\frac{1}{n} X'X\right)^{-1} \left(\frac{1}{n} \sum e_i^2 X_i X_i'\right) \left(\frac{1}{n} X'X\right)^{-1} \xrightarrow{p} Q^{-1} P Q^{-1}.
\]
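In code, this is just a sandwich with $e_i^2$ in the middle; a minimal Python sketch (the function name is mine):

    import numpy as np

    def white_cov(X, e):
        """White (1980) estimate of Var(b):
        (X'X)^{-1} (sum_i e_i^2 X_i X_i') (X'X)^{-1}."""
        bread = np.linalg.inv(X.T @ X)
        meat = (X * (e**2)[:, None]).T @ X     # sum_i e_i^2 X_i X_i'
        return bread @ meat @ bread

    # Usage with OLS residuals:
    # b = np.linalg.solve(X.T @ X, X.T @ y); e = y - X @ b; V = white_cov(X, e)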
We can use this result for hypothesis testing. Suppose that the null hypothesis is $H_0: R\beta = r$. Then the Wald test is defined by
\[
W = (Rb - r)' \Big[R (X'X)^{-1} \sum e_i^2 X_i X_i'\, (X'X)^{-1} R'\Big]^{-1} (Rb - r)
\]
(the heteroskedasticity-robust Wald test), and as $n \to \infty$,
\[
W \xrightarrow{d} \chi^2(J), \qquad J = \mathrm{rank}(R).
\]
This follows because
\begin{align*}
W &= \sqrt{n}(Rb - r)' \left[R \left(\frac{X'X}{n}\right)^{-1} \left(\frac{1}{n} \sum e_i^2 X_i X_i'\right) \left(\frac{X'X}{n}\right)^{-1} R'\right]^{-1} \sqrt{n}(Rb - r) \\
&\xrightarrow{d} N(0, I_J)'\, N(0, I_J) = \chi^2(J).
\end{align*}
If the null hypothesis is $H_0: \beta_k = \beta_k^0$, use the $t$-ratio
\[
t = \frac{b_k - \beta_k^0}{\sqrt{V_{kk}}},
\]
where
\[
V = (X'X)^{-1} \sum e_i^2 X_i X_i'\, (X'X)^{-1}.
\]
As $n \to \infty$,
\[
t \xrightarrow{d} N(0, 1).
\]
This is White's heteroskedasticity-robust $t$-ratio.
10.4 GLS
Since $\Omega > 0$, it can be factored as
\[
\Omega = C \Lambda C',
\]
where the columns of $C$ are the characteristic vectors of $\Omega$ and the characteristic roots of $\Omega$ are put in the diagonal matrix $\Lambda$.
Let $P' = C\Lambda^{-1/2}$. Then
\begin{align*}
\Omega^{-1} &= C'^{-1} \Lambda^{-1} C^{-1} = C \Lambda^{-1} C' = C \Lambda^{-1/2} \Lambda^{-1/2} C' \\
&= P'P
\end{align*}
since $C' = C^{-1}$. Premultiplying the linear regression model by $P$, we obtain
\[
Py = PX\beta + P\varepsilon
\]
or
\[
y_* = X_*\beta + \varepsilon_*.
\]
Hence
\begin{align*}
E(\varepsilon_* \varepsilon_*') &= P E(\varepsilon\varepsilon') P' = \sigma^2 P \Omega P' \\
&= \sigma^2 \Lambda^{-1/2} C' C \Lambda C' C \Lambda^{-1/2} \\
&= \sigma^2 I.
\end{align*}
The transformed model satisfies the conditions of the classical linear regression model. Hence
\begin{align*}
\hat\beta_{GLS} &= (X_*'X_*)^{-1} X_*' y_* \\
&= (X'P'PX)^{-1} X'P'Py \\
&= \big(X'\Omega^{-1}X\big)^{-1} X'\Omega^{-1} y.
\end{align*}
This estimator is called the generalized least squares estimator.
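A direct Python implementation of this factorization argument, assuming $\Omega$ is known (a sketch, not production code):

    import numpy as np

    def gls(y, X, Omega):
        """GLS via Omega = C Lambda C': transform by P = Lambda^{-1/2} C'
        (so that P'P = Omega^{-1}) and run OLS on the transformed data."""
        lam, C = np.linalg.eigh(Omega)         # eigen-decomposition of Omega > 0
        P = np.diag(lam ** -0.5) @ C.T
        ys, Xs = P @ y, P @ X
        return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)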
The properties of the GLS estimator
1. If $E[\varepsilon_* \mid X_*] = 0$, then $E\big[\hat\beta_{GLS}\big] = \beta$.
2. If $\frac{1}{n} X_*'X_* \xrightarrow{p} Q_*\ (>0)$, then $\hat\beta_{GLS} \xrightarrow{p} \beta$.
3. $\sqrt{n}\big(\hat\beta_{GLS} - \beta\big) \xrightarrow{d} N\big(0, \sigma^2 Q_*^{-1}\big)$.

The GLS estimator $\hat\beta_{GLS}$ is the BLUE.
Example 1
\[
y_t = \beta' X_t + \varepsilon_t, \qquad \varepsilon_t = \rho \varepsilon_{t-1} + u_t, \quad u_t \sim \mathrm{iid}\big(0, \sigma^2\big), \quad |\rho| < 1.
\]
\[
E\varepsilon\varepsilon' = \frac{\sigma^2}{1 - \rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{T-1} \\
\rho & 1 & & \cdots & \rho^{T-2} \\
\vdots & & \ddots & & \vdots \\
\rho^{T-1} & & \cdots & & 1
\end{pmatrix}
= \sigma^2 \Omega.
\]
\[
\Omega^{-1} =
\begin{pmatrix}
1 & -\rho & & & 0 \\
-\rho & 1 + \rho^2 & -\rho & & \\
& \ddots & \ddots & \ddots & \\
& & -\rho & 1 + \rho^2 & -\rho \\
0 & & & -\rho & 1
\end{pmatrix}.
\]
The transformation matrix is
\[
P =
\begin{pmatrix}
\sqrt{1 - \rho^2} & & & 0 \\
-\rho & 1 & & \\
& \ddots & \ddots & \\
0 & & -\rho & 1
\end{pmatrix}.
\]
The transformed model is
\begin{align*}
\sqrt{1 - \rho^2}\, y_1 &= \sqrt{1 - \rho^2}\, X_1'\beta + \varepsilon_1^*, \\
y_t - \rho y_{t-1} &= (X_t - \rho X_{t-1})'\beta + u_t, \quad t = 2, \cdots, n,
\end{align*}
where $\varepsilon_1^* = \sqrt{1 - \rho^2}\, \varepsilon_1$. Since
1. $E\varepsilon_1^* = E u_t = 0$,
2. $\mathrm{Var}(\varepsilon_1^*) = (1 - \rho^2)\mathrm{Var}(\varepsilon_1) = (1 - \rho^2) \cdot \frac{\sigma^2}{1 - \rho^2} = \sigma^2$,
3. $E(\varepsilon_1^* u_t) = \sqrt{1 - \rho^2}\, E(\varepsilon_1 u_t) = 0, \quad t = 2, \cdots, n$,
the error terms of the transformed model satisfy the conditions of the standard linear regression model.
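One can check the algebra numerically; the Python sketch below (with hypothetical $\rho$ and $n$) verifies that $P'P = \Omega^{-1}$ and that $P$ whitens the errors:

    import numpy as np

    rho, n = 0.5, 6                            # hypothetical values

    # Omega from E(eps eps') = sigma^2 * Omega above
    R = np.array([[rho ** abs(i - j) for j in range(n)] for i in range(n)])
    Omega = R / (1 - rho**2)

    # The transformation matrix P
    P = np.eye(n)
    P[0, 0] = np.sqrt(1 - rho**2)
    for t in range(1, n):
        P[t, t - 1] = -rho

    print(np.allclose(P.T @ P, np.linalg.inv(Omega)))   # True: P'P = Omega^{-1}
    print(np.allclose(P @ Omega @ P.T, np.eye(n)))      # True: whitened errors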
The GLS estimator $\hat\beta_{GLS}$ depends on the unknown parameters associated with $\Omega$ and, therefore, cannot be used in practice. Suppose that $\hat\Omega \xrightarrow{p} \Omega$. Then the feasible GLS estimator is defined by
\[
\hat\beta_{FG} = \Big(X'\hat\Omega^{-1}X\Big)^{-1} X'\hat\Omega^{-1} y.
\]
If
\[
\frac{1}{n} X'\Omega^{-1}X - \frac{1}{n} X'\hat\Omega^{-1}X \xrightarrow{p} 0
\]
and
\[
\frac{1}{n} X'\Omega^{-1}\varepsilon - \frac{1}{n} X'\hat\Omega^{-1}\varepsilon \xrightarrow{p} 0,
\]
then $\hat\beta_{GLS}$ and $\hat\beta_{FG}$ have the same asymptotic distribution.
Example 2 AR(1) error
\[
y_t = \beta' X_t + \varepsilon_t, \qquad \varepsilon_t = \rho \varepsilon_{t-1} + u_t, \quad u_t \sim \mathrm{iid}\big(0, \sigma^2\big), \quad |\rho| < 1.
\]
1. Run OLS and get $\hat\varepsilon_t$.
2. Run an AR(1) regression using $\hat\varepsilon_t$. This gives $\hat\rho$.
3. Transform the model using $\hat\rho$ and run OLS, as in the sketch below.
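A Python sketch of this three-step (Prais-Winsten style) procedure; the helper name is mine:

    import numpy as np

    def fgls_ar1(y, X):
        """Feasible GLS for AR(1) errors, following the three steps above."""
        # Step 1: OLS residuals
        e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
        # Step 2: AR(1) regression of e_t on e_{t-1} gives rho_hat
        r = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
        # Step 3: transform the model with rho_hat and rerun OLS
        ys = np.concatenate(([np.sqrt(1 - r**2) * y[0]], y[1:] - r * y[:-1]))
        Xs = np.vstack([np.sqrt(1 - r**2) * X[:1], X[1:] - r * X[:-1]])
        return np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)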
10.5 Equivalence of GLS and OLS
Let $X'X$ and $\Sigma$ both be positive definite. Then the following statements are equivalent:
(A) $(X'X)^{-1} X'\Sigma X (X'X)^{-1} = (X'\Sigma^{-1}X)^{-1}$.
(B) $\Sigma X = XB$ for some nonsingular $B$.
(C) $(X'X)^{-1} X' = (X'\Sigma^{-1}X)^{-1} X'\Sigma^{-1}$.
Example 3
\[
y_t = \beta_1 + \beta_2 t + \varepsilon_t, \qquad \varepsilon_t = \rho \varepsilon_{t-1} + u_t, \quad u_t \sim \mathrm{iid}\big(0, \sigma^2\big), \quad |\rho| < 1.
\]
\[
\Sigma = \frac{\sigma^2}{1 - \rho^2}
\begin{pmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & & \ddots & \vdots \\
\rho^{n-1} & & \cdots & 1
\end{pmatrix}.
\]
Then
\[
\Sigma X \approx XA \quad \text{(this is an exercise problem)}
\]
for some nonsingular $A$. Thus, OLS and GLS for this model are asymptotically equivalent.
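A quick numerical check of this near-equivalence for the trend model, with hypothetical $\rho$ and sample size:

    import numpy as np

    rho, n = 0.7, 200                          # hypothetical values
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), t])
    Sigma = np.array([[rho ** abs(i - j) for j in range(n)]
                      for i in range(n)]) / (1 - rho**2)

    rng = np.random.default_rng(1)
    y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

    Si = np.linalg.inv(Sigma)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)
    b_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
    print(b_ols, b_gls)                        # nearly identical estimates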