Chapter 4 Finite-Sample Properties of the LSE
Finite-sample theory: n is assumed to be fixed, and a normal distribution is assumed.
Large-sample theory: n is sent to ∞, and a general distribution is allowed.
4.1 Unbiasedness
Write
    b = (X'X)^{-1} X'y = (X'X)^{-1} X'(Xβ + ε)
      = β + (X'X)^{-1} X'ε.
Then
    E(b|X) = β + E[(X'X)^{-1} X'ε | X]
           = β + (X'X)^{-1} X'E(ε|X)
           = β.
Therefore
    E(b) = E_X{E[b|X]} = E_X[β] = β,
so the distribution of b is centered at the true parameter vector β.
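Unbiasedness can be illustrated numerically. The sketch below (with a made-up β, design matrix, and error distribution) averages the OLS estimate over many independent draws of ε; the Monte Carlo mean lands close to the true β:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 50, 3
beta = np.array([1.0, -2.0, 0.5])          # true parameter vector (made up)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# Draw R independent error vectors and compute b = (X'X)^{-1} X'y for each.
R = 20_000
E = rng.normal(size=(R, n))                # each row is one draw of ε ~ N(0, I)
Y = X @ beta + E                           # each row is one draw of y
B = np.linalg.solve(X.T @ X, X.T @ Y.T).T  # each row is one draw of b

b_bar = B.mean(axis=0)                     # Monte Carlo estimate of E(b)
print(np.round(b_bar, 2))
```

The conditional argument above holds the same X across draws, which is why only ε is redrawn here.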
4.2 The variance of the LSE and the Gauss-Markov theorem
The OLS estimator of β is
    b = (X'X)^{-1} X'y.
(X'X)^{-1} X' is a K × n matrix. Thus each element of b can be written as a linear
combination of y_1, ..., y_n. We call b a linear estimator for this reason.
The covariance matrix of b is
    Var(b|X) = E[(b - β)(b - β)' | X]
             = E[(X'X)^{-1} X'εε'X (X'X)^{-1} | X]
             = (X'X)^{-1} X'E(εε'|X) X (X'X)^{-1}
             = (X'X)^{-1} X'(σ²I) X (X'X)^{-1}
             = σ² (X'X)^{-1}.
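This formula is easy to check by simulation (a minimal sketch with a made-up design matrix and σ²): the sample covariance of b across repeated error draws should match σ²(X'X)^{-1}.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 40, 2
sigma = 0.8                                # made-up error standard deviation
beta = np.array([0.5, 1.0])                # made-up true parameter
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
var_b = sigma**2 * XtX_inv                 # theoretical Var(b|X) = σ²(X'X)^{-1}

# Empirical covariance of b over repeated error draws, holding X fixed.
R = 40_000
E = sigma * rng.normal(size=(R, n))
B = (XtX_inv @ X.T @ (X @ beta + E).T).T   # each row is one draw of b
cov_hat = np.cov(B, rowvar=False)

print(np.round(var_b, 4))
print(np.round(cov_hat, 4))
```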
Consider an arbitrary linear estimator of β, b_0 = Cy, where C is a K × n matrix. For
b_0 to be unbiased, we should have
    E(Cy|X) = E(CXβ + Cε|X) = CXβ = β.
For this to hold for every β, we need CX = I.
The covariance matrix of b_0 is
    Var[b_0|X] = σ²CC'.
Now let D = C - (X'X)^{-1} X'. Since CX = I,
    DX = CX - (X'X)^{-1} X'X = CX - I = 0.
Using this gives
    Var[b_0|X] = σ² (D + (X'X)^{-1} X')(D + (X'X)^{-1} X')'
               = σ² (X'X)^{-1} + σ²DD'
               = Var[b|X] + σ²DD',
where the cross terms vanish because DX = 0.
Since DD' is a nonnegative definite matrix,
    Var[b_0|X] ≥ Var[b|X].   (*)
That is, for any vector a,
    a'Var[b_0|X]a ≥ a'Var[b|X]a.
That is the Gauss-Markov theorem given X. Since (*) holds for every particular X,
    Var(b_0) ≥ Var(b).
This is the unconditional version of the Gauss-Markov theorem.
Note that
    Var[b] = E_X[Var(b|X)] + Var_X[E(b|X)]
           = E_X[Var(b|X)]          (since E(b|X) = β does not vary with X)
           = E_X[σ² (X'X)^{-1}]
           = σ² E_X[(X'X)^{-1}].
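The Gauss-Markov comparison can also be checked in code. The sketch below (made-up design matrix and σ²) builds an alternative set of unbiased weights C = (X'X)^{-1}X' + D with DX = 0, then verifies that Var[b_0|X] - Var[b|X] = σ²DD' is nonnegative definite:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 30, 3
sigma2 = 2.0                               # made-up error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
W = XtX_inv @ X.T                          # least-squares weights (X'X)^{-1} X'
M = np.eye(n) - X @ W                      # residual maker, MX = 0

D = rng.normal(size=(K, n)) @ M            # an arbitrary D with DX = 0
C = W + D                                  # weights of another linear unbiased estimator

var_b  = sigma2 * XtX_inv                  # Var[b|X]
var_b0 = sigma2 * C @ C.T                  # Var[b0|X] = σ² C C'
diff = var_b0 - var_b                      # should equal σ² D D'

print(np.linalg.eigvalsh(diff).min() >= -1e-8)   # nonnegative definite
```

Any choice of D with DX = 0 works here; multiplying arbitrary rows by M is just one convenient way to construct such a D.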
4.3 Estimating the variance of the least squares estimator
Since σ² = E(ε_i²), a natural estimator of σ² is
    σ̂² = (1/n) Σ_{i=1}^n e_i² = (1/n) e'e.
But this estimator is biased, as we now show. Since MX = 0,
    e = My = M(Xβ + ε) = Mε,
and, because M is symmetric and idempotent,
    e'e = ε'Mε.
Thus
    E[e'e|X] = E[ε'Mε|X]
             = E[tr(ε'Mε)|X]
             = E[tr(Mεε')|X]
             = tr(M E(εε'|X))
             = tr(M σ²I)
             = σ² tr(M).
But
    tr(M) = tr[I_n - X(X'X)^{-1} X']
          = tr(I_n) - tr((X'X)^{-1} X'X)
          = tr(I_n) - tr(I_K)
          = n - K.
Therefore,
    E[e'e|X] = (n - K)σ²,
and an unbiased estimator of σ² is
    s² = e'e / (n - K),
not σ̂².
The estimator s² is also unbiased unconditionally, because
    E[s²] = E_X{E[s²|X]} = E_X(σ²) = σ².
Using s², we obtain an estimator of Var[b|X]:
    V̂ar[b|X] = s² (X'X)^{-1}.
The standard error of the estimator b_k is
    √([s² (X'X)^{-1}]_{kk}).
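These formulas translate directly into code. A minimal sketch on made-up data: tr(M) equals n - K exactly, and s² = e'e/(n - K) feeds straight into the estimated standard errors.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)   # made-up model

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T          # residual maker
print(np.isclose(np.trace(M), n - K))      # tr(M) = n - K

b = XtX_inv @ X.T @ y
e = y - X @ b                              # residuals, equivalently e = My
s2 = (e @ e) / (n - K)                     # unbiased estimator of σ²
se = np.sqrt(s2 * np.diag(XtX_inv))        # standard errors of the b_k
print(np.round(se, 3))
```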
4.4 Inference under a normality assumption
(i) t-test
Assume ε ~ N(0, σ²I). Then
    b|X = β + (X'X)^{-1} X'ε | X
        ~ N(β, σ² (X'X)^{-1} X'X (X'X)^{-1})
        = N(β, σ² (X'X)^{-1}).
Recall that
    Aε ~ N(0, A(σ²I)A').
Each element of b|X is normally distributed:
    b_k|X ~ N(β_k, σ² [(X'X)^{-1}]_{kk}).
Consider the null hypothesis
    H_0: β_k = β_k^0.
The t-statistic for this null hypothesis is
    t_k = (b_k - β_k^0) / √(s² [(X'X)^{-1}]_{kk}).
Under the normality assumption and H_0,
    (b_k - β_k^0) / √(σ² [(X'X)^{-1}]_{kk}) ~ N(0, 1).
In addition,
    (n - K)s²/σ² = e'e/σ² = (ε/σ)' M (ε/σ) ~ χ²_{tr(M)} = χ²_{n-K}.
Furthermore,
    (b - β)/σ = (X'X)^{-1} X'(ε/σ)
is independent of (n - K)s²/σ².
This follows because
    Cov(e, b|X) = E(e(b - β)'|X)
                = E[(I - P)εε'X(X'X)^{-1} | X]
                = σ² (I - P)X(X'X)^{-1}
                = 0,
which implies Cov(e, b) = 0 and, since e and b are jointly normal given X, independence
of e and b; the result then follows because s² is a function of e. (See also Theorem B-12.)
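The algebraic fact driving this result is (I - P)X = 0, which makes the covariance matrix above vanish. This is easy to confirm numerically (design matrix and σ² made up):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K = 25, 3
sigma2 = 1.5                               # made-up error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T                      # projection onto the column space of X
I_minus_P = np.eye(n) - P                  # residual maker M, so e = Mε

# Cov(e, b|X) = σ²(I - P)X(X'X)^{-1}, which vanishes because (I - P)X = 0.
cov_eb = sigma2 * I_minus_P @ X @ XtX_inv
print(np.abs(cov_eb).max() < 1e-10)
```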
Therefore,
    t_k = [(b_k - β_k^0) / √(σ² [(X'X)^{-1}]_{kk})] / √{[(n - K)s²/σ²] / (n - K)}
        = (b_k - β_k^0) / √(s² [(X'X)^{-1}]_{kk})
has Student's t-distribution with n - K degrees of freedom.
Recall that
    N(0, 1) / √(χ²_k / k) ~ t_k
when the N(0, 1) and χ²_k variables are independent.
We deduce from the distribution of t_k that
    P(b_k - t_{α/2} s_{b_k} ≤ β_k ≤ b_k + t_{α/2} s_{b_k}) = 1 - α,
where s_{b_k} = √(s² [(X'X)^{-1}]_{kk}) and t_{α/2} is the critical value from the
t-distribution with (n - K) degrees of freedom. The 100(1 - α)% confidence interval for β_k is
    [b_k - t_{α/2} s_{b_k}, b_k + t_{α/2} s_{b_k}].
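On made-up data, the t-statistic and confidence interval come directly from these formulas; scipy supplies the t critical value. The hypothesis tested below (β_1 = 0) and all data are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, K = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)    # made-up model

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)

k = 1                                      # test H0: β_1 = 0
s_bk = np.sqrt(s2 * XtX_inv[k, k])         # standard error of b_k
t_k = b[k] / s_bk                          # t-statistic (β_k^0 = 0)

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - K)             # t_{α/2} critical value
ci = (b[k] - t_crit * s_bk, b[k] + t_crit * s_bk)         # 95% CI for β_1
print(round(t_k, 2), (round(ci[0], 2), round(ci[1], 2)))
```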
(ii) F-test
Consider the null hypothesis
    H_0: Rβ = r,
where the J × K matrix R has full row rank. The F-statistic for this null is
    F = (Rb - r)' [R(X'X)^{-1} R']^{-1} (Rb - r) / (J s²).
The null distribution of F is F(J, n - K).
Example 1 Let K = 2 and H_0: β_1 - β_2 = 0.
Taking R = (1, -1) and r = 0, we have
    F = (b_1 - b_2) [(1, -1)(X'X)^{-1}(1, -1)']^{-1} (b_1 - b_2) / s²
      ~ F(1, n - 2).
The null distribution follows because, under H_0,
1.
    Rb - r ~ N(0, σ² R(X'X)^{-1} R'),
or
    [R(X'X)^{-1} R']^{-1/2} (Rb - r) ~ N(0, σ²I);
2.
    (n - K)s²/σ² ~ χ²_{n-K};
3. Rb and s² are independent.
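For a single restriction (J = 1), the F-statistic equals the square of the corresponding t-statistic. The sketch below checks this for the example's restriction β_1 - β_2 = 0 on made-up data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 50, 2
X = rng.normal(size=(n, K))
y = X @ np.array([1.0, 1.2]) + rng.normal(size=n)         # made-up model

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)

R = np.array([[1.0, -1.0]])                # H0: β_1 - β_2 = 0, so J = 1
q = R @ b                                  # Rb - r with r = 0
F = (q @ np.linalg.inv(R @ XtX_inv @ R.T) @ q) / (1 * s2)

# For a single restriction, F equals the square of the t-statistic for Rb = r.
t_stat = q[0] / np.sqrt(s2 * (R @ XtX_inv @ R.T)[0, 0])
print(np.isclose(F, t_stat**2))
```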