Chapter 4  Finite-Sample Properties of the LSE

Finite-sample theory: n is assumed to be fixed, and a normal distribution is assumed.
Large-sample theory: n is sent to ∞, and a general distribution is allowed.

4.1 Unbiasedness

Write

    b = (X'X)^{-1} X'y = (X'X)^{-1} X'(Xβ + ε) = β + (X'X)^{-1} X'ε.

Then

    E(b|X) = β + E[(X'X)^{-1} X'ε | X] = β + (X'X)^{-1} X' E(ε|X) = β.

Therefore

    E(b) = E_X{E[b|X]} = E_X[β] = β,

so the distribution of the vector b is centered at the true parameter β.

4.2 The variance of the LSE and the Gauss-Markov theorem

The OLS estimator of β is b = (X'X)^{-1} X'y. Since (X'X)^{-1} X' is a K × n matrix, each element of b can be written as a linear combination of y_1, ..., y_n. We call b a linear estimator for this reason. The covariance matrix of b is

    Var(b|X) = E[(b - β)(b - β)' | X]
             = E[(X'X)^{-1} X' εε' X (X'X)^{-1} | X]
             = (X'X)^{-1} X' E(εε'|X) X (X'X)^{-1}
             = (X'X)^{-1} X' (σ²I) X (X'X)^{-1}
             = σ² (X'X)^{-1}.

Consider an arbitrary linear estimator of β, b_0 = Cy, where C is a K × n matrix. For b_0 to be unbiased, we should have

    E(Cy|X) = E(CXβ + Cε | X) = β.

For this to hold, CX = I. The covariance matrix of b_0 is Var[b_0|X] = σ² CC'. Now let D = C - (X'X)^{-1} X'. Since CX = I,

    DX = CX - (X'X)^{-1} X'X = CX - I = 0.

Using this gives

    Var[b_0|X] = σ² (D + (X'X)^{-1} X')(D + (X'X)^{-1} X')'
               = σ² (X'X)^{-1} + σ² DD'
               = Var[b|X] + σ² DD'.

Since DD' is a nonnegative definite matrix,

    Var[b_0|X] ≥ Var[b|X];    (*)

that is, for any vector a, a' Var[b_0|X] a ≥ a' Var[b|X] a. This is the Gauss-Markov theorem conditional on X. Since (*) holds for every particular X, Var(b_0) ≥ Var(b). This is the unconditional version of the Gauss-Markov theorem.
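The two results above — E(b|X) = β and Var(b|X) = σ²(X'X)^{-1} — can be checked numerically. The following is an illustrative sketch (the simulation setup and all names are ours, not from the text): holding the design X fixed, we draw ε ~ N(0, σ²I) repeatedly and compare the Monte Carlo mean and covariance of b with β and σ²(X'X)^{-1}.

```python
import numpy as np

# Toy simulation (our own setup): y = X beta + eps with eps ~ N(0, sigma^2 I),
# design X held fixed across replications.
rng = np.random.default_rng(0)
n, K, sigma = 100, 3, 2.0
beta = np.array([1.0, -0.5, 0.25])
X = rng.normal(size=(n, K))           # fixed design matrix
XtX_inv = np.linalg.inv(X.T @ X)

reps = 20000
bs = np.empty((reps, K))
for r in range(reps):
    eps = sigma * rng.normal(size=n)
    y = X @ beta + eps
    bs[r] = XtX_inv @ X.T @ y         # b = (X'X)^{-1} X'y

print(bs.mean(axis=0))                # close to beta (unbiasedness)
print(np.cov(bs.T))                   # close to sigma^2 (X'X)^{-1}
print(sigma**2 * XtX_inv)
```

Because X is fixed across replications, this checks the conditional statements E(b|X) = β and Var(b|X) = σ²(X'X)^{-1} directly.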
Note that

    Var[b] = E_X[Var(b|X)] + Var_X[E(b|X)] = E_X[Var(b|X)] = E_X[σ² (X'X)^{-1}] = σ² E_X[(X'X)^{-1}].

4.3 Estimating the variance of the least squares estimator

Since σ² = E(ε_i²), a natural estimator of σ² is

    σ̂² = (1/n) Σ_{i=1}^{n} e_i² = (1/n) e'e.

But this estimator is biased, as we now show. Since e = My = M(Xβ + ε) = Mε, we have e'e = ε'Mε. Thus

    E[e'e|X] = E[ε'Mε|X] = E[tr(ε'Mε)|X] = E[tr(Mεε')|X] = tr(M E(εε'|X)) = tr(M σ²I) = σ² tr(M).

But

    tr(M) = tr[I_n - X(X'X)^{-1} X'] = tr(I_n) - tr((X'X)^{-1} X'X) = tr(I_n) - tr(I_K) = n - K.

Therefore E[e'e|X] = (n - K)σ², and an unbiased estimator of σ² is

    s² = e'e / (n - K),

not σ̂². The estimator s² is also unbiased unconditionally, because

    E[s²] = E_X{E[s²|X]} = E_X(σ²) = σ².

Using s², we obtain an estimator of Var[b|X]:

    V̂ar[b|X] = s² (X'X)^{-1}.

The standard error of the estimator b_k is √([s² (X'X)^{-1}]_{kk}).

4.4 Inference under a normality assumption

(i) t-test

Assume ε ~ N(0, σ²I). Then

    b|X = β + (X'X)^{-1} X'ε | X ~ N(β, σ² (X'X)^{-1} X'X (X'X)^{-1}) = N(β, σ² (X'X)^{-1}).

Recall that Aε ~ N(0, A(σ²I)A'). Each element of b|X is normally distributed:

    b_k|X ~ N(β_k, σ² [(X'X)^{-1}]_{kk}).

Consider the null hypothesis H_0 : β_k = β_k^0. The t-test statistic for this null hypothesis is defined by

    t_k = (b_k - β_k^0) / √(s² [(X'X)^{-1}]_{kk}).

Under the normality assumption,

    (b_k - β_k^0) / √(σ² [(X'X)^{-1}]_{kk}) ~ N(0, 1).
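The bias of σ̂² = e'e/n and the unbiasedness of s² = e'e/(n-K) can also be seen in a short simulation. This is a sketch under our own toy setup (not from the text): averaging e'e over many samples should give (n-K)σ², so dividing by n undershoots σ² while dividing by n-K does not.

```python
import numpy as np

# Toy setup (ours): since e = M eps with M the residual maker,
# E[e'e] = (n-K) sigma^2, so e'e/n is biased and e'e/(n-K) is unbiased.
rng = np.random.default_rng(1)
n, K, sigma2 = 30, 4, 1.5
X = rng.normal(size=(n, K))
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - P                     # residual maker: e = My = M eps

reps = 50000
sse = np.empty(reps)
for r in range(reps):
    eps = np.sqrt(sigma2) * rng.normal(size=n)
    e = M @ eps
    sse[r] = e @ e                    # e'e = eps' M eps

print(sse.mean() / n)                 # close to sigma^2 (n-K)/n: biased down
print(sse.mean() / (n - K))           # close to sigma^2: unbiased
```

With n = 30 and K = 4 the bias factor (n-K)/n = 26/30 is large enough to be visible; it vanishes as n grows, which is why the distinction matters mainly in small samples.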
In addition,

    (n - K)s²/σ² = e'e/σ² = (ε/σ)' M (ε/σ) ~ χ²_{tr(M)} = χ²_{n-K}.

Furthermore, b - β = (X'X)^{-1} X'ε is independent of (n - K)s²/σ². This follows because

    Cov(e, b|X) = E(e(b - β)'|X) = E[(I - P)εε'X(X'X)^{-1}|X] = σ²(I - P)X(X'X)^{-1} = 0,

which implies Cov(e, b) = 0 and, under normality, independence of e and b; since s² is a function of e, s² and b are independent as well. (See also Theorem B-12.) Therefore,

    t_k = [(b_k - β_k^0)/√(σ² [(X'X)^{-1}]_{kk})] / √(((n - K)s²/σ²)/(n - K)) = (b_k - β_k^0)/√(s² [(X'X)^{-1}]_{kk})

has Student's t-distribution with n - K degrees of freedom. Recall that N(0,1)/√(χ²_k/k) ~ t_k when the N(0,1) and χ²_k variables are independent. We deduce from the distribution of t_k that

    P(b_k - t_{α/2} s_{b_k} ≤ β_k ≤ b_k + t_{α/2} s_{b_k}) = 1 - α,

where s_{b_k} = √(s² [(X'X)^{-1}]_{kk}) and t_{α/2} is the critical value from the t-distribution with n - K degrees of freedom. The 100(1 - α)% confidence interval for β_k is

    [b_k - t_{α/2} s_{b_k}, b_k + t_{α/2} s_{b_k}].

(ii) F-test

Consider the null hypothesis H_0 : Rβ = r, where the J × K matrix R has full row rank. The F-test statistic for this null is defined as

    F = (Rb - r)' [R(X'X)^{-1} R']^{-1} (Rb - r) / (J s²).

The null distribution of F is F(J, n - K).

Example 1. Let K = 2 and H_0 : β_1 - β_2 = 0. Taking R = (1  -1) and r = 0, we have

    F = (b_1 - b_2) [(1  -1)(X'X)^{-1}(1  -1)']^{-1} (b_1 - b_2) / s² ~ F(1, n - 2).

The null distribution follows because

1. Rb - r ~ N(0, σ² R(X'X)^{-1} R'), or equivalently [R(X'X)^{-1} R']^{-1/2} (Rb - r) ~ N(0, σ²I);
2. (n - K)s²/σ² ~ χ²_{n-K};
3. Rb and s² are independent.
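The t-distribution of t_k under the null can be illustrated by simulation. This is a sketch under an assumed toy design (our own choices of n, K, and seed): we repeatedly draw normal errors, form t_k for the first coefficient, and check that the empirical moments match Student's t with n-K degrees of freedom, whose mean is 0 and whose variance is (n-K)/((n-K)-2).

```python
import numpy as np

# Toy check (ours): under H0 and normal errors,
# t_k = (b_k - beta_k) / sqrt(s^2 [(X'X)^{-1}]_{kk}) ~ t_{n-K}.
rng = np.random.default_rng(2)
n, K = 20, 3
df = n - K                            # degrees of freedom, here 17
X = rng.normal(size=(n, K))           # fixed design
XtX_inv = np.linalg.inv(X.T @ X)
se_factor = np.sqrt(XtX_inv[0, 0])    # sqrt of [(X'X)^{-1}]_{00}

reps = 50000
t = np.empty(reps)
for r in range(reps):
    eps = rng.normal(size=n)          # true beta = 0, sigma = 1
    b = XtX_inv @ X.T @ eps           # b - beta
    e = eps - X @ b                   # residuals
    s2 = e @ e / df                   # unbiased variance estimator
    t[r] = b[0] / (np.sqrt(s2) * se_factor)

print(t.mean())                       # close to 0
print(t.var())                        # close to df/(df-2), heavier than N(0,1)
```

The empirical variance exceeding 1 reflects the heavier tails of t_{n-K} relative to N(0,1); the gap closes as n - K grows.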
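The F-statistic of Example 1 can be computed directly. The following sketch uses an assumed toy data set (all numbers and names are ours): with K = 2 and H_0 : β_1 - β_2 = 0 true in the simulated data, the statistic is a draw from F(1, n-2) and would be compared with the corresponding critical value.

```python
import numpy as np

# Toy computation (ours) of F = (Rb-r)'[R(X'X)^{-1}R']^{-1}(Rb-r) / (J s^2)
# for Example 1: K = 2, H0: beta_1 - beta_2 = 0, i.e. R = (1 -1), r = 0.
rng = np.random.default_rng(3)
n, K = 50, 2
beta = np.array([0.7, 0.7])           # H0 holds: beta_1 = beta_2
R = np.array([[1.0, -1.0]])           # J = 1 restriction
r = np.zeros(1)

X = rng.normal(size=(n, K))
y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # OLS estimate
e = y - X @ b
s2 = e @ e / (n - K)                  # unbiased variance estimator

J = R.shape[0]
diff = R @ b - r
F = (diff @ np.linalg.inv(R @ XtX_inv @ R.T) @ diff) / (J * s2)
print(F)                              # one draw from F(1, n-2) under H0
```

With J = 1 this F-statistic equals the square of the t-statistic for b_1 - b_2, which is a useful consistency check.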