Chapter 10
Generalized Least Squares Estimation

10.1 Model

\[
y = X\beta + \varepsilon, \qquad E[\varepsilon|X] = 0, \qquad E[\varepsilon\varepsilon'|X] = \sigma^2\Omega = \Sigma \quad (\Omega > 0).
\]

1. Heteroskedasticity
\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
w_{11} & & & 0 \\
& w_{22} & & \\
& & \ddots & \\
0 & & & w_{nn}
\end{pmatrix}
=
\begin{pmatrix}
\sigma_1^2 & & 0 \\
& \ddots & \\
0 & & \sigma_n^2
\end{pmatrix}
\]

2. Autocorrelation
\[
\sigma^2\Omega = \sigma^2
\begin{pmatrix}
1 & \rho_1 & \cdots & \rho_{n-1} \\
\rho_1 & 1 & \cdots & \rho_{n-2} \\
\vdots & & \ddots & \vdots \\
\rho_{n-1} & \cdots & \cdots & 1
\end{pmatrix}
\]

10.2 OLS and IV estimation

• OLS estimation

The OLS estimator can be written as
\[
b = \beta + (X'X)^{-1}X'\varepsilon.
\]

1. Unbiasedness
\[
E[b] = E_X[E[b|X]] = \beta.
\]

2. Variance–covariance matrix
\[
\operatorname{Var}[b|X] = E\left[(b-\beta)(b-\beta)'|X\right]
= E\left[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}|X\right]
= (X'X)^{-1}X'\left(\sigma^2\Omega\right)X(X'X)^{-1}.
\]
The unconditional variance is $E_X[\operatorname{Var}[b|X]]$. If $\varepsilon$ is normally distributed,
\[
b|X \sim N\left(\beta,\ \sigma^2(X'X)^{-1}X'\Omega X(X'X)^{-1}\right).
\]

3. Consistency

Suppose that
\[
\frac{X'X}{n} \xrightarrow{p} Q > 0, \qquad \frac{X'\Omega X}{n} \xrightarrow{p} P > 0.
\]
Then
\[
\operatorname{Var}[b|X] = \frac{1}{n}\left(\frac{X'X}{n}\right)^{-1}\sigma^2\,\frac{X'\Omega X}{n}\left(\frac{X'X}{n}\right)^{-1} \xrightarrow{p} 0
\]
and $\operatorname{Var}[b] \to 0$. Using this and Chebyshev's inequality, we have for any $\alpha \in \mathbb{R}^k \setminus \{0\}$ and $\varepsilon > 0$
\[
P\left[|\alpha'(b-\beta)| > \varepsilon\right] \le \frac{\alpha' E(b-\beta)(b-\beta)'\alpha}{\varepsilon^2} = \frac{\alpha'\operatorname{Var}(b)\alpha}{\varepsilon^2} \to 0 \quad \text{as } n \to \infty,
\]
which implies $b \xrightarrow{p} \beta$.

4. Asymptotic distribution of $b$

Assume $(X_i, \varepsilon_i)$ is a sequence of independent observations with
\[
E(\varepsilon\varepsilon') = \operatorname{diag}\left(\sigma_1^2, \cdots, \sigma_n^2\right) = \Sigma.
\]
In addition, assume for any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$
\[
E|\lambda' X_i \varepsilon_i|^{2+\delta} \le B \quad \text{for all } i.
\]
Then we can apply the CLT for a sequence of independent random variables, which gives
\[
\frac{\sum \lambda' X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d} N\left(0,\ \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} E\left(\lambda' X_i \varepsilon_i^2 X_i'\lambda\right)\right).
\]
But
\[
\frac{1}{n}\sum E\left(\lambda' X_i \varepsilon_i^2 X_i'\lambda\right)
= \frac{1}{n}\sum E\left[E\left(\lambda' X_i \varepsilon_i^2 X_i'\lambda\,|\,X\right)\right]
= \frac{1}{n}\sum \sigma_i^2\,\lambda' E(X_i X_i')\lambda
= \frac{1}{n}\lambda'\sum \sigma_i^2 E(X_i X_i')\lambda
= \operatorname{plim}\frac{1}{n}\lambda'\left(\sum \sigma_i^2 X_i X_i'\right)\lambda
= \operatorname{plim}\frac{1}{n}\lambda'(X'\Sigma X)\lambda.
\]
Thus
\[
\frac{\sum X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d} N(0, P), \qquad \text{where } P = \operatorname{plim}\frac{1}{n}X'\Sigma X,
\]
and we obtain
\[
\sqrt{n}(b - \beta) \xrightarrow{d} N\left(0,\ Q^{-1}PQ^{-1}\right).
\]
When the $\varepsilon_i$ are serially correlated, we need a different set of conditions and a different CLT to derive the asymptotic normality result. See White's "Asymptotic Theory for Econometricians" for this.

• IV estimation

Assume
\[
\frac{Z'Z}{n} \xrightarrow{p} Q_{ZZ}\ (>0), \qquad
\frac{Z'X}{n} \xrightarrow{p} Q_{ZX}\ (\ne 0), \qquad
\frac{X'X}{n} \xrightarrow{p} Q_{XX}\ (>0), \qquad
\frac{Z'\Sigma Z}{n} \xrightarrow{p} Q_{Z\Sigma Z},
\]
that $(Z_i', \varepsilon_i)'$ is a sequence of independent random vectors with $E(\varepsilon\varepsilon'|X) = \operatorname{diag}(\sigma_1^2, \cdots, \sigma_n^2) = \Sigma$, and that $E|\lambda' Z_i \varepsilon_i|^{2+\delta} \le B$ for all $i$, for any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$. Then, letting
\[
Q_{XXZ} = \left(Q_{XZ} Q_{ZZ}^{-1} Q_{ZX}\right)^{-1} Q_{XZ} Q_{ZZ}^{-1},
\]
\[
\sqrt{n}\left(b_{IV} - \beta\right) \xrightarrow{d} N\left(0,\ Q_{XXZ}\, Q_{Z\Sigma Z}\, Q_{XXZ}'\right).
\]
When the $\varepsilon_i$ are serially correlated, as before, we need more assumptions and a different CLT.

10.3 Robust estimation of asymptotic covariance matrices

We can still use OLS for inference if its variance–covariance matrix
\[
(X'X)^{-1}X'\Sigma X(X'X)^{-1}
\]
can be estimated. Suppose that $\Sigma = \operatorname{diag}\left(\sigma_1^2, \cdots, \sigma_n^2\right)$. Obviously, the $n$ parameters $\sigma_1^2, \cdots, \sigma_n^2$ cannot all be estimated. But what we need is an estimate of $X'\Sigma X$, not of $\Sigma$ itself. We may write
\[
\frac{1}{n}X'\Sigma X = \frac{1}{n}\sum \sigma_i^2 X_i X_i'.
\]
This and
\[
\frac{1}{n}\sum_{i=1}^{n} \varepsilon_i^2 X_i X_i'
\]
have the same probability limit by the LLN. We replace $\varepsilon_i^2$ with the squared OLS residual $e_i^2$ and then have
\[
\frac{1}{n}\sum e_i^2 X_i X_i'
= \frac{1}{n}\sum \left[\varepsilon_i - X_i'\left(\hat\beta - \beta\right)\right]^2 X_i X_i'
= \frac{1}{n}\sum \varepsilon_i^2 X_i X_i' + o_p(1).
\]
(See White (1980, Econometrica) for details.) Thus $\frac{1}{n}\sum e_i^2 X_i X_i'$ consistently estimates $\frac{1}{n}X'\Sigma X$, and the estimated asymptotic variance–covariance matrix of $b$ is
\[
\left(\frac{1}{n}X'X\right)^{-1}\left(\frac{1}{n}\sum e_i^2 X_i X_i'\right)\left(\frac{1}{n}X'X\right)^{-1} \xrightarrow{p} Q^{-1}PQ^{-1}.
\]
We can use this result for hypothesis testing. Suppose that the null hypothesis is $H_0: R\beta = r$. Then the Wald test is defined by
\[
W = (Rb - r)'\left[R(X'X)^{-1}\left(\sum e_i^2 X_i X_i'\right)(X'X)^{-1}R'\right]^{-1}(Rb - r)
\]
(the heteroskedasticity-robust Wald test), and as $n \to \infty$
\[
W \xrightarrow{d} \chi^2(J), \qquad J = \operatorname{rank}(R).
\]
This follows because
\[
W = \sqrt{n}(Rb - r)'\left[R\left(\frac{X'X}{n}\right)^{-1}\left(\frac{1}{n}\sum e_i^2 X_i X_i'\right)\left(\frac{X'X}{n}\right)^{-1}R'\right]^{-1}\sqrt{n}(Rb - r)
\xrightarrow{d} N(0, I_J)'\,N(0, I_J) = \chi^2(J).
\]
If the null hypothesis is $H_0: \beta_k = \beta_k^0$, use the $t$-ratio
\[
t = \frac{b_k - \beta_k^0}{\sqrt{V_{kk}}}, \qquad \text{where } V = (X'X)^{-1}\left(\sum e_i^2 X_i X_i'\right)(X'X)^{-1}.
\]
As $n \to \infty$,
\[
t \xrightarrow{d} N(0, 1).
\]
This is White's heteroskedasticity-robust $t$-ratio.
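To make the sandwich formula concrete, here is a minimal numpy sketch of White's robust covariance estimator and the resulting robust $t$-ratios. The function name, the simulated data-generating process, and all parameter values are illustrative assumptions, not part of the notes above.

```python
import numpy as np

def white_robust_cov(X, y):
    """OLS with White's heteroskedasticity-robust (HC0) covariance.

    Returns the OLS estimate b and the estimated variance-covariance
    matrix (X'X)^{-1} (sum_i e_i^2 x_i x_i') (X'X)^{-1}.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                 # OLS estimator
    e = y - X @ b                         # OLS residuals
    meat = (X * e[:, None] ** 2).T @ X    # sum_i e_i^2 x_i x_i'
    V = XtX_inv @ meat @ XtX_inv          # sandwich estimator
    return b, V

# Simulated heteroskedastic data (illustrative only)
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
eps = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))  # Var(eps_i) varies with x_i
y = X @ beta + eps

b, V = white_robust_cov(X, y)
t_ratios = b / np.sqrt(np.diag(V))   # robust t-ratios for H0: beta_k = 0
print(b, t_ratios)
```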
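Similarly, a sketch of the heteroskedasticity-robust Wald test for $H_0: R\beta = r$. It reuses the hypothetical `white_robust_cov` output `b, V` from the previous sketch; the tested restriction is an arbitrary illustration.

```python
import numpy as np
from scipy import stats

def robust_wald(b, V, R, r):
    """Robust Wald statistic W = (Rb-r)'[R V R']^{-1}(Rb-r).

    Under H0: R beta = r, W is asymptotically chi^2(J), J = rank(R).
    """
    d = R @ b - r
    W = d @ np.linalg.solve(R @ V @ R.T, d)
    J = np.linalg.matrix_rank(R)
    p_value = stats.chi2.sf(W, J)
    return W, p_value

# Illustrative restriction: H0: beta_2 = 2 in the simulation above
R = np.array([[0.0, 1.0]])
r = np.array([2.0])
W, p = robust_wald(b, V, R, r)
print(W, p)
```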
10.4 GLS

Since $\Omega > 0$, it can be factored as
\[
\Omega = C\Lambda C',
\]
where the columns of $C$ are the characteristic vectors of $\Omega$ and the characteristic roots of $\Omega$ are put in the diagonal matrix $\Lambda$. Let $P' = C\Lambda^{-1/2}$. Then
\[
\Omega^{-1} = C'^{-1}\Lambda^{-1}C^{-1} = C\Lambda^{-1}C' = C\Lambda^{-1/2}\Lambda^{-1/2}C' = P'P,
\]
since $C' = C^{-1}$. Premultiplying the linear regression model by $P$, we obtain
\[
Py = PX\beta + P\varepsilon \qquad \text{or} \qquad y_* = X_*\beta + \varepsilon_*.
\]
Hence
\[
E(\varepsilon_*\varepsilon_*') = P E(\varepsilon\varepsilon')P' = \sigma^2 P\Omega P'
= \sigma^2 \Lambda^{-1/2}C'C\Lambda C'C\Lambda^{-1/2} = \sigma^2 I.
\]
The transformed model satisfies the conditions of the classical linear regression model. Hence
\[
\hat\beta_{GLS} = (X_*'X_*)^{-1}X_*'y_* = (X'P'PX)^{-1}X'P'Py = \left(X'\Omega^{-1}X\right)^{-1}X'\Omega^{-1}y.
\]
This estimator is called the generalized least squares estimator. (A numerical sketch of this factorization appears at the end of this section.)

The properties of the GLS estimator:

1. If $E[\varepsilon_*|X_*] = 0$, then $E\left[\hat\beta_{GLS}\right] = \beta$.
2. If $\frac{1}{n}X_*'X_* \xrightarrow{p} Q_*\ (>0)$, then $\hat\beta_{GLS} \xrightarrow{p} \beta$.
3. $\sqrt{n}\left(\hat\beta_{GLS} - \beta\right) \xrightarrow{d} N\left(0,\ \sigma^2 Q_*^{-1}\right)$.

The GLS estimator $\hat\beta_{GLS}$ is the BLUE.

Example 1
\[
y_t = \beta'X_t + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad u_t \sim iid\left(0, \sigma^2\right), \quad |\rho| < 1.
\]
\[
E\varepsilon\varepsilon' = \frac{\sigma^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & & \cdots & \rho^{n-2} \\
\vdots & & \ddots & & \vdots \\
\rho^{n-1} & & \cdots & & 1
\end{pmatrix}
= \sigma^2\Omega.
\]
\[
\Omega^{-1} =
\begin{pmatrix}
1 & -\rho & & & 0 \\
-\rho & 1+\rho^2 & -\rho & & \\
& \ddots & \ddots & \ddots & \\
& & -\rho & 1+\rho^2 & -\rho \\
0 & & & -\rho & 1
\end{pmatrix}
\]
The transformation matrix is
\[
P =
\begin{pmatrix}
\sqrt{1-\rho^2} & & & 0 \\
-\rho & 1 & & \\
& \ddots & \ddots & \\
0 & & -\rho & 1
\end{pmatrix}
\]
The transformed model is
\[
\sqrt{1-\rho^2}\,y_1 = \sqrt{1-\rho^2}\,X_1'\beta + \varepsilon_1^*, \qquad
y_t - \rho y_{t-1} = (X_t - \rho X_{t-1})'\beta + u_t, \quad t = 2, \cdots, n,
\]
where $\varepsilon_1^* = \sqrt{1-\rho^2}\,\varepsilon_1$. Since

1. $E\varepsilon_1^* = Eu_t = 0$,
2. $\operatorname{Var}(\varepsilon_1^*) = (1-\rho^2)\operatorname{Var}(\varepsilon_1) = (1-\rho^2)\cdot\frac{\sigma^2}{1-\rho^2} = \sigma^2$,
3. $E(\varepsilon_1^* u_t) = \sqrt{1-\rho^2}\,E(\varepsilon_1 u_t) = 0$, $t = 2, \cdots, n$,

the error terms of the transformed model satisfy the conditions of the standard linear regression model.

The GLS estimator $\hat\beta_{GLS}$ depends on the unknown parameters associated with $\Omega$ and, therefore, cannot be used in practice. Suppose that $\hat\Omega \xrightarrow{p} \Omega$. Then the feasible GLS estimator is defined by
\[
\hat\beta_{FG} = \left(X'\hat\Omega^{-1}X\right)^{-1}X'\hat\Omega^{-1}y.
\]
If
\[
\frac{1}{n}X'\Omega^{-1}X - \frac{1}{n}X'\hat\Omega^{-1}X \xrightarrow{p} 0
\]
and
\[
\frac{1}{n}X'\Omega^{-1}\varepsilon - \frac{1}{n}X'\hat\Omega^{-1}\varepsilon \xrightarrow{p} 0,
\]
then $\hat\beta_{GLS}$ and $\hat\beta_{FG}$ have the same asymptotic distribution.

Example 2 (AR(1) error)
\[
y_t = \beta'X_t + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad u_t \sim iid\left(0, \sigma^2\right), \quad |\rho| < 1.
\]
1. Run OLS and get the residuals $\hat\varepsilon_t$.
2. Run the AR(1) regression of $\hat\varepsilon_t$ on $\hat\varepsilon_{t-1}$. This gives $\hat\rho$.
3. Transform the model using $\hat\rho$ and run OLS (see the code sketch at the end of this section).
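The following numpy sketch checks the GLS algebra above: it factors $\Omega = C\Lambda C'$, forms $P = \Lambda^{-1/2}C'$, and verifies that the transformed-model estimator equals $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$. The particular $\Omega$ (an equicorrelation matrix), the data, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Illustrative positive definite Omega: equicorrelation structure
rho = 0.5
Omega = (1 - rho) * np.eye(n) + rho * np.ones((n, n))

# Factor Omega = C Lambda C' and form P = Lambda^{-1/2} C', so P'P = Omega^{-1}
lam, C = np.linalg.eigh(Omega)
P = np.diag(lam ** -0.5) @ C.T

X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
eps = C @ np.diag(lam ** 0.5) @ C.T @ rng.normal(size=n)  # Cov(eps) = Omega
y = X @ beta + eps

# GLS via the transformed model y* = X* beta + eps*
Xs, ys = P @ X, P @ y
b_transformed = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)

# GLS via the closed form (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
Oinv = np.linalg.inv(Omega)
b_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

assert np.allclose(b_transformed, b_gls)
print(b_gls)
```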
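Example 1's transformation matrix can be verified numerically as well. This sketch builds the AR(1) $\Omega$ and the Prais–Winsten-type $P$ from the example and checks $P\Omega P' = I$ and $P'P = \Omega^{-1}$; the function name and the values of $\rho$ and $n$ are illustrative.

```python
import numpy as np

def ar1_transform_P(rho, n):
    """Transformation matrix P for AR(1) errors, as in Example 1."""
    P = np.eye(n)
    P[0, 0] = np.sqrt(1 - rho ** 2)
    for t in range(1, n):
        P[t, t - 1] = -rho
    return P

rho, n = 0.7, 6
# Omega with E[eps eps'] = sigma^2 Omega: Omega_ij = rho^|i-j| / (1 - rho^2)
R = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Omega = R / (1 - rho ** 2)

P = ar1_transform_P(rho, n)
assert np.allclose(P @ Omega @ P.T, np.eye(n))      # P Omega P' = I
assert np.allclose(P.T @ P, np.linalg.inv(Omega))   # P'P = Omega^{-1}
```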
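Finally, a sketch of the three-step feasible GLS recipe of Example 2. The simulated data, the seed, and the true parameter values are illustrative; step 3 applies the rows of the Example 1 transformation directly rather than forming the matrix $P$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho_true = 500, 0.6
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])

# Generate AR(1) errors: eps_t = rho eps_{t-1} + u_t, stationary start
u = rng.normal(size=n)
eps = np.zeros(n)
eps[0] = u[0] / np.sqrt(1 - rho_true ** 2)
for t in range(1, n):
    eps[t] = rho_true * eps[t - 1] + u[t]
y = X @ beta + eps

# Step 1: run OLS and get residuals
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols

# Step 2: AR(1) regression of e_t on e_{t-1} gives rho_hat
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Step 3: transform the model with rho_hat and run OLS
ys, Xs = np.empty(n), np.empty_like(X)
ys[0] = np.sqrt(1 - rho_hat ** 2) * y[0]
Xs[0] = np.sqrt(1 - rho_hat ** 2) * X[0]
ys[1:] = y[1:] - rho_hat * y[:-1]
Xs[1:] = X[1:] - rho_hat * X[:-1]
b_fgls = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
print(rho_hat, b_fgls)
```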
10.5 Equivalence of GLS and OLS

Let $X'X$ and $\Sigma$ both be positive definite. Then the following statements are equivalent:

(A) $(X'X)^{-1}X'\Sigma X(X'X)^{-1} = \left(X'\Sigma^{-1}X\right)^{-1}$.
(B) $\Sigma X = XB$ for some nonsingular $B$.
(C) $(X'X)^{-1}X' = \left(X'\Sigma^{-1}X\right)^{-1}X'\Sigma^{-1}$.

Example 3
\[
y_t = \beta_1 + \beta_2 t + \varepsilon_t, \qquad \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad u_t \sim iid\left(0, \sigma^2\right), \quad |\rho| < 1.
\]
\[
\Sigma = \frac{\sigma^2}{1-\rho^2}
\begin{pmatrix}
1 & \rho & \cdots & \rho^{n-1} \\
\rho & 1 & \cdots & \rho^{n-2} \\
\vdots & & \ddots & \vdots \\
\rho^{n-1} & \cdots & & 1
\end{pmatrix}
\]
Then $\Sigma X \approx XA$ for some nonsingular $A$ (this is an exercise problem). Thus, OLS and GLS for this model are asymptotically equivalent.
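As a numerical illustration of the equivalence theorem, the sketch below constructs a case where statement (B) holds exactly, by taking the columns of $X$ to be eigenvectors of $\Sigma$, and then checks (A) and (C). The construction is an illustrative assumption, not the trend model of Example 3, where (B) holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 2

# Positive definite Sigma; X's columns are two of its eigenvectors,
# so Sigma X = X B holds exactly with nonsingular B (statement (B))
M = rng.normal(size=(n, n))
Sigma = M @ M.T + n * np.eye(n)
lam, C = np.linalg.eigh(Sigma)
X = C[:, :k]
B = np.diag(lam[:k])
assert np.allclose(Sigma @ X, X @ B)

Sinv = np.linalg.inv(Sigma)
XtX_inv = np.linalg.inv(X.T @ X)

# (A): OLS sandwich variance equals the GLS variance
lhs_A = XtX_inv @ X.T @ Sigma @ X @ XtX_inv
rhs_A = np.linalg.inv(X.T @ Sinv @ X)
assert np.allclose(lhs_A, rhs_A)

# (C): the OLS and GLS coefficient maps coincide
lhs_C = XtX_inv @ X.T
rhs_C = np.linalg.inv(X.T @ Sinv @ X) @ X.T @ Sinv
assert np.allclose(lhs_C, rhs_C)
print("equivalence verified for this Sigma and X")
```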