Chapter 5  Large-sample properties of the LSE

5.1 Stochastic convergence

Suppose that $\{X_n\}$ is a sequence of random variables with a corresponding sequence of distribution functions $\{F_n\}$. If $F_n(x) \to F(x)$ at every continuity point $x$ of $F$, $F_n$ is said to converge weakly to $F$, written $F_n \Rightarrow F$. In this case, $\{X_n\}$ is said to converge in distribution to $X$, where $X$ is a random variable with distribution function $F$, written $X_n \xrightarrow{d} X$.

If $X$ is a random variable and, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1,$$
then $X_n$ is said to converge in probability to $X$, written $X_n \xrightarrow{P} X$. $X$ is known as the probability limit of $X_n$, written $X = \operatorname{plim} X_n$.

If $\lim_{n\to\infty} E(X_n - X)^2 = 0$, $X_n$ is said to converge in mean square to $X$, written $X_n \xrightarrow{m.s.} X$.

Some useful results regarding stochastic convergence are:

1. $X_n \xrightarrow{P} X$ and $g(\cdot)$ is a continuous function $\Rightarrow g(X_n) \xrightarrow{P} g(X)$.

Example 1  Let
$$X_n = \begin{cases} 1 & \text{with probability } 1/n, \\ 0 & \text{with probability } 1 - 1/n. \end{cases}$$
Obviously, $X_n \xrightarrow{P} 0$. Let $g(x) = x + 1$. Then $g(X_n) \xrightarrow{P} g(0) = 1$.

2. Suppose that $Y_n \xrightarrow{d} Y$ and $X_n \xrightarrow{P} c$ (a constant). Then
(a) $X_n + Y_n \xrightarrow{d} c + Y$;
(b) $X_n Y_n \xrightarrow{d} cY$;
(c) $Y_n / X_n \xrightarrow{d} Y/c$ when $c \neq 0$.

3. $X_n \xrightarrow{d} X$ and $g(\cdot)$ continuous $\Rightarrow g(X_n) \xrightarrow{d} g(X)$. (This is called the continuous mapping theorem.)

Example 2  If $X_n \xrightarrow{d} N(0,1)$, then $X_n^2 \xrightarrow{d} \chi^2(1)$.

4. $X_n - Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{d} X$ $\Rightarrow Y_n \xrightarrow{d} X$.

5. $X_n \xrightarrow{P} X$ implies $X_n \xrightarrow{d} X$. (The converse is not necessarily true.)

6. $X_n \xrightarrow{d} c$ (a constant) $\Rightarrow X_n \xrightarrow{P} c$.

7. $X_n \xrightarrow{m.s.} X$ $\Rightarrow X_n \xrightarrow{P} X$.

If for any $\varepsilon > 0$ there exists $B_\varepsilon < \infty$ such that
$$P\left( \frac{|X_n|}{n^r} > B_\varepsilon \right) < \varepsilon \quad \text{for all } n \geq 1,$$
we write $X_n = O_p(n^r)$ ($X_n/n^r$ is stochastically bounded). If $\operatorname{plim} X_n/n^r = 0$, we write $X_n = o_p(n^r)$.

The weak law of large numbers

1. Let $\{X_i, i \geq 1\}$ be a sequence of i.i.d. r.v.'s with $E|X_1| < \infty$. Then
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} EX_1 \quad \text{as } n \to \infty.$$

2. Let $\{X_i, i \geq 1\}$ be a sequence of independent r.v.'s with $EX_i = m$. Suppose $E|X_i|^{1+\delta} \leq B < \infty$ ($\delta > 0$) for all $i$.
Then
$$\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{P} m \quad \text{as } n \to \infty.$$

Example 3  Let $\varepsilon_i \sim \text{iid}(0,\sigma^2)$. Then $\frac{1}{n}\sum_{i=1}^n \varepsilon_i \xrightarrow{P} E\varepsilon_1 = 0$.

The central limit theorem

1. Let $\{X_i, i \geq 1\}$ be a sequence of i.i.d. r.v.'s with $E(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2 \neq 0$. Then
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty.$$

2. Let $\{X_i, i \geq 1\}$ be a sequence of independent r.v.'s with mean $\mu_i$ and variance $\sigma_i^2$, and let $\bar{\sigma}_n^2 = \frac{1}{n}\sum_{i=1}^n \sigma_i^2$. If
$$\frac{\max_{1\leq i\leq n}\left[ E|X_i - \mu_i|^{2+\delta} \right]^{\frac{1}{2+\delta}}}{\bar{\sigma}_n} \leq B < \infty \quad (\delta > 0)$$
for all $n$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\bar{\sigma}_n \sqrt{n}} \xrightarrow{d} N(0,1).$$

Example 4  Let $X_i \sim \text{iid } B(1,p)$. Then $EX_1 = p$ and $\operatorname{Var}(X_1) = p(1-p)$. Thus,
$$\frac{\sum_{i=1}^n (X_i - p)}{\sqrt{p(1-p)}\sqrt{n}} \xrightarrow{d} N(0,1).$$

For vector sequences, we use the following result, known as the Cramér–Wold device. If $\{X_n\}$ is a sequence of random vectors, $X_n \xrightarrow{d} X$ iff $\lambda' X_n \xrightarrow{d} \lambda' X$ for any vector $\lambda$.

Example 5  Let $X_i$ ($m \times 1$) $\sim \text{iid}(0,\Sigma)$. Then $\frac{\sum X_i}{\sqrt{n}} \xrightarrow{d} N(0,\Sigma)$.

5.2 Consistency of b

Assume

1. $(X_i, \varepsilon_i)$ is a sequence of independent observations.
2. $\frac{1}{n}\sum_{i=1}^n X_i X_i' \left( = \frac{1}{n}X'X \right) \xrightarrow{P} Q = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^n E(X_i X_i')$ $(> 0)$.
3. For any $\lambda \in \mathbb{R}^k$ and $\delta > 0$, $E|\lambda' X_i \varepsilon_i|^{2+\delta} \leq B < \infty$ for all $i$.

The least squares estimator $b$ may be written as
$$b = \beta + \left( \frac{1}{n}\sum X_i X_i' \right)^{-1} \left( \frac{1}{n}\sum X_i \varepsilon_i \right).$$
Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{n}\sum \lambda' X_i \varepsilon_i = \frac{1}{n}\sum w_i.$$
Then $w_i$ is an independent sequence with $E(w_i) = E\left[ E(\lambda' X_i \varepsilon_i \mid X) \right] = 0$. In addition, $E(w_i^2) = E(\lambda' X_i \varepsilon_i)^2 \leq C < \infty$, which implies $E|w_i|^{1+\delta} \leq D < \infty$ for all $i$ by Lyapounov's inequality.

Lyapounov's inequality  For $0 < \alpha \leq \beta$,
$$\left( E|X|^\alpha \right)^{1/\alpha} \leq \left( E|X|^\beta \right)^{1/\beta}.$$

Thus, by the WLLN for an independent sequence, $\frac{1}{n}\sum w_i \xrightarrow{P} 0$. Since this holds for any $\lambda$,
$$\frac{1}{n}\sum_{i=1}^n X_i \varepsilon_i \xrightarrow{P} 0,$$
and we have $b \xrightarrow{P} \beta + Q^{-1} \cdot 0 = \beta$.
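The consistency result can be illustrated numerically. The following sketch is my own illustration, not part of the text: the regressor design, the true $\beta = (1, -0.5)'$, the seed, and the sample sizes are all arbitrary choices. It draws i.i.d. data from $y_i = X_i'\beta + \varepsilon_i$ and checks that the largest coefficient error of $b = (X'X)^{-1}X'y$ shrinks as $n$ grows:

```python
import numpy as np

# Illustrative simulation (assumed design, not from the text):
# y_i = X_i' beta + eps_i with X_i ~ N(0, I_2), eps_i ~ N(0, 1).
rng = np.random.default_rng(0)
beta = np.array([1.0, -0.5])

def ols(n):
    X = rng.standard_normal((n, 2))
    eps = rng.standard_normal(n)
    y = X @ beta + eps
    # b = (X'X)^{-1} X'y, computed via a linear solve for stability
    return np.linalg.solve(X.T @ X, X.T @ y)

# Maximum absolute coefficient error at increasing sample sizes
errors = {n: np.max(np.abs(ols(n) - beta)) for n in (100, 10_000, 1_000_000)}
```

Since $b - \beta = O_p(n^{-1/2})$ here, the error at $n = 10^6$ should be roughly two orders of magnitude smaller than at $n = 10^2$.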
5.3 Asymptotic normality of the least squares estimator

Write
$$b - \beta = \left( \sum X_i X_i' \right)^{-1} \sum X_i \varepsilon_i,$$
or
$$\sqrt{n}(b - \beta) = \left( \frac{1}{n}\sum X_i X_i' \right)^{-1} \left( \frac{1}{\sqrt{n}}\sum X_i \varepsilon_i \right).$$
Since $\frac{1}{n}\sum X_i X_i' \xrightarrow{P} Q$ by assumption, we need to show that $\frac{1}{\sqrt{n}}\sum X_i \varepsilon_i$ is normally distributed in the limit. Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{\sqrt{n}}\sum \lambda' X_i \varepsilon_i = \frac{1}{\sqrt{n}}\sum w_i.$$
We wish to check the conditions of the CLT for a sequence of independent r.v.'s:

1. $E(w_i) = 0$ as before.
2. $\left( E|w_i|^{2+\delta} \right)^{\frac{1}{2+\delta}} \leq B^{\frac{1}{2+\delta}}$ for all $i$, and $\bar{\sigma}_n^2 = \frac{1}{n}\sum E w_i^2 \leq B$.

Thus
$$\frac{\sum w_i}{\bar{\sigma}_n \sqrt{n}} \xrightarrow{d} N(0,1).$$
Since
$$\bar{\sigma}_n^2 = \frac{1}{n}\sum E(w_i^2) = \frac{1}{n}\sum E(\lambda' X_i \varepsilon_i \varepsilon_i' X_i' \lambda) = \frac{1}{n}\sum E\left[ E(\lambda' X_i \varepsilon_i^2 X_i' \lambda \mid X) \right] = \frac{1}{n}\sum E\left( \lambda' X_i E(\varepsilon_i^2 \mid X) X_i' \lambda \right) = \sigma^2 \lambda' \left( \frac{1}{n}\sum_{i=1}^n E(X_i X_i') \right) \lambda \to \sigma^2 \lambda' Q \lambda,$$
this result can be written as
$$\frac{\sum w_i}{\sqrt{n}} \xrightarrow{d} N\left( 0, \sigma^2 \lambda' Q \lambda \right),$$
which implies
$$\frac{\sum X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d} N\left( 0, \sigma^2 Q \right).$$
Using this, the given assumption, and $Q^{-1}(\sigma^2 Q)Q^{-1} = \sigma^2 Q^{-1}$, we have
$$\sqrt{n}(b - \beta) \xrightarrow{d} N\left( 0, \sigma^2 Q^{-1} \right).$$
(Recall that $X_n Y_n \xrightarrow{d} cY$ if $X_n \xrightarrow{P} c$ (a constant) and $Y_n \xrightarrow{d} Y$.)

Some authors write this result as
$$b \simeq N\left( \beta, \frac{1}{n}\left( \sigma^2 Q^{-1} \right) \right).$$
That is, $b$ is approximately normal with mean $\beta$ and variance–covariance matrix $\frac{1}{n}(\sigma^2 Q^{-1})$.

5.4 Consistency of $s^2$

Write
$$s^2 = \frac{1}{n-K}\varepsilon' M \varepsilon = \frac{1}{n-K}\left[ \varepsilon'\varepsilon - \varepsilon' X (X'X)^{-1} X'\varepsilon \right] = \frac{n}{n-K}\left[ \frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon' X}{n}\left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n} \right].$$
Because
$$\frac{\varepsilon'\varepsilon}{n} = \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 \xrightarrow{P} \sigma^2, \qquad \frac{\varepsilon' X}{n} = \frac{1}{n}\sum \varepsilon_i X_i' \xrightarrow{P} 0, \qquad \frac{X'X}{n} = \frac{1}{n}\sum X_i X_i' \xrightarrow{P} Q,$$
and $\frac{n}{n-K} \to 1$, we have $s^2 \xrightarrow{P} \sigma^2$. That is, $\sigma^2$ is consistently estimated by $s^2$. Alternatively, we may use $\hat{\sigma}^2 = \frac{1}{n}\hat{\varepsilon}'\hat{\varepsilon}$, where $\hat{\varepsilon}$ is the vector of OLS residuals. This is also consistent.

5.5 Asymptotic distribution of a function of b

Let $f(b)$ be a vector of $J$ continuous and continuously differentiable functions of $b$. We want to find the limiting distribution of $f(b)$. By the Taylor expansion,
$$f(b) = f(\beta) + \frac{\partial f(\beta)}{\partial \beta'}(b - \beta) + \text{remainder}.$$
Here $\frac{\partial f(\beta)}{\partial \beta'}$ is the $J \times k$ matrix
$$\Gamma = \begin{bmatrix} \frac{\partial f_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_1(\beta)}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial f_J(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_J(\beta)}{\partial \beta_k} \end{bmatrix}.$$
The remainder term becomes negligible because $b \xrightarrow{P} \beta$. Thus
$$\sqrt{n}\left( f(b) - f(\beta) \right) \xrightarrow{d} N\left( 0, \Gamma\left( \sigma^2 Q^{-1} \right)\Gamma' \right).$$
That is, $f(b)$ also has a normal distribution in the limit.

Example 6  Suppose $\sqrt{n}(b - \beta) \xrightarrow{d} N(0, \sigma^2 Q^{-1})$. What is the limiting distribution of $b_1^2 + \cdots + b_k^2$? Here $f(b) = b_1^2 + \cdots + b_k^2$ and $\Gamma = [2\beta_1, \ldots, 2\beta_k]$. Thus
$$\sqrt{n}\left( b_1^2 + \cdots + b_k^2 - \left( \beta_1^2 + \cdots + \beta_k^2 \right) \right) \xrightarrow{d} N\left( 0, \sigma^2 \Gamma Q^{-1} \Gamma' \right).$$

5.6 More general assumptions on the regressors

We have assumed that $(X_i)$ is a sequence of independent observations. This assumption may occasionally be violated in practice. For example, consider the autoregressive model of order $p$,
$$y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t \sim \text{iid}(0,\sigma^2)$. Here, the regressors are correlated over time. Still, consistency and asymptotic normality of the OLS estimator follow if we make a few extra assumptions. These should be dealt with in a more advanced course.

5.7 Instrumental variables estimation

We have assumed $E(\varepsilon_i \mid X) = 0$, which implies $E(\varepsilon_i X_i) = 0$. There are many examples of violations of this assumption.
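Before turning to the examples, a quick simulation shows what such a violation costs. This sketch uses the classical measurement-error setup of Example 9 below with numbers of my own choosing ($\beta = 2$, $\sigma_x^2 = \sigma_w^2 = 1$, arbitrary seed and $n$); the OLS slope then converges not to $\beta$ but to the attenuated limit $\beta\,\sigma_x^2/(\sigma_x^2 + \sigma_w^2) = 1$:

```python
import numpy as np

# Illustration (assumed numbers, not from the text): classical measurement
# error.  True model: y_i = 2 x_i + eps_i; we observe x*_i = x_i + w_i.
rng = np.random.default_rng(1)
n = 200_000
beta = 2.0
x = rng.standard_normal(n)        # sigma_x^2 = 1
w = rng.standard_normal(n)        # sigma_w^2 = 1 (measurement error)
eps = rng.standard_normal(n)
y = beta * x + eps
x_star = x + w                    # observed regressor

# OLS slope through the origin (all variables are mean zero here)
slope = (x_star @ y) / (x_star @ x_star)
attenuated = beta * 1.0 / (1.0 + 1.0)   # plim = beta * sx^2/(sx^2 + sw^2) = 1
# slope settles near 1, not near the true beta = 2: OLS is inconsistent.
```

The point is that more data does not help: the bias is in the probability limit itself, which is why the instrumental-variables methods of this section are needed.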
Example 7 (Simultaneous equations)  Let
$C_t$: consumption at time $t$;
$Y_t$: income at time $t$;
$I_t$: investment at time $t$.
The Keynesian consumption function is
$$C_t = \alpha + \beta Y_t + \varepsilon_t.$$
But $Y_t = C_t + I_t$. Using these two equations, we have
$$Y_t = \alpha + \beta Y_t + \varepsilon_t + I_t \;\Rightarrow\; Y_t = \frac{1}{1-\beta}(\alpha + \varepsilon_t + I_t).$$
Thus $Y_t$ and $\varepsilon_t$ are correlated.

Example 8 (Autoregressive moving average model)  Consider the ARMA(1,1) model
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim \text{iid}(0,\sigma^2), \quad |\alpha| < 1, \; |\theta| < 1.$$
Writing
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1},$$
$$\alpha y_{t-1} = \alpha^2 y_{t-2} + \alpha \varepsilon_{t-1} + \alpha\theta \varepsilon_{t-2},$$
$$\vdots$$
and adding all of these equations, we obtain
$$y_t = \varepsilon_t + (\theta + \alpha)\varepsilon_{t-1} + \alpha(\theta + \alpha)\varepsilon_{t-2} + \cdots.$$
Thus $y_{t-1}$ and $\varepsilon_{t-1}$ are correlated, so the regressor $y_{t-1}$ is correlated with the error term $\varepsilon_t + \theta\varepsilon_{t-1}$.

Example 9 (Measurement error)  Let the true regression model be
$$y_i = \alpha + \beta x_i + \varepsilon_i.$$
Suppose that we observe $x_i^* = x_i + w_i$ ($w_i \sim \text{iid}(0,\sigma_w^2)$) instead of $x_i$ due to measurement error. Then the regression model we use will be
$$y_i = \alpha + \beta(x_i^* - w_i) + \varepsilon_i = \alpha + \beta x_i^* + \varepsilon_i - \beta w_i.$$
Obviously, $x_i^*$ and the error term $\varepsilon_i - \beta w_i$ are correlated.

Example 10 (Dynamic panel data model)  Panel data: a collection of time-series and cross-sectional observations.

Example 11  A collection of income surveys over a period of time, a collection of stock indices over a period of time, etc.

Let
$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + u_{it}, \qquad u_{it} = u_i + v_{it} \quad \text{(one-way error component model)},$$
where $u_i \sim \text{iid}(0,\sigma_u^2)$ and $v_{it} \sim \text{iid}(0,\sigma_v^2)$. $u_i$ is called the unobserved individual effect. Since $y_{i,t-1}$ is a function of $u_i$, it is correlated with $u_{it}$. Thus, the OLS estimator is inconsistent.

Suppose that a sequence of $K \times 1$ vectors $(Z_i)$ satisfies
$$\frac{1}{n}\sum Z_i \varepsilon_i \xrightarrow{P} 0 \quad \text{and} \quad \frac{1}{n}\sum Z_i X_i' \xrightarrow{P} Q_{ZX}.$$
Then
$$\frac{1}{n}\sum Z_i y_i = \frac{1}{n}\sum Z_i X_i' \beta + \frac{1}{n}\sum Z_i \varepsilon_i \xrightarrow{P} Q_{ZX}\beta.$$
Thus, as an estimator of $\beta$, we consider
$$b_{IV} = (Z'X)^{-1} Z'y.$$

Assume

1. $E(\varepsilon \mid X) \neq 0$.
2. $\frac{1}{n}Z'X \xrightarrow{P} Q_{ZX}$ with $\operatorname{rank} Q_{ZX} = K$.
3. $\frac{1}{n}Z'Z \xrightarrow{P} Q_{ZZ}$ $(> 0)$.
4.
$E(\varepsilon \mid Z) = 0$.
5. $E(\varepsilon\varepsilon' \mid Z) = \sigma^2 I$.
6. $(Z_i, \varepsilon_i)$ is a sequence of independent observations.
7. For any $\lambda \in \mathbb{R}^K$ and $\delta > 0$, $E|\lambda' Z_i \varepsilon_i|^{2+\delta} \leq B < \infty$ for all $i$.

Write the IV estimator as
$$b_{IV} = \beta + (Z'X)^{-1} Z'\varepsilon.$$
Since $\frac{1}{n}Z'X \xrightarrow{P} Q_{ZX}$ and $\frac{1}{n}Z'\varepsilon \xrightarrow{P} 0$, $b_{IV} \xrightarrow{P} \beta$ as $n \to \infty$. In addition,
$$\frac{1}{\sqrt{n}}Z'\varepsilon \xrightarrow{d} N\left( 0, \sigma^2 Q_{ZZ} \right),$$
which implies
$$\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left( 0, \sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1} \right),$$
where $Q_{XZ} = Q_{ZX}'$.

As for OLS estimation, a natural estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum \left( y_i - X_i' b_{IV} \right)^2.$$
We can show that $\hat{\sigma}^2 \xrightarrow{P} \sigma^2$ as $n \to \infty$.

So far, the number of instruments equals the number of regressors. What if the number of instruments exceeds the number of regressors? Then we use
$$b_{IV} = \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'y.$$
This is equivalent to $\left( \hat{X}'\hat{X} \right)^{-1}\hat{X}'y$, where $\hat{X} = Z(Z'Z)^{-1}Z'X$ (the part of $X$ explained by $Z$). This estimator is called the two-stage least squares (2SLS) estimator. Its asymptotic properties are:

1. $b_{IV} \xrightarrow{P} \beta$;
2. $\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left( 0, \sigma^2 \left( Q_{XZ} Q_{ZZ}^{-1} Q_{ZX} \right)^{-1} \right)$, which reduces to $\sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}$ in the just-identified case.

5.8 Durbin–Wu–Hausman test

See Durbin (1954, Review of the International Statistical Institute), Wu (1973, Econometrica), and Hausman (1978, Econometrica).

The null hypothesis for the DWH test is
$$H_0: E(X'\varepsilon) = 0.$$
Under $H_0$, $b_{IV} - b \xrightarrow{P} 0$. If $H_0$ is violated, $b_{IV} - b \xrightarrow{P} \delta$ $(\neq 0)$. Thus, the DWH test is based on $d = b_{IV} - b$. Since the asymptotic variance–covariance matrix of $d$ is $\operatorname{Asy.Var}(b_{IV}) - \operatorname{Asy.Var}(b)$, the test statistic is
$$DWH = \frac{d' \left[ \left( X'Z(Z'Z)^{-1}Z'X \right)^{-1} - (X'X)^{-1} \right]^{-1} d}{s^2}.$$
As $n \to \infty$, $DWH \xrightarrow{d} \chi^2_K$ under $H_0$. If only $J$ elements of $X$ are suspected of being correlated with $\varepsilon$ (so that only $J$ elements of $\operatorname{plim} \frac{1}{n}X'\varepsilon$ may be nonzero), the test uses the corresponding $J$ elements of $d$ and $DWH \xrightarrow{d} \chi^2_J$.
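To close, the scalar just-identified case makes the IV mechanics and the DWH contrast concrete. The design below is my own sketch (arbitrary coefficients, seed, and sample size, not from the text): $x$ is endogenous through a common shock $v$, while $z$ is relevant and uncorrelated with the error. OLS converges to $\beta + \operatorname{Cov}(x,\varepsilon)/\operatorname{Var}(x) = 1.4$, whereas $b_{IV} = (z'x)^{-1}z'y$ converges to $\beta = 1$:

```python
import numpy as np

# Illustration (assumed design, not from the text): IV vs. OLS under
# endogeneity.  True model: y = beta * x + eps with beta = 1.
rng = np.random.default_rng(2)
n = 200_000
beta = 1.0
z = rng.standard_normal(n)               # instrument
v = rng.standard_normal(n)               # common shock
x = z + v                                # Cov(z, x) = 1  (relevance)
eps = 0.8 * v + rng.standard_normal(n)   # Cov(x, eps) = 0.8, Cov(z, eps) = 0
y = beta * x + eps

b_ols = (x @ y) / (x @ x)   # plim = beta + Cov(x,eps)/Var(x) = 1 + 0.8/2 = 1.4
b_iv = (z @ y) / (z @ x)    # b_IV = (z'x)^{-1} z'y, plim = beta = 1

# The DWH test is built on this contrast; here plim d = 1 - 1.4 = -0.4 != 0,
# so the test would reject H0: E(X'eps) = 0 in large samples.
d = b_iv - b_ols
```

In the exactly identified scalar case, $b_{IV}$ here coincides with the 2SLS formula, since $\hat{x} = z(z'z)^{-1}z'x$ gives $(\hat{x}'\hat{x})^{-1}\hat{x}'y = (z'x)^{-1}z'y$.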