Chapter 5 Large-sample properties of the LSE
5.1 Stochastic convergence
Suppose that {Xn} is a sequence of random variables with a corresponding sequence of
distribution functions {Fn}.
If $F_n(x) \to F(x)$ at every continuity point $x$ of $F$, $F_n$ is said to converge weakly to $F$, written $F_n \Rightarrow F$. In this case, $\{X_n\}$ is said to converge in distribution to $X$, where $X$ is a random variable with distribution function $F$, written $X_n \xrightarrow{d} X$.
If $X$ is a random variable and, for all $\varepsilon > 0$,
$$\lim_{n \to \infty} P(|X_n - X| < \varepsilon) = 1,$$
$X_n$ is said to converge in probability to $X$, written $X_n \xrightarrow{p} X$. $X$ is known as the probability limit of $X_n$, written $X = \operatorname{plim} X_n$.
If
$$\lim_{n \to \infty} E(X_n - X)^2 = 0,$$
$X_n$ is said to converge in mean square to $X$, written $X_n \xrightarrow{m.s.} X$.
Some useful results regarding stochastic convergence are:
1. If $X_n \xrightarrow{p} X$ and $g(\cdot)$ is a continuous function, then $g(X_n) \xrightarrow{p} g(X)$.
Example 1 Let
$$X_n = \begin{cases} 1 & \text{with probability } \frac{1}{n} \\ 0 & \text{with probability } 1 - \frac{1}{n}. \end{cases}$$
Obviously, $X_n \xrightarrow{p} 0$. Let $g(x) = x + 1$. Then $g(X_n) \xrightarrow{p} g(0) = 1$.
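As a quick numerical check of Example 1 (a sketch using NumPy; the function name and all parameters are invented for illustration), we can estimate $P(|X_n| > \varepsilon)$ by simulation and watch it shrink with $n$:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_far_from_zero(n, eps=0.5, reps=100_000):
    # X_n equals 1 with probability 1/n and 0 otherwise; estimate
    # P(|X_n| > eps) across many independent draws of X_n.
    x = (rng.random(reps) < 1.0 / n).astype(float)
    return float(np.mean(np.abs(x) > eps))

p_small_n = prob_far_from_zero(10)     # roughly 1/10
p_large_n = prob_far_from_zero(1000)   # roughly 1/1000
```

The estimated probability of $X_n$ being far from 0 falls like $1/n$, which is exactly convergence in probability to 0.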
2. Suppose that $Y_n \xrightarrow{d} Y$ and $X_n \xrightarrow{p} c$ (a constant). Then
(a) $X_n + Y_n \xrightarrow{d} c + Y$
(b) $X_n Y_n \xrightarrow{d} cY$
(c) $Y_n / X_n \xrightarrow{d} Y/c$ when $c \neq 0$.
3. If $X_n \xrightarrow{d} X$ and $g(\cdot)$ is continuous, then $g(X_n) \xrightarrow{d} g(X)$.
(This is called the continuous mapping theorem.)
Example 2 If $X_n \xrightarrow{d} N(0,1)$, then $X_n^2 \xrightarrow{d} \chi^2(1)$.
4. If $X_n - Y_n \xrightarrow{p} 0$ and $X_n \xrightarrow{d} X$, then $Y_n \xrightarrow{d} X$.
5. $X_n \xrightarrow{p} X$ implies $X_n \xrightarrow{d} X$. (The converse is not necessarily true.)
6. If $X_n \xrightarrow{d} c$ (a constant), then $X_n \xrightarrow{p} c$.
7. $X_n \xrightarrow{m.s.} X$ implies $X_n \xrightarrow{p} X$.
If for any $\varepsilon > 0$ there exists $B_\varepsilon < \infty$ such that
$$P\left( \frac{|X_n|}{n^r} > B_\varepsilon \right) < \varepsilon$$
for all $n \geq 1$, we write $X_n = O_p(n^r)$ ($X_n / n^r$ is stochastically bounded). If $\operatorname{plim} X_n / n^r = 0$, we write $X_n = o_p(n^r)$.
The weak law of large numbers
1. Let $\{X_i, i \geq 1\}$ be a sequence of i.i.d. r.v.'s with $E|X_1| < \infty$. Then
$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} EX_1 \quad \text{as } n \to \infty.$$
2. Let $\{X_i, i \geq 1\}$ be a sequence of independent r.v.'s with $EX_i = m$. If $E|X_i|^{1+\delta} \leq B < \infty$ ($\delta > 0$) for all $i$, then
$$\frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} m \quad \text{as } n \to \infty.$$
Example 3 Let $\varepsilon_i \sim \text{iid}(0, \sigma^2)$. Then
$$\frac{1}{n} \sum_{i=1}^n \varepsilon_i \xrightarrow{p} E\varepsilon_1 = 0.$$
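Example 3 is easy to see by simulation (a minimal NumPy sketch; the choice $\sigma = 2$ and the sample sizes are arbitrary): the sample mean of iid errors collapses toward $E\varepsilon_1 = 0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0

# Sample mean of n iid(0, sigma^2) draws for a small and a large n;
# the WLLN says the mean concentrates around E(eps_1) = 0 as n grows.
mean_small = float(rng.normal(0.0, sigma, size=100).mean())
mean_large = float(rng.normal(0.0, sigma, size=1_000_000).mean())
```

With $n = 10^6$ the sample mean has standard deviation $\sigma/\sqrt{n} = 0.002$, so it sits very close to zero.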
The central limit theorem
1. Let $\{X_i, i \geq 1\}$ be a sequence of i.i.d. r.v.'s with $E(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2 \neq 0$. Then
$$\frac{\sum_{i=1}^n (X_i - \mu)}{\sigma \sqrt{n}} \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty.$$
2. Let $\{X_i, i \geq 1\}$ be a sequence of independent r.v.'s with mean $\mu_i$ and variance $\sigma_i^2$, and let $\bar{\sigma}_n^2 = \frac{1}{n} \sum_{i=1}^n \sigma_i^2$. If
$$\frac{\max_{1 \leq i \leq n} \left[ E|X_i - \mu_i|^{2+\delta} \right]^{\frac{1}{2+\delta}}}{\bar{\sigma}_n} \leq B < \infty \quad (\delta > 0)$$
for all $n$, then
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\bar{\sigma}_n \sqrt{n}} \xrightarrow{d} N(0,1).$$
Example 4 Let $X_i \sim \text{iid } B(1,p)$. Then $EX_1 = p$ and $\operatorname{Var}(X_1) = p(1-p)$. Thus,
$$\frac{\sum_{i=1}^n (X_i - p)}{\sqrt{p(1-p)} \sqrt{n}} \xrightarrow{d} N(0,1).$$
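Example 4 can be checked numerically (a NumPy sketch; $p$, $n$, and the number of replications are arbitrary choices): standardized Bernoulli sums should have mean near 0 and standard deviation near 1.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 2000, 20_000

# Each Binomial(n, p) draw is a sum of n iid Bernoulli(p) variables;
# standardize it as (sum_i X_i - n p) / sqrt(n p (1 - p)).
sums = rng.binomial(n, p, size=reps)
z = (sums - n * p) / np.sqrt(n * p * (1 - p))

z_mean, z_std = float(z.mean()), float(z.std())
```

A histogram of `z` would be close to the standard normal density, and its first two moments match $N(0,1)$.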
For vector sequences, we use the following result, known as the Cramér-Wold device. If $\{X_n\}$ is a sequence of random vectors, $X_n \xrightarrow{d} X$ iff $\lambda' X_n \xrightarrow{d} \lambda' X$ for any vector $\lambda$.
Example 5 Let $X_i$ ($m \times 1$) $\sim \text{iid}(0, \Sigma)$. Then,
$$\frac{\sum X_i}{\sqrt{n}} \xrightarrow{d} N(0, \Sigma).$$
5.2 Consistency of b
Assume
1. $(X_i, \varepsilon_i)$ is a sequence of independent observations.
2. $\frac{1}{n} \sum_{i=1}^n X_i X_i' \left( = \frac{1}{n} X'X \right) \xrightarrow{p} Q = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E(X_i X_i')$ $(> 0)$.
3. For any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$, $E|\lambda' X_i \varepsilon_i|^{2+\delta} \leq B < \infty$ for all $i$.
The least squares estimator b may be written as
$$b = \beta + \left( \frac{1}{n} \sum X_i X_i' \right)^{-1} \left( \frac{1}{n} \sum X_i \varepsilon_i \right).$$
Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{n} \sum \lambda' X_i \varepsilon_i = \frac{1}{n} \sum w_i.$$
Then, $w_i$ is an independent sequence with
$$E(w_i) = E\left[ E(\lambda' X_i \varepsilon_i \mid X) \right] = 0.$$
In addition,
$$E(w_i^2) = E(\lambda' X_i \varepsilon_i)^2 \leq C < \infty,$$
which, by Lyapounov's inequality below, implies $E|w_i|^{1+\delta} \leq D < \infty$ for all $i$ (taking $0 < \delta \leq 1$).
Lyapounov's inequality: For $0 < \alpha \leq \beta$, $(E|X|^\alpha)^{1/\alpha} \leq \left( E|X|^\beta \right)^{1/\beta}$.
Thus, by the WLLN for an independent sequence,
$$\frac{1}{n} \sum w_i \xrightarrow{p} 0.$$
Since this holds for any $\lambda$,
$$\frac{1}{n} \sum_{i=1}^n X_i \varepsilon_i \xrightarrow{p} 0,$$
and we have
$$b \xrightarrow{p} \beta + Q^{-1} \cdot 0 = \beta.$$
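The consistency argument above can be illustrated with a small Monte Carlo (a NumPy sketch; the true coefficients, the error scale, and the function name `ols` are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([1.0, -2.0])   # hypothetical true coefficients

def ols(n):
    # Independent observations (X_i, eps_i) with eps independent of X,
    # so (1/n) sum X_i eps_i ->p 0 and hence b ->p beta.
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(scale=1.5, size=n)
    y = X @ beta + eps
    return np.linalg.solve(X.T @ X, X.T @ y)

b_large = ols(200_000)
err = float(np.max(np.abs(b_large - beta)))
```

At $n = 200{,}000$ every coefficient of `b_large` sits within a few thousandths of the truth, consistent with $b \xrightarrow{p} \beta$.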
5.3 Asymptotic normality of the least squares estimator
Write
$$b - \beta = \left( \sum X_i X_i' \right)^{-1} \sum X_i \varepsilon_i$$
or
$$\sqrt{n}(b - \beta) = \left( \frac{1}{n} \sum X_i X_i' \right)^{-1} \left( \frac{1}{\sqrt{n}} \sum X_i \varepsilon_i \right).$$
Since $\frac{1}{n} \sum X_i X_i' \xrightarrow{p} Q$ by assumption, we need to show that $\frac{1}{\sqrt{n}} \sum X_i \varepsilon_i$ is normally distributed in the limit.
Consider, for $\lambda \in \mathbb{R}^k$,
$$\frac{1}{\sqrt{n}} \sum \lambda' X_i \varepsilon_i = \frac{1}{\sqrt{n}} \sum w_i.$$
We wish to check the conditions of the CLT for a sequence of independent r.v.'s.
1. $E(w_i) = 0$ as before.
2. $\left( E|w_i|^{2+\delta} \right)^{\frac{1}{2+\delta}} \leq B^{\frac{1}{2+\delta}}$ for all $i$, and $\bar{\sigma}_n^2 = \frac{1}{n} \sum E w_i^2 \leq B$.
Thus
$$\frac{\sum w_i}{\bar{\sigma}_n \sqrt{n}} \xrightarrow{d} N(0,1).$$
Since
$$\bar{\sigma}_n^2 = \frac{1}{n} \sum E(w_i^2) = \frac{1}{n} \sum E(\lambda' X_i \varepsilon_i \varepsilon_i' X_i' \lambda)$$
$$= \frac{1}{n} \sum E\left[ E(\lambda' X_i \varepsilon_i^2 X_i' \lambda \mid X) \right] = \frac{1}{n} \sum E\left[ \lambda' X_i E(\varepsilon_i^2 \mid X) X_i' \lambda \right]$$
$$= \frac{\sigma^2}{n} \lambda' \sum_{i=1}^n E(X_i X_i') \lambda \to \sigma^2 \lambda' Q \lambda,$$
this result can be written as
$$\frac{\sum w_i}{\sqrt{n}} \xrightarrow{d} N\left( 0, \sigma^2 \lambda' Q \lambda \right),$$
which implies
$$\frac{\sum X_i \varepsilon_i}{\sqrt{n}} \xrightarrow{d} N\left( 0, \sigma^2 Q \right).$$
Using this and the given assumption, we have
$$\sqrt{n}(b - \beta) \xrightarrow{d} N\left( 0, \sigma^2 Q^{-1} \right).$$
(Recall that $X_n Y_n \xrightarrow{d} cY$ if $X_n \xrightarrow{p} c$ (a constant) and $Y_n \xrightarrow{d} Y$.)
Some authors write this result as
$$b \simeq N\left( \beta, \frac{1}{n} \left( \sigma^2 Q^{-1} \right) \right).$$
That is, $b$ is approximately normal with mean $\beta$ and variance-covariance matrix $\frac{1}{n} \left( \sigma^2 Q^{-1} \right)$.
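The limiting variance $\sigma^2 Q^{-1}$ can be verified by simulation (a NumPy sketch with an invented one-regressor design; since $x_i \sim N(0,1)$ and $\sigma = 1$ here, $Q = 1$ and the target variance of $\sqrt{n}(b - \beta)$ is exactly 1):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, sigma, beta = 500, 5000, 1.0, 2.0

# One regressor x_i ~ N(0,1): Q = E(x_i^2) = 1, so the limiting
# distribution of sqrt(n)(b - beta) is N(0, sigma^2 / Q) = N(0, 1).
stats = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = beta * x + rng.normal(scale=sigma, size=n)
    b = (x @ y) / (x @ x)
    stats[r] = np.sqrt(n) * (b - beta)

sim_mean = float(stats.mean())
sim_var = float(stats.var())
```

The simulated mean and variance of $\sqrt{n}(b - \beta)$ come out near 0 and 1, matching the asymptotic result.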
5.4 Consistency of s²
Write
$$s^2 = \frac{1}{n-K} \varepsilon' M \varepsilon = \frac{1}{n-K} \left[ \varepsilon'\varepsilon - \varepsilon' X (X'X)^{-1} X' \varepsilon \right] = \frac{n}{n-K} \left[ \frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon' X}{n} \left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n} \right].$$
Because
$$\frac{\varepsilon'\varepsilon}{n} = \frac{1}{n} \sum_{i=1}^n \varepsilon_i^2 \xrightarrow{p} \sigma^2, \qquad \frac{\varepsilon' X}{n} = \frac{1}{n} \sum \varepsilon_i X_i' \xrightarrow{p} 0, \qquad \frac{X'X}{n} = \frac{1}{n} \sum X_i X_i' \xrightarrow{p} Q,$$
and $\frac{n}{n-K} \to 1$, we obtain
$$s^2 \xrightarrow{p} \sigma^2.$$
That is, $\sigma^2$ is consistently estimated by $s^2$. Alternatively, we may use
$$\hat{\sigma}^2 = \frac{1}{n} \hat{\varepsilon}' \hat{\varepsilon}.$$
This is also consistent.
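Both variance estimators can be checked in one simulated regression (a NumPy sketch; the design, $\sigma^2 = 4$, and $K = 3$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K, sigma2 = 100_000, 3, 4.0

X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, 3.0])
y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s2 = float(e @ e / (n - K))          # degrees-of-freedom corrected version
sigma2_hat = float(e @ e / n)        # the (1/n) e'e alternative
```

At this sample size the two estimators are essentially indistinguishable, as the factor $n/(n-K) \to 1$ suggests.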
5.5 Asymptotic distribution of a function of b
Let $f(b)$ be a vector of $J$ continuous and continuously differentiable functions of $b$. We want to find the limiting distribution of $f(b)$. By the Taylor expansion,
$$f(b) = f(\beta) + \frac{\partial f(\beta)}{\partial \beta'} (b - \beta) + \text{remainder}.$$
$\frac{\partial f(\beta)}{\partial \beta'}$ is a $J \times k$ matrix of the form
$$\frac{\partial f(\beta)}{\partial \beta'} = \begin{bmatrix} \frac{\partial f_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_1(\beta)}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial f_J(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f_J(\beta)}{\partial \beta_k} \end{bmatrix} = \Gamma.$$
The remainder term becomes negligible if $b \xrightarrow{p} \beta$. Thus
$$\sqrt{n}\left( f(b) - f(\beta) \right) \xrightarrow{d} N\left( 0, \Gamma \left( \sigma^2 Q^{-1} \right) \Gamma' \right).$$
That is, $f(b)$ also has a normal distribution in the limit.
Example 6 Suppose $\sqrt{n}(b - \beta) \xrightarrow{d} N\left( 0, \sigma^2 Q^{-1} \right)$. What is the distribution of $b_1^2 + \cdots + b_k^2$? Here
$$f(b) = b_1^2 + \cdots + b_k^2$$
and
$$\Gamma = [2\beta_1, \cdots, 2\beta_k].$$
Thus
$$\sqrt{n} \left( b_1^2 + \cdots + b_k^2 - \left( \beta_1^2 + \cdots + \beta_k^2 \right) \right) \xrightarrow{d} N\left( 0, \sigma^2 \Gamma Q^{-1} \Gamma' \right).$$
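Example 6 can be checked by Monte Carlo (a NumPy sketch; the two-coefficient design is invented, and with $X_i \sim N(0, I_2)$, $\sigma = 1$, and $\beta = (1, 0.5)'$ the delta-method variance is $\sigma^2 \Gamma Q^{-1} \Gamma' = 4(\beta_1^2 + \beta_2^2) = 5$):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, sigma = 2000, 3000, 1.0
beta = np.array([1.0, 0.5])

# With X_i ~ N(0, I_2): Q = I_2 and Gamma = [2 beta_1, 2 beta_2], so the
# asymptotic variance of f(b) = b_1^2 + b_2^2 is 4 * (1 + 0.25) = 5.
vals = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(n, 2))
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    vals[r] = np.sqrt(n) * (b @ b - beta @ beta)

sim_var = float(vals.var())
```

The simulated variance of $\sqrt{n}\left( f(b) - f(\beta) \right)$ lands close to the delta-method value 5.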
5.6 More general assumptions on the regressors
We have assumed that $(X_i)$ is a sequence of independent observations. This assumption may occasionally be violated in practice. For example, consider the autoregressive model of order $p$,
$$y_t = \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} + \varepsilon_t,$$
where $\varepsilon_t \sim \text{iid}(0, \sigma^2)$. Here, the regressors are correlated over time. Still, consistency and asymptotic normality of the OLS estimator follow if we make a few extra assumptions. These should be dealt with in a more advanced course.
5.7 Instrumental variables estimation
We have assumed
$$E(\varepsilon_i \mid X) = 0,$$
which implies $E(\varepsilon_i X_i) = 0$. There are many examples of the violation of this assumption.

Example 7 (Simultaneous equations)
Let $C_t$ be consumption at time $t$, $Y_t$ income at time $t$, and $I_t$ investment at time $t$. The Keynesian consumption function is
$$C_t = \alpha + \beta Y_t + \varepsilon_t.$$
But $Y_t = C_t + I_t$. Using these two equations, we have
$$Y_t = \alpha + \beta Y_t + \varepsilon_t + I_t \quad \Rightarrow \quad Y_t = \frac{1}{1-\beta} \left( \alpha + \varepsilon_t + I_t \right).$$
Thus $Y_t$ and $\varepsilon_t$ are correlated.
Example 8 (Autoregressive moving average model)
Consider the ARMA(1,1) model
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim \text{iid}\left( 0, \sigma^2 \right), \quad |\alpha| < 1, \ |\theta| < 1.$$
Writing
$$y_t = \alpha y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}$$
$$\alpha y_{t-1} = \alpha^2 y_{t-2} + \alpha \varepsilon_{t-1} + \alpha \theta \varepsilon_{t-2}$$
$$\vdots$$
and adding all of these equations, we obtain
$$y_t = \varepsilon_t + (\theta + \alpha) \varepsilon_{t-1} + \alpha (\theta + \alpha) \varepsilon_{t-2} + \cdots.$$
Thus $y_{t-1}$ and $\varepsilon_{t-1}$ are correlated.
Example 9 (Measurement error)
Let the true regression model be
$$y_i = \alpha + \beta x_i + \varepsilon_i.$$
Suppose that we observe
$$x_i^* = x_i + w_i \quad \left( w_i \sim \text{iid}\left( 0, \sigma_w^2 \right) \right)$$
instead of $x_i$, due to measurement error. Then the regression model we use will be
$$y_i = \alpha + \beta \left( x_i^* - w_i \right) + \varepsilon_i = \alpha + \beta x_i^* + \left( \varepsilon_i - \beta w_i \right).$$
Obviously, $x_i^*$ and the error term are correlated.
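The resulting attenuation bias is easy to see by simulation (a NumPy sketch; the DGP is invented, and with $\sigma_x^2 = \sigma_w^2 = 1$ the OLS slope on the mismeasured regressor converges to $\beta \sigma_x^2 / (\sigma_x^2 + \sigma_w^2) = \beta/2$, a standard fact about classical measurement error):

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta = 500_000, 2.0
sigma_x2, sigma_w2 = 1.0, 1.0

x = rng.normal(scale=np.sqrt(sigma_x2), size=n)   # true regressor (mean zero)
w = rng.normal(scale=np.sqrt(sigma_w2), size=n)   # measurement error
y = beta * x + rng.normal(size=n)
x_star = x + w                                    # what we actually observe

# OLS slope on the mismeasured regressor: plim is the attenuated
# value beta * sigma_x^2 / (sigma_x^2 + sigma_w^2) = 1.0, not beta = 2.
b_err = float((x_star @ y) / (x_star @ x_star))
```

The estimate sits near 1.0 rather than the true $\beta = 2$: OLS is inconsistent, and the bias does not vanish as $n$ grows.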
Example 10 (Dynamic panel data model)
Panel data: a collection of time series and cross-sectional observations.
Example 11 A collection of income surveys over a period of time, a collection of stock indices over a period of time, etc.
Let
$$y_{it} = \delta y_{i,t-1} + x_{it}' \beta + u_{it}, \quad u_{it} = u_i + v_{it} \quad \text{(one-way error component model)},$$
where $u_i \sim \text{iid}(0, \sigma_u^2)$ and $v_{it} \sim \text{iid}(0, \sigma_v^2)$. $u_i$ is called the unobserved individual effect variable. Since $y_{i,t-1}$ is a function of $u_i$, it is correlated with $u_{it}$. Thus, the OLS estimator is inconsistent.
Suppose that a sequence of $K \times 1$ vectors $(Z_i)$ satisfies
$$\frac{1}{n} \sum Z_i \varepsilon_i \xrightarrow{p} 0$$
and
$$\frac{1}{n} \sum Z_i X_i' \xrightarrow{p} Q_{ZX}.$$
Then,
$$\frac{1}{n} \sum Z_i y_i = \frac{1}{n} \sum Z_i X_i' \beta + \frac{1}{n} \sum Z_i \varepsilon_i \xrightarrow{p} Q_{ZX} \beta.$$
Thus, as an estimator of $\beta$, we consider
$$b_{IV} = (Z'X)^{-1} Z'y.$$
Assume
1. $E(\varepsilon \mid X) \neq 0$.
2. $\frac{1}{n} Z'X \xrightarrow{p} Q_{ZX}$ with $\operatorname{rank} Q_{ZX} = K$.
3. $\frac{1}{n} Z'Z \to Q_{ZZ}$ $(> 0)$.
4. $E(\varepsilon \mid Z) = 0$.
5. $E(\varepsilon \varepsilon' \mid Z) = \sigma^2 I$.
6. $(Z_i, \varepsilon_i)$ is a sequence of independent observations.
7. For any $\lambda \in \mathbb{R}^k$ and some $\delta > 0$, $E|\lambda' Z_i \varepsilon_i|^{2+\delta} \leq B < \infty$ for all $i$.
Write the IV estimator as
$$b_{IV} = \beta + (Z'X)^{-1} Z' \varepsilon.$$
Since $\frac{1}{n} Z'X \xrightarrow{p} Q_{ZX}$ and $\frac{1}{n} Z' \varepsilon \xrightarrow{p} 0$, $b_{IV} \xrightarrow{p} \beta$ as $n \to \infty$.
In addition,
$$\frac{1}{\sqrt{n}} Z' \varepsilon \xrightarrow{d} N\left( 0, \sigma^2 Q_{ZZ} \right),$$
which implies
$$\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left( 0, \sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1} \right).$$
As for the OLS estimation, a natural estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n} \sum \left( y_i - X_i' b_{IV} \right)^2.$$
We can show that $\hat{\sigma}^2 \xrightarrow{p} \sigma^2$ as $n \to \infty$.
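A just-identified IV estimation can be sketched in a few lines (NumPy; the DGP is invented: a common shock $u$ makes $x$ endogenous, while the instrument $z$ drives $x$ but is independent of $\varepsilon$):

```python
import numpy as np

rng = np.random.default_rng(8)
n, beta = 200_000, 1.5

z = rng.normal(size=n)                  # instrument: drives x, unrelated to eps
u = rng.normal(size=n)                  # common shock making x endogenous
eps = u + rng.normal(size=n)            # error correlated with x through u
x = z + u + rng.normal(size=n)
y = beta * x + eps

# OLS: plim = beta + Cov(x, eps)/Var(x) = 1.5 + 1/3, so inconsistent.
b_ols = float((x @ y) / (x @ x))
# IV: (Z'X)^{-1} Z'y with a single instrument, consistent for beta.
b_iv = float((z @ y) / (z @ x))
```

OLS stays stuck near $1.5 + 1/3 \approx 1.83$ no matter how large $n$ is, while the IV estimate converges to the true $\beta = 1.5$.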
So far, the number of instruments equals the number of regressors. What if the number of instruments is greater than the number of regressors? Then we use
$$b_{IV} = \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} X'Z (Z'Z)^{-1} Z'y.$$
This is equivalent to
$$\left( \hat{X}' \hat{X} \right)^{-1} \hat{X}' y,$$
where
$$\hat{X} = Z (Z'Z)^{-1} Z'X$$
(the part of $X$ explained by $Z$; the equivalence follows because the projection matrix $Z(Z'Z)^{-1}Z'$ is idempotent, so $\hat{X}'\hat{X} = X'Z(Z'Z)^{-1}Z'X$). This estimator is called the 2-stage least squares (2SLS) estimator. Its asymptotic properties are:
1. $b_{IV} \xrightarrow{p} \beta$
2. $\sqrt{n}(b_{IV} - \beta) \xrightarrow{d} N\left( 0, \sigma^2 \left( Q_{XZ} Q_{ZZ}^{-1} Q_{ZX} \right)^{-1} \right)$, which reduces to $\sigma^2 Q_{ZX}^{-1} Q_{ZZ} Q_{XZ}^{-1}$ in the just-identified case.
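The two-stage construction above translates directly into code (a NumPy sketch; the overidentified DGP with 3 instruments and 2 regressors is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
beta = np.array([1.0, -0.5])

Z = rng.normal(size=(n, 3))             # 3 instruments for 2 regressors
u = rng.normal(size=n)                  # shock making both regressors endogenous
eps = u + rng.normal(size=n)
X = np.column_stack([
    Z[:, 0] + Z[:, 2] + u + rng.normal(size=n),
    Z[:, 1] + u + rng.normal(size=n),
])
y = X @ beta + eps

# First stage: X_hat = Z (Z'Z)^{-1} Z'X (part of X explained by Z).
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
# Second stage: regress y on X_hat, i.e. (X_hat' X_hat)^{-1} X_hat' y.
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
err = float(np.max(np.abs(b_2sls - beta)))
```

Despite the endogeneity of both regressors, the 2SLS estimate lands on the true coefficients, since the instruments are uncorrelated with $\varepsilon$.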
5.8 Durbin-Wu-Hausman test
See Durbin (1954, Review of the International Statistical Institute), Wu (1973, Econometrica), and Hausman (1978, Econometrica).
The null hypothesis for the DWH test is
$$H_0 : E(X' \varepsilon) = 0.$$
Under $H_0$, $b_{IV} - b \xrightarrow{p} 0$. If $H_0$ is violated, $b_{IV} - b \xrightarrow{p} \delta$ $(\neq 0)$. Thus, the DWH test is based on $d = b_{IV} - b$. Since the asymptotic variance-covariance matrix of $d$ is
$$\operatorname{Asy.Var}(b_{IV}) - \operatorname{Asy.Var}(b),$$
the test statistic is
$$DWH = \frac{d' \left[ \left[ X'Z (Z'Z)^{-1} Z'X \right]^{-1} - (X'X)^{-1} \right]^{-1} d}{s^2},$$
and as $n \to \infty$,
$$DWH \xrightarrow{d} \chi_K^2.$$
If the hypothesis $\operatorname{plim}_{n \to \infty} \frac{1}{n} X' \varepsilon = 0$ is tested for only $J$ elements of $X' \varepsilon$,
$$DWH \xrightarrow{d} \chi_J^2.$$
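The DWH statistic can be computed directly from the formulas above (a NumPy sketch; the endogenous DGP is invented, and estimating $s^2$ from the IV residuals is one convention among several):

```python
import numpy as np

rng = np.random.default_rng(10)
n, beta = 50_000, 1.0

z = rng.normal(size=n)
u = rng.normal(size=n)
eps = u + rng.normal(size=n)
x = z + u + rng.normal(size=n)          # endogenous regressor: Cov(x, eps) = 1
y = beta * x + eps

X = x.reshape(-1, 1)
Z = z.reshape(-1, 1)

b = np.linalg.solve(X.T @ X, X.T @ y)           # OLS
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)        # IV
d = b_iv - b                                    # basis of the DWH test

e_iv = y - X @ b_iv
s2 = float(e_iv @ e_iv / n)                     # error variance estimate

# Variance difference: s^2 { [X'Z (Z'Z)^{-1} Z'X]^{-1} - (X'X)^{-1} }.
XZ = X.T @ Z
V = s2 * (np.linalg.inv(XZ @ np.linalg.solve(Z.T @ Z, XZ.T))
          - np.linalg.inv(X.T @ X))
dwh = float(d @ np.linalg.solve(V, d))  # compare with chi2(1): 3.84 at the 5% level
```

Because the regressor is endogenous here, $d$ converges to a nonzero limit and the statistic explodes far past the $\chi^2_1$ critical value, so the null of exogeneity is (correctly) rejected.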