Ch. 4 Asymptotic Theory
From the discussion of the last Chapter it is obvious that determining the distribution of h(X1, X2, ..., XT) is by no means a trivial exercise. It turns out that more often than not we cannot determine the distribution exactly. Because of the importance of the problem, however, we are forced to develop approximations; these are the subject of this Chapter.
This Chapter covers the limit theorems. The term 'limit theorems' refers to several theorems in probability theory that go under the generic names 'laws of large numbers' (LLN) and 'central limit theorems' (CLT). These limit theorems constitute one of the most important and elegant chapters of probability theory and play a crucial role in statistical inference.
1 Consistency
In this section we introduce the concepts needed to analyze the behavior of a random variable indexed by the size of a sample, say θ̂T, as T → ∞.
1.1 Limits
Definition:
Let {bT}, T = 1, 2, ..., or just {bT}, be a sequence of real numbers. If there exists a real number b and if for every ε > 0 there exists an integer N(ε) such that for all T ≥ N(ε), |bT − b| < ε, then b is the limit of the sequence {bT}.
In this definition the constant ε can take on any real value, but it is the very small values of ε that give the definition its impact. By choosing a very small ε, we ensure that bT gets arbitrarily close to its limit b for all T that are sufficiently large. When a limit exists, we say that the sequence {bT} converges to b as T tends to infinity, written bT → b as T → ∞. We also write b = lim_{T→∞} bT. When no ambiguity is possible, we simply write bT → b or b = lim bT.
Example:
Let
    aT = (2^T − (−1)^T) / 2^T.
Here 1 = lim_{T→∞} aT, for
    |aT − 1| = |(2^T − (−1)^T)/2^T − 1| = 1/2^T.
By the binomial theorem we have
    2^T = (1 + 1)^T = 1 + T + T(T − 1)/2 + ... + 1 > T.
Hence, if we choose N = 1/ε or larger, we have, for T > N,
    |aT − 1| = 1/2^T < 1/T < 1/N ≤ ε.
This completes the solution.
The concept of a limit extends directly to sequences of real vectors. Let bT be a k × 1 vector with real elements bTi, i = 1, ..., k. If bTi → bi, i = 1, ..., k, then bT → b, where b has elements bi, i = 1, ..., k. An analogous extension applies to matrices.
Definition:
Given g : R^k → R^l (k, l ∈ N) and b ∈ R^k, the function g is continuous at b if for any sequence {bT} such that bT → b, g(bT) → g(b).
The following definition compares the behavior of a sequence {bT} with the behavior of a power of T, say T^λ, where λ is chosen so that {bT} and {T^λ} behave similarly.
Definition:
(i). The sequence {bT} is at most of order T^λ, denoted bT = O(T^λ), if for some finite real number Δ > 0 there exists a finite integer N such that for all T ≥ N, |T^(−λ) bT| < Δ.
(ii). The sequence {bT} is of order smaller than T^λ, denoted bT = o(T^λ), if for every real number δ > 0 there exists a finite integer N(δ) such that for all T ≥ N(δ), |T^(−λ) bT| < δ, i.e., T^(−λ) bT → 0.
As we have defined these notions, bT = O(T^λ) if {T^(−λ) bT} is eventually bounded, whereas bT = o(T^λ) if T^(−λ) bT → 0. Obviously, if bT = o(T^λ), then bT = O(T^λ). Further, if bT = O(T^λ), then for every δ > 0, bT = o(T^(λ+δ)). When bT = O(T^0), it is simply (eventually) bounded and may or may not have a limit. We often write O(1) in place of O(T^0). Similarly, bT = o(1) means bT → 0.
If each element of a vector or matrix is O(T^λ) or o(T^λ), then that vector or matrix is O(T^λ) or o(T^λ).
Proposition:
Let aT and bT be scalars.
(i). If aT = O(T^λ) and bT = O(T^μ), then aT bT = O(T^(λ+μ)) and aT + bT = O(T^κ), where κ = max[λ, μ].
(ii). If aT = o(T^λ) and bT = o(T^μ), then aT bT = o(T^(λ+μ)) and aT + bT = o(T^κ), where κ = max[λ, μ].
(iii). If aT = O(T^λ) and bT = o(T^μ), then aT bT = o(T^(λ+μ)) and aT + bT = O(T^κ), where κ = max[λ, μ].
1.2 Almost Sure Convergence
The stochastic convergence concept most closely related to the limit notions previously discussed is that of almost sure convergence. Recall that in discussing a real-valued random variable bT, we are in fact talking about a mapping bT : S → R. We let s be a typical element of the sample space S, and call the real number bT(s) a realization of the random variable.
Interest will often center on averages such as
    bT(·) = T^(−1) Σ_{t=1}^T Zt(·).
Definition:
Let {bT(·)} be a sequence of real-valued random variables. We say that bT(·) converges almost surely to b, written bT(·) →a.s. b, if there exists a real number b such that Pr{s : bT(s) → b} = 1. When no ambiguity is possible, we may simply write bT →a.s. b.
A sequence bT converges almost surely to b if the probability of obtaining a realization of the sequence {Zt} for which convergence to b occurs is unity. Equivalently, the probability of observing a realization of {Zt} for which convergence to b does not occur is zero. Failure to converge is possible but will almost never happen under this definition.
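As a quick illustration (not from the text): almost sure convergence is a statement about whole sample paths T ↦ bT(s). The sketch below, under the assumption of i.i.d. normal draws with mean 1, simulates a few such paths of the sample mean.

```python
import numpy as np

# Illustrative sketch: each row below is one realization s of the sample
# space; the path T -> b_T(s) = (1/T) * sum_{t=1}^T Z_t(s) is one sequence.
# Almost sure convergence says almost every such path tends to b = E(Z_t).
rng = np.random.default_rng(0)
T, n_paths = 10_000, 5
Z = rng.normal(loc=1.0, scale=2.0, size=(n_paths, T))  # E(Z_t) = 1
paths = Z.cumsum(axis=1) / np.arange(1, T + 1)         # running means b_T(s)

max_dev = np.abs(paths[:, -1] - 1.0).max()  # worst endpoint deviation from b
print(max_dev)
```

Every simulated path ends close to b = 1; of course a finite simulation can only suggest, not prove, the probability-one statement.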
Proposition:
Given g : R^k → R^l (k, l ∈ N) and any sequence of random k × 1 vectors bT such that bT →a.s. b, where b is k × 1, if g is continuous at b, then g(bT) →a.s. g(b).
This result is one of the most important in this Chapter, because consistency results for many of our estimators follow by simply applying this Proposition.
1.3 Convergence in Probability
A weaker stochastic convergence concept is that of convergence in probability.
Definition:
Let {bT} be a sequence of real-valued random variables. If there exists a real number b such that for every ε > 0, Pr(s : |bT(s) − b| < ε) → 1 as T → ∞, then bT converges in probability to b, written bT →p b or plim bT = b.
Example:
Let Z̄T ≡ T^(−1) Σ_{t=1}^T Zt, where {Zt} is a sequence of random variables such that E(Zt) = μ, Var(Zt) = σ² < ∞ for all t and Cov(Zt, Zτ) = 0 for t ≠ τ. Then Z̄T →p μ by the Chebyshev weak law of large numbers. See the plot of Hamilton p.184.
When the plim of a sequence of estimators (such as {Z̄T}, T = 1, 2, ...) is equal to the true population parameter (in this case, μ), the estimator is said to be consistent.
Convergence in probability is also referred to as weak consistency, and since this has been the most familiar stochastic convergence concept in econometrics, the word "weak" is often simply dropped.
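The definition can be illustrated by Monte Carlo (the normal draws and ε = 0.1 are illustrative assumptions): the frequency of the event |Z̄T − μ| < ε should approach one as T grows.

```python
import numpy as np

# Estimate Pr(|Zbar_T - mu| < eps) by simple Monte Carlo for growing T.
rng = np.random.default_rng(1)
mu, sigma, eps, reps = 0.0, 1.0, 0.1, 2000
freqs = []
for T in (10, 100, 1000):
    Zbar = rng.normal(mu, sigma, size=(reps, T)).mean(axis=1)  # reps sample means
    freqs.append(float(np.mean(np.abs(Zbar - mu) < eps)))
print(freqs)  # increasing toward 1
```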
Theorem:
Let {bT} be a sequence of real-valued random variables. If bT →a.s. b, then bT →p b.
Proposition:
Given g : R^k → R^l (k, l ∈ N) and any sequence of random k × 1 vectors bT such that bT →p b, where b is k × 1, if g is continuous at b, then g(bT) →p g(b).
Example:
If X1T →p c1 and X2T →p c2, then (X1T + X2T) →p (c1 + c2). This follows immediately, since g(X1T, X2T) ≡ (X1T + X2T) is a continuous function of (X1T, X2T).
Example:
Consider an alternative estimator of the mean given by ȲT* = [1/(T − 1)] Σ_{t=1}^T Yt. This can be written as c1T ȲT, where c1T ≡ [T/(T − 1)] and ȲT ≡ (1/T) Σ_{t=1}^T Yt. Under general conditions, the sample mean is a consistent estimator of the population mean, implying that ȲT →p μ. It is also easy to verify that c1T → 1. Since c1T ȲT is a continuous function of c1T and ȲT, it follows that c1T ȲT →p 1 · μ = μ. Thus ȲT* is also a consistent estimator of μ.
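A short simulation of this example (the normal population with μ = 3 is an illustrative assumption): since c1T = T/(T − 1) → 1, the alternative estimator inherits consistency from the sample mean.

```python
import numpy as np

# The alternative estimator [1/(T-1)] * sum Y_t equals c_1T * Ybar_T with
# c_1T = T/(T-1) -> 1, so it should settle near mu for large T.
rng = np.random.default_rng(2)
mu = 3.0
errors = {}
for T in (10, 10_000):
    Y = rng.normal(mu, 1.0, size=T)
    alt = Y.sum() / (T - 1)          # the alternative estimator Ybar*_T
    errors[T] = abs(alt - mu)
print(errors)
```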
Definition:
(i). The sequence {bT} is at most of order T^λ in probability, denoted bT = Op(T^λ), if for every ε > 0 there exist a finite Δε > 0 and Nε ∈ N such that for all T ≥ Nε, Pr{s : |T^(−λ) bT(s)| > Δε} < ε.
(ii). The sequence {bT} is of order smaller than T^λ in probability, denoted bT = op(T^λ), if T^(−λ) bT →p 0.
Lemma (Product rule):
Let AT be l × k and let bT be k × 1. If AT = op(1) and bT = Op(1), then AT bT = op(1).
Proof:
Each element of AT bT is a sum of terms of the form op(T^0)Op(T^0) = op(T^(0+0)) = op(1), and therefore is op(1).
1.4 Convergence in the rth Mean
A stronger condition than convergence in probability is mean square convergence.
Definition:
Let {bT} be a sequence of real-valued random variables such that for some r > 0, E|bT|^r < ∞. If there exists a real number b such that E(|bT − b|^r) → 0 as T → ∞, then bT converges in the rth mean to b, written bT →r.m. b.
The most commonly encountered situation is that in which r = 2, in which case convergence is said to occur in quadratic mean, denoted bT →q.m. b, or convergence in mean square, denoted bT →m.s. b.
Proposition (Generalized Chebyshev inequality):
Let Z be a random variable such that E|Z|^r < ∞, r > 0. Then for every ε > 0,
    Pr(|Z| > ε) ≤ E|Z|^r / ε^r.
When r = 1 we have Markov's inequality and when r = 2 we have the familiar Chebyshev inequality.
Theorem:
If bT →r.m. b for some r > 0, then bT →p b.
Proof:
Since E(|bT − b|^r) → 0 as T → ∞, E(|bT − b|^r) < ∞ for all T sufficiently large. It follows from the Generalized Chebyshev inequality that, for every ε > 0,
    Pr(s : |bT(s) − b| > ε) ≤ E|bT − b|^r / ε^r.
Hence Pr(s : |bT(s) − b| ≤ ε) ≥ 1 − E|bT − b|^r / ε^r → 1 as T → ∞, since bT →r.m. b. It follows that bT →p b.
Without further conditions, no necessary relationship holds between convergence in the rth mean and almost sure convergence.
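The Generalized Chebyshev inequality above is easy to check numerically; in fact, applied to the empirical distribution of any sample it holds exactly, because the indicator 1{|z| > ε} ≤ |z|^r / ε^r pointwise. A sketch with assumed standard normal draws:

```python
import numpy as np

# Check Pr(|Z| > eps) <= E|Z|^r / eps^r on a large sample, for several r, eps.
rng = np.random.default_rng(3)
Z = rng.normal(0.0, 1.0, size=100_000)
ok = all(
    np.mean(np.abs(Z) > eps) <= np.mean(np.abs(Z) ** r) / eps ** r
    for r in (1, 2, 4)
    for eps in (0.5, 1.0, 2.0)
)
print(ok)
```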
2 Convergence in Distribution
The most fundamental concept is that of convergence in distribution.
Definition:
Let {bT} be a sequence of scalar random variables with cumulative distribution functions {FT}. If FT(z) → F(z) as T → ∞ for every continuity point z, where F is the (cumulative) distribution function of a random variable Z, then bT converges in distribution to the random variable Z, written bT →d Z.
When bT →d Z, we also say that bT converges in law to Z, written bT →L Z, or that bT is asymptotically distributed as F, denoted bT ~A F. Then F is called the limiting distribution of bT.
Example:
Let {Zt} be i.i.d. random variables with mean μ and finite variance σ² > 0. Define
    bT ≡ (Z̄T − E(Z̄T)) / (Var(Z̄T))^(1/2) = T^(−1/2) Σ_{t=1}^T (Zt − μ)/σ = √T(Z̄T − μ)/σ.
Then by the Lindeberg-Lévy central limit theorem, bT ~A N(0, 1). See the plot of Hamilton p.185.
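A simulation sketch of this example (the exponential population is an illustrative assumption, chosen because it is clearly non-normal): the standardized sample means behave approximately as N(0, 1).

```python
import numpy as np

# Standardized sample means of i.i.d. exponential(1) draws (mu = 1, sigma = 1);
# by the Lindeberg-Levy CLT, b_T = sqrt(T)(Zbar_T - mu)/sigma is roughly N(0,1).
rng = np.random.default_rng(4)
reps, T = 10_000, 500
Z = rng.exponential(scale=1.0, size=(reps, T))
b = np.sqrt(T) * (Z.mean(axis=1) - 1.0) / 1.0

m, s = b.mean(), b.std()
frac = np.mean(np.abs(b) < 1.96)   # should be near 0.95 for N(0, 1)
print(m, s, frac)
```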
The above definition is unchanged if the scalar bT is replaced with a (k × 1) vector bT. A simple way to verify convergence in distribution of a vector is the following.
Proposition (Cramér-Wold device):
Let {bT} be a sequence of random k × 1 vectors and suppose that for every real k × 1 vector λ (such that λ′λ = 1 ?), the scalar λ′bT ~A λ′z, where z is a k × 1 vector with joint (cumulative) distribution function F. Then the limiting distribution function of bT exists and equals F.
Lemma:
If bT →L Z, then bT = Op(1).
Lemma (Product rule):
Recall that if AT = op(1) and bT = Op(1), then AT bT = op(1). Hence, if AT →p 0 and bT →d Z, then AT bT →p 0.
Lemma (Asymptotic equivalence):
Let {aT} and {bT} be two sequences of random vectors. If aT − bT →p 0 and bT →d Z, then aT →d Z.
This result is helpful in situations in which we wish to find the asymptotic distribution of aT but cannot do so directly. Often, however, it is easy to find a bT that has a known asymptotic distribution and that satisfies aT − bT →p 0. This Lemma then ensures that aT has the same limiting distribution as bT, and we say that aT is "asymptotically equivalent" to bT.
Lemma:
Given g : R^k → R^l (k, l ∈ N) and any sequence of random k × 1 vectors bT such that bT →L z, where z is k × 1, if g is continuous (not dependent on T) at z, then g(bT) →L g(z).
Example:
Suppose that XT →L N(0, 1). Then the square of XT asymptotically behaves as the square of a N(0, 1) variable: XT² →L χ²(1).
Lemma:
Let {xT} be a sequence of random (n × 1) vectors with xT →p c, and let {yT} be a sequence of random (n × 1) vectors with yT →L y. Then the sequence constructed from the sum {xT + yT} converges in distribution to c + y, and the sequence constructed from the product {x′T yT} converges in distribution to c′y.
Example:
Let {XT} be a sequence of random (m × n) matrices with XT →p C, and let {yT} be a sequence of random (n × 1) vectors with yT →L y ~ N(μ, Ω). Then the limiting distribution of XT yT is the same as that of Cy; that is,
    XT yT →L N(Cμ, CΩC′).
Lemma (Cramér δ):
Let {xT} be a sequence of random (n × 1) vectors such that
    T^b (xT − a) →L x
for some b > 0. If g(x) is a real-valued function with gradient g′(a) (= ∂g/∂x′ evaluated at x = a), then
    T^b (g(xT) − g(a)) →L g′(a) x.
Example:
Let {Y1, Y2, ..., YT} be an i.i.d. sample of size T drawn from a distribution with mean μ ≠ 0 and variance σ². Consider the distribution of the reciprocal of the sample mean, ST = 1/ȲT, where ȲT = (1/T) Σ_{t=1}^T Yt. We know from the CLT that √T(ȲT − μ) →L Y, where Y ~ N(0, σ²). Also, g(y) = 1/y is continuous at y = μ. Let g′(μ) (= ∂g/∂y evaluated at y = μ) = (−1/μ²). Then √T[ST − (1/μ)] →L g′(μ)Y; in other words, √T[ST − (1/μ)] →L N(0, σ²/μ⁴).
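This example can be checked by simulation (the normal population with μ = 2, σ = 1 is an illustrative assumption): the standard deviation of √T(ST − 1/μ) should approach σ/μ².

```python
import numpy as np

# Delta-method check for S_T = 1 / Ybar_T: asymptotic sd is sigma / mu^2.
rng = np.random.default_rng(5)
mu, sigma, reps, T = 2.0, 1.0, 10_000, 500
Ybar = rng.normal(mu, sigma, size=(reps, T)).mean(axis=1)
stat = np.sqrt(T) * (1.0 / Ybar - 1.0 / mu)

sd_gap = abs(stat.std() - sigma / mu**2)   # compare with sigma/mu^2 = 0.25
print(stat.std(), sd_gap)
```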
3 Martingales
Some very useful limit theorems pertain to martingale sequences.
Definition:
Let {Xt, t ∈ T} be a stochastic process defined on (S, F, P(·)) and let {Ft} be a sequence of σ-fields, Ft ⊆ F for all t (i.e. {Ft} is an increasing sequence of σ-fields), satisfying the following conditions:
(i). Xt is a random variable relative to Ft for all t ∈ T.
(ii). E(|Xt|) < ∞ for all t ∈ T.
(iii). E(Xt | Ft−1) = Xt−1 for all t ∈ T.
Then {Xt, t ∈ T} is said to be a martingale with respect to {Ft, t ∈ T}.
Example (of an increasing sequence of σ-fields):
Consider two tosses of a coin and define the function X, "the number of heads"; then X({HH}) = 2, X({TH}) = 1, X({HT}) = 1, and X({TT}) = 0. Further we see that X⁻¹(2) = {(HH)}, X⁻¹(1) = {(TH), (HT)} and X⁻¹(0) = {(TT)}. In fact, it can be shown that the σ-field related to the random variable X so defined is
    F = {S, ∅, {(HH)}, {(TT)}, {(TH), (HT)}, {(HH), (TT)}, {(HT), (TH), (HH)}, {(HT), (TH), (TT)}}.
We further define the function X1, "at least one head"; then X1({HH}) = X1({TH}) = X1({HT}) = 1, and X1({TT}) = 0. Further we see that X1⁻¹(1) = {(HH), (TH), (HT)} ∈ F and X1⁻¹(0) = {(TT)} ∈ F. In fact, it can be shown that the σ-field related to the random variable X1 so defined is
    F1 = {S, ∅, {(HH), (TH), (HT)}, {(TT)}}.
Finally we define the function X2, "two heads"; then X2({HH}) = 1, X2({TH}) = X2({HT}) = X2({TT}) = 0. Further we see that X2⁻¹(1) = {(HH)} ∈ F and X2⁻¹(0) = {(TH), (HT), (TT)} ∈ F. In fact, it can be shown that the σ-field related to the random variable X2 so defined is
    F2 = {S, ∅, {(HH)}, {(HT), (TH), (TT)}}.
We see that X = X1 + X2 and find that F1 ⊆ F.
The above example is a special case of a general result where X1, X2, ..., Xn are random variables on the same probability space (S, F, P(·)) and we define the new random variables
    Y1 = X1, Y2 = X1 + X2, Y3 = X1 + X2 + X3, ..., Yn = X1 + X2 + ... + Xn.
If F1, F2, ..., Fn denote the minimal σ-fields generated by Y1, Y2, ..., Yn respectively, we can show that
    F1 ⊆ F2 ⊆ ... ⊆ Fn ⊆ F,
i.e. F1, F2, ..., Fn form an increasing sequence of σ-fields in F.
Several aspects of this definition need comment.
1. A martingale is a relative concept: a stochastic process relative to an increasing sequence of σ-fields, that is, σ-fields such that F1 ⊆ F2 ⊆ ... ⊆ Ft ⊆ ..., where each Xt is a random variable relative to Ft, t ∈ T. A natural choice for such σ-fields is Ft = σ(Xt, Xt−1, ..., X1), t ∈ T.
2. This stochastic process has constant mean because E(Xt) = E[E(Xt | Ft−1)] = E(Xt−1).
3. Condition (iii) implies that E(Xt+τ | Ft−1) = Xt−1 for all t ∈ T and τ ≥ 0. That is, the best predictor of Xt+τ given the information Ft−1 is Xt−1 for any τ ≥ 0.
The importance of martingales stems from the fact that they are general enough to include most forms of stochastic processes of interest in economic modelling as special cases, and restrictive enough to allow the various limit theorems needed for their statistical analysis to go through, thus making probability models based on martingales largely operational.
Example:
Let {Zt, t ∈ T} be a sequence of independent random variables such that E(Zt) = 0 for all t ∈ T. If we define Xt by
    Xt = Σ_{k=1}^t Zk,
then {Xt, t ∈ T} is a martingale with Ft = σ(Zt, Zt−1, ..., Z1) = σ(Xt, Xt−1, ..., X1). This is because conditions (i) and (ii) are automatically satisfied and we can verify that
    E(Xt | Ft−1) = E[(Xt−1 + Zt) | Ft−1] = Xt−1, t ∈ T.
Example:
Let {Zt, t ∈ T} be an arbitrary stochastic process whose only restriction is E(|Zt|) < ∞ for all t ∈ T. If we define Xt by
    Xt = Σ_{k=1}^t [Zk − E(Zk | Fk−1)],
where Fk = σ(Zk, Zk−1, ..., Z1) = σ(Xk, Xk−1, ..., X1), then {Xt, t ∈ T} is a martingale. Condition (iii) can be verified by
    E(Xt | Ft−1) = E[(Xt−1 + Zt − E(Zt | Ft−1)) | Ft−1] = Xt−1 + E(Zt | Ft−1) − E(Zt | Ft−1) = Xt−1, t ∈ T.
The above two examples illustrate the flexibility of martingales very well. As we can see, the main difference between them is that in the first example Xt is a linear function of independent r.v.'s, while in the second example it is a linear function of dependent r.v.'s centred at their conditional means. A special case of the second example is
    Yt = Xt − E(Xt | Ft−1), t ∈ T.
It can be easily verified that {Yt, t ∈ T} defines what is known as a martingale difference process relative to Ft because
    E(Yt | Ft−1) = 0, t ∈ T.
We can further deduce that for t > k
    E(Yt Yk) = E[E(Yt Yk | Ft−1)]   (since for t > k, E(Yk | Ft−1) = Yk)   (1)
             = E[Yk E(Yt | Ft−1)]   (2)
             = E[Yk · 0] = 0.   (3)
That is, a martingale difference {Yt, t ∈ T} is an orthogonal sequence (a special case of uncorrelatedness, since their means are all zero).
Definition:
A stochastic process {Yt, t ∈ T} is said to be a martingale difference process relative to the increasing sequence of σ-fields F1 ⊆ F2 ⊆ ... ⊆ Ft ⊆ ... if
(i). Yt is a random variable relative to Ft for all t ∈ T.
(ii). E(|Yt|) < ∞ for all t ∈ T.
(iii). E(Yt | Ft−1) = 0 for all t ∈ T.
Note that condition (iii) is stronger than the condition that Yt is serially uncorrelated: as we can see from (3), if Yt is a martingale difference then it is uncorrelated. From the point of view of forecasting, a serially uncorrelated sequence cannot be forecast on the basis of a linear function of its past values, since the forecast error and the forecast are all linear functions. For a martingale difference sequence, no function of past values, linear or nonlinear, can forecast it. While stronger than absence of serial correlation, the martingale difference condition is weaker than independence, since it does not rule out the possibility that higher moments such as E(Yt² | Yt−1, Yt−2, ..., Y1) might depend on past Y's.
Example:
If εt ~ i.i.d. N(0, σ²), then Yt = εt εt−1 is a martingale difference but not serially independent, since
    E(Yt | Ft−1) = E(εt εt−1 | εt−1, εt−2, ..., ε1) = εt−1 E(εt) = 0   (martingale difference)
and
    E(Yt² | Ft−1) = E(εt² εt−1² | εt−1, εt−2, ..., ε1) = εt−1² E(εt²) = εt−1² σ²   (a function of past values, so the Yt are not independent).
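A simulation sketch of this example (σ = 1 assumed): the sample autocorrelation of Yt = εt εt−1 is near zero, while the autocorrelation of Yt² is clearly positive, so the sequence is uncorrelated but not independent.

```python
import numpy as np

# Y_t = e_t * e_{t-1}: a martingale difference, hence serially uncorrelated,
# but Y_t^2 is predictable from the past through e_{t-1}^2.
rng = np.random.default_rng(6)
e = rng.normal(0.0, 1.0, size=200_001)
Y = e[1:] * e[:-1]

acf1 = np.corrcoef(Y[1:], Y[:-1])[0, 1]               # near 0
acf1_sq = np.corrcoef(Y[1:] ** 2, Y[:-1] ** 2)[0, 1]  # clearly positive
print(acf1, acf1_sq)
```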
Proposition:
Let X and Y be independent random variables and let U = g(X) and V = h(Y ).
Then U and V are also independent random variables.
4 Laws of Large Numbers
In this section we study the familiar consistent estimator, the sample mean, from the standpoint of strong consistency (which automatically implies weak consistency, or convergence in probability).
The result that the sample mean is a consistent estimator of the population mean is known as the law of large numbers. The laws of large numbers we consider are all of the following form.
Proposition:
Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables {Zt} (you may think of this sequence as a sample of size T),
    Z̄T − μ̄T →a.s. 0,
where
    Z̄T ≡ (1/T) Σ_{t=1}^T Zt and μ̄T ≡ E(Z̄T).
As we shall see, there are sometimes trade-offs among these restrictions; for example, relaxing dependence or heterogeneity restrictions may require strengthening the moment restrictions.
4.1 Independent Identically Distributed Observations
The simplest case is that of independent identically distributed (i.i.d.) random variables.
Theorem (Kolmogorov):
Let {Zt} be a sequence of i.i.d. random variables. Then
    Z̄T →a.s. μ
(which implies Z̄T →p μ) if and only if E|Zt| < ∞ and E(Zt) = μ.
Example:
If we make the stronger assumption that Var(Zt) = σ², then
    E(Z̄T − μ)² = (1/T²) Var(Σ_{t=1}^T Zt) = (1/T²) Σ_{t=1}^T Var(Zt) = σ²/T.
Since σ²/T → 0 as T → ∞, this means that Z̄T →q.m. μ, implying also Z̄T →p μ.
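A Monte Carlo sketch of this example (normal draws with σ = 2 are an illustrative assumption): the simulated mean squared error of Z̄T tracks σ²/T.

```python
import numpy as np

# The Monte Carlo estimate of E(Zbar_T - mu)^2 should be close to sigma^2 / T,
# which shrinks to 0: convergence in quadratic mean of the sample mean.
rng = np.random.default_rng(7)
mu, sigma, reps = 0.0, 2.0, 5000
mse = {}
for T in (10, 100, 1000):
    Zbar = rng.normal(mu, sigma, size=(reps, T)).mean(axis=1)
    mse[T] = float(np.mean((Zbar - mu) ** 2))
print(mse)  # roughly sigma^2 / T at each T
```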
4.2 Independent Heterogeneously Distributed Observations
For cross-sectional data, it is often appropriate to assume that the observations are independent but not identically distributed. A law of large numbers useful in these situations is the following.
Theorem (Revised Markov):
Let {Zt} be a sequence of independent random variables such that E|Zt|^(1+δ) < Δ < ∞ for some δ > 0 and all t. Then
    Z̄T − μ̄T →a.s. 0.
This theorem imposes slightly more in the way of moment restrictions but allows the observations to be rather heterogeneous.
4.3 Dependent Identically Distributed Observations (such as a strongly stationary process)
The assumption of independence is inappropriate for economic time series, which typically exhibit considerable dependence. To cover this case, we need laws of large numbers that allow the random variables to be dependent. As stated below, we need an additional 'memory restriction' as we relax the independence assumption.
Definition:
Let (S, F, P(·)) be a probability space and T an index set of real numbers, and define the function X(·, ·) by X(·, ·) : S × T → R. The ordered sequence of random variables {X(·, t), t ∈ T} is called a stochastic process.
Definition:
A stochastic process {X(·, t), t ∈ T} is said to be (strongly) stationary if for any subset (t1, t2, ..., tT) of T and any τ, F(X(t1), ..., X(tT)) = F(X(t1 + τ), ..., X(tT + τ)).
In terms of the marginal distributions F(X(t)), t ∈ T, stationarity implies that F(X(t)) = F(X(t + τ)), and hence F(X(t1)) = F(X(t2)) = ... = F(X(tT)). That is, stationarity implies that X(t1), ..., X(tT) are individually identically distributed.
Definition:
Let (S, F, P(·)) be a probability space. Let {Zt} be a strongly stationary sequence and let K be the measure-preserving transformation. Then {Zt} is ergodic if
    lim_{T→∞} T^(−1) Σ_{t=1}^T Pr(F ∩ K^t G) = Pr(F) Pr(G)
for all events F, G ∈ F, where K is defined on (S, F, P(·)) such that Z1(s) = Z1(s), Z2(s) = Z1(Ks), Z3(s) = Z1(K²s), ..., ZT(s) = Z1(K^(T−1) s) for all s ∈ S.
We can think of K^t G as being the event G shifted t periods into the future, and since Pr(K^t G) = Pr(G) when K is measure preserving, this definition says that an ergodic process is one such that for any events F and G, F and K^t G are independent on average in the limit. Thus ergodicity can be thought of as a form of "average asymptotic independence".
Theorem (Ergodic Theorem):
Let {Zt} be a strongly stationary ergodic scalar random sequence with E|Zt| < ∞. Then
    Z̄T →a.s. E(Zt).
Lemma:
A stationary linear process is ergodic.
Example:
Let Xt = Σ_{j=0}^∞ φj εt−j, t = 1, 2, ..., where εt−j, j = 0, 1, ..., are i.i.d. random variables with E(εt−j) = 0 and {φj, j ≥ 0} is a sequence of square-summable real numbers. Then Xt is ergodic. (See Wang et al. 2003, p.151.)
In this example we see that if we relax the assumption on Xt to weak stationarity (that is, εt is only a white noise sequence), we need the stronger condition that {φj, j ≥ 0} is a sequence of absolutely summable real numbers to make Xt ergodic (see Hamilton, p.52).
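A simulation sketch of the ergodic theorem for a concrete stationary linear process (the AR(1) with φ = 0.8, i.e. φj = 0.8^j, absolutely summable, is an illustrative assumption): the time average over one long realization is close to E(Xt) = 0.

```python
import numpy as np

# One long path of the stationary AR(1) X_t = 0.8 X_{t-1} + e_t; the ergodic
# theorem says its time average converges to the ensemble mean E(X_t) = 0.
rng = np.random.default_rng(8)
T, phi = 200_000, 0.8
e = rng.normal(0.0, 1.0, size=T)
X = np.empty(T)
X[0] = e[0] / np.sqrt(1 - phi**2)   # start from the stationary distribution
for t in range(1, T):
    X[t] = phi * X[t - 1] + e[t]

time_avg = X.mean()
print(time_avg)
```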
4.4 Dependent Heterogeneously Distributed Observations
By replacing the ergodicity assumption with somewhat stronger conditions, we can apply the consistency results to dependent heterogeneously distributed observations.
Let B_1^t denote the σ-field generated by X1, ..., Xt, where {Xt, t ∈ T} is a stochastic process. A measure of the dependence among the elements of the stochastic process can be defined in terms of the events B ∈ B_1^t and A ∈ B_{t+τ}^∞ by
    α(τ) = sup |P(A ∩ B) − P(A)P(B)|.
Definition:
A stochastic process {Xt, t ∈ T} is said to be strongly (or α-) mixing if α(τ) → 0 as τ → ∞.
A stronger form of mixing, called uniform mixing, can be defined in terms of the following measure of dependence:
    φ(τ) = sup |P(A|B) − P(A)|, P(B) > 0.
Definition:
A stochastic process {Xt, t ∈ T} is said to be uniformly (or φ-) mixing if φ(τ) → 0 as τ → ∞.
The notion of mixing is a stronger memory requirement than that of ergodicity for stationary sequences, since given stationarity, mixing implies ergodicity.
Proposition:
Let {Zt} be a stationary sequence. If α(τ) → 0 as τ → ∞, then {Zt} is ergodic.
Definition:
Let a ∈ R.
(i). If φ(τ) = O(τ^(−a−ε)) for some ε > 0, then φ is of size a.
(ii). If α(τ) = O(τ^(−a−ε)) for some ε > 0, then α is of size a.
This definition allows precise statements about the memory of a random sequence that we shall relate to moment conditions expressed in terms of a. As a gets smaller, the sequence exhibits more and more dependence.
Theorem (Revised McLeish):
Let {Zt} be a random sequence with
(i) E|Zt|^(r+δ) < Δ < ∞ for some δ > 0 and all t, and
(ii) {Zt} α-mixing with α of size r/(r − 1), r > 1, or φ-mixing with φ of size r/(2r − 1), r ≥ 1. Then
    Z̄T − μ̄T →a.s. 0.
For sequences with longer memories, r is greater (since r/(r − 1) = 1 + 1/(r − 1) = a), and the moment restrictions increase accordingly. Hence we have a clear trade-off between the amount of allowable dependence and the sufficient moment restrictions.
4.5 Asymptotically Uncorrelated Observations (such as a weakly stationary ARMA process)
Although mixing is an appealing dependence concept, it shares with ergodicity the property that it can be somewhat difficult to verify theoretically and is impossible to verify empirically. An alternative dependence concept that is easier to verify theoretically is a form of asymptotic non-correlation.
Theorem:
Let {Zt} be an asymptotically uncorrelated scalar sequence with means μt ≡ E(Zt) and σt² ≡ Var(Zt) < ∞. Then
    Z̄T − μ̄T →a.s. 0.
Compared with the last Theorem, we have relaxed the dependence restriction from asymptotic independence (mixing) to asymptotic uncorrelation, but we have altered the moment requirements from restrictions on moments of order r + δ (r ≥ 1, δ > 0) to second moments.
Example (Law of large numbers for a covariance-stationary process):
Let (Y1, Y2, ..., YT) represent a sample of size T from a covariance-stationary process with
    E(Yt) = μ for all t,
    E(Yt − μ)(Yt−j − μ) = γj for all t,
    Σ_{j=0}^∞ |γj| < ∞.
Then
    ȲT →q.m. μ.
Proof:
To see this, it suffices to show that E(ȲT − μ)² → 0. Since
    E(ȲT − μ)² = E[(1/T) Σ_{t=1}^T (Yt − μ)]²
    = (1/T²) E{(Y1 − μ)[(Y1 − μ) + (Y2 − μ) + ... + (YT − μ)]
      + (Y2 − μ)[(Y1 − μ) + (Y2 − μ) + ... + (YT − μ)]
      + (Y3 − μ)[(Y1 − μ) + (Y2 − μ) + ... + (YT − μ)]
      + ... + (YT − μ)[(Y1 − μ) + (Y2 − μ) + ... + (YT − μ)]}
    = (1/T²){[γ0 + γ1 + γ2 + γ3 + ... + γT−1]
      + [γ1 + γ0 + γ1 + γ2 + ... + γT−2]
      + [γ2 + γ1 + γ0 + γ1 + ... + γT−3]
      + ... + [γT−1 + γT−2 + γT−3 + ... + γ0]}
    = (1/T²){T γ0 + 2(T − 1)γ1 + 2(T − 2)γ2 + ... + 2γT−1}
    = (1/T){γ0 + [(T − 1)/T](2γ1) + [(T − 2)/T](2γ2) + ... + [1/T](2γT−1)},
then
    T · E(ȲT − μ)² = |γ0 + [(T − 1)/T](2γ1) + [(T − 2)/T](2γ2) + ... + [1/T](2γT−1)|   (4)
    ≤ {|γ0| + [(T − 1)/T] 2|γ1| + [(T − 2)/T] 2|γ2| + ... + [1/T] 2|γT−1|}
    ≤ {|γ0| + 2|γ1| + 2|γ2| + ...}
    < ∞.
So E(ȲT − μ)² ≤ (1/T){|γ0| + 2|γ1| + 2|γ2| + ...} → 0.
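The finite-T expression for T·E(ȲT − μ)² derived above can be evaluated exactly for a concrete process (the AR(1) autocovariances γj = φ^j/(1 − φ²) are an illustrative assumption), confirming both that it is bounded and that it approaches Σ_{j=−∞}^∞ γj = 1/(1 − φ)²:

```python
import numpy as np

# Exact evaluation of T*var(Ybar_T) = gamma_0 + sum_j 2[(T-j)/T] gamma_j for an
# AR(1) with unit innovation variance, versus the limiting sum of all gamma_j.
phi, T = 0.5, 500
gamma = phi ** np.arange(T) / (1 - phi**2)            # gamma_0 ... gamma_{T-1}
weights = np.concatenate(([1.0], 2 * (T - np.arange(1, T)) / T))
T_var = float(weights @ gamma)                        # T * var(Ybar_T)
lrv = 1.0 / (1 - phi) ** 2                            # sum of gamma_j over all j
print(T_var, lrv)
```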
4.6 Martingale Difference Sequences
A law of large numbers for martingale difference sequences is the following theorem.
Theorem (Revised Chow):
Let {Zt, Ft} be a martingale difference sequence such that E|Zt|^(2r) < Δ < ∞ for some r ≥ 1 and all t. Then
    Z̄T →a.s. 0.
5 Central Limit Theory
In this section we study various forms of the central limit theorem (CLT) from the standpoint of convergence in distribution.
The central limit theorems we consider are all of the following form:
Proposition:
Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables {Zt} (you may think of this sequence as a sample of size T),
    (Z̄T − μ̄T) / (σ̄T/√T) = √T(Z̄T − μ̄T)/σ̄T →L N(0, 1),
where
    Z̄T ≡ (1/T) Σ_{t=1}^T Zt, μ̄T ≡ E(Z̄T), and σ̄T²/T ≡ Var(Z̄T) (that is, σ̄T² = Var(Σ_{t=1}^T Zt)/T).
As with the laws of large numbers, there are natural trade-offs among these restrictions. Typically, greater dependence or heterogeneity is allowed at the expense of strengthening the moment restrictions.
5.1 Independent Identically Distributed Observations
As with laws of large numbers, the case of i.i.d. observations is the simplest.
Theorem (Lindeberg-Lévy):
Let {Zt} be a sequence of i.i.d. random scalars with μ ≡ E(Zt) and σ² ≡ Var(Zt) < ∞. If σ² ≠ 0, then
    √T(Z̄T − μ̄T)/σ̄T = √T(Z̄T − μ)/σ = T^(−1/2) Σ_{t=1}^T (Zt − μ)/σ →L N(0, 1).
Compared with the law of large numbers for i.i.d. observations, we impose a single additional requirement, i.e., that σ² ≡ Var(Zt) < ∞. Note that this implies that E|Zt| < ∞.
Proposition:
If the kth moment of a random variable exists, all moments of order less than k exist.
Proof:
Let fX(x) be the pdf of X. E(X^k) exists if and only if
    ∫ |x|^k fX(x) dx < ∞.
Let 1 ≤ j < k. To prove the proposition we must show that
    ∫ |x|^j fX(x) dx < ∞.
But
    ∫ |x|^j fX(x) dx = ∫_{|x|≤1} |x|^j fX(x) dx + ∫_{|x|>1} |x|^j fX(x) dx
    ≤ ∫_{|x|≤1} fX(x) dx + ∫_{|x|>1} |x|^j fX(x) dx
    ≤ 1 + ∫_{|x|>1} |x|^j fX(x) dx
    ≤ 1 + ∫_{|x|>1} |x|^k fX(x) dx < ∞.
5.2 Independent Heterogeneously Distributed Observations
Several different central limit theorems are available for the case in which our observations are not identically distributed.
Theorem (Liapounov, revised Lindeberg-Feller):
Let {Zt} be a sequence of independent random variables such that μt ≡ E(Zt), σt² ≡ Var(Zt) and E|Zt − μt|^(2+δ) < Δ < ∞ for some δ > 0 and all t. If σ̄T² > δ′ > 0 for all T sufficiently large, then
    √T(Z̄T − μ̄T)/σ̄T →L N(0, 1).
Note that E|Zt|^(2+δ) < Δ also implies that E|Zt − μt|^(2+δ) is uniformly bounded. Note also the analogy with previous results: there we obtained a law of large numbers for independent random variables by imposing a uniform bound on E|Zt|^(1+δ); now we obtain a central limit theorem by imposing a uniform bound on E|Zt|^(2+δ).
5.3 Dependent Identically Distributed Observations
In the last two sections we saw that obtaining central limit theorems for independent processes typically required strengthening the moment restrictions beyond what was sufficient for obtaining laws of large numbers. In the class of stationary ergodic processes, not only will we strengthen the moment requirements, but we will also impose stronger conditions on the memory of the process.
Theorem (Scott):
Let {Zt, Ft} be a stationary ergodic adapted mixingale with mixingale coefficients γm of size 1. Then σ̄T² ≡ Var(T^(−1/2) Σ_{t=1}^T Zt) → σ̄² < ∞ as T → ∞, and if σ̄² > 0, then
    √T Z̄T/σ̄ →L N(0, 1).
5.4 Dependent Heterogeneously Distributed Observations
Theorem (Wooldridge-White):
Let {Zt} be a scalar random sequence with μt ≡ E(Zt) and σt² ≡ Var(Zt) such that E|Zt|^r < Δ < ∞ for some r ≥ 2 and all t, having mixing coefficients φ of size r/(2(r − 1)) or α of size r/(r − 2), r > 2. If σ̄T² ≡ Var(T^(−1/2) Σ_{t=1}^T Zt) > δ > 0 for all T sufficiently large, then √T(Z̄T − μ̄T)/σ̄T →L N(0, 1).
5.5 Asymptotically Uncorrelated Observations (such as a stationary ARMA process)
We now present a central limit theorem for a serially correlated sequence.
Theorem:
Let
    Yt = μ + Σ_{j=0}^∞ φj εt−j,
where {εt} is a sequence of i.i.d. random variables with E(εt²) < ∞ and Σ_{j=0}^∞ |φj| < ∞. Then
    √T(ȲT − μ) →L N(0, Σ_{j=−∞}^∞ γj).
Proof:
Given the general form of the CLT, it suffices to show that σ̄T² (= T · Var(ȲT) = [Var(Σ_{t=1}^T Yt)]/T) → Σ_{j=−∞}^∞ γj.
Note that the assumption Σ_{j=0}^∞ |φj| < ∞ implies that Σ_{j=0}^∞ |γj| < ∞, which means that for any δ > 0 there exists a q such that
    2|γq+1| + 2|γq+2| + 2|γq+3| + ... < δ/2.
From (4) we have
    |Σ_{j=−∞}^∞ γj − T · Var(ȲT)|
    = |{γ0 + 2γ1 + 2γ2 + 2γ3 + ...} − {γ0 + [(T − 1)/T]2γ1 + [(T − 2)/T]2γ2 + ... + [1/T]2γT−1}|
    ≤ (1/T) 2|γ1| + (2/T) 2|γ2| + (3/T) 2|γ3| + ... + (q/T) 2|γq| + 2|γq+1| + 2|γq+2| + 2|γq+3| + ...
    ≤ (1/T) 2|γ1| + (2/T) 2|γ2| + (3/T) 2|γ3| + ... + (q/T) 2|γq| + δ/2.
Moreover, for this given q, we can find an N such that
    (1/T) 2|γ1| + (2/T) 2|γ2| + (3/T) 2|γ3| + ... + (q/T) 2|γq| < δ/2
for all T ≥ N, ensuring that
    |Σ_{j=−∞}^∞ γj − T · Var(ȲT)| < δ.
This completes the proof.
5.6 Martingale Difference Sequences
Theorem:
Let {Yt} be a scalar martingale difference sequence with ȲT = (1/T) Σ_{t=1}^T Yt. Suppose that
(i). E(Yt²) = σt² > 0 with (1/T) Σ_{t=1}^T σt² → σ² > 0,
(ii). E|Yt|^r < ∞ for some r > 2 and all t, and
(iii). (1/T) Σ_{t=1}^T Yt² →p σ².
Then √T ȲT →L N(0, σ²).
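A closing simulation sketch of this theorem, reusing the earlier martingale difference Yt = εt εt−1 with εt i.i.d. N(0, 1) (so σ² = E(Yt²) = 1): √T ȲT should be approximately N(0, 1) even though the Yt are dependent.

```python
import numpy as np

# Martingale-difference CLT check: Y_t = e_t * e_{t-1} has E(Y_t | past) = 0
# and E(Y_t^2) = 1, so sqrt(T) * Ybar_T should be roughly N(0, 1).
rng = np.random.default_rng(9)
reps, T = 5000, 1000
e = rng.normal(0.0, 1.0, size=(reps, T + 1))
Y = e[:, 1:] * e[:, :-1]
stat = np.sqrt(T) * Y.mean(axis=1)

m, s = stat.mean(), stat.std()
print(m, s)
```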