Ch. 4 Asymptotic Theory

From the discussion of the last chapter it is obvious that determining the distribution of $h(X_1, X_2, \ldots, X_T)$ is by no means a trivial exercise. It turns out that more often than not we cannot determine the distribution exactly. Because of the importance of the problem, however, we are forced to develop approximations; that is the subject of this chapter.

This chapter covers the limit theorems. The term 'limit theorems' refers to several theorems of probability theory that go under the generic names 'laws of large numbers' (LLN) and 'central limit theorems' (CLT). These limit theorems constitute one of the most important and elegant chapters of probability theory and play a crucial role in statistical inference.

1 Consistency

In this section we introduce the concepts needed to analyze the behavior of a random variable indexed by the size of a sample, say $\hat{\theta}_T$, as $T \to \infty$.

1.1 Limits

Definition:
Let $\{b_T\}_{T=1}^{\infty}$, or just $\{b_T\}$, be a sequence of real numbers. If there exists a real number $b$ such that for every $\varepsilon > 0$ there exists an integer $N(\varepsilon)$ such that for all $T \geq N(\varepsilon)$, $|b_T - b| < \varepsilon$, then $b$ is the limit of the sequence $\{b_T\}$.

In this definition the constant $\varepsilon$ can take on any real value, but it is the very small values of $\varepsilon$ that give the definition its impact. By choosing a very small $\varepsilon$, we ensure that $b_T$ gets arbitrarily close to its limit $b$ for all $T$ that are sufficiently large. When a limit exists, we say that the sequence $\{b_T\}$ converges to $b$ as $T$ tends to infinity, written $b_T \to b$ as $T \to \infty$. We also write $b = \lim_{T\to\infty} b_T$. When no ambiguity is possible, we simply write $b_T \to b$ or $b = \lim b_T$.

Example:
Let
$$a_T = \frac{2^T - (-1)^T}{2^T}.$$
Here $1 = \lim_{T\to\infty} a_T$, for
$$|a_T - 1| = \left|\frac{2^T - (-1)^T}{2^T} - 1\right| = \frac{1}{2^T}.$$
By the binomial theorem we have
$$2^T = (1+1)^T = 1 + T + \frac{T(T-1)}{2} + \cdots > T.$$
Hence, if we choose $N \geq 1/\varepsilon$, we have, for $T > N$,
$$|a_T - 1| = \frac{1}{2^T} < \frac{1}{T} < \frac{1}{N} \leq \varepsilon.$$
This completes the solution.

The concept of a limit extends directly to sequences of real vectors.
Let $\mathbf{b}_T$ be a $k \times 1$ vector with real elements $b_{Ti}$, $i = 1, \ldots, k$. If $b_{Ti} \to b_i$, $i = 1, \ldots, k$, then $\mathbf{b}_T \to \mathbf{b}$, where $\mathbf{b}$ has elements $b_i$, $i = 1, \ldots, k$. An analogous extension applies to matrices.

Definition:
Given $g: \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and $\mathbf{b} \in \mathbb{R}^k$, the function $g$ is continuous at $\mathbf{b}$ if for any sequence $\{\mathbf{b}_T\}$ such that $\mathbf{b}_T \to \mathbf{b}$, $g(\mathbf{b}_T) \to g(\mathbf{b})$.

The following definition compares the behavior of a sequence $\{b_T\}$ with the behavior of a power of $T$, say $T^{\lambda}$, where $\lambda$ is chosen so that $\{b_T\}$ and $\{T^{\lambda}\}$ behave similarly.

Definition:
(i) The sequence $\{b_T\}$ is at most of order $T^{\lambda}$, denoted $b_T = O(T^{\lambda})$, if for some finite real number $\Delta > 0$ there exists a finite integer $N$ such that for all $T \geq N$, $|T^{-\lambda} b_T| < \Delta$.
(ii) The sequence $\{b_T\}$ is of order smaller than $T^{\lambda}$, denoted $b_T = o(T^{\lambda})$, if for every real number $\varepsilon > 0$ there exists a finite integer $N(\varepsilon)$ such that for all $T \geq N(\varepsilon)$, $|T^{-\lambda} b_T| < \varepsilon$, i.e., $T^{-\lambda} b_T \to 0$.

As we have defined these notations, $b_T = O(T^{\lambda})$ if $\{T^{-\lambda} b_T\}$ is eventually bounded, whereas $b_T = o(T^{\lambda})$ if $T^{-\lambda} b_T \to 0$. Obviously, if $b_T = o(T^{\lambda})$, then $b_T = O(T^{\lambda})$. Further, if $b_T = O(T^{\lambda})$, then for every $\delta > 0$, $b_T = o(T^{\lambda + \delta})$. When $b_T = O(T^0)$, it is simply (eventually) bounded and may or may not have a limit. We often write $O(1)$ in place of $O(T^0)$. Similarly, $b_T = o(1)$ means $b_T \to 0$. If each element of a vector or matrix is $O(T^{\lambda})$ or $o(T^{\lambda})$, then that vector or matrix is $O(T^{\lambda})$ or $o(T^{\lambda})$.

Proposition:
Let $a_T$ and $b_T$ be scalars.
(i) If $a_T = O(T^{\lambda})$ and $b_T = O(T^{\mu})$, then $a_T b_T = O(T^{\lambda + \mu})$ and $a_T + b_T = O(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.
(ii) If $a_T = o(T^{\lambda})$ and $b_T = o(T^{\mu})$, then $a_T b_T = o(T^{\lambda + \mu})$ and $a_T + b_T = o(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.
(iii) If $a_T = O(T^{\lambda})$ and $b_T = o(T^{\mu})$, then $a_T b_T = o(T^{\lambda + \mu})$ and $a_T + b_T = O(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.

1.2 Almost Sure Convergence

The stochastic convergence concept most closely related to the limit notions previously discussed is that of almost sure convergence. Recall that in discussing a real-valued random variable $b_T$, we are in fact talking about a mapping $b_T: S \to \mathbb{R}$.
We let $s$ be a typical element of the sample space $S$ and call the real number $b_T(s)$ a realization of the random variable. Interest will often center on averages such as
$$\bar{b}_T(\cdot) = T^{-1} \sum_{t=1}^{T} Z_t(\cdot).$$

Definition:
Let $\{b_T(\cdot)\}$ be a sequence of real-valued random variables. We say that $b_T(\cdot)$ converges almost surely to $b$, written $b_T(\cdot) \xrightarrow{a.s.} b$, if there exists a real number $b$ such that $\Pr\{s : b_T(s) \to b\} = 1$. When no ambiguity is possible, we may simply write $b_T \xrightarrow{a.s.} b$.

A sequence $b_T$ converges almost surely if the probability of obtaining a realization of the sequence $\{Z_t\}$ for which convergence to $b$ occurs is unity. Equivalently, the probability of observing a realization of $\{Z_t\}$ for which convergence to $b$ does not occur is zero. Failure to converge is possible but will almost never happen under this definition.

Proposition:
Given $g: \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\mathbf{b}_T$ such that $\mathbf{b}_T \xrightarrow{a.s.} \mathbf{b}$, where $\mathbf{b}$ is $k \times 1$, if $g$ is continuous at $\mathbf{b}$, then $g(\mathbf{b}_T) \xrightarrow{a.s.} g(\mathbf{b})$.

This result is one of the most important in this chapter, because consistency results for many of our estimators follow by simply applying this proposition.

1.3 Convergence in Probability

A weaker stochastic convergence concept is that of convergence in probability.

Definition:
Let $\{b_T\}$ be a sequence of real-valued random variables. If there exists a real number $b$ such that for every $\varepsilon > 0$,
$$\Pr(s : |b_T(s) - b| < \varepsilon) \to 1$$
as $T \to \infty$, then $b_T$ converges in probability to $b$, written $b_T \xrightarrow{p} b$ or $\operatorname{plim} b_T = b$.

Example:
Let $\bar{Z}_T \equiv T^{-1} \sum_{t=1}^{T} Z_t$, where $\{Z_t\}$ is a sequence of random variables such that $E(Z_t) = \mu$, $\operatorname{Var}(Z_t) = \sigma^2 < \infty$ for all $t$, and $\operatorname{Cov}(Z_t, Z_{\tau}) = 0$ for $t \neq \tau$. Then $\bar{Z}_T \xrightarrow{p} \mu$ by the Chebyshev weak law of large numbers. See the plot of Hamilton p. 184.

When the plim of a sequence of estimators (such as $\{\bar{Z}_T\}_{T=1}^{\infty}$) is equal to the true population parameter (in this case, $\mu$), the estimator is said to be consistent.
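The Chebyshev weak law of large numbers in the example above can be illustrated numerically. The sketch below is my own illustration, not part of the original text; the choice of $N(\mu, 1)$ draws, the sample sizes, and the tolerance $\varepsilon = 0.1$ are arbitrary. It estimates $\Pr(|\bar{Z}_T - \mu| < \varepsilon)$ by Monte Carlo and checks that the probability rises toward one as $T$ grows, which is exactly what convergence in probability requires.

```python
import random

def prob_within(T, mu=0.0, eps=0.1, reps=2000, seed=0):
    """Monte Carlo estimate of Pr(|Z_bar_T - mu| < eps) for i.i.d. N(mu, 1) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        zbar = sum(rng.gauss(mu, 1.0) for _ in range(T)) / T
        if abs(zbar - mu) < eps:
            hits += 1
    return hits / reps

# Convergence in probability: the estimated probability rises toward 1 as T grows.
p_small = prob_within(T=25)
p_large = prob_within(T=2500)
```

For $T = 25$ the sample mean still misses the $\varepsilon$-band fairly often, while for $T = 2500$ essentially every realization lands inside it.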
Convergence in probability is also referred to as weak consistency, and since this has been the most familiar stochastic convergence concept in econometrics, the word "weak" is often simply dropped.

Theorem:
Let $\{b_T\}$ be a sequence of real-valued random variables. If $b_T \xrightarrow{a.s.} b$, then $b_T \xrightarrow{p} b$.

Proposition:
Given $g: \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\mathbf{b}_T$ such that $\mathbf{b}_T \xrightarrow{p} \mathbf{b}$, where $\mathbf{b}$ is $k \times 1$, if $g$ is continuous at $\mathbf{b}$, then $g(\mathbf{b}_T) \xrightarrow{p} g(\mathbf{b})$.

Example:
If $X_{1T} \xrightarrow{p} c_1$ and $X_{2T} \xrightarrow{p} c_2$, then $(X_{1T} + X_{2T}) \xrightarrow{p} (c_1 + c_2)$. This follows immediately, since $g(X_{1T}, X_{2T}) \equiv (X_{1T} + X_{2T})$ is a continuous function of $(X_{1T}, X_{2T})$.

Example:
Consider an alternative estimator of the mean given by $\bar{Y}_T^{*} = [1/(T-1)] \sum_{t=1}^{T} Y_t$. This can be written as $c_{1T} \bar{Y}_T$, where $c_{1T} \equiv [T/(T-1)]$ and $\bar{Y}_T \equiv (1/T) \sum_{t=1}^{T} Y_t$. Under general conditions the sample mean is a consistent estimator of the population mean, implying that $\bar{Y}_T \xrightarrow{p} \mu$. It is also easy to verify that $c_{1T} \to 1$. Since $c_{1T} \bar{Y}_T$ is a continuous function of $c_{1T}$ and $\bar{Y}_T$, it follows that $c_{1T} \bar{Y}_T \xrightarrow{p} 1 \cdot \mu = \mu$. Thus $\bar{Y}_T^{*}$ is also a consistent estimator of $\mu$.

Definition:
(i) The sequence $\{b_T\}$ is at most of order $T^{\lambda}$ in probability, denoted $b_T = O_p(T^{\lambda})$, if for every $\varepsilon > 0$ there exist a finite $\Delta_{\varepsilon} > 0$ and $N_{\varepsilon} \in \mathbb{N}$ such that for all $T \geq N_{\varepsilon}$, $\Pr\{s : |T^{-\lambda} b_T(s)| > \Delta_{\varepsilon}\} < \varepsilon$.
(ii) The sequence $\{b_T\}$ is of order smaller than $T^{\lambda}$ in probability, denoted $b_T = o_p(T^{\lambda})$, if $T^{-\lambda} b_T \xrightarrow{p} 0$.

Lemma (Product rule):
Let $\mathbf{A}_T$ be $l \times k$ and let $\mathbf{b}_T$ be $k \times 1$. If $\mathbf{A}_T = o_p(1)$ and $\mathbf{b}_T = O_p(1)$, then $\mathbf{A}_T \mathbf{b}_T = o_p(1)$.
Proof: Each element of $\mathbf{A}_T \mathbf{b}_T$ is a sum of products of the form $O_p(T^0) o_p(T^0) = o_p(T^{0+0}) = o_p(1)$ and is therefore $o_p(1)$.

1.4 Convergence in rth Mean

A stronger condition than convergence in probability is mean square convergence.

Definition:
Let $\{b_T\}$ be a sequence of real-valued random variables such that for some $r > 0$, $E|b_T|^r < \infty$. If there exists a real number $b$ such that $E(|b_T - b|^r) \to 0$ as $T \to \infty$, then $b_T$ converges in the $r$th mean to $b$, written $b_T \xrightarrow{r.m.} b$.
The most commonly encountered situation is that in which $r = 2$, in which case convergence is said to occur in quadratic mean, denoted $b_T \xrightarrow{q.m.} b$, or in mean square, denoted $b_T \xrightarrow{m.s.} b$.

Proposition (Generalized Chebyshev inequality):
Let $Z$ be a random variable such that $E|Z|^r < \infty$, $r > 0$. Then for every $\varepsilon > 0$,
$$\Pr(|Z| > \varepsilon) \leq \frac{E|Z|^r}{\varepsilon^r}.$$
When $r = 1$ we have Markov's inequality, and when $r = 2$ we have the familiar Chebyshev inequality.

Theorem:
If $b_T \xrightarrow{r.m.} b$ for some $r > 0$, then $b_T \xrightarrow{p} b$.
Proof: Since $E(|b_T - b|^r) \to 0$ as $T \to \infty$, $E(|b_T - b|^r) < \infty$ for all $T$ sufficiently large. It follows from the generalized Chebyshev inequality that, for every $\varepsilon > 0$,
$$\Pr(s : |b_T(s) - b| > \varepsilon) \leq \frac{E|b_T - b|^r}{\varepsilon^r}.$$
Hence
$$\Pr(s : |b_T(s) - b| < \varepsilon) \geq 1 - \frac{E|b_T - b|^r}{\varepsilon^r} \to 1$$
as $T \to \infty$, since $b_T \xrightarrow{r.m.} b$. It follows that $b_T \xrightarrow{p} b$.

Without further conditions, no necessary relationship holds between convergence in the $r$th mean and almost sure convergence.

2 Convergence in Distribution

The most fundamental concept is that of convergence in distribution.

Definition:
Let $\{b_T\}$ be a sequence of scalar random variables with cumulative distribution functions $\{F_T\}$. If $F_T(z) \to F(z)$ as $T \to \infty$ for every continuity point $z$, where $F$ is the (cumulative) distribution function of a random variable $Z$, then $b_T$ converges in distribution to the random variable $Z$, written $b_T \xrightarrow{d} Z$.

When $b_T \xrightarrow{d} Z$, we also say that $b_T$ converges in law to $Z$, written $b_T \xrightarrow{L} Z$, or that $b_T$ is asymptotically distributed as $F$, denoted $b_T \stackrel{A}{\sim} F$. Then $F$ is called the limiting distribution of $b_T$.

Example:
Let $\{Z_t\}$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Define
$$b_T \equiv \frac{\bar{Z}_T - E(\bar{Z}_T)}{(\operatorname{Var}(\bar{Z}_T))^{1/2}} = \frac{T^{-1/2} \sum_{t=1}^{T} (Z_t - \mu)}{\sigma} = \frac{\sqrt{T}(\bar{Z}_T - \mu)}{\sigma}.$$
Then by the Lindeberg-Levy central limit theorem, $b_T \stackrel{A}{\sim} N(0, 1)$. See the plot of Hamilton p. 185.

The above definitions are unchanged if the scalar $b_T$ is replaced by a $(k \times 1)$ vector $\mathbf{b}_T$. A simple way to verify convergence in distribution of a vector is the following.
Proposition (Cramer-Wold device):
Let $\{\mathbf{b}_T\}$ be a sequence of random $k \times 1$ vectors and suppose that for every real $k \times 1$ vector $\boldsymbol{\lambda}$ (such that $\boldsymbol{\lambda}'\boldsymbol{\lambda} = 1$), the scalar $\boldsymbol{\lambda}'\mathbf{b}_T \stackrel{A}{\sim} \boldsymbol{\lambda}'\mathbf{z}$, where $\mathbf{z}$ is a $k \times 1$ vector with joint (cumulative) distribution function $F$. Then the limiting distribution function of $\mathbf{b}_T$ exists and equals $F$.

Lemma:
If $b_T \xrightarrow{L} Z$, then $b_T = O_p(1)$.

Lemma (Product rule):
Recall that if $\mathbf{A}_T = o_p(1)$ and $\mathbf{b}_T = O_p(1)$, then $\mathbf{A}_T \mathbf{b}_T = o_p(1)$. Hence, if $\mathbf{A}_T \xrightarrow{p} \mathbf{0}$ and $\mathbf{b}_T \xrightarrow{d} \mathbf{Z}$, then $\mathbf{A}_T \mathbf{b}_T \xrightarrow{p} \mathbf{0}$.

Lemma (Asymptotic equivalence):
Let $\{\mathbf{a}_T\}$ and $\{\mathbf{b}_T\}$ be two sequences of random vectors. If $\mathbf{a}_T - \mathbf{b}_T \xrightarrow{p} \mathbf{0}$ and $\mathbf{b}_T \xrightarrow{d} \mathbf{Z}$, then $\mathbf{a}_T \xrightarrow{d} \mathbf{Z}$.

This result is helpful in situations in which we wish to find the asymptotic distribution of $\mathbf{a}_T$ but cannot do so directly. Often, however, it is easy to find a $\mathbf{b}_T$ that has a known asymptotic distribution and that satisfies $\mathbf{a}_T - \mathbf{b}_T \xrightarrow{p} \mathbf{0}$. This lemma then ensures that $\mathbf{a}_T$ has the same limiting distribution as $\mathbf{b}_T$, and we say that $\mathbf{a}_T$ is "asymptotically equivalent" to $\mathbf{b}_T$.

Lemma:
Given $g: \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\mathbf{b}_T$ such that $\mathbf{b}_T \xrightarrow{L} \mathbf{z}$, where $\mathbf{z}$ is $k \times 1$, if $g$ is continuous (not dependent on $T$) at $\mathbf{z}$, then $g(\mathbf{b}_T) \xrightarrow{L} g(\mathbf{z})$.

Example:
Suppose that $X_T \xrightarrow{L} N(0, 1)$. Then the square of $X_T$ asymptotically behaves as the square of an $N(0, 1)$ variable: $X_T^2 \xrightarrow{L} \chi^2(1)$.

Lemma:
Let $\{\mathbf{x}_T\}$ be a sequence of random $(n \times 1)$ vectors with $\mathbf{x}_T \xrightarrow{p} \mathbf{c}$, and let $\{\mathbf{y}_T\}$ be a sequence of random $(n \times 1)$ vectors with $\mathbf{y}_T \xrightarrow{L} \mathbf{y}$. Then the sequence constructed from the sum $\{\mathbf{x}_T + \mathbf{y}_T\}$ converges in distribution to $\mathbf{c} + \mathbf{y}$, and the sequence constructed from the product $\{\mathbf{x}_T'\mathbf{y}_T\}$ converges in distribution to $\mathbf{c}'\mathbf{y}$.

Example:
Let $\{\mathbf{X}_T\}$ be a sequence of random $(m \times n)$ matrices with $\mathbf{X}_T \xrightarrow{p} \mathbf{C}$, and let $\{\mathbf{y}_T\}$ be a sequence of random $(n \times 1)$ vectors with $\mathbf{y}_T \xrightarrow{L} \mathbf{y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Omega})$. Then the limiting distribution of $\mathbf{X}_T \mathbf{y}_T$ is the same as that of $\mathbf{C}\mathbf{y}$; that is,
$$\mathbf{X}_T \mathbf{y}_T \xrightarrow{L} N(\mathbf{C}\boldsymbol{\mu}, \mathbf{C}\boldsymbol{\Omega}\mathbf{C}').$$

Lemma (Cramer delta method):
Let $\{\mathbf{x}_T\}$ be a sequence of random $(n \times 1)$ vectors such that $T^b(\mathbf{x}_T - \mathbf{a}) \xrightarrow{L} \mathbf{x}$ for some $b > 0$.
If $g(\mathbf{x})$ is a real-valued function with gradient $g'(\mathbf{a})\ (= \partial g/\partial \mathbf{x}'|_{\mathbf{x}=\mathbf{a}})$, then
$$T^b(g(\mathbf{x}_T) - g(\mathbf{a})) \xrightarrow{L} g'(\mathbf{a})\mathbf{x}.$$

Example:
Let $\{Y_1, Y_2, \ldots, Y_T\}$ be an i.i.d. sample of size $T$ drawn from a distribution with mean $\mu \neq 0$ and variance $\sigma^2$. Consider the distribution of the reciprocal of the sample mean, $S_T = 1/\bar{Y}_T$, where $\bar{Y}_T = (1/T)\sum_{t=1}^{T} Y_t$. We know from the CLT that $\sqrt{T}(\bar{Y}_T - \mu) \xrightarrow{L} Y$, where $Y \sim N(0, \sigma^2)$. Also, $g(y) = 1/y$ is continuous at $y = \mu$, with $g'(\mu)\ (= \partial g/\partial y|_{y=\mu}) = -1/\mu^2$. Then
$$\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} g'(\mu)Y;$$
in other words, $\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} N(0, \sigma^2/\mu^4)$.

3 Martingales

Some very useful limit theorems pertain to martingale sequences.

Definition:
Let $\{X_t, t \in \mathcal{T}\}$ be a stochastic process defined on $(S, \mathcal{F}, P(\cdot))$ and let $\{\mathcal{F}_t\}$ be a sequence of $\sigma$-fields with $\mathcal{F}_t \subset \mathcal{F}$ for all $t$ (i.e., $\{\mathcal{F}_t\}$ is an increasing sequence of $\sigma$-fields) satisfying the following conditions:
(i) $X_t$ is a random variable relative to $\mathcal{F}_t$ for all $t \in \mathcal{T}$.
(ii) $E(|X_t|) < \infty$ for all $t \in \mathcal{T}$.
(iii) $E(X_t | \mathcal{F}_{t-1}) = X_{t-1}$ for all $t \in \mathcal{T}$.
Then $\{X_t, t \in \mathcal{T}\}$ is said to be a martingale with respect to $\{\mathcal{F}_t, t \in \mathcal{T}\}$.

Example (of an increasing sequence of $\sigma$-fields):
Consider tossing a coin twice, with sample space $S = \{HH, HT, TH, TT\}$. Define the function $X$, "the number of heads"; then $X(\{HH\}) = 2$, $X(\{TH\}) = 1$, $X(\{HT\}) = 1$, and $X(\{TT\}) = 0$. Further we see that $X^{-1}(2) = \{(HH)\}$, $X^{-1}(1) = \{(TH), (HT)\}$, and $X^{-1}(0) = \{(TT)\}$. In fact, it can be shown that the $\sigma$-field related to the random variable $X$ so defined is
$$\mathcal{F} = \{S, \varnothing, \{(HH)\}, \{(TT)\}, \{(TH), (HT)\}, \{(HH), (TT)\}, \{(HT), (TH), (HH)\}, \{(HT), (TH), (TT)\}\}.$$
We further define the function $X_1$, "at least one head"; then $X_1(\{HH\}) = X_1(\{TH\}) = X_1(\{HT\}) = 1$ and $X_1(\{TT\}) = 0$. Further we see that $X_1^{-1}(1) = \{(HH), (TH), (HT)\} \in \mathcal{F}$ and $X_1^{-1}(0) = \{(TT)\} \in \mathcal{F}$. In fact, it can be shown that the $\sigma$-field related to the random variable $X_1$ so defined is
$$\mathcal{F}_1 = \{S, \varnothing, \{(HH), (TH), (HT)\}, \{(TT)\}\}.$$
Finally we define the function $X_2$, "two heads"; then $X_2(\{HH\}) = 1$ and $X_2(\{TH\}) = X_2(\{HT\}) = X_2(\{TT\}) = 0$. Further we see that $X_2^{-1}(1) = \{(HH)\} \in \mathcal{F}$ and $X_2^{-1}(0) = \{(TH), (HT), (TT)\} \in \mathcal{F}$.
In fact, it can be shown that the $\sigma$-field related to the random variable $X_2$ so defined is
$$\mathcal{F}_2 = \{S, \varnothing, \{(HH)\}, \{(HT), (TH), (TT)\}\}.$$
We see that $X = X_1 + X_2$ and find that $\mathcal{F}_1 \subset \mathcal{F}$.

The above example is a special case of a general result, where $X_1, X_2, \ldots, X_n$ are random variables on the same probability space $(S, \mathcal{F}, P(\cdot))$ and we define the new random variables
$$Y_1 = X_1,\quad Y_2 = X_1 + X_2,\quad Y_3 = X_1 + X_2 + X_3,\quad \ldots,\quad Y_n = X_1 + X_2 + \cdots + X_n.$$
If $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n$ denote the minimal $\sigma$-fields generated by $Y_1, Y_2, \ldots, Y_n$ respectively, we can show that
$$\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_n \subset \mathcal{F};$$
i.e., $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_n$ form an increasing sequence of $\sigma$-fields in $\mathcal{F}$.

Several aspects of this definition need commenting on.
1. A martingale is a relative concept: a stochastic process relative to an increasing sequence of $\sigma$-fields, that is, $\sigma$-fields such that $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_t \subset \cdots$, where each $X_t$ is a random variable relative to $\mathcal{F}_t$, $t \in \mathcal{T}$. A natural choice for such $\sigma$-fields is $\mathcal{F}_t = \sigma(X_t, X_{t-1}, \ldots, X_1)$, $t \in \mathcal{T}$.
2. A martingale has constant mean, because $E(X_t) = E[E(X_t | \mathcal{F}_{t-1})] = E(X_{t-1})$.
3. Condition (iii) implies that $E(X_{t+\tau} | \mathcal{F}_{t-1}) = X_{t-1}$ for all $t \in \mathcal{T}$ and $\tau \geq 0$. That is, the best predictor of $X_{t+\tau}$ given the information $\mathcal{F}_{t-1}$ is $X_{t-1}$ for any $\tau \geq 0$.

The importance of martingales stems from the fact that they are general enough to include most forms of stochastic processes of interest in economic modelling as special cases, yet restrictive enough to allow the various limit theorems needed for their statistical analysis to go through, thus making probability models based on martingales largely operational.

Example:
Let $\{Z_t, t \in \mathcal{T}\}$ be a sequence of independent random variables such that $E(Z_t) = 0$ for all $t \in \mathcal{T}$. If we define $X_t$ by
$$X_t = \sum_{k=1}^{t} Z_k,$$
then $\{X_t, t \in \mathcal{T}\}$ is a martingale with $\mathcal{F}_t = \sigma(Z_t, Z_{t-1}, \ldots, Z_1) = \sigma(X_t, X_{t-1}, \ldots, X_1)$. This is because conditions (i) and (ii) are automatically satisfied, and we can verify that
$$E(X_t | \mathcal{F}_{t-1}) = E[(X_{t-1} + Z_t) | \mathcal{F}_{t-1}] = X_{t-1}, \quad t \in \mathcal{T}.$$

Example:
Let $\{Z_t, t \in \mathcal{T}\}$ be an arbitrary stochastic process whose only restriction is $E(|Z_t|) < \infty$ for all $t \in \mathcal{T}$.
If we define $X_t$ by
$$X_t = \sum_{k=1}^{t} [Z_k - E(Z_k | \mathcal{F}_{k-1})],$$
where $\mathcal{F}_k = \sigma(Z_k, Z_{k-1}, \ldots, Z_1) = \sigma(X_k, X_{k-1}, \ldots, X_1)$, then $\{X_t, t \in \mathcal{T}\}$ is a martingale. Condition (iii) can be verified by
$$E(X_t | \mathcal{F}_{t-1}) = E[(X_{t-1} + Z_t - E(Z_t | \mathcal{F}_{t-1})) | \mathcal{F}_{t-1}] = X_{t-1} + E(Z_t | \mathcal{F}_{t-1}) - E(Z_t | \mathcal{F}_{t-1}) = X_{t-1}, \quad t \in \mathcal{T}.$$

The above two examples illustrate the flexibility of martingales very well. As we can see, the main difference between them is that in the first example $X_t$ is a linear function of independent r.v.'s, while in the second example it is a linear function of dependent r.v.'s centred at their conditional means.

A special case of the second example is
$$Y_t = X_t - E(X_t | \mathcal{F}_{t-1}), \quad t \in \mathcal{T}.$$
It can easily be verified that $\{Y_t, t \in \mathcal{T}\}$ defines what is known as a martingale difference process relative to $\mathcal{F}_t$, because
$$E(Y_t | \mathcal{F}_{t-1}) = 0, \quad t \in \mathcal{T}.$$
We can further deduce that for $t > k$,
$$E(Y_t Y_k) = E[E(Y_t Y_k | \mathcal{F}_{t-1})] \quad (\text{since for } t > k,\ E(Y_k | \mathcal{F}_{t-1}) = Y_k) \qquad (1)$$
$$= E[Y_k E(Y_t | \mathcal{F}_{t-1})] \qquad (2)$$
$$= E[Y_k \cdot 0] = 0. \qquad (3)$$
That is, a martingale difference $\{Y_t, t \in \mathcal{T}\}$ is an orthogonal sequence (a special case of uncorrelatedness, since the means are all zero).

Definition:
A stochastic process $\{Y_t, t \in \mathcal{T}\}$ is said to be a martingale difference process relative to the increasing sequence of $\sigma$-fields $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots \subset \mathcal{F}_t \subset \cdots$ if
(i) $Y_t$ is a random variable relative to $\mathcal{F}_t$ for all $t \in \mathcal{T}$.
(ii) $E(|Y_t|) < \infty$ for all $t \in \mathcal{T}$.
(iii) $E(Y_t | \mathcal{F}_{t-1}) = 0$ for all $t \in \mathcal{T}$.

Note that condition (iii) is stronger than the condition that $Y_t$ is serially uncorrelated: as (3) shows, if $Y_t$ is a martingale difference then it is uncorrelated. From the point of view of forecasting, a serially uncorrelated sequence cannot be forecast on the basis of a linear function of its past values, since the forecast error and the forecast are both linear functions. No function of past values, linear or nonlinear, can forecast a martingale difference sequence.
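The orthogonality property (3) can be checked by simulation. The sketch below is my own illustration, not part of the original text; it uses the martingale difference $Y_t = \varepsilon_t \varepsilon_{t-1}$ with i.i.d. $N(0,1)$ innovations. The sample autocovariance of $Y_t$ at lag one is near zero, yet the squared series is clearly autocorrelated, so the sequence is uncorrelated without being independent.

```python
import random

def sample_autocov(y, lag):
    """Sample autocovariance at a given lag for a mean-zero series."""
    n = len(y)
    return sum(y[t] * y[t - lag] for t in range(lag, n)) / n

rng = random.Random(42)
eps = [rng.gauss(0.0, 1.0) for _ in range(200001)]
# Y_t = eps_t * eps_{t-1}: a martingale difference, hence uncorrelated at all lags,
# yet Y_t^2 is predictable from the past, so the sequence is not independent.
y = [eps[t] * eps[t - 1] for t in range(1, len(eps))]
gamma0 = sample_autocov(y, 0)  # variance of Y_t; equals 1 in theory here
gamma1 = sample_autocov(y, 1)  # orthogonality (3): should be near zero

# Dependence shows up in the squares: cov(Y_t^2, Y_{t-1}^2) = 2 in this setup.
y2 = [v * v for v in y]
m2 = sum(y2) / len(y2)
cov_sq = sum((y2[t] - m2) * (y2[t - 1] - m2) for t in range(1, len(y2))) / len(y2)
```

The contrast between `gamma1` (near zero) and `cov_sq` (far from zero) is exactly the gap between uncorrelatedness and independence discussed below.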
While stronger than absence of serial correlation, the martingale difference condition is weaker than independence, since it does not rule out the possibility that higher moments such as $E(Y_t^2 | Y_{t-1}, Y_{t-2}, \ldots, Y_1)$ might depend on past $Y$'s.

Example:
If $\varepsilon_t \sim i.i.d.\ N(0, \sigma^2)$, then $Y_t = \varepsilon_t \varepsilon_{t-1}$ is a martingale difference but not serially independent, since
$$E(Y_t | \mathcal{F}_{t-1}) = E(\varepsilon_t \varepsilon_{t-1} | \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_1) = \varepsilon_{t-1} E(\varepsilon_t) = 0 \quad (\text{a martingale difference})$$
and
$$E(Y_t^2 | \mathcal{F}_{t-1}) = E(\varepsilon_t^2 \varepsilon_{t-1}^2 | \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots, \varepsilon_1) = \varepsilon_{t-1}^2 E(\varepsilon_t^2) = \varepsilon_{t-1}^2 \sigma^2 \quad (\text{a function of past values, so not independent}).$$

Proposition:
Let $X$ and $Y$ be independent random variables and let $U = g(X)$ and $V = h(Y)$. Then $U$ and $V$ are also independent random variables.

4 Laws of Large Numbers

In this section we study familiar consistent estimators from the standpoint of strong consistency (which automatically implies weak consistency, or convergence in probability). The result that the sample mean is a consistent estimator of the population mean is known as the law of large numbers. The laws of large numbers we consider are all of the following form.

Proposition:
Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables $\{Z_t\}$ (you may think of this sequence as a sample of size $T$),
$$\bar{Z}_T - \bar{\mu}_T \xrightarrow{a.s.} 0, \quad \text{where } \bar{Z}_T \equiv \frac{1}{T} \sum_{t=1}^{T} Z_t \text{ and } \bar{\mu}_T \equiv E(\bar{Z}_T).$$
As we shall see, there are sometimes trade-offs among these restrictions; for example, relaxing dependence or heterogeneity restrictions may require strengthening the moment restrictions.

4.1 Independent Identically Distributed Observations

The simplest case is that of independent identically distributed (i.i.d.) random variables.

Theorem (Kolmogorov):
Let $\{Z_t\}$ be a sequence of i.i.d. random variables. Then
$$\bar{Z}_T \xrightarrow{a.s.} \mu \quad (\text{which implies } \bar{Z}_T \xrightarrow{p} \mu)$$
if and only if $E|Z_t| < \infty$ and $E(Z_t) = \mu$.

Example:
If we make the stronger assumption that $\operatorname{Var}(Z_t) = \sigma^2 < \infty$, then
$$E(\bar{Z}_T - \mu)^2 = (1/T^2)\operatorname{Var}\left(\sum_{t=1}^{T} Z_t\right) = (1/T^2)\sum_{t=1}^{T}\operatorname{Var}(Z_t) = \sigma^2/T.$$
Since $\sigma^2/T \to 0$ as $T \to \infty$, this means that $\bar{Z}_T \xrightarrow{q.m.} \mu$
, implying also $\bar{Z}_T \xrightarrow{p} \mu$.

4.2 Independent Heterogeneously Distributed Observations

For cross-sectional data it is often appropriate to assume that the observations are independent but not identically distributed. A law of large numbers useful in these situations is the following.

Theorem (Revised Markov):
Let $\{Z_t\}$ be a sequence of independent random variables such that $E|Z_t|^{1+\delta} \leq \Delta < \infty$ for some $\delta > 0$ and all $t$. Then
$$\bar{Z}_T - \bar{\mu}_T \xrightarrow{a.s.} 0.$$
This theorem imposes slightly more in the way of moment restrictions but allows the observations to be rather heterogeneous.

4.3 Dependent Identically Distributed Observations (such as a strongly stationary process)

The assumption of independence is inappropriate for economic time series, which typically exhibit considerable dependence. To cover this case we need laws of large numbers that allow the random variables to be dependent. As stated below, we need an additional 'memory restriction' as we relax the independence assumption.

Definition:
Let $(S, \mathcal{F}, P(\cdot))$ be a probability space and $\mathcal{T}$ an index set of real numbers, and define the function $X(\cdot, \cdot)$ by $X(\cdot, \cdot): S \times \mathcal{T} \to \mathbb{R}$. The ordered sequence of random variables $\{X(\cdot, t), t \in \mathcal{T}\}$ is called a stochastic process.

Definition:
A stochastic process $\{X(\cdot, t), t \in \mathcal{T}\}$ is said to be (strongly) stationary if for any subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$ and any $\tau$,
$$F(X(t_1), \ldots, X(t_T)) = F(X(t_1 + \tau), \ldots, X(t_T + \tau)).$$
In terms of the marginal distributions $F(X(t))$, $t \in \mathcal{T}$, stationarity implies that $F(X(t)) = F(X(t + \tau))$, and hence $F(X(t_1)) = F(X(t_2)) = \cdots = F(X(t_T))$. That is, stationarity implies that $X(t_1), \ldots, X(t_T)$ are individually identically distributed.

Definition:
Let $(S, \mathcal{F}, P(\cdot))$ be a probability space. Let $\{Z_t\}$ be a strongly stationary sequence and let $K$ be the measure-preserving transformation defined on $(S, \mathcal{F}, P(\cdot))$ such that $Z_1(s) = Z_1(s)$, $Z_2(s) = Z_1(Ks)$, $Z_3(s) = Z_1(K^2 s)$, $\ldots$, $Z_T(s) = Z_1(K^{T-1} s)$ for all $s \in S$. Then $\{Z_t\}$ is ergodic if
$$\lim_{T \to \infty} T^{-1} \sum_{t=1}^{T} \Pr(F \cap K^t G) = \Pr(F)\Pr(G)$$
for all events $F, G \in \mathcal{F}$.
We can think of $K^t G$ as the event $G$ shifted $t$ periods into the future, and since $\Pr(K^t G) = \Pr(G)$ when $K$ is measure preserving, this definition says that an ergodic process is one such that for any events $F$ and $G$, $F$ and $K^t G$ are independent on average in the limit. Thus ergodicity can be thought of as a form of "average asymptotic independence".

Theorem (Ergodic Theorem):
Let $\{Z_t\}$ be a strongly stationary ergodic scalar random sequence with $E|Z_t| < \infty$. Then
$$\bar{Z}_T \xrightarrow{a.s.} E(Z_t).$$

Lemma:
A stationary linear process is ergodic.

Example:
Let $X_t = \sum_{j=0}^{\infty} \varphi_j \varepsilon_{t-j}$, $t = 1, 2, \ldots$, where $\varepsilon_{t-j}$, $j = 0, 1, \ldots$, are i.i.d. random variables with $E(\varepsilon_{t-j}) = 0$ and $\{\varphi_j, j \geq 0\}$ is a square-summable sequence of real numbers. Then $X_t$ is ergodic (see Wang et al. 2003, p. 151). In this example we see that if we relax the assumption on $X_t$ to weak stationarity (or $\varepsilon_t$ to a white noise sequence), we need the stronger condition that $\{\varphi_j, j \geq 0\}$ is an absolutely summable sequence of real numbers to make $X_t$ ergodic (see Hamilton, p. 52).

4.4 Dependent Heterogeneously Distributed Observations

By replacing the ergodicity assumption with somewhat stronger conditions, we can obtain consistency results for dependent heterogeneously distributed observations. Let $\mathcal{B}_1^t$ denote the $\sigma$-field generated by $X_1, \ldots, X_t$, where $\{X_t, t \in \mathcal{T}\}$ is a stochastic process. A measure of the dependence among the elements of the stochastic process can be defined in terms of the events $B \in \mathcal{B}_1^t$ and $A \in \mathcal{B}_{t+\tau}^{\infty}$ by
$$\alpha(\tau) = \sup |P(A \cap B) - P(A)P(B)|.$$

Definition:
A stochastic process $\{X_t, t \in \mathcal{T}\}$ is said to be strongly (or $\alpha$-) mixing if $\alpha(\tau) \to 0$ as $\tau \to \infty$.

A stronger form of mixing, called uniform mixing, can be defined in terms of the following measure of dependence:
$$\varphi(\tau) = \sup |P(A|B) - P(A)|, \quad P(B) > 0.$$

Definition:
A stochastic process $\{X_t, t \in \mathcal{T}\}$ is said to be uniformly (or $\varphi$-) mixing if $\varphi(\tau) \to 0$ as $\tau \to \infty$.

The notion of mixing is a stronger memory requirement than that of ergodicity for stationary sequences, since given stationarity, mixing implies ergodicity.
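To illustrate the ergodic theorem, consider a stationary AR(1) process: it is a stationary linear process (hence ergodic by the lemma above) and its dependence dies out geometrically. The sketch below is my own illustration, not part of the original text; the coefficient $0.8$, the mean $2.0$, and the unit innovation variance are arbitrary choices. It checks that the time average of a single long realization is close to the population mean, as the ergodic theorem predicts.

```python
import random

def ar1_path(T, phi=0.8, mu=2.0, seed=1):
    """One realization of a stationary AR(1): X_t - mu = phi*(X_{t-1} - mu) + eps_t."""
    rng = random.Random(seed)
    # Draw the initial value from the stationary distribution N(mu, 1/(1 - phi^2)).
    x = mu + rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5
    path = [x]
    for _ in range(T - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, 1.0)
        path.append(x)
    return path

path = ar1_path(T=200000)
xbar = sum(path) / len(path)  # time average over one realization; should be near mu = 2.0
```

Note that the average is taken over time within one realization, not across independent replications: that is precisely the sense in which ergodicity lets a single sample path reveal the population mean.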
Proposition:
Let $\{Z_t\}$ be a stationary sequence. If $\alpha(\tau) \to 0$ as $\tau \to \infty$, then $\{Z_t\}$ is ergodic.

Definition:
Let $a \in \mathbb{R}$.
(i) If $\varphi(\tau) = O(\tau^{-a-\varepsilon})$ for some $\varepsilon > 0$, then $\varphi$ is of size $a$.
(ii) If $\alpha(\tau) = O(\tau^{-a-\varepsilon})$ for some $\varepsilon > 0$, then $\alpha$ is of size $a$.

This definition allows precise statements about the memory of a random sequence that we shall relate to moment conditions expressed in terms of $a$. As $a$ gets smaller, the sequence exhibits more and more dependence.

Theorem (Revised McLeish):
Let $\{Z_t\}$ be a random sequence with (i) $E|Z_t|^{r+\delta} \leq \Delta < \infty$ for some $\delta > 0$ and all $t$, and (ii) $\{Z_t\}$ $\alpha$-mixing with $\alpha$ of size $r/(r-1)$, $r > 1$, or $\varphi$-mixing with $\varphi$ of size $r/(2r-1)$, $r \geq 1$. Then
$$\bar{Z}_T - \bar{\mu}_T \xrightarrow{a.s.} 0.$$
For sequences with longer memories, $r$ is greater ($r/(r-1) = 1 + 1/(r-1) = a$), and the moment restrictions increase accordingly. Hence we have a clear trade-off between the amount of allowable dependence and the sufficient moment restrictions.

4.5 Asymptotically Uncorrelated Observations (such as a weakly stationary ARMA process)

Although mixing is an appealing dependence concept, it shares with ergodicity the property that it can be somewhat difficult to verify theoretically and is impossible to verify empirically. An alternative dependence concept that is easier to verify theoretically is a form of asymptotic non-correlation.

Theorem:
Let $\{Z_t\}$ be an asymptotically uncorrelated scalar sequence with means $\mu_t \equiv E(Z_t)$ and variances $\sigma_t^2 \equiv \operatorname{var}(Z_t) < \infty$. Then
$$\bar{Z}_T - \bar{\mu}_T \xrightarrow{a.s.} 0.$$
Compared with the last theorem, we have relaxed the dependence restriction from asymptotic independence (mixing) to asymptotic uncorrelation, but we have altered the moment requirements from restrictions on moments of order $r + \delta$ ($r \geq 1$, $\delta > 0$) to second moments.

Example (Law of large numbers for a covariance-stationary process):
Let $(Y_1, Y_2, \ldots, Y_T)$ represent a sample of size $T$ from a covariance-stationary process with
$$E(Y_t) = \mu \quad \text{for all } t, \qquad E(Y_t - \mu)(Y_{t-j} - \mu) = \gamma_j \quad \text{for all } t, \qquad \sum_{j=0}^{\infty} |\gamma_j| < \infty.$$
Then
$$\bar{Y}_T \xrightarrow{q.m.} \mu.$$
Proof:
To see this, it suffices to show that $E(\bar{Y}_T - \mu)^2 \to 0$.
Since
$$E(\bar{Y}_T - \mu)^2 = E\left[(1/T)\sum_{t=1}^{T}(Y_t - \mu)\right]^2$$
$$= (1/T^2)E\{(Y_1 - \mu)[(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_T - \mu)]$$
$$\qquad + (Y_2 - \mu)[(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_T - \mu)]$$
$$\qquad + (Y_3 - \mu)[(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_T - \mu)]$$
$$\qquad + \cdots + (Y_T - \mu)[(Y_1 - \mu) + (Y_2 - \mu) + \cdots + (Y_T - \mu)]\}$$
$$= (1/T^2)\{[\gamma_0 + \gamma_1 + \gamma_2 + \gamma_3 + \cdots + \gamma_{T-1}]$$
$$\qquad + [\gamma_1 + \gamma_0 + \gamma_1 + \gamma_2 + \cdots + \gamma_{T-2}]$$
$$\qquad + [\gamma_2 + \gamma_1 + \gamma_0 + \gamma_1 + \cdots + \gamma_{T-3}]$$
$$\qquad + \cdots + [\gamma_{T-1} + \gamma_{T-2} + \gamma_{T-3} + \cdots + \gamma_0]\}$$
$$= (1/T^2)\{T\gamma_0 + 2(T-1)\gamma_1 + 2(T-2)\gamma_2 + \cdots + 2\gamma_{T-1}\}$$
$$= (1/T)\{\gamma_0 + [(T-1)/T](2\gamma_1) + [(T-2)/T](2\gamma_2) + \cdots + [1/T](2\gamma_{T-1})\},$$
we have
$$T \cdot E(\bar{Y}_T - \mu)^2 = |\gamma_0 + [(T-1)/T](2\gamma_1) + [(T-2)/T](2\gamma_2) + \cdots + [1/T](2\gamma_{T-1})| \qquad (4)$$
$$\leq |\gamma_0| + [(T-1)/T]\,2|\gamma_1| + [(T-2)/T]\,2|\gamma_2| + \cdots + [1/T]\,2|\gamma_{T-1}|$$
$$\leq |\gamma_0| + 2|\gamma_1| + 2|\gamma_2| + \cdots < \infty.$$
So $E(\bar{Y}_T - \mu)^2 \to 0$.

4.6 Martingale Difference Sequences

A law of large numbers for martingale difference sequences is the following theorem.

Theorem (Revised Chow):
Let $\{Z_t, \mathcal{F}_t\}$ be a martingale difference sequence such that $E|Z_t|^{2r} \leq \Delta < \infty$ for some $r \geq 1$ and all $t$. Then
$$\bar{Z}_T \xrightarrow{a.s.} 0.$$

5 Central Limit Theory

In this section we study various forms of the central limit theorem (CLT) from the standpoint of convergence in distribution. The central limit theorems we consider are all of the following form.

Proposition:
Given restrictions on the dependence, heterogeneity, and moments of a sequence of random variables $\{Z_t\}$ (you may think of this sequence as a sample of size $T$),
$$\frac{\bar{Z}_T - \bar{\mu}_T}{\bar{\sigma}_T/\sqrt{T}} = \frac{\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)}{\bar{\sigma}_T} \xrightarrow{L} N(0, 1),$$
where
$$\bar{Z}_T \equiv T^{-1}\sum_{t=1}^{T} Z_t, \quad \bar{\mu}_T \equiv E(\bar{Z}_T), \quad \text{and} \quad \bar{\sigma}_T^2/T \equiv \operatorname{var}(\bar{Z}_T) \quad \left(\text{that is, } \bar{\sigma}_T^2 = \frac{\operatorname{var}\left(\sum_{t=1}^{T} Z_t\right)}{T}\right).$$
As with the laws of large numbers, there are natural trade-offs among these restrictions. Typically, greater dependence or heterogeneity is allowed at the expense of strengthening the moment restrictions.

5.1 Independent Identically Distributed Observations

As with the laws of large numbers, the case of i.i.d. observations is the simplest.

Theorem (Lindeberg-Levy):
Let $\{Z_t\}$ be a sequence of i.i.d. random scalars with $\mu \equiv E(Z_t)$ and $\sigma^2 \equiv \operatorname{var}(Z_t) < \infty$.
If $\sigma^2 \neq 0$, then
$$\frac{\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)}{\bar{\sigma}_T} = \frac{\sqrt{T}(\bar{Z}_T - \mu)}{\sigma} = \frac{T^{-1/2}\sum_{t=1}^{T}(Z_t - \mu)}{\sigma} \xrightarrow{L} N(0, 1).$$
Compared with the law of large numbers for i.i.d. observations, we impose a single additional requirement, i.e., that $\sigma^2 \equiv \operatorname{var}(Z_t) < \infty$. Note that this implies that $E|Z_t| < \infty$.

Proposition:
If the $k$th moment of a random variable exists, then all moments of order less than $k$ exist.
Proof: Let $f_X(x)$ be the pdf of $X$. $E(X^k)$ exists if and only if
$$\int_{-\infty}^{\infty} |x|^k f_X(x)\,dx < \infty.$$
Let $1 \leq j < k$. To prove the proposition we must show that
$$\int_{-\infty}^{\infty} |x|^j f_X(x)\,dx < \infty.$$
But
$$\int_{-\infty}^{\infty} |x|^j f_X(x)\,dx = \int_{|x| \leq 1} |x|^j f_X(x)\,dx + \int_{|x| > 1} |x|^j f_X(x)\,dx$$
$$\leq \int_{|x| \leq 1} f_X(x)\,dx + \int_{|x| > 1} |x|^j f_X(x)\,dx$$
$$\leq 1 + \int_{|x| > 1} |x|^j f_X(x)\,dx$$
$$\leq 1 + \int_{|x| > 1} |x|^k f_X(x)\,dx < \infty.$$

5.2 Independent Heterogeneously Distributed Observations

Several different central limit theorems are available for the case in which our observations are not identically distributed.

Theorem (Liapounov, revised Lindeberg-Feller):
Let $\{Z_t\}$ be a sequence of independent random variables with $\mu_t \equiv E(Z_t)$ and $\sigma_t^2 \equiv \operatorname{var}(Z_t)$ such that $E|Z_t - \mu_t|^{2+\delta} \leq \Delta < \infty$ for some $\delta > 0$ and all $t$. If $\bar{\sigma}_T^2 > \delta' > 0$ for all $T$ sufficiently large, then
$$\frac{\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)}{\bar{\sigma}_T} \xrightarrow{L} N(0, 1).$$
Note that $E|Z_t|^{2+\delta} \leq \Delta$ also implies that $E|Z_t - \mu_t|^{2+\delta}$ is uniformly bounded. Note also the analogy with previous results: there we obtained a law of large numbers for independent random variables by imposing a uniform bound on $E|Z_t|^{1+\delta}$; now we obtain a central limit theorem by imposing a uniform bound on $E|Z_t|^{2+\delta}$.

5.3 Dependent Identically Distributed Observations

In the last two sections we saw that obtaining central limit theorems for independent processes typically requires strengthening the moment restrictions beyond what was sufficient for obtaining laws of large numbers. In the class of stationary ergodic processes, not only will we strengthen the moment requirements, but we will also impose stronger conditions on the memory of the process.

Theorem (Scott):
Let $\{Z_t, \mathcal{F}_t\}$ be a stationary ergodic adapted mixingale with $\gamma_m$ of size 1. Then $\bar{\sigma}_T^2 \equiv \operatorname{var}(T^{-1/2}\sum_{t=1}^{T} Z_t) \to$
$\bar{\sigma}^2 < \infty$ as $T \to \infty$; and if $\bar{\sigma}^2 > 0$, then $\sqrt{T}\,\bar{Z}_T/\bar{\sigma} \xrightarrow{L} N(0, 1)$.

5.4 Dependent Heterogeneously Distributed Observations

Theorem (Wooldridge-White):
Let $\{Z_t\}$ be a scalar random sequence with $\mu_t \equiv E(Z_t)$ and $\sigma_t^2 \equiv \operatorname{var}(Z_t)$ such that $E|Z_t|^r \leq \Delta < \infty$ for some $r \geq 2$ and all $t$, and having mixing coefficients $\varphi$ of size $r/[2(r-1)]$ or $\alpha$ of size $r/(r-2)$, $r > 2$. If $\bar{\sigma}_T^2 \equiv \operatorname{var}(T^{-1/2}\sum_{t=1}^{T} Z_t) > \delta > 0$ for all $T$ sufficiently large, then
$$\sqrt{T}(\bar{Z}_T - \bar{\mu}_T)/\bar{\sigma}_T \xrightarrow{L} N(0, 1).$$

5.5 Asymptotically Uncorrelated Observations (such as a stationary ARMA process)

We now present a central limit theorem for a serially correlated sequence.

Theorem:
Let
$$Y_t = \mu + \sum_{j=0}^{\infty} \varphi_j \varepsilon_{t-j},$$
where $\{\varepsilon_t\}$ is a sequence of i.i.d. random variables with $E(\varepsilon_t^2) < \infty$ and $\sum_{j=0}^{\infty} |\varphi_j| < \infty$. Then
$$\sqrt{T}(\bar{Y}_T - \mu) \xrightarrow{L} N\left(0, \sum_{j=-\infty}^{\infty} \gamma_j\right).$$
Proof: Given the general form of the CLT, it suffices to show that $\bar{\sigma}_T^2\ (= T\operatorname{var}(\bar{Y}_T) = [\operatorname{var}(\sum_{t=1}^{T} Y_t)]/T) \to \sum_{j=-\infty}^{\infty} \gamma_j$. Note that the assumption $\sum_{j=0}^{\infty} |\varphi_j| < \infty$ implies that $\sum_{j=0}^{\infty} |\gamma_j| < \infty$, which means that for any $\varepsilon > 0$ there exists a $q$ such that
$$2|\gamma_{q+1}| + 2|\gamma_{q+2}| + 2|\gamma_{q+3}| + \cdots < \varepsilon/2.$$
From (4) we have
$$\left|\sum_{j=-\infty}^{\infty} \gamma_j - T\operatorname{var}(\bar{Y}_T)\right| = |\{\gamma_0 + 2\gamma_1 + 2\gamma_2 + 2\gamma_3 + \cdots\} - \{\gamma_0 + [(T-1)/T]2\gamma_1 + [(T-2)/T]2\gamma_2 + \cdots + [1/T]2\gamma_{T-1}\}|$$
$$\leq (1/T)2|\gamma_1| + (2/T)2|\gamma_2| + (3/T)2|\gamma_3| + \cdots + (q/T)2|\gamma_q| + 2|\gamma_{q+1}| + 2|\gamma_{q+2}| + 2|\gamma_{q+3}| + \cdots$$
$$\leq (1/T)2|\gamma_1| + (2/T)2|\gamma_2| + (3/T)2|\gamma_3| + \cdots + (q/T)2|\gamma_q| + \varepsilon/2.$$
Moreover, for this given $q$ we can find an $N$ such that
$$(1/T)2|\gamma_1| + (2/T)2|\gamma_2| + (3/T)2|\gamma_3| + \cdots + (q/T)2|\gamma_q| < \varepsilon/2$$
for all $T \geq N$, ensuring that
$$\left|\sum_{j=-\infty}^{\infty} \gamma_j - T\operatorname{var}(\bar{Y}_T)\right| < \varepsilon.$$
This completes the proof.

5.6 Martingale Difference Sequences

Theorem:
Let $\{Y_t\}$ be a scalar martingale difference sequence with $\bar{Y}_T = (1/T)\sum_{t=1}^{T} Y_t$. Suppose that
(i) $E(Y_t^2) = \sigma_t^2 > 0$ with $(1/T)\sum_{t=1}^{T} \sigma_t^2 \to \sigma^2 > 0$,
(ii) $E|Y_t|^r < \infty$ for some $r > 2$ and all $t$, and
(iii) $(1/T)\sum_{t=1}^{T} Y_t^2 \xrightarrow{p} \sigma^2$.
Then $\sqrt{T}\,\bar{Y}_T \xrightarrow{L} N(0, \sigma^2)$.
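The martingale difference CLT above can be illustrated by simulation. The sketch below is my own illustration, not part of the original text; it uses $Y_t = \varepsilon_t\varepsilon_{t-1}$ with i.i.d. $N(0,1)$ innovations (the martingale difference example from Section 3, with $\sigma^2 = 1$, for which conditions (i)-(iii) hold) and checks that across replications $\sqrt{T}\,\bar{Y}_T$ has mean near $0$ and variance near $\sigma^2 = 1$.

```python
import random

def scaled_mean(T, rng):
    """sqrt(T) times the sample mean of Y_t = eps_t * eps_{t-1}, eps_t i.i.d. N(0, 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
    ybar = sum(eps[t] * eps[t - 1] for t in range(1, T + 1)) / T
    return T ** 0.5 * ybar

rng = random.Random(7)
draws = [scaled_mean(500, rng) for _ in range(4000)]
mean = sum(draws) / len(draws)                 # should be near 0
var = sum(d * d for d in draws) / len(draws)   # should be near sigma^2 = 1
```

Note that the theorem applies even though the $Y_t$ here are serially dependent (their squares are predictable from the past); the martingale difference structure is what delivers the normal limit.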