Ch. 23 Cointegration

1 Introduction

An important property of I(1) variables is that there can be linear combinations of these variables that are I(0). If so, the variables are said to be cointegrated. Suppose we consider two variables $Y_t$ and $X_t$ that are I(1), for example
$$Y_t = Y_{t-1} + \varepsilon_t \quad \text{and} \quad X_t = X_{t-1} + \eta_t.$$
Then $Y_t$ and $X_t$ are said to be cointegrated if there exists an $\alpha$ such that $Y_t - \alpha X_t$ is I(0). This means that the regression equation
$$Y_t = \alpha X_t + u_t$$
makes sense, because $Y_t$ and $X_t$ do not drift too far apart from each other over time: there is a long-run equilibrium relationship between them. If $Y_t$ and $X_t$ are not cointegrated, that is, if $Y_t - \alpha X_t = u_t$ is also I(1), then $Y_t$ and $X_t$ drift apart from each other over time. In this case the relationship between $Y_t$ and $X_t$ that we obtain by regressing $Y_t$ on $X_t$ is spurious.

To connect cointegration with the spurious-regression setup in which $X_t$ and $Y_t$ are independent random walks, consider what happens if we take a nontrivial linear combination of $X_t$ and $Y_t$:
$$a_1 Y_t + a_2 X_t = a_1 Y_{t-1} + a_2 X_{t-1} + a_1\varepsilon_t + a_2\eta_t,$$
where $a_1$ and $a_2$ are not both zero. We can write this as
$$Z_t = Z_{t-1} + v_t,$$
where $Z_t = a_1 Y_t + a_2 X_t$ and $v_t = a_1\varepsilon_t + a_2\eta_t$. Thus $Z_t$ is again a random walk process, since $v_t$ is i.i.d. with mean zero and finite variance, given that $\varepsilon_t$ and $\eta_t$ are each i.i.d. with mean zero and finite variance. No matter what coefficients $a_1$ and $a_2$ we choose, the resulting linear combination is again a random walk, hence an integrated or unit root process.

Now consider what happens when $X_t$ is a random walk as before, but $Y_t$ is instead generated according to $Y_t = \gamma X_t + u_t$, with $u_t$ again i.i.d. By itself, $Y_t$ is an integrated process, because
$$Y_t - Y_{t-1} = \gamma(X_t - X_{t-1}) + u_t - u_{t-1},$$
so that
$$Y_t = Y_{t-1} + \gamma\eta_t + u_t - u_{t-1} = Y_{t-1} + \tilde{\varepsilon}_t,$$
where $\tilde{\varepsilon}_t = \gamma\eta_t + u_t - u_{t-1}$ is readily verified to be an I(0) process. Despite the fact that both $X_t$ and $Y_t$ are integrated processes, the situation is very different from that considered in the last chapter.
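The contrast between the two cases above can be checked in a small simulation. This is an illustrative sketch, not part of the original text: the sample size, seed, and the choice $\gamma = 0.7$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 2000
eps, eta, u = rng.standard_normal((3, T))

# Case 1: two independent random walks
X = np.cumsum(eta)          # X_t = X_{t-1} + eta_t
Y_rw = np.cumsum(eps)       # Y_t = Y_{t-1} + eps_t, independent of X

# Any nontrivial combination Z_t = a1*Y_t + a2*X_t is again a random walk
Z = 1.0 * Y_rw + 1.0 * X

# Case 2: Y_t = gamma*X_t + u_t, so Y_t - gamma*X_t = u_t is i.i.d. (I(0))
gamma = 0.7
Y_coint = gamma * X + u
z = Y_coint - gamma * X     # the cointegrating combination

print(np.std(Z), np.std(z))  # the I(1) combination wanders; the I(0) one does not
```

In case 1 the sample standard deviation of the combination grows with the sample size, while in case 2 it stays at the scale of the i.i.d. error, which is the empirical signature of cointegration.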
Here, there is indeed a linear combination of $X_t$ and $Y_t$ that is not an integrated process: putting $a_1 = 1$ and $a_2 = -\gamma$ we have
$$a_1 Y_t + a_2 X_t = Y_t - \gamma X_t = u_t,$$
which is i.i.d. This is an example of a pair $\{X_t, Y_t\}$ of cointegrated processes.

The concept of cointegration was introduced by Granger (1981). This paper and that of Engle and Granger (1987) have had a major impact on modern econometrics. Following Engle and Granger (1987), we define cointegration formally as follows.

Definition 1:
The components of the vector $\mathbf{x}_t$ are said to be co-integrated of order $d, b$, denoted $\mathbf{x}_t \sim CI(d, b)$, if
(a). all components of $\mathbf{x}_t$ are I(d);
(b). there exists a vector $\mathbf{a}\ (\neq \mathbf{0})$ such that $z_t = \mathbf{a}'\mathbf{x}_t \sim I(d-b)$, $b > 0$.
The vector $\mathbf{a}$ is called the co-integrating vector.

For ease of exposition, only the values $d = 1$ and $b = 1$ will be considered in this chapter. The case in which $d$ and $b$ take fractional values is called fractional cointegration; we will consider that case in Chapter 25.

Clearly, the cointegrating vector $\mathbf{a}$ is not unique, for if $\mathbf{a}'\mathbf{x}_t$ is I(0), then so is $b\mathbf{a}'\mathbf{x}_t$ for any nonzero scalar $b$; if $\mathbf{a}$ is a cointegrating vector, then so is $b\mathbf{a}$. If $\mathbf{x}_t$ has $k$ components, then there may be more than one cointegrating vector. Indeed, there may be $h < k$ linearly independent $(k \times 1)$ vectors $(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h)$ such that $\mathbf{A}'\mathbf{x}_t$ is an I(0) $(h \times 1)$ vector, where $\mathbf{A}'$ is the following $(h \times k)$ matrix:
$$\mathbf{A}' = \begin{bmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_h' \end{bmatrix}.$$
Again, the vectors $(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h)$ are not unique: if $\mathbf{A}'\mathbf{x}_t$ is I(0), then for any nonzero $(1 \times h)$ vector $\mathbf{b}'$, the scalar $\mathbf{b}'\mathbf{A}'\mathbf{x}_t$ is also I(0), so the $(k \times 1)$ vector $\boldsymbol{\gamma}$ given by $\boldsymbol{\gamma}' = \mathbf{b}'\mathbf{A}'$ could also be described as a cointegrating vector.

Suppose that there exists an $(h \times k)$ matrix $\mathbf{A}'$ whose rows are linearly independent such that $\mathbf{A}'\mathbf{x}_t$ is an $(h \times 1)$ I(0) vector. Suppose further that if $\mathbf{c}'$ is any $(1 \times k)$ vector that is linearly independent of the rows of $\mathbf{A}'$, then $\mathbf{c}'\mathbf{x}_t$ is an I(1) scalar.
Then we say that there are exactly $h$ cointegrating relations among the elements of $\mathbf{x}_t$ and that $(\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h)$ form a basis for the space of cointegrating vectors.

Example:
Let $P_t$ denote an index of the price level in the United States, $P_t^*$ a price index for Italy, and $S_t$ the exchange rate between the two currencies. Then purchasing power parity holds that
$$P_t = S_t P_t^*,$$
or, taking logarithms,
$$p_t = s_t + p_t^*,$$
where $p_t \equiv \log P_t$, $s_t \equiv \log S_t$, and $p_t^* \equiv \log P_t^*$. In equilibrium we need $p_t - s_t - p_t^* = 0$. In practice, however, errors in measuring prices, transportation costs, and differences in quality prevent purchasing power parity from holding exactly at every date $t$. A weaker form of the hypothesis is that the variable $z_t$ defined by
$$z_t = p_t - s_t - p_t^*$$
is I(0), even though the individual elements of $\mathbf{y}_t = (p_t \; s_t \; p_t^*)'$ are all I(1). In this case we have a single cointegrating vector $\mathbf{a} = (1 \; -1 \; -1)'$. The term $z_t = \mathbf{a}'\mathbf{y}_t$ is interpreted as the equilibrium error; although it is not always zero, it cannot stray from zero too often or too far if the equilibrium concept is to make sense.

2 Granger Representation Theorem

Let each element of the $(k \times 1)$ vector $\mathbf{y}_t$ be I(1), with a $(k \times h)$ cointegrating matrix $\mathbf{A}$ such that each element of $\mathbf{A}'\mathbf{y}_t$ is I(0). Then Granger (1983) established the following fundamental results for cointegrated $\mathbf{y}_t$.

2.1 Implications of Cointegration for the VMA Representation

We now discuss the general implications of cointegration for the moving average representation of a vector system. Since $\Delta\mathbf{y}_t$ is assumed to be I(0), let $\boldsymbol{\delta} \equiv E(\Delta\mathbf{y}_t)$ and define
$$\mathbf{u}_t = \Delta\mathbf{y}_t - \boldsymbol{\delta}. \tag{1}$$
Suppose that $\mathbf{u}_t$ has the Wold representation
$$\mathbf{u}_t = \boldsymbol{\varepsilon}_t + \boldsymbol{\Psi}_1\boldsymbol{\varepsilon}_{t-1} + \boldsymbol{\Psi}_2\boldsymbol{\varepsilon}_{t-2} + \cdots = \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t,$$
where $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and
$$E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_\tau') = \begin{cases} \boldsymbol{\Omega} & \text{for } t = \tau \\ \mathbf{0} & \text{otherwise.} \end{cases}$$
Let $\boldsymbol{\Psi}(1)$ denote the $(k \times k)$ matrix polynomial $\boldsymbol{\Psi}(z)$ evaluated at $z = 1$; that is,
$$\boldsymbol{\Psi}(1) \equiv \mathbf{I}_k + \boldsymbol{\Psi}_1 + \boldsymbol{\Psi}_2 + \boldsymbol{\Psi}_3 + \cdots.$$
Then the following hold:
(a). $\mathbf{A}'\boldsymbol{\Psi}(1) = \mathbf{0}$,
(b). $\mathbf{A}'\boldsymbol{\delta} = \mathbf{0}$.
To verify this claim, note that as long as $\{s\boldsymbol{\Psi}_s\}_{s=0}^{\infty}$ is absolutely summable, the difference equation (1) implies (from the multivariate Beveridge-Nelson decomposition):
$$\mathbf{y}_t = \mathbf{y}_0 + \boldsymbol{\delta}t + \mathbf{u}_1 + \mathbf{u}_2 + \cdots + \mathbf{u}_t \tag{2}$$
$$= \mathbf{y}_0 + \boldsymbol{\delta}t + \boldsymbol{\Psi}(1)(\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2 + \cdots + \boldsymbol{\varepsilon}_t) + \boldsymbol{\eta}_t - \boldsymbol{\eta}_0, \tag{3}$$
where $\boldsymbol{\eta}_t$ is a stationary process. Premultiplying (3) by $\mathbf{A}'$ results in
$$\mathbf{A}'\mathbf{y}_t = \mathbf{A}'(\mathbf{y}_0 - \boldsymbol{\eta}_0) + \mathbf{A}'\boldsymbol{\delta}t + \mathbf{A}'\boldsymbol{\Psi}(1)(\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2 + \cdots + \boldsymbol{\varepsilon}_t) + \mathbf{A}'\boldsymbol{\eta}_t \sim I(0). \tag{4}$$
If $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t')$ is nonsingular, then $\mathbf{c}'(\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2 + \cdots + \boldsymbol{\varepsilon}_t)$ is I(1) for every nonzero $(k \times 1)$ vector $\mathbf{c}$. Moreover, if some of the series exhibit nonzero drift ($\boldsymbol{\delta} \neq \mathbf{0}$), the linear combination $\mathbf{A}'\mathbf{y}_t$ will grow deterministically at rate $\mathbf{A}'\boldsymbol{\delta}$. Thus, if the underlying hypothesis suggesting the possibility of cointegration is that certain linear combinations of $\mathbf{y}_t$ are I(0), both conditions $\mathbf{A}'\boldsymbol{\Psi}(1) = \mathbf{0}$ and $\mathbf{A}'\boldsymbol{\delta} = \mathbf{0}$ must hold. The second condition means that despite the presence of a drift term in the process generating $\mathbf{y}_t$, there is no linear trend in the cointegrated combination. See Banerjee et al. (1993), p. 151, for details.

Turning to the implication of the first condition, from partitioned matrix multiplication we have
$$\mathbf{A}'\boldsymbol{\Psi}(1) = \begin{bmatrix} \mathbf{a}_{1\,(1\times k)}' \\ \mathbf{a}_{2\,(1\times k)}' \\ \vdots \\ \mathbf{a}_{h\,(1\times k)}' \end{bmatrix}\boldsymbol{\Psi}(1)_{(k\times k)} = \begin{bmatrix} \mathbf{a}_1'\boldsymbol{\Psi}(1) \\ \mathbf{a}_2'\boldsymbol{\Psi}(1) \\ \vdots \\ \mathbf{a}_h'\boldsymbol{\Psi}(1) \end{bmatrix} = \begin{bmatrix} \mathbf{0}' \\ \mathbf{0}' \\ \vdots \\ \mathbf{0}' \end{bmatrix},$$
which implies
$$\mathbf{a}_i'\boldsymbol{\Psi}(1) = \begin{bmatrix} a_{1i} & a_{2i} & \cdots & a_{ki} \end{bmatrix}\begin{bmatrix} \boldsymbol{\psi}(1)_{1\,(1\times k)}' \\ \boldsymbol{\psi}(1)_{2\,(1\times k)}' \\ \vdots \\ \boldsymbol{\psi}(1)_{k\,(1\times k)}' \end{bmatrix} = \sum_{s=1}^{k} a_{si}\,\boldsymbol{\psi}(1)_s' = \mathbf{0}'_{(1\times k)} \quad \text{for } i = 1, 2, \ldots, h, \tag{5}$$
where $a_{si}$ is the $s$th element of the vector $\mathbf{a}_i$ and $\boldsymbol{\psi}(1)_s'$ is the $s$th row of the matrix $\boldsymbol{\Psi}(1)$. Equation (5) implies that certain linear combinations of the rows of $\boldsymbol{\Psi}(1)$ are zero, meaning that the rows of $\boldsymbol{\Psi}(1)$ are linearly dependent. That is, $\boldsymbol{\Psi}(1)$ is a singular matrix, or equivalently, the determinant of $\boldsymbol{\Psi}(1)$ is zero,¹ i.e. $|\boldsymbol{\Psi}(1)| = 0$. This in turn means that the matrix operator $\boldsymbol{\Psi}(L)$ is non-invertible.²
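Condition (a) can be checked by hand for the bivariate example of Section 1, $Y_t = \gamma X_t + u_t$ with $X_t$ a random walk: writing $\Delta\mathbf{y}_t = (\Delta Y_t, \Delta X_t)'$ in terms of the innovation vector $(\eta_t, u_t)'$ gives an MA representation with only two nonzero coefficient matrices. The sketch below is my own; the MA form used here is not normalized to $\boldsymbol{\Psi}_0 = \mathbf{I}$, but the singularity of the long-run matrix $\boldsymbol{\Psi}(1)$ and the condition $\mathbf{a}'\boldsymbol{\Psi}(1) = \mathbf{0}'$ can still be verified directly.

```python
import numpy as np

gamma = 0.7
# MA matrices of dy_t = (dY_t, dX_t)' in terms of e_t = (eta_t, u_t)':
#   dY_t = gamma*eta_t + u_t - u_{t-1},   dX_t = eta_t
Psi0 = np.array([[gamma, 1.0],
                 [1.0,   0.0]])
Psi1 = np.array([[0.0, -1.0],
                 [0.0,  0.0]])
Psi_at_1 = Psi0 + Psi1            # Psi(1); all later coefficient matrices are zero

a = np.array([1.0, -gamma])       # cointegrating vector a' = (1, -gamma)

print(a @ Psi_at_1)               # condition (a): a' Psi(1) = 0'
print(np.linalg.det(Psi_at_1))    # Psi(1) is singular
```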
Thus, a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data $\Delta\mathbf{y}_t$, because of the non-invertibility of $\boldsymbol{\Psi}(L)$ in
$$\Delta\mathbf{y}_t = \boldsymbol{\delta} + \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t.$$

2.2 Implications of Cointegration for the VAR Representation

Suppose that the level of $\mathbf{y}_t$ can be represented as a non-stationary $p$th-order vector autoregression:³
$$\mathbf{y}_t = \mathbf{c} + \boldsymbol{\Phi}_1\mathbf{y}_{t-1} + \boldsymbol{\Phi}_2\mathbf{y}_{t-2} + \cdots + \boldsymbol{\Phi}_p\mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t, \tag{6}$$
or
$$\boldsymbol{\Phi}(L)\mathbf{y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t, \tag{7}$$
where
$$\boldsymbol{\Phi}(L) \equiv [\mathbf{I}_k - \boldsymbol{\Phi}_1 L - \boldsymbol{\Phi}_2 L^2 - \cdots - \boldsymbol{\Phi}_p L^p].$$
Suppose that $\Delta\mathbf{y}_t$ has the Wold representation
$$(1 - L)\mathbf{y}_t = \boldsymbol{\delta} + \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t. \tag{8}$$
Premultiplying (8) by $\boldsymbol{\Phi}(L)$ results in
$$(1 - L)\boldsymbol{\Phi}(L)\mathbf{y}_t = \boldsymbol{\Phi}(1)\boldsymbol{\delta} + \boldsymbol{\Phi}(L)\boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t. \tag{9}$$
Substituting (7) into (9), we have
$$(1 - L)\boldsymbol{\varepsilon}_t = \boldsymbol{\Phi}(1)\boldsymbol{\delta} + \boldsymbol{\Phi}(L)\boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t, \tag{10}$$
since $(1 - L)\mathbf{c} = \mathbf{0}$. Now, equation (10) has to hold for all realizations of $\boldsymbol{\varepsilon}_t$, which requires that
$$\boldsymbol{\Phi}(1)\boldsymbol{\delta} = \mathbf{0} \quad \text{(a vector)} \tag{11}$$
and that $(1 - L)\mathbf{I}_k$ and $\boldsymbol{\Phi}(L)\boldsymbol{\Psi}(L)$ represent identical polynomials in $L$. In particular, for $L = 1$, equation (10) implies that
$$\boldsymbol{\Phi}(1)\boldsymbol{\Psi}(1) = \mathbf{0}. \quad \text{(a matrix)} \tag{12}$$
Let $\boldsymbol{\pi}_i'$ denote the $i$th row of $\boldsymbol{\Phi}(1)$. Then (11) and (12) state that $\boldsymbol{\pi}_i'\boldsymbol{\Psi}(1) = \mathbf{0}'$ (a row of zeros) and $\boldsymbol{\pi}_i'\boldsymbol{\delta} = 0$ (a zero scalar). Recalling conditions (a) and (b) of Section 2.1, this means that $\boldsymbol{\pi}_i$ is a cointegrating vector. If $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h$ form a basis for the space of cointegrating vectors, then it must be possible to express $\boldsymbol{\pi}_i$ as a linear combination of $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_h$; that is, there exists an $(h \times 1)$ vector $\mathbf{b}_i$ such that $\boldsymbol{\pi}_i = [\mathbf{a}_1 \; \mathbf{a}_2 \; \cdots \; \mathbf{a}_h]\mathbf{b}_i$, or $\boldsymbol{\pi}_i' = \mathbf{b}_i'\mathbf{A}'$, for $\mathbf{A}'$ the $(h \times k)$ matrix whose $i$th row is $\mathbf{a}_i'$. Applying this reasoning to each of the rows of $\boldsymbol{\Phi}(1)$:
$$\boldsymbol{\Phi}(1) = \begin{bmatrix} \boldsymbol{\pi}_1' \\ \boldsymbol{\pi}_2' \\ \vdots \\ \boldsymbol{\pi}_k' \end{bmatrix} = \begin{bmatrix} \mathbf{b}_1'\mathbf{A}' \\ \mathbf{b}_2'\mathbf{A}' \\ \vdots \\ \mathbf{b}_k'\mathbf{A}' \end{bmatrix} = \mathbf{B}\mathbf{A}', \tag{13}$$
where $\mathbf{B}$ is a $(k \times h)$ matrix.

---
¹ Recall from Theorem 4 on page 7 of Chapter 22 that this condition violates an assumption of the spurious-regression proof.
² If the determinant of an $(n \times n)$ matrix $\mathbf{H}$ is not equal to zero, its inverse is found by dividing the adjoint by the determinant: $\mathbf{H}^{-1} = (1/|\mathbf{H}|)\,[(-1)^{i+j}|\mathbf{H}_{ji}|]$.
³ This is not the only model for I(1) processes. See Saikkonen and Luukkonen (1997) for the infinite-order VAR and ? for the VARMA model.
However, the matrices $\mathbf{A}$ and $\mathbf{B}$ are not separately identified, since for any nonsingular $(h \times h)$ matrix $\boldsymbol{\Lambda}$, the matrices $\mathbf{B}^* = \mathbf{B}\boldsymbol{\Lambda}^{-1}$ and $\mathbf{A}^{*\prime} = \boldsymbol{\Lambda}\mathbf{A}'$ give
$$\boldsymbol{\Phi}(1) = \mathbf{B}\boldsymbol{\Lambda}^{-1}\boldsymbol{\Lambda}\mathbf{A}' = \mathbf{B}^*\mathbf{A}^{*\prime},$$
which implies the same distribution as $\boldsymbol{\Phi}(1) = \mathbf{B}\mathbf{A}'$. What can be determined is the space spanned by $\mathbf{A}$, the cointegrating space, which is why the concept of a basis is needed. Note that (13) implies that the $(k \times k)$ matrix $\boldsymbol{\Phi}(1)$ is singular, because
$$\text{rank}(\boldsymbol{\Phi}(1)) = \text{rank}(\mathbf{B}\mathbf{A}') \le \min(\text{rank}(\mathbf{B}), \text{rank}(\mathbf{A}')) = h < k.$$

2.3 Vector Error Correction Representation

A final representation for a cointegrated system is obtained by recalling from equation (1) of Chapter 22 that any VAR (not necessarily cointegrated at this stage) in the form of (6) can equivalently be written as
$$\Delta\mathbf{y}_t = \boldsymbol{\zeta}_1\Delta\mathbf{y}_{t-1} + \boldsymbol{\zeta}_2\Delta\mathbf{y}_{t-2} + \cdots + \boldsymbol{\zeta}_{p-1}\Delta\mathbf{y}_{t-p+1} + \mathbf{c} + \boldsymbol{\zeta}_0\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t, \tag{14}$$
where
$$\boldsymbol{\zeta}_0 = -(\mathbf{I}_k - \boldsymbol{\Phi}_1 - \boldsymbol{\Phi}_2 - \cdots - \boldsymbol{\Phi}_p) = -\boldsymbol{\Phi}(1).$$
Note that if $\mathbf{y}_t$ has $h$ cointegrating relations, then substitution of (13) into (14) results in
$$\Delta\mathbf{y}_t = \boldsymbol{\zeta}_1\Delta\mathbf{y}_{t-1} + \boldsymbol{\zeta}_2\Delta\mathbf{y}_{t-2} + \cdots + \boldsymbol{\zeta}_{p-1}\Delta\mathbf{y}_{t-p+1} + \mathbf{c} - \mathbf{B}\mathbf{A}'\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t. \tag{15}$$
Denote $\mathbf{z}_t \equiv \mathbf{A}'\mathbf{y}_t$, noting that $\mathbf{z}_t$ is a stationary $(h \times 1)$ vector. Then (15) can be written as
$$\Delta\mathbf{y}_t = \boldsymbol{\zeta}_1\Delta\mathbf{y}_{t-1} + \boldsymbol{\zeta}_2\Delta\mathbf{y}_{t-2} + \cdots + \boldsymbol{\zeta}_{p-1}\Delta\mathbf{y}_{t-p+1} + \mathbf{c} - \mathbf{B}\mathbf{z}_{t-1} + \boldsymbol{\varepsilon}_t. \tag{16}$$
Expression (16) is known as the vector error-correction representation of the cointegrated system. It is interesting to see that while a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data $\Delta\mathbf{y}_t$, it does have a vector error correction representation; the difference is that the former ignores the error correction term $\mathbf{B}\mathbf{z}_{t-1}$.

Example:
Let the individual elements of $(p_t \; s_t \; p_t^*)'$ all be I(1) with a single cointegrating vector $\mathbf{a} = (1 \; -1 \; -1)'$ among them.
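The algebra $\boldsymbol{\zeta}_0 = -\boldsymbol{\Phi}(1) = -\mathbf{B}\mathbf{A}'$ can be verified numerically. The sketch below uses my own illustrative numbers, not values from the text: a bivariate VAR(1) is built from $h = 1$ cointegrating relation, and the implied $\boldsymbol{\Phi}(1)$ is confirmed to have reduced rank $h = 1$.

```python
import numpy as np

# Construct a bivariate VECM: dy_t = -B z_{t-1} + e_t with z_t = A' y_t
A_t = np.array([[1.0, -1.0]])        # A' (h x k): cointegrating vector a' = (1, -1)
B = np.array([[0.5], [-0.2]])        # adjustment coefficients (k x h), chosen for illustration

Pi = -B @ A_t                        # zeta_0 = -Phi(1) = -B A'
Phi1 = np.eye(2) + Pi                # implied VAR(1) in levels: y_t = Phi1 y_{t-1} + e_t

Phi_at_1 = np.eye(2) - Phi1          # Phi(1) = I - Phi1
print(Phi_at_1)                      # equals B A', a rank-one (singular) matrix
print(np.linalg.matrix_rank(Phi_at_1), np.linalg.det(Phi_at_1))
```

The same check applies with any $p$: sum the level coefficient matrices, form $\boldsymbol{\Phi}(1)$, and its rank is the number of cointegrating relations.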
Then these three variables have a VECM representation:
$$\begin{bmatrix} \Delta p_t \\ \Delta s_t \\ \Delta p_t^* \end{bmatrix} = \begin{bmatrix} \zeta_{11}^{(1)} & \zeta_{12}^{(1)} & \zeta_{13}^{(1)} \\ \zeta_{21}^{(1)} & \zeta_{22}^{(1)} & \zeta_{23}^{(1)} \\ \zeta_{31}^{(1)} & \zeta_{32}^{(1)} & \zeta_{33}^{(1)} \end{bmatrix}\begin{bmatrix} \Delta p_{t-1} \\ \Delta s_{t-1} \\ \Delta p_{t-1}^* \end{bmatrix} + \begin{bmatrix} \zeta_{11}^{(2)} & \zeta_{12}^{(2)} & \zeta_{13}^{(2)} \\ \zeta_{21}^{(2)} & \zeta_{22}^{(2)} & \zeta_{23}^{(2)} \\ \zeta_{31}^{(2)} & \zeta_{32}^{(2)} & \zeta_{33}^{(2)} \end{bmatrix}\begin{bmatrix} \Delta p_{t-2} \\ \Delta s_{t-2} \\ \Delta p_{t-2}^* \end{bmatrix} + \cdots$$
$$+ \begin{bmatrix} \zeta_{11}^{(p-1)} & \zeta_{12}^{(p-1)} & \zeta_{13}^{(p-1)} \\ \zeta_{21}^{(p-1)} & \zeta_{22}^{(p-1)} & \zeta_{23}^{(p-1)} \\ \zeta_{31}^{(p-1)} & \zeta_{32}^{(p-1)} & \zeta_{33}^{(p-1)} \end{bmatrix}\begin{bmatrix} \Delta p_{t-p+1} \\ \Delta s_{t-p+1} \\ \Delta p_{t-p+1}^* \end{bmatrix} + \begin{bmatrix} c_p \\ c_s \\ c_{p^*} \end{bmatrix} - \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}\begin{bmatrix} 1 & -1 & -1 \end{bmatrix}\begin{bmatrix} p_{t-1} \\ s_{t-1} \\ p_{t-1}^* \end{bmatrix} + \begin{bmatrix} \varepsilon_t^{(p)} \\ \varepsilon_t^{(s)} \\ \varepsilon_t^{(p^*)} \end{bmatrix},$$
from which we see that the change in each variable depends not only on the lags of its own and the other variables' changes, but also on the levels of the elements of $\mathbf{z}_{t-1}$, with adjustment speed $\mathbf{B}$:
$$\Delta p_t = \zeta_{11}^{(1)}\Delta p_{t-1} + \zeta_{12}^{(1)}\Delta s_{t-1} + \zeta_{13}^{(1)}\Delta p_{t-1}^* + \zeta_{11}^{(2)}\Delta p_{t-2} + \zeta_{12}^{(2)}\Delta s_{t-2} + \zeta_{13}^{(2)}\Delta p_{t-2}^* + \cdots$$
$$+ \zeta_{11}^{(p-1)}\Delta p_{t-p+1} + \zeta_{12}^{(p-1)}\Delta s_{t-p+1} + \zeta_{13}^{(p-1)}\Delta p_{t-p+1}^* + c_p - b_1(p_{t-1} - s_{t-1} - p_{t-1}^*) + \varepsilon_t^{(p)}$$
$$= \cdots + c_p - b_1 z_{t-1} + \varepsilon_t^{(p)}.$$
From the economics of equilibrium, when there is a positive equilibrium error in the previous period, i.e. $z_{t-1} = p_{t-1} - s_{t-1} - p_{t-1}^* > 0$, the change in $p_t$ at time $t$, $\Delta p_t = p_t - p_{t-1}$, should be negatively related to this equilibrium error. Therefore, given the minus sign on $\mathbf{B}\mathbf{z}_{t-1}$ in (16), the equilibrium-error adjustment parameter should be positive.

3 Other Representations for Cointegration

3.1 Phillips's Triangular Representation

Another convenient representation for a cointegrated system was introduced by Phillips (1991). Suppose that the rows of the $(h \times k)$ matrix $\mathbf{A}'$ form a basis for the space of cointegrating vectors.
By reordering and normalizing, the cointegrating relations can be represented in the form
$$\mathbf{A}' = \begin{bmatrix} \mathbf{a}_1' \\ \mathbf{a}_2' \\ \vdots \\ \mathbf{a}_h' \end{bmatrix} = \begin{bmatrix} 1 & 0 & \cdots & 0 & -\gamma_{1,h+1} & -\gamma_{1,h+2} & \cdots & -\gamma_{1,k} \\ 0 & 1 & \cdots & 0 & -\gamma_{2,h+1} & -\gamma_{2,h+2} & \cdots & -\gamma_{2,k} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & -\gamma_{h,h+1} & -\gamma_{h,h+2} & \cdots & -\gamma_{h,k} \end{bmatrix} = \begin{bmatrix} \mathbf{I}_h & -\boldsymbol{\Gamma}' \end{bmatrix},$$
where $\boldsymbol{\Gamma}'$ is an $(h \times g)$ matrix of coefficients, for $g \equiv k - h$. Let $\mathbf{z}_t$ denote the errors associated with the set of cointegrating relations:
$$\mathbf{z}_t \equiv \mathbf{A}'\mathbf{y}_t.$$
Since $\mathbf{z}_t$ is I(0), its mean $\boldsymbol{\mu}_1 \equiv E(\mathbf{z}_t)$ exists, and we can define
$$\mathbf{z}_t^* \equiv \mathbf{z}_t - \boldsymbol{\mu}_1.$$
Partition $\mathbf{y}_t$ as
$$\mathbf{y}_t = \begin{bmatrix} \mathbf{y}_{1t\,(h\times1)} \\ \mathbf{y}_{2t\,(g\times1)} \end{bmatrix}.$$
Then
$$\mathbf{z}_t = \mathbf{z}_t^* + \boldsymbol{\mu}_1 = \begin{bmatrix} \mathbf{I}_h & -\boldsymbol{\Gamma}' \end{bmatrix}\begin{bmatrix} \mathbf{y}_{1t\,(h\times1)} \\ \mathbf{y}_{2t\,(g\times1)} \end{bmatrix}$$
or
$$\mathbf{y}_{1t\,(h\times1)} = \boldsymbol{\Gamma}'_{(h\times g)}\mathbf{y}_{2t\,(g\times1)} + \mathbf{z}_{t\,(h\times1)}^* + \boldsymbol{\mu}_{1\,(h\times1)}. \tag{17}$$
A representation for $\mathbf{y}_{2t}$ is given by the last $g$ rows of (1):
$$\Delta\mathbf{y}_{2t\,(g\times1)} = \boldsymbol{\delta}_{2\,(g\times1)} + \mathbf{u}_{2t\,(g\times1)}, \tag{18}$$
where $\boldsymbol{\delta}_2$ and $\mathbf{u}_{2t}$ are the last $g$ elements of the $(k \times 1)$ vectors $\boldsymbol{\delta}$ and $\mathbf{u}_t$ in (1), respectively. Equations (17) and (18) constitute Phillips's (1991) triangular representation of a system with exactly $h$ cointegrating relations. Note that $\mathbf{z}_t^*$ and $\mathbf{u}_{2t}$ are zero-mean stationary disturbances in this representation.

Example:
Let the individual elements of $(p_t \; s_t \; p_t^*)'$ all be I(1) with a single cointegrating vector among them. The triangular representation of these three variables is: given
$$\mathbf{A}' = \mathbf{a}' = [1 \; -\gamma_1 \; -\gamma_2],$$
then
$$p_t = \gamma_1 s_t + \gamma_2 p_t^* + \mu_1 + z_t^*$$
$$\Delta s_t = \delta_s + u_{s,t}$$
$$\Delta p_t^* = \delta_{p^*} + u_{p^*,t},$$
where the hypothesized values are $\gamma_1 = \gamma_2 = 1$.

3.2 The Stock-Watson Common Trends Representation

Another useful representation for any cointegrated system was proposed by Stock and Watson (1988). Suppose that a $(k \times 1)$ vector $\mathbf{y}_t$ is characterized by exactly $h$ cointegrating relations, with $g = k - h$. We have seen that it is possible to order the elements of $\mathbf{y}_t$ in such a way that a triangular representation of the form (17)-(18) exists, with $(\mathbf{z}_t^{*\prime}, \mathbf{u}_{2t}')'$ an I(0) $(k \times 1)$ vector with zero mean.
Suppose that
$$\begin{bmatrix} \mathbf{z}_t^* \\ \mathbf{u}_{2t} \end{bmatrix} = \sum_{s=0}^{\infty}\begin{bmatrix} \mathbf{H}_s \\ \mathbf{J}_s \end{bmatrix}\boldsymbol{\varepsilon}_{t-s}$$
for $\boldsymbol{\varepsilon}_t$ a $(k \times 1)$ white noise process, with $\{s\mathbf{H}_s\}_{s=0}^{\infty}$ and $\{s\mathbf{J}_s\}_{s=0}^{\infty}$ absolutely summable sequences of $(h \times k)$ and $(g \times k)$ matrices, respectively. From the B-N decomposition we have
$$\mathbf{y}_{2t} = \mathbf{y}_{2,0} + \boldsymbol{\delta}_2 t + \sum_{s=1}^{t}\mathbf{u}_{2s} = \mathbf{y}_{2,0} + \boldsymbol{\delta}_2 t + \mathbf{J}(1)(\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2 + \cdots + \boldsymbol{\varepsilon}_t) + \boldsymbol{\eta}_{2t} - \boldsymbol{\eta}_{2,0}, \tag{19}$$
where $\mathbf{J}(1) \equiv (\mathbf{J}_0 + \mathbf{J}_1 + \mathbf{J}_2 + \cdots)$, $\boldsymbol{\eta}_{2t} \equiv \sum_{s=0}^{\infty}\boldsymbol{\alpha}_{2s}\boldsymbol{\varepsilon}_{t-s}$, and $\boldsymbol{\alpha}_{2s} \equiv -(\mathbf{J}_{s+1} + \mathbf{J}_{s+2} + \mathbf{J}_{s+3} + \cdots)$. Since the $(k \times 1)$ vector $\boldsymbol{\varepsilon}_t$ is white noise, the $(g \times 1)$ vector $\mathbf{J}(1)\boldsymbol{\varepsilon}_t$ is also white noise, implying that each element of the $(g \times 1)$ vector $\boldsymbol{\xi}_{2t}$ defined by
$$\boldsymbol{\xi}_{2t} = \mathbf{J}(1)(\boldsymbol{\varepsilon}_1 + \boldsymbol{\varepsilon}_2 + \cdots + \boldsymbol{\varepsilon}_t) \tag{20}$$
is described by a random walk. Substituting (20) into (19) results in
$$\mathbf{y}_{2t} = \tilde{\boldsymbol{\mu}}_2 + \boldsymbol{\delta}_2 t + \boldsymbol{\xi}_{2t} + \boldsymbol{\eta}_{2t} \tag{21}$$
for $\tilde{\boldsymbol{\mu}}_2 = (\mathbf{y}_{2,0} - \boldsymbol{\eta}_{2,0})$. Substituting (21) into (17) produces
$$\mathbf{y}_{1t} = \tilde{\boldsymbol{\mu}}_1 + \boldsymbol{\Gamma}'(\boldsymbol{\delta}_2 t + \boldsymbol{\xi}_{2t}) + \tilde{\boldsymbol{\eta}}_{1t} \tag{22}$$
for $\tilde{\boldsymbol{\mu}}_1 = \boldsymbol{\mu}_1 + \boldsymbol{\Gamma}'\tilde{\boldsymbol{\mu}}_2$ and $\tilde{\boldsymbol{\eta}}_{1t} = \mathbf{z}_t^* + \boldsymbol{\Gamma}'\boldsymbol{\eta}_{2t}$.

Equations (21) and (22) give Stock and Watson's (1988) common trends representation. These equations show that the vector $\mathbf{y}_t$ can be described as a stationary component,
$$\begin{bmatrix} \tilde{\boldsymbol{\mu}}_1 + \tilde{\boldsymbol{\eta}}_{1t} \\ \tilde{\boldsymbol{\mu}}_2 + \boldsymbol{\eta}_{2t} \end{bmatrix},$$
plus linear combinations of up to $g$ common deterministic trends, described by the $(g \times 1)$ vector $\boldsymbol{\delta}_2 t$, plus linear combinations of the $g$ common stochastic trends, described by the $(g \times 1)$ vector $\boldsymbol{\xi}_{2t}$. Therefore, saying that a $(k \times 1)$ vector $\mathbf{y}_t$ is characterized by exactly $h$ cointegrating relations is equivalent to saying that there are $g = k - h$ common trends among its elements.

4 Estimation and Testing of Cointegration from a Single Equation

4.1 Testing for Cointegration When the Cointegrating Vector is Known

Often, when theoretical considerations suggest that certain variables will be cointegrated, i.e. that $\mathbf{a}'\mathbf{y}_t$ is stationary for some $(k \times 1)$ cointegrating vector $\mathbf{a}$, the theory is based on a particular known value for $\mathbf{a}$. In the purchasing power parity example, $\mathbf{a} = (1 \; -1 \; -1)'$.
Given that the null hypothesis of a unit root cannot be rejected by various unit root tests on the individual series $p_t$, $s_t$, and $p_t^*$, the next step is to test whether the particular linear combination $z_t = \mathbf{a}'\mathbf{y}_t = p_t - s_t - p_t^*$ is stationary, using those same unit root tests. See the example on p. 585 of Hamilton.

4.2 Testing the Null Hypothesis of No Cointegration: Residual-Based Tests for Cointegration

If the theoretical model of the system dynamics does not suggest a particular value for the cointegrating vector $\mathbf{a}$, then one approach is first to estimate $\mathbf{a}$ by OLS.

4.2.1 Estimating the Cointegrating Vector

If it is known for certain that the cointegrating vector has a nonzero coefficient on the first element of $\mathbf{y}_t$ ($a_1 \neq 0$), then a particularly convenient normalization is to set $a_1 = 1$ and represent the subsequent entries of $\mathbf{a}$, $(a_2, a_3, \ldots, a_n)$, as the negatives of a set of unknown parameters $(\gamma_2, \gamma_3, \ldots, \gamma_n)$:
$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} 1 \\ -\gamma_2 \\ -\gamma_3 \\ \vdots \\ -\gamma_n \end{bmatrix}.$$
Then consistent estimation of $\mathbf{a}$ is achieved by an OLS regression of the first element of $\mathbf{y}_t$ on all of the others:
$$y_{1t} = \gamma_2 y_{2t} + \gamma_3 y_{3t} + \cdots + \gamma_n y_{nt} + u_t. \tag{23}$$
Consistent estimates of $\gamma_2, \gamma_3, \ldots, \gamma_n$ are also obtained when a constant term is included in (23), as in
$$y_{1t} = \alpha + \gamma_2 y_{2t} + \gamma_3 y_{3t} + \cdots + \gamma_n y_{nt} + u_t,$$
or
$$y_{1t} = \alpha + \boldsymbol{\gamma}'\mathbf{y}_{2t} + u_t,$$
where $\boldsymbol{\gamma}' = (\gamma_2, \gamma_3, \ldots, \gamma_n)$ and $\mathbf{y}_{2t} = (y_{2t}, y_{3t}, \ldots, y_{nt})'$.

Theorem 1 (Stock, 1987):
Let $y_{1t}$ be a scalar and $\mathbf{y}_{2t}$ a $(g \times 1)$ vector. Let $k \equiv g + 1$, and suppose that the $(k \times 1)$ vector $(y_{1t}, \mathbf{y}_{2t}')'$ is characterized by exactly one cointegrating relation ($h = 1$), which has a nonzero coefficient on $y_{1t}$. Let the triangular representation of the system be
$$y_{1t} = \alpha + \boldsymbol{\gamma}'\mathbf{y}_{2t} + z_t^* \tag{24}$$
$$\Delta\mathbf{y}_{2t} = \mathbf{u}_{2t}. \tag{25}$$
Suppose that
$$\begin{bmatrix} z_t^* \\ \mathbf{u}_{2t} \end{bmatrix} = \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t,$$
where $\boldsymbol{\varepsilon}_t$ is a $(k \times 1)$ i.i.d. vector with mean zero, finite fourth moments, and positive definite variance-covariance matrix $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \mathbf{P}\mathbf{P}'$.
Suppose further that the sequence of $(k \times k)$ matrices $\{s\boldsymbol{\Psi}_s\}_{s=0}^{\infty}$ is absolutely summable and that the rows of $\boldsymbol{\Psi}(1)$ are linearly independent. Let $\hat{\alpha}_T$ and $\hat{\boldsymbol{\gamma}}_T$ be the OLS estimators of (24). Partition $\boldsymbol{\Psi}(1)\mathbf{P}$ as
$$\boldsymbol{\Psi}(1)\mathbf{P} = \begin{bmatrix} \boldsymbol{\lambda}_{1\,(1\times k)}' \\ \boldsymbol{\Lambda}_{2\,(g\times k)} \end{bmatrix}.$$
Then
$$\begin{bmatrix} T^{1/2}(\hat{\alpha}_T - \alpha) \\ T(\hat{\boldsymbol{\gamma}}_T - \boldsymbol{\gamma}) \end{bmatrix} \xrightarrow{L} \begin{bmatrix} 1 & \int[\mathbf{W}(r)]'dr\,\boldsymbol{\Lambda}_2' \\ \boldsymbol{\Lambda}_2\int\mathbf{W}(r)dr & \boldsymbol{\Lambda}_2\int[\mathbf{W}(r)][\mathbf{W}(r)]'dr\,\boldsymbol{\Lambda}_2' \end{bmatrix}^{-1}\begin{bmatrix} h_1 \\ \mathbf{h}_2 \end{bmatrix}, \tag{26}$$
where $\mathbf{W}(r)$ is $k$-dimensional standard Brownian motion, the integral sign denotes integration over $r$ from 0 to 1, and
$$h_1 \equiv \boldsymbol{\lambda}_1'\mathbf{W}(1),$$
$$\mathbf{h}_2 \equiv \boldsymbol{\Lambda}_2\left\{\int[\mathbf{W}(r)][d\mathbf{W}(r)]'\right\}\boldsymbol{\lambda}_1 + \sum_{v=0}^{\infty}E(\mathbf{u}_{2t}z_{t+v}^*).$$

This theorem shows that the OLS estimator of the cointegrating vector is consistent. Note, however, that the correlation between the regressors $\mathbf{y}_{2t}$ and the error $z_t^*$ does not induce inconsistency of $\hat{\boldsymbol{\gamma}}_T$; instead, the asymptotic distribution exhibits a bias, since the distribution of $T(\hat{\boldsymbol{\gamma}}_T - \boldsymbol{\gamma})$ is not centered around zero.

In the next chapter we will consider system estimation of cointegrating vectors. Banerjee et al. (1993, p. 214) examined one of the main reasons for using such estimation: the large finite-sample biases that can arise in static OLS estimates of cointegrating vectors or parameters. While such estimators are super-consistent ($T$-consistent), Monte Carlo experiments nonetheless suggest that a large number of observations may be necessary before the biases become small.

4.2.2 What Is the Regression Estimating When There Is More Than One Cointegrating Relation?

The limiting distribution of the OLS estimator in Theorem 1 was derived under the assumption that there is only one cointegrating relation ($h = 1$). In the more general case with $h > 1$, OLS estimation of (24) should still provide a consistent estimate of a cointegrating vector. But which cointegrating vector is it? Wooldridge (1991) shows that, among the set of possible cointegrating relations, OLS estimation of (24) selects the relation whose residuals are uncorrelated with any other I(1) linear combination of $(y_{2t}, y_{3t}, \ldots, y_{nt})$.

4.2.3 What Is the Regression Estimating When There Is No Cointegrating Relation?
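The consistency, at rate $T$, of static OLS in Theorem 1 can be illustrated by simulation, even when the regressor is correlated with the error. This is a sketch with my own parameter choices ($\gamma = 1$, a deliberately endogenous $z_t^*$); it illustrates, rather than proves, the theorem.

```python
import numpy as np

rng = np.random.default_rng(1)

def static_ols_error(T, gamma=1.0):
    """Simulate y1t = alpha + gamma*y2t + z*_t with y2t a random walk,
    letting z*_t be correlated with u2t, and return gamma_hat - gamma."""
    e1, e2 = rng.standard_normal((2, T))
    u2 = e2
    z = 0.5 * e2 + e1            # z*_t correlated with u2t: endogenous regressor
    y2 = np.cumsum(u2)
    y1 = 0.3 + gamma * y2 + z
    X = np.column_stack([np.ones(T), y2])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    return coef[1] - gamma

err_small = static_ols_error(100)
err_large = static_ols_error(10000)
print(err_small, err_large)      # the error shrinks roughly at rate 1/T
```

Despite the endogeneity, no instrument or correction is needed for consistency; the bias shows up only in the non-centered limiting distribution of $T(\hat{\gamma}_T - \gamma)$.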
Let us now consider the properties of OLS estimation when there is no cointegrating relation. Then (24) is a regression of an I(1) variable on a set of $(k - 1)$ I(1) variables for which no coefficients produce an I(0) error term. The regression is therefore subject to the spurious regression problem described in Chapter 22. The coefficients $\hat{\alpha}$ and $\hat{\boldsymbol{\gamma}}$ do not provide consistent estimates of any population parameter, and the OLS sample residuals $\hat{u}_t$ will be nonstationary.

However, this last property can be exploited to test for cointegration. If there is no cointegration, then a regression of $\hat{u}_t$ on $\hat{u}_{t-1}$ should yield a unit coefficient. If there is cointegration, then a regression of $\hat{u}_t$ on $\hat{u}_{t-1}$ should yield a coefficient that is less than one. The proposal is thus to estimate (24) by OLS and then apply one of the standard unit root tests to the estimated residuals, such as the ADF $t$ test or the Phillips-Perron $Z_\rho$ or $Z_t$ test. Although these test statistics are constructed in the same way as when they are applied to an individual series $y_t$, when the tests are applied to the residuals $\hat{u}_t$ from a potentially spurious regression, the critical values used to interpret the test statistics differ from those employed in Chapter 21.

Theorem 2 (Residual-Based Tests for Cointegration; Tests with No Cointegration as the Null):
Consider a $(k \times 1)$ vector $\mathbf{y}_t$ such that
$$(1 - L)\mathbf{y}_t = \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t = \sum_{s=0}^{\infty}\boldsymbol{\Psi}_s\boldsymbol{\varepsilon}_{t-s},$$
for $\boldsymbol{\varepsilon}_t$ an i.i.d. vector with mean zero, variance $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega} = \mathbf{P}\mathbf{P}'$, and finite fourth moments, and where $\{s\boldsymbol{\Psi}_s\}_{s=0}^{\infty}$ is absolutely summable. Let $g = k - 1$ and $\boldsymbol{\Lambda} = \boldsymbol{\Psi}(1)\mathbf{P}$. Suppose that the $(k \times k)$ matrix $\boldsymbol{\Lambda}\boldsymbol{\Lambda}'$ is nonsingular, and let $\mathbf{L}$ denote the Cholesky factor of $(\boldsymbol{\Lambda}\boldsymbol{\Lambda}')^{-1}$.
Partition $\mathbf{y}_t$ as $\mathbf{y}_t = (y_{1t}, \mathbf{y}_{2t}')'$ and consider the OLS regression
$$y_{1t} = \hat{\alpha}_T + \mathbf{y}_{2t}'\hat{\boldsymbol{\gamma}}_T + \hat{u}_t. \tag{27}$$
The residual $\hat{u}_t$ can then be regressed on its own lagged value $\hat{u}_{t-1}$ without a constant term (since the original regression (24) already contains a constant term, the disturbance $u_t$ is zero-mean):
$$\hat{u}_t = \rho\hat{u}_{t-1} + e_t, \tag{28}$$
yielding the OLS estimate
$$\hat{\rho}_T = \frac{\sum \hat{u}_t\hat{u}_{t-1}}{\sum \hat{u}_{t-1}^2}. \tag{29}$$
We may form the standard Dickey-Fuller and Phillips-Perron ($Z_\rho$, $Z_t$) statistics from (28). Alternatively, we can form an ADF test from
$$\hat{u}_t = \zeta_1\Delta\hat{u}_{t-1} + \zeta_2\Delta\hat{u}_{t-2} + \cdots + \zeta_{p-1}\Delta\hat{u}_{t-p+1} + \rho\hat{u}_{t-1} + e_t. \tag{30}$$
Then the following results hold.

(a) The statistic $\hat{\rho}$ defined in (29) satisfies (standard DF test)
$$(T-1)(\hat{\rho}-1) \xrightarrow{L} \frac{\frac{1}{2}\begin{bmatrix}1 & -\mathbf{h}_2'\end{bmatrix}[\mathbf{w}^*(1)][\mathbf{w}^*(1)]'\begin{bmatrix}1 \\ -\mathbf{h}_2\end{bmatrix} - h_1[\mathbf{w}^*(1)]'\begin{bmatrix}1 \\ -\mathbf{h}_2\end{bmatrix} - \frac{1}{2}\begin{bmatrix}1 & -\mathbf{h}_2'\end{bmatrix}\mathbf{L}'[E(\Delta\mathbf{y}_t)(\Delta\mathbf{y}_t')]\mathbf{L}\begin{bmatrix}1 \\ -\mathbf{h}_2\end{bmatrix}}{H_n}. \tag{31}$$
Here $\mathbf{w}^*$ denotes $k$-dimensional standard Brownian motion, partitioned as
$$\mathbf{w}^*(r) = \begin{bmatrix} W_1^*(r)_{(1\times1)} \\ \mathbf{w}_2^*(r)_{(g\times1)} \end{bmatrix};$$
$h_1$ is a scalar and $\mathbf{h}_2$ is a $(g \times 1)$ vector given by
$$\begin{bmatrix} h_1 \\ \mathbf{h}_2 \end{bmatrix} \equiv \begin{bmatrix} 1 & \int_0^1[\mathbf{w}_2^*(r)]'dr \\ \int_0^1\mathbf{w}_2^*(r)dr & \int_0^1[\mathbf{w}_2^*(r)][\mathbf{w}_2^*(r)]'dr \end{bmatrix}^{-1}\begin{bmatrix} \int_0^1 W_1^*(r)dr \\ \int_0^1\mathbf{w}_2^*(r)W_1^*(r)dr \end{bmatrix},$$
and
$$H_n \equiv \int_0^1[W_1^*(r)]^2dr - \begin{bmatrix} \int_0^1 W_1^*(r)dr & \int_0^1[W_1^*(r)][\mathbf{w}_2^*(r)]'dr \end{bmatrix}\begin{bmatrix} h_1 \\ \mathbf{h}_2 \end{bmatrix}.$$

(b) If $l \to \infty$ (the Newey-West truncation parameter) as $T \to \infty$ but $l/T \to 0$, then the Phillips-Perron statistic $Z_\rho$ constructed from $\hat{u}_t$ satisfies
$$Z_\rho \xrightarrow{L} Z_n, \tag{32}$$
where
$$Z_n \equiv \frac{\frac{1}{2}\begin{bmatrix}1 & -\mathbf{h}_2'\end{bmatrix}[\mathbf{w}^*(1)][\mathbf{w}^*(1)]'\begin{bmatrix}1 \\ -\mathbf{h}_2\end{bmatrix} - h_1[\mathbf{w}^*(1)]'\begin{bmatrix}1 \\ -\mathbf{h}_2\end{bmatrix} - \frac{1}{2}(1 + \mathbf{h}_2'\mathbf{h}_2)}{H_n}. \tag{33}$$

(c) If $l \to \infty$ as $T \to \infty$ but $l/T \to 0$, then the Phillips-Perron statistic $Z_t$ constructed from $\hat{u}_t$ satisfies
$$Z_t \xrightarrow{L} Z_n\frac{\sqrt{H_n}}{(1 + \mathbf{h}_2'\mathbf{h}_2)^{1/2}}. \tag{34}$$

(d) If, in addition to the preceding assumptions, $\Delta\mathbf{y}_t$ follows a zero-mean stationary vector ARMA process, and if $p \to \infty$ as $T \to \infty$ but $p/T^{1/3} \to 0$, then the ADF $t$-test statistic associated with (30) has the same limiting distribution as the statistic $Z_t$ described in (34).

Result (a) implies that $\hat{\rho} \xrightarrow{p} 1$.
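The two-step procedure of (27)-(29) can be sketched with plain numpy. The data-generating choices below (series length, cointegrating coefficient equal to one) are my own, and the statistic computed is the simple normalized-bias statistic $(T-1)(\hat{\rho}-1)$ of (29), without the Phillips-Perron or ADF corrections.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 500

def resid_df_stat(y1, y2):
    """Step 1: cointegrating regression of y1 on (const, y2); step 2: regress
    the residual on its own lag (no constant) and return (T-1)*(rho_hat - 1)."""
    X = np.column_stack([np.ones(len(y1)), y2])
    coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
    u = y1 - X @ coef
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])
    return (len(u) - 1) * (rho - 1.0)

y2 = np.cumsum(rng.standard_normal(T))          # I(1) regressor
y1_coint = 1.0 * y2 + rng.standard_normal(T)    # cointegrated with y2
y1_spur = np.cumsum(rng.standard_normal(T))     # independent random walk: no cointegration

stat_coint = resid_df_stat(y1_coint, y2)
stat_spur = resid_df_stat(y1_spur, y2)
print(stat_coint, stat_spur)   # large negative under cointegration; modest under the null
```

Under cointegration the residual mimics a stationary series and $\hat{\rho}$ is far below one, while under the null the statistic stays in the moderate range covered by the nonstandard critical values discussed above.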
Note that although $W_1^*(r)$ and $\mathbf{w}_2^*(r)$ are standard Brownian motions, and the distributions of the terms $h_1$, $\mathbf{h}_2$, $H_n$, and $Z_n$ above depend only on the number of stochastic explanatory variables included in the cointegrating regression ($k - 1$) and on whether a constant term appears in the original cointegrating regression, the statistic $(T-1)(\hat{\rho}-1)$ is affected by the variance, correlations, and dynamics of $\Delta\mathbf{y}_t$. In the special case when $\Delta\mathbf{y}_t$ is i.i.d., $\boldsymbol{\Psi}(L) = \mathbf{I}_k$, and the matrix $\boldsymbol{\Lambda}\boldsymbol{\Lambda}' = \boldsymbol{\Omega} = E(\Delta\mathbf{y}_t\Delta\mathbf{y}_t')$. Since $\mathbf{L}\mathbf{L}' = (\boldsymbol{\Lambda}\boldsymbol{\Lambda}')^{-1}$, it follows that $\boldsymbol{\Lambda}\boldsymbol{\Lambda}' = (\mathbf{L}')^{-1}(\mathbf{L})^{-1}$. Hence,
$$\mathbf{L}'[E(\Delta\mathbf{y}_t)(\Delta\mathbf{y}_t')]\mathbf{L} = \mathbf{L}'(\boldsymbol{\Lambda}\boldsymbol{\Lambda}')\mathbf{L} = \mathbf{L}'[(\mathbf{L}')^{-1}(\mathbf{L})^{-1}]\mathbf{L} = \mathbf{I}_k. \tag{35}$$
If (35) is substituted into (31), the result is that when $\Delta\mathbf{y}_t$ is i.i.d.,
$$T(\hat{\rho}-1) \xrightarrow{L} Z_n$$
for $Z_n$ defined in (33).

In the more general case when $\Delta\mathbf{y}_t$ is serially correlated, the limiting distribution of $T(\hat{\rho}-1)$ depends on the nature of this correlation as captured by the elements of $\mathbf{L}$. However, the corrections for autocorrelation implicit in Phillips's $Z_\rho$ and $Z_t$ statistics and in the augmented Dickey-Fuller $t$ test turn out to generate variables whose distributions do not depend on any nuisance parameters.

Although the distributions of $Z_\rho$, $Z_t$, and the ADF $t$ do not depend on nuisance parameters, the distributions of these statistics when calculated from the residuals $\hat{u}_t$ are not the same as the distributions they would have if calculated from the raw data. Moreover, different values of $k - 1$ (the number of stochastic explanatory variables in the cointegrating regression) imply different characterizations of the limiting variables $h_1$, $\mathbf{h}_2$, $H_n$, and $Z_n$, meaning that a different critical value must be used to interpret $Z_\rho$ for each value of $k - 1$. Similarly, the asymptotic distributions of $\mathbf{h}_2$, $H_n$, and $Z_n$ differ depending on whether a constant term is included in the cointegrating regression.

Example: See the purchasing power parity example on Hamilton's p. 598.

Exercise: Reproduce the values in case 2 of Tables B.8 and B.9 on Hamilton's pp. 765-766.
4.3 Tests with Cointegration as the Null

The tests considered in the previous sections take no cointegration as the null hypothesis; they are based on tests of a unit root in the residuals of the cointegrating regression. In Chapter 21 we discussed unit root tests with stationarity as the null hypothesis (e.g. KPSS). Correspondingly, there are tests with cointegration as the null. They include:
(a) the Leybourne and McCabe (1993) test, which is based on an unobserved components model;
(b) the Park and Choi (1988, 1990) test, which is based on testing the significance of superfluous regressors;
(c) the Shin (1994) test, which is a residual-based test;
(d) the Harris and Inder (1994) test, which uses a nonparametric correction procedure for estimation of the cointegrating regression.

4.4 Testing Hypotheses About the Cointegrating Vector

The previous sections described some ways to test whether a vector $\mathbf{y}_t$ is cointegrated. It was noted that if $\mathbf{y}_t$ is cointegrated, then a consistent estimate of the cointegrating vector can be obtained by OLS. However, a difficulty arises with nonstandard distributions for hypothesis tests about the cointegrating vector, due to the possibility of nonzero correlation between $z_t^*$ and $\mathbf{u}_{2t}$. The nuisance parameters $\boldsymbol{\lambda}_1$ and $\boldsymbol{\Lambda}_2$, which appear in (26), also cause a problem. The basic approach to constructing hypothesis tests is therefore to transform the regression or the estimate so as to eliminate the effects of this correlation. The first approach is Stock and Watson's (1993) dynamic OLS, which corrects for the correlation by adding leads and lags of $\Delta\mathbf{y}_{2t}$ to the regression. The second is Phillips and Hansen's (1990) fully modified OLS estimator, which modifies the OLS estimate in two respects. See Hatanaka (1996), p. 266, for details.
5 Simulation of a Bivariate Cointegrated System

To illustrate the potential differences in size and power among various residual-based tests for cointegration in finite samples, a Monte Carlo experiment proposed by Cheung and Lai (1993), similar to that of Engle and Granger (1987), can be conducted. A bivariate system of $x_t$ and $y_t$ is modeled by
$$x_t + y_t = u_t \tag{36}$$
and
$$x_t + 2y_t = v_t, \tag{37}$$
with $(1 - L)u_t = \varepsilon_t$, and $v_t$ generated as an AR(1) process:
$$(1 - \rho L)v_t = \eta_t. \tag{38}$$
The innovations $\varepsilon_t$ and $\eta_t$ are generated as independent standard normal variates. When $v_t$ is given by (38) with $|\rho| < 1$, $x_t$ and $y_t$ are cointegrated and (37) is their cointegrating relationship. When $|\rho| = 1$, however, the two series are not cointegrated.

Exercise:
Use a simulation based on 10000 replications with a sample size of 500 to compare the size and power ($\rho = 0.85$) of the residual-based ADF and PP tests for cointegration at a nominal size of 5%. The truncation number is chosen as $p = l = 4$.
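A scaled-down version of this experiment can be sketched as follows. All of the simplifications here are my own: far fewer replications and a shorter sample than the exercise asks for, an unaugmented DF $t$-statistic in place of the ADF/PP pair, and an approximate 5% critical value of $-3.37$ (roughly the Engle-Granger tabulated value for two variables with a constant).

```python
import numpy as np

rng = np.random.default_rng(3)

def df_t_on_residuals(x, y):
    """Cointegrating regression of x on (const, y), then the DF t-statistic
    for rho = 1 in u_t = rho*u_{t-1} + e_t (no constant)."""
    X = np.column_stack([np.ones(len(x)), y])
    coef, *_ = np.linalg.lstsq(X, x, rcond=None)
    u = x - X @ coef
    u0, u1 = u[:-1], u[1:]
    rho = (u1 @ u0) / (u0 @ u0)
    e = u1 - rho * u0
    se = np.sqrt((e @ e) / (len(u0) - 1) / (u0 @ u0))
    return (rho - 1.0) / se

def rejection_rate(rho_v, reps=200, T=200, crit=-3.37):
    rej = 0
    for _ in range(reps):
        eps, eta = rng.standard_normal((2, T))
        u = np.cumsum(eps)                 # (1-L)u_t = eps_t: a random walk
        v = np.zeros(T)                    # (1 - rho_v L)v_t = eta_t
        for t in range(1, T):
            v[t] = rho_v * v[t - 1] + eta[t]
        y = v - u                          # solve x + y = u and x + 2y = v
        x = 2.0 * u - v
        if df_t_on_residuals(x, y) < crit:
            rej += 1
    return rej / reps

size = rejection_rate(1.0)     # no cointegration: rejections near the nominal 5%
power = rejection_rate(0.85)   # cointegration: rejections should be far more frequent
print(size, power)
```

Scaling `reps` to 10000, `T` to 500, and replacing the unaugmented statistic with ADF ($p = 4$) and PP ($l = 4$) versions recovers the full exercise.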