Ch. 22 Unit Root in Vector Time Series

1 Multivariate Wiener Processes and Multivariate FCLT

Section 2.1 of Chapter 21 described univariate standard Brownian motion $W(r)$ as a scalar continuous-time process ($W: r \in [0,1] \to \mathbb{R}^1$). The variable $W(r)$ has a $N(0, r)$ distribution across realizations, and for any given realization, $W(r)$ is a continuous function of the date $r$ with independent increments. If a set of $k$ such independent processes, denoted $W_1(r), W_2(r), \ldots, W_k(r)$, are collected in a $(k \times 1)$ vector $\mathbf{w}(r)$, the result is $k$-dimensional standard Brownian motion.

Definition 1: A $k$-dimensional standard Brownian motion $\mathbf{w}(\cdot)$ is a continuous-time process associating each date $r \in [0,1]$ with the $(k \times 1)$ vector $\mathbf{w}(r)$ satisfying the following:
(a). $\mathbf{w}(0) = \mathbf{0}$;
(b). For any dates $0 \le r_1 < r_2 < \cdots < r_k \le 1$, the changes $[\mathbf{w}(r_2) - \mathbf{w}(r_1)], [\mathbf{w}(r_3) - \mathbf{w}(r_2)], \ldots, [\mathbf{w}(r_k) - \mathbf{w}(r_{k-1})]$ are independent multivariate Gaussian with $[\mathbf{w}(s) - \mathbf{w}(v)] \sim N(\mathbf{0}, (s - v)\mathbf{I}_k)$ for $s > v$;
(c). For any given realization, $\mathbf{w}(r)$ is continuous in $r$ with probability 1.

Analogous to the univariate case, we can define a multivariate random walk as follows.

Definition: Let the $(k \times 1)$ random vector $\mathbf{y}_t$ follow $\mathbf{y}_t = \mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t$, $t = 1, 2, \ldots$, where $\mathbf{y}_0 = \mathbf{0}$ and $\boldsymbol{\varepsilon}_t$ is a sequence of i.i.d. random vectors such that $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$, a finite positive definite matrix. Then $\mathbf{y}_t$ is a multivariate ($k$-dimensional) random walk.

We form the rescaled partial sums as
$$\mathbf{w}_T(r) \equiv \boldsymbol{\Omega}^{-1/2} T^{-1/2} \sum_{t=1}^{[Tr]} \boldsymbol{\varepsilon}_t.$$
The components of $\mathbf{w}_T(r)$ are the individual partial sums
$$W_{Tj}(r) = T^{-1/2} \sum_{t=1}^{[Tr]} \tilde{\varepsilon}_{tj}, \qquad j = 1, 2, \ldots, k,$$
where $\tilde{\varepsilon}_{tj}$ is the $j$th element of $\boldsymbol{\Omega}^{-1/2}\boldsymbol{\varepsilon}_t$. The functional central limit theorem (FCLT) provides conditions under which $\mathbf{w}_T(r)$ converges to the multivariate standard Wiener process $\mathbf{w}(r)$. The simplest multivariate FCLT is the multivariate Donsker's theorem.

Theorem 1 (Multivariate Donsker): Let $\boldsymbol{\varepsilon}_t$ be a sequence of i.i.d. random vectors such that $E(\boldsymbol{\varepsilon}_t) = \mathbf{0}$ and $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega}$, a finite positive definite matrix. Then $\mathbf{w}_T(\cdot) \Longrightarrow \mathbf{w}(\cdot)$.

Quite general multivariate FCLTs are available.
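The construction of $\mathbf{w}_T(r)$ can be sketched numerically. The Python below is a minimal sketch for $k = 2$, assuming Gaussian $\boldsymbol{\varepsilon}_t$ and an arbitrarily chosen $\boldsymbol{\Omega}$. As an assumption of the sketch (not the text's definition), it standardizes with the inverse of the lower-triangular Cholesky factor $\mathbf{P}$, where $\mathbf{P}\mathbf{P}' = \boldsymbol{\Omega}$, in place of the symmetric square root $\boldsymbol{\Omega}^{-1/2}$; both turn $\boldsymbol{\varepsilon}_t$ into a vector with identity covariance. The helper names `cholesky_2x2`, `w_T` and `draw_eps` are hypothetical.

```python
import math
import random


def cholesky_2x2(omega):
    """Lower-triangular P with P P' = omega, for a 2x2 positive definite omega."""
    (a, b), (_, d) = omega
    p11 = math.sqrt(a)
    p21 = b / p11
    p22 = math.sqrt(d - p21 ** 2)
    return [[p11, 0.0], [p21, p22]]


def w_T(eps, P, r):
    """T^{-1/2} * sum_{t=1}^{[Tr]} P^{-1} eps_t, with P^{-1} standing in for Omega^{-1/2}."""
    T = len(eps)
    n = math.floor(T * r)  # [Tr], the integer part of T*r
    s1 = s2 = 0.0
    for e1, e2 in eps[:n]:
        # Forward substitution solves P z = e, so z = P^{-1} e has identity covariance.
        z1 = e1 / P[0][0]
        z2 = (e2 - P[1][0] * z1) / P[1][1]
        s1 += z1
        s2 += z2
    return (s1 / math.sqrt(T), s2 / math.sqrt(T))


def draw_eps(T, P, rng):
    """Gaussian i.i.d. eps_t = P u_t with u_t ~ N(0, I_2), so E[eps_t eps_t'] = P P' = Omega."""
    eps = []
    for _ in range(T):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eps.append((P[0][0] * u1, P[1][0] * u1 + P[1][1] * u2))
    return eps


if __name__ == "__main__":
    rng = random.Random(0)
    P = cholesky_2x2([[4.0, 2.0], [2.0, 2.0]])  # illustrative Omega, chosen arbitrarily
    # Across replications, w_T(1) should have a covariance matrix close to I_2.
    draws = [w_T(draw_eps(200, P, rng), P, 1.0) for _ in range(1000)]
    n = len(draws)
    print("Var(W_T1(1)) ~", sum(a * a for a, _ in draws) / n)
    print("Var(W_T2(1)) ~", sum(b * b for _, b in draws) / n)
    print("Cov(W_T1(1), W_T2(1)) ~", sum(a * b for a, b in draws) / n)
```

With Gaussian draws, $\mathbf{w}_T(1)$ is exactly $N(\mathbf{0}, \mathbf{I}_2)$ for every $T$; Theorem 1 says the same limit obtains for any i.i.d. zero-mean errors with covariance $\boldsymbol{\Omega}$, and for the whole path $\mathbf{w}_T(\cdot)$, not just its endpoint.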
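For example, we may apply the FCLT to serially dependent vector processes using a generalization of (70) and Theorem 12 of Chapter 21.

Theorem 2 (FCLT when $\mathbf{u}_t$ is a vector MA($\infty$) process): Let $\mathbf{u}_t = \sum_{s=0}^{\infty} \boldsymbol{\Psi}_s \boldsymbol{\varepsilon}_{t-s}$, where $\boldsymbol{\varepsilon}_t$ is a $k$-dimensional i.i.d. random vector with variance-covariance matrix $\boldsymbol{\Omega}$ and, if $\psi_{ij}^{(s)}$ denotes the row $i$, column $j$ element of $\boldsymbol{\Psi}_s$,
$$\sum_{s=0}^{\infty} s\,|\psi_{ij}^{(s)}| < \infty \qquad \text{for each } i, j = 1, 2, \ldots, k.$$
Then $\mathbf{w}_T(\cdot) \Longrightarrow \mathbf{w}(\cdot)$, where (assuming $\boldsymbol{\Psi}(1)$ is nonsingular)
$$\mathbf{w}_T(r) \equiv \boldsymbol{\Omega}^{-1/2}\boldsymbol{\Psi}(1)^{-1} T^{-1/2} \sum_{t=1}^{[Tr]} \mathbf{u}_t.$$

Proof: Use the multivariate Beveridge-Nelson decomposition $\sum_{t=1}^{T}\mathbf{u}_t = \boldsymbol{\Psi}(1)\sum_{t=1}^{T}\boldsymbol{\varepsilon}_t + \boldsymbol{\eta}_T - \boldsymbol{\eta}_0$, with $\boldsymbol{\eta}_t$ stationary, to derive the long-run variance matrix of $\mathbf{u}_t$,
$$\lim_{T\to\infty} T^{-1}E\Big[\Big(\sum_{t=1}^{T}\mathbf{u}_t\Big)\Big(\sum_{t=1}^{T}\mathbf{u}_t\Big)'\Big] = \boldsymbol{\Psi}(1)\boldsymbol{\Omega}\boldsymbol{\Psi}(1)',$$
and apply Theorem 1 to the term in $\boldsymbol{\varepsilon}_t$.

2 Vector Autoregression Containing Unit Roots

Let $\mathbf{y}_t$ be a $(k \times 1)$ vector autoregressive process of order $p$ (VAR($p$)), i.e.
$$[\mathbf{I}_k - \boldsymbol{\Phi}_1 L - \boldsymbol{\Phi}_2 L^2 - \cdots - \boldsymbol{\Phi}_p L^p]\mathbf{y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t. \qquad (1)$$
The scalar algebra in (33) of Chapter 21 works perfectly well for matrices, establishing that for any values of $\boldsymbol{\Phi}_1, \boldsymbol{\Phi}_2, \ldots, \boldsymbol{\Phi}_p$, the following polynomials are equivalent:
$$[\mathbf{I}_k - \boldsymbol{\Phi}_1 L - \boldsymbol{\Phi}_2 L^2 - \cdots - \boldsymbol{\Phi}_p L^p] = (\mathbf{I}_k - \boldsymbol{\rho} L) - (\boldsymbol{\zeta}_1 L + \boldsymbol{\zeta}_2 L^2 + \cdots + \boldsymbol{\zeta}_{p-1} L^{p-1})(1 - L),$$
where
$$\boldsymbol{\rho} \equiv \boldsymbol{\Phi}_1 + \boldsymbol{\Phi}_2 + \cdots + \boldsymbol{\Phi}_p, \qquad (2)$$
$$\boldsymbol{\zeta}_s \equiv -[\boldsymbol{\Phi}_{s+1} + \boldsymbol{\Phi}_{s+2} + \cdots + \boldsymbol{\Phi}_p] \quad \text{for } s = 1, 2, \ldots, p-1. \qquad (3)$$
It follows that any VAR($p$) process (1) can always be written in the form
$$(\mathbf{I}_k - \boldsymbol{\rho} L)\mathbf{y}_t - (\boldsymbol{\zeta}_1 L + \boldsymbol{\zeta}_2 L^2 + \cdots + \boldsymbol{\zeta}_{p-1} L^{p-1})(1 - L)\mathbf{y}_t = \mathbf{c} + \boldsymbol{\varepsilon}_t$$
or
$$\mathbf{y}_t = \boldsymbol{\zeta}_1 \Delta\mathbf{y}_{t-1} + \boldsymbol{\zeta}_2 \Delta\mathbf{y}_{t-2} + \cdots + \boldsymbol{\zeta}_{p-1} \Delta\mathbf{y}_{t-p+1} + \mathbf{c} + \boldsymbol{\rho}\mathbf{y}_{t-1} + \boldsymbol{\varepsilon}_t. \qquad (4)$$
There are two meanings of the statement that a VAR process contains unit roots. First, the first difference of $\mathbf{y}_t$ may follow a VAR($p-1$) process:
$$\Delta\mathbf{y}_t = \boldsymbol{\zeta}_1 \Delta\mathbf{y}_{t-1} + \boldsymbol{\zeta}_2 \Delta\mathbf{y}_{t-2} + \cdots + \boldsymbol{\zeta}_{p-1} \Delta\mathbf{y}_{t-p+1} + \mathbf{c} + \boldsymbol{\varepsilon}_t,$$
requiring from (4) that $\boldsymbol{\rho} = \mathbf{I}_k$, or from (2) that
$$\boldsymbol{\Phi}_1 + \boldsymbol{\Phi}_2 + \cdots + \boldsymbol{\Phi}_p = \mathbf{I}_k. \qquad (5)$$
Second, recall from (8) of Chapter 18 that a VAR($p$) such as (1) is said to contain at least one unit root ($z = 1$) if the following determinant is zero:
$$|\mathbf{I}_k - \boldsymbol{\Phi}_1 - \boldsymbol{\Phi}_2 - \cdots - \boldsymbol{\Phi}_p| = 0. \qquad (6)$$
Note that (5) implies (6) but (6) does not imply (5). Vector autoregressions for which (6) holds but (5) does not will be considered in Chapter 23.

3 Spurious Regression

3.1 Asymptotics for Spurious Regression

Consider a regression of the form
$$y_t = \mathbf{x}_t'\boldsymbol{\beta} + u_t, \qquad (7)$$
for which elements of $y_t$ and $\mathbf{x}_t$ might be nonstationary.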
If there does not exist some population value of $\boldsymbol{\beta}$ for which the disturbance $u_t = y_t - \mathbf{x}_t'\boldsymbol{\beta}$ is I(0), then OLS is quite likely to produce spurious results. In the extreme case where $y_t$ and $\mathbf{x}_t$ are independent random walks, as we shall see, the OLS estimator $\hat{\boldsymbol{\beta}}_T$ is not consistent for $\boldsymbol{\beta} = \mathbf{0}$ but instead converges in distribution to a particular random variable. Because there is truly no relation between $y_t$ and $\mathbf{x}_t$, and because $\hat{\boldsymbol{\beta}}_T$ is incapable of revealing this, we call this a case of "spurious regression." This phenomenon was first considered by Yule (1926); the dangers of spurious regression were forcefully brought to the attention of economists by the Monte Carlo studies of Granger and Newbold (1974), and later explained theoretically by Phillips (1986).

Theorem 3 (Spurious Regression, two independent random walks): Let $X_t$ and $Y_t$ be independent random walks, $X_t = X_{t-1} + \eta_t$ and $Y_t = Y_{t-1} + \zeta_t$, where $\eta_t$ is independent of $\zeta_t$. Consider the regression equation for $Y_t$ in terms of $X_t$, formally $Y_t = \beta X_t + u_t$, where $\beta = 0$ and $u_t = Y_t$, reflecting the lack of any relation between $Y_t$ and $X_t$. Then the OLS estimator of $\beta$ satisfies
$$\hat{\beta}_T \xrightarrow{L} (\sigma_2/\sigma_1)\left[\int_0^1 [W_1(r)]^2\,dr\right]^{-1}\int_0^1 W_1(r)W_2(r)\,dr,$$
where $\sigma_1^2 = E(\eta_t^2)$ and $\sigma_2^2 = E(\zeta_t^2)$.

Proof: To proceed, we write
$$W_{1T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\eta_s/\sigma_1 = T^{-1/2}X_{t-1}/\sigma_1, \qquad W_{2T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\zeta_s/\sigma_2 = T^{-1/2}Y_{t-1}/\sigma_2,$$
or
$$T^{-1/2}X_{t-1} = \sigma_1 W_{1T}(r_{t-1}) \qquad (8)$$
and
$$T^{-1/2}Y_{t-1} = \sigma_2 W_{2T}(r_{t-1}), \qquad (9)$$
where $\sigma_1^2 \equiv \lim_{T\to\infty}\mathrm{Var}(T^{-1/2}\sum_{t=1}^{T}\eta_t)$, $\sigma_2^2 \equiv \lim_{T\to\infty}\mathrm{Var}(T^{-1/2}\sum_{t=1}^{T}\zeta_t)$, and $r_{t-1} = (t-1)/T$ as before. From Donsker's theorem and the continuous mapping theorem we have $T^{-2}\sum_{t=1}^{T}X_{t-1}^2 \Rightarrow \sigma_1^2\int_0^1 [W_1(r)]^2\,dr$ and also $T^{-2}\sum_{t=1}^{T}Y_{t-1}^2 \Rightarrow \sigma_2^2\int_0^1 [W_2(r)]^2\,dr$.
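The multivariate version of Donsker's theorem states that
$$\begin{bmatrix}\sigma_1^2 & 0 \\ 0 & \sigma_2^2\end{bmatrix}^{-1/2} T^{-1/2}\sum_{t=1}^{[Tr]}\begin{bmatrix}\eta_t \\ \zeta_t\end{bmatrix} \Rightarrow \begin{bmatrix}W_1(r) \\ W_2(r)\end{bmatrix}
\qquad\text{or}\qquad
\begin{bmatrix}T^{-1/2}X_{[Tr]} \\ T^{-1/2}Y_{[Tr]}\end{bmatrix} \Rightarrow \begin{bmatrix}\sigma_1 W_1(r) \\ \sigma_2 W_2(r)\end{bmatrix}.$$
From (8) and (9) we have
$$T^{-1}\cdot T^{-1}\sum_{t=1}^{T}X_{t-1}Y_{t-1} = T^{-1}\sum_{t=1}^{T}\sigma_1 W_{1T}(r_{t-1})\,\sigma_2 W_{2T}(r_{t-1}) = \sigma_1\sigma_2\, T^{-1}\sum_{t=1}^{T}W_{1T}(r_{t-1})W_{2T}(r_{t-1})$$
$$= \sigma_1\sigma_2\sum_{t=1}^{T}\int_{(t-1)/T}^{t/T}W_{1T}(r)W_{2T}(r)\,dr = \sigma_1\sigma_2\int_0^1 W_{1T}(r)W_{2T}(r)\,dr \Rightarrow \sigma_1\sigma_2\int_0^1 W_1(r)W_2(r)\,dr,$$
where we have used the fact that $W_{1T}(r)$ and $W_{2T}(r)$ are constant for $(t-1)/T \le r < t/T$, and applied the continuous mapping theorem to the mapping
$$(x, y) \mapsto \int_0^1 x(a)y(a)\,da.$$
Hence, for convenience treating $\hat{\beta}_{T-1}$ instead of $\hat{\beta}_T$, we have
$$\hat{\beta}_{T-1} = \left(T^{-2}\sum_{t=1}^{T}X_{t-1}^2\right)^{-1}\left(T^{-2}\sum_{t=1}^{T}X_{t-1}Y_{t-1}\right) \Rightarrow \left(\sigma_1^2\int_0^1 [W_1(r)]^2\,dr\right)^{-1}\sigma_1\sigma_2\int_0^1 W_1(r)W_2(r)\,dr$$
$$= (\sigma_2/\sigma_1)\left(\int_0^1 [W_1(r)]^2\,dr\right)^{-1}\int_0^1 W_1(r)W_2(r)\,dr. \qquad (10)$$
Q.E.D.

The spurious regression problem becomes clear upon inspection of (10). The true value of the derivative of $Y_t$ with respect to $X_t$ is zero because the errors generating the $X_t$ and $Y_t$ series are independent. Yet $\hat{\beta}_T$ fails to converge in probability to zero and instead has a non-degenerate limiting distribution. Using similar techniques, Phillips (1986) shows that $T^{-1/2}t_{\hat{\beta}_T}$ has a non-degenerate limiting distribution, or in other words that the $t$-statistic for $\hat{\beta}_T$ diverges. Hence as $T \to \infty$, the probability of a significant $t$-value arising in a regression such as (7) approaches one, leading to spurious inference about the existence of a relationship between $X_t$ and $Y_t$.

The spurious regression problem arises not only from independent random walks; it appears even among general I(1) processes that are not cointegrated.

Theorem 4 (Spurious Regression, non-cointegrated I(1) processes, Hamilton's parametric method): Consider a $(k \times 1)$ vector $\mathbf{y}_t$ whose first difference is described by
$$(1 - L)\mathbf{y}_t = \boldsymbol{\Psi}(L)\boldsymbol{\varepsilon}_t = \sum_{s=0}^{\infty}\boldsymbol{\Psi}_s\boldsymbol{\varepsilon}_{t-s},$$
for $\boldsymbol{\varepsilon}_t$ an i.i.d. vector with mean zero, variance $E(\boldsymbol{\varepsilon}_t\boldsymbol{\varepsilon}_t') = \boldsymbol{\Omega} = \mathbf{P}\mathbf{P}'$, and finite fourth moments, and where $\{s \cdot \boldsymbol{\Psi}_s\}_{s=0}^{\infty}$ is absolutely summable. Let $g = (k - 1)$ and $\boldsymbol{\Lambda} = \boldsymbol{\Psi}(1)\mathbf{P}$.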
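Partition $\mathbf{y}_t$ as $\mathbf{y}_t = (Y_{1t}, \mathbf{y}_{2t}')'$, and partition $\boldsymbol{\Lambda}\boldsymbol{\Lambda}'$ as
$$\boldsymbol{\Lambda}\boldsymbol{\Lambda}' = \begin{bmatrix}\Sigma_{11} & \boldsymbol{\Sigma}_{21}' \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix},$$
where $\Sigma_{11}$ is $(1 \times 1)$ and $\boldsymbol{\Sigma}_{22}$ is $(g \times g)$. Suppose that $\boldsymbol{\Lambda}\boldsymbol{\Lambda}'$ is nonsingular, and define
$$(\sigma_1^*)^2 \equiv (\Sigma_{11} - \boldsymbol{\Sigma}_{21}'\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}).$$
Let $\mathbf{L}_{22}$ denote the Cholesky factor of $\boldsymbol{\Sigma}_{22}^{-1}$, that is, the lower triangular matrix satisfying $\boldsymbol{\Sigma}_{22}^{-1} = \mathbf{L}_{22}\mathbf{L}_{22}'$, and consider the consequences of an OLS regression of the first variable on the others and a constant,
$$Y_{1t} = \hat{\alpha}_T + \mathbf{y}_{2t}'\hat{\boldsymbol{\beta}}_T + \hat{u}_t, \qquad (11)$$
and any null hypothesis of the form $H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{q}$, where $\mathbf{R}$ is a known $(r \times g)$ matrix representing $r$ separate hypotheses involving $\boldsymbol{\beta}$ and $\mathbf{q}$ is a known $(r \times 1)$ vector. Then the following hold.

(a). The OLS estimates $\hat{\alpha}_T$ and $\hat{\boldsymbol{\beta}}_T$ are characterized by
$$\begin{bmatrix}T^{-1/2}\hat{\alpha}_T \\ \hat{\boldsymbol{\beta}}_T - \boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\end{bmatrix} \xrightarrow{L} \begin{bmatrix}\sigma_1^* h_1 \\ \sigma_1^*\mathbf{L}_{22}\mathbf{h}_2\end{bmatrix},$$
where
$$\begin{bmatrix}h_1 \\ \mathbf{h}_2\end{bmatrix} \equiv \begin{bmatrix}1 & \int_0^1 [\mathbf{w}_2(r)]'\,dr \\ \int_0^1 \mathbf{w}_2(r)\,dr & \int_0^1 [\mathbf{w}_2(r)][\mathbf{w}_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr \\ \int_0^1 \mathbf{w}_2(r)W_1(r)\,dr\end{bmatrix},$$
$W_1(r)$ denotes scalar standard Brownian motion, and $\mathbf{w}_2(r)$ denotes $g$-dimensional standard Brownian motion, with $\mathbf{w}_2(r)$ independent of $W_1(r)$.

(b). The sum of squared errors $SSE_T$ from the OLS estimation of (11) satisfies
$$T^{-2}\,SSE_T \xrightarrow{L} (\sigma_1^*)^2 \cdot H,$$
where
$$H \equiv \int_0^1 [W_1(r)]^2\,dr - \left\{\begin{bmatrix}\int_0^1 W_1(r)\,dr & \int_0^1 [W_1(r)][\mathbf{w}_2(r)]'\,dr\end{bmatrix}\begin{bmatrix}1 & \int_0^1 [\mathbf{w}_2(r)]'\,dr \\ \int_0^1 \mathbf{w}_2(r)\,dr & \int_0^1 [\mathbf{w}_2(r)][\mathbf{w}_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr \\ \int_0^1 \mathbf{w}_2(r)W_1(r)\,dr\end{bmatrix}\right\}.$$

(c). The OLS $F$ test satisfies
$$T^{-1}F_T \xrightarrow{L} (\sigma_1^*\mathbf{R}^*\mathbf{h}_2 - \mathbf{q}^*)'\left\{(\sigma_1^*)^2 H\begin{bmatrix}\mathbf{0} & \mathbf{R}^*\end{bmatrix}\begin{bmatrix}1 & \int_0^1 [\mathbf{w}_2(r)]'\,dr \\ \int_0^1 \mathbf{w}_2(r)\,dr & \int_0^1 [\mathbf{w}_2(r)][\mathbf{w}_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{0}' \\ \mathbf{R}^{*\prime}\end{bmatrix}\right\}^{-1}(\sigma_1^*\mathbf{R}^*\mathbf{h}_2 - \mathbf{q}^*) \div r,$$
where
$$\mathbf{R}^* \equiv \mathbf{R}\mathbf{L}_{22}, \qquad \mathbf{q}^* \equiv \mathbf{q} - \mathbf{R}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

Result (a) implies that neither estimator is consistent. The estimator of the constant, $\hat{\alpha}_T$, actually diverges and must be divided by $T^{1/2}$ to obtain a random variable with a well-specified distribution; $\hat{\alpha}_T$ itself is likely to get farther and farther from the true value of zero as the sample size $T$ increases. Things do not get better when we look at $\hat{\boldsymbol{\beta}}_T$: different arbitrarily large samples will have randomly differing estimates $\hat{\boldsymbol{\beta}}_T$. The usual situation, in which the estimation error converges in probability to zero and must be multiplied by some increasing function of $T$ to obtain a nondegenerate asymptotic distribution, does not occur here.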
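Result (a) can be illustrated by simulation. The Python below is a minimal sketch under simplifying assumptions that are not part of the theorem's generality: $k = 2$, and $Y_{1t}$ and $Y_{2t}$ are independent Gaussian random walks with unit-variance increments (so $\boldsymbol{\Psi}(L) = \mathbf{I}_2$ and $\boldsymbol{\Sigma}_{21} = 0$). The function names are hypothetical. If Result (a) holds, the spread of $\hat{\beta}_T$ across replications should stay roughly the same when $T$ is multiplied by four, while the spread of $\hat{\alpha}_T$ should roughly double.

```python
import math
import random
import statistics


def random_walk(T, rng):
    """Gaussian random walk y_t = y_{t-1} + e_t, t = 1..T, with y_0 = 0 and e_t ~ N(0, 1)."""
    y, path = 0.0, []
    for _ in range(T):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def ols_with_constant(y, x):
    """OLS of y on a constant and a single regressor x; returns (alpha_hat, beta_hat)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return ybar - beta * xbar, beta


def spreads(T, reps, rng):
    """Sample standard deviations of alpha_hat and beta_hat over independent replications."""
    alphas, betas = [], []
    for _ in range(reps):
        y1, y2 = random_walk(T, rng), random_walk(T, rng)  # independent: no true relation
        a, b = ols_with_constant(y1, y2)
        alphas.append(a)
        betas.append(b)
    return statistics.stdev(alphas), statistics.stdev(betas)


if __name__ == "__main__":
    rng = random.Random(42)
    for T in (100, 400):
        sd_a, sd_b = spreads(T, 400, rng)
        # beta_hat: spread roughly unchanged; alpha_hat: spread grows like T^{1/2}.
        print(f"T={T}: sd(alpha_hat)={sd_a:.2f}, "
              f"sd(alpha_hat)/sqrt(T)={sd_a / math.sqrt(T):.3f}, sd(beta_hat)={sd_b:.3f}")
```

Dividing the spread of $\hat{\alpha}_T$ by $\sqrt{T}$ should give roughly the same number at both sample sizes, which is the $T^{-1/2}\hat{\alpha}_T$ scaling in Result (a).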
Result (b) implies that the usual OLS estimator of the variance of $u_t$,
$$s_T^2 = (T - k)^{-1}SSE_T,$$
again diverges as $T \to \infty$. To obtain an estimator that does not grow with the sample size, the sum of squared errors has to be divided by $T^2$ rather than $T$. In this respect, the residuals $\hat{u}_t$ from a spurious regression behave like a unit root process: if $\xi_t$ is a scalar I(1) series, then $T^{-1}\sum \xi_t^2$ diverges and $T^{-2}\sum \xi_t^2$ converges.

Result (c) means that any OLS $t$ or $F$ test based on the spurious regression also diverges; the OLS $F$ statistic must be divided by $T$ to obtain a variable that does not grow with the sample size. Since an $F$ test of a single restriction is the square of the corresponding $t$ test, any $t$ statistic would have to be divided by $T^{1/2}$ to obtain a convergent variable. Thus, as the sample size becomes large, it becomes increasingly likely that the absolute value of an OLS $t$ test will exceed any arbitrary finite value (such as the usual critical value of $t = 2$). For example, in the regression (11), it will appear that $Y_{1t}$ and $\mathbf{y}_{2t}$ are significantly related whereas in reality they are completely independent.

Should we be totally pessimistic about regressions among unit root processes in light of the above results? There is, in fact, one case of major importance where the correlation properties of $Y_{1t}$ and $\mathbf{y}_{2t}$ do interfere with these qualitative results. The conditions in this theorem require that $\boldsymbol{\Lambda}\boldsymbol{\Lambda}'$ is nonsingular. Since $\mathrm{rank}(\boldsymbol{\Lambda}\boldsymbol{\Lambda}') = \mathrm{rank}(\boldsymbol{\Lambda})$, $\boldsymbol{\Lambda} = \boldsymbol{\Psi}(1)\mathbf{P}$, and $\mathbf{P}$ is nonsingular, we require that $\boldsymbol{\Psi}(1)$ is nonsingular, i.e., that the determinant $|\boldsymbol{\Psi}(1)| \neq 0$. If we allow $\boldsymbol{\Psi}(1)$ to be singular, the asymptotic theory of this theorem no longer holds as stated. The condition that $\boldsymbol{\Psi}(1)$ is singular is a necessary condition for $Y_{1t}$ and $\mathbf{y}_{2t}$ to be cointegrated in the sense of Engle and Granger (1987). See Chapter 23 for details.
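Result (c) is the asymptotic form of the Granger and Newbold (1974) finding. The minimal Python sketch below again assumes two independent Gaussian random walks and uses hypothetical function names; it computes the share of replications in which the conventional OLS $|t|$ for the slope exceeds 2, and the spread of $T^{-1/2}t$. If Result (c) holds, the share should climb toward one as $T$ grows, while the spread of $T^{-1/2}t$ should stay roughly stable.

```python
import math
import random
import statistics


def random_walk(T, rng):
    """Gaussian random walk with y_0 = 0 and N(0, 1) increments; returns y_1..y_T."""
    y, path = 0.0, []
    for _ in range(T):
        y += rng.gauss(0.0, 1.0)
        path.append(y)
    return path


def t_stat_slope(y, x):
    """Conventional OLS t statistic for the slope in a regression of y on a constant and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)  # usual s_T^2 = SSE / (T - k) with k = 2 estimated coefficients
    return beta / math.sqrt(s2 / sxx)


def simulate(T, reps, rng):
    """Share of |t| > 2 and standard deviation of T^{-1/2} t across replications."""
    ts = [t_stat_slope(random_walk(T, rng), random_walk(T, rng)) for _ in range(reps)]
    share = sum(1 for t in ts if abs(t) > 2.0) / reps
    return share, statistics.stdev([t / math.sqrt(T) for t in ts])


if __name__ == "__main__":
    rng = random.Random(2024)
    for T in (50, 200, 800):
        share, sd_scaled = simulate(T, 300, rng)
        print(f"T={T}: P(|t| > 2) ~ {share:.2f}, sd(t / sqrt(T)) ~ {sd_scaled:.2f}")
```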
3.2 Cures for Spurious Regression

Many researchers recommend routinely differencing apparently nonstationary variables before estimating regressions (for example, Gordon (1984)):
$$\Delta Y_{1t} = a + \Delta\mathbf{y}_{2t}'\mathbf{b} + v_t,$$
which is believed to avoid the spurious regression problem as well as the nonstandard distributions for certain hypotheses associated with the levels regression (11). While this is the ideal cure for the problem discussed in this section, there are two different situations in which it might be inappropriate.

First, if economic theory specifies a linear relation between $Y_{1t}$ and $\mathbf{y}_{2t}$ in levels as in (11), then the parameters have their own economic interpretation; for example, $\partial C_t/\partial Y_t = \beta$ is the marginal propensity to consume, which must be positive under normal conditions. In a regression in differenced data, however, the parameters have a different economic meaning, e.g. $\partial\Delta C_t/\partial\Delta Y_t = b$, which may be positive or negative even though $\partial C_t/\partial Y_t = \beta$ must be positive. Thus, differencing the data before regression avoids the econometric problem but incurs an additional problem of economic interpretation.

Second, if both $Y_{1t}$ and $\mathbf{y}_{2t}$ are I(1) processes, there is an interesting class of models for which the dynamic relation between $Y_{1t}$ and $\mathbf{y}_{2t}$ will be misspecified if the researcher simply differences both $Y_{1t}$ and $\mathbf{y}_{2t}$. This class of models, known as cointegrated processes, is discussed in the following chapters.
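The econometric side of the differencing cure can be contrasted with the levels regression (11) in a minimal Python sketch. It assumes two independent Gaussian random walks with $Y_0 = 0$ and uses hypothetical function names; it does not address the interpretation or cointegration caveats above. Because $\Delta Y_{1t}$ and $\Delta Y_{2t}$ are then independent i.i.d. normal series, the conventional $|t| > 2$ rule should reject in roughly 5% of replications for the differenced regression, while the levels regression should reject far more often.

```python
import math
import random


def random_walk_with_steps(T, rng):
    """Levels y_1..y_T and increments dy_1..dy_T of a Gaussian random walk with y_0 = 0."""
    steps = [rng.gauss(0.0, 1.0) for _ in range(T)]
    levels, y = [], 0.0
    for d in steps:
        y += d
        levels.append(y)
    return levels, steps


def t_stat_slope(y, x):
    """Conventional OLS t statistic for the slope in a regression of y on a constant and x."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sse = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta / math.sqrt(sse / (n - 2) / sxx)


def rejection_rates(T, reps, rng):
    """Shares of |t| > 2 in the levels regression and in the first-differenced regression."""
    rej_levels = rej_diff = 0
    for _ in range(reps):
        y1, dy1 = random_walk_with_steps(T, rng)
        y2, dy2 = random_walk_with_steps(T, rng)  # independent of y1
        if abs(t_stat_slope(y1, y2)) > 2.0:
            rej_levels += 1
        if abs(t_stat_slope(dy1, dy2)) > 2.0:
            rej_diff += 1
    return rej_levels / reps, rej_diff / reps


if __name__ == "__main__":
    rng = random.Random(5)
    for T in (100, 400):
        lev, dif = rejection_rates(T, 400, rng)
        print(f"T={T}: levels P(|t| > 2) ~ {lev:.2f}, differences P(|t| > 2) ~ {dif:.2f}")
```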