Ch. 18 Vector Time Series

1 Introduction

In dealing with economic variables, the value of one variable is often related not only to its own predecessors in time but also to past values of other variables. This naturally extends the concept of a univariate stochastic process to vector time series analysis. This chapter describes the dynamic interactions among a set of variables collected in a $(k \times 1)$ vector $y_t$.

Definition: Let $(S, \mathcal{F}, P)$ be a probability space and $T$ an index set of real numbers, and define the $k$-dimensional vector function $y(\cdot,\cdot)$ by $y(\cdot,\cdot): S \times T \to \mathbb{R}^k$. The ordered sequence of random vectors $\{y(\cdot,t),\ t \in T\}$ is called a vector stochastic process.

1.1 First Two Moments of a Stationary Vector Time Series

From now on in this chapter we follow the convention of writing $y_t$ instead of $y(\cdot,t)$ to indicate that we are considering a discrete vector time series. The first two moments of a vector time series $y_t$ are
$$
E(y_t) = \mu_t \quad \text{and} \quad \Gamma_{t,j} = E[(y_t - \mu_t)(y_{t-j} - \mu_{t-j})'] \quad \text{for all } t \in T.
$$
If neither $\mu_t$ nor $\Gamma_{t,j}$ is a function of $t$, that is, $\mu_t = \mu$ and $\Gamma_{t,j} = \Gamma_j$, then we say that $y_t$ is a covariance-stationary vector process. Note that although $\gamma_j = \gamma_{-j}$ for a scalar stationary process, the same is not true of a vector process:
$$
\Gamma_j \neq \Gamma_{-j}.
$$
Instead, the correct relation is $\Gamma_j' = \Gamma_{-j}$, since
$$
\Gamma_j = E[(y_{t+j} - \mu)(y_{(t+j)-j} - \mu)'] = E[(y_{t+j} - \mu)(y_t - \mu)'],
$$
and taking transposes,
$$
\Gamma_j' = E[(y_t - \mu)(y_{t+j} - \mu)'] = E[(y_t - \mu)(y_{t-(-j)} - \mu)'] = \Gamma_{-j}.
$$

1.2 Vector White Noise Process

Definition: A $(k \times 1)$ vector process $\{\varepsilon_t,\ t \in T\}$ is said to be a white-noise process if

(i) $E(\varepsilon_t) = 0$;

(ii) $E(\varepsilon_t \varepsilon_\tau') = \Omega$ if $t = \tau$ and $0$ if $t \neq \tau$,

where $\Omega$ is a $(k \times k)$ symmetric positive definite matrix. It is important to note that in general $\Omega$ is not necessarily a diagonal matrix; it is precisely the contemporaneous correlation among the variables that calls for vector time series analysis.

1.3 Vector MA(q) Process

A vector moving average process of order $q$ takes the form
$$
y_t = \mu + \varepsilon_t + \Theta_1 \varepsilon_{t-1} + \Theta_2 \varepsilon_{t-2} + \cdots + \Theta_q \varepsilon_{t-q},
$$
where $\varepsilon_t$ is a vector white noise process and $\Theta_j$ denotes a $(k \times k)$ matrix of MA coefficients for $j = 1,2,\ldots,q$. The mean of $y_t$ is $\mu$, and the variance is
$$
\Gamma_0 = E[(y_t - \mu)(y_t - \mu)'] = E[\varepsilon_t \varepsilon_t'] + \Theta_1 E[\varepsilon_{t-1}\varepsilon_{t-1}']\Theta_1' + \Theta_2 E[\varepsilon_{t-2}\varepsilon_{t-2}']\Theta_2' + \cdots + \Theta_q E[\varepsilon_{t-q}\varepsilon_{t-q}']\Theta_q'
$$
$$
= \Omega + \Theta_1 \Omega \Theta_1' + \Theta_2 \Omega \Theta_2' + \cdots + \Theta_q \Omega \Theta_q',
$$
with autocovariances (compare with $\gamma_j$ of Ch. 14, p. 3)
$$
\Gamma_j = \begin{cases}
\Theta_j \Omega + \Theta_{j+1}\Omega\Theta_1' + \Theta_{j+2}\Omega\Theta_2' + \cdots + \Theta_q \Omega \Theta_{q-j}' & \text{for } j = 1,2,\ldots,q, \\
\Omega\Theta_{-j}' + \Theta_1 \Omega \Theta_{-j+1}' + \Theta_2 \Omega \Theta_{-j+2}' + \cdots + \Theta_{q+j}\Omega\Theta_q' & \text{for } j = -1,-2,\ldots,-q, \\
0 & \text{for } |j| > q,
\end{cases}
$$
where $\Theta_0 = I_k$. Thus any vector MA(q) process is covariance-stationary.
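Since the MA($q$) autocovariances above are finite sums, they are easy to verify numerically. The following is a minimal numpy sketch; the bivariate $\Theta_1$ and $\Omega$ below are hypothetical values chosen only for illustration. It evaluates $\Gamma_0 = \Omega + \Theta_1\Omega\Theta_1'$ and $\Gamma_1 = \Theta_1\Omega$ for a vector MA(1) and compares them with the sample moments of a long simulated path.

```python
import numpy as np

# Hypothetical bivariate MA(1): y_t = eps_t + Theta1 eps_{t-1},
# with a non-diagonal innovation covariance Omega.
Theta1 = np.array([[0.5, 0.2],
                   [0.1, 0.4]])
Omega = np.array([[1.0, 0.3],
                  [0.3, 0.8]])

# Population moments from the MA(q) formulas with q = 1:
#   Gamma_0 = Omega + Theta1 Omega Theta1',  Gamma_1 = Theta1 Omega,  Gamma_{-1} = Gamma_1'.
Gamma0 = Omega + Theta1 @ Omega @ Theta1.T
Gamma1 = Theta1 @ Omega

# Monte Carlo check with a long simulated sample.
rng = np.random.default_rng(0)
T = 200_000
eps = rng.multivariate_normal(np.zeros(2), Omega, size=T + 1)
y = eps[1:] + eps[:-1] @ Theta1.T            # rows are y_t = eps_t + Theta1 eps_{t-1}

ybar = y.mean(axis=0)
G0_hat = (y - ybar).T @ (y - ybar) / T                        # sample Gamma_0
G1_hat = (y[1:] - ybar).T @ (y[:-1] - ybar) / (T - 1)         # sample E(y_t - mu)(y_{t-1} - mu)'

print(np.round(Gamma0, 3), np.round(G0_hat, 3), sep="\n")
print(np.round(Gamma1, 3), np.round(G1_hat, 3), sep="\n")
```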
1.4 Vector MA(∞) Process

The vector MA($\infty$) process is written
$$
y_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \cdots,
$$
where $\varepsilon_t$ is a vector white noise process and $\Psi_j$ denotes a $(k \times k)$ matrix of MA coefficients.

Definition: For an $(n \times m)$ matrix $H$, the sequence of matrices $\{H_s\}_{s=-\infty}^{\infty}$ is absolutely summable if each of its elements forms an absolutely summable scalar sequence.

Example: If $\psi_{ij}^{(s)}$ denotes the row $i$, column $j$ element of the moving average parameter matrix $\Psi_s$ associated with lag $s$, then the sequence $\{\Psi_s\}_{s=0}^{\infty}$ is absolutely summable if
$$
\sum_{s=0}^{\infty} \left|\psi_{ij}^{(s)}\right| < \infty \quad \text{for } i = 1,2,\ldots,k \text{ and } j = 1,2,\ldots,k.
$$

Theorem: Let
$$
y_t = \mu + \varepsilon_t + \Psi_1 \varepsilon_{t-1} + \Psi_2 \varepsilon_{t-2} + \cdots,
$$
where $\varepsilon_t$ is a vector white noise process and $\{\Psi_l\}_{l=0}^{\infty}$ is absolutely summable. Let $y_{it}$ denote the $i$th element of $y_t$, and let $\mu_i$ denote the $i$th element of $\mu$. Then

(a) the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it} - \mu_i)(y_{j,t-s} - \mu_j)$, exists and is given by the row $i$, column $j$ element of
$$
\Gamma_s = \sum_{v=0}^{\infty} \Psi_{s+v}\,\Omega\,\Psi_v' \quad \text{for } s = 0,1,2,\ldots;
$$

(b) the sequence of matrices $\{\Gamma_s\}_{s=0}^{\infty}$ is absolutely summable.

Proof:
(a) By definition $\Gamma_s = E(y_t - \mu)(y_{t-s} - \mu)'$, or
$$
\Gamma_s = E\big\{[\varepsilon_t + \Psi_1\varepsilon_{t-1} + \cdots + \Psi_s\varepsilon_{t-s} + \Psi_{s+1}\varepsilon_{t-s-1} + \cdots]\,[\varepsilon_{t-s} + \Psi_1\varepsilon_{t-s-1} + \Psi_2\varepsilon_{t-s-2} + \cdots]'\big\}
$$
$$
= \Psi_s\Omega\Psi_0' + \Psi_{s+1}\Omega\Psi_1' + \Psi_{s+2}\Omega\Psi_2' + \cdots = \sum_{v=0}^{\infty}\Psi_{s+v}\,\Omega\,\Psi_v' \quad \text{for } s = 0,1,2,\ldots
$$
The row $i$, column $j$ element of $\Gamma_s$ is therefore the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it} - \mu_i)(y_{j,t-s} - \mu_j)$.
(b)

2 Vector Autoregressive Process, VAR

A $p$th-order vector autoregression, denoted VAR($p$), is written as
$$
y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t, \qquad (1)
$$
where $c$ denotes a $(k \times 1)$ vector of constants, $\Phi_j$ a $(k \times k)$ matrix of autoregressive coefficients for $j = 1,2,\ldots,p$, and $\varepsilon_t$ is a vector white noise process.

2.1 Population Characteristics

Let $c_i$ denote the $i$th element of the vector $c$ and let $\phi_{ij}^{(s)}$ denote the row $i$, column $j$ element of the matrix $\Phi_s$. Then the first row of the vector system in (1) specifies that
$$
y_{1t} = c_1 + \phi_{11}^{(1)}y_{1,t-1} + \phi_{12}^{(1)}y_{2,t-1} + \cdots + \phi_{1k}^{(1)}y_{k,t-1}
+ \phi_{11}^{(2)}y_{1,t-2} + \phi_{12}^{(2)}y_{2,t-2} + \cdots + \phi_{1k}^{(2)}y_{k,t-2}
+ \cdots + \phi_{11}^{(p)}y_{1,t-p} + \phi_{12}^{(p)}y_{2,t-p} + \cdots + \phi_{1k}^{(p)}y_{k,t-p} + \varepsilon_{1t}.
$$
Thus, a vector autoregression is a system in which each variable is regressed on a constant and $p$ of its own lags as well as on $p$ lags of each of the other $(k-1)$ variables in the VAR. Note that each regression has the same explanatory variables.

Using lag operator notation, (1) can be written in the form
$$
[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p]\,y_t = c + \varepsilon_t \quad \text{or} \quad \Phi(L)\,y_t = c + \varepsilon_t. \qquad (2)
$$
Here $\Phi(L)$ indicates a $(k \times k)$ matrix polynomial in the lag operator $L$. The row $i$, column $j$ element of $\Phi(L)$ is a scalar polynomial in $L$:
$$
\Phi(L)_{ij} = \big[\delta_{ij} - \phi_{ij}^{(1)}L^1 - \phi_{ij}^{(2)}L^2 - \cdots - \phi_{ij}^{(p)}L^p\big],
$$
where $\delta_{ij}$ is unity if $i = j$ and zero otherwise.

If the VAR($p$) process is stationary, we can take expectations of both sides of (1) to calculate the mean of the process:
$$
\mu = c + \Phi_1\mu + \Phi_2\mu + \cdots + \Phi_p\mu, \quad \text{or} \quad \mu = (I_k - \Phi_1 - \Phi_2 - \cdots - \Phi_p)^{-1}c.
$$
Equation (1) can then be written in terms of deviations from the mean as
$$
(y_t - \mu) = \Phi_1(y_{t-1} - \mu) + \Phi_2(y_{t-2} - \mu) + \cdots + \Phi_p(y_{t-p} - \mu) + \varepsilon_t. \qquad (3)
$$

2.1.1 Conditions for Stationarity

As in the case of the univariate AR($p$) process, it is helpful to rewrite (3) as a VAR(1) process. Toward this end, define
$$
\xi_t = \begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix} \quad (kp \times 1), \qquad (4)
$$
$$
F = \begin{bmatrix}
\Phi_1 & \Phi_2 & \Phi_3 & \cdots & \Phi_{p-1} & \Phi_p \\
I_k & 0 & 0 & \cdots & 0 & 0 \\
0 & I_k & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & I_k & 0
\end{bmatrix} \quad (kp \times kp), \qquad (5)
$$
and
$$
v_t = \begin{bmatrix} \varepsilon_t \\ 0 \\ \vdots \\ 0 \end{bmatrix} \quad (kp \times 1).
$$
The VAR($p$) in (3) can then be rewritten as the following VAR(1):
$$
\xi_t = F\xi_{t-1} + v_t, \qquad (6)
$$
which implies
$$
\xi_{t+s} = v_{t+s} + Fv_{t+s-1} + F^2v_{t+s-2} + \cdots + F^{s-1}v_{t+1} + F^s\xi_t, \qquad (7)
$$
where
$$
E(v_t v_\tau') = \begin{cases} Q & \text{for } t = \tau \\ 0 & \text{otherwise} \end{cases}
\qquad \text{and} \qquad
Q = \begin{bmatrix} \Omega & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.
$$
In order for the process to be covariance-stationary, the consequences of any given $\varepsilon_t$ must eventually die out. If the eigenvalues of $F$ all lie inside the unit circle, then the VAR turns out to be covariance-stationary.

Proposition: The eigenvalues $\lambda$ of the matrix $F$ in (5) satisfy
$$
\left| I_k\lambda^p - \Phi_1\lambda^{p-1} - \Phi_2\lambda^{p-2} - \cdots - \Phi_p \right| = 0. \qquad (8)
$$
Hence, a VAR($p$) is covariance-stationary as long as $|\lambda| < 1$ for all values of $\lambda$ satisfying (8). Equivalently, the VAR is stationary if all values of $z$ satisfying
$$
\left| I_k - \Phi_1 z - \Phi_2 z^2 - \cdots - \Phi_p z^p \right| = 0
$$
lie outside the unit circle.
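In practice the stationarity condition is easiest to verify by forming the companion matrix $F$ in (5) and inspecting its eigenvalues. A minimal numpy sketch, with hypothetical coefficient matrices used only for illustration:

```python
import numpy as np

def companion(Phi_list):
    """Stack the VAR(p) coefficient matrices [Phi_1, ..., Phi_p] into the
    (kp x kp) companion matrix F of equation (5)."""
    k = Phi_list[0].shape[0]
    p = len(Phi_list)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phi_list)          # first block row: Phi_1 ... Phi_p
    F[k:, :-k] = np.eye(k * (p - 1))        # identity blocks below the first block row
    return F

def is_stationary(Phi_list):
    """Covariance stationarity holds iff every eigenvalue of F lies strictly
    inside the unit circle."""
    eigvals = np.linalg.eigvals(companion(Phi_list))
    return bool(np.all(np.abs(eigvals) < 1)), eigvals

# Hypothetical bivariate VAR(2) coefficients (illustrative values only).
Phi1 = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
Phi2 = np.array([[0.1, 0.0],
                 [0.0, 0.2]])

ok, lam = is_stationary([Phi1, Phi2])
print("stationary:", ok)
print("eigenvalue moduli:", np.round(np.abs(lam), 3))
```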
2.1.2 Vector MA(∞) Representation

The first $k$ rows of the vector system represented in (7) constitute the system
$$
y_{t+s} = \mu + \varepsilon_{t+s} + \Psi_1\varepsilon_{t+s-1} + \Psi_2\varepsilon_{t+s-2} + \cdots + \Psi_{s-1}\varepsilon_{t+1}
+ F_{11}^{(s)}(y_t - \mu) + F_{12}^{(s)}(y_{t-1} - \mu) + \cdots + F_{1p}^{(s)}(y_{t-p+1} - \mu).
$$
Here $\Psi_j = F_{11}^{(j)}$, where $F_{11}^{(j)}$ denotes the upper-left $(k \times k)$ block of $F^j$, the matrix $F$ raised to the $j$th power.

If the eigenvalues of $F$ all lie inside the unit circle, then $F^s \to 0$ as $s \to \infty$ and $y_t$ can be expressed as a convergent sum of the history of $\varepsilon$:
$$
y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \Psi_3\varepsilon_{t-3} + \cdots = \mu + \Psi(L)\varepsilon_t. \qquad (9)
$$
The moving average matrices $\Psi_j$ can equivalently be calculated as follows. The operators $\Phi(L)$ $(= I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p)$ in (2) and $\Psi(L)$ in (9) are related by
$$
\Psi(L) = [\Phi(L)]^{-1},
$$
requiring that
$$
[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p]\,[I_k + \Psi_1 L + \Psi_2 L^2 + \cdots] = I_k.
$$
Setting the coefficient on $L^1$ equal to the zero matrix produces
$$
\Psi_1 - \Phi_1 = 0.
$$
Similarly, setting the coefficient on $L^2$ equal to zero gives
$$
\Psi_2 = \Phi_1\Psi_1 + \Phi_2,
$$
and in general for $L^s$,
$$
\Psi_s = \Phi_1\Psi_{s-1} + \Phi_2\Psi_{s-2} + \cdots + \Phi_p\Psi_{s-p} \quad \text{for } s = 1,2,\ldots, \qquad (10)
$$
with $\Psi_0 = I_k$ and $\Psi_s = 0$ for $s < 0$.

Note that the innovation in the MA($\infty$) representation (9) is $\varepsilon_t$, the fundamental innovation for $y_t$. There are alternative moving average representations based on vector white noise processes other than $\varepsilon_t$. Let $H$ denote a nonsingular $(k \times k)$ matrix, and define $u_t = H\varepsilon_t$. Then certainly $u_t$ is white noise: $E(u_t) = 0$ and
$$
E(u_t u_\tau') = \begin{cases} H\Omega H' & \text{for } t = \tau \\ 0 & \text{for } t \neq \tau. \end{cases}
$$
Moreover, from (9) we could write
$$
y_t = \mu + H^{-1}H\varepsilon_t + \Psi_1 H^{-1}H\varepsilon_{t-1} + \Psi_2 H^{-1}H\varepsilon_{t-2} + \cdots
= \mu + J_0 u_t + J_1 u_{t-1} + J_2 u_{t-2} + \cdots,
$$
where
$$
J_s = \Psi_s H^{-1}. \qquad (11)
$$
For example, $H$ could be any matrix that diagonalizes $\Omega$,
$$
H\Omega H' = D,
$$
with $D$ a diagonal matrix. For such a choice of $H$, the elements of $u_t$ are uncorrelated with one another:
$$
E(u_t u_t') = H\Omega H' = D.
$$
Thus, it is always possible to write a stationary VAR($p$) process as an infinite moving average of a white noise vector $u_t$ whose elements are mutually uncorrelated.

2.1.3 Computation of Autocovariances of a Stationary VAR(p) Process

We now consider how to express the second moments of a $y_t$ that follows a VAR($p$). Recall that, as in the univariate AR($p$) process, the Yule-Walker equations are obtained by postmultiplying (3) by $(y_{t-j} - \mu)'$ and taking expectations. For $j = 0$, using $\Gamma_{-j} = \Gamma_j'$,
$$
\Gamma_0 = E(y_t - \mu)(y_t - \mu)' = \Phi_1 E(y_{t-1} - \mu)(y_t - \mu)' + \Phi_2 E(y_{t-2} - \mu)(y_t - \mu)' + \cdots + \Phi_p E(y_{t-p} - \mu)(y_t - \mu)' + E[\varepsilon_t(y_t - \mu)']
$$
$$
= \Phi_1\Gamma_{-1} + \Phi_2\Gamma_{-2} + \cdots + \Phi_p\Gamma_{-p} + \Omega
= \Phi_1\Gamma_1' + \Phi_2\Gamma_2' + \cdots + \Phi_p\Gamma_p' + \Omega,
$$
and for $j > 0$,
$$
\Gamma_j = \Phi_1\Gamma_{j-1} + \Phi_2\Gamma_{j-2} + \cdots + \Phi_p\Gamma_{j-p}. \qquad (12)
$$
These equations may be used to compute the $\Gamma_j$ recursively for $j \geq p$ if $\Phi_1,\ldots,\Phi_p$ and $\Gamma_{p-1},\ldots,\Gamma_0$ are known.

Let $\xi_t$ be as defined in (4) and let $\Sigma$ denote the variance of $\xi_t$,
$$
\Sigma = E(\xi_t\xi_t') = E\left\{
\begin{bmatrix} y_t - \mu \\ y_{t-1} - \mu \\ \vdots \\ y_{t-p+1} - \mu \end{bmatrix}
\begin{bmatrix} (y_t - \mu)' & (y_{t-1} - \mu)' & \cdots & (y_{t-p+1} - \mu)' \end{bmatrix}
\right\}
= \begin{bmatrix}
\Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\
\Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2} \\
\vdots & \vdots & \ddots & \vdots \\
\Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0
\end{bmatrix}.
$$
Postmultiplying (6) by its own transpose and taking expectations gives
$$
E[\xi_t\xi_t'] = E[(F\xi_{t-1} + v_t)(F\xi_{t-1} + v_t)'] = F\,E(\xi_{t-1}\xi_{t-1}')\,F' + E(v_tv_t'),
$$
or
$$
\Sigma = F\Sigma F' + Q. \qquad (13)
$$
From the result on the vec operator on p. 13 of Ch. 1, we have
$$
\operatorname{vec}(\Sigma) = (F \otimes F)\operatorname{vec}(\Sigma) + \operatorname{vec}(Q) = A\operatorname{vec}(\Sigma) + \operatorname{vec}(Q), \qquad (14)
$$
where $A \equiv (F \otimes F)$. Let $r = kp$, so that $F$ is an $(r \times r)$ matrix and $A$ is an $(r^2 \times r^2)$ matrix. Equation (14) has the solution
$$
\operatorname{vec}(\Sigma) = [I_{r^2} - A]^{-1}\operatorname{vec}(Q), \qquad (15)
$$
provided that the matrix $[I_{r^2} - A]$ is nonsingular. Thus, the $\Gamma_j$, $j = -p+1,\ldots,p-1$, are obtained from (15).
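For a VAR(1) the computation in (13)-(15) reduces to $\operatorname{vec}(\Gamma_0) = [I_{k^2} - \Phi_1 \otimes \Phi_1]^{-1}\operatorname{vec}(\Omega)$ together with the recursion $\Gamma_j = \Phi_1\Gamma_{j-1}$ from (12). The following numpy sketch carries this out for the three-dimensional VAR(1) of the example below; for $p > 1$ one would instead apply (15) to the companion form and read off the upper-left blocks of $\Sigma$.

```python
import numpy as np

# Phi1 and Omega as given in the three-dimensional VAR(1) example.
Phi1 = np.array([[0.5, 0.0, 0.0],
                 [0.1, 0.1, 0.3],
                 [0.0, 0.2, 0.3]])
Omega = np.array([[2.25, 0.0, 0.00],
                  [0.00, 1.0, 0.50],
                  [0.00, 0.5, 0.74]])
k = Phi1.shape[0]

# vec(Gamma_0) = [I_{k^2} - Phi1 (x) Phi1]^{-1} vec(Omega), equation (15) with p = 1.
# vec() stacks columns, which corresponds to Fortran ('F') order in numpy.
A = np.kron(Phi1, Phi1)
vecG0 = np.linalg.solve(np.eye(k * k) - A, Omega.flatten(order="F"))
Gamma0 = vecG0.reshape(k, k, order="F")

# Higher-order autocovariances from the recursion (12): Gamma_j = Phi1 Gamma_{j-1}.
Gamma1 = Phi1 @ Gamma0
Gamma2 = Phi1 @ Gamma1

print(np.round(Gamma0, 3))
print(np.round(Gamma1, 3))
print(np.round(Gamma2, 3))
```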
Example: Consider the three-dimensional VAR(1) process
$$
y_t = c + \begin{bmatrix} 0.5 & 0 & 0 \\ 0.1 & 0.1 & 0.3 \\ 0 & 0.2 & 0.3 \end{bmatrix} y_{t-1} + \varepsilon_t,
\qquad
E(\varepsilon_t\varepsilon_t') = \Omega = \begin{bmatrix} 2.25 & 0 & 0 \\ 0 & 1.0 & 0.5 \\ 0 & 0.5 & 0.74 \end{bmatrix}.
$$
For this process the reverse characteristic polynomial is
$$
\det\left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- \begin{bmatrix} 0.5 & 0 & 0 \\ 0.1 & 0.1 & 0.3 \\ 0 & 0.2 & 0.3 \end{bmatrix} z \right)
= \det \begin{bmatrix} 1-0.5z & 0 & 0 \\ -0.1z & 1-0.1z & -0.3z \\ 0 & -0.2z & 1-0.3z \end{bmatrix}
= (1 - 0.5z)(1 - 0.4z - 0.03z^2).
$$
The roots of this polynomial are easily seen to be
$$
z_1 = 2, \qquad z_2 = 2.1525, \qquad z_3 = -15.4858.
$$
They are obviously all greater than 1 in absolute value. Therefore the process is stationary.

We next obtain the MA($\infty$) coefficients of this VAR(1) process using the relation described in equation (10):
$$
\Psi_0 = I_3, \quad
\Psi_1 = \Phi_1\Psi_0 = \begin{bmatrix} 0.5 & 0 & 0 \\ 0.1 & 0.1 & 0.3 \\ 0 & 0.2 & 0.3 \end{bmatrix}, \quad
\Psi_2 = \Phi_1\Psi_1 = \begin{bmatrix} 0.25 & 0 & 0 \\ 0.06 & 0.07 & 0.12 \\ 0.02 & 0.08 & 0.15 \end{bmatrix},
$$
$$
\Psi_3 = \Phi_1\Psi_2 = \begin{bmatrix} 0.125 & 0 & 0 \\ 0.037 & 0.031 & 0.057 \\ 0.018 & 0.038 & 0.069 \end{bmatrix}, \qquad \Psi_4 = \cdots
$$
We finally calculate the autocovariances of this process. From (15) we have
$$
\operatorname{vec}(\Sigma) = \operatorname{vec}(\Gamma_0) = [I_9 - \Phi_1 \otimes \Phi_1]^{-1}\operatorname{vec}(\Omega)
$$
$$
= \begin{bmatrix}
0.75 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-0.05 & 0.95 & -0.15 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -0.10 & 0.85 & 0 & 0 & 0 & 0 & 0 & 0 \\
-0.05 & 0 & 0 & 0.95 & 0 & 0 & -0.15 & 0 & 0 \\
-0.01 & -0.01 & -0.03 & -0.01 & 0.99 & -0.03 & -0.03 & -0.03 & -0.09 \\
0 & -0.02 & -0.03 & 0 & -0.02 & 0.97 & 0 & -0.06 & -0.09 \\
0 & 0 & 0 & -0.10 & 0 & 0 & 0.85 & 0 & 0 \\
0 & 0 & 0 & -0.02 & -0.02 & -0.06 & -0.03 & 0.97 & -0.09 \\
0 & 0 & 0 & 0 & -0.04 & -0.06 & 0 & -0.06 & 0.91
\end{bmatrix}^{-1}
\begin{bmatrix} 2.25 \\ 0 \\ 0 \\ 0 \\ 1.0 \\ 0.5 \\ 0 \\ 0.5 \\ 0.74 \end{bmatrix}
= \begin{bmatrix} 3.000 \\ 0.161 \\ 0.019 \\ 0.161 \\ 1.172 \\ 0.674 \\ 0.019 \\ 0.674 \\ 0.954 \end{bmatrix}.
$$
It follows that
$$
\Gamma_0 = \begin{bmatrix} 3.000 & 0.161 & 0.019 \\ 0.161 & 1.172 & 0.674 \\ 0.019 & 0.674 & 0.954 \end{bmatrix}, \quad
\Gamma_1 = \Phi_1\Gamma_0 = \begin{bmatrix} 1.500 & 0.080 & 0.009 \\ 0.322 & 0.335 & 0.355 \\ 0.038 & 0.437 & 0.421 \end{bmatrix}, \quad
\Gamma_2 = \Phi_1\Gamma_1 = \begin{bmatrix} 0.750 & 0.040 & 0.005 \\ 0.194 & 0.173 & 0.163 \\ 0.076 & 0.198 & 0.197 \end{bmatrix}, \qquad \Gamma_3 = \cdots
$$

Exercise: Consider the two-dimensional VAR(2) process
$$
y_t = c + \begin{bmatrix} 0.5 & 0.1 \\ 0.4 & 0.5 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0 & 0 \\ 0.25 & 0 \end{bmatrix} y_{t-2} + \varepsilon_t,
\qquad
E(\varepsilon_t\varepsilon_t') = \Omega = \begin{bmatrix} 0.09 & 0 \\ 0 & 0.04 \end{bmatrix}.
$$
Please
(a) check whether this process is stationary or not;
(b) find the coefficient matrices in its MA($\infty$) representation, $\Psi_j$, $j = 0,1,2,3$;
(c) find its autocovariance matrices $\Gamma_j$, $j = 0,1,2,3$.

2.1.4 Linear Forecast

From (9) we see that $y_{t-j}$ is a linear function of $\varepsilon_{t-j}, \varepsilon_{t-j-1}, \ldots$, each of which is uncorrelated with $\varepsilon_{t+1}$ for $j = 0,1,\ldots$ It follows that $\varepsilon_{t+1}$ is uncorrelated with $y_{t-j}$ for any $j \geq 0$. Thus, the linear forecast of $y_{t+1}$ on the basis of $y_t, y_{t-1}, \ldots$ is given by
$$
\hat{y}_{t+1|t} = \mu + \Phi_1(y_t - \mu) + \Phi_2(y_{t-1} - \mu) + \cdots + \Phi_p(y_{t-p+1} - \mu),
$$
and $\varepsilon_{t+1}$ can be interpreted as the fundamental innovation for $y_{t+1}$, that is, the error in forecasting $y_{t+1}$ on the basis of a linear function of a constant and $y_t, y_{t-1}, \ldots$ More generally, it follows from (7) that a forecast of $y_{t+s}$ on the basis of $y_t, y_{t-1}, \ldots$ will take the form
$$
\hat{y}_{t+s|t} = \mu + F_{11}^{(s)}(y_t - \mu) + F_{12}^{(s)}(y_{t-1} - \mu) + \cdots + F_{1p}^{(s)}(y_{t-p+1} - \mu).
$$
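The $s$-step forecast formula can be implemented directly with the companion matrix: stack the last $p$ deviations from the mean into $\xi_t$, premultiply by $F^s$, and keep the first $k$ rows. A minimal sketch; the helper name var_forecast and all numerical values below are hypothetical, for illustration only.

```python
import numpy as np

def var_forecast(y_hist, c, Phi_list, s):
    """s-step-ahead linear forecast of a VAR(p).
    y_hist: (p, k) array holding y_t, y_{t-1}, ..., y_{t-p+1} (most recent first).
    Implements y_hat_{t+s|t} = mu + F^(s)_{11}(y_t - mu) + ... + F^(s)_{1p}(y_{t-p+1} - mu)."""
    p, k = y_hist.shape
    # Companion matrix F of equation (5).
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phi_list)
    F[k:, :-k] = np.eye(k * (p - 1))
    # Unconditional mean mu = (I - Phi_1 - ... - Phi_p)^{-1} c.
    mu = np.linalg.solve(np.eye(k) - sum(Phi_list), c)
    xi = (y_hist - mu).reshape(-1)            # stacked deviations, shape (kp,)
    Fs = np.linalg.matrix_power(F, s)
    return mu + Fs[:k, :] @ xi                # first k rows of F^s xi_t, plus the mean

# Hypothetical bivariate VAR(2) and a made-up pair of recent observations.
c = np.array([1.0, 0.5])
Phi1 = np.array([[0.4, 0.1], [0.2, 0.3]])
Phi2 = np.array([[0.1, 0.0], [0.0, 0.2]])
y_hist = np.array([[2.1, 1.3],    # y_t
                   [1.8, 1.1]])   # y_{t-1}
for s in (1, 2, 4):
    print(s, np.round(var_forecast(y_hist, c, [Phi1, Phi2], s), 3))
```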
2.2 Estimation: MLE for an Unrestricted VAR

Let the $(k \times 1)$ stochastic vector $y_t$ follow a Gaussian VAR($p$) process, i.e.
$$
y_t = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} + \varepsilon_t, \qquad (16)
$$
where $\varepsilon_t \sim \text{i.i.d. } N(0, \Omega)$. In this case, $\theta = (c, \Phi_1, \ldots, \Phi_p, \Omega)$.

Suppose that we have a sample of size $(T + p)$. As in the scalar AR process, the simplest approach is to condition on the first $p$ observations (denoted $y_{-p+1}, y_{-p+2}, \ldots, y_0$) and to base estimation on the last $T$ observations (denoted $y_1, y_2, \ldots, y_T$). The objective is then to form the conditional likelihood
$$
f_{Y_T, Y_{T-1}, \ldots, Y_1 \mid Y_0, Y_{-1}, \ldots, Y_{-p+1}}(y_T, y_{T-1}, \ldots, y_1 \mid y_0, y_{-1}, \ldots, y_{-p+1};\, \theta) \qquad (17)
$$
and maximize it with respect to $\theta$. VARs are invariably estimated on the basis of the conditional likelihood function (17) rather than the full-sample unconditional likelihood. For brevity, we will hereafter refer to (17) simply as the "likelihood function" and to the value of $\theta$ that maximizes (17) as the "maximum likelihood estimator".

2.2.1 The Conditional Likelihood Function for a Vector Autoregression

The likelihood function is calculated in the same way as for a scalar autoregression. Conditional on the values of $y$ observed through date $t-1$, the value of $y$ for date $t$ is equal to a constant
$$
c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p} \qquad (18)
$$
plus a $N(0, \Omega)$ variable. Thus, for $t \geq 1$,
$$
y_t \mid y_{t-1}, y_{t-2}, \ldots, y_0, y_{-1}, \ldots, y_{-p+1} \sim N(c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p},\ \Omega). \qquad (19)
$$
It will be convenient to use a more compact expression for the conditional mean (18). Let $x_t$ ($(kp+1) \times 1$) denote a vector containing a constant term and $p$ lags of each of the elements of $y_t$:
$$
x_t \equiv \begin{bmatrix} 1 \\ y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{bmatrix}.
$$
Let $X_t$ denote the $(k \times (kp+1)k)$ matrix
$$
X_t = \begin{bmatrix}
x_t' & 0 & \cdots & 0 \\
0 & x_t' & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & x_t'
\end{bmatrix},
$$
and, writing $\Pi \equiv [c,\ \Phi_1,\ \Phi_2,\ \ldots,\ \Phi_p]$ for the $(k \times (kp+1))$ matrix of coefficients, let the $((kp+1)k \times 1)$ vector $\pi = \operatorname{vec}(\Pi')$. It is easy to see that
$$
X_t\pi = c + \Phi_1 y_{t-1} + \Phi_2 y_{t-2} + \cdots + \Phi_p y_{t-p}. \qquad (20)
$$
To see this, take for example $k = 2$ and $p = 1$. Then
$$
c = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, \qquad
\Phi_1 = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix}, \qquad
y_{t-1} = \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}.
$$
In this case, $x_t' = [1\ \ y_{1,t-1}\ \ y_{2,t-1}]$ and
$$
\pi = \operatorname{vec}(\Pi') = \operatorname{vec}\begin{bmatrix} c_1 & c_2 \\ \phi_{11} & \phi_{21} \\ \phi_{12} & \phi_{22} \end{bmatrix}
= \begin{bmatrix} c_1 \\ \phi_{11} \\ \phi_{12} \\ c_2 \\ \phi_{21} \\ \phi_{22} \end{bmatrix}.
$$
It is easy to see that
$$
X_t\pi = \begin{bmatrix} 1 & y_{1,t-1} & y_{2,t-1} & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & y_{1,t-1} & y_{2,t-1} \end{bmatrix}
\begin{bmatrix} c_1 \\ \phi_{11} \\ \phi_{12} \\ c_2 \\ \phi_{21} \\ \phi_{22} \end{bmatrix}
= \begin{bmatrix} c_1 + \phi_{11}y_{1,t-1} + \phi_{12}y_{2,t-1} \\ c_2 + \phi_{21}y_{1,t-1} + \phi_{22}y_{2,t-1} \end{bmatrix}
= \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} + \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix}\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
= c + \Phi_1 y_{t-1}.
$$
Using this notation, (19) can be written more compactly as
$$
y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{-p+1} \sim N(X_t\pi,\ \Omega). \qquad (21)
$$
Thus, the conditional density of the $t$th observation is
$$
f_{Y_t \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{-p+1}}(y_t \mid y_{t-1}, y_{t-2}, \ldots, y_{-p+1};\, \theta)
= (2\pi)^{-k/2}\,|\Omega^{-1}|^{1/2}\exp\!\big[-\tfrac{1}{2}(y_t - X_t\pi)'\Omega^{-1}(y_t - X_t\pi)\big].
$$
The log likelihood function of the full sample $y_T, y_{T-1}, \ldots, y_1$ conditioned on $y_0, y_{-1}, \ldots, y_{-p+1}$ is therefore
$$
\mathcal{L}(\theta) = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\Omega^{-1}| - \frac{1}{2}\sum_{t=1}^{T}(y_t - X_t\pi)'\Omega^{-1}(y_t - X_t\pi). \qquad (22)
$$

2.2.2 MLE of π

The MLE of $\pi$ is the value $\hat{\pi}$ that maximizes (22). At first glance, finding $\hat{\pi}$ is not a trivial task. On closer inspection, however, $X_t$ is a special matrix of the same form as the matrix $X_t$ in Section 5.1 of Ch. 10, i.e. $x_{1t} = x_{2t} = \cdots = x_{Mt}$: every equation has the same regressors. Therefore $\hat{\pi}$ is simply obtained from OLS regressions of $y_{it}$ on $x_t$, by the results for the SURE model.

2.2.3 MLE of Ω

When evaluated at the MLE $\hat{\pi}$, the log likelihood (22) is
$$
\mathcal{L}(\Omega, \hat{\pi}) = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\Omega^{-1}| - \frac{1}{2}\sum_{t=1}^{T}\hat{\varepsilon}_t'\Omega^{-1}\hat{\varepsilon}_t, \qquad (23)
$$
where $\hat{\varepsilon}_t = y_t - X_t\hat{\pi}$. Differentiating (23) with respect to $\Omega^{-1}$ (see p. 23 of Ch. 1):
$$
\frac{\partial \mathcal{L}(\Omega, \hat{\pi})}{\partial \Omega^{-1}}
= \frac{T}{2}\frac{\partial \ln|\Omega^{-1}|}{\partial \Omega^{-1}} - \frac{1}{2}\sum_{t=1}^{T}\frac{\partial\, \hat{\varepsilon}_t'\Omega^{-1}\hat{\varepsilon}_t}{\partial \Omega^{-1}}
= \frac{T}{2}\Omega' - \frac{1}{2}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.
$$
The likelihood is maximized when this derivative is set to zero, or when
$$
\hat{\Omega}' = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.
$$
Since $\Omega$ is symmetric, we also have
$$
\hat{\Omega} = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.
$$
The row $i$, column $j$ element of $\hat{\Omega}$ is
$$
\hat{\sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{jt},
$$
which is the average product of the OLS residual for variable $i$ and the OLS residual for variable $j$.
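Because every equation shares the same regressors, the MLE of the coefficients is just OLS equation by equation, and $\hat{\Omega}$ is the average outer product of the OLS residuals. A minimal sketch on simulated data (the data-generating values below are hypothetical, chosen only so that there is something to estimate):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a bivariate VAR(1) with hypothetical parameters.
k, p, T = 2, 1, 2000
c_true = np.array([0.5, -0.2])
Phi_true = np.array([[0.4, 0.1], [0.2, 0.3]])
Omega_true = np.array([[1.0, 0.3], [0.3, 0.8]])
y = np.zeros((T + p, k))
eps = rng.multivariate_normal(np.zeros(k), Omega_true, size=T + p)
for t in range(p, T + p):
    y[t] = c_true + Phi_true @ y[t - 1] + eps[t]

# Build x_t = (1, y_{t-1}', ..., y_{t-p}')' for t = 1,...,T.
Y = y[p:]                                                    # (T, k)
X = np.hstack([np.ones((T, 1))] +
              [y[p - j: p - j + T] for j in range(1, p + 1)])  # (T, 1 + kp)

# OLS of each y_it on the common x_t gives the MLE; column i of Pi_hat holds equation i.
Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)               # (1 + kp, k): [c'; Phi_1'; ...]
resid = Y - X @ Pi_hat                                       # rows are eps_hat_t'
Omega_hat = resid.T @ resid / T                              # (1/T) sum eps_hat eps_hat'

print("c_hat    :", np.round(Pi_hat[0], 3))
print("Phi1_hat :\n", np.round(Pi_hat[1:1 + k].T, 3))
print("Omega_hat:\n", np.round(Omega_hat, 3))
```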
2.2.4 Likelihood Ratio Tests

To perform a likelihood ratio test, we need to calculate the maximum value achieved by (22). Thus consider
$$
\mathcal{L}(\hat{\Omega}, \hat{\pi}) = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\hat{\Omega}^{-1}| - \frac{1}{2}\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t. \qquad (24)
$$
The last term in (24) is
$$
-\frac{1}{2}\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t
= -\frac{1}{2}\operatorname{trace}\left[\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t\right]
= -\frac{1}{2}\operatorname{trace}\left[\sum_{t=1}^{T}\hat{\Omega}^{-1}\hat{\varepsilon}_t\hat{\varepsilon}_t'\right]
= -\frac{1}{2}\operatorname{trace}\left[\hat{\Omega}^{-1}(T\hat{\Omega})\right]
= -\frac{1}{2}\operatorname{trace}(T\,I_k) = -\frac{Tk}{2}.
$$
Substituting this into (24) produces
$$
\mathcal{L}(\hat{\Omega}, \hat{\pi}) = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\hat{\Omega}^{-1}| - \frac{Tk}{2}.
$$
Suppose we want to test the null hypothesis that the data were generated by a Gaussian VAR with $p_0$ lags against the alternative specification of $p_1 > p_0$ lags. We may then estimate the model by MLE under $H_0$ with $p_0$ lags and under $H_1$ with $p_1$ lags, obtaining the maximized log likelihood values
$$
\mathcal{L}_0^* = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\hat{\Omega}_0^{-1}| - \frac{Tk}{2}
\qquad \text{and} \qquad
\mathcal{L}_1^* = -\frac{Tk}{2}\ln(2\pi) + \frac{T}{2}\ln|\hat{\Omega}_1^{-1}| - \frac{Tk}{2},
$$
respectively. Twice the log likelihood ratio is then (see p. 4 of Ch. 1)
$$
2(\mathcal{L}_1^* - \mathcal{L}_0^*) = T\ln|\hat{\Omega}_1^{-1}| - T\ln|\hat{\Omega}_0^{-1}|
= T\ln\frac{1}{|\hat{\Omega}_1|} - T\ln\frac{1}{|\hat{\Omega}_0|}
= -T\ln|\hat{\Omega}_1| + T\ln|\hat{\Omega}_0|
= T\{\ln|\hat{\Omega}_0| - \ln|\hat{\Omega}_1|\}.
$$
Under the null hypothesis, this statistic asymptotically has a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed under $H_0$, here $k^2(p_1 - p_0)$.

2.3 Bivariate Granger Causality Tests

2.3.1 Definitions of Causality

Granger (1969) has defined a concept of causality which, under suitable conditions, is fairly easy to deal with in the context of VAR models. It has therefore become quite popular in recent years. The idea is that a cause cannot come after the effect. Thus, if a variable $y$ affects a variable $x$, the former should help improve the predictions of the latter. To formalize this idea, we say that $y$ fails to Granger-cause $x$ if for all $s > 0$ the mean squared error of a forecast of $x_{t+s}$ based on $(x_t, x_{t-1}, \ldots)$ is the same as the MSE of a forecast of $x_{t+s}$ that uses both $(x_t, x_{t-1}, \ldots)$ and $(y_t, y_{t-1}, \ldots)$. If we restrict ourselves to linear functions, $y$ fails to Granger-cause $x$ if
$$
\text{MSE}\big[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots)\big] = \text{MSE}\big[\hat{E}(x_{t+s} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots)\big].
$$

2.3.2 Alternative Implications of Granger Causality, VAR

In a bivariate VAR describing $x$ and $y$, $y$ does not Granger-cause $x$ if the coefficient matrices $\Phi_j$ are lower triangular for all $j$:
$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix} =
\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} +
\begin{bmatrix} \phi_{11}^{(1)} & 0 \\ \phi_{21}^{(1)} & \phi_{22}^{(1)} \end{bmatrix}\begin{bmatrix} x_{t-1} \\ y_{t-1} \end{bmatrix} +
\begin{bmatrix} \phi_{11}^{(2)} & 0 \\ \phi_{21}^{(2)} & \phi_{22}^{(2)} \end{bmatrix}\begin{bmatrix} x_{t-2} \\ y_{t-2} \end{bmatrix} + \cdots +
\begin{bmatrix} \phi_{11}^{(p)} & 0 \\ \phi_{21}^{(p)} & \phi_{22}^{(p)} \end{bmatrix}\begin{bmatrix} x_{t-p} \\ y_{t-p} \end{bmatrix} +
\begin{bmatrix} \varepsilon_{xt} \\ \varepsilon_{yt} \end{bmatrix}.
$$
From the first row of this system, the optimal one-period-ahead forecast of $x$ depends only on its own lagged values and not on lagged $y$:
$$
\hat{E}(x_{t+1} \mid x_t, x_{t-1}, \ldots, y_t, y_{t-1}, \ldots) = c_1 + \phi_{11}^{(1)}x_t + \phi_{11}^{(2)}x_{t-1} + \cdots + \phi_{11}^{(p)}x_{t-p+1}.
$$
By induction, the same is true of an $s$-period-ahead forecast. Thus for the bivariate VAR, $y$ does not Granger-cause $x$ if $\Phi_j$ is lower triangular for all $j$.

2.3.3 Alternative Implications of Granger Causality, VMA

Recall from (10) that
$$
\Psi_s = \Phi_1\Psi_{s-1} + \Phi_2\Psi_{s-2} + \cdots + \Phi_p\Psi_{s-p} \quad \text{for } s = 1,2,\ldots,
$$
with $\Psi_0 = I_k$ and $\Psi_s = 0$ for $s < 0$. This expression implies that if $\Phi_j$ is lower triangular for all $j$, then the moving average matrices $\Psi_s$ of the fundamental representation are lower triangular for all $s$. Thus if $y$ fails to Granger-cause $x$, the VMA($\infty$) representation can be written as
$$
\begin{bmatrix} x_t \\ y_t \end{bmatrix} =
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} +
\begin{bmatrix} \psi_{11}(L) & 0 \\ \psi_{21}(L) & \psi_{22}(L) \end{bmatrix}
\begin{bmatrix} \varepsilon_{xt} \\ \varepsilon_{yt} \end{bmatrix},
$$
where $\psi_{ij}(L) = \psi_{ij}^{(0)} + \psi_{ij}^{(1)}L + \psi_{ij}^{(2)}L^2 + \psi_{ij}^{(3)}L^3 + \cdots$, with $\psi_{11}^{(0)} = \psi_{22}^{(0)} = 1$ and $\psi_{21}^{(0)} = 0$.

2.3.4 Econometric Tests for Granger Causality

A simple approach to testing whether a particular series $y$ Granger-causes $x$ can be based on the VAR. To implement this test, we assume a particular autoregressive lag length $p$ and estimate
$$
x_t = c_1 + \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \cdots + \alpha_p x_{t-p} + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_p y_{t-p} + u_t \qquad (25)
$$
by OLS. We then conduct an $F$ test of the null hypothesis
$$
H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0.
$$
Recalling Section 4.2.1 of Chapter 6, one way to implement this test is to calculate the sum of squared residuals from (25),
$$
RSS_u = \sum_{t=1}^{T}\hat{u}_t^2,
$$
and compare it with the sum of squared residuals of a univariate autoregression for $x_t$,
$$
RSS_r = \sum_{t=1}^{T}\hat{e}_t^2,
$$
where
$$
x_t = c_0 + \gamma_1 x_{t-1} + \gamma_2 x_{t-2} + \cdots + \gamma_p x_{t-p} + e_t \qquad (26)
$$
is also estimated by OLS. If
$$
S_1 \equiv \frac{(RSS_r - RSS_u)/p}{RSS_u/(T - 2p - 1)}
$$
is greater than the 5% critical value of an $F(p,\ T-2p-1)$ distribution, then we reject the null hypothesis that $y$ does not Granger-cause $x$; that is, if $S_1$ is sufficiently large, we conclude that $y$ does Granger-cause $x$.

Exercise: Please specify a bivariate VAR model for Taiwan's GDP and Stock Index from the LR test and, from this model, test the Granger causality between these two variables.
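The test in (25)-(26) is a restricted-versus-unrestricted OLS comparison, so it takes only a few lines to compute. A minimal sketch, using scipy only for the $F$ critical value; the series below are simulated with hypothetical parameters in which $y$ does help predict $x$, so the test should reject:

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(2)

# Simulate a bivariate system in which y Granger-causes x (hypothetical parameters).
T, p = 500, 2
x = np.zeros(T + p)
y = np.zeros(T + p)
for t in range(p, T + p):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.3 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()

def lag_matrix(z, p):
    """Columns z_{t-1}, ..., z_{t-p}, aligned with the T observations t = p,...,T+p-1."""
    return np.column_stack([z[p - j:-j] for j in range(1, p + 1)])

x_t = x[p:]
X_r = np.column_stack([np.ones(T), lag_matrix(x, p)])                    # restricted: own lags, eq. (26)
X_u = np.column_stack([np.ones(T), lag_matrix(x, p), lag_matrix(y, p)])  # add p lags of y, eq. (25)

def rss(X, z):
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    e = z - X @ beta
    return e @ e

RSS_r, RSS_u = rss(X_r, x_t), rss(X_u, x_t)
S1 = ((RSS_r - RSS_u) / p) / (RSS_u / (T - 2 * p - 1))
crit = f_dist.ppf(0.95, p, T - 2 * p - 1)
print(f"S1 = {S1:.2f}, 5% critical value = {crit:.2f}")
print("reject 'y does not Granger-cause x'" if S1 > crit else "fail to reject")
```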
2.4 The Impulse-Response Function

In equation (9) a VAR was written in vector MA($\infty$) form as
$$
y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \Psi_3\varepsilon_{t-3} + \cdots \qquad (27)
$$
Thus, the matrix $\Psi_s$ has the interpretation
$$
\frac{\partial y_{t+s}}{\partial \varepsilon_t'} = \Psi_s;
$$
that is, the row $i$, column $j$ element of $\Psi_s$ identifies the consequence of a one-unit increase in the $j$th variable's innovation at date $t$ ($\varepsilon_{jt}$) for the value of the $i$th variable at time $t+s$ ($y_{i,t+s}$), holding all other innovations at all dates constant. A plot of the row $i$, column $j$ element of $\Psi_s$,
$$
\frac{\partial y_{i,t+s}}{\partial \varepsilon_{jt}}, \qquad (28)
$$
as a function of $s$ is called the impulse response function. It describes the response of $y_{i,t+s}$ to a one-time impulse in $y_{jt}$ with all other variables dated $t$ or earlier held constant.

Suppose that we are told that the date $t$ value of the first variable in the autoregression, $y_{1t}$, was higher than expected, so that $\varepsilon_{1t}$ is positive. How does this cause us to revise our forecast of $y_{i,t+s}$? In other words, what is
$$
\frac{\partial y_{i,t+s}}{\partial \varepsilon_{1t}}, \quad i = 1,2,\ldots,k?
$$
When the elements of $\varepsilon_t$ are contemporaneously correlated with one another, the fact that $\varepsilon_{1t}$ is positive gives us some useful new information about the values of $\varepsilon_{2t}, \ldots, \varepsilon_{kt}$, which in turn has further implications for the value of $y_{i,t+s}$. Thus, the impulse response function defined in (28) is best thought of as the special case in which $E(\varepsilon_t\varepsilon_t') = \Omega$ is a diagonal matrix. Of course, in general $\Omega$ is not diagonal. However, we may proceed as in Section 2.1.2 of this chapter to find matrices $A$ and $D$ such that
$$
\Omega = ADA', \qquad (29)
$$
where $A$ is a lower triangular matrix with 1s along the principal diagonal and $D$ is a diagonal matrix with positive entries along the principal diagonal.

Using this matrix $A$ we can construct a $(k \times 1)$ vector $u_t$ from
$$
u_t = A^{-1}\varepsilon_t; \qquad (30)
$$
then we see that the elements of $u_t$ are uncorrelated with each other:
$$
E(u_tu_t') = [A^{-1}]E(\varepsilon_t\varepsilon_t')[A^{-1}]' = [A^{-1}]\Omega[A']^{-1} = [A^{-1}]ADA'[A']^{-1} = D.
$$
From (11) we have
$$
\frac{\partial y_{t+s}}{\partial u_t'} = \Psi_s A, \quad \text{or} \quad
\begin{bmatrix}
\dfrac{\partial y_{1,t+s}}{\partial u_{1t}} & \dfrac{\partial y_{1,t+s}}{\partial u_{2t}} & \cdots & \dfrac{\partial y_{1,t+s}}{\partial u_{kt}} \\
\dfrac{\partial y_{2,t+s}}{\partial u_{1t}} & \dfrac{\partial y_{2,t+s}}{\partial u_{2t}} & \cdots & \dfrac{\partial y_{2,t+s}}{\partial u_{kt}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_{k,t+s}}{\partial u_{1t}} & \dfrac{\partial y_{k,t+s}}{\partial u_{2t}} & \cdots & \dfrac{\partial y_{k,t+s}}{\partial u_{kt}}
\end{bmatrix}
= \begin{bmatrix} \Psi_s a_1 & \Psi_s a_2 & \cdots & \Psi_s a_k \end{bmatrix},
$$
where $a_j$ is the $j$th column of $A$. A plot (how many figures?) of
$$
\frac{\partial y_{t+s}}{\partial u_{jt}} = \Psi_s a_j \qquad (31)
$$
as a function of $s$ is known as an orthogonalized impulse response function.

For a given observed sample of size $T$, we would estimate the autoregressive coefficients $\hat{\Phi}_1, \hat{\Phi}_2, \ldots, \hat{\Phi}_p$ by CSS (or conditional MLE, that is, OLS on each single equation) and construct $\hat{\Psi}_s$ from (10). OLS estimation also provides the estimate $\hat{\Omega} = (1/T)\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$. Matrices $\hat{A}$ and $\hat{D}$ satisfying $\hat{\Omega} = \hat{A}\hat{D}\hat{A}'$ can then be constructed from $\hat{\Omega}$. The sample estimate of (31) is then $\hat{\Psi}_s\hat{a}_j$.

Another popular form is also implemented and reported. Recall that $D$ is a diagonal matrix whose $(j,j)$ element is the variance of $u_{jt}$. Let $D^{1/2}$ denote the diagonal matrix whose $(j,j)$ element is the standard deviation of $u_{jt}$. Note that (29) could be written as
$$
\Omega = AD^{1/2}D^{1/2}A' = PP', \qquad (32)
$$
where $P = AD^{1/2}$. Expression (32) is the Cholesky decomposition of the matrix $\Omega$. Note that, like $A$, the $(k \times k)$ matrix $P$ is lower triangular, and its principal diagonal contains the standard deviations of $u_t$. In place of $u_t$ defined in (30), some researchers use
$$
v_t = P^{-1}\varepsilon_t = D^{-1/2}A^{-1}\varepsilon_t = D^{-1/2}u_t.
$$
Thus, $v_{jt}$ is just $u_{jt}$ divided by its standard deviation $\sqrt{d_{jj}}$. A one-unit increase in $v_{jt}$ is the same as a one-standard-deviation increase in $u_{jt}$. In place of the dynamic multiplier $\partial y_{i,t+s}/\partial u_{jt}$, these researchers then report $\partial y_{i,t+s}/\partial v_{jt}$. Denoting the $j$th column of $P$ by $p_j$, we have
$$
\frac{\partial y_{t+s}}{\partial v_{jt}} = \Psi_s p_j \qquad (33)
$$
from the results of (31). We also note that $p_j = A d_j^{1/2} = a_j\sqrt{d_{jj}}$, where $d_j^{1/2}$ is the $j$th column of $D^{1/2}$.
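Putting the pieces together, an orthogonalized impulse response is obtained by computing the $\Psi_s$ matrices from the recursion (10) and postmultiplying by the Cholesky factor $P$ of $\Omega$, as in (33). A minimal numpy sketch with hypothetical parameter values; in an application the $\Phi_j$ and $\Omega$ below would be replaced by the estimates $\hat{\Phi}_j$ and $\hat{\Omega}$.

```python
import numpy as np

def ma_matrices(Phi_list, horizon):
    """Psi_s from the recursion (10): Psi_s = Phi_1 Psi_{s-1} + ... + Phi_p Psi_{s-p},
    with Psi_0 = I and Psi_s = 0 for s < 0."""
    k, p = Phi_list[0].shape[0], len(Phi_list)
    Psi = [np.eye(k)]
    for s in range(1, horizon + 1):
        Psi.append(sum(Phi_list[j] @ Psi[s - j - 1] for j in range(min(p, s))))
    return Psi

def orthogonalized_irf(Phi_list, Omega, horizon):
    """Responses dy_{t+s}/dv_t' = Psi_s P, where Omega = P P' (Cholesky), equation (33)."""
    P = np.linalg.cholesky(Omega)     # lower triangular, sd of u_t on the diagonal
    return [Psi_s @ P for Psi_s in ma_matrices(Phi_list, horizon)]

# Hypothetical bivariate VAR(1); in practice use the estimated Phi_hat and Omega_hat.
Phi1 = np.array([[0.4, 0.1], [0.2, 0.3]])
Omega = np.array([[1.0, 0.3], [0.3, 0.8]])
irf = orthogonalized_irf([Phi1], Omega, horizon=8)
# Row i, column j of irf[s] is the response of y_{i,t+s} to a one-standard-deviation
# shock in v_{jt}; plotting each (i, j) pair against s gives k*k impulse response figures.
print(np.round(irf[0], 3))
print(np.round(irf[4], 3))
```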
2.5 Forecast Error Variance Decomposition

The forecast error of a VAR $s$ periods into the future is
$$
y_{t+s} - \hat{y}_{t+s|t} = \varepsilon_{t+s} + \Psi_1\varepsilon_{t+s-1} + \Psi_2\varepsilon_{t+s-2} + \cdots + \Psi_{s-1}\varepsilon_{t+1}.
$$
The mean squared error of this $s$-period-ahead forecast is thus
$$
\text{MSE}(\hat{y}_{t+s|t}) = E\big[(y_{t+s} - \hat{y}_{t+s|t})(y_{t+s} - \hat{y}_{t+s|t})'\big] \qquad (34)
$$
$$
= \Omega + \Psi_1\Omega\Psi_1' + \Psi_2\Omega\Psi_2' + \cdots + \Psi_{s-1}\Omega\Psi_{s-1}'. \qquad (35)
$$
Let us now consider how each of the orthogonalized disturbances $(u_{1t}, \ldots, u_{kt})$ contributes to this MSE. Write (30) as
$$
\varepsilon_t = Au_t = a_1u_{1t} + a_2u_{2t} + \cdots + a_ku_{kt}.
$$
Then
$$
\Omega = E(\varepsilon_t\varepsilon_t') = a_1a_1'\,\text{Var}(u_{1t}) + a_2a_2'\,\text{Var}(u_{2t}) + \cdots + a_ka_k'\,\text{Var}(u_{kt}).
$$
Substituting this result into (35), the MSE of the $s$-period-ahead forecast can be written as the sum of $k$ terms, one arising from each of the disturbances $u_{jt}$:
$$
\text{MSE}(\hat{y}_{t+s|t}) = \sum_{j=1}^{k}\Big\{\text{Var}(u_{jt})\cdot\big[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \Psi_2a_ja_j'\Psi_2' + \cdots + \Psi_{s-1}a_ja_j'\Psi_{s-1}'\big]\Big\}. \qquad (36)
$$
With this expression, we can calculate the contribution of the $j$th orthogonalized innovation to the MSE of the $s$-period-ahead forecast:
$$
\text{Var}(u_{jt})\cdot\big[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \Psi_2a_ja_j'\Psi_2' + \cdots + \Psi_{s-1}a_ja_j'\Psi_{s-1}'\big]. \qquad (37)
$$
The ratio of (37) to the MSE (36) is called the forecast error variance decomposition.

Alternatively, recalling that $p_j = A d_j^{1/2} = a_j\sqrt{\text{Var}(u_{jt})}$, we may express the MSE as
$$
\text{MSE}(\hat{y}_{t+s|t}) = \sum_{j=1}^{k}\big[p_jp_j' + \Psi_1p_jp_j'\Psi_1' + \Psi_2p_jp_j'\Psi_2' + \cdots + \Psi_{s-1}p_jp_j'\Psi_{s-1}'\big]
$$
as well.

Exercise: Please plot the impulse response functions and the forecast error variance decomposition from a bivariate VAR(4) model with Taiwan's GDP and Stock Index data set (first differencing of the data may be necessary).
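The decomposition in (36)-(37) uses the same ingredients as the orthogonalized impulse responses: the contribution of orthogonalized shock $j$ to the $s$-step MSE is $\sum_{v=0}^{s-1}\Psi_v p_jp_j'\Psi_v'$, and dividing its diagonal by the diagonal of the total MSE gives the variance shares. A minimal sketch for a hypothetical bivariate VAR(1), where $\Psi_s = \Phi_1^s$:

```python
import numpy as np

# Hypothetical bivariate VAR(1): for p = 1 the MA matrices are simply Psi_s = Phi1^s.
Phi1 = np.array([[0.4, 0.1], [0.2, 0.3]])
Omega = np.array([[1.0, 0.3], [0.3, 0.8]])
k, s_max = 2, 8

P = np.linalg.cholesky(Omega)                       # Omega = P P', equation (32)
Psi = [np.linalg.matrix_power(Phi1, s) for s in range(s_max)]

# MSE(s) = sum_{v=0}^{s-1} Psi_v Omega Psi_v', equation (35);
# contribution of shock j: sum_{v=0}^{s-1} Psi_v p_j p_j' Psi_v', equation (37) with p_j = a_j sqrt(d_jj).
for s in (1, 4, 8):
    mse = sum(Psi[v] @ Omega @ Psi[v].T for v in range(s))
    shares = np.zeros((k, k))                       # row i, column j: share of var(y_{i,t+s}) due to u_j
    for j in range(k):
        pj = P[:, [j]]
        contrib = sum(Psi[v] @ pj @ pj.T @ Psi[v].T for v in range(s))
        shares[:, j] = np.diag(contrib) / np.diag(mse)
    print(f"s = {s}:\n", np.round(shares, 3))       # each row sums to 1
```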