Ch. 18 Vector Time Series
1 Introduction
In dealing with economic variables, the value of one variable is often related not only to its own predecessors in time but also to past values of other variables. This naturally extends the concept of a univariate stochastic process to vector time series analysis. This chapter describes the dynamic interactions among a set of variables collected in a $(k \times 1)$ vector $y_t$.
Definition:
Let $(S, \mathcal{F}, P)$ be a probability space and $T$ an index set of real numbers, and define the $k$-dimensional vector function $y(\cdot\,,\cdot)$ by $y(\cdot\,,\cdot): S \times T \rightarrow \mathbb{R}^k$. The ordered sequence of random vectors $\{y(\cdot\,,t);\, t \in T\}$ is called a vector stochastic process.
1.1 First Two Moments of a Stationary Vector Time Series
From now on in this chapter we follow the convention of writing $y_t$ instead of $y(\cdot\,,t)$ to indicate that we are considering a discrete vector time series. The first two moments of a vector time series $y_t$ are
$$E(y_t) = \mu_t, \quad \text{and}$$
$$\Gamma_{t,j} = E[(y_t - \mu_t)(y_{t-j} - \mu_{t-j})'] \quad \text{for all } t \in T.$$
If neither $\mu_t$ nor $\Gamma_{t,j}$ is a function of $t$, that is, $\mu_t = \mu$ and $\Gamma_{t,j} = \Gamma_j$, then we say that $y_t$ is a covariance-stationary vector process. Note that although $\gamma_j = \gamma_{-j}$ for a scalar stationary process, the same is not true of a vector process:
$$\Gamma_j \neq \Gamma_{-j}.$$
Instead, the correct relation is
$$\Gamma_j' = \Gamma_{-j},$$
since
$$\Gamma_j = E[(y_{t+j} - \mu)(y_{(t+j)-j} - \mu)'] = E[(y_{t+j} - \mu)(y_t - \mu)'],$$
and taking transposes,
$$\Gamma_j' = E[(y_t - \mu)(y_{t+j} - \mu)'] = E[(y_t - \mu)(y_{t-(-j)} - \mu)'] = \Gamma_{-j}.$$
1.2 Vector White Noise Process
Definition:
A $(k \times 1)$ vector process $\{\varepsilon_t;\, t \in T\}$ is said to be a white-noise process if
(i): $E(\varepsilon_t) = 0$;
(ii): $E(\varepsilon_t\varepsilon_\tau') = \begin{cases}\Omega & \text{if } t = \tau\\ 0 & \text{if } t \neq \tau,\end{cases}$
where $\Omega$ is a $(k \times k)$ symmetric positive definite matrix. It is important to note that in general $\Omega$ is not necessarily a diagonal matrix; it is precisely the contemporaneous correlation among the variables that creates the need for vector time series analysis.
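The following sketch (not part of the original notes; it only assumes numpy) simulates a vector white-noise process with a non-diagonal $\Omega$, borrowing the covariance matrix used in the worked example of Section 2.1.3, and checks that the sample covariance is close to $\Omega$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 5000, 3
Omega = np.array([[2.25, 0.0, 0.0],
                  [0.0,  1.0, 0.5],
                  [0.0,  0.5, 0.74]])      # symmetric positive definite

# Draw eps_t ~ N(0, Omega): scale standard normals by a Cholesky factor of Omega.
L = np.linalg.cholesky(Omega)
eps = rng.standard_normal((T, k)) @ L.T

print(np.cov(eps, rowvar=False).round(2))  # close to Omega for large T
```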
1.3 Vector MA(q) Process
A vector moving average process of order $q$ takes the form
$$y_t = \mu + \varepsilon_t + \Theta_1\varepsilon_{t-1} + \Theta_2\varepsilon_{t-2} + \dots + \Theta_q\varepsilon_{t-q},$$
where $\varepsilon_t$ is a vector white noise process and $\Theta_j$ denotes a $(k \times k)$ matrix of MA coefficients for $j = 1, 2, \dots, q$. The mean of $y_t$ is $\mu$, and the variance is
$$\Gamma_0 = E[(y_t - \mu)(y_t - \mu)'] = E[\varepsilon_t\varepsilon_t'] + \Theta_1E[\varepsilon_{t-1}\varepsilon_{t-1}']\Theta_1' + \Theta_2E[\varepsilon_{t-2}\varepsilon_{t-2}']\Theta_2' + \dots + \Theta_qE[\varepsilon_{t-q}\varepsilon_{t-q}']\Theta_q'$$
$$= \Omega + \Theta_1\Omega\Theta_1' + \Theta_2\Omega\Theta_2' + \dots + \Theta_q\Omega\Theta_q',$$
with autocovariances (compare with $\gamma_j$ of Ch. 14 on p. 3)
$$\Gamma_j = \begin{cases}
\Theta_j\Omega + \Theta_{j+1}\Omega\Theta_1' + \Theta_{j+2}\Omega\Theta_2' + \dots + \Theta_q\Omega\Theta_{q-j}' & \text{for } j = 1, 2, \dots, q\\
\Omega\Theta_{-j}' + \Theta_1\Omega\Theta_{-j+1}' + \Theta_2\Omega\Theta_{-j+2}' + \dots + \Theta_{q+j}\Omega\Theta_q' & \text{for } j = -1, -2, \dots, -q\\
0 & \text{for } |j| > q,
\end{cases}$$
where $\Theta_0 = I_k$. Thus any vector MA($q$) process is covariance-stationary.
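A small numerical check of this formula (not in the original notes; the $\Theta_1$, $\Theta_2$, and $\Omega$ below are hypothetical):

```python
import numpy as np

Omega = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
Theta = [np.eye(2),                              # Theta_0 = I_k
         np.array([[0.5, 0.1], [0.0, 0.4]]),     # Theta_1 (hypothetical)
         np.array([[0.2, 0.0], [0.1, 0.1]])]     # Theta_2 (hypothetical)
q = len(Theta) - 1

def gamma(j):
    # Gamma_j = sum_{v=0}^{q-j} Theta_{j+v} Omega Theta_v'  for 0 <= j <= q, and 0 for j > q
    if j > q:
        return np.zeros_like(Omega)
    return sum(Theta[j + v] @ Omega @ Theta[v].T for v in range(q - j + 1))

for j in range(q + 2):
    print(f"Gamma_{j} =\n{gamma(j).round(3)}")
# Gamma_{-j} is recovered as gamma(j).T, consistent with Gamma_j' = Gamma_{-j}.
```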
1.4 Vector MA($\infty$) Process
The vector MA($\infty$) process is written
$$y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \dots,$$
where $\varepsilon_t$ is a vector white noise process and $\Psi_j$ denotes a $(k \times k)$ matrix of MA coefficients.
Definition:
For an $(n \times m)$ matrix $H$, the sequence of matrices $\{H_s\}_{s=-\infty}^{\infty}$ is absolutely summable if each of its elements forms an absolutely summable scalar sequence.
Example:
If $\psi_{ij}^{(s)}$ denotes the row $i$, column $j$ element of the moving average parameter matrix $\Psi_s$ associated with lag $s$, then the sequence $\{\Psi_s\}_{s=0}^{\infty}$ is absolutely summable if
$$\sum_{s=0}^{\infty}|\psi_{ij}^{(s)}| < \infty \quad \text{for } i = 1,2,\dots,k \text{ and } j = 1,2,\dots,k.$$
Theorem:
Let
$$y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \dots,$$
where $\varepsilon_t$ is a vector white noise process and $\{\Psi_l\}_{l=0}^{\infty}$ is absolutely summable. Let $y_{it}$ denote the $i$th element of $y_t$, and let $\mu_i$ denote the $i$th element of $\mu$. Then
(a). the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it}-\mu_i)(y_{j,t-s}-\mu_j)$, exists and is given by the row $i$, column $j$ element of
$$\Gamma_s = \sum_{v=0}^{\infty}\Psi_{s+v}\Omega\Psi_v' \quad \text{for } s = 0,1,2,\dots;$$
(b). the sequence of matrices $\{\Gamma_s\}_{s=0}^{\infty}$ is absolutely summable.
Proof:
(a). By definition,
$$\Gamma_s = E(y_t - \mu)(y_{t-s} - \mu)',$$
or
$$\Gamma_s = E\{[\varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \dots + \Psi_s\varepsilon_{t-s} + \Psi_{s+1}\varepsilon_{t-s-1} + \dots]\,[\varepsilon_{t-s} + \Psi_1\varepsilon_{t-s-1} + \Psi_2\varepsilon_{t-s-2} + \dots]'\}$$
$$= \Psi_s\Omega\Psi_0' + \Psi_{s+1}\Omega\Psi_1' + \Psi_{s+2}\Omega\Psi_2' + \dots = \sum_{v=0}^{\infty}\Psi_{s+v}\Omega\Psi_v' \quad \text{for } s = 0,1,2,\dots$$
The row $i$, column $j$ element of $\Gamma_s$ is therefore the autocovariance between the $i$th variable at time $t$ and the $j$th variable $s$ periods earlier, $E(y_{it}-\mu_i)(y_{j,t-s}-\mu_j)$.
(b).
2 Vector Autoregressive Process, VAR
A $p$th-order vector autoregression, denoted VAR($p$), is written as
$$y_t = c + \Phi_1y_{t-1} + \Phi_2y_{t-2} + \dots + \Phi_py_{t-p} + \varepsilon_t, \qquad (1)$$
where $c$ denotes a $(k \times 1)$ vector of constants, $\Phi_j$ a $(k \times k)$ matrix of autoregressive coefficients for $j = 1,2,\dots,p$, and $\varepsilon_t$ is a vector white noise process.
2.1 Population Characteristics
Let $c_i$ denote the $i$th element of the vector $c$ and let $\phi_{ij}^{(s)}$ denote the row $i$, column $j$ element of the matrix $\Phi_s$; then the first row of the vector system in (1) specifies that
$$y_{1t} = c_1 + \phi_{11}^{(1)}y_{1,t-1} + \phi_{12}^{(1)}y_{2,t-1} + \dots + \phi_{1k}^{(1)}y_{k,t-1} + \phi_{11}^{(2)}y_{1,t-2} + \phi_{12}^{(2)}y_{2,t-2} + \dots + \phi_{1k}^{(2)}y_{k,t-2} + \dots + \phi_{11}^{(p)}y_{1,t-p} + \phi_{12}^{(p)}y_{2,t-p} + \dots + \phi_{1k}^{(p)}y_{k,t-p} + \varepsilon_{1t}.$$
Thus, a vector autoregression is a system in which each variable is regressed on a constant and $p$ of its own lags as well as on $p$ lags of each of the other $(k-1)$ variables in the VAR. Note that each regression has the same explanatory variables.
Using lag operator notation, (1) can be written in the form
$$[I_k - \Phi_1L - \Phi_2L^2 - \dots - \Phi_pL^p]\,y_t = c + \varepsilon_t,$$
or
$$\Phi(L)\,y_t = c + \varepsilon_t. \qquad (2)$$
Here $\Phi(L)$ indicates a $(k \times k)$ matrix polynomial in the lag operator $L$. The row $i$, column $j$ element of $\Phi(L)$ is a scalar polynomial in $L$:
$$\Phi(L)_{ij} = [\delta_{ij} - \phi_{ij}^{(1)}L^1 - \phi_{ij}^{(2)}L^2 - \dots - \phi_{ij}^{(p)}L^p],$$
where $\delta_{ij}$ is unity if $i = j$ and zero otherwise.
If the VAR($p$) process is stationary, we can take expectations of both sides of (1) to calculate the mean of the process:
$$\mu = c + \Phi_1\mu + \Phi_2\mu + \dots + \Phi_p\mu,$$
or
$$\mu = (I_k - \Phi_1 - \Phi_2 - \dots - \Phi_p)^{-1}c.$$
Equation (1) can then be written in terms of deviations from the mean as
$$(y_t - \mu) = \Phi_1(y_{t-1} - \mu) + \Phi_2(y_{t-2} - \mu) + \dots + \Phi_p(y_{t-p} - \mu) + \varepsilon_t. \qquad (3)$$
2.1.1 Conditions for Stationarity
As in the case of the univariate AR($p$) process, it is helpful to rewrite (3) in terms of a VAR(1) process. Toward this end, define
$$\xi_t = \begin{bmatrix} y_t - \mu\\ y_{t-1} - \mu\\ \vdots\\ y_{t-p+1} - \mu \end{bmatrix}_{(kp \times 1)}, \qquad (4)$$
$$F = \begin{bmatrix}
\Phi_1 & \Phi_2 & \Phi_3 & \cdots & \Phi_{p-1} & \Phi_p\\
I_k & 0 & 0 & \cdots & 0 & 0\\
0 & I_k & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & I_k & 0
\end{bmatrix}_{(kp \times kp)}, \qquad (5)$$
and
$$v_t = \begin{bmatrix} \varepsilon_t\\ 0\\ \vdots\\ 0 \end{bmatrix}_{(kp \times 1)}.$$
The VAR($p$) in (3) can then be rewritten as the following VAR(1):
$$\xi_t = F\xi_{t-1} + v_t, \qquad (6)$$
which implies
$$\xi_{t+s} = v_{t+s} + Fv_{t+s-1} + F^2v_{t+s-2} + \dots + F^{s-1}v_{t+1} + F^s\xi_t, \qquad (7)$$
where
$$E(v_tv_s') = \begin{cases} Q & \text{for } t = s\\ 0 & \text{otherwise}\end{cases}$$
and
$$Q = \begin{bmatrix} \Omega & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$
In order for the process to be covariance-stationary, the consequences of any given $\varepsilon_t$ must eventually die out. If the eigenvalues of $F$ all lie inside the unit circle, then the VAR turns out to be covariance-stationary.
Proposition:
The eigenvalues of the matrix $F$ in (5) satisfy
$$\left|I_k\lambda^p - \Phi_1\lambda^{p-1} - \Phi_2\lambda^{p-2} - \dots - \Phi_p\right| = 0. \qquad (8)$$
Hence, a VAR($p$) is covariance-stationary as long as $|\lambda| < 1$ for all values of $\lambda$ satisfying (8). Equivalently, the VAR is stationary if all values $z$ satisfying
$$\left|I_k - \Phi_1z - \Phi_2z^2 - \dots - \Phi_pz^p\right| = 0$$
lie outside the unit circle.
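A minimal sketch of this check (not in the original notes; numpy only): build the companion matrix $F$ of (5) and inspect its eigenvalues. The $\Phi_1$ below is taken from the three-variable example of Section 2.1.3, so all eigenvalues should lie inside the unit circle.

```python
import numpy as np

def companion(Phis):
    """Stack the (k x k) lag matrices Phi_1,...,Phi_p into the (kp x kp) matrix F of (5)."""
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))
    F[:k, :] = np.hstack(Phis)
    F[k:, :-k] = np.eye(k * (p - 1))
    return F

Phi1 = np.array([[0.5, 0.0, 0.0],
                 [0.1, 0.1, 0.3],
                 [0.0, 0.2, 0.3]])
eigvals = np.linalg.eigvals(companion([Phi1]))
print(np.abs(eigvals))                  # the nonzero eigenvalues are the reciprocals of the roots z
print(np.all(np.abs(eigvals) < 1))      # True: the VAR is covariance-stationary
```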
2.1.2 Vector MA($\infty$) Representation
The first $k$ rows of the vector system represented in (7) constitute the system
$$y_{t+s} = \mu + \varepsilon_{t+s} + \Psi_1\varepsilon_{t+s-1} + \Psi_2\varepsilon_{t+s-2} + \dots + \Psi_{s-1}\varepsilon_{t+1} + F_{11}^{(s)}(y_t - \mu) + F_{12}^{(s)}(y_{t-1} - \mu) + \dots + F_{1p}^{(s)}(y_{t-p+1} - \mu).$$
Here $\Psi_j = F_{11}^{(j)}$, where $F_{11}^{(j)}$ denotes the upper-left $(k \times k)$ block of $F^j$, the matrix $F$ raised to the $j$th power.
If the eigenvalues of $F$ all lie inside the unit circle, then $F^s \rightarrow 0$ as $s \rightarrow \infty$ and $y_t$ can be expressed as a convergent sum of the history of $\varepsilon$:
$$y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \Psi_3\varepsilon_{t-3} + \dots = \mu + \Psi(L)\varepsilon_t. \qquad (9)$$
The moving average matrices $\Psi_j$ could equivalently be calculated as follows. The operator $\Phi(L)\ (= I_k - \Phi_1L - \Phi_2L^2 - \dots - \Phi_pL^p)$ in (2) and $\Psi(L)$ in (9) are related by
$$\Psi(L) = [\Phi(L)]^{-1},$$
requiring that
$$[I_k - \Phi_1L - \Phi_2L^2 - \dots - \Phi_pL^p][I_k + \Psi_1L + \Psi_2L^2 + \dots] = I_k.$$
Setting the coefficient on $L^1$ equal to the zero matrix produces
$$\Psi_1 - \Phi_1 = 0.$$
Similarly, setting the coefficient on $L^2$ equal to zero gives
$$\Psi_2 = \Phi_1\Psi_1 + \Phi_2,$$
and in general for $L^s$,
$$\Psi_s = \Phi_1\Psi_{s-1} + \Phi_2\Psi_{s-2} + \dots + \Phi_p\Psi_{s-p} \quad \text{for } s = 1,2,\dots \qquad (10)$$
with $\Psi_0 = I_k$ and $\Psi_s = 0$ for $s < 0$.
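The recursion (10) is easy to code. A sketch (not from the notes; numpy only):

```python
import numpy as np

def ma_matrices(Phis, n):
    """Return [Psi_0, Psi_1, ..., Psi_n] from the recursion (10)."""
    k, p = Phis[0].shape[0], len(Phis)
    Psi = [np.eye(k)]                       # Psi_0 = I_k
    for s in range(1, n + 1):
        Psi.append(sum(Phis[j - 1] @ Psi[s - j] for j in range(1, p + 1) if s - j >= 0))
    return Psi

# Quick check: for a VAR(1), Psi_s = Phi_1^s (arbitrary illustrative Phi_1).
Phi1 = np.array([[0.5, 0.1],
                 [0.2, 0.3]])
Psi = ma_matrices([Phi1], 3)
print(np.allclose(Psi[3], np.linalg.matrix_power(Phi1, 3)))   # True
```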
Note that the innovation in the MA($\infty$) representation in (9) is $\varepsilon_t$, the fundamental innovation for $y_t$. There are alternative moving average representations based on vector white noise processes other than $\varepsilon_t$. Let $H$ denote a nonsingular $(k \times k)$ matrix, and define
$$u_t = H\varepsilon_t.$$
Then certainly $u_t$ is white noise:
$$E(u_t) = 0 \quad \text{and} \quad E(u_tu_\tau') = \begin{cases} H\Omega H' & \text{for } t = \tau\\ 0 & \text{for } t \neq \tau.\end{cases}$$
Moreover, from (9) we could write
$$y_t = \mu + H^{-1}H\varepsilon_t + \Psi_1H^{-1}H\varepsilon_{t-1} + \Psi_2H^{-1}H\varepsilon_{t-2} + \Psi_3H^{-1}H\varepsilon_{t-3} + \dots = \mu + J_0u_t + J_1u_{t-1} + J_2u_{t-2} + J_3u_{t-3} + \dots,$$
where
$$J_s = \Psi_sH^{-1}. \qquad (11)$$
For example, $H$ could be any matrix that diagonalizes $\Omega$,
$$H\Omega H' = D,$$
with $D$ a diagonal matrix. For such a choice of $H$, the elements of $u_t$ are uncorrelated with one another:
$$E(u_tu_t') = H\Omega H' = D.$$
Thus, it is always possible to write a stationary VAR($p$) process as an infinite moving average of a white noise vector $u_t$ whose elements are mutually uncorrelated.
2.1.3 Computation of Autocovariances of a Stationary VAR(p) Process
We now consider how to express the second moments of $y_t$ following a VAR($p$). Recall that, as in the univariate AR($p$) process, the Yule-Walker equations are obtained by postmultiplying (3) by $(y_{t-j} - \mu)'$ and taking expectations. For $j = 0$, using $\Gamma_{-j} = \Gamma_j'$,
$$\Gamma_0 = E(y_t - \mu)(y_t - \mu)' = \Phi_1E(y_{t-1} - \mu)(y_t - \mu)' + \Phi_2E(y_{t-2} - \mu)(y_t - \mu)' + \dots + \Phi_pE(y_{t-p} - \mu)(y_t - \mu)' + E\varepsilon_t(y_t - \mu)'$$
$$= \Phi_1\Gamma_{-1} + \Phi_2\Gamma_{-2} + \dots + \Phi_p\Gamma_{-p} + \Omega = \Phi_1\Gamma_1' + \Phi_2\Gamma_2' + \dots + \Phi_p\Gamma_p' + \Omega,$$
and for $j > 0$,
$$\Gamma_j = \Phi_1\Gamma_{j-1} + \Phi_2\Gamma_{j-2} + \dots + \Phi_p\Gamma_{j-p}. \qquad (12)$$
These equations may be used to compute the $\Gamma_j$ recursively for $j \geq p$ if $\Phi_1,\dots,\Phi_p$ and $\Gamma_{p-1},\dots,\Gamma_0$ are known.
Let $\xi_t$ be as defined in (4) and let $\Sigma$ denote the variance of $\xi_t$:
$$\Sigma = E(\xi_t\xi_t') = E\left\{\begin{bmatrix} y_t - \mu\\ y_{t-1} - \mu\\ \vdots\\ y_{t-p+1} - \mu\end{bmatrix}\begin{bmatrix}(y_t - \mu)' & (y_{t-1} - \mu)' & \cdots & (y_{t-p+1} - \mu)'\end{bmatrix}\right\}
= \begin{bmatrix}
\Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1}\\
\Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2}\\
\vdots & \vdots & \ddots & \vdots\\
\Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0
\end{bmatrix}.$$
Postmultiplying (6) by its own transpose and taking expectations gives
$$E[\xi_t\xi_t'] = E[(F\xi_{t-1} + v_t)(F\xi_{t-1} + v_t)'] = FE(\xi_{t-1}\xi_{t-1}')F' + E(v_tv_t'),$$
or
$$\Sigma = F\Sigma F' + Q. \qquad (13)$$
From the result on the vec operator on p. 13 of Ch. 1, we have
$$\mathrm{vec}(\Sigma) = (F \otimes F)\,\mathrm{vec}(\Sigma) + \mathrm{vec}(Q) = A\,\mathrm{vec}(\Sigma) + \mathrm{vec}(Q), \qquad (14)$$
where
$$A \equiv (F \otimes F).$$
Let $r = kp$, so that $F$ is an $(r \times r)$ matrix and $A$ is an $(r^2 \times r^2)$ matrix. Equation (14) has the solution
$$\mathrm{vec}(\Sigma) = [I_{r^2} - A]^{-1}\mathrm{vec}(Q), \qquad (15)$$
provided that the matrix $[I_{r^2} - A]$ is nonsingular. Thus, the $\Gamma_j$, $j = -p+1,\dots,p-1$, are obtained from (15).
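A sketch of this computation (not part of the notes; numpy only, column-stacking vec convention): solve (15) for $\Sigma$, read $\Gamma_0,\dots,\Gamma_{p-1}$ off its first block row, and extend with the recursion (12). Applied to the example that follows, it should approximately reproduce the $\Gamma_0$ reported there.

```python
import numpy as np

def var_autocovariances(Phis, Omega, n):
    """Gamma_0, ..., Gamma_n of a stationary VAR(p) with lag matrices Phis and innovation covariance Omega."""
    k, p = Phis[0].shape[0], len(Phis)
    F = np.zeros((k * p, k * p))                      # companion matrix of (5)
    F[:k, :] = np.hstack(Phis)
    F[k:, :-k] = np.eye(k * (p - 1))
    Q = np.zeros((k * p, k * p))
    Q[:k, :k] = Omega
    r = k * p
    vec_Sigma = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.flatten(order="F"))
    Sigma = vec_Sigma.reshape((r, r), order="F")      # equation (15)
    Gammas = [Sigma[:k, j * k:(j + 1) * k] for j in range(p)]
    for j in range(p, n + 1):                         # recursion (12) for j >= p
        Gammas.append(sum(Phis[i] @ Gammas[j - i - 1] for i in range(p)))
    return Gammas[:n + 1]

Phi1 = np.array([[0.5, 0.0, 0.0], [0.1, 0.1, 0.3], [0.0, 0.2, 0.3]])
Omega = np.array([[2.25, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.5, 0.74]])
print(var_autocovariances([Phi1], Omega, 2)[0].round(3))
```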
Example:
Consider the three-dimensional VAR(1) process
$$y_t = c + \begin{bmatrix} 0.5 & 0 & 0\\ 0.1 & 0.1 & 0.3\\ 0 & 0.2 & 0.3\end{bmatrix}y_{t-1} + \varepsilon_t,$$
with $E(\varepsilon_t\varepsilon_t') = \Omega = \begin{bmatrix} 2.25 & 0 & 0\\ 0 & 1.0 & 0.5\\ 0 & 0.5 & 0.74\end{bmatrix}$.
For this process the reverse characteristic polynomial is
$$\det\left(\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix} - \begin{bmatrix} 0.5 & 0 & 0\\ 0.1 & 0.1 & 0.3\\ 0 & 0.2 & 0.3\end{bmatrix}z\right) = \det\begin{bmatrix} 1-0.5z & 0 & 0\\ -0.1z & 1-0.1z & -0.3z\\ 0 & -0.2z & 1-0.3z\end{bmatrix} = (1-0.5z)(1-0.4z-0.03z^2).$$
The roots of this polynomial are easily seen to be
$$z_1 = 2, \quad z_2 = 2.1525, \quad z_3 = -15.4858.$$
They are all greater than 1 in absolute value. Therefore the process is stationary.
We next obtain the MA($\infty$) coefficients of this VAR(1) process using the relation described in equation (10):
$$\Psi_0 = I_3, \quad \Psi_1 = \Phi_1\Psi_0 = \begin{bmatrix} 0.5 & 0 & 0\\ 0.1 & 0.1 & 0.3\\ 0 & 0.2 & 0.3\end{bmatrix}, \quad \Psi_2 = \Phi_1\Psi_1 = \begin{bmatrix} 0.25 & 0 & 0\\ 0.06 & 0.07 & 0.12\\ 0.02 & 0.08 & 0.15\end{bmatrix},$$
$$\Psi_3 = \Phi_1\Psi_2 = \begin{bmatrix} 0.125 & 0 & 0\\ 0.037 & 0.031 & 0.057\\ 0.018 & 0.038 & 0.069\end{bmatrix}, \quad \Psi_4 = \dots$$
We finally calculate the autocovariances of this process. From (15) we have
$$\mathrm{vec}(\Sigma) = \mathrm{vec}(\Gamma_0) = [I_9 - \Phi_1 \otimes \Phi_1]^{-1}\mathrm{vec}(\Omega)$$
$$= \begin{bmatrix}
0.75 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
-0.05 & 0.95 & -0.15 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & -0.1 & 0.85 & 0 & 0 & 0 & 0 & 0 & 0\\
-0.05 & 0 & 0 & 0.95 & 0 & 0 & -0.15 & 0 & 0\\
-0.01 & -0.01 & -0.03 & -0.01 & 0.99 & -0.03 & -0.03 & -0.03 & -0.09\\
0 & -0.02 & -0.03 & 0 & -0.02 & 0.97 & 0 & -0.06 & -0.09\\
0 & 0 & 0 & -0.1 & 0 & 0 & 0.85 & 0 & 0\\
0 & 0 & 0 & -0.02 & -0.02 & -0.06 & -0.03 & 0.97 & -0.09\\
0 & 0 & 0 & 0 & -0.04 & -0.06 & 0 & -0.06 & 0.91
\end{bmatrix}^{-1}
\begin{bmatrix} 2.25\\ 0\\ 0\\ 0\\ 1.0\\ 0.5\\ 0\\ 0.5\\ 0.74\end{bmatrix}
= \begin{bmatrix} 3.000\\ 0.161\\ 0.019\\ 0.161\\ 1.172\\ 0.674\\ 0.019\\ 0.674\\ 0.954\end{bmatrix}.$$
It follows that
$$\Gamma_0 = \begin{bmatrix} 3.000 & 0.161 & 0.019\\ 0.161 & 1.172 & 0.674\\ 0.019 & 0.674 & 0.954\end{bmatrix}, \quad
\Gamma_1 = \Phi_1\Gamma_0 = \begin{bmatrix} 1.500 & 0.080 & 0.009\\ 0.322 & 0.335 & 0.355\\ 0.038 & 0.437 & 0.421\end{bmatrix},$$
$$\Gamma_2 = \Phi_1\Gamma_1 = \begin{bmatrix} 0.75 & 0.040 & 0.005\\ 0.194 & 0.173 & 0.163\\ 0.076 & 0.198 & 0.197\end{bmatrix}, \quad \Gamma_3 = \dots$$
Exercise:
Consider the two-dimensional VAR(2) process
$$y_t = c + \begin{bmatrix} 0.5 & 0.1\\ 0.4 & 0.5\end{bmatrix}y_{t-1} + \begin{bmatrix} 0 & 0\\ 0.25 & 0\end{bmatrix}y_{t-2} + \varepsilon_t,$$
with $E(\varepsilon_t\varepsilon_t') = \Omega = \begin{bmatrix} 0.09 & 0\\ 0 & 0.04\end{bmatrix}$. Please
(a). check whether this process is stationary or not;
(b). find the coefficient matrices of its MA($\infty$) representation, $\Psi_j$, $j = 0,1,2,3$;
(c). find its autocovariance matrices $\Gamma_j$, $j = 0,1,2,3$.
2.1.4 Linear Forecast
From (9) we see that $y_{t-j}$ is a linear function of $\varepsilon_{t-j}, \varepsilon_{t-j-1},\dots$, each of which is uncorrelated with $\varepsilon_{t+1}$ for $j = 0,1,\dots$ It follows that $\varepsilon_{t+1}$ is uncorrelated with $y_{t-j}$ for any $j \geq 0$. Thus, the linear forecast of $y_{t+1}$ on the basis of $y_t, y_{t-1},\dots$ is given by
$$\hat{y}_{t+1|t} = \mu + \Phi_1(y_t - \mu) + \Phi_2(y_{t-1} - \mu) + \dots + \Phi_p(y_{t-p+1} - \mu),$$
and $\varepsilon_{t+1}$ can be interpreted as the fundamental innovation for $y_{t+1}$, that is, the error in forecasting $y_{t+1}$ on the basis of a linear function of a constant and $y_t, y_{t-1},\dots$ More generally, it follows from (7) that a forecast of $y_{t+s}$ on the basis of $y_t, y_{t-1},\dots$ will take the form
$$\hat{y}_{t+s|t} = \mu + F_{11}^{(s)}(y_t - \mu) + F_{12}^{(s)}(y_{t-1} - \mu) + \dots + F_{1p}^{(s)}(y_{t-p+1} - \mu).$$
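A small sketch of this forecast (not from the notes; numpy only, hypothetical inputs): rather than forming $F^s$ explicitly, one can iterate the one-step forecast, which gives the same $\hat{y}_{t+s|t}$.

```python
import numpy as np

def forecast(Phis, c, history, s):
    """history holds the p most recent observations [y_t, y_{t-1}, ..., y_{t-p+1}]."""
    p = len(Phis)
    path = list(history)                              # path[0] is the most recent value
    for _ in range(s):
        path.insert(0, c + sum(Phis[j] @ path[j] for j in range(p)))
    return path[0]                                    # y-hat_{t+s|t}

# Illustration with the three-variable VAR(1) example and c = 0 (so mu = 0).
Phi1 = np.array([[0.5, 0.0, 0.0], [0.1, 0.1, 0.3], [0.0, 0.2, 0.3]])
y_t = np.array([1.0, -0.5, 0.25])
print(forecast([Phi1], np.zeros(3), [y_t], s=2))      # equals Phi1 @ Phi1 @ y_t here
```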
2.2 Estimation: MLE for an Unrestricted VAR
Let the $(k \times 1)$ stochastic vector $y_t$ follow a Gaussian VAR($p$) process, i.e.
$$y_t = c + \Phi_1y_{t-1} + \Phi_2y_{t-2} + \dots + \Phi_py_{t-p} + \varepsilon_t, \qquad (16)$$
where $\varepsilon_t \sim \text{i.i.d. } N(0,\Omega)$. In this case, $\theta = [c, \Phi_1,\dots,\Phi_p, \Omega]$.
Suppose that we have a sample of size $(T + p)$. As in the scalar AR process, the simplest approach is to condition on the first $p$ observations (denoted $y_{-p+1}, y_{-p+2},\dots,y_0$) and to base estimation on the last $T$ observations (denoted $y_1, y_2,\dots,y_T$). The objective is then to form the conditional likelihood
$$f_{Y_T,Y_{T-1},\dots,Y_1|Y_0,Y_{-1},\dots,Y_{-p+1}}(y_T,y_{T-1},\dots,y_1\,|\,y_0,y_{-1},\dots,y_{-p+1};\theta) \qquad (17)$$
and maximize it with respect to $\theta$. VARs are invariably estimated on the basis of the conditional likelihood function (17) rather than the full-sample unconditional likelihood. For brevity, we will hereafter refer to (17) simply as the "likelihood function" and to the value of $\theta$ that maximizes (17) as the "maximum likelihood estimator".
2.2.1 The Conditional Likelihood Function for a Vector Autoregression
The likelihood function is calculated in the same way as for a scalar autoregression. Conditional on the values of $y$ observed through date $t-1$, the value of $y$ for date $t$ is equal to a constant
$$c + \Phi_1y_{t-1} + \Phi_2y_{t-2} + \dots + \Phi_py_{t-p} \qquad (18)$$
plus a $N(0,\Omega)$ variable. Thus, for $t \geq 1$,
$$y_t\,|\,y_{t-1},y_{t-2},\dots,y_0,y_{-1},\dots,y_{-p+1} \sim N(c + \Phi_1y_{t-1} + \Phi_2y_{t-2} + \dots + \Phi_py_{t-p},\ \Omega). \qquad (19)$$
It will be convenient to use a more compact expression for the conditional mean (18). Let $x_t$ denote a $((kp+1) \times 1)$ vector containing a constant term and $p$ lags of each of the elements of $y_t$:
$$x_t \equiv \begin{bmatrix} 1\\ y_{t-1}\\ y_{t-2}\\ \vdots\\ y_{t-p}\end{bmatrix}.$$
Let $\mathcal{X}_t$ denote the $(k \times (kp+1)k)$ matrix
$$\mathcal{X}_t = \begin{bmatrix}
x_t' & 0 & \cdots & 0\\
0 & x_t' & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & x_t'
\end{bmatrix},$$
and let the $((kp+1)k \times 1)$ vector $\pi = \mathrm{vec}(\Pi')$, where $\Pi = [c\ \ \Phi_1\ \ \Phi_2\ \cdots\ \Phi_p]$. It is easy to see that
$$\mathcal{X}_t\pi = c + \Phi_1y_{t-1} + \Phi_2y_{t-2} + \dots + \Phi_py_{t-p}. \qquad (20)$$
To see this, take for example $k = 2$ and $p = 1$; then we have
$$c = \begin{bmatrix} c_1\\ c_2\end{bmatrix}, \quad \Phi_1 = \begin{bmatrix}\phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22}\end{bmatrix}, \quad y_{t-1} = \begin{bmatrix} y_{1,t-1}\\ y_{2,t-1}\end{bmatrix}.$$
In this case, $x_t' = [1\ \ y_{1,t-1}\ \ y_{2,t-1}]$ and $\Pi = \begin{bmatrix} c_1 & \phi_{11} & \phi_{12}\\ c_2 & \phi_{21} & \phi_{22}\end{bmatrix}$.
Therefore
$$\mathcal{X}_t = \begin{bmatrix} 1 & y_{1,t-1} & y_{2,t-1} & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & y_{1,t-1} & y_{2,t-1}\end{bmatrix}$$
and
$$\pi = \mathrm{vec}(\Pi') = \mathrm{vec}\begin{bmatrix} c_1 & c_2\\ \phi_{11} & \phi_{21}\\ \phi_{12} & \phi_{22}\end{bmatrix} = \begin{bmatrix} c_1\\ \phi_{11}\\ \phi_{12}\\ c_2\\ \phi_{21}\\ \phi_{22}\end{bmatrix}.$$
It is easy to see that
$$\mathcal{X}_t\pi = \begin{bmatrix} 1 & y_{1,t-1} & y_{2,t-1} & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & y_{1,t-1} & y_{2,t-1}\end{bmatrix}\begin{bmatrix} c_1\\ \phi_{11}\\ \phi_{12}\\ c_2\\ \phi_{21}\\ \phi_{22}\end{bmatrix}
= \begin{bmatrix} c_1 + \phi_{11}y_{1,t-1} + \phi_{12}y_{2,t-1}\\ c_2 + \phi_{21}y_{1,t-1} + \phi_{22}y_{2,t-1}\end{bmatrix}
= \begin{bmatrix} c_1\\ c_2\end{bmatrix} + \begin{bmatrix}\phi_{11} & \phi_{12}\\ \phi_{21} & \phi_{22}\end{bmatrix}\begin{bmatrix} y_{1,t-1}\\ y_{2,t-1}\end{bmatrix} = c + \Phi_1y_{t-1}.$$
Using this notation, (19) can be written more compactly as
$$y_t\,|\,y_{t-1},y_{t-2},\dots,y_0,y_{-1},\dots,y_{-p+1} \sim N(\mathcal{X}_t\pi,\ \Omega). \qquad (21)$$
Thus, the conditional density of the $t$th observation is
$$f_{Y_t|Y_{t-1},Y_{t-2},\dots,Y_{-p+1}}(y_t\,|\,y_{t-1},y_{t-2},\dots,y_{-p+1};\theta) = (2\pi)^{-k/2}\,|\Omega^{-1}|^{1/2}\exp[(-1/2)(y_t - \mathcal{X}_t\pi)'\Omega^{-1}(y_t - \mathcal{X}_t\pi)].$$
The log likelihood of the full sample $y_T, y_{T-1},\dots,y_1$ conditioned on $y_0, y_{-1},\dots,y_{-p+1}$ is therefore
$$\mathcal{L}(\theta) = (-Tk/2)\ln(2\pi) + (T/2)\ln|\Omega^{-1}| - (1/2)\sum_{t=1}^{T}\left[(y_t - \mathcal{X}_t\pi)'\Omega^{-1}(y_t - \mathcal{X}_t\pi)\right]. \qquad (22)$$
2.2.2 MLE of $\pi$
The MLE of $\pi$ is the value $\hat{\pi}$ that maximizes (22). At first glance it is not a trivial task to find $\hat{\pi}$. On closer inspection, however, $\mathcal{X}_t$ has the same special structure as the matrix $X_t$ in Section 5.1 of Ch. 10, i.e. $x_{1t} = x_{2t} = \dots = x_{Mt}$: every equation has the same regressors. Therefore, by the results for the SURE model, $\hat{\pi}$ is simply obtained from OLS regressions of $y_{it}$ on $x_t$, equation by equation.
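A sketch of this estimator (not from the notes; numpy only): stack the regressors $x_t'$ in a matrix and run OLS with one left-hand-side column per equation. The residuals also deliver the ML estimate of $\Omega$ discussed in the next subsection.

```python
import numpy as np

def fit_var(y, p):
    """y: (T+p) x k array of observations. Returns Pi_hat = [c, Phi_1, ..., Phi_p] and Omega_hat."""
    T_full, k = y.shape
    T = T_full - p
    X = np.ones((T, k * p + 1))                        # row t is x_t' = (1, y_{t-1}', ..., y_{t-p}')
    for j in range(1, p + 1):
        X[:, 1 + (j - 1) * k: 1 + j * k] = y[p - j: T_full - j, :]
    Y = y[p:, :]                                       # y_1, ..., y_T stacked in rows
    Pi_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)     # equation-by-equation OLS
    resid = Y - X @ Pi_hat
    Omega_hat = resid.T @ resid / T                    # ML estimate of Omega (next subsection)
    return Pi_hat.T, Omega_hat                         # row i of Pi_hat.T holds equation i's coefficients
```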
2.2.3 MLE of $\Omega$
When evaluated at the MLE $\hat{\pi}$, the log likelihood (22) is
$$\mathcal{L}(\Omega,\hat{\pi}) = (-Tk/2)\ln(2\pi) + (T/2)\ln|\Omega^{-1}| - (1/2)\sum_{t=1}^{T}\hat{\varepsilon}_t'\Omega^{-1}\hat{\varepsilon}_t, \qquad (23)$$
where $\hat{\varepsilon}_t = y_t - \mathcal{X}_t\hat{\pi}$.
Differentiating (23) with respect to $\Omega^{-1}$ (see p. 23 of Ch. 1):
$$\frac{\partial\mathcal{L}(\Omega,\hat{\pi})}{\partial\Omega^{-1}} = (T/2)\frac{\partial\ln|\Omega^{-1}|}{\partial\Omega^{-1}} - (1/2)\sum_{t=1}^{T}\frac{\partial\hat{\varepsilon}_t'\Omega^{-1}\hat{\varepsilon}_t}{\partial\Omega^{-1}} = (T/2)\Omega' - (1/2)\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.$$
The likelihood is maximized when this derivative is set to zero, or when
$$\hat{\Omega}' = (1/T)\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.$$
Since $\Omega$ is symmetric, we also have
$$\hat{\Omega} = (1/T)\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'.$$
The row $i$, column $j$ element of $\hat{\Omega}$ is
$$\hat{\sigma}_{ij} = (1/T)\sum_{t=1}^{T}\hat{\varepsilon}_{it}\hat{\varepsilon}_{jt},$$
which is the average product of the OLS residual for variable $i$ and the OLS residual for variable $j$.
2.2.4 Likelihood Ratio Tests
To perform a likelihood ratio test, we need to calculate the maximum value achieved for (22). Thus consider
$$\mathcal{L}(\hat{\Omega},\hat{\pi}) = (-Tk/2)\ln(2\pi) + (T/2)\ln|\hat{\Omega}^{-1}| - (1/2)\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t. \qquad (24)$$
The last term in (24) is
$$(1/2)\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t = (1/2)\,\mathrm{trace}\left[\sum_{t=1}^{T}\hat{\varepsilon}_t'\hat{\Omega}^{-1}\hat{\varepsilon}_t\right] = (1/2)\,\mathrm{trace}\left[\sum_{t=1}^{T}\hat{\Omega}^{-1}\hat{\varepsilon}_t\hat{\varepsilon}_t'\right] = (1/2)\,\mathrm{trace}\left[\hat{\Omega}^{-1}(T\hat{\Omega})\right] = (1/2)\,\mathrm{trace}(T\,I_k) = Tk/2.$$
Substituting this into (24) produces
$$\mathcal{L}(\hat{\Omega},\hat{\pi}) = (-Tk/2)\ln(2\pi) + (T/2)\ln|\hat{\Omega}^{-1}| - (Tk/2).$$
Suppose we want to test the null hypothesis that the data were generated by a Gaussian VAR with $p_0$ lags against the alternative specification of $p_1 > p_0$ lags. Then we may estimate the model by MLE under $H_0$ with $p_0$ lags and under $H_1$ with $p_1$ lags and obtain the maximized log likelihood values
$$\mathcal{L}_0 = (-Tk/2)\ln(2\pi) + (T/2)\ln|\hat{\Omega}_0^{-1}| - (Tk/2)$$
and
$$\mathcal{L}_1 = (-Tk/2)\ln(2\pi) + (T/2)\ln|\hat{\Omega}_1^{-1}| - (Tk/2),$$
respectively.
Twice the log likelihood ratio is then (see p. 4 of Ch. 1)
$$2(\mathcal{L}_1 - \mathcal{L}_0) = T\ln|\hat{\Omega}_1^{-1}| - T\ln|\hat{\Omega}_0^{-1}| = T\ln\frac{1}{|\hat{\Omega}_1|} - T\ln\frac{1}{|\hat{\Omega}_0|} = -T\ln|\hat{\Omega}_1| + T\ln|\hat{\Omega}_0| = T\{\ln|\hat{\Omega}_0| - \ln|\hat{\Omega}_1|\}.$$
Under the null hypothesis, this statistic asymptotically has a $\chi^2$ distribution with degrees of freedom equal to the number of restrictions imposed under $H_0$.
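A sketch of this lag-length test (not from the notes; it reuses the hypothetical fit_var helper from the estimation sketch above and scipy for the $\chi^2$ tail probability):

```python
import numpy as np
from scipy import stats

def lr_lag_test(y, p0, p1):
    """LR test of H0: p0 lags against H1: p1 > p0 lags, conditioning on p1 presample values."""
    k = y.shape[1]
    T = y.shape[0] - p1                         # same T observations under both models
    _, Omega0 = fit_var(y[p1 - p0:, :], p0)     # restricted model
    _, Omega1 = fit_var(y, p1)                  # unrestricted model
    stat = T * (np.log(np.linalg.det(Omega0)) - np.log(np.linalg.det(Omega1)))
    df = k * k * (p1 - p0)                      # each extra lag matrix adds k^2 restrictions under H0
    return stat, stats.chi2.sf(stat, df)        # statistic and asymptotic p-value
```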
2.3 Bivariate Granger Causality Tests
2.3.1 Definitions of Causality
Granger (1969) has defined a concept of causality which, under suitable conditions, is fairly easy to deal with in the context of VAR models. It has therefore become quite popular in recent years. The idea is that a cause cannot come after the effect. Thus, if a variable $y$ affects a variable $x$, the former should help improve the predictions of the latter variable.
To formalize this idea, we say that $y$ fails to Granger-cause $x$ if for all $s > 0$ the mean squared error of a forecast of $x_{t+s}$ based on $(x_t, x_{t-1},\dots)$ is the same as the MSE of a forecast of $x_{t+s}$ that uses both $(x_t, x_{t-1},\dots)$ and $(y_t, y_{t-1},\dots)$. If we restrict ourselves to linear functions, $y$ fails to Granger-cause $x$ if
$$MSE[\hat{E}(x_{t+s}\,|\,x_t,x_{t-1},\dots)] = MSE[\hat{E}(x_{t+s}\,|\,x_t,x_{t-1},\dots,y_t,y_{t-1},\dots)].$$
2.3.2 Alternative Implications of Granger Causality, VAR
In a bivariate VAR describing $x$ and $y$, $y$ does not Granger-cause $x$ if the coefficient matrices $\Phi_j$ are lower triangular for all $j$:
$$\begin{bmatrix} x_t\\ y_t\end{bmatrix} = \begin{bmatrix} c_1\\ c_2\end{bmatrix} + \begin{bmatrix}\phi_{11}^{(1)} & 0\\ \phi_{21}^{(1)} & \phi_{22}^{(1)}\end{bmatrix}\begin{bmatrix} x_{t-1}\\ y_{t-1}\end{bmatrix} + \begin{bmatrix}\phi_{11}^{(2)} & 0\\ \phi_{21}^{(2)} & \phi_{22}^{(2)}\end{bmatrix}\begin{bmatrix} x_{t-2}\\ y_{t-2}\end{bmatrix} + \dots + \begin{bmatrix}\phi_{11}^{(p)} & 0\\ \phi_{21}^{(p)} & \phi_{22}^{(p)}\end{bmatrix}\begin{bmatrix} x_{t-p}\\ y_{t-p}\end{bmatrix} + \begin{bmatrix}\varepsilon_{xt}\\ \varepsilon_{yt}\end{bmatrix}.$$
From the first row of this system, the optimal one-period-ahead forecast of $x$ depends only on its own lagged values and not on lagged $y$:
$$\hat{E}(x_{t+1}\,|\,x_t,x_{t-1},\dots,y_t,y_{t-1},\dots) = c_1 + \phi_{11}^{(1)}x_t + \phi_{11}^{(2)}x_{t-1} + \dots + \phi_{11}^{(p)}x_{t-p+1}.$$
By induction, the same is true of an $s$-period-ahead forecast. Thus for the bivariate VAR, $y$ does not Granger-cause $x$ if $\Phi_j$ is lower triangular for all $j$.
2.3.3 Alternative Implications of Granger Causality, VMA
Recall from (10) that
$$\Psi_s = \Phi_1\Psi_{s-1} + \Phi_2\Psi_{s-2} + \dots + \Phi_p\Psi_{s-p} \quad \text{for } s = 1,2,\dots$$
with $\Psi_0 = I_k$ and $\Psi_s = 0$ for $s < 0$. This expression implies that if $\Phi_j$ is lower triangular for all $j$, then the moving average matrices $\Psi_s$ of the fundamental representation will be lower triangular for all $s$. Thus if $y$ fails to Granger-cause $x$, then the VMA($\infty$) representation can be written as
$$\begin{bmatrix} x_t\\ y_t\end{bmatrix} = \begin{bmatrix}\mu_1\\ \mu_2\end{bmatrix} + \begin{bmatrix}\psi_{11}(L) & 0\\ \psi_{21}(L) & \psi_{22}(L)\end{bmatrix}\begin{bmatrix}\varepsilon_{xt}\\ \varepsilon_{yt}\end{bmatrix},$$
where
$$\psi_{ij}(L) = \psi_{ij}^{(0)} + \psi_{ij}^{(1)}L + \psi_{ij}^{(2)}L^2 + \psi_{ij}^{(3)}L^3 + \dots$$
with $\psi_{11}^{(0)} = \psi_{22}^{(0)} = 1$ and $\psi_{21}^{(0)} = 0$.
2.3.4 Econometric Tests for Granger Causality
A simple approach to testing whether a particular series $y$ Granger-causes $x$ can be based on the VAR. To implement this test, we assume a particular autoregressive lag length $p$ and estimate
$$x_t = c_1 + \alpha_1x_{t-1} + \alpha_2x_{t-2} + \dots + \alpha_px_{t-p} + \beta_1y_{t-1} + \beta_2y_{t-2} + \dots + \beta_py_{t-p} + u_t \qquad (25)$$
by OLS. We then conduct an $F$ test of the null hypothesis
$$H_0:\ \beta_1 = \beta_2 = \dots = \beta_p = 0.$$
Recalling Section 4.2.1 of Chapter 6, one way to implement this test is to calculate the sum of squared residuals from (25),
$$RSS_u = \sum_{t=1}^{T}\hat{u}_t^2,$$
and compare this with the sum of squared residuals of a univariate autoregression for $x_t$,
$$RSS_r = \sum_{t=1}^{T}\hat{e}_t^2,$$
where
$$x_t = c_0 + \gamma_1x_{t-1} + \gamma_2x_{t-2} + \dots + \gamma_px_{t-p} + e_t \qquad (26)$$
is also estimated by OLS. If
$$S_1 \equiv \frac{(RSS_r - RSS_u)/p}{RSS_u/(T - 2p - 1)}$$
is greater than the 5% critical value of an $F(p,\,T-2p-1)$ distribution, then we reject the null hypothesis that $y$ does not Granger-cause $x$; that is, if $S_1$ is sufficiently large, we conclude that $y$ does Granger-cause $x$.
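A sketch of this test (not from the notes; numpy and scipy only, with hypothetical one-dimensional series x and y):

```python
import numpy as np
from scipy import stats

def granger_f_test(x, y, p):
    """F test of H0: y does not Granger-cause x, comparing regressions (25) and (26)."""
    T_full = len(x)
    T = T_full - p
    lags = lambda z: np.column_stack([z[p - j: T_full - j] for j in range(1, p + 1)])
    X_r = np.column_stack([np.ones(T), lags(x)])        # restricted: constant and own lags only
    X_u = np.column_stack([X_r, lags(y)])               # unrestricted: add p lags of y
    target = x[p:]
    rss = lambda X: np.sum((target - X @ np.linalg.lstsq(X, target, rcond=None)[0]) ** 2)
    RSS_r, RSS_u = rss(X_r), rss(X_u)
    S1 = ((RSS_r - RSS_u) / p) / (RSS_u / (T - 2 * p - 1))
    return S1, stats.f.sf(S1, p, T - 2 * p - 1)         # statistic and p-value
```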
Exercise:
Please specify a bivariate VAR model for Taiwan's GDP and Stock Index using the LR test, and from this model test the Granger causality between these two variables.
2.4 The Impulse-Response Function
In equation (9) a VAR was written in vector MA($\infty$) form as
$$y_t = \mu + \varepsilon_t + \Psi_1\varepsilon_{t-1} + \Psi_2\varepsilon_{t-2} + \Psi_3\varepsilon_{t-3} + \dots \qquad (27)$$
Thus, the matrix $\Psi_s$ has the interpretation
$$\frac{\partial y_{t+s}}{\partial\varepsilon_t'} = \Psi_s;$$
that is, the row $i$, column $j$ element of $\Psi_s$ identifies the consequence of a one-unit increase in the $j$th variable's innovation at date $t$ ($\varepsilon_{jt}$) for the value of the $i$th variable at time $t+s$ ($y_{i,t+s}$), holding all other innovations at all dates constant.
A plot of the row $i$, column $j$ element of $\Psi_s$,
$$\frac{\partial y_{i,t+s}}{\partial\varepsilon_{jt}}, \qquad (28)$$
as a function of $s$ is called the impulse response function. It describes the response of $y_{i,t+s}$ to a one-time impulse in $y_{jt}$ with all other variables dated $t$ or earlier held constant.
Suppose that we are told that the date $t$ value of the first variable in the autoregression, $y_{1t}$, was higher than expected, so that $\varepsilon_{1t}$ is positive. How does this cause us to revise our forecast of $y_{i,t+s}$? In other words, what is
$$\frac{\partial y_{i,t+s}}{\partial\varepsilon_{1t}}, \quad i = 1,2,\dots,k?$$
When the elements of $\varepsilon_t$ are contemporaneously correlated with one another, the fact that $\varepsilon_{1t}$ is positive also gives us useful new information about the values of $\varepsilon_{2t},\dots,\varepsilon_{kt}$, which in turn has further implications for the value of $y_{i,t+s}$. Thus, the impulse response function defined in (28) is best regarded as a special case that applies when $E(\varepsilon_t\varepsilon_t') = \Omega$ is a diagonal matrix.
Of course, in general $\Omega$ is not diagonal. However, we may proceed as in Section 2.1.2 of this chapter to find matrices $A$ and $D$ such that
$$\Omega = ADA', \qquad (29)$$
where $A$ is a lower triangular matrix with 1s along the principal diagonal and $D$ is a diagonal matrix with positive entries along the principal diagonal.
Using this matrix $A$ we can construct a $(k \times 1)$ vector $u_t$ from
$$u_t = A^{-1}\varepsilon_t; \qquad (30)$$
then we see that the elements of $u_t$ are uncorrelated with each other:
$$E(u_tu_t') = [A^{-1}]E(\varepsilon_t\varepsilon_t')[A^{-1}]' = [A^{-1}]\Omega[A']^{-1} = [A^{-1}]ADA'[A']^{-1} = D.$$
From (11) we have
$$\frac{\partial y_{t+s}}{\partial u_t'} = \Psi_sA,$$
or
$$\begin{bmatrix}
\frac{\partial y_{1,t+s}}{\partial u_1} & \frac{\partial y_{1,t+s}}{\partial u_2} & \cdots & \frac{\partial y_{1,t+s}}{\partial u_k}\\
\frac{\partial y_{2,t+s}}{\partial u_1} & \frac{\partial y_{2,t+s}}{\partial u_2} & \cdots & \frac{\partial y_{2,t+s}}{\partial u_k}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial y_{k,t+s}}{\partial u_1} & \frac{\partial y_{k,t+s}}{\partial u_2} & \cdots & \frac{\partial y_{k,t+s}}{\partial u_k}
\end{bmatrix} = \begin{bmatrix}\Psi_sa_1 & \Psi_sa_2 & \cdots & \Psi_sa_k\end{bmatrix},$$
where $a_j$ is the $j$th column of $A$.
A plot (how many figures?) of
$$\frac{\partial y_{t+s}}{\partial u_{jt}} = \Psi_sa_j \qquad (31)$$
as a function of $s$ is known as an orthogonalized impulse response function.
For a given observed sample of size $T$, we would estimate the autoregressive coefficients $\hat{\Phi}_1, \hat{\Phi}_2,\dots,\hat{\Phi}_p$ by CSS (or conditional MLE; that is, OLS equation by equation) and construct $\hat{\Psi}_s$ from (10). OLS estimation also provides the estimate $\hat{\Omega} = (1/T)\sum_{t=1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_t'$. Matrices $\hat{A}$ and $\hat{D}$ satisfying $\hat{\Omega} = \hat{A}\hat{D}\hat{A}'$ could then be constructed from $\hat{\Omega}$. The sample estimate of (31) is then
$$\hat{\Psi}_s\hat{a}_j.$$
Another popular form is also implemented and reported. Recall that $D$ is a diagonal matrix whose $(j,j)$ element is the variance of $u_{jt}$. Let $D^{1/2}$ denote the diagonal matrix whose $(j,j)$ element is the standard deviation of $u_{jt}$. Note that (29) could be written as
$$\Omega = AD^{1/2}D^{1/2}A' = PP', \qquad (32)$$
where
$$P = AD^{1/2}.$$
Expression (32) is the Cholesky decomposition of the matrix $\Omega$. Note that, like $A$, the $(k \times k)$ matrix $P$ is lower triangular, and it has the standard deviations of $u_t$ along its principal diagonal.
In place of $u_t$ defined in (30), some researchers use
$$v_t = P^{-1}\varepsilon_t = D^{-1/2}A^{-1}\varepsilon_t = D^{-1/2}u_t.$$
Thus, $v_{jt}$ is just $u_{jt}$ divided by its standard deviation $\sqrt{d_{jj}}$. A one-unit increase in $v_{jt}$ is the same as a one-standard-deviation increase in $u_{jt}$.
In place of the dynamic multiplier $\partial y_{i,t+s}/\partial u_{jt}$, these researchers then report $\partial y_{i,t+s}/\partial v_{jt}$. Denoting the $j$th column of $P$ by $p_j$, we have
$$\frac{\partial y_{t+s}}{\partial v_{jt}} = \Psi_sp_j \qquad (33)$$
from the result in (31). We also note that
$$p_j = Ad_j^{1/2} = a_j\sqrt{d_{jj}},$$
where $d_j^{1/2}$ is the $j$th column of $D^{1/2}$.
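A sketch of the orthogonalized impulse responses in this Cholesky form (not from the notes; numpy only): column $j$ of $\Psi_sP$ is the response of $y_{t+s}$ to a one-standard-deviation shock in $u_{jt}$.

```python
import numpy as np

def orthogonalized_irf(Phis, Omega, horizon):
    """Return irf with irf[s] = Psi_s @ P, s = 0,...,horizon, where P is the Cholesky factor of Omega."""
    k, p = Phis[0].shape[0], len(Phis)
    P = np.linalg.cholesky(Omega)                     # lower triangular, P P' = Omega
    Psi = [np.eye(k)]
    for s in range(1, horizon + 1):                   # recursion (10)
        Psi.append(sum(Phis[j - 1] @ Psi[s - j] for j in range(1, p + 1) if s - j >= 0))
    return np.array([Ps @ P for Ps in Psi])           # irf[s][i, j] = d y_{i,t+s} / d v_{jt}

# Using the three-variable example of Section 2.1.3: k*k = 9 response paths to plot.
Phi1 = np.array([[0.5, 0.0, 0.0], [0.1, 0.1, 0.3], [0.0, 0.2, 0.3]])
Omega = np.array([[2.25, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.5, 0.74]])
irf = orthogonalized_irf([Phi1], Omega, horizon=8)
print(irf[0].round(3))                                # impact responses equal P itself
```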
2.5 Forecast Error Variance Decomposition
The forecast error of a VAR $s$ periods into the future is
$$y_{t+s} - \hat{y}_{t+s|t} = \varepsilon_{t+s} + \Psi_1\varepsilon_{t+s-1} + \Psi_2\varepsilon_{t+s-2} + \dots + \Psi_{s-1}\varepsilon_{t+1}.$$
The mean squared error of this $s$-period-ahead forecast is thus
$$MSE(\hat{y}_{t+s|t}) = E[(y_{t+s} - \hat{y}_{t+s|t})(y_{t+s} - \hat{y}_{t+s|t})'] \qquad (34)$$
$$= \Omega + \Psi_1\Omega\Psi_1' + \Psi_2\Omega\Psi_2' + \dots + \Psi_{s-1}\Omega\Psi_{s-1}'. \qquad (35)$$
Let us now consider how each of the orthogonalized disturbances $(u_{1t},\dots,u_{kt})$ contributes to this MSE. Write (30) as
$$\varepsilon_t = Au_t = a_1u_{1t} + a_2u_{2t} + \dots + a_ku_{kt}.$$
Then
$$\Omega = E(\varepsilon_t\varepsilon_t') = a_1a_1'\mathrm{Var}(u_{1t}) + a_2a_2'\mathrm{Var}(u_{2t}) + \dots + a_ka_k'\mathrm{Var}(u_{kt}).$$
Substituting this result into (35), the MSE of the $s$-period-ahead forecast can be written as the sum of $k$ terms, one arising from each of the disturbances $u_{jt}$:
$$MSE(\hat{y}_{t+s|t}) = \sum_{j=1}^{k}\left\{\mathrm{Var}(u_{jt})\,[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \Psi_2a_ja_j'\Psi_2' + \dots + \Psi_{s-1}a_ja_j'\Psi_{s-1}']\right\}. \qquad (36)$$
With this expression, we can calculate the contribution of the $j$th orthogonalized innovation to the MSE of the $s$-period-ahead forecast:
$$\mathrm{Var}(u_{jt})\,[a_ja_j' + \Psi_1a_ja_j'\Psi_1' + \Psi_2a_ja_j'\Psi_2' + \dots + \Psi_{s-1}a_ja_j'\Psi_{s-1}']. \qquad (37)$$
The ratio of (37) to the MSE (36) is called the forecast error variance decomposition.
Alternatively, recalling that $p_j = Ad_j^{1/2} = a_j\sqrt{\mathrm{Var}(u_{jt})}$, we may also express the MSE as
$$MSE(\hat{y}_{t+s|t}) = \sum_{j=1}^{k}\left[p_jp_j' + \Psi_1p_jp_j'\Psi_1' + \Psi_2p_jp_j'\Psi_2' + \dots + \Psi_{s-1}p_jp_j'\Psi_{s-1}'\right].$$
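A sketch of the resulting variance decomposition (not from the notes; it reuses the orthogonalized_irf sketch above): the share of variable $i$'s $s$-step MSE due to shock $j$ is $\sum_{m=0}^{s-1}[(\Psi_mp_j)_i]^2$ divided by the $i$th diagonal element of (35).

```python
import numpy as np

def fevd(Phis, Omega, s):
    """Return a (k x k) matrix whose (i, j) entry is shock j's share of variable i's s-step MSE."""
    irf = orthogonalized_irf(Phis, Omega, horizon=s - 1)   # irf[m][i, j] = (Psi_m p_j)_i
    contrib = np.sum(irf ** 2, axis=0)                     # sum over horizons m = 0,...,s-1
    return contrib / contrib.sum(axis=1, keepdims=True)    # rows sum to one

Phi1 = np.array([[0.5, 0.0, 0.0], [0.1, 0.1, 0.3], [0.0, 0.2, 0.3]])
Omega = np.array([[2.25, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.5, 0.74]])
print(fevd([Phi1], Omega, s=4).round(3))
```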
Exercise:
Please plot the impulse response functions and the forecast error variance decomposition from a bivariate VAR(4) model with Taiwan's GDP and Stock Index data set (first differencing of the data may be necessary).