Ch. 22 Unit Root in Vector Time Series
1 Multivariate Wiener Processes and Multivariate FCLT
Section 2.1 of Chapter 21 described univariate standard Brownian motion $W(r)$ as a scalar continuous-time process ($W: r \in [0,1] \to \mathbb{R}^1$). The variable $W(r)$ has a $N(0, r)$ distribution across realizations, and for any given realization, $W(r)$ is a continuous function of the date $r$ with independent increments. If a set of $k$ such independent processes, denoted $W_1(r), W_2(r), \ldots, W_k(r)$, are collected in a $(k \times 1)$ vector $w(r)$, the result is $k$-dimensional standard Brownian motion.
Definition 1:
A $k$-dimensional standard Brownian motion $w(\cdot)$ is a continuous-time process associating each date $r \in [0,1]$ with the $(k \times 1)$ vector $w(r)$ satisfying the following:
(a). $w(0) = 0$;
(b). For any dates $0 \le r_1 < r_2 < \cdots < r_m \le 1$, the changes $[w(r_2) - w(r_1)], [w(r_3) - w(r_2)], \ldots, [w(r_m) - w(r_{m-1})]$ are independent multivariate Gaussian with $[w(s) - w(v)] \sim N(0, (s - v)I_k)$ for $s > v$;
(c). For any given realization, $w(r)$ is continuous in $r$ with probability 1.
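Properties (a) and (b) suggest a simple way to approximate $w(r)$ numerically. The following is a minimal sketch in Python with NumPy (an illustrative choice; the function name `simulate_brownian` and the grid size `n` are hypothetical): it starts at $w(0) = 0$ and cumulates independent $N(0, (1/n)I_k)$ increments over the grid $r_j = j/n$. Property (c) is only approximated, since the path is observed on a finite grid.

```python
import numpy as np

def simulate_brownian(k, n, rng):
    """Approximate k-dimensional standard Brownian motion on the grid r_j = j/n.

    Returns an (n + 1, k) array whose row j approximates w(j/n); row 0 is w(0) = 0.
    """
    # Property (b): increments over steps of length 1/n are independent N(0, (1/n) I_k).
    increments = rng.standard_normal((n, k)) * np.sqrt(1.0 / n)
    # Property (a): start at zero, then accumulate the increments.
    return np.vstack([np.zeros((1, k)), np.cumsum(increments, axis=0)])

rng = np.random.default_rng(0)
# Across realizations, w(1) should be approximately N(0, I_k).
endpoints = np.array([simulate_brownian(3, 200, rng)[-1] for _ in range(4000)])
print(np.round(np.cov(endpoints, rowvar=False), 2))
```

The printed sample covariance should be close to $I_3$; refining the grid does not change the distribution of $w(1)$ here because the increments are exactly Gaussian.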
Analogous to the univariate case, we can define a multivariate random walk as follows.
Definition:
Let the $(k \times 1)$ random vector $y_t$ follow $y_t = y_{t-1} + \varepsilon_t$, $t = 1, 2, \ldots$, where $y_0 = 0$ and $\varepsilon_t$ is a sequence of i.i.d. random vectors such that $E(\varepsilon_t) = 0$ and $E(\varepsilon_t\varepsilon_t') = \Omega$, a finite positive definite matrix. Then $y_t$ is a multivariate ($k$-dimensional) random walk.
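A multivariate random walk can be simulated directly from this definition. The sketch below assumes Gaussian innovations, which is stronger than the i.i.d. assumption in the definition, and draws $\varepsilon_t$ with covariance $\Omega$ as $P z_t$, where $\Omega = PP'$ is the Cholesky factorization and $z_t \sim N(0, I_k)$; `simulate_random_walk` is a hypothetical helper.

```python
import numpy as np

def simulate_random_walk(omega, T, rng):
    """Simulate y_t = y_{t-1} + eps_t, t = 1..T, with y_0 = 0 and E(eps_t eps_t') = omega.

    Gaussian innovations are an illustrative assumption; the definition only
    requires i.i.d. innovations with mean zero and covariance omega.
    """
    k = omega.shape[0]
    chol = np.linalg.cholesky(omega)             # omega = chol @ chol.T
    eps = rng.standard_normal((T, k)) @ chol.T   # row t is eps_t' with covariance omega
    y = np.cumsum(eps, axis=0)                   # y_t = eps_1 + ... + eps_t
    return y, eps

omega = np.array([[1.0, 0.5], [0.5, 2.0]])
y, eps = simulate_random_walk(omega, 500, np.random.default_rng(1))
# Differencing recovers the innovations: y_t - y_{t-1} = eps_t.
print(np.allclose(np.diff(y, axis=0), eps[1:]))   # True
```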
We form the rescaled partial sums as
$$w_T(r) \equiv \Omega^{-1/2}\, T^{-1/2} \sum_{t=1}^{[Tr]} \varepsilon_t.$$
The components of $w_T(r)$ are the individual partial sums
$$W_{Tj}(r) = T^{-1/2} \sum_{t=1}^{[Tr]} \tilde{\varepsilon}_{tj}, \qquad j = 1, 2, \ldots, k,$$
where $\tilde{\varepsilon}_{tj}$ is the $j$th element of $\Omega^{-1/2}\varepsilon_t$.
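The rescaled partial sum can be computed as follows. This sketch takes $\Omega^{-1/2}$ to be the symmetric inverse square root obtained from an eigendecomposition; that is an assumption, since any matrix $S$ with $S\Omega S' = I_k$ standardizes the innovations. The helper names `inv_sqrt_sym` and `rescaled_partial_sum` are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(omega):
    """Symmetric inverse square root S of a positive definite omega, so S @ omega @ S = I."""
    vals, vecs = np.linalg.eigh(omega)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

def rescaled_partial_sum(eps, omega, r):
    """w_T(r) = Omega^{-1/2} T^{-1/2} sum_{t=1}^{[Tr]} eps_t, for eps of shape (T, k)."""
    T = eps.shape[0]
    n = int(np.floor(T * r))        # [Tr], the integer part of T r
    partial = eps[:n].sum(axis=0)   # rows eps_1, ..., eps_[Tr]; an empty sum gives zeros
    return inv_sqrt_sym(omega) @ partial / np.sqrt(T)

rng = np.random.default_rng(1)
omega = np.array([[1.0, 0.5], [0.5, 2.0]])
eps = rng.standard_normal((500, 2)) @ np.linalg.cholesky(omega).T
# Component j equals W_Tj(r), the partial sum of the standardized innovations.
tilde = eps @ inv_sqrt_sym(omega).T
print(rescaled_partial_sum(eps, omega, 0.5), tilde[:250].sum(axis=0) / np.sqrt(500))
```

The two printed vectors agree, which is just the statement that the components of $w_T(r)$ are the $W_{Tj}(r)$.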
The Functional Central Limit Theorem (FCLT) provides conditions under which $w_T(r)$ converges to the multivariate standard Wiener process $w(r)$. The simplest multivariate FCLT is the multivariate Donsker's theorem.
Theorem 1 (Multivariate Donsker):
Let $\varepsilon_t$ be a sequence of i.i.d. random vectors such that $E(\varepsilon_t) = 0$ and $E(\varepsilon_t\varepsilon_t') = \Omega$, a finite positive definite matrix. Then $w_T(\cdot) \Rightarrow w(\cdot)$.
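Theorem 1 can be checked informally by Monte Carlo. Under the illustrative assumptions below (Rademacher $\pm 1$ draws mixed through a Cholesky factor, so the innovations are i.i.d. with covariance $\Omega$ but not Gaussian), $w_T(1/2)$ and $w_T(1)$ should have covariance matrices close to $\tfrac{1}{2}I_2$ and $I_2$, and the disjoint increments $w_T(1/2)$ and $w_T(1) - w_T(1/2)$ should be nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
omega = np.array([[1.0, 0.6], [0.6, 1.5]])
chol = np.linalg.cholesky(omega)
vals, vecs = np.linalg.eigh(omega)
omega_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric Omega^{-1/2}

T, n_rep = 200, 3000
# Non-Gaussian i.i.d. innovations: Rademacher (+1/-1) draws mixed to have covariance omega.
z = rng.choice([-1.0, 1.0], size=(n_rep, T, 2))
eps = z @ chol.T
tilde = eps @ omega_inv_half.T                          # standardized innovations
w_half = tilde[:, : T // 2].sum(axis=1) / np.sqrt(T)    # w_T(1/2) in each replication
w_one = tilde.sum(axis=1) / np.sqrt(T)                  # w_T(1) in each replication
print(np.round(np.cov(w_half, rowvar=False), 2))        # should be near 0.5 * I_2
print(np.round(np.cov(w_one, rowvar=False), 2))         # should be near I_2
# Independent increments: w_T(1/2) and w_T(1) - w_T(1/2) should be nearly uncorrelated.
print(np.round(np.corrcoef(w_half[:, 0], (w_one - w_half)[:, 0])[0, 1], 2))
```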
Quite general multivariate FCLTs are available. For example, we may apply the FCLT to serially dependent vector processes using a generalization of (70) and Theorem 12 of Chapter 21.
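Theorem 2 (FCLT when $u_t$ is a vector MA($\infty$) process):
Let
$$u_t = \sum_{s=0}^{\infty} \Psi_s \varepsilon_{t-s},$$
where $\varepsilon_t$ is a $k$-dimensional i.i.d. random vector with mean zero and variance-covariance matrix $\Omega$, $\Psi(1) \equiv \sum_{s=0}^{\infty}\Psi_s$ is nonsingular, and, if $\psi_{ij}^{(s)}$ denotes the row $i$, column $j$ element of $\Psi_s$,
$$\sum_{s=0}^{\infty} s\,|\psi_{ij}^{(s)}| < \infty$$
for each $i, j = 1, 2, \ldots, k$. Then
$$w_T(\cdot) \Rightarrow w(\cdot),$$
where $w_T(r) \equiv \Omega^{-1/2}\,\Psi(1)^{-1}\, T^{-1/2} \sum_{t=1}^{[Tr]} u_t$.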
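Proof:
Apply the multivariate Beveridge-Nelson decomposition, which writes $\sum_{t=1}^{T} u_t = \Psi(1)\sum_{t=1}^{T}\varepsilon_t + \eta_T - \eta_0$ for a stationary process $\eta_t$, and use it to derive the long-run variance matrix of $u_t$:
$$\lim_{T\to\infty} T^{-1} E\left[\left(\sum_{t=1}^{T} u_t\right)\left(\sum_{t=1}^{T} u_t\right)'\right] = \Psi(1)\,\Omega\,\Psi(1)'.$$
The factor $\Omega^{-1/2}\Psi(1)^{-1}$ in $w_T(r)$ normalizes this long-run variance to $I_k$.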
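For a vector MA(1), a special case of Theorem 2 with $\Psi_0 = I_k$ and $\Psi_s = 0$ for $s \ge 2$, the long-run variance $\Psi(1)\Omega\Psi(1)'$ can be compared with the simulated variance of $T^{-1/2}\sum_{t=1}^{T} u_t$. The sketch below uses an arbitrary $\Psi_1$ and $\Omega$ chosen for illustration, Gaussian innovations, and the symmetric choice of $\Omega^{-1/2}$; it also confirms that $\Omega^{-1/2}\Psi(1)^{-1}$ maps the long-run variance to $I_k$.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = np.array([[1.0, 0.3], [0.3, 0.5]])
psi1 = np.array([[0.5, 0.2], [-0.1, 0.4]])    # illustrative VMA(1) coefficient Psi_1
psi_of_1 = np.eye(2) + psi1                     # Psi(1) = Psi_0 + Psi_1 with Psi_0 = I_2
long_run = psi_of_1 @ omega @ psi_of_1.T        # Psi(1) Omega Psi(1)'

T, n_rep = 200, 5000
chol = np.linalg.cholesky(omega)
eps = rng.standard_normal((n_rep, T + 1, 2)) @ chol.T   # eps_0, ..., eps_T per replication
u = eps[:, 1:] + eps[:, :-1] @ psi1.T                   # u_t = eps_t + Psi_1 eps_{t-1}
scaled_sums = u.sum(axis=1) / np.sqrt(T)                # T^{-1/2} (u_1 + ... + u_T)
print(np.round(long_run, 3))
print(np.round(np.cov(scaled_sums, rowvar=False), 3))   # close to long_run for large T

# The normalization in Theorem 2 maps the long-run variance to the identity.
vals, vecs = np.linalg.eigh(omega)
omega_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
standardizer = omega_inv_half @ np.linalg.inv(psi_of_1)  # Omega^{-1/2} Psi(1)^{-1}
print(np.round(standardizer @ long_run @ standardizer.T, 6))
```

Note that the long-run variance differs from the contemporaneous variance $\Omega + \Psi_1\Omega\Psi_1'$ of $u_t$ because of the serial correlation that the partial sums accumulate.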
2 Vector Autoregression Containing Unit Roots
Let $y_t$ be a $(k \times 1)$ vector autoregressive process of order $p$ (VAR($p$)), i.e.
$$[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p]\,y_t = c + \varepsilon_t. \qquad (1)$$
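The scalar algebra in (33) of Chapter 21 works perfectly well for matrices, establishing that for any values $\Phi_1, \Phi_2, \ldots, \Phi_p$, the following polynomials are equivalent:
$$[I_k - \Phi_1 L - \Phi_2 L^2 - \cdots - \Phi_p L^p] = (I_k - \rho L) - (\zeta_1 L + \zeta_2 L^2 + \cdots + \zeta_{p-1}L^{p-1})(1 - L),$$
where
$$\rho \equiv \Phi_1 + \Phi_2 + \cdots + \Phi_p \qquad (2)$$
$$\zeta_s \equiv -[\Phi_{s+1} + \Phi_{s+2} + \cdots + \Phi_p] \quad \text{for } s = 1, 2, \ldots, p-1. \qquad (3)$$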
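The equivalence of the two polynomial forms can be verified coefficient by coefficient. In this sketch a matrix polynomial is stored as the list of its coefficient matrices on $L^0, L^1, \ldots, L^p$ (a representation chosen here for illustration); the hypothetical helper `rho_zeta` implements (2) and (3), and `rewritten_coefficients` expands the right-hand side.

```python
import numpy as np

def rho_zeta(phis):
    """Given [Phi_1, ..., Phi_p], return rho from (2) and [zeta_1, ..., zeta_{p-1}] from (3)."""
    p = len(phis)
    rho = sum(phis)
    zetas = [-sum(phis[s:]) for s in range(1, p)]   # zeta_s = -(Phi_{s+1} + ... + Phi_p)
    return rho, zetas

def rewritten_coefficients(rho, zetas, k):
    """Coefficients on L^0..L^p of (I_k - rho L) - (zeta_1 L + ... + zeta_{p-1} L^{p-1})(1 - L)."""
    p = len(zetas) + 1
    coef = [np.zeros((k, k)) for _ in range(p + 1)]
    coef[0] += np.eye(k)
    coef[1] -= rho
    for s, zeta in enumerate(zetas, start=1):
        coef[s] -= zeta        # the term -zeta_s L^s
        coef[s + 1] += zeta    # the term +zeta_s L^{s+1}
    return coef

rng = np.random.default_rng(4)
k, p = 2, 3
phis = [0.3 * rng.standard_normal((k, k)) for _ in range(p)]
rho, zetas = rho_zeta(phis)
original = [np.eye(k)] + [-phi for phi in phis]   # I_k - Phi_1 L - ... - Phi_p L^p
rewritten = rewritten_coefficients(rho, zetas, k)
print(all(np.allclose(a, b) for a, b in zip(original, rewritten)))   # True
```

The check works for any coefficient matrices, which is the point of the identity: it is pure algebra and imposes no restriction on the VAR.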
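It follows that any VAR($p$) process (1) can always be written in the form
$$(I_k - \rho L)y_t - (\zeta_1 L + \zeta_2 L^2 + \cdots + \zeta_{p-1}L^{p-1})(1 - L)y_t = c + \varepsilon_t$$
or
$$y_t = \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_{p-1}\Delta y_{t-p+1} + c + \rho\, y_{t-1} + \varepsilon_t. \qquad (4)$$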
There are two senses in which a VAR process can be said to contain unit roots.
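First, the first difference of $y_t$ may follow a VAR($p-1$) process:
$$\Delta y_t = \zeta_1 \Delta y_{t-1} + \zeta_2 \Delta y_{t-2} + \cdots + \zeta_{p-1}\Delta y_{t-p+1} + c + \varepsilon_t,$$
which requires from (4) that
$$\rho = I_k$$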
or, from (2), that
$$\Phi_1 + \Phi_2 + \cdots + \Phi_p = I_k. \qquad (5)$$
Second, recall from (8) of Chapter 18 that a VAR($p$) such as (1) is said to contain at least one unit root ($z = 1$) if the following determinant is zero:
$$|I_k - \Phi_1 - \Phi_2 - \cdots - \Phi_p| = 0. \qquad (6)$$
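Note that (5) implies (6), since under (5) the matrix $I_k - \Phi_1 - \cdots - \Phi_p$ is zero, but (6) does not imply (5): a matrix can be singular without being zero. Vector autoregressions for which (6) holds but (5) does not will be considered in Chapter 23.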
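A quick numerical check of the two conditions, using small illustrative coefficient matrices that are not from the text. The hypothetical helper `unit_root_conditions` relies on NumPy's default tolerances in `np.allclose` and `np.isclose`, an arbitrary choice for deciding that a matrix equals $I_k$ or that a determinant is zero.

```python
import numpy as np

def unit_root_conditions(phis):
    """Return (cond5, cond6) for a VAR with coefficient matrices [Phi_1, ..., Phi_p].

    cond5: Phi_1 + ... + Phi_p == I_k, so the differenced system is a VAR(p-1).
    cond6: |I_k - Phi_1 - ... - Phi_p| == 0, i.e. at least one unit root (z = 1).
    """
    k = phis[0].shape[0]
    total = sum(phis)
    cond5 = bool(np.allclose(total, np.eye(k)))
    cond6 = bool(np.isclose(np.linalg.det(np.eye(k) - total), 0.0))
    return cond5, cond6

print(unit_root_conditions([np.eye(2)]))                             # (True, True)
print(unit_root_conditions([np.array([[0.5, 0.5], [0.5, 0.5]])]))    # (False, True)
print(unit_root_conditions([np.array([[0.5, 0.0], [0.0, 0.5]])]))    # (False, False)
```

The second example is a VAR for which (6) holds but (5) does not: $I_2 - \Phi_1$ is singular but not zero.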
3 Spurious Regression
3.1 Asymptotics for Spurious Regression
Consider a regression of the form
$$y_t = x_t'\beta + u_t, \qquad (7)$$
for which elements of $y_t$ and $x_t$ might be nonstationary. If there does not exist some population value of $\beta$ for which the disturbance $u_t = y_t - x_t'\beta$ is I(0), then OLS is quite likely to produce spurious results. In the extreme case where $y_t$ and $x_t$ are independent random walks, as we shall see, the OLS estimator $\hat{\beta}_T$ of $\beta$ is not consistent for $\beta = 0$ but instead converges in distribution to a nondegenerate random variable. Because there is truly no relation between $y_t$ and $x_t$, and because $\hat{\beta}_T$ is incapable of revealing this, we call this a case of "spurious regression". This phenomenon was first considered by Yule (1926); the dangers of spurious regression were forcefully brought to the attention of economists by the Monte Carlo studies of Granger and Newbold (1974), and the phenomenon was later explained theoretically by Phillips (1986).
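Theorem 3 (Spurious Regression, two independent random walks):
Let $X_t$ and $Y_t$ be independent random walks, $X_t = X_{t-1} + \eta_t$ and $Y_t = Y_{t-1} + \zeta_t$, with $X_0 = Y_0 = 0$ and $\eta_t$ independent of $\zeta_t$. Consider the regression equation for $Y_t$ in terms of $X_t$, formally $Y_t = \beta X_t + u_t$, where $\beta = 0$ and $u_t = Y_t$, reflecting the lack of any relation between $Y_t$ and $X_t$. Then the OLS estimator of $\beta$ satisfies
$$\hat{\beta}_T \xrightarrow{L} (\sigma_2/\sigma_1)\left[\int_0^1 W_1(r)^2\,dr\right]^{-1}\int_0^1 W_1(r)W_2(r)\,dr,$$
where $\sigma_1^2 = E(\eta_t^2)$, $\sigma_2^2 = E(\zeta_t^2)$, and $W_1$ and $W_2$ are independent scalar standard Brownian motions.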
Proof:
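To proceed, we write
$$W_{1T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\eta_s/\sigma_1 = T^{-1/2}X_{t-1}/\sigma_1,$$
$$W_{2T}(r_{t-1}) = T^{-1/2}\sum_{s=1}^{t-1}\zeta_s/\sigma_2 = T^{-1/2}Y_{t-1}/\sigma_2$$
or
$$T^{-1/2}X_{t-1} = \sigma_1 W_{1T}(r_{t-1}) \qquad (8)$$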
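and
$$T^{-1/2}Y_{t-1} = \sigma_2 W_{2T}(r_{t-1}), \qquad (9)$$
where $\sigma_1^2 \equiv \lim_{T\to\infty}\mathrm{Var}(T^{-1/2}\sum_{t=1}^{T}\eta_t)$ and $\sigma_2^2 \equiv \lim_{T\to\infty}\mathrm{Var}(T^{-1/2}\sum_{t=1}^{T}\zeta_t)$, and $r_{t-1} = (t-1)/T$ as before.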
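From Donsker's theorem and the continuous mapping theorem we have $T^{-2}\sum_{t=1}^{T} X_{t-1}^2 \Rightarrow \sigma_1^2\int_0^1 W_1(r)^2\,dr$ and also $T^{-2}\sum_{t=1}^{T} Y_{t-1}^2 \Rightarrow \sigma_2^2\int_0^1 W_2(r)^2\,dr$. The multivariate version of Donsker's theorem states that
$$\begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1/2} T^{-1/2}\sum_{t=1}^{[Tr]}\begin{bmatrix}\eta_t\\ \zeta_t\end{bmatrix} \Rightarrow \begin{bmatrix}W_1(r)\\ W_2(r)\end{bmatrix}$$
or
$$\begin{bmatrix}T^{-1/2}X_{[Tr]}\\ T^{-1/2}Y_{[Tr]}\end{bmatrix} \Rightarrow \begin{bmatrix}\sigma_1 W_1(r)\\ \sigma_2 W_2(r)\end{bmatrix}.$$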
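From (8) and (9) we have
$$\begin{aligned}
T^{-2}\sum_{t=1}^{T} X_{t-1}Y_{t-1} &= T^{-1}\sum_{t=1}^{T} \sigma_1 W_{1T}(r_{t-1})\,\sigma_2 W_{2T}(r_{t-1})\\
&= \sigma_1\sigma_2\, T^{-1}\sum_{t=1}^{T} W_{1T}(r_{t-1})\,W_{2T}(r_{t-1})\\
&= \sigma_1\sigma_2\sum_{t=1}^{T}\int_{(t-1)/T}^{t/T} W_{1T}(r)\,W_{2T}(r)\,dr\\
&= \sigma_1\sigma_2\int_0^1 W_{1T}(r)\,W_{2T}(r)\,dr\\
&\Rightarrow \sigma_1\sigma_2\int_0^1 W_1(r)\,W_2(r)\,dr,
\end{aligned}$$
where we have used the fact that $W_{1T}(r)$ and $W_{2T}(r)$ are constant for $(t-1)/T \le r < t/T$, and applied the continuous mapping theorem to the mapping
$$(x, y) \mapsto \int_0^1 x(a)\,y(a)\,da.$$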
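Hence, for convenience treating $\hat{\beta}_{T-1}$ (the estimate based on the lagged levels $X_{t-1}, Y_{t-1}$, which has the same limit) instead of $\hat{\beta}_T$, we have
$$\begin{aligned}
\hat{\beta}_{T-1} &= \left(T^{-2}\sum_{t=1}^{T} X_{t-1}^2\right)^{-1}\left(T^{-2}\sum_{t=1}^{T} X_{t-1}Y_{t-1}\right)\\
&\Rightarrow \left(\sigma_1^2\int_0^1 W_1(r)^2\,dr\right)^{-1}\sigma_1\sigma_2\int_0^1 W_1(r)W_2(r)\,dr\\
&= (\sigma_2/\sigma_1)\left(\int_0^1 W_1(r)^2\,dr\right)^{-1}\int_0^1 W_1(r)W_2(r)\,dr. \qquad (10)
\end{aligned}$$
Q.E.D.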
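The two sample moments in (10) can be simulated to see the limits at work. A minimal sketch, assuming Gaussian innovations and the illustrative values $\sigma_1 = 2$, $\sigma_2 = 0.5$: since $E\int_0^1 W_1(r)^2\,dr = \int_0^1 r\,dr = 1/2$, the average of $T^{-2}\sum X_{t-1}^2$ should be near $\sigma_1^2/2$, the cross moment should average near zero, and $\hat{\beta}_{T-1}$ should keep a nondegenerate spread.

```python
import numpy as np

rng = np.random.default_rng(10)
sigma1, sigma2 = 2.0, 0.5           # illustrative innovation standard deviations
T, n_rep = 400, 2000
X = np.cumsum(sigma1 * rng.standard_normal((n_rep, T)), axis=1)   # X_1, ..., X_T
Y = np.cumsum(sigma2 * rng.standard_normal((n_rep, T)), axis=1)   # Y_1, ..., Y_T
# Lagged levels X_{t-1}, Y_{t-1} for t = 1, ..., T, using X_0 = Y_0 = 0.
X_lag = np.hstack([np.zeros((n_rep, 1)), X[:, :-1]])
Y_lag = np.hstack([np.zeros((n_rep, 1)), Y[:, :-1]])
sxx = (X_lag ** 2).sum(axis=1) / T ** 2      # T^{-2} sum X_{t-1}^2
sxy = (X_lag * Y_lag).sum(axis=1) / T ** 2   # T^{-2} sum X_{t-1} Y_{t-1}
beta_hat = sxy / sxx                         # the estimator beta_hat_{T-1} in (10)
print(sxx.mean(), sigma1 ** 2 / 2)           # the first should be near the second
print(sxy.mean())                            # near 0, since W_1 and W_2 are independent
print(np.percentile(beta_hat, [5, 50, 95]))  # a nondegenerate spread around 0
```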
The spurious regression problem becomes clear upon inspection of (10). The true value of the derivative of $Y_t$ with respect to $X_t$ is zero because the errors generating the $X_t$ and $Y_t$ series are independent. Yet $\hat{\beta}_T$ fails to converge in probability to zero and instead has a nondegenerate limiting distribution. Using similar techniques, Phillips (1986) shows that $T^{-1/2}t_{\hat{\beta}_T}$ has a nondegenerate limiting distribution, or in other words that the $t$ statistic for $\hat{\beta}_T$ diverges. Hence as $T \to \infty$, the probability of a significant $t$ value arising in a regression such as (7) approaches one, leading to spurious inference about the existence of a relationship between $X_t$ and $Y_t$.
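The Granger-Newbold phenomenon is easy to reproduce. The sketch below (Gaussian innovations and the sample sizes are illustrative choices; `spurious_regression_mc` is a hypothetical helper) regresses one independent random walk on another without a constant, as in Theorem 3, and reports for each $T$ the standard deviation of $\hat{\beta}_T$, the frequency of $|t| > 1.96$, and the median of $T^{-1/2}|t|$.

```python
import numpy as np

def spurious_regression_mc(T, n_rep, rng):
    """OLS of Y_t on X_t (no constant) for independent Gaussian random walks.

    Returns beta_hat and the conventional OLS t statistic for each of n_rep replications.
    """
    X = np.cumsum(rng.standard_normal((n_rep, T)), axis=1)
    Y = np.cumsum(rng.standard_normal((n_rep, T)), axis=1)
    sxx = (X * X).sum(axis=1)
    beta = (X * Y).sum(axis=1) / sxx
    resid = Y - beta[:, None] * X
    s2 = (resid ** 2).sum(axis=1) / (T - 1)    # usual estimate of Var(u_t)
    t_stat = beta / np.sqrt(s2 / sxx)
    return beta, t_stat

rng = np.random.default_rng(5)
results = {}
for T in (50, 200, 800):
    beta, t_stat = spurious_regression_mc(T, 2000, rng)
    results[T] = (beta.std(),                              # does not shrink as T grows
                  np.mean(np.abs(t_stat) > 1.96),          # rejection frequency at nominal 5%
                  np.median(np.abs(t_stat)) / np.sqrt(T))  # roughly stable across T
    print(T, np.round(results[T], 3))
```

A consistent estimator would show a standard deviation shrinking toward zero and a rejection frequency near 0.05; here neither happens, and it is $T^{-1/2}t$, not $t$, that settles down.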
The spurious regression problem arises not only with independent random walks; it also appears among I(1) processes that are not cointegrated.
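Theorem 4 (Spurious Regression, non-cointegrated I(1) processes, Hamilton's Parametric Method):
Consider a $(k \times 1)$ vector $y_t$ whose first difference is described by
$$(1 - L)y_t = \Psi(L)\varepsilon_t = \sum_{s=0}^{\infty}\Psi_s\varepsilon_{t-s},$$
for $\varepsilon_t$ an i.i.d. vector with mean zero, variance $E(\varepsilon_t\varepsilon_t') = \Omega = PP'$, and finite fourth moments, and where $\{s \cdot \Psi_s\}_{s=0}^{\infty}$ is absolutely summable.
Let $g = (k - 1)$ and $\Lambda = \Psi(1)P$. Partition $y_t$ as $y_t = (Y_{1t}, y_{2t}')'$, and partition $\Lambda\Lambda'$ as
$$\Lambda\Lambda' = \begin{bmatrix}\Sigma_{11} & \Sigma_{21}' \\ \Sigma_{21} & \Sigma_{22}\end{bmatrix},$$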
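where $\Sigma_{11}$ is $(1 \times 1)$ and $\Sigma_{22}$ is $(g \times g)$.
Suppose that $\Lambda\Lambda'$ is nonsingular, and define
$$(\sigma_1^*)^2 \equiv \Sigma_{11} - \Sigma_{21}'\Sigma_{22}^{-1}\Sigma_{21}.$$
Let $L_{22}$ denote the Cholesky factor of $\Sigma_{22}^{-1}$, so that $\Sigma_{22}^{-1} = L_{22}L_{22}'$, and consider the consequences of an OLS regression of the first variable on the others and a constant,
$$Y_{1t} = \hat{\alpha}_T + y_{2t}'\hat{\gamma}_T + \hat{u}_t, \qquad (11)$$
and a null hypothesis of the form $H_0: R\gamma = q$, where $R$ is a known $(r \times g)$ matrix representing $r$ separate hypotheses involving $\gamma$ and $q$ is a known $(r \times 1)$ vector. Then the following hold.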
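(a). The OLS estimates $\hat{\alpha}_T$ and $\hat{\gamma}_T$ are characterized by
$$\begin{bmatrix} T^{-1/2}\hat{\alpha}_T \\ \hat{\gamma}_T - \Sigma_{22}^{-1}\Sigma_{21}\end{bmatrix} \xrightarrow{L} \begin{bmatrix}\sigma_1^* h_1 \\ \sigma_1^* L_{22} h_2\end{bmatrix},$$
where
$$\begin{bmatrix} h_1 \\ h_2\end{bmatrix} \equiv \begin{bmatrix} 1 & \int_0^1 [w_2(r)]'\,dr \\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr \\ \int_0^1 w_2(r)W_1(r)\,dr\end{bmatrix}$$
and $W_1(r)$ denotes scalar standard Brownian motion, $w_2(r)$ denotes $g$-dimensional standard Brownian motion, and $w_2(r)$ is independent of $W_1(r)$.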
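(b). The sum of squared errors $SSE_T$ from the OLS estimation of (11) satisfies
$$T^{-2}\,SSE_T \xrightarrow{L} (\sigma_1^*)^2 H,$$
where
$$H \equiv \int_0^1 [W_1(r)]^2\,dr - \left\{\begin{bmatrix}\int_0^1 W_1(r)\,dr & \int_0^1 [W_1(r)][w_2(r)]'\,dr\end{bmatrix}\begin{bmatrix} 1 & \int_0^1 [w_2(r)]'\,dr \\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix}\int_0^1 W_1(r)\,dr \\ \int_0^1 w_2(r)W_1(r)\,dr\end{bmatrix}\right\}.$$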
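(c). The OLS $F$ test satisfies
$$T^{-1}F_T \xrightarrow{L} (\sigma_1^* R^* h_2 - q^*)'\left\{(\sigma_1^*)^2 H\,[\,0 \;\; R^*\,]\begin{bmatrix} 1 & \int_0^1 [w_2(r)]'\,dr \\ \int_0^1 w_2(r)\,dr & \int_0^1 [w_2(r)][w_2(r)]'\,dr\end{bmatrix}^{-1}\begin{bmatrix} 0' \\ R^{*\prime}\end{bmatrix}\right\}^{-1}(\sigma_1^* R^* h_2 - q^*) \div r,$$
where
$$R^* \equiv R\,L_{22}, \qquad q^* \equiv q - R\,\Sigma_{22}^{-1}\Sigma_{21}.$$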
Result (a) implies that neither estimator is consistent. The estimate of the constant, $\hat{\alpha}_T$, actually diverges and must be divided by $T^{1/2}$ to obtain a random variable with a well-specified distribution; the estimate $\hat{\alpha}_T$ itself is likely to get farther and farther from the true value of zero as the sample size $T$ increases. Things do not get better when we look at $\hat{\gamma}_T$: different arbitrarily large samples will produce randomly differing estimates $\hat{\gamma}_T$. The usual situation, in which the estimation error converges in probability to zero and must be multiplied by some increasing function of $T$ to obtain a nondegenerate asymptotic distribution, does not occur here.
Result (b) implies that the usual OLS estimator of the variance of $u_t$,
$$s_T^2 = (T - k)^{-1} SSE_T,$$
again diverges as $T \to \infty$. To obtain an estimator that does not grow with the sample size, the sum of squared errors has to be divided by $T^2$ rather than $T$. In this respect, the residuals $\hat{u}_t$ from a spurious regression behave like a unit root process: if $\xi_t$ is a scalar I(1) series, then $T^{-1}\sum \xi_t^2$ diverges and $T^{-2}\sum \xi_t^2$ converges.
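Results (a) and (b) can be illustrated with mutually independent Gaussian random walks, one special case that satisfies the conditions of Theorem 4 (here $\Lambda\Lambda' = I_k$, so $\Sigma_{21} = 0$ and $\sigma_1^* = 1$). In the sketch below, with the hypothetical helper `spurious_levels_fit` and $g = 2$, the median of $SSE_T/T$ should grow roughly in proportion to $T$, while the medians of $SSE_T/T^2$ and $T^{-1/2}|\hat{\alpha}_T|$ should stay roughly stable.

```python
import numpy as np

def spurious_levels_fit(T, g, rng):
    """One replication of regression (11) with Y_1t and the g columns of y_2t
    mutually independent Gaussian random walks. Returns (alpha_hat, SSE_T)."""
    Y1 = np.cumsum(rng.standard_normal(T))
    y2 = np.cumsum(rng.standard_normal((T, g)), axis=0)
    X = np.column_stack([np.ones(T), y2])          # constant plus y_2t
    coef, *_ = np.linalg.lstsq(X, Y1, rcond=None)
    resid = Y1 - X @ coef
    return coef[0], resid @ resid

rng = np.random.default_rng(6)
summary = {}
for T in (100, 400, 1600):
    draws = np.array([spurious_levels_fit(T, 2, rng) for _ in range(500)])
    alpha, sse = draws[:, 0], draws[:, 1]
    summary[T] = (np.median(sse / T),                      # grows roughly like T
                  np.median(sse / T ** 2),                 # roughly stable: result (b)
                  np.median(np.abs(alpha) / np.sqrt(T)))   # roughly stable: result (a)
    print(T, np.round(summary[T], 3))
```

Medians are used rather than means only to make the small Monte Carlo less sensitive to occasional extreme replications.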
Result (c) means that any OLS $t$ or $F$ test based on the spurious regression also diverges; the OLS $F$ statistic must be divided by $T$ to obtain a variable that does not grow with the sample size. Since an $F$ test of a single restriction is the square of the corresponding $t$ test, any $t$ statistic would have to be divided by $T^{1/2}$ to obtain a convergent variable. Thus, as the sample size becomes large,
it becomes increasingly likely that the absolute value of an OLS $t$ test will exceed any arbitrary finite value (such as the usual critical value of $t = 2$). For example, in the regression (11), $Y_{1t}$ and $y_{2t}$ will appear to be significantly related even when in reality they are completely independent.
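Result (c) can be seen in the same illustrative setting: independent Gaussian random walks with $g = 1$, so that the test of $H_0: \gamma = 0$ has $R = 1$, $q = 0$, and $F_T$ is the square of the $t$ statistic on $\hat{\gamma}_T$. `f_stat_gamma_zero` is a hypothetical helper; the rejection frequency at the nominal 5% level should rise toward one as $T$ grows, while the median of $F_T/T$ stays roughly stable.

```python
import numpy as np

def f_stat_gamma_zero(T, g, rng):
    """OLS F statistic for H0: gamma = 0 (R = I_g, q = 0) in regression (11), with Y_1t and
    the g columns of y_2t mutually independent Gaussian random walks."""
    Y1 = np.cumsum(rng.standard_normal(T))
    y2 = np.cumsum(rng.standard_normal((T, g)), axis=0)
    X = np.column_stack([np.ones(T), y2])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ Y1)
    resid = Y1 - X @ coef
    s2 = resid @ resid / (T - g - 1)               # s_T^2 = SSE_T / (T - k)
    gamma = coef[1:]
    # F = gamma' [s^2 (X'X)^{-1} restricted to the gamma block]^{-1} gamma / g
    return gamma @ np.linalg.solve(s2 * xtx_inv[1:, 1:], gamma) / g

rng = np.random.default_rng(7)
results = {}
for T in (50, 400, 3200):
    F = np.array([f_stat_gamma_zero(T, 1, rng) for _ in range(800)])
    # With g = 1, F = t^2, so F > 3.84 (about 1.96^2) is the nominal 5% two-sided t test.
    results[T] = (np.mean(F > 3.84), np.median(F / T))
    print(T, np.round(results[T], 3))
```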
Should we be totally pessimistic about regressions involving unit root processes in light of the above results? There is, in fact, one case of major importance in which the correlation properties of $Y_{1t}$ and $y_{2t}$ do alter these qualitative results. The conditions of this theorem require that $\Lambda\Lambda'$ be nonsingular. Since $\mathrm{rank}(\Lambda\Lambda') = \mathrm{rank}(\Lambda)$, $\Lambda = \Psi(1)P$, and $P$ is nonsingular, we require that $\Psi(1)$ be nonsingular, i.e. that the determinant $|\Psi(1)| \neq 0$. If we allow $\Psi(1)$ to be singular, then the asymptotic theory of this theorem no longer holds as stated. The condition that $\Psi(1)$ is singular is a necessary condition for $Y_{1t}$ and $y_{2t}$ to be cointegrated in the sense of Engle and Granger (1987). See Chapter 23 for details.
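A concrete case with singular $\Psi(1)$: the vector MA(1) for $\Delta y_t$ below, with $\Psi_1$ and $\Omega = I_2$ chosen for illustration, gives $\Psi(1) = \begin{bmatrix}0 & 1\\0 & 1\end{bmatrix}$, and solving the difference equations with $y_0 = 0$ shows $Y_{1t} - y_{2t} = \varepsilon_{1t} - \varepsilon_{2t} - \varepsilon_{10} + \varepsilon_{20}$, which is stationary. In this sketch OLS of $Y_{1t}$ on a constant and $y_{2t}$ settles near the coefficient 1 instead of behaving as in Theorem 4; the theory behind this belongs to the cointegration analysis of Chapter 23.

```python
import numpy as np

rng = np.random.default_rng(9)
psi1 = np.array([[-1.0, 1.0], [0.0, 0.0]])    # illustrative Psi_1
psi_of_1 = np.eye(2) + psi1                     # Psi(1) = [[0, 1], [0, 1]]
print(np.linalg.det(psi_of_1))                  # (numerically) zero: Psi(1) is singular

T = 2000
eps = rng.standard_normal((T + 1, 2))           # eps_0, ..., eps_T, with Omega = I_2
dy = eps[1:] + eps[:-1] @ psi1.T                # Delta y_t = eps_t + Psi_1 eps_{t-1}
y = np.cumsum(dy, axis=0)                       # y_1, ..., y_T, with y_0 = 0
gap = y[:, 0] - y[:, 1]                         # Y_1t - y_2t, a stationary series
print(gap.std(), y[:, 1].std())                 # the gap does not wander like y_2t

X = np.column_stack([np.ones(T), y[:, 1]])
coef, *_ = np.linalg.lstsq(X, y[:, 0], rcond=None)
print(coef[1])                                  # near 1 in this cointegrated example
```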
3.2 Cures For Spurious Regression
Many researchers recommend routinely differencing apparently nonstationary variables before estimating regressions (for example, Gordon (1984)):
$$\Delta Y_{1t} = a + \Delta y_{2t}'b + v_t,$$
which is believed to avoid the spurious regression problem as well as the nonstandard distributions for certain hypothesis tests associated with the levels regression (11). While this is the ideal cure for the problem discussed in this section, there are two different situations in which it might be inappropriate.
First, if economic theory specifies a linear relation between $Y_{1t}$ and $y_{2t}$ in levels as in (11), then the parameters have their own economic interpretation; for example, $\partial C_t/\partial Y_t = \gamma$ is the marginal propensity to consume, which must be positive under normal conditions. In a regression on differenced data, however, the parameters have a different economic meaning, e.g. $\partial \Delta C_t/\partial \Delta Y_t = b$, which may be positive or negative even though $\partial C_t/\partial Y_t = \gamma$ must be positive. Thus, differencing the data
before regression avoids the econometric problem but introduces an additional problem of economic interpretation.
Second, if both $Y_{1t}$ and $y_{2t}$ are I(1) processes, there is an interesting class of models for which the dynamic relation between $Y_{1t}$ and $y_{2t}$ will be misspecified if the researcher simply differences both $Y_{1t}$ and $y_{2t}$. This class of models, known as cointegrated processes, is discussed in the following chapters.
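Returning to the differenced regression $\Delta Y_{1t} = a + \Delta y_{2t}'b + v_t$: when $Y_{1t}$ and $y_{2t}$ are independent random walks, differencing leaves i.i.d. series, and the usual $t$ test on $b$ should reject at close to its nominal rate. A minimal sketch under that illustrative assumption (Gaussian innovations, scalar $y_{2t}$, hypothetical helper `differenced_t_stat`):

```python
import numpy as np

def differenced_t_stat(T, rng):
    """t statistic on b in Delta Y_1t = a + b Delta y_2t + v_t, where Y_1t and the scalar
    y_2t are independent Gaussian random walks observed at t = 0, 1, ..., T."""
    Y1 = np.cumsum(rng.standard_normal(T + 1))
    y2 = np.cumsum(rng.standard_normal(T + 1))
    dY1, dy2 = np.diff(Y1), np.diff(y2)            # T first differences each
    X = np.column_stack([np.ones(T), dy2])
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ dY1)
    resid = dY1 - X @ coef
    s2 = resid @ resid / (T - 2)
    return coef[1] / np.sqrt(s2 * xtx_inv[1, 1])

rng = np.random.default_rng(8)
t_diff = np.array([differenced_t_stat(400, rng) for _ in range(2000)])
rejection_rate = np.mean(np.abs(t_diff) > 1.96)
print(rejection_rate)   # should be close to the nominal 0.05
```

This only illustrates the econometric side of the cure; it says nothing about the interpretation problem or the cointegrated case described above.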