Ch. 23 Cointegration
1 Introduction
An important property of I(1) variables is that there can be linear combinations of these variables that are I(0). If this is so, then these variables are said to be cointegrated. Suppose that we consider two variables Y_t and X_t that are I(1). (For example, Y_t = Y_{t-1} + ε_t and X_t = X_{t-1} + η_t.) Then Y_t and X_t are said to be cointegrated if there exists a γ such that Y_t − γX_t is I(0). What this means is that the regression equation

Y_t = γX_t + u_t

makes sense because Y_t and X_t do not drift too far apart from each other over time. Thus, there is a long-run equilibrium relationship between them. If Y_t and X_t are not cointegrated, that is, if Y_t − γX_t = u_t is also I(1), then Y_t and X_t would drift apart from each other over time. In this case the relationship between Y_t and X_t that we obtain by regressing Y_t on X_t would be spurious.
Let us continue with the spurious-regression setup in which X_t and Y_t are independent random walks, and consider what happens if we take a nontrivial linear combination of X_t and Y_t:

a_1 Y_t + a_2 X_t = a_1 Y_{t-1} + a_2 X_{t-1} + a_1 ε_t + a_2 η_t,

where a_1 and a_2 are not both zero. We can write this as

Z_t = Z_{t-1} + v_t,

where Z_t = a_1 Y_t + a_2 X_t and v_t = a_1 ε_t + a_2 η_t. Thus Z_t is again a random walk process, as v_t is i.i.d. with mean zero and finite variance, given that ε_t and η_t are each i.i.d. with mean zero and finite variance. No matter what coefficients a_1 and a_2 we choose, the resulting linear combination is again a random walk, hence an integrated or unit-root process.
Now consider what happens when X_t is a random walk as before, but Y_t is instead generated according to Y_t = γX_t + u_t, with u_t again i.i.d. By itself, Y_t is an integrated process, because

Y_t − Y_{t-1} = γ(X_t − X_{t-1}) + u_t − u_{t-1},

so that

Y_t = Y_{t-1} + γη_t + u_t − u_{t-1}
    = Y_{t-1} + ε_t,

where ε_t = γη_t + u_t − u_{t-1} is readily verified to be an I(0) process.

Despite the fact that both X_t and Y_t are integrated processes, the situation is very different from that considered in the last chapter. Here there is indeed a linear combination of X_t and Y_t that is not an integrated process: putting a_1 = 1 and a_2 = −γ we have

a_1 Y_t + a_2 X_t = Y_t − γX_t = u_t,

which is i.i.d. This is an example of a pair {X_t, Y_t} of cointegrated processes.
The concept of cointegration was introduced by Granger (1981). This paper and that of Engle and Granger (1987) have had a major impact on modern econometrics. Following Engle and Granger (1987), we define cointegration formally as follows.

Definition 1:
The components of the vector x_t are said to be cointegrated of order d, b, denoted x_t ~ CI(d, b), if
(a) all components of x_t are I(d);
(b) there exists a vector a (≠ 0) such that z_t = a′x_t ~ I(d − b), b > 0.
The vector a is called the cointegrating vector.

For ease of exposition, only the values d = 1 and b = 1 will be considered in this chapter. The case in which d and b take fractional values is called fractional cointegration; we will consider it in Chapter 25.
Clearly, the cointegrating vector a is not unique, for if a′x_t is I(0), then so is ba′x_t for any nonzero scalar b; that is, if a is a cointegrating vector, then so is ba. If x_t has k components, then there may be more than one cointegrating vector. Indeed, there may be h < k linearly independent (k × 1) vectors (a_1, a_2, …, a_h) such that A′x_t is an I(0) (h × 1) vector, where A′ is the following (h × k) matrix (rows separated by semicolons):

A′ = [a_1′; a_2′; ⋮; a_h′].

Again, the vectors (a_1, a_2, …, a_h) are not unique; if A′x_t is I(0), then for any nonzero (1 × h) vector b′, the scalar b′A′x_t is also I(0). Then the (k × 1) vector ã given by ã′ = b′A′ could also be described as a cointegrating vector.
Suppose that there exists an (h × k) matrix A′ whose rows are linearly independent, such that A′x_t is an (h × 1) I(0) vector. Suppose further that if c′ is any (1 × k) vector that is linearly independent of the rows of A′, then c′x_t is an I(1) scalar. Then we say that there are exactly h cointegrating relations among the elements of x_t, and that (a_1, a_2, …, a_h) form a basis for the space of cointegrating vectors.
Example:
Let P_t denote an index of the price level in the United States, P*_t a price index for Italy, and S_t the exchange rate between the two currencies. Then purchasing power parity holds that

P_t = S_t P*_t,

or, taking logarithms,

p_t = s_t + p*_t,

where p_t ≡ log P_t, s_t ≡ log S_t, and p*_t ≡ log P*_t. In equilibrium we need p_t − s_t − p*_t = 0. However, in practice, errors in measuring prices, transportation costs, and differences in quality prevent purchasing power parity from holding exactly at every date t. A weaker form of the hypothesis is that the variable z_t defined by

z_t = p_t − s_t − p*_t

is I(0), even though the individual elements of y_t = (p_t, s_t, p*_t)′ are all I(1). In this case we have a single cointegrating vector a = (1, −1, −1)′. The term z_t = a′y_t is interpreted as the equilibrium error; although it is not always zero, it cannot stay away from zero too often or too far if the equilibrium concept is to make sense.
2 Granger Representation Theorem
Let each element of the (k × 1) vector y_t be I(1), with a (k × h) cointegrating matrix A such that each element of A′y_t is I(0). Then Granger (1983) gives the following fundamental results for cointegrated y_t.
2.1 Implication of Cointegration for the VMA Representation
We now discuss the general implications of cointegration for the moving average representation of a vector system. Since Δy_t is assumed to be I(0), let δ ≡ E(Δy_t) and define

u_t = Δy_t − δ. (1)

Suppose that u_t has the Wold representation

u_t = ε_t + Ψ_1 ε_{t-1} + Ψ_2 ε_{t-2} + … = Ψ(L) ε_t,

where E(ε_t) = 0 and

E(ε_t ε_τ′) = Ω for t = τ, and 0 otherwise.

Let Ψ(1) denote the (k × k) matrix polynomial Ψ(z) evaluated at z = 1, that is,

Ψ(1) ≡ I_k + Ψ_1 + Ψ_2 + Ψ_3 + ….

Then the following hold:
(a) A′Ψ(1) = 0;
(b) A′δ = 0.
To verify this claim, note that as long as {s·Ψ_s}_{s=0}^{∞} is absolutely summable, the difference equation (1) implies (from the multivariate Beveridge-Nelson decomposition):

y_t = y_0 + δt + u_1 + u_2 + … + u_t (2)
    = y_0 + δt + Ψ(1)(ε_1 + ε_2 + … + ε_t) + η_t − η_0, (3)

where η_t is a stationary process. Premultiplying (3) by A′ results in

A′y_t = A′(y_0 − η_0) + A′δt + A′Ψ(1)(ε_1 + ε_2 + … + ε_t) + A′η_t ~ I(0). (4)

If Ω = E(ε_t ε_t′) is nonsingular, then c′(ε_1 + ε_2 + … + ε_t) is I(1) for every nonzero (k × 1) vector c. Moreover, if some of the series exhibit nonzero drift (δ ≠ 0), the linear combination A′y_t will grow deterministically at rate A′δ. Thus, if the underlying hypothesis suggesting the possibility of cointegration is that certain linear combinations of y_t are I(0), both conditions A′Ψ(1) = 0 and A′δ = 0 are required to hold.
The second condition means that despite the presence of a drift term in the process generating y_t, there is no linear trend in the cointegrated combination; see Banerjee et al. (1993), p. 151, for details. For the implication of the first condition, partitioned matrix multiplication gives
A′Ψ(1) = [a_1′ (1 × k); a_2′; ⋮; a_h′ (1 × k)] Ψ(1) (k × k) = [a_1′Ψ(1); a_2′Ψ(1); ⋮; a_h′Ψ(1)] = [0′; 0′; ⋮; 0′],
which implies

a_i′Ψ(1) = [a_{1i} a_{2i} … a_{ki}] [ψ(1)_1′; ψ(1)_2′; ⋮; ψ(1)_k′] = Σ_{s=1}^{k} a_{si} ψ(1)_s′ = 0′ (1 × k), for i = 1, 2, …, h, (5)

where a_{si} is the sth element of the vector a_i and ψ(1)_s′ is the sth row of the matrix Ψ(1).

Equation (5) implies that certain linear combinations of the rows of Ψ(1) are zero, meaning that the rows of Ψ(1) are linearly dependent. That is, Ψ(1) is a singular matrix or, equivalently, the determinant of Ψ(1) is zero,
i.e. |Ψ(1)| = 0.¹ This in turn means that the matrix operator Ψ(L) is non-invertible.² Thus, a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data Δy_t, because of the non-invertibility of Ψ(L) in

Δy_t = Ψ(L) ε_t.
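Condition (a) and the singularity of Ψ(1) can be verified by hand for the simple bivariate example of the introduction (X_t a random walk driven by η_t, and Y_t = γX_t + u_t). Writing Δy_t = (ΔY_t, ΔX_t)′ in terms of ε_t = (η_t, u_t)′ gives Ψ_0, Ψ_1, and Ψ_s = 0 for s ≥ 2. A small numeric check (γ = 2 is an arbitrary choice of ours):

```python
import numpy as np

gamma = 2.0

# Wold matrices of Delta y_t = (dY_t, dX_t)' in terms of eps_t = (eta_t, u_t)':
#   dY_t = gamma*eta_t + u_t - u_{t-1},   dX_t = eta_t
Psi0 = np.array([[gamma, 1.0],
                 [1.0, 0.0]])
Psi1 = np.array([[0.0, -1.0],
                 [0.0, 0.0]])
Psi_at_1 = Psi0 + Psi1                 # Psi(1); Psi_s = 0 for s >= 2

a = np.array([1.0, -gamma])            # cointegrating vector A' = [1, -gamma]
print(np.linalg.det(Psi_at_1))         # 0.0: Psi(1) is singular
print(a @ Psi_at_1)                    # [0. 0.]: A' Psi(1) = 0
```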
2.2 Implication of Cointegration for the VAR Representation
Suppose that the level of y_t can be represented as a non-stationary pth-order vector autoregression:³

y_t = c + Φ_1 y_{t-1} + Φ_2 y_{t-2} + … + Φ_p y_{t-p} + ε_t, (6)

or

Φ(L) y_t = c + ε_t, (7)

where

Φ(L) ≡ [I_k − Φ_1 L − Φ_2 L² − … − Φ_p L^p].

Suppose that Δy_t has the Wold representation

(1 − L) y_t = δ + Ψ(L) ε_t. (8)

Premultiplying (8) by Φ(L) results in

(1 − L) Φ(L) y_t = Φ(1) δ + Φ(L) Ψ(L) ε_t. (9)
¹ Recall from Theorem 4 on page 7 of Chapter 22 that this condition violates an assumption used in the proof of the spurious-regression results.
² If the determinant of an (n × n) matrix H is not equal to zero, its inverse is found by dividing the adjoint by the determinant: H^{-1} = (1/|H|)[(−1)^{i+j}|H_{ji}|].
³ This is not the only model for I(1) variables. See Saikkonen and Luukkonen (1997) for the infinite VAR and ? for the VARMA model.
Substituting (7) into (9), we have

(1 − L) ε_t = Φ(1) δ + Φ(L) Ψ(L) ε_t, (10)

since (1 − L) c = 0. Now, equation (10) has to hold for all realizations of ε_t, which requires that

Φ(1) δ = 0 (a vector) (11)

and that (1 − L) I_k and Φ(L) Ψ(L) be identical polynomials in L. In particular, for L = 1, equation (10) implies that

Φ(1) Ψ(1) = 0 (a matrix). (12)

Let π_i′ denote the ith row of Φ(1). Then (11) and (12) state that π_i′ Ψ(1) = 0′ (a row of zeros) and π_i′ δ = 0 (a zero scalar). Recalling conditions (a) and (b) of Section 2.1, this means that π_i is a cointegrating vector. If a_1, a_2, …, a_h form a basis for the space of cointegrating vectors, then it must be possible to express π_i as a linear combination of a_1, a_2, …, a_h; that is, there exists an (h × 1) vector b_i such that

π_i = [a_1 a_2 … a_h] b_i,

or

π_i′ = b_i′ A′,

for A′ the (h × k) matrix whose ith row is a_i′. Applying this reasoning to each of the rows of Φ(1), i.e.
Φ(1) = [π_1′; π_2′; ⋮; π_k′] = [b_1′A′; b_2′A′; ⋮; b_k′A′] = BA′, (13)

where B is a (k × h) matrix. However, the matrices A and B are not separately identified, since for any nonsingular (h × h) matrix Λ, the factorization Φ(1) = BΛ^{-1}ΛA′ = B*A*′, with B* ≡ BΛ^{-1} and A*′ ≡ ΛA′, implies the same distribution as Φ(1) = BA′. What can be determined is the space spanned by the columns of A (the cointegrating space), which is why we need the concept of a basis.
Note that (13) implies that the (k × k) matrix Φ(1) is a singular matrix, because

rank(Φ(1)) = rank(BA′) ≤ min(rank(B), rank(A′)) = h < k.
2.3 Vector Error Correction Representation
A final representation for a cointegrated system is obtained by recalling from equation (1) of Chapter 22 that any VAR (not necessarily cointegrated at this stage) of the form (6) can equivalently be written as

Δy_t = ζ_1 Δy_{t-1} + ζ_2 Δy_{t-2} + … + ζ_{p-1} Δy_{t-p+1} + c + ζ_0 y_{t-1} + ε_t, (14)

where

ζ_0 ≡ −(I_k − Φ_1 − Φ_2 − … − Φ_p) = −Φ(1).

Note that if y_t has h cointegrating relations, then substitution of (13) into (14) results in

Δy_t = ζ_1 Δy_{t-1} + ζ_2 Δy_{t-2} + … + ζ_{p-1} Δy_{t-p+1} + c − BA′ y_{t-1} + ε_t. (15)

Denote z_t ≡ A′y_t, noticing that z_t is a stationary (h × 1) vector. Then (15) can be written as

Δy_t = ζ_1 Δy_{t-1} + ζ_2 Δy_{t-2} + … + ζ_{p-1} Δy_{t-p+1} + c − B z_{t-1} + ε_t. (16)
Expression (16) is known as the vector error-correction representation of the cointegrated system. It is interesting to see that while a cointegrated system can never be represented by a finite-order vector autoregression in the differenced data Δy_t, it does have a vector error-correction representation; the difference is that the former ignores the error-correction term, Bz_{t-1}.
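A minimal bivariate simulation of (16), with our own illustrative values A′ = (1, −1) and B = (0.3, −0.2)′ and no lagged-difference terms, shows the error-correction mechanism at work: the equilibrium error z_t is pulled back toward zero while the level series themselves wander:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
A = np.array([1.0, -1.0])        # cointegrating vector: z_t = y1_t - y2_t
B = np.array([0.3, -0.2])        # adjustment speeds (illustrative values)
c = np.zeros(2)

y = np.zeros((T, 2))
for t in range(1, T):
    z_lag = A @ y[t - 1]                          # equilibrium error z_{t-1}
    dy = c - B * z_lag + rng.standard_normal(2)   # Delta y_t = c - B z_{t-1} + eps_t
    y[t] = y[t - 1] + dy

z = y @ A
print(np.var(z), np.var(y[:, 0]))  # z has bounded variance; the levels wander
```

Here Δz_t = −0.5 z_{t-1} + noise, so z_t is a stationary AR(1), while each level series inherits a common stochastic trend.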
Example:
Let the individual elements of (p_t, s_t, p*_t)′ all be I(1), with a single cointegrating vector a = (1, −1, −1)′ among them. Then these three variables have a VECM representation

[Δp_t; Δs_t; Δp*_t] = ζ_1 [Δp_{t-1}; Δs_{t-1}; Δp*_{t-1}] + ζ_2 [Δp_{t-2}; Δs_{t-2}; Δp*_{t-2}] + … + ζ_{p-1} [Δp_{t-p+1}; Δs_{t-p+1}; Δp*_{t-p+1}] + [c_p; c_s; c_{p*}] − [b_1; b_2; b_3] [1 −1 −1] [p_{t-1}; s_{t-1}; p*_{t-1}] + [ε^{(p)}_t; ε^{(s)}_t; ε^{(p*)}_t],

where ζ_j = [ζ^{(j)}_{il}] (i, l = 1, 2, 3) are the (3 × 3) short-run coefficient matrices,
from which we see that the dynamics of the change in each variable depend not only on lags of its own and the other variables' changes, but also on the levels of the elements of z_{t-1}, through the adjustment speeds B. The first row, for example, is

Δp_t = ζ^{(1)}_{11} Δp_{t-1} + ζ^{(1)}_{12} Δs_{t-1} + ζ^{(1)}_{13} Δp*_{t-1} + ζ^{(2)}_{11} Δp_{t-2} + ζ^{(2)}_{12} Δs_{t-2} + ζ^{(2)}_{13} Δp*_{t-2}
      + … + ζ^{(p-1)}_{11} Δp_{t-p+1} + ζ^{(p-1)}_{12} Δs_{t-p+1} + ζ^{(p-1)}_{13} Δp*_{t-p+1} + c_p
      − b_1 (p_{t-1} − s_{t-1} − p*_{t-1}) + ε^{(p)}_t
    = … + c_p − b_1 z_{t-1} + ε^{(p)}_t.

In terms of economic equilibrium, when there is a positive equilibrium error in the previous period, i.e. z_{t-1} = p_{t-1} − s_{t-1} − p*_{t-1} > 0, the change in p_t at time t, Δp_t = p_t − p_{t-1}, should be negatively related to this equilibrium error. Therefore, the adjustment parameter b_1 on the equilibrium error should be positive in (16), since it enters with a minus sign.
3 Other Representations for Cointegration
3.1 Phillips’s Triangular Representation
Another convenient representation for a cointegrated system was introduced by Phillips (1991). Suppose that the rows of the (h × k) matrix A′ form a basis for the space of cointegrating vectors. By reordering the variables and normalizing, the cointegrating relations can be represented in the form

A′ = [a_1′; a_2′; ⋮; a_h′] = [1 0 … 0 −γ_{1,h+1} −γ_{1,h+2} … −γ_{1,k}; 0 1 … 0 −γ_{2,h+1} −γ_{2,h+2} … −γ_{2,k}; ⋮; 0 0 … 1 −γ_{h,h+1} −γ_{h,h+2} … −γ_{h,k}] = [I_h  −Γ′],

where Γ′ is an (h × g) matrix of coefficients, with g ≡ k − h.
Let z_t denote the errors associated with this set of cointegrating relations:

z_t ≡ A′y_t.

Since z_t is I(0), its mean μ_1 ≡ E(z_t) exists, and we can define

z*_t ≡ z_t − μ_1.

Partition y_t as

y_t = [y_{1t} (h × 1); y_{2t} (g × 1)].

Then

z_t = z*_t + μ_1 = [I_h  −Γ′][y_{1t}; y_{2t}],

or

y_{1t} = Γ′ y_{2t} + z*_t + μ_1, (17)

with y_{1t}, z*_t, and μ_1 of dimension (h × 1) and Γ′ of dimension (h × g).
A representation for y_{2t} is given by the last g rows of (1):

Δy_{2t} = δ_2 + u_{2t}, (18)

where δ_2 and u_{2t} are the last g elements of the (k × 1) vectors δ and u_t in (1), respectively. Equations (17) and (18) constitute Phillips's (1991) triangular representation of a system with exactly h cointegrating relations. Note that z*_t and u_{2t} are zero-mean stationary disturbances in this representation.
Example:
Let the individual elements of (p_t, s_t, p*_t)′ all be I(1), with a single cointegrating vector a = (1, −γ_1, −γ_2)′ among them. The triangular representation of these three variables is: given

A′ = a′ = [1  −γ_1  −γ_2],

then

p_t = γ_1 s_t + γ_2 p*_t + μ_1 + z*_t,
Δs_t = δ_s + u_{st},
Δp*_t = δ_{p*} + u_{p*,t},

where the hypothesized values are γ_1 = γ_2 = 1.
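The reordering-and-normalization step can be carried out mechanically: premultiply any basis A′ by the inverse of its leading (h × h) block (assumed nonsingular, possibly after reordering the variables). A sketch with an arbitrary numeric A′ of our own (h = 2, k = 4):

```python
import numpy as np

# An arbitrary (h x k) cointegrating matrix, h = 2, k = 4
A0 = np.array([[2.0, 1.0, -1.0, 3.0],
               [1.0, -1.0, 2.0, 0.0]])
h = A0.shape[0]

# Premultiply by the inverse of the leading (h x h) block:
# the normalized rows take the form [I_h, -Gamma'] whenever that block
# is nonsingular, and they span the same cointegrating space
A0_norm = np.linalg.solve(A0[:, :h], A0)
Gamma_T = -A0_norm[:, h:]          # the (h x g) matrix Gamma'

print(A0_norm[:, :h])              # identity block I_h
```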
3.2 The Stock-Watson Common Trends Representation
Another useful representation for any cointegrated system was proposed by Stock and Watson (1988). Suppose that a (k × 1) vector y_t is characterized by exactly h cointegrating relations, with g = k − h. We have seen that it is possible to order the elements of y_t in such a way that a triangular representation of the form (17)-(18) exists, with (z*_t′, u_{2t}′)′ an I(0) (k × 1) vector with zero mean. Suppose that

[z*_t; u_{2t}] = Σ_{s=0}^{∞} [H_s ε_{t-s}; J_s ε_{t-s}]
for "t an (k 1) white noise process with fsHsg1s=0 and fsHsg1s=0 absolutely
summable sequences of (h k) and (g k) matrices, respectively. From B-N
decomposition we have
y2t = y2;0 + 2 t +
tX
s=1
u2s
= y2;0 + 2 t + J(1) ("1 +"2 + ::: +"t) + 2t 2;0; (19)
where J(1) (J0 +J1 +J2 +:::), 2t P1s=0 2s"t s, and 2s (Js+1 +Js+2 +
Js+3 + :::). Since the (k 1) vector "t is white noise, the (g 1) vector J(1)"t is
also white noise, implying that each element of the (g 1) vector 2t de ned by
2t = J(1) ("1 +"2 + ::: +"t) (20)
is described by a random walk.
Substituting (20) into (19) results in
y2t = ~ 2 + 2 t + 2t + 2t (21)
for ~ 2 = (y2;0 2;0).
Substituting (21) into (17) produces

y_{1t} = μ̃_1 + Γ′(δ_2 t + ξ_{2t}) + η̃_{1t} (22)

for μ̃_1 ≡ μ_1 + Γ′μ̃_2 and η̃_{1t} ≡ z*_t + Γ′η_{2t}.

Equations (21) and (22) give Stock and Watson's (1988) common trends representation. These equations show that the vector y_t can be described as a stationary component,

[μ̃_1; μ̃_2] + [η̃_{1t}; η_{2t}],

plus linear combinations of up to g common deterministic trends, as described by the (g × 1) vector δ_2 t, and linear combinations of the g common stochastic trends, as described by the (g × 1) vector ξ_{2t}. Therefore, saying that a (k × 1) vector y_t is characterized by exactly h cointegrating relations is equivalent to saying that there are g (= k − h) common trends among its elements.
4 Estimation and Testing of Cointegration from a Single Equation
4.1 Testing for Cointegration When the Cointegrating Vector is Known

Often, when theoretical considerations suggest that certain variables will be cointegrated, that is, that a′y_t is stationary for some (k × 1) cointegrating vector a, the theory is based on a particular known value of a. In the purchasing power parity example, a = (1, −1, −1)′. Given that the null hypothesis of a unit root cannot be rejected by the various unit-root tests applied to the individual series p_t, s_t, and p*_t, the next step is to test whether their particular linear combination z_t = a′y_t = p_t − s_t − p*_t is stationary, using the same unit-root tests. See the example on p. 585 of Hamilton.
4.2 Testing the Null Hypothesis of No Cointegration: Residual-Based Tests for Cointegration

If the theoretical model of the system dynamics does not suggest a particular value for the cointegrating vector a, then one approach is first to estimate a by OLS.
4.2.1 Estimating The Cointegrating Vector
If it is known for certain that the cointegrating vector has a nonzero coefficient on the first element of y_t (a_1 ≠ 0), then a particularly convenient normalization is to set a_1 = 1 and represent the subsequent entries of a, (a_2, a_3, …, a_n), as the negatives of a set of unknown parameters (γ_2, γ_3, …, γ_n):

[a_1; a_2; a_3; ⋮; a_n] = [1; −γ_2; −γ_3; ⋮; −γ_n].

Then consistent estimation of a is achieved by an OLS regression of the first element of y_t on all of the others:

y_{1t} = γ_2 y_{2t} + γ_3 y_{3t} + … + γ_n y_{nt} + u_t. (23)

Consistent estimates of γ_2, γ_3, …, γ_n are also obtained when a constant term is included in (23), as in

y_{1t} = α + γ_2 y_{2t} + γ_3 y_{3t} + … + γ_n y_{nt} + u_t,

or

y_{1t} = α + γ′y_{2t} + u_t,

where γ = (γ_2, γ_3, …, γ_n)′ and y_{2t} = (y_{2t}, y_{3t}, …, y_{nt})′.
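The estimation itself is a single least-squares fit. A sketch with one I(1) regressor (the values α = 1 and γ_2 = 2 are our own):

```python
import numpy as np

rng = np.random.default_rng(3)
T, gamma2 = 500, 2.0

y2 = np.cumsum(rng.standard_normal(T))           # I(1) regressor
y1 = 1.0 + gamma2 * y2 + rng.standard_normal(T)  # cointegrated with y2

# OLS of y1 on a constant and y2: coefficients (alpha_hat, gamma2_hat)
X = np.column_stack([np.ones(T), y2])
alpha_hat, gamma2_hat = np.linalg.lstsq(X, y1, rcond=None)[0]
print(gamma2_hat)    # very close to 2
```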
Theorem 1 (Stock, 1986):
Let y_{1t} be a scalar and y_{2t} a (g × 1) vector. Let k ≡ g + 1, and suppose that the (k × 1) vector (y_{1t}, y_{2t}′)′ is characterized by exactly one cointegrating relation (h = 1), with a nonzero coefficient on y_{1t}. Let the triangular representation of the system be

y_{1t} = α + γ′y_{2t} + z*_t, (24)
Δy_{2t} = u_{2t}. (25)

Suppose that

[z*_t; u_{2t}] = Ψ(L) ε_t,

where ε_t is a (k × 1) i.i.d. vector with mean zero, finite fourth moments, and positive definite variance-covariance matrix E(ε_t ε_t′) = PP′. Suppose further that the sequence of (k × k) matrices {s·Ψ_s}_{s=0}^{∞} is absolutely summable and that the rows of Ψ(1) are linearly independent. Let α̂_T and γ̂_T be the OLS estimators of (24). Partition Ψ(1)P as

Ψ(1)P = [λ_1′ (1 × n); Λ_2 (g × n)].

Then

[T^{1/2}(α̂_T − α); T(γ̂_T − γ)] →_L [1, (∫[W(r)]′ dr) Λ_2′; Λ_2 ∫W(r) dr, Λ_2 (∫[W(r)][W(r)]′ dr) Λ_2′]^{-1} [h_1; h_2], (26)

where W(r) is n-dimensional standard Brownian motion (with n = k), the integral sign denotes integration over r from 0 to 1, and

h_1 ≡ λ_1′ W(1),
h_2 ≡ Λ_2 (∫[W(r)][W(r)]′ dr) λ_1 + Σ_{v=0}^{∞} E(u_{2t} z*_{t+v}).

This theorem shows that the OLS estimator of the cointegrating vector is consistent. Note, however, that the correlation between the regressors y_{2t} and the error z*_t does not induce inconsistency of γ̂_T; instead, the asymptotic distribution exhibits a bias, since the distribution of T(γ̂_T − γ) is not centered around zero.
In the next chapter we will consider system estimation of cointegrating vectors. Banerjee et al. (1993, p. 214) examined one of the main reasons for using such estimation: the large finite-sample biases that can arise in static OLS estimates of cointegrating vectors or parameters. While such estimators are super-consistent (T-consistent), Monte Carlo experiments nonetheless suggest that a large number of observations may be necessary before the biases become small.
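The T-consistency can be seen in a small Monte Carlo of our own design: quadrupling the sample length should cut the typical error in γ̂ by a factor of roughly four, not the factor of two that ordinary √T-consistency would give:

```python
import numpy as np

rng = np.random.default_rng(4)
gamma = 1.0

def mean_abs_error(T, reps=500):
    """Mean |gamma_hat - gamma| for OLS on a cointegrated pair of length T."""
    errs = []
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))      # I(1) regressor
        y = gamma * x + rng.standard_normal(T)     # cointegrated with x
        X = np.column_stack([np.ones(T), x])
        g_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]
        errs.append(abs(g_hat - gamma))
    return np.mean(errs)

e100, e400 = mean_abs_error(100), mean_abs_error(400)
print(e100, e400)   # the error shrinks roughly like 1/T
```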
4.2.2 What Is the Regression Estimating When There Is More Than One Cointegrating Relation?

The limiting distribution of the OLS estimator in Theorem 1 was derived under the assumption that there is only one cointegrating relation (h = 1). In the more general case with h > 1, OLS estimation of (24) should still provide a consistent estimate of a cointegrating vector. But which cointegrating vector is it? Wooldridge (1991) shows that among the set of possible cointegrating relations, OLS estimation of (24) selects the relation whose residuals are uncorrelated with any other I(1) linear combination of (y_{2t}, y_{3t}, …, y_{nt}).
4.2.3 What Is the Regression Estimating When There Is No Cointegrating Relation?

Let us now consider the properties of OLS estimation when there is no cointegrating relation. Then (24) is a regression of an I(1) variable on a set of (k − 1) I(1) variables for which no coefficients produce an I(0) error term. The regression is therefore subject to the spurious regression problem described in Chapter 22. The coefficients α̂ and γ̂ do not provide consistent estimates of any population parameter, and the OLS sample residuals û_t will be nonstationary. However, this last property can be exploited to test for cointegration: if there is no cointegration, a regression of û_t on û_{t-1} should yield a unit coefficient, whereas if there is cointegration, it should yield a coefficient less than one.

The proposal is thus to estimate (24) by OLS and then to apply one of the standard unit-root tests to the estimated residuals, such as the ADF t test or the Phillips-Perron Z_ρ or Z_t test. Although these test statistics are constructed in the same way as when they are applied to an individual series y_t, when the tests are applied to the residuals û_t from a possibly spurious regression, the critical values used to interpret them differ from those employed in Chapter 21.
Theorem 2 (Residual-Based Tests for Cointegration, with No Cointegration as the Null):
Consider a (k × 1) vector y_t such that

(1 − L) y_t = Ψ(L) ε_t = Σ_{s=0}^{∞} Ψ_s ε_{t-s},

for ε_t an i.i.d. vector with mean zero, variance E(ε_t ε_t′) = Ω = PP′, and finite fourth moments, and where {s·Ψ_s}_{s=0}^{∞} is absolutely summable. Let g = k − 1 and Λ = Ψ(1)P. Suppose that the (n × n) matrix ΛΛ′ is nonsingular, and let L denote the Cholesky factor of (ΛΛ′)^{-1}. Partition y_t as y_t = (Y_{1t}, y_{2t}′)′ and consider the OLS regression

Y_{1t} = α̂_T + y_{2t}′ γ̂_T + û_t. (27)

The residuals û_t can then be regressed on their own lagged values û_{t-1} without a constant term (since the original regression (27) contains a constant term, the residuals have mean zero):

û_t = ρ û_{t-1} + e_t, (28)

yielding the OLS estimate

ρ̂_T = (Σ û_t û_{t-1}) / (Σ û²_{t-1}). (29)

We may form the standard Dickey-Fuller and Phillips-Perron (Z_ρ, Z_t) statistics from (28). Alternatively, we can form an ADF test from

û_t = ζ_1 Δû_{t-1} + ζ_2 Δû_{t-2} + … + ζ_{p-1} Δû_{t-p+1} + ρ û_{t-1} + e_t. (30)
Then the following results hold.

(a) The statistic ρ̂ defined in (29) satisfies (standard DF test)

(T − 1)(ρ̂ − 1) →_L { (1/2)[1  −h_2′][w*(1)][w*(1)]′[1; −h_2] − h_1 [w*(1)]′[1; −h_2] − (1/2)[1  −h_2′] L′[E(Δy_t Δy_t′)] L [1; −h_2] } / H_n. (31)

Here w* denotes n-dimensional standard Brownian motion, partitioned as

w* = [W*_1(r) (1 × 1); w*_2(r) (g × 1)];

h_1 is a scalar and h_2 is a (g × 1) vector, given by

[h_1; h_2] ≡ [1, ∫_0^1 [w*_2(r)]′ dr; ∫_0^1 w*_2(r) dr, ∫_0^1 [w*_2(r)][w*_2(r)]′ dr]^{-1} [∫_0^1 W*_1(r) dr; ∫_0^1 w*_2(r) W*_1(r) dr];

and

H_n ≡ ∫_0^1 [W*_1(r)]² dr − [∫_0^1 W*_1(r) dr, ∫_0^1 [W*_1(r)][w*_2(r)]′ dr] [h_1; h_2].
(b) If l → ∞ (the Newey-West truncation parameter) as T → ∞ but l/T → 0, then the Phillips-Perron statistic Z_ρ constructed from û_t satisfies

Z_ρ →_L Z_n, (32)

where

Z_n ≡ { (1/2)[1  −h_2′][w*(1)][w*(1)]′[1; −h_2] − h_1 [w*(1)]′[1; −h_2] − (1/2)(1 + h_2′h_2) } / H_n. (33)

(c) If l → ∞ as T → ∞ but l/T → 0, then the Phillips-Perron statistic Z_t constructed from û_t satisfies

Z_t →_L Z_n √H_n / (1 + h_2′h_2)^{1/2}. (34)
(d) If, in addition to the preceding assumptions, Δy_t follows a zero-mean stationary vector ARMA process, and if p → ∞ as T → ∞ but p/T^{1/3} → 0, then the ADF t-test statistic associated with (30) has the same limiting distribution as the statistic Z_t described in (34).

Result (a) implies that ρ̂ →_p 1. Note that although W*_1(r) and w*_2(r) are standard Brownian motions, so that the distributions of the terms h_1, h_2, H_n, and Z_n above depend only on the number of stochastic explanatory variables (k − 1) included in the cointegrating regression and on whether a constant term appears in the original cointegrating regression, the distribution of (T − 1)(ρ̂ − 1) is affected by the variance, correlations, and dynamics of Δy_t.
In the special case when Δy_t is i.i.d., Ψ(L) = I_n, and the matrix ΛΛ′ = E(Δy_t Δy_t′). Since LL′ = (ΛΛ′)^{-1}, it follows that ΛΛ′ = (L′)^{-1}(L)^{-1}. Hence,

L′[E(Δy_t Δy_t′)]L = L′(ΛΛ′)L = L′[(L′)^{-1}(L)^{-1}]L = I_n. (35)

If (35) is substituted into (31), the result is that when Δy_t is i.i.d.,

T(ρ̂ − 1) →_L Z_n

for Z_n defined in (33).

In the more general case when Δy_t is serially correlated, the limiting distribution of T(ρ̂ − 1) depends on the nature of this correlation, as captured by the elements of L. However, the corrections for autocorrelation implicit in Phillips's Z_ρ and Z_t statistics, or in the augmented Dickey-Fuller t test, turn out to generate variables whose distributions do not depend on any nuisance parameters.
Although the distributions of Z_ρ, Z_t, and the ADF t do not depend on nuisance parameters, the distributions of these statistics when calculated from the residuals û_t are not the same as the distributions they would have if calculated from the raw data y_t. Moreover, different values of (k − 1) (the number of stochastic explanatory variables in the cointegrating regression) imply different characterizations of the limiting variables h_1, h_2, H_n, and Z_n, meaning that a different critical value must be used to interpret Z_ρ for each value of (k − 1). Similarly, the asymptotic distributions of h_2, H_n, and Z_n differ depending on whether a constant term is included in the cointegrating regression.
Example:
See the purchasing power parity example on Hamilton’s p.598.
Exercise:
Reproduce the values in case 2 of Tables B.8 and B.9 on Hamilton's pp. 765-766.
4.3 Tests with Cointegration as Null
The tests considered in the previous sections are for the null hypothesis of no cointegration; they are based on testing the unit-root hypothesis in the residuals of the cointegrating regression. In Chapter 21 we discussed unit-root tests with stationarity as the null hypothesis (e.g. the KPSS test). Correspondingly, there are tests with cointegration as the null. They include:
(a) the Leybourne and McCabe (1993) test, which is based on an unobserved components model;
(b) the Park and Choi (1988, 1990) tests, which are based on testing the significance of superfluous regressors;
(c) the Shin (1994) test, which is a residual-based test;
(d) the Harris and Inder (1994) test, which uses a non-parametric correction procedure in the estimation of the cointegrating regression.
4.4 Testing Hypotheses About the Cointegrating Vector

The previous sections described some ways to test whether a vector y_t is cointegrated. It was noted that if y_t is cointegrated, then a consistent estimate of the cointegrating vector can be obtained by OLS. However, difficulties arise from the nonstandard distributions of hypothesis tests about the cointegrating vector, due to the possibility of nonzero correlation between z*_t and u_{2t}. The nuisance parameters λ_1 and Λ_2, which appear in (26), also cause a problem. The basic approach to constructing hypothesis tests is therefore to transform the regression or the estimator so as to eliminate the effects of this correlation. The first such estimator is Stock and Watson's (1993) dynamic OLS, which corrects for the correlation by adding leads and lags of Δy_{2t}. The second is Phillips and Hansen's (1990) fully modified OLS estimator, which modifies OLS in two respects. See Hatanaka (1996), p. 266, for details.
5 Simulation of Bivariate Cointegrated System
To illustrate the potential differences in size and power among various residual-based tests for cointegration in finite samples, a Monte Carlo experiment proposed by Cheung and Lai (1993), similar to that of Engle and Granger (1987), can be conducted. A bivariate system of x_t and y_t is modeled by

x_t + y_t = u_t (36)

and

x_t + 2y_t = v_t, (37)

with (1 − L) u_t = ε_t, and with v_t generated as an AR(1) process:

(1 − ρL) v_t = η_t. (38)

The innovations ε_t and η_t are generated as independent standard normal variates. When v_t is given by (38) with |ρ| < 1, x_t and y_t are cointegrated, and (37) is their cointegrating relationship. However, when |ρ| = 1, the two series are not cointegrated.
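A scaled-down sketch of this experiment (far fewer replications and a shorter sample than the exercise below specifies, and only a plain DF t-test on the residuals rather than the full ADF/PP menu, so the rejection rates are indicative only; the critical value −3.37 is an approximate 5% Engle-Granger value for two variables with a constant term):

```python
import numpy as np

rng = np.random.default_rng(6)

def df_tstat(uhat):
    """DF t-statistic from regressing uhat_t on uhat_{t-1} (no constant)."""
    u, ul = uhat[1:], uhat[:-1]
    rho = (ul @ u) / (ul @ ul)
    e = u - rho * ul
    se = np.sqrt((e @ e) / (len(e) - 1) / (ul @ ul))
    return (rho - 1.0) / se

def rejection_rate(rho_v, T=100, reps=500, crit=-3.37):
    """Share of replications whose residual DF t-stat falls below crit."""
    count = 0
    for _ in range(reps):
        u = np.cumsum(rng.standard_normal(T))    # (1 - L) u_t = eps_t
        v = np.zeros(T)                          # (1 - rho_v L) v_t = eta_t
        eta = rng.standard_normal(T)
        for t in range(1, T):
            v[t] = rho_v * v[t - 1] + eta[t]
        y, x = v - u, 2.0 * u - v                # invert (36)-(37)
        # Step 1: cointegrating regression of x on a constant and y
        X = np.column_stack([np.ones(T), y])
        uhat = x - X @ np.linalg.lstsq(X, x, rcond=None)[0]
        # Step 2: DF t-test on the residuals
        if df_tstat(uhat) < crit:
            count += 1
    return count / reps

size = rejection_rate(rho_v=1.0)    # no cointegration: near the nominal 5%
power = rejection_rate(rho_v=0.85)  # cointegration: well above the size
print(size, power)
```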
Exercise:
Use a simulation based on 10,000 replications with a sample of size 500 to compare the size and power (ρ = 0.85) of the residual-based ADF and PP tests for cointegration at a nominal size of 5%. The truncation number is chosen as p = l = 4.