Ch. 19 Models of Nonstationary Time Series
In time series analysis we do not confine ourselves to the analysis of stationary
time series. In fact, most of the time series we encounter are nonstationary. How
to deal with nonstationary data, and how to use what we have learned from
stationary models, are the main subjects of this chapter.
1 Integrated Process
Consider the following two processes
$$X_t = \phi X_{t-1} + u_t, \qquad |\phi| < 1,$$
$$Y_t = Y_{t-1} + v_t,$$
where $u_t$ and $v_t$ are mutually uncorrelated white noise processes with variances
$\sigma_u^2$ and $\sigma_v^2$, respectively. Both $X_t$ and $Y_t$ are AR(1) processes. The difference
between the two models is that $Y_t$ is the special case of the $X_t$ process with $\phi = 1$
and is called a random walk process. It is also referred to as an AR(1) model
with a unit root, since the root of the AR(1) polynomial is 1. When we consider
the statistical behavior of the two processes by investigating the mean (the first
moment) and the variance and autocovariance (the second moments), they are
completely different. Although the two processes belong to the same AR(1) class,
$X_t$ is a stationary process, while $Y_t$ is a nonstationary process.
Assume that $t \in T$, $T = \{0, 1, 2, \ldots\}$;$^1$ then the two stochastic processes can be
expressed as
$$X_t = \phi^t X_0 + \sum_{i=0}^{t-1} \phi^i u_{t-i}.$$
Similarly, in the unit root case,
$$Y_t = Y_0 + \sum_{i=0}^{t-1} v_{t-i}.$$
Suppose that the initial observations are zero, $X_0 = 0$ and $Y_0 = 0$. The means of
the two processes are
$$E(X_t) = 0 \quad \text{and} \quad E(Y_t) = 0,$$
$^1$This assumption is required to derive the convergence of an integrated process to standard
Brownian motion. A standard Brownian motion is defined on $t \in [0, 1]$.
and the variances are
$$\mathrm{Var}(X_t) = \sum_{i=0}^{t-1} \phi^{2i}\,\mathrm{Var}(u_{t-i}) \rightarrow \frac{1}{1-\phi^2}\,\sigma_u^2$$
and
$$\mathrm{Var}(Y_t) = \sum_{i=0}^{t-1} \mathrm{Var}(v_{t-i}) = t\,\sigma_v^2.$$
The autocovariances of the two series are
$$\gamma_\tau^X = E(X_t X_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1}\phi^i u_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1}\phi^i u_{t-\tau-i}\right)\right]$$
$$= E[(u_t + \phi u_{t-1} + \cdots + \phi^\tau u_{t-\tau} + \cdots + \phi^{t-1}u_1)(u_{t-\tau} + \phi u_{t-\tau-1} + \cdots + \phi^{t-\tau-1}u_1)]$$
$$= \sum_{i=0}^{t-\tau-1}\phi^{\tau+2i}\,\sigma_u^2 = \sigma_u^2\,\phi^\tau\left(\sum_{i=0}^{t-\tau-1}\phi^{2i}\right) \rightarrow \frac{\phi^\tau}{1-\phi^2}\,\sigma_u^2 = \phi^\tau\,\gamma_0^X$$
and
$$\gamma_\tau^Y = E(Y_t Y_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1}v_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1}v_{t-\tau-i}\right)\right]$$
$$= E[(v_t + v_{t-1} + \cdots + v_{t-\tau} + v_{t-\tau-1} + \cdots + v_1)(v_{t-\tau} + v_{t-\tau-1} + \cdots + v_1)]$$
$$= (t-\tau)\,\sigma_v^2.$$
We may expect that the autocorrelation functions are
$$r_\tau^X = \frac{\gamma_\tau^X}{\gamma_0^X} = \phi^\tau \rightarrow 0$$
and
$$r_\tau^Y = \frac{\gamma_\tau^Y}{\gamma_0^Y} = \frac{t-\tau}{t} \rightarrow 1 \quad \forall\,\tau.$$
The means of $X_t$ and $Y_t$ are the same, but the variances (including the autocovariances)
are different. The important thing to note is that the variances and
autocovariances of $Y_t$ are functions of $t$, while those of $X_t$ converge to constants
asymptotically. Thus as $t$ increases the variance of $Y_t$ increases, while the variance
of $X_t$ converges to a constant.
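This contrast is easy to verify by simulation. The following sketch (illustrative only; the values $\phi = 0.5$, $\sigma_u = \sigma_v = 1$, the horizon, and the number of replications are arbitrary choices, not taken from the text) generates many replications of both processes and compares the cross-sectional variance at each date with $\sigma_u^2(1-\phi^{2t})/(1-\phi^2)$ and $t\,\sigma_v^2$:

import numpy as np

rng = np.random.default_rng(0)
phi, sigma_u, sigma_v = 0.5, 1.0, 1.0     # illustrative parameter choices
T, n_rep = 200, 5000                      # horizon and number of replications

u = rng.normal(0.0, sigma_u, size=(n_rep, T))
v = rng.normal(0.0, sigma_v, size=(n_rep, T))

X = np.zeros((n_rep, T))   # stationary AR(1): X_t = phi * X_{t-1} + u_t, X_0 = 0
Y = np.zeros((n_rep, T))   # random walk:      Y_t = Y_{t-1} + v_t,       Y_0 = 0
for t in range(1, T):
    X[:, t] = phi * X[:, t - 1] + u[:, t]
    Y[:, t] = Y[:, t - 1] + v[:, t]

for t in (10, 50, 199):
    var_x_theory = sigma_u**2 * (1 - phi**(2 * t)) / (1 - phi**2)
    print(f"t={t:3d}  Var(X_t)={X[:, t].var():5.2f} (theory {var_x_theory:.2f})"
          f"   Var(Y_t)={Y[:, t].var():7.2f} (theory {t * sigma_v**2:.2f})")

The printed variances of $X_t$ stabilize near $\sigma_u^2/(1-\phi^2)$, while those of $Y_t$ keep growing in proportion to $t$.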
If we add a constant to the AR(1) processes, then the means of the two processes
also behave differently. Consider the AR(1) processes with a constant (or drift) as
follows:
$$X_t = \alpha + \phi X_{t-1} + u_t, \qquad |\phi| < 1,$$
and
$$Y_t = \alpha + Y_{t-1} + v_t.$$
Successive substitution yields
$$X_t = \phi^t X_0 + \sum_{i=0}^{t-1}\phi^i\alpha + \sum_{i=0}^{t-1}\phi^i u_{t-i}$$
and
$$Y_t = Y_0 + \alpha t + \sum_{i=0}^{t-1} v_{t-i}. \qquad (1)$$
Note that $Y_t$ contains a (deterministic) trend $\alpha t$. If the initial observations are
zero, $X_0 = 0$ and $Y_0 = 0$, then the means of the two processes are
$$E(X_t) \rightarrow \frac{\alpha}{1-\phi}, \qquad E(Y_t) = \alpha t,$$
but the variances and the autocovariances are the same as those derived from the
AR(1) models without the constant. By adding a constant to the AR(1) processes,
the means of the two processes, as well as the variances, behave differently. Both the mean
and variance of $Y_t$ are time varying, while those of $X_t$ are constant.
Since the variance (the second moment) and even the mean (the first moment) of
a nonstationary series are not constant over time, the conventional asymptotic
theory cannot be applied to such series (recall the moment condition in the CLT
on p. 22 of Ch. 4).
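A short continuation of the simulation above (again only a sketch; the drift $\alpha = 0.2$ is an arbitrary assumed value) adds the constant to both recursions and shows the sample mean of $X_t$ settling near $\alpha/(1-\phi)$ while the sample mean of $Y_t$ grows like $\alpha t$:

import numpy as np

rng = np.random.default_rng(1)
phi, alpha, T, n_rep = 0.5, 0.2, 200, 5000    # illustrative values for the drift case

u = rng.normal(size=(n_rep, T))
v = rng.normal(size=(n_rep, T))
X = np.zeros((n_rep, T))   # X_t = alpha + phi * X_{t-1} + u_t
Y = np.zeros((n_rep, T))   # Y_t = alpha + Y_{t-1} + v_t
for t in range(1, T):
    X[:, t] = alpha + phi * X[:, t - 1] + u[:, t]
    Y[:, t] = alpha + Y[:, t - 1] + v[:, t]

for t in (10, 50, 199):
    print(f"t={t:3d}  mean of X_t = {X[:, t].mean():5.2f} (limit {alpha / (1 - phi):.2f})"
          f"   mean of Y_t = {Y[:, t].mean():6.2f} (theory {alpha * t:.2f})")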
2 Deterministic Trend and Stochastic Trend
Many economic and financial time series trend upward over time (such as
GNP, M2, stock indexes, etc.). See the plots in Hamilton, p. 436. For a long time
each trending (nonstationary) economic time series was decomposed into a
deterministic trend and a stationary process. In recent years the idea of a stochastic
trend has emerged and enriched the framework used to investigate
economic time series.
2.1 Detrending Methods
2.1.1 Differencing-Stationary
One of the easiest ways to analyze these nonstationary, trending series is to make
them stationary by differencing. In our example, the random walk
with drift $Y_t$ can be transformed into a stationary series by differencing once:
$$\Delta Y_t = Y_t - Y_{t-1} = (1-L)Y_t = \alpha + v_t.$$
Since $v_t$ is assumed to be a white noise process, the first difference of $Y_t$ is
stationary, and the variance of $\Delta Y_t$ is constant over the sample period. In the I(1)
process
$$Y_t = Y_0 + \alpha t + \sum_{i=0}^{t-1} v_{t-i}, \qquad (2)$$
$\alpha t$ is a deterministic trend while $\sum_{i=0}^{t-1} v_{t-i}$ is a stochastic trend.
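A minimal illustration (the drift $\alpha = 0.2$ and the sample size are assumed values, not from the text): differencing a simulated random walk with drift yields a series whose sample mean is close to $\alpha$ and whose variance does not grow over the sample:

import numpy as np

rng = np.random.default_rng(2)
alpha, T = 0.2, 500                      # illustrative drift and sample size
v = rng.normal(size=T)
Y = np.cumsum(alpha + v)                 # random walk with drift, Y_0 = 0

dY = np.diff(Y)                          # Delta Y_t = alpha + v_t
print("sample mean of Delta Y:", round(dY.mean(), 3))             # close to alpha
print("variance, first half :", round(dY[:len(dY)//2].var(), 3))
print("variance, second half:", round(dY[len(dY)//2:].var(), 3))  # no growth over time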
When a nonstationary series can be transformed into a stationary series by
differencing once, the series is said to be integrated of order 1 and is denoted
I(1), or, in common usage, a unit root process. If the series needs to be differenced $d$
times to become stationary, then the series is said to be I($d$). An I($d$) series ($d \neq 0$)
is also called a differencing-stationary process (DSP). When $(1-L)^d Y_t$
is a stationary and invertible series that can be represented by an ARMA($p,q$)
model, i.e.
$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)(1-L)^d Y_t = \alpha + (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)\varepsilon_t \qquad (3)$$
or
$$\phi(L)\,\Delta^d Y_t = \alpha + \theta(L)\varepsilon_t,$$
where all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle, we say
that $Y_t$ is an autoregressive integrated moving-average ARIMA($p,d,q$) process.
In particular, a unit root process, $d = 1$, or an ARIMA($p,1,q$) process, is therefore
$$\phi(L)\,\Delta Y_t = \alpha + \theta(L)\varepsilon_t$$
or
$$(1-L)Y_t = \delta + \psi(L)\varepsilon_t, \qquad (4)$$
where $\psi(L) = \phi^{-1}(L)\theta(L)$ is absolutely summable and $\delta = \phi^{-1}(1)\alpha$.
Successive substitution yields
$$Y_t = Y_0 + \delta t + \psi(L)\sum_{i=0}^{t-1}\varepsilon_{t-i}. \qquad (5)$$
2.1.2 Trend-Stationary
Another important class is the trend-stationary process (TSP). Consider the
series
$$X_t = \alpha + \delta t + \psi(L)\varepsilon_t, \qquad (6)$$
where the coefficients of $\psi(L)$ are absolutely summable.
The mean of $X_t$ is $E(X_t) = \alpha + \delta t$ and is not constant over time, while
the variance of $X_t$ is $\mathrm{Var}(X_t) = (1 + \psi_1^2 + \psi_2^2 + \cdots)\sigma_\varepsilon^2$ and is constant. Although
the mean of $X_t$ is not constant over the period, it can be forecast perfectly
whenever we know the value of $t$ and the parameters $\alpha$ and $\delta$. In this sense
it is stationary around the deterministic trend $\delta t$, and $X_t$ can be transformed to
stationarity by regressing it on time. Note that both the DSP model, equation (5),
and the TSP model, equation (6), exhibit a linear trend, but the appropriate
method of eliminating the trend differs. (It can be seen from the definition of a TSP
that the DSP is trend nonstationary.)
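The following sketch contrasts the two detrending operations on simulated data with $\psi(L) = 1$ (all parameter values are illustrative assumptions): regressing a TSP on a constant and time leaves residuals with little serial correlation, while the same regression applied to a random walk with drift leaves residuals that remain highly persistent.

import numpy as np

rng = np.random.default_rng(3)
T, alpha, delta = 500, 1.0, 0.2                 # illustrative parameters
t = np.arange(1, T + 1)

X = alpha + delta * t + rng.normal(size=T)      # TSP:  X_t = alpha + delta*t + eps_t
Y = np.cumsum(delta + rng.normal(size=T))       # DSP:  random walk with drift delta

Z = np.column_stack([np.ones(T), t])            # regressors: constant and time trend
for name, series in (("detrended TSP", X), ("detrended DSP", Y)):
    b = np.linalg.lstsq(Z, series, rcond=None)[0]
    e = series - Z @ b                          # residuals from the regression on time
    rho1 = np.corrcoef(e[:-1], e[1:])[0, 1]     # lag-1 autocorrelation of the residuals
    print(f"{name}: lag-1 autocorrelation = {rho1:.2f}")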
Most economic analysis is based on the variances and covariances among the variables.
For example, the OLS estimator from the regression of $Y_t$ on $X_t$ is the ratio
of the covariance between $Y_t$ and $X_t$ to the variance of $X_t$. Thus if the variances of the
variables behave differently, the conventional asymptotic theory is not applicable.
When the orders of integration differ, the variances of the processes
behave differently. For example, if $Y_t$ is an I(0) variable and $X_t$ is I(1), the OLS
estimator from the regression of $Y_t$ on $X_t$ converges to zero asymptotically, since
the denominator of the OLS estimator, the variance of $X_t$, increases as $t$ increases
and thus dominates the numerator, the covariance between $X_t$ and $Y_t$. That is,
the OLS estimator does not have a conventional asymptotic distribution. (It is degenerate
under the conventional normalization of $\sqrt{T}$. See Ch. 21 for details.)
2.2 Comparison of Trend-Stationary and Differencing-Stationary Processes
The best way to understand the meaning of stochastic and deterministic trends is to
compare their time series properties. This section compares a trend-stationary
process (6) with a unit root process (4) in terms of forecasts of the series, the variance
of the forecast error, the dynamic multiplier, and the transformations needed to achieve
stationarity.
2.2.1 Returning to a Central Line?
The TSP model (6) has a central line $\alpha + \delta t$ around which $X_t$ oscillates. Even
if a shock lets $X_t$ deviate temporarily from the line, a force brings
it back to the line. On the other hand, the unit root process (5) has no such
central line. One might wonder about a deterministic trend combined with a random
walk: the discrepancy between $Y_t$ and the line $Y_0 + \delta t$ becomes unbounded
as $t \to \infty$.
2.2.2 Forecast Error
The TSP and unit root specifications are also very different in their implications
for the variance of the forecast error. For the trend-stationary process (6), the $s$-period-ahead
forecast is
$$\hat{X}_{t+s|t} = \alpha + \delta(t+s) + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \cdots,$$
which is associated with the forecast error
$$X_{t+s} - \hat{X}_{t+s|t} = \{\alpha + \delta(t+s) + \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \psi_2\varepsilon_{t+s-2} + \cdots + \psi_{s-1}\varepsilon_{t+1} + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots\}$$
$$\qquad - \{\alpha + \delta(t+s) + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots\}$$
$$= \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \psi_2\varepsilon_{t+s-2} + \cdots + \psi_{s-1}\varepsilon_{t+1}.$$
The MSE of this forecast is
$$E[X_{t+s} - \hat{X}_{t+s|t}]^2 = \{1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{s-1}^2\}\,\sigma_\varepsilon^2.$$
The MSE increases with the forecasting horizon $s$, though as $s$ becomes large,
the added uncertainty from forecasting farther into the future becomes negligible:
$$\lim_{s\to\infty} E[X_{t+s} - \hat{X}_{t+s|t}]^2 = \{1 + \psi_1^2 + \psi_2^2 + \cdots\}\,\sigma_\varepsilon^2.$$
Note that the limiting MSE is just the unconditional variance of the stationary
component $\psi(L)\varepsilon_t$.
To forecast the unit root process (4), recall that the change $\Delta Y_t$ is a stationary
process that can be forecast using the standard formula:
$$\Delta\hat{Y}_{t+s|t} = \hat{E}[(Y_{t+s} - Y_{t+s-1})\,|\,Y_t, Y_{t-1},\ldots] = \delta + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \cdots.$$
The level of the variable at date $t+s$ is simply $Y_t$ plus the sum of the changes between $t$
and $t+s$:
$$Y_{t+s} = (Y_{t+s} - Y_{t+s-1}) + (Y_{t+s-1} - Y_{t+s-2}) + \cdots + (Y_{t+1} - Y_t) + Y_t \qquad (7)$$
$$= \Delta Y_{t+s} + \Delta Y_{t+s-1} + \cdots + \Delta Y_{t+1} + Y_t. \qquad (8)$$
Therefore the $s$-period-ahead forecast error for the unit root process is
$$Y_{t+s} - \hat{Y}_{t+s|t} = \{\Delta Y_{t+s} + \Delta Y_{t+s-1} + \cdots + \Delta Y_{t+1} + Y_t\} - \{\Delta\hat{Y}_{t+s|t} + \Delta\hat{Y}_{t+s-1|t} + \cdots + \Delta\hat{Y}_{t+1|t} + Y_t\}$$
$$= \{\varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1}\} + \{\varepsilon_{t+s-1} + \psi_1\varepsilon_{t+s-2} + \cdots + \psi_{s-2}\varepsilon_{t+1}\} + \cdots + \{\varepsilon_{t+1}\}$$
$$= \varepsilon_{t+s} + [1+\psi_1]\varepsilon_{t+s-1} + [1+\psi_1+\psi_2]\varepsilon_{t+s-2} + \cdots + [1+\psi_1+\psi_2+\cdots+\psi_{s-1}]\varepsilon_{t+1},$$
with MSE
$$E[Y_{t+s} - \hat{Y}_{t+s|t}]^2 = \{1 + [1+\psi_1]^2 + [1+\psi_1+\psi_2]^2 + \cdots + [1+\psi_1+\psi_2+\cdots+\psi_{s-1}]^2\}\,\sigma_\varepsilon^2.$$
The MSE again increases with the length of the forecasting horizon $s$, but in
contrast to the trend-stationary case, it does not converge to any fixed
value as $s$ goes to infinity. See Figure 15.2 on p. 441 of Hamilton.
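The two MSE formulas can be evaluated directly. The sketch below (an illustration only; the geometric weights $\psi_j = 0.5^j$ and the unit innovation variance are arbitrary assumptions) shows the trend-stationary MSE levelling off while the unit root MSE keeps growing with $s$:

import numpy as np

psi = 0.5 ** np.arange(0, 200)        # illustrative psi weights: psi_0 = 1, psi_j = 0.5^j
sigma2 = 1.0                          # illustrative innovation variance

for s in (1, 5, 20, 100):
    mse_tsp = sigma2 * np.sum(psi[:s] ** 2)                # {1 + psi_1^2 + ... + psi_{s-1}^2} sigma^2
    mse_ur  = sigma2 * np.sum(np.cumsum(psi[:s]) ** 2)     # {1 + [1+psi_1]^2 + ...} sigma^2
    print(f"s={s:3d}  TSP MSE={mse_tsp:6.3f}   unit-root MSE={mse_ur:8.3f}")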
The TSP model and the DSP model embody totally different views about
how the world evolves in the future. In the former the forecast error is bounded even
at an infinite horizon, but in the latter the error becomes unbounded as the horizon
extends.
One result is very important for understanding the asymptotic statistical properties
presented in the subsequent chapter. The (deterministic) trend introduced
by a nonzero drift $\delta$ ($\delta t$ is $O(T)$) asymptotically dominates the increasing
variability arising over time from the unit root component ($\sum_{i=0}^{t-1}\varepsilon_{t-i}$ is
$O(T^{1/2})$). This means that data from a unit root process with positive drift are certain
to exhibit an upward trend if observed for a sufficiently long period.
2.2.3 Impulse Response
Another difference between a TSP and a unit root process is the persistence of innovations.
Consider the consequences for $X_{t+s}$ if $\varepsilon_t$ were to increase by one unit
with the $\varepsilon$'s for all other dates unaffected. For the TSP process (6), this impulse
response is given by
$$\frac{\partial X_{t+s}}{\partial\varepsilon_t} = \psi_s.$$
For a trend-stationary process, then, the effect of any stochastic disturbance
eventually wears off:
$$\lim_{s\to\infty}\frac{\partial X_{t+s}}{\partial\varepsilon_t} = 0.$$
By contrast, for a unit root process, the effect of $\varepsilon_t$ on $Y_{t+s}$ is seen from (8)
and (4) to be
$$\frac{\partial Y_{t+s}}{\partial\varepsilon_t} = \frac{\partial\Delta Y_{t+s}}{\partial\varepsilon_t} + \frac{\partial\Delta Y_{t+s-1}}{\partial\varepsilon_t} + \cdots + \frac{\partial\Delta Y_{t+1}}{\partial\varepsilon_t} + \frac{\partial Y_t}{\partial\varepsilon_t}$$
$$= \psi_s + \psi_{s-1} + \cdots + \psi_1 + 1 \qquad \left(\text{since } \frac{\partial\Delta Y_{t+s}}{\partial\varepsilon_t} = \psi_s \text{ from (4)}\right).$$
An innovation $\varepsilon_t$ has a permanent effect on the level of $Y$ that is captured by
$$\lim_{s\to\infty}\frac{\partial Y_{t+s}}{\partial\varepsilon_t} = 1 + \psi_1 + \psi_2 + \cdots = \psi(1).$$
Example:
The following ARIMA(4,1,0) model was estimated for $Y_t$:
$$\Delta Y_t = 0.555 + 0.312\,\Delta Y_{t-1} + 0.122\,\Delta Y_{t-2} - 0.116\,\Delta Y_{t-3} - 0.081\,\Delta Y_{t-4} + \hat{\varepsilon}_t.$$
For this specification, the permanent effect of a one-unit change in $\varepsilon_t$ on the level
of $Y_t$ is estimated to be
$$\psi(1) = \frac{1}{\phi(1)} = \frac{1}{1 - 0.312 - 0.122 + 0.116 + 0.081} = 1.31.$$
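A short numerical check of this example (the coefficients are those reported above; the recursion used to generate the $\psi$ weights is an implementation sketch, not part of the text) computes $\psi(1) = 1/\phi(1)$ directly and verifies it by accumulating the impulse responses of $\Delta Y_t$:

import numpy as np

ar = np.array([0.312, 0.122, -0.116, -0.081])    # AR coefficients from the example above

# Direct calculation: psi(1) = 1 / phi(1) = 1 / (1 - sum of the AR coefficients)
print("psi(1) =", round(1.0 / (1.0 - ar.sum()), 2))        # prints about 1.31

# Numerical check: generate the psi weights of Delta Y_t by iterating the AR recursion
# on a unit impulse, then accumulate them to obtain the long-run effect on the level.
s_max = 200
psi = np.zeros(s_max)
psi[0] = 1.0
for s in range(1, s_max):
    for j, a in enumerate(ar, start=1):
        if s - j >= 0:
            psi[s] += a * psi[s - j]
print("cumulative effect on the level:", round(psi.sum(), 2))   # converges to psi(1)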
2.2.4 Transformations to Achieve Stationarity
A final difference between trend-stationary and unit root processes that deserves
comment is the transformation of the data needed to generate a stationary time
series. If the process is really trend stationary as in (6), the appropriate treatment
is to subtract $\delta t$ from $X_t$ to produce a stationary representation. By contrast,
if the data were really generated by the unit root process (5), subtracting $\delta t$
from $Y_t$ would succeed in removing the time dependence of the mean but not of the
variance, as seen in (5).
There have been several papers that have studied the consequences of overdifferencing
and underdifferencing:
1. If the process is really a TSP as in (6), differencing it would give
$$\Delta X_t = \delta t - \delta(t-1) + \psi(L)(1-L)\varepsilon_t = \delta + \psi^*(L)\varepsilon_t. \qquad (9)$$
In this representation the series looks like a DSP; however, a unit root has been introduced
into the moving average representation $\psi^*(L) = (1-L)\psi(L)$, which violates the definition
of an I($d$) process as in (4). This is the case of overdifferencing (see the sketch below).
2. If the process is really a DSP as in (5) and we treat it as a TSP, we have a case
of underdifferencing.
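As a simple check on the overdifferencing case (a sketch assuming the simplest TSP with $\psi(L) = 1$; the parameter values are arbitrary), the first difference of a trend-plus-white-noise series behaves like an MA(1) with a unit root in the moving average polynomial, so its lag-1 autocorrelation should be close to $-0.5$:

import numpy as np

rng = np.random.default_rng(4)
T, alpha, delta = 5000, 1.0, 0.2                # illustrative parameters
t = np.arange(1, T + 1)

X = alpha + delta * t + rng.normal(size=T)      # TSP with psi(L) = 1
dX = np.diff(X)                                 # over-differenced series: delta + eps_t - eps_{t-1}

d = dX - dX.mean()
rho1 = (d[:-1] * d[1:]).mean() / d.var()        # lag-1 sample autocorrelation
print("lag-1 autocorrelation of the over-differenced TSP:", round(rho1, 2))   # near -0.5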
3 Other Approaches to Trended Time Series
3.1 Fractional Integration
See Chapter 23 for details.
3.2 Occasional Breaks in Trend
According to the unit root specification (4), events are occurring all the time that
permanently affect $Y$. Perron (1989) and Rappoport and Reichlin (1989) have
argued that economic events that have large permanent effects are relatively rare.
This idea can be illustrated with the following model, in which $Y_t$ is a TSP but
with a single break:
$$Y_t = \begin{cases} \alpha_1 + \delta t + \varepsilon_t & \text{for } t < T_0 \\ \alpha_2 + \delta t + \varepsilon_t & \text{for } t \geq T_0. \end{cases} \qquad (10)$$
We first difference (10) to obtain
$$\Delta Y_t = \xi_t + \delta + \varepsilon_t - \varepsilon_{t-1}, \qquad (11)$$
where $\xi_t = \alpha_2 - \alpha_1$ when $t = T_0$ and is zero otherwise. Suppose that $\xi_t$ is viewed
as a random variable with a Bernoulli distribution,
$$\xi_t = \begin{cases} \alpha_2 - \alpha_1 & \text{with probability } p \\ 0 & \text{with probability } 1-p. \end{cases}$$
Then $\xi_t$ is white noise with mean $E(\xi_t) = p(\alpha_2 - \alpha_1)$, and (11) can be rewritten
as
$$\Delta Y_t = \mu + \eta_t, \qquad (12)$$
where
$$\mu = p(\alpha_2 - \alpha_1) + \delta,$$
$$\eta_t = \xi_t - p(\alpha_2 - \alpha_1) + \varepsilon_t - \varepsilon_{t-1}.$$
Thus $\eta_t$ is the sum of a zero-mean white noise process $[\xi_t - p(\alpha_2 - \alpha_1)]$
and an independent MA(1) process $[\varepsilon_t - \varepsilon_{t-1}]$. It has mean zero, $E(\eta_t) = 0$, and
its autocovariance function satisfies $\gamma_\tau = 0$ for $\tau \geq 2$. Therefore an MA(1) representation
for $\eta_t$ exists, say $\eta_t = \nu_t - \theta\nu_{t-1}$. From this perspective, (11) can be viewed as
an ARIMA(0,1,1) process,
$$\Delta Y_t = \mu + \nu_t - \theta\nu_{t-1},$$
with a non-Gaussian distribution for the innovation $\nu_t$, which is a sum of Gaussian
and Bernoulli components. See the plot on p. 451 of Hamilton.
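A small simulation of the broken-trend model (the values of $p$, $\alpha_2 - \alpha_1$, and $\delta$ are illustrative assumptions, not from the text) generates $\Delta Y_t = \xi_t + \delta + \varepsilon_t - \varepsilon_{t-1}$ with Bernoulli breaks and confirms that its sample autocovariances die out after lag 1, consistent with an MA(1) representation for $\eta_t$:

import numpy as np

rng = np.random.default_rng(5)
T, p, jump, delta = 20000, 0.01, 5.0, 0.1     # illustrative: rare breaks of size alpha2 - alpha1 = 5

eps = rng.normal(size=T + 1)
xi = jump * rng.binomial(1, p, size=T)        # xi_t = alpha2 - alpha1 with probability p, else 0
dY = xi + delta + eps[1:] - eps[:-1]          # Delta Y_t = xi_t + delta + eps_t - eps_{t-1}

d = dY - dY.mean()
for tau in (1, 2, 3):
    gamma = (d[:-tau] * d[tau:]).mean()       # sample autocovariance at lag tau
    print(f"lag {tau}: autocovariance = {gamma:+.3f}")   # lag 1 near -1, lags >= 2 near 0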