Ch. 19 Models of Nonstationary Time Series

In time series analysis we do not confine ourselves to the analysis of stationary time series. In fact, most of the time series we encounter are nonstationary. How to deal with nonstationary data, and how to use what we have learned from stationary models, are the main subjects of this chapter.

1 Integrated Process

Consider the following two processes:
$$X_t = \phi X_{t-1} + u_t, \qquad |\phi| < 1,$$
$$Y_t = Y_{t-1} + v_t,$$
where $u_t$ and $v_t$ are mutually uncorrelated white noise processes with variances $\sigma_u^2$ and $\sigma_v^2$, respectively. Both $X_t$ and $Y_t$ are AR(1) processes. The difference between the two models is that $Y_t$ is the special case of the $X_t$ process with $\phi = 1$; it is called a random walk process. It is also referred to as an AR(1) model with a unit root, since the root of the AR(1) polynomial is 1.

When we consider the statistical behavior of the two processes by investigating the mean (the first moment) and the variance and autocovariances (the second moments), they are completely different. Although the two processes belong to the same AR(1) class, $X_t$ is a stationary process, while $Y_t$ is a nonstationary process. Assume that $t \in \mathcal{T}$, $\mathcal{T} = \{0, 1, 2, \ldots\}$. (This assumption is required to derive the convergence of the integrated process to standard Brownian motion; a standard Brownian motion is defined on $t \in [0,1]$.) The two stochastic processes can be expressed as
$$X_t = \phi^t X_0 + \sum_{i=0}^{t-1} \phi^i u_{t-i}.$$
Similarly, in the unit root case,
$$Y_t = Y_0 + \sum_{i=0}^{t-1} v_{t-i}.$$
Suppose that the initial observations are zero, $X_0 = 0$ and $Y_0 = 0$. The means of the two processes are
$$E(X_t) = 0 \quad \text{and} \quad E(Y_t) = 0,$$
and the variances are
$$\mathrm{Var}(X_t) = \sum_{i=0}^{t-1} \phi^{2i}\,\mathrm{Var}(u_{t-i}) \rightarrow \frac{\sigma_u^2}{1-\phi^2}$$
and
$$\mathrm{Var}(Y_t) = \sum_{i=0}^{t-1} \mathrm{Var}(v_{t-i}) = t\,\sigma_v^2.$$
The autocovariances of the two series are
$$\gamma_\tau^X = E(X_t X_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1} \phi^i u_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1} \phi^i u_{t-\tau-i}\right)\right]$$
$$= E[(u_t + \phi u_{t-1} + \cdots + \phi^\tau u_{t-\tau} + \cdots + \phi^{t-1} u_1)(u_{t-\tau} + \phi u_{t-\tau-1} + \cdots + \phi^{t-\tau-1} u_1)]$$
$$= \sum_{i=0}^{t-\tau-1} \phi^{\tau+i}\phi^i \sigma_u^2 = \phi^\tau \sigma_u^2 \sum_{i=0}^{t-\tau-1} \phi^{2i} \rightarrow \frac{\phi^\tau}{1-\phi^2}\,\sigma_u^2 = \phi^\tau \gamma_0^X,$$
and
$$\gamma_\tau^Y = E(Y_t Y_{t-\tau}) = E\left[\left(\sum_{i=0}^{t-1} v_{t-i}\right)\left(\sum_{i=0}^{t-\tau-1} v_{t-\tau-i}\right)\right]$$
$$= E[(v_t + v_{t-1} + \cdots + v_{t-\tau} + v_{t-\tau-1} + \cdots + v_1)(v_{t-\tau} + v_{t-\tau-1} + \cdots + v_1)] = (t-\tau)\,\sigma_v^2.$$
We may therefore expect the autocorrelation functions
$$r_\tau^X = \frac{\gamma_\tau^X}{\gamma_0^X} = \phi^\tau \rightarrow 0$$
and
$$r_\tau^Y = \frac{\gamma_\tau^Y}{\gamma_0^Y} = \frac{t-\tau}{t} \rightarrow 1 \quad \forall \tau.$$
The means of $X_t$ and $Y_t$ are the same, but the variances (including the autocovariances) are different. The important thing to note is that the variances and autocovariances of $Y_t$ are functions of $t$, while those of $X_t$ converge to constants asymptotically. Thus, as $t$ increases, the variance of $Y_t$ increases, while the variance of $X_t$ converges to a constant.

If we add a constant to the AR(1) processes, then the means of the two processes also behave differently. Consider the AR(1) processes with a constant (or drift):
$$X_t = \alpha + \phi X_{t-1} + u_t, \qquad |\phi| < 1,$$
and
$$Y_t = \alpha + Y_{t-1} + v_t.$$
Successive substitution yields
$$X_t = \phi^t X_0 + \alpha \sum_{i=0}^{t-1} \phi^i + \sum_{i=0}^{t-1} \phi^i u_{t-i}$$
and
$$Y_t = Y_0 + \alpha t + \sum_{i=0}^{t-1} v_{t-i}. \tag{1}$$
Note that $Y_t$ contains a (deterministic) trend $\alpha t$. If the initial observations are zero, $X_0 = 0$ and $Y_0 = 0$, then the means of the two processes are
$$E(X_t) \rightarrow \frac{\alpha}{1-\phi} \quad \text{and} \quad E(Y_t) = \alpha t,$$
but the variances and autocovariances are the same as those derived from the AR(1) models without the constant. By adding a constant to the AR(1) processes, the means of the two processes, as well as the variances, are different: both the mean and the variance of $Y_t$ are time varying, while those of $X_t$ are (asymptotically) constant. Since the variance (the second moment) and even the mean (the first moment) of a nonstationary series are not constant over time, the conventional asymptotic theory cannot be applied to such series (recall the moment conditions in the CLT on p. 22 of Ch. 4).
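To make the contrast concrete, here is a small Monte Carlo sketch (not part of the original notes) that tracks the cross-sectional sample variances of the two processes over time; the values $\phi = 0.5$, $\sigma_u^2 = \sigma_v^2 = 1$, the number of replications, and the sample length are arbitrary illustrative choices.

# Monte Carlo sketch of the variance behaviour derived above:
# Var(X_t) settles near sigma_u^2 / (1 - phi^2), while Var(Y_t) grows like t * sigma_v^2.
import numpy as np

rng = np.random.default_rng(0)
n_rep, T, phi = 5000, 200, 0.5              # illustrative (hypothetical) choices

u = rng.normal(size=(n_rep, T))             # innovations u_t for the stationary AR(1)
v = rng.normal(size=(n_rep, T))             # innovations v_t for the random walk

X = np.zeros((n_rep, T))
for t in range(1, T):
    X[:, t] = phi * X[:, t - 1] + u[:, t]   # X_t = phi * X_{t-1} + u_t,  X_0 = 0
Y = np.cumsum(v, axis=1)                    # Y_t = v_1 + ... + v_t,      Y_0 = 0

for t in (10, 50, 100, 199):
    print(t, round(float(X[:, t].var()), 2), round(float(Y[:, t].var()), 2))
# X's cross-section variance stays near 1 / (1 - 0.25) = 1.33; Y's grows roughly linearly in t.

Across replications, the sample variance of $X_t$ stabilizes while that of $Y_t$ keeps growing, which is exactly the distinction exploited throughout this chapter.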
2 Deterministic Trend and Stochastic Trend

Many economic and financial time series trend upward over time (for example GNP, M2, and stock indices). See the plots in Hamilton, p. 436. For a long time, each trending (nonstationary) economic time series was decomposed into a deterministic trend and a stationary process. In recent years the idea of a stochastic trend has emerged and has enriched the framework used to investigate economic time series.

2.1 Detrending Methods

2.1.1 Differencing-Stationary

One of the easiest ways to analyze these nonstationary trending series is to make them stationary by differencing. In our example, the random walk with drift $Y_t$ can be transformed into a stationary series by differencing once:
$$\Delta Y_t = Y_t - Y_{t-1} = (1-L)Y_t = \alpha + v_t.$$
Since $v_t$ is assumed to be a white noise process, the first difference of $Y_t$ is stationary, and the variance of $\Delta Y_t$ is constant over the sample period. In the I(1) process
$$Y_t = Y_0 + \alpha t + \sum_{i=0}^{t-1} v_{t-i}, \tag{2}$$
$\alpha t$ is a deterministic trend while $\sum_{i=0}^{t-1} v_{t-i}$ is a stochastic trend. When a nonstationary series can be transformed into a stationary series by differencing once, the series is said to be integrated of order 1, denoted I(1), or, in common usage, a unit root process. If the series needs to be differenced $d$ times to become stationary, then the series is said to be I($d$). An I($d$) series ($d \neq 0$) is also called a differencing-stationary process (DSP).

When $(1-L)^d Y_t$ is a stationary and invertible series that can be represented by an ARMA($p, q$) model, i.e.
$$(1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)(1-L)^d Y_t = \alpha + (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)\varepsilon_t \tag{3}$$
or
$$\phi(L)\Delta^d Y_t = \alpha + \theta(L)\varepsilon_t,$$
where all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle, we say that $Y_t$ is an autoregressive integrated moving-average, ARIMA($p, d, q$), process. In particular, a unit root process, $d = 1$, i.e. an ARIMA($p, 1, q$) process, is
$$\phi(L)\Delta Y_t = \alpha + \theta(L)\varepsilon_t$$
or
$$(1-L)Y_t = \delta + \psi(L)\varepsilon_t, \tag{4}$$
where $\psi(L) = \phi(L)^{-1}\theta(L)$ is absolutely summable and $\delta = \phi(1)^{-1}\alpha$. Successive substitution yields
$$Y_t = Y_0 + \delta t + \psi(L)\sum_{i=0}^{t-1} \varepsilon_{t-i}. \tag{5}$$

2.1.2 Trend-Stationary

Another important class is the trend-stationary process (TSP). Consider the series
$$X_t = \alpha + \delta t + \psi(L)\varepsilon_t, \tag{6}$$
where the coefficients of $\psi(L)$ are absolutely summable. The mean of $X_t$ is $E(X_t) = \alpha + \delta t$, which is not constant over time, while the variance of $X_t$ is $\mathrm{Var}(X_t) = (1 + \psi_1^2 + \psi_2^2 + \cdots)\sigma^2$, which is constant. Although the mean of $X_t$ is not constant over the sample period, it can be forecast perfectly whenever we know the value of $t$ and the parameters $\alpha$ and $\delta$. In this sense the process is stationary around the deterministic trend $\delta t$, and $X_t$ can be transformed to stationarity by regressing it on time. Note that both the DSP model, equation (5), and the TSP model, equation (6), exhibit a linear trend, but the appropriate method of eliminating the trend differs. (It can be seen from the definition of the TSP that the DSP is trend nonstationary.)

Most economic analysis is based on the variances and covariances among variables. For example, the OLS estimator from the regression of $Y_t$ on $X_t$ is the ratio of the covariance between $Y_t$ and $X_t$ to the variance of $X_t$. Thus, if the variances of the variables behave differently, the conventional asymptotic theory is not applicable. When the orders of integration differ, the variances of the processes behave differently. For example, if $Y_t$ is an I(0) variable and $X_t$ is I(1), the OLS estimator from the regression of $Y_t$ on $X_t$ converges to zero asymptotically, since the denominator of the OLS estimator, the variance of $X_t$, increases with $t$ and thus dominates the numerator, the covariance between $X_t$ and $Y_t$. That is, the OLS estimator does not have a nondegenerate asymptotic distribution; it is degenerate under the conventional normalization of $\sqrt{T}$. (See Ch. 21 for details.)
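As a quick numerical illustration of this last point (not part of the original notes), the sketch below regresses an independent I(0) series on an I(1) regressor for increasing sample sizes; the seed and the sample sizes are arbitrary choices.

# Regressing an I(0) variable on an independent I(1) regressor:
# the OLS slope collapses toward zero as T grows, because the variance of the
# I(1) regressor (order T) dominates its covariance with the I(0) regressand.
import numpy as np

rng = np.random.default_rng(1)
for T in (100, 1_000, 10_000, 100_000):
    X = np.cumsum(rng.normal(size=T))             # I(1): pure random walk
    Y = rng.normal(size=T)                        # I(0): white noise, independent of X
    Xc, Yc = X - X.mean(), Y - Y.mean()
    beta_hat = (Xc * Yc).sum() / (Xc ** 2).sum()  # OLS slope of Y on X (with intercept)
    print(T, round(float(beta_hat), 5))
# The printed slopes shrink toward zero as T grows (up to sampling noise).

Chapter 21 formalizes this degeneracy and the nonstandard normalizations it requires.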
2.2 Comparison of Trend-Stationary and Differencing-Stationary Processes

The best way to understand the meaning of stochastic and deterministic trends is to compare their time series properties. This section compares the trend-stationary process (6) with the unit root process (4) in terms of forecasts of the series, the variance of the forecast error, the dynamic multiplier, and the transformation needed to achieve stationarity.

2.2.1 Returning to a Central Line?

The TSP model (6) has a central line, $\alpha + \delta t$, around which $X_t$ oscillates. Even if a shock makes $X_t$ deviate temporarily from the line, a force brings it back to the line. On the other hand, the unit root process (5) has no such central line. One might wonder about a deterministic trend combined with a random walk: the discrepancy between $Y_t$ and the line $Y_0 + \delta t$ becomes unbounded as $t \rightarrow \infty$.

2.2.2 Forecast Error

The TSP and unit root specifications are also very different in their implications for the variance of the forecast error. For the trend-stationary process (6), the $s$-period-ahead forecast is
$$\hat{X}_{t+s|t} = \alpha + \delta(t+s) + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \cdots,$$
which is associated with the forecast error
$$X_{t+s} - \hat{X}_{t+s|t} = \{\alpha + \delta(t+s) + \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \psi_2\varepsilon_{t+s-2} + \cdots + \psi_{s-1}\varepsilon_{t+1} + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots\}$$
$$\qquad - \{\alpha + \delta(t+s) + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots\}$$
$$= \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \psi_2\varepsilon_{t+s-2} + \cdots + \psi_{s-1}\varepsilon_{t+1}.$$
The MSE of this forecast is
$$E[X_{t+s} - \hat{X}_{t+s|t}]^2 = \{1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{s-1}^2\}\sigma^2.$$
The MSE increases with the forecasting horizon $s$, though as $s$ becomes large the added uncertainty from forecasting farther into the future becomes negligible:
$$\lim_{s\rightarrow\infty} E[X_{t+s} - \hat{X}_{t+s|t}]^2 = \{1 + \psi_1^2 + \psi_2^2 + \cdots\}\sigma^2.$$
Note that the limiting MSE is just the unconditional variance of the stationary component $\psi(L)\varepsilon_t$.

To forecast the unit root process (4), recall that the change $\Delta Y_t$ is a stationary process that can be forecast using the standard formula:
$$\widehat{\Delta Y}_{t+s|t} = \hat{E}[(Y_{t+s} - Y_{t+s-1}) \mid Y_t, Y_{t-1}, \ldots] = \delta + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \psi_{s+2}\varepsilon_{t-2} + \cdots.$$
The level of the variable at date $t+s$ is simply $Y_t$ plus the changes between $t$ and $t+s$:
$$Y_{t+s} = (Y_{t+s} - Y_{t+s-1}) + (Y_{t+s-1} - Y_{t+s-2}) + \cdots + (Y_{t+1} - Y_t) + Y_t \tag{7}$$
$$= \Delta Y_{t+s} + \Delta Y_{t+s-1} + \cdots + \Delta Y_{t+1} + Y_t. \tag{8}$$
Therefore the $s$-period-ahead forecast error for the unit root process is
$$Y_{t+s} - \hat{Y}_{t+s|t} = \{\Delta Y_{t+s} + \Delta Y_{t+s-1} + \cdots + \Delta Y_{t+1} + Y_t\} - \{\widehat{\Delta Y}_{t+s|t} + \widehat{\Delta Y}_{t+s-1|t} + \cdots + \widehat{\Delta Y}_{t+1|t} + Y_t\}$$
$$= \{\varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1}\} + \{\varepsilon_{t+s-1} + \psi_1\varepsilon_{t+s-2} + \cdots + \psi_{s-2}\varepsilon_{t+1}\} + \cdots + \{\varepsilon_{t+1}\}$$
$$= \varepsilon_{t+s} + [1 + \psi_1]\varepsilon_{t+s-1} + [1 + \psi_1 + \psi_2]\varepsilon_{t+s-2} + \cdots + [1 + \psi_1 + \psi_2 + \cdots + \psi_{s-1}]\varepsilon_{t+1},$$
with MSE
$$E[Y_{t+s} - \hat{Y}_{t+s|t}]^2 = \{1 + [1 + \psi_1]^2 + [1 + \psi_1 + \psi_2]^2 + \cdots + [1 + \psi_1 + \psi_2 + \cdots + \psi_{s-1}]^2\}\sigma^2.$$
The MSE again increases with the length of the forecasting horizon $s$, but in contrast to the trend-stationary case it does not converge to any fixed value as $s$ goes to infinity. See Figure 15.2 on p. 441 of Hamilton.

The TSP model and the DSP model thus embody totally different views about how the world evolves in the future: in the former, the forecast error is bounded even at an infinite horizon, while in the latter the error becomes unbounded as the horizon extends.

One result is very important for understanding the asymptotic statistical properties presented in the subsequent chapters. The (deterministic) trend introduced by a nonzero drift $\delta$ ($\delta t$ is $O(T)$) asymptotically dominates the increasing variability arising over time from the unit root component ($\sum_{i=0}^{t-1}\varepsilon_{t-i}$ is $O(T^{1/2})$). This means that data from a unit root process with positive drift are certain to exhibit an upward trend if observed for a sufficiently long period.
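The contrast between the two MSE formulas can be made concrete by evaluating both of them for a hypothetical set of MA weights; the geometric weights $\psi_j = 0.5^j$ and $\sigma^2 = 1$ below are arbitrary assumptions for illustration, not values taken from the notes.

# Forecast-error MSE as a function of the horizon s, for the two formulas above,
# using illustrative weights psi_j = 0.5**j and sigma^2 = 1.
import numpy as np

psi = 0.5 ** np.arange(50)      # psi_0 = 1, psi_1 = 0.5, ..., psi_49
sigma2 = 1.0

for s in (1, 5, 10, 25, 50):
    mse_tsp = sigma2 * np.sum(psi[:s] ** 2)                    # {1 + psi_1^2 + ... + psi_{s-1}^2} sigma^2
    mse_unit_root = sigma2 * np.sum(np.cumsum(psi[:s]) ** 2)   # {1 + (1+psi_1)^2 + ...} sigma^2
    print(s, round(float(mse_tsp), 3), round(float(mse_unit_root), 3))
# The TSP MSE levels off near sigma^2 / (1 - 0.25) = 1.333, while the unit-root MSE
# keeps growing roughly linearly in s, which is the qualitative pattern described in the text.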
2.2.3 Impulse Response

Another difference between a TSP and a unit root process is the persistence of innovations. Consider the consequences for $X_{t+s}$ if $\varepsilon_t$ were to increase by one unit, with the $\varepsilon$'s for all other dates unaffected. For the TSP process (6), this impulse response is given by
$$\frac{\partial X_{t+s}}{\partial \varepsilon_t} = \psi_s.$$
For a trend-stationary process, then, the effect of any stochastic disturbance eventually wears off:
$$\lim_{s\rightarrow\infty} \frac{\partial X_{t+s}}{\partial \varepsilon_t} = 0.$$
By contrast, for a unit root process the effect of $\varepsilon_t$ on $Y_{t+s}$ is seen from (8) and (4) to be
$$\frac{\partial Y_{t+s}}{\partial \varepsilon_t} = \frac{\partial \Delta Y_{t+s}}{\partial \varepsilon_t} + \frac{\partial \Delta Y_{t+s-1}}{\partial \varepsilon_t} + \cdots + \frac{\partial \Delta Y_{t+1}}{\partial \varepsilon_t} + \frac{\partial Y_t}{\partial \varepsilon_t} = \psi_s + \psi_{s-1} + \cdots + \psi_1 + 1,$$
since $\partial \Delta Y_{t+s}/\partial \varepsilon_t = \psi_s$ from (4). An innovation $\varepsilon_t$ therefore has a permanent effect on the level of $Y$ that is captured by
$$\lim_{s\rightarrow\infty} \frac{\partial Y_{t+s}}{\partial \varepsilon_t} = 1 + \psi_1 + \psi_2 + \cdots = \psi(1).$$

Example: The following ARIMA(4,1,0) model was estimated for $Y_t$:
$$\Delta Y_t = 0.555 + 0.312\,\Delta Y_{t-1} + 0.122\,\Delta Y_{t-2} - 0.116\,\Delta Y_{t-3} - 0.081\,\Delta Y_{t-4} + \hat{\varepsilon}_t.$$
For this specification, the permanent effect of a one-unit change in $\varepsilon_t$ on the level of $Y_t$ is estimated to be
$$\psi(1) = \frac{1}{\phi(1)} = \frac{1}{1 - 0.312 - 0.122 + 0.116 + 0.081} = 1.31.$$

2.2.4 Transformations to Achieve Stationarity

A final difference between trend-stationary and unit root processes that deserves comment is the transformation of the data needed to generate a stationary time series. If the process is really trend stationary as in (6), the appropriate treatment is to subtract $\delta t$ from $X_t$ to produce a stationary representation. By contrast, if the data were really generated by the unit root process (5), subtracting $\delta t$ from $Y_t$ would succeed in removing the time dependence of the mean but not of the variance, as is clear from (5). Several papers have studied the consequences of overdifferencing and underdifferencing:

1. If the process is really a TSP as in (6), differencing it yields
$$\Delta X_t = \alpha + \delta t - \alpha - \delta(t-1) + \psi(L)(1-L)\varepsilon_t = \delta + (1-L)\psi(L)\varepsilon_t. \tag{9}$$
In this representation the series looks like a DSP; however, a unit root has been introduced into the moving average polynomial $(1-L)\psi(L)$, which violates the invertibility requirement in the definition of an I($d$) process as in (4). This is the case of overdifferencing.

2. If the process is really a DSP as in (5) and we treat it as a TSP, we have a case of underdifferencing.

3 Other Approaches to Trended Time Series

3.1 Fractional Integration

See Chapter 23 for details.

3.2 Occasional Breaks in Trend

According to the unit root specification (4), events that permanently affect $Y$ occur all the time. Perron (1989) and Rappoport and Reichlin (1989) have argued that economic events with large permanent effects are relatively rare. This idea can be illustrated with the following model, in which $Y_t$ is a TSP with a single break:
$$Y_t = \begin{cases} \alpha_1 + \delta t + \varepsilon_t & \text{for } t < T_0 \\ \alpha_2 + \delta t + \varepsilon_t & \text{for } t \geq T_0. \end{cases} \tag{10}$$
We first difference (10) to obtain
$$\Delta Y_t = \xi_t + \delta + \varepsilon_t - \varepsilon_{t-1}, \tag{11}$$
where $\xi_t = \alpha_2 - \alpha_1$ when $t = T_0$ and is zero otherwise. Suppose that $\xi_t$ is viewed as a random variable with a Bernoulli distribution,
$$\xi_t = \begin{cases} \alpha_2 - \alpha_1 & \text{with probability } p \\ 0 & \text{with probability } 1-p. \end{cases}$$
Then $\xi_t$ is white noise with mean $E(\xi_t) = p(\alpha_2 - \alpha_1)$, and (11) can be rewritten as
$$\Delta Y_t = \mu + \eta_t, \tag{12}$$
where
$$\mu = p(\alpha_2 - \alpha_1) + \delta, \qquad \eta_t = \xi_t - p(\alpha_2 - \alpha_1) + \varepsilon_t - \varepsilon_{t-1}.$$
But $\eta_t$ is the sum of a zero-mean white noise process, $[\xi_t - p(\alpha_2 - \alpha_1)]$, and an independent MA(1) process, $[\varepsilon_t - \varepsilon_{t-1}]$. Hence $\eta_t$ has mean zero, $E(\eta_t) = 0$, and autocovariances $\gamma_\tau = 0$ for $\tau \geq 2$. Therefore an MA(1) representation for $\eta_t$ exists, say $\eta_t = \nu_t - \theta\nu_{t-1}$.
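Before turning to the ARIMA interpretation below, a short simulation of the single-break model (10) shows what its first difference (11) looks like; this sketch is not part of the original notes, and the values $\alpha_1 = 0$, $\alpha_2 = 5$, $\delta = 0.2$, and $T_0 = 100$ are arbitrary.

# Simulate the single-break trend model (10) and inspect its first difference (11):
# Delta Y_t equals delta plus MA(1)-type noise, except for one large observation of
# size roughly (alpha2 - alpha1) at the break date T0.
import numpy as np

rng = np.random.default_rng(2)
T, T0 = 200, 100
alpha1, alpha2, delta = 0.0, 5.0, 0.2       # illustrative (hypothetical) values

t = np.arange(T)
eps = rng.normal(size=T)
alpha = np.where(t < T0, alpha1, alpha2)    # intercept shifts once, at t = T0
Y = alpha + delta * t + eps                 # TSP with a single break in level

dY = np.diff(Y)                             # Delta Y_t = xi_t + delta + eps_t - eps_{t-1}
print(round(float(dY[T0 - 1]), 3))          # the one observation containing alpha2 - alpha1
print(round(float(np.median(dY)), 3))       # typical differences sit around delta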
From this perspective (with $\eta_t$ admitting an MA(1) representation), equation (11) can be viewed as an ARIMA(0,1,1) process,
$$\Delta Y_t = \mu + \nu_t - \theta\nu_{t-1},$$
with a non-Gaussian distribution for the innovation $\nu_t$, which mixes a Gaussian component and a Bernoulli component. See the plot on p. 451 of Hamilton.
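As a numerical check of the MA(1) claim above, the following sketch (not from the notes; $p = 0.02$ and $\alpha_2 - \alpha_1 = 5$ are arbitrary values) estimates the first few autocovariances of a simulated $\eta_t$.

# Sample autocovariances of eta_t = [xi_t - p*(alpha2 - alpha1)] + eps_t - eps_{t-1}:
# gamma_0 > 0, gamma_1 approx -sigma_eps^2, and gamma_tau approx 0 for tau >= 2,
# consistent with the MA(1) representation eta_t = nu_t - theta * nu_{t-1}.
import numpy as np

rng = np.random.default_rng(3)
T, p, jump = 200_000, 0.02, 5.0             # illustrative (hypothetical) values

xi = jump * rng.binomial(1, p, size=T)      # occasional Bernoulli jumps xi_t
eps = rng.normal(size=T)
eta = (xi - p * jump) + eps - np.r_[0.0, eps[:-1]]

for lag in range(4):
    gamma = np.mean(eta[lag:] * eta[:T - lag])
    print(lag, round(float(gamma), 3))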