Ch. 16 Stochastic Model Building

Unlike a linear regression model, which usually rests on an economic-theoretic model developed somewhere in the economics literature, time series analysis of a stochastic process requires the ability to relate a stationary ARMA model to real data. This is usually best achieved by a three-stage iterative procedure based on identification, estimation, and diagnostic checking, as suggested by Box and Jenkins (1976).

1 Model Identification

By identification we mean the use of the data, and of any information on how the series was generated, to suggest a subclass of parsimonious models worthy of being entertained. We usually transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one. At this stage we then make an initial guess of small values of p and q for an ARMA(p, q) model that might describe the transformed data.

1.1 Identifying the Degree of Differencing

Trend stationary or difference stationary? See Ch. 19.

1.2 Use of the Autocorrelation and Partial Autocorrelation Function in Identification

1.2.1 Autocorrelation

Recall that if the data really follow an MA(q) process, then the (population) autocorrelation $r_j (= \gamma_j/\gamma_0)$ will be zero for $j > q$. By contrast, if the data follow an AR(p) process, then $r_j$ will gradually decay toward zero as a mixture of exponentials or damped sinusoids. One guide for distinguishing between MA and AR representations, then, is the decay property of $r_j$. It is useful to have a rough check on whether $r_j$ is effectively zero beyond a certain lag.

A natural estimate of the population autocorrelation $r_j$ is provided by the corresponding sample moment (remember that at this stage you still have no "model" to estimate, so it is natural to use a moment estimator):

$$\hat{r}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0},$$

where

$$\hat{\gamma}_j = \frac{1}{T} \sum_{t=j+1}^{T} (Y_t - \bar{Y})(Y_{t-j} - \bar{Y}) \quad \text{for } j = 0, 1, 2, \ldots, T-1, \qquad \bar{Y} = \frac{1}{T} \sum_{t=1}^{T} Y_t.$$

If the data were really generated by a Gaussian MA(q) process, then the variance of the estimated autocorrelation $\hat{r}_j$ can be approximated by (see Box et al. (1994), p. 33)

$$Var(\hat{r}_j) = \frac{1}{T} \Bigl( 1 + 2 \sum_{i=1}^{q} r_i^2 \Bigr) \quad \text{for } j = q+1, q+2, \ldots \tag{1}$$

To use (1) in practice, the estimated autocorrelations $\hat{r}_j$ ($j = 1, 2, \ldots, q$) are substituted for the theoretical autocorrelations $r_j$, and when this is done we shall refer to the square root of (1) as the large-lag standard error. In particular, if we suspect that the data were generated by Gaussian white noise, then $\hat{r}_j \sim N(0, 1/T)$ for $j \neq 0$; that is, $\hat{r}_j$ should lie between $\pm 2/\sqrt{T}$ about 95% of the time.

Example: The following estimated autocorrelations were obtained from a time series of length T = 200 observations, generated from a stochastic process for which it was known that $r_1 = 0.4$ and $r_j = 0$ for $j \geq 2$: $\hat{r}_1 = 0.38$, $\hat{r}_2 = 0.08$, $\hat{r}_3 = 0.11$, $\hat{r}_4 = 0.08$, $\hat{r}_5 = 0.02$, $\hat{r}_6 = 0.00$, $\hat{r}_7 = 0.00$, $\hat{r}_8 = 0.00$, $\hat{r}_9 = 0.07$, and $\hat{r}_{10} = 0.08$.

On the assumption that the series is completely random ($H_0: q = 0$), (1) yields, for all lags,

$$Var(\hat{r}_1) = \frac{1}{T} = \frac{1}{200} = 0.005.$$

Under the null hypothesis, $\hat{r}_1 \sim N(0, 0.005)$, so the 95% confidence interval is

$$-2 < \frac{\hat{r}_1}{\sqrt{0.005}} < 2, \quad \text{i.e., } -0.14 < \hat{r}_1 < 0.14.$$

Since the estimated value $\hat{r}_1 = 0.38$ lies outside this confidence interval, we conclude that the hypothesis q = 0 is rejected. It might be reasonable to ask next whether the series is compatible with the hypothesis that q = 1. Using (1) with q = 1, the estimated large-lag variance under this assumption is

$$Var(\hat{r}_2) = \frac{1}{200} \bigl[ 1 + 2(0.38)^2 \bigr] = 0.0064.$$

Under the null hypothesis, $\hat{r}_2 \sim N(0, 0.0064)$, so the 95% confidence interval is

$$-2 < \frac{\hat{r}_2}{\sqrt{0.0064}} < 2, \quad \text{i.e., } -0.16 < \hat{r}_2 < 0.16.$$

Since the estimated value $\hat{r}_2 = 0.08$ lies inside this confidence interval, the hypothesis that q = 1 is accepted.
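To make the calculation concrete, here is a minimal Python sketch (the function names are my own, not from these notes) of the moment estimator $\hat{r}_j$ and the large-lag standard error from (1), applied to the two tests in the example above.

```python
import numpy as np

def sample_acf(y, max_lag):
    """Moment estimator r_hat_j = gamma_hat_j / gamma_hat_0."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dev = y - y.mean()
    gamma0 = np.sum(dev * dev) / T
    return np.array([np.sum(dev[j:] * dev[:T - j]) / T / gamma0
                     for j in range(1, max_lag + 1)])

def large_lag_se(r_hat, q, T):
    """Square root of (1): Var(r_hat_j) = (1 + 2*sum_{i<=q} r_i^2)/T for j > q."""
    return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(r_hat)[:q] ** 2)) / T)

# Worked example: T = 200, testing H0: q = 0, then H0: q = 1
T = 200
r_hat = np.array([0.38, 0.08, 0.11, 0.08, 0.02, 0.00, 0.00, 0.00, 0.07, 0.08])

se0 = large_lag_se(r_hat, q=0, T=T)          # sqrt(1/200) ~ 0.0707
print(abs(r_hat[0]) > 2 * se0)               # True: reject q = 0
se1 = large_lag_se(r_hat, q=1, T=T)          # sqrt(0.0064) ~ 0.0803
print(abs(r_hat[1]) > 2 * se1)               # False: q = 1 not rejected
```

The two printed booleans reproduce the example's conclusions: $\hat{r}_1$ falls outside its two-standard-error band, while $\hat{r}_2$ falls inside.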
1.2.2 Partial Autocorrelation Function

Another useful measure is the partial autocorrelation, a device that exploits the fact that, whereas an AR(p) process has an autocorrelation function that is infinite in extent, it can by its very nature be described in terms of p nonzero functions of the autocorrelations. The mth population partial autocorrelation (denoted $\alpha_m^{(m)}$) is defined as the last coefficient in a linear projection of Y on its m most recent values:

$$\hat{Y}_{t+1|t} = \mu + \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + \cdots + \alpha_m^{(m)}(Y_{t-m+1} - \mu). \tag{2}$$

We saw in (15) of Chapter 15 that the vector $\alpha^{(m)}$ can be calculated from

$$\begin{bmatrix} \alpha_1^{(m)} \\ \alpha_2^{(m)} \\ \vdots \\ \alpha_m^{(m)} \end{bmatrix}
=
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}.$$

Recall that if the data were really generated by an AR(p) process, only the p most recent values of Y would be useful for forecasting. In this case, the projection coefficients on Y's more than p periods in the past are equal to zero:

$$\alpha_m^{(m)} = 0 \quad \text{for } m = p+1, p+2, \ldots$$

By contrast, if the data really were generated by an MA(q) process with $q \geq 1$, then the partial autocorrelation $\alpha_m^{(m)}$ asymptotically approaches zero instead of cutting off abruptly.

Since the forecast error $\varepsilon_{t+1}$ is uncorrelated with $x_t$ (the regressors in the projection), we can rewrite (2) as

$$Y_{t+1} = \mu + \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + \cdots + \alpha_m^{(m)}(Y_{t-m+1} - \mu) + \varepsilon_{t+1}, \quad t \in \mathcal{T},$$

or

$$Y_t = \mu + \alpha_1^{(m)}(Y_{t-1} - \mu) + \alpha_2^{(m)}(Y_{t-2} - \mu) + \cdots + \alpha_m^{(m)}(Y_{t-m} - \mu) + \varepsilon_t, \quad t \in \mathcal{T}. \tag{3}$$

The reason the quantity $\alpha_m^{(m)}$ defined through (2) is called the partial autocorrelation of the process $\{Y_t\}$ at lag m is clear from (3): it is equal to the partial correlation between the variables $Y_t$ and $Y_{t-m}$ adjusted for the intermediate variables $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$. That is, $\alpha_m^{(m)}$ measures the correlation between $Y_t$ and $Y_{t-m}$ after adjusting for the effects of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$ (or the correlation between $Y_t$ and $Y_{t-m}$ not accounted for by $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$). See the counterpart result for the sample on p. 6 of Chapter 6.

A natural estimate of the mth partial autocorrelation is the last coefficient in an OLS regression of Y on a constant and its m most recent values:

$$Y_t = \hat{c} + \hat{\alpha}_1^{(m)} Y_{t-1} + \hat{\alpha}_2^{(m)} Y_{t-2} + \cdots + \hat{\alpha}_m^{(m)} Y_{t-m} + \hat{e}_t, \tag{4}$$

where $\hat{e}_t$ denotes the OLS regression residual. If the data were really generated by an AR(p) process, then the sample estimate $\hat{\alpha}_m^{(m)}$ would have a variance around the true value (0) that can be approximated by (see Box et al. (1994), p. 68)

$$Var(\hat{\alpha}_m^{(m)}) = \frac{1}{T} \quad \text{for } m = p+1, p+2, \ldots$$
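The OLS regression in (4) translates directly into code. The following sketch (the helper name is my own; the simulated AR(1) is only for illustration) estimates the first few partial autocorrelations as the last coefficient of successively longer regressions:

```python
import numpy as np

def pacf_ols(y, max_lag):
    """Estimate alpha_hat_m^(m), m = 1..max_lag, as the last OLS coefficient
    in a regression of Y_t on a constant and its m most recent lags -- eq. (4)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    pacf = []
    for m in range(1, max_lag + 1):
        # Regressors: constant, Y_{t-1}, ..., Y_{t-m}; observations t = m..T-1
        X = np.column_stack([np.ones(T - m)] +
                            [y[m - j:T - j] for j in range(1, m + 1)])
        beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
        pacf.append(beta[-1])               # last coefficient = alpha_hat_m^(m)
    return np.array(pacf)

# Under an AR(p) null, alpha_hat_m^(m) ~ N(0, 1/T) for m > p,
# so +/- 2/sqrt(T) gives an approximate 95% band.
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):                     # simulate an AR(1) with phi = 0.6
    y[t] = 0.6 * y[t - 1] + e[t]
print(np.round(pacf_ols(y, 5), 3))          # first value near 0.6, rest near 0
```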
1.3 Use of Model Selection Criteria

Another approach to model selection is the use of information criteria such as the AIC proposed by Akaike (1974) or the BIC of Schwarz (1978). In the implementation of this approach, a range of potential ARMA models is estimated by the maximum likelihood methods to be discussed in Chapter 17, and for each model a criterion is evaluated, such as the AIC (normalized by sample size T),

$$AIC_{p,q} = \frac{-2 \ln(\text{maximized likelihood}) + 2m}{T} \approx \ln(\hat{\sigma}^2) + \frac{2m}{T},$$

or the related BIC,

$$BIC_{p,q} = \ln(\hat{\sigma}^2) + \frac{m \ln(T)}{T},$$

where $\hat{\sigma}^2$ denotes the maximum likelihood estimate of $\sigma^2$, and $m = p + q + 1$ denotes the number of parameters estimated in the model, including a constant term. In the criteria above, the first term essentially corresponds to minus 2/T times the log of the maximized likelihood, while the second term is a "penalty factor" for the inclusion of additional parameters in the model. In the information criteria approach, models that yield a minimum value of the criterion are preferred, and the AIC or BIC values are compared across the various candidate models as the basis for selection. However, one immediate disadvantage of this approach is that several models may have to be estimated by MLE, which is computationally time-consuming and expensive. For this reason, Hannan and Rissanen (1982) propose an alternative model selection procedure.

2 Model Estimation

By estimation we mean the efficient use of the data to make inferences about parameters, conditional on the adequacy of the model entertained. See Chapter 17 for details.

3 Model Diagnostic Checking

By diagnostic checking we mean checking the fitted model in its relation to the data, with the intent to reveal model inadequacies and so achieve model improvement. Suppose that, using a particular time series, a model has been identified and its parameters estimated using the methods described in Chapter 17. The question remains (unlike in regression analysis, where an economic or finance model is provided by the theoretical literature) of deciding whether this model is adequate. If there is evidence of serious inadequacy, we shall need to know how the model should be modified. By reference to familiar procedures outside time series analysis, such as the scrutiny of residuals in the analysis of variance, these procedures would be called diagnostic checks.

3.1 Diagnostic Checks Applied to Residuals

It cannot be too strongly emphasized that visual inspection of a plot of the residuals is an indispensable first step in the checking process.

3.1.1 Autocorrelation Check

Suppose we have identified and fitted a model

$$\phi(L) Y_t = \theta(L) \varepsilon_t,$$

with MLE estimates $(\hat{\phi}, \hat{\theta})$ obtained for the parameters. Then we shall refer to the quantities

$$\hat{\varepsilon}_t = \hat{\theta}^{-1}(L) \hat{\phi}(L) Y_t$$

as the residuals. The residuals are computed recursively from $\hat{\theta}(L) \hat{\varepsilon}_t = \hat{\phi}(L) Y_t$ as

$$\hat{\varepsilon}_t = Y_t - \sum_{j=1}^{p} \hat{\phi}_j Y_{t-j} + \sum_{j=1}^{q} \hat{\theta}_j \hat{\varepsilon}_{t-j}, \quad t = 1, 2, \ldots, T,$$

using either zero initial values (the conditional method) or back-forecasted initial values (the exact method) for the initial $\hat{\varepsilon}$'s and $Y$'s. Now it is possible to show that, if the model is adequate,

$$\hat{\varepsilon}_t = \varepsilon_t + O\Bigl(\frac{1}{\sqrt{T}}\Bigr)$$

(read as "big O of $T^{-1/2}$"; it means the term must be multiplied by $T^{1/2}$ to remain bounded, so the term itself converges to zero). As the series length increases, the $\hat{\varepsilon}_t$'s become close to the white noise $\varepsilon_t$'s. Therefore, one might expect that study of the $\hat{\varepsilon}_t$'s could indicate the existence and nature of model inadequacy. In particular, recognizable patterns in the estimated autocorrelation function of the $\hat{\varepsilon}_t$'s, $\hat{r}_j(\hat{\varepsilon})$, used together with (1), could point to appropriate modifications of the model.
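As an illustration, here is a minimal sketch of the conditional method, assuming the sign conventions $\phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ and $\theta(L) = 1 - \theta_1 L - \cdots - \theta_q L^q$ that underlie the recursion above (the helper name is my own, and the demeaning step is left to the caller):

```python
import numpy as np

def conditional_residuals(y, phi, theta):
    """ARMA(p, q) residuals by the conditional method (zero initial values):
    eps_t = y_t - sum_j phi_j * y_{t-j} + sum_j theta_j * eps_{t-j}."""
    y = np.asarray(y, dtype=float)
    p, q = len(phi), len(theta)
    eps = np.zeros(len(y))
    for t in range(len(y)):
        ar = sum(phi[j] * y[t - 1 - j] for j in range(p) if t - 1 - j >= 0)
        ma = sum(theta[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        eps[t] = y[t] - ar + ma
    return eps

# If the fitted model is adequate, eps_hat should behave like white noise,
# so its sample autocorrelations r_hat_j(eps_hat) can be checked against (1).
rng = np.random.default_rng(2)
e = rng.standard_normal(300)
y = np.zeros(300)
for t in range(1, 300):                     # simulate an AR(1) with phi = 0.5
    y[t] = 0.5 * y[t - 1] + e[t]
eps_hat = conditional_residuals(y, phi=[0.5], theta=[])  # residuals at true phi
print(np.allclose(eps_hat[1:], e[1:]))      # True: recursion recovers the noise
```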
3.1.2 Portmanteau Lack-of-Fit Test

Rather than considering the $\hat{r}_j(\hat{\varepsilon})$'s individually, an indication is often needed of whether, say, the first 20 autocorrelations of the $\hat{\varepsilon}_t$'s, taken as a whole, indicate inadequacy of the model. Suppose we have the first k autocorrelations $\hat{r}_j(\hat{\varepsilon})$, $j = 1, 2, \ldots, k$,[1] from any ARMA(p, q) process; then it is possible to show that, if the model is appropriate, the Box-Pierce (1970) Q statistic

$$Q = T \sum_{j=1}^{k} \hat{r}_j^2(\hat{\varepsilon})$$

is approximately distributed as $\chi^2_{k-p-q}$. On the other hand, if the model is inappropriate, the average value of Q will be inflated. A refinement that appears to have better finite-sample properties is the Ljung-Box (1978) statistic

$$Q' = T(T+2) \sum_{j=1}^{k} \frac{\hat{r}_j^2(\hat{\varepsilon})}{T - j}.$$

The limiting distribution of $Q'$ is the same as that of Q.

[1] Here, k is chosen sufficiently large that the weights $\psi_j$ in the model written in the form $Y_t = \phi(L)^{-1} \theta(L) \varepsilon_t = \psi(L) \varepsilon_t$ are negligibly small after j = k.

Exercise: Build a stochastic model for the data set I give to you, following the Box-Jenkins procedure.
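As a starting point for the exercise, here is a minimal sketch of the portmanteau check, assuming residuals from a fitted ARMA(p, q) are already in hand (the function name is my own; scipy's chi-squared survival function supplies the p-value):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(eps, k, p, q):
    """Ljung-Box Q' on residual autocorrelations, referred to chi2_{k-p-q}."""
    eps = np.asarray(eps, dtype=float)
    T = len(eps)
    dev = eps - eps.mean()
    gamma0 = np.sum(dev * dev) / T
    r = np.array([np.sum(dev[j:] * dev[:T - j]) / T / gamma0
                  for j in range(1, k + 1)])
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, k + 1)))
    p_value = chi2.sf(Q, df=k - p - q)      # inflated Q => small p-value
    return Q, p_value

# Example: white-noise "residuals" should typically not be rejected
rng = np.random.default_rng(1)
Q, pv = ljung_box(rng.standard_normal(200), k=20, p=1, q=1)
print(round(Q, 2), round(pv, 3))
```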