Ch. 15 Forecasting

Having considered in Chapter 14 some of the properties of ARMA models, we now show how they may be used to forecast future values of an observed time series. For the present we proceed as if the model were known exactly.

Forecasting is an important topic in time series analysis. In a regression model we usually have an existing economic theory whose parameters we set out to estimate, and the estimated coefficients already have a role to play, such as confirming or refuting that theory; whether to forecast from the estimated model depends on the researcher's own interest. The estimated coefficients of a time series model, by contrast, carry no particular meaning for economic theory. An important role of time series analysis is therefore to forecast precisely from such a purely mechanical model.

1 Principles of Forecasting

1.1 Forecasts Based on Conditional Expectations

Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $\mathbf{x}_t$ observed at date $t$. For example, we might want to forecast $Y_{t+1}$ based on its $m$ most recent values, in which case $\mathbf{x}_t = [Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'$.

Let $Y^*_{t+1|t}$ denote a forecast of $Y_{t+1}$ based on $\mathbf{x}_t$ (a function of $\mathbf{x}_t$, whose value depends on how $\mathbf{x}_t$ is realized). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast $Y^*_{t+1|t}$ so as to minimize

$$MSE(Y^*_{t+1|t}) = E(Y_{t+1} - Y^*_{t+1|t})^2,$$

which is known as the mean squared error.

Theorem: The forecast with the smallest mean squared error is the expectation of $Y_{t+1}$ conditional on $\mathbf{x}_t$:

$$Y^*_{t+1|t} = E(Y_{t+1}|\mathbf{x}_t).$$

Proof: Let $g(\mathbf{x}_t)$ be a forecasting function of $Y_{t+1}$ other than the conditional expectation $E(Y_{t+1}|\mathbf{x}_t)$. The MSE associated with $g(\mathbf{x}_t)$ is

$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t) + E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2$$
$$= E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 + 2E\{[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]\} + E[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2.$$

Denote $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]$. Conditional on $\mathbf{x}_t$, the second factor is a known constant, so

$$E(\eta_{t+1}|\mathbf{x}_t) = [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]\, E\{[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]\,|\,\mathbf{x}_t\} = [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)] \cdot 0 = 0.$$

By the law of iterated expectations it follows that

$$E(\eta_{t+1}) = E_{\mathbf{x}_t}[E(\eta_{t+1}|\mathbf{x}_t)] = 0.$$

Therefore

$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 + E[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2. \quad (1)$$

The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on $g(\mathbf{x}_t)$. The function $g(\mathbf{x}_t)$ that makes the mean squared error (1) as small as possible is therefore the one that sets the second term in (1) to zero:

$$g(\mathbf{x}_t) = E(Y_{t+1}|\mathbf{x}_t).$$

The MSE of this optimal forecast is

$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2.$$

1.2 Forecasts Based on Linear Projection

Suppose we now restrict ourselves to the class of forecasts in which $Y_{t+1}$ is forecast by a linear function of $\mathbf{x}_t$:

$$Y^*_{t+1|t} = \boldsymbol{\alpha}'\mathbf{x}_t.$$

Definition: The forecast $\boldsymbol{\alpha}'\mathbf{x}_t$ is called the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$ if the forecast error $(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)$ is uncorrelated with $\mathbf{x}_t$:

$$E[(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)\mathbf{x}_t'] = \mathbf{0}'. \quad (2)$$

Theorem: The linear projection produces the smallest mean squared error among the class of linear forecasting rules.

Proof: Let $\mathbf{g}'\mathbf{x}_t$ be an arbitrary linear forecasting function of $Y_{t+1}$. The MSE associated with $\mathbf{g}'\mathbf{x}_t$ is

$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t + \boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2$$
$$= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + 2E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]\} + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2.$$

Denote $\eta_{t+1} \equiv [Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]$. Using (2) we have

$$E(\eta_{t+1}) = E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]\,\mathbf{x}_t'\}(\boldsymbol{\alpha} - \mathbf{g}) = \mathbf{0}'(\boldsymbol{\alpha} - \mathbf{g}) = 0.$$

Therefore

$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2. \quad (3)$$

The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on $\mathbf{g}$. The linear function that makes the mean squared error (3) as small as possible is the one that sets the second term in (3) to zero:

$$\mathbf{g}'\mathbf{x}_t = \boldsymbol{\alpha}'\mathbf{x}_t.$$

The MSE of this optimal linear forecast is

$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2.$$
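As a numerical illustration of these two results, the following minimal sketch (assuming only `numpy`; names such as `n_sim` and `proj_coef` are illustrative, not part of the text) simulates a nonlinear data-generating process, builds the linear projection from the moment condition (2), and confirms both the orthogonality of its forecast error and that the conditional mean attains a weakly smaller MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 100_000

# A simple nonlinear data-generating process: Y = x^2 + u.
x = rng.standard_normal(n_sim)
u = rng.standard_normal(n_sim)
y = x**2 + u

# Optimal forecast: the conditional expectation E(Y|x) = x^2.
cond_mean = x**2
mse_cond = np.mean((y - cond_mean) ** 2)

# Best linear forecast: project Y on [1, x] using the moment
# condition (2): alpha' = E(Y x') [E(x x')]^{-1}.
X = np.column_stack([np.ones(n_sim), x])
proj_coef = np.linalg.solve(X.T @ X / n_sim, X.T @ y / n_sim)
lin_fcst = X @ proj_coef
mse_lin = np.mean((y - lin_fcst) ** 2)

# The projection error is (approximately) uncorrelated with x_t,
# and MSE(conditional mean) <= MSE(linear projection).
print("orthogonality E[(Y - a'x) x']:", (y - lin_fcst) @ X / n_sim)
print(f"MSE conditional mean:  {mse_cond:.3f}")  # ~ Var(u) = 1
print(f"MSE linear projection: {mse_lin:.3f}")   # ~ Var(x^2) + Var(u) = 3
```

Because the true conditional mean is nonlinear here, the gap between the two MSEs is large; for a Gaussian process the two forecasts would coincide.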
Since $\boldsymbol{\alpha}'\mathbf{x}_t$ is the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$, we will use the notation

$$\hat{P}(Y_{t+1}|\mathbf{x}_t) = \boldsymbol{\alpha}'\mathbf{x}_t$$

to indicate it. Notice that

$$MSE[\hat{P}(Y_{t+1}|\mathbf{x}_t)] \ge MSE[E(Y_{t+1}|\mathbf{x}_t)],$$

since the conditional expectation offers the best possible forecast. For most applications a constant term will be included in the projection. We will use the symbol $\hat{E}$ to indicate a linear projection on a vector of random variables $\mathbf{x}_t$ along with a constant term:

$$\hat{E}(Y_{t+1}|\mathbf{x}_t) \equiv \hat{P}(Y_{t+1}|1, \mathbf{x}_t).$$

2 Forecasts Based on an Infinite Number of Observations

Recall that a general stationary and invertible ARMA(p, q) process is written in the form

$$\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \quad (4)$$

where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$, and all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle.

2.1 Forecasting Based on Lagged $\varepsilon$'s (the MA($\infty$) Form)

Consider the MA($\infty$) form of (4):

$$Y_t - \mu = \psi(L)\varepsilon_t \quad (5)$$

with $\varepsilon_t$ white noise and

$$\psi(L) = \phi^{-1}(L)\theta(L) = \sum_{j=0}^{\infty}\psi_j L^j, \quad \psi_0 = 1, \quad \sum_{j=0}^{\infty}|\psi_j| < \infty.$$

Suppose that we have an infinite number of observations on $\varepsilon$ through date $t$, that is $\{\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$, and further know the values of $\mu$ and $\{\psi_1, \psi_2, \ldots\}$. Say we want to forecast the value of $Y_{t+s}$, $s$ periods from now. Note that (5) implies

$$Y_{t+s} = \mu + \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1} + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots.$$

The best linear forecast takes the form

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots \quad (6)$$
$$= [\mu, \psi_s, \psi_{s+1}, \ldots][1, \varepsilon_t, \varepsilon_{t-1}, \ldots]' \quad (7)$$
$$= \boldsymbol{\alpha}'\mathbf{x}_t. \quad (8)$$

The error associated with this forecast is uncorrelated with $\mathbf{x}_t = [1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'$:

$$E\{\mathbf{x}_t[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]\} = E\{[1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'(\varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1})\} = \mathbf{0}.$$

The mean squared error associated with this forecast is

$$E[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]^2 = (1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{s-1}^2)\sigma^2.$$

Example: For an MA(q) process, the optimal linear forecast is

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \begin{cases} \mu + \theta_s\varepsilon_t + \theta_{s+1}\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q+s} & \text{for } s = 1, 2, \ldots, q, \\ \mu & \text{for } s = q+1, q+2, \ldots \end{cases}$$

The MSE is

$$\begin{cases} \sigma^2 & \text{for } s = 1, \\ (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_{s-1}^2)\sigma^2 & \text{for } s = 2, 3, \ldots, q, \\ (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2 & \text{for } s = q+1, q+2, \ldots \end{cases}$$

The MSE increases with the forecast horizon $s$ up until $s = q$. If we try to forecast an MA(q) process farther than $q$ periods into the future, the forecast is simply the unconditional mean of the series, $E(Y_{t+s}) = \mu$, and the MSE is the unconditional variance of the series, $Var(Y_{t+s}) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2$.

A compact lag-operator expression for the forecast in (6) is sometimes used. Rewrite $Y_{t+s}$ in (5) as

$$Y_{t+s} = \mu + \psi(L)\varepsilon_{t+s} = \mu + \psi(L)L^{-s}\varepsilon_t.$$

Consider the polynomial $\psi(L)$ divided by $L^s$:

$$\frac{\psi(L)}{L^s} = L^{-s} + \psi_1 L^{1-s} + \psi_2 L^{2-s} + \cdots + \psi_{s-1}L^{-1} + \psi_s L^0 + \psi_{s+1}L^1 + \psi_{s+2}L^2 + \cdots.$$

The annihilation operator replaces negative powers of $L$ by zero; for example,

$$\left[\frac{\psi(L)}{L^s}\right]_+ = \psi_s L^0 + \psi_{s+1}L^1 + \psi_{s+2}L^2 + \cdots.$$

Therefore the optimal forecast (6) can be written in lag-operator notation as

$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \varepsilon_t. \quad (9)$$
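The MA(q) example above translates directly into code. The following sketch (parameter values are arbitrary illustrations; only `numpy` is assumed) computes the s-step-ahead forecast (6) and its MSE from the lagged $\varepsilon$'s, and shows both flattening out at the unconditional mean and variance once $s > q$.

```python
import numpy as np

def ma_q_forecast(mu, theta, eps_hist, s, sigma2=1.0):
    """s-step-ahead forecast of an MA(q) from lagged epsilons (eq. (6)).

    theta    : [theta_1, ..., theta_q]
    eps_hist : [eps_t, eps_{t-1}, ..., eps_{t-q+1}] (most recent first)
    Returns (forecast, MSE).
    """
    q = len(theta)
    if s > q:  # beyond q steps: unconditional mean and variance
        return mu, (1.0 + np.sum(np.square(theta))) * sigma2
    # forecast: mu + theta_s eps_t + theta_{s+1} eps_{t-1} + ... + theta_q eps_{t-q+s}
    fcst = mu + sum(theta[j] * eps_hist[j - s + 1] for j in range(s - 1, q))
    # MSE: (1 + theta_1^2 + ... + theta_{s-1}^2) sigma^2
    mse = (1.0 + np.sum(np.square(theta[: s - 1]))) * sigma2
    return fcst, mse

theta = [0.6, -0.3, 0.2]      # an illustrative MA(3)
eps_hist = [0.5, -1.0, 0.8]   # eps_t, eps_{t-1}, eps_{t-2}
for s in range(1, 6):
    f, m = ma_q_forecast(mu=2.0, theta=theta, eps_hist=eps_hist, s=s)
    print(f"s={s}: forecast={f:+.3f}, MSE={m:.3f}")
```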
2.2 Forecasting Based on Lagged $Y$'s

The previous forecasts were based on the assumption that $\varepsilon_t$ is observed directly. In the usual forecasting situation, we actually have observations on lagged $Y$'s, not lagged $\varepsilon$'s. Suppose that the general ARMA(p, q) process has an AR($\infty$) representation given by

$$\eta(L)(Y_t - \mu) = \varepsilon_t \quad (10)$$

with $\varepsilon_t$ white noise and

$$\eta(L) = \theta^{-1}(L)\phi(L) = \sum_{j=0}^{\infty}\eta_j L^j = \psi^{-1}(L), \quad \eta_0 = 1, \quad \sum_{j=0}^{\infty}|\eta_j| < \infty.$$

Under these conditions, we can substitute (10) into (9) to obtain the forecast of $Y_{t+s}$ as a function of lagged $Y$'s:

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \eta(L)(Y_t - \mu) \quad (11)$$

or

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \frac{1}{\psi(L)}(Y_t - \mu). \quad (12)$$

Equation (12) is known as the Wiener-Kolmogorov prediction formula.

2.2.1 Forecasting an AR(1) Process

1. Using the Wiener-Kolmogorov prediction formula: For the covariance-stationary AR(1) process, we have

$$\psi(L) = \frac{1}{1 - \phi L} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \cdots$$

and

$$\left[\frac{\psi(L)}{L^s}\right]_+ = \phi^s + \phi^{s+1}L + \phi^{s+2}L^2 + \cdots = \frac{\phi^s}{1 - \phi L}.$$

The optimal linear s-period-ahead forecast for a stationary AR(1) process is therefore

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \frac{\phi^s}{1 - \phi L}(1 - \phi L)(Y_t - \mu) = \mu + \phi^s(Y_t - \mu).$$

The forecast decays geometrically from $(Y_t - \mu)$ toward $\mu$ as the forecast horizon $s$ increases.

2. Using recursive substitution and the lag operator: The AR(1) process can be represented as (using (1.1.9) on p. 3 of Hamilton)

$$Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$

Setting $E(\varepsilon_{t+h}) = 0$ for $h = 1, 2, \ldots, s$, the optimal linear s-period-ahead forecast for a stationary AR(1) process is again

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \phi^s(Y_t - \mu),$$

whose forecast MSE is

$$E(\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s})^2 = (1 + \phi^2 + \phi^4 + \cdots + \phi^{2(s-1)})\sigma^2.$$

Notice that this grows with $s$ and asymptotically approaches $\sigma^2/(1 - \phi^2)$, the unconditional variance of $Y$.

2.2.2 Forecasting an AR(p) Process

1. Using recursive substitution and the lag operator: Following (11) in Chapter 13, the value of $Y$ at $t+s$ for an AR(p) process can be represented as

$$Y_{t+s} - \mu = f^{(s)}_{11}(Y_t - \mu) + f^{(s)}_{12}(Y_{t-1} - \mu) + \cdots + f^{(s)}_{1p}(Y_{t-p+1} - \mu) + f^{(s-1)}_{11}\varepsilon_{t+1} + f^{(s-2)}_{11}\varepsilon_{t+2} + \cdots + f^{(1)}_{11}\varepsilon_{t+s-1} + \varepsilon_{t+s},$$

where $f^{(j)}_{1i}$ denotes the $(1, i)$ element of $\mathbf{F}^j$ and

$$\mathbf{F} \equiv \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \cdots & \phi_{p-1} & \phi_p \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}.$$

Setting $E(\varepsilon_{t+h}) = 0$ for $h = 1, 2, \ldots, s$, the optimal linear s-period-ahead forecast for a stationary AR(p) process is therefore

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + f^{(s)}_{11}(Y_t - \mu) + f^{(s)}_{12}(Y_{t-1} - \mu) + \cdots + f^{(s)}_{1p}(Y_{t-p+1} - \mu).$$

The associated forecast error is

$$Y_{t+s} - \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = f^{(s-1)}_{11}\varepsilon_{t+1} + f^{(s-2)}_{11}\varepsilon_{t+2} + \cdots + f^{(1)}_{11}\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$

It is important to note that to forecast an AR(p) process, the optimal s-period-ahead linear forecast based on an infinite number of observations $\{Y_t, Y_{t-1}, \ldots\}$ in fact makes use of only the $p$ most recent values $\{Y_t, Y_{t-1}, \ldots, Y_{t-p+1}\}$, as the sketch below illustrates.
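The AR(p) formulas above can be implemented directly by building the companion matrix $\mathbf{F}$ and raising it to the s-th power; the first row of $\mathbf{F}^s$ gives the forecast weights, and the sequence $f^{(j)}_{11}$ gives the MSE. A minimal sketch with illustrative AR(2) parameter values (`numpy` assumed):

```python
import numpy as np

def ar_p_forecast(mu, phi, y_recent, s, sigma2=1.0):
    """s-step-ahead forecast of an AR(p) via the companion matrix F.

    phi      : [phi_1, ..., phi_p]
    y_recent : [Y_t, Y_{t-1}, ..., Y_{t-p+1}] (most recent first)
    Returns (forecast, MSE).
    """
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi               # first row holds the AR coefficients
    F[1:, :-1] = np.eye(p - 1)  # subdiagonal identity block
    dev = np.asarray(y_recent) - mu
    # forecast: mu + first row of F^s times the deviations from the mean
    Fs = np.linalg.matrix_power(F, s)
    fcst = mu + Fs[0, :] @ dev
    # MSE: sigma^2 * sum of squared f_{11}^{(j)}, j = 0,...,s-1 (f_{11}^{(0)} = 1)
    mse = sigma2 * sum(
        np.linalg.matrix_power(F, j)[0, 0] ** 2 for j in range(s)
    )
    return fcst, mse

phi = [0.5, 0.3]  # an illustrative stationary AR(2)
for s in (1, 2, 5, 20):
    f, m = ar_p_forecast(mu=1.0, phi=phi, y_recent=[2.0, 1.5], s=s)
    print(f"s={s:2d}: forecast={f:.4f}, MSE={m:.4f}")
```

As $s$ grows the forecast converges to $\mu$ and the MSE to the unconditional variance, mirroring the AR(1) result above.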
2.2.3 Forecasting an MA(1) Process

Consider an invertible MA(1) process

$$Y_t - \mu = (1 + \theta L)\varepsilon_t, \quad |\theta| < 1.$$

1. Applying the Wiener-Kolmogorov formula, we have

$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta L}{L^s}\right]_+ \frac{1}{1 + \theta L}(Y_t - \mu).$$

To forecast an MA(1) process one period ahead ($s = 1$),

$$\left[\frac{1 + \theta L}{L}\right]_+ = \theta,$$

and so

$$\hat{Y}_{t+1|t} = \mu + \frac{\theta}{1 + \theta L}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots.$$

To forecast an MA(1) process for $s = 2, 3, \ldots$ periods into the future,

$$\left[\frac{1 + \theta L}{L^s}\right]_+ = 0 \quad \text{for } s = 2, 3, \ldots,$$

and so

$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots.$$

2. From recursive substitution: An MA(1) process at period $t+1$ is

$$Y_{t+1} = \mu + \varepsilon_{t+1} + \theta\varepsilon_t.$$

At period $t$, $E(\varepsilon_{t+s}) = 0$ for $s = 1, 2, \ldots$. The optimal linear one-period-ahead forecast for a stationary MA(1) process is therefore

$$\hat{Y}_{t+1|t} = \mu + \theta\varepsilon_t = \mu + \theta(1 + \theta L)^{-1}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots.$$

An MA(1) process at period $t+s$ is

$$Y_{t+s} = \mu + \varepsilon_{t+s} + \theta\varepsilon_{t+s-1},$$

so the forecast of an MA(1) process for $s = 2, 3, \ldots$ periods into the future is

$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots.$$

2.2.4 Forecasting an MA(q) Process

For an invertible MA(q) process,

$$Y_t - \mu = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)\varepsilon_t,$$

the forecast becomes

$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ \frac{1}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).$$

Now

$$\left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ = \begin{cases} \theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s} & \text{for } s = 1, 2, \ldots, q, \\ 0 & \text{for } s = q+1, q+2, \ldots \end{cases}$$

Thus for horizons $s = 1, 2, \ldots, q$ the forecast is given by

$$\hat{Y}_{t+s|t} = \mu + (\theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s}) \frac{1}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).$$

A forecast farther than $q$ periods into the future is simply the unconditional mean $\mu$. It is important to note that to forecast an MA(q) process, the optimal s-period-ahead linear forecast in principle requires the entire history $\{Y_t, Y_{t-1}, \ldots\}$.

3 Forecasts Based on a Finite Number of Observations

This section continues to assume that the population parameters are known with certainty, but develops forecasts based on a finite number $m$ of observations, $\{Y_t, Y_{t-1}, \ldots, Y_{t-m+1}\}$.

3.1 Approximations to the Optimal Forecast

In reality we do not have an infinite number of observations available for forecasting. One approach to forecasting based on a finite number of observations is to act as if presample $Y$'s were all equal to the mean $\mu$ (equivalently, presample $\varepsilon$'s were all zero). The idea is thus to use the approximation

$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) \approx \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots, Y_{t-m+1}, Y_{t-m} = \mu, Y_{t-m-1} = \mu, \ldots).$$

For example, in forecasting an MA(1) process, the one-period-ahead forecast

$$\hat{Y}_{t+1|t} = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots$$

is approximated by

$$\hat{Y}_{t+1|t} \approx \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots + (-1)^{m-1}\theta^m(Y_{t-m+1} - \mu).$$

For $m$ large and $|\theta|$ small (so that any discrepancy between the presample $Y$'s and $\mu$ is multiplied by a smaller and smaller number), this clearly gives an excellent approximation. For $|\theta|$ closer to unity, the approximation may be poor.
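The following simulation sketch makes the last point concrete (the sample size and parameter values are arbitrary illustrations; `numpy` assumed): for the truncated one-period-ahead MA(1) forecast with $m = 5$ terms, the sample MSE is essentially the optimal $\sigma^2 = 1$ when $\theta = 0.3$ but noticeably larger when $\theta = 0.95$.

```python
import numpy as np

def truncated_ma1_forecast_mse(theta, m, n=200_000, mu=0.0, seed=0):
    """Sample MSE of the m-term truncated one-step forecast of an MA(1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n + 1)
    y = mu + eps[1:] + theta * eps[:-1]  # Y_t = mu + e_t + theta e_{t-1}
    # truncated forecast of Y_t from Y_{t-1},...,Y_{t-m}:
    # mu + sum_{j=1}^{m} (-1)^{j-1} theta^j (Y_{t-j} - mu)
    w = np.array([(-1) ** (j - 1) * theta**j for j in range(1, m + 1)])
    fcst = mu + sum(w[j] * (y[m - 1 - j : n - 1 - j] - mu) for j in range(m))
    err = y[m:] - fcst
    return np.mean(err**2)

m = 5
for theta in (0.3, 0.95):
    mse = truncated_ma1_forecast_mse(theta, m)
    # for this truncation the error is e_t - (-1)^{m+1} theta^{m+1} e_{t-m-1},
    # so the theoretical MSE is sigma^2 (1 + theta^(2(m+1)))
    exact = 1.0 + theta ** (2 * (m + 1))
    print(f"theta={theta}: sample MSE={mse:.4f}, theoretical={exact:.4f}")
```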
3.2 Exact Finite-Sample Forecasts

An alternative approach is to calculate the exact projection of $Y_{t+1}$ on its $m$ most recent values. Let

$$\mathbf{x}_t = [1, Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'.$$

We thus seek a linear forecast of the form

$$\hat{E}(Y_{t+1}|\mathbf{x}_t) = \boldsymbol{\alpha}^{(m)\prime}\mathbf{x}_t = \alpha^{(m)}_0 + \alpha^{(m)}_1 Y_t + \alpha^{(m)}_2 Y_{t-1} + \cdots + \alpha^{(m)}_m Y_{t-m+1}. \quad (13)$$

The coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $m$ most recent values of $Y$ is denoted $\alpha^{(m)}_1$ in (13). This will in general differ from the coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $m+1$ most recent values of $Y$; the latter coefficient would be denoted $\alpha^{(m+1)}_1$.

From (2) we know that

$$E(Y_{t+1}\mathbf{x}_t') = \boldsymbol{\alpha}'E(\mathbf{x}_t\mathbf{x}_t'),$$

or

$$\boldsymbol{\alpha}' = E(Y_{t+1}\mathbf{x}_t')[E(\mathbf{x}_t\mathbf{x}_t')]^{-1}, \quad (14)$$

assuming that $E(\mathbf{x}_t\mathbf{x}_t')$ is a nonsingular matrix. Since for a covariance-stationary process $Y_t$ we have $\gamma_j = E(Y_{t+j} - \mu)(Y_t - \mu) = E(Y_{t+j}Y_t) - \mu^2$, here

$$E(Y_{t+1}\mathbf{x}_t') = E(Y_{t+1}[1, Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]) = [\mu, (\gamma_1 + \mu^2), (\gamma_2 + \mu^2), \ldots, (\gamma_m + \mu^2)]$$

and

$$E(\mathbf{x}_t\mathbf{x}_t') = \begin{bmatrix} 1 & \mu & \mu & \cdots & \mu \\ \mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\ \mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2 \end{bmatrix}.$$

Then

$$\boldsymbol{\alpha}^{(m)\prime} = [\mu, (\gamma_1 + \mu^2), (\gamma_2 + \mu^2), \ldots, (\gamma_m + \mu^2)]\,[E(\mathbf{x}_t\mathbf{x}_t')]^{-1}.$$

When a constant term is included in $\mathbf{x}_t$, it is more convenient to express variables in deviations from the mean. We can then calculate the projection of $(Y_{t+1} - \mu)$ on

$$\mathbf{x}_t = [(Y_t - \mu), (Y_{t-1} - \mu), \ldots, (Y_{t-m+1} - \mu)]',$$

giving

$$\hat{Y}_{t+1|t} - \mu = \alpha^{(m)}_1(Y_t - \mu) + \alpha^{(m)}_2(Y_{t-1} - \mu) + \cdots + \alpha^{(m)}_m(Y_{t-m+1} - \mu).$$

For this definition of $\mathbf{x}_t$, the coefficients can be calculated from (14) to be

$$\boldsymbol{\alpha}^{(m)} = \begin{bmatrix} \alpha^{(m)}_1 \\ \alpha^{(m)}_2 \\ \vdots \\ \alpha^{(m)}_m \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0 \end{bmatrix}^{-1} \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}. \quad (15)$$

To generate an s-period-ahead forecast $\hat{Y}_{t+s|t}$ we would use

$$\hat{Y}_{t+s|t} - \mu = \alpha^{(m,s)}_1(Y_t - \mu) + \alpha^{(m,s)}_2(Y_{t-1} - \mu) + \cdots + \alpha^{(m,s)}_m(Y_{t-m+1} - \mu),$$

where

$$\begin{bmatrix} \alpha^{(m,s)}_1 \\ \alpha^{(m,s)}_2 \\ \vdots \\ \alpha^{(m,s)}_m \end{bmatrix} = \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0 \end{bmatrix}^{-1} \begin{bmatrix} \gamma_s \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1} \end{bmatrix}.$$
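To make (15) concrete, the following sketch builds the $m \times m$ Toeplitz autocovariance matrix of an MA(1) process, whose autocovariances are known in closed form ($\gamma_0 = (1+\theta^2)\sigma^2$, $\gamma_1 = \theta\sigma^2$, $\gamma_j = 0$ for $j \ge 2$), and solves for the exact finite-sample projection coefficients. The parameter values are arbitrary illustrations; only `numpy` is assumed.

```python
import numpy as np

def exact_projection_coefs(gamma, m, s=1):
    """Projection coefficients alpha^{(m,s)} from eq. (15):
    Gamma_m^{-1} [gamma_s, ..., gamma_{s+m-1}]'."""
    gamma = np.asarray(gamma)
    # m x m Toeplitz matrix Gamma_m with (i, j) entry gamma_|i-j|
    lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    Gamma = gamma[lags]
    return np.linalg.solve(Gamma, gamma[s : s + m])

# Known MA(1) autocovariances: gamma_0 = (1 + theta^2) sigma^2,
# gamma_1 = theta sigma^2, gamma_j = 0 for j >= 2.
theta, sigma2 = 0.6, 1.0
gamma = np.zeros(12)
gamma[0] = (1 + theta**2) * sigma2
gamma[1] = theta * sigma2

# The weight on Y_t depends on how many observations are used.
for m in (1, 2, 5, 10):
    alpha = exact_projection_coefs(gamma, m)
    print(f"m={m:2d}: alpha_1^({m}) = {alpha[0]:+.4f}")
```

Consistent with the remark below (13), the coefficient $\alpha^{(m)}_1$ changes with $m$, approaching the infinite-history weight $\theta$ from the MA(1) forecast of Section 2.2.3 as $m$ grows.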