Ch. 15 Forecasting
Having considered in Chapter 14 some of the properties of ARMA models, we
now show how they may be used to forecast future values of an observed time
series. For the present we proceed as if the model were known exactly.
Forecasting is an important concept in time series analysis. In the scope of a regression model we usually have an existing economic theory whose parameters we wish to estimate. The estimated coefficients then already have a role to play, such as confirming or refuting some economic theory, so whether or not to forecast from the estimated model depends on the researcher's own interest. The estimated coefficients of a time series model, by contrast, carry no direct meaning for economic theory. An important task of time series analysis is therefore to forecast precisely from this purely mechanical model.
1 Principle of Forecasting
1.1 Forecasts Based on Conditional Expectations
Suppose we are interested in forecasting the value of a variable $Y_{t+1}$ based on a set of variables $\mathbf{x}_t$ observed at date $t$. For example, we might want to forecast $Y_{t+1}$ based on its $m$ most recent values. In this case, $\mathbf{x}_t = [Y_t, Y_{t-1}, \ldots, Y_{t-m+1}]'$.

Let $Y^*_{t+1|t}$ denote a forecast of $Y_{t+1}$ based on $\mathbf{x}_t$ (a function of $\mathbf{x}_t$, depending on how it is realized). To evaluate the usefulness of this forecast, we need to specify a loss function. A quadratic loss function means choosing the forecast $Y^*_{t+1|t}$ so as to minimize
$$MSE(Y^*_{t+1|t}) = E(Y_{t+1} - Y^*_{t+1|t})^2,$$
which is known as the mean squared error.

Theorem:
The forecast $Y^*_{t+1|t}$ with the smallest mean squared error is the expectation of $Y_{t+1}$ conditional on $\mathbf{x}_t$:
$$Y^*_{t+1|t} = E(Y_{t+1}|\mathbf{x}_t).$$
Proof:
Let $g(\mathbf{x}_t)$ be a forecasting function of $Y_{t+1}$ other than the conditional expectation $E(Y_{t+1}|\mathbf{x}_t)$. Then the MSE associated with $g(\mathbf{x}_t)$ would be
$$\begin{aligned}
E[Y_{t+1} - g(\mathbf{x}_t)]^2 &= E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t) + E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2 \\
&= E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 \\
&\quad + 2E\{[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]\} \\
&\quad + E\{[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2\}.
\end{aligned}$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)][E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]$; conditional on $\mathbf{x}_t$ the second factor is known, so
$$\begin{aligned}
E(\eta_{t+1}|\mathbf{x}_t) &= [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)] \cdot E\{[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]\,|\,\mathbf{x}_t\} \\
&= [E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)] \cdot 0 \\
&= 0.
\end{aligned}$$
By the law of iterated expectations, it follows that
$$E(\eta_{t+1}) = E_{\mathbf{x}_t}\left(E[\eta_{t+1}|\mathbf{x}_t]\right) = 0.$$
Therefore we have
$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2 + E\{[E(Y_{t+1}|\mathbf{x}_t) - g(\mathbf{x}_t)]^2\}. \qquad (1)$$
The second term on the right-hand side of (1) cannot be made smaller than zero, and the first term does not depend on $g(\mathbf{x}_t)$. The function $g(\mathbf{x}_t)$ that makes the mean squared error (1) as small as possible is therefore the one that sets the second term in (1) to zero:
$$g(\mathbf{x}_t) = E(Y_{t+1}|\mathbf{x}_t).$$
The MSE of this optimal forecast is
$$E[Y_{t+1} - g(\mathbf{x}_t)]^2 = E[Y_{t+1} - E(Y_{t+1}|\mathbf{x}_t)]^2.$$
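The theorem is easy to check numerically. The following sketch (a made-up example, not from the text; it assumes NumPy) simulates $Y_{t+1} = x_t^2 + u_{t+1}$, for which $E(Y_{t+1}|x_t) = x_t^2$, and compares the MSE of the conditional-mean forecast with that of the rival forecast $g(x_t) = E(Y_{t+1}) = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)            # x_t, observed at date t
u = rng.standard_normal(n)            # unforecastable innovation
y = x**2 + u                          # Y_{t+1}, so E(Y_{t+1}|x_t) = x_t^2

mse_cond = np.mean((y - x**2) ** 2)   # conditional-mean forecast: MSE ~ 1
mse_alt = np.mean((y - 1.0) ** 2)     # rival forecast g(x_t) = E(Y_{t+1}) = 1: MSE ~ 3
print(mse_cond, mse_alt)
```

The gap is exactly what the decomposition (1) predicts: the rival's extra MSE is $E[E(Y_{t+1}|x_t) - g(x_t)]^2 = E(x_t^2 - 1)^2 = 2$.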
1.2 Forecasts Based on Linear Projection
Suppose we now restrict ourselves to the class of forecasts in which $Y_{t+1}$ is a linear function of $\mathbf{x}_t$:
$$Y^*_{t+1|t} = \boldsymbol{\alpha}'\mathbf{x}_t.$$

Definition:
The forecast $\boldsymbol{\alpha}'\mathbf{x}_t$ is called the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$ if the forecast error $(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)$ is uncorrelated with $\mathbf{x}_t$:
$$E[(Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t)\mathbf{x}_t'] = \mathbf{0}'. \qquad (2)$$
Theorem:
The linear projection produces the smallest mean squared error among the class of linear forecasting rules.
Proof:
Let $\mathbf{g}'\mathbf{x}_t$ be any arbitrary linear forecasting function of $Y_{t+1}$. Then the MSE associated with $\mathbf{g}'\mathbf{x}_t$ would be
$$\begin{aligned}
E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 &= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t + \boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2 \\
&= E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 \\
&\quad + 2E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]\} \\
&\quad + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2.
\end{aligned}$$
Denote $\eta_{t+1} \equiv [Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t][\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]$; we have
$$\begin{aligned}
E(\eta_{t+1}) &= E\{[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]\mathbf{x}_t'[\boldsymbol{\alpha} - \mathbf{g}]\} \\
&= \left(E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]\mathbf{x}_t'\right)[\boldsymbol{\alpha} - \mathbf{g}] \\
&= \mathbf{0}'[\boldsymbol{\alpha} - \mathbf{g}] \\
&= 0.
\end{aligned}$$
Therefore we have
$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2 + E[\boldsymbol{\alpha}'\mathbf{x}_t - \mathbf{g}'\mathbf{x}_t]^2. \qquad (3)$$
The second term on the right-hand side of (3) cannot be made smaller than zero, and the first term does not depend on $\mathbf{g}'\mathbf{x}_t$. The function $\mathbf{g}'\mathbf{x}_t$ that makes the mean squared error (3) as small as possible is therefore the one that sets the second term in (3) to zero:
$$\mathbf{g}'\mathbf{x}_t = \boldsymbol{\alpha}'\mathbf{x}_t.$$
The MSE of this optimal forecast is
$$E[Y_{t+1} - \mathbf{g}'\mathbf{x}_t]^2 = E[Y_{t+1} - \boldsymbol{\alpha}'\mathbf{x}_t]^2.$$
Since $\boldsymbol{\alpha}'\mathbf{x}_t$ is the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$, we will use the notation
$$\hat{P}(Y_{t+1}|\mathbf{x}_t) = \boldsymbol{\alpha}'\mathbf{x}_t$$
to indicate the linear projection of $Y_{t+1}$ on $\mathbf{x}_t$. Notice that
$$MSE[\hat{P}(Y_{t+1}|\mathbf{x}_t)] \ge MSE[E(Y_{t+1}|\mathbf{x}_t)],$$
since the conditional expectation offers the best possible forecast.
For most applications a constant term will be included in the projection. We will use the symbol $\hat{E}$ to indicate a linear projection on a vector of random variables $\mathbf{x}_t$ along with a constant term:
$$\hat{E}(Y_{t+1}|\mathbf{x}_t) \equiv \hat{P}(Y_{t+1}|1, \mathbf{x}_t).$$
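As an illustration of condition (2) at work, the following sketch (a hypothetical data-generating process, assuming NumPy) estimates $\boldsymbol{\alpha}' = E(Y_{t+1}\mathbf{x}_t')[E(\mathbf{x}_t\mathbf{x}_t')]^{-1}$ from sample moments and verifies that the resulting forecast error is (numerically) uncorrelated with $\mathbf{x}_t$:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 50_000
x = rng.standard_normal((T, 2))       # x_t = [x_{1t}, x_{2t}]'
y = 0.5 * x[:, 0] - 0.3 * x[:, 1] + rng.standard_normal(T)   # Y_{t+1}

Exx = x.T @ x / T                     # sample analogue of E(x_t x_t')
Exy = y @ x / T                       # sample analogue of E(Y_{t+1} x_t')
alpha = np.linalg.solve(Exx, Exy)     # alpha = [E(x x')]^{-1} E(x Y)

err = y - x @ alpha                   # forecast error
print(alpha)                          # ~ [0.5, -0.3]
print(err @ x / T)                    # ~ [0, 0]: condition (2) holds
```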
2 Forecasts Based on an Infinite Number of Observations

Recall that a general stationary and invertible ARMA(p, q) process is written in the form
$$\phi(L)(Y_t - \mu) = \theta(L)\varepsilon_t, \qquad (4)$$
where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$, $\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$, and all the roots of $\phi(L) = 0$ and $\theta(L) = 0$ lie outside the unit circle.
2.1 Forecasting Based on Lagged $\varepsilon$'s, MA($\infty$) Form

Consider the MA($\infty$) form of (4):
$$Y_t = \mu + \psi(L)\varepsilon_t \qquad (5)$$
with $\varepsilon_t$ white noise and
$$\psi(L) = \theta(L)\phi^{-1}(L) = \sum_{j=0}^{\infty}\psi_j L^j, \qquad \psi_0 = 1, \qquad \sum_{j=0}^{\infty}|\psi_j| < \infty.$$
Suppose that we have an infinite number of observations on $\varepsilon$ through date $t$, that is $\{\varepsilon_t, \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots\}$, and further know the values of $\mu$ and $\{\psi_1, \psi_2, \ldots\}$. Say we want to forecast the value of $Y_{t+s}$, $s$ periods from now. Note that (5) implies
$$Y_{t+s} = \mu + \varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1} + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots.$$
The best linear forecast takes the form
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \psi_s\varepsilon_t + \psi_{s+1}\varepsilon_{t-1} + \cdots \qquad (6)$$
$$= [\mu,\ \psi_s,\ \psi_{s+1}, \ldots]\,[1,\ \varepsilon_t,\ \varepsilon_{t-1}, \ldots]' \qquad (7)$$
$$= \boldsymbol{\alpha}'\mathbf{x}_t. \qquad (8)$$
The error associated with this forecast is uncorrelated with $\mathbf{x}_t = [1, \varepsilon_t, \varepsilon_{t-1}, \ldots]'$, or
$$E\{\mathbf{x}_t[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]\} = E\left\{\begin{bmatrix}1 \\ \varepsilon_t \\ \varepsilon_{t-1} \\ \vdots\end{bmatrix}(\varepsilon_{t+s} + \psi_1\varepsilon_{t+s-1} + \cdots + \psi_{s-1}\varepsilon_{t+1})\right\} = \mathbf{0}.$$
The mean squared error associated with this forecast is
$$E[Y_{t+s} - \hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots)]^2 = (1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{s-1}^2)\sigma^2.$$
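A small numerical sketch may help fix ideas (the ARMA(1,1) specification and its parameters are illustrative, not from the text). For an ARMA(1,1), the MA($\infty$) weights satisfy $\psi_0 = 1$, $\psi_1 = \phi + \theta$, and $\psi_j = \phi\psi_{j-1}$ for $j \ge 2$; given the $\varepsilon$ history, a truncated version of (6) and the MSE formula above can be computed directly:

```python
import numpy as np

# ARMA(1,1), illustrative: Y_t - mu = phi (Y_{t-1} - mu) + eps_t + theta eps_{t-1}
phi, theta, mu, sigma2 = 0.7, 0.4, 2.0, 1.0
J = 200                                  # truncation point for the MA(inf) sum

psi = np.empty(J)                        # psi_0 = 1, psi_1 = phi + theta,
psi[0] = 1.0                             # psi_j = phi * psi_{j-1} for j >= 2
psi[1] = phi + theta
for j in range(2, J):
    psi[j] = phi * psi[j - 1]

rng = np.random.default_rng(2)
eps = rng.normal(scale=np.sqrt(sigma2), size=J)   # eps_t, eps_{t-1}, ..., most recent first

s = 3
forecast = mu + psi[s:] @ eps[: J - s]   # equation (6), truncated at J lags
mse = sigma2 * np.sum(psi[:s] ** 2)      # (1 + psi_1^2 + ... + psi_{s-1}^2) sigma^2
print(forecast, mse)
```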
Example:
For an MA(q) process, the optimal linear forecast is
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \begin{cases} \mu + \theta_s\varepsilon_t + \theta_{s+1}\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q+s} & \text{for } s = 1, 2, \ldots, q \\ \mu & \text{for } s = q+1, q+2, \ldots \end{cases}$$
The MSE is
$$\begin{cases} \sigma^2 & \text{for } s = 1 \\ (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_{s-1}^2)\sigma^2 & \text{for } s = 2, 3, \ldots, q \\ (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2 & \text{for } s = q+1, q+2, \ldots \end{cases}$$
The MSE increases with the forecast horizon $s$ up until $s = q$. If we try to forecast an MA(q) farther than $q$ periods into the future, the forecast is simply the unconditional mean of the series ($E(Y_{t+s}) = \mu$) and the MSE is the unconditional variance of the series ($Var(Y_{t+s}) = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma^2$).
A compact lag operator expression for the forecast in (6) is sometimes used. Rewrite $Y_{t+s}$ as in (5) as
$$Y_{t+s} = \mu + \psi(L)\varepsilon_{t+s} = \mu + \psi(L)L^{-s}\varepsilon_t.$$
Consider the polynomial obtained when $\psi(L)$ is divided by $L^s$:
$$\frac{\psi(L)}{L^s} = L^{-s} + \psi_1 L^{1-s} + \psi_2 L^{2-s} + \cdots + \psi_{s-1}L^{-1} + \psi_s L^0 + \psi_{s+1}L^1 + \psi_{s+2}L^2 + \cdots.$$
The annihilation operator replaces negative powers of $L$ by zero; for example,
$$\left[\frac{\psi(L)}{L^s}\right]_+ = \psi_s L^0 + \psi_{s+1}L^1 + \psi_{s+2}L^2 + \cdots.$$
Therefore the optimal forecast (6) can be written in lag operator notation as
$$\hat{E}(Y_{t+s}|\varepsilon_t, \varepsilon_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \varepsilon_t. \qquad (9)$$
2.2 Forecasting Based on Lagged $Y$'s

The previous forecasts were based on the assumption that $\varepsilon_t$ is observed directly. In the usual forecasting situation, we actually have observations on lagged $Y$'s, not lagged $\varepsilon$'s. Suppose that the general ARMA(p, q) process has an AR($\infty$) representation given by
$$\eta(L)(Y_t - \mu) = \varepsilon_t \qquad (10)$$
with $\varepsilon_t$ white noise and
$$\eta(L) = \theta^{-1}(L)\phi(L) = \sum_{j=0}^{\infty}\eta_j L^j = \psi^{-1}(L), \qquad \eta_0 = 1, \qquad \sum_{j=0}^{\infty}|\eta_j| < \infty.$$
Under these conditions, we can substitute (10) into (9) to obtain the forecast of $Y_{t+s}$ as a function of lagged $Y$'s:
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \eta(L)(Y_t - \mu) \qquad (11)$$
or
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \left[\frac{\psi(L)}{L^s}\right]_+ \frac{1}{\psi(L)}(Y_t - \mu). \qquad (12)$$
Equation (12) is known as the Wiener-Kolmogorov prediction formula.
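The Wiener-Kolmogorov formula can be implemented numerically with truncated power-series arithmetic. The sketch below (an illustration under assumed ARMA(1,1) parameters, using NumPy) computes $\psi(L) = \theta(L)\phi^{-1}(L)$ and $\eta(L) = \psi^{-1}(L)$ by long division, applies the annihilation operator by simply dropping the first $s$ coefficients, and forms the forecast weights on $(Y_t - \mu), (Y_{t-1} - \mu), \ldots$ as in (12):

```python
import numpy as np

def series_div(num, den, J):
    """First J coefficients of the power series num(L)/den(L); den[0] must be 1."""
    num = np.concatenate([np.asarray(num, float), np.zeros(J)])[:J]
    den = np.asarray(den, float)
    c = np.zeros(J)
    for j in range(J):
        c[j] = num[j] - sum(den[k] * c[j - k] for k in range(1, min(j, len(den) - 1) + 1))
    return c

phi_poly = [1.0, -0.7]                 # phi(L) = 1 - 0.7L  (AR part, assumed)
theta_poly = [1.0, 0.4]                # theta(L) = 1 + 0.4L (MA part, assumed)
J, s, mu = 200, 3, 2.0                 # truncation, horizon, mean

psi = series_div(theta_poly, phi_poly, J + s)   # psi(L) = theta(L)/phi(L)
eta = series_div(phi_poly, theta_poly, J)       # eta(L) = 1/psi(L)
psi_plus = psi[s:]                              # annihilation operator [psi(L)/L^s]_+
weights = np.convolve(psi_plus, eta)[:J]        # weights on (Y_t - mu), (Y_{t-1} - mu), ...

y = mu + np.random.default_rng(3).standard_normal(J)  # stand-in for the observed history
print(mu + weights @ (y - mu))                        # equation (12), truncated at J lags
```

As a check on the logic, setting the MA part to $\theta(L) = 1$ collapses `weights` to $(\phi^s, 0, 0, \ldots)$, the AR(1) result derived next.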
2.2.1 Forecasting an AR(1) Process

1. Using the Wiener-Kolmogorov prediction formula:
For the covariance stationary AR(1) process, we have
$$\psi(L) = \frac{1}{1 - \phi L} = 1 + \phi L + \phi^2 L^2 + \phi^3 L^3 + \cdots$$
and
$$\left[\frac{\psi(L)}{L^s}\right]_+ = \phi^s + \phi^{s+1}L^1 + \phi^{s+2}L^2 + \cdots = \frac{\phi^s}{1 - \phi L}.$$
The optimal linear $s$-period-ahead forecast for a stationary AR(1) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \frac{\phi^s}{1 - \phi L}(1 - \phi L)(Y_t - \mu) = \mu + \phi^s(Y_t - \mu).$$
The forecast decays geometrically from $(Y_t - \mu)$ toward $\mu$ as the forecast horizon $s$ increases.
2. Using recursive substitution and the lag operator:
The AR(1) process can be represented as (using equation (1.1.9) on p. 3 of Hamilton)
$$Y_{t+s} - \mu = \phi^s(Y_t - \mu) + \phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$
Setting the forecasts of the future innovations $\varepsilon_{t+h}$, $h = 1, 2, \ldots, s$, to zero, the optimal linear $s$-period-ahead forecast for a stationary AR(1) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + \phi^s(Y_t - \mu),$$
whose forecast MSE is
$$E(\phi^{s-1}\varepsilon_{t+1} + \phi^{s-2}\varepsilon_{t+2} + \cdots + \phi\varepsilon_{t+s-1} + \varepsilon_{t+s})^2 = (1 + \phi^2 + \phi^4 + \cdots + \phi^{2(s-1)})\sigma^2.$$
Notice that this grows with $s$ and asymptotically approaches $\sigma^2/(1 - \phi^2)$, the unconditional variance of $Y$.
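A minimal sketch of these AR(1) formulas (illustrative numbers, assuming NumPy) shows the forecast decaying toward $\mu$ and the MSE growing toward the unconditional variance:

```python
import numpy as np

phi, mu, sigma2, y_t = 0.8, 5.0, 1.0, 7.0                   # illustrative values
for s in (1, 2, 5, 20):
    forecast = mu + phi**s * (y_t - mu)                     # mu + phi^s (Y_t - mu)
    mse = sigma2 * np.sum(phi ** (2 * np.arange(s)))        # (1 + phi^2 + ... + phi^(2(s-1))) sigma^2
    print(s, round(forecast, 4), round(mse, 4))
print("unconditional variance:", sigma2 / (1 - phi**2))     # limit of the MSE as s grows
```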
2.2.2 Forecasting an AR(p) Process

1. Using recursive substitution and the lag operator:
Following (11) in Chapter 13, the value of $Y$ at $t+s$ of an AR(p) process can be represented as
$$\begin{aligned}
Y_{t+s} - \mu = {} & f_{11}^{(s)}(Y_t - \mu) + f_{12}^{(s)}(Y_{t-1} - \mu) + \cdots + f_{1p}^{(s)}(Y_{t-p+1} - \mu) \\
& + f_{11}^{(s-1)}\varepsilon_{t+1} + f_{11}^{(s-2)}\varepsilon_{t+2} + \cdots + f_{11}^{(1)}\varepsilon_{t+s-1} + \varepsilon_{t+s},
\end{aligned}$$
where $f_{1j}^{(s)}$ denotes the $(1, j)$ element of $\mathbf{F}^s$ and
$$\mathbf{F} \equiv \begin{bmatrix}
\phi_1 & \phi_2 & \phi_3 & \cdots & \phi_{p-1} & \phi_p \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{bmatrix}.$$
Setting the forecasts of the future innovations $\varepsilon_{t+h}$, $h = 1, 2, \ldots, s$, to zero, the optimal linear $s$-period-ahead forecast for a stationary AR(p) process is therefore
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = \mu + f_{11}^{(s)}(Y_t - \mu) + f_{12}^{(s)}(Y_{t-1} - \mu) + \cdots + f_{1p}^{(s)}(Y_{t-p+1} - \mu).$$
The associated forecast error is
$$Y_{t+s} - \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) = f_{11}^{(s-1)}\varepsilon_{t+1} + f_{11}^{(s-2)}\varepsilon_{t+2} + \cdots + f_{11}^{(1)}\varepsilon_{t+s-1} + \varepsilon_{t+s}.$$
It is important to note that to forecast an AR(p) process, the optimal $s$-period-ahead linear forecast based on an infinite number of observations $\{Y_t, Y_{t-1}, \ldots\}$ in fact makes use of only the $p$ most recent values $\{Y_t, Y_{t-1}, \ldots, Y_{t-p+1}\}$.
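The companion-matrix representation translates directly into code. The sketch below (an assumed stationary AR(2) with $\phi_1 = 1.1$, $\phi_2 = -0.3$; NumPy) builds $\mathbf{F}$, raises it to the $s$-th power, and reads the forecast weights $f_{11}^{(s)}, \ldots, f_{1p}^{(s)}$ off its first row:

```python
import numpy as np

phis = np.array([1.1, -0.3])       # assumed AR(2) coefficients phi_1, phi_2 (stationary)
p, mu, s = len(phis), 2.0, 4

F = np.zeros((p, p))               # companion matrix F
F[0, :] = phis
F[1:, :-1] = np.eye(p - 1)

y_recent = np.array([3.0, 2.5])    # [Y_t, Y_{t-1}]
Fs = np.linalg.matrix_power(F, s)  # F^s; its first row holds f_11^(s), ..., f_1p^(s)
forecast = mu + Fs[0, :] @ (y_recent - mu)
print(forecast)
```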
2.2.3 Forecasting an MA(1) Process

Consider an invertible MA(1) process
$$Y_t - \mu = (1 + \theta L)\varepsilon_t$$
with $|\theta| < 1$.

1. Applying the Wiener-Kolmogorov formula we have
$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta L}{L^s}\right]_+ \frac{1}{1 + \theta L}(Y_t - \mu).$$
To forecast an MA(1) process one period ahead ($s = 1$),
$$\left[\frac{1 + \theta L}{L^1}\right]_+ = \theta,$$
and so
$$\hat{Y}_{t+1|t} = \mu + \frac{\theta}{1 + \theta L}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots.$$
To forecast an MA(1) process for $s = 2, 3, \ldots$ periods into the future,
$$\left[\frac{1 + \theta L}{L^s}\right]_+ = 0 \quad \text{for } s = 2, 3, \ldots,$$
and so
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots.$$

2. From recursive substitution:
The MA(1) process at period $t+1$ is
$$Y_{t+1} = \mu + \varepsilon_{t+1} + \theta\varepsilon_t.$$
At period $t$ the forecast of every future innovation is zero: $E(\varepsilon_{t+s}) = 0$, $s = 1, 2, \ldots$. The optimal linear one-period-ahead forecast for a stationary MA(1) process is therefore
$$\hat{Y}_{t+1|t} = \mu + \theta\varepsilon_t = \mu + \theta(1 + \theta L)^{-1}(Y_t - \mu) = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots.$$
The MA(1) process at period $t+s$ is
$$Y_{t+s} = \mu + \varepsilon_{t+s} + \theta\varepsilon_{t+s-1},$$
so the forecast of an MA(1) process for $s = 2, 3, \ldots$ periods into the future is simply
$$\hat{Y}_{t+s|t} = \mu \quad \text{for } s = 2, 3, \ldots.$$
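In practice the recursive-substitution forecast is computed by first backing out the innovations from the observed history: with the presample $\varepsilon$ set to zero, $\varepsilon_t = (Y_t - \mu) - \theta\varepsilon_{t-1}$. A sketch (illustrative numbers, assuming NumPy):

```python
import numpy as np

theta, mu = 0.5, 1.0                            # illustrative MA(1) parameters
rng = np.random.default_rng(4)
eps_true = rng.standard_normal(500)
y = mu + eps_true[1:] + theta * eps_true[:-1]   # a simulated MA(1) sample path

eps = 0.0                                       # presample epsilon set to zero
for obs in y:                                   # invert: eps_t = (Y_t - mu) - theta * eps_{t-1}
    eps = (obs - mu) - theta * eps

print(mu + theta * eps)                         # one-step forecast; for s >= 2 it is just mu
```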
2.2.4 Forecasting an MA(q) Process

For an invertible MA(q) process
$$Y_t - \mu = (1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q)\varepsilon_t,$$
the forecast becomes
$$\hat{Y}_{t+s|t} = \mu + \left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ \frac{1}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).$$
Now
$$\left[\frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{L^s}\right]_+ = \begin{cases} \theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s} & \text{for } s = 1, 2, \ldots, q \\ 0 & \text{for } s = q+1, q+2, \ldots \end{cases}$$
Thus for horizons of $s = 1, 2, \ldots, q$, the forecast is given by
$$\hat{Y}_{t+s|t} = \mu + (\theta_s + \theta_{s+1}L + \theta_{s+2}L^2 + \cdots + \theta_q L^{q-s})\frac{1}{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}(Y_t - \mu).$$
A forecast farther than $q$ periods into the future is simply the unconditional mean $\mu$.

It is important to note that to forecast an MA(q) process, the optimal $s$-period-ahead linear forecast would in principle require all of the historical values of $Y$, $\{Y_t, Y_{t-1}, \ldots\}$.
3 Forecasts Based on a Finite Number of Observations

This section continues to assume that population parameters are known with certainty, but develops forecasts based on a finite number $m$ of observations, $\{Y_t, Y_{t-1}, \ldots, Y_{t-m+1}\}$.
3.1 Approximations to the Optimal Forecast

In reality we do not have an infinite number of observations for forecasting. One approach to forecasting based on a finite number of observations is to act as if the presample $Y$'s were all equal to the mean value $\mu$ (or, equivalently, the presample $\varepsilon$'s to 0). The idea is thus to use the approximation
$$\hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots) \cong \hat{E}(Y_{t+s}|Y_t, Y_{t-1}, \ldots, Y_{t-m+1}, Y_{t-m} = \mu, Y_{t-m-1} = \mu, \ldots).$$
For example, in forecasting an MA(1) process the one-period-ahead forecast
$$\hat{Y}_{t+1|t} = \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots$$
is approximated by
$$\hat{Y}_{t+1|t} \cong \mu + \theta(Y_t - \mu) - \theta^2(Y_{t-1} - \mu) + \theta^3(Y_{t-2} - \mu) - \cdots + (-1)^{m-1}\theta^m(Y_{t-m+1} - \mu).$$
For $m$ large and $|\theta|$ small (so that the truncated terms multiply the deviations of $Y$ from $\mu$ by smaller and smaller numbers), this clearly gives an excellent approximation. For $|\theta|$ closer to unity, the approximation may be poor.
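The quality of this truncation is easy to examine numerically. The sketch below (assumed numbers; the deviations $(Y_{t-j} - \mu)$ are simulated stand-ins, using NumPy) compares the $m$-term truncated sum against an effectively exact 1000-term sum for a small and a large $|\theta|$:

```python
import numpy as np

def truncated_forecast(y_dev, theta, m):
    """Deviation-form forecast: sum_{j=1}^{m} (-1)^(j-1) theta^j (Y_{t-j+1} - mu)."""
    j = np.arange(1, m + 1)
    return np.sum((-1.0) ** (j - 1) * theta**j * y_dev[:m])

rng = np.random.default_rng(5)
y_dev = rng.standard_normal(1000)        # stand-ins for (Y_t - mu), (Y_{t-1} - mu), ...
for theta in (0.3, 0.95):
    exact = truncated_forecast(y_dev, theta, 1000)   # effectively the infinite sum
    for m in (5, 20):
        err = abs(truncated_forecast(y_dev, theta, m) - exact)
        print(theta, m, err)             # tiny for theta = 0.3, sizable for theta = 0.95
```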
3.2 Exact Finite-Sample Forecast

An alternative approach is to calculate the exact projection of $Y_{t+1}$ on its $m$ most recent values. Let
$$\mathbf{x}_t = \begin{bmatrix} 1 \\ Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-m+1} \end{bmatrix}.$$
We thus seek a linear forecast of the form
$$\hat{E}(Y_{t+1}|\mathbf{x}_t) = \boldsymbol{\alpha}^{(m)\prime}\mathbf{x}_t = \alpha_0^{(m)} + \alpha_1^{(m)}Y_t + \alpha_2^{(m)}Y_{t-1} + \cdots + \alpha_m^{(m)}Y_{t-m+1}. \qquad (13)$$
The coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $m$ most recent values of $Y$ is denoted $\alpha_1^{(m)}$ in (13). This will in general be different from the coefficient relating $Y_{t+1}$ to $Y_t$ in a projection of $Y_{t+1}$ on the $(m+1)$ most recent values of $Y$; the latter coefficient would be denoted $\alpha_1^{(m+1)}$.

From (2) we know that
$$E(Y_{t+1}\mathbf{x}_t') = \boldsymbol{\alpha}'E(\mathbf{x}_t\mathbf{x}_t'),$$
or
$$\boldsymbol{\alpha}' = E(Y_{t+1}\mathbf{x}_t')[E(\mathbf{x}_t\mathbf{x}_t')]^{-1}, \qquad (14)$$
assuming that $E(\mathbf{x}_t\mathbf{x}_t')$ is a nonsingular matrix.

Since for a covariance stationary process $Y_t$, $\gamma_j = E(Y_{t+j} - \mu)(Y_t - \mu) = E(Y_{t+j}Y_t) - \mu^2$, here
$$E(Y_{t+1}\mathbf{x}_t') = E(Y_{t+1}[1 \ \ Y_t \ \ Y_{t-1} \ \cdots \ Y_{t-m+1}]) = [\mu \ \ (\gamma_1 + \mu^2) \ \ (\gamma_2 + \mu^2) \ \cdots \ (\gamma_m + \mu^2)]$$
and
$$E(\mathbf{x}_t\mathbf{x}_t') = E\left\{\begin{bmatrix} 1 \\ Y_t \\ Y_{t-1} \\ \vdots \\ Y_{t-m+1} \end{bmatrix}\begin{bmatrix} 1 & Y_t & Y_{t-1} & \cdots & Y_{t-m+1} \end{bmatrix}\right\}
= \begin{bmatrix}
1 & \mu & \mu & \cdots & \mu \\
\mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\
\mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2
\end{bmatrix}.$$
Then
$$\boldsymbol{\alpha}^{(m)\prime} = [\mu \ \ (\gamma_1 + \mu^2) \ \ (\gamma_2 + \mu^2) \ \cdots \ (\gamma_m + \mu^2)]
\begin{bmatrix}
1 & \mu & \mu & \cdots & \mu \\
\mu & \gamma_0 + \mu^2 & \gamma_1 + \mu^2 & \cdots & \gamma_{m-1} + \mu^2 \\
\mu & \gamma_1 + \mu^2 & \gamma_0 + \mu^2 & \cdots & \gamma_{m-2} + \mu^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu & \gamma_{m-1} + \mu^2 & \gamma_{m-2} + \mu^2 & \cdots & \gamma_0 + \mu^2
\end{bmatrix}^{-1}.$$
When a constant term is included in $\mathbf{x}_t$, it is more convenient to express variables in deviations from the mean. We can then calculate the projection of $(Y_{t+1} - \mu)$ on $\mathbf{x}_t = [(Y_t - \mu), (Y_{t-1} - \mu), \ldots, (Y_{t-m+1} - \mu)]'$:
$$\hat{Y}_{t+1|t} = \mu + \alpha_1^{(m)}(Y_t - \mu) + \alpha_2^{(m)}(Y_{t-1} - \mu) + \cdots + \alpha_m^{(m)}(Y_{t-m+1} - \mu).$$
For this definition of $\mathbf{x}_t$ the coefficients can be calculated from (14) to be
$$\boldsymbol{\alpha}^{(m)} = \begin{bmatrix} \alpha_1^{(m)} \\ \alpha_2^{(m)} \\ \vdots \\ \alpha_m^{(m)} \end{bmatrix}
= \begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m \end{bmatrix}. \qquad (15)$$
To generate an $s$-period-ahead forecast $\hat{Y}_{t+s|t}$ we would use
$$\hat{Y}_{t+s|t} = \mu + \alpha_1^{(m,s)}(Y_t - \mu) + \alpha_2^{(m,s)}(Y_{t-1} - \mu) + \cdots + \alpha_m^{(m,s)}(Y_{t-m+1} - \mu),$$
where
$$\begin{bmatrix} \alpha_1^{(m,s)} \\ \alpha_2^{(m,s)} \\ \vdots \\ \alpha_m^{(m,s)} \end{bmatrix}
= \begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix} \gamma_s \\ \gamma_{s+1} \\ \vdots \\ \gamma_{s+m-1} \end{bmatrix}.$$
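The last display is just an $m \times m$ symmetric Toeplitz system in the autocovariances, so it can be solved directly. The sketch below (assuming NumPy/SciPy; the $\gamma_j$ are those of an AR(1), for which the known answer is $\boldsymbol{\alpha}^{(m,s)} = (\phi^s, 0, \ldots, 0)'$) computes $\boldsymbol{\alpha}^{(m,s)}$ with `scipy.linalg.solve_toeplitz`:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

phi, sigma2, m, s = 0.8, 1.0, 4, 2
# AR(1) autocovariances gamma_j = sigma^2 phi^j / (1 - phi^2), j = 0, ..., m+s-1 (assumed process)
gamma = sigma2 * phi ** np.arange(m + s) / (1 - phi**2)

# Solve the symmetric Toeplitz system: Gamma * alpha = [gamma_s, ..., gamma_{s+m-1}]'
alpha = solve_toeplitz(gamma[:m], gamma[s : s + m])
print(alpha)                             # ~ [phi^s, 0, 0, 0]
```

Only the first weight comes out nonzero, reproducing the observation of Section 2.2.2 that an AR(p) forecast uses only the $p$ most recent observations.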