Chapter 5
Univariate time series modelling and forecasting
1 Introduction
• Univariate time-series models
  – A class of models (specifications) that use only past values of the variable itself, together with current and past values of an error term, for modelling and forecasting.
  – Unlike structural models, they generally do not rely on economic or financial theory; they are used to describe the empirical correlation features of the observed data.
• ARIMA (AutoRegressive Integrated Moving Average) models are an important class of time-series models
  – Box and Jenkins (1976)
• Time-series models are useful when structural models are inapplicable
  – e.g. when the factors driving the dependent variable include unobservable components, or when the explanatory variables are observed at a lower frequency; structural models are then often unsuitable for forecasting.
• This chapter addresses two main questions
  – Given a time-series model with known parameters, what are its dynamic properties?
  – Given a set of data with particular characteristics, what is a suitable model to describe it?
2 Some Notation and Concepts
• A Strictly Stationary Process
  A strictly stationary process is one where, for any t1, t2, …, tn ∈ Z, any m ∈ Z, and n = 1, 2, …,
    P{y_{t1} ≤ b1, …, y_{tn} ≤ bn} = P{y_{t1+m} ≤ b1, …, y_{tn+m} ≤ bn}
• A Weakly Stationary Process
  If a series satisfies the next three equations, it is said to be weakly or covariance stationary:
  1. E(y_t) = μ,  t = 1, 2, …
  2. E[(y_t − μ)(y_t − μ)] = σ² < ∞
  3. E[(y_{t1} − μ)(y_{t2} − μ)] = γ_{t2−t1}  for all t1, t2
• So if the process is covariance stationary, all the variances are the same and all the covariances depend only on the difference between t1 and t2. The moments
    γ_s = E[(y_t − E(y_t))(y_{t−s} − E(y_{t−s}))],  s = 0, 1, 2, …
  are known as the covariance function.
• The covariances, γ_s, are known as autocovariances.
• However, the value of the autocovariances depends on the units of measurement of y_t.
• It is thus more convenient to use the autocorrelations, which are the autocovariances normalised by dividing by the variance:
    ρ_s = γ_s / γ_0,  s = 0, 1, 2, …
  If we plot ρ_s against s = 0, 1, 2, …, then we obtain the autocorrelation function (acf) or correlogram.
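The sample versions of these autocovariances and autocorrelations translate directly into code. A minimal sketch in pure Python (the function name is illustrative, not from any particular library):

```python
def acf(y, max_lag):
    """Sample autocorrelations rho_0, ..., rho_max_lag of a series y.

    gamma_s is the sample autocovariance at lag s; rho_s = gamma_s / gamma_0.
    """
    T = len(y)
    mean = sum(y) / T

    def gamma(s):
        # sample autocovariance at lag s (divisor T, a common convention)
        return sum((y[t] - mean) * (y[t - s] - mean) for t in range(s, T)) / T

    g0 = gamma(0)
    return [gamma(s) / g0 for s in range(max_lag + 1)]

print(acf([1, 2, 3, 4], 1))  # rho_0 = 1.0, rho_1 = 0.25
```

For the toy series [1, 2, 3, 4] the lag-1 sample autocorrelation works out by hand to 0.3125/1.25 = 0.25, which the function reproduces.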
A White Noise Process
• A white noise process is one with no discernible structure. It is defined by
    E(y_t) = μ
    Var(y_t) = σ²
    γ_{t−r} = σ² if t = r, and 0 otherwise
• Thus the autocorrelation function will be zero apart from a single peak of 1 at s = 0.
• If y_t is further assumed to be normally distributed, then the sample autocorrelation coefficients ρ̂_s are approximately N(0, 1/T).
• We can use this to do significance tests for the autocorrelation coefficients by constructing a confidence interval.
• A 95% confidence interval would be given by ±1.96 × 1/√T.
• If the sample autocorrelation coefficient, ρ̂_s, falls outside this region for any value of s, then we reject the null hypothesis that the true value of the coefficient at lag s is zero.
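One way to see the ±1.96/√T band at work is to simulate a white noise series and check how many sample autocorrelations stay inside the band (roughly 95% of them should). A self-contained sketch:

```python
import math
import random

random.seed(42)
T = 10_000
y = [random.gauss(0.0, 1.0) for _ in range(T)]  # Gaussian white noise

mean = sum(y) / T
g0 = sum((v - mean) ** 2 for v in y) / T

def rho(s):
    # sample autocorrelation at lag s
    return sum((y[t] - mean) * (y[t - s] - mean) for t in range(s, T)) / T / g0

band = 1.96 / math.sqrt(T)  # 95% confidence band under H0: rho_s = 0
print(round(band, 4))       # 0.0196 for T = 10,000
inside = [abs(rho(s)) < band for s in range(1, 11)]
print(sum(inside), "of 10 lags inside the band")
```

Since the band is a 95% interval, an occasional lag falling outside it is expected even for genuine white noise; that is exactly the logic of the significance test on the slide.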
Joint Hypothesis Tests
• We can also test the joint hypothesis that all m of the ρ_k correlation coefficients are simultaneously equal to zero using the Q-statistic developed by Box and Pierce:
    Q = T Σ_{k=1}^{m} ρ̂_k²
  where T = sample size, m = maximum lag length.
• The Q-statistic is asymptotically distributed as a χ²_m.
• However, the Box–Pierce test has poor small sample properties, so a variant has been developed, called the Ljung–Box statistic:
    Q* = T(T + 2) Σ_{k=1}^{m} ρ̂_k² / (T − k)  ~ χ²_m
• This statistic is very useful as a portmanteau (general) test of linear dependence in time series.
An ACF Example (p234)
• Question:
  Suppose that we had estimated the first 5 autocorrelation coefficients using a series of length 100 observations, and found them to be (from lag 1 to 5): 0.207, −0.013, 0.086, 0.005, −0.022.
  Test each of the individual coefficients for significance, and use both the Box–Pierce and Ljung–Box tests to establish whether they are jointly significant.
• Solution:
  A coefficient would be significant if it lay outside (−0.196, +0.196) at the 5% level, so only the first autocorrelation coefficient is significant.
  Q = 5.09 and Q* = 5.26
  Compared with a tabulated χ²(5) = 11.1 at the 5% level, the 5 coefficients are jointly insignificant.
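The numbers in this example can be reproduced directly from the Box–Pierce and Ljung–Box formulae:

```python
T = 100
rho = [0.207, -0.013, 0.086, 0.005, -0.022]  # first 5 sample autocorrelations

# Box-Pierce: Q = T * sum(rho_k^2)
Q = T * sum(r ** 2 for r in rho)

# Ljung-Box: Q* = T*(T+2) * sum(rho_k^2 / (T-k))
Q_star = T * (T + 2) * sum(r ** 2 / (T - k) for k, r in enumerate(rho, start=1))

print(round(Q, 2), round(Q_star, 2))  # 5.09 5.26 -- both below chi2(5) = 11.1 at 5%
```

Note that Q* > Q, as expected: the Ljung–Box small-sample correction inflates each term by the factor (T + 2)/(T − k) > 1.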
3 Moving Average Processes
• Let u_t (t = 1, 2, 3, …) be a sequence of independently and identically distributed (iid) random variables with E(u_t) = 0 and Var(u_t) = σ². Then
    y_t = μ + u_t + θ_1 u_{t−1} + θ_2 u_{t−2} + … + θ_q u_{t−q}
  is a qth order moving average model, MA(q).
• Or, using the lag operator notation L y_t = y_{t−1}, L^i y_t = y_{t−i}:
    y_t = μ + Σ_{i=1}^{q} θ_i L^i u_t + u_t = μ + θ(L) u_t
  where θ(L) = 1 + θ_1 L + θ_2 L² + … + θ_q L^q.
• The constant term can usually be dropped from the equation without loss of generality.
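The MA(q) definition can be turned into a few lines of code. A minimal sketch that applies the moving-average filter to a given disturbance sequence (treating pre-sample disturbances as zero, an illustrative convention):

```python
def ma_filter(u, thetas, mu=0.0):
    """y_t = mu + u_t + theta_1*u_{t-1} + ... + theta_q*u_{t-q}."""
    y = []
    for t in range(len(u)):
        y_t = mu + u[t]
        for i, theta in enumerate(thetas, start=1):
            if t - i >= 0:  # pre-sample u's treated as zero
                y_t += theta * u[t - i]
        y.append(y_t)
    return y

# MA(1) with theta_1 = 0.5 applied to u = [1, 2, 3]:
print(ma_filter([1, 2, 3], [0.5]))  # [1.0, 2.5, 4.0]
```

In practice u would be a simulated white noise sequence; a short deterministic u makes the filtering step easy to verify by hand.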
Properties of a Moving Average Process
• Its properties are
    E(y_t) = μ
    Var(y_t) = γ_0 = (1 + θ_1² + θ_2² + … + θ_q²) σ²
  Covariances:
    γ_s = (θ_s + θ_{s+1}θ_1 + θ_{s+2}θ_2 + … + θ_q θ_{q−s}) σ²  for s = 1, 2, …, q
    γ_s = 0  for s > q
• The autocorrelation function follows by dividing the autocovariances by γ_0.
Example of an MA Process
Consider the following MA(2) process:
    X_t = u_t + θ_1 u_{t−1} + θ_2 u_{t−2}
where u_t is a zero mean white noise process with variance σ².
(i) Calculate the mean and variance of X_t.
(ii) Derive the autocorrelation function for this process (i.e. express the autocorrelations ρ_1, ρ_2, … as functions of the parameters θ_1 and θ_2).
(iii) If θ_1 = −0.5 and θ_2 = 0.25, sketch the acf of X_t.
Solution
(i) If E(u_t) = 0, then E(u_{t−i}) = 0 for all i. So
    E(X_t) = E(u_t + θ_1 u_{t−1} + θ_2 u_{t−2}) = E(u_t) + θ_1 E(u_{t−1}) + θ_2 E(u_{t−2}) = 0
    Var(X_t) = E[X_t − E(X_t)][X_t − E(X_t)]
             = E[(X_t)(X_t)]
             = E[(u_t + θ_1 u_{t−1} + θ_2 u_{t−2})(u_t + θ_1 u_{t−1} + θ_2 u_{t−2})]
             = E[u_t² + θ_1² u_{t−1}² + θ_2² u_{t−2}² + cross-products]
But E[cross-products] = 0, since Cov(u_t, u_{t−s}) = 0 for s ≠ 0. So
    Var(X_t) = γ_0 = σ² + θ_1² σ² + θ_2² σ² = (1 + θ_1² + θ_2²) σ²
Solution (cont'd)
(ii) The acf of X_t:
    γ_1 = E[X_t − E(X_t)][X_{t−1} − E(X_{t−1})]
        = E[X_t][X_{t−1}]
        = E[(u_t + θ_1 u_{t−1} + θ_2 u_{t−2})(u_{t−1} + θ_1 u_{t−2} + θ_2 u_{t−3})]
        = E[θ_1 u_{t−1}² + θ_1 θ_2 u_{t−2}²]
        = θ_1 σ² + θ_1 θ_2 σ²
        = (θ_1 + θ_1 θ_2) σ²
    γ_2 = E[X_t − E(X_t)][X_{t−2} − E(X_{t−2})]
        = E[X_t][X_{t−2}]
        = E[(u_t + θ_1 u_{t−1} + θ_2 u_{t−2})(u_{t−2} + θ_1 u_{t−3} + θ_2 u_{t−4})]
        = E[θ_2 u_{t−2}²]
        = θ_2 σ²
Solution (cont'd)
    γ_3 = E[X_t][X_{t−3}]
        = E[(u_t + θ_1 u_{t−1} + θ_2 u_{t−2})(u_{t−3} + θ_1 u_{t−4} + θ_2 u_{t−5})]
        = 0
So γ_s = 0 for s > 2.
Now calculate the autocorrelations:
    ρ_0 = γ_0 / γ_0 = 1
    ρ_1 = γ_1 / γ_0 = (θ_1 + θ_1 θ_2) σ² / [(1 + θ_1² + θ_2²) σ²] = (θ_1 + θ_1 θ_2) / (1 + θ_1² + θ_2²)
    ρ_2 = γ_2 / γ_0 = θ_2 σ² / [(1 + θ_1² + θ_2²) σ²] = θ_2 / (1 + θ_1² + θ_2²)
    ρ_s = γ_s / γ_0 = 0 for s > 2
ACF Plot
(iii) For θ_1 = −0.5 and θ_2 = 0.25, substituting these into the formulae above gives ρ_1 = −0.476, ρ_2 = 0.190.
Thus the acf plot will appear as follows:
[Figure: acf of X_t against lag s = 0, 1, …, 6, with a spike of 1 at s = 0, −0.476 at s = 1, 0.190 at s = 2, and zero thereafter.]
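Substituting the parameter values into the MA(2) autocorrelation formulae can be checked in a couple of lines:

```python
theta1, theta2 = -0.5, 0.25
denom = 1 + theta1 ** 2 + theta2 ** 2      # 1 + theta_1^2 + theta_2^2 = 1.3125
rho1 = (theta1 + theta1 * theta2) / denom  # (theta_1 + theta_1*theta_2) / denom
rho2 = theta2 / denom                      # theta_2 / denom
print(round(rho1, 3), round(rho2, 3))      # -0.476 0.19
```

This matches the values quoted on the slide, and ρ_s = 0 for s > 2 follows from the derivation above.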
4 Autoregressive Processes
• An autoregressive model of order p, AR(p), can be expressed as
    y_t = μ + φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + u_t
• Or, using the lag operator notation L y_t = y_{t−1}, L^i y_t = y_{t−i}:
    y_t = μ + Σ_{i=1}^{p} φ_i L^i y_t + u_t
• or
    φ(L) y_t = μ + u_t,  where φ(L) = 1 − φ_1 L − φ_2 L² − … − φ_p L^p
The Stationarity Condition for an AR Model
• Stationarity gives the AR model some desirable properties, e.g. the influence of previous disturbances on the current value dies away over time.
• The condition for stationarity of a general AR(p) model is that the roots of the characteristic equation
    1 − φ_1 z − φ_2 z² − … − φ_p z^p = 0
  all lie outside the unit circle.
• Example 1: Is y_t = y_{t−1} + u_t stationary?
  The characteristic root is 1, so it is a unit root process (and hence non-stationary).
• Example 2: p241
• A stationary AR(p) model is required for it to have an MA(∞) representation.
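The root condition is easy to check numerically. A sketch using NumPy's polynomial root finder (the helper name is illustrative):

```python
import numpy as np

def is_stationary(phis):
    """Check whether an AR(p) model with coefficients phi_1..phi_p is stationary.

    All roots of 1 - phi_1*z - ... - phi_p*z^p = 0 must lie outside the unit circle.
    """
    # np.roots expects coefficients from the highest power down: [-phi_p, ..., -phi_1, 1]
    coeffs = [-phi for phi in reversed(phis)] + [1.0]
    roots = np.roots(coeffs)
    return bool(all(abs(r) > 1.0 for r in roots))

print(is_stationary([0.5]))       # True  (root z = 2)
print(is_stationary([1.0]))       # False (unit root, as in Example 1)
print(is_stationary([0.5, 0.3]))  # True  (roots approx. 1.17 and -2.84)
```

The unit-root case of Example 1 fails the check because its single characteristic root sits exactly on the unit circle.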
Wold's Decomposition Theorem
• States that any stationary series can be decomposed into the sum of two unrelated processes, a purely deterministic part and a purely stochastic part, which will be an MA(∞).
• For the AR(p) model φ(L) y_t = u_t (ignoring the intercept), the Wold decomposition is
    y_t = ψ(L) u_t
  where
    ψ(L) = φ(L)^{−1} = (1 − φ_1 L − φ_2 L² − … − φ_p L^p)^{−1} = 1 + ψ_1 L + ψ_2 L² + ψ_3 L³ + …
• It can be shown that the set of lag-operator polynomials R(L) is isomorphic to the set of algebraic polynomials R(z), so the operator L can be added, subtracted, multiplied, and formed into ratios.
The Moments of an Autoregressive Process
• The moments of an autoregressive process are as follows. The mean is given by
    E(y_t) = μ / (1 − φ_1 − φ_2 − … − φ_p)
• The autocovariances and autocorrelation functions can be obtained by solving what are known as the Yule–Walker equations:
    ρ_1 = φ_1 + ρ_1 φ_2 + … + ρ_{p−1} φ_p
    ρ_2 = ρ_1 φ_1 + φ_2 + … + ρ_{p−2} φ_p
    ⋮
    ρ_p = ρ_{p−1} φ_1 + ρ_{p−2} φ_2 + … + φ_p
• If the AR model is stationary, the autocorrelation function will decay exponentially to zero.
Sample AR Problem
• Consider the following simple AR(1) model:
    y_t = μ + φ_1 y_{t−1} + u_t
(i) Calculate the (unconditional) mean of y_t.
For the remainder of the question, set μ = 0 for simplicity.
(ii) Calculate the (unconditional) variance of y_t.
(iii) Derive the autocorrelation function for y_t.
Solution
(i) E(y_t) = E(μ + φ_1 y_{t−1}) = μ + φ_1 E(y_{t−1})
But also y_{t−1} = μ + φ_1 y_{t−2} + u_{t−1}, so
    E(y_t) = μ + φ_1 (μ + φ_1 E(y_{t−2}))
           = μ + φ_1 μ + φ_1² E(y_{t−2})
           = μ + φ_1 μ + φ_1² (μ + φ_1 E(y_{t−3}))
           = μ + φ_1 μ + φ_1² μ + φ_1³ E(y_{t−3})
An infinite number of such substitutions would give
    E(y_t) = μ (1 + φ_1 + φ_1² + …) + φ_1^∞ y_0
So long as the model is stationary, i.e. |φ_1| < 1, then φ_1^∞ = 0. So
    E(y_t) = μ (1 + φ_1 + φ_1² + …) = μ / (1 − φ_1)
Solution (cont'd)
(ii) Calculating the variance of y_t:
    y_t = φ_1 y_{t−1} + u_t
From Wold's decomposition theorem:
    (1 − φ_1 L) y_t = u_t
    y_t = (1 − φ_1 L)^{−1} u_t
    y_t = (1 + φ_1 L + φ_1² L² + …) u_t
So long as |φ_1| < 1, this will converge, and
    y_t = u_t + φ_1 u_{t−1} + φ_1² u_{t−2} + …
Solution (cont'd)
    Var(y_t) = E[y_t − E(y_t)][y_t − E(y_t)]
but E(y_t) = 0, since we are setting μ = 0. So (a shortcut method also exists)
    Var(y_t) = E[(y_t)(y_t)]
             = E[(u_t + φ_1 u_{t−1} + φ_1² u_{t−2} + …)(u_t + φ_1 u_{t−1} + φ_1² u_{t−2} + …)]
             = E[u_t² + φ_1² u_{t−1}² + φ_1⁴ u_{t−2}² + … + cross-products]
             = E[u_t² + φ_1² u_{t−1}² + φ_1⁴ u_{t−2}² + …]
             = σ² + φ_1² σ² + φ_1⁴ σ² + …
             = σ² (1 + φ_1² + φ_1⁴ + …)
             = σ² / (1 − φ_1²)
Solution (cont'd)
(iii) Turning now to the acf, first calculate the autocovariances (again, a shortcut method exists):
    γ_1 = Cov(y_t, y_{t−1}) = E[y_t − E(y_t)][y_{t−1} − E(y_{t−1})]
    γ_1 = E[y_t y_{t−1}]
        = E[(u_t + φ_1 u_{t−1} + φ_1² u_{t−2} + …)(u_{t−1} + φ_1 u_{t−2} + φ_1² u_{t−3} + …)]
        = E[φ_1 u_{t−1}² + φ_1³ u_{t−2}² + … + cross-products]
        = φ_1 σ² + φ_1³ σ² + φ_1⁵ σ² + …
        = φ_1 σ² / (1 − φ_1²)
Solution (cont'd)
For the second autocovariance,
    γ_2 = Cov(y_t, y_{t−2}) = E[y_t − E(y_t)][y_{t−2} − E(y_{t−2})]
Using the same rules as applied above for the lag 1 covariance:
    γ_2 = E[y_t y_{t−2}]
        = E[(u_t + φ_1 u_{t−1} + φ_1² u_{t−2} + …)(u_{t−2} + φ_1 u_{t−3} + φ_1² u_{t−4} + …)]
        = E[φ_1² u_{t−2}² + φ_1⁴ u_{t−3}² + … + cross-products]
        = φ_1² σ² + φ_1⁴ σ² + …
        = φ_1² σ² (1 + φ_1² + φ_1⁴ + …)
        = φ_1² σ² / (1 − φ_1²)
Solution (cont'd)
• If these steps were repeated for γ_3, the following expression would be obtained:
    γ_3 = φ_1³ σ² / (1 − φ_1²)
and for any lag s, the autocovariance would be given by
    γ_s = φ_1^s σ² / (1 − φ_1²)
The acf can now be obtained by dividing the covariances by the variance.
Solution (cont'd)
    ρ_0 = γ_0 / γ_0 = 1
    ρ_1 = γ_1 / γ_0 = [φ_1 σ² / (1 − φ_1²)] / [σ² / (1 − φ_1²)] = φ_1
    ρ_2 = γ_2 / γ_0 = [φ_1² σ² / (1 − φ_1²)] / [σ² / (1 − φ_1²)] = φ_1²
    ρ_3 = φ_1³
    ⋮
    ρ_s = φ_1^s
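The geometric-series step in the variance derivation, and the resulting acf ρ_s = φ_1^s, can be verified numerically by truncating the infinite sums. A sketch with the illustrative values φ_1 = 0.5, σ² = 1:

```python
phi1, sigma2 = 0.5, 1.0

# Var(y_t) = sigma^2 * (1 + phi1^2 + phi1^4 + ...) = sigma^2 / (1 - phi1^2)
var_series = sigma2 * sum(phi1 ** (2 * i) for i in range(200))  # truncated infinite sum
var_closed = sigma2 / (1 - phi1 ** 2)
print(round(var_series, 10), round(var_closed, 10))  # both 1.3333333333

# rho_s = gamma_s / gamma_0 = phi1^s: the acf decays geometrically
acf = [phi1 ** s for s in range(4)]
print(acf)  # [1.0, 0.5, 0.25, 0.125]
```

With |φ_1| < 1 the truncation error after 200 terms is negligible, which is exactly why the infinite-substitution argument on the previous slides is legitimate.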
5 The Partial Autocorrelation Function (denoted φ_kk)
• Measures the correlation between an observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k).
• The partial autocorrelation φ_kk between y_{t−k} and y_t is their partial correlation conditional on y_{t−k+1}, y_{t−k+2}, …, y_{t−1}.
• So φ_kk measures the correlation between y_t and y_{t−k} after removing the effects of y_{t−k+1}, y_{t−k+2}, …, y_{t−1}. Put another way, φ_kk measures the correlation between y_{t−k} and y_t that is not explained by y_{t−k+1}, y_{t−k+2}, …, y_{t−1}.
• At lag 1, the acf = pacf always.
• At lag 2, φ_22 = (ρ_2 − ρ_1²) / (1 − ρ_1²).
• For lags 3 and above, the formulae are more complex.
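For lags 3 and above, the pacf can be computed recursively from the acf via the Durbin–Levinson recursion (a standard route, not spelled out on the slide; its lag-2 case reduces to the formula above). A sketch:

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_11, phi_22, ... from autocorrelations
    rho = [rho_1, rho_2, ...] via the Durbin-Levinson recursion."""
    pacf, phi_prev = [], []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_curr = [rho[0]]  # phi_11 = rho_1 (acf = pacf at lag 1)
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            phi_curr = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                        for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_curr[-1])
        phi_prev = phi_curr
    return pacf

# For an AR(1) with phi_1 = 0.6, rho_s = 0.6**s, so the pacf should cut off after lag 1:
print(pacf_from_acf([0.6, 0.36, 0.216]))
```

The printed values should be [0.6, 0.0, 0.0] up to floating-point error, illustrating the cut-off property of the pacf for an AR(p) discussed on the next slide.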
The Partial Autocorrelation Function (cont'd)
• The pacf is useful for telling the difference between an AR process and an MA process.
• In the case of an AR(p), there are direct connections between y_t and y_{t−s} only for s ≤ p.
• So for an AR(p), the theoretical pacf will be zero after lag p.
• In the case of an MA(q), the process can be written as an AR(∞), so there are direct connections between y_t and all its previous values.
• For an MA(q), the theoretical pacf will be geometrically declining.
The Invertibility Condition
• If an MA(q) process can be expressed as an AR(∞), then the MA(q) is invertible.
• Invertibility condition for an MA(q): writing the model as
    y_t = Σ_{i=1}^{q} θ_i L^i u_t + u_t = θ(L) u_t,  θ(L) = 1 + θ_1 L + θ_2 L² + … + θ_q L^q,
  the roots of the characteristic equation θ(z) = 0 must all be greater than 1 in absolute value.
• When this condition holds, θ(L) can be inverted, giving the AR(∞) representation
    θ(L)^{−1} y_t = u_t,  i.e.  y_t = Σ_{i=1}^{∞} c_i L^i y_t + u_t = Σ_{i=1}^{∞} c_i y_{t−i} + u_t
6 ARMA Processes
• By combining the AR(p) and MA(q) models, we can obtain an ARMA(p, q) model:
    φ(L) y_t = μ + θ(L) u_t
  where
    φ(L) = 1 − φ_1 L − φ_2 L² − … − φ_p L^p
  and
    θ(L) = 1 + θ_1 L + θ_2 L² + … + θ_q L^q
  or
    y_t = μ + φ_1 y_{t−1} + φ_2 y_{t−2} + … + φ_p y_{t−p} + θ_1 u_{t−1} + θ_2 u_{t−2} + … + θ_q u_{t−q} + u_t
  with
    E(u_t) = 0;  E(u_t²) = σ²;  E(u_t u_s) = 0, t ≠ s
Characteristics of an ARMA Process
• An ARMA process combines the characteristics of its AR and MA parts.
• Invertibility condition: similar to the stationarity condition, we typically require the MA(q) part of the model to have roots of θ(z) = 0 greater than one in absolute value.
• The mean of an ARMA series is given by
    E(y_t) = μ / (1 − φ_1 − φ_2 − … − φ_p)
• The autocorrelation function for an ARMA process will display combinations of behaviour derived from the AR and MA parts, but for lags beyond q, the acf will simply be identical to that of the individual AR(p) model.
Summary of the Behaviour of the acf and pacf for AR and MA Processes
An autoregressive process has
• a geometrically decaying acf (it tails off)
• number of spikes of pacf = AR order (it cuts off after lag p)
A moving average process has
• number of spikes of acf = MA order (it cuts off after lag q)
• a geometrically decaying pacf (it tails off)
An ARMA process has
• a geometrically decaying acf (it tails off)
• a geometrically decaying pacf (it tails off)
Some sample acf and pacf plots for standard processes
The acf and pacf are estimated using 100,000 simulated observations with disturbances drawn from a normal distribution.
[Figure: ACF and PACF for an MA(1) Model, y_t = −0.5u_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for an MA(2) Model, y_t = 0.5u_{t−1} − 0.25u_{t−2} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for a slowly decaying AR(1) Model, y_t = 0.9y_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for a more rapidly decaying AR(1) Model, y_t = 0.5y_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for an AR(1) Model with Negative Coefficient, y_t = −0.5y_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for a Non-stationary Model (a unit coefficient), y_t = y_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
[Figure: ACF and PACF for an ARMA(1,1) Model, y_t = 0.5y_{t−1} + 0.5u_{t−1} + u_t; acf and pacf plotted against lags 1 to 10.]
7 Building ARMA Models: The Box–Jenkins Approach
• Box and Jenkins (1970) were the first to approach the task of estimating an ARMA model in a systematic manner. There are 3 steps to their approach:
  1. Identification
  2. Estimation
  3. Model diagnostic checking
Step 1:
  - Involves determining the order of the model.
  - Uses graphical procedures: plots of the data, the acf, and the pacf.
  - A better procedure is now available.
Building ARMA Models: The Box–Jenkins Approach (cont'd)
Step 2:
  - Estimation of the parameters.
  - Can be done using least squares or maximum likelihood, depending on the model.
Step 3:
  - Model checking. Box and Jenkins suggest 2 methods:
    - deliberate overfitting
    - residual diagnostics (in practice, usually only autocorrelation tests)
  - The Box–Jenkins diagnostic checking approach can only detect an underparameterised model, not an overparameterised one.
Some More Recent Developments in ARMA Modelling
• We want to form a parsimonious model because
  - the variance of the estimators is inversely proportional to the number of degrees of freedom;
  - big models are inclined to fit data-specific features that would not be replicated out-of-sample.
• Identification would typically not be done using the acf and pacf.
• This gives motivation for using information criteria, which embody 2 factors:
  - a term which is a function of the RSS
  - some penalty for the loss of degrees of freedom from adding extra parameters
• The object is to choose the number of parameters which minimises the information criterion.
Information Criteria for Model Selection
• The information criteria vary according to how stiff the penalty term is.
• The three most popular criteria are Akaike's (1974) information criterion (AIC), Schwarz's (1978) Bayesian information criterion (SBIC), and the Hannan–Quinn criterion (HQIC):
    AIC  = ln(σ̂²) + 2k/T
    SBIC = ln(σ̂²) + (k/T) ln T
    HQIC = ln(σ̂²) + (2k/T) ln(ln T)
  where k = p + q + 1 and T = sample size.
• So we minimise the IC subject to p ≤ p̄, q ≤ q̄.
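The three criteria are simple to compute once the residual variance is in hand. A sketch (the residual variance and model orders here are made-up illustrative numbers, not from a real fit):

```python
import math

def info_criteria(sigma2_hat, T, p, q):
    """AIC, SBIC and HQIC for an ARMA(p, q) fit with residual variance sigma2_hat."""
    k = p + q + 1  # number of estimated parameters
    aic = math.log(sigma2_hat) + 2 * k / T
    sbic = math.log(sigma2_hat) + k / T * math.log(T)
    hqic = math.log(sigma2_hat) + 2 * k / T * math.log(math.log(T))
    return aic, sbic, hqic

aic, sbic, hqic = info_criteria(sigma2_hat=0.5, T=100, p=1, q=1)
# For T = 100 the per-parameter penalties rank 2 < 2*ln(ln T) < ln(T),
# so SBIC penalises extra parameters most stiffly:
print(aic < hqic < sbic)  # True
```

This ordering of the penalty terms is why SBIC tends to select smaller models than AIC, as the next slide notes.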
Information Criteria for Model Selection (cont'd)
• SBIC embodies a stiffer penalty term than AIC.
• Which IC should be preferred if they suggest different model orders?
  – SBIC is strongly consistent (but inefficient).
  – AIC is not consistent, and will typically pick "bigger" models.
• No criterion is definitively superior to the others.
ARIMA Models
• As distinct from ARMA models, the "I" stands for integrated.
• An integrated autoregressive process is one with a characteristic root on the unit circle.
• Typically researchers difference the variable as necessary and then build an ARMA model on the differenced variables.
• An ARMA(p, q) model in a variable differenced d times is equivalent to an ARIMA(p, d, q) model on the original data.
11 Exponential Smoothing
• Another modelling and forecasting technique.
• How much weight do we attach to previous observations?
• We expect recent observations to have the most power in helping to forecast future values of a series.
• The equation for the model is
    S_t = α y_t + (1 − α) S_{t−1}    (1)
  or, equivalently,
    S_t = S_{t−1} + α (y_t − S_{t−1})
  where α is the smoothing constant, with 0 ≤ α ≤ 1; y_t is the current realised value; S_t is the current smoothed value.
Exponential Smoothing (cont'd)
• Lagging (1) by one period we can write
    S_{t−1} = α y_{t−1} + (1 − α) S_{t−2}    (2)
• and lagging again
    S_{t−2} = α y_{t−2} + (1 − α) S_{t−3}    (3)
• Substituting into (1) for S_{t−1} from (2):
    S_t = α y_t + (1 − α)(α y_{t−1} + (1 − α) S_{t−2})
        = α y_t + (1 − α) α y_{t−1} + (1 − α)² S_{t−2}    (4)
• Substituting into (4) for S_{t−2} from (3):
    S_t = α y_t + (1 − α) α y_{t−1} + (1 − α)²(α y_{t−2} + (1 − α) S_{t−3})
        = α y_t + (1 − α) α y_{t−1} + (1 − α)² α y_{t−2} + (1 − α)³ S_{t−3}
Exponential Smoothing (cont'd)
• T successive substitutions of this kind would lead to
    S_t = α Σ_{i=0}^{T−1} (1 − α)^i y_{t−i} + (1 − α)^T S_0
  Since α > 0, the effect of each observation declines exponentially as we move another observation back in time.
• Forecasts are generated by
    f_{t+s} = S_t
  for all steps into the future s = 1, 2, …
• This technique is called single (or simple) exponential smoothing.
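The recursion S_t = αy_t + (1 − α)S_{t−1} is a one-liner in code. A minimal sketch (initialising S with the first observation is one common convention, assumed here):

```python
def exp_smooth(y, alpha):
    """Single exponential smoothing: returns the final smoothed value S_t,
    which is also the forecast for every future step s = 1, 2, ..."""
    s = y[0]  # initialise the smoothed series with the first observation
    for obs in y[1:]:
        s = alpha * obs + (1 - alpha) * s
    return s

forecast = exp_smooth([2.0, 4.0, 6.0], alpha=0.5)
print(forecast)  # 4.5  (f_{t+s} = S_t for all s)
```

Unrolling the loop for this example gives S = 2, then 0.5(4) + 0.5(2) = 3, then 0.5(6) + 0.5(3) = 4.5, matching the substitution algebra above.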
Exponential Smoothing (cont'd)
• It does not work well for financial data, because it is simplistic and inflexible:
  – there is little structure to smooth
  – it is an ARIMA(0,1,1) with MA coefficient (1 − α)
  – it cannot allow for seasonality
  – forecasts do not converge on the long-term mean as s → ∞
• Single exponential smoothing can be modified
  – to allow for trends (Holt's method)
  – to allow for seasonality (Winters' method).
• Advantages of exponential smoothing:
  – Very simple to use.
  – Easy to update the model if a new realisation becomes available.
Forecasting in Econometrics
• Forecasting = prediction: an attempt to determine the likely future values of a series.
• An important test of the adequacy of a model.
• Forecasting is very useful, e.g. for:
  - Forecasting tomorrow's return on a particular share
  - Forecasting the price of a house given its characteristics
  - Forecasting the riskiness of a portfolio over the next year
  - Forecasting the volatility of bond returns
• We can distinguish two approaches:
  - Econometric (structural) forecasting: better suited to the longer term
  - Time series forecasting
  – The distinction is somewhat blurred (e.g. VARs).
• Point forecast: predict a single value for each period.
• Interval forecast: give a confidence interval for each period.
In-Sample Versus Out-of-Sample
• We expect the "forecast" of the model to be good in-sample.
• Say we have some data, e.g. monthly FTSE returns for 120 months, 1990M1 – 1999M12. We could use all of it to build the model, or keep some observations back.
• Holding back the final observations gives a good test of the model, since we have not used the information from 1999M1 onwards when we estimated the model parameters.
Some terminology
• One-step-ahead forecast
• Multi-step-ahead forecasts
• s-step-ahead forecast
• s-step-ahead extrapolation forecast
• Forecasting horizon: the length of time into the future over which forecasts are made.
• Forecasts change as the horizon changes.
• The optimal forecasting model may also change with the horizon.
• Recursive windows (samples): the sample length grows as time moves on.
• Rolling windows (samples): the sample moves through time but its length is fixed.
• e.g. p281
How to produce forecasts
• To understand how to construct forecasts, we need the idea of conditional expectations:
    E(y_{t+1} | Ω_t)
• We cannot forecast a white noise process:
    E(u_{t+s} | Ω_t) = 0 for all s > 0.
• The two simplest forecasting "methods":
  1. Assume no change: f(y_{t+s}) = E(y_{t+s} | Ω_t) = y_t. This is the optimal forecast for a random walk.
  2. Forecast the long-term average: f(y_{t+s}) = ȳ. For a stationary series, this is better than method 1.
Models for Forecasting
• For time-series forecasting, time-series models usually outperform structural models.
• Structural models, e.g.
    y_t = β_1 + β_2 x_{2t} + … + β_k x_{kt} + u_t
  To forecast y, we require the conditional expectation of its future value:
    E(y_{t+1}) = E(β_1 + β_2 x_{2,t+1} + … + β_k x_{k,t+1} + u_{t+1})
               = β_1 + β_2 E(x_{2,t+1}) + … + β_k E(x_{k,t+1})
  But what are E(x_{2,t+1}) etc.? We could use x̄_2, etc., so
    E(y_{t+1}) = β_1 + β_2 x̄_2 + … + β_k x̄_k = ȳ !!
Models for Forecasting (cont'd)
• Time Series Models
  The current value of a series, y_t, is modelled as a function only of its previous values and the current value of an error term (and possibly previous values of the error term).
• Models include:
  – simple unweighted averages
  – exponentially weighted averages
  – ARIMA models
  – non-linear models, e.g. threshold models, GARCH, bilinear models, etc.
Forecasting with ARMA Models
The forecasting model typically used is of the form:
    f_{t,s} = Σ_{i=1}^{p} φ_i f_{t,s−i} + Σ_{j=1}^{q} θ_j u_{t+s−j}
where f_{t,s} = y_{t+s} for s ≤ 0; u_{t+s} = 0 for s > 0; and u_{t+s} = u_{t+s} for s ≤ 0.
Forecasting with MA Models
• An MA(q) only has memory of length q.
  e.g. say we have estimated an MA(3) model:
    y_t = μ + θ_1 u_{t−1} + θ_2 u_{t−2} + θ_3 u_{t−3} + u_t
    y_{t+1} = μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2} + u_{t+1}
    y_{t+2} = μ + θ_1 u_{t+1} + θ_2 u_t + θ_3 u_{t−1} + u_{t+2}
    y_{t+3} = μ + θ_1 u_{t+2} + θ_2 u_{t+1} + θ_3 u_t + u_{t+3}
• We are at time t and we want to forecast 1, 2, …, s steps ahead.
• We know y_t, y_{t−1}, …, and u_t, u_{t−1}, …
Forecasting with MA Models (cont'd)
    f_{t,1} = y_{t+1,t} = E(y_{t+1} | Ω_t) = E(μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2} + u_{t+1})
            = μ + θ_1 u_t + θ_2 u_{t−1} + θ_3 u_{t−2}
    f_{t,2} = y_{t+2,t} = E(y_{t+2} | Ω_t) = E(μ + θ_1 u_{t+1} + θ_2 u_t + θ_3 u_{t−1} + u_{t+2})
            = μ + θ_2 u_t + θ_3 u_{t−1}
    f_{t,3} = y_{t+3,t} = E(y_{t+3} | Ω_t) = E(μ + θ_1 u_{t+2} + θ_2 u_{t+1} + θ_3 u_t + u_{t+3})
            = μ + θ_3 u_t
    f_{t,4} = y_{t+4,t} = E(y_{t+4} | Ω_t) = μ
    f_{t,s} = y_{t+s,t} = E(y_{t+s} | Ω_t) = μ  for all s ≥ 4
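The forecast expressions above can be generated mechanically: every future disturbance is replaced by its conditional expectation of zero, so only the known residuals survive. A sketch (the parameter and residual values are made up for illustration, not from an estimated model):

```python
def ma_forecasts(mu, thetas, last_u, steps):
    """s-step-ahead forecasts from an MA(q): future u's are set to zero.

    last_u = [u_t, u_{t-1}, ...] (most recent first); thetas = [theta_1, ..., theta_q].
    """
    q = len(thetas)
    forecasts = []
    for s in range(1, steps + 1):
        f = mu
        for j in range(1, q + 1):
            lag = j - s  # index into last_u of u_{t+s-j}; negative means a future u
            if lag >= 0:
                f += thetas[j - 1] * last_u[lag]
        forecasts.append(f)
    return forecasts

# MA(3) with mu = 1, thetas = (0.5, 0.25, 0.1), u_t = 0.4, u_{t-1} = -0.2, u_{t-2} = 0.1:
print([round(f, 2) for f in ma_forecasts(1.0, [0.5, 0.25, 0.1], [0.4, -0.2, 0.1], 4)])
```

From step 4 onwards the forecast is just μ, reflecting the q-period memory of the MA(3).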
Forecasting with AR Models
• An autoregressive model has infinite memory. Say we have estimated an AR(2):
    y_t = μ + φ_1 y_{t−1} + φ_2 y_{t−2} + u_t
    y_{t+1} = μ + φ_1 y_t + φ_2 y_{t−1} + u_{t+1}
    y_{t+2} = μ + φ_1 y_{t+1} + φ_2 y_t + u_{t+2}
    y_{t+3} = μ + φ_1 y_{t+2} + φ_2 y_{t+1} + u_{t+3}
    f_{t,1} = E(y_{t+1} | Ω_t) = E(μ + φ_1 y_t + φ_2 y_{t−1} + u_{t+1})
            = μ + φ_1 E(y_t) + φ_2 E(y_{t−1})
            = μ + φ_1 y_t + φ_2 y_{t−1}
    f_{t,2} = E(y_{t+2} | Ω_t) = E(μ + φ_1 y_{t+1} + φ_2 y_t + u_{t+2})
            = μ + φ_1 E(y_{t+1}) + φ_2 E(y_t)
            = μ + φ_1 f_{t,1} + φ_2 y_t
Forecasting with AR Models (cont'd)
    f_{t,3} = E(y_{t+3} | Ω_t) = E(μ + φ_1 y_{t+2} + φ_2 y_{t+1} + u_{t+3})
            = μ + φ_1 E(y_{t+2}) + φ_2 E(y_{t+1})
            = μ + φ_1 f_{t,2} + φ_2 f_{t,1}
• We can see immediately that
    f_{t,4} = μ + φ_1 f_{t,3} + φ_2 f_{t,2}, etc., so
    f_{t,s} = μ + φ_1 f_{t,s−1} + φ_2 f_{t,s−2}
• We can easily generate ARMA(p, q) forecasts in the same way.
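The AR(2) recursion above, with earlier forecasts feeding back in as inputs, looks like this in code (the numerical values are illustrative, not from an estimated model):

```python
def ar2_forecasts(mu, phi1, phi2, y_t, y_tm1, steps):
    """Recursive s-step-ahead forecasts from an AR(2):
    f_{t,s} = mu + phi1*f_{t,s-1} + phi2*f_{t,s-2},
    seeded with f_{t,0} = y_t and f_{t,-1} = y_{t-1}."""
    f_prev2, f_prev1 = y_tm1, y_t  # f_{t,-1} and f_{t,0}
    forecasts = []
    for _ in range(steps):
        f = mu + phi1 * f_prev1 + phi2 * f_prev2
        forecasts.append(f)
        f_prev2, f_prev1 = f_prev1, f  # forecasts feed back in as inputs
    return forecasts

# mu = 0.1, phi1 = 0.5, phi2 = 0.2, y_t = 1.0, y_{t-1} = 0.5:
print([round(f, 3) for f in ar2_forecasts(0.1, 0.5, 0.2, 1.0, 0.5, 3)])  # [0.7, 0.65, 0.565]
```

Unlike the MA case, the recursion never terminates at a fixed horizon; for a stationary AR the forecasts instead converge gradually towards the unconditional mean.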
How can we test whether a forecast is accurate or not?
• For example, say we predict that tomorrow's return on the FTSE will be 0.2, but the outcome is actually −0.4. Is this accurate? Define f_{t,s} as the forecast made at time t for s steps ahead (i.e. the forecast made for time t+s), and y_{t+s} as the realised value of y at time t+s.
• Some of the most popular criteria for assessing the accuracy of time series forecasting techniques are the mean squared error,
    MSE = (1/h) Σ_{s=1}^{h} (y_{t+s} − f_{t,s})²
  and the mean absolute percentage error,
    MAPE = (100/h) Σ_{s=1}^{h} | (y_{t+s} − f_{t,s}) / y_{t+s} |
  where h is the number of forecasts made.
How can we test whether a forecast is accurate or not? (cont'd)
• It has, however, also recently been shown (Gerlow et al., 1993) that the accuracy of forecasts according to traditional statistical criteria is not related to trading profitability.
• A measure more closely correlated with profitability: % correct sign predictions,
    (1/h) Σ_{s=1}^{h} z_{t+s}
  where z_{t+s} = 1 if (y_{t+s} · f_{t,s}) > 0, and z_{t+s} = 0 otherwise.
• The severity with which large errors are penalised increases in the order:
  sign prediction → MAE → MSE
Forecast Evaluation Example
• Given the following forecast and actual values, calculate the MSE, MAE and percentage of correct sign predictions:

    Steps ahead   Forecast   Actual
    1             0.20       −0.40
    2             0.15        0.20
    3             0.10        0.10
    4             0.06       −0.10
    5             0.04       −0.05

• MSE = 0.079, MAE = 0.180
• % of correct sign predictions = 40
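These three figures can be reproduced directly from the definitions:

```python
forecasts = [0.20, 0.15, 0.10, 0.06, 0.04]
actuals = [-0.40, 0.20, 0.10, -0.10, -0.05]
h = len(forecasts)

# Mean squared error and mean absolute error of the forecast errors
mse = sum((y - f) ** 2 for f, y in zip(forecasts, actuals)) / h
mae = sum(abs(y - f) for f, y in zip(forecasts, actuals)) / h

# A sign prediction is correct when forecast and outcome have the same sign
pct_correct_sign = 100 * sum(1 for f, y in zip(forecasts, actuals) if y * f > 0) / h

print(round(mse, 3), round(mae, 3), pct_correct_sign)  # 0.079 0.18 40.0
```

Note how the single large step-1 error (0.6) dominates the MSE but contributes proportionately to the MAE, illustrating the penalty ordering on the previous slide.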
Statistical Versus Economic or Financial Loss Functions
• Statistical evaluation methods may not be appropriate. How well does the forecast perform in doing the job we wanted it for?
Limits of forecasting: what can and cannot be forecast?
• All statistical forecasting models are essentially extrapolative.
• Forecasting models are prone to break down around turning points.
• Series subject to structural changes or regime shifts cannot be forecast.
• Predictive accuracy usually declines with the forecasting horizon.
• Forecasting is not a substitute for judgement.
The Usually Optimal Approach
Use a statistical forecasting model built on solid theoretical foundations, supplemented by expert judgement and interpretation.