Ch. 16 Stochastic Model Building
Unlike a linear regression model, which usually has a theoretical underpinning somewhere in the economic literature, the time series analysis of a stochastic process requires the ability to relate a stationary ARMA model to real data. This is usually best achieved by a three-stage iterative procedure based on identification, estimation, and diagnostic checking, as suggested by Box and Jenkins (1976).
1 Model Identification
By identification we mean the use of the data, and of any information on how the series was generated, to suggest a subclass of parsimonious models worthy of being entertained. We usually transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one. At this stage we then make an initial guess of small values of $p$ and $q$ for an ARMA($p, q$) model that might describe the transformed data.
1.1 Identifying the Degree of Differencing
Trend stationary or difference stationary? See Ch. 19.
1.2 Use of the Autocorrelation and Partial Autocorrelation Function in Identification
1.2.1 Autocorrelation
Recall that if the data really follow an MA($q$) process, then the (population) autocorrelation $r_j\ (= \gamma_j/\gamma_0)$ will be zero for $j > q$. By contrast, if the data follow an AR($p$) process, then $r_j$ will gradually decay toward zero as a mixture of exponentials or damped sinusoids. One guide for distinguishing MA from AR representations, then, is the decay properties of $r_j$. It is useful to have a rough check on whether $r_j$ is effectively zero beyond a certain lag.
A natural estimate of the population autocorrelation $r_j$ is provided by the corresponding sample moment (remember, at this stage you still have no "model" to estimate, so it is natural to use the moment estimator):
$$\hat{r}_j = \frac{\hat{\gamma}_j}{\hat{\gamma}_0},$$
where
$$\hat{\gamma}_j = \frac{1}{T}\sum_{t=j+1}^{T}(Y_t - \bar{Y})(Y_{t-j} - \bar{Y}) \quad \text{for } j = 0, 1, 2, \ldots, T-1,$$
$$\bar{Y} = \frac{1}{T}\sum_{t=1}^{T} Y_t.$$
If the data were really generated by a Gaussian MA($q$) process, then the variance of the estimated autocorrelation $\hat{r}_j$ could be approximated by (see Box et al. (1994), p. 33)
$$\mathrm{Var}(\hat{r}_j) = \frac{1}{T}\left\{ 1 + 2\sum_{i=1}^{q} r_i^2 \right\} \quad \text{for } j = q+1, q+2, \ldots \tag{1}$$
To use (1) in practice, the estimated autocorrelations $\hat{r}_j$ ($j = 1, 2, \ldots, q$) are substituted for the theoretical autocorrelations $r_j$; when this is done, we shall refer to the square root of (1) as the large-lag standard error. In particular, if we suspect that the data were generated by Gaussian white noise, then $\hat{r}_j \sim N(0, 1/T)$ for $j \neq 0$; that is, $\hat{r}_j$ should lie between $\pm 2/\sqrt{T}$ about 95% of the time.
Example:
The following estimated autocorrelations were obtained from a time series of length $T = 200$ observations, generated from a stochastic process for which it was known that $r_1 = -0.4$ and $r_j = 0$ for $j \geq 2$:
$\hat{r}_1 = -0.38$, $\hat{r}_2 = -0.08$, $\hat{r}_3 = 0.11$, $\hat{r}_4 = -0.08$, $\hat{r}_5 = 0.02$, $\hat{r}_6 = 0.00$, $\hat{r}_7 = 0.00$, $\hat{r}_8 = 0.00$, $\hat{r}_9 = 0.07$, and $\hat{r}_{10} = -0.08$.
On the assumption that the series is completely random ($H_0: q = 0$), (1) yields, for all lags,
$$\mathrm{Var}(\hat{r}_j) = \frac{1}{T} = \frac{1}{200} = 0.005.$$
Under the null hypothesis,
$$\hat{r}_1 \sim N(0, 0.005),$$
and the 95% confidence interval is
$$-2 < \frac{\hat{r}_1}{\sqrt{0.005}} < 2,$$
that is,
$$-0.14 < \hat{r}_1 < 0.14.$$
Since the estimated value $\hat{r}_1 = -0.38$ lies outside this confidence interval, we conclude that the hypothesis $q = 0$ is rejected.
It might be reasonable to ask next whether the series is compatible with the hypothesis that $q = 1$. Using (1) with $q = 1$, the estimated large-lag variance under this assumption is
$$\mathrm{Var}(\hat{r}_2) = \frac{1}{200}\left[1 + 2(-0.38)^2\right] = 0.0064.$$
Under the null hypothesis,
$$\hat{r}_2 \sim N(0, 0.0064),$$
and the 95% confidence interval is
$$-2 < \frac{\hat{r}_2}{\sqrt{0.0064}} < 2,$$
that is,
$$-0.16 < \hat{r}_2 < 0.16.$$
Since the estimated value $\hat{r}_2 = -0.08$ lies inside the confidence interval, we conclude that the hypothesis $q = 1$ cannot be rejected.
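The same check is easy to carry out numerically. Below is a minimal sketch in Python, assuming only numpy; the helper names `sample_acf` and `large_lag_var` are mine, and the simulated MA(1) series with $r_1 = -0.4$ merely stands in for actual data:

```python
import numpy as np

def sample_acf(y, nlags):
    """Moment estimators r_j = gamma_j / gamma_0 defined above."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dev = y - y.mean()
    gamma = np.array([np.sum(dev[j:] * dev[:T - j]) / T
                      for j in range(nlags + 1)])
    return gamma / gamma[0]

def large_lag_var(r, q, T):
    """Eq. (1): Var(r_j) for j > q under an MA(q) null."""
    return (1.0 + 2.0 * np.sum(r[1:q + 1] ** 2)) / T

T = 200
rng = np.random.default_rng(0)
e = rng.standard_normal(T + 1)
y = e[1:] - 0.5 * e[:-1]        # MA(1) with r_1 = -0.5/1.25 = -0.4

r = sample_acf(y, nlags=10)
for q in (0, 1):                # test H0: q = 0, then H0: q = 1
    se = np.sqrt(large_lag_var(r, q, T))
    print(f"H0: q = {q}:  r_{q + 1} = {r[q + 1]:+.3f},  2*SE = {2 * se:.3f}")
```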
1.2.2 Partial Autocorrelation Function
Another useful measure is the partial autocorrelation, which exploits the fact that, whereas an AR($p$) process has an autocorrelation function that is infinite in extent, it can by its very nature be described in terms of $p$ nonzero functions of the autocorrelations. The $m$th population partial autocorrelation (denoted $\phi_m^{(m)}$) is defined as the last coefficient in a linear projection of $Y$ on its $m$ most recent values:
$$\hat{Y}_{t+1|t} - \mu = \phi_1^{(m)}(Y_t - \mu) + \phi_2^{(m)}(Y_{t-1} - \mu) + \cdots + \phi_m^{(m)}(Y_{t-m+1} - \mu). \tag{2}$$
We saw in (15) of Chapter 15 that the vector $\phi^{(m)}$ can be calculated from
$$
\begin{bmatrix}
\phi_1^{(m)} \\ \phi_2^{(m)} \\ \vdots \\ \phi_m^{(m)}
\end{bmatrix}
=
\begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0
\end{bmatrix}^{-1}
\begin{bmatrix}
\gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_m
\end{bmatrix}.
$$
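For concreteness, this system is straightforward to solve numerically. A minimal sketch, assuming numpy and scipy and a vector `gamma` already containing $\gamma_0, \ldots, \gamma_m$ (the helper name is mine):

```python
import numpy as np
from scipy.linalg import toeplitz

def pacf_from_autocov(gamma, m):
    """Solve the Toeplitz system above for phi^(m); the last entry
    is the m-th partial autocorrelation phi_m^(m)."""
    Gamma = toeplitz(gamma[:m])               # gamma_0 .. gamma_{m-1}
    phi = np.linalg.solve(Gamma, gamma[1:m + 1])
    return phi[-1]
```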
Recall that if the data were really generated by an AR($p$) process, only the $p$ most recent values of $Y$ would be useful for forecasting. In this case, the projection coefficients on $Y$'s more than $p$ periods in the past are equal to zero:
$$\phi_m^{(m)} = 0 \quad \text{for } m = p+1, p+2, \ldots$$
By contrast, if the data really were generated by an MA($q$) process with $q \geq 1$, then the partial autocorrelation $\phi_m^{(m)}$ asymptotically approaches zero rather than cutting off abruptly.
Since the forecast error $\varepsilon_{t+1}$ is uncorrelated with the regressors $Y_t, Y_{t-1}, \ldots, Y_{t-m+1}$, we could rewrite (2) as
$$Y_{t+1} - \mu = \phi_1^{(m)}(Y_t - \mu) + \phi_2^{(m)}(Y_{t-1} - \mu) + \cdots + \phi_m^{(m)}(Y_{t-m+1} - \mu) + \varepsilon_{t+1}, \quad t \in \mathcal{T},$$
or
$$Y_t - \mu = \phi_1^{(m)}(Y_{t-1} - \mu) + \phi_2^{(m)}(Y_{t-2} - \mu) + \cdots + \phi_m^{(m)}(Y_{t-m} - \mu) + \varepsilon_t, \quad t \in \mathcal{T}. \tag{3}$$
The reason the quantity $\phi_m^{(m)}$ defined through (2) is called the partial autocorrelation of the process $\{Y_t\}$ at lag $m$ is clear from (3): it is equal to the partial correlation between the variables $Y_t$ and $Y_{t-m}$ adjusted for the intermediate variables $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$. That is, $\phi_m^{(m)}$ measures the correlation between $Y_t$ and $Y_{t-m}$ after adjusting for the effects of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$ (or the correlation between $Y_t$ and $Y_{t-m}$ not accounted for by $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-m+1}$). See the counterpart result for the sample on p. 6 of Chapter 6.
A natural estimate of the $m$th partial autocorrelation is the last coefficient in an OLS regression of $Y$ on a constant and its $m$ most recent values:
$$Y_t = \hat{c} + \hat{\phi}_1^{(m)} Y_{t-1} + \hat{\phi}_2^{(m)} Y_{t-2} + \cdots + \hat{\phi}_m^{(m)} Y_{t-m} + \hat{e}_t, \tag{4}$$
where $\hat{e}_t$ denotes the OLS regression residual. If the data were really generated by an AR($p$) process, then the sample estimate $\hat{\phi}_m^{(m)}$ would have a variance around the true value (0) that could be approximated by (see Box et al. 1994, p. 68)
$$\mathrm{Var}(\hat{\phi}_m^{(m)}) = \frac{1}{T} \quad \text{for } m = p+1, p+2, \ldots$$
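A minimal sketch of this regression estimate, assuming numpy and an observed series `y` (the helper name `pacf_ols` is mine):

```python
import numpy as np

def pacf_ols(y, m):
    """Last slope coefficient in an OLS regression of Y_t on a
    constant and Y_{t-1}, ..., Y_{t-m}, as in (4)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    X = np.column_stack([np.ones(T - m)] +
                        [y[m - j:T - j] for j in range(1, m + 1)])
    beta, *_ = np.linalg.lstsq(X, y[m:], rcond=None)
    return beta[-1]

# Under an AR(p) null, |pacf_ols(y, m)| > 2 / sqrt(len(y)) for m > p
# flags a significant partial autocorrelation.
```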
1.3 Use of Model Selection Criteria
Another approach to model selection is the use of information criteria such as the AIC proposed by Akaike (1974) or the BIC of Schwarz (1978). In the implementation of this approach, a range of potential ARMA models is estimated by the maximum likelihood methods to be discussed in Chapter 17, and for each model a criterion such as the AIC (normalized by sample size $T$), given by
$$AIC_{p,q} = \frac{-2\ln(\text{maximized likelihood}) + 2m}{T} \simeq \ln(\hat{\sigma}^2) + \frac{2m}{T},$$
or the related BIC, given by
$$BIC_{p,q} = \ln(\hat{\sigma}^2) + \frac{m\ln(T)}{T},$$
is evaluated, where $\hat{\sigma}^2$ denotes the maximum likelihood estimate of $\sigma^2$ and $m = p + q + 1$ denotes the number of parameters estimated in the model, including a constant term. In the criteria above, the first term essentially corresponds to minus $2/T$ times the log of the maximized likelihood, while the second term is a "penalty factor" for the inclusion of additional parameters in the model. In the information criteria approach, models that yield a minimum value of the criterion are preferred, and the AIC or BIC values are compared across the various models as the basis for selection. However, one immediate disadvantage of this approach is that several models may have to be estimated by MLE, which is computationally time-consuming and expensive. For this reason, Hannan and Rissanen (1982) proposed an alternative model selection procedure.
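As a rough sketch of how the criteria are used in practice, assuming the statsmodels package and a stationary series `y` (`ARIMA(y, order=(p, 0, q))` fits an ARMA($p, q$) with a constant by MLE; note that statsmodels reports the unnormalized criterion $-2\ln L + 2m$, which ranks models identically for fixed $T$):

```python
import warnings
from statsmodels.tsa.arima.model import ARIMA

def select_arma_order(y, max_p=3, max_q=3):
    """Fit each candidate ARMA(p, q) by MLE and return the orders
    minimizing AIC and BIC respectively."""
    results = {}
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            try:
                with warnings.catch_warnings():
                    warnings.simplefilter("ignore")
                    fit = ARIMA(y, order=(p, 0, q)).fit()
                results[(p, q)] = (fit.aic, fit.bic)
            except Exception:
                continue  # skip orders where the MLE fails
    best_aic = min(results, key=lambda k: results[k][0])
    best_bic = min(results, key=lambda k: results[k][1])
    return best_aic, best_bic
```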
2 Model Estimation
By estimation we mean the efficient use of the data to make inferences about parameters, conditional on the adequacy of the model entertained. See Chapter 17 for details.
3 Model Diagnostic Checking
By diagnostic checking we mean checking the fitted model in its relation to the data, with the intent to reveal model inadequacies and so achieve model improvement.
Suppose that, using a particular time series, the model has been identified and the parameters estimated using the methods described in Chapter 17. The question remains (unlike in regression analysis, where an economic or finance model is provided by the theoretical literature) of deciding whether this model is adequate. If there is evidence of serious inadequacy, we shall need to know how the model should be modified. By reference to familiar procedures outside time series analysis, such as the scrutiny of residuals in the analysis of variance, these would be called diagnostic checks.
3.1 Diagnostic Checks Applied to Residuals
It cannot be too strongly emphasized that visual inspection of a plot of the residuals is an indispensable first step in the checking process.
3.1.1 Autocorrelation Check
Suppose we have identified and fitted a model
$$\phi(L) Y_t = \theta(L) \varepsilon_t$$
with MLE estimates $(\hat{\phi}, \hat{\theta})$ obtained for the parameters. Then we shall refer to the quantities
$$\hat{\varepsilon}_t = \hat{\theta}^{-1}(L)\hat{\phi}(L) Y_t$$
as the residuals. The residuals are computed recursively from $\hat{\theta}(L)\hat{\varepsilon}_t = \hat{\phi}(L) Y_t$ as
$$\hat{\varepsilon}_t = Y_t - \sum_{j=1}^{p}\hat{\phi}_j Y_{t-j} + \sum_{j=1}^{q}\hat{\theta}_j \hat{\varepsilon}_{t-j}, \quad t = 1, 2, \ldots, T,$$
using either zero initial values (the conditional method) or back-forecasted initial values (the exact method) for the initial $\hat{\varepsilon}$'s and $Y$'s.
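A minimal sketch of the conditional method, assuming numpy, coefficient vectors `phi_hat` and `theta_hat` from the fitted model, and the sign convention of the recursion above:

```python
import numpy as np

def conditional_residuals(y, phi_hat, theta_hat):
    """Run the recursion above with zero starting values for the
    pre-sample eps's and Y's (the conditional method)."""
    y = np.asarray(y, dtype=float)
    p, q = len(phi_hat), len(theta_hat)
    eps = np.zeros(len(y))
    for t in range(len(y)):
        ar = sum(phi_hat[j] * y[t - 1 - j]
                 for j in range(p) if t - 1 - j >= 0)
        ma = sum(theta_hat[j] * eps[t - 1 - j]
                 for j in range(q) if t - 1 - j >= 0)
        eps[t] = y[t] - ar + ma
    return eps
```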
Now it is possible to show that, if the model is adequate,
$$\hat{\varepsilon}_t = \varepsilon_t + O\!\left(\frac{1}{\sqrt{T}}\right)$$
(read "big O of $T^{-1/2}$": the term remains bounded when multiplied by $T^{1/2}$, so it converges to zero itself).
As the series length increases, the $\hat{\varepsilon}_t$'s become close to the white noise $\varepsilon_t$'s. Therefore, one might expect that study of the $\hat{\varepsilon}_t$'s could indicate the existence and nature of model inadequacy. In particular, recognizable patterns in the estimated autocorrelation function of the $\hat{\varepsilon}_t$'s, $\hat{r}_j(\hat{\varepsilon})$, assessed using (1), could point to an appropriate modification of the model.
3.1.2 Portmanteau Lack-of-Fit Test
Rather than considering the $\hat{r}_j(\hat{\varepsilon})$'s individually, an indication is often needed of whether, say, the first 20 autocorrelations of the $\hat{\varepsilon}_t$'s, taken as a whole, indicate inadequacy of the model. Suppose we have the first $k$ autocorrelations¹ $\hat{r}_j(\hat{\varepsilon})$, $j = 1, 2, \ldots, k$, from any ARMA($p, q$) process; then it is possible to show that, if the model is appropriate, the Box-Pierce (1970) $Q$ statistic
$$Q = T \sum_{j=1}^{k} \hat{r}_j^2(\hat{\varepsilon})$$
is approximately distributed as $\chi^2$ with $k - p - q$ degrees of freedom. On the other hand, if the model is inappropriate, the average value of $Q$ will be inflated. A refinement that appears to have better finite-sample properties is the Ljung-Box (1978) statistic:
$$Q' = T(T+2) \sum_{j=1}^{k} \frac{\hat{r}_j^2(\hat{\varepsilon})}{T - j}.$$
The limiting distribution of $Q'$ is the same as that of $Q$.

¹Here, $k$ is chosen sufficiently large that the weights $\varphi_j$ in the model written in the form
$$Y_t = \phi(L)^{-1}\theta(L)\varepsilon_t = \varphi(L)\varepsilon_t$$
will be negligibly small after $j = k$.
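Both statistics are easy to compute from the residual autocorrelations. A minimal sketch, assuming numpy and scipy (a packaged version of the Ljung-Box test is also available in statsmodels as `acorr_ljungbox`):

```python
import numpy as np
from scipy import stats

def portmanteau(eps_hat, k, p, q):
    """Box-Pierce Q and Ljung-Box Q' with p-values from a chi-square
    distribution on k - p - q degrees of freedom."""
    e = np.asarray(eps_hat, dtype=float)
    T = len(e)
    d = e - e.mean()
    acov = np.array([np.sum(d[j:] * d[:T - j]) for j in range(k + 1)])
    r = acov[1:] / acov[0]                    # r_1(eps), ..., r_k(eps)
    Q = T * np.sum(r ** 2)
    Qp = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, k + 1)))
    df = k - p - q
    return Q, Qp, stats.chi2.sf(Q, df), stats.chi2.sf(Qp, df)
```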
Exercise:
Using the Box-Jenkins procedure, build a stochastic model for the data set I give to you.