Ch. 17 Maximum Likelihood Estimation
The identification process having led to a tentative formulation for the model, we then need to obtain efficient estimates of the parameters. After the parameters have been estimated, the fitted model will be subjected to diagnostic checks. This chapter contains a general account of the likelihood method for estimating the parameters of the stochastic model.
Consider an ARMA model (from model identification) of the form

$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q},$

with $\varepsilon_t$ white noise:

$E(\varepsilon_t) = 0$

$E(\varepsilon_t \varepsilon_\tau) = \sigma^2$ for $t = \tau$, and $0$ otherwise.
This chapter explores how to estimate the values of $(c, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2)$ on the basis of observations on $Y$.
The primary principle on which estimation will be based is maximum likelihood estimation. Let $\theta = (c, \phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q, \sigma^2)'$ denote the vector of population parameters. Suppose we have observed a sample of size $T$, $(y_1, y_2, \ldots, y_T)$. The approach will be to calculate the joint probability density

$f_{Y_T, Y_{T-1}, \ldots, Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta),$    (1)

which might loosely be viewed as the probability of having observed this particular sample. The maximum likelihood estimate (MLE) of $\theta$ is the value for which this sample is most likely to have been observed; that is, it is the value of $\theta$ that maximizes (1).

This approach requires specifying a particular distribution for the white noise process $\varepsilon_t$. Typically we will assume that $\varepsilon_t$ is Gaussian white noise:

$\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2).$
1 MLE of a Gaussian AR(1) Process
1.1 Evaluating the Likelihood Function Using (Scalar) Conditional Density
A stationary Gaussian AR(1) process takes the form

$Y_t = c + \phi Y_{t-1} + \varepsilon_t,$    (2)

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$ and $|\phi| < 1$ (how do you know this at this stage?). For this case, $\theta = (c, \phi, \sigma^2)'$.
Consider the p.d.f. of $Y_1$, the first observation in the sample. This is a random variable with mean and variance

$E(Y_1) = \mu = c/(1-\phi)$ and

$Var(Y_1) = \sigma^2/(1-\phi^2).$

Since $\{\varepsilon_t\}_{t=-\infty}^{\infty}$ is Gaussian, $Y_1$ is also Gaussian. Hence,

$f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; c, \phi, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\sqrt{\sigma^2/(1-\phi^2)}} \exp\left\{ -\dfrac{\{y_1 - [c/(1-\phi)]\}^2}{2\sigma^2/(1-\phi^2)} \right\}.$
Next consider the distribution of the second observation $Y_2$ conditional on observing $Y_1 = y_1$. From (2),

$Y_2 = c + \phi Y_1 + \varepsilon_2.$    (3)

Conditioning on $Y_1 = y_1$ means treating the random variable $Y_1$ as if it were the deterministic constant $y_1$. In that case, (3) gives $Y_2$ as the constant $(c + \phi y_1)$ plus the $N(0, \sigma^2)$ variable $\varepsilon_2$. Hence,

$(Y_2 | Y_1 = y_1) \sim N\big((c + \phi y_1),\ \sigma^2\big),$

meaning that

$f_{Y_2|Y_1}(y_2|y_1; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{(y_2 - c - \phi y_1)^2}{2\sigma^2} \right\}.$

The joint density of observations 1 and 2 is then just

$f_{Y_2,Y_1}(y_2, y_1; \theta) = f_{Y_2|Y_1}(y_2|y_1; \theta)\, f_{Y_1}(y_1; \theta).$
Similarly, the distribution of the third observation conditional on the first two is

$f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{(y_3 - c - \phi y_2)^2}{2\sigma^2} \right\},$

from which

$f_{Y_3,Y_2,Y_1}(y_3, y_2, y_1; \theta) = f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta)\, f_{Y_2,Y_1}(y_2, y_1; \theta) = f_{Y_3|Y_2,Y_1}(y_3|y_2, y_1; \theta)\, f_{Y_2|Y_1}(y_2|y_1; \theta)\, f_{Y_1}(y_1; \theta).$

In general, the values of $Y_1, Y_2, \ldots, Y_{t-1}$ matter for $Y_t$ only through the value of $Y_{t-1}$, and the density of observation $t$ conditional on the preceding $t-1$ observations is given by

$f_{Y_t|Y_{t-1},Y_{t-2},\ldots,Y_1}(y_t|y_{t-1}, y_{t-2}, \ldots, y_1; \theta) = f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2} \right\}.$
The likelihood of the complete sample can thus be calculated as

$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = f_{Y_1}(y_1; \theta) \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta).$    (4)

The log likelihood function (denoted $\mathcal{L}(\theta)$) is therefore

$\mathcal{L}(\theta) = \log f_{Y_1}(y_1; \theta) + \sum_{t=2}^{T} \log f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta).$    (5)
The log likelihood for a sample of size $T$ from a Gaussian AR(1) process is seen to be

$\mathcal{L}(\theta) = -\dfrac{1}{2}\log(2\pi) - \dfrac{1}{2}\log[\sigma^2/(1-\phi^2)] - \dfrac{\{y_1 - [c/(1-\phi)]\}^2}{2\sigma^2/(1-\phi^2)} - \dfrac{T-1}{2}\log(2\pi) - \dfrac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T} \dfrac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2}.$    (6)
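As a quick numerical check of (6), the following sketch evaluates the exact AR(1) log likelihood for the simulated series y from the earlier snippet; the function name and parameter ordering are illustrative choices, not taken from the text.

    import numpy as np

    def ar1_exact_loglik(theta, y):
        """Exact Gaussian AR(1) log likelihood, equation (6); theta = (c, phi, sigma2)."""
        c, phi, sigma2 = theta
        T = len(y)
        mu = c / (1.0 - phi)
        var1 = sigma2 / (1.0 - phi**2)               # variance of Y_1
        ll = -0.5 * np.log(2 * np.pi * var1) - (y[0] - mu)**2 / (2 * var1)
        resid = y[1:] - c - phi * y[:-1]             # y_t - c - phi*y_{t-1}, t = 2,...,T
        ll += -0.5 * (T - 1) * np.log(2 * np.pi * sigma2) - np.sum(resid**2) / (2 * sigma2)
        return ll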
1.2 Evaluating the Likelihood Function Using (Vector) Joint Density
A different description of the likelihood function for a sample of size $T$ from a Gaussian AR(1) process is sometimes useful. Collect the full set of observations in a $(T \times 1)$ vector,

$\mathbf{y} \equiv (Y_1, Y_2, \ldots, Y_T)'.$

The mean of this $(T \times 1)$ vector is

$E(\mathbf{y}) = \big(E(Y_1), E(Y_2), \ldots, E(Y_T)\big)' = (\mu, \mu, \ldots, \mu)' = \boldsymbol{\mu},$

where $\mu = c/(1-\phi)$. The variance-covariance matrix of $\mathbf{y}$ is

$\Omega = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] = \sigma^2 \dfrac{1}{1-\phi^2}\begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{bmatrix} = \sigma^2 \mathbf{V},$

where

$\mathbf{V} = \dfrac{1}{1-\phi^2}\begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \vdots & \vdots & \vdots & & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{bmatrix}.$
The sample likelihood function is therefore the multivariate Gaussian density

$f_{\mathbf{Y}}(\mathbf{y}; \theta) = (2\pi)^{-T/2}|\Omega^{-1}|^{1/2}\exp\left[-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Omega^{-1}(\mathbf{y} - \boldsymbol{\mu})\right],$

with log likelihood

$\mathcal{L}(\theta) = -\dfrac{T}{2}\log(2\pi) + \dfrac{1}{2}\log|\Omega^{-1}| - \dfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Omega^{-1}(\mathbf{y} - \boldsymbol{\mu}).$    (7)

Expressions (6) and (7) must represent the identical likelihood function.
It is easy to verify by direct multiplication that $\mathbf{L}'\mathbf{L} = \mathbf{V}^{-1}$, with

$\mathbf{L} = \begin{bmatrix} \sqrt{1-\phi^2} & 0 & 0 & \cdots & 0 \\ -\phi & 1 & 0 & \cdots & 0 \\ 0 & -\phi & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\phi & 1 \end{bmatrix}.$
Then (7) becomes

$\mathcal{L}(\theta) = -\dfrac{T}{2}\log(2\pi) + \dfrac{1}{2}\log|\sigma^{-2}\mathbf{L}'\mathbf{L}| - \dfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}).$    (8)
Define the $(T \times 1)$ vector $\tilde{\mathbf{y}}$ to be

$\tilde{\mathbf{y}} \equiv \mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \begin{bmatrix} \sqrt{1-\phi^2} & 0 & 0 & \cdots & 0 \\ -\phi & 1 & 0 & \cdots & 0 \\ 0 & -\phi & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\phi & 1 \end{bmatrix}\begin{bmatrix} Y_1 - \mu \\ Y_2 - \mu \\ Y_3 - \mu \\ \vdots \\ Y_T - \mu \end{bmatrix} = \begin{bmatrix} \sqrt{1-\phi^2}\,(Y_1 - \mu) \\ (Y_2 - \mu) - \phi(Y_1 - \mu) \\ (Y_3 - \mu) - \phi(Y_2 - \mu) \\ \vdots \\ (Y_T - \mu) - \phi(Y_{T-1} - \mu) \end{bmatrix} = \begin{bmatrix} \sqrt{1-\phi^2}\,[Y_1 - c/(1-\phi)] \\ Y_2 - c - \phi Y_1 \\ Y_3 - c - \phi Y_2 \\ \vdots \\ Y_T - c - \phi Y_{T-1} \end{bmatrix}.$
The last term in (8) can thus be written

$\dfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\sigma^{-2}\mathbf{L}'\mathbf{L}(\mathbf{y} - \boldsymbol{\mu}) = \dfrac{1}{2\sigma^2}\tilde{\mathbf{y}}'\tilde{\mathbf{y}} = \dfrac{1}{2\sigma^2}(1-\phi^2)[Y_1 - c/(1-\phi)]^2 + \dfrac{1}{2\sigma^2}\sum_{t=2}^{T}(Y_t - c - \phi Y_{t-1})^2.$
The middle term in (8) is similarly

$\dfrac{1}{2}\log|\sigma^{-2}\mathbf{L}'\mathbf{L}| = \dfrac{1}{2}\log\{\sigma^{-2T}|\mathbf{L}'\mathbf{L}|\}$
$= \dfrac{1}{2}\log\sigma^{-2T} + \dfrac{1}{2}\log|\mathbf{L}'\mathbf{L}|$
$= -\dfrac{T}{2}\log\sigma^2 + \dfrac{1}{2}\log\{|\mathbf{L}'|\,|\mathbf{L}|\}$
$= -\dfrac{T}{2}\log\sigma^2 + \log|\mathbf{L}|$
$= -\dfrac{T}{2}\log\sigma^2 + \dfrac{1}{2}\log(1-\phi^2),$

since $\mathbf{L}$ is lower triangular with diagonal elements $(\sqrt{1-\phi^2}, 1, \ldots, 1)$, so that $|\mathbf{L}| = \sqrt{1-\phi^2}$. Thus equations (6) and (7) are just two different expressions for the same magnitude. Either expression accurately describes the log likelihood function.
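This equivalence can be checked numerically by evaluating (7) directly from the $T \times T$ matrix $\Omega$ and comparing with the scalar expression (6). The sketch below assumes the simulated series y and the function ar1_exact_loglik from the earlier snippets.

    import numpy as np
    from scipy.stats import multivariate_normal

    c, phi, sigma2 = 1.0, 0.6, 1.5**2
    T = len(y)
    mu = c / (1.0 - phi)
    # Omega_{ij} = sigma^2 * phi^{|i-j|} / (1 - phi^2)
    idx = np.arange(T)
    Omega = sigma2 * phi**np.abs(idx[:, None] - idx[None, :]) / (1.0 - phi**2)
    ll_vector = multivariate_normal(mean=np.full(T, mu), cov=Omega).logpdf(y)  # equation (7)
    ll_scalar = ar1_exact_loglik((c, phi, sigma2), y)                          # equation (6)
    print(ll_vector, ll_scalar)   # the two values should agree to numerical precision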
1.3 Exact Maximum Likelihood Estimators for the Gaussian AR(1) Process
The MLE $\hat{\theta}$ is the value for which (6) is maximized. In principle, this requires differentiating (6) and setting the result equal to zero. In practice, when an attempt is made to carry this out, the result is a system of nonlinear equations in $\theta$ and $(Y_1, Y_2, \ldots, Y_T)$ for which there is no simple solution for $\theta$ in terms of $(Y_1, Y_2, \ldots, Y_T)$. Maximization of (6) thus requires the iterative or numerical procedures described in p.21 of Chapter 3; a sketch of such a numerical maximization is given below.
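For instance, a generic optimizer can be applied to the exact log likelihood (6). This is only a sketch, assuming the simulated series y and the ar1_exact_loglik function defined earlier; the choice of optimizer, parameterization, and starting values is arbitrary.

    import numpy as np
    from scipy.optimize import minimize

    # Maximize (6) by minimizing its negative; parameterize the variance as
    # log(sigma2) so that it stays positive during the search.
    def neg_loglik(params, y):
        c, phi, log_sigma2 = params
        if abs(phi) >= 1.0:                 # enforce stationarity
            return np.inf
        return -ar1_exact_loglik((c, phi, np.exp(log_sigma2)), y)

    start = np.array([0.0, 0.0, np.log(np.var(y))])
    res = minimize(neg_loglik, start, args=(y,), method="Nelder-Mead")
    c_hat, phi_hat, sigma2_hat = res.x[0], res.x[1], np.exp(res.x[2])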
1.4 Conditional Maximum Likelihood Estimation
An alternative to numerical maximization of the exact likelihood function is to regard the value of $y_1$ as deterministic and maximize the likelihood conditioned on the first observation,
$f_{Y_T,Y_{T-1},Y_{T-2},\ldots,Y_2|Y_1}(y_T, y_{T-1}, y_{T-2}, \ldots, y_2 | y_1; \theta) = \prod_{t=2}^{T} f_{Y_t|Y_{t-1}}(y_t|y_{t-1}; \theta),$
the objective then being to maximize
$\mathcal{L}^*(\theta) = -\dfrac{T-1}{2}\log(2\pi) - \dfrac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T}\dfrac{(y_t - c - \phi y_{t-1})^2}{2\sigma^2}$
$= -\dfrac{T-1}{2}\log(2\pi) - \dfrac{T-1}{2}\log(\sigma^2) - \sum_{t=2}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2}.$    (9)
Maximization of (9) with respect to $c$ and $\phi$ is equivalent to minimization of

$\sum_{t=2}^{T}(y_t - c - \phi y_{t-1})^2 = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}),$    (10)

which is achieved by an ordinary least squares (OLS) regression of $y_t$ on a constant and its own lagged value, where
$\mathbf{y} = \begin{bmatrix} y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & y_1 \\ 1 & y_2 \\ \vdots & \vdots \\ 1 & y_{T-1} \end{bmatrix}, \quad \text{and} \quad \boldsymbol{\beta} = \begin{bmatrix} c \\ \phi \end{bmatrix}.$
The conditional maximum likelihood estimates of $c$ and $\phi$ are therefore given by

$\begin{bmatrix} \hat{c} \\ \hat{\phi} \end{bmatrix} = \begin{bmatrix} T-1 & \sum_{t=2}^{T} y_{t-1} \\ \sum_{t=2}^{T} y_{t-1} & \sum_{t=2}^{T} y_{t-1}^2 \end{bmatrix}^{-1}\begin{bmatrix} \sum_{t=2}^{T} y_t \\ \sum_{t=2}^{T} y_{t-1}\,y_t \end{bmatrix}.$
The conditional maximum likelihood estimator of $\sigma^2$ is found by setting

$\dfrac{\partial \mathcal{L}^*}{\partial \sigma^2} = -\dfrac{T-1}{2\sigma^2} + \sum_{t=2}^{T}\dfrac{(y_t - c - \phi y_{t-1})^2}{2\sigma^4} = 0,$

or

$\hat{\sigma}^2 = \sum_{t=2}^{T}\left[\dfrac{(y_t - \hat{c} - \hat{\phi} y_{t-1})^2}{T-1}\right].$
It is important to note that if you have a sample of size $T$ with which to estimate an AR(1) process by conditional MLE, you will use only $T-1$ observations of this sample. The OLS computation is sketched below.
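Since the conditional MLE of the AR(1) model reduces to an OLS regression of y_t on a constant and y_{t-1}, a minimal sketch (again assuming the simulated array y from the first snippet) is:

    import numpy as np

    # OLS regression of y_t on (1, y_{t-1}), t = 2,...,T  -- conditional MLE of (c, phi)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    b, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    c_hat, phi_hat = b
    resid = y[1:] - X @ b
    sigma2_hat = resid @ resid / (len(y) - 1)    # average squared residual over the T-1 terms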
2 MLE of a Gaussian AR(p) Process
This section discusses a Gaussian AR(p) process,

$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t,$

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. In this case, the vector of population parameters to be estimated is $\theta = (c, \phi_1, \phi_2, \ldots, \phi_p, \sigma^2)'$.
2.1 Evaluating the Likelihood Function
We first collect the first $p$ observations in the sample, $(Y_1, Y_2, \ldots, Y_p)$, in a $(p \times 1)$ vector $\mathbf{y}_p$, which has mean vector $\boldsymbol{\mu}_p$ with each element equal to

$\mu = \dfrac{c}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}$

and variance-covariance matrix given by

$\sigma^2\mathbf{V}_p = \begin{bmatrix} \gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{p-2} \\ \gamma_2 & \gamma_1 & \gamma_0 & \cdots & \gamma_{p-3} \\ \vdots & \vdots & \vdots & & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \gamma_{p-3} & \cdots & \gamma_0 \end{bmatrix},$

where $\gamma_j$ denotes the $j$th autocovariance of $Y_t$.
The density of the first $p$ observations is then

$f_{Y_p,Y_{p-1},\ldots,Y_1}(y_p, y_{p-1}, \ldots, y_1; \theta) = (2\pi)^{-p/2}|\sigma^{-2}\mathbf{V}_p^{-1}|^{1/2}\exp\left[-\dfrac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p)\right]$
$= (2\pi)^{-p/2}(\sigma^2)^{-p/2}|\mathbf{V}_p^{-1}|^{1/2}\exp\left[-\dfrac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p)\right].$
For the remaining observations in the sample, $(Y_{p+1}, Y_{p+2}, \ldots, Y_T)$, conditional on the first $t-1$ observations, the $t$th observation is Gaussian with mean

$c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p}$

and variance $\sigma^2$. Only the $p$ most recent observations matter for this distribution. Hence, for $t > p$,

$f_{Y_t|Y_{t-1},\ldots,Y_1}(y_t|y_{t-1}, \ldots, y_1; \theta) = f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1}, \ldots, y_{t-p}; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \cdots - \phi_p y_{t-p})^2}{2\sigma^2}\right].$
The likelihood function for the complete sample is then

$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = f_{Y_p,Y_{p-1},\ldots,Y_1}(y_p, y_{p-1}, \ldots, y_1; \theta)\prod_{t=p+1}^{T} f_{Y_t|Y_{t-1},\ldots,Y_{t-p}}(y_t|y_{t-1}, \ldots, y_{t-p}; \theta),$

and the log likelihood is therefore

$\mathcal{L}(\theta) = \log f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta)$
$= -\dfrac{p}{2}\log(2\pi) - \dfrac{p}{2}\log(\sigma^2) + \dfrac{1}{2}\log|\mathbf{V}_p^{-1}| - \dfrac{1}{2\sigma^2}(\mathbf{y}_p - \boldsymbol{\mu}_p)'\mathbf{V}_p^{-1}(\mathbf{y}_p - \boldsymbol{\mu}_p)$
$\quad - \dfrac{T-p}{2}\log(2\pi) - \dfrac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T}\dfrac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \cdots - \phi_p y_{t-p})^2}{2\sigma^2}.$

Maximization of this exact log likelihood of an AR(p) process must be accomplished numerically.
2.2 Conditional Maximum Likelihood Estimates
The log of the likelihood conditional on the first $p$ observations assumes the simple form

$\mathcal{L}^*(\theta) = \log f_{Y_T,Y_{T-1},\ldots,Y_{p+1}|Y_p,\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_{p+1}|y_p, \ldots, y_1; \theta)$
$= -\dfrac{T-p}{2}\log(2\pi) - \dfrac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T}\dfrac{(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \cdots - \phi_p y_{t-p})^2}{2\sigma^2}$
$= -\dfrac{T-p}{2}\log(2\pi) - \dfrac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2}.$    (11)
The values of $c, \phi_1, \ldots, \phi_p$ that maximize (11) are the same as those that minimize

$\sum_{t=p+1}^{T}(y_t - c - \phi_1 y_{t-1} - \phi_2 y_{t-2} - \cdots - \phi_p y_{t-p})^2.$

Thus, the conditional MLEs of these parameters can be obtained from an OLS regression of $y_t$ on a constant and $p$ of its own lagged values. The conditional MLE of $\sigma^2$ turns out to be the average squared residual from this regression:

$\hat{\sigma}^2 = \dfrac{1}{T-p}\sum_{t=p+1}^{T}(y_t - \hat{c} - \hat{\phi}_1 y_{t-1} - \hat{\phi}_2 y_{t-2} - \cdots - \hat{\phi}_p y_{t-p})^2.$
It is important to note that if you have a sample of size $T$ with which to estimate an AR(p) process by conditional MLE, you will use only $T-p$ observations of this sample. A sketch of this regression appears below.
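A minimal sketch of the conditional MLE for an AR(p) model, assuming the same simulated array y and an illustrative choice p = 2; the helper name build_lags is hypothetical, not from the text.

    import numpy as np

    def build_lags(y, p):
        """Regressor matrix [1, y_{t-1}, ..., y_{t-p}] for t = p+1,...,T."""
        T = len(y)
        cols = [np.ones(T - p)] + [y[p - j:T - j] for j in range(1, p + 1)]
        return np.column_stack(cols)

    p = 2
    X = build_lags(y, p)
    b, *_ = np.linalg.lstsq(X, y[p:], rcond=None)   # (c_hat, phi_1_hat, ..., phi_p_hat)
    resid = y[p:] - X @ b
    sigma2_hat = resid @ resid / (len(y) - p)        # average squared residual, T-p terms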
3 MLE of a Gaussian MA(1) Process
This section discusses a Gaussian MA(1) process,

$Y_t = \mu + \varepsilon_t + \theta\varepsilon_{t-1},$    (12)

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. In this case, the vector of population parameters to be estimated is $\theta = (\mu, \theta, \sigma^2)'$.
3.1 Evaluating the Likelihood Function
Collect the observations in the sample, $(Y_1, Y_2, \ldots, Y_T)$, in a $(T \times 1)$ vector $\mathbf{y}$, which has mean vector $\boldsymbol{\mu}$ with each element $\mu$ and variance-covariance matrix given by

$\Omega = E[(\mathbf{y} - \boldsymbol{\mu})(\mathbf{y} - \boldsymbol{\mu})'] = \sigma^2\begin{bmatrix} (1+\theta^2) & \theta & 0 & \cdots & 0 \\ \theta & (1+\theta^2) & \theta & \cdots & 0 \\ 0 & \theta & (1+\theta^2) & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & (1+\theta^2) \end{bmatrix}.$
The likelihood function is then

$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = (2\pi)^{-T/2}|\Omega|^{-1/2}\exp\left[-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Omega^{-1}(\mathbf{y} - \boldsymbol{\mu})\right].$

Using the triangular factorization of the variance-covariance matrix, the likelihood function can be written

$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = (2\pi)^{-T/2}\left[\prod_{t=1}^{T} d_{tt}\right]^{-1/2}\exp\left[-\dfrac{1}{2}\sum_{t=1}^{T}\dfrac{\tilde{y}_t^2}{d_{tt}}\right]$
and the log likelihood is therefore

$\mathcal{L}(\theta) = \log f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = -\dfrac{T}{2}\log(2\pi) - \dfrac{1}{2}\sum_{t=1}^{T}\log d_{tt} - \dfrac{1}{2}\sum_{t=1}^{T}\dfrac{\tilde{y}_t^2}{d_{tt}},$

where

$d_{tt} = \sigma^2\,\dfrac{1 + \theta^2 + \theta^4 + \cdots + \theta^{2t}}{1 + \theta^2 + \theta^4 + \cdots + \theta^{2(t-1)}}$

and

$\tilde{y}_t = (y_t - \mu) - \dfrac{\theta\,[1 + \theta^2 + \theta^4 + \cdots + \theta^{2(t-2)}]}{1 + \theta^2 + \theta^4 + \cdots + \theta^{2(t-1)}}\,\tilde{y}_{t-1},$

with $\tilde{y}_1 = y_1 - \mu$. Maximization of this exact log likelihood of an MA(1) process must be accomplished numerically; a sketch of evaluating it is given below.
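A minimal sketch of evaluating this exact MA(1) log likelihood with the $d_{tt}$ and $\tilde{y}_t$ recursions above; the function name is an illustrative choice, and y is any observed series (for example the simulated array from the first snippet).

    import numpy as np

    def ma1_exact_loglik(mu, theta, sigma2, y):
        """Exact Gaussian MA(1) log likelihood via the triangular factorization."""
        T = len(y)
        # S[k] = 1 + theta^2 + theta^4 + ... + theta^(2k)
        S = np.cumsum(theta ** (2 * np.arange(T + 1)))
        ll = -0.5 * T * np.log(2 * np.pi)
        ytilde_prev = 0.0
        for t in range(1, T + 1):
            d_tt = sigma2 * S[t] / S[t - 1]
            if t == 1:
                ytilde = y[0] - mu
            else:
                ytilde = (y[t - 1] - mu) - theta * (S[t - 2] / S[t - 1]) * ytilde_prev
            ll += -0.5 * np.log(d_tt) - 0.5 * ytilde**2 / d_tt
            ytilde_prev = ytilde
        return ll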
3.2 Evaluating the Likelihood Function Using (Scalar) Conditional Density
Consider the p.d.f. of

$Y_1 = \mu + \varepsilon_1 + \theta\varepsilon_0,$

the first observation in the sample. This is a random variable with mean and variance

$E(Y_1) = \mu \quad \text{and} \quad Var(Y_1) = \sigma^2(1 + \theta^2).$

Since $\{\varepsilon_t\}_{t=-\infty}^{\infty}$ is Gaussian, $Y_1$ is also Gaussian. Hence,

$Y_1 \sim N(\mu, (1+\theta^2)\sigma^2)$

or

$f_{Y_1}(y_1; \theta) = f_{Y_1}(y_1; \mu, \theta, \sigma^2) = \dfrac{1}{\sqrt{2\pi}\sqrt{\sigma^2(1+\theta^2)}}\exp\left[-\dfrac{(y_1 - \mu)^2}{2\sigma^2(1+\theta^2)}\right].$

Next consider the distribution of the second observation $Y_2$ conditional on observing $Y_1 = y_1$. From (12),

$Y_2 = \mu + \varepsilon_2 + \theta\varepsilon_1.$    (13)
Following the method used for the AR process, one might hope to condition on $Y_1 = y_1$, treat the random variable $Y_1$ as the deterministic constant $y_1$, and read (13) as giving $Y_2$ as the constant $(\mu + \theta\varepsilon_1)$ plus the $N(0, \sigma^2)$ variable $\varepsilon_2$. However, this does not work here, since observing $Y_1 = y_1$ gives no information on the realization of $\varepsilon_1$: you cannot distinguish $\varepsilon_1$ from $\varepsilon_0$ even after the first observation $y_1$.
3.2.1 Conditional Maximum Likelihood Estimation
To make the conditional density $f_{Y_2|Y_1}(y_2|y_1; \theta)$ feasible, we must impose an additional assumption, such as that we know with certainty that $\varepsilon_0 = 0$.
Suppose that we know for certain that $\varepsilon_0 = 0$. Then

$(Y_1|\varepsilon_0 = 0) \sim N(\mu, \sigma^2)$

or

$f_{Y_1|\varepsilon_0=0}(y_1|\varepsilon_0 = 0; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{(y_1 - \mu)^2}{2\sigma^2}\right] = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{\varepsilon_1^2}{2\sigma^2}\right].$
Moreover, given observation of $y_1$, the value of $\varepsilon_1$ is then known with certainty as well:

$\varepsilon_1 = y_1 - \mu.$

Hence,

$(Y_2|Y_1 = y_1, \varepsilon_0 = 0) \sim N\big((\mu + \theta\varepsilon_1),\ \sigma^2\big),$

meaning that

$f_{Y_2|Y_1,\varepsilon_0=0}(y_2|y_1, \varepsilon_0 = 0; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{(y_2 - \mu - \theta\varepsilon_1)^2}{2\sigma^2}\right] = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{\varepsilon_2^2}{2\sigma^2}\right].$

Since $\varepsilon_1$ is known with certainty, $\varepsilon_2$ can be calculated from

$\varepsilon_2 = y_2 - \mu - \theta\varepsilon_1.$
Proceeding in this fashion, it is clear that given knowledge that $\varepsilon_0 = 0$, the full sequence $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$ can be calculated from $\{y_1, y_2, \ldots, y_T\}$ by iterating on

$\varepsilon_t = y_t - \mu - \theta\varepsilon_{t-1}$

for $t = 1, 2, \ldots, T$, starting from $\varepsilon_0 = 0$. The density of the $t$th observation conditional on the preceding observations and on $\varepsilon_0 = 0$ can then be calculated as

$f_{Y_t|Y_{t-1},Y_{t-2},\ldots,Y_1,\varepsilon_0=0}(y_t|y_{t-1}, y_{t-2}, \ldots, y_1, \varepsilon_0 = 0; \theta) = f_{Y_t|\varepsilon_{t-1}}(y_t|\varepsilon_{t-1}; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{\varepsilon_t^2}{2\sigma^2}\right].$
The likelihood (conditional on $\varepsilon_0 = 0$) of the complete sample can thus be calculated as the product of these individual densities:

$f_{Y_T,Y_{T-1},Y_{T-2},\ldots,Y_1|\varepsilon_0=0}(y_T, y_{T-1}, y_{T-2}, \ldots, y_1|\varepsilon_0 = 0; \theta) = f_{Y_1|\varepsilon_0=0}(y_1|\varepsilon_0 = 0; \theta)\prod_{t=2}^{T} f_{Y_t|Y_{t-1},Y_{t-2},\ldots,Y_1,\varepsilon_0=0}(y_t|y_{t-1}, y_{t-2}, \ldots, y_1, \varepsilon_0 = 0; \theta).$

The conditional log likelihood function (denoted $\mathcal{L}^*(\theta)$) is therefore

$\mathcal{L}^*(\theta) = \log f_{Y_T,Y_{T-1},Y_{T-2},\ldots,Y_1|\varepsilon_0=0}(y_T, y_{T-1}, y_{T-2}, \ldots, y_1|\varepsilon_0 = 0; \theta) = -\dfrac{T}{2}\log(2\pi) - \dfrac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2}.$    (14)
The residuals implied in the log likelihood function can be calculated by inverting

$(Y_t - \mu) = (1 + \theta L)\varepsilon_t$

to obtain (this is the reason why invertibility is needed)

$\varepsilon_t = (1 + \theta L)^{-1}(Y_t - \mu) = (Y_t - \mu) - \theta(Y_{t-1} - \mu) + \theta^2(Y_{t-2} - \mu) - \cdots + (-1)^{t-1}\theta^{t-1}(Y_1 - \mu) + (-1)^t\theta^t\varepsilon_0.$
Although it is simple to program this iteration on a computer, the log likelihood function is a fairly complicated nonlinear function of $\mu$ and $\theta$, so that an analytical expression for the MLE of $\mu$ and $\theta$ is not readily calculated. Hence even the conditional MLE for an MA(1) process must be found by numerical optimization. It is important to note that if you have a sample of size $T$ with which to estimate an MA(1) process by conditional MLE, you will use all $T$ observations of this sample. A sketch of this numerical optimization is given below.
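A minimal sketch of conditional MA(1) estimation: the $\varepsilon_t$ recursion behind (14), wrapped in a numerical optimizer (scipy's minimize); the parameterization and starting values are illustrative choices, not from the text.

    import numpy as np
    from scipy.optimize import minimize

    def ma1_conditional_loglik(mu, theta, sigma2, y):
        """Conditional log likelihood (14), computed with eps_0 = 0."""
        eps = np.empty(len(y))
        eps_prev = 0.0
        for t, yt in enumerate(y):
            eps[t] = yt - mu - theta * eps_prev      # eps_t = y_t - mu - theta*eps_{t-1}
            eps_prev = eps[t]
        T = len(y)
        return -0.5 * T * np.log(2 * np.pi * sigma2) - np.sum(eps**2) / (2 * sigma2)

    def neg_ll(params, y):
        mu, theta, log_sigma2 = params
        if abs(theta) >= 1.0:                        # impose invertibility
            return np.inf
        return -ma1_conditional_loglik(mu, theta, np.exp(log_sigma2), y)

    res = minimize(neg_ll, np.array([np.mean(y), 0.0, np.log(np.var(y))]),
                   args=(y,), method="Nelder-Mead")
    mu_hat, theta_hat, sigma2_hat = res.x[0], res.x[1], np.exp(res.x[2])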
4 MLE of a Gaussian MA(q) Process
This section discusses a Gaussian MA(q) process,

$Y_t = \mu + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q},$    (15)

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. In this case, the vector of population parameters to be estimated is $\theta = (\mu, \theta_1, \theta_2, \ldots, \theta_q, \sigma^2)'$.
4.1 Evaluating the Likelihood Function
Collect the observations in the sample, $(Y_1, Y_2, \ldots, Y_T)$, in a $(T \times 1)$ vector $\mathbf{y}$, which has mean vector $\boldsymbol{\mu}$ with each element $\mu$ and variance-covariance matrix given by $\Omega$. The likelihood function is then

$f_{Y_T,Y_{T-1},\ldots,Y_1}(y_T, y_{T-1}, \ldots, y_1; \theta) = (2\pi)^{-T/2}|\Omega|^{-1/2}\exp\left[-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\Omega^{-1}(\mathbf{y} - \boldsymbol{\mu})\right].$

Maximization of this exact log likelihood of an MA(q) process must be accomplished numerically.
4.2 Evaluating the Likelihood Function Using (Scalar) Conditional Density
Consider the p.d.f. of $Y_1$,

$Y_1 = \mu + \varepsilon_1 + \theta_1\varepsilon_0 + \theta_2\varepsilon_{-1} + \cdots + \theta_q\varepsilon_{-q+1}.$

A simple approach is to condition on the assumption that the first $q$ values of $\varepsilon$ were all zero:

$\varepsilon_0 = \varepsilon_{-1} = \cdots = \varepsilon_{-q+1} = 0.$

Let $\boldsymbol{\varepsilon}_0$ denote the $(q \times 1)$ vector $(\varepsilon_0, \varepsilon_{-1}, \ldots, \varepsilon_{-q+1})'$. Then

$(Y_1|\boldsymbol{\varepsilon}_0 = \mathbf{0}) \sim N(\mu, \sigma^2)$
or

$f_{Y_1|\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_1|\boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{(y_1 - \mu)^2}{2\sigma^2}\right] = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{\varepsilon_1^2}{2\sigma^2}\right].$
Next consider the distribution of the second observation $Y_2$ conditional on observing $Y_1 = y_1$. From (15),

$Y_2 = \mu + \varepsilon_2 + \theta_1\varepsilon_1 + \theta_2\varepsilon_0 + \cdots + \theta_q\varepsilon_{-q+2}.$    (16)

Moreover, given observation of $y_1$, the value of $\varepsilon_1$ is then known with certainty as well:

$\varepsilon_1 = y_1 - \mu.$

Hence,

$(Y_2|Y_1 = y_1, \boldsymbol{\varepsilon}_0 = \mathbf{0}) \sim N\big((\mu + \theta_1\varepsilon_1),\ \sigma^2\big),$

meaning that

$f_{Y_2|Y_1,\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_2|y_1, \boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{(y_2 - \mu - \theta_1\varepsilon_1)^2}{2\sigma^2}\right] = \dfrac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\dfrac{\varepsilon_2^2}{2\sigma^2}\right].$

Since $\varepsilon_1$ is known with certainty, $\varepsilon_2$ can be calculated from

$\varepsilon_2 = y_2 - \mu - \theta_1\varepsilon_1.$
Proceeding in this fashion, it is clear that given knowledge that $\boldsymbol{\varepsilon}_0 = \mathbf{0}$, the full sequence $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$ can be calculated from $\{y_1, y_2, \ldots, y_T\}$ by iterating on

$\varepsilon_t = y_t - \mu - \theta_1\varepsilon_{t-1} - \theta_2\varepsilon_{t-2} - \cdots - \theta_q\varepsilon_{t-q}$

for $t = 1, 2, \ldots, T$, starting from $\boldsymbol{\varepsilon}_0 = \mathbf{0}$. The likelihood (conditional on $\boldsymbol{\varepsilon}_0 = \mathbf{0}$) of the complete sample can thus be calculated as the product of these individual densities:

$f_{Y_T,Y_{T-1},Y_{T-2},\ldots,Y_1|\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_T, y_{T-1}, y_{T-2}, \ldots, y_1|\boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta) = f_{Y_1|\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_1|\boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta)\prod_{t=2}^{T} f_{Y_t|Y_{t-1},Y_{t-2},\ldots,Y_1,\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_t|y_{t-1}, y_{t-2}, \ldots, y_1, \boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta).$
The conditional log likelihood function (denoted $\mathcal{L}^*(\theta)$) is therefore

$\mathcal{L}^*(\theta) = \log f_{Y_T,Y_{T-1},Y_{T-2},\ldots,Y_1|\boldsymbol{\varepsilon}_0=\mathbf{0}}(y_T, y_{T-1}, y_{T-2}, \ldots, y_1|\boldsymbol{\varepsilon}_0 = \mathbf{0}; \theta) = -\dfrac{T}{2}\log(2\pi) - \dfrac{T}{2}\log(\sigma^2) - \sum_{t=1}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2}.$    (17)
It is important to note that if you have a sample of size $T$ with which to estimate an MA(q) process by conditional MLE, you will also use all $T$ observations of this sample. The recursion generalizes the MA(1) sketch above, as shown below.
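A minimal sketch of the $\varepsilon_t$ recursion and the conditional log likelihood (17) for a general MA(q); theta holds $(\theta_1, \ldots, \theta_q)$ and the presample $\varepsilon$'s are set to zero, as in the text.

    import numpy as np

    def maq_conditional_loglik(mu, theta, sigma2, y):
        """Conditional log likelihood (17) for an MA(q); theta = (theta_1,...,theta_q)."""
        q, T = len(theta), len(y)
        eps = np.zeros(T + q)                    # first q entries hold the presample zeros
        for t, yt in enumerate(y):
            lagged = eps[t + q - 1::-1][:q]      # (eps_{t-1}, ..., eps_{t-q}), newest first
            eps[t + q] = yt - mu - np.dot(theta, lagged)
        e = eps[q:]                              # eps_1, ..., eps_T
        return -0.5 * T * np.log(2 * np.pi * sigma2) - np.sum(e**2) / (2 * sigma2)

As in the MA(1) case, this function would be handed to a numerical optimizer to obtain the conditional MLE.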
5 MLE of a Gaussian ARMA(p, q) Process
This section discusses a Gaussian ARMA(p, q) process,

$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \theta_2\varepsilon_{t-2} + \cdots + \theta_q\varepsilon_{t-q},$

with $\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2)$. In this case, the vector of population parameters to be estimated is $\theta = (c, \phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q, \sigma^2)'$.
5.1 Conditional Maximum Likelihood Estimates
The approximation to the likelihood function for an autoregression conditions on initial values of the $y$'s. The approximation to the likelihood function for a moving average process conditions on initial values of the $\varepsilon$'s. A common approximation to the likelihood function for an ARMA(p, q) process conditions on both $y$'s and $\varepsilon$'s.
The $(p+1)$th observation is

$Y_{p+1} = c + \phi_1 Y_p + \phi_2 Y_{p-1} + \cdots + \phi_p Y_1 + \varepsilon_{p+1} + \theta_1\varepsilon_p + \cdots + \theta_q\varepsilon_{p-q+1}.$

Conditional on $Y_1 = y_1, Y_2 = y_2, \ldots, Y_p = y_p$, and setting $\varepsilon_p = \varepsilon_{p-1} = \cdots = \varepsilon_{p-q+1} = 0$, we have

$Y_{p+1} \sim N\big((c + \phi_1 y_p + \phi_2 y_{p-1} + \cdots + \phi_p y_1),\ \sigma^2\big).$
Then the conditional log likelihood calculated from $t = p+1, \ldots, T$ is

$\mathcal{L}^*(\theta) = \log f(y_T, y_{T-1}, \ldots, y_{p+1}|y_p, \ldots, y_1, \varepsilon_p = \varepsilon_{p-1} = \cdots = \varepsilon_{p-q+1} = 0; \theta) = -\dfrac{T-p}{2}\log(2\pi) - \dfrac{T-p}{2}\log(\sigma^2) - \sum_{t=p+1}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2},$    (18)

where the sequence $\{\varepsilon_{p+1}, \varepsilon_{p+2}, \ldots, \varepsilon_T\}$ can be calculated from $\{y_1, y_2, \ldots, y_T\}$ by iterating on

$\varepsilon_t = Y_t - c - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} - \cdots - \phi_p Y_{t-p} - \theta_1\varepsilon_{t-1} - \theta_2\varepsilon_{t-2} - \cdots - \theta_q\varepsilon_{t-q}, \quad t = p+1, p+2, \ldots, T.$

It is important to note that if you have a sample of size $T$ with which to estimate an ARMA(p, q) process by conditional MLE, you will use only $T-p$ observations of this sample. A sketch of this recursion is given below.
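A minimal sketch of this conditional (CSS) log likelihood for a general ARMA(p, q), following the recursion behind (18); phi holds $(\phi_1, \ldots, \phi_p)$, theta holds $(\theta_1, \ldots, \theta_q)$, and the presample $\varepsilon$'s are set to zero. Wrapping this function in a numerical optimizer, as in the AR(1) and MA(1) sketches above, yields the CSS estimates.

    import numpy as np

    def arma_css_loglik(c, phi, theta, sigma2, y):
        """Conditional log likelihood (18) for ARMA(p,q), conditioning on the
        first p observations and setting the required presample eps's to zero."""
        p, q, T = len(phi), len(theta), len(y)
        eps = np.zeros(T + q)                            # first q entries: presample eps = 0
        for t in range(p, T):
            ar_part = np.dot(phi, y[t - 1::-1][:p]) if p > 0 else 0.0   # phi_1*y_{t-1}+...+phi_p*y_{t-p}
            lagged_eps = eps[t + q - 1::-1][:q]                         # eps_{t-1}, ..., eps_{t-q}
            ma_part = np.dot(theta, lagged_eps) if q > 0 else 0.0
            eps[t + q] = y[t] - c - ar_part - ma_part
        e = eps[p + q:]                                  # eps_{p+1}, ..., eps_T
        n = T - p
        return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum(e**2) / (2 * sigma2)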
From (9), (11), (14), (17), and (18) we see that all of the conditional log likelihood functions take the concise form

$-\dfrac{T^*}{2}\log(2\pi) - \dfrac{T^*}{2}\log(\sigma^2) - \sum_{t=t^*}^{T}\dfrac{\varepsilon_t^2}{2\sigma^2},$

where $T^*$ and $t^*$ are, respectively, the appropriate total number of observations used and the first observation used. The value $\hat{\theta}$ that maximizes the conditional log likelihood function is also called the conditional sum of squares (CSS) estimator, denoted $\hat{\theta}_{CSS}$.
6 Numerical Optimization
Refer to p.21 of Chapter 3.
7 Statistical Properties of MLE
1. Refer to p.23 of Chapter 3 for the consistency, asymptotic normality, and asymptotic efficiency of $\hat{\theta}_{MLE}$.
2. Refer to p.25 of Chapter 3 for three methods of estimating the asymptotic variance of $\hat{\theta}_{MLE}$.
3. Refer to p.9 of Chapter 5 for three asymptotically equivalent tests relating to $\hat{\theta}_{MLE}$.
4. For a large number of observations the CSS estimators will be equivalent to the MLE. See Pierce (1971), "Least squares estimation of a mixed autoregressive-moving average process," Biometrika 58: 299-312.
Exercise:
Using the data I give you, identify which model it appears to follow and estimate the model you identify by CSS.