Ch. 10 Autocorrelated Disturbances
In a time-series setting, a common problem is autocorrelation, or serial correlation of the disturbance across periods. See the plot of the residuals in Figure 12.1 on p. 251.
1 Stochastic Process
A particularly important aspect of real observable phenomena, which the random-variables concept cannot accommodate, is their time dimension; the concept of a random variable is essentially static. A number of economic phenomena for which we need to formulate probability models come in the form of dynamic processes for which we have a discrete sequence of observations in time. The problem we have to face is to extend the simple probability model,
$$\Phi = \{f(x;\theta),\ \theta \in \Theta\},$$
to one which enables us to model dynamic phenomena. We have already moved in this direction by proposing the random-vector probability model
$$\Phi = \{f(x_1, x_2, \ldots, x_T;\theta),\ \theta \in \Theta\}.$$
The way we have viewed this model so far is as representing different characteristics of the phenomenon in question in the form of the jointly distributed r.v.'s $X_1, X_2, \ldots, X_T$. If we reinterpret this model as representing the same characteristic but at successive points in time, then it can be viewed as a dynamic probability model. With this as a starting point, let us consider the dynamic probability model in the context of $(S, \mathcal{F}, P)$.
1.1 The Concept of a Stochastic Process
The natural way to make the concept of a random variable dynamic is to extend
its domain by attaching a date to the elements of the sample space S.
Definition 1:
Let $(S, \mathcal{F}, P)$ be a probability space and $\mathcal{T}$ an index set of real numbers, and define the function $X(\cdot,\cdot): S \times \mathcal{T} \to \mathbb{R}$. The ordered sequence of random variables $\{X(\cdot,t),\ t \in \mathcal{T}\}$ is called a stochastic process.
This definition suggests that for a stochastic process $\{X(\cdot,t),\ t \in \mathcal{T}\}$, for each $t \in \mathcal{T}$, $X(\cdot,t)$ represents a random variable on $S$. On the other hand, for each $s$ in $S$, $X(s,\cdot)$ represents a function of $t$, which we call a realization of the process. $X(s,t)$ for given $s$ and $t$ is just a real number.
Three main elements of a stochastic process $\{X(\cdot,t),\ t \in \mathcal{T}\}$ are:
1. its range space (sometimes called the state space), usually $\mathbb{R}$;
2. the index set $\mathcal{T}$, usually one of $\mathbb{R}$ or $\mathbb{R}_+ = [0,\infty)$; and
3. the dependence structure of the r.v.'s $\{X(\cdot,t),\ t \in \mathcal{T}\}$.
In what follows a stochastic process will be denoted by $\{X_t,\ t \in \mathcal{T}\}$ ($s$ is dropped, and $X(t)$ is customarily used for a continuous stochastic process), and we are concerned exclusively with discrete stochastic processes.
The dependence structure of $\{X_t,\ t \in \mathcal{T}\}$, in direct analogy with the case of a random vector, should be determined by the joint distribution of the process. The question arises, however: since $\mathcal{T}$ is commonly an infinite set, do we need an infinite-dimensional distribution to define the structure of the process?
This question was tackled by Kolmogorov (1933), who showed that when the stochastic process satisfies certain regularity conditions the answer is definitely 'no'. In particular, define the 'tentative' joint distribution of the process for the subset $(t_1 < t_2 < \cdots < t_T)$ of $\mathcal{T}$ by
$$F(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) = \Pr(X_{t_1} \le x_1,\ X_{t_2} \le x_2,\ \ldots,\ X_{t_T} \le x_T).$$
If the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ satisfies the conditions:
1. Symmetry: $F(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) = F(X_{t_{j_1}}, X_{t_{j_2}}, \ldots, X_{t_{j_T}})$, where $j_1, j_2, \ldots, j_T$ is any permutation of the indices $1, 2, \ldots, T$ (i.e. reshuffling the ordering of the index does not change the distribution);
2. Compatibility: $\lim_{x_T \to \infty} F(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) = F(X_{t_1}, X_{t_2}, \ldots, X_{t_{T-1}})$ (i.e. the dimensionality of the joint distribution can be reduced by marginalisation);
then there exist a probability space $(S, \mathcal{F}, P)$ and a stochastic process $\{X_t,\ t \in \mathcal{T}\}$ defined on it whose finite-dimensional distributions coincide with $F(X_{t_1}, X_{t_2}, \ldots, X_{t_T})$ as defined above. That is, the probability structure of the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is completely specified by the joint distribution $F(X_{t_1}, X_{t_2}, \ldots, X_{t_T})$ for all values of $T$ (a positive integer) and any subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$.
Given that, for a specific $t$, $X_t$ is a random variable, we can denote its distribution and density functions by $F(X_t)$ and $f(X_t)$ respectively. Moreover, the mean, variance and higher moments of $X_t$ (as a r.v.) can be defined in the standard way as
$$E(X_t) = \int_{x_t} x_t f(x_t)\,dx_t = \mu_t,$$
$$E(X_t - \mu_t)^2 = \int_{x_t} (x_t - \mu_t)^2 f(x_t)\,dx_t = v^2(t),$$
$$E(X_t^r) = \mu_{rt}, \quad r \ge 1,\ t \in \mathcal{T}.$$
The linear dependence measure between $X_{t_i}$ and $X_{t_j}$,
$$v(t_i, t_j) = E[(X_{t_i} - \mu_{t_i})(X_{t_j} - \mu_{t_j})], \quad t_i, t_j \in \mathcal{T},$$
is called the autocovariance function. In standardized form,
$$r(t_i, t_j) = \frac{v(t_i, t_j)}{v(t_i)\,v(t_j)}, \quad t_i, t_j \in \mathcal{T},$$
is called the autocorrelation function. These numerical characteristics of the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ play an important role in the analysis of the process and its application to modeling real observable phenomena. We say that $\{X_t,\ t \in \mathcal{T}\}$ is an uncorrelated process if $r(t_i, t_j) = 0$ for any $t_i, t_j \in \mathcal{T}$, $t_i \neq t_j$.
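When only a single realization $x_1, \ldots, x_T$ is available, these moments are in practice replaced by their sample analogues (which presumes the time-homogeneity discussed below). A minimal sketch in Python/NumPy, where the series x is an assumed placeholder for the observed realization:

import numpy as np

def sample_autocovariance(x, lag):
    # v_hat(lag) = (1/T) * sum_t (x_t - xbar) * (x_{t+lag} - xbar)
    x = np.asarray(x, dtype=float)
    T = x.size
    xbar = x.mean()
    return np.sum((x[:T - lag] - xbar) * (x[lag:] - xbar)) / T

def sample_autocorrelation(x, lag):
    # r_hat(lag) = v_hat(lag) / v_hat(0)
    return sample_autocovariance(x, lag) / sample_autocovariance(x, 0)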
Example:
One of the most important examples of a stochastic process is the normal process. The stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be normal (or Gaussian) if, for any finite subset of $\mathcal{T}$, say $t_1, t_2, \ldots, t_T$, the vector $\mathbf{X}_T \equiv (X_{t_1}, X_{t_2}, \ldots, X_{t_T})'$ has a multivariate normal distribution, i.e.
$$f(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) = (2\pi)^{-T/2}\,|V_T|^{-1/2}\exp\!\left[-\tfrac{1}{2}(\mathbf{X}_T - \mu_T)'V_T^{-1}(\mathbf{X}_T - \mu_T)\right],$$
where
$$\mu_T = E(\mathbf{X}_T) = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix}, \qquad
V_T = \begin{bmatrix}
v^2(t_1) & v(t_1,t_2) & \cdots & v(t_1,t_T) \\
v(t_2,t_1) & v^2(t_2) & \cdots & v(t_2,t_T) \\
\vdots & \vdots & \ddots & \vdots \\
v(t_T,t_1) & \cdots & \cdots & v^2(t_T)
\end{bmatrix}.$$
As in the case of a normal random variable, the distribution of a normal stochastic process is characterized by the first two moments, but now they are functions of $t$.
One problem is that the definition of a stochastic process given above is much too general to enable us to obtain an operational probability model. In the analysis of a stochastic process we only have a single realization of the process, and we would have to deduce the values of $\mu_t$ and $v(t)$ with the help of that single realization, which is impossible.
The main purpose of the next three sections is to consider various special forms of stochastic processes for which we can construct probability models that are manageable in the context of statistical inference. Such manageability is achieved by imposing certain restrictions which reduce the number of unknown parameters involved, so that their values can be deduced from a single realization. These restrictions come in two forms:
1. restrictions on the time-heterogeneity of the process; and
2. restrictions on the memory of the process.
1.2 Restricting the time-heterogeneity of a stochastic process
For an arbitrary stochastic process $\{X_t,\ t \in \mathcal{T}\}$ the distribution function $F(X_t;\theta_t)$ depends on $t$, with the parameters $\theta_t$ characterizing it being functions of $t$ as well. That is, a stochastic process is in general time-heterogeneous. This, however, raises very difficult issues in modeling real phenomena, because usually we only have one observation for each $t$; in practice we would have to estimate $\theta_t$ on the basis of a single observation, which is impossible. For this reason we are going to consider an important class of processes which exhibit considerable time-homogeneity and can be used to model phenomena approaching their equilibrium steady state but continuously undergoing 'random' fluctuations. This is the class of stationary stochastic processes.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be (strictly) stationary if for any subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$ and any $\tau$,
$$F(X_{t_1}, \ldots, X_{t_T}) = F(X_{t_1+\tau}, \ldots, X_{t_T+\tau}).$$
That is, the distribution of the process remains unchanged when shifted in time by an arbitrary value $\tau$. In terms of the marginal distributions, (strict) stationarity implies that
$$F(X_t) = F(X_{t+\tau}), \quad t \in \mathcal{T},$$
and hence $F(X_{t_1}) = F(X_{t_2}) = \cdots = F(X_{t_T})$. That is, stationarity implies that $X_{t_1}, X_{t_2}, \ldots, X_{t_T}$ are (individually) identically distributed.
The concept of stationarity, although very useful in the context of probability theory, is very difficult to verify in practice because it is defined in terms of distribution functions. For this reason the concept of second-order stationarity, defined in terms of the first two moments, is commonly preferred.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be (weakly) stationary if
$$E(X_t) = \mu \quad \text{for all } t,$$
$$v(t_i, t_j) = E[(X_{t_i} - \mu)(X_{t_j} - \mu)] = \gamma_{|t_j - t_i|}, \quad t_i, t_j \in \mathcal{T}.$$
This says that weak stationarity of $\{X_t,\ t \in \mathcal{T}\}$ implies that its mean $\mu$ and variance $v^2(t_i) = \gamma_0$ are constant and free of $t$, and that its autocovariance depends only on the interval $|t_j - t_i|$, not on $t_i$ and $t_j$ themselves.
Example:
Consider the normal stochastic process in the above example. Under the weak-stationarity assumption, now
$$\mu_T = E(\mathbf{X}_T) = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}, \qquad
V_T = \begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \cdots & \cdots & \gamma_0
\end{bmatrix},$$
a sizeable reduction in the number of unknown parameters, from $T + [T(T+1)/2]$ to $(T+1)$. It is important, however, to note that even in the case of stationarity the number of parameters increases with the size of the subset $(t_1, \ldots, t_T)$, although the parameters do not depend on $t \in \mathcal{T}$. This is because time-homogeneity does not restrict the 'memory' of the process. In the next section we are going to consider 'memory' restrictions in an obvious attempt to 'solve' the problem of the parameters increasing with the size of the subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$.
1.3 Restricting the memory of a stochastic process
In the case of a typical economic time series, viewed as a particular realization of a stochastic process $\{X_t,\ t \in \mathcal{T}\}$, one would expect that the dependence between $X_{t_i}$ and $X_{t_j}$ would tend to weaken as the distance $(t_j - t_i)$ increases. Formally, this dependence can be described in terms of the joint distribution $F(X_{t_1}, X_{t_2}, \ldots, X_{t_T})$ as follows:
Definition:
asymptotically independent
Definition:
asymptotically uncorrelated
Definition:
strongly mixing
Definition:
uniformly mixing
Definition:
ergodic.
1.4 Some special stochastic processes
We will consider briefly several special stochastic processes which play an important role in econometric modeling. These stochastic processes can be divided into parametric and non-parametric processes. The non-parametric processes are defined in terms of their joint distribution function or the first few joint moments. On the other hand, parametric processes are defined in terms of a generating mechanism, which is commonly a functional form based on a non-parametric process.
1.4.1 Non-parametric processes
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a white-noise process if
(i): $E(X_t) = 0$;
(ii): $E(X_t X_\tau) = \begin{cases} \sigma^2 & \text{if } t = \tau, \\ 0 & \text{if } t \neq \tau. \end{cases}$
Hence a white-noise process is both time-homogeneous, in view of the fact that it is a weakly stationary process, and has no memory. In the case where $\{X_t,\ t \in \mathcal{T}\}$ is also assumed to be normal, the process is also strictly stationary.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a martingale process if...
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be an innovation process if...
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a Markov process if...
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a Brownian motion process if...
1.4.2 Parametric stochastic processes
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be autoregressive of order one (AR(1)) if it satisfies the stochastic difference equation
$$X_t = \alpha X_{t-1} + u_t,$$
where $\alpha$ is a constant and $u_t$ is a white-noise process.
We first consider the index set $\mathcal{T} = \{0, \pm 1, \pm 2, \ldots\}$ and assume that $X_{-T} \to 0$ as $T \to \infty$. Define a lag operator $L$ by
$$L X_t \equiv X_{t-1};$$
then the AR(1) process can be rewritten as
$$(1 - \alpha L)X_t = u_t, \quad \text{or, when } |\alpha| < 1,$$
$$X_t = (1 - \alpha L)^{-1}u_t = (1 + \alpha L + \alpha^2 L^2 + \cdots)u_t = u_t + \alpha u_{t-1} + \alpha^2 u_{t-2} + \cdots = \sum_{i=0}^{\infty} \alpha^i u_{t-i},$$
from which we can deduce that
$$E(X_t) = 0,$$
$$E(X_t X_{t+\tau}) = E\left[\left(\sum_{i=0}^{\infty}\alpha^i u_{t-i}\right)\left(\sum_{j=0}^{\infty}\alpha^j u_{t+\tau-j}\right)\right] = \sigma_u^2\,\alpha^{\tau}\left(\sum_{i=0}^{\infty}\alpha^{2i}\right), \quad \tau \ge 0.$$
Hence, for $|\alpha| < 1$, the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is both weakly stationary and asymptotically uncorrelated, since the autocovariance function
$$v(\tau) = \frac{\sigma_u^2\,\alpha^{\tau}}{1 - \alpha^2} \to 0 \quad \text{as } \tau \to \infty.$$
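A short simulation can be used to check the formula $v(\tau) = \sigma_u^2\alpha^{\tau}/(1-\alpha^2)$. The sketch below (Python/NumPy) compares sample and theoretical autocovariances of a simulated AR(1) realization; the values $\alpha = 0.7$, $\sigma_u = 1$ and $T = 100{,}000$ are illustrative choices, not taken from the text.

import numpy as np

rng = np.random.default_rng(0)
alpha, sigma_u, T = 0.7, 1.0, 100_000        # illustrative values

# simulate X_t = alpha * X_{t-1} + u_t, with X_0 drawn from the stationary distribution
u = rng.normal(0.0, sigma_u, T)
x = np.empty(T)
x[0] = u[0] / np.sqrt(1.0 - alpha**2)
for t in range(1, T):
    x[t] = alpha * x[t - 1] + u[t]

for tau in range(4):
    v_hat = np.mean((x[:T - tau] - x.mean()) * (x[tau:] - x.mean()))
    v_theory = sigma_u**2 * alpha**tau / (1.0 - alpha**2)
    print(tau, round(v_hat, 3), round(v_theory, 3))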
Therefore, if $\mathbf{X}_T \equiv (X_{t_1}, X_{t_2}, \ldots, X_{t_T})'$ collects $T$ consecutive observations of an AR(1) process, it has covariance matrix
$$E(\mathbf{X}_T\mathbf{X}_T') = \sigma_u^2\,\frac{1}{(1-\alpha^2)}
\begin{bmatrix}
1 & \alpha & \cdots & \alpha^{T-1} \\
\alpha & 1 & \cdots & \alpha^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha^{T-1} & \cdots & \cdots & 1
\end{bmatrix} = \sigma_u^2\,\Omega,$$
where
$$\Omega = \frac{1}{(1-\alpha^2)}
\begin{bmatrix}
1 & \alpha & \cdots & \alpha^{T-1} \\
\alpha & 1 & \cdots & \alpha^{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\alpha^{T-1} & \cdots & \cdots & 1
\end{bmatrix}.$$
It is straightforward to show by direct multiplication that
$$P'P = \Omega^{-1},$$
for
$$P = \begin{bmatrix}
\sqrt{1-\alpha^2} & 0 & 0 & \cdots & 0 \\
-\alpha & 1 & 0 & \cdots & 0 \\
0 & -\alpha & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\alpha & 1
\end{bmatrix}.$$
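The identity $P'P = \Omega^{-1}$ is easy to confirm numerically for a small $T$. A sketch in Python/NumPy, with $\alpha = 0.5$ and $T = 5$ as arbitrary illustrative values:

import numpy as np

alpha, T = 0.5, 5                               # illustrative values
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Omega = alpha**lags / (1.0 - alpha**2)          # Omega[i, j] = alpha^|i-j| / (1 - alpha^2)

P = np.eye(T)
P[0, 0] = np.sqrt(1.0 - alpha**2)
for i in range(1, T):
    P[i, i - 1] = -alpha                        # each later row has the (-alpha, 1) pattern

print(np.allclose(P.T @ P, np.linalg.inv(Omega)))   # True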
Definition:
AR(p) process.
Definition:
MA(1) process.
Definition:
MA(q) process.
Definition:
ARMA(p,q) process.
Definition:
ARIMA(p,d,q) process.
Definition:
ARFIMA(p,d,q) process.
2 Testing for Autocorrelation
Most of the available tests for autocorrelation are based on the principle that if the true disturbances $\varepsilon_t$ are autocorrelated, this fact will be revealed through the autocorrelation of the OLS residuals $e_t$, i.e.
$$y_t = x_t'\beta + \varepsilon_t = x_t'\hat{\beta} + e_t.$$
2.1 The Durbin-Watson Test
The most extensively used test for AR(1) disturbances is the Durbin-Watson test developed by Durbin and Watson (1950, 1951).
Lemma:
Let $z$ and $v$ be $T \times 1$ random vectors such that $z = Mv$, where $M = I - X(X'X)^{-1}X'$ and $X$ is a $T \times k$ nonstochastic matrix of rank $k$. Furthermore, let $r = z'Az/z'z$, where $A$ is a real symmetric matrix. Then:
(1). There exists an orthogonal transformation $v = H\zeta$ such that
$$r = \frac{\sum_{i=1}^{T-k} u_i \zeta_i^2}{\sum_{i=1}^{T-k} \zeta_i^2},$$
where $u_1, u_2, \ldots, u_{T-k}$ are the $T-k$ nonzero (ordered) eigenvalues of $MA$, the rest being zero, and $\zeta_i \sim N(0,1)$. (The $u_i$ are functions of $X$; therefore the distribution of $r$ depends on $X$ and is not known in general.)
(2). If $s$ of the columns of $X$ are linear combinations of $s$ of the eigenvectors of $A$, and if the eigenvalues of $A$ associated with the remaining $T-s$ eigenvectors of $A$ are renumbered so that
$$\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{T-s},$$
then
$$\lambda_i \le u_i \le \lambda_{i+k-s} \quad (i = 1, 2, \ldots, T-k).$$
From the above lemma the following corollary can be deduced:
Corollary:
$r_L \le r \le r_U$, where
$$r_L = \frac{\sum_{i=1}^{T-k} \lambda_i \zeta_i^2}{\sum_{i=1}^{T-k} \zeta_i^2}, \qquad
r_U = \frac{\sum_{i=1}^{T-k} \lambda_{i+k-s} \zeta_i^2}{\sum_{i=1}^{T-k} \zeta_i^2}.$$
The importance of this result is that it sets bounds on $r$ whose distributions do not depend on $X$.
We now turn to the test of $H_0: \rho = 0$ for the AR(1) disturbance in the linear model
$$y_t = x_t'\beta + \varepsilon_t,$$
$$\varepsilon_t = \rho\varepsilon_{t-1} + u_t, \quad t = 1, 2, \ldots, T,$$
where $u_t$ is a white-noise process and $-1 < \rho < 1$.
The Durbin-Watson $d$-statistic is written as
$$d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} = \frac{e'Ae}{e'e} = \frac{\sum_{t=2}^{T}(e_t^2 - 2e_t e_{t-1} + e_{t-1}^2)}{\sum_{t=1}^{T} e_t^2} \simeq 2\bigl(1 - \mathrm{corr}(e_t, e_{t-1})\bigr),$$
where
$$A = \begin{bmatrix}
1 & -1 & 0 & \cdots & \cdots & 0 \\
-1 & 2 & -1 & 0 & \cdots & 0 \\
0 & -1 & 2 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
0 & \cdots & 0 & -1 & 2 & -1 \\
0 & \cdots & \cdots & 0 & -1 & 1
\end{bmatrix}.$$
Therefore a small value of $d$ would lead us to reject $H_0$ when testing against a positive $\rho$.
The eigenvalues of $A$ are
$$\lambda_i = 2\left[1 - \cos\frac{\pi(i-1)}{T}\right], \quad i = 1, 2, \ldots, T.$$
The eigenvector of $A$ corresponding to the zero eigenvalue $\lambda_1$ is $(1, 1, \ldots, 1)'$, which is the regressor corresponding to a constant term in the regression model. (Notice that, in this discussion, as well as in the statistical tables, the existence of a constant is implicitly assumed.) From the Corollary above, and using the fact that $e = M\varepsilon$, we have
$$d_L \le d \le d_U,$$
where
$$d_L = \frac{\sum_{i=1}^{T-k} \lambda_i \zeta_i^2}{\sum_{i=1}^{T-k} \zeta_i^2}, \qquad
d_U = \frac{\sum_{i=1}^{T-k} \lambda_{i+k-1} \zeta_i^2}{\sum_{i=1}^{T-k} \zeta_i^2}.$$
Since the $\lambda_i$ are the same in any regression model with $T$ observations and $k$ regressors including the constant term, the distributions of $d_L$ and $d_U$ do not depend on $X$, and their critical values have been tabulated by Durbin and Watson.
Three hypotheses of interest with respect to the AR(1) disturbance process are
(1). $H_0: \rho = 0$ versus $H_1: \rho > 0$;
(2). $H_0: \rho = 0$ versus $H_1: \rho < 0$; and
(3). $H_0: \rho = 0$ versus $H_1: \rho \neq 0$.
For test (1), the null hypothesis is rejected if $d$ is below the 5% critical value of the exact distribution of $d$, which is unavailable; this is guaranteed whenever $d < d_{L,5\%}$, the 5% critical value of the distribution of $d_L$. Equivalently, the null hypothesis is accepted if $d$ exceeds the exact 5% critical value, which is guaranteed whenever $d > d_{U,5\%}$. When $d_{L,5\%} \le d \le d_{U,5\%}$ the bounds test is inconclusive.
It is important to emphasize that: (i) the statistical tables of Durbin and
Watson assume the existence of a constant, (ii) no allowance is made for missing
observations, and (iii) the DW test was derived under the assumption that X is
nonstochastic and thus is not applicable, for example, when lagged values of the
dependent variable appear among the regressors.
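In practice $d$ is computed directly from the OLS residuals. A minimal sketch in Python/NumPy; the regressor matrix X (with a constant column, as the tables assume) and the dependent variable y are placeholders simulated here purely for illustration:

import numpy as np

def durbin_watson(y, X):
    # OLS residuals e = y - X @ beta_hat, then d = sum (e_t - e_{t-1})^2 / sum e_t^2
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    return np.sum(np.diff(e)**2) / np.sum(e**2)

# illustrative use with simulated data
rng = np.random.default_rng(1)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)
print(durbin_watson(y, X))        # close to 2 when there is no autocorrelation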
2.2 Durbin-Watson test in the presence of lagged dependent variables
2.3 The Box Q test
3 Efficient Estimation when ρ is known
As a prelude to deriving feasible estimators for this model, we consider full generalized least squares estimation assuming that $\rho$ is known. In the next section, we will turn to the more realistic case in which $\rho$ must be estimated as well.
3.1 Generalized Least Squares Estimators
If the parameters of $\Omega$ are known, then the GLS estimator,
$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,$$
can be computed directly. For the AR(1) process,
$$P = \begin{bmatrix}
\sqrt{1-\rho^2} & 0 & 0 & \cdots & 0 \\
-\rho & 1 & 0 & \cdots & 0 \\
0 & -\rho & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\rho & 1
\end{bmatrix}.$$
The data for the transformed model are
$$Py = \begin{bmatrix}
\sqrt{1-\rho^2}\,y_1 \\
y_2 - \rho y_1 \\
y_3 - \rho y_2 \\
\vdots \\
y_T - \rho y_{T-1}
\end{bmatrix}, \qquad
PX = \begin{bmatrix}
\sqrt{1-\rho^2}\,x_1' \\
x_2' - \rho x_1' \\
x_3' - \rho x_2' \\
\vdots \\
x_T' - \rho x_{T-1}'
\end{bmatrix},$$
and
$$P\varepsilon = \begin{bmatrix}
\sqrt{1-\rho^2}\,\varepsilon_1 \\
\varepsilon_2 - \rho\varepsilon_1 \\
\varepsilon_3 - \rho\varepsilon_2 \\
\vdots \\
\varepsilon_T - \rho\varepsilon_{T-1}
\end{bmatrix} = \begin{bmatrix}
\sqrt{1-\rho^2}\,\varepsilon_1 \\
u_2 \\
u_3 \\
\vdots \\
u_T
\end{bmatrix}.$$
Since $E(\sqrt{1-\rho^2}\,\varepsilon_1)^2 = \sigma_u^2$, we have $E(P\varepsilon\varepsilon'P') = \sigma_u^2 I$, as expected.
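A sketch of this transformation-based GLS computation for known $\rho$ (Python/NumPy); the arrays y and X are placeholders for the user's data, and the function name is illustrative:

import numpy as np

def gls_ar1_known_rho(y, X, rho):
    # premultiply y and X by P (the Prais-Winsten-type transformation), then run OLS
    y_star = np.empty_like(y, dtype=float)
    X_star = np.empty_like(X, dtype=float)
    y_star[0] = np.sqrt(1.0 - rho**2) * y[0]
    X_star[0] = np.sqrt(1.0 - rho**2) * X[0]
    y_star[1:] = y[1:] - rho * y[:-1]
    X_star[1:] = X[1:] - rho * X[:-1]
    beta_tilde, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta_tilde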
3.2 Maximum Likelihood Estimator
$$\ln L(\beta, \sigma_u^2)$$
4 Estimation When ρ is unknown
We consider specifically the case in which the disturbance is an AR(1) process with an unknown $\rho$.
4.1 Feasible Generalized Least Squares Estimators
For an FGLS estimator of $\beta$, all that is needed is a consistent estimator of $\Omega(\rho)$. Since the OLS estimator $\hat{\beta}$ is consistent, we can use
$$\hat{\rho} = g = \frac{\sum_{t=2}^{T} e_t e_{t-1}}{\sum_{t=1}^{T} e_t^2}$$
as a consistent estimator of $\rho$. With this $\hat{\Omega} = \Omega(\hat{\rho})$, the FGLS estimator is
$$\hat{\beta}_{FGLS} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y.$$
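A feasible version simply plugs the residual-based $\hat{\rho}$ into the transformation of the previous section. A sketch (Python/NumPy) that reuses the gls_ar1_known_rho helper defined in the earlier sketch:

import numpy as np

def fgls_ar1(y, X):
    # step 1: OLS residuals
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    # step 2: rho_hat = sum_{t=2}^T e_t e_{t-1} / sum_{t=1}^T e_t^2
    rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e**2)
    # step 3: GLS using the estimated rho
    return gls_ar1_known_rho(y, X, rho_hat), rho_hat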
4.2 Maximum Likelihood Estimator
$$\ln L(\beta, \sigma_u^2, \rho)$$
Exercise:
Reproduce the results in Table 12.2 on p. 275.
5 An Example of GLS (FGLS): Seemingly Unrelated Regression, SURE
Consider the following two classical linear regression models (each of which satisfies the ideal conditions):
$$y_1 = X_1\beta_1 + \varepsilon_1$$
and
$$y_2 = X_2\beta_2 + \varepsilon_2,$$
where $y_i$ is of dimension $T \times 1$ and $X_i$ is $T \times k_i$.
The reason for the label "seemingly" unrelated regression should now be clear. Though initially it may appear that the first equation is not in any way related to the second, in fact there may be random effects which are pertinent to both. The commonality of the random effects is reflected in the covariance of the two equations' disturbance terms. If the disturbances of the above two equations are assumed to be contemporaneously correlated, in that
$$E(\varepsilon_{it}\varepsilon_{jt}) = \sigma_{ij}, \quad i, j = 1, 2;\ t = 1, 2, \ldots, T,$$
then the variance-covariance matrix in the combined equation
$$y = X\beta + \varepsilon$$
is
$$E(\varepsilon\varepsilon') = E\begin{bmatrix}\varepsilon_1 \\ \varepsilon_2\end{bmatrix}\begin{bmatrix}\varepsilon_1' & \varepsilon_2'\end{bmatrix}
= \begin{bmatrix} E(\varepsilon_1\varepsilon_1') & E(\varepsilon_1\varepsilon_2') \\ E(\varepsilon_2\varepsilon_1') & E(\varepsilon_2\varepsilon_2') \end{bmatrix}
= \begin{bmatrix} \sigma_{11}I_T & \sigma_{12}I_T \\ \sigma_{21}I_T & \sigma_{22}I_T \end{bmatrix}
= \Sigma \otimes I_T,$$
where
$$y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad
X = \begin{bmatrix} X_1 & 0 \\ 0 & X_2 \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix},
\quad \text{and} \quad
\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}.$$
Clearly, this combined equation does not satisfy the classical assumptions, since its disturbance variance-covariance matrix is heteroscedastic and autocorrelated. To estimate $\beta$ efficiently, GLS or FGLS is called for.
Let us now generalize the seemingly unrelated regression model to $M$ equations rather than just two, and define the standard conditions for the seemingly unrelated regression. These conditions are sufficient to ensure that the seemingly unrelated regression model meets the requirements of generalized least squares estimation. Consider the $M$ regression equations
$$y_1 = X_1\beta_1 + \varepsilon_1,$$
$$y_2 = X_2\beta_2 + \varepsilon_2,$$
$$\vdots$$
$$y_M = X_M\beta_M + \varepsilon_M,$$
where $y_i$ is of dimension $T \times 1$ and $X_i$ is $T \times k_i$.
These $M$ equations can be written in the combined form
$$y = X\beta + \varepsilon,$$
where
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}, \quad
X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_M \end{bmatrix}.$$
Definition:
Assume that the seemingly unrelated regression system satisfies the conditions:
(i) $E(\varepsilon) = 0$;
(ii) $E(\varepsilon\varepsilon') = \Omega$, where $\Omega = \Sigma \otimes I_T$ and $\Sigma = [\sigma_{ij}]$, $i, j = 1, 2, \ldots, M$;
(iii) the matrix $X$ is nonstochastic and
$$\lim_{T\to\infty} \frac{X'\Omega^{-1}X}{T}$$
is finite and nonsingular.
These assumptions are called the standard conditions for seemingly unrelated regression. For the present, let us examine the estimation of SURE in the instance where $\Sigma$ is assumed known.
Theorem:
The BLUE of $\beta$ is just
$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y,$$
with covariance matrix $(X'\Omega^{-1}X)^{-1}$.
Proof:
The SURE model satisfies the conditions required for generalized least squares estimation. Therefore these results follow directly from the development in Chapters 8 and 9.
Using partitioned matrix multiplication and the Kronecker product property $(A \otimes B)^{-1} = (A^{-1} \otimes B^{-1})$, the GLS estimator of the SURE model can be written as
$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$$
$$= \left(
\begin{bmatrix} X_1' & 0 & \cdots & 0 \\ 0 & X_2' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M' \end{bmatrix}
\begin{bmatrix} \sigma^{11}I_T & \sigma^{12}I_T & \cdots & \sigma^{1M}I_T \\ \sigma^{21}I_T & \sigma^{22}I_T & \cdots & \sigma^{2M}I_T \\ \vdots & & \ddots & \vdots \\ \sigma^{M1}I_T & \cdots & \cdots & \sigma^{MM}I_T \end{bmatrix}
\begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M \end{bmatrix}
\right)^{-1}$$
$$\times\
\begin{bmatrix} X_1' & 0 & \cdots & 0 \\ 0 & X_2' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M' \end{bmatrix}
\begin{bmatrix} \sigma^{11}I_T & \sigma^{12}I_T & \cdots & \sigma^{1M}I_T \\ \sigma^{21}I_T & \sigma^{22}I_T & \cdots & \sigma^{2M}I_T \\ \vdots & & \ddots & \vdots \\ \sigma^{M1}I_T & \cdots & \cdots & \sigma^{MM}I_T \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}$$
$$= \begin{bmatrix} \sigma^{11}X_1'X_1 & \cdots & \sigma^{1M}X_1'X_M \\ \vdots & \ddots & \vdots \\ \sigma^{M1}X_M'X_1 & \cdots & \sigma^{MM}X_M'X_M \end{bmatrix}^{-1}
\begin{bmatrix} \sum_{j=1}^{M}\sigma^{1j}X_1'y_j \\ \vdots \\ \sum_{j=1}^{M}\sigma^{Mj}X_M'y_j \end{bmatrix},$$
where $\sigma^{ij}$ represents the $(i,j)$-th element of $\Sigma^{-1}$.
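These formulas translate directly into computation. The sketch below (Python/NumPy) carries out the feasible version: equation-by-equation OLS residuals give an estimate of $\Sigma$, which is then plugged into the GLS formula. The inputs y_list and X_list, and the function name, are illustrative placeholders for an $M$-equation system observed over $T$ periods.

import numpy as np

def sure_fgls(y_list, X_list):
    # y_list: M vectors of length T; X_list: M matrices, each T x k_i
    M, T = len(y_list), y_list[0].shape[0]
    # step 1: equation-by-equation OLS residuals -> estimate of Sigma
    E = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for y, X in zip(y_list, X_list)
    ])                                       # T x M residual matrix
    Sigma_hat = (E.T @ E) / T
    # step 2: GLS on the stacked system with Omega_hat = Sigma_hat (kron) I_T
    y_stack = np.concatenate(y_list)
    k = [X.shape[1] for X in X_list]
    X_stack = np.zeros((M * T, sum(k)))
    col = 0
    for i, X in enumerate(X_list):
        X_stack[i * T:(i + 1) * T, col:col + k[i]] = X
        col += k[i]
    Omega_inv = np.kron(np.linalg.inv(Sigma_hat), np.eye(T))
    A = X_stack.T @ Omega_inv @ X_stack
    b = X_stack.T @ Omega_inv @ y_stack
    return np.linalg.solve(A, b)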
The GLS estimator $\tilde{\beta}$ is, in general, more efficient than the OLS estimator
$$\hat{\beta} = (X'X)^{-1}X'y
= \begin{bmatrix} X_1'X_1 & 0 & \cdots & 0 \\ 0 & X_2'X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M'X_M \end{bmatrix}^{-1}
\begin{bmatrix} X_1'y_1 \\ X_2'y_2 \\ \vdots \\ X_M'y_M \end{bmatrix}
= \begin{bmatrix} (X_1'X_1)^{-1}X_1'y_1 \\ (X_2'X_2)^{-1}X_2'y_2 \\ \vdots \\ (X_M'X_M)^{-1}X_M'y_M \end{bmatrix}
= \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_M \end{bmatrix},$$
where $\hat{\beta}_i = (X_i'X_i)^{-1}X_i'y_i$, $i = 1, 2, \ldots, M$, represents the OLS estimator for the $i$-th equation. This result follows directly from the Gauss-Markov theorem.
There are, however, two cases in which the GLS and OLS estimators are identical.
Theorem:
If $\sigma_{jk} = 0$ for $j \neq k$, then $\tilde{\beta} = \hat{\beta}$, and OLS is fully efficient.
Therefore the equations of the SURE system are "truly" unrelated when the disturbances of the various equations are uncorrelated, and nothing is lost by using an estimator which ignores the possibility of contemporaneously correlated disturbance terms.
The other case in which the GLS estimator $\tilde{\beta}$ and the OLS estimator $\hat{\beta}$ are numerically equivalent and equally efficient is when the regressors $X_i$, $i = 1, 2, \ldots, M$, are numerically identical. Formally,
Theorem:
Consider the set of equations
$$y_1 = X_*\beta_1 + \varepsilon_1,$$
$$y_2 = X_*\beta_2 + \varepsilon_2,$$
$$\vdots$$
$$y_M = X_*\beta_M + \varepsilon_M.$$
In this case the OLS estimator is fully efficient, in that $\tilde{\beta} = \hat{\beta}$.
Proof:
In this case,
$$X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_M \end{bmatrix}
= \begin{bmatrix} X_* & 0 & \cdots & 0 \\ 0 & X_* & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & X_* \end{bmatrix}
= I_M \otimes X_*,$$
and
$$X'\Omega^{-1}X = (I_M \otimes X_*)'(\Sigma \otimes I_T)^{-1}(I_M \otimes X_*)
= (I_M \otimes X_*')(\Sigma^{-1} \otimes I_T)(I_M \otimes X_*)
= (\Sigma^{-1} \otimes X_*')(I_M \otimes X_*)
= \Sigma^{-1} \otimes (X_*'X_*),$$
$$X'\Omega^{-1}y = (I_M \otimes X_*)'(\Sigma \otimes I_T)^{-1}y
= (I_M \otimes X_*')(\Sigma^{-1} \otimes I_T)y
= (\Sigma^{-1} \otimes X_*')y.$$
The GLS estimator is therefore
$$\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y
= [\Sigma^{-1} \otimes (X_*'X_*)]^{-1}[(\Sigma^{-1} \otimes X_*')y]
= (\Sigma \otimes (X_*'X_*)^{-1})(\Sigma^{-1} \otimes X_*')y
= (I_M \otimes (X_*'X_*)^{-1}X_*')y$$
$$= \begin{bmatrix} (X_*'X_*)^{-1}X_*' & 0 & \cdots & 0 \\ 0 & (X_*'X_*)^{-1}X_*' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & (X_*'X_*)^{-1}X_*' \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}
= \begin{bmatrix} (X_*'X_*)^{-1}X_*'y_1 \\ (X_*'X_*)^{-1}X_*'y_2 \\ \vdots \\ (X_*'X_*)^{-1}X_*'y_M \end{bmatrix}
= \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_M \end{bmatrix}
= \hat{\beta}.$$
Note that when the numerical values of the $M$ design matrices are identical, i.e., $X_1 = X_2 = \cdots = X_M = X_*$, this theorem holds regardless of the degree of contemporaneous correlation among the disturbance terms. This result is particularly important in the estimation of the vector autoregressive (VAR) model, where every individual equation contains exactly the same regressors, which is precisely the case covered here.
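A quick numerical check of this theorem, using the sure_fgls sketch from above together with simulated data (all values illustrative): even with strongly correlated disturbances, the stacked (F)GLS estimate coincides with equation-by-equation OLS when the regressor matrices are identical.

import numpy as np

rng = np.random.default_rng(2)
T = 200
X_common = np.column_stack([np.ones(T), rng.normal(size=T)])
# strongly cross-correlated disturbances
eps = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=T)
y_list = [X_common @ np.array([1.0, 2.0]) + eps[:, 0],
          X_common @ np.array([-0.5, 0.3]) + eps[:, 1]]
X_list = [X_common, X_common]

beta_gls = sure_fgls(y_list, X_list)                  # stacked (F)GLS
beta_ols = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0]
                           for y, X in zip(y_list, X_list)])
print(np.allclose(beta_gls, beta_ols))                # True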
5.1 An alternative formulation of the SURE model
An alternative way of developing the SURE estimator, which does not involve Kronecker products, is to write the $M$ equations together, period by period, as
$$y_t = X_t\beta + \varepsilon_t, \quad t = 1, 2, \ldots, T,$$
where
$$y_t = \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Mt} \end{bmatrix}, \quad
X_t = \begin{bmatrix} x_{1t}' & 0 & \cdots & 0 \\ 0 & x_{2t}' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & x_{Mt}' \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_M \end{bmatrix},$$
and
$$E(\varepsilon_t\varepsilon_t') = E\begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{Mt} \end{bmatrix}
\begin{bmatrix} \varepsilon_{1t} & \varepsilon_{2t} & \cdots & \varepsilon_{Mt} \end{bmatrix}
= \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1M} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2M} \\ \vdots & & \ddots & \vdots \\ \sigma_{M1} & \cdots & \cdots & \sigma_{MM} \end{bmatrix} = \Sigma.$$
If the $T$ equations are stacked in the usual way, we have
$$\bar{y} = \bar{X}\beta + \bar{\varepsilon},$$
where
$$\bar{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \quad
\bar{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{bmatrix}, \quad \text{and} \quad
\bar{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix},$$
with $y_t$, $X_t$ and $\varepsilon_t$ as defined above.
The covariance matrix of the disturbances in the stacked equation is
$$E(\bar{\varepsilon}\bar{\varepsilon}') = E\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_T \end{bmatrix}
\begin{bmatrix} \varepsilon_1' & \varepsilon_2' & \cdots & \varepsilon_T' \end{bmatrix}
= \begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \Sigma \end{bmatrix}
= I_T \otimes \Sigma \equiv \bar{\Omega}.$$
The GLS estimator of $\beta$ in this form is
$$\tilde{\beta}^{\,*} = (\bar{X}'\bar{\Omega}^{-1}\bar{X})^{-1}\bar{X}'\bar{\Omega}^{-1}\bar{y}$$
$$= \left(
\begin{bmatrix} X_1' & X_2' & \cdots & X_T' \end{bmatrix}
\begin{bmatrix} \Sigma^{-1} & 0 & \cdots & 0 \\ 0 & \Sigma^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \Sigma^{-1} \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_T \end{bmatrix}
\right)^{-1}
\left(
\begin{bmatrix} X_1' & X_2' & \cdots & X_T' \end{bmatrix}
\begin{bmatrix} \Sigma^{-1} & 0 & \cdots & 0 \\ 0 & \Sigma^{-1} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \Sigma^{-1} \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}
\right)$$
$$= \left(\sum_{t=1}^{T} X_t'\Sigma^{-1}X_t\right)^{-1}\left(\sum_{t=1}^{T} X_t'\Sigma^{-1}y_t\right).$$
It is easy to show that $\tilde{\beta}^{\,*} = \tilde{\beta}$. In fact,
$$\sum_{t=1}^{T} X_t'\Sigma^{-1}X_t
= \sum_{t=1}^{T}
\begin{bmatrix} x_{1t} & 0 & \cdots & 0 \\ 0 & x_{2t} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & x_{Mt} \end{bmatrix}
\begin{bmatrix} \sigma^{11} & \sigma^{12} & \cdots & \sigma^{1M} \\ \sigma^{21} & \sigma^{22} & \cdots & \sigma^{2M} \\ \vdots & & \ddots & \vdots \\ \sigma^{M1} & \cdots & \cdots & \sigma^{MM} \end{bmatrix}
\begin{bmatrix} x_{1t}' & 0 & \cdots & 0 \\ 0 & x_{2t}' & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & x_{Mt}' \end{bmatrix}$$
$$= \begin{bmatrix}
\sigma^{11}\sum_{t=1}^{T} x_{1t}x_{1t}' & \cdots & \sigma^{1M}\sum_{t=1}^{T} x_{1t}x_{Mt}' \\
\vdots & \ddots & \vdots \\
\sigma^{M1}\sum_{t=1}^{T} x_{Mt}x_{1t}' & \cdots & \sigma^{MM}\sum_{t=1}^{T} x_{Mt}x_{Mt}'
\end{bmatrix}
= \begin{bmatrix}
\sigma^{11}X_1'X_1 & \cdots & \sigma^{1M}X_1'X_M \\
\vdots & \ddots & \vdots \\
\sigma^{M1}X_M'X_1 & \cdots & \sigma^{MM}X_M'X_M
\end{bmatrix},$$
from the fact that
$$X_i'X_j = \begin{bmatrix} x_{i1} & \cdots & x_{iT} \end{bmatrix}
\begin{bmatrix} x_{j1}' \\ \vdots \\ x_{jT}' \end{bmatrix}
= \sum_{t=1}^{T} x_{it}x_{jt}'.$$
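The sum form also gives a direct way to compute the SURE estimator without ever building an $MT \times MT$ matrix. A sketch (Python/NumPy; y_list and X_list are the per-equation data as before, and Sigma is a given, or previously estimated, $M \times M$ disturbance covariance matrix):

import numpy as np

def sure_gls_sum_form(y_list, X_list, Sigma):
    # beta = (sum_t X_t' Sigma^{-1} X_t)^{-1} (sum_t X_t' Sigma^{-1} y_t)
    M, T = len(y_list), y_list[0].shape[0]
    k = [X.shape[1] for X in X_list]
    offsets = np.concatenate(([0], np.cumsum(k)))
    Sigma_inv = np.linalg.inv(Sigma)
    A = np.zeros((sum(k), sum(k)))
    b = np.zeros(sum(k))
    for t in range(T):
        # X_t is block-diagonal with the period-t regressor rows x_it'
        Xt = np.zeros((M, sum(k)))
        for i in range(M):
            Xt[i, offsets[i]:offsets[i + 1]] = X_list[i][t]
        yt = np.array([y_list[i][t] for i in range(M)])
        A += Xt.T @ Sigma_inv @ Xt
        b += Xt.T @ Sigma_inv @ yt
    return np.linalg.solve(A, b)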