Ch. 11 Panel Data Models
Data sets that combine time series and cross sections are common in econometrics. For example, the published statistics of the OECD contain numerous series of economic aggregates observed yearly for many countries. The PSID is a study of roughly 6,000 families and 15,000 individuals who have been interviewed periodically from 1968 to the present. Panel data sets are more oriented toward cross-section analysis; they are wide but typically relatively short. Heterogeneity across units is an integral part of the analysis.
Recall that the (multiple) linear model is used to study the relationship between a dependent variable and several independent variables. That is,
y = f(x_1, x_2, \ldots, x_k) + \varepsilon
  = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon
  = x'\beta + \varepsilon,
where y is the dependent or explained variable, x_i, i = 1, \ldots, k, are the independent or explanatory variables, and \beta_i, i = 1, \ldots, k, are unknown coefficients that we are interested in learning about, either through estimation or through hypothesis testing. The term \varepsilon is an unobservable random disturbance. In the following, we will see that panel data sets provide a richer source of information, but also require more complex stochastic specifications.
The fundamental advantage of a panel data set over a cross section is that it allows the researcher greater flexibility in modeling differences in behavior across individuals. The basic framework for this statistical model is of the form
y_{it} = x_{it}'\beta + z_i'\alpha + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T.
There are k regressors in x_{it}, not including a constant term. The heterogeneity, or individual effect, is z_i'\alpha, where z_i contains a constant term and a set of individual- or group-specific variables, which may be observed, such as race, sex, location, and so on, or unobserved, such as family-specific characteristics, individual heterogeneity in skill or preference, and so on, all of which are taken to be constant over time t. The various cases we will consider are:
1. Pooled Regression: If z_i contains only a constant term, then there are no individual-specific characteristics in this model. All we need to do is pool the data,
y_{it} = x_{it}'\beta + \alpha + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T,
and OLS provides consistent and efficient estimates of the common \alpha and \beta.
2. Fixed Effects: If z_i'\alpha = \alpha_i, then the fixed effects approach takes \alpha_i as a group-specific constant term in the regression model,
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T.
3. Random Effects: If the unobserved individual heterogeneity can be assumed to be uncorrelated with the included variables, then the model may be formulated as
y_{it} = x_{it}'\beta + E(z_i'\alpha) + [z_i'\alpha - E(z_i'\alpha)] + \varepsilon_{it}
       = x_{it}'\beta + \alpha + u_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T.
The random effects approach specifies that u_i is a group-specific random element, similar to \varepsilon_{it}, except that for each group there is but a single draw that enters the regression identically in each period. A small simulation sketch below makes the distinction concrete.
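
To make these distinctions concrete, here is a minimal simulation sketch in Python/NumPy (all data are simulated; the true slope is set to 1 and the dimensions are arbitrary). When the individual effect is correlated with the regressor, pooling the data biases the slope estimate, while demeaning within groups, anticipating the fixed effects estimator developed next, removes the effect:

    import numpy as np

    rng = np.random.default_rng(42)
    N, T = 200, 5
    alpha = rng.normal(size=N)                    # individual effects
    x = rng.normal(size=(N, T)) + alpha[:, None]  # regressor correlated with alpha
    y = 1.0 * x + alpha[:, None] + rng.normal(size=(N, T))

    # Pooled OLS slope: biased here because E(alpha_i | x_it) != 0.
    xp, yp = x.ravel(), y.ravel()
    b_pooled = np.cov(xp, yp)[0, 1] / np.var(xp, ddof=1)

    # Within slope: subtracting group means removes alpha_i.
    xw = (x - x.mean(axis=1, keepdims=True)).ravel()
    yw = (y - y.mean(axis=1, keepdims=True)).ravel()
    b_within = (xw @ yw) / (xw @ xw)

    print(b_pooled, b_within)                     # roughly 1.5 versus 1.0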
1 Fixed Effects
This formulation of the model assumes that differences across units can be captured in differences in the constant term. Each \alpha_i is treated as an unknown parameter to be estimated. Let y_i and X_i be the T observations for the ith unit, let i be a T \times 1 column of ones, and let \varepsilon_i be the associated T \times 1 vector of disturbances. Then
y_i = X_i\beta + \alpha_i i + \varepsilon_i, \quad i = 1, 2, \ldots, N.
It is also assumed that the disturbance terms are well behaved, that is,
E(\varepsilon_i) = 0,
E(\varepsilon_i\varepsilon_i') = \sigma_\varepsilon^2 I_T, and
E(\varepsilon_i\varepsilon_j') = 0 if i \neq j.
Observations on all the cross-sectional units can be stacked and rewritten as
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta
+
\begin{bmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{bmatrix}
\begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{bmatrix},
or, in more compact form,
y = X\beta + D\alpha + \varepsilon,
where y and \varepsilon are NT \times 1, X is NT \times k, \beta is k \times 1, and D = [d_1\ d_2\ \cdots\ d_N] is NT \times N, with d_i a dummy variable indicating the ith unit. This model is usually referred to as the least squares dummy variable (LSDV) model.
Since this model satisfies the classical ideal conditions, the OLS estimator is BLUE. Using the familiar partitioned regression results of Ch. 6, the slope estimator is
\hat{\beta} = (X'M_D X)^{-1} X'M_D y,
where
M_D = I_{NT} - D(D'D)^{-1}D'.
Lemma:
M_D = I_{NT} - D(D'D)^{-1}D' =
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix},
where M^0 = I_T - \frac{1}{T} ii' is the demeaning matrix.
Proof:
By definition,
D'D =
\begin{bmatrix} i' & 0 & \cdots & 0 \\ 0 & i' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i' \end{bmatrix}
\begin{bmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{bmatrix}
=
\begin{bmatrix} i'i & 0 & \cdots & 0 \\ 0 & i'i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i'i \end{bmatrix}
=
\begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix}_{N \times N},
and therefore
I_{NT} - D(D'D)^{-1}D'
=
\begin{bmatrix} I_T & 0 & \cdots & 0 \\ 0 & I_T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_T \end{bmatrix}
-
\begin{bmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{bmatrix}
\begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix}^{-1}
\begin{bmatrix} i' & 0 & \cdots & 0 \\ 0 & i' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i' \end{bmatrix}
=
\begin{bmatrix} I_T - \frac{1}{T}ii' & 0 & \cdots & 0 \\ 0 & I_T - \frac{1}{T}ii' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_T - \frac{1}{T}ii' \end{bmatrix}
=
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix}.
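
The lemma can also be confirmed numerically; a minimal check in Python/NumPy with arbitrary small dimensions (N = 3, T = 4):

    import numpy as np

    N, T = 3, 4
    D = np.kron(np.eye(N), np.ones((T, 1)))          # NT x N matrix of unit dummies
    MD = np.eye(N * T) - D @ np.linalg.inv(D.T @ D) @ D.T
    M0 = np.eye(T) - np.ones((T, T)) / T             # demeaning matrix I_T - (1/T)ii'

    assert np.allclose(MD, np.kron(np.eye(N), M0))   # MD = diag(M0, ..., M0)
    assert np.allclose(MD @ MD, MD)                  # MD is idempotent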
It is easy to see that the matrix M_D is idempotent and that
M_D y =
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} y_1 - \bar{y}_1 i \\ y_2 - \bar{y}_2 i \\ \vdots \\ y_N - \bar{y}_N i \end{bmatrix}
and
M_D X =
\begin{bmatrix} M^0 & 0 & \cdots & 0 \\ 0 & M^0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & M^0 \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}
=
\begin{bmatrix} M^0 X_1 \\ M^0 X_2 \\ \vdots \\ M^0 X_N \end{bmatrix},
where the scalar \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}, i = 1, 2, \ldots, N. Let X_i = [x_{i1}\ x_{i2}\ \cdots\ x_{ik}]; then M^0 X_i = [M^0 x_{i1}\ M^0 x_{i2}\ \cdots\ M^0 x_{ik}], where M^0 x_{ij} = x_{ij} - \bar{x}_{ij} i, j = 1, 2, \ldots, k, with \bar{x}_{ij} = \frac{1}{T}\sum_{t=1}^{T} x_{ijt}. Denoting \bar{x}_i = [\bar{x}_{i1}\ \bar{x}_{i2}\ \cdots\ \bar{x}_{ik}]', the least squares regression of M_D y on M_D X is therefore equivalent to the regression of [y_{it} - \bar{y}_i] on [x_{it} - \bar{x}_i].
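
This equivalence between the LSDV regression and the demeaned (within) regression is easy to verify numerically; a sketch with simulated data (hypothetical dimensions, true slopes 1 and -0.5):

    import numpy as np

    rng = np.random.default_rng(0)
    N, T, k = 50, 10, 2
    alpha = rng.normal(size=N)
    X = rng.normal(size=(N, T, k))
    y = X @ np.array([1.0, -0.5]) + alpha[:, None] + rng.normal(size=(N, T))

    # (a) LSDV: regress y on X and a full set of N unit dummies.
    D = np.kron(np.eye(N), np.ones((T, 1)))
    Z = np.hstack([X.reshape(N * T, k), D])
    b_lsdv = np.linalg.lstsq(Z, y.reshape(N * T), rcond=None)[0][:k]

    # (b) Within: regress demeaned y on demeaned X.
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, k)
    yw = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
    b_within = np.linalg.lstsq(Xw, yw, rcond=None)[0]

    assert np.allclose(b_lsdv, b_within)             # identical slope estimates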
The dummy variable coefficients can be recovered from
D'y = D'X\hat{\beta} + D'D\hat{\alpha} + D'e,
or
\hat{\alpha} = (D'D)^{-1}D'(y - X\hat{\beta}),
since D'e = 0.
This implies that
\begin{bmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \\ \vdots \\ \hat{\alpha}_N \end{bmatrix}
= \frac{1}{T}
\begin{bmatrix} i' & 0 & \cdots & 0 \\ 0 & i' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & i' \end{bmatrix}
\begin{bmatrix} y_1 - X_1\hat{\beta} \\ y_2 - X_2\hat{\beta} \\ \vdots \\ y_N - X_N\hat{\beta} \end{bmatrix}
=
\begin{bmatrix} \frac{1}{T}\sum_{t=1}^{T}(y_{1t} - x_{1t}'\hat{\beta}) \\ \frac{1}{T}\sum_{t=1}^{T}(y_{2t} - x_{2t}'\hat{\beta}) \\ \vdots \\ \frac{1}{T}\sum_{t=1}^{T}(y_{Nt} - x_{Nt}'\hat{\beta}) \end{bmatrix}
=
\begin{bmatrix} \bar{y}_1 - \bar{x}_1'\hat{\beta} \\ \bar{y}_2 - \bar{x}_2'\hat{\beta} \\ \vdots \\ \bar{y}_N - \bar{x}_N'\hat{\beta} \end{bmatrix}.
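
In code, recovering the group effects is one line once the within slope is in hand; a minimal sketch with simulated data:

    import numpy as np

    rng = np.random.default_rng(1)
    N, T, k = 20, 8, 2
    alpha = rng.normal(size=N)
    X = rng.normal(size=(N, T, k))
    y = X @ np.array([2.0, 1.0]) + alpha[:, None] + 0.1 * rng.normal(size=(N, T))

    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, k)
    yw = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
    b = np.linalg.lstsq(Xw, yw, rcond=None)[0]       # within slope estimate

    a_hat = y.mean(axis=1) - X.mean(axis=1) @ b      # a_i = ybar_i - xbar_i'b
    print(np.max(np.abs(a_hat - alpha)))             # small, since the noise sd is 0.1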
Exercise:
Let the fixed effects model be partitioned as
y = X\hat{\beta} + D\hat{\alpha} + e;
show that the variance of \hat{\beta} is Var(\hat{\beta}) = \sigma_\varepsilon^2 (X'M_D X)^{-1}.
Proof:
\hat{\beta} = (X'M_D X)^{-1} X'M_D y
            = \beta + (X'M_D X)^{-1} X'M_D \varepsilon;
therefore,
Var(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)']
= E[((X'M_D X)^{-1}X'M_D\varepsilon)((X'M_D X)^{-1}X'M_D\varepsilon)']
= E[(X'M_D X)^{-1}X'M_D\varepsilon\varepsilon' M_D X(X'M_D X)^{-1}]
= \sigma_\varepsilon^2 [(X'M_D X)^{-1}X'M_D I_{NT} M_D X(X'M_D X)^{-1}]
= \sigma_\varepsilon^2 [(X'M_D X)^{-1}X'M_D X(X'M_D X)^{-1}]
= \sigma_\varepsilon^2 (X'M_D X)^{-1}.
With the above results, the appropriate estimator of Var(\hat{\beta}) is therefore
Est(Var(\hat{\beta})) = s^2(X'M_D X)^{-1},
where the disturbance variance estimator s^2 is
s^2 = \frac{(y - X\hat{\beta} - D\hat{\alpha})'(y - X\hat{\beta} - D\hat{\alpha})}{NT - N - k} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(y_{it} - x_{it}'\hat{\beta} - \hat{\alpha}_i)^2}{NT - N - k}.
Exercise:
Show that
Var(\hat{\alpha}_i) = \frac{\sigma_\varepsilon^2}{T} + \bar{x}_i'\,Var(\hat{\beta})\,\bar{x}_i.
1.1 Testing the Significance of the Group Effects
Consider the null hypothesis H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_N = \alpha. Under this null hypothesis, the efficient estimator is pooled least squares. The F ratio used for the test is
F_{N-1,\,NT-N-k} = \frac{(R_{LSDV}^2 - R_{Pooled}^2)/(N-1)}{(1 - R_{LSDV}^2)/(NT - N - k)},
where R_{LSDV}^2 indicates the R^2 from the dummy variable model and R_{Pooled}^2 indicates the R^2 from the pooled or restricted model with only a single overall constant.
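
Computing the test is mechanical once the two R^2 values are in hand; a sketch (the R^2 numbers below are placeholders, not estimates from any data set):

    from scipy import stats

    N, T, k = 6, 15, 3
    R2_lsdv, R2_pooled = 0.95, 0.80        # hypothetical fits
    F = ((R2_lsdv - R2_pooled) / (N - 1)) / ((1 - R2_lsdv) / (N * T - N - k))
    p_value = stats.f.sf(F, N - 1, N * T - N - k)
    print(F, p_value)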
Example:
Example 13.2 on p. 292 of Greene, where N = 6, k = 3, and T = 15 (see Ex. 7.2 on p. 118).
Exercise:
Reproduce the first, third, and fourth rows of the results in Table 13.1 on p. 292 of Greene.
1.2 The Within- and Between-Groups Estimators
We can formulate the pooled regression model in three ways. First, the original formulation is
y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T. (1)
In terms of deviations from the group means,
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + \varepsilon_{it} - \bar{\varepsilon}_i, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T, (2)
and in terms of the group means,
\bar{y}_i = \alpha + \bar{x}_i'\beta + \bar{\varepsilon}_i, \quad i = 1, 2, \ldots, N. (3)
To estimate \beta by OLS, in (1) we would use the total sums of squares and cross products,
S_{xx}^{total} = \sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x})(x_{it} - \bar{x})'
and
S_{xy}^{total} = \sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x})(y_{it} - \bar{y}),
where \bar{x} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it} and \bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}.
In (2), the moment matrices we use are the within-groups (i.e., deviations from the group means) sums of squares and cross products,
S_{xx}^{within} = \sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)'
and
S_{xy}^{within} = \sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i).
Finally, for (3), the mean of the group means is the overall mean (i.e., \frac{1}{N}\sum_{i=1}^{N}\bar{y}_i = \bar{y}). Therefore the moment matrices are the between-groups sums of squares and cross products,
S_{xx}^{between} = \sum_{i=1}^{N} T(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'
and
S_{xy}^{between} = \sum_{i=1}^{N} T(\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}).
It is easy to verify that
S_{xx}^{total} = S_{xx}^{within} + S_{xx}^{between}
and
S_{xy}^{total} = S_{xy}^{within} + S_{xy}^{between}.
Therefore, there are three possible least squares estimators of \beta corresponding to these decompositions. The least squares estimator in the pooled regression is
\hat{\beta}^{total} = (S_{xx}^{total})^{-1}S_{xy}^{total} = (S_{xx}^{within} + S_{xx}^{between})^{-1}(S_{xy}^{within} + S_{xy}^{between}).
The within-groups estimator is
\hat{\beta}^{within} = (S_{xx}^{within})^{-1}S_{xy}^{within}.
This is the LSDV estimator computed earlier. An alternative estimator is the between-groups estimator,
\hat{\beta}^{between} = (S_{xx}^{between})^{-1}S_{xy}^{between},
which is the least squares estimator based on the N sets of group means. From the preceding expressions,
S_{xy}^{within} = S_{xx}^{within}\hat{\beta}^{within}
and
S_{xy}^{between} = S_{xx}^{between}\hat{\beta}^{between},
we have
\hat{\beta}^{total} = (S_{xx}^{within} + S_{xx}^{between})^{-1}(S_{xx}^{within}\hat{\beta}^{within} + S_{xx}^{between}\hat{\beta}^{between})
= F^{within}\hat{\beta}^{within} + F^{between}\hat{\beta}^{between},
where F^{within} = (S_{xx}^{within} + S_{xx}^{between})^{-1}S_{xx}^{within} and F^{within} + F^{between} = (S_{xx}^{within} + S_{xx}^{between})^{-1}(S_{xx}^{within} + S_{xx}^{between}) = I. That is, the pooled OLS estimator is a matrix-weighted average of the within- and between-groups estimators.
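
The decomposition of the moment matrices is easy to check numerically; a sketch with simulated data (the group-level shift is added so the between variation is non-trivial, and the S_xy decomposition is analogous):

    import numpy as np

    rng = np.random.default_rng(2)
    N, T, k = 30, 6, 2
    X = rng.normal(size=(N, T, k)) + rng.normal(size=(N, 1, k))  # group-level variation

    xbar_i = X.mean(axis=1)                          # N x k matrix of group means
    xbar = X.mean(axis=(0, 1))                       # overall mean

    Xd = (X - xbar).reshape(N * T, k)
    S_total = Xd.T @ Xd
    Xw = (X - xbar_i[:, None, :]).reshape(N * T, k)
    S_within = Xw.T @ Xw
    S_between = T * (xbar_i - xbar).T @ (xbar_i - xbar)

    assert np.allclose(S_total, S_within + S_between)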
2 Random Effects
Consider the model
y_{it} = x_{it}'\beta + \alpha + u_i + \varepsilon_{it}, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T,
where there are k regressors including a constant, and now the single constant term is the mean of the unobserved heterogeneity, E(z_i'\alpha). The component u_i is the random heterogeneity specific to the ith observation and is constant through time. We further assume
E(\varepsilon_{it}) = E(u_i) = 0,
E(\varepsilon_{it}^2) = \sigma_\varepsilon^2,
E(u_i^2) = \sigma_u^2,
E(\varepsilon_{it}u_j) = 0 for all i, t, and j,
E(\varepsilon_{it}\varepsilon_{js}) = 0 if t \neq s or i \neq j,
E(u_i u_j) = 0 if i \neq j.
Denote
\eta_{it} = \varepsilon_{it} + u_i,
let y_i and X_i (including the constant term) be the T observations for the ith unit, let i be a T \times 1 column of ones, and let
\eta_i = [\eta_{i1}, \eta_{i2}, \ldots, \eta_{iT}]';
then
y_i = X_i\beta + \eta_i, \quad i = 1, 2, \ldots, N,
and the variance of the disturbances is
\Sigma = E(\eta_i\eta_i') = E\left\{
\begin{bmatrix} \varepsilon_{i1} + u_i \\ \varepsilon_{i2} + u_i \\ \vdots \\ \varepsilon_{iT} + u_i \end{bmatrix}
\begin{bmatrix} \varepsilon_{i1} + u_i & \varepsilon_{i2} + u_i & \cdots & \varepsilon_{iT} + u_i \end{bmatrix}
\right\}
=
\begin{bmatrix} \sigma_\varepsilon^2 + \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_u^2 \\ \sigma_u^2 & \sigma_\varepsilon^2 + \sigma_u^2 & \cdots & \sigma_u^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_u^2 & \sigma_u^2 & \cdots & \sigma_\varepsilon^2 + \sigma_u^2 \end{bmatrix}
= \sigma_\varepsilon^2 I_T + \sigma_u^2 i_T i_T'.
Observations on all the cross-sectional units can be stacked and rewritten as
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}
=
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_N \end{bmatrix}\beta
+
\begin{bmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_N \end{bmatrix},
or, in more compact form,
y = X\beta + \eta,
where y and \eta are NT \times 1, X is NT \times k, \beta is k \times 1, and
\Omega = E(\eta\eta') =
\begin{bmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{bmatrix}
= I_N \otimes \Sigma.
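
In code, this covariance structure is a Kronecker product; a minimal construction with assumed variance components:

    import numpy as np

    N, T = 3, 4
    sig_e2, sig_u2 = 1.0, 0.5                        # assumed variance components
    Sigma = sig_e2 * np.eye(T) + sig_u2 * np.ones((T, T))   # per-group T x T block
    Omega = np.kron(np.eye(N), Sigma)                # NT x NT block-diagonal
    print(Omega.shape)                               # (12, 12)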
2.1 Generalized Least Squares
The generalized least squares estimator of the slope parameters is
\tilde{\beta} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \left(\sum_{i=1}^{N} X_i'\Sigma^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N} X_i'\Sigma^{-1}y_i\right).
As with many generalized least squares problems, it is convenient to find a transformation matrix \Omega^{-1/2} = [I_N \otimes \Sigma]^{-1/2} so that OLS can be applied to the transformed model. Fuller and Battese (1973) suggest
\Sigma^{-1/2} = \frac{1}{\sigma_\varepsilon}\left[I_T - \frac{\theta}{T} i_T i_T'\right],
where
\theta = 1 - \frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2 + T\sigma_u^2}}.
The transformation of y_i and X_i for GLS is therefore
\Sigma^{-1/2}y_i = \frac{1}{\sigma_\varepsilon}
\begin{bmatrix} y_{i1} - \theta\bar{y}_i \\ y_{i2} - \theta\bar{y}_i \\ \vdots \\ y_{iT} - \theta\bar{y}_i \end{bmatrix},
and likewise for the rows of X_i. Note the similarity of this procedure to the computation in the LSDV model, which uses \theta = 1. One could interpret \theta as the effect that would remain if \sigma_\varepsilon = 0, because the only effect would then be u_i. In that case, the fixed and random effects models would be indistinguishable, so this result makes sense.
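
The claim that this matrix is indeed \Sigma^{-1/2} can be checked numerically; a minimal sketch with assumed variance components:

    import numpy as np

    T, sig_e2, sig_u2 = 6, 1.0, 0.4                  # assumed variance components
    sig_e = np.sqrt(sig_e2)
    Sigma = sig_e2 * np.eye(T) + sig_u2 * np.ones((T, T))
    theta = 1 - sig_e / np.sqrt(sig_e2 + T * sig_u2)
    P = (np.eye(T) - (theta / T) * np.ones((T, T))) / sig_e

    assert np.allclose(P @ Sigma @ P.T, np.eye(T))   # P Sigma P' = I_T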
2.2 FGLS when \Sigma is unknown
If the variance components are known, generalized least squares can be computed as shown earlier. Of course, this is unlikely, so, as usual, we must first estimate the disturbance variances and then use an FGLS procedure. A heuristic approach to estimating the variance components is as follows. The model is
y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it} + u_i, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T, (4)
and in terms of group means,
\bar{y}_i = \alpha + \bar{x}_i'\beta + \bar{\varepsilon}_i + u_i, \quad i = 1, 2, \ldots, N. (5)
Therefore, taking deviations from the group means removes the heterogeneity:
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + \varepsilon_{it} - \bar{\varepsilon}_i, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T. (6)
Since
E\left[\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_i)^2\right] = (T-1)\sigma_\varepsilon^2,
if \beta were observable (and therefore \varepsilon were observed), then an unbiased estimator of \sigma_\varepsilon^2 based on the T observations in group i would be
\hat{\sigma}_\varepsilon^2(i) = \frac{\sum_{t=1}^{T}(\varepsilon_{it} - \bar{\varepsilon}_i)^2}{T-1}.
Since \beta must be estimated, we may use the residuals from the LSDV estimator (which is consistent and, in general, unbiased) and correct the degrees of freedom to form the estimator
s_e^2(i) = \frac{\sum_{t=1}^{T}(e_{it} - \bar{e}_i)^2}{T - k - 1}.
We have N such estimators, so we average them to obtain
s_e^2 = \frac{1}{N}\sum_{i=1}^{N} s_e^2(i) = \frac{1}{N}\sum_{i=1}^{N}\left[\frac{\sum_{t=1}^{T}(e_{it} - \bar{e}_i)^2}{T - k - 1}\right] = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(e_{it} - \bar{e}_i)^2}{NT - Nk - N}.
The degrees of freedom correction in s_e^2 is excessive because it assumes that \alpha and \beta are re-estimated for each i. The unbiased estimator is
\hat{\sigma}_\varepsilon^2 = s_{LSDV}^2 = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T}(e_{it} - \bar{e}_i)^2}{NT - N - k},
where \bar{e}_i = 0 for the LSDV residuals, since each group's dummy absorbs its residual mean.
It remains to estimate \sigma_u^2. Returning to the original model,
y_{it} = \alpha + x_{it}'\beta + \varepsilon_{it} + u_i, \quad i = 1, 2, \ldots, N, \ t = 1, 2, \ldots, T. (7)
In spite of the correlation across observations, this is a classical regression model in which the OLS slope and variance estimators are both consistent and, in general, unbiased. Therefore, using the OLS residuals from the model with only a single overall constant, we have
plim s_{Pooled}^2 = plim \frac{e'e}{NT - k - 1} = \sigma_\varepsilon^2 + \sigma_u^2.
This provides the two estimators needed for the variance components; the second is
\hat{\sigma}_u^2 = s_{Pooled}^2 - s_{LSDV}^2.
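
A sketch of this heuristic on simulated data (true values \sigma_\varepsilon^2 = \sigma_u^2 = 1; the dimensions are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    N, T, k = 100, 5, 2
    u = rng.normal(size=N)                          # random effects, sigma_u^2 = 1
    X = rng.normal(size=(N, T, k))
    y = X @ np.array([1.0, -1.0]) + u[:, None] + rng.normal(size=(N, T))

    # LSDV (within) residual variance estimates sigma_e^2.
    Xw = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, k)
    yw = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
    b_w = np.linalg.lstsq(Xw, yw, rcond=None)[0]
    s2_lsdv = np.sum((yw - Xw @ b_w) ** 2) / (N * T - N - k)

    # Pooled residual variance estimates sigma_e^2 + sigma_u^2.
    Xp = np.hstack([np.ones((N * T, 1)), X.reshape(N * T, k)])
    yp = y.reshape(N * T)
    b_p = np.linalg.lstsq(Xp, yp, rcond=None)[0]
    s2_pooled = np.sum((yp - Xp @ b_p) ** 2) / (N * T - k - 1)

    print(s2_lsdv, s2_pooled - s2_lsdv)             # both approximately 1.0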
Example:
Example 13.4 of Greene on p. 299.
2.3 Hausman's Specification Test for the Random Effects Model
Fixed Effects Model: costly in terms of the degrees of freedom lost.
Random Effects Model: little justification for treating the individual effects as uncorrelated with the other regressors.
The specification test developed by Hausman (1978) is used to test for orthogonality of the random effects and the regressors. Under the null hypothesis of no correlation, both the OLS estimator \hat{\beta} in the LSDV model and the GLS estimator \tilde{\beta} in the random effects model are consistent, but OLS is inefficient,¹ whereas under the alternative, OLS is consistent but GLS is not. Therefore, under the null hypothesis, the two estimates should not differ systematically, and a test can be based on the difference.
The essential ingredient for the test is the covariance matrix of the difference vector [\hat{\beta} - \tilde{\beta}]:
Var[\hat{\beta} - \tilde{\beta}] = Var[\hat{\beta}] + Var[\tilde{\beta}] - Cov[\hat{\beta}, \tilde{\beta}] - Cov[\hat{\beta}, \tilde{\beta}]'.
Hausman's essential result is that the covariance of an efficient estimator with its difference from an inefficient estimator is zero, which implies
Cov[(\hat{\beta} - \tilde{\beta}), \tilde{\beta}] = Cov[\hat{\beta}, \tilde{\beta}] - Var[\tilde{\beta}] = 0, (8)
¹ Referring to the GLS matrix-weighted average given earlier, we see that the efficient weight uses \theta, whereas OLS sets \theta = 1.
or that
Cov[\hat{\beta}, \tilde{\beta}] = Var[\tilde{\beta}].
Inserting this result into the covariance matrix of the difference vector above produces the required covariance matrix for the test,
Var[\hat{\beta} - \tilde{\beta}] = Var[\hat{\beta}] - Var[\tilde{\beta}] = \Psi.
The chi-squared test is based on the Wald criterion:
W = [\hat{\beta} - \tilde{\beta}]'\hat{\Psi}^{-1}[\hat{\beta} - \tilde{\beta}].
For \hat{\Psi}, we use the estimated covariance matrix of the slope estimator in the LSDV model and the estimated covariance matrix in the random effects model, excluding the constant term. Under the null hypothesis, W is asymptotically distributed as chi-squared with k degrees of freedom.
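
Assembling the statistic is straightforward once the two sets of slope estimates and covariance matrices (excluding the constant) are available; the arrays below are placeholders for illustration:

    import numpy as np
    from scipy import stats

    b_fe = np.array([1.02, -0.48])                  # hypothetical LSDV slopes
    b_re = np.array([0.99, -0.51])                  # hypothetical GLS slopes
    V_fe = np.array([[0.010, 0.001], [0.001, 0.012]])
    V_re = np.array([[0.008, 0.001], [0.001, 0.009]])

    d = b_fe - b_re
    Psi = V_fe - V_re                               # variance of the difference under H0
    W = d @ np.linalg.inv(Psi) @ d
    p_value = stats.chi2.sf(W, df=len(d))           # chi-squared with k df
    print(W, p_value)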
Exercise:
Reproduce the results of Example 13.5 and Table 13.2 on p. 302 of Greene.