Economics 20 - Prof. Anderson
Panel Data Methods
$y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + u_{it}$
A True Panel vs.
A Pooled Cross Section
We often loosely use the term panel data to
refer to any data set that has both a
cross-sectional dimension and a time-series
dimension
More precisely, it is only data that follow the
same cross-section units over time
Otherwise it is a pooled cross section
Pooled Cross Sections
We may want to pool cross sections just to
get bigger sample sizes
We may want to pool cross sections to
investigate the effect of time
We may want to pool cross sections to
investigate whether relationships have
changed over time
Difference-in-Differences
Suppose units are randomly assigned to treatment
and control groups, as in a medical experiment
One can then simply compare the change in
outcomes across the treatment and control
groups to estimate the treatment effect
For times 1, 2 and groups A, B, with $\bar{y}_{t,g}$ the mean
outcome of group g in period t,
$(\bar{y}_{2,B} - \bar{y}_{2,A}) - (\bar{y}_{1,B} - \bar{y}_{1,A})$,
or equivalently
$(\bar{y}_{2,B} - \bar{y}_{1,B}) - (\bar{y}_{2,A} - \bar{y}_{1,A})$,
is the difference-in-differences
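The two equivalent forms above can be checked numerically. The following is a minimal sketch, assuming made-up outcome values and treating group B as the treated group; the function name `diff_in_diff` and the arrays are hypothetical and not part of the lecture.

```python
import numpy as np

def diff_in_diff(y, group_b, period2):
    """Return the difference-in-differences of group means, computed both ways."""
    y = np.asarray(y, dtype=float)
    group_b = np.asarray(group_b, dtype=bool)
    period2 = np.asarray(period2, dtype=bool)

    def cell_mean(in_b, in_2):
        # Mean outcome for one group/period cell
        return y[(group_b == in_b) & (period2 == in_2)].mean()

    # (ybar_2B - ybar_2A) - (ybar_1B - ybar_1A)
    across_groups = (cell_mean(True, True) - cell_mean(False, True)) - (
        cell_mean(True, False) - cell_mean(False, False)
    )
    # (ybar_2B - ybar_1B) - (ybar_2A - ybar_1A)
    across_time = (cell_mean(True, True) - cell_mean(True, False)) - (
        cell_mean(False, True) - cell_mean(False, False)
    )
    return across_groups, across_time

# Illustrative (invented) data: two observations per group/period cell
y = [10, 12, 11, 13, 14, 16, 18, 22]
group_b = [0, 0, 1, 1, 0, 0, 1, 1]   # 1 = group B
period2 = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = period 2

did_groups, did_time = diff_in_diff(y, group_b, period2)
print(did_groups, did_time)
```

Both orderings of the subtraction give the same number because they are rearrangements of the same four cell means.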
Difference-in-Differences (cont)
A regression framework using time and
treatment dummy variables can calculate
this difference-in-differences as well
Consider the model
$y_{it} = \beta_0 + \beta_1 treatment_{it} + \beta_2 after_{it} + \beta_3 treatment_{it} \cdot after_{it} + u_{it}$
The estimated $\beta_3$ will be the
difference-in-differences in the group means
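As a rough illustration of the regression form, the sketch below fits the dummy-variable model by least squares on invented data and reads off the interaction coefficient. The data values and variable names are assumptions made for the example; only `numpy.linalg.lstsq` is used, so no standard errors are produced.

```python
import numpy as np

# Illustrative (invented) data: two observations per group/period cell
y = np.array([10, 12, 11, 13, 14, 16, 18, 22], dtype=float)
treatment = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)  # 1 = treated group
after = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)      # 1 = second period

# Design matrix: intercept, treatment, after, treatment*after
X = np.column_stack([np.ones_like(y), treatment, after, treatment * after])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta

# Difference-in-differences computed directly from the four cell means
def cell_mean(t, a):
    return y[(treatment == t) & (after == a)].mean()

did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(b3, did_means)
```

Because the model has one parameter per group/period cell, the fitted values are the cell means, which is why the interaction coefficient matches the direct calculation.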
Difference-in-Differences (cont)
When we don’t truly have random assignment,
the regression form becomes very useful
Additional x’s can be added to the
regression to control for differences across
the treatment and control groups
This setup is sometimes referred to as a “natural
experiment,” especially when a policy
change is being analyzed
Two-Period Panel Data
It’s possible to use a panel just like pooled
cross sections, but we can do more than that
Panel data can be used to address some
kinds of omitted variable bias
If we can think of the omitted variables as
being fixed over time, then we can model the
error as a composite error
Unobserved Fixed Effects
Suppose the population model is
$y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}$
Here we have added a time-constant
component to the error, $v_{it} = a_i + u_{it}$
If $a_i$ is correlated with the x’s, OLS will be
biased, since $a_i$ is part of the error term
With panel data, we can difference out the
unobserved fixed effect
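To see the bias described above, here is a small simulation sketch. Everything in it is an assumption for illustration: a single regressor, a true slope of 2, and a fixed effect $a_i$ that enters $x_{it}$ directly so that the two are correlated by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                                   # number of cross-section units (assumed)

a = rng.normal(size=n)                     # unobserved fixed effect a_i
unit = np.repeat(np.arange(n), 2)          # two periods per unit
d2 = np.tile([0.0, 1.0], n)                # dummy for the second period

x = a[unit] + rng.normal(size=2 * n)       # x is correlated with a_i by construction
u = rng.normal(size=2 * n)                 # idiosyncratic error
y = 1.0 + 0.5 * d2 + 2.0 * x + a[unit] + u # true slope on x is 2

# Pooled OLS that ignores a_i, leaving it in the composite error
X = np.column_stack([np.ones(2 * n), d2, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
pooled_slope = coef[2]
print(pooled_slope)
```

In this particular design the pooled slope settles near 2.5 rather than 2, since the covariance between $x$ and $a_i$ (1) divided by the variance of $x$ (2) is added to the true coefficient.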
First-differences
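We can subtract one period from the other
to obtain
$\Delta y_i = \delta_0 + \beta_1 \Delta x_{i1} + \dots + \beta_k \Delta x_{ik} + \Delta u_i$
The fixed effect $a_i$ has dropped out, so as long as the
x’s are uncorrelated with $u_{it}$ in both periods, the
$\Delta x$’s are uncorrelated with $\Delta u_i$ and there is no bias
We need to be careful about the organization of
the data to be sure we compute the correct change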
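The sketch below illustrates both points on simulated data: sorting a long-format panel by unit and period before differencing, and then estimating the differenced equation by OLS. The data-generating values (true slope 2, period effect 0.5) and all variable names are assumptions for the example, and the reshape step assumes a balanced panel with exactly two rows per unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000                                     # number of units (assumed)

a = rng.normal(size=n)                       # unobserved fixed effect a_i
unit = np.repeat(np.arange(n), 2)
period = np.tile([1, 2], n)
x = a[unit] + rng.normal(size=2 * n)         # x correlated with a_i
d2 = (period == 2).astype(float)
y = 1.0 + 0.5 * d2 + 2.0 * x + a[unit] + rng.normal(size=2 * n)

# Shuffle the rows to mimic a data set that is not stored in panel order
perm = rng.permutation(2 * n)
unit, period, x, y = unit[perm], period[perm], x[perm], y[perm]

# Sort by unit, then by period (np.lexsort treats the LAST key as primary)
order = np.lexsort((period, unit))
x_wide = x[order].reshape(n, 2)              # column 0 = period 1, column 1 = period 2
y_wide = y[order].reshape(n, 2)

# Period 2 minus period 1 for each unit
dx = x_wide[:, 1] - x_wide[:, 0]
dy = y_wide[:, 1] - y_wide[:, 0]

# OLS on the differenced equation: dy = delta0 + beta1 * dx + du
X = np.column_stack([np.ones(n), dx])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
delta0_hat, beta1_hat = coef
print(delta0_hat, beta1_hat)
```

Skipping the sort would subtract rows belonging to different units, which is the kind of mistake the warning about data organization is aimed at.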
Differencing w/ Multiple Periods
We can extend this method to more periods
Simply difference adjacent periods
So with 3 periods, subtract period 1 from
period 2 and period 2 from period 3, giving 2
observations per individual
Simply estimate by OLS, assuming the $\Delta u_{it}$
are uncorrelated over time
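A three-period version can be sketched as follows. The simulation is an illustration only: the idiosyncratic error is generated as a random walk so that its first differences are uncorrelated over time, matching the assumption stated above, and a dummy for the second differenced period is included to absorb the change in period effects. All numeric values and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 2000, 3                                   # units and periods (assumed)

a = rng.normal(size=n)                           # unobserved fixed effect a_i
x = a[:, None] + rng.normal(size=(n, T))         # x correlated with a_i
u = np.cumsum(rng.normal(size=(n, T)), axis=1)   # random walk, so diffs of u are uncorrelated
time_effect = np.array([0.0, 0.5, 0.8])          # period intercept shifts (assumed)
y = 1.0 + time_effect + 2.0 * x + a[:, None] + u # true slope on x is 2

# Difference adjacent periods: (period 2 - period 1), (period 3 - period 2)
dy = np.diff(y, axis=1)                          # shape (n, 2)
dx = np.diff(x, axis=1)

# Stack the two differenced observations per individual and estimate by OLS
d3 = np.tile([0.0, 1.0], n)                      # 1 for the period-3 difference
X = np.column_stack([np.ones(2 * n), d3, dx.ravel()])
coef, *_ = np.linalg.lstsq(X, dy.ravel(), rcond=None)
beta_hat = coef[2]
print(dy.shape, beta_hat)
```

If the differenced errors were correlated over time, the point estimate from this pooled regression would still be usable, but the usual OLS standard errors would not be reliable.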