Economics 20 - Prof. Anderson
Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + . . . + βkxk + u
5. Dummy Variables
Dummy Variables
A dummy variable is a variable that takes on only the value 1 or 0
Examples: male (= 1 if male, 0 otherwise), south (= 1 if in the south, 0 otherwise), etc.
Dummy variables are also called binary variables, since they take only two values
A Dummy Independent Variable
Consider a simple model with one continuous variable (x) and one dummy (d):
y = β0 + δ0d + β1x + u
This can be interpreted as an intercept shift:
If d = 0, then y = β0 + β1x + u
If d = 1, then y = (β0 + δ0) + β1x + u
The case of d = 0 is the base group
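To see the intercept shift in numbers, here is a minimal sketch using numpy least squares on hypothetical, noise-free data. The parameter values (intercept 1.0, shift 2.0, slope 0.5) are assumptions chosen for illustration, so OLS recovers them exactly: the dummy's coefficient is the vertical gap between two parallel lines.

```python
import numpy as np

# Hypothetical noise-free data from y = beta0 + delta0*d + beta1*x,
# with assumed values beta0 = 1.0, delta0 = 2.0, beta1 = 0.5.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = 1.0 + 2.0 * d + 0.5 * x

# Design matrix: intercept column, the dummy, the continuous regressor.
X = np.column_stack([np.ones_like(x), d, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0_hat, delta0_hat, beta1_hat = coef

# Both groups share the slope; only the intercept differs.
print("base-group intercept (d = 0):", beta0_hat)
print("intercept for d = 1:", beta0_hat + delta0_hat)
print("common slope:", beta1_hat)
```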
[Figure: Example of δ0 > 0. Two parallel lines with common slope β1: y = β0 + β1x for d = 0 (intercept β0) and y = (β0 + δ0) + β1x for d = 1, shifted up by δ0.]
Dummies for Multiple Categories
We can use dummy variables to control for something with multiple categories
Suppose everyone in your data is either a HS dropout, a HS grad only, or a college grad
To compare HS and college grads to HS dropouts, include 2 dummy variables:
hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise
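A minimal sketch of building these dummies from a categorical variable, in plain Python. The helper name `make_dummies` and the sample labels are hypothetical; the helper creates one 0/1 variable per category and leaves out the base group (here, dropouts).

```python
def make_dummies(labels, base):
    """Hypothetical helper: one 0/1 list per category, omitting the base group."""
    categories = sorted(set(labels) - {base})
    return {c: [1 if lab == c else 0 for lab in labels] for c in categories}


# Hypothetical sample; "dropout" is the base group.
education = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout"]
dummies = make_dummies(education, base="dropout")
# dummies["hsgrad"]  == [0, 1, 0, 1, 0]
# dummies["colgrad"] == [0, 0, 1, 0, 0]
```

A dropout has 0 in both columns, which is why the intercept represents the base group.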
Multiple Categories (cont)
Any categorical variable can be turned into a set of dummy variables
Because the base group is represented by the intercept, if there are n categories there should be n - 1 dummy variables
If there are a lot of categories, it may make sense to group some together
Example: rankings grouped as top 10, 11-25, etc.
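Why n - 1 and not n? If all n dummies are included along with an intercept, the dummies add up to the intercept column, so the regressors are perfectly collinear and OLS has no unique solution. This sketch on a hypothetical sample checks that with `numpy.linalg.matrix_rank`.

```python
import numpy as np

# Hypothetical sample with n = 3 education categories.
education = ["dropout", "hsgrad", "colgrad", "hsgrad", "dropout", "colgrad"]
dropout = np.array([e == "dropout" for e in education], dtype=float)
hsgrad = np.array([e == "hsgrad" for e in education], dtype=float)
colgrad = np.array([e == "colgrad" for e in education], dtype=float)
const = np.ones(len(education))

# All three dummies plus an intercept: dropout + hsgrad + colgrad == const,
# so the four columns are linearly dependent.
X_trap = np.column_stack([const, dropout, hsgrad, colgrad])
# Omitting the base group leaves n - 1 = 2 dummies and full column rank.
X_ok = np.column_stack([const, hsgrad, colgrad])

print(np.linalg.matrix_rank(X_trap))  # 3, fewer than its 4 columns
print(np.linalg.matrix_rank(X_ok))    # 3, equal to its 3 columns
```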
Interactions Among Dummies
Interacting dummy variables is like subdividing the group
Example: have dummies for male, as well as hsgrad and colgrad
Add male*hsgrad and male*colgrad, for a total of 5 dummy variables, giving 6 categories
Base group is female HS dropouts
hsgrad is for female HS grads, colgrad is for female college grads, and male is for male HS dropouts
The interactions reflect male HS grads and male college grads
More on Dummy Interactions
Formally, the model is
y = β0 + δ1male + δ2hsgrad + δ3colgrad + δ4male*hsgrad + δ5male*colgrad + β1x + u
Then, for example:
If male = 0, hsgrad = 0, and colgrad = 0: y = β0 + β1x + u
If male = 0, hsgrad = 1, and colgrad = 0: y = (β0 + δ2) + β1x + u
If male = 1, hsgrad = 0, and colgrad = 1: y = (β0 + δ1 + δ3 + δ5) + β1x + u
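The case-by-case substitution generalizes to all six groups. Below is a small sketch: the function `group_intercept` and the coefficient values are hypothetical, used only to show how each combination of dummies picks out a sum of coefficients. Every group shares the slope on x; only the intercept changes.

```python
def group_intercept(coef, male, hsgrad, colgrad):
    """Hypothetical helper: intercept of one group in the interacted model.

    coef holds estimates of beta0, delta1 (male), delta2 (hsgrad),
    delta3 (colgrad), delta4 (male*hsgrad), delta5 (male*colgrad).
    """
    return (coef["beta0"] + coef["delta1"] * male + coef["delta2"] * hsgrad
            + coef["delta3"] * colgrad + coef["delta4"] * male * hsgrad
            + coef["delta5"] * male * colgrad)


# Made-up estimates, for illustration only.
coef = {"beta0": 10.0, "delta1": 1.0, "delta2": 2.0,
        "delta3": 5.0, "delta4": 0.5, "delta5": 1.5}

# The six categories: (male, hsgrad, colgrad); dropouts have hsgrad = colgrad = 0.
for male in (0, 1):
    for hsgrad, colgrad in ((0, 0), (1, 0), (0, 1)):
        print(male, hsgrad, colgrad, group_intercept(coef, male, hsgrad, colgrad))
```

For instance, male college grads get beta0 + delta1 + delta3 + delta5 = 17.5 with these made-up numbers.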
Other Interactions with Dummies
We can also interact a dummy variable, d, with a continuous variable, x:
y = β0 + δ1d + β1x + δ2d*x + u
If d = 0, then y = β0 + β1x + u
If d = 1, then y = (β0 + δ1) + (β1 + δ2)x + u
This is interpreted as a change in the slope: δ2 shifts the slope, while δ1 still shifts the intercept
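A minimal sketch of estimating the slope shift, again on hypothetical noise-free data. The assumed values (beta0 = 1.0, delta1 = 2.0, beta1 = 0.5, delta2 = -0.3) are illustrative; the interaction column d*x lets the d = 1 group have its own slope, beta1 + delta2.

```python
import numpy as np

# Hypothetical data from y = beta0 + delta1*d + beta1*x + delta2*d*x,
# with assumed values beta0 = 1.0, delta1 = 2.0, beta1 = 0.5, delta2 = -0.3.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
y = 1.0 + 2.0 * d + 0.5 * x - 0.3 * d * x

# Columns: intercept, dummy, x, and the interaction d*x.
X = np.column_stack([np.ones_like(x), d, x, d * x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
beta0_hat, delta1_hat, beta1_hat, delta2_hat = coef

print("slope for d = 0:", beta1_hat)
print("slope for d = 1:", beta1_hat + delta2_hat)
```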
[Figure: Example of δ1 > 0 and δ2 < 0. The line y = β0 + β1x for d = 0 and the line y = (β0 + δ1) + (β1 + δ2)x for d = 1; the d = 1 line has a higher intercept but a smaller slope.]
Testing for Differences Across Groups
Testing whether a regression function is different for one group versus another can be thought of as simply testing for the joint significance of the dummy and its interactions with all other x variables
So you can estimate the model with and without all the interactions and form an F statistic, but this can be unwieldy
The Chow Test
It turns out you can compute the proper F statistic without running the unrestricted model with interactions with all k continuous variables
Estimate the same model (without interactions) on group one only to get SSR1, then on group two only to get SSR2
Estimate that model on all observations (the restricted model) to get SSR, then
$$F = \frac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$$
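The Chow statistic only needs the three sums of squared residuals, n, and k: F = [SSR - (SSR1 + SSR2)] / (SSR1 + SSR2) times [n - 2(k+1)] / (k+1). A minimal sketch follows; the function name `chow_f` and the SSR values are hypothetical. Here k counts slope coefficients, so there are k + 1 restrictions (intercept plus slopes).

```python
def chow_f(ssr_pooled, ssr1, ssr2, n, k):
    """Hypothetical helper: Chow F statistic from the three SSRs.

    k is the number of slope coefficients, so each group's model
    has k + 1 parameters (intercept plus k slopes).
    """
    ssr_ur = ssr1 + ssr2          # unrestricted SSR
    q = k + 1                     # number of restrictions
    df = n - 2 * (k + 1)          # residual df of the unrestricted model
    if df <= 0:
        raise ValueError("need n > 2(k + 1)")
    return ((ssr_pooled - ssr_ur) / ssr_ur) * (df / q)


# Made-up values: (20 / 100) * (100 / 3), about 6.67.
F = chow_f(120.0, 40.0, 60.0, n=106, k=2)
```

The result is compared with critical values of an F distribution with k + 1 and n - 2(k + 1) degrees of freedom.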
The Chow Test (continued)
The Chow test is really just a simple F test for exclusion restrictions, but we’ve realized that SSRur = SSR1 + SSR2
Note: we have k + 1 restrictions (each of the slope coefficients and the intercept)
Note: the unrestricted model would estimate 2 different intercepts and 2 different sets of k slope coefficients, so the df is n - 2k - 2
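The claim SSRur = SSR1 + SSR2 can be checked numerically. This sketch simulates two hypothetical groups (the coefficients and sample sizes are assumptions), then compares the SSR from the fully interacted pooled model with the sum of the two separate-regression SSRs, and forms the Chow F from them.

```python
import numpy as np


def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


rng = np.random.default_rng(0)
n1, n2, k = 50, 60, 2  # hypothetical group sizes and number of slopes
X1 = np.column_stack([np.ones(n1), rng.normal(size=(n1, k))])
X2 = np.column_stack([np.ones(n2), rng.normal(size=(n2, k))])
y1 = X1 @ np.array([1.0, 0.5, -0.2]) + rng.normal(size=n1)
y2 = X2 @ np.array([2.0, 0.5, 0.3]) + rng.normal(size=n2)

# Separate regressions, one per group.
ssr1, ssr2 = ssr(X1, y1), ssr(X2, y2)

# Fully interacted pooled model: the group dummy g times every column,
# including the intercept column.
X = np.vstack([X1, X2])
y = np.concatenate([y1, y2])
g = np.concatenate([np.zeros(n1), np.ones(n2)])
X_int = np.column_stack([X, g[:, None] * X])
ssr_ur = ssr(X_int, y)

# Restricted model: one set of coefficients for everyone.
ssr_r = ssr(X, y)
n = n1 + n2
F = ((ssr_r - ssr_ur) / ssr_ur) * ((n - 2 * (k + 1)) / (k + 1))

print(ssr_ur, ssr1 + ssr2)  # equal up to rounding error
print("Chow F:", F)
```

The equality holds because the interacted model is just a reparameterization of fitting each group separately.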
Linear Probability Model
P(y = 1|x) = E(y|x) when y is a binary variable, so we can write our model as
P(y = 1|x) = β0 + β1x1 + … + βkxk
So the interpretation of βj is the change in the probability of success when xj changes by one unit, holding the other factors fixed
The predicted y is the predicted probability of success
Potential problem: predicted probabilities can fall outside [0,1]
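A minimal sketch of a linear probability model fit by OLS on a tiny hypothetical binary outcome. The fitted values are read as predicted probabilities, and because the line is unbounded, some of them leave [0, 1].

```python
import numpy as np

# Hypothetical data: y = 1 for "success", 0 otherwise; one regressor x.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
p_hat = X @ coef  # fitted values = predicted probabilities of success

# Slope: estimated change in P(y = 1 | x) per one-unit increase in x.
print("slope:", coef[1])
print("min/max predicted probability:", p_hat.min(), p_hat.max())
```

With these numbers the slope is 11.5 / 82.5 (about 0.14), the prediction at x = 0 is slightly below 0, and the prediction at x = 9 is above 1.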
Linear Probability Model (cont)
Even without predictions outside of [0,1], we may estimate effects implying that a change in x changes the probability by more than +1 or -1, so it is best to consider changes near the mean of x
This model violates the homoskedasticity assumption, because Var(y|x) = p(x)[1 - p(x)], where p(x) = P(y = 1|x), depends on x; so the usual standard errors and tests are not valid
Despite these drawbacks, it’s usually a good place to start when y is binary
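One common response to the built-in heteroskedasticity is to use heteroskedasticity-robust standard errors. The sketch below computes both the usual OLS standard errors and White's HC0 sandwich estimator, (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, on the same kind of hypothetical binary data; HC0 is one of several robust variants, chosen here for simplicity.

```python
import numpy as np

# Hypothetical binary outcome and a single regressor.
x = np.arange(10, dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y
resid = y - X @ coef

# Usual OLS standard errors: assume a constant error variance.
sigma2 = resid @ resid / (n - p)
se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) robust standard errors: weight each row by its squared residual.
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

print("usual SEs: ", se_usual)
print("robust SEs:", se_robust)
```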
Caveats on Program Evaluation
A typical use of a dummy variable is when we are looking for a program effect
For example, we may have individuals who received job training, or welfare, etc.
We need to remember that individuals usually choose whether to participate in a program, which may lead to a self-selection problem
Self-selection Problems
If we can control for everything that is correlated with both participation and the outcome of interest, then it’s not a problem
Often, though, there are unobservables that are correlated with participation (and with the outcome)
In this case, the estimate of the program effect is biased, and we don’t want to set policy based on it!
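A small simulation can make the self-selection bias concrete. Everything here is a hypothetical data-generating process: an unobserved "ability" raises both the chance of entering training and the wage, and the true training effect is set to 1.0. Regressing wage on the training dummy alone overstates the effect; adding ability recovers it, but in real data ability would not be observed, which is exactly the problem.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Hypothetical DGP: ability is unobserved by the analyst in practice.
ability = rng.normal(size=n)
# Self-selection: higher-ability people are more likely to enroll.
train = (ability + rng.normal(size=n) > 0).astype(float)
true_effect = 1.0
wage = 2.0 + true_effect * train + 1.5 * ability + rng.normal(size=n)


def ols(X, y):
    """OLS coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


const = np.ones(n)
naive = ols(np.column_stack([const, train]), wage)[1]
controlled = ols(np.column_stack([const, train, ability]), wage)[1]

print("naive program effect:", naive)            # biased upward
print("controlling for ability:", controlled)    # close to true_effect
```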