Economics 20 - Prof. Anderson
Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + ... + βkxk + u
1. Estimation
Parallels with Simple Regression
β0 is still the intercept
β1 to βk are all called slope parameters
u is still the error term (or disturbance)
Still need to make a zero conditional mean
assumption, so now assume that
E(u|x1,x2,…,xk) = 0
Still minimizing the sum of squared
residuals, so we have k+1 first order conditions
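A minimal sketch of the estimation step, using simulated data (the coefficients, sample size and seed are assumptions made for illustration only). It solves the normal equations for the k+1 coefficients and checks that the k+1 first order conditions, X'û = 0, hold up to rounding error.

```python
import numpy as np

rng = np.random.default_rng(0)               # arbitrary seed, simulated data only
n, k = 200, 2
x = rng.normal(size=(n, k))                  # two hypothetical regressors
u = rng.normal(size=n)                       # error term
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + u  # assumed "true" model

X = np.column_stack([np.ones(n), x])         # add the intercept column
# Solve the normal equations (X'X) b = X'y for the k+1 coefficients.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# The k+1 first order conditions: X' u_hat = 0 (up to rounding error).
foc = X.T @ resid
print(beta_hat)
print(np.allclose(foc, 0.0, atol=1e-8))
```

The first row of X'û = 0 says the residuals sum to zero; the remaining k rows say each regressor is uncorrelated with the residuals in the sample.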
Interpreting Multiple Regression
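$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_k x_k,$$
so
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1 + \hat{\beta}_2 \Delta x_2 + \dots + \hat{\beta}_k \Delta x_k,$$
so holding x2,…,xk fixed implies that
$$\Delta\hat{y} = \hat{\beta}_1 \Delta x_1,$$
that is, each β has a ceteris paribus interpretation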
A “Partialling Out” Interpretation
Consider the case where k = 2, i.e.
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2,$$
then
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \hat{r}_{i1} y_i}{\sum_{i=1}^{n} \hat{r}_{i1}^2},$$
where the $\hat{r}_{i1}$ are the residuals from the estimated regression
$$\hat{x}_1 = \hat{\gamma}_0 + \hat{\gamma}_2 x_2$$
“Partialling Out” continued
The previous equation implies that regressing y
on x1 and x2 gives the same estimated effect of x1 as
regressing y on the residuals from a regression
of x1 on x2
This means only the part of xi1 that is
uncorrelated with xi2 is being related to yi,
so we’re estimating the effect of x1 on y
after x2 has been “partialled out”
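The partialling-out result can be checked numerically. The sketch below uses simulated data (all coefficients and the seed are illustrative assumptions) and a small helper `ols`, which is a hypothetical name introduced here. It compares the coefficient on x1 from the full regression with the slope obtained from the residuals of x1 on x2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)      # x1 is correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Full regression of y on (1, x1, x2).
b_full = ols(np.column_stack([ones, x1, x2]), y)

# Step 1: regress x1 on (1, x2) and keep the residuals r1.
Z = np.column_stack([ones, x2])
r1 = x1 - Z @ ols(Z, x1)
# Step 2: slope from the partialling-out formula sum(r1 * y) / sum(r1^2).
b1_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b1_partial)            # the two numbers should agree
```

The agreement is exact in any sample, not only on average, because it follows from the algebra of least squares.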
Simple vs Multiple Reg Estimate
Compare the simple regression
$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1$$
with the multiple regression
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$
Generally, $\tilde{\beta}_1 \neq \hat{\beta}_1$ unless:
$\hat{\beta}_2 = 0$ (i.e. no partial effect of x2) OR
x1 and x2 are uncorrelated in the sample
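One way to see why these are the two special cases is a standard in-sample identity, stated here as background rather than taken from the slide: the simple-regression slope equals the multiple-regression slope plus the coefficient on x2 times the slope from regressing x2 on x1. The sketch below checks this on simulated data; the data-generating values and the helper name `ols` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_tilde = ols(np.column_stack([ones, x1]), y)       # simple regression
b_hat = ols(np.column_stack([ones, x1, x2]), y)     # multiple regression
delta = ols(np.column_stack([ones, x1]), x2)        # x2 regressed on x1

lhs = b_tilde[1]                                    # simple-regression slope
rhs = b_hat[1] + b_hat[2] * delta[1]                # multiple slope + adjustment
print(lhs, rhs)
```

If either the estimated coefficient on x2 or the sample slope of x2 on x1 is zero, the adjustment term vanishes and the two slopes coincide.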
Goodness-of-Fit
We can think of each observation as being made up of
an explained part and an unexplained part,
$$y_i = \hat{y}_i + \hat{u}_i$$
We then define the following:
$\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares (SST)
$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the explained sum of squares (SSE)
$\sum_{i=1}^{n} \hat{u}_i^2$ is the residual sum of squares (SSR)
Then SST = SSE + SSR
Goodness-of-Fit (continued)
How do we think about how well our
sample regression line fits our sample data?
We can compute the fraction of the total sum
of squares (SST) that is explained by the
model; call this the R-squared of the regression
R² = SSE/SST = 1 – SSR/SST
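As an illustration, the three sums of squares and both expressions for R-squared can be computed directly. This is a sketch on simulated data; the model and seed are assumptions, and the regression includes an intercept, which the decomposition SST = SSE + SSR relies on.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # intercept included
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat                           # explained part
u_hat = y - y_hat                              # unexplained part (residuals)

sst = np.sum((y - y.mean()) ** 2)              # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)          # explained sum of squares
ssr = np.sum(u_hat ** 2)                       # residual sum of squares

r2_a = sse / sst
r2_b = 1.0 - ssr / sst
print(sst, sse + ssr)
print(r2_a, r2_b)
```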
Goodness-of-Fit (continued)
We can also think of R² as being equal to
the squared correlation coefficient between
the actual $y_i$ and the fitted values $\hat{y}_i$:
$$R^2 = \frac{\left( \sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}}) \right)^2}{\left( \sum_{i=1}^{n} (y_i - \bar{y})^2 \right) \left( \sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2 \right)}$$
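A short check of the squared-correlation interpretation, again on simulated data chosen only for illustration. It compares 1 - SSR/SST with the squared sample correlation between the actual and fitted values, computed with `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 120
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 + 0.7 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
# Squared sample correlation between actual and fitted values.
corr_sq = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2, corr_sq)
```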
More about R-squared
R² can never decrease when another
independent variable is added to a
regression, and usually will increase
Because R² will usually increase with the
number of independent variables, it is not a
good way to compare models
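To make the point concrete, the sketch below adds a regressor that is pure noise, unrelated to y by construction, and compares R-squared before and after. The data are simulated, and the helper `r_squared` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 80
x1 = rng.normal(size=n)
noise = rng.normal(size=n)                 # irrelevant regressor, unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def r_squared(X, y):
    """R-squared from regressing y on the columns of X (X includes a constant)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = np.sum((y - X @ beta) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - ssr / sst

ones = np.ones(n)
r2_small = r_squared(np.column_stack([ones, x1]), y)
r2_big = r_squared(np.column_stack([ones, x1, noise]), y)
print(r2_small, r2_big)                    # r2_big is never below r2_small
```

The larger model can always reproduce the smaller one by setting the extra coefficient to zero, so its minimized SSR cannot be higher.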
Assumptions for Unbiasedness
Population model is linear in parameters:
y = β0 + β1x1 + β2x2 + … + βkxk + u
We can use a random sample of size n, {(xi1,
xi2, …, xik, yi): i = 1, 2, …, n}, from the
population model, so that the sample model
is yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui
E(u|x1,x2,…,xk) = 0, implying that all of the
explanatory variables are exogenous
None of the x’s is constant, and there are no
exact linear relationships among them
Too Many or Too Few Variables
What happens if we include variables in
our specification that don’t belong?
Including them introduces no bias in our parameter
estimates, so OLS remains unbiased
What if we exclude a variable from our
specification that does belong?
OLS will usually be biased
Omitted Variable Bias
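Suppose the true model is given as
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$
but we estimate
$$y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \tilde{u},$$
then
$$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) y_i}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}$$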
Omitted Variable Bias (cont)
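Recall the true model, so that
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i,$$
so the numerator becomes
$$\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i) = \beta_1 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2 + \beta_2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1) x_{i2} + \sum_{i=1}^{n} (x_{i1} - \bar{x}_1) u_i$$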
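Omitted Variable Bias (cont)
$$\tilde{\beta}_1 = \beta_1 + \beta_2 \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2} + \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) u_i}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}$$
since $E(u_i) = 0$, taking expectations we have
$$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}$$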
Omitted Variable Bias (cont)
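Consider the regression of x2 on x1:
$$\tilde{x}_2 = \tilde{\delta}_0 + \tilde{\delta}_1 x_1,$$
then
$$\tilde{\delta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1) x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2},$$
so
$$E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$$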
Summary of Direction of Bias
         Corr(x1,x2) > 0   Corr(x1,x2) < 0
β2 > 0   Positive bias     Negative bias
β2 < 0   Negative bias     Positive bias
Omitted Variable Bias Summary
Two cases where the bias is equal to zero:
- β2 = 0, that is, x2 doesn’t really belong in the model
- x1 and x2 are uncorrelated in the sample
If the correlation between x2 and x1 and the correlation
between x2 and y have the same sign, the bias will be positive
If the two correlations have opposite signs,
the bias will be negative
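The direction and size of the bias can be illustrated with a small simulation. This is a sketch under stated assumptions: the true coefficients, the positive link between x1 and x2, and the seed are all made up, and the regressors are held fixed across replications so that the average of the short-regression slope can be compared with β1 + β2 times the sample slope of x2 on x1.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 200, 2000
beta0, beta1, beta2 = 1.0, 1.0, 2.0        # assumed true parameters

# Regressors are drawn once and held fixed across replications.
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)         # positively correlated with x1

d1 = x1 - x1.mean()
delta1 = (d1 @ x2) / (d1 @ d1)             # slope from regressing x2 on x1

# Redraw the errors, rebuild y, and rerun the short regression of y on x1.
U = rng.normal(size=(reps, n))
Y = beta0 + beta1 * x1 + beta2 * x2 + U    # shape (reps, n)
b1_tilde = (Y @ d1) / (d1 @ d1)            # one slope per replication

mean_b1 = b1_tilde.mean()
predicted = beta1 + beta2 * delta1
print(mean_b1, predicted)                  # both exceed beta1: positive bias
```

Flipping the sign of either `beta2` or the 0.5 coefficient linking x2 to x1 should move the average slope below `beta1`, matching the negative-bias cells of the summary table.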
The More General Case
Technically, we can only sign the bias in the
more general case if all of the included x’s
are uncorrelated
Typically, then, we work through the bias
assuming the x’s are uncorrelated, as a
useful guide even if this assumption is not
strictly true
Variance of the OLS Estimators
Now we know that the sampling
distribution of our estimate is centered
around the true parameter
We want to think about how spread out this
distribution is
It is much easier to think about this variance
under an additional assumption, so
assume Var(u|x1,x2,…,xk) = σ²
(Homoskedasticity)
Variance of OLS (cont)
Let x stand for (x1, x2, …, xk)
Assuming that Var(u|x) = σ² also implies
that Var(y|x) = σ²
The 4 assumptions for unbiasedness, plus
this homoskedasticity assumption, are
known as the Gauss-Markov assumptions
Variance of OLS (cont)
Given the Gauss-Markov Assumptions,
$$\mathrm{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \, (1 - R_j^2)},$$
where $SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$ and $R_j^2$ is the $R^2$
from regressing $x_j$ on all other x's
Components of OLS Variances
The error variance: a larger σ² implies a
larger variance for the OLS estimators
The total sample variation: a larger SSTj
implies a smaller variance for the estimators
Linear relationships among the independent
variables: a larger Rj² implies a larger
variance for the estimators
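The component formula can be cross-checked against the usual matrix expression σ²(X'X)⁻¹ for the conditional variance of the OLS estimator, which is a standard result brought in here as background rather than something stated on the slides. The sketch below uses simulated regressors and an assumed known error variance; `var_from_components` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 150
sigma2 = 4.0                               # assumed known error variance
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # fairly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

# Matrix form: Var(beta_hat | X) = sigma^2 (X'X)^{-1}.
var_matrix = sigma2 * np.linalg.inv(X.T @ X)

def var_from_components(j):
    """sigma^2 / (SST_j * (1 - R_j^2)) for slope j (j >= 1)."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)       # intercept plus the other x's
    fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    sst_j = np.sum((xj - xj.mean()) ** 2)
    r2_j = 1.0 - np.sum((xj - fitted) ** 2) / sst_j
    return sigma2 / (sst_j * (1.0 - r2_j))

by_formula = np.array([var_from_components(j) for j in (1, 2, 3)])
by_matrix = np.diag(var_matrix)[1:]        # drop the intercept entry
print(by_formula)
print(by_matrix)
```

In this setup the variances for x1 and x2 should come out larger than the one for x3, because x1 and x2 are strongly related to each other and so have larger Rj².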
Misspecified Models
Consider again the misspecified model
$$\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1,$$
so that
$$\mathrm{Var}(\tilde{\beta}_1) = \frac{\sigma^2}{SST_1}$$
Thus, $\mathrm{Var}(\tilde{\beta}_1) < \mathrm{Var}(\hat{\beta}_1)$ unless x1 and x2
are uncorrelated, then they're the same
Misspecified Models (cont)
While the variance of the estimator is
smaller for the misspecified model, unless
β2 = 0 the misspecified model is biased
As the sample size grows, the variance of
each estimator shrinks to zero, making the
variance difference less important
Estimating the Error Variance
We don’t know what the error variance, σ²,
is, because we don’t observe the errors, ui
What we observe are the residuals, ûi
We can use the residuals to form an
estimate of the error variance
Error Variance Estimate (cont)
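$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - k - 1} \equiv \frac{SSR}{df},$$
thus,
$$se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\left[ SST_j \, (1 - R_j^2) \right]^{1/2}}$$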
df = n – (k + 1), or df = n – k – 1
df (i.e. degrees of freedom) is the (number
of observations) – (number of estimated
parameters)
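A small simulation can show why the divisor is the degrees of freedom rather than n. This is a sketch with assumed values for n, k, the coefficients and the error variance. The regressors are held fixed, the errors are redrawn many times, and the average of SSR/df is compared with the average of SSR/n.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, reps = 30, 2, 5000
sigma2 = 4.0                               # assumed true error variance

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])          # assumed true coefficients

# Residual-maker matrix M = I - X (X'X)^{-1} X', so u_hat = M y.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

U = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
Y = X @ beta + U                           # each row is one simulated sample
resid = Y @ M                              # M is symmetric, so each row is M y
ssr = np.sum(resid ** 2, axis=1)

df = n - k - 1
mean_unbiased = np.mean(ssr / df)          # should be close to sigma2
mean_naive = np.mean(ssr / n)              # divides by n, too small on average
print(mean_unbiased, mean_naive)
```

With these settings the naive average should sit near σ²(n – k – 1)/n = 3.6, while the df-corrected average should be close to 4.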
The Gauss-Markov Theorem
Given our 5 Gauss-Markov Assumptions, it
can be shown that OLS is “BLUE”:
Best
Linear
Unbiased
Estimator
Thus, if the assumptions hold, use OLS