Chapter 3 Least Squares Methods for Estimating β
Methods for estimating β:
- Least squares estimation
- Maximum likelihood estimation
- Method of moments estimation
- Least absolute deviation estimation
- ...
3.1 Least squares estimation
The least squares criterion is
\[
\min_{b_0} \sum_{i=1}^{n} \bigl(y_i - X_i'b_0\bigr)^2
\quad\text{or, in matrix form,}\quad
\min_{b_0}\, (y - Xb_0)'(y - Xb_0).
\]
Let the objective function be
\[
S(b_0) = (y - Xb_0)'(y - Xb_0) = y'y - b_0'X'y - y'Xb_0 + b_0'X'Xb_0 = y'y - 2y'Xb_0 + b_0'X'Xb_0,
\]
where the two middle terms combine because $b_0'X'y = y'Xb_0$ is a scalar.
The first-order condition for the minimization of this function is
\[
\frac{\partial S(b_0)}{\partial b_0} = -2X'y + 2X'Xb_0 = 0.
\]
The solution of this equation is the least squares estimate of the coefficient vector β:
\[
b = (X'X)^{-1}X'y.
\]
If $\operatorname{rank}(X) = K$, then $\operatorname{rank}(X'X) = K$, so the inverse of $X'X$ exists.
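For concreteness, here is a minimal NumPy sketch of this computation. The simulated $y$ and $X$ (and all variable names) are illustrative assumptions, not part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3

# Simulated data: a design matrix with a constant column and two
# regressors, and y generated from illustrative true coefficients.
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

# b = (X'X)^{-1} X'y: solving the normal equations directly is
# numerically preferable to forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)  # should be close to (1, 2, -0.5)
```

In practice one would often call np.linalg.lstsq, which also handles rank-deficient $X$; the explicit normal equations are shown only to mirror the formula above.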
Let $e = y - Xb$; we call this the residual vector. We have
\[
\begin{aligned}
e &= y - Xb && (1) \\
  &= y - X(X'X)^{-1}X'y \\
  &= \bigl(I - X(X'X)^{-1}X'\bigr)y \\
  &= (I - P)y, && (2)
\end{aligned}
\]
where $P = X(X'X)^{-1}X'$. The matrix $P$ is called the projection matrix. We also let $M = I - P$. Then, using (2), we may write
\[
y = Xb + e = Py + My.
\]
We often write $Py = \hat{y}$. This is the part of $y$ that is explained by $X$.
Properties of the matrices $P$ and $M$ are:
(i) $P' = P$, $P^2 = P$ (idempotent matrix)
(ii) $M' = M$, $M^2 = M$
(iii) $PX = X$, $MX = 0$
(iv) $PM = 0$
Using (1) and (iii), we have
\[
X'e = X'My = 0.
\]
If the first column of $X$ is $1 = (1,\dots,1)'$, this relation implies
\[
1'e = \sum_{i=1}^{n} e_i = 0.
\]
In addition, (iv) gives (the cross terms $y'P'My$ vanish because $P'M = PM = 0$)
\[
y'y = y'P'Py + y'M'My = \hat{y}'\hat{y} + e'e.
\]
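These identities are easy to check numerically. A sketch continuing the example above (it reuses the X, y, b, n built in the first snippet):

```python
# P = X (X'X)^{-1} X' and M = I - P, built from the X, y, b, n above.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P
e = y - X @ b

# (i)-(iv): symmetry, idempotency, PX = X, MX = 0, PM = 0.
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
assert np.allclose(M, M.T) and np.allclose(M @ M, M)
assert np.allclose(P @ X, X) and np.allclose(M @ X, 0)
assert np.allclose(P @ M, 0)

# X'e = 0 (so the residuals sum to zero, X having a constant column),
# and y'y decomposes as yhat'yhat + e'e.
yhat = P @ y
assert np.allclose(X.T @ e, 0)
assert np.isclose(y @ y, yhat @ yhat + e @ e)
```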
3.2 Partitioned regression and partial regression
Consider
\[
y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon.
\]
The normal equations for $b_1$ and $b_2$ are
\[
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}.
\]
The first block of these equations is
\[
(X_1'X_1)b_1 + (X_1'X_2)b_2 = X_1'y,
\]
which gives
\[
b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2b_2 = (X_1'X_1)^{-1}X_1'(y - X_2b_2).
\]
Plugging this into the second block of the normal equations, we have
\[
\begin{aligned}
X_2'X_1b_1 + X_2'X_2b_2
&= X_2'X_1(X_1'X_1)^{-1}X_1'y - X_2'X_1(X_1'X_1)^{-1}X_1'X_2b_2 + X_2'X_2b_2 \\
&= X_2'X_1(X_1'X_1)^{-1}X_1'y + X_2'(I - P_{X_1})X_2b_2 \\
&= X_2'y,
\end{aligned}
\]
where $P_{X_1} = X_1(X_1'X_1)^{-1}X_1'$.
Thus
\[
b_2 = \bigl(X_2'(I - P_{X_1})X_2\bigr)^{-1}X_2'(I - P_{X_1})y.
\]
In the same manner,
\[
b_1 = \bigl(X_1'(I - P_{X_2})X_1\bigr)^{-1}X_1'(I - P_{X_2})y.
\]
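This is often called the Frisch-Waugh-Lovell result: $b_2$ can be recovered by partialling $X_1$ out of both $y$ and $X_2$. A numerical check, continuing the running example (splitting its X into the constant column and the rest):

```python
# Partition the X from the first snippet: constant column vs. the rest.
X1, X2 = X[:, :1], X[:, 1:]

# M1 = I - X1 (X1'X1)^{-1} X1' annihilates X1.
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# b2 = (X2' M1 X2)^{-1} X2' M1 y matches the X2-block of the full b.
b2 = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)
assert np.allclose(b2, b[1:])
```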
Suppose that
\[
X_1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = 1
\quad\text{and}\quad
X_2 = Z \ (n \times K_2).
\]
Then
\[
b_2 = \bigl(Z'(I - P_1)Z\bigr)^{-1}Z'(I - P_1)y.
\]
But
\[
(I - P_1)Z = Z - 1(1'1)^{-1}1'Z,
\]
and
\[
1'1 = n, \qquad
1'Z = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix} z_{11} & \cdots & z_{1K_2} \\ \vdots & & \vdots \\ z_{n1} & \cdots & z_{nK_2} \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n} z_{i1} & \cdots & \sum_{i=1}^{n} z_{iK_2} \end{pmatrix}.
\]
Thus,
\[
(I - P_1)Z = Z - \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
\begin{pmatrix} \bar{z}_1 & \cdots & \bar{z}_{K_2} \end{pmatrix}
= \begin{pmatrix}
z_{11} - \bar{z}_1 & \cdots & z_{1K_2} - \bar{z}_{K_2} \\
z_{21} - \bar{z}_1 & \cdots & z_{2K_2} - \bar{z}_{K_2} \\
\vdots & & \vdots \\
z_{n1} - \bar{z}_1 & \cdots & z_{nK_2} - \bar{z}_{K_2}
\end{pmatrix}.
\]
In the same way,
\[
(I - P_1)y = \begin{pmatrix} y_1 - \bar{y} \\ \vdots \\ y_n - \bar{y} \end{pmatrix}.
\]
These show that $b_2$ is equivalent to the OLS estimator of $\beta$ in the demeaned regression equation
\[
y_i - \bar{y} = \beta'(z_i - \bar{z}) + \varepsilon_i
\qquad \bigl(\bar{z} = (\bar{z}_1,\dots,\bar{z}_{K_2})'\bigr).
\]
Whether we demean the data and run the regression, or include a constant term in the model and run the regression, we get the same results.
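A quick check of this equivalence in the running example, where demeaning plays the role of multiplication by $I - P_1$:

```python
# Demean Z = non-constant columns of X, and y; regress without a constant.
Z = X[:, 1:]
Zd = Z - Z.mean(axis=0)   # (I - P1) Z: deviations from column means
yd = y - y.mean()         # (I - P1) y
b2_demeaned = np.linalg.solve(Zd.T @ Zd, Zd.T @ yd)
assert np.allclose(b2_demeaned, b[1:])  # same slopes as with a constant term
```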
3.3 Goodness-of-fit measures
(i) $R^2$
Write
\[
y = Xb + e = \hat{y} + e.
\]
Let
\[
M^0 = I - 1(1'1)^{-1}1'
\quad\text{with}\quad
1 = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.
\]
$M^0$ transforms observations into deviations from sample means. Then
\[
M^0y = M^0Xb + M^0e = M^0Xb + e
\]
(because $1'e = 0$ implies $M^0e = e$), or
\[
y - 1\bar{y} = \hat{y} - 1\bar{y} + e.
\]
The total variation of $y_i$ is
\[
\underbrace{y'M^0y}_{\sum (y_i - \bar{y})^2}
= \underbrace{b'X'M^0Xb}_{\sum (\hat{y}_i - \bar{y})^2}
+ \underbrace{e'e}_{\sum e_i^2}.
\]
Note that
\[
\begin{aligned}
b'X'M^0e &= b'X'M^0M\varepsilon \qquad (\text{recall } e = My = M\varepsilon) \\
&= b'X'\bigl(I - 1(1'1)^{-1}1'\bigr)M\varepsilon \\
&= b'X'M\varepsilon - b'X'1(1'1)^{-1}1'M\varepsilon \\
&= 0
\end{aligned}
\]
because $X'M = 0$ and $1'M = 0$. $b'X'M^0Xb$ is called the regression sum of squares (SSR), and $e'e$ the error sum of squares (SSE).
How well the regression line fits the data is measured by
\[
R^2 = \frac{\text{SSR}}{\text{SST}} = \frac{b'X'M^0Xb}{y'M^0y} = 1 - \frac{e'e}{y'M^0y}.
\]
We call $R^2$ the coefficient of determination.
Remark 1. $0 \le R^2 \le 1$, with $R^2 = 0$ indicating no fit and $R^2 = 1$ a perfect fit.
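Both expressions for $R^2$ are easy to compute. A sketch in the running example:

```python
# R^2 as SSR/SST and as 1 - SSE/SST (equal when X has a constant column).
e = y - X @ b
yd = y - y.mean()
SST = yd @ yd                   # y' M0 y
SSE = e @ e                     # e'e
yhat_d = X @ b - y.mean()       # valid since mean(yhat) = ybar here
SSR = yhat_d @ yhat_d           # b' X' M0 X b
R2 = SSR / SST
assert np.isclose(R2, 1 - SSE / SST)
```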
Remark 2. Let $R^2_{Xz}$ be the $R^2$ for the regression of $y$ on $X$ and an additional variable $z$, and $R^2_X$ the $R^2$ for the regression of $y$ on $X$. Then
\[
R^2_{Xz} = R^2_X + \bigl(1 - R^2_X\bigr)r_{yz}^{*2},
\]
where
\[
r_{yz}^{*2} = \frac{(z^{*\prime}y^{*})^2}{(z^{*\prime}z^{*})(y^{*\prime}y^{*})},
\qquad z^{*} = (I - P_X)z, \quad y^{*} = (I - P_X)y.
\]
Hence $R^2$ never decreases as the number of regressors increases, whatever the quality of the additional regressors.
(ii) Theil's $\bar{R}^2$ (adjusted $R^2$)
\[
\bar{R}^2 = 1 - \frac{e'e/(n - K)}{y'M^0y/(n - 1)} = 1 - \frac{n - 1}{n - K}\bigl(1 - R^2\bigr)
\]
$\bar{R}^2$ will fall (rise) when a variable $x$ is deleted from the regression if the t-ratio associated with that variable is greater (less) than 1 in absolute value.
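This rule can be checked numerically on the running example using conventional homoskedastic standard errors; adj_r2 below is an illustrative helper, not a library function:

```python
def adj_r2(X, y):
    """Theil's adjusted R^2 for the OLS regression of y on X."""
    n, K = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    yd = y - y.mean()
    return 1 - (n - 1) / (n - K) * (e @ e) / (yd @ yd)

# t-ratio of the last regressor: b_j / se(b_j), with s^2 = e'e/(n-K)
# and Var(b) = s^2 (X'X)^{-1}; X, y, b, n come from the snippets above.
e = y - X @ b
s2 = e @ e / (n - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
t_last = b[-1] / se[-1]

# Adjusted R^2 falls on deletion exactly when |t| > 1.
falls = adj_r2(X[:, :-1], y) < adj_r2(X, y)
assert falls == (abs(t_last) > 1)
```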
(iii) Information criteria
\[
\mathrm{AIC}(k) = \ln\frac{e'e}{n} + \frac{2k}{n} \quad \text{(Akaike's information criterion)}
\]
\[
\mathrm{BIC}(k) = \ln\frac{e'e}{n} + \frac{k\ln n}{n} \quad \text{(Bayesian information criterion)}
\]
\[
\mathrm{PC}(k) = \frac{e'e}{n - k}\left(1 + \frac{k}{n}\right)
\]
The smaller the criterion, the better the model.
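A sketch computing the three criteria for nested models built from the running example's regressors (model $k$ uses the first $k$ columns of X; the nesting scheme is an illustrative assumption):

```python
def sse(X, y):
    """Residual sum of squares from the OLS fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

for k in range(1, X.shape[1] + 1):
    s = sse(X[:, :k], y)
    aic = np.log(s / n) + 2 * k / n
    bic = np.log(s / n) + k * np.log(n) / n
    pc = s / (n - k) * (1 + k / n)
    print(f"k={k}: AIC={aic:.3f}  BIC={bic:.3f}  PC={pc:.3f}")
```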