Chapter 3  Least Squares Methods for Estimating β

Methods for estimating β:
- Least squares estimation
- Maximum likelihood estimation
- Method of moments estimation
- Least absolute deviation estimation
...

3.1 Least squares estimation

The criterion of least squares estimation is
\[
\min_{b_0} \sum_{i=1}^{n} (y_i - x_i' b_0)^2
\quad\text{or}\quad
\min_{b_0}\, (y - Xb_0)'(y - Xb_0).
\]
Let the objective function be
\[
S(b_0) = (y - Xb_0)'(y - Xb_0)
       = y'y - b_0'X'y - y'Xb_0 + b_0'X'Xb_0
       = y'y - 2y'Xb_0 + b_0'X'Xb_0.
\]
The first-order condition for the minimization of this function is
\[
\frac{\partial S(b_0)}{\partial b_0} = -2X'y + 2X'Xb_0 = 0.
\]
The solution of this equation is the least squares estimate of the coefficient vector β:
\[
b = (X'X)^{-1}X'y.
\]
If $\mathrm{rank}(X) = K$, then $\mathrm{rank}(X'X) = K$. Thus, the inverse of $X'X$ exists.

Let $e = y - Xb$. We call $e$ the residual vector. We have
\[
e = y - Xb \tag{1}
\]
\[
  = y - X(X'X)^{-1}X'y = (I - X(X'X)^{-1}X')y = (I - P)y, \tag{2}
\]
where $P = X(X'X)^{-1}X'$. The matrix $P$ is called the projection matrix. We also let $M = I - P$. Then, we may write (2) as
\[
y = Xb + e = Py + My.
\]
We often write $Py = \hat y$. This is the part of $y$ that is explained by $X$.

Properties of the matrices $P$ and $M$ are:
(i) $P' = P$, $P^2 = P$ (idempotent matrix)
(ii) $M' = M$, $M^2 = M$
(iii) $PX = X$, $MX = 0$
(iv) $PM = 0$

Using (2) and (iii), we have
\[
X'e = X'My = 0.
\]
If the first column of $X$ is $1 = (1, \ldots, 1)'$, this relation implies
\[
1'e = \sum_{i=1}^{n} e_i = 0.
\]
In addition, (iv) gives
\[
y'y = y'P'Py + y'M'My = \hat y'\hat y + e'e.
\]

3.2 Partitioned regression and partial regression

Consider
\[
y = Xβ + ε = X_1 β_1 + X_2 β_2 + ε.
\]
The normal equations for $b_1$ and $b_2$ are
\[
\begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}
=
\begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}.
\]
The first part of these equations is
\[
(X_1'X_1)b_1 + (X_1'X_2)b_2 = X_1'y,
\]
which gives
\[
b_1 = (X_1'X_1)^{-1}X_1'y - (X_1'X_1)^{-1}X_1'X_2 b_2
    = (X_1'X_1)^{-1}X_1'(y - X_2 b_2).
\]
Plugging this into the second part of the normal equations, we have
\[
X_2'X_1 b_1 + X_2'X_2 b_2
= X_2'X_1(X_1'X_1)^{-1}X_1'y - X_2'X_1(X_1'X_1)^{-1}X_1'X_2 b_2 + X_2'X_2 b_2
\]
\[
= X_2'X_1(X_1'X_1)^{-1}X_1'y + X_2'(I - P_{X_1})X_2 b_2 = X_2'y.
\]
Thus
\[
b_2 = \left(X_2'(I - P_{X_1})X_2\right)^{-1} X_2'(I - P_{X_1})y.
\]
In the same manner,
\[
b_1 = \left(X_1'(I - P_{X_2})X_1\right)^{-1} X_1'(I - P_{X_2})y.
\]
Suppose that $X_1 = 1 = (1, \ldots, 1)'$ and $X_2 = Z$, an $n \times K_2$ matrix. Then
\[
b_2 = \left(Z'(I - P_1)Z\right)^{-1} Z'(I - P_1)y.
\]
But $(I - P_1)Z = Z - 1(1'1)^{-1}1'Z$, where $1'1 = n$ and
\[
1'Z = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix} z_{11} & \cdots & z_{1K_2} \\ \vdots & & \vdots \\ z_{n1} & \cdots & z_{nK_2} \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n} z_{i1} & \cdots & \sum_{i=1}^{n} z_{iK_2} \end{pmatrix}.
\]
Thus,
\[
(I - P_1)Z = Z - \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}
\begin{pmatrix} \bar z_1 & \cdots & \bar z_{K_2} \end{pmatrix}
= \begin{pmatrix}
z_{11} - \bar z_1 & \cdots & z_{1K_2} - \bar z_{K_2} \\
z_{21} - \bar z_1 & \cdots & z_{2K_2} - \bar z_{K_2} \\
\vdots & & \vdots \\
z_{n1} - \bar z_1 & \cdots & z_{nK_2} - \bar z_{K_2}
\end{pmatrix}.
\]
In the same way,
\[
(I - P_1)y = \begin{pmatrix} y_1 - \bar y \\ \vdots \\ y_n - \bar y \end{pmatrix}.
\]
These show that $b_2$ is equivalent to the OLS estimator of β in the demeaned regression equation
\[
y_i - \bar y = β'(z_i - \bar z) + ε_i,
\qquad \bar z = (\bar z_1, \ldots, \bar z_{K_2})'.
\]
Whether we demean the data and run the regression, or put a constant term in the model and run the regression, we get the same results.
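This equivalence between including a constant and regressing on demeaned data (a special case of the Frisch–Waugh–Lovell theorem) is easy to verify numerically. The following is a minimal sketch in Python/NumPy with simulated data; the names ($n$, $K_2$, Z, y) and the coefficient values are illustrative only, not taken from the text.

\begin{verbatim}
import numpy as np

# Simulated data for the check; dimensions and coefficients are arbitrary.
rng = np.random.default_rng(0)
n, K2 = 100, 3
Z = rng.normal(size=(n, K2))
y = 1.5 + Z @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# Full regression with a constant: X = [1  Z], b = (X'X)^{-1} X'y.
X = np.column_stack([np.ones(n), Z])
b = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations directly

# Demeaned regression: applying (I - P_1) just subtracts column means.
Z_dm = Z - Z.mean(axis=0)
y_dm = y - y.mean()
b2 = np.linalg.solve(Z_dm.T @ Z_dm, Z_dm.T @ y_dm)

print(np.allclose(b[1:], b2))  # True: the slope estimates coincide
\end{verbatim}

Solving the normal equations with np.linalg.solve rather than forming $(X'X)^{-1}$ explicitly is the standard, numerically stabler choice.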
3.3 Goodness-of-fit measures

(i) $R^2$

Write $y = Xb + e = \hat y + e$. Let
\[
M^0 = I - 1(1'1)^{-1}1', \qquad 1 = (1, \ldots, 1)'.
\]
$M^0$ transforms observations into deviations from sample means. Then
\[
M^0 y = M^0 X b + M^0 e,
\quad\text{or}\quad
y - 1\bar y = \hat y - 1\bar y + e,
\]
where $M^0 e = e$ because $1'e = 0$. The total sum of variation of $y_i$ is
\[
y'M^0 y = b'X'M^0 X b + e'e,
\quad\text{i.e.,}\quad
\sum (y_i - \bar y)^2 = \sum (\hat y_i - \bar y)^2 + \sum e_i^2.
\]
Note that the cross term vanishes:
\[
b'X'M^0 e = b'X'M^0 M ε
= b'X'\left(I - 1(1'1)^{-1}1'\right)Mε
= b'X'Mε - b'X'1(1'1)^{-1}1'Mε = 0,
\]
because $X'M = 0$ and $1'M = 0$ (1 is a column of $X$). We call $b'X'M^0 X b$ the regression sum of squares (SSR), and $e'e$ the error sum of squares (SSE).

How well the regression line fits the data is measured by
\[
R^2 = \frac{SSR}{SST} = \frac{b'X'M^0 X b}{y'M^0 y} = 1 - \frac{e'e}{y'M^0 y}.
\]
We call $R^2$ the coefficient of determination.

Remark 1: $0 \le R^2 \le 1$, where 0 indicates no fit and 1 a perfect fit.

Remark 2: Let $R^2_{Xz}$ be the $R^2$ for the regression of $y$ on $X$ and an additional variable $z$, and $R^2_X$ the $R^2$ for the regression of $y$ on $X$ alone. Then
\[
R^2_{Xz} = R^2_X + \left(1 - R^2_X\right) r^{*2}_{yz},
\quad\text{where}\quad
r^{*2}_{yz} = \frac{(z^{*\prime} y^*)^2}{(z^{*\prime} z^*)(y^{*\prime} y^*)},
\quad z^* = (I - P_X)z, \quad y^* = (I - P_X)y.
\]
Hence $R^2$ increases as the number of regressors increases, whatever the quality of the additional regressors.

(ii) Theil's $\bar R^2$ (adjusted $R^2$)
\[
\bar R^2 = 1 - \frac{e'e/(n - K)}{y'M^0 y/(n - 1)}
         = 1 - \frac{n - 1}{n - K}\left(1 - R^2\right).
\]
$\bar R^2$ will fall (rise) when a variable $x$ is deleted from the regression if the t-ratio associated with this variable is greater (less) than 1.

(iii) Information criteria
\[
AIC(K) = \ln\frac{e'e}{n} + \frac{2K}{n} \quad \text{(Akaike's information criterion)}
\]
\[
BIC(K) = \ln\frac{e'e}{n} + \frac{K \ln n}{n} \quad \text{(Bayesian information criterion)}
\]
\[
PC(K) = \frac{e'e}{n - K}\left(1 + \frac{K}{n}\right)
\]
For each criterion, the smaller the value, the better the model.
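All of these measures come directly from the residual vector. The sketch below continues the NumPy example from Section 3.2 (it assumes X, y, and b are as defined there); the formulas mirror the text, with $K$ the number of columns of $X$ including the constant.

\begin{verbatim}
import numpy as np

# Goodness-of-fit measures; assumes X (with a constant column), y, and b
# are as in the sketch of Section 3.2.
n, K = X.shape
e = y - X @ b                              # residual vector
SSE = e @ e                                # e'e, error sum of squares
SST = (y - y.mean()) @ (y - y.mean())      # y'M0y, total sum of squares

R2 = 1.0 - SSE / SST                       # coefficient of determination
R2_bar = 1.0 - (n - 1) / (n - K) * (1.0 - R2)   # Theil's adjusted R^2

AIC = np.log(SSE / n) + 2 * K / n
BIC = np.log(SSE / n) + K * np.log(n) / n
PC = SSE / (n - K) * (1 + K / n)

print(R2, R2_bar, AIC, BIC, PC)            # smaller AIC/BIC/PC is better
\end{verbatim}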