Chapter 2 The Classical Multiple Linear Regression Model

2.1 Linear Regression Model

Notation:

    y_i : dependent variable, regressand
    x_{i1}, ..., x_{ik} : independent variables, regressors
    i : index for time, individuals, etc.

We want to explain y_i using x_{i1}, ..., x_{ik}. The multiple linear regression model takes the form

    y_i = \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i.

Here \varepsilon_i (random disturbance, error term) captures measurement error and omitted regressors.

The index i is used for cross-section data, and t for time-series data.

Example 1 (Earnings and education)

    earnings_i = \beta_1 + \beta_2 \, education_i + \varepsilon_i

    earnings_i : the i-th individual's annual earnings
    education_i : the i-th individual's number of years in school

Any problems with this model? It omits variables such as job experience, job experience squared, sex, marital status, etc.:

    earnings_i = \beta_1 + \beta_2 \, education_i + \beta_3 \, job\ experience_i + \beta_4 \, job\ experience_i^2
                 + \beta_5 \, sex_i + \beta_6 \, marital\ status_i + \varepsilon_i

For sex and marital status, use dummy variables. That is,

    sex_i = 1 if male, 0 if female;
    marital status_i = 1 if married, 0 if single.

See, for example, Ashenfelter and Krueger, American Economic Review, 1994.

Example 2 (Class attendance and test scores)

    score_i = \beta_1 + \beta_2 \, (fraction\ of\ lectures\ attended)_i
              + \beta_3 \, (fraction\ of\ problem\ sets\ completed)_i + \varepsilon_i

See Romer, Journal of Economic Perspectives, 1993.

2.2 Classical Assumptions

1. Linearity

    y_i = \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + \varepsilon_i = x_i' \beta + \varepsilon_i   (i = 1, ..., n),

where

    x_i = \begin{pmatrix} x_{i1} \\ \vdots \\ x_{iK} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix};

or

    y = x_1 \beta_1 + \cdots + x_K \beta_K + \varepsilon,

where

    y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad x_l = \begin{pmatrix} x_{1l} \\ \vdots \\ x_{nl} \end{pmatrix}  (l = 1, ..., K), \qquad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix};

or

    y = X\beta + \varepsilon,

where

    X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}.

Loglinear model:

    \ln y = \beta_1 + \beta_2 \ln x_2 + \cdots + \beta_K \ln x_K + \varepsilon,

so that

    \frac{\partial \ln y}{\partial \ln x_k} = \beta_k   (constant elasticity).

Semilog model:

    \ln y_t = X_t' \beta + \delta t + \varepsilon_t.

The per-period growth rate of y_t not explained by X_t is

    \frac{d \ln y}{dt} = \delta.

2. Full rank

X is an n × K matrix with rank K.

- The columns of X are linearly independent.
- Obviously, we must have n ≥ K.
- If this assumption is violated, X contains redundant information.

What happens if this assumption is violated? Suppose that

    y = \beta_1 + \beta_2 X_1 + \beta_3 X_2 + \varepsilon   and   X_1 = \alpha X_2.

Then X'X is singular and the ordinary least squares estimator does not exist: the model can be rewritten as y = \beta_1 + (\alpha\beta_2 + \beta_3) X_2 + \varepsilon, so \beta_2 and \beta_3 cannot be separately identified. When X is of deficient rank, we say that there is a multicollinearity problem.

3. Zero conditional mean of the disturbance

    E(\varepsilon \mid X) = \begin{pmatrix} E(\varepsilon_1 \mid X) \\ \vdots \\ E(\varepsilon_n \mid X) \end{pmatrix} = 0.

No observation on X conveys information about the expected value of the disturbance. The assumption implies E(\varepsilon_i) = 0 and Cov(\varepsilon_i, X) = 0.

4. Spherical disturbances

    Var(\varepsilon_i \mid X) = \sigma^2   for all i = 1, 2, ..., n,
    Cov(\varepsilon_i, \varepsilon_j \mid X) = 0   for all i \neq j.

These imply

    E(\varepsilon\varepsilon' \mid X) = \begin{pmatrix} E(\varepsilon_1^2 \mid X) & E(\varepsilon_1\varepsilon_2 \mid X) & \cdots & E(\varepsilon_1\varepsilon_n \mid X) \\ \vdots & \ddots & & \vdots \\ E(\varepsilon_n\varepsilon_1 \mid X) & \cdots & & E(\varepsilon_n^2 \mid X) \end{pmatrix} = \sigma^2 I

and Var[\varepsilon] = \sigma^2 I. The assumption of common variance for the \varepsilon_i is called homoskedasticity.
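As a concrete illustration of Example 1 and of the matrix form y = Xβ + ε, the following Python sketch builds a design matrix with a constant, education, experience, experience squared, and the two dummy variables, then computes the ordinary least squares fit. Everything here is simulated: the coefficient values, sample size, and variable ranges are invented for the illustration and do not come from real earnings data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic regressors (all numbers made up for illustration only)
education = rng.uniform(8, 20, size=n)     # years of schooling
experience = rng.uniform(0, 30, size=n)    # years of job experience
male = rng.integers(0, 2, size=n)          # dummy: 1 if male, 0 if female
married = rng.integers(0, 2, size=n)       # dummy: 1 if married, 0 if single

# Arbitrary "true" coefficients for the simulation
beta = np.array([5.0, 0.8, 0.4, -0.01, 1.5, 0.7])

# Design matrix X: constant, education, experience, experience^2, sex, marital status
X = np.column_stack([np.ones(n), education, experience, experience**2, male, married])
eps = rng.normal(0.0, 2.0, size=n)         # disturbance with mean 0 and common variance
y = X @ beta + eps                         # y = X beta + eps

# Ordinary least squares: minimize ||y - Xb||^2
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b_hat, 3))                  # should be close to the chosen beta
```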
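To see the full-rank assumption fail, the sketch below constructs a design matrix in which one regressor is an exact multiple of another (X_1 = αX_2, as in the text). The numbers are arbitrary; the point is only that rank(X) drops below K, so X'X is singular and the OLS estimator is not defined.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 50, 2.0

x2 = rng.normal(size=n)
x1 = alpha * x2                            # exact linear dependence: X1 = alpha * X2

# Design matrix with a constant, X1, and X2 (K = 3 columns)
X = np.column_stack([np.ones(n), x1, x2])

print("rank of X:", np.linalg.matrix_rank(X))   # 2, not K = 3
print("smallest singular value of X:",
      np.linalg.svd(X, compute_uv=False).min()) # essentially zero

# Because rank(X) < K, X'X has no inverse, so the usual OLS formula breaks down:
# the coefficients on X1 and X2 cannot be separately identified.
```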
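Finally, a small Monte Carlo sketch of the spherical-disturbance assumption: drawing i.i.d. normal disturbances with standard deviation σ and averaging the outer products εε' recovers approximately σ²I. The sample size, σ, and number of replications are arbitrary choices for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 4, 1.5, 200_000

# Each row is one draw of the n-vector of disturbances, with
# Var(eps_i) = sigma^2 and Cov(eps_i, eps_j) = 0 for i != j.
eps = rng.normal(0.0, sigma, size=(reps, n))

# Monte Carlo estimate of E[eps eps']: average the outer products over replications
outer_mean = np.einsum('ri,rj->ij', eps, eps) / reps

print(np.round(outer_mean, 2))   # approximately sigma^2 * I
print(sigma**2)                  # 2.25
```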