Chapter 2 The Classical Multiple Linear Regression Model
2.1 Linear Regression Model
Notation:
yi : dependent variable, regressand
xi1,···,xik : independent variables, regressors
i : index for time, individuals, etc.
We want to explain yi using xi1,···,xik. The multiple linear regression model takes
the form
yi = β1xi1 + · · · + βkxik + εi.
Here εi (the random disturbance, or error term) captures, among other things, measurement error and the effects of omitted regressors.
Index i is used for cross-section data, and t for time series data.
Example 1 Earnings and education
earningsi = β1 + β2educationi + εi
earningsi : the i-th individual’s annual earnings
educationi : the i-th individual’s number of years in school
Any problems in the model?
Omitted variables such as job experience, job experience squared, sex, marital status, etc.
earningsi = β1 + β2educationi + β3job experiencei + β4(job experiencei)2
+ β5sexi + β6marital statusi + εi
For sex and marital status, use dummy variables. That is,
sexi = 1 if individual i is male, 0 if female
marital statusi = 1 if married, 0 if single
See, for example, Ashenfelter and Krueger, American Economic Review, 1974, 73–85.
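As a concrete illustration of how the dummy variables in Example 1 might be coded and the regression fitted, here is a minimal sketch in Python with NumPy. The data, variable names, and sample size are invented purely for illustration, and least squares estimation itself is developed later in the notes.

import numpy as np

# Hypothetical data for eight individuals (illustrative numbers only)
education  = np.array([12, 16, 14, 18, 10, 16, 12, 20])     # years of schooling
experience = np.array([10,  4,  8,  2, 20,  6, 15,  3])     # years of job experience
male       = np.array([ 1,  0,  1,  0,  1,  0,  0,  1])     # dummy: 1 if male, 0 if female
married    = np.array([ 1,  1,  0,  0,  1,  0,  1,  1])     # dummy: 1 if married, 0 if single
earnings   = np.array([35., 52., 41., 60., 30., 48., 38., 70.])  # annual earnings (thousands)

# Regressor matrix: constant, education, experience, experience squared, sex, marital status
X = np.column_stack([np.ones(len(earnings)), education,
                     experience, experience**2, male, married])

# Least squares fit of earnings on X
beta_hat, *_ = np.linalg.lstsq(X, earnings, rcond=None)
print(beta_hat)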
Example 2 Class attendance and test scores
scorei = β1 + β2 (fraction of lectures attended)i
+β3 (fraction of problem sets completed)i + εi
See Romer, Journal of Economic Perspectives, 1993.
2.2 Classical Assumptions
1. Linearity
yi = β1xi1 + · · · + βKxiK + εi (i = 1, · · · , n)
= x′iβ + εi
where
\[
x_i = \begin{pmatrix} x_{i1} \\ \vdots \\ x_{iK} \end{pmatrix},
\qquad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}
\]
or
y = x1β1 + · · · + xKβK + ε
where
\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},
\qquad
x_l = \begin{pmatrix} x_{1l} \\ \vdots \\ x_{nl} \end{pmatrix} \quad (l = 1, \cdots, K),
\qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
or
y = Xβ + ε
where
\[
X = \begin{pmatrix} x_{11} & \cdots & x_{1K} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nK} \end{pmatrix},
\qquad
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}.
\]
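To see the matrix form y = Xβ + ε at work, here is a small simulation sketch in Python with NumPy. The sample size, coefficient values, and seed are arbitrary choices for illustration; the least squares formula used at the end is stated here only as a preview of what later chapters derive.

import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3                                  # sample size and number of regressors

# Regressor matrix X (n x K): a constant plus two randomly generated regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])              # "true" coefficients, chosen arbitrarily
eps  = rng.normal(scale=1.0, size=n)           # disturbances

y = X @ beta + eps                             # the model in matrix form: y = X beta + eps

# Least squares estimate b = (X'X)^(-1) X'y, computed by solving the normal equations
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)                                       # close to beta when n is large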
Loglinear model:
lny = β1 + β2 lnx2 + · · · + βK lnxK + ε
\[
\frac{\partial \ln y}{\partial \ln x_k} = \beta_k \quad \text{(constant elasticity)}
\]
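The label “constant elasticity” comes from a one-line calculation, holding the other regressors and ε fixed:
\[
\frac{\partial \ln y}{\partial \ln x_k}
= \frac{x_k}{y}\,\frac{\partial y}{\partial x_k}
= \beta_k ,
\]
so a one percent change in xk is associated with (approximately) a βk percent change in y, at every value of the regressors.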
Semilog model:
ln yt = X′tβ + δt + εt
The per-period growth rate of yt that is not explained by Xt is
\[
\frac{d \ln y}{dt} = \delta .
\]
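Filling in the step behind this interpretation: since
\[
\frac{d \ln y}{dt} = \frac{1}{y}\,\frac{dy}{dt} ,
\]
δ is the proportional change in yt per period, so yt grows at roughly 100δ percent per period beyond what Xt explains.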
2. Full rank
X is an n × K matrix with rank K.
• The columns of X are linearly independent.
• Obviously, we should have n ≥ K.
• If this assumption is violated, X contains redundant information.
What if this assumption is violated?
Suppose that
y = β1 + β2X1 + β3X2 + ε
and
X1 = αX2.
Then the ordinary least squares estimator does not exist, because X′X is not invertible. When X is of deficient rank, we say that there is a multicollinearity problem.
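A small sketch in Python with NumPy (illustrative numbers only) of what happens when one regressor is an exact multiple of another, as in the case X1 = αX2 above: X loses full column rank and X′X becomes singular, so the normal equations have no unique solution.

import numpy as np

rng = np.random.default_rng(1)
n, alpha = 50, 2.0

x2 = rng.normal(size=n)
x1 = alpha * x2                                # exact linear dependence: X1 = alpha * X2
X  = np.column_stack([np.ones(n), x1, x2])     # columns: constant, X1, X2

print(np.linalg.matrix_rank(X))                # prints 2, not 3: X has deficient rank
print(np.linalg.det(X.T @ X))                  # (numerically) zero: X'X is singular

# With any y, the normal equations (X'X) b = X'y have no unique solution,
# so beta2 and beta3 cannot be separately identified.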
3. Zero conditional mean of the disturbance
\[
E(\varepsilon \mid X) =
\begin{pmatrix} E(\varepsilon_1 \mid X) \\ \vdots \\ E(\varepsilon_n \mid X) \end{pmatrix}
= 0 .
\]
No observation on X conveys information about the expected value of the disturbance. The assumption implies
E (εi) = 0
and
Cov(εi,X) = 0.
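Both implications follow from the law of iterated expectations:
\[
E(\varepsilon_i) = E\big[\,E(\varepsilon_i \mid X)\,\big] = 0 ,
\qquad
\operatorname{Cov}(x_{jk}, \varepsilon_i) = E(x_{jk}\,\varepsilon_i) = E\big[\,x_{jk}\,E(\varepsilon_i \mid X)\,\big] = 0
\]
for every i and every element x_{jk} of X (the second chain uses E(εi) = 0 from the first).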
4. Spherical disturbances
Var(εi|X) = σ2 for all i = 1, 2, · · · , n
Cov(εi, εj|X) = 0 for all i ≠ j.
These imply
\[
E(\varepsilon\varepsilon' \mid X) =
\begin{pmatrix}
E(\varepsilon_1^2 \mid X) & E(\varepsilon_1\varepsilon_2 \mid X) & \cdots & E(\varepsilon_1\varepsilon_n \mid X) \\
\vdots & & \ddots & \vdots \\
E(\varepsilon_n\varepsilon_1 \mid X) & E(\varepsilon_n\varepsilon_2 \mid X) & \cdots & E(\varepsilon_n^2 \mid X)
\end{pmatrix}
= \sigma^2 I
\]
and
Var [ε] = σ2I.
The assumption of common variance for εi is called homoskedasticity.
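Filling in the step from the conditional statements to Var[ε] = σ2I, using E(ε|X) = 0 from Assumption 3 and the law of total variance:
\[
\operatorname{Var}[\varepsilon]
= E\big[\operatorname{Var}(\varepsilon \mid X)\big] + \operatorname{Var}\big[E(\varepsilon \mid X)\big]
= \sigma^2 I + 0 = \sigma^2 I .
\]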