Chapter 13: Multiple Regression Models
Chapter Outline
• The Multiple Regression Model
• Contribution of Individual Independent Variables
• Coefficient of Determination
• Categorical Explanatory Variables
• Transformation of Variables
• Model Building
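The Multiple Regression Model
The relationship between 1 dependent variable and 2 or more independent variables is a linear function.
Population model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$
Sample Multiple Regression Model:
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$
where
• $\beta_0$ = population Y-intercept
• $\beta_1, \ldots, \beta_p$ = population slopes
• $\varepsilon_i$ = random error
• $Y_i$ = dependent (response) variable for sample
• $X_{1i}, \ldots, X_{pi}$ = independent (explanatory) variables for sample model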
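Simple Multiple Regression (Linear)
[Figure: three-dimensional plot with axes X1, X2, and Y, showing the residual $e_i$]
Predicted value: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi}$
Observed value: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$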
Multiple Regression Model, Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
Sample Regression Model, Example
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi}$
Excel Output:
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
• For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant.
• For each one-inch increase in insulation, the estimated average amount of heating oil used decreases by 20.012 gallons, holding temperature constant.
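As a rough check on the Excel coefficients, the least-squares fit can be sketched in plain Python by solving the normal equations $(X'X)b = X'y$ for the 15 observations in the example. The helper names `solve` and `fit_ols` are illustrative assumptions, not part of the original slides; any regression routine should give the same coefficients up to rounding.

```python
# Heating-oil data from the example: (oil in gallons, temperature in °F, insulation in inches)
DATA = [
    (275.3, 40, 3), (363.8, 27, 3), (164.3, 40, 10), (40.8, 73, 6),
    (94.3, 64, 6), (230.9, 34, 6), (366.7, 9, 6), (300.6, 8, 10),
    (237.8, 23, 10), (121.4, 63, 3), (31.4, 65, 10), (203.5, 41, 6),
    (441.1, 21, 3), (323.0, 38, 3), (52.5, 58, 10),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows):
    """Return [b0, b1, b2] from the normal equations (X'X) b = X'y."""
    X = [[1.0, temp, insul] for _, temp, insul in rows]  # leading 1 gives the intercept
    y = [oil for oil, _, _ in rows]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(DATA)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

The printed values should be close to the intercept and the two slopes in the Excel output above.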
Using the Model to Make Predictions
Estimate the average amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches.
$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i} = 562.151 - 5.437(30) - 20.012(6) = 278.969$
The estimated heating oil used is 278.97 gallons.
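The prediction above is a direct substitution into the fitted equation. A minimal sketch, using the rounded coefficients from the slide (the function name `predict_oil` is a hypothetical helper, not from the source):

```python
# Rounded coefficients as reported in the example's fitted equation.
B0, B_TEMP, B_INSUL = 562.151, -5.437, -20.012


def predict_oil(temp_f, insulation_in):
    """Estimated January heating oil use (gallons) for a given temperature and insulation."""
    return B0 + B_TEMP * temp_f + B_INSUL * insulation_in


estimate = predict_oil(30, 6)
print(f"{estimate:.2f} gallons")
```

Predictions are only meaningful inside the range of the sampled data (temperatures of roughly 8 to 73 °F and insulation of 3 to 10 inches in this example).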
Coefficient of Multiple Determination
Excel Output:
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15
$r^2_{Y.12} = \dfrac{SSR}{SST}$
Adjusted $r^2$:
• reflects the number of explanatory variables and the sample size
• is smaller than $r^2$
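The regression statistics can be recomputed from the sums of squares. The sketch below takes SSR and SSE from the ANOVA output for the two-variable heating-oil model given in this chapter, and uses the usual adjusted-$r^2$ formula $1 - (1 - r^2)\frac{n-1}{n-p-1}$, which the slides describe but do not write out.

```python
import math

# Sums of squares from the ANOVA output for the two-variable heating-oil model.
SSR = 228014.6263   # regression sum of squares
SSE = 8120.603016   # residual (error) sum of squares
n, p = 15, 2        # observations, explanatory variables

SST = SSR + SSE
r2 = SSR / SST                                  # coefficient of multiple determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # penalizes extra explanatory variables
std_err = math.sqrt(SSE / (n - p - 1))          # standard error of the estimate

print(round(r2, 4), round(adj_r2, 4), round(std_err, 3))
```

The three values should agree with R Square, Adjusted R Square, and Standard Error in the Excel output.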
Residual Plots
• Residuals vs. $\hat{Y}_i$
  - May need to transform the Y variable
• Residuals vs. $X_1$
  - May need to transform the $X_1$ variable
• Residuals vs. $X_2$
  - May need to transform the $X_2$ variable
• Residuals vs. time
  - May have autocorrelation
Residual Plots, Example (Excel Output)
[Figure: Insulation Residual Plot, residuals vs. insulation]
[Figure: Temperature Residual Plot, residuals vs. temperature]
No discernible pattern.
Testing for Overall Significance
• Shows if there is a linear relationship between all of the X variables together and Y
• Use F test statistic
• Hypotheses:
  $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (No linear relationship)
  $H_1$: At least one $\beta_i \neq 0$ (At least one independent variable affects Y)
Test for Overall Significance, Excel Output, Example
ANOVA
             df   SS         MS         F             Significance F
Regression   2    228014.6   114007.3   168.4712028   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2
• Regression df = p = 2, the number of explanatory variables
• Total df = n - 1
• F test statistic = MSR / MSE
• Significance F = p value
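Test for Overall Significance, Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1$: At least one $\beta_i \neq 0$
$\alpha = 0.05$
df = 2 and 12
Critical value: F = 3.89
Test statistic: F = 168.47 (Excel Output)
Decision: Reject $H_0$ at $\alpha = 0.05$
Conclusion: There is evidence that at least one independent variable affects Y.
[Figure: F distribution with the rejection region of area $\alpha = 0.05$ to the right of the critical value 3.89]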
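The F statistic is the ratio of the mean squares, so it can be recomputed from the sums of squares and degrees of freedom. A small sketch, assuming the SSR and SSE values from the ANOVA output and the critical value 3.89 quoted in the example (the function name `overall_f` is illustrative):

```python
def overall_f(ssr, sse, n, p):
    """F statistic for H0: all slopes are zero, with p and n - p - 1 degrees of freedom."""
    msr = ssr / p              # mean square regression
    mse = sse / (n - p - 1)    # mean square error
    return msr / mse


F = overall_f(ssr=228014.6263, sse=8120.603016, n=15, p=2)
F_CRIT = 3.89                  # critical value for df = 2 and 12 at alpha = 0.05 (from the example)
reject_h0 = F > F_CRIT
print(round(F, 2), reject_h0)
```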
Test for Significance: Individual Variables
• Shows if there is a linear relationship between the variable $X_i$ and Y
• Use t test statistic
• Hypotheses:
  $H_0: \beta_i = 0$ (No linear relationship)
  $H_1: \beta_i \neq 0$ (Linear relationship between $X_i$ and Y)
t Test Statistic, Excel Output, Example
               Coefficients   Standard Error   t Stat
Intercept      562.151009     21.09310433      26.65094
X Variable 1   -5.4365806     0.336216167      -16.1699
X Variable 2   -20.012321     2.342505227      -8.54313
• t test statistic for $X_1$ (Temperature): -16.1699
• t test statistic for $X_2$ (Insulation): -8.54313
t Test, Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
df = 12
Critical values: -2.1788 and 2.1788
Test statistic: t = -16.1699
Decision: Reject $H_0$ at $\alpha = 0.05$
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
[Figure: t distribution centered at 0 with rejection regions of area .025 in each tail, below -2.1788 and above 2.1788]
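Each t statistic is the coefficient divided by its standard error. The sketch below recomputes both slopes' t statistics from the Excel output and compares them with the two-tailed critical value 2.1788 (df = 12, $\alpha = 0.05$) used in the example; the dictionary layout is only an illustrative choice.

```python
# (coefficient, standard error) pairs from the Excel output in the example.
SLOPES = {
    "Temperature": (-5.4365806, 0.336216167),
    "Insulation": (-20.012321, 2.342505227),
}
T_CRIT = 2.1788  # two-tailed critical value for df = 12, alpha = 0.05 (from the example)

results = {}
for name, (b, se) in SLOPES.items():
    t_stat = b / se
    results[name] = (t_stat, abs(t_stat) > T_CRIT)  # (t statistic, reject H0?)
    print(f"{name}: t = {t_stat:.4f}, reject H0: {abs(t_stat) > T_CRIT}")
```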
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-p-1} S_{b_1}$
               Coefficients   Lower 95%      Upper 95%
Intercept      562.151009     516.1930837    608.108935
X Variable 1   -5.4365806     -6.169132673   -4.7040285
X Variable 2   -20.012321     -25.11620102   -14.90844
$-6.169 \le \beta_1 \le -4.704$
The average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase in temperature.
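The interval $b_1 \pm t_{n-p-1} S_{b_1}$ can be checked by hand. A minimal sketch, assuming the critical value 2.1788 for 12 degrees of freedom from the t test example (the helper `slope_ci` is a hypothetical name):

```python
def slope_ci(b, se, t_crit):
    """Confidence interval b ± t * S_b for a regression slope."""
    half_width = t_crit * se
    return b - half_width, b + half_width


# Temperature slope and its standard error from the Excel output; t for df = 12, 95% confidence.
lower, upper = slope_ci(b=-5.4365806, se=0.336216167, t_crit=2.1788)
print(round(lower, 3), round(upper, 3))
```

Because the interval does not contain 0, it leads to the same conclusion as the t test for temperature.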
Testing Portions of Model
• Contribution of one $X_i$ to the model (holding all others constant)
• Denoted by $SSR(X_i \mid \text{all variables except } i)$
• $r^2_{Y1.2}$ = coefficient of partial determination of $X_1$ with Y, holding $X_2$ constant
• Evaluate separate models
• Useful in selecting independent variables
Testing Portions of Model, SSR
Contribution of $X_1$ given $X_2$ has been included:
$SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2)$
• $SSR(X_1 \text{ and } X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
• $SSR(X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$
Partial F Test for Contribution of $X_i$
• Hypotheses:
  $H_0$: Variable $X_i$ does not significantly improve the model given all others included
  $H_1$: Variable $X_i$ significantly improves the model given all others included
• Test statistic:
  $F = \dfrac{SSR(X_i \mid \text{all others})}{MSE}$, with df = 1 and (n - p - 1)
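Coefficient of Partial Determination
$r^2_{Y1.2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$
• $SST$ and $SSR(X_1 \text{ and } X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
• $SSR(X_2)$, needed for $SSR(X_1 \mid X_2)$: from ANOVA section of regression for $\hat{Y}_i = b_0 + b_2 X_{2i}$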
Testing Portions of Model, Example
Test at the $\alpha = 0.05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model, Example
$H_0$: $X_1$ does not improve model ($X_2$ included)
$H_1$: $X_1$ does improve model
$\alpha = 0.05$, df = 1 and 12
Critical value = 4.75

ANOVA (for $X_1$ and $X_2$)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293

ANOVA (for $X_2$)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2

$F = \dfrac{SSR(X_1 \mid X_2)}{MSE} = \dfrac{228{,}015 - 51{,}076}{676.717} = 261.47$
Conclusion: Reject $H_0$; $X_1$ does improve model.
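The partial F statistic and the coefficient of partial determination defined earlier both follow from the two ANOVA tables in this example. A short sketch of the arithmetic (variable names are illustrative):

```python
# Values from the two ANOVA tables in the example.
SSR_FULL = 228014.6263   # SSR(X1 and X2), model with temperature and insulation
SSR_X2 = 51076.47        # SSR(X2), model with insulation only
SST = 236135.2293        # total sum of squares
MSE_FULL = 676.716918    # mean square error of the full model
F_CRIT = 4.75            # critical value for df = 1 and 12 at alpha = 0.05 (from the example)

ssr_x1_given_x2 = SSR_FULL - SSR_X2                 # contribution of X1 given X2
partial_f = ssr_x1_given_x2 / MSE_FULL              # partial F test statistic
partial_r2 = ssr_x1_given_x2 / (SST - SSR_FULL + ssr_x1_given_x2)  # r^2 of Y with X1, X2 held constant

print(round(partial_f, 2), partial_f > F_CRIT, round(partial_r2, 4))
```

With a single variable being tested, the partial F statistic equals the square of that variable's t statistic: $(-16.1699)^2 \approx 261.47$.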
Curvilinear Regression Model
• Relationship between 1 response variable and 2 or more explanatory variables is a polynomial function
• Useful when the scatter diagram indicates a non-linear relationship
• Curvilinear model:
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
• The second explanatory variable is the square of the first.
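Because the second explanatory variable is just the square of the first, a curvilinear model can be fitted with the same least-squares machinery as any two-variable model. The sketch below uses hypothetical, noise-free data generated from $Y = 2 + 3X + 0.5X^2$ (not from the source) so that the recovered coefficients can be checked; `solve` and `fit_quadratic` are illustrative helper names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of Y = b0 + b1*X + b2*X^2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]  # the squared term acts as a second variable
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noise-free data, for illustration only.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 + 3 * x + 0.5 * x * x for x in xs]
b0, b1, b2 = fit_quadratic(xs, ys)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```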
Curvilinear Regression Model
Curvilinear models may be considered when the scatter diagram takes on one of the following shapes:
[Figure: four scatter diagrams of Y against $X_1$; two panels with $\beta_2 > 0$ and two panels with $\beta_2 < 0$]
$\beta_2$ = the coefficient of the quadratic term
Testing for Significance, Curvilinear Model
• Testing for overall relationship
  - Similar to test for linear model
  - F test statistic = $\dfrac{MSR}{MSE}$
• Testing the curvilinear effect
  - Compare the curvilinear model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
    with the linear model $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
Dummy-Variable Models
• Categorical variable involved (dummy variable) with 2 levels:
  - yes or no, on or off, male or female
  - Coded 0 or 1
• Intercepts different
• Assumes equal slopes
• Regression model has the same form:
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$
Dummy-Variable Models Assumption
Given:
Y = Assessed value of house
$X_1$ = Square footage of house
$X_2$ = Desirability of neighborhood = 1 if desirable, 0 if undesirable
Model: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Desirable ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
Same slopes.
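The equal-slopes assumption means the two neighborhood types differ only by a constant shift of $b_2$ at every square footage. A minimal sketch with hypothetical coefficients (the numbers below are invented for illustration and do not come from the source):

```python
# Hypothetical coefficients, for illustration only (not from the source).
B0, B1, B2 = 50.0, 0.1, 20.0


def assessed_value(sqft, desirable):
    """Predicted assessed value; the dummy variable X2 is 1 if desirable, else 0."""
    x2 = 1 if desirable else 0
    return B0 + B1 * sqft + B2 * x2


# The gap between the two lines is the same at any square footage: it equals b2.
gap_small = assessed_value(1000, True) - assessed_value(1000, False)
gap_large = assessed_value(3000, True) - assessed_value(3000, False)
print(gap_small, gap_large)
```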
Dummy-Variable Models Assumption
[Figure: two parallel lines of Y (assessed value) against $X_1$ (square footage), with intercepts $b_0 + b_2$ and $b_0$; same slopes, different intercepts]
Evaluating Presence of Interaction
• Hypothesize interaction between pairs of independent variables
• Contains 2-way product terms:
  $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Hypotheses:
  $H_0: \beta_3 = 0$ (No interaction between $X_1$ and $X_2$)
  $H_1: \beta_3 \neq 0$ ($X_1$ interacts with $X_2$)
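With a product term in the model, the effect of $X_1$ on the prediction is $b_1 + b_3 X_2$, so it changes with the level of $X_2$. A small sketch with hypothetical coefficients (invented for illustration, not from the source):

```python
# Hypothetical coefficients, for illustration only (not from the source).
B0, B1, B2, B3 = 10.0, 2.0, 5.0, 0.5


def predict(x1, x2):
    """Prediction from a model with a two-way product (interaction) term."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def slope_x1(x2):
    """Change in the prediction per unit increase in X1 at a fixed X2 (equals b1 + b3 * x2)."""
    return predict(1, x2) - predict(0, x2)


print(slope_x1(0), slope_x1(4))
```

If $b_3$ were 0, `slope_x1` would return the same value for every `x2`, which is the no-interaction case under $H_0$.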
Using Transformations to Handle Nonlinear Problems
• For non-linear models that violate linear regression assumptions
• Determine the type of transformation from the scatter diagram
• Requires data transformation
• Either or both independent and dependent variables may be transformed
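Square Root Transformation
$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
[Figure: two scatter diagrams of Y against $X_1$, one with $\beta_1 > 0$ and one with $\beta_1 < 0$; similarly for $X_2$]
Transforms one of the above models into one that appears linear. Often used to overcome heteroscedasticity.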
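Logarithmic Transformation
$Y_i = \beta_0 + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i}) + \varepsilon_i$
[Figure: two scatter diagrams of Y against $X_1$, one with $\beta_1 > 0$ and one with $\beta_1 < 0$; similarly for $X_2$]
Transformed from an original multiplicative model.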
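Exponential Transformation
Original model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
[Figure: two scatter diagrams of Y against $X_1$, one with $\beta_1 > 0$ and one with $\beta_1 < 0$; similarly for $X_2$]
Transformed into: $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$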
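Taking logarithms turns the exponential model into a linear one, so ordinary least squares on $\ln Y$ recovers the coefficients. The sketch below uses a single explanatory variable for brevity and hypothetical noise-free data generated from $Y = e^{0.5 + 0.3X}$ (not from the source); `fit_line` is an illustrative helper.

```python
import math


def fit_line(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical noise-free data from an exponential model, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]

# Regress ln(Y) on X: the transformed model is linear in the coefficients.
b0, b1 = fit_line(xs, [math.log(y) for y in ys])
print(round(b0, 6), round(b1, 6))
```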
Collinearity
• High correlation between explanatory variables
• Coefficients measure combined effect
• No new information provided
• Leads to unstable coefficients
  - Depending on the explanatory variables
• VIF used to measure collinearity:
  $VIF_j = \dfrac{1}{1 - R_j^2}$
  $R_j^2$ = coefficient of multiple determination of $X_j$ with all the others
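With only two explanatory variables, $R_j^2$ is simply the squared correlation between them, so the VIF can be computed directly. A sketch using the temperature and insulation columns of the heating-oil data from this chapter (function names are illustrative assumptions):

```python
# Temperature (°F) and insulation (inches) from the heating-oil example.
TEMP = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
INSUL = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def r_squared(x, y):
    """r^2 from regressing y on the single variable x (the squared correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def vif(r2_j):
    """Variance inflationary factor for a variable whose R_j^2 is r2_j."""
    return 1.0 / (1.0 - r2_j)


vif_temp = vif(r_squared(INSUL, TEMP))  # with two variables, both VIFs are equal
print(round(vif_temp, 4))
```

A VIF this close to 1 indicates that temperature and insulation are nearly uncorrelated in this sample, far below the threshold of 5 used in the model-building flowchart.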
Model Building
• Goal is to develop a model with the fewest explanatory variables
  - Easier to interpret
  - Lower probability of collinearity
• Stepwise regression procedure
  - Provides limited evaluation of alternative models
• Best-subset approach
  - Uses the $C_p$ statistic
  - Selects a model with small $C_p$ near p + 1
Model Building Flowchart
1. Choose $X_1, X_2, \ldots, X_k$.
2. Run regression to find VIFs.
3. Any VIF > 5?
   - Yes: More than one?
     - Yes: Remove the variable with the highest VIF, then return to step 2.
     - No: Remove this X, then return to step 2.
   - No: Continue to step 4.
4. Run subsets regression to obtain "best" models in terms of $C_p$.
5. Do complete analysis.
6. Add curvilinear term and/or transform variables as indicated.
7. Perform predictions.
Chapter Summary
• Presented the multiple regression model
• Considered contribution of individual independent variables
• Discussed coefficient of determination
• Addressed categorical explanatory variables
• Considered transformation of variables
• Discussed model building