CHAPTER 17
Model Building
to accompany
Introduction to Business Statistics
fourth edition, by Ronald M. Weiers
Presentation by Priscilla Chaffe-Stengel
Donald N. Stengel
© 2002 The Wadsworth Group
Chapter 17 - Learning Objectives
• Build polynomial regression models to describe
curvilinear relationships.
• Apply qualitative variables representing two or
three categories.
• Use logarithmic transforms in constructing
exponential and multiplicative models.
• Identify and compensate for multicollinearity.
• Apply stepwise regression.
• Select the most suitable among competing models.
Polynomial Models with One
Quantitative Predictor Variable
• Simple linear regression equation:
  \( \hat{y} = b_0 + b_1 x \)
• Equation for second-order polynomial model:
  \( \hat{y} = b_0 + b_1 x + b_2 x^2 \)
• Equation for third-order polynomial model:
  \( \hat{y} = b_0 + b_1 x + b_2 x^2 + b_3 x^3 \)
• Equation for general polynomial model:
  \( \hat{y} = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \cdots + b_p x^p \)
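As a minimal sketch of how the general polynomial model is evaluated once the coefficients are known, the Python function below computes the predicted value from a coefficient list. The name poly_predict and the sample coefficients are illustrative assumptions, not taken from the text.

```python
def poly_predict(coeffs, x):
    """Evaluate y-hat = b0 + b1*x + b2*x**2 + ... + bp*x**p.

    coeffs is [b0, b1, ..., bp]; the order p is len(coeffs) - 1.
    """
    y_hat = 0.0
    # Horner's rule: work from the highest-order coefficient down.
    for b in reversed(coeffs):
        y_hat = y_hat * x + b
    return y_hat


# Hypothetical coefficients, chosen only to illustrate the call.
first_order = poly_predict([2.0, 0.5], 4.0)          # 2 + 0.5*4
second_order = poly_predict([2.0, 0.5, 0.25], 4.0)   # 2 + 0.5*4 + 0.25*16
```

The same function covers the first-, second-, and third-order cases, since each is the general model with p = 1, 2, or 3.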
Polynomial Models with Two
Quantitative Predictor Variables
• First-order model with no interaction:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \)
• First-order model with interaction:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2 \)
• Second-order model with no interaction:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 \)
• Second-order model with interaction:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_4 x_2^2 + b_5 x_1 x_2 \)
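The two-predictor models differ only in which derived terms are included. The sketch below builds those terms for one observation and applies a coefficient vector. The function names and the coefficient values are hypothetical, used only for illustration.

```python
def second_order_terms(x1, x2, interaction=True):
    """Return [x1, x2, x1^2, x2^2] plus x1*x2 when interaction is requested."""
    terms = [x1, x2, x1 ** 2, x2 ** 2]
    if interaction:
        terms.append(x1 * x2)
    return terms


def predict(b, terms):
    """y-hat = b[0] + b[1]*terms[0] + b[2]*terms[1] + ..."""
    return b[0] + sum(bi * t for bi, t in zip(b[1:], terms))


# Hypothetical coefficients b0..b5, for illustration only.
b = [10.0, 2.0, 3.0, 0.5, 0.25, -1.0]
y_hat = predict(b, second_order_terms(2.0, 4.0))
```

Because the squared and cross-product terms are just extra columns, the model remains linear in the coefficients and can be fitted with ordinary multiple regression.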
Models with Qualitative Variables
• Equation for a model with a categorical
independent variable with two possible states:
  \( \hat{y} = b_0 + b_1 x \)
– where state 1 is coded as x = 1
– where state 2 is coded as x = 0
• Equation for a model with a categorical
independent variable with three possible states:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 \)
– where state 1 is coded as x1 = 1, x2 = 0
– where state 2 is coded as x1 = 0, x2 = 1
– where state 3 is coded as x1 = 0, x2 = 0
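The coding scheme above uses one fewer indicator variable than there are states, with the last state as the all-zeros baseline. A minimal sketch of that rule follows. The helper name dummy_code and the state labels are assumptions made for the example.

```python
def dummy_code(value, states):
    """Code a categorical value as len(states) - 1 indicator (0/1) variables.

    The last entry of `states` is the baseline and is coded as all zeros.
    """
    if value not in states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if value == s else 0 for s in states[:-1]]


# Two states need one indicator; three states need two.
two_state = [dummy_code(v, ["state 1", "state 2"])
             for v in ["state 1", "state 2"]]
three_state = [dummy_code(v, ["state 1", "state 2", "state 3"])
               for v in ["state 1", "state 2", "state 3"]]
```

With this coding, each coefficient on an indicator measures the shift in the predicted value relative to the baseline state.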
Models with Data Transformations
Exponential Model:
• General equation for an exponential model:
  \( y = \beta_0 \beta_1^{x} \varepsilon \)
• Corresponding linear regression equation for an
exponential model:
  \( \log\hat{y} = \log b_0 + (\log b_1)x \)
Multiplicative Model:
• General equation for a multiplicative model:
  \( y = \beta_0 x_1^{\beta_1} x_2^{\beta_2} \varepsilon \)
• Corresponding linear regression equation for a
multiplicative model:
  \( \log\hat{y} = \log b_0 + b_1 \log x_1 + b_2 \log x_2 \)
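The linear forms follow from taking logarithms of both sides of each model. A short derivation, assuming a multiplicative error term written as ε:

```latex
\begin{align*}
y = \beta_0\,\beta_1^{x}\,\varepsilon
  \;&\Longrightarrow\;
  \log y = \log\beta_0 + (\log\beta_1)\,x + \log\varepsilon \\
y = \beta_0\,x_1^{\beta_1}\,x_2^{\beta_2}\,\varepsilon
  \;&\Longrightarrow\;
  \log y = \log\beta_0 + \beta_1\log x_1 + \beta_2\log x_2 + \log\varepsilon
\end{align*}
```

Least squares is then applied to the transformed variables. In the exponential case the fitted intercept and slope estimate log b0 and log b1, so b0 and b1 are recovered by taking antilogs. In the multiplicative case only the intercept needs to be back-transformed.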
Example, Problem 17.8
• International Data Corporation has reported the following
costs per gigabyte of hard drive storage space for years 1995
through 2000. Using x = 1 through 6 to represent years 1995
through 2000, fit a second-order polynomial model to the
data and estimate the cost per gigabyte for the year 2008.
The regression equation will have the form:
  \( \hat{y} = b_0 + b_1 x + b_2 x^2 \)
Year    x = Yr    y = Cost
1995    1         $261.84
1996    2          137.94
1997    3           69.68
1998    4           29.30
1999    5           13.09
2000    6            6.46
Example, Problem 17.8, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.99655892
R Square 0.99312968
Adj R Square 0.98854948
Standard Error 10.5650522
Observations 6
Example, Problem 17.8, cont.
Microsoft Excel Output
             Coefficients    Standard Error    t Stat       P-value
Intercept    387.993         18.8993399        20.529447    0.0002527
x            -147.65675      12.3644646        -11.94203    0.0012629
x^2          14.1883929      1.72911255        8.2055924    0.0037879
The regression equation is:
  \( \hat{y} = 387.99 - 147.66x + 14.19x^2 \)
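The Excel coefficients can be cross-checked with a small least-squares routine. The sketch below solves the normal equations directly in plain Python. The helper least_squares is an illustrative stand-in written for this example, not the procedure Excel uses internally.

```python
def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        p = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


x = [1, 2, 3, 4, 5, 6]                               # 1995..2000
cost = [261.84, 137.94, 69.68, 29.30, 13.09, 6.46]   # $ per gigabyte

# Design matrix with columns 1, x, x^2 for the second-order model.
X = [[1.0, xi, xi ** 2] for xi in x]
b0, b1, b2 = least_squares(X, cost)
print(f"y-hat = {b0:.2f} + ({b1:.2f})x + ({b2:.2f})x^2")
```

The normal-equations approach is adequate for a problem this small. For larger or badly conditioned designs, a QR-based routine from a numerical library would be the safer choice.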
Example, Problem 17.8, cont.
• To estimate the cost per gigabyte for the year
2008, evaluate \( \hat{y} \) when x = 14:
  \( \hat{y} = 387.99 - 147.66(14) + 14.19(14)^2 = 1101.99 \)
• So the cost per gigabyte in 2008 is estimated to
be $1101.99.
• Does this make sense? Of course not.
• Explanation: Although the polynomial equation
provides a good fit for the data during the
period 1995-2000, this form is not appropriate for
extrapolating the data out to 2008.
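One way to see why the extrapolation fails: the fitted parabola has a positive squared-term coefficient, so it reaches a minimum and then turns upward. The sketch below locates that turning point from the rounded coefficients of the fitted equation. The variable names are illustrative.

```python
# Rounded coefficients from the fitted second-order equation.
b0, b1, b2 = 387.99, -147.66, 14.19


def y_hat(x):
    """Predicted cost per gigabyte from the second-order model."""
    return b0 + b1 * x + b2 * x ** 2


# A parabola with b2 > 0 has its minimum at x = -b1 / (2 * b2).
turning_point = -b1 / (2 * b2)

# Predictions inside the sample (x = 6) and beyond it (x = 10, 14).
for year, x in [(2000, 6), (2004, 10), (2008, 14)]:
    print(year, round(y_hat(x), 2))
```

The turning point falls at about x = 5.2, inside the sample period, so every prediction beyond 2000 rises with x even though the observed costs were falling.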
Example, Problem 17.32
• An exponential model will probably be more
appropriate for the data used in Problem 17.8.
y          Log y       x
$261.84    2.418036    1
 137.94    2.13969     2
  69.68    1.843108    3
  29.30    1.466868    4
  13.09    1.11694     5
   6.46    0.810233    6
Example, Problem 17.32, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.998899423
R Square 0.997800057
Adj R Square 0.997250071
Standard Error 0.03222401
Observations 6
Example, Problem 17.32, cont.
Microsoft Excel Output
             Coefficients    Standard Error    t Stat      P-value
Intercept    2.780829985     0.02999892        92.69767    8.12E-08
x            -0.32810028     0.00770301        -42.5938    1.82E-06
The regression equation is:
  \( \log\hat{y} = 2.7808 - 0.3281x \)
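The exponential fit is a simple linear regression of log10(y) on x, so it can be reproduced with the textbook slope and intercept formulas. This is a minimal sketch; the variable names are chosen for the example.

```python
import math

x = [1, 2, 3, 4, 5, 6]                               # 1995..2000
cost = [261.84, 137.94, 69.68, 29.30, 13.09, 6.46]   # $ per gigabyte
log_y = [math.log10(c) for c in cost]

n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n

# Simple linear regression of log10(y) on x.
s_xy = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = ly_bar - slope * x_bar

# R-square, measured on the log scale.
sse = sum((li - (intercept + slope * xi)) ** 2 for xi, li in zip(x, log_y))
sst = sum((li - ly_bar) ** 2 for li in log_y)
r_square = 1 - sse / sst
```

Note that this R-square describes the fit to log y, not to y itself.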
Example, Problem 17.32, cont.
• For x = 14,
  \( \log\hat{y} = 2.7808 - 0.3281(14) = -1.8126 \)
  \( \hat{y} = 10^{-1.8126} = 0.0154 \)
• Based on the exponential model, the cost
per gigabyte in 2008 will be $0.0154, or just
under 2 cents.
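The back-transformation can be sketched in a few lines. Taking antilogs of the fitted coefficients also gives the implied exponential form: the slope corresponds to a year-to-year multiplier of about 0.47, meaning the fitted cost falls by roughly half each year. The names below are illustrative.

```python
# Rounded coefficients of the fitted log-linear equation.
log_b0, log_b1 = 2.7808, -0.3281

b0 = 10 ** log_b0      # implied level at x = 0
b1 = 10 ** log_b1      # implied year-to-year multiplier


def cost_per_gb(x):
    """Back-transform the log10 prediction to dollars."""
    return 10 ** (log_b0 + log_b1 * x)


estimate_2008 = cost_per_gb(14)
```

Unlike the second-order polynomial, this form can only decline toward zero as x grows, which is why its extrapolation stays plausible.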
Example, Problem 17.27
• An efficiency expert has studied 12 employees
who perform similar assembly tasks, recording
productivity (units per hour), number of years of
experience, and which one of three popular
assembly methods the individual has chosen to
use in performing the task. Given the data
shown on the next slide, determine the linear
regression equation for estimating productivity
based on the other variables. For any qualitative
variables that are used, be sure to specify the
coding strategy each will employ.
Example, Problem 17.27, cont.
Worker    Prod.    Yrs. Exp.    Method
 1         75       7           A
 2         88      10           C
 3         91       4           B
 4         93       5           B
 5         95      11           C
 6         77       3           A
 7         97      12           B
 8         85      10           C
 9        102      12           C
10         93      13           A
11        112      12           B
12         86      14           A
Example, Problem 17.27, cont.
• The equation for a model with one quantitative
variable and a categorical independent variable
with three possible states is:
  \( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \)
– where x1 represents the years of experience
– where state 1 is coded as x2 = 1 if method A is used,
0 otherwise
– where state 2 is coded as x3 = 1 if method B is used,
0 otherwise
– where state 3 is coded as x2 = 0 and x3 = 0 if method C is
used.
Example, Problem 17.27, cont.
So the data to be analyzed are:
Worker    y      x1    x2    x3
 1         75     7    1     0
 2         88    10    0     0
 3         91     4    0     1
 4         93     5    0     1
 5         95    11    0     0
 6         77     3    1     0
 7         97    12    0     1
 8         85    10    0     0
 9        102    12    0     0
10         93    13    1     0
11        112    12    0     1
12         86    14    1     0
Example, Problem 17.27, cont.
Microsoft Excel Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.86075031
R Square 0.74089109
Adj R Square 0.64372525
Standard Error 6.0861957
Observations 12
Example, Problem 17.27, cont.
Microsoft Excel Output
             Coefficients    Standard Error    t Stat       P-value
Intercept    75.368984       6.30729302        11.949498    2.214E-06
x1           1.59358289      0.51391877        3.1008459    0.014647
x2           -7.3596257      4.37208671        -1.683321    0.1308108
x3           9.73395722      4.49127957        2.1673016    0.062079
The regression equation is:
  \( \hat{y} = 75.37 + 1.59x_1 - 7.36x_2 + 9.73x_3 \)
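The whole analysis, from the raw worker table to the coefficients, can be sketched as follows. Method C is the baseline, matching the coding strategy stated earlier. The helper least_squares is an illustrative normal-equations solver written for this example, not Excel's internal routine.

```python
def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        p = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# (productivity, years of experience, method) for workers 1..12.
workers = [
    (75, 7, "A"), (88, 10, "C"), (91, 4, "B"), (93, 5, "B"),
    (95, 11, "C"), (77, 3, "A"), (97, 12, "B"), (85, 10, "C"),
    (102, 12, "C"), (93, 13, "A"), (112, 12, "B"), (86, 14, "A"),
]

y = [prod for prod, _, _ in workers]
# Columns: constant, x1 = years, x2 = 1 if method A, x3 = 1 if method B.
X = [[1.0, yrs, int(m == "A"), int(m == "B")] for _, yrs, m in workers]

b = least_squares(X, y)

# R-square = 1 - SSE/SST.
fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_bar) ** 2 for yi in y)
r_square = 1 - sse / sst
```

Read against the baseline, the coefficients say that a method A worker is predicted to produce about 7.4 fewer units per hour, and a method B worker about 9.7 more, than a method C worker with the same experience.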
Example, Problem 17.27, cont.
• The regression equation has an adjusted
R-square of 0.644 (R-square 0.741). This indicates
that the regression model provides a reasonable
explanation for the variation in the data
set.
• Only the coefficient for x1 is significant at
the 0.05 level (p = 0.015); the method indicators
x2 (p = 0.131) and x3 (p = 0.062) are not. One
might therefore consider removing the assembly
method from the model.
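The adjusted R-square values in the three Excel outputs can be reproduced from R-square, the number of observations n, and the number of predictors k. The sketch below uses the standard adjustment formula, which the slides do not state explicitly, so treat it as an assumption about how the reported figures were obtained.

```python
def adjusted_r_square(r_square, n, k):
    """Adjusted R-square for n observations and k predictors (intercept excluded)."""
    return 1 - (1 - r_square) * (n - 1) / (n - k - 1)


# R-square values as reported in the three Excel outputs.
poly = adjusted_r_square(0.99312968, n=6, k=2)            # Problem 17.8
expo = adjusted_r_square(0.997800057, n=6, k=1)           # Problem 17.32
productivity = adjusted_r_square(0.74089109, n=12, k=3)   # Problem 17.27
```

Because the adjustment penalizes each added predictor, it is useful when comparing models with different numbers of terms. The values for Problems 17.8 and 17.32 are measured on different response scales (y versus log y), so they should be compared with some caution.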