CHAPTER 15
Simple Linear Regression
and Correlation
to accompany
Introduction to Business Statistics
fourth edition, by Ronald M. Weiers
Presentation by Priscilla Chaffe-Stengel
Donald N. Stengel
© 2002 The Wadsworth Group
Chapter 15 - Learning Objectives
• Determine the least squares regression
equation, and make point and interval
estimates for the dependent variable.
• Determine and interpret the value of the:
– Coefficient of correlation.
– Coefficient of determination.
• Construct confidence intervals and carry
out hypothesis tests involving the slope of
the regression line.
Chapter 15 - Key Terms
• Direct or inverse relationships
• Least squares regression model
• Standard error of the estimate, $s_{y,x}$
• Point estimate using the regression model
• Confidence interval for the mean
• Prediction interval for an individual value
• Coefficient of correlation
• Coefficient of determination
Chapter 15 - Key Concept
Regression analysis generates a
“best-fit” mathematical equation
that can be used in predicting the
values of the dependent variable as
a function of the independent
variable.
Direct vs Inverse Relationships
• Direct relationship:
– As x increases, y increases.
– The graph of the model rises from left to right.
– The slope of the linear model is positive.
• Inverse relationship:
– As x increases, y decreases.
– The graph of the model falls from left to right.
– The slope of the linear model is negative.
Simple Linear Regression Model
• Probabilistic model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
where $y_i$ = a value of the dependent variable, y
$x_i$ = a value of the independent variable, x
$\beta_0$ = the y-intercept of the regression line
$\beta_1$ = the slope of the regression line
$\varepsilon_i$ = random error, the residual
• Deterministic model: $\hat{y}_i = b_0 + b_1 x_i$
where $b_0 \approx \beta_0$, $b_1 \approx \beta_1$, and $\hat{y}_i$ is the predicted value of y, in contrast to the actual
value of y.
Determining the Least Squares
Regression Line
• Least squares regression line: $\hat{y} = b_0 + b_1 x$
– Slope:
\[ b_1 = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sum x_i^2 - n\,\bar{x}^2} \]
– y-intercept:
\[ b_0 = \bar{y} - b_1 \bar{x} \]
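As an added aside (not on the original slide), the computational slope formula is algebraically the same as the deviation form that many texts use. A short sketch of the identity, in the same symbols:

```latex
\begin{align*}
\sum (x_i - \bar{x})(y_i - \bar{y})
  &= \sum x_i y_i - \bar{y}\sum x_i - \bar{x}\sum y_i + n\bar{x}\bar{y}
   = \sum x_i y_i - n\bar{x}\bar{y}, \\
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2
   = \sum x_i^2 - n\bar{x}^2, \\
b_1 &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}.
\end{align*}
```

Both steps use only $\sum x_i = n\bar{x}$ and $\sum y_i = n\bar{y}$.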
Simple Linear Regression:
An Example
• Problem 15.9:
For a sample of 8 employees, a personnel director has
collected the following data on ownership of company
stock, y, versus years with the firm, x.
x     6    12    14     6     9    13    15     9
y   300   408   560   252   288   650   630   522
(a) Determine the least squares regression line and
interpret its slope. (b) For an employee who has been with
the firm 10 years, what is the predicted number of shares
of stock owned?
An Example, cont.
         x        y      x·y     x²
         6      300     1800     36
        12      408     4896    144
        14      560     7840    196
         6      252     1512     36
         9      288     2592     81
        13      650     8450    169
        15      630     9450    225
         9      522     4698     81
Mean  10.5   451.25
Sum                   41,238    968
An Example, cont.
• Slope:
\[ b_1 = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sum x_i^2 - n\,\bar{x}^2} = \frac{41238 - 8(10.5)(451.25)}{968 - 8(10.5)^2} = 38.7558 \]
• y-intercept:
\[ b_0 = \bar{y} - b_1 \bar{x} = 451.25 - (38.7558)(10.5) = 44.3140 \]
So the “best-fit” linear model, rounding to
the nearest tenth, is:
\[ \hat{y} = 44.3140 + 38.7558x \approx 44.3 + 38.8x \]
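The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function name `least_squares` is an illustrative choice, not something from the text, and the data are the eight (x, y) pairs from Problem 15.9.

```python
# Data from Problem 15.9: years with the firm (x) and shares owned (y).
x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]


def least_squares(xs, ys):
    """Return (b0, b1) using the computational formulas from the slide."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))  # 41238 for this data
    sum_x2 = sum(xi ** 2 for xi in xs)               # 968 for this data
    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")  # b1 = 38.7558, b0 = 44.3140
```

Carrying full precision through the intercept step, rather than the rounded slope, is what gives 44.3140.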
An Example, cont.
• Interpretation of the slope: For every
additional year an employee works for the
firm, the employee acquires an estimated
38.8 additional shares of stock.
• If x = 10, the point estimate for the number
of shares of stock that this employee owns
is:
\[ \hat{y} = 44.314 + 38.7558x = 44.314 + 38.7558(10) = 431.872 \approx 432 \text{ shares} \]
Interval Estimates Using the
Regression Model
• Confidence Interval for the Mean of y
– places an upper and lower bound around
the point estimate for the average value
of y given x.
• Prediction Interval for an Individual y
– places an upper and lower bound around
the point estimate for an individual value
of y given x.
To Form Interval Estimates
• The Standard Error of the Estimate, $s_{y,x}$
– The standard deviation of the distribution of the
  • data points above and below the regression line,
  • distances between actual and predicted values of y,
  • residuals, $\varepsilon$
– The square root of MSE given by ANOVA
\[ s_{y,x} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} \]
Equations for the Interval Estimates
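• Confidence Interval for the Mean of y
\[ \hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{\frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]
• Prediction Interval for the Individual y
\[ \hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{1 + \frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]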
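The two formulas differ only in the leading 1 under the square root, so a single helper can cover both. The sketch below is illustrative: the function name and its `individual` flag are assumptions made for this example, and the inputs (t = 2.447, $s_{y,x}$ = 91.4789, n = 8, Σx = 84, Σx² = 968) are the values the deck uses for the stock-ownership example.

```python
import math


def interval_half_width(t_crit, s_yx, n, x_value, sum_x, sum_x2, individual=False):
    """Half-width of the interval estimate at x_value.

    individual=False: confidence interval for the mean of y
    individual=True:  prediction interval for an individual y
    """
    x_bar = sum_x / n
    ss_xx = sum_x2 - sum_x ** 2 / n       # sum(x^2) - (sum x)^2 / n
    inside = 1 / n + (x_value - x_bar) ** 2 / ss_xx
    if individual:
        inside += 1                       # extra term for a single observation
    return t_crit * s_yx * math.sqrt(inside)


# Values from the deck's example (x value = 10 years, 95% level, df = 6).
ci_half = interval_half_width(2.447, 91.4789, 8, 10, 84, 968)
pi_half = interval_half_width(2.447, 91.4789, 8, 10, 84, 968, individual=True)
print(f"{ci_half:.2f} {pi_half:.2f}")  # approximately 80.06 and 237.73
```

The critical value is passed in rather than computed, since the standard library has no t-distribution quantile function; in practice it would come from a t table or a statistics package.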
Using Intervals - An Example
• For employees who worked 10 years for the firm,
what is the 95% confidence interval for their mean
share holdings?
This calls for a confidence interval on the
average number of shares owned by
employees who worked for the firm 10 years.
So we will use:
\[ \hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{\frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]
Standard Error of the Estimate:
Definitional Equation
 x     y    Predicted y   Squared Residual
 6   300       276.8488           535.9763
12   408       509.3837         10278.6589
14   560       586.8953           723.3598
 6   252       276.8488           617.4647
 9   288       393.1163         11049.4321
13   650       548.1395         10375.5544
15   630       625.6512            18.9124
 9   522       393.1163         16611.0135
                          Sum = 50210.3721
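The table can be reproduced directly from the data. The following is a minimal sketch that refits the line at full precision (so the predicted values match the table to four decimals) and then applies the definitional equation; the variable names are illustrative.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)

# Refit the least squares line so the example is self-contained.
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
    sum(a * a for a in x) - n * x_bar ** 2
)
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]
squared_residuals = [(yi - yh) ** 2 for yi, yh in zip(y, predicted)]

sse = sum(squared_residuals)        # sum of squared residuals
s_yx = math.sqrt(sse / (n - 2))     # standard error of the estimate
print(f"SSE = {sse:.4f}, s_yx = {s_yx:.4f}")  # SSE = 50210.3721, s_yx = 91.4789
```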
Evaluating the Confidence Interval
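Since n = 8, df = 8 – 2 = 6 and $t_{\alpha/2}$ = 2.447. From our prior
analyses, $\sum x = 84$, $\sum x^2 = 968$, and the predicted y = 431.872.
\[ s_{y,x} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{50210.3721}{8 - 2}} = 91.4789 \]
\begin{align*}
\hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{\frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}}
&= 431.872 \pm (2.447)(91.4789) \sqrt{\frac{1}{8} + \frac{(10 - 10.5)^2}{968 - \frac{84^2}{8}}} \\
&= 431.872 \pm (2.447)(91.4789)(0.3576) \\
&= 431.872 \pm 80.057
\end{align*}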
Interpreting the Confidence Interval
• Based on our calculations, we would
have 95% confidence that the mean
number of shares for persons working
for the firm 10 years will be between:
431.872 – 80.057 = 351.815
and
431.872 + 80.057 = 511.929
Written in interval notation:
(351.815, 511.929)
Using Intervals - An Example
• An employee worked 10 years for the firm. What
is the 95% prediction interval for her share
holdings?
This calls for a prediction interval on the
number of shares owned by an individual
employee who worked for the firm 10 years.
So we will use:
\[ \hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{1 + \frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}} \]
Evaluating the Prediction Interval
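Since n = 8, df = 8 – 2 = 6 and $t_{\alpha/2}$ = 2.447. From our prior
analyses, $\sum x = 84$, $\sum x^2 = 968$, and the predicted y = 431.872.
\begin{align*}
\hat{y} \pm t_{\alpha/2}\,(s_{y,x}) \sqrt{1 + \frac{1}{n} + \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}}
&= 431.872 \pm (2.447)(91.4789) \sqrt{1 + \frac{1}{8} + \frac{(10 - 10.5)^2}{968 - \frac{84^2}{8}}} \\
&= 431.872 \pm (2.447)(91.4789)(1.0620) \\
&= 431.872 \pm 237.734
\end{align*}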
Interpreting the Prediction Interval
• Based on our calculations, we would
have 95% confidence that the number of
shares held by an employee who has worked
for the firm 10 years will be between:
431.872 – 237.734 = 194.138
and
431.872 + 237.734 = 669.606
Written in interval notation:
(194.138, 669.606)
Comparing the Two Intervals
Notice that the confidence interval for the
mean is much narrower than the prediction
interval for the individual value. There is
greater fluctuation among individual values
than among group means. Both are centered
at the point estimate, $\hat{y}$ = 431.872.
[Figure: number line from 0 to 700 marking the confidence interval (351.8, 511.9) and the prediction interval (194.1, 669.6), both centered at $\hat{y}$.]
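As an added note, the difference in width follows directly from the two interval formulas given earlier in the deck, which differ only by the extra 1 under the square root. Writing $D$ (a symbol introduced here only for brevity) for the shared distance term:

```latex
\begin{align*}
D &= \frac{(x\ \text{value} - \bar{x})^2}{\sum x_i^2 - (\sum x_i)^2 / n}, \\
\text{half-width (mean of } y) &= t_{\alpha/2}\, s_{y,x} \sqrt{\tfrac{1}{n} + D}, \\
\text{half-width (individual } y) &= t_{\alpha/2}\, s_{y,x} \sqrt{1 + \tfrac{1}{n} + D}.
\end{align*}
```

In this example the square roots are 0.3576 and 1.0620, so the prediction interval is roughly three times as wide as the confidence interval.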
Coefficient of Correlation
• A measure of the
– Direction of the linear relationship between
x and y.
  • If x and y are directly related, r > 0.
  • If x and y are inversely related, r < 0.
– Strength of the linear relationship between
x and y.
  • The larger the absolute value of r, the more the
  value of y depends in a linear way on the value of x.
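The slide does not give a formula for r. As an illustration, the sketch below uses the standard Pearson product-moment definition (an assumption about which formula the text intends) on the stock-ownership data from Problem 15.9.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(f"r = {r:.4f}")  # r = 0.8486
```

The positive sign agrees with the positive slope found earlier: years with the firm and shares owned are directly related.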
Coefficient of Determination
• A measure of the
– Strength of the linear relationship between
x and y.
  • The larger the value of $r^2$, the more the value of
  y depends in a linear way on the value of x.
– Amount of variation in y that is related to
variation in x.
– Ratio of the variation in y that is explained by
the regression model to the total
variation in y.
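The "explained over total" ratio can be computed from the sums of squares. This is a sketch on the Problem 15.9 data; the names `sst`, `sse`, and `ssr` follow common ANOVA usage and are assumptions here, since the slide does not name them.

```python
x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares fit (deviation form of the slope formula).
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation in y
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
ssr = sst - sse                                                # explained variation

r_squared = ssr / sst
print(f"r^2 = {r_squared:.4f}")  # r^2 = 0.7201
```

Read this way, about 72% of the variation in shares owned is related to variation in years with the firm.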
Testing for Linearity
Key Argument:
• If the value of y does not change linearly
with the value of x, then using the mean
value of y is the best predictor for the actual
value of y. This implies $y = \bar{y}$ is preferable.
• If the value of y does change linearly with
the value of x, then using the regression
model gives a better prediction for the value
of y than using the mean of y. This implies
$y = \hat{y}$ is preferable.
Three Tests for Linearity
• 1. Testing the Coefficient of Correlation
$H_0$: $\rho = 0$ There is no linear relationship between x and y.
$H_1$: $\rho \ne 0$ There is a linear relationship between x and y.
Test Statistic:
\[ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} \]
• 2. Testing the Slope of the Regression Line
$H_0$: $\beta_1 = 0$ There is no linear relationship between x and y.
$H_1$: $\beta_1 \ne 0$ There is a linear relationship between x and y.
Test Statistic:
\[ t = \frac{b_1}{s_{y,x} \Big/ \sqrt{\sum x_i^2 - n\,\bar{x}^2}} \]
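The two test statistics give the same value in simple linear regression. The sketch below checks this on the Problem 15.9 data; applying the tests to that data, and reusing the deck's critical value 2.447 (df = 6, two-tailed, 95% level), are choices made for this illustration rather than steps shown on the slide.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
ss_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))          # standard error of the estimate
r = ss_xy / math.sqrt(ss_xx * ss_yy)     # Pearson r (assumed definition)

# Test 1: statistic based on the coefficient of correlation.
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))
# Test 2: statistic based on the slope.
t_from_slope = b1 / (s_yx / math.sqrt(ss_xx))

T_CRIT = 2.447  # critical t used in the deck for df = 6
print(f"{t_from_r:.3f} {t_from_slope:.3f}")  # both about 3.929
reject_h0 = abs(t_from_slope) > T_CRIT       # True: evidence of a linear relationship
```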
Three Tests for Linearity
• 3. The Global F-test
$H_0$: There is no linear relationship between x and y.
$H_1$: There is a linear relationship between x and y.
Test Statistic:
\[ F = \frac{MSR}{MSE} = \frac{SSR / 1}{SSE / (n - 2)} \]
Note: At the level of simple linear regression, the global
F-test is equivalent to the t-test on $\beta_1$. When we conduct
regression analysis of multiple variables, the global F-test
will take on a unique function.
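The equivalence mentioned in the note shows up numerically as F = t². A minimal sketch on the Problem 15.9 data follows; the variable names are illustrative, and using that data set here is an assumption made for the example.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_xx
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ssr = sst - sse

msr = ssr / 1          # one independent variable
mse = sse / (n - 2)
f_stat = msr / mse

# t statistic for the slope, for comparison.
t_stat = b1 / (math.sqrt(mse) / math.sqrt(ss_xx))
print(f"F = {f_stat:.3f}, t^2 = {t_stat ** 2:.3f}")  # both about 15.436
```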
A General Test of $\beta_1$
• Testing whether the slope of the population
regression line is equal to a specific
value.
$H_0$: $\beta_1 = \beta_{10}$
The slope of the population regression line is $\beta_{10}$.
$H_1$: $\beta_1 \ne \beta_{10}$
The slope of the population regression line is not $\beta_{10}$.
Test Statistic:
\[ t = \frac{b_1 - \beta_{10}}{s_{y,x} \Big/ \sqrt{\sum x_i^2 - n\,\bar{x}^2}} \]
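To make the general test concrete, the sketch below wraps the statistic in a function and applies it to the Problem 15.9 data. The function name `slope_t_statistic` and the hypothesized slope of 30 shares per year are hypothetical values chosen only for illustration; they do not come from the text.

```python
import math

x = [6, 12, 14, 6, 9, 13, 15, 9]
y = [300, 408, 560, 252, 288, 650, 630, 522]


def slope_t_statistic(xs, ys, beta_10):
    """t statistic for H0: beta_1 = beta_10 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum(xi ** 2 for xi in xs) - n * x_bar ** 2
    b1 = (sum(xi * yi for xi, yi in zip(xs, ys)) - n * x_bar * y_bar) / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))
    s_yx = math.sqrt(sse / (n - 2))
    return (b1 - beta_10) / (s_yx / math.sqrt(ss_xx))


t_zero = slope_t_statistic(x, y, 0.0)     # reduces to the test of beta_1 = 0
t_thirty = slope_t_statistic(x, y, 30.0)  # hypothetical beta_10 = 30
print(f"{t_zero:.3f} {t_thirty:.3f}")     # about 3.929 and 0.888
```

With the deck's critical value of 2.447 (df = 6), the first statistic would lead to rejecting $H_0$, while the second would not: the data are consistent with a population slope of 30.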