Chapter 4
STATISTICAL INFERENCE:
ESTIMATION AND HYPOTHESIS TESTING
Statistical inference draws conclusions about a population [i.e., its probability density function (PDF)] from a random sample that has supposedly been drawn from that population.
4.1 THE MEANING OF STATISTICAL
INFERENCE
Statistical inference: the study of the relationship between a population and a sample drawn from that population.
The process of generalizing from the sample value ($\bar{X}$) to the population value E(X) is the essence of statistical inference.
4.2 ESTIMATION AND HYPOTHESIS TESTING:
TWIN BRANCHES OF STATISTICAL INFERENCE
1. Estimation
Estimation: the first step in statistical inference.
$\bar{X}$: an estimator (statistic) of the population parameter E(X).
Estimate: the particular numerical value taken by the estimator.
Sampling variation (sampling error): the variation in the estimate from sample to sample.
2. Hypothesis testing
In hypothesis testing we may have a prior judgment or expectation about what value a particular parameter may assume.
4.3 ESTIMATION OF PARAMETERS
The usual procedure of estimation: assume that we have a random sample of size n from a probability distribution of known form and use the sample to estimate the unknown parameters; that is, use the sample mean as an estimate of the population mean (or expected value) and the sample variance as an estimate of the population variance.
1. Point estimate
A point estimator, or statistic, such as $\bar{X}$ is a random variable (r.v.): its value will vary from sample to sample.
How can we rely on just one estimate of the true population mean?
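2. Interval estimate
Although $\bar{X}$ is the single "best" guess of the true population mean, can we say that an interval, say from 8 to 14, most likely includes the true $\mu_X$? This is interval estimation.
Sampling or probability distribution:
$$\bar{X} \sim N(\mu_X, \sigma^2/n)$$
$$Z = \frac{\bar{X} - \mu_X}{\sigma/\sqrt{n}} \sim N(0, 1)$$
$$t = \frac{\bar{X} - \mu_X}{S/\sqrt{n}} \sim t_{(n-1)}$$
$$P(-t_{n-1} \le t \le t_{n-1}) = 1 - \alpha$$
Critical t values: $\pm t_{n-1}$, the values of the t distribution with n-1 d.f. that leave α/2 in each tail.
$$P\left(\bar{X} - t_{n-1}\frac{S}{\sqrt{n}} \le \mu_X \le \bar{X} + t_{n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
Confidence interval (lower limit, upper limit):
$$\bar{X} - t_{n-1}\frac{S}{\sqrt{n}} \le \mu_X \le \bar{X} + t_{n-1}\frac{S}{\sqrt{n}}$$
Confidence coefficient: 1-α
Level of significance (the probability of committing a type I error): α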
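The interval above can be computed directly from a sample. The following is a minimal sketch, not part of the original slides: the data are hypothetical, and the critical value 2.262 (two-tailed 5%, 9 d.f.) is assumed to be read from a t table rather than computed.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) = X_bar -/+ t_crit * S / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical data: n = 10 observations, so d.f. = n - 1 = 9.
sample = [9.5, 12.1, 11.3, 10.8, 13.4, 8.9, 11.7, 12.6, 10.2, 11.5]
T_CRIT_9DF_95 = 2.262  # assumed: taken from a t table for 9 d.f., alpha = 0.05

lower, upper = t_confidence_interval(sample, T_CRIT_9DF_95)
print(f"95% confidence interval for the mean: ({lower:.2f}, {upper:.2f})")
```

The critical value is passed in as an argument so that the same function serves any confidence coefficient and any d.f.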
Note: it is the interval that is random, not the parameter $\mu_X$.
The confidence interval is a random interval because it is based on $\bar{X}$ and $S/\sqrt{n}$, which will vary from sample to sample.
The population mean, although unknown, is some fixed number; it is not random.
You should not say: the probability is 0.95 (that is, 1-α) that $\mu_X$ lies in this interval.
You should say: the probability is 0.95 that the random interval contains the true $\mu_X$.
Interval estimation, in contrast to point estimation, provides a range of values that will include the true value with a certain degree of confidence or probability (such as 0.95).
$$P(L \le \mu_X \le U) = 1 - \alpha, \qquad 0 < \alpha < 1$$
That is, the probability is (1-α) that the random interval from L to U contains the true $\mu_X$. If we construct a confidence interval with a confidence coefficient of 0.95, then in repeated such constructions 95 out of 100 intervals can be expected to include the true $\mu_X$.
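The repeated-construction reading of the confidence coefficient can be checked by simulation. This is an illustrative sketch under assumed values (a normal population with mean 11 and standard deviation 2, n = 10, and the tabulated critical value 2.262 for 9 d.f.); none of these numbers come from the slides.

```python
import math
import random
import statistics


def coverage_rate(mu, sigma, n, t_crit, repetitions, seed=0):
    """Share of t-based intervals, over repeated samples, that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.mean(sample)
        half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
        # The interval changes from sample to sample; mu stays fixed.
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / repetitions


rate = coverage_rate(mu=11.0, sigma=2.0, n=10, t_crit=2.262, repetitions=5000)
print(f"share of intervals containing the true mean: {rate:.3f}")
```

The printed share should be close to 0.95, which is a statement about the procedure, not about any single interval.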
4.4 PROPERTIES OF POINT ESTIMATORS
The sample mean is the most frequently used estimator of the population mean because it satisfies several properties that statisticians deem desirable.
1. Linearity
An estimator is said to be a linear estimator if it is a linear function of the sample observations.
$$\bar{X} = \sum_{i=1}^{n} \frac{X_i}{n} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$$
2. Unbiasedness
An estimator $\bar{X}$ is an unbiased estimator of $\mu_X$ if
$$E(\bar{X}) = \mu_X$$
If we draw repeated samples of size n from the normal population and compute $\bar{X}$ for each sample, then on average $\bar{X}$ will coincide with $\mu_X$.
Unbiasedness is a repeated-sampling property.
3. Efficiency
If we consider only unbiased estimators of a parameter, the one with the smallest variance is called the best, or efficient, estimator.
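Unbiasedness and efficiency are both statements about repeated sampling, so a small simulation can make them concrete. The sketch below is an illustration only: it assumes a normal population (mean 11, standard deviation 2) and compares the sample mean with the sample median, a second estimator that the slides do not mention and that is introduced here purely for contrast.

```python
import random
import statistics


def repeated_sampling(mu, sigma, n, repetitions, seed=1):
    """Collect the sample mean and sample median over many samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return means, medians


means, medians = repeated_sampling(mu=11.0, sigma=2.0, n=25, repetitions=4000)

# Unbiasedness: the average of the estimates is close to the true mean.
print("average of sample means:  ", round(statistics.mean(means), 3))
print("average of sample medians:", round(statistics.mean(medians), 3))

# Efficiency: of the two, the sample mean has the smaller sampling variance.
print("variance of sample means:  ", round(statistics.pvariance(means), 4))
print("variance of sample medians:", round(statistics.pvariance(medians), 4))
```

Both averages land near the true mean, but the sample means are less dispersed, which is what efficiency refers to.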
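4. Best Linear Unbiased Estimator (BLUE)
If an estimator is linear, is unbiased, and has minimum variance in the class of all linear unbiased estimators of a parameter, it is called a best linear unbiased estimator.
5. Consistency
An estimator (e.g., $X^*$) is said to be a consistent estimator if it approaches the true value of the parameter as the sample size gets larger and larger.
$$\bar{X} = \sum \frac{X_i}{n}, \qquad E(\bar{X}) = \mu_X$$
$$X^* = \sum \frac{X_i}{n+1}, \qquad E(X^*) = \left(\frac{n}{n+1}\right)\mu_X$$
$X^*$ is biased in any finite sample, but the factor n/(n+1) tends to 1 as n grows, so the bias vanishes and $X^*$ is consistent.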
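The behaviour of the estimator X* = ΣXi/(n+1) can be seen numerically. The following sketch assumes a normal population with mean 11 and standard deviation 2 (illustrative values, not from the slides) and averages X* over 200 samples at each sample size.

```python
import random


def x_star(sample):
    """Biased but consistent estimator: sum of observations over (n + 1)."""
    return sum(sample) / (len(sample) + 1)


def average_x_star(n, mu, sigma, repetitions, rng):
    """Average of X* over repeated samples of size n."""
    estimates = [
        x_star([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(repetitions)
    ]
    return sum(estimates) / repetitions


rng = random.Random(2)
MU, SIGMA = 11.0, 2.0  # assumed population parameters
results = {}
for n in (5, 50, 500, 5000):
    results[n] = average_x_star(n, MU, SIGMA, repetitions=200, rng=rng)
    expected = n / (n + 1) * MU  # E(X*) = (n / (n + 1)) * mu
    print(f"n={n:5d}  average X*={results[n]:.3f}  E(X*)={expected:.3f}")
```

For small n the average sits visibly below the true mean, and the gap closes as n increases.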
4.5 STATISTICAL INFERENCE: HYPOTHESIS TESTING
Hypothesis testing: instead of establishing a confidence interval, we hypothesize that the true $\mu_X$ takes a particular numerical value, e.g., $\mu_X = 13$. Our task is to "test" this hypothesis.
Null hypothesis (H0): the hypothesis being tested, e.g., $\mu_X = 13$.
Alternative hypothesis (H1): the hypothesis against which the null hypothesis is tested.
H1: $\mu_X > 13$, one-sided alternative hypothesis
H1: $\mu_X < 13$, one-sided alternative hypothesis
H1: $\mu_X \ne 13$, two-sided alternative hypothesis
1. The Confidence Interval Approach to Hypothesis Testing
In hypothesis testing, the 95% confidence interval is called the acceptance region, and the area outside the acceptance region is called the critical region, or region of rejection, of the null hypothesis. The boundaries of the acceptance region are called critical values.
The null hypothesis is rejected if the value of the parameter under the null hypothesis either exceeds the upper critical value or is less than the lower critical value of the acceptance region.
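The rejection rule of the confidence interval approach can be sketched as follows. The data are hypothetical, the hypothesized value 13 follows the example in the text, and the critical value 2.262 (9 d.f., 5% two-tailed) is assumed to come from a t table.

```python
import math
import statistics


def ci_test(sample, mu_0, t_crit):
    """Return (lower, upper, reject) for H0: mu = mu_0, two-sided."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    lower, upper = x_bar - half_width, x_bar + half_width
    # Reject H0 when the hypothesized value falls outside the acceptance region.
    reject = not (lower <= mu_0 <= upper)
    return lower, upper, reject


# Hypothetical sample of n = 10 observations.
sample = [9.5, 12.1, 11.3, 10.8, 13.4, 8.9, 11.7, 12.6, 10.2, 11.5]
lower, upper, reject = ci_test(sample, mu_0=13.0, t_crit=2.262)
print(f"acceptance region: ({lower:.2f}, {upper:.2f}); reject H0: {reject}")
```

With these made-up numbers the hypothesized value 13 lies above the upper critical value, so H0 is rejected.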
2. Type I and Type II Errors: A Digression
Type I error: the error of rejecting a hypothesis when it is true.
Type II error: the error of accepting a false hypothesis.
P(type I error) = α = P(rejecting H0 | H0 is true)
P(type II error) = β = P(accepting H0 | H0 is false)
The classical approach to the type I and type II error problem: assume that a type I error is more serious than a type II error, keep the probability of committing a type I error at a fairly low level, and then minimize the probability of a type II error as much as possible. In practice, one simply specifies the value of α without worrying too much about β.
The decision to accept or reject a null hypothesis depends critically on both the d.f. and the probability of committing a type I error.
A 95% confidence coefficient, a 5% level of significance, and a 95% level (or degree) of confidence all mean the same thing: we are prepared to accept at most a 5 percent probability of committing a type I error.
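Both error probabilities can be estimated by simulation. The sketch below is illustrative: the population standard deviation 2, the sample size 10, the alternative mean 14, and the tabulated critical value 2.262 are all assumptions chosen for the example.

```python
import math
import random
import statistics


def rejection_rate(mu_true, mu_0, sigma, n, t_crit, repetitions, seed):
    """Share of samples from N(mu_true, sigma^2) in which H0: mu = mu_0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repetitions):
        sample = [rng.gauss(mu_true, sigma) for _ in range(n)]
        std_error = statistics.stdev(sample) / math.sqrt(n)
        t = (statistics.mean(sample) - mu_0) / std_error
        if abs(t) > t_crit:
            rejections += 1
    return rejections / repetitions


# H0 true (mu_true == mu_0): every rejection is a type I error.
alpha_hat = rejection_rate(13.0, 13.0, 2.0, 10, 2.262, 4000, seed=3)
# H0 false (true mean is 14): every non-rejection is a type II error.
beta_hat = 1 - rejection_rate(14.0, 13.0, 2.0, 10, 2.262, 4000, seed=4)

print(f"estimated alpha: {alpha_hat:.3f}")
print(f"estimated beta:  {beta_hat:.3f}")
```

The estimated α stays near the chosen 5%, while β depends on how far the true mean is from the hypothesized one and is not controlled by the choice of α alone.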
3. The Test of Significance Approach to Hypothesis Testing
$$t = \frac{\bar{X} - \mu_X}{S/\sqrt{n}} \sim t_{(n-1)}$$
where $\mu_X$ is the value hypothesized under H0.
If the difference between $\bar{X}$ and $\mu_X$ is small (in absolute terms), then the |t| value will also be small.
If $\bar{X} = \mu_X$, t will be zero, and we can accept the null hypothesis.
As the |t| value deviates from zero, we will increasingly tend to reject the null hypothesis. If the computed t value lies in either of the rejection regions, we can reject the null hypothesis.
When we reject the null hypothesis, we say that our finding is statistically significant.
When we do not reject the null hypothesis, we say that our finding is not statistically significant.
▲ One-tailed or two-tailed test
▲ The confidence interval approach and the test of significance approach
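A short sketch of the test of significance approach, covering both the two-tailed and the one-tailed case. The sample is hypothetical, and the critical values 2.262 (two-tailed) and 1.833 (one-tailed) for 9 d.f. at the 5% level are assumed to be read from a t table.

```python
import math
import statistics


def t_statistic(sample, mu_0):
    """t = (X_bar - mu_0) / (S / sqrt(n)), with n - 1 d.f. under H0."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu_0) / std_error


def reject_h0(t, t_crit, alternative="two-sided"):
    """Decision rule for H1: mu != mu_0, mu > mu_0 ("greater") or mu < mu_0 ("less")."""
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    raise ValueError(f"unknown alternative: {alternative}")


sample = [9.5, 12.1, 11.3, 10.8, 13.4, 8.9, 11.7, 12.6, 10.2, 11.5]
t = t_statistic(sample, mu_0=13.0)
print(f"t = {t:.3f}")
print("two-tailed, 5%:", reject_h0(t, 2.262))
print("one-tailed (H1: mu < 13), 5%:", reject_h0(t, 1.833, "less"))
```

The one-tailed critical value is smaller in absolute terms because the whole 5% is placed in a single tail.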
4. A Word on Choosing the Level of Significance, α, and the p Value
p value: the exact significance level of the test statistic, that is, the lowest significance level at which a null hypothesis can be rejected.
The smaller the p value, the stronger the evidence against the null hypothesis.
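For a t statistic, the two-tailed p value is the area of the t distribution beyond ±|t|. The sketch below obtains it by numerically integrating the Student t density with Simpson's rule, using only the standard library; this is one possible way to compute it, chosen here for self-containment, and the function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central_area


# At the tabulated 5% critical value for 9 d.f. the p value is about 0.05.
print(f"p value for t = 2.262, 9 d.f.: {two_tailed_p_value(2.262, 9):.4f}")
print(f"p value for t = -4.106, 9 d.f.: {two_tailed_p_value(-4.106, 9):.4f}")
```

Reporting the p value lets the reader apply whatever level of significance they prefer instead of relying on a preselected α.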
5. The χ² test of significance
(1) The confidence interval approach: establish a 100(1-α)% confidence interval for the true but unknown σ² using the χ² distribution.
(2) Hypothesis testing approach: compute the χ² value and test its significance against the critical χ² value.
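Both approaches for the variance can be sketched from the fact that, for a normal population, (n-1)S²/σ² follows the χ² distribution with n-1 d.f. The data and the hypothesized variance below are hypothetical, and the critical values 2.7004 and 19.0228 (9 d.f., 2.5% in each tail) are assumed to be read from a χ² table.

```python
import statistics


def variance_confidence_interval(sample, chi2_lower_crit, chi2_upper_crit):
    """Interval for sigma^2: (n-1)S^2 / chi2_upper <= sigma^2 <= (n-1)S^2 / chi2_lower."""
    n = len(sample)
    scaled = (n - 1) * statistics.variance(sample)  # (n - 1) * S^2
    return scaled / chi2_upper_crit, scaled / chi2_lower_crit


def chi2_statistic(sample, sigma2_0):
    """Test statistic (n-1)S^2 / sigma2_0 under H0: sigma^2 = sigma2_0."""
    return (len(sample) - 1) * statistics.variance(sample) / sigma2_0


sample = [9.5, 12.1, 11.3, 10.8, 13.4, 8.9, 11.7, 12.6, 10.2, 11.5]
CHI2_LOWER_9DF = 2.7004   # assumed table value: 2.5% of the area lies below it
CHI2_UPPER_9DF = 19.0228  # assumed table value: 2.5% of the area lies above it

low, high = variance_confidence_interval(sample, CHI2_LOWER_9DF, CHI2_UPPER_9DF)
chi2 = chi2_statistic(sample, sigma2_0=4.0)  # hypothetical H0: sigma^2 = 4
reject = chi2 < CHI2_LOWER_9DF or chi2 > CHI2_UPPER_9DF

print(f"95% confidence interval for the variance: ({low:.3f}, {high:.3f})")
print(f"chi-square statistic: {chi2:.3f}; reject H0: {reject}")
```

The interval is not symmetric around S² because the χ² distribution is skewed.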
Sampling distribution underlying the χ² test:
$$(n-1)\frac{S^2}{\sigma^2} \sim \chi^2_{(n-1)}$$
6. The F test of significance
Conclusion:
Summarizing the steps involved in testing a statistical hypothesis:
Step 1. State the null hypothesis H0 and the alternative hypothesis H1, e.g., H0: $\mu_X = 13$ and H1: $\mu_X \ne 13$.
Step 2. Select the test statistic (e.g., $\bar{X}$).
Step 3. Determine the probability distribution of the test statistic (e.g., $\bar{X} \sim N(\mu_X, \sigma^2/n)$).
Step 4. Choose the level of significance α, that is, the probability of committing a type I error. (Keep in mind our discussion about the p value.)
Step 5. Choose the confidence interval or the test of significance approach.
Step 6. Decide whether or not to reject the null hypothesis.
(1) The confidence interval approach:
Using the probability distribution of the test statistic, establish a 100(1-α)% confidence interval.
If this interval (the acceptance region) includes the null-hypothesized value, do not reject the null hypothesis.
If this interval does not include it, reject the null hypothesis.
(2) The test of significance approach:
Obtain the relevant test statistic (e.g., the t statistic) under the null hypothesis and find, from the appropriate probability distribution, the probability of obtaining a value of the test statistic at least as extreme as the one computed.
If this probability is less than the prechosen value of α, reject the null hypothesis.
If this probability is greater than α, do not reject it.
If you do not want to preselect α, just present the p value of the statistic.
Note: whether you choose the confidence interval or the test of significance approach, keep in mind that in rejecting or not rejecting a null hypothesis you are taking a chance of being wrong; in particular, a true null hypothesis will be rejected 100α% of the time.