Chapter 11: Nonparametric Statistics
Chapter Outline
• Testing with Rank Sum
• Z Test for Differences in Two Proportions (Independent Samples)
• χ² Test for Differences in Two Proportions (Independent Samples)
• χ² Test for Differences in c Proportions (Independent Samples)
• χ² Test of Independence
Common Nonparametric Methods
Statistical Procedures for Hypothesis Testing that Do Not Require a Normal Distribution
• Because they are based on counts or ranks
• A random sample is still required
The Nonparametric Approach Based on Counts
• Count the number of times some event occurs
• Use the binomial distribution to decide whether this count is reasonable or not under the null hypothesis
The Nonparametric Approach Based on Ranks
• Replace each data value with its rank (1, 2, 3, …)
• Use formulas and tables created for testing ranks
Parametric Methods and Efficiency
Parametric Methods
• Statistical procedures that require a completely specified model
• e.g., t tests, regression tests, F tests
Efficiency
• A measure of the effectiveness of a statistical test
• Tells how well it makes use of the information in the data
• A more efficient test can achieve the same results with a smaller sample size
Advantages and Disadvantages
Advantages of Nonparametric Testing
• No need to assume normality
• Avoids problems of transformation (e.g., interpretation)
• Can be used with ordinal data
  - Because ranks can be found
• Can be much more efficient than parametric methods when distributions are not normal
Disadvantage of Nonparametric Testing
• Less statistically efficient than parametric methods when distributions are normal
  - Often, this loss of efficiency is slight
Testing the Median
• Without assuming a normal distribution
• Note: The number of sample data values below a continuous population's median follows a binomial distribution with p = 0.5, where n is the sample size
• The Sign Test
  1. Find the modified sample size m, the number of data values different from the reference value q0
  2. Find the limits in the table for this modified sample size
  3. Count how many data values fall below the reference value
  4. Significant if the count (step 3) is outside the limits (step 2)
Example: Family Income
Comparing Local to National Family Income
• Survey median is $70,547, based on n = 25 families
• National median is $27,735
  - This is the reference value q0
• Performing the sign test
  - Modified sample size is m = 25, since all sampled families have incomes different from the reference value
  - Limits from the table are 8 and 17 (for testing at the 5% level with m = 25)
  - There are 6 families with income below the reference value
  - Since 6 falls outside the limits (from 8 to 17), median family income in the community is significantly higher than the national median
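The table lookup in the family-income example can be cross-checked directly against the binomial distribution. The following is a minimal sketch, not the textbook's procedure: the function name `sign_test` and the rule of doubling the smaller tail for a two-sided p-value are assumptions made for illustration; only m = 25 and the count of 6 come from the example.

```python
from math import comb

def sign_test(count_below, m, alpha=0.05):
    """Two-sided sign test against Binomial(m, 0.5) (illustrative helper)."""
    # Exact binomial probabilities under H0 (p = 0.5)
    pmf = [comb(m, k) / 2**m for k in range(m + 1)]
    # Largest count whose lower-tail probability stays within alpha / 2
    cumulative, k_reject = 0.0, -1
    for k in range(m + 1):
        cumulative += pmf[k]
        if cumulative > alpha / 2:
            break
        k_reject = k
    # Limits: counts from lower to upper (inclusive) are NOT significant
    lower = k_reject + 1
    upper = m - lower
    # Two-sided p-value: double the smaller tail, capped at 1
    tail = min(sum(pmf[: count_below + 1]), sum(pmf[count_below:]))
    p_value = min(1.0, 2 * tail)
    return lower, upper, p_value

lower, upper, p_value = sign_test(count_below=6, m=25)
significant = 6 < lower or 6 > upper
print(lower, upper, round(p_value, 4), significant)
```

For m = 25 the computed limits agree with the tabled values 8 and 17, so a count of 6 is significant at the 5% level.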
Testing Paired-Design Data
Sign Test for the Differences
• Two columns of data
• Reduce to a single column representing the differences (changes) between the two columns
  - A similar approach to the two-sample paired t test (Chapter 10)
• Perform the sign test on these differences
  1. Find the modified sample size m, the number of data values that change between columns 1 and 2
  2. Find the limits in the table for this modified sample size
  3. Count how many data values went down
  4. Significant if the count (step 3) is outside the limits (step 2)
Sign Test for the Differences
• Hypotheses
  H0: Probability of X < Y equals probability of X > Y
      That is, the probability of going up equals the probability of going down
  H1: Probability of X < Y is not equal to probability of X > Y
      That is, the probabilities of going up and going down are unequal
• Assumption
  - The data set is a random sample from the population of interest, and each elementary unit in the sample has both values X and Y measured for it
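The paired version of the sign test can be sketched as follows. The `before` and `after` values are hypothetical numbers invented for this illustration (they do not come from the chapter), and the exact binomial p-value stands in for the table of limits.

```python
from math import comb

# Hypothetical paired measurements for 12 elementary units (illustration only)
before = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17, 10, 19]
after = [14, 15, 11, 23, 13, 15, 21, 18, 16, 20, 12, 22]

# Step 0: reduce the two columns to a single column of changes
diffs = [a - b for a, b in zip(after, before)]

# Step 1: modified sample size m counts only the units that changed
changed = [d for d in diffs if d != 0]
m = len(changed)

# Step 3: count how many values went down
went_down = sum(1 for d in changed if d < 0)

# Exact two-sided p-value under H0: went_down ~ Binomial(m, 0.5)
pmf = [comb(m, k) / 2**m for k in range(m + 1)]
tail = min(sum(pmf[: went_down + 1]), sum(pmf[went_down:]))
p_value = min(1.0, 2 * tail)

print(m, went_down, round(p_value, 4))
```

Units with no change are dropped before counting, which is why m can be smaller than the number of rows.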
Two Unpaired Samples
The Nonparametric Test is Based on the Ranks of ALL of the Data
• Put both samples together to define overall ranks
• Three ways to obtain the same answer
  - The Wilcoxon rank-sum test
  - The Mann-Whitney U test
  - Test the average ranks against each other
• If the test statistic is larger than 1.960 in magnitude, the two samples are significantly different at the 5% level

$$\text{Test statistic} = \frac{\bar{R}_2 - \bar{R}_1}{(n_1 + n_2)\sqrt{\dfrac{n_1 + n_2 + 1}{12\, n_1 n_2}}}$$

where R̄1 and R̄2 are the average overall ranks in samples 1 and 2, and n1 and n2 are the two sample sizes.
Wilcoxon or Mann-Whitney Test
• Hypotheses
  H0: The two samples come from populations with the same distribution
  H1: The two samples come from populations with different distributions
• Assumptions
  - Each sample is a random sample from its population
  - More than 10 elementary units have been chosen from each population
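The "test the average ranks against each other" route can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the helper names `average_ranks` and `rank_sum_statistic` are invented here, tied values are given their average rank, no further tie correction is applied to the standard error, and the two samples are made-up data with 11 units each.

```python
from math import sqrt

def average_ranks(values):
    """Overall ranks 1..N, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_statistic(sample1, sample2):
    """Difference in average ranks divided by its standard error."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1]) / n1
    r2 = sum(ranks[n1:]) / n2
    se = (n1 + n2) * sqrt((n1 + n2 + 1) / (12 * n1 * n2))
    return (r2 - r1) / se

# Hypothetical, completely separated samples (illustration only)
sample1 = list(range(1, 12))
sample2 = list(range(12, 23))
z = rank_sum_statistic(sample1, sample2)
print(round(z, 3), abs(z) > 1.960)
```

Because only the magnitude is compared with 1.960, swapping the two samples changes the sign of the statistic but not the conclusion.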
Sign Test: Hypotheses and Assumption
Sign Test for the Median (for a Continuous Population Distribution)
H0: q = q0 and H1: q ≠ q0
• where q is the unknown population median and q0 is the (known) reference value being tested
Sign Test for the Median (in General)
H0: The probability of being above q0 equals the probability of being below q0 in the population
H1: These probabilities are not equal
• where q0 is the (known) reference value being tested
Assumption required:
• The data set is a random sample from the population
Z Test for Differences in Two Proportions
• What it is used for:
  To determine whether there is a difference between two population proportions and whether one is larger than the other
• Assumptions:
  - Independent samples
  - Population follows a binomial distribution
  - Sample size large enough: np ≥ 5 and n(1 - p) ≥ 5 for each population
Z Test Statistic

$$Z = \frac{(p_{s_1} - p_{s_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

Pooled Estimate of the Population Proportion

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

where
X1 = Number of Successes in Sample 1
X2 = Number of Successes in Sample 2
Stating the Hypotheses for the Z Test

                         Research Questions
Hypothesis   No Difference /    Prop 1 ≥ Prop 2 /   Prop 1 ≤ Prop 2 /
             Any Difference     Prop 1 < Prop 2     Prop 1 > Prop 2
H0           p1 - p2 = 0        p1 - p2 ≥ 0         p1 - p2 ≤ 0
H1           p1 - p2 ≠ 0        p1 - p2 < 0         p1 - p2 > 0
Z Test for Two Proportions: Example
As personnel director, you want to test the perception of fairness of two methods of performance evaluation. 63 of 78 employees rated Method 1 as fair; 49 of 82 rated Method 2 as fair. At the 0.01 level, is there a difference in perceptions?

np ≥ 5 and n(1 - p) ≥ 5 for both populations ✓

n1 = 78, n2 = 82
ps1 = 63/78 = .808
ps2 = 49/82 = .598
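Calculation of the Test Statistic

$$Z = \frac{(p_{s_1} - p_{s_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(.808 - .598) - 0}{\sqrt{(.70)(.30)\left(\dfrac{1}{78} + \dfrac{1}{82}\right)}} = 2.90$$

$$\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{63 + 49}{78 + 82} = .70$$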
Z Test for the Difference of Two Proportions: Solution
H0: p1 - p2 = 0
H1: p1 - p2 ≠ 0
α = 0.01
n1 = 78, n2 = 82
Critical Value(s): Z = ±2.58 (area .005 in each tail; reject H0 if Z < -2.58 or Z > 2.58)
Test Statistic: Z ≈ 2.90
Decision: Reject H0 at α = 0.01
Conclusion: There is evidence of a difference in proportions.
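The calculation in this example can be reproduced with a short script. This is a sketch using only the Python standard library; the function name `two_proportion_z` is an assumption for illustration, and the two-sided p-value is obtained from the normal tail via `math.erfc`.

```python
from math import sqrt, erfc

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0, with a two-sided p-value."""
    p_s1, p_s2 = x1 / n1, x2 / n2          # sample proportions
    pooled = (x1 + x2) / (n1 + n2)         # pooled estimate of the proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p_s1 - p_s2) / se
    p_value = erfc(abs(z) / sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Data from the example: 63 of 78 and 49 of 82 rated the method as fair
z, p_value = two_proportion_z(63, 78, 49, 82)
reject = abs(z) > 2.58                     # critical value at alpha = 0.01
print(round(z, 2), round(p_value, 4), reject)
```

Comparing |Z| with 2.58 and comparing the p-value with 0.01 are two views of the same decision rule.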
χ² Test: Basic Idea
• Compares observed frequencies to the frequencies expected if the null hypothesis is true
• The closer the observed frequencies are to the expected frequencies, the more consistent the data are with H0
• Closeness is measured by the squared difference relative to the expected frequency
• The sum of the relative squared differences is the test statistic
χ² Test for 2 Proportions
Contingency Table
Contingency Table for Comparing Fairness of Performance Evaluation Methods

              Evaluation Method
Perception      1       2     Total
Fair           63      49      112
Unfair         15      33       48
Total          78      82      160

Columns: the 2 populations (evaluation methods). Rows: levels of the variable (perception).
χ² Test for 2 Proportions
Expected Frequencies
• 112 of 160 total are "fair" (= 112/160)
• 78 used evaluation method 1
• Expect (78 × 112/160) = 54.6 to be "fair"
χ² Test Statistic

$$\chi^2 = \sum_{\text{All Cells}} \frac{(f_0 - f_e)^2}{f_e}$$

f0 = Observed Frequency in a cell
fe = Theoretical or Expected Frequency
Computation of the χ² Test Statistic

f0 (observed)   fe (expected)   (f0 - fe)   (f0 - fe)²   (f0 - fe)² / fe
63              54.6              8.4        70.56         1.292
49              57.4             -8.4        70.56         1.229
15              23.4             -8.4        70.56         3.015
33              24.6              8.4        70.56         2.868
                                                     Sum = 8.405

Critical value: 6.635 (reject H0 if χ² exceeds it)
χ² Test for Two Proportions: Finding the Critical Value
r = 2 (# rows in contingency table)
c = 2 (# columns)
α = .01
df = (r - 1)(c - 1) = 1

χ² Table (Portion)
                  Upper Tail Area
DF     .995     .95      .05      .025     .01
1      ...      0.004    3.841    5.024    6.635
2      0.010    0.103    5.991    7.378    9.210

Critical value for df = 1 and α = .01: 6.635
χ² Test for Two Proportions: Solution
H0: p1 - p2 = 0
H1: p1 - p2 ≠ 0
α = .01
Critical Value: 6.635 (reject H0 if χ² > 6.635)
Test Statistic: χ² = 8.405
Decision: Reject H0 at α = 0.01
Conclusion: There is evidence of a difference in proportions.
Note: The conclusion obtained using the χ² test is the same as that from the Z test. For a 2 × 2 table the two statistics are linked by Z² = χ² (2.90² ≈ 8.41).
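The χ² computation for the fairness data can be sketched as a small function that derives each expected frequency from the row and column totals. The name `chi_square` is an assumption for illustration, and the critical value 6.635 is taken from the table portion in the slides rather than computed.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column classifications are unrelated
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Rows: Fair, Unfair; columns: Method 1, Method 2
stat, df = chi_square([[63, 49], [15, 33]])
reject = stat > 6.635  # critical value for df = 1, alpha = .01
print(round(stat, 3), df, reject)
```

The same function applies unchanged to larger tables, since nothing in it is specific to the 2 × 2 case.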
χ² Test for c Proportions (Several Independent Samples)
• Extends the χ² test to the general case of c independent populations
• Tests for equality (=) of proportions only (two-tail tests, no one-tail tests). Why?
• One variable with several groups or levels
• Uses a contingency table
• Assumptions:
  - Independent random samples
  - Large sample size: all expected frequencies ≥ 1
χ² Test for c Proportions: Procedure
1. Set Hypotheses:
   H0: p1 = p2 = ... = pc
   H1: Not all pj are equal
2. Choose α and set up the contingency table
3. Compute the overall proportion:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$

4. Calculate the test statistic:

$$\chi^2 = \sum_{\text{All Cells}} \frac{(f_0 - f_e)^2}{f_e}$$

5. Determine the degrees of freedom
6. Compare the test statistic with the table value and make a decision
χ² Test for c Proportions: Example
The University is thinking of switching to a trimester academic calendar. A random sample of 100 undergraduates, 50 graduate students, and 50 faculty members was surveyed.

Opinion    Undergrad    Grad    Faculty
Favor         63         27       30
Oppose        37         23       20
Totals       100         50       50

Test at the .01 level of significance to determine whether there is evidence of a difference in attitude between the groups.
χ² Test for c Proportions: Example
1. Set Hypotheses:
   H0: p1 = p2 = p3
   H1: Not all pj are equal
2. Contingency Table:

Opinion    Undergrad    Grad    Faculty    Totals
Favor         63         27       30        120
Oppose        37         23       20         80
Totals       100         50       50        200

   All expected frequencies are large. ✓
3. Compute Overall Proportion:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n} = \frac{63 + 27 + 30}{100 + 50 + 50} = \frac{120}{200} = .60$$
χ² Test for c Proportions: Example
4. Compute Test Statistic:

f0     fe     (f0 - fe)   (f0 - fe)²   (f0 - fe)² / fe
63     60        3            9           .15
27     30       -3            9           .30
30     30        0            0           0
37     40       -3            9           .225
23     20        3            9           .45
20     20        0            0           0

Test Statistic: χ² = 1.125
χ² Test for c Proportions: Example Solution
H0: p1 = p2 = p3
H1: Not all pj are equal
α = .01
df = c - 1 = 3 - 1 = 2
Critical Value: 9.210 (reject H0 if χ² > 9.210)
Test Statistic: χ² = 1.125
Decision: Do not reject H0
Conclusion: There is no evidence of a difference in attitude among the groups.
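Steps 3 to 6 of the procedure can be traced for the trimester survey with a short script. This is a sketch that follows the pooled-proportion route described in the procedure; the variable names are assumptions for illustration, and the critical value 9.210 is read from the χ² table rather than computed.

```python
# Survey data: number in favor and sample size for each group
groups = ["Undergrad", "Grad", "Faculty"]
favor = [63, 27, 30]
sizes = [100, 50, 50]

# Step 3: overall (pooled) proportion in favor
p_bar = sum(favor) / sum(sizes)

# Step 4: sum (f0 - fe)^2 / fe over the Favor and Oppose cells of every group
chi2 = 0.0
for x, n in zip(favor, sizes):
    cells = ((x, n * p_bar), (n - x, n * (1 - p_bar)))  # (observed, expected)
    for observed, expected in cells:
        chi2 += (observed - expected) ** 2 / expected

# Step 5: degrees of freedom for c groups
df = len(groups) - 1

# Step 6: compare with the table value for df = 2, alpha = .01
critical = 9.210
reject = chi2 > critical
print(round(p_bar, 2), round(chi2, 3), df, reject)
```

The expected counts under H0 are simply each group size multiplied by the pooled proportion (for Favor) or its complement (for Oppose).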
χ² Test of Independence
• Shows whether a relationship exists between 2 factors of interest
  - One sample drawn
  - Each factor has 2 or more levels of responses
• Does not show the nature of the relationship
• Does not show causality
• Similar to testing p1 = p2 = ... = pc
• Used widely in marketing
• Uses a contingency table
χ² Test of Independence: Procedure
1. Set Hypotheses:
   H0: The 2 categorical variables are independent
   H1: The 2 categorical variables are related
2. Choose α and set up the contingency table
3. Compute the theoretical frequencies, fe
4. Calculate the test statistic:

$$\chi^2 = \sum_{\text{All Cells}} \frac{(f_0 - f_e)^2}{f_e}$$

5. Determine the degrees of freedom
6. Compare the test statistic with the table value and make a decision
χ² Test of Independence: Example
A survey was conducted to determine whether there is a relationship between architectural style (Split-Level or Ranch) and geographical location (Urban or Rural). Given the survey data, test at the α = .01 level to determine whether there is a relationship between location and architectural style.

                House Location
House Style     Urban    Rural    Total
Split-Level      63       49       112
Ranch            15       33        48
Total            78       82       160
χ² Test of Independence: Example
1. Set Hypotheses:
   H0: The 2 categorical variables (Architectural Style and Location) are independent
   H1: The 2 categorical variables are related
2. Contingency Table:
   [Table: rows are the levels of variable 1, columns are the levels of variable 2]
χ² Test of Independence: Expected Frequencies
3. Computing Expected Frequencies
• Statistical independence: P(A and B) = P(A)·P(B)
• Compute marginal (row & column) probabilities and multiply for the joint probability
• Expected frequency is sample size times joint probability

                      House Location
                  Urban           Rural
House Style     Obs.   Exp.     Obs.   Exp.    Total
Split-Level      63    54.6      49    57.4     112
Ranch            15    23.4      33    24.6      48
Total            78    78        82    82       160

For example, 78 × 112 / 160 = 54.6 and 82 × 112 / 160 = 57.4.
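The three bullets on expected frequencies can be traced step by step for the house-style survey. This is an illustrative sketch; the dictionary layout and the names `row_prob`, `col_prob`, and `expected` are assumptions made here, not notation from the slides.

```python
# Observed counts: rows = house style, columns = house location
observed = {
    "Split-Level": {"Urban": 63, "Rural": 49},
    "Ranch": {"Urban": 15, "Rural": 33},
}
locations = ["Urban", "Rural"]

# Sample size
n = sum(sum(row.values()) for row in observed.values())

# Marginal (row and column) probabilities
row_prob = {style: sum(row.values()) / n for style, row in observed.items()}
col_prob = {
    loc: sum(row[loc] for row in observed.values()) / n for loc in locations
}

# Under independence the joint probability is the product of the marginals,
# and the expected frequency is the sample size times that joint probability
expected = {
    style: {loc: n * row_prob[style] * col_prob[loc] for loc in locations}
    for style in observed
}

for style in observed:
    print(style, {loc: round(expected[style][loc], 1) for loc in locations})
```

Multiplying out n × (row total / n) × (column total / n) gives the familiar shortcut of row total × column total / n.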
χ² Test of Independence: Test Statistic
4. Calculate Test Statistic:

$$\chi^2 = \sum_{\text{All Cells}} \frac{(f_0 - f_e)^2}{f_e}$$

f0     fe      (f0 - fe)   (f0 - fe)²   (f0 - fe)² / fe
63     54.6       8.4        70.56         1.292
49     57.4      -8.4        70.56         1.229
15     23.4      -8.4        70.56         3.015
33     24.6       8.4        70.56         2.868

χ² Test Statistic = 8.404
χ² Test of Independence: Example Solution
H0: The 2 categorical variables (Architectural Style and Location) are independent
H1: The 2 categorical variables are related
α = .01
df = (r - 1)(c - 1) = 1
Critical Value: 6.635 (reject H0 if χ² > 6.635)
Decision: Reject H0 at α = .01
Conclusion: There is evidence that the choice of architectural design and location are related.
Chapter Summary
• Performed Z Test for Differences in Two Proportions (Independent Samples)
• Discussed χ² Test for Differences in Two Proportions (Independent Samples)
• Addressed χ² Test for Differences in c Proportions (Independent Samples)
• Described χ² Test of Independence