Ch. 5 Hypothesis Testing
The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of $\theta$ in $\Theta$ we consider the question of whether $\theta \in \Theta_0$ or $\theta \in \Theta_1 = \Theta - \Theta_0$ is most supported by the observed data. The discussion which follows will proceed in a similar way, though less systematically and formally, to the discussion of estimation. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate too many concepts too quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing, if any proper understanding of the topic is to be attempted, and thus unavoidable.
1 Testing: Definition and Concepts
1.1 The Decision Rule
Let X be a random variable defined on the probability space $(S, \mathcal{F}, P(\cdot))$ and consider the statistical model associated with X:

(a) $\Phi = \{f(x; \theta), \ \theta \in \Theta\}$;

(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x; \theta)$.

The problem of hypothesis testing is one of deciding whether or not some conjecture about $\theta$, of the form "$\theta$ belongs to some subset $\Theta_0$ of $\Theta$", is supported by the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. We call such a conjecture the null hypothesis and denote it by
$$H_0: \theta \in \Theta_0,$$
where if the sample realization $\mathbf{x} \in C_0$ we accept $H_0$, and if $\mathbf{x} \in C_1$ we reject it. Since the observation space $\mathcal{X} \subset \mathbb{R}^n$, but the acceptance region $C_0$ and the rejection region $C_1$ are defined via a scalar in $\mathbb{R}^1$, we need a mapping from $\mathbb{R}^n$ to $\mathbb{R}^1$. The mapping which enables us to define $C_0$ and $C_1$ we call a test statistic, $\tau(\mathbf{x}): \mathcal{X} \to \mathbb{R}^1$.
Example:
Let X be the random variable representing the marks achieved by students in an econometric theory paper and let the statistical model be:

(a) $\Phi = \left\{ f(x; \theta) = \frac{1}{8\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \theta}{8} \right)^2 \right] \right\}, \ \theta \in [0, 100]$;

(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$, $n = 40$, is a random sample from $\Phi$.

The hypothesis to be tested is
$$H_0: \theta = 60 \quad (\text{i.e. } X \sim N(60, 64)), \qquad \Theta_0 = \{60\}$$
against
$$H_1: \theta \neq 60 \quad (\text{i.e. } X \sim N(\theta, 64), \ \theta \neq 60), \qquad \Theta_1 = [0, 100] - \{60\}.$$
Common sense suggests that if some 'good' estimator of $\theta$, say $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$, takes a value 'around' 60 for the sample realization $\mathbf{x}$, then we will be inclined to accept $H_0$. Let us formalise this argument:

The acceptance region takes the form $60 - \varepsilon \leq \bar{X}_n \leq 60 + \varepsilon$, $\varepsilon > 0$, or
$$C_0 = \{\mathbf{x}: |\bar{X}_n - 60| \leq \varepsilon\},$$
and
$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| > \varepsilon\} \quad \text{is the rejection region}.$$
Formally, if $\mathbf{x} \in C_1$ (reject $H_0$) when $\theta \in \Theta_0$ ($H_0$ is true), we commit a type I error; if $\mathbf{x} \in C_0$ (accept $H_0$) when $\theta \in \Theta_1$ ($H_0$ is false), we commit a type II error. The hypothesis to be tested is formally stated as follows:
$$H_0: \theta \in \Theta_0, \quad \Theta_0 \subset \Theta.$$
Against the null hypothesis $H_0$ we postulate the alternative $H_1$, which takes the form:
$$H_1: \theta \in \Theta_1 = \Theta - \Theta_0.$$

It is important to note at the outset that $H_0$ and $H_1$ are in effect hypotheses about the distribution of the sample, $f(\mathbf{x}; \theta)$, i.e.
$$H_0: f(\mathbf{x}; \theta), \ \theta \in \Theta_0; \qquad H_1: f(\mathbf{x}; \theta), \ \theta \in \Theta_1.$$
In testing a null hypothesis $H_0$ against an alternative $H_1$ the issue is to decide whether the sample realization $\mathbf{x}$ 'supports' $H_0$ or $H_1$. In the former case we say that $H_0$ is accepted, in the latter that $H_0$ is rejected. In order to be able to make such a decision we need to formulate a mapping which relates $\Theta_0$ to some subset of the observation space $\mathcal{X}$, say $C_0$, which we call an acceptance region, and its complement $C_1$ ($C_0 \cup C_1 = \mathcal{X}$, $C_0 \cap C_1 = \emptyset$), which we call the rejection region.
1.2 Type I and Type II Errors
The next question is "how do we choose $\varepsilon$?" If $\varepsilon$ is too small we run the risk of rejecting $H_0$ when it is true; we call this type I error. On the other hand, if $\varepsilon$ is too large we run the risk of accepting $H_0$ when it is false; we call this type II error. That is, if we were to choose $\varepsilon$ too small we would run a higher risk of committing a type I error than of committing a type II error, and vice versa. In other words, there is a trade-off between the probability of type I error, i.e.
$$\Pr(\mathbf{x} \in C_1; \ \theta \in \Theta_0) = \alpha,$$
and the probability of type II error, i.e.
$$\Pr(\mathbf{x} \in C_0; \ \theta \in \Theta_1) = \beta.$$
Ideally we would like $\alpha = \beta = 0$ for all $\theta \in \Theta$, which is not possible for a fixed $n$. Moreover, we cannot control both simultaneously because of the trade-off between them. The strategy adopted in hypothesis testing is to choose a small value of $\alpha$ and, for that given $\alpha$, to minimize $\beta$. Formally, this amounts to choosing $C_1$ such that
$$\Pr(\mathbf{x} \in C_1; \ \theta \in \Theta_0) = \alpha(\theta) \quad \text{for } \theta \in \Theta_0,$$
and
$$\Pr(\mathbf{x} \in C_0; \ \theta \in \Theta_1) = \beta(\theta) \ \text{ is minimized for } \theta \in \Theta_1$$
by choosing $C_1$ (or $C_0$) appropriately.
In the case of the above example, if we were to choose $\alpha$, say $\alpha = 0.05$, then
$$\Pr(|\bar{X}_n - 60| > \varepsilon; \ \theta = 60) = 0.05.$$
"How do we determine $\varepsilon$, then?" The only random variable involved in the statement is $\bar{X}_n$, and hence the probability has to be assigned by its sampling distribution. For the above probabilistic statement to have any operational meaning which would enable us to determine $\varepsilon$, the distribution of $\bar{X}_n$ must be known. In the present case we know that
$$\bar{X}_n \sim N\left(\theta, \frac{\sigma^2}{n}\right), \quad \text{where } \frac{\sigma^2}{n} = \frac{64}{40} = 1.6,$$
which implies that for $\theta = 60$ (i.e. when $H_0$ is true) we can 'construct' a test statistic $\tau(\mathbf{x})$ from the sample $\mathbf{x}$ such that
$$\tau(\mathbf{x}) = \frac{\bar{X}_n - \theta}{\sqrt{1.6}} = \frac{\bar{X}_n - 60}{\sqrt{1.6}} = \frac{\bar{X}_n - 60}{1.265} \sim N(0, 1),$$
and thus the distribution of $\tau(\cdot)$ is known completely (no unknown parameters). When this is the case, this distribution can be used in conjunction with the above probabilistic statement to determine $\varepsilon$. In order to do this we need to relate $|\bar{X}_n - 60|$ to $\tau(\mathbf{x})$ (a statistic) whose distribution is known. The obvious way is to standardize the former. This suggests changing the above probabilistic statement to the equivalent statement
$$\Pr\left( \frac{|\bar{X}_n - 60|}{1.265} \geq c_\alpha; \ \theta = 60 \right) = 0.05, \quad \text{where } c_\alpha = \frac{\varepsilon}{1.265}.$$
The value of $c_\alpha$ given from the $N(0, 1)$ table is $c_\alpha = 1.96$. This in turn implies that the rejection region for the test is
$$C_1 = \left\{ \mathbf{x}: \frac{|\bar{X}_n - 60|}{1.265} \geq 1.96 \right\} = \{\mathbf{x}: |\tau(\mathbf{x})| \geq 1.96\}$$
or
$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| \geq 2.48\}.$$
That is, for sample realizations $\mathbf{x}$ which give rise to a $\bar{X}_n$ falling outside the interval (57.52, 62.48) we reject $H_0$.
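The determination of $\varepsilon$ above can be checked numerically. The following is a minimal sketch, not part of the original notes; the 1.96 critical value is taken from the $N(0,1)$ table as in the text:

```python
import math

# The construction above: sigma^2 = 64, n = 40, alpha = 0.05.
sigma2, n = 64.0, 40
se = math.sqrt(sigma2 / n)          # standard error of X-bar: sqrt(1.6) ~ 1.265
c_alpha = 1.96                      # N(0,1) critical value for alpha = 0.05
eps = c_alpha * se                  # epsilon ~ 2.48
lower, upper = 60 - eps, 60 + eps   # acceptance interval ~ (57.52, 62.48)
print(round(se, 3), round(eps, 2), round(lower, 2), round(upper, 2))
```

The interval (57.52, 62.48) quoted in the text is exactly $60 \pm 1.96 \sqrt{64/40}$.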
Let us summarize the argument so far. We set out to construct a test for $H_0: \theta = 60$ against $H_1: \theta \neq 60$, and intuition suggested the rejection region $\{|\bar{X}_n - 60| \geq \varepsilon\}$. In order to determine $\varepsilon$ we have to

(a) choose an $\alpha$; and then

(b) define the rejection region in terms of some statistic $\tau(\mathbf{x})$.

The latter is necessary to enable us to determine $\varepsilon$ via some known distribution. This is the distribution of the test statistic $\tau(\mathbf{x})$ under $H_0$ (i.e. when $H_0$ is true).
The next question which naturally arises is: "What do we need the probability of type II error for?" The answer is that we need it to decide whether the test defined in terms of $C_1$ (and of course $C_0$) is a 'good' or a 'bad' test. As we mentioned at the outset, the way we decided to solve the problem of the trade-off between $\alpha$ and $\beta$ was to choose a given small value of $\alpha$ and define $C_1$ so as to minimize $\beta$. At this stage we do not know whether the test defined above is a 'good' test or not. Let us set up the apparatus which enables us to consider the question of optimality.
2 Optimal Tests
First we note that minimization of $\Pr(\mathbf{x} \in C_0)$ for all $\theta \in \Theta_1$ is equivalent to maximization of $\Pr(\mathbf{x} \in C_1)$ for all $\theta \in \Theta_1$.

Definition 1:
The probability of rejecting $H_0$ when it is false at some point $\theta_1 \in \Theta_1$, i.e. $\Pr(\mathbf{x} \in C_1; \ \theta = \theta_1)$, is called the power of the test at $\theta = \theta_1$.

Note that
$$\Pr(\mathbf{x} \in C_1; \ \theta = \theta_1) = 1 - \Pr(\mathbf{x} \in C_0; \ \theta = \theta_1) = 1 - \beta(\theta_1).$$
In the above example we can define the power of the test at some $\theta_1 \in \Theta_1$, say $\theta = 54$, to be $\Pr[\,|\bar{X}_n - 60|/1.265 \geq 1.96; \ \theta = 54\,]$.
Under the alternative hypothesis that $\theta = 54$ it is true that $\frac{\bar{X}_n - 54}{1.265} \sim N(0, 1)$. We would like to know the probability that the statistic constructed under the null hypothesis, $\frac{\bar{X}_n - 60}{1.265}$, falls in the rejection region; that is, the power of the test at $\theta = 54$ is
$$\Pr\left( \frac{|\bar{X}_n - 60|}{1.265} \geq 1.96; \ \theta = 54 \right) = \Pr\left( \frac{\bar{X}_n - 54}{1.265} \geq 1.96 - \frac{54 - 60}{1.265} \right) + \Pr\left( \frac{\bar{X}_n - 54}{1.265} \leq -1.96 - \frac{54 - 60}{1.265} \right) \approx 0.9973.$$
Hence, the power of the test defined by $C_1$ above is indeed very high for $\theta = 54$. From this we see that in order to calculate the power of a test we need to know the distribution of the test statistic $\tau(\mathbf{x})$ under the alternative hypothesis. In this case it is the distribution of $\frac{\bar{X}_n - 54}{1.265}$.¹
¹In the example above, the test statistic $\tau(\mathbf{x})$ has a standard normal distribution under both the null and the alternative hypotheses. However, it is quite often the case that a test statistic has a different distribution under the null and the alternative hypotheses; for example, the unit root test. See Chapter 21.
Following the same procedure, the power of the test defined by $C_1$ is as follows for various values of $\theta$:

$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 56) = 0.8849$
$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 58) = 0.3520$
$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 60) = 0.0500$
$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 62) = 0.3520$
$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 64) = 0.8849$
$\Pr(|\tau(\mathbf{x})| \geq 1.96; \ \theta = 66) = 0.9973$

As we can see, the power of the test increases as we move further away from $\theta = 60$ ($H_0$), and the power at $\theta = 60$ equals the probability of type I error. This prompts us to define the power function as follows.
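The power values above can be reproduced from the standard normal CDF, which the following sketch (not part of the original notes) writes via the error function; small discrepancies in the fourth decimal come from table rounding:

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def power(theta, mu0=60.0, se=math.sqrt(64.0 / 40), c=1.96):
    """Pr(|X-bar - mu0| / se >= c) when X-bar ~ N(theta, se^2)."""
    d = (mu0 - theta) / se
    return (1.0 - std_normal_cdf(c + d)) + std_normal_cdf(-c + d)

for theta in (54, 56, 58, 60, 62, 64, 66):
    print(theta, round(power(theta), 4))
```

The function is symmetric about $\theta = 60$, which is why the tabulated values for 58 and 62 (and for 56 and 64) coincide.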
Definition 2:
$\mathcal{P}(\theta) = \Pr(\mathbf{x} \in C_1)$, $\theta \in \Theta$, is called the power function of the test defined by the rejection region $C_1$.

Definition 3:
$\alpha = \max_{\theta \in \Theta_0} \mathcal{P}(\theta)$ is defined to be the size (or the significance level) of the test. In the case where $H_0$ is simple, say $\theta = \theta_0$, then $\alpha = \mathcal{P}(\theta_0)$.
Definition 4:
A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$, as defined by some rejection region $C_1$, is said to be a uniformly most powerful (UMP) test of size $\alpha$ if

(a) $\max_{\theta \in \Theta_0} \mathcal{P}(\theta) = \alpha$;

(b) $\mathcal{P}(\theta) \geq \mathcal{P}^*(\theta)$ for all $\theta \in \Theta_1$,

where $\mathcal{P}^*(\theta)$ is the power function of any other test of size $\alpha$.

As will be seen in the sequel, no UMP test exists in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing additional criteria, and to consider the question of UMP tests within the subclass.
Definition 5:
A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ is said to be unbiased if
$$\max_{\theta \in \Theta_0} \mathcal{P}(\theta) \leq \max_{\theta \in \Theta_1} \mathcal{P}(\theta).$$
In other words, a test is unbiased if it rejects $H_0$ more often when it is false than when it is true.

Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:

(a) a test statistic $\tau(\mathbf{x})$;
(b) the size of the test $\alpha$;
(c) the distribution of $\tau(\mathbf{x})$ under $H_0$ and $H_1$;
(d) the rejection region $C_1$ (or, equivalently, $C_0$).
The most important component in defining a test is the test statistic, for which we need to know the distribution under both $H_0$ and $H_1$. Hence, constructing an optimal test is largely a matter of being able to find a statistic $\tau(\mathbf{x})$ with the following properties:

(a) $\tau(\mathbf{x})$ depends on $\mathbf{x}$ via a 'good' estimator of $\theta$; and
(b) the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$ does not depend on any unknown parameters.

We call such a statistic a pivot.
Example:
Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, 400)$. In particular, $y_1 = 62$, $y_2 = 52$, $y_3 = 68$, $y_4 = 23$, $y_5 = 34$, $y_6 = 45$, $y_7 = 27$, $y_8 = 42$, $y_9 = 83$, $y_{10} = 56$ and $y_{11} = 40$. Test the null hypothesis $H_0: \mu = 55$ against $H_1: \mu \neq 55$.

Since $\sigma^2$ is known, the sample mean is distributed as
$$\bar{Y} \sim N(\mu, \sigma^2/n) \equiv N(\mu, 400/11);$$
therefore under $H_0: \mu = 55$,
$$\bar{Y} \sim N(55, 36.36)$$
or
$$\frac{\bar{Y} - 55}{\sqrt{36.36}} \sim N(0, 1).$$
We accept $H_0$ when the test statistic $\tau(\mathbf{y}) = (\bar{Y} - 55)/\sqrt{36.36}$ lies in the interval $C_0 = [-1.96, 1.96]$, for a test of size $\alpha = 0.05$.
We now have
$$\sum_{i=1}^{11} y_i = 532 \quad \text{and} \quad \bar{y} = \frac{532}{11} \approx 48.4.$$
Then
$$\frac{48.4 - 55}{\sqrt{36.36}} \approx -1.09,$$
which is in the acceptance region. Therefore we accept the null hypothesis $H_0: \mu = 55$.
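The calculation can be checked directly. A sketch (not part of the original notes); note that with the unrounded mean $532/11 \approx 48.36$ the statistic is about $-1.10$ rather than the $-1.09$ obtained from the rounded mean, and the conclusion is the same either way:

```python
import math

y = [62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40]
n, mu0, sigma2 = len(y), 55.0, 400.0

ybar = sum(y) / n                          # 532/11 ~ 48.36
z = (ybar - mu0) / math.sqrt(sigma2 / n)   # ~ -1.10
reject = abs(z) >= 1.96                    # size-0.05 two-sided test
print(round(ybar, 2), round(z, 2), reject)
```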
Example:
Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, \sigma^2)$ with $\sigma^2$ unknown. In particular, $y_1 = 62$, $y_2 = 52$, $y_3 = 68$, $y_4 = 23$, $y_5 = 34$, $y_6 = 45$, $y_7 = 27$, $y_8 = 42$, $y_9 = 83$, $y_{10} = 56$ and $y_{11} = 40$. Test the null hypothesis $H_0: \mu = 55$ against $H_1: \mu \neq 55$.

Since $\sigma^2$ is unknown, the sample mean is distributed as
$$\bar{Y} \sim N(\mu, \sigma^2/n);$$
therefore under $H_0: \mu = 55$,
$$\bar{Y} \sim N(55, \sigma^2/n)$$
or
$$\frac{\bar{Y} - 55}{\sqrt{\sigma^2/n}} \sim N(0, 1);$$
however, this is not a pivotal test statistic, since it involves the unknown parameter $\sigma^2$.
From the fact that $\sum_{i=1}^{n}(Y_i - \bar{Y})^2/\sigma^2 \sim \chi^2_{n-1}$, or $s^2(n-1)/\sigma^2 \sim \chi^2_{n-1}$, where $s^2 = \sum_{i=1}^{n}(Y_i - \bar{Y})^2/(n-1)$ is an unbiased estimator of $\sigma^2$,² we have
$$\frac{(\bar{Y} - 55)/\sqrt{\sigma^2/n}}{\sqrt{\left[ s^2(n-1)/\sigma^2 \right]/(n-1)}} = \frac{\bar{Y} - 55}{\sqrt{s^2/n}} \sim t_{n-1}.$$
We accept $H_0$ when the test statistic $\tau(\mathbf{y}) = (\bar{Y} - 55)/\sqrt{s^2/n}$ lies in the interval $C_0 = [-2.23, 2.23]$, the 0.025 critical values of $t_{10}$.
We now have
$$\sum_{i=1}^{11} y_i = 532, \quad \sum_{i=1}^{11} y_i^2 = 29000, \quad \text{and} \quad s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{10} = \frac{29000 - 11(48.4)^2}{10} \approx 323.19$$
(using the rounded mean 48.4; the exact value is 327.05). Then
$$\frac{48.4 - 55}{\sqrt{323.19/11}} \approx -1.22,$$
which is also in the acceptance region $C_0$. Therefore we accept the null hypothesis $H_0: \mu = 55$.
²See p. 22 of Chapter 3.
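The same calculation without intermediate rounding (a sketch, not part of the original notes; `statistics.variance` uses the unbiased $n-1$ denominator, matching $s^2$ above):

```python
import statistics as st

y = [62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40]
n, mu0 = len(y), 55.0

ybar = st.mean(y)                    # 532/11 ~ 48.36
s2 = st.variance(y)                  # unbiased sample variance, ~ 327.05
t = (ybar - mu0) / (s2 / n) ** 0.5   # ~ -1.22
reject = abs(t) >= 2.23              # t_10 critical value for alpha = 0.05
print(round(s2, 2), round(t, 2), reject)
```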
3 Asymptotic Test Procedures
As discussed in the last section, the main problem in hypothesis testing is to construct a test statistic $\tau(\mathbf{x})$ whose distribution we know under both the null hypothesis $H_0$ and the alternative $H_1$, and which does not depend on the unknown parameters $\theta$. The first part of the problem, that of constructing $\tau(\mathbf{x})$, can be handled relatively easily using various methods (e.g. the Neyman-Pearson likelihood ratio) when certain conditions are satisfied. The second part of the problem, that of determining the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$, is much more difficult to solve, and often we have to resort to asymptotic theory. This amounts to deriving the asymptotic distribution of $\tau(\mathbf{x})$ and using that to determine the rejection region $C_1$ (or $C_0$) and the associated probabilities.
3.1 Asymptotic properties
For a given sample size $n$, if the distribution of $\tau_n(\mathbf{x})$ is not known (otherwise we would use that), we do not know how accurate an approximation the asymptotic distribution of $\tau_n(\mathbf{x})$ provides to its finite sample distribution. This suggests that when asymptotic results are used we should be aware of their limitations and the inaccuracies they can lead to.

Consider the test defined by the rejection region
$$C_1^n = \{\mathbf{x}: |\tau_n(\mathbf{x})| \geq c_n\},$$
whose power function is
$$\mathcal{P}_n(\theta) = \Pr(\mathbf{x} \in C_1^n), \ \theta \in \Theta.$$
Since the distribution of $\tau_n(\mathbf{x})$ is not known, we cannot determine $c_n$ or $\mathcal{P}_n$. If the asymptotic distribution of $\tau_n(\mathbf{x})$ is available, however, we can use that instead to define $c_n$ for some fixed $\alpha$, and the asymptotic power function
$$\mathcal{P}_\infty(\theta) = \Pr(\mathbf{x} \in C_1^\infty), \ \theta \in \Theta.$$
In this sense we can think of $\{\tau_n(\mathbf{x}), n \geq 1\}$ as a sequence of test statistics defining a sequence of rejection regions $\{C_1^n, n \geq 1\}$ with power functions $\{\mathcal{P}_n(\theta), n \geq 1, \theta \in \Theta\}$.
Definition 6:
The sequence of tests for $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ defined by $\{C_1^n, n \geq 1\}$ is said to be consistent of size $\alpha$ if
$$\max_{\theta \in \Theta_0} \mathcal{P}_\infty(\theta) = \alpha \quad \text{and} \quad \mathcal{P}_\infty(\theta) = 1, \ \theta \in \Theta_1.$$
Definition 7:
A sequence of tests as defined above is said to be asymptotically unbiased of size $\alpha$ if
$$\max_{\theta \in \Theta_0} \mathcal{P}_\infty(\theta) = \alpha \quad \text{and} \quad \alpha < \mathcal{P}_\infty(\theta) \leq 1, \ \theta \in \Theta_1.$$
3.2 Three asymptotically equivalent test procedures
In this section three general test procedures, the "Holy Trinity", which give rise to asymptotically optimal tests will be considered: the likelihood ratio, Wald and Lagrange multiplier tests. All three test procedures can be interpreted as utilizing the information incorporated in the log-likelihood function in different but asymptotically equivalent ways.

For expositional purposes the test procedures will be considered in the context of the simplest statistical model, where

(a) $\Phi = \{f(x; \theta), \ \theta \in \Theta\}$ is the probability model; and
(b) $\mathbf{x} \equiv (X_1, X_2, \ldots, X_n)'$ is a random sample,

and we consider the simple null hypothesis $H_0: \theta = \theta_0$, $\theta \in \Theta \subset \mathbb{R}^m$, against $H_1: \theta \neq \theta_0$.
3.2.1 The Likelihood Ratio Test (test statistic is calculated under both the null and the alternative hypotheses)

The likelihood ratio test statistic takes the form
$$\lambda(\mathbf{x}) = \frac{L(\theta_0; \mathbf{x})}{\max_{\theta \in \Theta} L(\theta; \mathbf{x})} = \frac{L(\theta_0; \mathbf{x})}{L(\hat{\theta}; \mathbf{x})},$$
where $\hat{\theta}$ is the MLE of $\theta$.
Under certain regularity conditions, which include RC1-RC3 (see Chapter 3), $\log L(\theta; \mathbf{x})$ can be expanded in a Taylor series at $\theta = \hat{\theta}$:
$$\log L(\theta; \mathbf{x}) \simeq \log L(\hat{\theta}; \mathbf{x}) + \left. \frac{\partial \log L}{\partial \theta'} \right|_{\hat{\theta}} (\theta - \hat{\theta}) + \frac{1}{2} (\theta - \hat{\theta})' \left. \frac{\partial^2 \log L}{\partial \theta \partial \theta'} \right|_{\hat{\theta}} (\theta - \hat{\theta})$$
(see Alpha Chiang (1984), p. 261, Lagrange Form of the Remainder). Since
$$\left. \frac{\partial \log L}{\partial \theta'} \right|_{\hat{\theta}} = 0,$$
being the first-order condition of the MLE, and
$$- \left. \frac{\partial^2 \log L}{\partial \theta \partial \theta'} \right|_{\hat{\theta}} \xrightarrow{p} \mathbf{I}_n(\theta)$$
(see, for example, Greene, pp. 131-132), the above expansion evaluated at $\theta = \theta_0$ can be simplified to
$$\log L(\theta_0; \mathbf{x}) \simeq \log L(\hat{\theta}; \mathbf{x}) - \frac{1}{2} (\hat{\theta} - \theta_0)' \mathbf{I}_n(\theta) (\hat{\theta} - \theta_0).$$
This implies that under the null hypothesis $H_0: \theta = \theta_0$,
$$-2 \log \lambda(\mathbf{x}) = 2[\log L(\hat{\theta}; \mathbf{x}) - \log L(\theta_0; \mathbf{x})] = (\hat{\theta} - \theta_0)' \mathbf{I}_n(\theta) (\hat{\theta} - \theta_0).$$
From the asymptotic properties of the MLE it is known that under certain regularity conditions
$$(\hat{\theta} - \theta_0) \sim N(0, \mathbf{I}_n^{-1}(\theta)).$$
Using this we can deduce that
$$LR = -2 \log \lambda(\mathbf{x}) \simeq (\hat{\theta} - \theta_0)' \mathbf{I}_n(\theta) (\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$
being a quadratic form in asymptotically normal random variables.
3.2.2 The Wald Test (test statistic is computed under $H_1$)

Wald (1943), using the above approximation of $-2 \log \lambda(\mathbf{x})$, proposed an alternative test statistic by replacing $\mathbf{I}_n(\theta)$ with $\mathbf{I}_n(\hat{\theta})$:
$$W = (\hat{\theta} - \theta_0)' \mathbf{I}_n(\hat{\theta}) (\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$
given that $\mathbf{I}_n(\hat{\theta}) \xrightarrow{p} \mathbf{I}_n(\theta)$.³
3.2.3 The Lagrange Multiplier Test (test statistic is computed under $H_0$)

Rao (1947) proposed the LM test. Expanding the score function of $\log L(\theta; \mathbf{x})$, i.e. $\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta}$, around $\hat{\theta}$ we have:
$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \simeq \frac{\partial \log L(\hat{\theta}; \mathbf{x})}{\partial \theta} + \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} (\theta - \hat{\theta}).$$
As in the LR test, $\frac{\partial \log L(\hat{\theta}; \mathbf{x})}{\partial \theta} = 0$, and the above equation reduces to
$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \simeq \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} (\theta - \hat{\theta}).$$

³$\mathbf{I}_n(\hat{\theta})$ can be any one of the three estimators of the asymptotic variance of the MLE. See section 3.3.6 of Chapter 3.
Now consider the test statistic under the null hypothesis $H_0: \theta = \theta_0$:
$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right),$$
which is equivalent to
$$LM = (\theta_0 - \hat{\theta})' \, \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} \, \mathbf{I}_n^{-1}(\theta_0) \, \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} \, (\theta_0 - \hat{\theta}).$$
Given that $-\frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta \partial \theta'} \xrightarrow{p} \mathbf{I}_n(\theta)$ and that, under $H_0$, $\mathbf{I}_n(\theta_0) \xrightarrow{p} \mathbf{I}_n(\theta)$, we have
$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right) = (\theta_0 - \hat{\theta})' \mathbf{I}_n(\theta) (\theta_0 - \hat{\theta}) \sim \chi^2(m),$$
as in the proof of the LR test.
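To make the three procedures concrete, here is a hypothetical illustration (not from the notes) for a Bernoulli($p$) sample with $H_0: p = 0.5$, where $\hat{p} = \bar{x}$, the score is $(k - np)/[p(1-p)]$ and $\mathbf{I}_n(p) = n/[p(1-p)]$; the data ($n = 100$, 56 successes) are invented for the sketch. The three statistics come out numerically close, as the asymptotic equivalence suggests:

```python
import math

# Hypothetical data: n = 100 Bernoulli trials, k = 56 successes; H0: p = 0.5.
n, k, p0 = 100, 56, 0.5
phat = k / n

def loglik(p):
    """Bernoulli log-likelihood log L(p; x)."""
    return k * math.log(p) + (n - k) * math.log(1 - p)

def info(p):
    """Fisher information I_n(p) = n / [p(1-p)] for a Bernoulli sample."""
    return n / (p * (1 - p))

def score(p):
    """Score dlogL/dp = (k - n p) / [p(1-p)]."""
    return (k - n * p) / (p * (1 - p))

LR = 2 * (loglik(phat) - loglik(p0))   # likelihood ratio: both hypotheses
W = (phat - p0) ** 2 * info(phat)      # Wald: everything at the MLE
LM = score(p0) ** 2 / info(p0)         # LM: everything at the null value
print(round(LR, 3), round(W, 3), round(LM, 3))
```

All three are compared with the $\chi^2(1)$ critical value 3.84 at $\alpha = 0.05$; here none of them rejects $H_0$.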
Example:
Reproduce the results of the example on p. 157 of Greene (4th ed.), or Table 17.1 on p. 490 of Greene (5th ed.).
Example:
Of particular interest in practice is the case where $\theta \equiv (\theta_1', \theta_2')'$ and $H_0: \theta_1 = \theta_1^0$ against $H_1: \theta_1 \neq \theta_1^0$, where $\theta_1$ is $r \times 1$ and $\theta_2$, of dimension $(m - r) \times 1$, is left unrestricted. In this case the three test statistics take the form
$$LR = -2(\ln L(\tilde{\theta}; \mathbf{x}) - \ln L(\hat{\theta}; \mathbf{x})),$$
$$W = (\hat{\theta}_1 - \theta_1^0)' [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0),$$
$$LM = q(\tilde{\theta})' [\mathbf{I}_{11}(\tilde{\theta}) - \mathbf{I}_{12}(\tilde{\theta}) \mathbf{I}_{22}^{-1}(\tilde{\theta}) \mathbf{I}_{21}(\tilde{\theta})]^{-1} q(\tilde{\theta}),$$
where
$$\hat{\theta} \equiv (\hat{\theta}_1', \hat{\theta}_2')', \qquad \tilde{\theta} \equiv (\theta_1^{0\prime}, \tilde{\theta}_2')',$$
$\tilde{\theta}_2$ is the solution to
$$\left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_2} \right|_{\theta_1 = \theta_1^0} = 0, \quad \text{and} \quad q(\tilde{\theta}) = \left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_1} \right|_{\theta_1 = \theta_1^0}.$$
Proof:
Since
$$\hat{\theta} \sim N(\theta, \mathbf{I}^{-1}(\hat{\theta})),$$
under $H_0: \theta_1 = \theta_1^0$ we have, by the partitioned inverse rule,
$$\hat{\theta}_1 \sim N(\theta_1^0, \ [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})]^{-1}).$$
That is, the Wald test statistic is
$$W = (\hat{\theta}_1 - \theta_1^0)' [\mathbf{I}_{11}(\hat{\theta}) - \mathbf{I}_{12}(\hat{\theta}) \mathbf{I}_{22}^{-1}(\hat{\theta}) \mathbf{I}_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0).$$
By definition the LM test statistic is
$$LM = \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right)' \mathbf{I}_n^{-1}(\tilde{\theta}) \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right),$$
but
$$\frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} = \begin{pmatrix} \left. \dfrac{\partial \log L}{\partial \theta_1} \right|_{\theta_1 = \theta_1^0} \\[2ex] \left. \dfrac{\partial \log L}{\partial \theta_2} \right|_{\theta_2 = \tilde{\theta}_2} \end{pmatrix} = \begin{pmatrix} q(\tilde{\theta}) \\ 0 \end{pmatrix};$$
using the partitioned inverse rule we have
$$LM = q(\tilde{\theta})' [\mathbf{I}_{11}(\tilde{\theta}) - \mathbf{I}_{12}(\tilde{\theta}) \mathbf{I}_{22}^{-1}(\tilde{\theta}) \mathbf{I}_{21}(\tilde{\theta})]^{-1} q(\tilde{\theta}).$$
The proof for the LR test statistic is straightforward.
Reminder:
For a general $2 \times 2$ partitioned matrix,
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} F_1 & -F_1 A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} F_1 & A_{22}^{-1} (I + A_{21} F_1 A_{12} A_{22}^{-1}) \end{pmatrix},$$
where $F_1 = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$.
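The reminder formula can be verified numerically; the following sketch (assuming NumPy is available) checks it on an arbitrary well-conditioned symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
A = B @ B.T + 5 * np.eye(5)        # a well-conditioned positive definite matrix

# Partition A with A11 of order 2 and A22 of order 3.
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]

A22i = np.linalg.inv(A22)
F1 = np.linalg.inv(A11 - A12 @ A22i @ A21)   # F1 = (A11 - A12 A22^{-1} A21)^{-1}

# Assemble the block-inverse formula from the reminder.
Ainv = np.block([
    [F1,                -F1 @ A12 @ A22i],
    [-A22i @ A21 @ F1,   A22i @ (np.eye(3) + A21 @ F1 @ A12 @ A22i)],
])

print(np.allclose(Ainv, np.linalg.inv(A)))  # True
```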