Ch. 5 Hypothesis Testing

The current framework of hypothesis testing is largely due to the work of Neyman and Pearson in the late 1920s and early 1930s, complementing Fisher's work on estimation. As in estimation, we begin by postulating a statistical model, but instead of seeking an estimator of $\theta$ in $\Theta$ we ask whether $\theta \in \Theta_0$ or $\theta \in \Theta_1 = \Theta - \Theta_0$ is most supported by the observed data. The discussion which follows will proceed in a similar way to the discussion of estimation, though less systematically and formally. This is due to the complexity of the topic, which arises mainly because one is asked to assimilate too many concepts too quickly just to be able to define the problem properly. This difficulty, however, is inherent in testing if any proper understanding of the topic is to be attempted, and thus unavoidable.

1 Testing: Definition and Concepts

1.1 The Decision Rule

Let $X$ be a random variable defined on the probability space $(S, \mathcal{F}, P(\cdot))$ and consider the statistical model associated with $X$:

(a) $\Phi = \{f(x;\theta),\ \theta \in \Theta\}$;
(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$ is a random sample from $f(x;\theta)$.

The problem of hypothesis testing is one of deciding whether or not some conjecture about $\theta$, of the form "$\theta$ belongs to some subset $\Theta_0$ of $\Theta$", is supported by the data $\mathbf{x} = (x_1, x_2, \ldots, x_n)'$. We call such a conjecture the null hypothesis and denote it by
$$H_0: \theta \in \Theta_0,$$
where if the sample realization $\mathbf{x} \in C_0$ we accept $H_0$, and if $\mathbf{x} \in C_1$ we reject it. Since the observation space $\mathcal{X}$ is a subset of $\mathbb{R}^n$ while the acceptance and rejection regions are most conveniently described on the real line, we need a mapping from $\mathbb{R}^n$ to $\mathbb{R}$. The mapping which enables us to define $C_0$ and $C_1$ we call a test statistic, $\tau(\mathbf{x}): \mathcal{X} \to \mathbb{R}$.

Example: Let $X$ be the random variable representing the marks achieved by students in an econometric theory paper and let the statistical model be:

(a) $\Phi = \left\{ f(x;\theta) = \frac{1}{8\sqrt{2\pi}} \exp\left[ -\frac{1}{2}\left(\frac{x-\theta}{8}\right)^2 \right] \right\}$, $\theta \in [0, 100]$;
(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$, $n = 40$, is a random sample from $\Phi$.
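To get a feel for the decision problem, the example model can be simulated. The sketch below is not part of the text: numpy, the seed, and the assumed true $\theta$ are conveniences chosen for illustration; it draws $n = 40$ marks from $N(\theta, 64)$ and computes the sample mean.

```python
import numpy as np

# Hypothetical simulation of the marks model: X ~ N(theta, 64), so sigma = 8.
rng = np.random.default_rng(0)   # seed chosen arbitrarily
theta_true = 60.0                # assumed true mean, for illustration only
sigma, n = 8.0, 40

x = rng.normal(theta_true, sigma, size=n)   # random sample x = (x1, ..., x40)'
x_bar = x.mean()                            # a 'good' estimator of theta

print(f"sample mean = {x_bar:.2f}")
# The sampling distribution of the mean is N(theta, 64/40) = N(theta, 1.6).
print(f"variance of X_bar under the model = {sigma**2 / n}")
```

The variance $64/40 = 1.6$ printed here is the quantity used repeatedly in the test constructed below.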
The hypothesis to be tested is
$$H_0: \theta = 60 \quad (\text{i.e. } X \sim N(60, 64)), \quad \Theta_0 = \{60\},$$
against
$$H_1: \theta \neq 60 \quad (\text{i.e. } X \sim N(\theta, 64),\ \theta \neq 60), \quad \Theta_1 = [0, 100] - \{60\}.$$
Common sense suggests that if some 'good' estimator of $\theta$, say $\bar{X}_n = (1/n)\sum_{i=1}^n X_i$, takes a value 'around' 60 for the sample realization $\mathbf{x}$, then we will be inclined to accept $H_0$. Let us formalise this argument. The acceptance region takes the form $60 - \varepsilon \le \bar{X}_n \le 60 + \varepsilon$, $\varepsilon > 0$, or
$$C_0 = \{\mathbf{x}: |\bar{X}_n - 60| \le \varepsilon\}, \qquad C_1 = \{\mathbf{x}: |\bar{X}_n - 60| > \varepsilon\}$$
is the rejection region. Formally, if $\mathbf{x} \in C_1$ (reject $H_0$) when $\theta \in \Theta_0$ ($H_0$ is true), we commit a type I error; if $\mathbf{x} \in C_0$ (accept $H_0$) when $\theta \in \Theta_1$ ($H_0$ is false), we commit a type II error.

The hypothesis to be tested is formally stated as follows:
$$H_0: \theta \in \Theta_0 \subset \Theta.$$
Against the null hypothesis $H_0$ we postulate the alternative $H_1$, which takes the form:
$$H_1: \theta \in \Theta_1 = \Theta - \Theta_0.$$
It is important to note at the outset that $H_0$ and $H_1$ are in effect hypotheses about the distribution of the sample, $f(\mathbf{x};\theta)$, i.e.
$$H_0: f(\mathbf{x};\theta),\ \theta \in \Theta_0; \qquad H_1: f(\mathbf{x};\theta),\ \theta \in \Theta_1.$$
In testing a null hypothesis $H_0$ against an alternative $H_1$ the issue is to decide whether the sample realization $\mathbf{x}$ 'supports' $H_0$ or $H_1$. In the former case we say that $H_0$ is accepted, in the latter that $H_0$ is rejected. In order to be able to make such a decision we need to formulate a mapping which relates $\Theta_0$ to some subset of the observation space $\mathcal{X}$, say $C_0$, which we call the acceptance region; its complement $C_1$ ($C_0 \cup C_1 = \mathcal{X}$, $C_0 \cap C_1 = \emptyset$) we call the rejection region.

1.2 Type I and Type II Errors

The next question is: "how do we choose $\varepsilon$?" If $\varepsilon$ is too small we run the risk of rejecting $H_0$ when it is true; we call this a type I error. On the other hand, if $\varepsilon$ is too large we run the risk of accepting $H_0$ when it is false; we call this a type II error. That is, if we were to choose $\varepsilon$ too small we would run a higher risk of committing a type I error than of committing a type II error, and vice versa. In other words, there is a trade-off between the probability of a type I error, i.e.
$$\Pr(\mathbf{x} \in C_1;\ \theta \in \Theta_0) = \alpha,$$
and the probability of a type II error, i.e.
$$\Pr(\mathbf{x} \in C_0;\ \theta \in \Theta_1) = \beta.$$
Ideally we would like $\alpha = \beta = 0$ for all $\theta \in \Theta$, which is not possible for a fixed $n$. Moreover, we cannot control both simultaneously because of the trade-off between them. The strategy adopted in hypothesis testing is to choose a small value of $\alpha$ and, for that given $\alpha$, minimize $\beta$. Formally, this amounts to choosing $\alpha$ such that
$$\Pr(\mathbf{x} \in C_1;\ \theta \in \Theta_0) = \alpha(\theta) \quad \text{for } \theta \in \Theta_0,$$
and
$$\Pr(\mathbf{x} \in C_0;\ \theta \in \Theta_1) = \beta(\theta)$$
is minimized for $\theta \in \Theta_1$ by choosing $C_1$ (or $C_0$) appropriately. In the case of the above example, if we were to choose $\alpha$, say $\alpha = 0.05$, then
$$\Pr(|\bar{X}_n - 60| > \varepsilon;\ \theta = 60) = 0.05.$$
"How do we determine $\varepsilon$, then?" The only random variable involved in the statement is $\bar{X}_n$, and hence the statement must refer to its sampling distribution. For the above probabilistic statement to have any operational meaning, so as to enable us to determine $\varepsilon$, the distribution of $\bar{X}_n$ must be known. In the present case we know that
$$\bar{X}_n \sim N\left(\theta, \frac{\sigma^2}{n}\right), \quad \text{where } \frac{\sigma^2}{n} = \frac{64}{40} = 1.6,$$
which implies that for $\theta = 60$ (i.e. when $H_0$ is true) we can 'construct' a test statistic $\tau(\mathbf{x})$ from the sample $\mathbf{x}$ such that
$$\tau(\mathbf{x}) = \frac{\bar{X}_n - \theta}{\sqrt{1.6}} = \frac{\bar{X}_n - 60}{\sqrt{1.6}} = \frac{\bar{X}_n - 60}{1.265} \sim N(0, 1),$$
and thus the distribution of $\tau(\cdot)$ is known completely (no unknown parameters). When this is the case, this distribution can be used in conjunction with the above probabilistic statement to determine $\varepsilon$. In order to do this we need to relate $|\bar{X}_n - 60|$ to $\tau(\mathbf{x})$ (a statistic) whose distribution is known. The obvious way is to standardize the former. This suggests changing the above probabilistic statement to the equivalent statement
$$\Pr\left( \frac{|\bar{X}_n - 60|}{1.265} \ge c_\alpha;\ \theta = 60 \right) = 0.05, \quad \text{where } c_\alpha = \frac{\varepsilon}{1.265}.$$
The value of $c_\alpha$ given by the $N(0, 1)$ table is $c_\alpha = 1.96$. This in turn implies that the rejection region for the test is
$$C_1 = \left\{ \mathbf{x}: \frac{|\bar{X}_n - 60|}{1.265} \ge 1.96 \right\} = \{\mathbf{x}: |\tau(\mathbf{x})| \ge 1.96\},$$
or
$$C_1 = \{\mathbf{x}: |\bar{X}_n - 60| \ge 2.48\}.$$
That is, for sample realizations $\mathbf{x}$ which give rise to $\bar{X}_n$ falling outside the interval $(57.52,\ 62.48)$ we reject $H_0$. Let us summarize the argument so far.
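The numbers $c_\alpha = 1.96$ and $\varepsilon = 2.48$ above can be reproduced numerically. The following sketch (scipy is an assumption of convenience, not part of the text) obtains the two-sided critical value from the normal quantile function and converts it into the acceptance interval for $\bar{X}_n$.

```python
from scipy.stats import norm

alpha = 0.05
se = (64 / 40) ** 0.5              # sqrt(sigma^2 / n) = sqrt(1.6) ~ 1.265

c_alpha = norm.ppf(1 - alpha / 2)  # two-sided critical value of N(0, 1)
eps = c_alpha * se                 # epsilon = c_alpha * 1.265

print(f"c_alpha = {c_alpha:.2f}")  # ~ 1.96
print(f"epsilon = {eps:.2f}")      # ~ 2.48
print(f"accept H0 when X_bar is in ({60 - eps:.2f}, {60 + eps:.2f})")
```

The printed interval is the $(57.52,\ 62.48)$ of the text, up to rounding.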
We set out to construct a test for $H_0: \theta = 60$ against $H_1: \theta \neq 60$, and intuition suggested the rejection region $\{|\bar{X}_n - 60| \ge \varepsilon\}$. In order to determine $\varepsilon$ we had to

(a) choose an $\alpha$; and then
(b) define the rejection region in terms of some statistic $\tau(\mathbf{x})$.

The latter is necessary to enable us to determine $\varepsilon$ via some known distribution, namely the distribution of the test statistic $\tau(\mathbf{x})$ under $H_0$ (i.e. when $H_0$ is true). The next question which naturally arises is: "What do we need the probability of type II error for?" The answer is that we need it to decide whether the test defined in terms of $C_1$ (and of course $C_0$) is a 'good' or a 'bad' test. As we mentioned at the outset, the way we decided to resolve the trade-off between $\alpha$ and $\beta$ was to choose a given small value of $\alpha$ and define $C_1$ so as to minimize $\beta$. At this stage we do not know whether the test defined above is a 'good' test or not. Let us set up the apparatus that enables us to consider the question of optimality.

2 Optimal Tests

First we note that minimization of $\Pr(\mathbf{x} \in C_0)$ for all $\theta \in \Theta_1$ is equivalent to maximization of $\Pr(\mathbf{x} \in C_1)$ for all $\theta \in \Theta_1$.

Definition 1: The probability of rejecting $H_0$ when false at some point $\theta_1 \in \Theta_1$, i.e.
$$\Pr(\mathbf{x} \in C_1;\ \theta = \theta_1),$$
is called the power of the test at $\theta = \theta_1$. Note that
$$\Pr(\mathbf{x} \in C_1;\ \theta = \theta_1) = 1 - \Pr(\mathbf{x} \in C_0;\ \theta = \theta_1) = 1 - \beta(\theta_1).$$

In the above example we can define the power of the test at some $\theta_1 \in \Theta_1$, say $\theta = 54$, to be $\Pr[\,|\bar{X}_n - 60|/1.265 \ge 1.96;\ \theta = 54\,]$. Under the alternative hypothesis that $\theta = 54$ it is the case that $(\bar{X}_n - 54)/1.265 \sim N(0, 1)$. We want the probability that the statistic constructed under the null hypothesis, $(\bar{X}_n - 60)/1.265$, falls in the rejection region; that is, the power of the test at $\theta = 54$ is
$$\Pr\left( \frac{|\bar{X}_n - 60|}{1.265} \ge 1.96;\ \theta = 54 \right) = \Pr\left( \frac{\bar{X}_n - 54}{1.265} \ge 1.96 + \frac{60 - 54}{1.265} \right) + \Pr\left( \frac{\bar{X}_n - 54}{1.265} \le -1.96 + \frac{60 - 54}{1.265} \right) \approx 0.9973.$$
Hence the power of the test defined by $C_1$ above is indeed very high for $\theta = 54$.
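The two-term decomposition above can be evaluated directly; the sketch below (scipy assumed available) computes each tail probability separately, mirroring the displayed sum.

```python
from scipy.stats import norm

se = 1.265                      # standard error of X_bar
shift = (60.0 - 54.0) / se      # (theta0 - theta1) / se under theta = 54

upper = norm.sf(1.96 + shift)   # Pr(Z >= 1.96 + 4.743), essentially zero
lower = norm.cdf(-1.96 + shift) # Pr(Z <= 2.783), the dominant term

print(f"power at theta = 54: {upper + lower:.4f}")  # ~ 0.9973
```

Almost all of the power comes from the lower tail: under $\theta = 54$ the sample mean almost never lands above 62.48, but very often below 57.52.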
From this we see that to calculate the power of a test we need to know the distribution of the test statistic $\tau(\mathbf{x})$ under the alternative hypothesis; in this case it is the distribution of $(\bar{X}_n - 54)/1.265$.$^{1}$

$^{1}$In the example above, the test statistic $\tau(\mathbf{x})$ has a standard normal distribution under both the null and the alternative hypothesis. However, it is quite often the case that a test statistic has a different distribution under the null and the alternative hypotheses; the unit root test is an example. See Chapter 21.

Following the same procedure, the power of the test defined by $C_1$ is as follows for various $\theta \in \Theta_1$:
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 56) = 0.8849,$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 58) = 0.3520,$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 60) = 0.0500,$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 62) = 0.3520,$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 64) = 0.8849,$$
$$\Pr(|\tau(\mathbf{x})| \ge 1.96;\ \theta = 66) = 0.9973.$$
As we can see, the power of the test increases as we move further away from $\theta = 60$ ($H_0$), and the power at $\theta = 60$ equals the probability of a type I error. This prompts us to define the power function as follows.

Definition 2: $P(\theta) = \Pr(\mathbf{x} \in C_1)$, $\theta \in \Theta$, is called the power function of the test defined by the rejection region $C_1$.

Definition 3: $\alpha = \max_{\theta \in \Theta_0} P(\theta)$ is defined to be the size (or the significance level) of the test. In the case where $H_0$ is simple, say $\theta = \theta_0$, then $\alpha = P(\theta_0)$.

Definition 4: A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$, as defined by some rejection region $C_1$, is said to be a uniformly most powerful (UMP) test of size $\alpha$ if
(a) $\max_{\theta \in \Theta_0} P(\theta) = \alpha$;
(b) $P(\theta) \ge P^*(\theta)$ for all $\theta \in \Theta_1$, where $P^*(\theta)$ is the power function of any other test of size $\alpha$.

As will be seen in the sequel, no UMP test exists in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing further criteria and to consider the question of UMP tests within the subclass.
Definition 5: A test of $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ is said to be unbiased if
$$\max_{\theta \in \Theta_0} P(\theta) \le \min_{\theta \in \Theta_1} P(\theta).$$
In other words, a test is unbiased if it rejects $H_0$ more often when it is false than when it is true.

Collecting all the above concepts together, we say that a test has been defined when the following components have been specified:
(a) a test statistic $\tau(\mathbf{x})$;
(b) the size of the test $\alpha$;
(c) the distribution of $\tau(\mathbf{x})$ under $H_0$ and $H_1$;
(d) the rejection region $C_1$ (or, equivalently, $C_0$).

The most important component in defining a test is the test statistic, for which we need to know the distribution under both $H_0$ and $H_1$. Hence, constructing an optimal test is largely a matter of being able to find a statistic $\tau(\mathbf{x})$ with the following properties:
(a) $\tau(\mathbf{x})$ depends on $\mathbf{x}$ via a 'good' estimator of $\theta$; and
(b) the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$ does not depend on any unknown parameters.
We call such a statistic a pivot.

Example: Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, 400)$. In particular, $y_1 = 62$, $y_2 = 52$, $y_3 = 68$, $y_4 = 23$, $y_5 = 34$, $y_6 = 45$, $y_7 = 27$, $y_8 = 42$, $y_9 = 83$, $y_{10} = 56$ and $y_{11} = 40$. Test the null hypothesis $H_0: \mu = 55$ versus $H_1: \mu \neq 55$. Since $\sigma^2$ is known, the sample mean is distributed as
$$\bar{Y} \sim N(\mu, \sigma^2/n) = N(\mu, 400/11);$$
therefore under $H_0: \mu = 55$,
$$\bar{Y} \sim N(55, 36.36), \quad \text{or} \quad \frac{\bar{Y} - 55}{\sqrt{36.36}} \sim N(0, 1).$$
We accept $H_0$ when the test statistic $\tau(\mathbf{x}) = (\bar{Y} - 55)/\sqrt{36.36}$ lies in the interval $C_0 = [-1.96,\ 1.96]$ for the size of the test $\alpha = 0.05$. We now have
$$\sum_{i=1}^{11} y_i = 532 \quad \text{and} \quad \bar{y} = \frac{532}{11} = 48.4.$$
Then
$$\frac{48.4 - 55}{\sqrt{36.36}} \approx -1.09,$$
which is in the acceptance region. Therefore we accept the null hypothesis $H_0: \mu = 55$.

Example: Assume a random sample of size 11 is drawn from a normal distribution $N(\mu, \sigma^2)$, with the same realizations $y_1, \ldots, y_{11}$ as in the previous example. Test the null hypothesis $H_0: \mu = 55$ versus $H_1: \mu \neq 55$.
Since $\sigma^2$ is unknown, the sample mean is distributed as $\bar{Y} \sim N(\mu, \sigma^2/n)$; therefore under $H_0: \mu = 55$,
$$\bar{Y} \sim N(55, \sigma^2/n) \quad \text{or} \quad \frac{\bar{Y} - 55}{\sqrt{\sigma^2/n}} \sim N(0, 1).$$
However, this is not a pivotal test statistic, since it involves the unknown parameter $\sigma^2$. From the fact that
$$\frac{\sum_{i=1}^n (Y_i - \bar{Y})^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1},$$
where $s^2 = \sum_{i=1}^n (Y_i - \bar{Y})^2/(n-1)$ is an unbiased estimator of $\sigma^2$,$^{2}$ we have
$$\frac{(\bar{Y} - 55)/\sqrt{\sigma^2/n}}{\sqrt{\dfrac{(n-1)s^2/\sigma^2}{n-1}}} = \frac{\bar{Y} - 55}{\sqrt{s^2/n}} \sim t_{n-1}.$$
We accept $H_0$ when the test statistic $\tau(\mathbf{x}) = (\bar{Y} - 55)/\sqrt{s^2/n}$ lies in the interval $C_0 = [-2.23,\ 2.23]$. We now have
$$\sum_{i=1}^{11} y_i = 532, \qquad \sum_{i=1}^{11} y_i^2 = 29000,$$
and
$$s^2 = \frac{\sum y_i^2 - n\bar{y}^2}{10} = \frac{29000 - 11(48.4)^2}{10} = 323.19.$$
Then
$$\frac{48.4 - 55}{\sqrt{323.19/11}} \approx -1.22,$$
which is also in the acceptance region $C_0$. Therefore we accept the null hypothesis $H_0: \mu = 55$.

$^{2}$See p. 22 of Chapter 3.

3 Asymptotic Test Procedures

As discussed in the last section, the main problem in hypothesis testing is to construct a test statistic $\tau(\mathbf{x})$ whose distribution we know under both the null hypothesis $H_0$ and the alternative $H_1$, and which does not depend on the unknown parameters $\theta$. The first part of the problem, that of constructing $\tau(\mathbf{x})$, can be handled relatively easily using various methods (e.g. the Neyman-Pearson likelihood ratio) when certain conditions are satisfied. The second part, that of determining the distribution of $\tau(\mathbf{x})$ under both $H_0$ and $H_1$, is much more difficult to solve, and often we have to resort to asymptotic theory. This amounts to deriving the asymptotic distribution of $\tau(\mathbf{x})$ and using that to determine the rejection region $C_1$ (or $C_0$) and the associated probabilities.

3.1 Asymptotic properties

For a given sample size $n$, if the distribution of $\tau_n(\mathbf{x})$ is not known (otherwise we would use that), we do not know how accurate an approximation the asymptotic distribution of $\tau_n(\mathbf{x})$ is to its finite sample distribution. This suggests that when asymptotic results are used we should be aware of their limitations and the inaccuracies they can lead to.
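Both worked examples above can be checked numerically. The sketch below recomputes the $z$ statistic ($\sigma^2 = 400$ known) and the $t$ statistic ($\sigma^2$ unknown) from the eleven observations; scipy is assumed to be available for the critical values, and small differences from the text's figures come from the text rounding $\bar{y}$ to 48.4.

```python
import numpy as np
from scipy.stats import norm, t

y = np.array([62, 52, 68, 23, 34, 45, 27, 42, 83, 56, 40], dtype=float)
n, mu0 = len(y), 55.0

# Case 1: sigma^2 = 400 known -> z test against N(0, 1)
z = (y.mean() - mu0) / np.sqrt(400 / n)
print(f"z = {z:.2f}, critical value = {norm.ppf(0.975):.2f}")

# Case 2: sigma^2 unknown -> t test with s^2 = sum((y - ybar)^2)/(n - 1)
s2 = y.var(ddof=1)
t_stat = (y.mean() - mu0) / np.sqrt(s2 / n)
print(f"t = {t_stat:.2f}, critical value = {t.ppf(0.975, df=n - 1):.2f}")

# Both statistics fall inside their acceptance regions, so H0 is accepted.
```

The same data thus lead to acceptance of $H_0$ under both the known-variance and the unknown-variance versions of the test.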
Consider the test defined by the rejection region
$$C_1^n = \{\mathbf{x}: |\tau_n(\mathbf{x})| \ge c_n\},$$
whose power function is
$$P_n(\theta) = \Pr(\mathbf{x} \in C_1^n), \quad \theta \in \Theta.$$
Since the distribution of $\tau_n(\mathbf{x})$ is not known, we cannot determine $c_n$ or $P_n(\theta)$. If the asymptotic distribution of $\tau_n(\mathbf{x})$ is available, however, we can use that instead to define $c_n$ for some fixed $\alpha$, together with the asymptotic power function
$$P_\infty(\theta) = \Pr(\mathbf{x} \in C_1^\infty), \quad \theta \in \Theta.$$
In this sense we can think of $\{\tau_n(\mathbf{x}),\ n \ge 1\}$ as a sequence of test statistics defining a sequence of rejection regions $\{C_1^n,\ n \ge 1\}$ with power functions $\{P_n(\theta),\ n \ge 1,\ \theta \in \Theta\}$.

Definition 6: The sequence of tests for $H_0: \theta \in \Theta_0$ against $H_1: \theta \in \Theta_1$ defined by $\{C_1^n,\ n \ge 1\}$ is said to be consistent of size $\alpha$ if
$$\max_{\theta \in \Theta_0} P_\infty(\theta) = \alpha \quad \text{and} \quad P_\infty(\theta) = 1,\ \theta \in \Theta_1.$$

Definition 7: A sequence of tests as defined above is said to be asymptotically unbiased of size $\alpha$ if
$$\max_{\theta \in \Theta_0} P_\infty(\theta) = \alpha \quad \text{and} \quad \alpha < P_\infty(\theta) < 1,\ \theta \in \Theta_1.$$

3.2 Three asymptotically equivalent test procedures

In this section three general test procedures, the "Holy Trinity", which give rise to asymptotically optimal tests will be considered: the likelihood ratio, Wald, and Lagrange multiplier tests. All three procedures can be interpreted as utilizing the information incorporated in the log likelihood function in different but asymptotically equivalent ways. For expositional purposes the test procedures will be considered in the context of the simplest statistical model, where
(a) $\Phi = \{f(x;\theta),\ \theta \in \Theta\}$ is the probability model; and
(b) $\mathbf{x} = (X_1, X_2, \ldots, X_n)'$ is a random sample,
and we consider the simple null hypothesis $H_0: \theta = \theta_0$, $\theta \in \Theta \subset \mathbb{R}^m$, against $H_1: \theta \neq \theta_0$.

3.2.1 The Likelihood Ratio Test (test statistic is calculated under both the null and the alternative hypothesis)

The likelihood ratio test statistic takes the form
$$\lambda(\mathbf{x}) = \frac{L(\theta_0; \mathbf{x})}{\max_{\theta \in \Theta} L(\theta; \mathbf{x})} = \frac{L(\theta_0; \mathbf{x})}{L(\hat{\theta}; \mathbf{x})},$$
where $\hat{\theta}$ is the MLE of $\theta$.
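For the normal model of Section 1 the MLE is $\hat{\theta} = \bar{X}_n$, so $\lambda(\mathbf{x})$ can be evaluated in closed form. The sketch below computes it on simulated data; the seed and the assumed true $\theta$ are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, n, theta0 = 8.0, 40, 60.0
x = rng.normal(62.0, sigma, size=n)     # hypothetical sample, true theta = 62

def loglik(theta):
    """log L(theta; x) for the N(theta, 64) model."""
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((x - theta)**2) / (2 * sigma**2))

theta_hat = x.mean()                    # MLE of theta
lam = np.exp(loglik(theta0) - loglik(theta_hat))  # likelihood ratio lambda(x)
LR = -2 * np.log(lam)                   # reduces to n*(x_bar - theta0)^2 / sigma^2

print(f"lambda(x) = {lam:.4f}, -2 log lambda = {LR:.4f}")
```

Since $\hat{\theta}$ maximizes the likelihood, $0 < \lambda(\mathbf{x}) \le 1$ always, and values of $\lambda(\mathbf{x})$ near zero count against $H_0$.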
Under certain regularity conditions, which include RC1-RC3 (see Chapter 3), $\log L(\theta; \mathbf{x})$ can be expanded in a Taylor series at $\theta = \hat{\theta}$:
$$\log L(\theta; \mathbf{x}) \simeq \log L(\hat{\theta}; \mathbf{x}) + \left. \frac{\partial \log L}{\partial \theta'} \right|_{\hat{\theta}} (\theta - \hat{\theta}) + \frac{1}{2} (\theta - \hat{\theta})' \left. \frac{\partial^2 \log L}{\partial \theta\, \partial \theta'} \right|_{\hat{\theta}} (\theta - \hat{\theta})$$
(see Alpha Chiang (1984), p. 261, on the Lagrange form of the remainder). Since
$$\left. \frac{\partial \log L}{\partial \theta'} \right|_{\hat{\theta}} = 0,$$
being the first-order condition of the MLE, and
$$-\frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta\, \partial \theta'} \xrightarrow{p} I_n(\theta)$$
(see for example Greene, pp. 131-132), the above expansion can be simplified to
$$\log L(\theta; \mathbf{x}) \simeq \log L(\hat{\theta}; \mathbf{x}) - \frac{1}{2} (\hat{\theta} - \theta)' I_n(\theta) (\hat{\theta} - \theta).$$
This implies that under the null hypothesis $H_0: \theta = \theta_0$,
$$-2 \log \lambda(\mathbf{x}) = 2[\log L(\hat{\theta}; \mathbf{x}) - \log L(\theta_0; \mathbf{x})] = (\hat{\theta} - \theta_0)' I_n(\theta) (\hat{\theta} - \theta_0).$$
From the asymptotic properties of MLEs it is known that under certain regularity conditions
$$(\hat{\theta} - \theta_0) \sim N(0,\ I_n^{-1}(\theta)).$$
Using this we can deduce that
$$LR = -2 \log \lambda(\mathbf{x}) \simeq (\hat{\theta} - \theta_0)' I_n(\theta) (\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$
being a quadratic form in asymptotically normal random variables.

3.2.2 Wald test (test statistic is computed under $H_1$)

Wald (1943), using the above approximation of $-2 \log \lambda(\mathbf{x})$, proposed an alternative test statistic by replacing $I_n(\theta)$ with $I_n(\hat{\theta})$:
$$W = (\hat{\theta} - \theta_0)' I_n(\hat{\theta}) (\hat{\theta} - \theta_0) \overset{H_0}{\sim} \chi^2(m),$$
given that $I_n(\hat{\theta}) \xrightarrow{p} I_n(\theta)$.$^{3}$

$^{3}$$I_n(\hat{\theta})$ can be any one of the three estimators of the asymptotic variance of the MLE. See section 3.3.6 of Chapter 3.

3.2.3 The Lagrange Multiplier Test (test statistic is computed under $H_0$)

Rao (1947) proposed the LM test. Expanding the score function $\partial \log L(\theta; \mathbf{x})/\partial \theta$ around $\hat{\theta}$ we have:
$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \simeq \frac{\partial \log L(\hat{\theta}; \mathbf{x})}{\partial \theta} + \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta\, \partial \theta'} (\theta - \hat{\theta}).$$
As in the LR test, $\partial \log L(\hat{\theta}; \mathbf{x})/\partial \theta = 0$, and the above equation reduces to
$$\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta} \simeq \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta\, \partial \theta'} (\theta - \hat{\theta}).$$
Now consider the test statistic under the null hypothesis $H_0: \theta = \theta_0$:
$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' I_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right),$$
which is equivalent to
$$LM = (\theta_0 - \hat{\theta})' \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta\, \partial \theta'} I_n^{-1}(\theta_0) \frac{\partial^2 \log L(\hat{\theta}; \mathbf{x})}{\partial \theta\, \partial \theta'} (\theta_0 - \hat{\theta}).$$
Given that $-\partial^2 \log L(\hat{\theta}; \mathbf{x})/\partial \theta\, \partial \theta' \xrightarrow{p} I_n(\theta)$ and that under $H_0$, $I_n(\theta_0) \xrightarrow{p}$
$I_n(\theta)$, we have
$$LM = \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right)' I_n^{-1}(\theta_0) \left( \frac{\partial \log L(\theta_0; \mathbf{x})}{\partial \theta} \right) \simeq (\theta_0 - \hat{\theta})' I_n(\theta) (\theta_0 - \hat{\theta}) \overset{H_0}{\sim} \chi^2(m),$$
as in the proof for the LR test.

Example: Reproduce the results of the example on p. 157 of Greene (4th ed.), or Table 17.1 on p. 490 of Greene (5th ed.).

Example: Of particular interest in practice is the case where $\theta = (\theta_1', \theta_2')'$ and $H_0: \theta_1 = \theta_1^0$ against $H_1: \theta_1 \neq \theta_1^0$, with $\theta_1$ being $r \times 1$ and $\theta_2$, of dimension $(m - r) \times 1$, left unrestricted. In this case the three test statistics take the form
$$LR = 2(\ln L(\hat{\theta}; \mathbf{x}) - \ln L(\tilde{\theta}; \mathbf{x})),$$
$$W = (\hat{\theta}_1 - \theta_1^0)' [I_{11}(\hat{\theta}) - I_{12}(\hat{\theta}) I_{22}^{-1}(\hat{\theta}) I_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0),$$
$$LM = q(\tilde{\theta})' [I_{11}(\tilde{\theta}) - I_{12}(\tilde{\theta}) I_{22}^{-1}(\tilde{\theta}) I_{21}(\tilde{\theta})]^{-1} q(\tilde{\theta}),$$
where $\hat{\theta} = (\hat{\theta}_1', \hat{\theta}_2')'$, $\tilde{\theta} = (\theta_1^{0\prime}, \tilde{\theta}_2')'$, $\tilde{\theta}_2$ is the solution to
$$\left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_2} \right|_{\theta_1 = \theta_1^0} = 0, \quad \text{and} \quad q(\tilde{\theta}) = \left. \frac{\partial \ln L(\theta; \mathbf{x})}{\partial \theta_1} \right|_{\theta_1 = \theta_1^0}.$$

Proof: Since $\hat{\theta} \sim N(\theta,\ I^{-1}(\hat{\theta}))$, under $H_0: \theta_1 = \theta_1^0$ we have, by the partitioned inverse rule,
$$\hat{\theta}_1 \sim N(\theta_1^0,\ [I_{11}(\hat{\theta}) - I_{12}(\hat{\theta}) I_{22}^{-1}(\hat{\theta}) I_{21}(\hat{\theta})]^{-1}).$$
That is, the Wald test statistic is
$$W = (\hat{\theta}_1 - \theta_1^0)' [I_{11}(\hat{\theta}) - I_{12}(\hat{\theta}) I_{22}^{-1}(\hat{\theta}) I_{21}(\hat{\theta})] (\hat{\theta}_1 - \theta_1^0).$$
By definition the LM test statistic is
$$LM = \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right)' I_n^{-1}(\tilde{\theta}) \left( \frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} \right),$$
but
$$\frac{\partial \log L(\tilde{\theta}; \mathbf{x})}{\partial \theta} = \begin{pmatrix} \left. \dfrac{\partial \log L}{\partial \theta_1} \right|_{\theta_1 = \theta_1^0} \\[2ex] \left. \dfrac{\partial \log L}{\partial \theta_2} \right|_{\theta_2 = \tilde{\theta}_2} \end{pmatrix} = \begin{pmatrix} q(\tilde{\theta}) \\ 0 \end{pmatrix};$$
using the partitioned inverse rule we have
$$LM = q(\tilde{\theta})' [I_{11}(\tilde{\theta}) - I_{12}(\tilde{\theta}) I_{22}^{-1}(\tilde{\theta}) I_{21}(\tilde{\theta})]^{-1} q(\tilde{\theta}).$$
The proof for the LR test statistic is straightforward.

Reminder: for a general $2 \times 2$ partitioned matrix,
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1} = \begin{pmatrix} F_1 & -F_1 A_{12} A_{22}^{-1} \\ -A_{22}^{-1} A_{21} F_1 & A_{22}^{-1}(I + A_{21} F_1 A_{12} A_{22}^{-1}) \end{pmatrix},$$
where $F_1 = (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1}$.
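As a sanity check on the asymptotic equivalence of the three procedures, consider again the $N(\theta, \sigma^2)$ model with $\sigma^2$ known, where the score, the Fisher information, and the MLE are all available in closed form; in this special model LR, W and LM coincide exactly. The simulated data below are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n, theta0 = 8.0, 40, 60.0
x = rng.normal(61.0, sigma, size=n)     # hypothetical sample

theta_hat = x.mean()                    # unrestricted MLE
I_n = n / sigma**2                      # Fisher information I_n(theta), free of theta here

def loglik(theta):
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((x - theta)**2) / (2 * sigma**2))

score0 = np.sum(x - theta0) / sigma**2  # score d log L / d theta at theta0

LR = 2 * (loglik(theta_hat) - loglik(theta0))
W = (theta_hat - theta0)**2 * I_n       # I_n(theta_hat) = I_n(theta) in this model
LM = score0**2 / I_n

print(f"LR = {LR:.6f}, W = {W:.6f}, LM = {LM:.6f}")  # identical in this model
```

All three reduce algebraically to $n(\bar{x} - \theta_0)^2/\sigma^2$ here; in models where the information depends on $\theta$ they differ in finite samples and agree only asymptotically.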