Testing for a Fractional Unit Root in Time Series Regression

Chingnun Lee [1,*], Tzu-Hsiang Liao [2] and Fu-Shuen Shie [3]

[1] Institute of Economics, National Sun Yat-sen University, Kaohsiung, Taiwan
[2] Department of Finance, National Central University, Chung-Li, Taiwan
[3] Department of Finance, National Taiwan University, Taipei, Taiwan

Abstract

This paper presents a nonparametric test of the differencing parameter of a general fractionally integrated process, which allows for weakly dependent and heterogeneously distributed innovations in the short-run dynamics. It is shown that the test statistics, derived from the standardized ordinary least squares estimator of a simple autoregressive model, are consistent and fare well in terms of both power and size. The paper ends with two empirical applications.

Key Words: Unit Root, Fractional Integrated Process, Power.
JEL classification: C12, C22.

August 2004

* Corresponding author. Address: Graduate Institute of Economics, National Sun Yat-sen University, 70 Leng-hai Road, Kaohsiung, Taiwan 804. Tel: 886-7-5252000 ext. 5618. Fax: 886-7-5255611. E-mail: lee econ@mail.nsysu.edu.tw.

1 Introduction

It has become quite standard practice in applied work to test whether a variable is integrated or stationary using both the null hypothesis of I(1) and that of I(0). See Phillips and Xiao (1998) for an updated survey of unit root testing approaches. However, by proceeding in this way it is often found that both null hypotheses are rejected (see, for example, Tsay (2000)), suggesting that many time series are not well represented as either I(1) or I(0). In view of this outcome, the class of fractionally integrated processes, denoted I(d), in which the order of integration d is extended to be any real number, has proved very useful in capturing the persistence of many long-memory processes.

This practice raises two issues. The first is the power of traditional unit root tests, such as those of Dickey and Fuller (DF, 1979 and 1981), Phillips and Perron (PP, 1988) and Kwiatkowski, Phillips, Schmidt and Shin (KPSS, 1992), against fractional alternatives. The second is the power of the various techniques for estimating d, both when they are applied to test for a unit root (d = 1) and when they are used for inference on the true value of d. The power of traditional unit root tests has been studied by Diebold and Rudebusch (1991), Lee and Schmidt (1996), Kramer (1998), Dolado, Gonzalo and Mayoral (2002) and Lee and Shie (2002). In general, they all show that these unit root tests are consistent when the alternative is an I(d) process, but that their power turns out to be quite low. This lack of power, in particular, has motivated the development of new approaches that estimate the order of integration d of a series directly, where d is not restricted to a special value such as unity or zero.

A very large number of rather heterogeneous methods are available for estimating and testing the fractional difference parameter d of an observed process. Within the parametric methodology [1], most methods are based on the autoregressive fractionally integrated moving average (ARFIMA) process pioneered by Granger and Joyeux (1980) and Hosking (1981).

[1] By which we mean that, apart from d, the short-run dynamics have to be specified parametrically for estimation of, or inference about, d.

Fox and Taqqu (1986) construct an asymptotic approximation to the likelihood of an ARFIMA process in the frequency domain,
Sowell (1992) constructs the exact likelihood function of an ARFIMA process in the time domain, and Chung and Baillie (1993) propose a conditional sum of squares (CSS) estimator for the ARFIMA process. Standard maximum likelihood (MLE) inference procedures then apply to these estimates. Besides the MLE, Chung and Schmidt (1995) and Mayoral (2001) propose minimum distance estimators for the ARFIMA process. Still within the parametric methodology, Robinson (1994) and Tanaka (1999) propose LM tests for d in the frequency domain and in the time domain, respectively. To implement these tests empirically, a parametric short-run dynamic ARMA structure has to be specified; see Gil-Alaña and Robinson (1997) and Gil-Alaña (2000) for examples.

To make inference about d nonparametrically, Geweke and Porter-Hudak (GPH, 1983) suggest a regression of the log spectral density ordinates on a trigonometric function to estimate d. Robinson (1992) considers a frequency domain approach that yields a consistent estimate of d in the absence of any parameterization of the autocovariance function. In general, parametric methods deliver narrower confidence intervals than nonparametric ones. A drawback, however, is that the estimates are sensitive to the class of models considered and may be misleading because of misspecification; see Hauser et al. (1999).

By allowing the short-run dynamics to be weakly dependent and heterogeneously distributed, the aim of the present paper is to develop a new nonparametric test for the fractional difference parameter d, for 0 < d ≤ 1, of a general fractionally integrated process in the time domain. It is shown that the test statistics, derived from the least squares regression of a simple autoregressive model, are easy to compute and are consistent against possible alternatives, including the I(1) process. They also fare very well in finite samples, in terms of power and size, when compared with other competing tests.

The rest of the paper is organized as follows. Section 2 reviews the functional central limit theorem for a quite general fractionally integrated process. In Section 3 we define our model and estimators and derive their limiting distributions. In Section 4 we provide test statistics for the fractional difference parameter. Simulation evidence on the size and power of these tests is presented in Sections 5 and 6. Section 7 discusses two empirical applications of the tests. Finally, Section 8 draws some concluding remarks. Proofs of Theorems and Lemmas are gathered in the Appendix.

From now on, the following conventional notation is adopted throughout the paper: $L$ is the lag operator, $\Gamma(\cdot)$ denotes the gamma function, $\Rightarrow$ denotes weak convergence of the associated probability measures, $\stackrel{p}{\rightarrow}$ denotes convergence in probability, $[z]$ denotes the largest integer that is smaller than or equal to $z$, $\bar d = d + 1$, and we write $e_i \sim f_i$ to denote that $e_i/f_i \rightarrow 1$ as $i \rightarrow \infty$.

2 Preliminaries

The class of I(d) processes $u_t$ is customarily written in the form

$$(1-L)^d u_t = \varepsilon_t, \qquad (1)$$

where the innovation process $\varepsilon_t$, the fractional difference of $u_t$, is a stationary and weakly dependent process to be specified below. The process $u_t$ is covariance stationary and ergodic for $-0.5 < d < 0.5$; moreover, $d < 1$ implies that $u_t$ is mean reverting.
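Because the fractional difference operator $(1-L)^d$ in (1) reappears throughout the paper (to define the process, to pre-filter the data in Section 4, and to build residuals for the long-run variance estimator), a minimal computational sketch may be useful here. The Python function below is our illustration, not code from the paper: it expands $(1-L)^d$ through its binomial (AR(∞)) representation and truncates the filter at the start of the sample.

```python
import numpy as np

def frac_diff(x, d):
    """Apply the truncated fractional difference filter (1 - L)^d to a series x.

    The filter weights follow the binomial expansion (1 - L)^d = sum_j pi_j L^j
    with pi_0 = 1 and the recursion pi_j = pi_{j-1} * (j - 1 - d) / j.
    Observations before the start of the sample are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    # (pi * x)_t = sum_{k=0}^{t} pi_k x_{t-k}: a truncated convolution
    return np.convolve(pi, x)[:n]
```

Setting d = 1 reproduces an ordinary first difference (up to the truncation at the first observation), while a negative d acts as a fractional integration; this is how the pre-filtering devices of Sections 4.2 and 4.3 can be implemented.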
To begin, we must be precise about the sequence $\varepsilon_t$ of allowable innovations in (1). Following Phillips (1987), we assume that $\varepsilon_t$ is a sequence of random variables that satisfies the following assumption.

Assumption 1. The sequence $\{\varepsilon_t\}_{t=1}^{\infty}$ (a) has zero mean; (b) satisfies $\sup_t E|\varepsilon_t|^{2\gamma} < \infty$ for some $\gamma > 2$; (c) is covariance stationary with $0 < \sigma_\varepsilon^2 < \infty$, where $\sigma_\varepsilon^2 = \lim_{T\rightarrow\infty} T^{-1}\sum_{t=1}^T\sum_{s=1}^T E(\varepsilon_t\varepsilon_s)$; and (d) is strong mixing with mixing coefficients $\alpha_m$ that satisfy $\sum_{m=1}^{\infty}\alpha_m^{1-2/\gamma} < \infty$.

Assumption 1 allows for both temporal dependence and heteroskedasticity in the innovation process $\varepsilon_t$. It covers all Gaussian and many other stationary finite-order ARMA models under very general conditions on the underlying errors.

Define the variance of the partial sums of the I(d) process $u_t$ by $\sigma_T^2 = \mathrm{Var}(\sum_{t=1}^T u_t)$. We then have the following functional central limit theorem, which will be used extensively in our theoretical development below. The result is due to Davidson and De Jong (2000).

Lemma 1. Suppose $(1-L)^d u_t = \varepsilon_t$, $-0.5 < d < 0.5$, and $\varepsilon_t$ satisfies Assumption 1. Then, as $T \rightarrow \infty$, (a) $\sigma_T^2 \sim \sigma_\varepsilon^2 V_d T^{1+2d}$, and (b) $\sigma_T^{-1}\sum_{t=1}^{[Tr]} u_t \Rightarrow B_d(r)$, for $r \in [0,1]$.

Here, $B_d(r)$ is the normalized fractional Brownian motion defined by the following stochastic integral [2]:

$$B_d(r) \equiv \frac{1}{\Gamma(1+d)V_d^{1/2}}\left(\int_0^r (r-x)^d dB(x) + \int_{-\infty}^0 \left[(r-x)^d - (-x)^d\right]dB(x)\right), \qquad (2)$$

with

$$V_d \equiv \frac{1}{\Gamma(1+d)^2}\left(\frac{1}{1+2d} + \int_0^\infty \left[(1+\tau)^d - \tau^d\right]^2 d\tau\right) = \frac{\Gamma(1-2d)}{(1+2d)\Gamma(1+d)\Gamma(1-d)},$$

and $B(r)$ is standard Brownian motion. This type of fractional Brownian motion is defined so that $E[B_d(1)^2] = 1$. Fractional Brownian motion differs from standard Brownian motion $B(r)$ in having correlated increments; see Mandelbrot and Van Ness (1968) and Marinucci and Robinson (1999) for additional details on fractional Brownian motion.

[2] The original definition of fractional Brownian motion in Sowell (1990) is $B_d(r) = \frac{1}{\Gamma(1+d)}\int_0^r (r-x)^d dB(x)$. However, Marinucci and Robinson (1999) show that it requires correction, replacing it with the definition of fractional Brownian motion in (2).

Lemma 1(b) is a functional central limit theorem for a general fractionally integrated process that applies to a large class of fractionally integrated processes, including the well-known ARFIMA(p,d,q) process. See, for example, Davydov (1970), Akonom and Gourieroux (1987), Hosking (1996), Marinucci and Robinson (2000) and Chung (2002).

3 The model and estimators

3.1 A generally nonstationary I(d) process

Let $y_t$ be a nonstationary fractionally integrated process generated by

$$(1-L)^{\bar d} y_t = \varepsilon_t, \qquad (3)$$

where $0.5 < \bar d < 1.5$ and $\varepsilon_t$, the fractional difference of $y_t$, satisfies Assumption 1. The process $y_t$ can also be represented equivalently as

$$y_t = \beta y_{t-1} + u_t, \qquad (4)$$
$$\beta = 1 \quad \text{and} \quad (1-L)^d u_t = \varepsilon_t, \qquad (5)$$

where $\bar d = 1 + d$ and $-0.5 < d < 0.5$. Initial conditions for (3) are set at $t = 0$ with $y_0 = 0$.

We consider the two least-squares regression equations

$$y_t = \hat\beta y_{t-1} + \hat u_t, \qquad (6)$$
$$y_t = \tilde\alpha + \tilde\beta y_{t-1} + \tilde u_t, \qquad (7)$$

where $\hat\beta$ and $(\tilde\alpha, \tilde\beta)$ are the conventional least-squares regression coefficients. We shall be concerned with the limiting distributions of the regression coefficients in (6) and (7) under the hypothesis that the data are generated by (3), or equivalently by (4) and (5). Thus, for the null value $\bar d = \bar d_0$, we have $\beta = 1$, $\alpha = 0$ and $d = d_0$.

Under (4) and (5), the sample moments of $y_t$ and $u_t$ that are needed to derive the OLS estimators are collected in the following lemma.

Lemma 2. As $T \rightarrow \infty$,
(a) $T^{-\frac12-d}\sum_{t=1}^T u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon B_d(1)$,
(b) $T^{-2-2d}\sum_{t=1}^T y_{t-1}^2 \Rightarrow V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr$,
(c) $T^{-1-2d} y_T^2 \Rightarrow V_d\sigma_\varepsilon^2 [B_d(1)]^2$,
(d) $T^{-\frac32-d}\sum_{t=1}^T y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr$,
(e) $T^{-1}\sum_{t=1}^T u_t^2 \stackrel{p}{\rightarrow} \sigma_u^2 = E(u_t^2)$,
(f) $T^{-1-2d}\sum_{t=1}^T y_{t-1}u_t \Rightarrow \frac12 V_d\sigma_\varepsilon^2[B_d(1)]^2$ if $d > 0$,
(g) $T^{-1}\sum_{t=1}^T y_{t-1}u_t \stackrel{p}{\rightarrow} -\frac12\sigma_u^2$ if $d < 0$,
(h) $T^{-\frac32-d}\sum_{t=1}^T t u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon\left[B_d(1) - \int_0^1 B_d(r)dr\right]$,
(i) $T^{-\frac52-d}\sum_{t=1}^T t y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon\int_0^1 r B_d(r)dr$,
(j) $T^{-3-2d}\sum_{t=1}^T t y_{t-1}^2 \Rightarrow V_d\sigma_\varepsilon^2\int_0^1 r[B_d(r)]^2 dr$.

Joint weak convergence of the sample moments given above to their respective limits is easily established by the Cramér-Wold device and will be used below.

3.2 Limiting distributions of the statistics

In this section we characterize the limiting distributions of the coefficient estimators $\hat\beta$, $\tilde\alpha$ and $\tilde\beta$ under the maintained hypothesis that the time series $y_t$ is generated by (4) and (5).

Theorem 1. For the regression model (6), as $T \rightarrow \infty$:
(a) $T(\hat\beta - 1) \Rightarrow \dfrac{\frac12[B_d(1)]^2}{\int_0^1[B_d(r)]^2dr}$, when $d > 0$;
(b) $T^{1+2d}(\hat\beta - 1) \Rightarrow -\dfrac{\frac12\sigma_u^2}{V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr}$, when $d < 0$; and
(c) $T(\hat\beta - 1) \Rightarrow \dfrac{\frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\}}{\int_0^1[B(r)]^2dr}$, when $d = 0$.

For the regression model (7), as $T \rightarrow \infty$: when $d > 0$,
(d) $T^{\frac12-d}\tilde\alpha \Rightarrow \dfrac{V_d^{1/2}\sigma_\varepsilon B_d(1)\{\int_0^1[B_d(r)]^2dr - \frac12 B_d(1)\int_0^1 B_d(r)dr\}}{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2}$, and
(e) $T(\tilde\beta - 1) \Rightarrow \dfrac{\frac12[B_d(1)]^2 - B_d(1)\int_0^1 B_d(r)dr}{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2}$;
when $d < 0$,
(f) $T^{\frac12+d}\tilde\alpha \Rightarrow \dfrac{\frac12\sigma_u^2\int_0^1 B_d(r)dr}{V_d^{1/2}\sigma_\varepsilon\{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\}}$, and
(g) $T^{1+2d}(\tilde\beta - 1) \Rightarrow -\dfrac{\frac12\sigma_u^2}{V_d\sigma_\varepsilon^2\{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\}}$;
when $d = 0$,
(h) $T^{\frac12}\tilde\alpha \Rightarrow \dfrac{\sigma_\varepsilon\{B(1)\int_0^1[B(r)]^2dr - \frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\}\int_0^1 B(r)dr\}}{\int_0^1[B(r)]^2dr - [\int_0^1 B(r)dr]^2}$, and
(i) $T(\tilde\beta - 1) \Rightarrow \dfrac{\frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\} - B(1)\int_0^1 B(r)dr}{\int_0^1[B(r)]^2dr - [\int_0^1 B(r)dr]^2}$.

We first discuss the results for model (6). The convergence rate of $(\hat\beta - 1)$ depends intrinsically on the degree of fractional differencing in the $u_t$ process. The distribution of $T^{\min[1,1+2d]}(\hat\beta - 1)$ is therefore called a generalized fractional unit root distribution. This fact is also discussed in Sowell (1990) and Tanaka (1999, Corollary 2.4), where $\varepsilon_t$ in (3) is assumed to be i.i.d. and an infinite-order moving average process, respectively. It is easily shown that when the innovation process $\varepsilon_t$ is i.i.d.$(0,\sigma^2)$, we have $\sigma_u^2 = [\Gamma(1-2d)/\Gamma(1-d)^2]\sigma^2$ [3], leading to the following simplifications of parts (b) and (c) of Theorem 1:

$$T^{1+2d}(\hat\beta - 1) \Rightarrow -\frac{(\frac12+d)\,\Gamma(1+d)/\Gamma(1-d)}{\int_0^1[B_d(r)]^2dr}, \quad \text{when } d < 0; \qquad (8)$$

and

$$T(\hat\beta - 1) \Rightarrow \frac{\frac12\{[B(1)]^2 - 1\}}{\int_0^1[B(r)]^2dr}, \quad \text{when } d = 0. \qquad (9)$$

[3] See, for example, Baillie (1996).

Result (8) was first given by Sowell (1990) and result (9) by Dickey and Fuller (1979). Theorem 1 therefore extends (8) and (9) to the very general case of weakly dependent, heterogeneously distributed data after differencing d times. It is interesting to note that when $d > 0$, the assumptions on $\varepsilon_t$ do not play any role in determining this limiting distribution; it converges to the same distribution as in Sowell (1990) and Tanaka (1999). When $d < 0$, the distribution of $T^{1+2d}(\hat\beta - 1)$ has the same general form for a very wide class of innovation processes $\varepsilon_t$, and it reduces to the distribution of Phillips (1987, Theorem 3.1(c)) when $d = 0$. Similar conclusions apply to the results for model (7). The simplification of part (g) of Theorem 1 when $\varepsilon_t$ is i.i.d.$(0,\sigma^2)$ is

$$T^{1+2d}(\tilde\beta - 1) \Rightarrow -\frac{(\frac12+d)\,\Gamma(1+d)/\Gamma(1-d)}{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2}. \qquad (10)$$
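To make the convergence rates in Theorem 1 concrete, the sketch below (our Python illustration under simplified assumptions, not the authors' code) simulates $y_t$ from (4)-(5) with i.i.d. N(0,1) innovations, approximating the stationary ARFIMA(0,d,0) component $u_t$ by a truncated MA(∞) expansion, and then computes the OLS coefficients of regressions (6) and (7); scaling $(\hat\beta - 1)$ and $(\tilde\beta - 1)$ by $T^{1+2d}$ (here $d < 0$) matches the normalizations in parts (b) and (g).

```python
import numpy as np

rng = np.random.default_rng(0)

def arfima_0d0(T, d, burn=1000):
    """Approximate draw from a stationary ARFIMA(0, d, 0), -0.5 < d < 0.5,
    via the truncated MA(infinity) form u_t = sum_j psi_j eps_{t-j},
    with psi_0 = 1 and psi_j = psi_{j-1} * (j - 1 + d) / j (burn-in discarded)."""
    n = T + burn
    psi = np.empty(n)
    psi[0] = 1.0
    for j in range(1, n):
        psi[j] = psi[j - 1] * (j - 1 + d) / j
    eps = rng.standard_normal(n)
    return np.convolve(psi, eps)[:n][burn:]

T, d = 500, -0.2                      # so that dbar = 1 + d = 0.8
u = arfima_0d0(T, d)
y = np.cumsum(u)                      # y_t = y_{t-1} + u_t, y_0 = 0, as in (4)-(5)
ylag, ynow = y[:-1], y[1:]

beta_hat = (ylag @ ynow) / (ylag @ ylag)                  # model (6), no constant
X = np.column_stack([np.ones(T - 1), ylag])               # model (7), with constant
alpha_tilde, beta_tilde = np.linalg.lstsq(X, ynow, rcond=None)[0]

scale = T ** (1 + 2 * d)
print(scale * (beta_hat - 1), scale * (beta_tilde - 1))   # one draw approximating the limits in (b) and (g)
```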
4 Statistical Inference of the Fractional Difference Parameter

4.1 Test for 0.5 < $\bar d$ < 1

The limiting distributions of the regression coefficients when $-0.5 < d < 0$ ($0.5 < \bar d < 1$) given in the last section depend upon the nuisance parameters $\sigma_u^2$ and $\sigma_\varepsilon^2$. These distributions are therefore not directly usable for statistical testing. However, since $\sigma_u^2$ and $\sigma_\varepsilon^2$ may be consistently estimated, and the estimates may be used to construct modified statistics whose limiting distributions are independent of $(\sigma_u^2, \sigma_\varepsilon^2)$, there exist simple transformations of the test statistics which eliminate the nuisance parameters asymptotically. This idea was first developed by Phillips (1987) and Phillips and Perron (1988) in the context of tests for a unit root. Here we show how a similar procedure may be extended to test for the value of the fractional difference parameter in a quite general fractionally integrated process.

First, owing to the ergodicity of $u_t$, consistent estimation of $\sigma_u^2$ is provided by $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^T (y_t - y_{t-1})^2$ for data generated by (4) and (5). Since $\hat\beta$ and $(\tilde\alpha, \tilde\beta)$ are consistent by Theorem 1, we may also use $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^T (y_t - \hat\beta y_{t-1})^2$ and $\tilde\sigma_u^2 = T^{-1}\sum_{t=1}^T (y_t - \tilde\alpha - \tilde\beta y_{t-1})^2$ as consistent estimators of $\sigma_u^2$ for models (6) and (7), respectively. Consistent estimation of $\sigma_\varepsilon^2$ can be carried out in the same spirit as in Phillips and Perron (1988), using the following simple estimator based on truncated sample autocovariances:

$$s_{Tl}^2 = T^{-1}\sum_{t=1}^T \hat\varepsilon_t^2 + 2T^{-1}\sum_{\tau=1}^{l} w_{\tau l}\sum_{t=\tau+1}^T \hat\varepsilon_t\hat\varepsilon_{t-\tau}, \qquad (11)$$

where $\hat\varepsilon_t = (1-L)^d(y_t - y_{t-1}) = (1-L)^d u_t$ and $w_{\tau l} = 1 - \tau/(l+1)$. We may also use $\hat\varepsilon_t = (1-L)^d(y_t - \hat\beta y_{t-1})$ and $\hat\varepsilon_t = (1-L)^d(y_t - \tilde\alpha - \tilde\beta y_{t-1})$ as alternative estimates of $\hat\varepsilon_t$ in the construction of $s_{Tl}^2$. Conditions for the consistency of $s_{Tl}^2$ are explored by Phillips (1987, Theorem 4.2).

We now define simple transformations of the conventional test statistics from regressions (6) and (7) which eliminate the nuisance parameter dependencies asymptotically. Specifically, we define

$$Z(d) = \frac{s_{Tl}^2}{\hat\sigma_u^2}\, T^{1+2d}(\hat\beta - 1), \qquad (12)$$

and

$$Z_\mu(d) = \frac{s_{Tl}^2}{\tilde\sigma_u^2}\, T^{1+2d}(\tilde\beta - 1). \qquad (13)$$

$Z(d)$ is the transformation of the standardized estimator $T^{1+2d}(\hat\beta - 1)$, and $Z_\mu(d)$ is the transformation of $T^{1+2d}(\tilde\beta - 1)$. The limiting distributions of $Z(d)$ and $Z_\mu(d)$ are given by:

Theorem 2. Assume that $l = o(T^{1/2})$. Then, under the null hypothesis that $y_t$ is an $I(\bar d_0)$ process as in (3), as $T \rightarrow \infty$,

$$Z(d_0) \Rightarrow -\frac12\,\frac{1}{V_{d_0}\int_0^1[B_{d_0}(r)]^2dr}, \quad \text{and} \quad Z_\mu(d_0) \Rightarrow -\frac12\,\frac{1}{V_{d_0}\left(\int_0^1[B_{d_0}(r)]^2dr - \left[\int_0^1 B_{d_0}(r)dr\right]^2\right)}.$$

Theorem 2 demonstrates that the limiting distributions of the two statistics $Z(d_0)$ and $Z_\mu(d_0)$ are invariant within a very wide class of weakly dependent and possibly heterogeneously distributed innovations $\varepsilon_t$.

Table 1 gives the critical values of $-\frac12\,1/\{V_{d_0}\int_0^1[B_{d_0}(r)]^2dr\}$ and $-\frac12\,1/\{V_{d_0}(\int_0^1[B_{d_0}(r)]^2dr - [\int_0^1 B_{d_0}(r)dr]^2)\}$, calculated via direct simulation from a simple transformation of (8) and (10), using a sample size of 500 and 10,000 replications. The calculations were done in GAUSS using normal random numbers. Observations on an I(d) process for $d \in (-0.5, 0)$ were generated using the Durbin-Levinson algorithm [4].

[4] See, for example, Brockwell and Davis (1991), Section 5.2.
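Footnote 4's pointer to the Durbin-Levinson algorithm can be made concrete. The following Python sketch (ours; a hedged illustration rather than the GAUSS program actually used for Table 1) draws an exact Gaussian ARFIMA(0,d,0) path from its autocovariances, which is the standard way to generate the I(d) observations with $d \in (-0.5, 0)$ needed when simulating the limiting functionals; the closed-form autocovariance used below is the usual ARFIMA(0,d,0) expression and is an assumption of the sketch, not a formula taken from the paper.

```python
import numpy as np
from math import lgamma, exp

def arfima_autocov(d, T, sigma2=1.0):
    """Autocovariances gamma(0), ..., gamma(T-1) of an ARFIMA(0, d, 0) process
    with innovation variance sigma2: gamma(0) = sigma2*Gamma(1-2d)/Gamma(1-d)^2
    and gamma(k) = gamma(k-1)*(k-1+d)/(k-d)."""
    g = np.empty(T)
    g[0] = sigma2 * exp(lgamma(1.0 - 2.0 * d) - 2.0 * lgamma(1.0 - d))
    for k in range(1, T):
        g[k] = g[k - 1] * (k - 1.0 + d) / (k - d)
    return g

def durbin_levinson_path(d, T, rng):
    """One exact Gaussian ARFIMA(0, d, 0) path of length T, generated
    recursively from its one-step-ahead predictions as in Brockwell and
    Davis (1991, Section 5.2)."""
    g = arfima_autocov(d, T)
    x = np.empty(T)
    phi = np.zeros(T - 1)                # current prediction coefficients
    v = g[0]                             # one-step prediction error variance
    x[0] = np.sqrt(v) * rng.standard_normal()
    for t in range(1, T):
        phi_tt = (g[t] - phi[:t - 1] @ g[t - 1:0:-1]) / v
        phi[:t - 1] = phi[:t - 1] - phi_tt * phi[:t - 1][::-1]
        phi[t - 1] = phi_tt
        v *= 1.0 - phi_tt ** 2
        x[t] = phi[:t] @ x[t - 1::-1] + np.sqrt(v) * rng.standard_normal()
    return x
```

Cumulating such a path gives a draw of $y_t$; computing $Z(d)$ or $Z_\mu(d)$ on each draw and collecting the empirical percentiles over many replications yields a table of the kind reported below.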
Table 1. Critical values for the Z(d) and Zμ(d) statistics

Percentiles of the distribution of $-\frac12\,1/\{V_d\int_0^1[B_d(r)]^2dr\}$, T = 500

    d       1%      2.5%      5%      10%      90%      95%     97.5%     99%
  -0.05   -9.238   -7.238   -5.736   -4.212    0.244    0.406    0.540    0.708
  -0.10   -6.912   -5.552   -4.444   -3.336   -0.036    0.072    0.145    0.220
  -0.15   -5.198   -4.238   -3.461   -2.637   -0.124   -0.048    0.001    0.044
  -0.20   -3.825   -3.201   -2.688   -2.132   -0.157   -0.093   -0.061   -0.028
  -0.25   -2.601   -2.229   -1.893   -1.552   -0.153   -0.104   -0.075   -0.053
  -0.30   -1.840   -1.583   -1.380   -1.169   -0.143   -0.100   -0.080   -0.058
  -0.35   -1.229   -1.093   -0.981   -0.843   -0.118   -0.088   -0.069   -0.052
  -0.40   -0.831   -0.736   -0.663   -0.582   -0.093   -0.070   -0.055   -0.043
  -0.45   -0.522   -0.473   -0.436   -0.388   -0.072   -0.053   -0.043   -0.032

Percentiles of the distribution of $-\frac12\,1/\{V_d(\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2)\}$, T = 500

    d       1%      2.5%      5%      10%      90%      95%     97.5%     99%
  -0.05  -14.591  -12.281  -10.423   -8.538   -1.064   -0.636   -0.303    0.013
  -0.10  -10.622   -8.977   -7.843   -6.484   -1.083   -0.772   -0.560   -0.361
  -0.15   -7.575   -6.487   -5.734   -4.860   -0.976   -0.755   -0.592   -0.452
  -0.20   -5.144   -4.487   -3.964   -3.429   -0.822   -0.650   -0.536   -0.426
  -0.25   -3.548   -3.125   -2.778   -2.442   -0.702   -0.573   -0.481   -0.385
  -0.30   -2.347   -2.113   -1.917   -1.723   -0.569   -0.470   -0.398   -0.327
  -0.35   -1.533   -1.394   -1.289   -1.172   -0.454   -0.384   -0.334   -0.282
  -0.40   -0.976   -0.899   -0.835   -0.769   -0.342   -0.297   -0.261   -0.221
  -0.45   -0.603   -0.565   -0.535   -0.496   -0.253   -0.221   -0.196   -0.173

Next we consider the behavior of the tests under the alternative hypothesis that $d \ne d_0$ in the following theorem.

Theorem 3. Suppose that $l = o(T^{1/2})$. If $y_t$ is generated by $(1-L)^{\bar d_1} y_t = \varepsilon_t$, where $\bar d_1 = \bar d_0 - \delta < 1$ and $\delta \ne 0$, then

$$Z(d_0) \Rightarrow -\frac12\,\frac{V_{-\delta}}{V_{d_1}}\left(\frac{T}{l}\right)^{2\delta}\frac{1}{\int_0^1[B_{d_1}(r)]^2dr}, \quad \text{and} \quad Z_\mu(d_0) \Rightarrow -\frac12\,\frac{V_{-\delta}}{V_{d_1}}\left(\frac{T}{l}\right)^{2\delta}\frac{1}{\int_0^1[B_{d_1}(r)]^2dr - \left[\int_0^1 B_{d_1}(r)dr\right]^2}.$$

For $\delta > 0$, $Z(d_0) \rightarrow -\infty$ and $Z_\mu(d_0) \rightarrow -\infty$; for $\delta < 0$, $Z(d_0) \rightarrow 0$ and $Z_\mu(d_0) \rightarrow 0$.

Theorem 3 implies that the two-tailed $Z(d_0)$ and $Z_\mu(d_0)$ tests are consistent against the $I(\bar d_1 = \bar d_0 - \delta)$ alternative for $\delta \ne 0$. Obviously, the lower-tail test is consistent against the $I(\bar d_1 = \bar d_0 - \delta)$ alternative for $\delta > 0$, while the upper-tail test is consistent against the $I(\bar d_1 = \bar d_0 - \delta)$ alternative for $\delta < 0$.

Corollary 1. Suppose that $l = o(T^{1/2})$. If $y_t$ is generated by $(1-L)^{\bar d_1} y_t = \varepsilon_t$, where $\bar d_1 = 1$, then $Z(d_0) \rightarrow 0$ and $Z_\mu(d_0) \rightarrow 0$.

This result implies that the upper-tailed $Z(d_0)$ and $Z_\mu(d_0)$ tests are also consistent against the I(1) alternative.

4.2 Test for $\bar d$ = 1

Here we show that the results of the last section are easily applied as a test of a unit root. Let the observed series $y_t$ be generated by (3) but with $\bar d = 1$. We first fractionally difference the series $y_t$ by the operator $(1-L)^{d_1}$ to obtain a series $w_t$, for an arbitrary choice of $d_1$ such that $0 < d_1 < 0.5$. In particular, $(1-L)^{d_1} y_t = w_t$, or $(1-L)^{\bar d - d_1} w_t = \varepsilon_t$. When $\bar d = 1$ is true, applying the $Z(d)$ and $Z_\mu(d)$ test statistics to $w_t$ yields values that lie within the confidence interval given in Table 1 for the corresponding choice of $d_1$.

4.3 Test for 0 < $\bar d$ ≤ 0.5

It is also easy to implement the $Z(d)$ and $Z_\mu(d)$ tests for $0 < \bar d \le 0.5$ using the results above. Let the observed series $y_t$ be generated by (3) but with $0 < \bar d \le 0.5$. We first fractionally integrate the series $y_t$ by the operator $(1-L)^{-d^*}$, where the known $d^* \rightarrow 0.5^-$, to obtain a series $w_t$. In particular, $(1-L)^{-d^*} y_t = w_t$, or $(1-L)^{d^* + \bar d} w_t = \varepsilon_t$. Applying the $Z(d)$ and $Z_\mu(d)$ test statistics to $w_t$ provides inference about $d^* + \bar d$, and the value of $\bar d$ follows immediately.
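The construction of $Z(d)$ and $Z_\mu(d)$ from a sample is mechanical once (11)-(13) are written out. The sketch below (our Python illustration, not the authors' GAUSS code) implements the statistics for a null value $d_0 \in (-0.5, 0)$, using the paper's recommended choices: first differences for $\hat\sigma_u^2$, and $\hat\varepsilon_t = (1-L)^{d_0}(y_t - y_{t-1})$ with a Bartlett window for $s_{Tl}^2$.

```python
import numpy as np

def frac_diff(x, d):
    """(1 - L)^d applied to x, truncated at the start of the sample."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return np.convolve(pi, x)[:n]

def z_tests(y, d0, l):
    """Z(d0) and Z_mu(d0) of (12)-(13) for the null that y is I(1 + d0),
    with -0.5 < d0 < 0 and Bartlett lag truncation l."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    u = np.diff(y, prepend=0.0)                 # y_t - y_{t-1}, with y_0 = 0
    sigma2_u = (u @ u) / T                      # estimate of sigma_u^2
    eps = frac_diff(u, d0)                      # eps-hat_t = (1 - L)^{d0} u_t
    s2 = (eps @ eps) / T                        # s_{Tl}^2 of (11)
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)               # Bartlett weight
        s2 += 2.0 * w * (eps[tau:] @ eps[:-tau]) / T
    ylag, ynow = y[:-1], y[1:]
    beta_hat = (ylag @ ynow) / (ylag @ ylag)              # model (6)
    X = np.column_stack([np.ones(T - 1), ylag])           # model (7)
    _, beta_tilde = np.linalg.lstsq(X, ynow, rcond=None)[0]
    scale = T ** (1.0 + 2.0 * d0)
    z = (s2 / sigma2_u) * scale * (beta_hat - 1.0)
    z_mu = (s2 / sigma2_u) * scale * (beta_tilde - 1.0)
    return z, z_mu
```

For the unit root test of Section 4.2 one would first pre-filter $y_t$ with frac_diff(y, d1), $0 < d_1 < 0.5$, and call z_tests on the filtered series with $d_0 = -d_1$; for $0 < \bar d \le 0.5$ (Section 4.3) one pre-integrates with frac_diff(y, -0.49), say, and makes inference about $d^* + \bar d$ in the same way.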
5 Size and power in finite samples

In this section we provide some evidence on the size and power of the $Z(d)$, $Z_\mu(d)$ and GPH tests in finite samples. This evidence is based on simulations. Data were generated by model (3), with $\bar d = 0.7$ under the null and $\bar d = 0.4$ under the alternative, and with AR(1) errors:

$$\varepsilon_t = \phi\varepsilon_{t-1} + \nu_t, \qquad (14)$$

where the $\nu_t$ are independent and identically distributed N(0,1). We set $y_0 = 0$ and use various lag truncations $l$ in (11) to evaluate the effect of this choice on test performance. A Bartlett window $w_{\tau l} = 1 - \tau/(l+1)$ and the first differences of $y_t$ were used in the construction of the variance estimates $\hat\sigma_u^2$ and $s_{Tl}^2$. The simulations reported in Table 2 are based on samples of size 500 with 10,000 replications and give results for one-sided tests of the null hypothesis $\bar d = 0.7$ against the alternative $\bar d = 0.4$. The results show size and power computations for seven different values of $\phi$ in (14).

When $\phi = 0$, we first observe that the $Z(d)$ and $Z_\mu(d)$ tests have accurate size regardless of the choice of $l$, while the GPH test slightly over-rejects. It should be stressed that the distribution of our tests under the alternative depends on $l$ (i.e., on $l/T$) even asymptotically, so there is a clear presumption that choosing $l$ large will cost power, as indeed it does in our simulations. The power of the $Z$ tests decreases as $l$ increases, which is in accordance with the prediction of Theorem 3. Overall, the $Z_\mu(d)$ test has greater power than $Z(d)$ for almost all choices of $l$, and greater power than GPH for $l$ less than 9. Thus, when $\phi = 0$, $Z_\mu(d)$ is preferred over GPH and $Z(d)$ when $l$ is not larger than 10.

When $\phi < 0$, the $Z(d)$ test suffers size distortions, but these are attenuated as the lag length $l$ increases. GPH shows the same amount of size distortion as when $\phi = 0$. The $Z_\mu(d)$ test is again very conservative and has higher power than the $Z(d)$ and GPH tests for choices of $l$ less than 10. When $\phi > 0$, it is now GPH that has significant size distortions; it is too liberal to be useful for $\phi = 0.8$. This finding was predicted by Agiakloglou et al. (1993) on theoretical grounds. The $Z_\mu(d)$ test, however, still has good size for moderate choices of $l$ greater than 5. The combination of good size and power performance of $Z_\mu(d)$ is maintained for $l$ less than 9.

Schwert (1987) suggested a criterion for choosing the number of lags $l$: $l_4 = \mathrm{int}[4(T/100)^{1/4}]$ and $l_{12} = \mathrm{int}[12(T/100)^{1/4}]$. For the sample size T = 500 in this simulation, these correspond to $l_4 = 5$ and $l_{12} = 15$. Our simulation results therefore suggest that using $l_4$ in the $Z_\mu(d)$ test is preferred.

6 Size and power in finite samples: comparison with the PP and GPH tests

In this section we provide some evidence on the finite-sample performance of the $Z(d)$ and $Z_\mu(d)$ tests proposed in this study, both as tests of the fractional difference parameter and as tests of a unit root. This evidence is based on simulations. Throughout the simulations we fix the sample size at T = 500, the number of replications at 1,000 and the significance level at a nominal 5 percent based on Table 1. In the course of the simulations, the behavior of the Phillips and Perron tests, pp (corresponding to the regression model without a constant, like $Z(d)$) and $pp_\mu$ (corresponding to the regression with a constant, like $Z_\mu(d)$), and of the GPH test is also examined, as they are the nonparametric counterparts of the present tests. We use the first differences of $y_t$ in the construction of the variance estimates $\hat\sigma_u^2$ and $s_{Tl}^2$. We also use various lag truncations $l$ in the Bartlett window in (11) to evaluate the effect of this choice on the performance of our tests; the same value of $l$ is used in the PP tests.
However, it should be stressed that the distribution of our tests under the alternative depends on $l$ (i.e., on $l/T$) even asymptotically, so there is a clear presumption that choosing $l$ large will cost power, as indicated by Theorem 3. Therefore, the power performance of our tests is reported below for the minimum choice of $l$ that has acceptable size. The data generating models employed here are

Model A: $(1-L)^{\bar d} y_t = \nu_t$; and
Model B: $(1-L)^{\bar d} y_t = \varepsilon_t$, where $\varepsilon_t = \phi\varepsilon_{t-1} + \nu_t$.

Here $\nu_t$ is independently and identically distributed N(0,1) and we set $y_0 = 0$. For testing the unit root hypothesis, our approach requires that $y_t$ first be fractionally differenced by the operator $(1-L)^{d_1}$. We choose $d_1$ = 0.1, 0.2, 0.3 and 0.4 to obtain $w_{1t}$, $w_{2t}$, $w_{3t}$ and $w_{4t}$, respectively.

We first consider the size of the tests under various null hypotheses $\bar d_0$ in Model A. In fact, under Model A with i.i.d. errors there is no need for the transformations leading to the $Z(d)$ and $Z_\mu(d)$ tests. Nevertheless, we gather from Table 2 that these tests suffer little loss of accuracy with respect to size, except for $Z_\mu(d)$ with $d_1$ = 0.1 as a test for a unit root, where there is always over-rejection.

We next consider the size of the tests in the presence of the autocorrelated errors of Model B. Table 3 presents the simulation results for the size of the tests. The tests reject too often for $\phi < 0$ and too seldom for $\phi > 0$ when $l = 0$. However, these size distortions are rapidly removed as $l$ increases. For $\phi = \pm 0.2$ and $\pm 0.5$, the size distortion is already attenuated for choices of $l$ smaller than 5, except for $Z_\mu(d)$ with $d_1$ = 0.1. For $\phi = \pm 0.8$, we see from Table * that when $\phi = +0.8$, correct size is attainable if we choose $l$ large. However, when $\phi = -0.8$, increasing $l$ cannot ease the size problem when the null hypothesis is $H_0: \bar d = 0.6$ or 0.7, nor when the null hypothesis is $H_0: \bar d = 1$ with $d_1$ = 0.3 or 0.4.

Finally, we consider the power of the tests. Let us first deal with Model A. Tables 4 and 5 report the percentage power of the various tests for $H_0: \bar d = 1$ against $H_1: \bar d < 1$ for two different choices of $l$, whereas Table 6 presents results for $H_0: \bar d = 0.9$ against $H_1: \bar d < 0.9$. Note that the column entries corresponding to $\bar d = 1$ in Table 4 are type I errors. The power of the $Z(d)$ and $Z_\mu(d)$ tests decreases as $l$ increases, which is in accordance with the prediction of Theorem 3. A choice of $d_1$ of 0.3 or 0.4 is preferred in terms of size and power. Overall, the results in Tables 4 and 5 show that, as a test for a unit root, the $Z(d)$ test behaves better than pp; $Z_\mu(d)$ and $pp_\mu$ behave similarly; and the GPH test has the least power. Table 6 shows that the $Z_\mu(d)$ test clearly dominates the other tests in terms of power when it is applied as a test of the fractional parameter in Model A.

Tables 7 and 8 are concerned with Model B with AR coefficient $\phi = 0.2$. Table 7 presents results for $H_0: \bar d = 1$ against $H_1: \bar d < 1$, whereas Table 8 presents results for $H_0: \bar d = 0.9$ against $H_1: \bar d < 0.9$. Tables 9 and 10 deal with Model B with $\phi = 0.5$ and present results for $H_0: \bar d = 1$ and $H_0: \bar d = 0.9$, respectively. Finally, Tables 11 and 12 are concerned with the case $\phi = -0.2$. The general feature of Tables 7-12 is that, given acceptable size, the $Z(d)$ test is clearly more powerful than PP and GPH when applied as a test of a unit root, and $Z_\mu(d)$ outperforms GPH when applied as a test of the fractional difference parameter.
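As a concrete illustration of how the experiments in this section can be organized, the sketch below estimates the empirical size of the 5% left-tail $Z_\mu(d)$ test of $H_0: \bar d = 1$ under Model B with $\phi = 0.2$, using the pre-filtering device of Section 4.2 with $d_1 = 0.3$ and Schwert's $l_4$ rule. It is our hedged illustration: it assumes the frac_diff and z_tests helpers sketched in Section 4 are in scope, and it is not the authors' original GAUSS program.

```python
import numpy as np

# Assumes frac_diff and z_tests from the Section 4 sketch are in scope.
rng = np.random.default_rng(1)
T, reps, phi, d1 = 500, 1000, 0.2, 0.3
l = int(4 * (T / 100) ** 0.25)            # Schwert's l4 rule (= 5 for T = 500)
crit_5pct = -1.917                        # 5% left-tail value, Z_mu panel of Table 1, d = -0.30
reject = 0
for _ in range(reps):
    nu = rng.standard_normal(T)
    eps = np.empty(T)
    eps[0] = nu[0]
    for t in range(1, T):                 # Model B errors: eps_t = phi*eps_{t-1} + nu_t
        eps[t] = phi * eps[t - 1] + nu[t]
    y = np.cumsum(eps)                    # dbar = 1: (1 - L) y_t = eps_t, with y_0 = 0
    w = frac_diff(y, d1)                  # pre-filter as in Section 4.2
    _, z_mu = z_tests(w, -d1, l)          # after filtering, the null is d0 = -d1
    reject += z_mu < crit_5pct
print("empirical size of the 5% left-tail Z_mu test:", reject / reps)
```

Replacing the random-walk step by a fractionally integrated series with $\bar d < 1$ turns the same loop into a power calculation.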
Table 2. Size and power for the GPH, Z(d) and Zμ(d) statistics

(a) GPH test*

              Size (d̄ = 0.7)                    Power (d̄ = 0.4)
   φ     μ = 0.55  μ = 0.575  μ = 0.6      μ = 0.55  μ = 0.575  μ = 0.6
  0.0      0.070     0.070     0.072         0.668     0.736     0.806
  0.2      0.071     0.073     0.079         0.651     0.716     0.778
  0.5      0.097     0.108     0.133         0.568     0.603     0.635
  0.8      0.412     0.561     0.733         0.138     0.097     0.057
 -0.2      0.068     0.069     0.073         0.677     0.745     0.819
 -0.5      0.067     0.069     0.072         0.681     0.753     0.827
 -0.8      0.065     0.067     0.071         0.684     0.753     0.828

* μ denotes the value used in the sample size function n = T^μ, where n is the number of low-frequency ordinates used in the GPH spectral regression.

(b) Z(d) test

Size (d̄ = 0.7)
   φ     l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
  0.0    0.056   0.056   0.049   0.055   0.054   0.052   0.052   0.049
  0.2    0.038   0.046   0.047   0.049   0.050   0.049   0.050   0.045
  0.5    0.012   0.030   0.035   0.047   0.050   0.051   0.051   0.044
  0.8    0.014   0.024   0.031   0.044   0.050   0.054   0.060   0.063
 -0.2    0.089   0.069   0.056   0.059   0.058   0.056   0.056   0.052
 -0.5    0.140   0.104   0.078   0.072   0.068   0.066   0.063   0.063
 -0.8    0.271   0.155   0.129   0.125   0.113   0.107   0.101   0.105

Power (d̄ = 0.4)
   φ     l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
  0.0    0.955   0.863   0.747   0.616   0.484   0.348   0.220   0.000
  0.2    0.904   0.826   0.728   0.622   0.519   0.415   0.309   0.006
  0.5    0.631   0.680   0.643   0.588   0.523   0.455   0.387   0.078
  0.8    0.004   0.098   0.209   0.273   0.299   0.309   0.302   0.192
 -0.2    0.977   0.889   0.756   0.597   0.416   0.242   0.127   0.000
 -0.5    0.986   0.918   0.757   0.496   0.209   0.083   0.042   0.001
 -0.8    0.912   0.613   0.224   0.091   0.067   0.060   0.059   0.011

(c) Zμ(d) test

Size (d̄ = 0.7)
   φ     l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
  0.0    0.053   0.044   0.040   0.029   0.024   0.021   0.018   0.017
  0.2    0.015   0.035   0.036   0.033   0.029   0.025   0.025   0.025
  0.5    0.002   0.011   0.021   0.028   0.034   0.035   0.037   0.038
  0.8    0.000   0.002   0.007   0.015   0.028   0.044   0.057   0.062
 -0.2    0.109   0.054   0.037   0.024   0.016   0.013   0.011   0.011
 -0.5    0.164   0.063   0.023   0.009   0.007   0.004   0.004   0.003
 -0.8    0.021   0.004   0.002   0.002   0.002   0.003   0.002   0.003

Power (d̄ = 0.4)
   φ     l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
  0.0    0.999   0.973   0.774   0.381   0.097   0.015   0.003   0.000
  0.2    0.993   0.949   0.793   0.525   0.254   0.091   0.021   0.006
  0.5    0.662   0.764   0.697   0.573   0.429   0.278   0.162   0.078
  0.8    0.000   0.033   0.119   0.189   0.221   0.229   0.217   0.192
 -0.2    1.000   0.982   0.698   0.170   0.016   0.002   0.000   0.000
 -0.5    1.000   0.980   0.300   0.014   0.001   0.001   0.001   0.001
 -0.8    0.429   0.026   0.007   0.007   0.007   0.008   0.010   0.011

7 Empirical Application

In order to provide some empirical illustrations of how the testing and estimation methods proposed in this paper can be applied in practice, we examine two series for which evidence of fractional integration has previously been found in the literature using parametric approaches. More specifically, the first is the US real interest rate series, and the second is the US unemployment rate series in Nelson and Plosser's (1982) data set. Given the finite-sample size and power performance of the test statistics documented in the simulations of the last section, a left-tail $Z_\mu(d)$ test is employed in the following empirical applications for different hypothesized values of $\bar d$, starting from 0.95 in decrements of 0.05, with a given $d^* = 0.49$. Schwert's criterion for choosing the number of lags, $l_4 = \mathrm{int}[4(T/100)^{1/4}]$, is used.
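The sequential procedure just described can be written down directly. The sketch below is our illustration of the workflow (it assumes the z_tests helper from Section 4 is in scope and reads the 5% critical values off the $Z_\mu$ panel of Table 1); it is not the code used to produce Tables 3.a and 3.b.

```python
import numpy as np

# 5% left-tail critical values taken from the Z_mu panel of Table 1
CRIT_5PCT = {0.95: -10.423, 0.90: -7.843, 0.85: -5.734, 0.80: -3.964, 0.75: -2.778,
             0.70: -1.917, 0.65: -1.289, 0.60: -0.835, 0.55: -0.535}

def sequential_zmu(y, crit=CRIT_5PCT):
    """Left-tail Z_mu(d) tests on an observed series y over a grid of null
    values dbar_0, using Schwert's l4 bandwidth.  Assumes z_tests from the
    Section 4 sketch is in scope."""
    y = np.asarray(y, dtype=float)
    l4 = int(4 * (len(y) / 100) ** 0.25)
    for dbar0, cv in crit.items():
        _, z_mu = z_tests(y, dbar0 - 1.0, l4)    # null d0 = dbar_0 - 1 lies in (-0.5, 0)
        print(f"H0: dbar = {dbar0:.2f}   Z_mu = {z_mu:9.3f}   reject at 5%: {z_mu < cv}")
```

When every null above 0.5 is rejected, the series is first integrated by $(1-L)^{-0.49}$ and $\bar d + 0.49$ is tested against the same grid, which is how the last row of Table 3.b is obtained.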
7.1 Real interest rate data

We consider the behavior of Mishkin's (1990) ex post real interest rate data analyzed by Tsay (2000). The data are monthly, from January 1953 to December 1990, with 456 observations. Table 3.a summarizes the results of the $Z_\mu(d)$ test. The basic finding is that the US real interest rate series for this period is well characterized by an I(d) process with d around 0.65. The value of d inferred by our procedure is close to the 0.666 obtained by Tsay (2000), who uses Chung and Baillie's (1993) CSS approach on a Gaussian ARFIMA(1,d,1) model. In view of this result, the conclusion that can be drawn is that a shock does not have a permanent effect on the real interest rate, but it takes a long while to die out.

7.2 Unemployment rate data

Finally, we examine the US unemployment rate data, whose degree of integration is the most controversial in the extended version of Nelson and Plosser's data set. The data are annual, from 1891 to 1988, and have been transformed to natural logarithms. This series has been analyzed, among others, by Gil-Alaña and Robinson (1997) and Dolado, Gonzalo and Mayoral (2002). By applying Robinson's LM test, Gil-Alaña and Robinson find evidence of fractional integration in the unemployment rate series. Although the estimated values of d vary substantially across different models for the disturbance $\varepsilon_t$, overall they conclude that the unemployment rate seems close to stationarity. More recently, Dolado et al. employ their Wald-type fractionally augmented Dickey-Fuller test on the same series and obtain an estimated value of d of 0.412. Table 3.b reports the results of the $Z_\mu(d)$ test, and we find that the US unemployment rate series for this period is well characterized by an I(d) process with d around 0.46, lying in the stationary range, although close to the non-stationary boundary. The estimated value of d largely agrees with previous studies.

Table 3.a  Zμ(d) test on Mishkin's data
H0: d̄ = d̄0;  H1: d̄ < d̄0

  Hypotheses                                Test statistic      Critical value
  H0: d̄ = 0.95   H1: d̄ < 0.95             Zμ(d) = -34.514*        -10.423
  H0: d̄ = 0.90   H1: d̄ < 0.90             Zμ(d) = -18.974*         -7.843
  H0: d̄ = 0.85   H1: d̄ < 0.85             Zμ(d) = -10.495*         -5.734
  H0: d̄ = 0.80   H1: d̄ < 0.80             Zμ(d) =  -5.844*         -3.964
  H0: d̄ = 0.75   H1: d̄ < 0.75             Zμ(d) =  -3.279*         -2.778
  H0: d̄ = 0.70   H1: d̄ < 0.70             Zμ(d) =  -1.856          -1.917
  H0: d̄ = 0.65   H1: d̄ < 0.65             Zμ(d) =  -1.061          -1.289

  * Significant at the 5% level.

Table 3.b  Zμ(d) test on the unemployment rate data
H0: d̄ = d̄0;  H1: d̄ < d̄0

  Hypotheses                                Test statistic      Critical value
  H0: d̄ = 0.95   H1: d̄ < 0.95             Zμ(d) = -14.265*        -10.423
  H0: d̄ = 0.90   H1: d̄ < 0.90             Zμ(d) =  -9.601*         -7.843
  H0: d̄ = 0.85   H1: d̄ < 0.85             Zμ(d) =  -6.536*         -5.734
  H0: d̄ = 0.80   H1: d̄ < 0.80             Zμ(d) =  -4.508*         -3.964
  H0: d̄ = 0.75   H1: d̄ < 0.75             Zμ(d) =  -3.158*         -2.778
  H0: d̄ = 0.70   H1: d̄ < 0.70             Zμ(d) =  -2.253*         -1.917
  H0: d̄ = 0.65   H1: d̄ < 0.65             Zμ(d) =  -1.642*         -1.289
  H0: d̄ = 0.60   H1: d̄ < 0.60             Zμ(d) =  -1.225*         -0.835
  H0: d̄ = 0.55   H1: d̄ < 0.55             Zμ(d) =  -0.939*         -0.535

  H0: d̄ + 0.49 = d̄0;  H1: d̄ + 0.49 < d̄0
  H0: d̄ + 0.49 = 0.95   H1: d̄ + 0.49 < 0.95   Zμ(d) = -5.183       -10.423

  * Significant at the 5% level.

8 Conclusion

In this paper we have considered a test of the fractional difference parameter of a general fractionally integrated process. The test is based on the standardized OLS estimator in a very simple regression model. The main technical innovation of this paper is the allowance made for error autocorrelation. Correspondingly, the main practical difficulty in performing the test is the estimation of the long-run variance. Our autocorrelation correction is similar to the Phillips and Perron correction for unit root tests.
This is a plausible approach because it does not rely on distributional or parametric assumptions on the innovations. It avoids both the computational difficulties associated with exact MLE and the possibility of model misspecification arising from parameterization of the short-run behavior. The Monte Carlo experiments throughout the paper support the analytical results and show that the proposed tests behave reasonably well in finite samples. Further, several empirical illustrations of how to use and interpret these tests are provided.

The tests are intended to complement unit root tests, such as the Phillips-Perron tests and the Kwiatkowski, Phillips, Schmidt and Shin (KPSS, 1992) test. After rejecting both the unit root hypothesis and the stationarity hypothesis, one can perform the test proposed in this paper to make inference about the value of the fractional difference parameter d.

A future extension is needed to derive the appropriate asymptotic theory of the proposed test statistics when $d = -1/2$ ($\bar d = 1/2$). However, there are difficulties in extending the functional central limit theorem of Lemma 1 to this case: the limiting sample paths are not continuous when $d = -1/2$. See Davidson and De Jong (2000). A modified weak convergence result in place of Lemma 1 is called for.

Appendix

Proof of Lemma 1.
This is a direct application of the functional central limit theorem of Davidson and De Jong (2000, Theorem 3.2), replacing the weaker condition in their assumption on $\varepsilon_t$ with the stronger condition in Assumption 1 of this paper. See Davidson (2002) for the relationship between these two assumptions.

Proof of Lemma 2.
The proofs of items (a)-(d) are straightforward applications of the continuous mapping theorem to the result of Lemma 1(b) and are omitted here. Item (e) is due to the ergodicity of the stationary process $u_t$. To prove items (f) and (g), recall first that $y_0 = 0$, so that $\sum_{t=1}^T y_{t-1}u_t = \frac12 y_T^2 - \frac12\sum_{t=1}^T u_t^2$. From Lemma 2(c) and (e), $y_T^2$ is $O_p(T^{1+2d})$ and $\sum_{t=1}^T u_t^2$ is $O_p(T)$, so that $\sum_{t=1}^T y_{t-1}u_t$ is $O_p(\kappa)$, where $\kappa = \max(T^{1+2d}, T)$. Therefore, for $d > 0$,
$$T^{-1-2d}\sum_{t=1}^T y_{t-1}u_t \Rightarrow T^{-1-2d}\,\tfrac12 y_T^2 \Rightarrow \tfrac12 V_d\sigma_\varepsilon^2[B_d(1)]^2,$$
and for $d < 0$, $T^{-1}\sum_{t=1}^T y_{t-1}u_t$ behaves asymptotically as $-\tfrac12\, T^{-1}\sum_{t=1}^T u_t^2 \stackrel{p}{\rightarrow} -\tfrac12\sigma_u^2$.

To prove item (h), we first observe that $\sum_{t=1}^T y_{t-1} = T\sum_{t=1}^T u_t - \sum_{t=1}^T t u_t$, or $\sum_{t=1}^T t u_t = T\sum_{t=1}^T u_t - \sum_{t=1}^T y_{t-1}$. Therefore $T^{-\frac32-d}\sum_{t=1}^T t u_t = T^{-\frac12-d}\sum_{t=1}^T u_t - T^{-\frac32-d}\sum_{t=1}^T y_{t-1}$. By Lemma 2(a) and (d), $T^{-\frac32-d}\sum_{t=1}^T t u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon B_d(1) - V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr$.

To prove items (i) and (j), let $X_T(r) = \sum_{t=1}^{[Tr]} u_t$ for $r \in [0,1]$; then, as $T \rightarrow \infty$, the following hold:
$$T^{-1}\sum_{t=1}^T \frac{t}{T}\, y_{t-1} = \int_0^1 r X_T(r)dr, \qquad (A.1)$$
$$T^{-1}\sum_{t=1}^T \frac{t}{T}\, y_{t-1}^2 = \int_0^1 r X_T^2(r)dr. \qquad (A.2)$$
The result of item (i) follows immediately from (A.1) and Lemma 2(d). Similarly, the result of item (j) follows immediately from (A.2) and Lemma 2(b).

Proof of Theorem 1.
We show the results of items (d)-(i) for model (7); the results for model (6) then follow immediately. For model (7), we note that
$$\begin{bmatrix}\tilde\alpha \\ \tilde\beta - 1\end{bmatrix} = \begin{bmatrix} T & \sum_{t=1}^T y_{t-1} \\ \sum_{t=1}^T y_{t-1} & \sum_{t=1}^T y_{t-1}^2 \end{bmatrix}^{-1}\begin{bmatrix}\sum_{t=1}^T u_t \\ \sum_{t=1}^T y_{t-1}u_t\end{bmatrix}. \qquad (A.3)$$
From Lemma 2, the orders in probability of the individual terms in (A.3) are as follows:
$$\begin{bmatrix}\tilde\alpha \\ \tilde\beta - 1\end{bmatrix} = \begin{bmatrix} O_p(T) & O_p(T^{\frac32+d}) \\ O_p(T^{\frac32+d}) & O_p(T^{2+2d}) \end{bmatrix}^{-1}\begin{bmatrix} O_p(T^{\frac12+d}) \\ O_p(T^{\max[1+2d,\,1]}) \end{bmatrix}.$$

To prove items (d) and (e), note that when $d > 0$, $\sum_{t=1}^T y_{t-1}u_t$ is $O_p(T^{1+2d})$. We define two rescaling matrices,
$$\Upsilon_T = \begin{bmatrix} T^{\frac12} & 0 \\ 0 & T^{1+d}\end{bmatrix} \quad \text{and} \quad \Delta_T = \begin{bmatrix} T^{-\frac12-d} & 0 \\ 0 & T^{-1-2d}\end{bmatrix}.$$
Multiplying (A.3) by the rescaling matrices, we get
$$\Upsilon_T\begin{bmatrix}\tilde\alpha \\ \tilde\beta - 1\end{bmatrix} = \Upsilon_T\begin{bmatrix} T & \sum_{t=1}^T y_{t-1} \\ \sum_{t=1}^T y_{t-1} & \sum_{t=1}^T y_{t-1}^2 \end{bmatrix}^{-1}\Upsilon_T\,\Upsilon_T^{-1}\Delta_T^{-1}\,\Delta_T\begin{bmatrix}\sum_{t=1}^T u_t \\ \sum_{t=1}^T y_{t-1}u_t\end{bmatrix}. \qquad (A.4)$$
Substituting the results of Lemma 2 into (A.4), we establish
$$\begin{bmatrix} T^{\frac12-d}\tilde\alpha \\ T(\tilde\beta - 1)\end{bmatrix} \Rightarrow \begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr & V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr \end{bmatrix}^{-1}\begin{bmatrix} V_d^{1/2}\sigma_\varepsilon B_d(1) \\ \frac12 V_d\sigma_\varepsilon^2[B_d(1)]^2\end{bmatrix}.$$
Notice that
$$\begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr & V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr \end{bmatrix}^{-1} = \frac{1}{V_d\sigma_\varepsilon^2\left(\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\right)}\begin{bmatrix} V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr & -V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr \\ -V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr & 1 \end{bmatrix}.$$
Thus,
$$T^{\frac12-d}\tilde\alpha \Rightarrow \frac{V_d^{1/2}\sigma_\varepsilon B_d(1)\{\int_0^1[B_d(r)]^2dr - \frac12 B_d(1)\int_0^1 B_d(r)dr\}}{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2},$$
and
$$T(\tilde\beta - 1) \Rightarrow \frac{\frac12[B_d(1)]^2 - B_d(1)\int_0^1 B_d(r)dr}{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2}.$$
This completes the proof of items (d) and (e).

To prove items (f) and (g), notice that when $d < 0$, $\sum_{t=1}^T y_{t-1}u_t$ is $O_p(T)$. We define another rescaling matrix
$$\Phi_T = \begin{bmatrix} T^{-\frac12+d} & 0 \\ 0 & T^{-1}\end{bmatrix}.$$
Multiplying (A.3) by $\Upsilon_T$ and $\Phi_T$ gives
$$\Upsilon_T\begin{bmatrix}\tilde\alpha \\ \tilde\beta - 1\end{bmatrix} = \Upsilon_T\begin{bmatrix} T & \sum_{t=1}^T y_{t-1} \\ \sum_{t=1}^T y_{t-1} & \sum_{t=1}^T y_{t-1}^2 \end{bmatrix}^{-1}\Upsilon_T\,\Upsilon_T^{-1}\Phi_T^{-1}\,\Phi_T\begin{bmatrix}\sum_{t=1}^T u_t \\ \sum_{t=1}^T y_{t-1}u_t\end{bmatrix}. \qquad (A.5)$$
Substituting the results of Lemma 2 into (A.5), we establish
$$\begin{bmatrix} T^{\frac12+d}\tilde\alpha \\ T^{1+2d}(\tilde\beta - 1)\end{bmatrix} \Rightarrow \begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)dr & V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr \end{bmatrix}^{-1}\begin{bmatrix} 0 \\ -\frac12\sigma_u^2\end{bmatrix}.$$
Consequently,
$$T^{\frac12+d}\tilde\alpha \Rightarrow \frac{\frac12\sigma_u^2\int_0^1 B_d(r)dr}{V_d^{1/2}\sigma_\varepsilon\{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\}}, \qquad T^{1+2d}(\tilde\beta - 1) \Rightarrow -\frac{\frac12\sigma_u^2}{V_d\sigma_\varepsilon^2\{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\}}.$$
This completes the proof of items (f) and (g). The proofs of items (h) and (i) are similar to those of items (d) and (e) or (f) and (g), and are omitted here; the results are the same as in Phillips and Perron (1988).

Proof of Theorem 2.
By Theorem 1, Lemma 2(e) and Phillips's (1987) Theorem 4.2, we have, as $T \rightarrow \infty$ for $d < 0$,
$$T^{1+2d}(\hat\beta - 1) \Rightarrow -\frac12\,\frac{\sigma_u^2}{V_d\sigma_\varepsilon^2\int_0^1[B_d(r)]^2dr}, \qquad T^{1+2d}(\tilde\beta - 1) \Rightarrow -\frac12\,\frac{\sigma_u^2}{V_d\sigma_\varepsilon^2\{\int_0^1[B_d(r)]^2dr - [\int_0^1 B_d(r)dr]^2\}},$$
together with $\hat\sigma_u^2 \stackrel{p}{\rightarrow} \sigma_u^2$ and $s_{Tl}^2 \stackrel{p}{\rightarrow} \sigma_\varepsilon^2$. The results of parts (a) and (b) now follow directly from the continuous mapping theorem.

Proof of Theorem 3.
Under the alternative hypothesis, $d_1 = d_0 - \delta < 0$; then, following the results of Theorem 2, we have
$$\frac{s_{Tl}^2(d_1)}{\hat\sigma_u^2}\, T^{1+2d_1}(\hat\beta - 1) \Rightarrow -\frac12\,\frac{1}{V_{d_1}\int_0^1[B_{d_1}(r)]^2dr}, \qquad (A.6)$$
where $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^T (y_t - y_{t-1})^2$, $(1-L)^{d_1}u_t = \hat\varepsilon_t$, and
$$s_{Tl}^2(d_1) = T^{-1}\sum_{t=1}^T \hat\varepsilon_t^2 + 2T^{-1}\sum_{\tau=1}^{l}\omega_{\tau l}\sum_{t=\tau+1}^T \hat\varepsilon_t\hat\varepsilon_{t-\tau} \stackrel{p}{\rightarrow} \sigma_\varepsilon^2.$$
Multiplying (A.6) by $s_{Tl}^2(d_0)/s_{Tl}^2(d_0)$, we get
$$\frac{s_{Tl}^2(d_0)}{\hat\sigma_u^2}\,\frac{s_{Tl}^2(d_1)}{s_{Tl}^2(d_0)}\, T^{1+2d_0-2\delta}(\hat\beta - 1) \Rightarrow -\frac12\,\frac{1}{V_{d_1}\int_0^1[B_{d_1}(r)]^2dr},$$
or
$$Z(d_0) \Rightarrow -\frac12\,\frac{1}{V_{d_1}\int_0^1[B_{d_1}(r)]^2dr}\,\frac{s_{Tl}^2(d_0)}{s_{Tl}^2(d_1)}\cdot T^{2\delta}.$$
We now show that $s_{Tl}^2(d_0)$ is not a consistent estimator of $\sigma_\varepsilon^2$ when $d_1 = d_0 - \delta$, $\delta \ne 0$. Since $(1-L)^{d_1}u_t = \varepsilon_t$, we have $(1-L)^{d_0-\delta}u_t = \varepsilon_t$, or
$$(1-L)^{d_0}u_t = (1-L)^{\delta}\varepsilon_t. \qquad (A.7)$$
To construct $s_{Tl}^2(d_0)$, we must fractionally difference $u_t$ by $(1-L)^{d_0}$, obtaining $(1-L)^{d_0}u_t = \hat\varepsilon_t^0 = (1-L)^{\delta}\varepsilon_t$ by (A.7). Therefore $(1-L)^{-\delta}\hat\varepsilon_t^0 = \varepsilon_t$; that is, $\hat\varepsilon_t^0$ is $I(-\delta)$. According to Lee and Schmidt (1996, p. 291, Theorem 3), $s_{Tl}^2(d_0) \stackrel{p}{\rightarrow} l^{-2\delta}\sigma_\varepsilon^2 V_{-\delta}$. Therefore,
$$Z(d_0) \Rightarrow -\frac12\,\frac{1}{V_{d_1}\int_0^1[B_{d_1}(r)]^2dr}\,\frac{T^{2\delta}\, l^{-2\delta}\sigma_\varepsilon^2 V_{-\delta}}{\sigma_\varepsilon^2} = -\frac12\,\frac{V_{-\delta}}{V_{d_1}}\,\frac{1}{\int_0^1[B_{d_1}(r)]^2dr}\left(\frac{T}{l}\right)^{2\delta}.$$
This completes the proof of part (a). To prove part (b), it is easy to establish the same conclusions for the $Z_\mu(d)$ test as were given for the $Z(d)$ test above; all that is necessary is to replace $\int_0^1[B_{d_1}(r)]^2dr$ with $\int_0^1[B_{d_1}(r)]^2dr - [\int_0^1 B_{d_1}(r)dr]^2$.

Proof of Corollary 1.
Under the alternative hypothesis $d_1 = 0$, by the result of Theorem 1(c),
$$T(\hat\beta - 1) \Rightarrow \frac{\frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\}}{\int_0^1[B(r)]^2dr}. \qquad (A.8)$$
Multiplying (A.8) by $\frac{s_{Tl}^2(d_0)}{\hat\sigma_u^2}T^{2d_0}$, for $d_0 < 0$, we get
$$Z(d_0) \Rightarrow \frac{\frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\}}{\int_0^1[B(r)]^2dr}\,\frac{s_{Tl}^2(d_0)}{\hat\sigma_u^2}\, T^{2d_0}.$$
Now $\hat\sigma_u^2 = T^{-1}\sum_{t=1}^T u_t^2 \stackrel{p}{\rightarrow} \sigma_u^2$, but $(1-L)^{d_0}u_t = \hat\varepsilon_t^0$, or $(1-L)^{-d_0}\hat\varepsilon_t^0 = u_t = \varepsilon_t$; that is, $\hat\varepsilon_t^0$ is $I(-d_0)$. Therefore $s_{Tl}^2(d_0) \stackrel{p}{\rightarrow} l^{-2d_0}\sigma_\varepsilon^2 V_{-d_0}$. Consequently, we get
$$Z(d_0) \Rightarrow \frac{\frac12\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\}}{\int_0^1[B(r)]^2dr}\,\frac{\sigma_\varepsilon^2 V_{-d_0}}{\sigma_u^2}\left(\frac{T}{l}\right)^{2d_0},$$
which converges to 0 as $T \rightarrow \infty$ and $l \rightarrow \infty$ with $T/l \rightarrow \infty$, for $d_0 < 0$. This completes the proof of Corollary 1.

References

Agiakloglou, C., Newbold, P., Wohar, M., 1993. Bias in an estimator of the fractional difference parameter. Journal of Time Series Analysis 14, 235-246.

Akonom, J., Gourieroux, C., 1987. A functional central limit theorem for fractional processes. Technical Report #8801, CEPREMAP, Paris.

Baillie, R.T., 1996. Long memory processes and fractional integration in econometrics. Journal of Econometrics 73, 5-59.

Chung, C.F., 2002. Sample means, sample autocovariances, and linear regression of stationary multivariate long memory processes. Econometric Theory 18, 51-78.

Chung, C.F., Baillie, R.T., 1993. Small sample bias in conditional sum of squares estimators of fractionally integrated ARMA models. Empirical Economics 18, 791-806.

Chung, C.F., Schmidt, P., 1995. The minimum distance estimator for fractionally integrated ARMA processes. Econometrics and Economics Theory Paper No. 9408, Michigan State University.

Davydov, Y.A., 1970. The invariance principle for stationary processes. Theory of Probability and Its Applications 15, 487-489.

Davidson, J., 2002. Establishing conditions for the functional central limit theorem in nonlinear and semiparametric time series processes. Journal of Econometrics 106, 243-269.

Davidson, J., De Jong, R.M., 2000.
The functional central limit theorem and weak convergence to stochastic integrals II: Fractionally integrated processes. Econometric Theory 16, 643-666.

Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74, 427-431.

Fox, R., Taqqu, M.S., 1986. Large sample properties of parameter estimates for strongly dependent stationary Gaussian time series. Annals of Statistics 14, 517-532.

Geweke, J., Porter-Hudak, S., 1983. The estimation and application of long memory time series models. Journal of Time Series Analysis 4, 221-238.

Gil-Alaña, L.A., 2000. Mean reversion in the real exchange rates. Economics Letters 69, 285-288.

Gil-Alaña, L.A., Robinson, P.M., 1997. Testing of unit root and other nonstationary hypotheses in macroeconomic time series. Journal of Econometrics 80, 241-268.

Granger, C.W.J., Joyeux, R., 1980. An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis 1, 15-29.

Hauser, M.A., Pötscher, B.M., Reschenhofer, E., 1999. Measuring persistence in aggregate output: ARMA models, fractionally integrated ARMA models and nonparametric procedures. Empirical Economics 24, 243-269.

Hosking, J.R.M., 1981. Fractional differencing. Biometrika 68, 165-176.

Hosking, J.R.M., 1996. Asymptotic distributions of the sample mean, autocovariances and autocorrelations of long-memory time series. Journal of Econometrics 73, 261-284.

Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y., 1992. Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics 54, 159-178.

Mandelbrot, B.B., Van Ness, J.W., 1968. Fractional Brownian motions, fractional Brownian noises and applications. SIAM Review 10, 422-437.

Marinucci, D., Robinson, P.M., 1999. Alternative forms of fractional Brownian motion. Journal of Statistical Planning and Inference 80, 111-122.

Marinucci, D., Robinson, P.M., 2000. Weak convergence of multivariate fractional processes. Stochastic Processes and their Applications 86, 103-120.

Mayoral, L., 2001. Generalized minimum distance estimation of ARFIMA processes with conditional heteroscedastic errors. In preparation.

Mishkin, F.S., 1990. What does the term structure of interest rates tell us about future inflation? Journal of Monetary Economics 25, 77-95.

Phillips, P.C.B., 1987. Time series regression with a unit root. Econometrica 55, 277-301.

Phillips, P.C.B., Perron, P., 1988. Testing for a unit root in time series regression. Biometrika 75, 335-346.

Phillips, P.C.B., Xiao, Z., 1998. A primer on unit root testing. Journal of Economic Surveys 12, No. 5.

Robinson, P.M., 1992. Semiparametric analysis of long-memory time series. Annals of Statistics 22, 515-539.

Robinson, P.M., 1994. Efficient tests of nonstationary hypotheses. Journal of the American Statistical Association 89, 1420-1437.

Schwert, G.W., 1987. Effects of model specification on tests for unit roots in macroeconomic data. Journal of Monetary Economics 20, 73-103.

Sowell, F.B., 1990. The fractional unit root distribution. Econometrica 58, 495-505.

Sowell, F.B., 1992. Maximum likelihood estimation of stationary univariate fractionally integrated time series models. Journal of Econometrics 53, 165-188.

Tanaka, K., 1999. The nonstationary fractional unit root. Econometric Theory 15, No. 4.

Tsay, W.-J., 2000. Long memory story of the real interest rate.
Economics Letters 67, 325-330.