Testing for a Fractional Unit Root in Time Series Regression
Chingnun Lee1*, Tzu-Hsiang Liao2 and Fu-Shuen Shie3
1Inst. of Economics, National Sun Yat-sen Univ., Kaohsiung, Taiwan
2Dept. of Finance, National Central Univ., Chung-Li, Taiwan
3Dept. of Finance, National Taiwan Univ., Taipei, Taiwan
Abstract
This paper presents a nonparametric test of the differencing parameter of a general
fractionally integrated process, which allows for weakly dependent and heterogeneously
distributed innovations in the short-run dynamics. It is shown that the test statistics,
derived from the standardized ordinary least squares estimator of a simple autoregressive
model, are consistent and fare well both in terms of power and size. The paper ends with
two empirical applications.
Key Words: Unit Root, Fractionally Integrated Process, Power.
JEL classification: C12, C22.
August 2004
*Corresponding author, Address: Graduate Institute of Economics
National Sun Yat-sen University
70 Leng-hai Road
Kaohsiung, Taiwan 804
Tel: 886-7-5252000 ext. 5618
Fax: 886-7-5255611
e-mail: lee econ@mail.nsysu.edu.tw
1 Introduction
It has become quite standard practice in applied work to test whether a variable is
integrated or stationary using both the null hypothesis of I(1) and that of I(0).
See Phillips and Xiao (1998) for an updated survey of unit root testing approaches.
However, by proceeding in this way, it is often found that both null hypotheses are
rejected (see, for example, Tsay (2000)), suggesting that many time series are not well
represented as either I(1) or I(0). In view of this outcome, the class of fractionally
integrated processes, denoted I(d), where the order of integration d is extended to be
any real number, has proved very useful in capturing the persistence of many long-memory
processes. This raises two issues. The first is the power of traditional unit root tests,
such as Dickey and Fuller (DF, 1979 and 1981), Phillips and Perron (PP, 1988) and
Kwiatkowski, Phillips, Schmidt and Shin (KPSS, 1992), against fractional alternatives.
The second is the performance of the various techniques for estimating d, both when they
are applied as tests of a unit root (d = 1) and when they are used to make inferences
about the true value of d.
The power of traditional unit root tests has been studied by Diebold and Rudebusch
(1991), Lee and Schmidt (1996), Kramer (1998), Dolado, Gonzalo and Mayoral (2002) and
Lee and Shie (2002). In general, they all show that these unit root tests are consistent
when the alternative is an I(d) process, but their power turns out to be quite low. In
particular, this lack of power has motivated the development of new approaches that
estimate the order of integration d of a series directly, where d is not assumed to take
a special value such as unity or zero but is instead left unrestricted.
To make inferences about the fractional difference parameter d from an observed process,
there is a very large number of rather heterogeneous methods to estimate and/or test d.
In the parametric methodology,1 most methods are based on the autoregressive
fractionally-integrated moving average (ARFIMA) process as pioneered by Granger and
Joyeux (1980) and Hosking (1981). Fox and Taqqu (1986) construct an asymptotic
approximation to the likelihood of an ARFIMA process in the frequency domain, Sowell
(1992) constructs the exact likelihood function of an ARFIMA process in the time domain,
and Chung and Baillie (1993) propose a conditional sum of squares (CSS) estimator for an
ARFIMA process. Standard maximum likelihood (MLE) inference procedures then apply to
these estimates. Besides the MLE, Chung and Schmidt (1995) and Mayoral (2001) have
proposed minimum distance estimators for the ARFIMA process. Also within the parametric
methodology, Robinson (1994) and Tanaka (1999) have proposed LM tests for d in the
frequency and in the time domain, respectively. To implement these tests empirically, a
parametric short-run dynamic ARMA structure has to be specified. See Gil-Alaña and
Robinson (1997) and Gil-Alaña (2000) for examples.

1 By which we mean that, other than d, the short-run dynamics have to be specified
parametrically for estimation of or inference about d.
To make inferences about d nonparametrically, Geweke and Porter-Hudak (GPH, 1983)
suggest regressing the log spectral density ordinates on a trigonometric function to
estimate d. Robinson (1992) considers a frequency domain approach to find a consistent
estimate of d in the absence of any parameterization of the autocovariance function. In
general, the parametric methods yield narrower confidence intervals than nonparametric
ones. However, a drawback is that the parametric estimates are sensitive to the class of
models considered and may be misleading because of misspecification. See Hauser et al.
(1999).
By allowing the short-run dynamics to be weakly dependent and heterogeneously
distributed, the aim of the present paper is to develop a new nonparametric test for the
fractional difference parameter d, for 0 < d ≤ 1, of a general fractionally integrated
process in the time domain. It is shown that the test statistics, derived from the least
squares regression of a simple autoregressive model, are easy to compute and are
consistent against possible alternatives, including the I(1) process. They also fare
very well in finite samples, in terms of power and size, when compared to other
competing tests.
The rest of the paper is organized as follows. Section 2 reviews the functional central
limit theorem for a quite general fractionally integrated process. In Section 3 we
define our model, the estimators and their limiting distributions. In Section 4, we
provide test statistics for testing the fractional difference parameter. Simulation
evidence on the size and power of these tests is presented in Sections 5 and 6. Section
7 discusses some empirical applications of the proposed tests. Finally, Section 8 draws
some concluding remarks. Proofs of Theorems and Lemmas are gathered in the Appendix.
From now on, throughout this paper, the following conventional notation is adopted: L is
the lag operator, Γ(·) denotes the gamma function, $\Rightarrow$ denotes weak convergence
of the associated probability measures, $\xrightarrow{p}$ denotes convergence in
probability, [z] means the largest integer that is smaller than or equal to z,
$\bar{d} = d + 1$, and we let $e_i \sim f_i$ denote that $e_i/f_i \to 1$ as $i \to \infty$.
2 Preliminaries
The class of I(d) processes, $u_t$, is customarily written in the form
$$(1-L)^d u_t = \varepsilon_t, \qquad (1)$$
where the innovation process $\varepsilon_t$, the fractional difference of $u_t$, is a
stationary and weakly dependent process to be specified below. The process $u_t$ is
covariance stationary and ergodic for $-0.5 < d < 0.5$; moreover, $d < 1$ implies that
$u_t$ is mean reverting. To begin, we must be precise about the sequence $\varepsilon_t$
of allowable innovations in (1). Following Phillips (1987), we assume that
$\varepsilon_t$ is a sequence of random variables satisfying the following assumption.

Assumption 1. The sequence $\varepsilon_t$, $0 < t < \infty$,
(a) has zero mean; (b) satisfies $\sup_t E|\varepsilon_t|^{2\gamma} < \infty$ for some
$\gamma > 2$; (c) is covariance stationary with $0 < \sigma_\varepsilon^2 < \infty$, where
$\sigma_\varepsilon^2 = \lim_{T\to\infty} T^{-1}\sum_{t=1}^{T}\sum_{s=1}^{T} E(\varepsilon_t\varepsilon_s)$;
(d) is strong mixing with mixing coefficients $\alpha_m$ that satisfy
$\sum_{m=1}^{\infty} \alpha_m^{1-2/\gamma} < \infty$.

Assumption 1 allows for both temporal dependence and heteroskedasticity in the innovation
process $\varepsilon_t$. It covers all Gaussian and many other stationary finite-order
ARMA models under very general conditions on the underlying errors.
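As a concrete illustration of the operator in (1) (our own sketch, not part of the
original text), the fractional difference $(1-L)^d$ can be applied through the expansion
$(1-L)^d = \sum_{j\ge 0}\pi_j L^j$ with $\pi_0 = 1$ and $\pi_j = \pi_{j-1}(j-1-d)/j$; the
Python sketch below (NumPy assumed) truncates the expansion at the sample size, which
corresponds to treating pre-sample values as zero.

```python
import numpy as np

def frac_diff(x, d):
    """Apply the fractional difference operator (1 - L)^d to the series x.

    Uses (1 - L)^d = sum_j pi_j L^j with pi_0 = 1 and the recursion
    pi_j = pi_{j-1} * (j - 1 - d) / j, truncated at the sample size
    (pre-sample values are treated as zero).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    pi = np.empty(n)
    pi[0] = 1.0
    for j in range(1, n):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    # y_t = sum_{j=0}^{t} pi_j * x_{t-j}
    return np.array([np.dot(pi[: t + 1], x[t::-1]) for t in range(n)])
```

A negative d turns the same routine into (truncated) fractional integration, which is
how we use it in the sketches further below.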
Define the variance of the partial sums of the I(d) process $u_t$ by
$\sigma_T^2 = \mathrm{Var}(\sum_{t=1}^{T} u_t)$; then we have the following functional
central limit theorem, which will be used extensively in our theoretical development
below. The result is due to Davidson and De Jong (2000).
Lemma 1: Suppose $(1-L)^d u_t = \varepsilon_t$, $-0.5 < d < 0.5$, and $\varepsilon_t$
satisfies Assumption 1. Then as $T \to \infty$,

(a) $\sigma_T^2 \sim \sigma_\varepsilon^2 V_d T^{1+2d}$, and

(b) $\sigma_T^{-1}\sum_{t=1}^{[Tr]} u_t \Rightarrow B_d(r)$, for $r \in [0,1]$.

Here, $B_d(r)$ is the normalized fractional Brownian motion defined by the following
stochastic integral:2
$$B_d(r) \equiv \frac{1}{\Gamma(1+d)\,V_d^{1/2}}\left(\int_0^r (r-x)^d\, dB(x) + \int_{-\infty}^0 \left[(r-x)^d - (-x)^d\right] dB(x)\right), \qquad (2)$$
with
$$V_d \equiv \frac{1}{\Gamma(1+d)^2}\left(\frac{1}{1+2d} + \int_0^\infty \left[(1+\tau)^d - \tau^d\right]^2 d\tau\right) = \frac{\Gamma(1-2d)}{(1+2d)\,\Gamma(1+d)\,\Gamma(1-d)},$$
and $B(r)$ is the standard Brownian motion.

This type of fractional Brownian motion is defined so as to make $E[B_d(1)]^2 = 1$.
Fractional Brownian motion differs from standard Brownian motion $B(r)$ in having
correlated increments. See Mandelbrot and Van Ness (1968) and Marinucci and Robinson
(1999) for additional detail on fractional Brownian motion.
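As a quick consistency check (a step we add here; it is not in the original text), note
that for $d = 0$ the normalization collapses to the standard case:
$$V_0 = \frac{1}{\Gamma(1)^2}\left(1 + \int_0^\infty\left[(1+\tau)^0 - \tau^0\right]^2 d\tau\right) = 1, \qquad B_0(r) = \frac{1}{\Gamma(1)\,V_0^{1/2}}\int_0^r dB(x) = B(r),$$
so Lemma 1 reduces to the usual invariance principle for partial sums when
$u_t = \varepsilon_t$.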
Lemma 1(b) is a functional central limit theorem for a general fractionally integrated
process; it applies to a large class of fractionally integrated processes, including the
well-known ARFIMA(p,d,q) process. See, for example, Davydov (1970), Akonom and
Gourieroux (1987), Hosking (1996), Marinucci and Robinson (2000) and Chung (2002).

2 The original definition of fractional Brownian motion, as shown in Sowell (1990), is
$B_d(r) = \frac{1}{\Gamma(1+d)}\int_0^r (r-x)^d\, dB(x)$. However, Marinucci and Robinson
(1999) show that it requires correction, replacing it with the definition of fractional
Brownian motion in (2).
3 The model and estimators
3.1 A general nonstationary I(d) process
Let $y_t$ be a nonstationary fractionally integrated process generated by
$$(1-L)^{\bar{d}} y_t = \varepsilon_t, \qquad (3)$$
where $0.5 < \bar{d} < 1.5$ and $\varepsilon_t$, the fractional difference of $y_t$,
satisfies Assumption 1. The process $y_t$ can also be represented equivalently as
$$y_t = \beta y_{t-1} + u_t, \qquad (4)$$
$$\beta = 1 \quad \text{and} \quad (1-L)^d u_t = \varepsilon_t, \qquad (5)$$
where $\bar{d} = 1 + d$ and $-0.5 < d < 0.5$. The initial condition for (3) is set at
$t = 0$ with $y_0 = 0$. We consider the two least-squares regression equations:
$$y_t = \hat{\beta} y_{t-1} + \hat{u}_t, \qquad (6)$$
$$y_t = \tilde{\alpha} + \tilde{\beta} y_{t-1} + \tilde{u}_t, \qquad (7)$$
where $\hat{\beta}$ and $(\tilde{\alpha}, \tilde{\beta})$ are the conventional
least-squares regression coefficients. We shall be concerned with the limiting
distributions of the regression coefficients in (6) and (7) under the hypothesis that
the data are generated by (3), or equivalently by (4) and (5). Thus, under the null
value $\bar{d} = \bar{d}_0$, we have $\beta = 1$, $\alpha = 0$, and $d = d_0$. Under (4)
and (5), the sample moments of $y_t$ and $u_t$ that are needed to derive the OLS
estimators are collected in the following lemma.
Lemma 2. As $T \to \infty$,

(a) $T^{-\frac{1}{2}-d}\sum_{t=1}^{T} u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon B_d(1)$,

(b) $T^{-2-2d}\sum_{t=1}^{T} y_{t-1}^2 \Rightarrow V_d\sigma_\varepsilon^2 \int_0^1 [B_d(r)]^2 dr$,

(c) $T^{-1-2d} y_T^2 \Rightarrow V_d\sigma_\varepsilon^2 [B_d(1)]^2$,

(d) $T^{-\frac{3}{2}-d}\sum_{t=1}^{T} y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon \int_0^1 B_d(r)\, dr$,

(e) $T^{-1}\sum_{t=1}^{T} u_t^2 \xrightarrow{p} \sigma_u^2 = E(u_t^2)$,

(f) $T^{-1-2d}\sum_{t=1}^{T} y_{t-1}u_t \Rightarrow \frac{1}{2}V_d\sigma_\varepsilon^2 [B_d(1)]^2$ if $d > 0$,

(g) $T^{-1}\sum_{t=1}^{T} y_{t-1}u_t \xrightarrow{p} -\frac{1}{2}\sigma_u^2$ if $d < 0$,

(h) $T^{-\frac{3}{2}-d}\sum_{t=1}^{T} t\,u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon \left[B_d(1) - \int_0^1 B_d(r)\, dr\right]$,

(i) $T^{-\frac{5}{2}-d}\sum_{t=1}^{T} t\,y_{t-1} \Rightarrow V_d^{1/2}\sigma_\varepsilon \int_0^1 r B_d(r)\, dr$,

(j) $T^{-3-2d}\sum_{t=1}^{T} t\,y_{t-1}^2 \Rightarrow V_d\sigma_\varepsilon^2 \int_0^1 r[B_d(r)]^2 dr$.

Joint weak convergence of the sample moments given above to their respective limits is
easily established by the Cramér-Wold device and will be used below.
3.2 Limiting Distributions of the Statistics
In this section we characterize the limiting distributions of the coefficient estimators
$\hat{\beta}$, $\tilde{\alpha}$ and $\tilde{\beta}$ under the maintained hypothesis that
the time series $y_t$ is generated by (4) and (5).
Theorem 1.
For the regression model (6), as $T \to \infty$:

(a) $T(\hat{\beta} - 1) \Rightarrow \dfrac{\frac{1}{2}[B_d(1)]^2}{\int_0^1 [B_d(r)]^2 dr}$, when $d > 0$;

(b) $T^{1+2d}(\hat{\beta} - 1) \Rightarrow -\dfrac{\frac{1}{2}\sigma_u^2}{V_d \sigma_\varepsilon^2 \int_0^1 [B_d(r)]^2 dr}$, when $d < 0$; and

(c) $T(\hat{\beta} - 1) \Rightarrow \dfrac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}}{\int_0^1 [B(r)]^2 dr}$, when $d = 0$.

For the regression model (7), as $T \to \infty$:
when $d > 0$, then

(d) $T^{\frac{1}{2}-d}\tilde{\alpha} \Rightarrow \dfrac{V_d^{1/2}\sigma_\varepsilon B_d(1)\left\{\int_0^1 [B_d(r)]^2 dr - \frac{1}{2}B_d(1)\int_0^1 B_d(r)\,dr\right\}}{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2}$, and

(e) $T(\tilde{\beta} - 1) \Rightarrow \dfrac{\frac{1}{2}[B_d(1)]^2 - B_d(1)\int_0^1 B_d(r)\,dr}{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2}$;

when $d < 0$, then

(f) $T^{\frac{1}{2}+d}\tilde{\alpha} \Rightarrow \dfrac{\frac{1}{2}\sigma_u^2 \int_0^1 B_d(r)\,dr}{V_d^{1/2}\sigma_\varepsilon\left\{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}}$, and

(g) $T^{1+2d}(\tilde{\beta} - 1) \Rightarrow -\dfrac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2\left\{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}}$;

when $d = 0$, then

(h) $T^{\frac{1}{2}}\tilde{\alpha} \Rightarrow \dfrac{\sigma_\varepsilon\left\{B(1)\int_0^1[B(r)]^2 dr - \frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}\int_0^1 B(r)\,dr\right\}}{\int_0^1 [B(r)]^2 dr - \left[\int_0^1 B(r)\,dr\right]^2}$, and

(i) $T(\tilde{\beta} - 1) \Rightarrow \dfrac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\} - B(1)\int_0^1 B(r)\,dr}{\int_0^1 [B(r)]^2 dr - \left[\int_0^1 B(r)\,dr\right]^2}$.
We first discuss the results for model (6). The convergence rate of $(\hat{\beta} - 1)$
depends intrinsically on the degree of fractional differencing in the $u_t$ process. The
distribution of $T^{\min[1,\,1+2d]}(\hat{\beta} - 1)$ is therefore called a generalized
fractional unit root distribution. This fact is also discussed in Sowell (1990) and
Tanaka (1999, Corollary 2.4), where $\varepsilon_t$ in (3) is assumed to be i.i.d. and an
infinite-order moving average process, respectively. It is easily shown that when the
innovation process $\varepsilon_t$ is i.i.d.$(0,\sigma^2)$, we have
$\sigma_u^2 = [\Gamma(1-2d)/\Gamma(1-d)^2]\sigma^2$,3 leading to the following
simplification of parts (b) and (c) of Theorem 1:
$$T^{1+2d}(\hat{\beta} - 1) \Rightarrow -\frac{\left(\frac{1}{2}+d\right)\frac{\Gamma(1+d)}{\Gamma(1-d)}}{\int_0^1 [B_d(r)]^2 dr}, \quad \text{when } d < 0; \qquad (8)$$
and
$$T(\hat{\beta} - 1) \Rightarrow \frac{\frac{1}{2}\left\{[B(1)]^2 - 1\right\}}{\int_0^1 [B(r)]^2 dr}, \quad \text{when } d = 0. \qquad (9)$$
Result (8) was first given by Sowell (1990) and result (9) by Dickey and Fuller (1979).
Theorem 1 therefore extends (8) and (9) to the very general case of weakly dependent,
heterogeneously distributed data after differencing d times.

3 See, for example, Baillie (1996).
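For completeness, the algebra behind (8) is the following substitution (a step we add
here): with i.i.d. innovations, $\sigma_\varepsilon^2 = \sigma^2$, so
$$\frac{\frac{1}{2}\sigma_u^2}{V_d\,\sigma_\varepsilon^2} = \frac{1}{2}\cdot\frac{\Gamma(1-2d)}{\Gamma(1-d)^2}\cdot\frac{(1+2d)\,\Gamma(1+d)\,\Gamma(1-d)}{\Gamma(1-2d)} = \left(\frac{1}{2}+d\right)\frac{\Gamma(1+d)}{\Gamma(1-d)},$$
which is the numerator appearing in (8); the same substitution applied to Theorem 1(g)
gives the simplification reported in (10) below.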
It is interesting to note that when $d > 0$, the assumptions on $\varepsilon_t$ play no
role in determining the limiting distribution; it coincides with that of Sowell (1990)
and Tanaka (1999). When $d < 0$, the distribution of $T^{1+2d}(\hat{\beta} - 1)$ has the
same general form for a very wide class of innovation processes $\varepsilon_t$, and it
reduces to the distribution of Phillips (1987, Theorem 3.1(c)) when $d = 0$. Similar
conclusions apply to the results for model (7). The simplification of part (g) of
Theorem 1 when $\varepsilon_t$ is i.i.d.$(0,\sigma^2)$ is:
$$T^{1+2d}(\tilde{\beta} - 1) \Rightarrow -\frac{\left(\frac{1}{2}+d\right)\frac{\Gamma(1+d)}{\Gamma(1-d)}}{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2}. \qquad (10)$$
4 Statistical Inference of the Fractional Difference Parameter

4.1 Test for $0.5 < \bar{d} < 1$

The limiting distributions of the regression coefficients when $-0.5 < d < 0$ (i.e.,
$0.5 < \bar{d} < 1$) given in the last section depend upon the nuisance parameters
$\sigma_u^2$ and $\sigma_\varepsilon^2$. These distributions are therefore not directly
usable for statistical testing. However, since $\sigma_u^2$ and $\sigma_\varepsilon^2$
may be consistently estimated and the estimates used to construct modified statistics
whose limiting distributions are independent of $(\sigma_u^2, \sigma_\varepsilon^2)$,
there exist simple transformations of the test statistics which eliminate the nuisance
parameters asymptotically.

This idea was first developed by Phillips (1987) and Phillips and Perron (1988) in the
context of testing for a unit root. Here we show how a similar procedure may be extended
to a test of the fractional difference parameter in a quite general fractionally
integrated process. First, owing to the ergodicity of $u_t$, a consistent estimator of
$\sigma_u^2$ for data generated by (4) and (5) is provided by
$\hat{\sigma}_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - y_{t-1})^2$. Since $\hat{\beta}$ and
$(\tilde{\alpha}, \tilde{\beta})$ are consistent by Theorem 1, we may also use
$\hat{\sigma}_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - \hat{\beta}y_{t-1})^2$ and
$\hat{\sigma}_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - \tilde{\alpha} - \tilde{\beta}y_{t-1})^2$
as consistent estimators of $\sigma_u^2$ for models (6) and (7), respectively.

Consistent estimation of $\sigma_\varepsilon^2$ can be carried out in the same spirit as
Phillips and Perron (1988) by the following simple estimator based on truncated sample
autocovariances, namely
$$s_{Tl}^2 = T^{-1}\sum_{t=1}^{T}\hat{\varepsilon}_t^2 + 2T^{-1}\sum_{\tau=1}^{l} w_{\tau l}\sum_{t=\tau+1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_{t-\tau}, \qquad (11)$$
where $\hat{\varepsilon}_t = (1-L)^d(y_t - y_{t-1}) = (1-L)^d u_t$ and
$w_{\tau l} = 1 - \tau/(l+1)$. We may also use
$\hat{\varepsilon}_t = (1-L)^d(y_t - \hat{\beta}y_{t-1})$ and
$\hat{\varepsilon}_t = (1-L)^d(y_t - \tilde{\alpha} - \tilde{\beta}y_{t-1})$ as
alternative estimates of $\hat{\varepsilon}_t$ in the construction of $s_{Tl}^2$.
Conditions for the consistency of $s_{Tl}^2$ are explored by Phillips (1987, Theorem
4.2). We now define simple transformations of the conventional test statistics from
regressions (6) and (7) which eliminate the nuisance parameter dependencies
asymptotically. Specifically, we define
$$Z(d) = \frac{s_{Tl}^2}{\hat{\sigma}_u^2}\, T^{1+2d}(\hat{\beta} - 1), \qquad (12)$$
and
$$Z_\mu(d) = \frac{s_{Tl}^2}{\hat{\sigma}_u^2}\, T^{1+2d}(\tilde{\beta} - 1). \qquad (13)$$
$Z(d)$ is the transformation of the standardized estimator $T^{1+2d}(\hat{\beta} - 1)$
and $Z_\mu(d)$ is the transformation of $T^{1+2d}(\tilde{\beta} - 1)$. The limiting
distributions of $Z(d)$ and $Z_\mu(d)$ are given by:
Theorem 2: Assume that $l = o(T^{1/2})$. Then as $T \to \infty$,
$$Z(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_0}\int_0^1 [B_{d_0}(r)]^2 dr},$$
and
$$Z_\mu(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_0}\left(\int_0^1 [B_{d_0}(r)]^2 dr - \left[\int_0^1 B_{d_0}(r)\,dr\right]^2\right)},$$
under the null hypothesis that $y_t$ is an I$(\bar{d}_0)$ process as in (3).

Theorem 2 demonstrates that the limiting distributions of the two statistics $Z(d_0)$
and $Z_\mu(d_0)$ are invariant within a very wide class of weakly dependent and possibly
heterogeneously distributed innovations $\varepsilon_t$.
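For concreteness, here is a minimal sketch of how the statistics in (12) and (13) might
be computed (Python with NumPy; the helper names are ours, and frac_diff refers to the
hypothetical routine sketched in Section 2 — an illustration of the formulas, not the
authors' code). It uses the first difference of $y_t$ to form $\hat{\sigma}_u^2$ and
$\hat{\varepsilon}_t$, and the resulting values are to be compared with the critical
values in Table 1.

```python
import numpy as np

def long_run_variance(e, l):
    """Bartlett-window estimator s^2_Tl of sigma^2_eps as in (11)."""
    T = len(e)
    s2 = np.dot(e, e) / T
    for tau in range(1, l + 1):
        w = 1.0 - tau / (l + 1.0)
        s2 += 2.0 * w * np.dot(e[tau:], e[:-tau]) / T
    return s2

def z_tests(y, d0, l):
    """Z(d) and Z_mu(d) statistics of (12)-(13) under the null order dbar0 = 1 + d0.

    y  : observed series (initial condition y_0 = 0, as in the paper)
    d0 : hypothesized fractional order of u_t = y_t - y_{t-1}, with -0.5 < d0 < 0
    l  : lag truncation for the Bartlett window
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    ylag = np.concatenate(([0.0], y[:-1]))           # y_{t-1} with y_0 = 0
    u = y - ylag                                      # first differences
    sigma2_u = np.dot(u, u) / T                       # hat sigma^2_u
    eps_hat = frac_diff(u, d0)                        # (1-L)^{d0} u_t (hypothetical helper)
    s2 = long_run_variance(eps_hat, l)                # s^2_Tl

    beta_hat = np.dot(y, ylag) / np.dot(ylag, ylag)   # OLS without constant, model (6)
    X = np.column_stack((np.ones(T), ylag))           # OLS with constant, model (7)
    beta_tilde = np.linalg.lstsq(X, y, rcond=None)[0][1]

    scale = (s2 / sigma2_u) * T ** (1.0 + 2.0 * d0)
    return scale * (beta_hat - 1.0), scale * (beta_tilde - 1.0)
```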
Table 1 gives the critical values of
$-\frac{1}{2}\frac{1}{V_{d_0}\int_0^1 [B_{d_0}(r)]^2 dr}$ and
$-\frac{1}{2}\frac{1}{V_{d_0}\left(\int_0^1 [B_{d_0}(r)]^2 dr - [\int_0^1 B_{d_0}(r)\,dr]^2\right)}$,
calculated via direct simulation of a simple transformation of (8) and (10), using a
sample size of 500 and 10,000 replications. The calculations were done in GAUSS using
the normal random number generator. Observations on an I(d) process for
$d \in (-0.5, 0)$ were generated using the Durbin-Levinson algorithm.4
4 See, for example, Brockwell and Davis (1991), Section 5.2.
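The Durbin-Levinson step can be sketched as follows (Python with NumPy; our own
illustration, not the authors' GAUSS code). It uses the standard ARFIMA(0,d,0)
autocovariance $\gamma_0 = \sigma^2\Gamma(1-2d)/\Gamma(1-d)^2$ with autocorrelations
$\rho_k = \rho_{k-1}(k-1+d)/(k-d)$, a textbook result (cf. Hosking (1981)) rather than a
formula taken from this paper.

```python
import numpy as np
from math import gamma

def arfima0d0_acvf(d, n, sigma2=1.0):
    """Autocovariances gamma(0), ..., gamma(n-1) of a Gaussian ARFIMA(0,d,0) process."""
    g = np.empty(n)
    g[0] = sigma2 * gamma(1 - 2 * d) / gamma(1 - d) ** 2
    rho = 1.0
    for k in range(1, n):
        rho *= (k - 1 + d) / (k - d)      # rho(k) = rho(k-1)*(k-1+d)/(k-d)
        g[k] = g[0] * rho
    return g

def simulate_arfima0d0(d, n, rng):
    """One ARFIMA(0,d,0) sample path via the Durbin-Levinson recursion
    (Brockwell and Davis (1991), Section 5.2)."""
    g = arfima0d0_acvf(d, n + 1)
    x = np.empty(n)
    phi = np.zeros(n)                     # current prediction coefficients
    v = g[0]                              # one-step prediction error variance
    x[0] = rng.normal(0.0, np.sqrt(v))
    for t in range(1, n):
        phi_new = np.zeros(t)
        phi_new[t - 1] = (g[t] - np.dot(phi[:t - 1], g[t - 1:0:-1])) / v
        phi_new[:t - 1] = phi[:t - 1] - phi_new[t - 1] * phi[:t - 1][::-1]
        phi[:t] = phi_new
        v *= (1.0 - phi_new[t - 1] ** 2)
        x[t] = np.dot(phi[:t], x[t - 1::-1]) + rng.normal(0.0, np.sqrt(v))
    return x

# Example: x = simulate_arfima0d0(-0.25, 500, np.random.default_rng(0))
```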
Table 1.
Critical values for the $Z(d)$ and $Z_\mu(d)$ statistics

Percentiles of the distribution of $-\frac{1}{2}\frac{1}{V_d\int_0^1[B_d(r)]^2 dr}$ (T = 500)

 d       1%      2.5%    5%      10%     90%     95%     97.5%   99%
-0.05   -9.238  -7.238  -5.736  -4.212   0.244   0.406   0.540   0.708
-0.10   -6.912  -5.552  -4.444  -3.336  -0.036   0.072   0.145   0.220
-0.15   -5.198  -4.238  -3.461  -2.637  -0.124  -0.048   0.001   0.044
-0.20   -3.825  -3.201  -2.688  -2.132  -0.157  -0.093  -0.061  -0.028
-0.25   -2.601  -2.229  -1.893  -1.552  -0.153  -0.104  -0.075  -0.053
-0.30   -1.840  -1.583  -1.380  -1.169  -0.143  -0.100  -0.080  -0.058
-0.35   -1.229  -1.093  -0.981  -0.843  -0.118  -0.088  -0.069  -0.052
-0.40   -0.831  -0.736  -0.663  -0.582  -0.093  -0.070  -0.055  -0.043
-0.45   -0.522  -0.473  -0.436  -0.388  -0.072  -0.053  -0.043  -0.032

Percentiles of the distribution of $-\frac{1}{2}\frac{1}{V_d\left(\int_0^1[B_d(r)]^2 dr - [\int_0^1 B_d(r)\,dr]^2\right)}$ (T = 500)

 d       1%       2.5%     5%       10%     90%     95%     97.5%   99%
-0.05  -14.591  -12.281  -10.423  -8.538  -1.064  -0.636  -0.303   0.013
-0.10  -10.622   -8.977   -7.843  -6.484  -1.083  -0.772  -0.560  -0.361
-0.15   -7.575   -6.487   -5.734  -4.860  -0.976  -0.755  -0.592  -0.452
-0.20   -5.144   -4.487   -3.964  -3.429  -0.822  -0.650  -0.536  -0.426
-0.25   -3.548   -3.125   -2.778  -2.442  -0.702  -0.573  -0.481  -0.385
-0.30   -2.347   -2.113   -1.917  -1.723  -0.569  -0.470  -0.398  -0.327
-0.35   -1.533   -1.394   -1.289  -1.172  -0.454  -0.384  -0.334  -0.282
-0.40   -0.976   -0.899   -0.835  -0.769  -0.342  -0.297  -0.261  -0.221
-0.45   -0.603   -0.565   -0.535  -0.496  -0.253  -0.221  -0.196  -0.173
Next we consider the behavior of the tests under the alternative hypothesis that
$d \neq d_0$ in the following theorem.

Theorem 3. Suppose that $l = o(T^{1/2})$. If $y_t$ is generated by
$(1-L)^{\bar{d}_1} y_t = \varepsilon_t$, where $\bar{d}_1 = \bar{d}_0 - \delta < 1$ and
$\delta \neq 0$, then
$$Z(d_0) \Rightarrow -\frac{1}{2}\,\frac{V_{-\delta}}{V_{d_1}}\left(\frac{T}{l}\right)^{2\delta}\frac{1}{\int_0^1 [B_{d_1}(r)]^2 dr},$$
and
$$Z_\mu(d_0) \Rightarrow -\frac{1}{2}\,\frac{V_{-\delta}}{V_{d_1}}\left(\frac{T}{l}\right)^{2\delta}\frac{1}{\int_0^1 [B_{d_1}(r)]^2 dr - \left[\int_0^1 B_{d_1}(r)\,dr\right]^2}.$$
For $\delta > 0$, $Z(d_0) \to -\infty$ and $Z_\mu(d_0) \to -\infty$; for $\delta < 0$,
$Z(d_0) \to 0$ and $Z_\mu(d_0) \to 0$.

Theorem 3 implies that the two-tailed $Z(d_0)$ and $Z_\mu(d_0)$ tests are consistent
against the I$(\bar{d}_1 = \bar{d}_0 - \delta)$ alternative for $\delta \neq 0$.
Obviously, the lower-tail test is consistent against the
I$(\bar{d}_1 = \bar{d}_0 - \delta)$ alternative for $\delta > 0$, while the upper-tail
test is consistent against the I$(\bar{d}_1 = \bar{d}_0 - \delta)$ alternative for
$\delta < 0$.

Corollary 1: Suppose that $l = o(T^{1/2})$. If $y_t$ is generated by
$(1-L)^{\bar{d}_1} y_t = \varepsilon_t$, where $\bar{d}_1 = 1$, then $Z(d_0) \to 0$ and
$Z_\mu(d_0) \to 0$.

This result implies that the upper-tailed $Z(d_0)$ and $Z_\mu(d_0)$ tests are also
consistent against the I(1) alternative.
4.2 Test for $\bar{d} = 1$

Here we show that the results of the last section are easily applied as a test of a unit
root. Let the observed series $y_t$ be generated by (3) but with $\bar{d} = 1$. We first
fractionally difference the series $y_t$ by the operator $(1-L)^{d_1}$ to obtain a series
$w_t$, for an arbitrary choice of $d_1$ with $0 < d_1 < 0.5$. In particular,
$(1-L)^{d_1} y_t = w_t$, or $(1-L)^{\bar{d}-d_1} w_t = \varepsilon_t$. When $\bar{d} = 1$
is true, the $Z(d)$ and $Z_\mu(d)$ test statistics applied to $w_t$ (with $d = -d_1$)
will lie within the confidence interval in Table 1 for the corresponding choice of $d_1$
(a sketch of this procedure is given below).
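A minimal sketch of this unit-root recipe, reusing the hypothetical frac_diff and
z_tests helpers from the earlier sketches (Python; an illustration of the procedure, not
the authors' implementation):

```python
import numpy as np

def unit_root_z_test(y, d1=0.4, l=5):
    """Test dbar = 1: fractionally difference y by (1-L)^{d1}, then apply Z(d) and
    Z_mu(d) to w_t with d = -d1; compare with Table 1 at d = -d1."""
    w = frac_diff(np.asarray(y, dtype=float), d1)
    return z_tests(w, -d1, l)

# Hypothetical usage:
# rng = np.random.default_rng(0)
# y = np.cumsum(rng.normal(size=500))          # a pure random walk, dbar = 1
# z, z_mu = unit_root_z_test(y, d1=0.4, l=5)
```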
4.3 Test for $0 < \bar{d} \leq 0.5$

It is also easy to use the $Z(d)$ and $Z_\mu(d)$ tests to test for
$0 < \bar{d} \leq 0.5$ from the results above. Let the observed series $y_t$ be generated
by (3) but with $0 < \bar{d} \leq 0.5$. We first fractionally difference the series $y_t$
by the operator $(1-L)^{-d^*}$, where the known $d^* \to 0.5^-$, to obtain a series
$w_t$. In particular, $(1-L)^{-d^*} y_t = w_t$, or $(1-L)^{d^*+\bar{d}} w_t = \varepsilon_t$.
We then apply the $Z(d)$ and $Z_\mu(d)$ test statistics to $w_t$ to make inferences about
$d^* + \bar{d}$; the value of $\bar{d}$ follows immediately.
5 Size and Power in Finite Samples

In this section, we provide some evidence on the size and power of the $Z(d)$,
$Z_\mu(d)$ and GPH tests in finite samples. This evidence is based on simulations. Data
were generated by model (3) with $\bar{d} = 0.9$ and AR(1) errors:
$$\varepsilon_t = \phi\varepsilon_{t-1} + \nu_t, \qquad (14)$$
where the $\nu_t$ are independently and identically distributed N(0,1). We set
$y_0 = 0$ and use various lag truncations $l$ in (11) to evaluate the effects of this
choice on test performance. A Bartlett window $w_{\tau l} = 1 - \tau/(l+1)$ and the first
difference of $y_t$ were used in the construction of the variance estimates
$\hat{\sigma}_u^2$ and $s_{Tl}^2$. The simulations reported in Table 2 are based on a
sample of size 500 with 10,000 replications and give results for one-sided tests under
the null hypothesis $\bar{d} = 0.9$ and the alternative $\bar{d} = 0.7$.
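A sketch of this data-generating step follows (Python with NumPy, reusing the
hypothetical frac_diff helper from Section 2; the truncated expansion imposes the
initial condition $y_0 = 0$ as in the setup above — our own illustration, not the
authors' code):

```python
import numpy as np

def simulate_model(dbar, phi, T, rng):
    """Generate y_t from (1-L)^dbar y_t = eps_t with AR(1) errors eps_t as in (14)."""
    nu = rng.normal(size=T)
    eps = np.empty(T)
    eps[0] = nu[0]
    for t in range(1, T):                 # eps_t = phi * eps_{t-1} + nu_t
        eps[t] = phi * eps[t - 1] + nu[t]
    # y_t = (1-L)^{-dbar} eps_t, applied as a truncated (type II) expansion
    return frac_diff(eps, -dbar)

# One size replication: generate under dbar = 0.9 and test H0: dbar0 = 0.9 (d0 = -0.1)
# rng = np.random.default_rng(1)
# y = simulate_model(0.9, 0.0, 500, rng)
# z, z_mu = z_tests(y, -0.1, l=5)
```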
The results show size and power computations for six different values of $\phi$ in (14).
When $\phi = 0$, we first observe that the size of the $Z(d)$ and $Z_\mu(d)$ tests is
accurate regardless of the choice of $l$, while GPH slightly over-rejects. It should be
stressed that the distribution of our tests under the alternative depends on $l$ (i.e.,
on $l/T$) even asymptotically, so there is a clear presumption that choosing $l$ large
will cost power, as indeed it does in our simulations. The power of the $Z$ test
decreases as $l$ increases, which is in accordance with the prediction of Theorem 3.
Overall, the $Z_\mu(d)$ test has greater power than $Z(d)$ for almost all choices of $l$
and greater power than GPH for $l$ less than 9. Thus, when $\phi = 0$, $Z_\mu(d)$ is the
preferred test over GPH and $Z(d)$ when $l$ is not larger than 10.

When $\phi < 0$, the $Z(d)$ test suffers size distortions, but these are attenuated as
the lag length $l$ increases. GPH has the same amount of size distortion as when
$\phi = 0$. The $Z_\mu(d)$ test is again very conservative and has higher power than the
$Z(d)$ and GPH tests for choices of $l$ less than 10.

When $\phi > 0$, it is now the GPH test that has significant size distortions; it is too
liberal to be useful for $\phi = 0.8$. This finding was predicted by Agiakloglou et al.
(1993) on theoretical grounds. The $Z_\mu(d)$ test, however, still has good size for
moderate choices of $l$ greater than 5. The combination of good size and power
performance of $Z_\mu(d)$ does not change for $l$ less than 9. Schwert (1987) suggested
a criterion for choosing the number of lags $l$: $l_4 = \mathrm{int}[4(T/100)^{1/4}]$ and
$l_{12} = \mathrm{int}[12(T/100)^{1/4}]$. For the sample size T = 500 in this simulation,
this corresponds to $l_4 = 5$ and $l_{12} = 15$. Our simulation results therefore suggest
that using $l_4$ in the $Z_\mu(d)$ test is preferred.
6 Size and Power in Finite Samples: Further Evidence

In this section, we provide further evidence on the finite sample performance of the
$Z(d)$ and $Z_\mu(d)$ tests proposed in this study for testing the fractional difference
parameter and the unit root. This evidence is based on simulations. Throughout the
simulations, we fix the sample size at T = 500, the number of replications at 1000 and
the significance level at a nominal 5 percent based on Table 1. In the course of the
simulations, the behavior of the Phillips and Perron tests, pp (corresponding to the
regression model without constant, as $Z(d)$) and pp$_\mu$ (corresponding to the
regression with constant, as $Z_\mu(d)$), and of GPH are also examined, as they are the
nonparametric counterparts of the present tests. We use the first difference of $y_t$ in
the construction of the variance estimates $\hat{\sigma}_u^2$ and $s_{Tl}^2$. We also use
various lag truncations $l$ in the Bartlett window in (11) to evaluate the effects of
this choice on the performance of our tests. The same value of $l$ is also used in the
PP tests. However, it should be stressed that the distribution of our tests under the
alternative depends on $l$ (i.e., on $l/T$) even asymptotically, so there is a clear
presumption that choosing $l$ large will cost power, as indicated by Theorem 3.
Therefore, we report the power performance of our tests below for the minimum choice of
$l$ that has acceptable size.

The data generating models employed here are

Model A: $(1-L)^{\bar{d}} y_t = \nu_t$;

and

Model B: $(1-L)^{\bar{d}} y_t = \varepsilon_t$, where $\varepsilon_t = \phi\varepsilon_{t-1} + \nu_t$.

Here $\nu_t$ is independently and identically distributed as N(0,1) and we set
$y_0 = 0$. For testing the unit root hypothesis, our procedure requires first
differencing $y_t$ by the operator $(1-L)^{d_1}$. We choose $d_1 = 0.1, 0.2, 0.3$ and
0.4 to obtain $w_{1t}$, $w_{2t}$, $w_{3t}$ and $w_{4t}$, respectively.
We first consider the size of the tests under various null hypotheses $\bar{d}_0$ in
Model A. In fact, under Model A with i.i.d. errors, there is no need for the
transformations leading to the $Z(d)$ and $Z_\mu(d)$ tests. Nevertheless, we gather from
Table 2 that these tests suffer little loss of accuracy with respect to size, except for
$Z_\mu(d)$ with $d_1 = 0.1$ as a test for a unit root, where there is always
over-rejection.

We next consider the size of the tests in the presence of the autocorrelated errors of
Model B. Table 3 presents our simulation results on the size of the tests. The tests
reject too often for $\phi < 0$ and too seldom for $\phi > 0$ when $l = 0$. However,
these size distortions are rapidly removed as $l$ increases. For $\phi = \pm 0.2$ and
$\pm 0.5$, the size problem is attenuated for choices of $l$ less than 5, except for
$Z_\mu(d)$ with $d_1 = 0.1$. For $\phi = \pm 0.8$, we see from Table * that when
$\phi = +0.8$, correct size is attainable if we choose $l$ large. However, when
$\phi = -0.8$, increasing $l$ cannot ease the size problem when the null hypothesis is
$H_0: \bar{d} = 0.6$ or 0.7, nor in the case where the null hypothesis is
$H_0: \bar{d} = 1$ with $d_1 = 0.3$ or 0.4.

Finally, we consider the power of the tests. Let us first deal with Model A. Table 4 and
Table 5 report the percentage power of the various tests of $H_0: \bar{d} = 1$ against
$H_1: \bar{d} < 1$ for two different choices of $l$, whereas Table 6 presents results
for $H_0: \bar{d} = 0.9$ against $H_1: \bar{d} < 0.9$. Note that the column entries
corresponding to $\bar{d} = 1$ in Table 4 are type I errors. The power of the $Z(d)$ and
$Z_\mu(d)$ tests decreases as $l$ increases, which is in accordance with the prediction
of Theorem 3. A choice of $d_1$ of 0.3 or 0.4 is preferred in terms of size and power.
Overall, however, the results in Tables 4 and 5 show that, as a test for a unit root, the
$Z(d)$ test behaves better than pp; $Z_\mu(d)$ and pp$_\mu$ behave similarly; and the GPH
test has the least power. Table 6 shows that the $Z_\mu(d)$ test clearly dominates the
other tests in terms of power when it is applied as a test of the fractional parameter in
Model A.

Tables 7 and 8 are concerned with Model B, with an AR coefficient of $\phi = 0.2$. Table
7 presents results for $H_0: \bar{d} = 1$ against $\bar{d} < 1$, whereas Table 8 presents
results for $H_0: \bar{d} = 0.9$ against $H_1: \bar{d} < 0.9$. Tables 9 and 10 deal with
Model B with $\phi = 0.5$ and present results for $H_0: \bar{d} = 1$ and for
$H_0: \bar{d} = 0.9$, respectively. Finally, Tables 11 and 12 are concerned with the
case $\phi = -0.2$. The general feature of Tables 7-12 is that, given acceptable size,
the $Z(d)$ test is clearly more powerful than PP and GPH when applied as a test of a
unit root, and $Z_\mu(d)$ outperforms GPH when applied as a test of the fractional
difference parameter.
Table 2.
Size and power for the GPH, $Z(d)$ and $Z_\mu(d)$ statistics

(a) GPH test*

        Size ($\bar{d}$ = 0.7)                   Power ($\bar{d}$ = 0.4)
 φ      μ = 0.55  μ = 0.575  μ = 0.6      μ = 0.55  μ = 0.575  μ = 0.6
 0.0    0.070     0.070      0.072        0.668     0.736      0.806
 0.2    0.071     0.073      0.079        0.651     0.716      0.778
 0.5    0.097     0.108      0.133        0.568     0.603      0.635
 0.8    0.412     0.561      0.733        0.138     0.097      0.057
-0.2    0.068     0.069      0.073        0.677     0.745      0.819
-0.5    0.067     0.069      0.072        0.681     0.753      0.827
-0.8    0.065     0.067      0.071        0.684     0.753      0.828

* μ denotes the value used in the sample size function n = T^μ, where n is the number of
low-frequency ordinates used in the GPH spectral regression.

(b) $Z(d)$ test

Size ($\bar{d}$ = 0.7)
 φ      l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
 0.0    0.056   0.056   0.049   0.055   0.054   0.052   0.052   0.049
 0.2    0.038   0.046   0.047   0.049   0.050   0.049   0.050   0.045
 0.5    0.012   0.030   0.035   0.047   0.050   0.051   0.051   0.044
 0.8    0.014   0.024   0.031   0.044   0.050   0.054   0.060   0.063
-0.2    0.089   0.069   0.056   0.059   0.058   0.056   0.056   0.052
-0.5    0.140   0.104   0.078   0.072   0.068   0.066   0.063   0.063
-0.8    0.271   0.155   0.129   0.125   0.113   0.107   0.101   0.105

Power ($\bar{d}$ = 0.4)
 φ      l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
 0.0    0.955   0.863   0.747   0.616   0.484   0.348   0.220   0.000
 0.2    0.904   0.826   0.728   0.622   0.519   0.415   0.309   0.006
 0.5    0.631   0.680   0.643   0.588   0.523   0.455   0.387   0.078
 0.8    0.004   0.098   0.209   0.273   0.299   0.309   0.302   0.192
-0.2    0.977   0.889   0.756   0.597   0.416   0.242   0.127   0.000
-0.5    0.986   0.918   0.757   0.496   0.209   0.083   0.042   0.001
-0.8    0.912   0.613   0.224   0.091   0.067   0.060   0.059   0.011

(c) $Z_\mu(d)$ test

Size ($\bar{d}$ = 0.7)
 φ      l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
 0.0    0.053   0.044   0.040   0.029   0.024   0.021   0.018   0.017
 0.2    0.015   0.035   0.036   0.033   0.029   0.025   0.025   0.025
 0.5    0.002   0.011   0.021   0.028   0.034   0.035   0.037   0.038
 0.8    0.000   0.002   0.007   0.015   0.028   0.044   0.057   0.062
-0.2    0.109   0.054   0.037   0.024   0.016   0.013   0.011   0.011
-0.5    0.164   0.063   0.023   0.009   0.007   0.004   0.004   0.003
-0.8    0.021   0.004   0.002   0.002   0.002   0.003   0.002   0.003

Power ($\bar{d}$ = 0.4)
 φ      l = 1   l = 3   l = 5   l = 7   l = 9   l = 11  l = 13  l = 15
 0.0    0.999   0.973   0.774   0.381   0.097   0.015   0.003   0.000
 0.2    0.993   0.949   0.793   0.525   0.254   0.091   0.021   0.006
 0.5    0.662   0.764   0.697   0.573   0.429   0.278   0.162   0.078
 0.8    0.000   0.033   0.119   0.189   0.221   0.229   0.217   0.192
-0.2    1.000   0.982   0.698   0.170   0.016   0.002   0.000   0.000
-0.5    1.000   0.980   0.300   0.014   0.001   0.001   0.001   0.001
-0.8    0.429   0.026   0.007   0.007   0.007   0.008   0.010   0.011
7 Empirical Application

In order to provide some empirical illustrations of how the testing and estimation
methods proposed in this paper can be applied in practice, we examine two series for
which evidence of fractional integration has previously been found in the literature by
parametric approaches. More specifically, the first is the US real interest rate series,
whereas the second is the US unemployment rate series in Nelson and Plosser's (1982)
data set. In light of the size and power performance of the test statistics in the
finite-sample simulations of the last section, a left-tail $Z_\mu(d)$ test is employed in
the following empirical applications for different hypothesized values of $\bar{d}$,
starting from 0.95 in decrements of 0.05, and for a given $d^* = 0.49$. Schwert's
criterion for choosing the number of lags, $l_4 = \mathrm{int}[4(T/100)^{1/4}]$, is used
(a sketch of the procedure appears below).
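A sketch of how such a sequence of left-tail $Z_\mu(d)$ tests might be run over a grid
of hypothesized orders (Python, reusing the hypothetical z_tests helper from Section 4;
the data-loading step is omitted and the function names are ours):

```python
import numpy as np

def zmu_profile(y, dbar_grid, l):
    """Left-tail Z_mu(d) statistics over a grid of hypothesized orders dbar0.

    For each dbar0 the statistic uses d0 = dbar0 - 1 and is compared with the
    corresponding row of Table 1 (second panel)."""
    out = {}
    for dbar0 in dbar_grid:
        _, z_mu = z_tests(np.asarray(y, dtype=float), dbar0 - 1.0, l)
        out[round(dbar0, 2)] = z_mu
    return out

# Hypothetical usage on a loaded series y:
# l4 = int(4 * (len(y) / 100.0) ** 0.25)
# stats = zmu_profile(y, np.arange(0.95, 0.60, -0.05), l4)
```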
7.1 Real interest rate data

We consider the behavior of Mishkin's (1990) ex post real interest rate data, analyzed
by Tsay (2000). They are monthly data from January 1953 to December 1990, and the series
length is 456.

Table 3.a summarizes the results of the $Z_\mu(d)$ test. The basic finding is that the
US real interest rate series for this period is well characterized by an I(d) process
with d around 0.65. The value of d inferred from our procedure is close to the 0.666
obtained by Tsay (2000), who uses Chung and Baillie's (1993) CSS approach on a Gaussian
ARFIMA(1,d,1) model. In view of this result, the conclusion that can be drawn is that a
shock does not have a permanent effect on the real interest rate, but it takes a long
while to die out.
7.2 Unemployment rate data

Finally, we examine the US unemployment rate data, whose degree of integration is the
most controversial in the extended version of Nelson and Plosser's data set. The data
are annual, from 1891 to 1988, and have been transformed to natural logarithms. This
series has been analyzed, among others, by Gil-Alaña and Robinson (1997) and Dolado,
Gonzalo and Mayoral (2002). By applying Robinson's LM test, Gil-Alaña et al. find
evidence of fractional integration in the unemployment rate series. Although the
estimated values of d vary substantially across different models for the disturbance
$\varepsilon_t$, overall they conclude that the unemployment rate seems close to
stationarity. Recently, Dolado et al. have employed their Wald-type fractional augmented
Dickey-Fuller test on the same series and obtained an estimated value of d of 0.412.

Table 3.b reports the results of the $Z_\mu(d)$ test, and we find that the US
unemployment rate series for this period is well characterized by an I(d) process with d
around 0.46, lying in the stationary range, although close to the nonstationary
boundary. This estimated value of d is largely in agreement with previous studies.
Table 3.a
$Z_\mu(d)$ test on Mishkin's data
$H_0: \bar{d} = \bar{d}_0$ vs. $H_1: \bar{d} < \bar{d}_0$

Hypotheses                                              Test Statistic              Critical Value
$H_0: \bar{d} = 0.95$ vs. $H_1: \bar{d} < 0.95$         $Z_\mu(d) = -34.514$*       -10.423
$H_0: \bar{d} = 0.90$ vs. $H_1: \bar{d} < 0.90$         $Z_\mu(d) = -18.974$*       -7.843
$H_0: \bar{d} = 0.85$ vs. $H_1: \bar{d} < 0.85$         $Z_\mu(d) = -10.495$*       -5.734
$H_0: \bar{d} = 0.80$ vs. $H_1: \bar{d} < 0.80$         $Z_\mu(d) = -5.844$*        -3.964
$H_0: \bar{d} = 0.75$ vs. $H_1: \bar{d} < 0.75$         $Z_\mu(d) = -3.279$*        -2.778
$H_0: \bar{d} = 0.70$ vs. $H_1: \bar{d} < 0.70$         $Z_\mu(d) = -1.856$         -1.917
$H_0: \bar{d} = 0.65$ vs. $H_1: \bar{d} < 0.65$         $Z_\mu(d) = -1.061$         -1.289

* Significant at the 5% level.
Table 3.b
$Z_\mu(d)$ test on unemployment rate data
$H_0: \bar{d} = \bar{d}_0$ vs. $H_1: \bar{d} < \bar{d}_0$

Hypotheses                                              Test Statistic              Critical Value
$H_0: \bar{d} = 0.95$ vs. $H_1: \bar{d} < 0.95$         $Z_\mu(d) = -14.265$*       -10.423
$H_0: \bar{d} = 0.90$ vs. $H_1: \bar{d} < 0.90$         $Z_\mu(d) = -9.601$*        -7.843
$H_0: \bar{d} = 0.85$ vs. $H_1: \bar{d} < 0.85$         $Z_\mu(d) = -6.536$*        -5.734
$H_0: \bar{d} = 0.80$ vs. $H_1: \bar{d} < 0.80$         $Z_\mu(d) = -4.508$*        -3.964
$H_0: \bar{d} = 0.75$ vs. $H_1: \bar{d} < 0.75$         $Z_\mu(d) = -3.158$*        -2.778
$H_0: \bar{d} = 0.70$ vs. $H_1: \bar{d} < 0.70$         $Z_\mu(d) = -2.253$*        -1.917
$H_0: \bar{d} = 0.65$ vs. $H_1: \bar{d} < 0.65$         $Z_\mu(d) = -1.642$*        -1.289
$H_0: \bar{d} = 0.60$ vs. $H_1: \bar{d} < 0.60$         $Z_\mu(d) = -1.225$*        -0.835
$H_0: \bar{d} = 0.55$ vs. $H_1: \bar{d} < 0.55$         $Z_\mu(d) = -0.939$*        -0.535

$H_0: \bar{d} + 0.49 = \bar{d}_0$ vs. $H_1: \bar{d} + 0.49 < \bar{d}_0$
$H_0: \bar{d} = 0.95$ vs. $H_1: \bar{d} < 0.95$         $Z_\mu(d) = -5.183$         -10.423

* Significant at the 5% level.
8 Conclusion
In this paper, we have considered a test of the fractional difference parameter of a
general fractionally integrated process. The test is based on the standardized OLS
estimator in a very simple regression model. The main technical innovation of this paper
is the allowance made for error autocorrelation. Correspondingly, the main practical
difficulty in performing the test is the estimation of the long-run variance. Our
autocorrelation correction is similar to the Phillips and Perron correction for the unit
root test. This is a plausible approach because it does not rely on distributional or
parametric assumptions about the innovations. It avoids both the computational
difficulties associated with the normal MLE and the possibility of model misspecification
from parameterizing the short-run behavior.

Monte Carlo experiments throughout the paper support the analytical results and show that
the proposed tests behave reasonably well in finite samples. Further, several empirical
illustrations of how to use and interpret these tests are provided.

The tests are intended to complement unit root tests, such as the Phillips-Perron tests
and the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS, 1992) test. After rejecting both
the unit root hypothesis and the stationarity hypothesis, one can perform the test
proposed in this paper to make inferences about the value of the fractional difference
parameter d.

A future extension is needed to derive the appropriate asymptotic theory of the proposed
test statistics when $d = -1/2$ ($\bar{d} = 1/2$). However, there are difficulties with
this extension in deriving the functional central limit theorem of Lemma 1: the limiting
sample paths are not continuous when $d = -1/2$. See Davidson and De Jong (2000). A
modified weak convergence result in place of Lemma 1 is called for.
Appendix
Proof of Lemma 1.
This is a direct application of the functional central limit theorem of Davidson and De
Jong (2000, Theorem 3.2), replacing the weaker condition in their assumption on
$\varepsilon_t$ with the stronger condition in Assumption 1 of this paper. See Davidson
(2002) for the relationship between the two sets of assumptions.
Proof of Lemma 2.

The proofs of items (a)-(d) are straightforward applications of the continuous mapping
theorem to the result of Lemma 1(b) and are omitted here. Item (e) follows from the
ergodicity of the stationary process $u_t$.

To prove items (f) and (g), first recall that $y_0 = 0$, so that
$\sum_{t=1}^{T} y_{t-1}u_t = \frac{1}{2}y_T^2 - \frac{1}{2}\sum_{t=1}^{T} u_t^2$. From
Lemma 2(c) and (e), $y_T^2$ is $O_p(T^{1+2d})$ and $\sum_{t=1}^{T} u_t^2$ is $O_p(T)$, so
that $\sum_{t=1}^{T} y_{t-1}u_t$ is $O_p(\kappa)$, where $\kappa = \max(T^{1+2d}, T)$.
Therefore, for $d > 0$,
$$T^{-1-2d}\sum_{t=1}^{T} y_{t-1}u_t = \tfrac{1}{2}T^{-1-2d}y_T^2 + o_p(1) \Rightarrow \tfrac{1}{2}V_d\sigma_\varepsilon^2[B_d(1)]^2,$$
while for $d < 0$,
$$T^{-1}\sum_{t=1}^{T} y_{t-1}u_t = -\tfrac{1}{2}T^{-1}\sum_{t=1}^{T}u_t^2 + o_p(1) \xrightarrow{p} -\tfrac{1}{2}\sigma_u^2.$$

To prove item (h), we first observe that
$\sum_{t=1}^{T} y_{t-1} = T\sum_{t=1}^{T} u_t - \sum_{t=1}^{T} t\,u_t$, or
$\sum_{t=1}^{T} t\,u_t = T\sum_{t=1}^{T} u_t - \sum_{t=1}^{T} y_{t-1}$. Therefore
$T^{-\frac{3}{2}-d}\sum_{t=1}^{T} t\,u_t = T^{-\frac{1}{2}-d}\sum_{t=1}^{T} u_t - T^{-\frac{3}{2}-d}\sum_{t=1}^{T} y_{t-1}$.
By Lemma 2(a) and (d),
$T^{-\frac{3}{2}-d}\sum_{t=1}^{T} t\,u_t \Rightarrow V_d^{1/2}\sigma_\varepsilon B_d(1) - V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr$.

To prove items (i) and (j), denote $X_T(r) = \sum_{t=1}^{[Tr]} u_t$ for $r \in [0,1]$.
Then, as $T \to \infty$, the following results hold:
$$T^{-1}\sum_{t=1}^{T}\frac{t}{T}\,y_{t-1} = \int_0^1 r\,X_T(r)\,dr, \qquad (A.1)$$
and
$$T^{-1}\sum_{t=1}^{T}\frac{t}{T}\,y_{t-1}^2 = \int_0^1 r\,X_T^2(r)\,dr. \qquad (A.2)$$
The result of item (i) then follows immediately from (A.1) and Lemma 2(d); similarly,
the result of item (j) follows immediately from (A.2) and Lemma 2(b).
Proof of Theorem 1.

We show the results of items (d)-(i) for model (7); the results for model (6) then
follow immediately. For model (7), we note that
$$\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} - 1 \end{bmatrix} = \begin{bmatrix} T & \sum_{t=1}^{T} y_{t-1} \\ \sum_{t=1}^{T} y_{t-1} & \sum_{t=1}^{T} y_{t-1}^2 \end{bmatrix}^{-1} \begin{bmatrix} \sum_{t=1}^{T} u_t \\ \sum_{t=1}^{T} y_{t-1}u_t \end{bmatrix}. \qquad (A.3)$$

From Lemma 2, the orders of probability of the individual terms in (A.3) are
$$\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} - 1 \end{bmatrix} = \begin{bmatrix} O_p(T) & O_p(T^{\frac{3}{2}+d}) \\ O_p(T^{\frac{3}{2}+d}) & O_p(T^{2+2d}) \end{bmatrix}^{-1} \begin{bmatrix} O_p(T^{\frac{1}{2}+d}) \\ O_p(T^{\max[1+2d,\,1]}) \end{bmatrix}.$$

To prove items (d) and (e), note that when $d > 0$, $\sum_{t=1}^{T} y_{t-1}u_t$ is
$O_p(T^{1+2d})$. We define two rescaling matrices,
$$\Upsilon_T = \begin{bmatrix} T^{\frac{1}{2}} & 0 \\ 0 & T^{1+d} \end{bmatrix} \quad \text{and} \quad \Delta_T = \begin{bmatrix} T^{-\frac{1}{2}-d} & 0 \\ 0 & T^{-1-2d} \end{bmatrix}.$$
Multiplying (A.3) by the rescaling matrices, we get
$$\Upsilon_T\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} - 1 \end{bmatrix} = \left(\Upsilon_T^{-1}\begin{bmatrix} T & \sum y_{t-1} \\ \sum y_{t-1} & \sum y_{t-1}^2 \end{bmatrix}\Upsilon_T^{-1}\right)^{-1}\Upsilon_T^{-1}\Delta_T^{-1}\,\Delta_T\begin{bmatrix} \sum u_t \\ \sum y_{t-1}u_t \end{bmatrix}. \qquad (A.4)$$
Since $\Upsilon_T^{-1}\Delta_T^{-1} = T^{d} I_2$, substituting the results of Lemma 2 into
(A.4) and rescaling by $T^{-d}$ establishes
$$\begin{bmatrix} T^{\frac{1}{2}-d}\tilde{\alpha} \\ T(\tilde{\beta} - 1) \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr & V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr \end{bmatrix}^{-1}\begin{bmatrix} V_d^{1/2}\sigma_\varepsilon B_d(1) \\ \frac{1}{2}V_d\sigma_\varepsilon^2 [B_d(1)]^2 \end{bmatrix}.$$
Notice that
$$\begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr & V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr \end{bmatrix}^{-1} = \frac{1}{V_d\sigma_\varepsilon^2\left(\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right)}\begin{bmatrix} V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr & -V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr \\ -V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr & 1 \end{bmatrix}.$$
Thus,
$$T^{\frac{1}{2}-d}\tilde{\alpha} \Rightarrow \frac{V_d^{1/2}\sigma_\varepsilon B_d(1)\left\{\int_0^1 [B_d(r)]^2 dr - \frac{1}{2}B_d(1)\int_0^1 B_d(r)\,dr\right\}}{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2},$$
and
$$T(\tilde{\beta} - 1) \Rightarrow \frac{\frac{1}{2}[B_d(1)]^2 - B_d(1)\int_0^1 B_d(r)\,dr}{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2}.$$
This completes the proof of items (d) and (e).

To prove items (f) and (g), notice that when $d < 0$, $\sum_{t=1}^{T} y_{t-1}u_t$ is
$O_p(T)$. We define another rescaling matrix
$$\Delta_T = \begin{bmatrix} T^{-\frac{1}{2}+d} & 0 \\ 0 & T^{-1} \end{bmatrix}.$$
Multiplying (A.3) by $\Upsilon_T$ and this $\Delta_T$ in the same way (now
$\Upsilon_T^{-1}\Delta_T^{-1} = T^{-d} I_2$), we get
$$\Upsilon_T\begin{bmatrix} \tilde{\alpha} \\ \tilde{\beta} - 1 \end{bmatrix} = \left(\Upsilon_T^{-1}\begin{bmatrix} T & \sum y_{t-1} \\ \sum y_{t-1} & \sum y_{t-1}^2 \end{bmatrix}\Upsilon_T^{-1}\right)^{-1}\Upsilon_T^{-1}\Delta_T^{-1}\,\Delta_T\begin{bmatrix} \sum u_t \\ \sum y_{t-1}u_t \end{bmatrix}. \qquad (A.5)$$
Substituting the results of Lemma 2 into (A.5) and rescaling by $T^{d}$ establishes
$$\begin{bmatrix} T^{\frac{1}{2}+d}\tilde{\alpha} \\ T^{1+2d}(\tilde{\beta} - 1) \end{bmatrix} \Rightarrow \begin{bmatrix} 1 & V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr \\ V_d^{1/2}\sigma_\varepsilon\int_0^1 B_d(r)\,dr & V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr \end{bmatrix}^{-1}\begin{bmatrix} 0 \\ -\frac{1}{2}\sigma_u^2 \end{bmatrix}.$$
Consequently,
$$T^{\frac{1}{2}+d}\tilde{\alpha} \Rightarrow \frac{\frac{1}{2}\sigma_u^2\int_0^1 B_d(r)\,dr}{V_d^{1/2}\sigma_\varepsilon\left\{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}},$$
$$T^{1+2d}(\tilde{\beta} - 1) \Rightarrow -\frac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2\left\{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}}.$$
This completes the proof of items (f) and (g).

The proofs of items (h) and (i) are similar to those of items (d)-(g) and are omitted
here; the results are the same as in Phillips and Perron (1988).
Proof of Theorem 2.

By Theorem 1, Lemma 2(e) and Phillips's (1987) Theorem 4.2, we have, as $T \to \infty$
and for $d < 0$,
$$T^{1+2d}(\hat{\beta} - 1) \Rightarrow -\frac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2\int_0^1 [B_d(r)]^2 dr},$$
$$T^{1+2d}(\tilde{\beta} - 1) \Rightarrow -\frac{\frac{1}{2}\sigma_u^2}{V_d\sigma_\varepsilon^2\left\{\int_0^1 [B_d(r)]^2 dr - \left[\int_0^1 B_d(r)\,dr\right]^2\right\}},$$
$$\hat{\sigma}_u^2 \xrightarrow{p} \sigma_u^2 \quad \text{and} \quad s_{Tl}^2 \xrightarrow{p} \sigma_\varepsilon^2.$$
The two results of Theorem 2 now follow directly from the continuous mapping theorem.
Proof of Theorem 3.

Under the alternative hypothesis, $d_1 = d_0 - \delta < 0$. Following the results of
Theorem 2, we have
$$\frac{s_{Tl}^2(d_1)}{\hat{\sigma}_u^2}\,T^{1+2d_1}(\hat{\beta} - 1) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_1}\int_0^1 [B_{d_1}(r)]^2 dr}, \qquad (A.6)$$
where $\hat{\sigma}_u^2 = T^{-1}\sum_{t=1}^{T}(y_t - y_{t-1})^2$,
$(1-L)^{d_1} u_t = \hat{\varepsilon}_t$, and
$$s_{Tl}^2(d_1) = T^{-1}\sum_{t=1}^{T}\hat{\varepsilon}_t^2 + 2T^{-1}\sum_{\tau=1}^{l}\omega_{\tau l}\sum_{t=\tau+1}^{T}\hat{\varepsilon}_t\hat{\varepsilon}_{t-\tau} \xrightarrow{p} \sigma_\varepsilon^2.$$
Multiplying (A.6) by $s_{Tl}^2(d_0)/s_{Tl}^2(d_0)$, we get
$$\frac{s_{Tl}^2(d_0)}{\hat{\sigma}_u^2}\,\frac{s_{Tl}^2(d_1)}{s_{Tl}^2(d_0)}\,T^{1+2d_0-2\delta}(\hat{\beta} - 1) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_1}\int_0^1 [B_{d_1}(r)]^2 dr},$$
or
$$Z(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_1}\int_0^1 [B_{d_1}(r)]^2 dr}\,\frac{s_{Tl}^2(d_0)}{s_{Tl}^2(d_1)}\,T^{2\delta}.$$
We now show that $s_{Tl}^2(d_0)$ is not a consistent estimator of $\sigma_\varepsilon^2$
when $d_1 = d_0 - \delta$, $\delta \neq 0$. Since $(1-L)^{d_1} u_t = \varepsilon_t$, we
have $(1-L)^{d_0-\delta} u_t = \varepsilon_t$, or
$$(1-L)^{d_0} u_t = (1-L)^{\delta}\varepsilon_t. \qquad (A.7)$$
To construct $s_{Tl}^2(d_0)$, we must apply the fractional difference $(1-L)^{d_0}$ to
$u_t$ to get
$$(1-L)^{d_0} u_t = \hat{\varepsilon}_t^0 = (1-L)^{\delta}\varepsilon_t,$$
by the result of (A.7). Therefore $(1-L)^{-\delta}\hat{\varepsilon}_t^0 = \varepsilon_t$;
that is, $\hat{\varepsilon}_t^0$ is I$(-\delta)$. According to Lee and Schmidt (1996,
p.291, Theorem 3), $s_{Tl}^2(d_0) \xrightarrow{p} l^{-2\delta}\sigma_\varepsilon^2 V_{-\delta}$.
Therefore,
$$Z(d_0) \Rightarrow -\frac{1}{2}\,\frac{1}{V_{d_1}\int_0^1 [B_{d_1}(r)]^2 dr}\,T^{2\delta}\,\frac{l^{-2\delta}\sigma_\varepsilon^2 V_{-\delta}}{\sigma_\varepsilon^2} = -\frac{1}{2}\,\frac{V_{-\delta}}{V_{d_1}}\,\frac{1}{\int_0^1 [B_{d_1}(r)]^2 dr}\left(\frac{T}{l}\right)^{2\delta}.$$
This completes the proof for $Z(d_0)$.

For $Z_\mu(d_0)$, it is straightforward to establish the same conclusion as was given
for the $Z(d_0)$ test above; all that is necessary is to replace
$\int_0^1 [B_{d_1}(r)]^2 dr$ with
$\int_0^1 [B_{d_1}(r)]^2 dr - \left[\int_0^1 B_{d_1}(r)\,dr\right]^2$.
Proof of Corollary 1.

Under the alternative hypothesis $d_1 = 0$, by the result of Theorem 1(c),
$$T(\hat{\beta} - 1) \Rightarrow \frac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}}{\int_0^1 [B(r)]^2 dr}. \qquad (A.8)$$
Multiplying (A.8) by $\frac{s_{Tl}^2(d_0)}{\hat{\sigma}_u^2}T^{2d_0}$, for $d_0 < 0$, we get
$$Z(d_0) \Rightarrow \frac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}}{\int_0^1 [B(r)]^2 dr}\,\frac{s_{Tl}^2(d_0)}{\hat{\sigma}_u^2}\,T^{2d_0}.$$
Now $\hat{\sigma}_u^2 = T^{-1}\sum_{t=1}^{T} u_t^2 \xrightarrow{p} \sigma_u^2$, but
$(1-L)^{d_0} u_t = \hat{\varepsilon}_t^0$, or
$(1-L)^{-d_0}\hat{\varepsilon}_t^0 = u_t = \varepsilon_t$; that is, $\hat{\varepsilon}_t^0$
is I$(-d_0)$. Therefore,
$$s_{Tl}^2(d_0) \xrightarrow{p} l^{-2d_0}\sigma_\varepsilon^2 V_{-d_0}.$$
Consequently, we get
$$Z(d_0) \Rightarrow \frac{\frac{1}{2}\left\{[B(1)]^2 - \sigma_u^2/\sigma_\varepsilon^2\right\}}{\int_0^1 [B(r)]^2 dr}\,\frac{\sigma_\varepsilon^2 V_{-d_0}}{\sigma_u^2}\left(\frac{T}{l}\right)^{2d_0},$$
which converges to 0 as $T \to \infty$, $l \to \infty$ with $T/l \to \infty$, for
$d_0 < 0$. This completes the proof of Corollary 1.
References
Agiakloglou, C., Newbold, P., Wohar, M., 1993. Bias in an estimator of the frac-
tional difference parameter. Journal of Time Series Analysis 14, No. 3, 235-
246.
Akonom, J., Gourieroux, C., 1987. A functional central limit theorem for fractional
processes. Technical Report #8801, CEPREMAP, Paris.
Baillie, R.T., 1996. Long memory processes and fractional integration in econo-
metrics. Journal of Econometrics 73, 5-59.
Chung, C.F., 2002. Sample means, sample autocovariances, and linear regression
of stationary multivariate long memory processes. Econometric Theory 18,
51-78.
Chung, C.F., Baillie, R.T., 1993. Small sample bias in conditional sum of squares
estimators of fractionally integrated ARMA models. Empirical Economics 18,
791-806.
Chung, C.F., Schmidt, P., 1995. The minimum distance estimator for fractionally
integrated ARMA processes. Econometrics and Economics Theory Paper, No.
9408, Michigan State University.
Davydov, Y.A., 1970. The invariance principle for stationary processes. Theory of
Probability and Its Applications 15, 487-489.
Davidson, J., 2002. Establishing conditions for the functional central limit theorem
in nonlinear and semiparametric time series processes. Journal of Economet-
rics 106, 243-269.
Davidson, J., De Jong, R.M., 2000. The functional central limit theorem and
weak convergence to stochastic integrals II: Fractionally integrated processes.
Econometric Theory 16, 643-666.
Dickey, D.A., Fuller, W.A., 1979. Distribution of the estimator for autoregressive
time series with a unit root. Journal of the American Statistical Association
74, 427-431.
Fox, R., Taqqu, M.S., 1986. Large sample properties of parameter estimates for
strongly dependent stationary Gaussian time series. Annals of Statistics 14,
517-532.
Geweke, J., Porter-Hudak, S., 1983. The estimation and application of long mem-
ory time series models. Journal of Time Series Analysis 4, 221-238.
Gil-Alaña, L.A., 2000. Mean reversion in the real exchange rates. Economics Letters
69, 285-288.
Gil-Alaña, L.A., Robinson, P.M., 1997. Testing of unit root and other nonstationary
hypotheses in macroeconomic time series. Journal of Econometrics 80, 241-268.
Granger, C.W.J., Joyeux, R., 1980. An introduction to long-memory time series
models and fractional differencing. Journal of Time Series Analysis 1, 15-29.
Hauser, M.A., Pötscher, B.M., Reschenhofer, E., 1999. Measuring persistence in
aggregate output: ARMA models, fractionally integrated ARMA models and
nonparametric procedures. Empirical Economics 24, 243-269.
Hosking, J.R.M., 1981. Fractional differencing. Biometrika 68, 1, 165-176.
Hosking, J.R.M., 1996. Asymptotic distributions of the sample mean, autocovari-
ances and autocorrelations of long-memory time series. Journal of Economet-
rics 73, 261-284.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y., 1992. Testing the null
hypothesis of stationarity against the alternative of a unit root. Journal of
Econometrics 54, 159-178.
Mandelbrot, B.B., Van Ness, J.W., 1968. Fractional Brownian motions, fractional
Brownian noises and applications. SIAM Review 10, 422-437.
Marinucci, D., Robinson, P.M., 1999. Alternative forms of fractional Brownian
motion. Journal of Statistical Planning and Inference 80, 111-122.
Marinucci, D., Robinson, P.M., 2000. Weak convergence of multivariate fractional
processes. Stochastic Processes and their Applications 86, 103-120.
Mayoral, L., 2001. Generalized minimum distance estimation of ARFIMA processes
with conditionally heteroskedastic errors. In preparation.
Mishkin, F.S., 1990. What does the term structure of interest rates tell us about
future inflation? Journal of Monetary Economics 25, 77-95.
Phillips, P.C.B., 1987. Time series regression with a unit root. Econometrica 55,
277-301.
Phillips, P.C.B., Perron, P., 1988. Testing for a unit root in time series regression.
Biometrika 75, 335-346.
Phillips, P.C.B., Xiao, Z., 1998. A primer on unit root testing. Journal of Economic
Surveys 12, No. 5.
Robinson, P.M., 1992. Semiparametric analysis of long-memory time series. Annals
of Statistics 22, 515-539.
Robinson, P.M., 1994. Efficient tests of nonstationary hypotheses. Journal of the
American Statistical Association 89, 1420-1437.
Schwert, G.W., 1987. Effects of model specification on tests for unit roots in
macroeconomics data. Journal of Monetary Economics 20, 73-103.
Sowell, F.B., 1990. The fractional unit root distribution. Econometrica 58, No. 2,
495-505.
Sowell, F.B., 1992. Maximum likelihood estimation of stationary univariate frac-
tionally integrated time series models. Journal of Econometrics 53, 165-188.
Tanaka, K., 1999. The nonstationary fractional unit root. Econometric Theory 15,
No. 4.
Tsay, W.-J., 2000. Long memory story of the real interest rate. Economics Letters
67, 325-330.