Ch. 12 Stochastic Processes
1 Introduction
A particularly important aspect of real observable phenomena, which the random
variables concept cannot accommodate, is their time dimension; the concept of a
random variable is essentially static. A number of economic phenomena for which
we need to formulate probability models come in the form of dynamic processes
for which we have a discrete sequence of observations in time. The problem we
have to face is to extend the simple probability model,
$$\Phi = \{f(x; \theta),\ \theta \in \Theta\},$$
to one which enables us to model dynamic phenomena. We have already moved
in this direction by proposing the random vector probability model
$$\Phi = \{f(x_1, x_2, \ldots, x_T; \theta),\ \theta \in \Theta\}.$$
The way we have viewed this model so far has been as representing different
characteristics of the phenomenon in question in the form of the jointly distributed
r.v.'s $X_1, X_2, \ldots, X_T$. If we reinterpret this model as representing the same
characteristic but at successive points in time, then it can be viewed as a dynamic
probability model. With this as a starting point, let us consider the dynamic
probability model in the context of $(S, \mathcal{F}, P)$.
2 The Concept of a Stochastic Process
The natural way to make the concept of a random variable dynamic is to extend
its domain by attaching a date to the elements of the sample space S.

Definition 1:
Let $(S, \mathcal{F}, P)$ be a probability space and $\mathcal{T}$ an index set of real numbers, and
define the function $X(\cdot,\cdot): S \times \mathcal{T} \to \mathbb{R}$. The ordered sequence of random
variables $\{X(\cdot, t),\ t \in \mathcal{T}\}$ is called a stochastic process.

This definition suggests that for a stochastic process $\{X(\cdot, t),\ t \in \mathcal{T}\}$, for each
$t \in \mathcal{T}$, $X(\cdot, t)$ represents a random variable on S. On the other hand, for each s
in S, $X(s, \cdot)$ represents a function of t, which we call a realization of the process.
$X(s, t)$ for given s and t is just a real number.
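To make this three-way reading concrete, the following sketch (an illustration added here, not part of the original text; the cumulative-sum process and the seed are arbitrary choices) simulates a few realizations on a grid of dates. Each row is a realization $X(s, \cdot)$, each column a random variable $X(\cdot, t)$, and each entry a single real number $X(s, t)$.

```python
import numpy as np

rng = np.random.default_rng(0)
S, T = 5, 100                      # number of simulated outcomes s and time points t

# X(s, t): each row is one realization (s fixed, a function of t);
# each column is one random variable (t fixed, varying over outcomes s).
X = np.cumsum(rng.normal(size=(S, T)), axis=1)

print(X[2, :5])    # part of a single realization: X(s, .) as a function of t
print(X[:, 10])    # a single random variable: X(., t) at t = 10, across outcomes s
print(X[2, 10])    # X(s, t) for given s and t is just a real number
```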
Three main elements of a stochastic process $\{X(\cdot, t),\ t \in \mathcal{T}\}$ are:
1. its range space (sometimes called the state space),¹ usually $\mathbb{R}$;
2. the index set $\mathcal{T}$, usually one of $\mathbb{R}$, $\mathbb{R}_+ = [0, \infty)$, $\mathbb{Z} = \{\ldots, 0, 1, 2, \ldots\}$; and
3. the dependence structure of the r.v.'s $\{X(\cdot, t),\ t \in \mathcal{T}\}$.

In what follows a stochastic process will be denoted by $\{X_t,\ t \in \mathcal{T}\}$ (s is
dropped, and $X(t)$ is customarily used for a continuous-time stochastic process), and we
concentrate exclusively on discrete stochastic processes. That is, the index set
$\mathcal{T}$ is a countable set such as $\mathcal{T} = \{0, 1, 2, \ldots\}$.
The dependence structure of $\{X_t,\ t \in \mathcal{T}\}$, in direct analogy with the case of
a random vector, should be determined by the joint distribution of the process.
The question arises, however: since $\mathcal{T}$ is commonly an infinite set, do we need an
infinite-dimensional distribution to define the structure of the process?
This question was tackled by Kolmogorov (1933), who showed that when the
stochastic process satisfies certain regularity conditions the answer is definitely
'no'. In particular, define the 'tentative' joint distribution of the process for
the subset $(t_1 < t_2 < \cdots < t_T)$ of $\mathcal{T}$ by
$$F(x_{t_1}, x_{t_2}, \ldots, x_{t_T}) = \Pr(X_{t_1} \le x_{t_1}, X_{t_2} \le x_{t_2}, \ldots, X_{t_T} \le x_{t_T}).$$
If the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ satisfies the conditions:
1. symmetry: $F(x_{t_1}, x_{t_2}, \ldots, x_{t_T}) = F(x_{t_{j_1}}, x_{t_{j_2}}, \ldots, x_{t_{j_T}})$, where $j_1, j_2, \ldots, j_T$ is
any permutation of the indices $1, 2, \ldots, T$ (i.e. reshuffling the ordering of the index
does not change the distribution);
2. compatibility: $\lim_{x_{t_T} \to \infty} F(x_{t_1}, x_{t_2}, \ldots, x_{t_T}) = F(x_{t_1}, x_{t_2}, \ldots, x_{t_{T-1}})$ (i.e. the
dimensionality of the joint distribution can be reduced by marginalisation);
then there exist a probability space $(S, \mathcal{F}, P)$ and a stochastic process $\{X_t,\ t \in \mathcal{T}\}$ defined
on it whose finite-dimensional distributions coincide with $F(x_{t_1}, x_{t_2}, \ldots, x_{t_T})$
as defined above. That is, the probability structure of the stochastic process
$\{X_t,\ t \in \mathcal{T}\}$ is completely specified by the joint distribution $F(x_{t_1}, x_{t_2}, \ldots, x_{t_T})$
for all values of $T$ (a positive integer) and any subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$.

¹ In the function $y = f(x)$, $x$ is referred to as the argument of the function, and $y$ is called
the value of the function. We shall also alternatively refer to $x$ as the independent variable and
$y$ as the dependent variable. The set of all permissible values that $x$ can take in a given context
is known as the domain of the function. The value into which an $x$ value is mapped is called
the image of that $x$ value. The set of all images is called the range of the function, which is
the set of all values that the $y$ variable will take.
Given that, for a specific $t$, $X_t$ is a random variable, we can denote its distribution
and density functions by $F(x_t)$ and $f(x_t)$ respectively. Moreover, the mean,
variance and higher moments of $X_t$ (as a r.v.) can be defined in the standard form
as
$$E(X_t) = \int_{x_t} x_t f(x_t)\,dx_t = \mu_t,$$
$$E(X_t - \mu_t)^2 = \int_{x_t} (x_t - \mu_t)^2 f(x_t)\,dx_t = v^2(t), \quad \text{and}$$
$$E(X_t^r) = \mu_r(t), \quad r \ge 1,$$
for all $t \in \mathcal{T}$.
The linear dependence measure between $X_{t_i}$ and $X_{t_j}$,
$$v(t_i, t_j) = E[(X_{t_i} - \mu_{t_i})(X_{t_j} - \mu_{t_j})], \quad t_i, t_j \in \mathcal{T},$$
is now called the autocovariance function. In standardized form,
$$r(t_i, t_j) = \frac{v(t_i, t_j)}{v(t_i)\, v(t_j)}, \quad t_i, t_j \in \mathcal{T},$$
it is called the autocorrelation function. These numerical characteristics of the
stochastic process $\{X_t,\ t \in \mathcal{T}\}$ play an important role in the analysis of the process
and its application to modeling real observable phenomena. We say that
$\{X_t,\ t \in \mathcal{T}\}$ is an uncorrelated process if $r(t_i, t_j) = 0$ for any $t_i, t_j \in \mathcal{T}$, $t_i \ne t_j$.
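As a rough numerical illustration (a sketch added here; the cumulative-sum process and the chosen dates are arbitrary), $\mu_t$, $v(t_i, t_j)$ and $r(t_i, t_j)$ can be estimated by averaging across many simulated realizations at fixed time points:

```python
import numpy as np

rng = np.random.default_rng(1)
S, T = 10_000, 50
X = np.cumsum(rng.normal(size=(S, T)), axis=1)   # an illustrative (non-stationary) process

ti, tj = 10, 20
mu_i, mu_j = X[:, ti].mean(), X[:, tj].mean()          # mu_t estimated across realizations
v_ij = np.mean((X[:, ti] - mu_i) * (X[:, tj] - mu_j))  # autocovariance v(ti, tj)
r_ij = v_ij / (X[:, ti].std() * X[:, tj].std())        # autocorrelation r(ti, tj)
print(mu_i, v_ij, r_ij)
```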
Example:
One of the most important examples of a stochastic process is the normal process.
The stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be normal (or Gaussian) if for any finite
subset of $\mathcal{T}$, say $t_1, t_2, \ldots, t_T$, the vector $(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) \equiv \mathbf{x}_T'$ has a multivariate normal
distribution, i.e.
$$f(x_{t_1}, x_{t_2}, \ldots, x_{t_T}) = (2\pi)^{-T/2} |V_T|^{-1/2} \exp\!\left[-\tfrac{1}{2}(\mathbf{x}_T - \boldsymbol{\mu}_T)' V_T^{-1} (\mathbf{x}_T - \boldsymbol{\mu}_T)\right],$$
where
$$\boldsymbol{\mu}_T = E(\mathbf{x}_T) = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_T \end{bmatrix}, \qquad
V_T = \begin{bmatrix}
v^2(t_1) & v(t_1, t_2) & \cdots & v(t_1, t_T) \\
v(t_2, t_1) & v^2(t_2) & \cdots & v(t_2, t_T) \\
\vdots & \vdots & \ddots & \vdots \\
v(t_T, t_1) & v(t_T, t_2) & \cdots & v^2(t_T)
\end{bmatrix}.$$
As in the case of a normal random variable, the distribution of a normal stochastic
process is characterized by the first two moments, but now they are functions of $t$.
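For intuition (an added sketch; the mean function and the exponential covariance kernel below are arbitrary illustrative choices, not from the text), a realization of a Gaussian process on a finite set of dates is simply one draw from the corresponding multivariate normal distribution $N(\boldsymbol{\mu}_T, V_T)$:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50
t = np.arange(T)

mu_T = 0.1 * t                                          # illustrative mean function mu_t
V_T = np.exp(-np.abs(t[:, None] - t[None, :]) / 5.0)    # illustrative covariance kernel v(ti, tj)

# One draw from the T-dimensional normal distribution N(mu_T, V_T) = one realization.
x = rng.multivariate_normal(mu_T, V_T)
print(x[:5])
```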
One problem so far is that the definition of a stochastic process given above is
much too general to enable us to obtain an operational probability model. In
the analysis of a stochastic process we only have a single realization of the process,
and we would have to deduce the values of $\mu_t$ and $v(t)$ with the help of a single
observation (which is impossible!).
The main purpose of the next three sections is to consider various special
forms of stochastic processes for which we can construct probability models that are
manageable in the context of statistical inference. Such manageability is achieved
by imposing certain restrictions which enable us to reduce the number of unknown
parameters involved, in order to be able to deduce their values from a single
realization. These restrictions come in two forms:
1. restrictions on the time-heterogeneity of the process; and
2. restrictions on the memory of the process.
2.1 Restricting the time-heterogeneity of a stochastic process
For an arbitrary stochastic process $\{X_t,\ t \in \mathcal{T}\}$ the distribution function $F(x_t; \theta_t)$
depends on $t$, with the parameters $\theta_t$ characterizing it being functions of $t$ as well.
That is, a stochastic process is time-heterogeneous in general. This, however,
raises very difficult issues in modeling real phenomena because usually we only
have one observation for each $t$. Hence in practice we would have to estimate $\theta_t$
on the basis of a single observation, which is impossible. For this reason we are
going to consider an important class of processes which exhibit considerable
time-homogeneity and can be used to model phenomena approaching
their equilibrium steady state but continuously undergoing 'random' fluctuations.
This is the class of stationary stochastic processes.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be (strictly) stationary if for any subset
$(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$ and any $\tau$,
$$F(x_{t_1}, \ldots, x_{t_T}) = F(x_{t_1+\tau}, \ldots, x_{t_T+\tau}).$$
That is, the distribution of the process remains unchanged when shifted in time by
an arbitrary value $\tau$. In terms of the marginal distributions, strict stationarity
implies that
$$F(x_t) = F(x_{t+\tau}), \quad t \in \mathcal{T},$$
and hence $F(x_{t_1}) = F(x_{t_2}) = \cdots = F(x_{t_T})$. That is, stationarity implies that
$X_{t_1}, X_{t_2}, \ldots, X_{t_T}$ are (individually) identically distributed.

The concept of stationarity, although very useful in the context of probability
theory, is very difficult to verify in practice because it is defined in terms of distribution
functions. For this reason the concept of second-order stationarity,
defined in terms of the first two moments, is commonly preferred.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be (weakly) stationary if
$$E(X_t) = \mu \quad \text{for all } t,$$
$$v(t_i, t_j) = E[(X_{t_i} - \mu)(X_{t_j} - \mu)] = \gamma_{|t_j - t_i|}, \quad t_i, t_j \in \mathcal{T}.$$
This suggests that weak stationarity for $\{X_t,\ t \in \mathcal{T}\}$ implies that its mean $\mu$ and
variance $v^2(t_i) = \gamma_0$ are constant and free of $t$, and that its autocovariance depends
only on the interval $|t_j - t_i|$, not on $t_i$ and $t_j$ themselves. Therefore, $\gamma_k = \gamma_{-k}$.
Example:
Consider the normal stochastic process in the above example. With the weak
stationarity assumption, now
$$\boldsymbol{\mu}_T = E(\mathbf{X}_T) = \begin{bmatrix} \mu \\ \mu \\ \vdots \\ \mu \end{bmatrix}, \qquad
V_T = \begin{bmatrix}
\gamma_0 & \gamma_1 & \cdots & \gamma_{T-1} \\
\gamma_1 & \gamma_0 & \cdots & \gamma_{T-2} \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_{T-1} & \gamma_{T-2} & \cdots & \gamma_0
\end{bmatrix},$$
a sizeable reduction in the number of unknown parameters from $T + [T(T+1)/2]$
to $(T+1)$. It is important, however, to note that even in the case of stationarity
the number of parameters increases with the size of the subset $(t_1, \ldots, t_T)$, although
the parameters do not depend on $t \in \mathcal{T}$. This is because time-homogeneity does
not restrict the 'memory' of the process. In the next section we are going to
consider 'memory' restrictions in an obvious attempt to 'solve' the problem of
the parameters increasing with the size of the subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$.
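As a small illustration of this parameter count (an added sketch; the particular autocovariance values are arbitrary), under weak stationarity the covariance matrix $V_T$ is a Toeplitz matrix built from $\gamma_0, \ldots, \gamma_{T-1}$, so only $T + 1$ parameters are involved instead of $T + T(T+1)/2$:

```python
import numpy as np

T = 5
mu = 1.0
gamma = 0.8 ** np.arange(T)          # illustrative autocovariances gamma_0, ..., gamma_{T-1}

mu_T = np.full(T, mu)                # stationary mean vector: a single unknown parameter mu
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
V_T = gamma[lags]                    # Toeplitz covariance matrix with entries gamma_|i-j|

print(V_T)
print("parameters under stationarity:", T + 1)           # mu and gamma_0, ..., gamma_{T-1}
print("parameters without it:", T + T * (T + 1) // 2)    # T means plus T(T+1)/2 covariances
```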
2.2 Restricting the memory of a stochastic process
In the case of a typical economic time series, viewed as a particular realization of
a stochastic process $\{X_t,\ t \in \mathcal{T}\}$, one would expect that the dependence between
$X_{t_i}$ and $X_{t_j}$ would tend to weaken as the distance $(t_j - t_i)$ increases. Formally, this
dependence can be described in terms of the joint distribution $F(x_{t_1}, x_{t_2}, \ldots, x_{t_T})$
as follows:

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be asymptotically independent if
for any subset $(t_1, t_2, \ldots, t_T)$ of $\mathcal{T}$ and any $\tau$, $\alpha(\tau)$ defined by
$$|F(x_{t_1}, \ldots, x_{t_T}, x_{t_1+\tau}, \ldots, x_{t_T+\tau}) - F(x_{t_1}, \ldots, x_{t_T})\, F(x_{t_1+\tau}, \ldots, x_{t_T+\tau})| \le \alpha(\tau)$$
goes to zero as $\tau \to \infty$.
That is, if $\alpha(\tau) \to 0$ as $\tau \to \infty$ the two subsets $(X_{t_1}, \ldots, X_{t_T})$ and $(X_{t_1+\tau}, \ldots, X_{t_T+\tau})$
become independent.

A particular case of asymptotic independence is that of m-dependence, which
restricts $\alpha(\tau)$ to be zero for all $\tau > m$. That is, $X_{t_1}$ and $X_{t_2}$ are independent for
$|t_1 - t_2| > m$.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be asymptotically uncorrelated if
there exists a sequence of constants $\{\rho(\tau),\ \tau \ge 1\}$ defined by
$$\frac{v(t, t+\tau)}{v(t)\, v(t+\tau)} \le \rho(\tau), \quad \text{for all } t \in \mathcal{T},$$
such that
$$0 \le \rho(\tau) \le 1 \quad \text{and} \quad \sum_{\tau=0}^{\infty} \rho(\tau) < \infty.$$
As we can see, the sequence of constants $\{\rho(\tau),\ \tau \ge 1\}$ defines an upper bound
for the sequence of autocorrelation coefficients $r(t, t+\tau)$. Moreover, given that
$\rho(\tau) \to 0$ as $\tau \to \infty$ is a necessary condition, and $\rho(\tau) < \tau^{-(1+\delta)}$ for some $\delta > 0$ a sufficient
condition, for $\sum_{\tau=0}^{\infty} \rho(\tau) < \infty$, the intuition underlying the above definition is
obvious.
At this stage it is important to note that the above concepts of asymptotic
independence and uncorrelatedness, which restrict the memory of a stochastic
process, are not defined in terms of a stationary stochastic process but of a general
time-heterogeneous process. This is the reason why $\alpha(\tau)$ and $\rho(\tau)$ for $\tau \ge 1$
define only upper bounds for the two measures of dependence: were equality
used in their definition, they would depend on $(t_1, t_2, \ldots, t_T)$ as well as $\tau$.
A more general formulation of asymptotic independence can be achieved using
the concept of a $\sigma$-field generated by a random vector. Let $\mathcal{F}_1^t$ denote the $\sigma$-field
generated by $X_1, X_2, \ldots, X_t$, where $\{X_t,\ t \in \mathcal{T}\}$ is a stochastic process. A measure
of the dependence among the elements of the stochastic process can be defined
in terms of the events $B \in \mathcal{F}_1^t$ and $A \in \mathcal{F}_{t+\tau}^{\infty}$ by
$$\alpha(\tau) = \sup_{A \in \mathcal{F}_{t+\tau}^{\infty},\ B \in \mathcal{F}_1^t} |P(A \cap B) - P(A)P(B)|.$$

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be strongly mixing ($\alpha$-mixing)
if $\alpha(\tau) \to 0$ as $\tau \to \infty$.
A stronger form of mixing, sometimes called uniform mixing, can be defined
in terms of the following measure of dependence:
$$\varphi(\tau) = \sup_{A \in \mathcal{F}_{t+\tau}^{\infty},\ B \in \mathcal{F}_1^t} |P(A \mid B) - P(A)|, \quad P(B) > 0.$$

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be uniformly mixing ($\varphi$-mixing)
if $\varphi(\tau) \to 0$ as $\tau \to \infty$.

Looking at the two definitions of mixing we can see that $\alpha(\tau)$ and $\varphi(\tau)$ define
absolute and relative measures of temporal dependence, respectively. The former
is based on the dependence between two events $A$ and $B$ separated
by $\tau$ periods, measured in absolute terms by
$$P(A \cap B) - P(A)P(B),$$
and the latter on the relative measure
$$P(A \mid B) - P(A).$$
Because $\varphi(\tau) \ge \alpha(\tau)$ (why?), $\varphi$-mixing implies $\alpha$-mixing.
In the context of a weakly-stationary stochastic process, asymptotic uncorrelatedness
can be defined more intuitively in terms of the temporal covariance as
follows:
$$\mathrm{Cov}(X_t, X_{t+\tau}) = \gamma_\tau \to 0 \quad \text{as } \tau \to \infty.$$
A stronger form of such memory restriction is the so-called ergodicity property.
Ergodicity can be viewed as a condition which ensures that the memory of the
process, as measured by $\gamma_\tau$, "weakens by averaging over time".
Definition:
A necessary condition is in the nature of a prerequisite: suppose that a statement
$p$ is true only if another statement $q$ is true; then $q$ constitutes a necessary
condition of $p$. Symbolically, we express this as follows:
$$p \Longrightarrow q,$$
which is read:
1. "$p$ only if $q$"; or alternatively
2. "if $p$, then $q$". It is also logically correct to read it as
3. "$p$ implies $q$",
4. "$p$ is a stronger condition than $q$", and
5. $p \subset q$.
Definition:
A weakly-stationary stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be ergodic if
$$\sum_{\tau=0}^{\infty} |\gamma_\tau| < \infty.$$
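To see what "weakens by averaging over time" buys us (an added illustration; both processes below are arbitrary examples, not from the text), compare the time average of one realization of an ergodic stationary AR(1) process with that of a stationary but non-ergodic process whose level is a single random draw:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 100_000

# Ergodic example: stationary AR(1) with |phi| < 1; gamma_tau decays geometrically, so
# the time average of a single realization converges to the ensemble mean (here 0).
phi = 0.5
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.normal()
print("AR(1) time average:", x.mean())                 # close to 0

# Non-ergodic example: X_t = Z + u_t with Z drawn once; weakly stationary, but gamma_tau
# equals Var(Z) for every tau, so the time average converges to Z, not to the mean 0.
Z = rng.normal()
y = Z + rng.normal(size=T)
print("non-ergodic time average:", y.mean(), " Z =", Z)
```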
3 Some special stochastic processes
We will consider briefly several special stochastic processes which play an important
role in econometric modeling. These stochastic processes will be divided into
parametric and non-parametric processes. The non-parametric processes are defined
in terms of their joint distribution function or the first few joint moments.
On the other hand, parametric processes are defined in terms of a generating mechanism,
which is commonly a functional form based on a non-parametric process.
3.1 Non-Parametric processes
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a white-noise process if
(i) $E(X_t) = 0$;
(ii) $E(X_t X_\tau) = \begin{cases} \sigma^2 & \text{if } t = \tau, \\ 0 & \text{if } t \ne \tau. \end{cases}$

Hence, a white-noise process is both time-homogeneous, in view of the fact that it
is a weakly-stationary process, and has no memory. In the case where $\{X_t,\ t \in \mathcal{T}\}$
is also assumed to be normal, the process is also strictly stationary.
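A quick simulation check of the two defining properties (an added sketch; the Gaussian shocks and $\sigma = 2$ are arbitrary choices, as the definition does not require normality):

```python
import numpy as np

rng = np.random.default_rng(4)
T, sigma = 10_000, 2.0
u = rng.normal(scale=sigma, size=T)      # Gaussian white noise

print(u.mean())                          # approximately 0
print(u.var())                           # approximately sigma^2 = 4
for tau in (1, 2, 5):
    r = np.corrcoef(u[:-tau], u[tau:])[0, 1]
    print(f"sample autocorrelation at lag {tau}: {r:.4f}")   # approximately 0
```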
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a martingale process if ...

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be an innovation process if ...

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a Markov process if ...

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a Brownian motion process if ...
3.2 Parametric stochastic processes
3.2.1 (Weakly) Stationary Processes
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be an autoregressive process of order one
(AR(1)) if it satisfies the stochastic difference equation
$$X_t = \phi X_{t-1} + u_t,$$
where $\phi$ is a constant and $u_t$ is a white-noise process. We first consider the index
set $\mathcal{T} = \{0, \pm 1, \pm 2, \ldots\}$ and assume that $X_{-T} \to 0$ as $T \to \infty$.
Define the lag operator $L$ by
$$L X_t \equiv X_{t-1};$$
then the AR(1) process can be rewritten as
$$(1 - \phi L) X_t = u_t.$$
When $|\phi| < 1$, it can be inverted as
$$X_t = (1 - \phi L)^{-1} u_t = (1 + \phi L + \phi^2 L^2 + \cdots) u_t = u_t + \phi u_{t-1} + \phi^2 u_{t-2} + \cdots = \sum_{i=0}^{\infty} \phi^i u_{t-i},$$
from which we can deduce that
$$E(X_t) = 0,$$
$$E(X_t X_{t+\tau}) = E\left\{\left(\sum_{i=0}^{\infty} \phi^i u_{t-i}\right)\left(\sum_{j=0}^{\infty} \phi^j u_{t+\tau-j}\right)\right\} = \sigma_u^2 \left(\sum_{i=0}^{\infty} \phi^i \phi^{i+\tau}\right) = \sigma_u^2\, \phi^{\tau} \left(\sum_{i=0}^{\infty} \phi^{2i}\right), \quad \tau \ge 0.$$
Hence, for $|\phi| < 1$, the stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is both weakly-stationary
and asymptotically uncorrelated, since the autocovariance function
$$\gamma_\tau = \frac{\sigma_u^2\, \phi^{\tau}}{1 - \phi^2} \to 0 \quad \text{as } \tau \to \infty.$$
Therefore, for any finite subset of $\mathcal{T}$, say $t_1, t_2, \ldots, t_T$ (taken here as consecutive points),
of an AR(1) process, the vector $(X_{t_1}, X_{t_2}, \ldots, X_{t_T}) \equiv \mathbf{x}_T'$ has covariance matrix
$$E(\mathbf{x}_T \mathbf{x}_T') = \sigma_u^2\, \frac{1}{1 - \phi^2}
\begin{bmatrix}
1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\
\phi & 1 & \phi & \cdots & \phi^{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1
\end{bmatrix}
= \sigma_u^2\, \Omega,$$
where
$$\Omega = \frac{1}{1 - \phi^2}
\begin{bmatrix}
1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\
\phi & 1 & \phi & \cdots & \phi^{T-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1
\end{bmatrix}.$$
It is straightforward to show by direct multiplication that
$$P' P = \Omega^{-1},$$
for
$$P = \begin{bmatrix}
\sqrt{1 - \phi^2} & 0 & 0 & \cdots & 0 \\
-\phi & 1 & 0 & \cdots & 0 \\
0 & -\phi & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\phi & 1
\end{bmatrix}.$$
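A numerical verification of $P'P = \Omega^{-1}$ (an added sketch; $T = 6$ and $\phi = 0.6$ are arbitrary). This $P$ is the transformation matrix familiar from GLS-type arguments for models with AR(1) errors.

```python
import numpy as np

T, phi = 6, 0.6
idx = np.arange(T)

# Omega: entries phi^|i-j| / (1 - phi^2), the AR(1) covariance structure (sigma_u^2 factored out).
Omega = phi ** np.abs(np.subtract.outer(idx, idx)) / (1 - phi**2)

# P: first diagonal entry sqrt(1 - phi^2), remaining diagonal entries 1, subdiagonal entries -phi.
P = np.eye(T)
P[0, 0] = np.sqrt(1 - phi**2)
P[idx[1:], idx[:-1]] = -phi

print(np.allclose(P.T @ P, np.linalg.inv(Omega)))   # True: P'P = Omega^{-1}
```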
The above discussion of the AR(1) process generalizes directly to the AR(p)
process, where $p \ge 1$.

Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be an autoregressive process of order $p$
(AR($p$)) if it satisfies the stochastic difference equation
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + u_t,$$
where $\phi_1, \phi_2, \ldots, \phi_p$ are constants and $u_t$ is a white-noise process.
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be a moving average process of
order $q$ (MA($q$)) if it can be expressed in the form
$$X_t = u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q},$$
where $\theta_1, \theta_2, \ldots, \theta_q$ are constants and $u_t$ is a white-noise process.
That is, the white-noise process is used to build up the process $\{X_t,\ t \in \mathcal{T}\}$, each $X_t$
being a linear combination of the last $q$ $u_t$'s.
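One distinguishing feature worth illustrating (an added sketch; the coefficients below are arbitrary) is that an MA($q$) process has autocorrelations equal to zero at all lags beyond $q$:

```python
import numpy as np

rng = np.random.default_rng(6)
T, q = 200_000, 2
theta = np.array([1.0, 0.5, 0.3])            # (1, theta_1, theta_2): illustrative values

u = rng.normal(size=T + q)                   # white-noise shocks
# X_t = u_t + theta_1 u_{t-1} + theta_2 u_{t-2}; np.convolve applies the MA weights.
x = np.convolve(u, theta, mode="valid")

for tau in range(1, 5):
    r = np.corrcoef(x[:-tau], x[tau:])[0, 1]
    print(f"lag {tau}: {r:.3f}")             # non-zero for tau <= q, approximately 0 for tau > q
```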
Definition:
A stochastic process $\{X_t,\ t \in \mathcal{T}\}$ is said to be an autoregressive moving
average process of order $(p, q)$ (ARMA($p, q$)) if it can be expressed in the form
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q},$$
where $\phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q$ are constants and $u_t$ is a white-noise process.
Definition:
A stochastic process $\{Y_t,\ t \in \mathcal{T}\}$ is said to be a fractionally integrated
autoregressive moving average process of order $(p, d, q)$ (ARFIMA($p, d, q$))
if it can be expressed as a stationary ARMA($p, q$) process after being fractionally
differenced $d$ times:
$$(1 - L)^d Y_t = X_t,$$
and
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q},$$
where $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q$ are constants, $|d| < 0.5$ and $u_t$ is a white-noise
process.
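The fractional difference $(1-L)^d$ is defined through the binomial expansion $(1-L)^d = \sum_{k \ge 0} \pi_k L^k$, with $\pi_0 = 1$ and $\pi_k = \pi_{k-1}(k-1-d)/k$. A small sketch (added here, not part of the original text) computes these weights and checks that $d = 1$ reproduces ordinary first differencing:

```python
import numpy as np

def frac_diff_weights(d: float, n: int) -> np.ndarray:
    """Coefficients pi_k in (1 - L)^d = sum_k pi_k L^k, via the standard recursion."""
    pi = np.empty(n)
    pi[0] = 1.0
    for k in range(1, n):
        pi[k] = pi[k - 1] * (k - 1 - d) / k
    return pi

print(frac_diff_weights(0.3, 6))    # slowly decaying weights: the long-memory feature
print(frac_diff_weights(1.0, 6))    # d = 1 recovers first differencing: 1, -1, 0, 0, ...
```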
3.2.2 Non-Stationary Processes
Definition: unit root process
A stochastic process $\{Y_t,\ t \in \mathcal{T}\}$ is said to be an autoregressive integrated
moving average process of order $(p, 1, q)$ (ARIMA($p, 1, q$)) if it can be expressed
as a stationary ARMA($p, q$) process after first-differencing:
$$(1 - L) Y_t = X_t,$$
and
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \cdots + \theta_q u_{t-q},$$
where $\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q$ are constants and $u_t$ is a white-noise process.
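As a closing illustration (added here; the pure random walk, i.e. ARIMA(0, 1, 0), is the simplest case), the level $Y_t$ of an integrated process has a variance that grows with $t$, while its first difference is a stationary white-noise process:

```python
import numpy as np

rng = np.random.default_rng(7)
S, T = 5_000, 200

# ARIMA(0, 1, 0): a pure random walk Y_t = Y_{t-1} + u_t.
u = rng.normal(size=(S, T))
Y = np.cumsum(u, axis=1)

print(Y[:, 10].var(), Y[:, 100].var())   # variance grows roughly like t: non-stationary levels
X = np.diff(Y, axis=1)                   # (1 - L) Y_t = X_t recovers the white-noise shocks
print(X[:, 10].var(), X[:, 100].var())   # approximately constant (here about 1)
```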