16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde
9/30/2004 9:55 AM Page 1 of 10
Lecture 7
Last time: Moments of the Poisson distribution from its generating function.
$$G(s) = e^{\mu(s-1)}$$
$$\frac{dG}{ds} = \mu e^{\mu(s-1)}$$
$$\frac{d^2G}{ds^2} = \mu^2 e^{\mu(s-1)}$$
$$\overline{X} = \left.\frac{dG}{ds}\right|_{s=1} = \mu$$
$$\overline{X^2} - \overline{X} = \left.\frac{d^2G}{ds^2}\right|_{s=1} = \mu^2 \quad\Rightarrow\quad \overline{X^2} = \mu^2 + \mu$$
$$\sigma^2 = \overline{X^2} - \overline{X}^2 = \mu^2 + \mu - \mu^2 = \mu$$
Example: using a telescope to measure the intensity of an object.
The photon flux produces a photoelectron flux, and the number of photoelectrons
is Poisson distributed. During an observation we accumulate N photoelectron
emissions; N is the measure of the signal.
$$S = \overline{N} = \lambda t$$
$$\sigma_N^2 = \mu_N = \lambda t \quad\Rightarrow\quad \sigma_N = \sqrt{\lambda t}$$
$$\frac{S}{\sigma_N} = \frac{\lambda t}{\sqrt{\lambda t}} = \sqrt{\lambda t} = \sqrt{\overline{N}}$$
For a signal-to-noise ratio of 10, we require N = 100 photoelectrons. All this follows
from the property that the variance is equal to the mean.
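The counting-statistics argument above can be sketched in a few lines; the rate and observation time below are illustrative values chosen so that $\lambda t = 100$.

```python
import math

# For Poisson counts the variance equals the mean, so with mean count N
# the noise is sqrt(N) and S/N = sqrt(N).  lam and t are assumed values.
lam = 20.0   # photoelectron rate, per second (illustrative)
t = 5.0      # observation time, seconds (illustrative)

N = lam * t             # mean number of photoelectrons counted
noise = math.sqrt(N)    # standard deviation of the count
snr = N / noise         # signal-to-noise ratio = sqrt(lam * t)
print(N, snr)           # 100.0, 10.0
```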
This is an unbounded counting experiment, whereas the binomial distribution applies
to a fixed number n of trials.
3. The Poisson Approximation to the Binomial Distribution
The binomial distribution, like the Poisson, is that of a random variable taking
only nonnegative integer values. Since it involves factorials, the binomial
distribution is not very convenient for numerical application.
We shall show under what conditions the Poisson expression serves as a good
approximation to the binomial expression – and thus may be used for
convenience.
$$b(k) = \frac{n!}{k!(n-k)!}\, p^k (1-p)^{n-k}$$
Consider a large number of trials, n, with small probability of success in each, p,
such that the mean of the distribution, np, is of moderate magnitude.
Define $\mu \equiv np$, with $n$ large and $p$ small.

Recalling:
$$n! \sim \sqrt{2\pi n}\, n^n e^{-n} \qquad \text{(Stirling's formula)}$$
$$\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^{x}$$

$$b(k) = \frac{n!}{k!\,(n-k)!}\left(\frac{\mu}{n}\right)^k\left(1-\frac{\mu}{n}\right)^{n-k}$$
$$\approx \frac{\sqrt{2\pi n}\, n^n e^{-n}}{k!\,\sqrt{2\pi (n-k)}\,(n-k)^{n-k}\, e^{-(n-k)}}\left(\frac{\mu}{n}\right)^k\left(1-\frac{\mu}{n}\right)^{n-k}$$
$$= \frac{\mu^k}{k!}\sqrt{\frac{n}{n-k}}\left(1-\frac{k}{n}\right)^{-(n-k)} e^{-k}\left(1-\frac{\mu}{n}\right)^{n-k}$$

As $n$ becomes large relative to $k$,
$$\sqrt{\frac{n}{n-k}} \to 1, \qquad \left(1-\frac{k}{n}\right)^{-(n-k)} \to e^{k}, \qquad \left(1-\frac{\mu}{n}\right)^{n-k} \to e^{-\mu}$$
so
$$b(k) \approx \frac{\mu^k e^{-\mu}}{k!}$$
The relative error in this approximation is of order of magnitude
$$\text{Rel. Error} \sim \frac{(k-\mu)^2}{n}$$
However, for values of k much smaller or larger than μ , the probability becomes
small.
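The quality of this approximation can be sketched numerically; $n = 1000$ and $p = 0.005$ are illustrative choices (not from the notes), giving $\mu = 5$.

```python
import math

# Compare the exact binomial pmf with its Poisson approximation
# mu^k e^{-mu} / k!, where mu = n p.  n large, p small (illustrative).
n, p = 1000, 0.005
mu = n * p

rows = []
for k in range(12):
    b = math.comb(n, k) * p**k * (1 - p)**(n - k)       # exact binomial
    pois = mu**k * math.exp(-mu) / math.factorial(k)    # Poisson approximation
    rows.append((k, b, pois))
    print(k, b, pois)
# Agreement is best near k = mu; the relative error grows like (k - mu)^2 / n.
```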
The Normal Distribution
Outline:
1. Describe the common use of the normal distribution
2. The practical employment of the Central Limit Theorem
3. Relation to tabulated functions
Normal distribution function
Normal error function
Complementary error function
1. Describe the common use of the normal distribution
Normally distributed variables appear repeatedly in physical situations.
- Voltage across the plate of a vacuum tube
- Radar angle tracking noise
- Atmospheric gust velocity
- Wave height in the open sea
2. The practical employment of the Central Limit Theorem
Let $X_i\ (i = 1, 2, \dots, n)$ be independent random variables. Define the sum of these $X_i$ as
$$S = \sum_{i=1}^{n} X_i, \qquad \overline{S} = \sum_{i=1}^{n} \overline{X_i}, \qquad \sigma_S^2 = \sum_{i=1}^{n} \sigma_i^2$$
Then under the condition
$$\beta_i^3 = \overline{\left|X_i - \overline{X_i}\right|^3}, \qquad \beta^3 = \sum_{i=1}^{n} \beta_i^3, \qquad \lim_{n\to\infty} \frac{\beta}{\sigma_S} = 0$$
the limiting distribution of $S$ is the normal distribution. Note that this is true for
any distributions of the $X_i$.
These are sufficient conditions under which the theorem can be proved. It is not
clear that they are necessary.
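The theorem's practical effect can be sketched by summing independent uniform variables, which certainly satisfy the stated condition; the choices of $n = 30$ and 20,000 trials are arbitrary illustrative values.

```python
import random
import statistics

# Sums of independent uniform(0,1) variables look increasingly normal.
# n and the trial count are illustrative.
random.seed(1)
n, trials = 30, 20000
sums = [sum(random.random() for _ in range(n)) for _ in range(trials)]

m = statistics.fmean(sums)   # near n/2 = 15
s = statistics.stdev(sums)   # near sqrt(n/12) ~ 1.58
# Fraction of sums within one standard deviation of the mean:
frac = sum(abs(x - m) < s for x in sums) / trials
print(m, s, frac)            # frac is near 0.683, as for a normal variable
```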
Notice that each of the noises mentioned earlier depends on the accumulated
effect of a great many small causes, e.g., the voltage across a plate: electrons
traveling from cathode to plate.
It is convenient to work with the characteristic function since we are dealing with
the sum of independent variables.
Normal probability density function:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}$$
Normal probability distribution function:
$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(u-m)^2}{2\sigma^2}}\, du = \int_{-\infty}^{\frac{x-m}{\sigma}} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv$$
Where:
$$m = \overline{X}, \qquad v = \frac{u-m}{\sigma}, \qquad dv = \frac{du}{\sigma}$$
This integral with the integrand normalized is tabulated. It is called the normal
probability function and symbolized with Φ.
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{v^2}{2}}\, dv$$
This is a different x. Note the relationship between this and the quantity x
previously defined. We use x again here as this is how Φ is usually written.
Not only this function but also its first several derivatives, which appear in
analytic work, are tabulated.
3. Relation to tabulated functions
Even more generally available are the closely related functions:
Error function:
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\, du$$
Complementary error function:
$$\operatorname{cerf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\, du$$
$$\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right]$$
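The relation above can be exercised directly with the standard-library error function; the evaluation points below are illustrative.

```python
import math

# Phi(x) = (1/2)[1 + erf(x / sqrt(2))], using the standard-library erf.
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(Phi(0.0))    # 0.5
print(Phi(1.0))    # ~0.8413, the familiar one-sigma value
print(Phi(-1.96))  # ~0.025
```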
$$\phi(t) = \int_{-\infty}^{\infty} e^{jtx}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx$$
$$= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{jt(m+y)}\, e^{-\frac{y^2}{2\sigma^2}}\, dy, \qquad \text{where } y = x - m$$
$$= e^{jtm}\, \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (\cos ty + j \sin ty)\, e^{-\frac{y^2}{2\sigma^2}}\, dy$$
$$= e^{jtm}\, \frac{2}{\sqrt{2\pi}\,\sigma} \int_{0}^{\infty} \cos(ty)\, e^{-\frac{y^2}{2\sigma^2}}\, dy \qquad \text{(the sine term vanishes by symmetry)}$$
$$= e^{jtm}\, e^{-\frac{\sigma^2 t^2}{2}}$$
Differentiation of this form will correctly yield the first two moments of the
distribution.
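The characteristic-function result can be spot-checked by Monte Carlo, as a sketch assuming illustrative values $m = 1$, $\sigma = 2$, $t = 0.5$.

```python
import cmath
import random

# Estimate phi(t) = E[e^{jtX}] for X ~ N(m, sigma^2) by sampling, and
# compare with e^{jtm - sigma^2 t^2 / 2}.  m, sigma, t are illustrative.
random.seed(2)
m, sigma, t = 1.0, 2.0, 0.5
n = 200000

est = sum(cmath.exp(1j * t * random.gauss(m, sigma)) for _ in range(n)) / n
exact = cmath.exp(1j * t * m - sigma**2 * t**2 / 2)
print(est, exact)   # the two agree to within Monte Carlo noise
```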
Most important property of normal variables: any linear combination (weighted
sum) of normal variables, whether independent or not, is another normal
variable.
Note that for zero mean variables
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}}, \qquad \phi(t) = e^{-\frac{\sigma^2 t^2}{2}}$$
Both are Gaussian forms.
The Normal Approximation to the Binomial Distribution
The binomial distribution deals with the outcomes of n independent trials of an
experiment. Thus if n is large, we should expect the binomial distribution to be
well approximated by the normal distribution. The approximation is given by
the normal distribution having the same mean and variance. Thus
$$b(k, n, p) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}}$$
Relative error is of the order of
$$\frac{(k-np)^3}{(npq)^2}$$
The relative fit is good near the mean if npq is large and degenerates in the tails
where the probability itself is small.
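This behavior can be sketched numerically; $n = 400$, $p = 0.5$ are illustrative values for which $npq = 100$ is large.

```python
import math

# Compare b(k; n, p) with the normal density of the same mean np and
# variance npq.  n and p are illustrative.
n, p = 400, 0.5
q = 1 - p

pairs = {}
for k in (180, 190, 200, 210, 220):
    b = math.comb(n, k) * p**k * q**(n - k)             # exact binomial
    approx = (math.exp(-(k - n * p)**2 / (2 * n * p * q))
              / math.sqrt(2 * math.pi * n * p * q))     # normal approximation
    pairs[k] = (b, approx)
    print(k, b, approx)
# The fit is good near the mean k = np = 200 and degrades in the tails.
```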
The Normal Approximation to the Poisson Distribution
Also the Poisson distribution depends on the outcomes of independent events. If
there are enough of them,
$$P(k, \mu) \approx \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(k-\mu)^2}{2\mu}}$$
Relative error is of the order of
$$\frac{(k-\mu)^3}{\mu^2}$$
The relative fit is subject to the same behavior as the binomial approximation.
Interpretation of a continuous distribution approximating a discrete one:
The value of the normal density function at any k approximates the value of the
discrete distribution for that value of k. Think of spreading the area of each
impulse over a unit interval. Then the height of each rectangle is the probability
that the corresponding value of k will be taken. The normal curve approximates
this step-wise function.
Note that in summing the probabilities for values of k in some interval, the
approximating normal curve should be integrated over that interval plus ½ on
each end to get all the probability associated with those values of k.
$$P(N_1 \le X \le N_2) = \sum_{k=N_1}^{N_2} P(k)$$
$$\approx \int_{N_1 - \frac{1}{2}}^{N_2 + \frac{1}{2}} \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(x-\mu)^2}{2\mu}}\, dx = \Phi\!\left(\frac{N_2 + \frac{1}{2} - \mu}{\sqrt{\mu}}\right) - \Phi\!\left(\frac{N_1 - \frac{1}{2} - \mu}{\sqrt{\mu}}\right)$$
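The continuity correction above can be checked against an exact Poisson sum; $\mu = 25$ and the interval $[20, 30]$ are illustrative choices.

```python
import math

# Exact Poisson probability of N1 <= k <= N2 versus the normal
# approximation integrated from N1 - 1/2 to N2 + 1/2.
mu, N1, N2 = 25.0, 20, 30

exact = sum(mu**k * math.exp(-mu) / math.factorial(k)
            for k in range(N1, N2 + 1))

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

s = math.sqrt(mu)
approx = Phi((N2 + 0.5 - mu) / s) - Phi((N1 - 0.5 - mu) / s)
print(exact, approx)   # close: the half-unit ends capture the boundary terms
```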
Multidimensional Normal Distribution
Probability density function:
$$f(x) = \frac{1}{(2\pi)^{n/2} |M|^{1/2}} \exp\left[-\frac{1}{2}(x - \overline{X})^{T} M^{-1} (x - \overline{X})\right]$$
Assuming zero mean, which is often the case:
$$f(x) = \frac{1}{(2\pi)^{n/2} |M|^{1/2}} \exp\left[-\frac{1}{2} x^{T} M^{-1} x\right]$$
For zero mean variables: contours of constant probability density are given by:
$$x^{T} M^{-1} x = c^2$$
This is not expressed in principal coordinates if the $X_i$ are correlated.
Need to know the rudimentary properties of eigenvalues and eigenvectors.
M is symmetric and full rank.
$$M v_i = \lambda_i v_i, \qquad v_i^{T} v_j = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}$$
This probability density function can be better visualized in terms of its principal
coordinates. These coordinates are defined by the directions of the eigenvectors
of the covariance matrix. The appropriate transformation is
$$y = V x, \qquad V = \begin{bmatrix} v_1^{T} \\ \vdots \\ v_n^{T} \end{bmatrix}$$
Thus $y_i$ is the component of $x$ in the direction $v_i$.
(In terms of the new variable y , the contours of constant probability density are)
$$x^{T} M^{-1} x = y^{T} V M^{-1} V^{T} y = y^{T} Y^{-1} y$$
$$Y^{-1} = V M^{-1} V^{T}, \qquad Y = V M V^{T}$$
$$M V^{T} = \begin{bmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{bmatrix}$$
$$Y = V M V^{T} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}$$
$Y$ is the covariance matrix for the random variable $Y = VX$, so the $\lambda_i$ are the
variances of the $Y_i$.
$$Y^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_n} \end{bmatrix}$$
$$y^{T} Y^{-1} y = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_n^2}{\lambda_n} = c^2$$
These are ellipsoids in the principal coordinates, with intercepts at $y_i = \pm \sqrt{\lambda_i}\, c$, where $\sqrt{\lambda_i}$ is the standard deviation of $y_i$.
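The principal-coordinate transformation can be sketched numerically; the 2×2 covariance matrix below is an assumed example. Note that `numpy.linalg.eigh` returns eigenvectors as the *columns* of its matrix, so the lecture's row-stacked $V$ corresponds to its transpose.

```python
import numpy as np

# Diagonalize an assumed 2x2 covariance matrix M with correlated components.
M = np.array([[4.0, 1.5],
              [1.5, 1.0]])

lam, V = np.linalg.eigh(M)   # columns of V are orthonormal eigenvectors
# In principal coordinates y = V.T @ x the covariance is diagonal:
Y = V.T @ M @ V
print(lam)                # eigenvalues = variances along the principal axes
print(np.round(Y, 10))    # diagonal matrix diag(lam)
```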
Note that two random variables, each having a normal distribution singly, do not
necessarily have a binormal joint distribution.
However, if the random variables are independent and normally distributed,
their joint distribution is clearly a multidimensional normal distribution.
Two-dimensional case (continued):
$$m_{ij} = \overline{X_i X_j}$$
$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{m_{11} m_{22} - m_{12}^2}} \exp\left[-\frac{m_{22} x_1^2 - 2 m_{12} x_1 x_2 + m_{11} x_2^2}{2\left(m_{11} m_{22} - m_{12}^2\right)}\right]$$
Define:
$$\sigma_1^2 = m_{11}, \qquad \sigma_2^2 = m_{22}, \qquad \rho = \frac{m_{12}}{\sigma_1 \sigma_2}$$
In terms of these symbols:
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x_1^2}{\sigma_1^2} - \frac{2\rho x_1 x_2}{\sigma_1 \sigma_2} + \frac{x_2^2}{\sigma_2^2}\right)\right]$$
Note that if a set of random variables having the multidimensional normal
distribution is uncorrelated, they are independent; for general distributions,
uncorrelatedness does not imply independence.
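This remark can be illustrated with the two-dimensional density above: with $\rho = 0$ it factors into the product of its marginals. The $\sigma$ values and the evaluation point below are illustrative.

```python
import math

# With rho = 0 the bivariate normal density equals the product of the two
# marginal normal densities, i.e. the variables are independent.
sigma1, sigma2, rho = 1.5, 0.7, 0.0

def f_joint(x1, x2):
    c = 2 * math.pi * sigma1 * sigma2 * math.sqrt(1 - rho**2)
    q = ((x1 / sigma1)**2 - 2 * rho * (x1 / sigma1) * (x2 / sigma2)
         + (x2 / sigma2)**2) / (2 * (1 - rho**2))
    return math.exp(-q) / c

def f_marg(x, sigma):
    return math.exp(-x**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

x1, x2 = 0.8, -0.3   # arbitrary evaluation point
print(f_joint(x1, x2), f_marg(x1, sigma1) * f_marg(x2, sigma2))  # equal
```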