16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde
9/30/2004

Lecture 7

Last time: Moments of the Poisson distribution from its generating function.

G(s) = e^{\mu(s-1)}

\frac{dG}{ds} = \mu e^{\mu(s-1)}, \qquad \left.\frac{dG}{ds}\right|_{s=1} = \mu

\frac{d^2G}{ds^2} = \mu^2 e^{\mu(s-1)}, \qquad \left.\frac{d^2G}{ds^2}\right|_{s=1} = \mu^2

\overline{X} = \left.\frac{dG}{ds}\right|_{s=1} = \mu

\overline{X^2} = \left.\frac{d^2G}{ds^2}\right|_{s=1} + \left.\frac{dG}{ds}\right|_{s=1} = \mu^2 + \mu

\sigma^2 = \overline{X^2} - \overline{X}^2 = \mu^2 + \mu - \mu^2 = \mu

Example: Using a telescope to measure the intensity of an object

Photon flux -> photoelectron flux. The number of photoelectrons is Poisson distributed. During an observation we count N photoelectron emissions; N is the measure of the signal.

\overline{N} = \mu = \lambda t, \qquad \sigma_N^2 = \mu = \lambda t

S = \overline{N} = \lambda t, \qquad \sigma_N = \sqrt{\lambda t}

\frac{S}{\sigma_N} = \frac{\lambda t}{\sqrt{\lambda t}} = \sqrt{\lambda t}

For a signal-to-noise ratio of 10, we require \overline{N} = 100 photoelectrons. All this follows from the property that the variance is equal to the mean. This is an unbounded experiment, whereas the binomial distribution is for a fixed number of trials n.

3. The Poisson Approximation to the Binomial Distribution

The binomial distribution, like the Poisson, is that of a random variable taking only non-negative integral values. Since it involves factorials, the binomial distribution is not very convenient for numerical application. We shall show under what conditions the Poisson expression serves as a good approximation to the binomial expression, and thus may be used for convenience.

b(k) = \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}

Consider a large number of trials, n, with small probability of success in each, p, such that the mean of the distribution, np, is of moderate magnitude.

Define \mu \equiv np, with n large and p small.

Recalling Stirling's formula,

n! \sim \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} \quad \text{as } n \to \infty

we have

b(k) = \frac{n!}{k!\,(n-k)!} \left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k}

\approx \frac{\sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}}{k!\, \sqrt{2\pi}\, (n-k)^{n-k+\frac{1}{2}} e^{-(n-k)}} \left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k}

As n becomes large relative to k,
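The photon-counting example above can be checked numerically. The following is a minimal Monte Carlo sketch (not from the lecture; the rate \lambda t = 100 and sample size are my own choices): because the Poisson variance equals the mean, the signal-to-noise ratio S/\sigma_N = \lambda t / \sqrt{\lambda t} = \sqrt{\lambda t}, which is 10 for a mean count of 100.

```python
# Monte Carlo sketch of the photon-counting example: for Poisson counts the
# variance equals the mean, so SNR = mean/std = sqrt(lambda * t).
import numpy as np

rng = np.random.default_rng(0)
mean_count = 100.0                       # lambda * t, chosen so SNR should be 10
counts = rng.poisson(lam=mean_count, size=200_000)

sample_mean = counts.mean()
sample_var = counts.var()
snr = sample_mean / counts.std()

print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_var:.2f}, SNR ~ {snr:.2f}")
```

The sample mean and variance should both come out near 100, and the estimated SNR near 10, consistent with the variance-equals-mean property.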
(n-k)^{n-k+\frac{1}{2}} = n^{n-k+\frac{1}{2}}\left(1-\frac{k}{n}\right)^{n-k+\frac{1}{2}} \approx n^{n-k+\frac{1}{2}} e^{-k}

so that

b(k) \approx \frac{n^k}{k!} \left(\frac{\mu}{n}\right)^k \left(1-\frac{\mu}{n}\right)^{n-k} \approx \frac{\mu^k}{k!} e^{-\mu}

The relative error in this approximation is of order of magnitude

\text{Rel. Error} \sim \frac{(k-\mu)^2}{n}

However, for values of k much smaller or larger than \mu, the probability itself becomes small.

The Normal Distribution

Outline:
1. Describe the common use of the normal distribution
2. The practical employment of the Central Limit Theorem
3. Relation to tabulated functions
   - Normal distribution function
   - Normal error function
   - Complementary error function

1. Describe the common use of the normal distribution

Normally distributed variables appear repeatedly in physical situations.
- Voltage across the plate of a vacuum tube
- Radar angle tracking noise
- Atmospheric gust velocity
- Wave height in the open sea

2. The practical employment of the Central Limit Theorem

Let X_i (i = 1, 2, \ldots, n) be independent random variables. Define the sum of these X_i as

S = \sum_{i=1}^{n} X_i, \qquad \overline{S} = \sum_{i=1}^{n} \overline{X}_i, \qquad \sigma_S^2 = \sum_{i=1}^{n} \sigma_{X_i}^2

Then under the condition

\beta_{X_i}^3 = \overline{\left|X_i - \overline{X}_i\right|^3}, \qquad \lim_{n \to \infty} \frac{1}{\sigma_S^3} \sum_{i=1}^{n} \beta_{X_i}^3 = 0

the limiting distribution of S is the normal distribution. Note that this is true for any distributions of the X_i. These are sufficient conditions under which the theorem can be proved; it is not clear that they are necessary.

Notice that each of the noises mentioned earlier depends on the accumulated effect of a great many small causes, e.g., the voltage across the plate: electrons traveling from cathode to plate.

It is convenient to work with the characteristic function since we are dealing with the sum of independent variables.

Normal probability density function:

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-m)^2}{2\sigma^2}}

Normal probability distribution function:

F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(u-m)^2}{2\sigma^2}}\, du = \int_{-\infty}^{\frac{x-m}{\sigma}} \frac{1}{\sqrt{2\pi}} e^{-\frac{v^2}{2}}\, dv

where m = \overline{X}, \quad v = \frac{u-m}{\sigma}, \quad dv = \frac{du}{\sigma}
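The Central Limit Theorem stated above can be illustrated numerically. This is a sketch with my own choices of summand distribution and sample sizes: the sum of 12 independent U(0,1) variables, shifted by its mean of 6, has variance 12 x (1/12) = 1, so it should be nearly N(0,1), with about 68.3% of samples falling within one standard deviation of zero.

```python
# Numerical illustration of the Central Limit Theorem: a shifted sum of 12
# uniform variables behaves very nearly like a standard normal variable.
import numpy as np

rng = np.random.default_rng(1)
n_terms, n_samples = 12, 200_000
s = rng.uniform(0.0, 1.0, size=(n_samples, n_terms)).sum(axis=1) - 6.0

frac_within_1sigma = np.mean(np.abs(s) < 1.0)
print(f"mean ~ {s.mean():.3f}, std ~ {s.std():.3f}")
print(f"fraction within 1 sigma: {frac_within_1sigma:.4f}")  # expect ~0.683
```

The distributions of the summands here are uniform, not normal, which is the point of the theorem: the limiting normality does not depend on the individual distributions.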
This integral with the integrand normalized is tabulated. It is called the normal probability function and symbolized \Phi:

\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-\frac{v^2}{2}}\, dv

This is a different x. Note the relationship between this and the quantity x previously defined; we use x again here because this is how \Phi is usually written. Not only this function but also its first several derivatives, which appear in analytic work, are tabulated.

3. Relation to tabulated functions

Even more generally available are the closely related functions:

Error function:

\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2}\, du

Complementary error function:

\operatorname{cerf}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-u^2}\, du

\Phi(x) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right]

Characteristic function of the normal distribution:

\phi(t) = \int_{-\infty}^{\infty} e^{jtx}\, \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-m)^2}{2\sigma^2}}\, dx

= e^{jtm} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} e^{jty} e^{-\frac{y^2}{2\sigma^2}}\, dy, \qquad \text{where } y = x - m

= e^{jtm} \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} (\cos ty + j \sin ty)\, e^{-\frac{y^2}{2\sigma^2}}\, dy

= e^{jtm} \frac{2}{\sqrt{2\pi}\,\sigma} \int_{0}^{\infty} \cos(ty)\, e^{-\frac{y^2}{2\sigma^2}}\, dy

= e^{jtm} e^{-\frac{\sigma^2 t^2}{2}}

Differentiation of this form will correctly yield the first two moments of the distribution.

Most important property of normal variables: any linear combination (weighted sum) of normal variables, whether independent or not, is another normal variable.

Note that for zero-mean variables

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}}, \qquad \phi(t) = e^{-\frac{\sigma^2 t^2}{2}}

Both are Gaussian forms.

The Normal Approximation to the Binomial Distribution

The binomial distribution deals with the outcomes of n independent trials of an experiment. Thus if n is large, we should expect the binomial distribution to be well approximated by the normal distribution. The approximation is given by the normal distribution having the same mean and variance. Thus

b(k, n, p) \approx \frac{1}{\sqrt{2\pi npq}}\, e^{-\frac{(k-np)^2}{2npq}}

The relative error is of the order of

\frac{(k-np)^3}{(npq)^2}
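The relation between \Phi and the error function quoted above maps directly onto the standard library, since Python's math module provides erf. A minimal sketch (the test values below are my own, not the lecture's):

```python
# Phi(x) = (1/2) * [1 + erf(x / sqrt(2))], using the tabulated error function
# available in Python's standard library.
import math

def Phi(x: float) -> float:
    """Normal probability function, built from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(Phi(0.0))    # 0.5 by symmetry
print(Phi(1.96))   # ~0.975, the familiar two-sided 95% point
```

This is exactly the role the tabulated functions played in 1960s practice: \Phi itself was rarely tabulated directly, but erf was widely available.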
The relative fit is good near the mean if npq is large, and degenerates in the tails where the probability itself is small.

The Normal Approximation to the Poisson Distribution

The Poisson distribution also depends on the outcomes of independent events. If there are enough of them,

P(k, \mu) \approx \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(k-\mu)^2}{2\mu}}

The relative error is of the order of

\frac{(k-\mu)^3}{\mu^2}

The relative fit is subject to the same behavior as the binomial approximation.

Interpretation of a continuous distribution approximating a discrete one: the value of the normal density function at any k approximates the value of the discrete distribution for that value of k. Think of spreading the area of each impulse over a unit interval. Then the height of each rectangle is the probability that the corresponding value of k will be taken. The normal curve approximates this step-wise function.

Note that in summing the probabilities for values of k in some interval, the approximating normal curve should be integrated over that interval plus 1/2 on each end to get all the probability associated with those values of k.

P(N_1 \le X \le N_2) = \sum_{k=N_1}^{N_2} P(k)

\sum_{k=N_1}^{N_2} P(k, \mu) \approx \int_{N_1-\frac{1}{2}}^{N_2+\frac{1}{2}} \frac{1}{\sqrt{2\pi\mu}}\, e^{-\frac{(x-\mu)^2}{2\mu}}\, dx = \Phi\!\left(\frac{N_2+\frac{1}{2}-\mu}{\sqrt{\mu}}\right) - \Phi\!\left(\frac{N_1-\frac{1}{2}-\mu}{\sqrt{\mu}}\right)

Multidimensional Normal Distribution

Probability density function:

f(x) = \frac{1}{(2\pi)^{\frac{n}{2}}\, |M|^{\frac{1}{2}}} \exp\!\left[-\frac{1}{2}(x-\overline{X})^{T} M^{-1} (x-\overline{X})\right]

Assuming zero mean, which is often the case:

f(x) = \frac{1}{(2\pi)^{\frac{n}{2}}\, |M|^{\frac{1}{2}}} \exp\!\left[-\frac{1}{2} x^{T} M^{-1} x\right]

For zero-mean variables, contours of constant probability density are given by

x^{T} M^{-1} x = c^2

These contours are not expressed in principal coordinates if the X_i are correlated. We need to know the rudimentary properties of eigenvalues and eigenvectors. M is symmetric and full rank:

M v_i = \lambda_i v_i, \qquad \lambda_i \ne 0

v_i^{T} v_j = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \ne j \end{cases}
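The continuity correction above can be checked numerically. This sketch (the value \mu = 100 and the interval [90, 110] are my own choices) sums the exact Poisson probabilities for N_1 <= k <= N_2 and compares them with the normal integral taken from N_1 - 1/2 to N_2 + 1/2:

```python
# Check of the continuity-corrected normal approximation to a Poisson sum.
import math

def Phi(x: float) -> float:
    """Normal probability function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, N1, N2 = 100.0, 90, 110

# Exact Poisson sum, with P(k) built up recursively to avoid huge factorials.
p = math.exp(-mu)              # P(0)
exact = p if N1 == 0 else 0.0
for k in range(1, N2 + 1):
    p *= mu / k                # P(k) = P(k-1) * mu / k
    if k >= N1:
        exact += p

s = math.sqrt(mu)
approx = Phi((N2 + 0.5 - mu) / s) - Phi((N1 - 0.5 - mu) / s)

print(f"exact = {exact:.4f}, normal approx = {approx:.4f}")
```

Integrating only from N_1 to N_2, without the extra half-unit on each end, would omit part of the probability belonging to the endpoint values of k, which is the point of the correction.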
This probability density function can be better visualized in terms of its principal coordinates. These coordinates are defined by the directions of the eigenvectors of the covariance matrix. The appropriate transformation is

y = V x, \qquad V = \begin{bmatrix} v_1^{T} \\ \vdots \\ v_n^{T} \end{bmatrix}

Thus y_i is the component of x in the direction v_i. In terms of the new variable y, the contours of constant probability density are

x^{T} M^{-1} x = y^{T} V M^{-1} V^{T} y = y^{T} Y^{-1} y

Y^{-1} = V M^{-1} V^{T}, \qquad Y = V M V^{T}

Y = V M V^{T} = \begin{bmatrix} v_1^{T} \\ \vdots \\ v_n^{T} \end{bmatrix} \begin{bmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{bmatrix} = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}

Y is the covariance matrix for the random variable Y = V X, so the \lambda_i are the variances of the Y_i.

Y^{-1} = \begin{bmatrix} \frac{1}{\lambda_1} & & \\ & \ddots & \\ & & \frac{1}{\lambda_n} \end{bmatrix}

y^{T} Y^{-1} y = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_n^2}{\lambda_n} = c^2

These are the principal coordinates, with intercepts at y_i = \pm \sqrt{\lambda_i}\, c, where \sqrt{\lambda_i} is the standard deviation of y_i.

Note that two random variables, each having a normal distribution singly, do not necessarily have a binormal joint distribution. However, if the random variables are independent and normally distributed, their joint distribution is clearly a multidimensional normal distribution.

Two-dimensional case, continued:

m_{ij} = \overline{X_i X_j}

f(x_1, x_2) = \frac{1}{2\pi\sqrt{m_{11}m_{22} - m_{12}^2}} \exp\!\left[-\frac{m_{22}x_1^2 - 2m_{12}x_1x_2 + m_{11}x_2^2}{2(m_{11}m_{22} - m_{12}^2)}\right]

In terms of the symbols

\sigma_1^2 = m_{11}, \qquad \sigma_2^2 = m_{22}, \qquad \rho = \frac{m_{12}}{\sigma_1 \sigma_2}

this becomes

f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1}{\sigma_1}\right)^2 - 2\rho\,\frac{x_1 x_2}{\sigma_1\sigma_2} + \left(\frac{x_2}{\sigma_2}\right)^2\right]\right\}

Note that if a set of random variables having the multidimensional normal distribution is uncorrelated, they are independent. This is not true in general.
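The principal-coordinate transformation can be sketched numerically for a two-dimensional case. The covariance matrix below is my own example; note that numpy.linalg.eigh returns the orthonormal eigenvectors as the columns of Q, so the lecture's V (rows v_i^T) is Q transposed, and V M V^T should come out as the diagonal matrix of eigenvalues:

```python
# Principal coordinates of a 2-D zero-mean normal: diagonalize the covariance
# matrix and verify that the contour quadratic form separates as sum y_i^2/lam_i.
import numpy as np

M = np.array([[4.0, 1.2],
              [1.2, 1.0]])             # covariance matrix (symmetric, full rank)

lam, Q = np.linalg.eigh(M)             # M @ Q[:, i] = lam[i] * Q[:, i]
V = Q.T                                # rows of V are the eigenvectors v_i^T

Y = V @ M @ V.T                        # should equal diag(lam_1, lam_2)
print(np.round(Y, 10))

# The contour x^T M^{-1} x = c^2 becomes sum_i y_i^2 / lam_i = c^2 in y = V x:
x = np.array([1.0, -0.5])
y = V @ x
lhs = x @ np.linalg.inv(M) @ x
rhs = np.sum(y**2 / lam)
print(lhs, rhs)                        # equal up to rounding
```

The eigenvalues lam are the variances of the transformed variables, so the ellipse intercepts sit at +/- sqrt(lam_i) * c along each principal axis, as derived above.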