16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 3

Last time: Use of Bayes' rule to find the probability of each outcome in a set of alternatives, $P(A_k \mid E_1, E_2, \ldots, E_n)$.

In the special case when the E's are conditionally independent (though they all depend on the alternative, $A_k$),

$$P(A_k \mid E_1, E_2, \ldots, E_n) = \frac{P(A_k \mid E_1 \ldots E_{n-1})\, P(E_n \mid A_k)}{\sum_i P(A_i \mid E_1 \ldots E_{n-1})\, P(E_n \mid A_i)}$$

This is easy to do and can be done recursively.

Take $E_1$:

$$P(A_k \mid E_1) = \frac{P(A_k)\, P(E_1 \mid A_k)}{\sum_i P(A_i)\, P(E_1 \mid A_i)}$$

Take $E_2$:

$$P(A_k \mid E_1 E_2) = \frac{P(A_k \mid E_1)\, P(E_2 \mid A_k)}{\sum_i P(A_i \mid E_1)\, P(E_2 \mid A_i)}$$

and so on: a recursive application of Bayes' rule.

Random Variables

Start with a conceptual random experiment:
• Continuous-valued random variable
• Discrete-valued random variable

The random variable is defined to be a function of the outcomes of the random experiment.

Example: X = the number of spots on the top face of a die.

How to characterize a random variable?

Probability distribution function (or cumulative probability function)

$$F(x) = P(X \le x),$$

where x is the argument and X is the random variable name.

Properties:

$$F(-\infty) = 0$$
$$F(\infty) = 1$$
$$0 \le F(x) \le 1$$
$$F(b) \ge F(a), \quad \text{if } b > a$$

[Figure panels: Continuous random variable; Discrete random variable]

An alternate way to characterize random variables is the probability density function:

$$f(x) = \frac{dF(x)}{dx}$$
$$f(x) \ge 0, \quad \forall x$$
$$F(x) = F(-\infty) + \int_{-\infty}^{x} f(u)\,du = \int_{-\infty}^{x} f(u)\,du$$

$$P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a) = \int_{-\infty}^{b} f(x)\,dx - \int_{-\infty}^{a} f(x)\,dx = \int_{a}^{b} f(x)\,dx$$

$$F(\infty) = \int_{-\infty}^{\infty} f(u)\,du = 1$$

The density function is always a non-negative function, which integrates over the full range to 1.

Consider the probability that an observation lies in a narrow interval between x and x+dx.
$$\lim_{dx \to 0} P(x < X \le x + dx) = \lim_{dx \to 0} \int_{x}^{x+dx} f(u)\,du = \lim_{dx \to 0} f(x)\,dx$$

This can be used to measure f(x). We see clearly from this that a random variable with a continuous distribution function takes any prescribed value with probability zero (let $dx \to 0$). We shall also use this relation in setting up problem solutions, to indicate the probability of X taking a value in an infinitesimal interval near x.

$$f(x) = \lim_{\Delta x \to 0} \frac{P(x < X \le x + \Delta x)}{\Delta x}$$

When f(x) is measured this way from data, the interval width $\Delta x$ has to be chosen appropriately. If the intervals are too small, too few samples lie in each one and the sampling error is too great. If the intervals are too large, the approximation made in the limit no longer applies.

Discrete variable

Using the Dirac delta function

$$\delta(x) = 0, \quad x \ne 0$$
$$\int_{0^-}^{0^+} \delta(x)\,dx = 1$$

A unit impulse located at $x_i$ is written $\delta(x - x_i)$. With it, the density function of a discrete random variable is

$$f(x) = \sum_i P(X = x_i)\, \delta(x - x_i)$$

Expectation

The expectation or expected value of a function of a random variable is defined as:

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx \equiv \text{mean value of } X$$
$$\phantom{E(X)} = \sum_i x_i P(X = x_i) \quad \text{for a discrete-valued random variable } X$$

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
$$\phantom{E[g(X)]} = \sum_i g(x_i) P(X = x_i) \quad \text{for a discrete-valued random variable } X$$

Some common expectations which partially characterize a random variable are the moments of the distribution:

1st: Mean or mean value

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \bar{X}, \quad \text{a deterministic quantity}$$

2nd: Mean squared value

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \overline{X^2}$$

These are the first two moments of the distribution. The complete infinite set of moments, when all of them exist, would in general completely characterize the random variable.

We also define central moments (moments about the mean).
The first non-zero one is the variance (the second central moment, since the first central moment is identically zero):

Variance

$$E[(X - \bar{X})^2] = \int_{-\infty}^{\infty} (x - \bar{X})^2 f(x)\,dx$$
$$= \int_{-\infty}^{\infty} x^2 f(x)\,dx - 2\bar{X} \int_{-\infty}^{\infty} x f(x)\,dx + \bar{X}^2 \int_{-\infty}^{\infty} f(x)\,dx$$
$$= \overline{X^2} - 2\bar{X}^2 + \bar{X}^2$$
$$= \overline{X^2} - \bar{X}^2 \equiv \sigma^2$$

Standard deviation: $\sigma = \sqrt{\text{Variance}}$

$\bar{X}$ indicates the center of the distribution.
$\sigma$ indicates the width of the distribution.

The variance of a random variable is the mean squared value of its deviation from its mean. In some sense then, the variance must measure the probability of stated deviations from the mean. A very general quantitative statement of this is known as Chebyshev's Inequality:

$$P(|X - \bar{X}| \ge t) \le \frac{\sigma^2}{t^2}$$

This is a very general result in that it holds for any distribution of X. Note that it gives only an upper bound on the probability of exceeding a given deviation from the mean. Because the result is completely general, we should not expect the bound to be quantitatively tight. Because of its generality, it is a powerful analytical tool. For quantitative practical work it is preferable, if possible, to estimate the distribution of X and use it to calculate the probability indicated above.

Characteristic Function

1. Definition of Characteristic Function
2. Inverse relation
3. Characteristic function and inverse for discrete distributions
4. Prove $\phi_x(t) = \prod_i \phi_{x_i}(t)$ where $X = X_1 + X_2 + \ldots + X_n$ ($X_i$ independent)
5. MacLaurin series expansion of $\phi(t)$

1. Definition of Characteristic Function

$$\phi(t) = E(e^{jtX}) = \int_{-\infty}^{\infty} e^{jtx} f_x(x)\,dx$$

The characteristic function of a random variable X with density function $f_x(x)$ is defined to be the expectation of $e^{jtX}$; the integral expression then follows from the relation for the expectation of a function of a random variable. The main purpose in defining the characteristic function is its utility in the treatment of many problems.

2.
Inverse relation

The expression for $\phi(t)$ is in the form of the inverse Fourier transform of the density function of X; thus by the theory of Fourier transforms the direct relation must hold and the functions involved are uniquely related. Note that the integral defining $\phi(t)$ always converges, since $|e^{jtx}| = 1$ and $f_x(x)$ integrates to 1.

$$f_x(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jxt} \phi(t)\,dt$$

So $f_x(x)$ and $\phi(t)$ contain the same information.

3. Characteristic function and inverse for discrete distributions

For discrete distributions, the density function is the sum of terms of the type

$$f_i(x) = p_i\, \delta(x - x_i)$$

For each such term, the characteristic function has an additive term of the form

$$\phi_i(t) = \int_{-\infty}^{\infty} e^{jtx} f_i(x)\,dx$$
$$= \int_{-\infty}^{\infty} e^{jtx} p_i\, \delta(x - x_i)\,dx$$
$$= p_i e^{jtx_i}$$

If the inverse relation is to make sense in this case we must have

$$f_i(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jxt} \phi_i(t)\,dt$$
$$= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jxt} p_i e^{jtx_i}\,dt$$
$$= p_i \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jt(x - x_i)}\,dt = p_i\, \delta(x - x_i)$$

This requires

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-jt(x - x_i)}\,dt = \delta(x - x_i),$$

which is true of any use of a delta function in transform operations. This is also true for the opposite transform pair, where the exponent of e is positive.
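The discrete-distribution result above, that each impulse $p_i\,\delta(x - x_i)$ contributes a term $p_i e^{jtx_i}$ to the characteristic function, can be checked numerically. The following is a minimal sketch, not part of the lecture: the function names `char_fn` and `sample_char_fn`, the choice of a fair die as the test distribution, the evaluation point `t = 0.7`, and the sample size are all assumptions made for illustration. It compares the closed-form sum with a sample average of $e^{jtX}$, which is the definition of $\phi(t)$ as an expectation.

```python
import cmath
import random


def char_fn(values, probs, t):
    """Closed form for a discrete distribution: phi(t) = sum_i p_i * exp(j*t*x_i)."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))


def sample_char_fn(samples, t):
    """Sample-average estimate of E[exp(j*t*X)] from observed values of X."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


# Assumed test case: X = number of spots on the top face of a fair die.
faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Fixed seed so the run is repeatable (an arbitrary choice).
rng = random.Random(0)
samples = [rng.choice(faces) for _ in range(20000)]

t = 0.7  # arbitrary evaluation point
phi_exact = char_fn(faces, probs, t)
phi_mc = sample_char_fn(samples, t)

print("closed form :", phi_exact)
print("sample mean :", phi_mc)
print("difference  :", abs(phi_exact - phi_mc))
```

Two sanity checks follow directly from the definition: `char_fn(faces, probs, 0.0)` equals 1 because the probabilities sum to 1, and the magnitude of `char_fn` never exceeds 1 because $|e^{jtx}| = 1$, which is the same reason the defining integral always converges.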