16.36: Communication Systems Engineering
Lecture 2: Entropy
Eytan Modiano

Information content of a random variable

• Random variable X
  – Outcome of a random experiment
  – A discrete R.V. takes on values from a finite set of possible outcomes
  – PMF: P(X = y) = P_X(y)
• How much information is contained in the event X = y?
  – Will the sun rise today? Revealing the outcome of this experiment provides no information.
  – Will the Celtics win the NBA championship? Since this is unlikely, revealing "yes" provides more information than revealing "no".
• Events that are less likely contain more information than likely events.

Measure of Information

• I(x_i) = amount of information revealed by the outcome X = x_i
• Desirable properties of I(x):
  1. If P(x) = 1 or P(x) = 0, then I(x) = 0
  2. If 0 < P(x) < 1, then I(x) > 0
  3. If P(x) < P(y), then I(x) > I(y)
  4. If x and y are independent events, then I(x,y) = I(x) + I(y)
• The above are satisfied by: I(x) = log2(1/P(x))
• The base of the log is not critical
  – Base 2 => information measured in bits

Entropy

• A measure of the information content of a random variable
• X ∈ {x_1, …, x_M}
• H(X) = E[I(X)] = Σ_i P(x_i) log2(1/P(x_i))
• Example: binary experiment
  – X = x_1 with probability p
  – X = x_2 with probability (1 − p)
  – H(X) = p log2(1/p) + (1 − p) log2(1/(1 − p)) = H_b(p)
  – H(X) is maximized at p = 1/2, with H_b(1/2) = 1
  – Not surprising that the result of a (fair) binary experiment can be conveyed using one bit

Simple bounds on entropy

• Theorem: Given a random variable X with M possible values,
    0 ≤ H(X) ≤ log2(M)
  A) H(X) = 0 if and only if P(x_i) = 1 for some i
  B) H(X) = log2(M) if and only if P(x_i) = 1/M for all i
• Proof of A is obvious.
• Proof of B requires the log inequality: if x > 0, then ln(x) ≤ x − 1, with equality if and only if x = 1.
  (Figure: plot of y = ln(x) and y = x − 1, illustrating ln(x) ≤ x − 1 with equality at x = 1.)

Proof, continued

Consider the sum Σ_{i=1..M} P_i log(1/(M·P_i)). By the log inequality,

  Σ_{i=1..M} P_i log(1/(M·P_i)) ≤ Σ_{i=1..M} P_i (1/(M·P_i) − 1) = Σ_{i=1..M} (1/M − P_i) = 0,   with equality when P_i = 1/M.

Writing this in another way,

  Σ_{i=1..M} P_i log(1/(M·P_i)) = Σ_{i=1..M} P_i log(1/P_i) + Σ_{i=1..M} P_i log(1/M) ≤ 0,   with equality when P_i = 1/M.

That is,

  Σ_{i=1..M} P_i log(1/P_i) ≤ Σ_{i=1..M} P_i log(M) = log(M),   i.e., H(X) ≤ log2(M).

Joint Entropy

Joint entropy:
  H(X,Y) = Σ_{x,y} p(x,y) log(1/p(x,y))

Conditional entropy: H(X|Y) = uncertainty in X given Y
  H(X | Y = y) = Σ_x p(x | Y = y) log(1/p(x | Y = y))
  H(X | Y) = E[H(X | Y = y)] = Σ_y p(Y = y) H(X | Y = y)
  H(X | Y) = Σ_{x,y} p(x,y) log(1/p(x | Y = y))

In general, for random variables X_1, …, X_n:
  H(X_n | X_1, …, X_{n−1}) = Σ_{x_1,…,x_n} p(x_1, …, x_n) log(1/p(x_n | x_1, …, x_{n−1}))

Rules for entropy

1. Chain rule: H(X_1, …, X_n) = H(X_1) + H(X_2|X_1) + H(X_3|X_2,X_1) + … + H(X_n|X_{n−1}, …, X_1)
2. H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
3. If X_1, …, X_n are independent, then H(X_1, …, X_n) = H(X_1) + H(X_2) + … + H(X_n).
   If they are also identically distributed (i.i.d.), then H(X_1, …, X_n) = nH(X_1).
4. H(X_1, …, X_n) ≤ H(X_1) + H(X_2) + … + H(X_n), with equality if independent.
   Proof: use the chain rule and note that H(X|Y) ≤ H(X); entropy is not increased by additional information.
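To make the definitions and rules 1–2 concrete, here is a minimal Python sketch. It is not part of the original lecture; the function names (entropy, binary_entropy) and the joint PMF values are made up for illustration. It computes H(X), H_b(p), H(X,Y), and H(Y|X) for a small two-by-two example and checks numerically that H(X,Y) = H(X) + H(Y|X).

```python
# Minimal sketch (not from the lecture): numerically checking the entropy
# definitions and the chain rule H(X,Y) = H(X) + H(Y|X) on a small example.
from math import log2

def entropy(pmf):
    """H(X) = sum_i P(x_i) * log2(1 / P(x_i)); terms with P = 0 contribute 0."""
    return sum(p * log2(1 / p) for p in pmf if p > 0)

def binary_entropy(p):
    """H_b(p) for a binary experiment: P(x1) = p, P(x2) = 1 - p."""
    return entropy([p, 1 - p])

# Hypothetical joint PMF p(x, y) over X in {0, 1}, Y in {0, 1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginal PMFs p(x) and p(y)
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

# Joint entropy H(X,Y) and conditional entropy
# H(Y|X) = sum_{x,y} p(x,y) * log2(p(x) / p(x,y))
H_XY = sum(p * log2(1 / p) for p in joint.values())
H_Y_given_X = sum(p * log2(px[x] / p) for (x, _), p in joint.items())

print(binary_entropy(0.5))                        # 1.0 bit: H_b(p) is maximized at p = 1/2
print(H_XY, entropy(px.values()) + H_Y_given_X)   # chain rule: the two values agree
```

The example also illustrates rule 4: H(X,Y) computed above is strictly less than H(X) + H(Y) because the hypothetical X and Y are dependent.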
Mutual Information

• X, Y random variables
• Definition: I(X;Y) = H(Y) − H(Y|X)
• Notice that H(Y|X) = H(X,Y) − H(X) => I(X;Y) = H(X) + H(Y) − H(X,Y)
• I(X;Y) = I(Y;X) = H(X) − H(X|Y)
• Note: I(X;Y) ≥ 0, with equality if X and Y are independent
  – Because H(Y) ≥ H(Y|X)
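As a small check on the identity I(X;Y) = H(X) + H(Y) − H(X,Y), the sketch below (again not from the lecture; the mutual_information helper and the example PMFs are hypothetical) evaluates it for a dependent pair, where it is positive, and for an independent pair, where it comes out to zero.

```python
# Minimal sketch (not from the lecture): I(X;Y) = H(X) + H(Y) - H(X,Y) for a
# small joint PMF, with a check that I(X;Y) = 0 when X and Y are independent.
from math import log2

def H(pmf):
    """Entropy in bits of an iterable of probabilities (zero terms skipped)."""
    return sum(p * log2(1 / p) for p in pmf if p > 0)

def mutual_information(joint):
    """joint: dict mapping (x, y) -> p(x, y); returns I(X;Y) in bits."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = [sum(p for (xx, _), p in joint.items() if xx == x) for x in xs]
    py = [sum(p for (_, yy), p in joint.items() if yy == y) for y in ys]
    return H(px) + H(py) - H(joint.values())

# Dependent pair (hypothetical numbers): I(X;Y) > 0
print(mutual_information({(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}))

# Independent pair, p(x, y) = p(x) * p(y): I(X;Y) = 0 (up to float rounding)
print(mutual_information({(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}))
```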