16.36: Communication Systems Engineering
Lecture 2: Entropy
Eytan Modiano
Information content of a random variable
• Random variable X
  – Outcome of a random experiment
  – Discrete R.V. takes on values from a finite set of possible outcomes
  PMF: P(X = y) = P_X(y)

• How much information is contained in the event X = y?
  – Will the sun rise today?
    Revealing the outcome of this experiment provides no information
  – Will the Celtics win the NBA championship?
    Since this is unlikely, revealing "yes" provides more information than revealing "no"

• Events that are less likely contain more information than likely events
Measure of Information
• I(x_i) = amount of information revealed by the outcome X = x_i

• Desirable properties of I(x):
  1. If P(x) = 1 or P(x) = 0, then I(x) = 0
  2. If 0 < P(x) < 1, then I(x) > 0
  3. If P(x) < P(y), then I(x) > I(y)
  4. If x and y are independent events, then I(x,y) = I(x) + I(y)

• The above is satisfied by: I(x) = log_2(1/P(x))

• The base of the log is not critical
  – Base 2 => information measured in bits
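
As a quick numerical illustration (a sketch added here, not part of the original slides; the probability values are assumed), the Python snippet below evaluates I(x) = log_2(1/P(x)) and checks properties 3 and 4:

import math

def info_content(p):
    """Information revealed by an outcome of probability p, in bits: I = log2(1/p)."""
    return math.log2(1.0 / p)

# Rare events carry more information than likely ones (property 3)
print(info_content(0.99))   # ~0.0145 bits: nearly certain, reveals little
print(info_content(0.01))   # ~6.64 bits: unlikely, reveals a lot

# Independent events: I(x,y) = I(x) + I(y) (property 4); example probabilities assumed
px, py = 0.5, 0.25
assert abs(info_content(px * py) - (info_content(px) + info_content(py))) < 1e-12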
Entropy
• A measure of the information content of a random variable

• X ∈ {x_1, …, x_M}

• H(X) = E[I(X)] = Σ_i P(x_i) log_2(1/P(x_i))

• Example: Binary experiment
  – X = x_1 with probability p
  – X = x_2 with probability (1-p)
  – H(X) = p log_2(1/p) + (1-p) log_2(1/(1-p)) = H_b(p)
  – H(X) is maximized at p = 1/2, with H_b(1/2) = 1
    Not surprising that the result of a binary experiment can be conveyed using one bit
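
A small Python sketch (example values assumed for illustration) computing the binary entropy function H_b(p) and confirming numerically that it peaks at p = 1/2 with H_b(1/2) = 1 bit:

import math

def binary_entropy(p):
    """H_b(p) = p*log2(1/p) + (1-p)*log2(1/(1-p)), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

print(binary_entropy(0.5))    # 1.0 bit: a fair binary experiment needs one full bit
print(binary_entropy(0.11))   # ~0.5 bits: a biased experiment carries less information

# H_b(p) is maximized at p = 1/2
grid = [i / 1000 for i in range(1, 1000)]
assert max(grid, key=binary_entropy) == 0.5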
Simple bounds on entropy
• Theorem: Given a random variable with M possible values
  – 0 <= H(X) <= log_2(M)
    A) H(X) = 0 if and only if P(x_i) = 1 for some i
    B) H(X) = log_2(M) if and only if P(x_i) = 1/M for all i
  – Proof of A is obvious
  – Proof of B requires the log inequality:
    if x > 0, then ln(x) <= x - 1, with equality if x = 1

[Figure: plot of y = ln(x) against y = x - 1, illustrating the log inequality]
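
The bounds can be spot-checked numerically; this sketch (with randomly generated PMFs, assumed purely for illustration) verifies 0 <= H(X) <= log_2(M) and that the uniform distribution attains the upper bound:

import math, random

def entropy(pmf):
    """H(X) = sum_i p_i * log2(1/p_i) in bits, ignoring zero-probability outcomes."""
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

M = 8
uniform = [1.0 / M] * M
print(entropy(uniform), math.log2(M))   # both 3.0: the uniform PMF achieves log2(M)

for _ in range(1000):
    w = [random.random() for _ in range(M)]
    pmf = [x / sum(w) for x in w]        # random PMF on M outcomes
    assert -1e-12 <= entropy(pmf) <= math.log2(M) + 1e-12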
Proof, continued
Consider the sum Σ_{i=1}^{M} P_i log(1/(M P_i)). By the log inequality:

  Σ_{i=1}^{M} P_i log(1/(M P_i)) ≤ Σ_{i=1}^{M} P_i (1/(M P_i) - 1) = Σ_{i=1}^{M} (1/M - P_i) = 0,  with equality when P_i = 1/M

Writing this another way:

  Σ_{i=1}^{M} P_i log(1/(M P_i)) = Σ_{i=1}^{M} P_i log(1/P_i) + Σ_{i=1}^{M} P_i log(1/M) ≤ 0,  with equality when P_i = 1/M

That is,

  Σ_{i=1}^{M} P_i log(1/P_i) ≤ Σ_{i=1}^{M} P_i log(M) = log(M)
Joint Entropy
Joint entropy:

  H(X,Y) = Σ_{x,y} p(x,y) log(1/p(x,y))

Conditional entropy: H(X|Y) = uncertainty in X given Y

  H(X|Y = y) = Σ_x p(x|Y = y) log(1/p(x|Y = y))

  H(X|Y) = E[H(X|Y = y)] = Σ_y p(Y = y) H(X|Y = y)

  H(X|Y) = Σ_{x,y} p(x,y) log(1/p(x|Y = y))

In general, for random variables X_1, ..., X_n:

  H(X_n|X_1, ..., X_{n-1}) = Σ_{x_1,...,x_n} p(x_1, ..., x_n) log(1/p(x_n|x_1, ..., x_{n-1}))
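
A short Python sketch computing H(X,Y), H(X|Y=y), and H(X|Y) directly from these definitions; the 2x2 joint PMF is an assumed example, not from the slides:

import math

# Assumed example joint PMF p(x,y) over X in {0,1}, Y in {0,1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
py = {y: sum(p for (x2, y2), p in joint.items() if y2 == y) for y in (0, 1)}

# Joint entropy: H(X,Y) = sum_{x,y} p(x,y) log2(1/p(x,y))
H_XY = sum(p * math.log2(1.0 / p) for p in joint.values())

# Conditional entropy given a particular y: H(X|Y=y) = sum_x p(x|y) log2(1/p(x|y))
def H_X_given_y(y):
    cond = [joint[(x, y)] / py[y] for x in (0, 1)]
    return sum(p * math.log2(1.0 / p) for p in cond if p > 0)

# H(X|Y) = E[H(X|Y=y)] = sum_y p(y) H(X|Y=y)
H_X_given_Y = sum(py[y] * H_X_given_y(y) for y in (0, 1))

# Equivalently, H(X|Y) = sum_{x,y} p(x,y) log2(1/p(x|y))
alt = sum(p * math.log2(py[y] / p) for (x, y), p in joint.items())
assert abs(alt - H_X_given_Y) < 1e-12

print(H_XY, H_X_given_Y)   # H(X,Y) ~ 1.846 bits, H(X|Y) ~ 0.875 bits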
Rules for entropy
1. Chain rule:
   H(X_1, ..., X_n) = H(X_1) + H(X_2|X_1) + H(X_3|X_2,X_1) + … + H(X_n|X_{n-1}, ..., X_1)

2. H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

3. If X_1, ..., X_n are independent, then:
   H(X_1, ..., X_n) = H(X_1) + H(X_2) + … + H(X_n)
   If they are also identically distributed (i.i.d.), then:
   H(X_1, ..., X_n) = nH(X_1)

4. H(X_1, ..., X_n) <= H(X_1) + H(X_2) + … + H(X_n)  (with equality if independent)
   Proof: use the chain rule and note that H(X|Y) <= H(X);
   conditioning on additional information does not increase entropy
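
A minimal check of rules 2 and 4, using the same assumed 2x2 joint PMF as in the earlier sketch (nothing here is from the original slides): the chain rule H(X,Y) = H(X) + H(Y|X) and the bound H(X,Y) <= H(X) + H(Y).

import math

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # assumed example p(x,y)
px = {x: sum(p for (x2, y2), p in joint.items() if x2 == x) for x in (0, 1)}
py = {y: sum(p for (x2, y2), p in joint.items() if y2 == y) for y in (0, 1)}

H = lambda pmf: sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

H_XY = H(joint.values())
H_X, H_Y = H(px.values()), H(py.values())
# H(Y|X) = sum_{x,y} p(x,y) log2(p(x)/p(x,y)), since p(y|x) = p(x,y)/p(x)
H_Y_given_X = sum(p * math.log2(px[x] / p) for (x, y), p in joint.items())

assert abs(H_XY - (H_X + H_Y_given_X)) < 1e-12   # rule 2: chain rule
assert H_XY <= H_X + H_Y + 1e-12                 # rule 4: joint entropy never exceeds the sum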
Mutual Information
• X, Y random variables

• Definition: I(X;Y) = H(Y) - H(Y|X)

• Notice that H(Y|X) = H(X,Y) - H(X) => I(X;Y) = H(X) + H(Y) - H(X,Y)

• I(X;Y) = I(Y;X) = H(X) - H(X|Y)

• Note: I(X;Y) >= 0 (with equality if X and Y are independent)
  – Because H(Y) >= H(Y|X)
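
A sketch (same assumed joint PMF as in the earlier examples) computing I(X;Y) in the three equivalent forms above and confirming that they agree and are nonnegative:

import math

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # assumed example p(x,y)
px = {x: sum(p for (x2, y2), p in joint.items() if x2 == x) for x in (0, 1)}
py = {y: sum(p for (x2, y2), p in joint.items() if y2 == y) for y in (0, 1)}

H = lambda pmf: sum(p * math.log2(1.0 / p) for p in pmf if p > 0)
H_X, H_Y, H_XY = H(px.values()), H(py.values()), H(joint.values())
H_Y_given_X = sum(p * math.log2(px[x] / p) for (x, y), p in joint.items())
H_X_given_Y = sum(p * math.log2(py[y] / p) for (x, y), p in joint.items())

I1 = H_Y - H_Y_given_X          # I(X;Y) = H(Y) - H(Y|X)
I2 = H_X + H_Y - H_XY           # I(X;Y) = H(X) + H(Y) - H(X,Y)
I3 = H_X - H_X_given_Y          # I(X;Y) = H(X) - H(X|Y)

assert abs(I1 - I2) < 1e-12 and abs(I2 - I3) < 1e-12
assert I1 >= 0                  # zero only when X and Y are independent
print(I1)                       # ~0.125 bits for this joint PMF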