Principles of Information Science
Chapter 3
Measures of Information
3-1 Measures of Random Syntactic Information
Shannon Theory of Information
Key points:
1. Information is something that can be used to remove uncertainty.
2. The amount of information can then be measured by the amount of uncertainty it removes.
3. In the case of communications, only the waveform is of concern, while meaning and value are ignored.
4. Uncertainty, and thus information, is statistical in nature, so statistical mathematics is sufficient.
Shannon Theorem of Random Entropy
The measure of uncertainty will take the form

H_S(p_1, …, p_N) = - k Σ_{n=1}^{N} p_n log p_n

if the conditions below are satisfied:
(1) H_S should be a continuous function of p_n for all n;
(2) H_S should be a monotonically increasing function of N when p_n = 1/N for all n;
(3) H_S should observe the rule of stepped weighting summation.
The rule of stepped weighting summation (illustrated by a two-stage choice tree: choosing among x_1, x_2, x_3 with probabilities 1/2, 1/3, 1/6 directly, or first choosing between {x_1} and {x_2, x_3} with probabilities 1/2, 1/2 and then choosing within {x_2, x_3} with probabilities 2/3, 1/3):

H_S(1/2, 1/3, 1/6) = H_S(1/2, 1/2) + 1/2 H_S(2/3, 1/3)
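As a quick numerical check of the stepped weighting (grouping) rule, the short Python sketch below evaluates both sides of the equation above, using base-2 logarithms and k = 1 (the normalization fixed later in the proof); the helper name H is only illustrative.

```python
from math import log2

def H(probs):
    """Shannon entropy H_S in bits, with the convention 0 * log 0 = 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

lhs = H([1/2, 1/3, 1/6])
rhs = H([1/2, 1/2]) + (1/2) * H([2/3, 1/3])
print(lhs, rhs)                  # both approximately 1.459 bits
assert abs(lhs - rhs) < 1e-12    # the grouping rule holds numerically
```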
Proof:
(a) In the case of equal probabilities

Let A(N) = H_S(1/N, …, 1/N).

By use of condition (3), it is then easy to obtain

A(MN) = H_S(1/MN, …, 1/MN)
      = H_S(1/M, …, 1/M) + Σ_{i=1}^{M} (1/M) H_S(1/N, …, 1/N)
      = A(M) + A(N).

Then A(N^2) = 2 A(N), A(S^a) = a A(S), A(t^b) = b A(t).

For any given b, it is always possible to find a proper a such that

S^a ≤ t^b < S^(a+1)                       (*)

or equivalently

a/b ≤ (log t)/(log S) < (a+1)/b.

On the other hand, from (*) we have

A(S^a) ≤ A(t^b) < A(S^(a+1))

or

a A(S) ≤ b A(t) < (a+1) A(S)              (**)

Thus

a/b ≤ A(t)/A(S) < a/b + 1/b               (***)

and therefore

| A(t)/A(S) - (log t)/(log S) | < 1/b ≤ ε.

When b is large enough (so that 1/b ≤ ε for any given ε > 0), we have A(t) = k log t, with k = A(S)/log S a constant.
(b) In the case of unequal and rational probabilities

Let p_i = n_i e, with e = 1 / Σ_{i=1}^{N} n_i, where n_i is a positive integer for all i.
Then the unequal-distribution entropy H_S(p_1, …, p_N) becomes the case of equal probabilities

H_S(e, …, e; …; e, …, e)

with n_1 + … + n_N entries in total, the i-th group containing n_i entries of e.

On one hand we have

H_S(e, …, e) = A(1/e) = k log(1/e).

On the other hand,

H_S(e, …, e) = H_S(p_1, …, p_N) + Σ_{i=1}^{N} p_i H_S(1/n_i, …, 1/n_i)
             = H_S(p_1, …, p_N) + Σ_{i=1}^{N} p_i A(n_i)
             = H_S(p_1, …, p_N) + k Σ_{i=1}^{N} p_i log n_i.

Hence

H_S(p_1, …, p_N) = k [ log(1/e) - Σ_{i=1}^{N} p_i log n_i ]
                 = - k Σ_{i=1}^{N} p_i log(n_i e)
                 = - k Σ_{i=1}^{N} p_i log p_i.

(c) If the p_i are irrational, the equation is also valid by the continuity condition (1).
In the case of ideal observation,

I(p_1, …, p_N) = H_S(p_1, …, p_N) - H_S(1, 0, …, 0)
              = H_S(p_1, …, p_N)
              = - k Σ_{n=1}^{N} p_n log p_n.

Let N = 2, p_1 = p_2 = 1/2, take the base of the logarithm to be 2, and set H_S(1/2, 1/2) = 1 bit; then k = 1. Therefore we have

I(p_1, …, p_N) = H_S(p_1, …, p_N) = - Σ_{i=1}^{N} p_i log p_i   (bits)

with the convention 0 log 0 = 0.
Conclusion
For a system with N possible states and associated probabilities p_1, …, p_N, the average a priori uncertainty about the state of the system is H_S(p_1, …, p_N).
The average amount of information about the system obtained after observation in the ideal case is numerically equal to the uncertainty it removed.
Properties of Entropy
(1) Symmetry
H_S(p_1, …, p_N) = H_S(p_K(1), …, p_K(N))
(2) Normalization
H_S(1/2, 1/2) = 1
(3) Expandability
H_S(p_1, …, p_N) = H_S(0, p_1, …, p_N) = H_S(p_1, …, p_N, 0)
(4) Determination
H_S(1, 0) = 0
(5) Extremity
H_S(p_1, …, p_N) ≤ H_S(1/N, …, 1/N) = log N
(6) Shannon Inequality
- Σ_i p_i log p_i ≤ - Σ_i p_i log q_i
(7) Conditional Entropy
- Σ_i Σ_j p(x_i, y_j) log p(x_i | y_j) ≤ - Σ_i p(x_i) log p(x_i)
(8) Addition
H_S(p_1 q_1, …, p_1 q_N, …, p_M q_1, …, p_M q_N) = H_S(p_1, …, p_M) + H_S(q_1, …, q_N)
(9) Strong Addition
(10) Partial Merging
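A minimal numerical sanity check (not part of the original notes) of the Extremity property (5) and the Shannon Inequality (6); the distributions p and q below are made-up examples, and base-2 logarithms are assumed.

```python
from math import log2

def entropy(p):
    """H_S(p) in bits, with 0 * log 0 = 0."""
    return -sum(x * log2(x) for x in p if x > 0)

def cross_entropy(p, q):
    """-sum p_i log q_i, the right-hand side of the Shannon Inequality."""
    return -sum(x * log2(y) for x, y in zip(p, q) if x > 0)

p = [0.5, 0.25, 0.125, 0.125]   # an example distribution
q = [0.25, 0.25, 0.25, 0.25]    # another distribution on the same states

# (5) Extremity: H_S(p) <= log N
assert entropy(p) <= log2(len(p)) + 1e-12

# (6) Shannon Inequality: -sum p log p <= -sum p log q
assert entropy(p) <= cross_entropy(p, q) + 1e-12
```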
Improvements and Extensions
-- Wiener (1948), Negative entropy
-- Faddeev (1958), Loosened axiomatic conditions
-- Rényi (1970), Incomplete entropy
-- Rényi (1960), α-entropy
-- Daróczy (1970), β-entropy
-- Arimoto (1971), γ-entropy
-- Kolmogorov (1958), ε-entropy
-- Posner (1967), ε-δ entropy
-- Kullback (1951), Information variance
-- Guiasu (1975), Weighted entropy
3-2 Measure of Fuzzy Syntactic Information
DeLuca-Termini Conjecture (1972), Fuzzy Entropy
Given a set I and a lattice L, an L-fuzzy set is defined as a map from I to L, f: I → L; the class of such maps is denoted L(I).
For any two of its elements f and g, define the operations:
(f ∨ g)(x) = l.u.b.{f(x), g(x)}
(f ∧ g)(x) = g.l.b.{f(x), g(x)}
When L = [0, 1], then
(f ∨ g)(x) = max{f(x), g(x)}
(f ∧ g)(x) = min{f(x), g(x)}
Fuzzy entropy is introduced via the following conjecture:
a fuzzy entropy d(f) satisfying the three conditions below may take the form

d(f) = - (1/N) Σ_{n=1}^{N} [ f(x_n) log f(x_n) + (1 - f(x_n)) log(1 - f(x_n)) ]

(1) d(f) = 0 iff f(x_n) = 0 or 1 for all n;
(2) d(f) = max iff f(x_n) = 1/2 for all n;
(3) d(f*) ≤ d(f) if
    f*(x) ≥ f(x) when f(x) ≥ 1/2;
    f*(x) ≤ f(x) when f(x) ≤ 1/2.
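A small illustrative sketch of the De Luca–Termini fuzzy entropy d(f) as written above, using base-2 logarithms; the membership vectors below are arbitrary examples.

```python
from math import log2

def binary_term(v):
    """v log v + (1-v) log(1-v), with 0 * log 0 = 0."""
    return sum(x * log2(x) for x in (v, 1 - v) if x > 0)

def fuzzy_entropy(f):
    """De Luca-Termini fuzzy entropy d(f) of a membership vector f."""
    return -sum(binary_term(v) for v in f) / len(f)

print(fuzzy_entropy([0.0, 0.2, 0.5, 0.9, 1.0]))  # intermediate value
print(fuzzy_entropy([0, 1, 1, 0]))               # 0.0: crisp (0-1) sets, condition (1)
print(fuzzy_entropy([0.5] * 4))                  # 1.0: maximal at f = 1/2, condition (2)
```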
3-3 Unified Measures of Syntactic Information
Note that the measure of incidental entropy has a similar form to that of random entropy:

H_I(q_1, …, q_N) = - Σ_{n=1}^{N} q_n log q_n

Question: Is there any unified form of the measures for all types of syntactic information?
Answer: Yes.
Definition 1
Consider a system X with N possible states x_1, …, x_N.
The possibility, chance, or likelihood with which X may take the state x_n is defined as the certainty of x_n and denoted c_n, n = 1, …, N:

c_n = { p_n, if X is random
      { q_n, if X is semi-random
      { m_n, if X is fuzzy

0 ≤ c_n ≤ 1 for all n;   Σ_n c_n { = 1, for p and q
                                 { ≤ 1, for m
Definition 2
The set {c_1, …, c_N} is defined as the certainty distribution and denoted C.
Definition 3
For any given event X = {x_1, …, x_N}, there are two special kinds of certainty distribution:

C_S = {c_n | c_n = 1 or 0},   C_0 = {c_n | c_n = 1/N}

C_S and C_0 are respectively named the 0-1 type certainty distribution and the uniform certainty distribution.
Definition 4
For any given C, define an average certainty as below, noticing from Definition 1 that there are two different cases: 1) Σ_n c_n = 1 and 2) Σ_n c_n ≤ 1.
3-3-1 Case 1

M_f(C) = f^{-1} { Σ_{n=1}^{N} c_n f(c_n) }

where f is a monotonically continuous function to be fixed and f^{-1} is its inverse function, also monotonically continuous.

Definition 5
X and Y are two events with respective certainty distributions C and D. They are said to be mutually irrelevant if

f^{-1} { Σ_{n=1}^{N} c_n f(c_n d_n) } = f^{-1} { Σ_{n=1}^{N} c_n f(c_n) } · f^{-1} { Σ_{n=1}^{N} c_n f(d_n) }
Theorem 3-3-1
The function f will take the form of the logarithm,

f(x) = ln x,

if the conditions expressed in Definitions 4 and 5 are to be satisfied.

Corollary 1
The average certainty of the event (X, C) is measured as

M_f(C) = Π_{n=1}^{N} (c_n)^{c_n},   i.e.,   log M_f(C) = Σ_{n=1}^{N} c_n log c_n

Corollary 2
1/N = M_f(C_0) ≤ M_f(C) ≤ M_f(C_S) = 1
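As a sketch (not in the original notes), the following checks Corollary 1 numerically: with f(x) = ln x, the average certainty M_f(C) = f^{-1}{ Σ c_n f(c_n) } of Definition 4 coincides with the product form Π (c_n)^{c_n}; the distribution c is a made-up example.

```python
from math import log, exp, prod

c = [0.5, 0.3, 0.2]                          # example certainty distribution (sums to 1)

# Definition 4 with f = ln and f^{-1} = exp
M_def = exp(sum(cn * log(cn) for cn in c))

# Corollary 1: product form
M_cor = prod(cn ** cn for cn in c)

print(M_def, M_cor)                          # identical values
assert abs(M_def - M_cor) < 1e-12
```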
Definition 6

I(C) = log [ M_f(C) / M_f(C_0) ] = log N + Σ_{n=1}^{N} c_n log c_n

is defined as the amount of information contained in (X, C).

Definition 7

I(C, C*; R) = I(C*) - I(C) = Σ_{n=1}^{N} c*_n log c*_n - Σ_{n=1}^{N} c_n log c_n

is defined as the amount of information about the event (X, C, C*) gained by the observer R through observation.
This is a fundamental result.
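The sketch below (illustrative only, with base-2 logarithms and made-up distributions) evaluates I(C) from Definition 6 and I(C, C*; R) from Definition 7, and shows that with a 0-1 type posterior the gain reproduces the Shannon entropy of the prior, as Theorem 3-3-2 below states.

```python
from math import log2

def I_prior(c):
    """Definition 6: I(C) = log N + sum c_n log c_n (base 2, 0 * log 0 = 0)."""
    return log2(len(c)) + sum(x * log2(x) for x in c if x > 0)

def I_gain(c, c_star):
    """Definition 7: I(C, C*; R) = I(C*) - I(C)."""
    return I_prior(c_star) - I_prior(c)

P      = [0.5, 0.25, 0.25]     # prior certainty distribution (example)
P_star = [1.0, 0.0, 0.0]       # ideal 0-1 type posterior

H_P = -sum(p * log2(p) for p in P if p > 0)
print(I_gain(P, P_star), H_P)  # both equal 1.5 bits for this example
```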
Properties of I(C,C*;R)
1. I(C, C*; R) = 0 iff M_f(C) = M_f(C*)
2. I(C, C*; R) = I(C, C*; R)_max = log N iff (C = C_0) and (C* = C*_S);
   I(C, C*; R) = I(C, C*; R)_min = - log N iff (C = C_S) and (C* = C*_0)
3. I(C, C*; R) ≥ 0 iff M_f(C*) ≥ M_f(C)
4. I(C_1, C*; R_1) ≥ I(C_2, C*; R_2) iff M_f(C_2) ≥ M_f(C_1)
5. I(C, C*; R) ≥ I(D, D*; R) iff M_f(C*) / M_f(C) ≥ M_f(D*) / M_f(D)
Theorem 3-3-2
When X is random, I(C, C*; R) = I(P, P*_S; R) = H_S(P)
3-3-2 Case 2
For any given f_n, define C_n = {f_n, (1 - f_n)} as the certainty distribution of x_n, which is a normalized distribution, n = 1, …, N.

C_n0 = {1/2, 1/2},   C_nS = {1, 0} or {0, 1},   n = 1, …, N

M_f(C_n) = f_n^{f_n} (1 - f_n)^{(1 - f_n)},   n = 1, …, N

M_f(C_n0) = 1/2

I(C_n) = log [ M_f(C_n) / M_f(C_n0) ] = log 2 + f_n log f_n + (1 - f_n) log(1 - f_n)

I(C_n, C*_n; R) = I(C*_n) - I(C_n)

I(C, C*; R) = (1/N) Σ_{n=1}^{N} I(C_n, C*_n; R)
            = (1/N) Σ_{n=1}^{N} [ f*_n log f*_n + (1 - f*_n) log(1 - f*_n) - f_n log f_n - (1 - f_n) log(1 - f_n) ]
Theorem 3-3-3
I(C, C*_S; R) = I(F, F*_S; R) = d(F)

It is easy to see from Theorems 3-3-2 and 3-3-3 that I(C, C*; R) can be regarded as a unified measure for all kinds of syntactic information:
when the event is random, I(C, C*; R) reduces to H(P);
when the event is semi-random, I(C, C*; R) reduces to H(Q);
and when the event is deterministic with fuzzy states, I(C, C*; R) reduces to d(F).
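To illustrate Case 2 and Theorem 3-3-3, the sketch below (illustrative, base-2 logarithms, made-up membership values) computes the Case 2 measure I(C, C*; R) from the per-state distributions {f_n, 1 - f_n} and checks that with a 0-1 type posterior it equals the fuzzy entropy d(F) of the prior.

```python
from math import log2

def bits(v):
    """v log v + (1-v) log(1-v), with 0 * log 0 = 0 (base 2)."""
    return sum(x * log2(x) for x in (v, 1 - v) if x > 0)

def I_case2(f, f_star):
    """Case 2 measure: (1/N) * sum over n of the bracketed difference above."""
    return sum(bits(fs) - bits(fn) for fn, fs in zip(f, f_star)) / len(f)

def d(f):
    """De Luca-Termini fuzzy entropy d(F)."""
    return -sum(bits(v) for v in f) / len(f)

F      = [0.1, 0.4, 0.7, 0.95]     # prior membership (certainty) values
F_star = [0.0, 0.0, 1.0, 1.0]      # 0-1 type posterior (sharpened observation)

print(I_case2(F, F_star), d(F))    # equal, as Theorem 3-3-3 asserts
```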
3-4 Measures of Comprehensive Information
3-4-1 Measure of Semantic Information
The measuring parameter for semantic information is the truth value t_n of a state x_n, n = 1, …, N.
Nevertheless, t_n is a fuzzy quantity in nature:

t_n = { 0,       logically false
      { 1/2,     logically neutral
      { 1,       logically true
      { others,  logically fuzzy

for all n.
M_f(T_n) = t_n^{t_n} (1 - t_n)^{(1 - t_n)},   n = 1, …, N

I(T_n) = log [ M_f(T_n) / M_f(T_n0) ] = log 2 + t_n log t_n + (1 - t_n) log(1 - t_n),   for all n

I(T_n, T*_n; R) = I(T*_n) - I(T_n),   for all n

I(T, T*; R) = (1/N) Σ_{n=1}^{N} I(T_n, T*_n; R)
            = (1/N) Σ_{n=1}^{N} [ t*_n log t*_n + (1 - t*_n) log(1 - t*_n) - t_n log t_n - (1 - t_n) log(1 - t_n) ]
The properties of I(T,T*;R)
1. I(T, T*; R) ≥ 0 iff I(T*) ≥ I(T)
2. I(T, T*; R)_max = I(T_0, T*_S; R) = 1;
   I(T, T*; R)_min = I(T_S, T*_0; R) = -1
3. I(T_1, T*; R_1) ≥ I(T_2, T*; R_2) iff I(T_1) ≤ I(T_2)
4. I(T, T*; R) ≥ I(S, S*; R) iff I(T*) - I(T) ≥ I(S*) - I(S);
   if I(T*) = I(S*) is given, then the above condition becomes I(S) ≥ I(T).
3-4-2 Measure of Pragmatic Information
The measuring parameter for pragmatic information is the utility u_n of a state x_n, n = 1, …, N.
Nevertheless, u_n is a fuzzy quantity in nature:

u_n = { 0,       minimum utility
      { 1/2,     neutral utility
      { 1,       maximum utility
      { others,  fuzzy utility

for all n.
M_f(U_n) = u_n^{u_n} (1 - u_n)^{(1 - u_n)},   n = 1, …, N

I(U_n) = log [ M_f(U_n) / M_f(U_n0) ] = log 2 + u_n log u_n + (1 - u_n) log(1 - u_n),   for all n

I(U_n, U*_n; R) = I(U*_n) - I(U_n),   for all n

I(U, U*; R) = (1/N) Σ_{n=1}^{N} I(U_n, U*_n; R)
            = (1/N) Σ_{n=1}^{N} [ u*_n log u*_n + (1 - u*_n) log(1 - u*_n) - u_n log u_n - (1 - u_n) log(1 - u_n) ]
The properties of I(U,U*;R)
1. I(U, U*; R) ≥ 0 iff I(U*) ≥ I(U)
2. I(U, U*; R)_max = I(U_0, U*_S; R) = 1;
   I(U, U*; R)_min = I(U_S, U*_0; R) = -1
3. I(U_1, U*; R_1) ≥ I(U_2, U*; R_2) iff I(U_1) ≤ I(U_2)
4. I(U, U*; R) ≥ I(V, V*; R) iff I(U*) - I(U) ≥ I(V*) - I(V);
   if I(U*) = I(V*) is given, then the above condition becomes I(V) ≥ I(U).
3-4-3 Measures of Comprehensive Information
The parameter θ_n for integrative semantic information and the parameter φ_n for integrative pragmatic (comprehensive) information are defined as

θ_n = a c_n ∧ b t_n,   φ_n = a c_n ∧ b t_n ∧ g u_n

with

0 ≤ θ_n, φ_n ≤ 1 for all n,   Σ_n θ_n ≤ 1 and Σ_n φ_n ≤ 1.

Obviously, they are also fuzzy variables. Write

Θ = {θ_n | n = 1, …, N},   Φ = {φ_n | n = 1, …, N}.
Therefore, the results derived above can also be applied here:

I(Θ, Θ*; R) = (1/N) Σ_{n=1}^{N} [ θ*_n log θ*_n + (1 - θ*_n) log(1 - θ*_n) - θ_n log θ_n - (1 - θ_n) log(1 - θ_n) ]

I(Φ, Φ*; R) = (1/N) Σ_{n=1}^{N} [ φ*_n log φ*_n + (1 - φ*_n) log(1 - φ*_n) - φ_n log φ_n - (1 - φ_n) log(1 - φ_n) ]

This is just the comprehensive information measure.
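Since I(Θ, Θ*; R) and I(Φ, Φ*; R) have exactly the same form as the Case 2 measure, a single helper suffices. The sketch below (illustrative only, base-2 logarithms) takes the a priori and a posteriori parameters φ_n and φ*_n as already-combined made-up values, leaving the combination of c_n, t_n, u_n to whatever rule is adopted above.

```python
from math import log2

def bits(v):
    """v log v + (1-v) log(1-v), with 0 * log 0 = 0 (base 2)."""
    return sum(x * log2(x) for x in (v, 1 - v) if x > 0)

def comprehensive_information(phi, phi_star):
    """I(Phi, Phi*; R): average change of the per-state binary measure."""
    return sum(bits(ps) - bits(p) for p, ps in zip(phi, phi_star)) / len(phi)

phi      = [0.3, 0.5, 0.6]     # a priori comprehensive parameters (example)
phi_star = [0.9, 0.8, 0.1]     # a posteriori comprehensive parameters (example)

print(comprehensive_information(phi, phi_star))
```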
I(Φ, Φ*; R) = I(Θ, Θ*; R) iff u_n = u*_n = 1 for all n;
I(Φ, Φ*; R) = I(C, C*; R) iff t_n = t*_n = u_n = u*_n = 1 for all n;
I(Φ, Φ*; R) = I(U, U*; R) iff (C = F) ∧ (c_n = c*_n = t_n = t*_n) for all n;
I(Φ, Φ*; R) = I(T, T*; R) iff (C = F) ∧ (c_n = c*_n = u_n = u*_n) for all n;
I(Φ, Φ*; R) = H(P) iff (C = P) ∧ (C* = P*_S) ∧ (u_n = u*_n = t_n = t*_n) for all n;
I(Φ, Φ*; R) = d(F) iff (C = F) ∧ (C* = F*_S) ∧ (u_n = u*_n = t_n = t*_n) for all n;
I(Φ, Φ*; R) = log N iff (C = P_0) ∧ (C* = P*_S) ∧ (u_n = u*_n = t_n = t*_n) for all n;
I(Φ, Φ*; R) = I(U, P) + I(P, U) iff (C = P) ∧ (C* = P*_S) ∧ (U* = U*_S) ∧ (t_n = t*_n) for all n.
The relationship among the measures (diagram): the comprehensive measure I(Φ, Φ*; R) sits at the top; below it lie I(Θ, Θ*; R) and then I(U, P), I(C, C*; R), I(T, T*; R) and I(U, U*; R); these reduce to H(P) (Shannon, Wiener) and d(F) (DeLuca, Termini), and further to log N (Boltzmann, Ashby).
Significance of CIT: Laid the Foundation of Information Science
(Diagram: the world and a sensing system linked by information acquisition, information transferring, cognition and decision-making, and information execution; the related source-sink communication, computer, communication and control systems are marked with SIT, CIT, WCT and WRT as their underlying theories.)