Chapter 2 Entropy, Relative Entropy & Mutual Information
DEF Entropy $H(X)$ of a discrete random variable $X$ is defined by $H(X) = -\sum_x p(x)\log p(x)$.
PROP $H(X) \ge 0$.
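A minimal sketch of this definition in Python (the `entropy` helper and the use of NumPy are illustrative choices, not from the text); zero-probability outcomes are skipped, consistent with the convention $0\log 0 = 0$:

```python
import numpy as np

def entropy(p, base=2):
    """Shannon entropy H(X) = -sum_x p(x) log p(x); 0 * log 0 is taken as 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]                      # drop zero-probability outcomes
    return -np.sum(nz * np.log(nz)) / np.log(base)

# Example: a fair coin has 1 bit of entropy, and H(X) >= 0 always
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([1.0, 0.0]))   # 0.0 (deterministic variable)
```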
DEF Joint Entropy $H(X,Y)$ of a pair of discrete random variables $(X,Y)$ with joint distribution $p(x,y)$ is defined as $H(X,Y) = -\sum_{x,y} p(x,y)\log p(x,y)$.
DEF Conditional Entropy $H(Y\mid X)$ is defined as $H(Y\mid X) = \sum_x p(x)\, H(Y\mid X=x) = -\sum_x p(x) \sum_y p(y\mid x)\log p(y\mid x) = -\sum_{x,y} p(x,y)\log p(y\mid x)$.
THEOREM $H(X,Y) = H(X) + H(Y\mid X)$.
COR $H(X,Y\mid Z) = H(X\mid Z) + H(Y\mid X,Z)$.
PROOF $H(X,Y,Z) = H(X,Y\mid Z) + H(Z) = H(Y\mid X,Z) + H(X,Z) = H(Y\mid X,Z) + H(X\mid Z) + H(Z)$. Comparing the first and last expressions and cancelling $H(Z)$ gives the corollary.
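As a numerical sanity check of the chain rule, here is a hedged Python sketch; the joint distribution `pxy` is just a small made-up example, and the `entropy` helper from the earlier sketch is repeated so the block is self-contained:

```python
import numpy as np

def entropy(p, base=2):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz)) / np.log(base)

# Example joint distribution p(x, y): rows index x, columns index y
pxy = np.array([[1/8,  1/16, 1/32, 1/32],
                [1/16, 1/8,  1/32, 1/32],
                [1/16, 1/16, 1/16, 1/16],
                [1/4,  0,    0,    0   ]])

px = pxy.sum(axis=1)                 # marginal p(x)
H_XY = entropy(pxy.ravel())          # H(X, Y)
H_X = entropy(px)                    # H(X)
# H(Y|X) = sum_x p(x) H(Y | X = x), using the conditional rows p(y|x) = p(x,y)/p(x)
H_Y_given_X = sum(p * entropy(row / p) for p, row in zip(px, pxy) if p > 0)

print(np.isclose(H_XY, H_X + H_Y_given_X))   # True: H(X,Y) = H(X) + H(Y|X)
```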
DEF Kullback-Leibler Distance / Relative Entropy $D(p\|q) = \sum_x p(x)\log\frac{p(x)}{q(x)}$.
PROP Conventions used above: $0\log\frac{0}{0} = 0$, $0\log\frac{0}{q} = 0$, and $p\log\frac{p}{0} = +\infty$.
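A small Python sketch of relative entropy that applies these conventions explicitly (the function name `kl_divergence` and the example distributions are illustrative assumptions, not from the text):

```python
import numpy as np

def kl_divergence(p, q, base=2):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)), with 0 log(0/q) = 0 and p log(p/0) = +inf."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    d = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue               # 0 log(0/q) = 0 by convention
        if qi == 0:
            return np.inf          # p log(p/0) = +inf for p > 0
        d += pi * np.log(pi / qi)
    return d / np.log(base)

p = [0.5, 0.5]
q = [0.75, 0.25]
print(kl_divergence(p, q))   # > 0; note D(p||q) != D(q||p) in general
print(kl_divergence(p, p))   # 0.0
```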
DEF Mutual Information $I(X;Y)$ is the relative entropy between the joint distribution $p(x,y)$ and the product distribution $p(x)p(y)$: $I(X;Y) = D(p(x,y)\,\|\,p(x)p(y))$.
PROP $I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)p(y)} = H(X) - H(X\mid Y)$.
PROP $I(X;X) = H(X)$.
PROP $I(X;Y) = H(X) + H(Y) - H(X,Y)$.
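To tie these propositions together, a hedged Python sketch that computes $I(X;Y)$ directly as $D(p(x,y)\,\|\,p(x)p(y))$ and cross-checks it against $H(X) + H(Y) - H(X,Y)$; the joint distribution and the helper names are my own example choices:

```python
import numpy as np

def entropy(p, base=2):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log(nz)) / np.log(base)

def mutual_information(pxy, base=2):
    """I(X;Y) = D(p(x,y) || p(x)p(y)), computed from the joint distribution."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)    # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)    # marginal p(y)
    prod = px * py                         # product distribution p(x)p(y)
    mask = pxy > 0                         # 0 log(0/...) = 0 by convention
    return np.sum(pxy[mask] * np.log(pxy[mask] / prod[mask])) / np.log(base)

pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
I = mutual_information(pxy)
# Cross-check against I(X;Y) = H(X) + H(Y) - H(X,Y)
H_X = entropy(pxy.sum(axis=1))
H_Y = entropy(pxy.sum(axis=0))
H_XY = entropy(pxy.ravel())
print(np.isclose(I, H_X + H_Y - H_XY))   # True
```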
It's easy to see that the relations among entropy, joint entropy, conditional entropy, and mutual information can be represented by a Venn diagram.