A Characterization of Entropy in Terms of Information Loss

Baez, Fritz, Leinster · 2011 · arXiv:1106.1791

Assumes you've seen 🍞 Markov categories. 5 minutes.

Shannon entropy is the only way to measure information loss that respects composition. There is no other choice.

Information loss

When a stochastic function processes a distribution, some information is lost. How do you measure that loss? Shannon defined entropy in 1948. Baez, Fritz, and Leinster proved his definition is the only one satisfying three natural properties.

[Diagram: a distribution p with H(p) ≈ 1.85 bits; the loss of a composite equals the sum of the losses, so the functor preserves composition.]
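
To make the loss concrete, here is a minimal sketch in Python (the helper `H` and the example probabilities are my choices; only the formula comes from the paper): a finite distribution is a list of probabilities, a deterministic step merges outcomes, and the loss is the drop in entropy.

```python
from math import log2

def H(p):
    """Shannon entropy, in bits, of a list of probabilities summing to 1."""
    return -sum(x * log2(x) for x in p if x > 0)

p = [0.4, 0.3, 0.2, 0.1]    # a four-outcome distribution; H(p) ≈ 1.85 bits
q = [0.4 + 0.3, 0.2 + 0.1]  # image of p under a map that merges outcomes pairwise

print(H(p), H(q), H(p) - H(q))  # the map's information loss is H(input) - H(output)
```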

Functoriality: the key constraint

The paper's theorem: Shannon entropy is the unique measure of information loss that is functorial. Functorial means: the loss of a composite equals the sum of the losses of the parts.

If you process data in two steps, the total information lost is the loss from step 1 plus the loss from step 2. No other definition of "loss" has this property (up to a constant factor).
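
Here is a sketch of that bookkeeping, reusing the hypothetical `H` helper from above: two merging steps, composed, lose exactly as much as the two steps lose separately.

```python
from math import log2

def H(p):
    return -sum(x * log2(x) for x in p if x > 0)

p = [0.4, 0.3, 0.2, 0.1]
q = [0.7, 0.2, 0.1]  # step f: merge the first two outcomes of p
r = [0.7, 0.3]       # step g: merge the last two outcomes of q

loss_f  = H(p) - H(q)
loss_g  = H(q) - H(r)
loss_gf = H(p) - H(r)  # loss of the composite g ∘ f

assert abs(loss_gf - (loss_f + loss_g)) < 1e-12  # F(g ∘ f) = F(f) + F(g)
```

For entropy differences the identity telescopes trivially. The force of the theorem runs the other way: any continuous, convex-linear, functorial F has to be an entropy difference.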

Data processing inequality

A corollary: information can only decrease through processing. Computation loses or preserves bits; it never creates them. This is the data processing inequality (DPI), proved categorically.
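
A minimal numerical check (again with the hypothetical `H` helper, not the paper's categorical argument): every way of merging four outcomes into two leaves at most as much entropy as we started with, so the loss is never negative. This is the DPI example referred to in the translation notes below.

```python
from math import log2
from itertools import product

def H(p):
    return -sum(x * log2(x) for x in p if x > 0)

p = [0.4, 0.3, 0.2, 0.1]

# Try every assignment of the four outcomes to two buckets.
for mask in product([0, 1], repeat=4):
    if len(set(mask)) < 2:
        continue  # skip the assignments that use only one bucket
    q = [sum(x for x, b in zip(p, mask) if b == 0),
         sum(x for x, b in zip(p, mask) if b == 1)]
    assert H(q) <= H(p) + 1e-12  # DPI: h_out <= h_in
```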

The uniqueness theorem

The paper's main result (Theorem 4): any function F that assigns a real number to each morphism in FinProb (the category of finite probability distributions) and satisfies:

  1. Functoriality: F(g ∘ f) = F(f) + F(g)
  2. Convex linearity: F(λ f ⊕ (1−λ) g) = λ F(f) + (1−λ) F(g), i.e. F respects mixtures of morphisms
  3. Continuity: small changes in probabilities → small changes in F

...must be F(f) = c · (H(input) − H(output)) for some constant c ≥ 0. Shannon entropy, up to scale.
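
Axiom 2 can be checked numerically under my reading of the paper's convex combination as a λ-weighted direct sum of morphisms (all names and numbers below are illustrative): the H(λ, 1−λ) mixing term shows up in both the input and the output entropy and cancels, so entropy differences are convex linear.

```python
from math import log2

def H(p):
    return -sum(x * log2(x) for x in p if x > 0)

def loss(p_in, p_out):
    """Information loss H(input) - H(output) of a mass-preserving map."""
    return H(p_in) - H(p_out)

lam = 0.25
# Morphism f: p -> p2 and morphism g: q -> q2, on disjoint outcome sets.
p, p2 = [0.4, 0.3, 0.2, 0.1], [0.7, 0.3]
q, q2 = [0.5, 0.5], [1.0]

# The mixture lam*f (+) (1 - lam)*g acts on the weighted disjoint union of p and q.
mix_in  = [lam * x for x in p]  + [(1 - lam) * x for x in q]
mix_out = [lam * x for x in p2] + [(1 - lam) * x for x in q2]

lhs = loss(mix_in, mix_out)
rhs = lam * loss(p, p2) + (1 - lam) * loss(q, q2)
assert abs(lhs - rhs) < 1e-12  # F(λf ⊕ (1−λ)g) = λF(f) + (1−λ)F(g)
```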

Notation reference

Paper                  Python                  Meaning
H(p)                   -sum(p*log2(p))         Shannon entropy
F(f) = c(H(p)−H(q))    h_in - h_out            Information loss of morphism f
FinProb                # dicts summing to 1    Category of finite probability distributions
DPI                    h_out <= h_in           Data processing inequality

Translation notes

All examples use finite distributions with explicit probabilities. The paper's theorem holds over FinProb (finite probability distributions with morphisms that preserve total mass). The DPI example on this page, for instance, merges four outcomes into two and checks that entropy decreases. In a real system, the same inequality governs any deterministic post-processing of a continuous received signal: the processed signal's entropy never exceeds the received signal's, no matter how elaborate the processing. The inequality is identical. The distributions are not.

The uniqueness result requires all three axioms: drop any one and other measures become possible. Every example is simplified.

Framework connection: Shannon entropy is the unique measure of the Natural Framework pipeline's information budget. DPI governs how bits flow through each stage. (The Natural Framework)

Ready for the real thing? Read the paper on arXiv. Start at Theorem 4 (p. 8) for the uniqueness result. 15 pages.