
## Algorithmic Information Theory

24 Apr 2024 11:25

Fix your favorite universal computer $$U$$. Now consider any string of symbols $$x$$, or anything which we can describe by means of a string of symbols. Among all the programs which can be run on $$U$$, there will be some which produce $$x$$ and then stop. There is therefore a minimum length for such programs. Call this length $$K_U(x)$$. This is the "algorithmic information content" of $$x$$ (relative to $$U$$), or its Kolmogorov (-Chaitin-Solomonoff) complexity.

Note that one program which could produce $$x$$ would be

`print(x); stop;`

(or its equivalent in the language of $$U$$), so $$K_U(x)$$ can't be much bigger than $$|x|$$, the length of $$x$$. Sometimes, clearly, $$K_U(x)$$ can be much smaller than $$|x|$$: if $$x$$ just consists of the same symbol repeated over and over, say 0, we could use the program

`for (i in 1:n) { print("0"); }; stop;`

and give it n=$$|x|$$, for a total length of a constant (the loop-and-print bit) plus $$\log{|x|}$$ (the number of symbols needed to write the length of $$x$$). While precise values will, irritatingly, be relative to the universal computer $$U$$, because the computer is universal, it can emulate any other computer $$V$$ with a finite-length program, whose length we might call $$K_U(V)$$. Hence $K_U(x) \leq K_V(x) + K_U(V),$ and algorithmic information content differs by at most an additive, string-independent constant across computers.
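One practical aside before going further: although $$K_U(x)$$ itself turns out to be uncomputable (see below), the `print` argument means any off-the-shelf compressor yields a computable upper bound on it. A minimal Python sketch, with zlib as an arbitrary stand-in compressor (the choice is mine, not part of the definition):

```python
# A minimal sketch: any compressed size, plus the (constant) length of the
# decompressor, is an upper bound on K_U(x). zlib is an arbitrary stand-in.
import random
import zlib

x_repetitive = b"0" * 10_000                                    # same symbol over and over
x_random = bytes(random.getrandbits(8) for _ in range(10_000))  # incompressible, w.h.p.

for name, x in [("repetitive", x_repetitive), ("random", x_random)]:
    bound = len(zlib.compress(x, 9))
    print(f"{name}: |x| = {len(x)} bytes, compressed upper bound = {bound} bytes")
# Expected: the repetitive string shrinks to roughly a constant plus log|x|,
# while the random string barely compresses at all.
```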

It turns out that the quantity $$K_U(x)$$ shares many formal properties with the (Shannon) entropy of a random variable, typically written $$H[X]$$, so that one can define analogs of all the usual information-theoretic quantities in purely algorithmic terms. Along these lines, we say that $$x$$ is "incompressible" if $$K_U(x) \approx |x|$$. What makes this mathematically interesting is that incompressible sequences turn out to provide a model of sequences of independent and identically distributed random variables, and, vice versa, the typical sample path of an IID stochastic process is incompressible. This generalizes: for almost every trajectory of an ergodic stochastic process, the Kolmogorov complexity of the first $$n$$ symbols, divided by $$n$$, converges to the process's entropy rate (Brudno's theorem). This, by the way, is why I don't think Kolmogorov complexity makes a very useful complexity measure: it's maximal for totally random things! But Kolmogorov complexity would be a good measure of (effective) stochasticity, if we could actually calculate it.
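A rough empirical illustration of that last point, with the caveat that a real compressor only upper-bounds $$K_U$$ (and, as discussed below, cannot approximate it in general): for IID Bernoulli($$p$$) bits, the compressed length per symbol should sit at or somewhat above the binary entropy $$H(p)$$, reaching about 1 bit per symbol at $$p = 1/2$$. A Python sketch, with zlib again standing in:

```python
# A rough empirical illustration (not a proof), using zlib as a crude
# upper-bound proxy for K_U on IID Bernoulli(p) bit sequences.
import math
import random
import zlib

def binary_entropy(p: float) -> float:
    """Shannon entropy of a Bernoulli(p) source, in bits per symbol."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = 100_000
for p in (0.5, 0.9, 0.99):
    bits = "".join("1" if random.random() < p else "0" for _ in range(n))
    packed = int(bits, 2).to_bytes((n + 7) // 8, "big")  # pack bits into bytes
    rate = 8 * len(zlib.compress(packed, 9)) / n         # bits per source symbol
    print(f"p = {p}: {rate:.3f} compressed bits/symbol vs. H(p) = {binary_entropy(p):.3f}")
```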

Unfortunately, we cannot calculate it. There is a lovely little proof of this, due to Nohre, which I learned of from a paper by Rissanen, and can't resist rehearsing. (There are older proofs, but they're not so pretty.) Suppose there were a program $$P$$ which could read in an object $$x$$ and return its algorithmic information content relative to some universal computer, so $$P(x) = K_U(x)$$. We now use $$P$$ to construct a new program $$V$$ which compresses incompressible objects.

1. Sort all sequences by length, and then alphabetically.
2. For the $$i^{\mathrm{th}}$$ sequence $$x^i$$, use $$P$$ to find $$K_U(x^i)$$.
3. If $$K_U(x^i) \leq |V|$$, where $$|V|$$ is the length of $$V$$ itself (a constant which can be hard-coded into $$V$$), move on to the next sequence.
4. Otherwise, set $$z$$ to $$x^i$$, output $$z$$, and stop.

Since only finitely many strings can have complexity $$\leq |V|$$, step 4 is eventually reached. $$V$$ outputs $$z$$ and stops, so $$K_U(z) \leq |V|$$; but, by construction, $$K_U(z) > |V|$$, a contradiction. The only way out of this contradiction is to deny the premise that $$P$$ exists: there is no algorithm which calculates the Kolmogorov complexity of an arbitrary string. You can in fact strengthen this to say that there is no algorithm which approximates $$K_U$$. (In particular, you can't approximate it with gzip.)

$\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}$
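In code, $$V$$ looks something like the following Python sketch, where `kolmogorov` and `SELF_LENGTH` are hypothetical stand-ins for $$P$$ and $$|V|$$; the whole point, of course, is that `kolmogorov` cannot actually be written:

```python
# A sketch of the paradoxical program V, assuming (impossibly) a function
# kolmogorov(x) that computes K_U(x). Both names are hypothetical.
from itertools import count, product

SELF_LENGTH = 10_000  # stand-in for |V|, the length of this very program

def kolmogorov(x: str) -> int:
    """Hypothetical oracle P with P(x) = K_U(x). Cannot be implemented."""
    raise NotImplementedError("no such algorithm exists")

def V() -> str:
    # Enumerate all binary strings, sorted by length and then alphabetically.
    for n in count(1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if kolmogorov(x) <= SELF_LENGTH:
                continue  # step 3: keep going
            return x      # step 4: z, a string too complex for any program
                          # of length |V| -- yet V just produced it
```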

Expected Kolmogorov complexity vs. Shannon entropy: Up to (unimportant) additive constants, $$\Expect{K_U(X)} = H[X]$$. To prove this, start with the fact that $$H[X]$$ is a lower bound on the expected code length of any uniquely decodable coding scheme for $$X$$. That is, pick any invertible mapping $$c$$ from the values of $$X$$, $$\mathcal{X}$$, to binary sequences, and say $$\ell(c(x))$$ is the length of the binary encoding of $$x$$. Then $$\Expect{\ell(c(X))}$$ is the expected code length, and (i) $$\Expect{\ell(c(X))} \geq H[X]$$, while (ii) there are coding schemes which come arbitrarily close to this lower bound (while still being uniquely decodable). Taking this as given, I make two claims:

1. $$\Expect{K_U(X)} \geq H[X]$$: The mapping from $$x$$ to the shortest program on $$U$$ which generates $$x$$ (and then stops) is a binary encoding of $$x$$, and $$K_U(x)$$ is the length of that encoding. (For a suitable universal machine, valid programs form a prefix-free set, so this encoding is uniquely decodable.) Since $$H[X]$$ is a lower bound on the expected length of any such code, it must be a lower bound on expected program length.
2. $$\Expect{K_U(X)} \leq H[X] + \text{constant}$$: Given a coding/decoding scheme which achieves the Shannon lower bound, there is a finite program $$D$$ on $$U$$ which implements the decoding half of the scheme, and the same program $$D$$ can be used for all $$x$$. Therefore, a program on $$U$$ which will generate $$x$$ and then stop is $$D$$ plus the Shannon coding of $$x$$. The length of such a program is $$|D|$$ plus the length of the encoding of $$x$$, and the expected length, over random choices of $$X$$, is therefore (arbitrarily close to) $$|D| + H[X]$$.
Combining the two claims, $$\Expect{K_U(X)} = H[X]$$ up to an additive constant.
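To make claim (ii) concrete, here is a small numeric check in Python; the Huffman coder is just one convenient optimal prefix-free scheme (the argument above only needs that some such scheme exists):

```python
# A small numeric check of the coding facts used above: a Huffman code
# has expected length between H[X] and H[X] + 1 bit.
import heapq
import math

def entropy(p: dict[str, float]) -> float:
    return -sum(q * math.log2(q) for q in p.values() if q > 0)

def huffman_lengths(p: dict[str, float]) -> dict[str, int]:
    """Return {symbol: code length} for an optimal prefix-free code."""
    # Heap items: (probability, tiebreaker, {symbol: depth so far}).
    heap = [(q, i, {s: 0}) for i, (s, q) in enumerate(p.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        q1, _, d1 = heapq.heappop(heap)  # merge the two least probable
        q2, _, d2 = heapq.heappop(heap)  # subtrees, deepening their leaves
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (q1 + q2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(p)
expected = sum(p[s] * lengths[s] for s in p)
print(f"H[X] = {entropy(p):.3f} bits, E[code length] = {expected:.3f} bits")
# For this dyadic distribution the two coincide exactly (1.75 bits).
```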
Recommended, big picture:
• Cover and Thomas, Elements of Information Theory [specifically the chapter on Kolmogorov complexity]
• Ming Li and Paul M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications
• Luis Antunes, Bruno Bauwens, Andre Souto, Andreia Teixeira, "Sophistication vs Logical Depth", arxiv:1304.8046
• John C. Baez, Mike Stay, "Algorithmic Thermodynamics", arxiv:1010.2067
• George Barmpalias and Andrew Lewis-Pye, "Compression of Data Streams Down to Their Information Content", IEEE Transactions on Information Theory 65 (2019): 4471--4485
• Fabio Benatti, Tyll Krueger, Markus Mueller, Rainer Siegmund-Schultze and Arleta Szkola, "Entropy and Algorithmic Complexity in Quantum Information Theory: a Quantum Brudno's Theorem", quant-ph/0506080
• Laurent Bienvenu, Adam Day, Mathieu Hoyrup, Ilya Mezhirov, Alexander Shen, "A constructive version of Birkhoff's ergodic theorem for Martin-Löf random points", arxiv:1007.5249
• Laurent Bienvenu, Peter Gacs, Mathieu Hoyrup, Cristobal Rojas, Alexander Shen, "Algorithmic tests and randomness with respect to a class of measures", arxiv:1103.1529
• Laurent Bienvenu, Rod Downey, "Kolmogorov Complexity and Solovay Functions", arxiv:0902.1041
• Laurent Bienvenu, Alexander Shen, "Algorithmic information theory and martingales", arxiv:0906.2614
• Claudio Bonanno, "The Manneville map: topological, metric and algorithmic entropy," math.DS/0107195
• Claudio Bonanno and Pierre Collet, "Complexity for Extended Dynamical Systems", Communications in Mathematical Physics 275 (2007): 721--748, math/0609681
• The Computer Journal, 42:4 (1999) [Special issue on Kolmogorov complexity and inference]
• Boris Darkhovsky, Alexandra Piryatinska, "Epsilon-complexity of continuous functions", arxiv:1303.1777
• Łukasz Dębowski, "Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited", Entropy 20 (2018): 85
• David Doty, "Every sequence is compressible to a random one", cs.IT/0511074 ["Kucera and Gacs independently showed that every infinite sequence is Turing reducible to a Martin-Lof random sequence. We extend this result to show that every infinite sequence S is Turing reducible to a Martin-Lof random sequence R such that the asymptotic number of bits of R needed to compute n bits of S, divided by n, is precisely the constructive dimension of S."]
• David Doty and Jared Nichols, "Pushdown Dimension", cs.IT/0504047
• Peter Gács, "Uniform test of algorithmic randomness over a general space", Theoretical Computer Science 341 (2005): 91--137 ["The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made to extend the theory to arbitrary Bernoulli distributions (by Martin-Lof) and to arbitrary distributions (by Levin). We recall the main ideas and problems of Levin's theory, and report further progress in the same framework...."]
• Travis Gagie, "Compressing Probability Distributions", cs.IT/0506016 [Abstract (in full): "We show how to store good approximations of probability distributions in small space."]
• Stefano Galatolo, Mathieu Hoyrup, Cristóbal Rojas
• "Dynamical systems, simulation, abstract computation", arxiv:1101.0833
• "A constructive Borel-Cantelli Lemma. Constructing orbits with required statistical properties", arxiv:0711.1478
• Peter Grünwald and Paul Vitányi, "Shannon Information and Kolmogorov Complexity", cs.IT/0410002
• Mrinalkanti Ghosh, Satyadev Nandakumar, Atanu Pal, "Ornstein Isomorphism and Algorithmic Randomness", arxiv:1404.0766
• Michael Hochman, "Upcrossing Inequalities for Stationary Sequences and Applications to Entropy and Complexity", arxiv:math.DS/0608311 [where "complexity" = algorithmic information content]
• Mathieu Hoyrup, Cristobal Rojas, "Computability of probability measures and Martin-Lof randomness over metric spaces", arxiv:0709.0907
• S. Jalali, A. Maleki and R. G. Baraniuk, "Minimum Complexity Pursuit for Universal Compressed Sensing", IEEE Transactions on Information Theory 60 (2014): 2253--2268, arxiv:1208.5814
• Takakazu Mori, Yoshiki Tsujii, Mariko Yasugi, "Computability of Probability Distributions and Characteristic Functions", arxiv:1307.6357
• Andrej Muchnik, "Algorithmic randomness and splitting of supermartingales", arxiv:0807.3156
• Markus Mueller, "Stationary Algorithmic Probability", arxiv:cs/0608095
• Sven Neth, "A Dilemma for Solomonoff Prediction", arxiv:2206.06473
• André Nies, Computability and Randomness
• E. Rivals and J.-P. Delahaye, "Optimal Representation in Average Using Kolmogorov Complexity," Theoretical Computer Science 200 (1998): 261--287
• Jason Rute, Topics in Algorithmic Randomness and Computable Analysis
• Andrei N. Soklakov, "Complexity Analysis for Algorithmically Simple Strings," cs.LG/0009001
• H. Takahashi, "Redundancy of Universal Coding, Kolmogorov Complexity, and Hausdorff Dimension", IEEE Transactions on Information Theory 50 (2004): 2727--2736
• Nikolai Vereshchagin and Paul Vitanyi, "Kolmogorov's Structure Functions with an Application to the Foundations of Model Selection," cs.CC/0204037
• Paul Vitanyi, "Randomness," math.PR/0110086
• Vladimir V'yugin, "On Instability of the Ergodic Limit Theorems with Respect to Small Violations of Algorithmic Randomness", arxiv:1105.4274
• C. S. Wallace and David L. Dowe, "Minimum Message Length and Kolmogorov Complexity", The Computer Journal 42 (1999): 270--283