## Algorithmic Information Theory

*24 Apr 2024 11:25*

Fix your favorite universal computer \( U \). Now consider any string of symbols \( x \), or anything which we can describe by means of a string of symbols. Among all the programs which can be run on \( U \), there will be some which will produce \( x \) and then stop. There will therefore be a minimum length for such programs. Call this length \( K_U(x) \). This is the "algorithmic information content" of \( x \) (relative to \( U \)), or its Kolmgorov (-Chaitin-Solomonoff) complexity.

Note that one program which could produce \( x \) would
be

`print(`\( x \)`); stop;`

(or its equivalent in the language of
\( U \)), so \( K_U(x) \) can't be much bigger than \( |x| \), the length of \( x \).
Sometimes, clearly, \( K_U(x) \) can be much smaller than \( x \): if \( x \) just consists
of the same symbol repeated over and over, say 0, we could use the
program
`for (i in 1:n) { print("0"); }; stop;`

and give
it `n=`\( |x| \), for a total length of a constant (the loop-and-print bit)
plus \( \log{|x|} \) (the number of symbols needed to write the length of \( x \)).
While precise values will, irritatingly, be relative to the universal computer
\( U \), because the computer is universal, it can emulate any other computer \( V \)
with a finite-length program which we might call \( K_U(V) \). Hence
\[
K_U(x) \leq K_V(x) + K_U(V),
\]
and algorithmic information content differs by at most an
additive, string-independent constant across computers.

It turns out that the quantity \( K_U(x) \) shares many formal properties with the (Shannon) entropy of a random variable, typically written \( H[X] \), so that one can define analogs of all the usual information-theoretic quantities in purely algorithmic terms. Along these lines, we say that \( x \) is "incompressible" if \( K_U(x) \approx |x| \). What makes this mathematically interesting is that incompressible sequences turn out to provide a model of sequences of independent and identically distributed random variables, and, vice versa, the typical sample path of an IID stochastic process is incompressible. This generalizes: almost every trajectory of an ergodic stochastic process has a Kolmogorov complexity whose growth rate equals its entropy rate (Brudno's theorem). This, by the way, is why I don't think Kolmogorov complexity makes a very useful complexity measure: it's maximal for totally random things! But Kolmogorov complexity would be a good measure of (effective) stochasticity, if we could actually calculate it.

Unfortunately, we cannot calculate it. There is a lovely little proof of this, due to Nohre, which I learned of from a paper by Rissanen, and can't resist rehearsing. (There are older proofs, but they're not so pretty.) Suppose there was a program \( P \) which could read in an object \( x \) and return its algorithmic information content relative to some universe computer, so \( P(X) = K_U(x) \). We now use \( P \) to construct a new program \( V \) which compresses incompressible objects.

- Sort all sequences by length, and then alphabetically.
- For the \( i^{\mathrm{th}} \) sequence \( x^i \), use \( P \) to find \( K_U(x^i) \).
- If \( K_U(x^i) \leq |V| \), then keep going.
- Otherwise, set \( z \) to \( x^i \), return \( z \), and stop.

*arbitrary*string. You can in fact strengthen this to say that there is no algorithm which

*approximates*\( K_U \). (In particular, you can't approximate it with

`gzip`.) \[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \]

**Expected Kolmogorov complexity vs. Shannon entropy**
Up to (unimportant) additive constants, \( \Expect{K_U(X)} = H[X] \). To prove
this, start with the fact that \( H[X] \) minimizes the expected code length
for any coding scheme for \( X \). That is, pick any invertible mapping \( c \) from the values of \( X \), \( \mathcal{X} \), to binary sequences. Say \( \ell(c(x)) \) is the length of the binary encoding of \( x \). Then \( \Expect{\ell(c(X))} \) is the expected code length, and (i) \( \Expect{\ell(c(X))} \geq H[X] \), and (ii) there are coding schemes which come arbitrarily close to the lower
bound (while still being uniquely decode-able). Taking this as given, I make two claims:

- \( \Expect{K_U(X)} \geq H[X] \): The mapping from \( x \) to the program on \( U \) which generates \( x \) (and then stops) is a binary encoding of \( x \). \( K_U(x) \) is the length of that encoding. Since \( H[X] \) is a lower bound on the expected code length, it must be a lower bound on expected program length.
- \( \Expect{K_U(X)} \leq H[X] + \text{constant} \): Given a coding/decoding
scheme which achieves the Shannon lower bound, there is a
*finite*program \( D \) on \( U \) which implements the decoding half of the scheme, and the same program \( D \) can be used for all \( x \). Therefore, a program on \( U \) which will generate \( x \) and then stop is \( D \) plus the Shannon coding of \( x \). The length of such a program is \( |D| + \) length of the encoding of \( x \), and the expected length, over random choices of \( X \), is therefore \( |D| + H[X] \).

See also: Complexity Measures; Ergodic Theory; Information Theory; the Minimum Description Length Principle; Probability; "Occam"-style Bounds for Long Programs

- Recommended, big picture:
- Cover and Thomas, Elements of Information Theory [specifically the chapter on Kolmogorov complexity]
- Ming Li and Paul M. B. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications

- Recommended, close-ups:
- Peter Gacs, John T. Tromp and Paul M. B. Vitanyi, "Algorithmic Statistics", IEEE Transactions on Information Theory
**47**(2001): 2443--463, arxiv:math.PR/0006233 - Stefano Galatolo, Mathieu Hoyrup, and Cristóbal Rojas,
"Effective symbolic dynamics, random points, statistical behavior, complexity
and entropy", arxiv:0801.0209
[
*All*, not almost all, Martin-Lof points are statistically typical.] - Jan Lemeire, Dominik Janzing, "Replacing Causal Faithfulness with Algorithmic Independence of Conditionals", Minds and Machines
**23**(2013): 227--249 - G. W. Müller, "Randomness and extrapolation", Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2 (Univ. of Calif. Press, 1972), 1--31 [On a notion of randomness supposedly related to, but stronger than, that of Martin-Löf.]
- Tom F. Sterkenburg, "Solomonoff Prediction and Occam's Razor",
Philosophy of Science
**83**(2016): 459--479, phil-sci/12429 - Bastian Steudel, Nihat Ay, "Information-theoretic inference of common ancestors", arxiv:1010.5720
- Paul M. B. Vitanyi and Ming Li, "Minimum Description Length
Induction, Bayesianism, and Kolmogorov Complexity", IEEE Transactions on
Information Theory
**46**(2000): 446--464, cs.LG/9901014 - Vladimir Vovk, "Superefficiency from the Vantage Point of Computability", Statistical Science
**24**(2009): 73--86

- Modesty forbids me to recommend:
- The notes and slides from lecture 8 in my "Chaos, Complexity and Inference" class

- To read:
- Luis Antunes, Bruno Bauwens, Andre Souto, Andreia Teixeira, "Sophistication vs Logical Depth", arxiv:1304.8046
- John C. Baez, Mike Stay, "Algorithmic Thermodynamics", arxiv:1010.2067
- George Barmpalias and Andrew Lewis-Pye, "Compression of Data Streams Down to Their Information Content", IEEE Transactions on Information Theory
**65**(2019): 4471--4485 - Fabio Benatti, Tyll Krueger, Markus Mueller, Rainer Siegmund-Schultze and Arleta Szkola, "Entropy and Algorithmic Complexity in Quantum Information Theory: a Quantum Brudno's Theorem", quant-ph/0506080
- Laurent Bienvenu, Adam Day, Mathieu Hoyrup, Ilya Mezhirov, Alexander Shen, "A constructive version of Birkhoff's ergodic theorem for Martin-Löf random points", arxiv:1007.5249
- Laurent Bienvenu, Peter Gacs, Mathieu Hoyrup, Cristobal Rojas, Alexander Shen, "Algorithmic tests and randomness with respect to a class of measures", arxiv:1103.1529
- Laurent Bienvenu, Rod Downey, "Kolmogorov Complexity and Solovay Functions", arxiv:0902.1041
- Laurent Bienvenu, Alexander Shen, "Algorithmic information theory and martingales", arxiv:0906.2614
- Claudio Bonanno, "The Manneville map: topological, metric and algorithmic entropy," math.DS/0107195
- Claudio Bonanno and Pierre Collet, "Complexity for Extended Dynamical Systems", Communications in Mathematical Physics
**275**(2007): 721--748, math/0609681 - The Computer Journal,
**42:4**(1999) [Special issue on Kolmogorov complexity and inference] - Boris Darkhovsky, Alexandra Pyriatinska, "Epsilon-complexity of continuous functions", arxiv:1303.1777
- Łukasz Dębowski, "Is Natural Language a Perigraphic Process? The Theorem about Facts and Words Revisited", Entropy
**20**(2018): 85 - David Doty, "Every sequence is compressible to a random one", cs.IT/0511074 ["Kucera and Gacs independently showed that every infinite sequence is Turing reducible to a Martin-Lof random sequence. We extend this result to show that every infinite sequence S is Turing reducible to a Martin-Lof random sequence R such that the asymptotic number of bits of R needed to compute n bits of S, divided by n, is precisely the constructive dimension of S."]
- David Doty and Jared Nichols, "Pushdown Dimension", cs.IT/0504047
- Peter Gács, "Uniform test of algorithmic randomness over a
general space", Theoretical Computer
Science
**341**(2005): 91--137 ["The algorithmic theory of randomness is well developed when the underlying space is the set of finite or infinite sequences and the underlying probability distribution is the uniform distribution or a computable distribution. These restrictions seem artificial. Some progress has been made to extend the theory to arbitrary Bernoulli distributions (by Martin-Lof) and to arbitrary distributions (by Levin). We recall the main ideas and problems of Levin's theory, and report further progress in the same framework...."] - Travis Gagie, "Compressing Probability Distributions", cs.IT/0506016 [
*Abstract*(in full): "We show how to store good approximations of probability distributions in small space."] - Stefano Galatolo, Mathieu Hoyrup, Cristóbal Rojas
- "Dynamical systems, simulation, abstract computation", arxiv:1101.0833
- "A constructive Borel-Cantelli Lemma. Constructing orbits with required statistical properties", arxiv:0711.1478

- Peter Grünwald and Paul Vitányi, "Shannon Information and Kolmogorov Complexity", cs.IT/0410002
- Mrinalkanti Ghosh, Satyadev Nandakumar, Atanu Pal, "Ornstein Isomorphism and Algorithmic Randomness", arxiv:1404.0766
- Michael Hochman, "Upcrossing Inequalities for Stationary Sequences and Applications to Entropy and Complexity", arxiv:math.DS/0608311 [where "complexity" = algorithmic information content]
- Mathieu Hoyrup, Cristobal Rojas, "Computability of probability measures and Martin-Lof randomness over metric spaces", arxiv:0709.0907
- S. Jalalai, A. Maleki and R. G. Baraniuk, "Minimum Complexity Pursuit for Universal Compressed Sensing", IEEE Transactions on Information Theory
**60**(2014): 2253--2268, arxiv:1208.5814 - Takakazu Mori, Yoshiki Tsujii, Mariko Yasugi, "Computability of Probability Distributions and Characteristic Functions", arxiv:1307.6357
- Andrej Muchnik, "Algorithmic randomness and splitting of supermartingales", arxiv:0807.3156
- Markus Mueller, "Stationary Algorithmic Probability", arxiv:cs/0608095
- Sven Neth, "A Dilemma for Solomonoff Prediction", arxiv:2206.06473
- Andrew Nies, Computability and Randomness
- E. Rivals and J.-P. Delahae, "Optimal Representation in Average
Using Kolmogorov Complexity," Theoretical Computer Science
**200**(1998): 261--287 - Jason Rute, Topics in Algorithmic Randomness and Computable Analysis
- Andrei N. Soklakov, "Complexity Analysis for Algorithmically Simple Strings," cs.LG/0009001
- H. Takashashi, "Redundancy of Universal Coding, Kolmogorov
Complexity, and Hausdorff Dimension", IEEE Transactions on Information
Theory
**50**(2004): 2727--2736 - Nikolai Vereshchagin and Paul Vitanyi, "Kolmogorov's Structure Functions with an Application to the Foundations of Model Selection," cs.CC/0204037
- Paul Vitanyi, "Randomness," math.PR/0110086
- Vladimir V'yugin, "On Instability of the Ergodic Limit Theorems with Respect to Small Violations of Algorithmic Randomness", arxiv:1105.4274
- C. S. Wallace and David L. Dowe, "Minimum Message Length and Kolmogorov Complexity", The Computer Journal
**42**(1999): 270--283