Algorithmic Information Theory

27 Feb 2017 16:30

Fix your favorite universal computer $U$. Now consider any string of symbols $x$, or anything which we can describe by means of a string of symbols. Among all the programs which can be run on $U$, there will be some which will produce $x$ and then stop. There will therefore be a minimum length for such programs. Call this length $K_U(x)$. This is the "algorithmic information content" of $x$ (relative to $U$), or its Kolmogorov(-Chaitin-Solomonoff) complexity.

Note that one program which could produce $x$ would be print($x$); stop; (or its equivalent in the language of $U$), so $K_U(x)$ can't be much bigger than $|x|$, the length of $x$. Sometimes, clearly, $K_U(x)$ can be much smaller than $|x|$: if $x$ just consists of the same symbol repeated over and over, say 0, we could use the program for (i in 1:n) { print("0"); }; stop; and give it n=$|x|$, for a total length of a constant (the loop-and-print bit) plus $\log{|x|}$ (the number of symbols needed to write the length of $x$). While precise values will, irritatingly, be relative to the universal computer $U$, because the computer is universal, it can emulate any other computer $V$ with a finite-length program, of some length we might call $K_U(V)$. Hence $K_U(x) \leq K_V(x) + K_U(V)$, and algorithmic information content differs by at most an additive, string-independent constant across computers.
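To make the two bounds concrete, here is a toy Python sketch. The function names and the character-counting are my own illustrative inventions, not real Kolmogorov complexities: each function just counts the symbols in the corresponding program described above.

```python
from math import ceil, log2

def literal_length(x: str) -> int:
    # The print(x); stop; program embeds x verbatim, so its length
    # is |x| plus a constant for the print-and-stop scaffolding.
    overhead = len('print(""); stop;')
    return len(x) + overhead

def loop_length(x: str) -> int:
    # The for (i in 1:n) { print("0"); }; stop; program only needs to
    # encode n = |x|, which takes about log2(|x|) symbols in binary.
    overhead = len('for (i in 1:n) { print("0"); }; stop;')
    return overhead + ceil(log2(len(x) + 1))

x = "0" * 1_000_000
print(literal_length(x))  # grows linearly with |x|
print(loop_length(x))     # constant plus log |x|: under 60 symbols here
```

For this highly regular $x$, the loop program is shorter by four orders of magnitude, which is the sense in which $K_U(x)$ can be much smaller than $|x|$.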

It turns out that the quantity $K_U(x)$ shares many formal properties with the (Shannon) entropy of a random variable, typically written $H[X]$, so that one can define analogs of all the usual information-theoretic quantities in purely algorithmic terms. Along these lines, we say that $x$ is "incompressible" if $K_U(x) \approx |x|$. What makes this mathematically interesting is that incompressible sequences turn out to provide a model of sequences of independent and identically distributed random variables, and, vice versa, the typical sample path of an IID stochastic process is incompressible. This generalizes: almost every trajectory of an ergodic stochastic process has a Kolmogorov complexity whose growth rate equals its entropy rate (Brudno's theorem). This, by the way, is why I don't think Kolmogorov complexity makes a very useful complexity measure: it's maximal for totally random things!
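A crude numerical illustration of the entropy-rate connection, using an ordinary compressor rather than $K$ itself (compressed length is only a loose upper bound on algorithmic information content, for the reason the next paragraph gives, but the qualitative contrast still shows):

```python
import random
import zlib

random.seed(0)
n = 100_000

# IID uniform random bytes: entropy rate 8 bits/byte, so typical
# sequences should be (nearly) incompressible.
iid = bytes(random.getrandbits(8) for _ in range(n))

# A maximally regular sequence: entropy rate 0.
constant = b"0" * n

# Compressed size per input byte: near 1 for the IID sequence,
# near 0 for the constant one.
print(len(zlib.compress(iid, 9)) / n)
print(len(zlib.compress(constant, 9)) / n)
```

The random sequence comes out essentially as long as it went in, while the constant sequence shrinks to a sliver, matching the two entropy rates.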

Unfortunately, algorithmic information content is also incomputable, and doesn't even have computable approximations. (In particular, you can't approximate it with gzip.)

See also: Complexity Measures; Ergodic Theory; Information Theory; the Minimum Description Length Principle; Probability