The Bactra Review Stochastic
Complexity in Statistical Inquiry
Thus: If we have n individual data-points, drawn from a finite
alphabet A, use the expanded alphabet A^n. If our data in the new alphabet is
the letter x, and we want to use a binary prefix code, define the
code-word for x to be 0. All other aspects of the code are free to
vary, so long as the Kraft inequality (pp. 21--24) is obeyed, and this can
always be done. Extending this to concatenating multiple data-streams is
trivial. (Matters are a bit more complicated when our data can take on
infinitely many values, whether a countable or uncountable infinity, but in
principle we can do the same thing, using, e.g., Rissanen's arithmetic codes.)
In short: by tailoring the code to the data, we can always reduce
the description-length to one bit.