How to Cheat the MDL

The Bactra Review Stochastic Complexity in Statistical Inquiry

Thus: If we have n individual data-points, drawn from a finite alphabet A, use the expanded alphabet A^n. If our data in the new alphabet is the letter x, and we want to use a binary prefix code, define the code-word for x to be 0. All other aspects of the code are free to vary, so long as the Kraft inequality (pp. 21--24) is obeyed, and this can always be done. Extending this to concatenating multiple data-streams is trivial. (Matters are a bit more complicated when our data can take on infinitely many values, whether a countable or uncountable infinity, but in principle we can do the same thing, using, e.g., Rissanen's arithmetic codes.)

In short: by tailoring the code to the data, we can always reduce the description-length to one bit.