Bootstrapping Entropy Estimates

31 Oct 2003 12:46

Speaking personally, I often want to know the entropy H[X] of a random variable X, where I do not know the distribution of X. I have, however, many samples of X, so I can form an estimate of the entropy. But, for personal reasons, I'd like to be able to put confidence confidence intervals around my estimates, and calculating those bounds is frankly beyond my intelligence. Here bootstrapping comes to the rescue. The idea, in general, is as follows. What I want to know is the distribution of estimates around the true parameter. What I do instead is to use my estimate to generate new, simulated data, and then repeat the estimation on the simulations. The distribution of the simulations around the estimate should in some sense approximate the distribution of estimates around the true parameter.

Here a very simple strategy suggests itself. The easiest way to estimate the entropy is to just calculate the entropy of the empirical distribution. If I resample from the empirical distribution, and calculate the entropy of the resamplings, will I get something that looks like the distribution of empirical entropies around the true entropy? And if this works for the entropy, would it work for, say, the relative entropy, i.e., the divergence of the empirical distribution from the true distribution? How about the mutual information? It would be very nice to think so, but I wonder why, if something so simple works, I haven't heard of it being done.

Why might this not work? Suppose X is binary, and both outcomes are equally likely. Then H[X] = 1, which is the maximum value. Hence empirical entropies will be less than or equal to the true entropy. Suppose our empirical distribution has less than maximum entropy, i.e., it's not completely fair. But if we resample the empirical distribution, it may chance that the sample is more nearly fair than the original. So the resampled entropy can be higher than the estimate, while the estimate cannot be higher than the true entropy. But it could be, of course, that the two distributions do approach each other, in some limiting sense which I might have to think carefully about.

Assuming that difficulty isn't insurmountable, could one use resampling from a reconstructed causal state model to bootstrap estimates of the entropy rate or the statistical complexity?

See also: Estimating Entropies and Informations