Bootstrapping Entropy Estimates

Speaking personally, I often want to know the entropy H[X] of a random variable X, where I do not know the distribution of X. I have, however, many samples of X, so I can form an estimate of the entropy. But, for personal reasons, I'd like to be able to put confidence confidence intervals around my estimates, and calculating those bounds is frankly beyond my intelligence. Here bootstrapping comes to the rescue. The idea, in general, is as follows. What I want to know is the distribution of estimates around the true parameter. What I do instead is to use my estimate to generate new, simulated data, and then repeat the estimation on the simulations. The distribution of the simulations around the estimate should in some sense approximate the distribution of estimates around the true parameter.

Here a very simple strategy suggests itself. The easiest way to estimate the entropy is to just calculate the entropy of the empirical distribution. If I resample from the empirical distribution, and calculate the entropy of the resamplings, will I get something that looks like the distribution of empirical entropies around the true entropy? And if this works for the entropy, would it work for, say, the relative entropy, i.e., the divergence of the empirical distribution from the true distribution? How about the mutual information? It would be very nice to think so, but I wonder why, if something so simple works, I haven't heard of it being done.

Why might this not work? Suppose X is binary, and both outcomes are equally likely. Then H[X] = 1, which is the maximum value. Hence empirical entropies will be less than or equal to the true entropy. Suppose our empirical distribution has less than maximum entropy, i.e., it's not completely fair. But if we resample the empirical distribution, it may chance that the sample is more nearly fair than the original. So the resampled entropy can be higher than the estimate, while the estimate cannot be higher than the true entropy. But it could be, of course, that the two distributions do approach each other, in some limiting sense which I might have to think carefully about.

Assuming that difficulty isn't insurmountable, could one use resampling from a reconstructed causal state model to bootstrap estimates of the entropy rate or the statistical complexity?

Martin Biewen, "Bootstrap inference for inequality, mobility and poverty measurement", Journal of Econometrics 108 (2002): 317--342 [What Biewen calls the Thiel Index of inequality looks very like the Shannon entropy, if one identifies his measured variables with probabilities. I'm not sure what that does to his bootstrap arguments. Thanks to Peter Muhlberger for pointing out the reference. Biewen has Stata code available.]
M. Salicru, S. Vives and J. Ocana, "Testing the homogeneity of diversity measures: a general framework", Journal of Statistical Planning and Inference 132 (2005): 117--129
Ritei Shibata, "Bootstrap Estimate of Kullback-Leibler Information for Model Selection," Technical Report No. 424, UC Berkeley statistics department [Abstract]

Notebooks

Bootstrapping Entropy Estimates