## Bootstrapping Entropy Estimates

*31 Oct 2003 12:46*

Speaking personally, I often want to know the entropy *H[X]* of a random
variable *X*, where I do not know the distribution of *X*. I
have, however, many samples of *X*, so I can form an estimate of the
entropy. But, for personal reasons, I'd like to be able to put confidence
intervals around my estimates, and calculating those bounds is
frankly beyond my intelligence. Here bootstrapping comes to the rescue. The
idea, in general, is as follows. What I want to know is the distribution of
estimates around the true parameter. What I do instead is to use my estimate
to generate new, simulated data, and then repeat the estimation on the
simulations. The distribution of the simulations around the estimate should in
some sense approximate the distribution of estimates around the true parameter.

Here a very simple strategy suggests itself. The easiest way to estimate the entropy is to just calculate the entropy of the empirical distribution. If I resample from the empirical distribution, and calculate the entropy of the resamplings, will I get something that looks like the distribution of empirical entropies around the true entropy? And if this works for the entropy, would it work for, say, the relative entropy, i.e., the divergence of the empirical distribution from the true distribution? How about the mutual information? It would be very nice to think so, but I wonder why, if something so simple works, I haven't heard of it being done.
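A minimal sketch of that simple strategy in Python (my own construction, not anything from the literature): the plug-in estimator is just the entropy of the empirical distribution, and the confidence interval comes from percentiles of the resampled entropies.

```python
import numpy as np

rng = np.random.default_rng(0)

def plugin_entropy(samples):
    """Plug-in entropy: the entropy of the empirical distribution, in bits."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def bootstrap_entropy_ci(samples, n_boot=2000, alpha=0.05):
    """Resample from the empirical distribution and take a percentile interval."""
    n = len(samples)
    boots = np.array([
        plugin_entropy(rng.choice(samples, size=n, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

# Example: a biased four-sided die (true entropy is about 1.85 bits)
x = rng.choice(4, size=500, p=[0.4, 0.3, 0.2, 0.1])
print(plugin_entropy(x), bootstrap_entropy_ci(x))
```

Whether the percentile interval so obtained actually has the right coverage is, of course, exactly the question raised above.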

Why might this not work? Suppose *X* is binary, and both outcomes
are equally likely. Then *H[X] = 1*, which is the maximum value. Hence
empirical entropies will be less than or equal to the true entropy. Suppose
our empirical distribution has less than maximum entropy, i.e., it's not
completely fair. But if we resample from the empirical distribution, the
resample may by chance be more nearly fair than the original. So the resampled
entropy can be *higher* than the estimate, while the estimate cannot be
higher than the true entropy. But it could be, of course, that the two
distributions do approach each other, in some limiting sense which I might have
to think carefully about.
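The asymmetry can be checked numerically. This sketch (again my own, using a deliberately uneven 55/45 binary sample so the plug-in estimate falls below the 1-bit maximum) counts how often bootstrap entropies land *above* the point estimate, which the empirical entropy itself can never do relative to the truth:

```python
import numpy as np

rng = np.random.default_rng(1)

def plugin_entropy(samples):
    """Plug-in entropy of the empirical distribution, in bits."""
    _, counts = np.unique(samples, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# An uneven binary sample: its plug-in entropy is strictly below 1 bit,
# the true (maximum) entropy of a fair coin.
x = np.array([0] * 55 + [1] * 45)
h_hat = plugin_entropy(x)

# Resamples can be more nearly fair than x itself, so bootstrap
# entropies can exceed the estimate.
boots = np.array([
    plugin_entropy(rng.choice(x, size=len(x), replace=True))
    for _ in range(2000)
])
frac_above = np.mean(boots > h_hat)
print(f"fraction of bootstrap entropies above the estimate: {frac_above:.2f}")
```

So the bootstrap distribution sits on both sides of the estimate even though the sampling distribution sits entirely on one side of the truth, which is precisely the worry.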

Assuming that difficulty isn't insurmountable, could one use resampling from a reconstructed causal state model to bootstrap estimates of the entropy rate or the statistical complexity?

See also: Estimating Entropies and Informations

To read:

- Martin Biewen, "Bootstrap inference for inequality, mobility and poverty
  measurement", *Journal of Econometrics* **108** (2002): 317--342 [What
  Biewen calls the Theil index of inequality looks very like the Shannon
  entropy, *if* one identifies his measured variables with probabilities. I'm
  not sure what that does to his bootstrap arguments. Thanks to Peter
  Muhlberger for pointing out the reference. Biewen has Stata code available.]
- M. Salicru, S. Vives and J. Ocana, "Testing the homogeneity of diversity
  measures: a general framework", *Journal of Statistical Planning and
  Inference* **132** (2005): 117--129
- Ritei Shibata, "Bootstrap Estimate of Kullback-Leibler Information for Model
  Selection", Technical Report No. 424, UC Berkeley statistics department
  [Abstract]