Empirical Likelihood
04 Dec 2024 09:33
Yet Another Inadequate Placeholder
This is an extremely clever idea for retaining many of the conveniences of ordinary likelihood-based statistical methods, when the assumptions behind those methods are just too much to swallow. Embarrassingly, however, I can never remember the tricks, so I break this out as a notebook in large part to force myself to re-re-re-learn it, in the hope that it will stick this time.
Inference for the expected value of a function
Say we have data points \( Z_1, Z_2, \ldots Z_n \), iidly distributed with common distribution $\mu$. As every schoolchild knows, the nonparametric maximum likelihood distribution here is the empirical distribution, \[ \hat{\mu}_n(z) \equiv \frac{1}{n}\sum_{i=1}^{n}{\delta_{Z_i}(z)} \] (If you're a physicist you'd rather write the summands as \( \delta(z-Z_i) \), big deal.) This is because any other distribution either puts probability mass on points which aren't observed (hence lowering the likelihood), or mis-allocates probability among the observed points (hence lowering the likelihood).
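(A throwaway numerical check of that last claim, with a made-up sample and made-up alternative weightings; nothing here depends on the actual values of the \( Z_i \), only on the weights.)

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=20)          # any made-up sample will do
n = len(z)

def log_lik(p):
    """Nonparametric log-likelihood of putting weights p on the observed points."""
    return np.sum(np.log(p))

p_uniform = np.full(n, 1.0 / n)  # the empirical distribution

# Any other allocation of probability among the same points does no better.
for _ in range(5):
    q = rng.dirichlet(np.ones(n))
    assert log_lik(q) <= log_lik(p_uniform)

print(log_lik(p_uniform))        # equals -n * log(n), the maximum
```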
Now suppose we think there ought to be some restriction on the distribution, and that this takes the form of some aspect of the distribution matching some function of a parameter: \[ \Expect{g(Z;\theta)} = 0 \]
(In many cases, \( g(Z;\theta) = f(Z) - h(\theta) \), but some flexibility to allow for more complicated forms is harmless at this point.) How do we maximize likelihood, non-parametrically, while respecting this constraint?
First, we need to recognize that it's still the case that putting any probability on un-observed values of $z$ just lowers the likelihood. So we are only interested in distributions which re-allocate probability among the $n$ data points. These can of course be given by $n$ numbers \( p_1, \ldots p_n \), with \( p_i \geq 0 \), \( \sum_{i=1}^{n}{p_i} = 1 \). Using Lagrange multipliers to turn the constrained problem into an unconstrained one, we now have the optimization problem: \[ \max_{p_1, \ldots p_n, \theta}{\sum_{i=1}^{n}{\log{p_i}}} - \lambda \left( \sum_{i=1}^{n}{p_i} - 1 \right) - \gamma \sum_{i=1}^{n}{p_i g(Z_i;\theta)} \] Let's pretend that [[the angels]] have told us $\theta$, so we only need to maximize over the weights. Take derivatives with respect to the \( p_i \) and set them to zero at the optimum: \[ \frac{1}{\hat{p}_i} - \hat{\lambda} - \hat{\gamma} g(Z_i;\theta) = 0 \]
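(For the record, the standard next step, as in Owen's book below rather than anything original here: multiply the first-order condition through by \( \hat{p}_i \) and sum over \( i \); since \( \sum_{i=1}^{n}{\hat{p}_i} = 1 \) while the constraint \( \sum_{i=1}^{n}{\hat{p}_i g(Z_i;\theta)} = 0 \) holds at the optimum, this forces \( \hat{\lambda} = n \), so \[ \hat{p}_i = \frac{1}{n + \hat{\gamma} g(Z_i;\theta)} \] with \( \hat{\gamma} \) pinned down by substituting back into the constraint.)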
(Here ends the nap-time during which I am writing this; updates to follow, inshallah.)
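Since the derivation above stops short of actual numbers, here is a minimal numerical sketch for the simplest case, inference about the mean, where \( g(Z;\theta) = Z - \theta \). The data, the random seed, and the brute-force root-finding for \( \hat{\gamma} \) are illustrative choices only, not necessarily how one would want to compute this in earnest.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
z = rng.normal(loc=2.0, scale=1.0, size=50)   # made-up data with true mean 2
n = len(z)

def el_log_ratio(theta):
    """Empirical log-likelihood ratio for the mean: sum_i log(n * p_i(theta)).

    Assumes theta lies strictly between min(z) and max(z), so the constraint
    can be met with strictly positive weights."""
    g = z - theta                             # g(Z; theta) = Z - theta
    # The constrained weights take the form p_i = 1 / (n + gamma * g_i);
    # choose gamma so that the constraint sum_i p_i * g_i = 0 holds.  On the
    # interval of gammas keeping every weight positive, the constraint is
    # strictly decreasing in gamma, so a bracketing root-finder is enough.
    lo = -n / np.max(g) + 1e-8
    hi = -n / np.min(g) - 1e-8
    gamma = brentq(lambda c: np.sum(g / (n + c * g)), lo, hi)
    p = 1.0 / (n + gamma * g)
    return np.sum(np.log(n * p))

# -2 times the log-ratio is asymptotically chi^2 with 1 degree of freedom at
# the true mean (Owen's theorem), which is what licenses tests and confidence
# intervals.
print(-2 * el_log_ratio(2.0))   # at the truth: typically modest
print(-2 * el_log_ratio(2.5))   # away from the truth: typically much larger
```

Inverting the test gives a confidence interval: the set of \( \theta \) where \( -2 \) times the log-ratio stays below the appropriate chi-squared quantile.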
- See also:
- Large Deviations
- Statistics
- Recommended:
- Yuichi Kitamura, "Empirical Likelihood Methods in Econometrics: Theory and Practice", pp. 174--237 in Richard Blundell, Whitney Newey and Torsten Persson (eds.), Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, ssrn/917901 (2006)
- To read:
- Gianfranco Adimari and Annamaria Guolo, "A note on the asymptotic behaviour of empirical likelihood statistics", Statistical Methods and Applications 19 (2010): 463--476
- Jianqing Fan and Jian Zhang, "Sieve empirical likelihood ratio tests for nonparametric functions", Annals of Statistics 32 (2004): 1858--1907, math.ST/0503667
- Nils Lid Hjort, Ian W. McKeague and Ingrid Van Keilegom, "Extending the scope of empirical likelihood", Annals of Statistics 37 (2009): 1079--1111, arxiv:0904.2949
- K. L. Mengersen, P. Pudlo and C. P. Robert, "Bayesian computation via empirical likelihood", arxiv:1205.5658
- Art Owen, Empirical Likelihood
- Hanxiang Peng and Anton Schick, "Empirical likelihood approach to goodness of fit testing", Bernoulli 19 (2013): 954--981
- Susanne M. Schennach, "Point estimation with exponentially tilted empirical likelihood", Annals of Statistics 35 (2007): 634--672, arxiv:0708.1874