Notebooks

Empirical Likelihood

04 Dec 2024 09:33

Yet Another Inadequate Placeholder

This is an extremely clever idea for retaining many of the conveniences of ordinary likelihood-based statistical methods, when the assumptions behind those methods are just too much to swallow. Embarrassingly, however, I can never remember the tricks, so I break this out as a notebook in large part to force myself to re-re-re-learn it, in the hope that it will stick this time.

Inference for the expected value of a function

Say we have data points \( Z_1, Z_2, \ldots Z_n \), iidly distributed with common distribution $\mu$. As every schoolchild knows, the nonparametric maximum likelihood estimate of $\mu$ is the empirical distribution, \[ \hat{\mu}_n(z) \equiv \frac{1}{n}\sum_{i=1}^{n}{\delta_{Z_i}(z)} \] (If you're a physicist you'd rather write the summands as \( \delta(z-Z_i) \), big deal.) This is because any other distribution either puts probability mass on points which aren't observed (hence lowering the likelihood), or mis-allocates probability among the observed points (hence lowering the likelihood).
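As a sanity check on that claim (and because I will otherwise forget how to set these problems up numerically), here is a minimal sketch using generic numpy/scipy optimization, not any dedicated empirical-likelihood package: maximizing \( \sum_{i=1}^{n}{\log{p_i}} \) over probability vectors supported on the observed points does indeed return the uniform weights \( 1/n \).

# Minimal sketch: check numerically that the uniform weights 1/n maximize
# the nonparametric log-likelihood sum_i log(p_i), subject only to the
# weights forming a probability distribution on the observed points.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 10
Z = rng.normal(size=n)             # the data values only enter through n here

res = minimize(
    lambda p: -np.sum(np.log(p)),  # negative log-likelihood of the weights
    rng.dirichlet(np.ones(n)),     # an arbitrary starting allocation
    bounds=[(1e-10, 1.0)] * n,     # p_i >= 0, kept away from 0 for the log
    constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}],
    method="SLSQP",
)
print(np.max(np.abs(res.x - 1.0 / n)))   # ~ 0: back to the empirical weights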

Now suppose we think there ought to be some restriction on the distribution, and that this takes the form of some aspect of the distribution matching some function of a parameter: \[ \Expect{g(Z;\theta)} = 0 \]

(In many cases, \( g(Z;\theta) = f(Z) - h(\theta) \), but some flexibility to allow for more complicated forms is harmless at this point.) How do we maximize likelihood, non-parametrically, while respecting this constraint?
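The simplest example, for concreteness: inference for the mean, where \( g(Z;\theta) = Z - \theta \), so the constraint just says that \( \theta \) is the expected value of \( Z \). (This is the case the numerical sketch further below uses.)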

First, we need to recognize that it's still the case that putting any probability on un-observed values of $z$ just lowers the likelihood. So we are only interested in distributions which re-allocate probability among the $n$ data points. These can of course be given by $n$ numbers \( p_1, \ldots p_n \), with \( p_i \geq 0 \), \( \sum_{i=1}^{n}{p_i} = 1 \). Using Lagrange multipliers to turn the constrained problem into an unconstrained one, we get the optimization problem \[ \max_{p_1, \ldots p_n, \theta}{\left\{ \sum_{i=1}^{n}{\log{p_i}} - \lambda \left( \sum_{i=1}^{n}{p_i} - 1 \right) - \gamma \sum_{i=1}^{n}{p_i g(Z_i;\theta)} \right\}} \] Let's pretend that [[the angels]] have told us $\theta$, so we only need to maximize over the weights. Take derivatives with respect to the \( p_i \) and set them to zero at the optimum: \[ \frac{1}{\hat{p}_i} - \hat{\lambda} - \hat{\gamma} g(Z_i;\theta) = 0 \]
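For future reference, the standard next step in this derivation: multiply the first-order condition through by \( \hat{p}_i \) and sum over \( i \). Since \( \sum_{i=1}^{n}{\hat{p}_i} = 1 \) and \( \sum_{i=1}^{n}{\hat{p}_i g(Z_i;\theta)} = 0 \), this gives \( n - \hat{\lambda} = 0 \), i.e., \( \hat{\lambda} = n \), and so \[ \hat{p}_i = \frac{1}{n + \hat{\gamma} g(Z_i;\theta)} \] with \( \hat{\gamma} \) chosen so that the moment constraint actually holds.

And here is a minimal numerical sketch of all this, for the mean-constraint example \( g(Z;\theta) = Z - \theta \) with \( \theta \) handed to us, again using generic numpy/scipy routines rather than any dedicated empirical-likelihood package: solve the constrained maximization directly, and check it against the closed form just above.

# Minimal sketch: empirical likelihood weights at a fixed theta, with the
# moment function g(z; theta) = z - theta (i.e., constraining the mean).
# Solved two ways: (i) direct constrained maximization of sum_i log(p_i),
# and (ii) the closed form p_i = 1 / (n + gamma * g_i), with gamma chosen
# so that sum_i p_i * g_i = 0.  The two solutions should agree.
import numpy as np
from scipy.optimize import brentq, minimize

rng = np.random.default_rng(1)
n = 50
Z = rng.normal(size=n)
theta = 0.2                        # the value "the angels" handed us
g = Z - theta                      # g(Z_i; theta); assumes min(g) < 0 < max(g)

# (i) direct constrained maximization over the weights
res = minimize(
    lambda p: -np.sum(np.log(p)),
    np.ones(n) / n,                # start from the unconstrained optimum
    bounds=[(1e-10, 1.0)] * n,     # p_i >= 0, kept away from 0 for the log
    constraints=[{"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
                 {"type": "eq", "fun": lambda p: np.dot(p, g)}],
    method="SLSQP",
)
p_numeric = res.x

# (ii) closed form: find gamma so the weighted moment condition holds.
# Every denominator n + gamma * g_i must stay positive, which brackets gamma.
lo = -n / g.max() + 1e-6
hi = -n / g.min() - 1e-6
gamma = brentq(lambda gam: np.sum(g / (n + gam * g)), lo, hi)
p_closed = 1.0 / (n + gamma * g)

print(np.max(np.abs(p_numeric - p_closed)))   # ~ 0: the two solutions agree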

(Here ends the nap-time during which I am writing this; updates to follow, inshallah.)

