## Partial Identification of Parametric Statistical Models

27 Feb 2017 16:30

A parametric statistical model is said to be "identifiable" if no two parameter settings give rise to the same distribution of observations. This means that there is always some way to test whether the parameters take one value rather than another. If this is not the case, then the model is said to be "unidentifiable". Sometimes models are unidentifiable because they are bad models, specified in a stupid way which leads to redundancies. Sometimes, however, models are unidentifiable because the data are bad --- if you could measure certain variables, or measure them more precisely, the model would be identifiable, but in fact you have to put up with noisy, missing, aggregated, etc. data. (Technically: the information we get from observations is represented by a sigma algebra, or, over time, a filtration. If two distributions differ on the full filtration, their restrictions to some smaller filtration might coincide.) Presumably then you could still partially identify the model, up to, say, some notion of observational equivalence. Query: how to make this precise?
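
One way to begin making this precise, offered as a sketch rather than a settled answer, is to write observational equivalence as an equivalence relation on parameter space. The notation is assumed here for illustration and is not in the original note: $\Theta$ is the parameter space, $P_\theta$ the distribution under $\theta$, $\mathcal{G}$ the sigma algebra of what is actually observed, and $g$ any function of the parameters.

```latex
% Observational equivalence relative to the available information \mathcal{G}
\theta \sim_{\mathcal{G}} \theta'
  \iff
  P_{\theta}(A) = P_{\theta'}(A) \quad \text{for all } A \in \mathcal{G}

% The model is identifiable on \mathcal{G} when every equivalence class is a single point
[\theta]_{\mathcal{G}} = \{\theta\} \quad \text{for all } \theta \in \Theta

% A function g of the parameters is identifiable when it is constant on equivalence classes
\theta \sim_{\mathcal{G}} \theta' \implies g(\theta) = g(\theta')

% Coarser information can only merge equivalence classes, never split them
\mathcal{G}' \subseteq \mathcal{G}
  \implies
  \left( \theta \sim_{\mathcal{G}} \theta' \implies \theta \sim_{\mathcal{G}'} \theta' \right)
```

On this reading, "partially identifying" the model means learning which equivalence class the true parameter lies in, and the identifiable quantities are exactly the functions of that class.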

If the distribution predicted by the model depends, in a reasonably smooth way, on the parameters, then we can form the Fisher information matrix, which is basically the negative of the matrix of expected second derivatives of the log-likelihood with respect to the parameters. (I realize that's not a very helpful statement if you haven't at least forgotten the real definition of the Fisher information.) Suppose one or more of the eigenvalues of the Fisher information matrix is zero. Any vector orthogonal to the span of the eigenvectors corresponding to the non-zero eigenvalues then gives a linear combination of the original parameters which is unidentifiable, at least in the vicinity of the point at which you're taking derivatives. This suggests at least two avenues of approach here.

1. Re-parameterization. Perform some kind of rotation of the coordinate system in parameter space so that the parameters split cleanly into two orthogonal groups, identifiable and non-identifiable, and inference for the former can proceed in total ignorance of the values of the latter.
2. Equivalent models. Say (as I implied above) that two parameter settings are observationally equivalent if they yield the same distribution over observations. The set of all parameter values observationally equivalent to a given value should, plausibly, form a sub-manifold of parameter space. The zero eigenvectors of the Fisher information matrix give the directions in which we could move in parameter space and (locally) stay on this sub-manifold. Is this enough to actually define that sub-manifold? It sounds plausible. (I am thinking of how, in dynamical systems, we can go from knowing the stable/unstable/neutral directions in the neighborhood of a fixed point to the stable/unstable/neutral manifolds, extending, potentially, arbitrarily far away.) Could this actually be used, in an exercise in computational differential geometry, to calculate the sub-manifold?

Approach (2) is actually more ambitious, because implicitly it would let us accomplish (1), re-parameterization, not just locally, in the vicinity of our favored parameter value, but globally --- moves along the sub-manifold, by construction, do not affect the likelihood, while moves from one sub-manifold to another must.
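
To make the two approaches concrete, here is a minimal sketch on a deliberately silly toy model, assumed purely for illustration: each observation is Gaussian with mean `a * b` and known variance, so only the product of the two parameters can ever be learned. For a Gaussian with fixed variance the Fisher information is the Jacobian of the mean, transposed, times itself, divided by the variance, which is what `fisher_information` computes. All function names are hypothetical, the Jacobian is a finite-difference approximation, and the curve tracer assumes exactly one zero eigenvalue at every point it visits.

```python
import numpy as np

def mean_fn(theta):
    # Hypothetical toy model: each observation is N(a * b, sigma^2),
    # so the data can only ever reveal the product a * b.
    a, b = theta
    return np.array([a * b])

def jacobian(f, theta, eps=1e-6):
    # Central-difference Jacobian of f at theta, shape (n_outputs, n_params).
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((f(theta + step) - f(theta - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def fisher_information(theta, sigma=1.0):
    # For a Gaussian with mean mean_fn(theta) and fixed variance sigma^2,
    # the Fisher information per observation is J^T J / sigma^2.
    J = jacobian(mean_fn, theta)
    return J.T @ J / sigma**2

def null_directions(info, tol=1e-8):
    # Columns are eigenvectors whose eigenvalues are numerically zero.
    vals, vecs = np.linalg.eigh(info)
    return vecs[:, vals < tol * max(vals.max(), 1.0)]

def oriented_null(theta, ref):
    # Eigenvector signs are arbitrary; flip to agree with the previous step.
    v = null_directions(fisher_information(theta))[:, 0]
    return v if ref is None or v @ ref >= 0 else -v

def trace_equivalence_curve(theta0, n_steps=2000, h=1e-3):
    # Midpoint-rule integration along the zero-eigenvalue direction.
    theta = np.asarray(theta0, dtype=float)
    path, ref = [theta], None
    for _ in range(n_steps):
        k1 = oriented_null(theta, ref)
        k2 = oriented_null(theta + 0.5 * h * k1, k1)
        theta = theta + h * k2
        ref = k2
        path.append(theta)
    return np.array(path)

theta0 = np.array([2.0, 0.5])
info = fisher_information(theta0)
eigenvalues = np.linalg.eigvalsh(info)   # one (numerically) zero, one positive
null_vec = null_directions(info)[:, 0]   # locally unidentifiable direction
path = trace_equivalence_curve(theta0)
products = path[:, 0] * path[:, 1]       # should stay close to a * b = 1
```

In this toy case the traced curve is just a piece of the hyperbola `a * b = 1`, which we could have written down by hand; the point of the sketch is only that following the zero eigenvector step by step recovers it without using that knowledge. Whether the same procedure behaves well when the null space has more than one dimension, or changes dimension along the way, is exactly the open question above.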

— Since writing the first version of this, I've run across the work of Charles Manski, which is centrally concerned with "partial identification", but not quite in the sense I had in mind. Rather than reparameterizing to get some totally identifiable parameters and ignore the rest, he wants to take the parameters as given, and put bounds on the ones which can't be totally identified. This is only natural for the kinds of parameters he has in mind, like the effects of policy interventions.
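
As a rough illustration of the bounding idea, here is a sketch of the standard worst-case bounds on a mean when some outcomes are missing and nothing is assumed about why. This is a textbook example from the partial-identification literature, not something taken from this notebook; the function name, the toy data, and the assumption that outcomes are known to lie in `[y_min, y_max]` are all introduced here for illustration.

```python
import numpy as np

def worst_case_mean_bounds(y, observed, y_min=0.0, y_max=1.0):
    # Bounds on E[Y] when Y is known to lie in [y_min, y_max] and is only
    # seen for units with observed == True. No assumption is made about
    # the missing values beyond that range.
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    p_obs = observed.mean()
    mean_obs = y[observed].mean() if observed.any() else 0.0
    lower = p_obs * mean_obs + (1.0 - p_obs) * y_min
    upper = p_obs * mean_obs + (1.0 - p_obs) * y_max
    return lower, upper

# Hypothetical data: six units, outcomes missing for the last two.
y = np.array([1.0, 0.0, 1.0, 1.0, np.nan, np.nan])
observed = np.array([True, True, True, True, False, False])
lower, upper = worst_case_mean_bounds(y, observed)
```

The width of the interval is the missing fraction times the width of the outcome range, so the mean is pinned down exactly only when nothing is missing. The parameter is taken as given and bounded, rather than re-parameterized away.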

(Thanks to Gustavo Lacerda for corrections.)

Recommended:
• Charles Manski, Identification for Prediction and Decision [Review: Better Roughly Right Than Exactly Wrong]
• Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics 33 (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic" --- the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a distinct distribution of observables. (I would prefer "parameter" or "functional", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) See more under sufficiency. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
• Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", PLoS Computational Biology 3 (2007): e205 [When your model is unidentified, do an experiment]