## Partial Identification of Parametric Statistical Models

*27 Feb 2017 16:30*

A parametric statistical model is said to be "identifiable" if no two
parameter settings give rise to the same distribution of observations. This
means that there is always *some* way to test whether the parameters
take one value rather than another. If this is not the case, then the model is
said to be "unidentifiable". Sometimes models are unidentifiable because they
are bad models, specified in a stupid way which leads to redundancies.
Sometimes, however, models are unidentifiable because the data are bad --- if
you could measure certain variables, or measure them more precisely, the model
would be identifiable, but in fact you have to put up with noisy, missing,
aggregated, etc. data. (Technically: the information we get from observations
is represented by a sigma algebra, or, over time, a filtration. If two
distributions differ on the full filtration, their restrictions to some smaller
filtration might coincide.) Presumably then you could still *partially*
identify the model, up to, say, some notion of observational
equivalence. *Query*: how to make this precise?

If the distribution predicted by the model depends, in a reasonably smooth way, on the parameters, then we can form the Fisher information matrix, which is basically the negative of the expected matrix of second derivatives of the log-likelihood with respect to all the parameters. (I realize that's not a very helpful statement if you haven't at least forgotten the real definition of the Fisher information.) Suppose one or more of the eigenvalues of the Fisher information matrix is zero. Any vector orthogonal to the span of the eigenvectors corresponding to the non-zero eigenvalues then gives a linear combination of the original parameters which is unidentifiable, at least in the vicinity of the point at which you're taking derivatives. This suggests at least two avenues of approach here.
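To make the zero-eigenvalue criterion concrete, here is a minimal numerical sketch using a deliberately toy model of my own devising (not from the text above): we observe $X \sim N(a+b, 1)$, so only the sum $a+b$ affects the distribution, and the difference $a-b$ is unidentifiable.

```python
import numpy as np

# Toy model (hypothetical, for illustration): X ~ N(a + b, 1),
# parameterized redundantly by (a, b).  By the chain rule for Fisher
# information, I(a, b) = J^T I(mu) J, where mu = a + b, I(mu) = 1 for
# the mean of a unit-variance Gaussian, and J = d mu / d(a, b).
J = np.array([[1.0, 1.0]])      # Jacobian of mu with respect to (a, b)
I_mu = np.array([[1.0]])        # Fisher information in the mu parameterization
I_ab = J.T @ I_mu @ J           # Fisher information in the (a, b) parameterization

eigvals, eigvecs = np.linalg.eigh(I_ab)
print(eigvals)                  # a zero eigenvalue signals unidentifiability
null_dir = eigvecs[:, np.argmin(eigvals)]
print(null_dir)                 # proportional to (1, -1): a - b is unidentifiable
```

The eigenvector for the non-zero eigenvalue is proportional to $(1, 1)$, recovering the identifiable combination $a+b$; rotating coordinates onto these two eigenvectors is exactly the re-parameterization idea below.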

- Re-parameterization. Perform some kind of rotation of the coordinate system in parameter space, so that the parameters fall into two orthogonal groups, the identifiable and the unidentifiable, and inference for the former can proceed in total ignorance of the values of the latter.
- Equivalent models. Say (as I implied above) that two parameter
settings are observationally equivalent if they yield the same distribution over
observations. The set of all parameter values observationally equivalent to a
given value should, plausibly, form a sub-manifold of parameter space. The
zero eigenvectors of the Fisher information matrix give the directions in which
we could move in parameter space and (locally) stay on this sub-manifold. Is
this enough to actually
*define* that sub-manifold? It sounds plausible. (I am thinking of how, in dynamical systems, we can go from knowing the stable/unstable/neutral directions in the neighborhood of a fixed point to the stable/unstable/neutral manifolds, extending, potentially, arbitrarily far away.) Could this actually be used, in an exercise in computational differential geometry, to *calculate* the sub-manifold?
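As a crude sketch of that computational exercise, under a toy model of my own (hypothetical, not from the text): for $X \sim N(a+b, 1)$ the observational-equivalence classes are the lines $a+b = \text{const.}$, and we can trace one out numerically by repeatedly stepping along the zero eigenvector of the Fisher information.

```python
import numpy as np

def fisher_info(theta):
    # Toy model (hypothetical): X ~ N(a + b, 1).  By the chain rule,
    # I(a, b) = J^T I(mu) J with J = d mu / d(a, b) and I(mu) = 1.
    J = np.array([[1.0, 1.0]])
    return J.T @ J

def trace_equivalence_curve(theta0, steps=100, h=0.05, tol=1e-10):
    # Follow the zero eigenvector of the Fisher information from theta0;
    # locally, this moves along the observational-equivalence sub-manifold.
    path = [np.array(theta0, dtype=float)]
    prev_dir = None
    for _ in range(steps):
        w, v = np.linalg.eigh(fisher_info(path[-1]))
        assert w[np.argmin(w)] < tol       # check there really is a flat direction
        null_dir = v[:, np.argmin(w)]
        if prev_dir is not None and null_dir @ prev_dir < 0:
            null_dir = -null_dir           # keep a consistent orientation
        prev_dir = null_dir
        path.append(path[-1] + h * null_dir)
    return np.array(path)

path = trace_equivalence_curve([1.0, 2.0])
# The identifiable combination a + b stays (numerically) constant along
# the curve, while a - b changes: we have traced the line a + b = 3.
print(np.ptp(path[:, 0] + path[:, 1]))
```

In this linear toy case the Euler steps stay on the sub-manifold exactly; for a genuinely curved equivalence class one would want a proper ODE integrator and, presumably, some corrector step projecting back onto the zero-eigenvalue directions.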

— Since writing the first version of this, I've run across the work of
Charles Manski, which is centrally concerned with "partial identification", but
not quite in the sense I had in mind. Rather than reparameterizing to get some
totally identifiable parameters and ignore the rest, he wants to take the
parameters as given, and put *bounds* on the ones which can't be totally
identified. This is only natural for the kinds of parameters he has in mind,
like the effects of policy interventions.

(Thanks to Gustavo Lacerda for corrections.)

See also: Information Geometry; Statistics

- Recommended:
- Charles Manski, Identification for Prediction and Decision [Review: Better Roughly Right Than Exactly Wrong]
- Giorgio Picci, "Some Connections Between the Theory of Sufficient Statistics and the Identifiability Problem", SIAM Journal on Applied Mathematics **33** (1977): 383--398 [Introduces the idea of a "maximal identifiable statistic" --- the coarsest partition of hypothesis space where each equivalence class/cell of the partition gives rise to a *distinct* distribution of observables. (I would prefer "parameter" or "functional", rather than "statistic", since it's a function of the distribution, not the observables, but that's a quibble.) See more under sufficiency. JSTOR. Thanks to Rhiannon Weaver for the pointer.]
- Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", PLoS Computational Biology **3** (2007): e205 [When your model is unidentified, *do an experiment*]

- To read:
- Elizabeth S. Allman, Catherine Matias, John A. Rhodes, "Identifiability of parameters in latent structure models with many observed variables", Annals of Statistics **37** (2009): 3099--3132, arxiv:0809.5032
- David Campbell, Subhash Lele, "An ANOVA Test for Parameter Estimability using Data Cloning with Application to Statistical Inference for Dynamic Systems", arxiv:1305.3299
- Marisa C. Eisenberg, Michael A. L. Hayashi, "Determining Structurally Identifiable Parameter Combinations Using Subset Profiling", arxiv:1307.2298
- Paul Gustafson
- "On Model Expansion, Model Contraction, Identifiability and Prior Information: Two Illustrative Scenarios Involving Mismeasured Variables", Statistical
Science
**20**(2005): 111--140 [Thanks to Gustavo Lacerda for the pointer] - "On the behaviour of Bayesian credible intervals in partially identified models", Electronic Journal of Statistics
**6**(2012): 2107--2124

- "On Model Expansion, Model Contraction, Identifiability and Prior Information: Two Illustrative Scenarios Involving Mismeasured Variables", Statistical
Science
- Changsung Kang, Jin Tian, "Inequality Constraints in Causal Models with Hidden Variables", arxiv:1206.6829
- Subhash R. Lele, Khurram Nadeem and Byron Schmuland, "Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning", Journal of the American Statistical Association **105** (2010): 1617--1625
- Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, James P. Sethna, "Parameter Space Compression Underlies Emergent Theories and Predictive Models", arxiv:1303.6738
- Charles Manski, Partial Identification of Probability Distributions
- Robert Nishihara, Thomas Minka, Daniel Tarlow, "Detecting Parameter Symmetries in Probabilistic Models", arxiv:1312.5386