Cosma's Notebooks
http://bactra.org/notebooks
Partial Identification of Parametric Statistical Models
http://bactra.org/notebooks/2022/06/15#partial-identification
<P>A parametric statistical model is said to be "identifiable" if no two
parameter settings give rise to the same distribution of observations. This
means that there is always <em>some</em> way to test whether the parameters
take one value rather than another. If this is not the case, then the model is
said to be "unidentifiable". Sometimes models are unidentifiable because they
are bad models, specified in a stupid way which leads to redundancies.
Sometimes, however, models are unidentifiable because the data are bad --- if
you could measure certain variables, or measure them more precisely, the model
would be identifiable, but in fact you have to put up with noisy, missing,
aggregated, etc. data. (Technically: the information we get from observations
is represented by a sigma algebra, or, over time, a filtration. If two
distributions differ on the full filtration, their restrictions to some smaller
filtration might coincide.) Presumably then you could still <em>partially</em>
identify the model, up to, say, some notion of observational
equivalence. <em>Query</em>: how to make this precise?
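<P>A toy case may help fix the vocabulary. The notation here (\(\theta\), \(P_\theta\), \(\varphi\)) is introduced only for illustration, and this is a sketch of one candidate formalization, not an answer to the query. Take two independent Gaussian variables whose sum is all we get to observe, i.e., aggregated data:

```latex
\[
X_1 \sim N(\mu_1, 1), \qquad X_2 \sim N(\mu_2, 1), \qquad
Y = X_1 + X_2 \sim N(\mu_1 + \mu_2,\, 2),
\qquad \theta = (\mu_1, \mu_2).
\]
% Observing (X_1, X_2): distinct \theta give distinct distributions, so \theta is identifiable.
% Observing only Y: write P_\theta for the distribution of Y. Then
\[
\theta \sim \theta' \iff P_\theta = P_{\theta'} \iff \mu_1 + \mu_2 = \mu_1' + \mu_2' .
\]
% The equivalence classes are the lines \mu_1 + \mu_2 = c, and the function
\[
\varphi(\theta) = \mu_1 + \mu_2
\]
% takes a distinct value on each class, so \varphi is identified even though \theta is not.
```

<P>In this example "partially identifying" the model would mean identifying the equivalence class of \(\theta\), or equivalently the value of \(\varphi(\theta)\).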
<P>If the distribution predicted by the model depends, in a reasonably smooth
way, on the parameters, then we can form the Fisher information matrix, which
is basically minus the matrix of expected second derivatives of the log-likelihood with
respect to all the parameters. (I realize that's not a very helpful statement
if you haven't at least forgotten the real definition of the Fisher
information.) Suppose one or more of the eigenvalues of the Fisher information
matrix is zero. Any vector orthogonal to the span of the eigenvectors
corresponding to the non-zero eigenvalues then gives a linear combination of
the original parameters which is unidentifiable, at least in the vicinity of
the point at which you're taking derivatives. This suggests at least two
avenues of approach here.
<ol>
<li> Re-parameterization. Perform some kind of rotation of the
coordinate system in parameter space so that the parameters split cleanly into
two orthogonal groups, the identifiable and the non-identifiable, and inference
for the former can proceed in total ignorance of the values of the latter.
<li> Equivalent models. Say (as I implied above) that two parameter
settings are observationally equivalent if they yield the same distribution over
observations. The set of all parameter values observationally equivalent to a
given value should, plausibly, form a sub-manifold of parameter space. The
eigenvectors of the Fisher information matrix with zero eigenvalue give the directions in which
we could move in parameter space and (locally) stay on this sub-manifold. Is
this enough to actually <em>define</em> that sub-manifold? It sounds
plausible. (I am thinking of how, in dynamical systems, we can go from knowing
the stable/unstable/neutral directions in the neighborhood of a fixed point to
the stable/unstable/neutral manifolds, extending, potentially, arbitrarily far
away.) Could this actually be used, in an exercise in computational
differential geometry, to <em>calculate</em> the sub-manifold?
</ol>
(2) is actually more ambitious, because implicitly it would let us accomplish
(1), re-parameterization, not just locally, in the vicinity of our favored
parameter value, but globally --- moves along the sub-manifold, by
construction, do not affect the likelihood, while moves from one sub-manifold
to another must.
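<P>A minimal numerical sketch of approach (2), under loudly stated assumptions. The toy model (one observation Y ~ N(a*b, 1), so only the product a*b matters) and all the function names are hypothetical, chosen for illustration. For Gaussian observations with known variance the Fisher information is J<sup>T</sup>J/sigma<sup>2</sup>, where J is the Jacobian of the mean with respect to the parameters. The code finds the zero-eigenvalue direction and follows it with Runge-Kutta steps. It does not settle whether this works in general.

```python
import numpy as np


def mean_fn(theta):
    """Hypothetical toy model: Y ~ N(a * b, 1); only the product a*b is identifiable."""
    a, b = theta
    return np.array([a * b])


def fisher_information(theta, sigma=1.0, eps=1e-6):
    """Fisher information for Gaussian noise with known sd: J^T J / sigma^2.

    J is the Jacobian of mean_fn, estimated by central finite differences.
    """
    theta = np.asarray(theta, dtype=float)
    cols = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        cols.append((mean_fn(theta + step) - mean_fn(theta - step)) / (2 * eps))
    jac = np.stack(cols, axis=1)  # shape (n_outputs, n_params)
    return jac.T @ jac / sigma**2


def null_directions(theta, tol=1e-8):
    """Columns are eigenvectors whose eigenvalue is numerically zero."""
    evals, evecs = np.linalg.eigh(fisher_information(theta))
    return evecs[:, evals < tol * max(evals.max(), 1.0)]


def trace_equivalence_curve(theta0, step=0.01, n_steps=200):
    """Follow the (assumed one-dimensional) null direction from theta0 using RK4."""
    theta = np.asarray(theta0, dtype=float)

    def direction(th, ref):
        v = null_directions(th)[:, 0]
        # Eigenvectors have arbitrary sign; keep moving the same way as before.
        if ref is not None and v @ ref < 0:
            v = -v
        return v

    prev = None
    path = [theta.copy()]
    for _ in range(n_steps):
        k1 = direction(theta, prev)
        k2 = direction(theta + 0.5 * step * k1, k1)
        k3 = direction(theta + 0.5 * step * k2, k1)
        k4 = direction(theta + step * k3, k1)
        theta = theta + step * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        prev = k1
        path.append(theta.copy())
    return np.array(path)


path = trace_equivalence_curve([2.0, 1.0])
products = path[:, 0] * path[:, 1]  # should stay (nearly) constant along the curve
print(products.min(), products.max())
```

<P>In this toy case the traced curve is a piece of the hyperbola a*b = constant, so the product is the identifiable coordinate and position along the hyperbola is the non-identifiable one. Which way along the curve the integration runs depends on the sign the eigen-solver happens to return at the first step.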
<P>--- Since writing the first version of this, I've run across the work of
Charles Manski, which is centrally concerned with "partial identification", but
not quite in the sense I had in mind. Rather than reparameterizing to get some
totally identifiable parameters and ignore the rest, he wants to take the
parameters as given, and put <em>bounds</em> on the ones which can't be totally
identified. This is only natural for the kinds of parameters he has in mind,
like the effects of policy interventions.
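<P>To give the flavor of such bounds, here is a standard textbook-style illustration: worst-case bounds on the mean of a bounded outcome when some outcomes are missing. It is a sketch, not a reconstruction of any particular analysis of Manski's. The function name, the data, and the assumption that the outcome lies in a known interval [y_min, y_max] are all made up for the example. Nothing is assumed about why values are missing, so the missing outcomes could all sit at either end of the interval.

```python
import numpy as np


def worst_case_mean_bounds(y, observed, y_min, y_max):
    """Sample analogue of worst-case bounds on E[Y] when some outcomes are missing.

    E[Y] = P(obs) * E[Y | obs] + P(miss) * E[Y | miss], and the last term is
    only known to lie in [y_min, y_max] (an assumption of this illustration).
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    p_obs = observed.mean()
    mean_obs = y[observed].mean() if observed.any() else 0.0
    lower = p_obs * mean_obs + (1 - p_obs) * y_min
    upper = p_obs * mean_obs + (1 - p_obs) * y_max
    return lower, upper


# Hypothetical binary outcomes, with NaN marking a missing observation.
y = np.array([1.0, 0.0, 1.0, 1.0, np.nan, np.nan, 0.0, np.nan])
observed = ~np.isnan(y)
lower, upper = worst_case_mean_bounds(y, observed, y_min=0.0, y_max=1.0)
print(lower, upper)
```

<P>The width of the interval equals the fraction of missing observations times (y_max - y_min), so no amount of additional data of the same kind shrinks it to a point. The parameter is bounded rather than identified.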
<P>(Thanks to Gustavo Lacerda for corrections.)
<P>See also:
<a href="info-geo.html">Information Geometry</a>;
<a href="statistics.html">Statistics</a>
<ul>Recommended, big picture:
<li>Charles Manski, <cite>Identification for Prediction and
Decision</cite> [Review: <a href="../reviews/manski-on-identification/">Better
Roughly Right Than Exactly Wrong</a>]
<li>Giorgio Picci, "Some Connections Between the Theory of Sufficient
Statistics and the Identifiability Problem", <cite>SIAM Journal on Applied
Mathematics</cite> <strong>33</strong> (1977): 383--398 [Introduces the idea of
a "maximal identifiable statistic" --- the coarsest partition of hypothesis
space where each equivalence class/cell of the partition gives rise to
a <em>distinct</em> distribution of observables. (I would prefer "parameter"
or "functional", rather than "statistic", since it's a function of the
distribution, not the observables, but that's a quibble.) See more
under <a href="sufficient-statistics.html">sufficiency</a>. <a href="http://www.jstor.org/stable/2100699">JSTOR</a>.
Thanks to Rhiannon Weaver for the pointer.]
</ul>
<ul>Recommended, close-ups:
<li>Omar Melikechi, Alexander L. Young, Tao Tang, Trevor Bowman, David Dunson, James Johndrow, "Limits of epidemic prediction using SIR models", <a href="http://arxiv.org/abs/2112.07039">arxiv:2112.07039</a> [<a href="https://pinboard.in/u:cshalizi/b:74293ce53c25">My comments</a>]
<li>Sven Zenker, Jonathan Rubin, Gilles Clermont, "From Inverse Problems in Mathematical Physiology to Quantitative Differential Diagnoses", <a href="http://dx.doi.org/10.1371/journal.pcbi.0030204"><cite>PLoS Computational Biology</cite> <strong>3</strong> (2007): e204</a> [When your model is unidentified, <em>do an experiment</em>]
</ul>
<ul>To read:
<li>Elizabeth S. Allman, Catherine Matias, John A. Rhodes, "Identifiability of parameters in latent structure models with many observed variables",
<cite>Annals of Statistics</cite> <strong>37</strong> (2009): 3099--3132,
<a href="http://arxiv.org/abs/0809.5032">arxiv:0809.5032</a>
<li>David Campbell, Subhash Lele, "An ANOVA Test for Parameter Estimability using Data Cloning with Application to Statistical Inference for Dynamic Systems", <a href="http://arxiv.org/abs/1305.3299">arxiv:1305.3299</a>
<li>Marisa C. Eisenberg, Michael A. L. Hayashi, "Determining Structurally Identifiable Parameter Combinations Using Subset Profiling", <a href="http://arxiv.org/abs/1307.2298">arxiv:1307.2298</a>
<li>Paul Gustafson
<ul>
<li>"On Model Expansion, Model Contraction, Identifiability and Prior Information: Two Illustrative Scenarios Involving Mismeasured Variables", <a href="http://projecteuclid.org/euclid.ss/1121347636"><cite>Statistical
Science</cite> <strong>20</strong> (2005): 111--140</a> [Thanks to Gustavo Lacerda for the pointer]
<li>"On the behaviour of Bayesian credible intervals in partially identified models", <a href="http://dx.doi.org/10.1214/12-EJS741"><cite>Electronic Journal of Statistics</cite> <strong>6</strong> (2012): 2107--2124</a>
</ul>
<li>Changsung Kang, Jin Tian, "Inequality Constraints in Causal Models with Hidden Variables", <a href="http://arxiv.org/abs/1206.6829">arxiv:1206.6829</a>
<li>Subhash R. Lele, Khurram Nadeem and Byron Schmuland, "Estimability and Likelihood Inference for Generalized Linear Mixed Models Using Data Cloning",
<a href="http://dx.doi.org/10.1198/jasa.2010.tm09757"><cite>Journal of the American Statistical Association</cite> <strong>105</strong> (2010): 1617--1625</a>
<li>Benjamin B. Machta, Ricky Chachra, Mark K. Transtrum, James P. Sethna, "Parameter Space Compression Underlies Emergent Theories and Predictive Models", <a href="http://arxiv.org/abs/1303.6738">arxiv:1303.6738</a>
<li>Charles Manski, <cite>Partial Identification of Probability
Distributions</cite>
<li>Robert Nishihara, Thomas Minka, Daniel Tarlow, "Detecting Parameter Symmetries in Probabilistic Models", <a href="http://arxiv.org/abs/1312.5386">arxiv:1312.5386</a>
</ul>