
Indirect Inference

17 Nov 2021 21:33

A technique of parameter estimation for simulation models. You go and build a stochastic generative model of your favorite process or assemblage, and, being a careful scientist, you do a conscientious job of trying to include what you guess are all the most important mechanisms. The result is something you can step through to produce a simulation of the process of interest. But your model contains some unknown parameters, let's say generically $\theta$, and you would like to tune those to match the data — or see if, despite your best efforts, there are aspects of the data which your model just can't match.

Very often, you will find that your model is too complicated for you to appeal to any of the usual estimation methods of statistics. Because you've been aiming for scientific adequacy rather than statistical tractability, it will often happen that there is no way to even calculate the likelihood of a given data set $x_1, x_2, \ldots, x_t \equiv x_1^t$ under parameters $\theta$ in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods, should you be into them. (For concreteness, I am writing as though the data were just a time series, possibly vector-valued, but the ideas adapt in the obvious way to spatial processes or more complicated formats.) Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data.

This is where indirect inference comes in, with what I think is a really brilliant idea. Introduce a new model, called the "auxiliary model", which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don't have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector $\beta$, with an estimator $\hat{\beta}$. These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters $\theta$ by trying to match those aspects of the observations, that is, by trying to match the auxiliary parameters.

On the one side, start with the data $x_1^t$ and get auxiliary parameter estimates $\hat{\beta}(x_1^t) \equiv \hat{\beta}_t$. On the other side, for each $\theta$ we can generate a simulated realization $\tilde{X}_1^t(\theta)$ of the same size (and shape, if applicable) as the data, leading to auxiliary estimates $\hat{\beta}(\tilde{X}_1^t(\theta)) \equiv \tilde{\beta}_t(\theta)$. The indirect inference estimate $\hat{\theta}$ is the value of $\theta$ where $\tilde{\beta}_t(\theta)$ comes closest to $\hat{\beta}_t$. More generally, we can introduce a (symmetric, positive-definite) matrix $\mathbf{W}$ and minimize the quadratic form $\left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right) \cdot \mathbf{W} \left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right)$ with the entries in the matrix chosen to give more or less relative weight to the different auxiliary parameters.
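A minimal sketch of the estimator, under a toy setup assumed purely for illustration: the "generative model" is an MA(1) process (whose likelihood is actually tractable, so it serves only as a checkable stand-in), the auxiliary model is an AR(3) fit by least squares, $\mathbf{W}$ defaults to the identity, and the minimization is a plain grid search. All function names are hypothetical.

```python
import numpy as np

def simulate_ma1(theta, shocks):
    """Stand-in generative model: MA(1), x_t = e_t + theta * e_{t-1}."""
    return shocks[1:] + theta * shocks[:-1]

def fit_ar(x, p=3):
    """Auxiliary model: AR(p) coefficients by least squares (no intercept)."""
    # Column k holds the lag-(k+1) values x_{t-k-1}, for t = p, ..., n-1
    X = np.column_stack([x[p - k - 1 : len(x) - k - 1] for k in range(p)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return beta

def indirect_inference(x, thetas, W=None, p=3, seed=0):
    """Grid-search minimizer of (beta_hat - beta_tilde(theta))' W (beta_hat - beta_tilde(theta))."""
    beta_hat = fit_ar(x, p)
    W = np.eye(p) if W is None else W
    # Common random numbers: reuse one set of shocks for every theta, so the
    # simulated series (and the objective) change smoothly as theta moves
    shocks = np.random.default_rng(seed).standard_normal(len(x) + 1)

    def objective(theta):
        d = beta_hat - fit_ar(simulate_ma1(theta, shocks), p)
        return d @ W @ d

    losses = [objective(th) for th in thetas]
    return thetas[int(np.argmin(losses))]

# Usage: "data" simulated at theta_0 = 0.5, then estimated back
rng = np.random.default_rng(42)
data = simulate_ma1(0.5, rng.standard_normal(5001))
theta_hat = indirect_inference(data, np.linspace(-0.9, 0.9, 181))
print(theta_hat)
```

The simulated series has the same length as the data, as in the description above. A real application would put the scientific simulator in place of `simulate_ma1` and a proper optimizer in place of the grid.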

The remarkable thing about this is that it works, in the sense of giving consistent parameter estimates, under not too strong conditions. Suppose that the data really are generated under some parameter value $\theta_0$; we'd like to see $\hat{\theta} \rightarrow \theta_0$. (Estimating the pseudo-truth in a mis-specified model works similarly but is more complicated than I feel like going into right now.) Sufficient conditions for this are that

1. the auxiliary estimates converge to a non-random "binding function" $\tilde{\beta}_t(\theta) \rightarrow b(\theta)$ uniformly in $\theta$, and
2. the binding function $b(\theta)$ is invertible.
(Really, both properties just need to hold in some suitable domain $\Theta$ which includes $\theta_0$.)
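To make the two conditions concrete, here is a sketch with a toy pair assumed for illustration: an MA(1) generative model with unit shock variance, and a one-parameter AR(1) auxiliary model. The binding function is approximated by fitting the auxiliary model to a single very long simulation; names are hypothetical.

```python
import numpy as np

def ar1_coef(x):
    """Auxiliary estimator: least-squares slope of x_t on x_{t-1}, no intercept."""
    return np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1])

def binding_function(theta, T=200_000, seed=1):
    """Monte Carlo approximation to b(theta): the auxiliary estimate on one very long MA(1) run."""
    e = np.random.default_rng(seed).standard_normal(T + 1)
    return ar1_coef(e[1:] + theta * e[:-1])

# For this pair the limit is known exactly, b(theta) = theta / (1 + theta^2),
# so b(theta) = b(1/theta): b is invertible on Theta = (-1, 1) but not on the whole line.
for theta in (0.5, 2.0):
    print(theta, binding_function(theta), theta / (1 + theta**2))
```

Both rows should print values close to 0.4: $\theta = 0.5$ and $\theta = 2$ cannot be told apart through this auxiliary parameter, so either the domain $\Theta$ has to be restricted (here to $|\theta| < 1$), or the auxiliary model has to be enriched; with the shock variance fixed at 1, adding the sample variance as a second auxiliary parameter would separate them.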

Basically, these mean that the set of auxiliary parameters has to be rich enough to characterize or distinguish the different values of the generative parameters, and we need to be able to consistently estimate the former. This means we need at least as many auxiliary parameters as generative ones, so auxiliary models tend to be ones where it's easy to keep loading on parameters. (Adding too many auxiliary parameters does lead to loss of efficiency, however.) If $b(\theta)$ is also differentiable in $\theta$, and some additional regularity conditions hold, then we even get asymptotic Gaussian errors, with the matrix of partial derivatives $\partial b_i/\partial \theta_j$ playing a role like the Fisher information matrix. I can't resist adding that the conditions usually quoted for the consistency of indirect inference are stronger, and that these weaker ones come from a chapter in the dissertation of my student Linqiao Zhao.
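For reference, this is the textbook form of that Gaussian limit as usually stated in the econometrics literature (e.g. Gouriéroux, Monfort and Renault, 1993), under assumptions beyond those in the text: $S$ independent simulated series, each the length of the data, whose auxiliary estimates are averaged; $B = \partial b / \partial \theta^{\top}$ evaluated at $\theta_0$, with full column rank; and $\Omega$ the asymptotic covariance matrix of $\sqrt{t}\left(\hat{\beta}_t - b(\theta_0)\right)$.

$$\sqrt{t}\left(\hat{\theta} - \theta_0\right) \rightsquigarrow \mathcal{N}\left(0,\ \left(1 + \frac{1}{S}\right)\left(B^{\top}\mathbf{W}B\right)^{-1} B^{\top}\mathbf{W}\,\Omega\,\mathbf{W}B\left(B^{\top}\mathbf{W}B\right)^{-1}\right)$$

Taking $\mathbf{W} = \Omega^{-1}$ minimizes this covariance and reduces it to $\left(1 + 1/S\right)\left(B^{\top}\Omega^{-1}B\right)^{-1}$, so it is $B^{\top}\Omega^{-1}B$ that plays the part of the Fisher information. With a single simulated series, $S = 1$ and the variance is twice what it would be if $b$ were known exactly.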

I think this is a really, really powerful idea, and one which should be much more widely adopted by people working with simulation models. In particular, one of my Cunning Plans is to make it work for agent-based modeling, and especially for models of social network formation.

A topic of particular interest to me is how to use non-parametric estimators, of regression or density curves say, as the auxiliary models, since then there is never any problem of having too few auxiliary parameters (though they might still be insensitive to the generative parameters, if one is looking at the wrong curves). Nickl and Pötscher, below, have some initial results in this direction.
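A minimal sketch of the idea with a whole curve as the auxiliary "parameter vector", assuming IID data, a hypothetical log-normal generative model $X = e^{\theta Z}$ with $Z$ standard normal, and a fixed-bandwidth Gaussian kernel density estimate evaluated on a grid of points. This is a simpler stand-in, not the sieve estimator Nickl and Pötscher analyze; function names are hypothetical.

```python
import numpy as np

def kde_on_grid(x, grid, bandwidth=0.2):
    """Auxiliary 'parameters': a Gaussian kernel density estimate evaluated at fixed grid points."""
    z = (grid[:, None] - x[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))

def simulate_lognormal(theta, z):
    """Hypothetical generative model: X = exp(theta * Z) with Z standard normal."""
    return np.exp(theta * z)

def curve_matching_estimate(x, thetas, grid, seed=0):
    """Pick the theta whose simulated density curve is closest (squared error on the grid) to the data's."""
    beta_hat = kde_on_grid(x, grid)
    z = np.random.default_rng(seed).standard_normal(len(x))  # common random numbers across theta
    losses = [np.sum((beta_hat - kde_on_grid(simulate_lognormal(th, z), grid)) ** 2) for th in thetas]
    return thetas[int(np.argmin(losses))]

rng = np.random.default_rng(7)
data = simulate_lognormal(0.6, rng.standard_normal(2000))
grid = np.linspace(0.05, 4.0, 80)
theta_hat = curve_matching_estimate(data, np.linspace(0.2, 1.2, 101), grid)
print(theta_hat)
```

The number of auxiliary parameters is just the number of grid points, which can be raised at will. Because the data and the simulations are smoothed with the same kernel, the smoothing bias hits both sides alike, and the estimate should land near 0.6.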

("Approximate Bayesian computation" is a very similar idea, but one where the plain truth of the evidence is corrupted by prejudice, er, a prior distribution is used to stabilize estimates, at some cost in sensitivity. I need to learn more about it.)
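To show where the prior enters, here is a minimal rejection-ABC sketch on a toy problem assumed for illustration: an MA(1) generative model, a Uniform(-1, 1) prior, the AR(1) coefficient as the summary statistic, and keeping the 5% of prior draws whose summaries land closest to the data's. Unlike the indirect-inference estimate, the output is a sample of accepted parameter values approximating a posterior. Names are hypothetical.

```python
import numpy as np

def ar1_coef_rows(X):
    """Summary statistic for each row of X: least-squares slope of x_t on x_{t-1}."""
    return np.sum(X[:, 1:] * X[:, :-1], axis=1) / np.sum(X[:, :-1] ** 2, axis=1)

def rejection_abc(x, n_draws=2000, keep=0.05, seed=0):
    """Draw theta from the prior, simulate, keep the draws whose summaries are nearest the data's."""
    rng = np.random.default_rng(seed)
    s_obs = ar1_coef_rows(x[None, :])[0]
    thetas = rng.uniform(-1.0, 1.0, n_draws)          # prior draws
    e = rng.standard_normal((n_draws, len(x) + 1))
    sims = e[:, 1:] + thetas[:, None] * e[:, :-1]     # one MA(1) series per draw
    dist = np.abs(ar1_coef_rows(sims) - s_obs)
    return thetas[dist <= np.quantile(dist, keep)]    # accepted draws

rng = np.random.default_rng(11)
e_obs = rng.standard_normal(1001)
data = e_obs[1:] + 0.5 * e_obs[:-1]
accepted = rejection_abc(data)
posterior_mean = accepted.mean()
print(len(accepted), posterior_mean)
```

Widening the prior or loosening `keep` trades sensitivity to the data against stability, which is the cost mentioned above.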

(I wrote the first version of this sometime before 19 September 2010...)

Recommended, close-ups:
• Ernesto Carrella, Richard M. Bailey and Jens Koed Madsen, "Indirect inference through prediction", Journal of Artificial Societies and Social Simulation 23:1 (2020): 7, arxiv:1807.01579 [The idea here is to start with a big candidate pool of auxiliary statistics, sample $\theta$ values randomly, calculate the statistics on simulations from each $\theta$, and then use a penalized linear regression to learn to predict $\theta$ as a function of the $\hat{\beta}$. (The point of the penalization is to stabilize the learned function, and ideally to do some variable selection.) I can well believe that this is computationally nicer than optimization, but it's (implicitly) learning a linear approximation to the inverse binding function, i.e., to $b^{-1}(\beta)$. Unless $b^{-1}$ really is linear, though, the best linear approximation learned by regression will change with the distribution of $\theta$. Maybe this could be iterated, to get a succession of linear approximations over successively smaller domains, hopefully shrinking towards the ideal estimate.]
• Giovanni Luca Ciampaglia, "A framework for the calibration of social simulation models", Advances in Complex Systems accepted, arxiv:1305.3842 [At last, somebody's doing this!!!]
• Niccolò Dalmasso, Rafael Izbicki, Ann B. Lee, "Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting", arxiv:2002.10399
• Jean-Jacques Forneron, Serena Ng, "The ABC of Simulation Estimation with Auxiliary Statistics", arxiv:1501.01265
• Florian Gach, Benedikt M. Pötscher, "Non-Parametric Maximum Likelihood Density Estimation and Simulation-Based Minimum Distance Estimators", arxiv:1012.3851 [See the comments on the earlier paper by Nickl and Pötscher below.]
• Bruce E. Kendall, Stephen P. Ellner, Edward McCauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch and Peter Turchin, "Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses", Ecological Monographs 75 (2005): 259--276 [I learned about indirect inference by hearing Prof. Ellner talk about this paper at the 2007 Montreal workshop on statistics for dynamical systems.]
• Richard Nickl, Benedikt M. Pötscher, "Efficient Simulation-Based Minimum Distance Estimation and Indirect Inference", arxiv:0908.0433 [Proving that by using a particular nonparametric density estimator, based on the method of sieves, as the auxiliary estimator, the indirect-inference estimator becomes as efficient, asymptotically, as maximum likelihood. This is impressive, but they have to assume the process being simulated gives IID data, and generalizing would not be easy (at least not for me).]
• Simon N. Wood, "Statistical inference for noisy nonlinear ecological dynamic systems", Nature 466 (2010): 1102--1104
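A minimal sketch of the regression idea in the Carrella, Bailey and Madsen entry, under illustrative assumptions not taken from their paper: an MA(1) simulator, a candidate pool of sample autocorrelations at lags 1 to 5, $\theta$ drawn uniformly on $(-0.8, 0.8)$, and closed-form ridge regression on standardized statistics as the penalized regression. Their own choices may differ; names are hypothetical.

```python
import numpy as np

def candidate_stats(X, max_lag=5):
    """Hypothetical candidate pool: sample autocorrelations at lags 1..max_lag, one row per series."""
    Xc = X - X.mean(axis=1, keepdims=True)
    var = np.sum(Xc**2, axis=1)
    return np.column_stack([np.sum(Xc[:, k:] * Xc[:, :-k], axis=1) / var
                            for k in range(1, max_lag + 1)])

def fit_ridge(S, theta, lam=1.0):
    """Ridge regression of theta on standardized statistics; returns a prediction function."""
    mu_s, sd, mu_t = S.mean(axis=0), S.std(axis=0), theta.mean()
    Z = (S - mu_s) / sd
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ (theta - mu_t))
    return lambda s: mu_t + ((s - mu_s) / sd) @ w

rng = np.random.default_rng(3)
T, n_train = 1000, 2000
train_theta = rng.uniform(-0.8, 0.8, n_train)          # sampled generative parameters
e = rng.standard_normal((n_train, T + 1))
predict = fit_ridge(candidate_stats(e[:, 1:] + train_theta[:, None] * e[:, :-1]), train_theta)

e_obs = rng.standard_normal((1, T + 1))
x_obs = e_obs[:, 1:] + 0.5 * e_obs[:, :-1]             # "data" at theta_0 = 0.5
theta_hat = predict(candidate_stats(x_obs))[0]
print(theta_hat)
```

Since the lag-1 autocorrelation of an MA(1) is $\theta/(1+\theta^2)$, the inverse binding function here is curved, and one linear fit over $(-0.8, 0.8)$ can only approximate it; refitting with $\theta$ drawn from a narrower range around a first estimate is one way to try the iteration suggested in the annotation.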
Recommended, close-ups on ABC:
• Chris P. Barnes, Sarah Filippi, Michael P.H. Stumpf, Thomas Thorne, "Considerate Approaches to Achieving Sufficiency for ABC model selection", Statistics and Computing 22 (2012): 1181--1197, arxiv:1106.6281 [Given a large candidate set of summary statistics, this uses an information-theoretic characterization of sufficiency to efficiently search for a subset of statistics which is approximately sufficient.]
• Kyle Cranmer, Johann Brehmer, and Gilles Louppe, "The frontier of simulation-based inference", Proceedings of the National Academy of Sciences (USA) 117 (2020): 30055--30062, arxiv:1911.01429 [I find it a bit remarkable that this paper completely ignored indirect inference, simulated moments, etc., but it's good on what it does cover]
• Paul Fearnhead, Dennis Prangle, "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation", Journal of the Royal Statistical Society B 74 (2012): 419--474 [If we knew the posterior means of the parameters as functions of the data, those functions would make excellent summary statistics. Of course the whole point of ABC is to find the posterior distribution, so this seems like a vicious circle. They offer a clever way to break the circle by beginning with bad summary statistics and then trying to learn the posterior expectation functions on the basis of the initial approximation to the posterior distribution. Not circular, but not exactly easy.]
• David T. Frazier, Gael M. Martin, Christian P. Robert and Judith Rousseau, "Asymptotic Properties of Approximate Bayesian Computation", Biometrika 105 (2018): 593--607, arxiv:1607.06903
• J.-M. Marin, N. Pillai, C. P. Robert, J. Rousseau, "Relevant statistics for Bayesian model choice", arxiv:1110.4700
• Dennis Prangle, Paul Fearnhead, Murray P. Cox, Patrick J. Biggs, Nigel P. French, "Semi-automatic selection of summary statistics for ABC model choice", arxiv:1302.5624
• Michael Vespe, "The potential of likelihood-free inference of cosmological parameters with weak lensing data", Proceedings of the International Astronomical Union 10:S306 (2015): 90--93
Modesty forbids me to recommend:
• CRS, "A Note on Simulation-Based Inference by Matching Random Features", arxiv:2111.09220
Pride compels me to recommend:
• Linqiao Zhao, A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure, Ph.D. thesis, Statistics Department, Carnegie Mellon University, 2010
To read:
• Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
• Michael G. B. Blum, "Approximate Bayesian Computation: A Nonparametric Perspective", Journal of the American Statistical Association 105 (2010): 1178--1187, arxiv:0904.0635
• M. G. B. Blum, M. A. Nunes, D. Prangle, and S. A. Sisson, "A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation", Statistical Science 28 (2013): 189--208
• Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics 3 (2009): 319--348, arxiv:0802.0021
• Marianne Bruins, James A. Duffy, Michael P. Keane, Anthony A. Smith Jr, "Generalized Indirect Inference for Discrete Choice Models", arxiv:1507.06115
• D. R. Cox and Christiana Kartsonaki, "The fitting of complex parametric models", Biometrika 99 (2012): 741--747
• Veronika Czellar and Elvezio Ronchetti, "Accurate and robust tests for indirect inference", Biometrika 97 (2010): 621--630
• Pierre Del Moral, Arnaud Doucet and Ajay Jasra, "An Adaptive Sequential Monte Carlo Method for Approximate Bayesian Computation" [preprint]
• Christopher C. Drovandi, Anthony N. Pettitt, Malcolm J. Faddy, "Approximate Bayesian computation using indirect inference", Journal of the Royal Statistical Society C 60 (2011): 317--337
• Christopher C. Drovandi, Anthony N. Pettitt, and Anthony Lee, "Bayesian Indirect Inference Using a Parametric Auxiliary Model", Statistical Science 30 (2015): 72--95
• Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade, "Playing Russian Roulette with Intractable Likelihoods", arxiv:1306.4032
• Aude Grelaud, Christian Robert, Jean-Michel Marin, Francois Rodolphe, Jean-Francois Taly, "ABC likelihood-free methods for model choice in Gibbs random fields", arxiv:0807.2767
• Ajay Jasra, Nikolas Kantas, Elena Ehrlich, "Approximate Inference for Observation Driven Time Series Models with Intractable Likelihoods", arxiv:1303.7318
• Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, Robin Ryder, "Approximate Bayesian Computational methods", arxiv:1101.0955
• Umberto Picchini, "Inference for SDE models via Approximate Bayesian Computation", arxiv:1204.5459
• Oliver Ratmann, Anton Camacho, Adam Meijer, Gé Donker, "Statistical modelling of summary values leads to accurate Approximate Bayesian Computations", arxiv:1305.4283 [Sounds a bit like what Wood does in his Nature paper]
• Oliver Ratmann, Pierre Pudlo, Sylvia Richardson, Christian Robert, "Monte Carlo algorithms for model assessment via conflicting summaries", arxiv:1106.5919
• F. J. Rubio, Adam M. Johansen, "A Simple Approach to Maximum Intractable Likelihood Estimation", Electronic Journal of Statistics 7 (2013): 1632--1654, arxiv:1301.0463
• Guosheng Yin, Yanyuan Ma, Faming Liang, and Ying Yuan, "Stochastic Generalized Method of Moments", Journal of Computational and Graphical Statistics forthcoming (2011) [Fast stochastic optimization for GMM --- applicable to II as well?]

Previous versions: 2010-09-19 21:17