Notebooks

## Indirect Inference

08 Nov 2018 13:41

A technique of parameter estimation for simulation models. You go and build a stochastic generative model of your favorite process or assemblage, and, being a careful scientist, you do a conscientious job of trying to include what you guess are all the most important mechanisms. The result is something you can step through to produce a simulation of the process of interest. But your model contains some unknown parameters, let's say generically $\theta$, and you would like to tune those to match the data — or see if, despite your best efforts, there are aspects of the data which your model just can't match.

Very often, you will find that your model is too complicated for you to appeal to any of the usual estimation methods of statistics. Because you've been aiming for scientific adequacy rather than statistical tractability, it will often happen that there is no way to even calculate the likelihood of a given data set $x_1, x_2, \ldots x_t \equiv x_1^t$ under parameters $\theta$ in closed form, which would rule out even numerical likelihood maximization, to say nothing of Bayesian methods, should you be into them. (For concreteness, I am writing as though the data were just a time series, possibly vector-valued, but the ideas adapt in the obvious way to spatial processes or more complicated formats.) Yet you can simulate; it seems like there should be some way of saying whether the simulations look like the data.

This is where indirect inference comes in, with what I think is a really brilliant idea. Introduce a new model, called the "auxiliary model", which is mis-specified and typically not even generative, but is easily fit to the data, and to the data alone. (By that last I mean that you don't have to impute values for latent variables, etc., etc., even though you might know those variables exist and are causally important.) The auxiliary model has its own parameter vector $\beta$, with an estimator $\hat{\beta}$. These parameters describe aspects of the distribution of observables, and the idea of indirect inference is that we can estimate the generative parameters $\theta$ by trying to match those aspects of observations, by trying to match the auxiliary parameters.

On the one side, start with the data $x_1^t$ and get auxiliary parameter estimates $\hat{\beta}(x_1^t) \equiv \hat{\beta}_t$. On the other side, for each $\theta$ we can generate a simulated realization $\tilde{X}_1^t(\theta)$ of the same size (and shape, if applicable) as the data, leading to auxiliary estimates $\hat{\beta}(\tilde{X}_1^t(\theta)) \equiv \tilde{\beta}_t(\theta)$. The indirect inference estimate $\hat{\theta}$ is the value of $\theta$ where $\tilde{\beta}_t(\theta)$ comes closest to $\hat{\beta}_t$. More generally, we can introduce a (symmetric, positive-definite) matrix $\mathbf{W}$ and minimize the quadratic form $\left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right) \cdot \mathbf{W} \left(\hat{\beta}_t - \tilde{\beta}_t(\theta)\right)$ with the entries in the matrix chosen to give more or less relative weight to the different auxiliary parameters.
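The procedure above can be sketched in a few lines of code. This is a minimal illustration, with every modeling choice mine rather than the text's: the "generative" model is an MA(1) process (whose likelihood is in fact tractable, but it keeps the example short), the auxiliary model is an AR(3) fit by least squares, $\mathbf{W}$ is the identity, and the minimization is a crude grid search.

```python
# Indirect inference sketch: estimate the MA(1) coefficient theta by matching
# least-squares AR(3) coefficients between the data and simulations.
import numpy as np

def simulate_ma1(theta, n, rng):
    """Generate one realization of x_t = e_t + theta * e_{t-1}."""
    e = rng.standard_normal(n + 1)
    return e[1:] + theta * e[:-1]

def fit_ar(x, p=3):
    """Auxiliary estimator: least-squares AR(p) coefficients."""
    X = np.column_stack([x[p - j - 1 : len(x) - j - 1] for j in range(p)])
    beta, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return beta

def ii_objective(theta, beta_hat, n, rng, W=None):
    """Quadratic distance between data and simulated auxiliary estimates."""
    d = beta_hat - fit_ar(simulate_ma1(theta, n, rng))
    W = np.eye(len(d)) if W is None else W
    return d @ W @ d

# "Data" generated under theta_0 = 0.5.
rng = np.random.default_rng(0)
n = 5000
x = simulate_ma1(0.5, n, rng)
beta_hat = fit_ar(x)

# Grid search; re-seeding the simulator at each theta keeps the
# objective a smooth function of theta (common random numbers).
grid = np.linspace(-0.9, 0.9, 181)
vals = [ii_objective(th, beta_hat, n, np.random.default_rng(1)) for th in grid]
theta_hat = grid[int(np.argmin(vals))]
```

Restricting the grid to $(-0.9, 0.9)$ quietly handles the usual MA(1) identification problem ($\theta$ and $1/\theta$ give the same autocorrelations); in a real problem the invertibility of $\beta(\theta)$ deserves explicit attention.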

The remarkable thing about this is that it works, in the sense of giving consistent parameter estimates, under not too strong conditions. Suppose that the data really are generated under some parameter value $\theta_0$; we'd like to see $\hat{\theta} \rightarrow \theta_0$. (Estimating the pseudo-truth in a mis-specified model works similarly but is more complicated than I feel like going into right now.) Sufficient conditions for this are that

1. the auxiliary estimates converge $\tilde{\beta}_t(\theta) \rightarrow \beta(\theta)$ uniformly in $\theta$, and
2. the function $\beta(\theta)$ is invertible.
(Really, both properties just need to hold in some suitable domain $\Theta$ which includes $\theta_0$.)

Basically, these mean that the set of auxiliary parameters has to be rich enough to characterize or distinguish the different values of the generative parameters, and we need to be able to consistently estimate the former. This means we need at least as many auxiliary parameters as generative ones, so auxiliary models tend to be ones where it's easy to keep loading on parameters. (Adding too many auxiliary parameters does lead to loss of efficiency, however.) If $\beta(\theta)$ is also differentiable in $\theta$, and some additional regularity conditions hold, then we even get asymptotic Gaussian errors, with the matrix of partial derivatives $\partial \beta_i/\partial \theta_j$ playing a role like the Fisher information matrix. — I can't resist adding that the usual conditions quoted for the consistency of indirect inference are stronger, and that these come from a chapter in the dissertation of my student Linqiao Zhao.

I think this is a really, really powerful idea, and one which should be much more widely adopted by people working with simulation models. In particular, one of my Cunning Plans is to make it work for agent-based modeling, and especially for models of social network formation.

A topic of particular interest to me is how to use non-parametric estimators, of regression or density curves say, as the auxiliary models, since then there is never any problem of having too few auxiliary parameters (though they might still be insensitive to the generative parameters, if one is looking at the wrong curves). Nickl and Pötscher, below, have some initial results in this direction.
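A sketch of what this might look like, with assumptions entirely my own: the auxiliary "parameter vector" is a Gaussian kernel density estimate evaluated on a fixed grid, the generative model is simply $N(\theta, 1)$, and the distance is the squared $L^2$ discrepancy between the data's curve and the simulation's curve.

```python
# Nonparametric auxiliary model: match kernel density estimates on a grid.
import numpy as np

def kde_on_grid(x, grid, h=0.3):
    """Gaussian kernel density estimate evaluated at the grid points."""
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def curve_distance(theta, data_curve, grid, n, seed):
    """Squared L2 distance between data and simulated density curves."""
    rng = np.random.default_rng(seed)        # common random numbers across theta
    sim = rng.standard_normal(n) + theta     # hypothetical model: N(theta, 1)
    return np.sum((data_curve - kde_on_grid(sim, grid)) ** 2)

rng = np.random.default_rng(0)
data = rng.standard_normal(2000) + 1.0       # "data" with theta_0 = 1
grid = np.linspace(-3, 5, 81)
data_curve = kde_on_grid(data, grid)

thetas = np.linspace(-2, 3, 101)
theta_hat = thetas[int(np.argmin(
    [curve_distance(t, data_curve, grid, 2000, seed=1) for t in thetas]))]
```

Here the 81 grid values play the role of $\hat{\beta}_t$, so there is no shortage of auxiliary parameters; whether the chosen curve is actually *sensitive* to $\theta$ is the real question.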

("Approximate Bayesian computation" is a very similar idea, but where, the plain truth of the evidence being corrupted by prejudice, a prior distribution is used to stabilize estimates, at some cost in sensitivity. I need to learn more about it.)
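For comparison, here is a minimal rejection-sampling sketch of ABC, again under assumptions of my own choosing: normal data with unknown mean, a flat prior on $[-5, 5]$, and the sample mean as the (here sufficient) summary statistic.

```python
# Rejection ABC: keep prior draws whose simulated summary lands near the
# observed summary; the accepted draws approximate the posterior.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(200) + 1.5        # "data" with true mean 1.5
s_obs = data.mean()                           # observed summary statistic
eps = 0.05                                    # tolerance on the summary

accepted = []
for _ in range(20000):
    theta = rng.uniform(-5, 5)                # draw from the prior
    sim = rng.standard_normal(200) + theta    # simulate a data set of same size
    if abs(sim.mean() - s_obs) < eps:         # accept if summaries are close
        accepted.append(theta)

posterior_mean = np.mean(accepted)
```

The family resemblance to indirect inference is plain: both compare data and simulations only through low-dimensional statistics, with the auxiliary parameters of indirect inference playing the role of ABC's summary statistics.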

Recommended, close-ups:
• Ernesto Carrella, Richard M. Bailey and Jens Koed Madsen, arxiv:1807.01579
• Bruce E. Kendall, Stephen P. Ellner, Edward McCauley, Simon N. Wood, Cheryl J. Briggs, William W. Murdoch and Peter Turchin "Population Cycles in the Pine Looper Moth: Dynamical Tests of Mechanistic Hypotheses", Ecological Monographs 75 (2005): 259--276 [PDF reprint. I learned about indirect inference by hearing Prof. Ellner talk about this paper at the 2007 Montreal workshop on statistics for dynamical systems.]
• Richard Nickl, Benedikt M. Pötscher, "Efficient Simulation-Based Minimum Distance Estimation and Indirect Inference", arxiv:0908.0433
• Simon N. Wood, "Statistical inference for noisy nonlinear ecological dynamic systems", Nature 466 (2010): 1102--1104
Pride compels me to recommend:
• Linqiao Zhao, A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure, Ph.D. thesis, Statistics Department, Carnegie Mellon University, 2010 [PDF]
• Chris P. Barnes, Sarah Filippi, Michael P.H. Stumpf, Thomas Thorne, "Considerate Approaches to Achieving Sufficiency for ABC model selection", Statistics and Computing 22 (2012): 1181--1197, arxiv:1106.6281
• Johanna Bertl, Gregory Ewing, Carolin Kosiol, Andreas Futschik, "Approximate Maximum Likelihood Estimation", arxiv:1507.04553
• Michael G. B. Blum, "Approximate Bayesian Computation: A Nonparametric Perspective", Journal of the American Statistical Association 105 (2010): 1178--1187
• M. G. B. Blum, M. A. Nunes, D. Prangle, and S. A. Sisson, "A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation", Statistical Science 28 (2013): 189--208
• Carles Bretó, Daihai He, Edward L. Ionides, Aaron A. King, "Time series analysis via mechanistic models", Annals of Applied Statistics 3 (2009): 319--348, arxiv:0802.0021
• Marianne Bruins, James A. Duffy, Michael P. Keane, Anthony A. Smith Jr, "Generalized Indirect Inference for Discrete Choice Models", arxiv:1507.06115
• Giovanni Luca Ciampaglia, "A framework for the calibration of social simulation models", Advances in Complex Systems accepted, arxiv:1305.3842 [At last, somebody's doing this!!!]
• D. R. Cox and Christiana Kartsonaki, "The fitting of complex parametric models", Biometrika 99 (2012): 741--747
• Veronika Czellar and Elvezio Ronchetti, "Accurate and robust tests for indirect inference", Biometrika 97 (2010): 621--630
• Pierre Del Moral, Arnaud Doucet and Ajay Jasra, "An Adaptive Sequential Monte Carlo Method for Approximate Bayesian Computation" [PDF preprint]
• Christopher C. Drovandi, Anthony N. Pettitt, Malcolm J. Faddy, "Approximate Bayesian computation using indirect inference", Journal of the Royal Statistical Society C 60 (2011): 317--337
• Christopher C. Drovandi, Anthony N. Pettitt, and Anthony Lee, "Bayesian Indirect Inference Using a Parametric Auxiliary Model", Statistical Science 30 (2015): 72--95
• Paul Fearnhead, Dennis Prangle, "Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation", Journal of the Royal Statistical Society B 74 (2012): 419--474
• Jean-Jacques Forneron, Serena Ng, "The ABC of Simulation Estimation with Auxiliary Statistics", arxiv:1501.01265
• Florian Gach, Benedikt M. Pötscher, "Non-Parametric Maximum Likelihood Density Estimation and Simulation-Based Minimum Distance Estimators", arxiv:1012.3851
• Mark Girolami, Anne-Marie Lyne, Heiko Strathmann, Daniel Simpson, Yves Atchade, "Playing Russian Roulette with Intractable Likelihoods", arxiv:1306.4032
• Aude Grelaud, Christian Robert, Jean-Michel Marin, Francois Rodolphe, Jean-Francois Taly, "ABC likelihood-free methods for model choice in Gibbs random fields", arxiv:0807.2767
• Ajay Jasra, Nikolas Kantas, Elena Ehrlich, "Approximate Inference for Observation Driven Time Series Models with Intractable Likelihoods", arxiv:1303.7318
• J.-M. Marin, N. Pillai, C. P. Robert, J. Rousseau, "Relevant statistics for Bayesian model choice", arxiv:1110.4700
• Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, Robin Ryder, "Approximate Bayesian Computational methods", arxiv:1101.0955
• Umberto Picchini, "Inference for SDE models via Approximate Bayesian Computation", arxiv:1204.5459
• Dennis Prangle, Paul Fearnhead, Murray P. Cox, Patrick J. Biggs, Nigel P. French, "Semi-automatic selection of summary statistics for ABC model choice", arxiv:1302.5624
• Oliver Ratmann, Anton Camacho, Adam Meijer, Gé Donker, "Statistical modelling of summary values leads to accurate Approximate Bayesian Computations", arxiv:1305.4283 [Sounds a bit like what Wood does in his Nature paper]
• Oliver Ratmann, Pierre Pudlo, Sylvia Richardson, Christian Robert, "Monte Carlo algorithms for model assessment via conflicting summaries", arxiv:1106.5919
• F. J. Rubio, Adam M. Johansen, "A Simple Approach to Maximum Intractable Likelihood Estimation", Electronic Journal of Statistics 7 (2013): 1632--1654, arxiv:1301.0463
• Guosheng Yin, Yanyuan Ma, Faming Liang, and Ying Yuan, "Stochastic Generalized Method of Moments", Journal of Computational and Graphical Statistics forthcoming (2011) [Fast stochastic optimization for GMM --- applicable to II as well?]

Previous versions: 2010-09-19 21:17