Indirect inference is, I think, a really important methodological advance, one which opens the door to doing a lot of useful statistics on models of complex systems. However, Gouriéroux and Monfort write for a reader who is very familiar with theoretical statistics, in particular with concepts such as the likelihood and maximum likelihood estimation, Fisher information, the score, consistency and efficiency, and so forth, though no measure theory. (Say, Wasserman's All of Statistics.) No special knowledge of econometrics is really needed, though the last three chapters may seem under-motivated to those not committed to standard econometric models. All this being the case, in the rest of my review I will presume the reader has at least some recollection of the basic ideas of probability, expectation, etc.
Let me start by giving some concrete examples of what I mean by "what the model predicts for different parameters". Typically, predictions will depend not just on the parameters, $ \theta $, but also on some external or "exogenous" variables, $ z $, which the model doesn't attempt to predict. Different methods of estimation can then be based on different predictions about the "endogenous" variables $ y $.
In the "generalized method of moments", one picks a number of functions of the data y and the exogenous variables, say $ K_i(y,z) $, with i here just being an index for these "generalized moments". One would then calculate both the expected or predicted value of the moments (a function of the parameter) \[ \mathbf{E}_{\theta}[K_i(y,z)] \equiv k_i(\theta,z) \] and the empirical or realized value of the moments (a function of the data) \[ \frac{1}{T}\sum_{t}{K_i(y_t,z)} \equiv \hat{k}_i(z) \] with the sum running over all the data points. One's guess for the parameter, $ \hat{\theta}_{GMM} $, is the value of $ \theta $ which makes the expectations as close to the realization as possible. Provided some law of large numbers or ergodic theorem holds, \[ \hat{k}_i(z) \rightarrow k_i(\theta_0,z) \] where $ \theta_0 $ is the true parameter value, so the estimator is "consistent", i.e., \[ \hat{\theta}_{GMM} \rightarrow \theta_0 \] if the mapping from $ \theta$ to generalized moments $k_i(\theta,z)$ is invertible. (There are actually some minor "regularity" conditions needed for consistency, over and above the law of large numbers, but let's let that slide here.)
The method of least squares works similarly. We assume that \[ \mathbf{E}_{\theta}[Y_t|y_1^{t-1},z] = f(y_1^{t-1},z;\theta) \] where we know the functional form f (and where $ y_1^{t-1} $ means "all the observations from time 1 to time t-1"). The mean squared prediction error at a given $ \theta $ is then \[ \frac{1}{T}\sum_{t}{{\left(y_t - f(y_1^{t-1},z;\theta)\right)}^2} \] which we minimize over $ \theta $. (This can be seen as a version of the method of moments, with a different "moment" for each observation.)
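Again as a toy illustration of my own (not the book's), the same machinery for least squares: posit a conditional-mean function $ f $, here an AR(1), and minimize the mean squared one-step-ahead prediction error over $ \theta $.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed conditional mean: E_theta[Y_t | past] = theta * y_{t-1} (an AR(1)).
def f(y_past, theta):
    return theta * y_past

def mean_squared_error(theta, y):
    preds = f(y[:-1], theta[0])                # one-step-ahead predictions
    return np.mean((y[1:] - preds)**2)

rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(1, 500):                        # simulate data with theta_0 = 0.5
    y[t] = 0.5 * y[t-1] + rng.normal()
print(minimize(mean_squared_error, x0=[0.0], args=(y,)).x)   # should be near 0.5
```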
Finally, the method of maximum likelihood asks "how often should we expect to see data like this, under this model?", and tries to maximize that probability: \[ L(\theta,z) = \sum_{t}{\log{p_{\theta}(y_t|y_1^{t-1},z)}} \] where $ p_{\theta}(y_t|y_1^{t-1},z) $ is the probability density. (Bayesian estimation is a likelihood-based method, in which the impact of facts and experience is blunted and smoothed by prejudice.)
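And once more with the same toy AR(1), now estimated by maximizing the log-likelihood; the Gaussian conditional density is my own assumption for the example, chosen so the likelihood has a simple closed form.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy AR(1): p_theta(y_t | y_{t-1}) is Gaussian with mean theta * y_{t-1}
# and unit variance, so the log-likelihood is a sum of normal log-densities.
def neg_log_likelihood(theta, y):
    mu = theta[0] * y[:-1]
    return -np.sum(norm.logpdf(y[1:], loc=mu, scale=1.0))

rng = np.random.default_rng(2)
y = np.zeros(500)
for t in range(1, 500):                        # simulate data with theta_0 = 0.5
    y[t] = 0.5 * y[t-1] + rng.normal()
print(minimize(neg_log_likelihood, x0=[0.0], args=(y,)).x)   # should be near 0.5
```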
Originally, all of these methods of estimation were practical only if one could derive a simple formula for the best-fitting parameter values as a function of the data. Later, with the rise of numerical optimization on cheap, fast computers, one could get away from needing an exact formula, provided it was possible to say precisely what the model predicted --- most often, what the likelihood function was.
This sounds like it ought to be easy, but there are many models which are very natural from a scientific view-point (because they nicely represent mechanisms we guess are at work) for which exact expressions for the likelihood, or indeed for other predictions, are simply not available. In modeling dynamics, for example, if what we observe is not the full state of the system, but rather only part of it (and generally a part distorted by noise and nonlinearity at that), it becomes exceedingly difficult to calculate the probability of seeing a given sequence of observations. Or, again, if one's model is specified in terms of the behavior of large numbers of interacting entities (like molecules or economic agents), each possibly with an unobserved internal state, finding an exact likelihood function is pretty much hopeless. If we nonetheless want to connect our models to reality, and estimate parameters, what then should we do?
Gouriéroux and Monfort's answer turns on the fact that many interesting models can be simulated even when they can't be solved. That is, one can fairly quickly and cheaply "run them forward" to generate examples of the kind of behavior they say should happen, if necessary making many simulation runs to get many samples of the behavior they predict. One can then use those samples for estimation, and this in two ways, "direct" and "indirect".
The "direct" method of simulation-based inference is older and more straightforward; just use the sample of simulation runs as an approximation to the probability distribution generated by the model. In the formulas where one would want to use the theoretical probabilities to calculate expectations, likelihoods, etc., substitute the appropriate average over simulations. The easiest way to see how this works is with the method of moments. The actual expectations $ k_i(\theta,z) $ can be very hard to calculate analytically. In the "method of simulated moments" (chapter 2), one doesn't even try, but rather fixes $ \theta $ and runs the simulator S times, each run being the same size as the data, giving simulated values $ y^{(s,\theta)}_{t} $. One then treats the simulated mean, \[ \hat{k}^{S}_i(\theta,z) = \frac{1}{S}\sum_{s}{\frac{1}{T}\sum_{t}{K_i(y^{(s,\theta)}_t,z)}} \] as though it were the exact mean. This introduced extra error into the estimate of $ \theta $, of course, but this error will shrink as the number of simulation runs (S) grows. (Gouriéroux and Monfort consider some clever tricks for re-using the same set of random number draws for multiple $ \theta $, which reduces the computational load.) Some care is needed to preserve convergence to the truth as the data size (T) grows, but this can still be arranged. The other classical estimation methods work similarly (chapter 3). If one can draw from the predictive density, $ p_{\theta}(y_t|y_1^{t-1},z) $, then the average of several such draws is an estimate of the conditional expectation, $ \mathbf{E}_{\theta}[Y_t|y_1^{t-1},z] $, and can be used in the method of simulated least squares. Only slightly more exotic, if $ p_{\theta}(y_t|y_1^{t-1},z) $ can't itself be drawn from, but one can generate a random variable whose expectation is equal to the conditional density, one can then employ the method of simulated maximum likelihood. Remarkably, this retains (approximately) many of the nice properties of actual maximum likelihood estimation, at least if the number of simulation runs is large enough compared to the data size.
The "principle of indirect inference" (ch. 4) is more subtle, and to me much more exciting. In this approach, one introduces an "auxiliary" or "instrumental" model, which is not in general expected to be correct, but is supposed to be something which is easy to fit to the data. One then fits the auxiliary model both to the data, getting auxiliary parameter values $ \hat{\beta}(\mathrm{data}) $, simulations from the primary model for various values of the latter's parameters, getting auxiliary parameter values $ \hat{\beta}(\mathrm{sim},\theta) $. The indirect estimate of $ \theta $ is then the parameter setting where $ \hat{\beta}(\mathrm{sim},\theta) $ comes closest to $ \hat{\beta}(\mathrm{data}) $. In effect, one is still comparing the model's predictions to the data, but the prediction is now "what will the auxiliary model look like?", rather than more direct feature of the data.
For this to work, there are essentially two requirements. The first is that, if we feed in larger and larger samples from the primary model, with its parameters held fixed at $ \theta $, then the estimates of the auxiliary parameters will converge, $ \hat{\beta}(\mathrm{sim},\theta) \rightarrow b(\theta) $. The second is that $ b(\theta) $ be invertible. If both requirements hold, the indirect estimate will be consistent, that is, it will converge on the true value of $ \theta $. (Gouriéroux and Monfort actually [p. 85] prove consistency under a stronger set of assumptions, which entail these, but these are the ones which actually do the work.) Under somewhat stronger assumptions, they are also able to say something about the limiting distribution of indirect estimates around the truth, and even to derive a version of the Cramér-Rao inequality.
The first assumption, convergence of auxiliary parameter estimates, is very weak, though not altogether trivial. The second assumption basically demands that the auxiliary model be rich enough to distinguish between different versions of the primary model. Typically, though not necessarily, this will entail there being at least as many auxiliary parameters as there are primary ones, though these needn't correspond in any useful or comprehensible way. The distributional and Cramér-Rao-style results are of the kind one would expect: the indirect estimates will be more precise when the auxiliary parameters can be precisely estimated from the data, and when small differences in the auxiliary parameters correspond to large differences in the primary parameters.
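To fix ideas, here is a minimal sketch of indirect inference in the same style as the earlier examples. Everything concrete in it (the toy primary model, the choice of an AR(1) fitted by least squares as the auxiliary model, the function names) is my own invention for illustration; only the logic is the book's: fit the auxiliary model to the data and to simulations, then match the auxiliary parameters.

```python
import numpy as np
from scipy.optimize import minimize

T, S = 500, 10                                  # data size and number of simulation runs

def simulate(theta, T, rng):
    """Toy primary model: easy to run forward, no tractable likelihood assumed."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = np.tanh(theta * y[t-1]) + rng.normal()
    return y

def auxiliary_fit(y):
    """Auxiliary model: an AR(1) fitted by ordinary least squares.
    Its coefficient beta is deliberately not the same thing as theta."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1]**2)

def indirect_objective(theta, beta_data, seed=42):
    """Distance between beta fitted to the data and beta fitted to simulations
    run at this theta (same random draws for every theta, via the fixed seed)."""
    rng = np.random.default_rng(seed)
    beta_sim = np.mean([auxiliary_fit(simulate(theta[0], T, rng))
                        for _ in range(S)])
    return (beta_sim - beta_data)**2

rng = np.random.default_rng(4)
y_data = simulate(0.8, T, rng)                  # "real" data with theta_0 = 0.8
beta_data = auxiliary_fit(y_data)
fit = minimize(indirect_objective, x0=[0.1], args=(beta_data,), method="Nelder-Mead")
print(fit.x)                                     # should come out near 0.8
```

Note that the auxiliary AR(1) is misspecified for the tanh model, which is exactly the point: the auxiliary model only needs to be easy to fit and responsive to changes in $ \theta $, not correct.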
Chapters 5, 6 and 7 apply direct and indirect simulation inference to a range of popular models from econometrics, comparing the results to those of other estimation methods on both simulated and real-world data. Some of these are extremely impressive — in particular some of the results on complicated time-series models are simply astonishing — but these chapters will frankly be very hard going for anyone who has not seen these econometric models before. (Chapter 5, in particular, includes an awful lot on how to simulate discrete choice models.) Other applications will readily suggest themselves to any reader who has worked with simulation models.
x+174 pp., bibliography, line figures, index (spotty)
Economics / Probability and Statistics
In print as a hardback, ISBN 0-19-877475-3
Updated 16 March 2012: small typo fixes, switched to using MathJax