Attention conservation notice: 1800+ words of academic self-promotion, boosting a paper in which statisticians say mean things about some economists' favored toys. They're not even peer-reviewed mean things (yet). Contains abundant unexplained jargon, and cringe-worthy humor on the level of using a decades-old reference for a title.

Entirely seriously: Daniel is in no way responsible for this post.

I am very happy that after *many* years, this preprint is loosed
upon the world:

- Daniel J. McDonald and CRS, "Empirical Macroeconomics and DSGE Modeling in Statistical Perspective", arXiv:2210.16224
*Abstract:* Dynamic stochastic general equilibrium (DSGE) models have been an ubiquitous, and controversial, part of macroeconomics for decades. In this paper, we approach DSGEs purely as statistical models. We do this by applying two common model validation checks to the canonical Smets and Wouters 2007 DSGE: (1) we simulate the model and see how well it can be estimated from its own simulation output, and (2) we see how well it can seem to fit nonsense data. We find that (1) even with centuries' worth of data, the model remains poorly estimated, and (2) when we swap series at random, so that (e.g.) what the model gets as the inflation rate is really hours worked, what it gets as hours worked is really investment, etc., the fit is often only slightly impaired, and in a large percentage of cases actually improves (even out of sample). Taken together, these findings cast serious doubt on the meaningfulness of parameter estimates for this DSGE, and on whether this specification represents anything structural about the economy. Constructively, our approaches can be used for model validation by anyone working with macroeconomic time series.

To expand a little: DSGE models are models of macroeconomic aggregate
quantities, like levels of unemployment and production in a national economy.
As economic models, they're a sort of origin story for where the data comes
from. Some people find DSGE-style origin stories completely compelling, others
think they reach truly mythic levels of absurdity, with very little in between.
While settling *that* is something I will leave to the professional
economists
(cough *obviously*
they're absurd myths cough), we can also view them as *statistical*
models, specifically multivariate time series models, and ask about their
properties as such.

Now, long enough ago that blogging was still a thing and Daniel was doing
his dissertation on statistical
learning for time series
with Mark Schervish and myself,
he convinced us that DSGEs were an interesting and important target for the
theory we were working on. One important question within that was trying to
figure out just how flexible these models really were. The standard learning-theoretic principle is that *more* flexible model classes learn *more slowly* than less flexible ones. (If you are willing and able to
reproduce really complicated patterns, it's hard for you to distinguish between
signal and noise in limited data. There are important qualifications to this
idea, but it's a good start.) We thus began by thinking about trying to get
the DSGEs to fit random binary noise, because that'd tell us about
their Rademacher complexity,
but that seemed unlikely to go well. That led to thinking about trying to get
the models to fit the original time series, but with the series randomly
scrambled, a sort of permutation test of just how flexible the models were.
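
For the curious: the empirical Rademacher complexity of a model class \( \mathcal{F} \) on data \( x_1, \ldots, x_n \) is
\[
\hat{\mathfrak{R}}_n(\mathcal{F}) = \mathbb{E}_{\sigma}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \right] ,
\]
where the \( \sigma_i \) are independent \( \pm 1 \) coin-flips --- in words, how well the class can correlate, on average, with pure binary noise. Fitting a DSGE to such noise would estimate exactly this quantity.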

At some point, one of us had the idea of leaving the internal order of each time series alone, but swapping the labels on the series. If you have a merely-statistical multivariate model, like a vector autoregression, the different variables are so to speak exchangeable --- if you swap series 1 and series 2, you'll get a different coefficient matrix out, but it'll be a permutation of the original. (The parameters will be "covariant" with the permutations.) It'll fit as well as the original order of the variables. But if you have a properly scientific, structural model, each variable will have its own meaning and its own role in the model, and swapping variables around should lead to nonsense, and grossly degraded fits. (Good luck telling the Lotka-Volterra model that hares are predators and lynxes are prey.) There might be a few weird symmetries of some models which leave the fit alone (*), but for the most part, randomly swapping variables around should lead to drastically worse fits, if your models really are structural.
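
To make the contrast concrete, here is a minimal sketch (in Python with numpy; an illustration, not code from the paper) of why label-swapping is harmless for a merely-statistical model. Fit a small VAR(1) by least squares, randomly relabel the series, and refit: the in-sample fit --- measured here by the log-determinant of the residual covariance --- is unchanged, and only the coefficient matrix gets permuted along with the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_var1(A, T, rng):
    """Simulate T observations from a VAR(1) with coefficient matrix A."""
    k = A.shape[0]
    y = np.zeros((T, k))
    for t in range(1, T):
        y[t] = A @ y[t - 1] + rng.standard_normal(k)
    return y

def fit_var1(y):
    """OLS fit of a VAR(1); return coefficients and log-det of residual covariance."""
    X, Y = y[:-1], y[1:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # Y ~ X B, so A_hat = B.T
    resid = Y - X @ B
    sigma = resid.T @ resid / len(Y)
    return B.T, np.linalg.slogdet(sigma)[1]

A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.2],
              [0.1, 0.0, 0.3]])
y = simulate_var1(A, 500, rng)

perm = rng.permutation(3)          # randomly relabel the three series
_, fit_original = fit_var1(y)
_, fit_swapped = fit_var1(y[:, perm])

print(fit_original, fit_swapped)   # identical, up to rounding error
```

A structural model, with a distinct role for each variable, shouldn't enjoy this symmetry; that asymmetry is what the swap test is probing.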

Daniel did some initial trials with the classic "real business cycle" DSGE
of Kydland and Prescott
(1982), and found, rather astonishingly, that the model fit the swapped
data *better* a large fraction of the time. Exactly how often, and how
much better, depended on the details of measuring the fit, but the general
result was clear.

The reason we'd gotten into all this was wanting
to apply statistical learning
theory to macroeconomic forecasting, to put bounds on how bad the forecasts
would be. Inverting those bounds would tell us how much data would be needed
to achieve a given level of accuracy. Our results were pretty pessimistic,
suggesting that thousands of years of *stationary* data might be needed.
But those bounds were "distribution-free", using just the capacity or
flexibility of the model class, and the rate at which
new points in the time series
become independent of its past. Such generic bounds *could* therefore be unduly
pessimistic about how well this very particular model class can learn to predict
this very particular data source.
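
Very schematically (and glossing over the actual theorems), bounds of this flavor say that, with probability at least \( 1-\delta \), the out-of-sample risk of a fitted model \( f \) obeys something like
\[
R(f) \lesssim \hat{R}_n(f) + \hat{\mathfrak{R}}_{\mu}(\mathcal{F}) + \sqrt{\frac{\log(1/\delta)}{\mu}} ,
\]
where \( \hat{R}_n(f) \) is the in-sample risk, \( \hat{\mathfrak{R}}_{\mu}(\mathcal{F}) \) is the capacity term from above, and \( \mu \leq n \) is the effective number of nearly-independent blocks in the series, set by how quickly it mixes. A big capacity term plus a small effective sample size makes the bound loose, which is where the "thousands of years" arithmetic comes from.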

We therefore turned to another exercise: estimate the model on real data (or
take published estimates); simulate increasingly long series from the model;
and re-estimate the model on the simulation. That is, bend over backwards to
be fair to the model: if it's *entirely right* about the data-generating
process, how well can it predict? how well can it learn the parameters? how
much data would it need for accurate prediction? With, again, the
Kydland-Prescott model, the answer was... hundreds if not thousands of years
worth of data.
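
The logic of that exercise is easy to state in toy form. Here is a sketch (again Python, again not the paper's code) with an AR(1) standing in for the DSGE: fix "true" parameters, simulate ever-longer records, re-estimate, and watch how the estimates tighten. Substituting the DSGE's likelihood-based estimation for the one-line estimator below turns this toy into the actual exercise.

```python
import numpy as np

rng = np.random.default_rng(1)
phi_true = 0.9   # the "true" parameter of the data-generating process

def simulate_ar1(phi, T, rng):
    """Simulate T points from y_t = phi * y_{t-1} + noise."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi * y[t - 1] + rng.standard_normal()
    return y

def estimate_ar1(y):
    """Least-squares estimate of the AR(1) coefficient."""
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

# Spread of the estimates over 200 replications, as the simulated record grows
for T in (100, 1000, 10000):
    phi_hat = [estimate_ar1(simulate_ar1(phi_true, T, rng)) for _ in range(200)]
    print(T, round(float(np.mean(phi_hat)), 3), round(float(np.std(phi_hat)), 3))
```

For this toy the estimates tighten quickly as the record grows; the paper's finding is that for the DSGEs they do not, even with centuries of simulated data.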

Of course, even in the far-off days of 2012, the Kydland-Prescott model was obsolete, so we knew that if we wanted anyone to take this seriously, we'd need to use a more up-to-date model. Also, since this was all numerical, we didn't know if this was a general problem with DSGEs, or just (more) evidence that Prescott and data analysis were a bad combination. So we knew we should look at a more recent, and more widely-endorsed, DSGE model...

Daniel graduated; the
workhorse Smets and Wouters
(2007) DSGE is a more complicated creature, and needed both a lot of
programming time and a *lot* of computing time to churn through
thousands of variable swaps and tens of thousands of fits to simulations. We
both got busy with other things. Grants came and (regrettably) went. But what
we can tell you now, with great assurance, is that:

- Even if the Smets-Wouters model was completely correct about the structure of the economy, and it was given access to centuries of stationary data, it would predict very badly, and many "deep" parameters would remain very poorly estimated;
- Swapping the series around randomly
*improves* the fit a lot of the time, even when the results are substantive nonsense.

Series swapping is something we dreamed up, so I'm not surprised we couldn't
find anyone doing it. But "let's try out the estimator on simulation output"
is, or ought to be, an utterly standard diagnostic, and it too seems to be
lacking, despite the immense controversial literature about DSGEs. (Of course,
it *is* an immense literature --- if we've missed precedents for either,
please let me know.) We have some thoughts about what might be leading to both
forms of bad behavior, which I'll let you read about in the paper, but the main
thing to take away, I think, is the *fact* that this widely-used DSGE
works so badly, and the *methods*. Those methods are, to repeat,
"simulate the model to see how well it could be estimated / how well it would
predict if it was totally right about how the economy works" and "see whether
the model fits better when you swap variables around so you're feeding it
nonsense". If you want to say those are too simple to rise to the dignity of
"methods", I won't fight you, but I will insist all the more on their
importance.

It *might* be that we just so happened to have tried the only two
DSGEs with these pathologies. (It'd be a weird coincidence, but it's
possible.) We also don't look at any non-DSGE models, which might be as bad on
these scores or even worse. (Maybe time series macroeconometrics is inherently
doomed.) But anyone who is curious about whether their favorite
macroeconomic model meets these very basic criteria can *check*, ideally
before they publish and ~~rack up thousands of citations~~ lead
the community of inquirers down false trails. Doing so is conceptually simple,
if perhaps labor-intensive and painstaking, but that's science.

**Update**, December 2022: Irritatingly, there are some bugs.
(One of them, when fixed, will actually *increase* the flexibility of
the model...) I'll update this post again when we've finished re-running the code
and updated the preprint.

*: E.g., in Hamiltonian mechanics, with generalized positions \( q_1, \ldots q_k \) and corresponding momenta \( p_1, \ldots p_k \) going into the Hamiltonian \( H \), we have \( \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} \) and \( \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \). A little work then shows that we can exchange the roles of \( q_i \) and \( -p_i \) with the same Hamiltonian. But you can't (in general) swap position variables for each other, or momenta for each other, or \( q_1 \) for \( -p_2 \), or even \( q_i \) for \( p_i \), etc.
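
(Spelling out the "little work": write \( Q_i = -p_i \) and \( P_i = q_i \). Then \( \frac{dQ_i}{dt} = -\frac{dp_i}{dt} = \frac{\partial H}{\partial q_i} = \frac{\partial H}{\partial P_i} \) and \( \frac{dP_i}{dt} = \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} = -\frac{\partial H}{\partial Q_i} \), so Hamilton's equations hold in the new variables with the same \( H \).)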

Posted at November 02, 2022 14:51 | permanent link