## November 02, 2022

### Your Favorite DSGE Sucks

Attention conservation notice: 1800+ words of academic self-promotion, boosting a paper in which statisticians say mean things about some economists' favored toys. They're not even peer-reviewed mean things (yet). Contains abundant unexplained jargon, and cringe-worthy humor on the level of using a decades-old reference for a title.
Entirely seriously: Daniel is in no way responsible for this post.

I am very happy that after many years, this preprint is loosed upon the world:

Daniel J. McDonald and CRS, "Empirical Macroeconomics and DSGE Modeling in Statistical Perspective", arxiv:2210.16224
Abstract: Dynamic stochastic general equilibrium (DSGE) models have been an ubiquitous, and controversial, part of macroeconomics for decades. In this paper, we approach DSGEs purely as statistical models. We do this by applying two common model validation checks to the canonical Smets and Wouters 2007 DSGE: (1) we simulate the model and see how well it can be estimated from its own simulation output, and (2) we see how well it can seem to fit nonsense data. We find that (1) even with centuries' worth of data, the model remains poorly estimated, and (2) when we swap series at random, so that (e.g.) what the model gets as the inflation rate is really hours worked, what it gets as hours worked is really investment, etc., the fit is often only slightly impaired, and in a large percentage of cases actually improves (even out of sample). Taken together, these findings cast serious doubt on the meaningfulness of parameter estimates for this DSGE, and on whether this specification represents anything structural about the economy. Constructively, our approaches can be used for model validation by anyone working with macroeconomic time series.

To expand a little: DSGE models are models of macroeconomic aggregate quantities, like levels of unemployment and production in a national economy. As economic models, they're a sort of origin story for where the data comes from. Some people find DSGE-style origin stories completely compelling, others think they reach truly mythic levels of absurdity, with very little in between. While settling that is something I will leave to the professional economists (cough obviously they're absurd myths cough), we can also view them as statistical models, specifically multivariate time series models, and ask about their properties as such.

Now, long enough ago that blogging was still a thing and Daniel was doing his dissertation on statistical learning for time series with Mark Schervish and myself, he convinced us that DSGEs were an interesting and important target for the theory we were working on. One important question within that was trying to figure out just how flexible these models really were. The standard learning-theoretic principle is that the more flexible model classes learn slower than less flexible ones. (If you are willing and able to reproduce really complicated patterns, it's hard for you to distinguish between signal and noise in limited data. There are important qualifications to this idea, but it's a good start.) We thus began by thinking about trying to get the DSGEs to fit random binary noise, because that'd tell us about their Rademacher complexity, but that seemed unlikely to go well. That led to thinking about trying to get the models to fit the original time series, but with the series randomly scrambled, a sort of permutation test of just how flexible the models were.

At some point, one of us had the idea of leaving the internal order of each time series alone, but swapping the labels on the series. If you have a merely-statistical multivariate model, like a vector autoregression, the different variables are so to speak exchangeable --- if you swap series 1 and series 2, you'll get a different coefficient matrix out, but it'll be a permutation of the original. (The parameters will be "covariant" with the permutations.) It'll fit as well as the original order of the variables. But if you have a properly scientific, structural model, each variable will have its own meaning and its own role in the model, and swapping variables around should lead to nonsense, and grossly degraded fits. (Good luck telling the Lotka-Volterra model that hares are predators and lynxes are prey.) There might be a few weird symmetries of some models which leave the fit alone (*), but for the most part, randomly swapping variables around should lead to drastically worse fits, if your models really are structural.
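The exchangeability claim for vector autoregressions is easy to check numerically. What follows is a minimal sketch, not anything from the paper: the VAR(1) without intercept, the coefficient matrix `A_true`, the helper `fit_var1`, and the particular permutation are all made up for illustration.

```python
import numpy as np

def fit_var1(X):
    """Least-squares VAR(1) without intercept: X[t] ~ A @ X[t-1].

    Returns the estimated coefficient matrix A and the residual sum of squares.
    """
    past, future = X[:-1], X[1:]
    # lstsq solves past @ B ~ future, so the VAR matrix is A = B.T
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    resid = future - past @ B
    return B.T, float(np.sum(resid ** 2))

# Simulate a stable three-variable VAR(1) (illustrative coefficients).
rng = np.random.default_rng(0)
A_true = np.array([[0.5, 0.2, 0.0],
                   [0.0, 0.3, 0.4],
                   [0.1, 0.0, 0.6]])
T = 500
X = np.zeros((T, 3))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + rng.normal(size=3)

# Swap the labels on the series, leaving each series' internal order alone.
perm = np.array([2, 0, 1])
P = np.eye(3)[perm]          # permutation matrix: (P @ x)[i] == x[perm[i]]

A_hat, rss = fit_var1(X)
A_swapped, rss_swapped = fit_var1(X[:, perm])

# The swapped estimate is the original one with rows and columns permuted,
# and the quality of the fit is unchanged.
print(np.allclose(A_swapped, P @ A_hat @ P.T))
print(np.isclose(rss, rss_swapped))
```

Both checks should come out true for any permutation, which is the sense in which the parameters are "covariant" with the relabeling and the fit is indifferent to it.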

Daniel did some initial trials with the classic "real business cycle" DSGE of Kydland and Prescott (1982), and found, rather astonishingly, that the model fit the swapped data better a large fraction of the time. Exactly how often, and how much better, depended on the details of measuring the fit, but the general result was clear.

The reason we'd gotten into all this was wanting to apply statistical learning theory to macroeconomic forecasting, to put bounds on how bad the forecasts would be. Inverting those bounds would tell us how much data would be needed to achieve a given level of accuracy. Our results were pretty pessimistic, suggesting that thousands of years of stationary data might be needed. But those bounds were "distribution-free", using just the capacity or flexibility of the model class, and the rate at which new points in the time series become independent of its past. So they could be too pessimistic about how well this very particular model class can learn to predict this very particular data source.

We therefore turned to another exercise: estimate the model on real data (or take published estimates); simulate increasingly long series from the model; and re-estimate the model on the simulation. That is, bend over backwards to be fair to the model: if it's entirely right about the data-generating process, how well can it predict? how well can it learn the parameters? how much data would it need for accurate prediction? With, again, the Kydland-Prescott model, the answer was... hundreds if not thousands of years' worth of data.
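The shape of that exercise can be sketched in a few lines. To be loud about the assumptions: this uses a scalar AR(1) as a stand-in, not a DSGE, and the parameter values, series lengths, and function names (`simulate_ar1`, `estimate_ar1`, `reestimation_error`) are all hypothetical. Only the simulate-then-re-estimate loop carries over.

```python
import numpy as np

def simulate_ar1(phi, sigma, n, rng):
    """Simulate n steps of x[t] = phi * x[t-1] + sigma * noise, starting at 0."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma * rng.normal()
    return x

def estimate_ar1(x):
    """Least-squares estimate of phi (regression through the origin)."""
    return float(x[:-1] @ x[1:] / (x[:-1] @ x[:-1]))

def reestimation_error(phi, sigma, lengths, n_reps, seed=0):
    """For each series length, simulate from the 'true' model n_reps times,
    re-estimate, and return the root-mean-squared error of the estimates."""
    rng = np.random.default_rng(seed)
    rmse = {}
    for n in lengths:
        estimates = np.array([
            estimate_ar1(simulate_ar1(phi, sigma, n, rng))
            for _ in range(n_reps)
        ])
        rmse[n] = float(np.sqrt(np.mean((estimates - phi) ** 2)))
    return rmse

# Treat phi = 0.9 as the "published estimate" and ask how well it can be
# recovered from the model's own output at increasing series lengths.
errors = reestimation_error(phi=0.9, sigma=1.0, lengths=[50, 200, 800], n_reps=200)
for n, err in errors.items():
    print(n, round(err, 4))
```

For a well-behaved model like this one, the error shrinks steadily as the simulated series gets longer. The interesting question for any real model is how long the series has to be before the error is tolerable.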

Of course, even in the far-off days of 2012, the Kydland-Prescott model was obsolete, so we knew that if we wanted anyone to take this seriously, we'd need to use a more up-to-date model. Also, since this was all numerical, we didn't know if this was a general problem with DSGEs, or just (more) evidence that Prescott and data analysis were a bad combination. So we knew we should look at a more recent, and more widely-endorsed, DSGE model...

Daniel graduated; the workhorse Smets and Wouters (2007) DSGE is a more complicated creature, and needed both a lot of programming time and a lot of computing time to churn through thousands of variable swaps and tens of thousands of fits to simulations. We both got busy with other things. Grants came and (regrettably) went. But what we can tell you now, with great assurance, is that:

1. Even if the Smets-Wouters model was completely correct about the structure of the economy, and it was given access to centuries of stationary data, it would predict very badly, and many "deep" parameters would remain very poorly estimated;
2. Swapping the series around randomly improves the fit a lot of the time, even when the results are substantive nonsense.

The bad news is that even if this model was right, we couldn't hope to actually estimate it; the good news is that the model can't be right, because it fits better when we tell it that consumption is really wages, inflation is really consumption, and output is really inflation.

Series swapping is something we dreamed up, so I'm not surprised we couldn't find anyone doing it. But "let's try out the estimator on simulation output" is, or ought to be, an utterly standard diagnostic, and it too seems to be lacking, despite the immense controversial literature about DSGEs. (Of course, it is an immense literature --- if we've missed precedents for either, please let me know.) We have some thoughts about what might be leading to both forms of bad behavior, which I'll let you read about in the paper, but the main thing to take away, I think, is the fact that this widely-used DSGE works so badly, and the methods. Those methods are, to repeat, "simulate the model to see how well it could be estimated / how well it would predict if it was totally right about how the economy works" and "see whether the model fits better when you swap variables around so you're feeding it nonsense". If you want to say those are too simple to rise to the dignity of "methods", I won't fight you, but I will insist all the more on their importance.
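The series-swapping check fits in a small harness that works with any fitting routine. This is a sketch under stated assumptions, not the code behind the paper: `swap_test` and `driver_model_loss` are hypothetical names, the "structural" toy model (column 0 drives everything else) is invented for the example, and fit is measured by in-sample squared error, which is only one of many possible choices.

```python
import itertools
import numpy as np

def swap_test(fit_loss, X):
    """Fit the model under every relabeling of the columns of X.

    fit_loss takes a (T, k) array and returns a loss (lower is better).
    Returns a dict mapping each permutation of column labels to its loss.
    """
    k = X.shape[1]
    return {perm: fit_loss(X[:, list(perm)])
            for perm in itertools.permutations(range(k))}

def driver_model_loss(X):
    """Toy structural model: every series at time t depends only on the
    lagged value of column 0 (the 'driver'). Returns the in-sample RSS."""
    driver_past = X[:-1, 0]
    future = X[1:]
    # One least-squares slope per column, all regressed on the lagged driver.
    coefs = driver_past @ future / (driver_past @ driver_past)
    resid = future - np.outer(driver_past, coefs)
    return float(np.sum(resid ** 2))

# Data that really does come from the toy model (illustrative coefficients).
rng = np.random.default_rng(1)
T = 400
X = np.zeros((T, 3))
for t in range(1, T):
    X[t, 0] = 0.8 * X[t - 1, 0] + rng.normal()
    X[t, 1] = 0.9 * X[t - 1, 0] + 0.5 * rng.normal()
    X[t, 2] = -0.7 * X[t - 1, 0] + 0.5 * rng.normal()

losses = swap_test(driver_model_loss, X)
baseline = losses[(0, 1, 2)]
others = [loss for perm, loss in losses.items() if perm != (0, 1, 2)]
# Small tolerance so that exact symmetries of the model don't count as "better".
share_better = float(np.mean([loss < baseline * (1 - 1e-9) for loss in others]))
print(share_better)
```

When the model is right about which variable plays which role, mislabeling the driver should make the fit much worse, and no relabeling should beat the true one. The one relabeling that leaves the fit alone here, swapping the two driven series, is a symmetry of this particular toy. For real data, one would replace `driver_model_loss` with the actual estimation routine, sample permutations rather than enumerating them when there are many series, and probably score fit out of sample as well.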

It might be that we just so happened to have tried the only two DSGEs with these pathologies. (It'd be a weird coincidence, but it's possible.) We also don't look at any non-DSGE models, which might be as bad on these scores or even worse. (Maybe time series macroeconometrics is inherently doomed.) But anyone who is curious about whether their favorite macroeconomic model meets these very basic criteria can check, ideally before they publish, rack up thousands of citations, and lead the community of inquirers down false trails. Doing so is conceptually simple, if perhaps labor-intensive and painstaking, but that's science.

Update, December 2022: Irritatingly, there are some bugs. (One of them, when fixed, will actually increase the flexibility of the model...) I'll update this again when we're done with re-running the code and update the preprint.

*: E.g., in Hamiltonian mechanics, with generalized positions $q_1, \ldots, q_k$ and corresponding momenta $p_1, \ldots, p_k$ going into the Hamiltonian $H$, we have $\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}$ and $\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$. A little work shows then that we can exchange the roles of $q_i$ and $-p_i$ with the same Hamiltonian. But you can't (in general) swap position variables for each other, or momenta for each other, or $q_1$ for $-p_2$, or even $q_i$ for $p_i$, etc.
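
To spell out the "little work", on one reading of the exchange (the new symbols $Q_i$ and $P_i$ are introduced here purely for illustration): define new positions $Q_i = -p_i$ and new momenta $P_i = q_i$, so that $\frac{\partial H}{\partial Q_i} = -\frac{\partial H}{\partial p_i}$ and $\frac{\partial H}{\partial P_i} = \frac{\partial H}{\partial q_i}$. Then

$$
\frac{dQ_i}{dt} = -\frac{dp_i}{dt} = \frac{\partial H}{\partial q_i} = \frac{\partial H}{\partial P_i},
\qquad
\frac{dP_i}{dt} = \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} = -\frac{\partial H}{\partial Q_i},
$$

which are Hamilton's equations again in the new variables. Trying the same thing with $Q_i = p_i$ and $P_i = q_i$ flips both signs, which is why the minus sign matters.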

Posted at November 02, 2022 14:51 | permanent link