Attention conservation notice: 1800+ words of academic self-promotion, boosting a paper in which statisticians say mean things about some economists' favored toys. They're not even peer-reviewed mean things (yet). Contains abundant unexplained jargon, and cringe-worthy humor on the level of using a decades-old reference for a title.
Entirely seriously: Daniel is in no way responsible for this post.
Update, December 2022: Irritatingly, there are some small but real bugs, glitching all our numerical results. This is an even stronger reason for you to direct your attention elsewhere. (Details at the end.)
I am very happy that after many years, this preprint is loosed upon the world:
To expand a little: DSGE models are models of macroeconomic aggregate quantities, like levels of unemployment and production in a national economy. As economic models, they're a sort of origin story for where the data comes from. Some people find DSGE-style origin stories completely compelling, others think they reach truly mythic levels of absurdity, with very little in between. While settling that is something I will leave to the professional economists (cough obviously they're absurd myths cough), we can also view them as statistical models, specifically multivariate time series models, and ask about their properties as such.
Now, long enough ago that blogging was still a thing and Daniel was doing his dissertation on statistical learning for time series with Mark Schervish and myself, he convinced us that DSGEs were an interesting and important target for the theory we were working on. One important question within that was trying to figure out just how flexible these models really were. The standard learning-theoretic principle is that more flexible model classes learn slower than less flexible ones. (If you are willing and able to reproduce really complicated patterns, it's hard for you to distinguish between signal and noise in limited data. There are important qualifications to this idea, but it's a good start.) We thus began by thinking about getting the DSGEs to fit random binary noise, because that'd tell us about their Rademacher complexity, but that seemed unlikely to go well. That led to the idea of fitting the models to the original time series, but with the time order randomly scrambled, a sort of permutation test of just how flexible the models were.
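To make the Rademacher-complexity idea concrete, here is a minimal Monte Carlo sketch in Python, with a small linear model class standing in for a DSGE (the function names and the toy class are mine, not anything from the paper). The quantity estimated is the average, over random ±1 sign vectors, of how well the best-fitting member of the class can correlate with pure noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_rademacher(X, fit_predict, n_draws=200):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    model class on inputs X: the average, over random sign vectors sigma,
    of (1/n) * sigma . h(X), where h is the class member fit to the signs."""
    n = X.shape[0]
    vals = []
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)
        vals.append(sigma @ fit_predict(X, sigma) / n)
    return float(np.mean(vals))

# toy stand-in for a model class: linear predictors, fit by least squares
def fit_predict_linear(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

X = rng.standard_normal((200, 5))
complexity = empirical_rademacher(X, fit_predict_linear)
# for least squares this averages trace(projection)/n, here 5/200
print(round(complexity, 3))
```

The more flexible the class, the better it can chase the random signs, and the larger this number comes out; actually running this for a DSGE would mean re-estimating the model once per sign vector, which is what "seemed unlikely to go well".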
At some point, one of us had the idea of leaving the internal order of each time series alone, but swapping the labels on the series. If you have a merely-statistical multivariate model, like a vector autoregression, the different variables are so to speak exchangeable --- if you swap series 1 and series 2, you'll get a different coefficient matrix out, but it'll be a permutation of the original. (The parameters will be "covariant" with the permutations.) It'll fit as well as the original order of the variables. But if you have a properly scientific, structural model, each variable will have its own meaning and its own role in the model, and swapping variables around should lead to nonsense, and grossly degraded fits. (Good luck telling the Lotka-Volterra model that hares are predators and lynxes are prey.) There might be a few weird symmetries of some models which leave the fit alone (*), but for the most part, randomly swapping variables around should lead to drastically worse fits, if your models really are structural.
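Here is a quick numerical illustration of the merely-statistical case, using a toy VAR(1) in Python (my own sketch, not code from the paper): swapping two series just relabels the rows and columns of the fitted coefficient matrix, leaving the fit itself untouched.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_var1(x):
    """OLS fit of a VAR(1), x[t] = A @ x[t-1] + noise; x has shape (T, k)."""
    X, Y = x[:-1].T, x[1:].T
    return Y @ X.T @ np.linalg.inv(X @ X.T)

# simulate a stable 3-variable VAR(1)
A = np.array([[0.5, 0.1, 0.0],
              [0.0, 0.4, 0.2],
              [0.1, 0.0, 0.3]])
T, k = 5000, 3
x = np.zeros((T, k))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.standard_normal(k)

A_hat = fit_var1(x)
perm = [1, 0, 2]                  # swap series 0 and 1
A_swap = fit_var1(x[:, perm])

# the swapped fit is exactly the original fit with rows and columns permuted
assert np.allclose(A_swap, A_hat[np.ix_(perm, perm)])
```

A genuinely structural model should flunk this exchangeability badly, with grossly worse fits after swapping, which is exactly what the label-swapping experiment probes.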
Daniel did some initial trials with the classic "real business cycle" DSGE of Kydland and Prescott (1982), and found, rather astonishingly, that the model fit the swapped data better a large fraction of the time. Exactly how often, and how much better, depended on the details of measuring the fit, but the general result was clear.
The reason we'd gotten into all this was wanting to apply statistical learning theory to macroeconomic forecasting, to put bounds on how bad the forecasts would be. Inverting those bounds would tell us how much data would be needed to achieve a given level of accuracy. Our results were pretty pessimistic, suggesting that thousands of years of stationary data might be needed. But those bounds were "distribution-free", using just the capacity or flexibility of the model class, and the rate at which new points in the time series become independent of its past. Such bounds could thus be overly pessimistic about how well this very particular model class can learn to predict this very particular data source.
We therefore turned to another exercise: estimate the model on real data (or take published estimates); simulate increasingly long series from the model; and re-estimate the model on the simulation. That is, bend over backwards to be fair to the model: if it's entirely right about the data-generating process, how well can it predict? how well can it learn the parameters? how much data would it need for accurate prediction? With, again, the Kydland-Prescott model, the answer was... hundreds if not thousands of years worth of data.
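The shape of that exercise, stripped down to a toy AR(1) in Python as a stand-in for the DSGE (none of this is the paper's actual code): estimate once, treat the estimate as the truth, simulate ever-longer series, and watch how slowly re-estimation converges.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_ar1(phi, T):
    """T points from x[t] = phi * x[t-1] + N(0, 1) noise."""
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def fit_ar1(x):
    """OLS estimate of phi."""
    return (x[:-1] @ x[1:]) / (x[:-1] @ x[:-1])

phi_true = 0.9                    # pretend this came from fitting real data
errors = {}
for T in [50, 500, 5000, 50000]:
    reps = [abs(fit_ar1(simulate_ar1(phi_true, T)) - phi_true)
            for _ in range(20)]
    errors[T] = float(np.mean(reps))
    print(T, round(errors[T], 4))

# estimation error shrinks roughly like 1/sqrt(T):
# each 100-fold increase in data buys only ~10-fold more accuracy
assert errors[50000] < errors[50]
```

For a one-parameter AR(1) the convergence is as fast as it ever gets; the point of the paper's version of this exercise is that for a many-parameter DSGE, the analogous curve flattens out at sample sizes no macroeconomist will ever see.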
Of course, even in the far-off days of 2012, the Kydland-Prescott model was obsolete, so we knew that if we wanted anyone to take this seriously, we'd need to use a more up-to-date model. Also, since this was all numerical, we didn't know if this was a general problem with DSGEs, or just (more) evidence that Prescott and data analysis were a bad combination. So we knew we should look at a more recent, and more widely-endorsed, DSGE model...
Daniel graduated; the workhorse Smets and Wouters (2007) DSGE is a more complicated creature, and needed both a lot of programming time and a lot of computing time to churn through thousands of variable swaps and tens of thousands of fits to simulations. We both got busy with other things. Grants came and (regrettably) went. But what we can tell you now, with great assurance, is that the Smets-Wouters model shows the same pathologies: it often fits better when the variables are swapped into nonsense, and even when it is exactly right about the data-generating process, it would need implausibly long stretches of data to estimate or forecast accurately.
Series swapping is something we dreamed up, so I'm not surprised we couldn't find anyone doing it. But "let's try out the estimator on simulation output" is, or ought to be, an utterly standard diagnostic, and it too seems to be lacking, despite the immense controversial literature about DSGEs. (Of course, it is an immense literature --- if we've missed precedents for either, please let me know.) We have some thoughts about what might be leading to both forms of bad behavior, which I'll let you read about in the paper, but the main thing to take away, I think, is the fact that this widely-used DSGE works so badly, and the methods. Those methods are, to repeat, "simulate the model to see how well it could be estimated / how well it would predict if it was totally right about how the economy works" and "see whether the model fits better when you swap variables around so you're feeding it nonsense". If you want to say those are too simple to rise to the dignity of "methods", I won't fight you, but I will insist all the more on their importance.
It might be that we just so happened to have tried the only two DSGEs with these pathologies. (It'd be a weird coincidence, but it's possible.) We also don't look at any non-DSGE models, which might be as bad on these scores or even worse. (Maybe time series macroeconometrics is inherently doomed.) But anyone who is curious about whether their favorite macroeconomic model meets these very basic criteria can check, ideally before they publish, rack up thousands of citations, and lead the community of inquirers down false trails. Doing so is conceptually simple, if perhaps labor-intensive and painstaking, but that's science.
After posting the preprint, people helpfully found some bugs in our code. These glitch up all our numerical results. Since this is primarily a paper about our numerical results, this is obviously bad. The preprint needs to be revised after we've fixed our code and re-run everything. I am pretty confident, however, about the general shape of the numbers, because as I said we got the same kind of behavior from the Kydland-Prescott model and (importantly, in this context) off-the-shelf code. Of course, you being less confident in my confidence after this would be entirely sensible. In any event, I'll update this again when we're done with re-running the code and have updated the preprint.
*: E.g., in Hamiltonian mechanics, with generalized positions \( q_1, \ldots q_k \) and corresponding momenta \( p_1, \ldots p_k \) going into the Hamiltonian \( H \), we have \( \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} \) and \( \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i} \). A little work shows then that we can exchange the roles of \( q_i \) and \( -p_i \) with the same Hamiltonian. But you can't (in general) swap position variables for each other, or momenta for each other, or \( q_1 \) for \( -p_2 \), or even \( q_i \) for \( p_i \), etc.
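(Spelling out the footnote's claim, just by unpacking the definitions: write \( Q_i = -p_i \) and \( P_i = q_i \); then

```latex
\begin{aligned}
\frac{dQ_i}{dt} &= -\frac{dp_i}{dt} = \frac{\partial H}{\partial q_i} = \frac{\partial H}{\partial P_i},\\
\frac{dP_i}{dt} &= \frac{dq_i}{dt} = \frac{\partial H}{\partial p_i} = -\frac{\partial H}{\partial Q_i},
\end{aligned}
```

so the pair \( (Q, P) \) obeys Hamilton's equations with the unchanged \( H \), while a generic relabeling of the \( q \)'s alone enjoys no such cancellation of signs.)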
Posted at November 02, 2022 14:51 | permanent link