July 26, 2010

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting"

Attention conservation notice: 500 words on a student's thesis proposal, combining all the thrills of macroeconomic forecasting with the stylish vivacity of statistical learning theory. Even if you care, why not check back in a few years when the work is further along?

Daniel McDonald is writing his thesis, under the joint supervision of Mark Schervish and myself. I can use the present participle, because on Thursday he successfully defended his proposal:

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting" [PDF]
Abstract: In this thesis, I propose to derive entirely data dependent generalization error bounds for state space models. These results can characterize the out-of-sample accuracy of many types of forecasting methods. The bounds currently available for time series data rely both on a quantity describing the dependence properties of the data generating process known as the mixing rate and on a quantification of the complexity of the model space. I will derive methods for estimating the mixing behavior from data and characterize the complexity of state space models. The resulting risk bounds will be useful for empirical researchers at the forefront of economic forecasting as well as for economic policy makers. The bounds can also be applied in other situations where state space models are employed.

Some of you may prefer the slides (note that Daniel is using DeLong's reduction of DSGEs to D2 normal form), or an even more compact visual summary:

Most macroeconomic forecasting models are, or can be turned into, "state-space models". There's an underlying state variable or variables, which evolve according to a nice Markov process, and then what we actually measure is a noisy function of the state; given the current state, future states and current observations are independent. (Some people like to draw a distinction between "state-space models" and "hidden Markov models", but I've never seen why.) The calculations can be hairy, especially once you allow for nonlinearities, but one can show that maximum likelihood estimation, as well as various regularizations of it, has all the nice asymptotic properties one could want.
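To make that structure concrete, here is a minimal sketch of about the simplest such model, a scalar linear-Gaussian one. The function name, the parameter names (`phi`, `state_sd`, `obs_sd`) and their values are illustrative assumptions of mine, not anything taken from Daniel's proposal.

```python
import random


def simulate_state_space(n, phi=0.9, state_sd=1.0, obs_sd=0.5, seed=0):
    """Simulate a scalar linear-Gaussian state-space model.

    Hidden state:  x[t] = phi * x[t-1] + state noise   (a Markov process)
    Observation:   y[t] = x[t] + observation noise     (noisy function of the state)

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    states, observations = [], []
    x = 0.0  # arbitrary initial state
    for _ in range(n):
        # The next state depends on the past only through the current state.
        x = phi * x + rng.gauss(0.0, state_sd)
        # The observation depends only on the current state, plus noise.
        y = x + rng.gauss(0.0, obs_sd)
        states.append(x)
        observations.append(y)
    return states, observations


# 252 points, matching the sample size discussed in this post.
states, observations = simulate_state_space(252)
```

In practice only `observations` would be available to the forecaster; `states` is returned here just so the hidden process can be inspected.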

Asymptotic statistical theory is, of course, useless for macroeconomics. Or rather: if our methods weren't consistent even with infinite data, we'd know we should just give up. But if the methods only begin to give usably precise answers when the number of data points gets over 10^24, we should give up too. Knowing that things could work with infinite data doesn't help when we really have 252 data points, and serial dependence shrinks the effective sample size to about 12 or 15. The wonderful thing about modern statistical learning theory is that it gives non-asymptotic results, especially risk bounds that hold at finite sample sizes. This is, of course, the reason why ergodic theorems, and the correlation time of US GDP growth rates, have been on my mind recently. In particular, this is why we are thinking about ergodic theorems which not only give finite-sample bounds (like the toy theorem I posted about), but can be made to do so uniformly over whole classes of functions, e.g., the loss functions of different macro forecasting models and their parameterizations.
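For a rough sense of how serial dependence can shrink 252 data points to a dozen or so, here is a sketch using the textbook approximation for an AR(1) process: with lag-one autocorrelation rho, the mean of n observations is about as variable as the mean of n(1 - rho)/(1 + rho) independent ones. This is not necessarily the calculation behind the figure quoted above, and the value of `rho` used here is a hypothetical chosen for illustration, not an estimate from GDP data.

```python
def effective_sample_size(n, rho):
    """Approximate effective sample size for the mean of an AR(1) series.

    n   -- number of (dependent) observations
    rho -- lag-one autocorrelation, assumed to lie strictly in (-1, 1)

    Uses the standard large-n approximation n * (1 - rho) / (1 + rho).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical autocorrelation of 0.9, for illustration only.
n_eff = effective_sample_size(252, 0.9)
print(round(n_eff, 1))  # 13.3
```

With no serial dependence (`rho = 0`) the function returns `n` unchanged, and the effective sample size falls off quickly as `rho` approaches 1.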

Anyone wanting to know how to deal with non-stationarity is reminded that Daniel is proposing a dissertation in statistics, and not a solution to the problem of induction.

Posted at July 26, 2010 15:30