Factor Models for High-Dimensional Time Series and Spatio-Temporal Data

What it says.

A little more elaborately, the basic factor model is that a \( p \)-dimensional vector of observables (or "manifest variables") is a linear function of a \( q \)-dimensional vector of latent factors, \( q \ll p \), plus noise: \[ \vec{X} = \mathbf{w}\vec{F} + \vec{\epsilon} \] where \( \mathbf{w} \) is some \( p\times q \) matrix of factor "loadings", and the vector of noise \( \vec{\epsilon} \) has no correlation among its coordinates.

If we have observables evolving over time, the natural thing to do is to index everything by time: \[ \vec{X}(t) = \mathbf{w}\vec{F}(t) + \vec{\epsilon}(t) \] This won't be a complete model, however, until we specify how factor vectors at different times are related to each other. So long as we're going for brutal simplifications of delicate issues, that might as well be linear too: \[ \vec{F}(t+1) = \mathbf{a}\vec{F}(t) + \vec{\eta}(t+1) \] where \( \mathbf{a} \) is a \( q\times q \) matrix and \( \vec{\eta} \) is a second noise vector.

This is, of course, a classic form of state-space or hidden Markov model. The potentially interesting bit is that there are (by hypothesis) many more observables than latent factors. This opens up some possibilities, like being able to imagine (or approximate) regimes where \( p/q \rightarrow \infty \), in which we may be able to recover the latent factors exactly [*] from the observables (which isn't possible when \( p/q \) is fixed).
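To make the two equations concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration, not part of the model as stated: the noise terms are taken to be independent Gaussians with a common standard deviation, and the function name `simulate_dfm`, the dimensions, and the particular \( \mathbf{a} \) are all made up.

```python
import numpy as np

def simulate_dfm(w, a, n_steps, obs_sd=1.0, state_sd=1.0, seed=0):
    """Simulate X(t) = w F(t) + eps(t), F(t+1) = a F(t) + eta(t+1).

    Assumes (for illustration only) isotropic Gaussian noise in both equations.
    Returns X with shape (n_steps, p) and F with shape (n_steps, q).
    """
    rng = np.random.default_rng(seed)
    p, q = w.shape
    F = np.zeros((n_steps, q))
    X = np.zeros((n_steps, p))
    F[0] = rng.normal(scale=state_sd, size=q)  # arbitrary initial state
    for t in range(n_steps):
        if t > 0:
            # state equation: linear evolution plus state noise
            F[t] = a @ F[t - 1] + rng.normal(scale=state_sd, size=q)
        # observation equation: loadings times factors plus observation noise
        X[t] = w @ F[t] + rng.normal(scale=obs_sd, size=p)
    return X, F

# Hypothetical example: 50 observables driven by 2 latent factors.
p, q = 50, 2
w = np.random.default_rng(1).normal(size=(p, q))
a = np.array([[0.9, 0.0],
              [0.0, 0.5]])  # eigenvalues inside the unit circle, so F is stable
X, F = simulate_dfm(w, a, n_steps=500)
```

Setting `obs_sd=0.0` gives observables that lie exactly in the \( q \)-dimensional subspace spanned by the columns of \( \mathbf{w} \), which is a handy sanity check.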

If the dimensions of \( \vec{X}(t) \) are values of some field (or fields) at various spatial locations, we have a model where the evolution of a spatio-temporal random field is controlled by the evolution of a low (\(q \)) dimensional random vector. (It'd be natural then to imagine that \( w_{ij} \), the loading of location \( i \) on factor \( j \), tends to be close to \( w_{kj} \) when location \( k \) is spatially close to location \( i \); I haven't seen much use made of this, but perhaps I've just not read the right papers yet.)
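One way to picture spatially smooth loadings is to draw each column of \( \mathbf{w} \) from a Gaussian process over the locations. This is purely an illustrative assumption on my part (the squared-exponential kernel, the one-dimensional grid of locations, and the name `smooth_loadings` are all hypothetical), not a method taken from any of the papers listed here.

```python
import numpy as np

def smooth_loadings(locations, q, length_scale=0.2, seed=0):
    """Draw a (p, q) loading matrix whose rows vary smoothly with location.

    Each column is one draw from a zero-mean Gaussian process with a
    squared-exponential kernel (an illustrative choice, not a requirement).
    """
    rng = np.random.default_rng(seed)
    d = locations[:, None] - locations[None, :]   # pairwise separations
    K = np.exp(-0.5 * (d / length_scale) ** 2)    # kernel (covariance) matrix
    K += 1e-6 * np.eye(len(locations))            # jitter for numerical stability
    L = np.linalg.cholesky(K)
    return L @ rng.normal(size=(len(locations), q))

locations = np.linspace(0.0, 1.0, 100)   # hypothetical 1-D spatial grid
w = smooth_loadings(locations, q=2)

# Loadings at neighbouring locations should differ much less than
# loadings at locations half the domain apart.
near = np.abs(w[1:] - w[:-1]).mean()
far = np.abs(w[50:] - w[:-50]).mean()
```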

The implied linear model here is often substantive nonsense (e.g., in meteorological data), but that's not the point. The point is instead to have a tractable but wrong model for low-dimensional summarization and (crude but speedy) prediction. All analysis here really ought to be done presuming mis-specification, and being explicit about wanting the best low-dimensional, low-rank linear approximation, rather than purporting to care about the True Parameters.

Here is a crude, even stupid, way to estimate such a model. Take all the \( \vec{X}(t) \) vectors, and do principal components analysis on them. Discard all but the top \( q \) principal components, so that we estimate \( \mathbf{w} \) as the \( p\times q \) matrix formed by stacking the top \( q \) eigenvectors of the empirical covariance matrix of the \( \vec{X}(t) \)'s. Estimate \( \vec{F}(t) \) by projecting each \( \vec{X}(t) \) on to those eigenvectors. Finally, estimate \( \mathbf{a} \) by regressing the estimate of \( \vec{F}(t+1) \) on the estimate of \( \vec{F}(t) \). I am very aware that this is, as I said, a crude approach. (For instance, it implicitly assumes that the directions of maximum variance for the observables correspond to the subspace traced out by the factors, \( \mathbf{w}\vec{F} \), but it's entirely possible for the noise variance in the observables to be larger than the variance contributed by the factors, in which case this will give nonsense.) One thing I'm curious about is how much more refined approaches improve over this crude approach, and what needs to be assumed for refinements to work.
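The crude procedure just described can be sketched in a few lines of NumPy. The function name `pca_dfm_fit` and the toy data are hypothetical, and the simulated example assumes Gaussian noise; the estimation steps themselves (eigendecomposition of the empirical covariance, projection, then least-squares regression of each estimated factor vector on its predecessor) follow the description above.

```python
import numpy as np

def pca_dfm_fit(X, q):
    """Crude PCA-based fit of a dynamic factor model.

    X has shape (T, p), one row per time point. Returns
    w_hat (p, q), F_hat (T, q) and a_hat (q, q).
    """
    Xc = X - X.mean(axis=0)                      # centre the observables
    cov = np.cov(Xc, rowvar=False)               # empirical p x p covariance
    evals, evecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:q]            # indices of the top q
    w_hat = evecs[:, top]                        # eigenvectors as columns
    F_hat = Xc @ w_hat                           # project onto the eigenvectors
    # Least squares for F_hat[t+1] ~ a F_hat[t]; lstsq solves F_prev @ B = F_next,
    # so the transition matrix is B transposed.
    B, *_ = np.linalg.lstsq(F_hat[:-1], F_hat[1:], rcond=None)
    return w_hat, F_hat, B.T

# Toy data from the model itself (all numbers here are made up).
rng = np.random.default_rng(0)
T, p, q = 1000, 40, 2
w_true = rng.normal(size=(p, q))
a_true = np.diag([0.9, 0.5])
F = np.zeros((T, q))
for t in range(1, T):
    F[t] = a_true @ F[t - 1] + rng.normal(size=q)
X = F @ w_true.T + rng.normal(size=(T, p))

w_hat, F_hat, a_hat = pca_dfm_fit(X, q)

# Crude one-step-ahead prediction of the observables.
X_next_pred = X.mean(axis=0) + (a_hat @ F_hat[-1]) @ w_hat.T
```

Because of the rotation indeterminacy, `w_hat` and `a_hat` should only be compared to `w_true` and `a_true` up to a change of coordinates for the factors; the fitted values `F_hat @ w_hat.T` are the more meaningful thing to check.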

Things I need to understand better: how does this relate to the state-space reconstruction approach? E.g., if I took a random field generated from a dynamic factor model, and subjected it to time-delay embedding, would I recover the true number of latent factors \( q \) as the embedding dimension? Would this work in the limit where the noise terms \( \vec{\epsilon}(t) \) and \( \vec{\eta}(t) \) approached zero variance?

[*]: As usual with factor models, the rotation / factor indeterminacy problem is inescapable. If the data-generating process obeys the two equations I gave above, with random latent factor vectors \( \vec{F} \), then let \( \vec{G} = \mathbf{r}\vec{F} \), where \( \mathbf{r} \) is any orthogonal matrix. A little algebra shows that \( \vec{G} \) obeys equations of the same form, with factor loading matrix \( \mathbf{v} \equiv \mathbf{w}\mathbf{r}^{T} \), transition matrix \( \mathbf{r}\mathbf{a}\mathbf{r}^{T} \) and state noise \( \mathbf{r}\vec{\eta} \), so we get exactly the same distribution of observables over time from \( \vec{G} \) as we did from \( \vec{F} \). Really, all we've done is changed our (arbitrary) coordinate system for the latent vectors. So, more exactly, what can happen in the limit \( p/q \rightarrow \infty \) is that the vectors \( \mathbf{w}\vec{F}(t) \), the expected observables, become identifiable from the actual observables \( \vec{X}(t) \).
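The indeterminacy is easy to check numerically. The sketch below (hypothetical dimensions and parameter values, Gaussian noise assumed) runs the model once in the \( \vec{F} \) coordinates and once in rotated \( \vec{G} \) coordinates, feeding both runs the same underlying noise draws, and confirms that the observables coincide.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, T = 20, 3, 100
w = rng.normal(size=(p, q))
a = 0.5 * np.eye(q) + 0.1 * rng.normal(size=(q, q))
r, _ = np.linalg.qr(rng.normal(size=(q, q)))   # a random orthogonal matrix

eta = rng.normal(size=(T, q))   # state noise, one row per time step
eps = rng.normal(size=(T, p))   # observation noise

def run(loadings, transition, state_noise, obs_noise):
    """Iterate the state equation, then apply the observation equation."""
    F = np.zeros_like(state_noise)
    F[0] = state_noise[0]
    for t in range(1, len(F)):
        F[t] = transition @ F[t - 1] + state_noise[t]
    return F @ loadings.T + obs_noise

X_F = run(w, a, eta, eps)                          # original coordinates
X_G = run(w @ r.T, r @ a @ r.T, eta @ r.T, eps)    # rotated coordinates

same = np.allclose(X_F, X_G)
```

This shows path-by-path equality when the state noise is rotated along with the factors. Equality in distribution for the observables then only requires that the rotated noise \( \mathbf{r}\vec{\eta} \) be an admissible state noise, which it is if, for instance, \( \vec{\eta} \) is isotropic Gaussian as assumed here.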

Factor Models (where I explain my thinking about this class of models more broadly)
Optimal Linear Prediction and Estimation
Spatio-Temporal Statistics
Time Series

Carlos M. Carvalho and Mike West, "Dynamic matrix-variate graphical models", Bayesian Analysis 2 (2007): 69--97
Dani Gamerman, Hedibert Freitas Lopes, and Esther Salazar, "Spatial dynamic factor analysis", Bayesian Analysis 3 (2008): 759--792
James H. Stock and Mark W. Watson, "Dynamic Factor Models" [PDF]

Pierre Alquier and Nicolas Marie, "Matrix factorization for multivariate time series analysis", Electronic Journal of Statistics 13 (2019): 4346--4366
Jushan Bai and Serena Ng, "Determining the Number of Factors in Approximate Factor Models", Econometrica 70 (2002): 191--221
Jushan Bai and Kunpeng Li, "Statistical analysis of factor models of high dimension", Annals of Statistics 40 (2012): 436--465
Matteo Barigozzi, Haeran Cho, "Consistent estimation of high-dimensional factor models when the factor number is over-estimated", Electronic Journal of Statistics 14 (2020): 2892--2921
Matteo Barigozzi, Matteo Luciani
- "Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm", arxiv:1910.03821
- "Quasi Maximum Likelihood Estimation of Non-Stationary Large Approximate Dynamic Factor Models", arxiv:1910.09841
Ngai Hang Chan, Ye Lu, Chun Yip Yau, "Factor Modelling for High-Dimensional Time Series: Inference and Model Selection", Journal of Time Series Analysis 38 (2017): 285--307
Jinyuan Chang, Bin Guo, Qiwei Yao, "High dimensional stochastic regression with latent factors, endogeneity and nonlinearity", Journal of Econometrics 189 (2015): 297--312, arxiv:1310.1990
Zhaoxing Gao, Ruey S. Tsay
- "Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Factor Models with Diverging Eigenvalues", arxiv:1808.07932
- "A Structural-Factor Approach to Modeling High-Dimensional Time Series and Space-Time Data", Journal of Time Series Analysis 40 (2019): 343--362, arxiv:1808.06518
Clifford Lam and Qiwei Yao, "Factor modeling for high-dimensional time series: Inference for the number of factors", Annals of Statistics 40 (2012): 694--726
Emanuel Moench, Serena Ng, Simon Potter, "Dynamic Hierarchical Factor Models", Review of Economics and Statistics 95 (2013): 1811--1817
James Stock and Mark Watson, "Forecasting Using Principal Components From a Large Number of Predictors", Journal of the American Statistical Association 97 (2002): 1167--1179 [PDF reprint via Prof. Watson]
Bo Zhang, Guangming Pan, Qiwei Yao, Wang Zhou, "Factor Modelling for Clustering High-dimensional Time Series", arxiv:2101.01908
