
Factor Models for High-Dimensional Time Series and Spatio-Temporal Data

09 Mar 2024 13:38

What it says.

A little more elaborately, the basic factor model is that a \( p \)-dimensional vector of observables (or "manifest variables") is a linear function of a \( q \)-dimensional vector of latent factors, \( q \ll p \): \[ \vec{X} = \mathbf{w}\vec{F} + \vec{\epsilon} \] where \( \mathbf{w} \) is some \( p\times q \) matrix of factor "loadings", and the vector of noise \( \vec{\epsilon} \) has no correlation among its coordinates. If we have observables evolving over time, the natural thing to do is to index everything by time: \[ \vec{X}(t) = \mathbf{w}\vec{F}(t) + \vec{\epsilon}(t) \] This won't be a complete model, however, until we specify how factor vectors at different times are related to each other. So long as we're going for brutal simplifications of delicate issues, that might as well be linear too: \[ \vec{F}(t+1) = \mathbf{a}\vec{F}(t) + \vec{\eta}(t+1) \] This is, of course, a classic form of state-space or hidden Markov model. The potentially interesting bit is that there are (by hypothesis) many more observables than latent factors. This opens up some possibilities, like imagining (or approximating) regimes where \( p/q \rightarrow \infty \), in which we may be able to recover the latent factors exactly [*] from the observables (which isn't possible when \( p/q \) is fixed).
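As a toy illustration of this generative set-up, here is a minimal simulation sketch in Python; the dimensions, noise scales, and the particular way of making the transition matrix stable are arbitrary choices of mine, not anything from the literature.

    import numpy as np

    rng = np.random.default_rng(0)
    p, q, T = 50, 3, 500                        # many observables, few factors

    w = rng.normal(size=(p, q))                 # factor loadings (p x q)
    # An arbitrary stable transition matrix: orthogonal, shrunk so all eigenvalues have modulus 0.9
    a = 0.9 * np.linalg.qr(rng.normal(size=(q, q)))[0]

    F = np.zeros((T, q))                        # latent factors, row t = F(t)
    X = np.zeros((T, p))                        # observables, row t = X(t)
    for t in range(T):
        if t > 0:
            F[t] = a @ F[t - 1] + rng.normal(scale=0.1, size=q)   # F(t) = a F(t-1) + eta(t)
        X[t] = w @ F[t] + rng.normal(scale=0.5, size=p)           # X(t) = w F(t) + epsilon(t)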

If the dimensions of \( \vec{X}(t) \) are values of some field (or fields) at various spatial locations, we have a model where the evolution of a spatio-temporal random field is controlled by the evolution of a low (\(q \)) dimensional random vector. (It'd be natural then to imagine that \( w_{ij} \), the loading of location \( i \) on factor \( j \), tends to be close to \( w_{kj} \) when location \( k \) is spatially close to location \( i \); I haven't seen much use made of this, but perhaps I've just not read the right papers yet.)
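Just to make that parenthetical notion concrete (purely my own illustration, not something taken from a paper): put the \( p \) coordinates of \( \vec{X} \) at locations on a line, and let each column of \( \mathbf{w} \) vary smoothly over those locations, so that nearby locations get similar loadings.

    import numpy as np

    p, q = 100, 3
    locations = np.linspace(0.0, 1.0, p)        # coordinate i of X sits at locations[i]

    # Each factor j loads on the field through a slowly-varying function of location,
    # so w[i, j] is close to w[k, j] whenever locations i and k are close.
    w = np.column_stack([np.sin((j + 1) * np.pi * locations) for j in range(q)])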

The implied linear model here is often substantive nonsense (e.g., in meteorological data), but that's not the point. The point is instead to have a tractable but wrong model for low-dimensional summarization and (crude but speedy) prediction. All analysis here really ought to presume mis-specification, and be explicit about wanting the best low-dimensional, low-rank linear approximation, rather than purporting to care about the True Parameters.

Here is a crude, even stupid, way to estimate such a model. Take all the \( \vec{X}(t) \) vectors, and do principal components analysis on them. Discard all but the top \( q \) principal components, so that we estimate \( \mathbf{w} \) as the \( p\times q \) matrix whose columns are the top \( q \) eigenvectors of the empirical covariance matrix of the \( \vec{X}(t) \)'s. Estimate \( \vec{F}(t) \) by projecting each \( \vec{X}(t) \) onto those eigenvectors. Finally, estimate \( \mathbf{a} \) by regressing the estimate of \( \vec{F}(t+1) \) on the estimate of \( \vec{F}(t) \). I am very aware that this is, as I said, a crude approach. (For instance, it implicitly assumes that the directions of maximum variance for the observables correspond to the subspace traced out by the factors, \( \mathbf{w}\vec{F} \), but it's entirely possible for the noise variance in the observables to be larger than the variance contributed by the factors, in which case this will give nonsense.) One thing I'm curious about is how much more refined approaches improve over this crude approach, and what needs to be assumed for refinements to work.
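A bare-bones sketch of that procedure, in the same notation as the simulation above (the data are a \( T \times p \) array X with row \( t \) equal to \( \vec{X}(t) \)); the function name and details are mine:

    import numpy as np

    def crude_dynamic_factor_fit(X, q):
        """PCA-then-regression estimates of the loadings w, factors F, and transition a."""
        Xc = X - X.mean(axis=0)                    # center the observables
        cov = np.cov(Xc, rowvar=False)             # p x p empirical covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues come out in ascending order
        w_hat = eigvecs[:, ::-1][:, :q]            # top-q eigenvectors as columns (p x q)
        F_hat = Xc @ w_hat                         # project each X(t) onto them (T x q)
        # Least-squares regression of F(t+1) on F(t): F_hat[1:] ~ F_hat[:-1] @ B, with a = B^T
        B, *_ = np.linalg.lstsq(F_hat[:-1], F_hat[1:], rcond=None)
        a_hat = B.T
        return w_hat, F_hat, a_hat

    # w_hat, F_hat, a_hat = crude_dynamic_factor_fit(X, q=3)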

Things I need to understand better: how does this relate to the state-space reconstruction approach? E.g., if I took a random field generated from a dynamic factor model, and subjected it to time-delay embedding, would I recover the true number of latent factors \( q \) as the embedding dimension? Would this work in the limit where the noise terms \( \vec{\epsilon}(t) \) and \( \vec{\eta}(t) \) approached zero variance?
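I don't know the answers, but the obvious numerical experiment is cheap: take one coordinate of a simulated field, form its time-delay embedding matrix, and see how many singular values stand clear of the noise floor as the variances of \( \vec{\epsilon} \) and \( \vec{\eta} \) are shrunk. A sketch (the window length and lag are arbitrary choices of mine):

    import numpy as np

    def delay_embedding(x, m, lag=1):
        """Rows are the delay vectors (x[t-(m-1)*lag], ..., x[t-lag], x[t])."""
        start = (m - 1) * lag
        return np.asarray([x[t - start : t + 1 : lag] for t in range(start, len(x))])

    # x = X[:, 0]                                   # one coordinate of the simulated field
    # E = delay_embedding(x, m=10)
    # print(np.linalg.svd(E - E.mean(axis=0), compute_uv=False))  # how many are non-negligible?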

[*]: As usual with factor models, the rotation / factor indeterminacy problem is inescapable. If the data-generating process obeys the two equations I gave above, with random latent factor vectors \( \vec{F} \), then let \( \vec{G} = \mathbf{r}\vec{F} \), where \( \mathbf{r} \) is any orthogonal matrix. A little algebra shows that we get exactly the same distribution of observables over time from \( \vec{G} \) (and the factor loading matrix \( \mathbf{v} \equiv \mathbf{w}\mathbf{r}^{T} \)) as we did from \( \vec{F} \). Really, all we've done is changed our (arbitrary) coordinate system for the latent vectors. So, more exactly, what can happen in the limit \( p/q \rightarrow \infty \) is that the vectors \( \mathbf{w}\vec{F}(t) \), the expected observables, become identifiable from the actual observables \( \vec{X}(t) \).
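(Spelling that algebra out: since \( \mathbf{r} \) is orthogonal, \( \mathbf{r}^{T}\mathbf{r} = \mathbf{I} \), so \[ \mathbf{v}\vec{G}(t) + \vec{\epsilon}(t) = \mathbf{w}\mathbf{r}^{T}\mathbf{r}\vec{F}(t) + \vec{\epsilon}(t) = \mathbf{w}\vec{F}(t) + \vec{\epsilon}(t) = \vec{X}(t) \] and \( \vec{G} \) obeys a linear dynamic of the same form as \( \vec{F} \), namely \( \vec{G}(t+1) = \mathbf{r}\mathbf{a}\mathbf{r}^{T}\vec{G}(t) + \mathbf{r}\vec{\eta}(t+1) \).)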

