Notebooks
http://bactra.org/notebooks
Cosma's Notebooks
Factor Models for High-Dimensional Time Series and Spatio-Temporal Data
http://bactra.org/notebooks/2024/03/09#factor-models-for-time-and-space-time
<P>What it says.
<P>A little more elaborately, the basic <a href="factor-model.html">factor model</a> is that a \( p \)-dimensional vector of observables (or "manifest variables") is a linear function of a \( q \)-dimensional vector of latent factors, with \( q \ll p \), plus noise:
\[
\vec{X} = \mathbf{w}\vec{F} + \vec{\epsilon}
\]
where \( \mathbf{w} \) is some \( p\times q \) matrix of factor "loadings", and the
vector of noise \( \vec{\epsilon} \) has no correlation among its coordinates.
If we have observables evolving over time, the natural thing to do is to index
everything by time:
\[
\vec{X}(t) = \mathbf{w}\vec{F}(t) + \vec{\epsilon}(t)
\]
This won't be a complete model, however, until we specify how factor vectors at
different times are related to each other. So long as we're going for brutal
simplifications of delicate issues, that might as well be linear too:
\[
\vec{F}(t+1) = \mathbf{a}\vec{F}(t) + \vec{\eta}(t+1)
\]
This is, of course, a classic form of state-space or hidden Markov model, with
\( \vec{\eta} \) being a second noise vector, this one driving the factors. The
potentially interesting bit is that there are (by hypothesis) many more
observables than latent factors. This lets us
imagine (or approximate) regimes where \( p/q \rightarrow \infty \), in which we may
be able to recover the latent factors <em>exactly</em> [*] from the observables
(which isn't possible when \( p/q \) is fixed).
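<P>To make the two equations concrete, here is a minimal simulation sketch in Python with NumPy. Everything specific in it is an assumption made for illustration, not part of the model: the function name <code>simulate_dfm</code>, Gaussian noise throughout, random Gaussian loadings, a diagonal \( \mathbf{a} \) with entries 0.9 (so the factor process is stable), and the particular values of \( p \), \( q \) and the number of time steps.

```python
import numpy as np

def simulate_dfm(p=50, q=2, T=500, noise_sd=0.5, seed=0):
    """Simulate X(t) = w F(t) + eps(t), with F(t+1) = a F(t) + eta(t+1).

    Returns X (T x p), F (T x q), w (p x q) and a (q x q).
    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(p, q))      # factor loadings
    a = 0.9 * np.eye(q)              # stable transition matrix (assumed)
    F = np.zeros((T, q))
    F[0] = rng.normal(size=q)
    for t in range(T - 1):
        # eta(t+1): Gaussian noise with identity covariance (assumed)
        F[t + 1] = a @ F[t] + rng.normal(size=q)
    # eps(t): independent across coordinates, so no cross-correlation
    eps = noise_sd * rng.normal(size=(T, p))
    X = F @ w.T + eps                # row t is the observable vector X(t)
    return X, F, w, a

X, F, w, a = simulate_dfm()
```

Each row of <code>X</code> is one \( \vec{X}(t) \); raising <code>p</code> with <code>q</code> held fixed is a cheap way to poke at the \( p/q \rightarrow \infty \) regime.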
<P>If the dimensions of \( \vec{X}(t) \) are values of some field (or fields) at
various spatial locations, we have a model where the evolution of a
spatio-temporal random field is controlled by the evolution of a low (\(q \))
dimensional random vector. (It'd be natural then to imagine that \( w_{ij} \),
the loading of location \( i \) on factor \( j \), tends to be close to \( w_{kj} \)
when location \( k \) is spatially close to location \( i \); I haven't seen much use
made of this, but perhaps I've just not read the right papers yet.)
<P>The implied linear model here is often substantive nonsense
(e.g., in meteorological data), but that's not the point. The point is instead
to have a tractable <em>but wrong</em> model for low-dimensional summarization
and (crude but speedy) prediction. All analysis here really ought to be done
presuming mis-specification, and being explicit about wanting the best
low-dimensional, low-rank linear approximation, rather than purporting to care
about the True Parameters.
<P>Here is a crude, even stupid, way to estimate such a model. Take all the
\( \vec{X}(t) \) vectors, and do principal components analysis on them. Discard
all but the top \( q \) principal components, so that we estimate \( \mathbf{w} \) as
the \( p\times q \) matrix whose columns are the top \( q \) eigenvectors of the
empirical covariance matrix of the \( \vec{X}(t) \)'s. Estimate \( \vec{F}(t) \) by
projecting each \( \vec{X}(t) \) on to those eigenvectors. Finally, estimate
\( \mathbf{a} \) by regressing the estimate of \( \vec{F}(t+1) \) on the estimate of
\( \vec{F}(t) \). I am very aware that this is, as I said, a crude approach. (For
instance, it implicitly assumes that the directions of maximum variance for the
observables correspond to the subspace traced out by the factors,
\( \mathbf{w}\vec{F} \), but it's entirely possible for the noise variance in the
observables to be larger than the variance contributed by the factors, in which
case this will give nonsense.) One thing I'm curious about is <em>how
much</em> more refined approaches improve over this crude approach, and what
needs to be assumed for refinements to work.
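<P>The crude procedure just described can be sketched in a few lines of Python with NumPy. The function name <code>crude_dfm_fit</code> and the toy data (Gaussian noise, \( \mathbf{a} = 0.9 \mathbf{I} \), random loadings) are assumptions for illustration only; the estimator itself is just PCA followed by least squares, with all the limitations noted above.

```python
import numpy as np

def crude_dfm_fit(X, q):
    """PCA-then-regression estimate of a dynamic factor model.

    X is a T x p array whose rows are the observable vectors X(t).
    Returns w_hat (p x q), F_hat (T x q) and a_hat (q x q).
    """
    Xc = X - X.mean(axis=0)                   # center each coordinate
    cov = np.cov(Xc, rowvar=False)            # p x p empirical covariance
    evals, evecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    top = np.argsort(evals)[::-1][:q]         # indices of the q largest
    w_hat = evecs[:, top]                     # columns = top q eigenvectors
    F_hat = Xc @ w_hat                        # project each X(t) on to them
    # Least-squares regression of F_hat(t+1) on F_hat(t):
    # F_hat[1:] ~ F_hat[:-1] @ a^T, so transpose the solution.
    B, *_ = np.linalg.lstsq(F_hat[:-1], F_hat[1:], rcond=None)
    a_hat = B.T
    return w_hat, F_hat, a_hat

# Toy data from the model itself (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
p, q, T = 50, 2, 2000
w_true = rng.normal(size=(p, q))
a_true = 0.9 * np.eye(q)
F = np.zeros((T, q))
for t in range(T - 1):
    F[t + 1] = a_true @ F[t] + rng.normal(size=q)
X = F @ w_true.T + 0.5 * rng.normal(size=(T, p))

w_hat, F_hat, a_hat = crude_dfm_fit(X, q)
```

Because of the rotation problem [*], <code>w_hat</code> should only be compared to <code>w_true</code> through the subspaces their columns span, and <code>a_hat</code> to <code>a_true</code> through quantities like eigenvalues that survive a change of coordinates. In this toy setting the factor variance swamps the noise variance, which is exactly the favorable case; shrinking the loadings or inflating the noise is the way to watch the method break.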
<P>Things I need to understand better: how does this relate to the
<a href="state-space-reconstruction.html">state-space reconstruction</a>
approach? E.g., if I took a random field generated from a dynamic factor
model, and subjected it to time-delay embedding, would I recover the true
number of latent factors \( q \) as the embedding dimension? Would this work in
the limit where the noise terms \( \vec{\epsilon}(t) \) and \( \vec{\eta}(t) \)
approached zero variance?
<P>[*]: As usual with factor models, the rotation / factor indeterminacy
problem is inescapable. If the data-generating process obeys the two equations
I gave above, with random latent factor vectors \( \vec{F} \), then let \( \vec{G} =
\mathbf{r}\vec{F} \), where \( \mathbf{r} \) is any orthogonal matrix. A little
algebra shows that we get <em>exactly the same</em> distribution of observables
over time from \( \vec{G} \) (and the factor loading matrix \( \mathbf{v} \equiv
\mathbf{w}\mathbf{r}^{T} \)) as we did from \( \vec{F} \). Really, all we've done is
changed our (arbitrary) coordinate system for the latent vectors. So, more
exactly, what can happen in the limit \( p/q \rightarrow \infty \) is that the
vectors \( \mathbf{w}\vec{F}(t) \), the <em>expected observables</em>, become
identifiable from the actual observables \( \vec{X}(t) \).
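<P>For the record, the "little algebra" goes as follows; the symbols \( \mathbf{b} \) and \( \vec{\zeta} \) are labels introduced here, not notation from elsewhere. Since \( \mathbf{r} \) is orthogonal, \( \mathbf{r}^{T}\mathbf{r} = \mathbf{I} \), so
\[
\vec{X}(t) = \mathbf{w}\vec{F}(t) + \vec{\epsilon}(t) = \mathbf{w}\mathbf{r}^{T}\mathbf{r}\vec{F}(t) + \vec{\epsilon}(t) = \mathbf{v}\vec{G}(t) + \vec{\epsilon}(t)
\]
and
\[
\vec{G}(t+1) = \mathbf{r}\vec{F}(t+1) = \mathbf{r}\mathbf{a}\vec{F}(t) + \mathbf{r}\vec{\eta}(t+1) = \mathbf{r}\mathbf{a}\mathbf{r}^{T}\vec{G}(t) + \mathbf{r}\vec{\eta}(t+1)
\]
That is, \( \vec{G} \) obeys the same pair of equations, with loadings \( \mathbf{v} \), transition matrix \( \mathbf{b} \equiv \mathbf{r}\mathbf{a}\mathbf{r}^{T} \), and factor noise \( \vec{\zeta} \equiv \mathbf{r}\vec{\eta} \). The product \( \mathbf{v}\vec{G}(t) = \mathbf{w}\vec{F}(t) \) is unchanged, which is why it is the expected observables, and not the factors themselves, that can hope to be identified.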
<ul>See also:
<li><a href="factor-models.html">Factor Models</a> (where I explain my thinking about this class of models more broadly)
<li><a href="optimal-linear-prediction.html">Optimal Linear Prediction and Estimation</a>
<li><a href="spatio-temporal-statistics.html">Spatio-Temporal Statistics</a>
<li><a href="time-series.html">Time Series</a>
</ul>
<ul>Recommended:
<li>Carlos M. Carvalho and Mike West, "Dynamic matrix-variate graphical models", <a href="https://doi.org/10.1214/07-BA204"><cite>Bayesian Analysis</cite> <strong>2</strong> (2007): 69--97</a>
<li>Dani Gamerman, Hedibert Freitas Lopes, and Esther Salazar, "Spatial
dynamic factor
analysis", <a href="http://dx.doi.org/10.1214/08-BA329"><cite>Bayesian
Analysis</cite> <strong>3</strong> (2008): 759--792</a>
<li>James H. Stock and Mark W. Watson, "Dynamic Factor Models" [<a href="https://www.princeton.edu/~mwatson/papers/dfm_oup_4.pdf">PDF</a>]
</ul>
<ul>To read (with thanks to David Childers for references):
<li>Pierre Alquier and Nicolas Marie, "Matrix factorization for multivariate time series analysis", <a href="https://doi.org/10.1214/19-EJS1630"><cite>Electronic Journal of Statistics</cite> <strong>13</strong> (2019): 4346--4366</a>
<li>Jushan Bai and Serena Ng, "Determining the Number of Factors in Approximate Factor Models", <a href="https://doi.org/10.1111/1468-0262.00273"><cite>Econometrica</cite> <strong>70</strong> (2002): 191--221</a>
<li>Jushan Bai and Kunpeng Li, "Statistical analysis of factor models
of high dimension", <a href="http://dx.doi.org/10.1214/11-AOS966"><cite>Annals
of Statistics</cite>
<strong>40</strong> (2012): 436--465</a>
<li>Matteo Barigozzi, Haeran Cho, "Consistent estimation of high-dimensional factor models when the factor number is over-estimated", <a href="https://doi.org/10.1214/20-EJS1741"><cite>Electronic Journal of Statistics</cite> <strong>14</strong> (2020): 2892--2921</a>
<li>Matteo Barigozzi, Matteo Luciani
<ul>
<li>"Quasi Maximum Likelihood Estimation and Inference of Large Approximate Dynamic Factor Models via the EM algorithm", <a href="http://arxiv.org/abs/1910.03821">arxiv:1910.03821</a>
<li>"Quasi Maximum Likelihood Estimation of Non-Stationary Large Approximate Dynamic Factor Models", <a href="http://arxiv.org/abs/1910.09841">arxiv:1910.09841</a>
</ul>
<li>Ngai Hang Chan, Ye Lu, Chun Yip Yau, "Factor Modelling for High-Dimensional Time Series: Inference and Model Selection", <a href="https://doi.org/10.1111/jtsa.12207"><cite>Journal of Time Series Analysis</cite> <strong>38</strong> (2017): 285--307</a>
<li>Jinyuan Chang, Bin Guo, Qiwei Yao, "High dimensional stochastic regression with latent factors, endogeneity and nonlinearity", <a href="https://doi.org/10.1016/j.jeconom.2015.03.024"><cite>Journal of Econometrics</cite> <strong>189</strong> (2015): 297--312</a>, <a href="http://arxiv.org/abs/1310.1990">arxiv:1310.1990</a>
<li>Zhaoxing Gao, Ruey S. Tsay
<ul>
<li>"Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Factor Models with Diverging Eigenvalues",
<a href="http://arxiv.org/abs/1808.07932">arxiv:1808.07932</a>
<li>"A Structural-Factor Approach to Modeling High-Dimensional Time Series and Space-Time Data", <a href="https://doi.org/10.1111/jtsa.12466"><cite>Journal of Time Series Analysis</cite> <strong>40</strong> (2019): 343--362</a>, <a href="http://arxiv.org/abs/1808.06518">arxiv:1808.06518</a>
</ul>
<li>Clifford Lam and Qiwei Yao, "Factor modeling for high-dimensional time series: Inference for the number of factors", <a href="http://dx.doi.org/10.1214/12-AOS970"><cite>Annals of Statistics</cite> <strong>40</strong> (2012): 694--726</a>
<li>Emanuel Moench, Serena Ng, Simon Potter, "Dynamic Hierarchical Factor Models",
<a href="http://dx.doi.org/10.1162/REST_a_00359"><cite>Review of Economics
and Statistics</cite> <strong>95</strong> (2013): 1811--1817</a>
<li>James Stock and Mark Watson, "Forecasting Using Principal Components From a Large Number of Predictors", <a href="https://doi.org/10.1198/016214502388618960"><cite>Journal of the American Statistical Association</cite> <strong>97</strong> (2002): 1167--1179</a> [<a href="http://www.princeton.edu/~mwatson/papers/Stock_Watson_JASA_2002.pdf">PDF reprint via Prof. Watson</a>]
<li>Bo Zhang, Guangming Pan, Qiwei Yao, Wang Zhou, "Factor Modelling for Clustering High-dimensional Time Series", <a href="http://arxiv.org/abs/2101.01908">arxiv:2101.01908</a>
</ul>