Multiple or Vector Time Series
http://bactra.org/notebooks/2020/08/24#multiple-time-series
<P>Lots of general ideas of <a href="time-series.html">time series analysis</a>
don't care whether the time series is a scalar, a vector, or something
weirder. But there are situations where it's interesting to parse out
coordinates of a vector evolving over time, and consider their inter-relations,
so this notebook collects references for this problem.
<P>I am particularly interested in using <a href="factor-models-for-time-and-space-time.html">factor models</a> here, so they get their own notebook.
<P>Basic tricks: vector autoregression,
\[
\vec{X}(t+1) = \mathbf{a}\vec{X}(t) + \vec{\epsilon}(t+1)
\]
and similarly vector moving average or vector ARMA; vectorized additive
models,
\[
X_i(t+1) = \sum_{j=1}^{k}{f_{ij}(X_j(t))} + \epsilon_i(t+1)
\]
etc., etc.; all these things involve very little change from scalar time series
analysis. These can be practically useful, in lieu of actual scientific
models, but don't interest me very much.
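<P>To make the first of these tricks concrete, here is a minimal sketch (in Python with numpy; the coefficient matrix and the simulated data are illustrative assumptions, not anything from a real application) of estimating the VAR(1) above by ordinary least squares, regressing $\vec{X}(t+1)$ on $\vec{X}(t)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a stable VAR(1): X(t+1) = A X(t) + eps(t+1)
# (A is an illustrative coefficient matrix, chosen so both eigenvalues
# lie inside the unit circle)
A = np.array([[0.5, 0.2],
              [-0.1, 0.3]])
T, k = 2000, 2
X = np.zeros((T, k))
for t in range(T - 1):
    X[t + 1] = A @ X[t] + rng.normal(size=k)

# Least-squares estimate: regress X(t+1) on X(t); lstsq solves
# Z B = Y for B = A^T, so we transpose back
Y, Z = X[1:], X[:-1]
A_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T

print(np.round(A_hat, 2))
```

Each coordinate's equation is just a linear regression on the lagged vector, which is exactly the sense in which these models involve very little change from the scalar case.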
<P>(What econometricians call "structural" vector autoregressions strike me
as simple VARs, with somewhat arbitrary restrictions imposed on the parameters,
usually with only a tenuous, I might even say "metaphorical" or "mythological",
connection to an actual economic theory. I am open to being persuaded that
this is unfair.)
<h4>Panel or Longitudinal Data</h4>
<P>If you have $k$ independent realizations of the <em>same</em> stochastic process, then your over-all log-likelihood is
\[
\sum_{i=1}^{k}{L(X^{(i)};\theta)}
\]
where $L(X;\theta)$ is the log-likelihood function for one trajectory $X$ at
parameters $\theta$. Likelihood maximization thus works pretty much as it does
for one trajectory. (IIRC, this observation goes back at least to Bartlett's
book on stochastic processes from the 1950s.) There are a few wrinkles to
this situation which are perhaps worth some comment:
<ol>
<li> <em>Identically-distributed but not independent time series</em>. Say you
believe that all $k$ series have the same marginal distribution, but that
they're dependent. Summing their marginal log-likelihoods will thus not give
you the correct over-all log-likelihood, but only a pseudo-likelihood. It
will, however, generally give you a consistent estimator of that marginal
distribution. Ignoring the dependence will manifest as a lack of statistical
efficiency, which might be more than compensated for by gains
to <em>computational</em> efficiency.
<li> <em>Many short time series</em>. Ordinarily, when considering one long
time series, we analyze inference in the limit where the trajectory's duration
$T \rightarrow\infty$. With multiple time series, each can in principle have
its own duration \( T_i \), and so there's a limit where $k \rightarrow \infty$
but $\max_{i}{T_i} = T$ is fixed. This may or may not be a problem. If we
were dealing with stationary Markov processes, for instance, and $T \geq 2$,
$k\rightarrow\infty$ would be enough to get convergence. It would however be
generally inadequate to learn a second (or higher) order Markov chain. (You
can't, in general, identify the effects of what happened two time steps ago if
you never <em>know</em> what happened two time steps ago.) Similarly, if the
process is $m$-dependent (i.e., $X(t)$ can depend on $X(t-m)$ but is
independent of $X(t-m-1)$ and earlier), then we need $T \geq m+1$, so that we
actually observe pairs of time points $m$ steps apart. In the general case of
a non-Markov, non-$m$-dependent process, I think we need
$T \rightarrow \infty$, but I don't have a knock-down proof.
<li> <em>Non-likelihood inference</em>. If we're minimizing a loss function
other than negative log probability, the same logic holds: calculate separately
across trajectories, and add up across trajectories. (Thus <a href="indirect-inference.html">indirect inference</a> using multiple observed trajectories is
pretty straightforward.) A delicate question is what to do if the loss
function is a time average (e.g., mean-squared error per unit time), and the
time series are of different lengths. My intuition is that we'd generally
want to add all the raw losses <em>first</em>, and then divide by the total
length of time, on the grounds that we have twice as much information from a time series that
is twice as long. (This is implicit in just summing the log-likelihoods.) If,
however, the duration of each realization is going to infinity, then (assuming
<a href="ergodic-theory.html">ergodicity</a>) we should get the same limit if we take the within-realization time average first, and then average cross-sectionally, across realizations. (See (2) above.)
</ol>
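<P>To make the many-short-series limit concrete, here is a small sketch (Python with numpy, simulated data; the AR coefficient is an illustrative assumption) of pooling $k$ independent short AR(1) trajectories: stack the one-step transition pairs from every trajectory and minimize the summed squared error, exactly as summing the per-trajectory (conditional) log-likelihoods suggests.

```python
import numpy as np

rng = np.random.default_rng(2)

# Many short trajectories, fixed length: k independent AR(1) runs,
# X(t+1) = a X(t) + eps(t+1), each observed for only T = 3 steps.
# (a_true is an illustrative value; trajectories start at 0 for simplicity.)
a_true = 0.6
k, T = 5000, 3
X = np.zeros((k, T))
for t in range(T - 1):
    X[:, t + 1] = a_true * X[:, t] + rng.normal(size=k)

# Pool across realizations: stack all (X(t), X(t+1)) pairs from every
# trajectory and use the conditional-least-squares estimator, exactly as
# if the pairs had come from one long series.
x_now = X[:, :-1].ravel()
x_next = X[:, 1:].ravel()
a_hat = (x_now @ x_next) / (x_now @ x_now)

print(round(a_hat, 2))
```

With $T$ fixed at 3, the estimate still converges as $k \rightarrow \infty$, per wrinkle (2) above; a second-order chain would, by the same token, not be identifiable from such short histories.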
<P>As the above notes make clear, I tend to regard "panel" or "longitudinal"
data analysis as a very straightforward extension of ordinary time series
analysis, to the point where I'm not sure why it's regarded as a separate
subject. I dare say this just means I don't understand it very well yet.
<P>See also:
<a href="graphical-models.html">Graphical Models</a>;
<a href="synchronization.html">Synchronization</a>
<ul>Recommended:
<li>Tianjiao Chu and Clark Glymour, "Search for Additive Nonlinear Time Series Causal Models", <a href="http://jmlr.csail.mit.edu/papers/v9/chu08a.html"><cite>Journal of Machine Learning Research</cite> <strong>9</strong> (2008): 967--991</a>
<li>Peter J. Diggle, Kung-Yee Liang and Scott L. Zeger, <cite><a href="http://bactra.org/weblog/algae-2020-04.html#aold">Analysis of Longitudinal Data</a></cite> [Exclusively concerned with the many-independent-time-series setting]
<li>Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "Joint Modeling of Multiple Related Time Series via the Beta Process", <a href="http://arxiv.org/abs/1111.4226">arxiv:1111.4226</a>
<li>Emily B. Fox, Mike West, "Autoregressive Models for Variance Matrices: Stationary Inverse Wishart Processes", <a href="http://arxiv.org/abs/1107.5239">arxiv:1107.5239</a>
<li>Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", <a href="http://jmlr.csail.mit.edu/papers/v11/hyvarinen10a.html"><cite>Journal of Machine Learning Research</cite> <strong>11</strong>
(2010): 1709--1731</a>
</ul>
<ul>To read:
<li>Shun-ichi Amari, "Estimating Functions of Independent Component
Analysis for Temporally Correlated Signals," <cite>Neural Computation</cite>
<strong>12</strong> (2000): 2083--2107
<li>Sumanta Basu, George Michailidis, "Estimation in High-dimensional Vector Autoregressive Models", <a href="http://arxiv.org/abs/1311.4175">arxiv:1311.4175</a>
<li>Nathaniel Beck and Jonathan N. Katz
<ul>
<li>"What to Do (and Not to Do) with Time-Series Cross-Section Data", <cite>American Political Science Review</cite> <strong>89</strong>
(1995): 634--647 [<a href="http://www.jstor.org/pss/2082979">JSTOR</a>]
<li>Commentary by the
authors, <a href="http://dx.doi.org/10.1017/S0003055406292566"><cite>American Political Science
Review</cite> <strong>100</strong> (2006): 676--677</a>
</ul>
<li>Jose Bento, Morteza Ibrahimi and Andrea Montanari, "Learning Networks of Stochastic Differential
Equations", <a href="http://books.nips.cc/papers/files/nips23/NIPS2010_0715.pdf">NIPS
23 (2010)</a>,
<a href="http://arxiv.org/abs/1011.0415">arxiv:1011.0415</a>
<li>Alain Berlinet and Gérard Biau, "Minimax Bounds in
Nonparametric Estimation of Multidimensional Deterministic Dynamical
Systems", <a href="http://dx.doi.org/10.1023/A:1012225204854"><cite>Statistical Inference for Stochastic Processes</cite>
<strong>4</strong> (2001): 229--248</a> ["We consider the problem of estimating
a multidimensional discrete deterministic dynamical system from the first n+1
observations. We exhibit the optimal rate function ... the near neighbor
estimator achieves this optimal rate.... optimal rate function is defined from
multidimensional spacings which are edge lengths of simplices associated with a
triangulation of the Voronoi cells built from the observations." Sounds very
cool!]
<li>Alain Berlinet and Christian Francq, "On the Identifiability
of minimal VARMA representations", <cite>Statistical Inference for Stochastic
Processes</cite> <strong>1</strong> (1998): 1--15
<li>Miles Crosskey, Mauro Maggioni, "ATLAS: A geometric approach to learning high-dimensional stochastic systems near manifolds", <a href="http://arxiv.org/abs/1404.0667">arxiv:1404.0667</a>
<li>Richard A. Davis, Pengfei Zang, Tian Zheng, "Sparse Vector Autoregressive Modeling", <a href="http://arxiv.org/abs/1207.0520">arxiv:1207.0520</a>
<li>Michael Eichler
<ul>
<li>"Fitting Graphical Interaction Models to Multivariate Time Series", UAI 2006, <a href="http://arxiv.org/abs/1206.6839">arxiv:1206.6839</a>
<li>"Graphical modelling of multivariate time
series", <a href="http://arxiv.org/abs/math.ST/0610654">math.ST/0610654</a>
</ul>
<li>Roland Fried and Vanessa Didelez, "Latent variable analysis and
partial correlation graphs for multivariate time series", <a
href="http://dx.doi.org/10.1016/j.spl.2005.04.002"><cite>Statistics and
Probability Letters</cite> <strong>73</strong> (2005): 287--296</a>
<li>Stefan Haufe, Guido Nolte, Klaus-Robert Mueller and Nicole Kraemer,
"Sparse Causal Discovery in Multivariate Time
Series", <a href="http://arxiv.org/abs/0901.1234">arxiv:0901.1234</a> [I am not
altogether happy with defining "causes" as "has a non-zero coefficient in a
vector autoregression"...]
<li>Daniel M. Keenan, Xin Wang, Steven M. Pincus and Johannes D. Veldhuis, "Modelling the nonlinear time dynamics of multidimensional hormonal systems",
<a href="http://dx.doi.org/10.1111/j.1467-9892.2012.00795.x"><cite>Journal of Time Series Analysis</cite> <strong>33</strong> (2012): 779--796</a>
<li>Lutz Kilian and Helmut Lutkepohl, <cite><a href="http://cambridge.org/9781107196575">Structural Vector Autoregressive Analysis</a></cite>
<li>Helmut Lutkepohl, <cite><a href="https://doi.org/10.1007/978-3-540-27752-1">New Introduction to Multiple Time Series Analysis</a></cite>
<li>Norbert Marwan and Jurgen Kurths, "Nonlinear analysis of bivariate
data with cross recurrence plots," <a
href="http://arxiv.org/abs/physics/0201061">physics/0201061</a>
<li>Norbert Marwan, M. Thiel, N. R. Nowaczyk, "Cross Recurrence Plot
Based Synchronization of Time Series," <a
href="http://arxiv.org/abs/physics/0201062">physics/0201062</a>
<li>Norbert Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, J. Kurths,
"Recurrence Plot Based Measures of Complexity and its Application to Heart
Rate Variability Data," <a
href="http://arxiv.org/abs/physics/0201064">physics/0201064</a>
<li>Tomomichi Nakamura, Yoshito Hirata, and Michael Small, "Testing for
correlation structures in short-term variabilities with long-term trends of
multivariate time
series", <a href="http://dx.doi.org/10.1103/PhysRevE.74.041114"><cite>Physical
Review E</cite> <strong>74</strong> (2006): 041114</a>
<li>Yuval Nardi, Alessandro Rinaldo, "Autoregressive Process Modeling via the Lasso Procedure", <a href="http://arxiv.org/abs/0805.1179">arxiv:0805.1179</a>
<li>Sahand Negahban, Pradeep Ravikumar, Martin J. Wainwright, Bin Yu, "A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers", <a href="http://arxiv.org/abs/1010.2731">arxiv:1010.2731</a>
<li>Gregory C. Reinsel, <cite>Elements of Multivariate Time Series Analysis</cite> [Vector ARMA will continue until morale improves]
<li>Suchi Saria, Daphne Koller, Anna Penn, "Discovering shared and individual latent structure in multiple time series", <a href="http://arxiv.org/abs/1008.2028">arxiv:1008.2028</a>
<li>Przemyslaw Sliwa and Wolfgang Schmid, "Monitoring the
cross-covariances of a multivariate time series", <a
href="http://dx.doi.org/10.1007/s001840400326"><cite>Metrika</cite>
<strong>61</strong> (2005): 89--115</a>
<li>Song Song and Peter J. Bickel, "Large Vector Auto Regressions",
<a href="http://arxiv.org/abs/1106.3915">arxiv:1106.3915</a>
<li>Granville Tunnicliffe Wilson, Marco Reale and John Haywood, <cite>Models for Dependent Time Series</cite>
</ul>