Multiple or Vector Time Series
24 Aug 2020 18:37
Lots of general ideas of time series analysis don't care about whether the time series is a scalar, a vector, or something weirder. But there are situations where it's interesting to parse out coordinates of a vector evolving over time, and consider their inter-relations, so this notebook collects references for this problem.
I am particularly interested in using factor models here, so they get their own notebook.
Basic tricks: vector autoregression, \[ \vec{X}(t+1) = \mathbf{a}\vec{X}(t) + \vec{\epsilon}(t+1) \] and similarly vector moving average or vector ARMA; vectorized additive models, \[ X_i(t+1) = \sum_{j=1}^{k}{f_{ij}(X_j(t))} + \epsilon_i(t+1) \] etc., etc.; all these things involve very little change from scalar time series analysis. These can be practically useful, in lieu of actual scientific models, but don't interest me very much.
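As a concrete illustration of the first of these tricks (the example, the coefficient matrix, and all numbers are my own invention, not from any data set): the VAR(1) coefficient matrix $\mathbf{a}$ can be estimated by ordinary least squares, regressing $\vec{X}(t+1)$ on $\vec{X}(t)$, one regression per coordinate — which is exactly the "very little change from scalar analysis" point.

```python
import numpy as np

# Sketch: least-squares estimation of the VAR(1) coefficient matrix,
# X(t+1) = a X(t) + noise.  A_true and all dimensions are invented.
rng = np.random.default_rng(0)
d, T = 3, 2000
A_true = np.array([[0.5, 0.1, 0.0],
                   [0.0, 0.4, 0.2],
                   [0.1, 0.0, 0.3]])  # stable: all eigenvalues inside unit circle
X = np.zeros((T, d))
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 0.1 * rng.standard_normal(d)

# Regress X(t+1) on X(t): solve X_past @ B = X_next in least squares,
# so B = A^T and each column of B is one coordinate's regression.
X_past, X_next = X[:-1], X[1:]
A_hat = np.linalg.lstsq(X_past, X_next, rcond=None)[0].T
```

With $T = 2000$ and small noise, `A_hat` should be close to `A_true`; the same trick (one regression per coordinate) extends to VAR($p$) by stacking lagged copies of the series into the regressor matrix.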
(What econometricians call "structural" vector autoregressions strike me as simple VARs, with somewhat arbitrary restrictions imposed on the parameters, usually with only a tenuous, I might even say "metaphorical" or "mythological", connection to an actual economic theory. I am open to being persuaded that this is unfair.)
Panel or Longitudinal Data
If you have $k$ independent realizations of the same stochastic process, then your over-all log-likelihood is \[ \sum_{i=1}^{k}{L(X^{(i)};\theta)} \] where $L(X;\theta)$ is the log-likelihood function for one trajectory $X$ at parameters $\theta$. Likelihood maximization thus works pretty much as it does for one trajectory. (IIRC, this observation goes back at least to Bartlett's book on stochastic processes from the 1950s.) There are a few wrinkles to this situation which are perhaps worth some comment:
- Identically-distributed but not independent time series. Say you believe that all $k$ series have the same marginal distribution, but that they're dependent. Summing their marginal log-likelihoods will thus not give you the correct over-all log-likelihood, but only a pseudo-likelihood. It will, however, generally give you a consistent estimator of that marginal distribution. Ignoring the dependence will manifest as a lack of statistical efficiency, which might be more than compensated for by gains to computational efficiency.
- Many short time series. Ordinarily, when considering one long time series, we analyze inference in the limit where the trajectory's duration $T \rightarrow\infty$. With multiple time series, each can in principle have its own duration \( T_i \), and so there's a limit where $k \rightarrow \infty$ but $\max_{i}{T_i} = T$ is fixed. This may or may not be a problem. If we were dealing with stationary Markov processes, for instance, and $T \geq 2$, $k\rightarrow\infty$ would be enough to get convergence. It would however be generally inadequate to learn a second (or higher) order Markov chain. (You can't, in general, identify the effects of what happened two time steps ago if you never know what happened two time steps ago.) Similarly if the process is $m$-dependent (i.e., $X(t)$ can depend on $X(t-m)$ but is independent of $X(t-m-1)$ and earlier), then we need $T \geq m$. In the general case of a non-Markov, non-$m$-dependent process, I think we need $T \rightarrow \infty$, but I don't have a knock-down proof.
- Non-likelihood inference. If we're minimizing a loss function other than negative log probability, the same logic holds: calculate separately for each trajectory, and add up across trajectories. (Thus indirect inference using multiple observed trajectories is pretty straightforward.) A delicate question is what to do if the loss function is a time average (e.g., mean-squared error per unit time), and the time series are of different lengths. My intuition is that we'd generally want to add all the raw losses first, and then divide by the total length of time, on the grounds that we have twice as much information from a time series that is twice as long. (This is implicit in just summing the log-likelihoods.) If, however, the duration of each realization is going to infinity, then (assuming ergodicity) we should get the same limit if we take the within-realization time average first, and then average cross-sectionally, across realizations. (See the point on many short time series above.)
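A minimal numerical sketch of the likelihood-summing observation above, for a Gaussian AR(1) (the model, the true coefficient, and all numbers are invented for illustration): the over-all log-likelihood of $k$ independent trajectories is the sum of the per-trajectory log-likelihoods, and maximizing that sum recovers the common parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

def loglik(x, phi, sigma=1.0):
    # Conditional Gaussian log-likelihood of one AR(1) trajectory,
    # conditioning on the first observation.
    resid = x[1:] - phi * x[:-1]
    n = len(resid)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2 \
           - 0.5 * n * np.log(2 * np.pi * sigma ** 2)

# k independent realizations of the same AR(1) process
phi_true, k, T = 0.6, 50, 100
trajs = []
for _ in range(k):
    x = np.zeros(T)
    for t in range(T - 1):
        x[t + 1] = phi_true * x[t] + rng.standard_normal()
    trajs.append(x)

# Total log-likelihood = sum over trajectories; maximize over a grid
grid = np.linspace(0.0, 1.0, 201)
total = [sum(loglik(x, p) for x in trajs) for p in grid]
phi_hat = grid[int(np.argmax(total))]
```

The grid search is just for transparency; for this model the maximizer has a closed form (pooled least squares over all transitions, in all trajectories).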
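To illustrate the many-short-series regime (again with invented specifics): the transition matrix of a stationary first-order Markov chain can be recovered by pooling transition counts across many trajectories of fixed length $T = 3$, even though no single trajectory is informative on its own — this is the $k \rightarrow \infty$, $T$ fixed limit.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented 2-state chain; its stationary distribution is (0.6, 0.4).
P_true = np.array([[0.8, 0.2],
                   [0.3, 0.7]])
pi = np.array([0.6, 0.4])

k, T = 5000, 3  # many trajectories, each very short
counts = np.zeros((2, 2))
for _ in range(k):
    s = rng.choice(2, p=pi)          # start from stationarity
    for _ in range(T - 1):
        s_next = rng.choice(2, p=P_true[s])
        counts[s, s_next] += 1       # pool transitions across trajectories
        s = s_next

# Maximum-likelihood estimate: row-normalized pooled counts
P_hat = counts / counts.sum(axis=1, keepdims=True)
```

With $T = 3$ a second-order chain would not be identifiable from these data, exactly as the note says: each trajectory contains only two transitions, and never shows what happened two steps before the last observation together with enough follow-up.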
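The two aggregation schemes in the last point genuinely differ when the series have different lengths; a toy check with invented per-step losses:

```python
import numpy as np

# Per-step squared errors for two series of different lengths (invented)
losses = [np.array([1.0, 1.0, 1.0, 1.0]),   # length 4
          np.array([3.0, 3.0])]             # length 2

# Scheme 1: sum all raw losses, divide by total time
# (weights the longer series more, as summing log-likelihoods would)
pooled = sum(l.sum() for l in losses) / sum(len(l) for l in losses)

# Scheme 2: within-series time average first, then average across series
cross = np.mean([l.mean() for l in losses])

print(pooled, cross)  # pooled = 10/6 = 1.666..., cross = 2.0
```

As each series' duration grows, ergodicity makes every within-series average converge to the same expectation, so the two schemes agree in that limit — but not for fixed, unequal lengths, as here.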
As the above notes make clear, I tend to regard "panel" or "longitudinal" data analysis as a very straightforward extension of ordinary time series analysis, to the point where I'm not sure why it's regarded as a separate subject. I dare say this just means I don't understand it very well yet.
See also: Graphical Models; Synchronization
- Recommended:
- Tianjiao Chu and Clark Glymour, "Search for Additive Nonlinear Time Series Causal Models", Journal of Machine Learning Research 9 (2008): 967--991
- Peter J. Diggle, Kung-Yee Liang and Scott L. Zeger, Analysis of Longitudinal Data [Exclusively concerned with the many-independent-time-series setting]
- Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, Alan S. Willsky, "Joint Modeling of Multiple Related Time Series via the Beta Process", arxiv:1111.4226
- Emily B. Fox, Mike West, "Autoregressive Models for Variance Matrices: Stationary Inverse Wishart Processes", arxiv:1107.5239
- Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
- To read:
- Shun-ichi Amari, "Estimating Functions of Independent Component Analysis for Temporally Correlated Signals," Neural Computation 12 (2000): 2083--2107
- Sumanta Basu, George Michailidis, "Estimation in High-dimensional Vector Autoregressive Models", arxiv:1311.4175
- Nathaniel Beck and Jonathan N. Katz
- "What to Do (and Not to Do) with Time-Series Cross-Section Data", American Political Science Review 89 (1995): 634--647 [JSTOR]
- Commentary by the authors, American Political Science Review 100 (2006): 676--677
- Jose Bento, Morteza Ibrahimi and Andrea Montanari, "Learning Networks of Stochastic Differential Equations", NIPS 23 (2010), arxiv:1011.0415
- Alain Berlinet and Gérard Biau, "Minimax Bounds in Nonparametric Estimation of Multidimensional Deterministic Dynamical Systems", Statistical Inference for Stochastic Processes 4 (2001): 229--248 ["We consider the problem of estimating a multidimensional discrete deterministic dynamical system from the first n+1 observations. We exhibit the optimal rate function ... the nearest neighbor estimator achieves this optimal rate.... optimal rate function is defined from multidimensional spacings which are edge lengths of simplices associated with a triangulation of the Voronoi cells built from the observations." Sounds very cool!]
- Alain Berlinet and Christian Francq, "On the Identifiability of minimal VARMA representations", Statistical Inference for Stochastic Processes 1 (1998): 1--15
- Miles Crosskey, Mauro Maggioni, "ATLAS: A geometric approach to learning high-dimensional stochastic systems near manifolds", arxiv:1404.0667
- Richard A. Davis, Pengfei Zang, Tian Zheng, "Sparse Vector Autoregressive Modeling", arxiv:1207.0520
- Michael Eichler
- "Fitting Graphical Interaction Models to Multivariate Time Series", UAI 2006, arxiv:1206.6839
- "Graphical modelling of multivariate time series", math.ST/0610654
- Roland Fried and Vanessa Didelez, "Latent variable analysis and partial correlation graphs for multivariate time series", Statistics and Probability Letters 73 (2005): 287--296
- Stefan Haufe, Guido Nolte, Klaus-Robert Mueller and Nicole Kraemer, "Sparse Causal Discovery in Multivariate Time Series", arxiv:0901.1234 [I am not altogether happy with defining "causes" as "has a non-zero coefficient in a vector autoregression"...]
- Daniel M. Keenan, Xin Wang, Steven M. Pincus and Johannes D. Veldhuis, "Modelling the nonlinear time dynamics of multidimensional hormonal systems", Journal of Time Series Analysis 33 (2012): 779--796
- Lutz Kilian and Helmut Lutkepohl, Structural Vector Autoregressive Analysis
- Helmut Lutkepohl, New Introduction to Multiple Time Series Analysis
- Norbert Marwan and Jurgen Kurths, "Nonlinear analysis of bivariate data with cross recurrence plots," physics/0201061
- Norbert Marwan, M. Thiel, N. R. Nowaczyk, "Cross Recurrence Plot Based Synchronization of Time Series," physics/0201062
- Norbert Marwan, N. Wessel, U. Meyerfeldt, A. Schirdewan, J. Kurths, "Recurrence Plot Based Measures of Complexity and its Application to Heart Rate Variability Data," physics/0201064
- Tomomichi Nakamura, Yoshito Hirata, and Michael Small, "Testing for correlation structures in short-term variabilities with long-term trends of multivariate time series", Physical Review E 74 (2006): 041114
- Yuval Nardi, Alessandro Rinaldo, "Autoregressive Process Modeling via the Lasso Procedure", arxiv:0805.1179
- Sahand Negahban, Pradeep Ravikumar, Martin J. Wainwright, Bin Yu, "A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers", arxiv:1010.2731
- Gregory C. Reinsel, Elements of Multivariate Time Series Analysis [Vector ARMA will continue until morale improves]
- Suchi Saria, Daphne Koller, Anna Penn, "Discovering shared and individual latent structure in multiple time series", arxiv:1008.2028
- Przemyslaw Sliwa and Wolfgang Schmid, "Monitoring the cross-covariances of a multivariate time series", Metrika 61 (2005): 89--115
- Song Song and Peter J. Bickel, "Large Vector Auto Regressions", arxiv:1106.3915
- Granville Tunnicliffe Wilson, Marco Reale and John Haywood, Models for Dependent Time Series