
Multiple or Vector Time Series

17 Dec 2023 19:53

Lots of general ideas of time series analysis don't care about whether the time series is a scalar, a vector, or something weirder. But there are situations where it's interesting to parse out coordinates of a vector evolving over time, and consider their inter-relations, so this notebook collects references for this problem.

I am particularly interested in using factor models here, so they get their own notebook.

Basic tricks: vector autoregression, \[ \vec{X}(t+1) = \mathbf{a}\vec{X}(t) + \vec{\epsilon}(t+1) \] and similarly vector moving average or vector ARMA; vectorized additive models, \[ X_i(t+1) = \sum_{j=1}^{k}{f_{ij}(X_j(t))} + \epsilon_i(t+1) \] etc., etc.; all these things involve very little change from scalar time series analysis. These can be practically useful, in lieu of actual scientific models, but don't interest me very much as such.
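To make the VAR recipe concrete, here is a minimal sketch (in Python with numpy, my choice of language here) of fitting a first-order VAR by ordinary least squares; the function name, the omission of an intercept, and the simulated example are all just illustrative assumptions, not anything taken from a particular reference.

    import numpy as np

    def fit_var1(X):
        """Least-squares fit of a first-order vector autoregression.
        X is a (T, d) array: T time points of a d-dimensional series.
        Returns the (d, d) coefficient matrix and the residuals from
        regressing X(t+1) on X(t) (no intercept, for simplicity)."""
        past, future = X[:-1], X[1:]              # X(t) and X(t+1), aligned
        coef_T, *_ = np.linalg.lstsq(past, future, rcond=None)
        A = coef_T.T                              # row i gives coordinate i's regression on X(t)
        residuals = future - past @ A.T
        return A, residuals

    # Illustrative use on simulated data (the "true" coefficients are made up)
    rng = np.random.default_rng(0)
    A_true = np.array([[0.5, 0.2], [-0.1, 0.3]])
    X = np.zeros((500, 2))
    for t in range(499):
        X[t + 1] = A_true @ X[t] + rng.normal(scale=0.1, size=2)
    A_hat, _ = fit_var1(X)                        # should be close to A_true

Exactly the same regression-by-least-squares move handles higher lag orders, by stacking lagged copies of the series into the design matrix.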

(What econometricians call "structural" vector autoregressions strike me as simple VARs, with somewhat arbitrary restrictions imposed on the parameters, usually with only a tenuous, I might even say "metaphorical" or "mythological", connection to an actual economic theory. I am open to being persuaded that this is unfair.)

Panel or Longitudinal Data

If you have \( k \) independent realizations of the same stochastic process, then your over-all log-likelihood is \[ \sum_{i=1}^{k}{L(X^{(i)};\theta)} \] where \( L(X;\theta) \) is the log-likelihood function for one trajectory \( X \) at parameters \( \theta \). Likelihood maximization thus works pretty much as it does for one trajectory. (IIRC, this observation goes back at least to Bartlett's book on stochastic processes from the 1950s.) There are a few wrinkles to this situation which are perhaps worth some comment:

  1. Identically-distributed but not independent time series. Say you believe that all \( k \) series have the same marginal distribution, but that they're dependent. Summing their marginal log-likelihoods will thus not give you the correct over-all log-likelihood, but only a pseudo-likelihood. It will, however, generally give you a consistent estimator of that marginal distribution. Ignoring the dependence will manifest as a lack of statistical efficiency, which might be more than compensated for by gains to computational efficiency.
  2. Many short time series. Ordinarily, when considering one long time series, we analyze inference in the limit where the trajectory's duration \( T \rightarrow\infty \). With multiple time series, each can in principle have its own duration \( T_i \), and so there's a limit where \( k \rightarrow \infty \) but \( \max_{i}{T_i} = T \) is fixed. This may or may not be a problem. If we were dealing with stationary Markov processes, for instance, and \( T \geq 2 \), \( k\rightarrow\infty \) would be enough to get convergence. It would however be generally inadequate to learn a second (or higher) order Markov chain. (You can't, in general, identify the implications of what happened two time steps ago if you never know what happened two time steps ago.) Similarly if the process is \( m \)-dependent (i.e., \( X(t) \) can depend on \( X(t-m) \) but is independent of \( X(t-m-1) \) and earlier), then we need \( T \geq m \). In the general case of a non-Markov, non-\( m \)-dependent process, I think we need \( T \rightarrow \infty \) in general, but I don't have a knock-down proof.
  3. Non-likelihood inference. If we're minimizing a loss function other than negative log probability, the same logic holds: calculate the loss separately for each trajectory, and add up across trajectories. (Thus indirect inference using multiple observed trajectories is pretty straightforward.) A delicate question is what to do if the loss function is a time average (e.g., mean-squared error per unit time), and the time series are of different lengths. My intuition is that we'd generally want to add all the raw losses first, and then divide by the total length of time, on the grounds that we have twice as much information from a time series that is twice as long. (This is implicit in just summing the log-likelihoods.) If, however, the duration of each realization is going to infinity, then (assuming ergodicity) we should get the same limit if we take the within-realization time average first, and then average cross-sectionally, across realizations. (See (2) above.) There is a small code sketch of this pooled set-up just after this list.
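To illustrate the pooled-likelihood logic, and the summing-versus-averaging point in (3), here is a minimal sketch for a Gaussian AR(1), using numpy and scipy; the model, the parameterization, and the simulated panel of short series are illustrative assumptions of mine, not anything canonical.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    def ar1_loglik(traj, phi, sigma):
        """Conditional log-likelihood of one trajectory under a Gaussian AR(1),
        X(t+1) = phi*X(t) + Gaussian(0, sigma^2) noise, given the first value."""
        innovations = traj[1:] - phi * traj[:-1]
        return norm.logpdf(innovations, scale=sigma).sum()

    def pooled_negloglik(params, trajectories):
        """Negative of the log-likelihoods summed across independent trajectories.
        Summing raw log-likelihoods (rather than averaging within each series first)
        automatically gives longer series more weight, as in (3) above."""
        phi, log_sigma = params
        sigma = np.exp(log_sigma)                 # keep sigma positive
        return -sum(ar1_loglik(traj, phi, sigma) for traj in trajectories)

    # Illustrative use: many short, independent realizations of the same process
    rng = np.random.default_rng(1)
    def simulate(T, phi=0.6, sigma=0.5):
        x = np.zeros(T)
        for t in range(T - 1):
            x[t + 1] = phi * x[t] + rng.normal(scale=sigma)
        return x

    trajectories = [simulate(T) for T in rng.integers(5, 30, size=200)]
    fit = minimize(pooled_negloglik, x0=np.array([0.0, 0.0]), args=(trajectories,))
    phi_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

If the objective were a per-unit-time loss rather than a likelihood, the analogous move would be to sum the raw losses across all trajectories and divide once by the total number of transitions, rather than taking the time average within each realization first.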

As the above notes make clear, I tend to regard "panel" or "longitudinal" data analysis as a very straightforward extension of ordinary time series analysis, to the point where I'm not sure why it's regarded as a separate subject. I dare say this just means I don't understand it very well yet.

See also: Graphical Models; Synchronization
