Koopman Operators for Modeling Dynamical Systems and Time Series

24 Feb 2024 17:05

Start with your favorite deterministic dynamical system, say (in discrete time to make things easy) \( x_{t+1} = f(x_t) \), where \( x_t \) is the state. Ordinarily we think of time evolving by repeatedly applying the mapping \( f \), so \( x_{t+2} = f(f(x_t)) = f^{(2)}(x_t) \), and so forth; in general \( x_t = f^{(t)}(x_0) \). Typically this mapping is extremely nonlinear, and its iterates only become harder to parse out after multiple time steps.

Now consider any nice function \( h \) of the state, which gives us an observable \( y_t = h(x_t) \). How does this observable evolve? Well, obviously \( y_{t+1} = h(f(x_t)) \), and in general \( y_t = h(f^{(t)}(x_0)) \). So far so trivial. The trick comes from realizing that what we have actually done is define an operator on the space of observables, canonically called \( K \) or \( \mathcal{K} \), where \( (Kh)(x) = h(f(x)) \), and hence \( h(f^{(t)}(x_0)) = (K^t h)(x_0) \). (I'm being pedantic about parentheses to make the order of operations very clear. Also, this is a good place to say that "nice function" means "measurable function, plus any other regularity properties we might happen to find useful", e.g., sometimes we just care about square-integrable observables.) Instead of thinking about time evolution in the state space, we can just leave the state alone, and have the \( K \) operator transform the observables. The advantage of doing this is that \( K \) is a linear operator on the space of observables; it's really easy to convince yourself that \( K (h_1+h_2) = K h_1 + K h_2 \). And linear operators are easy! It's true we've gone from a finite-dimensional state space to an infinite-dimensional function space, but linearity is still a really powerful simplification. If the gods were very kind, \( K \) would have a countable basis of eigenfunctions, \( K \phi_i = \lambda_i \phi_i \), and \( h(x) = \sum_{i=1}^{\infty}{c_i \phi_i(x)} \) for some coefficients \( c_i \). Then the dynamics of any observable would be really simple: \[ (K^t h)(x_0) = \sum_{i=1}^{\infty}{c_i (K^t \phi_i)(x_0)} = \sum_{i=1}^{\infty}{\lambda_i^t c_i \phi_i(x_0)} \] If we want the dynamics of a different observable, \( z_t = g(x_t) \), then we'd have \( g(x) = \sum_{i=1}^{\infty}{d_i \phi_i(x)} \), and the dynamics would be \( g(x_t) = (K^t g)(x_0) = \sum_{i=1}^{\infty}{\lambda_i^t d_i \phi_i(x_0)} \). That is, only the coefficients going into the observable would change.
We could think of \( \phi_i(x_0) \) as an (infinite) set of coordinates in which the dynamics are linear (and governed by the eigenvalues \( \lambda_i \)), and the observations are also linear (and given by the coefficients defining the observables). Even if the gods are not quite so kind, and we have to learn some actual spectral theory, linear dynamics, even on an infinite-dimensional space, is still a lot nicer to have to deal with than nonlinearity...
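
To make the eigenfunction expansion concrete, here is a minimal sketch in Python. Everything specific in it is an assumption chosen for illustration and not taken from the discussion above: the two-dimensional map `f`, the parameter values `lam` and `mu`, and the three-function dictionary `psi`. The map is picked precisely because the gods are kind to it: the span of \( x_1, x_2, x_1^2 \) is closed under \( K \), so on that span the operator is just a 3-by-3 matrix `A`, and its left eigenvectors give eigenfunctions.

```python
import numpy as np

# Hypothetical example system (an assumption, not from the text):
#   x1' = lam * x1
#   x2' = mu * x2 + (lam**2 - mu) * x1**2
lam, mu = 0.9, 0.5

def f(x):
    """One step of the nonlinear map."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2])

def psi(x):
    """Dictionary of observables spanning a K-invariant subspace."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

# psi(f(x)) = A @ psi(x) holds exactly for this particular map.
A = np.array([[lam, 0.0, 0.0],
              [0.0, mu, lam**2 - mu],
              [0.0, 0.0, lam**2]])

# Left eigenvectors of A (columns of W) define eigenfunctions
# phi_i(x) = W[:, i] @ psi(x), with (K phi_i)(x) = eigvals[i] * phi_i(x).
eigvals, W = np.linalg.eig(A.T)

def evolve_state(x0, t):
    """State-space view: apply f to the state t times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(t):
        x = f(x)
    return x

def observable_via_state(c, x0, t):
    """h(f^(t)(x0)) for the observable h = c @ psi."""
    return c @ psi(evolve_state(x0, t))

def observable_via_koopman(c, x0, t):
    """(K^t h)(x0) as sum_i lambda_i^t * c_i * phi_i(x0)."""
    coef = np.linalg.solve(W, c)   # coordinates of h in the eigenfunction basis
    phi0 = W.T @ psi(x0)           # eigenfunction values at the initial state
    return np.sum(eigvals**t * coef * phi0)

x0 = np.array([1.3, -0.7])
c = np.array([2.0, -1.0, 0.5])     # h(x) = 2*x1 - x2 + 0.5*x1**2
print(observable_via_state(c, x0, 7), observable_via_koopman(c, x0, 7))
```

The two printed numbers should agree up to floating-point error. Changing `c` changes the observable but not `eigvals` or the eigenfunctions, which is the point about only the coefficients changing. For a generic nonlinear map no finite dictionary is invariant like this, and that is where truncation comes in.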

(For a stochastic process, or at least a Markov process, we'd have the contrast between the "transition kernel" which gives the conditional distribution for the next state, \( \kappa(x, A) \equiv \mathbb{P}\left( X_{t+1} \in A| X_t =x \right) \), versus the conditional expectation of an observable, \( \mathbb{E}\left( h(X_{t+1}) | X_t =x \right) \). Now, rather than looking at the conditional distribution directly through the kernel, we can define the Markov operator which takes probability measures to probability measures, \( M\nu(A) = \int{\kappa(x, A) d\nu(x)} \). (If we're dealing with a deterministic dynamical system, which is after all a special case of a Markov process, the equivalent of the Markov operator is called a Perron-Frobenius or Frobenius-Perron operator.) This is a linear operator on probability measures (and every linear operator on probability measures likewise defines a kernel). The adjoint operator to the Markov operator, which acts on observables by taking \( h \) to the function \( x \mapsto \mathbb{E}\left( h(X_{t+1}) | X_t =x \right) \), is called, in the literature, the transition operator. In the deterministic case this conditional expectation is just \( h(f(x)) \), i.e., the Koopman operator. So lots of Koopman operator theory generalizes very directly to the theory of Markov operators.)
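
The duality between the Markov operator and the transition operator is easiest to see on a finite state space, where the kernel is just a stochastic matrix. The sketch below assumes a made-up three-state chain; the matrix `P`, the distribution `nu`, and the observable `h` are all hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical 3-state chain: P[i, j] = P(X_{t+1} = j | X_t = i).
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

def markov_operator(nu):
    """Push a distribution forward one step: (M nu)(j) = sum_i nu(i) P[i, j]."""
    return nu @ P

def transition_operator(h):
    """Pull an observable back one step: (T h)(i) = E[h(X_{t+1}) | X_t = i]."""
    return P @ h

nu = np.array([0.5, 0.3, 0.2])   # an initial distribution over the states
h = np.array([1.0, -2.0, 4.0])   # an observable, one value per state

# Adjointness: the expectation of h under M nu equals
# the expectation of T h under nu.
lhs = markov_operator(nu) @ h
rhs = nu @ transition_operator(h)
print(lhs, rhs)
```

Both operators are the same matrix, acting on row vectors (measures) in one case and column vectors (observables) in the other.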

The first person to clearly realize all this was, indeed, Koopman in the 1930s. (I haven't read his original papers so I won't cite them, but the references I do give below agree on this history and I trust them.*) For a long time this was just a bit of a neat technical trick. (That's certainly how I learned it in graduate school, as part of ergodic theory, and how I used it in a 2004 paper on the arrow of time.) What's intriguing to me, and why I have begun this notebook, is that since then, and especially over the last decade, people have begun trying to practically use this idea, by learning or estimating \( K \) from observations. In particular, control theorists seem to be very taken with this. Of course this involves some sort of finite-dimensional truncation of the infinite-dimensional operator, sometimes, it seems to me, an extremely crude one**.

I don't have any immediate plans to do anything with or in this literature, but I do want to keep track of it. In particular, at some point I want to really wrap my head around whether learning the infinite-dimensional but linear operator \( K \) is really any easier than learning the finite-dimensional but nonlinear map \( f \) directly. Also, the truncations involved in work with "data-driven" Koopman operators make me wonder about using random features somehow. In particular, I offer a conjecture, with the disclaimer that I have thought about it for, literally, five minutes: Say the underlying state \( x_0 \) lives on a \( d \)-dimensional manifold. Pick \( m \geq 2d+1 \) real-valued observables \( h_1, \ldots, h_m \) from a probability distribution supported on some set of nice functions \( H \). (E.g., \( H \) might consist of finite-frequency sine waves with random phase offsets.) Conjecture: An operator which linearly evolves \( h_1, \ldots, h_m \) can be extended (somehow) to an operator which linearly evolves any function in the span of \( H \). (E.g., the span of random sine waves is all functions with nice Fourier transforms.)

*: The difference between the state-evolution viewpoint and the observable-evolution viewpoint corresponds to the difference between the Schrödinger and the Heisenberg "pictures" of quantum mechanics. ("Recall" that in time-independent QM, if the system has wave-function \( \psi \), the expected value of an observable, represented by an operator \( A \), is \( \langle \psi | A | \psi \rangle \equiv \int{\psi^*(x) A \psi(x) dx} \). In the Schrödinger picture, we time-evolve the wave function, so \( \psi \) at time 0 evolves to \( e^{iHt} \psi \) at time \( t \), \( H \) being the Hamiltonian operator. In the Heisenberg picture, wave functions are static, but the operators representing observables evolve, so \( A(t) = e^{-iHt} A e^{iHt} \). Either way we get the same expression for the expectation of the observable at time \( t \), namely \( \int{\psi^*(x) e^{-iHt} A e^{iHt} \psi(x) dx} \).) This distinction goes back to the 1920s, so I imagine if we were to read back into the history of operator theory before 1926 we'd find someone (Hilbert?) stating the idea as what Terence Tao would call a "trick" (or however you said that in German a century ago). --- Incidentally, before you start wondering whether quantum mechanics, which is linear and infinite-dimensional, mightn't just be the result of looking at the Koopman (or Frobenius-Perron) operators of a finite-dimensional nonlinear dynamical system, I remember hearing my nonlinear dynamics teachers idly batting around the same notion back in the 1990s. They didn't think it was worth pursuing, for a whole host of reasons (starting with Bell's inequalities), and I do not presume to be wiser than them.

**: In particular, "dynamic mode decomposition" seems to mean just "fit a VAR(1) to successive observations and then prophesy upon the eigenvectors", but perhaps I am missing some subtleties.
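
For concreteness, here is roughly what that recipe looks like in Python, as a sketch of the "fit a VAR(1), then look at the eigen-decomposition" reading only, not of any particular published DMD variant. The data-generating matrix `A_true`, the noise level, and the series length are all assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a noisy linear system, standing in for whatever
# vector of observables one actually records.
A_true = np.array([[0.9, -0.2],
                   [0.2,  0.9]])
T = 200
Y = np.empty((T, 2))
Y[0] = [1.0, 0.0]
for t in range(T - 1):
    Y[t + 1] = A_true @ Y[t] + 1e-3 * rng.standard_normal(2)

# VAR(1) by ordinary least squares: find B minimizing ||Y[:-1] @ B - Y[1:]||,
# so that y_{t+1} is approximately A_hat @ y_t with A_hat = B.T.
B, *_ = np.linalg.lstsq(Y[:-1], Y[1:], rcond=None)
A_hat = B.T

# The eigenvalues give growth/decay rates and frequencies; the eigenvectors
# (columns of `modes`) are the spatial patterns attached to them.
eigvals, modes = np.linalg.eig(A_hat)
print(np.abs(eigvals), np.angle(eigvals))
```

If the columns of `Y` were nonlinear functions of an underlying state (a dictionary of observables) and not the raw state, the same least-squares fit would give a finite-dimensional approximation to \( K \) restricted to the span of those functions.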

See also: Equations of Motion from a Time Series