Koopman Operators for Modeling Dynamical Systems and Time Series

Last update: 07 Jul 2025 13:30
First version: 21 July 2022

Start with your favorite deterministic dynamical system, say (in discrete time to make things easy) \( x_{t+1} = f(x_t) \), where \( x_t \) is the state. Ordinarily we think of time evolving by repeatedly applying the mapping \( f \), so \( x_{t+2} = f(f(x_t)) = f^{(2)}(x_t) \), and so forth; in general \( x_t = f^{(t)}(x_0) \). That is, the state evolves by repeated applications of the map. In general, this mapping is extremely nonlinear, and it only becomes harder to parse out after multiple time steps.

Now consider any nice function \( h \) of the state, which gives us an observable \( y_t = h(x_t) \). How does this observable evolve? Well, obviously \( y_{t+1} = h(f(x_t)) \), and in general \( y_t = h(f^{(t)}(x_0)) \). So far so trivial. The trick comes from realizing that what we have actually done is define an operator on the space of observables, canonically called \( K \) or \( \mathcal{K} \), where \( (Kh)(x) = h(f(x)) \), and so \( h(f^{(t)}(x_0)) = (K^t h)(x_0) \). (I'm being pedantic about parentheses to make the order of operations very clear. Also, this is a good place to say that "nice function" means "measurable function, plus any other regularity properties we might happen to find useful", e.g., sometimes we just care about square-integrable observables.) Instead of thinking about time evolution in the state space, we can just leave the state alone, and have the \( K \) operator transform the observables. The advantage of doing this is that \( K \) is a linear operator on the space of observables; it's really easy to convince yourself that \( K (h_1+h_2) = K h_1 + K h_2 \), since both sides are just \( h_1(f(x)) + h_2(f(x)) \). And linear operators are easy! It's true we've gone from a finite-dimensional state space to an infinite-dimensional function space, but linearity is still a really powerful simplification. If the gods were very kind, \( K \) would have a countable basis of eigenfunctions, \( K \phi_i = \lambda_i \phi_i \), and \( h(x) = \sum_{i=1}^{\infty}{c_i \phi_i(x)} \) for some coefficients \( c_i \). Then the dynamics of any observable would be really simple: \[ (K^t h)(x_0) = \sum_{i=1}^{\infty}{c_i (K^t \phi_i)(x_0)} = \sum_{i=1}^{\infty}{\lambda_i^t c_i \phi_i(x_0)} \] If we want the dynamics of a different observable, \( z_t = g(x_t) \), then we'd have \( g(x) = \sum_{i=1}^{\infty}{d_i \phi_i(x)} \), and the dynamics would be \( g(x_t) = (K^t g)(x_0) = \sum_{i=1}^{\infty}{\lambda_i^t d_i \phi_i(x_0)} \). That is, only the coefficients going into the observable would change. 
We could think of \( \phi_i(x_0) \) as an (infinite) set of coordinates in which the dynamics are linear (and governed by the eigenvalues \( \lambda_i \)), and the observations are also linear (and given by the coefficients defining the observables). Even if the gods are not quite so kind, and we have to learn some actual spectral theory, linear dynamics, even on an infinite-dimensional space, is still a lot nicer to have to deal with than nonlinearity...
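
To make the "linear in the right coordinates" idea concrete, here is a minimal sketch on a toy system of my own choosing (the map `f`, the parameters `lam` and `mu`, and the observables in `lift` are all assumptions for illustration, not anything from the references). The state map is nonlinear, but the three observables \( x_1, x_2, x_1^2 \) span a subspace that \( K \) maps into itself, so on that subspace \( K \) is just a 3-by-3 matrix acting on the vector of observable values.

```python
import numpy as np

# Hypothetical toy parameters, chosen only for illustration.
lam, mu = 0.9, 0.5

def f(x):
    """Nonlinear state map on R^2 (quadratic in x1)."""
    x1, x2 = x
    return np.array([lam * x1, mu * x2 + (lam**2 - mu) * x1**2])

def lift(x):
    """Values of the observables (x1, x2, x1^2) at state x."""
    x1, x2 = x
    return np.array([x1, x2, x1**2])

# Matrix that advances the vector of observable values one step:
# lift(f(x)) == K @ lift(x) for every x.
K = np.array([
    [lam, 0.0, 0.0],
    [0.0, mu, lam**2 - mu],
    [0.0, 0.0, lam**2],
])

def evolve_state(x0, t):
    """Apply the nonlinear map t times."""
    x = np.asarray(x0, dtype=float)
    for _ in range(t):
        x = f(x)
    return x

def evolve_observables(x0, t):
    """Leave the state alone; advance the observables linearly."""
    return np.linalg.matrix_power(K, t) @ lift(np.asarray(x0, dtype=float))

x0 = [1.3, -0.7]
t = 10
direct = lift(evolve_state(x0, t))   # evolve the state, then observe
linear = evolve_observables(x0, t)   # observe, then evolve linearly
eigvals = np.sort(np.linalg.eigvals(K).real)

def phi(x):
    """An eigenfunction with eigenvalue mu: phi(f(x)) == mu * phi(x)."""
    return x[1] - x[0] ** 2
```

Here `direct` and `linear` agree up to rounding, and the eigenvalues of `K` are \( \mu, \lambda^2, \lambda \). This toy is of course the "gods are very kind" case: a finite set of observables closes under the dynamics exactly, which one should not expect in general.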

(For a stochastic process, or at least a Markov process, we'd have the contrast between the "transition kernel" which gives the conditional distribution for the next state, \( \kappa(x, A) \equiv \mathbb{P}\left( X_{t+1} \in A| X_t =x \right) \), and the conditional expectation of an observable, \( \mathbb{E}\left( h(X_{t+1}) | X_t =x \right) \). Now, rather than looking at the conditional distribution directly through the kernel, we can define the Markov operator which takes probability measures to probability measures, \( M\nu(A) = \int{\kappa(x, A) d\nu(x)} \). (If we're dealing with a deterministic dynamical system, which is after all a special case of a Markov process, the equivalent of the Markov operator is called a Perron-Frobenius or Frobenius-Perron operator.) This is a linear operator on probability measures (and every linear operator on probability measures likewise defines a kernel). The adjoint of the Markov operator, which acts on observables by \( h \mapsto \mathbb{E}\left( h(X_{t+1}) | X_t =x \right) \), is called, in the literature, the transition operator. So lots of Koopman operator theory generalizes very directly to the theory of Markov operators.)
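
For a finite state space the adjoint relationship is just the two ways of multiplying by a transition matrix. The following is a small sketch with a made-up three-state chain (the matrix `P`, the distribution `nu` and the observable `h` are all hypothetical): distributions are row vectors pushed forward by right-multiplication, observables are column vectors pulled back by left-multiplication, and the two give the same expectation.

```python
import numpy as np

# Hypothetical 3-state chain; P[i, j] = P(X_{t+1} = j | X_t = i).
P = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
])

nu = np.array([0.5, 0.3, 0.2])   # a probability distribution over states
h = np.array([1.0, -2.0, 4.0])   # an observable, one value per state

def markov_step(nu):
    """Markov operator: push a distribution one step forward."""
    return nu @ P

def transition_step(h):
    """Transition operator: (T h)(x) = E[h(X_{t+1}) | X_t = x]."""
    return P @ h

# Expectation of h under the evolved distribution ...
lhs = markov_step(nu) @ h
# ... equals the expectation of the evolved observable under nu.
rhs = nu @ transition_step(h)
```

Both `lhs` and `rhs` compute \( \mathbb{E}\left( h(X_{t+1}) \right) \) when \( X_t \sim \nu \); that equality is all "adjoint" means here.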

The first person to clearly realize all this was, indeed, Koopman in the 1930s. (I haven't read his original papers so I won't cite them, but the references I do give below agree on this history and I trust them.*) For a long time this was just a bit of a neat technical trick. (That's certainly how I learned it in graduate school, as part of ergodic theory, and how I used it in a 2004 paper on the arrow of time.) What's intriguing to me, and why I have begun this notebook, is that since then, and especially over the last decade, people have begun trying to practically use this idea, by learning or estimating \( K \) from observations. In particular, control theorists seem to be very taken with this. Of course this involves some sort of finite-dimensional truncation of the infinite-dimensional operator, sometimes, it seems to me, an extremely crude one**.

I don't have any immediate plans to do anything with or in this literature, but I do want to keep track of it. In particular, at some point I want to really wrap my head around whether learning the infinite-dimensional but linear operator \( K \) is really any easier than learning the finite-dimensional but nonlinear map \( f \) directly. Also, the truncations involved in work with "data-driven" Koopman operators make me wonder about using random features somehow. In particular, I offer a conjecture, with the disclaimer that I have thought about it for, literally, five minutes: Say the underlying state \( x_0 \) lives on a \( d \)-dimensional manifold. Pick \( m \geq 2d+1 \) real-valued observables \( h_1, \ldots, h_m \) from a probability distribution supported on some set of nice functions \( H \). (E.g., \( H \) might consist of finite-frequency sine waves with random phase offsets.) Conjecture: An operator which linearly evolves \( h_1, \ldots, h_m \) can be extended (somehow) to an operator which linearly evolves any function in the span of \( H \). (E.g., the span of random sine waves is all functions with nice Fourier transforms.)

*: The difference between the state-evolution viewpoint and the observable-evolution viewpoint corresponds to the difference between the Schrödinger and the Heisenberg "pictures" of quantum mechanics. ("Recall" that in time-independent QM, if the system has wave-function \( \psi \), the expected value of an observable, represented by an operator \( A \), is \( \langle \psi | A | \psi \rangle \equiv \int{\psi^*(x) A \psi(x) dx} \). In the Schrödinger picture, we time-evolve the wave function, so \( \psi \) at time 0 evolves to \( e^{iHt} \psi \) at time \( t \), \( H \) being the Hamiltonian operator. In the Heisenberg picture, wave functions are static, but the operators representing observables evolve, so \( A(t) = e^{-iHt} A e^{iHt} \). Either way we get the same expression for the expectation of the observable at time \( t \), namely \( \int{\psi^*(x) e^{-iHt} A e^{iHt} \psi(x) dx} \).) This distinction goes back to the 1920s, so I imagine if we were to read back into the history of operator theory before 1926 we'd find someone (Hilbert?) stating the idea as what Terence Tao would call a "trick" (or however you said that in German a century ago). --- Incidentally, before you start wondering whether quantum mechanics, which is linear and infinite-dimensional, mightn't just be the result of looking at the Koopman (or Frobenius-Perron) operators of a finite-dimensional nonlinear dynamical system, I remember hearing my nonlinear dynamics teachers idly batting around the same notion back in the 1990s. They didn't think it was worth pursuing, for a whole host of reasons (starting with Bell's inequalities), and I do not presume to be wiser than them.
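
(To spell out the one step behind "either way": with the sign convention used in this footnote, and assuming only that \( H \) is self-adjoint, so that \( (e^{iHt})^{\dagger} = e^{-iHt} \), the Schrödinger-picture expectation rearranges into the Heisenberg-picture one: \[ \langle e^{iHt}\psi | A | e^{iHt}\psi \rangle = \langle \psi | (e^{iHt})^{\dagger} A e^{iHt} | \psi \rangle = \langle \psi | e^{-iHt} A e^{iHt} | \psi \rangle = \langle \psi | A(t) | \psi \rangle \] This is the same move as shifting \( K^t \) from the state onto the observable.)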

**: In particular, "dynamic mode decomposition" seems to mean just "fit a VAR(1) to successive observations and then prophesy upon the eigenvectors", but perhaps I am missing some subtleties.
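
As a sketch of that reading, and emphatically not of any particular published DMD algorithm, the following fits a first-order linear autoregression to successive observations by ordinary least squares and then takes the eigendecomposition of the fitted matrix. The data-generating matrix `A_true`, the noise level and the series length are all made up so there is something to fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" linear dynamics, used only to generate data.
A_true = np.array([[0.9, 0.2],
                   [-0.1, 0.8]])

# Simulate y_{t+1} = A_true y_t + small noise.
T = 200
Y = np.empty((T, 2))
Y[0] = [1.0, 0.5]
for t in range(T - 1):
    Y[t + 1] = A_true @ Y[t] + 1e-3 * rng.standard_normal(2)

# VAR(1) by least squares: rows of `future` are regressed on rows of `past`,
# so future ≈ past @ B with B equal to the transpose of the dynamics matrix.
past, future = Y[:-1], Y[1:]
B, *_ = np.linalg.lstsq(past, future, rcond=None)
A_hat = B.T

# "Prophesy upon the eigenvectors": eigenvalues give growth/decay and
# oscillation rates, eigenvectors give the corresponding modes.
eigvals, modes = np.linalg.eig(A_hat)
```

With data like this, `A_hat` lands close to `A_true`, and the moduli of `eigvals` say how fast each mode decays per time step. Whatever subtleties I am missing would live in how the observations are chosen or lifted before this regression, and in how the fit is regularized or truncated.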

Equations of Motion from a Time Series
Operator Semigroups

Steven L. Brunton, Marko Budišić, Eurika Kaiser and J. Nathan Kutz, "Modern Koopman Theory for Dynamical Systems", SIAM Review 64 (2022): 229--340 [With apologies to the second author for my ignorant inability to reproduce the accent symbols in his family name in HTML; and with thanks to one of my neighbors for leaving this copy of SIAM Review in the local little free library!]
Andrzej Lasota and Michael C. Mackey, Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics [Though this focuses more on the Frobenius-Perron operator that linearly evolves distributions over states than on the Koopman operator that linearly evolves observables]

CRS, Almost None of the Theory of Stochastic Processes [Where I tried to explain what I knew about Markov and transition operators, when I knew something about Markov and transition operators]
CRS, "The Backwards Arrow of Time of the Consistently Bayesian Statistical Mechanic", cond-mat/0410063 [Self-exposition]

Craig Bakker, Steven Rosenthal, Kathleen E. Nowak, "Koopman Representations of Dynamic Systems with Control", arxiv:1908.02233
Marko Budišić, Ryan M. Mohr and Igor Mezić, "Applied Koopmanism", Chaos 22 (2012): 047510, arxiv:1206.3164
Ido Cohen and Guy Gilboa, Latent Modes of Nonlinear Flows: a Koopman Theory Analysis
Matthew J. Colbrook, "The Multiverse of Dynamic Mode Decomposition Algorithms", arxiv:2312.00137
Stefan Klus, Ingmar Schuster, Krikamol Muandet, "Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces", Journal of Nonlinear Science 30 (2020): 283--315, arxiv:1712.01572
Vladimir Kostic, Pietro Novelli, Andreas Maurer, Carlo Ciliberto, Lorenzo Rosasco, Massimiliano Pontil, "Learning Dynamical Systems via Koopman Operator Regression in Reproducing Kernel Hilbert Spaces", arxiv:2205.14027
Matthew D. Kvalheim, Philip Arathoon, "Linearizability of flows by embeddings", arxiv:2305.18288
Yen Ting Lin, Yifeng Tian, Daniel Livescu, Marian Anghel, "Data-driven learning for the Mori--Zwanzig formalism: a generalization of the Koopman learning framework", arxiv:2101.05873
Ilan Naiman, N. Benjamin Erichson, Pu Ren, Michael W. Mahoney, Omri Azencot, "Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs", arxiv:2310.02619
Samuel E. Otto and Clarence W. Rowley, "Koopman Operators for Estimation and Control of Dynamical Systems", Annual Review of Control, Robotics, and Autonomous Systems 4 (2021): 59--87
Adam Rupe, Derek DeSantis, Craig Bakker, Parvathi Kooloth, Jian Lu, "Causal Discovery in Nonlinear Dynamical Systems using Koopman Operators", arxiv:2410.10103
Manuel Santos Gutiérrez, Valerio Lucarini, Mickaël D. Chekroun, Michael Ghil, "Reduced-Order Models for Coupled Dynamical Systems: Data-driven Methods and the Koopman Operator", Chaos 31 (2021): 053116, arxiv:2012.01068
Ali Tavasoli, Teague Henry, Heman Shakeri, "A purely data-driven framework for prediction, optimization, and control of networked processes: application to networked SIS epidemic model", arxiv:2108.02005