Attention conservation notice: 1600+ dry, pedantic words and multiple equations on how some heterodox economists mis-understand ergodic theory.

Robert Vienneau, at Thoughts on Economics, has posted an example of a stationary but non-ergodic stochastic process. This serves as a reasonable prompt to follow up on my comment, apropos of Yves Smith's book, that the post-Keynesian school of economists seems to be laboring under a number of confusions about "ergodicity".

I hasten to add that there is nothing wrong with Vienneau's example: it is indeed a stationary but non-ergodic process. (In what follows, I have lightly tweaked his notation to suit my own tastes.) Time proceeds in discrete steps, and \( X_t = Y Z_t \), where \( Z \) is a sequence of independent, mean-zero, variance 1 Gaussian random variables (i.e., standard discrete-time white noise), and \( Y \) is a chi-distributed random variable (i.e., the square root of something which has a chi-squared distribution). \( Z \) is transparently a stationary process, and \( Y \) is constant over time, so \( X \) must also be a stationary process. However, by simulation Vienneau shows that the empirical cumulative distribution functions from different realizations of the process do not converge on a common limit.
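For the curious, here is a minimal simulation of Vienneau's process. (The degrees of freedom of the chi distribution, `df=3`, are my own arbitrary choice; the qualitative point does not depend on them.)

```python
import numpy as np

rng = np.random.default_rng(0)

def vienneau_path(T, df=3, rng=rng):
    """One realization of X_t = Y * Z_t: Y is chi-distributed (the
    square root of a chi-squared with df degrees of freedom), drawn
    once; Z_t is standard Gaussian white noise."""
    y = np.sqrt(rng.chisquare(df))      # fixed for the whole sample path
    z = rng.standard_normal(T)
    return y, y * z

# Within each realization the empirical distribution settles down
# (Glivenko-Cantelli, conditional on Y), but different realizations
# settle on different Gaussians.
for _ in range(3):
    y, x = vienneau_path(100_000)
    print(f"Y = {y:.3f}, empirical sd of the path = {x.std():.3f}")
```

Each path's empirical standard deviation tracks its own \( Y \), which is why the empirical CDFs from different runs fail to agree.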

In fact, the result can be strengthened considerably. Given \( Y= y\), \( X
\) is just Gaussian white noise with standard deviation \( y \), so by the
Glivenko-Cantelli theorem, the empirical CDF of \( X \) converges almost surely
on the CDF of that Gaussian. The marginal distribution of \( X_t \) for each
\( t \) is however a *mixture* of Gaussians of different standard
deviations, and not a Gaussian. Conditionally on \( Y \), therefore, the
empirical CDF converges to the marginal distribution of the stationary process
with probability 0. Since this convergence has conditional probability zero
for *every* value of \( y \), it has probability zero unconditionally as
well. So Vienneau's process very definitely fails to be ergodic.

(Proof of the unconditionality claim: Let \( C \) be the indicator variable for the empirical CDF converging to the marginal distribution. \[ \mathbf{E}\left[C|Y=y\right] = 0 \] for all \( y \), and so, by the law of total expectation, \[ \mathbf{E}\left[C\right] = \mathbf{E}\left[\mathbf{E}\left[C|Y\right]\right] = 0 . \])

Two things, however, are worth noticing. First, Vienneau's \( X \) process
is a mixture of ergodic processes; second, which mixture component is sampled
from is set once, at the beginning, and thereafter each sample path looks like
a perfectly well-behaved realization of an ergodic process. These observations
generalize. The ergodic decomposition theorem (versions of which go back as
far as von Neumann's original work on ergodic theory) states that every
stationary process is a mixture of processes which are both stationary and
ergodic. Moreover, which ergodic component a sample path is in is an invariant
of the motion — there is no mixing of ergodic processes *within* a
realization. It's worth taking a moment, perhaps, to hand-wave about this.

Start with the actual definition of ergodic processes. Ergodicity is a property of the probability distribution for whole infinite sequences \( X = (X_1, X_2, \ldots X_t, \ldots ) \). As time advances, the dynamics chop off the initial parts of this sequence of random variables. Some sets of sequences are invariant under such "shifts" — constant sequences, for instance, but also many other more complicated sets. A stochastic process is ergodic when all invariant sets have either probability zero or probability one. What this means is that (almost) all trajectories generated by an ergodic process belong to a single invariant set, and they all wander from every part of that set to every other part — they are "metrically transitive". (Because: no smaller set with any probability is invariant.) From this follows Birkhoff's individual ergodic theorem, which is the basic strong law of large numbers for dependent data. If \( X \) is an ergodic process, then for any (integrable) function \( f \), the average of \( f(X_t) \) along a sample path, the "time average" of \( f \), converges almost surely to a unique value. So with probability 1, time averages converge to values characteristic of the ergodic process.
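A quick numerical illustration of Birkhoff's theorem. (The stationary Gaussian AR(1) process and its parameters are my own choices here, not anything from Vienneau's example.)

```python
import numpy as np

rng = np.random.default_rng(42)

# A stationary Gaussian AR(1), X_t = phi * X_{t-1} + eps_t, is ergodic;
# Birkhoff's theorem says time averages of f(X_t) converge almost
# surely to the corresponding expectation under the stationary law.
phi, T = 0.7, 200_000
stat_var = 1.0 / (1.0 - phi**2)              # stationary variance of X_t

x = np.empty(T)
x[0] = rng.normal(scale=np.sqrt(stat_var))   # start in the stationary law
for t in range(1, T):
    x[t] = phi * x[t - 1] + rng.standard_normal()

print(np.mean(x))     # time average of X_t   -> 0
print(np.mean(x**2))  # time average of X_t^2 -> 1/(1 - phi^2), about 1.961
```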

Now go beyond a single ergodic probability distribution. Two distributions
are called "mutually singular" if one of them gives probability 1 to an event
which has probability zero according to the other, and vice versa. Any two
ergodic processes are either identical or mutually singular. To see this,
realize that two distributions must give different expectation values to at
least *one* function; otherwise they're the same distribution. Pick
such a distinguishing function and call it \( f \), with expectation values \(
f_1 \) and \( f_2 \) under the two distributions. Well, the set of sample
paths where
\[
\frac{1}{n}\sum_{t=1}^{n}{f(X_t)} \rightarrow f_1
\]
has probability 1 under the first measure, and probability 0 under the second.
Likewise, under the second measure the time average is almost certain to
converge on \( f_2 \), which almost never happens under the first measure. So
any two ergodic measures are mutually singular.
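A sketch of mutual singularity in action, using two white-noise processes of my own choosing, with \( f(x) = x^2 \) as the distinguishing function:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two ergodic processes: Gaussian white noise with sd 1 and with sd 2.
# f(x) = x^2 distinguishes them: its expectation is 1 under the first
# law and 4 under the second.  The set of paths where the time average
# of x^2 tends to 1 has probability 1 under the first measure and
# probability 0 under the second, and vice versa for 4.
T = 200_000
path1 = rng.standard_normal(T)         # sd-1 white noise
path2 = 2.0 * rng.standard_normal(T)   # sd-2 white noise

print(np.mean(path1**2))   # -> 1
print(np.mean(path2**2))   # -> 4
```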

This means that a mixture of two (or more) ergodic processes cannot, itself,
be ergodic. But a mixture of stationary processes is stationary. So the
stationary ergodic processes are "extremal points" in the set of all stationary
processes. The convex hull of these extremal points is the set of stationary
but (in general) non-ergodic processes which can be obtained by mixing
stationary and ergodic processes. It is less trivial to show that *every* stationary
process belongs to this family, that it is a mixture of stationary and ergodic
processes, but this can indeed be done. (See, for
instance, this
beautiful paper by Dynkin.) Part of the proof shows that which ergodic
component a stationary process's sample path is in does not change over time
— ergodic components are themselves invariant sets of trajectories. The
general form of Birkhoff's theorem thus has time averages converging to
a *random* limit, which depends on the ergodic component the process
started in. This can be shown even at the advanced undergraduate level, as in
Grimmett
and Stirzaker.
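To see the random limit concretely in Vienneau's process (again with an arbitrary `df=3` for the chi distribution, my choice rather than his):

```python
import numpy as np

rng = np.random.default_rng(3)

# In Vienneau's process X_t = Y * Z_t, each value of Y picks out an
# ergodic component.  Within a component Birkhoff's theorem applies:
# the time average of X_t^2 converges to Y^2 --- a *random* limit,
# set once at the start of the trajectory and constant thereafter.
T = 200_000
for _ in range(3):
    y = np.sqrt(rng.chisquare(3))
    x = y * rng.standard_normal(T)
    print(f"Y^2 = {y**2:.3f}, time average of X_t^2 = {np.mean(x**2):.3f}")
```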

At this point, three notes seem in order.

- Many statisticians will be more familiar with a special case of the ergodic decomposition, which is de Finetti's result about how infinite exchangeable random sequences are mixtures of independent and identically-distributed random sequences. The ergodic decomposition is like that, only much cooler, and not tainted by the name of a Fascist. (That said, de Finetti's theorem actually covers Vienneau's example.)
- Following tradition, I have stated the ergodic decomposition above for stationary processes. However, it is very important that this limitation is *not* essential. The broadest class of processes I know of for which an ergodic decomposition holds are the "asymptotically mean-stationary" (AMS) processes. The defining property of such processes is that their probability laws converge in Cesaro mean. In symbols, and writing \( P_t \) for the law of the process from \( t \) onwards, we must have \[ \lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=1}^{n}{P_t(A)}} = P(A) \] for some limiting law \( P \). (I learned to appreciate the importance of AMS processes from Robert Gray's Probability, Random Processes and Ergodic Properties, and stole those ideas shamelessly for Almost None.) This allows for cyclic variation in the process, for asymptotic approach to a stationary distribution, for asymptotic approach to a cyclically varying process, etc. Every AMS process is a mixture of ergodic AMS processes, in exactly the way that every stationary process is a mixture of ergodic stationary processes. I actually don't know whether the ergodic decomposition can extend beyond this, but I suspect not, since the defining condition for AMS is very close to a Cesaro-mean decay-of-dependence property which turns out to be equivalent to ergodicity, namely that, for any two sets \( A \) and \( B \), \[ \lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=0}^{n-1}{P_1(A \cap T^{-t} B)}} = P_1(A) P(B) \] where \( T^{-t} \) are the powers of the back-shift operator (what time-series econometricians usually write \( L \)), so that \( T^{-t} B \) is the set of all trajectories which will be in \( B \) in \( t \) time-steps. (See Lemma 6.7.4 in the first, online, edition of Gray, p. 148.) This means that, on average, the far future becomes unpredictable from the present.

- In light of the previous note, if dynamical systems people want to read "basin of attraction" for "ergodic component", and "natural invariant measure on the attractor" for "limit measure of an AMS ergodic process", they will not go far wrong.
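A toy illustration of the AMS definition (my own example, using marginal probabilities of a deterministically alternating two-state chain as a stand-in for the laws \( P_t \)):

```python
import numpy as np

# A deterministically alternating two-state chain started in state 0 is
# not stationary: P(X_t = 0) flips between 1 and 0 as t advances.  But
# the Cesaro averages (1/n) * sum_{t=1}^{n} P(X_t = 0) converge to 1/2,
# so the process is asymptotically mean-stationary in the sense above.
n_max = 10_000
p_t = np.array([1.0 if t % 2 == 0 else 0.0 for t in range(n_max)])
cesaro = np.cumsum(p_t) / np.arange(1, n_max + 1)
print(cesaro[-1])   # -> 0.5
```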

As the last remark suggests, it is entirely possible for a process to be stationary and ergodic but to have sensitive dependence on initial conditions; this is generally the case for chaotic processes, which is why there are classic articles with titles like "The Ergodic Theory of Chaos and Strange Attractors". Chaotic systems rapidly amplify small perturbations, at least along certain directions, so they are subject to positive destabilizing feedbacks, but they have stable long-run statistical properties.

Going further, consider the sort of self-reinforcing urn processes which
Brian Arthur and collaborators made famous as models
of lock-in and path dependence.
(Actually, in
the classification
of my old boss Scott Page,
these models are merely state-dependent, and do not rise to the level of path
dependence, or even of phat dependence, but that's another story.) These are
non-stationary, but it is easily checked that, so long as the asymptotic
response function has only a finite number of stable fixed points, they satisfy
the definition of asymptotic mean stationarity given above. (I leave it as an
exercise whether this remains true in a case like the original Polya urn
model.) Hence they are mixtures of ergodic processes. Moreover, if we have
only a single realization — a unique historical trajectory — then
we have something which looks just like a sample path of an ergodic process,
because it is one. ("[L]imiting sample averages will behave as if they were in
fact produced by a stationary and ergodic system" — Gray, p. 235 of 2nd
edition.) That this was just one component of a larger, non-ergodic model
limits our ability to extrapolate to *other components*, unless we make
strong modeling assumptions about how the components relate to each other, but
so what?
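For concreteness, here is the classic Polya urn — not Arthur's nonlinear urns, but the simplest member of the family — showing each realization locking in to its own limit:

```python
import numpy as np

rng = np.random.default_rng(10)

def polya_fractions(T, rng):
    """Classic Polya urn: start with one red and one black ball; each
    ball drawn is replaced along with another of the same color.
    Returns the running fraction of red balls."""
    red, total = 1, 2
    frac = np.empty(T)
    for t in range(T):
        if rng.random() < red / total:
            red += 1
        total += 1
        frac[t] = red / total
    return frac

# Each realization's fraction settles down to its own limit (for this
# initial composition, a Uniform(0,1) random variable), but different
# runs lock in to different limits: a single trajectory looks like a
# well-behaved ergodic sample path, while the ensemble is non-ergodic.
for _ in range(3):
    f = polya_fractions(50_000, rng)
    print(f"fraction of red after 50k draws: {f[-1]:.3f}")
```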

I make a fuss about this because the post-Keynesians seem to have fallen into a number of definite errors here. (One may see these errors in e.g., Crotty's "Are Keynesian Uncertainty and Macrotheory Compatible?" [PDF], which however also has insightful things to say about conventions and institutions as devices for managing uncertainty.) It is not true that non-stationarity is a sufficient condition for non-ergodicity; nor is it a necessary one. It is not true that "positive destabilizing feedback" implies non-ergodicity. It is not true that ergodicity is incompatible with sensitive dependence on initial conditions. It is not true that ergodicity rules out path-dependence, at least not the canonical form of it exhibited by Arthur's models.

*Update*, 12 September: Fixed the embarrassing mis-spelling
of Robert's family name in my title.

*Manual trackback*: Robert Vienneau;
Beyond Microfoundations

Posted at September 01, 2010 11:50 | permanent link