Exchangeable Random Sequences

Last update: 07 Jul 2025 13:42
First version: 14 March 2012

A sequence \( X \) of random variables, \( X_1, X_2, \ldots \) is exchangeable if its distribution is invariant under permutation of the indices. What implications does such symmetry have for a stochastic process?

(To make this precise needs some notation. Start with a finite vector of non-repeating indices \( I \), say (1, 30, 2, 47, 94980) for definiteness. Then the collection of the \( X_i \) at those indices, in that order, \( X_I \), has some distribution or other. [Formally speaking, the distribution of \( X_I \) is obtained by applying a projection operator to the infinite-dimensional distribution of \( X \).] Now pick some permutation \( s \) of all the indices, not necessarily just those in \( I \). Write \( sX \) for the sequence we get by permuting \( X \), so \( {(sX)}_j = X_{s^{-1}(j)} \), and so on. For instance, s might swap each odd-numbered index with the even-numbered index one larger, so 1 with 2, 30 with 29, 47 with 48, 94980 with 94979. We say that the sequence \( X \) is exchangeable when \( X_I \) has the same distribution as \( {(sX)}_{I} \), for all permutations \( s \) and all finite sets of indices \( I \). [The Kolmogorov extension theorem then tells us that X and sX have the same infinite-dimensional distribution.] In particular, we'd need \( X_{(1, 30, 2, 47, 94980)} = X_{(2, 29, 1, 48, 94979)} \). In fact, it is easy to check that two necessary conditions for X to be exchangeable are that (i) \( X_I \) has the same distribution as \( X_J \) whenever \( I \) and \( J \) have the same length; and (ii) for each \( n \), the distribution of \( X_{1:n} \) is invariant under permutations of the first \( n \) indices. I leave it as an exercise to the reader to determine whether the combination of these necessary conditions is sufficient for exchangeability.)

It's easy to confirm that a sequence of independent, identically-distributed (IID) random variables is exchangeable. This at least shows that the concept is not self-contradictory. Is it, however, just another way of saying "IID"? The answer, interestingly enough, turns out to be "almost". The classical result, generally called "de Finetti's theorem" after the person who proved the first version, is that every infinite-dimensional exchangeable distribution \( m \) is a mixture of infinite-dimensional IID distributions: \[ m = \int_{\mathrm{IID}}{\lambda dp(\lambda)} \] where \( p \) is some prior distribution over IID distributions \( \lambda \). In terms of random variables and sequences, the way to generate an infinite exchangeable sequence is to first randomly chose a distribution for \( X_1 \), and then keep drawing all later \( X_i \), independently, from that same distribution. Symbolically, \[ \begin{eqnarray*} \lambda & \sim & p\\ X_i|\lambda & \sim_{\mathrm{IID}} & \lambda \end{eqnarray*} \] (It is important that the sequences be infinite here. A finite-dimensional exchangeable distribution is not necessarily a mixture of finite-dimensional IID distributions, though in a sense it must be close to such a mixture; see Diaconis and Freedman.)

A mixture of IID distributions is not IID. To see this, consider a very simple example, where the X_i are binary, say coin-tosses, with some bias towards heads. There is a well-defined distribution where all the coin-tosses are independent, and the probability of heads is 75% per toss. There is another where the coin tosses are independent, and the probability of heads is only 25%. If we take an equally-weighted mixture of these, we get an exchangeable sequence where the probability of heads on any given toss is 50%. However, the coin tosses in the mixture are no longer independent. If the first \( n \) tosses contain \( h \) heads and \( n - h \) tails, then, by Bayes's rule, the odds that we initially chose the head-biased distribution are not 1:1 but \[ \frac{{n \choose h} {\left(\frac{3}{4}\right)}^h {\left(\frac{1}{4}\right)}^{n-h} }{{n \choose h} {\left(\frac{1}{4}\right)}^h {\left(\frac{3}{4}\right)}^{n-h}} = \frac{3^h}{3^{n-h}} = 3^{2h -n} \] and so the probability that the \( {n+1}^{\mathrm{st}} \) coin toss will be heads is \[ \frac{3}{4}\frac{3^{2h -n}}{1+3^{2h -n}} + \frac{1}{4}\frac{1}{1+3^{2h -n}} = \frac{3^{2h-n+1}+1}{4(3^{2h-n}+1)} \] which will not be 1/2 unless h = n/2.

The de Finetti theorem has some important uses for Bayesian statistical inference — for instance, it lets helps us test prior distributions, i.e., check Bayesian models — but since I am not a Bayesian I don't find those all that compelling. Rather, I am interested in the mathematical extensions. In one direction, de Finetti's theorem is a special case of the decomposition of stationary processes into stationary-and-ergodic processes. (Or, even more generally, of asymptotically mean-stationary processes into AMS-and-ergodic processes.) In another, exchangeability has a natural extension to graphs and arrays, which is important for network data analysis, and where the extremal distributions are rather more interesting than just IID sequences.

Bert Fristedt and Lawrence Gray, A Modern Approach to Probability Theory [Very good chapter on exchangeable sequences]
Olav Kallenberg, Probabilistic Symmetries and Invariance Principles [Extensive, even exhaustive, treatment of exchangeability and its extensions]
Steffen L. Lauritzen, "Exchangeability and de Finetti's Theorem" [PDF slides]

Patrizia Berti, Irene Crimaldi, Luca Pratelli, and Pietro Rigo, "Rate of convergence of predictive distributions for dependent data", Bernoulli 15 (2009): 1351--1367, arxiv:1001.2152 [See comments under Statistical Learning Theory with Dependent Data]
Patrizia Berti and Pietro Rigo, "A Glivenko-Cantelli theorem for exchangeable random variables", Statistics and Probability Letters 32 (1997): 385--391 [See comments under Statistical Learning Theory with Dependent Data]
Tamara Broderick, Jim Pitman, Michael I. Jordan, "Feature allocations, probability functions, and paintboxes", arxiv:1301.6647
Persi Diaconis and David Freedman, "Finite Exchangeable Sequences", Annals of Probability (1980): 745--764
E. Hewitt and L. J. Savage, "Symmetric measures on Cartesian products", Transactions of the American Mathematical Society 80 (1955): 470--501
Matthew O. Jackson, Ehud Kalai and Rann Smorodinsky, "Bayesian Representation of Stochastic Processes Under Learning: De Finetti Revisited", Econometrica 67 (1999): 875--893 [JSTOR]
J. F. C. Kingman, "Uses of Exchangeability", Annals of Probability 6 (1978): 183--197
Vladimir Pestov, "Predictive PAC learnability: a paradigm for learning from exchangeable input data", arxiv:1006.1129 [See comments under Statistical Learning Theory with Dependent Data]

CRS and Aryeh Kontorovich, "Predictive PAC Learning and Process Decompositions", NIPS 2013, arxiv:1309.4859

Tim Austin, Dmitry Panchenko, "A hierarchical version of the de Finetti and Aldous-Hoover representations", arxiv:1301.1259