Exchangeable Random Sequences

09 Aug 2019 12:31

A sequence \( X \) of random variables, \( X_1, X_2, \ldots \) is exchangeable if its distribution is invariant under permutation of the indices. What implications does such symmetry have for a stochastic process?

(To make this precise needs some notation. Start with a finite vector of non-repeating indices \( I \), say (1, 30, 2, 47, 94980) for definiteness. Then the collection of the \( X_i \) at those indices, in that order, \( X_I \), has some distribution or other. [Formally speaking, the distribution of \( X_I \) is obtained by applying a projection operator to the infinite-dimensional distribution of \( X \).] Now pick some permutation \( s \) of all the indices, not necessarily just those in \( I \). Write \( sX \) for the sequence we get by permuting \( X \), so \( {(sX)}_j = X_{s^{-1}(j)} \), and so on. For instance, s might swap each odd-numbered index with the even-numbered index one larger, so 1 with 2, 30 with 29, 47 with 48, 94980 with 94979. We say that the sequence \( X \) is exchangeable when \( X_I \) has the same distribution as \( {(sX)}_{I} \), for all permutations \( s \) and all finite sets of indices \( I \). [The Kolmogorov extension theorem then tells us that X and sX have the same infinite-dimensional distribution.] In particular, we'd need \( X_{(1, 30, 2, 47, 94980)} = X_{(2, 29, 1, 48, 94979)} \). In fact, it is easy to check that two necessary conditions for X to be exchangeable are that (i) \( X_I \) has the same distribution as \( X_J \) whenever \( I \) and \( J \) have the same length; and (ii) for each \( n \), the distribution of \( X_{1:n} \) is invariant under permutations of the first \( n \) indices. I leave it as an exercise to the reader to determine whether the combination of these necessary conditions is sufficient for exchangeability.)

It's easy to confirm that a sequence of independent, identically-distributed (IID) random variables is exchangeable. This at least shows that the concept is not self-contradictory. Is it, however, just another way of saying "IID"? The answer, interestingly enough, turns out to be "almost". The classical result, generally called "de Finetti's theorem" after the person who proved the first version, is that every infinite-dimensional exchangeable distribution \( m \) is a mixture of infinite-dimensional IID distributions: \[ m = \int_{\mathrm{IID}}{\lambda dp(\lambda)} \] where \( p \) is some prior distribution over IID distributions \( \lambda \). In terms of random variables and sequences, the way to generate an infinite exchangeable sequence is to first randomly chose a distribution for \( X_1 \), and then keep drawing all later \( X_i \), independently, from that same distribution. Symbolically, \[ \begin{eqnarray*} \lambda & \sim & p\\ X_i|\lambda & \sim_{\mathrm{IID}} & \lambda \end{eqnarray*} \] (It is important that the sequences be infinite here. A finite-dimensional exchangeable distribution is not necessarily a mixture of finite-dimensional IID distributions, though in a sense it must be close to such a mixture; see Diaconis and Freedman.)

A mixture of IID distributions is not IID. To see this, consider a very simple example, where the Xi are binary, say coin-tosses, with some bias towards heads. There is a well-defined distribution where all the coin-tosses are independent, and the probability of heads is 75% per toss. There is another where the coin tosses are independent, and the probability of heads is only 25%. If we take an equally-weighted mixture of these, we get an exchangeable sequence where the probability of heads on any given toss is 50%. However, the coin tosses in the mixture are no longer independent. If the first \( n \) tosses contain \( h \) heads and \( n - h \) tails, then, by Bayes's rule, the odds that we initially chose the head-biased distribution are not 1:1 but \[ \frac{{n \choose h} {\left(\frac{3}{4}\right)}^h {\left(\frac{1}{4}\right)}^{n-h} }{{n \choose h} {\left(\frac{1}{4}\right)}^h {\left(\frac{3}{4}\right)}^{n-h}} = \frac{3^h}{3^{n-h}} = 3^{2h -n} \] and so the probability that the \( {n+1}^{\mathrm{st}} \) coin toss will be heads is \[ \frac{3}{4}\frac{3^{2h -n}}{1+3^{2h -n}} + \frac{1}{4}\frac{1}{1+3^{2h -n}} = \frac{3^{2h-n+1}+1}{4(3^{2h-n}+1)} \] which will not be 1/2 unless h = n/2.

The de Finetti theorem has some important uses for Bayesian statistical inference — for instance, it lets helps us test prior distributions, i.e., check Bayesian models — but since I am not a Bayesian I don't find those all that compelling. Rather, I am interested in the mathematical extensions. In one direction, de Finetti's theorem is a special case of the decomposition of stationary processes into stationary-and-ergodic processes. (Or, even more generally, of asymptotically mean-stationary processes into AMS-and-ergodic processes.) In another, exchangeability has a natural extension to graphs and arrays, which is important for network data analysis, and where the extremal distributions are rather more interesting than just IID sequences.

See also: Characterizing Mixtures of Processes by Summarizing Statistics