Notebooks

Ergodic Theory of Markov and Related Processes

Last update: 01 May 2025 23:37
First version: 20 August 2007

\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \]

Yet Another Inadequate Placeholder for references.

--- Oh, OK, I'll take a stab at this. Suppose we have a $k$-state Markov chain, with current state \( X_t \), and with transition matrix $\mathbf{p}$. By an ancient convention, $\mathbf{p}_{ij}$ is the probability of going from state $i$ to state $j$, $\mathbf{p}_{ij} = \Prob{X_{t+1}=j|X_t=i}$, so if $f$ is a length-$k$ row vector of state probabilities at one time, $f \mathbf{p}$ is the vector of state probabilities at the next time step, and $f \mathbf{p}^t$ is the vector of state probabilities $t$ steps later. Let's assume that $\mathbf{p}$ is irreducible (it has only a single communicating class) and aperiodic, so we can go from every state to every other state and back again, and there are no periodicities to the possible return times. Then we know that (i) the largest eigenvalue of $\mathbf{p}$, $\lambda_1$, is 1, and (ii) there is a unique all-positive left eigenvector going along with this eigenvalue, say $v_1$. (That's the Frobenius-Perron theorem, or at any rate its application to this situation.) We can scale this vector until its entries sum to 1. Since they're non-negative and add up to 1, they're a possible probability distribution on the states of the chain. Since $v_1 \mathbf{p} = v_1$, this distribution is invariant.
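For concreteness, here is a minimal numerical sketch of this (Python with numpy; the $3 \times 3$ transition matrix is an arbitrary illustrative choice): find $v_1$ as the left eigenvector of $\mathbf{p}$ for the eigenvalue 1, normalize its entries to sum to 1, and check invariance.

```python
import numpy as np

# Row-stochastic transition matrix: p[i, j] = P(X_{t+1} = j | X_t = i).
# The particular numbers are an arbitrary illustrative choice.
p = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Left eigenvectors of p are right eigenvectors of p.T.
eigvals, eigvecs = np.linalg.eig(p.T)
# Pick the eigenvalue (numerically) closest to 1, i.e. the Perron eigenvalue.
i = np.argmin(np.abs(eigvals - 1.0))
v1 = np.real(eigvecs[:, i])
v1 = v1 / v1.sum()   # scale the entries so they sum to 1

print(v1)        # the invariant distribution
print(v1 @ p)    # equals v1, up to floating-point error
```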

Now further assume that the linear-algebra gods are kind and the left eigenvectors of $\mathbf{p}$, $v_1, v_2, \ldots, v_k$, form a basis, so we can write \[ f=\sum_{i=1}^{k}{c_i v_i} \] Now the dynamics under the Markov chain are simple: \[ f \mathbf{p} = \sum_{i=1}^{k}{c_i v_i \mathbf{p}} = \sum_{i=1}^{k}{c_i \lambda_i v_i} \] and indeed \[ f \mathbf{p}^t = \sum_{i=1}^{k}{c_i \lambda_i^t v_i} \] Since 1 is the dominant eigenvalue, $|\lambda_i| < 1$ for $i > 1$. Thus every component of $f \mathbf{p}^t$ which comes from the other eigenvectors is shrinking exponentially fast, and \[ f \mathbf{p}^t \rightarrow c_1 v_1 \] Since $f \mathbf{p}^t$ must be a probability distribution, and so add up to 1, we can conclude that $c_1=1$, and thus \[ f \mathbf{p}^t \rightarrow v_1 \] That is, starting from any initial distribution, the distribution of the Markov chain approaches the invariant distribution. This holds in particular even if $f$ puts probability 1 on a particular state $x$: \[ X_{t}|X_0=x \rightsquigarrow v_1 \] This means that \( X_t \) becomes asymptotically independent of \( X_0 \) as $t\rightarrow\infty$. But since there's nothing special about the starting state, that means that \( X_t \) and \( X_s \) become asymptotically independent as $|t-s| \rightarrow \infty$.
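Again as a sketch (same illustrative matrix as above), we can watch $f \mathbf{p}^t$ converge to $v_1$ from a point mass, with the error shrinking roughly like $|\lambda_2|^t$:

```python
import numpy as np

p = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Invariant distribution, found as in the previous sketch.
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

# The second-largest eigenvalue modulus sets the convergence rate.
lambda2 = np.sort(np.abs(eigvals))[-2]

f = np.array([1.0, 0.0, 0.0])   # probability 1 on state 0
for t in [1, 2, 5, 10, 20, 40]:
    ft = f @ np.linalg.matrix_power(p, t)
    # L1 distance from v1, next to the predicted geometric envelope
    print(t, np.abs(ft - v1).sum(), lambda2 ** t)
```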

Now consider some real-valued function of the state, $h(X_t)$. By the argument above, $h(X_t)$ and $h(X_s)$ will be asymptotically independent as $|t-s| \rightarrow \infty$. In fact, one can show that the correlation between $h(X_t)$ and $h(X_s)$ will decay exponentially fast, with the exponent depending on the second eigenvalue $\lambda_2$. (I'll write this out some other time; it's more linear algebra than this nap will allow.) But now we invoke the world's simplest ergodic theorem and conclude that $\frac{1}{n}\sum_{t=1}^{n}{h(X_t)}$ converges (in mean square) to the expected value of $h(X)$ under the invariant distribution $v_1$.
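And a simulation sketch of that last claim (same illustrative chain as before; the function $h$ is also an arbitrary choice): the time average of $h$ along a single long trajectory should come out close to the expectation of $h$ under $v_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Invariant distribution, as before.
eigvals, eigvecs = np.linalg.eig(p.T)
v1 = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
v1 = v1 / v1.sum()

h = np.array([0.0, 1.0, 4.0])   # an arbitrary function of the state

# One long trajectory of the chain, accumulating the running sum of h.
n = 100_000
x = 0
total = 0.0
for _ in range(n):
    total += h[x]
    x = rng.choice(3, p=p[x])

print(total / n)   # time average of h along the trajectory
print(v1 @ h)      # expectation of h under the invariant distribution
```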
