## Cumulants, and Cumulant Generating Functions

*24 Mar 2024 13:32*

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \]Attention conservation notice: Mostly an embarrassing admission of not understanding stuff I should have grasped decades ago.

I still have no real intuition for cumulants, despite their importance to
probability theory, their role in quantum field theory (which I first studied
more than a quarter century ago), and despite my having used cumulant
generating functions in multiple papers
(e.g.). Since I am middle aged,
and trying to be more shameless about admitting to ignorance, I will follow my
usual procedure, of writing down what I *do* understand, until I get to
the point where I *know* I'm lost. This includes a lot of "working
things out from first principles", which really means "reconstructing from
memory (badly)".

## Moment generating functions

I understand, I think, moment generating functions. I start with
my favorite random variable \( X \), which has (I assume) **moments** \( \Expect{X^k} \) for all integer \( k \). As was revealed to us
by the Illuminati, if I collect all of those in the right power
series,
\[
M_X(t) \equiv \sum_{k=0}^{\infty}{\frac{\Expect{X^k}}{k!}t^k}
\]
I get a function of \( t \), the **moment generating function** (MGF),
where the derivatives at the origin "encode" the moments:
\[
\left. \frac{d^k M_X}{d t^k} \right|_{t=0} = \Expect{X^k}
\]
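To make this concrete, here's a quick symbolic check (my own toy example, using sympy): the Exponential(1) distribution has MGF \( M(t) = 1/(1-t) \) for \( t < 1 \), and moments \( \Expect{X^k} = k! \), so differentiating the MGF at the origin should spit those factorials back out.

```python
import sympy as sp

t = sp.symbols('t')
# MGF of the Exponential(1) distribution (valid for t < 1)
M = 1 / (1 - t)

# The k-th derivative at t = 0 should recover the k-th moment, E[X^k] = k!
moments = [sp.diff(M, t, k).subs(t, 0) for k in range(5)]
print(moments)  # [1, 1, 2, 6, 24]
```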

(For this to make sense dimensionally, the units of \( t \) need to be the reciprocal of whatever the units of \( X \) happen to be --- inverse kilograms, or inverse dollars, or square inches per pound, as the case may be. --- Also, I will sometimes drop the subscript \( X \) for brevity, but only when it can be understood from context.)

#### Generating Functions: Why???!?

At this point, while I'm in a confessional mood, I should mention that I
never got the *point* of generating functions while I was a student. (I
am sure my teachers explained it but I tuned them out or otherwise didn't get
it.) If you start from the definition, as I gave it above, then it seems like
you have to already know all the moments to get the MGF, so the MGF
doesn't *tell* you anything. Maybe there are some circumstances where
if you forget one of the moments but remember the generating function, it's easier to
differentiate \( M_X \) than it is to integrate \( \int{ x^k p(x) dx } \), but
that hardly seemed important enough to warrant all the fuss. The answer, which
didn't click for me until embarrassingly far into my teaching career, is that
generating functions are useful when there's a trick to *get the generating
function first*, and *then* we can differentiate it to extract the
series it encodes.

#### Back to the Moment Generating Function

Expectations are linear, so we can equally well write \[ M_X(t) = \Expect{\sum_{k=0}^{\infty}{\frac{X^k}{k!} t^k}} \] and, recognizing the power series inside the expectation, \[ M_X(t) = \Expect{e^{tX}} \] Indeed many sources will start with that exponential form as the definition, which makes things look a little more mysterious. But the form facilitates manipulations. For instance, what's the MGF of \( a+X \), for constant \( a \)? \[ \Expect{e^{t(a+X)}} = \Expect{e^{ta} e^{tX}} = e^{ta} \Expect{e^{tX}} = e^{ta} M_X(t) \] What's the MGF of \( b X \), for constant \( b \)? \[ \Expect{e^{t bX}} = \Expect{e^{(tb)X}} = M_X(bt) \]
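The shift and scale rules are easy to confirm symbolically; here's a sketch (my check, not in the original) using the standard Gaussian, whose MGF is \( e^{t^2/2} \), with arbitrary constants \( a = 3 \) and \( b = 2 \):

```python
import sympy as sp

t = sp.symbols('t')
a, b = 3, 2  # arbitrary shift and scale constants for the check
M = sp.exp(t**2 / 2)  # MGF of the standard Gaussian N(0,1)

M_shift = sp.exp(a * t) * M   # the rule says this is the MGF of a + X
M_scale = M.subs(t, b * t)    # and this is the MGF of b X

# E[a + X] = a, since E[X] = 0 for the standard Gaussian
mean_shift = sp.diff(M_shift, t).subs(t, 0)
# E[(bX)^2] = b^2, since E[X^2] = 1
second_moment_scale = sp.diff(M_scale, t, 2).subs(t, 0)
print(mean_shift, second_moment_scale)  # 3 4
```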

Even more importantly, if \( X \) and \( Y \) are statistically
independent, what's the MGF of their sum \( X+Y \)?
\[
M_{X+Y}(t) = \Expect{e^{t(X+Y)}} = \Expect{e^{tX} e^{tY}} = \Expect{e^{tX}} \Expect{e^{tY}} = M_X(t) M_Y(t)
\]
so the MGF for the sum of two independent random variables is just the product of their individual MGFs. One can, in fact, use these three rules to give
a heuristic "derivation" of the central limit theorem: define \( \overline{X}_n = n^{-1}\sum_{i=1}^{n}{X_i} \) for independent and identically distributed \( X_i \), with mean \( \mu \) and variance \( \sigma^2 \), then
\[
\sqrt{n}\frac{\overline{X}_n - \mu}{\sigma} \rightarrow \mathcal{N}(0,1)
\]
the right-hand side being the standard Gaussian (or "Normal") distribution.
(Exercise: Do this, first working out the MGF of the standard Gaussian.) I put
"derivation" in scare quotes because what this shows is that the (appropriately
centered and scaled) sample mean ends up having increasingly
Gaussian-looking *moments*, and it's not *quite* true
that convergence of all the moments implies convergence of the whole
distribution.
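Numerically, the heuristic is easy to watch in action. For iid Exponential(1) variables (\( \mu = \sigma = 1 \), my choice of example), the three rules give the standardized sample mean the exact MGF \( e^{-t\sqrt{n}} (1 - t/\sqrt{n})^{-n} \) for \( t < \sqrt{n} \), which should approach the standard Gaussian's MGF \( e^{t^2/2} \) as \( n \) grows:

```python
import math

def mgf_standardized_mean(t, n):
    """Exact MGF of sqrt(n)*(Xbar - 1), with Xbar the mean of n iid
    Exponential(1) draws (mu = sigma = 1); valid for t < sqrt(n)."""
    s = t / math.sqrt(n)
    return math.exp(-t * math.sqrt(n)) * (1 - s) ** (-n)

t = 0.7
gaussian_mgf = math.exp(t**2 / 2)  # MGF of N(0,1)
for n in (10, 100, 10000):
    # The gap to the Gaussian MGF shrinks as n grows
    print(n, mgf_standardized_mean(t, n), gaussian_mgf)
```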

## From the Moment Generating Function to the Cumulant Generating Function and Then Cumulants

So much for the moment generating function. The **cumulant generating
function** is defined in terms of the MGF:
\[
C_X(t) \equiv \log{M_X(t)} = \log{\Expect{e^{tX}}}
\]
The **cumulants** are defined, as it were, derivatively:
\[
\kappa_k \equiv \left. \frac{d^k C_X}{d t^k}\right|_{t=0}
\]

(I have sometimes seen people try to motivate this by claiming to want a function that's on the same scale as \( X \), rather than exponentiated, but I'm not sure how that makes sense. As I said earlier, \( t \) needs to have units inverse to \( X \), so \( t X \) is already a dimensionless quantity...)

Some character-building work with the chain rule and quotient rule shows that \begin{eqnarray} \kappa_1 & = & \frac{M^{\prime}(0)}{M(0)}\\ & = & \Expect{X}\\ \kappa_2 & = & \frac{M(0) M^{\prime\prime}(0) - (M^{\prime}(0))^2}{M^2(0)}\\ & = & \Expect{X^2} - (\Expect{X})^2 \equiv \mathrm{Var}(X)\\ \kappa_3 & = & \frac{M^2(0) M^{\prime\prime\prime}(0) - 3 M(0) M^{\prime}(0) M^{\prime\prime}(0) + 2 (M^{\prime}(0))^3}{M^3(0)}\\ & = & \Expect{X^3} - 3\Expect{X^2}\Expect{X} + 2(\Expect{X})^3 \end{eqnarray}
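Sympy can do the character-building work for us: write the MGF as a truncated power series with symbolic moments, take the log, and read off the Taylor coefficients. (This is my check of the algebra above, not anything deeper.)

```python
import sympy as sp

t = sp.symbols('t')
m1, m2, m3 = sp.symbols('m1 m2 m3')  # stand-ins for E[X], E[X^2], E[X^3]

# The MGF as a power series, truncated after the t^3 term
M = 1 + m1*t + m2*t**2/2 + m3*t**3/6

# The k-th cumulant is k! times the t^k coefficient of log M
C = sp.expand(sp.log(M).series(t, 0, 4).removeO())
kappa = [sp.expand(sp.factorial(k) * C.coeff(t, k)) for k in range(1, 4)]
print(kappa)  # kappa_1, kappa_2, kappa_3 in terms of the moments
```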

The first cumulant is the first moment. The second cumulant is the second *central* moment, because \( \Expect{X^2} - (\Expect{X})^2 = \Expect{(X-\Expect{X})^2} \). The third cumulant is actually also the third
central moment, but this is *not* true of the higher cumulants.

Now here is where my failure-to-grasp begins. (Or maybe it began earlier
and this is just where it becomes unmistakable.) I understand what the first
three cumulants are saying about the distribution. I have no intuition for
what the fourth cumulant measures, or any higher cumulant, or why *all*
of these are measuring the same *kind* of thing, the way I grasp how all
the moments, and all the central moments, are measuring the same kind of thing.
I can *show*, algebraically, that the \( k^{\mathrm{th}} \) cumulant is a
polynomial, of order \( k \), in the first \( k \) moments, and that the \(
k^{\mathrm{th}} \) moment \( \Expect{X^k} \) is always the first term in that
polynomial. (And once I've shown that, I can recover the moments from the
cumulants.) But *why* those are the *right* polynomials, I can't
tell you (or a student).

(I realize that a moment ago I was talking about how, as a student, I didn't see the point of the moment generating function when it seemed that we needed the moments to get it, and now I'm complaining that I don't understand the point of the cumulants which are, seemingly, best defined in terms of the cumulant generating function. The common themes here, across decades, are about my unpleasant combination of intellectual arrogance with mathematical ineptitude.)

I *can* tell you that when you take the sum of two independent
random variables, their cumulant generating functions add,
\[
C_{X+Y}(t) = \log{\Expect{e^{tX} e^{tY}}} = \log{\Expect{e^{tX}}} + \log{\Expect{e^{tY}}} = C_X(t) + C_Y(t)
\]
and consequently their cumulants, of whatever order, must add.
I *think* it's the case that if you demand polynomials in the moments
which add up for sums of independent random variables, regardless of their
distribution, you are forced to use the cumulants, but I'm not sure of that.
(That sounds like the kind of fact I could look up, if I were sufficiently
motivated.) Even if so, that doesn't give me any intuition about why (say)
the third cumulant needs to be \( \Expect{X^3} - 3\Expect{X^2}\Expect{X} +
2(\Expect{X})^3 \).
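The additivity of cumulants is easy to see in a concrete family (my example): the Poisson(\( \lambda \)) distribution has CGF \( C(t) = \lambda(e^t - 1) \), so *every* cumulant equals \( \lambda \), and the cumulants of a sum of independent Poissons are just the sums of the rates.

```python
import sympy as sp

t, lam1, lam2 = sp.symbols('t lambda1 lambda2')

def poisson_cgf(lam):
    # CGF of a Poisson(lambda) variable: log E[e^{tX}] = lambda*(e^t - 1)
    return lam * (sp.exp(t) - 1)

def cumulant(C, k):
    # k-th derivative of the CGF at t = 0
    return sp.diff(C, t, k).subs(t, 0)

# Every cumulant of Poisson(lambda1) is lambda1 itself
print([cumulant(poisson_cgf(lam1), k) for k in range(1, 5)])

# CGFs add for independent variables, so cumulants of the sum add
C_sum = poisson_cgf(lam1) + poisson_cgf(lam2)
print([sp.simplify(cumulant(C_sum, k)) for k in range(1, 5)])
```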

#### The Cumulant Generating Function and Exponential Tail Bounds

Suppose the moment generating function exists. Then it takes a little work with Markov's inequality to conclude that \[ \Prob{ X > h } \leq e^{-th} M_X(t) = e^{-th+C_X(t)} \] for any \( t > 0 \). Notice that the \( e^{-th} \) factor declines exponentially as \( h \) increases, but the \( e^{C_X(t)} \) factor doesn't depend on \( h \), so we're getting an exponential bound on the probability of large values of \( X \). (And we'd better! Otherwise, \( \Expect{ e^{tX} } \) would be infinite.) Because this is true for any \( t \), we can optimize to get the tightest bound: \[ C_X^*(h) = \sup_{t > 0}{\left\{ th - C_X(t) \right\}} \] and then \[ \Prob{X > h} \leq e^{-C^*_X(h)} \] So the cumulant generating function, through its transformed version \( C^* \), lets us upper-bound the probability of very large values of \( X \). Additivity for independent random variables then becomes a handy way of getting laws of large numbers, the upper-bound half of large deviation principles, etc.

#### "Understanding"

As I write this all out, it occurs to me that I am not really sure what
*would* satisfy me here. A physical sense of what the cumulants measure
would be ideal, but perhaps a bit much to hope for. Another possibility would
be something like the following: I can never remember the exact coefficients of
the Hermite polynomials, or the Laguerre polynomials, or any of the other
orthogonal polynomial series. But I *can* remember that the members of
each family are supposed to be orthogonal to each other, under such-and-such a
distribution. Since there are \( n+1 \) coefficients in the polynomial of
order \( n \), and it needs to be orthogonal to the previous \( n \)
polynomials (including the order-zero, constant polynomial) and the leading
term should be just \( x^n \), there are \( n+1 \) linear equations to solve
for the coefficients, and I can find each successive polynomial recursively.
Moreover, I get why we want orthogonal functions! So one possible answer to my
puzzlement about cumulants would be "here is a desirable property of some
transformation of the moments, and here's a rule for getting those
transformations", with bonus points if it's a recurrence relation which tells
me how to get higher cumulants once I know low-order ones. (Something like
"this is the part of the kth moment you couldn't guess, somehow, from the lower
moments"?) Or, alternately, "Here's a desirable property of some
transformation of the moments, and here's a procedure for getting all the terms
which have to go in to the kth order cumulant".
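The orthogonal-polynomial recipe I just described can be written out in a few lines; here's a sketch (assuming the standard Gaussian as the weighting distribution, which yields the probabilists' Hermite polynomials): start from \( x^n \) and subtract off its projections onto the lower-order polynomials.

```python
import sympy as sp

x = sp.symbols('x')
w = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)  # standard Gaussian density

def inner(p, q):
    # Inner product of polynomials under the Gaussian weight
    return sp.integrate(p * q * w, (x, -sp.oo, sp.oo))

# Build monic orthogonal polynomials recursively: take x^n and remove
# its components along all the previously-built polynomials
polys = []
for n in range(4):
    p = x**n
    for q in polys:
        p = p - inner(x**n, q) / inner(q, q) * q
    polys.append(sp.expand(p))
print(polys)  # 1, x, x**2 - 1, x**3 - 3*x: probabilists' Hermite polynomials
```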

With any luck, this is all well-known in some corner of the mathematical universe, and posting this will lead to my being pointed in the right direction.

## Partition functions

Suppose I'm doing equilibrium statistical mechanics, and have a bunch of discrete states \( k=0, 1, \ldots \), each with energy \( u_k \). Under the Boltzmann distribution the probability of finding the system in state \( j \) is then \[ \frac{e^{-\beta u_j}}{\sum_{k=0}^{\infty}{e^{-\beta u_k}}} \] where \( \beta = 1/(k_B \tau) \), \( \tau \) is the absolute temperature, and \( k_B \) is Boltzmann's constant. The normalizing factor in the denominator gets broken out as its own thing, the **partition function**, \[ z(\beta) \equiv \sum_{k=0}^{\infty}{e^{-\beta u_k}} \]

This isn't the moment generating function, but it's *close* to the
moment generating function, thanks to the magic properties of exponentials:
\begin{eqnarray}
M_U(t) & = & \Expect{e^{tU}}\\
& = & \sum_{k}{e^{t u_k} \frac{e^{-\beta u_k}}{z(\beta)}}\\
& = & \frac{\sum_{k}{e^{-(\beta-t) u_k}}}{z(\beta)}\\
& = & \frac{z(\beta-t)}{z(\beta)}
\end{eqnarray}
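This identity is painless to verify numerically; here's a check with a handful of made-up energy levels (the levels and temperature are arbitrary, just for illustration):

```python
import numpy as np

u = np.array([0.0, 1.0, 2.5, 4.0])  # hypothetical discrete energy levels
beta = 1.3                          # arbitrary inverse temperature

def z(b):
    # Partition function: sum of Boltzmann weights
    return np.sum(np.exp(-b * u))

# Boltzmann probabilities at inverse temperature beta
p = np.exp(-beta * u) / z(beta)

t = 0.4
# MGF of the energy U computed directly from the distribution...
mgf_direct = np.sum(np.exp(t * u) * p)
# ...and as a ratio of partition function values
mgf_ratio = z(beta - t) / z(beta)
print(mgf_direct, mgf_ratio)  # the two agree
```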

That is, the MGF is the *ratio* of values of the partition function.
It follows that
\[
C_U(t) = \log{z(\beta-t)} - \log{z(\beta)}
\]

So the cumulant generating function is the difference between log partition functions. In terms of extracting the cumulants, however, what we care about are derivatives with respect to \( t \), so the second term doesn't actually matter.

Now, in statistical mechanics, we know that \( f(\beta) \equiv
-\beta^{-1}\log{z(\beta)} \) is the **(Helmholtz) free energy**, so we've
just convinced ourselves that the free energy is basically the cumulant
generating function (at least for Boltzmann distributions, up to that
factor of \( -\beta^{-1} \)). But this is a
little funny; in stat. mech. we're taught to take derivatives of \( \log{z} \) *at
inverse temperature* \( \beta \) to find the expected energy, the variance
around it, etc., but to extract cumulants we take derivatives at \( t=0
\). But
\[
C^{\prime}(0) = \left. -\frac{z^{\prime}(\beta-t)}{z(\beta-t)}\right|_{t=0} = -\frac{z^{\prime}(\beta)}{z(\beta)}
\]
and so on for the higher derivatives, which is (up to signs) what we'd get by taking \( \beta \)-derivatives of the log partition function...
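Indeed, \( -z^{\prime}(\beta)/z(\beta) = -\frac{d}{d\beta}\log{z(\beta)} \) is exactly the mean energy under the Boltzmann distribution, which we can confirm with a finite-difference check (same made-up energy levels as before, my toy example):

```python
import numpy as np

u = np.array([0.0, 1.0, 2.5, 4.0])  # hypothetical discrete energy levels
beta = 1.3                          # arbitrary inverse temperature

def log_z(b):
    # Log partition function
    return np.log(np.sum(np.exp(-b * u)))

# Mean energy under the Boltzmann distribution at inverse temperature beta
p = np.exp(-beta * u) / np.sum(np.exp(-beta * u))
mean_energy = np.sum(u * p)

# C'(0), with C(t) = log z(beta - t) - log z(beta),
# via a centered finite difference in t
h = 1e-6
c_prime_0 = (log_z(beta - h) - log_z(beta + h)) / (2 * h)
print(mean_energy, c_prime_0)  # the two agree
```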

- To read:
  - Marco Bianucci, Mauro Bologna, "About the foundation of the Kubo Generalized Cumulants theory. A revisited and corrected approach", Journal of Statistical Mechanics: Theory and Experiment (2020): 043405, arxiv:1911.09620
  - Patric Bonnier, Harald Oberhauser, "Signature Cumulants, Ordered Partitions, and Independence of Stochastic Processes", arxiv:1908.06496
  - Philippe Flajolet and Robert Sedgewick, Analytic Combinatorics
  - Jonathan Novak, Michael LaCroix, "Three lectures on free probability", arxiv:1205.2097
  - Giovanni Peccati and Murad S. Taqqu
    - "Moments, cumulants and diagram formulae for non-linear functionals of random measures", arxiv:0811.1726
    - Wiener Chaos: Moments, Cumulants and Diagrams: A survey with Computer Implementation
  - Herbert S. Wilf, Generatingfunctionology