Books to Read While the Algae Grow in Your Fur, January 2015
Attention conservation notice: I have no taste.
- Jin Feng and Thomas
G. Kurtz, Large Deviations for Stochastic Processes
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)}
\]
- Kurtz is best known for work in the 1970s and 1980s on how sequences of
Markov processes converge on a limiting Markov process, especially on how they
converge on a limiting deterministic dynamical system. This book is
an extension of those ideas, and best appreciated in that context.
- As every school-child knows, we ordinarily specify a Markov process by
its transition probabilities or transition kernels, say
\[
\kappa_t(x,B) = \Prob{X_{t}\in B \mid X_0=x} ~.
\]
The transition kernels form a semi-group: $\kappa_{t+s}(x,B) =
\int{\kappa_t(y,B) \kappa_s(x,dy)}$. For analytical purposes, however, it is
more convenient to talk about transition operators, which give us conditional
expectations:
\[
T_t f(x) = \Expect{f(X_{t})\mid X_0=x} = \int{f(y)\kappa_t(x,dy)} ~.
\]
(It's less obvious, but if we're given all the transition operators, they fix
the kernels.) These, too, form a semi-group: $T_{t+s} f(x) = T_t T_s f(x)$.
The generator $A$ of the semi-group $T_t$ is, basically, the time-derivative
of the transition operators, the limit
\[
A f(x) = \lim_{t\rightarrow 0}{\frac{T_t f(x) - f(x)}{t}} ~,
\]
so
\[
\frac{d}{dt}{T_t f} = A T_t f ~.
\]
A more precise statement, however, which explains the name "generator", is that
\[
T_t f = e^{t A}f = \sum_{m=0}^{\infty}{\frac{(tA)^m f}{m!}} ~.
\]
Notice that the transition operators and their generator are all linear
operators, no matter how nonlinear the state-to-state transitions of the Markov
process may be. Also notice that a deterministic dynamical system has
a perfectly decent transition operator: writing $g(x,t)$ for the point reached
at time $t$ by the trajectory beginning at $x$ at time $0$, $T_t f(x) = f(g(x,t))$, and
\[
A f(x) = {\left.\frac{d T_t f(x)}{dt}\right|}_{t=0} ~.
\]
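- To make the generator more concrete, here are the standard textbook examples
(mine, not the book's): for the deterministic flow $dx/dt = b(x)$, for a
diffusion with drift $b(x)$ and noise level $\sigma(x)$, and for a pure-jump
process which leaves $x$ at rate $\lambda(x)$ and lands at a point drawn from
$q(x, \cdot)$, the generators are, respectively,
\[
\begin{eqnarray*}
A f(x) & = & b(x) \cdot \nabla f(x)\\
A f(x) & = & b(x) \cdot \nabla f(x) + \frac{\sigma^2(x)}{2} \nabla^2 f(x)\\
A f(x) & = & \lambda(x) \int{\left( f(y) - f(x) \right) q(x,dy)}
\end{eqnarray*}
\]
The first of these is just the chain rule applied to $T_t f(x) = f(g(x,t))$.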
- Suppose we have a sequence of Markov processes, $X^{(1)}, X^{(2)}, \ldots$.
What Kurtz and others showed is that these converge in distribution to a
limiting process $X$ when their semi-groups $T^{(n)}_t$ converge to the
limiting semi-group $T_t$. This in turn happens when the generators $A^{(n)}$
converge on the limiting generator $A$. To appreciate why this is
natural, remember that a sequence of distributions $P^{(n)}$ converges
on a limiting distribution $P$ if and only if $\int{f(x) dP^{(n)}(x)}
\rightarrow \int{f(x) dP(x)}$ for all bounded and continuous "test" functions
$f$; and $A^{(n)}$ and $A$ generate the semi-groups which give us conditional
expectations. (Of course, actually proving a "natural" assertion is
what separates real math from mere physicists' hopes.) In saying this,
naturally, I gloss over lots of qualifications and
regularity conditions, but this is the essence of the thing. In particular,
such
results give
conditions under which Markov processes converge on a deterministic dynamical
system, such as an ordinary differential equation. Essentially, the
limiting generator $A$ should be the differential operator which'd go along
with the ODE. These results are laws of large numbers for sequences of Markov
processes, showing how they approach a deterministic limit as the fluctuations
shrink.
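- A standard example of such a limit (my own gloss, drawing on Kurtz's earlier
work rather than on this book): take a population of $N_n(t)$ individuals, each
of which gives birth at rate $b(N_n/n)$ and dies at rate $d(N_n/n)$, and look
at the density $X_n = N_n/n$. The generator of the density process is
\[
A_n f(x) = n x b(x) \left( f(x+1/n) - f(x) \right) + n x d(x) \left( f(x-1/n) - f(x) \right)
\]
and Taylor-expanding $f$ shows that $A_n f(x) \rightarrow x(b(x) - d(x))
f^{\prime}(x)$ as $n \rightarrow \infty$, which is the generator of the ODE
$dx/dt = x(b(x) - d(x))$. The fluctuating densities thus converge on the
solution of that ODE started from the same initial condition.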
- Large deviations theory, as I've
said elsewhere,
tries to make laws of large numbers quantitative. The laws say that
fluctuations around the deterministic limit decay to zero; large deviations
theory gives an asymptotic bound on these fluctuations. Roughly speaking, a
sequence of random variables or processes $X^{(n)}$ obeys the large
deviations principle when
\[
\lim_{n\rightarrow\infty}{n^{-1} \log{\Prob{X^{(n)} \in B}}} = -\inf_{x \in B}{I(x)}
\]
for some well-behaved "rate function" $I$. (Again, I gloss over some
subtleties about the distinctions between open and closed sets.) The subject
of this book is, depending on your point of view, either strengthening Kurtz's
previous work on convergence of Markov processes to large deviations, or
extending the large deviations theory of stochastic processes, as the title
says.
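- The prototypical example (again mine, not the book's): if $X^{(n)}$ is the
average of $n$ IID fair coin tosses, the rate function is the Kullback-Leibler
divergence from a fair coin,
\[
I(x) = x\log{2x} + (1-x)\log{2(1-x)}
\]
for $x \in [0,1]$ (and $+\infty$ outside), so the probability that the average
exceeds, say, $3/4$ shrinks like $e^{-n I(3/4)}$.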
- In dealing with large deviations, it's very common to have to deal with
cumulant generating
functions. The reason is a basic approximation which goes back
to Harald
Cramér in the 1930s. Start with the Markov inequality: for a
non-negative random variable $X$, $\Prob{X \geq a} \leq \Expect{X}/a$. Since
$e^{tx}$ is, for $t > 0$, a non-negative and increasing function of $x$,
\[
\Prob{X\geq a} = \Prob{e^{tX} \geq e^{ta}} \leq e^{-ta} \Expect{e^{tX}} ~ .
\]
Now take the log:
\[
\log{\Prob{X\geq a}} \leq -ta + \log{\Expect{e^{tX}}}
\]
Since this is true for every $t > 0$,
\[
\begin{eqnarray*}
\log{\Prob{X\geq a}} & \leq & \inf_{t > 0}{\left( -ta + \log{\Expect{e^{tX}}} \right)}\\
& = & -\sup_{t > 0}{\left( ta - \log{\Expect{e^{tX}}} \right)}
\end{eqnarray*}
\]
$\log{\Expect{e^{tX}}}$ is precisely the cumulant generating function of $X$,
and $\sup_{t}{ta - \log{\Expect{e^{tX}}}}$ is its Legendre transform.
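- For instance (my example, not the book's), if $X$ has a standard Gaussian
distribution, then $\log{\Expect{e^{tX}}} = t^2/2$, the Legendre transform is
$\sup_{t}{\left( ta - t^2/2 \right)} = a^2/2$ (attained at $t=a$), and the bound
becomes
\[
\Prob{X \geq a} \leq e^{-a^2/2} ~,
\]
which is the right Gaussian tail up to polynomial-in-$a$ factors.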
- Now imagine a sequence of variables $X_1, X_2, \ldots$, where $X_n =
n^{-1}\sum_{i=1}^{n}{Y_i}$, with the $Y_i$ being (for simplicity) IID.
Then a very parallel calculation (again with a fixed $t > 0$) gives an
exponentially shrinking bound on the probability:
\[
\begin{eqnarray*}
\Prob{X_n \geq a} & = & \Prob{\sum_{i=1}^{n}{Y_i} \geq na}\\
\Prob{X_n \geq a} & \leq & e^{-nta}{\left(\Expect{e^{tY_1}}\right)}^n\\
n^{-1}\log{\Prob{X_n \geq a}} &\leq & -\sup_{t > 0}{\left( ta - \log{\Expect{e^{tY_1}}} \right)}
\end{eqnarray*}
\]
Of course, there is still the matter of getting the matching lower bound, which
I won't go into here, but which is attainable.
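- Continuing the Gaussian example (still mine): if the $Y_i$ are IID standard
Gaussians, then $\log{\Expect{e^{tY_1}}} = t^2/2$, the Legendre transform is
again $a^2/2$, and so, for $a > 0$,
\[
n^{-1}\log{\Prob{X_n \geq a}} \leq -\frac{a^2}{2} ~,
\]
which is in fact the exact exponential rate, since $X_n$ is itself Gaussian
with mean $0$ and variance $1/n$.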
- More generally, beyond just looking at the mean, one defines $\Lambda(f) =
\lim_{n\rightarrow\infty}{n^{-1} \log{\Expect{e^{nf(X_n)}}}}$, and the large
deviations rate function $I(x)$ is generally $\sup_{f}{f(x) - \Lambda(f)}$,
taking the supremum over all bounded, continuous functions.
- Accordingly, for each process $X_n$ we define operators which give us
conditional cumulant functions:
\[
V_n(t)f(x) = n^{-1} \log{\Expect{e^{nf(X_n(t))} \mid X_n(0)=x}}
\]
For fixed $n$ and $t$, $V_n(t)$ is a nonlinear operator (that is, nonlinear in
$f$), but these operators still form a semi-group, $V_n(t+s) = V_n(t) V_n(s)$:
\[
\begin{eqnarray*}
V_n(t) V_n(s) f(x) &= & n^{-1}\log{\Expect{e^{n n^{-1}\log{\Expect{e^{nf(X_n(s))}\mid X_n(0)=X_n(t)}}}\mid X_n(0)=x}}\\
& = & n^{-1}\log{\Expect{\Expect{e^{nf(X_n(s))}\mid X_n(0)=X_n(t)}\mid X_n(0)=x}}\\
& = & n^{-1}\log{\Expect{e^{nf(X_n(s+t))} \mid X_n(0)=x}} = V_n(s+t) f(x)
\end{eqnarray*}
\]
- When we have a sequence of processes, they give rise to a sequence of $V_n$
semi-groups. When that sequence of semi-groups converges to some limiting
semi-group $V(t)$, the sequence of processes obeys a large deviations
principle, and the rate function follows a somewhat complicated expression
involving the limiting operators $V(t)$. Since we're talking about a
probability measure over stochastic processes, the basic "points" are now
trajectories, functions of time $x(t)$, and the rate function is a rather
complicated expression that basically involves taking a path integral of the
conditional cumulant generating functions, and so of $V(t)$. Matters become
drastically simpler if we introduce what is, at least formally, the
generator of the $V_n$ semi-group,
\[
\frac{d}{dt} V_n(t) f = H_n V_n(t) f
\]
or, more explicitly, writing $A_n$ for the generator of $X_n$,
\[
H_n f = n^{-1} e^{-nf} A_n e^{nf}
\]
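- To see what this formula buys us, take (my running example again, not the
book's) the small-noise diffusion $dX_n = b(X_n) dt + n^{-1/2} dW$, with
generator $A_n f = b \cdot \nabla f + \frac{1}{2n} \nabla^2 f$. A little
calculus gives
\[
H_n f = n^{-1} e^{-nf} A_n e^{nf} = b \cdot \nabla f + \frac{1}{2}{\left| \nabla f \right|}^2 + \frac{1}{2n} \nabla^2 f ~,
\]
so $H_n f \rightarrow H f = b \cdot \nabla f + \frac{1}{2}{\left| \nabla f \right|}^2$,
which is the Hamiltonian familiar from Freidlin-Wentzell theory (of which more
below).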
- Convergence of the $H_n$ operators to a limiting operator $H$ is now the key
part of showing large deviations for the processes $X_n$, and the rate function
can be written in terms of the limiting operator $H$.
- $H f(x)$ can typically be written as a function of $x$, $f(x)$ and $\nabla
f(x)$ (or perhaps also higher derivatives of $f$). Similarly, the generator
of the transitions, $A f(x)$, can often be written in terms of $x$, $f(x)$ and
the derivatives of $f$. Bundle everything other than $x$ up into a variable
$u$; then we can often write
\[
H f(x) = \sup_{u} {A f(x,u) - L(x,u) }
\]
for some function $L(x,u)$. The reason for going through all these gyrations is
that then the large-deviations rate function takes a wonderfully simple form:
\[
I(x) = \inf_{u \in \mathcal{J}(x)}\int_{t=0}^{t=\infty}{L(x(t), u(t)) dt}
\]
where $\mathcal{J}(x)$ consists of all functions $u(t)$ such that, for all
reasonable $f$,
\[
f(x(t)) - f(x(0)) = \int_{s=0}^{t}{Af(x(s), u(s)) ds}
\]
- One can think of the limiting generator $A$ as saying what the derivative
of the trajectory ought, in some sense, to be, and of $L$ as measuring how big a
perturbation or disturbance has to be applied to drive the actual trajectory
away from this easiest and most probable path. The rate function $I(x)$ then
indicates the least amount of control that has to be applied to drive the
process along the desired trajectory $x(t)$. ("Improbable events happen
in the most probable possible way.") In fact, one can often simplify
down to
\[
I(x) = \int_{t=0}^{t=\infty}{L(x(t), \dot{x}(t)) dt}
\]
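- Carrying my small-noise diffusion example through to the end: with $H f(x) =
b(x) \cdot \nabla f(x) + \frac{1}{2}{\left| \nabla f(x) \right|}^2$ as above,
one can take the controlled generator to be $A f(x,u) = u \cdot \nabla f(x)$,
so that the control $u$ just sets the velocity of the trajectory, and then
\[
L(x,u) = \frac{1}{2}{\left| u - b(x) \right|}^2 ~,
\]
since $\sup_{u}{\left\{ u \cdot \nabla f(x) - \frac{1}{2}{\left| u - b(x) \right|}^2 \right\}}
= b(x) \cdot \nabla f(x) + \frac{1}{2}{\left| \nabla f(x) \right|}^2$. The rate
function is then the classical Freidlin-Wentzell action,
\[
I(x) = \frac{1}{2}\int_{t=0}^{t=\infty}{{\left| \dot{x}(t) - b(x(t)) \right|}^2 dt} ~,
\]
the cost of pushing the trajectory away from solutions of $\dot{x} = b(x)$.
(Again, this is my quick illustration of the scheme; the book handles the
Freidlin-Wentzell case, and much harder ones, rigorously.)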
- Much of what I've just sketched was already established in the
literature. What's new in this book are two big things: a powerful set of
methods for verifying that the conditional cumulant operators $H_n$ converge on
a limiting $H$, and another set of tools for writing $H$ in variational form.
The former is the machinery
of "viscosity
solutions", which are weaker than the ordinary notion of solutions of
differential equations. The latter brings in the machinery of control theory,
or indeed of classical mechanics — as the notation suggests, $H$ is
(like) a Hamiltonian, and $L$ (like) the corresponding Lagrangian. (The large
deviations principle is thus a principle of least action, or perhaps vice
versa, as suggested some time ago by,
e.g., Eyink.)
- The power of the Feng-Kurtz approach is considerable. Basically the whole
of the classic Freidlin-Wentzell theory of ODEs perturbed by small-amplitude
Gaussian noise falls out as an easy corollary. Moreover, the Feng-Kurtz theory
extends to much more difficult cases, outside the reach of the
Freidlin-Wentzell approach, such as infinite-dimensional interacting particle
systems. While this book is not for the mathematically inadept, it's extremely
rewarding, and, I might add, probably about as well-written as such a book
could possibly be. Chapter 1 describes the main lines of the approach, gives
an extensive catalog of examples with explicitly heuristic reasoning, and then
describes the difficulties in the way of turning the heuristics into actual
theorems. Chapter 2 is a sketch of the main definitions and results that will
be taken up in the rest of the book. Chapters 3--4 review general concepts of
process-level large deviations, specialized in chapter 5 to Markov processes
and convergence of conditional cumulants ("classical semigroup approach").
Chapters 6 and 7 introduce viscosity solutions and their associated machinery.
Chapter 8, in many ways the heart of the book, relates the limiting conditional
cumulant semi-groups to control problems, and derives variational formulae for
rate functions. The remaining chapters are devoted to giving rigorous versions
of the examples from chapter 1.
- Obviously, from the space I've devoted to this, I think this work is
incredibly cool. I recommend it to anyone with the necessary math, and a
serious interest in large deviations, in Markov processes, or in the emergence
of macroscopic regularities from microscopic interactions.
- Charles Stross, Halting State
- Mind candy science fiction, at the intersection of near-future police
procedural, for-profit central banking, and online role-playing games. It
gained quite a bit as an audiobook through being read by a Scots actor, one
Robert Ian MacKenzie (though I had to re-listen to some bits to figure out what
was being said). — Sequel.
- Stephen José Hanson
and Martin Bunzl
(eds.), Foundational Issues in Human Brain Mapping
- Like all paper collections, mixed. There is actually a certain amount of
dialogue here, which is unusual and speaks highly of the organizers, and the
papers are mostly (*) of high quality. The first three chapters are an
exchange about the use of "functional localizers" (Friston et al. con,
Saxe et al. pro); chapters 6--8 another on the related topic of
"non-independence" (Vul and Kanwisher are against it, others try to salvage
something); many of them (e.g., Poldrack's ch. 13) deal with the troublesome
idea of "pure insertion", that it makes sense to just add one psychological
process into a task or setting, leaving the others undisturbed; many of them
(among others, ch. 13 again, and chs. 17--19) revolve around the question of
how much of psychology we need to get right, or can get right, before fMRI
results become anything more
than blinkenlights.
Of the non-controversial chapters, I would highlight Haxby on multivariate
methods, Poline et al. on inter-subject variability, and Hanson and
Glymour on causal inference (though something has gone horribly wrong with
the figures and captions in that last chapter).
- Disclaimer: While I don't think I've met either editor, I've met
at least two of the contributors at conferences, corresponded with and met
another, and owe considerable intellectual and professional debts to a fourth.
Writing a really negative review of this collection might've been awkward for
me, but it would've been easy to say nothing at all.
- *: I can't help pointing out the conspicuous
exception of chapter 12, which is so transparently a recycled grant proposal
that it still refers to itself as "this proposal" (e.g., pp. 136 and
142).
- Jeffrey Robinson, BitCon: The Naked Truth about BitCoin
- I picked this up on the recommendation
of Izabella
Kaminska. It's decent, if very pedestrian, when it comes to basic facts and
figures, including the trivial scope of bitcoin and its deep unsuitability for
many tasks. It is also a shrewd point that most (all?) of the mainstream
companies which "accept bitcoin" really funnel on-line customers to services
which convert bitcoin to dollars — so they really accept dollars, and
are not (as the jargon goes) exposed to the risk of changes in the
dollar price of bitcoins.
- But I found the book quite dis-satisfying in at least three ways. First,
the writing: Robinson rambles, and is never very good at either telling a
consecutive story or assembling a cumulative argument. Second, it lacks
technical depth; I get the impression that Robinson frankly does not understand
the bitcoin protocol. He doesn't even really try to explain it, which of
course makes it impossible for him to convey why it's technically impressive,
why it might have uses other than for a weird medium of exchange, what makes a
"50%+1" attack on the protocol possible, or even what miners
actually do. (There
are proofs
by example that explaining such things is quite possible.) He also doesn't
explain any of the usual economics of money, like how fractional reserve
banking works and why deflation is dangerous
(topics old as the hills), the
network economies of scale of money
(a bit
younger than
the hills),
or why
states always play such a central role in currencies, relevant though those
topics are. (I'm tempted to say that Halting
State, which is after all just a novel, is actually clearer on the
relevant economics, but that may be unfair.) Third, he really does not write
to convince; it's a mixture of facts and figures, rhetorical incredulity, and
appeals to authority. His dis-inclination to concede anything to "the
Faithful" is such that even when they have a sensible point to make (as, e.g.,
that the valuation of gold is overwhelmingly much more due to a shared,
self-perpetuating social convention than the actual demand for jewelry,
electronics, dental fillings and perpetual coffee filters), he refuses to
concede it or even to present the counter-arguments. In consequence, I very
much doubt he would persuade any reasonably skeptical teenager. (It will also
not be as effective at stirring such a teenager's feelings of contempt as getting
them to read Shit r/Bitcoin/
says.) All of which is a shame, because bitcoin has been, at best, an
idiotic waste of time and energy, and a good debunking book would be a
valuable thing, which would have much to say about our time.
- D. J. Finney, Experimental Design and Its Statistical Basis
- Classic Fisherian experimental design theory, adapted to the meanest
biological understanding of 1952. But it's very clear (I'm not ashamed to say
I don't think I fully grasped fractional factorial designs before reading
this), and most of the advice remains relevant, though there are of course
now-quaint passages (*).
- *: "For most experiments, the labor of statistical
analysis is small relative to the total cost.... [T]he choice of a design
should be scarcely influenced by consideration of whether or not the results
will be easy to analyze. The symmetry of a well-designed experiment usually
insures that the analysis is not excessively laborious... The attitude... of an
experimenter who must do his own computing and who has no calculating machine
will naturally differ from that of one in an organization with a well-equipped
computing section. In any field of biology in which extensive numerical
records are obtained, a calculating machine is an investment whose small cost
is rapidly repaid by the saving of time, the increase in accuracy, and the
practicability of computations previously thought prohibitively laborious that
its use makes possible. A machine should be regarded as an indispensable
adjunct to quantitative biological research, and an operator especially skilled
in its use is an obvious economy if the volume of computing is large. This
point is quite distinct from that of employing a statistician, and much
research would benefit from the realization that a more systematic approach to
its computations need not await the appointment of a statistician.
Nevertheless, any biologist who has read this far will realize that he also
needs access to the advice of a statistical specialist if he is to make the
best use of modern ideas in experimental design." (pp. 160--161) It's worth
remarking that, first, I think all of the computation done in the book involves
fewer operations than what my phone does to play a medium-length song, and
second, it'd still be good advice to make sure someone in your lab knows a
little coding, even if you can't hire a statistician.
- Harry
Connolly, The
Way into
Chaos, The
Way into Magic
and The
Way into Darkness
- Mind candy fantasy epic, but of high quality. A basically bronze-age city
has parlayed its control of magic, bestowed by visitors from another dimension
in exchange for art, into a continental empire, complete with things like clean
water, medicine that works, and steel. Then there's an outbreak from the
dungeon dimensions, effectively decapitating the empire, and naturally the
subordinate kings decide it's more important to fight over the scraps than deal
with the outbreakees. Then things get really bad. It's told
engagingly and at a great pace, but also with thought to the consequences of
the speculative premises, and an
almost Cherryhan sense
of politics.
- Queries involving spoilers
(ROT-13'd):
Nz V penml, be qbrf gur ynfg cneg bs gur ynfg obbx onfvpnyyl
fnl gur Terng Jnl vf Lbt-Fbgubgu? Naq qba'g bhe urebrf zngpu hc jvgu
gur gjb uhzna tbqf juvpu gur Terng Jnl qbrfa'g, ng yrnfg ng svefg, erpbtavmr?
- Gordon S. Wood, The American Revolution: A History
- A brief history, emphasizing changes in ideology and attitudes, from
monarchy through republicanism to democracy. The whole of the revolutionary
war is confined to one chapter, which was personally annoying since that was
what I most wanted to learn more about, though I guess defensible. I also
found it a bit frustrating that Wood doesn't consider what seem to me obvious
comparative questions, and just takes certain institutional facts for granted.
The comparative question: why did the 13 colonies which became the United
States follow such a different course than did the British colonies which
became Canada? On the institutional front: Wood claims that agitation against
the British government's new taxes, and its efforts to govern directly more
generally, was organized through local associations of people in the same
trade, or for local improvements, or volunteer public services like
fire-fighting. Where did those groups come from? Had they a history of being
involved in political questions? If not, what tipped them over into opposition
to these measures? Were there no volunteer fire departments in
Newfoundland? Etc.
- Despite my quibbling, I learned a great deal from this, which means I'm in
no position to evaluate its accuracy or scholarly adequacy.
- John Julius Norwich, Absolute Monarchs: A History of the
Papacy
- A very readable history of a rather old-fashioned kings-and-battles kind,
focused on the persons of the Popes and their scheming and fighting with other
European potentates. The papacy, as an institution which has changed
over time, gets much less attention. (*) Admittedly, that would have made for
a much longer, and less popular, book. It overlaps less
with The Bad
Popes than my prejudices would have suggested.
- *: E.g., what did the popes actually do,
day to day, in the early middle ages, or the high Renaissance, or the 19th
century? How was their time divided between ruling Rome (or Avignon),
administering the Church hierarchy (which meant doing what?), religious offices,
and nepotism and intrigue? Just who were those around them who enabled them to
do these things, who made the papacy an effective, or ineffective, power in the
world? Etc.
Thanks to John S. for giving me access to Wood and Norwich.
Books to Read While the Algae Grow in Your Fur;
Scientifiction and Fantastica;
Enigmas of Chance;
Writing for Antiquity;
The Beloved Republic;
The Dismal Science;
Pleasures of Detection, Portraits of Crime;
Minds, Brains, and Neurons
Posted at January 31, 2015 23:59 | permanent link