January 31, 2015

Books to Read While the Algae Grow in Your Fur, January 2015

Attention conservation notice: I have no taste.

Jin Feng and Thomas G. Kurtz, Large Deviations for Stochastic Processes
\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \]
Kurtz is best known for work in the 1970s and 1980s on how sequences of Markov processes converge on a limiting Markov process, especially on how they converge on a limiting deterministic dynamical system. This book is an extension of those ideas, and best appreciated in that context.
As every school-child knows, we ordinarily specify a Markov process by its transition probabilities or transition kernels, say \[ \kappa_t(x,B) = \Pr(X_{t}\in B \mid X_0=x) ~. \] The transition kernels form a semi-group: $\kappa_{t+s}(x,B) = \int{\kappa_t(y,B) \kappa_s(x,dy)}$. For analytical purposes, however, it is more convenient to talk about transition operators, which give us conditional expectations: \[ T_t f(x) = \Expect{f(X_{t})\mid X_0=x} = \int{f(y)\kappa_t(x,dy)} ~. \] (It's less obvious, but if we're given all the transition operators, they fix the kernels.) These, too, form a semi-group: $T_{t+s} f(x) = T_t T_s f(x)$. The generator $A$ of the semi-group $T_t$ is, basically, the time-derivative of the transition operators, the limit \[ A f(x) = \lim_{t\rightarrow 0}{\frac{T_t f(x) - f(x)}{t}} ~. \] so \[ \frac{d}{dt}{T_t f} = A T_t f \] A more precise statement, however, which explains the name "generator", is that \[ T_t f = e^{t A}f = \sum_{m=0}^{\infty}{\frac{(tA)^m f}{m!}} ~. \] Notice that the transition operators and their generator are all linear operators, no matter how nonlinear the state-to-state transitions of the Markov process may be. Also notice that a deterministic dynamical system has a perfectly decent transition operator: writing $g(x,t)$ for the trajectory beginning at $x$ at time $h$, $T_t f(x) = f(g(x,t))$, and \[ A f(x) = {\left.\frac{d T_t f(x)}{dt}\right|}_{t=0} ~. \]
Suppose we have a sequence of Markov processes, $X^{(1)}, X^{(2)}, \ldots$. What Kurtz and others showed is that these converge in distribution to a limiting process $X$ when their semi-groups $T^{(n)}_h$ converges to the limiting semi-group $T_h$. This in turn happens when the generators $A^{(n)}$ converge on the limiting generator $A$. To appreciate why this is natural, remember that a sequence of distributions $P^{(n)}$ converges on a limiting distribution $P$ if and only if $\int{f(x) dP^{(n)}(x)} \rightarrow \int{f(x) dP(x)}$ for all bounded and continuous "test" functions $f$; and $A^{(n)}$ and $A$ generate the semi-groups which give us conditional expectations. (Of course, actually proving a "natural" assertion is what separates real math from mere hopes.) In saying this, naturally, I gloss over lots of qualifications and regularity conditions, but this is the essence of the thing. In particular, such results give conditions under which Markov processes converge on a deterministic dynamical system, such as an ordinary differential equation. Essentially, the limiting generator $A$ should be the differential operator which'd go along with the ODE. These results are laws of large numbers for sequences of Markov processes, showing how they approach a deterministic limit as the fluctuations shrink.
Large deviations theory, as I've said elsewhere, tries to make laws of large numbers quantitative. The laws say that fluctuations around the deterministic limit decay to zero; large deviations theory gives an asymptotic bound on these fluctuations. Roughly speaking, a sequence of random variables or processes $X^{(n)}$ obeys the large deviations principle when \[ n^{-1} \log{\Prob{X^{(n)} \in B}} = -\inf_{x \in B}{I(x)} \] for some well-behaved "rate function" $I$. (Again, I gloss over some subtleties about the distinctions between open and closed sets.) The subject of this book is, depending on your point of view, either strengthening Kurtz's previous work on convergence of Markov processes to large deviations, or extending the large deviations theory of stochastic processes, as the title says.
In dealing with large deviations, it's very common to have to deal with cumulant generating functions. The reason is a basic approximation which goes back to Harald Cramér in the 1930s. Start with the Markov inequality: for a non-negative random variable $X$, $\Prob{X \geq a} \leq \Expect{X}/a$. Since $e^{nx}$ is an increasing function, when $t > 0$, and non-negative, \[ \Prob{X\geq a} = \Prob{e^{tX} \geq e^{ta}} \leq e^{-ta} \Expect{e^{tX}} ~ . \] Now take the log: \[ \log{\Prob{X\geq a}} \leq -ta + \log{\Expect{e^{tX}}} \] Since this is true for all $t$, \[ \begin{eqnarray*} \log{\Prob{X\geq a}} & \leq & \inf_{t}{-ta + \log{\Expect{e^{tX}}}}\\ & =& -\sup_{t}{ta - \log{\Expect{e^{tX}}}} \end{eqnarray*} \] $\log{\Expect{e^{tX}}}$ is precisely the cumulant generating function of $X$, and $\sup_{t}{ta - \log{\Expect{e^{tX}}}}$ is its Legendre transform.

Now imagine a sequence of variables $X_1, X_2, \ldots$, where $X_n = n^{-1}\sum_{i=1}^{n}{Y_i}$, with the $Y_i$ being (for simplicity) IID. Then we have a very parallel calculation which gives an exponentially shrinking probability: \[ \begin{eqnarray*} \Prob{X_n \geq a} & = & \Prob{\sum_{i=1}^{n}{Y_i} \geq na}\\ \Prob{X_n \geq a} & \leq & e^{-nta}{\left(\Expect{e^{tY_1}}\right)}^n\\ n^{-1}\log{\Prob{X_n \geq a}} &\leq & -\sup_{t}{ta - \log{\Expect{e^{tY_1}}}} \end{eqnarray*} \] Of course, there is still the matter of getting the matching lower bound, which I won't go into here, but is attainable.

More generally, beyond just looking at the mean, one defines $\Lambda(f) = \lim_{n\rightarrow\infty}{n^{-1} \log{\Expect{e^{nf(X_n)}}}}$, and the large deviations rate function $I(x)$ is generally $\sup_{f}{f(x) - \Lambda(f)}$, taking the supremum over all bounded, continuous functions.
Accordingly, for each process $X_n$ we define operators which give us conditional cumulant functions: \[ V_n(t)f(x) = n^{-1} \log{\Expect{e^{nf(X_n(t))} \mid X_n(0)=x}} \] For fixed $n$ and $t$, $V_n(t)$ is a nonlinear operator (that is, nonlinear in $f$), but they still form a semi-group, $V_n(t+s) = V_n(t) V_n(s)$. \[ \begin{eqnarray*} V_n(t) V_n(s) f(x) &= & n^{-1}\log{\Expect{e^{n n^{-1}\log{\Expect{e^{nf(X_n(s))}\mid X_n(0)=X_n(t)}}}\mid X_n(0)=x}}\\ & = & n^{-1}\log{\Expect{\Expect{e^{nf(X_n(s))}\mid X_n(0)=X_n(t)}\mid X_n(0)=x}}\\ & = & n^{-1}\log{\Expect{e^{nf(X_n(s+t))} \mid X_n(0)=x}} = V_n(s+t) f(x) \end{eqnarray*} \]
When we have a sequence of processes, they give rise to a sequence of $V_n$ semi-groups. When that sequence of semi-groups converges to some limiting semi-group $V(t)$, the sequence of processes obeys a large deviations principle, and the rate function follows a somewhat complicated expression involving the limiting operators $V(t)$. Since we're talking about a probability measure over stochastic processes, the basic "points" are now trajectories, functions of time $x(t)$, and the rate function is a rather complicated expression that basically involves taking a path integral of the conditional cumulant generating functions and so $V(t)$. Matters become drastically simplified if we introduce what is, at least formally, the generator of the $V_n$ group, \[ \frac{d}{dt} V_n(t) f = H_n V_n(t) f \] or more explicitly \[ H_n f = n^{-1} e^{-nf} A_n e^{nf} \]
Convergence of the $H_n$ operators to a limiting operator $H$ is now the key part of showing large deviations for the processes $H_n$, and the rate function can be written in terms of the limiting operator $H$.
$H f(x)$ can typically be written as a function of $x$, $f(x)$ and $\nabla f(x)$ (or perhaps also higher derivatives of $f$). Similarly, the generator of the transitions, $A f(x)$ can often be written in terms of $x$, $f(x)$ and the derivatives of $f$. Bundle everything other than $x$ up into a variable $u$; then we can often write \[ H f(x) = \sup_{u} {A(x,u) - L(x,u) } \] for some operator $L$. The reason for going through all these gyrations is that then the large-deviations rate function takes a wonderfully simple form: \[ I(x) = \inf_{u \in \mathcal{J}(x)}\int_{t=0}^{t=\infty}{L(x(t),u(t)) dt} \] where $\mathcal{J}(x)$ consists of all functions $u(t)$ such that, for all reasonable $f$, \[ f(x(t)) - f(x(0)) = \int_{s=0}^{t}{Af(x(s), u(s)) ds} \]
One can think of the limiting generator $A$ as saying what the derivative of the trajectory ought, in some sense, to be, and $L$ measuring how big a perturbation or disturbance has to be applied to drive the actual trajectory away from this easiest and most probable path. The rate function $I(x)$ then indicated the least amount of control that has to be applied to drive the process along the desired trajectory $x(t)$. ("Improbable events happen in the most probable possible way.") In fact, one can often simplify down to \[ I(x) = \int_{t=0}^{t=\infty}{L(x(t), \dot{x}(t)) dt} \]
Much of what I've just sketched were already established results in the literature. What's new in this book is two big things: a powerful set of methods for verifying that the conditional cumulant operators $H_n$ converge on a limiting $H$, and another set of tools for writing $H$ in variational form. The former is the machinery of "viscosity solutions", which are weaker than the ordinary notion of solutions of differential equations. The latter brings in the machinery of control theory, or indeed of classical mechanics — as the notation suggests, $H$ is (like) a Hamiltonian, and $L$ (like) the corresponding Lagrangian. (The large deviations principle is thus a principle of least action, or perhaps vice versa, as suggested some time ago by, e.g., Eyink.)
The power of the Feng-Kurtz approach is considerable. Basically the whole of the classic Friedlin-Wentzell theory of ODEs perturbed by small-amplitude Gaussian noise falls out as an easy corollary. Moreover, the Feng-Kurtz theory extends to much more difficult cases, outside the reach of the Friedlin-Wentzell apprach, such as infinite-dimensional interacting particle systems. While it is not for the mathematically inadept, it's extremely rewarding, and, I might add, probably about as well-written as such a book could possibly be. Chapter 1 describes the main lines of the approach, gives an extensive catalog of examples with explicitly heuristic reasoning, and then describes the difficulties in the way of turning the heuristics into actual theorems. Chapter 2 is a sketch of the main definitions and results that will be taken in the rest of the book. Chapters 3--4 review general concepts of process-level large deviations, specialized in chapter 5 to Markov processes and convergence of conditional cumulants ("classical semigroup approach"). Chapters 6 and 7 introduce viscosity solutions and their associated machinery. Chapter 8, in many ways the heart of the book, relates the limiting conditional cumulant semi-groups to control problems, and derives variational formulae for rate functions. The remaining chapters are devoted to giving rigorous versions of the examples from chapter 1.
Obviously, from the space I've devoted to this, I think this work is incredibly cool. I recommend it to anyone with the necessary math, and a serious interest in large deviations, in Markov processes, or the emergence of macroscopic regularities from microscopic interactions.
Charles Stross, Halting State
Mind candy science fiction, at the intersection of near-future police procedural, for-profit central banking, and online role-playing games. It gained quite a bit as an audiobook through being read by a Scots actor, one Robert Ian MacKenzie (though I had to re-listen to some bits to figure out what was being said). — Sequel.
Stephen José Hanson and Martin Bunzl (eds.), Foundational Issues in Human Brain Mapping
Like all paper collections, mixed. There is actually a certain amount of dialogue here, which is unusual and speaks highly of the organizers, and the papers are mostly (*) of high quality. The first three chapters are an exchange about the use of "functional localizers" (Friston et al. con, Saxe et al. pro); chapters 6--8 another on the related topic of "non-independence" (Vul and Kanwisher are against it, others try to salvage something); many of them (e.g., Poldrack's ch. 13) deal with the troublesome idea of "pure insertion", that it makes sense to just add one psychological process in to a task or setting, leaving the others undisturbed; many of them (among others, ch. 13 again, and chs. 17--19) revolve around the question of how much of psychology we need to get right, or can get right, before fMRI results become anything more than blinkenlights. Of the non-controversial chapters, I would highlight Haxby on multivariate methods, Poline et al. on inter-subject variability, and Hanson and Glymour on causal inference (though something has gone horribly wrong with the figures and captions in that last chapter).
Disclaimer: While I don't think I've met either editor, I've met at least two of the contributors at conferences, corresponded with and met another, and owe considerable intellectual and professional debts to a fourth. Writing a really negative review of this collection might've been awkward for me, but it would've been easy to say nothing at all.
*: I can't help pointing out the conspicuous exception of chapter 12, which is so transparently a recycled grant proposal which that it still refers to itself as "this proposal" [e.g., pp. 136 and 142.].
Jeffrey Robinson, BitCon: The Naked Truth about BitCoin
I picked this up on the recommendation of Izabella Kaminska. It's decent, if very pedestrian, when it comes to basic facts and figures, including the trivial scope of bitcoin and its deep unsuitability for many tasks. It is also a shrewd point that most (all?) of the mainstream companies which "accept bitcoin" really funnel on-line customers to services which convert bitcoin to dollars — so they really accept dollars, and are not (as the jargon goes) exposed to the risk of changes in the dollar price of bitcoins.
But I found the book quite dis-satisfying in at least three ways. First, the writing: Robinson rambles, and is never very good at either telling a consecutive story or assembling a cumulative argument. Second, it lacks technical depth; I get the impression that Robinson frankly does not understand the bitcoin protocol. He doesn't even really try to explain it, which of course makes it impossible for him to convey why it's technically impressive, why it might have uses other than for a weird medium of exchange, what makes a "50%+1" attack on the protocol possible, or even what miners actually do. (There are proofs by example that explaining such things is quite possible.) He also doesn't explain any of the usual economics of money, like how fractional reserve banking works and why deflation is dangerous (topics old as the hills), the network economies of scale of money (a bit younger than the hills), or why states always play such a central role in currencies, relevant though those topics are. (I'm tempted to say that Halting State, which is after all just a novel, is actually clearer on the relevant economics, but that may be unfair.) Third, he really does not write to convince; it's a mixture of facts and figures, rhetorical incredulity, and appeals to authority. His dis-inclination to concede anything to "the Faithful" is such that even when they have a sensible point to make (as, e.g., that the valuation of gold is overwhelmingly much more due to a shared, self-perpetuating social convention than the actual demand for jewelry, electronics, dental fillings and perpetual coffee filters), he refuses to concede it or even to present the counter-arguments. In consequence, I very much doubt he would persuade any reasonably skeptical teenager. (It will also not be as effective stirring such a teenager's feelings of contempt as getting them to read Shit r/Bitcoin/ says.) All of which is a shame, because bitcoin has been, at best, an idiotic waste of time and energy, and a good debunking book would be a valuable thing, which would have much to say about our time.
D. J. Finney, Experimental Design and Its Statistical Basis
Classic Fisherian experimental design theory, adapted to the meanest biological understanding of 1952. But it's very clear (I'm not ashamed to say I don't think I fully grasped fractional factorial designs before reading this), and most of the advice remains relevant, though there are of course now-quaint passages (*).
*: "For most experiments, the labor of statistical analysis is small relative to the total cost.... [T]he choice of a design should be scarcely influenced by consideration of whether or not the results will be easy to analyze. The symmetry of a well-designed experiment usual insures that the analysis is not excessively laborious... The attitude... of an experimenter who must do his own computing and who has no calculating machine will naturally differ from that of one in an organization with a well-equipped computing section. In any field of biology in which extensive numerical records are obtained, a calculating machine is an investment whose small cost is rapidly repaid by the saving of time, the increase in accuracy, and the practicability of computations previously thought prohibitively laborious that its use makes possible. A machine should be regarded as an indispensable adjunct to quantitative biological research, and an operator especially skilled in its use is an obvious economy if the volume of computing is large. This point is quite distinct from that of employing a statistician, and much research would benefit from the realization that a more systematic approach to its computations need not await the appointment of a statistician. Nevertheless, any biologist who has read this far will realize that he also needs access to the advice of a statistical specialist if he is to make the best use of modern ideas in experimental design." (pp. 160--161) It's worth remarking that, first, I think all of the computation done in the book involves fewer operations than what my phone does to play a medium-length song, and second, it'd still be good advice to make sure someone in your lab knows a little coding, even if you can't hire a statistician.
Harry Connolly, The Way into Chaos, The Way into Magic and The Way into Darkness
Mind candy fantasy epic, but of high quality. A basically bronze-age city has parlayed its control of magic, bestowed by visitors from another dimension in exchange for art, into a continental empire, complete with things like clean water, medicine that works, and steel. Then there's an outbreak from the dungeon dimensions, effectively decapitating the empire, and naturally the subordinate kings decide it's more important to fight over the scraps than deal with the outbreakees. Then things get really bad. It's told engagingly and at a great pace, but also with thought to the consequences of the speculative premises, and an almost Cherryhan sense of politics.
Queries involving spoilers (ROT-13'd): Nz V penml, be qbrf gur ynfg cneg bs gur ynfg obbx onfvpnyyl fnl gur Terng Jnl vf Lbt-Fbgubgu? Naq qba'g bhe urebrf zngpu hc jvgu gur gjb uhzna tbqf juvpu gur Terng Jnl qbrfa'g, ng yrnfg ng svefg, erpbtavmr?
Gordon S. Wood, The American Revolution: A History
A brief history, emphasizing changes in ideology and attitudes, from monarchy through republicanism to democracy. The whole of the revolutionary war is confined to one chapter, which was personally annoying since that was what I most wanted to learn more about, though I guess defensible. I also found it a bit frustrating that Wood doesn't consider what seem to me obvious comparative questions, and just takes certain institutional facts for granted. The comparative question: why did the 13 colonies which became the United States follow such a different course than did the British colonies which became Canada? On the institutional front: Wood claims that agitation against the British government's new taxes, and its efforts to govern directly more generally, was organized through local associations of people in the same trade, or for local improvements, or volunteer public services like fire-fighting. Where did those groups come from? Had they a history of being involved in political questions? If not, what tipped them over into opposition to these measures? Were there no volunteer fire departments in Newfoundland? Etc.
Despite my quibbling, I learned a great deal from this, which means I'm in no position to evaluate its accuracy or scholarly adequacy.
John Julius Norwich, Absolute Monarchs: A History of the Papacy
A very readable history of a rather old-fashioned kings-and-battles kind, focused on the persons of the Popes and their scheming and fighting with other European potentates. The papacy, as an institution which has changed over time, gets much less attention. (*) Admittedly, that would have made for a much longer, and less popular, book. It over-laps less with The Bad Popes than my prejudices would have suggested.
*: E.g., what did the popes actually do, day to day, in the early middle ages, or the high Renaissance, or the 19th century? How was their time divided between ruling Rome (or Avignon), administering the Church hierarchy (which meant doing what?), religious offices, and nepotism and intrigue? Just who were those around them who enabled them to do these things, who made the papacy an effective, or ineffective, power in the world? Etc.
Thanks to John S. for giving me access to Wood and Norwich.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; Writing for Antiquity; The Beloved Republic; The Dismal Science; Pleasures of Detection, Portraits of Crime; Minds, Brains, and Neurons

Posted at January 31, 2015 23:59 | permanent link

Three-Toed Sloth