Books to Read While the Algae Grow in Your Fur, April 2009
- Jérôme Dedecker, Paul Doukhan, Gabriel Lang, José
Rafael León R., Sana Louhichi and Clémentine
Prieur, Weak
Dependence: With Examples and Applications
- As every
school-child knows, a sequence of probability measures \( D_n \) converges
"weakly" or "in distribution" on a limit \( D \) if \( D_n f \equiv \int{f(x) d
D_n(x) }\) converges on \( Df \) for every bounded and continuous test function
\( f \). In particular, if \( D_n \) is the joint distribution of two random
variables from a time series separated by a time-lag \( n \), with marginal
distributions \( P \) and \( Q \) for the past and the future, then the series
is asymptotically independent if \( D_n \) converges in distribution on the
product measure \( P \otimes Q \).
- (This is "weak" convergence because the \( D_n \) can look
very different from the limiting \( D \). A classic example: \( D_n
\) puts probability 1/n on the points \( \frac{1}{n}, \frac{2}{n},
\ldots \frac{n}{n} =1 \). This converges weakly to the uniform distribution on
the unit interval [0,1], despite apparent obstacles like the latter giving
probability 1 to the irrational numbers, while all the \( D_n \) give them
probability 0.)
- A large literature has grown up since the 1950s which concerns itself with
properties of asymptotically-independent stochastic processes, usually
measuring the approach to independence by means of various "mixing
coefficients", along the lines introduced
by Rosenblatt. As the
name implies, if these coefficients go to zero asymptotically, then the process
is strongly mixing. Having these mixing coefficients go to zero is very nice,
since it implies that many of the nice properties of IID sequences (central
limit theorems, PAC-learning
results, etc.), carry over to the dependent-but-mixing sequences.
Unfortunately, strong mixing is a very strong property, which is hard to prove
and in many cases is known to fail --- even when there is asymptotic
independence!
- This book, which summarizes and extends a series of recent papers by the
authors (in various combinations) is about proving limit theorems --- laws
of large numbers, central limit
theorems, empirical-process
convergence, etc. --- for asymptotically-independent but non-mixing
processes, which the authors call "weakly dependent". Their theorems in
chs. 6--10 can, in large part, be seen as instances of
the trick that I learned
as "blocking", and which they attribute
to S. N. Bernstein.
To calculate the time average of some function of a (stationary) process, for
instance, divide the series up into a series of long blocks, plus padding or
filler in between them. The global average is, trivially, equal to the average
of the within-block averages, plus a remainder term for contributions from the
filler. By stationarity, the blocks all have the same distribution, and by
weak dependence the blocks are nearly independent --- more nearly with
longer fillers. So one can show that the global average will have the same
behavior as it would in the IID case, if one can control (1) the corrections
due to the remaining dependence among the blocks, and (2) the remainder due to
the fillers between the blocks. Controlling (2) is fairly straightforward; the
new stuff here comes from controlling (1), the residual dependence.
- Following lines laid down in mixing theory, the authors tackle (1) by
constructing deviation inequalities, bounds on the probability that sample
values differ from expectations by more than a certain amount. Typically,
these inequalities (chs. 4 and 5) have a more-or-less combinatorial flavor,
though they also use coupling, and a lot of manipulation
of cumulants (without
making the connection to large
deviations theory via cumulant generating functions). --- As usual
with such inequalities, it's rarely clear whether the various factors of 33 (or
whatever) are for real, or just the best one can do with the current method of
proof; but they suffice for many purposes, and I suppose it's better to have
the constants be explicit than to bury them inside \( O() \) or \( o() \).
- In mixing theory, the analogous bounds are stated in terms of the mixing
coefficients. Here the new bounds are stated in terms of a range of new
dependence measures (ch. 2), all denoted by utterly un-mnenomic Greek letters
(again following the mixing-theory convention). Some of them play directly off
the definition of convergence in distribution in terms of expectations of test
functions: how big can the difference between \( D_n f \) and \( \left(P
\otimes Q\right) f\) get, when \( f \) is restricted to some suitably
well-behaved, normalized class of test functions? The other set of measures
are "projective". The conditional expectation of a function \( f \), given \(
X_1, X_2, \) etc., is the projection of \( f \) on to the space of functions
calculable from \( X_1, X_2\), etc.; the unconditional expectation of \( f \) is
the projection of \( f \) on to the space of constant functions, which are
calculable from nothing. (By "calculable from" I mean "measurable in the
minimal sigma-algebra of".) If \( f \) is independent of \( X \), then its
conditional expectation is still its projection on to the constants. The other
class of dependence coefficients, therefore, looks at how far conditional
expectations depart from constancy --- either maximizing over some family
of test-functions, or by making future values of the time series itself the
function in question.
- Applications to non-parametric regression, spectral-density estimation, and
various econometric problems occupy chs. 11--13, and ch. 3 shows that many
standard and some non-standard time series models are weakly dependent, with
estimates of the dependence coefficients.
- --- The book shows definite signs of its patch-up job origins. The
prose is disjointed and slightly repetitive. The notation, too, is not always
consistent; sometimes the generic dependence measure is c, sometimes
it's \mu, and it can switch back and forth from one line to the next. A bigger
annoyance is terminology. What the authors (and, it must be said, many
probabilists) call "mixing" is more properly strong mixing, i.e.,
convergence of joint distributions to product measure in the strong, not the
weak, topology. (If the Adversary can pick any pair of events, and
you can show that that pair become independent as the separation
between them grows, you have proved ordinary mixing. If the Adversary gets to
pick a different pair of events with each separation, and you can
still show asymptotic independence, that's strong mixing.) In dynamical
systems and ergodic theory, "mixing" unmodified refers to the weaker notion of
mixing. (See for instance the detailed treatment
in Lasota
and Mackey.) Thus for someone of my background, the Bernoulli shift is one
of the canonical mixing processes, but they give it as their first example of a
non-mixing process! (Their proof, incidentally, is wrong, but easily fixed.)
In fact, I'd say that the failure to connect what they have done to the work in
dynamics on decay of correlations
(e.g.) is a real lost
opportunity for both fields --- optimistically, a direction for future
research.
- In the end, however, these are all minor complaints. The authors have
brought together a great deal of original and important work, as a result of
which the classical probabilistic limit theorems can now be rigorously applied
to a much wider range of stochastic models. It's cool stuff and I would be
very happy to teach it. I strongly recommend the book to statisticians
interested in inference for stochastic processes, learning theorists interested
in dependent data, probabilists interested in new applications, and theoretical
econometricians. §
- Warren Ellis and Paul Duffield, , Freakangels, vol. 2
- Jack Campbell, Relentless
- Mind candy, in the process of moving from the Anabasis to
De
Bello Civili. (Which does not mean it's a historical pastiche.) §
- Earlier morsels: 1--3, 4.
- David Freidel, Linda Schele and Joy Parker, Maya Cosmos: Three
Thousand Years on the Shaman's Path
- A detailed and intelligent, if very conjectural, exploration of ancient
Maya cosmology, based on deciphering surviving inscriptions, artifacts, and
extrapolation from the modern Maya. (The presentation of the conjectures is
remarkably confident.) Marred by irritating and silly stabs at
cultural relativism (mostly at the beginning and the end). §
- Peter D. Harrison, The Lords of
Tikal: Rulers of an Ancient Maya City
- History of the greatest Maya city from the viewpoint of its rulers (the
only view we really have any access to), from the beginning to the end. Clear
on the difference between what we actually know and what the Mayanists are
merely guessing at. Very good on architecture. §
- Bruce Kitchens and Selim Tuncel, Finitary Measures for Subshifts of
Finite Type and Sofic Systems
- As you know, Bob, a shift dynamical system has a state-space consisting of
infinite sequences (one or two sided), say \( x \) drawn from a
finite alphabet. The time-evolution operator \( T \),
or shift, simply moves the sequence one step to the right, i.e., \(
{Tx}_n = x_{n+1} \). This moves all the complexity and distinctions between
different systems into the set of initial conditions, and space of allowed
sequences, rather than in the time-evolution rule. The full shift has all
possible sequences among its initial conditions (and so the space never
shrinks); sub-shifts restrict the set of initial conditions. Suppose that the
allowed values of \( x_n \) depend on \( x_{n-1}, x_{n-2}, \ldots x_{n-k} \),
but not on earlier entries in \( x \), and that this \( k \) is the same for
all \( n \) and all sequences. Then we have a subshift of finite
type, the type or order of the subshift being \( k \). These are the
symbolic-dynamical equivalents of Markov chains of order \( k \). As with
Markov chains, any higher-order subshift of finite type can be converted to a
first-order subshift. (Define a new alphabet where each symbol is a block of
length \( k \) from the original alphabet.) A sofic system can be
defined in one of two equivalent ways. One introduces the notion of
a follower set, the set of all one-sided infinite sequences which can
succeed a given history; a subshift is sofic if it has only finitely many
follower sets. Alternately, introduce the notion of factor maps
--- continuous functions from one sequence space to another which commute
with the shift. A sofic system is the image of a subshift of finite type under
a factor map. A strictly sofic system is one which is sofic, but not
a subshift of finite type.
- Said another way, for any sofic system there is a set of states, possibly
hidden, where the current state determines what symbols can be seen next, and
the current state plus the next symbol determines the next state. Sofic
systems are (the languages generated by) deterministic finite automata.
- All of this is at the purely topological, possible-or-not level. One can
of course also add probability measures on sequences. A Markov measure is one
under which the distribution of \( X_n \) depends on \( X_{n-1} \), but not,
given that, on previous symbols. (Similarly for higher-order chains.) In this
monograph, Kitchens and Tuncel ask, what measures are to Markov measures as
sofic systems are to subshifts of finite type? They call these
measures finitary, and characterize them in several ways, including
(1) measures where the number of follower distributions is finite, (2)
factor-map images of Markov measures, (3) ones where the conditional
distribution of the future is always independent of the remote past given
a finite segment of the immediate past, though one of possibly
variable, context-dependent length. (The last is what motivates the name
"finitary".) They develop the theory of finitary measures in considerable
detail, and in parallel to the usual theory of Markov measures, as that is
found in ergodic theory and the thermodynamic formalism. Statistical aspects
(the reconstruction of finitary measures from sample data) are not addressed
--- fortunately for me. Prior acquaintance
with symbolic dynamics and
the thermodynamic formalism is absolutely essential, and a lot of familiarity
with manipulating semi-groups would help. §
- (Your best bet for obtaining this
is directly
from the publisher. However, the community would be better served by
putting it on the arxiv.)
- Misty Massey, Mad Kestrel
- Mind candy fantasy. More piracy and less romance than blurb implies. §
- John
McGowan, Democracy's
Children: Intellectuals and the Rise of Cultural Politics
- Nowhere near as cohesive as the subtitle suggests. Rather it's a set of
essays on the conditions, activities and aspirations of academic in the
humanities (especially literature as such) in the US from the mid-1980s through
the mid-1990s, along with some thoughts about how they ought to try
relating to the larger society they're part of. (I am quite sympathetic to the
latter.) I wish McGowan would write a book about "intellectuals and
the rise of cultural politics"; I think it would be very interesting.
- (His mid-1990s remarks on the Web are more amusing in retrospect than they
would have been at the time, since he's gone from sniffing at it to
guest-blogging
for Bérubé. To be
fair, the idea that English departments would need specialists in hypertext
does seem rather dated --- because we have all become specialists
in hypertext.) §
- Update, September 2023: Open Access from the publisher and the National Endowment for the Humanities.
- C. J. Sansom, Revelation
- Michelle Sagara, Cast in Fury
- Previously. Sequels.
Books to Read While the
Algae Grow in Your Fur;
Scientifiction and Fantastica;
Writing for Antiquity;
Enigmas of Chance;
Mathematics;
The Progressive Forces;
The Commonwealth of Letters
Posted at April 30, 2009 23:59 | permanent link