August 14, 2019
Course Announcement: Data Mining (36-462/662), Fall 2019
For the first time in ten years, I find myself
teaching data mining in the fall. This means I need to figure out what data
mining is in 2019. Naturally,
my first stab at a
syllabus is based on what I thought data mining was in 2009. Perhaps it's
changed too little; nonetheless, I'm feeling OK with it at the moment*. I am sure the thoughtful
and constructive suggestions of the Internet will only reinforce this
satisfaction.
--- Seriously, suggestions are welcome, except for suggesting that I teach
about neural networks, which I deliberately omitted because I am an
out-of-date stick-in-the-mud, for reasons**.
*: Though I am not done
selecting readings from the textbook, the recommended books, and sundry
articles --- those will however come before the respective classes. I have
been teaching long enough to realize that most students, particularly in a
class like this, will read just enough of the most emphatically required
material to think they know how to do the assignments, but there are
exceptions, and anecdotally even some of that majority come back to the
material later, and benefit from pointers. ^
**: On the one hand, CMU (now)
has plenty of well-attended classes on neural networks and deep learning, so
what would one more add? On the other, my admittedly cranky opinion is
that we have no
idea why the new crop works better than the 1990s version, and it's
not always clear that
they do work better than good old-fashioned machine learning, so
there.
Corrupting the Young;
Enigmas of Chance
Posted at August 14, 2019 17:17 | permanent link
August 06, 2019
Notes on "Intriguing Properties of Neural Networks", and two other papers (2014)
\[ \DeclareMathOperator*{\argmax}{argmax} \]
Attention conservation notice: Slides full of bullet points are never good reading; why would you force yourself to read painfully obsolete slides (including even more painfully dated jokes) about a rapidly moving subject?
These are basically the slides I presented at CMU's Statistical Machine Learning Reading Group on 13 November 2014, on the first paper on what have come to be called "adversarial examples". It includes some notes I made after the group meeting on the Q-and-A, but I may not have properly credited (or understood) everyone's contributions even at the time. It also includes some even rougher notes about two relevant papers that came out the next month. Presented now because I'm procrastinating on preparing for my fall class, and in the interest of the historical record.
Background
- Nostalgia for the early 1990s: G. Hinton and company are poised to take over the world, NIPS is mad for neural networks, Clinton is running for President...
- Learning about neural networks for the first time in cog. sci. 1
- Apocrypha: a neural network supposed to distinguish tanks from trucks in aerial photographs actually learned about parking lots...
- The models
- Multilayer perceptron \[ \phi(x) = (\phi_K \circ \phi_{K-1} \circ \cdots \circ \phi_1)(x) \]
- This paper not concerned with training protocol, just following what others have done
- The applications
- MNIST digit-recognition
- ImageNet
- \(10^7\) images from YouTube
- So we've got autoencoders, we've got convolutional networks, we've got your favorite architecture and way of training it
Where Are the Semantics?


- My assessment of the critique:
- Weakness of both claim and critique: people are very good at finding semantic features that link random objects (e.g., Zhu, Rogers and Gibson, "Human Rademacher Complexity", NIPS 2009 where random word lists come up with semantics like "related to motel service")
- How much semantics would people be able to read into a random collection of training \(x\)'s of equal size to \(\mathcal{X}_i\) or \(\mathcal{X}_v\)?
- Implications of the critique: This is if anything a vindication for good old fashioned parallel distributed processing (as in the 1990s...)
- Doesn't matter as engineering...
- Also doesn't matter as a caricature of animal nervous systems: just as there are no grandmother cells in the brain, there is no "white flower" cell in the network, that's actually distributed across the network
- Suggestions from the audience:
- Ryan Tibshirani: maybe the basis vectors are more "prototypical" than the random vectors? Referenced papers by Nina Balcan and by Robert Tibs. on prototype clustering
- Yu-Xiang Wang: What about images corresponding to weights of hidden-layer neurons in convolutional networks? Me: I need to see what's going on in that paper before I can comment...
The Learned Classifier Isn't Perceptually Continuous
- Claim: generalization based on semantics, or at least on features not local in the input space
- "Adversarial examples"
- Find the smallest perturbation \(r\) we can apply to a given \(x\) to drive it to the desired class \(l\), i.e., smallest \(r\) s.t. \(f(x+r) = l\)
- Robustness: use the same perturbation on a different network (different hyperparameters, different training set) and see whether \(f^{\prime}(x+r) = l\) as well
- Some adversarial images
- "All images in the right column are predicted to be an"ostrich, Struthio camelus"
- "The examples are strictly randomly chosen. There is not any postselection involved."

- Some more adversarial images: car vs. not-car

- Some adversarial digits
- RMS distortion \(0.063\) on a 0-1 scale, accuracy 0

- Some non-adversarial digits
- Gaussian white noise, RMS distortion \(0.1\), accuracy 51% (!)
- \(\therefore\) it's not just the magnitude of the noise, it's something about the structure

- Assessing the adversarial examples
- Obviously all adversarial examples are mis-classified on the \(f\) they're built for
- But much higher mis-classification rates on other networks as well
- Much higher mis-classification rates than much bigger Gaussian noise
- To be fair, what would adversarial examples look like for any other classifier? For any other classifier using the kernel trick? Someone should try this for a support vector machine (see the toy sketch at the end of this section)
- Suggestions from the audience:
- Ryan: stability somehow? Changing one data point (say one dog for the adversarial dog) must be really changing the function; me: but isn't stability about the stability of the risk (in R^1), not the stability of the classifier (as a point in function space)? Risk averages over the measure again, smoother than the function as such. Martin Azizyan: so even if you always get wrong a dense subset, it wouldn't affect your risk. Me: right, though good luck proving that there is a countable dense subset of adversarial dogs. Ryan: senses of stability ought to be related somehow? Me, after: maybe, see London et al.
- David Choi: might this be some sort of generic phenomenon in very high dimensions? Since most of the volume of a closed body is very close to its surface, unless the geometric margin is very large, most points will be very close in the input space to crossing some decision boundary
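As a toy illustration of that last suggestion, here is a minimal R sketch of hunting for small adversarial perturbations against a support vector machine. This is my own illustration, not anything from Szegedy et al.: the simulated data, the e1071 package, the hinge-plus-penalty objective, and the derivative-free optimizer are all choices made purely for concreteness.

```r
## Toy sketch: a small perturbation that flips an SVM's predicted class.
## Everything here (data, kernel, penalty weight, margin) is illustrative.
library(e1071)

set.seed(42)
n <- 200; p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n, sd = 0.5) > 0, "A", "B"))
fit <- svm(x, y, kernel = "radial")

## Signed decision-function value at a point z
decision_value <- function(z) {
  attr(predict(fit, matrix(z, nrow = 1), decision.values = TRUE),
       "decision.values")[1]
}

## Roughly-minimal perturbation r: penalize ||r||^2, plus a hinge that is
## zero once the decision value has crossed the boundary by a small margin
adversarial_perturbation <- function(x0, lambda = 10, margin = 0.1) {
  s <- sign(decision_value(x0))  # which side of the boundary x0 starts on
  obj <- function(r) {
    sum(r^2) + lambda * max(0, s * decision_value(x0 + r) + margin)
  }
  optim(rep(0, length(x0)), obj, control = list(maxit = 5000))$par
}

x0 <- x[1, ]
r <- adversarial_perturbation(x0)
sqrt(sum(r^2))                   # how big a nudge was needed
predict(fit, rbind(x0, x0 + r))  # original vs. perturbed prediction
```

Whether perturbations found this way behave at all like the network case --- transferring across re-fit classifiers, or doing far more damage than comparably-sized Gaussian noise --- is exactly the open question raised above.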
How Can This Be?
- The networks aren't over-fitting in any obvious sense
- CV shows they generalize to new instances from the same training pool
- The paper suggests looking at "instability", i.e., the Lipschitz constant of the \(\phi\) mapping
- They can only upper-bound this
- And frankly it's not very persuasive as an answer (Lipschitz constant is a global property not a local one)
- Speculative thought 1: what is it like to be an autoencoder?
- Adversarial examples are perceptually indistinguishable for humans but not for the networks
- \(\therefore\) human perceptual feature space is very different from the network's feature space
- What do adversarial examples look like for humans? (Possible psych. experiment with Mechanical Turk)
- Speculative thought 2: "be careful what you wish for"
- Good generalization to the distribution generating instances
- This is \(n^{-1}\sum_{i=1}^{n}{\delta(x-x_i)}\), and \(n = 10^7 \ll\) dimension of the input space
- Big gaps around every point in the support
- \(\therefore\) training to generalize to this distribution doesn't care about small-scale continuity...
- IOW, the network is doing exactly what it's designed to do
- Do we need lots more than \(10^7\) images? How many does a baby see by the time it's a year old, anyway?
- What if we added absolutely-continuous noise to every image every time it was used? (Not enough, they use Gaussians)
- They do better when they feed adversarial examples into the training process (natch), but it's not clear whether that's not just shifting around the adversarial examples...
- Szegedy et al. looked for minimally-small perturbations which changed the predicted class
- Nguyen et al. look for the largest possible perturbations which leave the class alone: "it is easy to produce images that are completely unrecognizable to humans, but that state-of-the-art DNNs believe to be recognizable objects with 99.99% confidence (e.g. labeling with certainty that white noise static is a lion)."
- Used evolutionary algorithms to find these examples, either "directly encoded" (genome = the pixels) or "indirectly encoded" (genome = a picture-producing network, with more structure in the image).
- Also used gradient ascent

"Evolved images that are unrecognizable to humans, but that state-of-the-art DNNs trained on ImageNet believe with \(\geq 99.6\)% certainty to be a familiar object. ... Images are either directly (top) or indirectly (bottom) encoded."
As you can tell from that figure, the evolved images look nothing at all like what human beings would think of as examples of those classes.
Results for MNIST are similar: in each plot, we see images classified (with \(\geq 99.9\)% confidence) as the digits given by the columns; the rows are 5 different evolutionary runs, with either direct (white-noise-y panel) or indirect (abstract-art-y panel) encodings.

- Paper I says that imperceptible-to-humans perturbations make huge differences to neural networks; paper II says that huge-to-human perturbations can leave neural networks utterly indifferent. And this is, N.B., on networks that do well under cross-validation on the data.
- Conclusion: Clearly human iso-classification contours are nothing at all like DNN iso-classification contours.
- It's entirely possible that any method which works with a high-dimensional feature space is vulnerable to similar attacks. I'm not sure how we'd do this with human beings as the classifiers (some sort of physiological measure of bogglement in place of the network's numerical confidence?), but it'd be in-principle straightforward enough to do with a common-or-garden support vector machine.
- Desimone et al. (1984), in a now-classic paper in neuroscience, demonstrated that macaques had neurons which selectively responded to faces. They went on to show that (at least some) of those cells still responded to quite creepy altered faces, with their features re-arranged, blanking bars drawn across them, and even faces of a different species (as in this portion of their Figure 6A):

It'd seem in-principle possible to use the technique of this paper to evolve images for maximal response from these cells, if you could get the monkey to tolerate the apparatus for long enough.
- The basic idea: defining what microscopic, low-level features cause an image to have macroscopic, high-level properties
- This is a counterfactual / interventional definition, not just about probabilistic association
- This gives an in-principle sound way of constraining a learning system to only pay attention to the causally relevant features
- With observational data, you'd need to do causal inference
- They have a very clever way of doing that here, based on a coarsening theorem, which says that the correct causal model is (almost always) a coarsening of the optimally-predictive observational model
- This in turn gets into issues of predictive states...
I guess a more purely Greek phrase would be "isotaxon", but that's just a guess.
i.e., I learned about this paper from Shallice and Cooper's excellent The Organisation of Mind, but felt dumb for not knowing about it before.
Posted at August 06, 2019 15:17 | permanent link
February 05, 2019
"Causal inference in social networks: A new hope?" (Friday at the Ann Arbor Statistics Seminar)
Attention
conservation notice: Self-promoting notice of a very academic
talk, at a university far from you, on a very recondite
topic, solving a problem that doesn't concern you under a set of
assumptions you don't understand, and wouldn't believe if I explained to
you.
I seem to be giving talks again:
- "Causal inference in social networks: A new hope?"
- Abstract: Latent homophily generally makes it impossible to identify contagion or influence effects from observations on social networks. Sometimes, however, homophily also makes it possible to accurately infer nodes' latent attributes from their position in the larger network. I will lay out some assumptions on the network-growth process under which such inferences are good enough that they enable consistent and asymptotically unbiased estimates of the strength of social influence. Time permitting, I will also discuss the prospects for tracing out the "identification possibility frontier" for social contagion.
- Joint work with Edward McFowland III
- Time and place: 11:30 am -- 12:30 pm on 8 February 2019, in 411 West Hall, Statistics Department, University of Michigan
--- The underlying paper grows out of an idea that was in
my paper with Andrew Thomas on social contagion: latent
homophily is the problem with causal inference in social networks, but latent
homophily also leads to large-scale structure in networks, and allows
us to infer latent attributes from the graph; we call this "community
discovery". Some years later, my student Hannah Worrall, in
her senior thesis,
did an extensive series of simulations showing that controlling for estimated
community membership lets us infer the strength of social influence, in regimes
where community-discovery is consistent. Some years after that, Ed asked me
what I was wanting to work on, but wasn't, so I explained about what seemed to
me the difficulties in doing some proper theory about this. As I did so,
the difficulties dissolved under Ed's questioning, and the
paper followed very naturally. We're
now revising in reply to referees (Ed, if you're reading this --- I
really am working on it!), which
is as pleasant as
always. But I am very pleased to have finally made a positive
contribution to a problem which has occupied me for many years.
Constant Conjunction Necessary Connexion;
Enigmas of Chance;
Networks;
Self-Centered
Posted at February 05, 2019 21:04 | permanent link
February 03, 2019
On Godzilla and the Nature and Conditions of Cultural Success; or, Shedding the Skin
Attention conservation notice: 1100+ words of Deep Thoughts on a creature-feature monster and cultural selection, from someone with no qualifications to write on either subject. Expresses long-held semi-crank notions; composed while simultaneously reading Morin on diffusion chains and drinking sake; revived over a year after it was drafted because Henry was posting about similar themes, and finally posted because I am celebrating submitting a grant proposal on time, rather than procrastinating on finishing one.
Godzilla is an outstanding example of large-scale cultural success, and of how successful cultural items become detached from their original meanings.
Godzilla's origins are very much in a particular time and place, namely Japan, recently (if not quite immediately) post-WWII and the national trauma of the atomic bombings and their lingering effects. This is a very particular setting, on the world-historical scale. It is now seven decades in the past, and so increasingly gone from living memory, even for the very long-lived population of Japan.
Against this, Godzilla has been tremendously successful culturally all over the world, over basically the whole time since it appeared. I don't mean that it's made money (though it has) --- I mean that it has been popular, that people have liked consuming stories (and images and toys and other representations) about it, that they have liked creating such representations, and that they have liked thinking about and with Godzilla. (In contemporary America, for instance, Godzilla is so successful that the suffix "-zilla" is a morpheme, denoting something like "a destructive, mindlessly-enraged form of an entity".) Necessarily, the vast majority of this success and popularity has been distant in time, space, social structure and cultural context from 1950s Japan. How can these two observations --- the specificity of origins and the generality of success --- be reconciled?
To a disturbing extent, of course, any form of cultural success can be self-reinforcing (cf. Salganik et al.), but there is generally something to the representations which succeed (cf., again, Salganik et al.). But, again, Godzilla is endemic in many contexts remote in space, time and other cultural features from immediately-post-war Japan. So it would seem that whatever makes it successful in those contexts, including here and now as I write this, must be different from what made it successful at its point of origin.
It could be that Godzilla is successful in 1950s Japan and in 2010s USA because it happened to fit two very different but very specific cultural niches --- the trauma of defeat culminating in nuclear war, on the one hand; and (to make something up) a compulsive desire for re-enactments of 9/11 on the other hand. But explaining wide-spread success by a series of particular fits falters as we consider all the many other social contexts in which Godzilla has been popular. Maybe it happened, by chance, to appeal narrowly to one new context, but two? three? ten?
An alternative is that Godzilla has managed to spread because it appeals to tastes which are not very context-specific, but on the contrary very widely distributed, if not necessarily constant and universal. In the case of Godzilla, we have a monster who breaks big things and breathes fire: an object of thought, in other words, enduringly relevant to crude interests in predators,
in destruction, and in fire. Since those interests are very common across all social contexts, something which appeals to them has a very good source of "pull".
This is not to say that Godzilla wasn't, originally, all about being the only country ever atom-bombed into submission. But it is to say that we can draw a useful distinction between the meanings successful cultural products had originally and those attached to them as they diffuse. It is analogous to the distinction the old philosophy of science used to draw between an idea's "context of discovery" and its "context of justification", though that had a normative force I am not aiming at. (For the record, I think that many of the criticisms of the discovery-justification distinction are weak, mis-conceived or just flat wrong, and that it's actually a pretty useful distinction. But that's another story for another time.)
For Godzilla, like many other successful cultural products, the "context of invention" was a very historically-specific confluence of issues, concerns and predecessors. But the "context of diffusion" was that it could appeal to vastly more generic tastes, and make use of vastly more generic opportunities. These are still somewhat historically-specific (e.g., no motion-picture technology, no Godzilla), but much less so. I am even tempted to formulate a generalization: the more diffused a cultural product is, in space or time or social position, the less its appeal owes to historically-specific contexts, and the more it owes to forces which are nearly a-historical and constant.
What holds me back from declaring cultural diffusion to be a low-pass filter is that it is, in fact, logically possible for a cultural product to succeed in many contexts because it seems to be narrowly tailored to them all. What's needed, as a kind of meta-ingredient, is for the cultural product to be suggestively ambiguous. It is ambiguity which allows very different people to find in the same artifact the divergent but specific meanings they seek; but it also has to somehow suggest to many people that there is a specific, compelling meaning to be found in it. When we consider cultural items which have endured for a very long time, like some sacred texts or other works of literature, then I suspect we are seeing representations which have been strongly selected for suggestive ambiguity.
It is a cliche of literary criticism that each generation gives its own interpretation of these great works. It is somewhat less of a cliche, though equally true, that every generation finds a reason to interpret them. Pace Derrida and his kin, I don't think that every text or artifact is equally amenable to this sort of re-interpretation and re-working. (Though that notion may have seemed more plausible to literary scholars who were most familiar with a canon of books inadvertently selected, in part, for just such ambiguity.) There are levels of ambiguity, and some things are just too straightforward to succeed this way. It is also plainly not enough just to be ambiguous, since ambiguous representations are very common, and usually dismal failures at propagating themselves. The text or artifact must also possess features which suggest that there is an important meaning to be found in it. What those features are, in terms of rhetorical or other sorts of design, is a nice question, though perhaps not beyond all conjecture. (I strongly suspect Gene Wolfe of deliberately aiming for such effects.) Something keeps the great works alive over time and space, saving them from being as dead as Gilgamesh, of merely historical interest. Because they are interpreted so variously, they can't be surviving because any one of their interpretations is the right one, conveying a compelling message that assures human interest. Rather, works outlast ages precisely because they simultaneously promise and lack such messages. This quality of suggestive ambiguity could, of course, also contribute to academic and intellectual success --- making it seem like you have something important to say, while leaving what that thing is open to debate, is one route to keeping people talking about you for a long time.
… or so I think in my more extreme moments. In another mood, I might try to poke holes in my own arguments. As for Godzilla, I suspect it's too early to tell whether it possesses this quality of suggestive ambiguity, but my hunch is that this dragon is not a shape-shifter.
The Collective Use and Evolution of Concepts;
Scientifiction and Fantastica
Posted at February 03, 2019 15:08 | permanent link
Data Over Space and Time
Collecting posts related to this course (36-467/36-667).
- Fall 2018:
- Course Announcement
- Lecture 1: Introduction to the Course
- Lectures 2 and 3: Smoothing, Trends, Detrending
- Lecture 4: Principal Components Analysis I
- Lecture 5: Principal Components Analysis II
- Lecture 6: Optimal Linear Prediction
- Lecture 7: Linear Prediction for Time Series
- Lecture 8: Linear Prediction for Spatial and Spatio-Temporal Random Fields
- Lectures 9--13: Filtering, Fourier Analysis, African Population and Slavery, Linear Generative Models
- Lectures 14 and 15: Inference for Dependent Data
- Lecture 17: Simulation
- Lectures 18 and 19: Simulation for Inference
- Lecture 20: Markov Chains
- Lectures 21--24: Compartment Models, Optimal Prediction, Inference for Markov Models, Markov Random Fields, Hidden Markov Models
- Self-Evaluation and Lessons Learned
- Books to Read While the Algae Grow in Your Fur, December 2018
- Books to Read While the Algae Grow in Your Fur, January 2019
Posted at February 03, 2019 14:15 | permanent link
January 31, 2019
Books to Read While the Algae Grow in Your Fur, January 2019
Attention
conservation notice: I have no taste. I also have no qualifications
to discuss the history of millenarianism, or really even statistical graphics.
- Bärbel Finkenstädt, Leonhard Held and Valerie Isham (eds.), Statistical Methods for Spatio-Temporal Systems
- This is an edited volume arising from a conference, with all the virtues
and vices that implies. (Several chapters have references to
the papers which first published the work expounded in other
chapters.) I will, accordingly, review the chapters in order.
- Chapter 1: "Spatio-Temporal Point Processes: Methods and Applications"
(Diggle). Mostly a precis of case studies from Diggle's (deservedly standard)
books on the subject, which I will get around to finishing one of these years.
- Chapter 2: "Spatio-Temporal Modelling --- with a View to Biological Growth"
(Vedel Jensen, Jónsdóttir, Schmiegel, and Barndorff-Nielsen).
This chapter divides into two parts. One is about "ambit stochastics". In a
random field $Z(s,t)$, the "ambit" of the space-time point-instant $(s,t)$ is
the set of point-instants $(q,u)$, $u < t$, where $Z(q,u)$ is (causally)
relevant to $Z(r,t)$. (This is what, in my own work, I've called
the "past cone" of $(s,t)$.)
Having a regular geometry for the ambit imposes some tractable restrictions on
random fields, which are explored here for models of growth-without-decay. The
second part of this chapter will only make sense to hardened habitués of Lévy
processes, and perhaps not even to all of them.
- Chapter 3: "Using Transforms to Analyze Space-Time Processes" (Fuentes,
Guttorp, and Sampson): A very nice survey of Fourier transform, wavelet
transform, and PCA approaches to decomposing spatio-temporal data. There's a
good account of some tests for non-stationarity, based on the idea that
(essentially) we should get nearly the same transforms for different parts of
the data if things really are stationary. (I should think carefully about the
assumptions and the implied asymptotic regime here, since the argument makes
sense, but it also makes sense that sufficiently slow mean-reversion
is indistinguishable from non-stationarity.)
- Chapter 4: "Geostatistical Space-Time Models, Stationarity, Separability,
and Full Symmetry" (Gneiting, Genton, and Guttorp): "Geostatistics" here refers to
"kriging", or using linear prediction on correlated data. As
every schoolchild knows,
this boils down to finding the covariance function,
$\mathrm{Cov}[Z(s_1, t_1), Z(s_2, t_2)]$. This chapter considers three kinds
of symmetry restrictions on the covariance functions: "separability", where
$\mathrm{Cov}[Z(s_1, t_1), Z(s_2, t_2)] = C_S(s_1, s_2) C_T(t_1, t_2)$; the
weaker notion of "full symmetry", where $\mathrm{Cov}[Z(s_1, t_1), Z(s_2, t_2)] = \mathrm{Cov}[Z(s_1, t_2), Z(s_2, t_1)]$; and "stationarity", where
$\mathrm{Cov}[Z(s_1, t_1), Z(s_2, t_2)] = \mathrm{Cov}[Z(s_1+q, t_1+h), Z(s_2+q, t_2+h)]$. As the authors explain,
while separable covariance functions are often used because of their
mathematical tractability, they look really weird; "full symmetry" can
do a lot of the same work, at less cost in implausibility.
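To make the separability restriction concrete, here is a tiny R sketch of my own (not from the chapter); the exponential pieces and range parameters are arbitrary choices, purely for illustration.

```r
## A separable space-time covariance: C[(s1,t1),(s2,t2)] = C_S(s1,s2) * C_T(t1,t2).
## Exponential forms and range parameters chosen purely for illustration.
cov_space <- function(s1, s2, range_s = 2) exp(-abs(s1 - s2) / range_s)
cov_time  <- function(t1, t2, range_t = 5) exp(-abs(t1 - t2) / range_t)
cov_sep   <- function(s1, t1, s2, t2) cov_space(s1, s2) * cov_time(t1, t2)

## For a separable (stationary) covariance, C(h, u) * C(0, 0) = C(h, 0) * C(0, u)
## for any spatial lag h and temporal lag u; a non-separable model need not obey this.
h <- 1; u <- 3
all.equal(cov_sep(0, 0, h, u) * cov_sep(0, 0, 0, 0),
          cov_sep(0, 0, h, 0) * cov_sep(0, 0, 0, u))
```

One way I read the "really weird" complaint: a separable model forces the temporal correlation function to be the same, up to scale, at every spatial separation, so there is no genuine space-time interaction.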
- Chapter 5: "Space-Time Modelling of Rainfall for Continuous Simulations"
(Chandler, Isham, Bellone, Yang and Northrop): A detailed exposition of two
models for rainfall, at different spatio-temporal scales, and how they are both
motivated by and connected to data. I appreciate their frankness about things
that didn't work, and the difficulties of connecting the different models.
- Chapter 6, "A Primer on Space-Time Modeling from a Bayesian Perspective"
(Higdon): Here "space-time modeling" means "Gaussian Markov random fields".
Does what it says on the label.
- All the chapters combine theory with examples --- chapter 2 is perhaps the
most mathematically sophisticated one, and also the one where the examples do
the least work. The most useful, from my point of view, were Chapters 3 and 4,
but that's because I was teaching a class where I did a lot of kriging and PCA,
and (with some regret) no point processes. If you have a professional interest
in spatio-temporal statistics, and a fair degree of prior
acquaintance, I can recommend this as a useful collection of examples, case
studies, and expositions of some detailed topics.
- Errata, of a sort: There are supposed to be color plates
between pages 142 and 143. Unfortunately, in my copy these are printed in
grey, not in color.
- Disclaimer: The publisher sent me a copy of this book, but that
was part of my fee for reviewing a (different) book proposal for them.
- Kieran Healy, Data Visualization: A Practical Introduction
- Anyone who has looked at my professional writings will have noticed that my
data visualizations are neither fancy nor even attractive, and they never go
beyond basic R graphics. This is because I have never learned any other system
for statistical visualization. And I've not done that because I'm
lazy, and have little visual sense anyway. This book is the best guide I've
seen to (1) learning the widely-used, and generally handsome, ggplot library in
R, (2) learning the "grammar of graphics" principles on which it is based, and
(3) learning the underlying psychological principles which make some graphics
better or worse visualizations than others. (This is not to be confused with
learning the maxims or even the tacit taste of a particular designer, even one
of genius.) The writing is great, the examples are interesting, well-chosen
and complete, and the presumptions about how much R, or statistics, you know
coming in are minimal. I wish something like this had existed long ago, and
I'm tempted, after reading it, to totally re-do the figures in
my book. (Aside to
my editor: I am not going to totally re-do the figures in my book.) I strongly
recommend it, and will be urging it on my graduate students for the foreseeable
future.
- ObLinkage: The book is online, pretty much.
- ObDisclaimer: Kieran and I have been saying good things about each other's
blogs since the High Bronze Age of the Internet. But I paid good cash money for my copy, and have no stake in the success of this book.
- Anna Lee Huber, Mortal Arts
- More historical-mystery mind candy,
this time flavored by the (dismal) history of early 19th century psychiatry.
(Huber is pretty good, though not perfect, at avoiding anachronistic
language, so nobody says "psychiatry" in the novel.)
- Norman Cohn, The Pursuit of the Millennium: Revolutionary Millenarians and Mystical Anarchists of the Middle Ages
- I vividly remember finding a used copy of this in the UW-Madison student
bookstore when I began graduate school, in the fall of 1993, and having my mind
blown by reading it that fall*. Coming back to it now, I find it still
fascinating and convincing, and it does an excellent job of tracing millenarian
movements among the poor in Latinate Europe from the fall of Rome through the
Reformation. (There are a few bits where he gets a bit psychoanalytic, but the
first edition was published in 1957.) If I no longer find it
mind-blowing, that's in large part because reading it sparked an enduring
interest in
millenarianism,
and so I've long since absorbed what then (you should forgive the expression)
came as a revelation.
- The most controversial part of the book, I think, is the conclusion, where
Cohn makes it very clear that he thinks there is a great deal of similarity, if
not actual continuity, between his "revolutionary millenarians and mystical
anarchists" and 20th century political extremism, both of the Fascist and the
Communist variety. He hesitates --- wisely, I think --- over whether this is
just a similarity, or there is an actual thread of historical continuity; but I
think his case for the similarity is sound.
- *: I was supposed to be having my mind
blown
by Sakurai.
In retrospect, this incident sums up both why I was not a very good graduate
student, and why I will never be a great scientist.
Books to Read While the Algae Grow in Your Fur;
Enigmas of Chance;
Data over Space and Time;
Pleasures of Detection, Portraits of Crime;
Tales of Our Ancestors;
Psychoceramica;
Writing for Antiquity;
Commit a Social Science
Posted at January 31, 2019 23:59 | permanent link
December 31, 2018
Books to Read While the Algae Grow in Your Fur, December 2018
Attention
conservation notice: I have no taste. I also have no qualifications
to discuss poetry or leftist political theory. I do know something about
spatiotemporal data analysis, but you don't care about that.
- Gidon Eshel, Spatiotemporal Data Analysis
- I assigned this as a textbook in my fall class
on data over space and time,
because I needed something which covered spatiotemporal data analysis, especially
principal components analysis, for students who could be taking linear
regression at the same time, and was cheap. This met all my requirements.
- The book is divided into two parts. Part I is a review or crash course in
linear algebra, building up to decomposing square matrices in terms of their
eigenvalues and eigenvectors, and then the singular value decomposition of
arbitrary matrices. (Some prior acquaintance with linear algebra will help,
but not very much is needed.) Part II is about data analysis, covering some
basic notions of time series and autocorrelation, linear regression models
estimated by least squares, and "empirical orthogonal functions", i.e.,
principal components analysis, i.e., eigendecomposition of covariance or
correlation matrices. As for "cheap", while the list price is (currently) an
outrageous \$105, it's on
JSTOR, so The Kids had free access to the PDF through the
university library.
- In retrospect, there were strengths to the book, and some serious
weaknesses --- some absolute, some just for my needs.
- The most important strength is that Eshel writes like a
human being, and not a bloodless textbook. His authorial persona is not
(thankfully) much like mine, but it's a likeable and enthusiastic one. This is
related to his trying really, really hard to explain everything as simply as
possible, and with multitudes of very detailed worked examples. I will
probably be assigning Part I of the book, on linear algebra, as refresher
material to my undergrads for years.
- He is also very good at constantly returning to physical insight to
motivate data-analytic procedures. (The highlight of this, for me, was section
9.7 [pp. 185ff] on when and why an autonomous, linear, discrete-time AR(1) or
VAR(1) model will arise from a forced, nonlinear, continuous-time dynamical
system.) If this had existed when I was a physics undergrad, or starting grad
school, I'd have loved it.
- Turning to the weaknesses, some of them are, as I said, merely ways in
which he didn't write the book to meet my needs. His implied reader
is very familiar with physics, and not just the formal, mathematical parts but
also the culture (e.g., the delight in complicated compound units of
measurement, saying "ensemble" when other disciplines say "distribution" or
"population"). In fact, the implied reader is familiar with, or at least
learning, climatology. But that reader has basically no experience with
statistics, and only a little probability (so that, e.g., they're not familiar
with rules for algebra with expectations and covariances*). Since my audience was undergraduate and masters-level
statistics students, most of whom had only the haziest memories of high school
physics, this was a mis-match.
- Others weaknesses are, to my mind, a bit more serious, because they
reflect more on the intrinsic content.
- A trivial but real one: the book is printed in black and white, but many
figures are (judging by the text) intended to be in color, and are scarcely
comprehensible without it. (The first place this really struck me was p. 141
and Figure 9.4, but there were lots of others.) The electronic version is no
better.
- The climax of the book (chapter 11) is principal components analysis.
This
is really, truly
important, so it deserves a lot of treatment. But it's not a very satisfying
stopping place: what do you do with the principal components once you have
them? What about the difference between principal components / empirical
orthogonal functions
and factor models?
(In the book's terms, the former does a low-rank approximation to the sample
covariance matrix $\mathbf{v} \approx \mathbf{w}^T \mathbf{w}$, while the
latter treats it as low-rank-plus-diagonal-noise $\mathbf{v} \approx
\mathbf{w}^T\mathbf{w} + \mathbf{d}$, an importantly different thing.) What
about nonlinear methods of dimensionality reduction? My issue isn't so much
that the book didn't do everything, as that it didn't give readers even hints
of where to look.
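To make the low-rank versus low-rank-plus-diagonal distinction concrete, here is a small R sketch of my own (not the book's), using base R's eigen and factanal on made-up data; the data-generating process and number of factors are arbitrary illustrative choices.

```r
## PCA / EOFs: approximate the sample covariance v by w^T w (rank k).
## Factor model: approximate it by w^T w + d, with d diagonal ("uniquenesses").
set.seed(7)
n <- 500; p <- 6; k <- 2
loadings <- matrix(rnorm(p * k), p, k)
scores   <- matrix(rnorm(n * k), n, k)
x <- scores %*% t(loadings) + matrix(rnorm(n * p, sd = 0.5), n, p)
v <- cov(x)

## PCA / EOF approximation: keep the top k eigenvectors
eig <- eigen(v)
w_pca <- t(eig$vectors[, 1:k] %*% diag(sqrt(eig$values[1:k])))  # k x p
v_pca <- t(w_pca) %*% w_pca

## Factor-model approximation (factanal works on correlations; rescale back)
sds <- apply(x, 2, sd)
fa <- factanal(x, factors = k)
w_fa <- t(fa$loadings[] * sds)                                  # k x p
v_fa <- t(w_fa) %*% w_fa + diag(fa$uniquenesses * sds^2)

## How far each approximation is from the sample covariance (Frobenius norm)
c(pca = norm(v - v_pca, "F"), factor = norm(v - v_fa, "F"))
```

On data like these, the factor model will typically come closer, precisely because it gets to soak up the per-coordinate noise in the diagonal term.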
- There are places where the book's exposition is not very internally
coherent. Chapter 8, on autocorrelation, introduces the topic with an example
where $x(t) = s(t) + \epsilon(t)$, for a deterministic signal function
$s(t)$ and white noise $\epsilon(t)$. Fair enough; this is a trend-plus-noise
representation. But it then switches to modeling the autocorrelations as
arising from processes where $x(t) = \int_{-\infty}^{t}{w(u) x(u) du} +
\xi(t)$, where again $\xi(t)$ is white noise. (Linear autoregressions are the
discrete-time analogs.)
These are distinct classes of processes. (Readers will find it
character-building to try to craft a memory kernel $w(u)$ which matches the
book's running signal-plus-noise example, where $s(t) =
e^{-t/120}\cos{\frac{2\pi t}{49}}$.)
- I am all in favor of physicists' heuristic mathematical sloppiness,
especially in introductory works, but there are times when it turns into mere
confusion. The book persistently conflates time or sample averages with
expectation values. The latter are ensemble-level quantities, deterministic
functionals of the probability distribution. The former are random variables.
Under various laws of large numbers or ergodic theorems, the
former converge on the latter, but they are not the same.
Eshel knows they are not the same, and sometimes talks about
how they are not the same, but the book's notation persistently writes
them both as $\langle x \rangle$, and the text sometimes flat-out identifies
them. (For one especially painful example among many, p. 185.) Relatedly, the
book conflates parameters (again, ensemble-level quantities, functions of the
data-generating process) and estimators of those parameters (random
variables).
- The treatment of multiple regression is unfortunate.
$R^2$ does
not measure goodness of fit. (It's not even a measure of how well the
regression predicts or
explains.) At some level, Eshel knows this, since his recommendation for
how to pick regressors is not "maximize $R^2$". On the other hand, his
prescription for picking regressors (sec. 9.6.4, pp.180ff) is rather painful to
read, and completely at odds with his stated rationale of using regression
coefficients to compare alternative explanations (itself a bad, though common,
idea). Very strikingly, the terms "cross-validation" and "bootstrap" do not
appear in his index**. Now, to be clear,
Eshel isn't worse in his treatment of regression than most
non-statisticians, and he certainly understands the algebra backwards
and forwards. But his advice on the craft of regression is, to be
polite, weak and old-fashioned.
- Summing up, the linear-algebra refresher/crash-course of Part I is great,
and I even like the principal components chapters in Part II, as far as they
go. But it's not ideal for my needs, and there are a bunch of ways I think it
could be improved for anyone's needs. What to assign instead,
I have no idea.
- *: This is, I think, why he
doesn't explain the calculation of the correlation time and effective
sample size in sec. 8.2 (pp. 123--124), just giving a flat statement of the
result, though it's really easy to prove with those
tools. I do appreciate finally learning the origin of this beautiful and
practical result --- G. I. Taylor, "Diffusion by Continuous
Movements", Proceedings
of the London Mathematical Society, series 2, volume 20 (1922),
pp. 196--212 (though the book's citing it with the wrong year, confusing the series number with an issue number, and giving no page numbers was annoying).
^
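For what it's worth, here is a two-line check of the result in question, under assumptions of my own choosing (an AR(1) process simulated with arima.sim; the \( (1-\rho)/(1+\rho) \) factor is the standard specialization of the general effective-sample-size formula to AR(1), not Eshel's or Taylor's worked example):

```r
## Effective sample size of the mean of an AR(1) series with coefficient rho:
## n_eff = n * (1 - rho) / (1 + rho), so Var(xbar) is roughly sigma^2 / n_eff.
set.seed(123)
n <- 1000; rho <- 0.7; sims <- 2000
xbars <- replicate(sims, mean(arima.sim(list(ar = rho), n)))
sigma2 <- 1 / (1 - rho^2)            # marginal variance with unit innovations
n_eff <- n * (1 - rho) / (1 + rho)
c(simulated = var(xbars), predicted = sigma2 / n_eff)
```

The two numbers should agree to within simulation error (and a finite-sample correction I am ignoring here).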
- **: The absence of "ridge
regression" and "Tikhonov regularization" from the index is all the more
striking because they appear in section 9.3.3 as "a more general, weighted,
dual minimization formalism", which, compared to ordinary least squares, is
described as "sprinkling added power ... on the diagonal of an otherwise
singular problem". This is, of course, a place where it would be really
helpful to have a notion of cross-validation, to decide how much to
sprinkle.^
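Since the footnote above is about choosing how much to sprinkle, here is a minimal R sketch of doing that by cross-validation with the glmnet package (my own illustration, on made-up data; alpha = 0 gives the ridge / Tikhonov penalty):

```r
## Ridge regression with the penalty ("sprinkling") chosen by cross-validation.
library(glmnet)

set.seed(1)
n <- 100; p <- 50
x <- matrix(rnorm(n * p), n, p)
beta <- c(rep(1, 5), rep(0, p - 5))
y <- drop(x %*% beta + rnorm(n))

cv_fit <- cv.glmnet(x, y, alpha = 0)   # alpha = 0: pure ridge penalty
cv_fit$lambda.min                      # penalty minimizing cross-validated error
head(coef(cv_fit, s = "lambda.min"))   # shrunken coefficients at that penalty
```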
- Nick Srnicek and Alex Williams, Inventing the Future: Postcapitalism and a World Without Work
- It's --- OK, I guess? They have some good points against what they call
"folk politics", namely, that it has conspicuously failed to accomplish
anything, so doubling down on more of it seems like a bad way to change the
world. And they really want to change the world: the old
twin goals of increasing human power over the world, and eliminating human
power over other humans, are very much still there, though they might not quite
adopt that formula. To get there, their basic idea is to push for a "post-work
world", one where people don't have to work to survive, because
they're entitled to a more-than-subsistence basic income as a matter of right.
They realize that making that work will require lots of
politics and pushes for certain kinds of technological progress rather
than others. This is the future they want --- to finally enter (in
Marx's words) "the kingdom of freedom", where we will be able to get on with
all the other problems, and possibilities, confronting us.
- As for getting there: like a long, long line of leftist intellectuals from
the 1960s onwards, Srnicek and Williams are very taken with the idea, going
back to Gramsci, that the key to achieving socialism is to first achieve
ideological "hegemony". To put it crudely, this means trying to make your idea
such broadly-diffused, widely-accepted, scarcely-noticed common notions that
when madmen in authority channel voices from the air, they
channel you. (In passing: Occupy may have done nothing to reduce
economic inequality, but Gramsci's success as a strategist may be measured by
the fact that he wrote
in a Fascist prison.) Part of this drive for hegemony is pushing for
new ideas in economics --- desirable in itself, but they are sure in advance of
what inquiry should find *. Beyond this, and
saying that many tactics will need to be tried out by a whole "ecology" of
organizations and groups, they're pretty vague. There's some wisdom here ---
who could propound a detailed plan to get to post-work
post-capitalism? --- but also more ambiguity than they acknowledge. Even if a
drive for a generous basic income (and all that would go with it) succeeds, the
end result might not be anything like the sort of post-capitalism Srnicek and
Williams envisage, if only because what we learn and experience along the way
might change what seems feasible and desirable. (This is a Popperian point
against Utopian plans, but it can be put in other language quite
easily**.) I think Srnicek and
Williams might be OK with the idea that their desired future won't be
realized, so long as some better future is, and that the important
point is to get people on the left not to prefigure better worlds in occasional
carnivals of defiance, but to try to make them happen. Saying that doing this
will require organization, concrete demands, and leadership is pretty sensible,
though they do disclaim trying to revive the idea of a vanguard party.
- Large portions of the book are, unfortunately, given over to insinuating,
without ever quite saying, that post-work is not just desirable and possible,
but a historical necessity to which we are impelled by
the inexorable
development of capitalism, as foreseen by the Prophet. (They also
talk about how Marx's actual scenario for how capitalism would develop, and
end, not only has not come to pass yet, but is pretty much certain to never
come to pass.) Large portions of the book are given over to wide-ranging
discussions of lots of important issues, all of which, apparently, they grasp
through the medium of books and articles published by small, left-wing presses
strongly influenced by post-structuralism --- as it were, the world viewed
through the Verso Books catalog. (Perry Anderson had the important advantage,
as a writer and thinker, of being formed outside the rather hermetic
subculture/genre he helped create; these two are not so lucky.) Now, I
recognize that good
ideas usually
emerge within a community that articulates its own distinctive
tradition, so some insularity can be all to the good. In this case, I am not
all that far from the authors' tradition, and sympathetic to it. But still,
the effect of these two (overlapping) writerly defects is that once the book
announced a topic, I often felt I could have written the subsequent passage
myself; I was never surprised by what they had to say. Finishing this was a
slog.
- I came into the book a mere Left Popperian and market socialist, friendly
to the idea of a basic income, and came out the same way. My mind was not
blown, or even really changed, about anything. But it might encourage
some leftist intellectuals to think constructively about the future,
which would be good.
- Shorter: Read Peter
Frase's Four
Futures instead.
- *: They are quite confident
that modern computing lets us have an efficient planned economy, a conclusion
they support not by any technical knowledge of the issue but by citations to
essays in literary magazines and collections of humanistic scholarship. As I
have said before, I wish that were the case, if only because it would be
insanely helpful for my own work,
but I think that's just wrong.
In any case, this is an important point for socialists, since it's
very consequential for the kind of socialism we should pursue. It
should be treated much more seriously, i.e., rigorously and knowledgeably, than
they do. Fortunately, a basic income is entirely compatible
with market socialism, as are other measures to ensure that people
don't have to sell their labor power in order to live.
- **: My own two-minute stab at making chapter 9
of The Open Society and Its Enemies sound suitable for New
Left Review: "The aims of the progressive forces, always multifarious,
develop dialectically in the course of the struggle to attain them. Those aims
can never be limited by the horizon of any abstract,
pre-conceived telos, even one designated 'socialism', but will always
change and grow through praxis." (I admit "praxis" may be a bit
dated.) ^
- A. E. Stallings, Like: Poems
- Beautiful stuff from one of my favorite contemporary poets. "Swallows" and
"Epic Simile" give a fair impression of what you'll find. This also
includes a lot of the poems discussed in Cynthia Haven's "Crossing Borders" essay.
Books to Read While the Algae Grow in Your Fur;
Enigmas of Chance;
Data over Space and Time;
The Progressive Forces;
The Commonwealth of Letters
Posted at December 31, 2018 23:59 | permanent link
December 28, 2018
Data over Space and Time: Self-Evaluation and Lessons Learned
Attention
conservation notice: Academic navel-gazing, about a class you didn't
take, in a subject you don't care about, at a university you don't
attend.
Well, that went better than it could have,
especially since it was the first time I've taught a new undergraduate course
since 2011.
Some things that worked well:
- The over-all choice of methods topics --- combining
descriptive/exploratory techniques and generative models and their inference.
Avoiding the ARIMA alphabet soup as much as possible both played to my
prejudices and avoided interference with a spring course.
- The over-all kind and range of examples (mostly environmental and
social-historical) and the avoidance of finance. I could have done some more
economics, and some more neuroscience.
- The recurrence of linear algebra and eigen-analysis (in smoothing,
principal components, linear dynamics, and Markov processes) seems to have
helped some students, and at least not hurt the others.
- The in-class exercises did wonders for attendance. Whether doing the
exercises, or that attendance, improved learning is hard to say. Some students
specifically praised them in their anonymous feedback, and nobody complained.
Some things did not work so well:
- I was too often late in posting assignments, and too many of them had
typos when first posted. (This was a real issue with the final. To
any of the students reading this: my apologies once again.) I also had a lot
of trouble calibrating how hard the assignments would be, so the opening
problem sets were a lot more work than the later ones.
(In my partial defense about late assignments, there were multiple problem
sets which I never posted, after putting a lot of time into them, because my
initial idea either proved much too complicated for this course when fully
executed, or because I was, despite much effort, simply unable to reproduce
published papers*. Maybe next time, if
there is a next time, these efforts can see the light of day.)
- I let the grading get really, really behind the assignments. (Again, my
apologies.)
- I gave less emphasis to spatial and spatio-temporal models in the second,
generative half of the course than they really deserve. E.g., Markov random
fields and cellular automata (and kin) probably deserve at least a
lecture each, perhaps more.
- I didn't build in enough time for review in my initial schedule, so I
ended up making some painful cuts. (In particular, nonlinear autoregressive
models.)
- My attempt to teach Fourier analysis was a disaster. It needs much more
time and preparation than I gave it.
- We didn't get very much at all into how to think your way through building
a new model, as opposed to estimating, simulating, predicting, checking, etc.,
a given model.
- I have yet to figure out how to get the students to do the
readings before class.
If I got to teach this again, I'd keep the same over-all structure, but
re-work all the assignments, and re-think, very carefully, how much time I
spent on which topics. Some of these issues would of course go away if there
were a second semester to the course, but that's not going to happen.
*: I now somewhat suspect that one of the papers I tried
to base an assignment on is just wrong, or at least could not have done the
analysis the way it says it did. This is not the first time I've encountered
something like this through teaching... ^
Data over Space and Time
Posted at December 28, 2018 11:22 | permanent link
November 30, 2018
Books to Read While the Algae Grow in Your Fur, November 2018
Attention
conservation notice: I have no taste. I also have no qualifications
to discuss the history of photography, or of black Pittsburgh.
- Cheryl Finley, Laurence Glasco and Joe W. Trotter, with an introduction by Deborah Willis, Teenie Harris, Photographer: Image, Memory, History
- A terrific collection of Harris's photos of (primarily) Pittsburgh's
black community from the 1930s to the 1970s, with good biographical and
historical-contextual essays.
- Disclaimer: Prof. Trotter is also on the faculty at CMU, but I
don't believe we've ever actually met.
- Ben Aaronovitch, Lies Sleeping
- Mind candy: the latest installment in the long-running
supernatural-procedural mystery series, where the Folly gets tangled up with
the Matter of Britain.
- Charles Stross, The Labyrinth Index
- Mind candy; Latest installment in Stross's long-running Lovecraftian
spy-fiction series. I imagine a novel about the US Presidency being taken over
by a malevolent occult force seemed a lot more amusing before 2016, when this
must have been mostly written. It's a good installment, but only suitable for
those already immersed in the story.
- Anna Lee Huber, The Anatomist's Wife and A Brush with Shadows
- Mind-candy, historical mystery flavor. These are the first and sixth
books in the series, because I couldn't lay hands on 2--5, but I will.
(Update: More.)
Books to Read While the Algae Grow in Your Fur;
Scientifiction and Fantastica;
Pleasures of Detection, Portraits of Crime;
Tales of Our Ancestors;
Cthulhiana;
Heard About Pittsburgh, PA
Posted at November 30, 2018 23:59 | permanent link
November 13, 2018
Data over Space and Time, Lecture 20: Markov Chains
(.Rmd)
Data over Space and Time
Posted at November 13, 2018 16:50 | permanent link
November 12, 2018
Course Announcement: Advanced Data Analysis (36-402/36-608), Spring 2019
Attention
conservation notice: Announcement of an advanced undergraduate course
at a school you don't attend in a subject you don't care about.
I will be
teaching 36-402/36-608,
Advanced Data Analysis, in the spring.
This will be the seventh time I'll have taught it, since I took it over and
re-vamped it in 2011. The biggest change from previous iterations will be in
how I'll be handling class-room time, by introducing in-class small-group
exercises. I've been doing this in this semester's class, and it seems to at
least not be hurting their understanding, so we'll see how well it scales to
a class with four or five times as many students.
(The other change is that by the time the class begins in January,
the textbook will,
inshallah, be in the hands of the publisher. I've finished adding everything
I'm going to add, and now it's a matter of cutting stuff, and fixing
mistakes.)
Advanced Data Analysis from an Elementary Point of View
Posted at November 12, 2018 14:51 | permanent link
November 03, 2018
In Memoriam Joyce Fienberg
I met Joyce through her late husband Stephen, my admired
and much-missed colleague. I won't pretend that she
was a close friend, but she was a friend, and you could hardly hope to meet a
kinder or more decent person. A massacre by a deluded bigot would be awful
enough even if his victims had been prickly and unpleasant individuals. But
that he murdered someone like Joyce --- five blocks from where I live
--- makes it especially hard to take. I am too sad to have anything
constructive to say, and too angry at living
in a running morbid joke to remember her the way she deserves.
Posted at November 03, 2018 14:25 | permanent link
November 01, 2018
Data over Space and Time, Lecture 17: Simulation
Lecture 16 was canceled.
(.Rmd)
Data over Space and Time
Posted at November 01, 2018 13:00 | permanent link