At an intersection of Enigmas of Chance and Corrupting the Young.
Self-Evaluation and Lessons Learned
Class announcement. Lectures with no links haven't been delivered yet, and the order and topics may change.
Posted by crshalizi at September 15, 2014 22:38 | permanent link
In which we practice working with data frames, grapple with some of the subtleties of R's system of data types, and think about how to make sequences.
(Hidden agendas: data cleaning; practice using R Markdown; practice reading R help files)
Assignment, due at 11:59 pm on Thursday, 4 September 2014
Posted by crshalizi at August 29, 2014 11:30 | permanent link
In which we play around with basic data structures and convince ourselves that the laws of probability are, in fact, right. (Or perhaps that R's random number generator is pretty good.) Also, we learn to use R Markdown.
— Getting everyone randomly matched for pair programming with a deck of cards worked pretty well. It would have worked better if the university's IT office hadn't broken R on the lab computers.
Lab (and its R Markdown source)
Posted by crshalizi at August 29, 2014 10:30 | permanent link
Matrices as a special type of array; functions for matrix arithmetic and algebra: multiplication, transpose, determinant, inversion, solving linear systems. Using names to make calculations clearer and safer: resource-allocation mini-example. Lists for combining multiple types of values; accessing sub-lists and individual elements; ways of adding and removing parts of lists. Lists as key-value pairs. Data frames: the data structure for classic tabular data, one column per variable, one row per unit; data frames as hybrids of matrices and lists. Structures of structures: using lists recursively to create complicated objects; example with eigen.
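A minimal R sketch of the structures the lecture covers (the variable names here are my own toy examples, not the lecture's resource-allocation data):

```r
# Matrices: arithmetic and algebra
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # 2x2 matrix, filled column-by-column
b <- c(5, 10)
solve(A, b)              # solves the linear system A x = b
t(A); det(A); solve(A)   # transpose, determinant, inverse

# Lists: bundling multiple types of values, as key-value pairs
person <- list(name = "Ada", scores = c(90, 85))
person$scores[2]         # access an individual element of a sub-list
person$scores <- NULL    # remove a component

# Data frames: one column per variable, one row per unit
d <- data.frame(x = 1:3, y = c(2.1, 3.9, 6.2))
d$y[d$x > 1]             # column access like a list, subsetting like a matrix
```

Note that `eigen()` itself returns a list (with components `values` and `vectors`), which is the lecture's "structures of structures" example.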
Posted by crshalizi at August 27, 2014 10:30 | permanent link
Introduction to the course: statistical programming for autonomy, honesty, and clarity of thought. The functional programming idea: write code by building functions to transform input data into desired outputs. Basic data types: Booleans, integers, characters, floating-point numbers. Operators as basic functions. Variables and names. Related pieces of data are bundled into larger objects called data structures. Most basic data structures: vectors. Some vector manipulations. Functions of vectors. Naming of vectors. Our first regression. Subtleties of floating point numbers and of integers.
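A few lines of R illustrating the vector manipulations and the floating-point subtlety mentioned above (my own illustrative values):

```r
x <- c(0.1, 0.2, 0.3)                 # a numeric (floating-point) vector
x[1] + x[2] == x[3]                   # FALSE: floating-point round-off
isTRUE(all.equal(x[1] + x[2], x[3]))  # TRUE: comparison with a tolerance
names(x) <- c("a", "b", "c")
x["b"]                                # access by name
2 * x + 1                             # vectorized arithmetic, no loop needed
```

The moral of the second line is the usual one: never test floating-point numbers for exact equality.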
Posted by crshalizi at August 25, 2014 11:30 | permanent link
Fourth time is the charm:
Further details can be found at the class website. Teaching materials (lecture slides, homeworks, labs, etc.), will appear both there and here.
— The class is much bigger than in any previous year --- we currently have 50 students enrolled in two back-to-back lecture sections, and another twenty-odd on the waiting list, pending more space for labs. Most of the ideas tossed out in my last self-evaluation are going to be at least tried; I'm particularly excited about pair programming for the labs. Also, I at least am enjoying re-writing the lectures in R Markdown's presentation mode.
Manual trackback: Equitablog
Corrupting the Young; Enigmas of Chance; Introduction to Statistical Computing
Posted by crshalizi at August 25, 2014 10:30 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Commit a Social Science; Minds, Brains, and Neurons; The Beloved Republic; The Dismal Science; Corrupting the Young; The Commonwealth of Letters
Posted by crshalizi at July 31, 2014 23:59 | permanent link
Attention conservation notice: Leaden academic sarcasm about methodology.
The following statement was adopted unanimously by the editorial board of the journal, and reproduced here in full:
We wish to endorse, in its entirety and without reservation, the recent essay "On the Emptiness of Failed Replications" by Jason Mitchell. In Prof. Mitchell's field, scientists attempt to detect subtle patterns of association between faint environmental cues and measured behaviors, or to relate remote proxies for neural activity to differences in stimuli or psychological constructs. We are entirely persuaded by his arguments that the experimental procedures needed in these fields are so delicate and so tacit that failures to replicate published findings must indicate incompetence on the part of the replicators, rather than the original report being due to improper experimental technique or statistical fluctuations. While the specific obstacles to transmitting experimental procedures for social priming or functional magnetic resonance imaging are not the same as those for reading the future from the conformation and coloration of the liver of a sacrificed sheep, goat, or other bovid, we see no reason why Prof. Mitchell's arguments are not at least as applicable to the latter as to the former. Instructions to referees for JEBH will accordingly be modified to enjoin them to treat reports of failures to replicate published findings as "without scientific value", starting immediately. We hope by these means to ensure that the field of haruspicy, and perhaps even all of the mantic sciences, is spared the painful and unprofitable controversies over replication which have so distracted our colleagues in psychology.
Questions about this policy should be directed to the editors; I'm just the messenger here.
Manual trackback: Equitablog; Pete Warden
Posted by crshalizi at July 11, 2014 15:50 | permanent link
Attention conservation notice: I have no taste, and I am about to recommend a lot of books.
Somehow, I've not posted anything about what I've been reading since September. So: have October, November, December, January, February, March, April, May, and June.
Posted by crshalizi at July 06, 2014 17:26 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Philosophy; The Running-Dogs of Reaction; Writing for Antiquity; Pleasures of Detection, Portraits of Crime; Learned Folly
Posted by crshalizi at June 30, 2014 23:59 | permanent link
\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\zprime}{z^{\prime}} \newcommand{\Zprime}{Z^{\prime}} \newcommand{\Eta}{H} \newcommand{\equdist}{\stackrel{d}{=}} \newcommand{\indep}{\mathrel{\perp\llap{\perp}}} \]
Attention conservation notice: 2700+ words, expounding a mathematical paper on statistical learning theory. Largely written months ago, posted now in default of actual content.
For the CMU statistical learning theory reading group, I decided to present this:
The question being grappled with here is how we can learn from one example, really from one realization of a stochastic process. Our usual approach in statistics and machine learning is to assume we have many, independent examples from the same source. It seems very odd to say that if we see a single big, internally-dependent example, we're as much in the dark about the data source and its patterns as if we'd observed a single one-dimensional measurement, but that's really all a lot of our theory can do for us. Since we know that animals and machines often can successfully learn generalizable patterns from single realizations, there needs to be some explanation of how the trick is turned...
This paper is thus relevant to my interests in dependent learning, time series and spatio-temporal data, and networks. I read it when it first came out, but I wasn't at all convinced that I'd really understood it, which was why I volunteered to present it. Even so, I skipped sections 6 and 7, which specialize from pretty general learning theory to certain kinds of graphical models. It's valuable to show that the assumptions of the general theory can be realized, and by a non-trivial class of models at that, but they're not really my bag.
At a very high level, the strategy used to prove a generalization-error bound here is fairly familiar in learning theory. Start by establishing a deviation inequality for a single well-behaved function. Then prove that the functions are "stable", in the sense that small changes to their inputs can't alter their outputs too much. The combination of point-wise deviation bounds and stability then yields concentration bounds which hold uniformly over all functions. The innovations are in how this is all made to work when we see one realization of a dependent process.
The data here is an $n$-dimensional vector of random variables, $Z = (Z_1, Z_2, \ldots, Z_n)$. N.B., $n$ here is NOT the number of samples, but the dimensionality of our one example. (I might have preferred something like $p$ here personally.) We do not assume that the $Z_i$ are independent, Markov, exchangeable, stationary, etc., just that $Z$ obeys some stochastic process or other.
We are interested in functions of the whole of $Z$, $g(Z)$. We're going to assume that they have a "bounded difference" property: that if $z$ and $\zprime$ are two realizations of $Z$, which differ in only a single coordinate, then $|g(z) - g(\zprime)| \leq c/n$ for some $c$ which doesn't depend on which coordinate we perturb.
With this assumption, if the $Z_i$ were IID, the ordinary (McDiarmid) bounded differences inequality would say \[ \Prob{g(Z) - \Expect{g(Z)} \geq \epsilon} \leq \exp{\left\{ -\frac{2n\epsilon^2}{c^2} \right\} } \]
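As a quick sanity check on the constants (my own aside, not from the paper): taking $g$ to be the sample mean of variables bounded in $[0,1]$ recovers Hoeffding's inequality:

```latex
g(z) = \frac{1}{n}\sum_{i=1}^{n}{z_i}, \qquad z_i \in [0,1]
\Rightarrow |g(z) - g(\zprime)| \leq \frac{1}{n} \quad \text{(so } c = 1 \text{)}
\Rightarrow \Prob{g(Z) - \Expect{g(Z)} \geq \epsilon} \leq e^{-2n\epsilon^2}
```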
This sort of deviation inequality is the bread-and-butter of IID learning theory, but now we need to make it work under dependence. This needs a probabilistic assumption: the bounded-difference property already ensures that changing one coordinate alone can't change $g$ too much, but under dependence a change in one coordinate might propagate, forcing changes to many other coordinates, and that propagation is what we now have to control.
The way London et al. quantify this is to use the $\eta$-dependence coefficients introduced by Aryeh "Absolutely Regular" Kontorovich. Specifically, pick some ordering of the $Z_i$ variables. Then the $\eta$-dependence between positions $i$ and $j$ is \[ \eta_{ij} = \sup_{z_{1:i-1}, z_i, \zprime_i}{{\left\|P\left(Z_{j:n}\middle| Z_{1:i-1}= z_{1:i-1}, Z_i = z_i\right) - P\left(Z_{j:n}\middle| Z_{1:i-1}= z_{1:i-1}, Z_i = \zprime_i\right) \right\|}_{TV}} \] I imagine that if you are Aryeh, this is transparent, but the rest of us need to take it apart to see how it works...
Fix $z_{1:i-1}$ for the moment. Then the expression above says how much changing $Z_i$ can matter for what happens from $j$ onwards; we might call it how much influence $Z_i$ has, in the context $z_{1:i-1}$. Taking the supremum over $z_{1:i-1}$ shows how much influence $Z_i$ could have, if we set things up just right.
Now, for book-keeping, set $\theta_{ij} = \eta_{ij}$ if $i < j$, $=1$ if $i=j$, and $0$ if $i > j$. This lets us say that $\sum_{j=1}^{n}{\theta_{ij}}$ is (roughly) how much influence $Z_i$ could exert over the whole future.
Since we have no reason to pick out a particular $Z_i$, we ask how influential the most influential $Z_i$ could get: \[ \|\Theta_n\|_{\infty} = \max_{i\in 1:n}{\sum_{j=1}^{n}{\theta_{ij}}} \] Because this quantity is important and keeps coming up, while the matrix of $\theta$'s doesn't, I will depart from the paper's notation and give it an abbreviated name, $\Eta_n$.
Now we have the tools to assert Theorem 1 of London et al., which is (as they say) essentially Theorem 1.1 of Kontorovich and Ramanan:
Theorem 1: Suppose that $g$ is a real-valued function which has the bounded-differences property with constant $c/n$. Then \[ \Prob{g(Z) - \Expect{g(Z)} \geq \epsilon} \leq \exp{\left\{ -\frac{2n\epsilon^2}{c^2 \Eta_n^2} \right\} } \] That is, the effective sample size is $n/\Eta_n^2$, rather than $n$, because of the dependence between observations. (We have seen similar deflations of the number of effective observations before, when we looked at mixing, and even in the world's simplest ergodic theorem.) I emphasize that we are not assuming any Markov property/conditional independence for the observations, still less that $Z$ breaks up into independent chunks (as in an $m$-dependent sequence). We aren't even assuming a bound or a growth rate for $\Eta_n$. If $\Eta_n = O(1)$, then for each $i$, $\eta_{ij} \rightarrow 0$ as $j \rightarrow \infty$, and we have what Kontorovich and Ramanan call an $\eta$-mixing process. It is not clear whether this is stronger than, say, $\beta$-mixing. (Two nice questions, though tangential here, are whether $\beta$-mixing would be enough, and, if not, whether our estimator of $\beta$-mixing could be adapted to get $\eta_{ij}$ coefficients.)
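To make the deflation concrete (my example, not the paper's): if the dependence coefficients happen to decay geometrically, say $\eta_{ij} \leq \rho^{j-i}$ for some $\rho < 1$, then

```latex
\Eta_n \leq 1 + \sum_{k=1}^{\infty}{\rho^k} = \frac{1}{1-\rho}
\quad\Rightarrow\quad \frac{n}{\Eta_n^2} \geq n(1-\rho)^2
```

so with, e.g., $\rho = 1/2$, we pay a constant factor of four in effective sample size, uniformly in $n$.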
To sum up, if we have just one function $f$ with the bounded-difference property, then we have a deviation inequality: we can bound how far below its mean it should be. Ultimately the functions we're going to be concerned with are the combinations of models with a loss function, so we want to control deviations for not just one model but for a whole model class...
(In fact, at some points in the paper London et al. distinguish between the dimension of the data ($n$) and the dimension of the output vector ($N$). Their core theorems presume $n=N$, but I think one could maintain the distinction, just at some cost in notational complexity.)
Ordinarily, when people make stability arguments in learning theory, they have the stability of algorithms in mind: perturbing (or omitting) one data point should lead to only a small change in the algorithm's output. London et al., in contrast, are interested in the stability of hypotheses: small tweaks to $z$ should lead to only small changes in the vector $f(z)$.
Definition. A vector-valued function $f$ is collectively $\beta$-stable iff, when $z$ and $\zprime$ are off-by-one, then $\| f(z) - f(\zprime) \|_1 \leq \beta$. The function class $\mathcal{F}$ is uniformly collectively $\beta$-stable iff every $f \in \mathcal{F}$ is $\beta$-stable.
Now we need to de-vectorize our functions. (Remember, ultimately we're interested in the loss of models, so it would make sense to average their losses over all the dimensions over which we're making predictions.) For any $f$, set \[ \overline{f}(z) \equiv \frac{1}{n}\sum_{i=1}^{n}{f_i(z)} \]
(In what seems to me a truly unfortunate notational choice, London et al. wrote what I'm calling $\overline{f}(z)$ as $F(z)$, and wrote $\Expect{\overline{f}(Z)}$ as $\overline{F}$. I, and much of the reading-group audience, found this confusing, so I'm trying to streamline.)
Now notice that if $\mathcal{F}$ is uniformly collectively $\beta$-stable, then for any $f$ in $\mathcal{F}$, its sample average $\overline{f}$ must obey the bounded difference property with constant $\beta/n$. So sample averages of collectively stable functions will obey the deviation bound in Theorem 1.
Can we extend this somehow into a concentration inequality, a deviation bound that holds uniformly over $\mathcal{F}$?
Let's look at the worst case deviation: \[ \Phi(z) = \sup_{f \in \mathcal{F}}{\Expect{\overline{f}(Z)} - \overline{f}(z)} \] (Note: Strictly speaking, $\Phi$ is also a function of $\mathcal{F}$ and $n$, but I am suppressing that in the notation. [The authors included the dependence on $\mathcal{F}$.])
To see why controlling $\Phi$ gives us concentration, start with the fact that, by the definition of $\Phi$, \[ \Expect{\overline{f}(Z)} - \overline{f}(Z) \leq \Phi(Z) \] so \[ \Expect{\overline{f}(Z)} \leq \overline{f}(Z) + \Phi(Z) \] not just almost surely but always. If in turn $\Phi(Z) \leq \Expect{\Phi(Z)} + \epsilon$, at least with high probability, then we've got \[ \Expect{\overline{f}(Z)} \leq \overline{f}(Z) + \Expect{\Phi(Z)} + \epsilon \] with the same probability.
There are many ways one could try to show that $\Phi$ obeys a deviation inequality, but the one which suggests itself in this context is that of showing $\Phi$ has bounded differences. Pick any $z, \zprime$ which differ in just one coordinate. Then \begin{eqnarray*} \left|\Phi(z) - \Phi(\zprime)\right| & = & \left| \sup_{f\in\mathcal{F}}{\left\{ \Expect{\overline{f}(Z)} - \overline{f}(z)\right\}} - \sup_{f\in\mathcal{F}}{\left\{ \Expect{\overline{f}(Z)} - \overline{f}(\zprime)\right\}} \right|\\ & \leq & \left| \sup_{f \in \mathcal{F}}{ \Expect{\overline{f}(Z)} - \overline{f}(z) - \Expect{\overline{f}(Z)} + \overline{f}(\zprime)}\right| ~ \text{(supremum over differences is at least difference in suprema)}\\ & = & \left|\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{f_i(\zprime) - f_i(z)}}\right| \\ &\leq& \sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{|f_i(\zprime) - f_i(z)|}} ~ \text{(Jensen's inequality)}\\ & = & \frac{1}{n}\sup_{f\in \mathcal{F}}{\|f(\zprime) - f(z)\|_1} ~ \text{(definition of} \ \| \|_1) \\ & \leq & \frac{\beta}{n} ~ \text{(uniform collective stability)} \end{eqnarray*} Thus Theorem 1 applies to $\Phi$: \[ \Prob{\Expect{\Phi(Z)} - \Phi(Z) \geq \epsilon} \leq \exp{\left\{ -\frac{2n\epsilon^2}{\beta^2 \Eta_n^2} \right\}} \] Set the right-hand side to $\delta$ and solve for $\epsilon$: \[ \epsilon = \beta \Eta_n \sqrt{\frac{\log{1/\delta}}{2n}} \] Then we have, with probability at least $1-\delta$, \[ \Phi(Z) \leq \Expect{\Phi(Z)} + \beta \Eta_n \sqrt{\frac{\log{1/\delta}}{2n}} \] Hence, with the same probability, uniformly over $f \in \mathcal{F}$, \[ \Expect{\overline{f}(Z)} \leq \overline{f}(Z) + \Expect{\Phi(Z)} + \beta \Eta_n \sqrt{\frac{\log{1/\delta}}{2n}} \]
Our next step is to replace the expected supremum of the empirical process, $\Expect{\Phi(Z)}$, with something more tractable and familiar-looking. Really any bound on this could be used, but the authors provide a particularly nice one, in terms of the Rademacher complexity.
Recall how the Rademacher complexity works when we have a class $\mathcal{G}$ of scalar-valued functions $g$ of an IID sequence $X_1, \ldots X_n$: it's \[ \mathcal{R}_n(\mathcal{G}) \equiv \Expect{\sup_{g\in\mathcal{G}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i g(X_i)}}} \] where we introduce the Rademacher random variables $\sigma_i$, which are $\pm 1$ with equal probability, independent of each other and of the $X_i$. Since the Rademacher variables are the binary equivalent of white noise, this measures how well our functions can seem to correlate with noise, and so how well they can seem to match any damn thing.
What the authors do in Definition 2 is adapt the definition of Rademacher complexity to their setting in the simplest possible way: \[ \mathcal{R}_n(\mathcal{F}) \equiv \Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i f_i(Z)}}} \] In the IID version of Rademacher complexity, each summand involves applying the same function ($g$) to a different random variable ($X_i$). Here, in contrast, each summand applies a different function ($f_i$) to the same random vector ($Z$). This second form can of course include the first as a special case.
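To spell out the special-case claim (my gloss, not the paper's): if $Z = (X_1, \ldots, X_n)$ with the $X_i$ IID, and every $f \in \mathcal{F}$ acts coordinate-wise through a single scalar function, $f_i(z) = g(z_i)$ for some $g \in \mathcal{G}$, then the two definitions coincide:

```latex
\Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i f_i(Z)}}}
= \Expect{\sup_{g\in\mathcal{G}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i g(X_i)}}}
= \mathcal{R}_n(\mathcal{G})
```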
Now we would like to relate the Rademacher complexity somehow to the expectation of $\Phi$. Let's take a closer look at the definition there: \[ \Expect{\Phi(Z)} = \Expect{\sup_{f\in\mathcal{F}}{\Expect{\overline{f}(Z)} - \overline{f}(Z)}} \] Let's introduce an independent copy of $Z$, say $\Zprime$, i.e., $Z \equdist \Zprime$, $Z\indep \Zprime$. (These are sometimes called "ghost samples".) Then of course $\Expect{\overline{f}(Z)} = \Expect{\overline{f}(\Zprime)}$, so \begin{eqnarray} \nonumber \Expect{\Phi(Z)} & = & \Expect{\sup_{f\in\mathcal{F}}{\Expect{\overline{f}(\Zprime)} - \overline{f}(Z)}} \\ \nonumber & \leq & \Expect{\sup_{f\in\mathcal{F}}{\overline{f}(\Zprime) - \overline{f}(Z)}} ~ \text{(Jensen's inequality again)}\\ \nonumber & = & \Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{f_i(\Zprime) - f_i(Z)}}}\\ & = & \Expect{\Expect{ \sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{f_i(\Zprime) - f_i(Z)}} \middle| \sigma}} ~ \text{(law of total expectation)} \label{eqn:phi-after-symmetrizing} \end{eqnarray} Look at the summands. No matter what $f_i$ might be, $f_i(\Zprime) - f_i(Z) \equdist f_i(Z) - f_i(\Zprime)$, because $Z$ and $\Zprime$ have the same distribution but are independent. Since multiplying something by $\sigma_i$ randomly flips its sign, this suggests we should be able to introduce $\sigma_i$ terms without changing anything. This is true, but it needs a bit of trickery, because of the (possible) dependence between the different summands. Following the authors, but simplifying the notation a bit, define \[ T_i = \left\{ \begin{array}{cc} Z & \sigma_i = +1\\ \Zprime & \sigma_i = -1 \end{array} \right. ~ , ~ T^{\prime}_i = \left\{ \begin{array}{cc} \Zprime & \sigma_i = +1 \\ Z & \sigma_i = -1 \end{array}\right. 
\] Now notice that if $\sigma_i = +1$, then \[ f_i(\Zprime) - f_i(Z) = \sigma_i(f_i(\Zprime) - f_i(Z)) = \sigma_i(f_i(T^{\prime}_i) - f_i(T_i)) \] On the other hand, if $\sigma_i = -1$, then \[ f_i(\Zprime) - f_i(Z) = \sigma_i(f_i(Z) - f_i(\Zprime)) = \sigma_i(f_i(T^{\prime}_i) - f_i(T_i)) \] Since $\sigma_i$ is either $+1$ or $-1$, we have \begin{equation} f_i(\Zprime) - f_i(Z) = \sigma_i(f_i(T^{\prime}_i) - f_i(T_i)) \label{eqn:symmetric-difference-in-terms-of-rad-vars} \end{equation} Substituting \eqref{eqn:symmetric-difference-in-terms-of-rad-vars} into \eqref{eqn:phi-after-symmetrizing} \begin{eqnarray*} \Expect{\Phi(Z)} & \leq & \Expect{\Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{f_i(\Zprime) - f_i(Z)}} \middle| \sigma}} \\ & = & \Expect{\Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i (f_i(T^{\prime}_i) - f_i(T_i))}} \middle | \sigma}} \\ & = & \Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i(f_i(\Zprime) - f_i(Z))}}}\\ & \leq & \Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i f_i(\Zprime)}} + \sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i f_i(Z)}}}\\ & = & 2\Expect{\sup_{f\in\mathcal{F}}{\frac{1}{n}\sum_{i=1}^{n}{\sigma_i f_i(Z)}}}\\ & = & 2\mathcal{R}_n(\mathcal{F}) \end{eqnarray*}
This is, I think, a very nice way to show that Rademacher complexity still controls over-fitting with dependent data. (This result in fact subsumes our result in arxiv:1106.0730, and London et al. have, I think, a more elegant proof.)
Now we put everything together.
Suppose that $\mathcal{F}$ is uniformly collectively $\beta$-stable. Then with probability at least $1-\delta$, uniformly over $f \in \mathcal{F}$, \[ \Expect{\overline{f}(Z)} \leq \overline{f}(Z) + 2\mathcal{R}_n(\mathcal{F}) + \beta \Eta_n \sqrt{\frac{\log{1/\delta}}{2n}} \] This is not quite Theorem 2 of London et al., because they go through some additional steps to relate the collective stability of predictions to the collective stability of loss functions, but at this point I think the message is clear.
That message, as promised in the abstract, has three parts. The three conditions which are jointly sufficient to allow generalization from a single big, inter-dependent instance are:
I suspect this trio of conditions is not jointly necessary as well, but that's very much a topic for the future. I also have some thoughts about whether, with dependent data, we really want to control $\Expect{\overline{f}(Z)}$, or rather whether the goal shouldn't be something else, but that'll take another post.
Posted by crshalizi at June 22, 2014 10:54 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; Writing for Antiquity; Tales of Our Ancestors; Physics; The Progressive Forces
Posted by crshalizi at May 31, 2014 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Afghanistan and Central Asia; Islam; Philosophy; Writing for Antiquity
Posted by crshalizi at April 30, 2014 23:59 | permanent link
Attention conservation notice: Only of interest if you (1) care about estimating complicated statistical models, and (2) will be in Pittsburgh on Monday.
Much of what I know about graphical models I learned from Prof. Lauritzen's book. His work on sufficient statistics and extremal models, and their connections to symmetry and prediction, has shaped how I think about big chunks of statistics, including stochastic processes and networks. I am really looking forward to this.
(To add some commentary purely of my own: I sometimes encounter the idea that frequentist statistics is somehow completely committed to maximum likelihood, and has nothing to offer when that fails, as it sometimes does [1]. While I can't of course speak for every frequentist statistician, this seems silly. Frequentism is a family of ideas about when probability makes sense, and it leads to some ideas about how to evaluate statistical models and methods, namely, by their error properties. What justifies maximum likelihood estimation, from this perspective, is not the intrinsic inalienable rightness of taking that function and making it big. Rather, it's that in many situations maximum likelihood converges to the right answer (consistency), and in a somewhat narrower range will converge as fast as anything else (efficiency). When those fail, so much the worse for maximum likelihood; use something else that is consistent. In situations where maximizing the likelihood has nice mathematical properties but is computationally intractable, so much the worse for maximum likelihood; use something else that's consistent and tractable. Estimation by minimizing a well-behaved objective function has many nice features, so when we give up on likelihood it's reasonable to try minimizing some other proper scoring function, but again, there's nothing which says we must.)
[1]: It's not worth my time today to link to particular examples; I'll just say that from my own reading and conversation, this opinion is not totally confined to the kind of website which proves that rule 34 applies even to Bayes's theorem. ^
Posted by crshalizi at April 01, 2014 10:45 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; The Beloved Republic; Commit a Social Science; The Dismal Science; Linkage
Posted by crshalizi at March 31, 2014 23:59 | permanent link
Lo these many years ago, I blogged about how a paper of John Norton's had led me to have doubts about Landauer's Principle. Prof. Norton has continued to work on this topic, and I am very happy to share the news about his upcoming talk at CMU's "Energy and Information" seminar:
(For the record, I remain of at least two minds about Landauer's principle. The positive arguments for it seem either special cases or circular, but the conclusion makes so much sense...)
Manual trackback / update, 1 April 2014: Eric Drexler's Metamodern, who objects that "the stages of computation themselves need not be in equilibrium with one another, and hence subject to back-and-forth fluctuations" (his italics). In particular, Drexler suggests introducing an external time-varying potential that "can carry a system deterministically through a series of stages while the system remains at nearly perfect thermodynamic equilibrium at each stage". But I think this means that the whole set-up is not in equilibrium, and in fact this proposal seems quite compatible with sec. 2.2 of Norton's "No-Go" paper. Norton agrees that "there is no obstacle to introducing a slight disequilibrium in a macroscopic system in order to nudge a thermodynamically reversible process to completion"; his claim is that the magnitude of the required disequilibria, measured in terms of free energy, is large compared to Landauer's bound. The point is not that it's impossible to build molecular-scale computers (which would be absurd), but that they will have to dissipate much more heat than Landauer suggests. I won't pretend this settles the matter, but I do have a lecture to prepare...
Posted by crshalizi at March 15, 2014 11:25 | permanent link
Attention conservation notice: Late notice of an academic talk in Pittsburgh. Only of interest if you care about the places where the kind of statistical theory that leans on concepts like "the graphical Markov property" merges with the kind of analytical metaphysics which tries to count the number of possibly fat men not currently standing in my doorway.
A great division in the field of causal inference in statistics is between those who like to think of everything in terms of "potential outcomes", and those who like to think of everything in terms of graphical models. More exactly, while partisans of potential outcomes tend to denigrate graphical models (*), those of us who like the latter tend to presume that potential outcomes can be read off from graphs, and hope someone will get around to showing some sort of formal equivalence.
That somebody appears to have arrived.
As always, the talk is free and open to the public, whether the public follows their arrows or not.
*: I myself have heard Donald Rubin assert that graphical models cannot handle counterfactuals, or non-additive interactions between variables (particularly that they cannot handle non-additive treatments), and that their study leads to neglecting analysis-of-design questions. (This was during his talk at the CMU workshop "Statistical and Machine Learning Approaches to Network Experimentation", 22 April 2013.) This does not diminish Rubin's massive contributions to statistics in general, and to causal inference in particular, but does not exactly indicate a thorough knowledge of a literature which goes rather beyond "playing with arrows".
Posted by crshalizi at March 04, 2014 17:50 | permanent link
Attention conservation notice: I have no taste. To exemplify this, the theme for the month was finally getting a tablet, and so indulging in a taste for not very sophisticated comic books.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Cthulhiana; Writing for Antiquity; The Commonwealth of Letters; Physics; The Eternal Silence of These Infinite Spaces
Posted by crshalizi at February 28, 2014 23:59 | permanent link
Attention conservation notice: Navel-gazing by a middle-aged academic.
I got tenure a few weeks ago. (Technically it takes effect in July.) The feedback from the department and university which accompanied the decision was gratifyingly positive, and I'm pleased that blogging didn't hurt me at all, and perhaps even helped. I got here with the help of a lot of mentors, colleagues, and friends (not a few of whom I met through this blog), and I feel some vindication on their behalf. For myself, I feel — relieved, even pleased.
Relieved and pleased, but not triumphant. I benefited from a huge number of lucky breaks. I know too many people who would be at least as good in this sort of job as I am, and would like such a job, but instead have ones which are far less good for them. If a few job applications, grant decisions, or choices about what to work on when had turned out a bit different, I could have been just as knowledgeable, had ideas just as good, worked on them as obsessively, etc., and still been in their positions, at best. Since my tenure decision came through, I've had two papers and a grant proposal rejected, and another paper idea I've been working on for more than a year scooped. A month like that at the wrong point earlier on might well have sunk my academic career. You don't get tenure at a major university without being a productive scholar (not for the most part anyway), but you also don't get it without being crazily lucky, and I can't feel triumphant about luck.
It's also hard for me to feel triumph because, by the time I get tenure, I will have been at CMU for nine years and change. Doing anything for that long marks you, or at least it marks me, and I'm not sure I like the marks. The point of tenure is security, and I hope to broaden my work, to follow some interests which are more speculative and risky and seem like they will take longer to pay off, if they ever do. But I have acquired habits and made commitments which will be very hard to shift. One of those habits is to think of my future in terms of what sort of scholarly work I'm going to be doing, and presuming that I will be working all the time, with only weak separation between work and the rest of life. I even have some fear that this has deformed my character, making some ordinary kinds of happiness insanely difficult. But maybe "deformed" is the wrong word; maybe I stuck with this job because I was already that kind of person. I can't bring myself to wish I wasn't so academic in my interests, or that I hadn't pursued the career I have, or that I had been less lucky in it. But I worry about what I have given up for it, and how those choices will look in another nine years, or twenty-nine.
Sometime in the future, I may write about what I think about tenure as an institution. But today is a beautiful winter's day here in Pittsburgh, cold but clear, with the sky a brilliant pale blue right now. It's my 40th birthday. I'm going outside to take a walk, and then probably going back to work.
Posted by crshalizi at February 28, 2014 15:52 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Writing for Antiquity; The Great Transformation; The Collective Use and Evolution of Concepts; The Continuing Crises; The Beloved Republic
Posted by crshalizi at January 31, 2014 23:59 | permanent link
This was not one of my better performances as a teacher.
I felt disorganized and unmotivated, which is a bit perverse, since it's the third time I've taught the class, and I know the material very well by now. The labs were too long, and my attempts to shove the excess parts of the labs into revised homework assignments did not go over very well. The final projects were decent, but on average not as good as in the previous two years.
I have two ideas about what went wrong. One is of course about kids these days (i.e., blaming the victims), and the other is about my own defects of character.
First, in retrospect, previous iterations of the course benefited from the fact that there hadn't been an undergraduate course here in statistical computing. This meant there was a large pool of advanced statistics majors who wanted to take it, but already knew a lot of the background material and skills; the modal student was also more academically mature generally. That supply of over-trained students is now exhausted, and it's not coming back, either: the class is going to become a requirement for the statistics major. (As it should.) So I need to adjust my expectations of what they know and can do on their own downward in a major way. More exactly, if I want them to know how to do something, I have to make sure I teach it to them, and cut other things from the curriculum to make room. This, I signally failed to do.
Second, I think the fact that this was the third time I have taught basically the same content was in fact part of the problem. It made me feel too familiar with everything, and gave me an excuse to put off devising new material until the last moment, which meant I didn't have everything at my fingertips, and frankly I wasn't as excited about it either.
Putting these together suggests that a better idea for next time would be something like the following.
All of this will be a lot of work for me, but that's part of the point. Hopefully, I will make the time to do this, and it will help.
Posted by crshalizi at January 02, 2014 18:01 | permanent link
Attention conservation notice: Navel-gazing.
Paper manuscripts completed: 4
Papers accepted: 3
Papers rejected: 4 (fools! we'll show you all!)
Papers in revise-and-resubmit purgatory: 2
Papers in refereeing limbo: 1
Papers with co-authors waiting for me to revise: 7
Other papers in progress: I won't look in that directory and you can't make me
Grant proposals submitted: 5
Grant proposals funded: 1
Grant proposals rejected: 3 (fools! we'll show you all!)
Grant proposals in refereeing limbo: 2
Grant proposals in progress for next year: 1
Grant proposals refereed: 2
Talks given and conferences attended: 17, in 10 cities
Classes taught: 2 [i, ii]
New classes taught: 0
Summer school classes taught: 1
New summer school classes taught: 0
Pages of new course material written: not that much
Manuscripts refereed: 21
Number of times I was asked to referee my own manuscript: 0
Manuscripts waiting for me to referee: 5
Manuscripts for which I was the responsible associate editor at Annals of Applied Statistics: 4
Book proposals reviewed: 1
Book proposals submitted: 0
Book outlines made and torn up: 3
Book manuscripts completed: 0
Book manuscripts due soon: 1
Students who completed their dissertations: 0
Students who completed their dissertation proposals: 0
Students preparing to propose in the coming year: 4
Letters of recommendation sent: 60+
Dissertations at other universities for which I was an external examiner: 2 (i, ii)
Promotions received: 0
Tenure packets submitted: 1
Days until final decision on tenure: < 30
Book reviews published on dead trees: 0
Weblog posts: 93
Substantive posts: 17, counting algal growths
Incomplete posts in the drafts folder: 39
Incomplete posts transferred to the papers-in-progress folder: 1
Books acquired: 260
Books begun: 104
Books finished: 76
Books given up: 3
Books sold: 28
Books donated: 0
Major life transitions: 1
Posted by crshalizi at January 01, 2014 00:01 | permanent link