December 31, 2010

A Defense of Lotteries

Jordan Ellenberg at Quomodocumque links to an old article he wrote about the expected value of lottery tickets. Despite the fact that the article is in Slate, it is free of knee-jerk contrarianism, and this so disturbs the fundamental order of the universe that I feel like I have to supply some of my own. I claim, therefore, that playing the lottery can be quite rational in cost-benefit terms, even if the expected monetary value of the ticket is negative, and one is risk averse. (And what God died and left expected utility in charge?)

The benefit to playing the lottery comes entirely between buying the ticket, and when the winner is revealed. During this interval, someone who has bought the ticket can entertain the idea that they might win, and pleasantly imagine how much better their life could be with the money, what they would do with it, etc. It's true that in some sense you could always think about "what if I had $280 million?", but many people find it very hard to get their imaginations going on sheer will-power. A plausible and concrete path to the riches, no matter how low the probability, serves as a hook on which to suspend disbelief. In this regard, indeed, lottery tickets are arguably quite cost-effective. If a $1 lottery ticket licenses even one hour of imagining a different life, I don't see how people who spend $12 for two or three hours of such imagining at a movie theater, or $25 for ten hours at a bookstore, are in any position to talk.

Despite having held this idea for years, I have never played the lottery, because I couldn't begin to make myself believe.

Modest Proposals

Posted by crshalizi at December 31, 2010 19:43 | permanent link

December 24, 2010

Hogswatchnight Gifts

As a reward for the fortitude everyone showed during the Dark Gods' simultaneous assault on the Sun and the Moon, I bring the traditional Hogswatchnight sausage products, er, book reviews: of Karl Sigmund's The Calculus of Selfishness; of Kurt Jacobs's Stochastic Processes for Physicists: Understanding Noisy Systems; and of Josiah Ober's Democracy and Knowledge: Innovation and Learning in Classical Athens. Now what did you, er, the Hogfather get me this year?

The Collective Use and Evolution of Concepts; Biology; Mathematics; Complexity; Physics; The Dismal Science; Commit a Social Science; Writing for Antiquity; Enigmas of Chance

Posted by crshalizi at December 24, 2010 01:08 | permanent link

December 18, 2010

The 2011 SFI Complex Systems Summer School (Dept. of Signal Amplification)

A public service announcement: the Santa Fe Institute's annual complex systems summer school is taking applications online, through January 7th. The summer schools are one of the best things SFI does, a very intense intellectual experience in a beautiful setting with a lot of scary-bright peers and outstanding instructors (though occasionally they let me rant too*). If you are a graduate student or post-doc and read this weblog, the odds are very good that you would be interested in the summer school; so apply, why don't you?

*: Speaking of which, I don't think I ever linked to last year's lecture slides [1, 2, 3].

Complexity; Incestuous Amplification

Posted by crshalizi at December 18, 2010 21:10 | permanent link

December 09, 2010

"These seeming anchorages are themselves but floating seaweed"

Attention conservation notice: Yet more ramblings about social thought and science fiction, including a long quotation from a work of philosophy written half a century ago, and some notions I already aired years ago, when writing about Lem and Sterling.

People on the Web are interested in the Singularity: who knew? Further to the question, and especially to Henry Farrell's post (on Patrick Nielsen Hayden's post on my post...), a long passage from Ernest Gellner's Thought and Change (University of Chicago Press, 1965). This comes as Gellner — a committed liberal — is discussing the "undemonstrability of our liberal values".

The ethical models which do happen to be relevant in our time and are applicable, partly at least, it is argued, include the Rail and the "Hypothetical Imperative" (target) type: the view that certain things are good because inevitable, and some others are such because they are means to things whose desirability cannot in practice be doubted (i.e. wealth, health and longevity rather than their opposites).

Their application gives us as corollaries the two crucial and central values of our time — the attainment of affluence and the satisfaction of nationalism. "Affluence" means, in effect, a kind of consummation of industrial production and application of science to life, the adequate and general provision of the means of a life free from poverty and disease; "nationalism" requires, in effect, the attainment of a degree of cultural homogeneity within each political unit, sufficient to give the members of the unit a sense of participation. [CRS: I omit here a footnote in which Gellner refers to a later chapter on his theory of nationalism.]

The two logical paths along which one can reach these two conclusions, which in any case are hardly of staggering originality, are not independent of each other: nor are these two modes of reason very distinct from the general schema of the argument of this book — the attempt to see what conclusions are loosely implicit — nothing, alas, is implicit rigorously — in a lucid estimate of our situation.

These two data or limitations on further argument do not by any means uniquely determine either the final form of industrial society, or the path of its attainment. This is perhaps fortunate: it is gratifying to feel that doors remain open. The difficulties stem perhaps from the fact that they are too open, philosophically. If we look back at the logical devices men have employed to provide anchorages for their values, we see how very unsatisfactory they are: these seeming anchorages are themselves but floating seaweed. The notion of arguing from given desires and satisfactions (as in Utilitarianism), from the true nature of man (as in the diverse variants of the Hidden Prince doctrine, such as Platonism), from a global entelechy, or the notion of harmony, etc., are all pretty useless, generally because they assume something to be fixed which is in fact manipulable. But our needs are not fixed (except for certain minimal ones which are becoming so easy to satisfy as to pose no long-term problem). We have no fixed essence. The supposed anchorage turns out to be as free-floating as the ship itself, and usually attached to it. They provide no fixed point. That which is itself a variable cannot help to fix the value of the others.

This point has been obscured in the past by the fact that, though it was true in theory, it was not so in practice: human needs, the image of man, etc., were not changing very rapidly, still less were they under human control.

This openness, combined with a striking shortage of given premisses for fixing its specific content, gives some of us at any rate a sense of vertigo when contemplating the future of mankind. This vertigo seems to me somehow both justified and salutary. Once it was much in evidence: in a thinker such as H. G. Wells, futuristic fantasies were at the same time exercises in political theory. Today, science fiction is a routinised genre, and political theory is carefully complacent and dull — and certainly not, whatever else might be claimed for it, vertiginous. But whilst it is as well to be free of messianic hopes, it is good to retain some sense of vertigo.

This openness itself provides the clue to one additional, over-riding value — liberty, the preservation of open possibilities, both for individuals and collectively.

There is a lot that could be said about this — the difference between this sort of anti-foundationalism and, say, Rorty's; Gellner's confidence that satisfying minimal human needs is, or soon will be, a solvable problem is something I actually agree with, but oh does it ever sound complacent today; and so for that matter does his first person plural — but I want to focus on the next to last bit, about science fiction. It was pretty plain by, oh, 1848 at the latest that the kind of scientific knowledge we have now, and the technological power that goes with it, radically alters, and even more radically expands, the kinds of societies that are possible, and lets us live our lives in ways profoundly different from those of our ancestors. (For instance, we can have affluence and liberty.) How then should we live? becomes a question of real concern, because we have, in fact, the power to change ourselves, and are steadily accruing more of it.

This, I think, is the question at the heart of science fiction at its best. (This meshes with Jo Walton's apt observation that one of the key aesthetic experiences of reading SF is having a new world unfold in one's mind.) Now it is clear that the vast majority of it is rehashing familiar themes and properties, and transparently projecting the social situation of its authors. I like reading that anyway, even when I can see how it would be generated algorithmically (perhaps just by a finite-state machine). Admittedly, I have no taste, but I actually think there is a lot to be said for this sort of entertainment, which has in any case been going on for quite a while now. (As I may have said before, TV Tropes is the Morphology of the Folk-Tale of our time.) But sometimes, SF can break beyond that, to approach the question What should we make of ourselves? with the imagination, and vertigo, it deserves.

Update, later that afternoon: Coincidentally, Paul McAuley.

Manual trackback: Crooked Timber (making me wonder if I shouldn't also file this under "incestuous amplification"!)

The Great Transformation; Scientifiction and Fantastica

Posted by crshalizi at December 09, 2010 15:34 | permanent link

December 07, 2010

This Week Today! at the Complex Systems Colloquium

Attention conservation notice: Of interest only if you (1) happen to be in Ann Arbor today, and (2) care about causal inference in social networks. Also, this post was meant to go live over the weekend, but I messed up the timing, so it's probably too late for you to make plans.

I'll be speaking today at the Center for the Study of Complex Systems at the University of Michigan:

"Homophily, Contagion, Confounding: Pick Any Three"
Abstract: One person's behavior can often be predicted from that of their neighbors in a social network. This is sometimes explained by homophily, the tendency to form social ties with others because we resemble them. It is also sometimes explained by social contagion or social influence, the tendency to act like someone because they are our neighbor. We show that, generically, these two mechanisms are confounded with each other, and with the causal effect of an individual's attributes on their behavior. Distinguishing them requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular, simple examples show that asymmetries in regression coefficients cannot identify causal effects, and that imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these non-identifiability results. (Joint work with Andrew Thomas)
Place and Time: 340 West Hall, noon to 1:30 pm. (I believe we will be starting on "Michigan time".)
I was a post-doc at CSCS when I started this weblog, and they still graciously continue to host it. I prefer to think of coming back to give a talk as more "local boy makes good", and less "not getting anywhere after all these years".

Manual trackback of a sort: AnnArbor.com

Self-Centered; Networks; Enigmas of Chance; Complexity

Posted by crshalizi at December 07, 2010 09:30 | permanent link

December 02, 2010

Model Complexity and Prediction Error in Macroeconomic Forecasting (or, Statistical Learning Theory to the Rescue!)

Attention conservation notice: 5000+ words, and many equations, about a proposal to improve, not macroeconomic models, but how such models are tested against data. Given the actual level of debate about macroeconomic policy, isn't it Utopian to worry about whether the most advanced models are not being checked against data in the best conceivable way? What follows is at once self-promotional, technical, and meta-methodological; what would be lost if you checked back in a few years, to see if it has borne any fruit?

Some months ago, we — Daniel McDonald, Mark Schervish, and I — applied for one of the initial grants from the Institute for New Economic Thinking, and, to our pleasant surprise, actually got it. INET has now put out a press release about this (and the other awards too, of course), and I've actually started to get some questions about it in e-mail; I am a bit sad that none of these berate me for becoming a tentacle of the Soros conspiracy.

To reinforce this signal, and on the general principle that there's no publicity like self-publicity, I thought I'd post something about the grant. In fact what follows is a lightly-edited version of our initial, stage I proposal, which was intended for maximal comprehensibility, plus some more detailed bits from the stage II document. (Those of you who sense a certain relationship between the grant and Daniel's thesis proposal are, of course, entirely correct.) I am omitting the more technical parts about our actual plans and work in progress, because (i) you don't care; (ii) some of it is actually under review already, at venues which insist on double-blinding; and (iii) I'll post about them when the papers come out. In the meanwhile, please feel free to write with suggestions, comments or questions.

Update, next day: And already I see that I need to be clearer. We are not trying to come up with a new macro forecasting model. We are trying to put the evaluation of macro models on the same rational basis as the evaluation of models for movie recommendations, hand-writing recognition, and search engines.

Proposal: Model Complexity and Prediction Error in Macroeconomic Forecasting

Cosma Shalizi (principal investigator), Mark Schervish (co-PI), Daniel McDonald (junior investigator)

Macroeconomic forecasting is, or ought to be, in a state of confusion. The dominant modeling traditions among academic economists, namely dynamic stochastic general equilibrium (DSGE) and vector autoregression (VAR) models, both spectacularly failed to forecast the financial collapse and recession which began in 2007, or even to make sense of its course after the fact. Economists like Narayana Kocherlakota, James Morley, and Brad DeLong have written about what this failure means for the state of macroeconomic research, and Congress has held hearings in an attempt to reveal the perpetrators. (See especially the testimony by Robert Solow.) Whether existing approaches can be rectified, or whether basically new sorts of models are needed, is a very important question for macroeconomics, and, because of the privileged role of economists in policy making, for the public at large.

Largely unnoticed by economists, over the last three decades statisticians and computer scientists have developed sophisticated methods of model selection and forecast evaluation, under the rubric of statistical learning theory. These methods have revolutionized pattern recognition and artificial intelligence, and the modern industry of data mining would not exist without them. Economists' neglect of this theory is especially unfortunate, since it could be of great help in resolving macroeconomic disputes, and determining the reliability of whatever models emerge for macroeconomic time series. In particular, these methods provide guarantees, holding with high probability, on the accuracy of forecasts produced by models estimated with finite amounts of data. This allows for immediate model comparisons without appealing to asymptotic results or making strong assumptions about the data generating process, in stark contrast to AIC and similar model selection criteria. These results are also provably reliable, unlike the pseudo-cross-validation approach often used in economic forecasting, whereby the model is fit using the initial portion of a data set and evaluated on the remainder. (For illustrations of the last, see, e.g., Athanasopoulos and Vahid, 2008; Faust and Wright, 2009; Christoffel, Coenen, and Warne, 2008; Del Negro, Schorfheide, Smets, and Wouters, 2004; and Smets and Wouters, 2007. This procedure can be heavily biased: the held-out data is used to choose the model class under consideration, the distributions of the test set and the training set may be different, and large deviations from the normal course of events [e.g., the recessions in 1980--82] may be ignored.)

In addition to their utility for model selection, these methods give immediate upper bounds for the worst case prediction error. The results are easy to understand and can be reported to policy makers interested in the quality of the forecasts. We propose to extend proven techniques in statistical learning theory so that they cover the kind of models and data of most interest to macroeconomic forecasting, in particular exploiting the fact that major alternatives can all be put in the form of state-space models.

To properly frame our proposal, we review first the recent history and practice of macroeconomic forecasting, followed by the essentials of statistical learning theory (in more detail, because we believe it will be less familiar). We then describe the proposed work and its macroeconomic applications.

Economic Forecasting

Through the 1970s, macroeconomic forecasting tended to rely on "reduced-form" models, predicting the future of aggregated variables based on their observed statistical relationships with other aggregated variables, perhaps with some lags, and with the enforcement of suitable accounting identities. Versions of these models are still in use today, and have only grown more elaborate with the passage of time; those used by the Federal Reserve Board of Governors contain over 300 equations. Contemporary vector autoregression models (VARs) are in much the same spirit.

The practice of academic macroeconomists, however, switched very rapidly in the late 1970s and early 1980s, in large part driven by the famous "critique" of such models by Lucas (published in 1976). He argued that even if these models managed to get the observable associations right, those associations were the aggregated consequences of individual decision making, which reflected, among other things, expectations about variables policy-makers would change in response to conditions. This, Lucas said, precluded using such models to predict what would happen under different policies.

Kydland and Prescott (1982) began the use of dynamic stochastic general equilibrium (DSGE) models to evade this critique. The new aim was to model the macroeconomy as the outcome of individuals making forward-looking decisions based on their preferences, their available technology, and their expectations about the future. Consumers and producers make decisions based on "deep" behavioral parameters like risk tolerance, the labor-leisure trade-off, and the depreciation rate which are supposedly insensitive to things like government spending or monetary policy. The result is a class of models for macroeconomic time series that relies heavily on theories about supposedly invariant behavioral and institutional mechanisms, rather than observed statistical associations.

DSGE models have themselves been heavily critiqued in the literature for ignoring many fundamental economic and social phenomena --- we find the objections to the representative agent assumption particularly compelling --- but we want to focus our efforts on a more fundamental aspect of these arguments. The original DSGE model of Kydland and Prescott had a highly stylized economy in which the only source of uncertainty was the behavior of productivity or technology, whose log followed an AR(1) process with known-to-the-agent coefficients. Much of the subsequent work in the DSGE tradition has been about expanding these models to include more sources of uncertainty and more plausible behavioral and economic mechanisms. In other words, economists have tried to improve their models by making them more complex.

Remarkably, there is little evidence that the increasing complexity of these models actually improves their ability to predict the economy. (Their performance over the last few years would seem to argue to the contrary.) For that matter, much the same sort of questions arise about VAR models, the leading alternatives to DSGEs. Despite the elaborate back-story about optimization, the form in which a DSGE is confronted with the data is a "state-space model," in which a latent (multivariate) Markov process evolves homogeneously in time, and observations are noisy functions of the state variables. VARs also have this form, as do dynamic factor models, and all the other leading macroeconomic time series models we know of. In every case, the response to perceived inadequacies of the models is to make them more complex.

The cases for and against different macroeconomic forecasting models are partly about economic theory, but also involve their ability to fit the data. Abstractly, these arguments have the form "It would be very unlikely that my model could fit the data well if it got the structure of the economy wrong; but my model does fit well; therefore I have good evidence that it is pretty much right." Assessing such arguments depends crucially on knowing how well bad models can fit limited amounts of data, which is where we feel we can make a contribution to this research.

Statistical Learning Theory

Statistical learning theory grows out of advances in non-parametric statistical estimation and in machine learning. Its goal is to control the risk or generalization error of predictive models, i.e., their expected inaccuracy on new data from the same source as that used to fit the model. That is, if the model f predicts outcomes Y from inputs X and the loss function is $ \ell $ (e.g., mean-squared error or negative log-likelihood), the risk of the model is

\[ 
R(f) = \mathbb{E}[\ell(Y,f(X))] ~ . 
 \]
The in-sample or training error, $ \widehat{R}(f) $ , is the average loss over the actual training points. Because the true risk is an expectation value, we can say that
\[ 
\widehat{R}(f) = R(f) + \gamma_n(f) ~ , 
 \]
where $ \gamma_n(f) $ is a mean-zero noise variable that reflects how far the training sample departs from being perfectly representative of the data-generating distribution. By the laws of large numbers, for each fixed f, $ \gamma_n(f) \rightarrow 0 $ as $ n\rightarrow\infty $ , so, with enough data, we have a good idea of how well any given model will generalize to new data.

However, economists, like other scientists, never have a single model with no adjustable parameters fixed for them in advance by theory. (Not even the most enthusiastic calibrators claim as much.) Rather, there is a class of plausible models $ \mathcal{F} $ , one of which in particular is picked out by minimizing the in-sample loss --- by least squares, or maximum likelihood, or maximum a posteriori probability, etc. This means

\[ 
\widehat{f} = \argmin_{f \in \mathcal{F}}{\widehat{R}(f)} = \argmin_{f 
  \in \mathcal{F}}{\left(R(f) + \gamma_n(f)\right)} ~ . 
 \]
Tuning model parameters so they fit the training data well thus conflates fitting future data well (low R(f), the true risk) with exploiting the accidents and noise of the training data (large negative $ \gamma_n(f) $ , finite-sample noise). The true risk of $ \widehat{f} $ will generally be bigger than its in-sample risk, $ \widehat{R}(\widehat{f}) $ , precisely because we picked it to match the data well. In doing so, $ \widehat{f} $ ends up reproducing some of the noise in the data and therefore will not generalize well. The difference between the true and apparent risk depends on the magnitude of the sampling fluctuations:
\[ 
R(\widehat{f}) - \widehat{R}(\widehat{f}) \leq 
\max_{f\in\mathcal{F}}|\gamma_n(f)| = \Gamma_n(\mathcal{F}) ~ . 
 \]
The main goal of statistical learning theory is to mathematically control $ \Gamma_n(\mathcal{F}) $ by finding tight bounds on it while making minimal assumptions about the unknown data-generating process; to provide bounds on over-fitting.
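To get a feel for the size of this gap, here is a small Python simulation (my own illustration, not part of the proposal): polynomials of increasing degree are fit by least squares to noisy data, and the in-sample error is compared with the error on a large fresh sample from the same process. The widening difference is an empirical stand-in for $ \Gamma_n(\mathcal{F}) $ ; the data-generating process and the degrees are arbitrary choices.

```python
# Sketch: in-sample vs. out-of-sample error as model flexibility grows.
# Purely illustrative; the signal, noise level, and degrees are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x_train = rng.uniform(-1, 1, n)
x_test = rng.uniform(-1, 1, 10_000)
truth = lambda x: np.sin(3 * x)                      # "true" regression function
y_train = truth(x_train) + rng.normal(0, 0.5, n)
y_test = truth(x_test) + rng.normal(0, 0.5, x_test.size)

for degree in (1, 3, 6, 12):
    coef = np.polyfit(x_train, y_train, degree)      # least-squares fit
    in_sample = np.mean((y_train - np.polyval(coef, x_train)) ** 2)
    out_sample = np.mean((y_test - np.polyval(coef, x_test)) ** 2)
    print(f"degree {degree:2d}: training MSE {in_sample:.3f}, "
          f"test MSE {out_sample:.3f}, gap {out_sample - in_sample:.3f}")
```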

Using more flexible models (allowing more general functional forms or distributions, adding parameters, etc.) has two contrasting effects. On the one hand, it improves the best possible accuracy, lowering the minimum of the true risk R(f). On the other hand, it also increases the ability to, as it were, memorize noise, raising $ \Gamma_n(\mathcal{F}) $ for any fixed sample size n. This qualitative observation --- a generalization of the bias-variance trade-off from basic estimation theory --- can be made usefully precise by quantifying the complexity of model classes. A typical result is a confidence bound on $ \Gamma_n $ (and hence on the over-fitting), say that with probability at least $ 1-\eta $ ,

\[ 
\Gamma_n(\mathcal{F}) \leq \delta(C(\mathcal{F}), n, \eta) ~ , 
 \]
where the function C() measures how complex the model class $ \mathcal{F} $ is.

Several inter-related model complexity measures are now available. The oldest, called "Vapnik-Chervonenkis dimension," effectively counts how many different data sets $ \mathcal{F} $ can fit well by tuning the parameters in the model. Another, "Rademacher complexity," directly measures the ability of $ \mathcal{F} $ to correlate with finite amounts of white noise (Bartlett and Mendelson, 2002; Mohri and Rostamizadeh, 2009). This leads to particularly nice bounds of the form

\[ 
\Gamma_n \leq C_n(\mathcal{F}) + \sqrt{k_1\frac{\log{k_2/\eta}}{n}} ~ , 
 \]
where $ k_1 $ and $ k_2 $ are calculable constants. Yet another measure of model complexity is the stability of parameter estimates with respect to perturbations of the data, i.e., how much $ \widehat{f} $ changes when small changes are made to the training data (Bousquet and Elisseeff, 2002; Mohri and Rostamizadeh, 2010). (Stable parameter estimates do not require models which are themselves dynamically stable, and the idea could be used on systems which have sensitive dependence on initial conditions.) The different notions of complexity lead to bounds of different forms, and lend themselves more or less easily to calculation for different sorts of models; VC dimension tends to be the most generally applicable, but also the hardest to calculate, and to give the most conservative bounds. Importantly, model complexity, in this sense, is not just the number of adjustable parameters; there are models with a small number of parameters which are basically inestimable because they are so unstable, and conversely, one of the great success stories of statistical learning theory has been devising models ("support vector machines") with huge numbers of parameters but low and known capacity to over-fit.

However we measure model complexity, once we have done so and have established risk bounds, we can use those bounds for two purposes. One is to give a sound assessment of how well our model will work in the future; this has clear importance if the model's forecasts will be used to guide individual actions or public policy. The other aim, perhaps even more important here, is to select among competing models in a provably reliable way. Comparing in-sample performance tends to pick complex models which over-fit. Adding heuristic penalties based on the number of parameters, like the Akaike information criterion (AIC), also does not solve the problem, basically because AIC corrects for the average size of over-fitting but ignores the variance (and higher moments). But if we could instead use $ \Gamma_n(\mathcal{F}) $ as our penalty, we would select the model which actually will generalize better. If we only have a confidence limit on $ \Gamma $ and use that as our penalty, we select the better model with high confidence and can in many cases calculate the extra risk that comes from model selection (Massart, 2007).

Our Proposal

Statistical learning theory has proven itself in many practical applications, but most of its techniques have been developed in ways which keep us from applying it immediately to macroeconomic forecasting; we propose to rectify this deficiency. We anticipate that each of the three stages will require approximately a year. (More technical details follow below.)

First, we need to know the complexity of the model classes to which we wish to apply the theory. We have already obtained complexity bounds for AR(p) models, and are working to extend these results to VAR(p) models. Beyond this, we need to be able to calculate the complexity of general state-space models, where we plan to use the fact that distinct histories of the time series lead to different predictions only to the extent that they lead to different values of the latent state. We will then refine those results to find the complexity of various common DSGE specifications.

Second, most results in statistical learning theory presume that successive data points are independent of one another. This is mathematically convenient, but clearly unsuitable for time series. Recent work has adapted key results to situations where widely-separated data points are asymptotically independent ("weakly dependent" or "mixing" time series) [Meir, 2000; Mohri and Rostamizadeh, 2009, 2010; Dedecker et al., 2007]. Basically, knowing the rate at which dependence decays lets one calculate how many effectively-independent observations the time series has and apply bounds with this reduced, effective sample size. We aim to devise model-free estimates of these mixing rates, using ideas from copulas and from information theory. Combining these mixing-rate estimates with our complexity calculations will immediately give risk bounds for DSGEs, but not just for them.

Third, a conceptually simple and computationally attractive alternative to using learning theory to bound over-fitting is to use an appropriate bootstrap for dependent data to estimate generalization error. However, this technique currently has no theoretical basis, merely intuitive plausibility. We will investigate the conditions under which bootstrapping can yield non-asymptotic guarantees about generalization error.

Taken together, these results can provide probabilistic guarantees on a proposed forecasting model's performance. Such guarantees can give policy makers reliable empirical measures which intuitively explain the accuracy of a forecast. They can also be used to pick among competing forecasting methods.

Why This Grant?

As we said, there has been very little use of modern learning theory in economics (Al-Najjar, 2009 is an interesting, but entirely theoretical, exception), and none that we can find in macroeconomic forecasting. This is an undertaking which requires knowledge both of economics and of economic data, as well as skill in learning theory, stochastic processes, and prediction theory for state-space models. We aim to produce results of practical relevance to forecasting, and present them in such a way that econometricians, at least, can grasp their relevance.

If all we wanted to do was produce yet another DSGE, or even to improve the approximation methods used in DSGE estimation, there would be plenty of funding sources we could turn to, rather than INET. We are not interested in making those sorts of incremental advances (if indeed proposing a new DSGE is an "advance"). We are not even particularly interested in DSGEs. Rather, we want to re-orient how economic forecasters think about basic issues like evaluating their accuracy and comparing their models --- topics which should be central to empirical macroeconomics, even if DSGEs vanished entirely tomorrow. Thus INET seems like a much more natural sponsor than institutions with more of a commitment to existing practices and attitudes in economics.

[Detailed justification of our draft budget omitted]

Detailed exposition

In what follows, we provide a more detailed exposition of the technical content of our proposed work, including preliminary results. This is, unavoidably, rather more mathematical than our description above.

The initial work described here builds mainly on the work of Mohri and Rostamizadeh, 2009, which offers a handy blueprint for establishing data-dependent risk bounds which will be useful for macroeconomic forecasters. (Whether this bound is really optimal is another question we are investigating.) The bound that they propose has the general form

\[ 
R(\widehat{f}) - \widehat{R}(\widehat{f}) \leq C_n(\mathcal{F}) + 
\sqrt{k_1\frac{\log k_2(\eta)}{m(n)}} ~ . 
 \]
The two terms on the right hand side are a model complexity term, $ C_n(\mathcal{F}) $ , in this case what is called the "Rademacher complexity", and a term which depends on the desired confidence level (through $ \eta $ ), and the amount of data n used to choose $ \widehat{f} $ . For many problems, the calculation of the complexity term is well known in the literature. However, this is not the case for state-space models. The final term depends on the mixing behavior of the time series (assumed to be known). In the next sections we highlight some progress we have made toward calculating the model complexity for state-space models and estimating the mixing rate from data. We then apply the bound to a simple example attempting to predict interest rate movements using an AR model. Finally, we discuss the logic behind using the bootstrap to estimate a bound for the risk. All of these results are new and require further research to make them truly useful to economic forecasters.

Model complexity

As mentioned earlier, statistical learning theory provides several ways of measuring the complexity of a class of predictive models. The results we are using here rely on what is known as the Rademacher complexity, which can be thought of as measuring how well the model can (seem to) fit white noise. More specifically, when we have a class $ \mathcal{F} $ of prediction functions f, the Rademacher complexity of the class is

\[ 
C_n(\mathcal{F}) \equiv 2 \mathbf{E}_{X}\left[\mathbf{E}_{Z}\left[ \sup_{f\in 
      \mathcal{F}}{\left|\frac{1}{n} \sum_{i=1}^{n}{Z_i f(X_i) } \right|} 
  \right]\right] ~ , 
 \]
where X is the actual data, and $ Z_i $ are a sequence of random variables, independent of each other and everything else, and equal to +1 or -1 with equal probability. (These are known in the field as "Rademacher random variables". Very similar results exist for other sorts of noise, e.g., Gaussian white noise.) The term inside the supremum, $ \left|\frac{1}{n} \sum_{i=1}^{n}{Z_i f(X_i) } \right| $ , is the sample covariance between the noise Z and the predictions of a particular model f. The Rademacher complexity takes the largest value of this sample covariance over all models in the class, then averages over realizations of the noise. Omitting the final average over the data X gives the "empirical Rademacher complexity", which can be shown to converge very quickly to its expected value as n grows. The final factor of 2 is conventional, to simplify some formulas we will not repeat here.

The idea, stripped of the technicalities required for actual implementation, is to see how well our models could seem to fit outcomes which were actually just noise. This provides a kind of baseline against which to assess the risk of over-fitting, or failing to generalize. As the sample size n grows, the sample correlation coefficients $ \left|\frac{1}{n} \sum_{i=1}^{n}{Z_i f(X_i) } \right| $ will approach 0 for each particular f, by the law of large numbers; the over-all Rademacher complexity should also shrink, though more slowly, unless the model class is so flexible that it can fit absolutely anything, in which case one can conclude nothing about how well it will predict in the future from the fact that it performed well in the past.
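As a purely illustrative example (not taken from the proposal), here is a Python sketch that estimates the empirical Rademacher complexity by Monte Carlo for a toy class of bounded linear predictors, $ \{f(x) = w \cdot x : \|w\|_2 \leq B\} $ , where the supremum in the definition has the closed form $ B \left\| \frac{1}{n}\sum_{i}{Z_i X_i} \right\|_2 $ by the Cauchy-Schwarz inequality; the class, the bound B, and the number of noise draws are all assumptions of the sketch.

```python
# Monte Carlo estimate of the empirical Rademacher complexity of the toy
# class {f(x) = w . x : ||w||_2 <= B}, for which the supremum over f is
# B * || (1/n) sum_i Z_i X_i ||_2 (Cauchy-Schwarz).
import numpy as np

def empirical_rademacher(X, B=1.0, n_draws=2000, seed=None):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sups = np.empty(n_draws)
    for j in range(n_draws):
        Z = rng.choice([-1.0, 1.0], size=n)          # Rademacher noise
        sups[j] = B * np.linalg.norm(Z @ X / n)      # sup_f |(1/n) sum_i Z_i f(X_i)|
    return 2 * np.mean(sups)                         # conventional factor of 2

# The complexity shrinks roughly like 1/sqrt(n) as the sample grows:
rng = np.random.default_rng(42)
for n in (50, 200, 800):
    X = rng.normal(size=(n, 5))
    print(n, round(empirical_rademacher(X, seed=1), 4))
```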

One of our goals is to calculate the Rademacher complexity of stationary state-space models. [Details omitted.]

Mixing rates

Because time-series data are not independent, the number of data points n in a sample S is no longer a good characterization of the amount of information available in that sample. Knowing the past allows forecasters to predict future data points to some degree, so actually observing those future data points gives less information about the underlying data generating process than in the case of iid data. For this reason, the sample size term must be adjusted by the amount of dependence in the data to determine the effective sample size $ m(n) $ which can be much less than the true sample size n. These sorts of arguments can be used to show that a typical data series used for macroeconomic forecasting, detrended growth rates of US GDP from 1947 until 2010, has around n=252 actual data points, but an effective sample size of $ m(n)\approx 12 $ . To determine the effective sample size to use, we must be able to estimate the dependence of a given time series. The necessary notion of dependence is called the mixing rate.

Estimating the mixing rates of time-series data is a problem that has not been well studied in the literature. According to Ron Meir, "as far as we are aware, there is no efficient practical approach known at this stage for estimation of mixing parameters". In this case, we need to be able to estimate a quantity known as the $ \beta $ -mixing rate.

Definition. Let $ X_t $ be a stationary sequence of random variables or stochastic process with joint probability law $ \mathbb{P} $ . For $ -\infty\leq J \leq L\leq\infty $ , let $ \mathcal{F}_J^L =\sigma(X_k,\ J\leq k \leq L) $ , the $ \sigma $ -field generated by the observations between those times. Let $ \mathbb{P}_t $ be the restriction of $ \mathbb{P} $ to $ \mathcal{F}_{-\infty}^t $ with density $ f_t $ , $ \mathbb{P}_{t+m} $ be the restriction of $ \mathbb{P} $ to $ \mathcal{F}_{t+m}^{\infty} $ with density $ f_{t+m} $ , and $ \mathbb{P}_{t \otimes t+m} $ the restriction of $ \mathbb{P} $ to $ \sigma(X_{-\infty}^t,X_{t+m}^\infty) $ with density $ f_{t\otimes t+m} $ . Then the $ \beta $ -mixing coefficient at lag $ m $ is
\[  \beta(m) \equiv {\left\|\mathbb{P}_t \otimes \mathbb{P}_{t+m} - 
\mathbb{P}_{t \otimes t+m}\right\|}_{TV} = \frac{1}{2}\int{\left|f_t f_{t+m} - 
f_{t\otimes t+m}\right|} 
 \]
(Here $ \|\cdot\|_{TV} $ is the total variation distance, i.e., the largest difference between the probabilities that $ \mathbb{P}_{t \otimes 
t+m} $ and $ \mathbb{P}_t \otimes \mathbb{P}_{t+m} $ assign to a single event. Also, to simplify notation, we stated the definition assuming stationarity, but this is not strictly necessary.)

The stochastic process X is called " $ \beta $ -mixing" if $ \beta(m) \rightarrow 0 $ as $ m \rightarrow \infty $ , meaning that the joint probability of events which are widely separated in time increasingly approaches the product of the individual probabilities --- that X is asymptotically independent.

The form of the definition of the $ \beta $ -mixing coefficient suggests a straightforward though perhaps naive procedure: use nonparametric density estimation for the two marginal distributions as well as the joint distribution, and then calculate the total variation distance by numerical integration. This would be simple in principle, and could give good results; however, one would need to show not just that the procedure was consistent, but also learn enough about it that the generalization error bound could be properly adjusted to account for the additional uncertainty introduced by using an estimate rather than the true quantity. Initial numerical experiments on the naive procedure are not promising, but we are pursuing a number of more refined ideas.
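To make the naive procedure concrete, here is a rough Python sketch (our illustration only; it uses single time points as a crude proxy for the full past and future $ \sigma $ -fields, so it is at best a stand-in for the true $ \beta $ -coefficient): kernel density estimates of the lag-m joint distribution and of the marginal, with the total variation distance approximated on a grid.

```python
# Naive plug-in estimate of a lag-m dependence coefficient for a scalar,
# (assumed) stationary series: KDE of the joint density of (X_t, X_{t+m}),
# KDE of the marginal, then (1/2) * integral |joint - product of marginals|,
# approximated on a grid. Illustrative sketch only; its consistency and bias
# are exactly the open questions discussed in the text.
import numpy as np
from scipy.stats import gaussian_kde

def naive_beta_hat(x, m, grid_size=100):
    x = np.asarray(x, dtype=float)
    pairs = np.vstack([x[:-m], x[m:]])               # lagged pairs, shape (2, n-m)
    joint = gaussian_kde(pairs)
    marginal = gaussian_kde(x)
    grid = np.linspace(x.min(), x.max(), grid_size)
    du = grid[1] - grid[0]
    g1, g2 = np.meshgrid(grid, grid)
    f_joint = joint(np.vstack([g1.ravel(), g2.ravel()]))
    f_prod = marginal(g1.ravel()) * marginal(g2.ravel())
    return 0.5 * np.sum(np.abs(f_joint - f_prod)) * du * du

# On a strongly autocorrelated AR(1) series the estimate should decay with m:
rng = np.random.default_rng(1)
x = np.zeros(1000)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + rng.normal()
print([round(naive_beta_hat(x, m), 3) for m in (1, 5, 20, 50)])
```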

Bootstrap

An alternative to calculating bounds on forecasting error in the style of statistical learning theory is to use a carefully constructed bootstrap to learn about the generalization error. A fully nonparametric bootstrap for time series data uses the circular bootstrap reviewed in Lahiri, 2003. The idea is to wrap the data of length n around a circle and randomly sample blocks of length q. There are n possible blocks, each starting with one of the data points 1 to n. Politis and White (2004) give a method for choosing q. The following algorithm is a proposed bootstrap for bounding the generalization error of a forecasting method (a code sketch follows the list).
  1. Take the time series, call it X. Fit a model $ \widehat{g}(X) $ , and calculate the in-sample risk, $ \widehat{R}(\widehat{g}(X)) $ .
  2. Repeat for B times:
    • Bootstrap a new series Y from X, which is several times longer than X. Call the initial segment, which is as long as X, $ Y_1 $ .
    • Fit a model to this, $ \widehat{g}_b(Y_1) $ , and calculate its in-sample risk, $ \widehat{R}(\widehat{g}_b(Y_1)) $ .
    • Calculate the risk of $ \widehat{g}_b(Y_1) $ on the rest of Y. Because the process is stationary and Y is much longer than X, this should be a reasonable estimate of the generalization error of $ \widehat{g}_b(Y_1) $ .
    • Store the difference between the in-sample and generalization risks.
  3. Find the $ 1-\eta $ percentile of the distribution of over-fits. Add this to $ \widehat{R}(\widehat{g}(X)) $ .
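For concreteness, here is a minimal Python sketch of the procedure above, with a least-squares AR(1) forecaster standing in for "fit a model"; the block length q, the factor by which Y is longer than X, and the choice of model are placeholder assumptions, not recommendations, and (as the next paragraph stresses) nothing here is yet backed by theory.

```python
# Sketch of the circular-block-bootstrap bound on generalization error,
# following steps 1-3 above. The AR(1) forecaster, block length, and the
# factor of 5 are placeholder choices.
import numpy as np

def circular_block_bootstrap(x, length, q, rng):
    n = len(x)
    doubled = np.concatenate([x, x])                 # wrap the series around a circle
    starts = rng.integers(0, n, size=length // q + 1)
    return np.concatenate([doubled[s:s + q] for s in starts])[:length]

def fit_ar1(x):
    return np.polyfit(x[:-1], x[1:], 1)              # slope and intercept by least squares

def one_step_mse(coef, x):
    return np.mean((x[1:] - np.polyval(coef, x[:-1])) ** 2)

def bootstrap_risk_bound(x, q=10, B=200, eta=0.05, seed=None):
    rng = np.random.default_rng(seed)
    in_sample = one_step_mse(fit_ar1(x), x)          # step 1
    overfits = np.empty(B)
    for b in range(B):                               # step 2
        y = circular_block_bootstrap(x, 5 * len(x), q, rng)
        y1, rest = y[:len(x)], y[len(x):]
        coef_b = fit_ar1(y1)
        overfits[b] = one_step_mse(coef_b, rest) - one_step_mse(coef_b, y1)
    return in_sample + np.quantile(overfits, 1 - eta)  # step 3

rng = np.random.default_rng(2)
x = np.zeros(300)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.normal()
print(round(bootstrap_risk_bound(x, seed=3), 3))
```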

While this procedure is intuitively plausible, there is as yet no theory which says that the results of this bootstrap will actually control the generalization error. Deriving theoretical results for this type of bootstrap is the third component of our grant application.


Manual trackback: Economics Job Market Rumors [!]

Self-Centered; Enigmas of Chance; The Dismal Science

Posted by crshalizi at December 02, 2010 12:55 | permanent link

November 30, 2010

Books to Read While the Algae Grow in Your Fur, November 2010

Lois McMaster Bujold, Cryoburn
If you need someone to explain why the appearance of a new book in this series is a "Run, go read" event, see Jo Walton.
Sherlock
Easily the best Sherlock Holmes adaptation I've run across in a long time; they do a great job of maintaining the characters and atmosphere, while updating the setting, and telling good stories. I also like that they make Holmes rather a jerk, which no doubt he would be. (Conjecture: this last was inspired by House, but then Dr. House is what you get if you hybridize Holmes and Dr. Watson, and update the IV cocaine habit. [Spoiler for a joke in the show: Fureybpx vf nqqvpgrq gb avpbgvar cngpurf.]) The cliff-hanger at the end of the season is monstrous — and come to think of it, doesn't "cliff-hanger" come from the original episode at the Reichenbach Falls?
Streaming for free from PBS through December 7.
Paul McAuley, The Quiet War
Space opera, confined to the solar system two centuries or so from now: terrestrial dynasties, with Green ideologies, against the democratic, decentralized, genetically-engineered inhabitants of the outer solar system. McAuley maneuvers a fairly large but also well-realized cast of characters (all sympathetically portrayed, even when not very nice) through a complicated and plausible world, or rather, worlds. One of the most striking parts of the novel is how he conveys a vivid sense of weird, stark beauty for the landscapes of the outer solar system. (I can't, obviously, say that he gets them right, though he's clearly tried, but he makes them feel right.) All of this is embedded in the matrix of a complex but fast-moving and engrossing plot, which ends at a natural point, though one open for a sequel (currently on its way to me). It is not so much mind candy as mind confectionery.
Aside, for those into scientifictional intertextuality: The Quiet War is obviously heavily indebted to Bruce Sterling's Shaper/Mechanist stories (now collected in Schismatrix Plus), as well as directly to Sterling's source material in Freeman Dyson. (In somewhat the same way, McAuley's superb Confluence trilogy [1, 2, 3] channels Gene Wolfe's Book of the New Sun, Jack Vance's Dying Earth, and a number of Big Artificial World stories.) But while the setting and themes owe much to the previous work, it's a solid and independently valuable re-fashioning of the material. I find the Shaper/Mechanist stories very compelling but viscerally unpleasant, and sometimes wonder if Sterling set out to illustrate Haldane's principle that "every biological innovation is a perversion". McAuley has more sympathy for his characters, and for his reader's sensibilities, and I can't recall anything in Sterling like McAuley's eerie moonscapes. On the other hand, The Quiet War lacks the sheer blow-your-head-off power and scope of Schismatrix.
Tim Shallice, From Neuropsychology to Mental Structure
A detailed consideration of how much can be learned about the organization of normal human minds from studying the deficits and pathologies of cognition produced by damage to the brain, i.e., from neuropsychology. Chapter 1 is historical; chapter 2 is an initial look at the issue of modularity, of associations between symptoms, and (more importantly) dissociations between them. To keep this from being an entirely abstract affair, chapters 3--8 cover specific syndromes in great detail: forms of short-term memory loss, [acquired] dyslexia, agraphia, other language disorders, and visual processing. The findings here are strange and fascinating, though, when I stop to think about what they meant for the people living through them, very sad.
Chapters 9 through 11 resume the methodological discussion. The central place here is taken by "double dissociations": if patient Alice can read words normally but cannot do arithmetic, and patient Bob, with a different lesion, can't read but can calculate, it is very natural to conclude that there must be different, functionally distinct, neural assemblages required for reading and for calculating --- some amount of anatomically-localized modularity. (If we only had a single dissociation, say Alice's pattern, then perhaps arithmetic is just harder than reading, in that it demands more of the same resources, which are impaired in her case by her lesion.) Shallice carefully considers non-modular architectures and what patterns of deficits they can produce, and (rightly, I think) finds it hard, though not quite impossible, to come up with ones that can produce a classical double dissociation. Shallice is also very scrupulous about noting the assumptions which go into drawing inferences about normal behavior: for example, that Alice cannot, post-lesion, have learned to read to a normal level of performance using parts of her brain that wouldn't ordinarily be involved in the task.
The remaining chapters turn to applying neuropsychological methods to "higher" or "more central" processes, such as visual attention, disorders of "central processing" (like acalculia), intentional movement and planning, memory, and conscious awareness. (They kick off with a very restrained rebuke to some astonishingly ignorant and fatuous remarks by the philosopher Jerry Fodor.) A vast amount has been done on all of these topics since 1988, making me wish there was an updated edition.
I find Shallice's methodological arguments convincing, though they leave me wanting more formalism and abstraction. (It feels like there should be some way of expressing the double-dissociation argument in more statistical terms, but a little toying around doesn't reveal it to me.) While the empirical findings are no doubt somewhat dated now (and I'd be very curious to learn his take on fMRI), I have not found a better exposition and defense of the methods of neuropsychology, or a better explanation of what it can offer cognitive science.
Update, next day: Fred Mailhot, in e-mail, points out some interesting-looking papers [1, 2] on getting double dissociations without modularity, which I've yet to read.
Kurt Jacobs, Stochastic Processes for Physicists: Understanding Noisy Systems
I have mixed feelings, which will be elaborated upon in a review in Physics Today, so I'll hold off until that's out.
Update: And here it is.
Steven Berlin Johnson, Where Good Ideas Come From: The Natural History of Innovation
My remarks grew into a full review: Go to the Reef, Thou Dullard, and Consider Its Ways.
Simon Blackburn, Plato's Republic: A Biography
Not really a biography of the book, but rather an exposition by a good modern analytical philosopher. What Blackburn ends up saying about Plato is actually very close to what Popper wrote in The Open Society and Its Enemies, only without Popper's hostile tone. So I pretty much agree with Blackburn, but wish he'd displayed more hostility.
ObLinkage: Jo Walton on The Republic.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Minds, Brains, and Neurons; The Collective Use and Evolution of Concepts; Enigmas of Chance; Physics; Philosophy

Posted by crshalizi at November 30, 2010 23:59 | permanent link

November 28, 2010

The Singularity in Our Past Light-Cone

Attention conservation notice: Yet another semi-crank pet notion, nursed quietly for many years, now posted in the absence of new thoughts because reading The Half-Made World brought it back to mind.

The Singularity has happened; we call it "the industrial revolution" or "the long nineteenth century". It was over by the close of 1918.

Exponential yet basically unpredictable growth of technology, rendering long-term extrapolation impossible (even when attempted by geniuses)? Check.

Massive, profoundly dis-orienting transformation in the life of humanity, extending to our ecology, mentality and social organization? Check.

Annihilation of the age-old constraints of space and time? Check.

Embrace of the fusion of humanity and machines? Check.

Creation of vast, inhuman distributed systems of information-processing, communication and control, "the coldest of all cold monsters"? Check; we call them "the self-regulating market system" and "modern bureaucracies" (public or private), and they treat men and women, even those whose minds and bodies instantiate them, like straw dogs.

An implacable drive on the part of those networks to expand, to entrain more and more of the world within their own sphere? Check. ("Drive" is the best I can do; words like "agenda" or "purpose" are too anthropomorphic, and fail to acknowledge the radical novelty and strangeness of these assemblages, which are not even intelligent, as we experience intelligence, yet ceaselessly calculating.)

Why, then, since the Singularity is so plainly, even intrusively, visible in our past, does science fiction persist in placing a pale mirage of it in our future? Perhaps: the owl of Minerva flies at dusk; and we are in the late afternoon, fitfully dreaming of the half-glimpsed events of the day, waiting for the stars to come out.

Manual trackback: Gearfuse; Random Walks; Text Patterns; The Daily Dish; The Slack Wire; Making Light (I am not worthy! Also, the Nietzsche quote is perfect); J. S. Bangs; Daily Grail; Crooked Timber; Peter Frase; Blogging the Hugo Winners; The Essence of Mathematics Is Its Freedom; der Augenblick; Monday Evening; The Duck of Minerva (appropriately enough)

The Great Transformation; Scientifiction and Fantastica

Posted by crshalizi at November 28, 2010 11:00 | permanent link

November 16, 2010

The Neutral Model of Inquiry (or, What Is the Scientific Literature, Chopped Liver?)

Attention conservation notice: 900 words of wondering what the scientific literature would look like if it were entirely a product of publication bias. Veils the hard-won discoveries of actual empirical scientists in vague, abstract, hyper-theoretical doubts, without alleging any concrete errors. A pile of skeptical nihilism, best refuted by going back to the lab.

I have been musing about the following scenario for several years now, without ever getting around to doing anything with it. Since it came up in conversation last month between talks in New York, now seems like as good a time as any to get it out of my system.

Imagine an epistemic community that seeks to discover which of a large set of postulated phenomena actually happen. (The example I originally had in mind was specific foods causing or preventing specific diseases, but it really has nothing to do with causality, or observational versus experimental studies.) Let's build a stochastic model of this. At each time step, an investigator will draw a random candidate phenomenon from the pool, and conduct an appropriately-designed study. The investigator will test the hypothesis that the phenomenon exists, and calculate a p-value. Let's suppose that this is all done properly (no dead fish here), so that the p-value is uniformly distributed between 0 and 1 when the hypothesis is false and the phenomenon does not exist. The investigator writes up the report and submits it for publication.

What happens next depends on whether the phenomenon has entered the published literature already or not. If it has, the new p-value is allowed to be published. If it has not, the report is published if, and only if, the p-value is < 0.05. This is the "file-drawer problem": finding a lack of evidence for a phenomenon is publication-worthy only if people thought it existed.

The community combines the published p-values in some fashion — reasonably exact solutions to this problem were devised by R. A. Fisher and Karl Pearson in the 1930s, leading to Neyman's smooth test of goodness of fit, but I have been told by a psychologist that "of course" one should just use the median of the published p-values. Different rules of combination will lead to slightly different forms of this model.

The last assumption of the model is that, sadly, none of the phenomena the community is interested in exist. All of their null hypotheses are, strictly speaking, true. Just as neutral models of evolution are ones which have all sorts of evolutionary mechanisms except selection, this is a model of the scientific process without discovery. Since, by assumption, everyone does their calculations correctly and honestly, if we could look at all the published and unpublished p-values they'd be uniformly distributed between 0 and 1. But the first published p-value for any phenomenon is uniformly distributed between 0 and 0.05. A full 2% of initial announcements will have an impressive-seeming (nominal) significance level of 10^-3.

Of course, when people try to replicate those initial findings, their p-values will be distributed between 0 and 1. The joint distribution of p-values from the initial study and m attempts at replication will be a product of independent uniforms, one on [0, 0.05] and m of them on [0,1]. What follows from this will depend on the exact rule used to aggregate individual studies, and on doing some calculations I have never pushed through, so I will structure it as a series of "exercises for the reader".

  1. Pick your favorite meta-analytic rule for aggregating p-values. (If you do not have a favorite rule, one will be issued to you.) What is the distribution of the aggregate p-value after m replications?
  2. Say that a phenomenon is dropped from the literature when its aggregate p-value climbs above 0.05. Find the probability of being dropped as a function of m.
  3. Say that the lifespan of a phenomenon is the number of replications it receives before being dropped from the literature. (Under any sensible aggregation rule, the probability of being dropped will tend towards 1 as m increases, so lifespans will be finite.) Find the distribution of lifespans.
  4. Let us take any field of inquiry; say, to be diplomatic, haruspicy. Surveying all the published claims of phenomena in its literature, how many replications have they survived? Does this look at all different from the distribution of lifespans under the neutral model? How much nudging of marginal results below the 5% threshold would be needed to account for the discrepancy? (After all, "the difference between 'significant' and 'not significant' is not itself statistically significant".) Does the literature, in other words, provide any evidence that the discipline knows anything at all?
[Figure: p < 1e-3 (two-lobed test)]
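Since the neutral model is fully specified, it is easy to simulate. Here is a rough Python sketch, with my own guesses filled in for the unstated details (Fisher's method as the aggregation rule, and a phenomenon dropped as soon as its aggregate p-value climbs above 0.05); it tracks the lifespans of exercise 3.

```python
# Simulation of the neutral model: every null hypothesis is true, every test
# is honest, and the file drawer does the rest. The aggregation rule
# (Fisher's method) and the drop criterion are illustrative choices.
import numpy as np
from scipy.stats import chi2

def fisher_combine(pvals):
    stat = -2 * np.sum(np.log(pvals))                # Fisher's chi-squared statistic
    return chi2.sf(stat, df=2 * len(pvals))

def lifespan(rng, alpha=0.05, max_reps=10_000):
    """Number of replications a (nonexistent) phenomenon survives in print."""
    pvals = [rng.uniform(0, alpha)]                  # first published p-value
    for m in range(max_reps):
        pvals.append(rng.uniform(0, 1))              # an honest replication attempt
        if fisher_combine(pvals) > alpha:            # dropped from the literature
            return m + 1
    return max_reps

rng = np.random.default_rng(0)
spans = np.array([lifespan(rng) for _ in range(5000)])
print("mean lifespan:", spans.mean(), " 95th percentile:", np.percentile(spans, 95))
```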

Let me draw the moral. Even if the community of inquiry is both too clueless to make any contact with reality and too honest to nudge borderline findings into significance, so long as they can keep coming up with new phenomena to look for, the mechanism of the file-drawer problem alone will guarantee a steady stream of new results. There is, so far as I know, no Journal of Evidence-Based Haruspicy filled, issue after issue, with methodologically-faultless papers reporting the ability of sheep's livers to predict the winners of sumo championships, the outcome of speed dates, or real estate trends in selected suburbs of Chicago. But the difficulty can only be that the evidence-based haruspices aren't trying hard enough, and some friendly rivalry with the plastromancers is called for. It's true that none of these findings will last forever, but this constant overturning of old ideas by new discoveries is just part of what makes this such a dynamic time in the field of haruspicy. Many scholars will even tell you that their favorite part of being a haruspex is the frequency with which a new sacrifice over-turns everything they thought they knew about reading the future from a sheep's liver! We are very excited about the renewed interest on the part of policy-makers in the recommendations of the mantic arts...

Update, later that same day: I meant to mention this classic paper on the file-drawer problem, but forgot because I was writing at one in the morning.

Update, yet later: sense-negating typo fixed, thanks to Gustavo Lacerda.

Manual trackback: Wolfgang Beirl; Matt McIrvin's Steam-Operated World of Yesteryear; Idiolect

Modest Proposals; Learned Folly; The Collective Use and Evolution of Concepts; Enigmas of Chance

Posted by crshalizi at November 16, 2010 01:30 | permanent link

November 10, 2010

"Statistics for the Past Millennium" (Tomorrow at the Statistics Seminar)

This should be interesting:

Julien Emile-Geay, "Statistics for the Past Millennium"
Abstract: In 1998, a seminal study by Mann, Bradley, and Hughes took advantage of climate signals embedded in an array of high-resolution paleoclimate proxy data to conclude that "Northern Hemisphere mean annual temperatures for three of the past eight years are warmer than any other year since (at least) AD 1400." The so-called "hockey stick" reconstruction showed relatively stable temperatures for most of the millennium, until the start of the Industrial Revolution, when reconstructed temperatures began a rise to a level not seen in the last millennium.
Since 2001, when the third assessment report by the IPCC featured the "hockey stick" prominently, this graph has become the emblem of the debate on anthropogenic global warming. No other picture conveys how anomalous recent climate change is in the context of natural variations in temperature over the past millennium. Defended as definitive proof of global warming by many climate scientists and sympathetic members of the public, hailed as a "misguided and illegitimate investigation" by some politicians, it remains one of the most hotly debated climate studies ever published. After a congressional inquiry was conducted under the aegis of the respectable Dr. Wegman, most statisticians are now convinced that the "hockey stick" is a fluke due to the overfitting of noisy data.
Have paleoclimatologists been wasting their time all along? In this talk, I will describe the most recent statistical methods used by climate scientists to reconstruct past climates; explain how their performance can be assessed in a realistic geophysical context; show that some climate scientists are, in fact, working hand-in-hand with professional statisticians, with some promising results.
Time and place: 4:30--5:30 pm on Thursday, 11 November 2010, in the Adamson Wing of Baker Hall (entry through 136)

As always, the seminar is free and open to the public, but I should probably add, considering the topic, that if you come and talk like a crazy person you will be ignored and/or mocked and rebuked.

Enigmas of Chance

Posted by crshalizi at November 10, 2010 11:00 | permanent link

November 09, 2010

Equilibrium in Bargaining Games, Idle Query Division

Speaking of listening to my inner economist: does Gary Becker charge graduate students a premium for supervising their dissertations? If not, shouldn't he, especially given the unusually high "degree of elite solidarity and hierarchical control over the placement of ... graduate students" in economics?

(And while I'm thinking about this, why hasn't anyone built PhDMeatMarket.com yet?)

The Dismal Science; Modest Proposals

Posted by crshalizi at November 09, 2010 10:10 | permanent link

November 08, 2010

36-402, Advanced Data Analysis, Spring 2011 (Course Announcement)

This is the undergraduate "advanced data analysis", not to be confused with the graduate projects course I'm teaching right now. Actually, they used to be much more similar, but due to the uncanny growth of the undergraduate major, I will have seventy or so students in 402, and all of them doing projects is more than we can cope with. (My inner economist says that the statistics department should leave the curriculum alone and just keep raising the threshold for passing our classes until the demand for being a statistics major balances the supply of faculty energy, as per Parkinson's "The Short List, or Principles of Selection", but fortunately no one listens to my inner economist.) So about a dozen will do projects in 36-490, as last year, and everyone will learn about methods.

36-402, Advanced Data Analysis, Spring 2011
Description: This course concentrates on methods for the analysis of data, building on the theory and application of the linear model from 36-401. Real-world examples will be drawn from a variety of fields.
Prerequisites: 36-401 (modern regression), or an equivalent class, with my permission.
Topics: Tentative, and grouped by theme; presentation order will vary
Model evaluation: statistical inference, prediction, and scientific inference; in-sample and out-of-sample errors, generalization and over-fitting, cross-validation; evaluating by simulating; bootstrap; information criteria and their limits; mis-specification checks
Yet More Regression: regression = estimating the conditional expectation function; lightning review of ordinary least-squares linear regression and what it is really doing; analysis of variance; limits of linear OLS; extensions: weighted least squares, basis functions; ridge regression and lasso.
Smoothing: kernel smoothing, including local polynomial regression; splines; additive models; classification and regression trees; kernel density estimation
GAMs: linear classifiers; logistic regression; generalized linear models; generalized additive models.
Latent variables and structured data: principal components; factor analysis and latent variables; graphical models in general; latent cluster/mixture models; random effects; hierarchical models
Causality: graphical causal models; estimating causal effects; discovering causal structure
Time and place: 10:30--11:50 Tuesdays and Thursdays in Porter Hall 100
Textbook: Julian Faraway, Extending the Linear Model with R (Chapman Hall/CRC Press, 2006, ISBN 978-1-58488-424-8) will be required. (Faraway's page on the book, with help and errata.) There may be other optional books.
Mechanics: nearly-weekly problem sets (mostly analyzing data sets, a little programming) will be due on Tuesdays; mid-term exam; final exam.
Computing: You will be expected, and in some assignments required, to use the R programming language. All assignments will need a computer. Let me know at once if this will be a problem.
Office hours: Monday 2--4 pm in Baker Hall 229C, or by appointment.

Update, 15 November: The class webpage will be here. Also: this is the same class as 36-608; graduate students should register under the latter number.

Corrupting the Young; Enigmas of Chance

Posted by crshalizi at November 08, 2010 16:20 | permanent link

October 31, 2010

Books to Read While the Algae Grow in Your Fur, October 2010 (Supp'd full with horrors edition)

Algernon Blackwood, In the Realm of Terror: Eight Haunting Tales
At his best when suggesting ways our world is about to open out on to some vast and mysterious and far from benign realm; I see why Lovecraft admired him, and his sentences are better than HPL's, less purple and bloated. But there's something very stuffy and theosophical about Blackwood which I find off-putting.
(And can anyone, in this day and age, take seriously the premise of "A Psychical Invasion", which is that eating a hash brownie sets you up for demonic possession? In fairness, this story does have some fine observation of feline personality.)
Felix Gilman, The Half-Made World
A splendidly-written high-fantasy western. (It is by no stretch of the imagination "steampunk".) Gilman takes great themes of what one might call the Matter of America — the encroachment of regimented industrial civilization, the hard-eyed anarchic men (and women) of violence, the dream of not just starting the world afresh but of offering the last best hope of earth — and transforms the first two into warring rival pantheons of demons, the third into a noble lost cause. (I think Gilman knows exactly how explosive the last theme is, which is why he manages to handle it without setting it off.) Beneath and behind it all lies the continuing presence of the dispossessed original inhabitants of the continent. A story of great excitement and moment unfolds in this very convincing world, tying together an appealing, if believably flawed, heroine and two finely-rendered anti-heroes, told in prose that is vivid and hypnotic by turns. The story is complete in itself, but leaves open a return to the world, which I really hope will happen soon. The most natural point of comparison is Stephen King's The Dark Tower, especially The Gunslinger, which I love; this is more ambitious in its themes, sounder in its construction, and more satisfying in its execution.
The Half-Made World is the finest rendition I've ever seen of one of our core national myths; go read it.
Manual trackback: Felix Gilman; Crooked Timber
H. P. Lovecraft, Tales (ed. Peter Straub)
Lovecraft — Lovecraft! — gets a Library of America edition: a solid (850 pp.) and handsome volume, with a good selection (including At the Mountains of Madness), and decent end-notes. The stars must indeed be right...
Cherie Priest, Four and Twenty Blackbirds
Ghosts, twisted families, maniacal cultists, and a nice atmosphere of Southern twistedness. A little slow in places but it was a first novel, and I will definitely look for others.
Lauren Willig, Betrayal of the Blood Lily
Mind-candy — this time, jalebi.
George G. Roussas, Contiguity of Probability Measures: Some Applications in Statistics
Two sequences of probability measures, say P_n and Q_n, are "contiguous" when, for any sequence of measurable sets A_n, the probabilities P_n(A_n) -> 0 if and only if Q_n(A_n) -> 0. This is just slightly weaker than asymptotic mutual absolute continuity --- more exactly, the original sequences of measures can be approximated to arbitrary precision by a new pair of sequences which are mutually absolutely continuous (Theorem 1.5.1). Mutual absolute continuity lets us define Radon-Nikodym derivatives, which is to say, likelihood ratios, and these, as it turns out, can always be put into an exponential form. Assuming that the measures are parametric families of Markov processes, and that contiguity holds, Roussas constructs a clever theory of local approximation by exponential families (whether the original model has that form or not). From these approximation results he then derives locally optimal asymptotic procedures for estimating parameters, testing hypotheses about them, and constructing confidence intervals. With these methods, inference for Markov processes becomes no harder than the special case of IID data.
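In symbols, the (mutual) requirement is that
\[ 
P_n(A_n) \rightarrow 0 \Leftrightarrow Q_n(A_n) \rightarrow 0 
 \]
for every sequence of measurable sets A_n.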
A sure grasp of measure-theoretic probability, and of the development of ordinary statistical theory on that basis, is an essential prerequisite. (Someone who could read Schervish, or van der Vaart, or even Cramér, should be fine.) No particular knowledge of Markov processes is required.
Dexter, Season 4
Great as always, and illustrating this point. ROT-13'd spoiler for the very last few minutes: ABBBBBBBBBBB!
Rondo Keele, Ockham Explained
Biographical introduction to the thought of William of Ockham (or Occam, if you prefer the Latin rendition). Does a good job of explaining the background (Aristotle and medieval Catholicism), and of clarifying what sort of parsimony principle Ockham advocated. (Cf.) While Keele is not always the sharpest tool in the philosophical shed*, the book is readable and short, which is a major accomplishment in itself when dealing with the Scholastics.
*: For instance, he offers an extended critique (pp. 122--128) of Ockham's account of motion, which Keele glosses as follows (p. 118): "Ockham analyzes a sentence like 'X is in motion' as 'X is in a place at time t, and at t* != t, X is (continuously, without rest) in another place'". (He never quotes a definition of motion from Ockham, that I can see. As stated, this rules out periodic motion, but let that pass.) On this basis, Ockham applies the razor and argues that, while things move, there is no need to posit that they do so by acquiring the "accidental form" of motion. Keele objects that, per Ockham, we cannot truly say that any body is moving at any instant of time, since it's impossible to occupy multiple places at multiple times in a single instant, let alone do so continuously. (Keele even says that Ockham's account conflicts with the differential calculus, and the possibility of calculating an instantaneous velocity by taking the time-derivative of position!) On this basis, Keele suggests that we go back to thinking of motion as an accident inhering in bodies.
The reader doubtless sees the fallacies already, but I will belabor the obvious, because I'm like that. There are many statements which simply do not refer to the configuration of matter at a single time, but are nonetheless true (or false) at particular times; especially statements about change, or for that matter about stasis. For instance, "William of Ockham had better eyesight in 1306" was presumably true in 1340, but it is not true just in virtue of the condition of his eyes in 1340. Or again, "William of Ockham was ordained 34 years ago" was also a true statement in 1340. Would Keele advocate for had-better-eyesight-in-1306 and ordained-34-years-ago as inhering accidental forms? I hope not. Statements about motion are statements about change, and so involve at least two times; statements about instantaneous derivatives are actually statements about positions at infinitely many times.
To conclude, Ockham's account of motion may have problems, but this is not one of them. Here ends the pedantic excursion.
Shirley Jackson, The Sundial
I loved this in college, but hadn't read it in 17 years or more. I remembered the basic story: eccentric people awaiting the end of the world in the isolated great house with its elaborate gardens. I'd forgotten, if it ever registered, just how sad so much of it was; how matter-of-fact the uncanny events were; how shabbily the people treated each other; how disabused the auctorial voice was. And I seem to have made up an ending, though mostly from elements present in the story. Memory, in short, has played its usual tricks, and while this is a good book, in many ways it's not the good book I recalled.
Sadly, this seems to be out of print.
Fun fact: The Red Tree was a finalist for the Shirley Jackson award last year. (Did you know there was such a prize? I didn't, until Bookslut mentioned it somewhere.)
Caitlín R. Kiernan, The Red Tree
Rather classic New England horror, full of menacing history, ominous landscapes, and ambiguous events. With an added layer of sordid personal baggage on the part of the narrator. Lovecraftian themes are present, but more subordinate here than in other books I've read by Kiernan. ROT-13'd spoilers: Gur boivbhf zlgubf vagrecergngvba vf gung gur ubhfr vf ohvyg ngbc n arfg bs tubhyf, nf va "Cvpxzna'f Zbqry", be Xvreana'f bja Qnhtugre bs Ubhaqf. V nz abg fher gung guvf vf pbeerpg, be rira gung gurer vf n pbeerpg vagrecergngvba gb or bssrerq.

Books to Read While the Algae Grow in Your Fur; Scientifiction Fiction and Fantastica; Cthulhiana; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Philosophy

Posted by crshalizi at October 31, 2010 23:59 | permanent link

Halloween Message Spelled Out by Etheric Forces

Regular service will resume shortly after the AIStats deadline, but I wanted to mention one of the highlights of the trip, because it's only too apt for what I'll be doing in the meanwhile. When Danny Yee kindly showed me around Oxford, we not only got to see a fragment of the Difference Engine, but I also encountered the only technology for reliably obtaining correctly specified models. This will henceforth replace references to "the Oracle" or "the Regression Model Fairy" in my lectures; "for angels are very bright mirrors".

Now back to wrestling with the MSS.

Postcards

Posted by crshalizi at October 31, 2010 11:41 | permanent link

October 14, 2010

"Personalized Content Recommendation on Yahoo!" (Next Week at the Statistics Seminar)

Attention conservation notice: Of limited interest if you (1) will not be in Pittsburgh on Monday, or (2) do not use the Web.

One of the first things I have the students in data mining read is "Amazon.com Recommendations Understand Area Woman Better Than Husband", from America's finest news source. The topic for next week's seminar is how to harness the power of statistical modeling to make recommendation engines even more thoughtful, attentive, delightful and broad-minded (all qualities for which statisticians are, of course, especially noted in our personal lives).

Deepak K. Agarwal, "Personalized Content Recommendation on Yahoo!"
Abstract: We consider the problem of recommending content to users visiting a portal like Yahoo!. Content for each user visit is selected from an inventory that changes over time; our goal is to display content for billions of visits to Yahoo! to maximize overall user engagement measured through metrics like click rates, time spent, and so on. This is a bandit problem since there is positive utility associated with displaying content that currently has high variance. Each user can be interpreted as a separate bandit but they all share a common set of arms given by the content inventory.
Classical bandit methods are ineffective due to the curse of dimensionality (millions of users, thousands of content items to choose from). We take a model-based approach to the problem and reduce dimension by sharing parameters across bandits and arms. In this talk, we describe latent factor models that capture interactions between users and content through a multiplicative random effects model. We describe scalable methods to fit such hierarchical models through a Monte Carlo EM approach. Approximate model fitting in a Map-Reduce framework for massive datasets (that cannot fit in memory) is also described.
Time and place: 4--5 pm on Monday, 18 October 2010, in Doherty Hall A310

As always, the seminar is free and open to the public.

Enigmas of Chance

Posted by crshalizi at October 14, 2010 19:57 | permanent link

October 09, 2010

Across the Icy Waters of the North Atlantic

I'll be traveling for much of the rest of the month to give talks.

18 October
"Homophily, Contagion, Confounding: Pick Any Three", Yahoo Labs New York.
"So, You Think You Have a Power Law Do You? Well Isn't That Special?", New York Machine Learning Meetup. [Slides (3 MB, PDF)]
(I am perfectly happy to give two talks in a day, if you give me a steady drip feed of caffeine.)
19 October
"The Computational Structure of Neuronal Spike Trains", Applied Mathematics Colloquium, Columbia University.
20 October
"When Bayesians Can't Handle the Truth", Statistics Seminar, Columbia University, 1 pm in Schermerhorn 963. Based overwhelmingly on this, but with a bit of that.
22 October
"Markovian, predictive, and conceivably causal representations of stochastic processes", at "Complexity and Statistics: Tipping Points and Crashes", Royal Statistical Society, London. (No paper, yet, but it grows out of ones like these.)
25 October
"Homophily, Contagion, Confounding: Pick Any Three", CabDyn Complexity Seminar, Oxford.
26 October
"The Computational Structure of Neuronal Spike Trains", Bristol Centre for Complexity Sciences.
27--28 October
Lecturing to the Centre's degree-program students on optimal prediction, self-organization and coherent structures.
29 October
"When Bayesians Can't Handle the Truth", Statistics Seminar, University of Bristol.

Between traveling, and needing to revise (or, in two cases, write) my talks, I am going to be an even worse correspondent than usual.

Self-Centered

Posted by crshalizi at October 09, 2010 09:50 | permanent link

October 07, 2010

London Lodgings?

Wise and well-traveled readers! Can anyone recommend a hotel in London which is (in decreasing priority) not too expensive, walking distance from 12 Errol Street, and convenient to public transport?

Update, 9 October: Thanks again to everyone who wrote with suggestions, and helped me find my hotel.

Self-centered

Posted by crshalizi at October 07, 2010 19:29 | permanent link

October 05, 2010

"From Statistical Learning to Game-Theoretic Learning" (Next Week at the Statistics Seminar)

Our usual assumption in statistics is that the world is capricious and haphazard, but is not trying to fool us. When we are fooled, and fall into error, it is due to fluctuations and our own intemperance, not to malice. We carry this attitude over to machine learning; when our models over-fit, we think it an excess of optimism (essentially, the winner's curse). Theologically, we think of evil as an absence (the lack of infinite data), rather than an independent and active force. Theoretical computer scientists, however, are traditionally rather more Manichean. They have developed a school of machine learning which tries to devise algorithms guaranteed to do well no matter what data the Adversary throws at them, somewhat misleadingly known as "online learning". (A fine introduction to this is Prediction, Learning, and Games.) There turn out to be important connections between online and statistical learning, and one of the leading explorers of those connections is

Alexander Rakhlin, "From Statistical Learning to Game-Theoretic Learning" (arxiv:1006.1138)
Abstract: Statistical Learning Theory studies the problem of estimating (learning) an unknown function given a class of hypotheses and an i.i.d. sample of data. Classical results show that combinatorial parameters (such as Vapnik-Chervonenkis and scale-sensitive dimensions) and complexity measures (such as covering numbers, Rademacher averages) govern learnability and rates of convergence. Further, it is known that learnability is closely related to the uniform Law of Large Numbers for function classes.
In contrast to the i.i.d. case, in the online learning framework the learner is faced with a sequence of data appearing at discrete time intervals, where the data is chosen by the adversary. Unlike statistical learning, where the focus has been on complexity measures, the online learning research has been predominantly algorithm-based. That is, an algorithm with a non-trivial guarantee provides a certificate of learnability.
We develop tools for analyzing learnability in the game-theoretic setting of online learning without necessarily providing a computationally feasible algorithm. We define complexity measures which capture the difficulty of learning in a sequential manner. Among these measures are analogues of Rademacher complexity, covering numbers and fat shattering dimension from statistical learning theory. These can be seen as temporal generalizations of classical results. The complexities we define also ensure uniform convergence for non-i.i.d. data, extending the Glivenko-Cantelli type results. A further generalization beyond external regret covers a vast array of known frameworks, such as internal and Phi-regret, Blackwell's Approachability, calibration of forecasters, global non-additive notions of cumulative loss, and more.
This is joint work with Karthik Sridharan and Ambuj Tewari.
Time and place: 4--5 pm on Monday, 11 October 2010, in Doherty Hall A310
As always, the seminar is free and open to the public.

(I know I got the Augustinian vs. Manichean learning bit from Norbert Wiener, but I cannot now find the passage.)

Enigmas of Chance

Posted by crshalizi at October 05, 2010 17:35 | permanent link

October 04, 2010

Could It Really Be a Coincidence, Comrades...

... that I get my eagerly-awaited copy of Red Plenty on the anniversary of the launch of Sputnik? Well yes actually of course it could be a coincidence. (Thanks to Henry Farrell for kindly procuring the book for me.)

Posted by crshalizi at October 04, 2010 13:55 | permanent link

September 30, 2010

Books to Read While the Algae Grow in Your Fur, September 2010

Fred Vargas, The Chalk Circle Man and This Night's Foul Work
Vastly entertaining, though the level of coincidences required for the mysteries to be solved — for the plot to go at all — is well above the regulation dose of a single million-to-one chance per month, and indeed only just below Adamsberg finding a confession in a bottle in a shark he just happens to catch and cut open.
Bérénice Geoffroy-Schneiter, Gandhara: The Memory of Afghanistan
Pretty photos of Gandharan art, with an introductory essay that emphasizes the French archaeological mission in Afghanistan. Dates from the end of 2000, with a last-minute addition about the destruction of the giant Buddhas. I doubt the essay would make much sense to anyone who didn't already know about the subject.
Arthur M. Hind, A History of Engraving and Etching: From the 15th Century to the Year 1914
Much old-fashioned art-historical information. Needs more pictures. (Did not answer my questions, but I hardly expected it to.)
John Layman and Rob Guillory, Chew: International Flavor
Mind-candy.
Lauren Willig, The Deception of the Emerald Ring, The Seduction of the Crimson Rose and The Temptation of the Night Jasmine
Karin Slaughter, Broken
Steffen L. Lauritzen, Extremal Families and Systems of Sufficient Statistics
A fascinating look at what can be done by postulating certain sufficient statistics, and distributions of observables conditional on them, and then building the model class from these. In particular, a very special role is played by "extremal" distributions, which can be interpreted as ones which cannot be obtained as convex mixtures of other distributions in the same family, or, what turns out to be equivalent, the models where all the parameters are identified in the limit. Particularly nice results hold for models where the sufficient statistics take values in a semi-group, including powerful extensions of the usual results about exponential families. All in all, it's an excellent book with some rather profound statistical theory, but it's horrible to read math written on a typewriter. Someone really needs to re-set it in LaTeX and maybe put it on the arxiv.
Lauritzen's "Extreme Point Models in Statistics", Scandinavian Journal of Statistics 11 (1984): 65--91 (with discussions and reply) is selected highlights of the book, without proofs, details and extensions, but with decent typography. It's available via JSTOR.

Books to Read While the Algae Grow in Your Fur; The Pleasures of Detection; Writing for Antiquity; Enigmas of Chance; Afghanistan and Central Asia; Scientifiction and Fantastica

Posted by crshalizi at September 30, 2010 23:59 | permanent link

September 27, 2010

"Uniform Approximation of VC Classes" (This Week at the Statistics Seminar)

Something I have been meaning to post about is a series of papers by Terry Adams and Andrew Nobel, on the intersection of machine learning theory (in the form of Vapnik-Chervonenkis dimension and the Glivenko-Cantelli property) with stochastic processes, specifically ergodic theory. (arxiv:1010.3162; arxiv:1007.2964; arxiv:1007.4037) I am very excited by this work, which I think is extremely important for understanding learning from dependent data, and so very pleased to report that, this week, our seminar speaker is —

Andrew Nobel, "Uniform Approximation of VC Classes"
Abstract: The Vapnik-Chervonenkis (VC) dimension measures the ability of a family of sets to separate finite collections of points. The VC dimension plays a foundational role in the theory of machine learning and empirical processes. In this talk we describe new research concerning the structure of families with finite VC dimension. Our principal result states the following: for any family of sets in a probability space, either (i) the family has infinite VC dimension or (ii) every set in the family can be approximated to within a given error by a fixed, finite partition. Immediate corollaries include the fact that families with finite VC dimension have finite bracketing numbers, and satisfy uniform laws of large numbers for ergodic processes. We will briefly discuss analogous results for VC major and VC graph families of functions.
The talk will include definitions and examples of VC dimension and related quantities, and should be accessible to anyone familiar with theoretical machine learning.
This is joint work with Terry M. Adams.
Time and Place: 4:30--5:30 pm on Thursday, 30 September 2010, in the Adamson Wing, Baker Hall

Through poor planning on my part, I have a prior and conflicting engagement, though a very worthwhile one.

Enigmas of Chance

Posted by crshalizi at September 27, 2010 15:30 | permanent link

September 24, 2010

Limit Orders and Indirect Inference

My first Ph.D. student, Linqiao Zhao, jointly supervised with Mark Schervish, has just successfully defended her dissertation:

"A Model of Limit-Order Book Dynamics and a Consistent Estimation Procedure" [PDF, 2.8 M]
Abstract: A limit order is an order to buy or sell a certain number of shares of a financial instrument at a specified price or better. A market's limit order book collects all its outstanding limit orders, and changes through the arrival of orders, including matching new orders against old ones. Despite being extensively used in contemporary exchanges, the dynamics of limit-order books are still not understood well. In this thesis, we propose a minimal model for the dynamics of whole limit order books, based on a self-exciting stochastic process of order flows. However, the data available are not time series of entire books, but rather of small parts of the book, so that almost all of the data is missing, and very much not "missing at random". To fit our model to the actual data with its complicated, history-dependent censoring, we use a relatively new technique for simulation-based estimation, indirect inference. We extend this methodology, proving new theorems on the consistency and asymptotic normality of indirect inference under weaker conditions than those previously established.
The fitted model captures important features of observable data from limit-order books, and exhibits important advantages over existing benchmark models. We point out some of the remaining discrepancies between our model and the data, and discuss how the model could be modified to accommodate them.

This is the culmination of years of hard work and determination on Linqiao's part. I'm very proud to have helped. Congratulations, Dr. Zhao!

Enigmas of Chance; The Dismal Science; Kith and Kin

Posted by crshalizi at September 24, 2010 11:30 | permanent link

September 15, 2010

"Extracting Communities from Networks" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you are (1) in Pittsburgh on Monday and (2) care about the community discovery problem for networks, or general methods of statistical clustering.
Ji Zhu, "Extracting Communities from Networks" (arxiv:1005.3265)
Abstract: Analysis of networks and, in particular, discovering communities within networks has been a focus of recent work in several fields, with applications ranging from citation and friendship networks to food webs and gene regulatory networks. Most of the existing community detection methods focus on partitioning the network into cohesive communities, with the expectation of many links between the members of the same community and few links between different communities. However, many real-world networks contain, in addition to communities, a number of sparsely connected nodes that are best classified as "background". To address this problem, we propose a new criterion for community extraction, which aims to separate tightly linked communities from a sparsely connected background, extracting one community at a time. The new criterion is shown to perform well in simulation studies and on several real networks. We also establish asymptotic consistency of the proposed method under the block model assumption.
Joint work with Yunpeng Zhao and Liza Levina.
Time and place: 4--5 pm on Monday, 20 September 2010, in Doherty Hall A310.

As always, the talk is free and open to the public.

Networks; Enigmas of Chance

Posted by crshalizi at September 15, 2010 20:15 | permanent link

Consilience of Inductions

Shalizi and Tozier, 1999; Munroe, 2010.

Self-centered; Physics; Learned Folly

Posted by crshalizi at September 15, 2010 10:50 | permanent link

September 13, 2010

Brad DeLong Makes a Wishful Mistake

Attention conservation notice: Academics squabbling about abstruse points in social theory.

Chris Bertram, back from a conference where he heard Michael Tomasello talk about his interesting experiments on (in Bertram's words) "young children and other primates [supporting the view] that humans are hard-wired with certain pro-social dispositions to inform, help, share etc and to engage in norm-guided behaviour of various kinds", wonders about the implications of the fact that "work in empirical psychology and evolutionary anthropology (and related fields) doesn't — quelle surprise! — support anything like the Hobbesian picture of human nature that lurks at the foundations of microeconomics, rational choice theory and, indeed, in much contemporary and historical political philosophy."

Brad DeLong asserts that the microfoundations of economics point not to a Hobbesian vision of the war of all against all, but rather to Adam Smith's propensities for peaceful cooperation, especially through exchange. "The foundation of microeconomics is not the Hobbesian 'this is good for me' but rather the Smithian 'this trade is good for us,' and on the uses and abuses of markets built on top of the 'this trade is good for us' principle." Bertram objects that this isn't true, and others in DeLong's comments section further object that modern economics simply does not rest on this Smithian vision. DeLong replies: "Seems to me the normal education of an economist includes an awful lot about ultimatum games and rule of law these days..."

I have to call this one against DeLong — rather to my surprise, since I usually get more out of his writing than Bertram's. The fact is that the foundations of standard microeconomic models envisage people as hedonistic sociopaths [ETA: see below], and theorists prevent mayhem from breaking out in their models by the simple expedient of ignoring the possibility.

If you open up any good book on welfare economics or general equilibrium which has appeared since Debreu's Theory of Value (or indeed before), you will see a clear specification of what the economic agents care about: this is entirely a function of their own consumption of goods and services. Does any agent in any such model care at all about what any other agent gets to consume? No; it is a matter of purest indifference to them whether their fellows experience feast or famine; even whether they live or die. If one such agent has an unsatiated demand for potato chips, and the cost of one more chip will be to devastate innumerable millions, they simply are not equipped to care. (And the principle of Pareto optimality shrugs, saying "who are we to judge?") Arrow, Debreu and co. rule out by hypothesis any interaction between agents other than impersonal market exchange [ETA: or more exactly, their model does so], but the specification of the agents shows that they'd have no objection to pillage, or any preference for obtaining their consumption basket by peaceful truck, barter and commerce rather than fire, sword and fear.

Well, you might say, welfare economics and general equilibrium concern themselves with what happens once peaceful market systems have been established. Of course they don't need to put a "pillaging, not really my thing" term in the utility functions, since it would never come up. Surely things are better in game theory, which has long been seen to be the real microfoundations for economics?

In a word, no. If you ask why a von Neumann-Morgenstern agent refrains from pillaging, you get the answers that (1) the game is postulated not to have pillaging as an option, or (2) he is restrained by fear of some power stronger than himself, whether that power be an individual or an assembly. (Thus von Neumann: "It is just as foolish to complain that people are selfish and treacherous as it is to complain that the magnetic field does not increase unless the electric field has a curl.") Option (1) being obviously irrelevant to explaining why people obey the law, etc., we are left with option (2), which is the essence of all the leading attempts, within economics, to give microfoundations to such phenomena. This is very much in line with the thought of an eminent British moral philosopher — one can read the Folk Theorem as saying that Leviathan could be a distributed system — but that philosopher is not Dr. Smith.

One can defend the utility of the Hobbesian, game-theoretic vision, though in my humble (and long-standing) opinion the empirical results on things like the ultimatum game mean that it can be no more than an approximation useful in certain circumstances, and that ideas like those of Tomasello (and Smith) need to be taken very seriously. But of course those ideas are not part of the generally-accepted microfoundations of economics. This is why every graduate student in economics reads (something equivalent to) Varian's Microeconomic Analysis*, but not Bowles's Microeconomics: Behavior, Institutions, and Evolution; would that they did. If you read Bowles, you will in fact learn a great deal about the ultimatum game, the rule of law, and so forth; in a standard microeconomics text you will not. I think the Hobbesian vision is wrong, but anyone who thinks that modern economics's micro-foundations aren't thoroughly Hobbesian is engaged in wishful thinking.

Update, 15 September: A reader observes, correctly, that actual sociopaths show much more other-regarding preferences than does Homo economicus (typically, forms of cruelty). I could quibble and gesture to dissocial personality disorder, but point taken.

Update, 24 December: In the comments at DeLong's, Robert Waldmann rightly chides me for conflating the actual social views of Arrow and Debreu with what they put into their model of general equilibrium. I have updated the text accordingly.

Manual trackback: Stephen Kinsella; Marginal Utility; Marc Kaufmann; Brad DeLong; Contingencies

*: Varian wrote a book, with Carl Shapiro, giving advice to businesses in industries with imperfect competition. The advice is to (1) extract as much as possible from the customer, to the point where they just barely prefer doing business with you to switching to a competitor or taking their marbles and going home, (2) disguise how much you will extract from your customers as much as possible, (3) participate in standards-setting and public policy formation, so as to ensure that the standards and policies will be to your commercial advantage as much as possible, and (4) generally engage in as much anti-competitive behavior as possible without risk of legal consequences. All this may in fact be sound advice for increasing the (more or less short-run) profits of such firms, but the premises are purely Hobbesian. Were there no risk of legal consequences, their arguments would extend straightforwardly to pillaging. The only reason Shapiro and Varian would counsel Apple against, say, running a phishing scam on everyone who bought a Macintosh would be that it was very likely they'd be caught, with adverse consequences; obviously if Apple made enough money from such a scam, Shapiro and Varian's arguments would say not only "go phish", but "lobby to make such phishing legal" (perhaps under the principle of caveat emptor).

The Dismal Science; Philosophy; The Natural Science of the Human Species

Posted by crshalizi at September 13, 2010 13:00 | permanent link

September 08, 2010

"Machine Learning for Computational Social Science" (This Week at the Machine Learning Seminar)

David Jensen, "Machine Learning for Computational Social Science"
Abstract: Research and applications in machine learning and knowledge discovery increasingly address some of the most fundamental questions of social science: What determines the structure and behavior of social networks? What influences consumer and voter preferences? How does participation in social systems affect behaviors such as fraud, technology adoption, or resource allocation? Often for the first time, these questions are being examined by analyzing massive data sets that record the behavior and interactions of individuals in physical and virtual worlds.
A new kind of scientific endeavor --- computational social science --- is emerging at the intersection of social science and computer science. The field draws from a rich base of existing theory from psychology, sociology, economics, and other social sciences, as well as from the formal languages and algorithms of computer science. The result is an unprecedented opportunity to revolutionize the social sciences, expand the reach and impact of computer science, and enable decision-makers to understand the complex systems and social interactions that we must manage in order to address fundamental challenges of economic welfare, energy production, sustainability, health care, education, and crime.
Computational social science suggests an impressive array of new tasks and technical challenges to researchers in machine learning and knowledge discovery. These include modeling complex systems with temporal, spatial, and relational dependence; identifying cause and effect rather than mere association; modeling systems with feedback; and conducting analyses in ways that protect the privacy of individuals. Many of these challenges interact in fundamental ways that are both surprising and encouraging. Together, they point to an exciting new future for machine learning and knowledge discovery.
Place and time: Gates-Hillman 6115, 1 pm on Thursday, 9 September 2010

Enigmas of Chance; Commit a Social Science

Posted by crshalizi at September 08, 2010 13:20 | permanent link

September 07, 2010

36-835, Paper of the Week

As threatened, I'll post links to the paper being discussed each week in the statistical modeling seminar. This will happen after the discussion, and my own brief comments here will also not be shared with the students beforehand. This should be an RSS feed for this page.

  1. Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science 16 (2001): 199--231
    Comment: I remember being very excited by this paper when it came out. The students were less taken with it — "Of course you use cross-validation to check predictive performance, why does he feel like he has to say that?" In retrospect, I would say that what Breiman calls "data models" are very rarely serious scientific models of the data-generating mechanism, but more "algorithmic models" of a pre-computer age...
  2. Sarat C. Dass and Mingfei Li, "Hierarchical mixture models for assessing fingerprint individuality", Annals of Applied Statistics 3 (2009): 1448--1466
    Comment: This is interesting, but the big problem is that they did absolutely nothing to convince me that their model works. (Cf.) Consequently, why should I think that their estimates of false-identification probabilities are even roughly right? (Also, why not model a spatial point process as a spatial point process?)

Corrupting the Young; Enigmas of Chance

Posted by crshalizi at September 07, 2010 16:49 | permanent link

September 06, 2010

Putting Down That Which They Call Up (Labor Day, 2010)

[Capitalism] has created more massive and more colossal productive forces than have all preceding generations together. Subjection of Nature's forces to man, machinery, application of chemistry to industry and agriculture, steam-navigation, railways, electric telegraphs, clearing of whole continents for cultivation, canalisation of rivers, whole populations conjured out of the ground — what earlier century had even a presentiment that such productive forces slumbered in the lap of social labour?

Ghosts in the Hollow from Jim Lo Scalzo on Vimeo.

Posted by crshalizi at September 06, 2010 19:30 | permanent link

September 04, 2010

Links, Pleading to be Dumped

Attention conservation notice: Yet more cleaning out of to-be-blogged bookmarks, with links of a more technical nature than last time. Contains log-rolling promotion of work by friends, acquaintances, and senior colleagues.

Pleas for Attention

Wolfgang Beirl raises an interesting question in statistical mechanics: what is "the current state-of-the-art if one needs to distinguish a weak 1st order phase transition from a 2nd order transition with lattice simulations?" (This is presumably unrelated to Wolfgang's diabolical puzzle-picture.)

Maxim Raginsky's new blog, The Information Structuralist. Jon Wilkins's new blog, Lost in Transcription. Jennifer Jacquet's long-running blog, Guilty Planet.

Larry Wasserman has started a new wiki for inequalities in statistics and machine learning; I contributed an entry on Markov's inequality. Relatedly: Larry's lecture notes for intermediate statistics, starting with Vapnik-Chervonenkis theory. (It really does make more sense that way.)

Pleas for Connection

Sharad Goel on birds of a feather shopping together, on the basis of a data set that sounds really quite incredible. "It's perhaps tempting to conclude from these results that shopping is contagious .... Though there is probably some truth to that claim, establishing such is neither our objective nor justified from our analysis." (Thank you!)

Mark Liberman on the Wason selection test. There is I feel something quite deep here for ideas that connect the meaning of words to their use, or, more operationally, test whether someone understands a concept by their ability to use it; but I'm not feeling equal to articulating this.

What it's like being a bipolar writer. What it's like being a schizophrenic neuroscientist (the latter via Mind Hacks).

Pleas for Correction

The Phantom of Heilbronn, in which the combined police forces of Europe spend years chasing a female serial killer, known solely from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier. Draw your own morals for data mining and the national surveillance state. (Via arsyed on delicious.)

Herbert Simon and Paul Samuelson take turns, back in 1962, beating up on Milton Friedman's "Methodology of Positive Economics", an essay whose exquisite awfulness is matched only by its malign influence. (This is a very large scan of a xerox copy, from the CMU library's online collection of Simon's personal files.) Back in July, Robert Solow testified before Congress on "Building a Science of Economics for the Real World" (via Daniel McDonald). To put it in "shorter Solow" form: I helped invent macroeconomics, and let me assure you that this was not what we had in mind. Related, James Morley on DSGEs (via Brad DeLong).

Pleas for Scholarly Attention

This brings us to the paper-link-dump portion of the program.

James K. Galbraith, Olivier Giovanni and Ann J. Russo, "The Fed's Real Reaction Function: Monetary Policy, Inflation, Unemployment, Inequality — and Presidential Politics", University of Texas Inequality Project working paper 42, 2007
A crucial posit of the kind of models Solow and Morley complain about above is that the central bank acts as a benevolent (and far-sighted) central planner. Concretely, they generally assume that the central bank follows some version of the Taylor Rule, which basically says "keep both the rate of inflation and the rate of real economic growth steady". What Galbraith et al. do is look at what actually predicts the Fed's actions. The Taylor Rule works much less well, it turns out, than the assumption that Fed policy is a tool of class and partisan struggle. It would amuse me greatly to see what happens in something like the Kydland-Prescott model with this reaction function.
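For reference (my gloss, not anything from the paper), the canonical Taylor (1993) form of the rule sets the nominal policy rate at
\[ 
i_t = r^{*} + \pi_t + 0.5(\pi_t - \pi^{*}) + 0.5 y_t 
 \]
where \pi_t is recent inflation, y_t the percentage output gap, and r^{*} and \pi^{*} the target real rate and inflation rate; departures from anything of this shape are what Galbraith et al. are documenting.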
Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer, "Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity", Journal of Machine Learning Research 11 (2010): 1709--1731
The Galbraith et al. paper, like a great deal of modern macroeconometrics, uses a structural vector autoregression. The usual ways of estimating such models have a number of drawbacks — oh, I'll just turn it over to the abstract. "Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR's. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data." (Disclaimer: Patrik is an acquaintance.)
Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, Gert R.G. Lanckriet, "Hilbert Space Embeddings and Metrics on Probability Measures", Journal of Machine Learning Research 11 (2010): 1517--1561
There's been a lot of work recently on representing probability distributions as points in Hilbert spaces, because really, who doesn't love a Hilbert space? (One can see this as both the long-run recognition that Wahba was on to something profound when she realized that splines became much more comprehensible in reproducing-kernel Hilbert spaces, and the influence of the kernel trick itself.) But there are multiple ways to do this, and it would be nicest if we could choose a representation which has useful probabilistic properties --- distance in the Hilbert space should be zero only when the distributions are the same, and for many purposes it would be even better if the distance in the Hilbert space "metrized" weak convergence, a.k.a. convergence in distribution. This paper gives comprehensible criteria for these properties to hold in a lot of important domains.
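For concreteness, here is a toy R version (my own illustration, not code from any of the authors) of the distance between two such embeddings, the biased estimate of the "maximum mean discrepancy", using a Gaussian kernel with an arbitrarily chosen bandwidth:

gauss.kernel <- function(x, y, sigma = 1) {
  # matrix of k(x_i, y_j) for the Gaussian kernel with bandwidth sigma
  exp(-outer(x, y, function(a, b) (a - b)^2) / (2 * sigma^2))
}
mmd2 <- function(x, y, sigma = 1) {
  # biased (V-statistic) estimate of the squared distance between the embeddings
  mean(gauss.kernel(x, x, sigma)) + mean(gauss.kernel(y, y, sigma)) -
    2 * mean(gauss.kernel(x, y, sigma))
}
x <- rnorm(500); y <- rnorm(500); z <- rexp(500) - 1  # z: same mean, different shape
mmd2(x, y)  # small: two samples from the same distribution
mmd2(x, z)  # larger: samples from different distributions

The Gaussian kernel is one of the "characteristic" kernels for which the population version of this distance vanishes exactly when the two distributions coincide, which is the sort of property the paper characterizes.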
Robert Haslinger, Gordon Pipa and Emery Brown, "Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking", Neural Computation 22 (2010): 2477--2506
A broad principle in statistics is that if you have found the right model, whatever the model can't account for should look completely structureless. One expression of this is the bit of folklore in information theory that an optimally compressed signal is indistinguishable from pure noise (i.e., a Bernoulli process with p=0.5). Another manifestation is residual checking in regression models: to the extent there are patterns in your residuals, you are missing systematic effects. One can make out a good case that this is a better way of comparing models than just asking which has smaller residuals. For example, Aris Spanos argues (Philosophy of Science 74 (2007): 1046--1066; PDF preprint) that looking for small residuals might well lead one to prefer a Ptolemaic model for the motion of Mars to that of Kepler, but the Ptolemaic residuals are highly systematic, while Kepler's are not.
Getting this idea into a usable form for a particular kind of data requires knowing what "structureless noise" means in that context. For point processes, "structureless noise" is a homogeneous Poisson process, where events occur at a constant rate per unit time, and nothing ever alters the rate. If you have another sort of point process, and you know the intensity function, you can use that to transform the original point process into something that looks just like a homogeneous Poisson process, by "time-rescaling" --- you stretch out the distance between points when the intensity is high, and squeeze them together where the intensity is low, to achieve a constant density of points. (Details.) This forms the basis for a very cute goodness-of-fit test for point processes, but only in continuous time. As you may have noticed, actual continuous-time observations are rather scarce; we almost always have data with a finite time resolution. The usual tactic has been to hope that the time bins are small enough that we can pretend our observations are in continuous time, i.e., to ignore the issue. This paper shows how to make the same trick work in discrete time, with really minimal modifications. (Disclaimer: Rob is an old friend and frequent collaborator, and two of the co-authors on the original time-rescaling paper are senior faculty in my department.)
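To illustrate the continuous-time version of the trick (my own toy sketch, not the paper's discrete-time correction), take an inhomogeneous Poisson process with a known intensity, rescale the event times by the integrated intensity, and compare the rescaled inter-event intervals to the standard exponential distribution:

lambda <- function(t) { 5 * (1 + sin(2 * pi * t / 10)) }  # made-up intensity function
lambda.max <- 10
# simulate events on [0, 100] by thinning a homogeneous Poisson process of rate lambda.max
candidates <- cumsum(rexp(3000, rate = lambda.max))
candidates <- candidates[candidates <= 100]
events <- candidates[runif(length(candidates)) < lambda(candidates) / lambda.max]
Lambda <- function(t) { sapply(t, function(s) integrate(lambda, 0, s)$value) }
intervals <- diff(c(0, Lambda(events)))   # rescaled inter-event times
ks.test(intervals, "pexp", rate = 1)      # should look like iid Exponential(1)

A mis-specified intensity would leave structure in the rescaled intervals, which is exactly what the goodness-of-fit test is looking for.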

And now, back to work.

Manual trackback: Beyond Microfoundations

Linkage; Enigmas of Chance; The Dismal Science; Minds, Brains, and Neurons; Physics; Networks; Commit a Social Science; Incestuous Amplification

Posted by crshalizi at September 04, 2010 11:05 | permanent link

September 01, 2010

On an Example of Vienneau's

Attention conservation notice: 1600+ dry, pedantic words and multiple equations on how some heterodox economists mis-understand ergodic theory.

Robert Vienneau, at Thoughts on Economics, has posted an example of a stationary but non-ergodic stochastic process. This serves as a reasonable prompt to follow up on my comment, a propos of Yves Smith's book, that the post-Keynesian school of economists seems to be laboring under a number of confusions about "ergodicity".

I hasten to add that there is nothing wrong with Vienneau's example: it is indeed a stationary but non-ergodic process. (In what follows, I have lightly tweaked his notation to suit my own tastes.) Time is indexed in discrete steps, and X_t = Y Z_t, where Z is a sequence of independent, mean-zero, variance-1 Gaussian random variables (i.e., standard discrete-time white noise), and Y is a chi-distributed random variable (i.e., the square root of something which has a chi-squared distribution). Z is transparently a stationary process, and Y is constant over time, so X must also be a stationary process. However, by simulation Vienneau shows that the empirical cumulative distribution functions from different realizations of the process do not converge on a common limit.

In fact, the result can be strengthened considerably. Given Y = y, X is just Gaussian white noise with standard deviation y, so by the Glivenko-Cantelli theorem, the empirical CDF of X converges almost surely on the CDF of that Gaussian. The marginal distribution of X_t for each t is however a mixture of Gaussians of different standard deviations, and not a Gaussian. Conditionally on Y, therefore, the empirical CDF converges to the marginal distribution of the stationary process with probability 0. Since this convergence has conditional probability zero for every value of y, it has probability zero unconditionally as well. So Vienneau's process very definitely fails to be ergodic.

(Proof of the unconditionality claim: Let C be the indicator variable for the empirical CDF converging to the marginal distribution.
\[ 
\mathbf{E}\left[C|Y=y\right] = 0 
 \]
for all y, but
\[ 
\mathbf{E}\left[C\right] = \mathbf{E}\left[\mathbf{E}\left[C|Y\right]\right] 
 \]
by the law of total expectation, so the unconditional probability of convergence is zero as well.)
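A few lines of R (mine, not Vienneau's code, and with an arbitrary choice of degrees of freedom for the chi distribution) reproduce the failure of the empirical CDFs to settle on a common limit:

simulate.X <- function(n, df = 3) {
  Y <- sqrt(rchisq(1, df = df))  # one chi-distributed scale, drawn once per realization
  Y * rnorm(n)                   # Z_t: standard Gaussian white noise
}
n <- 1e4
curve(ecdf(simulate.X(n))(x), from = -6, to = 6, ylab = "empirical CDF")
for (i in 1:4) { curve(ecdf(simulate.X(n))(x), add = TRUE, col = i + 1) }
# each curve settles on the CDF of a Gaussian with its own realized standard
# deviation Y, not on the common marginal distribution of X_t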

Two things, however, are worth noticing. First, Vienneau's X process is a mixture of ergodic processes; second, which mixture component is sampled from is set once, at the beginning, and thereafter each sample path looks like a perfectly well-behaved realization of an ergodic process. These observations generalize. The ergodic decomposition theorem (versions of which go back as far as von Neumann's original work on ergodic theory) states that every stationary process is a mixture of processes which are both stationary and ergodic. Moreover, which ergodic component a sample path is in is an invariant of the motion — there is no mixing of ergodic processes within a realization. It's worth taking a moment, perhaps, to hand-wave about this.

Start with the actual definition of ergodic processes. Ergodicity is a property of the probability distribution for whole infinite sequences X = (X_1, X_2, ..., X_t, ...). As time advances, the dynamics chop off the initial parts of this sequence of random variables. Some sets of sequences are invariant under such "shifts" — constant sequences, for instance, but also many other more complicated sets. A stochastic process is ergodic when all invariant sets either have probability zero or probability one. What this means is that (almost) all trajectories generated by an ergodic process belong to a single invariant set, and they all wander from every part of that set to every other part — they are "metrically transitive". (Because: no smaller set with any probability is invariant.) From this follows Birkhoff's individual ergodic theorem, which is the basic strong law of large numbers for dependent data. If X is an ergodic process, then for any (integrable) function f, the average of f(X_t) along a sample path, the "time average" of f, converges to a unique value almost surely. So with probability 1, time averages converge to values characteristic of the ergodic process.
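In symbols: if X is stationary and ergodic, then for any integrable f,
\[ 
\frac{1}{n}\sum_{t=1}^{n}{f(X_t)} \rightarrow \mathbf{E}\left[f(X_1)\right] 
 \]
almost surely, with the same (non-random) limit for almost every realization.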

Now go beyond a single ergodic probability distribution. Two distributions are called "mutually singular" if one of them gives probability 1 to an event which has probability zero according to the other, and vice versa. Any two ergodic processes are either identical or mutually singular. To see this, realize that two distinct distributions must give different expectation values to at least one function; otherwise they would be the same distribution. Pick such a distinguishing function and call it f, with expectation values $f_1$ and $f_2$ under the two distributions. Well, the set of sample paths where
\[ 
\frac{1}{n}\sum_{t=1}^{n}{f(X_t)} \rightarrow f_1 
 \]
has probability 1 under the first measure, and probability 0 under the second. Likewise, under the second measure the time average is almost certain to converge on $f_2$, which almost never happens under the first measure. So any two ergodic measures are mutually singular.

This means that a mixture of two (or more) ergodic processes cannot, itself, be ergodic. But a mixture of stationary processes is stationary. So the stationary ergodic processes are "extremal points" in the set of all stationary processes. The convex hull of these extremal points is the set of stationary processes which can be obtained by mixing stationary and ergodic processes. It is less trivial to show that every stationary process belongs to this family, that it is a mixture of stationary and ergodic processes, but this can indeed be done. (See, for instance, this beautiful paper by Dynkin.) Part of the proof shows that which ergodic component a stationary process's sample path is in does not change over time — ergodic components are themselves invariant sets of trajectories. The general form of Birkhoff's theorem thus has time averages converging to a random limit, which depends on the ergodic component the process started in. This can be shown even at the advanced undergraduate level, as in Grimmett and Stirzaker.
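(To see the random limit concretely, here is another small sketch along the same lines as before, again with a chi scale of my own choosing: within each realization of Vienneau's process, the time average of $X_t^2$ converges, by Birkhoff's theorem applied to the relevant ergodic component, but it converges to the random quantity $Y^2$ rather than to the overall expectation.)

    import numpy as np

    rng = np.random.default_rng(7)

    for _ in range(3):
        y = np.sqrt(rng.chisquare(3))      # picks the ergodic component, once and for all
        x = y * rng.standard_normal(500_000)
        time_avg_sq = (x ** 2).mean()      # time average of X_t^2 over this sample path
        print(f"Y^2 = {y**2:.3f}   time average of X_t^2 = {time_avg_sq:.3f}")
    # Each path's time average converges nicely, but to its own Y^2, not to the
    # common expectation E[Y^2] = 3: a random limit, indexed by the component.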

At this point, three notes seem in order.

  1. Many statisticians will be more familiar with a special case of the ergodic decomposition, which is de Finetti's result about how infinite exchangeable random sequences are mixtures of independent and identically-distributed random sequences. The ergodic decomposition is like that, only much cooler, and not tainted by the name of a Fascist. (That said, de Finetti's theorem actually covers Vienneau's example.)
  2. Following tradition, I have stated the ergodic decomposition above for stationary processes. However, it is very important that this limitation is not essential. The broadest class of processes I know of for which an ergodic decomposition holds is that of the "asymptotically mean-stationary processes". The defining property of such processes is that their probability laws converge in Cesaro mean. In symbols, writing $P_t$ for the law of the process from t onwards, we must have
    \[ 
\lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=1}^{n}{P_t(A)}} = P(A) 
 \]
    for some limiting law P. (I learned to appreciate the importance of AMS processes from Robert Gray's Probability, Random Processes and Ergodic Properties, and stole those ideas shamelessly for Almost None.) This allows for cyclic variation in the process, for asymptotic approach to a stationary distribution, for asymptotic approach to a cyclically varying process, etc. Every AMS process is a mixture of ergodic AMS processes, in exactly the way that every stationary process is a mixture of ergodic stationary processes. (A toy numerical illustration of this Cesaro-mean convergence appears just after these notes.)

    I actually don't know whether the ergodic decomposition can extend beyond this, but I suspect not, since the defining condition for AMS is very close to a Cesaro-mean decay-of-dependence property which turns out to be equivalent to ergodicity, namely that, for any two sets A and B
    \[ 
\lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{t=0}^{n-1}{P_1(A \cap T^{-t} B)}} = P_1(A) P(B) 
 \]
    where $T^{-t}$ denotes the powers of the back-shift operator (what time series econometricians usually write L), so that $T^{-t}B$ is the set of all trajectories which will be in the set B in t time-steps. (See Lemma 6.7.4 in the first, online edition of Gray, p. 148.) This means that, on average, the far future becomes unpredictable from the present.

  3. In light of the previous note, if dynamical systems people want to read "basin of attraction" for "ergodic component", and "natural invariant measure on the attractor" for "limit measure of an AMS ergodic process", they will not go far wrong.
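(The promised toy illustration of asymptotic mean stationarity; the example is mine, not Gray's. A deterministic two-state flip-flop started in state 0 is not stationary, since its law alternates from one time step to the next, but the Cesaro averages of those laws converge, here to the uniform distribution, so the process is AMS.)

    import numpy as np

    # Two-state flip-flop: X_{t+1} = 1 - X_t, started at X_1 = 0.
    # P_t(X = 1) alternates 0, 1, 0, 1, ..., so the process is not stationary,
    # but the Cesaro averages (1/n) sum_t P_t(X = 1) converge to 1/2.
    n_max = 20
    p_t = np.array([(t - 1) % 2 for t in range(1, n_max + 1)], dtype=float)
    cesaro = np.cumsum(p_t) / np.arange(1, n_max + 1)
    print(np.round(cesaro, 3))    # oscillates, but settles down around 0.5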

As the third of those notes suggests, it is entirely possible for a process to be stationary and ergodic but to have sensitive dependence on initial conditions; this is generally the case for chaotic processes, which is why there are classic articles with titles like "The Ergodic Theory of Chaos and Strange Attractors". Chaotic systems rapidly amplify small perturbations, at least along certain directions, so they are subject to positive destabilizing feedbacks, but they have stable long-run statistical properties.

Going further, consider the sort of self-reinforcing urn processes which Brian Arthur and collaborators made famous as models of lock-in and path dependence. (Actually, in the classification of my old boss Scott Page, these models are merely state-dependent, and do not rise to the level of path dependence, or even of phat dependence, but that's another story.) These are non-stationary, but it is easily checked that, so long as the asymptotic response function has only a finite number of stable fixed points, they satisfy the definition of asymptotic mean stationarity given above. (I leave it as an exercise whether this remains true in a case like the original Polya urn model.) Hence they are mixtures of ergodic processes. Moreover, if we have only a single realization — a unique historical trajectory — then we have something which looks just like a sample path of an ergodic process, because it is one. ("[L]imiting sample averages will behave as if they were in fact produced by a stationary and ergodic system" — Gray, p. 235 of 2nd edition.) That this was just one component of a larger, non-ergodic model limits our ability to extrapolate to other components, unless we make strong modeling assumptions about how the components relate to each other, but so what?
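(A minimal sketch of a self-reinforcing urn of the general Arthur type; the particular response function below is my own toy choice, not one of Arthur's, but it has two stable fixed points and one unstable one, which is all that matters here.)

    import numpy as np

    rng = np.random.default_rng(2010)

    def response(x):
        # Probability of adding a red ball, given the current red fraction x.
        # Fixed points of response(x) = x sit near 0.10, 0.50, and 0.90;
        # the outer two are stable, the middle one unstable.
        return 0.5 + 0.4 * np.tanh(8.0 * (x - 0.5))

    def urn_path(steps=50_000, rng=rng):
        red, total = 1, 2                  # start with one red ball and one black
        for _ in range(steps):
            if rng.random() < response(red / total):
                red += 1
            total += 1
        return red / total

    print([round(urn_path(), 3) for _ in range(6)])
    # Each run locks in near either 0.10 or 0.90; which one varies across
    # realizations.  Any single history looks like a well-behaved ergodic
    # process, while the ensemble of histories is a non-ergodic mixture.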

I make a fuss about this because the post-Keynesians seem to have fallen into a number of definite errors here. (One may see these errors in e.g., Crotty's "Are Keynesian Uncertainty and Macrotheory Compatible?" [PDF], which however also has insightful things to say about conventions and institutions as devices for managing uncertainty.) It is not true that non-stationarity is a sufficient condition for non-ergodicity; nor is it a necessary one. It is not true that "positive destabilizing feedback" implies non-ergodicity. It is not true that ergodicity is incompatible with sensitive dependence on initial conditions. It is not true that ergodicity rules out path-dependence, at least not the canonical form of it exhibited by Arthur's models.

Update, 12 September: Fixed the embarrassing mis-spelling of Robert's family name in my title.

Manual trackback: Robert Vienneau; Beyond Microfoundations

Enigmas of Chance; The Dismal Science

Posted by crshalizi at September 01, 2010 11:50 | permanent link

Power Law Swag

The admirable Mason Porter, responding to a universal and critical demand, has started the Power Law Shop, celebrating my very favorite class of probability distributions in all the world. This is certainly the funniest thing to come out of the SAMSI complex networks workshop.

Manual trackback: The Monkey Cage; Structure and Strangeness; Quantum Chaotic Thoughts; Science after Sunclipse

Power Laws; Learned Folly

Posted by crshalizi at September 01, 2010 10:10 | permanent link

War Against the Bookmarks

Attention conservation notice: Clearing out my to-blog folder, limiting myself to stuff which isn't too technical and/or depressing.

The late Charles Tilly was, it appears, working on a world history of cities, states and trust networks when he died. The first chapter is online (open access), and makes me really regret that we'll never see the rest. It includes a truly marvelous depiction of the rise of the Mongol Empire, from Marco Polo:

Some time after the migration of the Tartars to [Karakorum], and about the year of our lord 1162, they proceeded to elect for their king a man who was named Chingis-khan, one of approved integrity, great wisdom, commanding eloquence, and eminent for his valour. He began his reign with so much justice and moderation, that he was beloved and revered as their deity rather than their sovereign; and the fame of his great and good qualities spreading over that part of the world, all the Tartars, however dispersed, placed themselves under his command. Finding himself thus at the head of so many brave men, he became ambitious of emerging from the deserts and wildernesses by which he was surrounded, and gave them orders to equip themselves with bows, and such other weapons as they were expert at using, from the habits of their pastoral life. He then proceeded to render himself master of cities and provinces; and such was the effect produced by his character for justice and other virtues, that wherever he went, he found the people disposed to submit to him, and to esteem themselves happy when admitted to his protection and favour.

John Emerson has a slightly different explanation: the culmination of a thousand years of increasingly sophisticated military rivalry in central Eurasia.

My hypothesis is that, for the last several decades during the twelfth century, northern China, Karakitai, the Silk Road between them, and the Mongolian and Manchurian hinterlands served as a pressure cooker or laboratory where strategy, tactics, and military organization were perfected during a period of constant warfare. The Jin Chinese fought against the Song Chinese and sometimes the Xixia or the Mongols, the Xixia fought against the Jin and the Mongols, the Mongols fought with the other two and with each other, and because they were busy with one another they put little pressure on the Karakitai farther west, who were able to concentrate on maintaining their hegemony in Central Asia.

The states in this zone (and the non-state Mongols) hardened up and improved their discipline, organization and skills during decades of practice wars, so that when Genghis Khan finally united the steppe, subjugated the Xixia, and neutralized the Jin (in part because Jin forces had been deserting to the Mongols), he had essentially won the military championship of the toughest league in the world, so that every army he met from then until the Mamluks in Egypt would be far inferior to his. When Genghis Khan gained control of this military high pressure zone, there was no one who could stop him. Furthermore, once Genghis Khan controlled a plurality of the steppe, there was a snowball effect when most of the remaining steppe peoples not allied to his enemies joined him (semi-voluntarily — the alternative was destruction).

Also from Emerson, a selection of Byzantine anecdotes. They really don't make political slanders like they used to, despite some people's best efforts.

Rajiv Sethi ponders The Astonishing Voice of Albert Hirschman; Steve Laniel reviews Exit, Voice, and Loyalty. As an application, consider the plight of would-be refugees from Facebook.

John Dewey writing on economics, economic policy and the financial collapse in 1932, under the rubric of "The Collapse of a Romance" (cached copy). Here Dewey sounds almost Austrian on the connection between uncertainty and the capitalist process — and accordingly condemns the latter as sheer gambling. (Cf.) This line was particularly nice: "Human imagination had never before conceived anything so fantastic as the idea that every individual is actuated in all his desires by an insight into just what is good for him, and that he is equipped with the sure foresight which will enable him to calculate ahead and get just what he is after."

Relatedly, my friend Chris Wiggins observed struggling to save at-risk youth.

Ken MacLeod on Apophatic atheology.

Fifteenth Century Peasant Romance Comics. (Hark, a Vagrant is generally a treasure.)

Ta-Nehisi Coates schools the Freakonomics crowd in the concept of "sample selection bias".

Kalashnikov wanted to be a poet; but war was interested in him.

"Genji, you skank!"

A visual history of lolcats since the 1800s.

Jordan Ellenberg on math in the age of Romanticism.

Becoming death, destroyer of mosquito worlds. How termites evolved from cockroach-like insects (not to be read while eating).

"This is why I'll never be an adult" is scarily perceptive --- "Internet FOREVER!", indeed (via unfogged). While on the subject of moral psychology, how to keep someone with you forever (via Edge of the American West).

Cool data-mining tricks for academic libraries. Via Magistra et Mater, seen elsewhere connecting Carolingian texts and social media.

Canadian engineers are much stranger than you'd think.

Oleg Grabar on the history of images of Muhammad in Islamicate culture (via Laila Lalami).

Akhond of Swat on "Ideas of India" and The Reading Life of Gandhi, Ambedkar and Nehru.

Southern literature, objectively defined and measured by Jerry Leath Mills:

My survey of around thirty prominent twentieth-century southern authors has led me to conclude, without fear of refutation, that there is indeed a single, simple, litmus-like test for the quality of southernness in literature, one easily formulated into a question to be asked of any literary text and whose answer may be taken as definitive, delimiting, and final. The test is: Is there a dead mule in it? As we shall see, the presence of one or more specimens of Equus caballus x asinus (defunctus) constitutes the truly catalytic element, the straw that stirs the strong and heady julep of literary tradition in the American South.

Jessa Crispin on the pleasures of reading about polar travel, while nowhere near the poles.

"Having a world unfold in one's head is the fundamental SF experience." (Pretty much everything Jo Walton writes is worth reading.)

Bruce Sterling on zombie romance: "Paranormal Romance is a tremendous, bosom-heaving, Harry-Potter-sized, Twilight-shaped commercial success. It sorta says everything about modern gender relations that the men have to be supernatural. It also says everything about humanity that we're so methodically training ourselves to be intimate partners of entities that aren't human."

The Demon-haunted world, or, the past and future of practical city magic.

Manual trackback: The Monkey Cage

Update, 4 September: fixed typos and accidentally-omitted link.

Linkage; Writing for Antiquity; The Commonwealth of Letters; Afghanistan and Central Asia; Scientifiction and Fantastica

Posted by crshalizi at September 01, 2010 09:50 | permanent link

August 31, 2010

Books to Read While the Algae Grow in Your Fur, August 2010

John A. Hall, Ernest Gellner: An Intellectual Biography
I've said my piece about Gellner himself elsewhere, and I'd just be repeating myself here if I went into that. Hall's book, appropriately for the intellectual biography of a major thinker, mixes relating the story of Gellner's life with an exposition and (fair) criticism of his ideas, more or less in chronological order. The tone is serious, the research into the historical and academic backgrounds of various phases of Gellner's life and thought obviously thorough, but the prose is quite readable --- though Hall wisely doesn't even try to match Gellner's style.
(The biggest surprises for me were learning about Gellner's osteoporosis, and the photos of how handsome he was as a young man, but then I have been a Gellnerian since a chance encounter with The Psychoanalytic Movement led me to spend the summer of '97 reading my way through all his books.)
See also Scott McLemee; thanks to Henry Farrell for letting me know about this book.
Julia Spencer-Fleming, I Shall Not Want
Laurence Gough, Killers
Partha Dasgupta, Economics: A Very Short Introduction
I wanted to like this a lot, and I can see that it does have some very nice features. It emphasizes that economics is about actual processes of production, distribution and exchange, not about abstract optimization theory, and that the goal is to improve the human condition, especially that of the most destitute. It makes clear that markets are one of many different economic institutions, which have important virtues in many circumstances, but aren't the end-all and be-all of the subject. It puts a lot of justified emphasis on environmental issues (Dasgupta's professional specialty), and is appropriately skeptical of fellow economists pushing "efficiency" as an end in itself. It is clear and (mostly) correct. Someone who doesn't know any economics would in fact learn a lot from it, and be better prepared both to understand economists and to learn more.
I didn't even dislike reading it. I just didn't like doing so at all, and I'm not sure why --- some incompatibility of style, or over-familiarity with the subject on my part, maybe. It's probably worth your checking out, if a brief primer on modern economics sounds interesting.
(The one error I noticed: pace what Dasgupta says on p. 78, Oskar Lange was not a "market socialist" because he argued that an ideal central planner, with perfect information, could be as efficient as an idealized market. [If anything, that is an obvious truth about the neo-classical set-up.] Rather, as is easily checked from Lange's papers [I, II], he wanted the socialist economy to actually use markets (plus a procedure which is basically a sped-up simulation of a Walrasian market), precisely in order to overcome critics like Hayek and von Mises. ["A statue of Professor Mises ought to occupy an honourable place in the great hall of the Ministry of Socialisation or of the Central Planning Board of the socialist state."] In fact, I think it is fair to say that Hayek's two papers on economics and knowledge, while permanent contributions to social science, do not adequately deal with the market socialist idea. But I have written too much about this elsewhere, including the real issues with Lange's proposal, and this is a mere page in Dasgupta's book.)
Shamini Flint, Inspector Singh Investigates: A Most Peculiar Malaysian Murder
Mind-candy. A well-constructed mystery, nice writing, and an adorable detective. Apparently there are at least two other books in the series, published abroad.
Pixu
Collaborative graphic novel about a haunted apartment house. Goes beyond "creepy" into "disturbing".
Don Marquis, Archy and Mehitabel
"There's life in the old dame yet."
Karl Sigmund, The Calculus of Selfishness
I'm reviewing this for American Scientist, so I won't say much right now. To preview: really good, though I am a little dubious about the specific model for public goods games. (It looks like a lot turns on the private utility of the public good going down proportional to the population size, which is not true for many public goods, such as light-houses, sanitation, policing, etc. Also, it's assumed that abstaining from participation is free, but providing, say, a private substitute for the police would be very, very expensive.)
— And the review is now out: Honor Among Thieves.
Lauren Willig, The Secret History of the Pink Carnation and The Masque of the Black Tulip
Mind-candy. Fun if you are willing to accept them on their own terms: bucklers are there to be swashed, bodices are there to be ripped, dungeons are there to be escaped from, and graduate students are there to uncover, well, secret histories. (One of these, admittedly, is not like the others.) Query presupposing a mild spoiler for Pink Carnation: Fvapr gur Checyr Tragvna'f vqragvgl orpnzr choyvp, naq Rybvfr jnf fghqlvat uvz vagrafryl, fubhyqa'g fur unir vzzrqvngryl erpbtavmrq gur znvqra anzr bs uvf jvsr, naq xarj jurer gung fgbel jnf urnqrq?
Josh Bazell, Beat the Reaper
Mind-candy; hilarious and gripping crime/medical novel; I read it in one sitting. The climax was a truly remarkable instance of the gun casually set on the mantelpiece in Act I going off at the end.
Hope Larson, Gray Horses
Charming little fable about adventures in the dreamlands, coming of age, and Chicago (here, "Onion City").

Books to Read While the Algae Grow in Your Fur; The Pleasures of Detection; Commit a Social Science; The Dismal Science; Biology; Mathematics; Scientifiction and Fantastica

Posted by crshalizi at August 31, 2010 23:59 | permanent link

Annual Call to the Adobe Tower

Once again, the Santa Fe Institute is hiring post-docs. Once again, for sheer concentrated intellectual stimulation — not to mention views like this from your office window — there is no better position for an independent-minded young scientist with interdisciplinary interests. The official announcement follows:

The Omidyar Postdoctoral Fellowship at the Santa Fe Institute offers you:
  • unparalleled intellectual freedom
  • transdisciplinary collaboration with leading researchers worldwide
  • up to three years in residence in Santa Fe, NM
  • discretionary research and collaboration funds
  • individualized mentorship and preparation for your next leadership role
  • an intimate, creative work environment with an expansive sky
The Omidyar Fellowship at the Santa Fe Institute is unique among postdoctoral appointments. The Institute has no formal programs or departments. Research is collaborative and spans the physical, natural, and social sciences. Most research is theoretical and/or computational in nature, although it may include an empirical component. SFI typically has 15 Omidyar Fellows and postdoctoral researchers, 15 resident faculty, 95 external faculty, and 250 visitors per year. Descriptions of the research themes and interests of the faculty and current Fellows can be found at http://www.santafe.edu/research. Requirements:
  • a Ph.D. in any discipline (or expect to receive one by September 2011)
  • an exemplary academic record
  • a proven ability to work independently and collaboratively
  • a demonstrated interest in multidisciplinary research
  • evidence of the ability to think outside traditional paradigms
Applications are welcome from:
  • candidates from any country
  • candidates from any discipline
  • women and minorities, who are especially encouraged to apply.
The Santa Fe Institute is an Equal Opportunity Employer.

Deadline: 1 November 2010
To apply: www.santafe.edu. We accept online applications ONLY.
Inquiries: email to ofellowshipinfo at santafe dot edu

The Santa Fe Institute is a private, independent, multidisciplinary research and education center founded in 1984. Since its founding, SFI has devoted itself to creating a new kind of scientific research community, pursuing emerging synthesis in science. Operating as a visiting institution, SFI seeks to catalyze new collaborative, multidisciplinary research; to break down the barriers between the traditional disciplines; to spread its ideas and methodologies to other institutions; and to encourage the practical application of its results.

The Omidyar Fellowship at the Santa Fe Institute is made possible by a generous gift from Pam and Pierre Omidyar.

Complexity

Posted by crshalizi at August 31, 2010 20:00 | permanent link

August 26, 2010

36-757, Advanced Data Analysis: Teaching Handouts (Fall 2010)

The students are just starting on their projects, so, rather than say anything of substance, I try to extract the rational kernel from the traditional shell of the practices by which our cultural formation strives to reproduce itself. (Background.)

Syllabus and Orientation to the Course
Or, what the statistics department hopes to achieve by making you spend the academic year analyzing real data.
Some Advice on Process
Or, riding the big hairy research project.

Corrupting the Young; Enigmas of Chance

Posted by crshalizi at August 26, 2010 12:08 | permanent link

August 19, 2010

Overcoming the Binary (Next Week at the Statistics Seminar)

Every human relationship is a unique and precious snowflake, but do we treat them that way when we model them mathematically? No. No we do not. Join us next week to hear not just why this is wrong, but what to do instead. As always, the seminar is free and open to the public.

Joe Blitzstein, "Strengths of Ties in Network Modeling and Network Sampling"
Abstract: Measuring and modeling the strengths of ties in a social network has a long history, and an even longer history of being ignored. How much does it matter for inference if the strengths are discarded? Dichotomizing a network may seem to be an appealing simplification, but we show that it comes at a heavy cost, through quantifying the information loss. Closely related issues arise in respondent-driven sampling, a popular method for surveying "hidden" populations. We suggest ways to incorporate strength of tie information in this setting, comparing design-based and model-based estimation approaches in the context of an AIDS study.
Based on joint works with Sergiy Nesterko and Andrew Thomas.
Time and place: Monday, Aug. 23, 2010, 4:00--5:00 PM in Doherty Hall A310

Let me add that Prof. Blitzstein will be visiting us from the Bible college of a prophecy-obsessed, theocratic Puritan cult clinging to the rudiments of civilization in a plague-blasted post-apocalyptic wasteland*, so I expect a good turn out to show him how we do these things around here.

*: No, really.

Manual trackback (!): The Inverse Square

Enigmas of Chance; Networks

Posted by crshalizi at August 19, 2010 16:30 | permanent link

August 17, 2010

Fall 2010 Classes: 36-757 and 36-835

I will not be teaching data mining this fall; 36-350 is being taken over this year by my friend and mentor Chris Genovese. Instead, I will be teaching 36-757 (if you'd be interested, you're already in it*), and co-teaching 36-835 with Rob Kass. Here's the announcement for the latter:

36-835 Seminar on Statistical Modeling
First meeting: Tuesday, 24 August, 1:30 pm in Porter Hall A20A (organizational)
This course will be a weekly journal club on the principles and practice of statistical modeling, organized through the careful reading and group discussion of important recent papers. Readings will be selected by the class from sources such as JASA or Annals of Applied Statistics. Discussion will emphasize the relationship between scientific questions and statistical methods. Each week students will be required to post, in an online discussion group, one cogent question or comment about the reading, and will be required to participate in the discussion. Each student will also be responsible for leading at least one class discussion. The course is intended for graduate students in Statistics or Machine Learning. Others are welcome.

If there's interest, I'll post the reading list. Our first paper will definitely be Breiman's "Statistical Modeling: The Two Cultures" (Statistical Science 16 (2001): 199--231).

Update, 26 August: handouts for 757, which may be of broader interest.

Update, 7 September: There was interest in the 835 reading list.

*: This is the first half of "advanced data analysis", a year-long project our doctoral students do on analyzing data provided by an outside investigator, under the supervision of a faculty member. ADA culminates in the student presenting their findings in written and oral form, which serves as one of their three qualifying exams. The goal is to solve genuine scientific questions, not (or not just) to use the most shiny methodological toys. If you have some real-world data which need to be analyzed, and which seem like they might benefit from the attention of a very smart statistics graduate student, please get in touch. (I promise nothing.)

Corrupting the Young; Enigmas of Chance; Self-centered

Posted by crshalizi at August 17, 2010 14:57 | permanent link

July 31, 2010

Books to Read While the Algae Grow in Your Fur, July 2010

Stephen L. Morgan and Christopher Winship, Counterfactuals and Causal Inference: Methods and Principles for Social Research
The first reasonable introductory textbook on modern approaches to causal inference I have seen. (Books like Causation, Prediction and Search, or Pearl's Causality, are not suitable as textbooks.) It alternates between talking about counterfactual random variables and using graphical models (being clear that the latter have at least as much expressive power as the former). After the introduction, which gives a very nice tour of the rest of the book, the first few chapters cover a simple example of the kind of effect estimation we want to do; how to use conditioning and Pearl's back-door condition to control for other variables; matching methods and propensity scores; and regression and why it is problematic. They then turn to methods which might be applicable when adequate conditioning is not, like instrumental variables (about which, soundly, they are very dubious), Pearl's front-door criterion, and longitudinal and regression-discontinuity designs. (Their discussion of the front-door criterion draws very interesting links to the literature on explanation-by-mechanisms, as in Elster, Tilly, or indeed DeLanda, which I need to think about more.) Manski's partial identification approach also gets looked at. The last chapter is a sort of victory lap.
The implied reader of this book is a social scientist who likes quantitative data but is not very interested in, and perhaps not very comfortable with, mathematical statistics; everything has been brought to the level where it can be followed, with a little work, by someone who remembers how to do ordinary least squares regression, but is fuzzy about why $X^T X$ controls the standard errors of the coefficient estimates*. Readers who know more statistical theory but not causal inference can, I think, just skip the worked numerical examples, and generally go faster through the book, but will still learn a lot, and not have to unlearn any of it later. (At no point did I notice any lies-told-to-children.) Non-social-scientists interested in what can be said about causal relationships from observational, non-experimental data will also find it useful.
Disclaimer: Winship is an editor at a journal where I have a paper under review.
*: Because it's the generalization of the sum-of-squares for the independent variable in univariate regression; and the more points you have from a line, and the more widely spaced they are, the better you know the slope of the line.
W. W. Tarn, The Greeks in Bactria and India
Teeth-grindingly Eurocentric, and erects massive conjectures on what seem to me to be the most flimsy evidential foundations (e.g., those Seleucid princesses!), but did a monumental job in surveying the evidence from Greek literature about the Hellenistic presence in what is now Central Asia, Afghanistan, Pakistan and India; and also the coins, as they were known in the 1930s. (He tries to bring in Indian and Chinese literature as well, but doesn't know the languages and is self-conscious about relying on translations.) Someone should really try integrating this with what we now know from archaeology; maybe they have.
Laura E. Reeve, Pathfinder
Mind-candy. Previously: 1, 2.
Sarah Vowell, The Wordy Shipmates
Otto J. Maenchen-Helfen, The World of the Huns: Studies in Their History and Culture
How on Earth do we know that any of these archaeological finds belong to Huns?
Laurence Gough, The Goldfish Bowl
First in the series. As hard-boiled as possible, under the circumstances.
Sauna
Creepy and moody historical horror movie. Not sure if some parts of it would be less weird if I were Finnish.

Books to Read While the Algae Grow in Your Fur; Writing for Antiquity; Afghanistan and Central Asia; Scientifiction and Fantastica; The Pleasures of Detection; The Beloved Republic; Enigmas of Chance

Posted by crshalizi at July 31, 2010 23:59 | permanent link

July 26, 2010

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting"

Attention conservation notice: 500 words on a student's thesis proposal, combining all the thrills of macroeconomic forecasting with the stylish vivacity of statistical learning theory. Even if you care, why not check back in a few years when the work is further along?

Daniel McDonald is writing his thesis, under the joint supervision of Mark Schervish and myself. I can use the present participle, because on Thursday he successfully defended his proposal:

"Generalization Error Bounds for State Space Models: With an Application to Economic Forecasting" [PDF]
Abstract: In this thesis, I propose to derive entirely data dependent generalization error bounds for state space models. These results can characterize the out-of-sample accuracy of many types of forecasting methods. The bounds currently available for time series data rely both on a quantity describing the dependence properties of the data generating process known as the mixing rate and on a quantification of the complexity of the model space. I will derive methods for estimating the mixing behavior from data and characterize the complexity of state space models. The resulting risk bounds will be useful for empirical researchers at the forefront of economic forecasting as well as for economic policy makers. The bounds can also be applied in other situations where state space models are employed.

Some of you may prefer the slides (note that Daniel is using DeLong's reduction of DSGEs to D2 normal form), or an even more compact visual summary.

Most macroeconomic forecasting models are, or can be turned into, "state-space models". There's an underlying state variable or variables, which evolves according to a nice Markov process, and then what we actually measure is a noisy function of the state; given the current state, future states and current observations are independent. (Some people like to draw a distinction between "state-space models" and "hidden Markov models", but I've never seen why.) The calculations can be hairy, especially once you allow for nonlinearities, but one can show that maximum likelihood estimation, along with various regularizations of it, has all the nice asymptotic properties one could want.
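(For concreteness, a minimal linear-Gaussian instance, with placeholder parameters of my own rather than anything from Daniel's proposal: a latent AR(1) state observed through noise.)

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_ssm(n, phi=0.9, state_sd=1.0, obs_sd=0.5, rng=rng):
        """Latent state S_t = phi * S_{t-1} + state noise (a nice Markov process);
        observation X_t = S_t + measurement noise.  Given the current state, the
        future states and the current observation are independent."""
        s = np.zeros(n)
        for t in range(1, n):
            s[t] = phi * s[t - 1] + state_sd * rng.standard_normal()
        x = s + obs_sd * rng.standard_normal(n)
        return s, x

    states, observations = simulate_ssm(252)   # roughly the length of the post-war quarterly record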

Asymptotic statistical theory is, of course, useless for macroeconomics. Or rather: if our methods weren't consistent even with infinite data, we'd know we should just give up. But if the methods only begin to give usably precise answers when the number of data points gets over 10^24, we should give up too. Knowing that things could work with infinite data doesn't help when we really have 252 data points, and serial dependence shrinks the effective sample size to about 12 or 15. The wonderful thing about modern statistical learning theory is that it gives non-asymptotic results, especially risk bounds that hold at finite sample sizes. This is, of course, the reason why ergodic theorems, and the correlation time of US GDP growth rates, have been on my mind recently. In particular, this is why we are thinking about ergodic theorems which give not just finite-sample bounds (like the toy theorem I posted about), but can be made to do so uniformly over whole classes of functions, e.g., the loss functions of different macro forecasting models and their parameterizations.

Anyone wanting to know how to deal with non-stationarity is reminded that Daniel is proposing a dissertation in statistics, and not a solution to the problem of induction.

Enigmas of Chance; The Dismal Science; Incestuous Amplification

Posted by crshalizi at July 26, 2010 15:30 | permanent link

Social Carbon Banditry (Dept. of Modest Proposals for Keeping Civilization from Suffocating In Its Own Waste)

Attention conservation notice: A consideration of social banditry as a tool of climate-change policy. Sadly, this mockery apparently has about as much chance of actually helping as does action by the world's leading democracy.

Only on Unfogged would the comments on a post about visual penis jokes turn to a discussion of what, if anything, civil disobedience could do about climate change; but they did.

One of the goals of classic civil disobedience is to make maintaining an unjust institution costly, though I'm not sure how often it is put in these terms. Ordinarily, those who are disadvantaged or subordinated by a prevailing institution go along with it: they follow its norms and conventions without having to be forced. Whether they do so because they accept those norms, or because they reasonably fear the retaliation that would come if they flouted them, makes little difference. This makes maintaining the injustice a good deal for the oppressors: not only do they get the immediate benefits of the institution, they don't have to expend a lot of effort maintaining it. Mass civil disobedience disrupts this state of affairs. Even if the oppressors can live with seeing that they are, in fact, the kind of people who will engage in brutality to retain their privileges, the time policemen spend working over Sunday-school teachers, etc., is time they do not spend patrolling the streets, catching burglars, etc. Mass civil disobedience, especially if prolonged, raises the cost of perpetuating injustice. The implicit challenge to Pharaoh is: "Are you really willing to pay what it takes to keep us in bondage?"

What does this suggest when it comes to climate change? Burning fossil fuels is not an act with any intrinsic moral significance. The trouble with it is that my burning those fuels inflicts costs on everyone else, and there is no mechanism, yet, for bringing those costs home to me, the burner. The issue is not one of unjust institutions, but of an unpriced externality. The corresponding direct action, therefore, is not making oppressors actually enforce their institutions, but internalizing the externality. I envisage people descending on oil refineries, coal mines, etc., and forcing the operators to hand over sums proportional to the greenhouse-gas contribution of their sales. What happened to the money afterwards would be a secondary consideration at best (though I wouldn't recommend setting it on fire). The situation calls not for civil disobedience but for social carbon banditry.

Of course, to really be effective, the banditry would need to be persistent, universal, and uniform. Which is to say, the banditry has to become a form of government again, if not necessarily a part of the state.

Modest Proposals; The Dismal Science; The Continuing Crises

Posted by crshalizi at July 26, 2010 14:30 | permanent link

July 09, 2010

"Inferring Hierarchical Structure in Networks and Predicting Missing Links" (Next Week at the [Special Summer Bonus] Statistics Seminar)

Attention conservation notice: Only of interest if you are (1) in Pittsburgh next Tuesday, and (2) care about statistical network modeling and community discovery. Also, the guest is a friend, collaborator and mentor; but, despite his undiscriminating taste in acquaintances, an excellent speaker and scientist.

Usually, during the summer the CMU statistics seminar finds a shaded corner and drowses through the heat, with no more activity than an occasional twitch of its tail. Next week, however, it rouses itself for an exceptional visitor:

Cristopher Moore, "Inferring Hierarchical Structure in Networks and Predicting Missing Links"
Abstract: Given the large amounts of data that are now becoming available on social and biological networks, we need automated tools to extract important structural features from this data. Moreover, for many networks, observing their links is a costly and imperfect process — food webs require field work, protein networks require combining pairs of proteins in the laboratory, and so on. Based on the part of the network we have seen so far, we would like to make good guesses about what pairs of vertices are likely to be connected, so we can focus limited resources on those pairs.
I will present a Bayesian approach to this problem, where we try to infer the hierarchical structure of the network, with communities and subcommunities at multiple levels of organization. We start with a rich model of random networks of this type, and then use a Monte Carlo Markov Chain to explore the space of these models. This approach performs quite well on real networks, often outperforming simple heuristics such as assuming that two vertices with neighbors in common are likely to be connected. In particular, it can handle both "assortative" behavior like that seen in many social networks, and "disassortative" behavior as in food webs.
Joint work with Aaron Clauset and Mark Newman.
Place and time: Tuesday, July 13, 2010, 4:00--5:00 p.m. in Porter Hall 125B

As usual, the seminar is free and open to the public.

Networks; Enigmas of Chance; Incestuous Amplification

Posted by crshalizi at July 09, 2010 14:33 | permanent link

July 03, 2010

Variations on a Patriotic Theme

"They'd ask me, 'Raf, what abut this Revolution of yours? What kind of world are you really trying to give us?' I've had a long time to consider that question."

"And?"

"Did you ever hear the Jimi Hendrix Rendition of 'The Star-Spangled Banner'?"

Starlitz blinked. "Are you kidding? That cut still moves major product off the back catalog."

"Next time, really listen to that piece of music. Try to imagine a country where that music truly was the national anthem. Not weird, not far-out, not hip, not a parody, not a protest against some war, not for young Yankees stoned on some stupid farm in New York. Where music like that was social reality. That is how I want people to live...."

[Bruce Sterling, A Good Old-Fashioned Future, pp. 104--105]

"I wasn't born in America. In point of fact, I wasn't even born. But I work for our government because I believe in America. I happen to believe that this is a unique society. We have a unique role in the world."

Oscar whacked the lab table with an open hand. "We invented the future! We built it! And if they could design or market it a little better than we could, then we just invented something else more amazing yet. If it took imagination, we always had that. If it took enterprise, we always had it. If it took daring and even ruthlessness, we had it — we not only built the atomic bomb, we used it! We're not some crowd of pious, sniveling, red-green Europeans trying to make the world safe for boutiques! We're not some swarm of Confucian social engineers who would love to watch the masses chop cotton for the next two millennia! We are a nation of hands-on cosmic mechanics!"

"And yet we're broke," Greta said.

[Bruce Sterling, Distraction, p. 90]

The Beloved Republic

Posted by crshalizi at July 03, 2010 22:30 | permanent link

July 02, 2010

The World's Simplest Ergodic Theorem

Attention conservation notice: Equation-filled attempt at a teaching note on some theorems in mathematical probability and their statistical application. (Plus an oblique swipe at macroeconomists.)

The "law of large numbers" says that averages of measurements calculated over increasingly large random samples converge on the averages calculated over the whole probability distribution; since that's a vague statement, there are actually several laws of large numbers, from the various ways of making this precise. As traditionally stated, they assume that the measurements are all independent of each other. Successive observations from a dynamical system or stochastic process are generally dependent on each other, so the laws of large numbers don't, strictly, apply, but they have analogs, called "ergodic theorems". (Blame Boltzmann.) Laws of large numbers and ergodic theorems are the foundations of statistics; they say that sufficiently large samples are representative of the underlying process, and so let us generalize from training data to future or currently-unobserved occurrences.

Here is the simplest route I know to such a theorem; I can't remember if I learned it from Prof. A. V. Chubukov's statistical mechanics class, or from Uriel Frisch's marvellous Turbulence. Start with a sequence of random variables $X_1, X_2, \ldots, X_n$. Assume that they all have the same (finite) mean m and the same (finite) variance v; also assume that the covariance, $\mathbf{E}[X_t X_{t+h}] - \mathbf{E}[X_t]\mathbf{E}[X_{t+h}]$, depends only on the difference in times h and not on the starting time t. (These assumptions together comprise "second-order" or "weak" or "wide-sense" stationarity. Stationarity is not actually needed for ergodic theorems; one can get away with what's called "asymptotic mean stationarity", but stationarity simplifies the presentation here.) Call this covariance $c_h$. We contemplate the arithmetic mean of the first n values in X, called the "time average":

\[ 
A_n = \frac{1}{n}\sum_{t=1}^{n}{X_t} 
 \]

What is the expectation value of the time average? Taking expectations is a linear operator, so

\[ 
\mathbf{E}[A_n] = \frac{1}{n}\sum_{t=1}^{n}{\mathbf{E}[X_t]} = \frac{n}{n}m = m 
 \]
which is re-assuring: the expectation of the time average is the common expectation. What we need for an ergodic theorem is to show that as n grows, $A_n$ tends, in some sense, to get closer and closer to its expectation value.

The most obvious sense we could try is for the variance of $A_n$ to shrink as n grows. Let's work out that variance, remembering that for any random variable Y, $\mathrm{Var}[Y] = \mathbf{E}[Y^2] - {\left(\mathbf{E}[Y]\right)}^2$.


\begin{eqnarray*} 
\mathrm{Var}[A_n] & = & \mathbf{E}[A_n^2] - m^2\\ 
& = & \frac{1}{n^2}\mathbf{E}\left[{\left(\sum_{t=1}^{n}{X_t}\right)}^2\right] - m^2\\ 
& = & \frac{1}{n^2}\mathbf{E}\left[\sum_{t=1}^{n}{\sum_{s=1}^{n}{X_t X_s}}\right] - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{\mathbf{E}\left[X_t X_s\right]}} - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{ c_{s-t} + m^2}} - m^2\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{s=1}^{n}{ c_{s-t}}}\\ 
& = & \frac{1}{n^2}\sum_{t=1}^{n}{\sum_{h=1-t}^{n-t}{ c_h}} 
\end{eqnarray*}

This used the linearity of expectations, and the definition of the covariances $c_h$. Imagine that we write out all the covariances in an $n \times n$ matrix, and average them together; that's the variance of $A_n$. The entries on the diagonal of the matrix are all $c_0 = v$, and the off-diagonal entries are symmetric, because (check this!) $c_{-h} = c_h$. So the sum over the whole matrix is the sum on the diagonal, plus twice the sum of what's above the diagonal.

\[ 
\mathrm{Var}[A_n] = \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}} 
 \]

If the $X_t$ were uncorrelated, we'd have $c_h = 0$ for all $h > 0$, so the variance of the time average would be $O(n^{-1})$. Since independent random variables are necessarily uncorrelated (but not vice versa), we have just recovered a form of the law of large numbers for independent data. How can we make the remaining part, the sum over the upper triangle of the covariance matrix, go to zero as well?

We need to recognize that it won't automatically do so. The assumptions we've made so far are compatible with a process where $X_1$ is chosen randomly, and then all subsequent observations are copies of it, so that the variance of the time average is v, no matter how long the time series; this is the famous problem of checking a newspaper story by reading another copy of the same paper. (More formally, in this situation $c_h = v$ for all h, and you can check that plugging this in to the equations above gives v for the variance of $A_n$ for all n.) So if we want an ergodic theorem, we will have to impose some assumption on the covariances, one weaker than "they are all zero" but strong enough to exclude the sequence of identical copies.
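(A quick numerical check of the newspaper example; the normal distribution for $X_1$ is just for convenience.)

    import numpy as np

    rng = np.random.default_rng(1)

    # "Reading another copy of the same paper": X_1 is random, every later X_t
    # is an exact copy, so c_h = v for all h.
    n, reps = 100, 10_000
    x1 = rng.standard_normal(reps)               # one draw of X_1 per realization, v = 1
    paths = np.repeat(x1[:, None], n, axis=1)    # each row: n identical copies
    time_averages = paths.mean(axis=1)           # A_n for each realization
    print(time_averages.var())                   # about 1.0 = v, not v/n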

Using two inequalities to put upper bounds on the variance of the time average suggests a natural and useful assumption which will give us our ergodic theorem.


\begin{eqnarray*} 
\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}} & \leq & \sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{|c_h|}}\\ 
& \leq & \sum_{t=1}^{n-1}{\sum_{h=1}^{\infty}{|c_h|}} 
\end{eqnarray*}
Covariances can be negative, so we upper-bound the sum of the actual covariances by the sum of their magnitudes. (There is no approximation here if all covariances are positive.) Then we extend the inner sum so it covers all lags. This might of course be infinite, and would be for the sequence-of-identical-copies. Our assumption then is
\[ 
\sum_{h=1}^{\infty}{|c_h|} < \infty 
 \]
This is a sum of covariances over time, so let's write it in a way which reflects those units: $ \sum_{h=1}^{\infty}{|c_h|} = v T $ , where T is called the "(auto)covariance time", "integrated (auto)covariance time" or "(auto)correlation time". We are assuming a finite correlation time. (Exercise: Suppose that $ c_h = v e^{-h \tau} $ , as would be the case for a first-order linear autoregressive model, and find T. This confirms, by the way, that the assumption of finite correlation time can be satisfied by processes with non-zero correlations.)

Returning to the variance of the time average,


\begin{eqnarray*} 
\mathrm{Var}[A_n] & = & \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{\sum_{h=1}^{n-t}{c_{h}}}\\ 
& \leq & \frac{v}{n} + \frac{2}{n^2}\sum_{t=1}^{n-1}{v T}\\ 
& = & \frac{v}{n} + \frac{2(n-1) vT}{n^2}\\ 
& \leq & \frac{v}{n} + \frac{2 vT}{n}\\ 
& = & \frac{v}{n}(1+ 2T) 
\end{eqnarray*}
So, if we can assume the correlation time is finite, the variance of the time average is $O(n^{-1})$, just as if the data were independent. However, the convergence is slower than for independent data by an over-all factor which depends only on T. As T shrinks to zero, we recover the result for uncorrelated data, an indication that our approximations were not too crude.
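(A simulation check of that bound, for a first-order autoregression with coefficient 0.8; the parameter values are arbitrary.)

    import numpy as np

    rng = np.random.default_rng(3)

    rho, n, reps = 0.8, 200, 20_000
    v = 1.0 / (1 - rho ** 2)       # stationary variance when the innovations have variance 1
    T = rho / (1 - rho)            # correlation time: sum of |c_h| / v for c_h = v * rho^h

    # Many stationary AR(1) sample paths, to estimate Var[A_n] directly.
    x = np.empty((reps, n))
    x[:, 0] = np.sqrt(v) * rng.standard_normal(reps)     # start in the stationary law
    for t in range(1, n):
        x[:, t] = rho * x[:, t - 1] + rng.standard_normal(reps)
    print(x.mean(axis=1).var(), (v / n) * (1 + 2 * T))
    # The simulated variance of the time average sits just below the
    # (v/n)(1 + 2T) bound, and both shrink like 1/n.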

From knowing the variance, we can get rather tight bounds on the probability of $A_n$'s deviations from m if we assume that the fluctuations are Gaussian. Unfortunately, none of our assumptions so far entitle us to assume that. For independent data, we get Gaussian fluctuations of averages via the central limit theorem, and these results, too, can be extended to dependent data. But the assumptions needed for dependent central limit theorems are much stronger than merely a finite correlation time. What needs to happen, roughly speaking, is that if I take (nearly) arbitrary functions f and g, the correlation between $f(X_t)$ and $g(X_{t+h})$ must go to zero as h grows. (This idea is quantified as "mixing" or "weak dependence".)

However, even without the Gaussian assumption, we can put some bounds on deviation probabilities by bounding the variance (as we have) and using Chebyshev's inequality:

\[ 
\mathrm{Pr}\left(|A_n - m| > \epsilon\right) \leq \frac{\mathrm{Var}[A_n]}{\epsilon^2} \leq \frac{v}{\epsilon^2} \frac{2T+1}{n} 
 \]
which goes to zero as n grows. So we have just proved convergence "in mean square" and "in probability" of time averages on their stationary expectation values, i.e., the mean square and weak ergodic theorems, under the assumptions that the data are weakly stationary and the correlation time is finite. There were a couple of steps in our argument where we used not very tight inequalities, and it turns out we can weaken the assumption of finite correlation time. The necessary and sufficient condition for the mean-square ergodic theorem turns out to be that, as one might hope,
\[ 
\lim_{n\rightarrow\infty}{\frac{1}{n}\sum_{h=1}^{n}{c_h}} = 0 
 \]
though I don't know of any way of proving it rigorously without using Fourier analysis (which is linked to the autocovariance via the Wiener-Khinchin theorem; see chapters 19 and 21 of Almost None of the Theory of Stochastic Processes).

Reverting to the case of finite correlation time T, observe that we have the same variance from n dependent samples as we would from n/(1+2T) independent ones. One way to think of this is that the dependence shrinks the effective sample size by a factor of 2T+1. Another, which is related to the name "correlation time", is to imagine dividing the time series up into blocks of that length, i.e., a central point and its T neighbors in either direction, and use only the central points in our averages. Those are, in a sense, effectively uncorrelated. Non-trivial correlations extend about T time-steps in either direction. Knowing T can be very important in figuring out how much actual information is contained in your data set.

To give an illustration not entirely at random, quantitative macroeconomic modeling is usually based on official statistics, like GDP, which come out quarterly. For the US, which is the main but not exclusive focus of these efforts, the data effectively start in 1947, as what national income accounts exist before then are generally thought too noisy to use. Taking the GDP growth rate series from 1947 to the beginning of 2010, 252 quarters in all, de-trending, I calculate a correlation time of just over ten quarters. (This granting the economists their usual, but absurd, assumption that economic fluctuations are stationary.) So macroeconomic modelers effectively have 11 or 12 independent data points to argue over.
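(For anyone who wants to repeat that kind of calculation on their own series, here is a crude plug-in sketch; the lag cut-off is arbitrary, the series `growth` is hypothetical, and a serious analysis of GDP data would be more careful about de-trending and about truncating the sum.)

    import numpy as np

    def correlation_time(x, max_lag=None):
        """Plug-in estimate of T = sum_{h >= 1} |c_h| / c_0 from a single series."""
        x = np.asarray(x, dtype=float)
        x = x - x.mean()
        n = len(x)
        if max_lag is None:
            max_lag = n // 4               # arbitrary cut-off on the summed lags
        c0 = np.dot(x, x) / n
        c = np.array([np.dot(x[:n - h], x[h:]) / n for h in range(1, max_lag + 1)])
        return np.abs(c).sum() / c0

    def effective_sample_size(x):
        return len(x) / (1 + 2 * correlation_time(x))

    # e.g., for a de-trended quarterly growth-rate series `growth` of length 252,
    # effective_sample_size(growth) is on the order of a dozen when T is about ten.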

Constructively, this idea leads to the mathematical trick of "blocking". To extend a result about independent random sequences to dependent ones, divide the dependent sequence up into contiguous blocks, but with gaps between them, long enough that the blocks are nearly independent of each other. One then has the IID result for the blocks, plus a correction which depends on how much residual dependence remains despite the filler. Picking an appropriate combination of block length and spacing between blocks keeps the correction small, or at least controllable. This idea is used extensively in ergodic theory (including the simplest possible proof of the strong ergodic theorem) and information theory (see Almost None again), in proving convergence results for weakly dependent processes, in bootstrapping time series, and in statistical learning theory under dependence.
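(A schematic of the blocking trick itself; the block length and gap below are arbitrary, and real applications tune them to the correlation time.)

    import numpy as np

    def spaced_block_means(x, block_len, gap):
        """Means of non-overlapping blocks separated by `gap` discarded points.
        With block_len and gap large relative to the correlation time, the block
        means are nearly independent, so IID-style arguments apply to them, up to
        a correction for the residual dependence across the gaps."""
        x = np.asarray(x, dtype=float)
        stride = block_len + gap
        starts = range(0, len(x) - block_len + 1, stride)
        return np.array([x[s:s + block_len].mean() for s in starts])

    # e.g., blocks = spaced_block_means(some_series, block_len=50, gap=10),
    # where `some_series` stands in for whatever time series is at hand.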

Manual trackback: An Ergodic Walk (fittingly enough); Thoughts on Economics

Update, 7 August: Fixed typos in equations.

Enigmas of Chance

Posted by crshalizi at July 02, 2010 13:40 | permanent link

July 01, 2010

Posed While the Algae Grow in Their Fur

The last post was really negative; to cleanse the palate, look at the Sloth Sanctuary of Costa Rica, dedicated to rescuing orphaned and imperiled sloths.

(Via Environmental Grafitti, via Matthew Berryman, and with thanks to John Emerson)

Linkage

Posted by crshalizi at July 01, 2010 14:40 | permanent link

June 30, 2010

Books to Read While the Algae Grow in Your Fur, June 2010

Yves Smith, Econned: How Unenlightened Self Interest Undermined Democracy and Corrupted Capitalism
I found this a bit of a frustrating read, actually, but I still recommend it overall. When it comes to the details of how financial markets work, and for whom, and how that has changed over the years, it's very good. The criticisms of the economics profession are a mixed bag. On the moral point, that the economists have managed to secure a uniquely influential and privileged position among the social sciences (arguably among all the sciences), and have not risen to this by uniquely valuable and correct advice, or even by taking seriously and learning from their failures, she's correct. On the sheer insanity of a lot of neo-classical economics and its pretensions, especially as applied to finance, she is correct. Her most technical attacks fail, but I think those are not really needed for the arguments she wants to make. (More below.) When she discusses policy and the Obama Administration, there is something about her tone which I do not care for, though I think most of her actual positive suggestions are pretty good ideas. I suspect I would have liked this book more had I read less in the area beforehand.
(Smith complains about the neo-classicists' reliance on assumptions of "ergodicity". But when she uses the term, she runs together (i) actual ergodicity, (ii) stationarity, (iii) homogeneity [as of a Markov process], (iv) [lack of] sensitive dependence on initial conditions, (v) the existence of a unique and rapidly attracting static equilibrium, (vi) [lack of] path dependence, (vii) [lack of] state dependence, (viii) [lack of] positive feedbacks, (ix) mixing or decay of correlations and (x) the existence of a generating probability distribution, of which the actual historical trajectory of the economy is a realization. Those of us who work in the area have separated these concepts because they are in fact distinct, with complicated inter-relations, and if I take what she says about these matters literally it is a tissue of fallacies and equivocations. But Smith is merely being misled by her authorities, the so-called post-Keynesian political economists, who seem to have originated these errors. To repeat, I think these parts of the book could have been cut without any loss to the important messages.)
Amitav Ghosh, The Hungry Tide
About the tide country of Bengal; being an American innocent abroad; being an ineffectual left-wing Bengali intellectual; being a self-centered member of the modern Indian upper-middle class; being at the mercy of the elements. Also a very well-turned work of scientist-fiction. (I listened to the audio book while exercising; it was read well.)
Shirley Jackson, Novels and Stories
Shirley Jackson now has a Library of America edition, and I am well-pleased. Contents: The Lottery and Other Stories; The Haunting of Hill House; We Have Always Lived in the Castle; and some miscellaneous short stories. Those are almost certainly her two best novels — The Haunting of Hill House is flat-out one of the greatest pieces of fantastic literature ever — but, since I am greedy, I am a bit disappointed that they didn't fit in more of her novels (The Bird's Nest, say, or especially The Sundial). Still: Shirley Jackson now has a Library of America edition, and I am well-pleased.
Colin Martindale, The Clockwork Muse: The Predictability of Artistic Change
No purchase link because this is an anti-recommendation: life is short, ignore this. It's got some of the worst data analysis I have ever seen, and the argument rests entirely on those analyses. And yet people who know even less evidently take it seriously, perhaps because Martindale didn't realize he had no idea what he was doing and so presents his howlers as obviously correct. This book alone seems (if I can trust Google Scholar) to have over 200 citations. Why oh why can't we have a better republic of scholars?
(Thanks, of a sort, to Carlos Yu for finally getting me to read this.)
An elaboration of this snarl of contempt.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Minds, Brains, and Neurons; The Continuing Crises; Writing for Antiquity; The Dismal Science; The Commonwealth of Letters; Learned Folly

Posted by crshalizi at June 30, 2010 23:59 | permanent link

In which Dunning-Kruger meets Slutsky-Yule, and they make music together

Attention conservation notice: Over 2500 words on how a psychologist who claimed to revolutionize aesthetics and art history would have failed undergrad statistics. With graphs, equations, heavy sarcasm, and long quotations from works of intellectual history. Are there no poems you could be reading, no music you could be listening to?

I feel I should elaborate my dismissal of Martindale's The Clockwork Muse beyond a mere contemptuous snarl.

The core of Martindale's theory is this. Artists, and still more consumers of art, demand novelty; they don't just want the same old thing. (They have the same old thing.) Yet there is also a demand, or a requirement, to stay within the bounds of a style. Combining this with a notion that coming up with novel ideas and images requires "regressing" to "primordial" modes of thought, he concludes

Each artist or poet must regress further in search of usable combinations of ideas or images not already used by his or her predecessors. We should expect the increasing remoteness or strangeness of similes, metaphors, images, and so on to be accompanied by content reflecting the increasingly deeper regression toward primordial cognition required to produce them. Across the time a given style is in effect, we should expect works of art to have content that becomes increasingly more and more dreamlike, unrealistic, and bizarre.

Eventually, a turning point to this movement toward primordial thought during inspiration will be reached. At that time, increases in novelty would be more profitably attained by decreasing elaboration — by loosening the stylistic rules that govern the production of art works — than by attempts at deeper regression. This turning point corresponds to a major stylistic change. ... Thus, amount of primordial content should decline when stylistic change occurs. [pp. 61--64, his emphasis; the big gap corresponds to some pages of illustrations, and not me leaving out a lot of qualifying text]

Reference to actual work in cognitive science on creativity, both theoretical and experimental (see, e.g., Boden's review contemporary with Martindale's work), is conspicuously absent. But who knows, maybe his uncritical acceptance of these sub-Freudian notions has led in some productive direction; let us judge them by their fruits.

Here is Martindale's Figure 9.1 (p. 288), supposedly showing the amount of "primordial content" in Beethoven's musical compositions from 1795 through 1826, or rather a two-year moving average of this.

Let us leave to one side the very difficult questions of how to measure "primordial content"; Martindale, like too many psychologists, is slave to quite confused ideas about "construct validity". The dots are the moving averages, the solid black line is a guide to the eye, and the dashed line is a parabola fit to the moving averages. In the main text, Martindale combines the parabolic trend with a second order autoregression, getting the fitted model (p. 289)
PC_t = -1.59 + 0.23 t - 0.01 t^2 + 0.58 PC_{t-1} - 0.55 PC_{t-2}
which, he says, has an R^2 of 50%. Primordial content is supposed to go up as an artist (or artistic community) "works out the possibilities of a style", but go down with a switch to a new, fresh style. Martindale tries (p. 289) to match up his peaks and troughs with what the critics say about the development of Beethoven's style, and succeeds to his own satisfaction, at least "in broad outline".

Now, here is the figure which was, so help me, the second run of some R code I wrote.

Here, however, instead of having people try to figure out how much primordial content there was in Beethoven's music, I simply took Gaussian white noise, with mean zero and variance 1, with one random number per year, and treated that exactly the same way that Martindale did: two-year moving averages, a quadratic fit over time (displayed), and a quadratic-plus-AR(2) over-all model, which kept 45% of the variance. My final fitted model was
PC_t = -0.61 + 0.15 t - 0.004 t^2 + 0.63 PC_{t-1} - 0.51 PC_{t-2}
Was this a fluke? No. When I repeat this 1000 times, the median R^2 is 43%, and 28% of the runs have an R^2 greater than what Martindale got. His fit is no better than one would expect if his measurements are pure noise.
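For the curious, here is a minimal sketch of that procedure in R. It is a reconstruction, not my original script, and the details (for instance, exactly how the two-year smoothing handles the first year) are merely illustrative, so the numbers will wobble a bit from run to run and from what I report above.

    ## Sketch of the white-noise replication described above; illustrative only.
    set.seed(2)
    martindale.run <- function(n.years = 32) {
      noise <- rnorm(n.years)                       # one "score" per year
      pc <- as.numeric(stats::filter(noise, c(0.5, 0.5), sides = 1))  # 2-year moving average
      yr <- seq_along(pc)
      d <- data.frame(pc = pc, yr = yr, yr2 = yr^2,
                      lag1 = c(NA, head(pc, -1)),
                      lag2 = c(NA, NA, head(pc, -2)))
      fit <- lm(pc ~ yr + yr2 + lag1 + lag2, data = d)   # quadratic trend + AR(2)
      summary(fit)$r.squared
    }
    r2 <- replicate(1000, martindale.run())
    median(r2)       # median R^2 over the pure-noise runs
    mean(r2 > 0.5)   # fraction of runs doing at least as well as the Beethoven fit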

What is going on here? All of the apparent structure revealed in Martindale's analysis is actually coming from his having smoothed his data, from having taken the two-year moving average. Remarkably enough, he realized that this could lead to artifacts, but brushed the concern aside:

One has to be careful in dealing with smoothed data. The smoothing by its very nature introduces some autocorrelation because the score for one year is in part composed of the score for the prior year. However, autocorrelations introduced by smoothing are positive and decline regularly with increasing lags. That is not at all what we find in the case of Beethoven — or in other cases where I have used smoothed data. The smoothing is not creating correlations where none existed; it is magnifying patterns already in the data. [p. 289]

What this passage reveals is that Martindale did not understand the difference between the autocorrelation function of a time series, and the coefficients of an autoregressive model fit to that time series. (Indeed I suspect he did not understand the difference between correlation and regression coefficients in general.) The autoregressive coefficients correspond, much more nearly, to the partial autocorrelation function, and the partial autocorrelations which result from applying a moving average to white noise have alternating signs — just like Martindale's do. In fact, the coefficients he got are entirely typical of what happens when his procedure is applied to white noise:


Small dots: Autoregressive coefficients from 1000 runs of Martindale's analysis applied to white noise. Large X: his estimated coefficients for Beethoven.
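If you want to check the alternating-signs claim for yourself, a few lines of R suffice; this is an illustration of the general fact about smoothed noise, not a re-analysis of Martindale's data.

    ## The partial autocorrelations of two-point-averaged white noise alternate
    ## in sign, which is exactly the pattern at issue.  Illustration only.
    set.seed(3)
    z <- rnorm(1e4)
    smoothed <- stats::filter(z, c(0.5, 0.5), sides = 1)  # two-point moving average
    pacf(as.numeric(na.omit(smoothed)), lag.max = 5, plot = FALSE)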

I could go on about what has gone wrong in just the four pages Martindale devotes to Beethoven's style, but I hope my point is made. I won't say that he makes every conceivable mistake in his analysis, because my experience as a teacher of statistics is that there are always more possible errors than you would ever have suspected. But I will say that the errors he's making — creating correlations by averaging, confusing regression and correlation coefficients, etc. — are the sort of things which get covered in the first few lessons of a good course on time series. The fact that averaging white noise produces serial correlations, and a particular pattern of autoregressive coefficients, is in particular famous as the Yule-Slutsky effect, after its two early-20th-century discoverers. (Slutsky, interestingly, appears to have thought of this as an actual explanation for many apparent cycles, particularly of macroeconomic fluctuations under capitalism, though how he proposed to reconcile this with Marx I don't know.) I am not exaggerating for polemical effect when I say that I would fail Martindale from any class I taught on data analysis; or that every single one of the undergraduate students who took 490 this spring has demonstrated more skill at applied statistics than he does in this book.

Martindale's book has about 200 citations in Google Scholar. (I haven't tried to sort out duplicates, citation variants, and self-citations.) Most of these do not appear to be "please don't confuse us with that rubbish" citations. Some of them are from intelligent scholars, like Bill Benzon, who, through no fault of their own, are unable to evaluate Martindale's statistics, and so take his competence on trust. (Similarly with Dutton, who I would not describe as an "intelligent scholar".) This trust has probably been amplified by Martindale's rhetorical projection of confidence in his statistical prowess. (Look at that quote above.) — Oh, let's not mince words here: Martindale fashions himself as someone bringing the gospel of quantitative science to the innumerate heathen of the humanities, complete with the expectation that they'll be too stupid to appreciate the gift. For many readers, those who project such intellectual arrogance are not just more intimidating but also more credible, though rationally, of course, they shouldn't be. (If you want to suggest that I exploit this myself, well, you'd have a point.)

Could there be something to the idea of an intrinsic style cycle, of the sort Martindale (like many others) advocates? I actually wouldn't be surprised if there were situations when some such mechanism (shorn of the unbearably silly psychoanalytic bits) applies. In fact, the idea of this mechanism is much older than Martindale. For example, here is a passage from Marshall G. S. Hodgson's The Venture of Islam, which I happen to have been re-reading recently:

After the death of [the critic] Ibn-Qutaybah [in 889], however, a certain systematizing of critical standards set in, especially among his disciples, the "school of Baghdad". ... Finally the doctrine of the pre-eminence of the older classics prevailed. So far as concerned poetry in the standard Mudâi Arabic, which was after all, not spoken, puristic literary standards were perhaps inevitable: an artificial medium called for artificial norms. That critics should impose some limits was necessary, given the definition of shi`r poetry in terms of imposed limitations. With the divorce between the spoken language of passion and the formal language of composition, they had a good opportunity to exalt a congenially narrow interpretation of those limits. Among adîbs who so often put poetry to purposes of decoration or even display, the critics' word was law. Generations of poets afterwards strove to reproduce the desert qasîdah ode in their more serious work so as to win the critics' acclaim.

Some poets were able to respond with considerable skill to the critics' demands. Abû-Tammâm (d. c. 845) both collected and edited the older poetry and also produced imitations himself of great merit. But work such as his, however admirable, could not be duplicated indefinitely. In any case, it could appear insipid. A living tradition could not simply mark time; it had to explore whatever openings there might be for working through all possible variations on its themes, even the grotesque. Hence in the course of subsequent generations, taste came to favor an ever more elaborate style both in verse and in prose. Within the forms which had been accepted, the only recourse for novelty (which was always demanded) was in the direction of more far-fetched similes, more obscure references to educated erudition, more subtle connections of fancy.

The peak of such a tendency was reached in the proud poet al-Mutanabbi', "the would-be prophet" (915--965 — nicknamed so for a youthful episode of religious propagandizing, in which his enemies said he claimed to be a prophet among the Bedouin), who travelled whenever he did not meet, where he was, with sufficient honor for his taste. He himself consciously exemplified, it is said, something of the independent spirit of the ancient poets. Though he lived by writing panegyrics, he long preferred, to Baghdad, the semi-Bedouin court of the Hamdânid Sayf-al-dawlah at Aleppo; and on his travels he died rather than belie his valiant verses, when Bedouin attacked the caravan and he defended himself rather than escape. His verse has been ranked as the best in Arabic on the ground that his play of words showed the widest range of ingenuity, his images held the tension between fantasy and actuality at the tautest possible without falling into absurdity.

After him, indeed, his heirs, bound to push yet further on the path, were often trapped in artificial straining for effect; and sometimes they appear simply absurd. In any case, poetry in literary Arabic after the High Caliphal Period soon became undistinguished. Poets strove to meet the critics' norms, but one of the critics' demands was naturally for novelty within the proper forms. But such novelty could be had only on the basis of over-elaboration. This the critics, disciplined by the high, simple standards of the old poetry, properly rejected too. Within the received style of shi`r, good further work was almost ruled out by the effectively high standards of the `Abbâsî critics. [volume I, pp. 463--464, omitting some diacritical marks which I don't know how to make in HTML]

Now, it does not matter here what the formal requirements of such poetry were, still less those of the qasidah; nor is it relevant whether Hodgson's aesthetic judgments were correct. I quote this because he points to the very same mechanism — demand for novelty plus restrictions of a style leading to certain kinds of elaboration and content — decades before Martindale (Hodgson died, with this part of his book complete, in 1968), and with no pretense that he was making an original argument, as opposed to rehearsing a familiar one.

But there are obvious problems with turning this mechanism into the Universal Scientific Law of Artistic Change, as Martindale wants to do. Or rather problems which should be obvious, many of which were well put by Joseph (Abu Thomas) Levenson in Confucian China and Its Modern Fate:

Historians of the arts have sometimes led their subjects out of the world of men into a world of their own, where the principles of change seem interior to the art rather than governed by decisions of the artist. Thus, we have been assured that seventeenth-century Dutch landscape bears no resemblance to Breughel because by the seventeenth century Breughel's tradition of mannerist landscape had been exhausted. Or we are treated to tautologies, according to which art is "doomed to become moribund" when it "reaches the limit of its idiom", and in "yielding its final flowers" shows that "nothing more can be done with it" — hence the passing of the grand manner of the eighteenth century in Europe and the romantic movement of the nineteenth.

How do aesthetic valuies really come to be superseded? This sort of thing, purporting to be a revelation of cause, an answer to a question, leaves the question still to be asked. For Chinese painting, well before the middle of the Ch'ing period, with its enshrinement of eclectic virtuosi and connoisseurs, had, by any "internal" criteria, reached the limit of its idiom and yielded its final flowers. And yet the values of the past persisted for generations, and the fear of imitation, the feeling that creativity demanded freshness in the artist's purposes, remained unfamiliar to Chinese minds. Wang Hui was happy to write on a landscape he painted in 1692 that it was a copy of a copy of a Sung original; while his colleague, Yün Shou-p'ing, the flower-painter, was described approvingly by a Chi'ing compiler as having gone back to the "boneless" painting of Hsü Ch'ung-ssu, of the eleventh century, and made his work one with it. (Yün had often, in fact, inscribed "Hsü Ch'ung-ssu boneless flower picture" on his own productions.) And Tsou I-kuei, another flower-painter, committed to finding a traditional sanction for his art, began a treatise with the following apologia:

When the ancients discussed painting they treated landscape in detail but slighted flowering plants. This does not imply a comparison of their merits. Flower painting flourished in the northern Sung, but Hsü [Hsi] and Huang [Ch'üan] could not express themselves theoretically, and therefore their methods were not transmitted.

The lesson taught by this Chinese experience is that an art-form is "exhausted" when its practitioners think it is. And a circular explanation will not hold — they think so not when some hypothetically objective exhaustion occurs in the art itself, but when outer circumstances, beyond the realm of purely aesthetic content, have changed their subjective criteria; otherwise, how account for the varying lengths of time it takes for different publics to leave behind their worked-out forms? [pp. 40–41]

Martindale seems to be completely innocent of such considerations. What he brings to this long-running discussion is, supposedly, quantitative evidence, and skill in its analysis. But this is precisely what he lacks. I have only gone over one of his analyses here, but I claim that the level of incompetence displayed here is actually entirely typical of the rest of the book.

Manual trackback: Evolving Thoughts; bottlerocketscience

Minds, Brains, and Neurons; Writing for Antiquity; The Commonwealth of Letters; Learned Folly; Enigmas of Chance

Posted by crshalizi at June 30, 2010 15:00 | permanent link

June 28, 2010

Reminder: The Link Distribution of Weblogs Is Not a Power Law

For some reason, Clay Shirky's 2003 essay "Power Laws, Weblogs, and Inequality" seems to be making the rounds again. Allow me to remind the world that, at least as of 2004, the distribution of links to weblogs was definitely not a power law. Whether this matters to Shirky's broader arguments about the development of new media is a different question; perhaps all that's needed is for the distribution to be right skewed and heavy tailed. But the actual essay stresses the power law business, which is wrong.

If you have more recent data and would like an updated analysis, you can use our tools and do it yourself.
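For those who just want the flavor of what those tools do, here is a crude R sketch, with fake link counts, an assumed cut-off, and none of the care about discreteness, the choice of xmin, or properly truncated alternative distributions that a real analysis needs.

    ## Back-of-the-envelope tail comparison; purely illustrative.  The 'links'
    ## vector is faked from a log-normal; with real data, plug in your own counts.
    set.seed(4)
    links <- round(rlnorm(5000, meanlog = 1, sdlog = 1.5)) + 1

    xmin   <- 10                       # tail threshold, here simply assumed;
    tail.x <- links[links >= xmin]     # the real tools pick it by a KS criterion

    ## Maximum-likelihood power-law exponent on the tail (continuous
    ## approximation, with the usual -1/2 continuity correction).
    alpha <- 1 + length(tail.x) / sum(log(tail.x / (xmin - 0.5)))

    ## Compare tail log-likelihoods under the power law and a log-normal.
    ## (This ignores truncating the log-normal at xmin, which matters for a
    ## serious test; a real analysis also normalizes the difference.)
    ll.pl <- sum(log(alpha - 1) - log(xmin - 0.5) -
                 alpha * log(tail.x / (xmin - 0.5)))
    ln.fit <- MASS::fitdistr(tail.x, "lognormal")
    ll.ln  <- ln.fit$loglik

    c(alpha = alpha, loglik.powerlaw = ll.pl, loglik.lognormal = ll.ln)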

Enigmas of Chance; Networks; Power Laws

Posted by crshalizi at June 28, 2010 09:31 | permanent link

June 26, 2010

Praxis and Ideology in Bayesian Data Analysis

Attention conservation notice: 750+ self-promoting words about a new preprint on Bayesian statistics and the philosophy of science. Even if you like watching me ride those hobby-horses, why not check back in a few months and see if peer review has exposed it as a mass of trivialities, errors, and trivial errors?

I seem to have a new pre-print:

Andrew Gelman and CRS, "Philosophy and the Practice of Bayesian Statistics", arxiv:1006.3868
Abstract: A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypothetico-deductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science.
Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.

As the two or three people who still read this blog may recall, I have long had a Thing about Bayesianism, or more exactly the presentation of Bayesianism as the sum total of rationality, and the key to all methodologies. (Cf.) In particular, the pretense that all a scientist really wants, or should want, is to know the posterior probability of their theories — the pretense that Bayesianism is a solution to the problem of induction — bugs me intensely. This is the more or less explicit ideology of a lot of presentations of Bayesian statistics (especially among philosophers, economists* and machine-learners). Not only is this crazy as methodology — not only does it lead to the astoundingly bass-ackwards mistake of thinking that using a prior is a way of "overcoming bias", and to myths about Bayesian super-intelligences — but it doesn't even agree with what good Bayesian data analysts actually do.

If you take a good Bayesian practitioner and ask them "why are you using a hierarchical linear model with Gaussian noise and conjugate priors?", or even "why are you using that Gaussian process as your prior distribution over regression curves?", if they have any honesty and self-awareness they will never reply "After offering myself a detailed series of hypothetical bets, the stakes carefully gauged to assure risk-neutrality, I elicited it as my prior, and got the same results regardless of how I framed the bets" — which is the official story about operationalizing prior knowledge and degrees of belief. (And looking for "objective" priors is hopeless.) Rather, data analysts will point to some mixture of tradition, mathematical convenience, computational tractability, and qualitative scientific knowledge and/or guesswork. Our actual degree of belief in our models is zero, or nearly so. Our hope is that they are good enough approximations for the inferences we need to make. For such a purpose, Bayesian smoothing may well be harmless. But you need to test the adequacy of your model, including the prior.
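To make "test the adequacy of your model" concrete, here is a toy posterior predictive check in R. The example (Poisson likelihood, conjugate Gamma prior, overdispersed data) is mine, not the paper's, and the hyperparameters are pulled out of the air.

    ## Toy posterior predictive check; all names and numbers are illustrative.
    ## Model: y_i ~ Poisson(lambda), lambda ~ Gamma(a, b); the check asks
    ## whether data replicated from the fitted model reproduce the observed
    ## dispersion.
    set.seed(5)
    y <- rnbinom(100, size = 2, mu = 5)   # "real" data: overdispersed, so the
                                          # Poisson model should fail the check
    a <- 1; b <- 1                        # prior hyperparameters (assumed)

    n.rep <- 1000
    lambda.post <- rgamma(n.rep, a + sum(y), b + length(y))   # posterior draws
    disp <- function(x) var(x) / mean(x)                      # test statistic
    disp.rep <- sapply(lambda.post,
                       function(l) disp(rpois(length(y), l))) # replicated data

    ## Posterior predictive p-value: near 0 or 1 means the model (prior and
    ## likelihood together) cannot reproduce this aspect of the data.
    mean(disp.rep >= disp(y))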

Admittedly, checking your model involves going outside the formalism of Bayesian updating, but so what? Asking a Bayesian data analyst not just whether but how their model is mis-specified is not, pace Brad DeLong, tantamount to violating the Geneva Convention. Instead, it is recognizing them as a fellow member of the community of rational inquirers, rather than a dumb numerical integration subroutine. In practice, good Bayesian data analysts do this anyway. The ideology serves only to give them a guilty conscience about doing good statistics, or to waste time in apologetics and sophistry. Our modest hope is to help bring an end to these ideological mystifications.

The division of labor on this paper was very simple: Andy supplied all the worthwhile parts, and I supplied everything mistaken and/or offensive. (Also, Andy did not approve this post.)

*: Interestingly, even when economists insist that rationality is co-extensive with being a Bayesian agent, none of them actually treat their data that way. Even when they do Bayesian econometrics, they are willing to consider that the truth might be outside the support of the prior, which to a Real Bayesian is just crazy talk. (Real Bayesians enlarge their priors until they embrace everything which might be true.) Edward Prescott forms a noteworthy exception: under the rubric of "calibration", he has elevated his conviction that his prior guesses are never wrong into a new principle of statistical estimation.

Manual trackback: Andrew Gelman; Build on the Void; The Statistical Mechanic; A Fine Theorem; Evolving Thoughts; Making Sense with Facilitated Systems; Vukutu; EconTech; Gravity's Rainbow; Nuit Blanche; Smooth; Andrew Gelman again (incorporating interesting comments from Richard Berk); J.J. Hayes's Amazing Antifolk Explicator and Philosophic Analyzer; Manuel "Moe" G.

Bayes, anti-Bayes; Enigmas of Chance; Philosophy; Self-Centered

Posted by crshalizi at June 26, 2010 15:58 | permanent link

June 24, 2010

The Old Country, Back in the Day

In the late 1950s, my grandfather, Abdussattar Shalizi, was the president of the planning office in Afghanistan's ministry of planning; back then Afghanistan had a planning office and a ministry of planning which were not just jokes. During that time he wrote a book called Afghanistan: Ancient Land with Modern Ways, mostly consisting of his photographs of the signs of the country's progress. This was, as you might guess, a propaganda piece, but I can testify that it was an utterly sincere propaganda piece. So far as I know my grandfather did not erect any Potemkin factories, schools, houses, irrigation works, record stores, Girl Scout troops, or secure roads for his photographs. Re-reading the book now fills me with pity and, to be honest, anger.

But it is important to remember, when people ignorantly mutter about a country stuck in the 12th century, not just that the 12th century meant something very different there than it did in Scotland, but that 1960 in Afghanistan actually happened. So I am very pleased to see, via my brother, a photo essay in Foreign Policy, by Mohammad Qayoumi, consisting of scanned photos from my grandfather's book, with his original captions and Qayoumi's commentary. Go look.

(My plan to post something positive at least once a week was a total failure. I am contemplating requiring every merely-critical post to be paired with a positive one.)

Manual trackback: Gaddeswarup

Afghanistan and Central Asia

Posted by crshalizi at June 24, 2010 21:30 | permanent link

Confounded Divorce ("Why Oh Why Can't We Have a Better Press Corps?" Dept.)

Attention conservation notice: 1000+ words about how I am irritated by journalists being foolish, and about attempts at causal inference on social networks. As novel as a cat meowing or a car salesman scamming.

I have long thought that most opinion writers could be replaced, to the advantage of all concerned, by stochastic context-free grammars. Their readers would be no less well-informed about how the world is and what should be done about it, would receive no less surprise and delight at the play of words and ideas, and the erstwhile writers would be free to pursue some other trade, which did not so corrode their souls. One reason I feel this way is that these writers habitually make stuff up because it sounds good to them, even when actual knowledge is attainable. They have, as a rule, no intellectual conscience. Yesterday, therefore, if you had told me that one of their number actually sought out some social science research, I would have applauded this as a modest step towards a better press corps.

Today, alas, I am reminded that looking at research is not helpful, unless you have the skills and skepticism to evaluate the research. Exhibit A is Ross "Chunky Reese Witherspoon Lookalike" Douthat, who stumbled upon this paper from McDermott, Christakis, and Fowler, documenting an association between people getting divorced and those close to them in the social network also getting divorced. Douthat spun this into the claim that "If your friends or neighbors or relatives get divorced, you're more likely to get divorced --- even if it's only on the margins --- no matter what kind of shape your marriage is in." It should come as no surprise that McDermott et al. did not, in any way whatsoever, try to measure what shape people's marriages were in.

Ezra Klein, responding to Douthat, suggests that the causal channel isn't making people who are happy in their marriages divorce, but leading people to re-evaluate whether they are really happily married, by making it clear that there is an alternative to staying married. "The prevalence of divorce doesn't change the shape your marriage is in. It changes your willingness to face up to the shape your marriage is in." (In other words, Klein is suggesting that many people call their marriages "happy" only through the mechanism of adaptive preferences, a.k.a. sour grapes.) Klein has, deservedly, a reputation for being more clueful than his peers, and his response shows a modicum of critical thought, but he is still relying on Ross Douthat to do causal inference, which is a sobering thought.

Both of these gentlemen are assuming that this association between network neighbors' divorces must be due to some kind of contagion — Douthat is going for some sort of imitation of divorce as such, Klein is looking to more of a social learning process about alternatives and their costs. Both of them ignore the possibility that there is no contagion here at all. Remember homophily: People tend to be friends with those who are like them. I can predict your divorce from your friends' divorces, because seeing them divorce tells me what kind of people they are, which tells me about what kind of person you are. From the sort of observational data used in this study, it is definitely impossible to say how much of the association is due to homophily and how much to contagion. (The edge-reversal test they employ does not work.) It seems to be impossible to even say whether there is any contagion at all.*
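To see how easily homophily alone generates such an association, here is a toy simulation in R with no contagion anywhere in it; the pairing scheme and the effect sizes are invented purely for illustration.

    ## Homophily with zero contagion still makes the friend's divorce predictive.
    set.seed(6)
    n <- 2000
    trait <- rnorm(n)                              # latent "kind of person"

    ## Crude homophily: pair each person with their nearest neighbor by trait.
    o <- order(trait)
    friend <- integer(n)
    friend[o[seq(1, n, by = 2)]] <- o[seq(2, n, by = 2)]
    friend[o[seq(2, n, by = 2)]] <- o[seq(1, n, by = 2)]

    ## Divorce depends only on one's own trait; no influence from the friend.
    divorce <- rbinom(n, 1, plogis(-1 + trait))

    ## Yet the friend's divorce "predicts" ego's divorce:
    summary(glm(divorce ~ divorce[friend], family = binomial))$coefficients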

To be clear, I am not castigating columnists for not reading my pre-prints; on balance I'm probably happier that they don't. But the logical issue of running together influence from friends and inference from the kind of friends you have is clear and well known. (Our contribution was to show that you can't escape the logic through technical trickery.) One would hope it would have occurred to people to ponder it before calling for over-turning family law, or saying, in effect, "You should stay together, for the sake of your neighbors' kids". I also have no problem with McDermott et al. investigating this. It's a shame that their data is unable to answer the causal questions, but without their hard work in analyzing that data we wouldn't know there was a phenomenon to be explained.

I hope it's obvious that I don't object to people pontificating about whatever they like; certainly I do enough of it. If people can get paying jobs doing it, more power to them. I can even make out a case why ideologically committed opinionators have a role to play in the social life of the mind, like so. It's a big complicated world full of lots of things which might, conceivably, matter, and it's hard to keep track of them all, and figure out how one's principles apply** — it takes time and effort, and those are always in short supply. Communicating ideas takes more time and effort and skill. People who can supply the time, effort and skill to the rest of us, starting from more or less similar principles, thereby do us a service. But only if they are actually trustworthy — actually reasoning and writing in good faith — and know what they are talking about.

(Thanks, of a kind, to Steve Laniel for bringing this to my attention.)

*: Arbitrarily strong predictive associations of the kind reported here can be produced by either mechanism alone, in the absence of the other. We are still working on whether there are any patterns of associations which could not be produced by homophily alone, or contagion alone. So far the answer seems to be "no", which is disappointing.

**: And sometimes you reach conclusions so strange or even repugnant that the principles they followed from come into doubt themselves. And sometimes what had seemed to be a principle proves, on reflection, to be more like a general rule, adapted to particular circumstances. And sometimes one can't articulate principles at all. All of this, too, could and should be part of our public conversation; but let me speak briefly in the main text.

(Typos corrected, 26 June)

Manual trackback: The Monkey Cage.

Networks; The Running Dogs of Reaction

Posted by crshalizi at June 24, 2010 20:45 | permanent link

June 01, 2010

Brush Your Teeth!

Attention conservation notice: Combines quibbles about what's in an academic paper on tooth-brushing with more quibbles about the right way to do causal inference.

Chris Blattman finds a new paper which claims not brushing your teeth is associated with higher risk of heart disease, and is unimpressed:

Toothbrushing is associated with cardiovascular disease, even after adjustment for age, sex, socioeconomic group, smoking, visits to dentist, BMI, family history of cardiovascular disease, hypertension, and diagnosis of diabetes.

...participants who brushed their teeth less often had a 70% increased risk of a cardiovascular disease event in fully adjusted models.

The idea is that inflamed gums lead to certain chemicals or clot risks.

In the past five days I've seen this study reported in five newspapers, half a dozen radio news shows, and several blogs. These researchers know how to use a PR firm.

Sounds convincing. What could be wrong there?

OH WAIT. MAYBE PEOPLE WHO BRUSH THEIR TEETH TWICE A DAY GENERALLY TAKE BETTER CARE OF THEMSELVES AND WATCH WHAT THEY EAT.

I'm consistently blown away by what passes for causal analysis in medical journals.

Now, I am generally of one mind with Blattman about the awfulness of causal inference in medicine — I must write up the "neutral model of epidemiology" sometime soon — but here, I think, he's being a bit unfair. (I have not read or listened to any of the press coverage, but I presume it's awful, because it always is.) If you read the actual paper, which seems to be open access, one of the covariates is actually a fairly fine-grained set of measures of physical activity, albeit self-reported. (I'm not sure why they didn't list it in the abstract.) It would be nice to have information about diet, and of course self-reports are always extra dubious for moralized behaviors like exercise. Still, it's not right to say, IN ALL CAPS, that the authors of the paper did nothing about this.

In fact, the real weakness of the paper is that they have a reasonably clear mechanism in mind, and enough information to test it, but didn't do so. As Blattman says, the idea is that not brushing your teeth causes tooth and gum disease, tooth and gum disease cause inflammatory responses, and inflammation causes heart disease. Because of this, the authors measured the levels of two chemical markers of inflammation, and found that they were positively predicted by not brushing, even adjusting for their other variables (including physical activity). So far so good. Following the logic of Pearl's front-door criterion, what they should have done next, but did not, was see whether conditioning on the levels of these chemical markers substantially reduced the dependence of heart disease on tooth brushing. (The dependence should be eliminated by conditioning on the complete set of chemicals mediating the inflammatory response.) This is what one would expect if that mechanism I mentioned actually works, but not if the association comes down to not brushing being a sign that one's an unhealthy slob.
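Here, on made-up data, is the kind of check I mean; the variable names and effect sizes are invented, and this is a sketch of the logic, not a re-analysis of their study.

    ## If the effect of not brushing on heart disease runs through inflammation,
    ## adjusting for the inflammatory marker should absorb most of the brushing
    ## coefficient.  All quantities here are simulated for illustration.
    set.seed(7)
    n <- 5000
    slob         <- rnorm(n)                              # general self-neglect
    no.brush     <- rbinom(n, 1, plogis(-0.5 + slob))
    inflammation <- 0.8 * no.brush + rnorm(n)             # mechanism, step 1
    disease      <- rbinom(n, 1, plogis(-2 + 1.0 * inflammation + 0.5 * slob))

    ## Without the marker: not brushing looks strongly associated with disease.
    coef(glm(disease ~ no.brush, family = binomial))["no.brush"]

    ## With the marker: the brushing coefficient shrinks substantially (it does
    ## not vanish here, because 'slob' is an unmeasured common cause).
    coef(glm(disease ~ no.brush + inflammation, family = binomial))["no.brush"]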

The moral is: brush your teeth, for pity's sake, unless you want to end up like this poor soul.

Enigmas of Chance; The Natural Science of the Human Species

Posted by crshalizi at June 01, 2010 13:25 | permanent link

May 31, 2010

Books to Read While the Algae Grow in Your Fur, May 2010

Bruce Sterling, designed by Lorraine Wild, Shaping Things
I'll let the introduction speak for itself. — The graphic design is actually really very nice, though it doesn't call attention to itself.
John Stuart Mill, On Bentham and Coleridge
Finally read the old paperback volume of these two essays which I appear to have bought at a friends-of-the-library sale in 1997. (That had an introduction by F. R. Leavis, which I ignored.) Enjoyable, but it really did not succeed in convincing me to take Coleridge seriously. And of course when he wrote these essays Mill made his living as a functionary of the British Empire (in the form of the East India Company), and so had a personal stake in anti-democratic arguments — you can see him straining to find some way to evade the force of the idea that those who hold power ought to be accountable to those over whom it is wielded...
I realize that I shouldn't have laughed out loud reading Mill's contrast between the national characters of the Germans and the Italians, but come on.
S. N. Lahiri, Resampling Methods for Dependent Data
This is a thorough discussion of variants of the bootstrap for time series (chapters 2--11) and spatial random fields (chapter 12). Lahiri does not presume any previous knowledge of the bootstrap, though that would help; familiarity with theoretical statistics at the level of, say, Lehmann or Schervish is essential. A few proofs are referred to Lahiri's papers, otherwise this is self-contained, with an appendix reminding readers of the most essential results on stochastic processes.
Chapter 1 is an introduction to the idea of bootstrapping and a tour of the book. Chapter 2 discusses the forms of bootstrapping for time series, which are all based on the idea that rather than resampling individual observations, one needs to resample whole blocks of consecutive data points. This preserves the dependence structure within each block, but messes it up at the transition between blocks; one therefore wants to let the length of the blocks grow as one gets more data. (A toy implementation of the moving-block version appears after these reviews.) There are various ways of resampling blocks, with mostly but not entirely identical properties. Chapter 3 in particular discusses the estimation of sample means using these various block bootstraps, assuming various sorts of strong mixing (most such results would carry over to cases with mere weak dependence). Chapter 4 shows how to reduce other problems to ones of sample means for transformations of the data. (I hadn't realized that a lot of the techniques for smooth functionals in statistics go back to von Mises.) Chapter 5 compares the first-order properties of the different bootstraps. Chapter 6 uses Edgeworth expansions to get at second-order properties; personally I found this chapter (unlike the others) nearly unreadable. (Since Edgeworth expansions are about series expansions for generating functions obeying certain combinatorial rules, it feels like it should be possible somehow to express them as Feynman diagrams, which would be a lot easier to grasp. If someone has done this, though, I can't find it.) Chapter 7 is about estimating how big the blocks should be, by more resampling. Chapters 8 and 9 are about alternatives to block bootstraps: either bootstrapping from parametric models (just linear ARMA models, chapter 8), or from the Fourier transform (chapter 9).
The previous theoretical results all rely on comparatively rapid decay of correlations and on the control of higher moments. Chapter 10 gives results on how to bootstrap when there is long-range dependence, and chapter 11 considers modifications for heavy-tailed data. (Also for maxima and minima, since the extreme values drawn from even light-tailed distributions tend to look heavy-tailed.) Interestingly, for both long-range dependence and heavy tails, one really needs the surrogate data to be, in an appropriate sense, smaller than the original --- trying to produce something just as large as the original time series turns out to lead to inconsistent estimates.
Finally, chapter 12 gives block bootstraps for spatial data, either on regular grids (in the limit of growing sampling regions of fixed shape) or irregularly spaced. The former is pretty straightforward, aside from annoyances about how to cover the edges of the sampling region. The latter is considerably more involved, but can handle both growing regions and increasingly dense sampling from a fixed region.
I am glad I read this, but I recommend it only for those with a serious interest in the theory of the bootstrap. (Even for them, chapter 6, oy.)
Laurence Gough, Silent Knives = Death on a No. 8 Hook and Hot Shots
Mind candy. Procedural series mystery set in Vancouver. I'd picked up a much later book in the series years ago (Heartbreaker) and loved it, but hadn't been able to lay hands on any more until I ran across this one, which I devoured instantly.
Quatermass and the Pit
Mind candy. I saw the 1950s BBC TV serial, rather than the 1967 movie remake that the latter link indicates; either way, this is very Lovecraftian science fiction: ancient alien strangeness implicated, in the most horrible way, in the oldest history of humanity. (Think "The Rats in the Walls" and At the Mountains of Madness.) There is also I think some influence from Stapledon's great Last and First Men (ROT-13'd: fcrpvsvpnyyl, gur jnl uvf yngre fcrpvrf bs uhzna npuvrir gryrcnguvp cbjref guebhtu uloevqvmvat jvgu Znegvnaf).
The Objective and Red Sands
Mind candy. These are the only American movies (except arguably Iron Man) I've run across about the American experience in Afghanistan since 2001; both are horror movies, which probably means something. Some people would deduce from these that, by venturing into Afghanistan, we fear we are entangling ourselves with something very old and dangerous and better left alone; but I think that kind of criticism is B.S., and anyway that ship sailed long ago. (Pointers to other movies, especially ones not in this genre, would be appreciated.) ROT-13'd spoilers: Gur Bowrpgvir fhssref sebz gur snpg gung vs V gnxr frevbhfyl gur yrnq npgbe'f nccrnenapr va aneengvir-2002, ur jbhyq unir whfg orra fgnegvat gb funir jura gur PVN jnf jbexvat jvgu gur zhwnuvqrra. Zber shaqnzragnyyl, cybg fhssref sebz vgf vafvfgrapr ba fraqvat n zna gb qb n HNI'f wbo, gb fnl abguvat bs gur fho-iba-Qnavxra qrhf rk gevnathyb ng gur raq. Erq Fnaqf jnf na nygbtrgure fhcrevbe ubeebe zbivr, ohg znqr gur zvfgnxr bs fubjvat gur zbafgre ng gur raq; cergraq gung qvqa'g unccra.
Ta-Nehisi Coates, The Beautiful Struggle: A Father, Two Sons, and an Unlikely Road to Manhood
Coates grew up a year younger than me, and forty-odd miles north; this is a wonderfully-written memoir of what it was like to grow up a dreamy, head-in-the-clouds boy into fantasy stories and role-playing games in Maryland in the '80s, and it makes me remember that life very well. (Even some of what he says about his father struck a chord.) But in so, so many ways he and I might as well have grown up in different worlds, and that makes me angry on his behalf.
It's a beautiful book; read it.
Charles Tilly, Explaining Social Processes
A collection of Tilly's papers on the methods and objects of the social sciences and social history. I find them agreeable and sensible: Tilly stresses the importance of causal explanation of social phenomena by concatenating robust mechanisms; of path dependence; of networks, relations, and the aggregation of recurrent interactions into durable social structures; and the uselessness of thinking about "societies", or some kind of invariant pattern of "social change". His substantive research — on migration chains, "contentious" politics, and the development of the modern state — appears repeatedly, but appropriately, as examples. The essays are unedited and so overlap with each other (and even cite each other in the journal versions), but not so badly that I felt put upon.
That said, I was a bit disappointed in this collection, because, I guess, I was expecting a more systematic statement of Tilly's views on how social phenomena are put together and how they should be investigated. (He rightly insists that the two questions are linked.) In particular, methodological individualism seems to me to have a lot more going for it than Tilly allowed; even explanation by invariants is in better shape. As every school-child knows, when you put agents with invariant decision-making mechanisms in an environment which largely consists of other agents and let them go, their macroscopic behavior is generally path-dependent, and many models of small-scale behavior will only work as local approximations. (See, for instance.) Would Tilly say that this was nonetheless missing something? That assuming an individualistic and invariant infrastructure to explain relational and transient phenomena violates Occam's Razor? Distinguish this from what he meant by "methodological individualism" and "invariance"? I wish I knew; sadly, there will be no chance to find out.
— Many, but not all, of the essays reprinted in this book are available online.
Jack Vance, The Dying Earth
Re-read for the first time in more than a decade, it's still just as good as it always was. (Actually, I got this on CD to listen to while exercising, which worked surprisingly well, and then re-read it immediately after finishing the disc.)
Thomas Barfield, Afghanistan: A Cultural and Political History
The best synoptic history I know of (at least in English), running from the 17th century to the present. Barfield is an anthropologist who did ethnographic fieldwork in Afghanistan in the 1970s, so "culture" here really means social organization and widely-shared ideas about political legitimacy, rather than literature, music, etc.; pleasingly, one of his touchstones is ibn Khaldun. A full review will appear Any Day Now. In the meanwhile: strongly, strongly recommended if you have any reason to care about Afghanistan.
Kathleen George, Afterimage
Mind candy. Police procedural with really good characters, plus local color for Pittsburgh. (I think I can identify every restaurant, except I can't think of anywhere in Shadyside where someone could be watched from the second floor and get a gourmet pizza.) Third (?) book in a series; I'll be tracking down the others.
Jack Campbell, Victorious
Mind candy. A fitting and triumphal conclusion to the long anabasis.
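As promised in the note on Lahiri above, here is a toy moving-block bootstrap for the standard error of a time-series mean; my own illustration, not code from the book.

    ## Minimal moving-block bootstrap for the sample mean; illustration only.
    mbb.mean <- function(x, block.len, n.boot = 1000) {
      n <- length(x)
      starts <- 1:(n - block.len + 1)                 # all overlapping blocks
      n.blocks <- ceiling(n / block.len)
      replicate(n.boot, {
        s <- sample(starts, n.blocks, replace = TRUE) # resample whole blocks
        idx <- as.vector(outer(0:(block.len - 1), s, "+"))
        mean(x[idx[1:n]])                             # trim to original length
      })
    }

    set.seed(8)
    x <- arima.sim(model = list(ar = 0.6), n = 500)
    sd(mbb.mean(x, block.len = 25))   # bootstrap standard error of the mean,
                                      # accounting for the serial dependence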

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Writing for Antiquity; Progressive Forces; Enigmas of Chance; Afghanistan and Central Asia; Commit a Social Science; The Continuing Crises; Philosophy; Cthulhiana

Posted by crshalizi at May 31, 2010 23:59 | permanent link

May 15, 2010

"I Don't Feel Like I Gotta Do Nothing"

And why not listen to Eilen Jewell while doing your nothing? (Also: allow me to recommend the Thunderbird Cafe for all your smoky blues-bar needs in Pittsburgh.)

Postcards

Posted by crshalizi at May 15, 2010 23:10 | permanent link

May 14, 2010

Special Demi-Issues on Network Data Analysis in Annals of Applied Statistics

The Annals of Applied Statistics is running a special issue on "modeling and analysis of network data", or rather is spreading it over the current issue and the next. Go look, starting with Steve Fienberg's introduction. You need to subscribe, but then you or your institution should subscribe to AoAS. (Alternately, you could wait about six months for them to show up on arxiv.org.)

Disclaimer: I am an associate editor of AoAS, and helped handle many of the papers for this section.

Networks; Enigmas of Chance; Incestuous Amplification

Posted by crshalizi at May 14, 2010 13:00 | permanent link

May 09, 2010

The Atlantic's Observance of Confederate History Month

Continuing, or in some cases reviving, long-standing but utterly unwelcome customs, several southern states declared April "Confederate History Month". The occasion redeemed itself by provoking a long series of posts from Ta-Nehisi Coates at The Atlantic, each of which "observ[es] some aspect of the Confederacy—but through a lens darkly". These begin with one whose peroration is worthy of Mencken,

This is who they are—the proud and ignorant. If you believe that if we still had segregation we wouldn't "have had all these problems," this is the movement for you. If you believe that your president is a Muslim sleeper agent, this is the movement for you. If you honor a flag raised explicitly to destroy this country then this is the movement for you. If you flirt with secession, even now, then this movement is for you. If you are a "Real American" with no demonstrable interest in "Real America" then, by God, this movement of alchemists and creationists, of anti-science and hair tonic, is for you.
The whole of it is a moving, empathic, and thereby all the more devastating meditation on memory, pride, shame, racism, heroism, moral courage, myths, the great personalities of the Civil War, and the enduring legacy of one of America's two great founding sins; on just how it is that we can be a country where a month set aside to remember a heritage of treason in defense of slavery is intended as a time of celebration and not of soul-searching.

(Owing to the folly of that venerable magazine's web design, there doesn't seem to be a single page collecting them, but I think this is the entire sequence: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21.)

(Incidentally, last week Coates asked his readers to explain financial derivatives to him, and this week he's moved on to nuclear weapons. I speculate that if enough people buy his book, he is certain to not try out the business plan "1. Take a big position in the end-of-the-world trade; 2. Enrich uranium; 3. Profit!")

The Beloved Republic; Writing for Antiquity

Posted by crshalizi at May 09, 2010 09:00 | permanent link

April 30, 2010

Books to Read While the Algae Grow in Your Fur, April 2010

Jean-Guy Prévost, A Total Science: Statistics in Liberal and Fascist Italy
A history of the Italian statistical community (or, as he prefers, "field") from around 1900 through the fall of Fascism, with a brief glance at the immediate post-war era. This is not about the history of statistical technique, but about the development of statistics as an autonomous academic discipline, with pretensions of in fact being the key discipline for all empirical investigation, especially into social and biological matters. So we get a lot about university positions, internal disputes (as Prévost says, one mark of a field is precisely that there are recurring internal arguments, with well-worn positions), how methodology came to be seen as more important than either mathematical theory or applications, the conflict with political economy, etc. Naturally, this extends to looking at how the statistical establishment eagerly sought to serve the Fascist state, proposals for "corporatist" and "totalitarian" statistics, and the elaboration of Fascist ideology by leading statisticians, relying on their self-presentation as polymaths. (Tukey's line about how "The best thing about being a statistician is that you get to play in everyone else's backyard" assumes a new significance when you imagine it being uttered by a blackshirt.) In all this, including the last, Gini is the central figure; quite honestly he should have been purged after the war, but somehow escaped justice.
Some familiarity with the history of both Fascism and of the intellectual content is presupposed. If you have that, and are willing to tolerate a minimal (almost homeopathic) dose of Bourdieu, this provides a lot of interesting, if unhappy, food for thought.
(Prévost makes it clear how Gini's work on measuring inequality was in the tradition launched by Pareto's laws of income and wealth distribution. Cantelli was a friend and collaborator of Gini's, and the Glivenko-Cantelli theorem is the kind of result which would guarantee non-parametric consistency of estimated Gini coefficients from sample data. Was this what motivated Cantelli?)
Robert B. Reich, Supercapitalism: The Transformation of Business, Democracy, and Everyday Life
This is aiming to be something like The Affluent Society or The New Industrial State for modern times; it does a pretty good job. Basically, his argument is that Galbraith was more or less right about how the economy worked during the post-WWII golden age of capitalism: large, autonomous, oligopolistic firms more interested in continued steady growth, exploiting economies of scale, than anything else. JKG's mistake was in thinking this regime would continue. Reich sees Galbraithian capitalism as being upset not so much through deliberate political action in the 1980s as through new technologies in the 1970s, especially improvements in logistics, communications and information technology, which made it possible and efficient to replace the vertically-integrated firm with global supply networks, and to replace investment financed out of retained earnings with global financial markets. (As Reich points out in some detail, all of the key technologies, from container shipping through microelectronics and the Internet, were devised by the military-industrial-university complex to fight the Cold War; sowing the dragon's teeth, as it were.) Deregulation, to Reich's way of thinking, was more a consequence than a cause — the legal superstructure accommodating changes in the forces of production, though he doesn't use such language. The result, he says, is a system more responsive to consumer demand and to investors, but where most of the population sees no gains from economic growth, inequality soars, countervailing power evaporates, security is steadily eroded, and the primary check on the political influence of corporations is the opposing commercial interests of other corporations. (He also has an ingenious argument as to why decreasing regulation led to increasing lobbying.) This he calls "supercapitalism"; I dislike the term and will avoid it.
The way the system is set up, he says, the people running corporations simply have no choice but to do whatever they can to maximize profit in the short term; if they won't, they will shortly be replaced by those who will. Calls for corporate social responsibility, still less trying to shame or pressure individual corporations, therefore miss the point. The goal, rather, has to be to change the laws under which all corporations must act, ultimately to neuter corporations politically, and to create a non-corporate social safety net. (The idea that health insurance, for instance, should be provided by one's employer is just nuts.) Something he does not adequately address, though, is that laws and regulations must be enforced, which is hard to do when one of the two parties regards them as necessarily illegitimate...
So, criticisms: (1) As I hinted, I think Reich underplays the role of ideology and political action, in favor of technological developments and market forces. It would be interesting to try to synthesize this with Krugman's take in Conscience of a Liberal. (2) There are some bits where the economics is a bit odd. For instance, economies of scale are certainly important in information production, just as in making steel. (Cf.) Arguably though the sheer magnitude of the fixed costs, and the time-scale, has shrunk, and that would be enough for Reich's argument. Also, profits decline as industries become more competitive, falling to the cost of capital plus the cost of the entrepreneur's time.
(Picked up after someone, I forget who, pointed me at Lessig's review.)
Nunzio DeFilippis, Christina Weir and Christopher J. Mitten, Past Lies
Brandon Graham, King City
Michael Alan Nelson, Emma Rios and Cris Peter, Hexed
Mark Waid and Minck Oosterveer, The Unknown
Brian Michael Bendis and Michael Avon Oeming, Powers: 1 (Who Killed Retro Girl?), 2 (Roleplay)
Robert M. Solow, Monopolistic Competition and Macroeconomic Theory
"Monopolistic competition" is the slightly oxymoronic name for the situation where there are a number of goods which are all more or less close substitutes for each other, but each good has a monopoly producer. It can arise in a number of ways, from legal restrictions (e.g., copyright on particular pieces of software) or from increasing returns to scale. (Successful branding convinces consumers that basically identical commodities are really different, and so creates monopolistic competition.) In monopolistic competition, firms have some control over their prices, but to maximize profits they need to forecast quantitative demand. The theory is quite well-established microeconomics, having begun its real development in the 1930s with Chamberlin and Robinson, and is a standard part of industrial organization (Cabral's textbook has an especially nice treatment).
This extremely short (88 pages including the index) book consists of Solow pointing out that once you admit monopolistic competition is not just possible but actually common, a lot of the conclusions of macroeconomic models which rest on the idea of perfect competition in all markets evaporate, and one is led to Keynesian conclusions, even if one assumes that everyone in the economy is a perfectly foresightful utility maximizer. In particular, the way is opened for the existence of multiple equilibria: low-output equilibria in which everyone correctly forecasts that there will not be a lot of demand, so they produce little, pay little, and buy little, and high-output equilibria in which everyone correctly forecasts high demand, produces a lot, pays a lot and buys a lot. Everyone prefers the high-level equilibrium to the low, but that doesn't mean they'll manage to coordinate on it. Solow takes this insight, and related ones, and does what he does best, namely build and solve elegant little models of the resulting macroeconomy. He is quite open about these being toy models, and that in some places he has to stipulate some macro-level relations which he doesn't directly derive from the micro assumptions. (But, though he doesn't mention this, the same is true of the usual representative-agent macro models which purport to be aggregations of perfect competition.) The results are not strictly in line with every detail of the General Theory, but are clearly closely related, and make a lot of sense.
This book is very enjoyable, if you have any taste for elegant economic modeling, though alas the price of the actual physical artifact (twenty-six cents per page in paperback) is insane. But I've ranted about this before.
Richard A. Berk, Statistical Learning from a Regression Perspective
A gentle introduction to modern nonparametric regression and classification, for people who are comfortable with running linear and logistic regressions, and curious about data mining and/or machine learning. After a brief review of regression (following the lines laid down in his earlier book), Berk covers smoothing (especially with splines), additive models, classification and regression trees, bagging, random forests, boosting, and support vector machines. There are many real-data examples and exercises, all done in R, and all of them I think from the social sciences, with a certain emphasis on his own field of criminology.
Berk relies very heavily on The Elements of Statistical Learning as an authority, and one might think of this as a simplified presentation of the key parts of that book, for social scientists, or advanced undergraduates in statistics — I used it as a supplementary text in my data mining class last fall, and would happily do so again.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; The Dismal Science; Enigmas of Chance; Writing for Antiquity; The Continuing Crises; The Running-Dogs of Reaction

Posted by crshalizi at April 30, 2010 23:59 | permanent link

April 29, 2010

The Republic Hath Need of Thee

Carlos Yu on Facebook yesterday: "What this country really needs is William Tecumseh Sherman." He went on:

... leaving a ten-mile wide trail of burned-out mobile homes and meth labs behind him, Sherman paused in his March to the Tea to regroup his forces. Water was always an issue for Sherman's armies, campaigning as they did in the dusty steppes surrounding Bakersfield, in the deserts of Arizona, and throughout the drought-stricken former Confederacy. Nowhere was their lack of water worse than among the abandoned exurban developments of central Florida, where the water table had been permanently damaged...
This, I think, sums up everything admirably.

The Beloved Republic; The Continuing Crises; Modest Proposals

Posted by crshalizi at April 29, 2010 14:57 | permanent link

April 28, 2010

Return of "Homophily, Contagion, Confounding: Pick Any Three", or, The Adventures of Irene and Joey Along the Back-Door Paths

Attention conservation notice: 2700 words on a new paper on causal inference in social networks, and why it is hard. Instills an attitude of nihilistic skepticism and despair over a technical enterprise you never knew existed, much less cared about, which a few feeble attempts at jokes and a half-hearted constructive suggestion at the end fail to relieve. If any of this matters to you, you can always check back later and see if it survived peer review.

Well, we decided on a more sedate title for the actual paper, as opposed to the talk:

CRS and Andrew C. Thomas, "Homophily and Contagion Are Generically Confounded in Observational Social Network Studies", arxiv:1004.4704, submitted to Sociological Methods and Research
Abstract: We consider processes on social networks that can potentially involve three phenomena: homophily, or the formation of social ties due to matching individual traits; social contagion, also known as social influence; and the causal effect of an individual's covariates on their behavior or other measurable responses. We show that, generically, all of these are confounded with each other. Distinguishing them from one another requires strong assumptions on the parametrization of the social process or on the adequacy of the covariates used (or both). In particular we demonstrate, with simple examples, that asymmetries in regression coefficients cannot identify causal effects, and that very simple models of imitation (a form of social contagion) can produce substantial correlations between an individual's enduring traits and their choices, even when there is no intrinsic affinity between them. We also suggest some possible constructive responses to these results.
R code for our simulations

The basic problem here is as follows. (I am afraid this will spoil some of the jokes in the paper.) Consider the venerable parental question: "If your friend Joey jumped off a bridge, would you jump too?" The fact of the matter is that the answer is "yes"; but why does Joey's jumping off a bridge mean that Joey's friend Irene is more likely to jump off one too?

  1. Influence or social contagion: Because they are friends, Joey's example inspires Irene to jump. Or, more subtly: seeing Joey jump re-calibrates Irene's tolerance for risky behavior, which makes jumping seem like a better idea.
  2. Biological contagion: Joey is infected with a parasite which suppresses the fear of heights and/or falling, and, because they are friends, Joey passes it on to Irene.
  3. Manifest homophily: Joey and Irene are friends because they both like to jump off bridges (hopefully with bungee cords attached).
  4. Latent homophily: Joey and Irene are friends because they are both hopeless adrenaline junkies, and met through a roller-coaster club; their common addiction leads both of them to take up bridge-jumping.
  5. External causation: Sometimes, jumping off a bridge is the only sane thing to do.

For Irene's parents, there is a big difference between (1) and (2) and the other explanations. The former suggest that it would be a good idea to keep Irene away from Joey, or at least to keep Joey from jumping off the bridge; with the others, however, that's irrelevant. In the case of (3) and (4), in fact, knowing that Irene is friends with Joey is just a clue as to what Irene is really like; the damage was already done, and they can hang out together as much as they want. The difference between these accounts is one of causal mechanisms. (Of course there can be mixed cases.)

What the statistician or social scientist sees is that bridge-jumping is correlated across the social network. In this it resembles many, many, many behaviors and conditions, such as prescribing new antibiotics (one of the classic examples), adopting other new products, adopting political ideologies, attaching tags to pictures on flickr, attaching mis-spelled jokes to pictures of cats, smoking, drinking, using other drugs, suicide, literary tastes, coming down with infectious diseases, becoming obese, and having bad acne or being tall for your age. For almost all of these conditions or behaviors, our data is purely observational, meaning we cannot, for one reason or another, just push Joey off the bridge and see how Irene reacts. Can we nonetheless tell whether bridge-jumping spreads by (some form of) contagion, or rather is due to homophily, or, if it is both, say how much each mechanism contributes?

A lot of people have thought so, and have tried to come at it in the usual way, by doing regression. Most readers can probably guess what I think about that, so I will just say: don't you wish. More sophisticated ideas, like propensity score matching, have also been tried, but people have pretty much assumed that it was possible to do this sort of decomposition. What Andrew and I showed is that in fact it isn't, unless you are willing to make very strong, and generally untestable, assumptions.

This becomes clear as soon as you draw the relevant graphical model, which goes like so:

Here i stands for Irene and j for Joey. Y(i,t) is 1 if Irene jumps off the bridge on day t and 0 otherwise; likewise Y(j,t-1) is whether Joey jumped off the bridge yesterday. We want to know whether the latter variable influences the former. A(i,j) is how we represent the social network --- it's 1 if Irene regards Joey as a friend, 0 otherwise. Lurking in the background are the various traits which might affect whether or not Irene and Joey are friends, and whether or not they like to jump off bridges, collectively X. Suppose that, all else equal, being more similar makes it more likely that people become friends.

Now it's easy to see where the trouble lies. If we learn that Joey jumped off a bridge yesterday, that tells us something about what kind of person Joey is, X(j). If Joey and Irene are friends, that tells us something about what kind of person Irene is, X(i), and so about whether Irene will jump off a bridge today. And this is so whether or not there is any direct influence of Joey's behavior on Irene's, whether or not there is contagion. The chain of inferences — from Joey's behavior to Joey's latent traits, and then over the social link to Irene's traits and thus to Irene's behavior — constitutes what Judea Pearl strikingly called a "back-door path" connecting the variables at either end. When such paths exist, as here, Y(i,t) will be at least somewhat predictable from Y(j,t-1), and sufficiently clever regressions will detect this, but they cannot distinguish how much of the predictability is due to the back door path and how much to direct influence. If this sounds hand-wavy to you, and you suspect that with some fancy adjustments you can duck and weave through it, read the paper.
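
To make the back-door path concrete, here is a minimal simulation sketch (my own toy version, not the R code linked above): a latent trait drives both tie formation and behavior, there is no influence at all, and yet a friend's past behavior "significantly" predicts one's own.

```r
# Minimal sketch (not the paper's code): latent homophily, no contagion,
# still yields a contagion-like association between friends' behaviors.
set.seed(1)
n <- 200                                   # number of individuals
x <- rnorm(n)                              # latent trait X, never observed
# Homophilous ties: closer traits give a higher chance of friendship
p.tie <- function(xi, xj) plogis(2 - 3 * abs(xi - xj))
A <- matrix(0, n, n)
for (i in 1:(n - 1)) for (j in (i + 1):n) {
  A[i, j] <- A[j, i] <- rbinom(1, 1, p.tie(x[i], x[j]))
}
# Behavior depends only on one's own trait, never on friends' behavior
behave <- function() rbinom(n, 1, plogis(2 * x))
y.yesterday <- behave()
y.today     <- behave()
# Average of friends' behavior yesterday, as a would-be "exposure" variable
exposure <- as.vector(A %*% y.yesterday) / pmax(rowSums(A), 1)
summary(glm(y.today ~ exposure, family = binomial))
# The exposure coefficient typically comes out large and "significant",
# even though by construction there is no influence whatsoever.
```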

To switch examples to something a little more serious than jumping off bridges, let's take it as a given that (as Christakis and Fowler famously reported), if Joey became obese last year, the odds of Irene becoming obese this year go up substantially. They interpreted this as a form of social contagion, and one can imagine various influences through which it might work (changing Irene's perception of what normal weight is, changing Irene's perception of what normal food consumption is, changes in happiness leading to changes in comfort food and/or comfort alcohol consumption, etc.). Now suppose that there is some factor X which affects both whether Joey and Irene become friends, and whether and when they become obese. For example:

  • becoming friends because they both do extreme sports (like jumping off bridges...) vs. becoming friends because they both really like watching the game on weekends and going through a few six-packs vs. becoming friends because they are both confrontational-spoken-word performance artists;
  • friendships tend to be within ethnic groups, which differ in their culturally-transmitted foodways, attitudes towards voluntary exercise, and occupational opportunities;
  • (for those more fond of genetic explanations than I am): friendships tend to be within ethnic groups, so friends tend to be more genetically similar than random pairs of individuals, and genetic variants that predispose to obesity (in the environment of Framingham, Mass.) are more common in some groups than in others.
So long as we cannot measure X, the back-door path linking Joey and Irene remains open, and our inferences about contagion are confounded. It would be enough to measure the aspect of X which influences link formation, or the aspect which influences obesity; but without that, there will always be many ways of combining homophily and contagion to produce any given pattern of association between Joey's obesity status last year and Irene's this year. And it's not a matter of not being able to decide among some causal alternatives due to limited data; the different causal alternatives all produce the same observable outcomes. (More on this notion of "identification".)

Christakis and Fowler made an interesting suggestion in their obesity paper, however, which was actually one of the most challenging things for us to deal with. They noticed that friendships are sometimes not reciprocated, that Irene thinks of Joey as a friend, but Joey doesn't think of Irene that way — or, more cautiously, Irene reports Joey as a friend, but Joey doesn't name Irene. For these asymmetric pairs in their data, Christakis and Fowler note, it's easier to predict the behavior of the person who named the friend from the behavior of the nominee than vice versa. This is certainly compatible with contagion, in the form of being influenced by those you regard as your friends, but is there any other way to explain it?

As it happens, yes. One need only suppose that being a certain kind of person — having certain values of the latent trait X — makes you more likely to be (or be named as) a friend. Suppose that there is just a one-dimensional trait, like your location on the left-right political axis, or perhaps some scale of tastes. (Perhaps Irene and Joey are neo-conservative intellectuals, and the trait in question is just how violent they like their Norwegian black metal music.) Having similar values of the trait makes you more likely to be friends (that's homophily), but there is always an extra tendency to be friends with those who are closer to the median of the distribution, or at least to say that those are your friends. (Wherever neo-conservatives really are on the black metal spectrum, they tend to say, on Straussian grounds, that their friends are those who prefer only the median amount of church-burning with their music.) If Irene thinks of Joey as a friend, but Joey does not, this is a sign that Irene has a more extreme value of the trait than Joey does, which changes how well each one's behavior predicts the other's. Putting together a very basic model of this sort shows that it robustly generates the kind of asymmetry Christakis and Fowler found, even when there is really no contagion.

To be short about it, unless you actually know, and appropriately control for, the things which really lead people to form connections, you have no way of distinguishing between contagion and homophily.

All of this can be turned around, however. Suppose that you want to know whether, or how strongly, some trait of people influences their choices. Following a long tradition with many illustrious exponents, for instance, people are very convinced that social class influences political choices, and there is indeed a predictive relationship here, though many people are totally wrong about what that relationship is. The natural supposition is that this predictive relationship reflects causation. But suppose that there is contagion, that you can catch ideology or even just choices from your friends. Social class is definitely a homophilous trait; this means that an opinion or attitude or choice can become entrenched among one social class, and not another, simply through diffusion, even if there is no intrinsic connection between them. And there's nothing special about class here; it could be any trait or combination of traits which leads to homophily.

Here, for example, is a simple simulation done using Andrew's ElectroGraph package.

To explain: Each individual has a social type or trait, which takes one of two values and stays fixed — think of this as social class, if you like. People are more likely to form links with those of the same type, so when we plot the graph in a way which brings linked nodes closer to each other, we get a nice separation into two sub-communities, with all the upper-class individuals in the one on top and all the lower-class individuals in the one below. Also, each individual makes a "choice" which can change over time, which again is binary, here "red" or "blue". Initially, choices are completely independent of traits, so there's just as much red among the high-class individuals as among the low.

Now let the choices evolve according to the simplest possible rule: at each point in time, a random individual picks one of their neighbors, again at random, and copies their opinion. After a few hundred such updates, the lower class has turned red, and the upper class has turned blue:

And this isn't just a fluke; the pattern of color separation repeats quite reliably, though which color goes with which class is random. If you wanted to be more quantitative about it, you could, say, run a logistic regression, and discover that in the homophilous network, statistically-significant prediction of choice from trait is possible, but not in an otherwise-matched network without homophily; you can see those results in the paper. A bit more abstractly, when I learned cellular automata from David Griffeath, one of the topics was something called the "voter model", which is just the rule I gave above for copying choices. On a regular two-dimensional grid, the voter model self-organizes from random noise into blobs of homogeneous color with smooth boundaries; this is just the corresponding behavior on a graph. As I have said several times before, I think this phenomenon — correlating traits and choices by homophily plus contagion — seriously complicates a lot of what people want to do in the social sciences and even the humanities, but since I have gone on about that already, I won't re-rant today.
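
For concreteness, here is a stripped-down version of that simulation in base R (my own sketch, not Andrew's ElectroGraph code):

```r
# Toy version of the voter-model-with-homophily simulation: two fixed
# classes, within-class ties more likely, initial choices independent of
# class, copy-a-random-neighbor updates.
set.seed(2)
n <- 100
class <- rep(0:1, each = n / 2)                             # fixed binary trait
tie.prob <- ifelse(outer(class, class, "=="), 0.10, 0.01)   # homophily in ties
A <- matrix(rbinom(n^2, 1, tie.prob), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]                       # symmetrize
diag(A) <- 0
choice <- rbinom(n, 1, 0.5)                    # initially independent of class
for (step in 1:3000) {                         # voter-model updates
  i <- sample.int(n, 1)
  friends <- which(A[i, ] == 1)
  if (length(friends) > 0)
    choice[i] <- choice[friends[sample.int(length(friends), 1)]]
}
table(class, choice)                           # choices now sorted by class
# Logistic regression finds a "significant" effect of the causally
# irrelevant trait (it may warn if the within-class consensus is complete):
summary(glm(choice ~ class, family = binomial))
```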

In their own way, each of the two models in our paper is sheer elegance in its simplicity, and I have been known to question the relevance of such models for actual social science. I don't think I'm guilty of violating my own strictures, however, because I'm not saying that the process of, say, spreading political opinions really follows a voter model. (The reality is much more complicated.) The models make vivid what was already proved, and show that the conditions needed to produce the phenomena are not actually very extreme.

My motto as a writer might as well be "the urge to destroy is also a creative urge", but in this paper we do hold out some hope, which is that even if the causal effects of contagion and/or homophily cannot be identified, they might be bounded, following the approach pioneered by Manski for other unidentifiable quantities. Even if observable associations would never let us say exactly how strong contagion is, for instance, they might let us say that it has to lie inside some range, and if that range excludes zero, we know that contagion must be at work. (Or, if the association is stronger than contagion can produce, something else must be at work.) I suspect (with no proof) that one way to get useful bounds would be to use the pattern of ties in the network to divide it into sub-networks or, as we say in the trade, communities, and use the estimated communities as proxies for the homophilous trait. That is, if people tend to become friends because they are similar to each other, then the social network will tend to become a set of clumps of similar people, as in the figures above. So rather than just looking at the tie between Joey and Irene, we look at who else they are friends with, and who their friends are friends with, and so on, until we figure out how the network is divided into communities and that (say) Irene and Joey are in the same community, and therefore likely have similar values of X, whatever it is. Adjusting for community might then approach actually adjusting for X, though it couldn't be quite the same. Right now, though, this idea is just a conjecture we're pursuing.

Manual trackback: The Monkey Cage; Citation Needed; Healthy Algorithms; Siris; Gravity's Rainbow; Orgtheory; PeteSearch

Networks; Enigmas of Chance; Complexity; Commit a Social Science; Self-Centered

Posted by crshalizi at April 28, 2010 18:00 | permanent link

April 21, 2010

Outsourced Heavy Flagella Blogging

I was going to blog about this paper

Adrián López García de Lomana, Qasim K. Beg, G. de Fabritiis and Jordi Villà-Freixa, "Statistical Analysis of Global Connectivity and Activity Distributions in Cellular Networks", Journal of Computational Biology forthcoming (2010), arxiv:1004.3138
Abstract: Various molecular interaction networks have been claimed to follow power-law decay for their global connectivity distribution. It has been proposed that there may be underlying generative models that explain this heavy-tailed behavior by self-reinforcement processes such as classical or hierarchical scale-free network models. Here we analyze a comprehensive data set of protein-protein and transcriptional regulatory interaction networks in yeast, an E. coli metabolic network, and gene activity profiles for different metabolic states in both organisms. We show that in all cases the networks have a heavy-tailed distribution, but most of them present significant differences from a power-law model according to a stringent statistical test. Those few data sets that have a statistically significant fit with a power-law model follow other distributions equally well. Thus, while our analysis supports that both global connectivity interaction networks and activity distributions are heavy-tailed, they are not generally described by any specific distribution model, leaving space for further inferences on generative models.
since they are very definitely not making the baby Gauss cry, but Aaron beat me to it, so you should just go read him.

(Study of the scholarly misconstruction of reality suggests that this will lead to at most a marginal reduction in the number of claims that biochemical networks follow power laws.)

Power Laws; Biology; Networks

Posted by crshalizi at April 21, 2010 18:30 | permanent link

On Eyjafjallajökull

It evidently takes a week to find a priest and a nubile virgin in Europe.

Update: "On the other hand", as J.B. told me as soon as I posted this, "find one and you're not far from the other."

Posted by crshalizi at April 21, 2010 08:24 | permanent link

April 20, 2010

"Inference for Unlabelled Graphs" (This Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) care about the community discovery problem for networks and (2) will be in Pittsburgh on Friday.

I've talked about the community discovery problem here before, and even contributed to it; if you want a state-of-the-field you should read Aaron. This week, the CMU statistics seminar delivers a very distinguished statistician's take:

Peter Bickel, "Inference for Unlabelled Graphs"
Abstract: A great deal of attention has recently been paid to determining subcommunities on the basis of relations, corresponding to edges, between individuals, corresponding to vertices of an unlabelled graph (Newman, SIAM Review 2003; Airoldi et al, JMLR 2008; Leskovec, Kleinberg et al, SIGKDD 2005). We have developed a nonparametric framework for probabilistic ergodic models of infinite unlabelled graphs (PNAS 2009) and made some connections with modularities arising in the physics literature and community models in the social sciences. A fundamental difficulty in implementing these procedures is computational complexity. We show how a method of moments approach can partially bypass these difficulties.
This is joint work with Aiyou Chen and Liza Levina.
Place and time: Giant Eagle Auditorium, Baker Hall A51, 4:30--5:30 PM on Friday, April 23, 2010

As always, seminars are free and open to the public.

(This might motivate me to finally finish my post on Bickel and Chen's paper...)

Networks; Enigmas of Chance

Posted by crshalizi at April 20, 2010 16:57 | permanent link

April 19, 2010

The Bootstrap

My "Computing Science" column for American Scientist, "The Bootstrap", is now available for your reading pleasure. Hopefully, this will assuage your curiosity about how to use the same data set not just to fit a statistical model but also to say how much uncertainty there is in the fit. (Hence my recent musings about the cost of bootstrapping.) And then the rest of the May-June issue looks pretty good, too.

I have been reading American Scientist since I started graduate school, lo these many years ago, and throughout that time one of the highlights for me has been the "Computing Science" column by Brian Hayes; it was quite thrilling to be asked to be one of the substitutes while he's on sabbatical, and I hope I've come close to his standard.

After-notes to the column itself:

  • Efron's original paper is now open access.
  • Of course, the time series is serially dependent, so I should really use a bootstrap which handles that, as in Künsch. Using either a moving block bootstrap or stationary bootstrap actually gave almost the same confidence bands as the one in the article, obtained by resampling consecutive pairs (perhaps because the optimal block length, selected per Lahiri, was just 4). The original version of the column went into that, but it had to be cut to fit the space.
  • A bigger issue is that the data set is really not stationary. Like everyone else, I pretend.
  • Originally, I wanted to use turbulent flow time series for the example, since it turns out that they are actually pretty well predicted by linear models, once you allow for very non-Gaussian driving noises. But I couldn't find any suitable data sets which wouldn't involve some tricky work to get permissions; perhaps I just didn't look in the right places. So I fell back on something which is publicly available, even though it's from a domain which if anything has gotten too much attention from statisticians and probabilists, and required some disclaimers.

Enigmas of Chance; Self-Centered

Posted by crshalizi at April 19, 2010 08:45 | permanent link

April 15, 2010

Dept. of "I Told You So"

Me, going on three years ago: "It is a further sign of our intellectual depravity that people take Bryan Caplan seriously, even when he is obviously a cheap imitation of The Onion."

Today: Holbo (and again), Warring, Henley, DeLong.

The Running Dogs of Reaction

Posted by crshalizi at April 15, 2010 23:13 | permanent link

Got Plenty of Time (The Porosity of the Avant Garde)

Empirically, the time needed for something to seep from self-consciously advanced subcultures to complete innocuousness really is about one generation. (Second link via Tapped.)

Linkage

Posted by crshalizi at April 15, 2010 22:30 | permanent link

April 08, 2010

"A Two-scale Framework for Variable Selection with Ultrahigh-dimensionality" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) have a vast number of variables you could use in your statistical models and want to reliably learn which ones matter, and (2) are in Pittsburgh on Monday.

As always, the seminar is free and open to the public:

Jianqing Fan, "A Two-scale Framework for Variable Selection with Ultrahigh-dimensionality"
Abstract: Ultrahigh-dimensionality characterizes many contemporary statistical problems from genomics and genetics to finance and economics. We outline a unified framework to ultrahigh dimensional variable selection problems: Iterative applications of vast-scale screening followed by moderate-scale variable selection. The framework is widely applicable to many statistical contexts: from multiple regression, generalized linear models, survival analysis to machine learning and compress sensing.
The fundamental building blocks are marginal variable screening and penalized likelihood methods. How high dimensionality can such methods handle? How large can false positive and negative be with marginal screening methods? What is the role of penalty functions? This talk will provide some fundamental insights into these problems. The focus will be on the sure screening property, false selection size, the model selection consistency and oracle properties. The advantages of using folded-concave over convex penalty will be clearly demonstrated. The methods will be convincingly illustrated by carefully designed simulation studies and the empirical studies on disease classifications using microarray data and forecast home price indexes at zip level.
Place and time: 4--5 pm on Monday, 12 April, in Porter Hall 125C, CMU

Let me add that Fan and Yao's book on time series is one of the best available.

Enigmas of Chance

Posted by crshalizi at April 08, 2010 13:45 | permanent link

March 31, 2010

Books to Read While the Algae Grow in Your Fur, March 2010

Dylan Meconis, Bite Me! A Vampire Farce
Funny comic book satirizing, simultaneously, vampires a la Anne Rice and the French Revolution. Meconis apparently wrote and drew most of it, online, while in high school; it's people like her what cause unrest.
I came to this by way of Meconis's current web-serial, Family Man, which has superior drawing and a more serious plot, but a similar sensibility. (If the idea of a comic about Spinozism and lycanthropy in eighteenth-century central Europe sounds the least bit interesting, you really need to read Family Man.)
E. M. Butler, The Tyranny of Greece over Germany
More exactly: how Winckelmann invented an ideal of ancient Greek life and art, and how that ideal influenced Lessing, Herder, Goethe, Schiller, Hölderlin and Heine, followed by a sort of appendix on Nietzsche, Stefan George and (of all people) Heinrich Schliemann. This is a very curious book of a sort that I think humanists have mostly abandoned. Butler is not just relentlessly biographical (readers are expected to have Goethe's sexual history memorized), but very free with her speculations about the inner-most drives and natures of her heroes, and even about what they should have done to be "saved", or reconciled with their hypostatized "genius". Worse, she presents these guesses as just as certain as the prosaic facts of their biographies, sometimes to unintentionally comic effect: Nietzsche's mind was not, after all, "rent asunder by ecstatic worship of the god Dionysus", but by syphilis; he needed penicillin, not a convincing modern mythology. (Likewise Hölderlin's "reason was destroyed" by schizophrenia, as Butler herself admits, and calling this "homesickness for the land of the gods" is unilluminating.) No comparison is attempted to imitations or admiration of the ancient Greeks in other times and places, or to contemporary German attitudes to other ancient and foreign cultures (except for some stray remarks about Herder), so it's hard to pick out what was particular to this tradition, as opposed to more general antiquarianism/primitivism and exoticism. Still, it is an interesting tradition...
Clark Glymour, Theory and Evidence
I'm not sure how much of this even Clark would still argue for (it was published in 1980!), so I won't belabor it, but I also think the most fundamental point is sound. Namely: it's possible to use parts of a theory, plus empirical evidence, to test other parts of the theory, or even (using different pieces of evidence) the same parts of the theory. (For instance, many theories include hypotheses which say that certain quantities must be constants, and provide multiple routes to estimating those constants; the estimates need to agree.) This means that theories which make the same predictions are not necessarily equally tested by those predictions, and that the Quine-Duhem problem of not being able to assign credit or blame to parts of theories is soluble. I think the account of what makes something a severe test in Error is superior, at least for statistical theories, but clearly this was pointing in the same direction.
(Insert the usual disclaimers here.)
Lucy A. Snyder, Spellbent
Mind-candy contemporary fantasy, set in Columbus, Ohio and adjacent hells. As good as one might expect from the author of the brilliant "Installing Linux on a Dead Badger", but much grimmer.
Carrie Vaughn, Kitty's House of Horrors
Mind-candy. The continuing adventures of a werewolf named "Kitty". What could go wrong with volunteering for a reality show to be filmed in middle-of-nowhere Montana? — One of the nice features of Vaughn's stories is that the supernatural is announcing its presence in a world otherwise much like ours, and people are reacting in ways that seem plausible, ranging from scientific research through media sensationalism... (Previously: 1, 2, 3, 4, 5, 6; but they're not necessary to read this.)
A. C. Davison and D. V. Hinkley, Bootstrap Methods and Their Applications
One of the most useful textbooks on the bootstrap that I've read. They are good at combining just enough theory to make it clear why some things work and others don't with lots of carefully-chosen examples and advice on practicalities. Background familiarity with statistical inference at the level of, e.g., All of Statistics is required, but no more. The code, in S, forms the basis of the R package boot; most of the examples I re-tried ran without any modification. Recommended without reservation for self-study (do the exercises!); it would also make for an excellent text for a computationally-oriented course for beginning graduate students, or even (selecting chapters) advanced undergraduates.
Davison's page on the book has errata and reviews.
Leann Sweeney, Pick Your Poison, A Wedding to Die For, Dead Giveaway, Shoot from the Lip
Mind-candy. Amiable series mystery centering around adoption.
Philip Palmer, Redclaw
Mind-candy. I rather liked the first two hundred pages or so, but the last half dragged on too long for my taste. (It would've been better at, say, 50 pages.) Recommended for those who enjoy scientifictional Lord of the Flies scenarios more than I do.
Stephen S. Cohen and J. Bradford DeLong, The End of Influence: What Happens When Other Countries Have the Money
I admit I bought this out of a certain sense of obligation: DeLong's website, in its various incarnations, has been entertaining and informing me since the mid-1990s, and it seemed only fair to reciprocate somehow. But it's actually a good (if very short and somewhat repetitive) book, which is really about guessing what might be coming next in international political economy, now that the "neo-liberal dream" is, or ought to be, thoroughly discredited by events.
My reaction to the book is largely positive, but since I find it hard to convey that except by writing a summary, I will follow academic/Internet tradition and dwell on annoyances. First, they're not, obviously, arguing that the US will become an uninfluential country; even if we gave up spending more than most of the rest of the world put together on our military, etc., we'd still have 5% of the world's population, in an extremely advanced, diversified and prosperous economy, and a state which, whatever its frustrations, is highly effective. Cohen and DeLong know this; a better title might've been something like The End of Supremacy. For that matter they never clearly say what they mean by "other countries having the money", or what it meant for the US to "have the money"; something like "be a major net lender to other countries" seems to be what they have in mind, but it's unclear. And the suggestion that becoming a net debtor nation will undermine US cultural and intellectual influence is seriously, seriously under-argued.
Diana Rowland, Blood of the Demon
Mind-candy. Continuing contemporary fantasy/police procedural series. A bit more angsty this time; still fun. Cries out for sequels.
Call of Cthulhu
Mind-candy. Silent movie of the short story made a few years ago by the H. P. Lovecraft Historical Society. Nice Expressionist-influenced sets for R'lyeh, and the worst of the creepy racist bits thoughtfully elided. Worth 45 minutes of your Netflix-streaming time if you're into Cthulhiana.
Brotherhood of the Wolf
Mind-candy. Am I wrong to suspect that only in France could you make a big silly monster action movie centered on the struggle between les philosophes and the reactionary elements of the Church? Pairs well with a suspension of the critical faculties and a few glasses of Côtes du Rhône.
Jen Van Meter et al., Hopeless Savages vols. 2 and 3
More adorable first-family-of-punk mind candy. Sadly, this seems to be the end of the series.
James H. Schmitz, The Demon Breed (a.k.a. The Tuvela)
Mind-candy. Intensely enjoyable lone-human-and-her-otters-versus-alien-invaders-in-a-floating-jungle novel from 1968. (Update: the original cover image, which I just ran across, via.) Re-read in connection with donating, back in January, several hundred books my parents had been storing for me for over a dozen years. This was as fun as I remembered it, though very short by modern standards. (Though I must say it boggles the mind that when one of an advanced, technological civilization's domestic animals acquires both language and tool-use by apparent macromutation, the response is "huh, aren't they cute?", as opposed to a massive research effort. The old SF writers were often really lazy at thinking through their conceits...) (The completely superfluous mentions of psychic powers at the beginning and end are in a different category, namely placating Schmitz's editor at Analog, the crankish and credulous but talented John W. Campbell.)
Relatedly, I finally got around to reading an earlier book by Schmitz I'd owned since c. 1995, Legacy, which didn't work nearly as well, because the early-1960s-vintage gender politics were inseparable from the story, while entirely absent from Demon Breed. It seems doubtful that Schmitz had his consciousness raised between 1962 and 1968, so I guess he simply improved his craft...

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Writing for Antiquity; The Commonwealth of Letters; Philosophy; The Dismal Science; The Continuing Crises; Cthulhiana

Posted by crshalizi at March 31, 2010 23:59 | permanent link

March 30, 2010

One Must Imagine Liberman Happy

Back in the day, when the blogs were young, one of the gods decided to travel the world incognito as an incoherent mumbler. A certain phonologist regarded this as an imposition, and devised a scheme whereby mortals would never have to worship incomprehensibilities. This angered the gods, who cursed the professor to spend eternity rolling a stone uphill only to keep having it fall back down, i.e., patiently debunking reactionary appropriations of neuroscience as carefully as though they were actual attempts to advance human knowledge, and not meretricious myth-making. (An incomplete sampling of episodes, in no particular order except for the first being the most recent: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40). But one must imagine Liberman happy; the alternative is too terrible to contemplate.

Minds, Brains and Neurons; The Natural Science of the Human Species; Learned Folly

Posted by crshalizi at March 30, 2010 11:45 | permanent link

March 21, 2010

The Visual Display of Morally Obligatory Consequences

Via paperpools.

Enigmas of Chance; Learned Folly

Posted by crshalizi at March 21, 2010 21:20 | permanent link

Recognition from Alma Mater

Yes, I've seen this. Yes, those are (so far as I can recall) accurate quotes. No, I really don't track page-views, so I honestly don't know what the most-viewed things I've written are. Yes, it's an entirely undeserved honor to be named in such company. Yes, I do wish my writing was more positive and constructive, and less negative and critical. Yes, I realize it's easily within my power to change that. No, I do not seem to be doing too well on that front.

Self-Centered; Linkage

Posted by crshalizi at March 21, 2010 17:00 | permanent link

Learning Your Way to Maximum Power

Attention conservation notice: 2300 words about a paper other people wrote on learning theory and hypothesis testing. Mostly written last year as part of a never-used handout for 350, and rescued from the drafts folder as an exercise in structured procrastination so as to avoid a complete hiatus while I work on my own manuscripts.

P.S. Nauroz mubarak.

In a previous installment, we recalled the Neyman-Pearson lemma of statistical hypothesis testing: If we are trying to discriminate between signal and noise, and know the distribution of our data (x) both for when a signal is present (q) and when there is just noise (p), then the optimal test says "signal" when the likelihood ratio q(x)/p(x) exceeds a certain threshold, and "noise" otherwise. This is optimal in that, for any given probability of thinking noise is signal ("size"), it maximizes the power, the probability of detecting a signal when there is one.
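
As a reminder of how the lemma works when we do know the distributions, here is a toy calculation (mine, purely illustrative): noise is standard Gaussian, signal is Gaussian with mean 1, and the likelihood ratio is monotone in x, so the most powerful size-alpha test just thresholds x at the upper-alpha quantile of the noise distribution.

```r
# Toy Neyman-Pearson example (my own, not from the earlier post): noise ~
# N(0,1), signal ~ N(1,1).  The likelihood ratio q(x)/p(x) = exp(x - 1/2) is
# increasing in x, so the most powerful size-alpha test says "signal" when x
# exceeds the upper-alpha quantile of the noise distribution.
alpha <- 0.05
threshold <- qnorm(1 - alpha, mean = 0, sd = 1)    # cutoff for saying "signal"
size  <- 1 - pnorm(threshold, mean = 0, sd = 1)    # = alpha by construction
power <- 1 - pnorm(threshold, mean = 1, sd = 1)    # probability of detecting signal
c(threshold = threshold, size = size, power = power)
```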

The problem with just applying the Neyman-Pearson lemma directly to problems of interest is the bit about knowing the exact distributions of signal and noise. We should, forgive the expression, be so lucky. The traditional approach in theoretical statistics, going back to Neyman and Pearson themselves, has been to look for circumstances where we can get a single test of good power against a whole range of alternatives, no matter what they are. The assumptions needed for this are often rather special, and teaching this material means leading students through some of the more arid sections of books like these; the survivors are generally close to insensible by the time they reach the oases of confidence regions.

At the other extreme, a large part of modern statistics, machine learning and data mining is about classification problems, where we take feature-vectors x and assign them to one of a finite number of classes. Generally, we want to do this in a way which matches a given set of examples, which are presumed to be classified correctly. (This is obviously a massive assumption, but let it pass.) When there are only two classes, however, this is exactly the situation Neyman and Pearson contemplated; a binary classification rule is just a hypothesis test by another name. Indeed, this is really the situation Neyman discussed in his later work (like his First Course in Probability and Statistics [1950]), where he advocated dropping the notion of "inductive inference" in favor of that of "inductive behavior", asking, in effect, what rule of conduct a learning agent should adopt so as to act well in the future.

The traditional approach in data-mining is to say that one should either (i) minimize the total probability of mis-classification, or (ii) assign some costs to false positives (noise taken for signal) and false negatives (signal taken for noise) and minimize the expected cost. Certainly I've made these recommendations plenty of times in my teaching. But this is not what Neyman and Pearson would suggest. After all, the mis-classification rate, or any weighted combination of the error rates, depends on the actual proportion of instances of "signal" to those of "noise" in the data; if that ratio changes, a formerly optimal decision rule can become arbitrarily bad. (To give a simple but extreme example, suppose that 99% of all cases used to be noise. Then a decision rule which always said "noise" would be right 99% of the time, and the minimum-error rule would be very close to "always say 'noise'". Should the proportion of signal increase, that formerly-optimal rule could become arbitrarily bad. — The same is true, mutatis mutandis, of a decision rule which minimizes some weighted cost of mis-classifications.) A Neyman-Pearson rule, by contrast, which maximizes power subject to a constraint on the probability of false positives, is immune to changes in the proportions of the two classes, since it only cares about the distribution of the observables given the classes. But (and this is where we came in) the Neyman-Pearson rule depends on knowing the exact distribution of observables for the two classes...

This brings us to tonight's reading.

Clayton Scott and Robert Nowak, "A Neyman-Pearson Approach to Statistical Learning", IEEE Transactions on Information Theory 51 (2005): 3806--3819 [PDF reprint via Prof. Scott, PDF preprint via Prof. Nowak]
Abstract: The Neyman-Pearson (NP) approach to hypothesis testing is useful in situations where different types of error have different consequences or a priori probabilities are unknown. For any α>0, the NP lemma specifies the most powerful test of size α, but assumes the distributions for each hypothesis are known or (in some cases) the likelihood ratio is monotonic in an unknown parameter. This paper investigates an extension of NP theory to situations in which one has no knowledge of the underlying distributions except for a collection of independent and identically distributed (i.i.d.) training examples from each hypothesis. Building on a "fundamental lemma" of Cannon et al., we demonstrate that several concepts from statistical learning theory have counterparts in the NP context. Specifically, we consider constrained versions of empirical risk minimization (NP-ERM) and structural risk minimization (NP-SRM), and prove performance guarantees for both. General conditions are given under which NP-SRM leads to strong universal consistency. We also apply NP-SRM to (dyadic) decision trees to derive rates of convergence. Finally, we present explicit algorithms to implement NP-SRM for histograms and dyadic decision trees.

Statistical learning methods take in data and give back predictors --- here, classifiers. Showing that a learning method works generally means first showing that one can estimate the performance of any individual candidate predictor (with enough data), and then extending that to showing that the method will pick a good candidate.

The first step is an appeal to some sort of stochastic limit theorem, like the law of large numbers or the ergodic theorem: the data-generating process is sufficiently nice that if we fix any one prediction rule, its performance on a sufficiently large sample shows how it will perform in the future. (More exactly: by taking the sample arbitrarily large, we can have arbitrarily high confidence that in-sample behavior is arbitrarily close to the expected future behavior.) Here we can represent every classifier by the region R of x values where it says "signal". P(R) is the true false positive rate, or size, of the classifier, and Q(R) is the power. If we fix R in advance of looking at the data, then we can apply the law of large numbers separately to the "signal" and "noise" training samples, and conclude that, with high P-probability, the fraction of "noise" data points falling into R is close to P(R), and likewise with high Q-probability the fraction of "signal" points in R is about Q(R). In fact, we can use results like Hoeffding's inequality to say that, after n samples (from the appropriate source), the probability that either of these empirical relative frequencies differs from its true probability by as much as ±h is at most 2 e^(-2 n h^2). The important point is that the probability of an error of fixed size goes down exponentially in the number of samples.
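
A quick numerical sanity check on that bound (my illustration, not from the paper):

```r
# For a fixed region R, how often does the empirical frequency miss the true
# probability by more than h, compared to the Hoeffding bound 2*exp(-2*n*h^2)?
set.seed(6)
n <- 200; p.R <- 0.3; h <- 0.1
misses <- replicate(10000, abs(mean(rbinom(n, 1, p.R)) - p.R) > h)
c(observed.frequency = mean(misses), hoeffding.bound = 2 * exp(-2 * n * h^2))
# The bound holds (and, as usual for distribution-free bounds, is conservative).
```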

(Except for the finite-sample bound, this is all classical probability theory of the sort familiar to Neyman and Pearson, or for that matter Laplace. Neyman might well have known Bernstein's inequality, which gives similar though weaker bounds here than Hoeffding's; and even Laplace wouldn't've been surprised at the form of the result.)

Now suppose that we have a finite collection of classifier rules, or equivalently of "say 'signal'" regions R1, R2, ... Rm. The training samples labeled "noise" give us estimates of the P(Ri), the false positive rates, and we just saw above that the probability of any of these estimates being very far from the truth is exponentially small; call this error probability c. By the union bound, the probability that even one of the estimates is badly off is at most cm. So we take our sample data and throw out all the classifiers whose empirical false positive rate exceeds α (plus a small, shrinking fudge factor), and with at least probability 1-cm all the rules we're left with really do obey the size constraint. Having cut down the hypothesis space, we then estimate the true positive rates or powers Q(Ri) from the training samples labeled "signal". Once again, the probability that any one of these estimates is far from the truth is low, say d, and by the union bound again the probability that any of them are badly wrong is at most dm. This means that the sample maximum has to be close to the true maximum, and picking the Ri with the highest empirical true positive rate is then (probabilistically) guaranteed to give us a classifier with close to the maximum attainable power. This is the basic strategy they call "NP empirical risk minimization". Its success is surprising: I would have guessed that in adapting the NP approach we'd need to actually estimate the distributions, or at least the likelihood ratio as a function of x, but Scott and Nowak show that's not true, that all we need to learn is the region R. So long as m is finite and fixed, the probability of making a mistake (of any given magnitude ±h) shrinks to zero exponentially (because c and d do), so by the Borel-Cantelli lemma we will only ever make finitely many mistakes. In fact, we could even let the number of classifiers or regions we consider grow with the number of samples, so long as it grows sub-exponentially, and still come to the same conclusion.
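
Here is a minimal sketch of NP-ERM for a finite collection of rules (my own toy version, with one-dimensional threshold classifiers rather than Scott and Nowak's histograms or dyadic trees):

```r
# Toy NP-ERM: the candidate rules are "say 'signal' when x > t" for a finite
# grid of thresholds t, i.e. the regions R_t = (t, infinity).
set.seed(3)
alpha <- 0.10
n.noise <- 500; n.signal <- 500
x.noise  <- rnorm(n.noise,  mean = 0)     # training sample known to be noise
x.signal <- rnorm(n.signal, mean = 1)     # training sample known to be signal
thresholds <- seq(-2, 3, by = 0.05)       # finite collection of candidate rules
# Empirical size and power of each candidate rule
emp.size  <- sapply(thresholds, function(t) mean(x.noise  > t))
emp.power <- sapply(thresholds, function(t) mean(x.signal > t))
# Step 1: keep only rules whose empirical false-positive rate is within the
# constraint, plus a small fudge factor (one crude choice, shrinking in n)
fudge <- sqrt(log(length(thresholds)) / (2 * n.noise))
ok <- emp.size <= alpha + fudge
# Step 2: among the survivors, pick the rule with the highest empirical power
best <- thresholds[ok][which.max(emp.power[ok])]
c(chosen.threshold = best,
  empirical.size   = mean(x.noise  > best),
  empirical.power  = mean(x.signal > best))
```

The fudge factor above is just one crude choice to keep rules whose true size meets the constraint from being unluckily excluded; the paper spells out the tolerances needed for the actual guarantees.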

Notice that we've gone from a result which holds universally over the objects in some collection to one which holds uniformly over the collection. Think of it as a game between me and the Adversary, in which the Adversary gets to name regions R and I try to bound their performance; convergence means I can always find a bound. But it matters who goes first. Universal convergence means the Adversary picks the region first, and then I can tailor my convergence claim to the region. Uniform convergence means I need to state my convergence claim first, and then the Adversary is free to pick the region to try to break my bound. What the last paragraph showed is that for finite collections which don't grow too fast, I can always turn a strategy for winning at universal convergence into one for winning at uniform convergence. [1]

Nobody, however, wants to use just a finite collection of classifier rules. The real action is in somehow getting uniform convergence over infinite collections, for which the simple union bound won't do. There are lots of ways of turning this trick, but they all involve restricting the class of rules we're using, so that their outputs are constrained to be more or less similar, and we can get uniform convergence by approximating the whole collection with a finite number of representatives. Basically, we need to count not how many rules there are (infinity), but how many rules we can distinguish based on their output (at most 2^n). As we get more data, we can distinguish more rules. Either this number keeps growing exponentially, in which case we're in trouble, or it ends up growing only polynomially, with the exponent being called the "Vapnik-Chervonenkis dimension". As any good book on the subject will explain, this is not the same as the number of adjustable parameters.

So, to recap, here's the NP-ERM strategy. We have a collection of classifier rules, which are equivalent to regions R, and this class is of known, finite VC dimension. One of these regions or classifiers is the best available approximation to the Neyman-Pearson classifier, because it maximizes power at fixed size. We get some data which we know is noise, and use it to weed out all the regions whose empirical size (false positive rate) is too big. We then use data which we know is signal to pick the region/classifier whose empirical power (true positive rate) is maximal. Even though we are optimizing over infinite spaces, we can guarantee that, with high probability, the size and power of the resulting classifier will come arbitrarily close to those of the best rule, and even put quantitative bounds on the approximation error given the amount of data and our confidence level. The strictness of the approximation declines as the VC dimension grows. Scott and Nowak also show that you can pull the structural risk minimization trick here: maximize the in-sample true positive rate, less a VC-theory bound on the over-fitting, and you still get predictive consistency, even if you let the capacity of the set of classifiers you'll use grow with the amount of data you have.

What's cool here is that this is a strategy for learning classifiers which gives us some protection against changes in the distribution, specifically against changes in the proportion of classes, and we can do this without having to learn the two probability density functions p and q; one just learns the region R. Such density estimation is certainly possible, but densities are much more complicated and delicate objects than mere sets, and the demands for data are correspondingly more extreme. (An interesting question, to which I don't know the answer, is how much we can work out about the ratio q(x)/p(x) by looking at the estimated maximum power as we vary the size α.) While Scott and Nowak work out detailed algorithms for some very particular families of classifier rules, their idea isn't tied to them, and you could certainly use it with, say, support vector machines.

[1] I learned this trick of thinking about quantifiers as games with the Adversary from Hintikka's Principles of Mathematics Revisited, but don't remember whether it was original to him or he'd borrowed it in turn. — Gustavo tells me that game semantics for logic began with Paul Lorenzen.

Enigmas of Chance

Posted by crshalizi at March 21, 2010 15:00 | permanent link

March 17, 2010

How the Social Scientists Got Their *s

Somewhere in the vastness of the scholarly literature there exists a sound, if not complete, history of the reception of statistical inference, especially regression, across the social sciences in the 20th century. I have not found it and would appreciate pointers, though I can only offer acknowledgments in return. If the history ends neither with "thus did our fathers raise fertile gardens of rigor in the sterile deserts of anecdata" nor "thus did a dark age of cruel scientism overwhelm all, save a few lonely bastions of humanity", so much the better.

(I specifically mean the 20th century and not the 19th, and statistical inference and not "statistics" in the sense of aggregated numerical data. Erich Lehmann's "Some Standard Statistical Models" is in the right direction, but too focused inwards on statistics.)

Enigmas of Chance; Commit a Social Science

Posted by crshalizi at March 17, 2010 13:30 | permanent link

March 04, 2010

The True Price of Models Pulling Themselves Up by Their Bootstraps

For a project I just finished, I produced this figure:

I don't want to give away too much about the project (update, 19 April: it's now public), but the black curve is a smoothing spline which is trying to predict the random variable R(t+1) from R(t); the thin blue lines are 800 additional splines, fit to 800 bootstrap resamplings of the original data; and the thicker blue lines are the resulting 95% confidence bands for the regression curve [1]. (The tick marks on the horizontal axis show the actual data values.) Making this took about ten minutes on my laptop, using the boot and mgcv packages in R.
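
I won't post the project's code here, but the recipe is simple enough that a toy version conveys it; the sketch below uses made-up AR(1) data in place of the real series and resamples consecutive pairs by hand (the actual figure used the boot and mgcv packages).

```r
# Toy version of the figure's recipe (not the project's code): spline fit of
# R(t+1) on R(t), plus pointwise 95% bootstrap bands from resampling pairs.
library(mgcv)
set.seed(4)
r <- arima.sim(list(ar = 0.5), n = 500)          # stand-in for the real series
dat  <- data.frame(rnow = r[-length(r)], rnext = r[-1])
grid <- data.frame(rnow = seq(min(dat$rnow), max(dat$rnow), length.out = 100))
fit <- gam(rnext ~ s(rnow), data = dat)          # spline regression fit
curves <- replicate(800, {                       # 800 bootstrap re-fits (slow-ish)
  boot.dat <- dat[sample(nrow(dat), replace = TRUE), ]
  predict(gam(rnext ~ s(rnow), data = boot.dat), newdata = grid)
})
bands <- apply(curves, 1, quantile, probs = c(0.025, 0.975))
matplot(grid$rnow, cbind(predict(fit, newdata = grid), t(bands)), type = "l",
        lty = c(1, 2, 2), col = c("black", "blue", "blue"),
        xlab = "R(t)", ylab = "R(t+1)")
rug(dat$rnow)                                    # tick marks for the data values
```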

The project gave me an excuse to finally read Efron's original paper on the bootstrap, where my eye was caught by "Remark A" on p. 19 (my linkage):

Method 2, the straightforward calculation of the bootstrap distribution by repeated Monte Carlo sampling, is remarkably easy to implement on the computer. Given the original algorithm for computing R, only minor modifications are necessary to produce bootstrap replications R*1, R*2, ..., R*N. The amount of computer time required is just about N times that for the original computations. For the discriminant analysis problem reported in Table 2, each trial of N = 100 replications, [sample size] m = n = 20, took about 0.15 seconds and cost about 40 cents on Stanford's 370/168 computer. For a single real data set with m = n = 20, we might have taken N=1000, at a cost of $4.00.

My bootstrapping used N = 800, n = 2527. Ignoring the differences between fitting Efron's linear classifier and my smoothing spline, creating my figure would have cost $404.32 in 1977, or $1436.90 in today's dollars (using the consumer price index). But I just paid about $2400 for my laptop, which will have a useful life of (conservatively) three years, a ten-minute pro rata share of which comes to 1.5 cents.
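For the record, the scaling behind that $404.32 is my own reading of Efron's numbers, not anything he states: I take the cost to be linear in both the number of bootstrap replications and the sample size. The back-of-the-envelope arithmetic, in R:

    0.40 * (800 / 100) * (2527 / 20)      # 1977 cost, scaling Efron's 40 cents: 404.32
    2400 * (10 / 60) / (3 * 365.25 * 24)  # ten minutes' pro rata share of the laptop: ~ 0.015
    (404.32 * 3.55) / 0.015               # ratio, with ~3.55 the implied CPI factor: ~ 1e5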

The inexorable economic logic of the price mechanism forces me to conclude that bootstrapping is about 100,000 times less valuable for me now than it was for Efron in 1977.

Update: Thanks to D.R. for catching a typo.

[1]: Yes, yes, unless the real regression function is a smooth piecewise cubic there's some approximation bias from using splines, so this is really a confidence band for the optimal spline approximation to the true regression curve. I hope you are as scrupulous when people talk about confidence bands for "the" slope of their linear regression models. (Added 7 March to placate quibblers.)

Enigmas of Chance

Posted by crshalizi at March 04, 2010 13:35 | permanent link

March 03, 2010

Rhetorical Autognosis

The way I usually prepare for a lecture or a seminar is to spend a couple of hours poring over my notes and references, writing and re-writing a few pages of arcane formulas, until I have the whole thing crammed into my head. When I actually speak I don't look at the notes at all. Fifteen minutes after I'm done speaking, I retain only the haziest outline of anything.

Which is to say, having finally realized that I've unconsciously modeled the way I teach and give talks on the magicians in Jack Vance, I really need to come up with better titles.

Self-Centered

Posted by crshalizi at March 03, 2010 10:15 | permanent link

March 02, 2010

36-490, Undergraduate Research, Spring 2010

What I've been doing instead of blogging. (I am particularly fond of the re-written factor analysis notes; and watch for the forthcoming notes on Markov models and point processes.) Fortunately for the kids, one of us knows what he's doing.

Enigmas of Chance; Corrupting the Young; Self-Centered

Posted by crshalizi at March 02, 2010 10:00 | permanent link

February 28, 2010

Books to Read While the Algae Grow in Your Fur, February 2010

Eric D. Kolaczyk, Statistical Analysis of Network Data: Methods and Models
This is the best available textbook on the subject. (I say this with all due respect for Wasserman and Faust, which was published sixteen years ago.)
Chapter one gives examples of networks, emphasizing that many non-social assemblages are networks, or have networks embedded in them, and can be profitably studied as such; this is story-telling and pretty pictures. Chapter two is background, divided into graph theory and graph algorithms (aimed at statisticians), and the essentials of probability and statistical inference (aimed at computer scientists). Chapter 3 deals with data collection (what do we measure? how do we gather the data? how do we organize it?) and visualization (how do we make those pretty pictures?). Chapter 4 covers descriptive statistics for networks, including ideas about partitioning networks into more-or-less distinct components, a.k.a. "community discovery". Both chapters 3 and 4 have terminal sections on what to do with time-varying networks; these are much less detailed than the rest, because we don't really know what to do with time-varying networks yet.
Chapter 5 deals with the fact that we generally do not have access to complete networks, but rather to samples of them. Inference from samples to larger assemblages (here, the complete network) is a fundamental statistical problem; depending on how the sample was collected, direct extrapolation from the sample to the whole can be quite accurate or highly misleading. Kolaczyk properly begins by reviewing the techniques used for sample inference in population surveys, such as Horvitz-Thompson estimation, which try to compensate for the biases introduced by the sampling scheme; he then turns to the most common sorts of network sampling methods, and gives some examples of how to incorporate the sampling into inferences. This is an area where much more needs to be done, but it's absolutely fundamental, and I'm extremely pleased to see it handled here.
Chapter 6 considers probabilistic models of network structure and their statistical inference, mostly through the method of maximum likelihood. It begins with the classical Erdos-Renyi (-Rappoport-Solomonoff) random graph model and some of its immediate generalizations; the theory here is exceedingly pretty, but of course it never fits anything in the real world. It then turns to small-world (Watts-Strogatz) models, and to preferential-attachment and duplication models (introduced by Price, re-introduced by Barabasi and Albert owing to ignorance of the literature), including the particular duplication model due to Wiuf et al. which can be estimated by maximum likelihood (as we've seen). The last part of the chapter discusses exponential-family random graph models, which are a fascinating topic I will post more about soon. Chapter 7 is on inferring network structure from partial measurements, including link prediction, inference of phylogenetic trees, and inference of flow- or message- passing networks from traffic measurements ("network tomography"). There could have been a bit more integration between these two chapters, but there could stand to be more integration in the literature, too.
Chapter 8 looks at processes taking place on networks, divided between predicting random fields on networks, and modeling dynamical processes on them. For the first, Kolaczyk emphasizes Markov random fields (including the Hammersley-Clifford-[Griffeath-Grimmett-Preston-et-alii] theorem) and kernel regression. The only kind of dynamic process on networks treated in any detail is epidemic modeling; as usual, this is because much, much more remains to be done. Chapter 9 looks at statistical models of traffic on networks, some of them going back more than half a century in the economic geography literature. Finally chapter 10 is really more of an appendix, sketching the basic formalism of graphical models, and indicating how it connects to both Markov random fields and to exponential-family random graphs.
The material is up-to-date, the explanations are clear, the graphics are good, and the examples are interesting, covering social networks, biochemistry and molecular biology, neuroscience and telecommunications with about equal comfort. I would have no hesitation at all in using this for a class of first- or second- year graduate students, plan to use parts of it next time I teach 462, and can warmly recommend it for self-study. It should become a standard work.
(Amusingly, Powell's currently recommends that people who buy Kolaczyk also get Jenny Davidson's Breeding [which I'm still reading], and vice versa. This tells me that (i) not many people other than me have bought either book from them, and (ii) they need to make their data-mining algorithms a bit more outlier-resistant.)
Dog Soldiers
J. Random British Army squad vs. werewolves in deepest, darkest Scotland. Recommended by Carrie Vaughn.
Intelligence, season 2
I like where they took the story (though I have special reasons to be amused by the involvement of Caribbean financiers), and am sad the series got canceled.
The Last Winter
Decent horror movie about Arctic isolation and global warming. Suffers towards the end from showing too much of the bogey. (ROT-13'd spoilers: Fcrpgeny pnevobh whfg nera'g gung fpnel; naq V xrcg guvaxvat bs Nhqra, gubhtu gung'f cebonoyl vqvbflapengvp.)
Dexter 3
Few things are quite so restorative when facing the winter blahs as a well-made TV show that understands the true meaning and importance of friendship and family ties.
Marshall G. S. Hodgson, Rethinking World History: Essays on Europe, Islam and World History
Hodgson was a historian of Islam at the University of Chicago, best known for his monumental and fantastic Venture of Islam (I, II, III), which was an attempt to tell the story of "conscience and history in a world civilization". Both the "world" and the "civilization" parts are important: Hodgson was one of those historians who break the world into civilizations, but he didn't think of them as distinct organisms or similar weirdness; rather as complexes of very broadly-distributed but also very involving literate traditions. Moreover, the "world" part mattered a lot too: he constantly kept in view the fact that civilizations were never isolated from each other, and their interactions were vital to how they developed, particularly to "Islamicate" civilization, which for a long time occupied the central position in the "Afro-Eurasian Oecumene". The whole of it was an effort to see the history of Islam as part of world history, and to see world history itself objectively. He also tried very hard to inhabit and convey the moral universe of the people he wrote about; this was partly about historical understanding and partly about his own earnest Quaker conscience.
Hodgson spent many, many years working on a world history, which was left in an even more fragmentary state than The Venture of Islam at the time of his death; an unpublishable mess. Rethinking World History is a compilation of fragments of this manuscript and selections from The Venture, along with some journal papers and letters. The product is an excellent epitome of Hodgson's more general and theoretical ideas about history and historiography: the central role of Islam in world history and the broad course of Islamicate civilization; the nature of tradition and the very broad, diffuse complexes of traditions that constitute civilizations, and the way all traditions constantly change; the errors of then-conventional "orientalist" scholarship; the sheer unprecedented weirdness of the modern "technical" age; the need to crush Eurocentrism if we're to understand history (and in particular the "optical illusion" which makes us think there's a "western civilization" going from ancient Greece through Rome to medieval western Europe and modern European states and their off-shoots); and finally the fundamental unity of human history, and how that manifested itself over time.
There is also an introduction by the editor, one Edmund Burke III, which is partly helpful, but also oddly dismissive of Hodgson. However, this dismissal just takes the form of calling Hodgson a "culturalist", as against Immanuel Wallerstein (of all people!) and the more dodgy sort of Marxist; Burke doesn't even mention a single material error or omission these supposed flaws led Hodgson into. While I appreciate Burke's work in pulling together the book, I wish he'd thought harder when writing his introduction.

Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Networks; Writing for Antiquity; Islam; The Great Transformation

Posted by crshalizi at February 28, 2010 23:59 | permanent link

February 12, 2010

More Output

My review of Susan Hough's Predicting the Unpredictable: The Tumultuous Science of Earthquake Prediction is out, here and at American Scientist.

If you are in Paris on Monday, you can hear Andrew Gelman talk about our joint paper on the real philosophical foundations of Bayesian data analysis.

Enigmas of Chance; Self-Centered; Philosophy; Incestuous Amplification

Posted by crshalizi at February 12, 2010 13:20 | permanent link

February 04, 2010

Upcoming Gigs: Bristol

I am giving two talks in Bristol next week about (not so coincidentally) my two latest papers.

"The Computational Structure of Spike Trains"
Bristol Centre for Complexity Sciences, SM2 in the School of Mathematics, 2 pm on Tuesday 9 February
Abstract: Neurons perform computations, and convey the results of those computations through the statistical structure of their output spike trains. Here we present a practical method, grounded in the information-theoretic analysis of prediction, for inferring a minimal representation of that structure and for characterizing its complexity. Starting from spike trains, our approach finds their causal state models (CSMs), the minimal hidden Markov models or stochastic automata capable of generating statistically identical time series. We then use these CSMs to objectively quantify both the generalizable structure and the idiosyncratic randomness of the spike train. Specifically, we show that the expected algorithmic information content (the information needed to describe the spike train exactly) can be split into three parts describing (1) the time-invariant structure (complexity) of the minimal spike-generating process, which describes the spike train statistically; (2) the randomness (internal entropy rate) of the minimal spike-generating process; and (3) a residual pure noise term not described by the minimal spike-generating process. We use CSMs to approximate each of these quantities. The CSMs are inferred nonparametrically from the data, making only mild regularity assumptions, via the causal state splitting reconstruction algorithm. The methods presented here complement more traditional spike train analyses by describing not only spiking probability and spike train entropy, but also the complexity of a spike train's structure. We demonstrate our approach using both simulated spike trains and experimental data recorded in rat barrel cortex during vibrissa stimulation.
Joint work with Rob Haslinger and Kristina Lisa Klinkner.
"Dynamics of Bayesian updating with dependent data and misspecified models"
Statistics seminar, Department of Mathematics, Seminar Room SM3, 2:15pm on Friday 20 February
Abstract: Much is now known about the consistency of Bayesian non-parametrics with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the right neighborhoods of the true distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, plus some basic measure theory, lets me build a sieve-like structure for the prior. The main statistical assumption concerns the compatibility of the prior and the data-generating process, bounding the fluctuations in the log-likelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong.
(More on this paper)

I'll also be lecturing about prediction, self-organization and filtering to the BCCS students.

I presume that I will not spend the whole week talking about statistics, or working on the next round of papers and lectures; is there, I don't know, someplace in Bristol to hear music or something?

Update, 8 February: canceled at the last minute, unfortunately; with some hope of rescheduling.

Self-Centered; Enigmas of Chance; Complexity; Minds, Brains, and Neurons

Posted by crshalizi at February 04, 2010 13:48 | permanent link

January 31, 2010

Books to Read While the Algae Grow in Your Fur, January 2010

Virginia Swift, Hello, Stranger
Enjoyable mystery with eccentric academics, God-botherers and gentrification in present-day Laramie. Nth book in a series; I'll keep an eye out for the others.
Intelligence
Smart crime/spook drama set in one of the most attractive cities in the world (Vancouver), which could only be improved if it didn't end in the WORST CLIFFHANGER EVER. (Ahem.) Not, of course, as good as The Wire, but then nothing is.
Daniel Waley, The Italian City-Republics
Short, readable political-institutional history of the communes of northern and central Italy. He begins with the communes starting to take form in the towns and wrest control from their bishops, say around 1000, and ends by about 1400, by which point the towns had almost all, except for Venice, descended into some form of monarchy, generally under the domination of the local feudal land/war-lords. (Waley says little about Venice, which in retrospect seems odd, though it didn't strike me while reading it.) While Waley is good at describing this historical trajectory, he says little about why so many Italian cities followed it. I'd think it'd be natural to compare the Italian case to contemporary cities elsewhere, but I think there is exactly one sentence on them. (I imagine all kinds of interesting comparative work could be or has been done.) But within those limits, it's a nice book. Waley has also written studies on Siena and Orvieto, which sound interesting.
Terry Pratchett, Nation
You don't really need me to recommend Terry Pratchett to you, especially when he's writing about how people find ways to go on when their world has been pointlessly destroyed.
Richard Hofstadter, Anti-Intellectualism in American Life
Astonishingly, this still feels like it fits after a lapse of half a century. The whole "tax-raising, latte-drinking, sushi-eating, Volvo-driving, New-York-Times-reading, body-piercing, Hollywood-loving, left-wing freak-show" nonsense of the last thirty years now makes a lot more sense; and the chapters about the history of American education were frankly a revelation to me. (The chapter on Dewey and his pedagogical influence seems like a model of being respectfully but unrelentingly critical.) No doubt for real historians, this is all painfully outdated, and whatever's actually sound has long since been incorporated into other works, which don't provide such unintentional moments of amusement as Hofstadter's including, among the unfair accusations heaped on Jefferson, keeping a slave mistress and having children by her. (For that matter I don't care for the Beats very much, but they certainly contributed more to our literature than he thought they would.) Still: the man could write.
ObLinkage: Steve Laniel on AIiAL.
D. N. MacKenzie (trans.), Poems from the Divan of Khushâl Khân Khattak
The first significant body of poetry in Pashto; Khushal was a 17th century warlord in what is now the Northwest Frontier, owing his position to a combination of tribal authority and appointment by the Mughals. This seems to be the most recent translation of a selection from his poetry in English, dating from 1965. It is arranged on no particular principles (some Pashto editions are, following tradition, arranged alphabetically by the first letter of the poem), which produces a rather odd effect, that I might summarize as follows: Khushal is happily in love: wow is the beloved a hottie. Khushal is unhappily in love: separation is awful, especially if it's because the beloved doesn't want to see Khushal. Khushal is a fierce warrior who is also a keen hunter; falconry rules. Khushal has a remarkable capacity for drink. (Go ahead, try and tell me that's allegorical.) Aurangzeb sucks, especially in comparison to his father. (Well, he did, and sticking Khushal in jail can't have won him any points.) The Afghans should rally to Khushal and defeat Aurangzeb! Men are treacherous, false-faced bastards, but Afghans are really worse than the rest. (To be fair, having one of your own sons wage war on you in the name of Aurangzeb has got to be pretty embittering.) Khushal will withdraw from the sinful world and spend his days in pious penance. Khushal glorifies God. Repeat.
My grandfather's extemporized translations were better English poetry, but I will never hear those again.
Moez Draief and Laurent Massoulié, Epidemics and Rumors in Complex Networks
A nice short (< 120 pp.) account of the connections among stochastic network models, branching processes, and epidemic models, of the "susceptible-infectious-susceptible" or "susceptible-infectious-recovered" type, including epidemics on networks. ("Rumors" are assumed to fall under such models.)
They begin with the basic Galton-Watson branching process model, where each member of a population produces a random number of descendants (possibly zero), independently of everyone else, and this distribution is constant both within and across generations. Following over a century of tradition, they look at whether the population survives forever or goes extinct, how large it gets, how long it takes to go extinct if it does, etc. (A toy simulation of this survival-or-extinction dichotomy appears after these reviews.) This then gets turned into a simple epidemic model ("member of population" = infected individual). It also maps on to the Erdos-Renyi network model, with "has an edge with" taking the place of "is a descendant of": pick your favorite node, and connect it to a random selection of other nodes, the number following a binomial distribution; connect each of them in turn to more random nodes. The size of the branching process's population corresponds to the size of the connected component in the graph. The mapping only really works in the limit of low-density graphs (the size of the component is roughly a sum of independent quantities when there are no loops), but it's enough to study the emergence of a giant component and the behavior of the diameter of the graph. As a prelude to more sophisticated models, they then prove a form of Kurtz's Theorem on the convergence of Markov chains to ordinary differential equations in the large-population limit. The second half of the book rehearses Watts-Strogatz small-world and Barabási-Albert scale-free networks (including mention of Yule but not, oddly, of Herbert Simon), before wrapping up with epidemic models on graphs, and the "viral marketing" problem of deciding where, on a known and fixed network, to start an epidemic for maximum impact.
Of course, since it's a mathematics book, the problem of how to link these models to data isn't even dismissed.
This isn't a ground-breaking work, but it's nice to have all this in a single book, and one a bit more accessible than, say, Durrett's Random Graph Dynamics (though by the same token less comprehensive). The implied reader is comfortable with stochastic processes at the level of something like Grimmett and Stirzaker; measure-theoretic issues are avoided, even when discussing Kurtz's Theorem. (Their version is thus much less precise and powerful than his, but vastly easier to understand.) Anyone comfortable with that level of probability could read it without much trouble, and I'd happily use it in a class.
Disclaimer: I read a draft of the manuscript for the publisher in 2007, and they sent me a free copy of the book, but I have no stake in its success.
Joseph L. Graves, Jr., The Emperor's New Clothes: Biological Theories of Race at the Millennium
There are places where he lapses into biological jargon, and others where I think lay readers would have benefited from more detailed rebuttals of the common counter-arguments, but over-all I recommend this very strongly. (Thanks to I.B. for lending me her copy.)
Pascal Massart, Concentration Inequalities and Model Selection
Using empirical process theory, and more specifically concentration of measure, to get finite-sample, i.e., non-asymptotic, risk bounds for various forms of model selection. The basic strategy is to find conditions under which every model in a reasonable class will, with high probability, perform about as well on sample data as it can be expected to do on new data; this involves constraining the richness or flexibility of the model class. A little extra work, and the addition of suitable penalties to the fit, gets bounds that extend over multiple classes of model, even over a countable infinity of classes. Among other highlights, Massart shows why the famous AIC heuristic is often definitely sub-optimal, and how to correct it; the book also offers corrections to Vapnik's (much better) structural risk minimization, and a nice treatment of data-set splitting (= 1-fold cross-validation). All of this is for IID data, so the usual caveats apply. Formally self-contained, but realistically some previous exposure to empirical processes (at the level of Pollard's notes if not higher) will be needed. Available for free as a large PDF preprint, but I found it much more convenient to read a dead-tree copy.
Elizabeth Bear, New Amsterdam
Alternate-history fantasy mystery stories. Owing something, perhaps, to Randall Garrett's "Lord Darcy" stories (the name of the heroine is distinctly suspicious), but without their complacency about the benevolence of the powers that be.
David Hand, Heikki Mannila and Padhraic Smyth, Principles of Data Mining
I've used this three times now in teaching 36-350, with about 75 students total over the years. I keep using it because it's the best textbook on data-mining I know. It covers the whole process, soup to nuts: data collection (and the importance of understanding what the data actually mean, if anything), cleaning, databases, model construction, model evaluation, optimization, visualization, etc. All of this is organized around four crucial questions: what kind of pattern are we looking for in the data, and how do we represent those patterns? how do we score representations against each other? how do we search for good representations? what do we need to do to implement that search efficiently? All of the basic methods (and many not so basic ones) are in here, all seen as different answers to these questions. I find its explanations extremely clear, and my students seem to as well. I regard it as a strength that it is not tied to pre-canned software, which would only encourage dependency and thoughtlessness.
The only real competition, to my mind, is Hastie, Tibshirani and Friedman. But the Stanford book is distinctly more about statistics, and has more statistical theory and math (though not, from my point of view, a lot of either), whereas this one is distinctly focused on data-mining and on computation. It would be nice if Hand &c. had material on support vector machines, and more on ensemble methods; perhaps it's time for a second edition?
Disclaimer: I almost took a post-doc under Smyth rather than coming to CMU, back in 2004; also, the MIT Press sent me a free review copy of this book (in 2001).
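Apropos of the Draief and Massoulié book above, and as promised in that review: a toy R simulation of a Galton-Watson process with Poisson offspring, showing the subcritical/supercritical dichotomy. This is purely my own illustration, not anything from the book; the cap on the population size is a practical shortcut for declaring a still-growing lineage to have survived.

    ## Galton-Watson branching process with Poisson(lambda) offspring.
    ## With Poisson offspring, extinction is certain exactly when lambda <= 1.
    gw.extinct <- function(lambda, max.gen = 100, cap = 1e4) {
      population <- 1
      for (g in 1:max.gen) {
        if (population == 0) return(TRUE)        # the lineage has died out
        if (population > cap) return(FALSE)      # effectively off to infinity
        population <- sum(rpois(population, lambda))
      }
      FALSE                                      # survived the whole horizon
    }
    mean(replicate(2000, gw.extinct(0.8)))  # subcritical: extinction probability ~ 1
    mean(replicate(2000, gw.extinct(1.5)))  # supercritical: ~ 0.42, the root of s = exp(1.5*(s-1))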

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Scientifiction and Fantastica; Writing for Antiquity; Afghanistan and Central Asia; The Natural Science of the Human Species; Networks; The Beloved Republic; The Commonwealth of Letters; Learned Folly

Posted by crshalizi at January 31, 2010 23:59 | permanent link

January 19, 2010

The Work of Art in the Age of Mechanical Reproduction

Attention conservation notice: 800+ words of inconclusive art/technological/economic-historical musings.

This thread over at Unfogged reminds me of something that's puzzled me for years, ever since reading this: why didn't prints displace paintings the same way that printed books displaced manuscript codices? Why didn't it become expected that visual artists, like writers, would primarily produce works for reproduction? (No doubt, in that branch of the wave-function*, obsessive fans still want to get the original drawings, but obsessive fans also collect writer's manuscripts, or even their typewriters, as well as their mass-produced books.) 16th century engraving technology was strong enough that it could implement powerful works of art (vide), so that can't be it. And by the 18th century at least writers could make a living (however precarious) from writing for the mass public, so why didn't visual artists (for the most part) do likewise? (Again, it's manifestly not as though technology has regressed.) Why is it still the case that a real, high-class visual artist is someone who makes one-offs? I know that reproductions have been important since at least the late 1800s, but for works and artists who first made their reputation with unique, hand-made objects, which is as though the only books which got sent to the printing press were ones which had already circulated to acclaim in manuscript.

Some possibilities I don't buy:

  1. Aesthetic limitations. There are valuable effects which can be achieved with a big original painting which prints just can't match. Response: there are effects you can achieve with an illuminated, calligraphic manuscript which you can't match with movable type, either. Those weren't valuable enough to keep printed books from taking over. Why the difference? Why not a focus on what can be done through prints, which is quite a lot? (Witness the experience of the 20th century and later, when most art lovers know most works of art they enjoy through reproductions.)
  2. Color. A real limitation; even today, getting color done well in mass visual media is not entirely trivial (cf.), and early modern Europe certainly couldn't do it at all. Response: What makes color so important? We know that some great art was made without its benefit, and we don't really know how much better it could have gotten had prints been the medium of choice. Even if color was all that, it just pushes the shift to the late 19th century.
  3. Artists too expensive. Whether you are producing one painting or a thousand prints, there is a considerable fixed cost to the artist's time and training. (The first print is very expensive.) Individual patrons could afford this; the mass public could not. Response: The same argument would apply to books. Besides, high fixed costs usually drive towards seeking a wider market, so that the fixed costs are distributed over a larger number of people. The argument would have to be one of failure of demand — that where there was one man willing to pay 100 guilders (or whatever) for a painting, there were not, say, 120 people willing to pay 1 guilder for prints. Why not?
  4. Paintings too cheap. There have always been too many people wanting to be visual artists for them to all make a living as original artists. One of the things they could do instead was paint copies. Response: The economy of scale problem still applies.
  5. States too weak. In a competitive market, market prices equal marginal costs. The marginal cost of producing another copy of a print is very, very low, so low that the fixed costs of drawing and designing it in the first place aren't recouped. As usual, then, competitive markets fail massively at producing informational goods. The modern solution is to institute and vigorously enforce intellectual property rights. These are monopoly privileges which the state grants to certain individuals; if anyone tries to compete with these favorites of the powers that be, then "goons with guns" (as my libertarian friends like to say) come to stop them. Doing this requires a really massively powerful and intrusive state, which is a relatively recent phenomenon, and not to be lightly deployed on behalf of artists, of all people. Artists who tried to go the mass-production route would've been even more starvation-prone than those who didn't attempt it. Response: An exactly parallel argument would explain why writers didn't embrace printing.
  6. The revolution has happened. The overwhelming majority of visual artists do aim their work at reproduction; it's just a small minority which continues to produce one-offs. This minority has, however, a lot more cultural prestige. Response: There's some merit to this, but it's bizarre and anomalous; it's not as though our really high-class literature was still illuminated or calligraphic manuscripts, and printing was reserved for declassé "commercial" work.
The most convincing argument I've been able to come up with has to do with how visual artworks were and are used. Even in manuscript, books were for reading: private consumption, or near enough. European culture, however, provided a steady stream of demand for works of visual art for public display, which is rather different. If it were just a matter of pictures you'd like to look at for your own enjoyment, perhaps prints would serve. But if it's about decorating the church/guildhall/imposing estate, then you need a unique painting of St. Jerome/the burgomasters/the master of the house. The main point is that the owner has the resources to command their very own artwork, not the work's intrinsic aesthetic properties (which good reproductions would share). But even then, why not develop a second stream of reproducible artwork for private rather than conspicuous consumption? And indeed why not try to achieve similar effects in print, thereby broadcasting the message?

Updates, 31 January 2010: In correspondence, Elihu Gerson points to an interesting-looking book relevant to the social-use explanation.

Also, it seems I should clarify that I am not asking why (as Vukutu puts it) "people desire original works of visual art rather than printed reproductions". If you are going to paint in oils on canvas, then of course making a flat print of the result is going to lose some detail of the physical object, and those details might contribute in important ways to people's experience of the object; there might be a real esthetic loss to looking at a reproduction of a painting. What I am asking is why then we do not produce artworks which are designed for reproduction. Or rather, we do produce lots of such art, but it's not seen as very valuable, and generally not even real art in the honorific sense. "Printed reproductions of physical paintings lose valuable details" does not answer "Why did our visual arts continue to focus on making one-off works?", unless perhaps you add some extra premises, like (i) no print-reproducible image could be as esthetically valuable as a three-dimensional painting, and (ii) that difference in intrinsic quality was extremely important to the people who consumed art, and I am very dubious about both of these.

Finally, I don't think it's sufficient to point to "tradition", since traditions change all the time. That deserves another argument, but another time. In lieu of which, I'll just offer a quotation from a favorite book, Joseph (Abu Thomas) Levenson's Confucian China and Its Modern Fate; he is writing about ideas, but as he makes clear, what he says applies just as much to aesthetic or practical choices as to intellectual ones.

With the passing of time, ideas change. This statement is ambiguous, and less banal than it seems. It refers to thinkers in a given society, and it refers to thought. With the former shade of meaning, it seems almost a truism: men may change their minds or, at the very least, make a change from the mind of their fathers. Ideas at last lose currency, and new ideas achieve it. If we see an iconoclastic Chinese rejection, in the nineteenth and twentieth centuries, of traditional Chinese beliefs, we say that we see ideas changing.

But an idea changes not only when some thinkers believe it to be outworn but when other thinkers continue to hold it. An idea changes in its persistence as well as in its rejection, changes "in itself" and not merely in its appeal to the mind. While iconoclasts relegate traditional ideas to the past, traditionalists, at the same time, transform traditional ideas in the present.

This apparently paradoxical transformation-with-preservation of a traditional idea arises from a change in its world, a change in the thinker's alternatives. For (in a Taoist manner of speaking) a thought includes what its thinker eliminates; an idea has its particular quality from the fact that other ideas, expressed in other quarters, are demonstrably alternatives. An idea is always grasped in relative association, never in absolute isolation, and no idea, in history, keeps a changeless self-identity. An audience which appreciates that Mozart is not Wagner will never hear the eighteenth-century Don Giovanni. The mind of a nostalgic European medievalist, though it may follow its model in the most intimate, accurate detail, is scarcely the mirror of a medieval mind; there is sophisticated protest where simple affirmation is meant to be. And a harried Chinese Confucianist among modern Chinese iconoclasts, however scrupulously he respects the past and conforms to the letter of tradition, has left his complacent Confucian ancestors hopelessly far behind him...

An idea, then, is a denial of alternatives and an answer to a question. What a man really means cannot be gathered solely from what he asserts; what he asks and what other men assert invest his ideas with meaning. In no idea does meaning simply inhere, governed only by its degree of correspondence with some unchanging objective reality, without regard to the problems of its thinker. [pp. xxvii--xxviii; for context, this passage was first published in 1958]

*: With apologies to the blogger formerly known as "the blogger formerly known as 'The Statistical Mechanic' ".

Manual trackback: Mostly Hoofless; 3 Quarks Daily; Cliopatria (!); Vukutu.

Writing for Antiquity

Posted by crshalizi at January 19, 2010 22:01 | permanent link

Three-Toed Sloth:   Hosted, but not endorsed, by the Center for the Study of Complex Systems