Jordan Ellenberg at Quomodocumque links to an old article he wrote about the expected value of lottery tickets. Despite the fact that the article is in Slate, it is free of knee-jerk contrarianism, and this so disturbs the fundamental order of the universe that I feel like I have to supply some of my own. I claim, therefore, that playing the lottery can be quite rational in cost-benefit terms, even if the expected monetary value of the ticket is negative, and one is risk averse. (And what God died and left expected utility in charge?)
The benefit of playing the lottery comes entirely in the interval between buying the ticket and the revelation of the winner. During this interval, someone who has bought the ticket can entertain the idea that they might win, and pleasantly imagine how much better their life could be with the money, what they would do with it, etc. It's true that in some sense you could always think about "what if I had $280 million?", but many people find it very hard to get their imaginations going by sheer will-power. A plausible and concrete path to the riches, no matter how low the probability, serves as a hook on which to suspend disbelief. In this regard, indeed, lottery tickets are arguably quite cost-effective. If a $1 lottery ticket licenses even one hour of imagining a different life, I don't see how people who spend $12 for two or three hours of such imagining at a movie theater, or $25 for ten hours at a bookstore, are in any position to talk.
Despite having held this idea for years, I have never played the lottery, because I couldn't begin to make myself believe.
Posted by crshalizi at December 31, 2010 19:43 | permanent link
As a reward for the fortitude everyone showed during the Dark Gods' simultaneous assault on the Sun and the Moon, I bring the traditional Hogswatchnight sausage products book reviews: of Karl Sigmund's The Calculus of Selfishness; of Kurt Jacobs's Stochastic Processes for Physicists: Understanding Noisy Systems; and of Josiah Ober's Democracy and Knowledge: Innovation and Learning in Classical Athens.
Now, what did you tell the Hogfather to get me this year?
The Collective Use and Evolution of Concepts; Biology; Mathematics; Complexity; Physics; The Dismal Science; Commit a Social Science; Writing for Antiquity; Enigmas of Chance
Posted by crshalizi at December 24, 2010 01:08 | permanent link
A public service announcement: the Santa Fe Institute's annual complex systems summer school is taking applications online, through January 7th. The summer schools are one of the best things SFI does, a very intense intellectual experience in a beautiful setting with a lot of scary-bright peers and outstanding instructors (though occasionally they let me rant too*). If you are a graduate student or post-doc and read this weblog, the odds are very good that you would be interested in the summer school; so apply, why don't you?
*: Speaking of which, I don't think I ever linked to last year's lecture slides [1, 2, 3].
Posted by crshalizi at December 18, 2010 21:10 | permanent link
Attention conservation notice: Yet more ramblings about social thought and science fiction, including a long quotation from a work of philosophy written half a century ago, and some notions I already aired years ago, when writing about Lem and Sterling.
People on the Web are interested in the Singularity: who knew? Further to the question, and especially to Henry Farrell's post (on Patrick Nielsen Hayden's post on my post...), a long passage from Ernest Gellner's Thought and Change (University of Chicago Press, 1965). This comes as Gellner — a committed liberal — is discussing the "undemonstrability of our liberal values".
The ethical models which do happen to be relevant in our time and are applicable, partly at least, it is argued, include the Rail and the "Hypothetical Imperative" (target) type: the view that certain things are good because inevitable, and some others are such because they are means to things whose desirability cannot in practice be doubted (i.e. wealth, health and longevity rather than their opposites). Their application gives us as corollaries the two crucial and central values of our time — the attainment of affluence and the satisfaction of nationalism. "Affluence" means, in effect, a kind of consummation of industrial production and application of science to life, the adequate and general provision of the means of a life free from poverty and disease; "nationalism" requires, in effect, the attainment of a degree of cultural homogeneity within each political unit, sufficient to give the members of the unit a sense of participation. [CRS: I omit here a footnote in which Gellner refers to a later chapter on his theory of nationalism.]
The two logical paths along which one can reach these two conclusions, which in any case are hardly of staggering originality, are not independent of each other: nor are these two modes of reason very distinct from the general schema of the argument of this book — the attempt to see what conclusions are loosely implicit — nothing, alas, is implicit rigorously — in a lucid estimate of our situation.
These two data or limitations on further argument do not by any means uniquely determine either the final form of industrial society, or the path of its attainment. This is perhaps fortunate: it is gratifying to feel that doors remain open. The difficulties stem perhaps from the fact that they are too open, philosophically. If we look back at the logical devices men have employed to provide anchorages for their values, we see how very unsatisfactory they are: these seeming anchorages are themselves but floating seaweed. The notion of arguing from given desires and satisfactions (as in Utilitarianism), from the true nature of man (as in the diverse variants of the Hidden Prince doctrine, such as Platonism), from a global entelechy, or the notion of harmony, etc., are all pretty useless, generally because they assume something to be fixed which is in fact manipulable. But our needs are not fixed (except for certain minimal ones which are becoming so easy to satisfy as to pose no long-term problem). We have no fixed essence. The supposed anchorage turns out to be as free-floating as the ship itself, and usually attached to it. They provide no fixed point. That which is itself a variable cannot help to fix the value of the others.
This point has been obscured in the past by the fact that, though it was true in theory, it was not so in practice: human needs, the image of man, etc., were not changing very rapidly, still less were they under human control.
This openness, combined with a striking shortage of given premisses for fixing its specific content, gives some of us at any rate a sense of vertigo when contemplating the future of mankind. This vertigo seems to me somehow both justified and salutary. Once it was much in evidence: in a thinker such as H. G. Wells, futuristic fantasies were at the same time exercises in political theory. Today, science fiction is a routinised genre, and political theory is carefully complacent and dull — and certainly not, whatever else might be claimed for it, vertiginous. But whilst it is as well to be free of messianic hopes, it is good to retain some sense of vertigo.
This openness itself provides the clue to one additional, over-riding value — liberty, the preservation of open possibilities, both for individuals and collectively.
There is a lot that could be said about this — the difference between this sort of anti-foundationalism and, say, Rorty's; Gellner's confidence that satisfying minimal human needs is, or soon will be, a solvable problem is something I actually agree with, but oh does it ever sound complacent today; and so for that matter does his first person plural — but I want to focus on the next to last bit, about science fiction. It was pretty plain by, oh, 1848 at the latest that the kind of scientific knowledge we have now, and the technological power that goes with it, radically alters, and even more radically expands, the kinds of societies that are possible, and lets us live our lives in ways profoundly different from those of our ancestors. (For instance, we can have affluence and liberty.) How then should we live? becomes a question of real concern, because we have, in fact, the power to change ourselves, and are steadily accruing more of it.
This, I think, is the question at the heart of science fiction at its best. (This meshes with Jo Walton's apt observation that one of the key aesthetic experiences of reading SF is having a new world unfold in one's mind.) Now it is clear that the vast majority of it is rehashing familiar themes and properties, and transparently projecting the social situation of its authors. I like reading that anyway, even when I can see how it would be generated algorithmically (perhaps just by a finite-state machine). Admittedly, I have no taste, but I actually think there is a lot to be said for this sort of entertainment, which has in any case been going on for quite a while now. (As I may have said before, TV Tropes is the Morphology of the Folk-Tale of our time.) But sometimes, SF can break beyond that, to approach the question What should we make of ourselves? with the imagination, and vertigo, it deserves.
Update, later that afternoon: Coincidentally, Paul McAuley.
Manual trackback: Crooked Timber (making me wonder if I shouldn't also file this under "incestuous amplification"!)
Posted by crshalizi at December 09, 2010 15:34 | permanent link
Attention conservation notice: Of interest only if you (1) happen to be in Ann Arbor today, and (2) care about causal inference in social networks. Also, this post was meant to go live over the weekend, but I messed up the timing, so it's probably too late for you to make plans.
I'll be speaking today at the Center for the Study of Complex Systems at the University of Michigan:
Manual trackback of a sort: AnnArbor.com
Posted by crshalizi at December 07, 2010 09:30 | permanent link
Attention conservation notice: 5000+ words, and many equations, about a proposal to improve, not macroeconomic models, but how such models are tested against data. Given the actual level of debate about macroeconomic policy, isn't it Utopian to worry about whether the most advanced models are not being checked against data in the best conceivable way? What follows is at once self-promotional, technical, and meta-methodological; what would be lost if you checked back in a few years, to see if it has borne any fruit?
Some months ago, we — Daniel McDonald, Mark Schervish, and I — applied for one of the initial grants from the Institute for New Economic Thinking, and, to our pleasant surprise, actually got it. INET has now put out a press release about this (and the other awards too, of course), and I've actually started to get some questions about it in e-mail; I am a bit sad that none of these berate me for becoming a tentacle of the Soros conspiracy.
To reinforce this signal, and on the general principle that there's no publicity like self-publicity, I thought I'd post something about the grant. In fact what follows is a lightly-edited version of our initial, stage I proposal, which was intended for maximal comprehensibility, plus some more detailed bits from the stage II document. (Those of you who sense a certain relationship between the grant and Daniel's thesis proposal are, of course, entirely correct.) I am omitting the more technical parts about our actual plans and work in progress, because (i) you don't care; (ii) some of it is actually under review already, at venues which insist on double-blinding; and (iii) I'll post about them when the papers come out. In the meanwhile, please feel free to write with suggestions, comments or questions.
Update, next day: And already I see that I need to be clearer. We are not trying to come up with a new macro forecasting model. We are trying to put the evaluation of macro models on the same rational basis as the evaluation of models for movie recommendations, hand-writing recognition, and search engines.
Macroeconomic forecasting is, or ought to be, in a state of confusion. The dominant modeling traditions among academic economists, namely dynamic stochastic general equilibrium (DSGE) and vector autoregression (VAR) models, both spectacularly failed to forecast the financial collapse and recession which began in 2007, or even to make sense of its course after the fact. Economists like Narayana Kocherlakota, James Morley, and Brad DeLong have written about what this failure means for the state of macroeconomic research, and Congress has held hearings in an attempt to reveal the perpetrators. (See especially the testimony by Robert Solow.) Whether existing approaches can be rectified, or whether basically new sorts of models are needed, is a very important question for macroeconomics, and, because of the privileged role of economists in policy making, for the public at large.
Largely unnoticed by economists, over the last three decades statisticians and computer scientists have developed sophisticated methods of model selection and forecast evaluation, under the rubric of statistical learning theory. These methods have revolutionized pattern recognition and artificial intelligence, and the modern industry of data mining would not exist without them. Economists' neglect of this theory is especially unfortunate, since it could be of great help in resolving macroeconomic disputes, and determining the reliability of whatever models emerge for macroeconomic time series. In particular, these methods guarantee, with high probability, that the forecasts produced by models estimated with finite amounts of data will be accurate. This allows for immediate model comparisons without appealing to asymptotic results or making strong assumptions about the data generating process, in stark contrast to AIC and similar model selection criteria. These results are also provably reliable, unlike the pseudo-cross-validation approach often used in economic forecasting, whereby the model is fit using the initial portion of a data set and evaluated on the remainder. (For illustrations of the last, see, e.g., Athanasopoulos and Vahid, 2008; Faust and Wright, 2009; Christoffel, Coenen, and Warne, 2008; Del Negro, Schorfheide, Smets, and Wouters, 2004; and Smets and Wouters, 2007. This procedure can be heavily biased: the held-out data are used to choose the model class under consideration, the distributions of the test set and the training set may be different, and large deviations from the normal course of events [e.g., the recessions in 1980--82] may be ignored.)
In addition to their utility for model selection, these methods give immediate upper bounds for the worst case prediction error. The results are easy to understand and can be reported to policy makers interested in the quality of the forecasts. We propose to extend proven techniques in statistical learning theory so that they cover the kind of models and data of most interest to macroeconomic forecasting, in particular exploiting the fact that major alternatives can all be put in the form of state-space models.
To properly frame our proposal, we review first the recent history and practice of macroeconomic forecasting, followed by the essentials of statistical learning theory (in more detail, because we believe it will be less familiar). We then describe the proposed work and its macroeconomic applications.
Through the 1970s, macroeconomic forecasting tended to rely on "reduced-form" models, predicting the future of aggregated variables based on their observed statistical relationships with other aggregated variables, perhaps with some lags, and with the enforcement of suitable accounting identities. Versions of these models are still in use today, and have only grown more elaborate with the passage of time; those used by the Federal Reserve Board of Governors contain over 300 equations. Contemporary vector autoregression models (VARs) are in much the same spirit.
The practice of academic macroeconomists, however, switched very rapidly in the late 1970s and early 1980s, in large part driven by the famous "critique" of such models by Lucas (published in 1976). He argued that even if these models managed to get the observable associations right, those associations were the aggregated consequences of individual decision making, which reflected, among other things, expectations about variables policy-makers would change in response to conditions. This, Lucas said, precluded using such models to predict what would happen under different policies.
Kydland and Prescott (1982) began the use of dynamic stochastic general equilibrium (DSGE) models to evade this critique. The new aim was to model the macroeconomy as the outcome of individuals making forward-looking decisions based on their preferences, their available technology, and their expectations about the future. Consumers and producers make decisions based on "deep" behavioral parameters like risk tolerance, the labor-leisure trade-off, and the depreciation rate which are supposedly insensitive to things like government spending or monetary policy. The result is a class of models for macroeconomic time series that relies heavily on theories about supposedly invariant behavioral and institutional mechanisms, rather than observed statistical associations.
DSGE models have themselves been heavily critiqued in the literature for ignoring many fundamental economic and social phenomena --- we find the objections to the representative agent assumption particularly compelling --- but we want to focus our efforts on a more fundamental aspect of these arguments. The original DSGE model of Kydland and Prescott had a highly stylized economy in which the only source of uncertainty was the behavior of productivity or technology, whose log followed an AR(1) process with known-to-the-agent coefficients. Much of the subsequent work in the DSGE tradition has been about expanding these models to include more sources of uncertainty and more plausible behavioral and economic mechanisms. In other words, economists have tried to improve their models by making them more complex.
Remarkably, there is little evidence that the increasing complexity of these models actually improves their ability to predict the economy. (Their performance over the last few years would seem to argue to the contrary.) For that matter, much the same sort of questions arise about VAR models, the leading alternatives to DSGEs. Despite the elaborate back-story about optimization, the form in which a DSGE is confronted with the data is a "state-space model," in which a latent (multivariate) Markov process evolves homogeneously in time, and observations are noisy functions of the state variables. VARs also have this form, as do dynamic factor models, and all the other leading macroeconomic time series models we know of. In every case, the response to perceived inadequacies of the models is to make them more complex.
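For concreteness, here is a minimal sketch of the common linear-Gaussian special case of such a state-space model; the notation is purely illustrative, not that of any particular DSGE or VAR:

\[ S_t = A S_{t-1} + \epsilon_t , \qquad Y_t = B S_t + \nu_t , \]

where \( S_t \) is the latent state vector, \( Y_t \) the vector of observed aggregates, A and B fixed matrices, and \( \epsilon_t, \nu_t \) mean-zero noise terms. More elaborate models replace the linear maps with general transition and observation functions, but keep the same structure of a latent Markov state plus noisy observations.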
The cases for and against different macroeconomic forecasting models are partly about economic theory, but also involve their ability to fit the data. Abstractly, these arguments have the form "It would be very unlikely that my model could fit the data well if it got the structure of the economy wrong; but my model does fit well; therefore I have good evidence that it is pretty much right." Assessing such arguments depends crucially on knowing how well bad models can fit limited amounts of data, which is where we feel we can make a contribution to this research.
Statistical learning theory grows out of advances in non-parametric statistical estimation and in machine learning. Its goal is to control the risk or generalization error of predictive models, i.e., their expected inaccuracy on new data from the same source as that used to fit the model. That is, if the model f predicts outcomes Y from inputs X, and the loss function is \( \ell(y, \hat{y}) \) (e.g., mean-squared error or negative log-likelihood), the risk of the model is

\[ R(f) = \mathbb{E}\left[ \ell(Y, f(X)) \right] . \]
However, economists, like other scientists, never have a single model with no adjustable parameters fixed for them in advance by theory. (Not even the most enthusiastic calibrators claim as much.) Rather, there is a class of plausible models \( \mathcal{F} \), one of which in particular is picked out by minimizing the in-sample loss --- by least squares, or maximum likelihood, or maximum a posteriori probability, etc. This means

\[ \hat{f} = \operatorname*{argmin}_{f \in \mathcal{F}} \hat{R}_n(f) , \qquad \hat{R}_n(f) \equiv \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(x_i)) , \]

where \( \hat{R}_n(f) \) is the in-sample or empirical risk.
Using more flexible models (allowing more general functional forms or distributions, adding parameters, etc.) has two contrasting effects. On the one hand, it improves the best possible accuracy, lowering the minimum of the true risk R(f). On the other hand, it also increases the ability to, as it were, memorize noise, raising the gap between the in-sample risk \( \hat{R}_n(\hat{f}) \) and the true risk \( R(\hat{f}) \) for any fixed sample size n. This qualitative observation --- a generalization of the bias-variance trade-off from basic estimation theory --- can be made usefully precise by quantifying the complexity of model classes. A typical result is a confidence bound on \( \sup_{f \in \mathcal{F}} \left| R(f) - \hat{R}_n(f) \right| \) (and hence on the over-fitting), saying that with probability at least \( 1 - \eta \),

\[ R(f) \leq \hat{R}_n(f) + \Delta(n, \eta, \mathcal{F}) \quad \text{for all } f \in \mathcal{F} , \]

where the penalty \( \Delta \) grows with the complexity of the class \( \mathcal{F} \) and shrinks with the sample size n.
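As a toy illustration of this trade-off (not part of the proposal; the data-generating process and model classes here are invented for the example), one can watch the in-sample loss of polynomial regressions fall as the degree grows, while the true risk eventually rises:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n, noise=0.3):
    """Toy data: a smooth signal plus Gaussian noise."""
    x = rng.uniform(-2, 2, size=n)
    y = np.sin(2 * x) + noise * rng.standard_normal(n)
    return x, y

x_train, y_train = simulate(30)        # small training sample
x_test, y_test = simulate(100_000)     # huge sample approximates the true risk

for degree in [1, 3, 5, 9, 15]:
    coef = np.polyfit(x_train, y_train, degree)   # least-squares fit
    in_sample = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    true_risk = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: in-sample MSE {in_sample:.3f}, "
          f"approx. true risk {true_risk:.3f}")
```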
Several inter-related model complexity measures are now available. The oldest, called "Vapnik-Chervonenkis dimension," effectively counts how many different data sets can be fit well by tuning the parameters in the model. Another, "Rademacher complexity," directly measures the ability of \( \mathcal{F} \) to correlate with finite amounts of white noise (Bartlett and Mendelson, 2002; Mohri and Rostamizadeh, 2009). This leads to particularly nice bounds of the form

\[ R(f) \leq \hat{R}_n(f) + 2 \rho_n(\mathcal{F}) + \sqrt{\frac{\log 1/\eta}{2n}} , \]

holding with probability at least \( 1 - \eta \) for all \( f \in \mathcal{F} \), where \( \rho_n(\mathcal{F}) \) is the Rademacher complexity of the class.
However we measure model complexity, once we have done so and have established
risk bounds, we can use those bounds for two purposes. One is to give a sound
assessment of how well our model will work in the future; this has clear
importance if the model's forecasts will be used to guide individual actions or
public policy. The other aim, perhaps even more important here, is to select
among competing models in a provably reliable way. Comparing in-sample
performance tends to pick complex models which over-fit. Adding heuristic
penalties based on the number of parameters, like the Akaike information
criterion (AIC), also does not solve the problem, basically because AIC
corrects for the average size of the over-fitting but ignores the variance (and higher moments). But if we could instead use the over-fitting \( R(\hat{f}) - \hat{R}_n(\hat{f}) \) itself as our penalty, we would select the model which actually will generalize better. If we only have a confidence limit on \( R(\hat{f}) - \hat{R}_n(\hat{f}) \), and use that as our penalty, we select the better model with high confidence and can in many cases calculate the extra risk that comes from model selection (Massart, 2007).
Statistical learning theory has proven itself in many practical applications, but most of its techniques have been developed in ways which keep us from applying it immediately to macroeconomic forecasting; we propose to rectify this deficiency. We anticipate that each of the three stages will require approximately a year. (More technical details follow below.)
First, we need to know the complexity of the model classes to which we wish to apply the theory. We have already obtained complexity bounds for AR(p) models, and are working to extend these results to VAR(p) models. Beyond this, we need to be able to calculate the complexity of general state-space models, where we plan to use the fact that distinct histories of the time series lead to different predictions only to the extent that they lead to different values of the latent state. We will then refine those results to find the complexity of various common DSGE specifications.
Second, most results in statistical learning theory presume that successive data points are independent of one another. This is mathematically convenient, but clearly unsuitable for time series. Recent work has adapted key results to situations where widely-separated data points are asymptotically independent ("weakly dependent" or "mixing" time series) [Meir, 2000; Mohri and Rostamizadeh, 2009, 2010; Dedecker et al., 2007]. Basically, knowing the rate at which dependence decays lets one calculate how many effectively-independent observations the time series has and apply bounds with this reduced, effective sample size. We aim to devise model-free estimates of these mixing rates, using ideas from copulas and from information theory. Combining these mixing-rate estimates with our complexity calculations will immediately give risk bounds for DSGEs, but not just for them.
Third, a conceptually simple and computationally attractive alternative to using learning theory to bound over-fitting is to use an appropriate bootstrap for dependent data to estimate generalization error. However, this technique currently has no theoretical basis, merely intuitive plausibility. We will investigate the conditions under which bootstrapping can yield non-asymptotic guarantees about generalization error.
Taken together, these results can provide probabilistic guarantees on a proposed forecasting model's performance. Such guarantees can give policy makers reliable empirical measures which intuitively explain the accuracy of a forecast. They can also be used to pick among competing forecasting methods.
As we said, there has been very little use of modern learning theory in economics (Al-Najjar, 2009 is an interesting, but entirely theoretical, exception), and none that we can find in macroeconomic forecasting. This is an undertaking which requires both knowledge of economics and of economic data, and skill in learning theory, stochastic processes, and prediction theory for state-space models. We aim to produce results of practical relevance to forecasting, and present them in such a way that econometricians, at least, can grasp their relevance.
If all we wanted to do was produce yet another DSGE, or even to improve the approximation methods used in DSGE estimation, there would be plenty of funding sources we could turn to, rather than INET. We are not interested in making those sorts of incremental advances (if indeed proposing a new DSGE is an "advance"). We are not even particularly interested in DSGEs. Rather, we want to re-orient how economic forecasters think about basic issues like evaluating their accuracy and comparing their models --- topics which should be central to empirical macroeconomics, even if DSGEs vanished entirely tomorrow. Thus INET seems like a much more natural sponsor than institutions with more of a commitment to existing practices and attitudes in economics.
[Detailed justification of our draft budget omitted]
In what follows, we provide a more detailed exposition of the technical content of our proposed work, including preliminary results. This is, unavoidably, rather more mathematical than our description above.
The initial work described here builds mainly on the work of Mohri and Rostamizadeh, 2009, which offers a handy blueprint for establishing data-dependent risk bounds which will be useful for macroeconomic forecasters. (Whether this bound is really optimal is another question we are investigating.) The bound that they propose has the general form

\[ R(f) \leq \hat{R}_n(f) + \rho_{n_{\mathrm{eff}}}(\mathcal{F}) + C \sqrt{\frac{\log 1/\eta}{n_{\mathrm{eff}}}} , \]

holding with probability at least \( 1 - \eta \) for all \( f \in \mathcal{F} \), where \( \rho \) is a complexity measure for the model class and \( n_{\mathrm{eff}} \) is an effective sample size, smaller than n, which discounts for the dependence in the data.
As mentioned earlier, statistical learning theory provides several ways of measuring the complexity of a class of predictive models. The results we are using here rely on what is known as the Rademacher complexity, which can be thought of as measuring how well the model can (seem to) fit white noise. More specifically, when we have a class \( \mathcal{F} \) of prediction functions f, the Rademacher complexity of the class is

\[ \rho_n(\mathcal{F}) = \mathbb{E}\left[ \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \right] , \]

where the \( \sigma_i \) are independent random signs, +1 or -1 with equal probability, independent of the \( X_i \). The idea, stripped of the technicalities required for actual implementation, is to see how well our models could seem to fit outcomes which were actually just noise. This provides a kind of baseline against which to assess the risk of over-fitting, or failing to generalize. As the sample size n grows, the sample correlation coefficients \( \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(X_i) \) will approach 0 for each particular f, by the law of large numbers; the over-all Rademacher complexity should also shrink, though more slowly, unless the model class is so flexible that it can fit absolutely anything, in which case one can conclude nothing about how well it will predict in the future from the fact that it performed well in the past.
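To make the definition concrete, here is a minimal Monte Carlo sketch of estimating the Rademacher complexity of a small, purely hypothetical model class (linear predictors with bounded slope on a fixed set of inputs); this is an illustration of the quantity being bounded, not code from the project:

```python
import numpy as np

rng = np.random.default_rng(1)

def rademacher_complexity(predictions, n_draws=2000):
    """Monte Carlo estimate of E[ sup_f (1/n) sum_i sigma_i f(X_i) ].

    `predictions` is an (n_models, n_points) array: each row holds one
    model's predictions f(X_i) on the same n_points inputs.
    """
    n_models, n_points = predictions.shape
    sups = np.empty(n_draws)
    for b in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n_points)   # white-noise signs
        # correlation of each model with the noise; take the best case
        sups[b] = np.max(predictions @ sigma) / n_points
    return sups.mean()

# Illustrative model class: f(x) = w * x with |w| <= 1, on fixed inputs
x = rng.standard_normal(50)
weights = np.linspace(-1, 1, 201)
preds = np.outer(weights, x)             # each row is one f in the class

print("estimated Rademacher complexity:", rademacher_complexity(preds))
```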
One of our goals is to calculate the Rademacher complexity of stationary state-space models. [Details omitted.]
Because time-series data are not independent, the number of data points n in a sample S is no longer a good characterization of the amount of information available in that sample. Knowing the past allows forecasters to predict future data points to some degree, so actually observing those future data points gives less information about the underlying data generating process than in the case of iid data. For this reason, the sample size term must be adjusted by the amount of dependence in the data to determine the effective sample size \( n_{\mathrm{eff}} \), which can be much less than the true sample size n. These sorts of arguments can be used to show that a typical data series used for macroeconomic forecasting, detrended growth rates of US GDP from 1947 until 2010, has around n=252 actual data points, but an effective sample size \( n_{\mathrm{eff}} \) which is far smaller. To determine the effective sample size to use, we must be able to estimate the dependence of a given time series. The necessary notion of dependence is called the mixing rate.
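For intuition about how much dependence can shrink the effective sample size, here is a rough illustration using the familiar autocorrelation-based formula from the Markov chain Monte Carlo literature; this is emphatically not the mixing-rate adjustment the bounds above require, just a simpler proxy that makes the same qualitative point:

```python
import numpy as np

def effective_sample_size(x, max_lag=50):
    """Rough ESS proxy: n / (1 + 2 * sum of positive autocorrelations).

    The usual MCMC heuristic, NOT the beta-mixing correction discussed
    in the text; it is here only as an intuition pump.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    acf = [1.0]
    for k in range(1, max_lag + 1):
        rho = np.dot(x[:-k], x[k:]) / np.dot(x, x)
        if rho <= 0:          # truncate at the first non-positive lag
            break
        acf.append(rho)
    return n / (1 + 2 * sum(acf[1:]))

# AR(1) example: the more persistent the series, the smaller the ESS
rng = np.random.default_rng(2)
for phi in [0.0, 0.5, 0.9]:
    z = np.empty(252)
    z[0] = rng.standard_normal()
    for t in range(1, 252):
        z[t] = phi * z[t - 1] + rng.standard_normal()
    print(f"phi = {phi}: ESS ~ {effective_sample_size(z):.0f} out of 252")
```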
Estimating the mixing rates of time-series data is a problem that has not
been well studied in the
literature. According
to Ron Meir, "as far as we are aware, there is no efficient practical
approach known at this stage for estimation of mixing parameters". In this
case, we need to be able to estimate a quantity known as
the \( \beta \)-mixing rate.
Definition. Let \( X = \{ X_t \}_{t=-\infty}^{\infty} \) be a stationary sequence of random variables or stochastic process with joint probability law \( P \). For \( -\infty \leq i \leq j \leq \infty \), let \( \sigma_i^j \) be the \( \sigma \)-field generated by the observations between those times, \( X_i, X_{i+1}, \ldots, X_j \). Let \( P_{-\infty}^{0} \) be the restriction of \( P \) to \( \sigma_{-\infty}^{0} \), with density \( p_{-\infty}^{0} \); let \( P_{k}^{\infty} \) be the restriction of \( P \) to \( \sigma_{k}^{\infty} \), with density \( p_{k}^{\infty} \); and let \( P_{0,k} \) be the restriction of \( P \) to \( \sigma\left( \sigma_{-\infty}^{0} \cup \sigma_{k}^{\infty} \right) \), with density \( p_{0,k} \). Then the \( \beta \)-mixing coefficient at lag k is

\[ \beta(k) = \frac{1}{2} \int \left| p_{0,k}(x^{-}, x^{+}) - p_{-\infty}^{0}(x^{-}) \, p_{k}^{\infty}(x^{+}) \right| \, dx^{-} \, dx^{+} , \]

the total variation distance between the joint distribution of the past \( x^{-} \) and the future-beyond-lag-k \( x^{+} \), and the product of their marginal distributions.
The stochastic process X is called "\( \beta \)-mixing" if \( \beta(k) \rightarrow 0 \) as \( k \rightarrow \infty \), meaning that the joint probability of events which are widely separated in time increasingly approaches the product of the individual probabilities --- that X is asymptotically independent.
The form of the definition of the \( \beta \)-mixing coefficient
suggests a straightforward though perhaps naive procedure: use nonparametric
density estimation for the two marginal distributions as well as the joint
distribution, and then calculate the total variation distance by numerical
integration. This would be simple in principle, and could give good results;
however, one would need to show not just that the procedure was consistent, but
also learn enough about it that the generalization error bound could be
properly adjusted to account for the additional uncertainty introduced by using
an estimate rather than the true quantity. Initial numerical experiments with
this naive approach are not promising, but we are pursuing a number of more refined
ideas.
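A bare-bones sketch of that naive plug-in procedure, simplified to pairs \( (X_t, X_{t+k}) \) of a scalar series rather than entire past and future trajectories, using off-the-shelf kernel density estimates and numerical integration on a grid; this is an illustration of the idea, not the refined estimators we are actually pursuing:

```python
import numpy as np
from scipy.stats import gaussian_kde

def naive_beta_coefficient(x, lag, grid_size=80):
    """Plug-in estimate of lag-`lag` dependence in a scalar series:
    total variation distance between the joint density of (X_t, X_{t+lag})
    and the product of the marginal densities.

    A drastic simplification of the true beta-mixing coefficient, which
    involves whole past and future trajectories, not single coordinates.
    """
    x = np.asarray(x, dtype=float)
    pairs = np.vstack([x[:-lag], x[lag:]])          # shape (2, n - lag)
    joint = gaussian_kde(pairs)
    marginal = gaussian_kde(x)
    grid = np.linspace(x.min(), x.max(), grid_size)
    du = grid[1] - grid[0]
    g1, g2 = np.meshgrid(grid, grid, indexing="ij")
    points = np.vstack([g1.ravel(), g2.ravel()])
    p_joint = joint(points)
    p_prod = marginal(g1.ravel()) * marginal(g2.ravel())
    return 0.5 * np.sum(np.abs(p_joint - p_prod)) * du * du

# Example: an AR(1) series; the estimated dependence should fade with the lag
rng = np.random.default_rng(3)
z = np.empty(1000)
z[0] = rng.standard_normal()
for t in range(1, 1000):
    z[t] = 0.8 * z[t - 1] + rng.standard_normal()
print([round(naive_beta_coefficient(z, k), 3) for k in (1, 5, 20)])
```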
While the bootstrap approach is intuitively plausible, there is, as yet, no theory which says that its results will actually control the generalization error. Deriving theoretical results for this type of bootstrap is the third component of our grant application.
Manual trackback: Economics Job Market Rumors [!]
Posted by crshalizi at December 02, 2010 12:55 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Minds, Brains, and Neurons; The Collective Use and Evolution of Concepts; Enigmas of Chance; Physics; Philosophy
Posted by crshalizi at November 30, 2010 23:59 | permanent link
Attention conservation notice: Yet another semi-crank pet notion, nursed quietly for many years, now posted, in the absence of new thoughts, because reading The Half-Made World brought it back to mind.
The Singularity has happened; we call it "the industrial revolution" or "the long nineteenth century". It was over by the close of 1918.
Exponential yet basically unpredictable growth of technology, rendering long-term extrapolation impossible (even when attempted by geniuses)? Check.
Massive, profoundly dis-orienting transformation in the life of humanity, extending to our ecology, mentality and social organization? Check.
Annihilation of the age-old constraints of space and time? Check.
Embrace of the fusion of humanity and machines? Check.
Creation of vast, inhuman distributed systems of information-processing, communication and control, "the coldest of all cold monsters"? Check; we call them "the self-regulating market system" and "modern bureaucracies" (public or private), and they treat men and women, even those whose minds and bodies instantiate them, like straw dogs.
An implacable drive on the part of those networks to expand, to entrain more and more of the world within their own sphere? Check. ("Drive" is the best I can do; words like "agenda" or "purpose" are too anthropomorphic, and fail to acknowledge the radical novelty and strangeness of these assemblages, which are not even intelligent, as we experience intelligence, yet ceaselessly calculating.)
Why, then, since the Singularity is so plainly, even intrusively, visible in our past, does science fiction persist in placing a pale mirage of it in our future? Perhaps: the owl of Minerva flies at dusk; and we are in the late afternoon, fitfully dreaming of the half-glimpsed events of the day, waiting for the stars to come out.
Manual trackback: Gearfuse; Random Walks; Text Patterns; The Daily Dish; The Slack Wire; Making Light (I am not worthy! Also, the Nietzsche quote is perfect); J. S. Bangs; Daily Grail; Crooked Timber; Peter Frase; Blogging the Hugo Winners; The Essence of Mathematics Is Its Freedom; der Augenblick; Monday Evening; The Duck of Minerva (appropriately enough)
Posted by crshalizi at November 28, 2010 11:00 | permanent link
Attention conservation notice: 900 words of wondering what the scientific literature would look like if it were entirely a product of publication bias. Veils the hard-won discoveries of actual empirical scientists in vague, abstract, hyper-theoretical doubts, without alleging any concrete errors. A pile of skeptical nihilism, best refuted by going back to the lab.
I have been musing about the following scenario for several years now, without ever getting around to doing anything with it. Since it came up in conversation last month between talks in New York, now seems like as good a time as any to get it out of my system.
Imagine an epistemic community that seeks to discover which of a large set of postulated phenomena actually happen. (The example I originally had in mind was specific foods causing or preventing specific diseases, but it really has nothing to do with causality, or observational versus experimental studies.) Let's build a stochastic model of this. At each time step, an investigator will draw a random candidate phenomenon from the pool, and conduct an appropriately-designed study. The investigator will test the hypothesis that the phenomenon exists, and calculate a p-value. Let's suppose that this is all done properly (no dead fish here), so that the p-value is uniformly distributed between 0 and 1 when the hypothesis is false and the phenomenon does not exist. The investigator writes up the report and submits it for publication.
What happens next depends on whether the phenomenon has entered the published literature already or not. If it has, the new p-value is allowed to be published. If it has not, the report is published if, and only if, the p-value is < 0.05. This is the "file-drawer problem": finding a lack of evidence for a phenomenon is publication-worthy only if people thought it existed.
The community combines the published p-values in some fashion — reasonably exact solutions to this problem were devised by R. A. Fisher and Karl Pearson in the 1930s, leading to Neyman's smooth test of goodness of fit, but I have been told by a psychologist that "of course" one should just use the median of the published p-values. Different rules of combination will lead to slightly different forms of this model.
The last assumption of the model is that, sadly, none of the phenomena the community is interested in exist. All of their null hypotheses are, strictly speaking, true. Just as neutral models of evolution are ones which have all sorts of evolutionary mechanisms except selection, this is a model of the scientific process without discovery. Since, by assumption, everyone does their calculations correctly and honestly, if we could look at all the published and unpublished p-values they'd be uniformly distributed between 0 and 1. But the first published p-value for any phenomenon is uniformly distributed between 0 and 0.05. A full 2% of initial announcements will have an impressive-seeming (nominal) significance level of 10^-3.
Of course, when people try to replicate those initial findings, their p-values will be distributed between 0 and 1. The joint distribution of p-values from the initial study and m attempts at replication will be a product of independent uniforms, one on [0, 0.05] and m of them on [0,1]. What follows from this will depend on the exact rule used to aggregate individual studies, and on doing some calculations I have never pushed through, so I will structure it as a series of "exercises for the reader".
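Here is a minimal simulation sketch of the model just described, using Fisher's method (\( -2 \sum_i \log p_i \) referred to a \( \chi^2 \) distribution with twice as many degrees of freedom as there are p-values) as one possible combination rule; the replication counts and number of simulation runs are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)

def published_pvalues(n_replications):
    """One phenomenon's published record under the file-drawer model.

    Every null hypothesis is true, so all p-values are Uniform(0,1);
    but the *first* published one is Uniform(0, 0.05), because initial
    non-significant reports stay in the file drawer.
    """
    first = rng.uniform(0, 0.05)                    # initial, selected report
    later = rng.uniform(0, 1, size=n_replications)  # replications, unselected
    return np.concatenate([[first], later])

def fisher_combined_p(pvals):
    """Fisher's method: -2 * sum(log p) ~ chi-squared with 2m d.o.f."""
    stat = -2 * np.sum(np.log(pvals))
    return chi2.sf(stat, df=2 * len(pvals))

# How often does the combined literature still look "significant"
# after m replications, even though nothing is real?
for m in [0, 1, 3, 10]:
    combined = [fisher_combined_p(published_pvalues(m)) for _ in range(5000)]
    frac = np.mean(np.array(combined) < 0.05)
    print(f"{m:2d} replications: fraction with combined p < 0.05 = {frac:.3f}")
```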
Let me draw the moral. Even if the community of inquiry is both too clueless to make any contact with reality and too honest to nudge borderline findings into significance, so long as they can keep coming up with new phenomena to look for, the mechanism of the file-drawer problem alone will guarantee a steady stream of new results. There is, so far as I know, no Journal of Evidence-Based Haruspicy filled, issue after issue, with methodologically-faultless papers reporting the ability of sheep's livers to predict the winners of sumo championships, the outcome of speed dates, or real estate trends in selected suburbs of Chicago. But the difficulty can only be that the evidence-based haruspices aren't trying hard enough, and some friendly rivalry with the plastromancers is called for. It's true that none of these findings will last forever, but this constant overturning of old ideas by new discoveries is just part of what makes this such a dynamic time in the field of haruspicy. Many scholars will even tell you that their favorite part of being a haruspex is the frequency with which a new sacrifice over-turns everything they thought they knew about reading the future from a sheep's liver! We are very excited about the renewed interest on the part of policy-makers in the recommendations of the mantic arts...
Update, later that same day: I meant to mention this classic paper on the file-drawer problem, but forgot because I was writing at one in the morning.
Update, yet later: sense-negating typo fixed, thanks to Gustavo Lacerda.
Manual trackback: Wolfgang Beirl; Matt McIrvin's Steam-Operated World of Yesteryear; Idiolect
Modest Proposals; Learned Folly; The Collective Use and Evolution of Concepts; Enigmas of Chance
Posted by crshalizi at November 16, 2010 01:30 | permanent link
This should be interesting:
As always, the seminar is free and open to the public, but I should probably add, considering the topic, that if you come and talk like a crazy person you will be ignored and/or mocked and rebuked.
Posted by crshalizi at November 10, 2010 11:00 | permanent link
Speaking of listening to my inner economist: does Gary Becker charge graduate students a premium for supervising their dissertations? If not, shouldn't he, especially given the unusually high "degree of elite solidarity and hierarchical control over the placement of ... graduate students" in economics?
(And while I'm thinking about this, why hasn't anyone built PhDMeatMarket.com yet?)
Posted by crshalizi at November 09, 2010 10:10 | permanent link
This is the undergraduate "advanced data analysis", not to be confused with the graduate projects course I'm teaching right now. Actually, they used to be much more similar, but due to the uncanny growth of the undergraduate major, I will have seventy or so students in 402, and all of them doing projects is more than we can cope with. (My inner economist says that the statistics department should leave the curriculum alone and just keep raising the threshold for passing our classes until the demand for being a statistics major balances the supply of faculty energy, as per Parkinson's "The Short List, or Principles of Selection", but fortunately no one listens to my inner economist.) So about a dozen will do projects in 36-490, as last year, and everyone will learn about methods.
Update, 15 November: The class webpage will be here. Also: this is the same class as 36-608; graduate students should register under the latter number.
Posted by crshalizi at November 08, 2010 16:20 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Cthulhiana; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Philosophy
Posted by crshalizi at October 31, 2010 23:59 | permanent link
Regular service will resume shortly after the AIStats deadline, but I wanted to mention one of the highlights of the trip, because it's only too apt for what I'll be doing in the meanwhile. When Danny Yee kindly showed me around Oxford, we not only got to see a fragment of the Difference Engine, but I also encountered the only technology for reliably obtaining correctly specified models. This will henceforth replace references to "the Oracle" or "the Regression Model Fairy" in my lectures; "for angels are very bright mirrors".
Now back to wrestling with the MSS.
Posted by crshalizi at October 31, 2010 11:41 | permanent link
Attention conservation notice: Of limited interest if you (1) will not be in Pittsburgh on Monday, or (2) do not use the Web.
One of the first things I have the students in data mining read is "Amazon.com Recommendations Understand Area Woman Better Than Husband", from America's finest news source. The topic for next week's seminar is how to harness the power of statistical modeling to make recommendation engines even more thoughtful, attentive, delightful and broad-minded (all qualities for which statisticians are, of course, especially noted in our personal lives).
As always, the seminar is free and open to the public.
Posted by crshalizi at October 14, 2010 19:57 | permanent link
I'll be traveling for much of the rest of the month to give talks.
Between traveling, and needing to revise (or, in two cases, write) my talks, I am going to be an even worse correspondent than usual.
Posted by crshalizi at October 09, 2010 09:50 | permanent link
Wise and well-traveled readers! Can anyone recommend a hotel in London which is (in decreasing priority) not too expensive, walking distance from 12 Errol Street, and convenient to public transport?
Update, 9 October: Thanks again to everyone who wrote with suggestions, and helped me find my hotel.
Posted by crshalizi at October 07, 2010 19:29 | permanent link
Our usual assumption in statistics is that the world is capricious and haphazard, but is not trying to fool us. When we are fooled, and fall into error, it is due to fluctuations and our own intemperance, not to malice. We carry this attitude over to machine learning; when our models over-fit, we think it an excess of optimism (essentially, the winner's curse). Theologically, we think of evil as an absence (the lack of infinite data), rather than an independent and active force. Theoretical computer scientists, however, are traditionally rather more Manichean. They have developed a school of machine learning which tries to devise algorithms which are guaranteed to do well no matter what data the Adversary throws at them, somewhat misleadingly known as "online learning". (A fine introduction to this is Prediction, Learning, and Games.) There turn out to be important connections between online and statistical learning, and one of the leading explorers of those connections is
(I know I got the Augustinian vs. Manichean learning bit from Norbert Wiener, but I cannot now find the passage.)
Posted by crshalizi at October 05, 2010 17:35 | permanent link
... that I get my eagerly-awaited copy of Red Plenty on the anniversary of the launch of Sputnik? Well yes actually of course it could be a coincidence. (Thanks to Henry Farrell for kindly procuring the book for me.)
Posted by crshalizi at October 04, 2010 13:55 | permanent link
Books to Read While the Algae Grow in Your Fur; The Pleasures of Detection; Writing for Antiquity; Enigmas of Chance; Afghanistan and Central Asia; Scientifiction and Fantastica
Posted by crshalizi at September 30, 2010 23:59 | permanent link
Something I have been meaning to post about is a series of papers by Terry Adams and Andrew Nobel, on the intersection of machine learning theory (in the form of Vapnik-Chervonenkis dimension and the Glivenko-Cantelli property) with stochastic processes, specifically ergodic theory. (arxiv:1010.3162; arxiv:1007.2964; arxiv:1007.4037) I am very excited by this work, which I think is extremely important for understanding learning from dependent data, and so very pleased to report that, this week, our seminar speaker is —
Through poor planning on my part, I have a prior and conflicting engagement, though a very worthwhile one.
Posted by crshalizi at September 27, 2010 15:30 | permanent link
My first Ph.D. student, Linqiao Zhao, jointly supervised with Mark Schervish, has just successfully defended her dissertation:
This is the culmination of years of hard work and determination on Linqiao's part. I'm very proud to have helped. Congratulations, Dr. Zhao!
Posted by crshalizi at September 24, 2010 11:30 | permanent link
Attention conservation notice: Only of interest if you are (1) in Pittsburgh on Monday and (2) care about the community discovery problem for networks, or general methods of statistical clustering.
As always, the talk is free and open to the public.
Posted by crshalizi at September 15, 2010 20:15 | permanent link
Posted by crshalizi at September 15, 2010 10:50 | permanent link
Attention conservation notice: Academics squabbling about abstruse points in social theory.
Chris Bertram, back from a conference where he heard Michael Tomasello talk about his interesting experiments on (in Bertram's words) "young children and other primates [supporting the view] that humans are hard-wired with certain pro-social dispositions to inform, help, share etc and to engage in norm-guided behaviour of various kinds", wonders about the implications of the fact that "work in empirical psychology and evolutionary anthropolgy (and related fields) doesn't — quelle surprise! — support anything like the Hobbesian picture of human nature that lurks at the foundations of microeconomics, rational choice theory and, indeed, in much contemporary and historical political philosophy."
Brad DeLong asserts that the microfoundations of economics point not to a Hobbesian vision of the war of all against all, but rather to Adam Smith's propensities for peaceful cooperation, especially through exchange. "The foundation of microeconomics is not the Hobbesian 'this is good for me' but rather the Smithian 'this trade is good for us,' and on the uses and abuses of markets built on top of the 'this trade is good for us' principle." Bertram objects that this isn't true, and others in DeLong's comments section further object that modern economics simply does not rest on this Smithian vision. DeLong replies: "Seems to me the normal education of an economist includes an awful lot about ultimatum games and rule of law these days..."
I have to call this one against DeLong — rather to my surprise, since I usually get more out of his writing than Bertram's. The fact is that the foundations of standard microeconomic models envisage people as hedonistic sociopaths [ETA: see below], and theorists prevent mayhem from breaking out in their models by the simple expedient of ignoring the possibility.
If you open up any good book on welfare economics or general equilibrium which has appeared since Debreu's Theory of Value (or indeed before), you will see a clear specification of what the economic agents care about: this is entirely a function of their own consumption of goods and services. Does any agent in any such model care at all about what any other agent gets to consume? No; it is a matter of purest indifference to them whether their fellows experience feast or famine; even whether they live or die. If one such agent has an unsatiated demand for potato chips, and the cost of one more chip will be to devastate innumerable millions, they simply are not equipped to care. (And the principle of Pareto optimality shrugs, saying "who are we to judge?") Arrow, Debreu and co. rule out by hypothesis any interaction between agents other than impersonal market exchange [ETA: or more exactly, their model does so], but the specification of the agents shows that they'd have no objection to pillage, or any preference for obtaining their consumption basket by peaceful truck, barter and commerce rather than fire, sword and fear.
Well, you might say, welfare economics and general equilibrium concern themselves with what happens once peaceful market systems have been established. Of course they don't need to put a "pillaging, not really my thing" term in the utility functions, since it would never come up. Surely things are better in game theory, which has long been seen to be the real microfoundations for economics?
In a word, no. If you ask why a von Neumann-Morgenstern agent refrains from pillaging, you get the answers that (1) the game is postulated not to have pillaging as an option, or (2) he is restrained by fear of some power stronger than himself, whether that power be an individual or an assembly. (Thus von Neumann: "It is just as foolish to complain that people are selfish and treacherous as it is to complain that the magnetic field does not increase unless the electric field has a curl.") Option (1) being obviously irrelevant to explaining why people obey the law, etc., we are left with option (2), which is the essence of all the leading attempts, within economics, to give microfoundations to such phenomena. This is very much in line with the thought of an eminent British moral philosopher — one can read the Folk Theorem as saying that Leviathan could be a distributed system — but that philosopher is not Dr. Smith.
One can defend the utility of the Hobbesian, game-theoretic vision, though in my humble (and long-standing) opinion the empirical results on things like the ultimatum game mean that it can be no more than an approximation useful in certain circumstances, and ideas like those of Tomasello (and Smith) need to be taken very seriously. But of course those ideas are not part of the generally-accepted microfoundations of economics. This is why every graduate student in economics reads (something equivalent to) Varian's Microeconomic Analysis, but not Bowles's Microeconomics: Behavior, Institutions, and Evolution; would that they did. If you read Bowles, you will in fact learn a great deal about the ultimatum game, the rule of law, and so forth; in a standard microeconomics text you will not. I think the Hobbesian vision is wrong, but anyone who thinks that modern economics's micro-foundations aren't thoroughly Hobbesian is engaged in wishful thinking.
Update, 15 September: A reader observes, correctly, that actual sociopaths show much more other-regarding preferences than does Homo economicus (typically, forms of cruelty). I could quibble and gesture to dissocial personality disorder, but point taken.
Update, 24 December: In the comments at DeLong's, Robert Waldmann rightly chides me for conflating the actual social views of Arrow and Debreu with what they put into their model of general equilibrium. I have updated the text accordingly.
Manual trackback: Stephen Kinsella; Marginal Utility; Marc Kaufmann; Brad DeLong; Contingencies
*: Varian wrote a book, with Carl Shapiro, giving advice to businesses in industries with imperfect competition. The advice is to (1) extract as much as possible from the customer, to the point where they just barely prefer doing business with you to switching to a competitor or taking their marbles and going home, (2) disguise how much you will extract from your customers as much as possible, (3) participate in standards-setting and public policy formation, so as to ensure that the standards and policies will be to your commercial advantage as much as possible, and (4) generally engage in as much anti-competitive behavior as possible without risk of legal consequences. All this may in fact be sound advice for increasing the (more or less short-run) profits of such firms, but the premises are purely Hobbesian. Were there no risk of legal consequences, their arguments would extend straightforwardly to pillaging. The only reason Shapiro and Varian would counsel Apple against, say, running a phishing scam on everyone who bought a Macintosh would be that it was very likely they'd be caught, with adverse consequences; obviously if Apple made enough money from such a scam, Shapiro and Varian's arguments would say not only "go phish", but "lobby to make such phishing legal" (perhaps under the principle of caveat emptor).
The Dismal Science; Philosophy; The Natural Science of the Human Species
Posted by crshalizi at September 13, 2010 13:00 | permanent link
Posted by crshalizi at September 08, 2010 13:20 | permanent link
As threatened, I'll post links to the paper being discussed each week in the statistical modeling seminar. This will happen after the discussion, and my own brief comments here will also not be shared with the students beforehand. This should be an RSS feed for this page.
Posted by crshalizi at September 07, 2010 16:49 | permanent link
[Capitalism] has created more massive and more colossal productive forces than have all preceding generations together. Subjection of Nature's forces to man, machinery, application of chemistry to industry and agriculture, steam-navigation, railways, electric telegraphs, clearing of whole continents for cultivation, canalisation of rivers, whole populations conjured out of the ground — what earlier century had even a presentiment that such productive forces slumbered in the lap of social labour?
Ghosts in the Hollow from Jim Lo Scalzo on Vimeo.
Posted by crshalizi at September 06, 2010 19:30 | permanent link
Attention conservation notice: Yet more cleaning out of to-be-blogged bookmarks, with links of a more technical nature than last time. Contains log-rolling promotion of work by friends, acquaintances, and senior colleagues.
Wolfgang Beirl raises an interesting question in statistical mechanics: what is " the current state-of-the-art if one needs to distinguish a weak 1st order phase transition from a 2nd order transition with lattice simulations?" (This is presumably unrelated to Wolfgang's diabolical puzzle-picture.)
Maxim Raginsky's new blog, The Information Structuralist. Jon Wilkins's new blog, Lost in Transcription. Jennifer Jacquet's long-running blog, Guilty Planet.
Larry Wasserman has started a new wiki for inequalities in statistics and machine learning; I contributed an entry on Markov's inequality. Relatedly: Larry's lecture notes for intermediate statistics, starting with Vapnik-Chervonenkis theory. (It really does make more sense that way.)
Sharad Goel on birds of a feather shopping together, on the basis of data set that sounds really quite incredible. "It's perhaps tempting to conclude from these results that shopping is contagious .... Though there is probably some truth to that claim, establishing such is neither our objective nor justified from our analysis." (Thank you!)
Mark Liberman on the Wason selection test. There is I feel something quite deep here for ideas that connect the meaning of words to their use, or, more operationally, test whether someone understands a concept by their ability to use it; but I'm not feeling equal to articulating this.
What it's like being a bipolar writer. What it's like being a schizophrenic neuroscientist (the latter via Mind Hacks).
The Phantom of Heilbronn, in which the combined police forces of Europe spend years chasing a female serial killer, known solely from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier. Draw your own morals for data mining and the national surveillance state. (Via arsyed on delicious.)
Herbert Simon and Paul Samuelson take turns, back in 1962, beating up on Milton Friedman's "Methodology of Positive Economics", an essay whose exquisite awfulness is matched only by its malign influence. (This is a very large scan of a xerox copy, from the CMU library's online collection of Simon's personal files.) Back in July, Robert Solow testified before Congress on "Building a Science of Economics for the Real World" (via Daniel McDonald). To put it in "shorter Solow" form: I helped invent macroeconomics, and let me assure you that this was not what we had in mind. Related, James Morley on DSGEs (via Brad DeLong).
This brings us to the paper-link-dump portion of the program.
And now, back to work.
Manual trackback: Beyond Microfoundations
Linkage; Enigmas of Chance; The Dismal Science; Minds, Brains, and Neurons; Physics; Networks; Commit a Social Science; Incestuous Amplification
Posted by crshalizi at September 04, 2010 11:05 | permanent link
Attention conservation notice: 1600+ dry, pedantic words and multiple equations on how some heterodox economists mis-understand ergodic theory.
Robert Vienneau, at Thoughts on Economics, has posted an example of a stationary but non-ergodic stochastic process. This serves as a reasonable prompt to follow up on my comment, a propos of Yves Smith's book, that the post-Keynesian school of economists seems to be laboring under a number of confusions about "ergodicity".
I hasten to add that there is nothing wrong with Vienneau's example: it is indeed a stationary but non-ergodic process. (In what follows, I have lightly tweaked his notation to suit my own tastes.) Time is indexed in discrete steps, and X_t = Y Z_t, where Z is a sequence of independent, mean-zero, variance 1 Gaussian random variables (i.e., standard discrete-time white noise), and Y is a chi-distributed random variable (i.e., the square root of something which has a chi-squared distribution). Z is transparently a stationary process, and Y is constant over time, so X must also be a stationary process. However, by simulation Vienneau shows that the empirical cumulative distribution functions from different realizations of the process do not converge on a common limit.
In fact, the result can be strengthened considerably. Given Y = y, X is just Gaussian white noise with standard deviation y, so by the Glivenko-Cantelli theorem, the empirical CDF of X converges almost surely on the CDF of that Gaussian. The marginal distribution of X_t for each t is however a mixture of Gaussians of different standard deviations, and not a Gaussian. Conditionally on Y, therefore, the empirical CDF converges to the marginal distribution of the stationary process with probability 0. Since this convergence has conditional probability zero for every value of y, it has probability zero unconditionally as well. So Vienneau's process very definitely fails to be ergodic.
(Proof of the unconditionality claim: Let C be the indicator variable for the empirical CDF converging to the marginal distribution. Then E[C | Y = y] = 0 for all y, but E[C] = E[E[C | Y]] = 0 by the law of total expectation.)
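To see the non-convergence concretely, here is a minimal simulation sketch in R (my own, not Vienneau's code); the helper one.path and the choice of 3 degrees of freedom for the chi distribution are arbitrary illustrations, not his settings.

```
# Minimal simulation of a process of the form X_t = Y Z_t: Gaussian white noise
# Z_t, rescaled by a chi-distributed Y that is drawn once per realization.
# Each realization's empirical CDF settles down, but to a realization-specific limit.
set.seed(42)
n <- 1e4
one.path <- function(df = 3) {
  Y <- sqrt(rchisq(1, df = df))   # chi-distributed scale, fixed for the whole path
  Y * rnorm(n)                    # Gaussian white noise, rescaled by Y
}
x1 <- one.path()
x2 <- one.path()
plot(ecdf(x1), main = "Two realizations, two limiting CDFs")
plot(ecdf(x2), add = TRUE, col = "red")
```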
Two things, however, are worth noticing. First, Vienneau's X process is a mixture of ergodic processes; second, which mixture component is sampled from is set once, at the beginning, and thereafter each sample path looks like a perfectly well-behaved realization of an ergodic process. These observations generalize. The ergodic decomposition theorem (versions of which go back as far as von Neumann's original work on ergodic theory) states that every stationary process is a mixture of processes which are both stationary and ergodic. Moreover, which ergodic component a sample path is in is an invariant of the motion — there is no mixing of ergodic processes within a realization. It's worth taking a moment, perhaps, to hand-wave about this.
Start with the actual definition of ergodic processes. Ergodicity is a property of the probability distribution for whole infinite sequences X = (X_1, X_2, ..., X_t, ...). As time advances, the dynamics chop off the initial parts of this sequence of random variables. Some sets of sequences are invariant under such "shifts" — constant sequences, for instance, but also many other more complicated sets. A stochastic process is ergodic when all invariant sets either have probability zero or probability one. What this means is that (almost) all trajectories generated by an ergodic process belong to a single invariant set, and they all wander from every part of that set to every other part — they are "metrically transitive". (Because: no smaller set with any probability is invariant.) From this follows Birkhoff's individual ergodic theorem, which is the basic strong law of large numbers for dependent data. If X is an ergodic process, then for any (integrable) function f, the average of f(X_t) along a sample path, the "time average" of f, converges to a unique value almost surely. So with probability 1, time averages converge to values characteristic of the ergodic process.
Now go beyond a single ergodic probability distribution. Two distributions are called "mutually singular" if one of them gives probability 1 to an event which has probability zero according to the other, and vice versa. Any two ergodic processes are either identical or mutually singular. To see this, realize that two distributions must give different expectation values to at least one function; otherwise they're the same distribution. Pick such a distinguishing function and call it f, with expectation values f_1 and f_2 under the two distributions. Well, the set of sample paths where

\frac{1}{n} \sum_{t=1}^{n} f(X_t) \rightarrow f_1

has probability 1 under the first measure, and probability 0 under the second. Likewise, under the second measure the time average is almost certain to converge on f_2, which almost never happens under the first measure. So any two ergodic measures are mutually singular.
This means that a mixture of two (or more) ergodic processes cannot, itself, be ergodic. But a mixture of stationary processes is stationary. So the stationary ergodic processes are "extremal points" in the set of all stationary processes. The convex hull of these extremal points is the set of stationary but non-ergodic processes which can be obtained by mixing stationary and ergodic processes. It is less trivial to show that every stationary process belongs to this family, that it is a mixture of stationary and ergodic processes, but this can indeed be done. (See, for instance, this beautiful paper by Dynkin.) Part of the proof shows that which ergodic component a stationary process's sample path is in does not change over time — ergodic components are themselves invariant sets of trajectories. The general form of Birkhoff's theorem thus has time averages converging to a random limit, which depends on the ergodic component the process started in. This can be shown even at the advanced undergraduate level, as in Grimmett and Stirzaker.
At this point, three notes seem in order.
I actually don't know whether the ergodic decomposition can extend beyond this, but I suspect not, since the defining condition for AMS is very close to a Cesaro-mean decay-of-dependence property which turns out to be equivalent to ergodicity, namely that, for any two sets A and B,

\lim_{n \rightarrow \infty} \frac{1}{n} \sum_{t=1}^{n} P(A \cap T^{-t}B) = P(A) P(B)

where T^{-t} are the powers of the back-shift operator (what time series econometricians usually write L), so that T^{-t}B are all the trajectories which will be in the set B in t time-steps. (See Lemma 6.7.4 in the first, online, edition of Gray, p. 148.) This means that, on average, the far future becomes unpredictable from the present.
As the last remark suggests, it is entirely possible for a process to be stationary and ergodic but to have sensitive dependence on initial conditions; this is generally the case for chaotic processes, which is why there are classic articles with titles like "The Ergodic Theory of Chaos and Strange Attractors". Chaotic systems rapidly amplify small perturbations, at least along certain directions, so they are subject to positive destabilizing feedbacks, but they have stable long-run statistical properties.
Going further, consider the sort of self-reinforcing urn processes which Brian Arthur and collaborators made famous as models of lock-in and path dependence. (Actually, in the classification of my old boss Scott Page, these models are merely state-dependent, and do not rise to the level of path dependence, or even of phat dependence, but that's another story.) These are non-stationary, but it is easily checked that, so long as the asymptotic response function has only a finite number of stable fixed points, they satisfy the definition of asymptotic mean stationarity given above. (I leave it as an exercise whether this remains true in a case like the original Polya urn model.) Hence they are mixtures of ergodic processes. Moreover, if we have only a single realization — a unique historical trajectory — then we have something which looks just like a sample path of an ergodic process, because it is one. ("[L]imiting sample averages will behave as if they were in fact produced by a stationary and ergodic system" — Gray, p. 235 of 2nd edition.) That this was just one component of a larger, non-ergodic model limits our ability to extrapolate to other components, unless we make strong modeling assumptions about how the components relate to each other, but so what?
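Here is a toy self-reinforcing urn in R, my own illustration rather than Arthur's exact model: the probability of adding a ball of a given color is a steep function of that color's current share (the urn.path helper and the gain parameter are my inventions), so there are two stable fixed points, and each run locks in to one of them early on.

```
# Toy self-reinforcing urn: new balls favor whichever color already dominates.
# Each run converges to one of the stable shares; which one is decided early
# and never changes, so every sample path on its own looks like a well-behaved
# realization of an ergodic process.
set.seed(7)
urn.path <- function(steps = 5000, gain = 10) {
  counts <- c(1, 1)                     # one ball of each color to start
  share <- numeric(steps)
  for (t in 1:steps) {
    p <- plogis(gain * (counts[1] / sum(counts) - 0.5))   # steep response function
    counts <- counts + (if (runif(1) < p) c(1, 0) else c(0, 1))
    share[t] <- counts[1] / sum(counts)
  }
  share
}
matplot(replicate(5, urn.path()), type = "l", lty = 1,
        xlab = "step", ylab = "share of color 1")
```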
I make a fuss about this because the post-Keynesians seem to have fallen into a number of definite errors here. (One may see these errors in e.g., Crotty's "Are Keynesian Uncertainty and Macrotheory Compatible?" [PDF], which however also has insightful things to say about conventions and institutions as devices for managing uncertainty.) It is not true that non-stationarity is a sufficient condition for non-ergodicity; nor is it a necessary one. It is not true that "positive destabilizing feedback" implies non-ergodicity. It is not true that ergodicity is incompatible with sensitive dependence on initial conditions. It is not true that ergodicity rules out path-dependence, at least not the canonical form of it exhibited by Arthur's models.
Update, 12 September: Fixed the embarrassing mis-spelling of Robert's family name in my title.
Manual trackback: Robert Vienneau; Beyond Microfoundations
Posted by crshalizi at September 01, 2010 11:50 | permanent link
The admirable Mason Porter, responding to a universal and critical demand, has started the Power Law Shop, celebrating my very favorite class of probability distributions in all the world. This is certainly the funniest thing to come out of the SAMSI complex networks workshop.
Manual trackback: The Monkey Cage; Structure and Strangeness; Quantum Chaotic Thoughts; Science after Sunclipse
Posted by crshalizi at September 01, 2010 10:10 | permanent link
Attention conservation notice: Clearing out my to-blog folder, limiting myself to stuff which isn't too technical and/or depressing.
The late Charles Tilly was, it appears, working on a world history of cities, states and trust networks when he died. The first chapter is online (open access), and makes me really regret that we'll never see the rest. It includes a truly marvelous depiction of the rise of the Mongol Empire, from Marco Polo:
Some time after the migration of the Tartars to [Karakorum], and about the year of our lord 1162, they proceeded to elect for their king a man who was named Chingis-khan, one of approved integrity, great wisdom, commanding eloquence, and eminent for his valour. He began his reign with so much justice and moderation, that he was beloved and revered as their deity rather than their sovereign; and the fame of his great and good qualities spreading over that part of the world, all the Tartars, however dispersed, placed themselves under his command. Finding himself thus at the head of so many brave men, he became ambitious of emerging from the deserts and wildernesses by which he was surrounded, and gave them orders to equip themselves with bows, and such other weapons as they were expert at using, from the habits of their pastoral life. He then proceeded to render himself master of cities and provinces; and such was the effect produced by his character for justice and other virtues, that wherever he went, he found the people disposed to submit to him, and to esteem themselves happy when admitted to his protection and favour.
John Emerson has a slightly different explanation: the culmination of a thousand years of increasingly sophisticated military rivalry in central Eurasia.
My hypothesis is that, for the last several decades during the twelfth century, northern China, Karakitai, the Silk Road between them, and the Mongolian and Manchurian hinterlands served as a pressure cooker or laboratory where strategy, tactics, and military organization were perfected during a period of constant warfare. The Jin Chinese fought against the Song Chinese and sometimes the Xixia or the Mongols, the Xixia fought against the Jin and the Mongols, the Mongols fought with the other two and with each other, and because they were busy with one another they put little pressure on the Karakitai farther west, who were able to concentrate on maintaining their hegemony in Central Asia.
The states in this zone (and the non-state Mongols) hardened up and improved their discipline, organization and skills during decades of practice wars, so that when Genghis Khan finally united the steppe, subjugated the Xixia, and neutralized the Jin (in part because Jin forces had been deserting to the Mongols), he had essentially won the military championship of the toughest league in the world, so that every army he met from then until the Mamluks in Egypt would be far inferior to his. When Genghis Khan gained control of this military high pressure zone, there was no one who could stop him. Furthermore, once Genghis Khan controlled a plurality of the steppe, there was a snowball effect when most of the remaining steppe peoples not allied to his enemies joined him (semi-voluntarily — the alternative was destruction).
Also from Emerson, a selection of Byzantine anecdotes. They really don't make political slanders like they used to, despite some people's best efforts.
Rajiv Sethi ponders The Astonishing Voice of Albert Hirschman; Steve Laniel reviews Exit, Voice, and Loyalty. As an application, consider the plight of would-be refugees from Facebook.
John Dewey writing on economics, economic policy and the financial collapse in 1932, under the rubric of "The Collapse of a Romance" (cached copy). Here Dewey sounds almost Austrian on the connection between uncertainty and the capitalist process — and accordingly condemns the latter as sheer gambling. (Cf.) This line was particularly nice: "Human imagination had never before conceived anything so fantastic as the idea that every individual is actuated in all his desires by an insight into just what is good for him, and that he is equipped with the sure foresight which will enable him to calculate ahead and get just what he is after."
Relatedly, my friend Chris Wiggins has been observed struggling to save at-risk youth.
Ken MacLeod on Apophatic atheology.
Fifteenth Century Peasant Romance Comics. (Hark, a Vagrant is generally a treasure.)
Ta-Nehisi Coates schools the Freakonomics crowd in the concept of "sample selection bias".
Kalashnikov wanted to be a poet; but war was interested in him.
A visual history of lolcats since the 1800s.
Jordan Ellenberg on math in the age of Romanticism.
Becoming death, destroyer of mosquito worlds. How termites evolved from cockroach-like insects (not to be read while eating).
"This is why I'll never be an adult" is scarily perceptive --- "Internet FOREVER!", indeed (via unfogged). While on the subject of moral psychology, how to keep someone with you forever (via Edge of the American West).
Cool data-mining tricks for academic libraries. Via Magistra et Mater, seen elsewhere connecting Carolingian texts and social media.
Canadian engineers are much stranger than you'd think.
Oleg Grabar on the history of images of Muhammad in Islamicate culture (via Laila Lalami).
Akhond of Swat on "Ideas of India" and The Reading Life of Gandhi, Ambedkar and Nehru.
Southern literature, objectively defined and measured by Jerry Leath Mills:
My survey of around thirty prominent twentieth-century southern authors has led me to conclude, without fear of refutation, that there is indeed a single, simple, litmus-like test for the quality of southernness in literature, one easily formulated into a question to be asked of any literary text and whose answer may be taken as definitive, delimiting, and final. The test is: Is there a dead mule in it? As we shall see, the presence of one or more specimens of Equus caballus x asinus (defunctus) constitutes the truly catalytic element, the straw that stirs the strong and heady julep of literary tradition in the American South.
Jessa Crispin on the pleasures of reading about polar travel, while nowhere near the poles.
"Having a world unfold in one's head is the fundamental SF experience." (Pretty much everything Jo Walton writes is worth reading.)
Bruce Sterling on zombie romance: "Paranormal Romance is a tremendous, bosom-heaving, Harry-Potter-sized, Twilight-shaped commercial success. It sorta says everything about modern gender relations that the men have to be supernatural. It also says everything about humanity that we're so methodically training ourselves to be intimate partners of entities that aren't human."
The Demon-haunted world, or, the past and future of practical city magic.
Manual trackback: The Monkey Cage
Update, 4 September: fixed typos and accidentally-omitted link.
Linkage; Writing for Antiquity; The Commonwealth of Letters; Afghanistan and Central Asia; Scientifiction and Fantastica
Posted by crshalizi at September 01, 2010 09:50 | permanent link
Books to Read While the Algae Grow in Your Fur; The Pleasures of Detection; Commit a Social Science; The Dismal Science; Biology; Mathematics; Scientifiction and Fantastica
Posted by crshalizi at August 31, 2010 23:59 | permanent link
Once again, the Santa Fe Institute is hiring post-docs. Once again, for sheer concentrated intellectual stimulation — not to mention views like this from your office window — there is no better position for an independent-minded young scientist with interdisciplinary interests. The official announcement follows:
The Omidyar Postdoctoral Fellowship at the Santa Fe Institute offers you:
- unparalleled intellectual freedom
- transdisciplinary collaboration with leading researchers worldwide
- up to three years in residence in Santa Fe, NM
- discretionary research and collaboration funds
- individualized mentorship and preparation for your next leadership role
- an intimate, creative work environment with an expansive sky
The Omidyar Fellowship at the Santa Fe Institute is unique among postdoctoral appointments. The Institute has no formal programs or departments. Research is collaborative and spans the physical, natural, and social sciences. Most research is theoretical and/or computational in nature, although it may include an empirical component. SFI typically has 15 Omidyar Fellows and postdoctoral researchers, 15 resident faculty, 95 external faculty, and 250 visitors per year. Descriptions of the research themes and interests of the faculty and current Fellows can be found at http://www.santafe.edu/research.
Requirements:
- a Ph.D. in any discipline (or expect to receive one by September 2011)
- an exemplary academic record
- a proven ability to work independently and collaboratively
- a demonstrated interest in multidisciplinary research
- evidence of the ability to think outside traditional paradigms
Applications are welcome from:
- candidates from any country
- candidates from any discipline
- women and minorities, who are especially encouraged to apply
The Santa Fe Institute is an Equal Opportunity Employer.
Deadline: 1 November 2010
To apply: www.santafe.edu. We accept online applications ONLY.
Inquiries: email to ofellowshipinfo at santafe dot edu
The Santa Fe Institute is a private, independent, multidisciplinary research and education center founded in 1984. Since its founding, SFI has devoted itself to creating a new kind of scientific research community, pursuing emerging synthesis in science. Operating as a visiting institution, SFI seeks to catalyze new collaborative, multidisciplinary research; to break down the barriers between the traditional disciplines; to spread its ideas and methodologies to other institutions; and to encourage the practical application of its results.
The Omidyar Fellowship at the Santa Fe Institute is made possible by a generous gift from Pam and Pierre Omidyar.
Posted by crshalizi at August 31, 2010 20:00 | permanent link
The students are just starting on their projects, so, rather than say anything of substance, I try to extract the rational kernel from the traditional shell of the practices by which our cultural formation strives to reproduce itself. (Background.)
Posted by crshalizi at August 26, 2010 12:08 | permanent link
Every human relationship is a unique and precious snowflake, but do we treat them that way when we model them mathematically? No. No we do not. Join us next week to hear not just why this is wrong, but what to do instead. As always, the seminar is free and open to the public.
Let me add that Prof. Blitzstein will be visiting us from the Bible college of a prophecy-obsessed, theocratic Puritan cult clinging to the rudiments of civilization in a plague-blasted post-apocalyptic wasteland*, so I expect a good turn out to show him how we do these things around here.
*: No, really.
Manual trackback (!): The Inverse Square
Posted by crshalizi at August 19, 2010 16:30 | permanent link
I will not be teaching data mining this fall; 36-350 is being taken over this year by my friend and mentor Chris Genovese. Instead, I will be teaching 36-757 (if you'd be interested, you're already in it*), and co-teaching 36-835 with Rob Kass. Here's the announcement for the latter:
If there's interest, I'll post the reading list. Our first paper will definitely be Breiman's "Statistical Modeling: The Two Cultures" (Statistical Science 16 (2001): 199--231).
Update, 26 August: handouts for 757, which may be of broader interest.
Update, 7 September: There was interest in the 835 reading list.
*: This is the first half of "advanced data analysis", a year-long project our doctoral students do on analyzing data provided by an outside investigator, under the supervision of a faculty member. ADA culminates in the student presenting their findings in written and oral form, which serves as one of their three qualifying exams. The goal is to solve genuine scientific questions, not (or not just) to use the most shiny methodological toys. If you have some real-world data which need to be analyzed, and which seem like they might benefit from the attention of a very smart statistics graduate student, please get in touch. (I promise nothing.)
Posted by crshalizi at August 17, 2010 14:57 | permanent link
Books to Read While the Algae Grow in Your Fur; Writing for Antiquity; Afghanistan and Central Asia; Scientifiction and Fantastica; The Pleasures of Detection; The Beloved Republic; Enigmas of Chance
Posted by crshalizi at July 31, 2010 23:59 | permanent link
Attention conservation notice: 500 words on a student's thesis proposal, combining all the thrills of macroeconomic forecasting with the stylish vivacity of statistical learning theory. Even if you care, why not check back in a few years when the work is further along?
Daniel McDonald is writing his thesis, under the joint supervision of Mark Schervish and myself. I can use the present participle, because on Thursday he successfully defended his proposal:
Some of you may prefer the slides (note that Daniel is using DeLong's reduction of DSGEs to D2 normal form), or an even more compact visual summary:
Most macroeconomic forecasting models are, or can be turned into, "state-space models". There's an underlying state variable or variables, which evolves according to a nice Markov process, and then what we actually measure is a noisy function of the state; given the current state, future states and current observations are independent. (Some people like to draw a distinction between "state-space models" and "hidden Markov models", but I've never seen why.) The calculations can be hairy, especially once you allow for nonlinearities, but one can show that maximum likelihood estimation, as well as various regularized versions of it, has all the nice asymptotic properties one could want.
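To fix ideas, here is a minimal state-space sketch in R, which has nothing to do with Daniel's actual models: a latent AR(1) state observed through Gaussian noise. Since an AR(1) state plus white observation noise is, observationally, an ARMA(1,1), a likelihood-based fit can be had from plain arima(); the parameter values below are arbitrary.

```
# Minimal linear-Gaussian state-space model: latent AR(1) state, noisy observations.
set.seed(2)
n <- 252                                  # about the length of the post-war quarterly record
phi <- 0.9                                # state persistence (chosen arbitrarily)
state <- as.numeric(arima.sim(model = list(ar = phi), n = n))
y <- state + rnorm(n)                     # what we actually measure
arima(y, order = c(1, 0, 1))              # likelihood-based fit of the implied ARMA(1,1)
```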
Asymptotic statistical theory is, of course, useless for macroeconomics. Or rather: if our methods weren't consistent even with infinite data, we'd know we should just give up. But if the methods only begin to give usably precise answers when the number of data points gets over 10^24, we should give up too. Knowing that things could work with infinite data doesn't help when we really have 252 data points, and serial dependence shrinks the effective sample size to about 12 or 15. The wonderful thing about modern statistical learning theory is that it gives non-asymptotic results, especially risk bounds that hold at finite sample sizes. This is, of course, the reason why ergodic theorems, and the correlation time of US GDP growth rates, have been on my mind recently. In particular, this is why we are thinking about ergodic theorems which give not just finite-sample bounds (like the toy theorem I posted about), but can be made to do so uniformly over whole classes of functions, e.g., the loss functions of different macro forecasting models and their parameterizations.
Anyone wanting to know how to deal with non-stationarity is reminded that Daniel is proposing a dissertation in statistics, and not a solution to the problem of induction.
Enigmas of Chance; The Dismal Science; Incestuous Amplification
Posted by crshalizi at July 26, 2010 15:30 | permanent link
Attention conservation notice: A consideration of social banditry as a tool of climate-change policy. Sadly, this mockery apparently has about as much chance of actually helping as does action by the world's leading democracy.
Only on Unfogged would the comments on a post about visual penis jokes turn to a discussion of what, if anything, civil disobedience could do about climate change; but they did.
One of the goals of classic civil disobedience is to make maintaining an unjust institution costly, though I'm not sure how often it is put in these terms. Ordinarily, those who are disadvantaged or subordinated by a prevailing institution go along with it: they follow its norms and conventions without having to be forced. Whether because they accept those norms, or because they reasonably fear the retaliation that would come if they flouted them, makes little difference. This makes maintaining the injustice a good deal for the oppressors: not only do they get the immediate benefits of the institution, they don't have to expend a lot of effort maintaining it. Mass civil disobedience disrupts this state of affairs. Even if the oppressors can live with the evidence of seeing that they are, in fact, the kind of people who will engage in brutality to retain their privileges, the time policemen spend working over Sunday-school teachers, etc., is time they do not spend patrolling the streets, catching burglars, etc. Mass civil disobedience, especially if prolonged, raises the cost of perpetuating injustice. The implicit challenge to Pharaoh is: "Are you really willing to pay what it takes to keep us in bondage?"
What does this suggest when it comes to climate change? Burning fossil fuels is not an act with any intrinsic moral significance. The trouble with it is that my burning those fuels inflicts costs on everyone else, and there is no mechanism, yet, for bringing those costs home to me, the burner. The issue is not one of unjust institutions, but of an unpriced externality. The corresponding direct action, therefore, is not making oppressors actually enforce their institutions, but internalizing the externality. I envisage people descending on oil refineries, coal mines, etc., and forcing the operators to hand over sums proportional to the greenhouse-gas contribution of their sales. What happened to the money afterwards would be a secondary consideration at best (though I wouldn't recommend setting it on fire). The situation calls not for civil disobedience but for social carbon banditry.
Of course, to really be effective, the banditry would need to be persistent, universal, and uniform. Which is to say, the banditry has to become a form of government again, if not necessarily a part of the state.
Posted by crshalizi at July 26, 2010 14:30 | permanent link
Attention conservation notice: Only of interest if you are (1) in Pittsburgh next Tuesday, and (2) care about statistical network modeling and community discovery. Also, the guest is a friend, collaborator and mentor; but, despite his undiscriminating taste in acquaintances, an excellent speaker and scientist.
Usually, during the summer the CMU statistics seminar finds a shaded corner and drowses through the heat, with no more activity than an occasional twitch of its tail. Next week, however, it rouses itself for an exceptional visitor:
As usual, the seminar is free and open to the public.
Posted by crshalizi at July 09, 2010 14:33 | permanent link
"They'd ask me, 'Raf, what abut this Revolution of yours? What kind of world are you really trying to give us?' I've had a long time to consider that question.""And?"
"Did you ever hear the Jimi Hendrix Rendition of 'The Star-Spangled Banner'?"
Starlitz blinked. "Are you kidding? That cut still moves major product off the back catalog."
"Next time, really listen to that piece of music. Try to imagine a country where that music truly was the national anthem. Not weird, not far-out, not hip, not a parody, not a protest against some war, not for young Yankees stoned on some stupid farm in New York. Where music like that was social reality. That is how I want people to live...."
[Bruce Sterling, A Good Old-Fashioned Future, pp. 104--105]
"I wasn't born in America. In point of fact, I wasn't even born. But I work for our government because I believe in America. I happen to believe that this is a unique society. We have a unique role in the world."Oscar whacked the lab table with an open hand. "We invented the future! We built it! And if they could design or market it a little better than we could, then we just invented something else more amazing yet. If it took imagination, we always had that. If it took enterprise, we always had it. If it took daring and even ruthlessness, we had it — we not only built the atomic bomb, we used it! We're not some crowd of pious, sniveling, red-green Europeans trying to make the world safe for boutiques! We're not some swarm of Confucian social engineers who would love to watch the masses chop cotton for the next two millennia! We are a nation of hands-on cosmic mechanics!"
"And yet we're broke," Greta said.
[Bruce Sterling, Distraction, p. 90]
Posted by crshalizi at July 03, 2010 22:30 | permanent link
Attention conservation notice: Equation-filled attempt at a teaching note on some theorems in mathematical probability and their statistical application. (Plus an oblique swipe at macroeconomists.)
The "law of large numbers" says that averages of measurements calculated over increasingly large random samples converge on the averages calculated over the whole probability distribution; since that's a vague statement, there are actually several laws of large numbers, from the various ways of making this precise. As traditionally stated, they assume that the measurements are all independent of each other. Successive observations from a dynamical system or stochastic process are generally dependent on each other, so the laws of large numbers don't, strictly, apply, but they have analogs, called "ergodic theorems". (Blame Boltzmann.) Laws of large numbers and ergodic theorems are the foundations of statistics; they say that sufficiently large samples are representative of the underlying process, and so let us generalize from training data to future or currently-unobserved occurrences.
Here is the simplest route I know to such a theorem; I can't remember if I learned it from Prof. A. V. Chubukov's statistical mechanics class, or from Uriel Frisch's marvellous Turbulence. Start with a sequence of random variables X_1, X_2, ..., X_n. Assume that they all have the same (finite) mean m and the same (finite) variance v; also assume that the covariance, E[X_t X_{t+h}] - E[X_t] E[X_{t+h}], depends only on the difference in times h and not on the starting time t. (These assumptions together comprise "second-order" or "weak" or "wide-sense" stationarity. Stationarity is not actually needed for ergodic theorems, one can get away with what's called "asymptotic mean stationarity", but stationarity simplifies the presentation here.) Call this covariance c_h. We contemplate the arithmetic mean of the first n values in X, called the "time average":

A_n = \frac{1}{n} \sum_{t=1}^{n} X_t
What is the expectation value of the time average? Taking expectations is a linear operator, so

E[A_n] = \frac{1}{n} \sum_{t=1}^{n} E[X_t] = m
So the time average has the right expectation value; the question is in what sense, if any, it converges on m. The most obvious sense we could try is for the variance of A_n to shrink as n grows. Let's work out that variance, remembering that for any random variable Y, Var[Y] = E[Y^2] - (E[Y])^2:

Var[A_n] = E[A_n^2] - m^2 = \frac{1}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} E[X_s X_t] - m^2 = \frac{1}{n^2} \sum_{s=1}^{n} \sum_{t=1}^{n} c_{t-s}
This used the linearity of expectations, and the definition of the covariances c_h. Imagine that we write out all the covariances in an n*n matrix, and average them together; that's the variance of A_n. The entries on the diagonal of the matrix are all c_0 = v, and the off-diagonal entries are symmetric, because (check this!) c_{-h} = c_h. So the sum over the whole matrix is the sum on the diagonal, plus twice the sum of what's above the diagonal:

Var[A_n] = \frac{v}{n} + \frac{2}{n^2} \sum_{h=1}^{n-1} (n-h) c_h
If the X_t were uncorrelated, we'd have c_h = 0 for all h > 0, so the variance of the time average would be O(n^{-1}). Since independent random variables are necessarily uncorrelated (but not vice versa), we have just recovered a form of the law of large numbers for independent data. How can we make the remaining part, the sum over the upper triangle of the covariance matrix, go to zero as well?
We need to recognize that it won't automatically do so. The assumptions we've made so far are compatible with a process where X_1 is chosen randomly, and then all subsequent observations are copies of it, so that then the variance of the time average is v, no matter how long the time series; this is the famous problem of checking a newspaper story by reading another copy of the same paper. (More formally, in this situation c_h = v for all h, and you can check that plugging this in to the equations above gives v for the variance of A_n for all n.) So if we want an ergodic theorem, we will have to impose some assumption on the covariances, one weaker than "they are all zero" but strong enough to exclude the sequence of identical copies.
Using two inequalities to put upper bounds on the variance of the time average suggests a natural and useful assumption which will give us our ergodic theorem. First, no covariance can be bigger in magnitude than the variance: |c_h| \leq c_0 = v, by the Cauchy-Schwarz inequality. Second, n - h \leq n. The natural assumption, then, is that the covariances are absolutely summable, so that the "correlation time" T = \frac{1}{v} \sum_{h=1}^{\infty} |c_h| is finite. Returning to the variance of the time average,

Var[A_n] = \frac{v}{n} + \frac{2}{n^2} \sum_{h=1}^{n-1} (n-h) c_h \leq \frac{v}{n} + \frac{2}{n} \sum_{h=1}^{n-1} |c_h| \leq \frac{v}{n} (1 + 2T)
From knowing the variance, we can get rather tight bounds on the probability of A_n's deviations from m if we assume that the fluctuations are Gaussian. Unfortunately, none of our assumptions so far entitle us to assume that. For independent data, we get Gaussian fluctuations of averages via the central limit theorem, and these results, too, can be extended to dependent data. But the assumptions needed for dependent central limit theorems are much stronger than merely a finite correlation time. What needs to happen, roughly speaking, is that if I take (nearly) arbitrary functions f and g, the correlation between f(X_t) and g(X_{t+h}) must go to zero as h grows. (This idea is quantified as "mixing" or "weak dependence".)
However, even without the Gaussian assumption, we can put some bounds on deviation probabilities by bounding the variance (as we have) and using Chebyshev's inequality:

P(|A_n - m| \geq \epsilon) \leq \frac{Var[A_n]}{\epsilon^2} \leq \frac{v(1+2T)}{n \epsilon^2}
Reverting to the case of finite correlation time T, observe that we have the same variance from n dependent samples as we would from n/(1+2T) independent ones. One way to think of this is that the dependence shrinks the effective sample size by a factor of 2T+1. Another, which is related to the name "correlation time", is to imagine dividing the time series up into blocks of that length, i.e., a central point and its T neighbors in either direction, and use only the central points in our averages. Those are, in a sense, effectively uncorrelated. Non-trivial correlations extend about T time-steps in either direction. Knowing T can be very important in figuring out how much actual information is contained in your data set.
To give an illustration not entirely at random, quantitative macroeconomic modeling is usually based on official statistics, like GDP, which come out quarterly. For the US, which is the main but not exclusive focus of these efforts, the data effectively start in 1947, as what national income accounts exist before then are generally thought too noisy to use. Taking the GDP growth rate series from 1947 to the beginning of 2010, 252 quarters in all, de-trending, I calculate a correlation time of just over ten quarters. (This granting the economists their usual, but absurd, assumption that economic fluctuations are stationary.) So macroeconomic modelers effectively have 11 or 12 independent data points to argue over.
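For the curious, the recipe looks something like the following R sketch, run here on a simulated AR(1) stand-in rather than on the actual GDP series; summing the sample autocorrelations out to a fixed lag is a crude estimate of the correlation time, but it conveys the idea, and the numbers below are purely illustrative.

```
# Rough recipe for a correlation time and effective sample size, illustrated on
# a simulated AR(1) series standing in for the de-trended growth rates.
set.seed(1)
n <- 252
x <- arima.sim(model = list(ar = 0.8), n = n)
rho <- acf(x, lag.max = 50, plot = FALSE)$acf[-1]   # sample autocorrelations, lags 1..50
T.hat <- sum(abs(rho))                              # crude correlation time, in time-steps
n.eff <- n / (1 + 2 * T.hat)                        # effective number of independent points
c(correlation.time = T.hat, effective.n = n.eff)
```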
Constructively, this idea leads to the mathematical trick of "blocking". To extend a result about independent random sequences to dependent ones, divide the dependent sequence up into contiguous blocks, but with gaps between them, long enough that the blocks are nearly independent of each other. One then has the IID result for the blocks, plus a correction which depends on how much residual dependence remains despite the filler. Picking an appropriate combination of block length and spacing between blocks keeps the correction small, or at least controllable. This idea is used extensively in ergodic theory (including the simplest possible proof of the strong ergodic theorem) and information theory (see Almost None again), in proving convergence results for weakly dependent processes, in bootstrapping time series, and in statistical learning theory under dependence.
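As a small illustration of the bootstrap version of the trick, here is a bare-bones circular block bootstrap for the standard error of a time average; block.boot.mean is my own sketch, not from any package, and the block length is picked by eye rather than by any of the optimality results alluded to above.

```
# Bare-bones circular block bootstrap for the standard error of a time average.
block.boot.mean <- function(x, block.len, B = 1000) {
  n <- length(x)
  n.blocks <- ceiling(n / block.len)
  replicate(B, {
    starts <- sample.int(n, n.blocks, replace = TRUE)
    idx <- as.vector(sapply(starts, function(s) (seq(s, length.out = block.len) - 1) %% n + 1))
    mean(x[idx[1:n]])                     # resampled series, trimmed back to length n
  })
}
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 252))
sd(block.boot.mean(x, block.len = 20))    # compare to sd(x)/sqrt(252), which ignores dependence
```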
Manual trackback: An Ergodic Walk (fittingly enough); Thoughts on Economics
Update, 7 August: Fixed typos in equations.
Posted by crshalizi at July 02, 2010 13:40 | permanent link
The last post was really negative; to cleanse the palate, look at the Sloth Sanctuary of Costa Rica, dedicated to rescuing orphaned and imperiled sloths.
(Via Environmental Grafitti, via Matthew Berryman, and with thanks to John Emerson)
Posted by crshalizi at July 01, 2010 14:40 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Minds, Brains, and Neurons; The Continuing Crises; Writing for Antiquity; The Dismal Science; The Commonwealth of Letters; Learned Folly
Posted by crshalizi at June 30, 2010 23:59 | permanent link
Attention conservation notice: Over 2500 words on how a psychologist who claimed to revolutionize aesthetics and art history would have failed undergrad statistics. With graphs, equations, heavy sarcasm, and long quotations from works of intellectual history. Are there no poems you could be reading, no music you could be listening to?
I feel I should elaborate my dismissal of Martindale's The Clockwork Muse beyond a mere contemptuous snarl.
The core of Martindale's theory is this. Artists, and still more consumers of art, demand novelty; they don't just want the same old thing. (They have the same old thing.) Yet there is also a demand, or a requirement, to stay within the bounds of a style. Combining this with a notion that coming up with novel ideas and images requires "regressing" to "primordial" modes of thought, he concludes
Each artist or poet must regress further in search of usable combinations of ideas or images not already used by his or her predecessors. We should expect the increasing remoteness or strangeness of similes, metaphors, images, and so on to be accompanied by content reflecting the increasingly deeper regression toward primordial cognition required to produce them. Across the time a given style is in effect, we should expect works of art to have content that becomes increasingly more and more dreamlike, unrealistic, and bizarre.
Eventually, a turning point to this movement toward primordial thought during inspiration will be reached. At that time, increases in novelty would be more profitably attained by decreasing elaboration — by loosening the stylistic rules that govern the production of art works — than by attempts at deeper regression. This turning point corresponds to a major stylistic change. ... Thus, amount of primordial content should decline when stylistic change occurs. [pp. 61--64, his emphasis; the big gap corresponds to some pages of illustrations, and not me leaving out a lot of qualifying text]
Reference to actual work in cognitive science on creativity, both theoretical and experimental (see, e.g., Boden's review contemporary with Martindale's work), is conspicuously absent. But who knows, maybe his uncritical acceptance of these sub-Freudian notions has led in some productive direction; let us judge them by their fruits.
Here is Martindale's Figure 9.1 (p. 288), supposedly showing the amount of "primordial content" in Beethoven's musical compositions from 1795 through 1826, or rather a two-year moving average of this.
Now, here is the figure which was, so help me, the second run of some R code I wrote.
What is going on here? All of the apparent structure revealed in Martindale's analysis is actually coming from his having smoothed his data, from having taken the two-year moving average. Remarkably enough, he realized that this could lead to artifacts, but brushed the concern aside:
One has to be careful in dealing with smoothed data. The smoothing by its very nature introduces some autocorrelation because the score for one year is in part composed of the score for the prior year. However, autocorrelations introduced by smoothing are positive and decline regularly with increased lags. That is not at all what we find in the case of Beethoven — or in other cases where I have used smoothed data. The smoothing is not creating correlations where none existed; it is magnifying patterns already in the data. [p. 289]
What this passage reveals is that Martindale did not understand the difference between the autocorrelation function of a time series, and the coefficients of an autoregressive model fit to that time series. (Indeed I suspect he did not understand the difference between correlation and regression coefficients in general.) The autoregressive coefficients correspond, much more nearly, to the partial autocorrelation function, and the partial autocorrelations which result from applying a moving average to white noise have alternating signs — just like Martindale's do. In fact, the coefficients he got are entirely typical of what happens when his procedure is applied to white noise:
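For instance, here is a minimal R sketch of that comparison, using freshly simulated white noise rather than anything of Martindale's. For a two-point moving average of white noise, the population Yule-Walker AR(4) coefficients are 0.8, -0.6, 0.4, -0.2, alternating in sign; the fitted ones come out near that, and the smoothed noise shows serial correlation the underlying series never had. (Martindale's series is only about thirty points long, so his estimates would be noisier, but the systematic pattern is the same.)

```
# Smooth pure white noise with a two-point ("two-year") moving average, then
# fit an autoregression, as if analyzing an annual series of scores.
set.seed(1)
z <- rnorm(1000)                               # pure noise; a long run, to show the systematic effect
s <- stats::filter(z, rep(1/2, 2), sides = 1)  # two-point moving average
s <- as.numeric(s[!is.na(s)])
ar(s, aic = FALSE, order.max = 4)$ar           # AR coefficients, alternating in sign
acf(s, plot = FALSE)                           # spurious serial correlation at lag 1
```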
I could go on about what has gone wrong in just the four pages Martindale devotes to Beethoven's style, but I hope my point is made. I won't say that he makes every conceivable mistake in his analysis, because my experience as a teacher of statistics is that there are always more possible errors than you would ever have suspected. But I will say that the errors he's making — creating correlations by averaging, confusing regression and correlation coefficients, etc. — are the sort of things which get covered in the first few lessons of a good course on time series. The fact that averaging white noise produces serial correlations, and a particular pattern of autoregressive coefficients, is in particular famous as the Yule-Slutsky effect, after its two early-20th-century discoverers. (Slutsky, interestingly, appears to have thought of this as an actual explanation for many apparent cycles, particularly of macroeconomic fluctuations under capitalism, though how he proposed to reconcile this with Marx I don't know.) I am not exaggerating for polemical effect when I say that I would fail Martindale from any class I taught on data analysis; or that every single one of the undergraduate students who took 490 this spring has demonstrated more skill at applied statistics than he does in this book.
Martindale's book has about 200 citations in Google Scholar. (I haven't tried to sort out duplicates, citation variants, and self-citations.) Most of these do not appear to be "please don't confuse us with that rubbish" citations. Some of them are from intelligent scholars, like Bill Benzon, who, through no fault of their own, are unable to evaluate Martindale's statistics, and so take his competence on trust. (Similarly with Dutton, who I would not describe as an "intelligent scholar".) This trust has probably been amplified by Martindale's rhetorical projection of confidence in his statistical prowess. (Look at that quote above.) — Oh, let's not mince words here: Martindale fashions himself as someone bringing the gospel of quantitative science to the innumerate heathen of the humanities, complete with the expectation that they'll be too stupid to appreciate the gift. For many readers, those who project such intellectual arrogance are not just more intimidating but also more credible, though rationally, of course, they shouldn't be. (If you want to suggest that I exploit this myself, well, you'd have a point.)
Could there be something to the idea of an intrinsic style cycle, of the sort Martindale (like many others) advocates? I actually wouldn't be surprised if there were situations when some such mechanism (shorn of the unbearably silly psychoanalytic bits) applies. In fact, the idea of this mechanism is much older than Martindale. For example, here is a passage from Marshall G. S. Hodgson's The Venture of Islam, which I happen to have been re-reading recently:
After the death of [the critic] Ibn-Qutaybah [in 889], however, a certain systematizing of critical standards set in, especially among his disciples, the "school of Baghdad". ... Finally the doctrine of the pre-eminence of the older classics prevailed. So far as concerned poetry in the standard Mudâi Arabic, which was after all, not spoken, puristic literary standards were perhaps inevitable: an artificial medium called for artificial norms. That critics should impose some limits was necessary, given the definition of shi`r poetry in terms of imposed limitations. With the divorce between the spoken language of passion and the formal language of composition, they had a good opportunity to exalt a congenially narrow interpretation of those limits. Among adîbs who so often put poetry to purposes of decoration or even display, the critics' word was law. Generations of poets afterwards strove to reproduce the desert qasîdah ode in their more serious work so as to win the critics' acclaim.
Some poets were able to respond with considerable skill to the critics' demands. Abû-Tammâm (d. c. 845) both collected and edited the older poetry and also produced imitations himself of great merit. But work such as his, however admirable, could not be duplicated indefinitely. In any case, it could appear insipid. A living tradition could not simply mark time; it had to explore whatever openings there might be for working through all possible variations on its themes, even the grotesque. Hence in the course of subsequent generations, taste came to favor an ever more elaborate style both in verse and in prose. Within the forms which had been accepted, the only recourse for novelty (which was always demanded) was in the direction of more far-fetched similes, more obscure references to educated erudition, more subtle connections of fancy.
The peak of such a tendency was reached in the proud poet al-Mutanabbi', "the would-be prophet" (915--965 — nicknamed so for a youthful episode of religious propagandizing, in which his enemies said he claimed to be a prophet among the Bedouin), who travelled whenever he did not meet, where he was, with sufficient honor for his taste. He himself consciously exemplified, it is said, something of the independent spirit of the ancient poets. Though he lived by writing panegyrics, he long preferred, to Baghdad, the semi-Bedouin court of the Hamdânid Sayf-al-dawlah at Aleppo; and on his travels he died rather than belie his valiant verses, when Bedouin attacked the caravan and he defended himself rather than escape. His verse has been ranked as the best in Arabic on the ground that his play of words showed the widest range of ingenuity, his images held the tension between fantasy and actuality at the tautest possible without falling into absurdity.
After him, indeed, his heirs, bound to push yet further on the path, were often trapped in artificial straining for effect; and sometimes they appear simply absurd. In any case, poetry in literary Arabic after the High Caliphal Period soon became undistinguished. Poets strove to meet the critics' norms, but one of the critics' demands was naturally for novelty within the proper forms. But such novelty could be had only on the basis of over-elaboration. This the critics, disciplined by the high, simple standards of the old poetry, properly rejected too. Within the received style of shi`r, good further work was almost ruled out by the effectively high standards of the `Abbâsî critics. [volume I, pp. 463--464, omitting some diacritical marks which I don't know how to make in HTML]
Now, it does not matter here what the formal requirements of such poetry were, still less those of the qasidah; nor is it relevant whether Hodgson's aesthetic judgments were correct. I quote this because he points to the very same mechanism — demand for novelty plus restrictions of a style leading to certain kinds of elaboration and content — decades before Martindale (Hodgson died, with this part of his book complete, in 1968), and with no pretense that he was making an original argument, as opposed to rehearsing a familiar one.
But there are obvious problems with turning this mechanism into the Universal Scientific Law of Artistic Change, as Martindale wants to do. Or rather problems which should be obvious, many of which were well put by Joseph (Abu Thomas) Levenson in Confucian China and Its Modern Fate:
Historians of the arts have sometimes led their subjects out of the world of men into a world of their own, where the principles of change seem interior to the art rather than governed by decisions of the artist. Thus, we have been assured that seventeenth-century Dutch landscape bears no resemblance to Breughel because by the seventeenth century Breughel's tradition of mannerist landscape had been exhausted. Or we are treated to tautologies, according to which art is "doomed to become moribund" when it "reaches the limit of its idiom", and in "yielding its final flowers" shows that "nothing more can be done with it" — hence the passing of the grand manner of the eighteenth century in Europe and the romantic movement of the nineteenth.
How do aesthetic values really come to be superseded? This sort of thing, purporting to be a revelation of cause, an answer to a question, leaves the question still to be asked. For Chinese painting, well before the middle of the Ch'ing period, with its enshrinement of eclectic virtuosi and connoisseurs, had, by any "internal" criteria, reached the limit of its idiom and yielded its final flowers. And yet the values of the past persisted for generations, and the fear of imitation, the feeling that creativity demanded freshness in the artist's purposes, remained unfamiliar to Chinese minds. Wang Hui was happy to write on a landscape he painted in 1692 that it was a copy of a copy of a Sung original; while his colleague, Yün Shou-p'ing, the flower-painter, was described approvingly by a Chi'ing compiler as having gone back to the "boneless" painting of Hsü Ch'ung-ssu, of the eleventh century, and made his work one with it. (Yün had often, in fact, inscribed "Hsü Ch'ung-ssu boneless flower picture" on his own productions.) And Tsou I-kuei, another flower-painter, committed to finding a traditional sanction for his art, began a treatise with the following apologia:
When the ancients discussed painting they treated landscape in detail but slighted flowering plants. This does not imply a comparison of their merits. Flower painting flourished in the northern Sung, but Hsü [Hsi] and Huang [Ch'üan] could not express themselves theoretically, and therefore their methods were not transmitted.
The lesson taught by this Chinese experience is that an art-form is "exhausted" when its practitioners think it is. And a circular explanation will not hold — they think so not when some hypothetically objective exhaustion occurs in the art itself, but when outer circumstances, beyond the realm of purely aesthetic content, has changed their subjective criteria; otherwise, how account for the varying lengths of time it takes for different publics to leave behind their worked-out forms? [pp. 40–41]
Martindale seems to be completely innocent of such considerations. What he brings to this long-running discussion is, supposedly, quantitative evidence, and skill in its analysis. But this is precisely what he lacks. I have only gone over one of his analyses here, but I claim that the level of incompetence displayed here is actually entirely typical of the rest of the book.
Manual trackback: Evolving Thoughts; bottlerocketscience
Minds, Brains, and Neurons; Writing for Antiquity; The Commonwealth of Letters; Learned Folly; Enigmas of Chance
Posted by crshalizi at June 30, 2010 15:00 | permanent link
For some reason, Clay Shirky's 2003 essay "Power Laws, Weblogs, and Inequality" seems to be making the rounds again. Allow me to remind the world that, at least as of 2004, the distribution of links to weblogs was definitely not a power law. Whether this matters to Shirky's broader arguments about the development of new media is a different question; perhaps all that's needed is for the distribution to be right skewed and heavy tailed. But the actual essay stresses the power law business, which is wrong.
If you have more recent data and would like an updated analysis, you can use our tools and do it yourself.
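If you just want to see the shape of the first step, here is a bare-bones R sketch of the continuous maximum-likelihood estimate and KS distance at a fixed xmin; pl.fit is my quick illustration, not the packaged tools linked above, and it skips the search over xmin and the goodness-of-fit and model-comparison tests which the full recipe insists on.

```
# Bare-bones continuous power-law fit above a fixed xmin, plus KS distance.
pl.fit <- function(x, xmin) {
  tail.x <- x[x >= xmin]
  alpha <- 1 + length(tail.x) / sum(log(tail.x / xmin))     # continuous MLE for the exponent
  grid <- sort(tail.x)
  ks <- max(abs(ecdf(tail.x)(grid) - (1 - (grid / xmin)^(1 - alpha))))
  c(alpha = alpha, ks = ks, n.tail = length(tail.x))
}
# Sanity check on data actually drawn from a power law with exponent 2.5, xmin = 1:
x <- (1 - runif(5000))^(-1 / 1.5)
pl.fit(x, xmin = 1)
```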
Posted by crshalizi at June 28, 2010 09:31 | permanent link
Attention conservation notice: 750+ self-promoting words about a new preprint on Bayesian statistics and the philosophy of science. Even if you like watching me ride those hobby-horses, why not check back in a few months and see if peer review has exposed it as a mass of trivialities, errors, and trivial errors?
I seem to have a new pre-print:
As the two or three people who still read this blog may recall, I have long had a Thing about Bayesianism, or more exactly the presentation of Bayesianism as the sum total of rationality, and the key to all methodologies. (Cf.) In particular, the pretense that all a scientist really wants, or should want, is to know the posterior probability of their theories — the pretense that Bayesianism is a solution to the problem of induction — bugs me intensely. This is the more or less explicit ideology of a lot of presentations of Bayesian statistics (especially among philosophers, economists* and machine-learners). Not only is this crazy as methodology — not only does it lead to the astoundingly bass-ackwards mistake of thinking that using a prior is a way of "overcoming bias", and to myths about Bayesian super-intelligences — but it doesn't even agree with what good Bayesian data analysts actually do.
If you take a good Bayesian practitioner and ask them "why are you using a hierarchical linear model with Gaussian noise and conjugate priors?", or even "why are you using that Gaussian process as your prior distribution over regression curves?", if they have any honesty and self-awareness they will never reply "After offering myself a detailed series of hypothetical bets, the stakes carefully gauged to assure risk-neutrality, I elicited it as my prior, and got the same results regardless of how I framed the bets" — which is the official story about operationalizing prior knowledge and degrees of belief. (And looking for "objective" priors is hopeless.) Rather, data analysts will point to some mixture of tradition, mathematical convenience, computational tractability, and qualitative scientific knowledge and/or guesswork. Our actual degree of belief in our models is zero, or nearly so. Our hope is that they are good enough approximations for the inferences we need to make. For such a purpose, Bayesian smoothing may well be harmless. But you need to test the adequacy of your model, including the prior.
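As a toy illustration of the kind of checking I have in mind (my own toy, not an example from the paper): fit a deliberately wrong conjugate Gaussian model to skewed data, then ask whether data replicated from the posterior predictive distribution reproduce something as simple as the observed maximum.

```
# Toy posterior predictive check: conjugate Gaussian model, skewed data.
set.seed(3)
y <- rexp(100)                                   # the "data": skewed, so the Gaussian model is wrong
n <- length(y); sigma <- sd(y)                   # pretend the noise scale is known
mu0 <- 0; tau0 <- 10                             # vague conjugate prior on the mean
tau.n <- 1 / sqrt(1 / tau0^2 + n / sigma^2)      # posterior sd of the mean
mu.n <- tau.n^2 * (mu0 / tau0^2 + sum(y) / sigma^2)  # posterior mean
rep.max <- replicate(1000, {
  mu <- rnorm(1, mu.n, tau.n)                    # draw a mean from the posterior
  max(rnorm(n, mu, sigma))                       # replicate the data, record its maximum
})
mean(rep.max >= max(y))                          # tail-area for the check; near 0 flags trouble
```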
Admittedly, checking your model involves going outside the formalism of Bayesian updating, but so what? Asking a Bayesian data analyst not just whether but how their model is mis-specified is not, pace Brad DeLong, tantamount to violating the Geneva Convention. Instead, it is recognizing them as a fellow member of the community of rational inquirers, rather than a dumb numerical integration subroutine. In practice, good Bayesian data analysts do this anyway. The ideology serves only to give them a guilty conscience about doing good statistics, or to waste time in apologetics and sophistry. Our modest hope is to help bring an end to these ideological mystifications.
The division of labor on this paper was very simple: Andy supplied all the worthwhile parts, and I supplied everything mistaken and/or offensive. (Also, Andy did not approve this post.)
*: Interestingly, even when economists insist that rationality is co-extensive with being a Bayesian agent, none of them actually treat their data that way. Even when they do Bayesian econometrics, they are willing to consider that the truth might be outside the support of the prior, which to a Real Bayesian is just crazy talk. (Real Bayesians enlarge their priors until they embrace everything which might be true.) Edward Prescott forms a noteworthy exception: under the rubric of "calibration", he has elevated his conviction that his prior guesses are never wrong into a new principle of statistical estimation.
Manual trackback: Andrew Gelman; Build on the Void; The Statistical Mechanic; A Fine Theorem; Evolving Thoughts; Making Sense with Facilitated Systems; Vukutu; EconTech; Gravity's Rainbow; Nuit Blanche; Smooth; Andrew Gelman again (incorporating interesting comments from Richard Berk); J.J. Hayes's Amazing Antifolk Explicator and Philosophic Analyzer; Manuel "Moe" G.
Bayes, anti-Bayes; Enigmas of Chance; Philosophy; Self-Centered
Posted by crshalizi at June 26, 2010 15:58 | permanent link
In the late 1950s, my grandfather, Abdussattar Shalizi, was the president of the planning office in Afghanistan's ministry of planning; back then Afghanistan had a planning office and a ministry of planning which were not just jokes. During that time he wrote a book called Afghanistan: Ancient Land with Modern Ways, mostly consisting of his photographs of the signs of the country's progress. This was, as you might guess, a propaganda piece, but I can testify that it was an utterly sincere propaganda piece. So far as I know my grandfather did not erect any Potemkin factories, schools, houses, irrigation works, record stores, Girl Scout troops, or secure roads for his photographs. Re-reading the book now fills me with pity and, to be honest, anger.
But it is important to remember, when people ignorantly mutter about a country stuck in the 12th century, not just that the 12th century meant something very different there than it did in Scotland, but that 1960 in Afghanistan actually happened. So I am very pleased to see, via my brother, a photo essay in Foreign Policy, by Mohammad Qayoumi, consisting of scanned photos from my grandfather's book, with his original captions and Qayoumi's commentary. Go look.
(My plan to post something positive at least once a week was a total failure. I am contemplating requiring every merely-critical post to be paired with a positive one.)
Manual trackback: Gaddeswarup
Posted by crshalizi at June 24, 2010 21:30 | permanent link
Attention conservation notice: 1000+ words about how I am irritated by journalists being foolish, and about attempts at causal inference on social networks. As novel as a cat meowing or a car salesman scamming.
I have long thought that most opinion writers could be replaced, to the advantage of all concerned, by stochastic context-free grammars. Their readers would be no less well-informed about how the world is and what should be done about it, would receive no less surprise and delight at the play of words and ideas, and the erstwhile writers would be free to pursue some other trade, which did not so corrode their souls. One reason I feel this way is that these writers habitually make stuff up because it sounds good to them, even when actual knowledge is attainable. They have, as a rule, no intellectual conscience. Yesterday, therefore, if you had told me that one of their number actually sought out some social science research, I would have applauded this as a modest step towards a better press corps.
Today, alas, I am reminded that looking at research is not helpful, unless you have the skills and skepticism to evaluate the research. Exhibit A is Ross "Chunky Reese Witherspoon Lookalike" Douthat, who stumbled upon this paper from McDermott, Christakis, and Fowler, documenting an association between people getting divorced and those close to them in the social network also getting divorced. Douthat spun this into the claim that "If your friends or neighbors or relatives get divorced, you're more likely to get divorced --- even if it's only on the margins --- no matter what kind of shape your marriage is in." It should come as no surprise that McDermott et al. did not, in any way whatsoever, try to measure what shape peoples' marriages were in.
Ezra Klein, responding to Douthat, suggests that the causal channel isn't making people who are happy in their marriages divorce, but leading people to re-evaluate whether they are really happily married, by making it clear that there is an alternative to staying married. "The prevalence of divorce doesn't change the shape your marriage is in. It changes your willingness to face up to the shape your marriage is in." (In other words, Klein is suggesting that many people call their marriages "happy" only through the mechanism of adaptive preferences, a.k.a. sour grapes.) Klein has, deservedly, a reputation for being more clueful than his peers, and his response shows a modicum of critical thought, but he is still relying on Ross Douthat to do causal inference, which is a sobering thought.
Both of these gentlemen are assuming that this association between network neighbors' divorces must be due to some kind of contagion — Douthat is going for some sort of imitation of divorce as such, Klein is looking to more of a social learning process about alternatives and their costs. Both of them ignore the possibility that there is no contagion here at all. Remember homophily: People tend to be friends with those who are like them. I can predict your divorce from your friends' divorces, because seeing them divorce tells me what kind of people they are, which tells me about what kind of person you are. From the sort of observational data used in this study, it is definitely impossible to say how much of the association is due to homophily and how much to contagion. (The edge-reversal test they employ does not work.) It seems to be impossible to even say whether there is any contagion at all.*
To be clear, I am not castigating columnists for not reading my pre-prints; on balance I'm probably happier that they don't. But the logical issue of running together influence from friends and inference from the kind of friends you have is clear and well known. (Our contribution was to show that you can't escape the logic through technical trickery.) One would hope it would have occurred to people to ponder it before calling for over-turning family law, or saying, in effect, "You should stay together, for the sake of your neighbors' kids". I also have no problem with McDermott et al. investigating this. It's a shame that their data is unable to answer the causal questions, but without their hard work in analyzing that data we wouldn't know there was a phenomenon to be explained.
I hope it's obvious that I don't object to people pontificating about whatever they like; certainly I do enough of it. If people can get paying jobs doing it, more power to them. I can even make out a case why ideologically committed opinionators have a role to play in the social life of the mind, like so. It's a big complicated world full of lots of things which might, conceivably, matter, and it's hard to keep track of them all, and figure out how one's principles apply** — it takes time and effort, and those are always in short supply. Communicating ideas takes more time and effort and skill. People who can supply the time, effort and skill to the rest of us, starting from more or less similar principles, thereby do us a service. But only if they are actually trustworthy — actually reasoning and writing in good faith — and know what they are talking about.
(Thanks, of a kind, to Steve Laniel for bringing this to my attention.)
*: Arbitrarily strong predictive associations of the kind reported here can be produced by either mechanism alone, in the absence of the other. We are still working on whether there are any patterns of associations which could not be produced by homophily alone, or contagion alone. So far the answer seems to be "no", which is disappointing.
**: And sometimes you reach conclusions so strange or even repugnant that the principles from which they followed come into doubt themselves. And sometimes what had seemed to be a principle proves, on reflection, to be more like a general rule, adapted to particular circumstances. And sometimes one can't articulate principles at all. All of this, too, could and should be part of our public conversation; but in the main text I will be brief.
(Typos corrected, 26 June)
Manual trackback: The Monkey Cage.
Posted by crshalizi at June 24, 2010 20:45 | permanent link
Attention conservation notice: Combines quibbles about what's in an academic paper on tooth-brushing with more quibbles about the right way to do causal inference.
Chris Blattman finds a new paper which claims not brushing your teeth is associated with higher risk of heart disease, and is unimpressed:
Toothbrushing is associated with cardiovascular disease, even after adjustment for age, sex, socioeconomic group, smoking, visits to dentist, BMI, family history of cardiovascular disease, hypertension, and diagnosis of diabetes....participants who brushed their teeth less often had a 70% increased risk of a cardiovascular disease event in fully adjusted models.
The idea is that inflamed gums lead to certain chemicals or clot risks.
In the past five days I've seen this study reported in five newspapers, half a dozen radio news shows, and several blogs. These researchers know how to use a PR firm.
Sounds convincing. What could be wrong there?
OH WAIT. MAYBE PEOPLE WHO BRUSH THEIR TEETH TWICE A DAY GENERALLY TAKE BETTER CARE OF THEMSELVES AND WATCH WHAT THEY EAT.
I'm consistently blown away by what passes for causal analysis in medical journals.
Now, I am generally of one mind with Blattman about the awfulness of causal inference in medicine — I must write up the "neutral model of epidemiology" sometime soon — but here, I think, he's being a bit unfair. (I have not read or listened to any of the press coverage, but I presume it's awful, because it always is.) If you read the actual paper, which seems to be open access, one of the covariates is actually a fairly fine-grained set of measures of physical activity, albeit self-reported. (I'm not sure why they didn't list it in the abstract.) It would be nice to have information about diet, and of course self-reports are always extra dubious for moralized behaviors like exercise. Still, it's not right to say, IN ALL CAPS, that the authors of the paper did nothing about this.
In fact, the real weakness of the paper is that they have a reasonably clear mechanism in mind, and enough information to test it, but didn't do so. As Blattman says, the idea is that not brushing your teeth causes tooth and gum disease, tooth and gum disease cause inflammatory responses, and inflammation causes heart disease. Because of this, the authors measured the levels of two chemical markers of inflammation, and found that they were positively predicted by not brushing, even adjusting for their other variables (including physical activity). So far so good. Following the logic of Pearl's front-door criterion, what they should have done next, but did not, was see whether conditioning on the levels of these chemical markers substantially reduced the dependence of heart disease on tooth brushing. (The dependence should be eliminated by conditioning on the complete set of chemicals mediating the inflammatory response.) This is what one would expect if that mechanism I mentioned actually works, but not if the association comes down to not brushing being a sign that one's an unhealthy slob.
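For what it's worth, here is a minimal sketch of the kind of check I mean, in Python, with a made-up linear data-generating process standing in for their actual models and data. In scenario A the effect of brushing runs entirely through inflammation, so conditioning on the inflammation marker should kill the brushing coefficient; in scenario B a lurking "slob" factor drives both brushing and disease, and conditioning on inflammation should change nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def slope(y, X):
    # OLS coefficients of y on X (plus an intercept); crude, but enough for a sketch.
    X1 = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X1, y, rcond=None)[0][1:]

# Scenario A: brushing -> inflammation -> disease, the mechanism the paper has in mind.
brush = rng.binomial(1, 0.5, n).astype(float)
inflam = 1.0 - 0.8 * brush + rng.normal(0, 1, n)
disease = 0.5 * inflam + rng.normal(0, 1, n)
print("A: brushing coefficient, marginal          ", slope(disease, brush[:, None])[0])
print("A: brushing coefficient, given inflammation",
      slope(disease, np.column_stack([brush, inflam]))[0])

# Scenario B: an unmeasured "slob" factor drives both brushing and disease;
# inflammation still responds to brushing, but does not cause disease.
slob = rng.normal(0, 1, n)
brush = (rng.normal(0, 1, n) > slob).astype(float)
inflam = 1.0 - 0.8 * brush + rng.normal(0, 1, n)
disease = 0.5 * slob + rng.normal(0, 1, n)
print("B: brushing coefficient, marginal          ", slope(disease, brush[:, None])[0])
print("B: brushing coefficient, given inflammation",
      slope(disease, np.column_stack([brush, inflam]))[0])
```

In A the brushing coefficient collapses once the marker is conditioned on; in B it barely moves. That contrast is what the paper had the information to check.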
The moral is: brush your teeth, for pity's sake, unless you want to end up like this poor soul.
Posted by crshalizi at June 01, 2010 13:25 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Writing for Antiquity; Progressive Forces; Enigmas of Chance; Afghanistan and Central Asia; Commit a Social Science; The Continuing Crises; Philosophy; Cthulhiana
Posted by crshalizi at May 31, 2010 23:59 | permanent link
And why not listen to Eilen Jewell while doing your nothing? (Also: allow me to recommend the Thunderbird Cafe for all your smoky blues-bar needs in Pittsburgh.)
Posted by crshalizi at May 15, 2010 23:10 | permanent link
The Annals of Applied Statistics is running a special issue on "modeling and analysis of network data", or rather is spreading it over the current issue and the next. Go look, starting with Steve Fienberg's introduction. You need to subscribe, but then you or your institution should subscribe to AoAS. (Alternately, you could wait about six months for them to show up on arxiv.org.)
Disclaimer: I am an associate editor of AoAS, and helped handle many of the papers for this section.
Posted by crshalizi at May 14, 2010 13:00 | permanent link
Continuing, or in some cases reviving, long-standing but utterly unwelcome customs, several southern states declared April "Confederate History Month". The occasion redeemed itself by provoking a long series of posts from Ta-Nehisi Coates at The Atlantic, each of which "observ[e]s some aspect of the Confederacy—but through a lens darkly". These begin with one whose peroration is worthy of Mencken,
This is who they are—the proud and ignorant. If you believe that if we still had segregation we wouldn't "have had all these problems," this is the movement for you. If you believe that your president is a Muslim sleeper agent, this is the movement for you. If you honor a flag raised explicitly to destroy this country then this is the movement for you. If you flirt with secession, even now, then this movement is for you. If you are a "Real American" with no demonstrable interest in "Real America" then, by God, this movement of alchemists and creationists, of anti-science and hair tonic, is for you.

The whole of it is a moving, empathic, and thereby all the more devastating meditation on memory, pride, shame, racism, heroism, moral courage, myths, the great personalities of the Civil War, and the enduring legacy of one of America's two great founding sins; on just how it is that we can be a country where a month set aside to remember a heritage of treason in defense of slavery is intended as a time of celebration and not of soul-searching.
(Owing to the folly of that venerable magazine's web design, there doesn't seem to be a single page collecting them, but I think this is the entire sequence: 1; 2; 3; 4; 5; 6; 7; 8; 9; 10; 11; 12; 13; 14; 15; 16; 17; 18; 19; 20; 21.)
(Incidentally, last week Coates asked his readers to explain financial derivatives to him, and this week he's moved on to nuclear weapons. I speculate that if enough people buy his book, he is certain not to try out the business plan "1. Take a big position in the end-of-the-world trade; 2. Enrich uranium; 3. Profit!")
Posted by crshalizi at May 09, 2010 09:00 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; The Dismal Science; Enigmas of Chance; Writing for Antiquity; The Continuing Crises; The Running-Dogs of Reaction
Posted by crshalizi at April 30, 2010 23:59 | permanent link
Carlos Yu on Facebook yesterday: "What this country really needs is William Tecumseh Sherman." He went on:
... leaving a ten-mile wide trail of burned-out mobile homes and meth labs behind him, Sherman paused in his March to the Tea to regroup his forces. Water was always an issue for Sherman's armies, campaigning as they did in the dusty steppes surrounding Bakersfield, in the deserts of Arizona, and throughout the drought-stricken former Confederacy. Nowhere was their lack of water worse than among the abandoned exurban developments of central Florida, where the water table had been permanently damaged...

This, I think, sums up everything admirably.
The Beloved Republic; The Continuing Crises; Modest Proposals
Posted by crshalizi at April 29, 2010 14:57 | permanent link
Attention conservation notice: 2700 words on a new paper on causal inference in social networks, and why it is hard. Instills an attitude of nihilistic skepticism and despair over a technical enterprise you never knew existed, much less cared about, which a few feeble attempts at jokes and a half-hearted constructive suggestion at the end fail to relieve. If any of this matters to you, you can always check back later and see if it survived peer review.
Well, we decided for a more sedate title for the actual paper, as opposed to the talk:
The basic problem here is as follows. (I am afraid this will spoil some of the jokes in the paper.) Consider the venerable parental question: "If your friend Joey jumped off a bridge, would you jump too?" The fact of the matter is that the answer is "yes"; but why does Joey's jumping off a bridge mean that Joey's friend Irene is more likely to jump off one too?
For Irene's parents, there is a big difference between (1) and (2) and the other explanations. The former suggest that it would be a good idea to keep Irene away from Joey, or at least to keep Joey from jumping off the bridge; with the others, however, that's irrelevant. In the case of (3) and (4), in fact, knowing that Irene is friends with Joey is just a clue as to what Irene is really like; the damage was already done, and they can hang out together as much as they want. The difference between these accounts is one of causal mechanisms. (Of course there can be mixed cases.)
What the statistician or social scientist sees is that bridge-jumping is correlated across the social network. In this it resembles many, many, many behaviors and conditions, such as prescribing new antibiotics (one of the classic examples), adopting other new products, adopting political ideologies, attaching tags to pictures on flickr, attaching mis-spelled jokes to pictures of cats, smoking, drinking, using other drugs, suicide, literary tastes, coming down with infectious diseases, becoming obese, and having bad acne or being tall for your age. For almost all of these conditions or behaviors, our data is purely observational, meaning we cannot, for one reason or another, just push Joey off the bridge and see how Irene reacts. Can we nonetheless tell whether bridge-jumping spreads by (some form) of contagion, or rather is due to homophily, or, if it is both, say how much each mechanism contributes?
A lot of people have thought so, and have tried to come at it in the usual way, by doing regression. Most readers can probably guess what I think about that, so I will just say: don't you wish. More sophisticated ideas, like propensity score matching, have also been tried, but people have pretty much assumed that it was possible to do this sort of decomposition. What Andrew and I showed is that in fact it isn't, unless you are willing to make very strong, and generally untestable, assumptions.
This becomes clear as soon as you draw the relevant graphical model, which goes like so:
Now it's easy to see where the trouble lies. If we learn that Joey jumped off a bridge yesterday, that tells us something about what kind of person Joey is, X(j). If Joey and Irene are friends, that tells us something about what kind of person Irene is, X(i), and so about whether Irene will jump off a bridge today. And this is so whether or not there is any direct influence of Joey's behavior on Irene's, whether or not there is contagion. The chain of inferences — from Joey's behavior to Joey's latent traits, and then over the social link to Irene's traits and thus to Irene's behavior — constitutes what Judea Pearl strikingly called a "back-door path" connecting the variables at either end. When such paths exist, as here, Y(i,t) will be at least somewhat predictable from Y(j,t-1), and sufficiently clever regressions will detect this, but they cannot distinguish how much of the predictability is due to the back door path and how much to direct influence. If this sounds hand-wavy to you, and you suspect that with some fancy adjustments you can duck and weave through it, read the paper.
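If you want to see the back-door path in action without reading the paper, here is a minimal sketch in Python; all the numbers are invented, and conditioning on the friendship is compressed into simply making the two latent traits correlated. There is no contagion anywhere in the simulation, yet Joey's behavior yesterday predicts Irene's today.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs = 200_000

# Latent traits X(j) and X(i) for pairs of friends; homophily is represented by
# the correlation between the traits (as if we had conditioned on A(i,j) = 1).
x_j = rng.normal(0, 1, n_pairs)
x_i = 0.7 * x_j + rng.normal(0, np.sqrt(1 - 0.7 ** 2), n_pairs)

# Behavior depends only on one's own trait plus noise; nobody influences anybody.
joey_jumped_yesterday = (x_j + rng.normal(0, 1, n_pairs) > 1)
irene_jumps_today = (x_i + rng.normal(0, 1, n_pairs) > 1)

# Yet Joey's jump is informative about Irene's, through the back-door path alone.
print("P(Irene jumps | Joey jumped) =", irene_jumps_today[joey_jumped_yesterday].mean())
print("P(Irene jumps | Joey didn't) =", irene_jumps_today[~joey_jumped_yesterday].mean())
```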
To switch examples to something a little more serious than jumping off bridges, let's take it as a given that (as Christakis and Fowler famously reported), if Joey became obese last year, the odds of Irene becoming obese this year go up substantially. They interpreted this as a form of social contagion, and one can imagine various influences through which it might work (changing Irene's perception of what normal weight is, changing Irene's perception of what normal food consumption is, changes in happiness leading to changes in comfort food and/or comfort alcohol consumption, etc.). Now suppose that there is some factor X which affects both whether Joey and Irene become friends, and whether and when they become obese. For example:
Christakis and Fowler made an interesting suggestion in their obesity paper, however, which was actually one of the most challenging things for us to deal with. They noticed that friendships are sometimes not reciprocated, that Irene thinks of Joey as a friend, but Joey doesn't think of Irene that way — or, more cautiously, Irene reports Joey as a friend, but Joey doesn't name Irene. For these asymmetric pairs in their data, Christakis and Fowler note, it's easier to predict the person who named a friend from the behavior of the nominee than vice versa. This is certainly compatible with contagion, in the form of being influenced by those you regard as your friends, but is there any other way to explain it?
As it happens, yes. One need only suppose that being a certain kind of person — having certain values of the latent trait X — makes you more likely to be (or be named as) a friend. Suppose that there is just a one-dimensional trait, like your location on the left-right political axis, or perhaps some scale of tastes. (Perhaps Irene and Joey are neo-conservative intellectuals, and the trait in question is just how violent they like their Norwegian black metal music.) Having similar values of the trait makes you more likely to be friends (that's homophily), but there is always an extra tendency to be friends with those who are closer to the median of the distribution, or at least to say that those are your friends. (Wherever neo-conservatives really are on the black metal spectrum, they tend to say, on Straussian grounds, that their friends are those who prefer only the median amount of church-burning with their music.) If Irene thinks of Joey as a friend, but Joey does not, this is a sign that Irene has a more extreme value of the trait than Joey does, which changes how well each one's behavior predicts the other's. Putting together a very basic model of this sort shows that it robustly generates the kind of asymmetry Christakis and Fowler found, even when there is really no contagion.
To be short about it, unless you actually know, and appropriately control for, the things which really lead people to form connections, you have no way of distinguishing between contagion and homophily.
All of this can be turned around, however. Suppose that you want to know whether, or how strongly, some trait of people influences their choices. Following a long tradition with many illustrious exponents, for instance, people are very convinced that social class influences political choices, and there is indeed a predictive relationship here, though many people are totally wrong about what that relationship is. The natural supposition is that this predictive relationship reflects causation. But suppose that there is contagion, that you can catch ideology or even just choices from your friends. Social class is definitely a homophilous trait; this means that an opinion or attitude or choice can become entrenched among one social class, and not another, simply through diffusion, even if there is no intrinsic connection between them. And there's nothing special about class here; it could be any trait or combination of traits which leads to homophily.
Here, for example, is a simple simulation done using Andrew's ElectroGraph package.
Now let the choices evolve according to the simplest possible rule: at each point in time, a random individual picks one of their neighbors, again at random, and copies their opinion. After a few hundred such updates, the lower class has turned red, and the upper class has turned blue:
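The figures came from Andrew's package; as a rough stand-in, here is a minimal Python sketch of the same sort of dynamics, with every parameter made up: two classes, ties mostly within classes, opinions initially independent of class, and pure voter-model copying. After a while each class is usually close to internal consensus, and the two classes often settle on different opinions, though, being a finite stochastic process, it will sometimes wander to global consensus instead.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
group = np.repeat([0, 1], n // 2)          # "lower" and "upper" class

# Homophilous random graph: dense within classes, sparse between them.
p_in, p_out = 0.20, 0.01
prob = np.where(group[:, None] == group[None, :], p_in, p_out)
adj = np.triu(rng.random((n, n)) < prob, k=1)
adj = adj | adj.T
neighbors = [np.flatnonzero(adj[i]) for i in range(n)]

# Binary opinions, initially independent of class.
opinion = rng.integers(0, 2, n)

# Voter model: a random person copies the opinion of a random neighbor.
for _ in range(2000):
    i = rng.integers(n)
    if len(neighbors[i]) > 0:
        opinion[i] = opinion[rng.choice(neighbors[i])]

for g in (0, 1):
    print("class", g, "mean opinion:", opinion[group == g].mean())
```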
In their own way, each of the two models in our paper is sheer elegance in its simplicity, and I have been known to question the relevance of such models for actual social science. I don't think I'm guilty of violating my own strictures, however, because I'm not saying that the processes of, say, spreading political opinions really follows a voter model. (The reality is much more complicated.) The models make vivid what was already proved, and show that the conditions needed to produce the phenomena are not actually very extreme.
My motto as a writer might as well be "the urge to destroy is also a creative urge", but in this paper we do hold out some hope, which is that even if the causal effects of contagion and/or homophily cannot be identified, they might be bounded, following the approach pioneered by Manski for other unidentifiable quantities. Even if observable associations would never let us say exactly how strong contagion is, for instance, they might let us say that it has to lie inside some range, and if that range excludes zero, we know that contagion must be at work. (Or, if the association is stronger than contagion can produce, something else must be at work.) I suspect (with no proof) that one way to get useful bounds would be to use the pattern of ties in the network to divide it into sub-networks or, as we say in the trade, communities, and use the estimated communities as proxies for the homophilous trait. That is, if people tend to become friends because they are similar to each other, then the social network will tend to become a set of clumps of similar people, as in the figures above. So rather than just looking at the tie between Joey and Irene, we look at who else they are friends with, and who their friends are friends with, and so on, until we figure out how the network is divided into communities and that (say) Irene and Joey are in the same community, and therefore likely have similar values of X, whatever it is. Adjusting for community might then approach actually adjusting for X, though it couldn't be quite the same. Right now, though, this idea is just a conjecture we're pursuing.
Manual trackback: The Monkey Cage; Citation Needed; Healthy Algorithms; Siris; Gravity's Rainbow; Orgtheory; PeteSearch
Networks; Enigmas of Chance; Complexity; Commit a Social Science; Self-Centered
Posted by crshalizi at April 28, 2010 18:00 | permanent link
I was going to blog about this paper
(Study of the scholarly misconstruction of reality suggests that this will lead to at most a marginal reduction in the number of claims that biochemical networks follow power laws.)
Posted by crshalizi at April 21, 2010 18:30 | permanent link
It evidently takes a week to find a priest and a nubile virgin in Europe.
Update: "On the other hand", as J.B. told me as soon as I posted this, "find one and you're not far from the other."
Posted by crshalizi at April 21, 2010 08:24 | permanent link
Attention conservation notice: Only of interest if you (1) care about the community discovery problem for networks and (2) will be in Pittsburgh on Friday.
I've talked about the community discovery problem here before, and even contributed to it; if you want a state-of-the-field you should read Aaron. This week, the CMU statistics seminar delivers a very distinguished statistician's take:
As always, seminars are free and open to the public.
(This might motivate me to finally finish my post on Bickel and Chen's paper...)
Posted by crshalizi at April 20, 2010 16:57 | permanent link
My "Computing Science" column for American Scientist, "The Bootstrap", is now available for your reading pleasure. Hopefully, this will assuage your curiosity about how to use the same data set not just to fit a statistical model but also to say how much uncertainty there is in the fit. (Hence my recent musings about the cost of bootstrapping.) And then the rest of the May-June issue looks pretty good, too.
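The column has the whys and the warnings; the basic recipe itself is short enough to sketch here, in Python, with a made-up sample and the median standing in for whatever you are actually estimating.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up data and a statistic of interest.
x = rng.exponential(scale=2.0, size=200)
estimate = np.median(x)

# Nonparametric bootstrap: resample the data with replacement, re-estimate each time.
B = 800
boot = np.array([np.median(rng.choice(x, size=len(x), replace=True)) for _ in range(B)])

# The spread of the re-estimates stands in for the sampling uncertainty of the fit.
se = boot.std(ddof=1)
lo, hi = np.percentile(boot, [2.5, 97.5])
print("median:", round(estimate, 3), " bootstrap SE:", round(se, 3),
      " rough 95% interval:", (round(lo, 3), round(hi, 3)))
```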
I have been reading American Scientist since I started graduate school, lo these many years ago, and throughout that time one of the highlights for me has been the "Computing Science" column by Brian Hayes; it was quite thrilling to be asked about being one of the substitutes while he's on sabbatical, and I hope I've come close to his standard.
After-notes to the column itself:
Posted by crshalizi at April 19, 2010 08:45 | permanent link
Me, going on three years ago: "It is a further sign of our intellectual depravity that people take Bryan Caplan seriously, even when he is obviously a cheap imitation of The Onion."
Posted by crshalizi at April 15, 2010 23:13 | permanent link
Empirically, the time needed for something to seep from self-consciously advanced subcultures to complete innocuousness really is about one generation. (Second link via Tapped.)
Posted by crshalizi at April 15, 2010 22:30 | permanent link
Attention conservation notice: Only of interest if you (1) have a vast number of variables you could use in your statistical models and want to reliably learn which ones matter, and (2) are in Pittsburgh on Monday.
As always, the seminar is free and open to the public:
Let me add that Fan and Yao's book on time series is one of the best available.
Posted by crshalizi at April 08, 2010 13:45 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Writing for Antiquity; The Commonwealth of Letters; Philosophy; The Dismal Science; The Continuing Crises; Cthulhiana
Posted by crshalizi at March 31, 2010 23:59 | permanent link
Back in the day, when the blogs were young, one of the gods decided to travel the world incognito as an incoherent mumbler. A certain phonologist regarded this as an imposition, and devised a scheme whereby mortals would never have to worship incomprehensibilities. This angered the gods, who cursed the professor to spend eternity rolling a stone uphill only to keep having it fall back down, or rather, patiently debunking reactionary appropriations of neuroscience as carefully as though they were actual attempts to advance human knowledge, and not meretricious myth-making. (An incomplete sampling of episodes, in no particular order except for the first being the most recent: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40). But one must imagine Liberman happy; the alternative is too terrible to contemplate.
Minds, Brains and Neurons; The Natural Science of the Human Species; Learned Folly
Posted by crshalizi at March 30, 2010 11:45 | permanent link
Posted by crshalizi at March 21, 2010 21:20 | permanent link
Yes, I've seen this. Yes, those are (so far as I can recall) accurate quotes. No, I really don't track page-views, so I honestly don't know what the most-viewed things I've written are. Yes, it's an entirely undeserved honor to be named in such company. Yes, I do wish my writing was more positive and constructive, and less negative and critical. Yes, I realize it's easily within my power to change that. No, I do not seem to be doing too well on that front.
Posted by crshalizi at March 21, 2010 17:00 | permanent link
Attention conservation notice: 2300 words about a paper other people wrote on learning theory and hypothesis testing. Mostly written last year as part of a never-used handout for 350, and rescued from the drafts folder as an exercise in structured procrastination, so as to avoid a complete hiatus while I work on my own manuscripts. P.S. Nauroz mubarak.
In a previous installment, we recalled the Neyman-Pearson lemma of statistical hypothesis testing: If we are trying to discriminate between signal and noise, and know the distribution of our data (x) both for when a signal is present (q) and when there is just noise (p), then the optimal test says "signal" when the likelihood ratio q(x)/p(x) exceeds a certain threshold, and "noise" otherwise. This is optimal in that, for any given probability of thinking noise is signal ("size"), it maximizes the power, the probability of detecting a signal when there is one.
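As a toy illustration (not taken from any of the papers discussed here), take noise to be N(0,1) and signal to be N(1,1). The likelihood ratio q(x)/p(x) = exp(x - 1/2) is increasing in x, so thresholding the ratio is the same as thresholding x itself, and we can pick the cutoff to hit whatever size we want:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
cutoff = norm.ppf(1 - alpha)   # size-alpha cutoff under the N(0,1) noise distribution

rng = np.random.default_rng(4)
noise = rng.normal(0, 1, 100_000)
signal = rng.normal(1, 1, 100_000)
print("size (false positive rate):", np.mean(noise > cutoff))    # about 0.05
print("power (true positive rate):", np.mean(signal > cutoff))   # about 0.26, the best attainable at this size
```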
The problem with just applying the Neyman-Pearson lemma directly to problems of interest is the bit about knowing the exact distributions of signal and noise. We should, forgive the expression, be so lucky. The traditional approach in theoretical statistics, going back to Neyman and Pearson themselves, has been to look for circumstances where we can get a single test of good power against a whole range of alternatives, no matter what they are. The assumptions needed for this are often rather special, and teaching this material means leading students through some of the more arid sections of books like these; the survivors are generally close to insensible by the time they reach the oases of confidence regions.
At the other extreme, a large part of modern statistics, machine learning and data mining is about classification problems, where we take feature-vectors x and assign them to one of a finite number of classes. Generally, we want to do this in a way which matches a given set of examples, which are presumed to be classified correctly. (This is obviously a massive assumption, but let it pass.) When there are only two classes, however, this is exactly the situation Neyman and Pearson contemplated; a binary classification rule is just a hypothesis test by another name. Indeed, this is really the situation Neyman discussed in his later work (like his First Course in Probability and Statistics [1950]), where he advocated dropping the notion of "inductive inference" in favor of that of "inductive behavior", asking, in effect, what rule of conduct a learning agent should adopt so as to act well in the future.
The traditional approach in data-mining is to say that one should either (i) minimize the total probability of mis-classification, or (ii) assign some costs to false positives (noise taken for signal) and false negatives (signal taken for noise) and minimize the expected cost. Certainly I've made these recommendations plenty of times in my teaching. But this is not what Neyman and Pearson would suggest. After all, the mis-classification rate, or any weighted combination of the error rates, will depend on what proportions of the data we look at actually are signal and noise. Which decision rule minimizes the chance of error depends on the actual proportion of instances of "signal" to those of "noise". If that ratio changes, a formerly optimal decision rule can become arbitrarily bad. (To give a simple but extreme example, suppose that 99% of all cases used to be noise. Then a decision rule which always said "noise" would be right 99% of the time. The minimum-error rule would be very close to "always say 'noise'". If the proportion of signal to noise should increase, the formerly-optimal decision rule could become arbitrarily bad. — The same is true, mutatis mutandis, of a decision rule which minimizes some weighted cost of mis-classifications.) But a Neyman-Pearson rule, which maximizes power subject to a constraint on the probability of false positives, is immune to changes in the proportions of the two classes, since it only cares about the distribution of the observables given the classes. But (and this is where we came in) the Neyman-Pearson rule depends on knowing the exact distribution of observables for the two classes...
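Continuing the toy Gaussian example from above, here is a minimal sketch of the contrast; the particular numbers are invented. The rule tuned to minimize error when noise is 99% of cases does beautifully there and horribly at 50/50, while the Neyman-Pearson rule's size and power do not move at all, because they depend only on the class-conditional distributions.

```python
import numpy as np

rng = np.random.default_rng(5)
noise = rng.normal(0, 1, 200_000)     # x given "noise"
signal = rng.normal(1, 1, 200_000)    # x given "signal"

# Both rules threshold x, since the likelihood ratio exp(x - 1/2) is monotone in x.
np_cutoff = np.quantile(noise, 0.95)           # Neyman-Pearson rule, size 0.05
bayes_cutoff = 0.5 + np.log(0.99 / 0.01)       # minimum-error rule tuned to 99% noise

for name, cutoff in [("NP, size 0.05        ", np_cutoff),
                     ("min-error @ 99% noise", bayes_cutoff)]:
    fpr = np.mean(noise > cutoff)              # depends only on the noise distribution
    tpr = np.mean(signal > cutoff)             # depends only on the signal distribution
    for p_noise in (0.99, 0.50):               # now shift the class proportions
        error = p_noise * fpr + (1 - p_noise) * (1 - tpr)
        print(name, " p_noise =", p_noise,
              " size =", round(fpr, 3), " power =", round(tpr, 3),
              " error rate =", round(error, 3))
```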
This brings us to tonight's reading.
Statistical learning methods take in data and give back predictors --- here, classifiers. Showing that a learning method works generally means first showing that one can estimate the performance of any individual candidate predictor (with enough data), and then extending that to showing that the method will pick a good candidate.
The first step is an appeal to some sort of stochastic limit theorem, like the law of large numbers or the ergodic theorem: the data-generating process is sufficiently nice that if we fix any one prediction rule, its performance on a sufficiently large sample shows how it will perform in the future. (More exactly: by taking the sample arbitrarily large, we can have arbitrarily high confidence that in-sample behavior is arbitrarily close to the expected future behavior.) Here we can represent every classifier by the region R of x values where it says "signal". P(R) is the true false positive rate, or size, of the classifier, and Q(R) is the power. If we fix R in advance of looking at the data, then we can apply the law of large numbers separately to the "signal" and "noise" training samples, and conclude that, with high P-probability, the fraction of "noise" data points falling into R is close to P(R), and likewise with high Q-probability the fraction of "signal" points in R is about Q(R). In fact, we can use results like Hoeffding's inequality to say that, after n samples (from the appropriate source), the probability that either of these empirical relative frequencies differs from their true probabilities by as much as ±h is at most 2 exp(-2nh^2). The important point is that the probability of an error of fixed size goes down exponentially in the number of samples.
(Except for the finite-sample bound, this is all classical probability theory of the sort familiar to Neyman and Pearson, or for that matter Laplace. Neyman might well have known Bernstein's inequality, which gives similar though weaker bounds here than Hoeffding's; and even Laplace wouldn't've been surprised at the form of the result.)
Now suppose that we have a finite collection of classifier rules, or equivalently of "say 'signal'" regions R1, R2, ... Rm. The training samples labeled "noise" give us estimates of the P(Ri), the false positive rates, and we just saw above that the probability of any of these estimates being very far from the truth is exponentially small; call this error probability c. The probability that even one of the estimates is badly off is at most cm. So we take our sample data and throw out all the classifiers whose estimated false positive rate exceeds α (plus a small, shrinking fudge factor), and with at least probability 1-cm all the rules we're left with really do obey the size constraint. Having cut down the hypothesis space, we then estimate the true positive rates or powers Q(Ri) from the training samples labeled "signal". Once again, the probability that any one of these estimates is far from the truth is low, say d, and by the union bound again the probability that any of them are badly wrong is at most dm. This means that the sample maximum has to be close to the true maximum, and picking the Ri with the highest empirical true positive rate then is (probabilistically) guaranteed to give us a classifier with close to the maximum attainable power. This is the basic strategy they call "NP empirical risk minimization". Its success is surprising: I would have guessed that in adapting the NP approach we'd need to actually estimate the distributions, or at least the likelihood ratio as a function of x, but Scott and Nowak show that's not true, that all we need to learn is the region R. So long as m is finite and fixed, the probability of making a mistake (of any given magnitude ±h) shrinks to zero exponentially (because c and d do), so by the Borel-Cantelli lemma we will only ever make finitely many mistakes. In fact, we could even let the number of classifiers or regions we consider grow with the number of samples, so long as it grows sub-exponentially, and still come to the same conclusion.
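Here is a cartoon of that strategy in Python, using the same toy Gaussians and a finite menu of threshold rules; the tolerance (the "fudge factor") is a Hoeffding-style term, and everything else is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n_train = 20_000

# Labeled training samples, one batch known to be noise, one known to be signal.
noise_train = rng.normal(0, 1, n_train)
signal_train = rng.normal(1, 1, n_train)

# A finite menu of candidate "say signal" regions R_t = {x : x > t}.
thresholds = np.linspace(-1, 4, 51)
alpha = 0.05
fudge = np.sqrt(np.log(len(thresholds) / 0.05) / (2 * n_train))   # shrinks as n_train grows

fpr_hat = np.array([np.mean(noise_train > t) for t in thresholds])
tpr_hat = np.array([np.mean(signal_train > t) for t in thresholds])

# NP empirical risk minimization: drop rules whose estimated size is too big,
# then take the survivor with the largest estimated power.
survives = fpr_hat <= alpha + fudge
best = thresholds[survives][np.argmax(tpr_hat[survives])]

# Sanity-check the chosen rule on big fresh samples.
print("chosen threshold:", round(best, 2))
print("true size: ", np.mean(rng.normal(0, 1, 500_000) > best))
print("true power:", np.mean(rng.normal(1, 1, 500_000) > best))
```

With more training data the tolerance shrinks and the realized size settles down toward α, which is the content of the consistency claim.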
Notice that we've gone from a result which holds universally over the objects in some collection to one which holds uniformly over the collection. Think of it as a game between me and the Adversary, in which the Adversary gets to name regions R and I try to bound their performance; convergence means I can always find a bound. But it matters who goes first. Universal convergence means the Adversary picks the region first, and then I can tailor my convergence claim to the region. Uniform convergence means I need to state my convergence claim first, and then the Adversary is free to pick the region to try to break my bound. What the last paragraph showed is that for finite collections which don't grow too fast, I can always turn a strategy for winning at universal convergence into one for winning at uniform convergence. [1]
Nobody, however, wants to use just a finite collection of classifier rules. The real action is in somehow getting uniform convergence over infinite collections, for which the simple union bound won't do. There are lots of ways of turning this trick, but they all involve restricting the class of rules we're using, so that their outputs are constrained to be more or less similar, and we can get uniform convergence by approximating the whole collection with a finite number of representatives. Basically, we need to count not how many rules there are (infinity), but how many rules we can distinguish based on their output (at most 2^n). As we get more data, we can distinguish more rules. Either this number keeps growing exponentially, in which case we're in trouble, or it ends up growing only polynomially, with the exponent being called the "Vapnik-Chervonenkis dimension". As any good book on the subject will explain, this is not the same as the number of adjustable parameters.
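To make the counting concrete, here is a tiny sketch with one-sided thresholds on the line (a class of VC dimension 1, and nothing to do with Scott and Nowak's particular classifier families): infinitely many rules, but only n + 1 distinguishable behaviors on n data points.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=15)

# Rules of the form "say signal when x > t": uncountably many values of t, but on
# these 15 points they produce only 16 = n + 1 distinct labelings, far short of 2^15.
labelings = {tuple(x > t) for t in np.concatenate(([x.min() - 1.0], x))}
print(len(labelings), "distinct labelings out of", 2 ** len(x), "conceivable ones")
```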
So, to recap, here's the NP-ERM strategy. We have a collection of classifier rules, which are equivalent to regions R, and this class is of known, finite VC dimension. One of these regions or classifiers is the best available approximation to the Neyman-Pearson classifier, because it maximizes power at fixed size. We get some data which we know is noise, and use it to weed out all the regions whose empirical size (false positive rate) is too big. We then use data which we know is signal to pick the region/classifier whose empirical power (true positive rate) is maximal. Even though we are optimizing over infinite spaces, we can guarantee that, with high probability, the size and power of the resulting classifier will come arbitrarily close to those of the best rule, and even put quantitative bounds on the approximation error given the amount of data and our confidence level. The bounds grow looser as the VC dimension grows. Scott and Nowak also show that you can pull the structural risk minimization trick here: maximize the in-sample true positive rate, less a VC-theory bound on the over-fitting, and you still get predictive consistency, even if you let the capacity of the set of classifiers you'll use grow with the amount of data you have.
What's cool here is that this is a strategy for learning classifiers which gives us some protection against changes in the distribution, specifically against changes in the proportion of classes, and we can do this without having to learn the two probability density functions p and q; we just learn R. Such density estimation is certainly possible, but densities are much more complicated and delicate objects than mere sets, and the demands for data are correspondingly more extreme. (An interesting question, to which I don't know the answer, is how much we can work out about the ratio q(x)/p(x) by looking at the estimated maximum power as we vary the size α.) While Scott and Nowak work out detailed algorithms for some very particular families of classifier rules, their idea isn't tied to them, and you could certainly use it with, say, support vector machines.
[1] I learned this trick of thinking about quantifiers as games with the Adversary from Hintikka's Principles of Mathematics Revisited, but don't remember whether it was original to him or he'd borrowed it in turn. — Gustavo tells me that game semantics for logic began with Paul Lorenzen.
Posted by crshalizi at March 21, 2010 15:00 | permanent link
Somewhere in the vastness of the scholarly literature there exists a sound, if not complete, history of the reception of statistical inference, especially regression, across the social sciences in the 20th century. I have not found it and would appreciate pointers, though I can only offer acknowledgments in return. If the history ends neither with "thus did our fathers raise fertile gardens of rigor in the sterile deserts of anecdata" nor "thus did a dark age of cruel scientism overwhelm all, save a few lonely bastions of humanity", so much the better.
(I specifically mean the 20th century and not the 19th, and statistical inference and not "statistics" in the sense of aggregated numerical data. Erich Lehmann's "Some Standard Statistical Models" is in the right direction, but too focused inwards on statistics.)
Posted by crshalizi at March 17, 2010 13:30 | permanent link
For a project I just finished, I produced this figure:
The project gave me an excuse to finally read Efron's original paper on the bootstrap, where my eye was caught by "Remark A" on p. 19 (my linkage):
Method 2, the straightforward calculation of the bootstrap distribution by repeated Monte Carlo sampling, is remarkably easy to implement on the computer. Given the original algorithm for computing R, only minor modifications are necessary to produce bootstrap replications R*1, R*2, ..., R*N. The amount of computer time required is just about N times that for the original computations. For the discriminant analysis problem reported in Table 2, each trial of N = 100 replications, [sample size] m = n = 20, took about 0.15 seconds and cost about 40 cents on Stanford's 370/168 computer. For a single real data set with m = n = 20, we might have taken N=1000, at a cost of $4.00.
My bootstrapping used N = 800, n = 2527. Ignoring the differences between fitting Efron's linear classifier and my smoothing spline, creating my figure would have cost $404.32 in 1977, or $1436.90 in today's dollars (using the consumer price index). But I just paid about $2400 for my laptop, which will have a useful life of (conservatively) three years, a ten-minute pro rata share of which comes to 1.5 cents.
The inexorable economic logic of the price mechanism forces me to conclude that bootstrapping is about 100,000 times less valuable for me now than it was for Efron in 1977.
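For anyone who wants to check the envelope, the arithmetic fits in a few lines; the laptop amortization is, of course, only as good as the guesses that go into it.

```python
# Efron's Remark A implies cost roughly linear in the replications N and the sample
# size n; his 1977 baseline was N = 1000, m = n = 20, at $4.00.
cost_1977 = 4.00 * (800 / 1000) * (2527 / 20)          # my N = 800, n = 2527
cost_now = cost_1977 * (1436.90 / 404.32)              # the CPI ratio quoted above

# Ten minutes of a $2400 laptop, amortized over three years.
laptop_10min = 2400 * 10 / (3 * 365.25 * 24 * 60)

print("1977 cost: $%.2f   in today's dollars: $%.2f" % (cost_1977, cost_now))
print("laptop share: $%.4f   ratio: %.0f" % (laptop_10min, cost_now / laptop_10min))
```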
Update: Thanks to D.R. for catching a typo.
[1]: Yes, yes, unless the real regression function is a smooth piecewise cubic there's some approximation bias from using splines, so this is really a confidence band for the optimal spline approximation to the true regression curve. I hope you are as scrupulous when people talk about confidence bands for "the" slope of their linear regression models. (Added 7 March to placate quibblers.)
Posted by crshalizi at March 04, 2010 13:35 | permanent link
The way I usually prepare for a lecture or a seminar is to spend a couple of hours poring over my notes and references, writing and re-writing a few pages of arcane formulas, until I have the whole thing crammed into my head. When I actually speak I don't look at the notes at all. Fifteen minutes after I'm done speaking, I retain only the haziest outline of anything.
Which is to say, having finally realized that I've unconsciously modeled the way I teach and give talks on the magicians in Jack Vance, I really need to come up with better titles.
Posted by crshalizi at March 03, 2010 10:15 | permanent link
What I've been doing instead of blogging. (I am particularly fond of the re-written factor analysis notes; and watch for the forthcoming notes on Markov models and point processes.) Fortunately for the kids, one of us knows what he's doing.
Posted by crshalizi at March 02, 2010 10:00 | permanent link
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Networks; Writing for Antiquity; Islam; The Great Transformation
Posted by crshalizi at February 28, 2010 23:59 | permanent link
My review of Susan Hough's Predicting the Unpredictable: The Tumultuous Science of Earthquake Prediction is out, here and at American Scientist.
If you are in Paris on Monday, you can hear Andrew Gelman talk about our joint paper on the real philosophical foundations of Bayesian data analysis.
Enigmas of Chance; Self-Centered; Philosophy; Incestuous Amplification
Posted by crshalizi at February 12, 2010 13:20 | permanent link
I am giving two talks in Bristol next week about (not so coincidentally) my two latest papers. I'll also be lecturing about prediction, self-organization and filtering to the BCCS students. I presume that I will not spend the whole week talking about statistics, or working on the next round of papers and lectures; is there, I don't know, someplace in Bristol to hear music or something?
Update, 8 February: canceled at the last minute, unfortunately; with some hope of rescheduling.
Self-centered; Enigmas of Chance; Complexity; Minds, Brains, and Neurons
Posted by crshalizi at February 04, 2010 13:48 | permanent link
Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Scientifiction and Fantastica; Writing for Antiquity; Afghanistan and Central Asia; The Natural Science of the Human Species; Networks; The Beloved Republic; The Commonwealth of Letters; Learned Folly
Posted by crshalizi at January 31, 2010 23:59 | permanent link
Attention conservation notice: 800+ words of inconclusive art/technological/economic-historical musings.
This thread over at Unfogged reminds me of something that's puzzled me for years, ever since reading this: why didn't prints displace paintings the same way that printed books displaced manuscript codices? Why didn't it become expected that visual artists, like writers, would primarily produce works for reproduction? (No doubt, in that branch of the wave-function*, obsessive fans still want to get the original drawings, but obsessive fans also collect writers' manuscripts, or even their typewriters, as well as their mass-produced books.) 16th century engraving technology was strong enough that it could implement powerful works of art (vide), so that can't be it. And by the 18th century at least writers could make a living (however precarious) from writing for the mass public, so why didn't visual artists (for the most part) do likewise? (Again, it's manifestly not as though technology has regressed.) Why is it still the case that a real, high-class visual artist is someone who makes one-offs? I know that reproductions have been important since at least the late 1800s, but only for works and artists who first made their reputation with unique, hand-made objects, which is as though the only books which got sent to the printing press were ones which had already circulated to acclaim in manuscript.
Some possibilities I don't buy:
Updates, 31 January 2010: In correspondence, Elihu Gerson points to an interesting-looking book relevant to the social-use explanation.
Also, it seems I should clarify that I am not asking why (as Vukutu puts it) "people desire original works of visual art rather than printed reproductions". If you are going to paint in oils on canvas, then of course making a flat print of the result is going to lose some detail of the physical object, and those details might contribute in important ways to people's experience of the object; there might be a real esthetic loss to looking at a reproduction of a painting. What I am asking is why then we do not produce artworks which are designed for reproduction. Or rather, we do produce lots of such art, but it's not seen as very valuable, and generally not even real art in the honorific sense. "Printed reproductions of physical paintings lose valuable details" does not answer "Why did our visual arts continue to focus on making one-off works?", unless perhaps you add some extra premises, like (i) no print-reproducible image could be as esthetically valuable as a three-dimensional painting, and (ii) that difference in intrinsic quality was extremely important to the people who consumed art, and I am very dubious about both of these.
Finally, I don't think it's sufficient to point to "tradition", since traditions change all the time. That deserves another argument, but another time. In lieu of which, I'll just offer a quotation from a favorite book, Joseph (Abu Thomas) Levenson's Confucian China and Its Modern Fate; he is writing about ideas, but as he makes clear, what he says applies just as much to aesthetic or practical choices as to intellectual ones.
With the passing of time, ideas change. This statement is ambiguous, and less banal than it seems. It refers to thinkers in a given society, and it refers to thought. With the former shade of meaning, it seems almost a truism: men may change their minds or, at the very least, make a change from the mind of their fathers. Ideas at last lose currency, and new ideas achieve it. If we see an iconoclastic Chinese rejection, in the nineteenth and twentieth centuries, of traditional Chinese beliefs, we say that we see ideas changing.

But an idea changes not only when some thinkers believe it to be outworn but when other thinkers continue to hold it. An idea changes in its persistence as well as in its rejection, changes "in itself" and not merely in its appeal to the mind. While iconoclasts relegate traditional ideas to the past, traditionalists, at the same time, transform traditional ideas in the present.
This apparently paradoxical transformation-with-preservation of a traditional idea arises from a change in its world, a change in the thinker's alternatives. For (in a Taoist manner of speaking) a thought includes what its thinker eliminates; an idea has its particular quality from the fact that other ideas, expressed in other quarters, are demonstrably alternatives. An idea is always grasped in relative association, never in absolute isolation, and no idea, in history, keeps a changeless self-identity. An audience which appreciates that Mozart is not Wagner will never hear the eighteenth-century Don Giovanni. The mind of a nostalgic European medievalist, though it may follow its model in the most intimate, accurate detail, is scarcely the mirror of a medieval mind; there is sophisticated protest where simple affirmation is meant to be. And a harried Chinese Confucianist among modern Chinese iconoclasts, however scrupulously he respects the past and conforms to the letter of tradition, has left his complacent Confucian ancestors hopelessly far behind him...
An idea, then, is a denial of alternatives and an answer to a question. What a man really means cannot be gathered solely from what he asserts; what he asks and what other men assert invest his ideas with meaning. In no idea does meaning simply inhere, governed only by its degree of correspondence with some unchanging objective reality, without regard to the problems of its thinker. [pp. xxvii--xxviii; for context, this passage was first published in 1958]
*: With apologies to the blogger formerly known as "the blogger formerly known as 'The Statistical Mechanic' ".
Manual trackback: Mostly Hoofless; 3 Quarks Daily; Cliopatria (!); Vukutu.
Posted by crshalizi at January 19, 2010 22:01 | permanent link