Attention conservation notice: I have no taste. Also, this month when I wasn't reading textbooks on regression, I was doped to the gills on a mixture of binged TV shows, serial audio fiction and flu medicine.
Books to Read While the Algae Grow in Your Fur; Enigmas of Chance; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; The Dismal Science; The Progressive Forces; Afghanistan and Central Asia; Automata and Calculating Machines; Mathematics
Posted at December 31, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; The Beloved Republic; Writing for Antiquity; The Commonwealth of Letters
Posted at November 30, 2015 23:59 | permanent link
Attention conservation notice: Only relevant if you are a student at Carnegie Mellon University, or have a pathological fondness for reading lecture notes on statistics.
In the so-called spring, I will again be teaching 36-402 / 36-608, undergraduate advanced data analysis:
The goal of this class is to train you in using statistical models to analyze data — as data summaries, as predictive instruments, and as tools for scientific inference. We will build on the theory and applications of the linear model, introduced in 36-401, extending it to more general functional forms, and more general kinds of data, emphasizing the computation-intensive methods introduced since the 1980s. After taking the class, when you're faced with a new data-analysis problem, you should be able to (1) select appropriate methods, (2) use statistical software to implement them, (3) critically evaluate the resulting statistical models, and (4) communicate the results of your analyses to collaborators and to non-statisticians.

During the class, you will do data analyses with existing software, and write your own simple programs to implement and extend key techniques. You will also have to write reports about your analyses.
Graduate students from other departments wishing to take this course should register for it under the number "36-608". Enrollment for 36-608 is very limited, and by permission of the professors only.
Prerequisites: 36-401, with a grade of C or better. Exceptions are only granted for graduate students in other departments taking 36-608.
This will be my fifth time teaching 402, and the fifth time where the primary text is the draft of Advanced Data Analysis from an Elementary Point of View. (I hope my editor will believe that I don't intend for my revisions to illustrate Zeno's paradox.) It is the first time I will be co-teaching with the lovely and talented Max G'Sell.
Unbecoming whining: When I came to CMU, a decade ago, 402 was a projects class for about 10 students. It was larger than that when I inherited it.
| Year | Students receiving final grades |
| ---- | ------------------------------- |
| 2011 | 69 |
| 2012 | 88 |
| 2013 | 90 |
| 2015 | 115 |
Posted at November 17, 2015 22:54 | permanent link
Attention conservation notice: Only of interest if you (1) care about statistical inference with network data, and (2) will be in Pittsburgh next week.
A (perhaps) too-skeptical view of statistics is that we should always think we have $ n=1 $, because our data set is a single, effectively irreproducible, object. With a lot of care and trouble, we can obtain things very close to independent samples in surveys and experiments. When we get to time series or spatial data, independence becomes a myth we must abandon, but we still hope that we can break up the data set into many nearly-independent chunks. To make those ideas plausible, though, we need to have observations which are widely separated from each other. And those asymptotic-independence stories themselves seem like myths when we come to networks, where, famously, everyone is close to everyone else. The skeptic would, at this point, refrain from drawing any inference whatsoever from network data. Fortunately for the discipline, Betsy Ogburn is not such a skeptic.
As always, the talk is free and open to the public.
Posted at November 09, 2015 22:14 | permanent link
Attention conservation notice: Only of interest if you (1) are interested in seeing machine learning methods turned (back) into ordinary inferential statistics, and (2) will be in Pittsburgh on Wednesday.
Leo Breiman's random forests have long been one of the poster children for what he called "algorithmic models", detached from his "data models" of data-generating processes. I am not sure whether developing classical, data-model statistical-inferential theory for random forests would please him, or has him spinning in his grave, but either way I'm sure it will make for an interesting talk.
As always, the talk is free and open to the public.
Posted at November 09, 2015 16:23 | permanent link
Attention conservation notice: 11 pages of textbook out-take on statistical methods, either painfully obvious or completely unintelligible.
I wrote up some notes on kriging for use in the regression class, but decided that teaching it alongside covariance estimation would be too much. Eventually I'll figure out how to incorporate it into the book; in the meanwhile, I offer it for the edification of the Internet.
Posted at November 03, 2015 19:00 | permanent link
Blogging will remain sparse while I teach, finish the book, write grant proposals, try not to screw up being involved in a faculty search, do all the REDACTED BECAUSE PRIVATE things, and dream about research. In the meanwhile:
A Twitter account, opened at Tim Danford's instigation. This is a semi-automated new account which is just for announcing new posts here; it (and I use the pronoun deliberately) follows no one, I read nothing, and messages or attempts to engage might as well be piped to /dev/null.
My online notebooks are in the same process of incremental update they've been in for the last 21 years.
My on-going bookmarking, with short commentary. (Pinboard doesn't need my unsolicited endorsement, but has it.)
Tumblr, for pictures.
Posted at November 03, 2015 17:00 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Writing for Antiquity; Islam; Enigmas of Chance; Cthulhiana
Posted at October 31, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Writing for Antiquity; The Beloved Republic; The Dismal Science; The Great Transformation; Heard about Pittsburgh PA; Afghanistan and Central Asia; Cthulhiana; Pleasures of Detection, Portraits of Crime; Corrupting the Young
Posted at September 30, 2015 23:59 | permanent link
Attention conservation notice: Publicity for an upcoming academic talk, of interest only if (1) you care about quantifying uncertainty in statistics, and (2) will be in Pittsburgh on Monday.

I am late in publicizing this, but hope it will help drum up attendance anyway:
- Mladen Kolar, "Robust Confidence Intervals via Kendall's Tau for Transelliptical Graphical Models"
- Abstract: Undirected graphical models are used extensively in the biological and social sciences to encode a pattern of conditional independences between variables, where the absence of an edge between two nodes $a$ and $b$ indicates that the corresponding two variables $X_a$ and $X_b$ are believed to be conditionally independent, after controlling for all other measured variables. In the Gaussian case, conditional independence corresponds to a zero entry in the precision matrix $\Omega$ (the inverse of the covariance matrix $\Sigma$). Real data often exhibits heavy tail dependence between variables, which cannot be captured by the commonly-used Gaussian or nonparanormal (Gaussian copula) graphical models. In this paper, we study the transelliptical model, an elliptical copula model that generalizes Gaussian and nonparanormal models to a broader family of distributions. We propose the ROCKET method, which constructs an estimator of $\Omega_{ab}$ that we prove to be asymptotically normal under mild assumptions. Empirically, ROCKET outperforms the nonparanormal and Gaussian models in terms of achieving accurate inference on simulated data. We also compare the three methods on real data (daily stock returns), and find that the ROCKET estimator is the only method whose behavior across subsamples agrees with the distribution predicted by the theory. (Joint work with Rina Foygel Barber.)
- Time and place: 4--5 pm on Monday, 28 September 2015, in Doherty Hall 1112.
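If I understand the abstract correctly, the key trick is that for elliptical copulas the latent correlation can be read off from Kendall's $\tau$ via the transform $\sin(\frac{\pi}{2}\tau)$, which is unaffected by heavy tails or by monotone transformations of the margins. Here is a toy R illustration of that transform (my own sketch, not the authors' ROCKET code):

```r
# Toy illustration (mine, not the authors' ROCKET code) of the Kendall's-tau trick:
# for data with an elliptical (here Gaussian) copula, the latent correlation can be
# recovered as sin(pi/2 * tau), robustly to heavy tails and monotone margin transforms.
set.seed(1)
library(MASS)                      # for mvrnorm()
n <- 2000; rho <- 0.6
z <- mvrnorm(n, mu = c(0, 0), Sigma = matrix(c(1, rho, rho, 1), 2, 2))
x <- exp(z[, 1])                   # heavy-tailed monotone transform of one margin
y <- z[, 2]^3                      # monotone transform of the other
cor(x, y)                          # Pearson correlation: badly distorted
sin(pi / 2 * cor(x, y, method = "kendall"))  # rank-based estimate: close to 0.6
```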
As always, the talk is free and open to the public.
Posted at September 26, 2015 23:58 | permanent link
Attention conservation notice: A ponderous, scholastic joke, which could only hope to be amusing to those who combine a geeky enthusiasm for over-written horror stories from the early 20th century with nerdy enthusiasm for truly ancient books.
I wish to draw attention to certain parallels between De Rerum Natura, an ancient epic and didactic poem expounding a philosophy which is blasphemous according to nearly* every religion, and the Necronomicon, a fictitious book of magic supposedly expounding a doctrine which is blasphemous according to nearly** every religion.
The Necronomicon was, of course, invented by H. P. Lovecraft for his stories in the 1920s and 1930s. In his mythos, it was written by the mad poet "Abdul Alhazred", who died in +738 by being torn apart by invisible monsters. The book then led a twisty life through a thin succession of manuscript copies and translations, rare and almost lost. The book was, supposedly, full of the horrible, nearly indescribable, secrets of the universe: explaining how the world is an uncaring yet quite material place, in which the Earth's past and future are full of monsters, but natural monsters, how the reign of humanity is a transient episode, and the gods are in reality powerful extra-terrestrial beings, without any particular care for humanity. Reading the Necronomicon drives one mad, or at the very least the frightful knowledge it imparts permanently warps the mind. There are, supposedly, about half-a-dozen copies in existence, kept under lock and key (except when the story requires otherwise).
De Rerum Natura ("On the Nature of Things") is an entirely real book, written by the poet Titus Lucretius Carus around -55; according to legend, the poet went mad and died as a result of taking a love potion. The book thereafter led a twisty life through a thin trail of manuscript copies, and was almost lost over the course of the middle ages. The book is quite definitely full of what Lucretius thought of as the secrets of the universe (whose resistance to description is a running theme): how the entire universe is material and everything arises from the fortuitous concourse of atoms, how every phenomenon, no matter how puzzling, has a rational and material explanation, how there is no after-life to fear. It describes how the Earth's past was full of thoroughly-natural monsters, the reign of humanity and even the existence of the Earth is a transient episode, and how the gods are in reality powerful extra-terrestrial beings without any particular care for humanity, living (a Lovecraftian touch) in the spaces between worlds. In the centuries since its recovery, it has been retrospectively elevated into one of the great books of Western civilization (whatever that is).
If we are to believe the latest historian of its reception, reading De Rerum Natura started out as an innocent pursuit of more elegant Latin, but ended up permanently warping the greatest minds of Renaissance Europe. The inescapable conclusion is that the Enlightenment is the result of the real-life Necronomicon, a book full of things humanity was not meant to know, using the printing revolution of early modern Europe to take over the intellectual world, until (in the words of the lesser poet) "all the earth ... flame[d] with a holocaust of ecstasy and freedom". Of course the same thing looks different from the point of view of us cultists:
And thus you will gain knowledge, guided by a little labor,
For one thing will illuminate the next, and blinding night
Won't steal your way; all secrets will be opened to your sight,
One truth illuminate another, as light kindles light.
*: I insert the qualifier for the sake of my Unitarian Universalist friends. ^
**: I insert the qualifier for the sake of my Unitarian Universalist friends. ^
Spoiling the conceit: I have no reason to believe that Lovecraft was thinking of Lucretius at any point in writing any of his stories featuring the Necronomicon, or even that the history of De Rerum Natura influenced the "forbidden tome" motif which Lovecraft drew on (and amplified). I also do not think that the Enlightenment is really about "shouting and killing and revelling in joy". (Though it would be its own kind of betrayal of the Enlightenment for one of its admirers, like me, not to face up to the ways some of its ideas have been used to justify very great evils, particularly when Europeans imposed themselves on less powerful peoples elsewhere.) Rather, this is all the result of the collision in my head of Ada Palmer's interview by Henry Farrell with Palmer's earlier appreciation of Ruthanna Emrys's "Litany of Earth", plus Ken MacLeod's cometary Lucretian deities, and early imprinting on Bruce Sterling.
Finally, I would pay good money to read the alternate history where it was the Necronomicon which humanists discovered mouldering in a monastic library and revived, where its ideas are as thoroughly normalized, pervasive and surpassed as Lucretius's are, and copies of Kitab al-Azif can be found in any bookstore as a Penguin Classic, translated by a distinguished contemporary poet. Failing that, I would like to read Lucretius's explanation of why we need have no fear of shoggoths.
Manual trackback: Metafilter
Posted at September 26, 2015 23:30 | permanent link
Attention conservation notice: Publicity for an upcoming academic talk, of interest only if (1) you will be in Pittsburgh and (2) you care about whether scientific research can be reproduced.
The timeliness of the opening talk of this year's statistics seminar is, in fact, an un-reproducible, if welcome, coincidence:
As always, the talk is free and open to the public.
Update, 14 September: Prof. Stodden's talk has had to be rescheduled; I will post an update with the new date once I know it.
Enigmas of Chance; The Collective Use and Evolution of Concepts
Posted at September 04, 2015 13:19 | permanent link
Attention conservation notice: I have no taste.
This monograph addresses the problem of "real-time" curve fitting in the presence of noise, from the computational and statistical viewpoints. Specifically, we examine the problem of nonlinear regression where observations $ \{Y_n: n= 1, 2, \ldots \} $ are made on a time series whose mean-value function $ \{ F_n(\theta) \} $ is known except for a finite number of parameters $ (\theta_1, \theta_2, \ldots \theta_p) = \theta^\prime $. We want to estimate this parameter. In contrast to the traditional formulation, we imagine the data arriving in temporal succession. We require that the estimation be carried out in real time so that, at each instant, the parameter estimate fully reflects all of the currently available data.

The conventional methods of least-squares and maximum-likelihood estimation ... are inapplicable [because] ... the systems of normal equations that must be solved ... are generally so complex that it is impractical to try to solve them again and again as each new datum arrives.... Consequently, we are led to consider estimators of the "differential correction" type... defined recursively. The $ (n+1) $st estimate (based on the first $ n $ observations) is defined in terms of the $ n $th by an equation of the form \[ t_{n+1} = t_n + a_n[Y_n - F_n(t_n)] \] where $ a_n $ is a suitably chosen sequence of "smoothing" vectors.

(It's not all time series though: section 7.8 sketches applying the idea to experiments and estimating response surfaces.) Accordingly, most of the book is about coming up with ways of designing the $ a_n $ to ensure consistency, i.e., $ t_n \rightarrow \theta $ (in some sense), especially $ a_n $ sequences which are themselves very fast to compute.
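To make the recursion concrete, here is a minimal R sketch (my own toy example, not from the book) for the trivial case $F_n(\theta) = \theta$, where the gain sequence $a_n = 1/n$ just reproduces the running sample mean:

```r
# Minimal sketch of the recursive ("differential correction") scheme above, for the
# trivial case F_n(theta) = theta; the gain a_n = 1/n makes t_n the running sample mean.
set.seed(1)
theta.true <- 3
n.max <- 1e4
y <- rnorm(n.max, mean = theta.true, sd = 2)   # stream of noisy observations
t.est <- numeric(n.max)
t.est[1] <- y[1]                                # initialize with the first observation
for (n in 1:(n.max - 1)) {
  a.n <- 1 / (n + 1)                            # "smoothing" gain sequence
  t.est[n + 1] <- t.est[n] + a.n * (y[n + 1] - t.est[n])
}
t.est[n.max]                 # close to theta.true
t.est[n.max] - mean(y)       # agrees with the batch sample mean, up to floating point
```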
Update, next day: added a link to Simon's comment on "continuity of approximation", and deleted an excessive "very". 4 September: replaced Simon link with one which should work outside CMU, fixed an embarrassing typo.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; Writing for Antiquity; Tales of Our Ancestors; Philosophy; The Dismal Science; Physics; Networks; Pleasures of Detection, Portraits of Crime; The Beloved Republic; Afghanistan and Central Asia
Posted at August 31, 2015 23:59 | permanent link
For the first time, I will be teaching a section of the course which is the pre-requisite for my spring advanced data analysis class. This is an introduction to linear regression modeling for our third-year undergrads, and others from related majors; my section currently has eighty students. Course materials, if you have some perverse desire to read them, will be posted on the class homepage twice a week.
This course is the first one in our undergraduate sequence where the students have to bring together probability, statistical theory, and analysis of actual data. I have mixed feelings about doing this through linear models. On the one hand, my experience of applied problems is that there are really very few situations where the "usual" linear model assumptions can be maintained in good conscience. On the other hand, I suspect it is usually easier to teach people the more general ideas if they've thoroughly learned a concrete special case first; and, perhaps more importantly, whatever the merits of (e.g.) Box-Cox transformations might actually be, it's the sort of thing people will expect statistics majors to know...
Addendum, later that night: I should have made it clear in the first place that my syllabus is, up through the second exam, ~~ripped off~~ borrowed with gratitude from Rebecca Nugent, who has taught 401 outstandingly for many years.
Update, since people have asked for it, links here (see the course page for the source files for lectures):
As post-mortems, some thoughts on the textbook and alternatives, and general lessons learned.
Posted at August 31, 2015 13:52 | permanent link
Attention conservation notice: Facile moral philosophy, loosely tied to experimental sociology.
Via I forget who, Darius Kazemi explaining "How I Won the Lottery". The whole thing absolutely must be watched from beginning to end.
Kazemi is, of course, absolutely correct in every particular. What he says in his talk about art goes also for science and scholarship. Effort, ability, networking — these can, maybe, get you more tickets. But success is, ultimately, chance.
I say this not just because it resonates with my personal experience, but because of actual experimental evidence. In a series of very ingenious experiments, Matthew Salganik, Peter Dodds and Duncan Watts have constructed "artificial cultural markets" — music download sites where they could manipulate how (if at all) previous consumers' choices fed into the choices of those who came later. In one setting, for example, people saw songs listed in order of decreasing popularity, but when you came to the website you were randomly assigned to one of a number of sub-populations, and you only saw popularity within your sub-population. Simplifying somewhat (read the papers!), what Salganik et al. showed is that while there is some correlation in popularity across the different experimental sub-populations, it is quite weak. Moreover, as in the real world, the distribution of popularity is ridiculously heavy tailed (and skewed to the right): the same song can end up dominating the charts or just scraping by, depending entirely on accidents of chance (or experimental design).
In other words: lottery tickets.
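For readers who want to play with the idea, here is a toy cumulative-advantage simulation in R (emphatically my own cartoon, not Salganik et al.'s design or data) in which identical songs end up with wildly unequal, and largely uncorrelated, popularity across independent "worlds":

```r
# Toy cumulative-advantage ("rich get richer") simulation; emphatically my own cartoon,
# not Salganik et al.'s design or data. All songs are identical in "quality"; downloads
# go to songs with probability proportional to (current downloads + 1), independently
# in each "world".
set.seed(42)
n.songs <- 48; n.downloads <- 5000; n.worlds <- 8
simulate.world <- function() {
  counts <- rep(0, n.songs)
  for (i in 1:n.downloads) {
    pick <- sample(n.songs, size = 1, prob = counts + 1)
    counts[pick] <- counts[pick] + 1
  }
  counts
}
worlds <- replicate(n.worlds, simulate.world())  # n.songs x n.worlds matrix of downloads
summary(worlds[, 1])                 # heavy-tailed, right-skewed popularity in one world
cor(worlds, method = "spearman")     # near-zero agreement across worlds about the "hits"
```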
If one has been successful, it is very tempting to think that one deserves it, that this is somehow reward for merit, that one is somehow better than those who did not succeed and were not rewarded. The moral to take from Kazemi, and from Salganik et al., is that while those who have won the lottery are more likely to have done something to get multiple tickets than those who haven't, they are intrinsically no better than many losers. How, then, those who find themselves holding winning tickets should act is another matter, but at the least they oughtn't to delude themselves about the source of their good fortune.
Posted at August 04, 2015 23:11 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Writing for Antiquity; The Great Transformation; Scientifiction and Fantastica; The Beloved Republic; Philosophy; The Dismal Science; Physics; Enigmas of Chance; Commit a Social Science
Posted at July 31, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Tales of Our Ancestors; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; The Dismal Science; Enigmas of Chance
Posted at June 30, 2015 23:59 | permanent link
One summer, when I was a boy, Uncle Zalo tried to teach me to shoot, up at his ranch in the New Mexico high country. I was dismal, and I'm pretty sure the phrase "broad side of a barn" crossed his mind. It never crossed his lips, and he was never less than patient and encouraging but honest.
I had picked out new epic fantasy novels to bring him the next time I came to Santa Fe, and I wanted to talk with him about The Eternal Sky. I miss him.
Anything else I might have to say was already better said in his obituary.
Posted at June 12, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Minds, Brains, and Neurons; Enigmas of Chance; Scientifiction and Fantastica; Tales of Our Ancestors; The Dismal Science; The Continuing Crises; Constant Conjunction Necessary Connexion; Writing for Antiquity
Posted at May 31, 2015 23:59 | permanent link
Attention conservation notice: 2000+ words of academic navel-gazing about teaching a weird class in an obscure subject at an unrepresentative school; also, no doubt, more complacent than it ought to be.
Once again, it's the brief period between submitting all the grades for 402 and the university releasing the student evaluations (for whatever they're worth), so time to think about what I did, what worked, what didn't, and what to do better.
My self-evaluation was that the class went decently, but very far from perfectly, and needs improvement in important areas. I think the subject matter is good, the arrangement is at least OK, and the textbook a good value for the price. Most importantly, the vast majority of the students appear to have learned a lot about stuff they would not have picked up without the class. Since my goal is not for the students to have fun[0] but to challenge them to learn as much as possible, and assist them in doing so, I think the main objective was achieved, though not in ways which will make me beloved or even popular.
All that is much as it was in previous iterations of the class; the big changes from the last time I taught this were the assignments, using R Markdown, and the size of the class.
Writing (almost) all new assignments — ten homeworks and three exams — was good; it reduced cheating[1] to negligible proportions[2] and kept me interested in the material. It was also a lot more work, but I think it was worth it. Basing them on real papers, mostly but not exclusively from economics, seems to have gone over well, especially considering how many students were in the joint major in economics and statistics. (It also led to a gratifying number of students reporting crises of faith about what they were being taught in their classes in other departments.) Relatedly, having the technical content of each homework only add up to 90 points, with the remaining 10 being allocated for following a writing rubric[3], seems to have led to better writing, easier grading, and I think more perception of fairness in the grading.
Encouraging the use of R Markdown so that the students' data analyses were executable and replicable was a very good call. (I have to thank Jerzy Wieczorek for overcoming my skepticism by showing me R Markdown.) In fact, I think it worked well enough that in the future I will make it mandatory, with a teaching session at the beginning of the semester (and exceptions, with permission in advance, for those who want to use knitr and LaTeX). However, I may have to reconsider my use of the np package for kernel regression, since it is very aggressive about printing out progress messages which are not useful in a report.
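(For what it's worth, I believe the chattiness can be switched off globally; assuming I'm remembering the np documentation correctly, something like the following at the top of a report should do it:)

```r
# Assuming I'm remembering the np package's documented option correctly, this silences
# its progress chatter for the whole session (e.g., at the top of an R Markdown report):
library(np)
options(np.messages = FALSE)
```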
The big challenge of the class was sheer size. The first time I taught this class, in 2011, it had 63 students; we hit 120 this year. (And the department expects about 50% more next year.) This, of course, made it impossible to get to know most of the students — at best I got a sense of the ones who were regular at my office hours or spoke up in lecture, and those who sent me e-mail frequently. (Linking the faces of the former to the names of the latter remains one of my weak points.) It also means I would have gone crazy if it weren't for the very good TAs (Dena Asta, Collin Eubanks, Sangwon "Justin" Hyun and Natalie Klein), and the assistance of Xizhen Cai, acting as my (as it were) understudy — but coordinating six people for teaching is also not one of my strengths. Over the four months of the semester I sent over a thousand e-mails about the class, roughly three quarters to students and a quarter among the six of us; I feel strongly that there have to be more efficient ways of doing this part of my job.
The "quality control" samples — select six students at random every week, have them in for fifteen minutes or so to talk about what they did on the last assignment and anything that leads to, with a promise that their answers will not hurt their grades — continue to be really informative. In particular, I made a point of asking every student how long they spent on that assignment and on previous ones, and most (though not all) were within the university's norms for a nine-credit class. Some students resisted participation, perhaps because they didn't trust the wouldn't-hurt-their-grades bit; if so, I failed at "drive out fear". Also, it needs a better name, since the students keep thinking it's their quality that's being controlled, rather than that of the teaching and grading.
Things that did not work so well:
Things I am considering trying next time:
— Naturally, while proofing this before posting, the university e-mailed me the course evaluations. They were unsurprisingly bimodal.
[0] I have no objection to fun, or to fun classes, or even to students having fun in my classes; it's just not what I'm aiming at here. ^
[1] I am sorry to have to say that there are some students who have tried to cheat, by re-using old solutions. This is why I no longer put solutions on the public web, and part of why I made sure to write new assignments this time, or, if I did re-cycle, make substantial changes. ^
[2] At least, cheating that we caught. (I will not describe how we caught anyone.) ^
[3] This evolved a little over the semester; here's the final version.
[4] The North American Mammals Paleofauna Database for homework 5 has about two thousand entries, so my thought would be to assign each student a random extinct species as their pseudonym. These should be socially neutral, and more memorable than numbers, but no doubt I'll discover that some students have profound feelings about the amphicyonidae. ^
The text is laid out cleanly, with clear divisions between problems and sub-problems. The writing itself is well-organized, free of grammatical and other mechanical errors, and easy to follow. Figures and tables are easy to read, with informative captions, axis labels and legends, and are placed near the text of the corresponding problems. All quantitative and mathematical claims are supported by appropriate derivations, included in the text, or calculations in code. Numerical results are reported to appropriate precision. Code is either properly integrated with a tool like R Markdown or knitr, or included as a separate R file. In the former case, both the knitted and the source file are included. In the latter case, the code is clearly divided into sections referring to particular problems. In either case, the code is indented, commented, and uses meaningful names. All code is relevant to the text; there are no dangling or useless commands. All parts of all problems are answered with actual coherent sentences, and never with raw computer code or its output. For full credit, all code runs, and the Markdown file knits (if applicable). ^
Posted at May 22, 2015 19:34 | permanent link
\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Probwrt}[2]{\mathbb{P}_{#1}\left( #2 \right)} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \]
Attention conservation notice: 4900+ words, plus two (ugly) pictures and many equations, on a common mis-understanding in statistics. Veers wildly between baby stats. and advanced probability theory, without explaining either. Its efficacy at remedying the confusion it attacks has not been evaluated by a randomized controlled trial.
After ten years of teaching statistics, I feel pretty confident in saying that one of the hardest points to get through to undergrads is what "statistically significant" actually means. (The word doesn't help; "statistically detectable" or "statistically discernible" might've been better.) They have a persistent tendency to think that parameters which are significantly different from 0 matter, that ones which are insignificantly different from 0 don't matter, and that the smaller the p-value, the more important the parameter. Similarly, if one parameter is "significantly" larger than another, then they'll say the difference between them matters, but if not, not. If this was just about undergrads, I'd grumble over a beer with my colleagues and otherwise suck it up, but reading and refereeing for non-statistics journals shows me that many scientists in many fields are subject to exactly the same confusions as The Kids, and talking with friends in industry makes it plain that the same thing happens outside academia, even to "data scientists". (For example: an A/B test is just testing the difference in average response between condition A and condition B; this is a difference in parameters, usually a difference in means, and so it's subject to all the issues of hypothesis testing.) To be fair, one meets some statisticians who succumb to these confusions.
One reason for this, I think, is that we fail to teach well how, with enough data, any non-zero parameter or difference becomes statistically significant at arbitrarily small levels. The proverbial expression of this, due I believe to Andy Gelman, is that "the p-value is a measure of sample size". More exactly, a p-value generally runs together the size of the parameter, how well we can estimate the parameter, and the sample size. The p-value reflects how much information the data has about the parameter, and we can think of "information" as the product of sample size and precision (in the sense of inverse variance) of estimation, say $n/\sigma^2$. In some cases, this heuristic is actually exactly right, and what I just called "information" really is the Fisher information.
~~Rather than working on grant proposals~~ ~~Egged on by a friend~~ As a public service, I've written up some notes on this. Throughout, I'm assuming that we're testing the hypothesis that a parameter, or vector of parameters, $\theta$ is exactly zero, since that's overwhelmingly what people calculate p-values for — sometimes, I think, by a spinal reflex not involving the frontal lobes. Testing $\theta=\theta_0$ for any other fixed $\theta_0$ would work much the same way. Also, $\langle x, y \rangle$ will mean the inner product between the two vectors.
Let's start with a very simple example. Suppose we're testing whether some mean parameter $\mu$ is equal to zero or not. Being straightforward folk, who follow the lessons we were taught in our ~~one-room log-cabin schoolhouse~~ research methods class, we'll use the sample mean $\hat{\mu}$ as our estimator, and take as our test statistic $\frac{\hat{\mu}}{\hat{\sigma}/\sqrt{n}}$; that denominator is the standard error of the mean. If we're really into old-fashioned recipes, we'll calculate our p-value by comparing this to a table of the $t$ distribution with $n-1$ degrees of freedom, remembering that we use up one degree of freedom on the mean estimate ($\hat{\mu}$) when computing the standard deviation estimate ($\hat{\sigma}$). (If we're a bit more open to new-fangled notions, we bootstrap.) Now what happens as $n$ grows?
Well, we remember the central limit theorem: $\sqrt{n}(\hat{\mu} - \mu) \rightarrow \mathcal{N}(0,\sigma^2)$. With a little manipulation, and some abuse of notation, this becomes \[ \hat{\mu} \rightarrow \mu + \frac{\sigma}{\sqrt{n}}\mathcal{N}(0,1) \] The important point is that $\hat{\mu} = \mu + O(n^{-1/2})$. Similarly, albeit with more algebra, $\hat{\sigma} = \sigma + O(n^{-1/2})$. Now plug these in to our formula for the test statistic: \[ \begin{eqnarray*} \frac{\hat{\mu}}{\hat{\sigma}/\sqrt{n}} & = & \sqrt{n}\frac{\hat{\mu}}{\hat{\sigma}}\\ & = & \sqrt{n}\frac{\mu + O(n^{-1/2})}{\sigma + O(n^{-1/2})}\\ & = & \sqrt{n}\left(\frac{\mu}{\sigma} + O(n^{-1/2})\right)\\ & = & \sqrt{n}\frac{\mu}{\sigma} + O(1) \end{eqnarray*} \] So, as $n$ grows, the test statistic will go to either $+\infty$ or $-\infty$, at a rate of $\sqrt{n}$, unless $\mu=0$ exactly. If $\mu \neq 0$, then the test statistic eventually becomes arbitrarily large, while the distribution we use to calculate p-values stabilizes at a standard Gaussian distribution (since that's a $t$ distribution with infinitely many degrees of freedom). Hence the p-value will go to zero as $n\rightarrow \infty$, for any $\mu\neq 0$. The rate at which it does so depends on the true $\mu$, the true $\sigma$, and the number of samples. The p-value reflects how big the mean is ($\mu$), how precisely we can estimate it ($\sigma$), and our sample size ($n$).
T-statistics calculated for five independent runs of Gaussian random variables with the specified parameters, plotted against sample size. Successive t-statistics along the same run are linked; the dashed lines are the asymptotic formulas, $\sqrt{n}\mu/\sigma$. Note that both axes are on a logarithmic scale. (Click on the image for a larger PDF version; source code.)
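If you want to reproduce the qualitative behavior yourself, here is a rough R sketch of the same kind of simulation (my own quick version, not the source code linked in the caption):

```r
# Rough sketch (not the source code linked in the caption) of the same kind of
# simulation: t-statistics for growing samples from a Gaussian with fixed mu and sigma,
# compared against the asymptotic line sqrt(n) * mu / sigma.
set.seed(7)
mu <- 0.1; sigma <- 1
n.max <- 1e4
x <- rnorm(n.max, mean = mu, sd = sigma)
ns <- round(10^seq(1, 4, length.out = 30))     # log-spaced sample sizes
t.stat <- sapply(ns, function(n) mean(x[1:n]) / (sd(x[1:n]) / sqrt(n)))
plot(ns, abs(t.stat), log = "xy", type = "b",
     xlab = "sample size n", ylab = "|t statistic|")
lines(ns, sqrt(ns) * mu / sigma, lty = 2)      # asymptotic formula
```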
Matters are much the same if instead of estimating a mean we're estimating a difference in means, or regression coefficients, or linear combinations of regression coefficients ("contrasts"). The p-value we get runs together the size of the parameter, the precision with which we can estimate the parameter, and the sample size. Unless the parameter is exactly zero, as $n\rightarrow\infty$, the p-value will converge stochastically to zero.
Even if two parameters are estimated from the same number of samples, the one with a smaller p-value is not necessarily larger; it may just have been estimated more precisely. Let's suppose we're in the land of good, old-fashioned linear regression, where $Y = \langle X, \beta \rangle + \epsilon$, where all the random variables have mean 0 (to simplify book-keeping), where $\epsilon$ is uncorrelated with $X$. Estimating $\beta$ with ordinary least squares, we get of course \[ \hat{\beta} = (\mathbf{x}^T \mathbf{x})^{-1} \mathbf{x}^T \mathbf{y} ~, \] with $\mathbf{x}$ being the $n\times 2$ matrix of $X$ values and $\mathbf{y}$ the $n\times 1$ matrix of $Y$ values. Since $\mathbf{y} = \mathbf{x} \beta + \mathbf{\epsilon}$, \[ \hat{\beta} = \beta + (\mathbf{x}^T \mathbf{x})^{-1}\mathbf{x}^T \mathbf{\epsilon} ~. \] Assuming the $\epsilon$ terms are uncorrelated with each other and have constant variance $\sigma^2_{\epsilon}$, we get \[ \Var{\hat{\beta}} = \sigma^2_{\epsilon} (\mathbf{x}^T \mathbf{x})^{-1} ~. \] To understand what's really going on here, notice that $\frac{1}{n} \mathbf{x}^T \mathbf{x}$ is the sample variance-covariance matrix of $X$; call it $\hat{\mathbf{v}}$. (I give it a hat because it's an estimate of the population covariance matrix.) So \[ \Var{\hat{\beta}} = \frac{\sigma^2_{\epsilon}}{n}\hat{\mathbf{v}}^{-1} \] The standard errors for the different components of $\hat{\beta}$ are thus going to be the square roots of the diagonal entries of $\Var{\hat{\beta}}$. We will therefore estimate different regression coefficients to different precisions. To make a regression coefficient precise, the predictor variable it belongs to should have a lot of variance, and it should have little correlation with other predictor variables. (If we used an orthogonal design, $\hat{\mathbf{v}}^{-1/2}$ will be a diagonal matrix whose entries are the reciprocals of the regressors' standard deviations.) Even if we think that the size of entries in $\beta$ is telling us something about how important different $X$ variables are, one of them having a bigger variance than the other doesn't make it more important in any interesting sense.
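Here is a small simulated illustration of the point (my own toy example, not from the text): two coefficients that are exactly equal by construction, but estimated with very different precision because one predictor has much more variance than the other.

```r
# Toy example (mine): two regression coefficients that are exactly equal by
# construction get very different p-values, just because one predictor has much more
# variance, and so its coefficient is estimated much more precisely.
set.seed(99)
n <- 1000
x1 <- rnorm(n, sd = 5)                         # high-variance predictor
x2 <- rnorm(n, sd = 0.2)                       # low-variance predictor
y <- 0.1 * x1 + 0.1 * x2 + rnorm(n, sd = 1)    # same true coefficient on both
summary(lm(y ~ x1 + x2))$coefficients
# x1's coefficient comes out "highly significant"; x2's usually does not.
```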
So far, I've talked about particular cases --- about estimating means or linear regression coefficients, and even using particular estimators. But the point can be made much more generally, though at some cost in abstraction. Recall that a hypothesis test can make two kinds of error: it can declare that there's some signal when it really looks at noise (a "false alarm" or "type I" error), or it can ignore the presence of a signal and mistake it for noise (a "miss" or "type II" error). The probability of a false alarm, when looking at noise, is called the size of the test; the probability of detecting a signal when one is really present is called the power. Call a test consistent when, as the sample size grows, its size goes to zero and its power goes to one.
Suppose that a consistent hypothesis test exists. Then at each sample size $n$, there's a range of p-values $[0,a_n]$ where we reject the noise hypothesis and claim there's a signal, and another $(a_n,1]$ where we say there's noise. Since the p-value is uniformly distributed under the noise hypothesis, the size of the test is just $a_n$, so consistency means $a_n$ must go to 0. The power of the test is the probability, in the presence of signal, that the p-value is in the rejection region, i.e., $\Probwrt{\mathrm{signal}}{P \leq a_n}$. Since, by consistency, the power is going to 1, the probability (in the presence of signal) that the p-value is less than any given value eventually goes to 1. Hence the p-value converges stochastically to 0 (again, when there's a signal). Thus, if there is a consistent hypothesis test, and there is any signal to be detected at all, the p-value must shrink towards 0.
I bring this up because, of course, the situations where people usually want to calculate p-values are in fact the ones where there usually are consistent hypothesis tests. These are situations where we have an estimator $\hat{\theta}$ of the parameter $\theta$ which is itself "consistent", i.e., $\hat{\theta} \rightarrow \theta$ in probability as $n \rightarrow \infty$. This means that with enough data, the estimate $\hat{\theta}$ will come arbitrarily close to the truth, with as much probability as we might desire. It's not hard to believe that this will mean there's a consistent hypothesis test --- just reject the null when $\hat{\theta}$ is too far from 0 --- but the next two paragraphs sketch a proof, for the sake of skeptics and quibblers.
Consistency of estimation means that for any level of approximation $\epsilon > 0$ and any level of confidence $\delta > 0$, for all $n \geq$ some $N(\epsilon,\delta,\theta)$, \[ \Probwrt{\theta}{\left|\hat{\theta}_n-\theta\right|>\epsilon} \leq \delta ~. \] This can be inverted: for any $n$ and any $\delta$, for any $\eta \geq \epsilon(n,\delta,\theta)$, \[ \Probwrt{\theta}{\left|\hat{\theta}_n-\theta\right|>\eta} \leq \Probwrt{\theta}{\left|\hat{\theta}_n-\theta\right|>\epsilon(n,\delta,\theta)} \leq \delta ~. \] Moreover, as $n\rightarrow\infty$ with $\delta$ and $\theta$ held constant, $\epsilon(n,\delta,\theta) \rightarrow 0$.
Pick any $\theta^* \neq 0$, and any $\alpha$ and $\beta > 0$ that you like. For each $n$, set $\epsilon = \epsilon(n,\alpha,0)$; abbreviate this sequence as $\epsilon_n$. I will use $\hat{\theta}_n$ as my test statistic, retaining the null hypothesis $\theta=0$ when $\left|\hat{\theta}_n\right| \leq \epsilon_n$, and rejecting it otherwise. By construction, my false alarm rate is at most $\alpha$. What's my miss rate? Well, again by consistency of the estimator, for any sufficiently small but fixed $\eta > 0$, if $n \geq N(|\theta^*| - \eta, \beta, \theta^*)$, then \[ \Probwrt{\theta^*}{\left|\hat{\theta}_n\right| < \eta} \leq \Probwrt{\theta^*}{\left|\hat{\theta}_n - \theta^*\right|\geq |\theta^*| - \eta} \leq \beta ~. \] (To be very close to 0, $\hat{\theta}$ has to be far from $\theta^*$.) So, if I wait until $n$ is large enough that $n \geq N(|\theta^*| - \eta, \beta, \theta^*)$ and that $\epsilon_n \leq \eta$, my power against $\theta=\theta^*$ is at least $1-\beta$ (and my false-positive rate is still at most $\alpha$). Since you got to pick $\alpha$ and $\beta$ arbitrarily, you can make them as close to 0 as you like, and I can still get arbitrarily high power against any alternative while still controlling the false-positive rate. In fact, you can pick a sequence of error rate pairs $(\alpha_k, \beta_k)$, with both rates going to zero, and for $n$ sufficiently large, I will, eventually, have a size less than $\alpha_k$, and a power against $\theta=\theta^*$ greater than $1-\beta_k$. Hence, a consistent estimator implies the existence of a consistent hypothesis test. (Pedantically, we have built a universally consistent test, i.e., consistent whatever the true value of $\theta$ might be, but not necessarily a uniformly consistent one, where the error rates can be bounded independent of the true $\theta$. The real difficulty there is that there are parameter values in the alternative hypothesis $\theta \neq 0$ which come arbitrarily close to the null hypothesis $\theta=0$, and so an arbitrarily large amount of information may be needed to separate them with the desired reliability.)
So far, I've been arguing that the p-value should always go stochastically to zero as the sample size grows. In many situations, it's possible to be a bit more precise about how quickly it goes to zero. Again, start with the simple case of testing whether a mean is equal to zero. We saw that our test statistic $\hat{\mu}/(\hat{\sigma}/\sqrt{n}) \rightarrow \sqrt{n}\mu/\sigma + O(1)$, and that the distribution we compare this to approaches $\mathcal{N}(0,1)$. Since for a standard Gaussian $Z$ the probability that $Z > t$ is at most $\frac{\exp{\left\{-t^2/2\right\}}}{t\sqrt{2\pi}}$, the p-value in a two-sided test goes to zero exponentially fast in $n$, with the asymptotic exponential rate being $\frac{1}{2}\mu^2/\sigma^2$. Let's abbreviate the p-value after $n$ samples as $P_n$: \[ \begin{eqnarray*} P_n & = & \Prob{|Z| \geq \left|\frac{\hat{\mu}}{\hat{\sigma}/\sqrt{n}}\right|}\\ & = & 2 \Prob{Z \geq \left|\frac{\hat{\mu}}{\hat{\sigma}/\sqrt{n}}\right|}\\ & \leq & 2\frac{\exp{\left\{-n\hat{\mu}^2/2\hat{\sigma}^2\right\}}}{\sqrt{n}\hat{\mu}\sqrt{2\pi}/\hat{\sigma}}\\ \frac{1}{n}\log{P_n} & \leq & \frac{\log{2}}{n} -\frac{\hat{\mu}^2}{2\hat{\sigma}^2} - \frac{\log{n}}{2n} - \frac{1}{n}\log{\frac{\hat{\mu}}{\hat{\sigma}}} - \frac{\log{2\pi}}{n}\\ \lim_{n\rightarrow\infty}{\frac{1}{n}\log{P_n}} & \leq & -\frac{\mu^2}{2\sigma^2} \end{eqnarray*} \] Since $\Prob{Z > t}$ is also at least $\exp{\left\{-t^2/2\right\}}/(t^2+1)\sqrt{2\pi}$, a parallel argument gives a matching lower bound, $\lim_{n\rightarrow\infty}{n^{-1}\log{P_n}} \geq -\frac{1}{2}\mu^2/\sigma^2$.
P-value versus sample size, color coded as in the previous figure. Notice that even in the runs where $\mu$, and $\mu/\sigma$, are very small (in green), the p-value is declining exponentially. Again, click for a larger PDF; source code here.
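And a matching sketch for the p-values themselves (again my own quick approximation, not the linked source code), showing $\log{P_n}$ falling off roughly linearly in $n$, i.e., exponentially, at about the rate $\mu^2/2\sigma^2$:

```r
# Matching sketch for the p-values (again mine, not the linked source code): log(p)
# falls off roughly linearly in n, i.e., exponentially, at rate about mu^2 / (2 sigma^2).
set.seed(7)
mu <- 0.1; sigma <- 1
n.max <- 1e4
x <- rnorm(n.max, mean = mu, sd = sigma)
ns <- round(10^seq(1, 4, length.out = 30))
pvals <- sapply(ns, function(n) t.test(x[1:n], mu = 0)$p.value)
plot(ns, pvals, log = "y", type = "b", xlab = "sample size n", ylab = "p-value")
lines(ns, exp(-ns * mu^2 / (2 * sigma^2)), lty = 2)   # asymptotic decay rate
```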
This is not just a cute trick with Gaussian approximations; it generalizes through the magic of large deviations theory. Glossing over some technicalities, a sequence of random variables $X_1, X_2, \ldots X_n$ obey a large deviations principle when \[ \lim_{n\rightarrow\infty}{\frac{1}{n}\log{\Probwrt{}{X_n \in B}}} = -\inf_{x\in B}{D(x)} \] where $D(x) \geq 0$ is the "rate function". If the set $B$ doesn't include a point where $D(x)=0$, the probability of $B$ goes to zero, exponentially in $n$,* with the exact rate depending on the smallest attainable value of the rate function $D$ over $B$. ("Improbable events tend to happen in the most probable way possible.") Very roughly speaking, then, $\Probwrt{}{X_n \in B} \approx \exp{\left\{ - n \inf_{x\in B}{D(x)}\right\}}$. Suppose that $X_n$ is really some estimator of the parameter $\theta$, and it obeys a large deviations principle for every $\theta$. Then the rate function $D$ is really $D_{\theta}$. For consistent estimators, $D_{\theta}(x)$ would have a unique minimum at $x=\theta$. The usual estimators based on sample means, correlations, sample distributions, maximum likelihood, etc., all obey large deviations principles, at least under most of the conditions where we'd want to apply them.
Suppose we make a test based on this estimator. Under $\theta=\theta^*$, $X_n$ will eventually be within any arbitrarily small open ball $B_{\rho}$ of size $\rho$ around $\theta^*$ we care to name; the probability of its lying outside $B_{\rho}$ will be going to zero exponentially fast, with the rate being $\inf_{x\in B^c_{\rho}}{D_{\theta^*}(x)} > 0$. For small $\rho$ and smooth $D_{\theta^*}$, Taylor-expanding $D_{\theta^*}$ about its minimum suggests that rate will be $\inf_{\eta: \|\eta\| > \rho}{\frac{1}{2}\langle \eta, J_{\theta^*} \eta\rangle}$, $J_{\theta^*}$ being the matrix of $D$'s second derivatives at $\theta^*$. This, clearly, is $O(\rho^2)$.
The probability under $\theta = 0$ of seeing results $X_n$ lying inside $B_{\rho}$ is very different. If we've made $\rho$ small enough that $B_{\rho}$ doesn't include 0, $\Probwrt{0}{X_n \in B_{\rho}} \rightarrow 0$ exponentially fast, with rate $\inf_{x \in B_{\rho}}{D_0(x)}$. Again, if $\rho$ is small enough and $D_0$ is smooth enough, the value of the rate function should be essentially $D_0(\theta^*) + O(\rho^2)$. If $\theta^*$ in turn is close enough to 0 for a Taylor expansion, we'd get a rate of $\frac{1}{2}\langle \theta^*, J_0 \theta^*\rangle$. To repeat, this is the exponential rate at which the p-value is going to zero when we test $\theta=0$ vs. $\theta\neq 0$, and the alternative value $\theta^*$ is true. It is no accident that this is the same sort of rate we got for the simple Gaussian-mean problem.
Relating the matrix I'm calling $J$ to the Fisher information matrix $F$ needs a longer argument, which I'll present even more sketchily. The empirical distribution obeys a large deviations principle whose rate function is the Kullback-Leibler divergence, a.k.a. the relative entropy; this result is called "Sanov's theorem". For small perturbations of the parameter $\theta$, the divergence between a distribution at $\theta+\eta$ and that at $\theta$ is, after yet another Taylor expansion and a little algebra, $\frac{1}{2}\langle \eta, F_{\theta} \eta \rangle$. A general result in large deviations theory, the "contraction principle", says that if the $X_n$ obey an LDP with rate function $D$, then $Y_n = h(X_n)$ obeys an LDP with rate function $D^{\prime}(y) = \inf_{x : h(x) = y}{D(x)}$. Thus an estimator which is a function of the empirical distribution, which is most of them, will have a decay rate which is at most $\frac{1}{2}\langle \eta, F_{\theta} \eta \rangle$, and possibly less, if the estimator is crude enough. (The maximum likelihood estimator in an exponential family will, however, preserve large deviation rates, because it's a sufficient statistic.)
What, then, is the use of p-values? Much more limited than the bad old sort of research methods class (or third referee) would have you believe. If you find a small p-value, yay; you've got enough data, with precise enough measurement, to detect the effect you're looking for, or you're really unlucky. If your p-value is large, you're either really unlucky, or you don't have enough information (too few samples or too little precision), or the parameter is really close to zero. Getting a big p-value is not, by itself, very informative; even getting a small p-value has uncomfortable ambiguity. My advice would be to always supplement a p-value with a confidence set, which would help you tell apart "I can measure this parameter very precisely, and if it's not exactly 0 then it's at least very small" from "I have no idea what this parameter might be". Even if you've found a small p-value, I'd recommend looking at the confidence interval, since there's a difference between "this parameter is tiny, but really unlikely to be zero" and "I have no idea what this parameter might be, but can just barely rule out zero", and so on and so forth. Whether there are any scientific inferences you can draw from the p-value which you couldn't just as easily draw from the confidence set, I leave between you and your referees. What you definitely should not do is use the p-value as any kind of proxy for how important a parameter is.
If you want to know how much some variable matters for predictions of another variable, you are much better off just perturbing the first variable, plugging in to your model, and seeing how much the outcome changes. If you need a formal version of this, and don't have any particular size or distribution of perturbations in mind, then I strongly suggest using Gelman and Pardoe's "average predictive comparisons". If you want to know how much manipulating one variable will change another, then you're dealing with causal inference, but once you have a tolerable causal model, again you look at what happens when you perturb it. If what you really want to know is which variables you should include in your predictive model, the answer is the ones which actually help you predict, and this is why we have cross-validation (and have had it for as long as I've been alive), and, for the really cautious, completely separate validation sets. To get a sense of just how mis-leading p-values can be as a guide to which variables actually carry predictive information, I can hardly do better than Ward et al.'s "The Perils of Policy by p-Value", so I won't.
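If you want the crudest possible version of that perturb-and-predict recipe in code, here is a sketch (the function name and arguments are my own placeholders, and this is a blunt cousin of, not a substitute for, Gelman and Pardoe's estimator):

```r
# Crude sketch of the perturb-and-predict idea; a blunt cousin of, not a substitute
# for, Gelman and Pardoe's average predictive comparisons. `model` is any fitted model
# with a predict() method and `df` the data frame it was fit to; both names, and the
# example call below, are hypothetical placeholders.
predictive.shift <- function(model, df, variable, delta) {
  df.perturbed <- df
  df.perturbed[[variable]] <- df.perturbed[[variable]] + delta
  mean(predict(model, newdata = df.perturbed) - predict(model, newdata = df))
}
# e.g., predictive.shift(my.lm, my.data, "income", delta = sd(my.data$income))
```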
(I actually have a lot more use for p-values when doing goodness-of-fit testing, rather than as part of parametric estimation, though even there one has to carefully examine how the model fails to fit. But that's another story for another time.)
Nearly fifty years ago, R. R. Bahadur defined the efficiency of a test as the "rate at which it makes the null hypothesis more and more incredible as the sample size increases when a non-null distribution obtains", and gave a version of the large deviations argument to say that these rates should typically be exponential. The reason he could do so was that it was clear the p-value will always go to zero as we get more information, and so the issue is whether we're using that information effectively. In another fifty years, I presume that students will still have difficulties grasping this, but I piously hope that professionals will have absorbed the point.
References:
*: For the sake of completeness, I should add that sometimes we need to replace the $1/n$ scaling by $1/r(n)$ for some increasing function $r$, e.g., for dense graphs where $n$ counts the number of nodes, $r(n)$ would typically be $O(n^2)$. ^
(Thanks to KLK for discussions, and feedback on a draft.)
Update, 17 May 2015: Fixed typos (backwards inequality sign, errant $\theta$ for $\rho$) in large deviations section.
Manual trackback: Economist's View
Posted at May 16, 2015 12:39 | permanent link
Attention conservation notice: A 5000+ word attempt to provide real ancestors and support for an imaginary ideology I don't actually accept, drawing on fields in which I am in no way an expert. Contains long quotations from even-longer-dead writers, reckless extrapolation from arcane scientific theories, and an unwarranted tone of patiently explaining harsh, basic truths. Altogether, academic in one of the worst senses. Also, spoilers for several of MacLeod's novels, notably but not just The Cassini Division. Written for, and cross-posted to, Crooked Timber's seminar on MacLeod, where I will not be reading the comments.
I'll let Ellen May Ngwethu, late of the Cassini Division, open things up:
The true knowledge... the phrase is an English translation of a Korean expression meaning "modern enlightenment". Its originators, a group of Japanese and Korean "contract employees" (inaccurate Korean translation, this time, of the English term "bonded laborers") had acquired their modern enlightenment from battered, ancient editions of the works of Stirner, Nietzsche, Marx, Engels, Dietzgen, Darwin, and Spencer, which made up the entire philosophical content of their labor-camp library. (Twentieth-century philosophy and science had been excluded by their employers as decadent or subversive — I forget which.) With staggering diligence, they had taken these works — which they ironically treated as the last word in modern thought — and synthesized from them, and from their own bitter experiences, the first socialist philosophy based on totally pessimistic and cynical conclusions about human nature. Life is a process of breaking down and using other matter, and if need be, other life. Therefore, life is aggression, and successful life is successful aggression. Life is the scum of matter, and people are the scum of life. There is nothing but matter, forces, space and time, which together make power. Nothing matters, except what matters to you. Might makes right, and power makes freedom. You are free to do whatever is in your power, and if you want to survive and thrive you had better do whatever is in your interests. If your interests conflict with those of others, let the others pit their power against yours, everyone for theirselves. If your interests coincide with those of others, let them work together with you, and against the rest. We are what we eat, and we eat everything. All that you really value, and the goodness and truth and beauty of life, have their roots in this apparently barren soil. This is the true knowledge. We had founded our idealism on the most nihilistic implications of science, our socialism on crass self-interest, our peace on our capacity for mutual destruction, and our liberty on determinism. We had replaced morality with convention, bravery with safety, frugality with plenty, philosophy with science, stoicism with anaesthetics and piety with immortality. The universal acid of the true knowledge had burned away a world of words, and exposed a universe of things. Things we could use.1
What I want to consider here is how people who aren't inmates of a privatized gulag could come to the true knowledge, or something very like it; how they might use it; and some of how MacLeod makes it come alive.
One route, of course, would be through the Marxist and especially the Trotskyist tradition; I suspect this was MacLeod's. In "Their Morals and Ours", Trotsky laid out a famous formulation of what really matters:
A means can be justified only by its end. But the end in its turn needs to be justified. From the Marxist point of view, which expresses the historical interests of the proletariat, the end is justified if it leads to increasing the power of man over nature and to the abolition of the power of man over man.
Other2 moral ideas are really expressions of self- or, especially, class- interest, indeed tools in the class struggle:
Morality is one of the ideological functions in this struggle. The ruling class forces its ends upon society and habituates it into considering all those means which contradict its ends as immoral. That is the chief function of official morality. It pursues the idea of the "greatest possible happiness" not for the majority but for a small and ever diminishing minority. Such a regime could not have endured for even a week through force alone. It needs the cement of morality. The mixing of this cement constitutes the profession of the petty-bourgeois theoreticians, and moralists. They dabble in all colors of the rainbow but in the final instance remain apostles of slavery and submission.
But if you really want to know whether something is good or bad, Trotsky says, you ask whether it really conduces to "the liberation of mankind", to "increasing the power of man over nature and to the abolition of the power of man over man". Intentions don't matter, nor do formal similarities; what matters is whether means and acts really help advance this over-riding end. Thus, explicitly, even terrorism can be justified under conditions where it will be effective (as when Trotsky practiced it during the Civil War).
Trotsky did not, of course, have occasion to contemplate eliminating an extra-terrestrial civilization, but I think his position would have been clear.
The good-means-good-for-me, might-is-right theme is also one with a long history in western philosophy, often as the dreadful fate from which philosophy will save us, but sometimes as the liberating truth which philosophy reveals. This means that something like the true knowledge could, paradoxically enough, be developed out of the classical western tradition.
The obvious way to do this would be to start from figures like Nietzsche who have said pretty similar things. Most of these 19th and 20th century figures would of course have looked on the Solar Union with utter horror, but even so there is, I think, a way there. Many of these philosophers simultaneously celebrate power and bemoan the way in which the great and powerful are dragged down or confined by the weak. This creates a tension, if not an outright contradiction. Who is really more powerful? Clearly, if the mediocre masses can collectively dominate and overwhelm the individually magnificent few, the masses have more power. As Hume said, albeit in a somewhat different context, "force is always on the side of the governed". (Or again: "Such a regime could not have endured for even a week through force alone".) Someone willing to combine Nietzsche's celebration of power with a frank assessment both of their own power as an isolated individual and of the potential power of different groups could well end up at the true knowledge.
Even less work would be to go further back into the past, to the great figures of the 17th century, like Hobbes and, most especially, Spinoza. Here we find thinkers willing to found, if not socialism, then at least social and political life on "pessimistic and cynical conclusions about human nature". The latter's Political Treatise is quite explicit about the pessimism and the cynicism:
[M]en are of necessity liable to passions, and so constituted as to pity those who are ill, and envy those who are well off; and to be prone to vengeance more than to mercy: and moreover, that every individual wishes the rest to live after his own mind, and to approve what he approves, and reject what he rejects. And so it comes to pass, that, as all are equally eager to be first, they fall to strife, and do their utmost mutually to oppress one another; and he who comes out conqueror is more proud of the harm he has done to the other, than of the good he has done to himself. [Elwes edition, I.5]
Spinoza is equally clear that one's rights extend exactly as far as one's power3, and that the reason people band together is to increase their power4. It is precisely on this basis that Spinoza came to advocate democracy, as uniting more of the power of the people in the commonwealth, especially their powers of reasoning. Of course Spinoza's political views were not the true knowledge, but he actually provides a surprisingly close starting point, and reasoning from his premises, from the standpoint of someone who knows they are not going to be at the top of the heap unless they level it all, would get you most of the rest of the way there. This would include Spinoza's idea that obedience, allegiance, even solidarity are all dissolved when they are no longer advantageous.
I want to mention one more pseudo-ancestor for the true knowledge. I said before that the themes that might is right, and "good" means "good for me", are ancient ones in the history of philosophy, but they were introduced as the awful dangers which ethics is supposed to save us from. All the way back in The Republic, we find clear statements of the idea that might is right, that the alternative to pursuing self-interest is sheer stupidity, and that cooperation emerges from alignment of interests. We are supposed to recoil from these ideas in horror, but they can only arouse horror if it seems like there's something to them5. The danger with this tactic is that the initial presentation of the amoralist ideas may end up seeming more convincing than their later refutation. (I think that's the case even in The Republic.) And then one is reduced to talking about how refusing to accept that some transcendental, unverifiable ideas are true will lead to bad-for-you consequences in this world, and the game is over.
No doubt some scholars in the Solar Union will, as I have done above, play the game of trying to find retrospective anticipations of some idea in the words of people who were really saying something else. On the other hand, at some point the true knowledge leaves its bonded-labor camps, joins up with the Sino-Soviet army, and starts expanding "from Vladivostok to Lisbon, from sea to shining sea". As it moves into the wider world, it encounters scientific knowledge considerably more up to date than Darwin and Engels. Does this set the stage for another shameful and self-defeating episode of an ideology trying desperately to hold on to a bit of fossilized science?
I actually don't see why it should. There are scientific theories nowadays which try to address the sort of questions that the true knowledge claims to answer, and I don't think the answers are really that different, though they are not usually presented so starkly.
Biologically, life is a process of assimilating matter and energy, of appropriating parts of the world to sustain itself. Nothing with a stomach is innocent of preying on other living things, and even plants survive, grow, and reproduce only by consuming their environment and re-shaping it to their convenience. The organisms which are better at appropriating and changing the world to suit themselves will live and expand at the expense of those which are worse at it. Those organisms whose acts serve their own good will do better for themselves than those which don't — whether or not that might in some extra-mundane sense be right or just. Abstract goods keep nothing alive, help nothing to grow; self-seeking is what will persist, and everything else will perish. And then when we throw these creatures together, they will inevitably compete, they will rival and oppose. Of course they can aid each other, but this aid will take the form of more effective exploitation of resources, including other life.
There is now a whole sub-field of biology devoted precisely to understanding when organisms will cooperate and assist each other, namely evolutionary game theory. It teaches us conditions for the selection of forms of reciprocity and even of solidarity, even among organisms without shared genetic interests. But those are, precisely, conditions under which the reciprocity and solidarity advance self-interest; it's cooperation in the service of selfishness.
Take the paradigm of the prisoners' dilemma, but tell it a bit differently. Alice and Babur are two bandits, who can either cooperate with each other in robbing villages and caravans, or defect by turning on each other. If they both cooperate, each will take $1,000; if they both defect, neither can steal effectively and they'll get $0. If Alice cooperates and Babur defects by turning on her, he will get $2,000 and she will lose $500, and vice versa. This has exactly the structure of the usual presentations of the dilemma, but makes it plain that "cooperation" is cooperation between Alice and Babur, and can perfectly well be cooperation in preying upon others. It's a famous finding of evolutionary game theory that a strategy of conditional cooperation, of Alice cooperating with Babur until he stops cooperating with her and vice versa, is better for those players than the treacherous, uncooperative one of their turning on each other, and that a population of conditional cooperators will resist invasion by non-cooperators6. Such strategies of cooperation in exploiting others are what the field calls "pro-social behavior".
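To make this concrete, here is a minimal sketch in Python (mine, not MacLeod's and not anything from the game-theory literature; the strategy names, the round count, and the script layout are all just conventions of the illustration) which plays the Alice-and-Babur payoffs as an iterated game and shows why a population of conditional cooperators resists invasion by treacherous defectors.

```python
# A toy illustration, not anyone's published model: the Alice-and-Babur payoffs
# from the text, played as an iterated game. Round count and strategy names are
# arbitrary choices for the sketch.

# Payoffs to the row player: (my move, their move) -> my payoff
PAYOFF = {
    ("C", "C"): 1000,   # both cooperate in banditry: $1,000 each
    ("D", "D"): 0,      # both turn on each other: nothing
    ("D", "C"): 2000,   # I betray a cooperator: $2,000
    ("C", "D"): -500,   # I am betrayed: lose $500
}

def tit_for_tat(history_self, history_other):
    """Cooperate first, then copy the other player's previous move."""
    return "C" if not history_other else history_other[-1]

def always_defect(history_self, history_other):
    """Unconditional treachery."""
    return "D"

def play(strat_a, strat_b, rounds=200):
    """Total payoffs to each player over an iterated game."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b

if __name__ == "__main__":
    print("TFT vs TFT:          ", play(tit_for_tat, tit_for_tat))
    print("Defector vs TFT:     ", play(always_defect, tit_for_tat))
    print("Defector vs Defector:", play(always_defect, always_defect))
```

Run it, and a pair of conditional cooperators nets $1,000 apiece every round, while an unconditional defector gets a single $2,000 windfall from its first victim and nothing thereafter, and two defectors facing each other get nothing at all; among tit-for-tat bandits, treachery doesn't pay.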
Since evolutionary game theorists are for the most part well-adjusted members of bourgeois society, neither psychopaths nor revolutionaries, they do not usually frame their conclusions with the starkness which their own theories would really justify; in this respect, there has been a decline since the glory days when von Neumann could pronounce that "It is just as foolish to complain that people are selfish and treacherous as it is to complain that the magnetic field does not increase unless the electric field has a curl." If we could revive some of that von Neumann spirit, a fair synthesis of works like The Evolution of Cooperation, The Calculus of Selfishness, A Cooperative Species, Individual Strategy and Social Structure, etc., would go something like this: "Cooperation evolves just to the extent that it advances the self-interests of the cooperators and that each of them has enough power to make the other hurt if betrayed. Everything else is self-defeating, is 'dominated'. Typically, the gains from cooperation arise from more effectively exploiting others. Also, inside every positive-sum story about gains from cooperation, there is a negative-sum struggle over dividing those gains, a struggle where the advantage lies with the already-stronger party." A somewhat more speculative addendum would be the following: "We have evolved to like hurting those who have wronged us, or who have flouted rules we want them to follow, because our ancestors have had to rely for so many millions of years on selfish, treacherous fellow creatures, and 'pro-social punishment' is how we've kept each other in line enough to take over the world."
There is little need to elaborate on how neatly this dovetails with the true knowledge, so I won't7. This alignment is, I suspect, no coincidence.
Given these points, how do we think about choices about whom to cooperate with, or whether to cooperate at all? Look to those whose interests are aligned with yours, and with whom cooperation will do the most to advance your interests — to those with the most power, most closely aligned with you. To neglect to ally oneself when it would be helpful is not wicked — what has wickedness to do with any of this? — but it is stupid, because it leads to needless weakness.
At this point, or somewhere near it, the Sheenisov must have made a leap which seems plausible but not absolutely compelling. The united working class is more powerful than the other forces in capitalism, the last of the "tool-making cultures of the Upper Pleistocene". To throw in with that is to get with the strength. Why solidarity? Because it's the source of power. At the same time, it's a source of strength which can hardly tolerate other, rival powers — organized non-cooperators, capitalist and statist remnants, since they threaten it, and it them.
These arguments would apply to any sort of organism — including Jovian post-humans as well as us, and so Ellen May seems to me to have very much the worse of her argument with Mary-Lou Radiation Nation Smith:
"They're not monsters, you know. Why should you expect beings more powerful and intelligent than ourselves to be worse than ourselves? Wouldn't it be more reasonable to expect them to be better? Why should more power mean less good?" I could hardly believe I was hearing this. ... I searched for my most basic understanding, and dragged it out: "Because good means good for us!" Mary-Lou smiled encouragingly and spoke gently, as though talking someone down from a high ledge. "Yes, Ellen. But who is us? We're all — human, post-human, non-human — machines with minds in a mindless universe, and it behoves those of us with minds to work together if we can in the face of that mindless universe. It's the possibility of working together that forges an us, and only its impossibility that forces a them. That is the true knowledge as a whole — the union, and the division."8
(The worse of the argument, that is, unless Ellen May can destroy the fast folk, in which case there is no power to either unite with or to fear. "No Jovian superintelligences, no problem", as it were.)
As I said earlier, contemporary scientists studying the evolution of cooperation do not usually put their conclusions in such frank terms as the true knowledge. I don't even think that this is because they're reluctant to do so; I think it genuinely doesn't occur to them. (And this despite things like one of the founders of evolutionary game theory, John Maynard Smith, being an outright Marxist and ex-Communist.) Even when people like Bowles and Gintis — not Marxists, but no strangers to the leftist tradition — try to draw lessons from their work, they end up with very moderate social democracy, not the true knowledge. Since I know Bowles and Gintis, I am pretty sure that they are not holding back...
Why so few people are willing to push these ideas to (one) logical conclusion is an interesting question I cannot pretend to answer. I suspect that part of the answer has to do with people not having grown up with these ideas, so that the theories are used more to reconstruct pre-existing notions than as guides in their own right. If that's so, then a few more (academic) generations of their articulation, especially if some of the articulators should happen to have the right bullet-swallowing tendencies, could get us all the way to the true knowledge being worked out, not by bonded laborers but by biologists and economists.
There are points here where, I think, the true knowledge might lead not to the attractive-to-me Solar Union but to somewhere much darker. If I am a member of one of the subordinate classes, well, the strongest power locally is probably the one dominating me. Maybe solidarity with others would let me overthrow them and escape, but if that united front doesn't form, or fails, things get much, much worse for me. The true knowledge could actually justify obedience to the powers that be, if they're powerful enough, and not enough of us are united in opposition to them.
The other point of failure is this. If I am a member of an oppressing or privileged class, what lesson do I take from the true knowledge? Well, I might try to throw in my lot with the power that will win — but that means abandoning my current goods, the things which presently make me strong and enhance my life. My interest is served by allying with those who are also beneficiaries of inequality, and making sure the institutions which benefit me remain in place, or, if they change, change to be even more in my favor. Members of a privileged class in the grip of moralizing superstition might sometimes be moved by pity, sympathy, or benevolence. Rulers who have themselves accepted the true knowledge will concede nothing except out of the calculation that it's better for them than the alternative. Voltaire once said something to the effect that whether or not God existed, he hoped his valet believed in Him; it might have been much more correct for Voltaire's valet to hope that his master, and still more rulers like Frederick the Great, feared an avenging God.
My somewhat depressing prospect is that our ruling classes are a lot more likely to talk themselves into the true knowledge by the evolutionary route than the rest of us are to discover revolutionary solidarity — though whether the occasional fits of benevolence on the part of rulers really make things much better than a frank embrace of their self-interest would is certainly a debatable proposition.
If anyone does want to start propagating the true knowledge, I think it would actually have pretty good prospects. A number of sociologists (Gellner, Boudon) have pointed out that really successful ideologies tend to combine two features. One is that they have a core good idea, one which makes lightbulbs go on for people. Since I can't put this better than Gellner did, I'll quote him:
The general precondition of a compelling, aura-endowed belief system is that, at some one point at least, it should carry overwhelming, dramatic conviction. In other words, it is not enough that there should be a plague in the land, that many should be in acute distress and in fear and trembling, and that some practitioners be available who offer cure and solace, linked plausibly to the background beliefs of the society in question. All that may be necessary but it is not sufficient. Over and above the need, and over and above mere background plausibility (minimal conceptual eligibility), there must also be something that clicks, something which throws light on a pervasive and insistent and disturbing experience, something which at long last gives it a local habitation and a name, which turns a sense of malaise into an insight: something which recognizes and places an experience or awareness, and which other belief systems seem to have passed by.9
I think MacLeod gets this — look at how Ellen May says the true knowledge "struck home with the force of a revelation" (ch. 5, p. 89). But the click for the true knowledge is how it evades the common pitfall of attempts to work out materialist or naturalist ethics. Attempts which ground everything in self-interest and self-assertion have a very strong tendency to collapse into mere self-assertion; "good" comes to mean "good for me, and for me alone". The true knowledge avoids this; it gives you a way of accepting that you are a transient, selfish mind in a mindless, indifferent universe, and of sloughing off thousands of years of accumulated superstitious rubbish (from outright taboos and threats of the Supreme Fascist to incomprehensible commands from nowhere) — you can face the light, and escape the bullshit, and yet not be altogether a monster.
(Boudon would add something to Gellner's requirement that an ideology click: the idea should also be capable of "hyperbolic" use, of being over-applied through neglecting necessary qualifications and conditions. Arguably, the whole plot of The Cassini Division is driven by Ellen May's hyperbolization of part of the true knowledge.)
Clicking is one condition for an ideology to take off; but there's another.
Though belief systems need to be anchored in the background assumptions, in the pervasive obviousness of an intellectual climate, yet they cannot consist entirely of obvious, uncontentious elements. There are many ideas which are plainly true, or which appear to be such to those who have soaked up a given intellectual atmosphere: but their very cogency, obviousness, acceptability, makes them ineligible for serving as the distinguishing mark of membership of a charismatic community of believers. Demonstrable or obvious truths do not distinguish the believer from the infidel, and they do not excite the faithful. Only difficult beliefs can do that. And what makes a belief difficult? There must be an element both of menace and of risk. The belief must present itself in such a way that the person encountering, weighing the claim that is being made on him, can neither ignore it nor hedge his bets. His situation is such that, encountering the claim, he cannot but make a decision, and it will be a weighty one, whichever way he decides. He is obliged, by the very nature of the claim, to commit himself, one way or the other.10
The true knowledge would have this quality, which Gellner (following Kierkegaard) calls "offense", in spades.11
I'll close with two observations about this combination of click and offense. One is that it is of course very common in a certain sort of fiction, and science fiction often indulges in it. Heinlein, in particular, was very good at it, and in some ways The Cassini Division is, the color of Ellen May's hair notwithstanding, a very Heinleinian book; Ellen May explaining the true knowledge to us is not that different from being on the receiving end of one of Heinlein's in-story lectures. (I know someone else made these points before me, but I can't remember who.) One of the things which makes me like MacLeod's books better than Heinlein's, beyond the content of the lectures appealing more to my prejudices, is that even in the story world, the ideas get opposed, and there is real argument.
The other observation is that MacLeod of course comes out of the Trotskyist tradition, part of the broader family of Communisms. During its glory days, when it was the "tragic hero of the 20th century", Communism quite certainly combined the ability to make things click with the ability to give offense. This must have been one of MacLeod's models for the true knowledge. MacLeod is not any longer any sort of Communist ("the actual effect" of Communism "was to complete the bourgeois revolution ... and to clear the ground for capitalism") or even Marxist, but there is a recurring theme in his work of some form of the "philosophy of praxis" re-appearing. One of the core Marxist ideas, going all the way back to the beginning, is that socialism isn't just an arbitrary body of ideas, but an adaptive response to the objective situation of the proletariat. Even if the very memory of the socialist movement were to vanish, it is (so the claim goes) something which life under capitalism will spontaneously regenerate. One symbol of this in MacLeod's fiction is the scene at the very end of Engine City, where a hybrid creature formed from the remains of three executed revolutionaries crawls from a mass grave. The formation of the true knowledge is another.
I don't, of course, actually believe in the true knowledge, but I find it hard to say why I shouldn't; this makes it, for me, one of MacLeod's more compelling creations. I have kept coming back to it for more than fifteen years now, and I doubt I'm done with it.
The Cassini Division, ch. 5, pp. 89--90 of the 1999 Tor edition; ellipses and italics in the original.^
Notice how Trotsky says the "interests of the proletariat" lie in "increasing the power of man over nature", not increasing the power of the proletariat over nature, and in "the abolition of the power of man over man", not abolishing the power of others over the proletariat (either as a whole or over its individual members). Thus he can reconcile saying that all moral ideas express a class standpoint with saying that his goals are for the benefit of all humanity. There is an implicit appeal here to an idea which goes back to Marx and Engels, that, because of the proletariat's particular class position, the only way it can pursue its interest is through universal liberation of humanity. What can one say but "how convenient"?^
"every natural thing has by nature as much right, as it has power to exist and operate" (II.3); "And so the natural right of universal nature, and consequently of every individual thing, extends as far as its power: and accordingly, whatever any man does after the laws of his nature, he does by the highest natural right, and he has as much right over nature as he has power" (II.4); "whatever anyone, be he learned or ignorant, attempts and does, he attempts and does by supreme natural right. From which it follows that the law and ordinance of nature, under which all men are born, and for the most part live, forbids nothing but what no one wishes or is able to do, and is not opposed to strifes, hatred, anger, treachery, or, in general, anything that appetite suggests" (II.8); "Besides, it follows that everyone is so far rightfully dependent on another, as he is under that other's authority, and so far independent, as he is able to repel all violence, and avenge to his heart's content all damage done to him, and in general to live after his own mind. He has another under his authority, who holds him bound, or has taken from him arms and means of defence or escape, or inspired him with fear, or so attached him to himself by past favour, that the man obliged would rather please his benefactor than himself, and live after his mind than after his own" (II.9--10).^
"If two come together and unite their strength, they have jointly more power, and consequently more right over nature than both of them separately, and the more there are that have so joined in alliance, the more right they all collectively will possess." (II.13).^
It would be horrifying if everyone were followed around by a drooling slimy befanged monster, careful to hide itself out of our sight, which might devour any one of us without warning at any moment. A philosophy which offered to re-assure us that lurking monsters do not follow us around would arouse little interest.^
The basic tit-for-tat strategy is not evolutionarily stable against invasion by more forgiving conditional cooperators, which leads to a lot of technically interesting wrinkles, which you can read about in, say, Karl Sigmund's great Games of Life. But various attempts to dethrone "strong reciprocity" (e.g., "Southampton" strategies, "zero-determinant" strategies) have all, so far as I know, proved unsuccessful.^
If I were going to elaborate, I'd have a lot to say about this bit from The Cassini Division (ch. 7, p. 144): "Without power, respect is dead. But our power needn't be the capacity to destroy them — our own infants, and many lower animals, have power over us because our interests are bound up with theirs. Because we value them, and because natural selection has built that valuing into our nervous systems, to the point where we cannot even wish to change it, though no doubt if we wanted to we could. This is elementary: the second iteration of the true knowledge."^
The Cassini Division, ch. 10, p. 216, my ellipses.^
The Psychoanalytic Movement: The Cunning of Unreason, first edition (Evanston, Illinois: Northwestern University Press, 1996), p. 39.^
The Psychoanalytic Movement, pp. 40--41.^
The Cassini Division, ch. 5, pp. 93--94: "I think about being evil. To them, I realize, we are indeed bad and harmful, but — and the thought catches my breath — we are not bad and harmful to ourselves, and that is all that matters, to us. So as long as we are actually achieving our own good, it doesn't matter how evil we are to our enemies. Our Federation will be, to them, the evil empire, the domain of dark lords; and I will be a dark lady in it. Humanity is indeed evil, from any non-human point of view. I hug my human wickedness in a shiver of delight."^
Manual Trackback: Adam Kotsko; MetaFilter
Posted at May 12, 2015 13:53 | permanent link
Attention conservation notice: If you'd care about these links, you've probably seen them already.
I have been very much distracted from blogging by teaching undergraduates (last semester; this semester), by supervising graduate students, and by Life. Thus even this link round-up is something I literally began years ago, and am only now posting for lack of time to do real blogging.
Posted at May 05, 2015 22:28 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; The Dismal Science; Writing for Antiquity; The Continuing Crises; The Great Transformation; The Commonwealth of Letters; Automata and Calculating Machines; Afghanistan and Central Asia; The Beloved Republic
Posted at April 30, 2015 23:59 | permanent link
Attention conservation notice: Notice of an upcoming academic talk at Carnegie Mellon. Only of interest if you (1) care about how the mathematics of graph limits intersects with non-parametric network modeling, and (2) will be in Pittsburgh week after next.
As always, the talk is free and open to the public.
(I'd write something long here about why graph limits are so interesting, but why repeat myself?)
Posted at April 09, 2015 19:02 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; The Dismal Science; Writing for Antiquity; The Continuing Crises; The Great Transformation; The Commonwealth of Letters
Posted at March 31, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Pleasures of Detection, Portraits of Crime; Enigmas of Chance; Minds, Brains, and Neurons; Philosophy
Posted at February 28, 2015 23:59 | permanent link
Attention conservation notice: I have no taste.
Now imagine a sequence of variables $X_1, X_2, \ldots$, where $X_n = n^{-1}\sum_{i=1}^{n}{Y_i}$, with the $Y_i$ being (for simplicity) IID. Then we have a very parallel calculation which gives an exponentially shrinking probability: applying Markov's inequality to $e^{t\sum_{i=1}^{n}{Y_i}}$ for an arbitrary $t \geq 0$, and then using independence, \[ \begin{eqnarray*} \Prob{X_n \geq a} & = & \Prob{\sum_{i=1}^{n}{Y_i} \geq na}\\ \Prob{X_n \geq a} & \leq & e^{-nta}{\left(\Expect{e^{tY_1}}\right)}^n\\ n^{-1}\log{\Prob{X_n \geq a}} &\leq & -\sup_{t \geq 0}{\left(ta - \log{\Expect{e^{tY_1}}}\right)} \end{eqnarray*} \] Of course, there is still the matter of getting the matching lower bound, which I won't go into here, but which is attainable.
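For a concrete instance of how that supremum works out (an illustration of mine, not part of the original calculation), take the $Y_i$ to be standard Gaussians, so that $\log{\Expect{e^{tY_1}}} = t^2/2$. For $a > 0$ the supremum is attained at $t = a$, giving \[ \sup_{t \geq 0}{\left(ta - \frac{t^2}{2}\right)} = \frac{a^2}{2}, \qquad \text{so} \qquad n^{-1}\log{\Prob{X_n \geq a}} \leq -\frac{a^2}{2}, \] and in this Gaussian case the matching lower bound recovers the same exponential rate.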
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Enigmas of Chance; Writing for Antiquity; The Beloved Republic; The Dismal Science; Pleasures of Detection, Portraits of Crime; Minds, Brains, and Neurons
Posted at January 31, 2015 23:59 | permanent link