November 30, 2015

Books to Read While the Algae Grow in Your Fur, November 2015

Attention conservation notice: I have no taste.

John Scalzi, The End of All Things
Mind candy science fiction, latest in the series begun with Old Man's War. At the surface level, it's a fun series of skiffy adventures, in which there are schemes, explosions, gadgets, secret lairs, etc., etc. Inter-textually, this is Scalzi sticking the knife into Starship Troopers, having begun (in the first book) with a set-up which seems like a re-tread of Heinlein's ideology in that book, and then systematically having that universe collapse under the weight of (as my ancestors would've put it) its internal contradictions. I admit that taking pleasure in the latter aspect of the books is a recherché taste, and that I generally prefer my mind candy to be less inward-looking.
Sarah Vowell, Lafayette in the Somewhat United States
In which Vowell tackles the American Revolution, our memory of the Revolution, and how we owe our entire national existence to the French.
Kathleen George, Hideout
C. J. Lyons, Snake Skin
Two mind-candy mysteries, both, as it happens, set in Pittsburgh (*). George's is part of a continuing series of police procedurals, distinguished by really good characterization; here, the stand-out characters are the rather hopeless criminals. (One distinguishing feature of her books: the reader usually knows whodunnit very early on.) Lyons's, which I picked up by chance without realizing it had a local connection, is at once less elevated in its story-telling and more over-the-top in its action, but still passed the "I want to know what happens next" test.
*: Both are pretty good at the local color, at least by my standards as a mere ten-year resident rather than a real Yinzer. I admit I boggled at Lyons describing the immediate vicinity of the Pittsburgh Center for the Arts as a "blue collar" neighborhood, but then it occurred to me how rarely I got four blocks that way from it...
Jeff VanderMeer, Acceptance
High quality mind candy science fiction/horror, sequel to Annihilation and Authority. Here, at the end, we get to see the marvels hidden inside the terrors — and inside the marvels, more terrors. I actually found the fragment of an explanation we got fairly satisfying, and liked that it was only a fragment, though I realize that tastes may differ here.
John Milton, Paradise Lost
I tried this as a teenager, but don't think I got beyond the bit early in Book III where the God the Father starts monologuing his plans to the Son. On this attempt I listened to an excellent audiobook (read by Ralph Cosham), and I loved it. The language is magnificent, as is Milton's attempt to depict action on a more-than-terrestrial scale. (Though his standards for mind-boggling vastness are comically small, compared to the actual universe shown to us by astronomy.) The ideology is rubbish, of course. So: score one for approaching literary classics in maturity, rather than as a callow youth.
Stray thoughts, probably already immensely refined in the libraries written about this book: (1) Those are some really vivid accounts of how things looked, for a blind man. (2) Sometimes it seems like Milton's trying to excise classical-mythological allusions in favor of Biblical ones (e.g., the places named in the invocation of the heavenly Muse at the very opening), but it's like he just can't stay away from them. (3) Similarly, I think there are very few historical or contemporary-geographical allusions to places in Europe, compared to quite striking ones for Asia (e.g., X 431ff) and even Africa ("Serraliona", X 703); if that's right, why?
Finally: I kept thinking, as I was immersed in this book about a creature bent on vengeance against its all-powerful creator, "What would Justice of Torren One Esk Nineteen make of this?"
(N.) Lee Wood, Kingdom of Lies and Kingdom of Silence
Mind candy mystery novels. The first is a combination of a procedural and an amateur-sleuth mystery; the second is just a procedural. They're well-told, with good characterization, but too many coincidences for me to be completely satisfied in the mysteries. (Picked up because of the quality of Wood's older science fiction and fantasy, particularly Looking for the Mahdi and Bloodrights.)

Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; The Beloved Republic; Writing for Antiquity; The Commonwealth of Letters

Posted at November 30, 2015 23:59 | permanent link

November 17, 2015

Course Announcement: 36-402, Advanced Data Analysis, Spring 2016

Attention conservation notice: Only relevant if you are a student at Carnegie Mellon University, or have a pathological fondness for reading lecture notes on statistics.

In the so-called spring, I will again be teaching 36-402 / 36-608, undergraduate advanced data analysis:

The goal of this class is to train you in using statistical models to analyze data — as data summaries, as predictive instruments, and as tools for scientific inference. We will build on the theory and applications of the linear model, introduced in 36-401, extending it to more general functional forms, and more general kinds of data, emphasizing the computation-intensive methods introduced since the 1980s. After taking the class, when you're faced with a new data-analysis problem, you should be able to (1) select appropriate methods, (2) use statistical software to implement them, (3) critically evaluate the resulting statistical models, and (4) communicate the results of your analyses to collaborators and to non-statisticians.

During the class, you will do data analyses with existing software, and write your own simple programs to implement and extend key techniques. You will also have to write reports about your analyses.

Graduate students from other departments wishing to take this course should register for it under the number "36-608". Enrollment for 36-608 is very limited, and by permission of the professors only.

Prerequisites: 36-401, with a grade of C or better. Exceptions are only granted for graduate students in other departments taking 36-608.

This will be my fifth time teaching 402, and the fifth time where the primary text is the draft of Advanced Data Analysis from an Elementary Point of View. (I hope my editor will believe that I don't intend for my revisions to illustrate Zeno's paradox.) It is the first time I will be co-teaching with the lovely and talented Max G'Sell.

Unbecoming whining: When I came to CMU, a decade ago, 402 was a projects class for about 10 students. It was larger than that when I inherited it.

Year Students receiving final grades
2011 69
2012 88
2013 90
2015 115
Since there are about 160 students in the pre-req class, I don't see how we can reasonably expect to get away with fewer than 140 of them continuing on to 402. (Even I can't teach 401 so badly that an eighth of them will get below a C.) This will mean at least six straight years of uninterrupted growth for 402, to the point where about 1/50 of the total undergraduate population will be taking it in the spring (and maybe 1/8 of all our undergrads will pass through it at some point). This has, of course, nothing to do with my qualities as an instructor, and everything to do with the apparently unstoppable increase in the number of students majoring in statistics and its kin. "Class sizes are doubling every five years" is clearly a better problem to have than "class sizes are halving every five years" (*), but we can neither count on the trend continuing for long, nor keep teaching in the same way. I think I can more or less continue with the same plan I had when the class was half as large, but if this goes on, something is going to have to change.
*: As I have said in a number of conversations over recent years, the nightmare scenario for statistics vs. "data science" is that statistics becomes a sort of mathematical analog to classics. People might pay lip-service to our value, especially people who are invested in pretending to intellectual rigor, but few would actually pay attention to anything we have to say.

Advanced Data Analysis from an Elementary Point of View

Posted at November 17, 2015 22:54 | permanent link

November 09, 2015

"Inference in the Presence of Network Dependence Due to Contagion" (Next Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) care about statistical inference with network data, and (2) will be in Pittsburgh next week.

A (perhaps) too-skeptical view of statistics is that we should always think we have $ n=1 $, because our data set is a single, effectively irreproducible, object. With a lot of care and trouble, we can obtain things very close to independent samples in surveys and experiments. When we get to time series or spatial data, independence becomes a myth we must abandon, but we still hope that we can break up the data set into many nearly-independent chunks. To make those ideas plausible, though, we need to have observations which are widely separated from each other. And those asymptotic-independence stories themselves seem like myths when we come to networks, where, famously, everyone is close to everyone else. The skeptic would, at this point, refrain from drawing any inference whatsoever from network data. Fortunately for the discipline, Betsy Ogburn is not such a skeptic.
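The "nearly-independent chunks" idea above can be made concrete with a toy simulation. This is just my illustration, not anything from Ogburn's talk: an AR(1) time series (with made-up numbers) where dependence shrinks the effective sample size, and where block means from widely separated chunks give a more honest standard error than pretending the observations are independent.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate an AR(1) series: each observation leans on its predecessor,
# so n dependent points carry less information than n independent ones.
n, rho = 10_000, 0.9
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = rho * x[t - 1] + rng.normal()

# Naive standard error of the mean, pretending the points are independent
naive_se = x.std(ddof=1) / np.sqrt(n)

# For AR(1), the variance of the sample mean is inflated by roughly
# (1 + rho) / (1 - rho), so the effective sample size is about
# n * (1 - rho) / (1 + rho) -- here, a few hundred rather than 10,000.
n_eff = n * (1 - rho) / (1 + rho)

# The nearly-independent-chunks trick: means of widely separated blocks
# are almost independent, so their spread gives a more honest SE.
block = 200
block_means = x.reshape(-1, block).mean(axis=1)
blocked_se = block_means.std(ddof=1) / np.sqrt(len(block_means))

print(f"effective n ~ {n_eff:.0f} out of {n}")
print(f"naive SE {naive_se:.4f} vs blocked SE {blocked_se:.4f}")
```

The point of the network setting is precisely that this escape route can close: when everyone is close to everyone else, there are no widely separated blocks to take means over.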

Elizabeth Ogburn, "Inference in the Presence of Network Dependence Due to Contagion"
Abstract: Interest in and availability of social network data has led to increasing attempts to make causal and statistical inferences using data collected from subjects linked by social network ties. But inference about all kinds of estimands, starting with simple sample means, is challenging when only a single network of non-independent observations is available. There is a dearth of principled methods for dealing with the dependence that such observations can manifest. We describe methods for causal and semiparametric inference when the dependence is due solely to the transmission of information or outcomes along network ties.
Time and place: 4--5 pm on Monday, 16 November 2015, in 1112 Doherty Hall

As always, the talk is free and open to the public.

Enigmas of Chance; Networks

Posted at November 09, 2015 22:14 | permanent link

"Statistical Estimation with Random Forests" (This Week at the Statistics Seminar)

Attention conservation notice: Only of interest if you (1) are interested in seeing machine learning methods turned (back) into ordinary inferential statistics, and (2) will be in Pittsburgh on Wednesday.

Leo Breiman's random forests have long been one of the poster children for what he called "algorithmic models", detached from his "data models" of data-generating processes. I am not sure whether developing classical, data-model statistical-inferential theory for random forests would please him, or has him spinning in his grave, but either way I'm sure it will make for an interesting talk.

Stefan Wager, "Statistical Estimation with Random Forests"
Abstract: Random forests, introduced by Breiman (2001), are among the most widely used machine learning algorithms today, with applications in fields as varied as ecology, genetics, and remote sensing. Random forests have been found empirically to fit complex interactions in high dimensions, all while remaining strikingly resilient to overfitting. In principle, these qualities ought to also make random forests good statistical estimators. However, our current understanding of the statistics of random forest predictions is not good enough to make random forests usable as a part of a standard applied statistics pipeline: in particular, we lack robust consistency guarantees and asymptotic inferential tools. In this talk, I will present some recent results that seek to overcome these limitations. The first half of the talk develops a Gaussian theory for random forests in low dimensions that allows for valid asymptotic inference, and applies the resulting methodology to the problem of heterogeneous treatment effect estimation. The second half of the talk then considers high-dimensional properties of regression trees and forests in a setting motivated by the work of Berk et al. (2013) on valid post-selection inference; at a high level, we find that the amount by which a random forest can overfit to training data scales only logarithmically in the ambient dimension of the problem.
(This talk is based on joint work with Susan Athey, Brad Efron, Trevor Hastie, and Guenther Walther.)
Time and place: 4--5 pm on Wednesday, 11 November 2015 in Doherty Hall 1112
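For readers who want a feel for the out-of-bag (OOB) idea that makes forests usable for honest error estimates, here is a minimal sketch of my own. It is emphatically not Breiman's algorithm (real random forests grow deep trees on random feature subsets); it just bags crude one-split "stumps" and scores each point only with the stumps that never saw it, which is the OOB estimate behind the resilience-to-overfitting claims in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: only the first coordinate actually matters.
n, d = 300, 5
X = rng.uniform(-1, 1, size=(n, d))
y = np.where(X[:, 0] > 0, 1.0, -1.0) + 0.1 * rng.normal(size=n)

def fit_stump(Xb, yb, n_candidates=10):
    """Pick the best of a few random (feature, threshold) splits by
    training SSE -- a crude stand-in for Breiman's randomized trees."""
    best, best_sse = None, np.inf
    for _ in range(n_candidates):
        j = int(rng.integers(Xb.shape[1]))
        t = rng.uniform(-1, 1)
        left = Xb[:, j] <= t
        if not left.any() or left.all():
            continue
        mu_l, mu_r = yb[left].mean(), yb[~left].mean()
        sse = ((yb[left] - mu_l) ** 2).sum() + ((yb[~left] - mu_r) ** 2).sum()
        if sse < best_sse:
            best, best_sse = (j, t, mu_l, mu_r), sse
    return best

def predict_stump(stump, X):
    j, t, mu_l, mu_r = stump
    return np.where(X[:, j] <= t, mu_l, mu_r)

# Bag the stumps; score each point only with stumps that never saw it.
B = 500
oob_sum, oob_count = np.zeros(n), np.zeros(n)
for _ in range(B):
    idx = rng.integers(n, size=n)            # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)    # points left out of this sample
    stump = fit_stump(X[idx], y[idx])
    if stump is None:
        continue
    oob_sum[oob] += predict_stump(stump, X[oob])
    oob_count[oob] += 1

oob_pred = oob_sum / np.maximum(oob_count, 1)
oob_mse = np.mean((y - oob_pred) ** 2)
print(f"OOB MSE {oob_mse:.3f} vs Var(y) {y.var():.3f}")
```

Even this feeble ensemble beats predicting the mean, and because the OOB error uses only held-out trees, it doesn't flatter the fit the way training error would.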

As always, the talk is free and open to the public.

Enigmas of Chance

Posted at November 09, 2015 16:23 | permanent link

November 03, 2015

Kriging in Perspective (Teaching outtakes)

Attention conservation notice: 11 pages of textbook out-take on statistical methods, either painfully obvious or completely unintelligible.

I wrote up some notes on kriging for use in the regression class, but eventually decided teaching that and covariance estimation would be too much. Eventually I'll figure out how to incorporate it into the book, but in the meanwhile I offer it for the edification of the Internet.
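For the impatient, the core of kriging fits in a few lines. The sketch below is simple kriging in one dimension (mean known to be zero, a squared-exponential covariance I've assumed for illustration), not the fuller treatment in the notes, which also worries about estimating the covariance in the first place.

```python
import numpy as np

rng = np.random.default_rng(1)

def sq_exp(a, b, scale=1.0, length=0.5):
    """Squared-exponential covariance between 1-d point sets a and b."""
    return scale * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

# Noisy observations of a smooth function
x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.1 * rng.normal(size=x.size)
noise = 0.1 ** 2

# Simple kriging: the predictor at each new point is a linear combination
# of the observations, with weights solving (K + noise*I) w = k(x, x_new).
x_new = np.linspace(0, 5, 101)
K = sq_exp(x, x) + noise * np.eye(x.size)
k_star = sq_exp(x, x_new)
w = np.linalg.solve(K, k_star)   # one column of weights per new point
y_pred = w.T @ y

# Pointwise predictive variance: prior variance minus what the data explain
var_pred = sq_exp(x_new, x_new).diagonal() - np.sum(k_star * w, axis=0)
```

The same linear-smoother structure is why kriging slots naturally into a regression course: the prediction is a weighted average of the responses, with weights set by the covariance function rather than by, say, a kernel bandwidth.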

Enigmas of Chance; Corrupting the Young

Posted at November 03, 2015 19:00 | permanent link

Housekeeping Notes

Blogging will remain sparse while I teach, finish the book, write grant proposals, try not to screw up being involved in a faculty search, do all the REDACTED BECAUSE PRIVATE things, and dream about research. In the meanwhile:

A Twitter account, opened at Tim Danford's instigation. This is a semi-automated new account which is just for announcing new posts here; it (and I use the pronoun deliberately) follows no one, reads nothing, and messages or attempts to engage might as well be piped to /dev/null.

My online notebooks are in the same process of incremental update they've been in for the last 21 years.

My on-going bookmarking, with short commentary. (Pinboard doesn't need my unsolicited endorsement, but has it.)

Tumblr, for pictures.

Self-centered

Posted at November 03, 2015 17:00 | permanent link

Three-Toed Sloth