Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Pleasures of Detection, Portraits of Crime; Scientifiction and Fantastica; The Beloved Republic; Writing for Antiquity; The Commonwealth of Letters
Posted at November 30, 2015 23:59 | permanent link
Attention conservation notice: Only relevant if you are a student at Carnegie Mellon University, or have a pathological fondness for reading lecture notes on statistics.
In the so-called spring, I will again be teaching 36-402 / 36-608, undergraduate advanced data analysis:
The goal of this class is to train you in using statistical models to analyze data: as data summaries, as predictive instruments, and as tools for scientific inference. We will build on the theory and applications of the linear model, introduced in 36-401, extending it to more general functional forms, and more general kinds of data, emphasizing the computation-intensive methods introduced since the 1980s. After taking the class, when you're faced with a new data-analysis problem, you should be able to (1) select appropriate methods, (2) use statistical software to implement them, (3) critically evaluate the resulting statistical models, and (4) communicate the results of your analyses to collaborators and to non-statisticians.
During the class, you will do data analyses with existing software, and write your own simple programs to implement and extend key techniques. You will also have to write reports about your analyses.
Graduate students from other departments wishing to take this course should register for it under the number "36-608". Enrollment for 36-608 is very limited, and by permission of the professors only.
Prerequisites: 36-401, with a grade of C or better. Exceptions are only granted for graduate students in other departments taking 36-608.
This will be my fifth time teaching 402, and the fifth time that the primary text is the draft of Advanced Data Analysis from an Elementary Point of View. (I hope my editor will believe that I don't intend for my revisions to illustrate Zeno's paradox.) It is the first time I will be co-teaching with the lovely and talented Max G'Sell.
Unbecoming whining: When I came to CMU, a decade ago, 402 was a projects class for about 10 students. It was larger than that when I inherited it.
| Year | Students receiving final grades |
| --- | --- |
| 2011 | 69 |
| 2012 | 88 |
| 2013 | 90 |
| 2015 | 115 |
Posted at November 17, 2015 22:54 | permanent link
Attention conservation notice: Only of interest if you (1) care about statistical inference with network data, and (2) will be in Pittsburgh next week.
A (perhaps) too-skeptical view of statistics is that we should always think we have $ n=1 $, because our data set is a single, effectively irreproducible, object. With a lot of care and trouble, we can obtain things very close to independent samples in surveys and experiments. When we get to time series or spatial data, independence becomes a myth we must abandon, but we still hope that we can break up the data set into many nearly-independent chunks. To make those ideas plausible, though, we need to have observations which are widely separated from each other. And those asymptotic-independence stories themselves seem like myths when we come to networks, where, famously, everyone is close to everyone else. The skeptic would, at this point, refrain from drawing any inference whatsoever from network data. Fortunately for the discipline, Betsy Ogburn is not such a skeptic.
As always, the talk is free and open to the public.
Posted at November 09, 2015 22:14 | permanent link
Attention conservation notice: Only of interest if you (1) are interested in seeing machine learning methods turned (back) into ordinary inferential statistics, and (2) will be in Pittsburgh on Wednesday.
Leo Breiman's random forests have long been one of the poster children for what he called "algorithmic models", detached from his "data models" of data-generating processes. I am not sure whether developing classical, data-model statistical-inferential theory for random forests would have pleased him or has him spinning in his grave, but either way I'm sure it will make for an interesting talk.
As always, the talk is free and open to the public.
Posted at November 09, 2015 16:23 | permanent link
Attention conservation notice: 11 pages of textbook out-take on statistical methods, either painfully obvious or completely unintelligible.
I wrote up some notes on kriging for use in the regression class, but eventually decided teaching that and covariance estimation would be too much. Eventually I'll figure out how to incorporate it into the book, but in the meanwhile I offer it for the edification of the Internet.
Posted at November 03, 2015 19:00 | permanent link
Blogging will remain sparse while I teach, finish the book, write grant proposals, try not to screw up being involved in a faculty search, do all the REDACTED BECAUSE PRIVATE things, and dream about research. In the meanwhile:
A Twitter account, opened at Tim Danford's instigation. This is a semi-automated new account which is just for announcing new posts here; it (and I use the pronoun deliberately) follows no one, I read nothing, and messages or attempts to engage might as well be piped to /dev/null.
My online notebooks are in the same process of incremental update they've been in for the last 21 years.
My on-going bookmarking, with short commentary. (Pinboard doesn't need my unsolicited endorsement, but has it.)
Tumblr, for pictures.
Posted at November 03, 2015 17:00 | permanent link