Attention conservation notice: I have no taste.
Books to Read While the Algae Grow in Your Fur; Scientifiction and Fantastica; Writing for Antiquity; Enigmas of Chance; The Dismal Science; The Collective Use and Evolution of Concepts; Pleasures of Detection, Portraits of Crime
Posted at January 31, 2016 23:59
Attention conservation notice: Only of interest if (1) you care about the intersection of high-dimensional statistics with information theory, and (2) you will be in Pittsburgh next Wednesday.
It is, perhaps, only appropriate that the first statistics seminar of the semester is about connections between high-dimensional regression and limits on how fast information can be sent over noisy channels.
As always, the talk is free and open to the public.
Posted at January 27, 2016 18:24
Attention conservation notice: Navel-gazing by an academic.
This was my first time teaching our undergraduate course on linear models ("401"). I've taught the course which follows it (402) four times, and re-designed it once, but I've never had to actually take the students through the pre-req. They come in with courses on probability, on statistical inference, and on linear algebra, but usually no real experience with data analysis. Linear regression is usually their first time trying to connect statistical models to actual data — as well as learning about how linear regression works.
I am OK with how I did, but only about OK. The three big issues I need to work on are (1) connecting theory to practice, (2) getting feedback to students faster, and (3) better assignments.
(1) I feel like I did not strike a good balance, in lecture, between theory, computational examples, and how theory guides practice. The last thing I want to do is turn out people who just (think they) know which commands to run in R, without understanding what's actually going on. (As a student put it to a colleague in a previous semester, "The difference between 401 and econometrics is that in econometrics we have to know how to do all this stuff, and in 401 we also have to know why." This was not, I believe, intended as a compliment.) But based on the student evaluations, and still more on the assignments, there are still students who are a bit fuzzy about what "holding all other predictor variables constant" actually means in a linear model. More generally, student feedback says I persistently have a problem connecting mathematical theory to data-analytic practice; more serious re-thinking of how I teach may be in order.
(2) Students need faster and more consistent feedback on their assignments. We were somewhat constrained on speed this semester by a labor shortage, but I could have done more to ensure consistency across graders.
(3) Too many of the assignments were based on small, old data sets from the textbook. Mea culpa.
This was the first time we had two sections of 401, with two separate professors. I think we did OK at coordinating them, and I take full responsibility for all the failures and glitches. (I should add, because I know some of the students read this, that grades were curved and calculated completely independently across the two sections.)
I am very grateful for the work done on designing the curriculum for this course by my colleagues. Still, I feel like a lot of the course was spent on (to be slightly unfair) special cases which people could work out in closed form in the 1920s, and pretending that they had relevance to actual data analysis. (Cf.) The Kids do need at least a nodding acquaintance with that stuff, because people will expect it of them, but I would rather it be taught to them as a nice bonus than as a default. This would mean a lot more re-design than I put into the course.
Relatedly, I came to have a thorough, almost personal, dislike of the textbook, but that's another story.
Some things which did go well:
I'll indulge myself by ending on an "achievement unlocked" note. This was (so far as I know) the first class I've taught where a student's response to one of my lectures was to ask Reddit "Is there any truth to this?". There can be few better proofs that I reached at least one of my students and inspired them to think critically about the material. I am being quite serious when I say that I wish something like this happened every week in every course.
Posted at January 09, 2016 22:38
Attention conservation notice: Only relevant if you are a student at Carnegie Mellon University, or have a pathological fondness for reading lecture notes on statistics.
In the so-called spring, I will again be teaching 36-402 / 36-608, undergraduate advanced data analysis:
The goal of this class is to train you in using statistical models to analyze data — as data summaries, as predictive instruments, and as tools for scientific inference. We will build on the theory and applications of the linear model, introduced in 36-401, extending it to more general functional forms, and more general kinds of data, emphasizing the computation-intensive methods introduced since the 1980s. After taking the class, when you're faced with a new data-analysis problem, you should be able to (1) select appropriate methods, (2) use statistical software to implement them, (3) critically evaluate the resulting statistical models, and (4) communicate the results of your analyses to collaborators and to non-statisticians.
During the class, you will do data analyses with existing software, and write your own simple programs to implement and extend key techniques. You will also have to write reports about your analyses.
Graduate students from other departments wishing to take this course should register for it under the number "36-608". Enrollment for 36-608 is very limited, and by permission of the professors only.
Prerequisites: 36-401, with a grade of C or better. Exceptions are only granted for graduate students in other departments taking 36-608.
This will be my fifth time teaching 402, and the fifth time that the primary text is the draft of Advanced Data Analysis from an Elementary Point of View. (I hope my editor will believe that I don't intend for my revisions to illustrate Zeno's paradox.) It is the first time I will be co-teaching with the lovely and talented Max G'Sell.
Unbecoming whining: 402 will be larger this year than last, just like it has been every year I've been here. This year, in fact, we'll have over 150 students in it, or about 1/50 of all CMU undergrads. (This has nothing to do with my teaching, and everything to do with our student population.) I think it's great that we're teaching what would be master's-level material at most schools to so many juniors and seniors, but I don't think we'll be able to keep doubling every five years without either having a lot of stuff break, or transforming the nature of the course yet again. It's clearly a better problem to have than "class sizes are halving every five years"*, but it's still a problem.
*: As I have said in a number of conversations over recent years, the nightmare scenario for statistics vs. "data science" is that statistics becomes a sort of mathematical analog to classics. People might pay lip-service to our value, especially people who are invested in pretending to intellectual rigor, but few would actually pay attention to anything we have to say.
Posted at January 09, 2016 22:00