It's that time again:

- 36-402, Advanced Data Analysis, Spring 2012
*Description*: This course introduces modern methods of data analysis, building on the theory and application of linear models from 36-401. Topics include nonlinear regression, nonparametric smoothing, density estimation, generalized linear and generalized additive models, simulation and predictive model-checking, cross-validation, bootstrap uncertainty estimation, multivariate methods including factor analysis and mixture models, and graphical models and causal inference. Students will analyze real-world data from a range of fields, coding small programs and writing reports.

*Prerequisites*: 36-401 (modern regression); or consent of instructor, in extraordinary cases

*Time and place*: 10:30--11:50 am, Tuesdays and Thursdays, in Porter Hall 100

*Note*: Graduate students in other departments wishing to take this course for credit need consent of the instructor, and should register for 36-608.

Fuller details on the class homepage, including a detailed (but subject to change) list of topics, and links to the compiled course notes. I'll post updates here to the notes for specific lectures and assignments, like last time.

This is the same course I taught last spring, only grown from sixty-odd students to (currently) ninety-three (from 12 different majors!). The smart thing for me to do would probably be to change nothing (I haven't gotten to re-teach a class since 2009), but I felt the urge to re-organize the material and squeeze in a few more topics.

The biggest change I am making is introducing some quality-control sampling. The course is too big for me to look over much of the students' work, and even when I do, that gives me little sense of whether the assignments are really probing what they know (much less helping them learn). So I will be randomly selecting six students every week, to come to my office and spend 10--15 minutes each explaining the assignment to me and answering live questions about it. Even allowing for students being randomly selected multiple times*, I hope this will give me a reasonable cross-section of how well the assignments are working, and how well the grading tracks that. But it's an experiment and we'll see how it goes.

* (exercise for the student): Find the probability distribution of the number of times any given student gets selected. Assume 93 students, with 6 students selected per week, and 14 weeks. (Also assume no one drops the class.) Find the distribution of the total number of distinct students who ever get selected.
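
One way to check an answer to the exercise numerically is sketched below. It rests on assumptions the post does not spell out: each week's six students are a simple random sample without replacement from all 93, and the weekly draws are independent of each other. Under those assumptions, the number of times a given student is selected is binomial with 14 trials and success probability 6/93, and the number of distinct students can be tracked week by week as a Markov chain with hypergeometric steps. The function and constant names are made up for illustration.

```python
from math import comb

N_STUDENTS = 93  # class size, from the post
PER_WEEK = 6     # students sampled each week
N_WEEKS = 14     # weeks in the semester


def times_selected_pmf(n=N_STUDENTS, m=PER_WEEK, weeks=N_WEEKS):
    """P(a given student is selected exactly t times), for t = 0..weeks.

    Assumes independent weekly draws, so the count is Binomial(weeks, m/n).
    """
    p = m / n  # chance of being picked in any single week
    return [comb(weeks, t) * p**t * (1 - p) ** (weeks - t)
            for t in range(weeks + 1)]


def distinct_selected_pmf(n=N_STUDENTS, m=PER_WEEK, weeks=N_WEEKS):
    """P(exactly k distinct students selected after all weeks), for k = 0..n.

    State k = number of distinct students seen so far. In one week, the
    number of new students j is hypergeometric: j come from the n - k
    unseen students and m - j from the k already seen.
    """
    dist = [0.0] * (n + 1)
    dist[0] = 1.0  # before the first week, nobody has been selected
    total = comb(n, m)  # number of equally likely weekly samples
    for _ in range(weeks):
        new = [0.0] * (n + 1)
        for k, pk in enumerate(dist):
            if pk == 0.0:
                continue
            for j in range(m + 1):
                ways = comb(n - k, j) * comb(k, m - j)
                if ways:  # zero whenever the split is impossible
                    new[k + j] += pk * ways / total
        dist = new
    return dist


if __name__ == "__main__":
    times = times_selected_pmf()
    distinct = distinct_selected_pmf()
    print("P(never selected) =", round(times[0], 4))
    print("E[distinct students] =",
          round(sum(k * p for k, p in enumerate(distinct)), 2))
```

Under these assumptions a given student has about a 39% chance of never being called in, and the expected number of distinct students is 93(1 - (1 - 6/93)^14), or about 56. If the selection were instead done without replacement across weeks, every student would be picked at most once and both distributions would be degenerate.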

Posted at January 03, 2012 23:00