This is the *undergraduate* "advanced data analysis", not to be
confused with the graduate projects course
I'm teaching right now. Actually, they used to be much
more similar, but due to the uncanny growth of the undergraduate major, I will
have seventy or so students in 402, and all of them doing projects is more than
we can cope with. (My inner
economist says that the statistics department should leave the curriculum
alone and just keep raising the threshold for passing our classes until the
demand for being a statistics major balances the supply of faculty energy, as
per Parkinson's "The Short List, or Principles of Selection",
but fortunately no one listens to my inner economist.)
So about a dozen will do projects in 36-490, as last
year, and everyone will learn about methods.

- 36-402, Advanced Data Analysis, Spring 2011
*Description*: This course concentrates on methods for the analysis of data, building on the theory and application of the linear model from 36-401. Real-world examples will be drawn from a variety of fields.

*Prerequisites*: 36-401 (modern regression), or an equivalent class, with my permission.

*Topics*: Tentative, and grouped by theme; presentation order will vary.

- *Model evaluation*: statistical inference, prediction, and scientific inference; in-sample and out-of-sample errors, generalization and over-fitting, cross-validation; evaluating by simulating; bootstrap; information criteria and their limits; mis-specification checks
- *Yet More Regression*: regression = estimating the conditional expectation function; lightning review of ordinary least squares linear regression and what it is really doing; analysis of variance; limits of linear OLS; extensions: weighted least squares, basis functions; ridge regression and lasso
- *Smoothing*: kernel smoothing, including local polynomial regression; splines; additive models; classification and regression trees; kernel density estimation
- *GAMs*: linear classifiers; logistic regression; generalized linear models; generalized additive models
- *Latent variables and structured data*: principal components; factor analysis and latent variables; graphical models in general; latent cluster/mixture models; random effects; hierarchical models
- *Causality*: graphical causal models; estimating causal effects; discovering causal structure

*Time and place*: 10:30--11:50 Tuesdays and Thursdays in Porter Hall 100

*Textbook*: Julian Faraway, *Extending the Linear Model with R* (Chapman & Hall/CRC Press, 2006, ISBN 978-1-58488-424-8) will be **required**. (Faraway's page on the book, with help and errata.) There may be other **optional** books.

*Mechanics*: nearly-weekly problem sets (mostly analyzing data sets, a *little* programming) will be due on Tuesdays; mid-term exam; final exam.

*Computing*: You will be expected, and in some assignments required, to use the R programming language. All assignments will need a computer. Let me know at once if this will be a problem.

*Office hours*: Monday 2--4 pm in Baker Hall 229C, or by appointment.

**Update**, 15 November: The class webpage will be
here. Also: this is the
same class as 36-608; graduate students should register under the latter
number.

Posted at November 08, 2010 16:20