36-402, Advanced Data Analysis, Spring 2011 (Course Announcement)
This is the undergraduate "advanced data analysis", not to be
confused with the graduate projects course
I'm teaching right now. Actually, they used to be much
more similar, but due to the uncanny growth of the undergraduate major, I will
have seventy or so students in 402, and all of them doing projects is more than
we can cope with. (My inner
economist says that the statistics department should leave the curriculum
alone and just keep raising the threshold for passing our classes until the
demand for being a statistics major balances the supply of faculty energy, as
per Parkinson's "The Short List, or Principles of Selection",
but fortunately no one listens to my inner economist.)
So about a dozen will do projects in 36-490, as last
year, and everyone will learn about methods.
- 36-402, Advanced Data Analysis, Spring 2011
- Description: This course concentrates on methods for the analysis
of data, building on the theory and application of the linear model from
36-401. Real-world examples will be drawn from a variety of fields.
- Prerequisites: 36-401 (modern regression), or an equivalent
class, with my permission.
- Topics: Tentative, and grouped by theme; presentation order will
vary
- Model evaluation: statistical inference, prediction, and
scientific inference; in-sample and out-of-sample errors, generalization and
over-fitting, cross-validation; evaluating by simulating; bootstrap;
information criteria and their limits; mis-specification checks
- Yet More Regression: regression = estimating the
conditional expectation function; lightning review of ordinary least squares
linear regression and what it is really doing; analysis of variance; limits of
linear OLS; extensions: weighted least squares, basis functions; ridge
regression and lasso.
- Smoothing: kernel smoothing, including local polynomial
regression; splines; additive models; classification and regression
trees; kernel density estimation
- GLMs and GAMs: linear classifiers; logistic regression; generalized
linear models; generalized additive models.
- Latent variables and structured data: principal
components; factor analysis and latent variables; graphical models in general;
latent cluster/mixture models; random effects; hierarchical models
- Causality: graphical causal models; estimating causal
effects; discovering causal structure
- Time and place: 10:30--11:50 Tuesdays and Thursdays in Porter Hall
100
- Textbook: Julian Faraway, Extending the Linear Model with R
(Chapman & Hall/CRC Press, 2006, ISBN 978-1-58488-424-8) will be required.
(Faraway's page on the book, with help and errata.) There may be other
optional books.
- Mechanics: nearly-weekly problem sets (mostly analyzing data sets,
a little programming) will be due on Tuesdays; mid-term exam; final
exam.
- Computing: You will be expected, and in some assignments required,
to use the R programming language. All assignments will need a computer. Let
me know at once if this will be a problem.
- Office hours: Monday 2--4 pm in Baker Hall 229C, or by
appointment.
Update, 15 November: The class webpage will be
here. Also: this is the
same class as 36-608; graduate students should register under the latter
number.
Corrupting the Young;
Enigmas of Chance
Posted at November 08, 2010 16:20