Since class begins Monday, this is a good time for the public website to make its appearance. As before, lecture notes will also be posted here; you can use the RSS feed for this entry to keep track of them.

- Introduction to the course (25 August)
- Information retrieval and similarity searching (25 August)
- Multidimensional scaling and a first glance at classification (27 August)
- A little about PageRank (29 August)

Homework #1, due 8 September: assignment, R, newsgroups.tgz data file; solutions

- Image search, abstraction and invariance; the accompanying slides (8 September)
- Finding informative features (10 September)

Additional reading: David Feldman, "Introduction to Information Theory", chapter 1

- Information and interaction among features (12 September)

Additional reading: Aleks Jakulin and Ivan Bratko, "Quantifying and Visualizing Attribute Interactions", arxiv:cs.AI/0308002

Homework #2, due 22 September: assignment; solutions; solutions code

Note on information theory, its axiomatic foundations, and its connections to statistics, elaborating on some points raised in lecture (12 September)

- Categorization: types of categorization, basic classifiers and finding simple clusters in data (15 September)
- Hierarchical clustering; how many clusters? (17 September)
- Yet more clustering (19 September; slides)
- Making better features: transformations, principal components (22 September)
- Mathematics of principal components analysis; interpretations and limitations of PCA (24 September)
- Yet more on linear dimensionality reduction: PCA + information retrieval = latent semantic indexing. Factor analysis: motivations, historical roots, preliminaries to estimation (26 September)

Optional reading: Deerwester et al., "Indexing by Latent Semantic Analysis" [PDF]

Optional reading: Landauer and Dumais, "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge" [PDF]

Optional reading: Thurstone, "The Vectors of Mind"

Homework #3, due 3 October: assignment

- More on factor analysis: estimation and the rotation problem (29 September)
- Principal Components versus Factor Analysis: worked examples, basic goodness-of-fit testing for factor analysis; R code for lecture (1 October)
- The truth about principal components and factor analysis: strengths, limitations, factor models as graphical models, factor models and mixture models, Thomson's sampling model; R code for Thomson's model (3 October)

Homework #4, due Friday, 10 October: assignment, `nci.kmeans`, `nci.pca2.kmeans`

- Regression: predicting quantitative features: point prediction; expectations and mean-square optimality; regression functions; regression as smoothing; linear regression as linear smoothing; other kinds of linear smoothers; nearest-neighbor regression; kernel regression. R code for figures, data for running example (6 October)
- The truth about linear regression: optimal linear prediction; shifting distributions and omitted variables; rights and obligations of probabilistic assumptions; abuses of linear regression; how to hurt angels (8 October)
- Extending linear regression: weighted least-squares, heteroskedasticity, local linear regression. R code for figures, data for running example (10 October)
- Mid-term review (13 October; no hand-out)
- Mid-term: exam, solutions (15 October)
- Evaluating predictive models: in-sample and generalization error; over-fitting and under-fitting; model selection, capacity control, cross-validation. R code for figures. (20 October)
- Using cross-validation: mechanics and examples (22 October; notes forthcoming)
- Using non-parametric smoothing: adaptive smoothing, testing parametric forms (24 October; notes forthcoming)

Homework #5, due Friday, 31 October: assignment; solutions

- Prediction trees 1: mostly regression trees, plus a "classification tree we can believe in" (27 October)
- Prediction trees 2: classification trees (29 October and 3 November)
- Bootstrapping, Bagging, and Random Forests (5 November)
- Combining Predictive Models and the Power of Diversity (7 November)
- Linear Classifiers and the Perceptron Algorithm (10 November)
- Logistic Regression and Newton's Method (12 November)

Homework #7, due Friday, 21 November: assignment; solutions

- Neural Networks: The Mathematical Reality (14 November)
- Neural Networks: The Biological Myth (17 November)
- Support Vector Machines (19 November)
- Support vector machines continued (21 November; same handout as previous)

Homework #8, due Monday, 1 December: assignment; solutions

- The Lecture Full of Fail: the wrong data, lying data, covariate shift, low base-rates and overwhelming false positives, response; waste, fraud and abuse (24 November)

Homework #9, due 15 December: assignment; solutions

Posted at December 28, 2008 10:49