January 26, 2013

Lecture: Model Evaluation, Error and Inference (Advanced Data Analysis from an Elementary Point of View)

Lecture 3, Model evaluation: error and inference. Statistical models have three main uses: as ways of summarizing (reducing, compressing) the data; as scientific models, facilitating actual scientific inference; and as predictors. Both summarizing and scientific inference are linked to prediction (though in different ways), so we'll focus on prediction. In particular, for now we focus on the expected error of prediction, under some particular measure of error. The distinction between in-sample error and generalization error, and why the former is almost invariably optimistic about the latter. Over-fitting. Examples of just how spectacularly one can over-fit really very harmless data. A brief sketch of the ideas of learning theory and capacity control. Data-set-splitting as a first attempt at practically controlling over-fitting. Cross-validation for estimating generalization error and for model selection. Justifying model-based inferences.
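
To make the in-sample versus generalization distinction concrete, here is a minimal sketch in Python (not the R code that accompanies the notes). Everything in it is an assumption made for illustration: the simulated linear data with Gaussian noise, squared error as the loss, polynomials as the model family, five folds, and the helper names `fit_poly`, `mse` and `cv_mse`. It fits polynomials of increasing degree to data whose true regression function is a straight line, and compares in-sample error with a k-fold cross-validation estimate of generalization error.

```python
import numpy as np

def fit_poly(x, y, degree):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    return np.polyfit(x, y, degree)

def mse(coefs, x, y):
    """Mean squared error of the fitted polynomial on the points (x, y)."""
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

def cv_mse(x, y, degree, k=5, seed=0):
    """k-fold cross-validated MSE: fit on k-1 folds, score on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    errs = []
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False  # hold this fold out of the fit
        coefs = fit_poly(x[train], y[train], degree)
        errs.append(mse(coefs, x[test_idx], y[test_idx]))
    return float(np.mean(errs))

# Simulated, deliberately harmless data: a straight line plus Gaussian noise.
rng = np.random.default_rng(42)
n = 30
x = rng.uniform(-2, 2, size=n)
y = 7 + 0.5 * x + rng.normal(0, 1, size=n)

degrees = range(0, 10)
in_sample = [mse(fit_poly(x, y, d), x, y) for d in degrees]
cv = [cv_mse(x, y, d) for d in degrees]

for d, e_in, e_cv in zip(degrees, in_sample, cv):
    print(f"degree {d}: in-sample MSE {e_in:.3f}, 5-fold CV MSE {e_cv:.3f}")

# Model selection: pick the degree with the smallest cross-validated error.
best = min(degrees, key=lambda d: cv[d])
print("degree selected by cross-validation:", best)
```

Because the polynomial families are nested, in-sample error can only go down as the degree goes up, so it would always favor the most flexible model. The cross-validated error carries no such guarantee, which is what makes it usable for model selection.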

Reading: Notes, chapter 3 (R)
Cox and Donnelly, ch. 6

Posted at January 26, 2013 21:36

Three-Toed Sloth