May 14, 2003

A Fat Young Man without a Good Word for Any Regression

DSquared has a fun post about data-mining, econometrics, and why most statistical studies on controversial questions are apt to be worthless (permalinks bloggered, "Log Books" on 13 May '03). You're all but guaranteed to get a good fit to your data if you tweak your model enough, but then that fit is also all but guaranteed to be meaningless. (Daniel Davies and Myron Scholes appear to think alike on many subjects.) You might then wonder why anyone with self-respect would say they do data-mining, much less very sharp computer scientists and statisticians. The answer is that by "data-mining", they mean "statistical learning", and their name for what Davies mocks is "snooping", "hunting", "dredging" or "acting like a social scientist".
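Here is a toy illustration of the "tweak until it fits" point. It is my own sketch, not anything from DSquared's post. Assume a response and 500 candidate predictors that are all independent Gaussian noise, with 30 observations each; those numbers, and the helper names `pearson` and `dredge`, are arbitrary and hypothetical. Hunting for whichever predictor correlates best with the response typically turns up an impressive in-sample correlation, which then evaporates when the same predictor is checked against a fresh sample.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def dredge(n_obs=30, n_candidates=500, seed=0):
    """Hunt through pure-noise predictors for the one that best 'explains' y.

    Returns (in-sample r, out-of-sample r) for the winning predictor.
    Every variable here is independent noise, so the true correlation is 0.
    """
    rng = random.Random(seed)
    y_train = [rng.gauss(0, 1) for _ in range(n_obs)]
    y_test = [rng.gauss(0, 1) for _ in range(n_obs)]  # fresh draw of the same response
    best_train, best_test = 0.0, 0.0
    for _ in range(n_candidates):
        x_train = [rng.gauss(0, 1) for _ in range(n_obs)]
        x_test = [rng.gauss(0, 1) for _ in range(n_obs)]  # same predictor, new sample
        r_train = pearson(x_train, y_train)
        # Keep whichever candidate fits the training sample best: the "snooping" step.
        if abs(r_train) > abs(best_train):
            best_train, best_test = r_train, pearson(x_test, y_test)
    return best_train, best_test


if __name__ == "__main__":
    r_in, r_out = dredge()
    print(f"best in-sample r = {r_in:+.2f}, same predictor on fresh data r = {r_out:+.2f}")
```

Raising `n_candidates` makes the in-sample winner look better still, while its correlation on fresh data stays scattered around zero, since there was never anything there to find.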

Sadly, I don't think I can steal any of DSquared's lines for the section on data-mining in my methods chapter, but it does make me feel better about spending five pages on just why over-fitting is a Really Bad Idea. (And I need to look up this LSE Econometrics and PcGets stuff.)

Enigmas of Chance

