Notebooks

Learning Theory (Formal, Computational or Statistical)

24 Dec 2008 10:34

I qualify it to distinguish this area from the broader field of machine learning, which includes much more with lower standards of proof, and from the theory of learning in organisms, which might be quite different.

The basic set-up is as follows. We have a bunch of inputs and outputs, and an unknown relationship between the two. We do have a class of hypotheses describing this relationship, and suppose one of them is correct. (The hypothesis class is always circumscribed, but may be infinite.) A learning algorithm takes in a set of inputs and outputs, its data, and produces a hypothesis. Generally we assume the data are generated by some random process, and the hypothesis changes as the data change. The key notion is that of a probaably approximately correct learning algorithm --- one where, if we supply enough data, we can get a hypothesis with an arbitrarily small error, with a probability arbitrarily close to one.

Generally, PAC-results concern either (1) the existence of a PAC algorithm, (2) quantifying how much data we need, in terms of either accuracy or reliability, or (3) devising new PAC algorithms with other desirable properties. What frustrates me about this literature, and the reason I don't devote more of my research to it (aside, of course, from my sheer incompetence) is that almost all of it assumes the data are statistically independent and identically distributed. Then PAC-like results follow essentially from extensions of the ordinary Law of Large Numbers. What's really needed, however, is something more like an ergodic theorem, for suitably-dependent data.

See also: Frequentist Consistency of Bayesian Procedures; Statistics; Statistics of Structured Data; Universal Prediction Algorithms


Notebooks:     Hosted, but not endorsed, by the Center for the Study of Complex Systems