Data Mining (36-350) Lecture Notes, Weeks 1--3

These handouts are ~~shamelessly ripped off~~ derivative work, amplifying and expanding those created by Tom Minka when he invented this course. (See his originals here.) Posted here in response to a number (> 1) of requests.

Lecture 5 is also a ~~shameless rip-off~~ explication of Aleks Jakulin's "Quantifying and Visualizing Attribute Interactions" (cs.AI/0308002).

Note to students in 36-350: This page will not be kept up to date with the handouts or with other course documents; use Blackboard!

1. Searching Documents by Similarity (28 August 2006). Why similarity search? Defining similarity and distance. The bag-of-words representation. Normalizations. Some results.
2. More on Similarity Search (30 August 2006). Stemming, linguistic issues. Picking out good features, or at least ignoring non-discriminative ones. Inverse document frequency. Using feedback from the searcher.
3. Searching Images by Similarity (6 September 2006). Representation and abstraction. How to search images without looking at images; a failure-mode. The bag-of-colors representation. More examples. Invariance and representation. See also: slides illustrating this lecture.
4. Finding Informative Features (11--13 September 2006). More on finding good features. Entropy and uncertainty. Information and entropy. Ranking features by informativeness. Examples.
5. Interactions Among Features (18 September 2006). Redundancy and enhancement of information. Information-sharing graphs. Examples.

Corrupting the Young; Enigmas of Chance

Posted at September 16, 2006 12:56

Three-Toed Sloth
