Data Mining (36-350) Lecture Notes, Weeks 1--3
These handouts are shamelessly ripped off from (or, more politely, derivative
work amplifying and expanding) those created by Tom Minka when he invented
this course. (See his originals here.) Posted here in response to a number
(> 1) of requests.
Lecture 5 is also a shameless rip-off (or, more politely, an explication) of
Aleks Jakulin's "Quantifying and Visualizing Attribute Interactions"
(cs.AI/0308002).
Note to students in 36-350: This page will not keep up to date with
the handouts, or with other course documents; use Blackboard!
- Searching Documents by Similarity (28 August 2006). Why similarity search?
Defining similarity and distance. The bag-of-words representation.
Normalizations. Some results.
- More on Similarity Search (30 August 2006). Stemming, linguistic issues.
Picking out good features, or at least ignoring non-discriminative ones.
Inverse document frequency. Using feedback from the searcher.
- Searching Images by Similarity (6 September 2006). Representation and
abstraction. How to search images without looking at images; a failure-mode.
The bag-of-colors representation. More examples. Invariance and
representation. See also: slides illustrating this lecture.
- Finding Informative Features (11--13 September 2006). More on finding good
features. Entropy and uncertainty. Information and entropy. Ranking features
by informativeness. Examples.
- Interactions Among Features (18 September 2006). Redundancy and enhancement
of information. Information-sharing graphs. Examples.
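
For readers who want a concrete picture of what the first two lectures cover, here is a minimal sketch of similarity search with the bag-of-words representation, inverse document frequency weighting, and cosine similarity. It is not taken from the handouts: the function names, the whitespace tokenizer, and the particular IDF formula, log(N / document frequency), are all assumptions made for illustration, and the handouts may normalize or weight differently.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def idf_weights(bags):
    """Assumed IDF form: log(N / number of documents containing the word)."""
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts a word once
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def weighted(bag, idf):
    """Scale raw word counts by their IDF weight."""
    return {word: count * idf.get(word, 0.0) for word, count in bag.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query_index, docs):
    """Index of the document most similar to docs[query_index]."""
    bags = [bag_of_words(d) for d in docs]
    idf = idf_weights(bags)
    vectors = [weighted(b, idf) for b in bags]
    scores = [
        (cosine_similarity(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return max(scores)[1]


# Toy corpus, invented for this example.
docs = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "stocks fell on the exchange",
]
```

Calling `most_similar(0, docs)` on this toy corpus picks out the other document about the cat. A word that appears in every document, such as "the" here, gets an IDF weight of zero, which is one way of ignoring non-discriminative features.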
Corrupting the Young; Enigmas of Chance
Posted at September 16, 2006 12:56