August 17, 2009

Course Announcement: 36-350, Data Mining, Fall 2009

Title: 36-350, Statistical Data Mining
Prereqs: One of 36-226, 36-310, 36-625, or consent of instructor. In addition, familiarity with vectors and matrices, and comfort with programming, will be very helpful.
Lectures: MWF 10:30--11:20, Porter Hall 226B. (The on-line class schedule thinks the Friday lecture is a lab; it's wrong.)
Course description:
Data mining is the art of extracting useful patterns from large bodies of data, finding seams of actionable knowledge in the raw ore of information. The rapid growth of computerized data, and the computer power available to analyze it, creates great opportunities for data mining in business, medicine, science, government, etc. The aim of this course is to help you take advantage of these opportunities in a responsible way. After taking the class, when you're faced with a new problem, you should be able to (1) select appropriate methods, and justify their choice, (2) use and program statistical software (i.e., R) to implement them, and (3) critically evaluate the results and communicate them to colleagues in business, science, etc.
Data mining is related to statistics and to machine learning, but has its own aims and scope. Statistics is a mathematical science, studying how reliable inferences can be drawn from imperfect data. Machine learning is a branch of engineering, developing a technology of automated induction. We will freely use tools from statistics and from machine learning, but we will use them as tools, not things to study in their own right. We will do a lot of calculations, but will not prove many theorems, and we will do even more experiments than calculations.

The current topic outline, the grading policy, etc., can all be found on the class webpage. This year's class will be very similar to the 2008 iteration, since that seemed to work, with some modifications in light of that experience. Podcast lectures are probably not going to happen, owing to technical incompetence on my part.

