April 22, 2025

Course Announcement: "Statistical Principles of Generative AI" (Fall 2025)

Attention conservation notice: Notice of a fairly advanced course in a discipline you don't study at a university you don't attend. Combines trendy subject matter near the peak of its hype cycle with a stodgy, even resentful, focus on old ideas.

This fall will be the beginning of my 21st year at CMU. I should know better than to volunteer to do a new prep --- but I don't.

Special Topics in Statistics: Statistical Principles of Generative AI (36-473/36-673)
Description: Generative artificial intelligence systems are fundamentally statistical models of text and images. The systems are very new, but they rely on well-established ideas about modeling and inference, some of them more than a century old. This course will introduce students to the statistical underpinnings of large language models and image generation models, emphasizing high-level principles over implementation details. It will also examine controversies about generative AI, especially the "artificial general intelligence" versus "cultural technology" debate, in light of those statistical foundations.
Audience: Advanced undergraduates in statistics and closely related majors; MS students in statistics.
Pre-requisites: 36-402 for undergraduates taking 36-473. For MS students taking 36-673, preparation equivalent to 36-402 and all of its prerequisite courses in statistics and mathematics. A course in stochastic processes, such as 36-410, is helpful but not required.
Expectations: All students can expect to do math, a lot of reading (not just skimming of AI-generated summaries) and writing, and some small, desktop-or-laptop-scale programming exercises. Graduate students taking 36-673 will do small-group projects in the second half of the semester, which will require more elaborate programming and data analysis (and, probably, the use of departmental / university computing resources).
Time and place: Tuesdays and Thursdays, 2:00--3:20 pm, Baker Hall A36
Topical outline (tentative): Data compression and generative modeling; probability, likelihood, perplexity, and information. Large language models are high-order parametric Markov models fit by maximum likelihood. First-order Markov models and their dynamics. Estimating parametric Markov models by maximum likelihood and its asymptotics. Influence functions for Markov model estimation. Back-propagation for automatic differentiation. Stochastic gradient descent and other forms of stochastic approximation. Estimation and dynamics for higher-order Markov models. Variable-length Markov chains and order selection; parametric higher-order Markov chains. Prompting as conditioning. Influence functions and source attribution. Transformers; embedding discrete symbols into continuous vector spaces. Identification issues with embeddings; symmetries and optimization. "Attention", a.k.a. kernel smoothing. State-space modeling. Generative diffusion models. Diffusion as a stochastic (Markov) process; a small amount of stochastic calculus. Learning to undo diffusion. Mixture models. Generative diffusion models vs. kernel density estimation. Information-theoretic methods for diffusion density estimation. Combining models of text and images. Prompting as conditioning, again.
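
Purely as an illustration of the first few items on that outline (and not anything taken from the course itself), here is a minimal sketch in Python of estimating a first-order Markov chain by maximum likelihood and reading off its in-sample perplexity. The toy corpus and all the variable names are invented for the example.

# A minimal sketch (illustration only): fit a first-order Markov chain to a
# token sequence by maximum likelihood, then compute its in-sample perplexity.
from collections import Counter, defaultdict
import math

tokens = "the cat sat on the mat the cat ate".split()  # toy corpus, made up

# Maximum-likelihood estimate of the transition probabilities:
# p_hat(next | current) = count(current, next) / count(current, anything)
pair_counts = Counter(zip(tokens[:-1], tokens[1:]))
row_totals = Counter(tokens[:-1])
p_hat = defaultdict(dict)
for (cur, nxt), c in pair_counts.items():
    p_hat[cur][nxt] = c / row_totals[cur]

# Average log-likelihood per transition, and the corresponding perplexity.
# (Evaluated in-sample, so it is optimistic; the usual caveat applies.)
log_lik = sum(math.log(p_hat[cur][nxt]) for cur, nxt in zip(tokens[:-1], tokens[1:]))
n_transitions = len(tokens) - 1
perplexity = math.exp(-log_lik / n_transitions)
print(f"per-transition perplexity: {perplexity:.3f}")

Roughly speaking, the later items in the outline elaborate on this recipe in various directions (higher-order dependence, continuous state spaces and embeddings, different estimators and data), but the likelihood-based bookkeeping stays much the same.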

All of this, especially the topical outline, is subject to revision as we get closer to the semester actually starting.

Enigmas of Chance; Corrupting the Young

Posted at April 22, 2025 12:55
