Course Announcement: "Statistical Principles of Generative AI" (Fall 2025)
Attention conservation notice: Notice of a fairly advanced course in a
discipline you don't study at a university you don't attend. Combines a trendy
subject matter near the peak of its hype cycle with a stodgy, even resentful,
focus on old ideas.
This fall will be the beginning of my 21st year at CMU. I should know
better than to volunteer to do a new prep --- but I don't.
- Special Topics in Statistics: Statistical Principles of Generative AI (36-473/36-673)
- Description: Generative artificial intelligence systems are
fundamentally statistical models of text and images. The systems are
very new, but they rely on well-established ideas about modeling and
inference, some of them more than a century old. This course will
introduce students to the statistical underpinnings of large language
models and image generation models, emphasizing high-level principles
over implementation details. It will also examine controversies about
generative AI, especially the "artificial general intelligence" versus
"cultural technology" debate, in light of those statistical
foundations.
- Audience: Advanced undergraduates in statistics and
closely related majors; MS students in statistics.
- Pre-requisites: 36-402 for undergraduates taking 36-473.
For MS students taking 36-673, preparation equivalent to 36-402 and all of its
prerequisite courses in statistics and mathematics. A course in stochastic
processes, such as 36-410, is helpful but not required.
- Expectations: All students can expect to do math, a lot of
reading (not just skimming of AI-generated summaries) and writing, and some
small, desktop-or-laptop-scale programming exercises. Graduate students taking
36-673 will do small-group projects in the second half of the semester, which
will require more elaborate programming and data analysis (and, probably, the
use of departmental / university computing resources).
- Time and place: Tuesdays and Thursdays, 2:00--3:20 pm, Baker Hall A36
- Topical outline (tentative): Data compression and
generative modeling; probability, likelihood, perplexity, and information.
Large language models are high-order parametric Markov models fit by maximum
likelihood. First-order Markov models and their dynamics. Estimating
parametric Markov models by maximum likelihood and its asymptotics. Influence
functions for Markov model estimation. Back-propagation for automatic
differentiation. Stochastic gradient descent and other forms of stochastic
approximation. Estimation and dynamics for higher-order Markov models.
Variable-length Markov chains and order selection; parametric higher-order
Markov chains. Prompting as conditioning. Influence functions and source
attribution. Transformers; embedding discrete symbols into continuous vector
spaces. Identification issues with embeddings; symmetries and optimization.
"Attention", a.k.a. kernel smoothing. State-space modeling. Generative
diffusion models. Diffusion as a stochastic (Markov) process; a small amount
of stochastic calculus. Learning to undo diffusion. Mixture models.
Generative diffusion models vs. kernel density estimation.
Information-theoretic methods for diffusion density estimation. Combining
models of text and images. Prompting as conditioning, again. (A few toy code
sketches illustrating some of these ideas follow the outline.)
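
As a taste of the likelihood-and-perplexity thread: perplexity is the
exponential of the average negative log-likelihood per symbol, so a model with
perplexity 20 is, on average, as uncertain as if it were choosing uniformly
among 20 symbols. A minimal sketch, with made-up per-symbol probabilities
standing in for a real model's predictions:

```python
import numpy as np

# Toy "model": the conditional probabilities a model assigns to each observed
# symbol given its context. In a real language model these would come from the
# model's predictive distribution; here they are invented numbers.
per_symbol_probs = np.array([0.2, 0.05, 0.5, 0.1, 0.3])

# Average negative log-likelihood per symbol (cross-entropy, in nats)
cross_entropy = -np.mean(np.log(per_symbol_probs))

# Perplexity = exp(cross-entropy): the effective number of equally likely
# choices the model is hesitating between at each step.
perplexity = np.exp(cross_entropy)
print(f"cross-entropy = {cross_entropy:.3f} nats, perplexity = {perplexity:.2f}")
```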
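
For the Markov-chain thread: the maximum likelihood estimate of a first-order
Markov chain's transition matrix is just the matrix of observed transition
counts, normalized row by row. A toy sketch (the symbol sequence is invented
for illustration):

```python
import numpy as np

# Toy observed sequence over a 3-symbol alphabet; purely illustrative data.
alphabet = ["a", "b", "c"]
idx = {s: i for i, s in enumerate(alphabet)}
sequence = "abacbbacacababbc"

# Count observed transitions x_t -> x_{t+1}
counts = np.zeros((len(alphabet), len(alphabet)))
for prev, nxt in zip(sequence[:-1], sequence[1:]):
    counts[idx[prev], idx[nxt]] += 1

# Maximum likelihood estimate: normalize each row of the count matrix
mle_transition = counts / counts.sum(axis=1, keepdims=True)
print(mle_transition)
```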
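
For the stochastic-approximation thread: stochastic gradient descent updates
the parameters using the gradient of the loss at one randomly chosen
observation at a time, rather than over the full data set. A toy sketch,
estimating a single mean parameter by least squares on invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: noisy observations around an unknown mean (here, 3.0)
data = 3.0 + rng.normal(size=1000)

theta = 0.0           # initial guess
learning_rate = 0.01
for step in range(5000):
    x = data[rng.integers(len(data))]   # one randomly chosen observation
    grad = 2 * (theta - x)              # gradient of the loss (theta - x)^2
    theta -= learning_rate * grad       # stochastic gradient step
print(theta)  # should wander to a value near the sample mean of the data
```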
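
The outline's claim that "attention" is kernel smoothing can be seen directly:
softmax attention is a Nadaraya-Watson-style weighted average of value
vectors, with an exponential kernel in the query-key inner products. A sketch
with made-up query, key, and value arrays:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                  # dimension of queries/keys/values
keys = rng.normal(size=(10, d))        # 10 made-up key vectors
values = rng.normal(size=(10, d))      # corresponding value vectors
query = rng.normal(size=d)             # a single query vector

# Kernel weights: exponential kernel in the query-key inner product,
# normalized to sum to one (i.e., a softmax).
scores = keys @ query / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum()

# "Attention" output = kernel-weighted (Nadaraya-Watson) average of the values
output = weights @ values
print(output)
```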
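
Finally, on the diffusion-vs.-kernel-density-estimation comparison: in the
simplest ("variance-exploding") form of forward diffusion, one step just adds
Gaussian noise to the data, and the marginal density of the noised data is
then exactly a Gaussian mixture centered at the data points, i.e., a kernel
density estimate with bandwidth equal to the noise scale. A one-dimensional
toy sketch on invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=200)            # toy 1-D data set
sigma = 0.5                            # noise scale for one diffusion step

# Forward diffusion (variance-exploding form): add Gaussian noise to each point
noised = data + sigma * rng.normal(size=data.shape)

# The density of the noised data is a Gaussian mixture centered at the data
# points, i.e., a kernel density estimate with bandwidth sigma.
def kde_density(x, centers, bandwidth):
    z = (x - centers) / bandwidth
    return np.mean(np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi)))

print(kde_density(0.0, data, sigma))
```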
All of this, especially the topical outline, is subject to revision as we
get closer to the semester actually starting.
Enigmas of Chance;
Corrupting the Young
Posted at April 22, 2025 12:55 | permanent link