Adding noise to PCA to get a statistical model. The factor analysis model, or linear regression with unobserved independent variables. Assumptions of the factor analysis model. Implications of the model: observable variables are correlated only through shared factors; "tetrad equations" for one-factor models, more general correlation patterns for multiple factors. (Our first look at latent variables and conditional independence.) Geometrically, the factor model says the data have a Gaussian distribution on some low-dimensional plane, plus noise moving them off the plane; and that is all. Estimation by heroic linear algebra; estimation by maximum likelihood. The rotation problem, and why it is unwise to reify factors. Other models that produce the same correlation patterns as factor models; in particular the Thomson sampling model, in which the appearance of factors arises from not knowing what the real variables are or how to measure them.
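A minimal sketch of two of the points above, in Python with NumPy rather than the R of the course files, and not taken from lecture-18.R. The function names, loadings, and rotation angle are all made up for illustration. It assumes uncorrelated, unit-variance factors and uncorrelated noise, so that the implied covariance of the observables is the loading matrix times its transpose, plus a diagonal noise matrix. Working with the implied (population) covariance avoids any sampling noise.

```python
import numpy as np


def implied_covariance(loadings, noise_var):
    """Covariance of the observables implied by a factor model.

    loadings:  (p, q) array, one row per observable, one column per factor.
    noise_var: length-p array of noise variances.
    Assumes uncorrelated, unit-variance factors and uncorrelated noise.
    """
    loadings = np.asarray(loadings, dtype=float)
    return loadings @ loadings.T + np.diag(noise_var)


def tetrad_differences(cov):
    """The three tetrad differences among the first four observables.

    All three vanish when a single factor accounts for the correlations.
    """
    c = np.asarray(cov, dtype=float)
    return np.array([
        c[0, 1] * c[2, 3] - c[0, 2] * c[1, 3],
        c[0, 1] * c[2, 3] - c[0, 3] * c[1, 2],
        c[0, 2] * c[1, 3] - c[0, 3] * c[1, 2],
    ])


# One factor, four observables; noise variances chosen so each
# observable has unit variance (illustrative numbers only).
w1 = np.array([[0.9], [0.7], [0.5], [0.3]])
one_factor_cov = implied_covariance(w1, 1.0 - (w1 ** 2).sum(axis=1))

# Two factors: the tetrad equations need not hold any more.
w2 = np.array([[0.8, 0.1], [0.7, 0.2], [0.2, 0.6], [0.1, 0.7]])
psi2 = 1.0 - (w2 ** 2).sum(axis=1)
two_factor_cov = implied_covariance(w2, psi2)

# Rotation problem: rotate the factor axes by an arbitrary angle.
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
rotated_cov = implied_covariance(w2 @ rotation, psi2)

print("one-factor tetrad differences:", tetrad_differences(one_factor_cov))
print("two-factor tetrad differences:", tetrad_differences(two_factor_cov))
print("covariance unchanged by rotation:",
      np.allclose(two_factor_cov, rotated_cov))
```

Since the rotated and unrotated loadings imply exactly the same covariance matrix, no amount of data on the observables can tell them apart, which is the rotation problem in miniature.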
PDF handout; lecture-18.R computational examples you should step through (not done in class); correlates of sleep in mammals data set for those examples; thomson-model.R
Update, 9 April: A correspondent points me to this tweet, in what I can only call a "let's you and him fight" spirit. While the implicit charge against me by Adams is not without some justice, if you don't want this to happen, you really shouldn't brag about how many beauty pageants your child has won, or for that matter dress the poor beast in such funny clothes.
Posted at March 30, 2011 23:06