March 30, 2011

Principal Components Analysis (Advanced Data Analysis from an Elementary Point of View)

Principal components: the simplest, oldest and most robust of dimensionality-reduction techniques. PCA works by finding the line (plane, hyperplane) which passes closest, on average, to all of the data points. This is equivalent to maximizing the variance of the coordinates of the projections onto the line/plane/hyperplane. Actually finding those principal components reduces to finding the eigenvalues and eigenvectors of the sample covariance matrix. Why PCA is a data-analytic technique, and not a form of statistical inference. An example with cars. PCA with words: "latent semantic analysis"; an example with real newspaper articles. Visualization with PCA and multidimensional scaling. Cautions about PCA; the perils of reification; illustration with genetic maps.
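
The equivalence between "closest line" and "maximum variance", and the reduction to an eigenproblem, can be sketched in a few lines for the one-dimensional case. The notation is an assumption made for illustration, not taken from the handout: the data vectors $\vec{x}_i$, $i = 1, \dots, n$, are taken to be centered (mean zero), $\vec{w}$ is a unit vector giving the direction of the line, $\mathbf{V}$ is the sample covariance matrix, and $\lambda$ is a Lagrange multiplier enforcing $\vec{w}^T\vec{w} = 1$.

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} \left\| \vec{x}_i - (\vec{w}\cdot\vec{x}_i)\,\vec{w} \right\|^2
  &= \frac{1}{n}\sum_{i=1}^{n} \|\vec{x}_i\|^2
   - \frac{1}{n}\sum_{i=1}^{n} (\vec{w}\cdot\vec{x}_i)^2 \\
\frac{1}{n}\sum_{i=1}^{n} (\vec{w}\cdot\vec{x}_i)^2
  &= \vec{w}^T \mathbf{V}\, \vec{w},
  \qquad \mathbf{V} = \frac{1}{n}\sum_{i=1}^{n} \vec{x}_i \vec{x}_i^T \\
\frac{\partial}{\partial \vec{w}}
  \left[ \vec{w}^T \mathbf{V}\, \vec{w} - \lambda\left(\vec{w}^T\vec{w} - 1\right) \right] = 0
  &\implies \mathbf{V}\,\vec{w} = \lambda\,\vec{w}
\end{align*}
```

The first term on the right of the first line does not depend on $\vec{w}$, so minimizing the average squared distance to the line is the same as maximizing the second term, which (the data being centered) is the variance of the projections. At any solution of the last line that variance equals $\vec{w}^T \mathbf{V}\, \vec{w} = \lambda$, so the first principal component is the eigenvector with the largest eigenvalue. Using $1/(n-1)$ instead of $1/n$ in $\mathbf{V}$ rescales the eigenvalues but leaves the eigenvectors unchanged.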

PDF handout, pca.R for examples, cars data set, R workspace for the New York Times examples


Posted at March 30, 2011 23:05

Three-Toed Sloth