September 24, 2011

Next Week at the Statistics Seminar; Week After Next at the Machine Learning Seminar

There's not much connection between the talks, other than that they should both be great, and I don't feel like writing two posts.

Ronald Coifman, "Analytic Organization of Observational Databases as a Tool for Learning and Inference"
Abstract: We describe a mathematical framework to learn and organize databases without incorporation of expert information. The database could be a matrix of a linear transformation for which the goal is to reorganize the matrix so as to achieve compression and fast algorithms. Or the database could be a collection of documents and their vocabulary, an array of sensor measurements such as EEG, or a financial time series or segments of recorded music. If we view the database as a questionnaire, we organize the population into a contextual demographic diffusion geometry and the questions into a conceptual geometry; this is an iterative process in which each organization informs the other, with the goal of entropy reduction of the whole database.
This organization, being totally data agnostic, applies to the other examples, thereby automatically generating a data-driven conceptual/contextual pairing. We will describe the basic underlying tools from Harmonic Analysis for measuring success in extracting structure, tools which enable functional regression prediction and basically signal processing methodologies.
Time and Place: 4:30--5:30 pm on Monday, 26 September 2011 in Baker Hall, Giant Eagle Auditorium (A51)

Alex Smola, "Scaling Machine Learning to the Internet"
Abstract: In this talk I will give an overview of an array of highly scalable techniques for both observed and latent variable models. This makes them well suited for problems such as classification, recommendation systems, topic modeling and user profiling. I will present algorithms for batch and online distributed convex optimization to deal with large amounts of data, and hashing to address the issue of parameter storage for personalization and collaborative filtering. Furthermore, to deal with latent variable models I will discuss distributed sampling algorithms capable of dealing with tens of billions of latent variables on a cluster of 1000 machines.
The algorithms described are used for personalization, spam filtering, recommendation, document analysis, and advertising.
Time and Place: 3--4 pm on Thursday, 6 October 2011 in Gates Hall 8102

As always, both talks are free and open to the public.

