March 26, 2012

Networks, Crowds, and More Networks (This Week at the Statistics and Machine Learning Seminars)

Attention conservation notice: Only of interest if you (1) care about statistical models of networks or collective information-processing, and (2) will be in Pittsburgh this week.

I am behind in posting my talk announcements:

Andrew Thomas, "Marginal-Additive Models and Processes for Network-Correlated Outcomes"
Abstract: A key promise of social networks is the ability to detect and model the correlation of personal attributes along the structure of the network, in either static or dynamic settings. The basis for most of these models, the Markov Random Field on a lattice, has several assumptions that may not be reflected in real network data, namely the assumptions that the process is stationary on the lattice, and that the ties in the model are correctly specified. Additionally, it is less than clear how correlation over longer distances on networks can be adequately specified under the lattice mechanism, given the assumption of a stationary process at work.
Based on concepts from generalized additive models and spatial/geostatistical methods, I introduce a class of models that is more robust to the failure of these assumptions, more flexible to different definitions of network distance, and more generally applicable to large-scale studies of network phenomena. I apply this method to outcomes from two large-scale social network studies to demonstrate its use and versatility.
Time and place: 4--5 pm on Monday, 26 March 2012, in Scaife Hall 125
Sewoong Oh, "Learning from the Wisdom of the Crowd: Efficient Algorithms and Fundamental Limits"
Abstract: This talk is on designing extremely efficient and provably order-optimal algorithms to extract meaningful information from societal data, the kind of data that comes from crowdsourcing platforms like Amazon Mechanical Turk, or recommendation systems like the Netflix Challenge dataset. Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms where large-scale projects are broken into small tasks that are electronically distributed to numerous on-demand contributors. Because these low-paid workers can be unreliable, we need to devise schemes to increase confidence in our answers, typically by assigning each task multiple times and combining the answers in some way. I will present the first rigorous treatment of this problem, and provide both an optimal task assignment scheme (using a random graph) and an optimal inference algorithm (based on low-rank matrix approximation and belief propagation) for that task assignment. This approach significantly outperforms previous approaches and, in fact, is asymptotically order-optimal, which is established through comparisons to an oracle estimator. Another important problem in learning from the wisdom of the crowd is how to make product recommendations based on past user ratings. A common and effective way to model these user ratings datasets is to use low-rank matrices. In order to make recommendations, we need to predict the unknown entries of a ratings matrix. A natural approach is to find a low-rank matrix that best explains the observed entries. Motivated by this recommendation problem, my approach is to provide a general framework for recovering a low-rank matrix from partial observations. I will introduce a novel, efficient and provably order-optimal algorithm for this matrix completion problem. The optimality of this algorithm is established through a comparison to a minimax lower bound on what the best algorithm can do.
Time and place: 10--11 am on Wednesday, 28 March 2012, in Gates Hall 6115
Lise Getoor, "Collective Graph Identification"
Abstract: The importance of network analysis is growing across many domains, and is fundamental in understanding online social interactions, biological processes, communication, ecological, financial, transportation networks, and more. In most of these domains, the networks of interest are not directly observed, but must be inferred from noisy and incomplete data, data that was often generated for purposes other than scientific analysis. In this talk, I will introduce the problem of graph identification, the process of inferring the hidden network from noisy observational data. I will describe some of the component steps involved, and then I will describe a collective approach to graph identification, which interleaves the necessary steps in the accurate reconstruction of the network. Time permitting, I will also survey some of the work in my group on probabilistic databases, privacy, visual analytics, and active learning.
Time and place: 4:30--5:30 pm on Thursday, 29 March 2012, in Gates Hall 6115
As always, all talks are free and open to the public.

Enigmas of Chance; Networks; The Collective Use and Evolution of Concepts

Posted at March 26, 2012 10:00 | permanent link

Three-Toed Sloth