Networks, Crowds, and More Networks (This Week at the Statistics and Machine Learning Seminars)
Attention conservation notice: Only of interest if you (1) care about statistical models of networks or collective information-processing, and (2) will be in Pittsburgh this week.
I am behind in posting my talk announcements:
- Andrew Thomas, "Marginal-Additive Models and Processes for Network-Correlated Outcomes"
- Abstract: A key promise of social networks is the ability to
detect and model the correlation of personal attributes along the structure of
the network, in either static or dynamic settings. The basis for most of these
models, the Markov Random Field on a lattice, has several assumptions that may
not be reflected in real network data, namely the assumptions that the process
is stationary on the lattice, and that the ties in the model are correctly
specified. Additionally, it is less than clear how correlation over longer
distances on networks can be adequately specified under the lattice mechanism,
given the assumption of a stationary process at work.
- Based on concepts from generalized additive models and
spatial/geostatistical methods, I introduce a class of models that is more
robust to the failure of these assumptions, more flexible to different
definitions of network distance, and more generally applicable to large-scale
studies of network phenomena. I apply this method to outcomes from two
large-scale social network studies to demonstrate its use and versatility.
- Time and place: 4--5 pm on Monday, 26 March 2012, in Scaife Hall 125
- Sewoong Oh, "Learning from the Wisdom of the Crowd: Efficient Algorithms and Fundamental Limits"
- Abstract: This talk is on designing extremely efficient and
provably order-optimal algorithms to extract meaningful information from
societal data, the kind of data that comes from crowdsourcing platforms like
Amazon Mechanical Turk, or recommendation systems like the Netflix Challenge
dataset. Crowdsourcing systems, like Amazon Mechanical Turk, provide platforms
where large-scale projects are broken into small tasks that are electronically
distributed to numerous on-demand contributors. Because these low-paid workers
can be unreliable, we need to devise schemes to increase confidence in our
answers, typically by assigning each task multiple times and combining the
answers in some way. I will present the first rigorous treatment of this
problem, and provide both an optimal task assignment scheme (using a random
graph) and an optimal inference algorithm (based on low-rank matrix
approximation and belief propagation) for that task assignment. This approach
significantly outperforms previous approaches and, in fact, is asymptotically
order-optimal, which is established through comparisons to an oracle
estimator. Another important problem in learning from the wisdom of the crowd
is how to make product recommendations based on past user ratings. A common and
effective way to model these user ratings datasets is to use low-rank
matrices. In order to make recommendations, we need to predict the unknown
entries of a ratings matrix. A natural approach is to find a low-rank matrix
that best explains the observed entries. Motivated by this recommendation
problem, my approach is to provide a general framework for recovering a
low-rank matrix from partial observations. I will introduce a novel, efficient
and provably order-optimal algorithm for this matrix completion problem. The
optimality of this algorithm is established through a comparison to a minimax
lower bound on what the best algorithm can do.
- Time and place: 10--11 am on Wednesday, 28 March 2012, in Gates Hall 6115
- Lise Getoor, "Collective Graph Identification"
- Abstract: The importance of network analysis is growing across
many domains, and is fundamental in understanding online social interactions,
biological processes, communication, ecological, financial, transportation
networks, and more. In most of these domains, the networks of interest are not
directly observed, but must be inferred from noisy and incomplete data, data
that was often generated for purposes other than scientific analysis. In this
talk, I will introduce the problem of graph identification, the process of
inferring the hidden network from noisy observational data. I will describe
some of the component steps involved, and then I will describe a collective
approach to graph identification, which interleaves the necessary steps in the
accurate reconstruction of the network. Time permitting, I will also survey
some of the work in my group on probabilistic databases, privacy, visual
analytics, and active learning.
- Time and place: 4:30--5:30 pm on Thursday, 29 March 2012, in Gates Hall 6115
As always, all talks are free and open to the public.
Enigmas of Chance;
The Collective Use and Evolution of Concepts
Posted at March 26, 2012 10:00 | permanent link