"Analyzing large-scale data: Taxi Tipping behavior in NYC" (This Week at the Statistics Seminar)
Attention conservation notice: Only of interest if you (1) care about large-scale data analysis and/or taxis, and (2) will be in Pittsburgh on Friday.
The last but by no means least seminar talk this week:
- Taylor Arnold, "Analyzing large-scale data: Taxi Tipping behavior in NYC"
- Abstract: Statisticians are increasingly tasked with providing
insights from large streaming data sources, which can quickly grow to be
terabytes or petabytes in size. In this talk, I explore novel approaches for
applying classical and emerging techniques to large-scale
datasets. Specifically, I discuss methodologies for expressing estimators in
terms of the (weighted) Gramian matrix and other easily distributed summary
statistics. I then present an abstraction layer for implementing chunk-wise
algorithms that are interoperable over many parallel and distributed software
frameworks. The utility and insights garnered from these methods are shown
through an application to an event-based dataset provided by the New York City
Taxi and Limousine Commission. I have joined these observations, which detail
every registered taxicab trip from 2009 to the present, with external sources
such as weather conditions and demographics. I use the aforementioned
techniques to explore factors associated with taxi demand and the tipping
behavior of riders. My focus is on developing novel techniques to facilitate
interactive exploratory data analysis and to construct interpretable models at
scale.
- Time and place:
4:30--5:30 pm on Friday, 26 February 2016, in Baker Hall A51 (rescheduled from Thursday, 25 February 2016)
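The abstract's central idea is that many estimators need only summary statistics, such as the weighted Gramian X'WX, which can be computed chunk by chunk and then simply added together. Weighted least squares is the textbook case. What follows is a minimal Python/NumPy sketch under that assumption. It is not the speaker's code or his abstraction layer, and the function names (`chunk_summary`, `combine`, `wls_from_chunks`) are hypothetical.

```python
import numpy as np

def chunk_summary(X, y, w):
    """Per-chunk sufficient statistics for weighted least squares (hypothetical helper)."""
    Xw = X * w[:, None]           # scale each row of X by its weight
    return Xw.T @ X, Xw.T @ y     # weighted Gramian X'WX and cross-product X'Wy

def combine(a, b):
    """Summaries from different chunks combine by elementwise addition."""
    return a[0] + b[0], a[1] + b[1]

def wls_from_chunks(chunks):
    """Stream over (X, y, w) chunks, accumulate summaries, then solve once."""
    total = None
    for X, y, w in chunks:
        s = chunk_summary(X, y, w)
        total = s if total is None else combine(total, s)
    XtWX, XtWy = total
    return np.linalg.solve(XtWX, XtWy)   # solve X'WX beta = X'Wy

# Simulated data (purely illustrative, not the taxi data)
rng = np.random.default_rng(0)
n, p = 10_000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)
w = rng.uniform(0.5, 2.0, size=n)

# Process the rows in ten chunks of 1,000
chunks = [(X[i:i + 1000], y[i:i + 1000], w[i:i + 1000]) for i in range(0, n, 1000)]
beta_chunked = wls_from_chunks(chunks)

# Fitting on all rows at once gives the same coefficients
Xw = X * w[:, None]
beta_full = np.linalg.solve(Xw.T @ X, Xw.T @ y)
print(np.allclose(beta_chunked, beta_full))  # True
```

Each chunk contributes only a p-by-p matrix and a p-vector. So in this sketch `chunk_summary` could run as a "map" step on separate machines, with `combine` acting as the "reduce". That is presumably what makes such statistics easy to distribute across parallel frameworks.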
As always, the talk is free and open to the public.
Update: Dr. Arnold's talk has been pushed back a day due to
travel delays.
Enigmas of Chance
Posted at February 25, 2016 11:16