"Tidy Data" (Next Week at the Statistics Seminar)
Attention conservation notice: Only of interest if you (1)
do statistical computing and (2) will be in Pittsburgh on Monday.
For those of us who use R all the
time, next week's speaker needs no introduction. I can't make the kids
in statistical computing attend next week's
seminar, but I probably ought to.
- Hadley Wickham, "Tidy Data"
- Abstract: It's often said that 80 percent of the effort of
analysis is spent just getting the data ready to analyze, the process of data
cleaning. Data cleaning is not only a vital first step, but it is often
repeated multiple times over the course of an analysis as new problems come to
light. Despite the amount of time it takes up, there has been little research
on how to clean data well. Part of the challenge is the breadth of
activities that cleaning encompasses, from outlier checking to data parsing to
missing value imputation. To get a handle on the problem, this talk focuses on
a small, but important, subset of data cleaning that I call data "tidying":
getting the data in a format that is easy to manipulate, model, and
visualize.
- In this talk you'll see some of the crazy data sets that I've struggled
with over the years, and learn the basic tools for making messy data tidy.
I'll also discuss tidy tools, tools that take tidy data as input and return
tidy data as output. The idea of a tidy tool is useful for critiquing existing
R functions, and will help to explain why some tasks that seem like they should
be easy are in fact quite hard. This work ties together reshape2, plyr and
ggplot2 with a consistent philosophy of data. Once you master this data format, you'll
find it much easier to manipulate, model and visualize your data.
- Time and place: 4--5 pm on Monday, 5 December 2011, in Doherty Hall A310
As always, the talk is free and open to the public. R groupies should,
however, contain themselves while Prof. Wickham is speaking.
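
For anyone who hasn't met the idea before, here is a minimal sketch of what "tidying" might look like. It is a toy illustration, not anything taken from the talk: the table, the column names and the `melt` helper are all made up for the example, and it is written in plain Python (standard library only) rather than R so that it runs without installing anything. The assumption is that "tidy" here means one row per observation, which is roughly what reshape2's `melt` produces in R.

```python
# Hypothetical "messy" table: one row per person, one column per treatment.
# The names and numbers are invented purely for illustration.
messy = [
    {"name": "Ann", "treatment_a": 12, "treatment_b": 7},
    {"name": "Bob", "treatment_a": 5, "treatment_b": None},
]


def melt(rows, id_key, var_name="variable", value_name="value"):
    """Reshape wide rows into long rows: one output row per (id, variable) pair."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is carried along, not melted
            tidy_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy_rows


# Each record now describes exactly one observation.
tidy = melt(messy, "name", var_name="treatment", value_name="result")
for record in tidy:
    print(record)
```

Once the data are in the long form, filtering, grouping and plotting by `treatment` no longer depend on which columns happened to exist in the original table, which is presumably the sort of payoff the abstract has in mind.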
Enigmas of Chance
Posted at November 30, 2011 15:00