July 11, 2007

Too Much Information

Via Jay Han in e-mail comes an hour-long video of a very cool talk Marjorie Shapiro recently gave at Google on what physicists hope to learn from the eagerly-awaited LHC at CERN, in particular the ATLAS experiment, and on the way data from any high-energy experiment of this type gets made and massaged. (There are also PDF slides.) Given the audience, the emphasis is on the latter, which might sound duller than talking about strings or loops or even the Higgs boson, but which I find deeply impressive. Incredible challenges in "data engineering" arise when your system must keep fewer than one observation in a hundred thousand and still ends up with petabytes of data, which must then be analyzed by a collaboration of two thousand physicists and engineers. The ways physicists achieve all this are worth pondering by anyone who has, or hopes to have, huge bodies of data concealing rare, relevant pieces of information.

Prof. Shapiro taught Jay and me particle physics when we were both undergrads at Cal. Watching the talk reminded me of why I liked her class so much, while still making me glad I wound up in statistics.

