Statistical Computing and Programming

08 May 2016 23:29

By this I do not just mean R, but R is a big part of being a working academic statistician these days...

R, for the record, is a free, open-source interpreted programming language (and interactive environment) for statistical computing. It descends from a language developed at Bell Labs (of blessed memory) called S. There is a commercial descendant of S called S-PLUS, but I know of no reason to use it rather than R. For that matter, I know of no reason to use any of the commercial statistical environments (Stata, SPSS, Minitab, ...) rather than R, except for personal and organizational inertia. (Which is not to be slighted, of course.) The only real alternative, from my point of view, is hand-written code in something like C/C++ or Fortran, which can of course be integrated with R. It would be a bit unfair to say that seeing a new method without an R implementation is cause for suspicion, but not wildly unfair.

(And, of course, people who use Excel to do statistics are perhaps to be pitied, but not to be taken seriously.)

I am drawing a somewhat arbitrary terminological divide between "statistical computing", meaning computing environments for statistical data analysis, and "computational statistics", meaning computational methods of special relevance to statistical problems, or tricky or interesting computational problems arising from statistical problems. (One might even call it "numerical methods for statistics", except that some of the most relevant algorithms aren't very numerical.) When I teach statistical computing, some of it is computational statistics and some of it is just plain programming, but much of it is things like data manipulation and the reproducibility of the analysis...

See also: Statistics; Teaching Statistics; Programming