Notebooks

## Statistical Computing and Programming

22 Aug 2019 15:07

By this I do not just mean R, but R is a big part of being a working academic statistician these days...

R, for the record, is a free, open-source interpreted programming language (and interactive environment) for statistical computing. It descends from a language developed at Bell Labs (of blessed memory) called S. There is a commercial descendant of S called S-plus, but I know of no reason to use it rather than R. For that matter, I know of no reason to use any of the commercial statistical environments (Stata, SPSS, Minitab, ...) rather than R, except for personal and organizational inertia. (Which is not to be slighted, of course.) The only real alternative, from my point of view, is hand-written code in something like C/C++ or Fortran --- which can of course be integrated with R. It would be a bit unfair to say that seeing a new method without an R implementation is cause for suspicion, but not wildly unfair.

(And, of course, people who use Excel to do statistics are perhaps to be pitied, but not to be taken seriously.)

— I am drawing a somewhat arbitrary terminological divide between "statistical computing", meaning computing environments for statistical data analysis, and "computational statistics", meaning computational methods of special relevance to statistical problems, or tricky or interesting computational problems arising from statistical problems. (One might even call it "numerical methods for statistics", except that some of the most relevant algorithms aren't very numerical.) When I teach statistical computing, some of it is computational statistics, and some of it is just plain programming, but lots of it is stuff like data manipulation, and reproducibility of the analysis...

Recommended, big picture:
• John M. Chambers, Software for Data Analysis: Programming with R
Recommended, gentle introductions:
• W. John Braun and Duncan J. Murdoch, A First Course in Statistical Programming with R [They're not kidding about being a first course --- experienced programmers may find it irritatingly slow-paced --- but they do a rather good job for total novices.]
• Norman Matloff, The Art of R Programming: A Tour of Statistical Software Design
Recommended, tool sources:
• The R Project for Statistical Computing
• Journal of Statistical Software
Recommended, close-ups of particular tools (very inadequate):
• Winston Chang, R Graphics Cookbook
• Julian J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models
• Tristen Hayfield and Jeffrey S. Racine, "Nonparametric Econometrics: The np Package", Journal of Statistical Software 27 (2008): 5 [An extremely useful little R package]
• Michael Kane, John W. Emerson, Stephen Weston, "Scalable Strategies for Computing with Massive Data", Journal of Statistical Software 55 (2013): 14
• Phil Spector, Data Manipulation with R
• Paul Teetor, R Cookbook
• Hadley Wickham, "The Split-Apply-Combine Strategy for Data Analysis", Journal of Statistical Software 40 (2011): 1
• Joseph Adler, R in a Nutshell [Glowing review in J. Stat. Soft.]
• Adrian W. Bowman and Adelchi Azzalini, Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations
• Richard Cotton, Learning R
• Garrett Grolemund, Hands-On Programming with R: Write Your Own Functions and Simulations
• Owen Jones and Robert Maillardet and Andrew Robinson, Introduction to Scientific Programming and Simulation Using R
• Ben Klemens, Modeling with Data: Tools and Techniques for Scientific Computing [JSTOR; author's book site]
• Matthias Kohl and Peter Ruckdeschel, "R Package distrMod: S4 Classes and Methods for Probability Models", Journal of Statistical Software 35 (2010): 10 [Use this for re-writing the power law code?]
• John Maindonald, Data Analysis and Graphics Using R
• Quinn E. McCallum, Parallel R
• Wes McKinney, Python for Data Analysis
• Maria L. Rizzo, Statistical Computing with R