Statistics

Last update: 13 Dec 2024 21:14
First version: 15 February 2000; substantial edit, 5 December 2024

Since June 2005, I have been a professor of statistics. My considered view, after observing the discipline from within and without for some time, is that statistics is (at least) three things at once.

  1. One of the mathematical sciences, elaborating on certain applications of probability which are closely related to (or just are) non-demonstrative inference and induction.
  2. A branch of engineering, concerned with systems and procedures for drawing reliable inferences from partial and noisy data. Like any engineering discipline, it has both prescriptive and descriptive parts; embraces design, analysis, implementation, operation, critique and repair; has a place for theoretical studies of simplified model set-ups to guide practice, etc., etc.
  3. A form of rhetoric appropriate to persuading highly numerate audiences. This is the least common perspective --- I learned it from Abelson's Statistics as Principled Argument --- but I find it's immediately convincing to anyone who's familiar with both statistics and classical rhetoric.

The first view is what you'll get from most textbooks. The second perspective is an explicit part of my teaching, but then, my school did begin life as "Carnegie Tech". I don't think the third viewpoint is taken in teaching anywhere, though a school where it was would be a very interesting place to teach. (Maybe you could get away with it at St. John's, or even the University of Chicago?)

See also: Properties vs. principles in defining "good statistics"

Things I work on / am interested in / need to learn more about:
Teaching Statistics
Dependent data
Statistical inference for stochastic processes, a.k.a. time-series analysis. Signal processing and filtering. Spatial statistics. Spatio-temporal statistics.
Model selection
Especially: adapting to unknown characteristics of the data, like unknown noise distributions, or unknown smoothness of the regression function. Inference after model selection.
Model discrimination
That is, designing experiments so as to discriminate between competing classes of model. Adaptation-to-data issues arise here too.
Rates of convergence of estimators to true values
Empirical process theory. (Cf. some questions in ergodic theory.)
Estimating distribution functions
And estimating entropies, or other functionals of distributions.
Non-parametric methods
Both those that are genuinely distribution-free, and those that would more accurately be called mega-parametric (even infinitely-parametric) methods, such as neural networks.
Regression
Bootstrapping and other resampling methods
Cross-validation
Sufficient statistics
Exponential families
Information Geometry
Partial identification of parametric statistical models
Causal Inference
Decision theory
Conventional, and the sorts with some connection to how real decisions are made.
Graphical models
Monte Carlo and other simulation methods
"De-Bayesing"
Ways of taking Bayesian procedures and eliminating dependence on priors, either by replacing them with initial point-estimates, or by showing the prior doesn't matter, asymptotically or, hopefully, sooner. See: Frequentist consistency of Bayesian procedures.
Computational Statistics
Statistics of structured data
Statistics on manifolds
i.e., what to do when the data live in a continuous but non-Euclidean space.
Grammatical Inference
Factor analysis
Mixture models
Multiple testing
Predictive distributions
... especially if they have confidence/coverage properties
Density estimation
especially conditional density estimation; and density estimation on graphical models
Indirect inference
And other species of simulation-based inference
"Missing mass" and species abundance problems
I.e., how much of the distribution have we not yet seen?
Independence Tests, Conditional Independence Tests, Measures of Dependence and Conditional Dependence
Two-Sample Tests
Statistical Emulators for Simulation Models
Hilbert Space Methods for Statistics and Probability
Large Deviations and Information Theory in the Foundations of Statistics
Confidence Sets, Confidence Intervals
Nonparametric Confidence Sets for Functions
Conformal prediction
Optimal Linear Prediction and Estimation
Empirical Likelihood
(Decision, Classification, Regression, Prediction) Trees in Statistics and Machine Learning

