Notebooks

Statistics

09 Mar 2024 13:39

An application of probability, with intimate ties to machine learning, non-demonstrative inference and induction.

Since June 2005, I have been a professor of statistics. This made me interested in how to teach it.

See also: Properties vs. principles in defining "good statistics"

Things I need to learn more about:
Dependent data
Statistical inference for stochastic processes, a.k.a. time-series analysis. Signal processing and filtering. Spatial statistics. Spatio-temporal statistics.
Model selection
Especially: adapting to unknown characteristics of the data, like unknown noise distributions, or unknown smoothness of the regression function.
Model discrimination
That is, designing experiments so as to discriminate between competing classes of model. Issues of adapting to the data arise here too.
Rates of convergence of estimators to true values
Empirical process theory. (Cf. some questions in ergodic theory.)
Estimating distribution functions
And estimating entropies, or other functionals of distributions.
Non-parametric methods
Both those that are genuinely distribution-free, and those that would more accurately be called mega-parametric (even infinitely-parametric) methods, such as neural networks.
Regression
Bootstrapping and other resampling methods
Cross-validation
Sufficient statistics
Exponential families
Information Geometry
Partial identification of parametric statistical models
Causal Inference
Decision theory
Conventional, and the sorts with some connection to how real decisions are made.
Graphical models
Monte Carlo and other simulation methods
"De-Bayesing"
Ways of taking Bayesian procedures and eliminating dependence on priors, either by replacing them by initial point-estimates, or by showing the prior doesn't matter, asymptotically or hopefully sooner. See: Frequentist consistency of Bayesian procedures.
Computational Statistics
Statistics of structured data
Statistics on manifolds
I.e., what to do when the data live in a continuous but non-Euclidean space.
Grammatical Inference
Factor analysis
Mixture models
Multiple testing
Predictive distributions
Especially if they have confidence/coverage properties.
Density estimation
Especially conditional density estimation, and density estimation on graphical models.
Indirect inference
"Missing mass" and species abundance problems
I.e., how much of the distribution have we not yet seen?
Independence Tests, Conditional Independence Tests, Measures of Dependence and Conditional Dependence
Two-Sample Tests
Statistical Emulators for Simulation Models
Hilbert Space Methods for Statistics and Probability
Large Deviations and Information Theory in the Foundations of Statistics
Confidence Sets, Confidence Intervals
Nonparametric Confidence Sets for Functions
Conformal prediction
Optimal Linear Prediction and Estimation
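
To make the "missing mass" question in the list above concrete: one standard answer, offered here only as an illustration and not as anything the list itself commits to, is the Good-Turing estimate. It takes the fraction of observations belonging to types seen exactly once as an estimate of the probability that the next observation is of a type not yet seen. A minimal sketch in Python follows; the function name and the toy sample are hypothetical, chosen for the example.

```python
from collections import Counter


def good_turing_missing_mass(sample):
    """Good-Turing estimate of the probability mass on unseen types.

    Returns (number of types seen exactly once) / (sample size).
    The function name is hypothetical, made up for this sketch.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)  # occurrences of each observed type
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Toy data (an assumption for illustration): 7 draws, 4 distinct types,
# of which "b" and "d" were each seen exactly once.
sample = ["a", "a", "b", "c", "c", "c", "d"]
estimate = good_turing_missing_mass(sample)
print(f"Estimated missing mass: {estimate:.3f}")
```

When every observation is a singleton the estimate is 1, and when nothing was seen only once it is 0, which matches the intuition that a sample full of one-offs has barely scratched the distribution.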

