Statistics with Structured Data
Last update: 08 Dec 2024 12:14
First version: 5 February 2009
A lot of statistical theory and methods are
developed for fairly unstructured data --- each data-point is just that, a
single unarticulated point in some space. What to do when observations have
complicated internal structure?
One option: Break the structures into a collection of dependent
unstructured observations, i.e., each observation is a realization of a
non-trivial stochastic process. Examples: multivariate analysis (including its
grown-up version, graphical models),
time series, spatial statistics, network data analysis. Time
series and spatial statistics are much better developed than network statistics,
not least because their dependency structures are much simpler.
Directed networks give us essentially arbitrary binary relations; hypergraphs
arbitrary relational structures. This threatens (or promises) to bring up all sorts of issues from logic.
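To make the point about relations concrete, here is a minimal sketch, not a recommended data structure. It treats a directed network as a set of ordered pairs and checks that going to a 0/1 adjacency matrix and back loses nothing, which is the sense in which directed networks can encode any binary relation on a finite set. The hypergraph case is shown under the assumption that hyperedges are ordered tuples, so that a fixed size k gives a k-ary relation. All node names, edges and function names are made up for illustration.

```python
# Toy illustration only: node names and edges are made up for this sketch.
nodes = ["a", "b", "c"]
edges = {("a", "b"), ("b", "c"), ("c", "a")}  # a directed 3-cycle


def adjacency_matrix(relation, universe):
    """0/1 matrix of a binary relation, rows and columns in `universe` order."""
    return [[int((u, v) in relation) for v in universe] for u in universe]


def relation_from_matrix(matrix, universe):
    """Inverse map: any square 0/1 matrix defines a binary relation."""
    return {
        (u, v)
        for i, u in enumerate(universe)
        for j, v in enumerate(universe)
        if matrix[i][j]
    }


def is_relation_of_arity(tuples, universe, k):
    """True if every tuple has length k and draws its entries from `universe`."""
    members = set(universe)
    return all(len(t) == k and set(t) <= members for t in tuples)


A = adjacency_matrix(edges, nodes)
round_trip = relation_from_matrix(A, nodes)

# Assumption: hyperedges are ordered, so size-3 hyperedges form a ternary relation.
hyperedges = {("a", "b", "c"), ("c", "a", "b")}
```

Here `round_trip` equals `edges`, and `is_relation_of_arity(hyperedges, nodes, 3)` holds while the same check with `k=2` fails. Nothing in this representation says which relations are statistically plausible, which is exactly where the inferential questions start.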
Can one do inference on the relational structure of complex observations?
How? (Grammatical inference, for
instance? Community discovery?)
See also:
Data Mining;
Machine Learning, Statistical Inference and Induction;
Statistics on Manifolds
Recommended, big-picture:
- Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning [Official blurb, Lise's book site with more links]
- Ulf Grenander, Elements of Pattern Theory
Recommended, miscellaneous close-ups:
- Tommi S. Jaakkola and David Haussler, "Exploiting generative models
in discriminative classifiers", NIPS 11 (1998)
[PDF]
- Leonid Peshkin, "Structure induction by lossless graph compression",
cs.DS/0703132 [Adapting
data-compression ideas to discover hierarchical structures in graphs, e.g., the
4 bases from a tinker-toy model of DNA.]
To read:
- Yonatan Amit, Shai Shalev-Shwartz, Yoram Singer, "Online Learning
of Complex Prediction Problems Using Simultaneous Projections",
Journal of Machine Learning Research 9 (2008): 1399--1435
- Gökhan Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar and S. V. N. Vishwanathan (eds.), Predicting Structured Data [blurb]
- Peter J. Green, Nils Lid Hjort and Sylvia Richardson (eds.),
Highly Structured Stochastic Systems
- Ulf Grenander and Michael Miller, Pattern Theory:
From Representation to Inference
- Brijnesh J. Jain, Klaus Obermayer
- Fionn Murtagh, "Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy", arxiv:0805.2744
- Haonan Wang, J. S. Marron, "Object oriented data analysis: Sets of
trees", arxiv:0711.3147 ["Object
oriented data analysis is the statistical analysis of populations of complex
objects"]