Statistics with Structured Data

Last update: 21 Apr 2025 21:17
First version: 5 February 2009

A lot of statistical theory and methods are developed for fairly unstructured data --- each data-point is just that, a single unarticulated point in some space. What to do when observations have complicated internal structure?

One option: Break the structures into a collection of dependent unstructured observations, i.e., each observation is a realization of a non-trivial stochastic process. Examples: multivariate analysis (including its grown-up version, graphical models), time series, spatial statistics, network data analysis. Time series and spatial statistics are much better developed than network statistics not least because the dependency structures there are much simpler. Directed networks give us essentially arbitrary binary relations; hypergraphs arbitrary relational structures. This threatens (or promises) to bring up all sorts of issues from logic.
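To make the point about relations concrete, here is a minimal sketch in Python, with made-up toy data: a directed network is just a set of ordered pairs of nodes, and a hypergraph (here assumed to have ordered hyperedges of a fixed arity, which is a simplifying choice) is a set of longer tuples. The names `nodes`, `directed_edges`, `hyperedges`, `is_relation` and `indicator` are illustrative, not taken from any library.

```python
from itertools import product

# Hypothetical toy data, invented purely for illustration.
nodes = {"a", "b", "c"}

# A directed network is a binary relation: any subset of nodes x nodes,
# self-loops included.
directed_edges = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "a")}

# Assumption: hyperedges are ordered and all of the same arity (3 here),
# so the hypergraph is a ternary relation on the nodes.
hyperedges = {("a", "b", "c"), ("b", "c", "a")}


def is_relation(tuples, nodes, arity):
    """Check that `tuples` is a relation of the given arity on `nodes`."""
    return all(len(t) == arity and set(t) <= nodes for t in tuples)


def indicator(tuples, nodes, arity):
    """Indicator function of the relation over all possible tuples.

    For arity 2 this is the adjacency matrix, flattened into a dict.
    """
    return {t: t in tuples for t in product(sorted(nodes), repeat=arity)}


adjacency = indicator(directed_edges, nodes, 2)
print(is_relation(directed_edges, nodes, 2))  # the edge set is a binary relation
print(sum(adjacency.values()), "of", len(adjacency), "possible ordered pairs present")
print(is_relation(hyperedges, nodes, 3))      # the hyperedge set is a ternary relation
```

Nothing constrains which tuples are present, which is why a probability model for such data is a distribution over relations and not over points in a vector space.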

Can one do inference on the relational structure of complex observations? How? (Grammatical inference, for instance? Community discovery?)

Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning [Official blurb, Lise's book site with more links]
Ulf Grenander, Elements of Pattern Theory

Tommi S. Jaakkola and David Haussler, "Exploiting generative models in discriminative classifiers", NIPS 11 (1998) [PDF]
Leonid Peshkin, "Structure induction by lossless graph compression", cs.DS/0703132 [Adapting data-compression ideas to discover hierarchical structures in graphs, e.g., the 4 bases from a tinker-toy model of DNA.]

Yonatan Amit, Shai Shalev-Shwartz, Yoram Singer, "Online Learning of Complex Prediction Problems Using Simultaneous Projections", Journal of Machine Learning Research 9 (2008): 1399--1435
Gökhan Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar and S. V. N. Vishwanathan (eds.), Predicting Structured Data [blurb]
Peter J. Green, Nils Lid Hjort and Sylvia Richardson (eds.), Highly Structured Stochastic Systems
Ulf Grenander and Michael Miller, Pattern Theory: From Representation to Inference
Brijnesh J. Jain, Klaus Obermayer
- "Structure Spaces", Journal of Machine Learning Research 10 (2009): 2667--2714
- "Learning in Riemannian Orbifolds", arxiv:1204.4294
Fionn Murtagh, "Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy", arxiv:0805.2744
Haonan Wang, J. S. Marron, "Object oriented data analysis: Sets of trees", arxiv:0711.3147 ["Object oriented data analysis is the statistical analysis of populations of complex objects"]