## Statistics with Structured Data

*27 Feb 2017 16:30*

A lot of statistical theory and methods are developed for fairly unstructured data: each data-point is just that, a single unarticulated point in some space. What to do when observations have complicated internal structure?

One option: Break the structures into a collection of *dependent* unstructured observations, i.e., each observation is a realization of a non-trivial stochastic process. Examples: multivariate analysis (including its grown-up version, graphical models), time series, spatial statistics, network data analysis. Time series and spatial statistics are much better developed than network statistics not least because the dependency structures there are *much simpler*. Directed networks give us essentially arbitrary binary relations; hypergraphs arbitrary relational structures. This threatens (or promises) to bring up all sorts of issues from logic.
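To make the last point concrete, here is a minimal sketch of a directed network as a binary relation (a set of ordered pairs) and of a hypergraph as a k-ary relation (a set of k-tuples). The node names, the toy edges, and the helper functions are all hypothetical illustrations, and treating hyperedges as ordered tuples is only one possible convention.

```python
from itertools import product

# Hypothetical toy universe of nodes; the names are illustrative only.
nodes = {"a", "b", "c"}

# A directed network on these nodes is a binary relation:
# any subset of nodes x nodes, stored as a set of ordered pairs.
edges = {("a", "b"), ("b", "c"), ("c", "a")}

# Assumed convention: a hypergraph with ordered hyperedges of arity 3
# is a ternary relation, i.e. a set of 3-tuples of nodes.
hyperedges = {("a", "b", "c"), ("b", "b", "a")}


def is_kary_relation(rel, universe, k):
    """True if every element of rel is a k-tuple drawn from universe."""
    return all(len(t) == k and set(t) <= universe for t in rel)


def adjacency(rel, universe):
    """0/1 adjacency matrix of a binary relation, nodes in sorted order."""
    order = sorted(universe)
    return [[int((u, v) in rel) for v in order] for u in order]


def count_binary_relations(n):
    """Number of distinct directed networks (self-loops allowed) on n nodes."""
    return 2 ** (n * n)


if __name__ == "__main__":
    print(is_kary_relation(edges, nodes, 2))
    print(is_kary_relation(hyperedges, nodes, 3))
    print(adjacency(edges, nodes))
    print(count_binary_relations(len(nodes)))
    # Every possible ordered pair is a candidate edge.
    print(len(set(product(nodes, repeat=2))))
```

Nothing in this representation restricts which pairs or tuples may occur. A time series, by contrast, comes with a fixed linear order on its index set.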

Can one do inference on the relational structure of complex observations? How? (Grammatical inference, for instance? Community discovery?)

See also: Data Mining; Machine Learning, Statistical Inference and Induction; Statistics on Manifolds

- Recommended, big-picture:
- Lise Getoor and Ben Taskar (eds.), Introduction to Statistical Relational Learning [Official blurb, Lise's book site with more links]
- Ulf Grenander, Elements of Pattern Theory

- Recommended, miscellaneous close-ups:
- Tommi S. Jaakkola and David Haussler, "Exploiting generative models in discriminative classifiers", NIPS 11 (1998) [PDF]
- Leonid Peshkin, "Structure induction by lossless graph compression", cs.DS/0703132 [Adapting data-compression ideas to discover hierarchical structures in graphs, e.g., the 4 bases from a tinker-toy model of DNA.]

- To read:
- Yonatan Amit, Shai Shalev-Shwartz, Yoram Singer, "Online Learning of Complex Prediction Problems Using Simultaneous Projections", Journal of Machine Learning Research **9**(2008): 1399--1435
- Gökhan Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar and S. V. N. Vishwanathan (eds.), Predicting Structured Data [blurb]
- Peter J. Green, Nils Lid Hjort and Sylvia Richardson (eds.), Highly Structured Stochastic Systems
- Ulf Grenander and Michael Miller, Pattern Theory: From Representation to Inference
- Brijnesh J. Jain, Klaus Obermayer, "Structure Spaces", Journal of Machine Learning Research **10**(2009): 2667--2714
- Brijnesh J. Jain, Klaus Obermayer, "Learning in Riemannian Orbifolds", arxiv:1204.4294
- Fionn Murtagh, "Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy", arxiv:0805.2744
- Haonan Wang, J. S. Marron, "Object oriented data analysis: Sets of trees", arxiv:0711.3147 ["Object oriented data analysis is the statistical analysis of populations of complex objects"]