## Density Estimation on Graphical Models

*27 Feb 2017 16:30*

Suppose I am interested in the joint distribution of some random variables.
I know *a priori* that these random variables satisfy certain
conditional independence relations. To be concrete, let's say the Oracle shows
me the relevant graphical model inscribed on
a golden tablet, but my glasses don't let me read the actual conditional
distributions. Suppose I also have access to a source of samples from the
distribution. How can I combine the samples and the conditional independencies
to get an estimate of the distribution?

If I didn't have the graphical model, I could just use my favorite
non-parametric density estimator to learn
the underlying joint distribution. (I am fond of kernel density estimates, but
to each their own.) Presumably, in fact, if I ignore the graph and just do a
density estimate, it will, being consistent, *converge* on a
distribution with all the conditional independencies implied by the graph.
This would in a sense work, but it seems
*wasteful*. By hypothesis, I *know* some variables are
independent of others (given third parties); I should be able to use this
somehow to constrain the density estimates and converge faster.
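
As a sketch of what there is to gain, assume (purely for illustration) that the graph is a directed acyclic graph, with $\mathrm{pa}(i)$ denoting the parents of node $i$. The Markov property then gives the standard factorization of the joint density:

$$
p(x_1, \ldots, x_d) = \prod_{i=1}^{d} p\left(x_i \mid x_{\mathrm{pa}(i)}\right)
$$

For a hypothetical three-variable chain $X \to Y \to Z$, this reads

$$
p(x, y, z) = p(x)\, p(y \mid x)\, p(z \mid y)
$$

so no factor involves more than two variables at once. For kernel estimators of a twice-differentiable density in $d$ dimensions, the usual mean-squared-error rate is of order $n^{-4/(4+d)}$, so one would hope to trade the three-dimensional rate $n^{-4/7}$ for something closer to the two-dimensional rate $n^{-2/3}$.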

This seems like such an obvious problem that *someone* must have
solved it already, but I haven't been able to find anyone doing so.
(Corrections on this point would be appreciated.) Since I don't have the time
right now to follow it up, what follows is one potential idea. If you want
to pursue this, please get in touch.

First of all, posit some reference distribution on the graph. (Say the
uniform distribution, in the appropriate sense.) Further posit that the actual
joint distribution is absolutely continuous w.r.t. the reference distribution.
(We'll hope to pick up singular distributions as limits.) So every
distribution we'll be concerned with has a density (Radon-Nikodym derivative).
So long as we're making posits, let's assume that the *log* density is
well-behaved (say, square-integrable with respect to the reference
distribution). Now if I take the set of all such measures (not necessarily just
probability distributions) on the graph, their log-densities form a Hilbert
space. If I restrict myself to measures which obey the graph's conditional
independence relations (i.e., which are Markov with respect to the
graph), *their* log-densities form a Hilbert *sub*-space. The
natural estimator which respects the conditional independence constraints is
then the projection of the samples' empirical distribution on to this
sub-space. To make this a bit more implementable, pick your favorite
orthonormal basis for the sub-space. Then calculate the projection of
the empirical distribution on to each basis vector, and obtain the
estimated density by adding up the basis functions, weighted by their
projection coefficients, and exponentiating.
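
A toy version of this recipe can be sketched in the one setting where everything is finite: three binary variables coded as ±1, on the undirected chain X1 - X2 - X3. All of the specifics here are assumptions made for illustration, not part of the proposal above: the reference distribution is uniform on the cube, the orthonormal basis is the set of parity functions (products of coordinates), the graph-respecting subspace is spanned by the parity functions whose variables all lie inside one clique, and a pseudo-count smooths the empirical distribution so that its log is finite.

```python
import itertools
import math
import random

# Chain graph X1 - X2 - X3 on {-1,+1}^3: the cliques are {0,1} and {1,2},
# so X1 and X3 should be independent given X2.
CLIQUES = [(0, 1), (1, 2)]
STATES = list(itertools.product([-1, 1], repeat=3))


def parity(subset, x):
    """Basis function chi_S(x) = product of x_i for i in S (chi of the empty set is 1)."""
    out = 1
    for i in subset:
        out *= x[i]
    return out


def graph_basis(cliques):
    """All subsets of variables contained in some clique, including the empty set."""
    subsets = set()
    for clique in cliques:
        for r in range(len(clique) + 1):
            subsets.update(itertools.combinations(clique, r))
    return sorted(subsets, key=lambda s: (len(s), s))


def project_log_density(samples, cliques, pseudo_count=0.5):
    """Project a smoothed empirical log-density onto the graph's subspace."""
    # Smoothed empirical distribution; the pseudo-count is an assumption of
    # this sketch, needed so that the log is finite for unseen states.
    counts = {x: pseudo_count for x in STATES}
    for x in samples:
        counts[tuple(x)] += 1
    total = sum(counts.values())
    log_p = {x: math.log(c / total) for x, c in counts.items()}

    # Inner products under the uniform reference distribution on the cube.
    coefs = {
        s: sum(log_p[x] * parity(s, x) for x in STATES) / len(STATES)
        for s in graph_basis(cliques)
    }

    # Add things up, exponentiate, and renormalize to a probability distribution.
    unnorm = {
        x: math.exp(sum(c * parity(s, x) for s, c in coefs.items()))
        for x in STATES
    }
    z = sum(unnorm.values())
    return {x: u / z for x, u in unnorm.items()}, coefs


def sample_chain(n, rng):
    """Draw from a hypothetical binary Markov chain X1 -> X2 -> X3."""
    draws = []
    for _ in range(n):
        x1 = rng.choice([-1, 1])
        x2 = x1 if rng.random() < 0.8 else -x1
        x3 = x2 if rng.random() < 0.7 else -x2
        draws.append((x1, x2, x3))
    return draws


rng = random.Random(0)
estimate, coefs = project_log_density(sample_chain(2000, rng), CLIQUES)
```

Here `estimate` maps each of the eight states to a probability, and `coefs` holds the six retained coefficients. The two dropped basis functions, x1·x3 and x1·x2·x3, are the ones that would couple X1 and X3 directly, which is why the result satisfies the conditional independence exactly rather than only in the limit.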

Of course, in general there are an infinite number of basis vectors, and infinitely many non-zero projection coefficients. So to be really practical one would want to truncate the expansion of the projection after a certain point (which presumably grows with the sample size). Bosq and Blanke (below) discuss such "adaptive projection estimators" in general terms, and on a first examination I can see nothing which forbids applying their methods to the present situation.
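
To make the truncation concrete, here is a minimal sketch of a truncated series estimator in the simplest possible setting. It is deliberately not the graph-constrained log-density estimator: it works on the density itself, in one dimension, on [0, 1], with a cosine basis. The cutoff rule `truncation_level` (about n^(1/3) terms) is a hypothetical placeholder, not the data-driven choice Bosq and Blanke analyze; the only point is that each coefficient is a sample mean of a basis function and that the number of retained terms grows with the sample size.

```python
import math
import random


def cosine_basis(k, x):
    """Orthonormal cosine basis on [0, 1] under the uniform reference measure."""
    return 1.0 if k == 0 else math.sqrt(2.0) * math.cos(math.pi * k * x)


def truncation_level(n):
    """Hypothetical rule: keep about n^(1/3) terms, so the cutoff grows with n."""
    return max(1, round(n ** (1.0 / 3.0)))


def projection_estimator(samples, n_terms=None):
    """Return (coefficients, density function) for a truncated series estimate."""
    n = len(samples)
    if n_terms is None:
        n_terms = truncation_level(n)
    # Each empirical coefficient is the sample mean of one basis function.
    coefs = [
        sum(cosine_basis(k, x) for x in samples) / n
        for k in range(n_terms + 1)
    ]

    def density(x):
        # Truncated expansion: sum of coefficient times basis function.
        return sum(a * cosine_basis(k, x) for k, a in enumerate(coefs))

    return coefs, density


rng = random.Random(1)
samples = [rng.betavariate(2, 2) for _ in range(1000)]  # toy data on [0, 1]
coefs, density = projection_estimator(samples)
```

With 1000 samples the placeholder rule keeps the constant term plus ten cosine terms. Nothing stops a truncated expansion of the density from dipping below zero, which is one practical argument for expanding the log-density instead.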

Like I said, I don't have time to pursue this, but if you (1) are interested in doing so, or (2) know of an existing solution, please get in touch.

- Recommended:
    - Denis Bosq and Delphine Blanke, *Inference and Prediction in Large Dimensions*
    - Peter Hall, Jeff Racine and Qi Li, "Cross-Validation and the Estimation of Conditional Probability Densities", *Journal of the American Statistical Association* **99** (2004): 1015--1026
- To read:
    - Han Liu, John Lafferty and Larry Wasserman, "Tree Density Estimation", arxiv:1001.1557