Density Estimation on Graphical Models

Last update: 21 Apr 2025 21:17
First version:

Suppose I am interested in the joint distribution of some random variables. I know a priori that these random variables satisfy certain conditional independence relations. To be concrete, let's say the Oracle shows me the relevant graphical model inscribed on a golden tablet, but my glasses don't let me read the actual conditional distributions. Suppose I also have access to a source of samples from the distribution. How can I combine the samples and the conditional independencies to get an estimate of the distribution?

If I didn't have the graphical model, I could just use my favorite non-parametric density estimator to learn the underlying joint distribution. (I am fond of kernel density estimates, but to each their own.) Presumably, in fact, if I ignore the graph and just do a density estimate, it will, being consistent, converge on a distribution with all the conditional independencies implied by the graph. This would in a sense work, but it seems wasteful. By hypothesis, I know some variables are independent of others (given third parties); I should be able to use this somehow to constrain the density estimates and converge faster.
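For concreteness, here is a minimal sketch of that graph-ignoring baseline: a product-Gaussian kernel density estimate with a diagonal version of Scott's rule-of-thumb bandwidth (my choice; any bandwidth selector would do). The function name gaussian_kde_logpdf and the three-variable chain X1 - X2 - X3 used to simulate data are hypothetical illustrations. The last line computes an interaction contrast that is exactly zero whenever X1 and X3 are conditionally independent given X2 (for a positive density). The estimate knows nothing of the graph, so at any finite sample size this contrast is generally not zero.

```python
import numpy as np

def gaussian_kde_logpdf(X, Y):
    """Log of a product-Gaussian kernel density estimate built from samples X (n x d),
    evaluated at points Y (m x d). Bandwidth per coordinate is a diagonal version of
    Scott's rule, h_j = sd_j * n**(-1/(d+4)). No graph structure is used."""
    n, d = X.shape
    h = X.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    Z = (Y[:, None, :] - X[None, :, :]) / h              # m x n x d standardized gaps
    log_k = -0.5 * (Z ** 2).sum(axis=2) - 0.5 * d * np.log(2 * np.pi) - np.log(h).sum()
    top = log_k.max(axis=1, keepdims=True)               # log-sum-exp over the samples
    lse = top[:, 0] + np.log(np.exp(log_k - top).sum(axis=1))
    return lse - np.log(n)

# Hypothetical data: a chain in which X3 depends on X1 only through X2.
rng = np.random.default_rng(0)
x1 = rng.uniform(size=500)
x2 = rng.beta(1 + 4 * x1, 5 - 4 * x1)
x3 = rng.beta(1 + 4 * x2, 5 - 4 * x2)
X = np.column_stack([x1, x2, x3])

pts = np.array([[0.2, 0.5, 0.3], [0.8, 0.5, 0.9], [0.2, 0.5, 0.9], [0.8, 0.5, 0.3]])
v = gaussian_kde_logpdf(X, pts)
interaction = v[0] + v[1] - v[2] - v[3]   # zero for any log-density f(x1,x2) + g(x2,x3)
```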

This seems like such an obvious problem that someone must have solved it already, but I haven't been able to find anyone doing so. (Corrections on this point would be appreciated.) Since I don't have the time right now to follow it up, what follows is one potential idea. If you want to pursue this, please get in touch.

First of all, posit some reference distribution on the graph. (Say the uniform distribution, in the appropriate sense.) Further posit that the actual joint distribution is absolutely continuous w.r.t. the reference distribution. (We'll hope to pick up singular distributions as limits.) So every distribution we'll be concerned with has a density (Radon-Nikodym derivative). So long as we're making posits, let's assume that the log density is well-behaved, say square-integrable with respect to the reference distribution. Now if I take the set of all such measures (not necessarily just probability distributions) on the graph, their log-densities form a Hilbert space. If I restrict myself to measures which are Markov with respect to the graph, i.e., which satisfy all the conditional independencies it implies, their log-densities form a Hilbert sub-space. (For an undirected graph and strictly positive densities, the Hammersley-Clifford theorem says these log-densities are exactly the sums of functions of the variables in each clique, and sums and scalar multiples of such sums are again sums of that form.) The natural estimator which respects the conditional independence constraints is then the projection of the samples' empirical distribution onto this Markov sub-space. To make this a bit more implementable, pick your favorite orthonormal basis for the Markov sub-space. Then calculate the projection of the empirical distribution onto the basis functions, and obtain the estimated density by adding things up and exponentiating.
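Here is one way this might be made concrete, as a sketch under several assumptions of my own: three variables on the unit cube forming the chain X1 - X2 - X3, the uniform reference measure, and a cosine basis truncated at order K. Tensor products of cosines that never involve x1 and x3 together are orthonormal and span log-densities of the form f(x1,x2) + g(x2,x3), which (for positive densities) are exactly those with X1 independent of X3 given X2. I take the "projections of the empirical distribution" to be the sample means of the basis functions, and then choose log-density coefficients so the fitted model reproduces those means, i.e., maximum likelihood in the exponential family the basis spans. That moment-matching step is a swapped-in reading, since summing sample means times basis functions directly would estimate the density rather than its log. Normalizing constants use a midpoint grid; all function names are hypothetical.

```python
import itertools
import numpy as np

def phi(x, j):
    # Orthonormal cosine basis on [0, 1] under the uniform reference measure.
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

def chain_basis(K):
    # Tensor products phi_a(x1) phi_b(x2) phi_c(x3), a, b, c <= K, minus any term
    # involving x1 and x3 together: the rest spans f(x1, x2) + g(x2, x3).
    # The constant term is dropped because normalization absorbs it.
    return [t for t in itertools.product(range(K + 1), repeat=3)
            if not (t[0] > 0 and t[2] > 0) and t != (0, 0, 0)]

def features(X, basis):
    return np.column_stack([phi(X[:, 0], a) * phi(X[:, 1], b) * phi(X[:, 2], c)
                            for a, b, c in basis])

def midpoint_grid(m):
    g = (np.arange(m) + 0.5) / m
    return np.array(list(itertools.product(g, g, g)))

def fit_markov_chain_density(X, K=3, m=24, iters=50):
    basis = chain_basis(K)
    s_bar = features(X, basis).mean(axis=0)   # projections of the empirical distribution
    Phi = features(midpoint_grid(m), basis)   # basis evaluated on the quadrature grid
    theta = np.zeros(len(basis))

    def log_z(t):
        # log of the integral of exp(Phi t) over [0,1]^3 by the midpoint rule
        eta = Phi @ t
        top = eta.max()
        return top + np.log(np.mean(np.exp(eta - top)))

    def avg_loglik(t):
        return t @ s_bar - log_z(t)

    for _ in range(iters):
        eta = Phi @ theta
        w = np.exp(eta - eta.max())
        w /= w.sum()                          # model probabilities of the grid points
        mean = w @ Phi
        grad = s_bar - mean                   # moment mismatch = gradient of avg_loglik
        if np.abs(grad).max() < 1e-9:
            break
        centered = Phi - mean
        cov = centered.T @ (centered * w[:, None])   # negative Hessian
        step = np.linalg.solve(cov, grad)            # Newton direction
        t, base = 1.0, avg_loglik(theta)
        while avg_loglik(theta + t * step) < base and t > 1e-8:
            t /= 2                            # backtrack so each step is an ascent
        if avg_loglik(theta + t * step) < base:
            break
        theta = theta + t * step

    lz = log_z(theta)
    return lambda Y: features(Y, basis) @ theta - lz   # estimated log-density

# Hypothetical data from a chain in which X3 depends on X1 only through X2.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.uniform(size=n)
x2 = rng.beta(1 + 4 * x1, 5 - 4 * x1)
x3 = rng.beta(1 + 4 * x2, 5 - 4 * x2)
X = np.column_stack([x1, x2, x3])
logp = fit_markov_chain_density(X)
```

Since the fitted log-density has no term coupling x1 and x3, the conditional independence holds exactly at every sample size, not just in the limit; the price is that only the retained basis directions are estimated.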

Of course, in general there are an infinite number of basis vectors, and infinitely many non-zero projection coefficients. So to be really practical one would want to truncate the expansion of the projection after a certain point (which presumably grows with the sample size). Bosq and Blanke (below) discuss such "adaptive projection estimators" in general terms, and on a first examination I can see nothing which forbids applying their methods to the present situation.
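A one-dimensional sketch of picking the truncation point from the data. For simplicity it expands the density itself (not the log-density) in the cosine basis on [0, 1], so the estimate can dip below zero in places. The cutoff J minimizes an unbiased estimate of the integrated squared error, up to an additive constant: adding term j changes the estimated risk by 2 s_j^2/n - c_j^2, where c_j is the sample mean of phi_j(X) and s_j^2 its sample variance. This is a standard textbook rule, not necessarily the procedure Bosq and Blanke analyze, and the function name is hypothetical.

```python
import numpy as np

def cosine_projection_density(x, max_terms=50):
    """Orthogonal-series density estimate on [0, 1] with a data-driven cutoff.
    Coefficients c_j = mean(phi_j(x)) with phi_j(t) = sqrt(2) cos(pi j t)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    j = np.arange(1, max_terms + 1)
    Phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(x, j))    # n x max_terms
    c = Phi.mean(axis=0)
    var = Phi.var(axis=0, ddof=1)
    # risk[J] for J = 0..max_terms; risk[0] = 0 is the uniform estimate
    risk = np.concatenate([[0.0], np.cumsum(2 * var / n - c ** 2)])
    J = int(np.argmin(risk))

    def density(t):
        t = np.asarray(t, dtype=float)
        return 1.0 + np.sqrt(2.0) * np.cos(np.pi * np.outer(t, j[:J])) @ c[:J]

    return density, J

rng = np.random.default_rng(1)
density, J = cosine_projection_density(rng.beta(2, 5, size=5000))
```

Because the per-term penalty shrinks like 1/n while the estimated gain does not, more terms tend to survive as the sample grows, which is the kind of sample-size-dependent truncation the paragraph above calls for.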

Like I said, I don't have time to pursue this, but if you (1) are interested in doing so, or (2) know of an existing solution, please get in touch.

Denis Bosq and Delphine Blanke, Inference and Prediction in Large Dimensions
Peter Hall, Jeff Racine and Qi Li, "Cross-Validation and the Estimation of Conditional Probability Densities", Journal of the American Statistical Association 99 (2004): 1015--1026

Han Liu, John Lafferty and Larry Wasserman, "Tree Density Estimation", arxiv:1001.1557