Gene Expression Data Analysis

08 Jul 2015 19:36

I won't try to explain what gene expression is or why it's important here (see Signal Transduction, Gene Expression, and Control of Metabolism instead). This notebook collects references on the analysis of gene expression data, particularly with statistical or machine-learning techniques. I'm especially interested in methods for recovering the structure of the regulatory network from data, e.g., by applying graphical-model techniques.

Aggregation. The important papers by Chu et al. and by Wimberly et al. (see below) reveal a major obstacle to using graphical-model methods. The data in gene expression experiments are typically obtained not from a single cell but from hundreds or thousands of cells pooled together, and the conditional independence relations that graphical models seek to determine are not, in general, preserved under such aggregation. (Chu et al. develop this point theoretically, and Wimberly et al. show that existing structure-learning methods fail on aggregated data from reasonable simulation models.) Having only just read the papers, I am not yet clear where this leaves us. One approach, which perhaps betrays my background as a physicist, would be to try to artificially synchronize the cells before measuring expression levels. Subtler, more statistical approaches may also be possible. In any case, this is clearly a very significant issue. (Thanks to Tom Heiman for letting me know about these papers.)
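The aggregation problem can be seen in a small toy example. What follows is a hypothetical illustration, not the model used by Chu et al. or by Wimberly et al.: each cell follows a chain X -> Y -> Z, with a binary regulator X, a three-level gene Y, and a binary gene Z that responds non-monotonically to Y. All of the probabilities and function names (cell_joint, aggregate, cond_mutual_info) are made up for the illustration, and the cells in a sample are assumed to be independent and identically distributed. The code enumerates the exact distributions, so there is no sampling noise, and measures dependence with the conditional mutual information I(X; Z | Y), which is zero exactly when X and Z are conditionally independent given Y.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical per-cell model (illustrative assumption, not from the cited papers):
# a chain X -> Y -> Z with X in {0,1}, Y in {0,1,2}, Z in {0,1}.
P_X = {0: 0.5, 1: 0.5}
P_Y_GIVEN_X = {0: [0.45, 0.10, 0.45],   # X off: Y tends to be low or high
               1: [0.10, 0.80, 0.10]}   # X on: Y tends to be intermediate
P_Z1_GIVEN_Y = [0.1, 0.9, 0.1]          # Z responds non-monotonically to Y


def cell_joint():
    """Exact per-cell joint distribution P(x, y, z) for the chain X -> Y -> Z."""
    joint = {}
    for x, y, z in itertools.product((0, 1), (0, 1, 2), (0, 1)):
        pz = P_Z1_GIVEN_Y[y] if z == 1 else 1 - P_Z1_GIVEN_Y[y]
        joint[(x, y, z)] = P_X[x] * P_Y_GIVEN_X[x][y] * pz
    return joint


def aggregate(joint, n_cells):
    """Exact distribution of (sum X, sum Y, sum Z) over n i.i.d. cells (assumed)."""
    agg = defaultdict(float)
    for cells in itertools.product(joint.items(), repeat=n_cells):
        prob = math.prod(p for _, p in cells)
        key = tuple(sum(state[k] for state, _ in cells) for k in range(3))
        agg[key] += prob
    return dict(agg)


def cond_mutual_info(joint):
    """I(A; C | B) in nats for a joint distribution over triples (a, b, c)."""
    p_ab, p_bc, p_b = defaultdict(float), defaultdict(float), defaultdict(float)
    for (a, b, c), p in joint.items():
        p_ab[a, b] += p
        p_bc[b, c] += p
        p_b[b] += p
    return sum(p * math.log(p * p_b[b] / (p_ab[a, b] * p_bc[b, c]))
               for (a, b, c), p in joint.items() if p > 0)


if __name__ == "__main__":
    per_cell = cell_joint()
    print("per cell:        ", cond_mutual_info(per_cell))
    print("sum over 2 cells:", cond_mutual_info(aggregate(per_cell, 2)))
```

Within a single cell the conditional mutual information is zero up to floating-point rounding, as the chain structure guarantees. After pooling even two cells it is clearly positive: a total Y of 2 can come from cells at levels (1, 1) or (0, 2), which look identical in the sum but drive Z very differently, and the total of X carries information about which mix produced it. A structure-learning method that sees only the sums would therefore find no conditional independence licensing it to drop the X-Z edge.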

See also: Bioinformatics; Complex Networks; Molecular Biology