04 Oct 2002 14:31

An ugly name. The use of computation-intensive techniques to study biological data, especially data generated from sequencing long macromolecules (chromosomal DNA, proteins, etc.) or otherwise related to them.

Now, it so happens that I've written a whole dissertation about computation-intensive techniques for discovering patterns in sequences... This notebook will contain more, when I have more to put in it.

Things to look into: State of the art of using hidden Markov models (seems poor, frankly). Using grammatical inference to characterize sequence families, regulatory motifs, etc. Inferring metabolic or regulatory structure from large-scale expression data, especially gene chip data. Massaging gene chip data. Characterizing membrane proteins and their activity.

I've just heard that some people are using hidden Markov models to characterize gene-chip data at a single time, using some odd mapping of different genes into a serial order. This seems absurd to me, but I have it from a reliable source. If people are really doing that, there's a much better alternative easily available, namely using graphical models. Memo to self: investigate, and if there's a niche, publish!

See also: Artificial intelligence; Biotechnology; Developmental Biology; Evolution of Organisms; Gene Expression Data Analysis; Machine Learning, Statistical Inference and Induction; Molecular Biology; Signal Transduction, Control of Metabolism, and Gene Regulation