Bioinformatics
04 Oct 2002 14:31
An ugly name. The use of computation-intensive techniques to study biological data, especially data generated from sequencing long macromolecules (chromosomal DNA, proteins, etc.) or otherwise related to them.
Now, it so happens that I've written a whole dissertation about computation-intensive techniques for discovering patterns in sequences... This notebook will contain more, when I have more to put in it.
Things to look into: State of the art of using hidden Markov models (seems poor, frankly). Using grammatical inference to characterize sequence families, regulatory motifs, etc. Inferring metabolic or regulatory structure from large-scale expression data, especially gene chip data. Massaging gene chip data. Characterizing membrane proteins and their activity.
I've just heard that some people are using hidden Markov models to characterize gene-chip data at a single time, using some odd mapping of different genes into a serial order. This seems absurd to me, but I have it from a reliable source. If people are really doing that, there's a much better alternative easily available, namely using graphical models. Memo to self: investigate, and if there's a niche, publish!
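A minimal sketch of why the serial-order trick looks arbitrary, under stated assumptions. It uses a Chow-Liu tree (a maximum-mutual-information spanning tree) as one simple example of a graphical model; this is an illustration, not a claim about what anyone in the field actually fits. The gene names `a` to `d`, the toy binarized "expression" data, and the helper names `mutual_information`, `chow_liu_tree` and `chain_information` are all made up for the example. The tree is recovered from the data alone, whereas the dependence a chain can capture between neighbouring genes changes with the order the genes are laid out in.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def chow_liu_tree(data):
    """Edges of a maximum-mutual-information spanning tree over the genes.

    data maps a gene name to its list of discretized levels, one per array.
    Built with Kruskal's algorithm and a small union-find.
    """
    genes = sorted(data)
    edges = sorted(((mutual_information(data[g], data[h]), g, h)
                    for g, h in combinations(genes, 2)), reverse=True)
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    tree = set()
    for _, g, h in edges:
        rg, rh = find(g), find(h)
        if rg != rh:  # adding this edge does not close a cycle
            parent[rg] = rh
            tree.add(frozenset((g, h)))
    return tree


def chain_information(data, order):
    """Total mutual information between neighbours in a given serial order."""
    return sum(mutual_information(data[g], data[h])
               for g, h in zip(order, order[1:]))


# Hypothetical toy data: b is a noisy copy of a, c a noisy copy of b,
# and d is unrelated to the other three.
n = 100
a = [i % 2 for i in range(n)]
b = [1 - x if i % 10 == 0 else x for i, x in enumerate(a)]
c = [1 - x if i % 10 == 5 else x for i, x in enumerate(b)]
d = [(i // 2) % 2 for i in range(n)]
data = {"a": a, "b": b, "c": c, "d": d}

tree = chow_liu_tree(data)
print(sorted(tuple(sorted(edge)) for edge in tree))
print(chain_information(data, ["a", "b", "c", "d"]))  # order follows the dependence
print(chain_information(data, ["a", "d", "b", "c"]))  # arbitrary order loses some of it
```

The tree keeps the a-b and b-c links however the genes happen to be listed; the chain only sees them if the chosen ordering puts those genes next to each other.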
See also: Artificial intelligence; Biotechnology; Developmental Biology; Evolution of Organisms; Gene Expression Data Analysis; Machine Learning, Statistical Inference and Induction; Molecular Biology; Signal Transduction, Control of Metabolism, and Gene Regulation
- Recommended:
- Baldi and Brunak, Bioinformatics: The Machine Learning Approach
- Sandrine Dudoit, Yee Hwa Yang, Matthew J. Callow and Terence P. Speed, "Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments," UCB Statistics Technical Report 578 [Abstract]
- Neal S. Holter, Madhusmita Mitra, Amos Maritan, Marek Cieplak, Jayanth R. Banavar and Nina V. Fedoroff, "Fundamental Patterns Underlying Gene Expression Profiles: Simplicity from Complexity," PNAS 97 (2000): 8409--8414
- To read:
- Sven Bergmann, Jan Ihmels and Naama Barkai, "Iterative signature algorithm for the analysis of large-scale gene expression data," Physical Review E 67 (2003): 031902
- Bower and Bolouri (eds.), Computational Modeling of Genetic and Biochemical Networks
- A. J. Butte and I. S. Kohane, "Mutual Information Relevance Networks: Functional Genomic Clustering Using Pairwise Entropy Measurements" [online]
- M. Caselle, F. Di Cunto, M. Pellegrino and P. Provero, "Finding regulatory sites from statistical analysis of nucleotide frequencies in the upstream region of eukaryotic genes," physics/0201033
- Josh M. Deutsch, "Algorithm for Finding Optimal Gene Sets in Microarray Prediction," physics/0108011
- Eytan Domany, "Cluster Analysis of Gene Expression Data," physics/0206056
- R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Richard Durrett, Probability Models for DNA Sequence Evolution
- Warren Ewens and Gregory Grant, Statistical Methods in Bioinformatics: An Introduction
- Luca Ferraro, Andrea Giansanti, Giovanni Giuliano and Vittorio Rosato, "Co-expression of statistically over-represented peptides in proteomes: a key to phylogeny?", q-bio.MN/0410011
- Gad Getz, Hilah Gal, Itai Kela, Eytan Domany and Dan A. Notterman, "Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data," physics/0206060
- Gad Getz, Michele Vendruscolo, David Sachs and Eytan Domany, "Automated assignment of SCOP and CATH protein structure classification from FSSP scores," cond-mat/0102280
- Alexander N. Gorban's Home Page at Northeastern University
- Alexander N. Gorban, Andrey Yu. Zinovyev and Tatyana G. Popova, "Self-organizing Approach for Automated Gene Identification in Whole Genomes," physics/0108016 [Fuller version online here or here]
- Dan Gusfield, Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
- Alexander K. Hartmann, "Sampling rare events: statistics of local sequence alignments," cond-mat/0108201
- Lenwood S. Heath, Naren Ramakrishnan, Ronald R. Sederoff, Ross W. Whetten, Boris I. Chevone, Craig A. Struble, Vincent Y. Jouenne, Dawei Chen, Leonel van Zyl and Ruth G. Alscher, "The Expresso Microarray Experiment Management System: The Functional Genomics of Stress Responses in Loblolly Pine," cs.OH/0110047
- Trinh Xuan Hoang, Marek Cieplak, Jayanth R. Banavar and Amos Maritan, "Prediction of Protein Secondary Structures From Conformational Biases," cond-mat/0201311
- Rui Hu and Bin Wang, "Statistically Significant Strings are Related to Regulatory Elements in the Promoter Regions of Saccharomyces cerevisiae," physics/0009002
- Thomas B. Kepler, Lynn Crosby and Kevin T. Morgan, "Normalization and Analysis of DNA Microarray Data by Self-Consistency and Local Regression," SFI Working Paper 00-09-055
- Cyril Laboulais, Mohammed Ouali, Marc Le Bret and Jacques Gabarro-Arpa, "Hamming distance geometry of a protein conformational space. Application to the clustering of a 4 ns molecular dynamics trajectory of the HIV-1 integrase catalytic core," physics/0110067
- Ming Li, Xin Li, Bin Ma, Paul Vitanyi, "Normalized Information Distance and Whole Mitochondrial Genome Phylogeny Analysis," cs.CC/0111054 [Pardon me if I don't exactly swoon over approximations to intrinsically uncomputable distance measures]
- Wentian Li
- "DNA Segmentation as A Model Selection Process," physics/0104027
- "New stopping criteria for segmenting DNA sequences," physics/0104026
- "Zipf's Law in Importance of Genes for Cancer Classification Using Microarray Data," physics/0104028
- Wentian Li, Fengzhu Sun and Ivo Grosse, "Extreme Value Distribution Based Gene Selection Criteria for Discriminant Microarray Data Analysis Using Logistic Regression", q-bio.QM/0403038
- Wentian Li and Yaning Yang, "How Many Genes Are Needed for a Discriminant Microarray Data Analysis?" physics/0104029
- Christopher Loose, Kyle Jensen, Isidore Rigoutsos and Gregory Stephanopoulos, "A linguistic model for the rational design of antimicrobial peptides", Nature 443 (2006): 867--869
- Felix Naef, Daniel A. Lim, Nila Patil and Marcelo O. Magnasco, "From Features to Expression: High-Density Oligonucleotide Array Analysis Revisited," physics/0102010
- Felix Naef, Nicholas D. Socci, and Marcelo Magnasco, "Extracting more signal at high intensities in oligonucleotide arrays," physics/0205031
- Jerome K. Percus, Mathematics of Genome Analysis
- Pavel A. Pevzner, Computational Molecular Biology: An Algorithmic Approach
- Y. Sakakibara, "Grammatical Inference in Bioinformatics", IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005): 1051--1062
- David Sankoff and Joseph Kruskal (eds.), Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison
- Federico Mattia Stefanini, "Identification of Highly Informative Molecular Profile Components Using Genetic Algorithms," SFI Working Paper 98-05-42
- James Tisdall, Beginning Perl for Bioinformatics
- Erik van Nimwegen, Mihaela Zavolan, Nikolaus Rajewsky and Eric D. Siggia, "Probabilistic Clustering of Sequences: Inferring new bacterial regulons by comparative genomics," physics/0206045
- Jean-Philippe Vert, "Kernel methods in genomics and computational biology", q-bio.QM/0510032
- Jean-Philippe Vert and Minoru Kanehisa, "Graph-driven features extraction from microarray data," physics/0206055
- Jason L. T. Wang, Bruce A. Shapiro and Dennis Shasha (eds.), Pattern Discovery in Biomolecular Data: Tools, Techniques, and Applications
- Chris Wiggins and Ilya Nemenman, "Process Pathway Inference via Time Series Analysis," physics/0206031
- Andrey Zinovyev (any relation to that Zinoviev?), Genome Visualization Tools
- To write:
- Kristina Lisa Shalizi, CRS, Walter Fontana, "Pattern Discovery in Artificially Evolved RNA Sequences"