Motivation: Class distinction is a supervised learning approach that has been used successfully to analyze high-throughput gene expression data. Identifying a set of genes that predicts differential biological states supports the development of basic and clinical scientific approaches to disease diagnosis. The Independent Consistent Expression Discriminator (ICED) was designed to provide a more biologically relevant search criterion during predictor selection by embracing the inherent variability of gene expression within any biological state. ICED has four components: (i) normalization of raw data; (ii) assignment of weights to genes from both classes; (iii) vote counting to determine the optimal number of predictor genes for class distinction; and (iv) calculation of prediction strengths for the classification results. The search criterion used by ICED is designed to identify not only genes that are consistently expressed at one level in one class and at a consistently different level in the other, but also genes that are variable in one class and consistent in the other. The result is a novel approach for accurately selecting biologically relevant predictors of differential disease states from a small number of microarray samples.
Results: We used ICED to analyze the large AML/ALL training and test data set (Golub et al., 1999, Science, 286, 531-537) as well as a smaller data set, generated for this study, from an animal model of the childhood neurodegenerative disorder Batten disease. Both analyses correctly predicted biologically relevant perturbations that can be used for disease classification, irrespective of sample size. Furthermore, the results provide candidate proteins for future study of the disease process and for the identification of potential targets for therapeutic intervention.
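As a point of reference for the four components listed above, the weighted-voting classifier of Golub et al. (1999), whose AML/ALL data ICED was evaluated on, also weights genes, counts their votes, and reports a prediction strength. The sketch below is a minimal NumPy illustration of that general pipeline under stated assumptions; it is not ICED's own method. The per-sample z-score normalization, the Golub-style signal-to-noise weight, and the leave-one-out rule for choosing the number of predictor genes are all stand-ins, and the function names (`normalize_samples`, `gene_weights`, `predict`, `choose_n_genes`) are hypothetical. In particular, ICED's weighting is designed to also reward genes that are variable in one class and consistent in the other, which a signal-to-noise weight does not capture.

```python
import numpy as np

def normalize_samples(X):
    # Assumed normalization (hypothetical): z-score each sample (column) independently.
    mu = X.mean(axis=0, keepdims=True)
    sd = X.std(axis=0, keepdims=True)
    return (X - mu) / np.where(sd == 0, 1.0, sd)

def gene_weights(X, y):
    # Golub-style signal-to-noise weight per gene (stand-in for ICED's weighting).
    # X is genes x samples; y holds class labels 0/1, one per sample.
    a, b = X[:, y == 0], X[:, y == 1]
    mu_a, mu_b = a.mean(axis=1), b.mean(axis=1)
    sd_a, sd_b = a.std(axis=1), b.std(axis=1)
    w = (mu_a - mu_b) / (sd_a + sd_b + 1e-9)
    boundary = (mu_a + mu_b) / 2.0  # midpoint between class means
    return w, boundary

def predict(x, w, boundary, genes):
    # Each selected gene casts a weighted vote: positive -> class 0, negative -> class 1.
    votes = w[genes] * (x[genes] - boundary[genes])
    v0 = votes[votes > 0].sum()
    v1 = -votes[votes < 0].sum()
    label = 0 if v0 >= v1 else 1
    total = v0 + v1
    # Prediction strength: (V_win - V_lose) / (V_win + V_lose), in [0, 1].
    ps = abs(v0 - v1) / total if total > 0 else 0.0
    return label, ps

def choose_n_genes(X, y, candidates):
    # Hypothetical rule: keep the predictor count with the most correct
    # leave-one-out predictions.
    best_n, best_correct = candidates[0], -1
    for n in candidates:
        correct = 0
        for i in range(X.shape[1]):
            keep = np.arange(X.shape[1]) != i
            w, bnd = gene_weights(X[:, keep], y[keep])
            genes = np.argsort(-np.abs(w))[:n]
            label, _ = predict(X[:, i], w, bnd, genes)
            correct += int(label == y[i])
        if correct > best_correct:
            best_n, best_correct = n, correct
    return best_n

# Usage on synthetic data: 50 genes x 12 samples, genes 0-4 raised in class 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 12))
y = np.array([0] * 6 + [1] * 6)
X[:5, y == 0] += 5.0
X = normalize_samples(X)
n = choose_n_genes(X, y, [1, 3, 5, 10])
w, bnd = gene_weights(X, y)
genes = np.argsort(-np.abs(w))[:n]
label, ps = predict(X[:, 0], w, bnd, genes)
```

Choosing the number of predictors by counting correct leave-one-out votes keeps every held-out sample out of its own weight estimate, which matters when, as in the Batten disease data, only a small number of samples is available.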
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics