We introduce an algorithm, LLLAMA, which combines simple pattern recognizers into a general method for estimating the entropy of a sequence. Each pattern recognizer exploits a partial match between subsequences to build a model of the sequence. Since the primary features of interest in biological sequence domains are subsequences with small variations in exact composition, LLLAMA is particularly suited to such domains. We describe two methods, LLLAMA-length and LLLAMA-alone, which use this entropy estimate to perform maximum a posteriori classification. We apply these methods to several problems in three-dimensional structure classification of short DNA sequences. The results include a surprisingly low 3.6% error rate in predicting helical conformation of oligonucleotides. We compare our results to those obtained using more traditional methods for automated generation of classifiers.
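The classification step described above can be illustrated with a minimal sketch. The abstract does not specify how the partial-match recognizers work, so this sketch does not implement LLLAMA. It substitutes a plain order-k Markov model with add-one smoothing as the per-class sequence model, and uses that model's code length in bits as a stand-in for the entropy estimate. The function names (`train_model`, `code_length_bits`, `map_classify`), the class labels, and the toy sequences are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed DNA alphabet


def train_model(sequences, k=2):
    """Count next-symbol occurrences per context for an order-k Markov model.

    This is a stand-in for LLLAMA's pattern recognizers, not the paper's method.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, symbol in enumerate(seq):
            context = seq[max(0, i - k):i]  # up to k preceding symbols
            counts[context][symbol] += 1
    return counts


def code_length_bits(seq, counts, k=2):
    """Return -log2 P(seq | model), using add-one smoothing over ALPHABET."""
    bits = 0.0
    for i, symbol in enumerate(seq):
        context = seq[max(0, i - k):i]
        seen = counts.get(context, {})
        total = sum(seen.values())
        p = (seen.get(symbol, 0) + 1) / (total + len(ALPHABET))
        bits -= math.log2(p)
    return bits


def map_classify(seq, models, priors, k=2):
    """Pick the class maximizing log2 P(class) + log2 P(seq | class)."""
    def log_posterior(label):
        return math.log2(priors[label]) - code_length_bits(seq, models[label], k)
    return max(models, key=log_posterior)


# Toy, made-up training data; not taken from the paper.
toy_training = {
    "gc_rich": ["GCGCGCGC", "GGCCGGCC"],
    "at_rich": ["ATATATAT", "AATTAATT"],
}
toy_models = {label: train_model(seqs) for label, seqs in toy_training.items()}
toy_priors = {"gc_rich": 0.5, "at_rich": 0.5}
print(map_classify("GCGCGC", toy_models, toy_priors))
```

The class whose model encodes the query sequence in the fewest bits, after adding the log prior, is chosen. This is the maximum a posteriori decision rule, with code length standing in for negative log likelihood.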
|Original language||English (US)|
|Number of pages||12|
|Journal||Pacific Symposium on Biocomputing|
|State||Published - 1998|
All Science Journal Classification (ASJC) codes
- Biomedical Engineering
- Computational Theory and Mathematics