We describe a novel procedure for generating and optimizing pattern descriptors that can be used to find structural motifs in DNA or RNA sequences. This combines a pattern-description language (based primarily on secondary structure alignment and conservation of some key nucleotides) with a scoring function that relies heavily on estimated folding free energies for the secondary structure of interest. For the cloverleaf secondary structure characteristic of tRNA, we show that a fairly simple pattern descriptor can find almost all known tRNA genes in both bacterial and eukaryotic genomes, and that false positives (sequences that match the pattern but that are probably not tRNAs) can be recognized by their high estimated folding free energies. A general procedure for optimizing descriptors (and hence for finding new structural motifs) is also described. For six bacterial, four eukaryotic, and four archaea genome sequences, our results compare favorably with those of the more complex and specialized tRNAscan-SE algorithm. Prospects for using this general approach to find other RNA structural motifs are discussed.
|Original language||English (US)|
|Number of pages||11|
|State||Published - May 1 2003|
All Science Journal Classification (ASJC) codes
- Molecular Biology
- Nearest neighbor energy
- Secondary structure