Here, we describe the construction of a phylogenetically deep, whole-genome alignment of 20 flowering plants, along with an analysis of plant genome conservation. Each included angiosperm genome was aligned to a reference genome, Arabidopsis thaliana, using the LASTZ/MULTIZ paradigm and tools from the University of California, Santa Cruz (UCSC) Genome Browser source code. In addition to the multiple alignment, we created a local genome browser displaying multiple tracks of newly generated genome annotation, as well as annotation sourced from the published data of other research groups. An investigation into A. thaliana gene features present in the aligned A. lyrata genome revealed better conservation of start codons, stop codons, and splice sites within our alignments (51% of features from A. thaliana conserved without interruption in A. lyrata) than in previous publicly available plant pairwise alignments (34% of features conserved). The detailed view of conservation across angiosperms revealed not only high coding-sequence conservation but also a large set of previously uncharacterized conserved intergenic sequences. From this, we annotated the collection of conserved features, revealing dozens of putative noncoding RNAs, including some with recorded small RNA expression. Comparing conservation between kingdoms revealed that vertebrate genome features decay faster than angiosperm genome features. Finally, conserved sequences were searched for folding RNA features, including but not limited to noncoding RNA (ncRNA) genes. Among these, we highlight a double hairpin in the 5′-untranslated region (5′-UTR) of the PRIN2 gene and a putative ncRNA with homology targeting the LAF3 protein.
All Science Journal Classification (ASJC) codes
- Ecology, Evolution, Behavior and Systematics
- Molecular Biology
- RNA folding
- comparative genomics
- ultraconserved elements