This grant will support the development of an algorithm for finding genome rearrangements in data produced by next-generation sequencing (NGS) technologies. NGS has become an essential tool for life science research. These technologies are used to address many fundamental questions in various fields of biology, allowing one to find exact differences between samples (for instance, related species) by comparison to a reference genome. While single-nucleotide mismatches are robustly identified in such studies, precise and reliable detection of more complex changes, namely genome rearrangements, still presents a significant problem, and solving it will benefit a large number of biological researchers. The expected deluge of available genome sequences in the near future will require novel approaches for automated functional predictions dealing with 'big data' on a scale unimaginable today. Reliable identification of rearrangements will be an important step in such automated genome annotation. Active involvement and strong interaction of undergraduate, graduate and postdoctoral-level participants on the Rutgers-Camden campus (including members of underrepresented minority groups) is foreseen in these research activities. All of this work will have a strong synergy with the NSF award supporting the high-performance computational facilities on campus.
With this award, a novel approach for finding genome rearrangements will be developed, based on integrating multiple lines of evidence and logical constraints on mapped paired reads. For each nucleotide in a reference sequence, it will evaluate the likelihood of involvement in a rearrangement event. Given the rapidly changing field of NGS, the methodology will also allow the inclusion of novel types of evidence, for example long reads, to improve predictions of rearrangements.
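The idea of scoring every reference nucleotide from paired-read evidence can be illustrated with a deliberately simple sketch. This is not the algorithm proposed in the grant, whose details are not given here. It assumes a hypothetical `ReadPair` record, treats a pair as discordant when its mapped span falls outside an assumed range (`min_span`, `max_span`) or its mates have an unexpected orientation, and reports for each position the fraction of covering pairs that are discordant. That fraction is a toy score, not a calibrated likelihood.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadPair:
    """Hypothetical record for one mapped read pair (illustration only)."""
    start: int                # leftmost mapped coordinate, 0-based
    end: int                  # rightmost mapped coordinate, exclusive
    proper_orientation: bool  # True if the mates face each other


def is_discordant(pair: ReadPair, min_span: int, max_span: int) -> bool:
    """A pair is discordant if its span or its orientation is unexpected."""
    span = pair.end - pair.start
    return (not pair.proper_orientation) or span < min_span or span > max_span


def rearrangement_scores(ref_len: int, pairs: List[ReadPair],
                         min_span: int, max_span: int) -> List[float]:
    """Per-position fraction of covering pairs that are discordant.

    Positions with no coverage get a score of 0.0 (an assumption of this
    sketch; a real method would likely report them as uninformative).
    """
    total = [0] * ref_len
    discordant = [0] * ref_len
    for pair in pairs:
        bad = is_discordant(pair, min_span, max_span)
        # Clip the pair's span to the reference bounds before counting.
        for pos in range(max(0, pair.start), min(ref_len, pair.end)):
            total[pos] += 1
            if bad:
                discordant[pos] += 1
    return [d / t if t else 0.0 for d, t in zip(discordant, total)]


if __name__ == "__main__":
    pairs = [
        ReadPair(0, 4, True),    # concordant
        ReadPair(2, 10, True),   # span too large
        ReadPair(3, 6, False),   # unexpected orientation
    ]
    print(rearrangement_scores(12, pairs, min_span=3, max_span=5))
```

Further lines of evidence, such as split reads or long reads, could in principle be added as extra per-position counts and combined with the discordant-pair fraction. How the proposed method weighs and constrains them is not specified in this abstract.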
As a result of this research, a functional algorithm will be implemented with a user interface that displays complex rearrangements and highlights the available evidence from the sequence data in a convenient graphical form, together with genomic context information. The algorithm will be described in publications and at scientific meetings and will be made publicly available. This will aid in the interpretation of the evolutionary, functional and other roles of the detected genomic changes and result in substantial improvements in NGS data analysis in many areas of biology.
Effective start/end date: 8/1/15 → 7/31/18
- National Science Foundation (NSF)