TY - JOUR
T1 - Modeling and analysis of RNA-seq data
T2 - a review from a statistical perspective
AU - Li, Wei Vivian
AU - Li, Jingyi Jessica
N1 - Funding Information:
This work was supported by the following grants: National Science Foundation DMS-1613338, NIH/NIGMS R01GM120507, PhRMA Foundation Research Starter Grant in Informatics, Johnson & Johnson WiSTEM2D Award, and Sloan Research Fellowship (to J.J.L.) and the UCLA Dissertation Year Fellowship (to W.V.L.). The authors would like to thank Dr. Lior Pachter at California Institute of Technology and Dr. Michael I. Love at University of North Carolina at Chapel Hill for their insightful feedback.
Publisher Copyright:
© 2018, Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2018/9/1
Y1 - 2018/9/1
N2 - Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. Conclusions: Statistical and computational methods for analyzing RNA-seq data have advanced significantly in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models with regard to their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers with future method development.
AB - Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical importance. Conclusions: Statistical and computational methods for analyzing RNA-seq data have advanced significantly in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models with regard to their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers with future method development.
KW - RNA-seq
KW - alternatively spliced exons
KW - differentially expressed genes
KW - isoform reconstruction and quantification
KW - statistical modeling
UR - http://www.scopus.com/inward/record.url?scp=85051651700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051651700&partnerID=8YFLogxK
U2 - 10.1007/s40484-018-0144-7
DO - 10.1007/s40484-018-0144-7
M3 - Review article
AN - SCOPUS:85051651700
SN - 2095-4689
VL - 6
SP - 195
EP - 209
JO - Quantitative Biology
JF - Quantitative Biology
IS - 3
ER -