Long-read sequencing and de novo assembly of a Chinese genome

Lingling Shi, Yunfei Guo, Chengliang Dong, John Huddleston, Hui Yang, Xiaolu Han, Aisi Fu, Quan Li, Na Li, Siyi Gong, Katherine E. Lintner, Qiong Ding, Zou Wang, Jiang Hu, Depeng Wang, Feng Wang, Lin Wang, Gholson J. Lyon, Yongtao Guan, Yufeng Shen, Oleg V. Evgrafov, James A. Knowles, Francoise Thibaud-Nissen, Valerie Schneider, Chack Yung Yu, Libing Zhou, Evan E. Eichler, Kwok Fai So, Kai Wang

Research output: Contribution to journal › Article › peer-review

183 Scopus citations


Short-read sequencing has enabled the de novo assembly of several individual human genomes, but with inherent limitations in characterizing repeat elements. Here we sequence a Chinese individual HX1 by single-molecule real-time (SMRT) long-read sequencing, construct a physical map by NanoChannel arrays and generate a de novo assembly of 2.93 Gb (contig N50: 8.3 Mb, scaffold N50: 22.0 Mb, including 39.3 Mb N-bases), together with 206 Mb of alternative haplotypes. The assembly fully or partially fills 274 (28.4%) N-gaps in the reference genome GRCh38. Comparison to GRCh38 reveals 12.8 Mb of HX1-specific sequences, including 4.1 Mb that are not present in previously reported Asian genomes. Furthermore, long-read sequencing of the transcriptome reveals novel spliced genes that are not annotated in GENCODE and are missed by short-read RNA-Seq. Our results imply that improved characterization of genome functional variation may require the use of a range of genomic technologies on diverse human populations.
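The abstract reports contiguity as contig and scaffold N50 values. As a reminder of what that statistic measures, the following is a minimal sketch of the conventional N50 definition: the length L such that sequences of length L or longer cover at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (assumed here, not taken from the paper):
    sort lengths from longest to shortest and return the length at which
    the running total first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not HX1 data.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

Applied to real data, the input would be the contig (or scaffold) lengths of the assembly; a larger N50 indicates that more of the assembly sits in long, unbroken sequences.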

Original language: English (US)
Article number: 12065
Journal: Nature Communications
State: Published - Jun 30 2016
Externally published: Yes

All Science Journal Classification (ASJC) codes

  • General Chemistry
  • General Biochemistry, Genetics and Molecular Biology
  • General Physics and Astronomy

