PacBio Sequencing Contributes to New Japanese Reference Genome
Monday, February 10, 2020
People of Japanese descent just moved a little closer toward the promise of precision medicine thanks to a population-specific reference genome based on the de novo genome assembly of three Japanese individuals. A new preprint describing the work shows that SMRT Sequencing was instrumental in the achievement.
Scientists from Tohoku University, led by Jun Takayama (@jntkym), Kengo Kinoshita (@kk824), Masayuki Yamamoto, and Gen Tamiya, aimed to create an improved reference genome resource that would better represent the genetic background of a Japanese population than the current human reference genome. “Some ethnic ancestries are under-represented in the international human reference genome (e.g., GRCh37), especially Asian populations, due to a strong bias toward European and African ancestries in a single mosaic haploid genome consisting chiefly of a single donor,” they write.
To address that challenge, they sequenced the genomes of three Japanese individuals to more than 100-fold coverage with PacBio SMRT Sequencing. The contig N50 value for each genome was approximately 20 Mb. Bionano optical maps were used to perform hybrid scaffolding to boost contiguity even further. “These and other assembly statistics were better than or comparable to other published de novo assemblies,” the authors report.
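For readers unfamiliar with the metric, contig N50 is the contig length at which contigs of that size or longer account for at least half of the total assembled bases. The short Python sketch below illustrates the calculation. The function name `n50` and the example lengths are illustrative assumptions, not taken from the preprint or from any assembly tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with made-up contig lengths (arbitrary units).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

An assembly with a contig N50 of about 20 Mb therefore has at least half of its bases in contigs of roughly 20 Mb or longer.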
Fig 1a. Construction of JG1: PCA plot showing that the three sample donors are within the Japanese population cluster.
Next, the team had to merge all three of these assemblies to “construct a reference-quality haploid genome sequence,” they write. “We integrated the genomes using the major allele for consensus, and anchored the scaffolds using sequence-tagged site markers from conventional genetic and radiation hybrid maps to reconstruct each chromosome sequence.” The meta-assembly was designed to avoid the inclusion of rare variants and unresolved sequences, giving it the broadest possible applicability.
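The idea of taking the major allele for consensus can be shown with a toy example. The sketch below assumes three sequences that are already aligned and of equal length, and it picks the most common base in each column. It is a simplified illustration only, not the authors' pipeline, which operates on full assemblies. The function names and the choice to emit `N` when no allele has a clear majority are assumptions made for this example.

```python
from collections import Counter


def major_allele(alleles, fallback="N"):
    """Return the most common allele, or `fallback` when the top count is tied."""
    counts = Counter(alleles).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback  # no single major allele (hypothetical tie rule)
    return counts[0][0]


def consensus(aligned_sequences):
    """Build a consensus string column by column from pre-aligned sequences."""
    if len({len(seq) for seq in aligned_sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    return "".join(major_allele(column) for column in zip(*aligned_sequences))


# Three made-up aligned sequences standing in for three assemblies.
print(consensus(["ACGT", "ACCT", "TCGA"]))
```

With three input genomes, any allele carried by two of them wins the column, which is one way an allele seen in only a single donor would be kept out of the consensus.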
Takayama et al. validated the utility of this new reference genome, known as JG1, by analyzing its representation of common variants among Japanese people and its ability to home in on causal variants for rare disease in seven Japanese families. In all cases, the population-specific reference performed as well as or better than other assemblies in detecting relevant variation; for example, in the rare disease case, JG1 reduced the number of false-positive variant calls from an exome analysis.
JG1 “is highly contiguous, accurate, and carries the major allele in the majority of single nucleotide variant sites for a Japanese population,” the scientists report. “We expect that population-specific reference genome such as JG1 will prove to be practical and beneficial options for genome analyses of individuals originated from the population.”
PacBio long-read sequencing is being used to develop population-specific reference genomes as part of several international research efforts. Learn more about these projects and explore detailed assembly information in our interactive map.