Menu
April 21, 2020  |  

Multi-platform discovery of haplotype-resolved structural variation in human genomes.

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50?bp) and 27,622 SVs (=50?bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.


April 21, 2020  |  

A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour.

A complete and accurate genome sequence provides a fundamental tool for functional genomics and DNA-informed breeding. Here, we assemble a high-quality genome (contig N50 of 6.99?Mb) of the apple anther-derived homozygous line HFTH1, including 22 telomere sequences, using a combination of PacBio single-molecule real-time (SMRT) sequencing, chromosome conformation capture (Hi-C) sequencing, and optical mapping. In comparison to the Golden Delicious reference genome, we identify 18,047 deletions, 12,101 insertions and 14 large inversions. We reveal that these extensive genomic variations are largely attributable to activity of transposable elements. Interestingly, we find that a long terminal repeat (LTR) retrotransposon insertion upstream of MdMYB1, a core transcriptional activator of anthocyanin biosynthesis, is associated with red-skinned phenotype. This finding provides insights into the molecular mechanisms underlying red fruit coloration, and highlights the utility of this high-quality genome assembly in deciphering agriculturally important trait in apple.


April 21, 2020  |  

Study of the whole genome, methylome and transcriptome of Cordyceps militaris.

The complete genome of Cordyceps militaris was sequenced using single-molecule real-time (SMRT) sequencing technology at a coverage over 300×. The genome size was 32.57?Mb, and 14 contigs ranging from 0.35 to 4.58?Mb with an N50 of 2.86?Mb were assembled, including 4 contigs with telomeric sequences on both ends and an additional 8 contigs with telomeric sequences on either the 5′ or 3′ end. A methylome database of the genome was constructed using SMRT and m4C and m6A methylated nucleotides, and many unknown modification types were identified. The major m6A methylation motif is GA and GGAG, and the major m4C methylation motif is GC or CG/GC. In the C. militaris genome DNA, there were four types of methylated nucleotides that we confirmed using high-resolution LCMS-IT-TOF. Using PacBio Iso-Seq, a total of 31,133 complete cDNA sequences were obtained in the fruiting body. The conserved domains of the nontranscribed regions of the genome include TATA boxes, which are the initial regions of genome replication. There were 406 structural variants between the HN and CM01 strains, and there were 1,114 structural variants between the HN and ATCC strains.


April 21, 2020  |  

Long-Read Sequencing Emerging in Medical Genetics

The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches pose a limitation for identification of structural variants, sequencing repetitive regions, phasing alleles and distinguishing highly homologous genomic regions. These limitations may significantly contribute to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, like whole exome or even genome sequencing. Now, the emerging long-read sequencing (LRS) technologies may offer improvements in the characterization of genetic variation and regions that are difficult to assess with the currently prevailing NGS approaches. LRS has so far mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies can soon enable true whole genome sequencing routinely. Ultimately, this could allow the de novo assembly of individual whole genomes used as a generic test for genetic disorders. In this article, we summarize the current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advancements in medical genetics.


April 21, 2020  |  

Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1.

Echinococcus tapeworms cause a severe helminthic zoonosis called echinococcosis. The genus comprises various species and genotypes, of which E. granulosus (sensu stricto) represents a significant global public health and socioeconomic burden. Mitochondrial (mt) genomes have provided useful genetic markers to explore the nature and extent of genetic diversity within Echinococcus and have underpinned phylogenetic and population structure analyses of this genus. Our recent work indicated a sequence gap (>?1 kb) in the mt genomes of E. granulosus genotype G1, which could not be determined by PCR-based Sanger sequencing. The aim of the present study was to define the complete mt genome, irrespective of structural complexities, using a long-read sequencing method.We extracted high molecular weight genomic DNA from protoscoleces from a single cyst of E. granulosus genotype G1 from a sheep from Australia using a conventional method and sequenced it using PacBio Sequel (long-read) technology, complemented by BGISEQ-500 short-read sequencing. Sequence data obtained were assembled using a recently-developed workflow.We assembled a complete mt genome sequence of 17,675 bp, which is >?4 kb larger than the complete mt genomes known for E. granulosus genotype G1. This assembly includes a previously-elusive tandem repeat region, which is 4417 bp long and consists of ten near-identical 441-445 bp repeat units, each harbouring a 184 bp non-coding region and adjacent regions. We also identified a short non-coding region of 183 bp, which includes an inverted repeat.We report what we consider to be the first complete mt genome of E. granulosus genotype G1 and characterise all repeat regions in this genome. The numbers, sizes, sequences and functions of tandem repeat regions remain to be studied in different isolates of genotype G1 and in other genotypes and species. The discovery of such ‘new’ repeat elements in the mt genome of genotype G1 by PacBio sequencing raises a question about the completeness of some published genomes of taeniid cestodes assembled from conventional or short-read sequence datasets. This study shows that long-read sequencing readily overcomes the challenges of assembling repeat elements to achieve improved genomes.


April 21, 2020  |  

Chromosome rearrangements shape the diversification of secondary metabolism in the cyclosporin producing fungus Tolypocladium inflatum.

Genes involved in production of secondary metabolites (SMs) in fungi are exceptionally diverse. Even strains of the same species may exhibit differences in metabolite production, a finding that has important implications for drug discovery. Unlike in other eukaryotes, genes producing SMs are often clustered and co-expressed in fungal genomes, but the genetic mechanisms involved in the creation and maintenance of these secondary metabolite biosynthetic gene clusters (SMBGCs) remains poorly understood.In order to address the role of genome architecture and chromosome scale structural variation in generating diversity of SMBGCs, we generated chromosome scale assemblies of six geographically diverse isolates of the insect pathogenic fungus Tolypocladium inflatum, producer of the multi-billion dollar lifesaving immunosuppressant drug cyclosporin, and utilized a Hi-C chromosome conformation capture approach to address the role of genome architecture and structural variation in generating intraspecific diversity in SMBGCs. Our results demonstrate that the exchange of DNA between heterologous chromosomes plays an important role in generating novelty in SMBGCs in fungi. In particular, we demonstrate movement of a polyketide synthase (PKS) and several adjacent genes by translocation to a new chromosome and genomic context, potentially generating a novel PKS cluster. We also provide evidence for inter-chromosomal recombination between nonribosomal peptide synthetases located within subtelomeres and uncover a polymorphic cluster present in only two strains that is closely related to the cluster responsible for biosynthesis of the mycotoxin aflatoxin (AF), a highly carcinogenic compound that is a major public health concern worldwide. In contrast, the cyclosporin cluster, located internally on chromosomes, was conserved across strains, suggesting selective maintenance of this important virulence factor for infection of insects.This research places the evolution of SMBGCs within the context of whole genome evolution and suggests a role for recombination between chromosomes in generating novel SMBGCs in the medicinal fungus Tolypocladium inflatum.


April 21, 2020  |  

One reference genome is not enough

A recent study on human structural variation indicates insufficiencies and errors in the human reference genome, GRCh38, and argues for the construction of a human pan-genome.


April 21, 2020  |  

High-coverage, long-read sequencing of Han Chinese trio reference samples.

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son and each parent were sequenced to an average coverage of 60 and 30, respectively, with N50 subread lengths between 16 and 18?kb. Raw reads and reads aligned to both the GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38 aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.


April 21, 2020  |  

The tech for the next decade: promises and challenges in genome biology.

The 19th Annual Advances in Genome Biology and Technology (AGBT) meeting came back to Marco Island, Florida, and was held in the renovated venue from 27 February to 2 March 2019. The meeting showed a variety of new technology, both in wet lab and in bioinformatics. This year’s themes included single-cell technology and applications, spatially resolved gene expression measurements, new sequencing platforms, genome assembly and variation, and long and linked reads.


April 21, 2020  |  

Tandem-genotypes: robust detection of tandem repeat expansions from long DNA reads.

Tandemly repeated DNA is highly mutable and causes at least 31 diseases, but it is hard to detect pathogenic repeat expansions genome-wide. Here, we report robust detection of human repeat expansions from careful alignments of long but error-prone (PacBio and nanopore) reads to a reference genome. Our method is robust to systematic sequencing errors, inexact repeats with fuzzy boundaries, and low sequencing coverage. By comparing to healthy controls, we prioritize pathogenic expansions within the top 10 out of 700,000 tandem repeats in whole genome sequencing data. This may help to elucidate the many genetic diseases whose causes remain unknown.


October 23, 2019  |  

CRISPR/Cas9-mediated scanning for regulatory elements required for HPRT1 expression via thousands of large, programmed genomic deletions.

The extent to which non-coding mutations contribute to Mendelian disease is a major unknown in human genetics. Relatedly, the vast majority of candidate regulatory elements have yet to be functionally validated. Here, we describe a CRISPR-based system that uses pairs of guide RNAs (gRNAs) to program thousands of kilobase-scale deletions that deeply scan across a targeted region in a tiling fashion (“ScanDel”). We applied ScanDel to HPRT1, the housekeeping gene underlying Lesch-Nyhan syndrome, an X-linked recessive disorder. Altogether, we programmed 4,342 overlapping 1 and 2 kb deletions that tiled 206 kb centered on HPRT1 (including 87 kb upstream and 79 kb downstream) with median 27-fold redundancy per base. We functionally assayed programmed deletions in parallel by selecting for loss of HPRT function with 6-thioguanine. As expected, sequencing gRNA pairs before and after selection confirmed that all HPRT1 exons are needed. However, HPRT1 function was robust to deletion of any intergenic or deeply intronic non-coding region, indicating that proximal regulatory sequences are sufficient for HPRT1 expression. Although our screen did identify the disruption of exon-proximal non-coding sequences (e.g., the promoter) as functionally consequential, long-read sequencing revealed that this signal was driven by rare, imprecise deletions that extended into exons. Our results suggest that no singular distal regulatory element is required for HPRT1 expression and that distal mutations are unlikely to contribute substantially to Lesch-Nyhan syndrome burden. Further application of ScanDel could shed light on the role of regulatory mutations in disease at other loci while also facilitating a deeper understanding of endogenous gene regulation. Copyright © 2017 American Society of Human Genetics. All rights reserved.


October 23, 2019  |  

A knowledge-based molecular screen uncovers a broad-spectrum OsSWEET14 resistance allele to bacterial blight from wild rice.

Transcription activator-like (TAL) effectors are type III-delivered transcription factors that enhance the virulence of plant pathogenic Xanthomonas species through the activation of host susceptibility (S) genes. TAL effectors recognize their DNA target(s) via a partially degenerate code, whereby modular repeats in the TAL effector bind to nucleotide sequences in the host promoter. Although this knowledge has greatly facilitated our power to identify new S genes, it can also be easily used to screen plant genomes for variations in TAL effector target sequences and to predict for loss-of-function gene candidates in silico. In a proof-of-principle experiment, we screened a germplasm of 169 rice accessions for polymorphism in the promoter of the major bacterial blight susceptibility S gene OsSWEET14, which encodes a sugar transporter targeted by numerous strains of Xanthomonas oryzae pv. oryzae. We identified a single allele with a deletion of 18 bp overlapping with the binding sites targeted by several TAL effectors known to activate the gene. We show that this allele, which we call xa41(t), confers resistance against half of the tested Xoo strains, representative of various geographic origins and genetic lineages, highlighting the selective pressure on the pathogen to accommodate OsSWEET14 polymorphism, and reciprocally the apparent limited possibilities for the host to create variability at this particular S gene. Analysis of xa41(t) conservation across the Oryza genus enabled us to hypothesize scenarios as to its evolutionary history, prior to and during domestication. Our findings demonstrate that resistance through TAL effector-dependent loss of S-gene expression can be greatly fostered upon knowledge-based molecular screening of a large collection of host plants.© 2015 The Authors The Plant Journal © 2015 John Wiley & Sons Ltd.


September 22, 2019  |  

Advantages of genome sequencing by long-read sequencer using SMRT technology in medical area.

PacBio RS II is the first commercialized third-generation DNA sequencer able to sequence a single molecule DNA in real-time without amplification. PacBio RS II’s sequencing technology is novel and unique, enabling the direct observation of DNA synthesis by DNA polymerase. PacBio RS II confers four major advantages compared to other sequencing technologies: long read lengths, high consensus accuracy, a low degree of bias, and simultaneous capability of epigenetic characterization. These advantages surmount the obstacle of sequencing genomic regions such as high/low G+C, tandem repeat, and interspersed repeat regions. Moreover, PacBio RS II is ideal for whole genome sequencing, targeted sequencing, complex population analysis, RNA sequencing, and epigenetics characterization. With PacBio RS II, we have sequenced and analyzed the genomes of many species, from viruses to humans. Herein, we summarize and review some of our key genome sequencing projects, including full-length viral sequencing, complete bacterial genome and almost-complete plant genome assemblies, and long amplicon sequencing of a disease-associated gene region. We believe that PacBio RS II is not only an effective tool for use in the basic biological sciences but also in the medical/clinical setting.


September 22, 2019  |  

Differential increases of specific FMR1 mRNA isoforms in premutation carriers.

Over 40% of male and ~16% of female carriers of a premutation FMR1 allele (55-200 CGG repeats) will develop fragile X-associated tremor/ataxia syndrome, an adult onset neurodegenerative disorder, while about 20% of female carriers will develop fragile X-associated primary ovarian insufficiency. Marked elevation in FMR1 mRNA transcript levels has been observed with premutation alleles, and RNA toxicity due to increased mRNA levels is the leading molecular mechanism proposed for these disorders. However, although the FMR1 gene undergoes alternative splicing, it is unknown whether all or only some of the isoforms are overexpressed in premutation carriers and which isoforms may contribute to the premutation pathology.To address this question, we have applied a long-read sequencing approach using single-molecule real-time (SMRT) sequencing and qRT-PCR. Our SMRT sequencing analysis performed on peripheral blood mononuclear cells, fibroblasts and brain tissue samples derived from premutation carriers and controls revealed the existence of 16 isoforms of 24 predicted variants. Although the relative abundance of all mRNA isoforms was significantly increased in the premutation group, as expected based on the bulk increase in mRNA levels, there was a disproportionate (fourfold to sixfold) increase, relative to the overall increase in mRNA, in the abundance of isoforms spliced at both exons 12 and 14, specifically Iso10 and Iso10b, containing the complete exon 15 and differing only in splicing in exon 17.These findings suggest that RNA toxicity may arise from a relative increase of all FMR1 mRNA isoforms. Interestingly, the Iso10 and Iso10b mRNA isoforms, lacking the C-terminal functional sites for fragile X mental retardation protein function, are the most increased in premutation carriers relative to normal, suggesting a functional relevance in the pathology of FMR1-associated disorders. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.