July 19, 2019  |  

Full-length mitochondrial-DNA sequencing on the PacBio RSII.

Conventional mitochondrial-DNA (MT DNA) sequencing approaches use Sanger sequencing of 20-40 partially overlapping PCR fragments per individual, which is a time- and resource-consuming process. We have developed a high-throughput, accurate, fast, and cost-effective human MT DNA sequencing approach. In this setup we first generate long-range PCR products for two partially overlapping 7.7 and 9.2 kb MT DNA-specific amplicons, add sample-specific barcodes, and sequence these on the PacBio RSII system to obtain full-length MT DNA sequences for genotyping/haplotyping purposes.


July 19, 2019  |  

Single-molecule sequencing revealing the presence of distinct JC polyomavirus populations in patients with progressive multifocal leukoencephalopathy.

Progressive multifocal leukoencephalopathy (PML) is a fatal disease caused by reactivation of JC polyomavirus (JCPyV) in immunosuppressed individuals and lytic infection by neurotropic JCPyV in glial cells. The exact content of neurotropic mutations within individual JCPyV strains has not been studied to our knowledge. We exploited the capacity of single-molecule real-time sequencing technology to determine the sequence of complete JCPyV genomes in single reads. The method was used to precisely characterize individual neurotropic JCPyV strains of 3 patients with PML without the bias caused by assembly of short sequence reads. In the cerebrospinal fluid sample of a 73-year-old woman with rapid PML onset, 3 distinct JCPyV populations could be identified. All viral populations were characterized by rearrangements within the noncoding regulatory region (NCCR) and 1 point mutation, S267L in the VP1 gene, suggestive of neurotropic strains. One patient with PML had a single neurotropic strain with rearranged NCCR, and 1 patient had a single strain with small NCCR alterations. We report here, for the first time, full characterization of individual neurotropic JCPyV strains in the cerebrospinal fluid of patients with PML. It remains to be established whether PML pathogenesis is driven by one or several neurotropic strains in an individual.


July 19, 2019  |  

Gorilla MHC class I gene and sequence variation in a comparative context.

Comparisons of MHC gene content and diversity among closely related species can provide insights into the evolutionary mechanisms shaping immune system variation. After chimpanzees and bonobos, gorillas are humans’ closest living relatives; but in contrast, relatively little is known about the structure and variation of gorilla MHC class I genes (Gogo). Here, we combined long-range amplifications and long-read sequencing technology to analyze full-length MHC class I genes in 35 gorillas. We obtained 50 full-length genomic sequences corresponding to 15 Gogo-A alleles, 4 Gogo-Oko alleles, 21 Gogo-B alleles, and 10 Gogo-C alleles, including 19 novel coding region sequences. We identified two previously undetected MHC class I genes related to Gogo-A and Gogo-B, respectively, thereby illustrating the potential of this approach for efficient and highly accurate MHC genotyping. Consistent with their phylogenetic position within the hominid family, individual gorilla MHC haplotypes share characteristics with humans and chimpanzees as well as orangutans, suggesting a complex history of the MHC class I genes in humans and the great apes. However, the overall MHC class I diversity appears to be low, further supporting the hypothesis that gorillas might have experienced a reduction of their MHC repertoire.


July 19, 2019  |  

Detecting PKD1 variants in polycystic kidney disease patients by single-molecule long-read sequencing.

A genetic diagnosis of autosomal-dominant polycystic kidney disease (ADPKD) is challenging due to allelic heterogeneity, high GC content, and homology of the PKD1 gene with six pseudogenes. Short-read next-generation sequencing approaches, such as whole-genome sequencing and whole-exome sequencing, often fail at reliably characterizing complex regions such as PKD1. However, long-read single-molecule sequencing has been shown to be an alternative strategy that could overcome PKD1 complexities and discriminate between homologous regions of PKD1 and its pseudogenes. In this study, we present the increased power of resolution for complex regions using long-read sequencing to characterize a cohort of 19 patients with ADPKD. Our approach provided high sensitivity in identifying PKD1 pathogenic variants, diagnosing 94.7% of the patients. We show that reliable screening of ADPKD patients in a single test without interference of PKD1 homologous sequences, commonly introduced by residual amplification of PKD1 pseudogenes, by direct long-read sequencing is now possible. This strategy can be implemented in diagnostics and is highly suitable to sequence and resolve complex genomic regions that are of clinical relevance. © 2017 The Authors. Human Mutation published by Wiley Periodicals, Inc.


July 19, 2019  |  

Detecting AGG interruptions in male and female FMR1 premutation carriers by single-molecule sequencing.

The FMR1 gene contains an unstable CGG repeat in its 5′ untranslated region. Premutation alleles range between 55 and 200 repeat units and confer a risk for developing fragile X-associated tremor/ataxia syndrome or fragile X-associated primary ovarian insufficiency. Furthermore, the premutation allele often expands to a full mutation during female germline transmission giving rise to the fragile X syndrome. The risk for a premutation to expand depends mainly on the number of CGG units and the presence of AGG interruptions in the CGG repeat. Unfortunately, the detection of AGG interruptions is hampered by technical difficulties. Here, we demonstrate that single-molecule sequencing enables the determination of not only the repeat size, but also the complete repeat sequence including AGG interruptions in male and female alleles with repeats ranging from 45 to 100 CGG units. We envision this method will facilitate research and diagnostic analysis of the FMR1 repeat expansion. © 2016 WILEY PERIODICALS, INC.
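As a rough illustration of what reading the complete repeat sequence makes possible, the sketch below splits a repeat tract into triplets and reports the total number of units and the positions of any AGG interruptions. The function name `parse_cgg_repeat` and the example tract are hypothetical, and the sketch assumes the tract has already been trimmed to the repeat itself and that AGG units are counted toward the repeat size; it is not the analysis used in the study.

```python
def parse_cgg_repeat(tract):
    """Return (total repeat units, 1-based positions of AGG interruptions).

    Assumes `tract` contains only the repeat region, in frame,
    and is built from CGG and AGG triplets only.
    """
    tract = tract.upper()
    if len(tract) % 3 != 0:
        raise ValueError("repeat tract length must be a multiple of 3")

    # Cut the tract into consecutive, non-overlapping triplets.
    triplets = [tract[i:i + 3] for i in range(0, len(tract), 3)]

    # Anything other than CGG or AGG is outside the scope of this sketch.
    unexpected = sorted(set(triplets) - {"CGG", "AGG"})
    if unexpected:
        raise ValueError(f"unexpected triplets in tract: {unexpected}")

    agg_positions = [i + 1 for i, t in enumerate(triplets) if t == "AGG"]
    return len(triplets), agg_positions


# Hypothetical allele: 9 CGG, AGG, 9 CGG, AGG, 40 CGG.
example = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 40
total_units, interruptions = parse_cgg_repeat(example)
print(total_units, interruptions)  # 60 [10, 20]
```

In this made-up allele the longest uninterrupted CGG stretch is the final run of 40 units, which is the kind of detail that sizing alone cannot provide.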


July 19, 2019  |  

A new method for sequencing the hypervariable Plasmodium falciparum gene var2csa from clinical samples.

VAR2CSA, a member of the Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) family, mediates the binding of P. falciparum-infected erythrocytes to chondroitin sulfate A, a surface-associated molecule expressed in placental cells, and plays a central role in the pathogenesis of placental malaria. VAR2CSA is a target of naturally acquired immunity and, as such, is a leading vaccine candidate against placental malaria. This protein is very polymorphic and technically challenging to sequence. Published var2csa sequences, mostly limited to specific domains, have been generated through the sequencing of cloned PCR amplicons using capillary electrophoresis, a method that is both time consuming and costly, and that performs poorly when applied to clinical samples that are commonly polyclonal. A next-generation sequencing platform, Pacific Biosciences (PacBio), offers an alternative approach to overcome these issues. PCR primers were designed that target a 5 kb segment in the 5′ end of var2csa and the resulting amplicons were sequenced using PacBio sequencing. The primers were optimized using two laboratory strains and were validated on DNA from 43 clinical samples, extracted from dried blood spots on filter paper or from cryopreserved P. falciparum-infected erythrocytes. Sequence reads were assembled using the SMRT-analysis ConsensusTools module. Here, a PacBio sequencing-based approach is described for recovering, from clinical malaria samples, a segment encoding the majority of VAR2CSA’s extracellular region; this segment includes the totality of the first four domains in the 5′ end of var2csa (~5 kb). The feasibility of the method is demonstrated, showing a high success rate from cryopreserved samples and more limited success from dried blood spots stored at room temperature, and the genetic variation of the var2csa locus is characterized. This method will facilitate a detailed analysis of var2csa genetic variation and can be adapted to sequence other hypervariable P. falciparum genes.


July 19, 2019  |  

Long-read sequencing across the C9orf72 ‘GGGGCC’ repeat expansion: implications for clinical use and genetic discovery efforts in human disease.

Many neurodegenerative diseases are caused by nucleotide repeat expansions, but most expansions, like the C9orf72 ‘GGGGCC’ (G4C2) repeat that causes approximately 5-7% of all amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) cases, are too long to sequence using short-read sequencing technologies. It is unclear whether long-read sequencing technologies can traverse these long, challenging repeat expansions. Here, we demonstrate that two long-read sequencing technologies, Pacific Biosciences’ (PacBio) and Oxford Nanopore Technologies’ (ONT), can sequence through disease-causing repeats cloned into plasmids, including the FTD/ALS-causing G4C2 repeat expansion. We also report the first long-read sequencing data characterizing the C9orf72 G4C2 repeat expansion at the nucleotide level in two symptomatic expansion carriers using PacBio whole-genome sequencing and a no-amplification (No-Amp) targeted approach based on CRISPR/Cas9. Both the PacBio and ONT platforms successfully sequenced through the repeat expansions in plasmids. Throughput on the MinION was a challenge for whole-genome sequencing; we were unable to attain reads covering the human C9orf72 repeat expansion using 15 flow cells. We obtained 8× coverage across the C9orf72 locus using the PacBio Sequel, accurately reporting the unexpanded allele at eight repeats, and reading through the entire expansion with 1324 repeats (7941 nucleotides). Using the No-Amp targeted approach, we attained >800× coverage and were able to identify the unexpanded allele, closely estimate expansion size, and assess nucleotide content in a single experiment. We estimate the individual’s repeat region was >99% G4C2 content, though we cannot rule out small interruptions. Our findings indicate that long-read sequencing is well suited to characterizing known repeat expansions, and for discovering new disease-causing, disease-modifying, or risk-modifying repeat expansions that have gone undetected with conventional short-read sequencing. The PacBio No-Amp targeted approach may have future potential in clinical and genetic counseling environments. Larger and deeper long-read sequencing studies in C9orf72 expansion carriers will be important to determine heterogeneity and whether the repeats are interrupted by non-G4C2 content, potentially mitigating or modifying disease course or age of onset, as interruptions are known to do in other repeat-expansion disorders. These results have broad implications across all diseases where the genetic etiology remains unclear.
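One simple way to think about a "percent G4C2 content" figure is as the share of bases in the repeat region that fall inside intact GGGGCC units. The sketch below computes that share for a sequence string. The function name `g4c2_fraction` and the toy sequence are hypothetical, and this is only one possible definition, not the estimation procedure used by the authors.

```python
def g4c2_fraction(region, unit="GGGGCC"):
    """Fraction of bases in `region` covered by intact copies of `unit`.

    str.count finds non-overlapping matches, and GGGGCC cannot overlap
    itself, so every match is a distinct 6-base unit.
    """
    region = region.upper()
    if not region:
        raise ValueError("region must not be empty")
    return region.count(unit) * len(unit) / len(region)


# Toy region: 99 intact units followed by one unit carrying a substitution.
toy_region = "GGGGCC" * 99 + "GGGGAC"
print(f"{g4c2_fraction(toy_region):.2%}")  # 99.00%
```

Because matches are found anywhere in the string, a single-base insertion shifts the frame without hiding the units that follow it, which is convenient for noisy single-molecule reads.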


July 19, 2019  |  

Reference grade characterization of polymorphisms in full-length HLA class I and II genes with short-read sequencing on the Ion PGM system and long-reads generated by Single Molecule, Real-time Sequencing on the PacBio platform

Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately-characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short read data generated by the Ion Torrent technology with long amplicon spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference grade high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA-locus specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short-reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.


July 7, 2019  |  

ThermoAlign: a genome-aware primer design tool for tiled amplicon resequencing.

Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments – thermoalignments – across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (~85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.
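The idea of a minimum amplicon tiling path can be illustrated with a small directed graph: each candidate amplicon is a node, an edge joins one amplicon to a later one that overlaps it and extends coverage, and a breadth-first search returns a path that spans the target with the fewest amplicons. The abstract does not describe how ThermoAlign builds or searches its graph, so everything below (the function name `min_tiling_path`, the half-open coordinates, the `min_overlap` parameter, and the example intervals) is an assumption made for illustration only.

```python
from collections import deque


def min_tiling_path(amplicons, target_start, target_end, min_overlap=1):
    """Fewest overlapping amplicons covering [target_start, target_end).

    `amplicons` is a list of (start, end) half-open intervals.
    Returns the chosen intervals in order, or None if no tiling exists.
    """
    n = len(amplicons)

    # Directed edge i -> j: j starts later, ends later, and overlaps i
    # by at least `min_overlap` bases.
    edges = {i: [] for i in range(n)}
    for i, (s1, e1) in enumerate(amplicons):
        for j, (s2, e2) in enumerate(amplicons):
            if i != j and s2 > s1 and e2 > e1 and e1 - s2 >= min_overlap:
                edges[i].append(j)

    # Breadth-first search from every amplicon that covers the target start.
    starts = [i for i, (s, e) in enumerate(amplicons) if s <= target_start < e]
    parent = {i: None for i in starts}
    queue = deque(starts)
    while queue:
        i = queue.popleft()
        if amplicons[i][1] >= target_end:
            # Walk back through the parents to recover the path.
            path = []
            while i is not None:
                path.append(amplicons[i])
                i = parent[i]
            return path[::-1]
        for j in edges[i]:
            if j not in parent:
                parent[j] = i
                queue.append(j)
    return None


# Hypothetical candidate amplicons over a 1,600 bp target.
candidates = [(0, 500), (400, 900), (450, 1200), (800, 1500), (1100, 1600)]
print(min_tiling_path(candidates, 0, 1600, min_overlap=50))
# [(0, 500), (450, 1200), (1100, 1600)]
```

Breadth-first search is used because every edge costs one additional amplicon, so the first path that reaches the target end is also one with the fewest amplicons. A real design tool would also have to weigh primer specificity and amplicon length limits, which this sketch ignores.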


July 7, 2019  |  

MHC class I diversity in chimpanzees and bonobos.

Major histocompatibility complex (MHC) class I genes are critically involved in the defense against intracellular pathogens. MHC diversity comparisons among samples of closely related taxa may reveal traces of past or ongoing selective processes. The bonobo and chimpanzee are the closest living evolutionary relatives of humans and last shared a common ancestor some 1 mya. However, little is known concerning MHC class I diversity in bonobos or in central chimpanzees, the most numerous and genetically diverse chimpanzee subspecies. Here, we used a long-read sequencing technology (PacBio) to sequence the classical MHC class I genes A, B, C, and A-like in 20 and 30 wild-born bonobos and chimpanzees, respectively, with a main focus on central chimpanzees to assess and compare diversity in those two species. We describe in total 21 and 42 novel coding region sequences for the two species, respectively. In addition, we found evidence for a reduced MHC class I diversity in bonobos as compared to central chimpanzees as well as to western chimpanzees and humans. The reduced bonobo MHC class I diversity may be the result of a selective process in their evolutionary past since their split from chimpanzees.


July 7, 2019  |  

Archetype JC polyomavirus prevails in a rare case of JC polyomavirus nephropathy and in stable renal transplant recipients with JC polyomavirus viruria.

JC polyomavirus (JCPyV) is reactivated in approximately 20% of renal transplant recipients and it may rarely cause JCPyV-associated nephropathy (JCPyVAN). Whereas progressive multifocal leukoencephalopathy of the brain is caused by rearranged neurotropic JCPyV, little is known about viral sequence variation in JCPyVAN due to the rarity of this condition. Using single-molecule real-time sequencing, characterization of full-length JCPyV genomes from urine and plasma of one JCPyVAN patient and twenty stable renal transplant recipients with JCPyV viruria was attempted. Sequence analysis of JCPyV strains was performed with the emphasis on the NCCR region, the major capsid protein gene VP1 and the large T antigen (LTag) gene. Exclusively archetype strains were identified in urine of the JCPyVAN patient. Full-length JCPyV sequences were not retrieved from plasma. Archetype strains were found in urine of nineteen stable renal transplant recipients, with JCPyV quasispecies detected in five samples. In a patient with minor graft dysfunction, a strain with archetype-like NCCR region was discovered. Individual point mutations were detected in both VP1 and LTag genes. Archetype JCPyV was dominant in the JCPyVAN patient and in stable renal transplant recipients. Archetype rather than rearranged JCPyV seems to drive the pathogenesis of JCPyVAN.


July 7, 2019  |  

Complete genome sequences of the Neethling-like lumpy skin disease virus strains obtained directly from three commercial live attenuated vaccines.

Lumpy skin disease virus (LSDV) causes an economically important disease in cattle. Here, we report the complete genome sequences of three LSDV strains obtained directly from the live attenuated vaccines: Lumpyvax (MSD Animal Health), Herbivac LS (Deltamune) and Lumpy Skin Disease Vaccine (Onderstepoort Biological Products). Copyright © 2016 Mathijs et al.


July 7, 2019  |  

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (comprised of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.

