August 19, 2021  |  Human genetics research

Application Brochure: Scalable human whole genome HiFi sequencing for rare and inherited disease research

PacBio highly accurate long reads – HiFi reads – offer a single-platform solution for rare and inherited disease research, elucidating suspected genetic causes of disease in up to ~50% of cases that have not previously been explained using short-read exome or whole genome sequencing. PacBio offers an efficient workflow, developed in collaboration with Children’s Mercy Kansas City, which provides a scalable solution for sequencing 100s to 1000s of whole human genomes per year on the Sequel II and Sequel IIe Systems.

August 19, 2021  |  Infectious disease research

Application Brief: Variant detection using whole genome sequencing with HiFi reads – Best Practices

With highly accurate long reads (HiFi reads) from the Sequel II or IIe Systems you can comprehensively detect variants in 100s to 1000s of genomes in a year. HiFi reads provide high precision and recall for single nucleotide variants (SNVs), indels, structural variants (SVs), and copy number variants (CNVs), including in difficult-to-map repetitive regions.

June 1, 2021  |  

Joint calling and PacBio SMRT Sequencing for indel and structural variant detection in populations

Fast and effective variant calling algorithms have been crucial to the successful application of DNA sequencing in human genetics. In particular, joint calling – in which reads from multiple individuals are pooled to increase power for shared variants – is an important tool for population surveys of variation. Joint calling was applied by the 1000 Genomes Project to identify variants across many individuals each sequenced to low coverage (about 5-fold). This approach successfully found common small variants, but broadly missed structural variants and large indels for which short-read sequencing has limited sensitivity. To support use of large variants in rare disease and common trait association studies, it is necessary to perform population-scale surveys with a technology effective at detecting indels and structural variants, such as PacBio SMRT Sequencing. For these studies, it is important to have a joint calling workflow that works with PacBio reads. We have developed pbsv, an indel and structural variant caller for PacBio reads, that provides a two-step joint calling workflow similar to that used to build the ExAC database. The first stage, discovery, is performed separately for each sample and consolidates whole genome alignments into a sparse representation of potentially variant loci. The second stage, calling, is performed on all samples together and considers only the signatures identified in the discovery stage. We applied the pbsv joint calling workflow to PacBio reads from twenty human genomes, with coverage ranging from 5-fold to 80-fold per sample for a total of 460-fold. The analysis required only 102 CPU hours, and identified over 800,000 indels and structural variants, including hundreds of inversions and translocations, many times more than discovered with short-read sequencing. The workflow is scalable to thousands of samples. The ongoing application of this workflow to thousands of samples will provide insight into the evolution and functional importance of large variants in human evolution and disease.
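The two-stage structure described above (per-sample discovery, then one joint call over all samples) can be sketched as a small Python helper that only assembles the command lines. This is a minimal sketch, not the authors' pipeline: the helper name, the file names, and the `.svsig.gz` naming rule are assumptions for illustration, and the positional argument order for `pbsv discover` and `pbsv call` should be checked against the installed pbsv version.

```python
def build_joint_calling_commands(reference, sample_bams, output_vcf):
    """Assemble pbsv command lines for a two-stage joint-calling run.

    Stage 1 (discover) is built once per sample; stage 2 (call) is built once
    over all signature files. Commands are returned as lists, not executed.
    The helper and its naming rule are hypothetical, for illustration only.
    """
    discover_cmds = []
    svsig_files = []
    for bam in sample_bams:
        # Derive a per-sample signature file name from the BAM name (assumed convention).
        stem = bam[: -len(".bam")] if bam.endswith(".bam") else bam
        svsig = stem + ".svsig.gz"
        discover_cmds.append(["pbsv", "discover", bam, svsig])
        svsig_files.append(svsig)
    # The joint call sees only the sparse signatures, never the full alignments.
    call_cmd = ["pbsv", "call", reference, *svsig_files, output_vcf]
    return discover_cmds, call_cmd


# Hypothetical inputs: two aligned samples and one reference FASTA.
discover_cmds, call_cmd = build_joint_calling_commands(
    "ref.fa", ["sampleA.bam", "sampleB.bam"], "cohort.vcf"
)
for cmd in discover_cmds + [call_cmd]:
    print(" ".join(cmd))
```

Because discovery is independent per sample, those commands could run in parallel, and adding a sample to a cohort would only require one new discovery run plus a repeat of the joint call.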

June 1, 2021  |  

Comprehensive variant detection in a human genome with highly accurate long reads

Introduction: Long-read sequencing has been applied successfully to assemble genomes and detect structural variants. However, due to high raw-read error rates (10-15%), it has remained difficult to call small variants from long reads. Recent improvements in library preparation and sequencing chemistry have increased length, accuracy, and throughput of PacBio circular consensus sequencing (CCS) reads, resulting in 15-20kb reads with average read quality above 99%. Materials and Methods: We sequenced a library from human reference sample HG002 to 18-fold coverage on the PacBio Sequel II with two SMRT Cells 8M. The CCS algorithm was used to generate highly accurate (average 99.9%) 12.9kb reads, which were mapped to the hg19 reference with pbmm2. We detected small variants using Google DeepVariant with a model trained for CCS and phased the variants using WhatsHap. Structural variants were detected with pbsv. Variant calls were evaluated against Genome in a Bottle (GIAB) benchmarks. Results: With these reads, DeepVariant achieves SNP and Indel F1 scores of 99.70% and 96.59% against the GIAB truth set, and pbsv achieves 97.72% recall on structural variants longer than 50bp. Using WhatsHap, small variants were phased into haplotype blocks with 145kb N50. The improved mappability of long reads allows us to align to and detect variants in medically relevant genes such as CYP2D6 and PMS2 that have proven “difficult-to-map” with short reads. Conclusions: These highly accurate long reads combine the mappability and ability to detect structural variants of long reads with the accuracy and ability to detect small variants of short reads.
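Two of the metrics quoted above, the F1 score and the N50 of haplotype blocks, have simple definitions that a short Python sketch can make concrete. The abstract does not report precision and recall separately, nor the individual block lengths, so every input value below is a made-up toy number chosen only to show the calculation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def n50(lengths):
    """Length L such that items of length >= L contain at least half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy inputs (assumptions, not data from the study).
toy_f1 = f1_score(precision=0.5, recall=1.0)
toy_n50 = n50([10, 20, 30, 40])
print(round(toy_f1, 4), toy_n50)
```

With the toy block lengths 10, 20, 30 and 40, the two longest blocks (40 + 30 = 70) already cover half of the total of 100, so the N50 is 30. The reported 145 kb N50 is read the same way: at least half of the phased bases lie in blocks of 145 kb or longer.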

July 19, 2019  |  

An incomplete understanding of human genetic variation.

Deciphering the genetic basis of human disease requires a comprehensive knowledge of genetic variants irrespective of their class or frequency. Although an impressive number of human genetic variants have been catalogued, a large fraction of the genetic difference that distinguishes two human genomes is still not understood at the base-pair level. This is because the emphasis has been on single-nucleotide variation as opposed to less tractable and more complex genetic variants, including indels and structural variants. The latter, we propose, will have a large impact on human phenotypes but require a more systematic assessment of genomes at deeper coverage and alternate sequencing and mapping technologies. Copyright © 2016 by the Genetics Society of America.

July 7, 2019  |  

BreakSeek: a breakpoint-based algorithm for full spectral range INDEL detection.

Although recently developed algorithms have integrated multiple signals to improve sensitivity for insertion and deletion (INDEL) detection, they are far from perfect and remain limited in detecting the full size range of INDELs. Here we present BreakSeek, a novel breakpoint-based algorithm that can detect both homozygous and heterozygous INDELs without bias and efficiently, ranging from several base pairs to over thousands of base pairs, with accurate breakpoint and heterozygosity rate estimations. Comprehensive evaluations on both simulated and real datasets revealed that BreakSeek outperformed other existing methods on both sensitivity and specificity in detecting both small and large INDELs, and uncovered a significant number of novel INDELs that were previously missed. In addition, by incorporating sophisticated statistical models, we investigated and demonstrated, for the first time, the importance of handling false and conflicting signals for multi-signal integrated methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

July 7, 2019  |  

The challenges and importance of structural variation detection in livestock.

Recent studies in humans and other model organisms have demonstrated that structural variants (SVs) comprise a substantial proportion of variation among individuals of each species. Many of these variants have been linked to debilitating diseases in humans, thereby cementing the importance of refining methods for their detection. Despite progress in the field, reliable detection of SVs remains a problem even for human subjects. Many of the underlying problems that make SVs difficult to detect in humans are amplified in livestock species, whose lower quality genome assemblies and incomplete gene annotation can often give rise to false positive SV discoveries. Regardless of the challenges, SV detection is just as important for livestock researchers as it is for human researchers, given that several production traits and diseases have been linked to copy number variations (CNVs) in cattle, sheep, and pigs. Already, there is evidence that many beneficial SVs have been artificially selected in livestock, such as a duplication of the agouti signaling protein gene that causes white coat color in sheep. In this review, we list current SV and CNV discoveries in livestock and discuss the problems that hinder routine discovery and tracking of these polymorphisms. We also discuss the impacts of selective breeding on CNV and SV frequencies and mention how SV genotyping could be used in the future to improve genetic selection.

July 7, 2019  |  

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

The first Atlantic cod (Gadus morhua) genome assembly, published in 2011, was one of the early genome assemblies based exclusively on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are fragmented, with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

July 7, 2019  |  

The complete chloroplast genome sequence of tung tree (Vernicia fordii): Organization and phylogenetic relationships with other angiosperms.

Tung tree (Vernicia fordii) is an economically important tree widely cultivated for industrial oil production in China. To better understand the molecular basis of tung tree chloroplasts, we sequenced and characterized its chloroplast genome using the PacBio RS II sequencing platform. The chloroplast genome is 161,528 bp in length and is composed of one pair of inverted repeats (IRs) of 26,819 bp each, separated by one small single copy region (SSC; 18,758 bp) and one large single copy region (LSC; 89,132 bp). The genome contains 114 genes, coding for 81 proteins, four ribosomal RNAs and 29 transfer RNAs. An expansion with integration of an additional rps19 gene in the IR regions was identified. Compared to the chloroplast genome of Jatropha curcas, a species from the same family, the tung tree chloroplast genome differs by 85 single nucleotide polymorphisms (SNPs) and 82 indels. Phylogenetic analysis suggests that V. fordii is a sister species to J. curcas within the Eurosids I. The nucleotide sequence provides vital molecular information for understanding the biology of this important oil tree.
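The region sizes and gene counts reported above are internally consistent, which a few lines of Python can confirm. This is only an arithmetic check on the numbers quoted in the abstract; the variable names are illustrative.

```python
# Region lengths in base pairs, as quoted in the abstract.
LSC_BP = 89_132   # large single copy region
SSC_BP = 18_758   # small single copy region
IR_BP = 26_819    # one inverted repeat; the genome carries two copies
REPORTED_TOTAL_BP = 161_528

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
computed_total_bp = LSC_BP + SSC_BP + 2 * IR_BP
print(computed_total_bp, computed_total_bp == REPORTED_TOTAL_BP)  # 161528 True

# Gene counts by product type, as quoted in the abstract.
gene_counts = {"protein_coding": 81, "rRNA": 4, "tRNA": 29}
total_genes = sum(gene_counts.values())
print(total_genes)  # 114
```

The two inverted repeats together account for 53,638 bp, roughly a third of the genome.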

July 7, 2019  |  

Zinc resistance within swine-associated methicillin-resistant Staphylococcus aureus (MRSA) isolates in the USA is associated with MLST lineage.

Zinc resistance in livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) sequence type (ST) 398 is primarily mediated by the czrC gene co-located with the mecA gene, encoding methicillin resistance, within the type V SCCmec element. Because czrC and mecA are located within the same mobile genetic element, it has been suggested that the use of in-feed zinc as an antidiarrheal agent has the potential to contribute to the emergence and spread of MRSA in swine through increased selection pressure to maintain the SCCmec element in isolates obtained from pigs. In this study we report the prevalence of the czrC gene and phenotypic zinc resistance in US swine-associated LA-MRSA ST5 isolates, MRSA ST5 isolates from humans with no swine contact, and US swine-associated LA-MRSA ST398 isolates. We demonstrate that the prevalence of zinc resistance in US swine-associated LA-MRSA ST5 isolates was significantly lower than the prevalence of zinc resistance in MRSA ST5 isolates from humans with no swine contact, swine-associated LA-MRSA ST398 isolates, and previous reports describing zinc resistance in other LA-MRSA ST398 isolates. Collectively, our data suggest that selection pressure associated with zinc supplementation in feed is unlikely to have played a significant role in the emergence of LA-MRSA ST5 in the US swine population. Additionally, our data indicate that zinc resistance is associated with MLST lineage, suggesting a potential link between genetic lineage and carriage of resistance determinants. Importance: Our data suggest that coselection thought to be associated with the use of zinc in feed as an antimicrobial agent is not playing a role in the emergence of livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) ST5 in the US swine population. Additionally, our data indicate that zinc resistance is more associated with multilocus sequence type (MLST) lineage, suggesting a potential link between genetic lineage and carriage of resistance markers. This information is important to public health professionals, veterinarians, producers, and consumers. Copyright © 2017 American Society for Microbiology.

July 7, 2019  |  

Population genomics of picophytoplankton unveils novel chromosome hypervariability.

Tiny photosynthetic microorganisms that form the picoplankton (between 0.3 and 3 µm in diameter) are at the base of the food web in many marine ecosystems, and their adaptability to environmental change hinges on standing genetic variation. Although the genomic and phenotypic diversity of the bacterial component of the oceans has been intensively studied, little is known about the genomic and phenotypic diversity within each of the diverse eukaryotic species present. We report the level of genomic diversity in a natural population of Ostreococcus tauri (Chlorophyta, Mamiellophyceae), the smallest photosynthetic eukaryote. Contrary to the expectations of clonal evolution or cryptic species, the spectrum of genomic polymorphism observed suggests a large panmictic population (an effective population size of 1.2 × 10⁷) with pervasive evidence of sexual reproduction. De novo assemblies of low-coverage chromosomes reveal two large candidate mating-type loci with suppressed recombination, whose origin may pre-date the speciation events in the class Mamiellophyceae. This high genetic diversity is associated with large phenotypic differences between strains. Strikingly, resistance of isolates to large double-stranded DNA viruses, which abound in their natural environment, is positively correlated with the size of a single hypervariable chromosome, which contains 44 to 156 kb of strain-specific sequences. Our findings highlight the role of viruses in shaping genome diversity in marine picoeukaryotes.
