Structural variant calling combining Illumina and low-coverage PacBio
Detection of large genomic variation (structural variants) has proven challenging using short-read methods. Long-read approaches, which can span these large events, promise to dramatically expand the ability to accurately call structural variants. Although sequencing with Pacific Biosciences (PacBio) long-read technology has become increasingly high throughput, generating high coverage with the technology can still be limiting, and investigators often want to know what PacBio coverage is adequate to call structural variants. Here, we present a method to identify a substantially higher fraction of structural variants in the human genome using low-coverage…
Introduction: Long-read sequencing has revealed more than 20,000 structural variants spanning over 12 Mb in a healthy human genome. Short-read sequencing fails to detect most structural variants but has remained the more effective approach for small variants, due to 10-15% error rates in long reads, and copy-number variants (CNVs), due to lack of effective long-read variant callers. The development of PacBio highly accurate long reads (HiFi reads) with read lengths of 10-25 kb and quality >99% presents the opportunity to capture all classes of variation with one approach. Methods: We sequence the Genome in a Bottle benchmark sample HG002 and an…
Explore the types of human genomic variation and the diseases known to be caused by structural variants.
Interested to learn about pangenomes? Explore this guide to learn how they provide a more complete picture of the core genes of a given species and how that can provide better biological understanding.
Ellen Paxinos, a scientist at PacBio, shares her AGBT poster on work done in collaboration with reference lab Monogram Biosciences using Single Molecule, Real-Time (SMRT) sequencing to detect minor species and variants in HCV. Using two genotypes mixed together, the team was able to detect variants down to 1% and to identify both viral haplotypes from the data. Paxinos says the study is a model for looking at genomic variation in chronic viral infection.
In this Labroots webinar, Meredith Ashby, Director of Microbial Genomics at PacBio, describes the utility of highly accurate long-read sequencing, known as HiFi sequencing, to understand the SARS-CoV-2 viral genome. HiFi sequencing enables mutation phasing and rare variant detection to understand viral stability and mutation rates, as well as providing insights into viral population structure for monitoring viral evolution. Ashby also shares how HiFi sequencing can be used to explore the host immune response to COVID-19, specifically by providing full-length sequencing of the B cell repertoire, IGH locus and HLA genes. Access additional COVID-19 Sequencing Tools and Resources at…
Domestication of clonally propagated crops such as pineapple from South America was hypothesized to be a ‘one-step operation’. We sequenced the genome of Ananas comosus var. bracteatus CB5 and assembled 513 Mb into 25 chromosomes with 29,412 genes. Comparison of the genomes of CB5, F153 and MD2 elucidated the genomic basis of fiber production, color formation, sugar accumulation and fruit maturation. We also resequenced 89 Ananas genomes. Cultivars ‘Smooth Cayenne’ and ‘Queen’ exhibited ancient and recent admixture, while ‘Singapore Spanish’ supported a one-step operation of domestication. We identified 25 selective sweeps, including a strong sweep containing a pair of tandemly duplicated…
The evolution of Bordetella pertussis from a common ancestor similar to Bordetella bronchiseptica has occurred through large-scale gene loss, inactivation and rearrangements, largely driven by the spread of insertion sequence element repeats throughout the genome. B. pertussis is widely considered to be monomorphic, and recent evolution of the B. pertussis genome appears to be driven, at least in part, by vaccine-based selection. Given the recent global resurgence of whooping cough despite the widespread use of vaccination, a more thorough understanding of B. pertussis genomics could be highly informative. In this chapter we discuss the evolution of B. pertussis, including how…
Dysregulation of alpha-synuclein expression has been implicated in the pathogenesis of synucleinopathies, in particular Parkinson's Disease (PD) and Dementia with Lewy bodies (DLB). Previous studies have shown that the alternatively spliced isoforms of the SNCA gene are differentially expressed in different parts of the brain for PD and DLB patients. Similarly, SNCA isoforms with skipped exons can have a functional impact on the protein domains. The large intronic region of the SNCA gene was also shown to harbor structural variants that affect transcriptional levels. Here we apply the first study of using long read sequencing with targeted capture of both…
Brassica napus (AACC, 2n = 38) is an important oilseed crop grown worldwide. However, little is known about the population evolution of this species, the genomic difference between its major genetic groups, such as European and Asian rapeseed, and the impacts of historical large-scale introgression events on this young tetraploid. In this study, we reported the de novo assembly of the genome sequences of an Asian rapeseed (B. napus), Ningyou 7, and its four progenitors and compared these genomes with other available genomic data from diverse European and Asian cultivars. Our results showed that Asian rapeseed originally derived from European rapeseed but subsequently…
A total of 91 draft genome sequences were used to analyze isolates of Salmonella enterica serovar Enteritidis obtained from feral mice caught on poultry farms in Pennsylvania. One objective was to find mutations disrupting open reading frames (ORFs) and another was to determine if ORF-disruptive mutations were present in isolates obtained from other sources. A total of 83 mice were obtained between 1995 and 1998. Isolates separated into two genomic clades and 12 subgroups due to 742 mutations. Nineteen ORF-disruptive mutations were found, and in addition, bigA had exceptional heterogeneity requiring additional evaluation. The TRAMS algorithm detected only 6 ORF disruptions. The…
Anopheles funestus is one of the 3 most consequential and widespread vectors of human malaria in tropical Africa. However, the lack of a high-quality reference genome has hindered the association of phenotypic traits with their genetic basis in this important mosquito. Here we present a new high-quality A. funestus reference genome (AfunF3) assembled using 240× coverage of long-read single-molecule sequencing for contigging, combined with 100× coverage of short-read Hi-C data for chromosome scaffolding. The assembled contigs total 446 Mbp of sequence and contain substantial duplication due to alternative alleles present in the sequenced pool of mosquitos from the FUMOZ colony. Using…
Lacerta viridis and Lacerta bilineata are sister species of European green lizards (eastern and western clades, respectively) that, until recently, were grouped together as the L. viridis complex. Genetic incompatibilities were observed between lacertid populations through crossing experiments, which led to the delineation of two separate species within the L. viridis complex. The population history of these sister species and processes driving divergence are unknown. We constructed the first high-quality de novo genome assemblies for both L. viridis and L. bilineata through Illumina and PacBio sequencing, with annotation support provided from transcriptome sequencing of several tissues. To estimate gene flow…
As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The result of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise…
Newly emerged wheat blast disease is a serious threat to global wheat production. Wheat blast is caused by a distinct, exceptionally diverse lineage of the fungus causing rice blast disease. Through sequencing a recent field isolate, we report a reference genome that includes seven core chromosomes and mini-chromosome sequences that harbor effector genes normally found on ends of core chromosomes in other strains. No mini-chromosomes were observed in an early field strain, and at least two from another isolate each contain different effector genes and core chromosome end sequences. The mini-chromosome is enriched in transposons occurring most frequently at core…