In this PAG 2018 presentation, Marty Badgett of PacBio shares updates on PacBio products and performance. He highlights high-quality genome assemblies for Arabidopsis, rice, and maize, the SMRTbell Express Template…
This webinar, presented by Nisha Pillai, provides an overview of bioinformatics approaches for PacBio Single Molecule, Real-Time (SMRT) Sequencing data and discusses the whole genome sequencing application including: assembly workflow…
PAG Conference: Using cattle subspecies crosses to explore chromosome of origin expression through Iso-seq analysis
In this PAG 2018 presentation, John Williams of the University of Adelaide presents research on using PacBio SMRT Sequencing to explore the genetic origins of cattle subspecies, Angus (Bos taurus taurus)…
In this PacBio User Group Meeting presentation, Tina Graves-Lindsay of the McDonnell Genome Institute and the Genome Reference Consortium speaks about the importance of phasing human reference genomes. Her team…
The release of the PacBio Sequel II System in 2019 brought dramatic throughput improvements and protocols for producing a new data type, highly accurate long reads or HiFi reads. PacBio…
The domestic pig (Sus scrofa) is important both as a food source and as a biomedical model with high anatomical and immunological similarity to humans. The draft reference genome (Sscrofa10.2) represented a purebred female pig from a commercial pork production breed (Duroc), and was established using older clone-based sequencing methods. The Sscrofa10.2 assembly was incomplete, and unresolved redundancies, short-range order and orientation errors, and associated misassembled genes limited its utility. We present two highly contiguous chromosome-level genome assemblies created with more recent long-read technologies and a whole genome shotgun strategy, one for the same Duroc female (Sscrofa11.1) and one for an outbred, composite breed male animal commonly used for commercial pork production (USMARCv1.0). Both assemblies are of substantially higher (>90-fold) continuity and accuracy compared to the earlier reference, and the availability of two independent assemblies provided an opportunity to identify large-scale variants and to error-check the accuracy of representation of the genome. We propose that the improved Duroc breed assembly (Sscrofa11.1) become the reference genome for genomic research in pigs.
The zebra mussel, Dreissena polymorpha, continues to spread from its native range in Eurasia to Europe and North America, causing billions of dollars in damage and dramatically altering invaded aquatic ecosystems. Despite these impacts, there are few genomic resources for Dreissena or related bivalves, with nearly 450 million years of divergence between zebra mussels and their closest sequenced relative. Although the D. polymorpha genome is highly repetitive, we have used a combination of long-read sequencing and Hi-C-based scaffolding to generate the highest quality molluscan assembly to date. Through comparative analysis and transcriptomics experiments we have gained insights into processes that likely control the invasive success of zebra mussels, including shell formation, synthesis of byssal threads, and thermal tolerance. We identified multiple intact Steamer-Like Elements, a retrotransposon that has been linked to transmissible cancer in marine clams. We also found that D. polymorpha has an unusual 67 kb mitochondrial genome containing numerous tandem repeats, making it the largest observed in Eumetazoa. Together these findings create a rich resource for invasive species research and control efforts.
Comparative genomics reveals unique wood-decay strategies and fruiting body development in the Schizophyllaceae.
Agaricomycetes are fruiting body-forming fungi that produce some of the most efficient enzyme systems to degrade wood. Despite decades-long interest in their biology, the evolution and functional diversity of both wood-decay and fruiting body formation are incompletely known. We performed comparative genomic and transcriptomic analyses of wood-decay and fruiting body development in Auriculariopsis ampla and Schizophyllum commune (Schizophyllaceae), species with secondarily simplified morphologies, an enigmatic wood-decay strategy and weak pathogenicity to woody plants. The plant cell wall-degrading enzyme repertoires of Schizophyllaceae are transitional between those of white rot species and less efficient wood-degraders such as brown rot or mycorrhizal fungi. Rich repertoires of suberinase and tannase genes were found in both species, with tannases restricted to Agaricomycetes that preferentially colonize bark-covered wood, suggesting potential complementation of their weaker wood-decaying abilities and adaptations to wood colonization through the bark. Fruiting body transcriptomes revealed a high rate of divergence in developmental gene expression, but also several genes with conserved expression patterns, including novel transcription factors and small-secreted proteins, some of which might represent fruiting body effectors. Taken together, our analyses highlighted novel aspects of wood-decay and fruiting body development in an important family of mushroom-forming fungi. © 2019 The Authors. New Phytologist © 2019 New Phytologist Trust.
Chromosome-level reference genome of X12, a highly virulent race of the soybean cyst nematode Heterodera glycines.
Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean that is spreading across major soybean production regions worldwide. Increased SCN virulence has recently been observed in both the United States and China. However, no study has reported a genome assembly for H. glycines at the chromosome scale. Herein, the first chromosome-level reference genome of X12, an unusual SCN race with high infection ability, is presented. Using whole-genome shotgun (WGS) sequencing, PacBio sequencing, Illumina paired-end sequencing, 10X Genomics linked reads and high-throughput chromatin conformation capture (Hi-C) genome scaffolding techniques, a 141.01-Mb assembled genome was obtained with scaffold and contig N50 sizes of 16.27 Mb and 330.54 kb, respectively. The assembly showed high integrity and quality, with over 90% of Illumina reads mapped to the genome. The assembly quality was evaluated using Core Eukaryotic Genes Mapping Approach (CEGMA) and Benchmarking Universal Single-Copy Orthologs (BUSCO). A total of 11,882 genes were predicted using de novo, homolog and RNA-seq data generated from eggs, second-stage juveniles (J2), third-stage juveniles (J3) and fourth-stage juveniles (J4) of X12, and 79.0% of homologous sequences were annotated in the genome. These high-quality X12 genome data will provide valuable resources for research in a broad range of areas, including fundamental nematode biology, SCN-plant interactions and coevolution, and also contribute to the development of technology for overall SCN management. This article is protected by copyright. All rights reserved.
The persimmon genome reveals clues to the evolution of a lineage-specific sex determination system in plants
Most angiosperms bear hermaphroditic flowers, but a few species have evolved outcrossing strategies, such as dioecy, the presence of separate male and female individuals. We previously investigated the mechanisms underlying dioecy in diploid persimmon (D. lotus) and found that male flowers are specified by repression of the autosomal gene MeGI by its paralog, the Y-encoded pseudo-gene OGI. This mechanism is thought to be lineage-specific, but its evolutionary path remains unknown. Here, we developed a full draft of the diploid persimmon genome (D. lotus), which revealed a lineage-specific genome-wide paleoduplication event. Together with a subsequent persimmon-specific duplication(s), these events resulted in the presence of three paralogs, MeGI, OGI and newly identified Sister of MeGI (SiMeGI), from the single original gene. Evolutionary analysis suggested that MeGI underwent adaptive evolution after the paleoduplication event. Transformation of tobacco plants with MeGI and SiMeGI revealed that MeGI specifically acquired a new function as a repressor of male organ development, while SiMeGI presumably maintained the original function. Later, local duplication spawned MeGI’s regulator OGI, completing the path leading to dioecy. These findings exemplify how duplication events can provide flexible genetic material available to help respond to varying environments and provide interesting parallels for our understanding of the mechanisms underlying the transition into dioecy in plants.
This study reports the first haplotype-phased, reference-quality genome assembly of 'Murrah', an Indian breed of river buffalo. A mother-father-progeny trio was used for sequencing so that the individual haplotypes could be assembled in the progeny. Parental DNA samples were sequenced on the Illumina platform to generate a total of 274 Gb of paired-end data. The progeny DNA sample was sequenced using PacBio long reads and 10x Genomics linked reads at 166x coverage along with 802 Gb of optical mapping data. The trio-binning-based FALCON assembly of each haplotype was scaffolded with 10x Genomics reads and superscaffolded with BioNano maps to build reference-quality assemblies of the sire and dam haplotypes of 2.63 Gb and 2.64 Gb with just 59 and 64 scaffolds and N50 of 81.98 Mb and 83.23 Mb, respectively. BUSCO single-copy core gene set coverage was >91.25%, and gVolante-CEGMA completeness was >96.14% for both haplotypes. Finally, RaGOO was used to order and build the chromosomal-level assembly with 25 scaffolds and N50 of 117.48 Mb (sire haplotype) and 118.51 Mb (dam haplotype). The improved haplotype-phased genome assembly of river buffalo may provide valuable resources to discover molecular mechanisms related to milk production and reproduction traits.
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. These assemblies can be created in various ways, such as use of tissues that contain single-haplotype (haploid) genomes, or by co-sequencing of parental genomes, but these approaches can be impractical in many situations. We present FALCON-Phase, which integrates long-read sequencing data and ultra-long-range Hi-C chromatin interaction data of a diploid individual to create high-quality, phased diploid genome assemblies. The method was evaluated by application to three datasets, including human, cattle, and zebra finch, for which high-quality, fully haplotype resolved assemblies were available for benchmarking. Phasing algorithm accuracy was affected by heterozygosity of the individual sequenced, with higher accuracy for cattle and zebra finch (>97%) compared to human (82%). In addition, scaffolding with the same Hi-C chromatin contact data resulted in phased chromosome-scale scaffolds.
Streptococcus oralis subsp. dentisani Produces Monolateral Serine-Rich Repeat Protein Fibrils, One of Which Contributes to Saliva Binding via Sialic Acid.
Our studies reveal that the oral colonizer and cause of infective endocarditis Streptococcus oralis subsp. dentisani displays a striking monolateral distribution of surface fibrils. Furthermore, our data suggest that these fibrils impact the structure of adherent bacterial chains. Mutagenesis studies indicate that these fibrils are dependent on three serine-rich repeat proteins (SRRPs), here named fibril-associated protein A (FapA), FapB, and FapC, and that each SRRP forms a different fibril with a distinct distribution. SRRPs are a family of bacterial adhesins that have diverse roles in adhesion and that can bind to different receptors through modular nonrepeat region domains. Amino acid sequence and predicted structural similarity searches using the nonrepeat regions suggested that FapA may contribute to interspecies interactions, that FapA and FapB may contribute to intraspecies interactions, and that FapC may contribute to sialic acid binding. We demonstrate that a fapC mutant was significantly reduced in binding to saliva. We confirmed a role for FapC in sialic acid binding by demonstrating that the parental strain was significantly reduced in adhesion upon addition of a recombinantly expressed, sialic acid-specific, carbohydrate binding module, while the fapC mutant was not reduced. However, mutation of a residue previously shown to be essential for sialic acid binding did not decrease bacterial adhesion, leaving the precise mechanism of FapC-mediated adhesion to sialic acid to be defined. We also demonstrate that the presence of any one of the SRRPs is sufficient for efficient biofilm formation. Similar structures were observed on all infective endocarditis isolates examined, suggesting that this distribution is a conserved feature of this S. oralis subspecies. Copyright © 2019 American Society for Microbiology.
A chromosomal-level genome assembly for the insect vector for Chagas disease, Triatoma rubrofasciata.
Triatoma rubrofasciata is a widespread pathogen vector for Chagas disease, an illness that affects approximately 7 million people worldwide. Despite its importance to human health, its evolutionary origin has not been conclusively determined. A reference genome for T. rubrofasciata is not yet available. We have sequenced the genome of a female individual of T. rubrofasciata using a single-molecule DNA sequencing technology (i.e., PacBio Sequel platform) and have successfully reconstructed a whole-genome (680-Mb) assembly that covers 90% of the nuclear genome (757 Mb). Through Hi-C analysis, we have reconstructed full-length chromosomes of this female individual that has 13 unique chromosomes (2n = 24 = 22 + X1 + X2) with a contig N50 of 2.72 Mb and a scaffold N50 of 50.7 Mb. This genome has achieved a high base-level accuracy of 99.99%. This platinum-grade genome assembly has 12,691 annotated protein-coding genes. More than 95.1% of BUSCO genes were single-copy completed, indicating a high level of completeness of the genome. The platinum-grade genome assembly and its annotation provide valuable information for future in-depth comparative genomics studies, including sexual determination analysis in T. rubrofasciata and the pathogenesis of Chagas disease. © The Author(s) 2019. Published by Oxford University Press.
Yellowhorn (Xanthoceras sorbifolium) is a species of the Sapindaceae family native to China and is an oil tree that can withstand cold and drought conditions. A pseudomolecule-level genome assembly for this species will not only contribute to understanding the evolution of its genes and chromosomes but also bring yellowhorn breeding into the genomic era. Here, we generated 15 pseudomolecules of yellowhorn chromosomes, on which 97.04% of scaffolds were anchored, using the combined Illumina HiSeq, Pacific Biosciences Sequel, and Hi-C technologies. The length of the final yellowhorn genome assembly was 504.2 Mb with a contig N50 size of 1.04 Mb and a scaffold N50 size of 32.17 Mb. Genome annotation revealed that 68.67% of the yellowhorn genome was composed of repetitive elements. Gene modelling predicted 24,672 protein-coding genes. By comparing orthologous genes, the divergence time of yellowhorn and its close sister species longan (Dimocarpus longan) was estimated at ~33.07 million years ago. Gene cluster and chromosome synteny analysis demonstrated that the yellowhorn genome shared a conserved genome structure with its ancestor in some chromosomes. This genome assembly represents a high-quality reference genome for yellowhorn. Integrated genome annotations provide a valuable dataset for genetic and molecular research in this species. We did not detect whole-genome duplication in the genome. The yellowhorn genome carries syntenic blocks from ancient chromosomes. These data sources will enable this genome to serve as an initial platform for breeding better yellowhorn cultivars. © The Author(s) 2019. Published by Oxford University Press.