June 1, 2021  |  

A genome assembly of the domestic goat from 70x coverage of single molecule, real-time sequence.

Goat is an important source of milk, meat, and fiber, especially in developing countries. An advantage of goats as livestock is the low maintenance requirements and high adaptability compared to other milk producers. The global population of domestic goats exceeds 800 million. In Africa, goat production is characterized by low productivity levels, and attempts to introduce more productive breeds have met with poor success due in part to nutritional constraints. It has been suggested that incorporation of selective breeding within the herds adapted for survival could represent one approach to improving food security across Africa. A recently produced genome assembly of a Chinese Yunnan breed goat, based on 192 Gb of short reads across a range of insert sizes from 180 bp to 20 kb, reported a contig N50 of 18.7 kb. The scaffold N50 was improved from 2.2 Mb to 3.1 Mb by addition of fosmid end sequence, with an estimated 140 million Ns in gaps and 91% coverage. The assembly has proven somewhat problematic for pursuing genome-wide association analysis with SNP arrays, apparently due in part to errors in ordering of markers using the draft genome. In order to provide a higher quality assembly, we sequenced a highly inbred, San Clemente breed goat genome using 458 SMRT cells on the Pacific Biosciences platform. These cells generated 193.5 Gbases of sequence after processing into subreads, with mean 5110 bases and max subread length of 40.5 kb. This sequence data generated an assembly using the recently reported MHAP error correction approach and Celera Assembler v8.2. The contig N50 was 2.5 Mb, with the largest contig spanning 19.5 Mb. Additional characteristics of the assembly will be presented.


June 1, 2021  |  

New advances in SMRT Sequencing facilitate multiplexing for de novo and structural variant studies

The latest advancements in Sequel II SMRT Sequencing have increased average read lengths up to 50% compared to Sequel II chemistry 1.0 which allows multiplexing of 2-3 small organisms (<500 Mb) such as insects and worms for producing reference quality assemblies, calling structural variants for up to 2 samples with ~3 Gb genomes, analysis of 48 microbial genomes, and up to 8 communities for metagenomic profiling in a single SMRT Cell 8M. With the improved processivity of the new Sequel II sequencing polymerase, more SMRTbell molecules reach rolling circle mode resulting in longer overall read lengths, thus allowing efficient detection of barcodes (up to 80%) in the SMRTbell templates. Multiplexing of genomes larger than microbial organisms is now achievable. In collaboration with the Wellcome Sanger Institute, we have developed a workflow for multiplexing two individual Anopheles coluzzii using as low as 150 ng genomic DNA per individual. The resulting assemblies had high contiguity (contig N50s over 3 Mb) and completeness (>98% of conserved genes) for both individuals. For microbial multiplexing, we multiplexed 48 microbes with varying complexities and sizes ranging 1.6-8.0 Mb in single SMRT Cell 8M. Using a new end-to-end analysis (Microbial Assembly Analysis, SMRT Link 8.0), assemblies resulted in complete circularized genomes (>200-fold coverage) and efficient detection of >3-200 kb plasmids. Finally, the long read lengths (>90 kb) allows detection of barcodes in large insert SMRTbell templates (>15 kb) thus facilitating multiplex of two human samples in 1 SMRT Cell 8M for detecting SVs, Indels and CNVs. Here, we present results and describe workflows for multiplexing samples for specific applications for SMRT Sequencing.


April 21, 2020  |  

Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid

Background Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods.Results We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs.Conclusions These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates.MbmegabaseskbkilobasesMYAmillions of years agoMHCmajor histocompatibility complexSMRTsingle molecule real time


April 21, 2020  |  

Full-length mRNA sequencing and gene expression profiling reveal broad involvement of natural antisense transcript gene pairs in pepper development and response to stresses.

Pepper is an important vegetable with great economic value and unique biological features. In the past few years, significant development has been made towards understanding the huge complex pepper genome; however, pepper functional genomics has not been well studied. To better understand the pepper gene structure and pepper gene regulation, we conducted full-length mRNA sequencing by PacBio sequencing and obtained 57862 high-quality full-length mRNA sequences derived from 18362 previously annotated and 5769 newly detected genes. New gene models were built that combined the full-length mRNA sequences and corrected approximately 500 fragmented gene models from previous annotations. Based on the full-length mRNA, we identified 4114 and 5880 pepper genes forming natural antisense transcript (NAT) genes in-cis and in-trans, respectively. Most of these genes accumulate small RNAs in their overlapping regions. By analyzing these NAT gene expression patterns in our transcriptome data, we identified many NAT pairs responsive to a variety of biological processes in pepper. Pepper formate dehydrogenase 1 (FDH1), which is required for R-gene-mediated disease resistance, may be regulated by nat-siRNAs and participate in a positive feedback loop in salicylic acid biosynthesis during resistance responses. Several cis-NAT pairs and subgroups of trans-NAT genes were responsive to pepper pericarp and placenta development, which may play roles in capsanthin and capsaicin biosynthesis. Using a comparative genomics approach, the evolutionary mechanisms of cis-NATs were investigated, and we found that an increase in intergenic sequences accounted for the loss of most cis-NATs, while transposon insertion contributed to the formation of most new cis-NATs. This article is protected by copyright. All rights reserved.This article is protected by copyright. All rights reserved.


April 21, 2020  |  

Defining transgene insertion sites and off-target effects of homology-based gene silencing informs the use of functional genomics tools in Phytophthora infestans.

DNA transformation and homology-based transcriptional silencing are frequently used to assess gene function in Phytophthora. Since unplanned side-effects of these tools are not well-characterized, we used P. infestans to study plasmid integration sites and whether knockdowns caused by homology-dependent silencing spreads to other genes. Insertions occurred both in gene-dense and gene-sparse regions but disproportionately near the 5′ ends of genes, which disrupted native coding sequences. Microhomology at the recombination site between plasmid and chromosome was common. Studies of transformants silenced for twelve different gene targets indicated that neighbors within 500-nt were often co-silenced, regardless of whether hairpin or sense constructs were employed and the direction of transcription of the target. However, cis-spreading of silencing did not occur in all transformants obtained with the same plasmid. Genome-wide studies indicated that unlinked genes with partial complementarity with the silencing-inducing transgene were not usually down-regulated. We learned that hairpin or sense transgenes were not co-silenced with the target in all transformants, which informs how screens for silencing should be performed. We conclude that transformation and gene silencing can be reliable tools for functional genomics in Phytophthora but must be used carefully, especially by testing for the spread of silencing to genes flanking the target.


April 21, 2020  |  

Klebsiella quasipneumoniae Provides a Window into Carbapenemase Gene Transfer, Plasmid Rearrangements, and Patient Interactions with the Hospital Environment.

Several emerging pathogens have arisen as a result of selection pressures exerted by modern health care. Klebsiella quasipneumoniae was recently defined as a new species, yet its prevalence, niche, and propensity to acquire antimicrobial resistance genes are not fully described. We have been tracking inter- and intraspecies transmission of the Klebsiella pneumoniae carbapenemase (KPC) gene, blaKPC, between bacteria isolated from a single institution. We applied a combination of Illumina and PacBio whole-genome sequencing to identify and compare K. quasipneumoniae from patients and the hospital environment over 10- and 5-year periods, respectively. There were 32 blaKPC-positive K. quasipneumoniae isolates, all of which were identified as K. pneumoniae in the clinical microbiology laboratory, from 8 patients and 11 sink drains, with evidence for seven separate blaKPC plasmid acquisitions. Analysis of a single subclade of K. quasipneumoniae subsp. quasipneumoniae (n?=?23 isolates) from three patients and six rooms demonstrated seeding of a sink by a patient, subsequent persistence of the strain in the hospital environment, and then possible transmission to another patient. Longitudinal analysis of this strain demonstrated the acquisition of two unique blaKPC plasmids and then subsequent within-strain genetic rearrangement through transposition and homologous recombination. Our analysis highlights the apparent molecular propensity of K. quasipneumoniae to persist in the environment as well as acquire carbapenemase plasmids from other species and enabled an assessment of the genetic rearrangements which may facilitate horizontal transmission of carbapenemases. Copyright © 2019 Mathers et al.


April 21, 2020  |  

The UNITE database for molecular identification of fungi: handling dark taxa and parallel taxonomic classifications.

UNITE (https://unite.ut.ee/) is a web-based database and sequence management environment for the molecular identification of fungi. It targets the formal fungal barcode-the nuclear ribosomal internal transcribed spacer (ITS) region-and offers all ~1 000 000 public fungal ITS sequences for reference. These are clustered into ~459 000 species hypotheses and assigned digital object identifiers (DOIs) to promote unambiguous reference across studies. In-house and web-based third-party sequence curation and annotation have resulted in more than 275 000 improvements to the data over the past 15 years. UNITE serves as a data provider for a range of metabarcoding software pipelines and regularly exchanges data with all major fungal sequence databases and other community resources. Recent improvements include redesigned handling of unclassifiable species hypotheses, integration with the taxonomic backbone of the Global Biodiversity Information Facility, and support for an unlimited number of parallel taxonomic classification systems.


April 21, 2020  |  

The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination.

The terrestrial isopod Armadillidium vulgare is an original model to study the evolution of sex determination and symbiosis in animals. Its sex can be determined by ZW sex chromosomes, or by feminizing Wolbachia bacterial endosymbionts. Here, we report the sequence and analysis of the ZW female genome of A. vulgare. A distinguishing feature of the 1.72 gigabase assembly is the abundance of repeats (68% of the genome). We show that the Z and W sex chromosomes are essentially undifferentiated at the molecular level and the W-specific region is extremely small (at most several hundreds of kilobases). Our results suggest that recombination suppression has not spread very far from the sex-determining locus, if at all. This is consistent with A. vulgare possessing evolutionarily young sex chromosomes. We characterized multiple Wolbachia nuclear inserts in the A. vulgare genome, none of which is associated with the W-specific region. We also identified several candidate genes that may be involved in the sex determination or sexual differentiation pathways. The A. vulgare genome serves as a resource for studying the biology and evolution of crustaceans, one of the most speciose and emblematic metazoan groups. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


April 21, 2020  |  

Characterizing the major structural variant alleles of the human genome.

In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity. Copyright © 2018 Elsevier Inc. All rights reserved.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.