April 21, 2020  |  

Mitochondrial DNA and their nuclear copies in the parasitic wasp Pteromalus puparum: A comparative analysis in Chalcidoidea.

Chalcidoidea (chalcidoid wasps) are an abundant and megadiverse insect group with both ecological and economical importance. Here we report a complete mitochondrial genome in Chalcidoidea from Pteromalus puparum (Pteromalidae). Eight tandem repeats followed by 6 reversed repeats were detected in its 3308?bp control region. This long and complex control region may explain failures of amplifying and sequencing of complete mitochondrial genomes in some chalcidoids. In addition to 37 typical mitochondrial genes, an extra identical isoleucine tRNA (trnI) was detected at the opposite end of the control region. This recent mitochondrial gene duplication indicates that gene arrangements in chalcidoids are ongoing. A comparison among available chalcidoid mitochondrial genomes reveals rapid gene order rearrangements overall and high protein substitution rates in most chalcidoid taxa. In addition, we identified 24 nuclear sequences of mitochondrial origin (NUMTs) in P. puparum, summing up to 9989?bp, with 3617?bp of these NUMTs originating from mitochondrial coding regions. NUMTs abundance in P. puparum is only one-twelfth of that in its relative, Nasonia vitripennis. Based on phylogenetic analysis, we provide evidence that a faster nuclear degradation rate contributes to the reduced NUMT numbers in P. puparum. Overall, our study shows unusually high rates of mitochondrial evolution and considerable variation in NUMT accumulation in Chalcidoidea. Copyright © 2018. Published by Elsevier B.V.

April 21, 2020  |  

Heterochromatin-enriched assemblies reveal the sequence and organization of the Drosophila melanogaster Y chromosome.

Heterochromatic regions of the genome are repeat-rich and poor in protein coding genes, and are therefore underrepresented in even the best genome assemblies. One of the most difficult regions of the genome to assemble are sex-limited chromosomes. The Drosophila melanogaster Y chromosome is entirely heterochromatic, yet has wide-ranging effects on male fertility, fitness, and genome-wide gene expression. The genetic basis of this phenotypic variation is difficult to study, in part because we do not know the detailed organization of the Y chromosome. To study Y chromosome organization in D. melanogaster, we develop an assembly strategy involving the in silico enrichment of heterochromatic long single-molecule reads and use these reads to create targeted de novo assemblies of heterochromatic sequences. We assigned contigs to the Y chromosome using Illumina reads to identify male-specific sequences. Our pipeline extends the D. melanogaster reference genome by 11.9 Mb, closes 43.8% of the gaps, and improves overall contiguity. The addition of 10.6 MB of Y-linked sequence permitted us to study the organization of repeats and genes along the Y chromosome. We detected a high rate of duplication to the pericentric regions of the Y chromosome from other regions in the genome. Most of these duplicated genes exist in multiple copies. We detail the evolutionary history of one sex-linked gene family, crystal-Stellate While the Y chromosome does not undergo crossing over, we observed high gene conversion rates within and between members of the crystal-Stellate gene family, Su(Ste), and PCKR, compared to genome-wide estimates. Our results suggest that gene conversion and gene duplication play an important role in the evolution of Y-linked genes. Copyright © 2019 Chang and Larracuente.

April 21, 2020  |  

Population Genome Sequencing of the Scab Fungal Species Venturia inaequalis, Venturia pirina, Venturia aucupariae and Venturia asperata.

The Venturia genus comprises fungal species that are pathogens on Rosaceae host plants, including V. inaequalis and V. asperata on apple, V. aucupariae on sorbus and V. pirina on pear. Although the genetic structure of V. inaequalis populations has been investigated in detail, genomic features underlying these subdivisions remain poorly understood. Here, we report whole genome sequencing of 87 Venturia strains that represent each species and each population within V. inaequalis We present a PacBio genome assembly for the V. inaequalis EU-B04 reference isolate. The size of selected genomes was determined by flow cytometry, and varied from 45 to 93 Mb. Genome assemblies of V. inaequalis and V. aucupariae contain a high content of transposable elements (TEs), most of which belong to the Gypsy or Copia LTR superfamilies and have been inactivated by Repeat-Induced Point mutations. The reference assembly of V. inaequalis presents a mosaic structure of GC-equilibrated regions that mainly contain predicted genes and AT-rich regions, mainly composed of TEs. Six pairs of strains were identified as clones. Single-Nucleotide Polymorphism (SNP) analysis between these clones revealed a high number of SNPs that are mostly located in AT-rich regions due to misalignments and allowed determining a false discovery rate. The availability of these genome sequences is expected to stimulate genetics and population genomics research of Venturia pathogens. Especially, it will help understanding the evolutionary history of Venturia species that are pathogenic on different hosts, a history that has probably been substantially influenced by TEs.Copyright © 2019 Le Cam et al.

April 21, 2020  |  

Competition between mobile genetic elements drives optimization of a phage-encoded CRISPR-Cas system: insights from a natural arms race.

CRISPR-Cas systems function as adaptive immune systems by acquiring nucleotide sequences called spacers that mediate sequence-specific defence against competitors. Uniquely, the phage ICP1 encodes a Type I-F CRISPR-Cas system that is deployed to target and overcome PLE, a mobile genetic element with anti-phage activity in Vibrio cholerae. Here, we exploit the arms race between ICP1 and PLE to examine spacer acquisition and interference under laboratory conditions to reconcile findings from wild populations. Natural ICP1 isolates encode multiple spacers directed against PLE, but we find that single spacers do not interfere equally with PLE mobilization. High-throughput sequencing to assay spacer acquisition reveals that ICP1 can also acquire spacers that target the V. cholerae chromosome. We find that targeting the V. cholerae chromosome proximal to PLE is sufficient to block PLE and is dependent on Cas2-3 helicase activity. We propose a model in which indirect chromosomal spacers are able to circumvent PLE by Cas2-3-mediated processive degradation of the V. cholerae chromosome before PLE mobilization. Generally, laboratory-acquired spacers are much more diverse than the subset of spacers maintained by ICP1 in nature, showing how evolutionary pressures can constrain CRISPR-Cas targeting in ways that are often not appreciated through in vitro analyses. This article is part of a discussion meeting issue ‘The ecology and evolution of prokaryotic CRISPR-Cas adaptive immune systems’.

April 21, 2020  |  

Sequence properties of certain GC rich avian genes, their origins and absence from genome assemblies: case studies.

More and more eukaryotic genomes are sequenced and assembled, most of them presented as a complete model in which missing chromosomal regions are filled by Ns and where a few chromosomes may be lacking. Avian genomes often contain sequences with high GC content, which has been hypothesized to be at the origin of many missing sequences in these genomes. We investigated features of these missing sequences to discover why some may not have been integrated into genomic libraries and/or sequenced.The sequences of five red jungle fowl cDNA models with high GC content were used as queries to search publicly available datasets of Illumina and Pacbio sequencing reads. These were used to reconstruct the leptin, TNFa, MRPL52, PCP2 and PET100 genes, all of which are absent from the red jungle fowl genome model. These gene sequences displayed elevated GC contents, had intron sizes that were sometimes larger than non-avian orthologues, and had non-coding regions that contained numerous tandem and inverted repeat sequences with motifs able to assemble into stable G-quadruplexes and intrastrand dyadic structures. Our results suggest that Illumina technology was unable to sequence the non-coding regions of these genes. On the other hand, PacBio technology was able to sequence these regions, but with dramatically lower efficiency than would typically be expected.High GC content was not the principal reason why numerous GC-rich regions of avian genomes are missing from genome assembly models. Instead, it is the presence of tandem repeats containing motifs capable of assembling into very stable secondary structures that is likely responsible.

April 21, 2020  |  

The Impact of cDNA Normalization on Long-Read Sequencing of a Complex Transcriptome

Normalization of cDNA is widely used to improve the coverage of rare transcripts in analysis of transcriptomes employing next-generation sequencing. Recently, long-read technology has been emerging as a powerful tool for sequencing and construction of transcriptomes, especially for complex genomes containing highly similar transcripts and transcript-spliced isoforms. Here, we analyzed the transcriptome of sugarcane, with a highly polyploidy plant genome, by PacBio isoform sequencing (Iso-Seq) of two different cDNA library preparations, with and without a normalization step. The results demonstrated that, while the two libraries included many of the same transcripts, many longer transcripts were removed and many new generally shorter transcripts were detected by normalization. For the same input cDNA and the same data yield, the normalized library recovered more total transcript isoforms, number of predicted gene families and orthologous groups, resulting in a higher representation for the sugarcane transcriptome, compared to the non-normalized library. The non-normalized library, on the other hand, included a wider transcript length range with more longer transcripts above ~1.25 kb, more transcript isoforms per gene family and gene ontology terms per transcript. A large proportion of the unique transcripts comprising ~52% of the normalized library were expressed at a lower level than the unique transcripts from the non-normalized library, across three tissue types tested including leaf, stalk and root. About 83% of the total 5,348 predicted long noncoding transcripts was derived from the normalized library, of which ~80% was derived from the lowly expressed fraction. Functional annotation of the unique transcripts suggested that each library enriched different functional transcript fractions. This demonstrated the complementation of the two approaches in obtaining a complete transcriptome of a complex genome at the sequencing depth used in this study.

April 21, 2020  |  

A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds.

The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel_HAv3) is significantly more contiguous and complete than the previous one (Amel_4.5), based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >?98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identification of structural variants, and comparative genomics.

April 21, 2020  |  

Assignment of virus and antimicrobial resistance genes to microbial hosts in a complex microbial community by combined long-read assembly and proximity ligation.

We describe a method that adds long-read sequencing to a mix of technologies used to assemble a highly complex cattle rumen microbial community, and provide a comparison to short read-based methods. Long-read alignments and Hi-C linkage between contigs support the identification of 188 novel virus-host associations and the determination of phage life cycle states in the rumen microbial community. The long-read assembly also identifies 94 antimicrobial resistance genes, compared to only seven alleles in the short-read assembly. We demonstrate novel techniques that work synergistically to improve characterization of biological features in a highly complex rumen microbial community.

April 21, 2020  |  

Development of a metabolic pathway transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii.

Clostridium spp. can synthesize valuable chemicals and fuels by utilizing diverse waste-stream substrates, including starchy biomass, lignocellulose, and industrial waste gases. However, metabolic engineering in Clostridium spp. is challenging due to the low efficiency of gene transfer and genomic integration of entire biosynthetic pathways.We have developed a reliable gene transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii based on the conjugal transfer of donor plasmids containing large transgene cassettes (>?5 kb) followed by the inducible activation of Himar1 transposase to promote integration. We established a conjugation protocol for the efficient generation of transconjugants using the Gram-positive origins of replication repL and repH. We also investigated the impact of DNA methylation on conjugation efficiency by testing donor constructs with all possible combinations of Dam and Dcm methylation patterns, and used bisulfite conversion and PacBio sequencing to determine the DNA methylation profile of the C. ljungdahlii genome, resulting in the detection of four sequence motifs with N6-methyladenosine. As proof of concept, we demonstrated the transfer and genomic integration of a heterologous acetone biosynthesis pathway using a Himar1 transposase system regulated by a xylose-inducible promoter. The functionality of the integrated pathway was confirmed by detecting enzyme proteotypic peptides and the formation of acetone and isopropanol by C. ljungdahlii cultures utilizing syngas as a carbon and energy source.The developed multi-gene delivery system offers a versatile tool to integrate and stably express large biosynthetic pathways in the industrial promising syngas-fermenting microorganism C. ljungdahlii. The simple transfer and stable integration of large gene clusters (like entire biosynthetic pathways) is expanding the range of possible fermentation products of heterologously expressing recombinant strains. We also believe that the developed gene delivery system can be adapted to other clostridial strains as well.

April 21, 2020  |  

Progression of the canonical reference malaria parasite genome from 2002-2019.

Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome has changed since its publication in 2002. As the malaria species responsible for the most deaths worldwide, the richness of annotation and accuracy of the sequence are important resources for the P. falciparum research community as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002 over 60% of predicted genes had unknown functions. As of March 2019, this number has been significantly decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major miss-assembly between the subtelomeres of chromosome 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpretating intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will be continued to be maintained with ongoing curation ensuring continual improvements in annotation quality.

October 23, 2019  |  

Accurate identification and quantification of DNA species by next-generation sequencing in adeno-associated viral vectors produced in insect cells.

Recombinant adeno-associated viral (rAAV) vectors have proven excellent tools for the treatment of many genetic diseases and other complex diseases. However, the illegitimate encapsidation of DNA contaminants within viral particles constitutes a major safety concern for rAAV-based therapies. Moreover, the development of rAAV vectors for early-phase clinical trials has revealed the limited accuracy of the analytical tools used to characterize these new and complex drugs. Although most published data concerning residual DNA in rAAV preparations have been generated by quantitative PCR, we have developed a novel single-strand virus sequencing (SSV-Seq) method for quantification of DNA contaminants in AAV vectors produced in mammalian cells by next-generation sequencing (NGS). Here, we describe the adaptation of SSV-Seq for the accurate identification and quantification of DNA species in rAAV stocks produced in insect cells. We found that baculoviral DNA was the most abundant contaminant, representing less than 2.1% of NGS reads regardless of serotype (2, 8, or rh10). Sf9 producer cell DNA was detected at low frequency (=0.03%) in rAAV lots. Advanced computational analyses revealed that (1) baculoviral sequences close to the inverted terminal repeats preferentially underwent illegitimate encapsidation, and (2) single-nucleotide variants were absent from the rAAV genome. The high-throughput sequencing protocol described here enables effective DNA quality control of rAAV vectors produced in insect cells, and is adapted to conform with regulatory agency safety requirements.

September 22, 2019  |  

Comparison of the mitochondrial genomes and steady state transcriptomes of two strains of the trypanosomatid parasite, Leishmania tarentolae.

U-insertion/deletion RNA editing is a post-transcriptional mitochondrial RNA modification phenomenon required for viability of trypanosomatid parasites. Small guide RNAs encoded mainly by the thousands of catenated minicircles contain the information for this editing. We analyzed by NGS technology the mitochondrial genomes and transcriptomes of two strains, the old lab UC strain and the recently isolated LEM125 strain. PacBio sequencing provided complete minicircle sequences which avoided the assembly problem of short reads caused by the conserved regions. Minicircles were identified by a characteristic size, the presence of three short conserved sequences, a region of inherently bent DNA and the presence of single gRNA genes at a fairly defined location. The LEM125 strain contained over 114 minicircles encoding different gRNAs and the UC strain only ~24 minicircles. Some LEM125 minicircles contained no identifiable gRNAs. Approximate copy numbers of the different minicircle classes in the network were determined by the number of PacBio CCS reads that assembled to each class. Mitochondrial RNA libraries from both strains were mapped against the minicircle and maxicircle sequences. Small RNA reads mapped to the putative gRNA genes but also to multiple regions outside the genes on both strands and large RNA reads mapped in many cases over almost the entire minicircle on both strands. These data suggest that minicircle transcription is complete and bidirectional, with 3′ processing yielding the mature gRNAs. Steady state RNAs in varying abundances are derived from all maxicircle genes, including portions of the repetitive divergent region. The relative extents of editing in both strains correlated with the presence of a cascade of cognate gRNAs. These data should provide the foundation for a deeper understanding of this dynamic genetic system as well as the evolutionary variation of editing in different strains.

September 22, 2019  |  

Robust and effective methodologies for cryopreservation and DNA extraction from anaerobic gut fungi.

Cell storage and DNA isolation are essential to developing an expanded suite of microorganisms for biotechnology. However, many features of non-model microbes, such as an anaerobic lifestyle and rigid cell wall, present formidable challenges to creating strain repositories and extracting high quality genomic DNA. Here, we establish accessible, high efficiency, and robust techniques to store lignocellulolytic anaerobic gut fungi long term without specialized equipment. Using glycerol as a cryoprotectant, gut fungal isolates were preserved for a minimum of 23 months at -80 °C. Unlike previously reported approaches, this improved protocol is non-toxic and rapid, with samples surviving twice as long with negligible growth impact. Genomic DNA extraction for these isolates was optimized to yield samples compatible with next generation sequencing platforms (e.g. Illumina, PacBio). Popular DNA isolation kits and precipitation protocols yielded preps that were unsuitable for sequencing due to carbohydrate contaminants from the chitin-rich cell wall and extensive energy reserves of gut fungi. To address this, we identified a proprietary method optimized for hardy plant samples that rapidly yielded DNA fragments in excess of 10 kb with minimal RNA, protein or carbohydrate contamination. Collectively, these techniques serve as fundamental tools to manipulate powerful biomass-degrading gut fungi and improve their accessibility among researchers. Copyright © 2015 Elsevier Ltd. All rights reserved.

September 22, 2019  |  

The third revolution in sequencing technology.

Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives. Copyright © 2018 Elsevier Ltd. All rights reserved.

Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.