April 21, 2020

Comparison of mitochondrial DNA variants detection using short- and long-read sequencing.

The recent advent of long-read sequencing technologies is expected to provide reasonable answers to genetic challenges unresolvable by short-read sequencing, primarily the inability to accurately study structural variations, copy number variations, and homologous repeats in complex parts of the genome. However, long-read sequencing comes with higher rates of random short deletions and insertions, and single nucleotide errors. The relatively higher sequencing accuracy of short-read sequencing has kept it as the first choice of screening for single nucleotide variants and short deletions and insertions. Nevertheless, short-read sequencing still suffers from systematic errors that tend to occur at specific positions, where a high depth of reads cannot always correct for these errors. In this study, we compared the genotyping of mitochondrial DNA variants in three samples using PacBio’s Sequel (Pacific Biosciences Inc., Menlo Park, CA, USA) long-read sequencing and Illumina’s HiSeqX10 (Illumina Inc., San Diego, CA, USA) short-read sequencing data. We concluded that, despite the differences in the type and frequency of errors in long-read sequencing, its accuracy is still comparable to that of short reads for genotyping short nuclear variants; due to the randomness of errors in long reads, a lower coverage, around 37 reads, can be sufficient to correct for these random errors.
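
The claim that moderate coverage can correct random errors can be illustrated with a simple binomial model. The sketch below is not the method used in the study: it assumes errors are independent between reads, that the genotype is called by simple majority vote, and that every erroneous read agrees on the same wrong base (a pessimistic upper bound). The per-read error rate of 0.13 is a hypothetical value chosen for illustration and does not come from the abstract.

```python
from math import comb


def majority_error_probability(depth: int, per_read_error: float) -> float:
    """Probability that more than half of `depth` independent reads are wrong
    at a single position, i.e. that a majority-vote call would be incorrect."""
    threshold = depth // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(threshold, depth + 1)
    )


# Hypothetical per-read error rate (an assumption, not a figure from the study).
ASSUMED_ERROR_RATE = 0.13

for depth in (5, 15, 37):
    print(depth, majority_error_probability(depth, ASSUMED_ERROR_RATE))
```

Under these assumptions the chance of a wrong majority call falls steeply as depth grows. Systematic errors, which recur at the same positions in many reads, violate the independence assumption, so additional depth does not help in the same way.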


April 21, 2020

Paragraph: A graph-based structural variant genotyper for short-read sequence data

Accurate detection and genotyping of structural variations (SVs) from short-read data is a long-standing area of development in genomics research and clinical sequencing pipelines. We introduce Paragraph, a fast and accurate genotyper that models SVs using sequence graphs and SV annotations produced by a range of methods and technologies. We demonstrate the accuracy of Paragraph on whole genome sequence data from a control sample with both short- and long-read sequencing data available, and then apply it at scale to a cohort of 100 samples of diverse ancestry sequenced with short reads. Comparative analyses indicate that Paragraph has better accuracy than other existing genotypers. The Paragraph software is open-source and available at https://github.com/Illumina/paragraph


April 21, 2020

First near complete haplotype phased genome assembly of River buffalo (Bubalus bubalis)

This study reports the first haplotype phased reference quality genome assembly of ‘Murrah’, an Indian breed of river buffalo. A mother-father-progeny trio was used for sequencing so that the individual haplotypes could be assembled in the progeny. Parental DNA samples were sequenced on the Illumina platform to generate a total of 274 Gb paired-end data. The progeny DNA sample was sequenced using PacBio long reads and 10x Genomics linked reads at 166x coverage along with 802 Gb of optical mapping data. Trio binning based FALCON assembly of each haplotype was scaffolded with 10x Genomics reads and superscaffolded with BioNano Maps to build reference quality assemblies of sire and dam haplotypes of 2.63 Gb and 2.64 Gb with just 59 and 64 scaffolds and N50 of 81.98 Mb and 83.23 Mb, respectively. BUSCO single copy core gene set coverage was > 91.25%, and gVolante-CEGMA completeness was > 96.14% for both haplotypes. Finally, RaGOO was used to order and build the chromosomal level assembly with 25 scaffolds and N50 of 117.48 Mb (sire haplotype) and 118.51 Mb (dam haplotype). The improved haplotype phased genome assembly of river buffalo may provide valuable resources to discover molecular mechanisms related to milk production and reproduction traits.
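
Several abstracts on this page report scaffold or contig N50 values. As a reminder of what that statistic measures, here is a minimal sketch of the standard N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The function name and the toy lengths are illustrative assumptions and are not taken from any of the studies listed.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: cumulative sum always reaches half the total")


# Toy example with made-up lengths (not data from the buffalo assembly).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

A higher N50 with fewer scaffolds, as reported for the chromosomal level assembly above, indicates that most of the genome sits in long, contiguous sequences.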


April 21, 2020

Hemimetabolous insects elucidate the origin of sexual development via alternative splicing

Insects are the only animals in which sexual differentiation is controlled by sex-specific RNA splicing. The doublesex (dsx) transcription factor produces distinct male and female protein isoforms (DsxM and DsxF) under the control of the RNA splicing factor transformer (tra). tra itself is also alternatively spliced so that a functional Tra protein is only present in females; thus, DsxM is produced by default, while DsxF expression requires Tra. The sex-specific Dsx isoforms are essential for both male and female sexual differentiation. This pathway is profoundly different from the molecular mechanisms that control sex-specific development in other animal groups. In animals as different as vertebrates, nematodes, and crustaceans, sexual differentiation involves male-specific transcription of dsx-related transcription factors that are not alternatively spliced and play no role in female sexual development. To understand how the unique splicing-based mode of sexual differentiation found in insects evolved from a more ancestral transcription-based mechanism, we examined dsx and tra expression in three basal, hemimetabolous insect orders. We find that functional Tra protein is limited to females in the kissing bug Rhodnius prolixus (Hemiptera), but is present in both sexes in the louse Pediculus humanus (Phthiraptera) and the cockroach Blattella germanica (Blattodea). Although alternatively spliced dsx isoforms are seen in all these insects, they are sex-specific in the cockroach and the kissing bug but not in the louse. In B. germanica, RNAi experiments show that dsx is necessary for male, but not female, sexual differentiation, while tra controls female development via a dsx-independent pathway. 
Our results suggest that the distinctive insect mechanism based on the tra-dsx splicing cascade evolved in a gradual, mosaic process: sex-specific splicing of dsx predates its role in female sexual differentiation, while the role of tra in regulating dsx splicing and in sexual development more generally predates sex-specific expression of the Tra protein. We present a model where the canonical tra-dsx axis originated via merger between expanding dsx function (from males to both sexes) and narrowing tra function (from a general splicing factor to the dedicated regulator of dsx).


April 21, 2020

Trochodendron aralioides, the first chromosome-level draft genome in Trochodendrales and a valuable resource for basal eudicot research

Background: The wheel tree (Trochodendron aralioides) is one of only two species in the basal eudicot order Trochodendrales. Together with Tetracentron sinense, the family is unique in having secondary xylem without vessel elements, long considered to be a primitive character also found in Amborella and Winteraceae. Recent studies, however, have shown that Trochodendraceae belong to basal eudicots and demonstrate this represents an evolutionary reversal for the group. Trochodendron aralioides is widespread in cultivation and popular for use in gardens and parks. Findings: We assembled the T. aralioides genome using a total of 679.56 Gb of clean reads that were generated using both PacBio and Illumina short reads in combination with 10X Genomics and Hi-C data. Nineteen scaffolds corresponding to 19 chromosomes were assembled to a final size of 1.614 Gb with a scaffold N50 of 73.37 Mb in addition to 1,534 contigs. Repeat sequences accounted for 64.226% of the genome, and 35,328 protein-coding genes with an average of 5.09 exons per gene were annotated using de novo, RNA-seq, and homology-based approaches. According to a phylogenetic analysis of protein-coding genes, T. aralioides diverged in a basal position relative to core eudicots, approximately 121.8-125.8 million years ago. Conclusions: Trochodendron aralioides is the first chromosome-scale genome assembled in the order Trochodendrales. It represents the largest genome assembled to date in the basal eudicot grade, as well as the closest order relative to the core eudicots, as the position of Buxales remains unresolved. This genome will support further studies of wood morphology and floral evolution, and will be an essential resource for understanding rapid changes that took place at the base of the Eudicot tree. Finally, it can serve as a valuable source to aid both the acceleration of genome-assisted improvement for cultivation and conservation efforts of the wheel tree.


April 21, 2020

Convergent evolution of linked mating-type loci in basidiomycetes: an ancient fusion event that has stood the test of time

Sexual development is a key evolutionary innovation of eukaryotes. In many species, mating involves interaction between compatible mating partners that can undergo cell and nuclear fusion and subsequent steps of development including meiosis. Mating compatibility in fungi is governed by mating type determinants, which are localized at mating type (MAT) loci. In basidiomycetes, the ancestral state is hypothesized to be tetrapolar (bifactorial), with two genetically unlinked MAT loci containing homeodomain transcription factor genes (HD locus) and pheromone and pheromone receptor genes (P/R locus), respectively. Alleles at both loci must differ between mating partners for completion of sexual development. However, there are also basidiomycete species with bipolar (unifactorial) mating systems, which can arise through genomic linkage of the HD and P/R loci. In the order Tremellales, which is comprised of mostly yeast-like species, bipolarity is found only in the human pathogenic Cryptococcus species. Here, we describe the analysis of MAT loci from the Trichosporonales, a sister order to the Tremellales. We analyzed genome sequences from 29 strains that belong to 24 species, including two new genome sequences generated in this study. Interestingly, in all of the species analyzed, the MAT loci are fused and a single HD gene is present in each mating type. This is similar to the organization in the pathogenic Cryptococci, which also have linked MAT loci and carry only one HD gene per MAT locus instead of the usual two HD genes found in the vast majority of basidiomycetes. However, the HD and P/R allele combinations in the Trichosporonales are different from those in the pathogenic Cryptococcus species. 
The differences in allele combinations compared to the bipolar Cryptococci as well as the existence of tetrapolar Tremellales sister species suggest that fusion of the HD and P/R loci and differential loss of one of the two HD genes per MAT allele occurred independently in the Trichosporonales and pathogenic Cryptococci. This finding supports the hypothesis of convergent evolution at the molecular level towards fused mating-type regions in fungi, similar to previous findings in other fungal groups. Unlike the fused MAT loci in several other basidiomycete lineages though, the gene content and gene order within the fused MAT loci are highly conserved in the Trichosporonales, and there is no apparent suppression of recombination extending from the MAT loci to adjacent chromosomal regions, suggesting different mechanisms for the evolution of physically linked MAT loci in these groups.


April 21, 2020

Antibiotic production is organized by a division of labour in Streptomyces

One of the hallmark behaviors of social groups is division of labour, where different group members become specialized to carry out complementary tasks. By dividing labour, cooperative groups of individuals increase their efficiency, thereby raising group fitness even if these specialized behaviors reduce the fitness of individual group members. Here we provide evidence that antibiotic production in colonies of the multicellular bacterium Streptomyces coelicolor is coordinated by a division of labour. We show that S. coelicolor colonies are genetically heterogeneous due to massive amplifications and deletions to the chromosome. Cells with gross chromosomal changes produce an increased diversity of secondary metabolites and secrete significantly more antibiotics; however, these changes come at the cost of dramatically reduced individual fitness, providing direct evidence for a trade-off between secondary metabolite production and fitness. Finally, we show that colonies containing mixtures of mutant strains and their parents produce significantly more antibiotics, while colony-wide spore production remains unchanged. Our work demonstrates that by generating mutants that are specialized to hyper-produce antibiotics, streptomycetes reduce the colony-wide fitness costs of secreted secondary metabolites while maximizing the yield and diversity of these products.


April 21, 2020

Multiple Long-read Sequencing Survey of Herpes Simplex Virus Lytic Transcriptome

Long-read sequencing (LRS) has become increasingly important in RNA research due to its strength in resolving complex transcriptomic architectures. In this regard, currently two LRS platforms have demonstrated adequate performance: the Single Molecule Real-Time Sequencing by Pacific Biosciences (PacBio) and the nanopore sequencing by Oxford Nanopore Technologies (ONT). Even though these techniques produce lower coverage and are more error prone than short-read sequencing, they continue to be more successful in identifying transcript isoforms including polycistronic and multi-spliced RNA molecules, as well as transcript overlaps. Recent reports have successfully applied LRS for the investigation of the transcriptome of viruses belonging to various families. These studies have substantially increased the number of previously known viral RNA molecules. In this work, we used the Sequel and MinION technique from PacBio and ONT, respectively, to characterize the lytic transcriptome of the herpes simplex virus type 1 (HSV-1). In most samples, we analyzed the poly(A) fraction of the transcriptome, but we also performed random oligonucleotide-based sequencing. Besides cDNA sequencing, we also carried out native RNA sequencing. Our investigations identified more than 160 previously undetected transcripts, including coding and non-coding RNAs, multi-splice transcripts, as well as polycistronic and complex transcripts. Furthermore, we determined previously unsubstantiated transcriptional start sites, polyadenylation sites, and splice sites. A large number of novel transcriptional overlaps were also detected. Random-primed sequencing revealed that each convergent gene pair produces non-polyadenylated read-through RNAs overlapping the partner genes. Furthermore, we identified novel replication-associated transcripts overlapping the HSV-1 replication origins, and novel LAT variants with very long 5’ regions, which are co-terminal with the LAT-0.7kb transcript. 
Overall, our results demonstrated that the HSV-1 transcripts form an extremely complex pattern of overlaps, and that the entire viral genome is transcriptionally active. In most viral genes, if not in all, both DNA strands are expressed.


April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate approximation of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive due to the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly-available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. A combination of these assemblies suggests a diploid human genome contains at least 50% more intact LINE-1s than previous estimates: in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation, to date, of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (> 6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered. 
This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.


April 21, 2020

Insights into the bacterial species and communities of a full-scale anaerobic/anoxic/oxic wastewater treatment plant by using third-generation sequencing.

For the first time, a full-length 16S rRNA sequencing method was applied to disclose the bacterial species and communities of a full-scale wastewater treatment plant using an anaerobic/anoxic/oxic (A/A/O) process in Wuhan, China. The compositions of the bacteria at phylum and class levels in the activated sludge were similar to those revealed by Illumina MiSeq sequencing. At genus and species levels, third-generation sequencing showed great merits and accuracy. Typical functional taxa classified as ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifying bacteria (DB), anaerobic ammonium oxidation bacteria (ANAMMOXB) and polyphosphate-accumulating organisms (PAOs) were presented, which were Nitrosomonas (1.11%), Nitrospira (3.56%), Pseudomonas (3.88%), Planctomycetes (13.80%), and Comamonadaceae (1.83%), respectively. Pseudomonas (3.88%) and Nitrospira (3.56%) were the two most predominant genera, mainly containing Pseudomonas extremaustralis (1.69%) and Nitrospira defluvii (3.13%), respectively. Bacteria related to nitrogen and phosphorus removal at species level were put forward. The predicted functions indicated that the A/A/O process was efficient regarding nitrogen and organics removal. Copyright © 2019 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.


April 21, 2020

Variant Phasing and Haplotypic Expression from Single-molecule Long-read Sequencing in Maize

Haplotype phasing of genetic variants is important for interpretation of the maize genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics study. In this study, we performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental and hybrid lines exhibit different splicing activities. After phasing 6,847 genes in two reciprocal hybrids using embryo, endosperm and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, different novel isoforms between maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase power and accuracy in studies of allelic expression.


April 21, 2020

ORF Capture-Seq: a versatile method for targeted identification of full-length isoforms

Most human protein-coding genes are expressed as multiple isoforms. This in turn greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every gene, the majority of alternative isoforms remain uncharacterized experimentally. This is primarily due to: i) vast differences in overall levels between different isoforms expressed from common genes, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude, compared to an unenriched sample. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.


April 21, 2020

IncC blaKPC-2-positive plasmid characterized from ST648 Escherichia coli.

This study describes the characterization of type 2 IncC plasmids pC-Ec20-KPC and pC-Ec2-KPC, carrying the blaKPC-2 gene, from two multiresistant E. coli recovered in the University Hospital of Larissa, in 2018. Escherichia coli, Ec-2Lar and Ec-20Lar, were recovered from rectal swabs from two patients, during the monthly surveillance cultures. Transfer experiments by conjugation were carried out with E. coli recipients. blaKPC-carrying plasmids were characterized by S1 profiling. Isolates were typed by MLST. Whole bacterial genome was sequenced using the Sequel platform. Both E. coli isolates, belonging to ST648, transferred the blaKPC-2 to E. coli A15 laboratory strain by conjugation. Plasmid analysis revealed that the transconjugants harbored blaKPC-positive plasmids of different sizes. Analysis of plasmid sequences showed that, in both isolates, the blaKPC-2 gene was carried on type 2 IncC plasmids pC-Ec20-KPC and pC-Ec2-KPC. Both plasmids carried the ARI-B resistance island, which consisted of several resistance genes, intact and truncated copies of several mobile elements, and a 25,571-bp segment harboring coding sequences for an iron transporter. The blaKPC-2 gene was part of the transposon Tn4401a, which was bounded by direct repeats of 5 bp (TCCTT) suggesting its transposition into the IncC plasmids. To our knowledge, this is the first report on complete nucleotide sequences of type 2 IncC plasmids. These findings, which hypothesize the acquisition of KPC-2-encoding transposon Tn4401a by an IncC replicon, indicate the ongoing need for molecular surveillance studies of MDR pathogens. Additionally, they underline the increasing clinical importance of the IncC plasmid family. Copyright © 2019. Published by Elsevier Ltd.


April 21, 2020

Complete genome sequences of pooled genomic DNA from 10 marine bacteria using PacBio long-read sequencing.

High-quality, completed genomes are important to understand the functions of marine bacteria. PacBio sequencing technology provides a powerful way to obtain high-quality completed genomes. However, individual library production is currently still costly, limiting the utility of the PacBio system for high-throughput genomics. Here we investigate how to generate high-quality genomes from pooled marine bacterial genomes. Pooled genomic DNA from 10 marine bacteria were subjected to a single library production and sequenced with eight SMRT cells on the PacBio RS II sequencing platform. In total, 7.35 Gbp of long-read data was generated, which is equivalent to an approximate 168× average coverage for the input genomes. Genome assembly showed that eight genomes with average nucleotide identities (ANI) lower than 91.4% can be assembled with high quality and completion using standard assembly algorithms (e.g. HGAP or Canu). A reference-based reads phasing step was developed and incorporated to assemble the complete genomes of the remaining two marine bacteria that had an ANI > 97% and whose initial assemblies were highly fragmented. Ten complete high-quality genomes of marine bacteria were generated. The findings and developments made here, including the reference-based read phasing approach for the assembly of highly similar genomes, can be used in the future to design strategies to sequence pooled genomes using long-read sequencing. Copyright © 2019. Published by Elsevier B.V.
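
The coverage figure in this abstract follows from the usual relation, average coverage = total sequenced bases / total genome size. The sketch below applies that relation to the two numbers given (7.35 Gbp and roughly 168×). The combined genome size is not stated in the abstract, so the value printed here is back-calculated and should be read as an approximation; the function names are illustrative.

```python
def average_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size implied by a data yield and a reported average coverage."""
    return total_bases / coverage


TOTAL_BASES = 7.35e9       # 7.35 Gbp of long-read data, from the abstract
REPORTED_COVERAGE = 168.0  # approximate average coverage, from the abstract

# Back-calculated combined size of the 10 pooled genomes (not stated in the abstract).
combined_size = implied_genome_size(TOTAL_BASES, REPORTED_COVERAGE)
print(f"implied combined genome size: {combined_size / 1e6:.2f} Mbp")
print(f"implied mean size per genome: {combined_size / 10 / 1e6:.2f} Mbp")
```

The average hides variation between the pooled genomes: if the DNA was pooled unevenly, or the genomes differ in size, individual genomes will receive more or less than the mean depth.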


April 21, 2020

Genome analysis and Hi-C assisted assembly of Elaeagnus angustifolia L., a deciduous tree belonging to Elaeagnaceae

Elaeagnus angustifolia L. is a deciduous tree of the Elaeagnaceae family. It is widely used in the study of abiotic stress tolerance in plants and for the improvement of desertification-affected land due to its drought resistance, salt tolerance, cold resistance, wind resistance, and other environmental adaptations. Here, we report the complete genome sequencing using the Pacific Biosciences (PacBio) platform and Hi-C assisted assembly of E. angustifolia. A total of 44.27 Gb raw PacBio Sequel reads were obtained after filtering out low-quality data, with an average length of 8.64 Kb. Assembly using Canu gave an assembly length of 781.09 Mb, with a contig N50 of 486.92 Kb. A total of 39.56 Gb of clean reads was obtained, with a sequencing coverage of 75×, and Q30 ratio > 95.46%. The 510.71 Mb genomic sequence was mapped to the chromosome, accounting for 96.94% of the total length of the sequence, and the corresponding number of sequences was 269, accounting for 45.83% of the total number of sequences. The genome sequence study of E. angustifolia can be a valuable source for the comparative genome analysis of the Elaeagnaceae family members, and can help to understand the evolutionary response mechanisms of the Elaeagnaceae to drought, salt, cold and wind resistance, and thereby provide effective theoretical support for the improvement of desertification-affected land.

