April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

RNA sequencing: the teenage years.

Over the past decade, RNA sequencing (RNA-seq) has become an indispensable tool for transcriptome-wide analysis of differential gene expression and differential splicing of mRNAs. However, as next-generation sequencing technologies have developed, so too has RNA-seq. Now, RNA-seq methods are available for studying many different aspects of RNA biology, including single-cell gene expression, translation (the translatome) and RNA structure (the structurome). Exciting new applications are being explored, such as spatial transcriptomics (spatialomics). Together with new long-read and direct RNA-seq technologies and better computational tools for data analysis, innovations in RNA-seq are contributing to a fuller understanding of RNA biology, from questions such as when and where transcription occurs to the folding and intermolecular interactions that govern RNA function.


April 21, 2020  |  

Morphological and genomic characterisation of the hybrid schistosome infecting humans in Europe reveals a complex admixture between Schistosoma haematobium and Schistosoma bovis parasites

Schistosomes cause schistosomiasis, the worldtextquoterights second most important parasitic disease after malaria. A peculiar feature of schistosomes is their ability to produce viable and fertile hybrids. Originally only present in the tropics, schistosomiasis is now also endemic in Europe. Based on two genetic markers the European species had been identified as a hybrid between the ruminant-infective Schistosoma bovis and the human-infective Schistosoma haematobium.Here we describe for the first time the genomic composition of the European schistosome hybrid (77% of S. haematobium and 23% of S. bovis origins), its morphometric parameters and its compatibility with the European vector snail and intermediate host Compatibility is a key parameter for the parasites life cycle progression. We also show that egg morphology (a classical diagnostic parameter) does not allow for differential diagnosis while genetic tests do so. Additionally, we performed genome assembly improvement and annotation of S. bovis, the parental species for which no satisfactory genome assembly was available.For the first time since the discovery of hybrid schistosomes, these results reveal at the whole genomic level a complex admixture of parental genomes highlighting (i) the high permeability of schistosomes to other speciestextquoteright alleles, and (ii) the importance of hybrid formation for pushing species boundaries not only conceptionally but also geographically.


April 21, 2020  |  

A microbial factory for defensive kahalalides in a tripartite marine symbiosis.

Chemical defense against predators is widespread in natural ecosystems. Occasionally, taxonomically distant organisms share the same defense chemical. Here, we describe an unusual tripartite marine symbiosis, in which an intracellular bacterial symbiont (“Candidatus Endobryopsis kahalalidefaciens”) uses a diverse array of biosynthetic enzymes to convert simple substrates into a library of complex molecules (the kahalalides) for chemical defense of the host, the alga Bryopsis sp., against predation. The kahalalides are subsequently hijacked by a third partner, the herbivorous mollusk Elysia rufescens, and employed similarly for defense. “Ca E. kahalalidefaciens” has lost many essential traits for free living and acts as a factory for kahalalide production. This interaction between a bacterium, an alga, and an animal highlights the importance of chemical defense in the evolution of complex symbioses.Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020  |  

Resequencing the Genome of Malassezia restricta Strain KCTC 27527.

The draft genome sequence of Malassezia restricta KCTC 27527, a clinical isolate from a patient with dandruff, was previously reported. Using the PacBio Sequel platform, we completed and reannotated the genome of M. restricta KCTC 27527 for a better understanding of the genome of this fungus.Copyright © 2019 Cho et al.


April 21, 2020  |  

Chromosomal-level assembly of the blolsod clam, Scapharca (Anadara) broughtonii, using long sequence reads and Hi-C.

The blood clam, Scapharca (Anadara) broughtonii, is an economically and ecologically important marine bivalve of the family Arcidae. Efforts to study their population genetics, breeding, cultivation, and stock enrichment have been somewhat hindered by the lack of a reference genome. Herein, we report the complete genome sequence of S. broughtonii, a first reference genome of the family Arcidae.A total of 75.79 Gb clean data were generated with the Pacific Biosciences and Oxford Nanopore platforms, which represented approximately 86× coverage of the S. broughtonii genome. De novo assembly of these long reads resulted in an 884.5-Mb genome, with a contig N50 of 1.80 Mb and scaffold N50 of 45.00 Mb. Genome Hi-C scaffolding resulted in 19 chromosomes containing 99.35% of bases in the assembled genome. Genome annotation revealed that nearly half of the genome (46.1%) is composed of repeated sequences, while 24,045 protein-coding genes were predicted and 84.7% of them were annotated.We report here a chromosomal-level assembly of the S. broughtonii genome based on long-read sequencing and Hi-C scaffolding. The genomic data can serve as a reference for the family Arcidae and will provide a valuable resource for the scientific community and aquaculture sector. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020  |  

A draft nuclear-genome assembly of the acoel flatworm Praesagittifera naikaiensis.

Acoels are primitive bilaterians with very simple soft bodies, in which many organs, including the gut, are not developed. They provide platforms for studying molecular and developmental mechanisms involved in the formation of the basic bilaterian body plan, whole-body regeneration, and symbiosis with photosynthetic microalgae. Because genomic information is essential for future research on acoel biology, we sequenced and assembled the nuclear genome of an acoel, Praesagittifera naikaiensis.To avoid sequence contamination derived from symbiotic microalgae, DNA was extracted from embryos that were free of algae. More than 290x sequencing coverage was achieved using a combination of Illumina (paired-end and mate-pair libraries) and PacBio sequencing. RNA sequencing and Iso-Seq data from embryos, larvae, and adults were also obtained. First, a preliminary ~17-kilobase pair (kb) mitochondrial genome was assembled, which was deleted from the nuclear sequence assembly. As a result, a draft nuclear genome assembly was ~656 Mb in length, with a scaffold N50 of 117 kb and a contig N50 of 57 kb. Although ~70% of the assembled sequences were likely composed of repetitive sequences that include DNA transposons and retrotransposons, the draft genome was estimated to contain 22,143 protein-coding genes, ~99% of which were substantiated by corresponding transcripts. We could not find horizontally transferred microalgal genes in the acoel genome. Benchmarking Universal Single-Copy Orthologs analyses indicated that 77% of the conserved single-copy genes were complete. Pfam domain analyses provided a basic set of gene families for transcription factors and signaling molecules.Our present sequencing and assembly of the P. naikaiensis nuclear genome are comparable to those of other metazoan genomes, providing basic information for future studies of genic and genomic attributes of this animal group. Such studies may shed light on the origins and evolution of simple bilaterians. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020  |  

A hybrid de novo assembly of the sea pansy (Renilla muelleri) genome.

More than 3,000 species of octocorals (Cnidaria, Anthozoa) inhabit an expansive range of environments, from shallow tropical seas to the deep-ocean floor. They are important foundation species that create coral “forests,” which provide unique niches and 3-dimensional living space for other organisms. The octocoral genus Renilla inhabits sandy, continental shelves in the subtropical and tropical Atlantic and eastern Pacific Oceans. Renilla is especially interesting because it produces secondary metabolites for defense, exhibits bioluminescence, and produces a luciferase that is widely used in dual-reporter assays in molecular biology. Although several anthozoan genomes are currently available, the majority of these are hexacorals. Here, we present a de novo assembly of an azooxanthellate shallow-water octocoral, Renilla muelleri.We generated a hybrid de novo assembly using MaSuRCA v.3.2.6. The final assembly included 4,825 scaffolds and a haploid genome size of 172 megabases (Mb). A BUSCO assessment found 88% of metazoan orthologs present in the genome. An Augustus ab initio gene prediction found 23,660 genes, of which 66% (15,635) had detectable similarity to annotated genes from the starlet sea anemone, Nematostella vectensis, or to the Uniprot database. Although the R. muelleri genome may be smaller (172 Mb minimum size) than other publicly available coral genomes (256-448 Mb), the R. muelleri genome is similar to other coral genomes in terms of the number of complete metazoan BUSCOs and predicted gene models.The R. muelleri hybrid genome provides a novel resource for researchers to investigate the evolution of genes and gene families within Octocorallia and more widely across Anthozoa. It will be a key resource for future comparative genomics with other corals and for understanding the genomic basis of coral diversity. © The Author(s) 2019. Published by Oxford University Press.


April 21, 2020  |  

Whole-genome sequence of the oriental lung fluke Paragonimus westermani.

Foodborne infections caused by lung flukes of the genus Paragonimus are a significant and widespread public health problem in tropical areas. Approximately 50 Paragonimus species have been reported to infect animals and humans, but Paragonimus westermani is responsible for the bulk of human disease. Despite their medical and economic importance, no genome sequence for any Paragonimus species is available.We sequenced and assembled the genome of P. westermani, which is among the largest of the known pathogen genomes with an estimated size of 1.1 Gb. A 922.8 Mb genome assembly was generated from Illumina and Pacific Biosciences (PacBio) sequence data, covering 84% of the estimated genome size. The genome has a high proportion (45%) of repeat-derived DNA, particularly of the long interspersed element and long terminal repeat subtypes, and the expansion of these elements may explain some of the large size. We predicted 12,852 protein coding genes, showing a high level of conservation with related trematode species. The majority of proteins (80%) had homologs in the human liver fluke Opisthorchis viverrini, with an average sequence identity of 64.1%. Assembly of the P. westermani mitochondrial genome from long PacBio reads resulted in a single high-quality circularized 20.6 kb contig. The contig harbored a 6.9 kb region of non-coding repetitive DNA comprised of three distinct repeat units. Our results suggest that the region is highly polymorphic in P. westermani, possibly even within single worm isolates.The generated assembly represents the first Paragonimus genome sequence and will facilitate future molecular studies of this important, but neglected, parasite group.


April 21, 2020  |  

Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.

Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.


April 21, 2020  |  

Expedited assessment of terrestrial arthropod diversity by coupling Malaise traps with DNA barcoding 1.

Monitoring changes in terrestrial arthropod communities over space and time requires a dramatic increase in the speed and accuracy of processing samples that cannot be achieved with morphological approaches. The combination of DNA barcoding and Malaise traps allows expedited, comprehensive inventories of species abundance whose cost will rapidly decline as high-throughput sequencing technologies advance. Aside from detailing protocols from specimen sorting to data release, this paper describes their use in a survey of arthropod diversity in a national park that examined 21?194 specimens representing 2255 species. These protocols can support arthropod monitoring programs at regional, national, and continental scales.


April 21, 2020  |  

The Genome of Armadillidium vulgare (Crustacea, Isopoda) Provides Insights into Sex Chromosome Evolution in the Context of Cytoplasmic Sex Determination.

The terrestrial isopod Armadillidium vulgare is an original model to study the evolution of sex determination and symbiosis in animals. Its sex can be determined by ZW sex chromosomes, or by feminizing Wolbachia bacterial endosymbionts. Here, we report the sequence and analysis of the ZW female genome of A. vulgare. A distinguishing feature of the 1.72 gigabase assembly is the abundance of repeats (68% of the genome). We show that the Z and W sex chromosomes are essentially undifferentiated at the molecular level and the W-specific region is extremely small (at most several hundreds of kilobases). Our results suggest that recombination suppression has not spread very far from the sex-determining locus, if at all. This is consistent with A. vulgare possessing evolutionarily young sex chromosomes. We characterized multiple Wolbachia nuclear inserts in the A. vulgare genome, none of which is associated with the W-specific region. We also identified several candidate genes that may be involved in the sex determination or sexual differentiation pathways. The A. vulgare genome serves as a resource for studying the biology and evolution of crustaceans, one of the most speciose and emblematic metazoan groups. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


April 21, 2020  |  

Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes.

The commercial release of third-generation sequencing technologies (TGSTs), giving long and ultra-long sequencing reads, has stimulated the development of new tools for assembling highly contiguous genome sequences with unprecedented accuracy across complex repeat regions. We survey here a wide range of emerging sequencing platforms and analytical tools for de novo assembly, provide background information for each of their steps, and discuss the spectrum of available options. Our decision tree recommends workflows for the generation of a high-quality genome assembly when used in combination with the specific needs and resources of a project.Copyright © 2019 Elsevier Ltd. All rights reserved.


April 21, 2020  |  

Mitochondrial DNA and their nuclear copies in the parasitic wasp Pteromalus puparum: A comparative analysis in Chalcidoidea.

Chalcidoidea (chalcidoid wasps) are an abundant and megadiverse insect group with both ecological and economical importance. Here we report a complete mitochondrial genome in Chalcidoidea from Pteromalus puparum (Pteromalidae). Eight tandem repeats followed by 6 reversed repeats were detected in its 3308?bp control region. This long and complex control region may explain failures of amplifying and sequencing of complete mitochondrial genomes in some chalcidoids. In addition to 37 typical mitochondrial genes, an extra identical isoleucine tRNA (trnI) was detected at the opposite end of the control region. This recent mitochondrial gene duplication indicates that gene arrangements in chalcidoids are ongoing. A comparison among available chalcidoid mitochondrial genomes reveals rapid gene order rearrangements overall and high protein substitution rates in most chalcidoid taxa. In addition, we identified 24 nuclear sequences of mitochondrial origin (NUMTs) in P. puparum, summing up to 9989?bp, with 3617?bp of these NUMTs originating from mitochondrial coding regions. NUMTs abundance in P. puparum is only one-twelfth of that in its relative, Nasonia vitripennis. Based on phylogenetic analysis, we provide evidence that a faster nuclear degradation rate contributes to the reduced NUMT numbers in P. puparum. Overall, our study shows unusually high rates of mitochondrial evolution and considerable variation in NUMT accumulation in Chalcidoidea. Copyright © 2018. Published by Elsevier B.V.


April 21, 2020  |  

Mitochondrial genome characterization of Melipona bicolor: Insights from the control region and gene expression data.

The stingless bee Melipona bicolor is the only bee in which true polygyny occurs. Its mitochondrial genome was first sequenced in 2008, but it was incomplete and no information about its transcription was known. We combined short and long reads of M. bicolor DNA with RNASeq data to obtain insights about mitochondrial evolution and gene expression in bees. The complete genome has 15,001?bp, including a control region of 255?bp that contains all conserved structures described in honeybees with the highest AT content reported so far for bees (98.1%), displaying a compact but functional region. Gene expression control is similar to other insects however unusual patterns of expression may suggest the existence of different isoforms for the mitochondrially encoded 12S rRNA. Results reveal unique and shared features of the mitochondrial genome in terms of sequence evolution and gene expression making M. bicolor an interesting model to study mitochondrial genomic evolution. Copyright © 2019 Elsevier B.V. All rights reserved.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.