April 21, 2020

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020

Chlorella vulgaris genome assembly and annotation reveals the molecular basis for metabolic acclimation to high light conditions.

Chlorella vulgaris is a fast-growing fresh-water microalga cultivated at the industrial scale for applications ranging from food to biofuel production. To advance our understanding of its biology and to establish genetic tools for biotechnological manipulation, we sequenced the nuclear and organelle genomes of Chlorella vulgaris 211/11P by combining next-generation sequencing and optical mapping of isolated DNA molecules. This hybrid approach allowed us to assemble the nuclear genome into 14 pseudo-molecules with an N50 of 2.8 Mb and 98.9% of the genome scaffolded. The integration of RNA-seq data obtained at two different growth irradiances (high light, HL, versus low light, LL) enabled the identification of 10,724 nuclear genes, coding for 11,082 transcripts. Moreover, 121 and 48 genes were found in the chloroplast and mitochondrial genomes, respectively. Functional annotation and expression analysis of nuclear, chloroplast and mitochondrial genome sequences revealed peculiar features of Chlorella vulgaris. Evidence of horizontal gene transfers from the chloroplast to the mitochondrial genome was observed. Furthermore, comparative transcriptomic analyses of LL vs. HL provide insights into the molecular basis for metabolic rearrangement in HL vs. LL conditions leading to enhanced de novo fatty acid biosynthesis and triacylglycerol accumulation. The occurrence of a cytosolic fatty acid biosynthetic pathway can be predicted and its upregulation upon HL exposure is observed, consistent with the increased lipid amount under HL. These data provide a rich genetic resource for future genome editing studies, and potential targets for biotechnological manipulation of Chlorella vulgaris or other microalgae species to improve biomass and lipid productivity. This article is protected by copyright. All rights reserved.


April 21, 2020

Insights into the bacterial species and communities of a full-scale anaerobic/anoxic/oxic wastewater treatment plant by using third-generation sequencing.

For the first time, a full-length 16S rRNA sequencing method was applied to disclose the bacterial species and communities of a full-scale wastewater treatment plant using an anaerobic/anoxic/oxic (A/A/O) process in Wuhan, China. The compositions of the bacteria at the phylum and class levels in the activated sludge were similar to those revealed by Illumina MiSeq sequencing. At the genus and species levels, third-generation sequencing showed great merit and accuracy. Typical functional taxa classified as ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifying bacteria (DB), anaerobic ammonium oxidation bacteria (ANAMMOXB) and polyphosphate-accumulating organisms (PAOs) were identified: Nitrosomonas (1.11%), Nitrospira (3.56%), Pseudomonas (3.88%), Planctomycetes (13.80%) and Comamonadaceae (1.83%), respectively. Pseudomonas (3.88%) and Nitrospira (3.56%) were the two most predominant genera, mainly containing Pseudomonas extremaustralis (1.69%) and Nitrospira defluvii (3.13%), respectively. Bacteria related to nitrogen and phosphorus removal were proposed at the species level. The predicted functions indicated that the A/A/O process was efficient regarding nitrogen and organics removal. Copyright © 2019 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.


April 21, 2020

Insect genomes: progress and challenges.

In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.


April 21, 2020

Microbial diversity in the tick Argas japonicus (Acari: Argasidae) with a focus on Rickettsia pathogens.

The soft tick Argas japonicus mainly infests birds and can cause human dermatitis; however, no pathogen has been identified from this tick species in China. In the present study, the microbiota in A. japonicus collected from an epidemic community was explored, and some putative Rickettsia pathogens were further characterized. The results obtained indicated that bacteria in A. japonicus were mainly ascribed to the phyla Proteobacteria, Firmicutes and Actinobacteria. At the genus level, the male A. japonicus harboured more diverse bacteria than the females and nymphs. The bacteria Alcaligenes, Pseudomonas, Rickettsia and Staphylococcus were common in nymphs and adults. The abundance of bacteria belonging to the Rickettsia genus in females and males was 7.27% and 10.42%, respectively. Furthermore, the 16S rRNA gene of Rickettsia was amplified and sequenced, and phylogenetic analysis revealed that 13 sequences were clustered with the spotted fever group rickettsiae (Rickettsia heilongjiangensis and Rickettsia japonica) and three were clustered with Rickettsia limoniae, which suggested that the characterized Rickettsia in A. japonicus were novel putative pathogens and also that the residents were at considerable risk for infection by tick-borne pathogens. © 2019 The Royal Entomological Society.


April 21, 2020

Virus-host coexistence in phytoplankton through the genomic lens

Phytoplankton-virus interactions are major determinants of geochemical cycles in the oceans. Viruses are responsible for the redirection of carbon and nutrients away from larger organisms back towards microorganisms via the lysis of microalgae in a process coined the “viral shunt”. Virus-host interactions are generally expected to follow “boom and bust” dynamics, whereby a numerically dominant strain is lysed and replaced by a virus-resistant strain. Here, we isolated a microalga, Ostreococcus mediterraneus, and its infective nucleo-cytoplasmic large DNA virus (NCLDV) concomitantly from the surface waters of the NW Mediterranean Sea, and show continuous growth in culture of both the microalga and the virus. Evolution experiments through single-cell bottlenecks demonstrate that, in the absence of the virus, susceptible cells evolve from one ancestral resistant single cell, and vice versa; that is, resistant cells evolve from one ancestral susceptible cell. This provides evidence that the observed sustained viral production is the consequence of a minority of virus-susceptible cells. The emergence of these cells is explained by low-level phase switching between virus-resistant and virus-susceptible phenotypes, akin to a bet-hedging strategy. Whole-genome sequencing and analysis of the ~14 Mb microalga and the ~200 kb virus point towards ancient speciation of the microalga within the Ostreococcus species complex and frequent gene exchanges between prasinoviruses infecting Ostreococcus species. Re-sequencing of one susceptible strain demonstrated that the phase switch involved a large 60 kb deletion of one chromosome. This chromosome is an outlier chromosome compared to the streamlined, gene-dense, GC-rich standard chromosomes, as it contains many repeats and few orthologous genes. While this chromosome has been described in three different genera, its size increments have been previously associated with antiviral immunity and resistance in another species from the same genus. Mathematical modelling of this mechanism predicts microalga-virus population dynamics consistent with the observation of continuous growth of both virus and microalga. Altogether, our results suggest a previously overlooked strategy in phytoplankton-virus interactions.


April 21, 2020

The replication-competent HIV-1 latent reservoir is primarily established near the time of therapy initiation.

Although antiretroviral therapy (ART) is highly effective at suppressing HIV-1 replication, the virus persists as a latent reservoir in resting CD4+ T cells during therapy. This reservoir forms even when ART is initiated early after infection, but the dynamics of its formation are largely unknown. The viral reservoirs of individuals who initiate ART during chronic infection are generally larger and genetically more diverse than those of individuals who initiate therapy during acute infection, consistent with the hypothesis that the reservoir is formed continuously throughout untreated infection. To determine when viruses enter the latent reservoir, we compared sequences of replication-competent viruses from resting peripheral CD4+ T cells from nine HIV-positive women on therapy to viral sequences circulating in blood collected longitudinally before therapy. We found that, on average, 71% of the unique viruses induced from the post-therapy latent reservoir were most genetically similar to viruses replicating just before ART initiation. This proportion is far greater than would be expected if the reservoir formed continuously and was always long lived. We conclude that ART alters the host environment in a way that allows the formation or stabilization of most of the long-lived latent HIV-1 reservoir, which points to new strategies targeted at limiting the formation of the reservoir around the time of therapy initiation. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.


April 21, 2020

Complete Genome Sequence of Enterococcus faecalis Strain SGAir0397, Isolated from a Tropical Air Sample Collected in Singapore.

Enterococcus faecalis strain SGAir0397 was isolated from a tropical air sample collected in Singapore. Its genome was assembled using single-molecule real-time sequencing data and comprises one circular chromosome with a length of 2.69 Mbp. The genome contains 2,595 protein-coding genes, 59 tRNAs, and 12 rRNAs. Copyright © 2019 Purbojati et al.


April 21, 2020

Relative Performance of MinION (Oxford Nanopore Technologies) versus Sequel (Pacific Biosciences) Third-Generation Sequencing Instruments in Identification of Agricultural and Forest Fungal Pathogens.

Culture-based molecular identification methods have revolutionized detection of pathogens, yet these methods are slow and may yield inconclusive results from environmental materials. The second-generation sequencing tools have much-improved precision and sensitivity of detection, but these analyses are costly and may take several days to months. Of the third-generation sequencing techniques, the portable MinION device (Oxford Nanopore Technologies) has received much attention because of its small size and possibility of rapid analysis at reasonable cost. Here, we compare the relative performances of two third-generation sequencing instruments, MinION and Sequel (Pacific Biosciences), in identification and diagnostics of fungal and oomycete pathogens from conifer (Pinaceae) needles and potato (Solanum tuberosum) leaves and tubers. We demonstrate that the Sequel instrument is efficient for metabarcoding of complex samples, whereas MinION is not suited for this purpose due to a high error rate and multiple biases. However, we find that MinION can be utilized for rapid and accurate identification of dominant pathogenic organisms and other associated organisms from plant tissues following both amplicon-based and PCR-free metagenomics approaches. Using the metagenomics approach with shortened DNA extraction and incubation times, we performed the entire MinION workflow, from sample preparation through DNA extraction, sequencing, bioinformatics, and interpretation, in 2.5 h. We advocate the use of MinION for rapid diagnostics of pathogens and potentially other organisms, but care needs to be taken to control or account for multiple potential technical biases.
IMPORTANCE Microbial pathogens cause enormous losses to agriculture and forestry, but current combined culturing- and molecular identification-based detection methods are too slow for rapid identification and application of countermeasures. Here, we develop new and rapid protocols for Oxford Nanopore MinION-based third-generation diagnostics of plant pathogens that greatly improve the speed of diagnostics. However, due to high error rate and technical biases in MinION, the Pacific Biosciences Sequel platform is more useful for in-depth amplicon-based biodiversity monitoring (metabarcoding) from complex environmental samples. Copyright © 2019 American Society for Microbiology.


April 21, 2020

A Novel Bacteriophage Exclusion (BREX) System Encoded by the pglX Gene in Lactobacillus casei Zhang.

The bacteriophage exclusion (BREX) system is a novel prokaryotic defense system against bacteriophages. To our knowledge, no study has systematically characterized the function of the BREX system in lactic acid bacteria. Lactobacillus casei Zhang is a probiotic bacterium originating from koumiss. By using single-molecule real-time sequencing, we previously identified N6-methyladenine (m6A) signatures in the genome of L. casei Zhang and a putative methyltransferase (MTase), namely, pglX. This work further analyzed the genomic locus near the pglX gene and identified it as a component of the BREX system. To decipher the biological role of pglX, an L. casei Zhang pglX mutant (ΔpglX) was constructed. Interestingly, m6A methylation of the 5′-ACRCAG-3′ motif was eliminated in the ΔpglX mutant. The wild-type and mutant strains exhibited no significant difference in morphology or growth performance in de Man-Rogosa-Sharpe (MRS) medium. A significantly higher plasmid acquisition capacity was observed for the ΔpglX mutant than for the wild type if the transformed plasmids contained pglX recognition sites (i.e., 5′-ACRCAG-3′). In contrast, no significant difference was observed in plasmid transformation efficiency between the two strains when plasmids lacking pglX recognition sites were tested. Moreover, the ΔpglX mutant had a lower capacity to retain the plasmids than the wild type, suggesting a decrease in genetic stability. Since the REBASE database predicted that the L. casei PglX protein was bifunctional, as both an MTase and a restriction endonuclease, the PglX protein was heterologously expressed and purified but failed to show restriction endonuclease activity. Taken together, the results show that the L. casei Zhang pglX gene is a functional adenine MTase that belongs to the BREX system.
IMPORTANCE Lactobacillus casei Zhang is a probiotic that confers beneficial effects on the host, and it is thus increasingly used in the dairy industry. The possession of an effective bacterial immune system that can defend against invasion of phages and exogenous DNA is a desirable feature for industrial bacterial strains. The bacteriophage exclusion (BREX) system is a recently described phage resistance system in prokaryotes. This work confirmed the function of the BREX system in L. casei and that the methyltransferase (pglX) is an indispensable part of the system. Overall, our study characterizes a BREX system component gene in lactic acid bacteria. Copyright © 2019 American Society for Microbiology.
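
To illustrate what counting pglX recognition sites on a plasmid might involve, here is a minimal sketch that locates occurrences of the degenerate motif 5′-ACRCAG-3′ (R = A or G under IUPAC nucleotide codes) on both strands of a DNA string. The function names and the test sequences are hypothetical and are not taken from the study; the authors' actual analysis pipeline is not described in the abstract.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the degenerate codes (R <-> Y, K <-> M, ...).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of a (possibly degenerate) motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a lookahead regex so that overlapping matches are reported."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(sequence, motif="ACRCAG"):
    """List (0-based start, strand) for every motif occurrence on either strand.

    Minus-strand hits are reported by their start coordinate on the plus strand.
    """
    sequence = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(pattern).finditer(sequence):
            hits.append((match.start(), strand))
    return sorted(hits)
```

A plasmid sequence with an empty result from `find_sites` would correspond to the "plasmids lacking pglX recognition sites" case described above, under the assumption that only this one motif matters.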


April 21, 2020

A Highly Unusual V1 Region of Env in an Elite Controller of HIV Infection.

HIV elite controllers represent a remarkable minority of patients who maintain normal CD4+ T-cell counts and low or undetectable viral loads for decades in the absence of antiretroviral therapy. To examine the possible contribution of virus attenuation to elite control, we obtained a primary HIV-1 isolate from an elite controller who had been infected for 19 years, the last 10 of which were in the absence of antiretroviral therapy. Full-length sequencing of this isolate revealed a highly unusual V1 domain in Envelope (Env). The V1 domain in this HIV-1 strain was 49 amino acids, placing it in the top 1% of lengths among the 6,112 Env sequences in the Los Alamos National Laboratory online database. Furthermore, it included two additional N-glycosylation sites and a pair of cysteines suggestive of an extra disulfide loop. Virus with this Env retained good infectivity and replicative capacity; however, analysis of recombinant viruses suggested that other sequences in Env were adapted to accommodate the unusual V1 domain. While the long V1 domain did not confer resistance to neutralization by monoclonal antibodies of the V1/V2-glycan-dependent class, it did confer resistance to neutralization by monoclonal antibodies of the V3-glycan-dependent class. Our findings support results in the literature that suggest a role for long V1 regions in shielding HIV-1 from recognition by V3-directed broadly neutralizing antibodies. In the case of the elite controller described here, it seems likely that selective pressures from the humoral immune system were responsible for driving the highly unusual polymorphisms present in this HIV-1 Envelope.
IMPORTANCE Elite controllers have long provided an avenue for researchers to reveal mechanisms underlying control of HIV-1. While the role of host genetic factors in facilitating elite control is well known, the possibility of infection by attenuated strains of HIV-1 has been much less studied. Here we describe an unusual viral feature found in an elite controller of HIV-1 infection and demonstrate its role in conferring escape from monoclonal antibodies of the V3-glycan class. Our results suggest that extreme variation may be needed by HIV-1 to escape neutralization by some antibody specificities. Copyright © 2019 Silver et al.


April 21, 2020

A chromosome-level draft genome of the grain aphid Sitobion miscanthi.

Sitobion miscanthi is an ideal model for studying host plant specificity, parthenogenesis-based phenotypic plasticity, and interactions between insects and other species of various trophic levels, such as viruses, bacteria, plants, and natural enemies. However, the genome of this species has not yet been sequenced and published. Here, we analyzed the entire genome of a parthenogenetic female aphid colony using Pacific Biosciences long-read sequencing and Hi-C data to generate chromosome-length scaffolds and a highly contiguous genome assembly. The final draft genome assembly from 33.88 Gb of raw data was ~397.90 Mb in size, with a 2.05 Mb contig N50. Nine chromosomes were further assembled based on Hi-C data to a 377.19 Mb final size with a 36.26 Mb scaffold N50. The identified repeat sequences accounted for 26.41% of the genome, and 16,006 protein-coding genes were annotated. According to the phylogenetic analysis, S. miscanthi is closely related to Acyrthosiphon pisum, with S. miscanthi diverging from their common ancestor ~25.0 to 44.9 million years ago. We generated a high-quality draft of the S. miscanthi genome. This genome assembly should help promote research on the lifestyle and feeding specificity of aphids and their interactions with each other and species at other trophic levels. It can serve as a resource for accelerating genome-assisted improvements in insecticide resistance management and environmentally safe aphid management. © The Author(s) 2019. Published by Oxford University Press.
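
For readers unfamiliar with the contig and scaffold N50 figures quoted above, the following is a minimal sketch of the usual definition: N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name and the toy lengths are hypothetical; the abstract does not say which tool the authors used to compute their values.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length of
    the sequence at which the running total first reaches half of the
    combined length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (lengths in arbitrary units, not data from the assembly above).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

In the toy example the total is 290, and the two longest sequences (80 + 70 = 150) already cover half of it, so the N50 is 70.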


April 21, 2020

Medusavirus, a Novel Large DNA Virus Discovered from Hot Spring Water.

Recent discoveries of new large DNA viruses reveal high diversity in their morphologies, genetic repertoires, and replication strategies. Here, we report the novel features of medusavirus, a large DNA virus newly isolated from hot spring water in Japan. Medusavirus, with a diameter of 260 nm, shows a T=277 icosahedral capsid with unique spherical-headed spikes on its surface. It has a 381-kb genome encoding 461 putative proteins, 86 of which have their closest homologs in Acanthamoeba, whereas 279 (61%) are orphan genes. The virus lacks the genes encoding DNA topoisomerase II and RNA polymerase, showing that DNA replication takes place in the host nucleus, whereas the progeny virions are assembled in the cytoplasm. Furthermore, the medusavirus genome harbored genes for all five types of histones (H1, H2A, H2B, H3, and H4) and one DNA polymerase, which are phylogenetically placed at the root of the eukaryotic clades. In contrast, the host amoeba encoded many medusavirus homologs, including the major capsid protein. These facts strongly suggested that amoebae are indeed the most promising natural hosts of medusavirus, and that lateral gene transfers have taken place repeatedly and bidirectionally between the virus and its host since the early stage of their coevolution. Medusavirus reflects the traces of direct evolutionary interactions between the virus and eukaryotic hosts, which may be caused by sharing the DNA replication compartment and by evolutionarily long-lasting virus-host relationships. Based on its unique morphological characteristics and phylogenomic relationships with other known large DNA viruses, we propose that medusavirus represents a new family, Medusaviridae.
IMPORTANCE We have isolated a new nucleocytoplasmic large DNA virus (NCLDV) from hot spring water in Japan, named medusavirus. This new NCLDV is phylogenetically placed at the root of the eukaryotic clades based on the phylogenies of several key genes, including that encoding DNA polymerase, and its genome surprisingly encodes the full set of histone homologs. Furthermore, its laboratory host, Acanthamoeba castellanii, encodes many medusavirus homologs in its genome, including the major capsid protein, suggesting that the amoeba is the genuine natural host from ancient times of this newly described virus and that lateral gene transfers have repeatedly occurred between the virus and amoeba. These results suggest that medusavirus is a unique NCLDV preserving ancient footprints of evolutionary interactions with its hosts, thus providing clues to elucidate the evolution of NCLDVs, eukaryotes, and virus-host interaction. Based on the dissimilarities with other known NCLDVs, we propose that medusavirus represents a new viral family, Medusaviridae. Copyright © 2019 Yoshikawa et al.

