July 7, 2019

The draft genome of whitefly Bemisia tabaci MEAM1, a global crop pest, provides novel insights into virus transmission, host adaptation, and insecticide resistance.

The whitefly Bemisia tabaci (Hemiptera: Aleyrodidae) is among the 100 worst invasive species in the world. As one of the most important crop pests and virus vectors, B. tabaci causes substantial crop losses and poses a serious threat to global food security. We report the 615-Mb high-quality genome sequence of B. tabaci Middle East-Asia Minor 1 (MEAM1), the first genome sequence in the Aleyrodidae family, which contains 15,664 protein-coding genes. The B. tabaci genome is highly divergent from other sequenced hemipteran genomes, sharing no detectable synteny. A number of known detoxification gene families, including cytochrome P450s and UDP-glucuronosyltransferases, are significantly expanded in B. tabaci. Other expanded gene families, including cathepsins, large clusters of tandemly duplicated B. tabaci-specific genes, and phosphatidylethanolamine-binding proteins (PEBPs), were found to be associated with virus acquisition and transmission and/or insecticide resistance, likely contributing to the global invasiveness and efficient virus transmission capacity of B. tabaci. The B. tabaci genome contains 142 genes horizontally transferred from bacteria or fungi, including genes encoding hopanoid/sterol synthesis and xenobiotic detoxification enzymes that are not present in other insects. These genes offer novel insights into the unique biological adaptations of this insect, such as polyphagy and insecticide resistance. Interestingly, two adjacent bacterial pantothenate biosynthesis genes, panB and panC, have been co-transferred into B. tabaci and fused into a single gene that has acquired introns during its evolution. The B. tabaci genome contains numerous genetic novelties, including expansions in gene families associated with insecticide resistance, detoxification, and virus transmission, as well as numerous horizontally transferred genes from bacteria and fungi. We believe these novelties have likely shaped B. tabaci into a highly invasive polyphagous crop pest and an efficient vector of plant viruses. The genome serves as a reference for resolving the B. tabaci cryptic species complex, understanding fundamental biological novelties, and providing valuable genetic information to assist the development of novel strategies for controlling whiteflies and the viruses they transmit.


July 7, 2019

Complete genome sequence of Rothia aeria type strain JCM 11412, isolated from air in the Russian space laboratory Mir.

Here, we present the complete genome sequence of Rothia aeria type strain JCM 11412, isolated from air in the Russian space laboratory Mir. Reports of infections caused by R. aeria have recently been increasing. The genomic information will help researchers investigate the pathogenicity of this organism. Copyright © 2016 Nambu et al.


July 7, 2019

Complete genome sequence of Streptococcus sp. strain NPS 308.

Streptococcus sp. strain NPS 308, isolated from an 8-year-old girl diagnosed with infective endocarditis, likely represents a novel species of Streptococcus. Here, we present the complete genome sequence of this strain, which will contribute to a better understanding of the pathogenesis of infective endocarditis. Copyright © 2016 Kondo et al.


July 7, 2019

Whole genome sequence and comparative genomics of the novel Lyme borreliosis causing pathogen, Borrelia mayonii.

Borrelia mayonii, a Borrelia burgdorferi sensu lato (Bbsl) genospecies, was recently identified as a cause of Lyme borreliosis (LB) among patients from the upper midwestern United States. By microscopy and PCR, spirochete/genome loads in infected patients were estimated at 10⁵ to 10⁶ per milliliter of blood. Here, we present the full chromosome and plasmid sequences of two B. mayonii isolates, MN14-1420 and MN14-1539, cultured from the blood of two of these patients. Whole genome sequencing and assembly were conducted using PacBio long-read sequencing (Pacific Biosciences RSII instrument) followed by the hierarchical genome-assembly process (HGAP). The B. mayonii genome is ~1.31 Mbp in size (26.9% average GC content) and comprises a linear chromosome, 8 linear plasmids, and 7 circular plasmids. Consistent with its taxonomic designation as a new Bbsl genospecies, the B. mayonii linear chromosome shares only 93.83% average nucleotide identity with other genospecies. Both B. mayonii genomes contain plasmids similar to B. burgdorferi sensu stricto lp54, lp36, lp28-3, lp28-4, lp25, lp17, lp5, 5 cp32s, cp26, and cp9. The vls locus present on lp28-10 of B. mayonii MN14-1420 is remarkably long, comprising 24 silent vls cassettes. Genetic differences between the two B. mayonii genomes are limited and include 15 single nucleotide variations, as well as 7 fewer silent vls cassettes and the absence of the lp5 plasmid in MN14-1539. Notably, homologs of 68 proteins present in B. burgdorferi sensu stricto appear to be lacking from the B. mayonii genomes. These include the complement inhibitor CspZ (BB_H06), the fibronectin-binding protein BB_K32, and multiple lipoproteins and proteins of unknown function. This study shows the utility of long-read sequencing for full genome assembly of Bbsl genomes. It also identifies putative genome regions of B. mayonii that may be linked to clinical manifestation or tissue tropism, and it provides a valuable resource for pathogenicity, diagnostic, and vaccine studies.


July 7, 2019

Genome sequence and comparative pathogenic determinants of multidrug resistant uropathogenic Escherichia coli O25b:H4, a clinical isolate from Saudi Arabia

Escherichia coli serotype O25b:H4 is involved in human urinary tract infections (UTIs). In this study, we sequenced and analyzed E. coli O25b:H4 isolated from a patient suffering from recurrent UTIs in an intensive care unit at Hera General Hospital in Makkah, Saudi Arabia. We aimed to determine the genes responsible for virulence and drug resistance in this isolate and to compare them with those of other E. coli strains. We sequenced and analyzed this clinical isolate of the E. coli O25b:H4 Saudi strain using next-generation sequencing. Using the ERGO genome analysis platform, we annotated the genome and identified the virulence and antibiotic resistance determinants of this clinical isolate. The E. coli O25b:H4 genome was assembled into four contigs representing a total chromosome size of 5.28 Mb. Three further contigs were identified: a 130.9-kb contig (virulence plasmid) bearing the bla-CTX gene, and 32-kb and 29-kb contigs. By comparing this genome with other uropathogenic E. coli (UPEC) genomes, we identified unique drug resistance and pathogenicity factors. In this work, whole-genome sequencing and targeted comparative analysis were performed on a clinical isolate of uropathogenic Escherichia coli O25b:H4. This strain encodes virulence genes linked with extraintestinal pathogenic E. coli (ExPEC) that are expressed constitutively in E. coli ST131. We identified the genes responsible for pathogenesis and drug resistance and compared the virulence and antibiotic resistance determinants with those of other UPEC isolates. This is the first report of genome sequencing and analysis of a UPEC strain from Saudi Arabia.


July 7, 2019

Colib’read on Galaxy: a tool suite dedicated to biological information extraction from raw NGS reads

With next-generation sequencing (NGS) technologies, the life sciences face a deluge of raw data. Classical analysis processes for such data often begin with an assembly step, which requires large amounts of computing resources and can remove or modify part of the biological information contained in the data. Our approach instead focuses directly on biological questions, working with raw, unassembled NGS data through a suite of six command-line tools.


July 7, 2019

First complete genome sequence of a subdivision 6 Acidobacterium strain.

Although ubiquitous and abundant in soils, acidobacteria have mostly escaped isolation and remain poorly investigated. Only a few cultured representatives and just eight genomes of subdivisions 1, 3, and 4 are available to date. Here, we determined the complete genome sequence of strain HEG_-6_39, the first genome of Acidobacterium subdivision 6. Copyright © 2016 Huang et al.


July 7, 2019

Improved assembly of noisy long reads by k-mer validation.

Genome assembly depends critically on read length. Two recent technologies, from Pacific Biosciences (PacBio) and Oxford Nanopore, produce read lengths >20 kb, which yield de novo genome assemblies with vastly greater contiguity than those based on Sanger, Illumina, or other technologies. However, the very high error rates of these two new technologies (~15% per base) make assembly imprecise at repeats longer than the read length and computationally expensive. Here we show that the contiguity and quality of the assembly of these noisy long reads can be significantly improved at minimal cost by leveraging the low error rate and low cost of Illumina short reads. Specifically, k-mers from the PacBio raw reads that are not present in Illumina reads (which account for ~95% of the distinct k-mers) are deemed sequencing errors and ignored at the seed alignment step. Focusing on the ~5% of k-mers that are error free dramatically increases read overlap sensitivity. Equally important, the validation procedure can be extended to exclude repetitive k-mers, which prevents read miscorrection at repeats and further improves the resulting assemblies. We tested the k-mer validation procedure using one long-read technology (PacBio) and one assembler (MHAP/Celera Assembler). However, it is very likely to yield analogous improvements with alternative long-read technologies, such as Oxford Nanopore, and with alternative assemblers, such as BLASR/DALIGNER/Falcon. © 2016 Carvalho et al.; Published by Cold Spring Harbor Laboratory Press.
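The k-mer validation step described above can be illustrated with a small Python sketch. This is not the authors' implementation, which was tested inside MHAP/Celera Assembler. The function names `kmers` and `validated_kmers` and the `max_count` repeat threshold are hypothetical. The sketch keeps only the long-read k-mers that also occur in the short reads. Optionally, it also drops k-mers whose short-read count exceeds a threshold, as a stand-in for the "repetitive" filter. For brevity it ignores reverse complements, which a real tool would typically handle with canonical k-mers.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer (length-k substring) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def validated_kmers(
    long_reads: Iterable[str],
    short_reads: Iterable[str],
    k: int,
    max_count: Optional[int] = None,
) -> Set[str]:
    """Return the long-read k-mers that are supported by the short reads.

    Hypothetical sketch: a long-read k-mer absent from the short reads is
    treated as a sequencing error. If max_count is given, a k-mer seen more
    than max_count times in the short reads is treated as repetitive and is
    also excluded. Reverse complements are ignored for brevity.
    """
    short_counts = Counter(km for read in short_reads for km in kmers(read, k))
    valid: Set[str] = set()
    for read in long_reads:
        for km in kmers(read, k):
            count = short_counts[km]  # Counter returns 0 for missing keys
            if count == 0:
                continue  # not seen in accurate short reads: likely an error
            if max_count is not None and count > max_count:
                continue  # too frequent in short reads: assumed repetitive
            valid.add(km)
    return valid


if __name__ == "__main__":
    print(sorted(validated_kmers(["ACGTAC"], ["ACGTA"], k=3)))
```

With k = 3, the long read ACGTAC contains ACG, CGT, GTA, and TAC. Only TAC is missing from the short read ACGTA, so it is discarded, and the script prints `['ACG', 'CGT', 'GTA']`. Only the validated k-mers would then be used as alignment seeds.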


July 7, 2019

Jabba: hybrid error correction for long sequencing reads.

Third generation sequencing platforms produce longer reads with higher error rates than second generation technologies. While the improved read length can provide useful information for downstream analysis, underlying algorithms are challenged by the high error rate. Error correction methods in which accurate short reads are used to correct noisy long reads appear attractive for generating high-quality long reads. Methods that align short reads to long reads do not optimally use the information contained in the second generation data, and they suffer from long runtimes. Recently, a new hybrid error correction method has been proposed in which the second generation data are first assembled into a de Bruijn graph, onto which the long reads are then aligned. In this context, we present Jabba, a hybrid method that corrects long third generation reads by mapping them onto a corrected de Bruijn graph constructed from second generation data. Unique to our method is the use of a pseudo alignment approach with a seed-and-extend methodology, using maximal exact matches (MEMs) as seeds. In addition to benchmark results, we present theoretical results concerning the possibilities and limitations of using MEMs in the context of third generation reads. Jabba produces highly reliable corrected reads: almost all corrected reads align to the reference, and these alignments have very high identity. Many of the aligned reads are error-free. Additionally, Jabba requires very little CPU time to correct reads. From this, we conclude that pseudo alignment with MEMs is a fast and reliable method for mapping long, highly erroneous sequences onto a de Bruijn graph.
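A maximal exact match (MEM) is an exact match that cannot be extended further to the left or to the right. The Python sketch below shows what such a seed looks like. It is a deliberately simplified, hypothetical illustration. Jabba matches reads against a de Bruijn graph, whereas this code compares a read with a single reference string using a quadratic scan. It relies on one fact: along each diagonal, where the offset between read and reference positions is fixed, the MEMs are exactly the maximal runs of consecutive matching characters. The function name and the `min_len` cutoff are assumptions made for illustration.

```python
from typing import List, Tuple


def maximal_exact_matches(read: str, ref: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return MEMs between read and ref as sorted (read_start, ref_start, length).

    Hypothetical, illustration-only approach: O(len(read) * len(ref)) time.
    On each diagonal d = ref_pos - read_pos, a maximal run of equal
    characters cannot be extended left or right, so it is a MEM.
    """
    n, m = len(read), len(ref)
    mems: List[Tuple[int, int, int]] = []
    for d in range(-(n - 1), m):           # every diagonal that overlaps both strings
        i = max(0, -d)                     # first read position on this diagonal
        run_start = None                   # read position where the current run began
        while i < n and i + d < m:
            if read[i] == ref[i + d]:
                if run_start is None:
                    run_start = i
            else:
                if run_start is not None and i - run_start >= min_len:
                    mems.append((run_start, run_start + d, i - run_start))
                run_start = None
            i += 1
        # A run that reaches the end of the diagonal is also maximal.
        if run_start is not None and i - run_start >= min_len:
            mems.append((run_start, run_start + d, i - run_start))
    return sorted(mems)


if __name__ == "__main__":
    print(maximal_exact_matches("GATTACA", "TTGATTACAGG", 4))
```

Here the whole read GATTACA occurs at reference offset 2, so the only MEM of length at least 4 is `(0, 2, 7)`. In a seed-and-extend scheme, such seeds anchor the read before the gaps between them are extended.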


July 7, 2019

Identification of the fluvirucin B2 (Sch 38518) biosynthetic gene cluster from Actinomadura fulva subsp. indica ATCC 53714: substrate specificity of the β-amino acid selective adenylating enzyme FlvN.

Fluvirucins are 14-membered macrolactam polyketides that show antifungal and antiviral activities. Fluvirucins have a β-alanine starter unit in their polyketide skeletons. To understand how the β-alanine moiety is constructed during fluvirucin biosynthesis, we identified the biosynthetic gene cluster of fluvirucin B2 produced by Actinomadura fulva subsp. indica ATCC 53714. The identified gene cluster contains three polyketide synthases, four characteristic β-amino acid-carrying enzymes, one decarboxylase, and one amidohydrolase. We next investigated the activity of the adenylation enzyme FlvN, a key enzyme for the selective incorporation of a β-amino acid substrate. FlvN showed a strong preference for L-aspartate over other amino acids such as β-alanine. Based on these results, we propose a biosynthetic pathway for fluvirucin B2.


July 7, 2019

A simple thermoplastic substrate containing hierarchical silica lamellae for high-molecular-weight DNA extraction.

An inexpensive, magnetic thermoplastic nanomaterial was developed that uses hierarchical layering of micro- and nanoscale silica lamellae to create a high-surface-area, low-shear substrate capable of capturing vast amounts of ultrahigh-molecular-weight DNA. Extraction is performed via a simple 45-min process and achieves binding capacities up to 1 000 000 times greater than those of silica microparticles. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.


July 7, 2019

Information-optimal genome assembly via sparse read-overlap graphs.

In the context of third-generation long-read sequencing technologies, read-overlap-based approaches are expected to play a central role in the assembly step. A fundamental challenge in assembling from a read-overlap graph is that the true sequence corresponds to a Hamiltonian path on the graph. Under most formulations, this makes the assembly problem NP-hard, restricting practical approaches to heuristics. In this work, we avoid this seemingly fundamental barrier by first setting the computational complexity issue aside and seeking an algorithm that targets information limits. In particular, we consider a basic feasibility question: when does the set of reads contain enough information to allow unambiguous reconstruction of the true sequence? Based on insights from this information feasibility question, we present an algorithm, the Not-So-Greedy algorithm, that constructs a sparse read-overlap graph. Unlike most other assembly algorithms, Not-So-Greedy comes with a performance guarantee: whenever the information feasibility conditions are satisfied, the algorithm reduces the assembly problem to an Eulerian path problem on the resulting graph, which can be solved in linear time. In practice, this theoretical guarantee translates into assemblies of higher quality. Evaluations on both simulated reads from real genomes and a PacBio Escherichia coli K12 dataset demonstrate that Not-So-Greedy compares favorably with standard string graph approaches in terms of the accuracy of the resulting read-overlap graph and contig N50. Availability: github.com/samhykim/nsg. Contact: courtade@eecs.berkeley.edu or dntse@stanford.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
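The abstract's key point is that, under the feasibility conditions, assembly reduces to an Eulerian path problem solvable in linear time. One standard linear-time method for that final step is Hierholzer's algorithm, sketched below in Python. This is a generic illustration, not code from the Not-So-Greedy repository, and it does not build the sparse read-overlap graph itself. The small k-mer-labeled graph in the usage note is a hypothetical example.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple


def eulerian_path(edges: List[Tuple[Hashable, Hashable]]) -> Optional[List[Hashable]]:
    """Return a node sequence that uses every directed edge exactly once, or None.

    Hierholzer's algorithm: O(V + E) time.
    """
    if not edges:
        return []
    out_edges: Dict[Hashable, List[Hashable]] = defaultdict(list)
    balance: Dict[Hashable, int] = defaultdict(int)  # out-degree minus in-degree
    for u, v in edges:
        out_edges[u].append(v)
        balance[u] += 1
        balance[v] -= 1

    starts = [x for x, b in balance.items() if b == 1]
    ends = [x for x, b in balance.items() if b == -1]
    # An Eulerian path needs at most one node with an extra outgoing edge and
    # at most one with an extra incoming edge; all other nodes must be balanced.
    if any(abs(b) > 1 for b in balance.values()) or len(starts) > 1 or len(starts) != len(ends):
        return None
    # Start at the unbalanced node if there is one; otherwise any node on the circuit.
    start = starts[0] if starts else edges[0][0]

    stack = [start]
    path: List[Hashable] = []
    while stack:
        node = stack[-1]
        if out_edges[node]:
            stack.append(out_edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # no unused edges left: node is finished
    path.reverse()
    # If some edges were never reached, the graph is not one connected trail.
    return path if len(path) == len(edges) + 1 else None


if __name__ == "__main__":
    print(eulerian_path([("AC", "CG"), ("CG", "GT"), ("GT", "TT"), ("TT", "TG")]))
```

For the hypothetical edges AC→CG, CG→GT, GT→TT, TT→TG, the call returns `['AC', 'CG', 'GT', 'TT', 'TG']`. Taking the first node and appending the last character of each subsequent node spells ACGTTG. Each edge is pushed and popped once, which is where the linear running time comes from.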


July 7, 2019

CoLoRMap: Correcting Long Reads by Mapping short reads.

Second generation sequencing technologies paved the way for an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. Recent developments in long-read sequencing methods offer a promising way to address this issue. However, long reads are so far characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as those produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas. The first uses a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read. The second extends corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal, and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
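CoLoRMap's first idea, choosing a chain of overlapping short reads by shortest-path search, can be sketched with Dijkstra's algorithm. The sketch rests on several assumptions. Short reads are already mapped and are summarized as hypothetical `(start, end, edit_score)` intervals on the long read. Consecutive reads in a chain must overlap. Edit scores are non-negative, as Dijkstra's algorithm requires. This is not CoLoRMap's code or its exact graph construction, and it omits the second idea (local assembly of unmapped mates).

```python
import heapq
from typing import List, Optional, Tuple

# Hypothetical representation of a short read mapped onto the long read:
# (start, end, edit_score), with end exclusive and edit_score >= 0.
Hit = Tuple[int, int, int]


def best_overlapping_chain(hits: List[Hit], region_start: int, region_end: int) -> Optional[List[int]]:
    """Indices of a chain of overlapping hits covering [region_start, region_end)
    with minimal total edit score, found with Dijkstra's algorithm; None if none exists.
    """
    dist = [float("inf")] * len(hits)
    prev = [-1] * len(hits)
    heap: List[Tuple[float, int]] = []
    # Any hit that covers region_start can begin a chain; its cost is its own score.
    for i, (s, e, score) in enumerate(hits):
        if s <= region_start < e:
            dist[i] = score
            heapq.heappush(heap, (score, i))

    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        _, end_i, _ = hits[i]
        if end_i >= region_end:
            # Hits are popped in order of total cost, so this is the cheapest full chain.
            chain = []
            while i != -1:
                chain.append(i)
                i = prev[i]
            return chain[::-1]
        # Edge i -> j when hit j overlaps the end of hit i and extends past it.
        for j, (s_j, e_j, score_j) in enumerate(hits):
            if s_j < end_i < e_j and d + score_j < dist[j]:
                dist[j] = d + score_j
                prev[j] = i
                heapq.heappush(heap, (dist[j], j))
    return None


if __name__ == "__main__":
    example = [(0, 10, 3), (0, 8, 0), (6, 15, 1), (9, 20, 0), (14, 20, 2)]
    print(best_overlapping_chain(example, 0, 20))
```

In the example, the chain of hits 1, 2, and 3 covers positions 0 to 20 with a total edit score of 1. The alternative chain of hits 0 and 3 costs 3, so the call prints `[1, 2, 3]`. The concatenated sequence of the chosen short reads would then serve as the corrected segment. The neighbor scan makes this sketch quadratic in the number of hits, which is acceptable only for illustration.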

