July 7, 2019  |  

Genome sequence of the small brown planthopper, Laodelphax striatellus.

Laodelphax striatellus Fallén (Hemiptera: Delphacidae) is one of the most destructive rice pests. L. striatellus is different from 2 other rice planthoppers with a released genome sequence, Sogatella furcifera and Nilaparvata lugens, in many biological characteristics, such as host range, dispersal capacity, and vectoring plant viruses. Deciphering the genome of L. striatellus will further the understanding of the genetic basis of the biological differences among the 3 rice planthoppers. A total of 190 Gb of Illumina data and 32.4 Gb of PacBio data were generated and used to assemble a high-quality L. striatellus genome sequence, which is 541 Mb in length and has a contig N50 of 118 Kb and a scaffold N50 of 1.08 Mb. Annotated repetitive elements account for 25.7% of the genome. A total of 17,736 protein-coding genes were annotated, capturing 97.6% and 98% of the BUSCO eukaryote and arthropoda genes, respectively. Compared with N. lugens and S. furcifera, L. striatellus has the smallest genome and the lowest gene number. Gene family expansion and transcriptomic analyses provided hints to the genomic basis of the differences in important traits such as host range, migratory habit, and plant virus transmission between L. striatellus and the other 2 planthoppers. We report a high-quality genome assembly of L. striatellus, which is an important genomic resource not only for the study of the biology of L. striatellus and its interactions with plant hosts and plant viruses, but also for comparison with other planthoppers. © The Authors 2017. Published by Oxford University Press.
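The contig and scaffold N50 values quoted in this abstract follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal Python sketch of that definition is shown below; the function name `n50` and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, not data from the paper).
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))
```

With the toy lengths, the two longest sequences (100 and 80) already cover more than half of the 300-base total, so the function reports 80.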

July 7, 2019  |  

HISEA: HIerarchical SEed Aligner for PacBio data.

The next generation sequencing (NGS) techniques have been around for over a decade. Many of their fundamental applications rely on the ability to compute good genome assemblies. As the technology evolves, the assembly algorithms and tools have to continuously adjust and improve. The currently dominant technology of Illumina produces reads that are too short to bridge many repeats, setting limits on what can be successfully assembled. The emerging SMRT (Single Molecule, Real-Time) sequencing technique from Pacific Biosciences produces uniform coverage and long reads of length up to sixty thousand base pairs, enabling significantly better genome assemblies. However, SMRT reads are much more expensive and have a much higher error rate than Illumina’s – around 10-15% – mostly due to indels. New algorithms are very much needed to take advantage of the long reads while mitigating the effect of high error rate and lowering the required coverage. An essential step in assembling SMRT data is the detection of alignments, or overlaps, between reads. High error rate and very long reads make this a much more challenging problem than for Illumina data. We present a new pairwise read aligner, or overlapper, HISEA (Hierarchical SEed Aligner) for SMRT sequencing data. HISEA uses a novel two-step k-mer search, employing consistent clustering, k-mer filtering, and read alignment extension. We compare HISEA against several state-of-the-art programs – BLASR, DALIGNER, GraphMap, MHAP, and Minimap – on real datasets from five organisms. We compare their sensitivity, precision, specificity, F1-score, as well as time and memory usage. We also introduce a new, more precise, evaluation method. Finally, we compare the two leading programs, MHAP and HISEA, for their genome assembly performance in the Canu pipeline. Our algorithm has the best alignment detection sensitivity among all programs for SMRT data, significantly higher than the current best.
The current best assembler for SMRT data is the Canu program, which uses the MHAP aligner in its pipeline. We have incorporated our new HISEA aligner in the Canu pipeline and benchmarked it against the best pipeline for multiple datasets at two relevant coverage levels: 30x and 50x. Our assemblies are better than those using MHAP for both coverage levels. Moreover, Canu+HISEA assemblies for 30x coverage are comparable with Canu+MHAP assemblies for 50x coverage, while being faster and cheaper. The HISEA algorithm produces alignments with the highest sensitivity compared with the current state-of-the-art algorithms. Integrated in the Canu pipeline, currently the best for assembling PacBio data, it produces better assemblies than Canu+MHAP.
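The sensitivity, precision, specificity, and F1-score used to compare overlappers can be illustrated with their textbook definitions over sets of read pairs. The sketch below is only that: it assumes a known set of true overlaps and does not reproduce the more precise evaluation method the authors introduce. The function name `overlap_metrics` and the read names are hypothetical.

```python
from itertools import combinations


def overlap_metrics(predicted, truth, all_pairs):
    """Score predicted read overlaps against a set of true overlaps.

    Each overlap is an unordered pair of read names (a frozenset).
    `all_pairs` is every candidate pair, needed for true negatives.
    """
    tp = len(predicted & truth)              # overlaps correctly reported
    fp = len(predicted - truth)              # reported but not real
    fn = len(truth - predicted)              # real but missed
    tn = len(all_pairs - predicted - truth)  # correctly not reported

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }


# Toy example with five hypothetical reads.
reads = ["a", "b", "c", "d", "e"]
all_pairs = {frozenset(p) for p in combinations(reads, 2)}
truth = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("c", "d")]}
predicted = {frozenset(p) for p in [("a", "b"), ("b", "c"), ("a", "e")]}

for name, value in overlap_metrics(predicted, truth, all_pairs).items():
    print(f"{name}: {value:.3f}")
```

Representing each overlap as a `frozenset` makes the pair unordered, so an overlap reported as (a, b) matches a true overlap recorded as (b, a).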

July 7, 2019  |  

Efficient transgenesis and annotated genome sequence of the regenerative flatworm model Macrostomum lignano.

Regeneration-capable flatworms are informative research models to study the mechanisms of stem cell regulation, regeneration, and tissue patterning. However, the lack of transgenesis methods considerably hampers their wider use. Here we report development of a transgenesis method for Macrostomum lignano, a basal flatworm with excellent regeneration capacity. We demonstrate that microinjection of DNA constructs into fertilized one-cell stage eggs, followed by a low dose of irradiation, frequently results in random integration of the transgene in the genome and its stable transmission through the germline. To facilitate selection of promoter regions for transgenic reporters, we assembled and annotated the M. lignano genome, including genome-wide mapping of transcription start regions, and show its utility by generating multiple stable transgenic lines expressing fluorescent proteins under several tissue-specific promoters. The reported transgenesis method and annotated genome sequence will permit sophisticated genetic studies on stem cells and regeneration using M. lignano as a model organism.

July 7, 2019  |  

Resequencing of the Leishmania infantum (strain JPCM5) genome and de novo assembly into 36 contigs.

Leishmania parasites are the causative agents of leishmaniasis, a group of potentially fatal human diseases. Control strategies for leishmaniasis can be enhanced by genome-based investigations. The publication in 2005 of the Leishmania major genome sequence, and two years later of the genomes for the species Leishmania braziliensis and Leishmania infantum, were major milestones. Since then, the L. infantum genome, although highly fragmented and incomplete, has been used widely as the reference genome for whole transcriptomics and proteomics studies. Here, we report the sequencing of the L. infantum genome by two NGS methodologies and, as a result, the complete genome assembly on 36 contigs (chromosomes). In the present L. infantum genome draft, 495 new genes have been annotated, a hundred have been corrected and 75 previously annotated genes have been discontinued. These changes are not only the result of an increase in the genome size; a significant contribution derives from the existence of a large number of incorrectly assembled regions in current chromosomal scaffolds. Furthermore, an improved assembly of tandemly repeated genes has been obtained. All these analyses support that the de novo assembled L. infantum genome represents a robust assembly and should replace the one currently available in the databases.

July 7, 2019  |  

The genome of an intranuclear parasite, Paramicrosporidium saccamoebae, reveals alternative adaptations to obligate intracellular parasitism.

Intracellular parasitism often results in gene loss, genome reduction, and dependence upon the host for cellular functioning. Rozellomycota is a clade comprising many such parasites and is related to the diverse, highly reduced, animal parasites, Microsporidia. We sequenced the nuclear and mitochondrial genomes of Paramicrosporidium saccamoebae [Rozellomycota], an intranuclear parasite of amoebae. A canonical fungal mitochondrial genome was recovered from P. saccamoebae that encodes genes necessary for the complete oxidative phosphorylation pathway including Complex I, differentiating it from most endoparasites including its sequenced relatives in Rozellomycota and Microsporidia. Comparative analysis revealed that P. saccamoebae shares more gene content with distantly related Fungi than with its closest relatives, suggesting that genome evolution in Rozellomycota and Microsporidia has been affected by repeated and independent gene losses, possibly as a result of variation in parasitic strategies (e.g. host and subcellular localization) or due to multiple transitions to parasitism.

July 7, 2019  |  

A 3-way hybrid approach to generate a new high-quality chimpanzee reference genome (Pan_tro_3.0).

The chimpanzee is arguably the most important species for the study of human origins. A key resource for these studies is a high-quality reference genome assembly; however, as with most mammalian genomes, the current iteration of the chimpanzee reference genome assembly is highly fragmented. In the current iteration of the chimpanzee reference genome assembly (Pan_tro_2.1.4), the sequence is scattered across more than 183 000 contigs, incorporating more than 159 000 gaps, with a genome-wide contig N50 of 51 Kbp. In this work, we produce an extensive and diverse array of sequencing datasets to rapidly assemble a new chimpanzee reference that surpasses previous iterations in bases represented and organized in large scaffolds. To this end, we show substantial improvements over the current release of the chimpanzee genome (Pan_tro_2.1.4) by several metrics, such as increased contiguity by >750% and 300% on contigs and scaffolds, respectively, and closure of 77% of gaps in the Pan_tro_2.1.4 assembly gaps spanning >850 Kbp of the novel coding sequence based on RNASeq data. We further report more than 2700 genes that had putatively erroneous frame-shift predictions to human in Pan_tro_2.1.4 and show a substantial increase in the annotation of repetitive elements. We apply a simple 3-way hybrid approach to considerably improve the reference genome assembly for the chimpanzee, providing a valuable resource for the study of human origins. Furthermore, we produce extensive sequencing datasets that are all derived from the same cell line, generating a broad non-human benchmark dataset. © The Author 2017. Published by Oxford University Press.

July 7, 2019  |  

Genome mining of astaxanthin biosynthetic genes from Sphingomonas sp. ATCC 55669 for heterologous overproduction in Escherichia coli.

As a highly valued keto-carotenoid, astaxanthin is widely used in nutritional supplements and pharmaceuticals. Therefore, the demand for biosynthetic astaxanthin and improved efficiency of astaxanthin biosynthesis has driven the investigation of metabolic engineering of native astaxanthin producers and heterologous hosts. However, microbial resources for astaxanthin are limited. In this study, we found that the α-Proteobacterium Sphingomonas sp. ATCC 55669 could produce astaxanthin naturally. We used whole-genome sequencing to identify the astaxanthin biosynthetic pathway using a combined PacBio-Illumina approach. The putative astaxanthin biosynthetic pathway in Sphingomonas sp. ATCC 55669 was predicted. For further confirmation, a high-efficiency targeted engineering carotenoid synthesis platform was constructed in E. coli for identifying the functional roles of candidate genes. All genes involved in astaxanthin biosynthesis showed discrete distributions on the chromosome. Moreover, the overexpression of exogenous E. coli idi in Sphingomonas sp. ATCC 55669 increased astaxanthin production by 5.4-fold. This study described a new astaxanthin producer and provided more biosynthesis components for bioengineering of astaxanthin in the future. © 2015 The Authors. Biotechnology Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

July 7, 2019  |  

Timing, rates and spectra of human germline mutation.

Germline mutations are a driving force behind genome evolution and genetic disease. We investigated genome-wide mutation rates and spectra in multi-sibling families. The mutation rate increased with paternal age in all families, but the number of additional mutations per year differed by more than twofold between families. Meta-analysis of 6,570 mutations showed that germline methylation influences mutation rates. In contrast to somatic mutations, we found remarkable consistency in germline mutation spectra between the sexes and at different paternal ages. In parental germ line, 3.8% of mutations were mosaic, resulting in 1.3% of mutations being shared by siblings. The number of these shared mutations varied significantly between families. Our data suggest that the mutation rate per cell division is higher during both early embryogenesis and differentiation of primordial germ cells but is reduced substantially during post-pubertal spermatogenesis. These findings have important consequences for the recurrence risks of disorders caused by de novo mutations.

July 7, 2019  |  

Complete genome sequence of Acinetobacter baumannii XH386 (ST208), a multi-drug resistant bacteria isolated from pediatric hospital in China.

Acinetobacter baumannii is an important bacterium that has emerged as a significant nosocomial pathogen worldwide. Its rise is due to its multi-drug resistance (MDR); multi-drug resistant A. baumannii is difficult to treat with antibiotics, especially in pediatric patients, for whom the therapeutic options are quite limited. A. baumannii ST208 has been identified as the predominant sequence type of carbapenem-resistant A. baumannii in the United States and China. To our knowledge, no complete genome sequence has been reported for A. baumannii ST208, although several whole-genome shotgun sequences have been reported. Here, we sequenced the 4087-kilobase (kb) chromosome and 112-kb plasmid of A. baumannii XH386 (ST208), which was isolated from a pediatric hospital in China. The genome of A. baumannii XH386 contains 3968 protein-coding genes and 94 RNA-only encoding genes. Genomic analysis and a minimum inhibitory concentration assay showed that A. baumannii XH386 is a multi-drug resistant strain, resistant to most antibiotics except tigecycline. The data may be accessed via GenBank accession numbers CP010779 and CP010780.

July 7, 2019  |  

MuffinEc: Error correction for de novo assembly via greedy partitioning and sequence alignment

Error correction is typically the first step of de novo genome assembly from NGS data. This step has an important impact on the quality and speed of the assembly process. However, the majority of available stand-alone error correction solutions can only detect and correct mismatches. Therefore, these solutions only support correcting reads generated by Illumina sequencers. Several solutions support insertions and deletions (indels) and are capable of working with multiple technologies. However, these solutions are limited by correction performance and resource consumption. In this paper, we introduce MuffinEc, an indel-aware multi-technology correction method for NGS data. This method uses a greedy approach to create groups of reads and subsequently corrects them using their consensus. MuffinEc surpasses existing solutions by offering better correction ratios for multiple technologies. This method also exploits parallel processing via OpenMP and uses less computational resources than similar programs, thereby being capable of handling large datasets. MuffinEc is open source and freely available at

July 7, 2019  |  

Long read and single molecule DNA sequencing simplifies genome assembly and TAL effector gene analysis of Xanthomonas translucens.

The species Xanthomonas translucens encompasses a complex of bacterial strains that cause diseases and yield loss on grass species including important cereal crops. Three pathovars, X. translucens pv. undulosa, X. translucens pv. translucens and X. translucens pv. cerealis, have been described as pathogens of wheat, barley, and oats. However, no complete genome sequence for a strain of this complex is currently available. A complete genome sequence of X. translucens pv. undulosa strain XT4699 was obtained by using PacBio long read, single molecule, real time (SMRT) DNA sequences and Illumina sequences. Draft genome sequences of nineteen additional X. translucens strains, which were collected from wheat or barley in different regions and at different times, were generated by Illumina sequencing. Phylogenetic relationships among different Xanthomonas strains indicate that X. translucens strains are members of a clade distinct from the so-called group 2 xanthomonads, and that three pathovars of this species, undulosa, translucens and cerealis, represent distinct subclades in the group 1 clade. Knockout mutation of the type III secretion system of XT4699 eliminated the ability to cause water-soaking symptoms on wheat and barley and resulted in a reduction in populations on wheat in comparison to the wild type strain. Sequence comparison of X. translucens strains revealed genetic variation in type III effector repertoires among different pathovars or within one pathovar. The full genome sequence of XT4699 reveals the presence of eight members of the Transcription-Activator Like (TAL) effector genes, which are phylogenetically distant from previously known TAL effector genes of group 2 xanthomonads. Microarray and qRT-PCR analyses revealed TAL effector-specific wheat gene expression modulation. PacBio long read sequencing facilitates the assembly of Xanthomonas genomes and the multiple TAL effector genes, which are difficult to assemble from short read platforms.
The complete genome sequence of X. translucens pv. undulosa strain XT4699 and draft genome sequences of nineteen additional X. translucens strains provide a resource for further genetic analyses of pathogenic diversity and host range of the X. translucens species complex. TAL effectors of the XT4699 strain play roles in modulating wheat host gene expression.

July 7, 2019  |  

Whole genome sequence of the emerging oomycete pathogen Pythium insidiosum strain CDC-B5653 isolated from an infected human in the USA

Pythium insidiosum ATCC 200269 strain CDC-B5653, an isolate from necrotizing lesions on the mouth and eye of a 2-year-old boy in Memphis, Tennessee, USA, was sequenced using a combination of Illumina MiSeq (300 bp paired-end, 14 million reads) and PacBio (10 Kb fragment library, 356,001 reads). The sequencing data were assembled using SPAdes version 3.1.0, yielding a total genome size of 45.6 Mb contained in 8992 contigs, N50 of 13 Kb, 57% G + C content, and 17,867 putative protein-coding genes. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JRHR00000000.

July 7, 2019  |  

Read-based phasing of related individuals.

Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver results better than each individually. We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual. © The Author 2016. Published by Oxford University Press.
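The genetic-phasing half of this idea can be illustrated for a single biallelic site in a mother-father-child trio. The Python sketch below applies only Mendelian inheritance to genotypes; it is not the fixed-parameter algorithm described in the abstract, and the function name `phase_child` and the genotype encoding (tuples of 0/1 alleles) are assumptions made for illustration.

```python
def phase_child(mother, father, child):
    """Assign a child's two alleles to their parents of origin at one site.

    Genotypes are unordered pairs of alleles, e.g. (0, 1).
    Returns (maternal_allele, paternal_allele) when Mendelian rules
    leave exactly one possibility, otherwise None (the site is either
    ambiguous or inconsistent with Mendelian inheritance).
    """
    options = {
        (m, f)
        for m in mother
        for f in father
        if sorted((m, f)) == sorted(child)
    }
    if len(options) == 1:
        return options.pop()
    return None


# Mother homozygous: the child's 1 allele must come from the father.
print(phase_child((0, 0), (0, 1), (0, 1)))

# All three heterozygous: genotypes alone cannot decide, which is
# the kind of site where read evidence adds information.
print(phase_child((0, 1), (0, 1), (0, 1)))
```

Sites where every member of the trio is heterozygous stay unresolved under pedigree rules alone, which is one reason combining them with reads that span several variants can phase more sites.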
