September 22, 2019

De novo genome assembly of Oryza granulata reveals rapid genome expansion and adaptive evolution

The wild relatives of rice have adapted to different ecological environments and constitute a useful reservoir of agronomic traits for genetic improvement. Here we present the ~777 Mb de novo assembled genome sequence of Oryza granulata. Recent bursts of long-terminal repeat retrotransposons, especially RIRE2, led to a rapid twofold increase in genome size after O. granulata speciation. Universal centromeric tandem repeats are absent within its centromeres, while gypsy-type LTRs constitute the main centromere-specific repetitive elements. A total of 40,116 protein-coding genes were predicted in O. granulata, which is close to that of Oryza sativa. Both the copy number and function of genes involved in photosynthesis and energy production have undergone positive selection during the evolution of O. granulata, which might have facilitated its adaptation to low-light habitats. Together, our findings reveal the rapid genome expansion, distinctive centromere organization, and adaptive evolution of O. granulata.


September 22, 2019

HapCHAT: adaptive haplotype assembly for efficiently leveraging high coverage in long reads.

Haplotype assembly is the process of assigning the different alleles of the variants covered by mapped sequencing reads to the two haplotypes of the genome of a human individual. Long reads, which are nowadays cheaper to produce and more widely available than ever before, have been used to reduce the fragmentation of the assembled haplotypes owing to their ability to span several variants along the genome. These long reads are also characterized by a high error rate, an issue which may be mitigated, however, with larger sets of reads, when this error rate is uniform across genome positions. Unfortunately, current state-of-the-art dynamic programming approaches designed for long reads deal only with limited coverages. Here, we propose a new method for assembling haplotypes which combines and extends the features of previous approaches to deal with long reads and higher coverages. In particular, our algorithm is able to dynamically adapt the estimated number of errors at each variant site, while minimizing the total number of error corrections necessary for finding a feasible solution. This allows our method to significantly reduce the required computational resources, making it possible to consider datasets with higher coverages. The algorithm has been implemented in a freely available tool, HapCHAT: Haplotype Assembly Coverage Handling by Adapting Thresholds. An experimental analysis on sequencing reads with up to 60× coverage reveals improvements in accuracy and recall achieved by considering a higher coverage with lower runtimes. Our method leverages the long-range information of sequencing reads, which yields assembled haplotypes fragmented into fewer unphased haplotype blocks. At the same time, our method is also able to deal with higher coverages to better correct the errors in the original reads and to obtain more accurate haplotypes as a result. HapCHAT is available at http://hapchat.algolab.eu under the GNU Public License (GPL).
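
The idea of "minimizing the total number of error corrections" can be illustrated with a toy sketch. This is not HapCHAT's algorithm: it is a brute-force search over read bipartitions, usable only for a handful of reads. It assumes reads are strings over "0", "1" and "-" (site not covered), and that each haplotype's consensus is chosen independently at every site. The function names `mec_cost` and `best_bipartition` are hypothetical.

```python
from itertools import product


def mec_cost(reads, assignment):
    """Count allele corrections needed so that every read agrees with
    the majority allele of its assigned haplotype at each covered site.

    reads: equal-length strings over '0', '1', '-' ('-' = not covered).
    assignment: one value per read, 0 or 1, naming its haplotype.
    """
    n_sites = len(reads[0])
    cost = 0
    for hap in (0, 1):
        group = [r for r, a in zip(reads, assignment) if a == hap]
        for site in range(n_sites):
            alleles = [r[site] for r in group if r[site] != "-"]
            zeros = alleles.count("0")
            ones = len(alleles) - zeros
            # The minority alleles are the ones that must be corrected.
            cost += min(zeros, ones)
    return cost


def best_bipartition(reads):
    """Exhaustively try all 2^n read assignments (toy sizes only)."""
    best = None
    for assignment in product((0, 1), repeat=len(reads)):
        cost = mec_cost(reads, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


reads = ["0011", "0010", "1100", "1-00", "0-11"]
cost, assignment = best_bipartition(reads)
print(cost, assignment)
```

The exhaustive search grows exponentially with the number of reads, which is why practical tools rely on dynamic programming and, as the abstract describes, on bounding the number of corrections considered at each site.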


September 22, 2019

Raising the stakes: Loss of efflux pump regulation decreases meropenem susceptibility in Burkholderia pseudomallei

Burkholderia pseudomallei, the causative agent of the high-mortality disease melioidosis, is a gram-negative bacterium that is naturally resistant to many antibiotics. There is no vaccine for melioidosis, and effective eradication is reliant on biphasic and prolonged antibiotic administration. The carbapenem drug meropenem is the current gold standard option for treating severe melioidosis. Intrinsic B. pseudomallei resistance toward meropenem has not yet been documented; however, resistance could conceivably develop over the course of infection, leading to prolonged sepsis and treatment failure. We examined our 30-year clinical collection of melioidosis cases to identify B. pseudomallei isolates with reduced meropenem susceptibility. Isolates were subjected to minimum inhibitory concentration (MIC) testing toward meropenem. Paired isolates from patients who had evolved decreased susceptibility were subjected to whole-genome sequencing. Select agent-compliant genetic manipulation was carried out to confirm the molecular mechanisms conferring resistance. We identified 11 melioidosis cases where B. pseudomallei isolates developed decreased susceptibility toward meropenem during treatment, including 2 cases not treated with this antibiotic. Meropenem MICs increased from 0.5-0.75 µg/mL to 3-8 µg/mL. Comparative genomics identified multiple mutations affecting multidrug resistance-nodulation-division (RND) efflux pump regulators, with concomitant overexpression of their corresponding pumps. All cases were refractory to treatment despite aggressive, targeted therapy, and 2 were associated with a fatal outcome. This study confirms the role of RND efflux pumps in decreased meropenem susceptibility in B. pseudomallei. These findings have important ramifications for the diagnosis, treatment, and management of life-threatening melioidosis cases.


September 22, 2019

RAD sequencing and a hybrid Antarctic fur seal genome assembly reveal rapidly decaying linkage disequilibrium, global population structure and evidence for inbreeding.

Recent advances in high throughput sequencing have transformed the study of wild organisms by facilitating the generation of high quality genome assemblies and dense genetic marker datasets. These resources have the potential to significantly advance our understanding of diverse phenomena at the level of species, populations and individuals, ranging from patterns of synteny through rates of linkage disequilibrium (LD) decay and population structure to individual inbreeding. Consequently, we used PacBio sequencing to refine an existing Antarctic fur seal (Arctocephalus gazella) genome assembly and genotyped 83 individuals from six populations using restriction site associated DNA (RAD) sequencing. The resulting hybrid genome comprised 6,169 scaffolds with an N50 of 6.21 Mb and provided clear evidence for the conservation of large chromosomal segments between the fur seal and dog (Canis lupus familiaris). Focusing on the most extensively sampled population of South Georgia, we found that LD decayed rapidly, reaching the background level by around 400 kb, consistent with other vertebrates but at odds with the notion that fur seals experienced a strong historical bottleneck. We also found evidence for population structuring, with four main Antarctic island groups being resolved. Finally, appreciable variance in individual inbreeding could be detected, reflecting the strong polygyny and site fidelity of the species. Overall, our study contributes important resources for future genomic studies of fur seals and other pinnipeds while also providing a clear example of how high throughput sequencing can generate diverse biological insights at multiple levels of organization. Copyright © 2018 Humble et al.
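
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths (arbitrary units).
example = [80, 70, 50, 40, 30, 20]
print(n50(example))
```

Because N50 depends only on the length distribution, it says nothing about correctness; two assemblies with the same N50 can differ greatly in accuracy.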


September 22, 2019

Three substrains of the cyanobacterium Anabaena sp. PCC 7120 display divergence in genomic sequences and hetC function.

Anabaena sp. strain PCC 7120 is a model strain for molecular studies of cell differentiation and patterning in heterocyst-forming cyanobacteria. Subtle differences in heterocyst development have been noticed in different laboratories working on the same organism. In this study, 360 mutations, including single nucleotide polymorphisms (SNPs), small insertion/deletions (indels; 1 to 3 bp), fragment deletions, and transpositions, were identified in the genomes of three substrains. Heterogeneous/heterozygous bases were also identified due to the polyploid nature of the genome and the multicellular morphology but could be completely segregated when plated after filament fragmentation by sonication. hetC is a gene upregulated in developing cells during heterocyst formation in Anabaena sp. strain PCC 7120 and found in approximately half of other heterocyst-forming cyanobacteria. Inactivation of hetC in 3 substrains of Anabaena sp. PCC 7120 led to different phenotypes: the formation of heterocysts, differentiating cells that keep dividing, or the presence of both heterocysts and dividing differentiating cells. The expression of PhetZ-gfp in these hetC mutants also showed different patterns of green fluorescent protein (GFP) fluorescence. Thus, the function of hetC is influenced by the genomic background and epistasis and constitutes an example of evolution under way. IMPORTANCE: Our knowledge about the molecular genetics of heterocyst formation, an important cell differentiation process for global N2 fixation, is mostly based on studies with Anabaena sp. strain PCC 7120. Here, we show that rapid microevolution is under way in this strain, leading to phenotypic variations for certain genes related to heterocyst development, such as hetC. This study provides an example of ongoing microevolution, marked by multiple heterogeneous/heterozygous single nucleotide polymorphisms (SNPs), in a multicellular multicopy-genome microorganism.
Copyright © 2018 American Society for Microbiology.


September 22, 2019

Clonal emergence of invasive multidrug-resistant Staphylococcus epidermidis deconvoluted via a combination of whole-genome sequencing and microbiome analyses.

Pathobionts, bacteria that are typically human commensals but can cause disease, contribute significantly to antimicrobial resistance. Staphylococcus epidermidis is a prototypical pathobiont as it is a ubiquitous human commensal but also a leading cause of healthcare-associated bacteremia. We sought to determine the etiology of a recent increase in invasive S. epidermidis isolates resistant to linezolid. Whole-genome sequencing (WGS) was performed on 176 S. epidermidis bloodstream isolates collected at the MD Anderson Cancer Center in Houston, Texas, between 2013 and 2016. Molecular relationships were assessed via complementary phylogenomic approaches. Abundance of the linezolid resistance determinant cfr was determined in stool samples via reverse-transcription quantitative polymerase chain reaction. Thirty-nine of the 176 strains were linezolid resistant (22%). Thirty-one of the 39 linezolid-resistant S. epidermidis infections were caused by a particular clone resistant to multiple antimicrobials that spread among leukemia patients and carried cfr on a 49-kb plasmid (herein called pMB151a). The 6 kb of pMB151a surrounding the cfr gene was nearly 100% identical to a cfr-containing plasmid isolated from livestock-associated staphylococci in China. Analysis of serial stool samples from leukemia patients revealed progressive staphylococcal domination of the intestinal microflora and an increase in cfr abundance following linezolid use. The combination of linezolid use plus transmission of a multidrug-resistant clone drove expansion of invasive, linezolid-resistant S. epidermidis. Our results lend support to the notion that a combination of antibiotic stewardship plus infection control measures may help to control the spread of a multidrug-resistant pathobiont.


September 22, 2019

Biosynthesis of abscisic acid in fungi: identification of a sesquiterpene cyclase as the key enzyme in Botrytis cinerea.

While abscisic acid (ABA) is known as a hormone produced by plants through the carotenoid pathway, a small number of phytopathogenic fungi are also able to produce this sesquiterpene but they use a distinct pathway that starts with the cyclization of farnesyl diphosphate (FPP) into 2Z,4E-α-ionylideneethane which is then subjected to several oxidation steps. To identify the sesquiterpene cyclase (STC) responsible for the biosynthesis of ABA in fungi, we conducted a genomic approach in Botrytis cinerea. The genome of the ABA-overproducing strain ATCC58025 was fully sequenced and five STC-coding genes were identified. Among them, Bcstc5 exhibits an expression profile concomitant with ABA production. Gene inactivation, complementation and chemical analysis demonstrated that BcStc5/BcAba5 is the key enzyme responsible for the key step of ABA biosynthesis in fungi. Unlike what is observed for most of the fungal secondary metabolism genes, the key enzyme-coding gene Bcstc5/Bcaba5 is not clustered with the other biosynthetic genes, i.e., Bcaba1 to Bcaba4 that are responsible for the oxidative transformation of 2Z,4E-α-ionylideneethane. Finally, our study revealed that the presence of the Bcaba genes among Botrytis species is rare and that the majority of them do not possess the ability to produce ABA. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.


September 22, 2019

Variation in human chromosome 21 ribosomal RNA genes characterized by TAR cloning and long-read sequencing.

Despite the key role of the human ribosome in protein biosynthesis, little is known about the extent of sequence variation in ribosomal DNA (rDNA) or its pre-rRNA and rRNA products. We recovered ribosomal DNA segments from a single human chromosome 21 using transformation-associated recombination (TAR) cloning in yeast. Accurate long-read sequencing of 13 isolates covering ~0.82 Mb of the chromosome 21 rDNA complement revealed substantial variation among tandem repeat rDNA copies, several palindromic structures and potential errors in the previous reference sequence. These clones revealed 101 variant positions in the 45S transcription unit and 235 in the intergenic spacer sequence. Approximately 60% of the 45S variants were confirmed in independent whole-genome or RNA-seq data, with 47 of these further observed in mature 18S/28S rRNA sequences. TAR cloning and long-read sequencing enabled the accurate reconstruction of multiple rDNA units and a new, high-quality 44 838 bp rDNA reference sequence, which we have annotated with variants detected from chromosome 21 of a single individual. The large number of variants observed reveals heterogeneity in human rDNA, opening up the possibility of corresponding variations in ribosome dynamics.


September 22, 2019

De novo assembly of a young Drosophila Y chromosome using single-molecule sequencing and chromatin conformation capture.

While short-read sequencing technology has resulted in a sharp increase in the number of species with genome assemblies, these assemblies are typically highly fragmented. Repeats pose the largest challenge for reference genome assembly, and pericentromeric regions and the repeat-rich Y chromosome are typically excluded from sequencing projects. Here, we assemble the genome of Drosophila miranda using long reads for contig formation, chromatin interaction maps for scaffolding and short reads, and optical mapping and bacterial artificial chromosome (BAC) clone sequencing for consensus validation. Our assembly recovers entire chromosomes and contains large fractions of repetitive DNA, including about 41.5 Mb of pericentromeric and telomeric regions, and >100 Mb of the recently formed highly repetitive neo-Y chromosome. While Y chromosome evolution is typically characterized by global sequence loss and shrinkage, the neo-Y increased in size by almost 3-fold because of the accumulation of repetitive sequences. Our high-quality assembly allows us to reconstruct the chromosomal events that have led to the unusual sex chromosome karyotype in D. miranda, including the independent de novo formation of a pair of sex chromosomes at two distinct time points, or the reversion of a former Y chromosome to an autosome.


September 22, 2019

A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits.

Rose is the world’s most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Roses are outbred and can have various ploidy levels. Our objectives were to develop a high-quality reference genome sequence for the genus Rosa by sequencing a doubled haploid, combining long and short reads, and anchoring to a high-density genetic map, and to study the genome structure and genetic basis of major ornamental traits. We produced a doubled haploid rose line (‘HapOB’) from Rosa chinensis ‘Old Blush’ and generated a rose genome assembly anchored to seven pseudo-chromosomes (512 Mb with N50 of 3.4 Mb and 564 contigs). The length of 512 Mb represents 90.1-96.1% of the estimated haploid genome size of rose. Of the assembly, 95% is contained in only 196 contigs. The anchoring was validated using high-density diploid and tetraploid genetic maps. We delineated hallmark chromosomal features, including the pericentromeric regions, through annotation of transposable element families and positioned centromeric repeats using fluorescent in situ hybridization. The rose genome displays extensive synteny with the Fragaria vesca genome, and we delineated only two major rearrangements. Genetic diversity was analysed using resequencing data of seven diploid and one tetraploid Rosa species selected from various sections of the genus. Combining genetic and genomic approaches, we identified potential genetic regulators of key ornamental traits, including prickle density and the number of flower petals. A rose APETALA2/TOE homologue is proposed to be the major regulator of petal number in rose. This reference sequence is an important resource for studying polyploidization, meiosis and developmental processes, as we demonstrated for flower and prickle development. It will also accelerate breeding through the development of molecular markers linked to traits, the identification of the genes underlying them and the exploitation of synteny across Rosaceae.


September 22, 2019

High-quality assembly of the reference genome for scarlet sage, Salvia splendens, an economically important ornamental plant.

Salvia splendens Ker-Gawler, scarlet or tropical sage, is a tender herbaceous perennial widely introduced and seen in public gardens all over the world. With few molecular resources, breeding is still restricted to traditional phenotypic selection, and the genetic mechanisms underlying phenotypic variation remain unknown. Hence, a high-quality reference genome will be very valuable for marker-assisted breeding, genome editing, and molecular genetics. We generated 66 Gb and 37 Gb of raw DNA sequences, respectively, from whole-genome sequencing of a largely homozygous scarlet sage inbred line using Pacific Biosciences (PacBio) single-molecule real-time and Illumina HiSeq sequencing platforms. The PacBio de novo assembly yielded a final genome with a scaffold N50 size of 3.12 Mb and a total length of 808 Mb. The repetitive sequences identified accounted for 57.52% of the genome sequence, and ~54,008 protein-coding genes were predicted collectively with ab initio and homology-based gene prediction from the masked genome. The divergence time between S. splendens and Salvia miltiorrhiza was estimated at 28.21 million years ago (Mya). Moreover, 3,797 species-specific genes and 1,187 expanded gene families were identified for the scarlet sage genome. We provide the first genome sequence and gene annotation for the scarlet sage. The availability of these resources will be of great importance for further breeding strategies, genome editing, and comparative genomics among related species.


September 22, 2019

A graph-based approach to diploid genome assembly.

Constructing high-quality haplotype-resolved de novo assemblies of diploid genomes is important for revealing the full extent of structural variation and its role in health and disease. Current assembly approaches often collapse the two sequences into one haploid consensus sequence and, therefore, fail to capture the diploid nature of the organism under study. Thus, building an assembler capable of producing accurate and complete diploid assemblies, while being resource-efficient with respect to sequencing costs, is a key challenge to be addressed by the bioinformatics community. We present a novel graph-based approach to diploid assembly, which combines accurate Illumina data and long-read Pacific Biosciences (PacBio) data. We demonstrate the effectiveness of our method on a pseudo-diploid yeast genome and show that we require as little as 50× coverage Illumina data and 10× PacBio data to generate accurate and complete assemblies. Additionally, we show that our approach has the ability to detect and phase structural variants. Availability: https://github.com/whatshap/whatshap. Supplementary data are available at Bioinformatics online.


September 22, 2019

N6-methyladenine DNA modification in the human genome.

DNA N6-methyladenine (6mA) modification is the most prevalent DNA modification in prokaryotes, but whether it exists in human cells and whether it plays a role in human diseases remain enigmatic. Here, we showed that 6mA is extensively present in the human genome, and we cataloged 881,240 6mA sites accounting for ~0.051% of the total adenines. [G/C]AGG[C/T] was the most significantly associated motif with 6mA modification. 6mA sites were enriched in the coding regions and mark actively transcribed genes in human cells. DNA 6mA and N6-demethyladenine modification in the human genome were mediated by methyltransferase N6AMT1 and demethylase ALKBH1, respectively. The abundance of 6mA was significantly lower in cancers, accompanied by decreased N6AMT1 and increased ALKBH1 levels, and downregulation of 6mA modification levels promoted tumorigenesis. Collectively, our results demonstrate that DNA 6mA modification is extensively present in human cells and the decrease of genomic DNA 6mA promotes human tumorigenesis. Copyright © 2018 Elsevier Inc. All rights reserved.


September 22, 2019

Whole genome and transcriptome maps of the entirely black native Korean chicken breed Yeonsan Ogye.

Yeonsan Ogye (YO), an indigenous Korean chicken breed (Gallus gallus domesticus), has entirely black external features and internal organs. In this study, the draft genome of YO was assembled using a hybrid de novo assembly method that takes advantage of high-depth Illumina short reads (376.6X) and low-depth Pacific Biosciences (PacBio) long reads (9.7X). The contig and scaffold NG50s of the hybrid de novo assembly were 362.3 Kbp and 16.8 Mbp, respectively. The completeness (97.6%) of the draft genome (Ogye_1.1) was evaluated with single-copy orthologous genes using Benchmarking Universal Single-Copy Orthologs and found to be comparable to the current chicken reference genome (galGal5; 97.4%; contigs were assembled with high-depth PacBio long reads (50X) and scaffolded with short reads) and superior to other avian genomes (92%-93%; assembled with short read-only or hybrid methods). Compared to galGal4 and galGal5, the draft genome included 551 structural variations including the fibromelanosis (FM) locus duplication, related to hyperpigmentation. To comprehensively reconstruct transcriptome maps, RNA sequencing and reduced representation bisulfite sequencing data were analyzed from 20 tissues, including 4 black tissues (skin, shank, comb, and fascia). The maps included 15,766 protein-coding and 6,900 long noncoding RNA genes, many of which were tissue-specifically expressed and displayed tissue-specific DNA methylation patterns in the promoter regions. We expect that the resulting genome sequence and transcriptome maps will be valuable resources for studying domestic chicken breeds, including black-skinned chickens, as well as for understanding genomic differences between breeds and the evolution of hyperpigmented chickens and functional elements related to hyperpigmentation.


September 22, 2019

Comparative analysis reveals unexpected genome features of newly isolated Thraustochytrids strains: on ecological function and PUFAs biosynthesis.

Thraustochytrids are unicellular fungal-like marine protists with ubiquitous existence in marine environments. They are well-known for their ability to produce high-valued omega-3 polyunsaturated fatty acids (ω-3-PUFAs) (e.g., docosahexaenoic acid (DHA)) and hydrolytic enzymes. Thraustochytrid biomass has been estimated to surpass that of bacterioplankton in both coastal and oceanic waters, indicating that they have an important role in the microbial food web. Nevertheless, the molecular pathway and regulatory network for PUFA production and the molecular mechanisms underlying the ecological functions of thraustochytrids remain largely unknown. The genomes of two thraustochytrid strains (Mn4 and SW8) with the ability to produce DHA were sequenced and assembled with a hybrid sequencing approach utilizing Illumina short paired-end reads and Pacific Biosciences long reads to generate a highly accurate genome assembly. Phylogenomic and comparative genomic analyses found that DHA-producing thraustochytrid strains were highly similar and possessed similar gene content. Analysis of the conventional fatty acid synthesis (FAS) and the polyketide synthase (PKS) systems for PUFA production only detected incomplete and fragmentary pathways in the genomes of these two strains. Surprisingly, secreted carbohydrate active enzymes (CAZymes) were found to be significantly depleted in the genomes of these two strains as compared to other sequenced relatives. Furthermore, these two strains possess an expanded gene repertoire for signal transduction and self-propelled movement, which could be important for their adaptation to dynamic marine environments. Our results demonstrate the possibility of a third PUFA synthesis pathway, besides the previously described FAS and PKS pathways, encoded in the genomes of these two thraustochytrid strains. Moreover, the lack of a complete set of hydrolytic enzymatic machinery for degrading plant-derived organic materials suggests that these two DHA-producing strains play an important role as a nutritional source rather than a nutrient producer in the marine microbial food web. Results of this study suggest the existence of two types of saprobic thraustochytrids in the world’s oceans. The first group, which does not produce cellulosic enzymes and lives as a ‘left-over’ scavenger of bacterioplankton, serves as a dietary source for the plankton of higher trophic levels, and the other possesses the capacity to live on detrital organic matter in marine ecosystems.

