September 22, 2019

Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads

Due to the large number of repetitive sequences in complex eukaryotic genomes, fragmented and incompletely assembled genomes lose value as reference sequences, often because short contigs cannot be anchored to chromosomes or are mispositioned on them. Here we report a novel method, Highly Efficient Repeat Assembly (HERA), which includes a new concept called a connection graph as well as algorithms for constructing the graph. HERA resolves repeats at high efficiency with single-molecule sequencing data, and enables the assembly of chromosome-scale contigs by further integrating genome maps and Hi-C data. We tested HERA with the genomes of rice R498, maize B73, human HX1 and Tartary buckwheat Pinku1. HERA can correctly assemble most of the tandemly repetitive sequences in rice using single-molecule sequencing data only. Using the same maize and human sequencing data published by Jiao et al. (2017) and Shi et al. (2016), respectively, we dramatically improved the sequence contiguity compared with the published assemblies, increasing the contig N50 from 1.3 Mb to 61.2 Mb in the maize B73 assembly and from 8.3 Mb to 54.4 Mb in the human HX1 assembly with HERA. We provided a high-quality maize reference genome with 96.9% of the gaps filled (only 76 gaps left) and several incorrectly positioned sequences fixed compared with the B73 RefGen_v4 assembly. Comparisons between the HERA assembly of HX1 and the human GRCh38 reference genome showed that many gaps in GRCh38 could be filled, and that GRCh38 contained some potential errors that could be fixed. We assembled the Pinku1 genome into 12 scaffolds with a contig N50 size of 27.85 Mb. HERA serves as a new genome assembly/phasing method to generate high-quality sequences for complex genomes and as a curation tool to improve the contiguity and completeness of existing reference genomes, including the correction of assembly errors in repetitive regions.
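For readers unfamiliar with the contig N50 statistic quoted above, the following is a minimal Python sketch of the usual definition: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions, not data or code from the HERA paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative length first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), made up for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, joining a few contigs across repeats can raise it sharply, which is why it is commonly reported as a contiguity measure.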


September 22, 2019

Homogenization of sub-genome secretome gene expression patterns in the allodiploid fungus Verticillium longisporum

Allopolyploidization, genome duplication through interspecific hybridization, is an important evolutionary mechanism that can enable organisms to adapt to environmental changes or stresses. The increased adaptive potential of allopolyploids can be particularly relevant for plant pathogens in their ongoing quest for host immune response evasion. To this end, plant pathogens secrete a plethora of molecules that enable host colonization. Allodiploidization has resulted in the new plant pathogen Verticillium longisporum that infects different hosts than haploid Verticillium species. To reveal the impact of allodiploidization on plant pathogen evolution, we studied the genome and transcriptome dynamics of V. longisporum using next-generation sequencing. V. longisporum genome evolution is characterized by extensive chromosomal rearrangements, between as well as within parental chromosome sets, leading to a mosaic genome structure. In comparison to haploid Verticillium species, V. longisporum genes display stronger signs of positive selection. The expression patterns of the two sub-genomes show remarkable resemblance, suggesting that the parental gene expression patterns homogenized upon hybridization. Moreover, whereas V. longisporum genes encoding secreted proteins frequently display differential expression between the parental sub-genomes in culture medium, expression patterns homogenize upon plant colonization. Collectively, our results illustrate the adaptive potential of allodiploidy mediated by the interplay of two sub-genomes.

Author summary

Hybridization followed by whole-genome duplication, so-called allopolyploidization, provides genomic flexibility that is beneficial for survival under stressful conditions or invasiveness into new habitats. Allopolyploidization has mainly been studied in plants, but also occurs in other organisms, including fungi. Verticillium longisporum, an emerging fungal pathogen on brassicaceous plants, arose by allodiploidization between two Verticillium spp. We used comparative genomics to reveal the plastic nature of the V. longisporum genomes, showing that parental chromosome sets recombined extensively, resulting in a mosaic genome pattern. Furthermore, we show that non-synonymous substitutions frequently occurred in V. longisporum. Moreover, we reveal that expression patterns of genes encoding secreted proteins homogenized between the V. longisporum sub-genomes upon plant colonization. In conclusion, our results illustrate the large adaptive potential upon genome hybridization for fungi mediated by genomic plasticity and interaction between sub-genomes.


September 22, 2019

Co-culture of soil biofilm isolates enables the discovery of novel antibiotics

Bacterial natural products (NPs) are considered to be a promising source for drug discovery. However, the biosynthetic gene clusters (BGCs) of NPs are often not expressed, making it difficult to identify them. Recently, studies of biofilm communities showed that bacteria may gain competitive advantages by secreting antibiotics, implying a possible way to screen for antibiotics by evaluating the social behavior of bacteria. In this study, we describe an efficient workflow for novel antibiotic discovery that employs a bacterial social interaction strategy with biofilm cultivation, co-culture, transcriptomic and genomic methods. We showed that a biofilm-dominant species, i.e. Pseudomonas sp. G7, which was isolated from a cultivated soil biofilm community, was highly competitive in four-species biofilm communities, as the synergistic combinations preferred to exclude this strain while the antagonistic combinations did not. Through the analysis of transcriptomic changes in four-species co-culture and the complete genome of Pseudomonas sp. G7, we finally discovered two novel non-ribosomal peptide synthetase (NRPS) BGCs, whose products were predicted to have seven and six amino acid components, respectively. Furthermore, we provide evidence showing that these BGC genes can be induced only when Pseudomonas sp. G7 is co-cultivated with at least two or three other bacterial species, suggesting that the co-culture of the soil biofilm isolates is critical to the discovery of novel antibiotics. In conclusion, we establish a model for applying microbial interaction to the discovery of new antibiotics.


September 22, 2019

genomeview – an extensible python-based genomics visualization engine

Visual inspection and analysis are integral to quality control, hypothesis generation, methods development and validation of genomic data. The richness and complexity of genomic data necessitates customized visualizations highlighting specific features of interest while hiding the often vast tide of irrelevant attributes. However, the majority of genome-visualization occurs either in general-purpose tools such as IGV or the UCSC Genome Browser — which offer many options to adjust visualization parameters, but very little in the way of extensibility — or narrowly-focused tools aiming to solve a single visualization problem. Here, we present genomeview, a python-based visualization engine which is easy to extend and simple to integrate into existing analysis pipelines.


September 22, 2019

Comparative genomic analysis of Bacillus thuringiensis reveals molecular adaptation to copper tolerance

Bacillus thuringiensis is a Gram-positive, rod-shaped bacterium that is found in a wide range of habitats. Despite the intensive studies conducted on this bacterium, most of the information available is related to its pathogenic characteristics, with only a limited number of publications mentioning its ability to survive in extreme environments. Recently, a B. thuringiensis MCMY1 strain was successfully isolated from a copper-contaminated site in Mamut Copper Mine, Sabah. This study aimed to conduct a comparative genomic analysis by using the genome sequence of the MCMY1 strain published in GenBank (PRJNA374601) as a target genome for comparison with other available B. thuringiensis genomes in GenBank. Whole genome alignment, fragment all-against-all comparison analysis, phylogenetic reconstruction and specific copper gene comparison were applied to all forty-five B. thuringiensis genomes to reveal the molecular adaptation to copper tolerance. The comparative results indicated that the B. thuringiensis MCMY1 strain is closely related to strain Bt407 and strain IS5056. This strain harbors almost all available copper genes annotated from the forty-five B. thuringiensis genomes, except for the gene for magnesium and cobalt efflux protein (CorC), which plays an indirect role in reducing the oxidative stress caused by copper and other metal ions. Furthermore, the findings also showed that the copper resistance gene family, CopABCDZ, and its repressor (CsoR) are conserved in almost all sequenced genomes, but the presence of the genes for cytoplasmic copper homeostasis protein (CutC) and CorC across the sample genomes is highly inconsistent. The variation of these genes across the B. thuringiensis genomes suggests that each strain may have adapted to its specific ecological niche. However, further investigations will be needed to support this preliminary hypothesis.


September 22, 2019

PBHoover and CigarRoller: a method for confident haploid variant calling on Pacific Biosciences data and its application to heterogeneous population analysis

Motivation: Single Molecule Real-Time (SMRT) sequencing has important and underutilized advantages that amplification-based platforms lack. These include a lack of systematic error (e.g. GC bias) and complete de novo assembly (including large repetitive regions) without scaffolding. SMRT sequencing, however, suffers from a high random error rate and, with older chemistries, low sequencing depth. Here, we introduce PBHoover, software that uses a heuristic calling algorithm in order to make base calls with high certainty in low-coverage regions. This software is also capable of mixed population detection with high sensitivity. PBHoover's CigarRoller attachment improves sequencing depth in low-coverage regions through CIGAR-string correction. Results: We tested both modules on 348 M. tuberculosis clinical isolates sequenced on C1 or C2 chemistries. On average, CigarRoller improved the percentage of usable read count from 68.9% to 99.98% in C1 runs and from 50% to 99% in C2 runs. Using the greater depth provided by CigarRoller, PBHoover was able to make base and variant calls 99.95% concordant with Sanger calls (QV33). PBHoover also detected antibiotic-resistant subpopulations that went undetected by Sanger. Using C1 chemistry, subpopulations as small as 9% of the total colony can be detected by PBHoover. This provides the most sensitive amplification-free molecular method for heterogeneity analysis and is in line with phenotypic methods' sensitivity. This sensitivity significantly improves with the greater depth and lower error rate of the newer chemistries. Availability and Implementation: Executables are freely available under GNU GPL v3+ at http://www.gitlab.com/LPCDRP/pbhoover and http://www.gitlab.com/LPCDRP/CigarRoller. PBHoover is also available on bioconda: https://anaconda.org/bioconda/pbhoover.
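The pairing of "99.95% concordant" with "QV33" in the abstract above is consistent with the standard Phred scale, where QV = -10 * log10(error rate). The short Python sketch below shows that arithmetic, assuming the quoted QV is Phred-scaled; the function names are illustrative and are not part of PBHoover.

```python
import math


def concordance_to_qv(concordance):
    """Convert a concordance fraction (e.g. 0.9995) to a Phred-scaled QV."""
    error_rate = 1.0 - concordance
    if error_rate <= 0.0:
        raise ValueError("concordance must be below 1.0 for a finite QV")
    return -10.0 * math.log10(error_rate)


def qv_to_concordance(qv):
    """Convert a Phred-scaled QV back to the implied concordance fraction."""
    return 1.0 - 10.0 ** (-qv / 10.0)


# 99.95% concordance corresponds to an error rate of 5 in 10,000 calls.
print(round(concordance_to_qv(0.9995), 2))
print(round(qv_to_concordance(33), 6))
```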


September 22, 2019

Sequencing of Panax notoginseng genome reveals genes involved in disease resistance and ginsenoside biosynthesis

Background: Panax notoginseng is a traditional Chinese herb with high medicinal and economic value. There has been considerable research on the pharmacological activities of ginsenosides contained in Panax spp.; however, very little is known about the ginsenoside biosynthetic pathway. Results: We reported the first de novo genome of 2.36 Gb of sequences from P. notoginseng with 35,451 protein-encoding genes. Compared to other plants, we found notable gene family contraction of disease-resistance genes in P. notoginseng, but notable expansion for several ATP-binding cassette (ABC) transporter subfamilies, such as the Gpdr subfamily, indicating that ABCs might be an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, we identified several key genes, their transcription factor binding sites and all their family members involved in the synthesis pathway of ginsenosides in P. notoginseng, including dammarenediol synthase, CYP716 and UGT71. Conclusions: The complete genome analysis of P. notoginseng, the first in genus Panax, will serve as an important reference sequence for improving breeding and cultivation of this important nutraceutical and medicinal but vulnerable plant species.


September 22, 2019

A chromosome scale assembly of the model desiccation tolerant grass Oropetium thomaeum

Oropetium thomaeum is an emerging model for desiccation tolerance and genome size evolution in grasses. A high-quality draft genome of Oropetium was recently sequenced, but the lack of a chromosome scale assembly has hindered comparative analyses and downstream functional genomics. Here, we reassembled Oropetium, and anchored the genome into ten chromosomes using Hi-C based chromatin interactions. A combination of high-resolution RNAseq data and homology-based gene prediction identified thousands of new, conserved gene models that were absent from the V1 assembly. This includes thousands of new genes with high expression across a desiccation timecourse. The sorghum and Oropetium genomes have a surprising degree of chromosome-level collinearity, and several chromosome pairs have near perfect synteny. Other chromosomes are collinear in the gene rich chromosome arms but have experienced pericentric translocations. Together, these resources will be useful for the grass comparative genomic community and further establish Oropetium as a model resurrection plant.


September 22, 2019

Large scale changes in host methylation patterns induced by IncA/C plasmid transformation in Vibrio cholerae

DNA methylation is a central epigenetic modification and has diverse biological functions in eukaryotic and prokaryotic organisms alike. The IncA/C plasmid genomes are approximately 150 kb in length and harbour three methylase genes, two of which demonstrate cytosine specificity. Transformation of the Vibrio cholerae strain C6706 with the IncA/C plasmid pVC211 resulted in a significant relabelling of the methylation patterns on the host chromosomes. The new methylation patterns induced by transformation with IncA/C plasmid were accepted by the restriction enzymes of the host's restriction modification (RM) system. These data uncover a novel mechanism by which plasmids can be compatible with a host's RM system and suggest a possible reason that plasmids of the IncA/C family are broad-host-range.


September 22, 2019

Genomic analysis for heavy metal resistance in S. maltophilia

Stenotrophomonas maltophilia is highly resistant to heavy metals, but the genetic basis of its metal resistance is poorly understood. In this study, the genome of S. maltophilia Pho, isolated from contaminated soil near a metalwork factory, was sequenced using PacBio RS II. Its genome is composed of a single chromosome with a GC content of 66.4% and 4434 protein-encoding genes. Comparative analysis revealed high synteny between S. maltophilia Pho and the model strain, S. maltophilia K279a. First, the types and numbers of heavy metal uptake mechanisms were analyzed. Results showed 7 unspecific ion transporter genes and 13 specific ion transporter genes, most of which were involved in iron transport. However, the sulfate permeases belonging to the SulT/CysP family, which can take up chromate, and the high-affinity ZnuABC/SitABCD were absent. Second, the putative genes controlling metal efflux were analyzed. Results showed that this bacterium encoded 5 CDFs, 1 copper-exporting ATPase and 4 RND systems, including 2 CzcABC efflux pumps. Moreover, putative metal transformation genes, including arsenate and mercury detoxification genes, were also identified. This study may provide useful information on the metal resistance mechanisms of S. maltophilia.


September 22, 2019

Structural variants exhibit allelic heterogeneity and shape variation in complex traits

Despite extensive effort to reveal the genetic basis of complex phenotypic variation, studies typically explain only a fraction of trait heritability. It has been hypothesized that individually rare hidden structural variants (SVs) could account for a significant fraction of variation in complex traits. To investigate this hypothesis, we assembled 14 Drosophila melanogaster genomes and systematically identified more than 20,000 euchromatic SVs, of which ~40% are invisible to high specificity short read genotyping approaches. SVs are common in Drosophila genes, with almost one third of diploid individuals harboring an SV in genes larger than 5kb, and nearly a quarter harboring multiple SVs in genes larger than 10kb. We show that SV alleles are rarer than amino acid polymorphisms, implying that they are more strongly deleterious. A number of functionally important genes harbor previously hidden structural variants that likely affect complex phenotypes (e.g., Cyp6g1, Drsl5, Cyp28d1&2, InR, and Gss1&2). Furthermore, SVs are overrepresented in quantitative trait locus candidate genes from eight Drosophila Synthetic Population Resource (DSPR) mapping experiments. We conclude that SVs are pervasive in genomes, are frequently present as heterogeneous allelic series, and can act as rare alleles of large effect.


September 22, 2019

De novo assembly, delivery and expression of a 101 kb human gene in mouse cells

Design and large-scale synthesis of DNA has been applied to the functional study of viral and microbial genomes. New and expanded technology development is required to unlock the transformative potential of such bottom-up approaches to the study of larger, mammalian genomes. Two major challenges include assembling and delivering long DNA sequences. Here we describe a pipeline for de novo DNA assembly and delivery that enables functional evaluation of mammalian genes on the length scale of 100 kb. The DNA assembly step is supported by an integrated robotic workcell. We assemble the 101 kb human HPRT1 gene in yeast, deliver it to mouse cells, and show expression of the human protein from its full-length gene. This pipeline provides a framework for producing systematic, designer variants of any mammalian gene locus for functional evaluation in cells.


September 22, 2019

Parliament2: Fast structural variant calling using optimized combinations of callers

Here we present Parliament2: a structural variant caller which combines multiple best-in-class structural variant callers to create a highly accurate callset. This captures more events than the individual callers achieve independently. Parliament2 uses a call-overlap-genotype approach that is highly extensible to new methods and gives users the choice to run some or all of Breakdancer, Breakseq, CNVnator, Delly, Lumpy, and Manta. Parliament2 applies an additional parallelization framework to speed up certain callers and executes these in parallel, taking advantage of the different resource requirements to complete structural variant calling much faster than running the programs individually. Parliament2 is available as a Docker container, which pre-installs all required dependencies. This allows users to run any caller with easy installation and execution. This Docker container can easily be deployed in cloud or local environments and is available as an app on DNAnexus.
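To give a feel for the "overlap" step of a call-overlap-genotype approach, here is a minimal Python sketch that finds which callers support a given structural variant using reciprocal overlap. This is a generic illustration and not Parliament2's actual merging logic: the `Call` record, the 50% threshold, and the coordinates are all assumptions made up for the example.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One structural variant call (hypothetical record layout)."""
    caller: str
    chrom: str
    start: int
    end: int
    svtype: str


def reciprocal_overlap(a, b):
    """Fraction of the shared interval relative to the longer-penalised call.

    Returns 0.0 when the calls differ in chromosome or SV type, or when
    their intervals do not overlap.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def supporting_callers(query, calls, min_ro=0.5):
    """Names of callers with a call reciprocally overlapping `query`."""
    return {c.caller for c in calls if reciprocal_overlap(query, c) >= min_ro}


# Invented coordinates; caller names are taken from the abstract above.
calls = [
    Call("manta", "chr1", 1000, 2000, "DEL"),
    Call("lumpy", "chr1", 1100, 2050, "DEL"),
    Call("delly", "chr1", 5000, 6000, "DEL"),
    Call("cnvnator", "chr1", 900, 2100, "DUP"),
]
print(sorted(supporting_callers(calls[0], calls)))
```

In this toy input, the Manta deletion is supported by the Lumpy call but not by the distant Delly call or the CNVnator duplication; a consensus callset could then keep events supported by some minimum number of callers.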


September 22, 2019

A homeobox gene, BarH-1, underlies a female alternative life-history strategy

Colias butterflies (the “clouded sulphurs”) often occur in mixed populations where females exhibit two color morphs, yellow/orange or white. White females, known as the Alba morph, reallocate resources from the synthesis of costly colored pigments to reproductive and somatic development [1]. Due to this tradeoff, Alba females develop faster and have higher fecundity than orange females [2]. However, orange females, which have instead invested in pigments, are preferred by males, who in turn provide a nutrient-rich spermatophore during mating [2,3,4]. Thus the wing color morphs represent alternative life history strategies (ALHS) that are female-limited, wherein tradeoffs, due to divergent resource investment, result in distinct phenotypes with associated fitness consequences. Here we map the genetic basis of Alba in Colias crocea to a transposable element insertion downstream of the Colias homolog of BarH-1. To investigate the phenotypic effects of this insertion we use CRISPR/Cas9 to validate BarH-1’s functional role in the wing color switch and antibody staining to confirm expression differences in the scale-building cells of pupal wings. We then use scanning electron microscopy to determine that BarH-1 expression in the wings causes a reduction in pigment granules within wing scales, and thereby gives rise to the white color. Finally, lipid and transcriptome analyses reveal additional physiological differences that arise due to Alba, suggesting pleiotropic effects beyond wing color. Together these findings provide the first well-documented mechanism for a female ALHS and support an alternative view of color polymorphism as indicative of pleiotropic effects with life history consequences.


September 22, 2019

Antiviral adaptive immunity and tolerance in the mosquito Aedes aegypti

Mosquitoes spread pathogenic arboviruses while themselves tolerating infection. We here characterize an immunity pathway providing long-term antiviral protection and define how this pathway discriminates between self and non-self. Mosquitoes use viral RNAs to create viral-derived cDNAs (vDNAs) central to the antiviral response. vDNA molecules are acquired through a process of reverse transcription and recombination directed by endogenous retrotransposons. These vDNAs are thought to integrate in the host genome as endogenous viral elements (EVEs). Sequencing of pre-integrated vDNA revealed that the acquisition process exquisitely distinguishes viral from host RNA, providing one layer of self-non-self discrimination. Importantly, we show EVE-derived piRNAs have antiviral activity and are loaded onto Piwi4 to inhibit virus replication. In a second layer of self-non-self discrimination, Piwi4 preferentially loads EVE-derived piRNAs, discriminating against transposon-targeting piRNAs. Our findings define a fundamental virus-specific immunity pathway in mosquitoes that uses EVEs as a potent and specific antiviral transgenerational mechanism.

