July 7, 2019

Diversity oriented biosynthesis via accelerated evolution of modular gene clusters.

Erythromycin, avermectin and rapamycin are clinically useful polyketide natural products produced on modular polyketide synthase multienzymes by an assembly-line process in which each module of enzymes in turn specifies attachment of a particular chemical unit. Although polyketide synthase encoding genes have been successfully engineered to produce novel analogues, the process can be relatively slow, inefficient, and frequently low-yielding. We now describe a method for rapidly recombining polyketide synthase gene clusters to replace, add or remove modules that, with high frequency, generates diverse and highly productive assembly lines. The method is exemplified in the rapamycin biosynthetic gene cluster where, in a single experiment, multiple strains were isolated producing new members of a rapamycin-related family of polyketides. The process mimics, but significantly accelerates, a plausible mechanism of natural evolution for modular polyketide synthases. Detailed sequence analysis of the recombinant genes provides unique insight into the design principles for constructing useful synthetic assembly-line multienzymes.


July 7, 2019

Dense and accurate whole-chromosome haplotyping of individual genomes.

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.


July 7, 2019

Interrogating the “unsequenceable” genomic trinucleotide repeat disorders by long-read sequencing.

Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.


July 7, 2019

Convergence of plasmid architectures drives emergence of multi-drug resistance in a clonally diverse Escherichia coli population from a veterinary clinical care setting.

The purpose of this study was to determine the plasmid architecture and context of resistance genes in multi-drug resistant (MDR) Escherichia coli strains isolated from urinary tract infections in dogs. Illumina and single-molecule real-time (SMRT) sequencing were applied to assemble the complete genomes of E. coli strains associated with clinical urinary tract infections, which were either phenotypically MDR or drug susceptible. This revealed that multiple distinct families of plasmids were associated with building an MDR phenotype. Plasmid-mediated AmpC (CMY-2) beta-lactamase resistance was associated with a clonal group of IncI1 plasmids that has remained stable in isolates collected up to a decade apart. Other plasmids, in particular those with an IncF replicon type, contained other resistance gene markers, so that the emergence of these MDR strains was driven by the accumulation of multiple plasmids, up to 5 replicons in specific cases. This study indicates that vulnerable patients, often with complex clinical histories, provide a setting leading to the emergence of MDR E. coli strains in clonally distinct commensal backgrounds. While it is known that horizontally-transferred resistance supplements uropathogenic strains of E. coli such as ST131, our study demonstrates that the selection of an MDR phenotype in commensal E. coli strains can result in opportunistic infections in vulnerable patient populations. These strains provide a reservoir for the onward transfer of resistance alleles into more typically pathogenic strains and provide opportunities for the coalition of resistance and virulence determinants on plasmids as evidenced by the IncF replicons characterised in this study. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.


July 7, 2019

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus).

The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.
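
For readers unfamiliar with the contiguity metric quoted above, scaffold N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length. The following is a minimal sketch of that calculation, not the pipeline used in the study; the function name `n50` and the toy scaffold lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    # Walk from the longest scaffold down, accumulating length until
    # at least half of the total assembly length is covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (not the M. murinus scaffolds).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # prints 70: 80 + 70 = 150, half of the 300 total
```

A higher N50 for the same total length means that more of the assembly sits in long scaffolds, which is why the metric is reported alongside completeness.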


July 7, 2019

Mechanisms of surface antigenic variation in the human pathogenic fungus Pneumocystis jirovecii.

Microbial pathogens commonly escape the human immune system by varying surface proteins. We investigated the mechanisms used for that purpose by Pneumocystis jirovecii. This uncultivable fungus is an obligate pulmonary pathogen that in immunocompromised individuals causes pneumonia, a major life-threatening infection. Long-read PacBio sequencing was used to assemble a core of subtelomeres of a single P. jirovecii strain from a bronchoalveolar lavage fluid specimen from a single patient. A total of 113 genes encoding surface proteins were identified, including 28 pseudogenes. These genes formed a subtelomeric gene superfamily, which included five families encoding adhesive glycosylphosphatidylinositol (GPI)-anchored glycoproteins and one family encoding excreted glycoproteins. Numerical analyses suggested that diversification of the glycoproteins relies on mosaic genes created by ectopic recombination and occurs only within each family. DNA motifs suggested that all genes are expressed independently, except those of the family encoding the most abundant surface glycoproteins, which are subject to mutually exclusive expression. PCR analyses showed that exchange of the expressed gene of the latter family occurs frequently, possibly favored by the location of the genes proximal to the telomere because this allows concomitant telomere exchange. Our observations suggest that (i) the P. jirovecii cell surface is made of a complex mixture of different surface proteins, with a majority of a single isoform of the most abundant glycoprotein, (ii) genetic mosaicism within each family ensures variation of the glycoproteins, and (iii) the strategy of the fungus consists of the continuous production of new subpopulations composed of cells that are antigenically different.

IMPORTANCE: Pneumocystis jirovecii is a fungus causing severe pneumonia in immunocompromised individuals. It is the second most frequent life-threatening invasive fungal infection. We have studied the mechanisms of antigenic variation used by this pathogen to escape the human immune system, a strategy commonly used by pathogenic microorganisms. Using a new DNA sequencing technology generating long reads, we could characterize the highly repetitive gene families encoding the proteins that are present on the cellular surface of this pest. These gene families are localized in the regions close to the ends of all chromosomes, the subtelomeres. Such chromosomal localization was found to favor genetic recombinations between members of each gene family and to allow diversification of these proteins continuously over time. This pathogen seems to use a strategy of antigenic variation consisting of the continuous production of new subpopulations composed of cells that are antigenically different. Such a strategy is unique among human pathogens. Copyright © 2017 Schmid-Siegert et al.


July 7, 2019

Complete genomic sequences of two Salmonella enterica subsp. enterica serogroup C2 (O:6,8) strains from Central California.

Salmonella enterica subsp. enterica strains RM11060, serotype 6,8:d:-, and RM11065, serotype 6,8:-:e,n,z15, were isolated from environmental samples collected in central California in 2009. We report the complete genome sequences of these two strains. These genomic sequences are distinct and will provide additional data to our understanding of S. enterica genomics.


July 7, 2019

Comparative genome analysis of programmed DNA elimination in nematodes.

Programmed DNA elimination is a developmentally regulated process leading to the reproducible loss of specific genomic sequences. DNA elimination occurs in unicellular ciliates and a variety of metazoans, including invertebrates and vertebrates. In metazoa, DNA elimination typically occurs in somatic cells during early development, leaving the germline genome intact. Reference genomes for metazoa that undergo DNA elimination are not available. Here, we generated germline and somatic reference genome sequences of the DNA eliminating pig parasitic nematode Ascaris suum and the horse parasite Parascaris univalens. In addition, we carried out in-depth analyses of DNA elimination in the parasitic nematode of humans, Ascaris lumbricoides, and the parasitic nematode of dogs, Toxocara canis. Our analysis of nematode DNA elimination reveals that in all species, repetitive sequences (that differ among the genera) and germline-expressed genes (approximately 1000-2000 or 5%-10% of the genes) are eliminated. Thirty-five percent of these eliminated genes are conserved among these nematodes, defining a core set of eliminated genes that are preferentially expressed during spermatogenesis. Our analysis supports the view that DNA elimination in nematodes silences germline-expressed genes. Over half of the chromosome break sites are conserved between Ascaris and Parascaris, whereas only 10% are conserved in the more divergent T. canis. Analysis of the chromosomal breakage regions suggests a sequence-independent mechanism for DNA breakage followed by telomere healing, with the formation of more accessible chromatin in the break regions prior to DNA elimination. Our genome assemblies and annotations also provide comprehensive resources for analysis of DNA elimination, parasitology research, and comparative nematode genome and epigenome studies. © 2017 Wang et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Draft genomes of the fungal pathogen Phellinus noxius in Hong Kong

The fungal pathogen Phellinus noxius is the underlying cause of brown root rot, a disease that causes tree mortality globally and extensive damage in urban areas and crop plants. This disease currently has no cure, and despite the global epidemic, little is known about the pathogenesis and virulence of this pathogen. Using Ion Torrent PGM, Illumina MiSeq and PacBio RSII sequencing platforms with various genome assembly methods, we produced the draft genome sequences of four P. noxius strains isolated from infected trees in Hong Kong to further understand the pathogen and identify the mechanisms behind the aggressive nature and virulence of this fungus. The resulting genomes ranged from 30.8Mb to 31.8Mb in size, and of the four sequences, the YTM97 strain was chosen to produce a high-quality Hong Kong strain genome sequence, resulting in a 31Mb final assembly with 457 scaffolds, an N50 length of 275,889 bp and 96.2% genome completeness. RNA-seq of YTM97 using Illumina HiSeq400 was performed for improved gene prediction. AUGUSTUS and Genemark-ES prediction programs predicted 9,887 protein-coding genes which were annotated using GO and Pfam databases. The encoded carbohydrate active enzymes revealed large numbers of lignolytic enzymes present, comparable to those of other white-rot plant pathogens. In addition, P. noxius also possessed larger numbers of cellulose, xylan and hemicellulose degrading enzymes than other plant pathogens. Searches for virulence genes were also performed using PHI-Base and DFVF databases, revealing a host of virulence-related genes and effectors. The combination of non-specific host range, unique carbohydrate active enzyme profile and large number of putative virulence genes could explain the reasons behind the aggressive nature and increased virulence of this plant pathogen. The draft genome sequences presented here will provide references for strains found in Hong Kong.
Together with emerging research, this information could be used for genetic diversity and epidemiology research on a global scale as well as expediting our efforts towards discovering the mechanisms of pathogenicity of this devastating pathogen.


July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly-used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.


July 7, 2019

Genomics of parallel adaptation at two timescales in Drosophila.

Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a timescale of roughly a few million years. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster in that it too appears to be a recently cosmopolitan species and recent colonizer of high latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years of time separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus.


July 7, 2019

Genetic maps and whole genome sequences of radish

Radish, Raphanus sativus L., is a member of Brassicaceae, to which Arabidopsis thaliana, a model plant in plant biology, belongs, as do other Brassica species including important crops. However, genetic and genomic studies of radish have lagged behind those of Arabidopsis and Brassica. In this decade, much effort has been made to develop genetic resources for radish, e.g., DNA markers, genetic maps, and whole genome sequences. Studies using the obtained information have revealed the genome structure of radish in terms of ancestral karyotype and have also prompted the identification of genes for agronomically important traits in radish through a map-based cloning strategy and quantitative trait locus analysis. In this chapter, we review the development of radish genetic maps over the past 15 years and the current status of genome sequencing of radish. We also introduce the latest strategy for the construction of a high-density genetic map using next-generation sequencing technology and propose a prospective direction of genetics and genomics research in radish which would be helpful for researchers and breeders in their efforts to promote radish breeding programs efficiently.

