Menu
July 7, 2019

Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula.

Third generation sequencing technologies, with sequencing reads in the tens- of kilo-bases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner.Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly.Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.


July 7, 2019

Resolving multicopy duplications de novo using polyploid phasing

While the rise of single-molecule sequencing systems has enabled an unprecedented rise in the ability to assemble complex regions of the genome, long segmental duplications in the genome still remain a challenging frontier in assembly. Segmental duplications are at the same time both gene rich and prone to large structural rearrangements, making the resolution of their sequences important in medical and evolutionary studies. Duplicated sequences that are collapsed in mammalian de novo assemblies are rarely identical; after a sequence is duplicated, it begins to acquire paralog-specific variants. In this paper, we study the problem of resolving the variations in multicopy, long segmental duplications by developing and utilizing algorithms for polyploid phasing. We develop two algorithms: the first one is targeted at maximizing the likelihood of observing the reads given the underlying haplotypes using discrete matrix completion. The second algorithm is based on correlation clustering and exploits an assumption, which is often satisfied in these duplications, that each paralog has a sizable number of paralog-specific variants. We develop a detailed simulation methodology and demonstrate the superior performance of the proposed algorithms on an array of simulated datasets. We measure the likelihood score as well as reconstruction accuracy, i.e., what fraction of the reads are clustered correctly. In both the performance metrics, we find that our algorithms dominate existing algorithms on more than 93% of the datasets. While the discrete matrix completion performs better on likelihood score, the correlation-clustering algorithm performs better on reconstruction accuracy due to the stronger regularization inherent in the algorithm. We also show that our correlation-clustering algorithm can reconstruct on average 7.0 haplotypes in 10-copy duplication datasets whereas existing algorithms reconstruct less than one copy on average.


July 7, 2019

Extraction of high molecular weight DNA from fungal rust spores for long read sequencing.

Wheat rust fungi are complex organisms with a complete life cycle that involves two different host plants and five different spore types. During the asexual infection cycle on wheat, rusts produce massive amounts of dikaryotic urediniospores. These spores are dikaryotic (two nuclei) with each nucleus containing one haploid genome. This dikaryotic state is likely to contribute to their evolutionary success, making them some of the major wheat pathogens globally. Despite this, most published wheat rust genomes are highly fragmented and contain very little haplotype-specific sequence information. Current long-read sequencing technologies hold great promise to provide more contiguous and haplotype-phased genome assemblies. Long reads are able to span repetitive regions and phase structural differences between the haplomes. This increased genome resolution enables the identification of complex loci and the study of genome evolution beyond simple nucleotide polymorphisms. Long-read technologies require pure high molecular weight DNA as an input for sequencing. Here, we describe a DNA extraction protocol for rust spores that yields pure double-stranded DNA molecules with molecular weight of >50 kilo-base pairs (kbp). The isolated DNA is of sufficient purity for PacBio long-read sequencing, but may require additional purification for other sequencing technologies such as Nanopore and 10× Genomics.


July 7, 2019

Long-read sequencing offers path to more accurate drug metabolism profiles

In the complex drug discovery process, one of the looming questions for any new compound is how it will be metabolised in a human bodyWhi|e there are several methods for evaluating this, one of the most common involves CYP2D6,the enzyme encoded by the cytochrome P450—2D6 gene.This enzyme is involved in metabolising a quarter of all commonly used medications, making it an important target for ADME and pharmacogenomics studies. It is known to activate some drugs and to play a role in the deactivation or excretion of others.


July 7, 2019

The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation.

Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.


July 7, 2019

MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads.

We present a tool that combines fast mapping, error correction, and de novo assembly (MECAT; accessible at https://github.com/xiaochuanle/MECAT) for processing single-molecule sequencing (SMS) reads. MECAT’s computing efficiency is superior to that of current tools, while the results MECAT produces are comparable or improved. MECAT enables reference mapping or de novo assembly of large genomes using SMS reads on a single computer.


July 7, 2019

Insights into the red algae and eukaryotic evolution from the genome of Porphyra umbilicalis (Bangiophyceae, Rhodophyta).

Porphyra umbilicalis (laver) belongs to an ancient group of red algae (Bangiophyceae), is harvested for human food, and thrives in the harsh conditions of the upper intertidal zone. Here we present the 87.7-Mbp haploid Porphyra genome (65.8% G + C content, 13,125 gene loci) and elucidate traits that inform our understanding of the biology of red algae as one of the few multicellular eukaryotic lineages. Novel features of the Porphyra genome shared by other red algae relate to the cytoskeleton, calcium signaling, the cell cycle, and stress-tolerance mechanisms including photoprotection. Cytoskeletal motor proteins in Porphyra are restricted to a small set of kinesins that appear to be the only universal cytoskeletal motors within the red algae. Dynein motors are absent, and most red algae, including Porphyra, lack myosin. This surprisingly minimal cytoskeleton offers a potential explanation for why red algal cells and multicellular structures are more limited in size than in most multicellular lineages. Additional discoveries further relating to the stress tolerance of bangiophytes include ancestral enzymes for sulfation of the hydrophilic galactan-rich cell wall, evidence for mannan synthesis that originated before the divergence of green and red algae, and a high capacity for nutrient uptake. Our analyses provide a comprehensive understanding of the red algae, which are both commercially important and have played a major role in the evolution of other algal groups through secondary endosymbioses.


July 7, 2019

Improved annotation of the insect vector of citrus greening disease: biocuration by a diverse genomics community.

The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.


July 7, 2019

Towards systems metabolic engineering in Pichia pastoris.

The methylotrophic yeast Pichia pastoris is firmly established as a host for the production of recombinant proteins, frequently outperforming other heterologous hosts. Already, a sizeable amount of systems biology knowledge has been acquired for this non-conventional yeast. By applying various omics-technologies, productivity features have been thoroughly analyzed and optimized via genetic engineering. However, challenging clonal variability, limited vector repertoire and insufficient genome annotation have hampered further developments. Yet, in the last few years a reinvigorated effort to establish P. pastoris as a host for both protein and metabolite production is visible. A variety of compounds from terpenoids to polyketides have been synthesized, often exceeding the productivity of other microbial systems. The clonal variability was systematically investigated and strategies formulated to circumvent untargeted events, thereby streamlining the screening procedure. Promoters with novel regulatory properties were discovered or engineered from existing ones. The genetic tractability was increased via the transfer of popular manipulation and assembly techniques, as well as the creation of new ones. A second generation of sequencing projects culminated in the creation of the second best functionally annotated yeast genome. In combination with landmark physiological insights and increased output of omics-data, a good basis for the creation of refined genome-scale metabolic models was created. The first application of model-based metabolic engineering in P. pastoris showcased the potential of this approach. Recent efforts to establish yeast peroxisomes for compartmentalized metabolite synthesis appear to fit ideally with the well-studied high capacity peroxisomal machinery of P. pastoris. Here, these recent developments are collected and reviewed with the aim of supporting the establishment of systems metabolic engineering in P. pastoris. Copyright © 2017. Published by Elsevier Inc.


July 7, 2019

A step-by-step guide to assemble a reptilian genome.

Multiple technologies and software are now available facilitating the de novo sequencing and assembly of any vertebrate genome. Yet the quality of most available sequenced genomes is substantially poorer than that of the golden standard in the field: the human genome. Here, we present a step-by-step protocol for the successful sequencing and assembly of a high-quality snake genome that can be applied to any other reptilian or avian species. We combine the great sequencing depth and accuracy of short reads with the use of different insert size libraries for extended scaffolding followed by optical mapping. We show that this procedure improved the corn snake scaffold N50 from 3.7 kbp to 1.4 Mbp, currently making it one of the snake genomes with the longest scaffolds. Short guidelines are also given on the extraction of long DNA molecules from reptilian blood and the necessary modifications in DNA extraction protocols. This chapter is accompanied by a website ( www.reptilomics.org/stepbystep.html ), where we provide links to the suggested software, examples of input and output files, and running parameters.


July 7, 2019

Draft genome sequences from a novel clade of Bacillus cereus sensu lato strains, isolated from the International Space Station.

The draft genome sequences of six Bacillus strains, isolated from the International Space Station and belonging to the Bacillus anthracis-B. cereus-B. thuringiensis group, are presented here. These strains were isolated from the Japanese Experiment Module (one strain), U.S. Harmony Node 2 (three strains), and Russian Segment Zvezda Module (two strains). Copyright © 2017 Venkateswaran et al.


July 7, 2019

The Apostasia genome and the evolution of orchids.

Constituting approximately 10% of flowering plant species, orchids (Orchidaceae) display unique flower morphologies, possess an extraordinary diversity in lifestyle, and have successfully colonized almost every habitat on Earth. Here we report the draft genome sequence of Apostasia shenzhenica, a representative of one of two genera that form a sister lineage to the rest of the Orchidaceae, providing a reference for inferring the genome content and structure of the most recent common ancestor of all extant orchids and improving our understanding of their origins and evolution. In addition, we present transcriptome data for representatives of Vanilloideae, Cypripedioideae and Orchidoideae, and novel third-generation genome data for two species of Epidendroideae, covering all five orchid subfamilies. A. shenzhenica shows clear evidence of a whole-genome duplication, which is shared by all orchids and occurred shortly before their divergence. Comparisons between A. shenzhenica and other orchids and angiosperms also permitted the reconstruction of an ancestral orchid gene toolkit. We identify new gene families, gene family expansions and contractions, and changes within MADS-box gene classes, which control a diverse suite of developmental processes, during orchid evolution. This study sheds new light on the genetic mechanisms underpinning key orchid innovations, including the development of the labellum and gynostemium, pollinia, and seeds without endosperm, as well as the evolution of epiphytism; reveals relationships between the Orchidaceae subfamilies; and helps clarify the evolutionary history of orchids within the angiosperms.


July 7, 2019

Population structure and local adaptation of MAC lung disease agent Mycobacterium avium subsp. hominissuis.

Mycobacterium avium subsp. hominissuis (MAH) is one of the most common nontuberculous mycobacterial species responsible for chronic lung disease in humans. Despite increasing worldwide incidence, little is known about the genetic mechanisms behind the population evolution of MAH. To elucidate the local adaptation mechanisms of MAH, we assessed genetic population structure, the mutual homologous recombination, and gene content for 36 global MAH isolates, including 12 Japanese isolates sequenced in the present study. We identified five major MAH lineages and found that extensive mutual homologous recombination occurs among them. Two lineages (MahEastAsia1 and MahEastAsia2) were predominant in the Japanese isolates. We identified alleles unique to these two East Asian lineages in the loci responsible for trehalose biosynthesis (treS and mak) and in one mammalian cell entry operon, which presumably originated from as yet undiscovered mycobacterial lineages. Several genes and alleles unique to East Asian strains were located in the fragments introduced via recombination between East Asian lineages, suggesting implication of recombination in local adaptation. These patterns of MAH genomes are consistent with the signature of distribution conjugative transfer, a mode of sexual reproduction reported for other mycobacterial species.© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments.

We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short and long reads whole-genome sequencing experiments.By using simulated and real datasets we showed that our tool, based on read count approach, is capable to predict the boundaries and the absolute number of DNA copies CNVs/CNAs with high resolutions. To demonstrate the power of our software we applied it to the analysis Illumina and Pacific Bioscencies data and we compared its performance to other ten state of the art tools.All the analyses we performed demonstrate that XCAVATOR is capable to detect germline and somatic CNVs/CNAs outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/ .


July 7, 2019

SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads.

Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has supplemented the void in SV detection, but introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra – Structural Variation Assessment of CHRomosomal Aberrations, a breakpoint calling program that identifies large insertions-deletions, inversions, inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing.We demonstrate SVachra’s utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional data set of long-read (Pacific BioSciences RSII) was also generated to validate SV calls from SVachra and other comparison SV calling programs. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers.SVachra is a highly specific breakpoint calling program that exhibits a more unbiased SV detection methodology than other callers.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.