July 7, 2019

Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution.

Amaranth (Amaranthus hypochondriacus) was a food staple among the ancient civilizations of Central and South America that has recently received increased attention due to the high nutritional value of the seeds, with the potential to help alleviate malnutrition and food security concerns, particularly in arid and semiarid regions of the developing world. Here, we present a reference-quality assembly of the amaranth genome which will assist the agronomic development of the species. Utilizing single-molecule, real-time sequencing (Pacific Biosciences) and chromatin interaction mapping (Hi-C) to close assembly gaps and scaffold contigs, respectively, we improved our previously reported Illumina-based assembly to produce a chromosome-scale assembly with a scaffold N50 of 24.4 Mb. The 16 largest scaffolds contain 98% of the assembly and likely represent the haploid chromosomes (n = 16). To demonstrate the accuracy and utility of this approach, we produced physical and genetic maps and identified candidate genes for the betalain pigmentation pathway. The chromosome-scale assembly facilitated a genome-wide syntenic comparison of amaranth with other Amaranthaceae species, revealing chromosome loss and fusion events in amaranth that explain the reduction from the ancestral haploid chromosome number (n = 18) for a tetraploid member of the Amaranthaceae. The assembly method reported here minimizes cost by relying primarily on short-read technology and is one of the first reported uses of in vivo Hi-C for assembly of a plant genome. Our analyses implicate chromosome loss and fusion as major evolutionary events in the 2n = 32 amaranths and clearly establish the homoeologous relationship among most of the subgenome chromosomes, which will facilitate future investigations of intragenomic changes that occurred post polyploidization.
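
For readers unfamiliar with the scaffold N50 statistic quoted in this abstract, the following is a minimal sketch of how it is commonly computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not values or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not the amaranth data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths))
```

A larger N50 means that more of the assembly sits in long scaffolds, which is why the statistic is reported as a measure of assembly contiguity.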


July 7, 2019

Heterogeneity of the Epstein-Barr virus major internal repeat reveals evolutionary mechanisms of EBV and a functional defect in the prototype EBV strain B95-8.

Epstein-Barr virus (EBV) is a ubiquitous pathogen of humans that can cause several types of lymphoma and carcinoma. Like other herpesviruses, EBV has diversified both through co-evolution with its host, and genetic exchange between virus strains. Sequence analysis of the EBV genome is unusually challenging, because of the large number and length of repeat regions within the virus. Here we describe the sequence assembly and analysis of the large internal repeat of EBV (IR1 or BamW repeats) from over 70 strains. Diversity of the latency protein EBNA-LP resides predominantly within the exons downstream of IR1. The integrity of the putative BWRF1 ORF is retained in over 80% of strains, and deletions truncating IR1 always spare BWRF1. Conserved regions include the IR1 latency promoter (Wp), and one zone upstream of and two within BWRF1. IR1 is heterogeneous in 70% of strains, and this heterogeneity arises from sequence exchange between strains as well as spontaneous mutation, with inter-strain recombination more common in tumour-derived viruses. This genetic exchange often incorporates regions of <1 kb, and allelic gene conversion changes the frequency of small regions within the repeat, but not close to the flanks. These observations suggest that IR1 - and by extension EBV - diversifies through both recombination and breakpoint repair, while concerted evolution of IR1 is driven by gene conversion of small regions. Finally, the prototype EBV strain B95-8 contains four non-consensus variants within a single IR1 repeat unit, including a STOP codon in EBNA-LP. Repairing IR1 improves EBNA-LP levels and the quality of transformation by the B95-8 BAC. IMPORTANCE Epstein-Barr virus (EBV) infects the majority of the world population, but only causes illness in a small minority. Nevertheless, over 1% of cancers worldwide are attributable to EBV.
Recent sequencing projects investigating virus diversity, to see if different strains have different disease impacts, have excluded regions of repeating sequence, as they are more technically challenging. Here we analyse the sequence of the largest repeat in EBV (IR1). We first characterised the variations in protein sequences encoded across IR1. In studying variations within the repeat of each strain, we identified a mutation in the main laboratory strain of EBV that impairs virus function, and suggest that tumour-associated viruses may be more likely to contain DNA mixed from two strains. Patterns of this mixing suggest that sequences can spread between strains (and also within the repeat) by copying sequence from another strain (or repeat unit) to repair DNA damage. Copyright © 2017 Ba abdullah et al.


July 7, 2019

Bacteriophages are the major drivers of Shigella flexneri serotype 1c genome plasticity: a complete genome analysis.

Shigella flexneri is the primary cause of bacillary dysentery in developing countries. S. flexneri serotype 1c is a novel serotype, which is found to be endemic in many developing countries, but little is known about its genomic architecture and virulence signatures. We have sequenced, for the first time, the complete genome of S. flexneri serotype 1c strain Y394, to provide insights into its diversity and evolution. We generated a high-quality reference genome of S. flexneri serotype 1c using a hybrid of long-read single-molecule real-time (SMRT) sequencing technology and short-read MiSeq (Illumina) sequencing technology. The Y394 chromosome is 4.58 Mb in size and shares the basic genomic features with other S. flexneri complete genomes. However, it possesses a unique and highly modified O-antigen structure comprising three distinct O-antigen modifying gene clusters that potentially came from three different bacteriophages. It also possesses a large number of hypothetical unique genes compared to other S. flexneri genomes. Despite a high level of structural and functional similarity between the Y394 genome and other S. flexneri genomes, there are marked differences in the pathogenic islands. The diversity in the pathogenic islands suggests that these bacterial pathogens are well adapted to respond to the selection pressures during their evolution, which might contribute to the differences in their virulence potential.


July 7, 2019

XCAVATOR: accurate detection and genotyping of copy number variants from second and third generation whole-genome sequencing experiments.

We developed a novel software package, XCAVATOR, for the identification of genomic regions involved in copy number variants/alterations (CNVs/CNAs) from short- and long-read whole-genome sequencing experiments. By using simulated and real datasets we showed that our tool, based on a read-count approach, is able to predict the boundaries and the absolute number of DNA copies of CNVs/CNAs with high resolution. To demonstrate the power of our software we applied it to the analysis of Illumina and Pacific Biosciences data and compared its performance to ten other state-of-the-art tools. All the analyses we performed demonstrate that XCAVATOR is able to detect germline and somatic CNVs/CNAs, outperforming all the other tools we compared. XCAVATOR is freely available at http://sourceforge.net/projects/xcavator/ .
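
As a rough illustration of what a read-count approach means in general, the sketch below converts per-window read counts into integer copy-number estimates by scaling each window against the median count, which is assumed to represent the normal (diploid) state. This is not XCAVATOR's algorithm: the function name `estimate_copy_number`, the median baseline, the window counts, and the absence of GC or mappability correction and segmentation are all simplifying assumptions made for illustration.

```python
from statistics import median


def estimate_copy_number(window_counts, ploidy=2):
    """Estimate an integer copy number for each genomic window.

    Assumes the median window count corresponds to the normal copy
    number (`ploidy`), so a window with 1.5x the median read count
    in a diploid genome is called as 3 copies.
    """
    baseline = median(window_counts)
    if baseline <= 0:
        raise ValueError("median read count must be positive")
    return [round(ploidy * count / baseline) for count in window_counts]


# Hypothetical read counts for eight consecutive windows.
counts = [100, 102, 98, 150, 149, 101, 50, 99]
print(estimate_copy_number(counts))
```

In this toy input, the two windows with roughly 150 reads are called as a single-copy gain and the window with 50 reads as a single-copy loss; real tools additionally normalize the counts and segment adjacent windows to place CNV boundaries.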


July 7, 2019

Parallel evolution of two clades of a major Atlantic endemic Vibrio parahaemolyticus pathogen lineage by independent acquisition of related pathogenicity islands.

Shellfish-transmitted Vibrio parahaemolyticus infections have recently increased from locations with historically low disease incidence, such as the Northeast United States (US). This change coincided with a bacterial population shift towards human pathogenic variants occurring in part through the introduction of several Pacific native lineages (ST36, ST43 and ST636) to near-shore areas off the Atlantic coast of the Northeast US. Concomitantly, ST631 emerged as a major endemic pathogen. Phylogenetic trees of clinical and environmental isolates indicated that two clades diverged from a common ST631 ancestor, and in each of these clades, a human pathogenic variant evolved independently through acquisition of distinct Vibrio pathogenicity islands (VPaI). These VPaI differ from each other and bear little resemblance to hemolysin-containing VPaI from isolates of the pandemic clonal complex. Clade I ST631 isolates either harbored no hemolysins, or contained a chromosome I-inserted island we call VPaIβ that encodes a type three secretion system (T3SS2β) typical of Trh hemolysin-producers. The more clinically prevalent and clonal ST631 clade II had an island we call VPaIγ that encodes both tdh and trh and that was inserted in chromosome II. VPaIγ was derived from VPaIβ but with some additional acquired elements in common with VPaI carried by pandemic isolates, exemplifying the mosaic nature of pathogenicity islands. Genomics comparisons and amplicon assays identified VPaIγ-type islands containing tdh inserted adjacent to the ure cluster in the three introduced Pacific and most other emergent lineages that collectively cause 67% of Northeast US infections as of 2016. IMPORTANCE The availability of three different hemolysin genotypes in the ST631 lineage provided a unique opportunity to employ genome comparisons to further our understanding of the processes underlying pathogen evolution.
The fact that two different pathogenic clades arose in parallel from the same potentially benign lineage by independent VPaI acquisition is surprising considering the historically low prevalence of community members harboring VPaI in waters along the Northeast US coast that could serve as the source of this material. This illustrates a possible predisposition of some lineages to not only acquire foreign DNA but also to become human pathogens. Whereas the underlying cause for the expansion of V. parahaemolyticus lineages harboring VPaIγ along the US Atlantic coast and spread of this element to multiple lineages that underlies disease emergence is not known, this work underscores the need to define the environmental factors that favor bacteria harboring VPaI in locations of emergent disease. Copyright © 2017 American Society for Microbiology.


July 7, 2019

SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads.

Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short DNA fragment paired end sequencing has yielded a wealth of single nucleotide variants and internal sequencing read insertions-deletions, at the cost of limited SV detection. Multi-kilobase DNA fragment mate pair sequencing has supplemented the void in SV detection, but introduced new analytic challenges requiring SV detection tools specifically designed for mate pair sequencing data. Here, we introduce SVachra (Structural Variation Assessment of CHRomosomal Aberrations), a breakpoint calling program that identifies large insertions-deletions, inversions, inter- and intra-chromosomal translocations utilizing both inward and outward facing read types generated by mate pair sequencing. We demonstrate SVachra's utility by executing the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional data set of long reads (Pacific Biosciences RSII) was also generated to validate SV calls from SVachra and other comparison SV calling programs. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges when compared to other SV callers. SVachra is a highly specific breakpoint calling program that exhibits a more unbiased SV detection methodology than other callers.


July 7, 2019

The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology.

Mobile element insertions (MEIs) represent ~25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings. © 2017 Gardner et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Multiple hybrid de novo genome assembly of finger millet, an orphan allotetraploid crop.

Finger millet (Eleusine coracana (L.) Gaertn) is an important crop for food security because of its tolerance to drought, which is expected to be exacerbated by global climate changes. Nevertheless, it is often classified as an orphan/underutilized crop because of the paucity of scientific attention. Among several small millets, finger millet is considered an excellent source of essential nutrient elements, such as iron and zinc; hence, it has potential as an alternate coarse cereal. However, high-quality genome sequence data of finger millet are currently not available. One of the major problems encountered in the genome assembly of this species was its polyploidy, which hampers genome assembly compared with a diploid genome. To overcome this problem, we sequenced its genome using diverse technologies with sufficient coverage and assembled it via a novel multiple hybrid assembly workflow that combines next-generation with single-molecule sequencing, followed by whole-genome optical mapping using the Bionano Irys® system. The total number of scaffolds was 1,897 with an N50 length >2.6 Mb and detection of 96% of the universal single-copy orthologs. The majority of the homeologs were assembled separately. This indicates that the proposed workflow is applicable to the assembly of other allotetraploid genomes. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.


July 7, 2019

SureMap: Versatile, error tolerant, and high sensitive read mapper

SureMap is a versatile, error-tolerant and highly sensitive read mapper that is able to map "difficult" reads, those requiring many edit operations to be mapped to the reference genome, with acceptable time complexity. Mapping real datasets reveals that many variants unidentifiable by other mappers can be called using SureMap. Moreover, SureMap has a very good running time and accuracy in aligning very long and noisy reads, such as PacBio and Nanopore reads, against a reference genome.


July 7, 2019

Structural variation offers new home for disease associations and gene discovery

Following completion of the Human Genome Project, most studies of human genetic variation have centered on single nucleotide polymorphisms (SNPs). SNPs are numerous in individual genomes and serve as useful genetic markers in association studies across a population. These markers have been leveraged to identify genetic loci for disease risk and draw associations with numerous traits of interest. Despite their usefulness, SNPs do not tell the whole story. For example, most SNPs are associated with only a small increased risk of disease, and they usually cannot identify on their own which genes are causal. This has resulted in what many researchers have referred to as missing or hidden heritability.


July 7, 2019

Lightning-fast genome variant detection with GROM.

Current human whole genome sequencing projects produce massive amounts of data, often creating significant computational challenges. Different approaches have been developed for each type of genome variant and method of its detection, requiring users to run multiple algorithms to find variants. We present GROM (Genome Rearrangement OmniMapper), a novel comprehensive variant detection algorithm accepting aligned read files as input and finding SNVs, indels, structural variants (SVs), and copy number variants (CNVs). We show that GROM outperforms state-of-the-art methods on seven validated benchmarks using two whole genome sequencing (WGS) datasets. Additionally, GROM boasts lightning fast run times, analyzing a 50x WGS human dataset (NA12878) on commonly available computer hardware in 11 minutes, more than an order of magnitude (up to 72 times) faster than tools detecting a similar range of variants. Addressing the needs of big data analysis, GROM combines in one algorithm SNV, indel, SV, and CNV detection providing superior speed, sensitivity, and precision. GROM is also able to detect CNVs, SNVs and indels in non-paired read WGS libraries, as well as SNVs and indels in whole exome or RNA sequencing datasets.


July 7, 2019

Complete circular genome sequence and temperature independent adaptation to anaerobiosis of Listeria weihenstephanensis DSM 24698.

The aim of this study was to analyze the adaptation of the environmental Listeria weihenstephanensis DSM 24698 to anaerobiosis. The complete circular genome sequence of this species is reported and the adaptation of L. weihenstephanensis DSM 24698 to oxygen availability was investigated by global transcriptional analyses via RNAseq at 18 and 34°C. A list of operons was created based on the transcriptional data. Forty-two genes were upregulated anaerobically and 62 genes were downregulated anaerobically. The oxygen-dependent gene expression of selected genes was further validated via qPCR. Many of the differentially regulated genes encode metabolic enzymes, indicating broad metabolic adaptations with respect to oxygen availability. Genes showing the strongest oxygen-dependent adaptation encoded nitrate (narGHJI) and nitrite (nirBD) reductases. Together with the observation that nitrate supported anaerobic growth, these data indicate that L. weihenstephanensis DSM 24698 performs anaerobic nitrate respiration. The wide overlap between the oxygen-dependent transcriptional regulation at 18 and 34°C suggests that temperature does not play a key role in the oxygen-dependent transcriptional regulation of L. weihenstephanensis DSM 24698.


July 7, 2019

The complete genome sequence of Streptomyces albolongus YIM 101047, the producer of novel bafilomycins and odoriferous sesquiterpenoids.

Streptomyces albolongus YIM 101047 produces novel bafilomycins and odoriferous sesquiterpenoids with cytotoxic and antimicrobial activities. Here, we report the complete genome sequence of S. albolongus YIM 101047, which consists of an 8,027,788 bp linear chromosome. Forty-six putative biosynthetic gene clusters of secondary metabolites were found. The sesquiterpenoid gene cluster was on the left arm (0.09-0.10 Mb), and the bafilomycin biosynthetic gene cluster was on the right arm (7.46-7.64 Mb) of the chromosome. Twenty-two putative gene clusters with high or moderate similarity to important antibiotic biosynthetic gene clusters were found, including the antitumor agents bafilomycin, epothilone and hedamycin; the antibacterial/antifungal agents clavulanic acid, collismycin A, frontalamides, kanamycin, streptomycin and streptothricin; the protein phosphatase inhibitor RK-682; and the acute iron poisoning medication desferrioxamine B. The genome sequence reported here will enable us to study the biosynthetic mechanism of these important antibiotics and will facilitate the discovery of novel secondary metabolites with potential applications to human health. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

Identification of low allele frequency mosaic mutations in Alzheimer disease

Germline mutations of APP, PSEN1, and PSEN2 genes cause autosomal dominant Alzheimer disease (AD). Somatic variants of the same genes may underlie pathogenesis in sporadic AD, which is the most prevalent form of the disease. Importantly, such somatic variants may be present at very low allelic frequency, confined to the brain, and are thus very difficult or impossible to detect in blood-derived DNA. Ever-refined methodologies to identify mutations present in a fraction of the DNA of the original tissue are rapidly transforming our understanding of DNA mutations and their role in complex pathologies such as tumors. These methods stand poised to test to what extent somatic variants may play a role in AD and other neurodegenerative diseases.


July 7, 2019

Harnessing whole genome sequencing in medical mycology.

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.

