April 21, 2020  |  

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes. © 2019 John Wiley & Sons Ltd/University College London.
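The NG50 values quoted above are a contiguity statistic. As a minimal sketch with toy contig lengths, the function names `n50` and `ng50` below are illustrative and are not taken from the study. N50 is the contig length at which the sorted contigs first cover half of the assembly, and NG50 uses half of an assumed genome (or region) size instead.

```python
def ng50(lengths, genome_size):
    """Return the contig length at which the longest-first cumulative sum
    first reaches half of genome_size, or None if it never does."""
    half = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None  # the contigs cover less than half of the genome size


def n50(lengths):
    """N50 is NG50 with the total assembly length used as the genome size."""
    return ng50(lengths, sum(lengths))


# Toy contig lengths in arbitrary units (hypothetical, not data from the paper).
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))        # half of 290 is reached at the second contig
print(ng50(contigs, 400))  # an assumed genome size of 400 needs a third contig
```

Because NG50 is normalized by a fixed genome size rather than by each assembly's own length, it is the fairer statistic when comparing two assemblies of the same genome, as in the HiFi versus CLR comparison.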


April 21, 2020  |  

The Complete Genome of the Atypical Enteropathogenic Escherichia coli Archetype Isolate E110019 Highlights a Role for Plasmids in Dissemination of the Type III Secreted Effector EspT.

Enteropathogenic Escherichia coli (EPEC) is a leading cause of moderate to severe diarrhea among young children in developing countries, and EPEC isolates can be subdivided into two groups. Typical EPEC (tEPEC) bacteria are characterized by the presence of both the locus of enterocyte effacement (LEE) and the plasmid-encoded bundle-forming pilus (BFP), which are involved in adherence and translocation of type III effectors into the host cells. Atypical EPEC (aEPEC) bacteria also contain the LEE but lack the BFP. In the current report, we describe the complete genome of outbreak-associated aEPEC isolate E110019, which carries four plasmids. Comparative genomic analysis demonstrated that the type III secreted effector EspT gene, an autotransporter gene, a hemolysin gene, and putative fimbrial genes are all carried on plasmids. Further investigation of 65 espT-containing E. coli genomes demonstrated that different espT alleles are associated with multiple plasmids that differ in their overall gene content from the E110019 espT-containing plasmid. EspT has been previously described with respect to its role in the ability of E110019 to invade host cells. While other type III secreted effectors of E. coli have been identified on insertion elements and prophages of the chromosome, we demonstrated in the current study that the espT gene is located on multiple unique plasmids. These findings highlight a role of plasmids in dissemination of a unique E. coli type III secreted effector that is involved in host invasion and severe diarrheal illness. Copyright © 2019 American Society for Microbiology.


April 21, 2020  |  

LR_Gapcloser: a tiling path-based gap closer that uses long reads to complete genome assembly.

Completing a genome is an important goal of genome assembly. However, many assemblies, including reference assemblies, are unfinished and have a number of gaps. Long reads obtained from third-generation sequencing (TGS) platforms can help close these gaps and improve assembly contiguity. However, current gap-closure approaches using long reads require extensive runtime and high memory usage. Thus, a fast and memory-efficient approach using long reads is needed to obtain complete genomes. We developed LR_Gapcloser to rapidly and efficiently close the gaps in genome assembly. This tool utilizes long reads generated from TGS sequencing platforms. Tested on de novo assembled gaps, repeat-derived gaps, and real gaps, LR_Gapcloser closed a higher number of gaps faster and with a lower error rate and a much lower memory usage than two existing, state-of-the-art tools. This tool utilized raw reads to fill more gaps than when using error-corrected reads. It is applicable to gaps in the assemblies by different approaches and from large and complex genomes. After performing gap-closure using this tool, the contig N50 size of the human CHM1 genome was improved from 143 kb to 19 Mb, a 132-fold increase. We also closed the gaps in the Triticum urartu genome, a large genome rich in repeats; the contig N50 size was increased by 40%. Further, we evaluated the contiguity and correctness of six hybrid assembly strategies by combining the optimal TGS-based and next-generation sequencing-based assemblers with LR_Gapcloser. A proposed and optimal hybrid strategy generated a new human CHM1 genome assembly with marked contiguity. The contig N50 value was greater than 28 Mb, which is larger than previous non-reference assemblies of the diploid human genome. LR_Gapcloser is a fast and efficient tool that can be used to close gaps and improve the contiguity of genome assemblies. A proposed hybrid assembly including this tool promises reference-grade assemblies.
The software is available at http://www.fishbrowser.org/software/LR_Gapcloser/.


April 21, 2020  |  

Effector gene reshuffling involves dispensable mini-chromosomes in the wheat blast fungus.

Newly emerged wheat blast disease is a serious threat to global wheat production. Wheat blast is caused by a distinct, exceptionally diverse lineage of the fungus causing rice blast disease. Through sequencing a recent field isolate, we report a reference genome that includes seven core chromosomes and mini-chromosome sequences that harbor effector genes normally found on ends of core chromosomes in other strains. No mini-chromosomes were observed in an early field strain, and at least two from another isolate each contain different effector genes and core chromosome end sequences. The mini-chromosome is enriched in transposons occurring most frequently at core chromosome ends. Additionally, transposons in mini-chromosomes lack the characteristic signature for inactivation by repeat-induced point (RIP) mutation genome defenses. Our results, collectively, indicate that dispensable mini-chromosomes and core chromosomes undergo divergent evolutionary trajectories, and mini-chromosomes and core chromosome ends are coupled as a mobile, fast-evolving effector compartment in the wheat pathogen genome.


April 21, 2020  |  

Complete Genome Assembly of Yersinia pseudotuberculosis IP2666pIB1.

Yersinia pseudotuberculosis, closely related to Yersinia pestis, is a human pathogen and model organism for studying bacterial pathogenesis. To aid in genomic analysis and understanding bacterial virulence, we sequenced and assembled the complete genome of the human pathogen Yersinia pseudotuberculosis IP2666pIB1.


April 21, 2020  |  

Genetic map-guided genome assembly reveals a virulence-governing minichromosome in the lentil anthracnose pathogen Colletotrichum lentis.

Colletotrichum lentis causes anthracnose, which is a serious disease on lentil and can account for up to 70% crop loss. Two pathogenic races, 0 and 1, have been described in the C. lentis population from lentil. To unravel the genetic control of virulence, an isolate of the virulent race 0 was sequenced at 1481-fold genomic coverage. The 56.10-Mb genome assembly consists of 50 scaffolds with N50 scaffold length of 4.89 Mb. A total of 11 436 protein-coding gene models was predicted in the genome with 237 coding candidate effectors, 43 secondary metabolite biosynthetic enzymes and 229 carbohydrate-active enzymes (CAZymes), suggesting a contraction of the virulence gene repertoire in C. lentis. Scaffolds were assigned to 10 core and two minichromosomes using a population (race 0 × race 1, n = 94 progeny isolates) sequencing-based, high-density (14 312 single nucleotide polymorphisms) genetic map. Composite interval mapping revealed a single quantitative trait locus (QTL), qClVIR-11, located on minichromosome 11, explaining 85% of the variability in virulence of the C. lentis population. The QTL covers a physical distance of 0.84 Mb with 98 genes, including seven candidate effector and two secondary metabolite genes. Taken together, the study provides genetic and physical evidence for the existence of a minichromosome controlling the C. lentis virulence on lentil. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.


April 21, 2020  |  

Complete assembly of the Leishmania donovani (HU3 strain) genome and transcriptome annotation.

Leishmania donovani is a unicellular parasite that causes visceral leishmaniasis, a fatal disease in humans. In this study, a complete assembly of the genome of L. donovani is provided. Apart from being the first published genome of this strain (HU3), this constitutes the best assembly for an L. donovani genome attained to date. The use of a combination of sequencing platforms enabled the assembly, without any sequence gaps, of the 36 chromosomes of this species. Additionally, based on this assembly and using RNA-seq reads derived from poly-A+ RNA, the transcriptome for this species, not yet available, was delineated. Alternative SL addition sites and heterogeneity in the poly-A addition sites were commonly observed for most of the genes. After a complete annotation of the transcriptome, 2,410 novel transcripts were defined. Additionally, the relative expression for all transcripts present in the promastigote stage was determined. Events of cis-splicing have been documented to occur during the maturation of the transcripts derived from genes LDHU3_07.0430 and LDHU3_29.3990. The complete genome assembly and the availability of the gene models (including annotation of untranslated regions) are important pieces to understand how differential gene expression occurs in this pathogen, and to decipher phenotypic peculiarities like tissue tropism, clinical disease, and drug susceptibility.


April 21, 2020  |  

Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.

We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.


April 21, 2020  |  

Chromosome conformation capture resolved near complete genome assembly of broomcorn millet.

Broomcorn millet (Panicum miliaceum L.) has strong tolerance to abiotic stresses and is probably one of the oldest crops, with its earliest cultivation dating back ca. 10,000 years. We report here its genome assembly through a combination of PacBio sequencing, BioNano, and Hi-C (in vivo) mapping. The 18 super scaffolds cover ~95.6% of the estimated genome (~887.8 Mb). There are 63,671 protein-coding genes annotated in this tetraploid genome. About 86.2% of the syntenic genes in foxtail millet have two homologous copies in broomcorn millet, indicating rare gene loss after tetraploidization in broomcorn millet. Phylogenetic analysis reveals that broomcorn millet and foxtail millet diverged around 13.1 million years ago (Mya), while the lineage-specific tetraploidization of broomcorn millet may have happened within ~5.91 million years. The genome is not only beneficial for the genome-assisted breeding of broomcorn millet, but also an important resource for other Panicum species.


April 21, 2020  |  

Long-read based de novo assembly of low-complexity metagenome samples results in finished genomes and reveals insights into strain diversity and an active phage system.

Complete and contiguous genome assemblies greatly improve the quality of subsequent systems-wide functional profiling studies and the ability to gain novel biological insights. While a de novo genome assembly of an isolated bacterial strain is in most cases straightforward, more informative data about co-existing bacteria as well as synergistic and antagonistic effects can be obtained from a direct analysis of microbial communities. However, the complexity of metagenomic samples represents a major challenge. While third generation sequencing technologies have been suggested to enable finished metagenome-assembled genomes, to our knowledge, the complete genome assembly of all dominant strains in a microbiome sample has not been demonstrated. Natural whey starter cultures (NWCs) are used in cheese production and represent low-complexity microbiomes. Previous studies of Swiss Gruyère and selected Italian hard cheeses, mostly based on amplicon metagenomics, concurred that three species generally predominate: Streptococcus thermophilus, Lactobacillus helveticus and Lactobacillus delbrueckii. Two NWCs from Swiss Gruyère producers were subjected to whole metagenome shotgun sequencing using the Pacific Biosciences Sequel and Illumina MiSeq platforms. In addition, longer Oxford Nanopore Technologies MinION reads had to be generated for one to resolve repeat regions. Thereby, we achieved the complete assembly of all dominant bacterial genomes from these low-complexity NWCs, which was corroborated by a 16S rRNA amplicon survey. Moreover, two distinct L. helveticus strains were successfully co-assembled from the same sample. Besides bacterial chromosomes, we could also assemble several bacterial plasmids and phages and a corresponding prophage. Biologically relevant insights were uncovered by linking the plasmids and phages to their respective host genomes using DNA methylation motifs on the plasmids and by matching prokaryotic CRISPR spacers with the corresponding protospacers on the phages. These results could only be achieved by employing long-read sequencing data able to span intragenomic as well as intergenomic repeats. Here, we demonstrate the feasibility of complete de novo genome assembly of all dominant strains from low-complexity NWCs based on whole metagenomics shotgun sequencing data. This allowed us to gain novel biological insights and is a fundamental basis for subsequent systems-wide omics analyses, functional profiling and phenotype to genotype analysis of specific microbial communities.
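The spacer-to-protospacer matching mentioned above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes exact matches only, whereas real analyses typically use an alignment tool that tolerates mismatches. All sequences and names (`match_spacers`, `host_1`, `phage_A`, and so on) are hypothetical toy values.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_spacers(spacers_by_host, phages):
    """Report exact spacer hits on either strand of each phage genome.

    Returns a list of (host, phage, strand, position) tuples, where
    position is the 0-based start of the first hit on the forward sequence.
    """
    hits = []
    for host, spacers in spacers_by_host.items():
        for spacer in spacers:
            for phage, genome in phages.items():
                for strand, query in (("+", spacer),
                                      ("-", reverse_complement(spacer))):
                    pos = genome.find(query)
                    if pos != -1:
                        hits.append((host, phage, strand, pos))
    return hits


# Toy data (hypothetical): each host carries one CRISPR spacer.
phages = {
    "phage_A": "TTTTACGGATCCGATTTT",
    "phage_B": "GGGGCCCCAAAATTTT",
}
spacers_by_host = {
    "host_1": ["ACGGATCCGA"],
    "host_2": ["TTTTGGGG"],
}
hits = match_spacers(spacers_by_host, phages)
for hit in hits:
    print(hit)
```

Searching both strands matters because a spacer may be recorded in either orientation relative to the assembled phage sequence.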

