Since the advent of Next-Generation Sequencing (NGS), the cost of de novo genome sequencing and assembly has dropped precipitously, which has spurred interest in genome sequencing overall. Unfortunately, the contiguity and accuracy of NGS assemblies have suffered. Additionally, most NGS de novo assemblies leave large portions of genomes unresolved, and repetitive regions are often collapsed. Compared to the reference-quality genome sequences produced before the NGS era, the new sequences are highly fragmented and often prove difficult to annotate properly. In some cases the contiguous portions are smaller than the average gene size, making the sequence far less useful to biologists than the earlier reference-quality genomes, including those of human, mouse, C. elegans, and Drosophila. Recently, new third-generation sequencing technologies, long-range molecular techniques, and new informatics tools have facilitated a return to high-quality assembly. We will discuss the capabilities of these technologies and assess their impact on assembly projects across the tree of life, from small microbial and fungal genomes through large plant and animal genomes. Beyond improvements to contiguity, we will focus on the additional biological insights that can be gained from better assemblies, including more complete analysis of genes in their flanking regulatory context, in-depth studies of transposable elements and other complex gene families, and long-range synteny analysis of entire chromosomes. We will also discuss the need for new algorithms for representing and analyzing collections of many complete genomes at once.
With the introduction of P6-C4 chemistry, PacBio has made significant strides with Single Molecule, Real-Time (SMRT) Sequencing. Read lengths averaging between 10 and 15 kb can now be achieved, with the longest reads in the distribution exceeding 60 kb. The chemistry attains a consensus accuracy of 99.999% (QV50) at 30x coverage, which, coupled with the increased throughput of the PacBio RS II platform (500 Mb to 1 Gb per SMRT Cell), makes larger genome projects more tractable. These combined advancements in technology deliver results that rival the quality of Sanger “clone-by-clone” sequencing efforts, resulting in closed microbial genomes and highly contiguous de novo assemblies of complex eukaryotes on a multi-Gbase scale using SMRT Sequencing as the standalone technology. We present here guidelines and best practices for achieving optimal results with PacBio-only whole genome shotgun sequencing strategies. Specific sequencing examples for plant and animal genomes are discussed, along with SMRTbell library preparation and purification methods for obtaining the long-insert libraries needed for optimal sequencing results. The benefits of long reads are demonstrated by highly contiguous assemblies yielding contig N50s of over 5 Mb, compared to similar assemblies using next-generation short-read approaches. Finally, guidelines will be presented for planning projects for the de novo assembly of large genomes.
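The figures quoted in this abstract (QV50, contig N50, per-cell yield at a target coverage) rest on simple arithmetic, which the following minimal sketch illustrates. It assumes the standard Phred definition of a quality value and the usual definition of N50. The function names, the 1 Gb genome size, and the toy contig lengths are hypothetical and chosen only for illustration; they are not taken from the presentation.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to per-base accuracy.

    Under the Phred convention, QV = -10 * log10(error probability).
    """
    return 1.0 - 10.0 ** (-qv / 10.0)


def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


def smrt_cells_needed(genome_size_bp, coverage, yield_per_cell_bp):
    """Estimate how many SMRT Cells are needed for a target coverage.

    This is a rough planning figure: it ignores read filtering and
    run-to-run variation in yield.
    """
    return math.ceil(genome_size_bp * coverage / yield_per_cell_bp)


if __name__ == "__main__":
    # QV50 corresponds to one expected error in 100,000 bases.
    print(qv_to_accuracy(50))

    # Toy contig lengths (hypothetical), in bases.
    print(n50([8, 5, 4, 3, 2]))

    # Hypothetical 1 Gb genome at 30x, using the per-cell yield range
    # quoted in the abstract (500 Mb to 1 Gb per SMRT Cell).
    print(smrt_cells_needed(1_000_000_000, 30, 500_000_000))
    print(smrt_cells_needed(1_000_000_000, 30, 1_000_000_000))
```

For a hypothetical 1 Gb genome, the quoted yield range translates to somewhere between 30 and 60 SMRT Cells for 30x coverage, which gives a sense of why per-cell throughput matters for project planning.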
PAG Conference: Reference-quality Drosophila genome assemblies for evolutionary analysis of previously inaccessible genomic regions
In this presentation, Andrew Clark from Cornell University describes work from a collaboration with Manyuan Long of the University of Chicago and Rod Wing of the University of Arizona to…
Complete Genome Sequence of Leptospira kmetyi LS 001/16, Isolated from a Soil Sample Associated with a Leptospirosis Patient in Kelantan, Malaysia.
The Gram-negative pathogenic spirochetal bacteria Leptospira spp. cause leptospirosis in humans and livestock animals. Leptospira kmetyi strain LS 001/16 was isolated from a soil sample associated with a leptospirosis patient in Kelantan, which is among the states in Malaysia with a high reported number of disease cases. Here, we report the complete genome sequence of Leptospira kmetyi strain LS 001/16. Copyright © 2019 Yusof et al.
In lichen symbiosis, polyol transfer from green algae is important for acquiring the fungal carbon source. However, the existence of polyol transporter genes and their correlation with lichenization remain unclear. Here, we report candidate polyol transporter genes selected from the genome of the lichen-forming fungus (LFF) Ramalina conduplicans. A phylogenetic analysis using characterized polyol and monosaccharide transporter proteins and hypothetical polyol transporter proteins of R. conduplicans and various ascomycetous fungi suggested that the characterized yeast polyol transporters form multiple clades with the polyol transporter-like proteins selected from the diverse ascomycetous taxa. Thus, polyol transporter genes are widely conserved among Ascomycota, regardless of lichen-forming status. In addition, the phylogenetic clusters suggested that LFFs belonging to Lecanoromycetes have duplicated proteins in each cluster. Consequently, the number of sequences similar to characterized yeast polyol transporters was evaluated using the genomes of 472 species or strains of Ascomycota. Among these, LFFs belonging to Lecanoromycetes had greater numbers of deduced polyol transporter proteins. Thus, various polyol transporters are conserved in Ascomycota, and polyol transporter genes appear to have expanded during the evolution of Lecanoromycetes. Copyright © 2019 British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Comprehensive characterization of T-DNA integration induced chromosomal rearrangement in a birch T-DNA mutant.
Integration of T-DNA into plant genomes via Agrobacterium may interrupt gene structure and generate numerous mutants. These T-DNA-induced mutants are valuable materials for understanding the T-DNA integration model in plant research. T-DNA integration in plants is complex and still largely unknown. In this work, we report that multiple T-DNA fragments caused chromosomal translocation and deletion in a birch (Betula platyphylla × B. pendula) T-DNA mutant, yl. We performed PacBio genome resequencing of yl, and the results revealed that the two ends of a T-DNA can be integrated into the plant genome independently, because the two ends can be linked to different chromosomes and cause chromosomal translocation. We also found that these T-DNAs were connected into tandem fragments, regardless of direction, before integrating into the plant genome. In addition, the integration of T-DNA into the yl genome caused the deletion of several chromosomal fragments. We then summarized three cases for the T-DNA integration model in the yl genome: (1) a T-DNA fragment is linked to the two ends of a double-stranded break (DSB); (2) only one end of a T-DNA fragment is linked to a DSB; (3) a T-DNA fragment is linked to the ends of different DSBs. All the observations in the yl genome supported the DSB repair model. In this study, we present a comprehensive genome analysis of a T-DNA mutant and provide new insight into T-DNA integration in plants. These findings should be helpful for the analysis of T-DNA mutants with special phenotypes.
The recent release of genomic sequences for 3000 rice varieties provides access to the genetic diversity of this crop at the species level. We take advantage of this resource to unravel some features of the retrotranspositional landscape of rice. We develop the software TRACKPOSON specifically for the detection of transposable element insertion polymorphisms (TIPs) from large datasets. We apply this tool to 32 families of retrotransposons and identify more than 50,000 TIPs in the 3000 rice genomes. Most polymorphisms are found at very low frequency, suggesting that they may have occurred recently in agro. A genome-wide association study shows that these activations in rice may be triggered by external stimuli, rather than by the alteration of genetic factors involved in transposable element silencing pathways. Finally, the TIPs dataset is used to trace the origin of rice domestication. Our results suggest that rice originated from three distinct domestication events.
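The observation that most TIPs occur at very low frequency can be made concrete with a small frequency calculation over presence/absence calls. The sketch below is illustrative only and is not TRACKPOSON or its output format: the function names, the input layout (a mapping from accession to the set of insertion sites it carries), the toy data, and the 0.3 threshold are all assumptions made for this example.

```python
from collections import Counter


def tip_frequencies(calls):
    """Return the population frequency of each insertion site.

    calls maps an accession name to the set of insertion-site IDs
    detected in that accession (a hypothetical input layout).
    """
    if not calls:
        raise ValueError("no accessions given")
    n_accessions = len(calls)
    counts = Counter(site for sites in calls.values() for site in sites)
    return {site: count / n_accessions for site, count in counts.items()}


def fraction_rare(freqs, threshold):
    """Return the fraction of sites whose frequency is below threshold."""
    if not freqs:
        raise ValueError("no insertion sites given")
    rare = sum(1 for freq in freqs.values() if freq < threshold)
    return rare / len(freqs)


if __name__ == "__main__":
    # Toy data (hypothetical): four accessions, four insertion sites.
    toy_calls = {
        "acc1": {"siteA", "siteB"},
        "acc2": {"siteA", "siteD"},
        "acc3": {"siteA", "siteC", "siteD"},
        "acc4": {"siteA"},
    }
    freqs = tip_frequencies(toy_calls)
    print(freqs)
    print(fraction_rare(freqs, threshold=0.3))
```

In a real dataset of this kind, a site frequency spectrum skewed toward rare insertions is what would be expected if many insertions arose recently, which is the interpretation the abstract offers.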