Phasing Archives - Page 28 of 30

July 7, 2019

Third-generation sequencing and the future of genomics

Third-generation long-range DNA sequencing and mapping technologies are creating a renaissance in high-quality genome sequencing. Unlike second-generation sequencing, which produces short reads a few hundred base-pairs long, third-generation single-molecule technologies generate over 10,000 bp reads or map over 100,000 bp molecules. We analyze how increased read lengths can be used to address long-standing problems in de novo genome assembly, structural variation analysis and haplotype phasing.

July 7, 2019

Complete genome sequence of Bradyrhizobium sp. strain CCGE-LA001, isolated from field nodules of the enigmatic wild bean Phaseolus microcarpus.

We present the complete genome sequence of Bradyrhizobium sp. strain CCGE-LA001, a nitrogen-fixing bacterium isolated from nodules of Phaseolus microcarpus. Strain CCGE-LA001 represents the first sequenced bradyrhizobial strain obtained from a wild Phaseolus sp. Its genome revealed a large and novel symbiotic island. Copyright © 2016 Servín-Garcidueñas et al.

July 7, 2019

Structure of Type IIL restriction-modification enzyme MmeI in complex with DNA has implications for engineering new specificities.

The creation of restriction enzymes with programmable DNA-binding and -cleavage specificities has long been a goal of modern biology. The recently discovered Type IIL MmeI family of restriction-and-modification (RM) enzymes that possess a shared target recognition domain provides a framework for engineering such new specificities. However, a lack of structural information on Type IIL enzymes has limited the repertoire that can be rationally engineered. We report here a crystal structure of MmeI in complex with its DNA substrate and an S-adenosylmethionine analog (Sinefungin). The structure uncovers for the first time the interactions that underlie MmeI-DNA recognition and methylation (5′-TCCRAC-3′; R = purine) and provides a molecular basis for changing specificity at four of the six base pairs of the recognition sequence (5′-TCCRAC-3′). Surprisingly, the enzyme is resilient to specificity changes at the first position of the recognition sequence (5′-TCCRAC-3′). Collectively, the structure provides a basis for engineering further derivatives of MmeI and delineates which base pairs of the recognition sequence are more amenable to alterations than others.
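As an illustration of what the degenerate recognition sequence 5′-TCCRAC-3′ (R = purine, i.e. A or G) means in practice, the following minimal sketch scans a DNA string for matching sites on both strands. It is not taken from the paper; the function names and the example sequence are hypothetical, and only the motif itself comes from the abstract.

```python
import re

# IUPAC codes needed for this sketch (R = purine, Y = pyrimidine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(motif):
    """Reverse complement of a motif, keeping IUPAC degeneracy (R <-> Y)."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a motif into a lookahead regex so overlapping sites are found."""
    pattern = "".join(IUPAC[base] for base in motif)
    return re.compile("(?=(" + pattern + "))")


def find_sites(seq, motif="TCCRAC"):
    """Return sorted (0-based start, strand) pairs for every motif occurrence."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in motif_to_regex(motif).finditer(seq)]
    minus = motif_to_regex(reverse_complement(motif))
    hits += [(m.start(), "-") for m in minus.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: one site on each strand.
    print(find_sites("AATCCGACTTGTTGGA"))
```

A site on the minus strand appears on the plus strand as GTYGGA, which is why the reverse-complemented motif is searched as well.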

July 7, 2019

The challenges of implementing next generation sequencing across a large healthcare system, and the molecular epidemiology and antibiotic susceptibilities of carbapenemase-producing bacteria in the healthcare system of the U.S. Department of Defense.

We sought to: 1) provide an overview of the genomic epidemiology of an extensive collection of carbapenemase-producing bacteria (CPB) collected in the U.S. Department of Defense health system; 2) increase awareness of the public availability of the sequences, isolates, and customized antimicrobial resistance database of that system; and 3) illustrate challenges and offer mitigations for implementing next generation sequencing (NGS) across large health systems. Design: prospective surveillance and system-wide implementation of NGS. Setting: a 288-hospital healthcare network. All phenotypically carbapenem-resistant bacteria underwent CarbaNP® testing and PCR, followed by NGS. Commercial (Newbler and Geneious), on-line (ResFinder), and open-source software (Btrim, FLASh, Bowtie2, and Samtools) were used for assembly, SNP detection and clustering. Laboratory capacity, throughput, and response time were assessed. From 2009 through 2015, 27,000 multidrug-resistant Gram-negative isolates were submitted. Of these, 225 contained carbapenemase-encoding genes (most commonly blaKPC, blaNDM, and blaOXA23). These were found in 15 species from 146 inpatients in 19 facilities. Genetically related CPB were found in more than one hospital. Other clusters or outbreaks were not clonal and involved genetically related plasmids, while some involved several unrelated plasmids. Relatedness depended on the clustering algorithm used. Transmission patterns of plasmids and other mobile genetic elements could not be determined without ultra-long read, single-molecule real-time sequencing. 80% of carbapenem-resistant phenotypes retained susceptibility to aminoglycosides, and 70% retained susceptibility to fluoroquinolones. 
However, among the CPB-confirmed genotypes, fewer than 25% retained susceptibility to aminoglycosides or fluoroquinolones. Although NGS is increasingly acclaimed as a technology that will revolutionize clinical practice, resource-constrained environments, large or geographically dispersed healthcare networks, and military or government-funded public health laboratories are likely to encounter constraints and challenges as they implement NGS across their health systems. These include a lack of standardized definitions and quality control metrics, limitations of short-read sequencing, insufficient bandwidth, and the currently limited availability of very expensive sequencing platforms. Possible solutions and mitigations are also proposed.

July 7, 2019

Selecting reads for haplotype assembly

Haplotype assembly, or read-based phasing, is the problem of reconstructing both haplotypes of a diploid genome from next-generation sequencing data. This problem is formalized as the Minimum Error Correction (MEC) problem and can be solved using algorithms such as WhatsHap. The runtime of WhatsHap is exponential in the maximum coverage, which is hence controlled in a pre-processing step that selects reads to be used for phasing. Here, we report on a heuristic algorithm designed to choose beneficial reads for phasing, in particular to increase the connectivity of the phased blocks and the number of correctly phased variants compared to the random selection previously employed by WhatsHap. The algorithm we describe has been integrated into the WhatsHap software, which is available under MIT licence from https://bitbucket.org/whatshap/whatshap.
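To make the MEC objective concrete, here is a minimal sketch that scores a candidate phasing by the number of allele calls that would have to be corrected for every read to agree with one of the two haplotypes. It is not WhatsHap's algorithm or its read-selection heuristic: it assumes every variant is heterozygous (so the second haplotype is the complement of the first), uses a hypothetical read representation (a dict from variant index to allele 0/1), and the exhaustive search is only practical for toy inputs.

```python
from itertools import product


def mec_cost(reads, hap):
    """MEC score of a phasing.

    reads: list of dicts mapping variant index -> observed allele (0 or 1).
    hap:   tuple of 0/1 alleles for haplotype 1; haplotype 2 is assumed to be
           its complement (all variants heterozygous).
    """
    comp = tuple(1 - allele for allele in hap)
    total = 0
    for read in reads:
        # Mismatches if the read is assigned to haplotype 1 or to haplotype 2.
        d1 = sum(1 for pos, allele in read.items() if hap[pos] != allele)
        d2 = sum(1 for pos, allele in read.items() if comp[pos] != allele)
        total += min(d1, d2)  # each read goes to its closer haplotype
    return total


def brute_force_mec(reads, n_variants):
    """Try all 2^n phasings; exponential in n, for illustration only."""
    best = min(product((0, 1), repeat=n_variants),
               key=lambda hap: mec_cost(reads, hap))
    return best, mec_cost(reads, best)


if __name__ == "__main__":
    # Hypothetical toy data: five reads over three heterozygous variants.
    reads = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {1: 1, 2: 0}, {0: 0, 2: 0}]
    print(brute_force_mec(reads, 3))
```

In the toy data the last read conflicts with the linkage implied by the others, so no phasing reaches a cost of zero; the best phasing needs one corrected allele call.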

July 7, 2019

ABO allele-level frequency estimation based on population-scale genotyping by next generation sequencing.

The characterization of the ABO blood group status is vital for blood transfusion and solid organ transplantation. Several methods for the molecular characterization of the ABO gene, which encodes the alleles that give rise to the different ABO blood groups, have been described. However, those methods have so far been restricted to selected samples and have not been applied to population-scale analysis. We describe a cost-effective method for high-throughput genotyping of the ABO system by next generation sequencing. Sample-specific barcodes and sequencing adaptors are introduced during PCR, rendering the products suitable for direct sequencing on Illumina MiSeq or HiSeq instruments. Complete sequence coverage of exons 6 and 7 enables molecular discrimination of the ABO subgroups and many alleles. The workflow was applied to genotype ABO in more than a million samples. We report the allele group frequencies calculated on a subset of more than 110,000 sampled individuals of German origin. Further, we discuss the potential of the workflow for high-resolution genotyping, taking the observed allele group frequencies into account. Finally, sequence analysis revealed 287 distinct, previously undescribed alleles, of which the most abundant was identified in 174 samples. The described workflow delivers high-resolution ABO genotyping at low cost, enabling population-scale molecular ABO characterization.

July 7, 2019

Extensive sequencing of seven human genomes to characterize benchmark reference materials.

The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.

July 7, 2019

Long single-molecule reads can resolve the complexity of the influenza virus composed of rare, closely related mutant variants

As a result of a high rate of mutations and recombination events, an RNA virus exists as a heterogeneous “swarm” of mutant variants. The long read length offered by single-molecule sequencing technologies allows each mutant variant to be sequenced in a single pass. However, the high error rate limits the ability to reconstruct a heterogeneous viral population composed of rare, related mutant variants. In this paper, we present 2SNV, a method able to tolerate the high error rate of the single-molecule protocol and reconstruct mutant variants. 2SNV uses linkage between single nucleotide variations to efficiently distinguish them from read errors. To benchmark the sensitivity of 2SNV, we performed a single-molecule sequencing experiment on a sample containing a titrated level of known viral mutant variants. Our method is able to accurately reconstruct clones with a frequency of 0.2% and distinguish clones that differ in only two nucleotides distantly located on the genome. 2SNV outperforms existing methods for full-length viral mutant reconstruction. The open source implementation of 2SNV is freely available for download at http://alan.cs.gsu.edu/NGS/?q=content/2snv.
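The intuition behind using linkage to separate true variants from read errors can be sketched as follows. This is not the 2SNV algorithm, only a simplified illustration under the assumption that sequencing errors at two positions occur independently: a pair of non-reference alleles that appears together on the same reads far more often than independence predicts is more likely to mark a real variant. The function name, aligned-read representation and toy data are hypothetical.

```python
def linked_pair_counts(reads, ref, pos_a, pos_b):
    """Compare observed and expected co-occurrence of two non-reference alleles.

    reads: equal-length strings, already aligned to `ref` without gaps.
    Returns (observed, expected), where `expected` is the number of reads
    carrying both mismatches if the two positions were independent.
    """
    n = len(reads)
    mism_a = sum(1 for r in reads if r[pos_a] != ref[pos_a])
    mism_b = sum(1 for r in reads if r[pos_b] != ref[pos_b])
    observed = sum(1 for r in reads
                   if r[pos_a] != ref[pos_a] and r[pos_b] != ref[pos_b])
    expected = mism_a * mism_b / n
    return observed, expected


if __name__ == "__main__":
    ref = "ACGT"
    # Hypothetical toy data: 5 reads from a rare variant (T at 0, A at 3),
    # plus isolated single-position errors on a reference background.
    reads = ["ACGT"] * 90 + ["TCGA"] * 5 + ["TCGT"] * 3 + ["ACGA"] * 2
    observed, expected = linked_pair_counts(reads, ref, 0, 3)
    print(observed, expected)
```

In the toy data the two mismatches co-occur on 5 reads, while independence would predict fewer than one, which is the kind of signal a linkage-based method can exploit.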

July 7, 2019

The draft genome of MD-2 pineapple using hybrid error correction of long reads.

The introduction of the elite pineapple variety, MD-2, has caused a significant market shift in the pineapple industry. Better productivity, an overall increase in fruit quality and taste, resilience to chilled storage and resistance to internal browning are among the key advantages of MD-2 compared with its predecessor, the Smooth Cayenne. Here, we present the genome sequence of the MD-2 pineapple (Ananas comosus (L.) Merr.), obtained using a hybrid sequencing approach that combines two platforms, i.e. PacBio long sequencing reads and accurate Illumina short reads. Our draft genome achieved 99.6% genome coverage with 27,017 predicted protein-coding genes, while 45.21% of the genome was identified as repetitive elements. Furthermore, differential expression analysis of RNA-Seq libraries from ripening pineapple fruits revealed ethylene-related transcripts believed to be involved in regulating the process of non-climacteric pineapple fruit ripening. The MD-2 pineapple draft genome serves as an example of how a complex heterozygous genome is amenable to whole genome sequencing by using a hybrid technology that is both economical and accurate. The genome will make genomic applications more feasible as a medium to understand complex biological processes specific to pineapple. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

July 7, 2019

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence or structural differences in small regions between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare sequences of diseased tissue with the individual's own genome sequence derived from tissue in a normal state. As the price to sequence a human genome has dropped dramatically to around $1000, documenting the personal genome of every individual has a promising future. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, to date, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructure. We recommend a combined hybrid assembly with long and short reads as a promising way to generate good-quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. 
Finally, we discuss the use of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy and detecting tumors. We believe that precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.

July 7, 2019

Bacterial genetics: SMRT-seq reveals an epigenetic switch.

Streptococcus pneumoniae uses genetic diversification as a strategy to achieve phenotypic plasticity. For example, DNA inversion of the hsdS genes of type I restriction-modification (R-M) systems determines whether S. pneumoniae forms opaque or transparent colonies, which have different colonization and virulence characteristics. Zhang and colleagues now use single-molecule, real-time sequencing (SMRT-seq) to show that the allelic variation of hsdS that results from site-specific recombination forms part of an epigenetic switch.

July 7, 2019

Identification of the novel B*27:144 allele in an Irish Individual.

July 7, 2019

Whole-genome sequencing recommendations

Recent technological developments have revolutionized the way we perform genetic analyses. In particular, whole-genome sequencing provides access to the entire genetic makeup of an individual, and it is now an affordable approach for many research groups. As a consequence, genome sequencing is pervading many fields of biological research. Sequencing technologies are evolving rapidly, and so are their applications. Here we provide a first primer on whole-genome sequencing, focusing on two of the most popular applications: (1) de novo genome sequencing, in which the objective is to obtain a high-quality genome assembly that can serve as a reference for a species or variety, and (2) genome resequencing, in which a reference genome is available and the objective is to map sequence variation of an individual or a set of individuals. It is not our intention to provide a comprehensive overview of current methodologies that will likely soon become obsolete, but rather to focus on general principles that will have broader applicability.

July 7, 2019

Strategies for sequence assembly of plant genomes

The field of plant genome assembly has greatly benefited from the development and widespread adoption of next-generation DNA sequencing platforms. Very high sequencing throughputs and low costs per nucleotide have considerably reduced the technical and budgetary constraints associated with early assembly projects done primarily with a traditional Sanger-based approach. Those improvements led to a sharp increase in the number of plant genomes being sequenced, including large and complex genomes of economically important crops. Although next-generation DNA sequencing has considerably improved our understanding of the overall structure and dynamics of many plant genomes, severe limitations still remain because next-generation DNA sequencing reads typically are shorter than Sanger reads. In addition, the software tools used to de novo assemble sequences are not necessarily designed to optimize the use of short reads. These limitations cause challenges common to many plant species with large genome sizes, high repeat contents, polyploidy and genome-wide duplications. This chapter provides an overview of historical and current methods used to sequence and assemble plant genomes, along with new solutions offered by the emergence of technologies such as single molecule sequencing and optical mapping to address the limitations of current sequence assemblies.

July 7, 2019

Comparative methylome analysis of the occasional ruminant respiratory pathogen Bibersteinia trehalosi.

We examined and compared both the methylomes and the modification-related gene content of four sequenced strains of Bibersteinia trehalosi isolated from the nasopharyngeal tracts of Nebraska cattle with symptoms of bovine respiratory disease complex. The methylation patterns and the encoded DNA methyltransferase (MTase) gene sets were different between each strain, with the only common pattern being that of Dam (GATC). Among the observed patterns were three novel motifs attributable to Type I restriction-modification systems. In some cases the differences in methylation patterns corresponded to the gain or loss of MTase genes, or to recombination at target recognition domains that resulted in changes of enzyme specificity. However, in other cases the differences could be attributed to differential expression of the same MTase gene across strains. The most obvious regulatory mechanism responsible for these differences was slipped strand mispairing within short sequence repeat regions. The combined action of these evolutionary forces allows for alteration of different parts of the methylome at different time scales. We hypothesize that pleiotropic transcriptional modulation resulting from the observed methylomic changes may be involved with the switch between the commensal and pathogenic states of this common member of ruminant microflora.
