July 7, 2019  |  

Microbial sequence typing in the genomic era.

Next-generation sequencing (NGS), also known as high-throughput sequencing, is changing the field of microbial genomics research. NGS allows a more comprehensive analysis of the diversity, structure and composition of microbial genes and genomes than traditional automated Sanger capillary sequencing, and at a lower cost. NGS strategies have expanded the versatility of standard and widely used typing approaches based on nucleotide variation in several hundred DNA sequences and a few gene fragments (MLST, MLVA, rMLST and cgMLST). NGS can now accommodate variation in thousands or millions of sequences, from selected amplicons to full genomes (WGS, NGMLST and HiMLST). To extract signals from high-dimensional NGS data and make valid statistical inferences, novel analytic and statistical techniques are needed. In this review, we describe standard and new approaches for microbial sequence typing at gene and genome levels and guidelines for subsequent analysis, including methods and computational frameworks. We also present several applications of these approaches to some disciplines, namely genotyping, phylogenetics and molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.
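
As an illustration of the allele-based typing schemes named above (MLST and its relatives), the following minimal Python sketch maps the sequence at each locus to an allele number and looks up the resulting allele profile as a sequence type. The loci, sequences, allele numbers and sequence types are hypothetical toy values, not taken from the review or from any real scheme.

```python
# Toy MLST-style lookup. All loci, alleles and sequence types here are
# hypothetical; real schemes use curated allele and profile databases.
ALLELE_DB = {
    "locusA": {"ATGGCA": 1, "ATGGCG": 2},
    "locusB": {"TTGACC": 1, "TTGATC": 2},
}
ST_TABLE = {(1, 1): 10, (2, 1): 11, (2, 2): 12}
LOCI = ("locusA", "locusB")


def call_alleles(isolate):
    """Map each locus sequence to its allele number (None if unseen)."""
    return {locus: ALLELE_DB[locus].get(isolate[locus]) for locus in LOCI}


def sequence_type(isolate):
    """Return the sequence type for an isolate, or None if it cannot be typed."""
    profile = call_alleles(isolate)
    key = tuple(profile[locus] for locus in LOCI)
    if None in key:
        # At least one locus carries an allele missing from the database.
        return None
    return ST_TABLE.get(key)
```

An isolate carrying a sequence that is absent from the toy database returns `None`, which in practice would correspond to a novel allele needing curation.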


July 7, 2019  |  

New high copy tandem repeat in the content of the chicken W chromosome.

The content of repetitive DNA in avian genomes is considerably lower than in other investigated vertebrates. The first descriptions of tandem repeats were based on the results of routine biochemical and molecular biological experiments. Both satellite DNA and interspersed repetitive elements were annotated using a library-based approach and de novo repeat identification in the assembled genome. The development of deep-sequencing methods provides high-quality datasets without preassembly, allowing one to annotate repetitive elements from the unassembled part of genomes. In this work, we search the chicken assembly and annotate high copy number tandem repeats from unassembled short raw reads. The tandem repeat (GGAAA)n has been identified and found to be the second most abundant in the chicken genome, after the telomeric repeat (TTAGGG)n. Furthermore, the (GGAAA)n repeat forms expanded arrays on both arms of the chicken W chromosome. Our results highlight the complexity of repetitive sequences and update data on the organization of the W sex chromosome in chicken.
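
A minimal Python sketch of how reads carrying a tandem repeat such as (GGAAA)n might be counted directly in unassembled reads is shown here. It is not the authors' pipeline: the function names and the `min_copies` threshold are assumptions made for illustration, and both strands are checked by also searching for the reverse complement of the repeat unit.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_tandem_run(read, unit):
    """Largest number of consecutive copies of `unit` found in `read`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    best = 0
    for match in pattern.finditer(read.upper()):
        best = max(best, len(match.group(0)) // len(unit))
    return best


def count_reads_with_repeat(reads, unit, min_copies=3):
    """Count reads with at least `min_copies` tandem copies on either strand."""
    rc_unit = reverse_complement(unit)
    return sum(
        1
        for read in reads
        if max(longest_tandem_run(read, unit),
               longest_tandem_run(read, rc_unit)) >= min_copies
    )
```

Calling `count_reads_with_repeat(reads, "GGAAA")` on a list of read strings gives the number of reads that contain an array of the unit; a real analysis would also need to handle sequencing errors inside the array.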


July 7, 2019  |  

Recent progress and prospects for advancing arachnid genomics

Arachnids exhibit tremendous species richness and adaptations of biomedical, industrial, and agricultural importance. Yet genomic resources for arachnids are limited, with the first few spider and scorpion genomes becoming accessible in the last four years. We review key insights from these genome projects, and recommend additional genomes for sequencing, emphasizing taxa of greatest value to the scientific community. We suggest greater sampling of spiders whose genomes are understudied but hold important protein recipes for silk and venom production. We further recommend arachnid genomes to address significant evolutionary topics, including the phenotypic impact of genome duplications. A barrier to high-quality arachnid genomes is the reliance on assemblies based solely on short-read data, which may be overcome by long-range sequencing and other emerging methods.


July 7, 2019  |  

Current advances in genome sequencing of common wheat and its ancestral species

Common wheat is an important and widely cultivated food crop throughout the world. Much progress has been made in wheat genome sequencing in the last decade. Starting from the sequencing of single chromosomes and chromosome arms, whole-genome sequences of common wheat and its diploid and tetraploid ancestors have been decoded alongside the development of sequencing and assembly technologies. In this review, we briefly summarize international progress in wheat genome sequencing, focusing mainly on the efforts and contributions of Chinese scientists.


July 7, 2019  |  

Development of molecular markers linked to powdery mildew resistance gene Pm4b by combining SNP discovery from transcriptome sequencing data with bulked segregant analysis (BSR-Seq) in wheat.

Powdery mildew resistance gene Pm4b, originating from Triticum persicum, is effective against the prevalent Blumeria graminis f. sp. tritici (Bgt) isolates from certain regions of wheat production in China. The lack of tightly linked molecular markers with the target gene prevents the precise identification of Pm4b during the application of molecular marker-assisted selection (MAS). The strategy that combines the RNA-Seq technique and the bulked segregant analysis (BSR-Seq) was applied in an F2:3 mapping population (237 families) derived from a pair of isogenic lines VPM1/7*Bainong 3217 F4 (carrying Pm4b) and Bainong 3217 to develop more closely linked molecular markers. RNA-Seq analysis of the two phenotypically contrasting RNA bulks prepared from the representative F2:3 families generated 20,745,939 and 25,867,480 high-quality read pairs, and 82.8 and 80.2% of them were uniquely mapped to the wheat whole genome draft assembly for the resistant and susceptible RNA bulks, respectively. Variant calling identified 283,866 raw single nucleotide polymorphisms (SNPs) and InDels between the two bulks. The SNPs that were closely associated with the powdery mildew resistance were concentrated on chromosome 2AL. Among the 84 variants that were potentially associated with the disease resistance trait, 46 variants were enriched in a region of about 25 Mb at the distal end of chromosome arm 2AL. Four Pm4b-linked SNP markers were developed from these variants. Based on the sequences of Chinese Spring where these polymorphic SNPs were located, 98 SSR primer pairs were designed to develop distal markers flanking the Pm4b gene. Three SSR markers, Xics13, Xics43, and Xics76, were incorporated in the new genetic linkage map, which located Pm4b in a 3.0 cM genetic interval spanning a 6.7 Mb physical genomic region. This region had a collinear relationship with Brachypodium distachyon chromosome 5, rice chromosome 4, and sorghum chromosome 6. Seven genes associated with disease resistance were predicted in this collinear genomic region, which included C2 domain protein, peroxidase activity protein, protein kinases of PKc_like super family, Mlo family protein, and catalytic domain of the serine/threonine kinases (STKc_IRAK like super family). The markers developed in the present study facilitate identification of Pm4b during its MAS practice.
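
The bulk comparison behind BSR-Seq can be sketched in Python as follows, assuming per-variant read counts are already available for the resistant and susceptible bulks. This is one common way to rank variants, by the difference in alternate-allele frequency between bulks, and is not necessarily the statistic used in the study. The field names and the `min_delta` and `min_depth` thresholds are hypothetical.

```python
def alt_frequency(ref_count, alt_count):
    """Fraction of reads supporting the alternate allele (0.0 if no reads)."""
    depth = ref_count + alt_count
    return alt_count / depth if depth else 0.0


def candidate_variants(variants, min_delta=0.8, min_depth=10):
    """Return (id, delta) for variants whose allele frequency differs between bulks.

    Each variant is a dict with hypothetical keys: id, res_ref, res_alt,
    sus_ref, sus_alt (read counts in the resistant and susceptible bulks).
    """
    hits = []
    for v in variants:
        res_depth = v["res_ref"] + v["res_alt"]
        sus_depth = v["sus_ref"] + v["sus_alt"]
        if min(res_depth, sus_depth) < min_depth:
            continue  # too few reads in one bulk to trust the frequencies
        delta = (alt_frequency(v["res_ref"], v["res_alt"])
                 - alt_frequency(v["sus_ref"], v["sus_alt"]))
        if abs(delta) >= min_delta:
            hits.append((v["id"], round(delta, 3)))
    return hits
```

Variants that pass the filter would then be checked for clustering on a chromosome arm, as reported for 2AL in the abstract.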


July 7, 2019  |  

Sustaining global agriculture through rapid detection and deployment of genetic resistance to deadly crop diseases.

Contents: Summary; I. Introduction; II. Targeted chromosome-based cloning via long-range assembly (TACCA); III. Resistance gene cloning through mutational mapping (MutMap); IV. Cloning through mutant chromosome sequencing (MutChromSeq); V. Rapid cloning through resistance gene enrichment and sequencing (RenSeq); VI. Cloning resistance genes through transcriptome profiling (RNAseq); VII. Resistance gene deployment strategies; VIII. Conclusions; Acknowledgements; References.

Summary: Genetically encoded resistance is a major component of crop disease management. Historically, gene loci conferring resistance to pathogens have been identified through classical genetic methods. In recent years, accelerated gene cloning strategies have become available through advances in sequencing, gene capture and strategies for reducing genome complexity. Here, I describe these approaches with key emphasis on the isolation of resistance genes to the cereal crop diseases that are an ongoing threat to global food security. Rapid gene isolation enables their efficient deployment through marker-assisted selection and transgenic technology. Together with innovations in genome editing and progress in pathogen virulence studies, this creates further opportunities to engineer long-lasting resistance. These approaches will speed progress towards a future of farming using fewer pesticides. © 2017 Commonwealth of Australia. New Phytologist © 2017 New Phytologist Trust.


July 7, 2019  |  

RIFRAF: a frame-resolving consensus algorithm.

Protein coding genes can be studied using long-read next generation sequencing. However, high rates of indel sequencing errors are problematic, corrupting the reading frame. Even the consensus of multiple independent sequence reads retains indel errors. To solve this problem, we introduce Reference-Informed Frame-Resolving multiple-Alignment Free template inference algorithm (RIFRAF), a sequence consensus algorithm that takes a set of error-prone reads and a reference sequence and infers an accurate in-frame consensus. RIFRAF uses a novel structure, analogous to a two-layer hidden Markov model: the consensus is optimized to maximize alignment scores with both the set of noisy reads and with a reference. The template-to-reads component of the model encodes the preponderance of indels, and is sensitive to the per-base quality scores, giving greater weight to more accurate bases. The reference-to-template component of the model penalizes frame-destroying indels. A local search algorithm proceeds in stages to find the best consensus sequence for both objectives. Using Pacific Biosciences SMRT sequences from an HIV-1 env clone, NL4-3, we compare our approach to other consensus and frame correction methods. RIFRAF consistently finds a consensus sequence that is more accurate and in-frame, especially with small numbers of reads. It was able to perfectly reconstruct over 80% of consensus sequences from as few as three reads, whereas the best alternative required twice as many. RIFRAF is able to achieve these results and keep the consensus in-frame even with a distantly related reference sequence. Moreover, unlike other frame correction methods, RIFRAF can detect and keep true indels while removing erroneous ones. RIFRAF is implemented in Julia, and source code is publicly available at https://github.com/MurrellGroup/Rifraf.jl. Supplementary data are available at Bioinformatics online.
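
Two ideas in this abstract can be made concrete with a short Python sketch: per-base quality scores translate into error probabilities, so more accurate bases can be given more weight, and an indel destroys the reading frame when its length is not a multiple of three. The helper names below are hypothetical and are not taken from Rifraf.jl. The Phred relation is the standard definition; the weighting is only one simple possibility, not RIFRAF's scoring model.

```python
def phred_to_error_prob(q):
    """Standard Phred relation: error probability = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def base_weight(q):
    """Illustrative weight for a base: the probability that it is correct."""
    return 1.0 - phred_to_error_prob(q)


def is_frame_preserving(indel_length):
    """An indel keeps the reading frame only if its length is a multiple of 3."""
    return indel_length % 3 == 0
```

Under this toy weighting a Q30 base counts for more than a Q10 base, and a single-base deletion would be flagged as frame-destroying while a three-base deletion would not.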


July 7, 2019  |  

The challenge of analyzing the sugarcane genome.

Reference genome sequences have become key platforms for genetics and breeding of the major crop species. Sugarcane is probably the largest crop produced in the world (in weight of crop harvested) but lacks a reference genome sequence. Sugarcane has one of the most complex genomes in crop plants due to the extreme level of polyploidy. The genome of modern sugarcane hybrids includes sub-genomes from two progenitors Saccharum officinarum and S. spontaneum with some chromosomes resulting from recombination between these sub-genomes. Advancing DNA sequencing technologies and strategies for genome assembly are making the sugarcane genome more tractable. Advances in long read sequencing have allowed the generation of a more complete set of sugarcane gene transcripts. This is supporting transcript profiling in genetic research. The progenitor genomes are being sequenced. A monoploid coverage of the hybrid genome has been obtained by sequencing BAC clones that cover the gene space of the closely related sorghum genome. The complete polyploid genome is now being sequenced and assembled. The emerging genome will allow comparison of related genomes and increase understanding of the functioning of this polyploidy system. Sugarcane breeding for traditional sugar and new energy and biomaterial uses will be enhanced by the availability of these genomic resources.


July 7, 2019  |  

Clustering of circular consensus sequences: accurate error correction and assembly of single molecule real-time reads from multiplexed amplicon libraries.

Targeted resequencing with high-throughput sequencing (HTS) platforms can be used to efficiently interrogate the genomes of large numbers of individuals. A critical issue for research and applications using HTS data, especially from long-read platforms, is error in base calling arising from technological limits and bioinformatic algorithms. We found that the community standard long amplicon analysis (LAA) module from Pacific Biosciences is prone to substantial bioinformatic errors that raise concerns about findings based on this pipeline, prompting the need for a new method. A single molecule real-time (SMRT) sequencing-error correction and assembly pipeline, C3S-LAA, was developed for libraries of pooled amplicons. By uniquely leveraging the structure of SMRT sequence data (composed of multiple low quality subreads from which higher quality circular consensus sequences are formed) to cluster raw reads, C3S-LAA produced accurate consensus sequences and assemblies of overlapping amplicons from single sample and multiplexed libraries. In contrast, despite read depths in excess of 100X per amplicon, the standard long amplicon analysis module from Pacific Biosciences generated unexpected numbers of amplicon sequences with substantial inaccuracies in the consensus sequences. A bootstrap analysis showed that the C3S-LAA pipeline per se was effective at removing bioinformatic sources of error, but in rare cases a read depth of nearly 400X was not sufficient to overcome minor but systematic errors inherent to amplification or sequencing. C3S-LAA uses a divide and conquer processing algorithm for SMRT amplicon-sequence data that generates accurate consensus sequences and local sequence assemblies. Solving the confounding bioinformatic source of error in LAA allowed for the identification of limited instances of errors due to DNA amplification or sequencing of homopolymeric nucleotide tracts. For research and development in genomics, C3S-LAA allows meaningful conclusions and biological inferences to be made from accurately polished sequence output.
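
The clustering idea described here, using the higher quality circular consensus sequences to decide which raw subreads belong together, can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the C3S-LAA implementation: amplicons are recognized by an exact primer prefix on the consensus read, and read names are assumed to follow the `movie/hole/...` pattern so that subreads and their consensus share the first two fields.

```python
from collections import defaultdict


def zmw_key(read_name):
    """Return (movie, hole) from a read name such as 'm1/100/0_500'."""
    movie, hole = read_name.split("/")[:2]
    return movie, hole


def cluster_subreads(ccs_reads, subread_names, primers):
    """Group subread names by the amplicon assigned to their consensus read.

    ccs_reads: dict of consensus read name -> sequence
    primers:   dict of amplicon name -> forward primer (hypothetical, exact match)
    """
    zmw_to_amplicon = {}
    for name, seq in ccs_reads.items():
        for amplicon, primer in primers.items():
            if seq.startswith(primer):
                zmw_to_amplicon[zmw_key(name)] = amplicon
                break
    clusters = defaultdict(list)
    for name in subread_names:
        amplicon = zmw_to_amplicon.get(zmw_key(name))
        if amplicon is not None:
            clusters[amplicon].append(name)
    return dict(clusters)
```

Each resulting cluster could then be passed separately to a consensus step, which is the divide and conquer aspect mentioned in the abstract. Subreads whose consensus matches no primer are left out in this sketch.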


July 7, 2019  |  

Implementation of pharmacogenomics in everyday clinical settings.

Currently, germline pharmacogenomics (PGx) is successfully implemented within certain specialties in clinical care. Integrating PGx into pharmacotherapy involves multiple stakeholders, who are identified in this chapter. Clinically relevant pharmacogenes and their related PGx tests are discussed, along with diagnostic test criteria to guide clinicians and policy makers in PGx test selection. The chapter further reviews the similarities and differences between the guidelines of the Dutch Pharmacogenetics Working Group and the Clinical Pharmacogenetics Implementation Consortium, which both support healthcare professionals in understanding PGx test results and help guide pharmacotherapy by providing evidence-based dosing recommendations. Finally, clinical studies that provide scientific evidence and information on cost-effectiveness supporting clinical implementation of PGx are discussed, along with the remaining barriers to adoption of PGx testing by healthcare professionals. © 2018 Elsevier Inc. All rights reserved.


July 7, 2019  |  

Recombination hotspots in an extended human pseudoautosomal domain predicted from double-strand break maps and characterized by sperm-based crossover analysis.

The human X and Y chromosomes are heteromorphic but share a region of homology at the tips of their short arms, pseudoautosomal region 1 (PAR1), that supports obligate crossover in male meiosis. Although the boundary between pseudoautosomal and sex-specific DNA has traditionally been regarded as conserved among primates, it was recently discovered that the boundary position varies among human males, due to a translocation of ~110 kb from the X to the Y chromosome that creates an extended PAR1 (ePAR). This event has occurred at least twice in human evolution. So far, only limited evidence has been presented to suggest this extension is recombinationally active. Here, we sought direct proof by examining thousands of gametes from each of two ePAR-carrying men, for two subregions chosen on the basis of previously published male X-chromosomal meiotic double-strand break (DSB) maps. Crossover activity comparable to that seen at autosomal hotspots was observed between the X and the ePAR borne on the Y chromosome both at a distal and a proximal site within the 110-kb extension. Other hallmarks of classic recombination hotspots included evidence of transmission distortion and GC-biased gene conversion. We observed good correspondence between the male DSB clusters and historical recombination activity of this region in the X chromosomes of females, as ascertained from linkage disequilibrium analysis; this suggests that this region is similarly primed for crossover in both male and female germlines, although sex-specific differences may also exist. Extensive resequencing and inference of ePAR haplotypes, placed in the framework of the Y phylogeny as ascertained by both Y microsatellites and single nucleotide polymorphisms, allowed us to estimate a minimum rate of crossover over the entire ePAR region of 6-fold greater than the genome average, comparable with pedigree estimates of PAR1 activity generally. We conclude that ePAR very likely contributes to the critical crossover function of PAR1.


July 7, 2019  |  

Allele-level KIR genotyping of more than a million samples: Workflow, algorithm, and observations.

The killer-cell immunoglobulin-like receptor (KIR) genes regulate natural killer cell activity, influencing predisposition to immune mediated disease, and affecting hematopoietic stem cell transplantation (HSCT) outcome. Owing to the complexity of the KIR locus, with extensive gene copy number variation (CNV) and allelic diversity, high-resolution characterization of KIR has so far been applied only to relatively small cohorts. Here, we present a comprehensive high-throughput KIR genotyping approach based on next generation sequencing. Through PCR amplification of specific exons, our approach delivers both copy numbers of the individual genes and allelic information for every KIR gene. Ten-fold replicate analysis of a set of 190 samples revealed a precision of 99.9%. Genotyping of an independent set of 360 samples resulted in an accuracy of more than 99% taking into account consistent copy number prediction. We applied the workflow to genotype 1.8 million stem cell donor registry samples. We report on the observed KIR allele diversity and relative abundance of alleles based on a subset of more than 300,000 samples. Furthermore, we identified more than 2,000 previously unreported KIR variants repeatedly in independent samples, underscoring the large diversity of the KIR region that awaits discovery. This cost-efficient high-resolution KIR genotyping approach is now applied to samples of volunteers registering as potential donors for HSCT. This will facilitate the utilization of KIR as additional selection criterion to improve unrelated donor stem cell transplantation outcome. In addition, the approach may serve studies requiring high-resolution KIR genotyping, like population genetics and disease association studies.

