GATK Archives - Page 7 of 11

September 22, 2019 |

The genome of the Hi5 germ cell line from Trichoplusia ni, an agricultural pest and novel model for small RNA biology.

We report a draft assembly of the genome of Hi5 cells from the lepidopteran insect pest, Trichoplusia ni, assigning 90.6% of bases to one of 28 chromosomes and predicting 14,037 protein-coding genes. Chemoreception and detoxification gene families reveal T. ni-specific gene expansions that may explain its widespread distribution and rapid adaptation to insecticides. Transcriptome and small RNA data from thorax, ovary, testis, and the germline-derived Hi5 cell line show distinct expression profiles for 295 microRNA- and >393 piRNA-producing loci, as well as 39 genes encoding small RNA pathway proteins. Nearly all of the W chromosome is devoted to piRNA production, and T. ni siRNAs are not 2´-O-methylated. To enable use of Hi5 cells as a model system, we have established genome editing and single-cell cloning protocols. The T. ni genome provides insights into pest control and allows Hi5 cells to become a new tool for studying small RNAs ex vivo. © 2018, Fu et al.

September 22, 2019 |

A combinatorial approach to synthetic transcription factor-promoter combinations for yeast strain engineering.

Despite the need for inducible promoters in strain development efforts, the majority of engineering in Saccharomyces cerevisiae continues to rely on a few constitutively active or inducible promoters. Building on advances that use the modular nature of both transcription factors and promoter regions, we have built a library of hybrid promoters that are regulated by a synthetic transcription factor. The hybrid promoters consist of native S. cerevisiae promoters, in which the operator regions have been replaced with sequences that are recognized by the bacterial LexA DNA binding protein. Correspondingly, the synthetic transcription factor (TF) consists of the DNA binding domain of the LexA protein, fused with the human estrogen binding domain and the viral activator domain, VP16. The resulting system with a bacterial DNA binding domain avoids the transcription of native S. cerevisiae genes, and the hybrid promoters can be induced using estradiol, a compound with no detectable impact on S. cerevisiae physiology. Using combinations of one, two or three operator sequence repeats and a set of native S. cerevisiae promoters, we obtained a series of hybrid promoters that can be induced to different levels, using the same synthetic TF and a given estradiol concentration. This set of promoters, in combination with our synthetic TF, has the potential to regulate numerous genes or pathways simultaneously, to multiple desired levels, in a single strain. © 2017 The Authors. Yeast published by John Wiley & Sons, Ltd.

September 22, 2019 |

Conventional and single-molecule targeted sequencing method for specific variant detection in IKBKG while bypassing the IKBKGP1 pseudogene.

In addition to Sanger sequencing, next-generation sequencing of gene panels and exomes has emerged as a standard diagnostic tool in many laboratories. However, these captures can miss regions, have poor efficiency, or capture pseudogenes, which hamper proper diagnoses. One such example is the primary immunodeficiency-associated gene IKBKG. Its pseudogene IKBKGP1 makes traditional capture methods aspecific. We therefore developed a long-range PCR method to efficiently target IKBKG, as well as two associated genes (IRAK4 and MYD88), while bypassing the IKBKGP1 pseudogene. Sequencing accuracy was evaluated using both conventional short-read technology and a newer long-read, single-molecule sequencer. Different mapping and variant calling options were evaluated in their capability to bypass the pseudogene using both sequencing platforms. Based on these evaluations, we determined a robust diagnostic application for unambiguous sequencing and variant calling in IKBKG, IRAK4, and MYD88. This method allows rapid identification of selected primary immunodeficiency diseases in patients suffering from life-threatening invasive pyogenic bacterial infections. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

September 22, 2019 |

Intraspecific comparative genomics of isolates of the Norway spruce pathogen (Heterobasidion parviporum) and identification of its potential virulence factors.

Heterobasidion parviporum is one of the most economically important fungal forest pathogens in northern Europe, causing root and butt rot disease of Norway spruce (Picea abies (L.) Karst.). The mechanisms underlying the pathogenesis and virulence of this species remain elusive. No reference genome to facilitate functional analysis is available for this species. To better understand the virulence factors at both the phenotypic and genomic levels, we characterized 15 H. parviporum isolates originating from different locations across Finland for virulence, vegetative growth, sporulation and saprotrophic wood decay. Wood decay capability and latitude of fungal origins exerted interactive effects on their virulence and appeared important for H. parviporum virulence. We sequenced the most virulent isolate to produce the first full genome sequence of H. parviporum as a reference genome, and re-sequenced the remaining 14 H. parviporum isolates. Genome-wide alignments and intrinsic polymorphism analysis showed that these isolates exhibited overall high genomic similarity, with an average of at least 96% nucleotide identity when compared to the reference, yet had a remarkable intra-specific level of polymorphism with a bias for CpG to TpG mutations. Read-mapping coverage analysis enabled the classification of all predicted genes into five groups and uncovered two genomic regions exclusively present in the reference with putative contribution to its higher virulence. Genes enriched for copy number variations (deletions and duplications) and nucleotide polymorphism were involved in oxidation-reduction processes and encoded domains relevant to transcription factors. Some genes coding for secreted proteins were proposed as potential virulence candidates based on genome-wide selection pressure or the presence of variants. Our study reported on the first reference genome sequence for this Norway spruce pathogen (H. parviporum). Comparative genomics analysis gave insight into the overall genomic variation within this fungal species and also facilitated the identification of several secreted protein coding genes as putative virulence factors for further functional analysis. We also analyzed and identified phenotypic traits potentially linked to its virulence.

September 22, 2019 |

Synchronous termination of replication of the two chromosomes is an evolutionary selected feature in Vibrionaceae.

Vibrio cholerae, the causative agent of the cholera disease, is commonly used as a model organism for the study of bacteria with multipartite genomes. Its two chromosomes of different sizes initiate their DNA replication at distinct time points in the cell cycle and terminate in synchrony. In this study, the time-delayed start of Chr2 was verified in a synchronized cell population. This replication pattern suggests two possible regulation mechanisms for other Vibrio species with different sized secondary chromosomes: either all Chr2 start DNA replication with a fixed delay after Chr1 initiation, or the timepoint at which Chr2 initiates varies such that termination of chromosomal replication occurs in synchrony. We investigated these two models and revealed that the two chromosomes of various Vibrionaceae species terminate in synchrony while Chr2-initiation timing relative to Chr1 is variable. Moreover, the sequence and function of the Chr2-triggering crtS site recently discovered in V. cholerae were found to be conserved, explaining the observed timing mechanism. Our results suggest that it is beneficial for bacterial cells with multiple chromosomes to synchronize their replication termination, potentially to optimize chromosome-related processes such as dimer resolution or segregation.

September 22, 2019 |

Whole genome sequencing of greater amberjack (Seriola dumerili) for SNP identification on aligned scaffolds and genome structural variation analysis using parallel resequencing

Greater amberjack (Seriola dumerili) is distributed in tropical and temperate waters worldwide and is an important aquaculture fish. We carried out de novo sequencing of the greater amberjack genome to construct a reference genome sequence to identify single nucleotide polymorphisms (SNPs) for breeding amberjack by marker-assisted or gene-assisted selection as well as to identify functional genes for biological traits. We obtained 200 times coverage and constructed a high-quality genome assembly using next generation sequencing technology. The assembled sequences were aligned onto a yellowtail (Seriola quinqueradiata) radiation hybrid (RH) physical map by sequence homology. A total of 215 of the longest amberjack sequences, with a total length of 622.8 Mbp (92% of the total length of the genome scaffolds), were lined up on the yellowtail RH map. We resequenced the whole genomes of 20 greater amberjacks and mapped the resulting sequences onto the reference genome sequence. About 186,000 nonredundant SNPs were successfully ordered on the reference genome. Further, we found differences in the genome structural variations between two greater amberjack populations using BreakDancer. We also analyzed the greater amberjack transcriptome and mapped the annotated sequences onto the reference genome sequence.

September 22, 2019 |

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.

September 22, 2019 |

Ploidy variation in Kluyveromyces marxianus separates dairy and non-dairy isolates.

Kluyveromyces marxianus is traditionally associated with fermented dairy products, but can also be isolated from diverse non-dairy environments. Because of thermotolerance, rapid growth and other traits, many different strains are being developed for food and industrial applications but there is, as yet, little understanding of the genetic diversity or population genetics of this species. K. marxianus shows a high level of phenotypic variation but the only phenotype that has been clearly linked to a genetic polymorphism is lactose utilisation, which is controlled by variation in the LAC12 gene. The genomes of several strains have been sequenced in recent years and, in this study, we sequenced a further nine strains from different origins. Analysis of the Single Nucleotide Polymorphisms (SNPs) in 14 strains was carried out to examine genome structure and genetic diversity. SNP diversity in K. marxianus is relatively high, with up to 3% DNA sequence divergence between alleles. It was found that the isolates include haploid, diploid, and triploid strains, as shown by both SNP analysis and flow cytometry. Diploids and triploids contain long genomic tracts showing loss of heterozygosity (LOH). All six isolates from dairy environments were diploid or triploid, whereas 6 out of 7 isolates from non-dairy environments were haploid. This also correlated with the presence of functional LAC12 alleles only in dairy haplotypes. The diploids were hybrids between a non-dairy and a dairy haplotype, whereas triploids included three copies of a dairy haplotype.

September 22, 2019 |

Repeated evolution of self-compatibility for reproductive assurance.

Sexual reproduction in eukaryotes requires the fusion of two compatible gametes of opposite sexes or mating types. To meet the challenge of finding a mating partner with compatible gametes, evolutionary mechanisms such as hermaphroditism and self-fertilization have repeatedly evolved. Here, by combining the insights from comparative genomics, computer simulations and experimental evolution in fission yeast, we shed light on the conditions promoting separate mating types or self-compatibility by mating-type switching. Analogous to multiple independent transitions between switchers and non-switchers in natural populations mediated by structural genomic changes, novel switching genotypes readily evolved under selection in the experimental populations. Detailed fitness measurements accompanied by computer simulations show the benefits and costs of switching during sexual and asexual reproduction, governing the occurrence of both strategies in nature. Our findings illuminate the trade-off between the benefits of reproductive assurance and its fitness costs under benign conditions facilitating the evolution of self-compatibility.

September 22, 2019 |

Comparison of phasing strategies for whole human genomes.

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. 
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
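As a rough illustration of one metric named above, the following minimal sketch computes a switch error rate between a reference phasing and a predicted phasing. It assumes a simplified encoding that is not taken from the study: each heterozygous site is a 0 or 1 saying which allele sits on the first haplotype, and both phasings cover the same sites in the same order. Phasing tools differ in how they define and normalize switch errors, so treat this as one common definition and not as the exact method used in the comparison.

```python
from typing import Sequence


def switch_error_rate(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of adjacent heterozygous-site pairs where the phase flips.

    Each entry is 0 or 1 (hypothetical encoding): which allele is assigned
    to the first haplotype at that site.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must cover the same sites")
    if len(truth) < 2:
        return 0.0
    # Per-site agreement with the truth; a switch is a change in agreement
    # between neighbouring sites, so a fully flipped prediction scores 0.
    agrees = [t == p for t, p in zip(truth, predicted)]
    switches = sum(1 for prev, cur in zip(agrees, agrees[1:]) if prev != cur)
    return switches / (len(truth) - 1)


# Toy data for illustration only.
truth = [0, 0, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 0]
print(f"switch error rate: {switch_error_rate(truth, predicted):.2f}")
```

Counting changes in agreement, not raw mismatches, reflects that the labelling of the two haplotypes is arbitrary: a prediction that is the exact complement of the truth carries the same phase information.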

September 22, 2019 |

Genome evolution across 1,011 Saccharomyces cerevisiae isolates.

Large-scale population genomic surveys are essential to explore the phenotypic diversity of natural populations. Here we report the whole-genome sequencing and phenotyping of 1,011 Saccharomyces cerevisiae isolates, which together provide an accurate evolutionary picture of the genomic variants that shape the species-wide phenotypic landscape of this yeast. Genomic analyses support a single ‘out-of-China’ origin for this species, followed by several independent domestication events. Although domesticated isolates exhibit high variation in ploidy, aneuploidy and genome content, genome evolution in wild isolates is mainly driven by the accumulation of single nucleotide polymorphisms. A common feature is the extensive loss of heterozygosity, which represents an essential source of inter-individual variation in this mainly asexual species. Most of the single nucleotide polymorphisms, including experimentally identified functional polymorphisms, are present at very low frequencies. The largest numbers of variants identified by genome-wide association are copy-number changes, which have a greater phenotypic effect than do single nucleotide polymorphisms. This resource will guide future population genomics and genotype-phenotype studies in this classic model system.

September 22, 2019 |

RTS,S/AS01 malaria vaccine mismatch observed among Plasmodium falciparum isolates from southern and central Africa and globally.

The RTS,S/AS01 malaria vaccine encompasses the central repeats and C-terminal of Plasmodium falciparum circumsporozoite protein (PfCSP). Although no Phase II clinical trial studies observed evidence of strain-specific immunity, recent studies show a decrease in vaccine efficacy against non-vaccine strain parasites. In light of goals to reduce malaria morbidity, anticipating the effectiveness of RTS,S/AS01 is critical to planning widespread vaccine introduction. We deep sequenced C-terminal Pfcsp from 77 individuals living along the international border in Luapula Province, Zambia and Haut-Katanga Province, the Democratic Republic of the Congo (DRC) and compared translated amino acid haplotypes to the 3D7 vaccine strain. Only 5.2% of the 193 PfCSP sequences from the Zambia-DRC border region matched 3D7 at all 84 amino acids. To further contextualize the genetic diversity sampled in this study with global PfCSP diversity, we analyzed an additional 3,809 Pfcsp sequences from the Pf3k database and constructed a haplotype network representing 15 countries from Africa and Asia. The diversity observed in our samples was similar to the diversity observed in the global haplotype network. These observations underscore the need for additional research assessing genetic diversity in P. falciparum and the impact of PfCSP diversity on RTS,S/AS01 efficacy.
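The comparison of translated haplotypes against the vaccine strain can be sketched as below. This is a minimal illustration with made-up short sequences; the function names and toy data are hypothetical and are not the study's pipeline, which compared 84 amino acid positions per sequence.

```python
from typing import List, Sequence


def fraction_matching_reference(haplotypes: Sequence[str], reference: str) -> float:
    """Share of haplotypes identical to the reference at every position."""
    if not haplotypes:
        return 0.0
    matches = sum(1 for hap in haplotypes if hap == reference)
    return matches / len(haplotypes)


def differing_positions(haplotype: str, reference: str) -> List[int]:
    """1-based positions where an aligned haplotype differs from the reference."""
    if len(haplotype) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos
        for pos, (a, b) in enumerate(zip(haplotype, reference), start=1)
        if a != b
    ]


# Toy amino acid strings for illustration only (not real PfCSP data).
reference = "ACD"
sampled = ["ACD", "ACD", "AED", "ACE"]
print(fraction_matching_reference(sampled, reference))
print(differing_positions("AED", reference))
```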

September 22, 2019 |

IMSindel: An accurate intermediate-size indel detection tool incorporating de novo assembly and gapped global-local alignment with split read analysis.

Insertions and deletions (indels) have been implicated in dozens of human diseases through the radical alteration of gene function by short frameshift indels as well as long indels. However, the accurate detection of these indels from next-generation sequencing data is still challenging. This is particularly true for intermediate-size indels (≥50 bp), due to the short DNA sequencing reads. Here, we developed a new method that predicts intermediate-size indels using BWA soft-clipped fragments (unmatched fragments in partially mapped reads) and unmapped reads. We report the performance comparison of our method, GATK, PINDEL and ScanIndel, using whole exome sequencing data from the same samples. False positive and false negative counts were determined through Sanger sequencing of all predicted indels across these four methods. The harmonic mean of the recall and precision, F-measure, was used to measure the performance of each method. Our method achieved the highest F-measure of 0.84 in one sample, compared to 0.56 for GATK, 0.52 for PINDEL and 0.46 for ScanIndel. Similar results were obtained in additional samples, demonstrating that our method was superior to the other methods for detecting intermediate-size indels. We believe that this methodology will contribute to the discovery of intermediate-size indels associated with human disease.
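For readers unfamiliar with the metric, the F-measure described above is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below computes it from validated call counts. The counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall from validated call counts."""
    if true_pos == 0:
        # No correct calls: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not taken from the study).
score = f_measure(true_pos=6, false_pos=2, false_neg=6)
print(f"F-measure: {score:.2f}")
```

Because it is a harmonic mean, the score is pulled toward the weaker of the two values, so a caller cannot score well by trading many false positives for high recall, or the reverse.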

September 22, 2019 |

SvABA: genome-wide detection of structural variants and indels by local assembly.

Structural variants (SVs), including small insertion and deletion variants (indels), are challenging to detect through standard alignment-based variant calling methods. Sequence assembly offers a powerful approach to identifying SVs, but is difficult to apply at scale genome-wide for SV detection due to its computational complexity and the difficulty of extracting SVs from assembly contigs. We describe SvABA, an efficient and accurate method for detecting SVs from short-read sequencing data using genome-wide local assembly with low memory and computing requirements. We evaluated SvABA’s performance on the NA12878 human genome and in simulated and real cancer genomes. SvABA demonstrates superior sensitivity and specificity across a large spectrum of SVs and substantially improves detection performance for variants in the 20-300 bp range, compared with existing methods. SvABA also identifies complex somatic rearrangements with chains of short (<1000 bp) templated-sequence insertions copied from distant genomic regions. We applied SvABA to 344 cancer genomes from 11 cancer types and found that short templated-sequence insertions occur in ~4% of all somatic rearrangements. Finally, we demonstrate that SvABA can identify sites of viral integration and cancer driver alterations containing medium-sized (50-300 bp) SVs. © 2018 Wala et al.; Published by Cold Spring Harbor Laboratory Press.

September 22, 2019 |

Distinct evolutionary patterns of Neisseria meningitidis serogroup B disease outbreaks at two universities in the USA.

Neisseria meningitidis serogroup B (MnB) was responsible for two independent meningococcal disease outbreaks at universities in the USA during 2013. The first at University A in New Jersey included nine confirmed cases reported between March 2013 and March 2014. The second outbreak occurred at University B in California, with four confirmed cases during November 2013. The public health response to these outbreaks included the approval and deployment of a serogroup B meningococcal vaccine that was not yet licensed in the USA. This study investigated the use of whole-genome sequencing (WGS) to examine the genetic profile of the disease-causing outbreak isolates at each university. Comparative WGS revealed differences in evolutionary patterns between the two disease outbreaks. The University A outbreak isolates were very closely related, with differences primarily attributed to single nucleotide polymorphisms/insertion-deletion (SNP/indel) events. In contrast, the University B outbreak isolates segregated into two phylogenetic clades, differing in large part due to recombination events covering extensive regions (>30 kb) of the genome including virulence factors. This high-resolution comparison of two meningococcal disease outbreaks further demonstrates the genetic complexity of meningococcal bacteria as related to evolution and disease virulence.
