reference genome Archives - Page 7 of 9

July 7, 2019 |

The characterization of goat genetic diversity: Towards a genomic approach

The investigation of genetic diversity at molecular level has been proposed as a valuable complement and sometimes proxy to phenotypic diversity of local breeds and is presently considered as one of the FAO priorities for breed characterization. By recommending a set of selected molecular markers for each of the main livestock species, FAO has promoted the meta-analysis of local datasets, to achieve a global view of molecular genetic diversity. Analysis within the EU Globaldiv project of two large goat microsatellite datasets produced by the Econogene Consortium and the IAEA CRP–Asia Consortium, respectively, has generated a picture of goat diversity across continents. This indicates a gradient of decreasing diversity from the domestication centre towards Europe and Asia, a clear phylogeographic structure at the continental and regional levels, and in Asia a limited genetic differentiation among local breeds. The development of SNP panels that assay thousands of markers and the whole genome sequencing of livestock permit an affordable use of genomic technologies in all livestock species, goats included. Preliminary data from the Italian Goat Consortium indicate that the SNP panel developed for this species is highly informative. The existing panel can be improved by integrating additional SNPs identified from the whole genome sequence alignment of goats adapted to extreme climates. Part of this effort is being achieved by international projects (e.g. EU FP7 NextGen and 3SR projects), but a fair representation of the global diversity in goats requires a large panel of samples (i.e. as in the recently launched 1000 cattle genomes initiative). Genomic technologies offer new strategies to investigate complex traits difficult to measure. For example, the comparison of patterns of diversity among the genomes in selected groups of animals (e.g. adapted to different environments) and the integration of genome-wide diversity with new GIScience-based methods are able to identify molecular markers associated with genomic regions of putative importance in adaptation and thus pave the way for the identification of causative genes. Goat breeds adapted to different production systems in extreme and harsh environments will play an important role in this process. The new sequencing technologies also permit the analysis of the entire mitochondrial genome at maximum resolution. The complete mtDNA sequence is now the common standard format for the investigation of human maternal lineages. A preliminary analysis of the complete goat mtDNA genome supports a single Neolithic origin of domestic goats rather than multiple domestication events in different geographic areas.

July 7, 2019 |

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

Whole-genome sequences are now available for many microbial species and clades, however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.

July 7, 2019 |

In transition: primate genomics at a time of rapid change.

The field of nonhuman primate genomics is undergoing rapid change and making impressive progress. Exploiting new technologies for DNA sequencing, researchers have generated new whole-genome sequence assemblies for multiple primate species over the past 6 years. In addition, investigations of within-species genetic variation, gene expression and RNA sequences, conservation of non-protein-coding regions of the genome, and other aspects of comparative genomics are moving at an accelerating speed. This progress is opening a wide array of new research opportunities in the analysis of comparative primate genome content and evolution. It also creates new possibilities for the use of nonhuman primates as model organisms in biomedical research. This transition, based on both new technology and the new information being generated in regard to human genetics, provides an important justification for reevaluating the research goals, strategies, and study designs used in primate genetics and genomics.

July 7, 2019 |

Combining de novo and reference-guided assembly with scaffold_builder.

Genome sequencing has become routine, however genome assembly still remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds – super contigs of sequences joined by N-bases – based on the similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at http://edwards.sdsu.edu/scaffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded.

July 7, 2019 |

Complete genome sequence of Bacillus subtilis strain PY79.

Bacillus subtilis is a Gram-positive soil-dwelling and endospore-forming bacterium in the phylum Firmicutes. B. subtilis strain PY79 is a prototrophic laboratory strain that has been highly used for studying a wide variety of cellular pathways. Here, we announce the complete whole-genome sequence of B. subtilis PY79.

July 7, 2019 |

Identification of symmetrical RNA editing events in the mitochondria of Salvia miltiorrhiza by strand-specific RNA sequencing.

Salvia miltiorrhiza is one of the most widely-used medicinal plants. Here, we systematically analyzed the RNA editing events in its mitochondria. We developed a pipeline using REDItools to predict RNA editing events from stand-specific RNA-Seq data. The predictions were validated using reverse transcription, RT-PCR amplification and Sanger sequencing experiments. Putative sequences motifs were characterized. Comparative analyses were carried out between S. miltiorrhiza, Arabidopsis thaliana and Oryza sativa. We discovered 1123 editing sites, including 225 “C to U” sites in the protein-coding regions. Fourteen of sixteen (87.5%) sites were validated. Three putative DNA motifs were identified around the predicted sites. The nucleotides on both strands at 115 of the 225 sites had undergone RNA editing, which we called symmetrical RNA editing (SRE). Four of six these SRE sites (66.7%) were experimentally confirmed. Re-examination of strand-specific RNA-Seq data from A. thaliana and O. sativa identified 327 and 369 SRE sites respectively. 78, 20 and 13 SRE sites were found to be conserved among A. thaliana, O. sativa and S. miltiorrhiza respectively. This study provides a comprehensive picture of RNA editing events in the mitochondrial genome of S. miltiorrhiza. We identified SREs for the first time, which may represent a universal phenomenon.

July 7, 2019 |

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes, however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached chromosome-arm-levels. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all of these mis-joints were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres was fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences. Published by Cold Spring Harbor Laboratory Press.

July 7, 2019 |

Simultaneous emergence of multidrug-resistant Candida auris on 3 continents confirmed by whole-genome sequencing and epidemiological analyses.

Candida auris, a multidrug-resistant yeast that causes invasive infections, was first described in 2009 in Japan and has since been reported from several countries.To understand the global emergence and epidemiology of C. auris, we obtained isolates from 54 patients with C. auris infection from Pakistan, India, South Africa, and Venezuela during 2012-2015 and the type specimen from Japan. Patient information was available for 41 of the isolates. We conducted antifungal susceptibility testing and whole-genome sequencing (WGS).Available clinical information revealed that 41% of patients had diabetes mellitus, 51% had undergone recent surgery, 73% had a central venous catheter, and 41% were receiving systemic antifungal therapy when C. auris was isolated. The median time from admission to infection was 19 days (interquartile range, 9-36 days), 61% of patients had bloodstream infection, and 59% died. Using stringent break points, 93% of isolates were resistant to fluconazole, 35% to amphotericin B, and 7% to echinocandins; 41% were resistant to 2 antifungal classes and 4% were resistant to 3 classes. WGS demonstrated that isolates were grouped into unique clades by geographic region. Clades were separated by thousands of single-nucleotide polymorphisms, but within each clade isolates were clonal. Different mutations in ERG11 were associated with azole resistance in each geographic clade.C. auris is an emerging healthcare-associated pathogen associated with high mortality. Treatment options are limited, due to antifungal resistance. WGS analysis suggests nearly simultaneous, and recent, independent emergence of different clonal populations on 3 continents. Risk factors and transmission mechanisms need to be elucidated to guide control measures. Published by Oxford University Press for the Infectious Diseases Society of America 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.

July 7, 2019 |

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.

July 7, 2019 |

Whole genome and core genome multilocus sequence typing and single nucleotide polymorphism analyses of Listeria monocytogenes associated with an outbreak linked to cheese, United States, 2013.

Epidemiological findings of a listeriosis outbreak in 2013 implicated Hispanic-style cheese produced by Company A, and pulsed-field gel electrophoresis (PFGE) and whole genome sequencing (WGS) were performed on clinical isolates and representative isolates collected from Company A cheese and environmental samples during the investigation. The results strengthened the evidence for cheese as the vehicle. Surveillance sampling and WGS three months later revealed that the equipment purchased by Company B from Company A yielded an environmental isolate highly similar to all outbreak isolates. The whole genome and core genome multilocus sequence typing and single nucleotide polymorphism (SNP) analyses were compared to demonstrate the maximum discriminatory power obtained by using multiple analyses, which were needed to differentiate outbreak-associated isolates from a PFGE-indistinguishable isolate collected in a non-implicated food source in 2012. This unrelated isolate differed from the outbreak isolates by only 7 to 14 SNPs, and as a result, minimum spanning tree by the whole genome analyses and certain variant calling approach and phylogenetic algorithm for core genome-based analyses could not provide the differentiation between unrelated isolates. Our data also suggest that SNP/allele counts should always be combined with WGS clustering generated by phylogenetically meaningful algorithms on sufficient number of isolates, and SNP/allele threshold alone is not sufficient evidence to delineate an outbreak. The putative prophages were conserved across all the outbreak isolates. All outbreak isolates belonged to clonal complex 5 and serotype 1/2b, had an identical inlA sequence, which did not have premature stop codons.IMPORTANCE In this outbreak, multiple analytical approaches were used for maximum discriminatory power. A PFGE-matched, epidemiologically unrelated isolate had high genetic similarity to the outbreak-associated isolates, with as few as only 7 SNP differences. Therefore, the SNP/allele threshold should not be used as the only evidence to define the scope of an outbreak. It is critical that the SNP/allele counts be complemented by WGS clustering generated by phylogenetically meaningful algorithms to distinguish outbreak-associated isolates from epidemiologically unrelated isolates. Careful selection of a variant calling approach and phylogenetic algorithm is critical for core genome-based analyses. The whole genome-based analyses were able to construct the highly resolved phylogeny needed to support the findings of the outbreak investigation. Ultimately, epidemiologic evidence and multiple WGS analyses should be combined to increase the confidence in outbreak investigations. Copyright © 2017 Chen et al.

July 7, 2019 |

Genome sequencing: Illuminating the sunflower genome.

A high-quality sunflower genome provides insight into Asterid genome evolution. Moreover, integrative analyses based on quantitative genetics, expression and diversity data uncover the gene networks and candidate genes for oil metabolism and flowering time, two important agronomic traits for sunflowers.

July 7, 2019 |

Maize defective kernel mutant generated by insertion of a Ds element in a gene encoding a highly conserved TTI2 cochaperone.

We have used the newly engineered transposable element Dsg to tag a gene that gives rise to a defective kernel (dek) phenotype. Dsg requires the autonomous element Ac for transposition. Upon excision, it leaves a short DNA footprint that can create in-frame and frameshift insertions in coding sequences. Therefore, we could create alleles of the tagged gene that confirmed causation of the dek phenotype by the Dsg insertion. The mutation, designated dek38-Dsg, is embryonic lethal, has a defective basal endosperm transfer (BETL) layer, and results in a smaller seed with highly underdeveloped endosperm. The maize dek38 gene encodes a TTI2 (Tel2-interacting protein 2) molecular cochaperone. In yeast and mammals, TTI2 associates with two other cochaperones, TEL2 (Telomere maintenance 2) and TTI1 (Tel2-interacting protein 1), to form the triple T complex that regulates DNA damage response. Therefore, we cloned the maize Tel2 and Tti1 homologs and showed that TEL2 can interact with both TTI1 and TTI2 in yeast two-hybrid assays. The three proteins regulate the cellular levels of phosphatidylinositol 3-kinase-related kinases (PIKKs) and localize to the cytoplasm and the nucleus, consistent with known subcellular locations of PIKKs. dek38-Dsg displays reduced pollen transmission, indicating TTI2’s importance in male reproductive cell development.

July 7, 2019 |

Repeated divergent selection on pigmentation genes in a rapid finch radiation.

Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits.

July 7, 2019 |

ALUMINUM RESISTANCE TRANSCRIPTION FACTOR 1 (ART1) contributes to natural variation in aluminum resistance in diverse genetic backgrounds of rice (O. sativa)

Abstract Transcription factors (TFs) regulate the expression of other genes to indirectly mediate stress resistance mechanisms. Therefore, when studying TF-mediated stress resistance, it is important to understand how TFs interact with genes in the genetic background. Here, we fine-mapped the aluminum (Al) resistance QTL Alt12.1 to a 44-kb region containing six genes. Among them is ART1, which encodes a C2H2-type zinc finger TF required for Al resistance in rice. The mapping parents, Al-resistant cv Azucena (tropical japonica) and Al-sensitive cv IR64 (indica), have extensive sequence polymorphism within the ART1 coding region, but similar ART1 expression levels. Using reciprocal near-isogenic lines (NILs) we examined how allele-swapping the Alt12.1 locus would affect plant responses to Al. Analysis of global transcriptional responses to Al stress in roots of the NILs alongside their recurrent parents demonstrated that the presence of the Alt12.1 from Al-resistant Azucena led to greater changes in gene expression in response to Al when compared to the Alt12.1 from IR64 in both genetic backgrounds. The presence of the ART1 allele from the opposite parent affected the expression of several genes not previously implicated in rice Al tolerance. We highlight examples where putatively functional variation in cis-regulatory regions of ART1-regulated genes interacts with ART1 to determine gene expression in response to Al. This ART1–promoter interaction may be associated with transgressive variation for Al resistance in the Azucena × IR64 population. These results illustrate how ART1 interacts with the genetic background to contribute to quantitative phenotypic variation in rice Al resistance.

July 7, 2019 |

Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula.

Third generation sequencing technologies, with sequencing reads in the tens- of kilo-bases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner.Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly.Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.

Asset Tag: reference genome

The characterization of goat genetic diversity: Towards a genomic approach

The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

In transition: primate genomics at a time of rapid change.

Combining de novo and reference-guided assembly with scaffold_builder.

Complete genome sequence of Bacillus subtilis strain PY79.

Identification of symmetrical RNA editing events in the mitochondria of Salvia miltiorrhiza by strand-specific RNA sequencing.

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Simultaneous emergence of multidrug-resistant Candida auris on 3 continents confirmed by whole-genome sequencing and epidemiological analyses.

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Whole genome and core genome multilocus sequence typing and single nucleotide polymorphism analyses of Listeria monocytogenes associated with an outbreak linked to cheese, United States, 2013.

Genome sequencing: Illuminating the sunflower genome.

Maize defective kernel mutant generated by insertion of a Ds element in a gene encoding a highly conserved TTI2 cochaperone.

Repeated divergent selection on pigmentation genes in a rapid finch radiation.

ALUMINUM RESISTANCE TRANSCRIPTION FACTOR 1 (ART1) contributes to natural variation in aluminum resistance in diverse genetic backgrounds of rice (O. sativa)

Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula.

Subscribe for blog updates:

Filter by topic

Talk with an expert

ALS case study

Subscribe for blog updates:

Filter by topic

Talk with an expert