July 19, 2019

From short reads to chromosome-scale genome assemblies.

A high-quality, annotated genome assembly is the foundation for many downstream studies. However, obtaining such an assembly is a complex, reiterative process that requires the assimilation of high-quality data and combines different approaches and data types. While some software packages incorporating multiple steps of genome assembly are commercially available, they may not be flexible enough to be routinely applied to all organisms, particularly to nonmodel species such as pathogenic oomycetes and fungi. If researchers understand and apply the most appropriate, currently available tools for each step, it is possible to customize parameters and optimize results for their organism of study. Based on our experience of de novo assembly and annotation of several oomycete species, this chapter provides a modular workflow from processing of raw reads, to initial assembly generation, through optimization, chromosome-scale scaffolding and annotation, outlining input and output data as well as examples and alternative software used for each step. The accompanying Notes provide background information for each step as well as alternative options. The final result of this workflow could be an annotated, high-quality, validated, chromosome-scale assembly or a draft assembly of sufficient quality to meet specific needs of a project.


July 19, 2019

De novo assembly of haplotype-resolved genomes with trio binning.

Complex allelic variation hampers the assembly of haplotype-resolved sequences from diploid genomes. We developed trio binning, an approach that simplifies haplotype assembly by resolving allelic variation before assembly. In contrast with prior approaches, the effectiveness of our method improved with increasing heterozygosity. Trio binning uses short reads from two parental genomes to first partition long reads from an offspring into haplotype-specific sets. Each haplotype is then assembled independently, resulting in a complete diploid reconstruction. We used trio binning to recover both haplotypes of a diploid human genome and identified complex structural variants missed by alternative approaches. We sequenced an F1 cross between the cattle subspecies Bos taurus taurus and Bos taurus indicus and completely assembled both parental haplotypes with NG50 haplotig sizes of >20 Mb and 99.998% accuracy, surpassing the quality of current cattle reference genomes. We suggest that trio binning improves diploid genome assembly and will facilitate new studies of haplotype variation and inheritance.
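The read-partitioning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tiny k-mer size, and the simple majority rule are assumptions chosen for illustration. A real pipeline would also handle reverse complements, sequencing errors, and normalization by the number of haplotype-specific k-mers.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign an offspring long read to a haplotype bin (hypothetical majority rule)."""
    read_kmers = kmers(read, k)
    mat_hits = len(read_kmers & mat_only)
    pat_hits = len(read_kmers & pat_only)
    if mat_hits > pat_hits:
        return "maternal"
    if pat_hits > mat_hits:
        return "paternal"
    return "unassigned"  # no parent-specific evidence, or a tie


# Toy data: the two parents differ at a single base.
maternal_reads = ["ACGTACGTAA"]
paternal_reads = ["ACGTTCGTAA"]
k = 4  # unrealistically small, for illustration only

mat_only, pat_only = parent_specific_kmers(maternal_reads, paternal_reads, k)
bins = {read: bin_read(read, mat_only, pat_only, k)
        for read in ["GTACGTA", "GTTCGTA", "ACGT"]}
print(bins)
```

Each bin would then be passed to an assembler separately, which is what allows the two haplotypes to be reconstructed independently.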


July 19, 2019

Allele-defined genome of the autopolyploid sugarcane Saccharum spontaneum L.

Modern sugarcanes are polyploid interspecific hybrids, combining high sugar content from Saccharum officinarum with hardiness, disease resistance and ratooning of Saccharum spontaneum. Sequencing of a haploid S. spontaneum, AP85-441, facilitated the assembly of 32 pseudo-chromosomes comprising 8 homologous groups of 4 members each, bearing 35,525 genes with alleles defined. The reduction of basic chromosome number from 10 to 8 in S. spontaneum was caused by fissions of 2 ancestral chromosomes followed by translocations to 4 chromosomes. Surprisingly, 80% of nucleotide binding site-encoding genes associated with disease resistance are located in 4 rearranged chromosomes and 51% of those in rearranged regions. Resequencing of 64 S. spontaneum genomes identified balancing selection in rearranged regions, maintaining their diversity. Introgressed S. spontaneum chromosomes in modern sugarcanes are randomly distributed in the AP85-441 genome, indicating random recombination among homologs in different S. spontaneum accessions. The allele-defined Saccharum genome offers new knowledge and resources to accelerate sugarcane improvement.


July 19, 2019

Prediction of smoking by multiplex bisulfite PCR with long amplicons considering allele-specific effects on DNA methylation.

Methylation of DNA is associated with a variety of biological processes. With whole-genome studies of DNA methylation, it became possible to determine a set of genomic sites where DNA methylation is associated with a specific phenotype. A method is needed that allows detailed follow-up studies of the sites, including taking into account genetic information. Bisulfite PCR is a natural choice for this kind of task, but multiplexing is one of the most important problems impeding its implementation. To address this task, we took advantage of a recently published method based on PacBio sequencing of long bisulfite PCR products (single-molecule real-time bisulfite sequencing, SMRT-BS) and tested the validity of the improved methodology with a smoking phenotype. Herein, we describe the “panhandle” modification of the method, which permits a more robust PCR with multiple targets. We applied this technique to determine smoking by DNA methylation in 71 healthy people and 83 schizophrenia patients (n = 50 smokers and n = 104 non-smokers, Russians of the Moscow region). We used five targets known to be influenced by smoking (regions of genes AHRR, ALPPL2, IER3, GNG12, and GFI1). We discovered significant allele-specific methylation effects in the AHRR and IER3 regions and assessed how this information could be exploited to improve the prediction of smoking based on the collected DNA methylation data. We found no significant difference in the methylation profiles of selected targets in relation to schizophrenia, suggesting that smoking affects methylation at the studied genomic sites in healthy people and schizophrenia patients in a similar way. We determined that SMRT-BS with the “panhandle” modification performs well in the described setting. Additional information regarding methylation and allele-specific effects could improve the predictive accuracy of DNA methylation-based models, which could be valuable for both basic research and clinical applications.


July 19, 2019

Improved reference genome of Aedes aegypti informs arbovirus vector control.

Female Aedes aegypti mosquitoes infect more than 400 million people each year with dangerous viral pathogens including dengue, yellow fever, Zika and chikungunya. Progress in understanding the biology of mosquitoes and developing the tools to fight them has been slowed by the lack of a high-quality genome assembly. Here we combine diverse technologies to produce the markedly improved, fully re-annotated AaegL5 genome assembly, and demonstrate how it accelerates mosquito science. We anchored physical and cytogenetic maps, doubled the number of known chemosensory ionotropic receptors that guide mosquitoes to human hosts and egg-laying sites, provided further insight into the size and composition of the sex-determining M locus, and revealed copy-number variation among glutathione S-transferase genes that are important for insecticide resistance. Using high-resolution quantitative trait locus and population genomic analyses, we mapped new candidates for dengue vector competence and insecticide resistance. AaegL5 will catalyse new biological insights and intervention strategies to fight this deadly disease vector.


July 19, 2019

Mapping the landscape of tandem repeat variability by targeted long read single molecule sequencing in familial X-linked intellectual disability.

The etiology of more than half of all patients with X-linked intellectual disability remains elusive, despite array-based comparative genomic hybridization, whole exome or genome sequencing. Since short read massive parallel sequencing approaches do not allow the detection of larger tandem repeat expansions, we hypothesized that such expansions could be a hidden cause of X-linked intellectual disability. We selectively captured over 1800 tandem repeats on the X chromosome and characterized them by long read single molecule sequencing in 3 families with idiopathic X-linked intellectual disability. In male DNA samples, full tandem repeat length sequences were obtained for 88-93% of the targets and up to 99.6% of the repeats with a moderate guanine-cytosine content. Read length and analysis pipeline allow the detection of tandem repeat expansions of >900 bp. In one family, one repeat expansion co-occurs with down-regulation of the neighboring MIR222 gene. This gene has previously been implicated in intellectual disability and is apparently linked to FMR1 and NEFH overexpression associated with neurological disorders. This study demonstrates the power of single molecule sequencing to measure tandem repeat lengths and detect expansions, and suggests that tandem repeat mutations may be a hidden cause of X-linked intellectual disability.


July 19, 2019

The Dominant and Poorly Penetrant Phenotypes of Maize Unstable factor for orange1 Are Caused by DNA Methylation Changes at a Linked Transposon.

The maize (Zea mays) mutant Unstable factor for orange1 (Ufo1) has been implicated in the epigenetic modifications of pericarp color1 (p1), which regulates the production of the flavonoid pigments phlobaphenes. Here, we show that the ufo1 gene maps to a genetically recalcitrant region near the centromere of chromosome 10. Transcriptome analysis of Ufo1-1 mutant and wild-type plants identified a candidate gene in the mapping region using a comparative sequence-based approach. The candidate gene, GRMZM2G053177, is overexpressed by >45-fold in multiple tissues of Ufo1-1, explaining the dominance of Ufo1-1 and its phenotypes. In the mutant stock, GRMZM2G053177 has a unique transcript originating within a CACTA transposon inserted in its first intron, and it is missing the first four codons of the wild-type transcript. GRMZM2G053177 expression is regulated by the DNA methylation status of the CACTA transposon, explaining the incomplete penetrance and poor expressivity of Ufo1-1. Transgenic overexpression lines of GRMZM2G053177 (Ufo1-1) phenocopy the p1-induced pigmentation in coleoptiles, tassels, leaf sheaths, husks, pericarps, and cob glumes. Transcriptome analysis of Ufo1 versus wild-type tissues revealed changes in several pathways related to abiotic and biotic stress. Thus, this study addresses the enigma of Ufo1 identity in maize, which had gone unsolved for more than 50 years. © 2018 American Society of Plant Biologists. All rights reserved.


July 7, 2019

Complete annotated genome sequence of Mycobacterium tuberculosis (Zopf) Lehmann and Neumann (ATCC35812) (Kurono).

We report the completely annotated genome sequence of Mycobacterium tuberculosis (Zopf) Lehmann and Neumann (ATCC35812) (Kurono), which is used for virulence and/or immunization studies. The complete genome sequence of M. tuberculosis Kurono was determined with a length of 4,415,078 bp and a G+C content of 65.60%. The chromosome was shown to contain a total of 4,340 protein-coding genes, 53 tRNA genes, one transfer messenger RNA for all amino acids, and 1 rrn operon. Lineage analysis based on large sequence polymorphisms indicated that M. tuberculosis Kurono belongs to the Euro-American lineage (lineage 4). Phylogenetic analysis using whole genome sequences of M. tuberculosis Kurono in addition to 22 M. tuberculosis complex strains indicated that H37Rv is the closest relative of Kurono. These findings provide a basis for research using M. tuberculosis Kurono, especially in animal models. Copyright © 2014 Elsevier Ltd. All rights reserved.


July 7, 2019

Genome sequence of Serratia nematodiphila DSM 21420T, a symbiotic bacterium from entomopathogenic nematode.

Serratia nematodiphila DSM 21420(T) (=CGMCC 1.6853(T), DZ0503SBS1(T)), isolated from the intestine of Heterorhabditidoides chongmingensis, is known to have a symbiotic-pathogenic life cycle involving multilateral relationships with an entomopathogenic nematode and insect pests. To better understand this rare feature in Serratia species, we present here the genome sequence of S. nematodiphila DSM 21420(T), the first genome sequence for this species. Copyright © 2014 Elsevier B.V. All rights reserved.


July 7, 2019

Construction of a reference genetic map of Raphanus sativus based on genotyping by whole-genome resequencing.

This manuscript provides a genetic map of Raphanus sativus that has been used as a reference genetic map for an ongoing genome sequencing project. The map was constructed based on genotyping by whole-genome resequencing of mapping parents and an F2 population. Raphanus sativus is an annual vegetable crop species of the Brassicaceae family and is one of the key plants in the seed industry, especially in East Asia. Assessment of the R. sativus genome provides fundamental resources for crop improvement as well as the study of crop genome structure and evolution. With the goal of anchoring genome sequence assemblies of R. sativus cv. WK10039, whose genome has been sequenced, onto the chromosomes, we developed a reference genetic map based on genotyping of two parents (maternal WK10039 and paternal WK10024) and 93 individuals of the F2 mapping population by whole-genome resequencing. To develop high-confidence genetic markers, ~83 Gb of parental lines and ~591 Gb of mapping population data were generated as Illumina 100 bp paired-end reads. High-stringency sequence analysis of the reads mapped to the 344 Mb of genome sequence scaffolds identified a total of 16,282 SNPs and 150 PCR-based markers. Using a subset of the markers, a high-density genetic map was constructed from the analysis of 2,637 markers spanning 1,538 cM with 1,000 unique framework loci. The genetic markers integrated 295 Mb of genome sequences to the cytogenetically defined chromosome arms. Comparative analysis of the chromosome-anchored sequences with Arabidopsis thaliana and Brassica rapa revealed that the R. sativus genome has evident triplicated sub-genome blocks and the structure of gene space is highly similar to that of B. rapa. The genetic map developed in this study will serve as a fundamental genomic resource for the study of R. sativus.


July 7, 2019

Molecular characterization of plasmid pMoma1 of Moraxella macacae, a newly described bacterial pathogen of macaques.

We report the complete nucleotide sequence and characterization of a small cryptic plasmid of Moraxella macacae 0408225, a newly described bacterial species within the family Moraxellaceae and a causative agent of epistaxis in macaques. The complete nucleotide sequence of the plasmid pMoma1 was determined and found to be 5,375 bp in size with a GC content of 37.4%. Computer analysis of the sequence data revealed five open reading frames encoding putative proteins of 54.4 kDa (ORF1), 17.6 kDa (ORF2), 13.3 kDa (ORF3), 51.6 kDa (ORF4), and 25.0 kDa (ORF5). ORF1, ORF2, and ORF3 encode putative proteins with high identity (72, 42, and 55%, respectively) to mobilization proteins of plasmids found in other Moraxella species. ORF3 encodes a putative protein with similarity (about 40%) to several plasmid replicase (RepA) proteins. The fifth open reading frame (ORF) was most similar to hypothetical proteins with unknown functions, although domain analysis of this sequence suggests it belongs to the Abi-like protein family. Upstream of the repA gene, a 470-bp intergenic region was identified that contained an AT-rich section and two sets of tandem direct and indirect repeats, consistent with a putative origin of replication site. In contrast to other plasmids of Moraxella, the occurrence of pMoma1 in M. macacae isolates appears to be common, as PCR testing showed that all 14 clinical isolates from two different research institutions contained the plasmid.


July 7, 2019

Prevalence of subtilase cytotoxin-encoding subAB variants among Shiga toxin-producing Escherichia coli strains isolated from wild ruminants and sheep differs from that of cattle and pigs and is predominated by the new allelic variant subAB2-2.

Subtilase cytotoxin (SubAB) is an AB5 toxin produced by Shiga toxin (Stx)-producing Escherichia coli (STEC) strains usually lacking the eae gene product intimin. Three allelic variants of SubAB-encoding genes have been described: subAB1, located on a plasmid; subAB2-1, located on the pathogenicity island SE-PAI; and subAB2-2, located in an outer membrane efflux protein (OEP) region. SubAB is becoming increasingly recognized as a toxin potentially involved in human pathogenesis. Ruminants and cattle have been identified as reservoirs of subAB-positive STEC. The presence of the three subAB allelic variants was investigated by PCR for 152 STEC strains originating from chamois, ibex, red deer, roe deer, cattle, sheep and pigs. Overall, subAB genes were detected in 45.5% of the strains. Prevalence was highest for STEC originating from ibex (100%), chamois (92%) and sheep (65%). None of the STEC of bovine or of porcine origin tested positive for subAB. None of the strains tested positive for subAB1. The allelic variant subAB2-2 was detected most commonly, with 51.4% possessing subAB2-1 together with subAB2-2. STEC of ovine origin, serotypes O91:H- and O128:H2, the saa gene, which encodes the autoagglutinating adhesin, and stx2b were significantly associated with subAB-positive STEC. Our results suggest that subAB2-1 and subAB2-2 are widespread among STEC from wild ruminants and sheep and may be important as virulence markers in STEC pathogenic to humans. Copyright © 2014 Elsevier GmbH. All rights reserved.


July 7, 2019

Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles.

Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity. © 2015 Nandi et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Late Pleistocene Australian marsupial DNA clarifies the affinities of extinct megafaunal kangaroos and wallabies.

Understanding the evolution of Australia’s extinct marsupial megafauna has been hindered by a relatively incomplete fossil record and convergent or highly specialized morphology, which confound phylogenetic analyses. Further, the harsh Australian climate and early date of most megafaunal extinctions (39-52 ka) mean that the vast majority of fossil remains are unsuitable for ancient DNA analyses. Here, we apply cross-species DNA capture to fossils from relatively high latitude, high altitude caves in Tasmania. Using low-stringency hybridization and high-throughput sequencing, we were able to retrieve mitochondrial sequences from two extinct megafaunal macropodid species. The two specimens, Simosthenurus occidentalis (giant short-faced kangaroo) and Protemnodon anak (giant wallaby), have been radiocarbon dated to 46-50 and 40-45 ka, respectively. This is significantly older than any Australian fossil that has previously yielded DNA sequence information. Processing the raw sequence data from these samples posed a bioinformatic challenge due to the poor preservation of DNA. We explored several approaches in order to maximize the signal-to-noise ratio in retained sequencing reads. Our findings demonstrate the critical importance of adopting stringent processing criteria when distant outgroups are used as references for mapping highly fragmented DNA. Based on the most stringent nucleotide data sets (879 bp for S. occidentalis and 2,383 bp for P. anak), total-evidence phylogenetic analyses confirm that macropodids consist of three primary lineages: sthenurines such as Simosthenurus (extinct short-faced kangaroos), the macropodines (all other wallabies and kangaroos), and the enigmatic living banded hare-wallaby Lagostrophus fasciatus (Lagostrophinae). Protemnodon emerges as a close relative of Macropus (large living kangaroos), a position not supported by recent morphological phylogenetic analyses. © The Authors 2014. Published by Oxford University Press on behalf of Molecular Biology and Evolution. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

