July 7, 2019

Best practices in insect genome sequencing: What works and what doesn’t.

The last decade of decreasing DNA sequencing costs and proliferating sequencing services in core labs and companies has brought de novo genome sequencing and assembly of insect species within reach for many entomologists. However, sequence production alone is not enough to generate a high-quality reference genome, and in many cases poor planning can lead to extremely fragmented genome assemblies that prevent high-quality gene annotation and other desired analyses. Insect genomes can be problematic to assemble due to combinations of high polymorphism, the inability to breed for genome homozygosity, and small body sizes that limit the quantity of DNA that can be isolated from a single individual. Recent advances in sequencing technology and assembly strategies are enabling a revolution in insect reference genome sequencing and assembly. Here we review historical and new genome sequencing and assembly strategies, with a particular focus on their application to arthropod genomes. We highlight both the need to design sequencing strategies for the requirements of the assembly software and the new long-read technologies that are enabling a return to traditional assembly approaches. Finally, we compare and contrast very cost-effective short-read draft genome strategies with long-read approaches that, although entailing additional cost, bring a higher likelihood of success and the possibility of archival assembly quality approaching that of finished genomes.
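Designing a sequencing strategy around the assembler's requirements usually starts with coverage arithmetic: mean depth is total sequenced bases divided by genome size. The sketch below illustrates that calculation only; it is not a method from the review. The function names are hypothetical, and the 500 Mb genome size, 15 kb mean read length, and 60x target in the example are illustrative assumptions, not recommendations.

```python
import math


def mean_coverage(read_count: int, mean_read_length: float, genome_size_bp: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return read_count * mean_read_length / genome_size_bp


def reads_for_coverage(target_coverage: float, mean_read_length: float, genome_size_bp: float) -> int:
    """Number of reads needed to reach a target mean depth (rounded up)."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length)


def fraction_covered(coverage: float) -> float:
    """Idealized Lander-Waterman estimate of the fraction of bases covered at
    least once, assuming reads start uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    genome = 500e6  # hypothetical 500 Mb insect genome
    n = reads_for_coverage(60, 15_000, genome)  # hypothetical 15 kb reads, 60x target
    print(n, mean_coverage(n, 15_000, genome))  # prints: 2000000 60.0
```

Real data are less uniform than the Poisson model assumes, and in a highly heterozygous sample divergent haplotypes can split the available depth between them, so figures like these are best read as a lower bound.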


It’s more than stamp collecting: how genome sequencing can unify biological research.

The availability of reference genome sequences, especially the human reference, has revolutionized the study of biology. However, while the genomes of some species have been fully sequenced, a wide range of biological problems still cannot be effectively studied for lack of genome sequence information. Here, I identify neglected areas of biology and describe how both targeted species sequencing and broad taxonomic surveys of the tree of life can address important biological questions. I enumerate the significant benefits that would accrue from sequencing a broader range of taxa and discuss the technical advances in sequencing and assembly methods that would allow for wide-ranging application of whole-genome analysis. Finally, I suggest that in addition to ‘big science’ survey initiatives to sequence the tree of life, a modified infrastructure-funding paradigm would better support reference genome sequence generation for the research communities most in need. Copyright © 2015 Elsevier Ltd. All rights reserved.


Sequence type 1 group B Streptococcus, an emerging cause of invasive disease in adults, evolves by small genetic changes.

The molecular mechanisms underlying pathogen emergence in humans are a critical but poorly understood area of microbiologic investigation. Serotype V group B Streptococcus (GBS) was first isolated from humans in 1975, and rates of invasive serotype V GBS disease increased significantly starting in the early 1990s. We found that 210 of 229 serotype V GBS strains (92%) isolated from the bloodstream of nonpregnant adults in the United States and Canada between 1992 and 2013 were multilocus sequence type (ST) 1. Elucidation of the complete genome of a 1992 ST-1 strain revealed that this strain had the highest homology with a GBS strain causing cow mastitis and that it differed from serotype V strains isolated in the late 1970s by the acquisition of cell surface proteins and antimicrobial resistance determinants. Whole-genome comparison of 202 invasive ST-1 strains detected significant recombination in only eight strains. The remaining 194 strains differed by an average of 97 SNPs. Phylogenetic analysis revealed a temporally dependent mode of genetic diversification consistent with the emergence of ST-1 GBS as major agents of human disease in the 1990s. Thirty-one loci were identified as being under positive selective pressure, and mutations at loci encoding polysaccharide capsule production proteins, regulators of pilus expression, and two-component gene regulatory systems were shown to affect the bacterial phenotype. These data reveal that phenotypic diversity among ST-1 GBS is driven mainly by small genetic changes rather than extensive recombination, thereby extending knowledge of how pathogens adapt to humans.
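The abstract reports that the 194 non-recombinant strains differed by an average of 97 SNPs but does not say how that average was computed. One common way to summarize such a figure is the mean pairwise SNP distance over a whole-genome alignment. The sketch below shows that calculation under two assumptions: the sequences are already aligned to equal length, and ambiguous bases or gaps (anything other than A, C, G, T) are skipped. The function names and the toy sequences are hypothetical and are not data from the study.

```python
from itertools import combinations

VALID_BASES = frozenset("ACGT")


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in VALID_BASES and y in VALID_BASES and x != y
    )


def mean_pairwise_snp_distance(sequences: list[str]) -> float:
    """Average SNP distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(snp_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_alignment = ["ACGTA", "ACGTT", "TCGTT"]  # illustrative only
    print(mean_pairwise_snp_distance(toy_alignment))
```

For the toy alignment the three pairwise distances are 1, 2, and 1, so the mean is 4/3. Skipping ambiguous positions rather than counting them as differences is a design choice that avoids inflating distances where coverage is poor.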


Twenty years of bacterial genome sequencing.

Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a ‘sequencing singularity’, where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.


The Glanville fritillary genome retains an ancient karyotype and reveals selective chromosomal fusions in Lepidoptera.

Previous studies have reported that chromosome synteny in Lepidoptera has been well conserved, yet the haploid chromosome number varies widely, from 5 to 223. Here we report the genome (393 Mb) of the Glanville fritillary butterfly (Melitaea cinxia; Nymphalidae), a widely recognized model species in metapopulation biology and eco-evolutionary research, which has the putative ancestral karyotype of n=31. Using phylogenetic analyses of Nymphalidae and other Lepidoptera, combined with orthologue-level comparisons of chromosomes, we conclude that the ancestral lepidopteran karyotype has been n=31 for at least 140 My. We show that fusion chromosomes have retained the ancestral chromosome segments and that very few rearrangements have occurred across the fusion sites. The same shortest ancestral chromosomes have independently participated in fusion events in species with smaller karyotypes. The short chromosomes have a higher rearrangement rate than the long ones. These characteristics highlight distinctive features of the evolutionary dynamics of butterflies and moths.


Complete genome sequences of eight Helicobacter pylori strains with different virulence factor genotypes and methylation profiles, isolated from patients with diverse gastrointestinal diseases on Okinawa Island, Japan, determined using PacBio Single-Molecule Real-Time Technology.

We report the complete genome sequences of eight Helicobacter pylori strains isolated from patients with gastrointestinal diseases in Okinawa, Japan. Whole-genome sequencing and DNA methylation detection were performed using the PacBio platform. De novo assembly determined a single, complete contig for each strain. Furthermore, methylation analysis identified virulence factor genotype-dependent motifs.


Replication of the Escherichia coli chromosome in RNase HI-deficient cells: multiple initiation regions and fork dynamics.

DNA replication in Escherichia coli is normally initiated at a single origin, oriC, dependent on the initiation protein DnaA. However, replication can be initiated elsewhere on the chromosome at multiple ectopic oriK sites. Genetic evidence indicates that initiation from oriK depends on RNA-DNA hybrids (R-loops), which are normally removed by enzymes such as RNase HI to prevent oriK from misfiring during normal growth. Initiation from oriK sites occurs in RNase HI-deficient mutants, and possibly in wild-type cells under certain unusual conditions. Despite previous work, the locations of oriK and their impact on genome stability remain unclear. We combined 2D gel electrophoresis and whole-genome approaches to map genome-wide oriK locations. The DNA copy number profiles of various RNase HI-deficient strains contained multiple peaks, often in consistent locations, identifying candidate oriK sites. Removal of RNase HI protein also leads to global alterations of replication fork migration patterns, with forks often moving opposite to the normal replication direction, and presumably to eukaryote-like replication fork merging. Our results have implications for genome stability, offering a new understanding of how RNase HI deficiency results in R-loop-mediated transcription-replication conflict, as well as inappropriate replication stalling or blockage at Ter sites outside of the terminus trap region and at ribosomal operons. © 2013 John Wiley & Sons Ltd.
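Candidate origins appear as local maxima in a copy number (read-depth) profile because, in a growing population, regions near an active origin are replicated earlier and are therefore over-represented. The sketch below illustrates that idea on binned depth values along a circular chromosome. The smoothing window, the 1.2-fold threshold over the median, the function names, and the toy depth values are all hypothetical choices, not the study's actual procedure.

```python
import statistics


def circular_moving_average(values: list[float], half_window: int) -> list[float]:
    """Smooth binned depth with a centered window that wraps around,
    since the bacterial chromosome is circular."""
    n = len(values)
    width = 2 * half_window + 1
    return [
        sum(values[(i + k) % n] for k in range(-half_window, half_window + 1)) / width
        for i in range(n)
    ]


def candidate_peaks(depth: list[float], half_window: int = 1, min_ratio: float = 1.2) -> list[int]:
    """Return bin indices that are local maxima of the smoothed profile and
    at least `min_ratio` times the median smoothed depth."""
    smoothed = circular_moving_average(depth, half_window)
    baseline = statistics.median(smoothed)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    n = len(smoothed)
    peaks = []
    for i, value in enumerate(smoothed):
        left, right = smoothed[(i - 1) % n], smoothed[(i + 1) % n]
        # Strict on the left, non-strict on the right, so a flat top is reported once
        if value > left and value >= right and value / baseline >= min_ratio:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    toy_depth = [10, 10, 12, 20, 12, 10, 10, 10, 10, 10]  # illustrative bins only
    print(candidate_peaks(toy_depth))  # prints: [3]
```

Normalizing to the median rather than the mean keeps a few strong peaks from raising the baseline they are compared against.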


In transition: primate genomics at a time of rapid change.

The field of nonhuman primate genomics is undergoing rapid change and making impressive progress. Exploiting new technologies for DNA sequencing, researchers have generated new whole-genome sequence assemblies for multiple primate species over the past 6 years. In addition, investigations of within-species genetic variation, gene expression and RNA sequences, conservation of non-protein-coding regions of the genome, and other aspects of comparative genomics are moving at an accelerating speed. This progress is opening a wide array of new research opportunities in the analysis of comparative primate genome content and evolution. It also creates new possibilities for the use of nonhuman primates as model organisms in biomedical research. This transition, based on both new technology and the new information being generated in regard to human genetics, provides an important justification for reevaluating the research goals, strategies, and study designs used in primate genetics and genomics.


Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.

The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how best to assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.
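The abstract does not name its ten key measures. As an illustration of the kind of contiguity statistic used in assembly comparisons, the sketch below computes N50 (the length L such that contigs or scaffolds of length at least L contain at least half of the assembled bases) and NG50, which uses an estimated genome size instead of the assembly's total length so that assemblies of different sizes are measured against the same target. The function name `nx` and its parameters are hypothetical, and the contig lengths are toy values.

```python
from typing import Optional


def nx(lengths: list[int], fraction: float = 0.5, genome_size: Optional[int] = None) -> Optional[int]:
    """N50-style contiguity statistic.

    Without `genome_size` this is Nx (e.g. N50) relative to the assembly's
    total length; with `genome_size` it is NGx (e.g. NG50). Returns None if
    the sequences never reach the target, which can happen for NGx.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(lengths) if genome_size is None else genome_size
    target = fraction * total
    running = 0
    # Walk from the longest sequence down, accumulating assembled bases
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


if __name__ == "__main__":
    contigs = [100, 80, 50, 40, 30]  # toy contig lengths
    print(nx(contigs), nx(contigs, genome_size=500))  # prints: 80 40
```

The example shows why the genome-size variant matters: the same contigs give an N50 of 80 but an NG50 of only 40 once they are measured against a larger estimated genome.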


The value of new genome references.

Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has declined dramatically, and our ability to interpret those data has increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high-quality whole-genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft-quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest-impact contributions that whole-genome sequencing and assembly have made to comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.


Efficient CNV breakpoint analysis reveals unexpected structural complexity and correlation of dosage-sensitive genes with clinical severity in genomic disorders.

Genomic disorders are clinical conditions resulting from submicroscopic genomic rearrangements, including copy number variants (CNVs). CNVs can be identified by array-based comparative genomic hybridization (aCGH), the most commonly used technology for molecular diagnostics of genomic disorders. However, clinical aCGH detects CNVs only in the probe-interrogated regions and provides neither orientation information nor the resulting genomic rearrangement structure, both of which are key to uncovering the mutational and pathogenic mechanisms underlying genomic disorders. Long-range polymerase chain reaction (PCR) is a traditional approach to obtaining CNV breakpoint junctions, but this method is inefficient when challenged by structural complexity such as that often found at the PLP1 locus in association with Pelizaeus-Merzbacher disease (PMD). Here we introduced ‘capture and single-molecule real-time sequencing’ (cap-SMRT-seq) and a newly developed ‘asymmetry linker-mediated nested PCR walking’ (ALN-walking) for CNV breakpoint sequencing in 49 subjects with PMD-associated CNVs. Remarkably, 29 (94%) of the 31 CNV breakpoint junctions unobtainable by conventional long-range PCR were resolved by cap-SMRT-seq and ALN-walking. Notably, efficient breakpoint sequencing revealed unexpected CNV complexities, including inter-chromosomal rearrangements that cannot be resolved by aCGH. These sequence-based structures of PMD-associated CNVs further support the role of DNA replicative mechanisms in CNV mutagenesis and facilitate genotype-phenotype correlation studies. Intriguingly, the lengths of segments gained through CNVs are strongly correlated with clinical severity in PMD, potentially reflecting the functional contribution of dosage-sensitive genes other than PLP1.
Our study provides new, efficient experimental approaches (especially ALN-walking) for CNV breakpoint sequencing and highlights their importance in uncovering CNV mutagenesis and pathogenesis in genomic disorders. © The Author 2017. Published by Oxford University Press. All rights reserved.


IgA-coated E. coli enriched in Crohn’s disease spondyloarthritis promote TH17-dependent inflammation.

Peripheral spondyloarthritis (SpA) is a common extraintestinal manifestation in patients with active inflammatory bowel disease (IBD), characterized by inflammatory enthesitis, dactylitis, or synovitis of nonaxial joints. However, a mechanistic understanding of the link between intestinal inflammation and SpA has yet to emerge. We evaluated and functionally characterized the fecal microbiome of IBD patients with or without peripheral SpA. Coupling the sorting of immunoglobulin A (IgA)-coated microbiota with 16S ribosomal RNA-based analysis (IgA-seq) revealed a selective enrichment in IgA-coated Escherichia coli in patients with Crohn’s disease-associated SpA (CD-SpA) compared to CD alone. E. coli isolates from CD-SpA-derived IgA-coated bacteria were similar in genotype and phenotype to an adherent-invasive E. coli (AIEC) pathotype. In comparison to non-AIEC E. coli, colonization of germ-free mice with CD-SpA E. coli isolates induced T helper 17 cell (TH17) mucosal immunity, which required the virulence-associated metabolic enzyme propanediol dehydratase (pduC). Modeling the increase in mucosal and systemic TH17 immunity we observed in CD-SpA patients, colonization of interleukin-10-deficient or K/BxN mice with CD-SpA-derived E. coli led to more severe colitis or inflammatory arthritis, respectively. Collectively, these data reveal the power of IgA-seq to identify immunoreactive resident pathosymbionts that link mucosal and systemic TH17-dependent inflammation and offer microbial and immunophenotype stratification of CD-SpA that may guide medical and biologic therapy. Copyright © 2017, American Association for the Advancement of Science.


Improved annotation of the insect vector of citrus greening disease: biocuration by a diverse genomics community.

The Asian citrus psyllid (Diaphorina citri Kuwayama) is the insect vector of the bacterium Candidatus Liberibacter asiaticus (CLas), the pathogen associated with citrus Huanglongbing (HLB, citrus greening). HLB threatens citrus production worldwide. Suppression or reduction of the insect vector using chemical insecticides has been the primary method to inhibit the spread of citrus greening disease. Accurate structural and functional annotation of the Asian citrus psyllid genome, as well as a clear understanding of the interactions between the insect and CLas, are required for development of new molecular-based HLB control methods. A draft assembly of the D. citri genome has been generated and annotated with automated pipelines. However, knowledge transfer from well-curated reference genomes such as that of Drosophila melanogaster to newly sequenced ones is challenging due to the complexity and diversity of insect genomes. To identify and improve gene models as potential targets for pest control, we manually curated several gene families with a focus on genes that have key functional roles in D. citri biology and CLas interactions. This community effort produced 530 manually curated gene models across developmental, physiological, RNAi regulatory and immunity-related pathways. As previously shown in the pea aphid, RNAi machinery genes putatively involved in the microRNA pathway have been specifically duplicated. A comprehensive transcriptome enabled us to identify a number of gene families that are either missing or misassembled in the draft genome. In order to develop biocuration as a training experience, we included undergraduate and graduate students from multiple institutions, as well as experienced annotators from the insect genomics research community. The resulting gene set (OGS v1.0) combines both automatically predicted and manually curated gene models.


Complete genome sequence of Vibrio gazogenes ATCC 43942.

Vibrio gazogenes ATCC 43942 has the potential to synthesize, in response to environmental triggers, a plethora of metabolites of clinical and agricultural significance. The complete genome sequence of Vibrio gazogenes ATCC 43942 is reported herein, contributing to the knowledge base of strains in the Vibrio genus. Copyright © 2017 Gummadidala et al.


SVachra: a tool to identify genomic structural variation in mate pair sequencing data containing inward and outward facing reads.

Characterization of genomic structural variation (SV) is essential to expanding the research and clinical applications of genome sequencing. Reliance upon short-fragment paired-end sequencing has yielded a wealth of single-nucleotide variants and insertions-deletions contained within sequencing reads, at the cost of limited SV detection. Multi-kilobase-fragment mate pair sequencing has helped fill this gap in SV detection but has introduced new analytic challenges requiring SV detection tools designed specifically for mate pair sequencing data. Here, we introduce SVachra (Structural Variation Assessment of CHRomosomal Aberrations), a breakpoint calling program that identifies large insertions-deletions, inversions, and inter- and intra-chromosomal translocations utilizing both the inward- and outward-facing read types generated by mate pair sequencing. We demonstrate SVachra’s utility by running the program on large-insert (Illumina Nextera) mate pair sequencing data from the personal genome of a single subject (HS1011). An additional long-read data set (Pacific Biosciences RSII) was also generated to validate SV calls from SVachra and from other SV calling programs used for comparison. SVachra exhibited the highest validation rate and reported the widest distribution of SV types and size ranges compared with the other SV callers. SVachra is a highly specific breakpoint calling program that exhibits a less biased SV detection methodology than other callers.
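The idea of using both read types can be illustrated with a simple discordant-pair classifier. In a Nextera mate pair library, true mate pairs typically align outward-facing (reverse strand on the left, forward on the right) with a multi-kilobase separation, while paired-end-like fragments align inward-facing with a short separation. A pair that breaks the expected pattern for its type is evidence for a candidate SV. The sketch below is a hypothetical illustration of that principle, not SVachra's algorithm; the `ReadPair` structure, the function names, and the separation ranges are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Minimal alignment summary for one read pair (hypothetical structure)."""
    chrom1: str
    pos1: int        # leftmost mapped coordinate of read 1
    reverse1: bool   # True if read 1 aligned to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def orientation(pair: ReadPair) -> str:
    """Return 'inter', 'inward' (FR), 'outward' (RF), or 'same_strand'."""
    if pair.chrom1 != pair.chrom2:
        return "inter"
    # Order the two reads by position so orientation is read left to right
    if pair.pos1 <= pair.pos2:
        left_rev, right_rev = pair.reverse1, pair.reverse2
    else:
        left_rev, right_rev = pair.reverse2, pair.reverse1
    if not left_rev and right_rev:
        return "inward"    # -->  <--
    if left_rev and not right_rev:
        return "outward"   # <--  -->
    return "same_strand"   # -->  -->  or  <--  <--


def classify_pair(pair: ReadPair, expected: dict[str, tuple[int, int]]) -> str:
    """Label a pair as concordant or as evidence for a candidate SV type.

    `expected` maps 'inward' and 'outward' to the (min, max) separation
    observed for normal pairs of that type in the library (assumed known).
    """
    kind = orientation(pair)
    if kind == "inter":
        return "translocation_candidate"
    if kind == "same_strand":
        return "inversion_candidate"
    low, high = expected[kind]
    distance = abs(pair.pos2 - pair.pos1)  # start-to-start approximation
    if distance > high:
        return "deletion_candidate"    # reads map farther apart than the library allows
    if distance < low:
        return "insertion_candidate"   # reads map closer together than expected
    return "concordant"


if __name__ == "__main__":
    ranges = {"inward": (100, 1_000), "outward": (1_000, 10_000)}  # hypothetical
    print(classify_pair(ReadPair("chr1", 1000, True, "chr1", 50_000, False), ranges))
```

The example pair is outward-facing but 49 kb apart, far beyond the assumed 10 kb maximum, so it is reported as a deletion candidate. A real caller would additionally cluster many such pairs before reporting a breakpoint.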

