July 7, 2019

An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.


July 7, 2019

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm.

Long sequencing reads generated by single-molecule sequencing technology offer the possibility of dramatically improving the contiguity of genome assemblies. The biggest challenge today is that long reads have relatively high error rates, currently around 15%. The high error rates make it difficult to use this data alone, particularly with highly repetitive plant genomes. Errors in the raw data can lead to insertion or deletion errors (indels) in the consensus genome sequence, which in turn create significant problems for downstream analysis; for example, a single indel may shift the reading frame and incorrectly truncate a protein sequence. Here, we describe an algorithm that solves the high error rate problem by combining long, high-error reads with shorter but much more accurate Illumina sequencing reads, whose error rates average <1%. Our hybrid assembly algorithm combines these two types of reads to construct mega-reads, which are both long and accurate, and then assembles the mega-reads using the CABOG assembler, which was designed for long reads. We apply this technique to a large data set of Illumina and PacBio sequences from the species Aegilops tauschii, a large and extremely repetitive plant genome that has resisted previous attempts at assembly. We show that the resulting assembled contigs are far larger than in any previous assembly, with an N50 contig size of 486,807 nucleotides. We compare the contigs to independently produced optical maps to evaluate their large-scale accuracy, and to a set of high-quality bacterial artificial chromosome (BAC)-based assemblies to evaluate base-level accuracy. © 2017 Zimin et al.; Published by Cold Spring Harbor Laboratory Press.
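The N50 contig size quoted in the abstract above is a standard contiguity statistic: the length L such that contigs of length L or longer account for at least half of the assembled bases. The following is a minimal sketch of that calculation, not the authors' code; the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the contig at which the running total,
    taken from the longest contig downwards, first reaches half
    of the total assembly size.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

In practice the lengths would be read from the assembly FASTA file rather than typed in by hand.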


July 7, 2019

Genome sequence of Plasmopara viticola and insight into the pathogenic mechanism.

Plasmopara viticola causes downy mildew disease of grapevine, which is one of the most devastating diseases of viticulture worldwide. Here we report a 101.3 Mb whole genome sequence of P. viticola isolate ‘JL-7-2’ obtained by a combination of Illumina and PacBio sequencing technologies. The P. viticola genome contains 17,014 putative protein-coding genes and has ~26% repetitive sequences. A total of 1,301 putative secreted proteins, including 100 putative RXLR effectors and 90 CRN effectors, were identified in this genome. In the secretome, 261 potential pathogenicity genes and 95 carbohydrate-active enzymes were predicted. Transcriptional analysis revealed that most of the RXLR effectors, pathogenicity genes and carbohydrate-active enzymes were significantly up-regulated during infection. Comparative genomic analysis revealed that P. viticola evolved independently from the Arabidopsis downy mildew pathogen Hyaloperonospora arabidopsidis. The availability of the P. viticola genome not only provides a valuable resource for comparative genomic analysis and evolutionary studies among oomycetes, but also enhances our knowledge of the mechanism of interactions between this biotrophic pathogen and its host.


July 7, 2019

Extremely low genomic diversity of Rickettsia japonica distributed in Japan.

Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as “spotted fevers”. The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

The Nephila clavipes genome highlights the diversity of spider silk genes and their complex expression.

Spider silks are the toughest known biological materials, yet are lightweight and virtually invisible to the human immune system, and they thus have revolutionary potential for medicine and industry. Spider silks are largely composed of spidroins, a unique family of structural proteins. To investigate spidroin genes systematically, we constructed the first genome of an orb-weaving spider: the golden orb-weaver (Nephila clavipes), which builds large webs using an extensive repertoire of silks with diverse physical properties. We cataloged 28 Nephila spidroins, representing all known orb-weaver spidroin types, and identified 394 repeated coding motif variants and higher-order repetitive cassette structures unique to specific spidroins. Characterization of spidroin expression in distinct silk gland types indicates that glands can express multiple spidroin types. We find evidence of an alternatively spliced spidroin, a spidroin expressed only in venom glands, evolutionary mechanisms for spidroin diversification, and non-spidroin genes with expression patterns that suggest roles in silk production.


July 7, 2019

Genome sequencing and population genomic analyses provide insights into the adaptive landscape of silver birch.

Silver birch (Betula pendula) is a pioneer boreal tree that can be induced to flower within 1 year. Its rapid life cycle, small (440-Mb) genome, and advanced germplasm resources make birch an attractive model for forest biotechnology. We assembled and chromosomally anchored the nuclear genome of an inbred B. pendula individual. Gene duplicates from the paleohexaploid event were enriched for transcriptional regulation, whereas tandem duplicates were overrepresented by environmental responses. Population resequencing of 80 individuals showed effective population size crashes at major points of climatic upheaval. Selective sweeps were enriched among polyploid duplicates encoding key developmental and physiological triggering functions, suggesting that local adaptation has tuned the timing of and cross-talk between fundamental plant processes. Variation around the tightly-linked light response genes PHYC and FRS10 correlated with latitude and longitude and temperature, and with precipitation for PHYC. Similar associations characterized the growth-promoting cytokinin response regulator ARR1, and the wood development genes KAK and MED5A.


July 7, 2019

High-quality draft genome sequences of four lignocellulose-degrading bacteria isolated from Puerto Rican forest soil: Gordonia sp., Paenibacillus sp., Variovorax sp., and Vogesella sp.

Here, we report the high-quality draft genome sequences of four phylogenetically diverse lignocellulose-degrading bacteria isolated from tropical soil (Gordonia sp., Paenibacillus sp., Variovorax sp., and Vogesella sp.) to elucidate the genetic basis of their ability to degrade lignocellulose. These isolates may provide novel enzymes for biofuel production. Copyright © 2017 Woo et al.


July 7, 2019

Metabolic modeling of energy balances in Mycoplasma hyopneumoniae shows that pyruvate addition increases growth rate.

Mycoplasma hyopneumoniae is cultured on a large scale to produce antigen for inactivated whole-cell vaccines against respiratory disease in pigs. However, the fastidious nutrient requirements of this minimal bacterium and the low growth rate make it challenging to reach sufficient biomass yield for antigen production. In this study, we sequenced the genome of M. hyopneumoniae strain 11 and constructed a high-quality constraint-based genome-scale metabolic model of 284 chemical reactions and 298 metabolites. We validated the model with time-series data of duplicate fermentation cultures to aim for an integrated model describing the dynamic profiles measured in fermentations. The model predicted that 84% of cellular energy in a standard M. hyopneumoniae cultivation was used for non-growth associated maintenance and only 16% of cellular energy was used for growth and growth associated maintenance. Following a cycle of model-driven experimentation in dedicated fermentation experiments, we were able to increase the fraction of cellular energy used for growth through pyruvate addition to the medium. This increase in turn led to an increase in growth rate and a 2.3-fold increase in the total biomass concentration reached after 3-4 days of fermentation, enhancing the productivity of the overall process. The model presented provides a solid basis to understand and further improve M. hyopneumoniae fermentation processes. Biotechnol. Bioeng. 2017;114: 2339-2347. © 2017 Wiley Periodicals, Inc.


July 7, 2019

No evidence for maintenance of a sympatric Heliconius species barrier by chromosomal inversions.

Mechanisms that suppress recombination are known to help maintain species barriers by preventing the breakup of coadapted gene combinations. The sympatric butterfly species Heliconius melpomene and Heliconius cydno are separated by many strong barriers, but the species still hybridize infrequently in the wild, and around 40% of the genome is influenced by introgression. We tested the hypothesis that genetic barriers between the species are maintained by inversions or other mechanisms that reduce between-species recombination rate. We constructed fine-scale recombination maps for Panamanian populations of both species and their hybrids to directly measure recombination rate within and between species, and generated long sequence reads to detect inversions. We find no evidence for a systematic reduction in recombination rates in F1 hybrids, and also no evidence for inversions longer than 50 kb that might be involved in generating or maintaining species barriers. This suggests that mechanisms leading to global or local reduction in recombination do not play a significant role in the maintenance of species barriers between H. melpomene and H. cydno.


July 7, 2019

Genome-wide analysis of gene expression and protein secretion of Babesia canis during virulent infection identifies potential pathogenicity factors.

Infections of dogs with virulent strains of Babesia canis are characterized by rapid onset and high mortality, comparable to complicated human malaria. As in other apicomplexan parasites, most Babesia virulence factors responsible for survival and pathogenicity are secreted to the host cell surface and beyond, where they remodel and biochemically modify the infected cell, interacting with host proteins in a very specific manner. Here, we investigated factors secreted by B. canis during acute infections in dogs and report on in silico predictions and experimental analysis of the parasite’s exportome. As a backdrop, we generated a fully annotated B. canis genome sequence of a virulent Hungarian field isolate (strain BcH-CHIPZ) underpinned by extensive genome-wide RNA-seq analysis. We find evidence for conserved factors in apicomplexan hemoparasites involved in immune evasion (e.g. VESA-protein family), proteins secreted across the iRBC membrane into the host bloodstream (e.g. SA- and Bc28 protein families), potential moonlighting proteins (e.g. profilin and histones), and uncharacterized antigens present during acute crisis in dogs. The combined data provide a first predicted and partially validated set of potential virulence factors exported during fatal infections, which can be exploited for urgently needed innovative intervention strategies aimed at facilitating diagnosis and management of canine babesiosis.


July 7, 2019

Characterization of four endophytic fungi as potential consolidated bioprocessing hosts for conversion of lignocellulose into advanced biofuels.

Recently, several endophytic fungi have been demonstrated to produce volatile organic compounds (VOCs) with properties similar to fossil fuels, called “mycodiesel,” while growing on lignocellulosic plant and agricultural residues. The fact that endophytes are plant symbionts suggests that some may be able to produce lignocellulolytic enzymes, making them capable of both deconstructing lignocellulose and converting it into mycodiesel, two properties that indicate that these strains may be useful consolidated bioprocessing (CBP) hosts for biofuel production. In this study, four endophytes, Hypoxylon sp. CI4A, Hypoxylon sp. EC38, Hypoxylon sp. CO27, and Daldinia eschscholzii EC12, were selected and evaluated for their CBP potential. Analysis of their genomes indicates that these endophytes have a rich reservoir of biomass-deconstructing carbohydrate-active enzymes (CAZymes), which includes enzymes active on both polysaccharides and lignin, as well as terpene synthases (TPSs), enzymes that may produce fuel-like molecules, suggesting that they do indeed have CBP potential. GC-MS analyses of their VOCs when grown on four representative lignocellulosic feedstocks revealed that these endophytes produce a wide spectrum of hydrocarbons, the majority of which are monoterpenes and sesquiterpenes, including some known biofuel candidates. Analysis of their cellulase activity when grown under the same conditions revealed that these endophytes actively produce endoglucanases, exoglucanases, and β-glucosidases. The richness of CAZymes as well as terpene synthases identified in these four endophytic fungi suggests that they are strong candidates for development into platform CBP organisms.


July 7, 2019

Determination of nucleopolyhedrovirus’ taxonomic position

To date, more than 78 genomes of nucleopolyhedroviruses (NPVs) have been sequenced and deposited in NCBI. How to classify a new virus isolated from infected larvae in the field is usually the first question. Two NPV strains, isolated from casuarina moth (L. xylina) and golden birdwing (Troides aeacus) larvae, respectively, raised this question. Based on the identity of their polyhedrin (polh) sequences to those of Lymantria dispar MNPV and Bombyx mori NPV, they were provisionally named LdMNPV-like virus and TraeNPV. To further clarify the relationships of LdMNPV-like virus and TraeNPV to closely related NPVs, Kimura 2-parameter (K-2-P) analysis was performed. The results showed that LdMNPV-like virus is an LdMNPV isolate, whereas TraeNPV had an ambiguous relationship to BmNPV. In addition, MaviNPV, which is a mini-AcMNPV, also told a different story under K-2-P analysis. Because K-2-P analysis cannot resolve all species determination issues, TraeNPV needs to be sequenced to define its taxonomic position. For this purpose, different genomic sequencing technologies and bioinformatic analysis approaches will be discussed. We anticipate that these applications will help to examine the nucleotide information of unknown species and provide insight into this issue.
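For readers unfamiliar with the Kimura 2-parameter model mentioned above, the distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P is the proportion of sites that differ by a transition and Q the proportion that differ by a transversion. The sketch below illustrates that formula only; it is not the analysis pipeline used in the study, and the function name `k2p_distance` and the toy sequences are assumptions made for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


# Toy alignment with a single transition (A<->G) in ten sites.
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

Species demarcation with K-2-P typically compares such distances for conserved genes (for example polh) against agreed thresholds, which are not reproduced here.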


July 7, 2019

Coping with living in the soil: the genome of the parthenogenetic springtail Folsomia candida.

Folsomia candida is a model in soil biology, belonging to the family of Isotomidae, subclass Collembola. It reproduces parthenogenetically in the presence of Wolbachia, and exhibits remarkable physiological adaptations to stress. To better understand these features and adaptations to life in the soil, we studied its genome in the context of its parthenogenetic lifestyle. We applied Pacific Biosciences sequencing and assembly to generate a reference genome for F. candida of 221.7 Mbp, comprising only 162 scaffolds. The complete genome of its endosymbiont Wolbachia was also assembled and turned out to be the largest strain identified so far. Substantial gene family expansions and lineage-specific gene clusters were linked to stress response. A large number of genes (809) were acquired by horizontal gene transfer. A substantial fraction of these genes are involved in lignocellulose degradation. Also, the presence of genes involved in antibiotic biosynthesis was confirmed. Intra-genomic rearrangements of collinear gene clusters were observed, of which 11 were organized as palindromes. The Hox gene cluster of F. candida showed major rearrangements compared to the arthropod consensus cluster, resulting in a disorganized cluster. The expansion of stress response gene families suggests that stress defense was important to facilitate colonization of soils. The large number of HGT genes related to lignocellulose degradation could be beneficial to unlock carbohydrate sources in soil, especially those contained in decaying plant and fungal organic matter. Intra- as well as inter-scaffold duplications of gene clusters may be a consequence of its parthenogenetic lifestyle. This high-quality genome will be instrumental for evolutionary biologists investigating deep phylogenetic lineages among arthropods and will provide the basis for a more mechanistic understanding in soil ecology and ecotoxicology.

