July 7, 2019

A recurrence-based approach for validating structural variation using long-read sequencing technology.

Although numerous algorithms have been developed to identify structural variations (SVs) in genomic sequences, there is a dearth of approaches that can be used to evaluate their results. This is significant as the accurate identification of structural variation is still an outstanding but important problem in genomics. The emergence of new sequencing technologies that generate longer sequence reads can, in theory, provide direct evidence for all types of SVs regardless of the length of the region they span. However, current efforts to use these data in this manner require the use of large computational resources to assemble these sequences as well as visual inspection of each region. Here we present VaPoR, a highly efficient algorithm that autonomously validates large SV sets using long-read sequencing data. We assessed the performance of VaPoR on SVs in both simulated and real genomes and report a high-fidelity rate for overall accuracy across different levels of sequence depth. We show that VaPoR can interrogate a much larger range of SVs while still matching existing methods in terms of false positive validations and providing additional features considering breakpoint precision and predicted genotype. We further show that VaPoR can run quickly and efficiently without requiring a large processing or assembly pipeline. VaPoR provides a long read-based validation approach for genomic SVs that requires relatively low read depth and computing resources and thus will provide utility with targeted or low-pass sequencing coverage for accurate SV assessment. The VaPoR Software is available at: https://github.com/mills-lab/vapor. © The Authors 2017. Published by Oxford University Press.


July 7, 2019

The plastid genome in Cladophorales green algae is encoded by hairpin chromosomes.

Virtually all plastid (chloroplast) genomes are circular double-stranded DNA molecules, typically between 100 and 200 kb in size and encoding circa 80-250 genes. Exceptions to this universal plastid genome architecture are very few and include the dinoflagellates, where genes are located on DNA minicircles. Here we report on the highly deviant chloroplast genome of Cladophorales green algae, which is entirely fragmented into hairpin chromosomes. Short- and long-read high-throughput sequencing of DNA and RNA demonstrated that the chloroplast genes of Boodlea composita are encoded on 1- to 7-kb DNA contigs with an exceptionally high GC content, each containing a long inverted repeat with one or two protein-coding genes and conserved non-coding regions putatively involved in replication and/or expression. We propose that these contigs correspond to linear single-stranded DNA molecules that fold onto themselves to form hairpin chromosomes. The Boodlea chloroplast genes are highly divergent from their corresponding orthologs, and display an alternative genetic code. The origin of this highly deviant chloroplast genome most likely occurred before the emergence of the Cladophorales, and coincided with an elevated transfer of chloroplast genes to the nucleus. A chloroplast genome that is composed only of linear DNA molecules is unprecedented among eukaryotes, and highlights unexpected variation in plastid genome architecture. Copyright © 2017 Elsevier Ltd. All rights reserved.


July 7, 2019

The complete mitochondrial genome of Wonwhang (Pyrus pyrifolia)

This is a de novo assembly and annotation of a complete mitochondrial genome from Pyrus pyrifolia in the family Rosaceae. The complete mitochondrial genome of P. pyrifolia was assembled from PacBio RSII P6-C4 sequencing reads. The circular genome was 458,873 bp in length, containing 39 protein-coding genes, 23 tRNA genes and three rRNA genes. The nucleotide composition was A (27.5%), T (27.3%), G (22.6%) and C (22.6%) with GC content of 45.2%. Most protein-coding genes use the canonical start codon ATG, whereas nad1, cox1, matR and rps4 use ACG, mttB uses ATT, and rpl16 and rps19 use GTG. The stop codon is also common in all mitochondrial genes. The phylogenetic analysis showed that P. pyrifolia was clustered with the Malus of Rosaceae family. Maximum-likelihood analysis suggests a clear relationship of Rosids and Asterids, which supports the traditional classification.
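The reported GC content follows directly from the base composition (G 22.6% + C 22.6% = 45.2%). The following is a minimal Python sketch of that arithmetic, not the authors' pipeline; the function names `base_composition` and `gc_content` are hypothetical and introduced only for illustration.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among the unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(composition: dict[str, float]) -> float:
    """GC content is the sum of the G and C percentages."""
    return composition["G"] + composition["C"]


# Percentages as reported in the abstract above.
reported = {"A": 27.5, "T": 27.3, "G": 22.6, "C": 22.6}
print(round(gc_content(reported), 1))
```

On a real assembly, `base_composition` would be applied to the genome sequence string first, and its result passed to `gc_content`.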


July 7, 2019

Genomic clues to the parental origin of the wild flowering cherry Prunus yedoensis var. nudiflora (Rosaceae)

Prunus yedoensis Matsumura is one of the popular ornamental flowering cherry trees native to northeastern Asia, and its wild populations have only been found on Jeju Island, Korea. Previous studies suggested that wild P. yedoensis (P. yedoensis var. nudiflora) is a hybrid species; however, there is no solid evidence on its exact parental origin and genomic organization. In this study, we developed a total of 38 nuclear gene-based DNA markers that can be universally amplifiable in the Prunus species using 586 Prunus Conserved Orthologous Gene Set (Prunus COS). Using the Prunus COS markers, we investigated the genetic structure of wild P. yedoensis populations and evaluated the putative parental species of wild P. yedoensis. Population structure and phylogenetic analysis of 73 wild P. yedoensis accessions and 54 accessions of other Prunus species revealed that the wild P. yedoensis on Jeju Island is a natural homoploid hybrid. Sequence-level comparison of Prunus COS markers between species suggested that wild P. yedoensis might originate from a cross between maternal P. pendula f. ascendens and paternal P. jamasakura. Moreover, approximately 81% of the wild P. yedoensis accessions examined were likely F1 hybrids, whereas the remaining 19% were backcross hybrids resulting from additional asymmetric introgression of parental genotypes. These findings suggest that complex hybridization of the Prunus species on Jeju Island can produce a range of variable hybrid offspring. Overall, this study makes a significant contribution to address issues of the origin, nomenclature, and genetic relationship of ornamental P. yedoensis.


July 7, 2019

Mechanisms of adaptive divergence and speciation in Littorina saxatilis: Integrating knowledge from ecology and genetics with new data emerging from genomic studies

New opportunities to understand marine speciation and evolution of local adaptation come with genomic approaches and with the development of comprehensive model systems. The marine snail Littorina saxatilis is one example of a developing marine model for investigating genetic mechanisms of rapid divergence and evolution in natural systems. This species is strongly polymorphic and shows formation of local ecotypes throughout its distribution. Support is strong for primary (in situ) and parallel formation of reproductively semi-isolated ecotypes with contact zones between heterogeneous intertidal microhabitats. This makes this species an ideal organism for gaining new insights into the interplay of divergent selection, gene flow and genetic drift during local adaptation and speciation. A relatively well-resolved draft genome and a genetic map describing 17 linkage groups (“chromosomes”) are key tools for investigating the role of structural genomic variation, such as inversions, gene duplications and translocations. Whole genome re-sequencing of pools of individuals and the first comprehensive study of a contact zone contribute direct information on selection and barriers to gene flow present in specific regions of the genome. Linking selection at the phenotypic level to patterns observed in the genome is under way by quantitative trait loci mapping and annotation of candidate genes, while the role of single mutations on individual fitness will have to await development of gene manipulation tools. The features of the snail system facilitate the study of local adaptation and speciation and its genomic basis, but the underlying evolutionary processes are expected to be similar in other organisms, and hence this species is a useful model.


July 7, 2019

The state of whole-genome sequencing

Over the last decade, a technological paradigm shift has slashed the cost of DNA sequencing by over five orders of magnitude. Today, the cost of sequencing a human genome is a few thousand dollars, and it continues to fall. Here, we review the most cost-effective platforms for whole-genome sequencing (WGS) as well as emerging technologies that may displace or complement these. We also discuss the practical challenges of generating and analyzing WGS data, and how WGS has unlocked new strategies for discovering genes and variants underlying both rare and common human diseases.


July 7, 2019

Integrating transcriptomic and proteomic data for accurate assembly and annotation of genomes.

Complementing genome sequence with deep transcriptome and proteome data could enable more accurate assembly and annotation of newly sequenced genomes. Here, we provide a proof-of-concept of an integrated approach for analysis of the genome and proteome of Anopheles stephensi, which is one of the most important vectors of the malaria parasite. To achieve broad coverage of genes, we carried out transcriptome sequencing and deep proteome profiling of multiple anatomically distinct sites. Based on transcriptomic data alone, we identified and corrected 535 events of incomplete genome assembly involving 1196 scaffolds and 868 protein-coding gene models. This proteogenomic approach enabled us to add 365 genes that were missed during genome annotation and identify 917 gene correction events through discovery of 151 novel exons, 297 protein extensions, 231 exon extensions, 192 novel protein start sites, 19 novel translational frames, 28 events of joining of exons, and 76 events of joining of adjacent genes as a single gene. Incorporation of proteomic evidence allowed us to change the designation of more than 87 predicted “noncoding RNAs” to conventional mRNAs coded by protein-coding genes. Importantly, extension of the newly corrected genome assemblies and gene models to 15 other newly assembled Anopheline genomes led to the discovery of a large number of apparent discrepancies in assembly and annotation of these genomes. Our data provide a framework for how future genome sequencing efforts should incorporate transcriptomic and proteomic analysis in combination with simultaneous manual curation to achieve near complete assembly and accurate annotation of genomes. © 2017 Prasad et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Glaucophyta

The Glaucophyta is by far the least species-rich phylum of the Archaeplastida comprising only four described genera, Glaucocystis, Cyanophora, Gloeochaete, and Cyanoptyche, and 15 species. However, recent molecular and morphological analyses reveal that glaucophytes are not as species poor as hitherto assumed with many novel lineages existing in natural environments. Glaucophytes are freshwater phototrophs of moderate to low abundance and retain many ancestral plastid traits derived from the cyanobacterial donor of this organelle, including the remnant peptidoglycan wall in their envelope. These plastids were originally named “cyanelles,” which was later changed to “muroplasts” when their shared ancestry with other Archaeplastida was recognized. The model glaucophyte, Cyanophora paradoxa, is well studied with respect to biochemistry, proteomics, and the gene content of the nuclear and organelle genomes. Investigation of the biosynthesis of cytosolic starch led to a model for the transition from glycogen to starch storage during plastid endosymbiosis. The photosynthetic apparatus, including phycobilisome antennae, resembles that of cyanobacteria. However, the carbon-concentrating mechanism is algal in nature and based on pyrenoids. Studies on protein import into muroplasts revealed a primordial Toc/Tic translocon. The peptidoglycan wall was elucidated with respect to composition, biosynthesis, and involvement of nuclear genes. The muroplast genome is distinct, not due to the number of encoded genes but, rather, because of the presence of unique genes not present on other plastid genomes. The mosaic nature of the gene-rich (27,000) nuclear genome came as a surprise, considering the relatively small genomes of unicellular red algae.


July 7, 2019

Complete chloroplast genome sequence of Fritillaria unibracteata var. wabuensis based on SMRT Sequencing Technology.

Fritillaria unibracteata var. wabuensis is an important medicinal plant used for the treatment of cough symptoms related to the respiratory system. The chloroplast genome of F. unibracteata var. wabuensis (GenBank accession no. KF769142) was assembled using the PacBio RS platform (Pacific Biosciences, Beverly, MA) as a circular sequence of 151,009 bp. The assembled genome contains 133 genes, including 88 protein-coding, 37 tRNA, and eight rRNA genes. This genome sequence will provide an important resource for further studies on the evolution of the Fritillaria genus and the molecular identification of Fritillaria herbs and their adulterants. This work suggests that PacBio RS is a powerful tool to sequence and assemble chloroplast genomes.


July 7, 2019

Effects of genome structure variation, homeologous genes and repetitive DNA on polyploid crop research in the age of genomics.

Compared to diploid species, allopolyploid crop species possess more complex genomes, higher productivity, and greater adaptability to changing environments. Next generation sequencing techniques have produced high-density genetic maps, whole genome sequences, transcriptomes and epigenomes for important polyploid crops. However, several problems interfere with the full application of next generation sequencing techniques to these crops. Firstly, different types of genomic variation affect sequence assembly and QTL mapping. Secondly, duplicated or homoeologous genes can diverge in function and then lead to emergence of many minor QTL, which increases difficulties in fine mapping, cloning and marker assisted selection. Thirdly, repetitive DNA sequences arising in polyploid crop genomes also impact sequence assembly, and are increasingly being shown to produce small RNAs to regulate gene expression and hence phenotypic traits. We propose that these three key features should be considered together when analyzing polyploid crop genomes. It is apparent that dissection of genomic structural variation, elucidation of the function and mechanism of interaction of homoeologous genes, and investigation of the de novo roles of repeat sequences in agronomic traits are necessary for genomics-based crop breeding in polyploids. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.


July 7, 2019

HapCol: accurate and memory-efficient haplotype assembly from long reads.

Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of ‘future-generation’ sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions. By exploiting a feature of future-generation technologies (the uniform distribution of sequencing errors), we designed an exact algorithm, called HapCol, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HapCol with the current state-of-the-art combinatorial methods both on real and simulated data. On a standard benchmark of real data, we show that HapCol is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HapCol requires significantly less computing resources, especially memory. Thanks to its computational efficiency, HapCol can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption. Our source code is available under the terms of the GNU General Public License at http://hapcol.algolab.eu/. Contact: bonizzoni@disco.unimib.it. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.


July 7, 2019

Genomic resources and their influence on the detection of the signal of positive selection in genome scans.

Genome scans represent powerful approaches to investigate the action of natural selection on the genetic variation of natural populations and to better understand local adaptation. This is very useful, for example, in the field of conservation biology and evolutionary biology. Thanks to Next Generation Sequencing, genomic resources are growing exponentially, improving genome scan analyses in non-model species. Thousands of SNPs called using Reduced Representation Sequencing are increasingly used in genome scans. Besides, genome sequences are also becoming increasingly available, allowing better processing of short-read data, offering physical localization of variants, and improving haplotype reconstruction and data imputation. Ultimately, genome sequences are also becoming the raw material for selection inferences. Here, we discuss how the increasing availability of such genomic resources, notably genome sequences, influences the detection of signals of selection. Mainly, increasing data density and having the information of physical linkage data expand genome scans by (i) improving the overall quality of the data, (ii) helping the reconstruction of demographic history for the population studied to decrease false-positive rates and (iii) improving the statistical power of methods to detect the signal of selection. Of particular importance, the availability of a high-quality reference genome can improve the detection of the signal of selection by (i) allowing matching the potential candidate loci to linked coding regions under selection, (ii) rapidly moving the investigation to the gene and function and (iii) ensuring that the highly variable regions of the genomes that include functional genes are also investigated. For all those reasons, using reference genomes in genome scan analyses is highly recommended. © 2015 John Wiley & Sons Ltd.


July 7, 2019

Assembly and characterization of the MHC class I region of the Yangtze finless porpoise (Neophocaena asiaeorientalis asiaeorientalis).

The Yangtze finless porpoise (Neophocaena asiaeorientalis asiaeorientalis; YFP) is the sole freshwater subspecies of N. asiaeorientalis and is now critically endangered. The major histocompatibility complex (MHC) is a family of highly polymorphic genes that play an important immunological role in antigen presentation in vertebrates. Currently, however, little is known about the MHC region in the genome of the YFP, which hampers conservation genetics and evolutionary ecology studies using MHC genes. In this work, a nucleotide sequence of 774,811 bp covering the YFP MHC class I region was obtained by screening a YFP bacterial artificial chromosome (BAC) library, followed by sequencing and assembly of positive BAC clones. A total of 45 genes were successfully annotated, of which four were MHC class I genes. There are high similarities among the four YFP MHC class I genes (>94%). Divergence in the coding region of the four YFP MHC class I genes is mainly localized to exons 2 and 3, which encode the antigen-binding sites of MHC class I genes. Additionally, comparison of the MHC structure in YFP to those of cattle, sheep, and pig showed that MHC class I genes are located in genome regions with regard to the conserved genes, and the YFP contains the fewest MHC class I genes among these species. This is the first report characterizing a cetacean MHC class I region and describing its organization, which would be valuable for further investigation of adaptation in natural populations of the YFP and other cetaceans.


July 7, 2019

The Vigna Genome Server, ‘VigGS’: A genomic knowledge base of the genus Vigna based on high-quality, annotated genome sequence of the azuki bean, Vigna angularis (Willd.) Ohwi & Ohashi.

The genus Vigna includes legume crops such as cowpea, mungbean and azuki bean, as well as >100 wild species. A number of the wild species are highly tolerant to severe environmental conditions including high-salinity, acid or alkaline soil; drought; flooding; and pests and diseases. These features of the genus Vigna make it a good target for investigation of genetic diversity in adaptation to stressful environments; however, a lack of genomic information has hindered such research in this genus. Here, we present a genome database of the genus Vigna, Vigna Genome Server (‘VigGS’, http://viggs.dna.affrc.go.jp), based on the recently sequenced azuki bean genome, which incorporates annotated exon-intron structures, along with evidence for transcripts and proteins, visualized in GBrowse. VigGS also facilitates user construction of multiple alignments between azuki bean genes and those of six related dicot species. In addition, the database displays sequence polymorphisms between azuki bean and its wild relatives and enables users to design primer sequences targeting any variant site. VigGS offers a simple keyword search in addition to sequence similarity searches using BLAST and BLAT. To incorporate up-to-date genomic information, VigGS automatically receives newly deposited mRNA sequences of pre-set species from the public database once a week. Users can refer not only to gene structures mapped on the azuki bean genome in GBrowse but also to relevant literature on the genes. VigGS will contribute to genomic research into plant biotic and abiotic stresses and to the future development of new stress-tolerant crops. © The Author 2015. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.


July 7, 2019

Refinement of the canine CD1 locus topology and investigation of antibody binding to recombinant canine CD1 isoforms.

CD1 molecules are antigen-presenting glycoproteins primarily found on dendritic cells (DCs) responsible for lipid antigen presentation to CD1-restricted T cells. Despite their pivotal role in immunity, little is known about CD1 protein expression in dogs, notably due to lack of isoform-specific antibodies. The canine (Canis familiaris) CD1 locus was previously found to contain three functional CD1A genes: canCD1A2, canCD1A6, and canCD1A8, where two variants of canCD1A8, canCD1A8.1 and canCD1A8.2, were assumed to be allelic variants. However, we hypothesized that these rather represented two separate genes. Sequencing of three overlapping bacterial artificial chromosomes (BACs) spanning the entire canine CD1 locus revealed canCD1A8.2 and canCD1A8.1 to be located in tandem between canCD1A7 and canCD1C, and canCD1A8.1 was consequently renamed canCD1A9. Green fluorescent protein (GFP)-fused canine CD1 transcripts were recombinantly expressed in 293T cells. All proteins showed a highly positive GFP expression except for canine CD1d and a splice variant of canine CD1a8 lacking exon 3. Probing with a panel of anti-CD1 monoclonal antibodies (mAbs) showed that Ca13.9H11 and Ca9.AG5 only recognized canine CD1a8 and CD1a9 isoforms, and Fe1.5F4 mAb solely recognized canine CD1a6. Anti-CD1b mAbs recognized the canine CD1b protein, but also bound CD1a2, CD1a8, and CD1a9. Interestingly, Ca9.AG5 showed allele specificity based on a single nucleotide polymorphism (SNP) located at position 321. Our findings have refined the structure of the canine CD1 locus and available antibody specificity against canine CD1 proteins. These are important fundamentals for future investigation of the role of canine CD1 in lipid immunity.

