July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly-used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.


July 7, 2019

Complete genome sequence of Vibrio campbellii LMB 29 isolated from red drum with four native megaplasmids.

Vibrio spp. are the most common pathogens of animals reared in aquaculture. Vibrio campbellii, which is often involved in diseases of shrimp, fish and mollusks, is widely distributed in the marine environment worldwide, but our knowledge of its pathogenesis and antimicrobial resistance is very limited. This knowledge gap exists at least partly because V. campbellii was originally classified as Vibrio harveyi, and detailed comparative genome analysis against other Vibrio spp. is currently lacking. In this study, the complete genome of a predominant V. campbellii strain, LMB29, was determined by MiSeq in conjunction with PacBio SMRT sequencing. The genome consists of two circular DNA chromosomes and four megaplasmids. Comparative genome analysis indicates that LMB29 shares 96.66% similarity (average nucleotide identity) with V. campbellii ATCC strain BAA-1116, based on a 75% AF (average fraction) calculation, and that its functional profile is very similar to those of V. campbellii E1 and V. campbellii CAIM115. Both a type III secretion system (T3SS) and a type VI secretion system (T6SS), along with the tlh gene, which encodes a thermolabile hemolysin, are present in LMB29 and may contribute to bacterial pathogenesis. The virulence of this strain was experimentally confirmed by performing an LDH assay on a fish cell infection model; cell death was observed as early as 3 h post infection. Thirty-seven antimicrobial resistance genes (>45% identity) were predicted in LMB29, including a novel rifampicin ADP-ribosyltransferase, arr-9, on plasmid pLMB157. The arr-9 gene was predicted to lie on a genomic island with potential for horizontal transfer, which may facilitate the dissemination of rifampicin resistance. Future research is needed to explore the pathogenesis of V. campbellii LMB29, and the availability of this genome sequence provides a basis for further analysis.


July 7, 2019

Comparative and population genomic landscape of Phellinus noxius: A hypervariable fungus causing root rot in trees.

The order Hymenochaetales of white rot fungi contains some of the most aggressive wood decayers causing tree deaths around the world. Despite their ecological importance and the impact of the diseases they cause, little is known about the evolution and transmission patterns of these pathogens. Here, we sequenced and undertook comparative genomic analyses of Hymenochaetales genomes using the brown root rot fungus Phellinus noxius, the wood-decomposing fungus Phellinus lamaensis, the laminated root rot fungus Phellinus sulphurascens and the trunk pathogen Porodaedalea pini. Many gene families of lignin-degrading enzymes were identified in these fungi, reflecting their ability as white rot fungi. Comparison against distant fungi highlighted the expansion of 1,3-beta-glucan synthases in P. noxius, which may account for its fast growth. We identified 13 linkage groups conserved within Agaricomycetes, suggesting the evolution of stable karyotypes. We determined that P. noxius has a bipolar heterothallic mating system, with an unusual, highly expanded ~60 kb A locus as a result of accumulating gene transposition. We investigated the population genomics of 60 P. noxius isolates across multiple islands of the Asia Pacific region. Whole-genome sequencing showed that this multinucleate species contains abundant poly-allelic single nucleotide polymorphisms with atypical allele frequencies. Different patterns of intra-isolate polymorphism reflect mono-/heterokaryotic states, which are both prevalent in nature. We have shown two genetically separated lineages, with one spanning many islands despite the geographical barriers. Both populations possess extraordinary genetic diversity and show contrasting evolutionary scenarios. These results provide a framework to further investigate the genetic basis underlying the fitness and virulence of white rot fungi. © 2017 John Wiley & Sons Ltd.


July 7, 2019

Genomic patterns of de novo mutation in simplex autism.

To further our understanding of the genetic etiology of autism, we generated and analyzed genome sequence data from 516 idiopathic autism families (2,064 individuals). This resource includes >59 million single-nucleotide variants (SNVs) and 9,212 private copy number variants (CNVs), of which 133,992 and 88 are de novo mutations (DNMs), respectively. We estimate a mutation rate of ~1.5 × 10^-8 SNVs per site per generation with a significantly higher mutation rate in repetitive DNA. Comparing probands and unaffected siblings, we observe several DNM trends. Probands carry more gene-disruptive CNVs and SNVs, resulting in severe missense mutations and mapping to predicted fetal brain promoters and embryonic stem cell enhancers. These differences become more pronounced for autism genes (p = 1.8 × 10^-3, OR = 2.2). Patients are more likely to carry multiple coding and noncoding DNMs in different genes, which are enriched for expression in striatal neurons (p = 3 × 10^-3), suggesting a path forward for genetically characterizing more complex cases of autism. Copyright © 2017 Elsevier Inc. All rights reserved.


July 7, 2019

Contributions of Zea mays subspecies mexicana haplotypes to modern maize.

Maize was domesticated from lowland teosinte (Zea mays ssp. parviglumis), but the contribution of highland teosinte (Zea mays ssp. mexicana, hereafter mexicana) to modern maize is not clear. Here, two genomes for Mo17 (a modern maize inbred) and mexicana are assembled using a meta-assembly strategy after sequencing of 10 lines derived from a maize-teosinte cross. Comparative analyses reveal a high level of diversity between Mo17, B73, and mexicana, including three Mb-size structural rearrangements. The maize spontaneous mutation rate is estimated to be 2.17 × 10^-8 to 3.87 × 10^-8 per site per generation with a nonrandom distribution across the genome. A higher deleterious mutation rate is observed in the pericentromeric regions, and might be caused by differences in recombination frequency. Over 10% of the maize genome shows evidence of introgression from the mexicana genome, suggesting that mexicana contributed to maize adaptation and improvement. Our data offer a rich resource for constructing the pan-genome of Zea mays and genetic improvement of modern maize varieties.


July 7, 2019

Hidden genetic variation shapes the structure of functional elements in Drosophila.

Mutations that add, subtract, rearrange, or otherwise refashion genome structure often affect phenotypes, although the fragmented nature of most contemporary assemblies obscures them. To discover such mutations, we assembled the first new reference-quality genome of Drosophila melanogaster since its initial sequencing. By comparing this new genome to the existing D. melanogaster assembly, we created a structural variant map of unprecedented resolution and identified extensive genetic variation that has remained hidden until now. Many of these variants constitute candidates underlying phenotypic variation, including tandem duplications and a transposable element insertion that amplifies the expression of detoxification-related genes associated with nicotine resistance. The abundance of important genetic variation that still evades discovery highlights how crucial high-quality reference genomes are to deciphering phenotypes.


July 7, 2019

Assembly of an early-matured japonica (Geng) rice genome, Suijing18, based on PacBio and Illumina sequencing.

The early-matured japonica (Geng) rice variety, Suijing18 (SJ18), carries multiple elite traits including durable blast resistance, good grain quality, and high yield. Using PacBio SMRT technology, we produced over 25 Gb of long-read sequencing raw data from SJ18 with a coverage of 62×. Using Illumina paired-end whole-genome shotgun sequencing technology, we generated 59 Gb of short-read sequencing data from SJ18 (23.6 Gb from a 200 bp library with a coverage of 59× and 35.4 Gb from an 800 bp library with a coverage of 88×). With these data, we assembled a single SJ18 genome and then generated a set of annotation data. These data sets can be used to test new programs for variation deep mining, and will provide new insights into the genome structure, function, and evolution of SJ18, and will provide essential support for biological research in general.
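
The yields and coverages quoted above can be cross-checked with the usual relation coverage = sequenced bases / genome size. The short Python sketch below is only an illustration and not part of the published work: the function name `implied_genome_size_gb` is hypothetical, and the PacBio figure is treated as exactly 25 Gb although the abstract says "over 25 Gb".

```python
# Coverage = sequenced bases / genome size, so
# genome size ~= sequenced bases / coverage.

def implied_genome_size_gb(bases_gb: float, coverage: float) -> float:
    """Return the genome size (in Gb) implied by a data yield and its reported coverage."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return bases_gb / coverage

# (yield in Gb, reported coverage) as quoted in the abstract.
# Assumption: "over 25 Gb" is taken as 25.0, i.e. a lower bound.
datasets = {
    "PacBio SMRT": (25.0, 62),
    "Illumina 200 bp library": (23.6, 59),
    "Illumina 800 bp library": (35.4, 88),
}

implied_sizes = {
    name: implied_genome_size_gb(gb, cov) for name, (gb, cov) in datasets.items()
}

for name, size in implied_sizes.items():
    print(f"{name}: ~{size * 1000:.0f} Mb")
```

All three data sets imply a genome size of roughly 0.4 Gb, so the reported yields and coverages are mutually consistent.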


July 7, 2019

Efficient transgenesis and annotated genome sequence of the regenerative flatworm model Macrostomum lignano.

Regeneration-capable flatworms are informative research models to study the mechanisms of stem cell regulation, regeneration, and tissue patterning. However, the lack of transgenesis methods considerably hampers their wider use. Here we report development of a transgenesis method for Macrostomum lignano, a basal flatworm with excellent regeneration capacity. We demonstrate that microinjection of DNA constructs into fertilized one-cell stage eggs, followed by a low dose of irradiation, frequently results in random integration of the transgene in the genome and its stable transmission through the germline. To facilitate selection of promoter regions for transgenic reporters, we assembled and annotated the M. lignano genome, including genome-wide mapping of transcription start regions, and show its utility by generating multiple stable transgenic lines expressing fluorescent proteins under several tissue-specific promoters. The reported transgenesis method and annotated genome sequence will permit sophisticated genetic studies on stem cells and regeneration using M. lignano as a model organism.


July 7, 2019

Complete genome sequences of two plant-associated Pseudomonas putida isolates with increased heavy-metal tolerance.

We report here the complete genome sequences of two Pseudomonas putida isolates recovered from surface-sterilized roots of Sida hermaphrodita. The two isolates were characterized by an increased tolerance to zinc, cadmium, and lead. Furthermore, the strains showed typical plant growth-promoting properties, such as the production of indole acetic acid, cellulolytic enzymes, and siderophores. Copyright © 2017 Nesme et al.


July 7, 2019

Closed genome sequence of Chryseobacterium piperi strain CTMT/ATCC BAA-1782, a Gram-negative bacterium with clostridial neurotoxin-like coding sequences.

Clostridial neurotoxins, including botulinum and tetanus neurotoxins, are among the deadliest known bacterial toxins. Until recently, the horizontal mobility of this toxin gene family appeared to be limited to the genus Clostridium. We report here the closed genome sequence of Chryseobacterium piperi, a Gram-negative bacterium containing coding sequences with homology to clostridial neurotoxin family proteins. Copyright © 2017 Wentz et al.


July 7, 2019

The genome of an intranuclear parasite, Paramicrosporidium saccamoebae, reveals alternative adaptations to obligate intracellular parasitism.

Intracellular parasitism often results in gene loss, genome reduction, and dependence upon the host for cellular functioning. Rozellomycota is a clade comprising many such parasites and is related to the diverse, highly reduced, animal parasites, Microsporidia. We sequenced the nuclear and mitochondrial genomes of Paramicrosporidium saccamoebae [Rozellomycota], an intranuclear parasite of amoebae. A canonical fungal mitochondrial genome was recovered from P. saccamoebae that encodes genes necessary for the complete oxidative phosphorylation pathway including Complex I, differentiating it from most endoparasites including its sequenced relatives in Rozellomycota and Microsporidia. Comparative analysis revealed that P. saccamoebae shares more gene content with distantly related Fungi than with its closest relatives, suggesting that genome evolution in Rozellomycota and Microsporidia has been affected by repeated and independent gene losses, possibly as a result of variation in parasitic strategies (e.g. host and subcellular localization) or due to multiple transitions to parasitism.


July 7, 2019

SV2: Accurate structural variation genotyping and de novo mutation detection from whole genomes.

Structural variation (SV) detection from short-read whole genome sequencing is error prone, presenting significant challenges for population or family-based studies of disease. Here we describe SV2, a machine-learning algorithm for genotyping deletions and duplications from paired-end sequencing data. SV2 can rapidly integrate variant calls from multiple structural variant discovery algorithms into a unified call set with high genotyping accuracy and capability to detect de novo mutations. SV2 is freely available on GitHub (https://github.com/dantaki/SV2). Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com


July 7, 2019

Copy number variation and expression analysis reveals a nonorthologous pinta gene family member involved in butterfly vision.

Vertebrate (cellular retinaldehyde-binding protein) and Drosophila (prolonged depolarization afterpotential is not apparent [PINTA]) proteins with a CRAL-TRIO domain transport retinal-based chromophores that bind to opsin proteins and are necessary for phototransduction. The CRAL-TRIO domain gene family is composed of genes that encode proteins with a common N-terminal structural domain. Although there is an expansion of this gene family in Lepidoptera, there is no lepidopteran ortholog of pinta. Further, the function of these genes in lepidopterans has not yet been established. Here, we explored the molecular evolution and expression of CRAL-TRIO domain genes in the butterfly Heliconius melpomene in order to identify a member of this gene family as a candidate chromophore transporter. We generated and searched a four-tissue transcriptome and searched a reference genome for CRAL-TRIO domain genes. We expanded an insect CRAL-TRIO domain gene phylogeny to include H. melpomene and used 18 genomes from 4 subspecies to assess copy number variation. A transcriptome-wide differential expression analysis comparing four tissue types identified a CRAL-TRIO domain gene, Hme CTD31, upregulated in heads, suggesting a potential role in vision for this CRAL-TRIO domain gene. RT-PCR and immunohistochemistry confirmed that Hme CTD31 and its protein product are expressed in the retina, specifically in primary and secondary pigment cells and in tracheal cells. Sequencing of eye protein extracts that fluoresce in the ultraviolet identified Hme CTD31 as a possible chromophore binding protein. Although we found several recent duplications and numerous copy number variants in CRAL-TRIO domain genes, we identified a single copy pinta paralog that likely binds the chromophore in butterflies. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 7, 2019

An update on bioinformatics resources for plant genomics research

Next-generation sequencing and traditional Sanger sequencing methods are of great significance in unraveling the complexity of plant genomes. They constantly generate large volumes of sequence data to be analyzed, annotated and stored. This has created a strong demand for bioinformatics tools and software that can perform these functions. A large number of potentially useful bioinformatics tools and plant genome databases have been created, which have greatly simplified the analysis and storage of vast amounts of sequence data. The information garnered using the available bioinformatics methods has greatly helped in understanding plant genome structure. Despite the availability of a good number of such tools, the information pouring in from single-gene sequencing and various whole-genome sequencing projects is overwhelming; thus, further innovations and improved methods are needed to sift through these sequence data and assemble genomes. The current review focuses on diverse bioinformatics approaches and methods developed to systematically analyze and store plant sequence data. Finally, it outlines the bottlenecks in plant genome analysis and some possible solutions that could be used to overcome them.

