July 7, 2019

Measuring the mappability spectrum of reference genome assemblies

The ability to infer actionable information from genomic variation data in a resequencing experiment relies on accurately aligning the sequences to a reference genome. However, this accuracy is inherently limited by the quality of the reference assembly and the repetitive content of the subject’s genome. As long read sequencing technologies become more widespread, it is crucial to investigate the expected improvements in alignment accuracy and variant analysis over existing short read methods. The ability to quantify the read length and error rate necessary to uniquely map regions of interest in a sequence allows users to make informed decisions regarding experiment design and provides useful metrics for comparing the magnitude of repetition across different reference assemblies. To this end we have developed NEAT-Repeat, a toolkit for exhaustively identifying the minimum read length required to uniquely map each position of a reference sequence given a specified error rate. Using these tools we computed the “mappability spectrum” for ten reference sequences, including human and a range of plants and animals, quantifying the theoretical improvements in alignment accuracy that would result from sequencing with longer reads or reads with fewer base-calling errors. Our inclusion of read length and error rate builds upon existing methods for mappability tracks based on uniqueness or aligner-specific mapping scores, and thus enables more comprehensive analysis. We apply our mappability results to whole-genome variant call data, and demonstrate that variants called with low mapping and genotype quality scores are disproportionately found in reference regions that require long reads to be uniquely covered. We propose that our mappability metrics provide a valuable supplement to established variant filtering and annotation pipelines by supplying users with an additional metric related to read mapping quality. NEAT-Repeat can process large and repetitive genomes, such as those of corn and soybean, in a tractable amount of time by leveraging efficient methods for edit distance computation as well as running multiple jobs in parallel. NEAT-Repeat is written in Python 2.7 and C++, and is available at https://github.com/zstephens/neat-repeat.
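
The quantity described above (the minimum read length needed to uniquely map each reference position) can be illustrated with a small brute-force sketch. This is not NEAT-Repeat's code: it assumes error-free reads, exact matching, and the forward strand only, whereas the toolkit accounts for a specified error rate via edit distance. The function names `count_occurrences` and `min_unique_read_lengths` are hypothetical and chosen only for this example.

```python
def count_occurrences(text, pattern):
    """Count occurrences of `pattern` in `text`, including overlapping ones."""
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def min_unique_read_lengths(reference):
    """For each start position, return the shortest read length whose
    sequence occurs exactly once in `reference`.

    Assumptions (illustrative only): zero sequencing errors, exact matching,
    forward strand only. A position gets None when even the longest read
    starting there is not unique.
    """
    n = len(reference)
    result = []
    for start in range(n):
        answer = None
        for length in range(1, n - start + 1):
            read = reference[start:start + length]
            if count_occurrences(reference, read) == 1:
                answer = length
                break
        result.append(answer)
    return result


if __name__ == "__main__":
    # The repeat "ACG" forces longer reads at positions 0 and 4.
    print(min_unique_read_lengths("ACGTACGA"))
    # -> [4, 3, 2, 1, 4, 3, 2, None]
```

The brute-force scan is only practical for toy sequences; it is meant to make the definition concrete rather than to suggest how a genome-scale tool should be built.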


July 7, 2019

Recombination hotspots in an extended human pseudoautosomal domain predicted from double-strand break maps and characterized by sperm-based crossover analysis.

The human X and Y chromosomes are heteromorphic but share a region of homology at the tips of their short arms, pseudoautosomal region 1 (PAR1), that supports obligate crossover in male meiosis. Although the boundary between pseudoautosomal and sex-specific DNA has traditionally been regarded as conserved among primates, it was recently discovered that the boundary position varies among human males, due to a translocation of ~110 kb from the X to the Y chromosome that creates an extended PAR1 (ePAR). This event has occurred at least twice in human evolution. So far, only limited evidence has been presented to suggest this extension is recombinationally active. Here, we sought direct proof by examining thousands of gametes from each of two ePAR-carrying men, for two subregions chosen on the basis of previously published male X-chromosomal meiotic double-strand break (DSB) maps. Crossover activity comparable to that seen at autosomal hotspots was observed between the X and the ePAR borne on the Y chromosome both at a distal and a proximal site within the 110-kb extension. Other hallmarks of classic recombination hotspots included evidence of transmission distortion and GC-biased gene conversion. We observed good correspondence between the male DSB clusters and historical recombination activity of this region in the X chromosomes of females, as ascertained from linkage disequilibrium analysis; this suggests that this region is similarly primed for crossover in both male and female germlines, although sex-specific differences may also exist. Extensive resequencing and inference of ePAR haplotypes, placed in the framework of the Y phylogeny as ascertained by both Y microsatellites and single nucleotide polymorphisms, allowed us to estimate a minimum rate of crossover over the entire ePAR region of 6-fold greater than genome average, comparable with pedigree estimates of PAR1 activity generally. 
We conclude ePAR very likely contributes to the critical crossover function of PAR1.


July 7, 2019

Genomic insights into date palm origins.

With the development of next-generation sequencing technology, the amount of date palm (Phoenix dactylifera L.) genomic data has grown rapidly and yielded new insights into this species and its origins. Here, we review advances in understanding of the evolutionary history of the date palm, with a particular emphasis on what has been learned from the analysis of genomic data. We first record current genomic resources available for date palm including genome assemblies and resequencing data. We discuss new insights into its domestication and diversification history based on these improved genomic resources. We further report recent discoveries such as the existence of wild ancestral populations in remote locations of Oman and high differentiation between African and Middle Eastern populations. While genomic data are consistent with the view that domestication took place in the Gulf region, they suggest that the process was more complex involving multiple gene pools and possibly a secondary domestication. Many questions remain unanswered, especially regarding the genetic architecture of domestication and diversification. We provide a road map to future studies that will further clarify the domestication history of this iconic crop.


July 7, 2019

Complete genome sequence of Bordetella sp. HZ20 sheds light on the ecological role of bacterium without algal-polysaccharides degrading abilities in the brown seaweed-abundant environment

Bordetella sp. HZ20 was isolated from the surface of brown seaweed (Laminaria japonica) and lacks the ability to decompose brown seaweed. The genome of Bordetella sp. HZ20 was sequenced and comprises one circular chromosome of 4,227,194 bp with a DNA G + C content of 55.5%. Genomic annotation showed that Bordetella sp. HZ20 may have chitin degradation-related enzymes, a heparin-sulfate lyase-like protein and enzymes related to the synthesis and utilization of polyhydroxyalkanoate for carbon utilization; nitrate and nitrite reductase, glutamate dehydrogenase, glutamate synthase and glutamine synthetase for the nitrogen cycle; polyphosphate kinases (pkk1 and pkk2), the high-affinity phosphate-specific transport (Pst) system and the low-affinity inorganic phosphate transporter (pitA) for the phosphorus cycle; and cysteine synthase and type III acyl coenzyme A transferase (dddD) for the sulfur cycle. These features indicate the metabolic patterns of Bordetella sp. HZ20 in the C, N, P and S cycles. In addition, the predicted Pst system and cysteine synthase are also related to biofilm formation, which suggests the potential pathogenicity of Bordetella sp. HZ20 to the cells of animals or plants. This study provides evidence about the metabolic patterns of Bordetella sp. HZ20 and broadens our understanding of the ecological roles of bacteria without algal-polysaccharide-degrading abilities in the brown seaweed-abundant environment.


July 7, 2019

Lifestyle of Lactobacillus hordei isolated from water kefir based on genomic, proteomic and physiological characterization.

Water kefir is a traditional fermented beverage made from sucrose, water, kefir granules, and dried or fresh fruits. In our water kefir granules, Lactobacillus (L.) hordei is one of the predominant lactic acid bacteria (LAB) species of this presumed symbiotic consortium. It faces abundant sucrose versus limitation of amino- and fatty acids in an acidic environment. Sequencing of the genome of L. hordei TMW 1.1822 revealed one chromosome plus three plasmids. The size of the chromosome was 2.42 Mbp with a GC content of 35% and 2461 predicted coding sequences. Furthermore, we identified 1474 proteins upon growth on water kefir medium. Metabolic prediction revealed all enzymes required for the glycolytic Embden-Meyerhof (EMP) and phosphoketolase (PKP) pathways. Genes encoding all enzymes involved in citrate, pyruvate and mannitol metabolism are present. Moreover, it was confirmed that L. hordei is prototrophic for 11 amino acids and auxotrophic for 6 amino acids when combining putative biosynthesis pathways for amino acids with physiological characterization. Still, for glycine, serine and methionine no definite auxotype could be determined. The OppABCDF peptide transport system is complete, and 13 genes encoding peptidases are present. The arginine deiminase system was predicted to be complete except for carbamate kinase, thus enabling neutralization reactions via ammonium formation but no additional energy generation. Taken together, our findings enable prediction of the L. hordei lifestyle in water kefir: abundant sucrose is consumed directly via the parallel EMP and PKP pathways and is also extracellularly converted to dextran and fructose by a glucansucrase, leaving fructose as an additional carbon source. Essential amino acids (in the form of peptides) and citrate are acquired from fruits. In the absence of FabB, unsaturated fatty acids are synthesized by predicted alternative enzymes. Formation of acetoin and diacetyl as well as arginine conversion reactions enable acidification limitation. Other members of the water kefir consortium (yeasts, acetic acid bacteria) likely facilitate or support growth of L. hordei by delivering gluconate, mannitol, amino- and fatty acids and vitamins. Copyright © 2018 Elsevier B.V. All rights reserved.


July 7, 2019

Genome analysis of Rhodococcus Sp. DSSKP-R-001: A highly effective β-estradiol-degrading bacterium.

We screened for bacteria that use β-estradiol (E2) as their sole source of carbon and energy for growth, identified the isolate as Rhodococcus, and named it DSSKP-R-001. For a better understanding of the metabolic potential of the strain, whole genome sequencing of Rhodococcus DSSKP-R-001 and annotation of the functional genes were performed. The draft genome is approximately 5.4 Mbp with a G + C content of 68.72% and 5180 predicted protein-coding genes. The genome of Rhodococcus strain DSSKP-R-001 consists of three replicons: one chromosome and two plasmids of 5.2, 0.09, and 0.09 Mbp, respectively. The results showed that there were ten steroid-degrading enzymes distributed in the whole genome of the strain. The existence and expression of estradiol-degrading enzymes were verified by PCR and RT-PCR. Finally, comparative genomics was used to compare multiple strains of Rhodococcus. It was found that Rhodococcus DSSKP-R-001 had the highest similarity to Rhodococcus sp. P14 and there were 2070 core genes shared with Rhodococcus sp. P14, Rhodococcus jostii RHA1, Rhodococcus opacus B4, and Rhodococcus equi 103S, showing evolutionary homology. In summary, this study provides a comprehensive understanding of the role of Rhodococcus DSSKP-R-001 in efficient estradiol degradation, which is relevant both to bioremediation and to evolution within Rhodococcus.


July 7, 2019

Complete genome sequences of three Leptospira mayottensis strains from tenrecs that are endemic in the Malagasy region

Leptospirosis is a zoonosis caused by Leptospira, a diversified genus containing more than 10 pathogenic species. Tenrecs are small terrestrial mammals endemic in the Malagasy region and are known to be reservoirs of the recently described species Leptospira mayottensis. We report the complete genome sequences of three L. mayottensis strains isolated from two tenrec species.


July 7, 2019

Genome analysis of Vallitalea guaymasensis strain L81 isolated from a deep-sea hydrothermal vent system.

Abyssivirga alkaniphila strain L81T, recently isolated from a black smoker biofilm at the Loki’s Castle hydrothermal vent field, was previously described as a mesophilic, obligately anaerobic heterotroph able to ferment carbohydrates, peptides, and aliphatic hydrocarbons. The strain was classified as a new genus within the family Lachnospiraceae. Herein, its genome is analyzed and A. alkaniphila is reassigned to the genus Vallitalea as a new strain of V. guaymasensis, designated V. guaymasensis strain L81. The 6.4 Mbp genome contained 5651 protein-encoding genes, of which 4043 were given a functional prediction. Pathways for fermentation of mono-saccharides, di-saccharides, peptides, and amino acids were identified, whereas a complete pathway for the fermentation of n-alkanes was not found. Growth on carbohydrates and proteinous compounds supported methane production in co-cultures with Methanoplanus limicola. Multiple confurcating hydrogen-producing hydrogenases, a putative bifurcating electron-transferring flavoprotein–butyryl-CoA dehydrogenase complex, and an Rnf complex form a basis for the observed hydrogen production and a putative reverse electron transport in V. guaymasensis strain L81. Combined with the observation that n-alkanes did not support growth in co-cultures with M. limicola, it seemed more plausible that the previously observed degradation patterns of crude oil in strain L81 are explained by unspecific activation and may represent a detoxification mechanism, an interesting ecological function. Genes encoding a capacity for polyketide synthesis, prophages, and resistance to antibiotics indicate interactions with the co-occurring microorganisms. This study sheds light on the function of the fermentative microorganisms from hydrothermal vent systems and adds valuable information on the bioprospecting potential emerging in deep-sea hydrothermal systems.


July 7, 2019

Genetic structure of four plasmids found in Acinetobacter baumannii isolate D36 belonging to lineage 2 of global clone 1.

Four plasmids ranging in size from 4.7 to 44.7 kb found in the extensively antibiotic resistant Acinetobacter baumannii isolate D36 that belongs to lineage 2 of global clone 1 were examined. D36 includes two cryptic plasmids and two carrying antibiotic resistance genes. The smallest plasmid pD36-1 (4.7 kb) carries no resistance genes but includes mobA and mobC mobilisation genes related to those found in pRAY* (pD36-2, 6,078 bp) that also carries the aadB gentamicin, kanamycin and tobramycin resistance gene cassette. These two plasmids do not encode a Rep protein. Plasmid pRAY* was found to be mobilised at high frequency by the large conjugative plasmid pA297-3 but a pRAY* derivative lacking the mobA and mobC genes was not. The two larger plasmids, pD36-3 and pD36-4, encode Rep_3 family proteins (Pfam1051). The cryptic plasmid pD36-3 (6.2 kb) has RepAci1 and pD36-4 (44.7 kb) encodes two novel Rep_3 family proteins suggesting a co-integrate. Plasmid pD36-4 includes the sul2 sulfonamide resistance gene, the aphA1a kanamycin/neomycin resistance gene in Tn4352::ISAba1 and a mer module in a hybrid Tn501/Tn1696 transposon conferring resistance to mercuric ions. New examples of dif modules flanked by pdif sites (XerC-XerD binding sites) that are part of many A. baumannii plasmids were also identified in pD36-3 and pD36-4 which carry three and two dif modules, respectively. Homologs of three dif modules, the sup sulphate permease module in pD36-3, and of the abkAB toxin-antitoxin module and the orf module in pD36-4, were found in different contexts in diverse Acinetobacter plasmids, consistent with module mobility. A novel insertion sequence named ISAba32 found next to the pdif site in the abkAB dif module is related to members of the ISAjo2 group which also are associated with the pdif sites of dif modules. Plasmids found in D36 were also found in some other members of GC1 lineage 2.


July 7, 2019

An investigation of Y chromosome incorporations in 400 species of Drosophila and related genera.

Y chromosomes are widely believed to evolve from a normal autosome through a process of massive gene loss (with preservation of some male genes), shaped by sex-antagonistic selection and complemented by occasional gains of male-related genes. The net result of these processes is a male-specialized chromosome. This might be expected to be an irreversible process, but it was found in 2005 that the Drosophila pseudoobscura Y chromosome was incorporated into an autosome. Y chromosome incorporations have important consequences: a formerly male-restricted chromosome reverts to autosomal inheritance, and the species may shift from an XY/XX to X0/XX sex-chromosome system. In order to assess the frequency and causes of this phenomenon we searched for Y chromosome incorporations in 400 species from Drosophila and related genera. We found one additional large scale event of Y chromosome incorporation, affecting the whole montium subgroup (40 species in our sample); overall 13% of the sampled species (52/400) have Y incorporations. While previous data indicated that after the Y incorporation the ancestral Y disappeared as a free chromosome, the much larger data set analyzed here indicates that a copy of the Y survived as a free chromosome both in montium and pseudoobscura species, and that the current Y of the pseudoobscura lineage results from a fusion between this free Y and the neoY. The 400 species sample also showed that the previously suggested causal connection between X-autosome fusions and Y incorporations is, at best, weak: the new case of Y incorporation (montium) does not have X-autosome fusion, whereas nine independent cases of X-autosome fusions were not followed by Y incorporations. Y incorporation is an underappreciated mechanism affecting Y chromosome evolution; our results show that at least in Drosophila it plays a relevant role and highlight the need for similar studies in other groups.

