April 21, 2020

Haplotype-aware diplotyping from noisy long reads.

Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Presently, third-generation sequencing platforms are rapidly becoming more widespread, yet approaches for leveraging their long but error-prone reads for genotyping are lacking. Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set of the Genome-in-a-Bottle effort.
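To illustrate what joint inference of haplotypes and genotypes means, here is a minimal brute-force sketch in Python. It is not the authors' framework. It assumes a toy model in which each read is drawn from one of the two haplotypes with probability 1/2 and each observed allele is miscalled independently at a fixed error rate `eps`; all function names are hypothetical. Because a read spanning several sites must be explained by a single haplotype, linkage between sites informs the genotype call at every site.

```python
from itertools import product
from math import exp, log
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, ...]
Read = Dict[int, int]  # site index -> observed allele (0 = ref, 1 = alt)


def read_log_likelihood(read: Read, hap: Haplotype, eps: float) -> float:
    """log P(read | hap): each covered site matches hap with probability 1 - eps."""
    return sum(log(1 - eps) if hap[site] == allele else log(eps)
               for site, allele in read.items())


def diplotype_log_likelihood(reads: Sequence[Read], h1: Haplotype,
                             h2: Haplotype, eps: float) -> float:
    """Sum over reads of log(0.5 * P(read | h1) + 0.5 * P(read | h2))."""
    total = 0.0
    for read in reads:
        a = read_log_likelihood(read, h1, eps)
        b = read_log_likelihood(read, h2, eps)
        m = max(a, b)  # log-sum-exp trick for numerical stability
        total += m + log(0.5 * exp(a - m) + 0.5 * exp(b - m))
    return total


def diplotype(reads: Sequence[Read], n_sites: int,
              eps: float = 0.1) -> Tuple[Haplotype, Haplotype, List[int]]:
    """Exhaustively pick the most likely unordered haplotype pair.

    Returns both haplotypes and the implied genotypes (alt-allele counts 0/1/2).
    """
    if not 0 < eps < 0.5:
        raise ValueError("eps must be in (0, 0.5)")
    haps = list(product((0, 1), repeat=n_sites))
    best = None
    for i, h1 in enumerate(haps):
        for h2 in haps[i:]:  # (h1, h2) and (h2, h1) are the same diplotype
            ll = diplotype_log_likelihood(reads, h1, h2, eps)
            if best is None or ll > best[0]:
                best = (ll, h1, h2)
    _, h1, h2 = best
    return h1, h2, [a + b for a, b in zip(h1, h2)]


if __name__ == "__main__":
    # Toy long reads spanning three sites, drawn from haplotypes 010 and 101.
    reads = [{0: 0, 1: 1, 2: 0}] * 3 + [{0: 1, 1: 0, 2: 1}] * 3
    print(diplotype(reads, n_sites=3))  # → ((0, 1, 0), (1, 0, 1), [1, 1, 1])
```

Exhaustive enumeration grows roughly as 4^n in the number of sites, so this only works for a handful of variants; practical tools typically rely on dynamic programming instead.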


April 21, 2020

Comparative Phylogenomics, a Stepping Stone for Bird Biodiversity Studies

Birds are a group with an immense wealth of genomic resources, and hundreds of new genomes are forthcoming. We review recent developments in whole-genome sequencing, phylogenomics, and comparative genomics of birds. Short-read-based genome assemblies are common, largely owing to the efforts of the Bird 10K genome project (B10K). Chromosome-level assemblies are expected to become more numerous thanks to improved long-read sequencing. The available genomic data have enabled the reconstruction of the bird tree of life with increasing confidence and resolution, but challenges remain in resolving the early splits of Neoaves because of their explosive diversification after the Cretaceous-Paleogene (K-Pg) event. Continued genomic sampling of the bird tree of life will not only better reflect their evolutionary history but also shed new light on the organization of phylogenetic signal and conflict across the genome. The comparatively simple architecture of avian genomes makes them a powerful system for studying the molecular foundations of bird-specific traits. Birds are on the verge of becoming an extremely resource-rich system for studying biodiversity from the nucleotide up.


April 21, 2020

A draft genome for Spatholobus suberectus.

Spatholobus suberectus Dunn (S. suberectus), which belongs to the Leguminosae, is an important medicinal plant in China. Owing to its long growth cycle and increased use in human medicine, wild resources of S. suberectus have decreased rapidly and may be on the verge of extinction. De novo assembly of the whole S. suberectus genome provides a critical resource for studying the biosynthesis of the plant's main bioactive components and the mechanisms regulating its seed development. Using several sequencing technologies, including Illumina HiSeq X Ten, single-molecule real-time sequencing, and 10x Genomics, together with newer assembly techniques such as FALCON and chromatin interaction mapping (Hi-C), we assembled a chromosome-scale genome of about 798 Mb. In total, 748 Mb (93.73%) of the contig sequences were anchored onto nine chromosomes, with the longest scaffold being 103.57 Mb. Further annotation analyses predicted 31,634 protein-coding genes, of which 93.9% have been functionally annotated. All data generated in this study are available in public databases.


April 21, 2020

Development of a metabolic pathway transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii.

Clostridium spp. can synthesize valuable chemicals and fuels by utilizing diverse waste-stream substrates, including starchy biomass, lignocellulose, and industrial waste gases. However, metabolic engineering in Clostridium spp. is challenging due to the low efficiency of gene transfer and genomic integration of entire biosynthetic pathways. We have developed a reliable gene transfer and genomic integration system for the syngas-fermenting bacterium Clostridium ljungdahlii based on the conjugal transfer of donor plasmids containing large transgene cassettes (>5 kb), followed by the inducible activation of Himar1 transposase to promote integration. We established a conjugation protocol for the efficient generation of transconjugants using the Gram-positive origins of replication repL and repH. We also investigated the impact of DNA methylation on conjugation efficiency by testing donor constructs with all possible combinations of Dam and Dcm methylation patterns, and used bisulfite conversion and PacBio sequencing to determine the DNA methylation profile of the C. ljungdahlii genome, resulting in the detection of four sequence motifs with N6-methyladenosine. As proof of concept, we demonstrated the transfer and genomic integration of a heterologous acetone biosynthesis pathway using a Himar1 transposase system regulated by a xylose-inducible promoter. The functionality of the integrated pathway was confirmed by detecting enzyme proteotypic peptides and the formation of acetone and isopropanol by C. ljungdahlii cultures utilizing syngas as a carbon and energy source. The developed multi-gene delivery system offers a versatile tool to integrate and stably express large biosynthetic pathways in the industrially promising syngas-fermenting microorganism C. ljungdahlii. The simple transfer and stable integration of large gene clusters (such as entire biosynthetic pathways) expands the range of possible fermentation products of recombinant strains expressing heterologous pathways. We also believe that the developed gene delivery system can be adapted to other clostridial strains.


April 21, 2020

The bile salt glycocholate induces global changes in gene and protein expression and activates virulence in enterotoxigenic Escherichia coli.

Pathogenic bacteria use specific host factors to modulate virulence and stress responses during infection. We found previously that the host factor bile and the bile component glyco-conjugated cholate (NaGCH, sodium glycocholate) upregulate the colonization factor CS5 in enterotoxigenic Escherichia coli (ETEC). To further understand the global regulatory effects of bile and NaGCH, we performed Illumina RNA-Seq and found that crude bile and NaGCH altered the expression of 61 genes in CS5 + CS6 ETEC isolates. The most striking finding was high induction of the CS5 operon (csfA-F), its putative transcription factor csvR, and the putative ETEC virulence factor cexE. iTRAQ-coupled LC-MS/MS proteomic analyses verified induction of the plasmid-borne virulence proteins CS5 and CexE and also showed that NaGCH affected the expression of bacterial membrane proteins. Furthermore, NaGCH induced bacteria to aggregate, increased their adherence to epithelial cells, and reduced their motility. Our results indicate that CS5 + CS6 ETEC use NaGCH present in the small intestine as a signal to initiate colonization of the epithelium.


April 21, 2020

Survival Mechanisms of Campylobacter hepaticus Identified by Genomic Analysis and Comparative Transcriptomic Analysis of in vivo and in vitro Derived Bacteria.

Chickens infected with Campylobacter jejuni or Campylobacter coli are largely asymptomatic; however, infection with the closely related species Campylobacter hepaticus can result in Spotty Liver Disease (SLD). C. hepaticus has been detected in the liver, bile, small intestine, and caecum of SLD-affected chickens. The survival and colonization mechanisms that C. hepaticus uses to colonize chickens remain unknown. In this study, we compared the genome sequences of 14 newly sequenced Australian isolates of C. hepaticus, isolates from outbreaks in the United Kingdom, and reference strains of C. jejuni and C. coli, with the aim of identifying virulence genes associated with SLD. We also carried out a global comparative transcriptomic analysis between C. hepaticus recovered from the bile of SLD-infected chickens and C. hepaticus grown in vitro. This revealed how the bacteria adapt to proliferate in the challenging host environment in which they are found. Additionally, biochemical experiments confirmed some in silico metabolic predictions. We found that, unlike other Campylobacter spp., C. hepaticus encodes glucose and polyhydroxybutyrate metabolism pathways. This study demonstrated the metabolic plasticity of C. hepaticus, which may contribute to its survival in the competitive, nutrient- and energy-limited environment of the chicken. Transcriptomic analysis indicated that gene clusters associated with glucose utilization, stress response, hydrogen metabolism, and sialic acid modification may play an important role in the pathogenicity of C. hepaticus. An understanding of the survival and virulence mechanisms that C. hepaticus uses will help to direct the development of effective intervention methods to protect birds from the debilitating effects of SLD.


April 21, 2020

Genome expansion of an obligate parthenogenesis-associated Wolbachia poses an exception to the symbiont reduction model.

Theory predicts that dependency within host-endosymbiont interactions results in endosymbiont genome size reduction. Unexpectedly, the largest Wolbachia genome was found in the obligate, parthenogenesis-associated wFol. In this study, we investigate possible processes underlying this genome expansion by comparing a re-annotated wFol genome to other Wolbachia genomes. In addition, we search for candidate genes related to parthenogenesis induction (PI). Within wFol, we found five phage WO regions representing 25.4% of the complete genome, few pseudogenized genes, and an expansion of DNA-repair genes in comparison to other Wolbachia. These signs of genome conservation were mirrored in the wFol host, the springtail F. candida, which also had an expanded DNA-repair gene family and many horizontally transferred genes. Across all Wolbachia genomes, there was a strong correlation between the gene numbers of Wolbachia strains and those of their hosts. To identify genes with a potential link to PI, we assembled the genome of an additional PI strain, wLcla. Comparisons between four PI Wolbachia, including wFol and wLcla, and fourteen non-PI Wolbachia yielded a small set of potential candidate genes for further investigation. The strong similarities in genome content between wFol and its host, as well as the correlation between host and Wolbachia gene numbers, suggest that there may be some form of convergent evolution between endosymbiont and host genomes. If such convergent evolution were strong enough to overcome the evolutionary forces causing genome reduction, it would enable expanded genomes within long-term obligate endosymbionts.


April 21, 2020

Using Pan RNA-Seq Analysis to Reveal the Ubiquitous Existence of 5′ and 3′ End Small RNAs.

In this study, we used pan RNA-seq analysis to reveal the ubiquitous existence of both 5′ and 3′ end small RNAs (5′ and 3′ sRNAs). 5′ and 3′ sRNAs alone can be used to annotate nuclear non-coding and mitochondrial genes at 1-bp resolution and to identify new steady RNAs, which are usually transcribed from functional genes. We therefore provide a simple and cost-effective way to annotate nuclear non-coding and mitochondrial genes and to identify new steady RNAs, particularly long non-coding RNAs (lncRNAs). Using 5′ and 3′ sRNAs, we corrected the annotation of the human mitochondrial genome and report, for the first time, a novel ncRNA named non-coding mitochondrial RNA 1 (ncMT1). We also found that most human tRNA genes have downstream lncRNA genes, such as lncTRS-TGA1-1, and corrected previous misinterpretations of them. Using 5′, 3′, and intronic sRNAs, we report for the first time that enzymatic double-stranded RNA (dsRNA) cleavage and RNA interference (RNAi) might be involved in the RNA degradation and gene expression regulation of U1 snRNA in humans. This offers a different perspective on the regulation of U1 snRNA expression. We also offer a novel view of cancer and virus-induced diseases, which may help identify diagnostic or therapeutic targets in the ribonuclease III (RNase III) family and its related pathways. Our findings pave the way toward a rediscovery of dsRNA cleavage and RNAi, challenging classical theories.


April 21, 2020

One reference genome is not enough

A recent study on human structural variation indicates insufficiencies and errors in the human reference genome, GRCh38, and argues for the construction of a human pan-genome.


April 21, 2020

Adaptive Strategies in a Poly-Extreme Environment: Differentiation of Vegetative Cells in Serratia ureilytica and Resistance to Extreme Conditions.

Poly-extreme terrestrial habitats are often used as analogs of extra-terrestrial environments. Understanding the adaptive strategies that allow bacteria to thrive and survive under these conditions could aid our search for extra-terrestrial planets suitable for life and help us understand how life evolved under the harsh conditions of the early Earth. A prime example of such a survival strategy is the modification of vegetative cells into resistant resting structures. These differentiated cells are often observed in response to harsh environmental conditions. The environmental strain Lr5/4, belonging to Serratia ureilytica, was isolated from a geothermal spring in Lirima, Atacama Desert, Chile. The Atacama Desert is the driest habitat on Earth and, owing to its high altitude, is also exposed to increased UV radiation. The geothermal spring from which the strain was isolated is oligotrophic, and its temperature of 54°C exceeds mesophilic conditions (15 to 45°C). Although the vegetative cells were tolerant to various environmental insults (desiccation, extreme pH, glycerol), a modified cell type was formed in response to nutrient deprivation, UV radiation, and thermal shock. Scanning (SEM) and transmission electron microscopy (TEM) analyses of vegetative cells and the modified cell structures were performed. In SEM, a change toward a circular shape with reduced size was observed. Under TEM, these circular cells appeared to possess extra coating layers. The resistance of the modified cells was also investigated: they were resistant to wet heat, UV radiation, and desiccation, whereas vegetative cells withstood none of these conditions. A phylogenomic analysis was undertaken to investigate the presence of known genes involved in dormancy in other bacterial clades. Genes related to spore formation in Myxococcus and Firmicutes were found in the S. ureilytica Lr5/4 genome; however, these genes were not sufficient for a full sporulation pathway resembling that of either group. Although the molecular pathway of cell differentiation in S. ureilytica Lr5/4 is not fully defined, the identified genes may contribute to the modified phenotype in the Serratia genus. Here, we show that a modified cell structure can occur as a response to extreme conditions in a species not previously known to deploy this strategy. This strategy may be widespread in bacteria but expressed only under poly-extreme environmental conditions.


April 21, 2020

Systematic analysis of dark and camouflaged genes reveals disease-relevant genes hiding in plain sight.

The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads, which we call dark by depth, and others with ambiguous alignment, which we call camouflaged. We assess how well long-read or linked-read technologies resolve these regions. Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls. While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies.
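The two kinds of dark region described above can be sketched as a per-position classification. The Python below is a minimal illustration, not the authors' algorithm: it assumes per-base read depth and the per-base fraction of reads with low mapping quality (MAPQ) have already been computed elsewhere, and the thresholds `max_dark_depth=5` and `min_low_mapq_fraction=0.9` are hypothetical defaults chosen for illustration, not the published cut-offs. A low-MAPQ position stands in for ambiguous alignment here, which is only an approximation of "camouflaged".

```python
from typing import Dict, Sequence


def classify_positions(depths: Sequence[int], low_mapq_fractions: Sequence[float],
                       max_dark_depth: int = 5,
                       min_low_mapq_fraction: float = 0.9) -> Dict[str, int]:
    """Count positions that are dark by depth, ambiguously aligned, or resolved.

    Thresholds are illustrative assumptions, not the published cut-offs.
    """
    if len(depths) != len(low_mapq_fractions):
        raise ValueError("inputs must have the same length")
    counts = {"dark_by_depth": 0, "ambiguous_mapq": 0, "resolved": 0}
    for depth, frac in zip(depths, low_mapq_fractions):
        if depth <= max_dark_depth:
            counts["dark_by_depth"] += 1  # too few reads to call variants
        elif frac >= min_low_mapq_fraction:
            counts["ambiguous_mapq"] += 1  # reads present but placement ambiguous
        else:
            counts["resolved"] += 1
    return counts


def percent_dark(counts: Dict[str, int]) -> float:
    """Percentage of positions flagged as either kind of dark."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no positions")
    return 100.0 * (counts["dark_by_depth"] + counts["ambiguous_mapq"]) / total


if __name__ == "__main__":
    counts = classify_positions([0, 3, 20, 20, 30], [0.0, 0.0, 0.95, 0.1, 0.0])
    print(counts, percent_dark(counts))
```

Applied per gene body, `percent_dark(...) == 100` would correspond to a "completely dark" gene body and `percent_dark(...) >= 5` to one that is at least 5% dark.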


April 21, 2020

High-coverage, long-read sequencing of Han Chinese trio reference samples.

Single-molecule long-read sequencing datasets were generated for a son-father-mother trio of Han Chinese descent that is part of the Genome in a Bottle (GIAB) consortium portfolio. The dataset was generated using the Pacific Biosciences Sequel System. The son was sequenced to an average coverage of 60× and each parent to 30×, with N50 subread lengths between 16 and 18 kb. Raw reads and reads aligned to both GRCh37 and GRCh38 are available at the NCBI GIAB ftp site (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/). The GRCh38-aligned read data are archived in NCBI SRA (SRX4739017, SRX4739121, and SRX4739122). This dataset is available for anyone to develop and evaluate long-read bioinformatics methods.
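N50 and average coverage are the two summary statistics quoted for this dataset. The Python below is a minimal sketch of their conventional definitions; the function names are hypothetical, and the GIAB values above were not recomputed with it. N50 is the length L such that reads of length ≥ L together contain at least half of all sequenced bases, and mean coverage is the total number of sequenced bases divided by the genome size.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # crossed the halfway point
            return length
    raise AssertionError("unreachable for non-empty input")


def mean_coverage(lengths: Sequence[int], genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


if __name__ == "__main__":
    reads = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy read lengths, 54 bases in total
    print(n50(reads))  # → 8, since 10 + 9 + 8 = 27 is half of 54
```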


April 21, 2020

The tech for the next decade: promises and challenges in genome biology.

The 19th Annual Advances in Genome Biology and Technology (AGBT) meeting returned to Marco Island, Florida, and was held at the renovated venue from 27 February to 2 March 2019. The meeting showcased a variety of new technologies, both in the wet lab and in bioinformatics. This year’s themes included single-cell technology and applications, spatially resolved gene expression measurements, new sequencing platforms, genome assembly and variation, and long and linked reads.


April 21, 2020

Plasmids of Shigella flexneri serotype 1c strain Y394 provide advantages to bacteria in the host.

Shigella flexneri has an extremely complex genome with a significant number of virulence traits acquired through mobile genetic elements, including bacteriophages and plasmids. S. flexneri serotype 1c is an emerging etiological agent of bacillary dysentery in developing countries. In this study, the complete nucleotide sequences of two plasmids of S. flexneri serotype 1c strain Y394 were determined and analysed. The plasmid pINV-Y394 is an invasive or virulence plasmid of 221,293 bp composed of a large number of insertion sequences (IS), virulence genes, and regulatory and maintenance genes. Three hundred and twenty-eight open reading frames (ORFs) were identified in pINV-Y394, of which about half (159 ORFs) were identified as IS elements. Ninety-seven ORFs were related to characterized genes (the majority of which are associated with virulence and their regulons), and 72 ORFs were uncharacterized or hypothetical genes. The second plasmid, pNV-Y394, is 10,866 bp in size and encodes genes conferring resistance against multiple antibiotics of clinical importance. The multidrug resistance gene cassette consists of the tetracycline resistance gene tetA, the streptomycin resistance genes strA-strB, and the sulfonamide-resistant dihydropteroate synthase gene sul2. These two plasmids together play a key role in the fitness of Y394 in the host environment. The findings from this study indicate that the pathogenic S. flexneri is a highly niche-adaptive pathogen that is able to co-evolve with its host and respond to the selection pressure in its environment.


April 21, 2020

Complete genome sequence of acetate-producing Klebsiella pneumoniae L5-2 isolated from infant feces.

Acetate is an important metabolite in infants, as it can affect metabolism as well as immune and inflammatory responses. However, there have been no studies on acetate production by Klebsiella pneumoniae isolated from infant feces. In this study, we isolated a K. pneumoniae strain, L5-2, from infant feces and found that it produces acetate. The genome of L5-2 consists of a 5,237,123-bp single chromosome and a 139,211-bp single plasmid. The G + C content was 57.27%. By whole-genome analysis of K. pneumoniae L5-2, we identified seven genes related to acetate production (poxA, pta, eutD, ackA, eutP, eutQ, and adhE). We confirmed acetate production by K. pneumoniae L5-2 by ion chromatography. The aldehyde/alcohol dehydrogenase (adhE) activity of K. pneumoniae L5-2 was significantly higher than that of K. pneumoniae subsp. ozaenae ATCC 11296. Thus, the acetate-producing ability of K. pneumoniae L5-2 was influenced by the adhE gene. In addition, K. pneumoniae L5-2 had significantly fewer virulence factor-encoding genes than other K. pneumoniae strains isolated from humans. In conclusion, K. pneumoniae L5-2 isolated from infant feces has fewer virulence factors and higher adhE activity than other K. pneumoniae strains.
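The G + C content reported above is the percentage of guanine and cytosine bases in the assembled sequence. The Python below is a minimal sketch of that calculation with a hypothetical function name; ignoring ambiguous symbols such as N is an assumption, and conventions vary between tools. Whether the published 57.27% covers the chromosome alone or chromosome plus plasmid is not stated in the abstract.

```python
def gc_content(seq: str) -> float:
    """Percent G + C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    print(gc_content("GGCCAATT"))  # → 50.0
```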

