April 21, 2020  |  

Construction of JRG (Japanese reference genome) with single-molecule real-time sequencing

In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bps to ~10,000 bps in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals into GRCh38. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.


April 21, 2020  |  

Retrotranspositional landscape of Asian rice revealed by 3000 genomes.

The recent release of genomic sequences for 3000 rice varieties provides access to the genetic diversity at species level for this crop. We take advantage of this resource to unravel some features of the retrotranspositional landscape of rice. We develop the software TRACKPOSON specifically for the detection of transposable element insertion polymorphisms (TIPs) from large datasets. We apply this tool to 32 families of retrotransposons and identify more than 50,000 TIPs in the 3000 rice genomes. Most polymorphisms are found at very low frequency, suggesting that they may have occurred recently in agro. A genome-wide association study shows that these activations in rice may be triggered by external stimuli, rather than by the alteration of genetic factors involved in transposable element silencing pathways. Finally, the TIPs dataset is used to trace the origin of rice domestication. Our results suggest that rice originated from three distinct domestication events.


September 22, 2019  |  

A chromosome conformation capture ordered sequence of the barley genome.

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.


September 22, 2019  |  

The third revolution in sequencing technology.

Forty years ago the advent of Sanger sequencing was revolutionary as it allowed complete genome sequences to be deciphered for the first time. A second revolution came when next-generation sequencing (NGS) technologies appeared, which made genome sequencing much cheaper and faster. However, NGS methods have several drawbacks and pitfalls, most notably their short reads. Recently, third-generation/long-read methods appeared, which can produce genome assemblies of unprecedented quality. Moreover, these technologies can directly detect epigenetic modifications on native DNA and allow whole-transcript sequencing without the need for assembly. This marks the third revolution in sequencing technology. Here we review and compare the various long-read methods. We discuss their applications and their respective strengths and weaknesses and provide future perspectives. Copyright © 2018 Elsevier Ltd. All rights reserved.


September 22, 2019  |  

wtf genes are prolific dual poison-antidote meiotic drivers.

Meiotic drivers are selfish genes that bias their transmission into gametes, defying Mendelian inheritance. Despite the significant impact of these genomic parasites on evolution and infertility, few meiotic drive loci have been identified or mechanistically characterized. Here, we demonstrate a complex landscape of meiotic drive genes on chromosome 3 of the fission yeasts Schizosaccharomyces kambucha and S. pombe. We identify S. kambucha wtf4 as one of these genes that acts to kill gametes (known as spores in yeast) that do not inherit the gene from heterozygotes. wtf4 utilizes dual, overlapping transcripts to encode both a gamete-killing poison and an antidote to the poison. To enact drive, all gametes are poisoned, whereas only those that inherit wtf4 are rescued by the antidote. Our work suggests that the wtf multigene family proliferated due to meiotic drive and highlights the power of selfish genes to shape genomes, even while imposing tremendous costs to fertility.


September 22, 2019  |  

Cataloguing over-expressed genes in Epstein Barr Virus immortalized lymphoblastoid cell lines through consensus analysis of PacBio transcriptomes corroborates hypomethylation of chromosome 1

The ability of Epstein Barr Virus (EBV) to transform resting B-cells into immortalized lymphoblastoid cell lines (LCL) provides a continuous source of peripheral blood lymphocytes that are used to model conditions in which these lymphocytes play a key role. Here, the PacBio generated transcriptome of three LCLs from a parent-daughter trio (SRAid:SRP036136) provided by a previous study [1] was analyzed using a kmer-based version of YeATS (KEATS). The set of over-expressed genes in these cell lines was determined based on a comparison with the PacBio transcriptome of twenty tissues provided by another study (hOPTRS) [2]. MIR155 long non-coding RNA (MIR155HG), Fc fragment of IgE receptor II (FCER2), T-cell leukemia/lymphoma 1A (TCL1A), and germinal center associated signaling and motility (GCSAM) were genes having the highest expression counts in the three LCLs with no expression in hOPTRS. Other over-expressed genes, having low expression in hOPTRS, were membrane spanning 4-domains A1 (MS4A1) and ribosomal protein S2 pseudogene 55 (RPS2P55). While some of these genes are known to be over-expressed in LCLs, this study provides a comprehensive cataloguing of such genes. A recent work involving a patient with EBV-positive large B-cell lymphoma was “unusually lacking various B-cell markers”, but over-expressing CD30 [3] – a gene ranked 79 among uniquely expressed genes here. Hypomethylation of chromosome 1 observed in EBV immortalized LCLs [4, 5] is also corroborated here by mapping the genes to chromosomes. Extending previous work identifying un-annotated genes [6], 80 genes were identified which are expressed in the three LCLs, not in hOPTRS, and missing in the GENCODE, RefSeq and RefSeqGene databases. KEATS introduces a method of determining expression counts based on a partitioning of the known annotated genes, has runtimes of a few hours on a personal workstation and provides detailed reports enabling proper debugging.


September 22, 2019  |  

Direct chromosome-length haplotyping by single-cell sequencing.

Haplotypes are fundamental to fully characterize the diploid genome of an individual, yet methods to directly chart the unique genetic makeup of each parental chromosome are lacking. Here we introduce single-cell DNA template strand sequencing (Strand-seq) as a novel approach to phasing diploid genomes along the entire length of all chromosomes. We demonstrate this by building a complete haplotype for a HapMap individual (NA12878) at high accuracy (concordance 99.3%), without using generational information or statistical inference. By use of this approach, we mapped all meiotic recombination events in a family trio with high resolution (median range ~14 kb) and phased larger structural variants like deletions, indels, and balanced rearrangements like inversions. Lastly, the single-cell resolution of Strand-seq allowed us to observe loss of heterozygosity regions in a small number of cells, a significant advantage for studies of heterogeneous cell populations, such as cancer cells. We conclude that Strand-seq is a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells. © 2016 Porubský et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019  |  

Genome and secretome analysis of Pochonia chlamydosporia provide new insight into egg-parasitic mechanisms.

Pochonia chlamydosporia infects eggs and females of economically important plant-parasitic nematodes. The fungal isolates parasitizing different nematodes are genetically distinct. To understand their intraspecific genetic differentiation, parasitic mechanisms, and adaptive evolution, we assembled seven putative chromosomes of P. chlamydosporia strain 170 isolated from root-knot nematode eggs (~44 Mb, including 7.19% of transposable elements) and compared them with the genome of the strain 123 (~41 Mb) isolated from cereal cyst nematode. We focus on secretomes of the fungus, which play important roles in pathogenicity and fungus-host/environment interactions, and identified 1,750 secreted proteins, with a high proportion of carboxypeptidases, subtilisins, and chitinases. We analyzed the phylogenies of these genes and predicted new pathogenic molecules. By comparative transcriptome analysis, we found that secreted proteins involved in responses to nutrient stress are mainly comprised of proteases and glycoside hydrolases. Moreover, 32 secreted proteins undergoing positive selection and 71 duplicated gene pairs encoding secreted proteins are identified. Two duplicated pairs encoding secreted glycosyl hydrolases (GH30), which may be related to fungal endophytic process and lost in many insect-pathogenic fungi but exist in nematophagous fungi, are putatively acquired from bacteria by horizontal gene transfer. The results help in understanding genetic origins and evolution of parasitism-related genes.


September 22, 2019  |  

The genome of the Hi5 germ cell line from Trichoplusia ni, an agricultural pest and novel model for small RNA biology.

We report a draft assembly of the genome of Hi5 cells from the lepidopteran insect pest, Trichoplusia ni, assigning 90.6% of bases to one of 28 chromosomes and predicting 14,037 protein-coding genes. Chemoreception and detoxification gene families reveal T. ni-specific gene expansions that may explain its widespread distribution and rapid adaptation to insecticides. Transcriptome and small RNA data from thorax, ovary, testis, and the germline-derived Hi5 cell line show distinct expression profiles for 295 microRNA- and >393 piRNA-producing loci, as well as 39 genes encoding small RNA pathway proteins. Nearly all of the W chromosome is devoted to piRNA production, and T. ni siRNAs are not 2´-O-methylated. To enable use of Hi5 cells as a model system, we have established genome editing and single-cell cloning protocols. The T. ni genome provides insights into pest control and allows Hi5 cells to become a new tool for studying small RNAs ex vivo. © 2018, Fu et al.


September 22, 2019  |  

Cytogenomic analysis of several repetitive DNA elements in turbot (Scophthalmus maximus).

Repetitive DNA plays a fundamental role in the organization, size and evolution of eukaryotic genomes. The sequencing of the turbot revealed a small and compact genome, as in all flatfish studied to date. The assembly of repetitive regions is still incomplete because it is difficult to correctly identify their position, number and array. The combination of classical cytogenetic techniques along with high quality sequencing is essential to increase the knowledge of the structure and composition of these sequences and, thus, of the structure and function of the whole genome. In this work, the in silico analysis of H1 histone, 5S rDNA, telomeric and Rex repetitive sequences, was compared to their chromosomal mapping by fluorescent in situ hybridization (FISH), providing a more comprehensive picture of these elements in the turbot genome. FISH assays confirmed the location of H1 in LG8; 5S rDNA in LG4 and LG6; telomeric sequences at the end of all chromosomes whereas Rex elements were dispersed along most chromosomes. The discrepancies found between both approaches could be related to the sequencing methodology applied in this species and also to the resolution limitations of the FISH technique. Turbot cytogenomic analyses have proven to add new chromosomal landmarks in the karyotype of this species, representing a powerful tool to investigate targeted genomic sequences or regions in the genetic and physical maps of this species. Copyright © 2017 Elsevier B.V. All rights reserved.


September 22, 2019  |  

LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons.

Long terminal repeat retrotransposons (LTR-RTs) are prevalent in plant genomes. The identification of LTR-RTs is critical for achieving high-quality gene annotation. Based on the well-conserved structure, multiple programs were developed for the de novo identification of LTR-RTs; however, these programs are associated with low specificity and high false discovery rates. Here, we report LTR_retriever, a multithreading-empowered Perl program that identifies LTR-RTs and generates high-quality LTR libraries from genomic sequences. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity (91%), specificity (97%), accuracy (96%), and precision (90%) in rice (Oryza sativa). LTR_retriever is also compatible with long sequencing reads. With 40k self-corrected PacBio reads equivalent to 4.5× genome coverage in Arabidopsis (Arabidopsis thaliana), the constructed LTR library showed excellent sensitivity and specificity. In addition to canonical LTR-RTs with 5′-TG…CA-3′ termini, LTR_retriever also identifies noncanonical LTR-RTs (non-TGCA), which have been largely ignored in genome-wide studies. We identified seven types of noncanonical LTRs from 42 out of 50 plant genomes. The majority of noncanonical LTRs are Copia elements, whose LTRs are four times shorter than those of other Copia elements, which may be a result of their target specificity. Strikingly, non-TGCA Copia elements are often located in genic regions and preferentially insert nearby or within genes, indicating their impact on the evolution of genes and their potential as mutagenesis tools. © 2018 American Society of Plant Biologists. All Rights Reserved.
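The four performance figures quoted for rice are standard confusion-matrix ratios. The following is a minimal sketch of how such ratios are computed; the class name, function name, and counts are hypothetical placeholders chosen for illustration and are not taken from the study or from LTR_retriever itself.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of annotated units (e.g. bases), all hypothetical here."""
    tp: int  # LTR sequence correctly identified as LTR
    fp: int  # non-LTR sequence wrongly identified as LTR
    tn: int  # non-LTR sequence correctly left unannotated
    fn: int  # LTR sequence that was missed


def metrics(c: Confusion) -> dict:
    """Return the four standard ratios as fractions between 0 and 1."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true LTRs recovered
        "specificity": c.tn / (c.tn + c.fp),  # share of non-LTRs rejected
        "accuracy": (c.tp + c.tn) / total,    # share of all calls that are right
        "precision": c.tp / (c.tp + c.fp),    # share of LTR calls that are right
    }


# Illustrative counts only, not data from the paper.
example = Confusion(tp=90, fp=10, tn=390, fn=10)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Reporting all four values together matters because a tool can score well on sensitivity while still producing many false positives, which only specificity and precision reveal.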


September 22, 2019  |  

Genome sequences of Chlorella sorokiniana UTEX 1602 and Micractinium conductrix SAG 241.80: implications to maltose excretion by a green alga.

Green algae represent a key segment of the global species capable of photoautotrophic-driven biological carbon fixation. Algae partition fixed-carbon into chemical compounds required for biomass, while diverting excess carbon into internal storage compounds such as starch and lipids or, in certain cases, into targeted extracellular compounds. Two green algae were selected to probe for critical components associated with sugar production and release in a model alga. Chlorella sorokiniana UTEX 1602 – which does not release significant quantities of sugars to the extracellular space – was selected as a control to compare with the maltose-releasing Micractinium conductrix SAG 241.80 – which was originally isolated from an endosymbiotic association with the ciliate Paramecium bursaria. Both strains were subjected to three sequencing approaches to assemble their genomes and annotate their genes. This analysis was further complemented with transcriptional studies during maltose release by M. conductrix SAG 241.80 versus conditions where sugar release is minimal. The annotation revealed that both strains contain homologs for the key components of a putative pathway leading to cytosolic maltose accumulation, while transcriptional studies found few changes in mRNA levels for the genes associated with these established intracellular sugar pathways. A further analysis of potential sugar transporters found multiple homologs for SWEETs and tonoplast sugar transporters. The analysis of transcriptional differences revealed a lesser and more measured global response for M. conductrix SAG 241.80 versus C. sorokiniana UTEX 1602 during conditions resulting in sugar release, providing a catalog of genes that might play a role in extracellular sugar transport. © 2017 The Authors. The Plant Journal © 2017 John Wiley & Sons Ltd.


September 22, 2019  |  

Dissemination of KPC-2-encoding IncX6 plasmids among multiple Enterobacteriaceae species in a single Chinese hospital.

Forty-five KPC-producing Enterobacteriaceae strains were isolated from multiple departments in a Chinese public hospital from 2014 to 2015. Genome sequencing of four representative strains, namely Proteus mirabilis GN2, Serratia marcescens GN26, Morganella morganii GN28, and Klebsiella aerogenes E20, indicated the presence of blaKPC-2-carrying IncX6 plasmids pGN2-KPC, pGN26-KPC, pGN28-KPC, and pE20-KPC in the four strains, respectively. These plasmids were genetically closely related to one another and to the only previously sequenced IncX6 plasmid, pKPC3_SZ. Each of the plasmids carried a single accessory module containing the blaKPC-2/3-carrying ΔTn6296 derivatives. The ΔTn6292 element from pGN26-KPC also contained qnrS, which was absent from all other plasmids. Overall, pKPC3_SZ-like blaKPC-carrying IncX6 plasmids were detected by PCR in 44.4% of the KPC-producing isolates, which included K. aerogenes, P. mirabilis, S. marcescens, M. morganii, Escherichia coli, and Klebsiella pneumoniae, and were obtained from six different departments of the hospital. Data presented herein provided insights into the genomic diversity and evolution of IncX6 plasmids, as well as the dissemination and epidemiology of blaKPC-carrying IncX6 plasmids among Enterobacteriaceae in a hospital setting.


September 22, 2019  |  

Reproducible integration of multiple sequencing datasets to form high-confidence SNP, indel, and reference calls for five human genome reference materials

Benchmark small variant calls from the Genome in a Bottle Consortium (GIAB) for the CEPH/HapMap genome NA12878 (HG001) have been used extensively for developing, optimizing, and demonstrating performance of sequencing and bioinformatics methods. Here, we develop a reproducible, cloud-based pipeline to integrate multiple sequencing datasets and form benchmark calls, enabling application to arbitrary human genomes. We use these reproducible methods to form high-confidence calls with respect to GRCh37 and GRCh38 for HG001 and 4 additional broadly-consented genomes from the Personal Genome Project that are available as NIST Reference Materials. These new genomes’ broad, open consent with few restrictions on availability of samples and data is enabling a uniquely diverse array of applications. Our new methods produce 17% more high-confidence SNPs, 176% more indels, and 12% larger regions than our previously published calls. To demonstrate that these calls can be used for accurate benchmarking, we compare other high-quality callsets to ours (e.g., Illumina Platinum Genomes), and we demonstrate that the majority of discordant calls are errors in the other callsets. We also highlight challenges in interpreting performance metrics when benchmarking against imperfect high-confidence calls. We show that benchmarking tools from the Global Alliance for Genomics and Health can be used with our calls to stratify performance metrics by variant type and genome context and elucidate strengths and weaknesses of a method.


September 22, 2019  |  

Ploidy variation in Kluyveromyces marxianus separates dairy and non-dairy isolates.

Kluyveromyces marxianus is traditionally associated with fermented dairy products, but can also be isolated from diverse non-dairy environments. Because of thermotolerance, rapid growth and other traits, many different strains are being developed for food and industrial applications but there is, as yet, little understanding of the genetic diversity or population genetics of this species. K. marxianus shows a high level of phenotypic variation but the only phenotype that has been clearly linked to a genetic polymorphism is lactose utilisation, which is controlled by variation in the LAC12 gene. The genomes of several strains have been sequenced in recent years and, in this study, we sequenced a further nine strains from different origins. Analysis of the Single Nucleotide Polymorphisms (SNPs) in 14 strains was carried out to examine genome structure and genetic diversity. SNP diversity in K. marxianus is relatively high, with up to 3% DNA sequence divergence between alleles. It was found that the isolates include haploid, diploid, and triploid strains, as shown by both SNP analysis and flow cytometry. Diploids and triploids contain long genomic tracts showing loss of heterozygosity (LOH). All six isolates from dairy environments were diploid or triploid, whereas 6 out of 7 isolates from non-dairy environments were haploid. This also correlated with the presence of functional LAC12 alleles only in dairy haplotypes. The diploids were hybrids between a non-dairy and a dairy haplotype, whereas triploids included three copies of a dairy haplotype.

