April 21, 2020

Characterization of LINE-1 transposons in a human genome at allelic resolution

The activity of the retrotransposon LINE-1 has created a substantial portion of the human genome. Most of this sequence comprises fractured and debilitated LINE-1s. An accurate approximation of the number, location, and sequence of the LINE-1 elements present in any single genome has proven elusive due to the difficulty of assembling and phasing the repetitive and polymorphic regions of the human genome. Through an in-depth analysis of publicly-available, deep, long-read assemblies of nearly homozygous human genomes, we defined the location and sequence of all intact LINE-1s in these assemblies. We found 148 and 142 intact LINE-1s in two nearly homozygous assemblies. A combination of these assemblies suggests a diploid human genome contains at least 50% more intact LINE-1s than previous estimates: in this case, 290 intact LINE-1s at 194 loci. We think this is the best approximation, to date, of the number of intact LINE-1s in a single diploid human genome. In addition to counting intact LINE-1 elements, we resolved the sequence of each element, including some LINE-1 elements in unassembled, presumably centromeric regions of the genome. A comparison of the intact LINE-1s in each assembly shows the specific pattern of variation between these genomes, including LINE-1s that remain intact in only one genome, allelic variation in shared intact LINE-1s, and LINE-1s that are unique (presumably young) insertions in only one genome. We found that many old elements (> 6 million years old) remain intact, and comparison of the young and intact LINE-1s across assemblies reinforces the notion that only a small portion of all LINE-1 sequences that may be intact in the genomes of the human population has been uncovered. This dataset provides the first nearly comprehensive estimate of LINE-1 diversity within an individual, an important dataset in the quest to understand the functional consequences of sequence variation in LINE-1 and the complete set of LINE-1s in the human population.


April 21, 2020

Loss-of-function tolerance of enhancers in the human genome

Previous studies have surveyed the potential impact of loss-of-function (LoF) variants and identified LoF-tolerant protein-coding genes. However, the tolerance of human genomes to losing enhancers has not yet been evaluated. Here we present the catalog of LoF-tolerant enhancers using structural variants from whole-genome sequences. Using a conservative approach, we estimate that each individual human genome possesses at least 28 LoF-tolerant enhancers on average. We assessed the properties of LoF-tolerant enhancers in a unified regulatory network constructed by integrating tissue-specific enhancers and gene-gene interactions. We find that LoF-tolerant enhancers are more tissue-specific and regulate fewer and more dispensable genes. They are enriched in immune-related cells while LoF-intolerant enhancers are enriched in kidney and brain/neuronal stem cells. We developed a supervised learning approach to predict the LoF-tolerance of enhancers, which achieved an AUROC of 96%. We predict that 5,677 more enhancers are likely tolerant to LoF and that 75 enhancers are highly LoF-intolerant. Our predictions are supported by a known set of disease enhancers and novel deletions from PacBio sequencing. The LoF-tolerance scores provided here will serve as an important reference for disease studies.
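For readers unfamiliar with the AUROC figure quoted above, the following minimal sketch shows what the metric measures: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are invented toy values, and the function name `auroc` is a placeholder for this illustration; none of it is taken from the study or its classifier.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier scores, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, for illustration only).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported 96% means positives outrank negatives in roughly 96 of every 100 such pairs.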


April 21, 2020

CENP-C stabilizes the conformation of CENP-A nucleosomes within the inner kinetochore at human centromere

The centromere is a vital locus on each chromosome which seeds the kinetochore, allowing for a physical connection between the chromosome and the mitotic spindle. At the heart of the centromere is the centromere-specific histone H3 variant CENP-A/CENH3. Throughout the cell cycle the constitutive centromere associated network is bound to CENP-A chromatin, but how this protein network modifies CENP-A nucleosome dynamics in vivo is unknown. Here, using a combination of biophysical and biochemical analyses, we provide evidence for the existence of two populations of structurally distinct CENP-A nucleosomes that co-exist at human centromeres. These two populations display unique sedimentation patterns, which permit purification of inner kinetochore bound CENP-A chromatin away from bulk CENP-A nucleosomes. The bulk population of CENP-A nucleosomes has diminished heights and weakened DNA interactions, whereas CENP-A nucleosomes robustly associated with the inner kinetochore are stabilized in an octameric conformation, with restricted access to nucleosomal DNA. Immuno-labeling coupled to atomic force microscopy of these complexes confirms their identity at nanoscale resolution. These data provide a systematic and detailed description of inner-kinetochore bound CENP-A chromatin from human centromeres, with implications for the state of CENP-A chromatin that is actively engaged during mitosis.


April 21, 2020

Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.

List of abbreviations

TE: Transposable Elements
LTR: Long Terminal Repeat
LINE: Long Interspersed Nuclear Element
SINE: Short Interspersed Nuclear Element
MITE: Miniature Inverted Transposable Element
TIR: Terminal Inverted Repeat
TSD: Target Site Duplication
TP: True Positives
FP: False Positives
TN: True Negatives
FN: False Negatives
GRF: Generic Repeat Finder
EDTA: Extensive de-novo TE Annotator


April 21, 2020

Insights into the bacterial species and communities of a full-scale anaerobic/anoxic/oxic wastewater treatment plant by using third-generation sequencing.

For the first time, a full-length 16S rRNA sequencing method was applied to disclose the bacterial species and communities of a full-scale wastewater treatment plant using an anaerobic/anoxic/oxic (A/A/O) process in Wuhan, China. The compositions of the bacteria at phylum and class levels in the activated sludge were similar to those revealed by Illumina MiSeq sequencing. At genus and species levels, third-generation sequencing showed great merits and accuracy. Typical functional taxa classified as ammonia-oxidizing bacteria (AOB), nitrite-oxidizing bacteria (NOB), denitrifying bacteria (DB), anaerobic ammonium oxidation bacteria (ANAMMOXB) and polyphosphate-accumulating organisms (PAOs) were presented, which were Nitrosomonas (1.11%), Nitrospira (3.56%), Pseudomonas (3.88%), Planctomycetes (13.80%), and Comamonadaceae (1.83%), respectively. Pseudomonas (3.88%) and Nitrospira (3.56%) were the two most predominant genera, mainly containing Pseudomonas extremaustralis (1.69%) and Nitrospira defluvii (3.13%), respectively. Bacteria related to nitrogen and phosphorus removal at the species level were put forward. The predicted functions indicated that the A/A/O process was efficient regarding nitrogen and organics removal. Copyright © 2019 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.


April 21, 2020

ORF Capture-Seq: a versatile method for targeted identification of full-length isoforms

Most human protein-coding genes are expressed as multiple isoforms. This in turn greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every gene, the majority of alternative isoforms remains uncharacterized experimentally. This is primarily due to: i) vast differences of overall levels between different isoforms expressed from common genes, and ii) the difficulty of obtaining contiguous full-length ORF sequences. Here, we present ORF Capture-Seq (OCS), a flexible and cost-effective method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude, compared to unenriched sample. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will allow mapping of the full set of human isoforms at reasonable cost.


April 21, 2020

Comparative Genomic Analysis of a Multidrug-Resistant Listeria monocytogenes ST477 Isolate.

Listeria monocytogenes is an opportunistic human foodborne pathogen that causes severe infections with high hospitalization and fatality rates. Clonal complex 9 (CC9) contains a large number of sequence types (STs) and is one of the predominant clones distributed worldwide. However, genetic characteristics of ST477 isolates, which also belong to CC9, have never been examined, and little is known about the detailed genomic traits of this food-associated clone. In this study, we sequenced and constructed the whole-genome sequence of an ST477 isolate from a frozen food sample in China and compared it with 58 previously sequenced genomes of 25 human-associated, 5 animal, and 27 food isolates consisting of 6 CC9 and 52 other clones. Phylogenetic analysis revealed that the ST477 isolate clustered with three Canadian ST9 isolates. All phylogeny revealed that CC9 isolates involved in this study consistently possessed the invasion-related gene vip. Mobile genetic elements (MGEs), resistance genes, and the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas system were elucidated among CC9 isolates. Our ST477 isolate contained a Tn554-like transposon, carrying five arsenical-resistance genes (arsA-arsD, arsR), which was exclusively identified in the CC9 background. Compared with the ST477 genome, three Canadian ST9 isolates shared nonsynonymous nucleotide substitutions in the condensin complex gene smc and cell surface protein genes ftsA and essC. Our findings preliminarily indicate that the extraordinary success of the CC9 clone in colonization of different geographical regions is likely due to conserved features harboring MGEs, functional virulence and resistance genes. ST477 and three ST9 genomes are closely related and the distinct differences between them consist primarily of changes in genes involved in multiplication and invasion, which may contribute to the prevalence of ST9 isolates in food and food processing environments.


April 21, 2020

Integrating multiple genomic technologies to investigate an outbreak of carbapenemase-producing Enterobacter hormaechei

Carbapenem-resistant Enterobacteriaceae (CRE) represent one of the most urgent threats to human health posed by antibiotic resistant bacteria. Enterobacter hormaechei and other members of the Enterobacter cloacae complex are the most commonly encountered Enterobacter spp. within clinical settings, responsible for numerous outbreaks and ultimately poorer patient outcomes. Here we applied three complementary whole genome sequencing (WGS) technologies to characterise a hospital cluster of blaIMP-4 carbapenemase-producing E. hormaechei. In response to a suspected CRE outbreak in 2015 within an Intensive Care Unit (ICU)/Burns Unit in a Brisbane tertiary referral hospital we used Illumina sequencing to determine that all outbreak isolates were sequence type (ST)90 and near-identical at the core genome level. Comparison to publicly available data unequivocally linked all 10 isolates to a 2013 isolate from the same ward, confirming the hospital environment as the most likely original source of infection in the 2015 cases. No clonal relationship was found to IMP-4-producing isolates identified from other local hospitals. However, using Pacific Biosciences long-read sequencing we were able to resolve the complete context of the blaIMP-4 gene, which was found to be on a large IncHI2 plasmid carried by all IMP-4-producing isolates. Continued surveillance of the hospital environment was carried out using Oxford Nanopore long-read sequencing, which was able to rapidly resolve the true relationship of subsequent isolates to the initial outbreak. Shotgun metagenomic sequencing of environmental samples also found evidence of ST90 E. hormaechei and the IncHI2 plasmid within the hospital plumbing. Overall, our strategic application of three WGS technologies provided an in-depth analysis of the outbreak, including the transmission dynamics of a carbapenemase-producing E. hormaechei cluster, identification of possible hospital reservoirs and the full context of blaIMP-4 on a multidrug resistant IncHI2 plasmid that appears to be widely distributed in Australia.


April 21, 2020

Complete genome sequence and characterization of virulence genes in Lancefield group C Streptococcus dysgalactiae isolated from farmed amberjack (Seriola dumerili).

Lancefield group C Streptococcus dysgalactiae causes infections in farmed fish. Here, the genome of S. dysgalactiae strain kdys0611, isolated from farmed amberjack (Seriola dumerili), was sequenced. The complete genome sequence of kdys0611 consists of a single chromosome and five plasmids. The chromosome is 2,142,780 bp long and has a GC content of 40%. It possesses 2061 coding sequences and 67 tRNA and 6 rRNA operons. One clustered regularly interspaced short palindromic repeat, 125 insertion sequences, and four predicted prophage elements were identified. Phylogenetic analysis based on 126 core genes suggested that the kdys0611 strain is more closely related to S. dysgalactiae subsp. dysgalactiae than to S. dysgalactiae subsp. equisimilis. The genome of kdys0611 harbors 87 genes with sequence similarity to putative virulence-associated genes identified in other bacteria, of which 57 exhibit amino acid identity (>52%) to genes of the S. dysgalactiae subsp. equisimilis GGS124 human clinical isolate. Four putative virulence genes, emm5 (FGCSD_0256), spg_2 (FGCSD_1961), skc (FGCSD_1012), and cna (FGCSD_0159), in kdys0611 did not show significant homology with any deposited S. dysgalactiae genes. The chromosomal sequence of kdys0611 has been deposited in GenBank under Accession No. AP018726. This is the first report of the complete genome sequence of S. dysgalactiae isolated from fish. © 2019 The Societies and John Wiley & Sons Australia, Ltd.


April 21, 2020

Detection of transferable oxazolidinone resistance determinants in Enterococcus faecalis and Enterococcus faecium of swine origin in Sichuan Province, China.

The aim of this study was to detect the transferable oxazolidinone resistance determinants (cfr, optrA and poxtA) in E. faecalis and E. faecium of swine origin in Sichuan Province, China. A total of 158 enterococci strains (93 E. faecalis and 65 E. faecium) isolated from 25 large-scale swine farms were screened for the presence of cfr, optrA and poxtA by PCR. The genetic environments of cfr, optrA and poxtA were characterized by whole genome sequencing. Transfer of oxazolidinone resistance determinants was determined by conjugation or electrotransformation experiments. The transferable oxazolidinone resistance determinants, cfr, optrA and poxtA, were detected in zero, six, and one enterococci strains, respectively. The poxtA in one E. faecalis strain was located on a 37,990 bp plasmid, which co-harbored fexB, cat, tet(L) and tet(M), and could be conjugated to E. faecalis JH2-2. One E. faecalis strain harbored two different OptrA variants, including one variant with a single substitution, Q219H, which has not been reported previously. Two optrA-carrying plasmids, pC25-1, with a size of 45,581 bp, and pC54, with a size of 64,500 bp, shared a 40,494 bp identical region that contained genetic context IS1216E-fexA-optrA-erm(A)-IS1216E, which could be electrotransformed into Staphylococcus aureus. Four different chromosomal optrA gene clusters were found in five strains, in which optrA was associated with Tn554 or Tn558 that were inserted into the radC gene. Our study highlights the fact that mobile genetic elements, such as plasmids, IS1216E, Tn554 and Tn558, may facilitate the horizontal transmission of optrA or poxtA. Copyright © 2019. Published by Elsevier Ltd.


April 21, 2020

Insect genomes: progress and challenges.

In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.


April 21, 2020

A proposed core genome scheme for analyses of the Salmonella genus.

The salmonellae are found in a wide range of animal hosts and many food products for human consumption. Most cases of human disease are caused by S. enterica subspecies I; however, as opportunistic pathogens, the other subspecies (II-VI) and S. bongori are capable of causing disease. Loci that were not consistently present in all of the species and subspecies were removed from a previously proposed core genome scheme (EBcgMLSTv2.0); the removal of these 252 loci resulted in a core genus scheme (SalmcgMLSTv1.0). SalmcgMLSTv1.0 clustered isolates from the same subspecies more rapidly and more accurately grouped isolates from different subspecies when compared with EBcgMLSTv2.0. All loci within the EBcgMLSTv2.0 scheme were present in over 98% of S. enterica subspecies I isolates and should, therefore, continue to be used for subspecies I analyses, while the SalmcgMLSTv1.0 scheme is more appropriate for cross genus investigations. Copyright © 2019. Published by Elsevier Inc.
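The filtering step described in this abstract, dropping loci that are not consistently present across every species and subspecies, can be illustrated with a small sketch. Everything below is hypothetical: the locus names, the group labels, and the simplification that "consistently present" is a yes/no flag per group (the abstract does not state the threshold the authors used).

```python
def core_genus_loci(scheme_loci, presence, groups):
    """Keep only loci flagged as present in every listed group.

    scheme_loci: ordered list of locus names in the starting scheme.
    presence:    dict mapping locus name -> set of groups in which the
                 locus is considered consistently present (assumed input).
    groups:      all species/subspecies the reduced scheme must cover.
    """
    required = set(groups)
    return [
        locus for locus in scheme_loci
        if required <= presence.get(locus, set())  # subset test
    ]


# Toy input (hypothetical names, not real scheme loci).
toy_groups = ["enterica_I", "enterica_II", "bongori"]
toy_scheme = ["locA", "locB", "locC"]
toy_presence = {
    "locA": {"enterica_I", "enterica_II", "bongori"},
    "locB": {"enterica_I"},                        # subspecies I only
    "locC": {"enterica_I", "enterica_II", "bongori"},
}

toy_core = core_genus_loci(toy_scheme, toy_presence, toy_groups)
toy_removed = [locus for locus in toy_scheme if locus not in toy_core]
```

In this toy case `locB` is dropped because it is present only in subspecies I, which mirrors why a subspecies I scheme can retain loci that a genus-wide scheme cannot.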


April 21, 2020

The use of Online Tools for Antimicrobial Resistance Prediction by Whole Genome Sequencing in MRSA and VRE.

The antimicrobial resistance (AMR) crisis represents a serious threat to public health and has resulted in concentrated efforts to accelerate development of rapid molecular diagnostics for AMR. In combination with publicly-available web-based AMR databases, whole genome sequencing (WGS) offers the capacity for rapid detection of antibiotic resistance genes. Here we studied the concordance between WGS-based resistance prediction and phenotypic susceptibility testing results for methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin resistant Enterococcus (VRE) clinical isolates using publicly-available tools and databases. Clinical isolates prospectively collected at the University of Pittsburgh Medical Center between December 2016 and December 2017 underwent WGS. Antibiotic resistance gene content was assessed from assembled genomes by BLASTn search of online databases. Concordance between WGS-predicted resistance profile and phenotypic susceptibility as well as sensitivity, specificity, positive and negative predictive values (PPV, NPV) were calculated for each antibiotic/organism combination, using the phenotypic results as the gold standard. Phenotypic susceptibility testing and WGS results were available for 1242 isolate/antibiotic combinations. Overall concordance was 99.3% with a sensitivity, specificity, PPV, and NPV of 98.7% (95% CI, 97.2-99.5%), 99.6% (95% CI, 98.8-99.9%), 99.3% (95% CI, 98.0-99.8%), and 99.2% (95% CI, 98.3-99.7%), respectively. Additional identification of point mutations in housekeeping genes increased the concordance to 99.4%, the sensitivity to 99.3% (95% CI, 98.2-99.8%) and the NPV to 99.4% (95% CI, 98.4-99.8%). WGS can be used as a reliable predictor of phenotypic resistance for both MRSA and VRE using readily-available online tools. Copyright © 2019. Published by Elsevier Ltd.
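As a reminder of how the agreement statistics named in this abstract are defined, here is a minimal sketch that derives concordance, sensitivity, specificity, PPV and NPV from a 2x2 table, treating the phenotypic result as the reference. The counts are invented for illustration and are not the study's data; the names `Confusion` and `diagnostic_metrics` are placeholders, and confidence intervals are not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # predicted resistant, phenotypically resistant
    fp: int  # predicted resistant, phenotypically susceptible
    tn: int  # predicted susceptible, phenotypically susceptible
    fn: int  # predicted susceptible, phenotypically resistant


def diagnostic_metrics(c):
    """Return standard agreement metrics; assumes no denominator is zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "concordance": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = Confusion(tp=90, fp=2, tn=105, fn=3)
metrics = diagnostic_metrics(example)
```

Sensitivity and specificity condition on the phenotypic result, whereas PPV and NPV condition on the prediction, which is why the two pairs can differ on the same table.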


April 21, 2020

deSALT: fast and accurate long transcriptomic read alignment with de Bruijn graph-based index

Long-read RNA sequencing (RNA-seq) is promising for transcriptomics studies; however, the alignment of the reads is still a fundamental but non-trivial task due to sequencing errors and complicated gene structures. We propose deSALT, a tailored two-pass long RNA-seq read alignment approach, which constructs graph-based alignment skeletons to sensitively infer exons, and uses them to generate a spliced reference sequence to produce refined alignments. deSALT addresses several difficult issues, such as small exons, serious sequencing errors and consensus spliced alignment. Benchmarks demonstrate that this approach has a better ability to produce high-quality full-length alignments, which has enormous potential for transcriptomics studies.


April 21, 2020

Extended haplotype phasing of de novo genome assemblies with FALCON-Phase

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. These assemblies can be created in various ways, such as use of tissues that contain single-haplotype (haploid) genomes, or by co-sequencing of parental genomes, but these approaches can be impractical in many situations. We present FALCON-Phase, which integrates long-read sequencing data and ultra-long-range Hi-C chromatin interaction data of a diploid individual to create high-quality, phased diploid genome assemblies. The method was evaluated by application to three datasets, including human, cattle, and zebra finch, for which high-quality, fully haplotype resolved assemblies were available for benchmarking. Phasing algorithm accuracy was affected by heterozygosity of the individual sequenced, with higher accuracy for cattle and zebra finch (>97%) compared to human (82%). In addition, scaffolding with the same Hi-C chromatin contact data resulted in phased chromosome-scale scaffolds.

