April 21, 2020  |  

Long-read sequencing for rare human genetic diseases.

During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.


April 21, 2020  |  

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases.

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others. © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.


April 21, 2020  |  

Streptococcus oralis subsp. dentisani Produces Monolateral Serine-Rich Repeat Protein Fibrils, One of Which Contributes to Saliva Binding via Sialic Acid.

Our studies reveal that the oral colonizer and cause of infective endocarditis Streptococcus oralis subsp. dentisani displays a striking monolateral distribution of surface fibrils. Furthermore, our data suggest that these fibrils impact the structure of adherent bacterial chains. Mutagenesis studies indicate that these fibrils are dependent on three serine-rich repeat proteins (SRRPs), here named fibril-associated protein A (FapA), FapB, and FapC, and that each SRRP forms a different fibril with a distinct distribution. SRRPs are a family of bacterial adhesins that have diverse roles in adhesion and that can bind to different receptors through modular nonrepeat region domains. Amino acid sequence and predicted structural similarity searches using the nonrepeat regions suggested that FapA may contribute to interspecies interactions, that FapA and FapB may contribute to intraspecies interactions, and that FapC may contribute to sialic acid binding. We demonstrate that a fapC mutant was significantly reduced in binding to saliva. We confirmed a role for FapC in sialic acid binding by demonstrating that the parental strain was significantly reduced in adhesion upon addition of a recombinantly expressed, sialic acid-specific, carbohydrate binding module, while the fapC mutant was not reduced. However, mutation of a residue previously shown to be essential for sialic acid binding did not decrease bacterial adhesion, leaving the precise mechanism of FapC-mediated adhesion to sialic acid to be defined. We also demonstrate that the presence of any one of the SRRPs is sufficient for efficient biofilm formation. Similar structures were observed on all infective endocarditis isolates examined, suggesting that this distribution is a conserved feature of this S. oralis subspecies. Copyright © 2019 American Society for Microbiology.


April 21, 2020  |  

Comprehensive evaluation of non-hybrid genome assembly tools for third-generation PacBio long-read sequence data.

Long reads obtained from third-generation sequencing platforms can help overcome the long-standing challenge of the de novo assembly of sequences for the genomic analysis of non-model eukaryotic organisms. Numerous long-read-aided de novo assemblies have been published recently, which exhibited superior quality of the assembled genomes in comparison with those achieved using earlier second-generation sequencing technologies. Evaluating assemblies is important in guiding the appropriate choice for specific research needs. In this study, we evaluated 10 long-read assemblers using a variety of metrics on Pacific Biosciences (PacBio) data sets from different taxonomic categories with considerable differences in genome size. The results allowed us to narrow down the list to a few assemblers that can be effectively applied to eukaryotic assembly projects. Moreover, we highlight how best to use limited genomic resources for effectively evaluating the genome assemblies of non-model organisms. © The Author 2017. Published by Oxford University Press.


April 21, 2020  |  

α-Difluoromethylornithine reduces gastric carcinogenesis by causing mutations in Helicobacter pylori cagY.

Infection by Helicobacter pylori is the primary cause of gastric adenocarcinoma. The most potent H. pylori virulence factor is cytotoxin-associated gene A (CagA), which is translocated by a type 4 secretion system (T4SS) into gastric epithelial cells and activates oncogenic signaling pathways. The gene cagY encodes a key component of the T4SS and can undergo gene rearrangements. We have shown that the cancer chemopreventive agent α-difluoromethylornithine (DFMO), known to inhibit the enzyme ornithine decarboxylase, reduces H. pylori-mediated gastric cancer incidence in Mongolian gerbils. In the present study, we questioned whether DFMO might directly affect H. pylori pathogenicity. We show that H. pylori output strains isolated from gerbils treated with DFMO exhibit reduced ability to translocate CagA in gastric epithelial cells. Further, we frequently detected genomic modifications in the middle repeat region of the cagY gene of output strains from DFMO-treated animals, which were associated with alterations in the CagY protein. Gerbils did not develop carcinoma when infected with a DFMO output strain containing rearranged cagY or the parental strain in which the wild-type cagY was replaced by cagY with DFMO-induced rearrangements. Lastly, we demonstrate that in vitro treatment of H. pylori by DFMO induces oxidative DNA damage, expression of the DNA repair enzyme MutS2, and mutations in cagY, demonstrating that DFMO directly affects genomic stability. Deletion of mutS2 abrogated the ability of DFMO to induce cagY rearrangements directly. In conclusion, DFMO-induced oxidative stress in H. pylori leads to genomic alterations and attenuates virulence.


April 21, 2020  |  

Analyses of four new Caulobacter Phicbkviruses indicate independent lineages.

Bacteriophages with genomes larger than 200 kbp are considered giant phages, and the giant Phicbkviruses are the most frequently isolated Caulobacter crescentus phages. In this study, we compare six bacteriophage genomes that differ from the genomes of the majority of Phicbkviruses. Four of these genomes are much larger than those of the rest of the Phicbkviruses, with genome sizes that are more than 250 kbp. A comparison of 16 Phicbkvirus genomes identified a ‘core genome’ of 69 genes that is present in all of these Phicbkvirus genomes, as well as shared accessory genes and genes that are unique for each phage. Most of the core genes are clustered into the regions coding for structural proteins or those involved in DNA replication. A phylogenetic analysis indicated that these 16 Caulobacter Phicbkvirus genomes are related, but they represent four distinct branches of the Phicbkvirus genomic tree with distantly related branches sharing little nucleotide homology. In contrast, pairwise comparisons within each branch of the phylogenetic tree showed that more than 80% of the entire genome is shared among phages within a group. This conservation of the genomes within each branch indicates that horizontal gene transfer events between the groups are rare. Therefore, the Phicbkvirus genus consists of at least four different phylogenetic branches that are evolving independently from one another. One of these branches contains a 27-gene inversion relative to the other three branches. Also, an analysis of the tRNA genes showed that they are relatively mobile within the Phicbkvirus genus.


April 21, 2020  |  

Long-read sequencing identified intronic repeat expansions in SAMD12 from Chinese pedigrees affected with familial cortical myoclonic tremor with epilepsy.

The locus for familial cortical myoclonic tremor with epilepsy (FCMTE) has long been mapped to 8q24 in linkage studies, but the causative mutations remain unclear. Recently, expansions of intronic TTTCA and TTTTA repeat motifs within SAMD12 were found to be involved in the pathogenesis of FCMTE in Japanese pedigrees. We aim to identify the causative mutations of FCMTE in Chinese pedigrees. We performed genetic linkage analysis by microsatellite markers in a five-generation Chinese pedigree with 55 members. We also used array-comparative genomic hybridisation (CGH) and next-generation sequencing (NGS) technologies (whole-exome sequencing, capture region deep sequencing and whole-genome sequencing) to identify the causative mutations in the disease locus. Recently, we used low-coverage (~10×) long-read genome sequencing (LRS) on the PacBio Sequel and Oxford Nanopore platforms to identify the causative mutations, and used repeat-primed PCR for validation of the repeat expansions. Linkage analysis mapped the disease locus to 8q23.3-24.23. Array-CGH and NGS failed to identify causative mutations in this locus. LRS identified the intronic TTTCA and TTTTA repeat expansions in SAMD12 as the causative mutations, thus corroborating the recently published results in Japanese pedigrees. We identified the pentanucleotide repeat expansion in SAMD12 as the causative mutation in Chinese FCMTE pedigrees. Our study also suggested that LRS is an effective tool for molecular diagnosis of genetic disorders, especially for neurological diseases that cannot be positively diagnosed by conventional clinical microarray and NGS technologies. © Author(s) (or their employer(s)) 2019. No commercial re-use. See rights and permissions. Published by BMJ.


April 21, 2020  |  

Detecting a long insertion variant in SAMD12 by SMRT sequencing: implications of long-read whole-genome sequencing for repeat expansion diseases.

Long-read sequencing technology is now capable of reading single-molecule DNA with an average read length of more than 10 kb, fully enabling the coverage of large structural variations (SVs). This advantage may pave the way for the detection of unprecedented SVs as well as repeat expansions. Pathogenic SVs of only known genes used to be selectively analyzed based on prior knowledge of target DNA sequence. The unbiased application of long-read whole-genome sequencing (WGS) for the detection of pathogenic SVs has just begun. Here, we apply PacBio SMRT sequencing in a Japanese family with benign adult familial myoclonus epilepsy (BAFME). Our SV selection of low-coverage WGS data (7×) narrowed down the candidates to only six SVs in a 7.16-Mb region of the BAFME1 locus and correctly determined an approximately 4.6-kb SAMD12 intronic repeat insertion, which is causal of BAFME1. These results indicate that long-read WGS is potentially useful for evaluating all of the known SVs in a genome and identifying new disease-causing SVs in combination with other genetic methods to resolve the genetic causes of currently unexplained diseases.


April 21, 2020  |  

Sequential evolution of virulence and resistance during clonal spread of community-acquired methicillin-resistant Staphylococcus aureus.

The past two decades have witnessed an alarming expansion of staphylococcal disease caused by community-acquired methicillin-resistant Staphylococcus aureus (CA-MRSA). The factors underlying the epidemic expansion of CA-MRSA lineages such as USA300, the predominant CA-MRSA clone in the United States, are largely unknown. Previously described virulence and antimicrobial resistance genes that promote the dissemination of CA-MRSA are carried by mobile genetic elements, including phages and plasmids. Here, we used high-resolution genomics and experimental infections to characterize the evolution of a USA300 variant plaguing a patient population at increased risk of infection to understand the mechanisms underlying the emergence of genetic elements that facilitate clonal spread of the pathogen. Genetic analyses provided conclusive evidence that fitness (manifest as emergence of a dominant clone) changed coincidently with the stepwise emergence of (i) a unique prophage and mutation of the regulator of the pyrimidine nucleotide biosynthetic operon that promoted abscess formation and colonization, respectively, thereby priming the clone for success; and (ii) a unique plasmid that conferred resistance to two topical microbiocides, mupirocin and chlorhexidine, frequently used for decolonization and infection prevention. The resistance plasmid evolved through successive incorporation of DNA elements from non-S. aureus spp. into an indigenous cryptic plasmid, suggesting a mechanism for interspecies genetic exchange that promotes antimicrobial resistance. Collectively, the data suggest that clonal spread in a vulnerable population resulted from extensive clinical intervention and intense selection pressure toward a pathogen lifestyle that involved the evolution of consequential mutations and mobile genetic elements.


April 21, 2020  |  

Long-read sequencing identifies GGC repeat expansions in NOTCH2NLC associated with neuronal intranuclear inclusion disease.

Neuronal intranuclear inclusion disease (NIID) is a progressive neurodegenerative disease that is characterized by eosinophilic hyaline intranuclear inclusions in neuronal and somatic cells. The wide range of clinical manifestations in NIID makes ante-mortem diagnosis difficult [1-8], but skin biopsy enables its ante-mortem diagnosis [9-12]. The average onset age is 59.7 years among approximately 140 NIID cases consisting of mostly sporadic and several familial cases. By linkage mapping of a large NIID family with several affected members (Family 1), we identified a 58.1 Mb linked region at 1p22.1-q21.3 with a maximum logarithm of the odds score of 4.21. By long-read sequencing, we identified a GGC repeat expansion in the 5′ region of NOTCH2NLC (Notch 2 N-terminal like C) in all affected family members. Furthermore, we found similar expansions in 8 unrelated families with NIID and 40 sporadic NIID cases. We observed abnormal anti-sense transcripts in fibroblasts specifically from patients but not unaffected individuals. This work shows that repeat expansion in human-specific NOTCH2NLC, a gene that evolved by segmental duplication, causes a human disease.


April 21, 2020  |  

Blast Fungal Genomes Show Frequent Chromosomal Changes, Gene Gains and Losses, and Effector Gene Turnover.

Pyricularia is a fungal genus comprising several pathogenic species causing the blast disease in monocots. Pyricularia oryzae, the best-known species, infects rice, wheat, finger millet, and other crops. As past comparative and population genomics studies mainly focused on isolates of P. oryzae, the genomes of the other Pyricularia species have not been well explored. In this study, we obtained a chromosomal-level genome assembly of the finger millet isolate P. oryzae MZ5-1-6 and also highly contiguous assemblies of Pyricularia sp. LS, P. grisea, and P. pennisetigena. The differences in the genomic content of repetitive DNA sequences could largely explain the variation in genome size among these new genomes. Moreover, we found extensive gene gains and losses and structural changes among Pyricularia genomes, including a large interchromosomal translocation. We searched for homologs of known blast effectors across fungal taxa and found that most avirulence effectors are specific to Pyricularia, whereas many other effectors share homologs with distant fungal taxa. In particular, we discovered a novel effector family with metalloprotease activity, distinct from the well-known AVR-Pita family. We predicted 751 gene families containing putative effectors in 7 Pyricularia genomes and found that 60 of them showed differential expression in the P. oryzae MZ5-1-6 transcriptomes obtained under experimental conditions mimicking the pathogen infection process. In summary, this study increased our understanding of the structural, functional, and evolutionary genomics of the blast pathogen and identified new potential effector genes, providing useful data for developing crops with durable resistance. © The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


April 21, 2020  |  

Recompleting the Caenorhabditis elegans genome.

Caenorhabditis elegans was the first multicellular eukaryotic genome sequenced to apparent completion. Although this assembly employed a standard C. elegans strain (N2), it used sequence data from several laboratories, with DNA propagated in bacteria and yeast. Thus, the N2 assembly has many differences from any C. elegans available today. To provide a more accurate C. elegans genome, we performed long-read assembly of VC2010, a modern strain derived from N2. Our VC2010 assembly has 99.98% identity to N2 but with an additional 1.8 Mb including tandem repeat expansions and genome duplications. For 116 structural discrepancies between N2 and VC2010, 97 structures matching VC2010 (84%) were also found in two outgroup strains, implying deficiencies in N2. Over 98% of N2 genes encoded unchanged products in VC2010; moreover, we predicted ≥53 new genes in VC2010. The recompleted genome of C. elegans should be a valuable resource for genetics, genomics, and systems biology. © 2019 Yoshimura et al.; Published by Cold Spring Harbor Laboratory Press.


April 21, 2020  |  

5’UTR-mediated regulation of Ataxin-1 expression.

Expression of mutant Ataxin-1 with an abnormally expanded polyglutamine domain is necessary for the onset and progression of spinocerebellar ataxia type 1 (SCA1). Understanding how Ataxin-1 expression is regulated in the human brain could inspire novel molecular therapies for this fatal, dominantly inherited neurodegenerative disease. Previous studies have shown that the ATXN1 3’UTR plays a key role in regulating the Ataxin-1 cellular pool via diverse post-transcriptional mechanisms. Here we show that elements within the ATXN1 5’UTR also participate in the regulation of Ataxin-1 expression. PCR and PacBio sequencing analysis of cDNA obtained from control and SCA1 human brain samples revealed the presence of three major, alternatively spliced ATXN1 5’UTR variants. In cell-based assays, fusion of these variants upstream of an EGFP reporter construct revealed significant and differential impacts on total EGFP protein output, uncovering a type of genetic rheostat-like function of the ATXN1 5’UTR. We identified ribosomal scanning of upstream AUG codons and increased transcript instability as potential mechanisms of regulation. Importantly, transcript-based analyses revealed significant differences in the expression pattern of ATXN1 5’UTR variants between control and SCA1 cerebellum. Together, the data presented here shed light into a previously unknown role for the ATXN1 5’UTR in the regulation of Ataxin-1 and provide new opportunities for the development of SCA1 therapeutics. Copyright © 2019. Published by Elsevier Inc.


April 21, 2020  |  

Long-read sequencing unveils IGH-DUX4 translocation into the silenced IGH allele in B-cell acute lymphoblastic leukemia.

IGH@ proto-oncogene translocation is a common oncogenic event in lymphoid lineage cancers such as B-ALL, lymphoma and multiple myeloma. Here, to investigate the interplay between IGH@ proto-oncogene translocation and IGH allelic exclusion, we perform long-read whole-genome and transcriptome sequencing along with epigenetic and 3D genome profiling of Nalm6, an IGH-DUX4 positive B-ALL cell line. We detect significant allelic imbalance on the wild-type over the IGH-DUX4 haplotype in expression and epigenetic data, showing IGH-DUX4 translocation occurs on the silenced IGH allele. In vitro, this reduces the oncogenic stress of DUX4 high-level expression. Moreover, patient samples of IGH-DUX4 B-ALL have similar expression profile and IGH breakpoints as Nalm6, suggesting a common mechanism to allow optimal dosage of non-toxic DUX4 expression.


April 21, 2020  |  

Methicillin-Resistant Staphylococcus aureus Blood Isolates Harboring a Novel Pseudo-staphylococcal Cassette Chromosome mec Element.

The aim of this work was to assess a novel pseudo-staphylococcal cassette chromosome mec (ψSCCmec) element in methicillin-resistant Staphylococcus aureus (MRSA) blood isolates. Community-associated MRSA E16SA093 and healthcare-associated MRSA F17SA003 isolates were recovered from the blood specimens of patients with S. aureus bacteremia in 2016 and in 2017, respectively. Antimicrobial susceptibility was determined via the disk diffusion method, and SCCmec typing was conducted by multiplex polymerase chain reaction. Whole genome sequencing was carried out by single molecule real-time long-read sequencing. Both isolates belonged to sequence type 72 and agr-type I, and they were negative for Panton-Valentine leukocidin and toxic shock syndrome toxin. The spa-types of E16SA093 and F17SA003 were t324 and t2460, respectively. They had a SCCmec IV-like element devoid of the cassette chromosome recombinase (ccr) gene complex, designated as ψSCCmecE16SA093. The element was manufactured from SCCmec type IV and the deletion of the ccr gene complex and a 7.0- and 31.9-kb portion of each chromosome. The deficiency of the ccr gene complex in the SCCmec unit is likely resulting in mobility loss, which would be an adaptive evolutionary mechanism. The dissemination of this clone should be monitored closely.

