July 7, 2019

Challenges, solutions, and quality metrics of personal genome assembly in advancing precision medicine.

Even though each of us shares more than 99% of the DNA sequences in our genome, there are millions of sequence or structural differences in small regions between individuals, giving us different characteristics of appearance or responsiveness to medical treatments. Currently, genetic variants in diseased tissues, such as tumors, are uncovered by exploring the differences between the reference genome and the sequences detected in the diseased tissue. However, the public reference genome was derived from the DNA of multiple individuals. As a result, the reference genome is incomplete and may misrepresent the sequence variants of the general population. A more reliable solution is to compare sequences of diseased tissue with the individual's own genome sequence derived from tissue in a normal state. As the price of sequencing a human genome has dropped dramatically to around $1000, documenting the personal genome of every individual has a promising future. However, de novo assembly of individual genomes at an affordable cost is still challenging. Thus, to date, only a few human genomes have been fully assembled. In this review, we introduce the history of human genome sequencing and the evolution of sequencing platforms, from Sanger sequencing to emerging “third generation sequencing” technologies. We present the currently available de novo assembly and post-assembly software packages for human genome assembly and their requirements for computational infrastructures. We recommend a combined hybrid assembly with long and short reads as a promising way to generate good-quality human genome assemblies, and we specify parameters for the quality assessment of assembly outcomes. We provide a perspective on the benefit of using personal genomes as references and suggestions for obtaining a quality personal genome. Finally, we discuss the usage of the personal genome in aiding vaccine design and development, monitoring host immune response, tailoring drug therapy and detecting tumors. We believe that precision medicine would benefit greatly from bioinformatics solutions, particularly for personal genome assembly.
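
The abstract above mentions parameters for assessing assembly quality without naming them. As an illustration only, the sketch below computes N50, a widely used contiguity measure: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the toy contig lengths are assumptions made for this example, not values from the review.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running total to avoid fractional halves.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")


# Toy example (hypothetical contig lengths, arbitrary units).
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

A single number such as N50 says nothing about base-level accuracy or misassemblies, so it is normally read alongside other measures.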


July 7, 2019

SAR11 bacteria linked to ocean anoxia and nitrogen loss.

Bacteria of the SAR11 clade constitute up to one half of all microbial cells in the oxygen-rich surface ocean. SAR11 bacteria are also abundant in oxygen minimum zones (OMZs), where oxygen falls below detection and anaerobic microbes have vital roles in converting bioavailable nitrogen to N2 gas. Anaerobic metabolism has not yet been observed in SAR11, and it remains unknown how these bacteria contribute to OMZ biogeochemical cycling. Here, genomic analysis of single cells from the world’s largest OMZ revealed previously uncharacterized SAR11 lineages with adaptations for life without oxygen, including genes for respiratory nitrate reductases (Nar). SAR11 nar genes were experimentally verified to encode proteins catalysing the nitrite-producing first step of denitrification and constituted ~40% of OMZ nar transcripts, with transcription peaking in the anoxic zone of maximum nitrate reduction activity. These results link SAR11 to pathways of ocean nitrogen loss, redefining the ecological niche of Earth’s most abundant organismal group.


July 7, 2019

Representing genetic variation with synthetic DNA standards.

The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed ‘sequins’, that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.


July 7, 2019

New Delhi metallo-β-lactamase-1-producing Klebsiella pneumoniae, Florida, USA.

New Delhi metallo-β-lactamase (NDM)–producing Enterobacteriaceae have swiftly spread worldwide since an initial report in 2008 from a patient who had been transferred from India back home to Sweden (1). Epidemiologically, the global diffusion of NDM-1 producers has been associated with the Indian subcontinent and the Balkan region, which are considered the primary and secondary reservoirs of these pathogens, respectively (1). However, recent reports suggest that countries in the Middle East may constitute another potential reservoir for NDM-1 producers (1). More than 100 NDM-producing isolates have been reported in the United States, most of which were associated with recent travel from the Indian subcontinent (2,3). We report an NDM-1–producing Klebsiella pneumoniae strain that was recovered from a patient who had been transferred from Iran to a hospital in Florida, United States.


July 7, 2019

The report of my death was an exaggeration: A review for researchers using microsatellites in the 21st century.

Microsatellites, or simple sequence repeats (SSRs), have long played a major role in genetic studies due to their typically high polymorphism. They have diverse applications, including genome mapping, forensics, ascertaining parentage, population and conservation genetics, identification of the parentage of polyploids, and phylogeography. We compare SSRs and newer methods, such as genotyping by sequencing (GBS) and restriction site associated DNA sequencing (RAD-Seq), and offer recommendations for researchers considering which genetic markers to use. We also review the variety of techniques currently used for identifying microsatellite loci and developing primers, with a particular focus on those that make use of next-generation sequencing (NGS). Additionally, we review software for microsatellite development and report on an experiment to assess the utility of currently available software for SSR development. Finally, we discuss the future of microsatellites and make recommendations for researchers preparing to use microsatellites. We argue that microsatellites still have an important place in the genomic age as they remain effective and cost-efficient markers.
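
To make the idea of identifying microsatellite loci concrete, here is a minimal sketch that scans a DNA string for perfect tandem repeats with a regular expression. It is not one of the software packages assessed in the review; the function name `find_ssrs`, its default thresholds, and the toy sequence are all assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def find_ssrs(
    seq: str,
    min_unit: int = 2,
    max_unit: int = 6,
    min_repeats: int = 4,
) -> List[Tuple[int, str, int]]:
    """Find perfect tandem repeats in a DNA sequence.

    Returns (start_index, repeat_unit, repeat_count) tuples. The lazy
    quantifier makes the regex try the shortest unit first, so ACACACAC is
    reported as (AC)4 rather than (ACAC)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence (hypothetical): a dinucleotide repeat flanked by other bases.
toy_hits = find_ssrs("GGTTACACACACACGG")
```

This sketch ignores imperfect and compound repeats, and it reports a homopolymer run as a repeat of a two-base unit, so it only shows the principle behind dedicated SSR tools.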


July 7, 2019

Whole genomic sequence analysis of Bacillus infantis: defining the genetic blueprint of strain NRRL B-14911, an emerging cardiopathogenic microbe.

We recently reported the identification of Bacillus sp. NRRL B-14911 that induces heart autoimmunity by generating cardiac-reactive T cells through molecular mimicry. This marine bacterium was originally isolated from the Gulf of Mexico, but no associations with human diseases were reported. Therefore, to characterize its biological and medical significance, we sought to determine and analyze the complete genome sequence of Bacillus sp. NRRL B-14911. Based on the phylogenetic analysis of 16S ribosomal RNA (rRNA) genes, sequence analysis of the 16S-23S rDNA intergenic transcribed spacers, phenotypic microarray, and matrix-assisted laser desorption ionization time-of-flight mass spectrometry, we propose that this organism belongs to the species Bacillus infantis, previously shown to be associated with sepsis in a newborn child. Analysis of the complete genome of Bacillus sp. NRRL B-14911 revealed several virulence factors including adhesins, invasins, colonization factors, siderophores and transporters. Likewise, the bacterial genome encodes a wide range of methyl transferases, transporters, enzymatic and biochemical pathways, and insertion sequence elements that are distinct from other closely related bacilli. The complete genome sequence of Bacillus sp. NRRL B-14911 provided in this study may facilitate genetic manipulations to assess gene functions associated with bacterial survival and virulence. Additionally, this bacterium may serve as a useful tool to establish a disease model that permits systematic analysis of autoimmune events in various susceptible rodent strains.


July 7, 2019

Association between progranulin and Gaucher disease.

Gaucher disease (GD) is a genetic disease caused by mutations in the GBA1 gene which result in reduced enzymatic activity of β-glucocerebrosidase (GCase). This study identified the progranulin (PGRN) gene (GRN) as another gene associated with GD. Serum levels of PGRN were measured from 115 GD patients and 99 healthy controls, the whole GRN gene from 40 GD patients was sequenced, and the genotyping of 4 SNPs identified in GD patients was performed in 161 GD and 142 healthy control samples. Development of GD in PGRN-deficient mice was characterized, and the therapeutic effect of rPGRN on GD was analyzed. Serum PGRN levels were significantly lower in GD patients (96.65±53.45 ng/ml) than those in healthy controls of the general population (164.99±43.16 ng/ml, p<0.0001) and of Ashkenazi Jews (150.64±33.99 ng/ml, p<0.0001). Four GRN gene SNPs, including rs4792937, rs78403836, rs850713, and rs5848, and three point mutations, were identified by full-length GRN gene sequencing in 40 GD patients. Large-scale SNP genotyping in 161 GD patients and 142 healthy controls was conducted, and the four SNP sites had a significantly higher frequency in GD patients. In addition, "aged" and challenged adult PGRN null mice develop GD-like phenotypes, including typical Gaucher-like cells in lung, spleen, and bone marrow. Moreover, lysosomes in PGRN KO mice exhibit a tubular-like appearance. PGRN is required for the lysosomal appearance of GCase and its deficiency leads to GCase accumulation in the cytoplasm. More importantly, recombinant PGRN is therapeutic in various animal models of GD and human fibroblasts from GD patients. Our data demonstrate a previously unknown association between PGRN and GD and identify PGRN as an essential factor for GCase's lysosomal localization. These findings not only provide new insight into the pathogenesis of GD, but may also have implications for diagnosis and alternative targeted therapies for GD. Copyright © 2016 Forschungsgesellschaft für Arbeitsphysiologie und Arbeitschutz e.V. Published by Elsevier B.V. All rights reserved.


July 7, 2019

Next-generation sequencing: a diagnostic one-stop shop for Hepatitis C?

Before starting chronic hepatitis C treatment, the viral genotype/subtype has to be accurately determined and potentially coupled with drug resistance testing. Due to the high genetic variability of the hepatitis C virus, this can be a demanding task that can potentially be streamlined by viral whole-genome sequencing using next-generation sequencing as demonstrated by an article in this issue of the Journal of Clinical Microbiology by E. Thomson, C. L. C. Ip, A. Badhan, M. T. Christiansen, W. Adamson, et al. (J Clin Microbiol. 54:2455-2469, 2016, http://dx.doi.org/10.1128/JCM.00330-16). Copyright © 2016, American Society for Microbiology. All Rights Reserved.


July 7, 2019

Complete genome sequence of Bordetella pertussis strain VA-190 isolated from a vaccinated 10-year-old patient with whooping cough.

The number of cases of pertussis has increased in the United States despite vaccination. We present the genome of an isolate of Bordetella pertussis from a vaccinated patient from Virginia. The genome was sequenced by long-read methodology and compared to that of a clinical isolate used for laboratory studies, D420. Copyright © 2016 Eby et al.


July 7, 2019

The effects of signal erosion and core genome reduction on the identification of diagnostic markers.

Whole-genome sequence (WGS) data are commonly used to design diagnostic targets for the identification of bacterial pathogens. To do this effectively, genomics databases must be comprehensive to identify the strict core genome that is specific to the target pathogen. As additional genomes are analyzed, the core genome size is reduced and there is erosion of the target-specific regions due to commonality with related species, potentially resulting in the identification of false positives and/or false negatives. A comparative analysis of 1,130 Burkholderia genomes identified unique markers for many named species, including the human pathogens B. pseudomallei and B. mallei. Due to core genome reduction and signature erosion, only 38 targets specific to B. pseudomallei/mallei were identified. By using only public genomes, a larger number of markers were identified, due to undersampling, and this larger number represents the potential for false positives. This analysis has implications for the design of diagnostics for other species where the genomic space of the target and/or closely related species is not well defined. Copyright © 2016 Sahl et al.
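
The two effects named in this abstract can be illustrated with plain set operations. The sketch below is not the authors' pipeline: it assumes each genome is reduced to a set of gene or region identifiers, and the function names and toy data are hypothetical. Adding a target genome can only shrink the strict core (core genome reduction), and adding a related non-target genome can only remove candidate markers (signature erosion).

```python
from typing import Dict, Set


def core_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Strict core: identifiers present in every genome of the group."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


def specific_markers(
    target: Dict[str, Set[str]], related: Dict[str, Set[str]]
) -> Set[str]:
    """Core identifiers of the target group that occur in no related genome."""
    seen_in_related = set().union(*related.values())
    return core_genome(target) - seen_in_related


# Toy data (hypothetical identifiers, not real Burkholderia genes).
target = {"t1": {"a", "b", "c", "d"}, "t2": {"a", "b", "c"}}
related = {"r1": {"c", "x"}}
markers_initial = specific_markers(target, related)       # {"a", "b"}

# Core genome reduction: a newly sampled target genome lacks "b".
target["t3"] = {"a", "c", "e"}
markers_after_target = specific_markers(target, related)  # {"a"}

# Signature erosion: a newly sampled related genome also carries "a".
related["r2"] = {"a", "y"}
markers_after_related = specific_markers(target, related)  # empty set
```

The toy run mirrors the argument in the abstract: a marker list built from an undersampled collection looks larger than it is, because the genomes that would eliminate candidates have not been seen yet.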


July 7, 2019

Comparative genomics analysis of Streptococcus tigurinus strains identifies genetic elements specifically and uniquely present in highly virulent strains.

Streptococcus tigurinus is responsible for severe invasive infections such as infective endocarditis, spondylodiscitis and meningitis. As previously described, S. tigurinus isolates AZ_3aT and AZ_14 were highly virulent (HV phenotype) in an experimental model of infective endocarditis and showed enhanced adherence to and invasion of human endothelial cells when compared to the low-virulence S. tigurinus isolate AZ_8 (LV phenotype). Here, we sought to determine whether genetic determinants could explain the higher virulence of the AZ_3aT and AZ_14 isolates. Several genetic determinants specific to the HV strains were identified through extensive comparative genomics, some of which were thought to be highly relevant for the observed HV phenotype. These included i) an iron uptake and metabolism operon, ii) an ascorbate assimilation operon, iii) a newly acquired PI-2-like pilus islet described for the first time in S. tigurinus, iv) a hyaluronate metabolism operon, v) an Entner-Doudoroff pathway of carbohydrate metabolism, and vi) an alternate pathway for indole biosynthesis. We believe that the identified genomic features could largely explain the phenotype of high infectivity of the two HV S. tigurinus strains. Indeed, these features include determinants that could be involved at different stages of the disease such as survival of S. tigurinus in blood (iron uptake and ascorbate metabolism operons), initial attachment of the bacterial pathogen to the damaged cardiac tissue and/or vegetation that formed on site (PI-2-like pilus islet), tissue invasion (hyaluronate operon and Entner-Doudoroff pathway) and regulation of pathogenicity (indole biosynthesis pathway).


July 7, 2019

Third-generation sequencing and analysis of four complete pig liver esterase gene sequences in clones identified by screening BAC library.

Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has led to difficulties in studying the genetic structure and regulation mechanisms of gene expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and third-generation PacBio gene sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones by screening a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. It was found that, not only did the PLE exon sequences of the four genes show a high degree of homology, but also that the intron sequences were highly similar. Additionally, the regulatory region of the genes contained two 720-bp reverse-complement sequences that may have an important function in the regulation of PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for investigation of the genetic structure, function, and regulatory mechanisms of the PLE gene family.


July 7, 2019

Isolation and plasmid characterization of carbapenemase (IMP-4) producing Salmonella enterica Typhimurium from cats.

Carbapenem-resistant Enterobacteriaceae (CRE) are a pressing public health issue due to limited therapeutic options to treat such infections. CREs have been predominantly isolated from humans and environmental samples and they are rarely reported among companion animals. In this study we report on the isolation and plasmid characterization of carbapenemase (IMP-4) producing Salmonella enterica Typhimurium from a companion animal. Carbapenemase-producing S. enterica Typhimurium carrying blaIMP-4 was identified from a systemically unwell (index) cat and three additional cats at an animal shelter. All isolates were identical and belonged to ST19. Genome sequencing revealed the acquisition of a multidrug-resistant IncHI2 plasmid (pIMP4-SEM1) that encoded resistance to nine antimicrobial classes including carbapenems and carried the blaIMP-4-qacG-aacA4-catB3 cassette array. The plasmid also encoded resistance to arsenic (MIC-150 mM). Comparative analysis revealed that the plasmid pIMP4-SEM1 showed greatest similarity to two blaIMP-8 carrying IncHI2 plasmids from Enterobacter spp. isolated from humans in China. This is the first report of CRE carrying a blaIMP-4 gene causing a clinical infection in a companion animal, with presumed nosocomial spread. This study illustrates the broader community risk entailed in escalating CRE transmission within a zoonotic species such as Salmonella, and in a cycle that encompasses humans, animals and the environment.


July 7, 2019

Sequence assembly of Yarrowia lipolytica strain W29/CLIB89 shows transposable element diversity.

Yarrowia lipolytica, an oleaginous yeast, is capable of accumulating significant cellular mass in lipid making it an important source of biosustainable hydrocarbon-based chemicals. In spite of a similar number of protein-coding genes to that in other Hemiascomycetes, the Y. lipolytica genome is almost double that of model yeasts. Despite its economic importance and several distinct strains in common use, an independent genome assembly exists for only one strain. We report here a de novo annotated assembly of the chromosomal genome of an industrially-relevant strain, W29/CLIB89, determined by hybrid next-generation sequencing. For the first time, each Y. lipolytica chromosome is represented by a single contig. The telomeric rDNA repeats were localized by Irys long-range genome mapping and one complete copy of the rDNA sequence is reported. Two large structural variants and retroelement differences with reference strain CLIB122 including a full-length, novel Ty3/Gypsy long terminal repeat (LTR) retrotransposon and multiple LTR-like sequences are described. Strikingly, several of these are adjacent to RNA polymerase III-transcribed genes, which are almost double in number in Y. lipolytica compared to other Hemiascomycetes. In addition to previously-reported dimeric RNA polymerase III-transcribed genes, tRNA pseudogenes were identified. Multiple full-length and truncated LINE elements are also present. Therefore, although identified transposons do not constitute a significant fraction of the Y. lipolytica genome, they could have played an active role in its evolution. Differences between the sequence of this strain and of the existing reference strain underscore the utility of an additional independent genome assembly for this economically important organism.

