September 22, 2019

npInv: accurate detection and genotyping of inversions using long read sub-alignment.

Detection of genomic inversions remains challenging. Many existing methods primarily target inversions with a non-repetitive breakpoint, leaving inverted repeat (IR) mediated non-allelic homologous recombination (NAHR) inversions largely unexplored. We present npInv, a novel tool specifically for detecting and genotyping NAHR inversions using sub-alignment of long read sequencing data. We benchmark npInv against other tools on both simulated and real data. We use npInv to generate a whole-genome inversion map for NA12878 consisting of 30 NAHR inversions (of which 15 are novel), including all previously known NAHR mediated inversions in NA12878 with flanking IR less than 7kb. Our genotyping accuracy on this dataset was 94%. We used PCR to confirm the presence of two of these novel inversions. We show that there is a near linear relationship between the length of flanking IR and the minimum inversion size, without inverted repeats. The application of npInv shows high accuracy in both simulation and real data. The results give deeper insight into understanding inversion.
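
The idea of reading inversion signals from the sub-alignments of a single long read can be illustrated with a small sketch. This is not npInv's algorithm, which the abstract does not describe; it only assumes that a read has already been split into sub-alignments and flags adjacent pieces that map to the same chromosome on opposite strands. The `SubAlignment` type, its fields, and `strand_switches` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class SubAlignment(NamedTuple):
    """One aligned piece of a long read (hypothetical record layout)."""
    read_start: int   # start offset within the read
    read_end: int     # end offset within the read
    chrom: str        # reference sequence name
    ref_start: int    # start position on the reference
    ref_end: int      # end position on the reference
    strand: str       # "+" or "-"


def strand_switches(
    subs: List[SubAlignment],
) -> List[Tuple[SubAlignment, SubAlignment]]:
    """Return adjacent sub-alignment pairs that change strand on one chromosome.

    Each such pair is a candidate inversion breakpoint. Assumption: a real
    caller would also check distances, repeat context, and read support.
    """
    ordered = sorted(subs, key=lambda s: s.read_start)
    switches = []
    for left, right in zip(ordered, ordered[1:]):
        if left.chrom == right.chrom and left.strand != right.strand:
            switches.append((left, right))
    return switches


# Toy read: forward, inverted middle segment, forward again.
toy_read = [
    SubAlignment(0, 4000, "chr1", 100000, 104000, "+"),
    SubAlignment(4000, 9000, "chr1", 104000, 109000, "-"),
    SubAlignment(9000, 12000, "chr1", 109000, 112000, "+"),
]
candidates = strand_switches(toy_read)
```

A read spanning an entire inversion would be expected to yield two strand switches, one per breakpoint, as in the toy read above.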


September 22, 2019

Hepacivirus A infection in horses defines distinct envelope hypervariable regions and elucidates potential roles of viral strain and adaptive immune status in determining envelope diversity and infection outcome.

Hepacivirus A (also known as nonprimate hepacivirus and equine hepacivirus) is a hepatotropic virus that can cause both transient and persistent infections in horses. The evolution of intrahost viral populations (quasispecies) has not been studied in detail for hepacivirus A, and its roles in immune evasion and persistence are unknown. To address these knowledge gaps, we first evaluated the envelope gene (E1 and E2) diversity of two different hepacivirus A strains (WSU and CU) in longitudinal blood samples from experimentally infected adult horses, juvenile horses (foals), and foals with severe combined immunodeficiency (SCID). Persistent infection with the WSU strain was associated with significantly greater quasispecies diversity than that observed in horses who spontaneously cleared infection (P = 0.0002) or in SCID foals (P < 0.0001). In contrast, the CU strain was able to persist despite significantly lower (P < 0.0001) and relatively static envelope diversity. These findings indicate that envelope diversity is a poor predictor of hepacivirus A infection outcomes and could be dependent on strain-specific factors. Next, entropy analysis was performed on all E1/E2 genes entered into GenBank. This analysis defined three novel hypervariable regions (HVRs) in E2, at residues 391 to 402 (HVR1), 450 to 461 (HVR2), and 550 to 562 (HVR3). For the experimentally infected horses, entropy analysis focusing on the HVRs demonstrated that these regions were under increased selective pressure during persistent infection. Increased diversity in the HVRs was also temporally associated with seroconversion in some horses, suggesting that these regions may be targets of neutralizing antibody and may play a role in immune evasion. IMPORTANCE Hepacivirus C (hepatitis C virus) is estimated to infect 150 million people worldwide and is a leading cause of cirrhosis and hepatocellular carcinoma.
In contrast, its closest relative, hepacivirus A, causes relatively mild disease in horses and is frequently cleared. The relationship between quasispecies evolution and infection outcome has not been explored for hepacivirus A. To address this knowledge gap, we examined envelope gene diversity in horses with resolving and persistent infections. Interestingly, two strain-specific patterns of quasispecies diversity emerged. Persistence of the WSU strain was associated with increased quasispecies diversity and the accumulation of amino acid changes within three novel hypervariable regions following seroconversion. These findings provided evidence that envelope gene mutation is influenced by adaptive immune pressure and may contribute to hepacivirus persistence. However, the CU strain persisted despite relative evolutionary stasis, suggesting that some hepacivirus strains may use alternative mechanisms to persist in the host. Copyright © 2018 American Society for Microbiology.
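
Entropy analysis of aligned envelope sequences, as mentioned in the abstract, is commonly computed as per-column Shannon entropy. The following is a minimal sketch under that assumption; the abstract does not state the exact procedure or software used. The function names and the toy four-residue alignment are hypothetical and are not data from the study.

```python
import math
from collections import Counter
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy in bits of the residues observed in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def alignment_entropy(seqs: List[str]) -> List[float]:
    """Per-position entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) walks the alignment column by column.
    return [column_entropy(col) for col in zip(*seqs)]


# Toy alignment (hypothetical): position 3 varies most, positions 1 and 4 are fixed.
toy_alignment = ["MKTA", "MKSA", "MRTA", "MKGA"]
profile = alignment_entropy(toy_alignment)
```

Runs of consecutive high-entropy positions in such a profile are the kind of signal that could be used to delimit candidate hypervariable regions; any threshold for "high" would be an additional assumption.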


September 22, 2019

Long-term colonization dynamics of Enterococcus faecalis in implanted devices in research macaques.

Enterococcus faecalis is a common opportunistic pathogen that colonizes cephalic recording chambers (CRCs) of macaques used in cognitive neuroscience research. We previously characterized 15 E. faecalis strains isolated from macaques at the Massachusetts Institute of Technology (MIT) in 2011. The goal of this study was to examine how a 2014 protocol change prohibiting the use of antimicrobials within CRCs affected colonizing E. faecalis strains. We collected 20 E. faecalis isolates from 10 macaques between 2013 and 2017 for comparison to 4 isolates previously characterized in 2011 with respect to the sequence type (ST) distribution, antimicrobial resistance, biofilm formation, and changes in genes that might confer a survival advantage. ST4 and ST55 were predominant among the isolates characterized in 2011, whereas the less antimicrobial-resistant lineage ST48 emerged to dominance after 2013. Two macaques remained colonized by ST4 and ST55 strains for 5 and 4 years, respectively. While the antimicrobial resistance and virulence factors identified in these ST4 and ST55 strains remained relatively stable, we detected an increase in biofilm formation ability over time in both isolates. We also found that ST48 strains were typically robust biofilm formers, which could explain why this ST increased in prevalence. Finally, we identified mutations in the DNA mismatch repair genes mutS and mutL in separate ST55 and ST4 strains and confirmed that strains bearing these mutations displayed a hypermutator phenotype. The presence of a hypermutator phenotype may complicate future antimicrobial treatment for clinically relevant E. faecalis infections in macaques. IMPORTANCE Enterococcus faecalis is a common cause of health care-associated infections in humans, largely due to its ability to persist in the hospital environment, colonize patients, acquire antimicrobial resistance, and form biofilms.
Understanding how enterococci evolve in health care settings provides insight into factors affecting enterococcal survival and persistence. Macaques used in neuroscience research have long-term cranial implants that, despite best practices, often become colonized by E. faecalis. This provides a unique opportunity to noninvasively examine the evolution of enterococci on a long-term indwelling device. We collected E. faecalis strains from cephalic implants over a 7-year period and characterized the sequence type, antimicrobial resistance, virulence factors, biofilm production, and hypermutator phenotypes. Improved antimicrobial stewardship allowed a less-antimicrobial-resistant E. faecalis strain to predominate at the implant interface, potentially improving antimicrobial treatment outcomes if future clinical infections occur. Biofilm formation appears to play an important role in the persistence of the E. faecalis strains associated with these implants. Copyright © 2018 American Society for Microbiology.


September 22, 2019

Evolutionary history of human Plasmodium vivax revealed by genome-wide analyses of related ape parasites.

Wild-living African apes are endemically infected with parasites that are closely related to human Plasmodium vivax, a leading cause of malaria outside Africa. This finding suggests that the origin of P. vivax was in Africa, even though the parasite is now rare in humans there. To elucidate the emergence of human P. vivax and its relationship to the ape parasites, we analyzed genome sequence data of P. vivax strains infecting six chimpanzees and one gorilla from Cameroon, Gabon, and Côte d’Ivoire. We found that ape and human parasites share nearly identical core genomes, differing by only 2% of coding sequences. However, compared with the ape parasites, human strains of P. vivax exhibit about 10-fold less diversity and have a relative excess of nonsynonymous nucleotide polymorphisms, with site-frequency spectra suggesting they are subject to greatly relaxed purifying selection. These data suggest that human P. vivax has undergone an extreme bottleneck, followed by rapid population expansion. Investigating potential host-specificity determinants, we found that ape P. vivax parasites encode intact orthologs of three reticulocyte-binding protein genes (rbp2d, rbp2e, and rbp3), which are pseudogenes in all human P. vivax strains. However, binding studies of recombinant RBP2e and RBP3 proteins to human, chimpanzee, and gorilla erythrocytes revealed no evidence of host-specific barriers to red blood cell invasion. These data suggest that, from an ancient stock of P. vivax parasites capable of infecting both humans and apes, a severely bottlenecked lineage emerged out of Africa and underwent rapid population growth as it spread globally. Copyright © 2018 the Author(s). Published by PNAS.


September 22, 2019

Recurrent loss of HMGCS2 shows that ketogenesis is not essential for the evolution of large mammalian brains.

Apart from glucose, fatty acid-derived ketone bodies provide metabolic energy for the brain during fasting and neonatal development. We investigated the evolution of HMGCS2, the key enzyme required for ketone body biosynthesis (ketogenesis). Unexpectedly, we found that three mammalian lineages, comprising cetaceans (dolphins and whales), elephants and mastodons, and Old World fruit bats have lost this gene. Remarkably, many of these species have exceptionally large brains and signs of intelligent behavior. While fruit bats are sensitive to starvation, cetaceans and elephants can still withstand periods of fasting. This suggests that alternative strategies to fuel large brains during fasting evolved repeatedly and reveals flexibility in mammalian energy metabolism. Furthermore, we show that HMGCS2 loss preceded brain size expansion in toothed whales and elephants. Thus, while ketogenesis was likely important for brain size expansion in modern humans, ketogenesis is not a universal precondition for the evolution of large mammalian brains. © 2018, Jebb et al.


September 22, 2019

The unique evolution of the pig LRC, a single KIR but expansion of LILR and a novel Ig receptor family.

The leukocyte receptor complex (LRC) encodes numerous immunoglobulin (Ig)-like receptors involved in innate immunity. These include the killer-cell Ig-like receptors (KIR) and the leukocyte Ig-like receptors (LILR) which can be polymorphic and vary greatly in number between species. Using the recent long-read genome assembly, Sscrofa11.1, we have characterized the porcine LRC on chromosome 6. We identified a ~197-kb region containing numerous LILR genes that were missing in previous assemblies. Out of 17 such LILR genes and fragments, six encode functional proteins, of which three are inhibitory and three are activating, while the majority of pseudogenes had the potential to encode activating receptors. Elsewhere in the LRC, between FCAR and GP6, we identified a novel gene that encodes two Ig-like domains and a long inhibitory intracellular tail. Comparison with two other porcine assemblies revealed a second, nearly identical, non-functional gene encoding a short intracellular tail with ambiguous function. These novel genes were found in a diverse range of mammalian species, including a pseudogene in humans, and typically consist of a single long-tailed receptor and a variable number of short-tailed receptors. Using porcine transcriptome data, both the novel inhibitory gene and the LILR were highly expressed in peripheral blood, while the single KIR gene, KIR2DL1, was either very poorly expressed or not at all. These observations are a prerequisite for improved understanding of immune cell functions in the pig and other species.


September 22, 2019

Computational tools to unmask transposable elements.

A substantial proportion of the genome of many species is derived from transposable elements (TEs). Moreover, through various self-copying mechanisms, TEs continue to proliferate in the genomes of most species. TEs have contributed numerous regulatory, transcript and protein innovations and have also been linked to disease. However, notwithstanding their demonstrated impact, many genomic studies still exclude them because their repetitive nature results in various analytical complexities. Fortunately, a growing array of methods and software tools are being developed to cater for them. This Review presents a summary of computational resources for TEs and highlights some of the challenges and remaining gaps to perform comprehensive genomic analyses that do not simply ‘mask’ repeats.


September 22, 2019

Correcting palindromes in long reads after whole-genome amplification.

Next-generation sequencing requires sufficient DNA to be available. If limited, whole-genome amplification is applied to generate additional amounts of DNA. Such amplification often results in many chimeric DNA fragments, in particular artificial palindromic sequences, which limit the usefulness of long sequencing reads. Here, we present Pacasus, a tool for correcting such errors. Two datasets show that it markedly improves read mapping and de novo assembly, yielding results similar to those that would be obtained with non-amplified DNA. With Pacasus, long-read technologies become available for sequencing targets with very small amounts of DNA, such as single cells or even single chromosomes.
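
To make the notion of an artificial palindromic read concrete: such a read contains a sequence followed by its own reverse complement. The sketch below is a deliberately simplified, exact-match illustration and is not Pacasus's method, which the abstract does not describe; real long reads contain sequencing errors, so a practical tool would need error-tolerant alignment. The function names and the `min_arm` parameter are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindromic_arm_length(read: str, min_arm: int = 4) -> int:
    """Length of the longest prefix whose reverse complement is the read's suffix.

    Returns 0 if no arm of at least `min_arm` bases is found. Exact matching
    only: an assumption made to keep the sketch short.
    """
    for k in range(len(read) // 2, min_arm - 1, -1):
        if reverse_complement(read[:k]) == read[len(read) - k:]:
            return k
    return 0


def trim_palindrome(read: str, min_arm: int = 4) -> str:
    """Drop the reverse-complement arm so only one copy of the sequence remains."""
    k = palindromic_arm_length(read, min_arm)
    return read[: len(read) - k] if k else read


# Toy chimeric read: "ACGGTT" followed by its reverse complement "AACCGT".
corrected = trim_palindrome("ACGGTTAACCGT")
```

Here the toy read is reduced to a single copy of the original fragment, while a read with no palindromic arm is returned unchanged.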


September 22, 2019

Analysis of structural variants in four African cichlids highlights an association with developmental and immune related genes

African Lake cichlids are one of the most impressive examples of adaptive radiation. Independently in Lakes Victoria, Tanganyika, and Malawi, several hundreds of species arose within the last 10 million to 100,000 years. Whereas most analyses in cichlids focused on nucleotide substitutions across species to investigate the genetic bases of this explosive radiation, to date, no study has investigated the contribution of structural variants (SVs) to speciation events (through a reduction of gene flow) and adaptation to different ecological niches. Here, we annotate and characterize the repertoires and evolutionary potential of different SV classes (deletions, duplications, inversions, insertions and translocations) in five cichlid species (Astatotilapia burtoni, Metriaclima zebra, Neolamprologus brichardi, Pundamilia nyererei and Oreochromis niloticus). We investigate the patterns of gain/loss evolution across the phylogeny for each SV type, enabling the identification of both lineage-specific events and a set of conserved SVs common to all four species in the radiation. Both deletion and inversion events show a significant overlap with SINE elements, while inversions additionally show a limited but significant association with DNA transposons. Genes lying inside inverted regions are enriched for genes regulating behaviour, or involved in skeletal and visual system development. Moreover, we find that duplicated genes show enrichment for 'antigen processing and presentation' (GO:0019882) and other immune related categories. Altogether, we provide the first comprehensive overview of rearrangement evolution in East African Cichlids, and some initial insights into their possible contribution to adaptation.


September 22, 2019

Mosaicism diminishes the value of pre-implantation embryo biopsies for detecting CRISPR/Cas9 induced mutations in sheep.

The production of knock-out (KO) livestock models is both expensive and time consuming due to their long gestational interval and low number of offspring. One alternative to increase efficiency is performing a genetic screening to select pre-implantation embryos that have incorporated the desired mutation. Here we report the use of sheep embryo biopsies for detecting CRISPR/Cas9-induced mutations targeting the gene PDX1 prior to embryo transfer. PDX1 is a critical gene for pancreas development and the target gene required for the creation of pancreatogenesis-disabled sheep. We evaluated the viability of biopsied embryos in vitro and in vivo, and we determined the mutation efficiency using PCR combined with gel electrophoresis and digital droplet PCR (ddPCR). Next, we determined the presence of mosaicism in ~50% of the recovered fetuses employing a clonal sequencing methodology. While the use of biopsies did not compromise embryo viability, the presence of mosaicism diminished the diagnostic value of the technique. If mosaicism could be overcome, pre-implantation embryo biopsies for mutation screening represent a powerful approach that will streamline the creation of KO animals.


September 22, 2019

Long-read sequencing technology indicates genome-wide effects of non-B DNA on polymerization speed and error rate.

DNA conformation may deviate from the classical B-form in ~13% of the human genome. Non-B DNA regulates many cellular processes; however, its effects on DNA polymerization speed and accuracy have not been investigated genome-wide. Such an inquiry is critical for understanding neurological diseases and cancer genome instability. Here, we present the first simultaneous examination of DNA polymerization kinetics and errors in the human genome sequenced with Single-Molecule Real-Time (SMRT) technology. We show that polymerization speed differs between non-B and B-DNA: It decelerates at G-quadruplexes and fluctuates periodically at disease-causing tandem repeats. Analyzing polymerization kinetics profiles, we predict and validate experimentally non-B DNA formation for a novel motif. We demonstrate that several non-B motifs affect sequencing errors (e.g., G-quadruplexes increase error rates), and that sequencing errors are positively associated with polymerase slowdown. Finally, we show that highly divergent G4 motifs have pronounced polymerization slowdown and high sequencing error rates, suggesting similar mechanisms for sequencing errors and germline mutations. © 2018 Guiblet et al.; Published by Cold Spring Harbor Laboratory Press.


September 22, 2019

Evolutionary conservation of Y Chromosome ampliconic gene families despite extensive structural variation.

Despite claims that the mammalian Y Chromosome is on a path to extinction, comparative sequence analysis of primate Y Chromosomes has shown the decay of the ancestral single-copy genes has all but ceased in this eutherian lineage. The suite of single-copy Y-linked genes is highly conserved among the majority of eutherian Y Chromosomes due to strong purifying selection to retain dosage-sensitive genes. In contrast, the ampliconic regions of the Y Chromosome, which contain testis-specific genes that encode the majority of the transcripts on eutherian Y Chromosomes, are rapidly evolving and are thought to undergo species-specific turnover. However, ampliconic genes are known from only a handful of species, limiting insights into their long-term evolutionary dynamics. We used a clone-based sequencing approach employing both long- and short-read sequencing technologies to assemble ~2.4 Mb of representative ampliconic sequence dispersed across the domestic cat Y Chromosome, and identified the major ampliconic gene families and repeat units. We analyzed fluorescence in situ hybridization, qPCR, and whole-genome sequence data from 20 cat species and revealed that ampliconic gene families are conserved across the cat family Felidae but show high transcript diversity, copy number variation, and structural rearrangement. Our analysis of ampliconic gene evolution unveils a complex pattern of long-term gene content stability despite extensive structural variation on a nonrecombining background. © 2018 Brashear et al.; Published by Cold Spring Harbor Laboratory Press.


September 21, 2019

Characterization of multi-drug resistant Enterococcus faecalis isolated from cephalic recording chambers in research macaques (Macaca spp.).

Nonhuman primates are commonly used for cognitive neuroscience research and often surgically implanted with cephalic recording chambers for electrophysiological recording. Aerobic bacterial cultures from 25 macaques identified 72 bacterial isolates, including 15 Enterococcus faecalis isolates. The E. faecalis isolates displayed multi-drug resistant phenotypes, with resistance to ciprofloxacin, enrofloxacin, trimethoprim-sulfamethoxazole, tetracycline, chloramphenicol, bacitracin, and erythromycin, as well as high-level aminoglycoside resistance. Multi-locus sequence typing showed that most belonged to two E. faecalis sequence types (ST): ST 4 and ST 55. The genomes of three representative isolates were sequenced to identify genes encoding antimicrobial resistances and other traits. Antimicrobial resistance genes identified included aac(6′)-aph(2″), aph(3′)-III, str, ant(6)-Ia, tetM, tetS, tetL, ermB, bcrABR, cat, and dfrG, and polymorphisms in parC (S80I) and gyrA (S83I) were observed. These isolates also harbored virulence factors including the cytolysin toxin genes in ST 4 isolates, as well as multiple biofilm-associated genes (esp, agg, ace, SrtA, gelE, ebpABC), hyaluronidases (hylA, hylB), and other survival genes (ElrA, tpx). Crystal violet biofilm assays confirmed that ST 4 isolates produced more biofilm than ST 55 isolates. The abundance of antimicrobial resistance and virulence factor genes in the ST 4 isolates likely relates to the loss of CRISPR-cas. This macaque colony represents a unique model for studying E. faecalis infection associated with indwelling devices, and provides an opportunity to understand the basis of persistence of this pathogen in a healthcare setting.


September 21, 2019

PacBio assembly of a Plasmodium knowlesi genome sequence with Hi-C correction and manual annotation of the SICAvar gene family.

Plasmodium knowlesi has risen in importance as a zoonotic parasite that has been causing regular episodes of malaria throughout South East Asia. The P. knowlesi genome sequence generated in 2008 highlighted and confirmed many similarities and differences in Plasmodium species, including a global view of several multigene families, such as the large SICAvar multigene family encoding the variant antigens known as the schizont-infected cell agglutination proteins. However, repetitive DNA sequences are the bane of any genome project, and this and other Plasmodium genome projects have not been immune to the gaps, rearrangements and other pitfalls created by these genomic features. Today, long-read PacBio and chromatin conformation technologies are overcoming such obstacles. Here, based on the use of these technologies, we present a highly refined de novo P. knowlesi genome sequence of the Pk1(A+) clone. This sequence and annotation, referred to as the ‘MaHPIC Pk genome sequence’, includes manual annotation of the SICAvar gene family with 136 full-length members categorized as type I or II. This sequence provides a framework that will permit a better understanding of the SICAvar repertoire, selective pressures acting on this gene family and mechanisms of antigenic variation in this species and other pathogens.

