July 7, 2019

The evolution and population diversity of human-specific segmental duplications

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (N = 80 genes from 33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed ‘core duplicons’ and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (such as TCAF1/TCAF2), we highlight ten gene families (for example, ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.


July 7, 2019

Complete genome sequences of three Cupriavidus strains isolated from various Malaysian environments.

Cupriavidus sp. USMAA1020, USMAA2-4, and USMAHM13 are capable of producing polyhydroxyalkanoate (PHA). This biopolymer is an alternative solution to synthetic plastics, whereby polyhydroxyalkanoate synthase is the key enzyme involved in PHA biosynthesis. Here, we report the complete genomes of three Cupriavidus sp. strains: USMAA1020, USMAA2-4, and USMAHM13. Copyright © 2017 Shafie et al.


July 7, 2019

Complete genome sequence of Mycoplasma pneumoniae type 2 reference strain FH using single-molecule real-time sequencing technology.

Mycoplasma pneumoniae type 2 strain FH was previously sequenced with Illumina (FH-Illumina) and 454 (FH-454) technologies according to Xiao et al. (2015) and Krishnakumar et al. (2010). Comparative analyses revealed differences in genomic content between these sequences, including a 6-kb region absent from the FH-454 submission. Here, we present a complete genome sequence of FH sequenced with the Pacific Biosciences RSII platform. Copyright © 2017 Desai et al.


July 7, 2019

Genomic sequencing of a strain of Acinetobacter baumannii and potential mechanisms to antibiotics resistance.

Acinetobacter baumannii has become a great challenge to clinicians because of its resistance to almost all available antibiotics. In this study, we used SMRT sequencing technology to sequence the genome of a multiple-antibiotic-resistant Acinetobacter baumannii strain, named A. baumannii-1 and isolated from China, to explore its potential mechanisms of antibiotic resistance. We found that several mechanisms might contribute to the antibiotic resistance of Acinetobacter baumannii. Specifically, we found that SNPs in genes associated with nucleotide excision repair and ABC transporters might contribute to its resistance to multiple antibiotics; we also found that specific genes associated with bacterial DNA integration and recombination, DNA-mediated transposition and response to antibiotics might contribute to its resistance to multiple antibiotics; furthermore, specific genes associated with the penicillin and cephalosporin biosynthetic pathway and specific genes associated with CHDL and MBL β-lactamase genes might contribute to its resistance to multiple antibiotics. Thus, the detailed mechanisms by which Acinetobacter baumannii shows extensive resistance to multiple antibiotics are very complicated. Such a study might be helpful for developing new strategies to control Acinetobacter baumannii infection. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

A Clostridioides difficile bacteriophage genome encodes functional binary toxin-associated genes.

Pathogenic clostridia typically produce toxins as virulence factors which cause severe diseases in both humans and animals. Whereas many clostridia, such as Clostridium perfringens, Clostridium botulinum or Clostridium tetani, were shown to contain toxin-encoding plasmids, only toxin genes located on the chromosome have been detected in Clostridioides difficile so far. In this study, we determined, annotated, and analyzed the complete genome of the bacteriophage phiSemix9P1 using single-molecule real-time sequencing technology (SMRT). To our knowledge, this represents the first C. difficile-associated bacteriophage genome that carries a complete functional binary toxin locus. Copyright © 2017 Elsevier B.V. All rights reserved.


July 7, 2019

Identification of a Pseudomonas aeruginosa PAO1 DNA methyltransferase, its targets, and physiological roles.

DNA methylation is widespread among prokaryotes, and most DNA methylation reactions are catalyzed by adenine DNA methyltransferases, which are part of restriction-modification (R-M) systems. R-M systems are known for their role in the defense against foreign DNA; however, DNA methyltransferases also play functional roles in gene regulation. In this study, we used single-molecule real-time (SMRT) sequencing to uncover the genome-wide DNA methylation pattern in the opportunistic pathogen Pseudomonas aeruginosa PAO1. We identified a conserved sequence motif targeted by an adenine methyltransferase of a type I R-M system and quantified the presence of N(6)-methyladenine using liquid chromatography-tandem mass spectrometry (LC-MS/MS). Changes in the PAO1 methylation status were dependent on growth conditions and affected P. aeruginosa pathogenicity in a Galleria mellonella infection model. Furthermore, we found that methylated motifs in promoter regions led to shifts in sense and antisense gene expression, emphasizing the role of enzymatic DNA methylation as an epigenetic control of phenotypic traits in P. aeruginosa. Since the DNA methylation enzymes are not encoded in the core genome, our findings illustrate how the acquisition of accessory genes can shape the global P. aeruginosa transcriptome and thus may facilitate adaptation to new and challenging habitats. IMPORTANCE: With the introduction of advanced technologies, epigenetic regulation by DNA methyltransferases in bacteria has become a subject of intense studies. Here we identified an adenosine DNA methyltransferase in the opportunistic pathogen Pseudomonas aeruginosa PAO1, which is responsible for DNA methylation of a conserved sequence motif. The methylation level of all target sequences throughout the PAO1 genome was approximated to be in the range of 65 to 85% and was dependent on growth conditions. Inactivation of the methyltransferase revealed an attenuated-virulence phenotype in the Galleria mellonella infection model. Furthermore, differential expression of more than 90 genes was detected, including the small regulatory RNA prrF1, which contributes to a global iron-sparing response via the repression of a set of gene targets. Our finding of a methylation-dependent repression of the antisense transcript of the prrF1 small regulatory RNA significantly expands our understanding of the regulatory mechanisms underlying active DNA methylation in bacteria. Copyright © 2017 Doberenz et al.


July 7, 2019

Analysis of the complete genome sequence of Nocardia seriolae UTF1, the causative agent of fish nocardiosis: The first reference genome sequence of the fish pathogenic Nocardia species.

Nocardiosis caused by Nocardia seriolae is one of the major threats in the aquaculture of Seriola species (yellowtail, S. quinqueradiata; amberjack, S. dumerili; and kingfish, S. lalandi) in Japan. Here, we report the complete nucleotide genome sequence of N. seriolae UTF1, isolated from a cultured yellowtail. The genome is a circular chromosome of 8,121,733 bp with a G+C content of 68.1% that encodes 7,697 predicted proteins. In the N. seriolae UTF1 predicted genes, we found orthologs of virulence factors of pathogenic mycobacteria and human clinical Nocardia isolates involved in host cell invasion, modulation of phagocyte function and survival inside the macrophages. These virulence factor candidates provide an essential basis for future studies by the fish nocardiosis research community into the pathogenic mechanisms at the molecular level. We also found many potential antibiotic resistance genes on the N. seriolae UTF1 chromosome. Comparative analysis with the four existing complete genomes, N. farcinica IFM 10152, N. brasiliensis HUJEG-1, N. cyriacigeorgica GUH-2 and N. nova SH22a, revealed that 2,745 orthologous genes were present in all five Nocardia genomes (core genes) and 1,982 genes were unique to N. seriolae UTF1. In particular, the N. seriolae UTF1 genome contains a greater number of mobile elements and genes of unknown function, which account for the differences in structure and gene content from the other Nocardia genomes. In addition, many of the N. seriolae UTF1-specific genes were assigned to the ABC transport system. Because of limited resources in ocean environments, these N. seriolae UTF1-specific ABC transporters might facilitate adaptation strategies essential for marine environment survival. Thus, the availability of the complete N. seriolae UTF1 genome sequence will provide a valuable resource for comparative genomic studies of N. seriolae isolates, as well as provide new insights into the ecological and functional diversity of the genus Nocardia.
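
The core-gene and strain-specific counts reported in the abstract above come down to set operations over ortholog assignments. The following is a minimal sketch of that bookkeeping, not the authors' pipeline: the function name `core_and_unique`, the genome labels, and the ortholog group identifiers are all hypothetical, and it assumes each gene has already been assigned to an ortholog group by some upstream tool.

```python
def core_and_unique(ortholog_groups, target):
    """Return (core, unique) ortholog group sets.

    ortholog_groups: dict mapping a genome name to the set of ortholog
    group ids found in that genome (hypothetical input format).
    target: the genome whose strain-specific groups are wanted.
    """
    # Core groups are present in every genome.
    core = set.intersection(*ortholog_groups.values())
    # Unique groups occur in the target genome and in no other genome.
    others = set().union(
        *(groups for name, groups in ortholog_groups.items() if name != target)
    )
    unique = ortholog_groups[target] - others
    return core, unique


# Toy data only; these ids do not correspond to real Nocardia genes.
toy = {
    "genome_A": {"og1", "og2", "og3", "og9"},
    "genome_B": {"og1", "og2", "og4"},
    "genome_C": {"og1", "og2", "og3"},
}
core, unique = core_and_unique(toy, "genome_A")
```

In this toy input, `core` holds the groups shared by all three genomes and `unique` holds those found only in `genome_A`.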


July 7, 2019

Combination of short-read, long-read and optical mapping assemblies reveals large-scale tandem repeat arrays with population genetic implications.

Accurate and contiguous genome assembly is key to a comprehensive understanding of the processes shaping genomic diversity and evolution. Yet, it is frequently constrained by constitutive heterochromatin, usually characterized by highly repetitive DNA. As a key feature of genome architecture associated with centromeric and telomeric regions, it influences meiotic recombination. In this study, we assess the impact of large tandem repeat arrays on the recombination rate landscape in an avian speciation model, the Eurasian crow. We assembled two high-quality genome references using single-molecule real-time sequencing (long-read assembly, LR) and single-molecule restriction maps (optical map assembly, OM). A three-way comparison including the published short-read assembly (SR) constructed for the same individual allowed assessing assembly properties and pinpointing mis-assemblies. Combining information from all three assemblies, we characterized 36 previously unidentified large repetitive regions in the proximity of sequence assembly breakpoints, the majority of which contained complex arrays of a 14-kb satellite repeat or its 1.2-kb subunit. Using genome-wide population re-sequencing data, we estimated the population-scaled recombination rate (ρ) and found it to be significantly reduced in these regions. These findings are consistent with an effect of low recombination in regions adjacent to centromeric or subtelomeric heterochromatin, and add to our understanding of the processes generating widespread heterogeneity in genetic diversity and differentiation along the genome. By combining three independent technologies, our results highlight the importance of adding a layer of information on genome structure inaccessible to each approach independently. Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

The human reference genome assembly plays a central role in nearly all aspects of today’s basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health. © 2017 Schneider et al.; Published by Cold Spring Harbor Laboratory Press.


July 7, 2019

HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies.

Achieving complete, accurate, and cost-effective assembly of human genomes is of great importance for realizing the promise of precision medicine. The abundance of repeats and genetic variations in human genomes and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next-generation sequencing and single-molecule sequencing technologies to accurately assemble and detect structural variants (SVs) in human genomes. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance the assembly of structurally altered regions in human genomes. We used data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878) to test our approach. The result showed that, compared with existing methods, our approach had a low false discovery rate and substantially improved the detection of many types of SVs, particularly novel large insertions, small indels (10-50 bp), and short tandem repeat expansions and contractions. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery. © 2017 Fan et al.; Published by Cold Spring Harbor Laboratory Press.
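
The abstract above does not spell out its bipartite-graph-based clustering, so the sketch below is only an illustration of the general idea, not HySA's algorithm. It assumes the simplest possible stand-in: short reads and long reads form the two sides of a bipartite graph, an edge means the two reads were judged homologous by some upstream aligner, and each connected component becomes one independent SV assembly problem. The function name `cluster_bipartite`, the `"ngs"`/`"sms"` labels, and the read ids are hypothetical.

```python
from collections import defaultdict


def cluster_bipartite(edges):
    """Cluster reads as connected components of a bipartite graph.

    edges: iterable of (short_read_id, long_read_id) pairs, each pair
    meaning the two reads were judged homologous (assumed input).
    Returns a sorted list of clusters, each a sorted list of
    (technology, read_id) tuples.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each id with its side so the two id spaces cannot collide.
    for short_id, long_id in edges:
        union(("ngs", short_id), ("sms", long_id))

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: two short reads share a long read; a third pair stands alone.
toy_edges = [("s1", "l1"), ("s2", "l1"), ("s3", "l2")]
toy_clusters = cluster_bipartite(toy_edges)
```

Each resulting cluster could then be handed to a local assembler on its own, which is the sense in which one whole-genome problem is split into independent per-SV problems.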


July 7, 2019

Elucidation of quantitative structural diversity of remarkable rearrangement regions, shufflons, in IncI2 plasmids.

A multiple DNA inversion system, the shufflon, exists in incompatibility (Inc) I1 and I2 plasmids. The shufflon generates variants of the PilV protein, a minor component of the thin pilus. The shufflon is one of the most difficult regions for de novo genome assembly because of its structural diversity even in an isolated bacterial clone. We determined complete genome sequences, including those of IncI2 plasmids carrying mcr-1, of three Escherichia coli strains using single-molecule, real-time (SMRT) sequencing and Illumina sequencing. The sequences assembled using only SMRT sequencing contained misassembled regions in the shufflon. A hybrid analysis using SMRT and Illumina sequencing resolved the misassembled region and revealed that the three IncI2 plasmids, excluding the shufflon region, were highly conserved. Moreover, the abundance ratio of whole-shufflon structures could be determined by quantitative structural variation analysis of the SMRT data, suggesting that a remarkable heterogeneity of whole-shufflon structural variations exists in IncI2 plasmids. These findings indicate that remarkable rearrangement regions should be validated using both long-read and short-read sequencing data and that the structural variation of PilV in the shufflon might be closely related to phenotypic heterogeneity of plasmid-mediated transconjugation involved in horizontal gene transfer even in bacterial clonal populations.


July 7, 2019

Sequencing and de novo assembly of a near complete indica rice genome.

A high-quality reference genome is critical for understanding genome structure, genetic variation and evolution of an organism. Here we report the de novo assembly of an indica rice genome Shuhui498 (R498) through the integration of single-molecule sequencing and mapping data, genetic map and fosmid sequence tags. The 390.3 Mb assembly is estimated to cover more than 99% of the R498 genome and is more continuous than the current reference genomes of japonica rice Nipponbare (MSU7) and Arabidopsis thaliana (TAIR10). We annotate high-quality protein-coding genes in R498 and identify genetic variations between R498 and Nipponbare and presence/absence variations by comparing them to 17 draft genomes in cultivated rice and its closest wild relatives. Our results demonstrate how to de novo assemble a highly contiguous and near-complete plant genome through an integrative strategy. The R498 genome will serve as a reference for the discovery of genes and structural variations in rice.

