Menu
July 19, 2019

Genomic changes following the reversal of a Y chromosome to an autosome in Drosophila pseudoobscura

Robertsonian translocations resulting in fusions between sex chromosomes and autosomes shape karyotype evolution by creating new sex chromosomes from autosomes. These translocations can also reverse sex chromosomes back into autosomes, which is especially intriguing given the dramatic differences between autosomes and sex chromosomes. To study the genomic events following a Y chromosome reversal, we investigated an autosome-Y translocation in Drosophila pseudoobscura. The ancestral Y chromosome fused to a small autosome (the dot chromosome) approximately 10–15 Mya. We used single molecule real-time sequencing reads to assemble the D. pseudoobscura dot chromosome, including this Y-to-dot translocation. We find that the intervening sequence between the ancestral Y and the rest of the dot chromosome is only ~78 Kb and is not repeat-dense, suggesting that the centromere now falls outside, rather than between, the fused chromosomes. The Y-to-dot region is 100 times smaller than the D. melanogaster Y chromosome, owing to changes in repeat landscape. However, we do not find a consistent reduction in intron sizes across the Y-to-dot region. Instead, deletions in intergenic regions and possibly a small ancestral Y chromosome size may explain the compact size of the Y-to-dot translocation.


July 19, 2019

AgIn: Measuring the landscape of CpG methylation of individual repetitive elements.

Determining the methylation state of regions with high copy numbers is challenging for second-generation sequencing, because the read length is insufficient to map reads uniquely, especially when repetitive regions are long and nearly identical to each other. Single-molecule real-time (SMRT) sequencing is a promising method for observing such regions, because it is not vulnerable to GC bias, it produces long read lengths, and its kinetic information is sensitive to DNA modifications.We propose a novel linear-time algorithm that combines the kinetic information for neighboring CpG sites and increases the confidence in identifying the methylation states of those sites. Using a practical read coverage of ~30-fold from an inbred strain medaka (Oryzias latipes), we observed that both the sensitivity and precision of our method on individual CpG sites were ~93.7%. We also observed a high correlation coefficient (R?=?0.884) between our method and bisulfite sequencing, and for 92.0% of CpG sites, methylation levels ranging over [0, 1] were in concordance within an acceptable difference 0.25. Using this method, we characterized the landscape of the methylation status of repetitive elements, such as LINEs, in the human genome, thereby revealing the strong correlation between CpG density and hypomethylation and detecting hypomethylation hot spots of LTRs and LINEs. We uncovered the methylation states for nearly identical active transposons, two novel LINE insertions of identity ~99% and length 6050 base pairs (bp) in the human genome, and 16 Tol2 elements of identity >99.8% and length 4682?bp in the medaka genome.AgIn (Aggregate on Intervals) is available at: https://github.com/hacone/AgIn CONTACT: ysuzuki@cb.k.u-tokyo.ac.jp, moris@cb.k.u-tokyo.ac.jp SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2016. Published by Oxford University Press.


July 19, 2019

High-quality assembly of an individual of Yoruban descent

De novo assembly of human genomes is now a tractable effort due in part to advances in sequencing and mapping technologies. We use PacBio single-molecule, real-time (SMRT) sequencing and BioNano genomic maps to construct the first de novo assembly of NA19240, a Yoruban individual from Africa. This chromosome-scaffolded assembly of 3.08 Gb with a contig N50 of 7.25 Mb and a scaffold N50 of 78.6 Mb represents one of the most contiguous high-quality human genomes. We utilize a BAC library derived from NA19240 DNA and novel haplotype-resolving sequencing technologies and algorithms to characterize regions of complex genomic architecture that are normally lost due to compression to a linear haploid assembly. Our results demonstrate that multiple technologies are still necessary for complete genomic representation, particularly in regions of highly identical segmental duplications. Additionally, we show that diploid assembly has utility in improving the quality of de novo human genome assemblies.


July 19, 2019

Towards precision medicine.

There is great potential for genome sequencing to enhance patient care through improved diagnostic sensitivity and more precise therapeutic targeting. To maximize this potential, genomics strategies that have been developed for genetic discovery – including DNA-sequencing technologies and analysis algorithms – need to be adapted to fit clinical needs. This will require the optimization of alignment algorithms, attention to quality-coverage metrics, tailored solutions for paralogous or low-complexity areas of the genome, and the adoption of consensus standards for variant calling and interpretation. Global sharing of this more accurate genotypic and phenotypic data will accelerate the determination of causality for novel genes or variants. Thus, a deeper understanding of disease will be realized that will allow its targeting with much greater therapeutic precision.


July 19, 2019

De novo assembly and phasing of a Korean human genome.

Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9?Mb and a scaffold N50 size of 44.8?Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03?Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6?Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatability complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.


July 19, 2019

Rapid functional and sequence differentiation of a tandemly repeated species-specific multigene family in Drosophila.

Gene clusters of recently duplicated genes are hotbeds for evolutionary change. However, our understanding of how mutational mechanisms and evolutionary forces shape the structural and functional evolution of these clusters is hindered by the high sequence identity among the copies, which typically results in their inaccurate representation in genome assemblies. The presumed testis-specific, chimeric gene Sdic originated, and tandemly expanded in Drosophila melanogaster, contributing to increased male-male competition. Using various types of massively parallel sequencing data, we studied the organization, sequence evolution, and functional attributes of the different Sdic copies. By leveraging long-read sequencing data, we uncovered both copy number and order differences from the currently accepted annotation for the Sdic region. Despite evidence for pervasive gene conversion affecting the Sdic copies, we also detected signatures of two episodes of diversifying selection, which have contributed to the evolution of a variety of C-termini and miRNA binding site compositions. Expression analyses involving RNA-seq datasets from 59 different biological conditions revealed distinctive expression breadths among the copies, with three copies being transcribed in females, opening the possibility to a sexually antagonistic effect. Phenotypic assays using Sdic knock-out strains indicated that should this antagonistic effect exist, it does not compromise female fertility. Our results strongly suggest that the genome consolidation of the Sdic gene cluster is more the result of a quick exploration of different paths of molecular tinkering by different copies than a mere dosage increase, which could be a recurrent evolutionary outcome in the presence of persistent sexual selection. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.


July 19, 2019

Comprehensive genome analysis of carbapenemase-producing Enterobacter spp.: new insights into phylogeny, population structure and resistance mechanisms.

Knowledge regarding the genomic structure of Enterobacter spp., the second most prevalent carbapenemase-producing Enterobacteriaceae, remains limited. Here we sequenced 97 clinical Enterobacter species isolates that were both carbapenem susceptible and resistant from various geographic regions to decipher the molecular origins of carbapenem resistance and to understand the changing phylogeny of these emerging and drug-resistant pathogens. Of the carbapenem-resistant isolates, 30 possessed blaKPC-2, 40 had blaKPC-3, 2 had blaKPC-4, and 2 had blaNDM-1 Twenty-three isolates were carbapenem susceptible. Six genomes were sequenced to completion, and their sizes ranged from 4.6 to 5.1 Mbp. Phylogenomic analysis placed 96 of these genomes, 351 additional Enterobacter genomes downloaded from NCBI GenBank, and six newly sequenced type strains into 19 phylogenomic groups-18 groups (A to R) in the Enterobacter cloacae complex and Enterobacter aerogenes Diverse mechanisms underlying the molecular evolutionary trajectory of these drug-resistant Enterobacter spp. were revealed, including the acquisition of an antibiotic resistance plasmid, followed by clonal spread, horizontal transfer of blaKPC-harboring plasmids between different phylogenomic groups, and repeated transposition of the blaKPC gene among different plasmid backbones. Group A, which comprises multilocus sequence type 171 (ST171), was the most commonly identified (23% of isolates). Genomic analysis showed that ST171 isolates evolved from a common ancestor and formed two different major clusters; each acquiring unique blaKPC-harboring plasmids, followed by clonal expansion. The data presented here represent the first comprehensive study of phylogenomic interrogation and the relationship between antibiotic resistance and plasmid discrimination among carbapenem-resistant Enterobacter spp., demonstrating the genetic diversity and complexity of the molecular mechanisms driving antibiotic resistance in this genus.Enterobacter spp., especially carbapenemase-producing Enterobacter spp., have emerged as a clinically significant cause of nosocomial infections. However, only limited information is available on the distribution of carbapenem resistance across this genus. Augmenting this problem is an erroneous identification of Enterobacter strains because of ambiguous typing methods and imprecise taxonomy. In this study, we used a whole-genome-based comparative phylogenetic approach to (i) revisit and redefine the genus Enterobacter and (ii) unravel the emergence and evolution of the Klebsiella pneumoniae carbapenemase-harboring Enterobacter spp. Using genomic analysis of 447 sequenced strains, we developed an improved understanding of the species designations within this complex genus and identified the diverse mechanisms driving the molecular evolution of carbapenem resistance. The findings in this study provide a solid genomic framework that will serve as an important resource in the future development of molecular diagnostics and in supporting drug discovery programs. Copyright © 2016 Chavda et al.


July 19, 2019

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation.

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either PacBio or Oxford Nanopore technologies, and achieves a contig NG50 of greater than 21 Mbp on both human and Drosophila melanogaster PacBio datasets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes. Published by Cold Spring Harbor Laboratory Press.


July 19, 2019

Genomic structure of the horse major histocompatibility complex class II region resolved using PacBio long-read sequencing technology.

The mammalian Major Histocompatibility Complex (MHC) region contains several gene families characterized by highly polymorphic loci with extensive nucleotide diversity, copy number variation of paralogous genes, and long repetitive sequences. This structural complexity has made it difficult to construct a reliable reference sequence of the horse MHC region. In this study, we used long-read single molecule, real-time (SMRT) sequencing technology from Pacific Biosciences (PacBio) to sequence eight Bacterial Artificial Chromosome (BAC) clones spanning the horse MHC class II region. The final assembly resulted in a 1,165,328?bp continuous gap free sequence with 35 manually curated genomic loci of which 23 were considered to be functional and 12 to be pseudogenes. In comparison to the MHC class II region in other mammals, the corresponding region in horse shows extraordinary copy number variation and different relative location and directionality of the Eqca-DRB, -DQA, -DQB and -DOB loci. This is the first long-read sequence assembly of the horse MHC class II region with rigorous manual gene annotation, and it will serve as an important resource for association studies of immune-mediated equine diseases and for evolutionary analysis of genetic diversity in this region.


July 19, 2019

Comprehensive bioinformatics analysis of Mycoplasma pneumoniae genomes to investigate underlying population structure and type-specific determinants.

Mycoplasma pneumoniae is a significant cause of respiratory illness worldwide. Despite a minimal and highly conserved genome, genetic diversity within the species may impact disease. We performed whole genome sequencing (WGS) analysis of 107 M. pneumoniae isolates, including 67 newly sequenced using the Pacific BioSciences RS II and/or Illumina MiSeq sequencing platforms. Comparative genomic analysis of 107 genomes revealed >3,000 single nucleotide polymorphisms (SNPs) in total, including 520 type-specific SNPs. Population structure analysis supported the existence of six distinct subgroups, three within each type. We developed a predictive model to classify an isolate based on whole genome SNPs called against the reference genome into the identified subtypes, obviating the need for genome assembly. This study is the most comprehensive WGS analysis for M. pneumoniae to date, underscoring the power of combining complementary sequencing technologies to overcome difficult-to-sequence regions and highlighting potential differential genomic signatures in M. pneumoniae.


July 19, 2019

Revealing complete complex KIR haplotypes phased by long-read sequencing technology

The killer cell immunoglobulin-like receptor (KIR) region of human chromosome 19 contains up to 16 genes for natural killer (NK) cell receptors that recognize human leukocyte antigen (HLA)/peptide complexes and other ligands. The KIR proteins fulfill functional roles in infections, pregnancy, autoimmune diseases and transplantation. However, their characterization remains a constant challenge. Not only are the genes highly homologous due to their recent evolution by tandem duplications, but the region is structurally dynamic due to frequent transposon-mediated recombination. A sequencing approach that precisely captures the complexity of KIR haplotypes for functional annotation is desirable. We present a unique approach to haplotype the KIR loci using single-molecule, real-time (SMRT) sequencing. Using this method, we have—for the first time—comprehensively sequenced and phased sixteen KIR haplotypes from eight individuals without imputation. The information revealed four novel haplotype structures, a novel gene-fusion allele, novel and confirmed insertion/deletion events, a homozygous individual, and overall diversity for the structural haplotypes and their alleles. These KIR haplotypes augment our existing knowledge by providing high-quality references, evolutionary informers, and source material for imputation. The haplotype sequences and gene annotations provide alternative loci for the KIR region in the human genome reference GrCh38.p8.


July 19, 2019

The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution.

The domesticated sunflower, Helianthus annuus L., is a global oil crop that has promise for climate change adaptation, because it can maintain stable yields across a wide variety of environmental conditions, including drought. Even greater resilience is achievable through the mining of resistance alleles from compatible wild sunflower relatives, including numerous extremophile species. Here we report a high-quality reference for the sunflower genome (3.6 gigabases), together with extensive transcriptomic data from vegetative and floral organs. The genome mostly consists of highly similar, related sequences and required single-molecule real-time sequencing technologies for successful assembly. Genome analyses enabled the reconstruction of the evolutionary history of the Asterids, further establishing the existence of a whole-genome triplication at the base of the Asterids II clade and a sunflower-specific whole-genome duplication around 29 million years ago. An integrative approach combining quantitative genetics, expression and diversity data permitted development of comprehensive gene networks for two major breeding traits, flowering time and oil metabolism, and revealed new candidate genes in these networks. We found that the genomic architecture of flowering time has been shaped by the most recent whole-genome duplication, which suggests that ancient paralogues can remain in the same regulatory networks for dozens of millions of years. This genome represents a cornerstone for future research programs aiming to exploit genetic diversity to improve biotic and abiotic stress resistance and oil production, while also considering agricultural constraints and human nutritional needs.


July 19, 2019

Contrasting evolutionary genome dynamics between domesticated and wild yeasts.

Structural rearrangements have long been recognized as an important source of genetic variation, with implications in phenotypic diversity and disease, yet their detailed evolutionary dynamics remain elusive. Here we use long-read sequencing to generate end-to-end genome assemblies for 12 strains representing major subpopulations of the partially domesticated yeast Saccharomyces cerevisiae and its wild relative Saccharomyces paradoxus. These population-level high-quality genomes with comprehensive annotation enable precise definition of chromosomal boundaries between cores and subtelomeres and a high-resolution view of evolutionary genome dynamics. In chromosomal cores, S. paradoxus shows faster accumulation of balanced rearrangements (inversions, reciprocal translocations and transpositions), whereas S. cerevisiae accumulates unbalanced rearrangements (novel insertions, deletions and duplications) more rapidly. In subtelomeres, both species show extensive interchromosomal reshuffling, with a higher tempo in S. cerevisiae. Such striking contrasts between wild and domesticated yeasts are likely to reflect the influence of human activities on structural genome evolution.


July 19, 2019

Increased risk of low birth weight in women with placental malaria associated with P. falciparum VAR2CSA clade.

Pregnancy associated malaria (PAM) causes adverse pregnancy and birth outcomes owing to Plasmodium falciparum accumulation in the placenta. Placental accumulation is mediated by P. falciparum protein VAR2CSA, a leading PAM-specific vaccine target. The extent of its antigen diversity and impact on clinical outcomes remain poorly understood. Through amplicon deep-sequencing placental malaria samples from women in Malawi and Benin, we assessed sequence diversity of VAR2CSA’s ID1-DBL2x region, containing putative vaccine targets and estimated associations of specific clades with adverse birth outcomes. Overall, var2csa diversity was high and haplotypes subdivided into five clades, the largest two defined by homology to parasites strains, 3D7 or FCR3. Across both cohorts, compared to women infected with only FCR3-like variants, women infected with only 3D7-like variants delivered infants with lower birthweight (difference: -267.99?g; 95% Confidence Interval [CI]: -466.43?g,-69.55?g) and higher odds of low birthweight (<2500?g) (Odds Ratio [OR] 5.41; 95% CI:0.99,29.52) and small-for-gestational-age (OR: 3.65; 95% CI: 1.01,13.38). In two distinct malaria-endemic African settings, parasites harboring 3D7-like variants of VAR2CSA were associated with worse birth outcomes, supporting differential effects of infection with specific parasite strains. The immense diversity coupled with differential clinical effects of this diversity suggest that an effective VAR2CSA-based vaccine may require multivalent activity.


July 19, 2019

Pacific Biosciences sequencing and IMGT/HighV-QUEST analysis of full-length single chain fragment variable from an in vivo selected phage-display combinatorial Library.

Phage-display selection of immunoglobulin (IG) or antibody single chain Fragment variable (scFv) from combinatorial libraries is widely used for identifying new antibodies for novel targets. Next-generation sequencing (NGS) has recently emerged as a new method for the high throughput characterization of IG and T cell receptor (TR) immune repertoires bothin vivoandin vitro. However, challenges remain for the NGS sequencing of scFv from combinatorial libraries owing to the scFv length (>800?bp) and the presence of two variable domains [variable heavy (VH) and variable light (VL) for IG] associated by a peptide linker in a single chain. Here, we show that single-molecule real-time (SMRT) sequencing with the Pacific Biosciences RS II platform allows for the generation of full-length scFv reads obtained from anin vivoselection of scFv-phages in an animal model of atherosclerosis. We first amplified the DNA of the phagemid inserts from scFv-phages eluted from an aortic section at the third round of thein vivoselection. From this amplified DNA, 450,558 reads were obtained from 15 SMRT cells. Highly accurate circular consensus sequences from these reads were generated, filtered by quality and then analyzed by IMGT/HighV-QUEST with the functionality for scFv. Full-length scFv were identified and characterized in 348,659 reads. Full-length scFv sequencing is an absolute requirement for analyzing the associated VH and VL domains enriched during thein vivopanning rounds. In order to further validate the ability of SMRT sequencing to provide high quality, full-length scFv sequences, we tracked the reads of an scFv-phage clone P3 previously identified by biological assays and Sanger sequencing. Sixty P3 reads showed 100% identity with the full-length scFv of 767?bp, 53 of them covering the whole insert of 977?bp, which encompassed the primer sequences. The remaining seven reads were identical over a shortened length of 939?bp that excludes the vicinity of primers at both ends. Interestingly these reads were obtained from each of the 15 SMRT cells. Thus, the SMRT sequencing method and the IMGT/HighV-QUEST functionality for scFv provides a straightforward protocol for characterization of full-length scFv from combinatorial phage libraries.


Talk with an expert

If you have a question, need to check the status of an order, or are interested in purchasing an instrument, we're here to help.