Structural variation Archives - Page 29 of 31

July 7, 2019

Interchromosomal core duplicons drive both evolutionary instability and disease susceptibility of the Chromosome 8p23.1 region.

Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10⁻⁵). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease. © 2016 Mohajeri et al.; Published by Cold Spring Harbor Laboratory Press.

July 7, 2019

Towards integration of population and comparative genomics in forest trees.

The past decade saw the initiation of an ongoing revolution in sequencing technologies that is transforming all fields of biology. This has been driven by the advent and widespread availability of high-throughput, massively parallel short-read sequencing (MPS) platforms. These technologies have enabled previously unimaginable studies, including draft assemblies of the massive genomes of coniferous species and population-scale resequencing. Transcriptomics studies have likewise been transformed, with RNA-sequencing enabling studies in nonmodel organisms, the discovery of previously unannotated genes (novel transcripts), entirely new classes of RNAs and previously unknown regulatory mechanisms. Here we touch upon current developments in the areas of genome assembly, comparative regulomics and population genetics as they relate to studies of forest tree species. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

July 7, 2019

The evolution of orphan regions in genomes of a fungal pathogen of wheat.

Fungal plant pathogens rapidly evolve virulence on resistant hosts through mutations in genes encoding proteins that modulate the host immune responses. The mutational spectrum likely includes chromosomal rearrangements responsible for gains or losses of entire genes. However, the mechanisms creating adaptive structural variation in fungal pathogen populations are poorly understood. We used complete genome assemblies to quantify structural variants segregating in the highly polymorphic fungal wheat pathogen Zymoseptoria tritici. The genetic basis of virulence in Z. tritici is complex, and populations harbor significant genetic variation for virulence; hence, we aimed to identify whether structural variation led to functional differences. We combined single-molecule real-time sequencing, genetic maps, and transcriptomics data to generate a fully assembled and annotated genome of the highly virulent field isolate 3D7. Comparative genomics analyses against the complete reference genome IPO323 identified large chromosomal inversions and the complete gain or loss of transposable-element clusters, explaining the extensive chromosomal-length polymorphisms found in this species. Both the 3D7 and IPO323 genomes harbored long tracts of sequences exclusive to one of the two genomes. These orphan regions contained 296 genes unique to the 3D7 genome and not previously known for this species. These orphan genes tended to be organized in clusters and showed evidence of mutational decay. Moreover, the orphan genes were enriched in genes encoding putative effectors and included a gene that is one of the most upregulated putative effector genes during wheat infection. Our study showed that this pathogen species harbored extensive chromosomal structure polymorphism that may drive the evolution of virulence.

Pathogen outbreak populations often harbor previously unknown genes conferring virulence. Hence, a key puzzle of rapid pathogen evolution is the origin of such evolutionary novelty in genomes. Chromosomal rearrangements and structural variation in pathogen populations likely play a key role. However, identifying such polymorphism is challenging, as most genome-sequencing approaches only yield information about point mutations. We combined long-read technology and genetic maps to assemble the complete genome of a strain of a highly polymorphic fungal pathogen of wheat. Comparisons against the reference genome of the species showed substantial variation in the chromosome structure and revealed large regions unique to each assembled genome. These regions were enriched in genes encoding likely effector proteins, which are important components of pathogenicity. Our study showed that pathogen populations harbor extensive polymorphism at the chromosome level and that this polymorphism can be a source of adaptive genetic variation in pathogen evolution. Copyright © 2016 Plissonneau et al.

July 7, 2019

Deep sequencing of 10,000 human genomes.

We report on the sequencing of 10,545 human genomes at 30×-40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

July 7, 2019

Variant exported blood-stage proteins encoded by Plasmodium multigene families are expressed in liver stages where they are exported into the parasitophorous vacuole.

Many variant proteins encoded by Plasmodium-specific multigene families are exported into red blood cells (RBC). P. falciparum-specific variant proteins encoded by the var, stevor and rifin multigene families are exported onto the surface of infected red blood cells (iRBC) and mediate interactions between iRBC and host cells resulting in tissue sequestration and rosetting. However, the precise function of most other Plasmodium multigene families encoding exported proteins is unknown. To understand the role of RBC-exported proteins of rodent malaria parasites (RMP) we analysed the expression and cellular location by fluorescent-tagging of members of the pir, fam-a and fam-b multigene families. Furthermore, we performed phylogenetic analyses of the fam-a and fam-b multigene families, which indicate that both families have a history of functional differentiation unique to RMP. We demonstrate for all three families that expression of family members in iRBC is not mutually exclusive. Most tagged proteins were transported into the iRBC cytoplasm but not onto the iRBC plasma membrane, indicating that they are unlikely to play a direct role in iRBC-host cell interactions. Unexpectedly, most family members are also expressed during the liver stage, where they are transported into the parasitophorous vacuole. This suggests that these protein families promote parasite development in both the liver and blood, either by supporting parasite development within hepatocytes and erythrocytes and/or by manipulating the host immune response. Indeed, in the case of Fam-A, which have a steroidogenic acute regulatory-related lipid transfer (START) domain, we found that several family members can transfer phosphatidylcholine in vitro. These observations indicate that these proteins may transport (host) phosphatidylcholine for membrane synthesis. This is the first demonstration of a biological function of any exported variant protein family of rodent malaria parasites.

July 7, 2019

An ethnically relevant consensus Korean reference genome is a step towards personal reference genomes.

Human genomes are routinely compared against a universal reference. However, this strategy could miss population-specific and personal genomic variations, which may be detected more efficiently using an ethnically relevant or personal reference. Here we report a hybrid assembly of a Korean reference genome (KOREF) for constructing personal and ethnic references by combining sequencing and mapping methods. We also build its consensus variome reference, providing information on millions of variants from 40 additional ethnically homogeneous genomes from the Korean Personal Genome Project. We find that the ethnically relevant consensus reference can be beneficial for efficient variant detection. Systematic comparison of human assemblies shows the importance of assembly quality, suggesting the necessity of new technologies to comprehensively map ethnic and personal genomic structure variations. In the era of large-scale population genome projects, the leveraging of ethnicity-specific genome assemblies as well as the human reference genome will accelerate mapping all human genome diversity.

July 7, 2019

Genomic insights into Campylobacter jejuni virulence and population genetics

Campylobacter jejuni has long been recognized as a main food-borne pathogen in many parts of the world. Natural reservoirs include a wide variety of domestic and wild birds and mammals, whose intestines offer a suitable biological niche for the survival and dissemination of the organism. Understanding the genetic basis of the biology and pathogenicity of C. jejuni is vital to prevent and control Campylobacter-associated infections. The recent progress in sequencing techniques has allowed for a rapid increase in our knowledge of the molecular biology and the genetic structures of Campylobacter. Single-molecule real-time (SMRT) sequencing, which goes beyond four-base sequencing, revealed the role of DNA methylation in modulating the biology and virulence of C. jejuni at the level of epigenetics. In this review, we provide an up-to-date overview of recent advances in understanding C. jejuni genomics, including structural features of genomes, genetic traits of virulence, population genetics, and epigenetics.

July 7, 2019

Whole-genome de novo sequencing, combined with RNA-Seq analysis, reveals unique genome and physiological features of the amylolytic yeast Saccharomycopsis fibuligera and its interspecies hybrid.

Genomic studies on fungal species with hydrolytic activity have gained increased attention due to their great biotechnological potential for biomass-based biofuel production. The amylolytic yeast Saccharomycopsis fibuligera has served as a good source of enzymes and genes involved in saccharification. Despite its long history of use in food fermentation and bioethanol production, very little is known about the basic physiology and genomic features of S. fibuligera. We performed whole-genome (WG) de novo sequencing and complete assembly of S. fibuligera KJJ81 and KPH12, two isolates from wheat-based Nuruk in Korea. Intriguingly, the KJJ81 genome (~38 Mb) was revealed as a hybrid between the KPH12 genome (~18 Mb) and another unidentified genome sharing 88.1% nucleotide identity with the KPH12 genome. The seven chromosome pairs of KJJ81 subgenomes exhibit highly conserved synteny, indicating a very recent hybridization event. The phylogeny inferred from WG comparisons showed an early divergence of S. fibuligera before the separation of the CTG and Saccharomycetaceae clades in the subphylum Saccharomycotina. Reconstructed carbon and sulfur metabolic pathways, coupled with RNA-Seq analysis, suggested a marginal Crabtree effect under high glucose and activation of sulfur metabolism toward methionine biosynthesis under sulfur limitation in this yeast. Notably, the lack of sulfate assimilation genes in the S. fibuligera genome reflects a unique phenotype for Saccharomycopsis clades as natural sulfur auxotrophs. Extended gene families, including novel genes involved in saccharification and proteolysis, were identified. Moreover, comparative genome analysis of S. fibuligera ATCC 36309, an isolate from chalky rye bread in Germany, revealed that an interchromosomal translocation occurred in the KPH12 genome before the generation of the KJJ81 hybrid genome. The completely sequenced S. fibuligera genome with high-quality annotation and RNA-Seq analysis establishes an important foundation for functional inference of S. fibuligera in the degradation of fermentation mash. The gene inventory facilitates the discovery of new genes applicable to the production of novel valuable enzymes and chemicals. Moreover, as the first gapless genome assembly in the genus Saccharomycopsis including members with desirable traits for bioconversion, the unique genomic features of S. fibuligera and its hybrid will provide in-depth insights into fungal genome dynamics as evolutionary adaptation.

July 7, 2019

Chromosome assembly of large and complex genomes using multiple references

Despite the rapid development of sequencing technologies, assembly of mammalian-scale genomes into complete chromosomes remains one of the most challenging problems in bioinformatics. To help address this difficulty, we developed Ragout, a reference-assisted assembly tool that now works for large and complex genomes. Taking one or more target assemblies (generated from an NGS assembler) and one or multiple related reference genomes, Ragout infers the evolutionary relationships between the genomes and builds the final assemblies using a genome rearrangement approach. Using Ragout, we transformed NGS assemblies of 15 different Mus musculus and one Mus spretus genomes into sets of complete chromosomes, leaving less than 5% of sequence unlocalized per set. Various benchmarks, including PCR testing and realigning of long PacBio reads, suggest only a small number of structural errors in the final assemblies, comparable with direct assembly approaches. Additionally, we applied Ragout to Mus caroli and Mus pahari genomes, which exhibit karyotype-scale variations compared to other genomes from the Muridae family. Chromosome color maps confirmed most large-scale rearrangements that Ragout detected.

July 7, 2019

STR-realigner: a realignment method for short tandem repeat regions.

In the estimation of repeat numbers in a short tandem repeat (STR) region from high-throughput sequencing data, two types of strategies are mainly taken: a strategy based on counting repeat patterns included in sequence reads spanning the region and a strategy based on estimating the difference between the actual insert size and the insert size inferred from paired-end reads. The quality of sequence alignment is crucial, especially in the former approaches, although usual alignment methods have difficulty in STR regions due to insertions and deletions caused by the variations of repeat numbers. We proposed a new dynamic programming based realignment method named STR-realigner that considers repeat patterns in STR regions as prior knowledge. By allowing the size change of repeat patterns with low penalty in STR regions, accurate realignment is expected. For the performance evaluation, publicly available STR variant calling tools were applied to four types of aligned reads: synthetically generated sequencing reads aligned with BWA-MEM, those realigned with STR-realigner, those realigned with ReviSTER, and those realigned with GATK IndelRealigner. From the comparison of root mean squared errors between estimated and true STR region size, the results for the dataset realigned with STR-realigner are better than those for other cases. For real data analysis, we used a real sequencing dataset from Illumina HiSeq 2000 for a parent-offspring trio. RepeatSeq and lobSTR were applied to the sequence reads for these individuals aligned with BWA-MEM, those realigned with STR-realigner, ReviSTER, and GATK IndelRealigner. STR-realigner shows the best performance in terms of consistency of the size of estimated STR regions in Mendelian inheritance. Root mean squared error values were also calculated from the comparison of these estimated results with STR region sizes obtained from high coverage PacBio sequencing data, and the results from the realigned sequencing data with STR-realigner showed the least (the best) root mean squared error value. The effectiveness of the proposed realignment method for STR regions was verified from the comparison with an existing method on both simulation datasets and real whole genome sequencing dataset.
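
As a rough illustration of the read-counting strategy and of the evaluation metric described above, the sketch below counts back-to-back copies of a repeat motif in a read and computes the root mean squared error between estimated and true repeat counts. The function names (count_repeats, rmse) and the toy sequences are hypothetical and chosen only for illustration; this is not STR-realigner's code, and it assumes exact motif matches with no sequencing errors.

```python
import math


def count_repeats(read: str, motif: str) -> int:
    """Return the longest run of consecutive exact copies of `motif` in `read`."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    # Try every start position so that runs in any phase are found.
    for start in range(len(read)):
        run = 0
        pos = start
        while read.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


def rmse(estimated, truth) -> float:
    """Root mean squared error between two equally long sequences of numbers."""
    if len(estimated) != len(truth) or len(truth) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((e - t) ** 2 for e, t in zip(estimated, truth))
    return math.sqrt(squared / len(truth))


if __name__ == "__main__":
    # Toy read with four copies of the dinucleotide motif "CA".
    print(count_repeats("GGCACACACATT", "CA"))
    # Hypothetical estimated vs. true repeat counts at two loci.
    print(rmse([4, 5], [4, 7]))
```

A lower RMSE against a trusted truth set (simulated sizes or long-read calls, as in the abstract) indicates better size estimates; real callers must also handle mismatches and partial repeats, which this sketch ignores.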

July 7, 2019

SRinversion: a tool for detecting short inversions by splitting and re-aligning poorly mapped and unmapped sequencing reads.

Rapid development in sequencing technologies has dramatically improved our ability to detect genetic variants in the human genome. However, current methods have variable sensitivities in detecting different types of genetic variants. One type of such genetic variants that is especially hard to detect is inversions. Analysis of public databases showed that few short inversions have been reported so far. Unlike reads that contain small insertions or deletions, which will be considered through gap alignment, reads carrying short inversions often have poor mapping quality or are unmapped, thus are often not further considered. As a result, the majority of short inversions might have been overlooked and require special algorithms for their detection. Here, we introduce SRinversion, a framework to analyze poorly mapped or unmapped reads by splitting and re-aligning them for the purpose of inversion detection. SRinversion is very sensitive to small inversions and can detect those less than 10 bp in size. We applied SRinversion to both simulated data and high-coverage sequencing data from the 1000 Genomes Project and compared the results with those from Pindel, BreakDancer, DELLY, Gustaf and MID. A better performance of SRinversion was achieved for both datasets for the detection of small inversions. SRinversion is implemented in Perl and is publicly available at http://paed.hku.hk/genome/software/SRinversion/index.html. Contact: yangwl@hku.hk. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
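
The idea of rescuing a poorly mapped read by splitting it and re-aligning the pieces can be sketched in a few lines. The toy function below is an assumption-laden illustration, not SRinversion itself (which is written in Perl and relies on a real aligner): it uses brute-force exact string matching, and the name find_short_inversion and the min_len parameter are hypothetical. It tries every split of the read into left flank, middle, and right flank, reverse-complements the middle, and checks whether the repaired read then occurs in the reference.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_short_inversion(reference: str, read: str, min_len: int = 2):
    """Return (ref_start, ref_end) of an inverted segment, or None.

    Coordinates are 0-based, end-exclusive, on the reference. None is
    returned when the read already matches the reference exactly or when
    no single inversion explains it.
    """
    if read in reference:
        return None  # the read maps as is; nothing to rescue
    n = len(read)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            # Undo a putative inversion of read[i:j] and retry the match.
            candidate = read[:i] + reverse_complement(read[i:j]) + read[j:]
            pos = reference.find(candidate)
            if pos != -1:
                return (pos + i, pos + j)
    return None


if __name__ == "__main__":
    reference = "GATTACAGGCTCATTG"
    # Same sequence with reference[7:11] ("GGCT") inverted to "AGCC".
    read = "GATTACAAGCCCATTG"
    print(find_short_inversion(reference, read))
```

The exhaustive search is quadratic in read length and tolerates no mismatches, so it only conveys the split-and-realign principle on toy input.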

July 7, 2019

CoLoRMap: Correcting Long Reads by Mapping short reads.

Second generation sequencing technologies paved the way to an exceptional increase in the number of sequenced genomes, both prokaryotic and eukaryotic. However, short reads are difficult to assemble and often lead to highly fragmented assemblies. The recent developments in long reads sequencing methods offer a promising way to address this issue. However, so far long reads are characterized by a high error rate, and assembling from long reads requires a high depth of coverage. This motivates the development of hybrid approaches that leverage the high quality of short reads to correct errors in long reads. We introduce CoLoRMap, a hybrid method for correcting noisy long reads, such as the ones produced by PacBio sequencing technology, using high-quality Illumina paired-end reads mapped onto the long reads. Our algorithm is based on two novel ideas: using a classical shortest path algorithm to find a sequence of overlapping short reads that minimizes the edit score to a long read and extending corrected regions by local assembly of unmapped mates of mapped short reads. Our results on bacterial, fungal and insect data sets show that CoLoRMap compares well with existing hybrid correction methods. The source code of CoLoRMap is freely available for non-commercial use at https://github.com/sfu-compbio/colormap. Contact: ehaghshe@sfu.ca or cedric.chauve@sfu.ca. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
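
The first of the two ideas, a shortest path over overlapping short reads, can be illustrated with a small Dijkstra search. This is a minimal sketch under stated assumptions rather than CoLoRMap's implementation: each short-read mapping is reduced to a hypothetical (start, end, edit_distance) tuple on the long read with an end-exclusive interval, an edge links two mappings when the second overlaps and extends the first, and the path cost is simply the sum of per-read edit distances (so overlapping bases are counted in both reads). The name best_chain is invented for this example.

```python
import heapq
import math


def best_chain(mappings, region_start, region_end):
    """Find the cheapest chain of overlapping mappings covering a region.

    mappings: list of (start, end, edit_distance), end exclusive.
    Returns (total_edit_distance, [indices into mappings]) or None when
    no chain covers [region_start, region_end).
    """
    dist = {}
    prev = {}
    heap = []
    # Sources: mappings that cover the first base of the region.
    for idx, (start, end, edits) in enumerate(mappings):
        if start <= region_start < end:
            dist[idx] = edits
            prev[idx] = None
            heapq.heappush(heap, (edits, idx))

    while heap:
        cost, u = heapq.heappop(heap)
        if cost > dist.get(u, math.inf):
            continue  # stale heap entry
        u_start, u_end, _ = mappings[u]
        if u_end >= region_end:
            # First sink popped by Dijkstra is optimal (weights are >= 0).
            chain = []
            node = u
            while node is not None:
                chain.append(node)
                node = prev[node]
            return cost, chain[::-1]
        for v, (v_start, v_end, v_edits) in enumerate(mappings):
            # v must overlap u by at least one base and extend past it.
            if u_start < v_start < u_end < v_end:
                new_cost = cost + v_edits
                if new_cost < dist.get(v, math.inf):
                    dist[v] = new_cost
                    prev[v] = u
                    heapq.heappush(heap, (new_cost, v))
    return None


if __name__ == "__main__":
    toy = [(0, 50, 1), (40, 100, 5), (30, 80, 0), (70, 100, 1)]
    print(best_chain(toy, 0, 100))
```

On the toy input the search prefers the three-read chain with total edit distance 2 over the two-read chain with total 6, which mirrors the stated goal of minimizing the edit score to the long read.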

July 7, 2019

Epigenetic mechanisms in microbial members of the human microbiota: current knowledge and perspectives.

The human microbiota and epigenetic processes have both been shown to play a crucial role in health and disease. However, information on epigenetic modulation of microbiota members is extremely scarce, except for a few pathogens. DNA adenine methylation in particular has been described extensively as a modulator of virulence in pathogenic bacteria. It would thus appear likely that such mechanisms are widespread for most bacterial members of the microbiota. This review briefly presents the current knowledge on epigenetic processes in bacteria, gives examples of known methylation processes in microbial members of the human microbiota and summarizes the knowledge on regulation of host epigenetic processes by the human microbiota.

July 7, 2019

svclassify: a method to establish benchmark structural variant calls.

The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
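
The one-class idea, learning what ordinary non-SV regions look like and flagging candidates whose annotations deviate from them, can be sketched with a deliberately simple stand-in model. The code below is not svclassify's method: it swaps in a per-annotation z-score against a background of non-SV regions, and the two annotations used (read depth and soft-clipped read fraction), the function names, and all numbers are hypothetical.

```python
import statistics


def fit_background(rows):
    """Learn (mean, standard deviation) per annotation from non-SV regions.

    rows: list of equal-length tuples, one tuple of annotations per region.
    """
    model = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        if sd == 0:
            raise ValueError("annotation has no variance in the background")
        model.append((mean, sd))
    return model


def abnormality_score(model, candidate):
    """Largest absolute z-score of the candidate across all annotations."""
    return max(abs(x - mean) / sd for (mean, sd), x in zip(model, candidate))


if __name__ == "__main__":
    # Hypothetical (read depth, soft-clipped fraction) for random non-SV regions.
    background = [(30.0, 0.0), (32.0, 0.1), (28.0, 0.0), (30.0, 0.1)]
    model = fit_background(background)
    print(abnormality_score(model, (30.0, 0.05)))  # typical region
    print(abnormality_score(model, (15.0, 0.4)))   # candidate deletion
```

A candidate scoring far above the background, here driven by halved depth and many soft-clipped reads, would be treated as a likely SV, while a low score leaves the call questionable. A production model would use many more annotations, several technologies, and a more robust one-class learner.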

July 7, 2019

ReMILO: reference assisted misassembly detection algorithm using short and long reads.

Contigs assembled from the second generation sequencing short reads may contain misassemblies, and thus complicate downstream analysis or even lead to incorrect analysis results. Fortunately, with more and more sequenced species available, it becomes possible to use the reference genome of a closely related species to detect misassemblies. In addition, long reads of the third generation sequencing technology have been more and more widely used, and can also help detect misassemblies. Here, we introduce ReMILO, a reference assisted misassembly detection algorithm that uses both short reads and PacBio SMRT long reads. ReMILO aligns the initial short reads to both the contigs and reference genome, and then constructs a novel data structure called red-black multipositional de Bruijn graph to detect misassemblies. In addition, ReMILO also aligns the contigs to long reads and finds their differences from the long reads to detect more misassemblies. In our performance test on short read assemblies of human chromosome 14 data, ReMILO can detect 41.8-77.9% extensive misassemblies and 33.6-54.5% local misassemblies. On hybrid short and long read assemblies of S. pastorianus data, ReMILO can also detect 60.6-70.9% extensive misassemblies and 28.6-54.0% local misassemblies. The ReMILO software can be downloaded for free under Artistic License 2.0 from this site: https://github.com/songc001/remilo. Contact: baoe@bjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
