Human genome Archives - Page 47 of 59

July 7, 2019

Trypanosoma cruzi specific mRNA amplification by in vitro transcription improves parasite transcriptomics in host-parasite RNA mixtures.

Trypanosomatids are a group of protozoan parasites that includes the etiologic agents of important human illnesses such as Chagas disease, sleeping sickness and leishmaniasis. These parasites differ significantly from other eukaryotes in mRNA structure, since all mature mRNAs have an identical species-specific sequence of 39 nucleotides at the 5′ extremity, named spliced leader (SL). Considering this peculiar aspect of trypanosomatid mRNA, the aim of the present work was to develop a Trypanosoma cruzi specific in vitro transcription (IVT) linear mRNA amplification method in order to improve parasite transcriptomics analyses. We designed an oligonucleotide complementary to the last 21 bases of T. cruzi SL sequence, bearing an upstream T7 promoter (T7SL primer), which was used to direct the synthesis of second-strand cDNA. Original mRNA was then amplified by IVT using T7 RNA polymerase. T7SL-amplified RNA from two distinct T. cruzi stages (epimastigotes and trypomastigotes) was deep sequenced on the SOLiD platform. Conventional poly(A)+ RNA and T7-oligo(dT) amplified RNA (Eberwine method) were also sequenced. RNA-Seq reads were aligned to our new and improved T. cruzi Dm28c genome assembly (PacBio technology) and the resulting transcriptome patterns from these three RNA preparation methods were compared, mainly concerning the conservation of mRNA transcriptional levels and the detection of differentially expressed genes (DEGs) between epimastigotes and trypomastigotes. The T7SL IVT method detected more potential differentially expressed genes in comparison to either poly(A)+ RNA or T7dT IVT, and was also able to produce reliable quantifications of the parasite transcriptome down to 3 ng of total RNA.
Furthermore, amplification of parasite mRNA in HeLa/epimastigote RNA mixtures showed that T7SL IVT generates transcriptome quantification with similar detection of differentially expressed genes when parasite RNA mass was only 0.1% of the total mixture (R = 0.78 when compared to poly(A)+ RNA). The T7SL IVT amplification method presented here allows the detection of more potential parasite differentially expressed genes (in comparison to poly(A)+ RNA) in host-parasite mixtures or samples with low amounts of RNA. This method is especially useful for trypanosomatid transcriptomics because it produces less bias than PCR-based mRNA amplification. Additionally, by simply changing the complementary region of the T7SL primer, the present method can be applied to any trypanosomatid species.

July 7, 2019

GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly.

The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis. © 2017 Cameron et al.; Published by Cold Spring Harbor Laboratory Press.
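
As a rough illustration of what "positional" means for a de Bruijn graph, the Python sketch below labels every k-mer with the position at which it is expected to start, so identical k-mers from different loci remain separate nodes. It is a toy under simplifying assumptions (one hypothetical anchor position per read, no sequencing errors, no scoring) and not GRIDSS's actual implementation; `build_positional_dbg` and its input format are invented for this example.

```python
from collections import defaultdict


def build_positional_dbg(reads, k):
    """Build a toy positional de Bruijn graph.

    reads: list of (anchor_position, sequence) pairs, where the anchor is a
           hypothetical expected start position of the read.
    Returns a dict mapping each (kmer, position) node to its successor nodes.
    """
    graph = defaultdict(set)
    for anchor, seq in reads:
        # Each consecutive pair of k-mers in a read contributes one edge.
        for i in range(len(seq) - k):
            node = (seq[i:i + k], anchor + i)
            successor = (seq[i + 1:i + k + 1], anchor + i + 1)
            graph[node].add(successor)
    return dict(graph)


# Two reads that share the k-mer "ACGT" but come from different positions.
reads = [(100, "ACGTAC"), (500, "ACGTTT")]
graph = build_positional_dbg(reads, k=4)
```

In an ordinary de Bruijn graph the two occurrences of "ACGT" would collapse into a single branching node; here they stay apart because their positions differ.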

July 7, 2019

Dense and accurate whole-chromosome haplotyping of individual genomes.

The diploid nature of the human genome is neglected in many analyses done today, where a genome is perceived as a set of unphased variants with respect to a reference genome. This lack of haplotype-level analyses can be explained by a lack of methods that can produce dense and accurate chromosome-length haplotypes at reasonable costs. Here we introduce an integrative phasing strategy that combines global, but sparse haplotypes obtained from strand-specific single-cell sequencing (Strand-seq) with dense, yet local, haplotype information available through long-read or linked-read sequencing. We provide comprehensive guidance on the required sequencing depths and reliably assign more than 95% of alleles (NA12878) to their parental haplotypes using as few as 10 Strand-seq libraries in combination with 10-fold coverage PacBio data or, alternatively, 10X Genomics linked-read sequencing data. We conclude that the combination of Strand-seq with different technologies represents an attractive solution to chart the genetic variation of diploid genomes.
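
The core idea of combining sparse global haplotypes with dense local ones can be sketched in a few lines of Python. The example assumes that local phase blocks are internally consistent but arbitrarily oriented, and uses the sparse global calls to decide whether each block should be flipped. This is a simplified majority-vote illustration, not the authors' phasing algorithm; the function name and data layout are hypothetical.

```python
def orient_blocks(local_blocks, global_sparse):
    """Orient dense local phase blocks using sparse global haplotype calls.

    local_blocks:  list of dicts {variant_position: haplotype (0 or 1)};
                   each block is internally consistent, but its overall
                   orientation is arbitrary.
    global_sparse: dict {variant_position: haplotype (0 or 1)} with
                   chromosome-wide but sparse phase information.
    Returns one dict with chromosome-wide phase for every oriented block.
    """
    phased = {}
    for block in local_blocks:
        shared = [pos for pos in block if pos in global_sparse]
        agree = sum(block[pos] == global_sparse[pos] for pos in shared)
        disagree = len(shared) - agree
        if agree == disagree:
            continue  # no shared variants or a tie: leave the block unplaced
        flip = disagree > agree
        for pos, hap in block.items():
            phased[pos] = 1 - hap if flip else hap
    return phased


local_blocks = [
    {100: 0, 150: 1, 200: 0},  # already matches the global orientation
    {900: 1, 950: 1, 990: 0},  # inverted relative to the global calls
]
global_sparse = {100: 0, 950: 0}
phased = orient_blocks(local_blocks, global_sparse)
```

A single shared variant per block is enough to orient it in this toy; with real data, more shared variants make the vote more robust to errors.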

July 7, 2019

Variant review with the Integrative Genomics Viewer.

Manual review of aligned reads for confirmation and interpretation of variant calls is an important step in many variant calling pipelines for next-generation sequencing (NGS) data. Visual inspection can greatly increase the confidence in calls, reduce the risk of false positives, and help characterize complex events. The Integrative Genomics Viewer (IGV) was one of the first tools to provide NGS data visualization, and it currently provides a rich set of tools for inspection, validation, and interpretation of NGS datasets, as well as other types of genomic data. Here, we present a short overview of IGV’s variant review features for both single-nucleotide variants and structural variants, with examples from both cancer and germline datasets. IGV is freely available at https://www.igv.org. Cancer Res; 77(21); e31-34. ©2017 American Association for Cancer Research.

July 7, 2019

Tools for annotation and comparison of structural variation.

The impact of structural variants (SVs) on a variety of organisms and diseases like cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, just a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs that account for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as those from the 1000 Genomes Project. As proof of concept we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.
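
Comparing SVs to genomic features comes down to interval overlap with some tolerance around the breakpoints. The Python sketch below shows that operation with a plain linear scan. It illustrates the general technique only and makes no claim about how SURVIVOR_ant is implemented; the function name, tuple layout and `max_dist` parameter are assumptions made for the example.

```python
def annotate_svs(svs, features, max_dist=0):
    """Attach overlapping or nearby feature names to each SV.

    svs:      list of (chrom, start, end, sv_type)
    features: list of (chrom, start, end, name)
    max_dist: tolerance in bp added on both sides of each SV
    Returns a list of (chrom, start, end, sv_type, [feature names]).
    """
    annotated = []
    for chrom, start, end, sv_type in svs:
        hits = [
            name
            for f_chrom, f_start, f_end, name in features
            if f_chrom == chrom
            and f_start <= end + max_dist
            and f_end >= start - max_dist
        ]
        annotated.append((chrom, start, end, sv_type, hits))
    return annotated


svs = [("chr1", 1000, 1500, "DEL"), ("chr2", 200, 260, "INS")]
features = [
    ("chr1", 1400, 2000, "geneA"),
    ("chr1", 5000, 6000, "geneB"),
    ("chr2", 300, 400, "repeatX"),
]
result = annotate_svs(svs, features, max_dist=50)
```

A linear scan costs time proportional to the number of SVs times the number of features; for large callsets an interval index would be the usual choice.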

July 7, 2019

Interrogating the “unsequenceable” genomic trinucleotide repeat disorders by long-read sequencing.

Microsatellite expansion, such as trinucleotide repeat expansion (TRE), is known to cause a number of genetic diseases. Sanger sequencing and next-generation short-read sequencing are unable to interrogate TRE reliably. We developed a novel algorithm called RepeatHMM to estimate repeat counts from long-read sequencing data. Evaluation on simulation data, real amplicon sequencing data on two repeat expansion disorders, and whole-genome sequencing data generated by PacBio and Oxford Nanopore technologies showed superior performance over competing approaches. We concluded that long-read sequencing coupled with RepeatHMM can estimate repeat counts on microsatellites and can interrogate the “unsequenceable” genomic trinucleotide repeat disorders.
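
For context on what "estimating repeat counts" involves, here is a deliberately naive Python baseline that takes the longest uninterrupted run of a repeat unit in each read and reports the median across reads. It is not RepeatHMM and does not use a hidden Markov model; the function names are hypothetical. Because it requires exact matches, a single sequencing error splits a run and lowers the count for that read, which suggests why an error-tolerant model is attractive for long-read data.

```python
import re
from statistics import median


def longest_repeat_run(read, unit):
    """Return the largest number of consecutive exact copies of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(read)]
    return max(runs, default=0)


def estimate_repeat_count(reads, unit):
    """Naive estimate: median of the per-read longest exact runs."""
    return median(longest_repeat_run(read, unit) for read in reads)


reads = [
    "TT" + "CAG" * 10 + "GG",
    "A" + "CAG" * 10 + "T",
    "CAG" * 4 + "CTG" + "CAG" * 5,  # one substitution splits the run
]
estimate = estimate_repeat_count(reads, "CAG")
```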

July 7, 2019

Hunting structural variants: Population by population

Until recently, most population-scale genome sequencing studies have focused on identifying single nucleotide variants (SNVs) to explore genetic differences between individuals. Like so many SNV-based genome-wide association studies, however, these efforts have had difficulty identifying causative genetic mechanisms underlying most complex functions. More and more, the genomics community has realised that structural variation is likely responsible for many of the traits and phenotypes that scientists have not been able to attribute to SNVs. This class of variants, defined as genetic differences of 50 bp or larger, accounts for most of the DNA sequence differences between any two people. Structural variants (SVs) are also already known to cause many common and rare diseases including ALS, schizophrenia, leukemia, Carney complex, and Huntington’s disease. Despite the importance of SVs, these larger variants have been understudied and underreported compared to their single-nucleotide counterparts. One reason is that they remain difficult to detect. Their length often means they cannot be fully spanned using short sequencing reads. They also often occur in highly repetitive or GC-rich regions of the genome, making them challenging targets. As such, this class of human genetic variation has remained vastly under-explored in global populations and is now ripe for discovery.

July 7, 2019

Hybrid de novo genome assembly and centromere characterization of the gray mouse lemur (Microcebus murinus).

The de novo assembly of repeat-rich mammalian genomes using only high-throughput short read sequencing data typically results in highly fragmented genome assemblies that limit downstream applications. Here, we present an iterative approach to hybrid de novo genome assembly that incorporates datasets stemming from multiple genomic technologies and methods. We used this approach to improve the gray mouse lemur (Microcebus murinus) genome from early draft status to a near chromosome-scale assembly. We used a combination of advanced genomic technologies to iteratively resolve conflicts and super-scaffold the M. murinus genome. We improved the M. murinus genome assembly to a scaffold N50 of 93.32 Mb. Whole genome alignments between our primary super-scaffolds and 23 human chromosomes revealed patterns that are congruent with historical comparative cytogenetic data, thus demonstrating the accuracy of our de novo scaffolding approach and allowing assignment of scaffolds to M. murinus chromosomes. Moreover, we utilized our independent datasets to discover and characterize sequences associated with centromeres across the mouse lemur genome. Quality assessment of the final assembly found 96% of mouse lemur canonical transcripts nearly complete, comparable to other published high-quality reference genome assemblies. We describe a new assembly of the gray mouse lemur (Microcebus murinus) genome with chromosome-scale scaffolds produced using a hybrid bioinformatic and sequencing approach. The approach is cost effective and produces superior results based on metrics of contiguity and completeness. Our results show that emerging genomic technologies can be used in combination to characterize centromeres of non-model species and to produce accurate de novo chromosome-scale genome assemblies of complex mammalian genomes.
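
The scaffold N50 quoted above is a standard contiguity metric: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch of the calculation follows; the scaffold lengths are made up for illustration and are not taken from the lemur assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical scaffold lengths in Mb.
scaffolds = [80, 70, 50, 40, 30, 20]
value = n50(scaffolds)
```

Here the total is 290 Mb, and the two longest scaffolds (80 + 70 = 150 Mb) already cover more than half, so the N50 is 70 Mb.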

July 7, 2019

Detection of complex structural variation from paired-end sequencing data

Detecting structural variants (SVs) from sequencing data is a key problem in genome analysis, but the full diversity of SVs is not captured by most methods. We introduce the Automated Reconstruction of Complex Structural Variants (ARC-SV) method, which detects a broad class of structural variants from paired-end whole genome sequencing (WGS) data. Analysis of samples from NA12878 and HuRef suggests that complex SVs are often misclassified by traditional methods. We validated our results both experimentally and by comparison to whole genome assembly and PacBio data; ARC-SV compares favorably to existing algorithms in general and gives state-of-the-art results on complex SV detection. By expanding the range of detectable SVs compared to commonly-used algorithms, ARC-SV allows additional information to be extracted from existing WGS data.

July 7, 2019

Copy number variation probes inform diverse applications

A major contributor to inter-individual genomic variability is copy number variation (CNV). CNVs change the diploid status of the DNA, involve one or multiple genes, and may disrupt coding regions, affect regulatory elements, or change gene dosage. While some of these changes may have no phenotypic consequences, others underlie disease, explain evolutionary processes, or impact the response to medication.

July 7, 2019

Highly accurate fluorogenic DNA sequencing with information theory-based error correction.

Eliminating errors in next-generation DNA sequencing has proved challenging. Here we present error-correction code (ECC) sequencing, a method to greatly improve sequencing accuracy by combining fluorogenic sequencing-by-synthesis (SBS) with an information theory-based error-correction algorithm. ECC embeds redundancy in sequencing reads by creating three orthogonal degenerate sequences, generated by alternate dual-base reactions. This is similar to encoding and decoding strategies that have proved effective in detecting and correcting errors in information communication and storage. We show that, when combined with a fluorogenic SBS chemistry with raw accuracy of 98.1%, ECC sequencing provides single-end, error-free sequences up to 200 bp. ECC approaches should enable accurate identification of extremely rare genomic variations in various applications in biology and medicine.
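
The redundancy behind three degenerate encodings can be shown with a small Python example. It assumes the three dual-base groupings are M/K (A,C vs G,T), R/Y (A,G vs C,T) and W/S (A,T vs C,G), and treats each encoding as one bit per base. Any two bits then identify the base, and the third acts as a parity check. This per-base view is a simplification for illustration: it ignores the fact that real dual-base reads record runs of degenerate bases, and it is not the authors' decoding algorithm.

```python
# One bit per base for each assumed dual-base grouping.
ENCODINGS = {
    "MK": {"A": 0, "C": 0, "G": 1, "T": 1},
    "RY": {"A": 0, "G": 0, "C": 1, "T": 1},
    "WS": {"A": 0, "T": 0, "C": 1, "G": 1},
}

# Any two encodings identify the base; here MK and RY are used.
BASE_FROM_BITS = {(ENCODINGS["MK"][b], ENCODINGS["RY"][b]): b for b in "ACGT"}


def encode(seq):
    """Encode a DNA sequence into three degenerate bit sequences."""
    return {name: [table[b] for b in seq] for name, table in ENCODINGS.items()}


def decode(encoded):
    """Recover the bases from the MK and RY encodings alone."""
    return "".join(BASE_FROM_BITS[p] for p in zip(encoded["MK"], encoded["RY"]))


def inconsistent_positions(encoded):
    """Positions where the three bits fail the parity check (XOR != 0)."""
    triples = zip(encoded["MK"], encoded["RY"], encoded["WS"])
    return [i for i, (m, r, w) in enumerate(triples) if m ^ r ^ w]


seq = "GATTACA"
enc = encode(seq)
clean_errors = inconsistent_positions(enc)

# Simulate a single error in one of the three encodings.
corrupted = {name: list(bits) for name, bits in enc.items()}
corrupted["WS"][2] ^= 1
flagged = inconsistent_positions(corrupted)
```

Every valid base has even parity across the three bits, so a single flipped bit at any position is detected.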

July 7, 2019

Comparative and population genomic landscape of Phellinus noxius: A hypervariable fungus causing root rot in trees.

The order Hymenochaetales of white rot fungi contains some of the most aggressive wood decayers causing tree deaths around the world. Despite their ecological importance and the impact of diseases they cause, little is known about the evolution and transmission patterns of these pathogens. Here, we sequenced and undertook comparative genomic analyses of Hymenochaetales genomes using brown root rot fungus Phellinus noxius, wood-decomposing fungus Phellinus lamaensis, laminated root rot fungus Phellinus sulphurascens and trunk pathogen Porodaedalea pini. Many gene families of lignin-degrading enzymes were identified from these fungi, reflecting their ability as white rot fungi. Comparing against distant fungi highlighted the expansion of 1,3-beta-glucan synthases in P. noxius, which may account for its fast-growing attribute. We identified 13 linkage groups conserved within Agaricomycetes, suggesting the evolution of stable karyotypes. We determined that P. noxius has a bipolar heterothallic mating system, with an unusual, highly expanded ~60 kb A locus as a result of accumulating gene transposition. We investigated the population genomics of 60 P. noxius isolates across multiple islands of the Asia Pacific region. Whole-genome sequencing showed this multinucleate species contains abundant poly-allelic single nucleotide polymorphisms with atypical allele frequencies. Different patterns of intra-isolate polymorphism reflect mono-/heterokaryotic states which are both prevalent in nature. We have shown two genetically separated lineages with one spanning across many islands despite the geographical barriers. Both populations possess extraordinary genetic diversity and show contrasting evolutionary scenarios. These results provide a framework to further investigate the genetic basis underlying the fitness and virulence of white rot fungi. © 2017 John Wiley & Sons Ltd.

July 7, 2019

The novel HLA-B*08:183 allele identified by sequence-based typing in a Caucasian leukemia patient.

July 7, 2019

Filling reference gaps via assembling DNA barcodes using high-throughput sequencing-moving toward barcoding the world.

Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn’t show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single-molecule sequencing platform PacBio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.

July 7, 2019

Genomics of parallel adaptation at two timescales in Drosophila.

Two interesting unanswered questions are the extent to which both the broad patterns and genetic details of adaptive divergence are repeatable across species, and the timescales over which parallel adaptation may be observed. Drosophila melanogaster is a key model system for population and evolutionary genomics. Findings from genetics and genomics suggest that recent adaptation to latitudinal environmental variation (on the timescale of hundreds or thousands of years) associated with Out-of-Africa colonization plays an important role in maintaining biological variation in the species. Additionally, studies of interspecific differences between D. melanogaster and its sister species D. simulans have revealed that a substantial proportion of proteins and amino acid residues exhibit adaptive divergence on a timescale of roughly a few million years. Here we use population genomic approaches to attack the problem of parallelism between D. melanogaster and a highly diverged congener, D. hydei, on two timescales. D. hydei, a member of the repleta group of Drosophila, is similar to D. melanogaster in that it too appears to be a recently cosmopolitan species and recent colonizer of high latitude environments. We observed parallelism both for genes exhibiting latitudinal allele frequency differentiation within species and for genes exhibiting recurrent adaptive protein divergence between species. Greater parallelism was observed for long-term adaptive protein evolution, and this parallelism includes not only the specific genes/proteins that exhibit adaptive evolution, but extends even to the magnitudes of the selective effects on interspecific protein differences. Thus, despite the roughly 50 million years of time separating D. melanogaster and D. hydei, and despite their considerably divergent biology, they exhibit substantial parallelism, suggesting the existence of a fundamental predictability of adaptive evolution in the genus.
