April 21, 2020  |  

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads.

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes. © 2019 John Wiley & Sons Ltd/University College London.
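The NG50 figures quoted above can be illustrated with a minimal sketch. It assumes the usual definition of NG50 (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of a stated genome or region size). The function name and the toy contig lengths are illustrative and are not taken from the study.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for a list of contig lengths and an assumed genome size.

    NG50 is the length of the contig at which the running total of
    contigs, sorted longest first, reaches half of genome_size.
    Returns 0 if the contigs cover less than half of genome_size.
    """
    half = genome_size / 2
    running_total = 0
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return 0


# Toy example (hypothetical lengths, not data from the study).
toy_ng50 = ng50([50, 40, 30, 20, 10], genome_size=200)

# Fold change between the two pericentromeric NG50 values reported above.
fold_increase = 480.6 / 191.5
```

With the reported values, the ratio 480.6 / 191.5 rounds to the 2.5-fold increase stated in the abstract.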


April 21, 2020  |  

Characterization of Reference Materials for Genetic Testing of CYP2D6 Alleles: A GeT-RM Collaborative Project.

Pharmacogenetic testing is increasingly available from clinical and research laboratories. However, only a limited number of quality control and other reference materials are currently available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination Program studies and 42 additional samples that had not been characterized previously. DNA samples were distributed to volunteer testing laboratories for genotyping using a variety of commercially available and laboratory-developed tests. These publicly available samples will support the quality-assurance and quality-control programs of clinical laboratories performing CYP2D6 testing. Published by Elsevier Inc.


April 21, 2020  |  

Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid

Background: Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results: We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions: These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
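The read-classification idea behind trio binning can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assigns a read to whichever parent shares more parent-specific k-mers with it, and it omits the error filtering and count normalization a production tool would need. All function names, the value of k, and the toy sequences are assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_reads, paternal_reads, k):
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    maternal = set().union(*(kmers(r, k) for r in maternal_reads))
    paternal = set().union(*(kmers(r, k) for r in paternal_reads))
    return maternal - paternal, paternal - maternal


def classify_read(read, maternal_only, paternal_only, k):
    """Bin an offspring long read by counting parent-specific k-mers it contains."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no parent-specific evidence, or a tie


# Toy parental reads differing at one base (hypothetical sequences).
K = 4
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)

bin_a = classify_read("GTACG", mat_only, pat_only, K)
bin_b = classify_read("GTTCG", mat_only, pat_only, K)
bin_c = classify_read("ACGT", mat_only, pat_only, K)
```

Higher heterozygosity produces more parent-specific k-mers, which is why an interspecies hybrid makes the binning step easier rather than harder.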


April 21, 2020  |  

A robust benchmark for germline structural variant detection

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.


April 21, 2020  |  

Rapid evolution of α-gliadin gene family revealed by analyzing Gli-2 locus regions of wild emmer wheat.

α-Gliadins are a major group of gluten proteins in wheat flour that contribute to the end-use properties for food processing and contain major immunogenic epitopes that can cause serious health-related issues including celiac disease (CD). α-Gliadins are also the youngest group of gluten proteins and are encoded by a large gene family. The majority of the gene family members evolved independently in the A, B, and D genomes of different wheat species after their separation from a common ancestral species. To gain insights into the origin and evolution of these complex genes, the genomic regions of the Gli-2 loci encoding α-gliadins were characterized from the tetraploid wild emmer, a progenitor of hexaploid bread wheat that contributed the AABB genomes. Genomic sequences of Gli-2 locus regions for the wild emmer A and B genomes were first reconstructed using the genome sequence scaffolds along with optical genome maps. A total of 24 and 16 α-gliadin genes were identified for the A and B genome regions, respectively. α-Gliadin pseudogene frequencies of 86% for the A genome and 69% for the B genome were primarily caused by C to T substitutions in the highly abundant glutamine codons, resulting in the generation of premature stop codons. Comparison with the homologous regions from the hexaploid wheat cv. Chinese Spring indicated considerable sequence divergence of the two A genomes at the genomic level. In comparison, conserved regions between the two B genomes were identified that included α-gliadin pseudogenes containing shared nested TE insertions. Analyses of the genomic organization and phylogenetic tree reconstruction indicate that although orthologous gene pairs derived from speciation were present, large portions of α-gliadin genes were likely derived from differential gene duplications or deletions after the separation of the homologous wheat genomes ~0.5 MYA. The higher number of full-length intact α-gliadin genes in hexaploid wheat than that in wild emmer suggests that human selection through domestication might have an impact on α-gliadin evolution. Our study provides insights into the rapid and dynamic evolution of genomic regions harboring the α-gliadin genes in wheat.
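The mechanism described above, in which C to T substitutions in glutamine codons create premature stop codons, can be checked with a short sketch. It relies only on the standard genetic code (glutamine is encoded by CAA and CAG; the stop codons are TAA, TAG, and TGA). The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
GLUTAMINE_CODONS = ("CAA", "CAG")


def c_to_t_variants(codon):
    """Return every codon reachable by a single C-to-T substitution."""
    return [
        codon[:i] + "T" + codon[i + 1:]
        for i, base in enumerate(codon)
        if base == "C"
    ]


def stop_creating_changes(codon):
    """Return the single C-to-T variants of a codon that are stop codons."""
    return [v for v in c_to_t_variants(codon) if v in STOP_CODONS]


# Each glutamine codon has one C, and mutating it to T yields a stop codon.
stops_by_codon = {c: stop_creating_changes(c) for c in GLUTAMINE_CODONS}
```

Because both glutamine codons are one C to T change away from a stop codon, a glutamine-rich coding sequence offers many sites where this substitution truncates the protein.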


April 21, 2020  |  

Updated assembly resource of Phytophthora ramorum Pr102 isolate incorporating long reads from PacBio sequencing.

The NA1 clonal lineage of Phytophthora ramorum is responsible for Sudden Oak Death, an epidemic that has devastated California’s coastal forest ecosystems. An NA1 isolate, Pr102, derived from coast live oak in California, was previously sequenced and reported as a 65 Mb assembly containing 12 Mb of gaps in 2576 scaffolds. Here we report an improved 70 Mb genome in 1512 scaffolds with 6752 bp of gaps after incorporating PacBio P5-C3 long reads. This assembly contains 19494 gene models (average gene length of 2515 bp) compared to 16134 genes (average gene length of 1673 bp) in the previous version. We predicted 29 new RXLRs and 76 new paralogs of a total of 392 RXLRs from this assembly. We predicted 35 CRNs, compared to 19 in the earlier version, with six paralogs. Our lncRNA prediction identified 255 candidates. This new resource will be invaluable for future evolution studies on the invasive plant pathogen.


April 21, 2020  |  

The Chinese chestnut genome: a reference for species restoration

Forest tree species are increasingly subject to severe mortalities from exotic pests, diseases, and invasive organisms, accelerated by climate change. Forest health issues are threatening multiple species and ecosystem sustainability globally. While sources of resistance may be available in related species, or among surviving trees, introgression of resistance genes into threatened tree species in reasonable time frames requires genome-wide breeding tools. Asian species of chestnut (Castanea spp.) are being employed as donors of disease resistance genes to restore native chestnut species in North America and Europe. To aid in the restoration of threatened chestnut species, we present the assembly of a reference genome with chromosome-scale sequences for Chinese chestnut (C. mollissima), the disease-resistance donor for American chestnut restoration. We also demonstrate the value of the genome as a platform for research and species restoration, including new insights into the evolution of blight resistance in Asian chestnut species, the locations in the genome of ecologically important signatures of selection differentiating American chestnut from Chinese chestnut, the identification of candidate genes for disease resistance, and preliminary comparisons of genome organization with related species.


April 21, 2020  |  

Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline

Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and allow for annotation of TEs. There are numerous methods for each class of elements with unknown relative performance metrics. We benchmarked existing programs based on a curated library of rice TEs. Using the most robust programs, we created a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a condensed TE library for annotations of structurally intact and fragmented elements. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. List of abbreviations: TE, Transposable Elements; LTR, Long Terminal Repeat; LINE, Long Interspersed Nuclear Element; SINE, Short Interspersed Nuclear Element; MITE, Miniature Inverted Transposable Element; TIR, Terminal Inverted Repeat; TSD, Target Site Duplication; TP, True Positives; FP, False Positives; TN, True Negative; FN, False Negatives; GRF, Generic Repeat Finder; EDTA, Extensive de-novo TE Annotator.


April 21, 2020  |  

Variant Phasing and Haplotypic Expression from Single-molecule Long-read Sequencing in Maize

Haplotype phasing of genetic variants is important for interpretation of the maize genome, population genetic analysis, and functional genomic analysis of allelic activity. Accordingly, accurate methods for phasing full-length isoforms are essential for functional genomics study. In this study, we performed an isoform-level phasing study in maize, using two inbred lines and their reciprocal crosses, based on single-molecule full-length cDNA sequencing. To phase and analyze full-length transcripts between hybrids and parents, we developed a tool called IsoPhase. Using this tool, we validated the majority of SNPs called against matching short read data and identified cases of allele-specific, gene-level, and isoform-level expression. Our results revealed that maize parental and hybrid lines exhibit different splicing activities. After phasing 6,847 genes in two reciprocal hybrids using embryo, endosperm and root tissues, we annotated the SNPs and identified large-effect genes. In addition, based on single-molecule sequencing, we identified parent-of-origin isoforms in maize hybrids, different novel isoforms between maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase power and accuracy in studies of allelic expression.


April 21, 2020  |  

The use of Online Tools for Antimicrobial Resistance Prediction by Whole Genome Sequencing in MRSA and VRE.

The antimicrobial resistance (AMR) crisis represents a serious threat to public health and has resulted in concentrated efforts to accelerate development of rapid molecular diagnostics for AMR. In combination with publicly-available web-based AMR databases, whole genome sequencing (WGS) offers the capacity for rapid detection of antibiotic resistance genes. Here we studied the concordance between WGS-based resistance prediction and phenotypic susceptibility testing results for methicillin-resistant Staphylococcus aureus (MRSA) and vancomycin-resistant Enterococcus (VRE) clinical isolates using publicly-available tools and databases. Clinical isolates prospectively collected at the University of Pittsburgh Medical Center between December 2016 and December 2017 underwent WGS. Antibiotic resistance gene content was assessed from assembled genomes by BLASTn search of online databases. Concordance between WGS-predicted resistance profile and phenotypic susceptibility as well as sensitivity, specificity, positive and negative predictive values (PPV, NPV) were calculated for each antibiotic/organism combination, using the phenotypic results as the gold standard. Phenotypic susceptibility testing and WGS results were available for 1242 isolate/antibiotic combinations. Overall concordance was 99.3% with a sensitivity, specificity, PPV, NPV of 98.7% (95% CI, 97.2-99.5%), 99.6% (95% CI, 98.8-99.9%), 99.3% (95% CI, 98.0-99.8%), 99.2% (95% CI, 98.3-99.7%), respectively. Additional identification of point mutations in housekeeping genes increased the concordance to 99.4% and the sensitivity to 99.3% (95% CI, 98.2-99.8%) and NPV to 99.4% (95% CI, 98.4-99.8%). WGS can be used as a reliable predictor of phenotypic resistance for both MRSA and VRE using readily-available online tools. Copyright © 2019. Published by Elsevier Ltd.
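The performance measures named above follow from a two-by-two comparison of WGS predictions against the phenotypic gold standard. The sketch below uses the standard definitions. The function name is illustrative and the counts are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard test-performance measures from confusion-matrix counts.

    tp: predicted resistant, phenotypically resistant
    fp: predicted resistant, phenotypically susceptible
    tn: predicted susceptible, phenotypically susceptible
    fn: predicted susceptible, phenotypically resistant
    """
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "concordance": (tp + tn) / total,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

With these toy counts, sensitivity is 90/100 and specificity is 95/100. PPV and NPV also depend on how common resistance is among the isolates tested.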


April 21, 2020  |  

Microbial diversity in the tick Argas japonicus (Acari: Argasidae) with a focus on Rickettsia pathogens.

The soft tick Argas japonicus mainly infests birds and can cause human dermatitis; however, no pathogen has been identified from this tick species in China. In the present study, the microbiota in A. japonicus collected from an epidemic community was explored, and some putative Rickettsia pathogens were further characterized. The results obtained indicated that bacteria in A. japonicus were mainly ascribed to the phyla Proteobacteria, Firmicutes and Actinobacteria. At the genus level, the male A. japonicus harboured more diverse bacteria than the females and nymphs. The bacteria Alcaligenes, Pseudomonas, Rickettsia and Staphylococcus were common in nymphs and adults. The abundance of bacteria belonging to the Rickettsia genus in females and males was 7.27% and 10.42%, respectively. Furthermore, the 16S rRNA gene of Rickettsia was amplified and sequenced, and phylogenetic analysis revealed that 13 sequences were clustered with the spotted fever group rickettsiae (Rickettsia heilongjiangensis and Rickettsia japonica) and three were clustered with Rickettsia limoniae, which suggested that the characterized Rickettsia in A. japonicus were novel putative pathogens and also that the residents were at considerable risk for infection by tick-borne pathogens. © 2019 The Royal Entomological Society.


April 21, 2020  |  

Defining transgene insertion sites and off-target effects of homology-based gene silencing informs the use of functional genomics tools in Phytophthora infestans.

DNA transformation and homology-based transcriptional silencing are frequently used to assess gene function in Phytophthora. Since unplanned side-effects of these tools are not well-characterized, we used P. infestans to study plasmid integration sites and whether knockdowns caused by homology-dependent silencing spread to other genes. Insertions occurred both in gene-dense and gene-sparse regions but disproportionately near the 5′ ends of genes, which disrupted native coding sequences. Microhomology at the recombination site between plasmid and chromosome was common. Studies of transformants silenced for twelve different gene targets indicated that neighbors within 500 nt were often co-silenced, regardless of whether hairpin or sense constructs were employed and regardless of the direction of transcription of the target. However, cis-spreading of silencing did not occur in all transformants obtained with the same plasmid. Genome-wide studies indicated that unlinked genes with partial complementarity with the silencing-inducing transgene were not usually down-regulated. We learned that hairpin or sense transgenes were not co-silenced with the target in all transformants, which informs how screens for silencing should be performed. We conclude that transformation and gene silencing can be reliable tools for functional genomics in Phytophthora but must be used carefully, especially by testing for the spread of silencing to genes flanking the target.


April 21, 2020  |  

Strengths and potential pitfalls of hay-transfer for ecological restoration revealed by RAD-seq analysis in floodplain Arabis species

Achieving high intraspecific genetic diversity is a critical goal in ecological restoration as it increases the adaptive potential and long-term resilience of populations. Thus, we investigated genetic diversity within and between pristine sites in a fossil floodplain and compared it to sites restored by hay-transfer between 1997 and 2014. RAD-seq genotyping revealed that the stenoecious flood-plain species Arabis nemorensis is co-occurring with individuals that, based on ploidy, ITS-sequencing and morphology, probably belong to the close relative Arabis sagittata, which has a documented preference for dry calcareous grasslands but has not been reported in floodplain meadows. We show that hay-transfer maintains genetic diversity for both species. Additionally, in A. sagittata, transfer from multiple genetically isolated pristine sites resulted in restored sites with increased diversity and admixed local genotypes. In A. nemorensis, transfer did not create novel admixture dynamics because genetic diversity between pristine sites was less differentiated. Thus, the effects of hay-transfer on genetic diversity also depend on the genetic makeup of the donor communities of each species, especially when local material is mixed. Our results demonstrate the efficiency of hay-transfer for habitat restoration and emphasize the importance of pre-restoration characterization of micro-geographic patterns of intraspecific diversity of the community to guarantee that restoration practices reach their goal, i.e. maximize the adaptive potential of the entire restored plant community. Overlooking these patterns may alter the balance between species in the community. Additionally, our comparison of summary statistics obtained from de novo and reference-based RAD-seq pipelines shows that the genomic impact of restoration can be reliably monitored in species lacking prior genomic knowledge.


April 21, 2020  |  

Large-scale ruminant genome sequencing provides insights into their evolution and distinct traits.

The ruminants are one of the most successful mammalian lineages, exhibiting morphological and habitat diversity and containing several key livestock species. To better understand their evolution, we generated and analyzed de novo assembled genomes of 44 ruminant species, representing all six Ruminantia families. We used these genomes to create a time-calibrated phylogeny to resolve topological controversies, overcoming the challenges of incomplete lineage sorting. Population dynamic analyses show that population declines commenced between 100,000 and 50,000 years ago, which is concomitant with expansion in human populations. We also reveal genes and regulatory elements that possibly contribute to the evolution of the digestive system, cranial appendages, immune system, metabolism, body size, cursorial locomotion, and dentition of the ruminants. Copyright © 2019 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

