In this webinar, Lori Aro and Cheryl Heiner of PacBio describe how high-throughput amplicon sequencing using Single Molecule, Real-Time (SMRT) Sequencing and the Sequel System allows for the easy and…
In this webinar, Jenny Ekholm and Paul Kotturi provide an overview of the PacBio No-Amp targeted sequencing application and its uses for targeting hard-to-amplify genes. This approach couples CRISPR-Cas9 with…
Jeremy Schmutz discusses the increased throughput and reduced project costs using HiFi reads from the PacBio Sequel II System in his work sequencing, assembling, and analyzing a variety of genomes…
The release of the PacBio Sequel II System in 2019 brought dramatic throughput improvements and protocols for producing a new data type, highly accurate long reads or HiFi reads. PacBio…
Video Poster: A new approach to Thalassemia and Ataxia carrier screening panels using CRISPR-Cas9 enrichment and long-read sequencing
Although PCR is a cost-effective way to enrich for genomic regions of interest for DNA sequencing, amplifying regions with extreme GC-content and long stretches of short tandem repeat (STR) sequences…
Characterization of Reference Materials for Genetic Testing of CYP2D6 Alleles: A GeT-RM Collaborative Project.
Pharmacogenetic testing increasingly is available from clinical and research laboratories. However, only a limited number of quality control and other reference materials currently are available for the complex rearrangements and rare variants that occur in the CYP2D6 gene. To address this need, the Division of Laboratory Systems, CDC-based Genetic Testing Reference Material Coordination Program, in collaboration with members of the pharmacogenetic testing and research communities and the Coriell Cell Repositories (Camden, NJ), has characterized 179 DNA samples derived from Coriell cell lines. Testing included the recharacterization of 137 genomic DNAs that were genotyped in previous Genetic Testing Reference Material Coordination Program studies and 42 additional samples that had not been characterized previously. DNA samples were distributed to volunteer testing laboratories for genotyping using a variety of commercially available and laboratory-developed tests. These publicly available samples will support the quality-assurance and quality-control programs of clinical laboratories performing CYP2D6 testing.Published by Elsevier Inc.
Existing long-read assemblers require tens of thousands of CPU hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a novel long-read assembler wtdbg2 that, for human data, is tens of times faster than published tools while achieving comparable contiguity and accuracy. It represents a significant algorithmic advance and paves the way for population-scale long-read assembly in future.
RADAR-seq: A RAre DAmage and Repair sequencing method for detecting DNA damage on a genome-wide scale.
RAre DAmage and Repair sequencing (RADAR-seq) is a highly adaptable sequencing method that enables the identification and detection of rare DNA damage events for a wide variety of DNA lesions at single-molecule resolution on a genome-wide scale. In RADAR-seq, DNA lesions are replaced with a patch of modified bases that can be directly detected by Pacific Biosciences Single Molecule Real-Time (SMRT) sequencing. RADAR-seq enables dynamic detection over a wide range of DNA damage frequencies, including low physiological levels. Furthermore, without the need for DNA amplification and enrichment steps, RADAR-seq provides sequencing coverage of damaged and undamaged DNA across an entire genome. Here, we use RADAR-seq to measure the frequency and map the location of ribonucleotides in wild-type and RNaseH2-deficient E. coli and Thermococcus kodakarensis strains. Additionally, by tracking ribonucleotides incorporated during in vivo lagging strand DNA synthesis, we determined the replication initiation point in E. coli, and its relation to the origin of replication (oriC). RADAR-seq was also used to map cyclobutane pyrimidine dimers (CPDs) in Escherichia coli (E. coli) genomic DNA exposed to UV-radiation. On a broader scale, RADAR-seq can be applied to understand formation and repair of DNA damage, the correlation between DNA damage and disease initiation and progression, and complex biological pathways, including DNA replication.Copyright © 2019 The Authors. Published by Elsevier B.V. All rights reserved.
Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generation of a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes. © 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Genome Comparisons of Wild Isolates of Caulobacter crescentus Reveal Rates of Inversion and Horizontal Gene Transfer.
Since previous interspecies comparisons of Caulobacter genomes have revealed extensive genome rearrangements, we decided to compare the nucleotide sequences of four C. crescentus genomes, NA1000, CB1, CB2, and CB13. To accomplish this goal, we used PacBio sequencing technology to determine the nucleotide sequence of the CB1, CB2, and CB13 genomes, and obtained each genome sequence as a single contig. To correct for possible sequencing errors, each genome was sequenced twice. The only differences we observed between the two sets of independently determined sequences were random omissions of a single base in a small percentage of the homopolymer regions where a single base is repeated multiple times. Comparisons of these four genomes indicated that horizontal gene transfer events that included small numbers of genes occurred at frequencies in the range of 10-3 to 10-4 insertions per generation. Large insertions were about 100 times less frequent. Also, in contrast to previous interspecies comparisons, we found no genome rearrangements when the closely related NA1000, CB1, and CB2 genomes were compared, and only eight inversions and one translocation when the more distantly related CB13 genome was compared to the other genomes. Thus, we estimate that inversions occur at a rate of one per 10 to 12 million generations in Caulobacter genomes. The inversions seem to be complex events that include the simultaneous creation of indels.
Long-read sequencing, CENP-A ChIP, and chromatin fiber imaging reveal the composition and organization of Drosophila melanogaster centromeres, which have long remained elusive despite the high quality of this species’ genome. assembly.
Crop domestication changed the course of human evolution, and domestication of maize (Zea mays L. subspecies mays), today the world’s most important crop, enabled civilizations to flourish and has played a major role in shaping the world we know today. Archaeological and ethnobotanical research help us understand the development of the cultures and the movements of the peoples who carried maize to new areas where it continued to adapt. Ancient remains of maize cobs and kernels have been found in the place of domestication, the Balsas River Valley (~9,000 years before present era), and the cultivation center, the Tehuacan Valley (~5,000 years before present era), and have been used to study the process of domestication. Paleogenomic data showed that some of the genes controlling the stem and inflorescence architecture were comparable to modern maize, while other genes controlling ear shattering and starch biosynthesis retain high levels of variability, similar to those found in the wild relative teosinte. These results indicate that the domestication process was both gradual and complex, where different genetic loci were selected at different points in time, and that the transformation of teosinte to maize was completed in the last 5,000 years. Mesoamerican native cultures domesticated teosinte and developed maize from a 6 cm long, popping-kernel ear to what we now recognize as modern maize with its wide variety in ear size, kernel texture, color, size, and adequacy for diverse uses and also invented nixtamalization, a process key to maximizing its nutrition. Used directly for human and animal consumption, processed food products, bioenergy, and many cultural applications, it is now grown on six of the world’s seven continents. The study of its evolution and domestication from the wild grass teosinte helps us understand the nature of genetic diversity of maize and its wild relatives and gene expression. Genetic barriers to direct use of teosinte or Tripsacum in maize breeding have challenged our ability to identify valuable genes and traits, let alone incorporate them into elite, modern varieties. Genomic information and newer genetic technologies will facilitate the use of wild relatives in crop improvement; hence it is more important than ever to ensure their conservation and availability, fundamental to future food security. In situ conservation efforts dedicated to preserving remnant populations of wild relatives in Mexico are key to safeguarding the genetic diversity of maize and its genepool, as well as enabling these species to continue to adapt to dynamic climate and environmental changes. Genebank ex situ efforts are crucial to securely maintain collected wild relative resources and to provide them for gene discovery and other research efforts.
The advent of Nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We incorporated multi-index merging into the Minimap2 aligner and demonstrate that long read alignment to the human genome can be performed on a system with 2?GB RAM with negligible impact on accuracy.
The robust detection of structural variants in mammalian genomes remains a challenge. It is particularly difficult in the case of genetically unstable Chinese hamster ovary (CHO) cell lines with only draft genome assemblies available. We explore the potential of the CRISPR/Cas9 system for the targeted capture of genomic loci containing integrated vectors in CHO-K1-based cell lines followed by next generation sequencing (NGS), and compare it to popular target-enrichment sequencing methods and to whole genome sequencing (WGS). Three different CRISPR/Cas9-based techniques were evaluated; all of them allow for amplification-free enrichment of target genomic regions in the range from 5 to 60 fold, and for recovery of ~15 kb-long sequences with no sequencing artifacts introduced. The utility of these protocols has been proven by the identification of transgene integration sites and flanking sequences in three CHO cell lines. The long enriched fragments helped to identify Escherichia coli genome sequences co-integrated with vectors, and were further characterized by Whole Genome Sequencing (WGS). Other advantages of CRISPR/Cas9-based methods are the ease of bioinformatics analysis, potential for multiplexing, and the production of long target templates for real-time sequencing.
Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).
Mithun (Bos frontalis), also called gayal, is an endangered bovine species, under the tribe bovini with 2n?=?58 XX chromosome complements and reared under the tropical rain forests region of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture is scanty so far. We trust that availability of its whole genome sequence data and assembly will greatly solve this problem and help to generate many information including phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still benefit in understanding genetic variation in mithun populations reared under diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun.We generated ˜300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb, contains 5,015 scaffolds with an N50 value of 1?Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun to cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes presented in mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun specific coding sequences were found compared to those in cattle.Here we presented the first de novo draft genome assembly of Indian mithun having better coverage, less fragmented, better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to research community to elucidate the evolutionary history of mithun across its distinct geographical locations.