BAC sequencing Archives

June 1, 2021 |

Sequencing and de novo assembly of the 17q21.31 disease associated region using long reads generated by Pacific Biosciences SMRT Sequencing technology.

Assessment of genome-wide variation revealed regions of the genome with complex, structurally diverse haplotypes that are insufficiently represented in the human reference genome. The 17q21.31 region is one of the most dynamic and complex regions of the human genome. Different haplotypes exist, in direct and inverted orientation, showing evidence of positive selection and predisposing to microdeletion associated with mental retardation. Sequencing of different haplotypes is extremely important to characterize the spectrum of structural variation at this locus. However, de novo assembly with second-generation sequencing reads is still problematic. Using PacBio technology we have sequenced and de novo assembled a tiling path of eight BAC clones (~1.6 Mb region) across this medically relevant region from the library of a hydatidiform mole. Complete hydatidiform moles arise from the fertilization of an enucleated egg by a single sperm and therefore carry a haploid complement of the human genome, eliminating allelic variation that may confound mapping and assembly. The PacBio RS system enables single molecule real time sequencing, featuring long reads and fast turnaround times. With deep sequencing, PacBio reads generated very uniform coverage, approaching 100% across most of the target interval. Due to long read lengths, the PacBio RS data could be accurately assembled.

June 1, 2021 |

Genomic Architecture of the KIR and MHC-B and -C Regions in Orangutan

PacBio 2013 User Group Meeting Presentation Slides: Lisbeth Guethlein from Stanford University School of Medicine looked at highly repetitive and variable immune regions of the orangutan genome. Guethlein reported that “PacBio managed to accomplish in a week what I have been working on for a couple years” (with Sanger sequencing), and the results were concordant. “Long story short, I was a happy customer.”

June 1, 2021 |

SMRT Sequencing solutions for plant genomes and transcriptomes

Single Molecule, Real-Time (SMRT) Sequencing provides efficient, streamlined solutions to address new frontiers in plant genomes and transcriptomes. Inherent challenges presented by highly repetitive, low-complexity regions and duplication events are directly addressed with multi-kilobase read lengths exceeding 8.5 kb on average, with many exceeding 20 kb. Differentiating between transcript isoforms that are difficult to resolve with short-read technologies is also now possible. We present solutions available for both reference genome and transcriptome research that best leverage long reads in several plant projects including algae, Arabidopsis, rice, and spinach using only the PacBio platform. Benefits for these applications are further realized with consistent use of size-selection of input sample using the BluePippin™ device from Sage Science. We will share highlights from our genome projects using the latest P5-C3 chemistry to generate high-quality reference genomes with the highest contiguity, contig N50 exceeding 1 Mb, and average base quality of QV50. Additionally, the value of long, intact reads to provide a no-assembly approach to investigate transcript isoforms using our Iso-Seq protocol will be presented for full transcriptome characterization and targeted surveys of genes with complex structures. PacBio provides the most comprehensive assembly with annotation when combining offerings for both genome and transcriptome research efforts. For more focused investigation, PacBio also offers researchers opportunities to easily investigate and survey genes with complex structures.
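For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a contig N50 and a Phred-scaled quality value (QV) are conventionally computed. The function names and the toy contig lengths are illustrative assumptions, not part of any PacBio tool or of the projects described here.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


if __name__ == "__main__":
    # Hypothetical contig lengths in bases, for illustration only.
    toy_contigs = [2_000_000, 1_500_000, 1_200_000, 400_000, 100_000]
    print("N50:", n50(toy_contigs))
    print("Per-base error probability at QV50:", qv_to_error_rate(50))
```

Under the standard Phred definition, QV50 corresponds to an expected error rate of about one base in 100,000.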

June 1, 2021 |

Old school/new school genome sequencing: One step backward — a quantum leap forward.

As the costs for genome sequencing have decreased, the number of “genome” sequences has increased at a rapid pace. Unfortunately, the quality and completeness of these so-called “genome” sequences have suffered enormously. We prefer to call such assemblies “gene assembly space” (GAS). We believe it is important to distinguish GAS assemblies from reference genome assemblies (RGAs), as all subsequent research that depends on accurate genome assemblies can be highly compromised if the only assembly available is a GAS assembly.

June 1, 2021 |

Long read sequencing technology to solve complex genomic regions assembly in plants

Numerous whole genome sequencing projects, completed or ongoing, have highlighted the fact that obtaining a high quality genome sequence is necessary to address comparative genomics questions such as structural variations among genotypes and gain or loss of specific function. Despite the spectacular progress that has been made in sequencing technologies, obtaining accurate and reliable data remains challenging, at the whole genome scale but also when targeting specific genomic regions. These issues are even more noticeable for complex plant genomes. Most plant genomes are known to be particularly challenging due to their size, high density of repetitive elements and various levels of ploidy. To overcome these issues, we have developed a strategy to reduce genome complexity by using large insert BAC libraries combined with next generation sequencing technologies. We have compared two different technologies (Roche-454 and Pacific Biosciences PacBio RS II) to sequence pools of BAC clones in order to obtain the best quality sequence. We targeted nine BAC clones from different species (maize, wheat, strawberry, barley, sugarcane and sunflower) known to be complex in terms of sequence assembly. We sequenced the pools of the nine BAC clones with both technologies. We have compared the assembly results and highlighted differences due to the sequencing technologies used. We demonstrated that the long reads obtained with the PacBio RS II technology enable a better and more reliable assembly, notably by preventing errors due to duplicated or repetitive sequences in the same region.

February 5, 2021 |

PAG PacBio Workshop: Introducing 5 new high-quality PacBio genome assemblies for rice to help solve the 10-billion people question

At PAG 2017, Rod Wing presented five new, high-quality rice genome assemblies developed with SMRT Sequencing, including one that has eight complete chromosomes including centromeres. He also offered an early…

April 21, 2020 |

Long-read sequencing for rare human genetic diseases.

During the past decade, the search for pathogenic mutations in rare human genetic diseases has involved huge efforts to sequence coding regions, or the entire genome, using massively parallel short-read sequencers. However, the approximate current diagnostic rate is <50% using these approaches, and there remain many rare genetic diseases with unknown cause. There may be many reasons for this, but one plausible explanation is that the responsible mutations are in regions of the genome that are difficult to sequence using conventional technologies (e.g., tandem-repeat expansion or complex chromosomal structural aberrations). Despite the drawbacks of high cost and a shortage of standard analytical methods, several studies have analyzed pathogenic changes in the genome using long-read sequencers. The results of these studies provide hope that further application of long-read sequencers to identify the causative mutations in unsolved genetic diseases may expand our understanding of the human genome and diseases. Such approaches may also be applied to molecular diagnosis and therapeutic strategies for patients with genetic diseases in the future.

April 21, 2020 |

Construction and comparison of three reference-quality genome assemblies for soybean.

We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and one of Glycine soja, the closest wild relative of G. max. The G. max assemblies are for widely used U.S. cultivars: the northern line ‘Williams 82’ (Wm82); and the southern line ‘Lee’. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 SNPs/kb between Wm82 and Lee, and 4.7 SNPs/kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgressions and haplotype structure. Comparisons against the U.S. germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found ~40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and ~32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for soybean’s domestication and improvement, serving as a basis for future research and crop improvement efforts for this important crop species. This article is protected by copyright. All rights reserved.

April 21, 2020 |

BjuWRR1, a CC-NB-LRR gene identified in Brassica juncea, confers resistance to white rust caused by Albugo candida.

BjuWRR1, a CNL-type R gene, was identified from an east European gene pool line of Brassica juncea and validated for conferring resistance to white rust by genetic transformation. White rust caused by the oomycete pathogen Albugo candida is a significant disease of crucifer crops including Brassica juncea (mustard), a major oilseed crop of the Indian subcontinent. Earlier, a resistance-conferring locus named AcB1-A5.1 was mapped in an east European gene pool line of B. juncea, Donskaja-IV. This line was tested along with some other lines of B. juncea (AABB), B. rapa (AA) and B. nigra (BB) for resistance to six isolates of A. candida collected from different mustard growing regions of India. Donskaja-IV was found to be completely resistant to all the tested isolates. Sequencing of a BAC spanning the locus AcB1-A5.1 showed the presence of a single CC-NB-LRR protein encoding R gene. The genomic sequence of the putative R gene with its native promoter and terminator was used for the genetic transformation of a susceptible Indian gene pool line, Varuna, and was found to confer complete resistance to all the isolates. This is the first white rust resistance-conferring gene described from Brassica species and has been named BjuWRR1. Allelic variants of the gene in B. juncea germplasm and orthologues in the Brassicaceae genomes were studied to understand the evolutionary dynamics of the BjuWRR1 gene.

April 21, 2020 |

Centromeric Satellite DNAs: Hidden Sequence Variation in the Human Population.

The central goal of medical genomics is to understand the inherited basis of sequence variation that underlies human physiology, evolution, and disease. Functional association studies currently ignore millions of bases that span each centromeric region and acrocentric short arm. These regions are enriched in long arrays of tandem repeats, or satellite DNAs, that are known to vary extensively in copy number and repeat structure in the human population. Satellite sequence variation in the human genome is often so large that it is detected cytogenetically, yet due to the lack of a reference assembly and informatics tools to measure this variability, contemporary high-resolution disease association studies are unable to detect causal variants in these regions. Nevertheless, recently uncovered associations between satellite DNA variation and human disease support that these regions present a substantial and biologically important fraction of human sequence variation. Therefore, there is a pressing and unmet need to detect and incorporate this uncharacterized sequence variation into broad studies of human evolution and medical genomics. Here I discuss the current knowledge of satellite DNA variation in the human genome, focusing on centromeric satellites and their potential implications for disease.

April 21, 2020 |

Impact of Chromosomal Rearrangements on the Interpretation of Lupin Karyotype Evolution.

Plant genome evolution can be very complex and challenging to describe, even within a genus. Mechanisms that underlie genome variation are complex and can include whole-genome duplications, gene duplication and/or loss, and, importantly, multiple chromosomal rearrangements. Lupins (Lupinus) diverged from other legumes approximately 60 mya. In contrast to New World lupins, Old World lupins show high variability not only for chromosome numbers (2n = 32–52), but also for the basic chromosome number (x = 5–9, 13) and genome size. The evolutionary basis that underlies the karyotype evolution in lupins remains unknown, as it has so far been impossible to identify individual chromosomes. To shed light on chromosome changes and evolution, we used comparative chromosome mapping among 11 Old World lupins, with Lupinus angustifolius as the reference species. We applied a set of L. angustifolius-derived bacterial artificial chromosome clones for fluorescence in situ hybridization. We demonstrate that chromosome variations in the species analyzed might have arisen from multiple changes in chromosome structure and number. We hypothesize about lupin karyotype evolution through polyploidy and subsequent aneuploidy. Additionally, we have established a cytogenomic map of L. angustifolius along with chromosome markers that can be used for related species to further improve comparative studies of crops and wild lupins.

April 21, 2020 |

Crustacean Genome Exploration Reveals the Evolutionary Origin of White Spot Syndrome Virus.

White spot syndrome virus (WSSV) is a crustacean-infecting, double-stranded DNA virus and is the most serious viral pathogen in the global shrimp industry. WSSV is the sole recognized member of the family Nimaviridae, and the lack of genomic data on other nimaviruses has obscured the evolutionary history of WSSV. Here, we investigated the evolutionary history of WSSV by characterizing WSSV relatives hidden in host genomic data. We surveyed 14 host crustacean genomes and identified five novel nimaviral genomes. Comparative genomic analysis of Nimaviridae identified 28 “core genes” that are ubiquitously conserved in Nimaviridae; unexpected conservation of 13 uncharacterized proteins highlighted yet-unknown essential functions underlying the nimavirus replication cycle. The ancestral Nimaviridae gene set contained five baculoviral per os infectivity factor homologs and a sulfhydryl oxidase homolog, suggesting a shared phylogenetic origin of Nimaviridae and insect-associated double-stranded DNA viruses. Moreover, we show that novel gene acquisition and subsequent amplification reinforced the unique accessory gene repertoire of WSSV. Expansion of unique envelope protein and nonstructural virulence-associated genes may have been the key genomic event that made WSSV such a deadly pathogen.

IMPORTANCE: WSSV is the deadliest viral pathogen threatening global shrimp aquaculture. The evolutionary history of WSSV has remained a mystery, because few WSSV relatives, or nimaviruses, had been reported. Our aim was to trace the history of WSSV using the genomes of novel nimaviruses hidden in host genome data. We demonstrate that WSSV emerged from a diverse family of crustacean-infecting large DNA viruses. By comparing the genomes of WSSV and its relatives, we show that WSSV possesses an expanded set of unique host-virus interaction-related genes. This extensive gene gain may have been the key genomic event that made WSSV such a deadly pathogen. Moreover, conservation of insect-infecting virus protein homologs suggests a common phylogenetic origin of crustacean-infecting Nimaviridae and other insect-infecting DNA viruses. Our work redefines the previously poorly characterized crustacean virus family and reveals the ancient genomic events that preordained the emergence of a devastating shrimp pathogen.

Copyright © 2019 American Society for Microbiology.

April 21, 2020 |

Adaptive archaic introgression of copy number variants and the discovery of previously unknown human genes

As they migrated out of Africa and into Europe and Asia, anatomically modern humans interbred with archaic hominins, such as Neanderthals and Denisovans. The result of this genetic introgression on the recipient populations has been of considerable interest, especially in cases of selection for specific archaic genetic variants. Hsieh et al. characterized adaptive structural variants and copy number variants that are likely targets of positive selection in Melanesians. Focusing on population-specific regions of the genome that carry duplicated genes and show an excess of amino acid replacements provides evidence for one of the mechanisms by which genetic novelty can arise and result in differentiation between human genomes.

Science, this issue p. eaax2083

INTRODUCTION: Characterizing genetic variants underlying local adaptations in human populations is one of the central goals of evolutionary research. Most studies have focused on adaptive single-nucleotide variants that either arose as new beneficial mutations or were introduced after interbreeding with our now-extinct relatives, including Neanderthals and Denisovans. The adaptive role of copy number variants (CNVs), another well-known form of genomic variation generated through deletions or duplications that affect more base pairs in the genome, is less well understood, despite evidence that such mutations are subject to stronger selective pressures.

RATIONALE: This study focuses on the discovery of introgressed and adaptive CNVs that have become enriched in specific human populations. We combine whole-genome CNV calling and population genetic inference methods to discover CNVs and then assess signals of selection after controlling for demographic history. We examine 266 publicly available modern human genomes from the Simons Genome Diversity Project and genomes of three ancient hominins: a Denisovan, a Neanderthal from the Altai Mountains in Siberia, and a Neanderthal from Croatia. We apply long-read sequencing methods to sequence-resolve complex CNVs of interest specifically in the Melanesians, an Oceanian population distributed from Papua New Guinea to as far east as the islands of Fiji and known to harbor some of the greatest amounts of Neanderthal and Denisovan ancestry.

RESULTS: Consistent with the hypothesis of archaic introgression outside Africa, we find a significant excess of CNV sharing between modern non-African populations and archaic hominins (P = 0.039). Among Melanesians, we observe an enrichment of CNVs with potential signals of positive selection (n = 37 CNVs), of which 19 CNVs likely introgressed from archaic hominins. We show that Melanesian-stratified CNVs are significantly associated with signals of positive selection (P = 0.0323). Many map near or within genes associated with metabolism (e.g., ACOT1 and ACOT2), development and cell cycle or signaling (e.g., TNFRSF10D and CDK11A and CDK11B), or immune response (e.g., IFNLR1). We characterize two of the largest and most complex CNVs on chromosomes 16p11.2 and 8p21.3 that introgressed from Denisovans and Neanderthals, respectively, and are absent from most other human populations. At chromosome 16p11.2, we sequence-resolve a large duplication of >383 thousand base pairs (kbp) that originated from Denisovans and introgressed into the ancestral Melanesian population 60,000 to 170,000 years ago. This large duplication occurs at high frequency (>79%) in diverse Melanesian groups, shows signatures of positive selection, and maps adjacent to Homo sapiens–specific duplications that predispose to rearrangements associated with autism. On chromosome 8p21.3, we identify a Melanesian haplotype that carries two CNVs, a ~6-kbp deletion, and a ~38-kbp duplication, with a Neanderthal origin and that introgressed into non-Africans 40,000 to 120,000 years ago. This CNV haplotype occurs at high frequency (44%) and shows signals consistent with a partial selective sweep in Melanesians. Using long-read sequencing genomic and transcriptomic data, we reconstruct the structure and complex evolutionary history for these two CNVs and discover previously undescribed duplicated genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that show an excess of amino acid replacements consistent with the action of positive selection.

CONCLUSION: Our results suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation that is absent from current reference genomes.

Figure caption: Large adaptive-introgressed CNVs at chromosomes 8p21.3 and 16p11.2 in Melanesians. The magnifying glasses highlight structural differences between the archaic (top) and reference (bottom) genomes. Neanderthal (red) and Denisovan (blue) haplotypes encompassing large CNVs occur at high frequencies in Melanesians (44 and 79%, respectively) but are absent (black) in all non-Melanesians. These CNVs create positively selected genes (TNFRSF10D1, TNFRSF10D2, and NPIPB16) that are absent from the reference genome.

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.

April 21, 2020 |

Genomic inversions and GOLGA core duplicons underlie disease instability at the 15q25 locus.

Human chromosome 15q25 is involved in several disease-associated structural rearrangements, including microdeletions and chromosomal markers with inverted duplications. Using comparative fluorescence in situ hybridization, strand-sequencing, single-molecule, real-time sequencing and Bionano optical mapping analyses, we investigated the organization of the 15q25 region in human and nonhuman primates. We found that two independent inversions occurred in this region after the fission event that gave rise to phylogenetic chromosomes XIV and XV in humans and great apes. One of these inversions is still polymorphic in the human population today and may confer differential susceptibility to 15q25 microdeletions and inverted duplications. The inversion breakpoints map within segmental duplications containing core duplicons of the GOLGA gene family and correspond to the site of an ancestral centromere, which became inactivated about 25 million years ago. The inactivation of this centromere likely released segmental duplications from recombination repression typical of centromeric regions. We hypothesize that this increased the frequency of ectopic recombination creating a hotspot of hominid inversions where dispersed GOLGA core elements now predispose this region to recurrent genomic rearrangements associated with disease.

April 21, 2020 |

The genome sequence of segmental allotetraploid peanut Arachis hypogaea.

Like many other crops, the cultivated peanut (Arachis hypogaea L.) is of hybrid origin and has a polyploid genome that contains essentially complete sets of chromosomes from two ancestral species. Here we report the genome sequence of peanut and show that after its polyploid origin, the genome has evolved through mobile-element activity, deletions and by the flow of genetic information between corresponding ancestral chromosomes (that is, homeologous recombination). Uniformity of patterns of homeologous recombination at the ends of chromosomes favors a single origin for cultivated peanut and its wild counterpart A. monticola. However, through much of the genome, homeologous recombination has created diversity. Using new polyploid hybrids made from the ancestral species, we show how this can generate phenotypic changes such as spontaneous changes in the color of the flowers. We suggest that diversity generated by these genetic mechanisms helped to favor the domestication of the polyploid A. hypogaea over other diploid Arachis species cultivated by humans.
