January 16, 2019

PAG Conference: Reference-quality Drosophila genome assemblies for evolutionary analysis of previously inaccessible genomic regions

In this presentation, Andrew Clark from Cornell University describes work from a collaboration with Manyuan Long of the University of Chicago and Rod Wing of the University of Arizona to look at heterochromatic regions with long simple satellite repeats in Drosophila genomes. The group used PacBio sequencing to create new genome assemblies of 10 Drosophila species, including de novo assemblies of two individual flies using as little as 26 ng of gDNA.

March 22, 2016

Filling in the gap of human chromosome 4: Single Molecule Real Time sequencing of macrosatellite repeats in the facioscapulohumeral muscular dystrophy locus.

A majority of facioscapulohumeral muscular dystrophy (FSHD) is caused by contraction of macrosatellite repeats called D4Z4 that are located in the subtelomeric region of human chromosome 4q35. Sequencing the FSHD locus has been technically challenging due to its long size and nearly identical nature of repeat elements. Here we report sequencing and partial assembly of a BAC clone carrying an entire FSHD locus by a single molecule real time (SMRT) sequencing technology which could produce long reads up to about 18 kb containing D4Z4 repeats. De novo assembly by Hierarchical Genome Assembly Process 1 (HGAP.1) yielded a contig of 41…

November 1, 2015

Fcγ receptors: genetic variation, function, and disease.

Fcγ receptors (FcγRs) are key immune receptors responsible for the effective control of both humoral and innate immunity and are central to maintaining the balance between generating appropriate responses to infection and preventing autoimmunity. When this balance is lost, pathology results in increased susceptibility to cancer, autoimmunity, and infection. In contrast, optimal FcγR engagement facilitates effective disease resolution and response to monoclonal antibody immunotherapy. The underlying genetics of the FcγR gene family are a central component of this careful balance. Complex in humans and generated through ancestral duplication events, here we review the evolution of the gene family in mammals,…

March 19, 2015

PacBio-LITS: a large-insert targeted sequencing method for characterization of human disease-associated chromosomal structural variations.

Generation of long (>5 Kb) DNA sequencing reads provides an approach for interrogation of complex regions in the human genome. Currently, large-insert whole genome sequencing (WGS) technologies from Pacific Biosciences (PacBio) enable analysis of chromosomal structural variations (SVs), but the cost to achieve the required sequence coverage across the entire human genome is high. We developed a method (termed PacBio-LITS) that combines oligonucleotide-based DNA target-capture enrichment technologies with PacBio large-insert library preparation to facilitate SV studies at specific chromosomal regions. PacBio-LITS provides deep sequence coverage at the specified sites at substantially reduced cost compared with PacBio WGS. The efficacy of PacBio-LITS…

February 26, 2015

Analysis of the complete Mycoplasma hominis LBD-4 genome sequence reveals strain-variable prophage insertion and distinctive repeat-containing surface protein arrangements.

The complete genome sequence of Mycoplasma hominis LBD-4 has been determined and the gene content ascribed. The 715,165-bp chromosome contains 620 genes, including 14 carried by a strain-variable prophage genome related to Mycoplasma fermentans MFV-1 and Mycoplasma arthritidis MAV-1. Comparative analysis with the genome of M. hominis PG21(T) reveals distinctive arrangements of repeat-containing surface proteins. Copyright © 2015 Calcutt and Foecking.

February 12, 2015

A unique chromatin complex occupies young α-satellite arrays of human centromeres.

The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100-base pair (bp) DNA wraps in tandem…

November 1, 2014

Potential impact on kidney infection: a whole-genome analysis of Leptospira santarosai serovar Shermani.

Leptospira santarosai serovar Shermani is the most frequently encountered serovar, and it causes leptospirosis and tubulointerstitial nephritis in Taiwan. This study aims to complete the genome sequence of L. santarosai serovar Shermani and analyze the transcriptional responses of L. santarosai serovar Shermani to renal tubular cells. To assemble this highly repetitive genome, we combined reads that were generated from four next-generation sequencing platforms by using hybrid assembly approaches to finish two-chromosome contiguous sequences without gaps by validating the data with optical restriction maps and Sanger sequencing. Whole-genome comparison studies revealed a 28-kb region containing genes that encode transposases and hypothetical…

August 28, 2014

The architecture of a scrambled genome reveals massive levels of genomic rearrangement during development.

Programmed DNA rearrangements in the single-celled eukaryote Oxytricha trifallax completely rewire its germline into a somatic nucleus during development. This elaborate, RNA-mediated pathway eliminates noncoding DNA sequences that interrupt gene loci and reorganizes the remaining fragments by inversions and permutations to produce functional genes. Here, we report the Oxytricha germline genome and compare it to the somatic genome to present a global view of its massive scale of genome rearrangements. The remarkably encrypted genome architecture contains >3,500 scrambled genes, as well as >800 predicted germline-limited genes expressed, and some posttranslationally modified, during genome rearrangements. Gene segments for different somatic loci…

June 1, 2014

Co-option of Sox3 as the male-determining factor on the Y chromosome in the fish Oryzias dancena.

Sex chromosomes harbour a primary sex-determining signal that triggers sexual development of the organism. However, diverse sex chromosome systems have been evolved in vertebrates. Here we use positional cloning to identify the sex-determining locus of a medaka-related fish, Oryzias dancena, and find that the locus on the Y chromosome contains a cis-regulatory element that upregulates neighbouring Sox3 expression in developing gonad. Sex-reversed phenotypes in Sox3(Y) transgenic fish, and Sox3(Y) loss-of-function mutants all point to its critical role in sex determination. Furthermore, we demonstrate that Sox3 initiates testicular differentiation by upregulating expression of downstream Gsdf, which is highly conserved in fish…

May 1, 2014

Complete sequences of organelle genomes from the medicinal plant Rhazya stricta (Apocynaceae) and contrasting patterns of mitochondrial genome evolution across asterids.

Rhazya stricta is native to arid regions in South Asia and the Middle East and is used extensively in folk medicine to treat a wide range of diseases. In addition to generating genomic resources for this medicinally important plant, analyses of the complete plastid and mitochondrial genomes and a nuclear transcriptome from Rhazya provide insights into inter-compartmental transfers between genomes and the patterns of evolution among eight asterid mitochondrial genomes. The 154,841 bp plastid genome is highly conserved with gene content and order identical to the ancestral organization of angiosperms. The 548,608 bp mitochondrial genome exhibits a number of phenomena including the presence…

March 15, 2014

Rapid detection of expanded short tandem repeats in personal genomics using hybrid sequencing.

Long expansions of short tandem repeats (STRs), i.e. DNA repeats of 2-6 nt, are associated with some genetic diseases. Cost-efficient high-throughput sequencing can quickly produce billions of short reads that would be useful for uncovering disease-associated STRs. However, enumerating STRs in short reads remains largely unexplored because of the difficulty in elucidating STRs much longer than 100 bp, the typical length of short reads. We propose ab initio procedures for sensing and locating long STRs promptly by using the frequency distribution of all STRs and paired-end read information. We validated the reproducibility of this method using biological replicates and used it…

January 1, 2014

PacBio sequencing of gene families – a case study with wheat gluten genes.

Amino acids in wheat (Triticum aestivum) seeds mainly accumulate in storage proteins called gliadins and glutenins. Gliadins contain α/β-, γ- and ω-types whereas glutenins contain HMW- and LMW-types. Known gliadin and glutenin sequences were largely determined through cloning and sequencing by capillary electrophoresis. This time-consuming process prevents us from intensively studying the variation of each orthologous gene copy among cultivars. The throughput and sequencing length of the Pacific Biosciences RS (PacBio) single molecule sequencing platform make it feasible to construct contiguous and non-chimeric RNA sequences. We assembled 424 wheat storage protein transcripts from ten wheat cultivars by using just one single-molecule-real-time…

January 1, 2014

Genome reference and sequence variation in the large repetitive central exon of human MUC5AC.

Despite modern sequencing efforts, the difficulty in assembly of highly repetitive sequences has prevented resolution of human genome gaps, including some in the coding regions of genes with important biological functions. One such gene, MUC5AC, encodes a large, secreted mucin, which is one of the two major secreted mucins in human airways. The MUC5AC region contains a gap in the human genome reference (hg19) across the large, highly repetitive, and complex central exon. This exon is predicted to contain imperfect tandem repeat sequences and multiple conserved cysteine-rich (CysD) domains. To resolve the MUC5AC genomic gap, we used high-fidelity long PCR…

