July 19, 2019

The diversity, structure, and function of heritable adaptive immunity sequences in the Aedes aegypti genome.

The Aedes aegypti mosquito transmits arboviruses, including dengue, chikungunya, and Zika virus. Understanding the mechanisms underlying mosquito immunity could provide new tools to control arbovirus spread. Insects exploit two different RNAi pathways to combat viral and transposon infection: short interfering RNAs (siRNAs) and PIWI-interacting RNAs (piRNAs) [1, 2]. Endogenous viral elements (EVEs) are sequences from non-retroviral viruses that are inserted into the mosquito genome and can act as templates for the production of piRNAs [3, 4]. EVEs therefore represent a record of past infections and a reservoir of potential immune memory [5]. The large-scale organization of EVEs has been difficult to resolve with short-read sequencing because they tend to integrate into repetitive regions of the genome. To define the diversity, organization, and function of EVEs, we took advantage of the contiguity associated with long-read sequencing to generate a high-quality assembly of the Ae. aegypti-derived Aag2 cell line genome, an important and widely used model system. We show EVEs are acquired through recombination with specific classes of long terminal repeat (LTR) retrotransposons and organize into large loci (>50 kbp) characterized by high LTR density. These EVE-containing loci have increased density of piRNAs compared to similar regions without EVEs. Furthermore, we detected EVE-derived piRNAs consistent with a targeted processing of persistently infecting virus genomes. We propose that comparisons of EVEs across mosquito populations may explain differences in vector competence, and further study of the structure and function of these elements in the genome of mosquitoes may lead to epidemiological interventions. Copyright © 2017 Elsevier Ltd. All rights reserved.


July 19, 2019

Analysis of recombinational switching at the antigenic variation locus of the Lyme spirochete using a novel PacBio sequencing pipeline.

The Lyme disease spirochete evades the host immune system by combinatorial variation of VlsE, a surface antigen. Antigenic variation occurs via segmental gene conversion from contiguous silent cassettes into the vlsE locus. Because of the high degree of similarity between switch variants and the size of vlsE, short-read NGS technologies have been unsuitable for sequencing vlsE populations. Here we use PacBio sequencing technology coupled with the first fully-automated software pipeline (VAST) to accurately process NGS data by minimizing error frequency, eliminating heteroduplex errors and accurately aligning switch variants. We extend earlier studies by showing use of almost all of the vlsE SNP repertoire. In different tissues of the same mouse, 99.6% of the variants were unique, suggesting that dissemination of Borrelia burgdorferi is predominantly unidirectional with little tissue-to-tissue hematogenous dissemination. We also observed a similar number of variants in SCID and wild-type mice, present a heatmap of the location and frequency of amino acid changes on the 3D structure, and note differences observed in SCID versus wild-type mice that hint at possible amino acid function. Our observed selection against diversification of residues at the dimer interface in wild-type mice strongly suggests that dimerization is required for in vivo functionality of vlsE. © 2017 John Wiley & Sons Ltd.


July 19, 2019

The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum.

Common bread wheat, Triticum aestivum, has one of the most complex genomes known to science, with 6 copies of each chromosome, enormous numbers of near-identical sequences scattered throughout, and an overall haploid size of more than 15 billion bases. Multiple past attempts to assemble the genome have produced assemblies that were well short of the estimated genome size. Here we report the first near-complete assembly of T. aestivum, using deep sequencing coverage from a combination of short Illumina reads and very long Pacific Biosciences reads. The final assembly contains 15 344 693 583 bases and has a weighted average (N50) contig size of 232 659 bases. This represents by far the most complete and contiguous assembly of the wheat genome to date, providing a strong foundation for future genetic studies of this important food crop. We also report how we used the recently published genome of Aegilops tauschii, the diploid ancestor of the wheat D genome, to identify 4 179 762 575 bp of T. aestivum that correspond to its D genome components. © The Author 2017. Published by Oxford University Press.
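
The N50 contig size quoted in the abstract above is a standard contiguity statistic. As a minimal sketch (not the authors' pipeline), the function below computes N50 under the common definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the toy contig lengths are illustrative assumptions, not values from the paper.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Assumed definition: sort contigs from longest to shortest and
    return the length of the contig at which the cumulative sum
    first reaches at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not taken from the wheat assembly).
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))
```

Because the longest contigs are counted first, a handful of very long contigs can dominate the statistic, which is why N50 is usually reported together with the total assembly size, as the abstract does.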


July 19, 2019

Long-read genome sequence assembly provides insight into ongoing retroviral invasion of the koala germline.

The koala retrovirus (KoRV) is implicated in several diseases affecting the koala (Phascolarctos cinereus). KoRV provirus can be present in the genome of koalas as an endogenous retrovirus (present in all cells via germline integration) or as exogenous retrovirus responsible for somatic integrations of proviral KoRV (present in a limited number of cells). This ongoing invasion of the koala germline by KoRV provides a powerful opportunity to assess the viral strategies used by KoRV in an individual. Analysis of a high-quality genome sequence of a single koala revealed 133 KoRV integration sites. Most integrations contain full-length, endogenous provirus; KoRV-A subtype. The second most frequent integrations contain an endogenous recombinant element (recKoRV) in which most of the KoRV protein-coding region has been replaced with an ancient, endogenous retroelement. A third set of integrations, with very low sequence coverage, may represent somatic cell integrations of KoRV-A, KoRV-B and two recently designated additional subgroups, KoRV-D and KoRV-E. KoRV-D and KoRV-E are missing several genes required for viral processing, suggesting they have been transmitted as defective viruses. Our results represent the first comprehensive analyses of KoRV integration and variation in a single animal and provide further insights into the process of retroviral-host species interactions.


July 19, 2019

The Aegilops tauschii genome reveals multiple impacts of transposons.

Wheat is an important global crop with an extremely large and complex genome that contains more transposable elements (TEs) than any other known crop species. Here, we generated a chromosome-scale, high-quality reference genome of Aegilops tauschii, the donor of the wheat D genome, in which 92.5% sequences have been anchored to chromosomes. Using this assembly, we accurately characterized genic loci, gene expression, pseudogenes, methylation, recombination ratios, microRNAs and especially TEs on chromosomes. In addition to the discovery of a wave of very recent gene duplications, we detected that TEs occurred in about half of the genes, and found that such genes are expressed at lower levels than those without TEs, presumably because of their elevated methylation levels. We mapped all wheat molecular markers and constructed a high-resolution integrated genetic map corresponding to genome sequences, thereby placing previously detected agronomically important genes/quantitative trait loci (QTLs) on the Ae. tauschii genome for the first time.


July 19, 2019

Single-molecule sequencing reveals the chromosome-scale genomic architecture of the nematode model organism Pristionchus pacificus.

The nematode Pristionchus pacificus is an established model for integrative evolutionary biology and comparative studies with Caenorhabditis elegans. While an existing genome draft facilitated the identification of several genes controlling various developmental processes, its high degree of fragmentation complicated virtually all genomic analyses. Here, we present a de novo genome assembly from single-molecule, long-read sequencing data consisting of 135 P. pacificus contigs. When combined with a genetic linkage map, 99% of the assembly could be ordered and oriented into six chromosomes. This allowed us to robustly characterize chromosomal patterns of gene density, repeat content, nucleotide diversity, linkage disequilibrium, and macrosynteny in P. pacificus. Despite widespread conservation of synteny between P. pacificus and C. elegans, we identified one major translocation from an autosome to the sex chromosome in the lineage leading to C. elegans. This highlights the potential of the chromosome-scale assembly for future genomic studies of P. pacificus. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.


July 19, 2019

The evolution of dark matter in the mitogenome of seed beetles.

Animal mitogenomes are generally thought of as being economic and optimized for rapid replication and transcription. We use long-read sequencing technology to assemble the remarkable mitogenomes of four species of seed beetles. These are the largest circular mitogenomes ever assembled in insects, ranging from 24,496 to 26,613 bp in total length, and are exceptional in that some 40% consists of non-coding DNA. The size expansion is due to two very long intergenic spacers (LIGSs), rich in tandem repeats. The two LIGSs are present in all species but vary greatly in length (114-10,408 bp), show very low sequence similarity, divergent tandem repeat motifs, a very high AT content and concerted length evolution. The LIGSs have been retained for at least some 45 my but must have undergone repeated reductions and expansions, despite strong purifying selection on protein coding mtDNA genes. The LIGSs are located in two intergenic sites where a few recent studies of insects have also reported shorter LIGSs (>200 bp). These sites may represent spaces that tolerate neutral repeat array expansions or, alternatively, the LIGSs may function to allow a more economic translational machinery. Mitochondrial respiration in adult seed beetles is based almost exclusively on fatty acids, which reduces the need for building complex I of the oxidative phosphorylation pathway (NADH dehydrogenase). One possibility is thus that the LIGSs may allow depressed transcription of NAD genes. RNA sequencing showed that LIGSs are partly transcribed and transcriptional profiling suggested that all seven mtDNA NAD genes indeed show low levels of transcription and co-regulation of transcription across sexes and tissues. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


July 19, 2019

Comparative genomic analyses of Clavibacter michiganensis subsp. insidiosus and pathogenicity on Medicago truncatula.

Clavibacter michiganensis is the most economically important gram-positive bacterial plant pathogen with subspecies that cause serious diseases of maize, wheat, tomato, potato, and alfalfa. Much less is known about pathogenesis involving gram-positive plant pathogens than is known for gram-negative bacteria. Comparative genome analyses of C. michiganensis subspecies affecting tomato, potato, and maize have provided insights on pathogenicity. In this study, we identified strains of C. michiganensis subsp. insidiosus with contrasting pathogenicity on three accessions of the model legume Medicago truncatula. We generated complete genome sequences for two strains and compared these to a previously sequenced strain and genome sequences of four other subspecies. The three C. michiganensis subsp. insidiosus strains varied in gene content due to genome rearrangements, most likely facilitated by insertion elements, and plasmid number, which varied from one to three depending on strain. The core C. michiganensis genome consisted of 1,930 genes, with 401 genes unique to C. michiganensis subsp. insidiosus. An operon for synthesis of the extracellular blue pigment indigoidine, enzymes for pectin degradation, and an operon for inositol metabolism are among the unique features. Secreted serine proteases belonging to both the pat-1 and ppa families were present but highly diverged from those in other subspecies.


July 19, 2019

Centromere evolution and CpG methylation during vertebrate speciation.

Centromeres and large-scale structural variants evolve and contribute to genome diversity during vertebrate speciation. Here, we perform de novo long-read genome assembly of three inbred medaka strains that are derived from geographically isolated subpopulations and undergo speciation. Using single-molecule real-time (SMRT) sequencing, we obtain three chromosome-mapped genomes of length ~734, ~678, and ~744 Mbp with a resource of twenty-two centromeric regions of length 20-345 kbp. Centromeres are positionally conserved among the three strains and even between four pairs of chromosomes that were duplicated by the teleost-specific whole-genome duplication 320-350 million years ago. The centromeres do not all evolve at a similar pace; rather, centromeric monomers in non-acrocentric chromosomes evolve significantly faster than those in acrocentric chromosomes. Using methylation-sensitive SMRT reads, we uncover that centromeres are mostly hypermethylated but have hypomethylated sub-regions that acquire unique sequence compositions independently. These findings reveal the potential of non-acrocentric centromere evolution to contribute to speciation.


July 19, 2019

Firefly genomes illuminate parallel origins of bioluminescence in beetles.

Fireflies and their luminous courtships have inspired centuries of scientific study. Today firefly luciferase is widely used in biotechnology, but the evolutionary origin of bioluminescence within beetles remains unclear. To shed light on this long-standing question, we sequenced the genomes of two firefly species that diverged over 100 million years ago: the North American Photinus pyralis and Japanese Aquatica lateralis. To compare bioluminescent origins, we also sequenced the genome of a related click beetle, the Caribbean Ignelater luminosus, with bioluminescent biochemistry near-identical to fireflies, but anatomically unique light organs, suggesting the intriguing hypothesis of parallel gains of bioluminescence. Our analyses support independent gains of bioluminescence in fireflies and click beetles, and provide new insights into the genes, chemical defenses, and symbionts that evolved alongside their luminous lifestyle. © 2018, Fallon et al.


July 19, 2019

Structure and distribution of centromeric retrotransposons at diploid and allotetraploid Coffea centromeric and pericentromeric regions.

Centromeric regions of plants are generally composed of a large array of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial function in kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposons sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can be associated also with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements, and the complete absence of satellite arrays, contrasting with the usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes.


July 19, 2019

Coupling of single molecule, long read sequencing with IMGT/HighV-QUEST analysis expedites identification of SIV gp140-specific antibodies from scFv phage display libraries.

The simian immunodeficiency virus (SIV)/macaque model of human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome pathogenesis is critical for furthering our understanding of the role of antibody responses in the prevention of HIV infection, and will only increase in importance as macaque immunoglobulin (IG) gene databases are expanded. We have previously reported the construction of a phage display library from a SIV-infected rhesus macaque (Macaca mulatta) using oligonucleotide primers based on human IG gene sequences. Our previous screening relied on Sanger sequencing, which was inefficient and generated only a few dozen sequences. Here, we re-analyzed this library using single molecule, real-time (SMRT) sequencing on the Pacific Biosciences (PacBio) platform to generate thousands of highly accurate circular consensus sequencing (CCS) reads corresponding to full length single chain fragment variable. CCS data were then analyzed through the international ImMunoGeneTics information system® (IMGT®)/HighV-QUEST (www.imgt.org) to identify variable genes and perform statistical analyses. Overall the library was very diverse, with 2,569 different IMGT clonotypes called for the 5,238 IGHV sequences assigned to an IMGT clonotype. Within the library, SIV-specific antibodies represented a relatively limited number of clones, with only 135 different IMGT clonotypes called from 4,594 IGHV-assigned sequences. Our data did confirm that the IGHV4 and IGHV3 gene usage was the most abundant within the rhesus antibodies screened, and that these genes were even more enriched among SIV gp140-specific antibodies. Although a broad range of VH CDR3 amino acid (AA) lengths was observed in the unpanned library, the vast majority of SIV gp140-specific antibodies demonstrated a more uniform VH CDR3 length (20 AA). This uniformity was far less apparent when VH CDR3 were classified according to their clonotype (range: 9-25 AA), which we believe is more relevant for specific antibody identification. Only 174 IGKV and 588 IGLV clonotypes were identified within the VL sequences associated with SIV gp140-specific VH. Together, these data strongly suggest that the combination of SMRT sequencing with the IMGT/HighV-QUEST querying tool will facilitate and expedite our understanding of polyclonal antibody responses during SIV infection and may serve to rapidly expand the known scope of macaque V genes utilized during these responses.


July 19, 2019

Biomonitoring for traditional herbal medicinal products using DNA metabarcoding and single molecule, real-time sequencing.

Global concern has been raised about the potential hazards of traditional herbal medicinal products (THMPs). Substandard and counterfeit THMPs, including traditional Chinese patent medicine, health foods, dietary supplements, etc., are potential threats to public health. Recent marketplace studies using DNA barcoding have determined that the current quality control methods are not sufficient for ensuring the presence of authentic herbal ingredients and detection of contaminants/adulterants. An efficient biomonitoring method for THMPs is greatly needed. Herein, metabarcoding and single-molecule, real-time (SMRT) sequencing were used to detect the multiple ingredients in Jiuwei Qianghuo Wan (JWQHW), a classical herbal prescription widely used in China for the last 800 years. Reference experimental mixtures and commercial JWQHW products from the marketplace were used to confirm the method. Successful SMRT sequencing results recovered 5416 and 4342 circular-consensus sequencing (CCS) reads belonging to the ITS2 and psbA-trnH regions. The results suggest that the combination of metabarcoding and SMRT sequencing is repeatable, reliable, and sensitive enough to detect species in THMPs, and that the error in SMRT sequencing did not affect the ability to identify multiple prescribed species and several adulterants/contaminants. It has the potential to become a valuable tool for the biomonitoring of multi-ingredient THMPs.


July 19, 2019

Piercing the dark matter: bioinformatics of long-range sequencing and mapping.

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.


July 19, 2019

Herbivorous turtle ants obtain essential nutrients from a conserved nitrogen-recycling gut microbiome.

Nitrogen acquisition is a major challenge for herbivorous animals, and the repeated origins of herbivory across the ants have raised expectations that nutritional symbionts have shaped their diversification. Direct evidence for N provisioning by internally housed symbionts is rare in animals; among the ants, it has been documented for just one lineage. In this study we dissect functional contributions by bacteria from a conserved, multi-partite gut symbiosis in herbivorous Cephalotes ants through in vivo experiments, metagenomics, and in vitro assays. Gut bacteria recycle urea, and likely uric acid, using recycled N to synthesize essential amino acids that are acquired by hosts in substantial quantities. Specialized core symbionts of 17 studied Cephalotes species encode the pathways directing these activities, and several recycle N in vitro. These findings point to a highly efficient N economy, and a nutritional mutualism preserved for millions of years through the derived behaviors and gut anatomy of Cephalotes ants.

