April 21, 2020  |  

Chromosome-length haplotigs for yak and cattle from trio binning assembly of an F1 hybrid

Background Assemblies of diploid genomes are generally unphased, pseudo-haploid representations that do not correctly reconstruct the two parental haplotypes present in the individual sequenced. Instead, the assembly alternates between parental haplotypes and may contain duplications in regions where the parental haplotypes are sufficiently different. Trio binning is an approach to genome assembly that uses short reads from both parents to classify long reads from the offspring according to maternal or paternal haplotype origin, and is thus helped rather than impeded by heterozygosity. Using this approach, it is possible to derive two assemblies from an individual, accurately representing both parental contributions in their entirety with higher continuity and accuracy than is possible with other methods. Results We used trio binning to assemble reference genomes for two species from a single individual using an interspecies cross of yak (Bos grunniens) and cattle (Bos taurus). The high heterozygosity inherent to interspecies hybrids allowed us to confidently assign >99% of long reads from the F1 offspring to parental bins using unique k-mers from parental short reads. Both the maternal (yak) and paternal (cattle) assemblies contain over one third of the acrocentric chromosomes, including the two largest chromosomes, in single haplotigs. Conclusions These haplotigs are the first vertebrate chromosome arms to be assembled gap-free and fully phased, and the first time assemblies for two species have been created from a single individual. Both assemblies are the most continuous currently available for non-model vertebrates. Abbreviations: Mb, megabases; kb, kilobases; MYA, millions of years ago; MHC, major histocompatibility complex; SMRT, single molecule real time.
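
The read-classification idea described in this abstract can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' pipeline: the function names (`kmers`, `parent_specific_kmers`, `classify_read`) are hypothetical, reads are plain strings, and sequencing errors, reverse complements and count normalisation are all ignored.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(
    maternal_reads: Iterable[str], paternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Return (maternal-only, paternal-only) k-mer sets from parental short reads."""
    mat = set().union(*(kmers(r, k) for r in maternal_reads))
    pat = set().union(*(kmers(r, k) for r in paternal_reads))
    return mat - pat, pat - mat


def classify_read(read: str, mat_only: Set[str], pat_only: Set[str], k: int) -> str:
    """Assign an offspring read to the parent whose unique k-mers it shares most."""
    read_kmers = kmers(read, k)
    m_hits = len(read_kmers & mat_only)
    p_hits = len(read_kmers & pat_only)
    if m_hits > p_hits:
        return "maternal"
    if p_hits > m_hits:
        return "paternal"
    return "unassigned"  # no parent-specific k-mers, or a tie


# Toy data (illustrative only, not real sequence).
mat_only, pat_only = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTTGTCC"], k=4)
print(classify_read("GTACGTA", mat_only, pat_only, k=4))
print(classify_read("TTTGTC", mat_only, pat_only, k=4))
```

The more the two parental genomes differ, the larger the parent-specific k-mer sets become, which is why high heterozygosity helps this kind of binning rather than hindering it.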


April 21, 2020  |  

Insect genomes: progress and challenges.

In the wake of constant improvements in sequencing technologies, numerous insect genomes have been sequenced. Currently, 1219 insect genome-sequencing projects have been registered with the National Center for Biotechnology Information, including 401 that have genome assemblies and 155 with an official gene set of annotated protein-coding genes. Comparative genomics analysis showed that the expansion or contraction of gene families was associated with well-studied physiological traits such as immune system, metabolic detoxification, parasitism and polyphagy in insects. Here, we summarize the progress of insect genome sequencing, with an emphasis on how this impacts research on pest control. We begin with a brief introduction to the basic concepts of genome assembly, annotation and metrics for evaluating the quality of draft assemblies. We then provide an overview of genome information for numerous insect species, highlighting examples from prominent model organisms, agricultural pests and disease vectors. We also introduce the major insect genome databases. The increasing availability of insect genomic resources is beneficial for developing alternative pest control methods. However, many opportunities remain for developing data-mining tools that make maximal use of the available insect genome resources. Although rapid progress has been achieved, many challenges remain in the field of insect genomics. © 2019 The Royal Entomological Society.
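
One widely used contiguity metric for draft assemblies, reported in several abstracts on this page, is N50: the length of the shortest contig (or scaffold) in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal Python sketch follows; the function name `n50` and the toy lengths are illustrative assumptions, not taken from the review.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths (arbitrary units), for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

N50 measures contiguity only; it says nothing about correctness or completeness, which is why it is usually reported alongside gene-content measures such as conserved gene set analysis.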


April 21, 2020  |  

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system

Background A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. Results The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. Conclusions We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
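
The coverage figure quoted in this abstract follows from simple arithmetic: fold coverage is the number of sequenced bases divided by the genome size. The short Python sketch below reproduces that calculation from the numbers in the abstract (82 Gb of unique-molecule sequence, 2.3 Gb assembly); the function name `fold_coverage` is a hypothetical helper, and using the assembly size as a stand-in for the genome size is an assumption.

```python
def fold_coverage(sequenced_gb: float, genome_size_gb: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size (same units)."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return sequenced_gb / genome_size_gb


# Values taken from the abstract above.
unique_coverage = fold_coverage(82, 2.3)
print(f"unique-molecule coverage: ~{unique_coverage:.0f}x")
```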


April 21, 2020  |  

Extensive intraspecific gene order and gene structural variations in upland cotton cultivars.

Multiple cotton genomes (diploid and tetraploid) have been assembled. However, genomic variations between cultivars of allotetraploid upland cotton (Gossypium hirsutum L.), the most widely planted cotton species in the world, remain unexplored. Here, we use single-molecule long read and Hi-C sequencing technologies to assemble genomes of the two upland cotton cultivars TM-1 and zhongmiansuo24 (ZM24). Comparisons among TM-1 and ZM24 assemblies and the genomes of the diploid ancestors reveal a large amount of genetic variations. Among them, the top three longest structural variations are located on chromosome A08 of the tetraploid upland cotton, which account for ~30% total length of this chromosome. Haplotype analyses of the mapping population derived from these two cultivars and the germplasm panel show suppressed recombination rates in this region. This study provides additional genomic resources for the community, and the identified genetic variations, especially the reduced meiotic recombination on chromosome A08, will help future breeding.


April 21, 2020  |  

Whole genome sequence and de novo assembly revealed genomic architecture of Indian Mithun (Bos frontalis).

Mithun (Bos frontalis), also called gayal, is an endangered bovine species of the tribe Bovini, with a 2n = 58 XX chromosome complement, reared in the tropical rain forest regions of India, China, Myanmar, Bhutan and Bangladesh. However, the origin of this species is still disputed and information on its genomic architecture has so far been scanty. We trust that the availability of its whole genome sequence data and assembly will go a long way towards solving this problem and help to generate much information, including the phylogenetic status of mithun. Recently, the first genome assembly of gayal, mithun of Chinese origin, was published. However, an improved reference genome assembly would still be beneficial for understanding genetic variation in mithun populations reared in diverse geographical locations and for building a superior consensus assembly. We, therefore, performed deep sequencing of the genome of an adult female mithun from India, assembled and annotated its genome, and performed extensive bioinformatic analyses to produce a superior de novo genome assembly of mithun. We generated ~300 Gigabyte (Gb) raw reads from whole-genome deep sequencing platforms and assembled the sequence data using a hybrid assembly strategy to create a high-quality de novo assembly of mithun with 96% recovered as per BUSCO analysis. The final genome assembly has a total length of 3.0 Gb and contains 5,015 scaffolds with an N50 value of 1 Mb. Repeat sequences constitute around 43.66% of the assembly. The genomic alignments between mithun and cattle showed that their genomes, as expected, are highly conserved. Gene annotation identified 28,044 protein-coding genes present in the mithun genome. The gene orthologous groups of mithun showed a high degree of similarity in comparison with other species, while fewer mithun-specific coding sequences were found compared to those in cattle. Here we presented the first de novo draft genome assembly of Indian mithun, which has better coverage, is less fragmented and better annotated, and constitutes a reasonably complete assembly compared to the previously published gayal genome. This comprehensive assembly unravelled the genomic architecture of mithun to a great extent and will provide a reference genome assembly to the research community to elucidate the evolutionary history of mithun across its distinct geographical locations.


April 21, 2020  |  

A high-quality de novo genome assembly from a single mosquito using PacBio sequencing

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


September 22, 2019  |  

Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis).

Bamboo is one of the most important nontimber forestry products worldwide. However, a chromosome-level reference genome is lacking, and an evolutionary view of alternative splicing (AS) in bamboo remains unclear despite emerging omics data and improved technologies. Here, we provide a chromosome-level de novo genome assembly of moso bamboo (Phyllostachys edulis) using additional abundance sequencing data and a Hi-C scaffolding strategy. The significantly improved genome has a scaffold N50 of 79.90 Mb, approximately 243 times longer than the previous version. A total of 51,074 high-quality protein-coding loci with intact structures were identified using single-molecule real-time sequencing and manual verification. Moreover, we provide a comprehensive AS profile based on the identification of 266,711 unique AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and Pacific Biosciences sequencing platforms. Through comparisons with orthologous genes in related plant species, we observed that the AS genes are concentrated among more conserved genes that tend to accumulate higher transcript levels and share less tissue specificity. Furthermore, gene family expansion, abundant AS, and positive selection were identified in crucial genes involved in the lignin biosynthetic pathway of moso bamboo. These fundamental studies provide useful information for future in-depth analyses of comparative genome and AS features. Additionally, our results highlight a global perspective of AS during evolution and diversification in bamboo.


September 22, 2019  |  

Insights into platypus population structure and history from whole-genome sequencing.

The platypus is an egg-laying mammal which, alongside the echidna, occupies a unique place in the mammalian phylogenetic tree. Despite widespread interest in its unusual biology, little is known about its population structure or recent evolutionary history. To provide new insights into the dispersal and demographic history of this iconic species, we sequenced the genomes of 57 platypuses from across the whole species range in eastern mainland Australia and Tasmania. Using a highly improved reference genome, we called over 6.7 M SNPs, providing an informative genetic data set for population analyses. Our results show very strong population structure in the platypus, with our sampling locations corresponding to discrete groupings between which there is no evidence for recent gene flow. Genome-wide data allowed us to establish that 28 of the 57 sampled individuals had at least a third-degree relative among other samples from the same river, often taken at different times. Taking advantage of a sampled family quartet, we estimated the de novo mutation rate in the platypus at 7.0 × 10⁻⁹/bp/generation (95% CI 4.1 × 10⁻⁹ to 1.2 × 10⁻⁸/bp/generation). We estimated effective population sizes of ancestral populations and haplotype sharing between current groupings, and found evidence for bottlenecks and long-term population decline in multiple regions, and early divergence between populations in different regions. This study demonstrates the power of whole-genome sequencing for studying natural populations of an evolutionarily important species.


September 22, 2019  |  

Evidence of non-tandemly repeated rDNAs and their intragenomic heterogeneity in Rhizophagus irregularis

Arbuscular mycorrhizal fungus (AMF) species are some of the most widespread symbionts of land plants. Our much improved reference genome assembly of a model AMF, Rhizophagus irregularis DAOM-181602 (total contigs = 210), facilitated a discovery of repetitive elements with unusual characteristics. R. irregularis has only ten or 11 copies of complete 45S rDNAs, whereas the general eukaryotic genome has tens to thousands of rDNA copies. R. irregularis rDNAs are highly heterogeneous and lack a tandem repeat structure. These findings provide evidence for the hypothesis that rDNA heterogeneity depends on the lack of tandem repeat structures. RNA-Seq analysis confirmed that all rDNA variants are actively transcribed. Observed rDNA/rRNA polymorphisms may modulate translation by using different ribosomes depending on biotic and abiotic interactions. The non-tandem repeat structure and intragenomic heterogeneity of AMF rDNA/rRNA may facilitate successful adaptation to various environmental conditions, increasing host compatibility of these symbiotic fungi.


September 22, 2019  |  

A reference genome of the Chinese hamster based on a hybrid assembly strategy.

Accurate and complete genome sequences are essential in biotechnology to facilitate genome-based cell engineering efforts. The current genome assemblies for Cricetulus griseus, the Chinese hamster, are fragmented and replete with gap sequences and misassemblies, consistent with most short-read-based assemblies. Here, we completely resequenced C. griseus using single molecule real time sequencing and merged this with Illumina-based assemblies. This generated a more contiguous and complete genome assembly than either technology alone, reducing the number of scaffolds by >28-fold, with 90% of the sequence in the 122 longest scaffolds. Most genes are now found in single scaffolds, including up- and downstream regulatory elements, enabling improved study of noncoding regions. With >95% of the gap sequence filled, important Chinese hamster ovary cell mutations have been detected in draft assembly gaps. This new assembly will be an invaluable resource for continued basic and pharmaceutical research. © 2018 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.


September 22, 2019  |  

A complete Leishmania donovani reference genome identifies novel genetic variations associated with virulence.

Leishmania donovani is responsible for visceral leishmaniasis, a neglected and lethal parasitic disease with limited treatment options and no vaccine. The study of L. donovani has been hindered by the lack of a high-quality reference genome, and this can impact experimental outcomes including the identification of virulence genes, drug targets and vaccine development. We therefore generated a complete genome assembly by deep sequencing using a combination of second generation (Illumina) and third generation (PacBio) sequencing technologies. Compared to the current L. donovani assembly, the genome assembly reported here resulted in the closure of over 2,000 gaps, the extension of several chromosomes up to telomeric repeats, the re-annotation of close to 15% of protein coding genes and the annotation of hundreds of non-coding RNA genes. It was possible to correctly assemble the highly repetitive A2 and Amastin virulence gene clusters. A comparative sequence analysis using the improved reference genome confirmed 70 published and identified 15 novel genomic differences between closely related visceral and atypical cutaneous disease-causing L. donovani strains, providing a more complete map of genes associated with virulence and visceral organ tropism. Bioinformatic tools including protein variation effect analyzer and basic local alignment search tool were used to prioritize a list of potential virulence genes based on mutation severity, gene conservation and function. This complete genome assembly and novel information on virulence factors will support the identification of new drug targets and the development of a vaccine for L. donovani.


September 22, 2019  |  

Improved reference genome for the domestic horse increases assembly contiguity and composition.

Recent advances in genomic sequencing technology and computational assembly methods have allowed scientists to improve reference genome assemblies in terms of contiguity and composition. EquCab2, a reference genome for the domestic horse, was released in 2007. Although of equal or better quality compared to other first-generation Sanger assemblies, it had many of the shortcomings common to them. In 2014, the equine genomics research community began a project to improve the reference sequence for the horse, building upon the solid foundation of EquCab2 and incorporating new short-read data, long-read data, and proximity ligation data. Here, we present EquCab3. The count of non-N bases in the incorporated chromosomes is improved from 2.33 Gb in EquCab2 to 2.41 Gb in EquCab3. Contiguity has also been improved nearly 40-fold, with a contig N50 of 4.5 Mb, and scaffold contiguity has been enhanced to the point where all but one of the 32 chromosomes consist of a single scaffold.

