April 21, 2020

Multi-platform discovery of haplotype-resolved structural variation in human genomes.

The incomplete identification of structural variants (SVs) from whole-genome sequencing data limits studies of human genetic diversity and disease association. Here, we apply a suite of long-read, short-read, strand-specific sequencing technologies, optical mapping, and variant discovery algorithms to comprehensively analyze three trios to define the full spectrum of human genetic variation in a haplotype-resolved manner. We identify 818,054 indel variants (<50 bp) and 27,622 SVs (≥50 bp) per genome. We also discover 156 inversions per genome and 58 of the inversions intersect with the critical regions of recurrent microdeletion and microduplication syndromes. Taken together, our SV callsets represent a three to sevenfold increase in SV detection compared to most standard high-throughput sequencing studies, including those from the 1000 Genomes Project. The methods and the dataset presented serve as a gold standard for the scientific community allowing us to make recommendations for maximizing structural variation sensitivity for future genome sequencing studies.
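The 50 bp boundary between indels and SVs can be illustrated with a minimal sketch. It assumes sequence-resolved insertions and deletions given as reference and alternate allele strings; the function name and labels are hypothetical and are not taken from the study's pipeline. Balanced events such as inversions change no length and would need separate handling.

```python
SV_MIN_LENGTH_BP = 50  # boundary stated in the abstract: indels < 50 bp, SVs >= 50 bp


def classify_variant(ref_allele: str, alt_allele: str) -> str:
    """Label an insertion or deletion by the size of its length change.

    Hypothetical helper for illustration only; it does not detect
    inversions or other balanced rearrangements.
    """
    length_change = abs(len(alt_allele) - len(ref_allele))
    if length_change == 0:
        return "no length change"
    if length_change < SV_MIN_LENGTH_BP:
        return "indel"
    return "SV"


# A 49 bp insertion falls under the indel class, a 50 bp insertion under SVs.
print(classify_variant("A", "A" + "T" * 49))
print(classify_variant("A", "A" + "T" * 50))
```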


April 21, 2020

Complete Assembly of the Genome of an Acidovorax citrulli Strain Reveals a Naturally Occurring Plasmid in This Species.

Acidovorax citrulli is the causal agent of bacterial fruit blotch (BFB), a serious threat to cucurbit crop production worldwide. Based on genetic and phenotypic properties, A. citrulli strains are divided into two major groups: group I strains have been generally isolated from melon and other non-watermelon cucurbits, while group II strains are closely associated with watermelon. In a previous study, we reported the genome of the group I model strain, M6. At that time, the M6 genome was sequenced by MiSeq Illumina technology, with reads assembled into 139 contigs. Here, we report the assembly of the M6 genome following sequencing with PacBio technology. This approach not only allowed full assembly of the M6 genome, but it also revealed the occurrence of a ~53 kb plasmid. The M6 plasmid, named pACM6, was further confirmed by plasmid extraction, Southern-blot analysis of restricted fragments and generation of M6-derived cured strains. pACM6 occurs at low copy numbers (average of ~4.1 ± 1.3 chromosome equivalents) in A. citrulli M6 and contains 63 open reading frames (ORFs), most of which (55.6%) encode hypothetical proteins. The plasmid contains several genes encoding type IV secretion components, and typical plasmid-borne genes involved in plasmid maintenance, replication and transfer. The plasmid also carries an operon encoding homologs of a Fic-VbhA toxin-antitoxin (TA) module. Transcriptome data from A. citrulli M6 revealed that, under the tested conditions, the genes encoding the components of this TA system are among the highest expressed genes in pACM6. Whether this TA module plays a role in pACM6 maintenance is still to be determined. Leaf infiltration and seed transmission assays revealed that, under tested conditions, the loss of pACM6 did not affect the virulence of A. citrulli M6. We also show that pACM6 or similar plasmids are present in several group I strains, but absent in all tested group II strains of A. citrulli.


April 21, 2020

Mobilome of Brevibacterium aurantiacum Sheds Light on Its Genetic Diversity and Its Adaptation to Smear-Ripened Cheeses.

Brevibacterium aurantiacum is an actinobacterium that confers key organoleptic properties to washed-rind cheeses during the ripening process. Although this industrially relevant species has been gaining increasing attention in the past years, its genome plasticity is still understudied due to the unavailability of complete genomic sequences. To add insights on the mobilome of this group, we sequenced the complete genomes of five dairy Brevibacterium strains and one non-dairy strain using PacBio RSII. We performed phylogenetic and pan-genome analyses, including comparisons with other publicly available Brevibacterium genomic sequences. Our phylogenetic analysis revealed that these five dairy strains, previously identified as Brevibacterium linens, belong instead to the B. aurantiacum species. A high number of transposases and integrases were observed in the Brevibacterium spp. strains. In addition, we identified 14 and 12 new insertion sequences (IS) in B. aurantiacum and B. linens genomes, respectively. Several stretches of homologous DNA sequences were also found between B. aurantiacum and other cheese rind actinobacteria, suggesting horizontal gene transfer (HGT). An HGT region from an iRon Uptake/Siderophore Transport Island (RUSTI) and an iron uptake composite transposon were found in five B. aurantiacum genomes. These findings suggest that low iron availability in milk is a driving force in the adaptation of this bacterial species to this niche. Moreover, the exchange of iron uptake systems suggests cooperative evolution between cheese rind actinobacteria. We also demonstrated that the integrative and conjugative element BreLI (Brevibacterium Lanthipeptide Island) can excise from the B. aurantiacum SMQ-1417 chromosome. Our comparative genomic analysis suggests that mobile genetic elements played an important role in the adaptation of B. aurantiacum to cheese ecosystems.


April 21, 2020

A reference-grade wild soybean genome.

Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.


April 21, 2020

Closing the Yield Gap for Cannabis: A Meta-Analysis of Factors Determining Cannabis Yield.

Until recently, the commercial production of Cannabis sativa was restricted to varieties that yielded high-quality fiber while producing low levels of the psychoactive cannabinoid tetrahydrocannabinol (THC). In the last few years, a number of jurisdictions have legalized the production of medical and/or recreational cannabis with higher levels of THC, and other jurisdictions seem poised to follow suit. Consequently, demand for industrial-scale production of high yield cannabis with consistent cannabinoid profiles is expected to increase. In this paper we highlight that currently, projected annual production of cannabis is based largely on facility size, not yield per square meter. This meta-analysis of cannabis yields reported in scientific literature aimed to identify the main factors contributing to cannabis yield per plant, per square meter, and per W of lighting electricity. In line with previous research we found that variety, plant density, light intensity and fertilization influence cannabis yield and cannabinoid content; we also identified pot size, light type and duration of the flowering period as predictors of yield and THC accumulation. We provide insight into the critical role of light intensity, quality, and photoperiod in determining cannabis yields, with particular focus on the potential for light-emitting diodes (LEDs) to improve growth and reduce energy requirements. We propose that the vast amount of genomics data currently available for cannabis can be used to better understand the effect of genotype on yield. Finally, we describe diversification that is likely to emerge in cannabis growing systems and examine the potential role of plant-growth promoting rhizobacteria (PGPR) for growth promotion, regulation of cannabinoid biosynthesis, and biocontrol.


April 21, 2020

Analysis of genetic diversity of Xanthomonas oryzae pv. oryzae populations in Taiwan.

Rice bacterial blight caused by Xanthomonas oryzae pv. oryzae (Xoo) is a major rice disease. In Taiwan, the tropical indica type of Oryza sativa originally grown in this area is mix-cultivated with the temperate japonica type of O. sativa, and this might have led to adaptive changes of both rice host and Xoo isolates. In order to better understand how Xoo adapts to this unique environment, we collected and analyzed fifty-one Xoo isolates in Taiwan. Three different genetic marker systems consistently identified five groups. Among these groups, two of them had unique sequences in the last acquired ten spacers in the clustered regularly interspaced short palindromic repeats (CRISPR) region, and the other two had sequences that were similar to the Japanese isolate MAFF311018 and the Philippines isolate PXO563, respectively. The genomes of two Taiwanese isolates with unique CRISPR sequence features, XF89b and XM9, were further completely sequenced. Comparison of the genome sequences suggested that XF89b is phylogenetically close to MAFF311018, and XM9 is close to PXO563. Here, documentation of the diversity of groups of Xoo in Taiwan provides evidence of the populations from different sources and hitherto missing information regarding distribution of Xoo populations in East Asia.


April 21, 2020

Chromosome-level assembly of the water buffalo genome surpasses human and goat genomes in sequence contiguity.

Rapid innovation in sequencing technologies and improvement in assembly algorithms have enabled the creation of highly contiguous mammalian genomes. Here we report a chromosome-level assembly of the water buffalo (Bubalus bubalis) genome using single-molecule sequencing and chromatin conformation capture data. PacBio Sequel reads, with a mean length of 11.5 kb, helped to resolve repetitive elements and generate sequence contiguity. All five B. bubalis sub-metacentric chromosomes were correctly scaffolded with centromeres spanned. Although the index animal was partly inbred, 58% of the genome was haplotype-phased by FALCON-Unzip. This new reference genome improves the contig N50 of the previous short-read based buffalo assembly more than a thousand-fold and contains only 383 gaps. It surpasses the human and goat references in sequence contiguity and facilitates the annotation of hard to assemble gene clusters such as the major histocompatibility complex (MHC).
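Contig N50, the contiguity measure cited in this and several other abstracts on this page, is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The following is a minimal sketch of that standard definition; the function name and the toy contig lengths are illustrative assumptions, not values from the buffalo assembly.

```python
def contig_n50(contig_lengths):
    """Return the N50 of a list of contig lengths (standard definition)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assembly of 32 units: the two longest contigs (8 + 8) already cover half.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(contig_n50(toy_lengths))  # prints 8
```

Because N50 depends only on the length distribution, fewer and longer contigs raise it, which is why long reads that span repeats tend to improve the figure.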


April 21, 2020

Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1.

Echinococcus tapeworms cause a severe helminthic zoonosis called echinococcosis. The genus comprises various species and genotypes, of which E. granulosus (sensu stricto) represents a significant global public health and socioeconomic burden. Mitochondrial (mt) genomes have provided useful genetic markers to explore the nature and extent of genetic diversity within Echinococcus and have underpinned phylogenetic and population structure analyses of this genus. Our recent work indicated a sequence gap (> 1 kb) in the mt genomes of E. granulosus genotype G1, which could not be determined by PCR-based Sanger sequencing. The aim of the present study was to define the complete mt genome, irrespective of structural complexities, using a long-read sequencing method. We extracted high molecular weight genomic DNA from protoscoleces from a single cyst of E. granulosus genotype G1 from a sheep from Australia using a conventional method and sequenced it using PacBio Sequel (long-read) technology, complemented by BGISEQ-500 short-read sequencing. Sequence data obtained were assembled using a recently-developed workflow. We assembled a complete mt genome sequence of 17,675 bp, which is > 4 kb larger than the complete mt genomes known for E. granulosus genotype G1. This assembly includes a previously-elusive tandem repeat region, which is 4417 bp long and consists of ten near-identical 441-445 bp repeat units, each harbouring a 184 bp non-coding region and adjacent regions. We also identified a short non-coding region of 183 bp, which includes an inverted repeat. We report what we consider to be the first complete mt genome of E. granulosus genotype G1 and characterise all repeat regions in this genome. The numbers, sizes, sequences and functions of tandem repeat regions remain to be studied in different isolates of genotype G1 and in other genotypes and species.
The discovery of such ‘new’ repeat elements in the mt genome of genotype G1 by PacBio sequencing raises a question about the completeness of some published genomes of taeniid cestodes assembled from conventional or short-read sequence datasets. This study shows that long-read sequencing readily overcomes the challenges of assembling repeat elements to achieve improved genomes.
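The sizes quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that the ten repeat units are contiguous and together make up the whole 4417 bp region, which the abstract implies but does not state outright; the variable and function names are illustrative.

```python
ASSEMBLY_BP = 17675        # complete mt genome length from the abstract
REPEAT_REGION_BP = 4417    # tandem repeat region length from the abstract
UNIT_COUNT = 10            # number of near-identical repeat units
UNIT_MIN_BP, UNIT_MAX_BP = 441, 445


def repeat_region_bounds(unit_count, unit_min_bp, unit_max_bp):
    """Smallest and largest region size possible for the given unit sizes."""
    return unit_count * unit_min_bp, unit_count * unit_max_bp


low, high = repeat_region_bounds(UNIT_COUNT, UNIT_MIN_BP, UNIT_MAX_BP)
consistent = low <= REPEAT_REGION_BP <= high

# Sequence outside the repeat region (assumption: one contiguous repeat region).
non_repeat_bp = ASSEMBLY_BP - REPEAT_REGION_BP

print(low, high, consistent)  # prints 4410 4450 True
print(non_repeat_bp)          # prints 13258
```

The reported region length falls inside the 4410 to 4450 bp range implied by the unit sizes, and removing it leaves about 13.3 kb, in line with the statement that the new assembly is more than 4 kb larger than earlier ones.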


April 21, 2020

Cichorium intybus L. × Cicerbita alpina Walbr.: doubled haploid chicory induction and CENH3 characterization

Intergeneric hybridization between industrial chicory (Cichorium intybus L.) and Cicerbita alpina Walbr. induces interspecific hybrids and haploid chicory plants after in vitro embryo rescue. The protocol yielded haploids in 5 out of 12 cultivars pollinated; altogether 18 haploids were regenerated from 2836 embryos, with a maximum efficiency of 1.96% haploids per cross. Obtained haploids were chromosome doubled with mitosis inhibitors trifluralin and oryzalin; exposure to 0.05 g L⁻¹ oryzalin for one week was the most efficient treatment to regenerate doubled haploids. Inbreeding effects in vitro were limited, but the ploidy level affects morphology. Transcriptome sequencing revealed two unique copies of CENH3 in Cicerbita alpina Walbr. Comparison of CENH3.1 protein sequences of Cicerbita and Cichorium obtained through transcriptome and whole shotgun genome sequencing revealed two amino-acid substitutions at critical residues of the histone fold domain. These particular changes cause chromosome elimination and reduced centromere loading in several other species and might indicate a CENH3-dependent mechanism causing chromosome elimination of parental chromosomes during Cichorium × Cicerbita intergeneric hybridization. Our results provide insights into chromosome elimination and might increase the efficiency of haploid induction in Cichorium.
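For comparison with the reported per-cross maximum of 1.96%, the overall induction rate follows directly from the totals in the abstract. The sketch below is plain arithmetic on those two numbers; the helper name is hypothetical.

```python
def haploid_efficiency_percent(haploids: int, embryos: int) -> float:
    """Percentage of rescued embryos that regenerated into haploid plants."""
    if embryos <= 0:
        raise ValueError("embryos must be positive")
    return 100.0 * haploids / embryos


# Totals from the abstract: 18 haploids regenerated from 2836 embryos.
overall = haploid_efficiency_percent(18, 2836)
print(f"{overall:.2f}%")  # prints 0.63%
```

The pooled rate of roughly 0.63% is about a third of the best single cross, which suggests that efficiency varied considerably between cultivars.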


April 21, 2020

Whole genomes and transcriptomes reveal adaptation and domestication of pistachio.

Pistachio (Pistacia vera), one of the most important commercial nut crops worldwide, is highly adaptable to abiotic stresses and is tolerant to drought and salt stresses. Here, we provide a draft de novo genome of pistachio as well as large-scale genome resequencing. Comparative genomic analyses reveal stress adaptation of pistachio is likely attributable to the expanded cytochrome P450 and chitinase gene families. Particularly, a comparative transcriptomic analysis shows that the jasmonic acid (JA) biosynthetic pathway plays an important role in salt tolerance in pistachio. Moreover, we resequence 93 cultivars and 14 wild P. vera genomes and 35 closely related wild Pistacia genomes, to provide insights into population structure, genetic diversity, and domestication. We find that frequent genetic admixture occurred among the different wild Pistacia species. Comparative population genomic analyses reveal that pistachio was domesticated about 8000 years ago and suggest that key genes for domestication related to tree and seed size experienced artificial selection. Our study provides insight into genetic underpinning of local adaptation and domestication of pistachio. The Pistacia genome sequences should facilitate future studies to understand the genetic basis of agronomically and environmentally related traits of desert crops.


April 21, 2020

A high-quality de novo genome assembly from a single mosquito using PacBio sequencing

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size.
This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.


April 21, 2020

Full-length transcript sequencing and comparative transcriptomic analysis to evaluate the contribution of osmotic and ionic stress components towards salinity tolerance in the roots of cultivated alfalfa (Medicago sativa L.).

Alfalfa is the most extensively cultivated forage legume. Salinity is a major environmental factor that impacts on alfalfa’s productivity. However, little is known about the molecular mechanisms underlying alfalfa responses to salinity, especially the relative contribution of the two important components of osmotic and ionic stress. In this study, we constructed the first full-length transcriptome database for alfalfa root tips under continuous NaCl and mannitol treatments for 1, 3, 6, 12, and 24 h (three biological replicates for each time point, including the control group) via PacBio Iso-Seq. This resulted in the identification of 52,787 full-length transcripts, with an average length of 2551 bp. Global transcriptional changes in the same 33 stressed samples were then analyzed via BGISEQ-500 RNA-Seq. Totals of 8861 NaCl-regulated and 8016 mannitol-regulated differentially expressed genes (DEGs) were identified. Metabolic analyses revealed that these DEGs overlapped or diverged in the cascades of molecular networks involved in signal perception, signal transduction, transcriptional regulation, and antioxidative defense. Notably, several well characterized signalling pathways, such as CDPK, MAPK, CIPK, and PYL-PP2C-SnRK2, were shown to be involved in osmotic stress, while the SOS core pathway was activated by ionic stress. Moreover, the physiological shifts of catalase and peroxidase activity, glutathione and proline content were in accordance with dynamic transcript profiles of the relevant genes, indicating that the antioxidative defense system plays critical roles in response to salinity stress. Overall, our study provides evidence that the response to salinity stress in alfalfa includes both osmotic and ionic components. The key osmotic and ionic stress-related genes are candidates for future studies as potential targets to improve resistance to salinity stress via genetic engineering.


April 21, 2020

Construction of JRG (Japanese reference genome) with single-molecule real-time sequencing

In recent genome analyses, population-specific reference panels have proven important. However, reference panels based on short-read sequencing data do not sufficiently cover long insertions. Therefore, the nature of long insertions has not been well documented. Here, we assembled a Japanese genome using single-molecule real-time sequencing data and characterized insertions found in the assembled genome. We identified 3691 insertions ranging from 100 bp to ~10,000 bp in the assembled genome relative to the international reference sequence (GRCh38). To validate and characterize these insertions, we mapped short reads from 1070 Japanese individuals and 728 individuals from eight other populations to insertions integrated into GRCh38. With this result, we constructed JRGv1 (Japanese Reference Genome version 1) by integrating the 903 verified insertions, totaling 1,086,173 bases, shared by at least two Japanese individuals into GRCh38. We also constructed decoyJRGv1 by concatenating 3559 verified insertions, totaling 2,536,870 bases, shared by at least two Japanese individuals or by six other assemblies. This assembly improved the alignment ratio by 0.4% on average. These results demonstrate the importance of refining the reference assembly and creating a population-specific reference genome. JRGv1 and decoyJRGv1 are available at the JRG website.


April 21, 2020

Retrotranspositional landscape of Asian rice revealed by 3000 genomes.

The recent release of genomic sequences for 3000 rice varieties provides access to the genetic diversity at species level for this crop. We take advantage of this resource to unravel some features of the retrotranspositional landscape of rice. We develop the software TRACKPOSON specifically for the detection of transposable element insertion polymorphisms (TIPs) from large datasets. We apply this tool to 32 families of retrotransposons and identify more than 50,000 TIPs in the 3000 rice genomes. Most polymorphisms are found at very low frequency, suggesting that they may have occurred recently in agro. A genome-wide association study shows that these activations in rice may be triggered by external stimuli, rather than by the alteration of genetic factors involved in transposable element silencing pathways. Finally, the TIPs dataset is used to trace the origin of rice domestication. Our results suggest that rice originated from three distinct domestication events.


April 21, 2020

Origin and recent expansion of an endogenous gammaretroviral lineage in domestic and wild canids.

Vertebrate genomes contain a record of retroviruses that invaded the germlines of ancestral hosts and are passed to offspring as endogenous retroviruses (ERVs). ERVs can impact host function since they contain the necessary sequences for expression within the host. Dogs are an important system for the study of disease and evolution, yet no substantiated reports of infectious retroviruses in dogs exist. Here, we utilized Illumina whole genome sequence data to assess the origin and evolution of a recently active gammaretroviral lineage in domestic and wild canids. We identified numerous recently integrated loci of a canid-specific ERV-Fc sublineage within Canis, including 58 insertions that were absent from the reference assembly. Insertions were found throughout the dog genome including within and near gene models. By comparison of orthologous occupied sites, we characterized element prevalence across 332 genomes including all nine extant canid species, revealing evolutionary patterns of ERV-Fc segregation among species as well as subpopulations. Sequence analysis revealed common disruptive mutations, suggesting a predominant form of ERV-Fc spread by trans complementation of defective proviruses. ERV-Fc activity included multiple circulating variants that infected canid ancestors from the last 20 million to within 1.6 million years, with recent bursts of germline invasion in the sublineage leading to wolves and dogs.

