July 7, 2019

The evolution and population diversity of human-specific segmental duplications

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (N = 80 genes from 33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed ‘core duplicons’ and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (such as TCAF1/TCAF2), we highlight ten gene families (for example, ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.


July 7, 2019

Complete genome analysis of Serratia marcescens RSC-14: A plant growth-promoting bacterium that alleviates cadmium stress in host plants.

Serratia marcescens RSC-14 is a Gram-negative bacterium that was previously isolated from the surface-sterilized roots of the Cd-hyperaccumulator Solanum nigrum. The strain stimulates plant growth and alleviates Cd stress in host plants. To investigate the genetic basis for these traits, the complete genome of RSC-14 was obtained by single-molecule real-time sequencing. The genome of S. marcescens RSC-14 comprised a 5.12-Mbp-long circular chromosome containing 4,593 predicted protein-coding genes, 22 rRNA genes, 88 tRNA genes, and 41 pseudogenes. It contained genes with potential functions in plant growth promotion, including genes involved in indole-3-acetic acid (IAA) biosynthesis, acetoin synthesis, and phosphate solubilization. Moreover, annotation using NCBI and Rapid Annotation using Subsystem Technology identified several genes that encode antioxidant enzymes as well as genes involved in antioxidant production, supporting the observed resistance towards heavy metals, such as Cd. The presence of IAA pathway-related genes and oxidative stress-responsive enzyme genes may explain the plant growth-promoting potential and Cd tolerance, respectively. This is the first report of a complete genome sequence of Cd-tolerant S. marcescens and its plant growth promotion pathway. The whole-genome analysis of this strain clarified the genetic basis underlying its phenotypic and biochemical characteristics, underpinning the beneficial interactions between RSC-14 and plants.


July 7, 2019

First complete genome sequence of Marinilactibacillus piezotolerans strain 15R, a marine lactobacillus isolated from coal-bearing sediment 2.0 kilometers below the seafloor, determined by PacBio single-molecule real-time technology.

Marinilactibacillus piezotolerans strain 15R is a facultatively anaerobic heterotrophic lactobacillus isolated from deep marine subsurface sediment nearly 2 km below the seafloor in the northwestern Pacific. We report here the first whole-genome sequence of strain 15R. The identified genome sequence has 2,767,908 bp, 35.4% G+C content, and predicted 2,552 candidate protein-coding sequences, with no identified plasmids. Copyright © 2017 Wei et al.


July 7, 2019

The value of new genome references.

Genomic information has become a ubiquitous and almost essential aspect of biological research. Over the last 10-15 years, the cost of generating sequence data from DNA or RNA samples has dramatically declined and our ability to interpret those data has increased just as remarkably. Although it is still possible for biologists to conduct interesting and valuable research on species for which genomic data are not available, the impact of having access to a high-quality whole genome reference assembly for a given species is nothing short of transformational. Research on a species for which we have no DNA or RNA sequence data is restricted in fundamental ways. In contrast, even access to an initial draft-quality genome (see below for definitions) opens a wide range of opportunities that are simply not available without that reference genome assembly. Although a complete discussion of the impact of genome sequencing and assembly is beyond the scope of this short paper, the goal of this review is to summarize the most common and highest impact contributions that whole genome sequencing and assembly have made to comparative and evolutionary biology. Copyright © 2016. Published by Elsevier Inc.


July 7, 2019

Deep sequencing in the management of hepatitis virus infections.

The hepatitis viruses represent a major public health problem worldwide. Procedures for characterization of the genomic composition of their populations, accurate diagnosis, identification of multiple infections, and information on inhibitor-escape mutants for treatment decisions are needed. Deep sequencing methodologies are extremely useful for these viruses since they replicate as complex and dynamic quasispecies swarms whose complexity and mutant composition are biologically relevant traits. Population complexity is a major challenge for disease prevention and control, but also an opportunity to distinguish among related but phenotypically distinct variants that might anticipate disease progression and treatment outcome. Detailed characterization of mutant spectra should permit choosing better treatment options, given the increasing number of new antiviral inhibitors available. In the present review we briefly summarize our experience on the use of deep sequencing for the management of hepatitis virus infections, particularly for hepatitis B and C viruses, and outline some possible new applications of deep sequencing for these important human pathogens. Copyright © 2016 Elsevier B.V. All rights reserved.


July 7, 2019

Comparative genomics and transcriptome analysis of Aspergillus niger and metabolic engineering for citrate production.

Despite a long and successful history of citrate production in Aspergillus niger, the molecular mechanism of citrate accumulation is only partially understood. In this study, we used comparative genomics and transcriptome analysis of three citrate-producing strains, namely A. niger H915-1 (citrate titer: 157 g L⁻¹), A1 (117 g L⁻¹), and L2 (76 g L⁻¹), to gain a genome-wide view of the mechanism of citrate accumulation. Compared with A. niger A1 and L2, A. niger H915-1 contained 92 mutated genes, including a succinate-semialdehyde dehydrogenase in the γ-aminobutyric acid shunt pathway and an aconitase family protein involved in citrate synthesis. Furthermore, transcriptome analysis of A. niger H915-1 revealed that the transcription levels of 479 genes changed between the cell growth stage (6 h) and the citrate synthesis stage (12 h, 24 h, 36 h, and 48 h). In the glycolysis pathway, triosephosphate isomerase was up-regulated, whereas pyruvate kinase was down-regulated. Two cytosol ATP-citrate lyases, which take part in the cycle of citrate synthesis, were up-regulated, and may coordinate with the alternative oxidases in the alternative respiratory pathway for energy balance. Finally, deletion of the oxaloacetate acetylhydrolase gene in H915-1 eliminated oxalate formation, but neither an influence on the pH decrease nor a difference in citrate production was observed.


July 7, 2019

Complete genome sequences of three multidrug-resistant clinical isolates of Streptococcus pneumoniae serotype 19A with different susceptibilities to the myxobacterial metabolite carolacton.

The full-genome sequences of three drug- and multidrug-resistant Streptococcus pneumoniae clinical isolates of serotype 19A were determined by PacBio single-molecule real-time sequencing, in combination with Illumina MiSeq sequencing. A comparison to the genomes of other pneumococci indicates a high nucleotide sequence identity to strains Hungary19A-6 and TCH8431/19A. Copyright © 2017 Donner et al.


July 7, 2019

Fallacy of the unique genome: sequence diversity within single Helicobacter pylori strains.

Many bacterial genomes are highly variable but nonetheless are typically published as a single assembled genome. Experiments tracking bacterial genome evolution have not looked at the variation present at a given point in time. Here, we analyzed the mouse-passaged Helicobacter pylori strain SS1 and its parent PMSS1 to assess intra- and intergenomic variability. Using high sequence coverage depth and experimental validation, we detected extensive genome plasticity within these H. pylori isolates, including movement of the transposable element IS607, large and small inversions, multiple single nucleotide polymorphisms, and variation in cagA copy number. The cagA gene was found as 1 to 4 tandem copies located off the cag island in both SS1 and PMSS1; this copy number variation correlated with protein expression. To gain insight into the changes that occurred during mouse adaptation, we also compared SS1 and PMSS1 and observed 46 differences that were distinct from the within-genome variation. The most substantial was an insertion in cagY, which encodes a protein required for a type IV secretion system function. We detected modifications in genes coding for two proteins known to affect mouse colonization, the HpaA neuraminyllactose-binding protein and the FutB α-1,3 lipopolysaccharide (LPS) fucosyltransferase, as well as genes predicted to modulate diverse properties. In sum, our work suggests that data from consensus genome assemblies from single colonies may be misleading by failing to represent the variability present. Furthermore, we show that high-depth genomic sequencing data of a population can be analyzed to gain insight into the normal variation within bacterial strains. IMPORTANCE: Although it is well known that many bacterial genomes are highly variable, it is nonetheless traditional to refer to, analyze, and publish “the genome” of a bacterial strain. Variability is usually reduced (“only sequence from a single colony”), ignored (“just publish the consensus”), or placed in the “too-hard” basket (“analysis of raw read data is more robust”). Now that whole-genome sequences are regularly used to assess virulence and track outbreaks, a better understanding of the baseline genomic variation present within single strains is needed. Here, we describe the variability seen in typical working stocks and colonies of pathogen Helicobacter pylori model strains SS1 and PMSS1 as revealed by use of high-coverage mate pair next-generation sequencing (NGS) and confirmed by traditional laboratory techniques. This work demonstrates that reliance on a consensus assembly as “the genome” of a bacterial strain may be misleading. Copyright © 2017 Draper et al.


July 7, 2019

Genomic sequence of ‘Candidatus Liberibacter solanacearum’ haplotype C and its comparison with haplotype A and B genomes.

Haplotypes A and B of ‘Candidatus Liberibacter solanacearum’ (CLso) are associated with diseases of solanaceous plants, especially Zebra chip disease of potato, and haplotypes C, D and E are associated with symptoms on apiaceous plants. To date, one complete genome of haplotype B and two high-quality draft genomes of haplotype A have been obtained for these unculturable bacteria using metagenomics from the psyllid vector Bactericera cockerelli. Here, we present the first genomic sequences obtained for the carrot-associated CLso. These two genomic sequences of haplotype C, FIN114 (1.24 Mbp) and FIN111 (1.20 Mbp), were obtained from carrot psyllids (Trioza apicalis) harboring CLso. Genomic comparisons between haplotypes A, B and C revealed that the genome organization differs between these haplotypes, due to large inversions and other recombinations. Comparison of protein-coding genes indicated that the core genome of CLso consists of 885 ortholog groups, with the pan-genome consisting of 1327 ortholog groups. Twenty-seven ortholog groups are unique to CLso haplotype C, whilst 11 ortholog groups shared by haplotypes A and B are not found in haplotype C. Some of the ortholog groups that are not part of the core genome may encode functions related to interactions with the different host plant and psyllid species.
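
The core and pan-genome counts in the abstract above correspond to simple set operations over ortholog groups. The following is a minimal sketch of that idea, not the authors' pipeline: the function name, the haplotype labels used as keys, and the ortholog group identifiers (og1, og2, ...) are all hypothetical, and the toy data are made up for illustration.

```python
def core_and_pan(groups_by_genome):
    """Return (core, pan) ortholog-group sets for a dict of genome -> groups.

    core: groups present in every genome (set intersection)
    pan:  groups present in at least one genome (set union)
    """
    sets = [set(groups) for groups in groups_by_genome.values()]
    if not sets:
        raise ValueError("need at least one genome")
    core = set.intersection(*sets)
    pan = set.union(*sets)
    return core, pan


# Hypothetical toy data, NOT taken from the paper.
toy = {
    "A": {"og1", "og2", "og3"},
    "B": {"og1", "og2", "og4"},
    "C": {"og1", "og5"},
}

core, pan = core_and_pan(toy)

# Groups found only in haplotype C.
unique_to_c = toy["C"] - (toy["A"] | toy["B"])

# Groups shared by A and B but absent from C.
shared_ab_missing_c = (toy["A"] & toy["B"]) - toy["C"]

print(len(core), len(pan), sorted(unique_to_c), sorted(shared_ab_missing_c))
```

In real analyses the hard part is the ortholog clustering that produces the group identifiers; once those exist, the core, pan, and haplotype-specific counts follow from set arithmetic like this.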


July 7, 2019

Complete genome sequences of three Cupriavidus strains isolated from various Malaysian environments.

Cupriavidus sp. USMAA1020, USMAA2-4, and USMAHM13 are capable of producing polyhydroxyalkanoate (PHA). This biopolymer is an alternative solution to synthetic plastics, whereby polyhydroxyalkanoate synthase is the key enzyme involved in PHA biosynthesis. Here, we report the complete genomes of three Cupriavidus sp. strains: USMAA1020, USMAA2-4, and USMAHM13. Copyright © 2017 Shafie et al.


July 7, 2019

Complete genome sequence of Thermus brockianus GE-1 reveals key enzymes of xylan/xylose metabolism.

Thermus brockianus strain GE-1 is a thermophilic, Gram-negative, rod-shaped and non-motile bacterium that was isolated from the Geysir geothermal area, Iceland. Like other thermophiles, Thermus species are often used as model organisms to understand the mechanism of action of extremozymes, especially focusing on their heat-activity and thermostability. Genome-specific features of T. brockianus GE-1 and their properties further help to explain processes of the adaptation of extremophiles to elevated temperatures. Here we analyze the first whole genome sequence of T. brockianus strain GE-1. Insights into the genome sequence and the methodologies that were applied during de novo assembly and annotation are given in detail. The finished genome shows a phred quality value of QV50. The complete genome size is 2.38 Mb, comprising the chromosome (2,035,182 bp), the megaplasmid pTB1 (342,792 bp) and the smaller plasmid pTB2 (10,299 bp). Gene prediction revealed 2,511 genes in total, including 2,458 protein-encoding genes, 53 RNA genes and 66 pseudogenes. A unique genomic region on megaplasmid pTB1 was identified encoding key enzymes for xylan depolymerization and xylose metabolism. This is in agreement with the growth experiments in which xylan is utilized as the sole source of carbon. Accordingly, we identified sequences encoding the xylanase Xyn10, an endoglucanase, the membrane ABC sugar transporter XylH, the xylose-binding protein XylF, the xylose isomerase XylA, which catalyzes the first step of xylose metabolism, and the xylulokinase XylB, responsible for the second step of xylose metabolism. Our data indicate that an ancestor of T. brockianus obtained the ability to use xylose as an alternative carbon source by horizontal gene transfer.


July 7, 2019

Improving and correcting the contiguity of long-read genome assemblies of three plant species using optical mapping and chromosome conformation capture data.

Long-read sequencing can overcome the weaknesses of short reads in the assembly of eukaryotic genomes; however, at present additional scaffolding is needed to achieve chromosome-level assemblies. We generated PacBio long-read data of the genomes of three relatives of the model plant Arabidopsis thaliana and assembled all three genomes into only a few hundred contigs. To improve the contiguities of these assemblies, we generated BioNano Genomics optical mapping and Dovetail Genomics chromosome conformation capture data for genome scaffolding. Despite their technical differences, optical mapping and chromosome conformation capture performed similarly and doubled N50 values. After improving both integration methods, assembly contiguity reached the chromosome-arm level. We rigorously assessed the quality of contigs and scaffolds using Illumina mate-pair libraries and genetic map information. This showed that PacBio assemblies have high sequence accuracy but can contain several misassemblies, which join unlinked regions of the genome. Most, but not all, of these misjoins were removed during the integration of the optical mapping and chromosome conformation capture data. Even though none of the centromeres was fully assembled, the scaffolds revealed large parts of some centromeric regions, even including some of the heterochromatic regions, which are not present in gold standard reference sequences. Published by Cold Spring Harbor Laboratory Press.
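
The N50 statistic mentioned in the abstract above is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of the calculation follows; the function name and the contig and scaffold lengths are hypothetical and chosen only to show how joining contigs into scaffolds raises N50.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running sum first reaches half of the total length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not from the paper.
contigs = [100, 200, 300, 400, 500]

# Hypothetical scaffolds built by joining those contigs:
# 500 + 300 -> 800 and 400 + 200 + 100 -> 700 (gaps ignored for simplicity).
scaffolds = [800, 700]

print(n50(contigs), n50(scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about whether the joins are correct, which is why the abstract pairs it with an independent check against mate-pair libraries and genetic maps.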

