Consensus assessment of the contamination level of publicly available cyanobacterial genomes
Luc Cornet, Loïc Meunier, Mick Van Vlierberghe, Raphaël R. Léonard, Benoit Durieu, Yannick Lara, Agnieszka Misztak, Damien Sirjacobs, Emmanuelle J. Javaux, Hervé Philippe, Annick Wilmotte, Denis Baurain
Publicly available genomes are crucial for phylogenetic and metagenomic studies, in which contaminating sequences can be the cause of major problems. This issue is expected to be especially important for Cyanobacteria because axenic strains are notoriously difficult to obtain and keep in culture. Yet, despite their great scientific interest, no data are currently available concerning the quality of publicly available cyanobacterial genomes. As reliably detecting contaminants is a complex task, we designed a pipeline combining six methods in a consensus strategy to assess the contamination level of 440 genome assemblies of Cyanobacteria. Two methods are based on published reference databases of ribosomal genes (16S SSU rRNA and ribosomal proteins), one is indirectly based on a reference database of marker genes (CheckM), and three are based on complete genome analysis. Among those genome-wide methods, Kraken and DIAMOND blastx share the same reference database, which we derived from Ensembl Bacteria, whereas CONCOCT does not require any reference database, instead relying on differences in DNA tetramer frequencies. Given that all six methods appear to have their own strengths and limitations, we used the consensus of their rankings to infer that >5% of cyanobacterial genome assemblies are highly contaminated by foreign DNA (i.e., contaminants were detected by 5 or 6 methods). Our results will help researchers to check the quality of publicly available genomic data before using them in their own analyses. Moreover, we argue that journals should make the submission of raw read data mandatory along with genome assemblies in order to facilitate the detection of contaminants in sequence databases.
https://doi.org/10.1371/journal.pone.0200323
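The consensus rule stated in the abstract above ("highly contaminated" means contaminants were detected by 5 or 6 of the six methods) can be illustrated with a minimal sketch. It is a simplification, not the authors' pipeline: the study combined per-method *rankings*, whereas this sketch assumes each method has already been reduced to a hypothetical yes/no contamination flag per assembly. The function names and the toy data are illustrative only and are not results from the study.

```python
from typing import Dict, List

# The six methods named in the abstract; how each one decides "contaminated"
# is an assumption here (a precomputed boolean flag per assembly).
METHODS = ("SSU rRNA", "ribosomal proteins", "CheckM", "Kraken", "DIAMOND blastx", "CONCOCT")


def detection_count(flags: Dict[str, bool]) -> int:
    """Count how many of the six methods flagged contamination for one assembly."""
    return sum(1 for method in METHODS if flags.get(method, False))


def highly_contaminated(assemblies: Dict[str, Dict[str, bool]], min_methods: int = 5) -> List[str]:
    """Return (sorted) assemblies flagged by at least `min_methods` methods."""
    return sorted(
        name for name, flags in assemblies.items() if detection_count(flags) >= min_methods
    )


def fraction_highly_contaminated(assemblies: Dict[str, Dict[str, bool]], min_methods: int = 5) -> float:
    """Fraction of assemblies meeting the consensus cutoff (0.0 for an empty input)."""
    if not assemblies:
        return 0.0
    return len(highly_contaminated(assemblies, min_methods)) / len(assemblies)


if __name__ == "__main__":
    # Hypothetical toy data, not data from the study.
    toy = {
        "genome_A": {m: True for m in METHODS},            # flagged by 6 of 6
        "genome_B": {m: m != "CheckM" for m in METHODS},   # flagged by 5 of 6
        "genome_C": {"Kraken": True, "CONCOCT": True},     # flagged by 2 of 6
        "genome_D": {},                                    # flagged by 0 of 6
    }
    print(highly_contaminated(toy))           # ['genome_A', 'genome_B']
    print(fraction_highly_contaminated(toy))  # 0.5
```

Requiring agreement from at least five independent methods trades sensitivity for robustness: an assembly flagged by only one or two methods may reflect a method-specific artefact rather than genuine foreign DNA.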
Draft Genome Sequence of the Axenic Strain Phormidesmis priestleyi ULC007, a Cyanobacterium Isolated from Lake Bruehwiler (Larsemann Hills, Antarctica)
Yannick Lara, Benoit Durieu, Luc Cornet, Olivier Verlaine, Rosmarie Rippka, Igor S. Pessi, Agnieszka Misztak, Bernard Joris, Emmanuelle J. Javaux, Denis Baurain, Annick Wilmotte
Phormidesmis priestleyi ULC007 is an Antarctic freshwater cyanobacterium. Its draft genome is 5,684,389 bp long. It contains a total of 5,604 protein-encoding genes, of which 22.2% have no clear homologues in known genomes. To date, this draft genome is the first one ever determined for an axenic cyanobacterium from Antarctica.
https://doi.org/10.1128/genomea.01546-16
Metagenomic assembly of new (sub)polar Cyanobacteria and their associated microbiome from non-axenic cultures
Luc Cornet, Amandine R. Bertrand, Marc Hanikenne, Emmanuelle J. Javaux, Annick Wilmotte, Denis Baurain
Cyanobacteria form one of the most diversified phyla of Bacteria. They are ecologically important as primary producers, and they also matter for understanding Earth's evolution and for biotechnological applications. Yet, Cyanobacteria are notoriously difficult to purify and grow axenically, and most strains in culture collections contain heterotrophic bacteria that were probably associated with Cyanobacteria in the environment. Obtaining cyanobacterial DNA without contaminant sequences is thus a challenging and time-consuming task. Here, we describe a metagenomic pipeline that enables the easy recovery of genomes from non-axenic cultures. We tested this pipeline on 17 cyanobacterial cultures from the BCCM/ULC public collection and generated novel genome sequences for 12 polar or subpolar strains and three temperate ones, including three early-branching organisms that will be useful for phylogenomics. In parallel, we assembled 31 co-cultivated bacteria (12 nearly complete) from the same cultures and showed that they mostly belong to Bacteroidetes and Proteobacteria, some of them being very closely related despite geographically distant sampling sites.
https://doi.org/10.1099/mgen.0.000212