2021 on Luc Cornet

Exploring syntenic conservation across genomes for phylogenetic studies of organisms subjected to horizontal gene transfers: A case study with Cyanobacteria and cyanolichens

Luc Cornet, Nicolas Magain, Denis Baurain, François Lutzoni
Understanding the evolutionary history of symbiotic Cyanobacteria at a fine scale is essential to unveil patterns of association with their hosts and the factors driving their spatiotemporal interactions. As for bacteria in general, Horizontal Gene Transfers (HGT) are expected to have been rampant throughout their evolution, which has justified the use of single-locus phylogenies in macroevolutionary studies of these photoautotrophic bacteria. Genomic approaches have greatly increased the amount of molecular data available, but selecting orthologous, congruent genes that are more likely to reflect bacterial macroevolutionary histories remains problematic. In this study, we developed a synteny-based approach and searched for Collinear Orthologous Regions (COR), under the assumption that genes present in the same order and orientation across a wide monophyletic clade are less likely to have undergone HGT. We searched sixteen reference Nostocales genomes and identified 99 genes belonging to 28 COR of three to eight genes each. We then developed a bioinformatic pipeline designed to minimize inter-genome contamination and used it to process twelve Nostoc-associated lichen metagenomes. This reduced our original dataset to 90 genes representing 25 COR, which were used to infer phylogenetic relationships within Nostocales and among lichenized Cyanobacteria. This dataset was further narrowed down to 71 genes representing 22 COR by retaining only the genes belonging to a single (the largest) operon per COR. We found a relatively high level of congruence among trees derived from the 90-gene dataset, but congruence was only slightly higher among genes within a COR than among genes across COR. However, topological congruence was significantly higher among the 71 genes belonging to one operon per COR.
Nostocales phylogenies resulting from concatenation and species-tree approaches based on the 90- and 71-gene datasets were highly congruent, but the most highly supported result was obtained when using synteny, collinearity and operon information (i.e., the 71-gene dataset) as gene selection criteria, an approach that outperformed larger datasets with more genes.
https://doi.org/10.1016/j.ympev.2021.107100
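To make the notion of a Collinear Orthologous Region concrete, here is a minimal sketch, not the authors' pipeline. It assumes that single-copy orthogroups have already been assigned, that each genome is one contiguous list of (orthogroup, strand) pairs, and that the first genome serves as reference; the names `adjacencies` and `collinear_regions` are hypothetical. A run of reference genes is kept when every neighbouring pair appears adjacent, in the same relative order and orientation, in all other genomes (a block read from the opposite strand still counts as collinear).

```python
from typing import List, Sequence, Set, Tuple

# (orthogroup identifier, strand): strand is +1 or -1.
Gene = Tuple[str, int]


def adjacencies(genome: Sequence[Gene]) -> Set[Tuple[Gene, Gene]]:
    """All neighbouring gene pairs of a genome, stored as read from both DNA strands."""
    pairs = set()
    for (a, sa), (b, sb) in zip(genome, genome[1:]):
        pairs.add(((a, sa), (b, sb)))    # pair as read on the given strand
        pairs.add(((b, -sb), (a, -sa)))  # same pair read from the opposite strand
    return pairs


def collinear_regions(genomes: Sequence[Sequence[Gene]], min_genes: int = 3) -> List[List[str]]:
    """Runs of >= min_genes genes of the first (reference) genome whose order and
    relative orientation are conserved in every other genome."""
    reference, others = genomes[0], genomes[1:]
    if not reference:
        return []
    other_pairs = [adjacencies(g) for g in others]
    regions: List[List[str]] = []
    run = [reference[0]]
    for prev, gene in zip(reference, reference[1:]):
        if all((prev, gene) in pairs for pairs in other_pairs):
            run.append(gene)  # adjacency conserved everywhere: extend the region
        else:
            if len(run) >= min_genes:
                regions.append([og for og, _ in run])
            run = [gene]  # start a new candidate region
    if len(run) >= min_genes:
        regions.append([og for og, _ in run])
    return regions
```

For example, with `ref = [("A", 1), ("B", 1), ("C", -1), ("D", 1), ("E", 1)]` and a second genome carrying the inverted block `[("C", 1), ("B", -1), ("A", -1)]`, the sketch reports A-B-C as a region, while a D/E pair whose order flipped without a strand change is rejected.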

ORPER: A Workflow for Constrained SSU rRNA Phylogenies

Luc Cornet, Anne-Catherine Ahn, Annick Wilmotte, and Denis Baurain
The continuous increase in sequenced genomes in public repositories makes choosing interesting bacterial strains for future sequencing projects ever more complicated, as it is difficult to estimate the redundancy between these strains and the genomes already available. We therefore developed the Nextflow workflow “ORPER” (for “ORganism PlacER”), containerized in Singularity, which determines the phylogenetic position of a collection of organisms in the genomic landscape. ORPER constrains the phylogenetic placement of SSU (16S) rRNA sequences in a multilocus reference tree based on ribosomal protein genes extracted from public genomes. We demonstrate the utility of ORPER on the phylum Cyanobacteria by placing 152 strains of the BCCM/ULC collection.
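One way to picture a constrained placement is sketched below; this is an illustration, not ORPER's code. It assumes the multilocus reference tree is given as nested tuples and that the downstream maximum-likelihood program accepts an incomplete constraint tree (RAxML, for example, can use a constraint tree that omits some taxa). The reference tree is restricted to reference organisms that also have an SSU sequence in the alignment, so their relationships stay fixed while query strains absent from the constraint are placed freely. The helper names are hypothetical.

```python
from typing import Optional, Set, Tuple, Union

# A tree is either a leaf name or a tuple of child subtrees (as written in Newick).
Tree = Union[str, Tuple["Tree", ...]]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Restrict a tree to the taxa in `keep`, collapsing single-child nodes."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # a node left with one child carries no topology
    return tuple(children)


def to_newick(tree: Tree) -> str:
    """Serialise a nested-tuple tree as a Newick string without branch lengths."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def constraint_newick(reference_tree: Tree, taxa_with_ssu: Set[str]) -> str:
    """Constraint tree covering only the reference taxa that have an SSU sequence."""
    pruned = prune(reference_tree, taxa_with_ssu)
    if pruned is None:
        raise ValueError("no reference taxon has an SSU sequence")
    return to_newick(pruned) + ";"
```

With the reference tree `((("A", "B"), "C"), ("D", "E"))` and no SSU sequence for `B`, the resulting constraint is `((A,C),(D,E));`.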

AMAW: automated gene annotation for non-model eukaryotic genomes

Loïc Meunier, Denis Baurain, Luc Cornet
Background: The annotation of genomes is a crucial step in the analysis of new genomic data and the insights resulting from it, especially for emerging organisms that allow researchers to access unexplored lineages and thus expand our knowledge of poorly represented taxonomic groups. Complete pipelines for eukaryotic genome annotation have been proposed for more than a decade, but the task remains challenging. One of the most widely used tools in the field is MAKER2, an annotation pipeline that uses experimental evidence (mRNA-seq and proteins) and combines different gene prediction tools. MAKER2 enables individual laboratories and small-scale projects to annotate non-model organisms for which no pre-existing gene models are available. The optimal use of MAKER2 requires gathering evidence data (by searching and assembling transcripts and/or collecting homologous proteins from related organisms), elaborating the best annotation strategy (training of gene models) and efficiently orchestrating the different steps of the software in a grid computing environment, which is tedious, time-consuming and requires considerable bioinformatic skill.
Methods: To address these issues, we present AMAW (Automated MAKER2 Annotation Wrapper), a wrapper pipeline for MAKER2 that automates the above-mentioned tasks.
Use case: The performance of AMAW is illustrated through the annotation of a selection of 32 protist genomes, for which we compared its annotations with those produced with gene models directly available in AUGUSTUS.
Conclusions: Importantly, AMAW also exists as a Singularity container recipe that is easy to deploy on a grid computer, thereby overcoming the tricky installation of MAKER2.
https://f1000research.com/articles/12-186
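The orchestration problem described above is essentially one of running dependent steps in a valid order, and running independent steps concurrently on a grid. The sketch below illustrates that idea with Python's standard `graphlib` module; the step names and their dependencies are hypothetical placeholders and are not taken from AMAW.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical annotation steps mapped to the steps they depend on.
# AMAW's actual steps and dependencies may differ.
PIPELINE: Dict[str, Set[str]] = {
    "assemble_transcripts": set(),
    "collect_proteins": set(),
    "maker_evidence_round": {"assemble_transcripts", "collect_proteins"},
    "train_snap": {"maker_evidence_round"},
    "train_augustus": {"maker_evidence_round"},
    "maker_final_round": {"train_snap", "train_augustus"},
}


def execution_order(steps: Dict[str, Set[str]]) -> List[str]:
    """One sequential order in which every step follows all of its dependencies."""
    return list(TopologicalSorter(steps).static_order())


def parallel_batches(steps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into successive batches whose members could run concurrently."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies already completed
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(parallel_batches(PIPELINE), start=1):
        print(f"batch {i}: {', '.join(batch)}")
```

In a real deployment, each batch would be submitted as independent jobs to the cluster scheduler, and the next batch released only once they all succeed.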

The taxonomy of the Trichophyton rubrum complex: a phylogenomic approach

Luc Cornet, Elizabet D’hooge, Nicolas Magain, Dirk Stubbe, Ann Packeu, Denis Baurain, and Pierre Becker
The medically relevant Trichophyton rubrum species complex has a variety of phenotypic presentations but shows relatively few genetic differences. Conventional barcodes, such as the internal transcribed spacer (ITS) region or the beta-tubulin gene, are not able to completely resolve the relationships between these closely related taxa. T. rubrum, T. soudanense and T. violaceum are currently accepted as separate species. However, the status of certain variants, including the T. rubrum morphotypes megninii and kuryangei and the T. violaceum morphotype yaoundei, remains to be deciphered. We conducted the first phylogenomic analysis of the T. rubrum species complex by studying 3105 core genes of 18 new strains from the BCCM/IHEM culture collection and nine publicly available genomes. Our analyses revealed a highly resolved phylogenomic tree with six separate clades. Trichophyton rubrum, T. violaceum and T. soudanense were confirmed in their species status. The morphotypes T. megninii, T. kuryangei and T. yaoundei each grouped in their own clade with high support, suggesting that these morphotypes should be reinstated at the species level. Robinson-Foulds distance analyses showed that a combination of two markers (a ubiquitin-protein transferase and a MYB DNA-binding domain-containing protein) can mirror the phylogeny obtained using genomic data, and thus represent potential new markers to accurately distinguish the species belonging to the T. rubrum complex.
https://doi.org/10.1099/mgen.0.000707
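The Robinson-Foulds distance used above counts the bipartitions (splits) present in one unrooted tree but not in the other, so a distance of 0 means a marker tree has exactly the same topology as the phylogenomic tree. The following is a minimal sketch of the unnormalized distance, assuming both trees are given as nested tuples over the same taxa; the helper names are illustrative, and dedicated phylogenetics libraries provide tested implementations.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# A tree is either a leaf name or a tuple of child subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def _collect(node: Tree, clades: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Return the leaves below `node`, recording every internal node's leaf set."""
    if isinstance(node, str):
        return frozenset([node])
    below = frozenset().union(*(_collect(child, clades) for child in node))
    clades.append(below)
    return below


def splits(tree: Tree) -> Tuple[FrozenSet[str], Set[FrozenSet[str]]]:
    """Taxon set and non-trivial splits of a tree, ignoring where it is rooted."""
    clades: List[FrozenSet[str]] = []
    taxa = _collect(tree, clades)
    anchor = min(taxa)  # write each split as the side that excludes this taxon
    n = len(taxa)
    result: Set[FrozenSet[str]] = set()
    for clade in clades:
        side = taxa - clade if anchor in clade else clade
        if 2 <= len(side) <= n - 2:  # both sides need >= 2 taxa to be informative
            result.add(side)
    return taxa, result


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of splits found in exactly one of the two trees."""
    taxa1, s1 = splits(t1)
    taxa2, s2 = splits(t2)
    if taxa1 != taxa2:
        raise ValueError("trees must share the same taxa")
    return len(s1 ^ s2)
```

Because splits are compared rather than clades, `(("A","B"),("C",("D","E")))` and its rerooted form `("A","B",("C",("D","E")))` are at distance 0, whereas swapping B and C gives a distance of 2.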

Contamination in Reference Sequence Databases: Time for Divide-and-Rule Tactics

Valérian Lupo, Mick Van Vlierberghe, Hervé Vanderschuren, Frédéric Kerff, Denis Baurain and Luc Cornet
Contamination of public genome databases is a pervasive issue with potentially far-reaching consequences. This problem has attracted much attention in the recent literature, and many different tools are now available to detect contaminants. Although these methods are based on diverse algorithms that can sometimes produce widely different estimates of the contamination level, the majority of genomic studies rely on a single detection method, which represents a risk of systematic error. In this work, we used two orthogonal methods to assess the level of contamination among the bacterial genomes of the National Center for Biotechnology Information Reference Sequence Database (RefSeq). First, we applied the most popular solution, CheckM, which is based on marker genes. We then complemented this approach with a genome-wide method, termed Physeter, which now implements a k-fold algorithm to avoid inaccurate detection due to potential contamination of the reference database. We demonstrate that CheckM cannot currently be applied to all available genomes and bacterial groups. While it performed well on the majority of RefSeq genomes, it produced dubious results for 12,326 organisms. Among those, Physeter identified 239 contaminated genomes that had been missed by CheckM. In conclusion, we emphasize the importance of using multiple detection methods, while providing an upgrade of our own detection tool, Physeter, which minimizes incorrect contamination estimates in the context of unavoidably contaminated reference databases.
https://doi.org/10.3389/fmicb.2021.755101
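The abstract does not detail how Physeter's k-fold scheme works, so the sketch below only illustrates the general idea under stated assumptions: the reference database is shuffled and split into k folds, a contamination estimate is computed against each fold separately, and the median is reported. A contaminated reference genome then sits in only one fold and can sway at most one of the k estimates. The function name and the `estimator` callable are hypothetical stand-ins for a real genome-wide classifier.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def kfold_estimate(
    references: Sequence[R],
    estimator: Callable[[List[R]], float],
    k: int = 5,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Median of per-fold contamination estimates, plus the individual estimates.

    `estimator` is a hypothetical callable that scores the query genome against
    one subset of the reference database and returns a contamination level.
    """
    if not 2 <= k <= len(references):
        raise ValueError("k must be between 2 and the number of references")
    shuffled = list(references)
    random.Random(seed).shuffle(shuffled)  # reproducible assignment to folds
    folds = [shuffled[i::k] for i in range(k)]  # k disjoint, near-equal folds
    estimates = [estimator(fold) for fold in folds]
    return statistics.median(estimates), estimates
```

As a toy usage example, with twenty clean references plus one contaminated reference that inflates the estimate from 0.02 to 0.5 whenever it is present, only one fold is affected and the median stays at 0.02.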

ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies

Raphaël R. Léonard, Marie Leleu, Mick Van Vlierberghe, Luc Cornet, Frédéric Kerff, Denis Baurain
TQMD is a tool for high-performance computing clusters that downloads, stores and produces lists of dereplicated prokaryotic genomes. It was developed to cope with the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. It relies on word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy to remain both efficient and scalable. We studied the performance of TQMD by assessing the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results show that TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), unlike the other dereplication tools, but that it also works at lower taxonomic levels (species/strain), as the other tools do. TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd
https://peerj.com/articles/11348/
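The three ingredients named in this abstract (k-mer profiles, single-linkage clustering and divide-and-conquer) can be combined as in the following toy sketch, which is not TQMD's implementation. Assumptions: genomes are plain strings, similarity is the Jaccard index of k-mer sets, any pair above the threshold merges two clusters (single linkage, implemented with a union-find structure), and the representative of each cluster is simply the longest genome, which is a hypothetical criterion. Genomes are processed in packs, and the surviving representatives are re-packed until the list fits in one pack or stops shrinking.

```python
from itertools import combinations
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Set of all overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dereplicate(names: List[str], profiles: Dict[str, Set[str]],
                lengths: Dict[str, int], threshold: float) -> List[str]:
    """Single-linkage clustering at `threshold`; one representative per cluster."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # any similar pair links the two clusters
    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Hypothetical representative choice: longest genome, ties broken by name.
    return sorted(min(c, key=lambda n: (-lengths[n], n)) for c in clusters.values())


def divide_and_conquer(genomes: Dict[str, str], k: int = 4,
                       threshold: float = 0.5, pack_size: int = 100) -> List[str]:
    profiles = {n: kmers(s, k) for n, s in genomes.items()}
    lengths = {n: len(s) for n, s in genomes.items()}
    current = sorted(genomes)
    while len(current) > pack_size:
        packs = [current[i:i + pack_size] for i in range(0, len(current), pack_size)]
        reduced = sorted(r for p in packs
                         for r in dereplicate(p, profiles, lengths, threshold))
        if len(reduced) == len(current):
            break  # no further reduction within packs
        current = reduced
    return dereplicate(current, profiles, lengths, threshold)
```

Real genomes would call for much larger k values and a faster similarity estimate than exact set comparison; the small k here only keeps the example readable.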