Skip to content
Snippets Groups Projects
README.md 6.2 KiB
Newer Older
Joanna Fourquet's avatar
Joanna Fourquet committed
# **metagWGS**
Celine Noirot's avatar
Celine Noirot committed

Joanna Fourquet's avatar
Joanna Fourquet committed
# Introduction

Joanna Fourquet's avatar
Joanna Fourquet committed
**metagWGS** is a [Nextflow](https://www.nextflow.io/docs/latest/index.html#) bioinformatics analysis pipeline used for **metag**enomic **W**hole **G**enome **S**hotgun sequencing data (Illumina HiSeq3000 or NovaSeq, paired, 2\*150bp).

## Pipeline graphical representation
The workflow processes raw data from `.fastq` or `.fastq.gz` inputs and do the modules represented into this figure:
![](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/raw/dev/docs/Pipeline.png)

## metagWGS steps
Joanna Fourquet's avatar
Joanna Fourquet committed

metagWGS is splitted into different steps, corresponding to different part of bioinformatics analysis :

Joanna Fourquet's avatar
Joanna Fourquet committed
* `clean_qc` (cyan module, can ke skipped)
Joanna Fourquet's avatar
Joanna Fourquet committed
   * trims adapters sequences and deletes low quality reads ([Cutadapt](https://cutadapt.readthedocs.io/en/stable/#), [Sickle](https://github.com/najoshi/sickle))
   * suppresses host contaminants ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
   * controls the quality of raw and cleaned data ([FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
   * makes a taxonomic classification of cleaned reads ([Kaiju MEM](https://github.com/bioinformatics-centre/kaiju) + [kronaTools](https://github.com/marbl/Krona/wiki/KronaTools) + [Generate_barplot_kaiju.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Generate_barplot_kaiju.py) + [merge_kaiju_results.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_kaiju_results.py))
* `assembly` (purple module)
Joanna Fourquet's avatar
Joanna Fourquet committed
   * assembles cleaned reads (combined with `clean_qc` step) or raw reads (combined with `--skip_clean_qc` parameter) ([metaSPAdes](https://github.com/ablab/spades) or [Megahit](https://github.com/voutcn/megahit))
Joanna Fourquet's avatar
Joanna Fourquet committed
   * assess the quality of assembly ([metaQUAST](http://quast.sourceforge.net/metaquast))
Joanna Fourquet's avatar
Joanna Fourquet committed
   * deduplicates cleaned reads (combined with `clean_qc` step) or raw reads (combined with `--skip_clean_qc` parameter) ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/) + [Bedtools](https://bedtools.readthedocs.io/en/latest/))
* `filtering` (orange module, can be skipped)
Joanna Fourquet's avatar
Joanna Fourquet committed
   * filter contigs with low CPM value ([Filter_contig_per_cpm.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Filter_contig_per_cpm.py) + [metaQUAST](http://quast.sourceforge.net/metaquast))
* `structural` (yellow module - part 1)
   * structural annotation of genes ([Prokka](https://github.com/tseemann/prokka) + [Rename_contigs_and_genes.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Rename_contigs_and_genes.py))
* `alignment` (yellow module - part 2)
   * alignment of reads on contigs ([BWA](http://bio-bwa.sourceforge.net/) + [Samtools](http://www.htslib.org/))
   * alignment of genes against protein database ([DIAMOND](https://github.com/bbuchfink/diamond))
* `functional` (blue module)
   * sample and global clustering of genes ([cd-hit-est](http://weizhongli-lab.org/cd-hit/) + [cd_hit_produce_table_clstr.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/cd_hit_produce_table_clstr.py))
   * quantification of reads on genes ([featureCounts](http://subread.sourceforge.net/) + [Quantification_clusters.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/Quantification_clusters.py))
   * functional annotation of genes and quantification of reads by function ([eggNOG-mapper](http://eggnog-mapper.embl.de/) + [best_bitscore_diamond.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/best_bitscore_diamond.py) + [merge_abundance_and_functional_annotations.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_abundance_and_functional_annotations.py) + [quantification_by_functional_annotation.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_functional_annotation.py))
* `taxonomic` (brown module)
   * taxonomic affiliation of contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
   * quantification by species of the number of reads and contigs in each sample ([merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* `binning` from [nf-core/mag](https://github.com/nf-core/mag) (green module)
   * binning of contigs ([metabat2](https://bitbucket.org/berkeleylab/metabat/src/master/))
   * assess bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
   * taxonomic affiliation of bins ([BAT](https://github.com/dutilh/CAT))

A single report html file is generated at the end of the workflow with [MultiQC](https://multiqc.info/).
Joanna Fourquet's avatar
Joanna Fourquet committed
The pipeline is built using [Nextflow,](https://www.nextflow.io/docs/latest/index.html#) a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Joanna Fourquet's avatar
Joanna Fourquet committed
Two [Singularity](https://sylabs.io/docs/) containers are available making installation trivial and results highly reproducible.
Joanna Fourquet's avatar
Joanna Fourquet committed
# Documentation

metagWGS documentation is available [here](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/tree/dev/docs).

Joanna Fourquet's avatar
Joanna Fourquet committed
# License
Joanna Fourquet's avatar
Joanna Fourquet committed

Joanna Fourquet's avatar
Joanna Fourquet committed
metagWGS is distributed under the GNU General Public License v3.
Joanna Fourquet's avatar
Joanna Fourquet committed

Joanna Fourquet's avatar
Joanna Fourquet committed
# Copyright
Joanna Fourquet's avatar
Joanna Fourquet committed

Joanna Fourquet's avatar
Joanna Fourquet committed
2021 INRAE
Joanna Fourquet's avatar
Joanna Fourquet committed

Joanna Fourquet's avatar
Joanna Fourquet committed
# Citation
Joanna Fourquet's avatar
Joanna Fourquet committed

metagWGS has been presented at JOBIM 2020:
Poster "Whole metagenome analysis with metagWGS", J. Fourquet, C. Noirot, C. Klopp, P. Pinton, S. Combes, C. Hoede, G. Pascal.
https://www.sfbi.fr/sites/sfbi.fr/files/jobim/jobim2020/posters/compressed/jobim2020_poster_9.pdf

Joanna Fourquet's avatar
Joanna Fourquet committed
metagWGS has been presented at JOBIM 2019 and at Genotoul Biostat Bioinfo day:
Poster "Whole metagenome analysis with metagWGS", J. Fourquet, A. Chaubet, H. Chiapello, C. Gaspin, M. Haenni, C. Klopp, A. Lupo, J. Mainguy, C. Noirot, T. Rochegue, M. Zytnicki, T. Ferry, C. Hoede.