dadaist2
dadaist2 - a shell wrapper for DADA2 that detects representative sequences and generates a feature table from Illumina paired-end reads. It is the main program of the dadaist2 toolkit, which includes several wrappers and utilities to streamline the analysis of metabarcoding reads from the Linux shell to R.
Author
Andrea Telatin andrea.telatin@quadram.ac.uk
Synopsis
dadaist2 [options] -i INPUT_DIR -o OUTPUT_DIR
Parameters
Main Parameters
-
-i, --input-directory DIRECTORY
Directory containing the paired-end files in FASTQ format, gzipped or not.
-
-o, --output-directory DIRECTORY
Output directory (will be created). Using a new directory for each run is recommended.
-
-d, --database DATABASE
Reference database in gzipped FASTA format. Optional (default: skip) but highly recommended.
-
-m, --metadata FILE
Metadata file in TSV format; the first column must match the sample IDs. If not supplied, a template will be autogenerated using dadaist2-metadata.
-
-t, --threads INT
Number of threads (default: 2)
-
--primers FOR:REV
Strip primers with cutadapt; supply both sequences separated by a colon.
-
-j, --just-concat
Do not try merging paired-end reads, just concatenate.
-
--fastp
Perform the legacy "fastp" QC
-
--no-trim
Do not trim primers (using fastp). Equivalent to --trim-primer-for 0 --trim-primer-rev 0.
-
--force
Will overwrite output folder if it already exists, and will attempt to produce MicrobiomeAnalyst and Rhea folders even when DADA2 filters too many reads.
-
--dada-pool
Pool samples in DADA2 analysis (experimental)
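As a rough illustration of how the main parameters combine, the sketch below prints a typical command line instead of running it. The helper name build_dadaist2_cmd, the directory names, the database path, and the primer pair (the commonly used 515F/806R 16S V4 primers) are assumptions made for this example, not defaults of the program.

```shell
# Hypothetical dry-run helper (not part of dadaist2): print a command line
# built from the main parameters. Paths without spaces are assumed.
# $1 = input dir, $2 = output dir, $3 = database, $4 = threads, $5 = FOR:REV primers
build_dadaist2_cmd() {
    printf 'dadaist2 -i %s -o %s -d %s -t %s --primers %s\n' "$1" "$2" "$3" "$4" "$5"
}

# Illustrative values only: adjust every argument to your own data.
build_dadaist2_cmd reads dadaist2-out refs/reference.fa.gz 8 \
    "GTGCCAGCMGCCGCGGTAA:GGACTACHVGGGTWTCTAAT"
```

Printing the command first makes it easy to check the paths and the colon-separated primer pair before starting a long run; copy the printed line into the shell to execute it.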
Input reads
We recommend preparing a clean directory of input reads with filenames like Samplename_R1.fastq.gz and Samplename_R2.fastq.gz.
Filenames starting with a number are not accepted.
-
-1, --for-tag TAG
Tag to recognize forward reads (default: _R1)
-
-2, --rev-tag TAG
Tag to recognize reverse reads (default: _R2)
-
-s, --id-separator CHAR
Character separating the sample name from the rest of the filename (default: _)
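The naming rules above can be checked before a run with a few lines of shell. This is a minimal sketch, assuming the sample ID is everything before the first occurrence of the separator; the helper names (sample_id, has_tag, valid_name) and the example filenames are hypothetical and not part of dadaist2.

```shell
# Hypothetical helpers illustrating the naming rules; not part of dadaist2.

# Print the sample ID: the part of the file name before the first separator.
# $1 = file name, $2 = separator (default: _)
sample_id() {
    file=$(basename "$1")
    sep=${2:-_}
    id=${file%%"$sep"*}
    printf '%s\n' "$id"
}

# Succeed if the file name contains the given tag (e.g. _R1 or _R2).
has_tag() {
    case $(basename "$1") in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Succeed if the file name does not start with a digit.
valid_name() {
    case $(basename "$1") in
        [0-9]*) return 1 ;;
        *)      return 0 ;;
    esac
}

# Classify a few example names (the files do not need to exist).
for f in reads/Gut1_R1.fastq.gz reads/Gut1_R2.fastq.gz reads/1Gut_R1.fastq.gz; do
    if ! valid_name "$f"; then
        echo "$f: rejected (name starts with a digit)"
    elif has_tag "$f" _R1; then
        echo "$f: sample $(sample_id "$f"), forward"
    elif has_tag "$f" _R2; then
        echo "$f: sample $(sample_id "$f"), reverse"
    fi
done
```

With the default tags and separator, Gut1_R1.fastq.gz and Gut1_R2.fastq.gz both map to the sample Gut1, while 1Gut_R1.fastq.gz is flagged because it starts with a digit.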
Metabarcoding processing
-
--trunc-len-1 and --trunc-len-2 INT
Position at which to truncate reads (forward and reverse, respectively).
-
-q, --min-qual FLOAT
Minimum average quality for DADA2 truncation (default: 28)
-
--no-trunc
Do not truncate reads at the end (required for non-overlapping amplicons, like ITS)
-
--maxee1 and --maxee2 FLOAT
Maximum Expected Errors in R1 and R2, respectively (default: 1.0 and 1.5)
-
--trunc-qual FLOAT
DADA2 truncate quality (default: 10)
-
-s1, --trim-primer-for INT
Trim the primer from the R1 read by specifying the number of bases to remove. Similarly, use -s2 (--trim-primer-rev) to remove the leading bases from the reverse read (R2). Default: 20 bases on each side.
-
--save-rds
Save a copy of the RDS file (default: off)
-
--max-loss FLOAT
After the DADA2 run, check the overall fraction of input reads that remain at the non-chimeric stage, and abort if the ratio is below the threshold (default: 0.2)
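The --max-loss check amounts to a simple ratio test. The sketch below reproduces that arithmetic outside the program, for example to sanity-check read counts from a previous run; the helper name check_max_loss and the read counts are hypothetical, and the program's own implementation may differ in detail.

```shell
# Hypothetical helper (not part of dadaist2): succeed if the fraction of reads
# surviving to the non-chimeric stage is at or above the threshold.
# $1 = input reads, $2 = non-chimeric reads, $3 = threshold (default: 0.2)
check_max_loss() {
    awk -v in_reads="$1" -v kept="$2" -v min="${3:-0.2}" 'BEGIN {
        if (in_reads <= 0) exit 2            # no input reads to compare against
        ratio = kept / in_reads
        printf "ratio=%.3f threshold=%s\n", ratio, min
        exit (ratio < min ? 1 : 0)           # below threshold: fail
    }'
}

# Illustrative counts: 45% of reads retained passes, 18% does not.
check_max_loss 100000 45000 && echo "ok"
check_max_loss 100000 18000 || echo "would abort: too many reads lost"
```

A ratio well below the threshold often points to upstream settings such as truncation lengths or primer trimming rather than to the data itself, so it is worth reviewing those options before lowering --max-loss.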
Other parameters
-
--crosstalk
Remove crosstalk using the UNCROSS2 algorithm, described at https://doi.org/10.1101/400762.
-
-p, --prefix STRING
Prefix for the output FASTA file. If "MD5" is specified, the sequence MD5 hash will be used instead. Default is "ASV".
-
-l, --log-file FILE
Filename for the program log.
-
--tmp-dir DIR
Where to place the temporary directory (default: the system temp dir or $TMPDIR).
-
--skip-tree
Do not generate the tree. Experimental, not recommended.
-
--skip-plots
Do not generate quality plots.
-
--popup
Display popup notifications (tested on macOS and Ubuntu)
-
--quiet
Reduce verbosity
-
--verbose and --debug
Increase reported information
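To give an idea of what MD5-based naming looks like, the sketch below hashes a sequence string and uses the digest as its FASTA header. It assumes that the hash is computed on the plain sequence text and that md5sum (GNU coreutils) is available; the helper name seq_md5 and the example sequence are hypothetical, and the exact input the program hashes may differ.

```shell
# Hypothetical helper (not part of dadaist2): print the MD5 hex digest of a
# sequence string. Assumes the md5sum command is available.
seq_md5() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Illustrative sequence only; emit it as a FASTA record named by its digest.
seq="TACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAG"
printf '>%s\n%s\n' "$(seq_md5 "$seq")" "$seq"
```

Hash-based names depend only on the sequence itself, so the same sequence receives the same identifier in every run, whereas numbered names such as ASV1 depend on the order within a single run.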
Source code and documentation
The program is freely available at https://quadram-institute-bioscience.github.io/dadaist2 and is released under the MIT licence. The website contains the full documentation, and we recommend checking it for updates.
The paper describing Dadaist2 was published in:
Ansorge, R.; Birolo, G.; James, S.A.; Telatin, A. Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments. Int. J. Mol. Sci. 2021, 22, 5309. https://doi.org/10.3390/ijms22105309