dadaist2

dadaist2 - a shell wrapper for DADA2 that detects representative sequences and generates a feature table from Illumina paired-end reads. This is the main program of the dadaist2 toolkit, which includes several wrappers and utilities to streamline the analysis of metabarcoding reads, from the Linux shell to R.

Author

Andrea Telatin andrea.telatin@quadram.ac.uk

Synopsis

dadaist2 [options] -i INPUT_DIR -o OUTPUT_DIR

Parameters

Main Parameters

-i, --input-directory DIRECTORY

Directory containing the paired end files in FASTQ format, gzipped or not.
-o, --output-directory DIRECTORY

Output directory (will be created). We recommend using a new directory for each run.
-d, --database DATABASE

Reference database in gzipped FASTA format. Optional (default: skip) but highly recommended.
-m, --metadata FILE

Metadata file in TSV format; the first column must match the sample IDs. If not supplied, a template will be autogenerated using dadaist2-metadata.
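Because a mismatch between the metadata and the sample names is a common source of errors, it can help to check the file before a run. The following is a minimal sketch, not dadaist2's own validation code: the function name `check_metadata` is hypothetical, and it assumes the first row of the TSV is a header line.

```python
import csv
import io


def check_metadata(tsv_text: str, sample_ids: set[str]) -> tuple[set[str], set[str]]:
    """Compare the first column of a metadata TSV with the sample IDs.

    Hypothetical helper; assumes the first row is a header line.
    Returns (samples missing from the metadata, metadata rows with no sample).
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    # Skip the header row and blank lines; keep only the first column.
    meta_ids = {row[0] for row in rows[1:] if row}
    missing = sample_ids - meta_ids  # samples without a metadata row
    extra = meta_ids - sample_ids    # metadata rows without a matching sample
    return missing, extra


if __name__ == "__main__":
    metadata = "#SampleID\tTreatment\nS1\tcontrol\nS2\tdiet\n"
    missing, extra = check_metadata(metadata, {"S1", "S2", "S3"})
    print("missing:", sorted(missing), "extra:", sorted(extra))
```

Both sets should be empty before starting the analysis.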
-t, --threads INT

Number of threads (default: 2)
--primers FOR:REV

Strip primers with cutadapt; supply both sequences separated by a colon.
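The FOR:REV value maps naturally onto cutadapt's paired-end mode, where `-g` and `-G` give the 5' sequences to remove from R1 and R2, and `-o`/`-p` name the two output files. The sketch below only illustrates that mapping; the exact cutadapt options dadaist2 passes are not documented here, and the helper `cutadapt_args` is hypothetical.

```python
IUPAC = set("ACGTURYSWKMBDHVN")  # nucleotide codes, including ambiguity codes


def cutadapt_args(primers: str, r1: str, r2: str, out1: str, out2: str) -> list[str]:
    """Split a FOR:REV string and build a paired-end cutadapt command line.

    Hypothetical helper: dadaist2 may add further cutadapt options.
    """
    parts = primers.split(":")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected exactly two primers separated by ':'")
    fwd, rev = (p.upper() for p in parts)
    for seq in (fwd, rev):
        if not set(seq) <= IUPAC:
            raise ValueError(f"non-IUPAC characters in primer {seq!r}")
    # -g/-G: 5' sequences removed from R1/R2; -o/-p: output files for R1/R2.
    return ["cutadapt", "-g", fwd, "-G", rev, "-o", out1, "-p", out2, r1, r2]
```

For example, the 16S V4 primers 515F/806R would be supplied as `--primers GTGYCAGCMGCCGCGGTAA:GGACTACNVGGGTWTCTAAT`.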
-j, --just-concat

Do not try merging paired end reads, just concatenate.
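In DADA2, `mergePairs(justConcatenate = TRUE)` joins the forward read and the reverse-complemented reverse read with a spacer of 10 Ns instead of overlapping them. Assuming this option maps to that behaviour, the resulting sequence can be sketched as follows (the function names are illustrative only):

```python
# Complement table for unambiguous bases; other IUPAC codes are left unchanged
# by str.translate, so this sketch is only valid for A/C/G/T/N sequences.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def just_concat(fwd: str, rev: str, spacer: int = 10) -> str:
    """Forward read + N spacer + reverse complement of the reverse read."""
    return fwd + "N" * spacer + revcomp(rev)


if __name__ == "__main__":
    print(just_concat("ACGT", "AACC"))  # ACGTNNNNNNNNNNGGTT
```

This is useful when the amplicon is longer than the combined read length, so the pairs cannot overlap.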
--fastp

Perform the legacy "fastp" QC
--no-trim

Do not trim primers (using fastp). Equivalent to --trim-primer-for 0 --trim-primer-rev 0.
--force

Overwrite the output folder if it already exists, and attempt to produce the MicrobiomeAnalyst and Rhea folders even when DADA2 filters out too many reads.
--dada-pool

Pool samples in DADA2 analysis (experimental)

Input reads

We recommend preparing a clean directory of input reads with filenames like Samplename_R1.fastq.gz and Samplename_R2.fastq.gz.

Filenames starting with a number are not accepted.

-1, --for-tag TAG

Tag to recognize forward reads (default: _R1)
-2, --rev-tag TAG

Tag to recognize reverse reads (default: _R2)
-s, --id-separator CHAR

Character separating the sample name from the rest of the filename (default: _)
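The naming rules above can be checked before launching a run. This is a minimal sketch under stated assumptions, not dadaist2's own parser: the hypothetical `pair_reads` helper takes the sample name as the text before the first separator character, assigns each FASTQ file to R1 or R2 by searching for the tags, and rejects filenames that start with a digit.

```python
import re

# .fq, .fastq, .fq.gz or .fastq.gz
FASTQ_EXT = re.compile(r"\.f(ast)?q(\.gz)?$")


def pair_reads(filenames, for_tag="_R1", rev_tag="_R2", sep="_"):
    """Group FASTQ filenames into {sample: (R1 file, R2 file)}.

    Hypothetical helper: assumes the sample name ends at the first `sep`.
    """
    pairs: dict[str, dict[str, str]] = {}
    for name in filenames:
        if not FASTQ_EXT.search(name):
            continue  # skip non-FASTQ files
        if name[0].isdigit():
            raise ValueError(f"{name}: filenames starting with a number are not accepted")
        if for_tag in name:
            strand = "R1"
        elif rev_tag in name:
            strand = "R2"
        else:
            continue  # neither tag present
        sample = name.split(sep, 1)[0]
        pairs.setdefault(sample, {})[strand] = name
    incomplete = sorted(s for s, p in pairs.items() if len(p) != 2)
    if incomplete:
        raise ValueError(f"samples without both reads: {incomplete}")
    return {s: (p["R1"], p["R2"]) for s, p in sorted(pairs.items())}
```

Note that with the default separator (_), the sample name itself cannot contain underscores.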

Metabarcoding processing

--trunc-len-1 and --trunc-len-2 INT

Position at which to truncate reads (forward and reverse, respectively).
-q, --min-qual FLOAT

Minimum average quality for DADA2 truncation (default: 28)
--no-trunc

Do not truncate reads at the end (required for non-overlapping amplicons, like ITS)
--maxee1 and --maxee2 FLOAT

Maximum expected errors in R1 and R2, respectively (default: 1.0 and 1.5)
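The expected errors of a read are the sum of its per-base error probabilities, where a Phred score Q corresponds to a probability of 10^(-Q/10). DADA2's filterAndTrim discards reads whose expected errors exceed maxEE. The sketch below assumes these options are passed to DADA2 as the R1 and R2 maxEE values, and that qualities are Phred+33 encoded; the function names are illustrative.

```python
def expected_errors(qual: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred+33 quality string."""
    return sum(10 ** (-(ord(c) - offset) / 10) for c in qual)


def passes_maxee(qual_r1: str, qual_r2: str, maxee1: float = 1.0, maxee2: float = 1.5) -> bool:
    """Keep a pair only if neither read exceeds its expected-error limit."""
    return expected_errors(qual_r1) <= maxee1 and expected_errors(qual_r2) <= maxee2
```

A single Q10 base already contributes 0.1 expected errors, while a Q40 base contributes only 0.0001, so a few low-quality positions dominate the total.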
--trunc-qual FLOAT

DADA2 truncation quality threshold (default: 10)
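DADA2's truncQ parameter truncates each read at the first base whose quality score is less than or equal to the threshold. Assuming --trunc-qual is passed to DADA2 as truncQ, its effect on a single read can be sketched like this (Phred+33 encoding assumed; the function name is illustrative):

```python
def trunc_at_quality(seq: str, qual: str, trunc_q: int = 10, offset: int = 33) -> tuple[str, str]:
    """Cut a read at the first base with quality <= trunc_q (truncQ-style)."""
    for i, c in enumerate(qual):
        if ord(c) - offset <= trunc_q:
            return seq[:i], qual[:i]
    return seq, qual  # no base at or below the threshold: read kept whole
```

Reads shortened this way may then fall below the truncation length and be discarded by the length filter.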
-s1, --trim-primer-for INT

Trim the primer from the R1 read by removing the specified number of bases. Similarly, use -s2 (--trim-primer-rev) to remove the leading bases from the reverse read (R2). Default: 20 bases each side.
--save-rds

Save a copy of the RDS file (default: off)
--max-loss FLOAT

After the DADA2 run, check the overall fraction of reads retained from the input to the non-chimeric step, and abort if this ratio is below the threshold (default: 0.2)
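The check described above compares totals summed across all samples, not per-sample values. A minimal sketch, assuming per-sample read counts for the input and non-chimeric steps are available (the helper `check_max_loss` and its arguments are hypothetical):

```python
def check_max_loss(input_reads: dict[str, int], nonchim_reads: dict[str, int],
                   max_loss: float = 0.2) -> float:
    """Return the global retained fraction; raise if it is below max_loss.

    Despite its name, max_loss acts as the minimum fraction of reads
    that must survive from input to the non-chimeric table.
    """
    total_in = sum(input_reads.values())
    total_out = sum(nonchim_reads.get(sample, 0) for sample in input_reads)
    ratio = total_out / total_in if total_in else 0.0
    if ratio < max_loss:
        raise RuntimeError(f"only {ratio:.1%} of reads retained (threshold {max_loss:.1%})")
    return ratio
```

A single poor sample will not trigger the check on its own if the other samples retain most of their reads.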

Other parameters

--crosstalk

Remove crosstalk using the UNCROSS2 algorithm as described here https://doi.org/10.1101/400762.
-p, --prefix STRING

Prefix for the sequence names in the output FASTA file; if "MD5" is specified, the MD5 hash of each sequence will be used instead. Default is "ASV".
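The two naming schemes can be illustrated as follows. This is a sketch under assumptions: numbering is taken to start at 1 without zero padding, and the hash is computed on the sequence string exactly as given. dadaist2's own formatting may differ, and the helper names are hypothetical.

```python
import hashlib


def name_sequences(seqs: list[str], prefix: str = "ASV") -> dict[str, str]:
    """Map each unique sequence to a name: prefix + index, or its MD5 hash."""
    names = {}
    for i, seq in enumerate(seqs, start=1):
        if prefix == "MD5":
            names[seq] = hashlib.md5(seq.encode("ascii")).hexdigest()
        else:
            names[seq] = f"{prefix}{i}"
    return names


def to_fasta(names: dict[str, str]) -> str:
    """Render the named sequences as FASTA records."""
    return "".join(f">{name}\n{seq}\n" for seq, name in names.items())
```

MD5 names are stable: the same sequence receives the same identifier in every run, which makes feature tables from different runs easier to compare.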
-l, --log-file FILE

Filename for the program log.
--tmp-dir DIR

Where to place the temporary directory (default: the system temporary directory or $TMPDIR).
--skip-tree

Do not generate the phylogenetic tree (experimental, not recommended).
--skip-plots

Do not generate quality plots.
--popup

Display popup notifications (tested on macOS and Ubuntu)
--quiet

Reduce verbosity
--verbose and --debug

Increase reported information

Source code and documentation

The program is freely available at https://quadram-institute-bioscience.github.io/dadaist2 and is released under the MIT licence. The website contains the full documentation, and we recommend checking it for updates.

The paper describing Dadaist2 was published in:

Ansorge, R.; Birolo, G.; James, S.A.; Telatin, A. Dadaist2: A Toolkit to Automate and Simplify Statistical Analysis and Plotting of Metabarcoding Experiments. Int. J. Mol. Sci. 2021, 22, 5309. https://doi.org/10.3390/ijms22105309