dadaist2-mergeseqs

This tool merges the two paired end denoised sequences as they appear in a DADA2 feature table when asking DADA2 not to join the reads.

Synopsis

 Combine pairs in DADA2 unmerged tables

 Usage: 
 dadaist2-mergeseqs [options] -i dada2.tsv 

 Options:
   -i, --input-file FILE      FASTA or FASTQ file
   -f, --fasta FILE           Write new sequences to FASTA
   -p, --pair-spacer STRING   Pairs separator [default: NNNNNNNNNN]
   -s, --strip STRING         Remove this string from sample names
   -n, --seq-name STRING      Sequence string name [default: MD5]
   -m, --max-mismatches INT   Maximum allowed mismatches [default: 0]
   --id STRING                Features column name [default: #OTU ID]
   --verbose                  Print verbose output

Input

The TSV table produced by DADA2 (first column with the actual representative sequence, each following column with the counts per sample). Here a truncated example:

#OTU ID       A01_R1.fastq.gz A02_R1.fastq.gz F99_R1.fastq.gz
GGAATTTTG[..]GGGCTTAACCTNNNNNNNNNNGCGCTTA [..]AAACAG  1263    1544    1341
GGAATCTTC[..]TTGGCTCAACCNNNNNNNNNNAGCGCAGG[..]AAACAG  100     21      19
GGAATTTTGG[..]CTTAACCTNNNNNNNNNNGCGCAGGCGG[..]AAACAG  490 24  296

A stretch of Ns separates the denoised _R1 sequence and the denoise _R2.

Output

The output is a similar table after joining the reads, when possible. If using --verbose a summary will be printed to the standard error at the end.

Total:8;Split:8;Joined:8