Published metagenome studies usually deposit the raw shotgun sequencing data (FASTQ) in a public repository, such as the NCBI Sequence Read Archive (SRA) or the EBI European Nucleotide Archive (ENA).
You can install a command-line tool called “SRA Toolkit” to download FASTQ files from the SRA:
conda create -n getreads \
-c bioconda -c conda-forge \
sra-tools
For example, after activating the environment (conda activate getreads), download the reads from ERR2231572:
fasterq-dump --verbose --skip-technical --split-files \
--outdir "coffee-reads/" \
--threads 8 \
ERR2231572
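For a handful of runs, the same command can be wrapped in a plain shell loop. The sketch below is only an illustration: fetch_reads and the RUN variable are hypothetical names introduced here, not part of SRA Toolkit. By default it only prints the commands it would run, so you can check them before downloading anything.

```shell
# fetch_reads is a hypothetical helper for this sketch, not part of SRA Toolkit.
# RUN is also an assumption: when unset it falls back to "echo", so the
# commands are only printed; call with RUN="" to actually run fasterq-dump.
fetch_reads() {
    local acc
    for acc in "$@"; do
        ${RUN-echo} fasterq-dump --skip-technical --split-files \
            --outdir "coffee-reads/" \
            --threads 8 \
            "$acc"
    done
}

# Dry run: print one fasterq-dump command per accession.
fetch_reads ERR2231571 ERR2231572
```

To perform the downloads, call it as RUN="" fetch_reads ERR2231571 ERR2231572. Note that fasterq-dump writes uncompressed FASTQ files, so you may want to compress them afterwards.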
The nf-core consortium curates a pipeline called nf-core/fetchngs that downloads multiple accessions in parallel with ease.
If you don’t have Nextflow, install it first (it’s available from Conda, for example).
You will need to generate a list of IDs, one per line; let’s call it coffee_ids.csv, for example:
ERR2231567
ERR2231568
ERR2231569
ERR2231570
ERR2231571
ERR2231572
The file must not contain empty lines or stray spaces.
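One possible way to create the list and check it from the shell is sketched below. It assumes the six consecutive accessions shown above and a system with seq and grep available; adapt the range to your own study.

```shell
# Write the six run accessions used in this example, one per line.
printf 'ERR%d\n' $(seq 2231567 2231572) > coffee_ids.csv

# Sanity check: list any line that is empty or contains whitespace
# (this also catches Windows-style carriage returns).
if grep -n -E '^$|[[:space:]]' coffee_ids.csv; then
    echo "coffee_ids.csv has empty lines or stray spaces" >&2
else
    echo "coffee_ids.csv looks clean"
fi
```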
To run the pipeline you need a container system, either Docker or Apptainer (formerly Singularity). Assuming we use Docker:
nextflow run nf-core/fetchngs \
--input coffee_ids.csv \
--outdir coffee-reads \
-profile docker
In Nextflow, parameters starting with a double dash are passed to the pipeline, while those with a single dash (like -profile) are interpreted by Nextflow itself.
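The distinction also shows up if you prefer to keep the pipeline parameters in a file. As a minimal sketch, the double-dash parameters can be written to a YAML file (params.yaml is an arbitrary name chosen here) and passed with Nextflow's own single-dash -params-file option:

```shell
# Pipeline parameters (the double-dash options) become keys in a YAML file.
cat > params.yaml <<'EOF'
input: coffee_ids.csv
outdir: coffee-reads
EOF

# Single-dash options stay on the command line, as they belong to Nextflow:
#   nextflow run nf-core/fetchngs -params-file params.yaml -profile docker
```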
The output directory contains these subdirectories: