1 How to Run This Tutorial

Before diving into the analysis, you need two things: the raw sequencing reads and a working R environment with the required packages. This preliminary page covers both.

1.1 TL;DR: Minimal downloads

A subset of the reads, from zenodo.org/records/20082786
The SILVA 138.2 database formatted for DADA2, from zenodo.org/records/14169026
A script, hosted on GitHub, that installs the required packages
An R Markdown version of the tutorial, tested on the NBI HPC (source)

1.2 The dataset: Getting the raw reads

The dataset used throughout this tutorial is publicly available in the European Nucleotide Archive (ENA) under accession PRJEB112055.

1.2.1 Downloading via the ENA browser

Go to https://www.ebi.ac.uk/ena/browser/view/PRJEB112055
Click the “Download” icon and select “FASTQ files (FTP)” to get a TSV file listing the FTP URLs for every sample.
Use wget to download the files listed in the TSV:

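# Download every FASTQ file listed in the TSV (column 7 holds semicolon-separated URLs).
# The <( ... ) process substitution requires bash or zsh.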
wget -i <(awk -F'\t' 'NR>1 {split($7,a,";"); for(i in a) print a[i]}' filereport.tsv)
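The one-liner above assumes the FASTQ URLs sit in the seventh column of the report. As a hedged alternative, the sketch below looks the column up by its header name instead. The function name `ena_fastq_urls` is hypothetical, and both the column name `fastq_ftp` and the scheme-less form of the URLs are assumptions about how ENA file reports are usually laid out, so check them against your own filereport.tsv before relying on it.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print one FASTQ URL per line from an ENA file report.
# Assumptions: the TSV has a header row containing a column named "fastq_ftp",
# its entries are semicolon-separated, and they lack the "ftp://" scheme.
ena_fastq_urls() {
  awk -F'\t' '
    NR == 1 {
      # Locate the URL column by name rather than by position.
      for (i = 1; i <= NF; i++) if ($i == "fastq_ftp") col = i
      if (!col) { print "no fastq_ftp column found" > "/dev/stderr"; exit 1 }
      next
    }
    $col != "" {
      # Split paired-file entries and emit each URL with an explicit scheme.
      n = split($col, urls, ";")
      for (j = 1; j <= n; j++) print "ftp://" urls[j]
    }
  ' "$1"
}

# Usage (downloads into the current directory):
#   ena_fastq_urls filereport.tsv > urls.txt
#   wget --continue -i urls.txt
```

Writing the URLs to a file first makes it easy to inspect the list before starting a large download, and `--continue` lets an interrupted download resume.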

1.3 Running locally with RStudio

1.3.1 Install R and RStudio

Download and install R (version 4.3 or later; the latest release is preferred) from https://cran.r-project.org/
Download and install RStudio Desktop (free) from https://posit.co/download/rstudio-desktop/

1.3.2 Install required packages

Open RStudio and run the following once to install all dependencies:

# Install some tidyverse packages from CRAN
install.packages(c("tidyr","tibble","readr","ggplot2","patchwork"))

# Install the Bioconductor package manager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install DADA2 and phyloseq from Bioconductor
BiocManager::install(c("dada2", "phyloseq", "Biostrings"))

Installation can take several minutes the first time. Once it completes, check that DADA2 loads without errors:

library(dada2)

Version check

The tutorial was developed with dada2 ≥ 1.28, phyloseq ≥ 1.44, and tidyverse ≥ 2.0. If you see unexpected errors, check packageVersion("dada2") and update via BiocManager::install("dada2") if needed.
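For a quick check outside RStudio, the same comparison can be sketched from a terminal. This assumes `Rscript` is on the PATH and that `sort -V` (version sort, as in GNU coreutils) is available. The helper names `version_ge` and `check_r_pkg` are hypothetical, and the minimum versions are the ones quoted above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed when version $1 is at least version $2.
# Relies on `sort -V`, which orders strings as version numbers.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compare an installed R package against a minimum version.
check_r_pkg() {
  local pkg=$1 min=$2 have
  # packageVersion() errors (non-zero exit) when the package is missing.
  if ! have=$(Rscript -e "cat(as.character(packageVersion('$pkg')))" 2>/dev/null); then
    echo "$pkg: not installed (or Rscript not found)"
    return 1
  fi
  if version_ge "$have" "$min"; then
    echo "$pkg $have: OK (>= $min)"
  else
    echo "$pkg $have: older than $min"
    return 1
  fi
}

# Usage:
#   check_r_pkg dada2 1.28
#   check_r_pkg phyloseq 1.44
```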

1.4 Running on the NBI HPC

HPC account required

You need an active NBI HPC account to follow this route. If you do not have one, contact your line manager or the NBI IT helpdesk to request access.

The NBI HPC provides RStudio through the Open OnDemand web portal, so no command-line setup is required.

1.4.1 Step-by-step

Open your browser and go to https://ood.hpc.nbi.ac.uk/
Log in with your NBI HPC credentials.
In the top navigation bar click “Interactive Apps” and select RStudio.
Choose the compute resources you need (8 cores and 64 GB of RAM are enough for this tutorial) and click “Launch”.
Wait for the job to start, then click “Connect to RStudio Server”.
Run the installation script to install the dependencies.

1.4.2 Installing packages on the HPC

The shared R installation on the HPC does not include DADA2, phyloseq, or tidyverse by default. Run the installation commands below, which mirror the local setup but install the full tidyverse; R will install the packages into your personal library (e.g. ~/r/lib/):

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(c("dada2", "phyloseq", "Biostrings"))

install.packages("tidyverse")