Login and look around

First we need to log in to a “submission node” using:

ssh ${USER}@hpc.ac.uk

After logging in we land in our home directory; let's take a moment to look around.
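A few standard commands for getting oriented (nothing cluster-specific; exact quota reporting varies by site):

```shell
# Print the current directory (we start in $HOME after login)
pwd
# List the contents, including hidden configuration files
ls -lah "$HOME"
# Check available space on the home filesystem
df -h "$HOME"
```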

Create a conda environment

To check our Micromamba installation, we will create a new environment:

# Log in to the software node
ssh software

# Create the new environment with votuderep, available from Bioconda
micromamba create -n tutorial votuderep

Staying on the software node, we will download a training dataset using votuderep.

# Check the options
micromamba run -n tutorial votuderep trainingdata --help

# Run the program
micromamba run -n tutorial votuderep trainingdata -o ~/tutorial/

⚠️ If the download fails but we received at least the first file, we can move on and skip the fastp part.
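To see what actually arrived, list the output directory (the path matches the -o flag used above; the guard just prints a hint if the download never started):

```shell
# List the downloaded training files
ls -lh ~/tutorial/ 2>/dev/null || echo "~/tutorial/ not found - rerun the download"
```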

Running jobs and choosing software

Return to the login node:

logout

Sequence statistics

We will use seqfu stats to gather statistics about the FASTA file we downloaded.

The package is available through lmod after configuring it with the QIB Core Bioinformatics tool.

# Load Seqfu from lmod
module avail seqfu
module load seqfu

# Fallback method
source /nbi/software/testing/bin/seqfu__1.22.0

To compute the statistics we will submit a job. Let's use the configurator to produce a script like this:

#!/bin/bash
#SBATCH --job-name=assembly-stats
#SBATCH --output=%x-%j.out
#SBATCH --partition=qib-short
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=0-02:00:00
#SBATCH --mail-type=BEGIN,END,FAIL

# Job information
echo "Job started at: $(date)"
echo "Running on: $(hostname)"
echo "Job ID: $SLURM_JOB_ID"
echo ""
# Change to submission directory
cd "$SLURM_SUBMIT_DIR"

# Environment setup
# load seqfu
module load seqfu

# Main commands
# check stats
seqfu stats human_gut_assembly.fa.gz > stats.tsv

echo ""
echo "Job completed at: $(date)"

❓ How can you check the status of the job?

💡 Verify with ls and cat that you produced the files you expect

Running Kraken2

cd ~/tutorial

# Here we use the $DATABASES shortcut from bashrc
export KRAKEN2_DEFAULT_DB=$DATABASES/kraken2/benlangmead/k2_standard_20230314/

We will again use the lmod software catalogue:

# Load kraken2
module avail kraken
module load kraken2

This time we will submit the job with a little help from the nbi-slurm wrapper:

# Use "nbi-slurm" helper to launch the job
runjob -c 16 -m 128G -w logs -run -n mykraken-1 \
  "kraken2 --threads 16 --memory-mapping --paired --report ERR6797443.tsv reads/ERR6797443_R{1,2}* > /dev/null"

# Check if running
lsjobs
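For reference, this is roughly the plain sbatch call that the runjob helper generates; nbi-slurm is a wrapper around sbatch, and the flag mapping here is an educated guess rather than its exact output:

```shell
# Approximate plain-SLURM equivalent of the runjob command above
sbatch --cpus-per-task=16 --mem=128G --job-name=mykraken-1 --output=logs/%x-%j.out \
  --wrap "kraken2 --threads 16 --memory-mapping --paired --report ERR6797443.tsv reads/ERR6797443_R{1,2}* > /dev/null"
```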

Trying a Singularity container
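A minimal sketch of pulling and running a container with Singularity; the ubuntu image is only an example target, not part of the tutorial data:

```shell
# Pull a public image from Docker Hub; this produces ubuntu_22.04.sif in the current directory
singularity pull docker://ubuntu:22.04
# Execute a command inside the container
singularity exec ubuntu_22.04.sif cat /etc/os-release
```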

Cleanup

Remember to delete your files!
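For example, assuming the ~/tutorial working directory and the output names used throughout this tutorial (double-check the paths before deleting):

```shell
# Remove the tutorial working directory
rm -rf ~/tutorial
# Remove the SLURM logs and result tables produced by the jobs above
rm -f assembly-stats-*.out stats.tsv ERR6797443.tsv
rm -rf logs
```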


Next submodule: