This page assumes you already know Slurm basics and just need the QIB/NBI-specific mapping: which partitions to use, what limits to expect, and the commands that explain “why is my job pending / slow”.

Mental model: where to do what

  • Login nodes (normally 4 behind a load balancer) are currently offline.
  • Use the software node (`ssh software` from a login node) only for:
    • editing job scripts
    • compiling software
    • downloading containers/data

Actual compute happens on compute nodes via Slurm partitions (queues).

QIB partitions: what to pick

Quick decision table

| Use case | Partition | Time limit |
| --- | --- | --- |
| quick tests / small jobs | qib-short | 02:00:00 |
| most “real” analyses | qib-medium | 2-00:00:00 |
| very long jobs | qib-long | unlimited MaxTime (default is long) |
| interactive debugging | qib-interactive | up to 90-00:00:00 |
| GPU workloads | qib-gpu | unlimited MaxTime (default 7-00:00:00) |

Hardware notes (from current node view)

  • QIB standard nodes (q512n*) are 84 CPUs, ~512 GB RAM.
  • Big-memory nodes exist:
    • q1024n1 ~1 TB RAM
    • q1536n* ~1.5 TB RAM (some are 192 CPUs)
    • q4096n2/q4096n3 ~4 TB RAM (128 CPUs)
  • GPU node: q2048n1 has 2× A100 (gres/gpu=2), ~2 TB RAM, 32 CPUs.
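Memory is the usual reason to land on one of the big-memory nodes: asking for more than a standard node's ~512 GB means only those machines can host the job. A minimal sketch of the relevant directives (the partition choice and memory figure are assumptions; check `scontrol show partition` for the real node lists and limits):

```shell
#!/usr/bin/env bash
#SBATCH --job-name=bigmem_example
#SBATCH --partition=qib-medium   # assumed; verify which partition contains the q1536n*/q4096n* nodes
#SBATCH --mem=900G               # above ~512 GB, only the big-memory nodes can satisfy this request
#SBATCH --cpus-per-task=16
#SBATCH --time=1-00:00:00

set -euo pipefail
mytool input.dat                 # hypothetical command, as elsewhere on this page
```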

NBI-wide partitions you might hear about

You may see these in documentation or in sinfo output:

  • nbi-short (2h), nbi-medium (2d), nbi-long (unlimited), nbi-interactive
  • nbi-download (max 14 days; default 7 days) — typically for controlled “download” from a limited list of whitelisted sites
  • nbi-compute — an “overlay” partition over multiple NBI nodes (policy/admin use varies)

QIB users generally use QIB partitions unless you have a specific reason/policy to use NBI-wide ones.
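If you do have a reason to use nbi-download (e.g. a large controlled transfer), it is an ordinary batch job on that partition; a hedged sketch (the URL is a placeholder, and the site-whitelist restriction above still applies):

```shell
#!/usr/bin/env bash
#SBATCH --job-name=fetch_data
#SBATCH --partition=nbi-download
#SBATCH --time=7-00:00:00        # partition default; maximum is 14 days

set -euo pipefail
wget --continue https://example.org/dataset.tar.gz   # placeholder URL; must be on the whitelist
```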


Some commands

To see if some partitions are unavailable or under heavy load:

sinfo -o "%20P %10a %10l %8D %8t %20F %N"

To list all nodes with their state, CPUs, memory, GPUs, features, and partitions:

sinfo -N -o "%20N %10t %10c %12m %20G %30f %20P"

To show a partition's policy (limits, node lists, allowed groups):

scontrol show partition qib-short
scontrol show partition qib-medium
scontrol show partition qib-long
scontrol show partition qib-gpu

Submitting jobs (QIB patterns)

  • Single-task, multi-threaded tool (typical bioinformatics)

Use --cpus-per-task and keep --ntasks=1 unless you are actually running MPI.

#!/usr/bin/env bash
#SBATCH --job-name=example_mt
#SBATCH --partition=qib-medium
#SBATCH --time=2-00:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --output=slurm-%j.out
#SBATCH --error=slurm-%j.err

set -euo pipefail

mytool --threads "$SLURM_CPUS_PER_TASK" input.fq.gz
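If the same script is ever run outside Slurm, `SLURM_CPUS_PER_TASK` is unset and the quoted expansion above becomes empty. A defensive default (a generic shell parameter-expansion pattern, not site policy; the value 4 is an arbitrary example) avoids passing an empty argument:

```shell
# Use Slurm's CPU count when present, otherwise fall back to 4 threads
THREADS="${SLURM_CPUS_PER_TASK:-4}"
echo "threads=$THREADS"
```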

Submit (assuming the script above is saved as example.sbatch):

sbatch example.sbatch

GPU jobs (A100)

💡 You need to request access to the GPU queue from Core Bioinformatics first

Request GPUs via GRES:

#!/usr/bin/env bash
#SBATCH --job-name=gpu_job
#SBATCH --partition=qib-gpu
#SBATCH --time=1-00:00:00
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --gres=gpu:1
#SBATCH --output=slurm-%j.out

set -euo pipefail

nvidia-smi
python train.py

Interactive work (debugging / short sessions)

Allocate:

interactive

Then run commands as usual.
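If the `interactive` wrapper is unavailable in your environment, a plain Slurm equivalent looks like the following (the partition name comes from the table above; the CPU and memory figures are arbitrary examples, not site defaults):

```shell
srun --partition=qib-interactive --cpus-per-task=2 --mem=8G --pty bash -l
```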


Diagnosing jobs

1) List your jobs and the reason each is queued (and note the individual JobIDs)

squeue -u "$USER" -o "%.18i %.9P %.12j %.2t %.10M %.10l %.6D %R"

2) Why is this pending / where is it running / what did I request?

scontrol show job <JOBID>

3) After completion: CPU/memory usage and exit status

sacct -o JobID,JobName%30,Partition,State,Elapsed,Timelimit,AllocCPUS,ReqMem,MaxRSS,ExitCode -j <JOBID>
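Many Slurm sites also install the contributed seff script, which summarises the same accounting data as a one-shot efficiency report (its availability on this cluster is an assumption; fall back to sacct if it is missing):

```shell
seff <JOBID>   # reports CPU efficiency, memory efficiency, and state for a finished job
```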

Institute boundaries / partitions

NBI is composed of four institutes, and each institute has its own partitions (e.g. you may see ei-* jobs in squeue). As a QIB user, you usually submit to qib-* partitions. Being able to see other partitions just means the cluster is shared; access is governed by group membership in the partition configuration (AllowGroups=).
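To check this yourself, compare your Unix group memberships against a partition's AllowGroups field (the grep pattern is just one way to pull the field out of the scontrol output):

```shell
groups                                                 # your current group memberships
scontrol show partition qib-gpu | grep -o 'AllowGroups=[^ ]*'
```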
