Workshop Overview: Functional Profiling of Metagenomes


Workshop Information

This hands-on workshop introduces functional profiling approaches for characterizing the metabolic potential of microbial communities. You’ll learn the conceptual differences between gene catalogue and read-based approaches, understand HUMAnN3’s tiered search strategy, and explore pathway-level functional changes in real metagenomic data. We’ll work with coffee bean fermentation samples to understand how microbial metabolism shifts during the fermentation process.

What you’ll learn:

  • Understand why functional profiling complements taxonomic profiling
  • Compare gene catalogue vs read-based functional profiling approaches
  • Understand HUMAnN3’s tiered search strategy and output formats
  • Explore pathway abundance and coverage across samples
  • Interpret stratified outputs to see species contributions to functions
  • Recognize when to use specialized annotation tools (ARGs, CAZymes, etc.)

Workshop structure: Theory presentations followed by guided R exploration of pre-computed HUMAnN3 results, with time for independent pathway investigation.

Dataset: Coffee bean fermentation time series from Ecuador (T0, T16, T24, and T48 hours), showing metabolic shifts as lactic acid bacteria and yeasts process coffee bean substrates.

The workshop material is available for download on Figshare.


Why Functional Profiling?

Taxonomic profiling tells us who is present in a microbial community, but not what they can do. Functional profiling characterizes the genes, enzymes, and metabolic pathways present, revealing biochemical potential independent of taxonomic identity.

Key concept: Functional redundancy - Different species can perform the same functions. Multiple organisms may carry genes for the same metabolic pathway, providing functional stability even when taxonomic composition changes.

Why it matters: Functional profiling links community structure to ecosystem function, predicts metabolic outputs, identifies biomarkers, and enables microbiome engineering for desired functions.


Approaches to Functional Profiling

Two main strategies exist for functional profiling, each with distinct trade-offs.

Aspect           | Gene Catalogue                 | Read-Based (HUMAnN3)
Workflow         | Assemble → Annotate → Quantify | Map reads → Quantify
Assembly needed? | Yes                            | No
Novel functions  | Can discover                   | Database-limited
Speed            | Slow (days)                    | Fast (hours)
Best for         | Discovery, novel environments  | Comparative studies, standard analysis

Which to choose? Read-based profiling (HUMAnN3) is standard for most metagenomic studies: it is fast, standardized, and works well for comparative analyses. Gene catalogues are used for discovery in under-studied environments, where reference databases cover few of the organisms present, or when strain-level resolution is needed. Many projects use both approaches because they are complementary.

For this workshop: We’ll focus on HUMAnN3, one of the most widely used tools for read-based functional profiling.


HUMAnN3 Overview

HUMAnN3 (HMP Unified Metabolic Analysis Network) uses a tiered search strategy to balance speed and sensitivity.

The Tiered Search Strategy

Step 1: Taxonomic Profiling - Runs MetaPhlAn4 to identify species present
Step 2: Nucleotide Search - Maps reads to pangenomes of detected species only (ChocoPhlAn database)
Step 3: Translated Search - Unmapped reads are translated and searched against UniRef90 proteins
Step 4: Quantification - Counts reads per gene family and maps to metabolic pathways (MetaCyc)

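Why this works: Searching only the pangenomes of species detected in Step 1 keeps the nucleotide search (Step 2) fast. The slower translated search (Step 3) then runs only on the reads left unmapped, catching divergent genes and genes from organisms without a matching pangenome. Results are stratified by species to show which organisms contribute to each function.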
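The tier logic above can be sketched as a toy lookup. This is a schematic illustration only, not HUMAnN3's implementation: real searches use sequence aligners, whereas here each read is reduced to a made-up "signature" string and every database is a small hypothetical dictionary. The function name, the data, and the gene family IDs are all assumptions for illustration.

```python
from collections import Counter


def tiered_search(reads, detected_species, pangenomes, protein_db):
    """Assign each read to a (gene_family, species) pair using two tiers.

    reads:            dict read_id -> signature (stand-in for a real alignment)
    detected_species: set of species from the taxonomic profile (Step 1)
    pangenomes:       dict species -> {signature: gene_family} (Step 2)
    protein_db:       dict signature -> gene_family (stand-in for Step 3)
    """
    counts = Counter()
    for signature in reads.values():
        hit = None
        # Tier 1: look only in pangenomes of species that were detected.
        for species in sorted(detected_species):
            family = pangenomes.get(species, {}).get(signature)
            if family is not None:
                hit = (family, species)
                break
        # Tier 2: reads without a nucleotide hit fall through to the
        # protein-level lookup; anything still unmatched stays unmapped.
        if hit is None:
            family = protein_db.get(signature)
            if family is not None:
                hit = (family, "unclassified")
            else:
                hit = ("UNMAPPED", None)
        counts[hit] += 1  # Step 4: tally reads per gene family
    return counts


# Hypothetical toy data: four reads, one detected species.
reads = {"r1": "AAA", "r2": "CCC", "r3": "GGG", "r4": "AAA"}
detected = {"Lactobacillus_plantarum"}
pangenomes = {
    "Lactobacillus_plantarum": {"AAA": "UniRef90_X"},
    "Escherichia_coli": {"CCC": "UniRef90_Y"},  # not detected, never searched
}
protein_db = {"CCC": "UniRef90_Y"}

result = tiered_search(reads, detected, pangenomes, protein_db)
for key, n in sorted(result.items(), key=str):
    print(key, n)
```

In this toy run the read matching the undetected species' pangenome is still recovered by the second tier, but without a species label, which mirrors why translated-search hits tend to appear as "unclassified" in stratified output.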

Three Main Outputs

Gene Families (genefamilies.tsv) - Abundance of each gene family (UniRef90 IDs), in RPK units

Pathway Abundance (pathabundance.tsv) - Abundance of metabolic pathways (MetaCyc), in CPM units after renormalization with humann_renorm_table (raw HUMAnN3 output is in RPK-derived units)

Pathway Coverage (pathcoverage.tsv) - What proportion of each pathway is present (0-1 scale)
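The two abundance units mentioned above are related by simple arithmetic. The sketch below shows the textbook definitions: RPK divides a read count by gene length in kilobases, and CPM rescales a sample so its values sum to one million. It is a simplified illustration with made-up numbers and hypothetical function names; HUMAnN3's own calculation also weights alignments, which is not reproduced here.

```python
def rpk(read_count, gene_length_bp):
    """Reads per kilobase: read count divided by gene length in kb."""
    return read_count / (gene_length_bp / 1000.0)


def to_cpm(abundances):
    """Rescale one sample's abundances so they sum to 1,000,000."""
    total = sum(abundances.values())
    if total == 0:
        return {feature: 0.0 for feature in abundances}
    return {feature: value / total * 1_000_000
            for feature, value in abundances.items()}


# Hypothetical sample: gene B gets three times the reads of gene A,
# but it is also three times longer, so both have the same RPK.
sample_rpk = {
    "gene_A": rpk(read_count=50, gene_length_bp=1000),
    "gene_B": rpk(read_count=150, gene_length_bp=3000),
}
sample_cpm = to_cpm(sample_rpk)
print(sample_rpk)
print(sample_cpm)
```

Length normalization (RPK) makes genes comparable within a sample; sum normalization (CPM) makes samples with different sequencing depths comparable to each other.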

Stratified Outputs

A key feature is stratification - showing which species contribute to each function.

  • Unstratified (no |): Community-total abundance
  • Stratified (with |): Species-specific contributions

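Example: PWY-5484|Lactobacillus_plantarum shows L. plantarum’s contribution to pathway PWY-5484. In actual HUMAnN3 files the label is longer: the pathway ID is followed by its name, and the species carries genus and species prefixes, in the form PWY-XXXX: pathway name|g__Genus.s__Species.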
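As a minimal sketch of how the | convention can be used in practice, the snippet below separates community totals from species contributions in a small tab-separated table. The table contents are invented for illustration (only the sample names T0 and T48 and the simplified label style come from this overview), and split_stratified is a hypothetical helper, not part of HUMAnN3.

```python
import csv
import io

# Hypothetical, hand-made table using the simplified labels shown above.
EXAMPLE_TSV = (
    "# Pathway\tT0\tT48\n"
    "PWY-5484\t120.0\t300.0\n"
    "PWY-5484|Lactobacillus_plantarum\t20.0\t250.0\n"
    "PWY-5484|unclassified\t100.0\t50.0\n"
)


def split_stratified(handle):
    """Split a HUMAnN-style table into unstratified and stratified rows.

    Returns (samples, totals, by_species) where
      totals[pathway][sample]              -> community-total abundance
      by_species[pathway][species][sample] -> species contribution
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]  # first column holds the feature label
    totals, by_species = {}, {}
    for row in reader:
        if not row:
            continue
        feature = row[0]
        values = dict(zip(samples, (float(v) for v in row[1:])))
        if "|" in feature:
            # Stratified row: text after the first "|" names the contributor.
            pathway, species = feature.split("|", 1)
            by_species.setdefault(pathway, {})[species] = values
        else:
            totals[feature] = values
    return samples, totals, by_species


samples, totals, by_species = split_stratified(io.StringIO(EXAMPLE_TSV))

# Share of the pathway attributed to one species in each sample.
shares = {
    s: by_species["PWY-5484"]["Lactobacillus_plantarum"][s] / totals["PWY-5484"][s]
    for s in samples
}
print(shares)
```

Splitting on the first | only means the same helper should also handle longer labels that contain a pathway name before the separator.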


Domain-Specific Functional Annotation

While HUMAnN3 provides broad functional profiling, some questions require specialized tools for specific gene classes:

CAZymes (dbCAN) - Carbohydrate-active enzymes for polysaccharide degradation
ARGs (ABRicate, GROOT, AMRFinderPlus) - Antimicrobial resistance genes
Secondary metabolites (antiSMASH) - Biosynthetic gene clusters
Viral (VIBRANT, CheckV) - Viral sequences and prophages

When to use: Start with HUMAnN3 for overview, then use specialized tools when focusing on specific gene classes or when detailed characterization is needed.


Resources

HUMAnN3 Documentation: https://huttenhower.sph.harvard.edu/humann
bioBakery Forum: https://forum.biobakery.org/
MetaCyc Database: https://metacyc.org/
Functional Profiling Review: Nayfach et al. (2020) Nature Reviews Microbiology