Bray-Curtis on relative abundance, Aitchison on CLR
Significance test
Which test for community-level differences?
PERMANOVA via adonis2
Composition view
Which taxonomic level?
Genus, abundance + prevalence threshold via aggregate_rare()
The same pipeline applies to your own data. The values you choose at each step will be different, but the shape of the workflow is the same.
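As a quick reference for the distance choices listed above (Bray-Curtis on relative abundance, Aitchison on CLR), both can be computed by hand in base R. This is a minimal sketch on two made-up, zero-free count vectors. Real data with zeros would need a pseudocount or another zero-replacement step before the log, and in practice you would use a package implementation rather than these hypothetical helper functions.

```r
# Toy counts for two samples over three taxa (made-up numbers, no zeros)
a <- c(10, 30, 60)
b <- c(40, 40, 20)

# Bray-Curtis dissimilarity on relative abundances:
# sum of absolute differences divided by the summed totals of both profiles
bray_curtis <- function(x, y) {
  x <- x / sum(x)
  y <- y / sum(y)
  sum(abs(x - y)) / sum(x + y)
}

# Centred log-ratio (CLR): log of each part minus the mean log of the sample
clr <- function(x) {
  lx <- log(x)
  lx - mean(lx)
}

# Aitchison distance = Euclidean distance between CLR-transformed profiles
aitchison <- function(x, y) sqrt(sum((clr(x) - clr(y))^2))

bc <- bray_curtis(a, b)  # 0.4 for these numbers
ad <- aitchison(a, b)
```

Because CLR subtracts the per-sample mean log, the Aitchison distance is the same whether you start from raw counts or from proportions.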
7 What we did not cover: differential abundance
Differential abundance (DA) analysis tests which specific taxa differ between groups. It is the natural next step after a PERMANOVA shows that groups differ overall.
We omitted DA from the live session for two reasons:
The dataset has n = 3 per group per time point, which is too few samples for most DA methods to produce well-calibrated results.
The composition and ordination plots already show the dominant biological story: Lactobacillus inoculation drives near-monoculture dominance by Lactiplantibacillus, while spontaneous fermentations show a diverse succession. When the visual signal is this strong, DA testing would mostly confirm what the plots already show rather than uncover anything new.
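To make the sample-size point concrete, here is a base-R illustration using a simple rank test rather than any DA method: with three samples per group, even perfect separation between groups cannot reach p < 0.05. The values are made up; only their ranks matter.

```r
# Two groups of three samples each, perfectly separated (made-up values)
group_a <- c(1, 2, 3)
group_b <- c(4, 5, 6)

# Exact two-sided Wilcoxon rank-sum test (no ties, so R uses the exact null)
res <- wilcox.test(group_a, group_b)
res$p.value  # 0.1

# Why: there are choose(6, 3) = 20 equally likely rank splits under the null,
# and only 2 of them (all-low or all-high) are as extreme as this one
2 / choose(6, 3)  # 0.1
```

Model-based DA methods are not limited in exactly this way, but the same scarcity of information makes their variance estimates and p-values unstable at this sample size.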
For datasets where DA is appropriate, the section below provides a self-study template using MaAsLin3.
7.1 Why MaAsLin3
There are many DA methods for microbiome data. Among them, MaAsLin3 has several properties that make it a sensible default for typical study designs:
It models prevalence and abundance separately, returning two sets of results. A taxon can be associated with a covariate by being more often present in one group, more abundant when present, or both.
It handles continuous and categorical covariates and supports random effects for repeated measurement designs.
It produces both per-taxon results and diagnostic plots out of the box, which is helpful for exploratory work.
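The prevalence/abundance split can be illustrated in base R for a single taxon: a logistic regression on presence/absence, and a linear model on log abundance fitted only to samples where the taxon is present. This is a simplified, conceptual sketch in the same spirit as MaAsLin3's two models, not its implementation (MaAsLin3 also handles normalization, multiple covariates, many taxa and multiple-testing correction). The taxon and its values are hypothetical.

```r
# Relative abundance of one hypothetical taxon in 8 samples, 4 per group
rel <- c(0, 0, 0.02, 0.04,     # group A
         0.10, 0.20, 0.05, 0)  # group B
group <- factor(rep(c("A", "B"), each = 4))
present <- rel > 0

# Descriptive view: fraction of samples where the taxon is present, per group
prevalence <- tapply(present, group, mean)  # A = 0.50, B = 0.75

# Prevalence part: logistic regression on presence/absence
prev_fit <- glm(present ~ group, family = binomial)

# Abundance part: linear model on log2 abundance, non-zero samples only
abun_fit <- lm(log2(rel) ~ group, subset = present)
coef(abun_fit)["groupB"]  # log2 difference among samples where present
```

A taxon can show a prevalence effect, an abundance effect, or both, which is why the two result sets are reported separately.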
Other methods you should know about:
ALDEx2 — uses CLR-transformed values and a Bayesian framework. Strong on compositional principles.
ANCOM-BC2 — bias-corrected log-ratio approach with formal control of the false discovery rate.
LinDA — fast linear-model-based method that handles large datasets well.
corncob — beta-binomial regression on counts, models dispersion explicitly per taxon.
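Several of these methods (ALDEx2 and ANCOM-BC2 among them) build on log-ratios because sequencing data are compositional: only proportions are observed. The toy example below, with made-up absolute abundances, shows the problem and why ratios help. When one taxon blooms, unchanged taxa appear to drop in relative abundance, but the log-ratio between two unchanged taxa is unaffected.

```r
# Hypothetical absolute abundances of three taxa before and after
# taxon3 blooms; taxon1 and taxon2 do not change in absolute terms
before <- c(taxon1 = 100, taxon2 = 100, taxon3 = 100)
after  <- c(taxon1 = 100, taxon2 = 100, taxon3 = 400)

# Sequencing only observes proportions (closure to a constant sum)
rel_before <- before / sum(before)  # 1/3 each
rel_after  <- after / sum(after)    # 1/6, 1/6, 2/3

# taxon1 appears to halve in relative abundance...
rel_after["taxon1"] / rel_before["taxon1"]  # 0.5

# ...but its log-ratio to taxon2 is unchanged by closure
log(rel_after["taxon1"] / rel_after["taxon2"])    # 0
log(rel_before["taxon1"] / rel_before["taxon2"])  # 0
```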
A useful comparison of DA methods on real datasets is Nearing et al. (2022), "Microbiome differential abundance methods produce different results across 38 datasets", Nature Communications 13:342. Its general lesson is that different methods can disagree substantially on the same data, so agreement across methods is a useful proxy for confidence in a finding.
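One simple way to act on that lesson is to count, for each taxon, how many methods flag it and keep those above a threshold. The method names, taxon names and the "at least two of three" cut-off below are all hypothetical choices for illustration.

```r
# Hypothetical sets of significant taxa from three DA methods on the same data
hits <- list(
  method_1 = c("taxon_A", "taxon_B"),
  method_2 = c("taxon_A"),
  method_3 = c("taxon_A", "taxon_B", "taxon_C")
)

# Count how many methods flag each taxon
support <- table(unlist(hits))

# Keep taxa flagged by at least two of the three methods (assumed threshold)
consensus <- names(support)[support >= 2]
consensus  # "taxon_A" "taxon_B"
```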
7.2 Self-study DA section: MaAsLin3
Run this section on your own time, not during the workshop.
```r
# Packages: maaslin3 (DA), phyloseq (object handling), microbiome (helpers)
library(maaslin3)
library(phyloseq)
library(microbiome)

# Inputs to MaAsLin3:
# - feature table: samples x taxa, raw counts (it does its own normalization)
# - metadata table: samples x variables
# - output directory: where to write results and plots

ps_filt <- readRDS("phyloseq_filtered.rds")

# Aggregate to genus to make results more interpretable
ps_genus <- aggregate_taxa(ps_filt, level = "Genus")

# Restrict to the two main fermentation conditions
ps_test <- subset_samples(
  ps_genus,
  Fermentation %in% c("Spontaneous", "Lactobacillus")
)

# Drop unused factor levels so MaAsLin3 does not see empty groups.
# With R's default alphabetical ordering, "Lactobacillus" is the first level.
sample_data(ps_test)$Fermentation <-
  droplevels(factor(sample_data(ps_test)$Fermentation))

# Extract feature (samples x taxa) and metadata tables from the subset
feature_table <- as.data.frame(t(microbiome::abundances(ps_test)))
metadata_table <- microbiome::meta(ps_test)

# Run MaAsLin3
fit <- maaslin3(
  input_data     = feature_table,
  input_metadata = metadata_table,
  output         = "maaslin3_output",
  formula        = "~ Fermentation + Time_point",
  reference      = "Time_point,H24",
  normalization  = "TSS",
  transform      = "LOG",
  min_abundance  = 0,
  min_prevalence = 0.1,
  cores          = 1
)

# fit$fit_data_abundance and fit$fit_data_prevalence contain the
# per-taxon results. Significant hits are also written as TSV files
# in the output directory.
```
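Per-taxon results like these are normally judged on q-values adjusted for multiple testing rather than on raw p-values. As far as we know MaAsLin3 applies Benjamini-Hochberg (BH) correction by default, but check its documentation. The base-R sketch below shows what BH adjustment does, using made-up p-values for five hypothetical taxa and an assumed q < 0.1 cut-off.

```r
# Made-up raw p-values for five hypothetical taxa
p <- c(taxon_A = 0.001, taxon_B = 0.01, taxon_C = 0.03,
       taxon_D = 0.2, taxon_E = 0.8)

# Benjamini-Hochberg adjustment controls the false discovery rate
q <- p.adjust(p, method = "BH")
round(q, 3)  # 0.005 0.025 0.050 0.250 0.800

# Taxa passing a q < 0.1 cut-off (an assumed threshold for illustration)
names(q)[q < 0.1]  # "taxon_A" "taxon_B" "taxon_C"
```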
Expected hits in this dataset, based on the published results: Lactiplantibacillus much higher in Lactobacillus-inoculated samples; Enterobacter and several other Enterobacteriaceae higher in spontaneous fermentations; and time-related changes within each fermentation type.
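The formula in the MaAsLin3 call above is additive (~ Fermentation + Time_point). It therefore estimates one Fermentation difference and one set of Time_point effects shared by both fermentation types. Describing time-related changes within each fermentation type separately would need an interaction term, written in R formula syntax as ~ Fermentation * Time_point. The base-R sketch below compares the two designs with model.matrix() on a hypothetical balanced layout. The time-point labels other than H24 are made up, and whether an interaction is worth fitting with n = 3 per cell is a separate question.

```r
# Hypothetical balanced design: 2 fermentation types x 3 time points
# (time-point labels other than H24 are invented for illustration)
design <- expand.grid(
  Fermentation = c("Lactobacillus", "Spontaneous"),
  Time_point   = c("H24", "H48", "H72")
)

# Additive model: one Fermentation contrast plus time effects shared by both types
X_add <- model.matrix(~ Fermentation + Time_point, data = design)
colnames(X_add)

# Interaction model: adds Fermentation-specific time effects
X_int <- model.matrix(~ Fermentation * Time_point, data = design)
colnames(X_int)
```

The interaction model has two extra columns (FermentationSpontaneous:Time_pointH48 and FermentationSpontaneous:Time_pointH72), i.e. two extra parameters to estimate from the same small number of samples.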
8 What is next: TreeSummarizedExperiment and miaverse
The phyloseq object is mature and widely used, but the field is gradually shifting to a more general data structure called TreeSummarizedExperiment (TSE), built on Bioconductor’s SummarizedExperiment framework. The associated tooling lives in the miaverse ecosystem (mia, miaViz, miaTime, miaSim).
Why bother learning it?
TSE handles multiple data types (counts, transformed values, metadata for taxa) in a single object more cleanly than phyloseq.
It integrates with the broader Bioconductor multi-omics ecosystem.
New methods are being developed against TSE rather than phyloseq.
The transition is not urgent. Phyloseq remains supported and widely used, and most published code uses it. But if you are starting a long project today, TSE is worth investigating early. The miaverse documentation has a thorough tutorial book at https://microbiome.github.io/OMA/.