Code
library(phyloseq)
library(microbiome)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)Alpha diversity describes the diversity within a single sample: how many taxa are there, and how evenly are the reads distributed among them? Three measures are commonly reported together because they capture different aspects:
In this dataset we expect a clear pattern: the L. plantarum IBB082 inoculation should rapidly drive the community toward dominance by Lactiplantibacillus, collapsing diversity and evenness. Spontaneous fermentations should retain higher diversity throughout. Both should show some change over the four time points (H24, H48, W1, W2).
library(phyloseq)
library(microbiome)
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)Alpha diversity is computed on the filtered count object, not on the relative abundance object. The metrics work on integer counts.
ps_filt <- readRDS("phyloseq_filtered.rds")
ps_filtThe microbiome::alpha() function returns a single dataframe with many indices in one call. We will use a curated subset.
alpha_df <- microbiome::alpha(
ps_filt,
index = c("observed", "diversity_shannon", "evenness_pielou")
)
# microbiome::alpha returns rownames as sample IDs — bring them in as a column
alpha_df <- alpha_df %>%
rownames_to_column("sample_alias")
head(alpha_df)A note on the index names.
microbiome::alpha()uses prefixed names likediversity_shannonandevenness_pielouso that related indices group together alphabetically. The function supports many more indices than the three used here (Simpson, inverse Simpson, Chao1, several rarity and dominance metrics); see?microbiome::alphafor the full list.
To plot or test the diversity metrics across experimental groups, we need to merge them with the sample metadata.
meta_df <- microbiome::meta(ps_filt) %>%
rownames_to_column("sample_alias")
alpha_df <- alpha_df %>%
left_join(meta_df, by = "sample_alias")A faceted boxplot is the clearest way to see all three metrics at once across the experimental design.
alpha_long <- alpha_df %>%
pivot_longer(
cols = c(observed, diversity_shannon, evenness_pielou),
names_to = "Metric",
values_to = "Value"
) %>%
# Friendly facet labels and a sensible ordering
mutate(Metric = recode(Metric,
observed = "Observed ASVs",
diversity_shannon = "Shannon",
evenness_pielou = "Pielou's evenness"
)) %>%
mutate(Metric = factor(Metric,
levels = c("Observed ASVs", "Shannon", "Pielou's evenness")
)) %>%
# Replace NA for commercial and control
mutate(Time_point=ifelse(Fermentation=="Control", "Ctrl",
ifelse(Fermentation=="Commercial", "Com", Time_point)))
ggplot(alpha_long, aes(x = Time_point, y = Value, fill = Fermentation)) +
geom_boxplot(outlier.shape = NA, alpha = 0.7) +
geom_point(position = position_jitterdodge(jitter.width = 0.1), size = 1.5) +
facet_wrap(~ Metric, scales = "free_y") +
labs(x = NULL, y = NULL, fill = "Fermentation type") +
theme_bw() +
theme(legend.position = "bottom")What to look for:
When the visual signal is this strong, a formal test is more about quantifying the effect than discovering it. Two practical points before we run anything:
# Overall comparison: Spontaneous vs Lactobacillus, pooling time points.
# Restrict to the two main fermentation types; drop commercial and
# control samples for this comparison.
alpha_test <- alpha_df %>%
filter(Fermentation %in% c("Spontaneous", "Lactobacillus"))
wilcox.test(diversity_shannon ~ Fermentation, data = alpha_test)
wilcox.test(observed ~ Fermentation, data = alpha_test)
wilcox.test(evenness_pielou ~ Fermentation, data = alpha_test)The expected outcome is a strongly significant difference for all three metrics. Report the effect size (medians or means per group) alongside the p-value, the p-value alone is uninformative when the design guarantees a strong result.
alpha_test %>%
group_by(Fermentation) %>%
summarise(
n = n(),
mean_observed = mean(observed),
mean_shannon = mean(diversity_shannon),
mean_pielou = mean(evenness_pielou)
)Why no per-timepoint tests? As noted above, n = 3 per group is below the threshold at which Wilcoxon can reject the null. If per-timepoint comparisons are scientifically important in your own work, options include increasing replication, using a parametric test if the metric is approximately normal (e.g. t-test on Shannon), or fitting a model that uses information across time points jointly (e.g. linear model with
Time_point * Fermentation, or a mixed model if there is repeated measurement structure).
It is useful to keep the alpha diversity table as a separate file rather than recomputing it later — for example to include in a report or join with other sample-level measurements.
write_csv(alpha_df, "alpha_diversity.csv")microbiome::alpha() computes many indices in one call and returns a tidy dataframe ready to merge with metadata.