Rarefaction and Standardization
Gilles Colling
2026-03-06
Overview
Comparing diversity across sites with unequal sampling effort requires standardization. spacc offers three approaches:
- Individual-based rarefaction: subsample to an equal number of individuals
- Coverage-based rarefaction: standardize by sample completeness (coverage)
- Spatial coverage tracking: monitor coverage along accumulation curves
This vignette compares these methods and provides guidance on when to use each.
Data setup
```r
library(spacc)
set.seed(42)

n_sites <- 60
coords <- data.frame(x = runif(n_sites), y = runif(n_sites))

# Variable total abundances across sites (realistic uneven sampling):
# 20 sites each with Poisson mean 1, 3 and 5 individuals per species
lambdas <- rep(c(1, 3, 5), each = 20)
species <- matrix(0, nrow = n_sites, ncol = 20)
for (i in seq_len(n_sites)) {
  species[i, ] <- rpois(20, lambda = lambdas[i])
}
colnames(species) <- paste0("sp", 1:20)
```

Individual-based rarefaction
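The rarefy() function performs individual-based rarefaction: it subsamples to a common number of individuals, so that sites are no longer credited with more species simply because more individuals were counted there: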
```r
# Rarefy to minimum observed abundance
rare <- rarefy(species)
print(rare)
#> Individual-based rarefaction
#> ----------------------------
#> Total individuals: 3523
#> Observed species: 20
#> Bootstrap replicates: 100
plot(rare)
```

You can also rarefy Hill number curves.
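For intuition about what individual-based rarefaction estimates, the expected number of species in a random subsample of m individuals (drawn without replacement from n) has a classical closed form: a species with n_i individuals is missed with probability choose(n - n_i, m) / choose(n, m). The base-R sketch below evaluates that formula. expected_richness() is a hypothetical helper for illustration, not a spacc function; rarefy() reports bootstrap replicates, so its internals may well differ.

```r
# Expected richness after subsampling m of n individuals without replacement.
# Hypothetical helper for illustration; not part of spacc.
expected_richness <- function(abund, m) {
  abund <- abund[abund > 0]
  n <- sum(abund)
  stopifnot(m >= 0, m <= n)
  # Probability that species i is absent from the subsample:
  # choose(n - n_i, m) / choose(n, m), computed on the log scale to avoid
  # overflow. lchoose() returns -Inf when m > n - n_i, i.e. probability 0.
  p_absent <- exp(lchoose(n - abund, m) - lchoose(n, m))
  sum(1 - p_absent)
}

# Toy pooled abundances: n = 10 individuals of 4 species
abund <- c(5, 3, 1, 1)
sapply(c(1, 5, 10), function(m) expected_richness(abund, m))
# 1.000 2.913 4.000 (rounded)
```

Evaluating the function for every m from 1 to the total (for example on colSums(species) from the setup above) traces the smooth expected curve that repeated random subsampling approximates.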
Coverage-based standardization
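Coverage-based rarefaction (Chao & Jost 2012) standardizes by sample completeness rather than sample size. Sample coverage is the proportion of the community's total abundance that belongs to the species already detected, so sites compared at equal coverage are equally complete samples of their communities. This is often preferred over equal sample size because a fixed number of individuals captures a smaller share of a richer community: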
```r
cov_result <- spaccCoverage(species, coords, n_seeds = 10, progress = FALSE)
plot(cov_result)
```

Interpolation at fixed coverage
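Reading richness off an accumulation curve at a fixed coverage level amounts to interpolating between the two curve points that bracket the target. A minimal base-R sketch of that idea for a single curve follows. richness_at_coverage() is a hypothetical helper and the toy curve is made up for illustration; interpolateCoverage() may treat details, such as targets outside the observed coverage range, differently.

```r
# Richness at a target coverage by linear interpolation along one curve.
# Hypothetical helper for illustration; not part of spacc.
richness_at_coverage <- function(coverage, richness, target) {
  stopifnot(length(coverage) == length(richness))
  # approx() sorts by coverage, averages richness at tied coverage values
  # (passing ties explicitly avoids a warning) and returns NA for targets
  # outside the observed coverage range (rule = 1).
  approx(x = coverage, y = richness, xout = target,
         rule = 1, ties = mean)$y
}

# Toy curve: coverage and richness after successive sites are added
cover <- c(0.50, 0.70, 0.90, 0.95, 1.00)
rich  <- c(8, 12, 16, 18, 20)
richness_at_coverage(cover, rich, target = c(0.8, 0.9, 0.95))
# 14 16 18
```

Repeating this for each seed's curve gives a seeds-by-targets table of the kind printed by print(interp) in this section.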
```r
interp <- interpolateCoverage(cov_result, target = c(0.8, 0.9, 0.95))
print(interp)
#>         C80      C90      C95
#> 1  16.68063 18.67220 19.66799
#> 2  20.00000 20.00000 20.00000
#> 3  11.49091 15.74545 17.87273
#> 4  19.00000 19.00000 19.00000
#> 5  13.34400 16.67200 18.33600
#> 6  20.00000 20.00000 20.00000
#> 7  11.71570 15.70118 17.85059
#> 8  20.00000 20.00000 20.00000
#> 9  15.82418 17.91209 18.95604
#> 10 12.75765 16.37882 18.18941
```

Extrapolation beyond observed coverage
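Extrapolation needs two ingredients: an estimate of the coverage actually achieved, and an estimate of how many species remain undetected. The sketch below implements, for q = 0 and a single abundance vector, the singleton/doubleton-based estimators described by Chao & Jost (2012) and Chao et al. (2014): coverage from the numbers of singletons (f1) and doubletons (f2), and richness at the additional sample size needed to reach a target coverage. coverage_hat() and richness_at_target_coverage() are hypothetical helpers, not spacc functions, and extrapolateCoverage() may differ in details such as bias corrections or the handling of targets below the observed coverage.

```r
# Estimated sample coverage after m additional individuals (m = 0: observed).
# Hypothetical helper for illustration; not part of spacc.
coverage_hat <- function(abund, m = 0) {
  abund <- abund[abund > 0]
  n  <- sum(abund)
  f1 <- sum(abund == 1)                  # singletons
  f2 <- sum(abund == 2)                  # doubletons
  if (f1 == 0) return(1)                 # no singletons: coverage estimated as complete
  a <- (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
  1 - (f1 / n) * a^(m + 1)
}

# Extrapolated richness (q = 0) at a target coverage above the observed one.
# Hypothetical helper; returns NA when the target is not above observed coverage.
richness_at_target_coverage <- function(abund, target) {
  stopifnot(target > 0, target <= 1)
  abund <- abund[abund > 0]
  n     <- sum(abund)
  s_obs <- length(abund)
  f1    <- sum(abund == 1)
  f2    <- sum(abund == 2)
  if (target <= coverage_hat(abund)) return(NA_real_)
  if (f2 == 0) stop("this sketch needs at least one doubleton (f2 > 0)")
  f0 <- (n - 1) / n * f1^2 / (2 * f2)    # estimated number of undetected species
  a  <- (n - 1) * f1 / ((n - 1) * f1 + 2 * f2)
  # Extra sample size m solving 1 - (f1 / n) * a^(m + 1) = target
  m  <- log(n * (1 - target) / f1) / log(a) - 1
  s_obs + f0 * (1 - (1 - f1 / (n * f0 + f1))^m)
}

# Toy sample: 25 individuals, 8 species, f1 = 3, f2 = 2
abund <- c(10, 5, 3, 2, 2, 1, 1, 1)
coverage_hat(abund)                        # about 0.886
richness_at_target_coverage(abund, 0.95)   # about 9.21
```

With f0 defined this way, the factor 1 - f1 / (n * f0 + f1) equals a, so richness and coverage are extrapolated along the same geometric decay; a target of exactly 1 returns s_obs + f0.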
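```r
extrap <- extrapolateCoverage(cov_result, target_coverage = 0.99, q = 0)
print(extrap)
#> Coverage-based extrapolation
#> --------------------------------
#> Diversity order: q = 0
#> Observed coverage: 100.0%
#> Observed richness: 20.0
#>
#> Extrapolated richness:
#> C=99%: 19.7 (+/- 0.3)
```

In this example the observed coverage is already 100%, so the 99% target lies below it and the reported 19.7 species (slightly fewer than the 20 observed) is effectively a rarefied estimate rather than an extrapolation.

Combined Hill + Coverage analysis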
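Hill numbers of order q express diversity as an effective number of species: q = 0 is species richness, q = 1 is the exponential of Shannon entropy and q = 2 is the inverse Simpson concentration, with rare species weighted less as q increases (Chao et al. 2014). The sketch below computes them from the observed relative abundances of one sample. hill_number() is a hypothetical helper that applies no bias correction, whereas spaccHillCoverage() may use different estimators.

```r
# Hill number of order q from observed (plug-in) relative abundances.
# Hypothetical helper for illustration; not part of spacc.
hill_number <- function(abund, q) {
  p <- abund[abund > 0] / sum(abund)
  if (q == 1) {
    exp(-sum(p * log(p)))          # limit as q -> 1: exp(Shannon entropy)
  } else {
    sum(p^q)^(1 / (1 - q))
  }
}

x <- c(10, 5, 3, 2)                # toy abundances of 4 species
sapply(c(0, 1, 2), function(q) hill_number(x, q))
# q = 0 gives 4; q = 2 gives 1 / sum(p^2) = 1 / 0.345, about 2.90
```

Hill numbers never increase with q, and they coincide only for a perfectly even community, so a large gap between q = 0 and q = 2 at the same coverage signals uneven abundances.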
The spaccHillCoverage() function tracks Hill numbers and sample coverage together along the accumulation curve, so that diversity of several orders q can be standardized to the same coverage:
```r
hc <- spaccHillCoverage(species, coords, q = c(0, 1, 2),
                        target_coverage = 0.9,
                        n_seeds = 10, progress = FALSE)
print(hc)
#> spacc Hill + Coverage: 60 sites, 20 species, 10 seeds
#> Orders (q): 0, 1, 2
#> Mean final coverage: 1.000
#> Target coverage: 0.9
plot(hc, xaxis = "coverage")
```

When to use which method
| Method | Best for | Limitation |
|---|---|---|
| Individual-based | Simple comparisons | Sensitive to abundance distribution |
| Coverage-based | Uneven sampling | Requires abundance data |
| Hill + Coverage | Multi-order standardization | Computationally heavier |
Rules of thumb:
- Use individual-based rarefaction when total abundances are the primary source of variation and you want a simple, well-understood correction.
- Use coverage-based methods when sites differ in sampling completeness and you want to compare at equal representativeness.
- Use Hill + Coverage when you need standardized comparisons across multiple diversity orders (q = 0, 1, 2) simultaneously.
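Before choosing, it can help to look at how sample size and completeness vary among sites. The sketch below tabulates, per site, the number of individuals, observed richness and the simple Good-Turing coverage estimate 1 - f1/n (cruder than the Chao & Jost estimator, used here only as a quick check). site_summary() is a hypothetical helper, not part of spacc, and the matrix is simulated like the data setup but with a different draw order, so its values differ from the vignette's species matrix.

```r
# Per-site sample size, richness and Good-Turing coverage (1 - f1 / n).
# Hypothetical helper for illustration; not part of spacc.
site_summary <- function(comm) {
  n  <- rowSums(comm)                  # individuals per site
  f1 <- rowSums(comm == 1)             # singleton species per site
  data.frame(n        = n,
             richness = rowSums(comm > 0),
             coverage = ifelse(n > 0, 1 - f1 / n, NA_real_))
}

# Simulated sites-by-species matrix: 60 sites, 20 species, three effort
# levels (Poisson means 1, 3, 5); row i uses the mean assigned to site i
set.seed(42)
effort <- rep(c(1, 3, 5), each = 20)
comm <- matrix(rpois(60 * 20, lambda = effort), nrow = 60, ncol = 20)

summ <- site_summary(comm)
summ$effort <- effort
agg <- aggregate(cbind(n, coverage) ~ effort, data = summ, FUN = mean)
agg
```

If mean coverage differs markedly between effort levels, comparisons at equal coverage are the more natural choice under the rules above; if coverage is uniformly high and mainly n varies, individual-based rarefaction is likely adequate.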
References
- Chao, A. & Jost, L. (2012). Coverage-based rarefaction and extrapolation: standardizing samples by completeness rather than size. Ecology, 93, 2533-2547.
- Chao, A., Gotelli, N.J., Hsieh, T.C., Sander, E.L., Ma, K.H., Colwell, R.K. & Ellison, A.M. (2014). Rarefaction and extrapolation with Hill numbers. Ecological Monographs, 84, 45-67.