Predictor pruning using association-based and model-based approaches. Fast, deterministic solutions with minimal code.


The corrselect package provides fast and flexible predictor pruning for data analysis and modeling in R. It addresses the admissible set problem: identifying maximal subsets of variables in which no pair has an association above a user-defined threshold. This helps reduce multicollinearity and redundancy in datasets.
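To make the admissible set problem concrete, the following is a minimal brute-force sketch in base R. It is not corrselect's implementation: the function name `maximal_admissible`, its arguments, and the use of absolute Pearson correlation are assumptions chosen for illustration, and the exhaustive loop is only practical for a handful of variables.

```r
# Brute-force sketch: list every maximal subset of columns in which no pair
# has |correlation| above `threshold`. Hypothetical helper, not corrselect code.
maximal_admissible <- function(data, threshold = 0.7) {
  assoc <- abs(cor(data))
  vars <- colnames(assoc)
  p <- length(vars)

  # A subset is admissible if all of its pairwise associations are <= threshold.
  is_admissible <- function(idx) {
    if (length(idx) < 2) return(TRUE)
    sub <- assoc[idx, idx]
    all(sub[upper.tri(sub)] <= threshold)
  }

  # Enumerate all non-empty subsets by reading each integer as a bit mask.
  admissible <- list()
  for (mask in seq_len(2^p - 1)) {
    idx <- which((mask %/% 2^(seq_len(p) - 1)) %% 2 == 1)
    if (is_admissible(idx)) admissible[[length(admissible) + 1]] <- idx
  }

  # Keep only subsets that are not strictly contained in another admissible one.
  is_maximal <- vapply(admissible, function(a) {
    !any(vapply(admissible, function(b) {
      length(b) > length(a) && all(a %in% b)
    }, logical(1)))
  }, logical(1))

  lapply(admissible[is_maximal], function(idx) vars[idx])
}

# Usage on a few columns of the built-in mtcars data.
subsets <- maximal_admissible(mtcars[, c("mpg", "cyl", "disp", "hp", "wt", "qsec")])
print(subsets)
```

Every returned subset satisfies the threshold, and none can be extended by another variable without violating it.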

The package offers two main approaches: corrPrune() for association-based pruning that operates model-free on raw data, and modelPrune() for VIF-based iterative removal compatible with lm, glm, lme4, and glmmTMB engines. Both functions support a force_in parameter to protect important variables from removal.
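The idea behind VIF-based iterative removal can be sketched in base R as follows. This is an illustration under stated assumptions, not the code behind modelPrune(): the helpers `vif_values` and `vif_prune`, the `limit` default of 5, and the `protect` argument (standing in for the role the text gives to force_in) are all hypothetical, and only plain lm fits are used.

```r
# VIF of each predictor: 1 / (1 - R^2) from regressing it on the other predictors.
# Hypothetical helper; requires at least two columns in X.
vif_values <- function(X) {
  vapply(colnames(X), function(v) {
    others <- setdiff(colnames(X), v)
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Repeatedly drop the unprotected predictor with the largest VIF
# until every unprotected VIF is at or below `limit`.
vif_prune <- function(X, limit = 5, protect = character()) {
  while (ncol(X) > 1) {
    vifs <- vif_values(X)
    candidates <- vifs[setdiff(names(vifs), protect)]
    if (length(candidates) == 0 || max(candidates) <= limit) break
    worst <- names(candidates)[which.max(candidates)]
    X <- X[, setdiff(colnames(X), worst), drop = FALSE]
  }
  colnames(X)
}

# Usage: prune some mtcars predictors while always keeping "wt".
predictors <- mtcars[, c("cyl", "disp", "hp", "wt", "qsec")]
kept <- vif_prune(predictors, limit = 5, protect = "wt")
print(kept)
```

Recomputing the VIFs after each removal matters, because dropping one predictor changes the VIF of every remaining one.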

Under the hood, corrselect implements exhaustive subset enumeration via graph algorithms (Eppstein–Löffler–Strash and Bron–Kerbosch methods). It supports multiple association metrics, including Pearson, Spearman, and Kendall correlations, as well as specialized measures such as bicor, Cramér’s V, eta, and energy distance for mixed-type data. Deterministic tie-breaking ensures reproducibility across analyses.
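The link to graph algorithms can be illustrated with a small base R sketch: treat each variable as a vertex, connect two variables when their association is at or below the threshold, and enumerate maximal cliques. The code below is the basic Bron–Kerbosch recursion without pivoting, written for illustration only. It is not the package's implementation and not the Eppstein–Löffler–Strash variant, and the function name `bron_kerbosch` and the toy data are assumptions.

```r
# Basic Bron-Kerbosch (no pivoting) over a logical adjacency matrix.
# Hypothetical helper: returns a list of maximal cliques as integer index vectors.
bron_kerbosch <- function(adj) {
  cliques <- list()
  recurse <- function(R, P, X) {
    # R: current clique, P: candidates to extend it, X: already-processed vertices.
    if (length(P) == 0 && length(X) == 0) {
      cliques[[length(cliques) + 1]] <<- sort(R)
      return(invisible(NULL))
    }
    for (v in P) {
      nbrs <- unname(which(adj[v, ]))
      recurse(c(R, v), intersect(P, nbrs), intersect(X, nbrs))
      P <- setdiff(P, v)  # v is done as a candidate
      X <- c(X, v)        # remember v to avoid reporting non-maximal cliques
    }
  }
  recurse(integer(0), seq_len(nrow(adj)), integer(0))
  cliques
}

# Toy data: b is an exact multiple of a, while c is uncorrelated with both.
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(1, -1, -1, 1))
assoc <- abs(cor(toy))

# Edge = "this pair may appear together", i.e. association within the threshold.
adj <- assoc <= 0.7
diag(adj) <- FALSE

cliques <- bron_kerbosch(adj)
clique_names <- lapply(cliques, function(idx) colnames(toy)[idx])
print(clique_names)
```

Here the maximal cliques of the compatibility graph are exactly the maximal subsets in which no pair exceeds the threshold.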

The package is suited to ecological and bioclimatic modeling, trait-based species selection, and interpretable machine learning workflows. It is available on CRAN under the MIT license.

install.packages("corrselect")