Changelog • corrselect

corrselect 3.1.0

CRAN release: 2026-01-08

Bug Fixes

corrPrune: Fixed numeric-numeric pair handling in mixed-type data (previously used Cramér’s V instead of Pearson correlation)
corrPrune: Fixed numeric-ordered pair handling (ordered factors are now converted to numeric before Spearman correlation is computed)
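
The pair-type dispatch these two fixes restore can be illustrated with a small sketch. It is written in plain Python rather than the package's R/C++ code, and every name in it (`pearson`, `ranks`, `spearman`, `association`, the `levels` argument) is hypothetical. It covers only the two pair types named above and assumes ordered variables are converted to integer level codes.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def association(x, y, type_x, type_y, levels=None):
    """Hypothetical dispatcher covering only the two pair types named above."""
    kinds = {type_x, type_y}
    if kinds == {"numeric"}:
        return pearson(x, y)
    if kinds == {"numeric", "ordered"}:
        # Convert the ordered variable to integer level codes first.
        code = {lvl: i for i, lvl in enumerate(levels)}
        if type_x == "ordered":
            x = [code[v] for v in x]
        else:
            y = [code[v] for v in y]
        return spearman(x, y)
    raise NotImplementedError("other pair types are outside this sketch")

dose = [1.0, 2.5, 4.0, 8.0]
severity = ["low", "low", "mid", "high"]
rho = association(dose, severity, "numeric", "ordered",
                  levels=["low", "mid", "high"])
print(round(rho, 3))
```

Other pairings (for example factor-factor) are deliberately left out, since this sketch only mirrors the two cases the fixes mention.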

Test Coverage Improvements

Coverage improved from 92% to 94%:

Added tests for optional package measures (bicor, distance, maximal) with proper skip_if_not_installed() guards
Added tests for lme4 and glmmTMB engines in modelPrune
Added chi-squared edge case tests (sparse contingency tables, NA handling)
Added VIF edge case tests (perfect collinearity, single predictor)
Added lexicographic tie-breaking tests with synthetic correlation structures
Added mixed-type data tests (numeric-ordered, ordered-ordered, factor-factor pairs)
Added condition_number criterion tests
findAllMaxSets.R now at 100% coverage
corrPrune.R now at 97% coverage

corrselect 3.0.7

New Features

corrPrune Enhancements

Grouped pruning: New by parameter computes association matrices per group and aggregates using the group_q quantile (default: 0.5 = median). Useful when correlations vary across experimental conditions or subpopulations.
Additional measures for numeric data:
- bicor: Biweight midcorrelation (requires WGCNA package)
- distance: Distance correlation (requires energy package)
- maximal: Maximal information coefficient (requires minerva package)
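
The grouped aggregation described above can be pictured as an element-wise quantile across per-group association matrices. The following is a minimal Python sketch, not the package's implementation: the helper names `quantile` and `aggregate_groups` are hypothetical, the per-group matrices are made-up numbers, and the linear-interpolation quantile rule is an assumption.

```python
from math import floor

def quantile(values, q):
    """Quantile with linear interpolation between order statistics (assumed rule)."""
    s = sorted(values)
    h = (len(s) - 1) * q
    lo = floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])

def aggregate_groups(group_matrices, group_q=0.5):
    """Combine same-shaped association matrices cell by cell."""
    p = len(group_matrices[0])
    return [
        [quantile([m[i][j] for m in group_matrices], group_q) for j in range(p)]
        for i in range(p)
    ]

# Made-up absolute correlations between two variables in three groups.
per_group = [
    [[1.0, 0.2], [0.2, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
    [[1.0, 0.9], [0.9, 1.0]],
]

median_matrix = aggregate_groups(per_group, group_q=0.5)
strict_matrix = aggregate_groups(per_group, group_q=1.0)
print(median_matrix[0][1], strict_matrix[0][1])  # 0.5 0.9
```

With these numbers, the median treats the pair as moderately associated, while a quantile of 1.0 takes the strongest association seen in any group, which makes pruning more conservative.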

modelPrune Enhancements

Condition number criterion: New criterion = "condition_number" option uses SVD-based condition indices for detecting multicollinearity. Higher values indicate greater collinearity.
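
To make the "higher values indicate greater collinearity" statement concrete, here is a small Python sketch for the special case of two standardized predictors with correlation r. It assumes columns are centered and scaled (which may differ from what the package does), and the function name is hypothetical. In that case the correlation matrix has eigenvalues 1 + |r| and 1 - |r|, the singular values of the scaled design matrix are their square roots, and each condition index is the largest singular value divided by the singular value in question.

```python
from math import inf, sqrt

def condition_indices_two_predictors(r):
    """Condition indices for two standardized predictors with correlation r."""
    eigenvalues = [1 + abs(r), 1 - abs(r)]  # largest first
    largest = eigenvalues[0]
    # Index = sqrt(largest eigenvalue / eigenvalue); infinite when singular.
    return [sqrt(largest / e) if e > 0 else inf for e in eigenvalues]

for r in (0.0, 0.8, 0.99, 1.0):
    print(r, condition_indices_two_predictors(r))
```

The largest index is the condition number. It equals 1 for uncorrelated predictors, is about 3 at r = 0.8, and grows without bound as r approaches 1.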

Tests

Added comprehensive tests for grouped pruning functionality
Added tests for condition_number criterion
Added edge case tests for single-group and insufficient-rows scenarios

corrselect 3.0.4

Test Coverage Improvements

Removed dead C++ code (isValidAddition, isValidCombination) from utils.cpp/utils.h
Added edge case tests for ELS algorithm (force_in validation, threshold boundaries)
Added edge case tests for association methods (Cramér’s V sparse tables, eta edge cases)
Added tests for corrPrune lexicographic tiebreaker and factor handling
Added tests for modelPrune custom engine error handling and VIF edge cases
Test coverage improved from 91.86% to 93.44%

corrselect 3.0.3

JOSS Review Response

This release addresses reviewer feedback from the JOSS submission.

Documentation

paper.md: Strengthened comparison with caret::findCorrelation() to emphasize the key difference (single solution vs. all maximal subsets)
paper.md: Added explicit graph-theoretic context (maximal cliques / independent sets formulation)
paper.md: Clarified that Bron-Kerbosch and ELS algorithms are implemented natively in C++, not as wrappers around igraph
paper.md: Added note about NP-hard complexity and the recommendation to use exact mode only for p ≤ 100
paper.md: Added code snippet demonstrating the “all subsets” output
paper.bib: Added citations for igraph (Csardi & Nepusz, 2006) and FCBF (Yu & Liu, 2003)
README.md: Added CRAN installation instructions (install.packages("corrselect"))
README.md: Fixed mixed model example with suppressWarnings() to hide expected VIF computation warnings
quickstart vignette: Fixed GitHub repository reference (GillesColling → gcol33)

Testing

Added edge case test: identity matrix (all off-diagonals = 0) returns single subset with all variables
Added edge case test: perfect duplicates (r = 1.0) are correctly separated into different subsets
Added threshold boundary test for correlation exactly at threshold
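
The first two edge cases can be reproduced with a brute-force enumeration of maximal subsets. This Python sketch is for illustration only: it is exponential in the number of variables and is not the Bron-Kerbosch or ELS implementation, the name `maximal_subsets` is hypothetical, and treating a correlation exactly at the threshold as acceptable is an assumption.

```python
from itertools import combinations

def maximal_subsets(corr, threshold):
    """All maximal index sets whose pairwise |correlation| stays within threshold."""
    p = len(corr)

    def compatible(subset):
        return all(abs(corr[i][j]) <= threshold for i, j in combinations(subset, 2))

    valid = [
        set(s)
        for size in range(1, p + 1)
        for s in combinations(range(p), size)
        if compatible(s)
    ]
    # Keep only sets that are not strictly contained in another valid set.
    return [sorted(s) for s in valid if not any(s < other for other in valid)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
duplicates = [[1.0, 1.0, 0.2], [1.0, 1.0, 0.2], [0.2, 0.2, 1.0]]

print(maximal_subsets(identity, 0.7))    # [[0, 1, 2]]
print(maximal_subsets(duplicates, 0.7))  # [[0, 2], [1, 2]]
```

With an identity matrix nothing conflicts, so the only maximal subset contains every variable. With two perfect duplicates, each duplicate ends up in its own subset alongside the unrelated third variable.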

Infrastructure

Added GitHub Actions workflow for cross-platform R CMD check (Ubuntu, macOS, Windows)
Added GitHub Actions workflow for test coverage reporting
Updated .gitignore to exclude build artifacts (*.Rcheck/, *.tar.gz, CRAN-SUBMISSION)

corrselect 3.0.2

CRAN release: 2025-11-29

CRAN Compliance

Single-quoted software names in DESCRIPTION (‘lme4’, ‘glmmTMB’) per CRAN policy

Documentation

Updated vignettes with improved examples and workflows

corrselect 3.0.1

Bug Fixes

modelPrune(): Fixed infinite loop when VIF computation encountered perfect multicollinearity
- Added proper handling of Inf and NA VIF values in pruning loop
- Clamped extreme R² values (> 0.9999) to prevent division by near-zero
- Added safety checks to prevent removing all variables
modelPrune(): Fixed design matrix extraction for lme4 and glmmTMB engines
- Now uses stats::model.matrix() for all engines (more robust)
- Eliminated “Could not find columns” warnings
Test suite: All 261 tests pass with zero warnings (CRAN-compliant)
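
The VIF safeguards listed above can be sketched as follows. This is a hedged Python illustration, not the package's R code: `vif_from_r2`, `prune_by_vif`, the `limit` default, and the toy VIF function are all hypothetical. It uses the standard relation VIF = 1 / (1 - R²), clamps R² at 0.9999, treats NaN as infinite, and stops before the last variable would be removed.

```python
from math import inf, isnan

def vif_from_r2(r2, cap=0.9999):
    """VIF = 1 / (1 - R^2), with R^2 clamped to avoid dividing by ~0."""
    return 1.0 / (1.0 - min(r2, cap))

def prune_by_vif(names, vif_fn, limit=5.0, force_in=()):
    """Drop the worst predictor until all VIFs are within limit.

    vif_fn(kept) must return a dict {name: VIF} for the current predictors.
    """
    kept = list(names)
    while len(kept) > 1:  # never remove the last remaining variable
        vifs = vif_fn(kept)
        # Treat undefined VIFs as infinitely bad; skip protected variables.
        scores = {v: (inf if isnan(s) else s)
                  for v, s in vifs.items() if v not in force_in}
        if not scores:
            break
        worst = max(scores, key=scores.get)
        if scores[worst] <= limit:
            break
        kept.remove(worst)  # each pass removes one variable, so the loop ends
    return kept

def toy_vifs(kept):
    """Toy stand-in: 'a' and 'b' are perfect duplicates while both are present."""
    collinear = "a" in kept and "b" in kept
    return {v: (inf if collinear and v in ("a", "b") else 1.2) for v in kept}

print(prune_by_vif(["a", "b", "c"], toy_vifs))                   # ['b', 'c']
print(prune_by_vif(["a", "b", "c"], toy_vifs, force_in=("a",)))  # ['a', 'c']
print(round(vif_from_r2(1.0)))                                   # 10000
```

Because every pass either removes a variable or exits, an infinite or undefined VIF can no longer stall the loop in this sketch.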

corrselect 3.0.0

Major Release: Predictor Pruning Toolkit

Version 3.0.0 represents a major expansion of corrselect from a specialized subset enumeration tool into a comprehensive predictor pruning toolkit. It is fully backward compatible with 2.x: all existing code continues to work.

Major Features

New Functions

corrPrune(): High-level association-based predictor pruning
- Model-free pruning using pairwise correlations or associations
- Automatic measure selection (measure = "auto")
- Supports exact mode (small p), greedy mode (large p), or auto-selection
- force_in parameter to protect important predictors
- Returns single pruned data.frame with pairwise associations ≤ threshold
modelPrune(): Model-based predictor pruning using diagnostics
- VIF-based iterative removal of multicollinear predictors
- Supports multiple engines: lm, glm, lme4, glmmTMB
- Custom engine support: Define your own modeling backends (INLA, mgcv, brms, etc.)
- Prunes fixed effects only (preserves random effects in mixed models)
- force_in parameter for protecting important variables
- Returns pruned data.frame with final fitted model

New C++ Backend

Fast deterministic greedy pruning algorithm
- Polynomial-time complexity O(p² × k) vs exponential for exact search
- Handles p > 100 efficiently
- Deterministic tie-breaking for reproducibility
- Used by corrPrune(mode = "greedy") and mode = "auto"
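
For readers unfamiliar with greedy pruning, the sketch below shows one common deterministic scheme in Python. It is not the package's C++ algorithm and makes no claim about its complexity: the function name `greedy_prune`, the removal rule (most threshold violations first), and the tie-breaking order are all assumptions chosen for illustration.

```python
def greedy_prune(assoc, threshold, force_in=()):
    """Remove variables one at a time until no pair exceeds the threshold.

    assoc is a symmetric matrix of associations; returns kept indices.
    """
    kept = list(range(len(assoc)))
    protected = set(force_in)
    while True:
        # Number of over-threshold partners for each remaining variable.
        counts = {
            i: sum(1 for j in kept if j != i and abs(assoc[i][j]) > threshold)
            for i in kept
        }
        if not any(counts.values()):
            return kept
        candidates = [i for i in kept if counts[i] > 0 and i not in protected]
        if not candidates:
            raise ValueError("force_in variables violate the threshold among themselves")
        # Most violations first, then largest total association, then lowest index.
        worst = max(
            candidates,
            key=lambda i: (counts[i],
                           sum(abs(assoc[i][j]) for j in kept if j != i),
                           -i),
        )
        kept.remove(worst)

assoc = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.1],
    [0.1, 0.8, 1.0, 0.3],
    [0.2, 0.1, 0.3, 1.0],
]
print(greedy_prune(assoc, 0.7))                  # [0, 2, 3]
print(greedy_prune(assoc, 0.7, force_in=(1,)))   # [1, 3]
```

The fixed tie-breaking key is what makes repeated runs return the same subset. Protecting variable 1 changes the outcome because both of its highly associated partners must then be dropped instead.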

Enhancements

Exact methods (corrSelect(), assocSelect()) now integrate seamlessly with corrPrune()
Deterministic subset selection when multiple maximal sets exist
Improved error messages for threshold feasibility checks
Better handling of edge cases (single predictor, all correlated, etc.)
Custom engine interface for modelPrune(): Users can define custom modeling backends with fit and diagnostics functions, enabling integration with any R modeling package

Documentation

Five new comprehensive vignettes (~60 minutes of content):
- Quick Start: 5-minute introduction to corrPrune() and modelPrune()
- Complete Workflows: Real-world examples across 4 domains (ecology, social science, genomics, clinical)
- Comparison with Alternatives: When to choose corrselect vs caret, Boruta, glmnet
- Performance Benchmarks: Timing comparisons, scalability tests, and optimization guidelines
- Advanced Topics: Algorithms, custom engines (INLA, mgcv), performance optimization, troubleshooting
Four new example datasets with full documentation (bioclim, survey, genes, longitudinal)
Updated README with quickstart examples and custom engine support
Full documentation for corrPrune() and modelPrune()
Usage examples for all modeling engines

Package Changes

Added lme4 and glmmTMB to Suggests (required for respective engines)
Version bumped to 3.0.0 (major feature release)
Updated package description to reflect expanded pruning functionality

Notes

No breaking changes: Version 3.0.0 is fully backward compatible with 2.0.1
For large predictor sets (p > 20), use corrPrune(mode = "auto") for best performance
Mixed model engines require optional packages: install with install.packages(c("lme4", "glmmTMB"))

corrselect 2.0.1

CRAN release: 2025-09-08

Bug Fixes

force_in in MatSelect() now correctly accepts character column names.
The els method now correctly lists all valid subsets when a single variable is forced in.
corrSelect() now displays an appropriate warning if only one variable remains after dropping unsupported columns.
Association matrix construction in assocSelect() now safely falls back to 0 for failed or meaningless associations (e.g. empty chi-squared tables due to sparse combinations or unused factor levels).

Features Added

assocSelect() now supports logical columns by automatically converting them to factors.

corrselect 2.0.0

Major Release: Mixed-Type Association Selection

Version 2.0.0 introduces support for mixed-type data through the new assocSelect() function, enabling subset selection on datasets containing numeric, factor, and ordered variables.

Major Features

assocSelect(): New data frame interface for mixed-type data
- Handles numeric, factor, and ordered variables
- Automatic association measure selection based on variable pair types
- Supports Pearson, Spearman, Kendall correlations
- Computes Eta-squared for numeric-factor pairs
- Computes Cramér’s V for factor-factor pairs
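
For reference, the two mixed-type measures named above can be written out from their textbook definitions. The Python sketch below is illustrative and not the package's code: the function names are hypothetical, no bias correction is applied, and returning 0 for degenerate inputs is a choice made for this sketch.

```python
from collections import Counter
from math import sqrt
from statistics import mean

def eta_squared(values, groups):
    """Share of the variance in values explained by group membership."""
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    return ss_between / ss_total if ss_total > 0 else 0.0

def cramers_v(x, y):
    """Uncorrected Cramer's V from the chi-squared statistic of the x-by-y table."""
    n = len(x)
    rows, cols, cells = Counter(x), Counter(y), Counter(zip(x, y))
    k = min(len(rows), len(cols)) - 1
    if k < 1:
        return 0.0  # a single-level factor carries no association
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))

height = [1.0, 1.0, 3.0, 3.0]
site = ["g", "g", "h", "h"]
soil = ["u", "u", "v", "v"]
aspect = ["n", "s", "n", "s"]

print(eta_squared(height, site))  # 1.0
print(cramers_v(site, soil))      # 1.0
print(cramers_v(site, aspect))    # 0.0
```

Both measures lie between 0 and 1, which is what lets them share a single threshold with absolute correlations.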

Enhancements

Improved algorithm selection logic
Better handling of edge cases in subset enumeration
Enhanced documentation with examples for mixed-type workflows