Changelog
couplr 1.3.0
New Features
Optimal Full Matching
- full_match() gains method = "optimal" (new default), using a min-cost max-flow solver (Dijkstra + Johnson potentials) that finds the globally optimal group assignment minimizing total distance:
  - Standard lower-bound transformation enforces min_controls per group
  - Automatic transposition when n_left > n_right
  - New C++ solver: solve_full_matching.cpp (self-contained MCMF)
- method = "greedy" preserved for fast approximate matching
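The optimal method can be pictured as a flow network: source → left units → right units → sink, where each right unit carries one unit of flow and edge costs are distances. The Python sketch below is an illustration only, not couplr's C++ solver or its API: it runs successive shortest paths with Dijkstra on reduced costs kept non-negative by Johnson potentials. Two assumptions to flag: min_controls is enforced with a big-M penalty on "extra" slots, a simpler substitute for the standard lower-bound transformation named in the entry, and max_controls and transposition are omitted. MinCostFlow and optimal_full_match are hypothetical names.

```python
import heapq


class MinCostFlow:
    """Successive shortest paths; Dijkstra runs on reduced costs kept
    non-negative by Johnson-style node potentials."""

    def __init__(self, n):
        self.graph = [[] for _ in range(n)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def flow(self, s, t, max_flow):
        n = len(self.graph)
        potential = [0] * n  # valid start because every original edge cost is >= 0
        total_flow = total_cost = 0
        while total_flow < max_flow:
            dist, prev = [float("inf")] * n, [None] * n
            dist[s] = 0
            heap = [(0, s)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist[u]:
                    continue
                for idx, (v, cap, cost, _) in enumerate(self.graph[u]):
                    nd = d + cost + potential[u] - potential[v]  # reduced cost
                    if cap > 0 and nd < dist[v]:
                        dist[v], prev[v] = nd, (u, idx)
                        heapq.heappush(heap, (nd, v))
            if dist[t] == float("inf"):
                break  # no augmenting path remains
            for v in range(n):
                if dist[v] < float("inf"):
                    potential[v] += dist[v]
            push, v = max_flow - total_flow, t
            while v != s:  # bottleneck capacity along the shortest path
                u, idx = prev[v]
                push = min(push, self.graph[u][idx][1])
                v = u
            v = t
            while v != s:  # augment and update residual capacities
                u, idx = prev[v]
                edge = self.graph[u][idx]
                edge[1] -= push
                self.graph[v][edge[3]][1] += push
                v = u
            total_flow += push
            total_cost += push * (potential[t] - potential[s])
        return total_flow, total_cost


def optimal_full_match(dist, min_controls=1):
    """Assign every right unit to one left unit so each left unit gets at least
    min_controls right units, minimizing total distance (dist must be >= 0)."""
    n_left, n_right = len(dist), len(dist[0])
    s, t = 0, n_left + n_right + 1
    big_m = sum(map(sum, dist)) + 1  # exceeds any achievable total distance
    mcf = MinCostFlow(n_left + n_right + 2)
    for i in range(n_left):
        # Assumption: big-M penalty instead of the lower-bound transformation.
        mcf.add_edge(s, 1 + i, min_controls, 0)  # "mandatory" slots are free
        mcf.add_edge(s, 1 + i, n_right, big_m)   # extra slots are heavily penalized
        for j in range(n_right):
            mcf.add_edge(1 + i, 1 + n_left + j, 1, dist[i][j])
    for j in range(n_right):
        mcf.add_edge(1 + n_left + j, t, 1, 0)
    mcf.flow(s, t, n_right)
    # A left -> right edge with no capacity left carried flow, i.e. is a match.
    groups = {i: [v - 1 - n_left for v, cap, _, _ in mcf.graph[1 + i]
                  if v > n_left and cap == 0]
              for i in range(n_left)}
    total = sum(dist[i][j] for i, js in groups.items() for j in js)
    return groups, total
```

With costs [[1, 1, 1], [5, 5, 5]], an unconstrained solve gives all three right units to left unit 0 (total 3); min_controls = 1 forces one of them onto left unit 1, for a total of 7.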
Vignette Updates
- Getting Started: Added a full matching section with a full_match() example
- Matching Workflows: New “Full Matching (Variable-Ratio Groups)” section covering optimal vs. greedy, constraints, weights, and a comparison table
- Comparison: Updated the feature table and all sections to reflect couplr’s full matching support (previously listed as “No”)
couplr 1.2.0
New Features
Full Matching
- New full_match() function assigns every unit to a matched group with variable ratios (1:k or k:1):
  - Greedy group formation: match each left unit to its nearest right unit, then assign the remaining right units to the nearest matched left unit
  - Caliper support: caliper (absolute) or caliper_sd (SD-based)
  - Control group size constraints: min_controls, max_controls
  - Weights inversely proportional to group size
  - Returns a full_matching_result S3 object
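A minimal Python sketch of the two-pass greedy formation described above. It assumes each right unit can seed at most one group in the first pass and that a right unit's weight is simply 1 / (number of right units in its group); couplr's actual weighting and caliper handling may differ. greedy_full_match and group_weights are hypothetical names.

```python
def greedy_full_match(dist):
    """dist[i][j]: distance between left unit i and right unit j.
    Returns {left index: [right indices in its group]}."""
    n_left, n_right = len(dist), len(dist[0])
    groups, taken = {}, set()
    # Pass 1: each left unit claims its nearest right unit that is still free.
    for i in range(n_left):
        free = [j for j in range(n_right) if j not in taken]
        if not free:
            break
        j = min(free, key=lambda j: dist[i][j])
        groups[i] = [j]
        taken.add(j)
    # Pass 2: every leftover right unit joins the closest already-matched left unit.
    for j in range(n_right):
        if j not in taken:
            i = min(groups, key=lambda i: dist[i][j])
            groups[i].append(j)
    return groups


def group_weights(groups):
    """Assumed weighting: each right unit gets 1 / (size of its group's right side)."""
    return {j: 1.0 / len(members) for members in groups.values() for j in members}
```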
Coarsened Exact Matching (CEM)
- New cem_match() function implements coarsened exact matching:
  - Coarsens continuous variables into bins (Sturges, FD, Scott, or custom)
  - Exact matching on coarsened values with stratum-based weights
  - Support for categorical grouping variables via the grouping parameter
  - Custom cutpoints per variable via the cutpoints parameter
  - Returns a cem_result S3 object with matched units and a strata summary
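To make the coarsen-then-match idea concrete, here is a small Python sketch, not couplr's implementation. It assumes equal-width bins from Sturges' rule (ceil(log2 n) + 1 bins), drops strata that lack either group, and uses the usual CEM control weight (m_C / m_T) × (s_T / s_C), where m counts matched units overall and s counts them within the stratum. The grouping and cutpoints options are not modeled; sturges_cutpoints and cem_weights are hypothetical names.

```python
import bisect
import math
from collections import defaultdict


def sturges_cutpoints(values):
    """Interior cutpoints of equal-width bins; Sturges' rule uses ceil(log2(n)) + 1 bins."""
    n_bins = math.ceil(math.log2(len(values))) + 1
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [lo + k * width for k in range(1, n_bins)]


def cem_weights(units, covariates):
    """units: dicts with a boolean 'treated' key plus numeric covariates.
    Returns {unit index: weight} for units in strata holding both groups."""
    cuts = {c: sturges_cutpoints([u[c] for u in units]) for c in covariates}
    strata = defaultdict(list)
    for idx, u in enumerate(units):
        # The stratum key is the tuple of bin indices, one per covariate.
        key = tuple(bisect.bisect_right(cuts[c], u[c]) for c in covariates)
        strata[key].append(idx)
    # Exact matching on the coarsened key: keep strata with treated AND control units.
    kept = [m for m in strata.values()
            if 0 < sum(units[i]["treated"] for i in m) < len(m)]
    m_t = sum(units[i]["treated"] for m in kept for i in m)
    m_c = sum(len(m) for m in kept) - m_t
    weights = {}
    for m in kept:
        s_t = sum(units[i]["treated"] for i in m)
        s_c = len(m) - s_t
        for i in m:
            weights[i] = 1.0 if units[i]["treated"] else (m_c / m_t) * (s_t / s_c)
    return weights
```

With treated ages 20, 21, 70 and control ages 22, 23, 24, 72, the three young controls share a stratum with two treated units and get weight 8/9 each, while the lone older control gets 4/3, so the control weights sum to the number of matched controls (4).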
Subclassification
- New subclass_match() function divides units into propensity score strata:
  - Quantile-based stratification with a configurable number of subclasses
  - Supports pre-computed PS, pre-fitted models, or a formula interface
  - Target estimands: ATT, ATE, ATC with appropriate weighting
  - Returns a subclass_result S3 object with a subclass summary
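The sketch below illustrates quantile subclassification and the three estimand-specific weights in Python. It is an assumption-laden illustration rather than couplr's code: subclasses are equal-count cuts of the pooled propensity-score ranking, every subclass is assumed to contain both groups, and the weights are left unnormalized. subclassify and subclass_weights are hypothetical names.

```python
from collections import defaultdict


def subclassify(ps, n_sub):
    """Rank units by propensity score and cut the ranking into n_sub equal-count subclasses."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    sub = [0] * len(ps)
    for rank, i in enumerate(order):
        sub[i] = rank * n_sub // len(ps)
    return sub


def subclass_weights(sub, treated, estimand="ATT"):
    """Stratification weights; assumes every subclass contains both groups."""
    counts = defaultdict(lambda: [0, 0])  # subclass -> [n_control, n_treated]
    for s, t in zip(sub, treated):
        counts[s][int(t)] += 1
    weights = []
    for s, t in zip(sub, treated):
        n_c, n_t = counts[s]
        if estimand == "ATT":    # reweight controls to the treated distribution
            weights.append(1.0 if t else n_t / n_c)
        elif estimand == "ATC":  # reweight treated units to the control distribution
            weights.append(n_c / n_t if t else 1.0)
        else:                    # "ATE": reweight both groups to the whole subclass
            weights.append((n_c + n_t) / (n_t if t else n_c))
    return weights
```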
Output Layer & Ecosystem Integration
- New match_data() generic converts any couplr result to an analysis-ready format with treatment, weights, subclass, and distance columns. Methods are provided for all result types (matching, full, CEM, subclass).
- New as_matchit() converter creates matchit-class objects from couplr results, enabling interop with cobalt, marginaleffects, and other MatchIt ecosystem packages.
- cobalt bal.tab() methods for all couplr result types. Requires the cobalt package (in Suggests).
Mahalanobis Distance Improvements
- Robust singularity check using rcond() instead of the fragile det() == 0 test
- Custom sigma parameter in match_couples(), greedy_couples(), and compute_distance_matrix() for user-supplied covariance matrices
- Vectorized computation replacing nested R for-loops, for a ~10x speedup
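The Python/NumPy sketch below shows both ideas in one function: a reciprocal-condition-number test in the spirit of R's rcond() (NumPy has no rcond(), so 1 / np.linalg.cond(sigma, 1) stands in), and a broadcast einsum that computes all pairwise distances without explicit loops. The pooled-covariance default and the 1e-10 tolerance are assumptions, not couplr's settings; mahalanobis_matrix is a hypothetical name.

```python
import numpy as np


def mahalanobis_matrix(X, Y, sigma=None, rcond_tol=1e-10):
    """All pairwise Mahalanobis distances between rows of X (left) and rows of Y (right)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    if sigma is None:
        # Assumption: covariance pooled over both groups' rows.
        sigma = np.cov(np.vstack([X, Y]), rowvar=False)
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    # Reciprocal 1-norm condition number, similar in spirit to R's rcond().
    # det() is a poor singularity test: its size scales with the units of the data.
    rc = 1.0 / np.linalg.cond(sigma, 1)
    if not np.isfinite(rc) or rc < rcond_tol:
        raise ValueError("covariance matrix is numerically singular")
    inv = np.linalg.inv(sigma)
    diff = X[:, None, :] - Y[None, :, :]  # shape (n_left, n_right, p)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off
```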
S3 Generics
- balance_diagnostics() and join_matched() are now S3 generics with methods for all result types. Existing code is 100% backward-compatible.
New Functions
- full_match() - Variable-ratio full matching
- cem_match() - Coarsened exact matching
- subclass_match() - Propensity score subclassification
- match_data() - Unified analysis-ready output
- as_matchit() - Convert to MatchIt format
couplr 1.1.0
CRAN release: 2026-03-03
New Features
Ratio and Replacement Matching
- k:1 ratio matching via the ratio parameter in match_couples() and greedy_couples(). Matches k control units to each treated unit by replicating the cost matrix, then deduplicates assignments.
- With-replacement matching via the replace parameter. Each treated unit independently selects its nearest control, allowing controls to be reused across multiple treated units.
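The replication trick and the with-replacement rule can be sketched in a few lines of Python. The exact assignment here is a brute-force search over permutations, a stand-in for couplr's LAP solvers that only works on tiny inputs, and folding row copies back onto their treated unit is one reading of "deduplicates assignments". All function names are hypothetical.

```python
from itertools import permutations


def brute_force_lap(cost):
    """Exact min-cost assignment of each row to a distinct column (rows <= columns).
    Exhaustive search: only for tiny illustrative inputs."""
    n_rows, n_cols = len(cost), len(cost[0])
    best = min(permutations(range(n_cols), n_rows),
               key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)))
    return list(best)


def ratio_match(cost, k):
    """k:1 without replacement: stack k copies of each treated row, solve, fold back."""
    expanded = [row for row in cost for _ in range(k)]  # row i*k + c = copy c of treated i
    matches = {}
    for r, c in enumerate(brute_force_lap(expanded)):
        matches.setdefault(r // k, []).append(c)
    return {i: sorted(cols) for i, cols in matches.items()}


def replacement_match(cost):
    """1:1 with replacement: each treated row independently takes its nearest control."""
    return [min(range(len(row)), key=row.__getitem__) for row in cost]
```

Because every copy of a treated row must take a distinct column, the replicated problem can never hand the same control to two treated units; with replacement, both treated rows of [[1, 5], [2, 6]] pick control 0.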
Propensity Score Matching
- New ps_match() function wraps match_couples() with logistic regression:
  - Accepts a formula or a pre-fitted glm object
  - Matches on the logit of the propensity scores with a caliper
  - Default caliper: 0.2 SD of logit(PS) (Rosenbaum and Rubin recommendation)
  - Returns a matching_result with PS model metadata
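A Python sketch of matching on the logit scale with the default 0.2-SD caliper. It assumes the propensity scores are already estimated (no model fitting) and uses greedy nearest-neighbor matching for brevity, whereas ps_match() is described above as wrapping the optimal match_couples(). logit_caliper and greedy_ps_match are hypothetical names.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def logit_caliper(ps, width=0.2):
    """width * SD of logit(PS); statistics.stdev uses the n - 1 denominator, like R's sd()."""
    return width * statistics.stdev([logit(p) for p in ps])


def greedy_ps_match(ps_treated, ps_control, width=0.2):
    """Greedy 1:1 matching on logit(PS); pairs farther apart than the caliper are forbidden."""
    cal = logit_caliper(ps_treated + ps_control, width)
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    used, pairs = set(), []
    for i, x in enumerate(lt):
        cands = [j for j in range(len(lc)) if j not in used and abs(x - lc[j]) <= cal]
        if cands:  # treated units with no control inside the caliper stay unmatched
            j = min(cands, key=lambda j: abs(x - lc[j]))
            used.add(j)
            pairs.append((i, j))
    return pairs, cal
```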
Cardinality Matching
- New cardinality_match() function maximizes sample size subject to balance constraints:
  - Starts with a full optimal match, then iteratively prunes imbalanced pairs
  - Balance threshold via max_std_diff (default: 0.1, i.e. excellent balance)
  - Configurable pruning speed with batch_fraction
  - Returns pruning diagnostics: iterations, pairs removed, final balance
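One way such a pruning loop can work, sketched in Python for a single covariate: compute the standardized difference and, while it exceeds max_std_diff, drop a batch_fraction share of the pairs whose within-pair gaps push hardest in the direction of the imbalance. The pruning rule is an assumption for illustration; couplr's criterion for choosing which pairs to drop may differ, and prune_pairs is a hypothetical name.

```python
import math
import statistics


def std_diff(treated, control):
    """(mean_t - mean_c) / pooled SD, with pooled SD = sqrt((var_t + var_c) / 2)."""
    diff = statistics.mean(treated) - statistics.mean(control)
    pooled = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    if pooled == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled


def prune_pairs(pairs, max_std_diff=0.1, batch_fraction=0.1):
    """pairs: (x_treated, x_control) values of one covariate for each matched pair.
    Drops batches of the most imbalance-driving pairs until |std diff| <= max_std_diff."""
    pairs = list(pairs)
    iterations = 0
    while len(pairs) > 2:  # statistics.variance needs at least two values per group
        d = std_diff([t for t, _ in pairs], [c for _, c in pairs])
        if abs(d) <= max_std_diff:
            break
        n_drop = max(1, int(len(pairs) * batch_fraction))
        # Pairs with the largest gaps in the direction of the imbalance go first.
        pairs.sort(key=lambda p: (p[0] - p[1]) * math.copysign(1, d), reverse=True)
        pairs = pairs[n_drop:]
        iterations += 1
    return pairs, iterations
```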
Sensitivity Analysis
- New sensitivity_analysis() function implements Rosenbaum bounds.
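Rosenbaum bounds ask how large a hidden bias Γ would have to be before a matched-pairs test stops being significant. The Python sketch below uses the sign test, the simplest case, because its bound has a closed form: under bias Γ each informative pair favors treatment with probability at most Γ / (1 + Γ), so the upper p-value is a binomial tail. This is a swapped-in illustration; couplr may use a different test statistic (for example, the Wilcoxon signed-rank test), and the function names are hypothetical.

```python
from math import comb


def sign_test_upper_p(diffs, gamma):
    """Rosenbaum upper bound on the one-sided sign-test p-value at hidden bias gamma >= 1.
    diffs: treated-minus-control outcome differences, one per matched pair."""
    nonzero = [d for d in diffs if d != 0]  # tied pairs carry no sign information
    n = len(nonzero)
    t = sum(d > 0 for d in nonzero)         # pairs where the treated unit did better
    p_plus = gamma / (1.0 + gamma)          # worst-case chance a pair favors treatment
    return sum(comb(n, k) * p_plus**k * (1 - p_plus)**(n - k) for k in range(t, n + 1))


def sensitivity_curve(diffs, gammas):
    """(gamma, upper p-value) pairs, i.e. the data behind a gamma vs. p-value plot."""
    return [(g, sign_test_upper_p(diffs, g)) for g in gammas]
```

With 9 of 10 informative pairs favoring treatment, the bound is 11/1024 ≈ 0.011 at Γ = 1 (no hidden bias) but about 0.104 at Γ = 2; the Γ at which the curve first crosses 0.05 summarizes how robust the result is.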
Visualization
- autoplot() methods for ggplot2-based visualizations (requires ggplot2):
  - autoplot.matching_result(): histogram, density, or ECDF of distances
  - autoplot.balance_diagnostics(): love plot, histogram, or variance ratio plot
  - autoplot.sensitivity_analysis(): gamma vs. p-value curve
- Enhanced summary.matching_result() now reports the match rate and distance percentiles
New Functions
- ps_match() - Propensity score matching with logit caliper
- cardinality_match() - Balance-constrained cardinality matching
- sensitivity_analysis() - Rosenbaum bounds sensitivity analysis
couplr 1.0.7
Bug Fixes
- Fixed undefined behavior (UB) in the Gabow-Tarjan algorithm: replaced left bit-shifts of potentially negative values with multiplication to avoid sanitizer errors in M1-SAN checks
- Fixed a namespace conflict with select() in vignettes by using an explicit dplyr::select() to prevent masking by MASS or other packages
couplr 1.0.6
CRAN release: 2026-01-20
Documentation
- Added Overview section to algorithms vignette with audience and prerequisites
- Fixed workflow diagram dark mode text handling in matching-workflows vignette
- Improved SVG theme-awareness for multi-line text labels
- Removed grid lines from matching-workflows plots for cleaner appearance
- Added threshold labels to balance comparison plot
couplr 1.0.0
Major New Features (2025-11-19 Update)
Automatic Preprocessing and Scaling
The package now includes automatic preprocessing to improve matching quality:
- New auto_scale parameter in match_couples() and greedy_couples() enables automatic preprocessing
- Variable health checks detect and handle problematic variables:
  - Constant columns (SD = 0) are automatically excluded with warnings
  - High missingness (>50%) triggers warnings
  - Extreme skewness (|skewness| > 2) is flagged
- Smart scaling method selection analyzes the data and recommends:
  - “robust” scaling using median and MAD (resistant to outliers)
  - “standardize” for traditional mean-centering and SD scaling
  - “range” for min-max normalization
- New preprocess_matching_vars() function for manual preprocessing control
- Categorical variable encoding for binary and ordered factors
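The three scaling methods and the health checks can be sketched in Python as follows. The 1.4826 MAD constant mirrors R's mad() default, and the moment-based skewness estimator is an assumption; couplr may compute skewness differently. scale, skewness, and health_flags are hypothetical names.

```python
import statistics


def scale(values, method):
    """Rescale one numeric variable with one of three scaling methods."""
    if method == "robust":
        med = statistics.median(values)
        # 1.4826 makes the MAD consistent with the SD for normal data (R's mad() default).
        mad = 1.4826 * statistics.median(abs(v - med) for v in values)
        return [(v - med) / mad for v in values]
    if method == "standardize":
        mu, sd = statistics.mean(values), statistics.stdev(values)
        return [(v - mu) / sd for v in values]
    if method == "range":
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    raise ValueError(f"unknown scaling method: {method}")


def skewness(values):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mu = statistics.mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def health_flags(values):
    """values may contain None for missing entries; returns a list of flag labels."""
    present = [v for v in values if v is not None]
    flags = []
    if len(present) < 0.5 * len(values):  # more than 50% missing
        flags.append("high missingness")
    if len(set(present)) <= 1:            # SD = 0: the variable would be excluded
        flags.append("constant")
    elif abs(skewness(present)) > 2:
        flags.append("extreme skewness")
    return flags
```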
Balance Diagnostics
Comprehensive tools to assess matching quality:
- New balance_diagnostics() function computes multiple balance metrics:
  - Standardized differences: (mean_left - mean_right) / pooled_sd
  - Variance ratios: SD_left / SD_right
  - Kolmogorov-Smirnov tests for distribution comparison
  - Overall balance metrics (mean, max, % large imbalance)
- Quality thresholds with interpretation:
  - |Std Diff| < 0.10: Excellent balance
  - |Std Diff| 0.10-0.25: Good balance
  - |Std Diff| 0.25-0.50: Acceptable balance
  - |Std Diff| > 0.50: Poor balance
- Per-block statistics with quality ratings when blocking is used
- balance_table() creates publication-ready formatted tables
- Informative print methods with interpretation guides
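A Python sketch of the main metrics and the rating thresholds listed above. The pooled SD is taken as sqrt((var_left + var_right) / 2), a common convention but an assumption here; the KS statistic is computed directly from the two empirical CDFs (no p-value); and boundary values such as exactly 0.25 are assigned to the better band by assumption. The function names are hypothetical.

```python
import math
import statistics


def std_diff(left, right):
    """(mean_left - mean_right) / pooled_sd, pooled_sd = sqrt((var_left + var_right) / 2)."""
    pooled_sd = math.sqrt((statistics.variance(left) + statistics.variance(right)) / 2)
    return (statistics.mean(left) - statistics.mean(right)) / pooled_sd


def ks_statistic(left, right):
    """Two-sample Kolmogorov-Smirnov D: the largest gap between the empirical CDFs."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    # The gap can only change at observed values, so checking those is enough.
    return max(abs(ecdf(left, x) - ecdf(right, x)) for x in set(left) | set(right))


def rating(d):
    """Map a standardized difference to the quality bands."""
    d = abs(d)
    if d < 0.10:
        return "Excellent"
    if d <= 0.25:
        return "Good"
    if d <= 0.50:
        return "Acceptable"
    return "Poor"
```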
Joined Matched Dataset Output
Create analysis-ready datasets directly from matching results:
- New join_matched() function automates data preparation:
  - Joins matched pairs with the original left and right datasets
  - Eliminates manual data wrangling after matching
  - Select specific variables via the left_vars and right_vars parameters
  - Customizable suffixes (default: _left, _right) for overlapping columns
  - Optional metadata: pair_id, distance, block_id
  - Works with both optimal and greedy matching
- Broom-style augment() method for tidymodels integration:
  - S3 method following broom package conventions
  - Sensible defaults for quick exploration
  - Supports all join_matched() parameters
- Flexible output control:
  - include_distance - Include/exclude matching distance
  - include_pair_id - Include/exclude sequential pair IDs
  - include_block_id - Include/exclude block identifiers
  - Custom ID column support via left_id and right_id
  - Clean column ordering: pair_id → IDs → distance → block → variables
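The core of a join like this can be sketched in Python with plain dictionaries: one output row per pair, metadata columns first, and the suffixes applied only to columns present on both sides. This is an illustration, not couplr's R code; join_pairs and its input shapes are hypothetical.

```python
def join_pairs(pairs, left, right, suffixes=("_left", "_right")):
    """pairs: list of (left_id, right_id, distance); left/right: {id: {column: value}}.
    Returns one dict per pair, ordered pair_id -> IDs -> distance -> variables."""
    rows = []
    for pair_id, (lid, rid, dist) in enumerate(pairs, start=1):
        row = {"pair_id": pair_id, "left_id": lid, "right_id": rid, "distance": dist}
        lrow, rrow = left[lid], right[rid]
        shared = lrow.keys() & rrow.keys()  # only overlapping columns get suffixes
        for col, val in lrow.items():
            row[col + suffixes[0] if col in shared else col] = val
        for col, val in rrow.items():
            row[col + suffixes[1] if col in shared else col] = val
        rows.append(row)
    return rows
```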
Precomputed and Reusable Distances
Performance optimization for exploring multiple matching strategies:
- New compute_distances() function precomputes and caches distance matrices:
  - Compute distances once, reuse them across multiple matching operations
  - Store complete metadata: variables, distance metric, scaling method, timestamps
  - Preserve the original datasets for seamless integration with join_matched()
  - Enable rapid exploration of different matching parameters
  - Performance improvement: ~60% faster when trying multiple matching strategies
- Distance objects (S3 class distance_object):
  - Self-contained: cost matrix, IDs, metadata, original data
  - Works with both match_couples() and greedy_couples()
  - Pass as the first argument instead of datasets: match_couples(dist_obj, max_distance = 5)
  - Informative print and summary methods with distance statistics
- Constraint modification via update_constraints():
  - Apply new max_distance or calipers without recomputing distances
  - Creates a new distance object following copy-on-modify semantics
  - Experiment with different constraints efficiently
- Backward-compatible integration:
  - Modified function signatures: match_couples(left, right = NULL, vars = NULL, ...)
  - Automatically detects distance objects vs. datasets
  - All existing code continues to work unchanged
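A Python sketch of the copy-on-modify pattern behind update_constraints(): the cached cost matrix lives in an immutable object, and changing a constraint returns a new object that shares the same matrix instead of recomputing or mutating it. DistanceObject, with_max_distance, and allowed_pairs are hypothetical stand-ins, not couplr's R API.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DistanceObject:
    cost: tuple        # tuple of row tuples: immutable, so copies can share it safely
    left_ids: tuple
    right_ids: tuple
    max_distance: float = math.inf


def with_max_distance(obj, max_distance):
    """Copy-on-modify: a new object; the cached matrix is neither recomputed nor mutated."""
    return replace(obj, max_distance=max_distance)


def allowed_pairs(obj):
    """(left_id, right_id) pairs that satisfy the object's max_distance constraint."""
    return [(lid, rid)
            for lid, row in zip(obj.left_ids, obj.cost)
            for rid, d in zip(obj.right_ids, row)
            if d <= obj.max_distance]
```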
Parallel Processing
Speed up blocked matching with multi-core processing:
- New parallel parameter in match_couples() and greedy_couples():
  - Enable with parallel = TRUE for automatic configuration
  - Specify a plan with parallel = "multisession" or another future plan
  - Works with any number of blocks; automatically determines whether parallelism is beneficial
  - Gracefully falls back if the future packages are not installed
- Powered by the future package:
  - Cross-platform support (Windows, Unix/Mac, clusters)
  - Respects user-configured parallel backends
  - Automatic worker management
  - Clean restoration of the original plan after execution
- Performance:
  - Best for 10+ blocks with 50+ units per block
  - Speedup scales with the number of cores and problem complexity
  - Minimal overhead for small problems
- Integration:
  - Works with all blocking methods (exact, fuzzy, clustering)
  - Compatible with distance caching via compute_distances()
  - Supports all matching parameters (constraints, calipers, scaling)
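Blocks are independent subproblems, which is what makes this parallelism safe. A Python sketch of the dispatch logic using concurrent.futures, standing in for R's future package: small jobs run sequentially and larger ones are mapped over a worker pool. The 10-block threshold echoes the guidance above but is an assumption, the per-block solver is a toy greedy match, and ThreadPoolExecutor is used only so the sketch runs anywhere (CPU-bound work would use ProcessPoolExecutor). Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_block(block):
    """Toy per-block solver: greedy nearest match on a 1-D covariate."""
    left, right = block
    free = sorted(right)  # copy, so the caller's data is never mutated
    pairs = []
    for x in sorted(left):
        if not free:
            break
        y = min(free, key=lambda r: abs(r - x))
        free.remove(y)
        pairs.append((x, y))
    return pairs


def match_blocks(blocks, parallel=True, min_blocks=10, workers=None):
    """Solve each block independently, in parallel only when there are enough blocks."""
    if not parallel or len(blocks) < min_blocks:
        return [solve_block(b) for b in blocks]  # sequential fallback for small jobs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_block, blocks))  # map preserves block order
```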
Fun Error Messages and Cost Checking
Like testthat, couplr makes errors light, memorable, and helpful with couple-themed messages:
- New check_costs parameter (default: TRUE) in match_couples() and greedy_couples():
  - Automatically checks distance distributions before matching
  - Provides friendly, actionable warnings for common problems
  - Set to FALSE to skip checks in production code
- Fun couple-themed error messages throughout the package:
  - 💔 “No matches made - can’t couple without candidates!”
  - 🔍 “Your constraints are too strict. Love can’t bloom in a vacuum!”
  - ✨ Helpful suggestions: “Try increasing max_distance or relaxing calipers”
  - 💖 Success messages: “Excellent balance! These couples are well-matched!”
- Automatic problem detection:
  - Too many zeros: Warns about duplicates or identical values (>10% zero distances)
  - Extreme costs: Detects skewed distributions (99th percentile > 10x the 95th)
  - Many forbidden pairs: Warns when constraints eliminate >50% of candidate pairs
  - Constant distances: Alerts when all distances are identical
  - Constant variables: Detects and excludes variables with no variation
- New diagnostic function diagnose_distance_matrix():
  - Comprehensive analysis of cost distributions
  - Variable-specific problem detection
  - Actionable suggestions for fixes
  - Quality rating (good/fair/poor)
- Emoji control: Disable with options(couplr.emoji = FALSE) if preferred
- Philosophy: Errors should be less intimidating, more memorable, and provide clear guidance
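The automatic checks reduce to a few summary statistics of the cost matrix. A Python sketch using the thresholds listed above, with math.inf marking a forbidden pair and percentiles from statistics.quantiles(method="inclusive"), which uses the same linear interpolation as R's default quantile type. The warning labels and the function name check_costs_sketch are hypothetical.

```python
import math
import statistics


def check_costs_sketch(cost):
    """cost: list of rows, with math.inf marking a forbidden pair. Returns warning labels."""
    flat = [d for row in cost for d in row]
    finite = [d for d in flat if math.isfinite(d)]
    warnings = []
    if len(finite) < 0.5 * len(flat):  # more than 50% of pairs forbidden
        warnings.append("many forbidden pairs")
    if len(finite) < 2:                # too few values for the remaining checks
        return warnings
    if sum(d == 0 for d in finite) > 0.1 * len(finite):
        warnings.append("too many zero distances")
    if len(set(finite)) == 1:
        warnings.append("constant distances")
        return warnings
    q = statistics.quantiles(finite, n=100, method="inclusive")  # q[k] = (k+1)th percentile
    p95, p99 = q[94], q[98]
    if p99 > 10 * p95:
        warnings.append("extreme costs")
    return warnings
```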
New Functions
- preprocess_matching_vars() - Main preprocessing orchestrator
- balance_diagnostics() - Comprehensive balance assessment
- balance_table() - Formatted balance tables for reporting
- join_matched() - Create analysis-ready datasets from matching results
- augment.matching_result() - Broom-style interface for joined data
- compute_distances() - Precompute and cache distance matrices
- update_constraints() - Modify constraints on distance objects
- is_distance_object() - Type checking for distance objects
- diagnose_distance_matrix() - Comprehensive distance diagnostics
- check_cost_distribution() - Check for distribution problems
- Added robust scaling method using median and MAD
Documentation & Examples
- examples/auto_scale_demo.R - 5 preprocessing demonstrations
- examples/balance_diagnostics_demo.R - 6 balance diagnostic examples
- examples/join_matched_demo.R - 8 joined dataset demonstrations
- examples/distance_cache_demo.R - Distance caching and reuse examples
- examples/parallel_matching_demo.R - 7 parallel processing examples
- examples/error_messages_demo.R - 10 fun error message demonstrations
- Complete implementation documentation (claude/IMPLEMENTATION_STEP1.md through STEP6.md)
- All functions have full Roxygen documentation
Major Changes (Initial 1.0.0 Release)
New Organization
R Code
- Eliminated 3 redundant files
- Consistent morph_* naming prefix
- Two-layer API: assignment() (low-level) + lap_solve() (tidy)
- 10 well-organized files (down from 13)
Features
API
- lap_solve() - Main tidy interface
- lap_solve_batch() - Batch solving
- lap_solve_kbest() - K-best solutions
- assignment() - Low-level solver
- Utilities: get_total_cost(), as_assignment_matrix(), etc.
- Visualization: pixel_morph(), pixel_morph_animate()
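For intuition about what a k-best solver returns, here is a brute-force Python sketch: enumerate every assignment of a small square cost matrix and keep the k cheapest with heapq.nsmallest. Real k-best solvers avoid full enumeration (Murty's algorithm is the classic approach); this sketch makes no claim about how lap_solve_kbest() works, and the function names are hypothetical.

```python
import heapq
from itertools import permutations


def kbest_assignments(cost, k):
    """The k cheapest assignments of a square cost matrix, by exhaustive enumeration.
    Each result is (total_cost, perm), where perm[i] is the column assigned to row i."""
    n = len(cost)
    scored = ((sum(cost[i][p[i]] for i in range(n)), p) for p in permutations(range(n)))
    return heapq.nsmallest(k, scored)


def assignment_matrix(perm):
    """0/1 matrix with a 1 at (i, perm[i])."""
    n = len(perm)
    return [[int(perm[i] == j) for j in range(n)] for i in range(n)]
```

For the cost matrix [[4, 1, 3], [2, 0, 5], [3, 2, 2]], the optimum assigns rows to columns (1, 0, 2) at total cost 5, and the next two solutions both cost 6.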
Development history under the name “lapr” is available in the git log before v1.0.0.