Changelog
couplr 1.0.6
CRAN release: 2026-01-20
Documentation
- Added Overview section to algorithms vignette with audience and prerequisites
- Fixed workflow diagram dark mode text handling in matching-workflows vignette
- Improved SVG theme-awareness for multi-line text labels
- Removed grid lines from matching-workflows plots for cleaner appearance
- Added threshold labels to balance comparison plot
couplr 1.0.0
Major New Features (2025-11-19 Update)
Automatic Preprocessing and Scaling
The package now includes intelligent preprocessing to improve matching quality:
- New `auto_scale` parameter in `match_couples()` and `greedy_couples()` enables automatic preprocessing
- Variable health checks detect and handle problematic variables:
  - Constant columns (SD = 0) are automatically excluded with warnings
  - High missingness (>50%) triggers warnings
  - Extreme skewness (|skewness| > 2) is flagged
- Smart scaling method selection analyzes data and recommends:
  - “robust” scaling using median and MAD (resistant to outliers)
  - “standardize” for traditional mean-centering and SD scaling
  - “range” for min-max normalization
- New `preprocess_matching_vars()` function for manual preprocessing control
- Categorical variable encoding for binary and ordered factors
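The three scaling methods and the health-check thresholds above can be illustrated in base R. This is a minimal sketch, not couplr's implementation: `scale_var()` and `check_var()` are hypothetical helpers, and the moment-based skewness formula is an assumption.

```r
# Hypothetical helper: scale one numeric vector by a named method.
scale_var <- function(x, method = c("robust", "standardize", "range")) {
  method <- match.arg(method)
  switch(method,
    robust      = (x - median(x, na.rm = TRUE)) / mad(x, na.rm = TRUE),
    standardize = (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE),
    range       = (x - min(x, na.rm = TRUE)) /
                    (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
  )
}

# Hypothetical health check using the thresholds listed above.
# Skewness is computed from the third central moment (an assumption).
check_var <- function(x) {
  ok <- x[!is.na(x)]
  s <- sd(ok)
  skew <- if (isTRUE(s > 0)) mean((ok - mean(ok))^3) / s^3 else NA_real_
  list(
    constant     = isTRUE(s == 0),
    high_missing = mean(is.na(x)) > 0.5,
    skewed       = !is.na(skew) && abs(skew) > 2
  )
}

x <- c(1, 2, 3, 4, 100)            # one large outlier
robust_x <- scale_var(x, "robust") # centered on the median, scaled by MAD
range_x  <- scale_var(x, "range")  # rescaled to [0, 1]
flags    <- check_var(x)
```

Because the median and MAD ignore the magnitude of the outlier, the first four values stay close together under “robust” scaling, whereas “range” compresses them toward zero.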
Balance Diagnostics
Comprehensive tools to assess matching quality:
- New `balance_diagnostics()` function computes multiple balance metrics:
  - Standardized differences: (mean_left - mean_right) / pooled_sd
  - Variance ratios: SD_left / SD_right
  - Kolmogorov-Smirnov tests for distribution comparison
  - Overall balance metrics (mean, max, % large imbalance)
- Quality thresholds with interpretation:
  - |Std Diff| < 0.10: Excellent balance
  - |Std Diff| 0.10-0.25: Good balance
  - |Std Diff| 0.25-0.50: Acceptable balance
  - |Std Diff| > 0.50: Poor balance
- Per-block statistics with quality ratings when blocking is used
- `balance_table()` creates publication-ready formatted tables
- Informative print methods with interpretation guides
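The metrics and thresholds above can be worked through by hand in base R. This is a sketch under stated assumptions: `std_diff()` and `rate_balance()` are hypothetical helpers, the pooled SD is taken as the square root of the average of the two variances, and values that fall exactly on a threshold are assigned to the higher band.

```r
# Standardized difference as defined above.
# Assumed convention: pooled_sd = sqrt((var_left + var_right) / 2).
std_diff <- function(left, right) {
  pooled_sd <- sqrt((var(left) + var(right)) / 2)
  (mean(left) - mean(right)) / pooled_sd
}

# Map |Std Diff| to the quality labels listed above.
rate_balance <- function(d) {
  cut(abs(d),
      breaks = c(0, 0.10, 0.25, 0.50, Inf),
      labels = c("Excellent", "Good", "Acceptable", "Poor"),
      right  = FALSE)
}

left  <- c(10, 12, 14, 16, 18)
right <- c(11, 12, 15, 16, 19)

d      <- std_diff(left, right)   # about -0.19
ratio  <- sd(left) / sd(right)    # ratio as defined in the list above
rating <- rate_balance(d)
```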
Joined Matched Dataset Output
Create analysis-ready datasets directly from matching results:
- New `join_matched()` function automates data preparation:
  - Joins matched pairs with original left and right datasets
  - Eliminates manual data wrangling after matching
  - Select specific variables via `left_vars` and `right_vars` parameters
  - Customizable suffixes (default: `_left`, `_right`) for overlapping columns
  - Optional metadata: `pair_id`, `distance`, `block_id`
  - Works with both optimal and greedy matching
- Broom-style `augment()` method for tidymodels integration:
  - S3 method following broom package conventions
  - Sensible defaults for quick exploration
  - Supports all `join_matched()` parameters
- Flexible output control:
  - `include_distance` - Include/exclude matching distance
  - `include_pair_id` - Include/exclude sequential pair IDs
  - `include_block_id` - Include/exclude block identifiers
  - Custom ID column support via `left_id` and `right_id`
  - Clean column ordering: pair_id → IDs → distance → block → variables
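For context, the manual wrangling that this feature replaces looks roughly like the base R sketch below. The `pairs` table and the `add_suffix()` helper are hypothetical; the actual result object and the columns couplr produces may be shaped differently.

```r
left  <- data.frame(id = c("a", "b", "c"), age = c(30, 41, 52))
right <- data.frame(id = c("x", "y", "z"), age = c(31, 40, 55))

# Hypothetical pairs table standing in for a matching result.
pairs <- data.frame(
  pair_id  = 1:2,
  left_id  = c("a", "b"),
  right_id = c("x", "y"),
  distance = c(1, 1)
)

# Rename the ID column and suffix every other column so that
# overlapping variable names stay distinct after the join.
add_suffix <- function(df, id_name, suffix) {
  is_id <- names(df) == "id"
  names(df)[!is_id] <- paste0(names(df)[!is_id], suffix)
  names(df)[is_id]  <- id_name
  df
}

joined <- merge(pairs, add_suffix(left, "left_id", "_left"), by = "left_id")
joined <- merge(joined, add_suffix(right, "right_id", "_right"), by = "right_id")

# Column order: pair_id, IDs, distance, then variables.
joined <- joined[order(joined$pair_id),
                 c("pair_id", "left_id", "right_id", "distance",
                   "age_left", "age_right")]
```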
Precomputed and Reusable Distances
Performance optimization for exploring multiple matching strategies:
- New `compute_distances()` function precomputes and caches distance matrices:
  - Compute distances once, reuse across multiple matching operations
  - Store complete metadata: variables, distance metric, scaling method, timestamps
  - Preserve original datasets for seamless integration with `join_matched()`
  - Enable rapid exploration of different matching parameters
  - Performance improvement: ~60% faster when trying multiple matching strategies
- Distance objects (S3 class `distance_object`):
  - Self-contained: cost matrix, IDs, metadata, original data
  - Works with both `match_couples()` and `greedy_couples()`
  - Pass as first argument instead of datasets: `match_couples(dist_obj, max_distance = 5)`
  - Informative print and summary methods with distance statistics
- Constraint modification via `update_constraints()`:
  - Apply new `max_distance` or `calipers` without recomputing distances
  - Creates new distance object following copy-on-modify semantics
  - Experiment with different constraints efficiently
- Backward compatible integration:
  - Modified function signatures: `match_couples(left, right = NULL, vars = NULL, ...)`
  - Automatically detects distance objects vs. datasets
  - All existing code continues to work unchanged
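The compute-once, constrain-many idea can be sketched in base R without the package. Here `apply_max_distance()` is a hypothetical stand-in for a constraint update, and marking forbidden pairs with `Inf` is an assumption rather than a statement about couplr's internals.

```r
left  <- data.frame(age = c(30, 41, 52))
right <- data.frame(age = c(31, 40, 70))

# Compute the cost matrix once (absolute age difference, for illustration).
cost <- abs(outer(left$age, right$age, "-"))

# Hypothetical constraint update: returns a modified copy in which pairs
# beyond max_distance are forbidden (Inf). The input matrix is untouched,
# which is R's usual copy-on-modify behavior.
apply_max_distance <- function(cost, max_distance) {
  cost[cost > max_distance] <- Inf
  cost
}

strict  <- apply_max_distance(cost, 5)
relaxed <- apply_max_distance(cost, 20)

# Number of pairs still allowed under each constraint.
n_allowed <- c(strict  = sum(is.finite(strict)),
               relaxed = sum(is.finite(relaxed)))
```

Both constrained matrices derive from the same `cost` object, so the pairwise distances are never recomputed while the constraints change.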
Parallel Processing
Speed up blocked matching with multi-core processing:
- New `parallel` parameter in `match_couples()` and `greedy_couples()`:
  - Enable with `parallel = TRUE` for automatic configuration
  - Specify plan with `parallel = "multisession"` or other future plan
  - Works with any number of blocks and automatically determines whether parallelism is beneficial
  - Falls back gracefully if the future packages are not installed
- Powered by the `future` package:
  - Cross-platform support (Windows, Unix/Mac, clusters)
  - Respects user-configured parallel backends
  - Automatic worker management
  - Clean restoration of original plan after execution
- Performance:
  - Best for 10+ blocks with 50+ units per block
  - Speedup scales with number of cores and complexity
  - Minimal overhead for small problems
- Integration:
  - Works with all blocking methods (exact, fuzzy, clustering)
  - Compatible with distance caching via `compute_distances()`
  - Supports all matching parameters (constraints, calipers, scaling)
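The fallback and plan-restoration behavior described above follows a common pattern with the `future` package, sketched below. `match_block()` and `run_blocks()` are hypothetical names, the use of `future.apply::future_lapply()` is an assumption about how per-block work might be dispatched, and falling back to a sequential `lapply()` is likewise assumed.

```r
# Hypothetical per-block worker: here just the smallest pairwise distance.
match_block <- function(block) {
  min(abs(outer(block$left, block$right, "-")))
}

blocks <- list(
  a = list(left = c(1, 5),   right = c(2, 9)),
  b = list(left = c(10, 20), right = c(14, 30))
)

run_blocks <- function(blocks, parallel = FALSE) {
  have_future <- requireNamespace("future", quietly = TRUE) &&
    requireNamespace("future.apply", quietly = TRUE)
  if (parallel && have_future) {
    # plan() returns the previous plan, so it can be restored on exit.
    old_plan <- future::plan(future::multisession)
    on.exit(future::plan(old_plan), add = TRUE)
    future.apply::future_lapply(blocks, match_block)
  } else {
    lapply(blocks, match_block)  # sequential fallback
  }
}

result <- run_blocks(blocks, parallel = FALSE)
```

With only two tiny blocks, the cost of starting workers would outweigh any gain, which is consistent with the guidance above that parallelism pays off mainly for many sizeable blocks.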
Fun Error Messages and Cost Checking
Like testthat, couplr makes errors light, memorable, and helpful with couple-themed messages:
- New `check_costs` parameter (default: `TRUE`) in `match_couples()` and `greedy_couples()`:
  - Automatically checks distance distributions before matching
  - Provides friendly, actionable warnings for common problems
  - Set to `FALSE` to skip checks in production code
- Fun couple-themed error messages throughout the package:
  - 💔 “No matches made - can’t couple without candidates!”
  - 🔍 “Your constraints are too strict. Love can’t bloom in a vacuum!”
  - ✨ Helpful suggestions: “Try increasing max_distance or relaxing calipers”
  - 💖 Success messages: “Excellent balance! These couples are well-matched!”
- Automatic problem detection:
  - Too many zeros: Warns about duplicates or identical values (>10% zero distances)
  - Extreme costs: Detects skewed distributions (99th percentile > 10x the 95th)
  - Many forbidden pairs: Warns when constraints eliminate >50% of valid pairs
  - Constant distances: Alerts when all distances are identical
  - Constant variables: Detects and excludes variables with no variation
- New diagnostic function `diagnose_distance_matrix()`:
  - Comprehensive analysis of cost distributions
  - Variable-specific problem detection
  - Actionable suggestions for fixes
  - Quality rating (good/fair/poor)
- Emoji control: Disable with `options(couplr.emoji = FALSE)` if preferred
- Philosophy: Errors should be less intimidating, more memorable, and provide clear guidance
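The detection rules listed above translate into a few lines of base R. `check_costs_sketch()` is a hypothetical helper, not the package's checker. It assumes forbidden pairs are stored as `Inf` and measures their share against all entries of the matrix.

```r
check_costs_sketch <- function(cost) {
  finite <- cost[is.finite(cost)]
  q <- quantile(finite, c(0.95, 0.99), names = FALSE)
  c(
    too_many_zeros = mean(finite == 0) > 0.10,        # >10% zero distances
    extreme_costs  = q[2] > 10 * q[1],                # 99th pct > 10x the 95th
    many_forbidden = mean(!is.finite(cost)) > 0.50,   # >50% pairs forbidden
    constant       = length(unique(finite)) == 1      # all distances identical
  )
}

# Small cost matrix with two zero distances and four forbidden pairs.
cost <- matrix(c(0,   2,   Inf,
                 3,   0,   Inf,
                 Inf, Inf, 4),
               nrow = 3, byrow = TRUE)

flags <- check_costs_sketch(cost)
```

Here only `too_many_zeros` is raised: two of the five finite distances are zero, while four of nine pairs forbidden stays below the 50% threshold.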
New Functions
- `preprocess_matching_vars()` - Main preprocessing orchestrator
- `balance_diagnostics()` - Comprehensive balance assessment
- `balance_table()` - Formatted balance tables for reporting
- `join_matched()` - Create analysis-ready datasets from matching results
- `augment.matching_result()` - Broom-style interface for joined data
- `compute_distances()` - Precompute and cache distance matrices
- `update_constraints()` - Modify constraints on distance objects
- `is_distance_object()` - Type checking for distance objects
- `diagnose_distance_matrix()` - Comprehensive distance diagnostics
- `check_cost_distribution()` - Check for distribution problems
- Added robust scaling method using median and MAD
Documentation & Examples
- `examples/auto_scale_demo.R` - 5 preprocessing demonstrations
- `examples/balance_diagnostics_demo.R` - 6 balance diagnostic examples
- `examples/join_matched_demo.R` - 8 joined dataset demonstrations
- `examples/distance_cache_demo.R` - Distance caching and reuse examples
- `examples/parallel_matching_demo.R` - 7 parallel processing examples
- `examples/error_messages_demo.R` - 10 fun error message demonstrations
- Complete implementation documentation (claude/IMPLEMENTATION_STEP1.md through STEP6.md)
- All functions have full Roxygen documentation
Major Changes (Initial 1.0.0 Release)
New Organization
R Code
- Eliminated 3 redundant files
- Consistent `morph_*` naming prefix
- Two-layer API: `assignment()` (low-level) + `lap_solve()` (tidy)
- 10 well-organized files (down from 13)
Features
API
- `lap_solve()` - Main tidy interface
- `lap_solve_batch()` - Batch solving
- `lap_solve_kbest()` - K-best solutions
- `assignment()` - Low-level solver
- Utilities: `get_total_cost()`, `as_assignment_matrix()`, etc.
- Visualization: `pixel_morph()`, `pixel_morph_animate()`
Development history under the earlier name “lapr” is available in the git log before v1.0.0.