Bounded Outcome Risk Guard for Model Evaluation
BORG detects data leakage and invalid cross-validation setups before you compute performance metrics. It checks for information reuse between training and test data, blocks evaluation when it finds a hard violation, and warns when it finds a softer source of inflated results.
Quick Start
library(BORG)
# Validate a train/test split
data <- iris
train_idx <- 1:100
test_idx <- 101:150
borg(data, train_idx = train_idx, test_idx = test_idx)
# Detect overlapping indices
borg(data, train_idx = 1:100, test_idx = 51:150)
#> Error: index_overlap - Train and test indices overlap (50 shared indices)
Statement of Need
A model shows 95% accuracy on test data, then drops to 60% in production. The usual cause: data leakage. Information from the test set contaminated training, and the reported metrics were wrong.
A Princeton meta-analysis found leakage errors in 648 published papers across 30 fields. In civil war prediction research, correcting leakage revealed that “complex ML models do not perform substantively better than decades-old Logistic Regression.” The reported gains were artifacts.
BORG catches these errors before metrics are computed.
Features
Core Validation
- borg(): Main entry point for all validation
  - Validates train/test splits against data
  - Detects preprocessing leakage (scaling, PCA fitted on full data)
  - Checks for target leakage (features derived from outcome)
  - Validates grouped data (same patient in train and test)
  - Validates temporal data (test predates training)
  - Validates spatial data (test points too close to training)
- borg_inspect(): Detailed inspection of specific objects
  - Works with caret::preProcess, recipes::recipe, prcomp
  - Checks rsample resampling objects
  - Validates fitted models (lm, glm, ranger, etc.)
- borg_diagnose(): Analyze data for dependency structure
  - Detects spatial autocorrelation (Moran’s I)
  - Detects temporal autocorrelation (ACF/Ljung-Box)
  - Detects clustered structure (ICC)
  - Recommends appropriate CV strategy
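The temporal autocorrelation check listed above can be approximated with base R alone. The sketch below applies `stats::Box.test()` with `type = "Ljung-Box"` to a simulated AR(1) series. The lag of 10, the 0.05 cutoff, and the rule that maps a significant result to a blocked, time-ordered CV are assumptions made for illustration; they are not BORG's documented defaults, and this is not a description of what borg_diagnose() does internally.

```r
set.seed(42)

# Simulated AR(1) series with strong serial dependence (illustrative data only)
y <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))

# Ljung-Box test of the null hypothesis "no autocorrelation up to lag 10"
lb <- Box.test(y, lag = 10, type = "Ljung-Box")

# Lag-1 sample autocorrelation (the first element of acf()$acf is lag 0)
lag1 <- acf(y, plot = FALSE)$acf[2]

# Assumed decision rule: significant autocorrelation means random folds
# would place correlated neighbours on both sides of the split
strategy <- if (lb$p.value < 0.05) {
  "blocked (time-ordered) CV"
} else {
  "random K-fold CV"
}
strategy
```

When the test rejects the null, randomly assigned folds tend to overstate performance, which is why a time-ordered scheme is the usual recommendation in that case.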
Risk Categories
| Category | Impact | Response |
|---|---|---|
| Hard Violation | Results invalid | Blocks evaluation |
| Soft Inflation | Results biased | Warns, allows with caution |
Hard Violations:
- index_overlap - Same row in train and test
- duplicate_rows - Identical observations across sets
- preprocessing_leak - Scaler/PCA fitted on full data
- target_leakage - Feature with |r| > 0.99 with target
- group_leakage - Same group in train and test
- temporal_leak - Test data predates training

Soft Inflation:
- proxy_leakage - Feature with |r| 0.95-0.99 with target
- spatial_proximity - Test points close to training
- spatial_overlap - Test inside training convex hull
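Several of these categories reduce to checks that are easy to reason about in isolation. The following is a minimal base-R sketch of four of them, using the correlation thresholds listed above. The helper names (`check_index_overlap`, `check_group_leakage`, `check_temporal_leak`, `classify_target_correlation`) are hypothetical and not part of BORG's API, and the temporal rule is one possible reading of "test data predates training".

```r
# Hypothetical helpers for illustration only; these are not BORG functions.

# index_overlap: row indices that appear in both train and test
check_index_overlap <- function(train_idx, test_idx) {
  intersect(train_idx, test_idx)
}

# group_leakage: group labels present on both sides of the split
check_group_leakage <- function(groups, train_idx, test_idx) {
  intersect(unique(groups[train_idx]), unique(groups[test_idx]))
}

# temporal_leak (assumed rule): TRUE if any test observation is earlier
# than the latest training observation
check_temporal_leak <- function(time, train_idx, test_idx) {
  min(time[test_idx]) < max(time[train_idx])
}

# target_leakage / proxy_leakage: classify |r| between a feature and the target
classify_target_correlation <- function(x, y) {
  r <- abs(cor(x, y))
  if (r > 0.99) {
    "hard: target_leakage"
  } else if (r >= 0.95) {
    "soft: proxy_leakage"
  } else {
    "ok"
  }
}

# The overlapping split from the Quick Start shares 50 indices
length(check_index_overlap(1:100, 51:150))

# Ten patients with ten rows each: cutting at row 75 splits patient 8
patient_id <- rep(1:10, each = 10)
check_group_leakage(patient_id, train_idx = 1:75, test_idx = 76:100)
```

Returning the offending indices or group labels, rather than a bare TRUE/FALSE, makes it easier to see how a split needs to change.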
Usage Examples
Detecting Preprocessing Leakage
# BAD: scale() fitted on all data before splitting
data_scaled <- scale(iris[, 1:4])
borg_inspect(data_scaled, train_idx = 1:100, test_idx = 101:150)
#> Hard violation: preprocessing_leak
Target Leakage Detection
# Feature highly correlated with outcome
leaky_data <- data.frame(
x = rnorm(100),
outcome = rnorm(100)
)
leaky_data$leaked <- leaky_data$outcome + rnorm(100, sd = 0.01)
borg_inspect(leaky_data, train_idx = 1:70, test_idx = 71:100, target = "outcome")
#> Hard violation: target_leakage_direct
Grouped Data Validation
# Clinical data with patient IDs
clinical <- data.frame(
patient_id = rep(1:10, each = 10),
measurement = rnorm(100)
)
# Random split ignoring patients
set.seed(123)
idx <- sample(100)
train_idx <- idx[1:70]
test_idx <- idx[71:100]
borg_inspect(clinical, train_idx, test_idx, groups = "patient_id")
#> Hard violation: group_leakage
Framework Integration
BORG works with common ML frameworks:
# caret
library(caret)
pp <- preProcess(mtcars[, -1], method = c("center", "scale"))
borg_inspect(pp, train_idx = 1:25, test_idx = 26:32, data = mtcars)
# tidymodels
library(recipes)
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors()) |>
  prep()
borg_inspect(rec, train_idx = 1:25, test_idx = 26:32, data = mtcars)
Interface Summary
| Function | Purpose |
|---|---|
| borg() | Main entry point - diagnose data or validate splits |
| borg_inspect() | Detailed inspection of objects |
| borg_diagnose() | Analyze data dependencies |
| borg_validate() | Validate complete workflow |
| borg_rewrite() | Attempt automatic repair |
| plot() | Visualize results |
| summary() | Generate methods text |
| borg_certificate() | Create validation certificate |
| borg_export() | Export certificate to YAML/JSON |