Automatically fixes trivial join key issues like whitespace and case mismatches. Returns the repaired data frame(s) with a summary of changes.
Usage
join_repair(
  x,
  y = NULL,
  by,
  trim_whitespace = TRUE,
  standardize_case = NULL,
  remove_invisible = TRUE,
  empty_to_na = FALSE,
  dry_run = FALSE
)
Arguments
- x
A data frame (left table).
- y
A data frame (right table). If NULL, only repairs x.
- by
A character vector of column names to repair.
- trim_whitespace
Logical. Trim leading/trailing whitespace. Default TRUE.
- standardize_case
Character. Standardize case to "lower", "upper", or NULL (no change). Default NULL.
- remove_invisible
Logical. Remove invisible Unicode characters. Default TRUE.
- empty_to_na
Logical. Convert empty strings to NA. Default FALSE.
- dry_run
Logical. If TRUE, only report what would be changed without modifying data. Default FALSE.
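To make the repair options above concrete, here is a minimal base R sketch of the kind of cleaning each argument describes, applied to a single character vector of keys. The helper name `repair_keys_sketch` is hypothetical, and the set of characters treated as "invisible" is an assumption. This is not the implementation of `join_repair()`, only an illustration of the documented behavior.

```r
# Illustrative sketch only: a hypothetical helper, not join_repair() itself.
repair_keys_sketch <- function(keys,
                               trim_whitespace = TRUE,
                               standardize_case = NULL,
                               remove_invisible = TRUE,
                               empty_to_na = FALSE) {
  stopifnot(is.character(keys))

  if (remove_invisible) {
    # Assumption: "invisible" covers the zero-width space, zero-width
    # non-joiner, zero-width joiner, and the byte order mark.
    keys <- gsub("\u200B|\u200C|\u200D|\uFEFF", "", keys)
  }
  if (trim_whitespace) {
    # Strip leading and trailing spaces, tabs, and newlines.
    keys <- trimws(keys)
  }
  if (!is.null(standardize_case)) {
    standardize_case <- match.arg(standardize_case, c("lower", "upper"))
    keys <- if (standardize_case == "lower") tolower(keys) else toupper(keys)
  }
  if (empty_to_na) {
    # Only touch non-missing values that are exactly the empty string.
    keys[!is.na(keys) & keys == ""] <- NA_character_
  }
  keys
}

repair_keys_sketch(c(" A", "B ", "C"))
# "A" "B" "C"
repair_keys_sketch(c(" A", "B "), standardize_case = "lower")
# "a" "b"
```

The sketch removes invisible characters before trimming so that whitespace left exposed by the removal is still stripped. Whether `join_repair()` applies its steps in this order is not stated in this page.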
Value
If y is NULL, returns the repaired x. If both are provided,
returns a list with x and y. In dry_run mode, returns a summary of
proposed changes.
Examples
# Data with whitespace issues
orders <- data.frame(
  id = c(" A", "B ", "C"),
  value = 1:3,
  stringsAsFactors = FALSE
)
# Dry run to see what would change
join_repair(orders, by = "id", dry_run = TRUE)
# Actually repair
orders_fixed <- join_repair(orders, by = "id")
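The following base R sketch shows why this kind of repair matters for a join. It uses a hypothetical `customers` table and cleans the keys by hand with `trimws()` and `tolower()`, which roughly corresponds to `trim_whitespace = TRUE` with `standardize_case = "lower"`. It does not call `join_repair()` and makes no claim about how that function works internally.

```r
# Keys with stray whitespace and upper case on the left side
orders <- data.frame(
  id = c(" A", "B ", "C"),
  value = 1:3,
  stringsAsFactors = FALSE
)

# Hypothetical right table with clean, lower-case keys
customers <- data.frame(
  id = c("a", "b", "c"),
  name = c("Ann", "Bo", "Cy"),
  stringsAsFactors = FALSE
)

# Before cleaning: no key matches, so the inner join is empty
matched_before <- nrow(merge(orders, customers, by = "id"))
matched_before
# 0

# Manual cleaning of the key column (stand-in for the repair step)
orders_manual <- orders
orders_manual$id <- tolower(trimws(orders_manual$id))

# After cleaning: every row finds its match
matched_after <- nrow(merge(orders_manual, customers, by = "id"))
matched_after
# 3
```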