Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.
Usage
join_matched(
  result,
  left,
  right,
  left_vars = NULL,
  right_vars = NULL,
  left_id = "id",
  right_id = "id",
  suffix = c("_left", "_right"),
  include_distance = TRUE,
  include_pair_id = TRUE,
  include_block_id = TRUE
)
Arguments
- result
A matching_result object from match_couples() or greedy_couples()
- left
The original left dataset
- right
The original right dataset
- left_vars
Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column.
- right_vars
Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column.
- left_id
Name of the ID column in the left dataset (default: "id")
- right_id
Name of the ID column in the right dataset (default: "id")
- suffix
Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right"))
- include_distance
Include the matching distance in output (default: TRUE)
- include_pair_id
Include pair_id column (default: TRUE)
- include_block_id
Include block_id if blocking was used (default: TRUE)
Value
A tibble with one row per matched pair, containing:
- pair_id: Sequential pair identifier (if include_pair_id = TRUE)
- left_id: ID from left dataset
- right_id: ID from right dataset
- distance: Matching distance (if include_distance = TRUE)
- block_id: Block identifier (if blocking used and include_block_id = TRUE)
- Variables from left dataset (with left suffix)
- Variables from right dataset (with right suffix)
Details
This function simplifies the common workflow of joining matched pairs
with original data. Instead of manually merging result$pairs with left
and right datasets, join_matched() handles the joins automatically
and applies consistent naming conventions.
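For comparison, the manual workflow that this function replaces might look roughly like the base R sketch below. It is an illustration only, not the package's implementation: the `pairs` data frame is a hypothetical stand-in for result$pairs (its real column layout may differ), `add_suffix()` is a helper invented for this example, and suffixing every non-ID column is an assumption.

```r
# Hypothetical stand-in for result$pairs; the real object may be laid out differently.
pairs <- data.frame(
  pair_id  = 1:3,
  left_id  = 1:3,
  right_id = 6:8,
  distance = c(0.12, 0.08, 0.15)
)
left  <- data.frame(id = 1:3, age = c(25, 30, 35))
right <- data.frame(id = 6:8, age = c(24, 29, 36))

# Illustrative helper (not part of any package): suffix every non-ID column,
# then rename the ID column so it matches the key used in the pairs table.
add_suffix <- function(df, id_col, new_id, suffix) {
  vars <- setdiff(names(df), id_col)
  names(df)[match(vars, names(df))] <- paste0(vars, suffix)
  names(df)[names(df) == id_col] <- new_id
  df
}

left_s  <- add_suffix(left,  "id", "left_id",  "_left")
right_s <- add_suffix(right, "id", "right_id", "_right")

# Two successive merges: pairs -> left variables -> right variables.
manual <- merge(pairs, left_s, by = "left_id")
manual <- merge(manual, right_s, by = "right_id")

# merge() sorts by the join key, so restore pair order explicitly.
manual <- manual[order(manual$pair_id), ]
```

Each step here (renaming, suffixing, two merges, re-sorting) is a place where a manual join can go subtly wrong, which is the motivation for wrapping the workflow in a single call.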
When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
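As a small sketch of that use, the block below computes within-pair differences from suffixed columns. The `matched` data frame is a hand-built, hypothetical stand-in for the output of join_matched(), with made-up values and only the columns needed for the illustration.

```r
# Hypothetical data in the shape described under Value; values are illustrative only.
matched <- data.frame(
  pair_id   = 1:3,
  left_id   = 1:3,
  right_id  = 6:8,
  age_left  = c(25, 30, 35),
  age_right = c(24, 29, 36)
)

# Within-pair difference: left value minus right value.
matched$age_diff <- matched$age_left - matched$age_right

# Average difference across matched pairs, a simple balance check.
mean_age_diff <- mean(matched$age_diff)
```

With custom suffixes such as c("_treated", "_control"), the same pattern applies with the column names adjusted accordingly.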
Examples
# Basic usage
left <- data.frame(
  id = 1:5,
  treatment = 1,
  age = c(25, 30, 35, 40, 45),
  income = c(45000, 52000, 48000, 61000, 55000)
)
right <- data.frame(
  id = 6:10,
  treatment = 0,
  age = c(24, 29, 36, 41, 44),
  income = c(46000, 51500, 47500, 60000, 54000)
)
result <- match_couples(left, right, vars = c("age", "income"))
matched_data <- join_matched(result, left, right)
head(matched_data)
# Specify which variables to include
matched_data <- join_matched(
  result, left, right,
  left_vars = c("treatment", "age", "income"),
  right_vars = c("age", "income"),
  suffix = c("_treated", "_control")
)
# Without distance or pair_id
matched_data <- join_matched(
  result, left, right,
  include_distance = FALSE,
  include_pair_id = FALSE
)