Join Matched Pairs with Original Data — join

Creates an analysis-ready dataset by joining matched pairs with variables from the original left and right datasets. This eliminates the need for manual joins and provides a convenient format for downstream analysis.

Usage

join_matched(result, ...)

# S3 method for class 'matching_result'
join_matched(
  result,
  left,
  right,
  left_vars = NULL,
  right_vars = NULL,
  left_id = "id",
  right_id = "id",
  suffix = c("_left", "_right"),
  include_distance = TRUE,
  include_pair_id = TRUE,
  include_block_id = TRUE,
  ...
)

# S3 method for class 'full_matching_result'
join_matched(result, left, right, left_id = "id", right_id = "id", ...)

# S3 method for class 'cem_result'
join_matched(result, left, right, left_id = "id", right_id = "id", ...)

# S3 method for class 'subclass_result'
join_matched(result, data = NULL, ...)

Arguments

result: A result object from match_couples(), greedy_couples(), full_match(), or cem_match(), or a subclassification result of class 'subclass_result'
...: Additional arguments passed to methods
left: The original left dataset
right: The original right dataset
left_vars: Character vector of variable names to include from left. If NULL (default), includes all variables except the ID column.
right_vars: Character vector of variable names to include from right. If NULL (default), includes all variables except the ID column.
left_id: Name of the ID column in the left dataset (default: "id")
right_id: Name of the ID column in the right dataset (default: "id")
suffix: Character vector of length 2 specifying suffixes for left and right variables (default: c("_left", "_right"))
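include_distance: Logical; whether to include the matching distance in the output (default: TRUE)
include_pair_id: Logical; whether to include the pair_id column (default: TRUE)
include_block_id: Logical; whether to include block_id when blocking was used (default: TRUE)
data: Data frame used for subclassification; used only by the 'subclass_result' method (default: NULL)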

Value

A tibble with one row per matched pair, containing:

pair_id: Sequential pair identifier (if include_pair_id = TRUE)
left_id: ID from left dataset
right_id: ID from right dataset
distance: Matching distance (if include_distance = TRUE)
block_id: Block identifier (if blocking used and include_block_id = TRUE)
Variables from left dataset (with left suffix)
Variables from right dataset (with right suffix)

Details

This function simplifies the common workflow of joining matched pairs with original data. Instead of manually merging result$pairs with left and right datasets, join_matched() handles the joins automatically and applies consistent naming conventions.
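
For comparison, the manual workflow that join_matched() replaces might look roughly like the base R sketch below. It is an illustration only: the pairs data frame is a hand-built stand-in for result$pairs, its column names (left_id, right_id, distance) are assumed from the Value section rather than taken from the package source, and add_suffix() is a hypothetical helper, not a package function.

```r
# Hypothetical stand-in for result$pairs; column names are assumed.
pairs <- data.frame(
  left_id  = 1:3,
  right_id = 6:8,
  distance = c(0.10, 0.20, 0.15)
)

left  <- data.frame(id = 1:3, age = c(25, 30, 35))
right <- data.frame(id = 6:8, age = c(24, 29, 36))

# Hypothetical helper: rename the ID column and suffix every other column.
add_suffix <- function(df, id_col, new_id, suffix) {
  is_id <- names(df) == id_col
  names(df)[!is_id] <- paste0(names(df)[!is_id], suffix)
  names(df)[is_id] <- new_id
  df
}

left_s  <- add_suffix(left,  "id", "left_id",  "_left")
right_s <- add_suffix(right, "id", "right_id", "_right")

# Two merges: attach left variables, then right variables.
manual <- merge(pairs, left_s, by = "left_id")
manual <- merge(manual, right_s, by = "right_id")

# Restore a predictable row and column order.
manual <- manual[
  order(manual$left_id),
  c("left_id", "right_id", "distance", "age_left", "age_right")
]
```

Each step here (renaming, suffixing, two merges, reordering) is bookkeeping that join_matched() is described as handling in a single call.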

When variables appear in both left and right datasets, suffixes are appended to distinguish them (e.g., "age_left" and "age_right"). This makes it easy to compute differences or use both values in models.
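
As a small illustration of working with the suffixed columns, the sketch below computes within-pair differences. The matched_data frame is built by hand to resemble join_matched() output with the default suffixes; its exact columns are an assumption for the example, not actual package output.

```r
# Hypothetical join_matched()-style output, built by hand for illustration.
matched_data <- data.frame(
  pair_id   = 1:3,
  left_id   = 1:3,
  right_id  = 6:8,
  age_left  = c(25, 30, 35),
  age_right = c(24, 29, 36)
)

# Within-pair difference (left minus right) for one covariate.
matched_data$age_diff <- matched_data$age_left - matched_data$age_right

# Mean within-pair difference as a simple summary of covariate balance.
mean_age_diff <- mean(matched_data$age_diff)
```

Because both values sit on the same row, no further reshaping is needed before computing differences or passing both columns to a model.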

Examples

# Basic usage
left <- data.frame(
  id = 1:5,
  treatment = 1,
  age = c(25, 30, 35, 40, 45),
  income = c(45000, 52000, 48000, 61000, 55000)
)

right <- data.frame(
  id = 6:10,
  treatment = 0,
  age = c(24, 29, 36, 41, 44),
  income = c(46000, 51500, 47500, 60000, 54000)
)

result <- match_couples(left, right, vars = c("age", "income"))
matched_data <- join_matched(result, left, right)
head(matched_data)

# Specify which variables to include
matched_data <- join_matched(
  result, left, right,
  left_vars = c("treatment", "age", "income"),
  right_vars = c("age", "income"),
  suffix = c("_treated", "_control")
)

# Without distance or pair_id
matched_data <- join_matched(
  result, left, right,
  include_distance = FALSE,
  include_pair_id = FALSE
)