Precomputes a distance matrix between left and right datasets, allowing it to be reused across multiple matching operations with different constraints. This is particularly useful when exploring different matching parameters (max_distance, calipers, methods) without recomputing distances.

Usage

compute_distances(
  left,
  right,
  vars,
  distance = "euclidean",
  weights = NULL,
  scale = FALSE,
  auto_scale = FALSE,
  left_id = "id",
  right_id = "id",
  block_id = NULL
)

Arguments

left

Left dataset (data frame)

right

Right dataset (data frame)

vars

Character vector of variable names to use for distance computation

distance

Distance metric (default: "euclidean")

weights

Optional numeric vector of variable weights

scale

Scaling method: FALSE (default, no scaling), "standardize", "range", or "robust"

auto_scale

Apply automatic preprocessing (default: FALSE)

left_id

Name of ID column in left (default: "id")

right_id

Name of ID column in right (default: "id")

block_id

Optional block ID column name for blocked matching
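The scale options are named here but not defined. As a rough guide, the sketch below shows what "standardize", "range", and "robust" commonly mean in base R. The helper scale_columns() is hypothetical and not part of the package, and the package's own definitions may differ (for example in which rows are used to estimate the centre and spread).

```r
# Hypothetical helper, for illustration only: common definitions of three
# column-scaling methods. Not the package's implementation.
scale_columns <- function(x, method = c("standardize", "range", "robust")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  # Value subtracted from each column
  center <- switch(method,
    standardize = colMeans(x),           # mean
    range       = apply(x, 2, min),      # minimum
    robust      = apply(x, 2, median)    # median
  )
  # Value each centred column is divided by
  spread <- switch(method,
    standardize = apply(x, 2, sd),                          # standard deviation
    range       = apply(x, 2, max) - apply(x, 2, min),      # max minus min
    robust      = apply(x, 2, mad)                          # MAD (R scales it by 1.4826)
  )
  # A constant column has zero spread and would produce NaN or Inf here
  sweep(sweep(x, 2, center, "-"), 2, spread, "/")
}

vals <- data.frame(
  age    = c(25, 30, 35, 40, 45),
  income = c(45, 52, 48, 61, 55) * 1000
)
z <- scale_columns(vals, "standardize")  # each column: mean 0, sd 1
r <- scale_columns(vals, "range")        # each column: rescaled to [0, 1]
```

Without some form of scaling, a variable measured in large units (income here) would dominate a Euclidean distance over one measured in small units (age).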

Value

An S3 object of class "distance_object" containing:

  • cost_matrix: Numeric matrix of distances

  • left_ids: Character vector of left IDs

  • right_ids: Character vector of right IDs

  • block_id: Block ID column name (if specified)

  • metadata: List with computation details (vars, distance, scale, etc.)

  • original_left: Original left dataset (for later joining)

  • original_right: Original right dataset (for later joining)

Details

This function computes distances once and stores them in a reusable object. The resulting distance_object can be passed to match_couples() or greedy_couples() instead of providing datasets and variables.

Benefits:

  • Performance: Avoid recomputing distances when trying different constraints

  • Exploration: Quickly test max_distance, calipers, or methods

  • Consistency: Ensures the same distances are used across comparisons

  • Memory efficiency: Can use sparse matrices when many pairs are forbidden

The distance_object stores the original datasets, allowing downstream functions like join_matched() to work seamlessly.
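To make the "compute once, reuse many times" idea concrete, here is a minimal base R sketch. It builds a left-by-right cost matrix a single time and then applies two different distance thresholds to it without recomputing anything. The helper cross_euclidean() is hypothetical, and the way weights enter the formula (multiplying squared differences) is an assumption, not the package's documented behaviour.

```r
# Hypothetical helper, for illustration only: weighted Euclidean distance
# between every row of `a` and every row of `b`.
cross_euclidean <- function(a, b, weights = rep(1, ncol(a))) {
  a <- as.matrix(a)
  b <- as.matrix(b)
  d2 <- matrix(0, nrow(a), nrow(b))
  for (j in seq_len(ncol(a))) {
    # Accumulate the weighted squared difference for variable j
    d2 <- d2 + weights[j] * outer(a[, j], b[, j], "-")^2
  }
  sqrt(d2)
}

left  <- data.frame(id = 1:3, age = c(25, 30, 35))
right <- data.frame(id = 4:6, age = c(24, 29, 36))

# Computed once: rows are left units, columns are right units
cost <- cross_euclidean(left["age"], right["age"])
dimnames(cost) <- list(as.character(left$id), as.character(right$id))

# Reused: each threshold is only a cheap comparison on the stored matrix
allowed_tight <- cost <= 1
allowed_loose <- cost <= 5
```

Because both masks derive from the same stored matrix, any difference between the resulting matchings comes from the constraint alone, not from a change in how distances were computed.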

Examples

# Compute distances once
left <- data.frame(
  id = 1:5,
  age = c(25, 30, 35, 40, 45),
  income = c(45, 52, 48, 61, 55) * 1000
)
right <- data.frame(
  id = 6:10,
  age = c(24, 29, 36, 41, 44),
  income = c(46, 51, 47, 60, 54) * 1000
)

dist_obj <- compute_distances(
  left, right,
  vars = c("age", "income"),
  scale = "standardize"
)

# Reuse for different matching strategies
result1 <- match_couples(dist_obj, max_distance = 0.5)
result2 <- match_couples(dist_obj, max_distance = 1.0)
result3 <- greedy_couples(dist_obj, strategy = "sorted")

# All use the same precomputed distances