Joins two tables using approximate string matching on key columns. Optionally blocks on a second column (e.g., genus) for performance, so that only rows sharing the same blocking key are compared.

Usage

fuzzy_join(
  x,
  y,
  by,
  method = "dl",
  max_dist = 0.2,
  block_by = NULL,
  n_threads = 4L,
  suffix = ".y"
)

Arguments

x

A vectra_node object (probe / query side).

y

A vectra_node object (build / reference side).

by

A named character vector of length 1: c("probe_col" = "build_col"). The columns to compute string distance on.

method

Character. Distance algorithm: "dl" (Damerau-Levenshtein, default), "levenshtein", or "jw" (Jaro-Winkler).

max_dist

Numeric. Maximum normalized distance (0-1) to keep a match. Default 0.2.
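
To give a rough sense of what a normalized distance threshold means, the sketch below uses base R's utils::adist() (plain Levenshtein distance) and divides by the length of the longer string. The exact normalization used by fuzzy_join is not stated here, so that divisor is an assumption, and norm_lev is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: Levenshtein distance scaled to the range [0, 1].
# Assumption: the raw distance is divided by the length of the longer string.
norm_lev <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]        # raw edit distance between two strings
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(0)          # two empty strings count as identical
  d / longest
}

norm_lev("Quercus robur", "Quercus robur")  # identical strings
norm_lev("Quercus robur", "Quercus robor")  # one substitution in 13 characters
norm_lev("kitten", "sitting")               # three edits, longer string has 7
```

Under this assumed normalization, a max_dist of 0.2 would keep the second pair (1/13, about 0.08) and drop the third (3/7, about 0.43).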

block_by

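Optional named character vector of length 1: c("probe_col" = "build_col"). Rows must match exactly on these columns before distance is computed. This reduces the number of comparisons, because candidate pairs are formed only within each block rather than across the full tables.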
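
The effect of blocking can be sketched in base R with plain data frames. This is only an emulation of the idea, not the package's implementation: the toy tables, the use of merge(), and the Levenshtein-over-longer-length normalization are all assumptions made for illustration.

```r
# Toy data frames standing in for the probe and build tables.
probe <- data.frame(
  genus = c("Quercus", "Quercus", "Pinus"),
  name  = c("Quercus robor", "Quercus albba", "Pinus nigra"),
  stringsAsFactors = FALSE
)
build <- data.frame(
  genus = c("Quercus", "Quercus", "Pinus", "Pinus"),
  name  = c("Quercus robur", "Quercus alba", "Pinus nigra", "Pinus mugo"),
  stringsAsFactors = FALSE
)

# Without blocking, every probe row would be compared with every build row.
n_all <- nrow(probe) * nrow(build)

# With blocking, only pairs that share the same genus become candidates.
cand <- merge(probe, build, by = "genus", suffixes = c("", ".y"))
n_blocked <- nrow(cand)

# Distance is computed on the surviving candidate pairs only
# (assumed normalization: edit distance over the longer string's length).
cand$fuzzy_dist <- mapply(
  function(a, b) utils::adist(a, b)[1, 1] / max(nchar(a), nchar(b)),
  cand$name, cand$name.y,
  USE.NAMES = FALSE
)
matches <- cand[cand$fuzzy_dist <= 0.2, ]

c(all_pairs = n_all, blocked_pairs = n_blocked)
```

In this toy case blocking halves the candidate pairs from 12 to 6; the saving grows as the number of distinct blocking keys increases.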

n_threads

Integer. Number of OpenMP threads for parallel distance computation over partitions. Default 4L.

suffix

Character. Suffix appended to build-side column names that collide with probe-side names. Default ".y".

Value

A vectra_node with all probe columns, all build columns (suffixed on collision), and a fuzzy_dist column (double).
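
A call might look like the following sketch. It only wraps the documented signature in a function: probe and ref are assumed to be vectra_node objects created elsewhere, match_species is a hypothetical name, and the column names are illustrative rather than taken from any real dataset.

```r
# Hypothetical wrapper around fuzzy_join(); nothing is evaluated until the
# function is called with two existing vectra_node objects.
match_species <- function(probe, ref) {
  fuzzy_join(
    probe, ref,
    by        = c("species" = "accepted_name"),  # probe column = build column
    method    = "jw",                            # Jaro-Winkler distance
    max_dist  = 0.1,                             # stricter than the 0.2 default
    block_by  = c("genus" = "genus"),            # compare within a genus only
    n_threads = 2L,
    suffix    = ".y"                             # applied to colliding build columns
  )
}
```

Per the Value description, the result would carry every probe column, every build column (here genus.y for the colliding name), and fuzzy_dist.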