Computes string distances between query keys and a string column in a materialized block. Optionally uses exact-match blocking on a second column (e.g., genus) to reduce the search space.
Usage
block_fuzzy_lookup(
  block,
  column,
  keys,
  method = "dl",
  max_dist = 0.2,
  block_col = NULL,
  block_keys = NULL,
  n_threads = 4L
)
Arguments
- block
A vectra_block from materialize().
- column
Character scalar. Name of the string column to fuzzy-match against.
- keys
Character vector. Query strings to match.
- method
Character. Distance method: "dl" (Damerau-Levenshtein, default), "levenshtein", or "jw" (Jaro-Winkler).
- max_dist
Numeric. Maximum normalized distance (default 0.2).
- block_col
Optional character scalar. Column name for exact-match blocking (e.g., genus). When provided, only rows where block_col matches the corresponding block_keys value are compared.
- block_keys
Optional character vector (same length as keys). Exact-match values for blocking. Required when block_col is provided.
- n_threads
Integer. Number of OpenMP threads (default 4L).
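Sketch
The interaction of keys, max_dist, block_col, and block_keys can be illustrated with a minimal base-R sketch. It is not the package's implementation: fuzzy_lookup_sketch() is a hypothetical helper, it works on a plain data frame rather than a vectra_block, it uses plain Levenshtein distance from utils::adist() as a stand-in for "dl", and it assumes the distance is normalized by the length of the longer string, which this page does not specify.

```r
# Hypothetical helper (not part of the package): blocked fuzzy lookup in base R.
fuzzy_lookup_sketch <- function(df, column, keys, max_dist = 0.2,
                                block_col = NULL, block_keys = NULL) {
  if (!is.null(block_col)) {
    # block_keys must pair up one-to-one with keys
    stopifnot(!is.null(block_keys), length(block_keys) == length(keys))
  }
  hits <- lapply(seq_along(keys), function(i) {
    rows <- seq_len(nrow(df))
    if (!is.null(block_col)) {
      # Blocking: keep only rows whose block column equals this key's block value
      rows <- which(df[[block_col]] == block_keys[i])
    }
    if (length(rows) == 0L) return(NULL)
    candidates <- df[[column]][rows]
    # Plain Levenshtein edit distance (stand-in for Damerau-Levenshtein)
    raw <- as.vector(utils::adist(keys[i], candidates))
    # ASSUMPTION: normalize by the length of the longer string
    norm <- raw / pmax(nchar(keys[i]), nchar(candidates))
    keep <- !is.na(norm) & norm <= max_dist
    if (!any(keep)) return(NULL)
    data.frame(key = keys[i], row = rows[keep], match = candidates[keep],
               dist = norm[keep], stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULL when no key has a match
}

# Illustrative data: two misspelled species names, each restricted to its genus
taxa <- data.frame(
  genus   = c("Quercus", "Quercus", "Fagus", "Fagus"),
  species = c("robur", "petraea", "sylvatica", "orientalis"),
  stringsAsFactors = FALSE
)
res <- fuzzy_lookup_sketch(taxa, "species",
                           keys = c("petrea", "sylvatca"),
                           block_col = "genus",
                           block_keys = c("Quercus", "Fagus"))
print(res)
```

With blocking, each key is compared only against the rows of its own genus, so the number of distance computations shrinks from length(keys) times the row count to the sum of the block sizes. A key paired with the wrong block value finds no candidates within max_dist and contributes no rows.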