Joins an external data source (a data.frame or a supported file such as
CSV) to a taxify() result. Species names in the external data are matched
through the same backbone(s) used in the original taxify() call, and the
join is performed on accepted_id, so synonyms in either dataset resolve to
the same key.
Usage
add_data(
  x,
  data,
  species_col = NULL,
  table = NULL,
  sheet = NULL,
  start_row = NULL,
  cols = NULL,
  group_col = NULL,
  groups = "all",
  fuzzy = TRUE,
  fuzzy_threshold = 0.2,
  verbose = TRUE
)
Arguments
- x
  A data.frame returned by taxify().
- data
  One of:
  - A data.frame already in R.
  - A file path to a .csv, .csv.gz, .tsv, .tsv.gz, .xlsx, .sqlite/.db, or .vtr file (read via vectra).
- species_col
  Character. Name of the column in data that contains species names. If NULL (default), auto-detected by matching head(10) of each character column against the backbone.
- table
  Character. Required when data is a SQLite file: the table name to read.
- sheet
  Integer or character. Sheet to read when data is an .xlsx file. Default NULL (auto-detect the sheet containing species names). Set explicitly to skip auto-detection.
- start_row
  Integer. Row where column headers begin in an .xlsx file. Default NULL (auto-detect by scanning the first 20 rows for a header row that produces species name matches). Set explicitly when the layout is known.
- cols
  Character vector of column names from data to join. If NULL (default), all columns except species_col are joined.
- group_col
  Character or NULL. Column in data that defines groups (e.g., country codes, regions). When set, the output is pivoted to wide format with one column per group (e.g., trait_AT, trait_DE), just like the built-in grouped enrichments. Use taxify_long() to reshape back to long format. Default NULL (flat join, one row per species).
- groups
  Character vector or "all". Which groups to include when group_col is set. Default "all".
- fuzzy
  Logical. Enable fuzzy matching for names in data. Default TRUE.
- fuzzy_threshold
  Numeric. Maximum allowed distance for fuzzy matches. Default 0.2.
- verbose
  Logical. Default TRUE.
Value
The input data.frame with additional columns from data, joined
via backbone-resolved accepted_id. Columns from data that collide
with existing columns in x are prefixed with "data_".
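The collision rule can be pictured with a few lines of base R. This is only a sketch of the renaming described above, not the package's internal code; the column names x_cols and data_cols are made up for illustration.

```r
# Hypothetical column names: "family" exists on both sides.
x_cols    <- c("input_name", "accepted_id", "family")
data_cols <- c("family", "height")

# Prefix only the incoming columns that would clash with columns in x.
clash <- data_cols %in% x_cols
data_cols[clash] <- paste0("data_", data_cols[clash])

print(data_cols)
```

Under this sketch the incoming family column becomes data_family, while height keeps its name.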
Details
The workflow:
1. Read data (a data.frame or a supported file).
2. Identify the species column (explicit or auto-detected).
3. Match species names through the same backbone(s) as the original taxify() call, obtaining accepted_id for each row.
4. Check for conflicting duplicates: if multiple rows in data resolve to the same accepted_id with different values, an error is raised (unless group_col is set). Exact duplicates produce a warning and are deduplicated.
5. Left-join on accepted_id.
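The workflow can be approximated in base R to show why joining on accepted_id makes synonyms line up. This is a minimal sketch under stated assumptions: the lookup vector stands in for backbone matching, the ids "id1" and "id2" are invented, and "Quercus pedunculata" is treated as a synonym of "Quercus robur" purely for illustration. It is not how add_data() is implemented.

```r
# Hypothetical stand-in for backbone matching: name -> accepted_id.
lookup <- c("Quercus robur"       = "id1",
            "Quercus pedunculata" = "id1",   # assumed synonym
            "Pinus sylvestris"    = "id2")

# Stand-in for a taxify() result (only the columns needed here).
x <- data.frame(input_name = c("Quercus pedunculata", "Pinus sylvestris"),
                stringsAsFactors = FALSE)
x$accepted_id <- unname(lookup[x$input_name])

# External data, containing one exact duplicate row.
ext <- data.frame(species = c("Quercus robur", "Quercus robur", "Pinus sylvestris"),
                  height  = c(30, 30, 25),
                  stringsAsFactors = FALSE)
ext$accepted_id <- unname(lookup[ext$species])

# Exact duplicates are dropped. add_data() is documented to warn here;
# this sketch only counts them.
payload   <- ext[, c("accepted_id", "height")]
n_dropped <- sum(duplicated(payload))
payload   <- unique(payload)

# Conflicting duplicates: same key, different values.
if (anyDuplicated(payload$accepted_id) > 0) {
  stop("Conflicting values for the same accepted_id")
}

# Left join: every row of x is kept.
joined <- merge(x, payload, by = "accepted_id", all.x = TRUE, sort = FALSE)
print(joined)
```

The row entered as "Quercus pedunculata" picks up the height recorded under "Quercus robur" because both names map to the same key.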
Grouped data
When your data has multiple rows per species (e.g., one row per species
per country), set group_col to produce wide output with suffixed
columns. This is the same format as the built-in grouped enrichments.
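The wide layout with suffixed columns can be reproduced with base R's reshape(). The sketch below only illustrates the output shape; the data, ids, and the country column are invented, and add_data() may build the result differently.

```r
# Hypothetical long data: one row per species per country,
# already keyed by accepted_id.
ext <- data.frame(
  accepted_id = c("id1", "id1", "id2"),
  country     = c("AT", "DE", "AT"),
  trait       = c(1.2, 1.5, 0.8),
  stringsAsFactors = FALSE
)

# Pivot to one row per accepted_id and one trait column per country.
wide <- reshape(ext,
                idvar     = "accepted_id",
                timevar   = "country",
                v.names   = "trait",
                direction = "wide",
                sep       = "_")
print(wide)

# With the package, the documented arguments suggest a call such as:
# add_data(result, traits, species_col = "species", group_col = "country")
```

Species without a record for a given group get NA in that group's column, as id2 does for trait_DE here.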
Auto-detection
When species_col is not specified, add_data() takes the first 10 rows
of each character column and runs them through taxify(). The column with
the highest match rate is selected. If no column achieves at least 50%
matches, an error is raised asking the user to specify species_col
explicitly.
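The selection rule can be sketched in base R as follows. The helper detect_species_col() and the known_names vector are hypothetical: the real function runs the sampled values through taxify(), whereas this sketch uses a simple membership test to stand in for a backbone match.

```r
# Hypothetical set of names the backbone would recognise.
known_names <- c("Quercus robur", "Pinus sylvestris", "Fagus sylvatica")

ext <- data.frame(
  site   = c("A", "B", "C"),
  taxon  = c("Quercus robur", "Pinus sylvestris", "Unknown sp."),
  height = c(30, 25, 10),
  stringsAsFactors = FALSE
)

detect_species_col <- function(df, known, n = 10, min_rate = 0.5) {
  # Only character columns are candidates.
  char_cols <- names(df)[vapply(df, is.character, logical(1))]
  # Match rate over the first n rows of each candidate column.
  rates <- vapply(char_cols, function(col) {
    mean(head(df[[col]], n) %in% known)
  }, numeric(1))
  if (length(rates) == 0 || max(rates) < min_rate) {
    stop("Could not detect the species column; set species_col explicitly")
  }
  # Column with the highest match rate.
  names(rates)[which.max(rates)]
}

print(detect_species_col(ext, known_names))
```

Here taxon matches two of its three sampled values and site matches none, so taxon is chosen; a table whose best column stays below the 50% threshold triggers the error instead.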
Examples
# Runs offline against the bundled example database.
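old <- options(taxify.data_dir = taxify_example_data())
# Resolve two names against the backbone.
result <- taxify(c("Quercus robur", "Pinus sylvestris"))
# External trait table keyed by species name.
traits <- data.frame(species = c("Quercus robur", "Pinus sylvestris"),
                     height = c(30, 25))
# Join the height column onto the taxify() result via accepted_id.
result |> add_data(traits, species_col = "species")
# Restore the previous options.
options(old)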