Streams both files and computes a set-level diff keyed on key_col.
Returns a list with two elements:
Details
added: a vectra_node (lazy tbl()) of rows present in new_path but not old_path (matched on key_col). Call collect() to materialise. The underlying temp file is deleted when the node is garbage-collected or when the calling R session ends via on.exit().
deleted: a vector of key values present in old_path but not new_path.
This is a logical diff (key-based set difference), not a binary file
diff. Rows with the same key that have changed values are not reported
as modified — use added and deleted together to detect updates (a key
that appears in both means a row was replaced).
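The key-based semantics can be illustrated with plain data frames. The following is a minimal base R sketch of a set difference on a key column, not the streaming implementation used by diff_vtr(); the names old, new, deleted_keys and added_rows are illustrative only.

```r
# Base R illustration of a key-based set difference.
# This mirrors the semantics described above; it is NOT how diff_vtr()
# is implemented (that function streams .vtr files).
old <- data.frame(id = 1:5, val = letters[1:5], stringsAsFactors = FALSE)
new <- data.frame(id = c(3L, 4L, 5L, 6L, 7L),
                  val = c("C", "d", "e", "f", "g"),
                  stringsAsFactors = FALSE)

# Keys present only in the old table: the analogue of `deleted`
deleted_keys <- setdiff(old$id, new$id)

# Rows whose key is present only in the new table: the analogue of `added`
added_rows <- new[!new$id %in% old$id, , drop = FALSE]

# id 3 changed its value ("c" to "C") but its key exists in both tables,
# so it shows up in neither result.
stopifnot(!3L %in% deleted_keys, !3L %in% added_rows$id)
```

Because only keys are compared, the val column plays no role in either result.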
Examples
f1 <- tempfile(fileext = ".vtr")
f2 <- tempfile(fileext = ".vtr")
df1 <- data.frame(id = 1:5, val = letters[1:5], stringsAsFactors = FALSE)
df2 <- data.frame(id = c(3L, 4L, 5L, 6L, 7L),
                  val = c("C", "d", "e", "f", "g"),
                  stringsAsFactors = FALSE)
write_vtr(df1, f1)
write_vtr(df2, f2)
d <- diff_vtr(f1, f2, "id")
# ids 1 and 2 deleted; ids 6 and 7 added.
# id 3 changed value ("c" to "C") but is not reported as modified.
stopifnot(all(d$deleted %in% c(1, 2)))
stopifnot(all(collect(d$added)$id %in% c(6, 7)))
unlink(c(f1, f2))