Keep streamed rows by their spatial relation to a resident layer
Source: R/spatial.R
Streams a large left-hand table x through the engine and keeps each row whose
geometry satisfies an sf binary predicate against a small resident
layer y (select by location). This is the spatial counterpart to
semi_join(): rows are filtered, never duplicated, and no columns are added,
so the output carries x's schema unchanged. With negate = TRUE it keeps
the rows that do not match (an inverted select by location). The left
stream, which may run to a billion rows or more, is never materialized in
memory; only y (a study region, habitat patches, a coastline buffer, ...)
stays resident.
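The row-level semantics can be reproduced on a tiny in-memory example with plain sf. The sketch below is not vectra's implementation; keep_rows() is a hypothetical helper that applies the documented rule (keep a row of x when the predicate reports at least one match in y, flip it with negate) to two overlapping squares and three points. It also contrasts the result with sf::st_join(), which duplicates a row matching two features, and shows that predicate = sf::st_disjoint ("disjoint from at least one feature of y") is not the same as negate = TRUE ("intersects no feature of y").

```r
# Hypothetical helper, for illustration only: keep a row of `x` when
# `predicate` reports at least one match in `y`; `negate` flips the test.
keep_rows <- function(x, y, predicate = sf::st_intersects, negate = FALSE, ...) {
  hits <- lengths(predicate(x, y, ...)) > 0  # one logical per row of x
  if (negate) hits <- !hits
  x[hits, ]                                  # filter only: no new columns, no duplicates
}

# Resident layer y: two overlapping 2 x 2 squares, [0,2]x[0,2] and [1,3]x[0,2].
sq <- function(x0, y0) sf::st_polygon(list(rbind(
  c(x0, y0), c(x0 + 2, y0), c(x0 + 2, y0 + 2), c(x0, y0 + 2), c(x0, y0))))
y <- sf::st_sf(zone = c("a", "b"), geometry = sf::st_sfc(sq(0, 0), sq(1, 0)))

# Three points: id 1 lies in both squares, id 2 in "a" only, id 3 in neither.
x <- sf::st_sf(id = 1:3, geometry = sf::st_sfc(
  sf::st_point(c(1.5, 1)), sf::st_point(c(0.5, 1)), sf::st_point(c(9, 9))))

inside   <- keep_rows(x, y)                                # ids 1, 2 (id 1 once)
outside  <- keep_rows(x, y, negate = TRUE)                 # id 3
disjoint <- keep_rows(x, y, predicate = sf::st_disjoint)   # ids 2, 3
joined   <- sf::st_join(x, y)                              # 4 rows: id 1 appears twice
```

Because the filter only subsets rows, names(inside) equals names(x); the join instead adds the zone column and repeats id 1.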
Usage
spatial_filter(
  x,
  y,
  predicate = NULL,
  negate = FALSE,
  geom = "geometry",
  coords = NULL,
  crs = NA,
  flush_rows = NULL,
  ...
)
Arguments
- x
A vectra_node (from tbl(), tbl_tiff(), any verb chain, ...). It is consumed by the stream.
- y
An sf or sfc object: the resident locator layer to test against.
- predicate
An sf binary predicate function, e.g. sf::st_intersects (the default, used when predicate is NULL), sf::st_within, sf::st_covered_by, sf::st_is_within_distance. A left row is kept when the predicate reports at least one match against y.
- negate
If TRUE, keep the rows with no match instead (the inverted select by location). Default FALSE.
- geom
Name of the input geometry column holding hex-WKB or WKT strings. Default "geometry". Ignored when coords is given.
- coords
Optional length-2 character vector naming the x and y coordinate columns to assemble point geometry from (e.g. c("x", "y")), for inputs such as tiff_extract_points() output. The coordinate columns are retained.
- crs
Coordinate reference system of the input geometry, in any form sf::st_crs() accepts (EPSG integer, WKT, proj string). Defaults to the CRS carried by the upstream node, or unknown if it carries none.
- flush_rows
Number of transformed rows to buffer before flushing a spill to disk. Larger values mean fewer, bigger temporary files. Defaults to getOption("vectra.spatial_flush", 5e5).
- ...
Further arguments passed to predicate, e.g. dist = for sf::st_is_within_distance.
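For a single batch, the coords and ... arguments roughly correspond to the plain-sf steps below. This is a sketch under stated assumptions, not vectra's code path: the data frame, the radius of 5 and the single resident point are made up for illustration. Points are assembled with sf::st_as_sf(remove = FALSE), which keeps the coordinate columns as the coords argument describes, and dist = 5 is handed straight to sf::st_is_within_distance, as ... would be. The corresponding streamed call would presumably be spatial_filter(x, y, predicate = sf::st_is_within_distance, dist = 5, coords = c("x", "y")).

```r
# Hypothetical batch of plain coordinate rows (no CRS, so sf treats them as planar).
df <- data.frame(id = 1:3, x = c(0, 3, 10), y = c(0, 3, 0))

# Assemble point geometry from the x/y columns and keep those columns (remove = FALSE).
pts <- sf::st_as_sf(df, coords = c("x", "y"), remove = FALSE)

# Resident layer: a single point at the origin.
y <- sf::st_sfc(sf::st_point(c(0, 0)))

# The extra argument `dist = 5` goes to the predicate unchanged.
hits <- lengths(sf::st_is_within_distance(pts, y, dist = 5)) > 0
kept <- pts[hits, ]  # id 1 (distance 0) and id 2 (distance about 4.24); id 3 is 10 away
```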
Value
A vectra_node of the kept rows with x's schema, backed by
temporary .vtr spills and carrying the input CRS.
Details
For the recognised predicates (the topological ones: intersects, within,
contains, overlaps, covers, covered by, touches, crosses; plus equals,
disjoint, and within-distance, i.e. sf::st_is_within_distance with its
radius passed as dist =) on planar data, that is, anything other than
geographic coordinates with spherical geometry switched on, the test runs
natively on the GEOS C API straight off the hex-WKB column: y is parsed once
into a spatial index and each batch is tested in C, with no per-batch
round-trip through sf.
Point input assembled from coordinate columns (coords) also runs natively,
with each point built in C rather than through sf. The one exception there
is disjoint: its matches are exactly the features whose bounding boxes the
index prunes away, so it keeps the per-batch sf loop, as it does in
spatial_join().
Geographic coordinates with spherical geometry switched on (see
sf::sf_use_s2()), and any other predicate, go through sf instead, preserving
its semantics. When y carries no CRS, it inherits the stream's CRS, so the
predicate does not fail on a CRS mismatch.
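The batch-at-a-time shape described above, with y resident and only one batch of x parsed at a time, can be sketched in plain R with sf. This is an illustration of the control flow only, not the native GEOS C path: the WKT strings, the batch size of 2 and the stream's CRS (EPSG:32633) are assumptions chosen for the example, and the final rbind() stands in for the temporary .vtr spills the real function writes.

```r
# Resident layer y, deliberately created without a CRS.
y <- sf::st_sfc(sf::st_polygon(list(rbind(
  c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))))

# Hypothetical stream: WKT strings standing in for the geometry column.
stream <- data.frame(
  id = 1:6,
  geometry = sprintf("POINT (%d 5)", c(1L, 4L, 12L, 7L, 15L, 9L)),
  stringsAsFactors = FALSE
)
stream_crs <- sf::st_crs(32633)  # assumed CRS carried by the stream

# When y has no CRS, let it inherit the stream's so sf does not error on a mismatch.
if (is.na(sf::st_crs(y))) y <- sf::st_set_crs(y, stream_crs)

batch_size <- 2
kept <- list()
for (start in seq(1, nrow(stream), by = batch_size)) {
  batch <- stream[start:min(start + batch_size - 1, nrow(stream)), ]
  geoms <- sf::st_as_sfc(batch$geometry, crs = stream_crs)  # parse this batch only
  hits <- lengths(sf::st_intersects(geoms, y)) > 0
  kept[[length(kept) + 1]] <- batch[hits, ]                 # x's columns, unchanged
}
result <- do.call(rbind, kept)  # ids 1, 2, 4, 6 fall inside the square
```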
See also
spatial_join() to tag rows with y's attributes, spatial_clip()
to cut geometry against a mask, filter() for attribute predicates.
Examples
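# Resident layer: three counties from the North Carolina sample data shipped with sf.
nc <- sf::st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
region <- nc[nc$NAME %in% c("Ashe", "Alleghany", "Surry"), "NAME"]
# Stream: random points over the state, stored as plain id/x/y columns in a .vtr file.
set.seed(1)
pts <- sf::st_coordinates(sf::st_sample(nc, 300))
f <- tempfile(fileext = ".vtr")
write_vtr(data.frame(id = seq_len(nrow(pts)), x = pts[, 1], y = pts[, 2]), f)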
# Keep only the points that fall inside the three-county region, streaming.
inside <- tbl(f) |>
spatial_filter(region, coords = c("x", "y"), crs = sf::st_crs(nc))
nrow(collect(inside))
unlink(f)