Self-overlay a polygon layer into disjoint pieces (QGIS-style Union)
Source:R/spatial.R
spatial_overlay.RdSplits a polygon layer along all its own overlaps into disjoint pieces and
returns a lazy node with one row per piece per covering polygon: where k
polygons overlap, that piece appears k times, each row carrying one source
polygon's attributes. This is the union overlay GIS tools expose as
"Union (single layer)", with the overlap retained once per contributing
feature rather than dissolved. Resolve the duplicates with a grouped
slice_min() / slice_max() – for example earliest designation year wins:
group_by(piece_id) |> slice_min(year).
Usage
spatial_overlay(
x,
y = NULL,
vars = NULL,
vars_y = NULL,
how = c("intersection", "union", "identity", "symdiff"),
piece = "piece_id",
geom = "geometry",
grid = NULL,
precision = NULL,
dedup = TRUE,
flush_rows = NULL,
mem_limit = NULL,
threads = NULL,
quiet = TRUE,
layer = NULL,
query = NULL,
layer_y = NULL,
query_y = NULL,
read_chunk = NULL
)Arguments
- x
An
sfobject with polygon or multipolygon geometry, or a single path to a vector file (e.g. a GeoPackage). A path is read in feature batches vialayer/query, so the whole layer is never held in memory at once – peak memory then tracks the cleaned geometry, not the source size, which lets a larger-than-RAM layer overlay on a modest machine.- y
Optional second layer to overlay
xagainst, in the same formsxaccepts (ansfobject or a file path read vialayer_y/query_y). It must share the CRS ofx.NULL(the default) self-unionsx.- vars
Character vector of attribute columns of
xto carry onto each piece. DefaultNULLkeeps them all; name a subset to keep the streamed output narrow.- vars_y
Character vector of attribute columns of
yto carry onto each piece (two-layer overlay only). DefaultNULLkeeps them all. A name shared with anxcolumn is disambiguated with a.x/.ysuffix in the output.- how
For a two-layer overlay, which pieces to keep:
"intersection"(covered by both layers; the default),"union"(every piece of either, the absent side's attributes filled withNA),"identity"(all ofx, split byy, withy's attributes where it covers andNAelsewhere), or"symdiff"(pieces in exactly one layer). Ignored wheny = NULL.- piece
Name of the integer piece-id column added to the output (the key you group by to resolve overlaps). Default
"piece_id".- geom
Name of the output hex-WKB geometry column. Default
"geometry".- grid
Fixed-precision snapping grid size in CRS units. Coordinates are snapped to this grid before noding so near-duplicate shared boundaries merge into one.
NULL(the default) derives it from coordinate magnitude (max(abs(st_bbox(x))) * 3e-8), which suits projected layers. Pass a number to override when that default is too coarse for fine geometry (or too coarse because an outlier coordinate inflated the magnitude), or0to disable snapping entirely.- precision
Fixed-precision grid size, in CRS units, for noding the boundary linework. Noding on a fixed grid is deterministic and avoids the floating noder's repair-and-retry on dense overlapping linework, which is what makes a large dense layer feasible to overlay. It is far finer than
gridso intersection points are not collapsed.NULL(the default) derives it from coordinate magnitude (max(abs(st_bbox(x))) * 1e-13); pass a number to override, or0to node in floating precision.- dedup
Overlay one representative per group of byte-identical cleaned geometries and fan the per-record attributes back onto its pieces afterwards. Duplicates add no faces, so the result is identical; this only removes the redundant noding when many records are stacked over one site (common in WDPA-style data).
TRUEby default; setFALSEto overlay every record.- flush_rows
Exploded rows buffered before a spill flush. Defaults to
getOption("vectra.spatial_flush", 5e5).- mem_limit
Approximate peak working-set budget in bytes, bounding the per-tile size (
tile_bytes = mem_limit / (threads * 24)). It is a throughput knob with an interior optimum, not "bigger is faster": too small replicates features across many tiles, too large nodes too much linework per tile (a superlinear cost), and a budget of tens of GB runs slower than the default on a dense layer. Lower it for tighter memory. Defaults viagetOption("vectra.overlay_mem_limit", ...)to a value that scales withthreadsto hold the per-tile size near its measured optimum.- threads
Number of OpenMP threads for the per-component overlay within a chunk.
0(the default, viagetOption("vectra.overlay_threads", 0)) uses all available cores.- quiet
If
FALSE, show a text progress bar over the overlay chunks.- layer
When
xis a file path, the name of the layer to read. Ignored for ansfx. Supply this orquery.- query
When
xis a file path, a SQL statement selecting the features to overlay (read in batches viaLIMIT/OFFSET); use it instead oflayerfor a subset or join. Withqueryand nolayer, passgridexplicitly, since the layer extent cannot be read from the file metadata.- layer_y, query_y
The
layer/queryequivalents for a file-pathy.- read_chunk
Features per read/parse batch.
NULL(default) sizes it from available RAM. Smaller batches lower peak memory; larger ones do fewer round trips.
Value
A vectra_node over the exploded overlay, backed by temporary .vtr
spills removed when the node is garbage-collected, carrying the CRS of x
for collect_sf(). For a self-union it is one row per piece per covering
polygon; for a two-layer overlay one row per piece per covering
x-record / y-record pair, with the columns of both layers.
Details
The topology is done once with sf/GEOS and tiled over connected overlap
clusters (disjoint clusters never share a piece, so the tiling is exact and
bounded in memory), then the exploded pieces are streamed to a .vtr and
handed back as a lazy node. Geometry rides through the engine as hex-encoded
WKB in a string column; the CRS is carried on the node for collect_sf().
The overlay runs on a fixed-precision model: coordinates are snapped to a
grid derived from their own magnitude so the pieces come out disjoint and
their areas reconstruct the union of the inputs, instead of drifting by the
fraction of a percent that floating-point sliver artefacts on invalid input
otherwise introduce. Inputs are also passed through sf::st_make_valid().
With a second layer y, the same machinery overlays two layers instead of
self-unioning one: both layers are noded together into one planar partition,
and each piece carries the attributes of the x record and the y record
that cover it. how selects which pieces to keep – the intersection (pieces
covered by both), the union (every piece of either), x split by y
("identity"), or the parts in exactly one layer ("symdiff"). With
y = NULL (the default) the function self-unions x and how is ignored.
See also
slice_min() / slice_max() to resolve each piece to one winner,
collect_sf() to materialize as sf.
Examples
# Two overlapping squares designated in different years.
sq <- function(a, b) sf::st_polygon(list(rbind(
c(a, 0), c(b, 0), c(b, 1), c(a, 1), c(a, 0))))
polys <- sf::st_sf(year = c(1990L, 2010L),
geometry = sf::st_sfc(sq(0, 2), sq(1, 3)))
# Split into disjoint pieces; earliest year wins where they overlap.
first <- spatial_overlay(polys) |>
group_by(piece_id) |>
slice_min(year, n = 1, with_ties = FALSE) |>
collect_sf()
first
# Two-layer overlay: intersect the squares with a zone layer, keeping both
# sets of attributes on each overlapping piece.
zones <- sf::st_sf(zone = c("A", "B"),
geometry = sf::st_sfc(sq(0, 1.5), sq(1.5, 3)))
inter <- spatial_overlay(polys, zones, how = "intersection") |> collect_sf()
inter