pkgdown/mathjax-config.html

Skip to contents

Automatically detects spatial autocorrelation, temporal autocorrelation, and clustered structure in data. Returns a diagnosis object that specifies appropriate cross-validation strategies.

Usage

borg_diagnose(
  data,
  coords = NULL,
  time = NULL,
  groups = NULL,
  target = NULL,
  alpha = 0.05,
  verbose = FALSE
)

Arguments

data

A data frame to diagnose.

coords

Character vector of length 2 specifying coordinate column names (e.g., c("lon", "lat") or c("x", "y")). If NULL, spatial autocorrelation is not tested.

time

Character string specifying the time column name. Can be Date, POSIXct, or numeric. If NULL, temporal autocorrelation is not tested.

groups

Character string specifying the grouping column name (e.g., "site_id", "patient_id"). If NULL, clustered structure is not tested.

target

Character string specifying the response variable column name. Used for more accurate autocorrelation diagnostics on residuals. Optional.

alpha

Numeric. Significance level for autocorrelation tests. Default: 0.05.

verbose

Logical. If TRUE, print diagnostic progress. Default: FALSE.

Value

A BorgDiagnosis object containing:

  • Detected dependency type(s)

  • Severity assessment

  • Recommended CV strategy

  • Detailed diagnostics for each dependency type

  • Estimated metric inflation from using random CV

Details

Spatial Autocorrelation

Detected using Moran's I test on the target variable (or first numeric column). The autocorrelation range is estimated from the empirical variogram. Effective sample size is computed as \(n_{eff} = n / DEFF\) where DEFF is the design effect.

Temporal Autocorrelation

Detected using the Ljung-Box test on the target variable. The decorrelation lag is the first lag where ACF drops below the significance threshold. Minimum embargo period is set to the decorrelation lag.

Clustered Structure

Detected by computing the intraclass correlation coefficient (ICC). An ICC > 0.05 indicates meaningful clustering. The design effect (DEFF) quantifies variance inflation: \(DEFF = 1 + (m-1) \times ICC\) where m is the average cluster size.

Examples

# Spatial data example
set.seed(42)
spatial_data <- data.frame(
  x = runif(100, 0, 100),
  y = runif(100, 0, 100),
  response = rnorm(100)
)
# Add spatial autocorrelation (nearby points are similar)
for (i in 2:100) {
  nearest <- which.min((spatial_data$x[1:(i-1)] - spatial_data$x[i])^2 +
                       (spatial_data$y[1:(i-1)] - spatial_data$y[i])^2)
  spatial_data$response[i] <- 0.7 * spatial_data$response[nearest] +
                              0.3 * rnorm(1)
}

diagnosis <- borg_diagnose(spatial_data, coords = c("x", "y"),
                           target = "response")
print(diagnosis)

# Clustered data example
clustered_data <- data.frame(
  site = rep(1:10, each = 20),
  value = rep(rnorm(10, sd = 2), each = 20) + rnorm(200, sd = 0.5)
)

diagnosis <- borg_diagnose(clustered_data, groups = "site", target = "value")
print(diagnosis)