25 Contracts and defensive code

You have written a function that computes a rate from two columns of a data frame. Someone passes it a data frame where one column is character instead of numeric. R does not crash; mean() on a character vector returns NA with a warning, and your function divides NA by a number and returns NA. The warning scrolls past in a long pipeline. The NA propagates through mutate(), summarise(), left_join(), and into a CSV that feeds a report. Three weeks later a collaborator asks why half the rates in the report are missing. You trace it back to a column that was read as character because one cell contained the string "N/A".

Every line of code between the bad input and the discovery was wasted work. The function that produced the NA could have stopped the pipeline in the first line, with a message that said exactly what went wrong. It didn’t, because nobody wrote that check. This was the problem a young man faced in the 1970s and early 1980s while working at Électricité de France, where a wrong number in a pipeline could mean a miscalculated load on a reactor.

After nine years of watching functions accept garbage and hand it downstream, he left and decided to build a programming language where that could not happen. His name was Bertrand Meyer; the language he was building was called Eiffel, which was defined by one central idea he called Design by Contract. It was simple: every function states a precondition (what must be true before it runs), a postcondition (what it guarantees after it runs), and an invariant (what stays true throughout). If a precondition fails, the caller broke the deal. If a postcondition fails, the function broke it. Either way, execution stops with a message that names the broken obligation.

R has no contract syntax. But stop() at the top of a function is a precondition. stopifnot() before the return value is a postcondition. The validator in a constructor/validator/helper pattern (Section 24.3) is a class invariant. The tools are manual; the discipline is the same. State your assumptions, halt the moment one is violated, and make the error message say what went wrong.
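As a minimal sketch of how these pieces line up, the hypothetical function below computes a rate with preconditions at the top and a postcondition before the return. The name event_rate and its arguments are illustrative assumptions, not taken from any package.

```r
# Hypothetical example: a rate function written contract-style.
event_rate <- function(events, exposure) {
  # Preconditions: what the caller must guarantee.
  if (!is.numeric(events) || !is.numeric(exposure)) {
    stop(sprintf(
      "events and exposure must be numeric, got %s and %s",
      class(events)[1], class(exposure)[1]
    ))
  }
  if (length(events) != length(exposure)) {
    stop("events and exposure must have the same length")
  }
  if (anyNA(exposure) || any(exposure <= 0)) {
    stop("exposure must be positive and contain no NA values")
  }

  rate <- events / exposure

  # Postcondition: what the function guarantees in return.
  stopifnot(
    "rate must have one value per input" = length(rate) == length(events)
  )
  rate
}

event_rate(c(2, 5), c(100, 250))
#> [1] 0.02 0.02
```

A character column now fails on the first line of the function, not three weeks later in a report.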

25.1 The case for validation

R is dynamically typed, which means nothing stops you from passing a character vector to a function that expects a number. Without validation, the error surfaces deep inside the function body, far from the actual mistake:

add_tax <- function(price, rate) {
  price * (1 + rate)
}

add_tax("42", 0.2)
#> Error in `price * (1 + rate)`:
#> ! non-numeric argument to binary operator

The error message says non-numeric argument to binary operator. If you already know what went wrong, that is enough; if you don’t, it is useless. The real problem is that someone passed "42" instead of 42, but R doesn’t tell you that. It tells you about a multiplication that failed inside the function body, and in real code that operation may sit several calls below the one that received the bad argument.

With validation, the error surfaces immediately:

add_tax <- function(price, rate) {
  if (!is.numeric(price)) {
    stop(sprintf("price must be numeric, got %s", class(price)[1]))
  }
  price * (1 + rate)
}

add_tax("42", 0.2)
#> Error in `add_tax()`:
#> ! price must be numeric, got character

Check inputs at the boundary, where data enters your function, not in the guts where it is consumed. The principle is short: fail fast, fail loud, fail clear.

R has no static types, no compiler, no ahead-of-time guarantees. Every precondition you want enforced must be enforced by code you write. That sounds like a burden, but it has an upside: every assumption is visible in the function body when you revisit it six months later. if (!is.numeric(price)) stop(...) is verbose, but it is also impossible to miss.

The Ariane 5 rocket began to fail about 37 seconds into its maiden flight in 1996 and self-destructed moments later, after an unprotected conversion of a 64-bit float to a 16-bit signed integer overflowed in the inertial reference system software. A single guarded range check would likely have saved a $370 million rocket. A 2003 NASA study found that 40% of aerospace software failures came from missing code: checks the programmers never wrote. When your function returns NA instead of erroring on bad input, you are in the same category of silent failure. stop() with a clear message forces the caller to deal with the problem immediately, before the bad value propagates through fifty lines of downstream code.

The type theory connection

A stop() that fires on wrong input is a runtime proof by contradiction: “if this value were numeric, execution would continue; it isn’t, so we halt.” The Curry-Howard correspondence formalizes this: types are propositions, programs are proofs, and a function with signature A -> B is a proof that “if A then B.” Dependently typed languages (Idris, Agda) encode preconditions in the type system itself, making runtime stop() calls unnecessary. R sits at the opposite end of that spectrum. The question both approaches answer is the same: “what must hold for this computation to be sound?”

(This is about functions you write for yourself and for others, not about validating user input in a Shiny app.)

25.2 `stop()`, `warning()`, and `message()`

R gives you three signalling functions, each pitched at a different severity. The differences matter more than you might think.

stop("msg") halts execution and signals an error. Use it when the function cannot produce a correct result.

divide <- function(x, y) {
  if (y == 0) stop("y must not be zero")
  x / y
}

divide(10, 0)
#> Error in `divide()`:
#> ! y must not be zero

warning("msg") continues execution but signals a warning, which makes it appropriate when the function can proceed but the result might surprise the caller.

safe_log <- function(x) {
  bad <- !is.na(x) & x <= 0
  if (any(bad)) warning("non-positive values replaced with NA")
  # Blank out the bad values first, so log() itself never warns
  x[bad] <- NA
  log(x)
}

safe_log(c(1, -2, 3))
#> Warning in safe_log(c(1, -2, 3)): non-positive values replaced with NA
#> [1] 0.000000       NA 1.098612

message("msg") prints informational output: progress updates, status reports, diagnostics. suppressMessages() silences these, which is why message() is better than cat() for status output. It respects the caller’s wishes.

load_data <- function(path) {
  message(sprintf("Reading %s...", path))
  # ... actual loading code ...
}

The decision rule: if the result would be wrong, stop(). If the result is correct but surprising, warning(). If you are just providing status, message(). But where exactly is the line between “surprising” and “wrong”?
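As a rough illustration of the rule, the sketch below puts all three signals in one hypothetical function (mean_rate is an invented name). Which situations count as “wrong” and which as “surprising” is a judgment call made for this example only.

```r
# Hypothetical helper: one function, three severities.
mean_rate <- function(x) {
  # Wrong type: no correct result is possible, so stop().
  if (!is.numeric(x)) {
    stop(sprintf("x must be numeric, got %s", class(x)[1]))
  }
  # Missing values: the mean of the rest is still correct, but the caller
  # may not expect values to be dropped, so warning().
  n_missing <- sum(is.na(x))
  if (n_missing > 0) {
    warning(sprintf("%d missing value(s) dropped", n_missing))
  }
  # Status only: message().
  message(sprintf("Averaging %d value(s)", length(x) - n_missing))
  mean(x, na.rm = TRUE)
}
```

A caller who wants quiet output can wrap the call in suppressMessages() without losing the warning or the error.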

Opinion

Never use warning() when you should use stop(). A warning that scrolls past unread is worse than no warning at all; it gives you the illusion of safety while the bad value keeps propagating. If you are not confident the function produced a correct result, don’t return one.

25.3 Writing informative errors

The difference between a helpful error and a frustrating one is the difference between ten seconds of debugging and ten minutes. Most of the gap comes down to one thing: does the message tell you what was expected and what was received?

# Bad
stop("invalid input")

# Good
stop(sprintf("x must be numeric, got %s", class(x)[1]))

# Better
stop(sprintf(
  "x must be a single positive number, got %s of length %d",
  class(x)[1], length(x)
))

Say what was expected, say what was received. The caller should be able to fix the problem without reading your source code.
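One way to keep messages in this shape without retyping the sprintf() call is a small helper. The sketch below is an assumption about how such a helper might look; describe_value and check_positive_number are hypothetical names.

```r
# Hypothetical helper: describe what was received in one short phrase.
describe_value <- function(x) {
  sprintf("%s of length %d", class(x)[1], length(x))
}

# Hypothetical check for "a single positive number".
check_positive_number <- function(x, arg = "x") {
  # && short-circuits, so x > 0 is only evaluated for a numeric scalar
  ok <- is.numeric(x) && length(x) == 1 && !is.na(x) && x > 0
  if (!ok) {
    stop(
      sprintf("%s must be a single positive number, got %s",
              arg, describe_value(x)),
      call. = FALSE
    )
  }
  invisible(x)
}

# Capture the message to see exactly what the caller would read
tryCatch(
  check_positive_number("5", arg = "threshold"),
  error = function(e) conditionMessage(e)
)
#> [1] "threshold must be a single positive number, got character of length 1"
```

Passing the argument name in explicitly means the message names the caller’s argument, not the helper’s.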

stopifnot() is base R’s assertion function, terse and good for internal checks where you trust the caller to interpret the output:

normalize <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  x / sum(x)
}

normalize("abc")
#> Error in `normalize()`:
#> ! is.numeric(x) is not TRUE

The auto-generated error message is functional but cryptic. Since R 4.0, you can name the assertions to control the message:

normalize <- function(x) {
  stopifnot(
    "x must be numeric" = is.numeric(x),
    "x must not be empty" = length(x) > 0
  )
  x / sum(x)
}

normalize("abc")
#> Error in `normalize()`:
#> ! x must be numeric

Named stopifnot() gives you the conciseness of assertions with messages you control. For functions that other people will call, this is the minimum standard. But what happens when you have ten arguments to validate, each with its own type, range, and length constraint?

Exercises

The following function has a bad error message. Rewrite it so the error says what was expected and what was received.

compute_bmi <- function(weight_kg, height_m) {
  if (!is.numeric(weight_kg) || !is.numeric(height_m)) {
    stop("bad input")
  }
  weight_kg / height_m^2
}

Rewrite the validation using named stopifnot(). Does the error message improve?
What error message does stopifnot(is.numeric("hello")) produce? What about stopifnot("x must be numeric" = is.numeric("hello"))?

25.4 Common validation patterns

A function that expects a numeric scalar but receives a character vector of length three will fail somewhere. The question is where. These patterns catch the problem at the door.

Type checks: is.numeric(x), is.character(x), is.logical(x), is.data.frame(x), is.list(x).
Length checks: length(x) == 1 for scalars, length(x) > 0 for non-empty inputs.
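Range checks: x > 0 for positive, x >= 0 && x <= 1 for a single probability. For vectors, wrap the comparison in all(), as in all(x >= 0 & x <= 1), because if () and && expect a length-one condition.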
NULL and NA handling: is.null(x) and anyNA(x).

if (is.null(x)) stop("x must not be NULL")
if (anyNA(x)) stop("x must not contain NA values")

File existence: file.exists(path).

if (!file.exists(path)) stop(sprintf("file not found: %s", path))

Set membership: match.arg(), a pattern that deserves special attention.

my_summary <- function(x, method = c("mean", "median", "trimmed")) {
  method <- match.arg(method)
  switch(method,
    mean    = mean(x),
    median  = median(x),
    trimmed = mean(x, trim = 0.1)
  )
}

match.arg() does three things at once. It validates that the argument is one of the allowed choices. It supports partial matching, so my_summary(1:10, "med") works. And it picks the first option as the default, so calling my_summary(1:10) uses "mean". All of this comes from the function signature, which means the valid choices are visible in the usage section of ?my_summary without reading the source.

my_summary(1:10)
#> [1] 5.5
my_summary(1:10, "median")
#> [1] 5.5
my_summary(1:10, "tri")
#> [1] 5.5

my_summary(1:10, "xyz")
#> Error in `match.arg()`:
#> ! 'arg' should be one of "mean", "median", "trimmed"

The error message from match.arg() is clear and lists the valid options, which is why it is preferred over manual %in% checks for string arguments with fixed options. But all of these patterns share a problem: writing them out by hand, over and over, for every function in a package.
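To see how the individual checks combine, and how much boilerplate they add up to, here is a sketch of one hypothetical function that uses a type check, a length check, an NA check, a range check, and match.arg() together. The function upper_summary and its behaviour are invented for illustration.

```r
# Hypothetical function: summarise the values at or above a quantile.
upper_summary <- function(x, prob, method = c("mean", "median")) {
  # Type and length
  if (!is.numeric(x) || length(x) == 0) {
    stop(sprintf("x must be a non-empty numeric vector, got %s of length %d",
                 class(x)[1], length(x)))
  }
  # NA handling
  if (anyNA(x)) stop("x must not contain NA values")
  # Scalar range check: || stops at the first failing test, so the
  # comparisons only run once prob is known to be a single number
  if (!is.numeric(prob) || length(prob) != 1 || is.na(prob) ||
      prob < 0 || prob > 1) {
    stop("prob must be a single number between 0 and 1")
  }
  # Set membership
  method <- match.arg(method)

  kept <- x[x >= stats::quantile(x, probs = prob)]
  switch(method, mean = mean(kept), median = stats::median(kept))
}

upper_summary(1:10, prob = 0.5)
#> [1] 8
```

Four arguments’ worth of checks already outweigh the two lines that do the actual work.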

Exercises

Write a function rescale() that takes a numeric vector x and a method argument accepting "minmax" or "zscore". Use match.arg() for validation. Test that partial matching works.
Add validation to the following function using stopifnot() or if/stop(). Check that data is a data frame, n is a single positive integer, and col is a string that exists as a column name.
```
top_rows <- function(data, col, n = 5) {
  data[order(data[[col]], decreasing = TRUE), ][seq_len(n), ]
}
```

25.5 The checkmate package

Writing the same if (!is.numeric(x)) stop(...) pattern for the fifteenth time in a package gets old fast, and each repetition is another chance to misspell a class name or forget a length check. The checkmate package provides concise, fast assertion functions that handle the boilerplate:

library(checkmate)

process <- function(data, method = "fast", threshold = 0.5) {
  assert_data_frame(data, min.rows = 1)
  assert_choice(method, choices = c("fast", "accurate"))
  assert_number(threshold, lower = 0, upper = 1)
  # ...
}

Each assert_* function throws an informative error if the check fails. No if/stop boilerplate, no sprintf formatting. The function names are self-documenting: assert_numeric, assert_character, assert_file_exists, assert_flag (for single TRUE/FALSE), and dozens more.

checkmate also provides two softer variants. test_* functions return TRUE or FALSE for use in conditional logic; check_* functions return the error message as a string (or TRUE on success) for custom handling.

# Conditional logic
if (test_numeric(x)) {
  mean(x)
} else {
  length(x)
}

# Custom error message
msg <- check_numeric(x, lower = 0)
if (!isTRUE(msg)) stop("Custom prefix: ", msg)

Opinion

For package code with many validated arguments, checkmate pays for itself in readability and consistency. For scripts and one-off functions, stopifnot() and match.arg() are enough. Don’t add a dependency for two assertions.

25.6 `tryCatch()` and error handling

Everything so far has been about signalling your own errors. But sometimes the error comes from somewhere else: a file that doesn’t exist, a web API that times out, a parsing function that chokes on malformed input. You need to catch those errors and decide what to do about them.

tryCatch() catches an error, warning, or message and runs a handler:

result <- tryCatch(
  log("a"),
  error = function(e) NA
)

result
#> [1] NA

Instead of crashing, the tryCatch() call returns NA. The error handler receives the condition object e and returns a fallback value, which becomes the value of the whole expression. This is recovery: you know an error might happen, and you have a plan for it.
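tryCatch() also accepts a warning handler and a finally clause. The sketch below, built around a hypothetical wrapper called describe_outcome, shows all three in one call. Note that a warning handler in tryCatch() abandons the computation at the point of the warning; it does not resume it.

```r
# Hypothetical wrapper: report how a computation ended.
describe_outcome <- function(f) {
  tryCatch(
    {
      value <- f()
      paste("ok:", format(value))
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e)),
    finally = message("finished")  # runs whether or not a condition was caught
  )
}

describe_outcome(function() log(10))   # succeeds
describe_outcome(function() log(-1))   # log() warns here
describe_outcome(function() log("a"))  # log() errors here
```

conditionMessage() extracts the text of the condition, which is usually what you want to log or rewrap.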

try() is a simpler wrapper that returns the result on success, or an object of class "try-error" on failure:

result <- try(log("a"), silent = TRUE)
class(result)
#> [1] "try-error"

try() is convenient for quick scripts, but tryCatch() gives you more control.
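If you do use try(), the result has to be inspected before it is used. A minimal sketch of that check, with NA_real_ chosen here as an arbitrary fallback:

```r
result <- try(log("a"), silent = TRUE)

if (inherits(result, "try-error")) {
  # The original condition object is stored in the "condition" attribute
  reason <- conditionMessage(attr(result, "condition"))
  result <- NA_real_
}
```

Forgetting the inherits() check is the usual failure mode: the "try-error" object is a character string, and it will flow into later code as if it were data.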

For functional programming, purrr::safely() and purrr::possibly() wrap a function to make it error-tolerant. Both are built on top of R’s condition handling system, the same machinery that tryCatch() exposes directly. safely() returns a list with $result and $error, while possibly() returns a default value on failure. Both are designed for use with map() when applying a function that might fail on some elements:

safe_log <- purrr::safely(log)
safe_log(10)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL
safe_log("a")
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>

Erlang (1986, Ericsson) takes recovery to its extreme: instead of preventing crashes with validation, Erlang programs are built around supervisors that detect and restart failed processes. Ericsson’s telecom switches reportedly achieved 99.9999999% availability using this “let it crash” philosophy, and R’s tryCatch() is a mild version of the same idea. Validation and recovery agree on the essential point: silent failures are the real enemy. But what does a disciplined version of “catch and recover” look like as a data structure?

In functional programming, the answer is the Either type (or Result in Rust): a computation either succeeds with a value or fails with an error. tryCatch() runs the expression; if it succeeds, you get the result; if it throws, you get the error handler’s output. This is the same “value or absence” logic as NA propagation (Section 4.6), but for errors instead of missing data. NA behaves like Maybe (present or absent); tryCatch() behaves like Either (success or failure). Both thread the possibility of failure through a computation without crashing. Haskell and Rust make these patterns explicit with dedicated types; R keeps them implicit in NA semantics and tryCatch() mechanics, so the correspondence is structural and not enforced by the language.
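To make the Either idea concrete, here is a minimal sketch in base R. The helpers attempt and and_then are hypothetical names invented for this example; nothing like them ships with base R.

```r
# Hypothetical Either-like helpers; not part of base R or any package.
attempt <- function(expr) {
  # expr is evaluated lazily, inside tryCatch(), so its errors are caught
  tryCatch(
    list(ok = TRUE, value = expr, error = NULL),
    error = function(e) list(ok = FALSE, value = NULL, error = e)
  )
}

# Apply f only if the previous step succeeded; otherwise pass the failure on.
and_then <- function(res, f) {
  if (!res$ok) return(res)
  attempt(f(res$value))
}

good <- and_then(attempt(log(100)), sqrt)
bad  <- and_then(attempt(log("a")), sqrt)

good$ok
#> [1] TRUE
bad$ok
#> [1] FALSE
```

The failure in bad carries the original condition object in bad$error, so the reason survives however many steps are chained after it.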

Opinion

Use tryCatch() when you have a recovery plan. Don’t use it to silently swallow errors. An error you hide is a bug you invite.

Exercises

Write a function safe_read that takes a file path and returns the file contents (via readLines()), or NULL if the file doesn’t exist. Use tryCatch().
Use purrr::map() and purrr::possibly() to apply as.numeric() to the list list("1", "two", "3", "four"). Use NA as the default.

25.7 `rlang::abort()` and cli

rlang::abort() is a modern replacement for stop() that automatically saves a backtrace (the chain of function calls that led to the error) and supports condition classes for selective catching:

rlang::abort(
  message = "x must be positive",
  class = "validation_error",
  x = x
)

A caller can now catch validation_error specifically, while letting other errors propagate:

tryCatch(
  my_function(x),
  validation_error = function(e) {
    message("Validation failed: ", e$message)
  }
)
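The same selective catching can be sketched in base R, which is useful for seeing what a condition class is. This is not how rlang implements abort(); the constructor validation_error() and the function check_positive() below are hypothetical.

```r
# Hypothetical constructor for a classed error, using base R only.
validation_error <- function(message, ...) {
  structure(
    class = c("validation_error", "error", "condition"),
    list(message = message, call = NULL, ...)
  )
}

check_positive <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) {
    # stop() accepts a condition object in place of a message string
    stop(validation_error("x must be positive", x = x))
  }
  invisible(x)
}

outcome <- tryCatch(
  check_positive(-1),
  validation_error = function(e) paste("Validation failed:", conditionMessage(e))
)
outcome
#> [1] "Validation failed: x must be positive"
```

Because the handler is named after the class, an ordinary stop("boom") inside the same tryCatch() would not be caught by it.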

The cli package takes this further. cli::cli_abort() produces structured, formatted error messages with bullets and inline markup:

cli::cli_abort(c(
  "x" = "{.arg x} must be a positive number.",
  "i" = "You supplied {.cls {class(x)}} of length {length(x)}."
))

The "x" name renders that line with a cross bullet, marking it as the problem; "i" marks informational context. {.arg x} and {.cls ...} are cli’s inline formatting: they render the argument name and class with special styling in terminals that support it.

If you have ever wondered why tidyverse error messages look so polished, with colored bullets and formatted argument names, cli::cli_abort() is the answer. You don’t need cli-formatted errors in scripts, but knowing they exist helps you read the error messages that tidyverse packages produce.

rlang also provides rlang::arg_match(), a stricter alternative to match.arg() that produces tidyverse-styled error messages and does not support partial matching, which makes it safer for package APIs where partial matching can cause subtle bugs.
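To illustrate the difference, the sketch below contrasts match.arg()’s partial matching with an exact-match check written in base R. strict_match is a hypothetical helper and is not how rlang::arg_match() is implemented.

```r
choices <- c("mean", "median", "trimmed")

# Partial matching: "med" silently resolves to "median".
match.arg("med", choices)
#> [1] "median"

# Hypothetical strict matcher using exact comparison only.
strict_match <- function(arg, choices, arg_name = "arg") {
  if (!is.character(arg) || length(arg) != 1 || !arg %in% choices) {
    stop(sprintf(
      "%s must be one of %s, got %s",
      arg_name,
      paste(dQuote(choices, FALSE), collapse = ", "),
      paste(deparse(arg), collapse = " ")
    ), call. = FALSE)
  }
  arg
}

strict_match("median", choices)
#> [1] "median"
```

With partial matching, adding a choice such as "medoid" later would make "med" ambiguous and break callers who relied on the abbreviation; exact matching avoids that class of surprise.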

Exercises

Rewrite one of your stop() calls from the exercises above using rlang::abort() with a custom class (e.g., "input_error"). Then write a tryCatch() that catches only that class.
Use cli::cli_abort() to produce an error message with a bullet point showing the expected type and the actual type. Compare the output with a plain stop() message.

25.8 Designing function interfaces

Validation is downstream of design. A well-designed function is hard to misuse in the first place, which means much of your defensive code never needs to exist.

Required arguments have no defaults. Optional arguments have sensible defaults. If a function needs a file path to work, don’t give path a default of NULL and then check for it inside the body. Make the caller provide it.

Use match.arg() for string options. The valid choices are visible in the function signature and in the help page, so the caller doesn’t need to read the source code to know what’s allowed.

# Good: choices are visible
my_plot <- function(data, type = c("scatter", "line", "bar")) {
  type <- match.arg(type)
  # ...
}

# Bad: choices are hidden inside the body
my_plot <- function(data, type = "scatter") {
  if (!type %in% c("scatter", "line", "bar")) stop("invalid type")
  # ...
}

Put the data first. This makes the function pipe-friendly (data |> my_func()). Modifier arguments come after.

Use ... sparingly. If you use it, document what it accepts and where it gets forwarded. An undocumented ... is a trap: the caller passes a misspelled argument, ... absorbs it silently, and no error is raised. (This is one of R’s most common sources of invisible bugs.)
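A short sketch of the trap, using two hypothetical functions. The first accepts ... and ignores it; the second refuses anything it does not forward.

```r
# Hypothetical function whose ... swallows a misspelled argument.
total <- function(x, na.rm = FALSE, ...) {
  sum(x, na.rm = na.rm)
}

total(c(1, NA, 3), na_rm = TRUE)  # typo: na_rm instead of na.rm
#> [1] NA

# A guarded version that rejects unexpected arguments.
total_strict <- function(x, na.rm = FALSE, ...) {
  if (...length() > 0) {
    stop(sprintf("unused argument(s): %s",
                 paste(names(list(...)), collapse = ", ")))
  }
  sum(x, na.rm = na.rm)
}
```

If the function never forwards ... anywhere, the simpler fix is to drop it from the signature, since R then reports the unused argument itself. The guard is for cases such as S3 methods, where the generic’s signature requires ... to be present.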

Don’t overload argument types. An argument x that can be a vector, a data frame, or a formula is hard to validate and hard to use, because each accepted type needs its own validation branch and its own documentation. If you find yourself writing if (is.data.frame(x)) ... else if (is.numeric(x)) ..., consider splitting the function or using S3 dispatch.
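As a sketch of the S3 alternative, the hypothetical generic below gives each accepted type its own method, with a default method that rejects everything else. The name n_missing is invented for this example.

```r
# Hypothetical generic: one method per accepted type.
n_missing <- function(x, ...) UseMethod("n_missing")

n_missing.numeric <- function(x, ...) sum(is.na(x))

n_missing.data.frame <- function(x, ...) {
  vapply(x, function(col) sum(is.na(col)), integer(1))
}

# Anything else is rejected with a message naming the class received.
n_missing.default <- function(x, ...) {
  stop(sprintf("x must be numeric or a data frame, got %s", class(x)[1]))
}

n_missing(c(1, NA, 3))
#> [1] 1
n_missing(data.frame(a = c(1, NA), b = c(NA, NA)))
#> a b 
#> 1 2 
```

Each method validates and documents one type, and the if/else chain disappears into dispatch.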

This connects to Chapter 24: S7 properties with type declarations are validation built into the class definition. When a property declares its class and a validator requires it to be a positive scalar, invalid objects cannot be constructed. Validation at construction time, enforced by the class system, is the strongest form of defensive programming.

Opinion

The best validation is the kind you don’t have to write. A function with a clear signature, named options via match.arg(), and typed S7 properties catches most mistakes before any if/stop check runs. Design the interface right, and defensive code becomes a safety net instead of a load-bearing wall.

25.9 Summary

Silent failures are the worst kind of bug: the function returns an answer, and the answer is wrong, and nothing tells you. Validate at the boundary. Write error messages that say what was expected and what was received. Catch errors only when you have a recovery plan. And design interfaces that make misuse difficult in the first place, so the validation you write covers less ground.
