7 Functions are values

In Section 5.4, you saw that a function is a value: you can assign sqrt to a new name and call it, the same way you’d assign a number. If functions are values, you can pass them to other functions, store them in lists, and build functions that create new functions.

7.1 Passing functions to functions

sapply() takes a vector and a function, and applies the function to every element:

sapply(1:5, sqrt)
#> [1] 1.000000 1.414214 1.732051 2.000000 2.236068

Notice that sqrt has no parentheses. You are not calling sqrt here; you are handing the function itself to sapply, which calls it five times, once per element. This is the same distinction as in Section 5.8: fact is the definition (a rule), while fact(n) is the rule applied to an argument.

Here is the pattern behind most of R’s power: instead of writing a loop that processes each element, you hand a function to something that knows how to apply it. The loop is still running inside sapply, but you never had to write it. You described what to compute, not how to iterate.
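To make the hidden loop concrete, here is a minimal sketch of the loop you would otherwise write by hand, next to the sapply call that replaces it. The names xs and roots are made up for this illustration.

```r
xs <- 1:5

# The explicit loop: allocate a result vector, then fill it one element at a time.
roots <- numeric(length(xs))
for (i in seq_along(xs)) {
  roots[i] <- sqrt(xs[i])
}

# The same computation with the iteration handed off to sapply.
identical(roots, sapply(xs, sqrt))
#> [1] TRUE
```

Both versions produce the same vector; only the second leaves the bookkeeping (allocation, indexing, assignment) to sapply.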

sapply(c(-3, 0, 4, -1, 7), abs)
#> [1] 3 0 4 1 7

sapply(c("hello", "world"), nchar)
#> hello world 
#>     5     5

Any function that can be called with a single argument works here: abs computes absolute values, nchar counts characters, and you can pass either to sapply with the same syntax.

sapply takes each element of the input, applies your function, and tries to simplify the results. When every result has length one (as here, since sqrt returns one number per input), you get a vector back. When every result has the same length greater than one, you get a matrix with one column per input element. When the results differ in length, sapply cannot simplify and returns a list instead. lapply() is the stricter sibling that always returns a list. (Chapter 19 covers the distinction in detail; for now, the simple cases are enough.)
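The simplification rules are easiest to see by varying how long each result is. The sketch below uses only built-in functions; the variable names ranges and counts are chosen for this illustration.

```r
# One number per element: the results collapse into a plain vector.
sapply(1:3, sqrt)
#> [1] 1.000000 1.414214 1.732051

# Two numbers per element (range returns the min and the max): the results
# become a matrix with one column per input element.
ranges <- sapply(list(1:3, 4:9), range)
dim(ranges)
#> [1] 2 2

# Results of different lengths cannot be simplified, so sapply returns a list.
counts <- sapply(1:3, seq_len)
is.list(counts)
#> [1] TRUE

# lapply never simplifies, even when it could.
is.list(lapply(1:3, sqrt))
#> [1] TRUE
```

If you need to know the shape of the result in advance, lapply is the predictable choice.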

A function that takes another function as an argument is called a higher-order function. sapply is one of many in base R; lapply, Map, Filter, and Reduce are others.
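Higher-order functions are not reserved for base R; you can write your own. As a small sketch, apply_twice below is a hypothetical helper (not a built-in) that takes a function f and a value x, and applies f two times.

```r
# Hypothetical higher-order function: f is a function, x is its input.
apply_twice <- function(f, x) f(f(x))

apply_twice(sqrt, 16)   # sqrt(sqrt(16))
#> [1] 2
apply_twice(abs, -3)    # abs(abs(-3))
#> [1] 3
```

Inside apply_twice, f is just a name bound to whatever function was passed in, which is why it can be called like any other function.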

The lambda calculus connection

In lambda calculus notation (Section 1.2), passing a function to another function is just application:

sapply = λxs. λf. [f(xs₁), f(xs₂), ..., f(xsₙ)]
sapply([1,2,3,4,5])(sqrt) = [sqrt(1), sqrt(2), sqrt(3), sqrt(4), sqrt(5)]

Replace each f(xsᵢ) with its value, and the expression reduces to the result vector. This is the same term replacement from Section 4.2.

Exercises

Use sapply() to compute log() of the numbers 1 through 10.
What happens if you write sapply(1:5, sqrt()) with parentheses? Try it and read the error.
Use sapply() and is.na() to check which elements of c(1, NA, 3, NA, 5) are missing.

7.2 Anonymous functions

You can pass a named function like sqrt to sapply. But suppose you need to square every element: there is no built-in square function sitting in base R, and creating one just to use it once feels like overkill.

sapply(1:5, function(x) x^2)
#> [1]  1  4  9 16 25

function(x) x^2 creates a function right where it is needed, with no name and no assignment; it exists only for this one call to sapply, and then it vanishes. This is an anonymous function, and you already saw a brief example in Section 5.4.

The function(x) prefix is verbose enough that many R programmers avoided anonymous functions for years, defaulting to named helpers even for throwaway operations. The language eventually responded.

R 4.1 introduced a shorter syntax:

sapply(1:5, \(x) x^2)
#> [1]  1  4  9 16 25

\(x) is shorthand for function(x), and the backslash is meant to evoke the Greek letter lambda (λ), which is where the whole idea comes from (Section 1.2). In lambda calculus, this function is written λx. x²; in every version of R, it can be written function(x) x^2; from R 4.1 onward, it can also be written \(x) x^2.
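A quick check that the two spellings build the same kind of function. This sketch assumes R 4.1 or later; square_long and square_short are names invented for the comparison.

```r
square_long  <- function(x) x^2   # works in every version of R
square_short <- \(x) x^2          # requires R >= 4.1

identical(sapply(1:5, square_long), sapply(1:5, square_short))
#> [1] TRUE

# An anonymous function can also be called on the spot, without ever being named.
(\(x) x^2)(4)
#> [1] 16
```

The parentheses around \(x) x^2 in the last call are needed so that R treats the whole function as the thing being called.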

Anonymous functions shine when the operation is short and you only need it once:

sapply(c(10, 20, 30), \(x) x / sum(c(10, 20, 30)))
#> [1] 0.1666667 0.3333333 0.5000000

sapply(c("alice", "bob"), \(name) paste("Hello,", name))
#>          alice            bob 
#> "Hello, alice"   "Hello, bob"

When the function gets longer than one line, or when you need it in more than one place, give it a name.

Opinion

If you use a function twice, name it. Anonymous functions belong inside sapply, lapply, and similar one-off contexts. The moment you catch yourself copying the same \(x) ... expression, that function is asking for a name.

Exercises

Use sapply() with an anonymous function to add 10 to each element of c(1, 5, 9).
Rewrite sapply(1:5, \(x) x^2) using the older function(x) syntax. Verify you get the same result.
Write a named function to_celsius that converts Fahrenheit to Celsius. Then use sapply() to convert c(32, 72, 100, 212).

7.3 Storing functions in lists

Consider a data frame with mixed column types: some numeric, some character, some logical. You want a one-number summary of each column, but mean is meaningless for character data and length(unique(...)) is meaningless for logicals. The brute-force solution is a chain of if/else if blocks that checks each column’s type and calls the right function. It works for three types. It gets ugly at six. At twelve, it is unreadable, and every new type means hunting through the chain to find the right insertion point.
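For reference, the brute-force chain might look like the sketch below. The function name summarize_column_chain is hypothetical, and the three summaries are one plausible choice per type rather than the only sensible ones.

```r
# Hypothetical if/else version: type checks and operations are interleaved.
summarize_column_chain <- function(col) {
  if (is.numeric(col)) {
    mean(col, na.rm = TRUE)
  } else if (is.character(col)) {
    length(unique(col))
  } else if (is.logical(col)) {
    sum(col, na.rm = TRUE)
  } else {
    stop("no summary defined for this type")
  }
}

summarize_column_chain(c(1, 2, 3, NA, 5))
#> [1] 2.75
summarize_column_chain(c("a", "b", "a", "c"))
#> [1] 3
```

Every new type adds another branch, and each branch has to be placed correctly relative to the ones already there.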

The logic is fine. The problem is that it’s tangled with the control flow. If functions are values, you can separate the two: store the logic in a list, keyed by type, and let a single lookup replace the entire chain.

summarizers <- list(
  numeric = \(x) mean(x, na.rm = TRUE),
  character = \(x) length(unique(x)),
  logical = \(x) sum(x, na.rm = TRUE)
)

summarizers$numeric(c(1, 2, 3, NA, 5))
#> [1] 2.75
summarizers$character(c("a", "b", "a", "c"))
#> [1] 3
summarizers$logical(c(TRUE, FALSE, TRUE, TRUE))
#> [1] 3

summarizers is a list of three functions, accessed with $ like any named list element. A list of functions keyed by name is a dispatch table: given a key, it returns the function you need, no conditionals required. Adding a new type means adding one entry to the list. No if/else to restructure, no risk of breaking the branches above or below. Package internals, Shiny applications, and testing frameworks all use this pattern to select behavior at runtime.

Notice what moved. In the if/else version, control flow is in charge: it interrogates the type, then calls the right function as a subordinate. In the dispatch table, the functions exist independently as values in a list, and the control flow shrinks to a single lookup. The operations no longer live inside the conditional; the conditional lives inside the data structure. You extend a dispatch table by dropping a new entry into the list; the entries that already work never see the change.
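Here is a minimal sketch of the single lookup applied to a whole data frame. It assumes the table is keyed by the column's class; summarize_column and the people data frame are made up for this illustration.

```r
summarizers <- list(
  numeric   = \(x) mean(x, na.rm = TRUE),
  character = \(x) length(unique(x)),
  logical   = \(x) sum(x, na.rm = TRUE)
)

# Hypothetical helper: look up the summarizer by the column's class, then call it.
summarize_column <- function(col) {
  summarizers[[class(col)[1]]](col)
}

# Made-up example data.
people <- data.frame(
  height = c(1.5, 1.8, NA),
  name   = c("ana", "bo", "ana"),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sapply(people, summarize_column)
#> height   name member 
#>   1.65   2.00   2.00 

# Extending the table is one assignment; the existing entries are untouched.
summarizers$factor <- \(x) nlevels(x)
summarize_column(factor(c("low", "high", "low")))
#> [1] 2
```

One caveat of keying by class: a class with no entry (for example "integer" here) makes the lookup return NULL, and the call fails. A fuller version would check for a missing entry and raise a clearer error.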

Why dispatch tables feel natural in R

In Church’s lambda calculus, functions were values from the start. Packaging them inside control structures was a habit inherited from imperative languages, not a necessity. The dispatch table recovers the default: data holds the operations, and a lookup replaces the branching.

Here is the same idea in a simpler form:

transforms <- list(
  double = \(x) x * 2,
  triple = \(x) x * 3,
  negate = \(x) -x
)

transforms$double(5)
#> [1] 10
transforms$triple(5)
#> [1] 15
transforms$negate(5)
#> [1] -5
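Because the list elements are ordinary values, sapply can iterate over the functions themselves. In this sketch the anonymous function receives each stored function as f and calls it on the same input.

```r
transforms <- list(
  double = \(x) x * 2,
  triple = \(x) x * 3,
  negate = \(x) -x
)

# Apply every function in the list to the number 5.
sapply(transforms, \(f) f(5))
#> double triple negate 
#>     10     15     -5 
```

Here the roles from Section 7.1 are reversed: the data being looped over is a list of functions, and the value 5 is held fixed.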

The dispatch table stores functions that already exist. But what if you need to manufacture a function that does not exist yet?

Exercises

Create a list with two functions: square and cube. Apply both to the number 4.
Write a list of functions where each one converts a temperature from Celsius to a different scale (Fahrenheit, Kelvin). Use them to convert 100 degrees Celsius.

7.4 Functions that return functions

A function can return a number, a string, a vector, or a data frame. It can also return a function.

make_adder <- function(n) {
  function(x) x + n
}

make_adder is a function factory: it takes a number n and returns a new function that adds n to its argument:

add5 <- make_adder(5)
add5(3)
#> [1] 8
add5(100)
#> [1] 105

add10 <- make_adder(10)
add10(3)
#> [1] 13

add5 and add10 are both functions, built by make_adder with different values of n, and each one remembers the value it was created with. add5 remembers 5; add10 remembers 10. How?

When make_adder(5) runs, it creates a local environment where n = 5. The returned function function(x) x + n was defined inside that environment. When you later call add5(3), R finds n through lexical scoping (Section 5.5): it looks in the environment where the function was defined and finds n = 5. But that environment was created during a function call that has already returned. The call is gone. So where does the environment live?

The environment lives on because the returned function holds a reference to it, and R keeps an environment alive for as long as something still refers to it. A function that travels with its creation environment is called a closure. Every function in R is technically a closure, but the term earns its weight when a function returned by another function captures variables from the enclosing scope. Closures are powerful enough to deserve their own chapter (Chapter 18); the mechanism that makes make_adder work has consequences far beyond adding numbers.
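You can look inside a closure's environment with the base function environment(). The sketch below repeats the make_adder definition so that it runs on its own.

```r
make_adder <- function(n) {
  function(x) x + n
}
add5  <- make_adder(5)
add10 <- make_adder(10)

# Each returned function carries its own environment, with its own n.
environment(add5)$n
#> [1] 5
environment(add10)$n
#> [1] 10

# The two environments are distinct objects, one per call to make_adder.
identical(environment(add5), environment(add10))
#> [1] FALSE
```

Each call to make_adder creates a fresh environment, which is why add5 and add10 never interfere with each other.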

The lambda calculus connection

In lambda calculus notation:

make_adder = λn. λx. x + n
make_adder(5) = (λn. λx. x + n)(5) = λx. x + 5

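Replacing n with 5 gives a new function, λx. x + 5, which is add5. The returned function has one argument already filled in. make_adder is the curried form of addition: instead of taking both arguments at once, it takes them one at a time. Supplying only the first, as make_adder(5) does, is partial application: fixing one argument to produce a simpler function. Each call to a function factory is a beta reduction (Section 1.2); the returned function is the reduced term.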
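The two reduction steps can be written as two consecutive calls, without naming the intermediate function. In this sketch, add is a hypothetical two-argument function included only for comparison.

```r
# Hypothetical two-argument version: both arguments supplied at once.
add <- function(x, n) x + n

# The factory from above: arguments supplied one at a time.
make_adder <- function(n) function(x) x + n

add(3, 5)
#> [1] 8
make_adder(5)(3)   # first call returns a function, second call applies it
#> [1] 8
```

make_adder(5)(3) reads left to right: make_adder(5) reduces to the function that adds 5, and (3) applies that function.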

make_multiplier <- function(factor) {
  function(x) x * factor
}

double <- make_multiplier(2)
triple <- make_multiplier(3)

double(7)
#> [1] 14
triple(7)
#> [1] 21

make_multiplier is another function factory, producing double and triple from the same template with different values baked in. Function factories appear throughout R: custom plot themes, parameterized statistical tests, configured data transformations. Chapter 20 covers them in depth. For now, notice that the functions we have been passing around, storing in lists, and returning from factories all share a quiet assumption: that operators like + and * are themselves functions. Are they?

Exercises

Write a function make_power that takes an exponent n and returns a function that raises its argument to the nth power. Create square <- make_power(2) and cube <- make_power(3) and test them.
What does make_adder(0) return? Is it the identity function?
Write a function make_greeter that takes a greeting string (like "Hello" or "Bonjour") and returns a function that takes a name and produces the greeting. Test: greet_en <- make_greeter("Hello"); greet_en("Alice") should return "Hello, Alice".

7.5 Operators are functions

Try this:

`+`(2, 3)
#> [1] 5

+ is a function. It takes two arguments and returns their sum, and when you write 2 + 3, R translates it to `+`(2, 3). The infix notation is syntactic sugar; underneath, it is a function call.

The same is true for every operator:

`*`(4, 5)
#> [1] 20
`>`(10, 3)
#> [1] TRUE
`&`(TRUE, FALSE)
#> [1] FALSE

It goes further than arithmetic. Indexing with [ is a function call:

x <- c(10, 20, 30)
`[`(x, 2)
#> [1] 20
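Since operators are functions, they can be passed to sapply like sqrt or abs. In this sketch, any extra arguments after the function are handed on by sapply to each call; the list named vectors is made up for the example.

```r
# Multiply each element by 10: sapply calls `*`(element, 10).
sapply(1:3, `*`, 10)
#> [1] 10 20 30

# Pull the second element out of every vector: sapply calls `[`(vector, 2).
vectors <- list(c(10, 20, 30), c(4, 5, 6))
sapply(vectors, `[`, 2)
#> [1] 20  5
```

The backticks are what let you refer to the operator as a value instead of using it in infix position.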

Accessing a list element with $ is a function call, and so is assignment with <-. Even the curly brace { is a function call (it evaluates each expression inside and returns the last one).

John Chambers, the creator of S (Section 2.3), summarized R’s design with two principles:

“Everything that exists is an object. Everything that happens is a function call.”

Numbers, strings, vectors, functions, even NULL are all objects. Addition, comparison, indexing, assignment, and control flow are all function calls. Since functions are themselves objects, the two principles fold into each other.
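A few of these can be checked directly by writing them in prefix form. The list person is made-up example data.

```r
person <- list(name = "Ada", age = 36)

# Element access: person$name
`$`(person, name)
#> [1] "Ada"

# Assignment: y <- 10
`<-`(y, 10)
y
#> [1] 10

# Control flow: if (y > 5) "big" else "small"
`if`(y > 5, "big", "small")
#> [1] "big"
```

Nobody writes code this way, but the fact that it runs shows that the familiar syntax is a layer over ordinary function calls.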

You can even override built-in operators. (This is a demonstration, not a recommendation.)

`+` <- function(a, b) a * b
2 + 3
#> [1] 6

rm(`+`)
2 + 3
#> [1] 5

After redefining + to mean multiplication, 2 + 3 returns 6. rm() removes your definition, so R finds the original in the base package again. The fact that you can redefine + shows that it really is just a name bound to a function, like any other name: a value you can inspect, replace, and pass around.
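The original + was never overwritten; your definition only sat in front of it. A short sketch, using the base:: prefix to reach the built-in version while the override is in place (again, a demonstration only):

```r
`+` <- function(a, b) a * b   # your definition now shadows the built-in one

2 + 3
#> [1] 6
base::`+`(2, 3)               # the built-in is still there, one level up
#> [1] 5

rm(`+`)                       # remove the shadowing definition
identical(`+`, base::`+`)
#> [1] TRUE
```

This is ordinary name lookup: R uses the first binding of + it finds, and once yours is removed, the first one it finds is the one in base.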

Opinion

Never redefine built-in operators in real code. The example above exists to show you what R is made of, not to suggest a workflow. Code that redefines + is code that nobody, including you six months later, will be able to read.

Exercises

Rewrite 10 - 3 as an explicit function call using backticks.
Rewrite x[1] as an explicit function call. (Hint: `[`(x, 1).)
What does `{`(1, 2, 3) return? Why?