7 Functions are values
In Section 5.4, you saw that a function is a value: you can assign sqrt to a new name and call it, the same way you’d assign a number. If functions are values, you can pass them to other functions, store them in lists, and build functions that create new functions.
7.1 Passing functions to functions
sapply() takes a vector and a function, and applies the function to every element:
sapply(1:5, sqrt)
#> [1] 1.000000 1.414214 1.732051 2.000000 2.236068
Notice that sqrt has no parentheses. You are not calling sqrt here; you are handing the function itself to sapply, which calls it five times, once per element. This is the same distinction from Section 5.8: fact is the definition (a rule), while fact(n) is the rule applied to an argument. When you write sapply(1:5, sqrt), you pass the definition, and sapply decides when to apply it.
Here is the pattern behind most of R’s power: instead of writing a loop that processes each element, you hand a function to something that knows how to apply it. The loop is still running inside sapply, but you never had to write it. You described what to compute, not how to iterate.
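To make the comparison concrete, here is a minimal sketch that computes the same square roots twice: once with a hand-written loop and once with sapply. The names xs, roots_loop, and roots_sapply are made up for this illustration.

```r
xs <- 1:5

# Explicit loop: you allocate the result and manage the index yourself.
roots_loop <- numeric(length(xs))
for (i in seq_along(xs)) {
  roots_loop[i] <- sqrt(xs[i])
}

# Higher-order version: you hand sqrt to sapply(), which does the iterating.
roots_sapply <- sapply(xs, sqrt)

all.equal(roots_loop, roots_sapply)
#> [1] TRUE
```

Both versions do the same work; the second one simply leaves the bookkeeping to sapply.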
sapply(c(-3, 0, 4, -1, 7), abs)
#> [1] 3 0 4 1 7
sapply(c("hello", "world"), nchar)
#> hello world
#>     5     5
Any function that takes a single argument works here: abs computes absolute values, nchar counts characters, and you can pass either to sapply with the same syntax.
sapply takes each element of the input, applies your function, and tries to simplify the results into a vector. When every result has length one (as here, since sqrt returns one number per input), you get a vector back. When the results differ in length, sapply cannot simplify them and returns a list instead. lapply() is the stricter sibling that always returns a list. (Chapter 19 covers the distinction in detail; for now, the simple cases are enough.)
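A small sketch of that behaviour, using made-up variable names (same_length, ragged, always_list) and is.list() to check what came back:

```r
# Every result has length 1, so sapply() simplifies to a plain vector.
same_length <- sapply(1:3, \(n) n * 2)
is.list(same_length)
#> [1] FALSE

# Results of lengths 1, 2 and 3 cannot be simplified, so sapply() returns a list.
ragged <- sapply(1:3, \(n) seq_len(n))
is.list(ragged)
#> [1] TRUE

# lapply() returns a list even when simplification would be possible.
always_list <- lapply(1:3, \(n) n * 2)
is.list(always_list)
#> [1] TRUE
```

If you need to know the return type in advance, lapply is the predictable choice.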
A function that takes another function as an argument is called a higher-order function. sapply is one; most of R’s power comes from others. The pattern is the same one from Section 4.4: apply a function to each element of a container, get back a container of the same shape. In Section 4.4, R did this implicitly with vectorization (sqrt(c(1,4,9))). Here, sapply does it explicitly: you pick the function, sapply handles the iteration.
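You can write higher-order functions yourself. The sketch below defines apply_twice, a hypothetical helper (not part of base R) that takes a function f and applies it two times, and then checks that vectorization and sapply agree on a small input.

```r
# apply_twice is a made-up example of a higher-order function:
# its first argument is itself a function.
apply_twice <- function(f, x) f(f(x))

apply_twice(sqrt, 16)
#> [1] 2

# Implicit (vectorized) and explicit (sapply) application give the same vector.
identical(sqrt(c(1, 4, 9)), sapply(c(1, 4, 9), sqrt))
#> [1] TRUE
```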
In lambda calculus notation (Section 1.2), passing a function to another function is just application:
sapply = λxs. λf. [f(xs₁), f(xs₂), ..., f(xsₙ)]
sapply([1,2,3,4,5])(sqrt) = [sqrt(1), sqrt(2), sqrt(3), sqrt(4), sqrt(5)]
Replace each f(xsᵢ) with its value, and the expression reduces to the result vector. This is the same term replacement from Section 4.2.
Exercises
- Use sapply() to compute log() of the numbers 1 through 10.
- What happens if you write sapply(1:5, sqrt()) with parentheses? Try it and read the error.
- Use sapply() and is.na() to check which elements of c(1, NA, 3, NA, 5) are missing.
7.2 Anonymous functions
You can pass a named function like sqrt to sapply. But suppose you need to square every element: there is no built-in square function sitting in base R, and creating one just to use it once feels like overkill.
sapply(1:5, function(x) x^2)
#> [1]  1  4  9 16 25
function(x) x^2 creates a function right where it is needed, with no name and no assignment; it exists only for this one call to sapply, and then it vanishes. This is an anonymous function, and you already saw a brief example in Section 5.4.
The function(x) prefix is verbose enough that many R programmers avoided anonymous functions for years, defaulting to named helpers even for throwaway operations. The language eventually responded.
R 4.1 introduced a shorter syntax:
sapply(1:5, \(x) x^2)
#> [1]  1  4  9 16 25
\(x) is shorthand for function(x), and the backslash is meant to evoke the Greek letter lambda (λ), which is where the whole idea comes from (Section 1.2). In lambda calculus, this function is written λx. x²; in R before 4.1, it is function(x) x^2; in R 4.1+, it is \(x) x^2.
Anonymous functions shine when the operation is short and you only need it once:
sapply(c(10, 20, 30), \(x) x / sum(c(10, 20, 30)))
#> [1] 0.1666667 0.3333333 0.5000000
sapply(c("alice", "bob"), \(name) paste("Hello,", name))
#>          alice            bob
#> "Hello, alice"   "Hello, bob"
When the function gets longer than one line, or when you need it in more than one place, give it a name. A function called normalize is easier to read than an anonymous function that divides by the sum. A function called greet is easier to maintain than a lambda pasted into three different sapply calls.
If you use a function twice, name it. Anonymous functions belong inside sapply, lapply, and similar one-off contexts. The moment you catch yourself copying the same \(x) ... expression, that function is asking for a name.
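As a sketch of what that refactoring might look like, here are the two lambdas from above given names. normalize and greet are the names suggested in the text; the definitions are one plausible way to write them, not the only one.

```r
# Named version of "divide each element by the total".
# Division is vectorized, so normalize() needs no sapply() at all.
normalize <- function(x) x / sum(x)
normalize(c(10, 20, 30))
#> [1] 0.1666667 0.3333333 0.5000000

# Named version of the greeting lambda, reusable in any number of calls.
greet <- function(name) paste("Hello,", name)
sapply(c("alice", "bob"), greet)
#>          alice            bob
#> "Hello, alice"   "Hello, bob"
```

Naming normalize also removes the repeated c(10, 20, 30) that the anonymous version had to spell out twice.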
Exercises
- Use sapply() with an anonymous function to add 10 to each element of c(1, 5, 9).
- Rewrite sapply(1:5, \(x) x^2) using the older function(x) syntax. Verify you get the same result.
- Write a named function to_celsius that converts Fahrenheit to Celsius. Then use sapply() to convert c(32, 72, 100, 212).
7.3 Storing functions in lists
Consider a data frame with mixed column types: some numeric, some character, some logical. You want a one-number summary of each column, but mean is meaningless for character data and length(unique(...)) is meaningless for logicals. The brute-force solution is a chain of if/else if blocks that checks each column’s type and calls the right function. It works for three types. It gets ugly at six. At twelve, it is unreadable, and every new type means hunting through the chain to find the right insertion point.
The logic is fine. The problem is that it’s tangled with the control flow. If functions are values, you can separate the two: store the logic in a list, keyed by type, and let a single lookup replace the entire chain.
summarizers <- list(
  numeric = \(x) mean(x, na.rm = TRUE),
  character = \(x) length(unique(x)),
  logical = \(x) sum(x, na.rm = TRUE)
)
summarizers$numeric(c(1, 2, 3, NA, 5))
#> [1] 2.75
summarizers$character(c("a", "b", "a", "c"))
#> [1] 3
summarizers$logical(c(TRUE, FALSE, TRUE, TRUE))
#> [1] 3
summarizers is a list of three functions, accessed with $ like any named list element. A list of functions keyed by name is a dispatch table: given a key, it returns the function you need, no conditionals required. Adding a new type means adding one entry to the list. No if/else to restructure, no risk of breaking the branches above or below. Package internals, Shiny applications, and testing frameworks all use this pattern to select behavior at runtime.
Notice what moved. In the if/else version, control flow is in charge: it interrogates the type, then calls the right function as a subordinate. In the dispatch table, the functions exist independently as values in a list, and the control flow shrinks to a single lookup. The operations no longer live inside the conditional; the conditional lives inside the data structure.
An if/else chain is closed: extending it means opening the chain and editing its guts, hoping you do not break the branches above and below. A dispatch table you extend by dropping a new entry into a list; the entries that already work never see the change.
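Here is a minimal sketch of the lookup applied to a whole data frame. The data frame df and the helper summarize_column are invented for this illustration, and the sketch assumes every column's class has an entry in the table (a column of an unlisted class would fail at the lookup).

```r
summarizers <- list(
  numeric   = \(x) mean(x, na.rm = TRUE),
  character = \(x) length(unique(x)),
  logical   = \(x) sum(x, na.rm = TRUE)
)

# Hypothetical example data with one column of each type.
df <- data.frame(
  height = c(1.5, 2.5, NA),
  name   = c("a", "b", "a"),
  member = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# The whole if/else chain collapses to one lookup keyed by the column's class.
summarize_column <- function(col) summarizers[[class(col)[1]]](col)

sapply(df, summarize_column)
#> height   name member
#>      2      2      2

# Extending the table: one new entry, existing entries untouched.
summarizers$integer <- \(x) median(x, na.rm = TRUE)
summarize_column(c(1L, 5L, 9L))
#> [1] 5
```

Note that [[ ]] with a computed key does the lookup here; $ only works when you type the name literally.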
In Church’s lambda calculus, functions were values from the start. Packaging them inside control structures was a habit inherited from imperative languages, not a necessity. The dispatch table recovers the default: data holds the operations, and a lookup replaces the branching.
Here is the same idea in a simpler form:
transforms <- list(
  double = \(x) x * 2,
  triple = \(x) x * 3,
  negate = \(x) -x
)
transforms$double(5)
#> [1] 10
transforms$triple(5)
#> [1] 15
transforms$negate(5)
#> [1] -5
The dispatch table stores functions that already exist. But what if you need to manufacture a function that does not exist yet?
Exercises
- Create a list with two functions: square and cube. Apply both to the number 4.
- Write a list of functions where each one converts a temperature from Celsius to a different scale (Fahrenheit, Kelvin). Use them to convert 100 degrees Celsius.
7.4 Functions that return functions
A function can return a number, a string, a vector, or a data frame. It can also return a function.
make_adder <- function(n) {
  function(x) x + n
}
make_adder takes a number n and returns a new function that adds n to its argument:
add5 <- make_adder(5)
add5(3)
#> [1] 8
add5(100)
#> [1] 105
add10 <- make_adder(10)
add10(3)
#> [1] 13
add5 and add10 are both functions, built by make_adder with different values of n, and each one remembers the value it was created with. add5 remembers 5; add10 remembers 10. How?
When make_adder(5) runs, it creates a local environment where n = 5. The returned function function(x) x + n was defined inside that environment. When you later call add5(3), R finds n through lexical scoping (Section 5.5): it looks in the environment where the function was defined and finds n = 5. But that environment was created during a function call that has already returned. The call is gone. So where does the environment live?
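It lives with the returned function. add5 keeps a reference to the environment it was defined in, so R keeps that environment alive for as long as add5 exists, even though the call that created it has finished. A function that travels with its creation environment is called a closure. Every function in R is technically a closure, but the term earns its weight when a function returned by another function captures variables from the enclosing scope. Closures are powerful enough to deserve their own chapter (Chapter 18); the mechanism that makes make_adder work has consequences far beyond adding numbers.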
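You can look at the captured environment directly. The sketch below uses base R's environment() function to retrieve the environment attached to each adder; the variable names are the ones from the example above.

```r
make_adder <- function(n) {
  function(x) x + n
}
add5 <- make_adder(5)
add10 <- make_adder(10)

# Call each adder once so that n has been evaluated.
add5(3)
#> [1] 8
add10(3)
#> [1] 13

# environment() returns the environment a function carries with it.
environment(add5)$n
#> [1] 5
environment(add10)$n
#> [1] 10

# Each call to make_adder() created its own, separate environment.
identical(environment(add5), environment(add10))
#> [1] FALSE
```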
In lambda calculus notation:
make_adder = λn. λx. x + n
make_adder(5) = (λn. λx. x + n)(5) = λx. x + 5
Replacing n with 5 gives a new function, λx. x + 5, which is add5. The returned function has one argument already filled in. Writing a two-argument operation as a chain of one-argument functions, as make_adder does, is called currying; supplying the first argument to get a simpler function is partial application. Each call to a function factory is a beta reduction (Section 1.2); the returned function is the reduced term.
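In R terms, the difference between an ordinary two-argument function and the one-argument-at-a-time form looks like this. The function add is a hypothetical counterpart written only for comparison.

```r
# Ordinary form: both arguments supplied in one call.
add <- function(x, n) x + n

# One argument at a time: the first call fixes n, the second supplies x.
make_adder <- function(n) function(x) x + n

add(3, 5)
#> [1] 8
make_adder(5)(3)
#> [1] 8

# The intermediate function is a value, so it can be handed to sapply().
add5 <- make_adder(5)
sapply(1:3, add5)
#> [1] 6 7 8
```

The payoff is the last line: add5 fits anywhere a one-argument function is expected, which add on its own does not.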
make_multiplier <- function(factor) {
  function(x) x * factor
}
double <- make_multiplier(2)
triple <- make_multiplier(3)
double(7)
#> [1] 14
triple(7)
#> [1] 21
make_multiplier is another function factory, producing double and triple from the same template with different values baked in. Function factories appear throughout R: custom plot themes, parameterized statistical tests, configured data transformations. Chapter 20 covers them in depth. For now, notice that the functions we have been passing around, storing in lists, and returning from factories all share a quiet assumption: that operators like + and * are themselves functions. Are they?
Exercises
- Write a function make_power that takes an exponent n and returns a function that raises its argument to the nth power. Create square <- make_power(2) and cube <- make_power(3) and test them.
- What does make_adder(0) return? Is it the identity function?
- Write a function make_greeter that takes a greeting string (like "Hello" or "Bonjour") and returns a function that takes a name and produces the greeting. Test: greet_en <- make_greeter("Hello"); greet_en("Alice") should return "Hello, Alice".
7.5 Operators are functions
Try this:
`+`(2, 3)
#> [1] 5
+ is a function. It takes two arguments and returns their sum, and when you write 2 + 3, R translates it to `+`(2, 3). The infix notation is syntactic sugar; underneath, it is a function call.
The same is true for every operator:
`*`(4, 5)
#> [1] 20
`>`(10, 3)
#> [1] TRUE
`&`(TRUE, FALSE)
#> [1] FALSE
It goes further than arithmetic. Indexing with [ is a function call:
x <- c(10, 20, 30)
`[`(x, 2)
#> [1] 20
Accessing a list element with $ is a function call. Even the curly brace { is a function call (it evaluates each expression inside and returns the last one). Assignment with <- is a function call.
John Chambers, the creator of S (Section 2.3), summarized R’s design with two principles:
“Everything that exists is an object. Everything that happens is a function call.”
Numbers, strings, vectors, functions, even NULL are all objects. Addition, comparison, indexing, assignment, and control flow are all function calls. Since functions are themselves objects, the two principles fold into each other.
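Because operators are ordinary function objects, they can be passed to higher-order functions like any other value. The sketch below uses sapply from earlier in the chapter plus Reduce, a base R function not otherwise covered here; the list pairs is made up for the example.

```r
# Fold a vector with the addition function: ((((1 + 2) + 3) + 4) + 5).
Reduce(`+`, 1:5)
#> [1] 15

# With a single argument, `-` is unary minus, so this negates each element.
sapply(1:3, `-`)
#> [1] -1 -2 -3

# Pass the indexing function to pull the second element out of each vector.
# Extra arguments to sapply() (here, 2) are forwarded to the function.
pairs <- list(c(10, 20), c(30, 40))
sapply(pairs, `[`, 2)
#> [1] 20 40
```

The backticks are what let you refer to the operator as a name instead of using it in infix position.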
You can even override built-in operators. (This is a demonstration, not a recommendation.)
`+` <- function(a, b) a * b
2 + 3
#> [1] 6
rm(`+`)
2 + 3
#> [1] 5
After redefining + to mean multiplication, 2 + 3 returns 6. rm() removes the custom definition from your workspace, so R once again finds the original + in the base package. The fact that you can redefine + shows that it really is just a name bound to a function, like any other name. R’s entire syntax is built on function calls, and those functions are values you can inspect, replace, and pass around.
Never redefine built-in operators in real code. The example above exists to show you what R is made of, not to suggest a workflow. Code that redefines + is code that nobody, including you six months later, will be able to read.
Exercises
- Rewrite 10 - 3 as an explicit function call using backticks.
- Rewrite x[1] as an explicit function call. (Hint: `[`(x, 1).)
- What does `{`(1, 2, 3) return? Why?