20 Function factories
Suppose you need twelve formatting functions: one for dollars, one for euros, one for percentages, one for each of nine other currencies your client cares about. You could write twelve functions by hand, each identical except for a prefix string. Or you could write one function that manufactures the other twelve. In Section 7.4, you saw make_adder, a function that returns a function; in Chapter 18, you learned why it works, because the returned function is a closure that captures its creation environment. This chapter puts both ideas to serious use, building entire families of functions from a single template.
20.1 The pattern
A function factory is a function that returns a function:
power <- function(exponent) {
  function(x) x ^ exponent
}
square <- power(2)
cube <- power(3)
square(5)
#> [1] 25
cube(5)
#> [1] 125
When you call power(2), R creates an execution environment where exponent is 2, then returns an anonymous function that closes over that environment. square remembers exponent = 2 forever; cube remembers exponent = 3. Same logic, different parameters. The factory produces a family of related functions, and each member of that family carries its own private copy of the parameter that distinguishes it from its siblings.
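One way to check that each function really carries its own copy of exponent is to inspect the closure environments directly. The sketch below repeats the power definition from this section and relies only on base R's environment(); it is an illustration, not something the rest of the chapter depends on.

```r
power <- function(exponent) {
  function(x) x ^ exponent
}
square <- power(2)
cube <- power(3)

# Each closure has its own enclosing environment holding its own exponent
environment(square)$exponent
#> [1] 2
environment(cube)$exponent
#> [1] 3

# The two environments are distinct objects
identical(environment(square), environment(cube))
#> [1] FALSE
```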
This is exactly make_adder from Section 7.4, with exponentiation instead of addition. That section introduced the idea without naming it. The name is function factory, and the functions it produces are closures.
Function factories are partial application made explicit: power(2) fixes one argument of a two-argument operation and returns a function of the remaining argument. In lambda calculus, power is λe. λx. x^e; applying it to 2 gives λx. x^2. This decomposition of multi-argument functions into chains of single-argument functions is called currying (Schönfinkel 1924, Curry 1958). In Haskell, all functions are curried by default: add 5 3 is really (add 5) 3, where add 5 returns a function. R’s function factories are manual currying. In combinatory logic, the simplest factory is the K combinator (K = λx. λy. x): it takes a value and returns a function that always produces that value, ignoring its argument. make_formatter("$") from Section 20.3 is a close relative: like K, it captures "$", but the function it returns combines the captured value with its argument instead of discarding the argument.
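The currying and K-combinator ideas can be sketched in a few lines of base R. The names curry2 and K below are hypothetical helpers introduced only for illustration; neither is a base R or package function.

```r
# Hypothetical helper: turn a two-argument function into a chain of
# one-argument functions (manual currying).
curry2 <- function(f) {
  force(f)
  function(a) {
    force(a)
    function(b) f(a, b)
  }
}

# K combinator: capture a value, then ignore whatever argument arrives later.
K <- function(x) {
  force(x)
  function(y) x
}

add <- curry2(`+`)
add5 <- add(5)        # fixes the first argument
add5(3)
#> [1] 8

always_dollar <- K("$")
always_dollar("anything")
#> [1] "$"
```

The force() calls guard against a lazy-evaluation problem that the next section covers.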
Exercises
- Write a function factory make_multiplier that takes a factor and returns a function that multiplies its argument by factor. Create double and triple and test them on the number 7.
- What does power(1) return? Is it the identity function? Test it.
- Write a factory make_greeter that takes a greeting string and returns a function that takes a name and produces the greeting. make_greeter("Hello")("Alice") should return "Hello, Alice".
20.2 The lazy evaluation trap
Here is a bug that will cost you an afternoon if you have never seen it before. R evaluates arguments lazily, meaning they are not computed until they are actually used, and this seemingly innocent optimization has consequences for factories that are anything but innocent. This is the lazy evaluation trap:
exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 125
That returns 125, not 25. Why? Because the variable exponent inside the factory is a promise pointing to exp in the calling environment. When sq(5) finally forces the evaluation of exponent, exp has already been changed to 3. The factory captured a promise, not a value.
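A promise is evaluated at most once, so the value it sees on first use is the value the closure keeps. The sketch below uses hypothetical names (power_lazy, e) to show that rebinding the variable again after the first call has no further effect.

```r
power_lazy <- function(exponent) {
  function(x) x ^ exponent   # exponent is still an unevaluated promise here
}

e <- 2
sq <- power_lazy(e)
e <- 3
first <- sq(5)    # promise forced now, sees e = 3
e <- 4
second <- sq(5)   # promise already evaluated; the cached 3 is reused

c(first, second)
#> [1] 125 125
```

So the bug is not that the function tracks the variable forever. It locks in whatever value the variable had at the first call rather than at creation.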
The loop version is worse:
fns <- list()
for (i in 1:3) {
fns[[i]] <- function(x) x ^ i
}
fns[[1]](2)
#> [1] 8
fns[[2]](2)
#> [1] 8
fns[[3]](2)
#> [1] 8
All three return 8. By the time any of them is called, i is 3.
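A quick way to confirm that the three functions share a single i is to compare their environments and then rebind i after the loop. This is a diagnostic sketch in base R, repeating the loop above.

```r
fns <- list()
for (i in 1:3) {
  fns[[i]] <- function(x) x ^ i
}

# All three closures share one enclosing environment: the one the loop ran in
identical(environment(fns[[1]]), environment(fns[[3]]))
#> [1] TRUE

# i is looked up at call time, so rebinding it changes every function at once
i <- 1
fns[[3]](2)
#> [1] 2
```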
The fix is force():
power <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}
force(exponent) evaluates the argument immediately, capturing the value rather than the promise, and now the factory works correctly:
exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 25
25, as expected. In a loop, calling the forced factory (fns[[i]] <- power(i)) fixes the problem too, but the idiomatic solution is lapply, which sidesteps it entirely:
fns <- lapply(1:3, \(i) function(x) x ^ i)
fns[[1]](2)
#> [1] 2
fns[[2]](2)
#> [1] 4
fns[[3]](2)
#> [1] 8
Why does this work when the for loop didn’t? Because the anonymous function \(i) creates a new scope for each iteration. When lapply calls \(i) with the value 1, that call gets its own execution environment where i is 1, and the inner function(x) x ^ i closes over that environment. The next call gets a fresh environment where i is 2, and so on. Each manufactured function captures a different i in a different environment, whereas in the for loop there is only one i in one environment and every function points to the same variable. No explicit force() is needed here because lapply (since R 3.2.0) forces the arguments of the function it applies; binding i as a function argument does not force it by itself.
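If a for loop is preferred, a minimal sketch of the alternative is to route every iteration through the forced version of power, so that each call gets its own environment and its own evaluated exponent.

```r
power <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}

fns <- vector("list", 3)
for (i in 1:3) {
  fns[[i]] <- power(i)   # each call forces its own copy of i
}

vapply(fns, function(f) f(2), numeric(1))
#> [1] 2 4 8
```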
Every factory should force() every argument that the manufactured function uses. Make this a habit, not a debugging exercise. The lazy evaluation trap is one of the most common sources of subtle bugs in R, and force() costs nothing.
Exercises
- Predict the output of the following code, then run it to check:
val <- 10
make_adder <- function(n) function(x) x + n
add_val <- make_adder(val)
val <- 20
add_val(1)
- Fix the make_adder factory above with force(). Verify that changing val after creating add_val no longer affects the result.
- Why does lapply(1:3, \(i) function(x) x ^ i) work without an explicit force() call?
20.3 Practical factories
The pattern shows up wherever you find yourself repeating the same logic with different parameters: formatting, filtering, statistical transformations, anywhere a family of behaviors differs by a handful of constants.
Parameterized formatters. A factory can bake a prefix and suffix into a formatting function:
make_formatter <- function(prefix, suffix = "") {
force(prefix)
force(suffix)
function(x) paste0(prefix, x, suffix)
}
usd <- make_formatter("$")
pct <- make_formatter("", "%")
usd(42)
#> [1] "$42"
pct(0.15)
#> [1] "0.15%"
Threshold filters. A factory can produce filter functions for different cutoffs:
above <- function(threshold) {
force(threshold)
function(x) x[x > threshold]
}
above_zero <- above(0)
above_zero(c(-2, 0, 3, -1, 5))
#> [1] 3 5
Statistical function families. The Box-Cox transformation is parameterized by a single value, lambda, which makes it a natural fit for a factory:
box_cox <- function(lambda) {
force(lambda)
if (lambda == 0) {
\(x) log(x)
} else {
\(x) (x ^ lambda - 1) / lambda
}
}
bc1 <- box_cox(1)
bc0 <- box_cox(0)
bc1(c(1, 2, 4))
#> [1] 0 1 3
bc0(c(1, 2, 4))
#> [1] 0.0000000 0.6931472 1.3862944
Notice something subtle about that if statement: it runs once, when the factory is called, not every time the produced function is called. The factory picks the right formula at creation time and returns a function that does no branching at all.
This is staged computation: you separate the cost of choosing behavior from the cost of running behavior. The if branch evaluates once, during construction; the produced function evaluates many times, during use, and pays nothing for the choice that was already made. The same pattern appears in ggplot2. When you call scale_color_brewer(palette = "Set1"), the palette choice is resolved once, when the scale is built, and baked into a function that maps data values to colors. Re-evaluating that choice for every point would be wasted work on large datasets, so the decision moves from the data path (runs per point) to the construction path (runs once). Every factory you write makes a version of the same trade. The returned function carries less weight because the decision has already been made, baked into the closure’s environment at creation time, and never revisited.
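The staging can be made visible by adding a message to the construction path. The sketch below is a traced variant of box_cox; the name box_cox_traced and the message text are illustrative assumptions, not part of the chapter's code.

```r
box_cox_traced <- function(lambda) {
  force(lambda)
  message("choosing formula for lambda = ", lambda)  # runs at construction only
  if (lambda == 0) {
    \(x) log(x)
  } else {
    \(x) (x ^ lambda - 1) / lambda
  }
}

bc_half <- box_cox_traced(0.5)
#> choosing formula for lambda = 0.5
bc_half(c(4, 9))   # no message: the branch was already decided
#> [1] 2 4
```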
The next section starts from the other direction: instead of building a function from parameters, you take a function that already exists and modify its behavior.
Exercises
- Write a factory between that takes low and high and returns a function that keeps only elements of a vector that fall in the range (low, high). Test it on 1:20 with bounds 5 and 15.
- Write a factory make_counter that returns a function with no arguments. Each time the returned function is called, it should return the next integer (1, 2, 3, …). Hint: use <<- to modify a variable in the enclosing environment.
20.4 Memoization
Sometimes you already have the right function, and the only problem is speed. Memoization wraps a function so it caches its results: the first call computes normally, but subsequent calls with the same arguments return the cached value instantly, skipping the computation entirely.
library(memoise)
slow_square <- function(x) {
Sys.sleep(1)
x ^ 2
}
fast_square <- memoise(slow_square)
system.time(fast_square(10))
#> user system elapsed
#> 0.00 0.00 1.03
system.time(fast_square(10))
#> user system elapsed
#> 0.02 0.00 0.00
The first call takes about a second. The second is instant, because memoise recognizes the argument and returns the cached result without re-executing the function body.
forget() clears the cache:
forget(fast_square)
#> [1] TRUE
When to memoize: expensive computations with repeated inputs, API calls, simulations, recursive algorithms like Fibonacci, file parsing.
When not to: functions with side effects (plotting, writing files), functions whose inputs are large or unique every time (the cache grows without bound), functions that depend on external state (database contents, system time). Memoization trades memory for speed, and it only pays off when inputs recur.
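To see what a memoizer does internally, here is a minimal hand-rolled sketch for single-argument functions. It is not how the memoise package is implemented; memoize_simple, the deparse-based cache key, and counted_square are all assumptions made for illustration.

```r
# Hypothetical single-argument memoizer backed by an environment
memoize_simple <- function(f) {
  force(f)
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- paste(deparse(x), collapse = "")   # crude key built from the argument
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)        # first call: compute and store
    }
    get(key, envir = cache, inherits = FALSE) # later calls: look up only
  }
}

calls <- 0
counted_square <- function(x) {
  calls <<- calls + 1   # count how often the real work happens
  x ^ 2
}

fast <- memoize_simple(counted_square)
fast(10)
#> [1] 100
fast(10)
#> [1] 100
calls
#> [1] 1
```

The cache lives in the closure's environment, which is also why it grows without bound when inputs never repeat.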
Exercises
- Write a function slow_sum that takes a vector, sleeps for 1 second, and returns the sum. Memoize it. Call it twice with the same input and verify the second call is fast.
- What happens if you memoize rnorm? Try memo_rnorm <- memoise(rnorm); memo_rnorm(5); memo_rnorm(5). Is this useful?
20.5 Function operators
What do you do when a function might fail on some inputs, but you need to apply it to hundreds of values and cannot afford to let one error kill the whole pipeline? You could wrap every call in tryCatch. Or you could wrap the function itself once, producing a new function that handles errors gracefully everywhere it is called.
A function operator takes a function as input and returns a modified function as output. Like a factory, but the raw material is a function rather than a number or string. This pattern is called a “decorator” in Python (where @decorator syntax makes it first-class) and appeared in the Gang of Four design patterns book (1994) as the Decorator Pattern. R has no @-syntax, but safely(), memoise(), and with_logging() are all decorators: they modify behavior by wrapping, without changing the original.
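As a rough sketch of the idea (not how purrr implements its operators), a hand-written operator that turns errors into values could wrap tryCatch once, inside the returned function. The name capture_errors and the shape of its return value are assumptions chosen for this example.

```r
# Hypothetical operator: wrap f so errors come back as data instead of stopping
capture_errors <- function(f, otherwise = NULL) {
  force(f)
  force(otherwise)
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = otherwise, error = e)
    )
  }
}

safe_log2 <- capture_errors(log)
safe_log2(10)$result
#> [1] 2.302585
conditionMessage(safe_log2("a")$error)
#> [1] "non-numeric argument to mathematical function"
```

The original log is untouched; only the wrapper knows about error handling.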
safely() wraps a function so it never errors, returning instead a list with $result and $error:
library(purrr)
safe_log <- safely(log)
safe_log(10)
#> $result
#> [1] 2.302585
#>
#> $error
#> NULL
safe_log("a")
#> $result
#> NULL
#>
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>
possibly() is simpler: it returns a default value on error:
careful_log <- possibly(log, otherwise = NA)
careful_log(10)
#> [1] 2.302585
careful_log("a")
#> [1] NA
quietly() captures messages, warnings, and output as list components instead of printing them:
quiet_log <- quietly(log)
quiet_log(-1)
#> $result
#> [1] NaN
#>
#> $output
#> [1] ""
#>
#> $warnings
#> [1] "NaNs produced"
#>
#> $messages
#> character(0)
You can write your own operators just as easily:
with_logging <- function(f) {
  force(f)
  function(...) {
    cat("Calling function with", length(list(...)), "argument(s)\n")
    f(...)
  }
}
logged_mean <- with_logging(mean)
logged_mean(1:10)
#> Calling function with 1 argument(s)
#> [1] 5.5
Notice the force(f) call. Function operators are factories whose input happens to be a function, so the same lazy evaluation trap from Section 20.2 applies: without force(), f is a promise, and if the variable it points to changes before the operator’s result is called, you get the wrong function.
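The trap is easy to reproduce with two stripped-down operators that differ only in the force() call. The names wrap_lazy, wrap_forced, and g are hypothetical and exist only for this demonstration.

```r
wrap_lazy <- function(f) {
  function(...) f(...)      # no force(f): f stays a promise
}
wrap_forced <- function(f) {
  force(f)
  function(...) f(...)
}

g <- mean
lazy <- wrap_lazy(g)
forced <- wrap_forced(g)
g <- median                 # rebinding g before the first call

lazy(c(1, 2, 10))           # ends up calling median
#> [1] 2
forced(c(1, 2, 10))         # calls mean, as intended
#> [1] 4.333333
```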
Exercises
- Use safely() and map() to apply log() to the list list(1, -1, "a", 10). Extract the results and the errors separately.
- Write a function operator with_timer that wraps a function so it prints the elapsed time each time it is called. Test it with Sys.sleep.
- What does possibly(possibly(log, NA), NA) do? Is double-wrapping useful?
20.6 Composing factories and operators
Each of these patterns (factories, operators, functionals like map()) does one small thing. The real power emerges when you compose them:
safe_log <- safely(log)
results <- map(list(1, -1, "a", 10), safe_log)
#> Warning in .f(...): NaNs produced
str(results)
#> List of 4
#> $ :List of 2
#> ..$ result: num 0
#> ..$ error : NULL
#> $ :List of 2
#> ..$ result: num NaN
#> ..$ error : NULL
#> $ :List of 2
#> ..$ result: NULL
#> ..$ error :List of 2
#> .. ..$ message: chr "non-numeric argument to mathematical function"
#> .. ..$ call : language .f(...)
#> .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#> $ :List of 2
#> ..$ result: num 2.3
#> ..$ error : NULL
safely(log) is a function operator applied to log; the result is passed to map(), a functional from Section 7.1. Three concepts from three chapters, working together in a single pipeline. What if you need both safety and logging on the same function?
safe_logged_log <- with_logging(safely(log))
safe_logged_log(10)
#> Calling function with 1 argument(s)
#> $result
#> [1] 2.302585
#>
#> $error
#> NULL
safe_logged_log("a")
#> Calling function with 1 argument(s)
#> $result
#> NULL
#>
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>
The inner operator (safely) handles errors; the outer one (with_logging) adds logging. Each layer does one thing, and you combine them to get the behavior you want.
A factory builds specialized functions, an operator modifies them, and map() applies them across inputs.
formatters <- map(c("$", "EUR ", "GBP "), make_formatter)
map(formatters, \(f) f(100))
#> [[1]]
#> [1] "$100"
#>
#> [[2]]
#> [1] "EUR 100"
#>
#> [[3]]
#> [1] "GBP 100"
map() over a vector of prefixes produces a list of formatters, then map() over the formatters applies each one. A list of functions is data you can iterate over, filter, compose, and pass around just like a list of numbers. Chapter 21 uses that idea to collapse entire sequences into single values.
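Treating functions as data becomes more convenient once the list is named. The following base R sketch repeats make_formatter from Section 20.3 and uses lapply and vapply in place of map(); the names usd, eur, and gbp are illustrative labels.

```r
make_formatter <- function(prefix, suffix = "") {
  force(prefix)
  force(suffix)
  function(x) paste0(prefix, x, suffix)
}

# A named vector of prefixes yields a named list of formatters
prefixes <- c(usd = "$", eur = "EUR ", gbp = "GBP ")
formatters <- lapply(prefixes, make_formatter)

formatters$eur(100)
#> [1] "EUR 100"

# Keep only some members of the family, then apply each to the same input
picked <- formatters[c("usd", "gbp")]
unname(vapply(picked, function(f) f(100), character(1)))
#> [1] "$100"    "GBP 100"
```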