20 Function factories

Suppose you need twelve formatting functions: one for dollars, one for euros, one for percentages, one for each of nine other currencies your client cares about. You could write twelve functions by hand, each identical except for a prefix string. Or you could write one function that manufactures the other twelve. In Section 7.4, you saw make_adder, a function that returns a function; in Chapter 18, you learned why it works: the returned function is a closure that captures its creation environment. This chapter puts both ideas to serious use, building entire families of functions from a single template.

20.1 The pattern

A function factory is a function that returns a function:

power <- function(exponent) {
  function(x) x ^ exponent
}

square <- power(2)
cube <- power(3)

square(5)
#> [1] 25
cube(5)
#> [1] 125

When you call power(2), R creates an execution environment where exponent is 2, then returns an anonymous function that closes over that environment. square remembers exponent = 2 forever; cube remembers exponent = 3. Same logic, different parameters. The factory produces a family of related functions, and each member of that family carries its own private copy of the parameter that distinguishes it from its siblings.
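One way to look at the captured value is base R's environment(), which returns the environment a closure carries. The sketch below repeats the factory so that it runs on its own:

```r
# Same factory as above, repeated so this block is self-contained.
power <- function(exponent) {
  function(x) x ^ exponent
}

square <- power(2)
cube <- power(3)

# Each closure carries the environment created by its own call to power().
# Reading the value with $ evaluates the argument if that has not happened yet.
environment(square)$exponent
#> [1] 2
environment(cube)$exponent
#> [1] 3

# The two environments are distinct objects, one per manufactured function.
identical(environment(square), environment(cube))
#> [1] FALSE
```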

The name for this pattern is function factory, and the functions it produces are closures.

Function factories are partial application made explicit: power(2) fixes one argument of a two-argument operation and returns a function of the remaining argument. In lambda calculus, power is λe. λx. x^e; applying it to 2 gives λx. x^2. This decomposition of multi-argument functions into chains of single-argument functions is called currying (Schönfinkel 1924, Curry 1958). In Haskell, all functions are curried by default: add 5 3 is really (add 5) 3, where add 5 returns a function. R’s function factories are manual currying.
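The Haskell example can be mirrored in R by hand. In this minimal sketch, add_curried and add5 are hypothetical names, and the force() call is a precaution explained in Section 20.2:

```r
# An ordinary two-argument function.
add <- function(x, y) x + y

# A hand-curried version: it takes x now and returns a function that takes y later.
add_curried <- function(x) {
  force(x)  # see Section 20.2
  function(y) x + y
}

add(5, 3)
#> [1] 8
add_curried(5)(3)
#> [1] 8

# Partial application: fix the first argument once, reuse the result.
add5 <- add_curried(5)
add5(3)
#> [1] 8
add5(10)
#> [1] 15
```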

Exercises

Write a function factory make_multiplier that takes a factor and returns a function that multiplies its argument by factor. Create double and triple and test them on the number 7.
What does power(1) return? Is it the identity function? Test it.
Write a factory make_greeter that takes a greeting string and returns a function that takes a name and produces the greeting. make_greeter("Hello")("Alice") should return "Hello, Alice".

20.2 The lazy evaluation trap

Here is a bug that will cost you an afternoon if you have never seen it before. R evaluates arguments lazily, meaning they are not computed until they are actually used, and this seemingly innocent optimization has consequences for factories that are anything but innocent. Call it the lazy evaluation trap:

exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 125

That returns 125, not 25. Why? Because the variable exponent inside the factory is a promise pointing to exp in the calling environment. When sq(5) finally forces the evaluation of exponent, exp has already been changed to 3. The factory captured a promise, not a value.
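A promise is evaluated at most once, and its value is then kept, so the outcome depends on when the manufactured function is first called. The sketch below uses the unforced factory from Section 20.1 and a hypothetical variable e:

```r
# The unforced factory from Section 20.1.
power <- function(exponent) {
  function(x) x ^ exponent
}

e <- 2
sq <- power(e)
sq(5)   # the first call forces the promise while e is still 2
#> [1] 25
e <- 3
sq(5)   # the promise has already been evaluated, so the change is ignored
#> [1] 25
```

The bug only appears when the variable changes between creating the function and first calling it, which is part of what makes it hard to reproduce.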

The loop version is worse:

fns <- list()
for (i in 1:3) {
  fns[[i]] <- function(x) x ^ i
}

fns[[1]](2)
#> [1] 8
fns[[2]](2)
#> [1] 8
fns[[3]](2)
#> [1] 8

All three return 8. By the time any of them is called, i is 3.

The fix is force():

power <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}

force(exponent) evaluates the argument immediately, capturing the value rather than the promise, and now the factory works correctly:

exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 25

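25, as expected. The loop is a different problem: there is no factory argument to force, because all three functions share the one environment where i lives. Each function needs an environment of its own, and the idiomatic way to get one is lapply, which sidesteps the problem entirely: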

fns <- lapply(1:3, \(i) function(x) x ^ i)

fns[[1]](2)
#> [1] 2
fns[[2]](2)
#> [1] 4
fns[[3]](2)
#> [1] 8

Why does this work when the for loop didn’t? Because the anonymous function \(i) creates a new scope for each iteration. When lapply calls \(i) with the value 1, that call gets its own execution environment where i is 1, and the inner function(x) x ^ i closes over that environment. The next call gets a fresh environment where i is 2, and so on. Each manufactured function captures a different i in a different environment, whereas in the for loop there is only one i in one environment and every function points to the same variable. The \(i) wrapper supplies the structural half of the fix, a separate environment per function. The evaluation half comes from lapply itself: binding i as an argument does not force it, but since R 3.2.0 lapply and the other apply functions force the arguments they pass to the function they call, so i is evaluated at the moment of each call rather than left as a promise.
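If a for loop is required, one way to repair it is to give every function its own environment, either by calling the forced factory or by wrapping the body in local(). Both variants are sketches; the names fns_factory, fns_local, and j are illustrative.

```r
# Forced factory, repeated so the block is self-contained.
power <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}

# Variant 1: each call to power() creates a new environment and evaluates i at once.
fns_factory <- list()
for (i in 1:3) {
  fns_factory[[i]] <- power(i)
}

# Variant 2: local() evaluates its body in a fresh environment on every iteration,
# and j receives the current value of i immediately.
fns_local <- list()
for (i in 1:3) {
  fns_local[[i]] <- local({
    j <- i
    function(x) x ^ j
  })
}

fns_factory[[1]](2)
#> [1] 2
fns_local[[2]](2)
#> [1] 4
fns_local[[3]](2)
#> [1] 8
```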

Opinion

Every factory should force() every argument that the manufactured function uses. Make this a habit, not a debugging exercise. The lazy evaluation trap is one of the most common sources of subtle bugs in R, and force() costs nothing.

Exercises

Predict the output of the following code, then run it to check:

val <- 10
make_adder <- function(n) function(x) x + n
add_val <- make_adder(val)
val <- 20
add_val(1)

Fix the make_adder factory above with force(). Verify that changing val after creating add_val no longer affects the result.
Why does lapply(1:3, \(i) function(x) x ^ i) work without an explicit force() call?

20.3 Practical factories

The pattern shows up wherever you find yourself repeating the same logic with different parameters: formatting, filtering, statistical transformations, anywhere a family of behaviors differs by a handful of constants.

Parameterized formatters. A factory can bake a prefix and suffix into a formatting function:

make_formatter <- function(prefix, suffix = "") {
  force(prefix)
  force(suffix)
  function(x) paste0(prefix, x, suffix)
}

usd <- make_formatter("$")
pct <- make_formatter("", "%")

usd(42)
#> [1] "$42"
pct(0.15)
#> [1] "0.15%"

Threshold filters. A factory can produce filter functions for different cutoffs:

above <- function(threshold) {
  force(threshold)
  function(x) x[x > threshold]
}

above_zero <- above(0)
above_zero(c(-2, 0, 3, -1, 5))
#> [1] 3 5

Statistical function families. The Box-Cox transformation is parameterized by a single value, lambda, which makes it a natural fit for a factory:

box_cox <- function(lambda) {
  force(lambda)
  if (lambda == 0) {
    \(x) log(x)
  } else {
    \(x) (x ^ lambda - 1) / lambda
  }
}

bc1 <- box_cox(1)
bc0 <- box_cox(0)

bc1(c(1, 2, 4))
#> [1] 0 1 3
bc0(c(1, 2, 4))
#> [1] 0.0000000 0.6931472 1.3862944

Notice something subtle about that if statement: it runs once, when the factory is called, not every time the produced function is called. The factory picks the right formula at creation time and returns a function that does no branching at all.

This is staged computation: you separate the cost of choosing behavior from the cost of running behavior. The if branch evaluates once, during construction; the produced function evaluates many times, during use, and pays nothing for the choice that was already made. The same pattern appears in ggplot2. When you call scale_color_brewer(palette = "Set1"), the palette choice is resolved once, when the scale is constructed, and baked into a palette function that later supplies the colors. Re-evaluating that choice for every point would repeat work whose answer never changes. The decision moves from the data path (runs per point) to the construction path (runs once). Every factory you write makes a version of the same trade.
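A small experiment can make the staging visible. The sketch below adds a message() call to a Box-Cox-style factory (box_cox_verbose and bc_half are hypothetical names) so that the moment of branching shows up in the console:

```r
box_cox_verbose <- function(lambda) {
  force(lambda)
  # This line and the if below run at construction time only.
  message("choosing formula for lambda = ", lambda)
  if (lambda == 0) {
    \(x) log(x)
  } else {
    \(x) (x ^ lambda - 1) / lambda
  }
}

bc_half <- box_cox_verbose(0.5)   # the message appears here, once
#> choosing formula for lambda = 0.5

# Later calls run only the chosen formula: no message, no branching.
bc_half(4)
#> [1] 2
bc_half(9)
#> [1] 4
```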

The next section starts from the other direction: instead of building a function from parameters, you take a function that already exists and modify its behavior.

Exercises

Write a factory between that takes low and high and returns a function that keeps only elements of a vector that fall in the range (low, high). Test it on 1:20 with bounds 5 and 15.
Write a factory make_counter that returns a function with no arguments. Each time the returned function is called, it should return the next integer (1, 2, 3, …). Hint: use <<- to modify a variable in the enclosing environment.

20.4 Memoization

Sometimes you already have the right function, and the only problem is speed. Memoization wraps a function so it caches its results: the first call computes normally, but subsequent calls with the same arguments return the cached value instantly, skipping the computation entirely.

library(memoise)

slow_square <- function(x) {
  Sys.sleep(1)
  x ^ 2
}

fast_square <- memoise(slow_square)

system.time(fast_square(10))
#>    user  system elapsed 
#>    0.00    0.00    1.03
system.time(fast_square(10))
#>    user  system elapsed 
#>    0.02    0.00    0.02

The first call takes about a second. The second is instant, because memoise recognizes the argument and returns the cached result without re-executing the function body.

forget() clears the cache:

forget(fast_square)
#> [1] TRUE

When to memoize:

Expensive computations with repeated inputs
API calls whose responses do not change between requests
Deterministic simulations (same seed, same parameters)
Recursive algorithms like Fibonacci
File parsing

When not to:

Functions with side effects (plotting, writing files)
Functions whose inputs are large or unique every time (the cache grows without bound)
Functions that depend on external state (database contents, system time)

Memoization trades memory for speed, and it only pays off when inputs recur.
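To make the trade concrete, here is a minimal sketch of a memoizing operator written by hand. It is not how memoise is implemented; it assumes a function of one argument whose value can be turned into a key with deparse(), and the names memoize_simple, cache, counted_square, and calls are hypothetical.

```r
memoize_simple <- function(f) {
  force(f)
  cache <- new.env()   # private store, one per memoized function
  function(x) {
    key <- paste(deparse(x), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)   # first time: compute and remember
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0
counted_square <- function(x) {
  calls <<- calls + 1   # count how often the real work happens
  x ^ 2
}

fast <- memoize_simple(counted_square)
fast(10)
#> [1] 100
fast(10)
#> [1] 100
calls
#> [1] 1
```

The cache environment is the memory being spent: it keeps one entry per distinct input and never shrinks, which is why unique inputs make memoization a poor bargain.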

Exercises

Write a function slow_sum that takes a vector, sleeps for 1 second, and returns the sum. Memoize it. Call it twice with the same input and verify the second call is fast.
What happens if you memoize rnorm? Try memo_rnorm <- memoise(rnorm); memo_rnorm(5); memo_rnorm(5). Is this useful?

20.5 Function operators

What do you do when a function might fail on some inputs, but you need to apply it to hundreds of values and cannot afford to let one error kill the whole pipeline? You could wrap every call in tryCatch. Or you could wrap the function itself once, producing a new function that handles errors gracefully everywhere it is called.

A function operator takes a function as input and returns a modified function as output. Like a factory, but the raw material is a function rather than a number or string.

safely() wraps a function so it never errors, returning instead a list with $result and $error:

library(purrr)

safe_log <- safely(log)
safe_log(10)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL
safe_log("a")
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>

This wrapping is called a decorator in Python, where @decorator syntax gives it dedicated notation; the Decorator pattern itself was catalogued in the Gang of Four design patterns book (1994). R has no @-syntax, but safely(), memoise(), and with_logging() (defined later in this section) are all decorators: they modify behavior by wrapping, without changing the original.

possibly() is simpler; it returns a default value on error:

careful_log <- possibly(log, otherwise = NA)
careful_log(10)
#> [1] 2.302585
careful_log("a")
#> [1] NA
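An operator of this kind can be little more than tryCatch() wrapped once. The sketch below is not purrr's implementation; with_default and careful_log2 are hypothetical names.

```r
with_default <- function(f, otherwise) {
  force(f)
  force(otherwise)
  function(...) {
    # Any error raised by f is caught and replaced by the default value.
    tryCatch(f(...), error = function(e) otherwise)
  }
}

careful_log2 <- with_default(log, otherwise = NA)
careful_log2(10)
#> [1] 2.302585
careful_log2("a")
#> [1] NA
```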

quietly() captures messages, warnings, and output as list components instead of printing them:

quiet_log <- quietly(log)
quiet_log(-1)
#> $result
#> [1] NaN
#> 
#> $output
#> [1] ""
#> 
#> $warnings
#> [1] "NaNs produced"
#> 
#> $messages
#> character(0)

You can write your own operators just as easily:

with_logging <- function(f) {
  force(f)
  function(...) {
    cat("Calling function with", length(list(...)), "argument(s)\n")
    f(...)
  }
}

logged_mean <- with_logging(mean)
logged_mean(1:10)
#> Calling function with 1 argument(s)
#> [1] 5.5

Notice the force(f) call. Function operators are factories whose input happens to be a function, so the same lazy evaluation trap from Section 20.2 applies: without force(), f is a promise, and if the variable it points to changes before the operator’s result is called, you get the wrong function.

Exercises

Use safely() and map() to apply log() to the list list(1, -1, "a", 10). Extract the results and the errors separately.
Write a function operator with_timer that wraps a function so it prints the elapsed time each time it is called. Test it with Sys.sleep.
What does possibly(possibly(log, NA), NA) do? Is double-wrapping useful?

20.6 Composing factories and operators

Each of these patterns (factories, operators, functionals like map()) does one small thing. The real power emerges when you compose them:

safe_log <- safely(log)
results <- map(list(1, -1, "a", 10), safe_log)
#> Warning in .f(...): NaNs produced

str(results)
#> List of 4
#>  $ :List of 2
#>   ..$ result: num 0
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num NaN
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL

safely(log) is a function operator applied to log, and what it returns is a closure (Chapter 18); that closure is passed to map(), a functional from Section 7.1. Three concepts from three chapters, working together in a single pipeline. What if you need both safety and logging on the same function?

safe_logged_log <- with_logging(safely(log))
safe_logged_log(10)
#> Calling function with 1 argument(s)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL
safe_logged_log("a")
#> Calling function with 1 argument(s)
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>

The inner operator (safely) handles errors; the outer one (with_logging) adds logging. Each layer does one thing, and you combine them to get the behavior you want.

A factory builds specialized functions, an operator modifies them, and map() applies them across inputs.

formatters <- map(c("$", "EUR ", "GBP "), make_formatter)

map(formatters, \(f) f(100))
#> [[1]]
#> [1] "$100"
#> 
#> [[2]]
#> [1] "EUR 100"
#> 
#> [[3]]
#> [1] "GBP 100"

map() over a vector of prefixes produces a list of formatters, then map() over the formatters applies each one. A list of functions is data you can iterate over, filter, compose, and pass around just like a list of numbers. Chapter 21 uses that idea to collapse entire sequences into single values.
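As a sketch of treating functions as data, the block below builds a named list of formatters with base R only. The names prefixes and fmt are hypothetical, and make_formatter is repeated from Section 20.3 so the block runs on its own.

```r
make_formatter <- function(prefix, suffix = "") {
  force(prefix)
  force(suffix)
  function(x) paste0(prefix, x, suffix)
}

prefixes <- c(usd = "$", eur = "EUR ", gbp = "GBP ")
fmt <- lapply(prefixes, make_formatter)   # the names carry over to the list

# Look a function up by name, like any other list element.
fmt$eur(100)
#> [1] "EUR 100"

# Select a subset of the functions, then apply each one to the same value.
# The result is a named character vector with one formatted string per function.
vapply(fmt[c("usd", "gbp")], \(f) f(100), character(1))
```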