20  Function factories

A function factory doesn’t compute a result. It builds a tool that computes results. You give it parameters once, and it gives you back a specialized function you can use forever. In Section 7.4, you saw make_adder: a function that returns a function. In Chapter 18, you learned why it works: the returned function is a closure that captures its creation environment. This chapter puts both ideas to work.

20.1 The pattern

A function factory is a function that returns a function:

power <- function(exponent) {
  function(x) x ^ exponent
}

square <- power(2)
cube <- power(3)
square(5)
#> [1] 25
cube(5)
#> [1] 125

power(2) creates an execution environment where exponent is 2, then returns a function that captures that environment. square remembers exponent = 2 forever, and cube remembers exponent = 3: same logic, different parameters. The factory produces a family of related functions.
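You can check this by inspecting the environment each manufactured function carries. The sketch below repeats the power definition so it runs on its own; environment() is base R.

```r
power <- function(exponent) {
  function(x) x ^ exponent
}

square <- power(2)
cube <- power(3)

# Each closure keeps its own enclosing environment with its own exponent.
environment(square)$exponent
#> [1] 2
environment(cube)$exponent
#> [1] 3

# The two environments are distinct objects, one per call to power().
identical(environment(square), environment(cube))
#> [1] FALSE
```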

This is exactly make_adder from Section 7.4, with a different operation. That section introduced the idea without naming it. The name is function factory, and the functions it produces are closures.

Function factories are partial application made explicit. power(2) fixes one argument of a two-argument operation and returns a function of the remaining argument. In lambda calculus, power is λe. λx. x^e; applying it to 2 gives λx. x^2. This decomposition of multi-argument functions into chains of single-argument functions is called currying (Schönfinkel 1924, Curry 1958). In Haskell, all functions are curried by default: add 5 3 is really (add 5) 3, where add 5 returns a function. R’s function factories are manual currying. In combinatory logic, the simplest factory is the K combinator (K = λx.λy. x): it takes a value and returns a function that always produces that value, ignoring its argument. make_formatter("$") from Section 20.3 has the same shape: it captures "$" once, and every later call sees that same captured value. Unlike K, though, the function it returns still uses its own argument.
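A small sketch may make the currying and K combinator ideas concrete. The names pow2, pow_curried, K, and always_42 are hypothetical, chosen only for this illustration.

```r
# Two-argument version: both arguments are supplied at once.
pow2 <- function(x, e) x ^ e

# Curried version: one argument at a time, the same shape as power().
pow_curried <- function(e) function(x) x ^ e

pow2(5, 2)
#> [1] 25
pow_curried(2)(5)
#> [1] 25

# K combinator: capture a value, return a function that ignores its input.
K <- function(x) {
  force(x)
  function(y) x
}

always_42 <- K(42)
always_42("anything")
#> [1] 42
```

The curried form and the two-argument form compute the same thing; the only difference is when each argument is supplied.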

Exercises

  1. Write a function factory make_multiplier that takes a factor and returns a function that multiplies its argument by factor. Create double and triple and test them on the number 7.
  2. What does power(1) return? Is it the identity function? Test it.
  3. Write a factory make_greeter that takes a greeting string and returns a function that takes a name and produces the greeting. make_greeter("Hello")("Alice") should return "Hello, Alice".

20.2 The lazy evaluation trap

R evaluates arguments lazily: they are not computed until they are used. This matters in factories:

exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 125

That returns 125, not 25. The variable exponent inside the factory is a promise pointing to exp. When sq(5) finally evaluates exponent, exp has already been changed to 3. The factory captured a promise, not a value.
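One detail is worth seeing in action: a promise is evaluated at most once, and its value is then cached. The sketch below uses a hypothetical lazy_power (the unforced factory under another name) and a variable n chosen for illustration.

```r
# The unforced factory: exponent stays a promise until first use.
lazy_power <- function(exponent) function(x) x ^ exponent

n <- 2
sq_late <- lazy_power(n)
n <- 3
sq_late(5)    # first use of exponent happens after n changed
#> [1] 125

n <- 2
sq_early <- lazy_power(n)
sq_early(5)   # first use of exponent happens while n is still 2
#> [1] 25
n <- 3
sq_early(5)   # the promise already has a value; n is not consulted again
#> [1] 25
```

So whether the bug appears depends on when the manufactured function is first called, which is part of what makes it hard to track down.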

The loop version is worse:

fns <- list()
for (i in 1:3) {
  fns[[i]] <- function(x) x ^ i
}

fns[[1]](2)
#> [1] 8
fns[[2]](2)
#> [1] 8
fns[[3]](2)
#> [1] 8

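All three return 8. Every closure looks up i in the same environment at call time, and by then the loop has finished and i is 3. Strictly speaking this is shared-variable capture, not an unevaluated promise, but the symptom is identical.

For the factory, the fix is force():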

power <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}

force(exponent) evaluates the argument immediately, capturing the value, not the promise. Now the factory works correctly:

exp <- 2
sq <- power(exp)
exp <- 3
sq(5)
#> [1] 25

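25, as expected. force() alone cannot repair the loop version as written, because those closures are not created by a factory call: they all share the single i of the enclosing environment. Calling the forced factory inside the loop (fns[[i]] <- power(i)) would work, but the idiomatic solution is lapply, which sidesteps the problem entirely: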

fns <- lapply(1:3, \(i) function(x) x ^ i)

fns[[1]](2)
#> [1] 2
fns[[2]](2)
#> [1] 4
fns[[3]](2)
#> [1] 8

Why does this work when the for loop didn’t? The anonymous function \(i) creates a new scope for each iteration. When lapply calls \(i) with the value 1, that call gets its own execution environment where i is 1, and the inner function(x) x ^ i closes over that environment. The next call gets a fresh environment where i is 2, and so on. Each manufactured function captures a different i in a different environment. In the for loop, by contrast, there is only one i in one environment, and every function points to the same variable. The \(i) wrapper supplies what the loop lacked: a separate binding of i per iteration. That binding is still a promise, but since R 3.2.0 lapply() forces the arguments it passes to the function it applies, so each i is evaluated at the moment of the call.
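If you do want a for loop, calling a factory inside it gives each function its own environment, but only a forced factory gives the right answer. The sketch below compares the two; power_lazy and power_forced are hypothetical names for the unforced and forced versions of power.

```r
power_lazy <- function(exponent) function(x) x ^ exponent

power_forced <- function(exponent) {
  force(exponent)
  function(x) x ^ exponent
}

lazy_fns <- list()
forced_fns <- list()
for (i in 1:3) {
  lazy_fns[[i]] <- power_lazy(i)
  forced_fns[[i]] <- power_forced(i)
}

# The lazy promise looks up i only now, after the loop has left i at 3.
lazy_fns[[1]](2)
#> [1] 8

# The forced factory evaluated i during the iteration that created it.
forced_fns[[1]](2)
#> [1] 2
```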

Tip: Opinion

Every factory should force() every argument that the manufactured function uses. Make this a habit, not a debugging exercise. The lazy evaluation trap is one of the most common sources of subtle bugs in R, and force() costs nothing.

Exercises

  1. Predict the output of the following code, then run it to check:

    val <- 10
    make_adder <- function(n) function(x) x + n
    add_val <- make_adder(val)
    val <- 20
    add_val(1)
  2. Fix the make_adder factory above with force(). Verify that changing val after creating add_val no longer affects the result.

  3. Why does lapply(1:3, \(i) function(x) x ^ i) work without an explicit force() call?

20.3 Practical factories

Function factories appear everywhere once you start looking.

Parameterized formatters. A factory can bake a prefix and suffix into a formatting function:

make_formatter <- function(prefix, suffix = "") {
  force(prefix)
  force(suffix)
  function(x) paste0(prefix, x, suffix)
}

usd <- make_formatter("$")
pct <- make_formatter("", "%")
usd(42)
#> [1] "$42"
pct(0.15)
#> [1] "0.15%"

Threshold filters. A factory can produce filter functions for different cutoffs:

above <- function(threshold) {
  force(threshold)
  function(x) x[x > threshold]
}

above_zero <- above(0)
above_zero(c(-2, 0, 3, -1, 5))
#> [1] 3 5

Statistical function families. The Box-Cox transformation is parameterized by a single value, lambda. A factory makes this natural:

box_cox <- function(lambda) {
  force(lambda)
  if (lambda == 0) {
    \(x) log(x)
  } else {
    \(x) (x ^ lambda - 1) / lambda
  }
}

bc1 <- box_cox(1)
bc0 <- box_cox(0)

bc1(c(1, 2, 4))
#> [1] 0 1 3
bc0(c(1, 2, 4))
#> [1] 0.0000000 0.6931472 1.3862944

The if runs once, when the factory is called, not every time the produced function is called. The factory picks the right formula at creation time and returns a function that does no branching.
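You can confirm that the branch is gone by looking at the body of each manufactured function. The sketch repeats box_cox so it runs on its own; body() is base R.

```r
box_cox <- function(lambda) {
  force(lambda)
  if (lambda == 0) {
    \(x) log(x)
  } else {
    \(x) (x ^ lambda - 1) / lambda
  }
}

# Neither body contains an if: the choice was made when the factory ran.
body(box_cox(0))
#> log(x)
body(box_cox(0.5))
#> (x^lambda - 1)/lambda
```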

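Factory-like designs are common in ggplot2 as well. Functions like scale_color_brewer() and theme_minimal() take parameters and return objects that modify a plot; they are not function factories in the strict sense, because what they return is not a function. The scales package, which ggplot2 uses for labelling, does provide true function factories: label_dollar() and label_percent() each return a formatting function.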

Exercises

  1. Write a factory between that takes low and high and returns a function that keeps only elements of a vector that fall in the range (low, high). Test it on 1:20 with bounds 5 and 15.
  2. Write a factory make_counter that returns a function with no arguments. Each time the returned function is called, it should return the next integer (1, 2, 3, …). Hint: use <<- to modify a variable in the enclosing environment.

20.4 Memoization

A special factory pattern: wrap a function so it caches its results. The first call computes normally, but subsequent calls with the same arguments return the cached value instantly.

library(memoise)

slow_square <- function(x) {
  Sys.sleep(1)
  x ^ 2
}

fast_square <- memoise(slow_square)
system.time(fast_square(10))
#>    user  system elapsed 
#>   0.001   0.000   1.001
system.time(fast_square(10))
#>    user  system elapsed 
#>   0.013   0.000   0.012

The first call takes about a second. The second is instant: memoise recognizes the argument and returns the cached result.

forget() clears the cache:

forget(fast_square)
#> [1] TRUE

When to memoize: expensive computations with repeated inputs. API calls, simulations, recursive algorithms like Fibonacci, file parsing.

When not to: functions with side effects (plotting, writing files), functions whose inputs are large or unique every time (the cache grows without bound), functions that depend on external state (database contents, system time).
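To see why memoization counts as a factory pattern, here is a minimal hand-rolled sketch. It is not how memoise is implemented; memoize_simple, counted_square, and calls are hypothetical names, and the sketch assumes a single length-one argument so that as.character() yields a usable cache key.

```r
memoize_simple <- function(f) {
  force(f)
  # The cache lives in the closure's environment, private to this wrapper.
  cache <- new.env(parent = emptyenv())
  function(x) {
    key <- as.character(x)   # assumption: x is a single number or string
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(x), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Count real computations instead of sleeping.
calls <- 0
counted_square <- function(x) {
  calls <<- calls + 1
  x ^ 2
}

fast <- memoize_simple(counted_square)
fast(10)
#> [1] 100
fast(10)
#> [1] 100
calls   # the underlying function ran only once
#> [1] 1
```

For real work, memoise handles multiple arguments, hashing, and cache limits, which this sketch deliberately leaves out.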

Exercises

  1. Write a function slow_sum that takes a vector, sleeps for 1 second, and returns the sum. Memoize it. Call it twice with the same input and verify the second call is fast.
  2. What happens if you memoize rnorm? Try memo_rnorm <- memoise(rnorm); memo_rnorm(5); memo_rnorm(5). Is this useful?

20.5 Function operators

A function operator takes a function as input and returns a modified function as output. Like a factory, but the raw material is a function, not a number or string. This pattern is called a “decorator” in Python (where the @decorator syntax gives it language-level support) and appeared in the Gang of Four design patterns book (1994) as the Decorator Pattern. R has no @-syntax, but safely(), memoise(), and with_logging() (defined later in this section) are all decorators: they modify behavior by wrapping, without changing the original.

purrr provides three useful operators:

safely() wraps a function so it never errors. Instead, it returns a list with $result and $error:

library(purrr)

safe_log <- safely(log)
safe_log(10)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL
safe_log("a")
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>
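There is no magic in this: the same idea can be sketched in base R with tryCatch(). The code below is an illustration, not purrr’s implementation; safely_sketch and safe_log2 are hypothetical names, and the sketch ignores safely()’s otherwise and quiet arguments.

```r
safely_sketch <- function(f) {
  force(f)
  function(...) {
    tryCatch(
      # Success: wrap the value, leave the error slot empty.
      list(result = f(...), error = NULL),
      # Failure: return the condition object instead of stopping.
      error = function(e) list(result = NULL, error = e)
    )
  }
}

safe_log2 <- safely_sketch(log)
safe_log2(1)$result
#> [1] 0
is.null(safe_log2("a")$result)
#> [1] TRUE
conditionMessage(safe_log2("a")$error)
#> [1] "non-numeric argument to mathematical function"
```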

possibly() is simpler: it returns a default value on error:

careful_log <- possibly(log, otherwise = NA)
careful_log(10)
#> [1] 2.302585
careful_log("a")
#> [1] NA

quietly() captures messages, warnings, and output as list components instead of printing them:

quiet_log <- quietly(log)
quiet_log(-1)
#> $result
#> [1] NaN
#> 
#> $output
#> [1] ""
#> 
#> $warnings
#> [1] "NaNs produced"
#> 
#> $messages
#> character(0)

You can write your own operators. Here is a logging wrapper:

with_logging <- function(f) {
  force(f)
  function(...) {
    cat("Calling function with", length(list(...)), "argument(s)\n")
    f(...)
  }
}

logged_mean <- with_logging(mean)
logged_mean(1:10)
#> Calling function with 1 argument(s)
#> [1] 5.5

Notice the force(f) call. Function operators are factories whose input happens to be a function. The same lazy evaluation trap from Section 20.2 applies: without force(), f is a promise, and if the variable it points to changes before the operator’s result is called, you get the wrong function.
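Here is what the trap looks like for an operator. The sketch uses a hypothetical with_logging_lazy that omits force(f) on purpose, and a variable summarise_fn chosen for illustration.

```r
with_logging_lazy <- function(f) {
  # Deliberately no force(f): f stays a promise until the first call.
  function(...) {
    cat("Calling function\n")
    f(...)
  }
}

summarise_fn <- mean
logged <- with_logging_lazy(summarise_fn)
summarise_fn <- median   # reassigned before logged() is ever called

logged(c(1, 2, 30))      # the mean would be 11; this is the median
#> Calling function
#> [1] 2
```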

Exercises

  1. Use safely() and map() to apply log() to the list list(1, -1, "a", 10). Extract the results and the errors separately.
  2. Write a function operator with_timer that wraps a function so it prints the elapsed time each time it is called. Test it with Sys.sleep.
  3. What does possibly(possibly(log, NA), NA) do? Is double-wrapping useful?

20.6 Composing factories and operators

Factories and operators compose. You can apply an operator to a factory’s output, or feed a factory into map():

safe_log <- safely(log)
results <- map(list(1, -1, "a", 10), safe_log)
#> Warning in .f(...): NaNs produced

str(results)
#> List of 4
#>  $ :List of 2
#>   ..$ result: num 0
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: num NaN
#>   ..$ error : NULL
#>  $ :List of 2
#>   ..$ result: NULL
#>   ..$ error :List of 2
#>   .. ..$ message: chr "non-numeric argument to mathematical function"
#>   .. ..$ call   : language .f(...)
#>   .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
#>  $ :List of 2
#>   ..$ result: num 2.3
#>   ..$ error : NULL

safely(log) is a function operator applied to log. The result is passed to map(), a functional from Section 7.1. Three concepts from three chapters, working together.

You can also compose operators. A function that is both safe and logged:

safe_logged_log <- with_logging(safely(log))
safe_logged_log(10)
#> Calling function with 1 argument(s)
#> $result
#> [1] 2.302585
#> 
#> $error
#> NULL
safe_logged_log("a")
#> Calling function with 1 argument(s)
#> $result
#> NULL
#> 
#> $error
#> <simpleError in .f(...): non-numeric argument to mathematical function>

The inner operator (safely) handles errors, while the outer one (with_logging) adds logging. Each layer does one thing, and you combine them to get the behavior you want.
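When the stack of operators grows, you can apply them from a list. The sketch below is base R only so it runs without purrr: with_default is a hypothetical stand-in for possibly(), apply_operators is a hypothetical helper built on Reduce(), and with_logging is repeated from Section 20.5.

```r
with_logging <- function(f) {
  force(f)
  function(...) {
    cat("Calling function with", length(list(...)), "argument(s)\n")
    f(...)
  }
}

# Hypothetical base-R stand-in for possibly(): return a default on error.
with_default <- function(f, otherwise) {
  force(f)
  force(otherwise)
  function(...) tryCatch(f(...), error = function(e) otherwise)
}

# Apply operators in order: the first in the list ends up innermost.
apply_operators <- function(f, operators) {
  Reduce(function(g, op) op(g), operators, f)
}

robust_log <- apply_operators(
  log,
  list(\(f) with_default(f, NA_real_), with_logging)
)

robust_log(10)
#> Calling function with 1 argument(s)
#> [1] 2.302585
robust_log("a")
#> Calling function with 1 argument(s)
#> [1] NA
```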

This is the payoff of thinking functionally: small, composable pieces that combine cleanly. A factory builds specialized functions, an operator modifies them, and map() applies them across inputs. Each piece is simple on its own, yet together they handle complex workflows with very little code.

formatters <- map(c("$", "EUR ", "GBP "), make_formatter)

map(formatters, \(f) f(100))
#> [[1]]
#> [1] "$100"
#> 
#> [[2]]
#> [1] "EUR 100"
#> 
#> [[3]]
#> [1] "GBP 100"

map() over a vector of prefixes produces a list of formatters. Then map() over the formatters applies each one. No loops, no index bookkeeping, no boilerplate.