22  Recursion and fixed points

A recursive function is a function that calls itself. That sentence sounds circular, and it should: circularity is the point. Recursion solves a problem by breaking it into smaller instances of the same problem, until the instances are small enough to solve directly. You saw Reduce() in Chapter 21 collapse a list into a single value by repeatedly applying a binary function. Recursion is the more general mechanism underneath: Reduce() is a loop that hides a specific recursive pattern, but recursion itself can express any computation that a loop can, and some that loops express only awkwardly.

R supports recursion natively. Any function can call itself, and R will track the chain of calls on a stack. This chapter covers the mechanics (the call stack, depth limits), the patterns (accumulators, divide and conquer, mutual recursion), the workarounds for R’s lack of tail-call optimization (trampolining), and the surprising theoretical result that recursion itself can be derived from anonymous functions alone (the fixed-point combinator). That last part previews Chapter 30, where Church’s lambda calculus gets a full treatment.

22.1 Recursive functions

The textbook example is the factorial. n! = n * (n-1) * … * 1, with 0! = 1:

factorial_r <- function(n) {
  if (n == 0) return(1)
  n * factorial_r(n - 1)
}

factorial_r(6)
#> [1] 720

Every recursive function needs two parts: a base case that stops the recursion (here, n == 0), and a recursive case that makes progress toward the base case (here, n - 1 is closer to 0 than n was). Forget the base case and you get infinite recursion, which in R means a stack overflow error.

Fibonacci numbers are the other classic:

fib <- function(n) {
  if (n <= 1) return(n)
  fib(n - 1) + fib(n - 2)
}

sapply(0:10, fib)
#>  [1]  0  1  1  2  3  5  8 13 21 34 55

This works but is spectacularly inefficient. Each call spawns two more calls, giving O(2^n) time complexity (see Chapter 9 for Big-O). fib(30) makes over a million calls. Memoization (Section 20.4) or an accumulator (later in this chapter) fixes the performance problem, but the naive version illustrates a real danger of recursion: elegant code can hide terrible performance.
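The million-call claim is easy to check directly. A sketch with a counter in the global environment (fib_counted is a hypothetical instrumented variant, not used elsewhere in this chapter):

```r
calls <- 0
fib_counted <- function(n) {
  calls <<- calls + 1              # count every invocation
  if (n <= 1) return(n)
  fib_counted(n - 1) + fib_counted(n - 2)
}

fib_counted(20)
#> [1] 6765
calls
#> [1] 21891
```

Already at n = 20 the naive version makes 21,891 calls, and the count roughly multiplies by the golden ratio with each increment of n, passing 2.6 million by n = 30.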

R provides Recall(), which calls the current function without naming it. This is useful for anonymous recursive functions:

(function(n) {
  if (n == 0) return(1)
  n * Recall(n - 1)
})(6)
#> [1] 720

Recall() looks up the function on the call stack using sys.function(), so it works even when the function has no name. In practice, named functions are clearer, but Recall() occasionally appears in one-off recursive helpers passed to higher-order functions.
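For example, a one-off digit-sum helper passed straight to sapply (the digit-sum task is invented for illustration):

```r
# Sum the decimal digits of each element; the helper recurses via Recall()
# because it has no name to call itself by.
sapply(c(7, 42, 1999), function(n) {
  if (n < 10) return(n)
  (n %% 10) + Recall(n %/% 10)
})
#> [1]  7  6 28
```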

Exercises

  1. Write a recursive function sum_to(n) that computes 1 + 2 + … + n. What is the base case?
  2. Write a recursive function power_r(x, n) that computes x^n using repeated multiplication. Handle the case n = 0.
  3. The naive fib() recomputes the same values many times. Use memoise::memoise() to create fib_memo and compare system.time(fib(30)) with system.time(fib_memo(30)).

22.2 The call stack

Every time R calls a function, it creates a new frame on the call stack: a record of the function, its arguments, and its local variables (each frame corresponds to the execution environment from Section 18.3). When the function returns, its frame is removed. Recursive calls stack up frames:

factorial_traced <- function(n) {
  cat("Entering n =", n, "at depth", sys.nframe(), "\n")
  if (n == 0) return(1)
  result <- n * factorial_traced(n - 1)
  cat("Returning", result, "for n =", n, "\n")
  result
}

factorial_traced(4)
#> Entering n = 4 at depth 31 
#> Entering n = 3 at depth 32 
#> Entering n = 2 at depth 33 
#> Entering n = 1 at depth 34 
#> Entering n = 0 at depth 35 
#> Returning 1 for n = 1 
#> Returning 2 for n = 2 
#> Returning 6 for n = 3 
#> Returning 24 for n = 4
#> [1] 24

sys.nframe() returns the current depth of the call stack. Each recursive call adds one frame. For factorial_traced(4), the stack grows by five frames, one per call from n = 4 down to n = 0; the absolute depths (31 through 35 above) include the frames already on the stack from the calling context. The stack then unwinds as each call returns.

R imposes two limits on recursion depth. The expression-nesting limit defaults to 5000 and is controlled by options(expressions = ...); the size of the C stack is set by the operating system and reported by Cstack_info(). Exceed either and you get:

Error: C stack usage <number> is too close to the limit

or:

Error: evaluation nested too deeply

This is not a soft limit you should raise. It protects against infinite recursion crashing your R session. If your algorithm needs more than a few thousand levels of recursion, the right response is to restructure it, not to increase the limit.

# This will error:
count_down <- function(n) {
  if (n == 0) return(0)
  count_down(n - 1)
}
count_down(10000)
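Restructuring is easy here because the recursion is linear: a while loop makes the same progress toward the base case without consuming any frames.

```r
count_down_loop <- function(n) {
  while (n > 0) n <- n - 1   # one iteration per former recursive call
  n
}

count_down_loop(1e6)
#> [1] 0
```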

Exercises

  1. Call factorial_traced(5) and note the depth values. Then call factorial_traced(5) from inside another function. How do the depths change?
  2. What is the default value of getOption("expressions")? Try setting it to 100 and calling count_down(200). What error do you get?

22.3 Tail recursion and accumulators

Look at factorial_r again:

factorial_r <- function(n) {
  if (n == 0) return(1)
  n * factorial_r(n - 1)
}

After the recursive call returns, there is still work to do: multiply the result by n. This means the function cannot forget about the current frame while waiting for the recursive call. Every frame must be kept alive until the recursion bottoms out.

A tail-recursive version moves the multiplication into an accumulator argument, so the recursive call is the very last thing the function does:

factorial_tail <- function(n, acc = 1) {
  if (n == 0) return(acc)
  factorial_tail(n - 1, acc * n)
}

factorial_tail(6)
#> [1] 720

When the recursive call is in tail position (nothing happens after it), a smart compiler can reuse the current frame instead of allocating a new one. This is tail-call optimization (TCO), and it converts O(n) stack usage to O(1). Scheme guarantees TCO. Haskell effectively provides it. R does not. factorial_tail(10000) will still overflow the stack, even though the function is logically equivalent to a loop.
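That equivalence can be made mechanical: the accumulator becomes a local variable and each tail call becomes one loop iteration. A sketch (factorial_loop is the hand-translated version):

```r
factorial_loop <- function(n) {
  acc <- 1
  while (n > 0) {            # each iteration plays the role of one tail call
    acc <- acc * n
    n <- n - 1
  }
  acc
}

factorial_loop(6)
#> [1] 720
factorial_loop(10000)        # no stack growth; only the double overflows
#> [1] Inf
```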

The accumulator pattern is still worth knowing. It clarifies the logic (the accumulator carries the “answer so far”), it is the prerequisite for trampolining (next section), and if you ever work in a language with TCO, the skill transfers directly.

Here is Fibonacci with an accumulator, running in linear time instead of exponential:

fib_tail <- function(n, a = 0, b = 1) {
  if (n == 0) return(a)
  fib_tail(n - 1, b, a + b)
}

fib_tail(30)
#> [1] 832040

Compare this to the naive fib(), which makes millions of calls for fib(30). The accumulator version makes exactly n + 1 calls: one per value of n from n down to 0.
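Counting calls confirms this (a sketch with a global counter, as with the naive version earlier):

```r
calls <- 0
fib_tail_counted <- function(n, a = 0, b = 1) {
  calls <<- calls + 1        # one increment per call
  if (n == 0) return(a)
  fib_tail_counted(n - 1, b, a + b)
}

fib_tail_counted(30)
#> [1] 832040
calls
#> [1] 31
```

Thirty-one calls: one per value of n from 30 down to 0.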

Exercises

  1. Write a tail-recursive sum_to_tail(n, acc = 0) that computes 1 + 2 + … + n.
  2. Write a tail-recursive reverse_list(x, acc = list()) that reverses a list.
  3. Verify that factorial_tail(5000) still crashes. Why can R not optimize this?

22.4 Trampolining

Since R will not optimize tail calls, we have to do it ourselves. The trick: instead of making a recursive call, return a thunk (a zero-argument function that represents the call you would have made). A loop, the trampoline, keeps calling the thunk until it gets a non-function result:

trampoline <- function(f, ...) {
  result <- f(...)
  while (is.function(result)) {
    result <- result()
  }
  result
}

Now rewrite factorial_tail to return thunks instead of calling itself:

factorial_thunk <- function(n, acc = 1) {
  if (n == 0) return(acc)
  function() factorial_thunk(n - 1, acc * n)
}

trampoline(factorial_thunk, 10)
#> [1] 3628800
trampoline(factorial_thunk, 1000)
#> [1] Inf

The second call runs through 1,000 steps without growing the stack: each “recursive” step returns a closure, and the while loop in trampoline calls it, so stack depth stays constant. (The result is Inf only because 1000! overflows double precision; the trampoline itself completed all 1,000 steps without error.)

The trampoline pattern works for any tail-recursive function: rewrite the tail call as a thunk, and let the trampoline do the looping. The cost is a closure allocation per step, which is cheap. The CRAN package trampoline provides a polished implementation with better error handling, but the core idea is exactly this while loop.
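The same recipe rescues the count_down that crashed in Section 22.2 (trampoline is repeated here so the sketch stands alone):

```r
trampoline <- function(f, ...) {
  result <- f(...)
  while (is.function(result)) result <- result()
  result
}

count_down_thunk <- function(n) {
  if (n == 0) return(0)
  function() count_down_thunk(n - 1)   # thunk instead of a direct call
}

trampoline(count_down_thunk, 100000)
#> [1] 0
```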

Opinion

Trampolining is clever, but if you find yourself trampolining in production R code, step back and ask whether a for loop or Reduce() would be simpler. The trampoline exists to rescue recursive algorithms from the stack limit, not to replace straightforward iteration. Use it when you genuinely need recursion (tree traversal of unknown depth, for instance) but the depth could be large.

Exercises

  1. Convert fib_tail into a trampolined version. Compute fib(100000) without a stack overflow. (The number will be huge; just verify it completes.)
  2. The trampoline function above has a subtle limitation: it cannot return a function as a final result (since functions trigger another iteration). How would you fix this? Hint: wrap the final result in a special marker.

22.5 Divide and conquer

Recursion is most natural for problems that split into independent subproblems. This is the divide and conquer strategy: split the input, solve each half recursively, combine the results.

Merge sort is the cleanest example. Split the vector in half, sort each half, merge the sorted halves:

merge_sort <- function(x) {
  if (length(x) <= 1) return(x)

  mid <- length(x) %/% 2
  left  <- merge_sort(x[1:mid])
  right <- merge_sort(x[(mid + 1):length(x)])

  merge_sorted(left, right)
}

merge_sorted <- function(a, b) {
  result <- numeric(length(a) + length(b))
  i <- j <- k <- 1
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      result[k] <- a[i]; i <- i + 1
    } else {
      result[k] <- b[j]; j <- j + 1
    }
    k <- k + 1
  }
  while (i <= length(a)) { result[k] <- a[i]; i <- i + 1; k <- k + 1 }
  while (j <= length(b)) { result[k] <- b[j]; j <- j + 1; k <- k + 1 }
  result
}

merge_sort(c(5, 2, 8, 1, 9, 3))
#> [1] 1 2 3 5 8 9

Each level of recursion halves the problem, giving O(log n) levels. Each level does O(n) work merging, so the total is O(n log n), as discussed in Chapter 9. The recursion depth is only log2(n), so even a million-element vector only needs 20 levels. Stack overflow is not a concern for balanced divide-and-conquer algorithms.

Binary search follows the same pattern, but only recurses into one half:

binary_search <- function(x, target, lo = 1, hi = length(x)) {
  if (lo > hi) return(NA)
  mid <- (lo + hi) %/% 2
  if (x[mid] == target) return(mid)
  if (x[mid] < target) binary_search(x, target, mid + 1, hi)
  else binary_search(x, target, lo, mid - 1)
}

binary_search(1:100, 73)
#> [1] 73

Exercises

  1. Trace through merge_sort(c(4, 1, 3, 2)) on paper. How many recursive calls are made?
  2. Write a recursive function tree_depth that takes a nested list and returns its maximum depth. tree_depth(list(1, list(2, list(3)))) should return 3.

22.6 Recursive list processing

Nested lists are R’s tree structure, and recursion is the natural way to walk them. Suppose you want to find all numeric values buried in an arbitrarily nested list:

find_numbers <- function(x) {
  if (is.numeric(x)) return(x)
  if (!is.list(x)) return(NULL)
  unlist(lapply(x, find_numbers))
}

nested <- list("a", list(1, list("b", 2, list(3, "c"))))
find_numbers(nested)
#> [1] 1 2 3

The function dispatches on type: numbers are returned, non-list non-numbers are discarded, and lists are processed element by element with a recursive lapply. This pattern of “check the base cases, recurse on the structure” is the backbone of tree-processing code.
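The same backbone written generically: map_leaves (a hypothetical helper, not part of base R) applies a function to every leaf while preserving the nesting.

```r
map_leaves <- function(x, f) {
  if (is.list(x)) lapply(x, map_leaves, f = f)   # recursive case: a list
  else f(x)                                      # base case: a leaf
}

unlist(map_leaves(list(1, list(2, 3)), function(v) v * 10))
#> [1] 10 20 30
```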

Base R provides rapply() for recursive application over lists:

rapply(nested, sqrt, classes = "numeric", how = "unlist")
#> [1] 1.000000 1.414214 1.732051

rapply() walks the nested list, applies sqrt to every numeric element, and returns the results. The classes argument filters which elements to process, and how controls the output shape ("unlist", "replace", or "list").

rapply(nested, toupper, classes = "character", how = "replace")
#> [[1]]
#> [1] "A"
#> 
#> [[2]]
#> [[2]][[1]]
#> [1] 1
#> 
#> [[2]][[2]]
#> [[2]][[2]][[1]]
#> [1] "B"
#> 
#> [[2]][[2]][[2]]
#> [1] 2
#> 
#> [[2]][[2]][[3]]
#> [[2]][[2]][[3]][[1]]
#> [1] 3
#> 
#> [[2]][[2]][[3]][[2]]
#> [1] "C"

With how = "replace", the structure is preserved: character elements are uppercased in place, and everything else is left alone. rapply() is powerful but rarely seen in practice, partly because deeply nested lists are uncommon in day-to-day R, and partly because its interface is not as well known as lapply or map.

Exercises

  1. Write a recursive function flatten that takes a nested list and returns a flat list of its leaf elements. Compare with rapply(x, identity, how = "unlist").
  2. Use rapply() with how = "replace" to double every numeric element in list(1, list("a", 2, list(3, "b"))).
  3. Write a recursive function that counts the total number of leaf elements in a nested list.

22.7 Mutual recursion

Two functions can call each other. The classic (if contrived) example:

is_even <- function(n) {
  if (n == 0) return(TRUE)
  is_odd(n - 1)
}

is_odd <- function(n) {
  if (n == 0) return(FALSE)
  is_even(n - 1)
}

is_even(4)
#> [1] TRUE
is_odd(7)
#> [1] TRUE

Each function delegates to the other, peeling off one layer at a time. This is silly for parity checking, but mutual recursion appears naturally in recursive descent parsers (where each grammar rule is a function that may call functions for other rules) and in state machines (where each state is a function that transitions to another state by calling its function).
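Here is a small recursive-descent sketch of that idea: an evaluator for strings of single digits such as "2*(3+4)". The function names and the grammar are invented for illustration, not taken from any parsing package. parse_expr calls parse_term, which calls parse_factor, which calls parse_expr again for parenthesized groups, closing the recursive loop:

```r
# Grammar: expr -> term ('+' term)*;  term -> factor ('*' factor)*;
#          factor -> digit | '(' expr ')'
tokens <- character(0)
pos <- 1
peek    <- function() if (pos <= length(tokens)) tokens[pos] else ""
advance <- function() pos <<- pos + 1

parse_expr <- function() {
  value <- parse_term()
  while (peek() == "+") { advance(); value <- value + parse_term() }
  value
}

parse_term <- function() {
  value <- parse_factor()
  while (peek() == "*") { advance(); value <- value * parse_factor() }
  value
}

parse_factor <- function() {
  if (peek() == "(") {
    advance()                 # consume "("
    value <- parse_expr()     # mutual recursion: factor -> expr
    advance()                 # consume ")"
    value
  } else {
    value <- as.numeric(peek())
    advance()
    value
  }
}

eval_string <- function(s) {
  tokens <<- strsplit(s, "")[[1]]
  pos <<- 1
  parse_expr()
}

eval_string("2*(3+4)")
#> [1] 14
```

Multiplication binds tighter than addition simply because parse_term sits below parse_expr in the call chain; operator precedence falls out of the grammar's shape.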

Mutual recursion has the same stack depth problem as ordinary recursion, doubled: each round consumes two frames. For deep mutual recursion, trampolining works the same way: have both functions return thunks instead of calling each other.
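Concretely, the parity checkers become (trampoline repeated so this stands alone):

```r
trampoline <- function(f, ...) {
  result <- f(...)
  while (is.function(result)) result <- result()
  result
}

is_even_t <- function(n) {
  if (n == 0) return(TRUE)
  function() is_odd_t(n - 1)   # thunk, not a direct call
}
is_odd_t <- function(n) {
  if (n == 0) return(FALSE)
  function() is_even_t(n - 1)
}

trampoline(is_even_t, 100000)
#> [1] TRUE
```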

22.8 Fixed-point combinators

Here is an odd question: can you write a recursive function without giving it a name? A named function calls itself by name, but what if you only have anonymous functions?

A fixed point of a function f is a value x where f(x) = x. The number 0 is a fixed point of function(x) x^2, because 0^2 = 0. A fixed-point combinator is a higher-order function that finds the fixed point of a function-valued function. Applied to the right argument, it produces recursion from nothing but anonymous functions.

In lazy languages (Haskell, untyped lambda calculus), the Y combinator does this:

Y = λf. (λx. f(x x))(λx. f(x x))

But what happens in R? R passes arguments as promises, evaluated only when first used, so the self-application x(x) inside f(x(x)) is deferred until f's result actually needs it. In a strict language, one that evaluates every argument before the call, Y diverges: x(x) must be evaluated immediately, which triggers another x(x), and so on forever, before f ever runs. Thanks to lazy evaluation, a direct transcription of Y terminates in R:

# Works in R, because arguments are promises:
Y <- function(f) (function(x) f(x(x)))(function(x) f(x(x)))
Y(function(f) function(n) if (n == 0) 1 else n * f(n - 1))(6)
#> [1] 720

In a strict language, the standard repair is the Z combinator, which wraps each self-application x(x) in a lambda, function(v) x(x)(v), to delay it explicitly. The extra lambda acts as a thunk: x(x) is not evaluated when the combinator is constructed, only when the inner function is actually called with an argument v. Z works in strict and lazy languages alike, and it makes the delaying machinery visible instead of leaving it to promises, so it is the version used below:

Z <- function(f) {
  (function(x) f(function(v) x(x)(v)))(
    function(x) f(function(v) x(x)(v)))
}

The structure is identical to Y, except each x(x) is wrapped in function(v) .... In a strict language, that single change is the difference between divergence and termination.

Now define factorial without any self-reference:

fact <- Z(function(f) function(n) {
  if (n == 0) 1 else n * f(n - 1)
})

fact(6)
#> [1] 720

The argument f inside the inner function is not a name the function gave itself. It is provided by the Z combinator, which threads the self-reference through. The function never names itself, never uses Recall(), yet it recurses.
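The pattern generalizes to any recursive function. A list-length sketch (llen is just a binding for the result; the function itself never refers to itself):

```r
Z <- function(f) {             # as defined above, repeated to run standalone
  (function(x) f(function(v) x(x)(v)))(
    function(x) f(function(v) x(x)(v)))
}

llen <- Z(function(f) function(xs) {
  if (length(xs) == 0) 0 else 1 + f(xs[-1])
})

llen(list("a", "b", "c"))
#> [1] 3
```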

This is not a party trick. Church’s lambda calculus has only three constructs: variables, abstraction (function definition), and application (function calling). No built-in recursion, no looping, no assignment. Yet, as Church and Kleene showed in the 1930s, recursion can be derived inside it, and the Y/Z combinator is the mechanism. Chapter 30 explores this in depth, along with Church numerals and the deeper connections between lambda calculus and computation.

For practical R, you will never use the Z combinator in production code. But understanding that recursion is not primitive, that it emerges from function application alone, reshapes how you think about what functions can do.

Exercises

  1. Use the Z combinator to define a recursive Fibonacci function without self-reference.
  2. Apply Z to function(f) function(xs) if (length(xs) == 0) 0 else xs[[1]] + f(xs[-1]) and test it on list(1, 2, 3, 4, 5). What does this compute?
  3. The direct Y combinator (without the extra function(v) wrapper) diverges in a strict language such as Scheme, yet works in R. Explain the difference in terms of when the self-application x(x) is evaluated in each language.

22.9 When not to recurse

R is not Scheme. It does not optimize tail calls, its stack is limited, and function call overhead is high compared to vectorized operations. Recursion in R is the right tool in specific situations and the wrong tool in many others.

Use recursion for tree-shaped problems: nested lists, file system traversal, parsing hierarchical data, divide-and-conquer algorithms where the depth is O(log n). These stay well within stack limits and express naturally as recursive functions.

Do not use recursion for linear problems. Summing a vector, computing a running total, filtering elements: these are loops, Reduce() calls, or vectorized operations. A recursive sum of n elements uses n stack frames for something that sum() does in a single C call.
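To make that concrete, the same fold three ways (sum_rec is written out only to be argued against):

```r
sum_rec <- function(x) {
  if (length(x) == 0) return(0)
  x[1] + sum_rec(x[-1])   # one frame per element, plus a copy for x[-1]
}

x <- 1:100
sum_rec(x)
#> [1] 5050
Reduce(`+`, x)
#> [1] 5050
sum(x)                    # one call into C
#> [1] 5050
```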

Opinion

If your recursive function processes a flat sequence from left to right, it is a fold. Use Reduce() or purrr::reduce() from Chapter 21. If it processes elements independently, it is a map. Use lapply() or purrr::map() from Section 7.1. Recursion should be reserved for problems where the structure of the data is itself recursive.

The table below summarizes when each approach fits:

Problem shape                        Preferred approach               Why
Element-wise transformation          lapply() / map() / vectorized    No overhead, parallel-friendly
Linear fold (sum, concat, merge)     Reduce() / reduce()              Constant stack, clear intent
Tree traversal, nested structures    Recursion / rapply()             Matches the data shape
Divide and conquer (sort, search)    Recursion                        O(log n) depth, natural decomposition
Deep linear recursion (>1000)        Trampoline or convert to loop    Stack limit workaround

22.10 Historical notes

Recursive functions predate programming. Kurt Gödel used primitive recursive functions in his 1931 incompleteness theorems to encode logical statements as numbers. Alonzo Church’s lambda calculus (1936) showed that recursion could be derived from pure function application, via the fixed-point combinator. Alan Turing’s machines (also 1936) provided the alternative, imperative model; Church and Turing proved them equivalent.

John McCarthy brought recursion into programming with LISP (1960), the first language where recursive function calls were the primary control structure. LISP’s car, cdr, and cons made recursive list processing practical. McCarthy’s “Recursive Functions of Symbolic Expressions and Their Computation by Machine” remains one of the most influential papers in computer science.

Peter Landin’s SECD machine (1964) showed how to mechanically evaluate expressions with closures and recursive calls, connecting lambda calculus to real implementation. Guy Steele and Gerald Sussman’s “Lambda: The Ultimate” papers (1975-1980) proved that tail-recursive calls and GOTOs are the same thing, that lambda expressions can replace every control structure, and that a sufficiently smart compiler can make functional code as fast as imperative code. Their work led to Scheme’s guarantee of tail-call optimization, a guarantee that most other languages (R included) chose not to make.

R inherits its recursive capabilities from S, which inherited them from the general tradition of expression-based languages. S was not designed for deep recursion; it was designed for interactive statistical computing, where vectorized operations dominate. The lack of TCO in R is not an oversight but a reflection of priorities: the language optimized for the common case (vectorized numerical work) at the cost of the uncommon one (deep recursion).