22  Recursion and fixed points

Suppose you need to process a tree of unknown depth, a structure that nests inside itself with no predetermined bottom. A for loop can’t express this because you don’t know how many levels to iterate; you would need a loop inside a loop inside a loop, turtles all the way down. The alternative is a function that calls itself, peeling away one layer of nesting each time until it reaches something solid. That function is recursive, and recursion predates loops by decades in the theory of computation.

A recursive function calls itself. The sentence sounds circular because circularity is the point: you solve a problem by breaking it into smaller copies of the same problem, bottoming out when the copies are small enough to answer directly. You saw Reduce() in Chapter 21 collapse a list into a single value by repeatedly applying a binary function, but Reduce() is just one specific recursive pattern wearing a loop’s clothing. Recursion itself can express anything a loop can, and several things loops express awkwardly. What happens, though, when the chain of self-calls grows too long for R to handle?

22.1 Recursive functions

The simplest case is the one every textbook reaches for: the factorial. n! = n * (n-1) * … * 1, with 0! = 1:

factorial_r <- function(n) {
  if (n == 0) return(1)
  n * factorial_r(n - 1)
}

factorial_r(6)
#> [1] 720

Two parts make this work. A base case stops the recursion (n == 0 here), and a recursive case makes progress toward it (since n - 1 is closer to 0 than n was). Forget the base case and you get infinite recursion, which in R means a stack overflow error, the computational equivalent of a sentence that never finds its period.

Fibonacci numbers are the other classic:

fib <- function(n) {
  if (n <= 1) return(n)
  fib(n - 1) + fib(n - 2)
}

sapply(0:10, fib)
#>  [1]  0  1  1  2  3  5  8 13 21 34 55

This works but is spectacularly inefficient: each call spawns two more calls, ballooning to O(2^n) time complexity (see Chapter 9 for Big-O), so fib(30) alone triggers over a million calls. Memoization (Section 20.4) or an accumulator (later in this chapter) fixes the performance problem, but the naive version illustrates a real danger of recursion: elegant code can hide terrible performance. Performance aside, there is a more basic question. Does a recursive function even need a name?
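As a preview of the memoized fix, here is a hand-rolled cache, a sketch only (fib_cached and cache are invented names; Section 20.4 covers memoization properly):

```r
# A closure over a cache vector: cache[n + 1] holds fib(n) once computed
fib_cached <- local({
  cache <- c(0, 1)
  function(n) {
    if (n + 1 <= length(cache)) return(cache[[n + 1]])
    result <- fib_cached(n - 1) + fib_cached(n - 2)
    cache[[n + 1]] <<- result   # superassignment writes into the enclosing cache
    result
  }
})

fib_cached(30)
#> [1] 832040
```

Each Fibonacci number is now computed once, so the call count drops from exponential to linear in n.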

Recall() calls the current function without naming it:

(function(n) {
  if (n == 0) return(1)
  n * Recall(n - 1)
})(6)
#> [1] 720

It looks up the function on the call stack using sys.function(), so it works even inside an anonymous function. In practice, named functions are clearer, but Recall() occasionally appears in one-off recursive helpers passed to higher-order functions.
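For instance, a throwaway factorial mapped over a vector, anonymous yet recursive (a sketch; note that Recall() must appear in the body of the function it restarts, since the help page warns it misbehaves when Recall itself is passed as an argument to apply-style functions):

```r
sapply(1:5, function(n) {
  if (n <= 1) return(1)
  n * Recall(n - 1)   # re-invokes the enclosing anonymous function
})
#> [1]   1   2   6  24 120
```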

Exercises

  1. Write a recursive function sum_to(n) that computes 1 + 2 + … + n. What is the base case?
  2. Write a recursive function power_r(x, n) that computes x^n using repeated multiplication. Handle the case n = 0.
  3. The naive fib() recomputes the same values many times. Create fib_memo <- memoise::memoise(fib) and compare system.time(fib(30)) with system.time(fib_memo(30)), twice each. Why is the first call to fib_memo(30) barely faster than fib(30)? (Hint: the recursive calls inside the wrapped function still go to the unmemoized fib.) How would you restructure the definition so that every recursive call hits the cache?

22.2 The call stack

Every time R calls a function, it creates a new frame on the call stack: a record of the function, its arguments, and its local variables (each frame corresponds to the execution environment from Section 18.3). When the function returns, its frame is popped. Recursive calls pile up frames like plates on a spring-loaded stack, and you can watch it happen:

factorial_traced <- function(n) {
  cat("Entering n =", n, "at depth", sys.nframe(), "\n")
  if (n == 0) return(1)
  result <- n * factorial_traced(n - 1)
  cat("Returning", result, "for n =", n, "\n")
  result
}

factorial_traced(4)
#> Entering n = 4 at depth 31 
#> Entering n = 3 at depth 32 
#> Entering n = 2 at depth 33 
#> Entering n = 1 at depth 34 
#> Entering n = 0 at depth 35 
#> Returning 1 for n = 1 
#> Returning 2 for n = 2 
#> Returning 6 for n = 3 
#> Returning 24 for n = 4
#> [1] 24

sys.nframe() returns the current depth of the call stack, and each recursive call adds one frame. For factorial_traced(4), the recursion adds five frames, one per call from n = 4 down to n = 0, on top of whatever depth the calling context already contributes (the depths start at 31 here because frames from the surrounding tooling are already on the stack); the stack then unwinds as each call returns its result back up the chain.
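Both the counter and the limits behind it are inspectable from a running session (a quick sketch; the C stack figures vary by platform and are sometimes reported as NA):

```r
getOption("expressions")   # nested-evaluation limit, 5000 by default
#> [1] 5000
Cstack_info()              # C stack size, current usage, growth direction
```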

R imposes limits on this depth, two of them. The first is options(expressions = ...), which caps the number of nested expression evaluations at 5,000 by default; exceed it and you get:

Error: evaluation nested too deeply

The second is the C stack itself, whose size the operating system fixes when the session starts; exhaust it and you get:

Error: C stack usage <number> is too close to the limit

These are not soft limits you should raise. They exist to protect against infinite recursion crashing your R session entirely, and if your algorithm needs more than a few thousand levels, the right response is to restructure it, not to bump a number in your options.

# This will error:
count_down <- function(n) {
  if (n == 0) return(0)
  count_down(n - 1)
}
count_down(10000)

Exercises

  1. Call factorial_traced(5) and note the depth values. Then call factorial_traced(5) from inside another function. How do the depths change?
  2. What is the default value of getOption("expressions")? Try setting it to 100 and calling count_down(200). What error do you get?

22.3 Tail recursion and accumulators

Look at factorial_r one more time:

factorial_r <- function(n) {
  if (n == 0) return(1)
  n * factorial_r(n - 1)
}

After the recursive call returns, there is still work left: multiply the result by n. That pending multiplication means R cannot discard the current frame while waiting for the recursion to bottom out; every single frame in the chain must stay alive, holding onto its local n, until the deepest call finally returns 1 and the whole tower of deferred multiplications collapses upward.

A tail-recursive version eliminates that pending work by threading the partial answer through an accumulator argument, so the recursive call becomes the very last operation the function performs:

factorial_tail <- function(n, acc = 1) {
  if (n == 0) return(acc)
  factorial_tail(n - 1, acc * n)
}

factorial_tail(6)
#> [1] 720

When the recursive call sits in tail position (nothing happens after it), a smart compiler can reuse the current frame instead of allocating a new one, converting O(n) stack usage to O(1). This is tail-call optimization (TCO), and Scheme guarantees it. Haskell effectively provides it. R does not. So factorial_tail(10000) will still overflow the stack, even though the function is logically a loop in disguise.

The accumulator pattern still pays its rent, though. It clarifies intent (the accumulator carries the “answer so far”), it is the prerequisite for trampolining (next section), and it can convert exponential algorithms into linear ones. Here is Fibonacci rewritten with two accumulators, running in linear time instead of exponential:

fib_tail <- function(n, a = 0, b = 1) {
  if (n == 0) return(a)
  fib_tail(n - 1, b, a + b)
}

fib_tail(30)
#> [1] 832040

Where the naive fib() needed millions of calls for fib(30), this version makes just n + 1, one per decrement plus the base case. But it still can’t survive fib_tail(10000) on R’s stack. What if we could get the O(1) stack usage ourselves, without waiting for a language feature R will never add?
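A tail call is essentially a goto that passes arguments, so the accumulator version converts mechanically into a loop: the parameters become loop variables and the recursive call becomes an update-and-repeat. A sketch (fib_loop is an invented name):

```r
fib_loop <- function(n) {
  a <- 0
  b <- 1
  while (n > 0) {   # each pass mirrors one tail call of fib_tail
    new_b <- a + b
    a <- b
    b <- new_b
    n <- n - 1
  }
  a
}

fib_loop(30)
#> [1] 832040
fib_loop(10000)   # no frames to exhaust; the Inf is double-precision overflow
#> [1] Inf
```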

Exercises

  1. Write a tail-recursive sum_to_tail(n, acc = 0) that computes 1 + 2 + … + n.
  2. Write a tail-recursive reverse_list(x, acc = list()) that reverses a list.
  3. Verify that factorial_tail(5000) still crashes. Why can R not optimize this?

22.4 Trampolining

Since R refuses to optimize tail calls, you can do it by hand. The idea: instead of making a recursive call, return a thunk (a zero-argument function that represents the call you would have made). Then a loop, the trampoline, keeps calling thunks until it gets a non-function result:

trampoline <- function(f, ...) {
  result <- f(...)
  while (is.function(result)) {
    result <- result()
  }
  result
}

Now rewrite factorial_tail to return thunks instead of calling itself:

factorial_thunk <- function(n, acc = 1) {
  if (n == 0) return(acc)
  function() factorial_thunk(n - 1, acc * n)
}

trampoline(factorial_thunk, 10)
#> [1] 3628800
trampoline(factorial_thunk, 1000)
#> [1] Inf

That second call churns through 1,000 iterations without growing the stack at all; the Inf is just double-precision overflow (1000! exceeds the largest representable double, roughly 1.8e308), not a failure of the trampoline. Each “recursive” step returns a closure, and the while loop in trampoline calls it, bouncing like a ball on a trampoline (hence the name). Stack depth stays at O(1) no matter how deep the logical recursion goes.

The pattern works for any tail-recursive function: rewrite the tail call as a thunk and let the trampoline do the looping. The cost is one closure allocation per step, which is cheap. The CRAN package trampoline provides a polished implementation with better error handling, but the core idea is exactly this while loop. The trampoline frees recursion from the stack, but notice that factorial_thunk still refers to itself by name inside the returned closure. What if you wanted recursion without even that?

TipOpinion

Trampolining is clever, but if you catch yourself trampolining in production R code, pause and ask whether a for loop or Reduce() would be simpler. The trampoline rescues recursive algorithms from the stack limit; it does not replace straightforward iteration. Save it for cases where you genuinely need recursion (tree traversal of unknown depth, for instance) but the depth could be large.

Exercises

  1. Convert fib_tail into a trampolined version. Compute fib(100000) without a stack overflow. (The result overflows to Inf in double precision; just verify it completes.)
  2. The trampoline function above has a subtle limitation: it cannot return a function as a final result (since functions trigger another iteration). How would you fix this? Hint: wrap the final result in a special marker.

22.5 Divide and conquer

Some problems split cleanly in half, and each half looks just like the original problem, only smaller. Split, solve each piece recursively, combine. This is the divide and conquer strategy, and recursion expresses it so naturally that the code almost reads like the definition.

Merge sort is the cleanest example: split the vector in half, sort each half, merge the sorted halves back together:

merge_sort <- function(x) {
  if (length(x) <= 1) return(x)

  mid <- length(x) %/% 2
  left  <- merge_sort(x[1:mid])
  right <- merge_sort(x[(mid + 1):length(x)])

  merge_sorted(left, right)
}

merge_sorted <- function(a, b) {
  result <- numeric(length(a) + length(b))
  i <- j <- k <- 1
  while (i <= length(a) && j <= length(b)) {
    if (a[i] <= b[j]) {
      result[k] <- a[i]; i <- i + 1
    } else {
      result[k] <- b[j]; j <- j + 1
    }
    k <- k + 1
  }
  while (i <= length(a)) { result[k] <- a[i]; i <- i + 1; k <- k + 1 }
  while (j <= length(b)) { result[k] <- b[j]; j <- j + 1; k <- k + 1 }
  result
}

merge_sort(c(5, 2, 8, 1, 9, 3))
#> [1] 1 2 3 5 8 9

Each level of recursion halves the problem, giving O(log n) levels, and each level does O(n) work merging, so the total comes to O(n log n) as discussed in Chapter 9. The recursion depth is only log2(n), which means even a million-element vector needs just 20 levels. Stack overflow is not a concern for balanced divide-and-conquer algorithms; the branching that makes them powerful also keeps them shallow.
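Quicksort shares the skeleton, with the work moved from the combine step to the split (a sketch that pivots on the first element; fine for illustration, but it degenerates on already-sorted input):

```r
quick_sort <- function(x) {
  if (length(x) <= 1) return(x)
  pivot <- x[1]
  rest  <- x[-1]
  # Solve each side of the pivot recursively, then concatenate
  c(quick_sort(rest[rest < pivot]), pivot, quick_sort(rest[rest >= pivot]))
}

quick_sort(c(5, 2, 8, 1, 9, 3))
#> [1] 1 2 3 5 8 9
```

On sorted input every split is maximally lopsided, so the recursion depth degrades from O(log n) to O(n), exactly the balance property the merge sort analysis above depends on.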

Binary search follows the same pattern but only recurses into one half:

binary_search <- function(x, target, lo = 1, hi = length(x)) {
  if (lo > hi) return(NA)
  mid <- (lo + hi) %/% 2
  if (x[mid] == target) return(mid)
  if (x[mid] < target) binary_search(x, target, mid + 1, hi)
  else binary_search(x, target, lo, mid - 1)
}

binary_search(1:100, 73)
#> [1] 73

Exercises

  1. Trace through merge_sort(c(4, 1, 3, 2)) on paper. How many recursive calls are made?
  2. Write a recursive function tree_depth that takes a nested list and returns its maximum depth. tree_depth(list(1, list(2, list(3)))) should return 3.

22.6 Recursive list processing

Nested lists are R’s tree structure, and recursion is the natural way to walk them. You might think unlist() already handles this, but unlist() flattens blindly: it discards the nesting, coerces everything to a common type, and gives you no control over which elements to keep or transform. When the tree has mixed types or you need to preserve structure while operating on selected leaves, hand-written recursion recovers what unlist() throws away.

Suppose you need to extract all numeric values buried somewhere inside an arbitrarily nested list, but you don’t know how deep the nesting goes or where the numbers hide:

find_numbers <- function(x) {
  if (is.numeric(x)) return(x)
  if (!is.list(x)) return(NULL)
  unlist(lapply(x, find_numbers))
}

nested <- list("a", list(1, list("b", 2, list(3, "c"))))
find_numbers(nested)
#> [1] 1 2 3

The function dispatches on type: numbers come back directly, non-list non-numbers get discarded, and lists trigger a recursive lapply that peels off one layer at a time. This pattern of “check the base cases, recurse on the structure” is the backbone of all tree-processing code; once you see it, you see it everywhere.

Base R provides rapply() for recursive application over lists:

rapply(nested, sqrt, classes = "numeric", how = "unlist")
#> [1] 1.000000 1.414214 1.732051

rapply() walks the nested structure, applies sqrt to every numeric element it encounters, and returns the results. The classes argument filters which elements get processed, and how controls the output shape (“unlist”, “replace”, or “list”).

rapply(nested, toupper, classes = "character", how = "replace")
#> [[1]]
#> [1] "A"
#> 
#> [[2]]
#> [[2]][[1]]
#> [1] 1
#> 
#> [[2]][[2]]
#> [[2]][[2]][[1]]
#> [1] "B"
#> 
#> [[2]][[2]][[2]]
#> [1] 2
#> 
#> [[2]][[2]][[3]]
#> [[2]][[2]][[3]][[1]]
#> [1] 3
#> 
#> [[2]][[2]][[3]][[2]]
#> [1] "C"

With how = "replace", the nesting is preserved: character elements are uppercased in place while everything else stays untouched. rapply() is surprisingly powerful but rarely seen in practice, partly because deeply nested lists are uncommon in day-to-day R work, and partly because fewer people know about it than about lapply or map. If your data is a tree, though, it deserves a look.

Exercises

  1. Write a recursive function flatten that takes a nested list and returns a flat list of its leaf elements. Compare with rapply(x, identity, how = "unlist").
  2. Use rapply() with how = "replace" to double every numeric element in list(1, list("a", 2, list(3, "b"))).
  3. Write a recursive function that counts the total number of leaf elements in a nested list.

22.7 Mutual recursion

Two functions can call each other, passing control back and forth. Mutual recursion appears naturally in recursive descent parsers, where each grammar rule becomes a function that may call functions for other rules, and in state machines, where each state is a function that transitions by calling another state’s function. The pattern is easier to see, though, in a stripped-down example:

is_even <- function(n) {
  if (n == 0) return(TRUE)
  is_odd(n - 1)
}

is_odd <- function(n) {
  if (n == 0) return(FALSE)
  is_even(n - 1)
}

is_even(4)
#> [1] TRUE
is_odd(7)
#> [1] TRUE

Each function delegates to the other, peeling off one layer at a time. Nobody would check parity this way, but the structure is identical to what a parser does when an expression rule calls a term rule that calls an expression rule again.

The stack cost is the same as ordinary recursion: each handoff adds one frame regardless of which function receives control, so is_even(n) still builds a stack roughly n frames deep. For deep mutual recursion, trampolining works identically; just have both functions return thunks instead of calling each other directly. But mutual recursion raises a subtler question than stack depth: both is_even and is_odd refer to each other by name. The trampoline removed the stack constraint; can anything remove the naming constraint?
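Concretely, the thunked pair looks like this (a sketch; even_thunk and odd_thunk are invented names, and the trampoline is repeated from Section 22.4 so the snippet runs standalone):

```r
trampoline <- function(f, ...) {   # as in Section 22.4
  result <- f(...)
  while (is.function(result)) result <- result()
  result
}

even_thunk <- function(n) {
  if (n == 0) return(TRUE)
  function() odd_thunk(n - 1)   # thunk instead of a direct call
}

odd_thunk <- function(n) {
  if (n == 0) return(FALSE)
  function() even_thunk(n - 1)
}

trampoline(even_thunk, 100000)
#> [1] TRUE
```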

22.8 Fixed-point combinators

Here is an odd question. Can you write a recursive function without ever giving it a name? Named functions call themselves by name, so recursion seems to require naming. But what if you only have anonymous functions, nothing else?

A fixed point of a function f is a value x where f(x) = x. The number 0 is a fixed point of function(x) x^2, because 0^2 = 0. A fixed-point combinator extends this idea to functions that return functions: it takes a “template” for recursion and threads the self-reference through automatically, producing a working recursive function from anonymous parts alone.
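The numeric version of the idea is worth seeing first: iterate f from a starting guess until the output stops moving (a sketch; fixed_point is an invented helper, and cos is the classic example, converging to the Dottie number):

```r
fixed_point <- function(f, x, tol = 1e-10) {
  repeat {
    fx <- f(x)
    if (abs(fx - x) < tol) return(fx)
    x <- fx   # feed the output back in as the next input
  }
}

fixed_point(cos, 1)   # the x where cos(x) = x
#> [1] 0.7390851
```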

In lazy languages (Haskell, untyped lambda calculus), the Y combinator does this:

Y = λf. (λx. f(x x))(λx. f(x x))

R’s situation is subtler than it first appears. In a truly strict language, one that evaluates arguments before passing them, translating Y directly is fatal: the self-application x x evaluates immediately, which triggers another x x, and the whole thing diverges before f ever gets a chance to run. Step by step: Y(f) calls (λx. f(x x)) with itself as argument, so to evaluate f(x x), a strict evaluator must first evaluate x x, which means evaluating f(x x) again, forever. R, however, passes arguments as unevaluated promises (a design choice the end of this chapter returns to), so the direct transcription actually runs: each x(x) sits in argument position and is forced only one step at a time, as the recursion demands it:

Y <- function(f) (function(x) f(x(x)))(function(x) f(x(x)))
fact_y <- Y(function(f) function(n) if (n == 0) 1 else n * f(n - 1))
fact_y(6)
#> [1] 720

In a genuinely strict setting (Scheme, JavaScript), that luck runs out, and the combinator has to delay the self-application explicitly.

The standard fix, the one that works under any evaluation strategy, is the Z combinator, which wraps each x(x) in a lambda, function(v) x(x)(v), to delay evaluation. That extra wrapper acts as a thunk: x(x) is not evaluated when the combinator is constructed, only when the inner function is actually called with an argument v:

Z <- function(f) {
  (function(x) f(function(v) x(x)(v)))(
    function(x) f(function(v) x(x)(v)))
}

The structure is identical to Y, except each x(x) wears a function(v) ... coat. That single change is the difference between divergence and termination in a strict language.

Now define factorial without any self-reference at all:

fact <- Z(function(f) function(n) {
  if (n == 0) 1 else n * f(n - 1)
})

fact(6)
#> [1] 720

The argument f inside the inner function comes from the Z combinator, which threads in the self-reference from thin air. The function never names itself, never uses Recall(), yet it recurses.

This is not a party trick. In 1936, Church proved that recursion can be derived from lambda calculus, a system with only three constructs: variables, abstraction (function definition), and application (function calling). No built-in recursion, no loops, no assignment. The Y/Z combinator is the mechanism that makes it work, and Chapter 30 explores the full implications, along with Church numerals and the deeper connections between lambda calculus and computation.

You will never use the Z combinator in production R code. But understanding that recursion is not primitive, that it emerges from function application alone, changes how you think about what functions are capable of. If anonymous functions can bootstrap recursion without any help from the language, what else might be hiding in plain sight?

Exercises

  1. Use the Z combinator to define a recursive Fibonacci function without self-reference.
  2. Apply Z to function(f) function(xs) if (length(xs) == 0) 0 else xs[[1]] + f(xs[-1]) and test it on list(1, 2, 3, 4, 5). What does this compute?
  3. Try the Y combinator without the extra function(v) wrapper: Y <- function(f) (function(x) f(x(x)))(function(x) f(x(x))). Does it behave the way a strict-evaluation analysis predicts? What role do R’s promises play in the outcome?

22.9 When not to recurse

R is not Scheme. It does not optimize tail calls, its stack is limited, and function call overhead is high compared to vectorized operations. The shape of the data tells you whether recursion is the right tool or a trap.

Tree-shaped problems, where the structure branches and nests (nested lists, file system traversal, parsing hierarchical data, divide-and-conquer algorithms with O(log n) depth), stay well within stack limits and express naturally as recursive functions. These are recursion’s home territory.

Linear problems are a different story. Summing a vector, computing a running total, filtering elements: these are loops, Reduce() calls, or vectorized operations, and writing them recursively wastes n stack frames on something that sum() handles in a single C call.
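To make the contrast concrete (sum_r is a deliberately bad sketch, not something to ship):

```r
# One stack frame per element, and x[-1] copies the remaining
# vector on every call, so this is quadratic in time as well
sum_r <- function(x) {
  if (length(x) == 0) return(0)
  x[1] + sum_r(x[-1])
}

sum_r(1:100)
#> [1] 5050
sum(1:100)   # one C call, any length, constant stack
#> [1] 5050
```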

TipOpinion

If your recursive function processes a flat sequence from left to right, it is a fold; use Reduce() or purrr::reduce() from Chapter 21. If it processes elements independently, it is a map; use lapply() or purrr::map() from Section 7.1. Reserve recursion for problems where the structure of the data is itself recursive. When the data branches, your code should too.

The table below summarizes when each approach fits:

Problem shape                        Preferred approach               Why
Element-wise transformation          lapply() / map() / vectorized    No overhead, parallel-friendly
Linear fold (sum, concat, merge)     Reduce() / reduce()              Constant stack, clear intent
Tree traversal, nested structures    Recursion / rapply()             Matches the data shape
Divide and conquer (sort, search)    Recursion                        O(log n) depth, natural decomposition
Deep linear recursion (>1000)        Trampoline or convert to loop    Stack limit workaround

22.10 Historical notes

By the early twentieth century, mathematicians had built formal systems powerful enough to express most of mathematics. The natural next question was whether those systems could verify themselves: could a formal system prove that every true statement within it is provable? If yes, mathematics would be self-certifying, a closed loop with no loose ends. The systems were expressive enough to state any mathematical truth, so surely they were expressive enough to prove any mathematical truth. Hilbert had made exactly this bet (the same program that led to the Entscheidungsproblem in Section 9.6.1), challenging mathematicians to prove their formal systems complete and consistent, but then in 1931 a twenty-five-year-old logician in Vienna found a way to make a formal system reason about its own proofs.

He encoded logical statements as numbers using primitive recursive functions, turning each proof step into arithmetic. Once proofs were numbers, he could construct a statement inside the system that said, in effect, “this statement has no proof in this system.” If the system could prove it, the statement would be false, and the system would be proving a falsehood. If the system could not prove it, the statement would be true, and there would be a true statement the system could never reach. Either way, the closed loop was shattered. Mathematics would always have blind spots: true statements that no amount of cleverness, no added axioms, no better proof strategies could ever capture. It became known as Gödel’s first incompleteness theorem.

Five years later, Church showed that recursion did not need to be a primitive at all: the fixed-point combinator could derive it from pure function application. Turing’s machines provided the alternative model, and Church and Turing proved the two equivalent. But recursion stayed in mathematics until McCarthy gave it a programming language. LISP (1960) was the first language where recursive function calls served as the primary control structure; car, cdr, and cons made recursive list processing practical. Recursion was no longer a proof technique restricted to mathematics; it was how you wrote programs.

The question that remained was efficiency. Landin’s SECD machine (1964) showed how to mechanically evaluate expressions with closures and recursive calls. A decade later, Steele and Sussman proved something that surprised the programming world: tail-recursive calls and goto are the same thing. Their “Lambda the Ultimate” papers (1975-1980) showed that lambda expressions can replace every control structure, and that a sufficiently smart compiler can make functional code as fast as imperative code. Scheme guaranteed tail-call optimization as a consequence.

R chose not to. S was designed for interactive statistical computing, where vectorized operations do the heavy lifting, not deep recursion. The absence of TCO reflects that priority: the language optimized for the common case (vectorized numerical work) at the cost of the uncommon one (recursive descent past a thousand frames). But R did make another choice about evaluation that most languages avoided: when you pass an argument to a function, R does not evaluate it immediately.