21 Reduce and accumulate
map() transforms each element independently. Reduce() combines elements into one result. If map() is “do this to each,” then Reduce() is “collapse everything with this.”
In Chapter 19 you saw how map() replaces loops that apply a function to each element. But some loops do not just apply: they accumulate. A running total, a chain of merges, a series of intersections. These loops carry state from one iteration to the next. Reduce() is the functional that replaces them.
21.1 Left fold
Reduce() takes a binary function and a list, and combines the elements pairwise from left to right:
Reduce(`+`, 1:5)
#> [1] 15
That computes ((((1 + 2) + 3) + 4) + 5). The first call is f(1, 2), then f(3, 3), then f(6, 4), then f(10, 5). Each intermediate result becomes the first argument to the next call. This is a left fold.
The name comes from functional programming (Haskell’s foldl), but the idea is older. APL had the / operator in 1962: +/1 2 3 4 5 gives 15. Kenneth Iverson recognized that reducing a list with a binary operator was a fundamental operation, not a loop pattern.
With a named function, the structure is clearer:
Reduce(paste, c("the", "cat", "sat", "on", "the", "mat"))
#> [1] "the cat sat on the mat"
Each step takes the accumulated string and pastes the next word onto it.
21.2 The init argument
What does Reduce(f, x) do when x has one element? It returns that element without ever calling f. What about zero elements?
Reduce(`+`, list())
#> NULL
Not an error but a silent NULL: with no elements and no starting value, there is no identity to return, so Reduce() gives up. The init argument fixes this:
Reduce(`+`, list(), init = 0)
#> [1] 0
With init, the fold starts by calling f(init, x[[1]]) instead of f(x[[1]], x[[2]]). The initial value acts as an identity element for the operation. For addition that is 0; for multiplication, 1; for paste0, the empty string "". For c(), it is NULL. For intersect, it would be the universal set (though R has no such thing, so you typically use the first element).
This connection between an operation, its identity element, and associativity has a name in algebra: a monoid. Integers with addition and 0 form a monoid. Strings with concatenation and "" form a monoid. Lists with c() and NULL form a monoid. The pattern is everywhere, and recognizing it tells you something practical: if your operation is associative, the fold can be split, parallelized, and recombined. Google’s MapReduce (Dean and Ghemawat, 2004) is literally map() followed by a monoid fold at warehouse scale.
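A minimal sketch of why associativity matters for splitting work. The chunking below is purely illustrative, not a real parallel implementation; associativity plus the identity element guarantees the chunked answer matches the single fold:

```r
x <- 1:100

# One big fold
whole <- Reduce(`+`, x, init = 0)

# Split into chunks, fold each chunk, then fold the partial results.
# Because `+` is associative with identity 0, the answer is the same,
# which is what lets real systems distribute this work.
chunks <- split(x, rep(1:4, each = 25))
partials <- lapply(chunks, \(chunk) Reduce(`+`, chunk, init = 0))
combined <- Reduce(`+`, partials, init = 0)

whole == combined
#> [1] TRUE
```

The same recombination trick fails for non-associative operations like subtraction, which is why the next section's warning about fold direction matters.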
Always pass init to Reduce(). Without it, your code breaks on empty input and behaves differently on length-1 input. The identity element makes the fold total: it works for every list, including the empty one.
21.3 Right fold
Reduce() folds from the left by default. Setting right = TRUE folds from the right:
Reduce(`-`, 1:4)
#> [1] -8
Reduce(`-`, 1:4, right = TRUE)
#> [1] -2
The left fold computes (((1 - 2) - 3) - 4) = -8. The right fold computes (1 - (2 - (3 - 4))) = -2. The difference only matters when the operation is not associative. Subtraction, division, exponentiation: these give different results depending on fold direction. Addition, multiplication, paste0, c(), intersect, union: these are associative, so direction does not matter (and left fold is the natural choice).
In Haskell, foldl and foldr are separate functions, and choosing between them is a design decision with performance implications. In R, right = TRUE is rarely needed. If you find yourself reaching for a right fold, consider whether your binary function simply has its arguments in the wrong order.
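To make that last point concrete, here is a sketch showing that a right fold is equivalent to a left fold of the argument-flipped function over the reversed input (flip is a helper defined here, not a base R function):

```r
# flip() returns a version of f with its two arguments swapped
flip <- \(f) \(a, b) f(b, a)

Reduce(`-`, 1:4, right = TRUE)
#> [1] -2

# Same result: left fold of the flipped function over the reversed input
Reduce(flip(`-`), rev(1:4))
#> [1] -2
```

If your binary function already takes its arguments in the "wrong" order, flipping it is usually clearer than switching fold direction.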
Exercises
- Use Reduce() to compute the product of the numbers 1 through 6 (i.e., 6 factorial). Check your answer against factorial(6).
- What is Reduce(intersect, list(c(1,2,3,4), c(2,3,4,5), c(3,4,5,6)))? Work through the steps by hand, then verify.
- Write a Reduce() call that merges three data frames by a shared "id" column. (Hint: merge is a binary function.)
- Explain why Reduce(`/`, c(120, 2, 3, 4, 5)) gives 1 with a left fold and something different with a right fold.
21.4 accumulate: keeping the intermediates
Sometimes you want not just the final result, but every intermediate step. Base R handles this with the accumulate argument:
Reduce(`+`, 1:5, accumulate = TRUE)
#> [1] 1 3 6 10 15
Running totals: 1, 3, 6, 10, 15. This is the scan operation (Haskell’s scanl, APL’s \). Where fold collapses a list to a single value, scan keeps every running value along the way.
Cumulative operations are everywhere in practice: cumulative sums, running maxima, progressive string building.
Reduce(max, c(3, 1, 4, 1, 5, 9, 2, 6), accumulate = TRUE)
#> [1] 3 3 4 4 5 9 9 9
A running maximum. Each value is the largest seen so far.
Reduce(paste, c("one", "two", "three", "four"), accumulate = TRUE)
#> [1] "one" "one two" "one two three"
#> [4] "one two three four"
Progressive sentence construction, step by step.
21.5 purrr::reduce() and purrr::accumulate()
The purrr package provides reduce() and accumulate() as cleaner alternatives:
reduce(1:5, `+`)
#> [1] 15
accumulate(1:5, `+`)
#> [1] 1 3 6 10 15
Same results. The differences from base Reduce() are syntactic: the data comes first (.x, then .f), .init replaces init, and .dir = "backward" replaces right = TRUE. The purrr versions also accept lambda-style formulas, though anonymous functions with \() are clearer.
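A small sketch of those two argument differences side by side (assuming purrr is attached):

```r
library(purrr)

# .init plays the same role as base R's init: it makes
# the fold total, covering the empty-input case
reduce(list(), `+`, .init = 0)
#> [1] 0

# .dir = "backward" replaces right = TRUE
reduce(1:4, `-`, .dir = "backward")
#> [1] -2
```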
purrr::reduce2() folds over two lists in parallel, passing an extra argument from the second list at each step:
reduce2(
c("a", "b", "c"),
c("-", "+"),
\(acc, val, sep) paste0(acc, sep, val)
)
#> [1] "a-b+c"
The binary function receives three arguments: the accumulator, the current element from the first list, and the current element from the second list. The second list must be one element shorter than the first (or the same length if .init is provided).
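A sketch of the equal-length case with .init (the starting string and separators here are arbitrary choices for illustration):

```r
library(purrr)

# With .init, the first call is f(.init, x[[1]], y[[1]]),
# so the two input vectors have the same length
reduce2(
  c("a", "b", "c"),
  c("-", "+", "."),
  \(acc, val, sep) paste0(acc, sep, val),
  .init = "x"
)
#> [1] "x-a+b.c"
```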
21.6 Practical patterns
Combining data frames. When split() or repeated API calls give you a list of data frames, Reduce(rbind, dfs) stacks them. For large lists, do.call(rbind, dfs) is faster because it avoids the O(n²) copying of repeated two-at-a-time binding.
dfs <- list(
data.frame(x = 1:2, y = c("a", "b")),
data.frame(x = 3:4, y = c("c", "d")),
data.frame(x = 5:6, y = c("e", "f"))
)
Reduce(rbind, dfs)
#> x y
#> 1 1 a
#> 2 2 b
#> 3 3 c
#> 4 4 d
#> 5 5 e
#> 6 6 f
Nested list extraction. Walking into a deeply nested structure:
nested <- list(a = list(b = list(c = 42)))
Reduce(\(x, i) x[[i]], c("a", "b", "c"), init = nested)
#> [1] 42
Each step indexes one level deeper. The fold replaces a chain of [[ calls. (purrr’s pluck() does the same thing with better error messages.)
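For comparison, the purrr equivalent (assuming purrr is attached):

```r
library(purrr)

nested <- list(a = list(b = list(c = 42)))

# pluck() walks the same path without an explicit fold
pluck(nested, "a", "b", "c")
#> [1] 42

# Unlike chained [[, it returns .default instead of
# erroring when a level is missing
pluck(nested, "a", "zzz", "c", .default = NA)
#> [1] NA
```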
Set operations across many sets:
sets <- list(c(1,2,3,4), c(2,3,4,5), c(3,4,5,6))
Reduce(intersect, sets)
#> [1] 3 4
Reduce(union, sets)
#> [1] 1 2 3 4 5 6
State machines with accumulate. An accumulate() call can model any sequential process where each step depends on the previous state:
# Simple bank account: deposits and withdrawals
transactions <- c(100, -20, -30, 50, -10)
accumulate(transactions, `+`)
#> [1] 100 80 50 100 90
The running balance after each transaction. For more complex state (where the state is not a single number), use a list as the accumulator and a function that updates it.
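A sketch of that idea: the accumulator below is a two-field list tracking both the current balance and the lowest balance seen so far (the field names and step() helper are my own, not a standard API):

```r
library(purrr)

transactions <- c(100, -20, -30, 50, -10)

# One step of the state machine: update both fields
step <- function(state, amount) {
  balance <- state$balance + amount
  list(balance = balance, low = min(state$low, balance))
}

states <- accumulate(
  transactions, step,
  .init = list(balance = 0, low = Inf)
)

# The final state carries both summaries at once
states[[length(states)]]
#> $balance
#> [1] 90
#>
#> $low
#> [1] 50
```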
Exercises
- Given words <- c("R", "is", "a", "functional", "language"), use accumulate() to produce the vector c("R", "R is", "R is a", "R is a functional", "R is a functional language").
- Write a Reduce() call that finds the intersection of the columns present in a list of data frames. (Hint: use names() and intersect.)
- Use accumulate() to compute a running product of c(2, 3, 5, 7). Verify that the last element equals prod(c(2, 3, 5, 7)).
21.7 Row-wise folds with c_across()
Reduce() folds across the elements of a list. In Section 19.8, you saw across() apply a function to multiple columns of a data frame. But what about folding across columns within a single row? That is a row-wise reduction, and dplyr::c_across() handles it.
The setup is rowwise(), which tells dplyr to treat each row as a group of one:
penguins <- palmerpenguins::penguins
penguins |>
rowwise() |>
mutate(bill_ratio = bill_length_mm / bill_depth_mm) |>
ungroup() |>
head(3)
#> # A tibble: 3 × 9
#> species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
#> <fct> <fct> <dbl> <dbl> <int> <int>
#> 1 Adelie Torgers… 39.1 18.7 181 3750
#> 2 Adelie Torgers… 39.5 17.4 186 3800
#> 3 Adelie Torgers… 40.3 18 195 3250
#> # ℹ 3 more variables: sex <fct>, year <int>, bill_ratio <dbl>
c_across() gathers the specified columns into a vector for each row, so you can pass that vector to any summary function:
df <- tibble(a = 1:3, b = 4:6, c = 7:9)
df |>
rowwise() |>
mutate(total = sum(c_across(a:c))) |>
ungroup()
#> # A tibble: 3 × 4
#> a b c total
#> <int> <int> <int> <int>
#> 1 1 4 7 12
#> 2 2 5 8 15
#> 3     3     6     9    18
Each row’s a, b, and c values are collected into a vector, summed, and stored in total. This is a fold in the same sense as Reduce(): a binary operation (+) applied repeatedly to collapse multiple values into one, with the same monoid structure (associativity, identity element) that makes folds well-behaved (Section 21.2). The difference is the axis: Reduce() folds across list elements (or rows), while c_across() folds across columns within a row.
A performance warning: rowwise() + c_across() processes one row at a time in R, which means it scales poorly on large data frames. For simple cases like summing or taking a maximum, rowSums() and pmax() are orders of magnitude faster because they are vectorized in C. On a data frame with 100,000 rows, rowSums() might finish in milliseconds where rowwise() + c_across() + sum() takes seconds. Use c_across() when the row-wise operation is more complex than a single built-in function can handle, or when the set of columns is selected dynamically with tidyselect helpers, but switch to vectorized alternatives for anything that runs at scale.
Exercises
- Using rowwise() and c_across(), compute the maximum of columns a, b, and c for each row of df <- tibble(a = c(3,1,4), b = c(1,5,9), c = c(2,6,5)).
- Rewrite the c_across() sum example using rowSums() instead. Which is clearer? Which is faster? (Try bench::mark() on a larger data frame.)
- Use rowwise() and c_across(where(is.numeric)) to compute the range (max minus min) across the numeric columns of each row of penguins. Why would this be hard to do without c_across()?
21.8 When to fold and when not to
Reduce() is the right tool when you have a binary operation and a list of things to combine: merging data frames, chaining set operations, collapsing nested structures. It is the wrong tool when a vectorized function already exists. sum(x) is faster and clearer than Reduce(`+`, x). paste(x, collapse = " ") beats Reduce(paste, x). The base R cumulative functions (cumsum, cumprod, cummax, cummin) are compiled C and will always outperform accumulate(x, `+`).
The rule: if a specialized function exists, use it. Reach for Reduce() when you have a binary operation that R does not already provide a vectorized version of.
Reduce() also becomes awkward when the accumulator needs complex state. If your fold carries a list with multiple fields and each step updates several of them, you are writing a state machine. That is better expressed as a loop or a recursive function (Chapter 22), where the state updates are explicit and readable.
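For example, a running balance that also tracks its minimum carries two pieces of state; written as a plain loop, the updates read naturally (a sketch, with illustrative names):

```r
transactions <- c(100, -20, -30, 50, -10)

# Each piece of state is a named variable,
# and each update is a visible assignment
balance <- 0
low <- Inf
for (amount in transactions) {
  balance <- balance + amount
  low <- min(low, balance)
}

balance
#> [1] 90
low
#> [1] 50
```

The equivalent fold would bundle balance and low into a list accumulator and unpack it at every step, which obscures rather than clarifies.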
Graham Hutton proved in 1999 that fold can express any function on finite lists. This is a theoretical universality result, not practical advice. Just because you can write something as a fold does not mean you should.
Fold is the most underused functional in R. Most R programmers who discover map() never learn Reduce(), and end up writing loops for exactly the problems fold solves cleanly. If you take one thing from this chapter: next time you write a loop that carries an accumulator variable, try Reduce() first.