1  Two models of computation

By 1930, the dream was close. For three centuries, each generation of mathematicians had made precise what the previous one had assumed — numbers, inference, and finally proof itself. The goal was a mechanical method to settle any mathematical question: feed in a statement, follow the rules, get the answer. The greatest living mathematician had spent thirty years organizing this effort into a single program. He needed to show that every true statement had a proof, and that no proof ever led to a contradiction. Establish both, and any question could be settled by turning a crank. He stood before a crowd in Königsberg and declared it would be done. Wir müssen wissen. Wir werden wissen. We must know. We shall know. His name was David Hilbert.

The following year, a twenty-five-year-old in Vienna proved it was impossible. Some true statements about arithmetic had no proof. The structure of formal systems ruled them out. Hilbert’s program was over.

But proving it required first defining something nobody had precisely stated. To show that no formal system could settle every question, someone had to say what formal systems could do. What does it mean to compute?

Two people answered that question in 1936, strangers to each other. One imagined a machine: a paper tape, a reading head, a finite list of rules. The other wrote down a grammar of functions, three lines long. Within months, the two answers were shown to describe exactly the same thing. The split between them runs through every programming language alive today, R included. We’ll get to the formalism. First, the intuition.


Once upon a time, four numbers had been left on the table: 3, 5, 2, 8.

Mr. Tape came in through the front door. He was long and thin, and he walked in a straight line from the door to the table. Behind him came Mr. State, who took his place behind Mr. Tape and waited, as he always did. But they were not the only ones there. At the far end of the table someone was already sitting with his hands flat on the surface.

Mr. Tape touched the first number and read it aloud.

“Three.”

Mr. State was holding a 3. No one had seen where it came from. On the table, the 3 was still there. He seemed pleased.

Mr. Tape touched the next.

“Five.”

Mr. State was holding an 8. The 3 had slipped his mind. He looked at the 8 as though he had always been holding it.

“Two.”

He was holding a 10.

“Eight.”

He was holding an 18. He could not have told you what he’d had a moment before.

“Eighteen,” said Mr. Tape, as was his duty.

The man at the far end of the table had been watching.

“Let’s see,” said Mr. Fold at last.

Mr. Fold picked up the 2 and the 8. He pressed their edges together and creased them flat. A 10 sat on the table. The 2 and the 8 were gone.

That seemed reasonable.

He folded the 10 into the 5 the same way. Crease and press again. Then the 15 into the 3.

Mr. State stepped forward and picked it up. He turned it over.

It was the same number.

1.1 Instructions and state

Mr. Tape and Mr. State are a computing machine. In 1936, a twenty-four-year-old at King’s College, Cambridge proved that their method — read a symbol, update a state, move to the next — is enough to carry out any computation that can be precisely described. His name was Alan Turing, and three years later he would join the codebreakers at Bletchley Park to lead the attack on the German naval Enigma cipher.

A Turing machine does things: it reads, writes, moves, changes. It has memory (the tape), and that memory changes over time. A program, in this view, is a sequence of instructions that progressively modify stored values until the answer emerges.

Here is what that looks like as pseudocode, for a simple sum of four numbers:

total = 0
for each number in [3, 5, 2, 8]:
    total = total + number
return total

total starts at 0, becomes 3, becomes 8, becomes 10, becomes 18. Mr. Tape read the numbers; Mr. State held the running total, forgetting each previous value the moment the next one arrived. The variable varies; the answer lives in its final state.
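In R itself, the pseudocode translates almost line for line (a direct sketch, using the four numbers from the story):

```r
total <- 0                        # one slate, repeatedly overwritten
for (number in c(3, 5, 2, 8)) {
  total <- total + number         # Mr. State forgets the old value here
}
total
#> [1] 18
```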

Turing’s tape was an abstraction. ENIAC, built a decade later at the University of Pennsylvania, was not. It weighed thirty tons. To program it, you walked between cabinets and rewired plugboards by hand, instruction by instruction, switch by switch, the machine’s “memory” a literal physical configuration of cables. Nearly every popular language since has inherited that picture, even when it hides the cables. A variable is still a box. A program is still a sequence of things you do to what’s inside it.

So is that what computation is?

Figure 1.1: Two models of computation: a Turing machine mutates state step by step; Church’s lambda calculus reduces an expression to a value.

1.2 Expressions and functions

Mr. Fold used no tape and kept no state. He pressed numbers together, pair by pair, each pair replaced by its sum and the originals destroyed, until one remained. Replacement instead of memory, and yet the same answer as Mr. Tape and Mr. State. Can Mr. Fold match their infinite reading and rewriting power?

Alonzo Church, who would shortly become Turing’s doctoral advisor at Princeton, was working on the same problem from a completely different angle. His notation had no tape and no memory. You wrote down an expression, something like (λx. x + 1)(5), and you simplified it. The 5 slid into the x. What was left was 6. Nothing had been stored anywhere. Nothing had been overwritten. The arithmetic had simply unfolded, the way Mr. Fold had folded.

He called it the lambda calculus, and it has only three pieces: variables (x), functions (λx. x + 1, meaning take x, return x + 1), and application (writing one next to the other). That is the entire grammar. You write an expression and simplify it by applying functions to arguments until you reach a value that can’t be simplified further. Church proved that this system, despite having no concept of memory or instructions, can compute exactly the same class of functions as Turing’s machine.
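Since an R function literal is exactly a lambda, Church’s example expression can be typed into R nearly verbatim (the `\(x)` shorthand assumes R 4.1 or later):

```r
(function(x) x + 1)(5)   # apply the function to 5: the 5 slides into x
#> [1] 6
(\(x) x + 1)(5)          # the same expression, backslash standing in for λ
#> [1] 6
```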

Memory and instructions were never necessary for computation in the first place.

The equivalence itself is a theorem. The broader claim, that these two models capture everything we could reasonably call computation, is known as the Church-Turing thesis. Everything a Turing machine can do, pure expression can do without a single mutable cell.

What does the same sum look like in this world?

sum([3, 5, 2, 8])
= add(3, sum([5, 2, 8]))
= add(3, add(5, sum([2, 8])))
= add(3, add(5, add(2, 8)))
= add(3, add(5, 10))
= add(3, 15)
= 18

The expression unfolds, each add consuming its arguments, until only 18 remains. This was Mr. Fold’s method — no running total, no memory, just pairs collapsing until one number was left. Nothing was overwritten.
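A recursive R version of this unfolding can be sketched as follows (`my_sum` is a name invented here; R’s built-in sum() does the same job without explicit recursion):

```r
my_sum <- function(xs) {
  if (length(xs) == 0) 0          # an empty list folds to 0
  else xs[1] + my_sum(xs[-1])     # first element plus the folded rest
}
my_sum(c(3, 5, 2, 8))
#> [1] 18
```

Base R also ships Mr. Fold directly: Reduce(`+`, c(3, 5, 2, 8)) collapses the vector pair by pair to the same 18.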

1.3 What this has to do with R

R descends from Church’s model. The chain runs through four decades and several languages (Chapter 2 tells the full story), but the result is visible right now, without knowing any of that history. Compare the shape of the code with the two models from earlier.

Suppose you have a list of exam scores and you want to know how many students scored above 70. In the instruction-and-state style, you would create a counter, loop through the scores, and increment each time you find one above the threshold:

scores <- c(65, 82, 71, 90, 55, 78)

count <- 0
for (s in scores) {
  if (s > 70) {
    count <- count + 1
  }
}
count

This works, and R can do it. But there is a variable, count, that gets modified in a loop, one step at a time. It starts at 0, becomes 1, becomes 2, becomes 3, becomes 4. One slate, always overwritten. Here is the same problem in the expression-and-function style:

scores <- c(65, 82, 71, 90, 55, 78)

sum(scores > 70)

One line. scores > 70 produces a vector of TRUE and FALSE values; sum() counts the TRUEs. The expression describes the answer, and R evaluates it.
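Evaluating the two pieces separately makes the unfolding visible:

```r
scores <- c(65, 82, 71, 90, 55, 78)
scores > 70          # one comparison per element, no loop, no counter
#> [1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE
sum(scores > 70)     # TRUE counts as 1, FALSE as 0
#> [1] 4
```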

Both give the answer 4. The second one is a piece of mathematics from 1936, written before electronic computers existed, and in 2026 it still outruns the loop. The loop works against the grain. The expression works with it.

The same pattern runs through the rest of the language:

  • if/else returns a value, so you can assign its result directly. In most languages, if is a statement that does something; in R, it’s an expression that produces something (Chapter 3):

    y <- if (x > 0) "positive" else "negative"

  • Functions are ordinary values. You can store one in a variable, pass it to another function, or create one on the fly (Section 5.4).

  • x * 2 multiplies every element of x at once, with no loop and no index variable. Operations that in other languages require a loop are built into R as expressions on whole vectors (Chapter 4).

  • Pipe chains compose functions left to right, each one receiving the result of the previous one. This is Church’s core operation, function composition, made into syntax (Chapter 15):

    data |> filter(x > 5) |> summarize(mean(y))
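The first three bullets can be verified at the console (`apply_twice` is a name made up here for illustration; the pipe example assumes the dplyr package is loaded):

```r
x <- 3
y <- if (x > 0) "positive" else "negative"   # if/else produces a value
y
#> [1] "positive"

apply_twice <- function(f, v) f(f(v))        # a function received as a value
apply_twice(function(v) v * 2, c(1, 2, 3))   # doubles the whole vector, twice
#> [1]  4  8 12
```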

But how did Church’s 1936 notation, written decades before computers existed, end up inside a language used by statisticians? Chapter 2 follows the chain: Church to Lisp to Scheme to S to R, and what each step added along the way.