3 Expressions and names
So you can name values and use them later. But what exactly counts as an expression?
Open R. Type 1 + 1 and press Enter.
1 + 1
#> [1] 2
You just wrote a complete R program. It has no variables, no functions, no setup. It is an expression, and R evaluated it and printed the result. Every piece of R code you will ever write works this way: you give R an expression, R gives you back its value.
What makes this work? Three things: R’s interactive loop, expressions themselves, and a mechanism for giving names to values so you can use them later.
3.1 The REPL
When you open R, you see a prompt: >. R is waiting. When you type something and press Enter, four things happen in sequence:
- R reads what you typed.
- R evaluates it (figures out the value).
- R prints the result.
- R loops back to step 1 and waits again.
This is called a REPL (Read-Eval-Print Loop). There is no “compile” step, no “run” button; you type an expression, you see its value immediately.
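One turn of that loop can be imitated by hand with base R's parse(), eval(), and print(). This is only a sketch of the idea, not how R's console is actually implemented, and the names typed, expr, and result are made up for illustration.

```r
# One turn of a read-eval-print cycle, done by hand (illustrative sketch only).
typed  <- "1 + 1"                    # "read": the text you would type at the prompt
expr   <- parse(text = typed)[[1]]   # turn the text into an unevaluated expression
result <- eval(expr)                 # "eval": work out its value
print(result)                        # "print": show the value
#> [1] 2
```

The real REPL then goes back to waiting for the next line of input.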
In compiled languages such as C, Java, and Go (which Google released in 2009, partly because its engineers were waiting too long for C++ builds), you write the whole program first, then a compiler translates it into machine code (or, for Java, bytecode) before anything runs. R is interpreted: it reads and evaluates one expression at a time, immediately. Python and JavaScript work the same way. The tradeoff is speed: a compiler can optimize the whole program in advance, so compiled code usually runs faster. But interpreted languages let you experiment and see results as you go, which is why they dominate in data analysis.
In practice, the line between compiled and interpreted is blurry. R itself is written in C, and many R functions are thin wrappers around compiled C or Fortran code. When you type sum(x), the actual arithmetic happens in compiled machine code underneath. You get the interactive interface; the heavy computation still runs at full speed.
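You can check this for yourself. A small sketch, assuming a made-up vector x: base R's is.primitive() reports whether a function is implemented directly in R's internal compiled code instead of in R itself.

```r
x <- c(3, 5, 2, 8)   # a small example vector (hypothetical data)
sum(x)               # the addition itself runs in compiled code
#> [1] 18
is.primitive(sum)    # TRUE: sum is built into R's C core
#> [1] TRUE
```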
Try a few:
42
#> [1] 42
"hello"
#> [1] "hello"
TRUE
#> [1] TRUE
sqrt(9)
#> [1] 3
Each of these is an expression. 42 is an expression whose value is 42; "hello" evaluates to the text string “hello”; sqrt(9) calls the function sqrt with the argument 9 and returns 3. The [1] you see before each result is R telling you “this is the first element of the output.” You can ignore it for now; it becomes useful when results have many elements (Chapter 4).
Notice the pattern: R evaluates expressions and prints their values, exactly the way the calculator in Chapter 1 evaluated 3 + 5 + 2 + 8. This is Church’s model of computation (Section 1.2), made interactive through the REPL. But evaluating expressions and forgetting them immediately is like a calculator with no memory. What if you want to keep a result?
Exercises
- Type 2 + 3 * 4 into R. What do you get? Does R follow the usual order of operations (multiplication before addition)?
- What happens when you type "hello" + 1? Read the error message. What is R telling you?
- Try pi. This is a name that R already knows. What value does it have?
3.2 Names and binding
You want to compute something, save the result, and use it later. In R, you do this by giving a value a name:
x <- 3
The arrow <- is R’s assignment operator. It binds the name x to the value 3. After this line runs, typing x gives you back 3:
x
#> [1] 3
You can then use x in other expressions:
x + 10
#> [1] 13
x * x
#> [1] 9
It is tempting to think of x as a box that contains 3, and if you have used C, that is exactly the model you carry around: the variable is a container, and assignment puts a value into it. R works differently. x <- 3 does not put 3 into a box; it attaches the label x to a value that already exists. The value 3 is out there. x is just a name for it.
Why does that distinction matter?
y <- x
Now both x and y refer to 3. If you later write x <- 7, that changes what x points to, but y still points to 3:
x <- 7
x
#> [1] 7
y
#> [1] 3
This “names are labels, not containers” model is a consequence of R’s functional design. Values in R are immutable; when you “change” something, R creates a new value and moves the name to point at it. The old value stays untouched, and R’s garbage collector, a mechanism R inherited from Lisp (Section 2.1), reclaims it automatically once no name points to it anymore (Section 29.1 covers how this works at the implementation level). If names were containers, you would need to worry about two containers sharing the same contents and one of them mutating behind your back. With labels and immutable values, that problem vanishes.
Think about what vanishes with it. In C, you learn to watch for aliasing: two pointers to the same block of memory, one of them writing while the other reads stale data. In Python, you learn that a = b for a list means both names see the same mutable object, and modifying through one surprises anyone holding the other. Entire chapters of software engineering textbooks are devoted to defensive copying, synchronization, and ownership protocols — machinery whose sole purpose is to keep shared mutable state from corrupting your program. R sidesteps all of it by removing the condition that makes the checks necessary. Values do not change, so there is nothing to corrupt, no mutable state for two references to fight over. That category of bugs — aliasing, stale reads, defensive-copy failures — is structurally absent, which frees you to think about the problem you sat down to solve instead of the bookkeeping.
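The same behavior holds for values with more than one element. A small sketch, with v and w as made-up names: two names start out referring to the same vector, and "modifying" it through one name leaves the other untouched.

```r
v <- c(1, 2, 3)
w <- v          # w is a second name for the same values
v[1] <- 99      # "changing" v binds v to a new vector; the old one is not altered
v
#> [1] 99  2  3
w               # w still refers to the original values
#> [1] 1 2 3
```

In Python, the equivalent steps on a list would leave both names seeing 99.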
Use <- for assignment, not =. Both work at the prompt, but <- is the convention in R, and the convention has a history: on the Bell Labs terminals where S was designed, there was a single key that typed <- (Section 2.3). The habit stuck, and almost all R code uses it. Reserve = for naming arguments inside function calls, like log(8, base = 2).
Exercises
- Predict what this code prints, then run it to check:
  a <- 10
  b <- a
  a <- 20
  b
- What happens if you type a name that doesn’t exist, like zzz? Read the error message.
- Names in R are case-sensitive. If you run X <- 5, can you get the value back by typing x?
3.3 Expressions all the way down
In most programming languages, there are two kinds of code: expressions (which produce a value) and statements (which do something but don’t produce a value). In Python, 1 + 1 is an expression, but if x > 0: print("yes") is a statement. You cannot write result = if x > 0: "yes" else: "no" because if doesn’t produce a value.
R has no statements. Everything is an expression, and every expression returns a value.
if/else returns a value:
x <- 3
result <- if (x > 0) "positive" else "non-positive"
result
#> [1] "positive"
You can store the result of if/else in a name because if/else is an expression that evaluates to a value, just like 1 + 1. In Python or C, you cannot do this with if, because there if is a statement that performs an action rather than producing a result (both languages provide a separate conditional-expression syntax for this purpose instead).
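Because if/else produces a value, it can sit anywhere a value can, including inside a function call. A short sketch, where reading, label, and nothing are made-up names:

```r
reading <- -4
# The if/else expression is evaluated first; its value becomes an argument to paste().
label <- paste("reading is", if (reading > 0) "positive" else "non-positive")
label
#> [1] "reading is non-positive"

# An if with no else still has a value when the condition is FALSE: NULL.
nothing <- if (reading > 0) "positive"
is.null(nothing)
#> [1] TRUE
```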
A block of code in curly braces { } returns the value of its last expression:
result <- {
a <- 2
b <- 3
a + b
}
result
#> [1] 5
The block evaluated three expressions, and the value of the whole block is the value of the last one: a + b, which is 5.
Even assignment returns a value (though R hides it). You can reveal it with parentheses:
(z <- 42)
#> [1] 42
Without the parentheses, R assigns 42 to z and prints nothing. With them, R prints the value that the assignment expression returned.
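The value of an assignment is not just a curiosity; you can use it. A brief sketch, with p, q, k, and total as made-up names:

```r
p <- q <- 5              # q <- 5 runs first; its value, 5, is then bound to p
p
#> [1] 5
q
#> [1] 5

total <- (k <- 10) + 1   # the assignment to k yields 10, which is then used in the sum
total
#> [1] 11
```

Chained assignment like p <- q <- 5 works only because the inner assignment is an expression with a value.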
This “everything is an expression” rule comes directly from Church’s lambda calculus (Section 1.2), where there are only expressions: no instructions, no statements, no side-channel commands. R followed that design faithfully, and the payoff is that you can compose pieces of R code in ways that would be impossible in a language built around statements. If every piece of code produces a value, every piece of code can be nested inside every other piece of code. What happens when you apply that principle to the thing that does the most work in R: functions?
Exercises
- What does this return? Predict first, then run it:
  if (FALSE) 1 else 2
- What value does this block produce?
  { 10; 20; 30 }
- Can you store the result of for (i in 1:3) i in a name? Try it. What happens? (This is one of the very few places where R behaves more like a statement language.)
3.4 Calling functions
You have already called a function: sqrt(9). But what is actually happening when R sees those parentheses?
sqrt is a name bound to a function, the same way x was a name bound to 3. The parentheses () tell R to call the function, passing 9 as an argument. R evaluates the call and returns the result:
sqrt(9)
#> [1] 3
What happens if you type sqrt without parentheses?
sqrt
#> function (x) .Primitive("sqrt")
R prints the function itself. sqrt is a value, just like 3 or "hello", and the parentheses are what make R actually run it. This distinction between the function itself and calling the function may seem academic right now, but it will matter enormously when you start passing functions to other functions in later chapters.
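Since a function is a value, you can bind another name to it exactly as you did with 3. A minimal sketch, where root is a made-up name:

```r
root <- sqrt            # no parentheses: bind a second name to the same function
root(16)                # parentheses: now the function is called
#> [1] 4
identical(root, sqrt)   # both names refer to the same function
#> [1] TRUE
class(sqrt)
#> [1] "function"
```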
Functions can take multiple arguments. log computes a logarithm, and you can specify the base:
log(8, base = 2)
#> [1] 3
Here 8 is a positional argument (R knows it’s the first one), while base = 2 is a named argument (you’re saying explicitly which parameter gets the value 2). Named arguments can go in any order:
log(base = 2, x = 8)
#> [1] 3
To learn what arguments a function takes, use ?:
?log
This opens the help page for log, which lists its arguments, explains what they do, and gives examples.
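For a quicker look than the full help page, base R's args() shows just a function's parameter list. The sketch below also tries a few ways of supplying the same two arguments to log; the specific numbers are only examples.

```r
args(log)             # shows the parameters: x, and base with its default value

log(8, 2)             # both positional: x = 8, base = 2
#> [1] 3
log(base = 2, 8)      # base is named, so 8 fills the remaining parameter, x
#> [1] 3
log(100, base = 10)
#> [1] 2
```

Naming the less obvious arguments, as in base = 2, tends to make calls easier to read later.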
Here is where things get interesting. + is a function:
`+`(3, 5)
#> [1] 8
So is <-:
`<-`(w, 99)
w
#> [1] 99
Everything that happens in R is a function call. When you write 3 + 5, R translates it to `+`(3, 5): a function called +, applied to two arguments. There is no special syntax for arithmetic, no operators baked into the language at a level below functions. + is a function, 3 + 5 is function application, and in the lambda calculus the only thing you can do is apply a function to an argument. R took that idea literally.
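The same trick works for other pieces of syntax. A few more cases to try, as a sketch; sapply() is a base R function that applies a function to each element of a vector, and it appears here only to show that + can be handed around like any other function.

```r
`if`(TRUE, "yes", "no")     # if is a function of a condition and two branches
#> [1] "yes"
`[`(c(10, 20, 30), 2)       # so is indexing: the same as c(10, 20, 30)[2]
#> [1] 20
sapply(1:3, `+`, 10)        # pass + to another function: adds 10 to each of 1, 2, 3
#> [1] 11 12 13
```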
Exercises
- Use round() to round 3.14159 to two decimal places. (Hint: check ?round for the argument name.)
- What does nchar("functional") return? What about nchar("")?
- Try sum(1, 2, 3, 4, 5). Now try sum(1:5). Do they give the same result?
3.5 The workspace
Every name you create lives in your workspace (also called the global environment). You can see all the names you’ve defined with ls():
ls()
And you can remove a name with rm():
rm(x)
After rm(x), the name x is gone. The value it pointed to gets cleaned up by garbage collection if nothing else refers to it.
The workspace is simple on the surface: ls() shows what’s in it, rm() removes things. But there is more going on underneath. Functions create their own local environments when they run, and R looks up names by searching through a chain of environments from the inside out. You have been working in one environment so far, the global one. When you start writing functions (Chapter 18), the picture gets richer.
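A small preview of what "local" means, as a sketch. The function area and the name side_squared are made up for illustration, and the example assumes you have not already defined side_squared in your workspace.

```r
area <- function(side) {
  side_squared <- side * side   # this name exists only while area() is running
  side_squared
}
area(4)
#> [1] 16
exists("side_squared")          # the local name never reached the workspace
#> [1] FALSE
```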
When you quit R, it may ask whether to “save your workspace image.” This saves all names and values currently in your workspace to a file, so they reappear next time you open R. In the early days, when computations could take hours, this made sense. Today, most computations are fast enough that you’re better off keeping your work in a script file (.R or .qmd) and rerunning it from scratch. That way, every value in your workspace traces back to a line of code you can point to, and you never wonder where a mysterious leftover variable came from.
Try this:
environment()
R returns <environment: R_GlobalEnv>. The workspace is an object, the same kind of thing as the numbers and strings you have been binding to names. You can pass it to ls.str() and get back every binding in it. The container that holds your names is itself a value with a name.
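To see the workspace behave like any other value, you can bind it to a name and hand it to functions. A sketch using base R's globalenv(), assign(), get(), and rm(); the names e and demo_answer are made up for illustration.

```r
e <- globalenv()                       # the workspace itself, bound to the name e
is.environment(e)
#> [1] TRUE
environmentName(e)
#> [1] "R_GlobalEnv"

assign("demo_answer", 42, envir = e)   # create a binding through the environment value
get("demo_answer", envir = e)          # look it up the same way
#> [1] 42
rm("demo_answer", envir = e)           # tidy up so nothing is left behind
```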