26 Metaprogramming
You have been writing lm(y ~ x) since Chapter 17, and filter(penguins, species == "Adelie") since Chapter 14. Both do something that might seem impossible: y ~ x is not evaluated before lm() sees it, and species is found inside a data frame, not in your environment. Normally, 1 + 1 evaluates to 2 and the expression disappears. R can keep it. Code in R is a data structure you can hold, examine, rearrange, and run whenever you choose.
That capability has a name: metaprogramming, code that operates on code. You have relied on it since your first formula, your first call to aes(), and your first dplyr pipeline. All of it is built on the machinery this chapter describes.
26.1 Code is data
quote() captures an expression without evaluating it:
quote(x + 1)
#> x + 1
Nothing was computed, and R would not have complained even if no x existed. What you got back is the expression itself: x + 1 was returned as a data structure (a call object) that you can store, inspect, and eventually evaluate. If that sounds strange, think of it through lambda calculus. (λx. x + 1)(3) reduces to 4, but if you “quote” the term you get the syntactic object (λx. x + 1)(3), something you can pull apart and study without triggering the reduction. In the same way, quote(1 + 1) in R gives you the expression tree, not the value 2.
e <- quote(x + 1)
typeof(e)
#> [1] "language"
class(e)
#> [1] "call"
The rlang package provides expr(), which does the same thing:
rlang::expr(x + 1)
#> x + 1
To run a captured expression, use eval():
x <- 10
eval(e)
#> [1] 11
eval() takes the frozen expression and evaluates it in the current environment, where x is 10, so the result is 11. The cycle is always the same: capture code, possibly modify it, then evaluate it somewhere. But why would you want to?
In Section 1.2, Church’s lambda calculus treated functions as data that could be passed around and applied. R extends that principle to all code, not only functions. An expression is a value, exactly like a number or a string; you can assign it, put it in a list, pass it to a function, transform it, then run the transformed version. In Section 7.5, you saw that + is a function and 2 + 3 is a function call. That function call is also a data structure you can capture and manipulate before it ever executes.
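Here is a minimal base R sketch of that claim. Expressions are stored in a list and handed to a function, which evaluates each one with a chosen value of x. The helper run_all() is hypothetical and exists only for this illustration.

```r
# Expressions are ordinary values: keep several in a list
exprs_list <- list(quote(x + 1), quote(x * 2), quote(x^2))

# Hypothetical helper: evaluate every expression with the same x
run_all <- function(exprs, x_value) {
  vapply(exprs, function(e) eval(e, list(x = x_value)), numeric(1))
}

run_all(exprs_list, 3)
#> [1] 4 6 9
```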
R inherited this idea from Lisp, where (quote (+ 1 2)) returns the list (+ 1 2) instead of computing 3. A language whose code has the same structure as its data is called homoiconic. Lisp code is lists, and Lisp data is lists. R code is call objects (trees), and R data includes call objects. Python, Java, and C++ are not homoiconic: their code is text, and metaprogramming means parsing strings, which is fragile. Because R is homoiconic, code manipulation comes for free.
Python’s inspect.getsource() returns a string you then have to parse, and Java’s reflection is a separate API with its own rules and sharp edges. R skips both steps. quote(x + 1) gives you the syntax tree directly: a call object you can subset with [[, modify with replacement, and evaluate in any environment you choose. That tree is an ordinary R object. It lives in memory, responds to R’s standard operations, and you can pass it to functions like any vector or list. Tidy evaluation, formula interfaces, and ggplot2’s aes() all rest on that foundation.
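The sketch below treats a call object like any other R object. It replaces the function at the root with [[<-, swaps an argument, and evaluates the result in two different environments. It uses only base R (as.name(), new.env(), assign()), and the variable names are illustrative.

```r
e <- quote(x + 1)

# Swap the function at the root: `+` becomes `*`
e[[1]] <- as.name("*")
# Replace the last argument: 1 becomes 5
e[[3]] <- 5
e
#> x * 5

# Evaluate the modified tree with x supplied by a list
eval(e, list(x = 2))
#> [1] 10

# ...and with x supplied by a fresh environment
env <- new.env()
assign("x", 3, envir = env)
eval(e, env)
#> [1] 15
```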
26.2 Abstract syntax trees
Every R expression has a tree structure called an abstract syntax tree (AST). The lobstr package makes them visible:
lobstr::ast(x + y * 2)
#> █─`+`
#> ├─x
#> └─█─`*`
#>   ├─y
#>   └─2
What looks like a flat sequence of tokens on the page is actually a tree. `+` sits at the root, x hangs off one branch, and the call `*`(y, 2) hangs off the other. The tree encodes precedence: * binds tighter than +, so y * 2 forms a subtree nested inside the + node, and R knows to evaluate it first without needing parentheses from you.
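You can confirm the precedence claim without lobstr by indexing the call directly. This base R sketch compares x + y * 2 with (x + y) * 2. Note that the parentheses survive in the tree as a call to `(`.

```r
e1 <- quote(x + y * 2)
e1[[1]]   # the root is `+`
#> `+`
e1[[3]]   # y * 2 is a whole subtree under the root
#> y * 2

# Parentheses change the tree: now `*` is at the root
e2 <- quote((x + y) * 2)
e2[[1]]
#> `*`
e2[[2]]   # `(` is itself a call, wrapping x + y
#> (x + y)
```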
Three kinds of nodes make up every AST:
- Constants: 1, "hello", TRUE. These are leaves; they have no children.
- Symbols (names): x, y, mean. Also leaves. They represent name lookups: when evaluated, R searches for the value bound to that name.
- Calls: function applications. These are branches. The first child is the function, and the rest are arguments.
That is the entire vocabulary. x + 1 is a call to `+` with arguments x and 1. if (x > 0) "yes" else "no" is a call to `if` with three arguments. There is no special syntax that escapes the tree; even control flow is just function calls wearing different clothes.
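A short recursive function can check that constants, symbols, and calls really are the whole vocabulary. It walks a tree and labels each node. node_kinds() is a hypothetical helper written for this sketch. It relies on base R's is.call() and is.symbol(), and it does not handle empty arguments such as the one in x[, 1].

```r
# Hypothetical helper: label every node of an expression tree
node_kinds <- function(x) {
  if (is.call(x)) {
    # A call: label it, then recurse into the function and each argument
    c("call", unlist(lapply(as.list(x), node_kinds)))
  } else if (is.symbol(x)) {
    "symbol"
  } else {
    # Anything else at a leaf (number, string, logical) is a constant
    "constant"
  }
}

kinds <- node_kinds(quote(if (x > 0) "yes" else "no"))
# kinds is: "call" "symbol" "call" "symbol" "symbol" "constant" "constant" "constant"
# (if-call, `if`, the x > 0 call, `>`, x, 0, "yes", "no")
kinds
```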
lobstr::ast(if (x > 0) "yes" else "no")
#> █─`if`
#> ├─█─`>`
#> │ ├─x
#> │ └─0
#> ├─"yes"
#> └─"no"
You can take a call object apart with standard list operations, where the first element is the function and the rest are the arguments:
e <- quote(mean(x, na.rm = TRUE))
e[[1]]
#> mean
e[[2]]
#> x
e[[3]]
#> [1] TRUE
as.list() converts the whole call into a list, which makes the structure easy to see at a glance:
as.list(e)
#> [[1]]
#> mean
#>
#> [[2]]
#> x
#>
#> $na.rm
#> [1] TRUE
You can convert between text and expressions with parse() and deparse():
deparse(quote(x + y * 2))
#> [1] "x + y * 2"
parse(text = "x + y * 2")[[1]]
#> x + y * 2
The conversion is not perfectly symmetric: comments and whitespace vanish in the round trip. But the tree structure survives, and the tree is what matters. So what happens when you need to capture not your own expression but someone else’s?
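Here is a quick base R sketch of that asymmetry. It parses some untidy source text, deparses it, and compares the trees.

```r
# Odd spacing and a trailing comment in the source text
src <- "x+y *   2  # double y first"

e <- parse(text = src)[[1]]   # the first (and only) parsed expression
deparse(e)                    # comment and spacing are gone
#> [1] "x + y * 2"

# The tree itself survives the round trip unchanged
identical(parse(text = deparse(e))[[1]], e)
#> [1] TRUE
```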
Exercises
- Draw the AST for f(a, g(b, c)) on paper. Then check your answer with lobstr::ast().
- What does lobstr::ast(1 + 2 + 3) look like? Is + left-associative or right-associative?
- Use lobstr::ast() to visualize x[1]. What function is at the root?
26.3 Capturing and evaluating expressions
Capturing your own expression and capturing the caller’s expression require different tools, and confusing the two is a reliable source of bugs.
quote() and expr() capture what you type directly:
quote(a + b)
#> a + b
rlang::expr(a + b)
#> a + b
substitute() captures what the caller passed:
f <- function(x) substitute(x)
f(a + b)
#> a + bYou called f(a + b), and substitute(x) reached back across the function boundary to the call site, grabbing a + b, the expression the caller actually wrote. The same mechanism powers dplyr::filter(): it sees species == "Adelie" as an expression rather than immediately evaluating it and getting an error about a missing variable.
The rlang equivalents are enexpr(), which captures an expression, and enquo(), which captures an expression plus its environment to form a quosure. The decision rule is simple: use substitute() in base R code, and use enexpr()/enquo() in tidyverse/rlang code. If your function passes its arguments on to dplyr or other tidy-evaluation-aware code, reach for the rlang tools, because they integrate with !!, {{ }}, and eval_tidy(). If you are writing standalone base R, substitute() and eval() are sufficient and carry zero dependencies.
g <- function(x) rlang::enexpr(x)
g(a + b)
#> a + b
Once you have an expression, you evaluate it with eval(). The second argument controls where:
e <- quote(x + 1)
eval(e, list(x = 10))
#> [1] 11
eval(e, list(x = 100))
#> [1] 101
The same frozen expression, evaluated in different environments, gives different results. Data masking is exactly that trick: eval_tidy() from rlang evaluates an expression against a data frame, so filter(penguins, species == "Adelie") looks up species in the data rather than in the global environment.
library(rlang)
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
eval_tidy(expr(x + y), data = df)
#> [1] 11 22 33
Tidy evaluation calls this pattern defuse-and-inject: capture, optionally modify, then evaluate in the right context.
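Base R can do the same data masking without rlang. eval() accepts a data frame as its environment, plus an enclosing environment for everything the data does not contain. The filter_rows() function below is a hypothetical, minimal sketch of that idea. Base R's own subset() works along similar lines, but this is not how dplyr::filter() is implemented.

```r
# Hypothetical, minimal filter() built from base R only
filter_rows <- function(data, condition) {
  cond_expr <- substitute(condition)             # defuse the caller's expression
  keep <- eval(cond_expr, data, parent.frame())  # columns first, then the caller's env
  data[!is.na(keep) & keep, , drop = FALSE]      # drop rows where the test is NA
}

df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))
threshold <- 15   # not a column, so it is found in the caller's environment
filter_rows(df, y > threshold)
#>   x  y
#> 2 2 20
#> 3 3 30
```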
One more base R tool worth knowing: match.call(). Inside a function, it returns the entire call as the user typed it, with arguments matched by name:
my_lm <- function(formula, data, subset = NULL) {
match.call()
}
my_lm(y ~ x, data = mtcars)
#> my_lm(formula = y ~ x, data = mtcars)
Many modeling functions use match.call() to record the call for reproducibility. When you print a fitted model and see Call: lm(formula = y ~ x, data = mtcars), that line is the call object lm() stored with match.call(), deparsed back into text.
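match.call() has a close relative, sys.call(), which returns the call exactly as typed, without matching argument names. The sketch below contrasts the two in a hypothetical show_calls() function. It then shows the call that lm() keeps in its fitted object (mtcars ships with R).

```r
# Hypothetical function returning both forms of its own call
show_calls <- function(formula, data, subset = NULL) {
  list(
    typed   = sys.call(),   # the call exactly as written
    matched = match.call()  # arguments matched to their full names
  )
}

res <- show_calls(y ~ x, mtcars)
res$typed
#> show_calls(y ~ x, mtcars)
res$matched
#> show_calls(formula = y ~ x, data = mtcars)

# lm() keeps its matched call in the fitted object
fit <- lm(mpg ~ wt, data = mtcars)
fit$call
#> lm(formula = mpg ~ wt, data = mtcars)
```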
Exercises
- Write a function show_code that takes an argument and prints the expression the caller passed (use substitute() and deparse()). Test: show_code(mean(x, na.rm = TRUE)) should print "mean(x, na.rm = TRUE)".
- Evaluate quote(x * 2) in an environment where x = 7. Then evaluate it where x = -3.
- Write a function that uses match.call() to return its own call. Call it with several arguments and observe the output.
26.4 Building expressions programmatically
What if the function name or the variable is not known until runtime, when your code has to decide at the last moment what expression to build? You need to construct the expression from parts.
rlang::call2() constructs a call object:
rlang::call2("+", 1, 2)
#> 1 + 2
eval(rlang::call2("+", 1, 2))
#> [1] 3
The next step is quasiquotation: building an expression template with holes that get filled in at construction time. The !! (bang-bang) operator injects a value into an expression:
my_var <- rlang::expr(body_mass_g)
rlang::expr(mean(!!my_var))
#> mean(body_mass_g)
!! replaced my_var with its value (body_mass_g), producing the expression mean(body_mass_g). Without !!, you would get mean(my_var), which is an entirely different expression and not the one you wanted.
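Base R can also build calls whose function is only known at runtime. This sketch uses nothing beyond base R. call() takes the function name as a string followed by the arguments, and as.call() converts a list whose first element is the function. The variable names and the small data frame are illustrative.

```r
# Suppose the function name is only known at runtime
fn_name <- "median"
arg <- as.name("body_mass_g")

# call(): function name as a string, then the arguments
e1 <- call(fn_name, arg, na.rm = TRUE)
e1
#> median(body_mass_g, na.rm = TRUE)

# as.call(): a list with the function first, then the arguments
e2 <- as.call(list(as.name(fn_name), arg, na.rm = TRUE))
identical(e1, e2)
#> [1] TRUE

# Evaluate against a data frame so body_mass_g is found as a column
eval(e1, data.frame(body_mass_g = c(3000, 4000, NA, 5000)))
#> [1] 4000
```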
!!! (triple bang, or splice) injects a list of expressions as separate arguments:
vars <- rlang::exprs(species, island)
rlang::expr(group_by(penguins, !!!vars))
#> group_by(penguins, species, island)
Injection is also what {{ }} (embrace) from tidy evaluation does under the hood. When you write a function like:
my_summary <- function(data, var) {
data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE))
}
my_summary(palmerpenguins::penguins, body_mass_g)
#> # A tibble: 1 × 1
#> mean
#> <dbl>
#> 1 4202.
my_summary(palmerpenguins::penguins, flipper_length_mm)
#> # A tibble: 1 × 1
#> mean
#> <dbl>
#> 1 201.
the embrace operator defuses var with enquo() and injects it with !!, so {{ }} is syntactic sugar for the defuse-and-inject pattern. The caller writes bare column names, exactly as they would with dplyr directly, and your function forwards them transparently: no quoted strings and no special syntax at the call site.
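For comparison, here is a hypothetical base R analogue of that wrapper. It is a sketch, not a replacement for {{ }}. substitute() inserts the caller's expression into a call template, and eval() runs the result against the data frame. Like most substitute()-based helpers, it works when called directly but not when another function forwards var to it.

```r
# Hypothetical base R analogue of the {{ }} wrapper
mean_of <- function(data, var) {
  # substitute() swaps `var` for the caller's expression,
  # producing e.g. mean(body_mass_g, na.rm = TRUE)
  call_expr <- substitute(mean(var, na.rm = TRUE))
  eval(call_expr, data, parent.frame())
}

df <- data.frame(body_mass_g = c(3500, NA, 4500),
                 flipper_length_mm = c(190, 200, 210))
mean_of(df, body_mass_g)
#> [1] 4000
mean_of(df, flipper_length_mm)
#> [1] 200
```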
Base R has bquote() for quasiquotation, using .() instead of !!:
my_var <- quote(body_mass_g)
bquote(mean(.(my_var)))
#> mean(body_mass_g)
It works, and since R 4.0.0 bquote() can also splice lists of expressions (with ..() and splice = TRUE). Even so, it is less common in practice than the rlang tools. All of these tools (!!, bquote(), expr()) solve the same problem that Lisp macros have addressed since the 1960s: how to write code that writes code, safely, without collapsing into string manipulation. When Hadley Wickham and Lionel Henry designed tidy evaluation, they were adapting that lineage for R. As a result, you can generate expressions with the same structural guarantees that quote() gives you by hand.
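In R 4.0.0 and later, bquote() can splice too. With splice = TRUE, ..() inserts each element of a list as a separate argument, much like !!!. Here is a small sketch reusing the group_by() example. The call is only built, never run, so dplyr is not needed.

```r
# Requires R >= 4.0.0, where bquote() gained the `splice` argument
vars <- list(quote(species), quote(island))

# ..() splices each list element in as its own argument
bquote(group_by(penguins, ..(vars)), splice = TRUE)
#> group_by(penguins, species, island)
```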
Exercises
- Use rlang::call2() to build the expression sqrt(16), then evaluate it.
- Create a variable col <- rlang::expr(bill_length_mm). Use !! to build the expression mean(bill_length_mm, na.rm = TRUE).
- Given fns <- rlang::exprs(mean, sd, median), use lapply() and call2() to build three expressions: mean(x), sd(x), median(x).
26.5 Formulas as expressions
Before anyone used the word “metaprogramming” in an R context, there were formulas. When you write:
lm(body_mass_g ~ bill_length_mm, data = palmerpenguins::penguins)
#>
#> Call:
#> lm(formula = body_mass_g ~ bill_length_mm, data = palmerpenguins::penguins)
#>
#> Coefficients:
#> (Intercept) bill_length_mm
#> 362.31 87.42
the expression body_mass_g ~ bill_length_mm is not evaluated in the ordinary sense. R captures it as a formula object: two expressions (the left-hand side and the right-hand side) bundled together with the environment where the formula was created.
f <- y ~ x + z
typeof(f)
#> [1] "language"
length(f)
#> [1] 3
f[[2]]
#> y
f[[3]]
#> x + z
A formula is itself a call to `~`. f[[1]] is the ~ symbol, f[[2]] is the left-hand side (y), and f[[3]] is the right-hand side (x + z). The formula also carries an environment attribute:
environment(f)
#> <environment: R_GlobalEnv>
Formulas work across function boundaries because the formula remembers where its variables should be looked up. Quosures in tidy evaluation solve the same problem: an expression bundled with its environment. Wickham borrowed the concept directly from formulas, and their 30-year track record in R’s modeling ecosystem is what made the design credible.
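A formula carries the environment where it was created. That is how a formula built inside a function can still find that function's local variables later. make_formula() below is a hypothetical example, and the data are made up for illustration.

```r
# Hypothetical factory that builds a formula inside a function
make_formula <- function() {
  k <- 2          # lives only in this function's environment
  y ~ I(x * k)    # the formula records this environment
}

f <- make_formula()
environment(f)$k   # k is still reachable through the formula
#> [1] 2

# lm() finds x and y in `data`, and k in environment(f)
d <- data.frame(x = 1:5, y = c(2.1, 3.9, 6.2, 7.8, 10.1))
coef(lm(f, data = d))
```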
The formula language (+, *, :, -, I()) is a domain-specific language for specifying models. y ~ x1 * x2 does not mean “multiply x1 by x2.” It means “include x1, x2, and their interaction.” model.matrix() interprets these operators to build the design matrix:
model.matrix(~ species + island, data = palmerpenguins::penguins) |> head()
#>   (Intercept) speciesChinstrap speciesGentoo islandDream islandTorgersen
#> 1           1                0             0           0               1
#> 2           1                0             0           0               1
#> 3           1                0             0           0               1
#> 4           1                0             0           0               1
#> 5           1                0             0           0               1
#> 6           1                0             0           0               1
You can also build formulas dynamically, which becomes useful when the set of predictors is not known in advance:
predictors <- c("bill_length_mm", "flipper_length_mm")
f <- as.formula(paste("body_mass_g ~", paste(predictors, collapse = " + ")))
f
#> body_mass_g ~ bill_length_mm + flipper_length_mm
Constructing code from data, then running it: that is metaprogramming at its most practical. The formula is a piece of code assembled from strings, about to be interpreted by lm() as a model specification, and neither the user nor the modeling function needs to know it was built programmatically.
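The stats package in base R also offers reformulate(). It assembles a formula from a character vector of term labels and a response, with no manual paste(). Here is a sketch with the same predictors:

```r
response   <- "body_mass_g"
predictors <- c("bill_length_mm", "flipper_length_mm")

# reformulate() builds the formula from its parts
f <- reformulate(predictors, response = response)
f
#> body_mass_g ~ bill_length_mm + flipper_length_mm
```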
The connection to Chapter 23 is direct. Formulas are a form of non-standard evaluation: the variable names are unquoted, and body_mass_g is looked up in penguins rather than in the calling environment. dplyr uses the same trick, just with newer machinery.
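The formula DSL described earlier in this section is interpreted by terms(), which expands operators such as * into the model terms they imply. This small sketch shows how y ~ x1 * x2 expands, and how I() protects ordinary arithmetic from that interpretation.

```r
# * expands to both main effects plus their interaction
attr(terms(y ~ x1 * x2), "term.labels")
#> [1] "x1"    "x2"    "x1:x2"

# Inside I(), * means ordinary multiplication: a single term
attr(terms(y ~ I(x1 * x2)), "term.labels")
#> [1] "I(x1 * x2)"
```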
Exercises
- Create a formula y ~ x1 + x2 and extract its right-hand side.
- Build a formula programmatically: given response <- "mpg" and predictors <- c("wt", "hp"), construct the formula mpg ~ wt + hp using as.formula() and paste(). Pass it to lm() with the mtcars dataset.
26.6 When to use metaprogramming
Metaprogramming lets you generate code, build DSLs, and eliminate boilerplate. That power comes with a cost.
Good uses:
- Domain-specific languages. Model formulas, ggplot2 aesthetics, dplyr pipelines. These are all DSLs built on metaprogramming. If you are building a package with an interactive interface that benefits from concise, expressive syntax, metaprogramming is the right tool.
- Code generation. Building model specifications programmatically, creating batches of test cases, generating reports from templates.
- Debugging and inspection. Use substitute() to see what was passed, and match.call() to record the exact call for reproducibility.
Bad uses:
- Things that work with regular functions. If f(x) solves the problem, do not complicate it with eval(substitute(...)).
- Performance. Metaprogramming does not make code faster; capturing and then evaluating expressions adds work. What it buys is flexibility, which is a different axis entirely.
- Showing off. Code that manipulates code is hard to read and hard to debug. Use it when the benefit (a concise user interface) outweighs the cost (a complex implementation), not before.
Most R users consume metaprogramming (by using dplyr, ggplot2, formulas) and very few need to produce it. Package authors building interactive interfaces may need these tools; analysts writing data pipelines almost certainly do not. The test: does your user-facing API become meaningfully better with non-standard evaluation? If yes, the complexity is worth absorbing. Saving yourself a quoted string is not enough reason.
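The "hard to debug" cost is concrete. Here is one classic failure, sketched with hypothetical base R functions. A substitute()-based helper works when called directly but breaks as soon as another function forwards an argument to it, because substitute() only sees one level up.

```r
# Hypothetical NSE helper in the style of base subset()
pick_rows <- function(data, condition) {
  keep <- eval(substitute(condition), data, parent.frame())
  data[keep, , drop = FALSE]
}

# Hypothetical wrapper that simply forwards its argument
pick_big <- function(data, condition) pick_rows(data, condition)

scores <- data.frame(score = c(1, 5, 9))
pick_rows(scores, score > 4)   # works: two rows come back

# Inside pick_rows, substitute() now sees the symbol `condition`,
# not `score > 4`, so the data frame's columns are never consulted.
# Forcing `condition` evaluates `score > 4` in the caller's
# environment, which usually fails with "object 'score' not found".
try(pick_big(scores, score > 4))
```

Forwarding arguments through a layer of functions like this is one of the problems tidy evaluation's {{ }} is designed to handle.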
26.7 The metaprogramming toolkit
For further study:
- Tidy evaluation (rlang): expr(), enquo(), eval_tidy(), !!, !!!, {{ }}. The system behind dplyr, tidyr, and ggplot2. Covered in depth in Advanced R, chapters 17 through 20.
- Base R tools: quote(), substitute(), eval(), match.call(), sys.call(), bquote(). These predate rlang and still power much of R’s infrastructure. Every R programmer should know substitute() and eval().
- Inspection packages: lobstr for AST visualization, pryr for probing environments and function internals.
- Further reading: Wickham’s Advanced R (2nd edition) devotes an entire part to metaprogramming. Mailund’s Metaprogramming in R (Apress, 2017) is a book-length treatment covering domain-specific languages and code generation.
Exercises
- Look at the source of dplyr::filter (type dplyr:::filter.data.frame at the console; the method is not exported, hence the three colons). Can you spot where it captures the user’s expressions?
- Compare quote(), rlang::expr(), substitute(), and rlang::enexpr(). Write one sentence describing when you would use each.