10  Lists

Atomic vectors are strict: every element must be the same type. If you combine a number and a string, R coerces the number to a string so that the types match (Section 4.3). This works well for columns of data, but not everything fits into a single type. A penguin has a species (character), a body mass (double), and a vector of measurements (double). You need a container that holds different types together without coercing them. That container is a list.

10.1 Why lists exist

An atomic vector is homogeneous; a list is heterogeneous: each element can be a number, a string, a vector, a function, or another list.

penguin <- list(
  species = "Adelie",
  mass = 3750,
  measurements = c(39.1, 18.7, 181)
)
penguin
#> $species
#> [1] "Adelie"
#> 
#> $mass
#> [1] 3750
#> 
#> $measurements
#> [1]  39.1  18.7 181.0

species is a character scalar, mass is a numeric scalar, measurements is a numeric vector of length 3. A vector could not hold these together without flattening and coercing them. A list keeps each element intact.
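
For contrast, here is a minimal sketch of what happens when the same three values go through c() instead of list(). The name flat is introduced here for illustration only.

```r
# Combine the same values with c(): everything is flattened and coerced
flat <- c("Adelie", 3750, c(39.1, 18.7, 181))
flat
#> [1] "Adelie" "3750"   "39.1"   "18.7"   "181"
typeof(flat)
#> [1] "character"
length(flat)
#> [1] 5
```

The grouping of the three measurements is lost, and every number has become a string.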

Technically, a list is a recursive vector. “Recursive” because its elements can themselves be lists, allowing arbitrary nesting. R’s documentation calls lists “generic vectors” to distinguish them from atomic vectors (Section 4.2). The type is "list":

typeof(penguin)
#> [1] "list"
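
The "generic vector" terminology can be checked with base R's predicate functions. This is a small sketch that rebuilds the penguin list so it runs on its own:

```r
penguin <- list(
  species = "Adelie",
  mass = 3750,
  measurements = c(39.1, 18.7, 181)
)

is.list(penguin)        # it is a list ...
#> [1] TRUE
is.vector(penguin)      # ... which R still counts as a vector ...
#> [1] TRUE
is.atomic(penguin)      # ... but not an atomic one
#> [1] FALSE
is.recursive(penguin)   # its elements may themselves be lists
#> [1] TRUE
```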

You already used a list in Section 7.3 when you stored functions as named elements. Lists are the same structure whether they hold numbers, strings, or functions.

The idea of pairing two values into one object has deep roots. In lambda calculus, a pair is encoded as λa.λb.λf. f a b: a function that captures two values and waits for a selector. Pass it λx.λy. x (select first) and you get a; pass λx.λy. y (select second) and you get b. Lisp’s cons, car, and cdr offer the same three operations (build a pair, take the first part, take the second part) as named primitives. R’s list() follows the same idea on a larger scale: bundle any number of values together and retrieve them with selectors ([[1]], [[2]], $name).
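
The lambda-calculus pair can be written down in R with closures. This is only an illustrative sketch: pair, first, and second are names made up here, and the functions take their arguments together rather than one at a time as the curried lambda terms do.

```r
# pair() captures two values and returns a function that waits for a selector
pair   <- function(a, b) function(f) f(a, b)
first  <- function(p) p(function(x, y) x)   # selector returning the first value
second <- function(p) p(function(x, y) y)   # selector returning the second value

p <- pair("Adelie", 3750)
first(p)
#> [1] "Adelie"
second(p)
#> [1] 3750
```

Nothing is stored in a data structure here; the two values live in the closure's environment, and the selector decides which one comes back.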

Exercises

  1. Create a list with your name (character), your age (numeric), and your three favourite colours (a character vector). Print it.
  2. What does typeof(list(1, "a", TRUE)) return?
  3. What happens if you use c() instead of list() to combine 1, "a", and TRUE? Why is the result different?

10.2 Creating and accessing lists

10.2.1 Creating lists

list() creates a list. Elements can be named or unnamed:

named <- list(a = 1, b = "hello", c = TRUE)
unnamed <- list(1, "hello", TRUE)

Named elements are easier to work with. Unnamed elements are accessed by position only.
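
A short sketch of how the two lists differ, using names() to inspect them and to attach names afterwards:

```r
named   <- list(a = 1, b = "hello", c = TRUE)
unnamed <- list(1, "hello", TRUE)

names(named)
#> [1] "a" "b" "c"
names(unnamed)            # no names attribute at all
#> NULL

# Names can be attached after the list has been created
names(unnamed) <- c("a", "b", "c")
identical(unnamed, named)
#> [1] TRUE
```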

10.2.2 The train analogy

Think of a list as a train. Each carriage holds cargo. There are three ways to access it:

  • x[1] returns the first carriage, still attached to the train. The result is a list of length 1.
  • x[[1]] opens the first carriage and pulls out what’s inside. The result is the element itself.
  • x$name is shorthand for x[["name"]], except that $ also accepts an unambiguous prefix of the name (partial matching).

x <- list(a = 10, b = c(1, 2, 3), c = "hello")

[ returns a sub-list:

x[1]
#> $a
#> [1] 10
typeof(x[1])
#> [1] "list"

The result is still a list, just a shorter one.

[[ extracts the element:

x[[1]]
#> [1] 10
typeof(x[[1]])
#> [1] "double"

Now it’s the number 10, not a list containing 10. This is the most common source of confusion with lists: [ keeps the container, [[ removes it.

$ works with names:

x$b
#> [1] 1 2 3
x[["b"]]
#> [1] 1 2 3

Both return the same thing. $ is convenient for interactive use; [[ is necessary when the name is stored in a variable:

key <- "b"
x[[key]]
#> [1] 1 2 3
x$key
#> NULL

x$key looks for a literal element named "key", not for the value stored in the variable key. Use [[ when the name is computed.
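
There is one more difference between $ and [[ worth seeing once: $ matches names partially, while [[ matches exactly unless told otherwise. The sketch below uses a made-up list called opts.

```r
opts <- list(value = 1)        # hypothetical list, for illustration

opts$val                       # $ accepts the unique prefix "val" for "value"
#> [1] 1
opts[["val"]]                  # [[ requires an exact match by default
#> NULL
opts[["val", exact = FALSE]]   # partial matching has to be requested
#> [1] 1
```

Partial matching is convenient at the console and risky in scripts, which is another reason to prefer [[ in code you intend to keep.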

You can select multiple elements with [:

x[c(1, 3)]
#> $a
#> [1] 10
#> 
#> $c
#> [1] "hello"
x[c("a", "c")]
#> $a
#> [1] 10
#> 
#> $c
#> [1] "hello"

But [[ extracts one element at a time. If you give it a vector of indices, it does not select several elements; it indexes recursively, so x[[c(2, 1)]] means x[[2]][[1]].
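
The following sketch shows what [[ does when it receives more than one index. The variable res exists only to capture the error for display.

```r
x <- list(a = 10, b = c(1, 2, 3), c = "hello")

x[[c(2, 1)]]    # same as x[[2]][[1]]: first value of element b
#> [1] 1

# x[[c(1, 3)]] means x[[1]][[3]]; x[[1]] has length 1, so this fails
res <- tryCatch(x[[c(1, 3)]], error = function(e) "error")
res
#> [1] "error"
```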

Tip: Opinion

Default to [[ and $ for extracting elements. Use [ only when you need a sub-list. If you find yourself writing x[1][[1]], you wanted x[[1]] all along.

Exercises

  1. Given x <- list(a = 10, b = 20, c = 30), predict the output of x[2], x[[2]], and x$b before running them.
  2. What is typeof(x[1]) versus typeof(x[[1]])? Explain the difference.
  3. Create a variable name <- "c". Use it to extract the element "c" from the list x. Which accessor works?

10.3 Nested lists

A list element can be another list. This creates nested structures:

study <- list(
  site = "Palmer Station",
  years = 2007:2009,
  species = list(
    list(name = "Adelie", count = 152),
    list(name = "Gentoo", count = 124),
    list(name = "Chinstrap", count = 68)
  )
)

str() shows the tree:

str(study)
#> List of 3
#>  $ site   : chr "Palmer Station"
#>  $ years  : int [1:3] 2007 2008 2009
#>  $ species:List of 3
#>   ..$ :List of 2
#>   .. ..$ name : chr "Adelie"
#>   .. ..$ count: num 152
#>   ..$ :List of 2
#>   .. ..$ name : chr "Gentoo"
#>   .. ..$ count: num 124
#>   ..$ :List of 2
#>   .. ..$ name : chr "Chinstrap"
#>   .. ..$ count: num 68

str() is the most useful function for inspecting lists. It prints the structure compactly, showing types, lengths, and nesting levels. Use it whenever you receive an unfamiliar object. Every nested list is, in fact, a tree: the top-level list is the root, each element is a branch, and atomic values are leaves. The same structure appears in file systems (directories containing files and subdirectories), HTML (the DOM), and JSON.

To access nested elements, chain [[ or $:

study$species[[1]]$name
#> [1] "Adelie"
study[["species"]][[2]][["count"]]
#> [1] 124

Each [[ or $ steps one level deeper. study$species is a list of three lists. study$species[[1]] is the first of those lists. study$species[[1]]$name is the string "Adelie".
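
When every element of a nested list has the same shape, the same one-level step can be applied to each of them in turn. The sketch below uses vapply() from base R for that; species_names and total_count are names chosen here, and the study list is repeated so the block runs on its own.

```r
study <- list(
  site = "Palmer Station",
  years = 2007:2009,
  species = list(
    list(name = "Adelie", count = 152),
    list(name = "Gentoo", count = 124),
    list(name = "Chinstrap", count = 68)
  )
)

# Step into each inner list and pull out one field
species_names <- vapply(study$species, function(s) s$name, character(1))
species_names
#> [1] "Adelie"    "Gentoo"    "Chinstrap"

total_count <- sum(vapply(study$species, function(s) s$count, numeric(1)))
total_count
#> [1] 344
```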

You will encounter nested lists constantly in practice: parsed JSON from a web API, the output of lm(), and parsed configuration files all arrive as nested lists or as objects built on them. The access pattern is always the same: $ or [[ to step down one level, repeated as many times as needed.

Exercises

  1. Given the study list above, extract the count for Chinstrap penguins.
  2. Use str() on the result of lm(mpg ~ wt, data = mtcars). How many top-level elements does the model object have?
  3. Create a nested list representing a book: title, author, and a list of chapters (each with a number and a title). Extract the title of the second chapter.

10.4 Lists as the backbone of R

Lists are not just a data structure you use directly. They are the foundation that other structures are built on.

A data frame is a list:

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)
#> [1] "list"
is.list(df)
#> [1] TRUE

Each column is one element of the list. The data frame adds a constraint: all columns must have the same length. But underneath, df$x works exactly like list access, because it is list access.
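
A quick sketch that checks this claim: column access and list access give identical results, and the only things the data frame adds are attributes.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

identical(df$x, df[["x"]])    # both are plain list extraction
#> [1] TRUE
names(df)                     # column names are the list's names
#> [1] "x" "y"
sort(names(attributes(df)))   # what the data frame carries on top of a list
#> [1] "class"     "names"     "row.names"
```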

A linear model is a list:

fit <- lm(mpg ~ wt, data = mtcars)
typeof(fit)
#> [1] "list"
names(fit)
#>  [1] "coefficients"  "residuals"     "effects"       "rank"         
#>  [5] "fitted.values" "assign"        "qr"            "df.residual"  
#>  [9] "xlevels"       "call"          "terms"         "model"

fit$coefficients, fit$residuals, fit$fitted.values: these are all list extractions. The model object is a list with a class attribute ("lm") that tells R how to print and summarize it, but the data access works the same way.
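
To see that the class attribute is the only thing separating the model from a bare list, you can strip it with unclass(). This is a sketch; plain is a name introduced here.

```r
fit <- lm(mpg ~ wt, data = mtcars)

class(fit)
#> [1] "lm"
identical(fit$coefficients, coef(fit))   # the accessor returns the stored element
#> [1] TRUE

plain <- unclass(fit)                    # drop the class; a bare list remains
class(plain)
#> [1] "list"
identical(names(plain), names(fit))
#> [1] TRUE
```

Printing plain would show every element in full, because R no longer knows to use the compact lm print method.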

Understanding lists means understanding everything built on top of them. Data frames (Chapter 11) and model objects are lists with extra attributes and rules about what they can contain. Environments are not lists, but they support the same $ and [[ access by name. The next chapter shows how data frames exploit this structure.

Exercises

  1. Run typeof() on a data frame you create. Then run is.list() on it. What do you conclude?
  2. Fit a model with lm(Sepal.Length ~ Petal.Length, data = iris). Use names() to see its elements, then extract the R-squared from summary() of the fit. (Hint: str(summary(fit)) will help.)
  3. What does length() return for a data frame with 5 columns and 100 rows? Why?

10.5 Modifying lists

Add an element by assigning to a new name:

x <- list(a = 1, b = 2)
x$c <- 3
x[["d"]] <- "new"
str(x)
#> List of 4
#>  $ a: num 1
#>  $ b: num 2
#>  $ c: num 3
#>  $ d: chr "new"

Remove an element by setting it to NULL:

x$b <- NULL
str(x)
#> List of 3
#>  $ a: num 1
#>  $ c: num 3
#>  $ d: chr "new"

b is gone, not set to NULL. This is a common trap: assigning NULL to a list element deletes it. If you actually want to store NULL as a value, use x["e"] <- list(NULL):

x["e"] <- list(NULL)
str(x)
#> List of 4
#>  $ a: num 1
#>  $ c: num 3
#>  $ d: chr "new"
#>  $ e: NULL

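e exists and its value is NULL. Among the assignment forms, single-bracket assignment with list(NULL) is the one that stores an actual NULL; x$e <- NULL and x[["e"]] <- NULL both delete. You can also include a NULL when the list is first created, as in list(e = NULL).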
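
The three assignment forms can be compared side by side. This sketch uses a fresh list y so that x is left as it was.

```r
y <- list(a = 1)

y$e <- NULL             # e does not exist, so nothing changes
y[["e"]] <- NULL        # same
length(y)
#> [1] 1

y["e"] <- list(NULL)    # stores NULL under the name "e"
length(y)
#> [1] 2
"e" %in% names(y)       # the reliable way to test that e exists
#> [1] TRUE
is.null(y$e)            # TRUE here, but also TRUE for a missing element
#> [1] TRUE
```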

Replace an element by assigning to an existing name:

x$a <- 100
x$a
#> [1] 100

R uses copy-on-modify for lists, the same as for vectors. If a list is reachable through more than one name, modifying it through one name leaves the other unchanged. The copy is shallow: R copies the list’s array of pointers, and the elements that did not change stay shared rather than being duplicated. You rarely need to think about this; it mainly matters for the performance of code that modifies large lists in a loop.
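
The visible consequence of copy-on-modify is that a change made through one name never shows up under another. A minimal sketch, with set_first as a hypothetical helper:

```r
a <- list(v = c(1, 2, 3))
b <- a              # no copy yet: both names refer to the same list
b$v[1] <- 100       # modifying b leaves a untouched
a$v
#> [1] 1 2 3
b$v
#> [1] 100   2   3

# The same holds across function calls
set_first <- function(lst) {
  lst$v[1] <- -1    # changes the function's own copy only
  lst
}
changed <- set_first(a)
a$v
#> [1] 1 2 3
changed$v
#> [1] -1  2  3
```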

Exercises

  1. Create a list with elements x = 1 and y = 2. Add an element z = 3, then delete x. Print the result.
  2. What does length() return after you delete an element from a list?
  3. Try x$a <- NULL on a list where a exists. Then try x["a"] <- list(NULL). What is the difference?

10.6 Linked lists

A linked list is the simplest recursive data structure: each node holds a value and a pointer to the next node. The chain ends with NULL, which stands in for Lisp’s empty list. You can build one in R using nested lists:

cons <- function(head, tail) list(head = head, tail = tail)
car  <- function(lst) lst$head
cdr  <- function(lst) lst$tail

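cons constructs a node. car extracts the first element. cdr extracts the rest. These names come from Lisp (1958), where car and cdr stood for “contents of the address part of register” and “contents of the decrement part of register”, the two fields of a machine word on the IBM 704.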

Build a linked list of three elements:

ll <- cons(1, cons(2, cons(3, NULL)))
str(ll)
#> List of 2
#>  $ head: num 1
#>  $ tail:List of 2
#>   ..$ head: num 2
#>   ..$ tail:List of 2
#>   .. ..$ head: num 3
#>   .. ..$ tail: NULL

To traverse it, recurse until you hit NULL:

ll_to_vector <- function(lst) {
  if (is.null(lst)) return(c())
  c(car(lst), ll_to_vector(cdr(lst)))
}
ll_to_vector(ll)
#> [1] 1 2 3
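
The recursive version needs one nested call per node, and R limits how deeply calls can nest, so it will fail on a sufficiently long list. A loop avoids that. The sketch below is one possible alternative, not a replacement for the recursive definition; ll_to_vector_loop is a name made up here, and growing out with c() is kept for clarity rather than speed.

```r
cons <- function(head, tail) list(head = head, tail = tail)

ll_to_vector_loop <- function(lst) {
  out <- c()
  while (!is.null(lst)) {
    out <- c(out, lst$head)   # append the current value
    lst <- lst$tail           # step to the next node
  }
  out
}

ll_to_vector_loop(cons(1, cons(2, cons(3, NULL))))
#> [1] 1 2 3
```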

This structure mirrors the Church encoding of pairs. In lambda calculus, a pair is λh.λt.λf. f h t: a function that captures two values and waits for a selector. The counterpart of car passes a selector that returns the first value; the counterpart of cdr passes one that returns the second. A linked list is a chain of such pairs, terminated by a special “nil” value. Lisp stores pairs as cons cells in memory rather than as functions, but it exposes the same interface, and cons/car/cdr remain the standard names in the Lisp family (many other functional languages say head and tail instead).

Why doesn’t R use linked lists for everyday work? R was designed for column-oriented numerical computing, where you need to pass entire columns to C routines like sum() or BLAS. That requires contiguous memory: a single block of doubles that a C function can walk with a pointer. A linked list scatters its elements across the heap, so you would need to copy them into a contiguous buffer before any vectorized operation could touch them. R’s built-in lists are arrays of pointers (VECSXP), giving O(1) random access and contiguous pointer storage. Linked lists are good for prepending (O(1), just cons a new head) and for recursive processing where you always work with the first element and pass the rest along. Lisp used them because Lisp’s model is recursive symbolic processing, not numerical arrays. R chose arrays because its model is vectorized computation.
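
The trade-off between the two designs can be seen in a few lines. In this sketch, ll0, lst, and lst0 are illustrative names, and cons and cdr are repeated so the block runs on its own.

```r
cons <- function(head, tail) list(head = head, tail = tail)
cdr  <- function(lst) lst$tail

ll  <- cons(1, cons(2, cons(3, NULL)))
ll0 <- cons(0, ll)          # prepend: one new node, the old list becomes the tail
identical(cdr(ll0), ll)
#> [1] TRUE

# Prepending to a built-in list builds a new list of length n + 1
lst  <- list(1, 2, 3)
lst0 <- c(list(0), lst)
length(lst0)
#> [1] 4
lst0[[4]]                   # direct access by position, no walking
#> [1] 3
```

Reaching the last value of ll0 would take three cdr steps, while lst0[[4]] goes straight to it.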

Exercises

  1. Using cons, car, and cdr defined above, build the linked list (10, 20, 30) and extract the second element without converting to a vector.
  2. Write a function ll_length that counts the number of nodes in a linked list by recursing through cdr until NULL.
  3. Write a function ll_map that takes a linked list and a function, and returns a new linked list with the function applied to each element. Test it by doubling every element of cons(1, cons(2, cons(3, NULL))).

10.7 Looking ahead

Lists hold anything, including other lists. Data frames are lists where every element is a vector of the same length. That single constraint turns a general-purpose container into a table. Chapter 11 picks up exactly where this chapter leaves off.