32 Building packages

You email analysis.R to a colleague. They run it and get an error because they don’t have the janitor package. They install it, try again, and hit a different wall: your script calls source("helpers.R"), a file they don’t have. You send it. They put it in the wrong directory. Two days later, they give up. The problem isn’t your code; it’s that scripts have no way to declare what they need, no way to carry their own context, no mechanism for self-description. Packages solve all three problems at once, and once you see the structure, you can build one in ten minutes.

32.1 Why packages

A package is the unit of shareable, testable, documented R code. Scripts depend on source() calls, hardcoded paths, and the hope that the right libraries happen to be installed; packages carry their dependencies, their documentation, and their tests as a single installable unit that either works or tells you exactly why it doesn’t.

Compare the email scenario with what happens when you share a package. Your colleague runs pak::pak("username/mypackage"), and everything installs: functions, documentation, dependencies. They type ?my_function and see how to use it. No back-and-forth. No guessing.

Even if you never publish to CRAN, a package gives you:

Forced documentation: you have to describe what your function does.
Automated testing: you can prove it works.
Dependency management: you declare what you need.
Namespace control: your names do not collide with other packages’ names.

In Chapter 7, you saw that functions are values. A package is a named collection of functions with metadata, and that metadata is what separates “a folder of scripts” from “something someone else can install and use.” The metadata has a specific shape, which raises an immediate question: what does it look like?

Opinion

If you have written the same function in three scripts, it belongs in a package. Packages are not just for CRAN.

The complete reference for everything in this chapter is R Packages (2nd edition) by Hadley Wickham and Jenny Bryan, freely available at r-pkgs.org. This chapter gives you enough to build your first package. That book gives you enough to build your twentieth.

32.2 The anatomy of a package

The minimum viable package has three components: DESCRIPTION, NAMESPACE, and an R/ directory. A package is a namespace of functions: the R/ directory holds function definitions, NAMESPACE controls which ones are visible, and DESCRIPTION records metadata. A standard layout looks like this:

mypackage/
├── DESCRIPTION       # metadata: name, version, authors, dependencies
├── NAMESPACE         # what you export, what you import (auto-generated)
├── R/                # your functions
├── man/              # documentation (auto-generated by roxygen2)
├── tests/            # test files
├── vignettes/        # long-form documentation
├── data/             # included datasets
├── .Rbuildignore     # files to exclude from the built package
└── LICENSE           # license file

DESCRIPTION is the most important file. It declares the package name, title, version, authors, license, and dependencies, and every field matters because CRAN checks it, install.packages() reads it, and other packages’ Imports reference it. A minimal example:

Package: mypackage
Title: What My Package Does (One Line, Title Case)
Version: 0.1.0
Authors@R: person("Jane", "Doe", email = "jane@example.com",
                   role = c("aut", "cre"))
Description: A longer description of what the package does. This can
    span multiple lines. It should explain the purpose, not list
    functions.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
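Because DESCRIPTION is a plain key-value file, base R can read it with read.dcf(). The sketch below writes a trimmed copy of the example above to a temporary file and pulls out a couple of fields. The temporary path is purely illustrative; in a real package you would point at the package's own DESCRIPTION.

```r
# Write a trimmed DESCRIPTION to a temporary file (illustration only)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Title: What My Package Does (One Line, Title Case)",
  "Version: 0.1.0",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8"
), desc_path)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_path)
desc[1, "Package"]
desc[1, "Version"]

# The version string parses into an object that compares numerically
package_version(desc[1, "Version"]) >= "0.1.0"
```

This is roughly what tooling does when it inspects a package: the file is data, not code, which is why a malformed field tends to surface as an install or check failure rather than a runtime error.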

NAMESPACE controls what is visible to users (exports) and what you borrow from other packages (imports). Never edit it by hand; let roxygen2 generate it. Two hooks let you run code when the package loads: .onLoad() fires when the namespace is loaded (even by :: access), while .onAttach() fires only when someone calls library(). Use .onLoad() for setup that must happen before any function runs (registering S3 methods, setting default options, initializing connections) and .onAttach() only for startup messages. Both go in R/zzz.R by convention. For full details, see R Packages (2e), chapter 10.
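As a minimal sketch of the two hooks, here is what an R/zzz.R file might contain. The option name mypackage.verbose and the startup message are hypothetical, chosen only to illustrate the pattern of setting a default without overwriting a value the user has already chosen.

```r
# R/zzz.R (sketch; option name and message are hypothetical)

.onLoad <- function(libname, pkgname) {
  # Default options for the package
  defaults <- list(mypackage.verbose = FALSE)
  # Only set the options the user has not already set
  to_set <- !(names(defaults) %in% names(options()))
  if (any(to_set)) options(defaults[to_set])
  invisible()
}

.onAttach <- function(libname, pkgname) {
  # Startup messages must use packageStartupMessage() so users can suppress them
  packageStartupMessage("Welcome to mypackage")
}
```

Users can silence the attach message with suppressPackageStartupMessages(library(mypackage)), which is one reason to keep messages out of .onLoad().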

The namespace is an environment (the concept from Section 7.4), and NAMESPACE defines what names are visible in that environment. When you call dplyr::filter(), R looks in dplyr’s namespace environment; when you call just filter(), R searches the environment chain, and which filter it finds depends on what is attached. In lambda calculus terms, a package namespace is a closure at the module level: functions inside can reference each other because they share the namespace environment, but the outside world only sees what is explicitly exported. The NAMESPACE file marks the same boundary that a lambda abstraction marks between bound and free variables.
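You can inspect this boundary directly with base R. The sketch below uses the stats package only because it ships with every R installation; the same calls should work for any installed package.

```r
# A namespace is an ordinary environment
ns <- asNamespace("stats")
is.environment(ns)

# A package function's enclosing environment is its namespace
environmentName(environment(stats::sd))

# Exported names are a subset of what the namespace contains
exported <- getNamespaceExports("stats")
internal <- setdiff(ls(ns, all.names = TRUE), exported)
"sd" %in% exported     # sd() is part of the public interface
length(internal) > 0   # unexported helpers also live in the namespace
```

The names in internal are the ones reachable only from inside the package (or through the triple-colon operator), which is the export boundary described above.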

32.3 Creating a package

One function call scaffolds everything:

usethis::create_package("path/to/mypackage")

This creates the directory, DESCRIPTION, NAMESPACE, R/, .Rbuildignore, and .gitignore. The package is immediately loadable, which means you can start writing functions right away.

The development loop has four verbs:

devtools::load_all()    # simulate installing the package (Ctrl+Shift+L)
devtools::document()    # regenerate man/ and NAMESPACE from roxygen (Ctrl+Shift+D)
devtools::test()        # run all tests (Ctrl+Shift+T)
devtools::check()       # run R CMD check (Ctrl+Shift+E)

load_all() is the key to fast iteration. It sources all files in R/, makes the package’s functions available (by default including unexported ones, so you can try internal helpers too), and simulates a fresh install without actually installing, so the cycle becomes: edit a function, hit Ctrl+Shift+L, test immediately, repeat. An installed package is immutable: once built, its functions cannot be edited in place. load_all() gives you a mutable development loop on top of that immutable target, so you iterate freely and then freeze the result when you install. Without it, you’d have to rebuild and reinstall the entire package after every change, which is the kind of friction that kills momentum.
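To see the loop end to end, here is a minimal sketch that scaffolds a throwaway package in a temporary directory and loads it from source. It assumes usethis and devtools are installed; the package name scratchpkg and the function double_it() are hypothetical, and open = FALSE simply stops usethis from switching projects.

```r
# Assumes usethis and devtools are installed; names and paths are illustrative
pkg <- file.path(tempdir(), "scratchpkg")
usethis::create_package(pkg, open = FALSE)

# Add a function file by hand here; interactively you would call usethis::use_r()
writeLines(
  "double_it <- function(x) 2 * x",
  file.path(pkg, "R", "double_it.R")
)

# Load the package from source and call the function
devtools::load_all(pkg)
double_it(21)
```

Nothing was installed into your library: the function came straight from the source directory, so editing R/double_it.R and calling load_all() again would pick up the change.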

The usethis::use_*() functions handle setup tasks:

usethis::use_r("bmi")              # create R/bmi.R
usethis::use_test("bmi")           # create tests/testthat/test-bmi.R
usethis::use_package("dplyr")      # add dplyr to Imports
usethis::use_mit_license()         # add MIT license
usethis::use_readme_rmd()          # add README.Rmd
usethis::use_github_action()       # add CI/CD

Each one does one task correctly. No boilerplate to remember. But the functions you write remain invisible to users until you document them, and documentation has its own conventions.

Opinion

Never create package files by hand. usethis knows the right boilerplate. You focus on the code.

Exercises

Create a package called mymath using usethis::create_package(). Add a file R/square.R with a function square <- function(x) x^2. Load it with devtools::load_all() and test that square(5) returns 25.
Run devtools::check() on your package. How many errors, warnings, and notes do you get? Read the output carefully.

32.4 Writing documentation with roxygen2

Pure functions are the easiest to document: their contract is fully specified by inputs and outputs, so @param and @return capture everything a caller needs to know. Functions with side effects need additional prose explaining what state they modify and when.

Documentation lives above your function as special comments (#'):

#' Calculate body mass index
#'
#' Computes BMI from weight and height using the standard formula.
#'
#' @param weight Weight in kilograms.
#' @param height Height in meters.
#' @return A numeric vector of BMI values.
#' @export
#' @examples
#' bmi(70, 1.75)
bmi <- function(weight, height) {
  weight / height^2
}

Key tags:

@param: describe each argument. Type + meaning + constraints.
@return: what the function returns.
@export: makes the function visible to users. Without it, the function is internal (accessible via the triple-colon operator ::: but not via ::).
@examples: runnable code. R CMD check executes these.
@seealso, @family: cross-references to related functions.
@inheritParams: borrow parameter documentation from another function.

Markdown in roxygen is enabled by adding Roxygen: list(markdown = TRUE) to DESCRIPTION, which lets you use **bold**, *italic*, `code`, and [function()] for cross-links.
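The cross-referencing tags are easier to follow in context. The sketch below documents a hypothetical companion function, bmi_category(), that borrows its parameter documentation from bmi() with @inheritParams and links back to it with @seealso. The category cutoffs are commonly cited values used here only for illustration, and the BMI is computed inline so the block runs on its own.

```r
#' Classify body mass index
#'
#' Assigns each BMI value to a weight category.
#'
#' @inheritParams bmi
#' @return A character vector of category labels.
#' @seealso [bmi()] for the underlying calculation.
#' @export
#' @examples
#' bmi_category(70, 1.75)
bmi_category <- function(weight, height) {
  # Computed inline so this sketch is self-contained; a package would call bmi()
  value <- weight / height^2
  # Left-closed intervals: [18.5, 25) is "normal", and so on (illustrative cutoffs)
  labels <- cut(
    value,
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("underweight", "normal", "overweight", "obese"),
    right = FALSE
  )
  as.character(labels)
}

bmi_category(70, 1.75)
```

With @inheritParams, the descriptions of weight and height are written once, in bmi(), so the two help pages cannot drift apart.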

devtools::document() converts roxygen comments to .Rd files in man/ and updates NAMESPACE. You write roxygen; you never touch man/ directly. Good documentation earns users, but even the best-documented function needs other packages to work with.

Opinion

Document the “why”, not just the “what”. “@param x A numeric vector” tells me the type. “@param x Body mass in grams; must be positive” tells me how to use it.

A common pattern in real packages is documenting multiple related functions on the same help page using @rdname:

#' Arithmetic operations
#'
#' @param x A numeric vector.
#' @return A numeric vector.
#' @name arithmetic
NULL

#' @rdname arithmetic
#' @export
square <- function(x) x^2

#' @rdname arithmetic
#' @export
cube <- function(x) x^3

Both ?square and ?cube now point to the same help page, which is useful when functions are closely related and easier to understand side by side. But how do you manage the other packages your functions depend on?

Exercises

Add roxygen documentation to the square() function from the previous exercise. Include @param, @return, @export, and @examples. Run devtools::document() and view the help with ?square.
Create a second function cube() in the same package. Document both using @rdname so they share a help page. Run devtools::document() and verify with ?square.

32.5 Dependencies

Your package will use functions from other packages. How you declare that relationship determines whether your package installs cleanly or collapses into a tangle of missing symbols.

Imports: packages your code needs to work. Listed in DESCRIPTION. Use package::function() in your code:

# Good: explicit
clean_data <- function(df) {
  dplyr::filter(df, !is.na(value))
}

Suggests: packages needed only for tests, vignettes, or examples. Not installed automatically when a user installs your package.

Depends: rarely appropriate. Attaches the package when yours is attached, meaning library(mypackage) also runs library(dependency), which clutters the user’s search path and can mask functions they rely on. Almost always wrong; use Imports instead.

The rule is simple: use Imports and call functions with ::. This makes dependencies explicit and avoids namespace collisions. If typing dplyr::filter() everywhere feels verbose, you can import specific functions into your namespace with the roxygen tag @importFrom dplyr filter, which writes importFrom(dplyr, filter) into NAMESPACE and lets you call filter() without the prefix. Use this sparingly: one or two heavily used functions per dependency, not every function you call.

# Bad: which filter? stats::filter or dplyr::filter?
filter(data, x > 0)

# Good: unambiguous
dplyr::filter(data, x > 0)
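For the selective-import route, a sketch of the roxygen tag in place. It uses stats::median rather than a dplyr function only so the block runs without extra installs; center_median() is a hypothetical function, and the tag works the same way for any package listed in Imports.

```r
#' Center a numeric vector on its median
#'
#' @param x A numeric vector.
#' @return `x` with its median subtracted.
#' @importFrom stats median
#' @export
center_median <- function(x) {
  # median() can be called unprefixed because of the @importFrom tag above
  x - median(x, na.rm = TRUE)
}

center_median(c(1, 2, 6))
```

Outside a package the roxygen lines are ordinary comments; inside one, devtools::document() turns the tag into an importFrom directive in NAMESPACE.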

usethis::use_package("dplyr") adds the package to Imports in DESCRIPTION. The variant usethis::use_package("ggplot2", type = "Suggests") adds it to Suggests instead.

When you use a Suggested package in tests or vignettes, guard the code:

test_that("plotting works", {
  skip_if_not_installed("ggplot2")
  p <- ggplot2::ggplot(data, ggplot2::aes(x, y)) + ggplot2::geom_point()
  expect_s3_class(p, "ggplot")
})

skip_if_not_installed() prevents the test from failing on machines where the Suggested package is not available.
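The same guard idea can be applied inside package functions that rely on a Suggested package, using base R's requireNamespace(). The helper below is a hypothetical sketch, not part of testthat or usethis.

```r
# Hypothetical helper: stop with a clear message if an optional package is missing
require_suggested <- function(pkg, purpose) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      sprintf("Package '%s' is needed %s. Please install it.", pkg, purpose),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# A function would call the guard before touching the optional dependency
require_suggested("stats", "for this example")
```

A plotting function could call require_suggested("ggplot2", "for plotting") on its first line, so users without ggplot2 get an actionable message instead of a missing-namespace error.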

Minimize dependencies. Every dependency is a potential breakage point, and if you only need one function from a package, consider whether you can write it yourself. A useful diagnostic: usethis::use_package() will refuse to add a package that is not installed, catching typos early, and devtools::check() will raise a NOTE if you declare a package in Imports but never use it. Dependencies declare what your package needs; tests prove that what you built actually works.

32.6 Testing with testthat

usethis::use_testthat(edition = 3) sets up the test infrastructure: tests/testthat/, tests/testthat.R, and the necessary DESCRIPTION fields.

usethis::use_test("bmi") creates tests/testthat/test-bmi.R. Tests are organized in test_that() blocks:

test_that("bmi computes correctly", {
  expect_equal(bmi(70, 1.75), 70 / 1.75^2)
  expect_length(bmi(c(70, 80), c(1.75, 1.80)), 2)
})

test_that("bmi rejects invalid input", {
  expect_error(bmi("a", 1.75))
})

Key expectations:

expect_equal(): compare with tolerance (good for floating-point results).
expect_identical(): compare exactly (good for integers, strings, logicals).
expect_true(), expect_false(): boolean checks.
expect_error(), expect_warning(), expect_message(): check for conditions.
expect_length(), expect_named(): structural checks.
expect_snapshot(): capture output and compare to a stored reference. Good for complex output that is hard to specify by hand.
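The difference between tolerant and exact comparison shows up with floating-point arithmetic. A small sketch, assuming testthat is installed:

```r
library(testthat)

test_that("tolerance matters for floating-point results", {
  # 0.1 + 0.2 is not stored as exactly 0.3, but it is within tolerance
  expect_equal(0.1 + 0.2, 0.3)
  expect_false(identical(0.1 + 0.2, 0.3))

  # Integer arithmetic is exact, so a strict comparison is appropriate
  expect_identical(1L + 1L, 2L)
})
```

Reaching for expect_identical() on computed doubles tends to produce tests that fail for reasons unrelated to the function's contract.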

devtools::test() runs all tests. devtools::test_active_file() runs just the file you are editing.

Pure functions are also the easiest to test: same inputs always produce the same output, so each expect_equal() call is a complete specification of behavior with no setup or teardown required.

How many tests should you write? Enough to cover the normal case, the edge cases, and the error cases. For a function like bmi(), that means correct output for typical input, correct output for vectorized input, correct behavior for zero or negative values, and an error for non-numeric input. Four or five tests per function is a reasonable starting point, and writing them forces you to think about your function’s contract more carefully than any amount of staring at the implementation would.
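As one way to cover those cases, here is a sketch that assumes a stricter, hypothetical version of bmi() with explicit input validation. The chapter's bmi() does not check its inputs, so the validation rules and error messages below are assumptions made for the example.

```r
library(testthat)

# Hypothetical stricter bmi(): validates inputs before computing
bmi <- function(weight, height) {
  if (!is.numeric(weight) || !is.numeric(height)) {
    stop("`weight` and `height` must be numeric.", call. = FALSE)
  }
  if (any(height <= 0, na.rm = TRUE)) {
    stop("`height` must be positive.", call. = FALSE)
  }
  weight / height^2
}

test_that("bmi handles typical and vectorized input", {
  expect_equal(bmi(70, 1.75), 70 / 1.75^2)
  expect_equal(bmi(c(70, 80), c(1.75, 1.80)), c(70, 80) / c(1.75, 1.80)^2)
})

test_that("bmi rejects non-numeric input and non-positive height", {
  expect_error(bmi("a", 1.75), "must be numeric")
  expect_error(bmi(70, 0), "must be positive")
})
```

Writing the zero-height test is what forces the decision about whether the function should return Inf or raise an error, which is the sense in which tests clarify the contract.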

Opinion

Test the contract, not the implementation. Your test should pass even if you rewrite the function body, as long as the inputs and outputs stay the same.

Exercises

Write tests for the square() function: test that square(3) is 9, square(-2) is 4, square(0) is 0, and square(c(1, 2, 3)) returns c(1, 4, 9).
Add a test that checks square() returns a numeric vector. (Hint: expect_type().)

32.7 Vignettes

Function documentation is reference: you look up a specific function when you already know what you’re looking for. Vignettes are the opposite. They show a newcomer how the pieces fit together, walking through a workflow from start to finish, and the difference between the two is the difference between a dictionary and a story.

usethis::use_vignette("getting-started")

This creates a template in vignettes/. R Markdown vignettes (.Rmd) are the standard: code chunks run during package build, and the output is HTML. Quarto vignettes (.qmd) are the newer option, with more features but requiring Quarto as a system dependency.

A good package has at least one vignette: “Getting Started” or “Introduction to mypackage.” Write it as if the reader has never seen your package before and has five minutes to decide whether to keep reading.

A vignette template looks like this:

---
title: "Getting Started with mypackage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with mypackage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Installation

Install from GitHub:

```r
pak::pak("username/mypackage")
```

## Basic usage

Load the package and run a simple example:

```{r}
library(mypackage)
result <- bmi(70, 1.75)
result
```

The YAML header is boilerplate. The content is yours. Documentation and vignettes make a package usable, but what makes it reliable is the automated quality gate that catches mistakes you’d never notice yourself.

32.8 R CMD check

R CMD check runs roughly 50 individual checks against your package: does the NAMESPACE match the exports? Do all examples run without error? Are all arguments documented? Do the tests pass? Is the package installable?

devtools::check()

Three levels of feedback:

ERROR: something is broken. Must fix.
WARNING: something is wrong. Must fix for CRAN; should fix regardless.
NOTE: something is unusual. Fix if you can; explain if you can’t.

The goal: 0 errors, 0 warnings, 0 notes. Each of those roughly 50 checks exists because someone, somewhere, shipped a package with that exact mistake and broke something downstream; the strictness reflects the cost of breakage on CRAN, where 20,000+ packages depend on each other.

Common issues and how to fix them:

“Undocumented arguments”: add @param tags in your roxygen comments.
“Undefined global functions or variables”: you are calling a function from another package without ::. Use dplyr::filter() instead of filter(), or add @importFrom dplyr filter to your roxygen.
“Non-standard file/directory found”: add the file to .Rbuildignore with usethis::use_build_ignore("filename").
“No visible binding for global variable”: common with tidyverse code that uses unquoted column names. Fix with utils::globalVariables() or use .data$column from rlang.
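For the last item, a sketch of the .data pronoun in a package function. It assumes dplyr and rlang are installed and listed in Imports; keep_positive() and the column name value are hypothetical.

```r
#' Keep rows with a positive value
#'
#' @param df A data frame with a numeric column named `value`.
#' @return `df` restricted to rows where `value > 0`.
#' @importFrom rlang .data
#' @export
keep_positive <- function(df) {
  # .data$value marks `value` as a column, so R CMD check sees no undefined global
  dplyr::filter(df, .data$value > 0)
}

keep_positive(data.frame(value = c(-1, 2, 3)))
```

Compared with utils::globalVariables(), this keeps the fix next to the code that needs it instead of in a package-wide list of names.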

Run check() often during development, not just at the end. Each run should find at most one or two issues; if you wait until the end, you face dozens of notes at once, which is demoralizing and nearly impossible to debug systematically. A clean check means your package is well-formed, but a check that passes on your machine and fails on Linux is still a problem, which is why the next question matters: how do you get your package into other people’s hands, and how do you make sure it works on their machines too?

Exercises

Introduce a deliberate error in your package: remove the @param tag for one argument. Run devtools::check() and find the resulting WARNING about undocumented arguments. Fix it and re-check.
Add dplyr::filter to a function without adding dplyr to Imports. Run devtools::check(). What feedback do you get?

32.9 Sharing your package

GitHub: push your package to GitHub. Users install it with one line — either pak::pak("user/mypackage") or devtools::install_github("user/mypackage"). Lowest barrier to sharing.

pkgdown: usethis::use_pkgdown_github_pages() builds a website from your documentation, README, and vignettes, with automatic deployment through GitHub Actions. Professional documentation for free.

CRAN: the official repository. Submit with devtools::submit_cran(). Strict review: 0 errors, 0 warnings, ideally 0 notes, plus human reviewers who check CRAN policies. Worth the effort for packages with a broad audience.

R-universe (r-universe.dev): automated package hosting with a lower barrier than CRAN but still discoverable. Point it at your GitHub repo, and it builds binaries for all platforms automatically.

Version numbers follow semantic versioning (semver.org): MAJOR.MINOR.PATCH. PATCH for bug fixes, MINOR for new backward-compatible features, MAJOR for breaking changes. The tidyverse broadly follows this convention: when dplyr goes from 1.0 to 1.1, your code should keep working (check the release notes for deprecations); when it goes to 2.0, read the changelog before upgrading.
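Base R's package_version() compares versions component by component, which makes the MAJOR.MINOR.PATCH scheme easy to work with in code. The helper bump_type() below is a hypothetical sketch that assumes both versions have exactly three components.

```r
# Hypothetical helper: which component changed first between two versions?
bump_type <- function(from, to) {
  a <- unlist(package_version(from))
  b <- unlist(package_version(to))
  labels <- c("major", "minor", "patch")
  changed <- which(a[1:3] != b[1:3])
  if (length(changed) == 0) "none" else labels[changed[1]]
}

bump_type("1.0.0", "1.0.1")  # "patch"
bump_type("1.0.1", "1.1.0")  # "minor"
bump_type("1.1.0", "2.0.0")  # "major"

# Components compare as numbers, so 0.10.0 is newer than 0.9.0
package_version("0.10.0") > package_version("0.9.0")  # TRUE
```

Comparing version strings as plain text would get the last case wrong, which is why version checks should go through package_version() or packageVersion().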

The progression is natural. You build a package for yourself, put it on GitHub so colleagues can use it, add pkgdown so new users can find their way around, and eventually submit to CRAN if the package solves a general enough problem. Each step adds effort but also reach.

For GitHub-hosted packages, continuous integration is nearly free:

usethis::use_github_action("check-standard")

This adds a GitHub Actions workflow that runs R CMD check on Linux, macOS, and Windows every time you push. If the check fails, GitHub shows a red X next to the commit; if it passes, you get a green checkmark and a badge for your README. The real value is catching platform-specific bugs you would never find on your own machine: path separators (Windows uses \, everything else uses /), case-sensitive file systems (Linux is case-sensitive; macOS and Windows are not by default), and system library availability.
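Path separators are the easiest of these to guard against in your own code. A minimal sketch with base R's file.path(); the directory and file names are hypothetical.

```r
# Fragile: a hard-coded Windows-style separator
fragile <- "data\\raw\\input.csv"

# Portable: let R assemble the path from its components
portable <- file.path("data", "raw", "input.csv")
portable            # "data/raw/input.csv"
basename(portable)  # "input.csv"
```

file.path() joins with a forward slash, which R accepts on Windows as well, so the portable form behaves the same on all three CI platforms.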

A package with CI, documentation, and a one-line install command turns a two-day email chain into a single pak::pak() call.

A closure captures variables in its enclosing environment and exposes only the returned function (Chapter 18). A package does the same thing at a larger scale: internal functions, data, and dependencies live in the namespace; only the functions exported in NAMESPACE are visible outside. devtools, usethis, and roxygen2 automate the mechanics.
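The closure half of that analogy, as a small base R sketch. make_counter() is a hypothetical example: count plays the role of an unexported object, and the returned function plays the role of an export.

```r
make_counter <- function() {
  count <- 0              # private state, like an unexported object
  function() {            # the only thing handed back, like an exported function
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()
counter()

# The state lives in the closure's environment, not in the caller's workspace
environment(counter)$count  # 2
```

Reaching into environment(counter) is the closure-level equivalent of the triple-colon operator: possible, but a sign that you are going around the intended interface.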

Exercises

Push a package to GitHub and install it on another machine (or in a fresh R session) using pak::pak(). Does it install cleanly?

Opinion

Start with GitHub. Add pkgdown when your package has users. Submit to CRAN when it is stable and general-purpose. CRAN gives your package visibility, credibility, and a guarantee that it installs cleanly across platforms. If your package solves a real problem, aim for CRAN.