32  Building packages

An R package is just a folder with a specific structure. Once you know the structure, you can make one in ten minutes. The hard part is not the packaging but the discipline of writing functions worth packaging.

32.1 Why packages

A package is the unit of shareable, testable, documented R code. Scripts share poorly: they depend on source() calls, hardcoded paths, and the hope that someone has the right packages installed. Packages share well.

Consider what happens when you share a script. You email analysis.R to a colleague. They run it and get an error because they do not have the janitor package installed. They install it, try again, and get a different error because your script calls source("helpers.R") and they do not have that file. They ask you for it. You send it. They put it in the wrong directory. Another error. Two days later, they give up.

Now consider sharing a package. They run pak::pak("username/mypackage"). Everything installs: your functions, your documentation, your dependencies. They type ?my_function and see how to use it. It just works.

Even if you never publish to CRAN, a package gives you:

  • Forced documentation: you have to describe what your function does.
  • Automated testing: you can prove it works.
  • Dependency management: you declare what you need.
  • Namespace control: your names do not collide with other packages’ names.

In Chapter 7, you saw that functions are values. A package is a named collection of functions with metadata. The metadata is what makes the difference between “a folder of scripts” and “something someone else can install and use.”

Tip: Opinion

If you have written the same function in three scripts, it belongs in a package. Packages are not just for CRAN.

The complete reference for everything in this chapter is R Packages (2nd edition) by Hadley Wickham and Jenny Bryan, freely available at r-pkgs.org. This chapter gives you enough to build your first package. That book gives you enough to build your twentieth.

32.2 The anatomy of a package

The minimum viable package has three components: DESCRIPTION, NAMESPACE, and an R/ directory. A standard layout looks like this:

mypackage/
├── DESCRIPTION       # metadata: name, version, authors, dependencies
├── NAMESPACE         # what you export, what you import (auto-generated)
├── R/                # your functions
├── man/              # documentation (auto-generated by roxygen2)
├── tests/            # test files
├── vignettes/        # long-form documentation
├── data/             # included datasets
├── .Rbuildignore     # files to exclude from the built package
└── LICENSE           # license file

DESCRIPTION is the most important file. It declares the package name, title, version, authors, license, and dependencies. Every field matters: CRAN checks it, install.packages() reads it, and other packages’ Imports reference it. A minimal example:

Package: mypackage
Title: What My Package Does (One Line, Title Case)
Version: 0.1.0
Authors@R: person("Jane", "Doe", email = "jane@example.com",
                   role = c("aut", "cre"))
Description: A longer description of what the package does. This can
    span multiple lines. It should explain the purpose, not list
    functions.
License: MIT + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
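Because DESCRIPTION is plain key-value text (Debian control file format), base R can read it with read.dcf(). The following is a minimal sketch, assuming a throwaway file in tempdir() and a made-up Imports list; it only shows that the fields are ordinary data that tools can parse.

```r
# Write a small, illustrative DESCRIPTION to a temporary file.
# The field values are assumptions made up for this example.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mypackage",
  "Title: What My Package Does",
  "Version: 0.1.0",
  "Imports:",
  "    dplyr,",
  "    rlang"
), desc_path)

# read.dcf() returns a character matrix: one row per record, one column per field.
fields <- read.dcf(desc_path)

pkg_name <- unname(fields[1, "Package"])
pkg_version <- package_version(fields[1, "Version"])

# Split the comma-separated Imports field into individual package names.
imports <- trimws(strsplit(fields[1, "Imports"], ",")[[1]])

pkg_name
pkg_version
imports
```

Parsing the version with package_version() rather than keeping it as a string means it can be compared correctly later.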

NAMESPACE controls what is visible to users (exports) and what you borrow from other packages (imports). Never edit it by hand; let roxygen2 generate it. Two hooks let you run code when the package loads: .onLoad() runs when the namespace is loaded (even by :: access), and .onAttach() runs when the package is attached with library(). Use .onLoad() for setup that must happen before any function is called (registering S3 methods, setting default options, initializing connections). Use .onAttach() only for startup messages. Both go in R/zzz.R by convention. For full details, see R Packages (2e), chapter 10.
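To make the two hooks concrete, here is a minimal sketch of what R/zzz.R might contain. The option name mypackage.verbose and the startup message are hypothetical; the pattern of setting only options the user has not already set is a common convention, not a requirement.

```r
# R/zzz.R (sketch; the option name and the message text are hypothetical)

.onLoad <- function(libname, pkgname) {
  # Set package defaults only for options the user has not already set.
  defaults <- list(mypackage.verbose = FALSE)
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible(NULL)
}

.onAttach <- function(libname, pkgname) {
  # packageStartupMessage() lets users silence the message with
  # suppressPackageStartupMessages().
  packageStartupMessage("mypackage attached; see ?bmi to get started.")
}
```

R calls both functions itself with the library path and the package name, so you never call them directly in package code.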

The namespace is an environment (the concept from Section 7.4). NAMESPACE defines what names are visible in that environment. When you call dplyr::filter(), R looks in dplyr’s namespace environment. When you call just filter(), R searches the environment chain, and which filter it finds depends on what is attached. In lambda calculus terms, a package namespace is a closure at the module level: functions inside can reference each other (they share the namespace environment), but the outside world only sees what is explicitly exported. The NAMESPACE file draws the same boundary that a lambda abstraction draws between bound and free variables. ML’s module system (1980s), Haskell’s module system, and OCaml’s functors all solve the same problem: controlling what names are visible and preventing conflicts. R’s approach is simpler but the underlying idea is the same.
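You can inspect this boundary from the console. The sketch below uses the stats package, which ships with R, purely as a convenient example; the same calls work for any loaded package.

```r
# A namespace is an ordinary environment.
ns <- asNamespace("stats")
is.environment(ns)

# Exported names are the public interface; everything else is internal.
exported <- getNamespaceExports("stats")
all_names <- ls(ns, all.names = TRUE)
internal <- setdiff(all_names, exported)

"sd" %in% exported      # sd() is part of the public interface
length(internal) > 0    # the namespace also holds unexported helpers

# A package function's enclosing environment is its namespace, which is
# why it can see the internal helpers that users cannot reach with `::`.
environmentName(environment(stats::sd))
```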

32.3 Creating a package

One function call scaffolds everything:

usethis::create_package("path/to/mypackage")

This creates the directory, DESCRIPTION, NAMESPACE, R/, .Rbuildignore, and .gitignore. The package is immediately loadable.

The development loop has four verbs:

devtools::load_all()    # simulate installing the package (Ctrl+Shift+L)
devtools::document()    # regenerate man/ and NAMESPACE from roxygen (Ctrl+Shift+D)
devtools::test()        # run all tests (Ctrl+Shift+T)
devtools::check()       # run R CMD check (Ctrl+Shift+E)

load_all() is the key to fast iteration. It sources all files in R/, makes your functions available (by default including unexported ones, so you can try internal helpers directly), and simulates a fresh install without actually installing. You edit a function, hit Ctrl+Shift+L, and test immediately. This tight feedback loop (edit, load, test) is what makes package development fast. Without it, you would have to rebuild and reinstall the package after every change.

The usethis::use_*() functions handle setup tasks:

usethis::use_r("bmi")              # create R/bmi.R
usethis::use_test("bmi")           # create tests/testthat/test-bmi.R
usethis::use_package("dplyr")      # add dplyr to Imports
usethis::use_mit_license()         # add MIT license
usethis::use_readme_rmd()          # add README.Rmd
usethis::use_github_action()       # add CI/CD

Each one does one task correctly. No boilerplate to remember.

Tip: Opinion

Never create package files by hand. usethis knows the right boilerplate. You focus on the code.

Exercises

  1. Create a package called mymath using usethis::create_package(). Add a file R/square.R with a function square <- function(x) x^2. Load it with devtools::load_all() and test that square(5) returns 25.
  2. Run devtools::check() on your package. How many errors, warnings, and notes do you get? Read the output carefully.

32.4 Writing documentation with roxygen2

Documentation lives above your function as special comments (#'):

#' Calculate body mass index
#'
#' Computes BMI from weight and height using the standard formula.
#'
#' @param weight Weight in kilograms.
#' @param height Height in meters.
#' @return A numeric vector of BMI values.
#' @export
#' @examples
#' bmi(70, 1.75)
bmi <- function(weight, height) {
  weight / height^2
}

Key tags:

  • @param: describe each argument. Type + meaning + constraints.
  • @return: what the function returns.
  • @export: makes the function visible to users. Without it, the function is internal (accessible via ::: but not ::).
  • @examples: runnable code. R CMD check executes these.
  • @seealso, @family: cross-references to related functions.
  • @inheritParams: borrow parameter documentation from another function.
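As an illustration of @inheritParams and @seealso, here is a sketch of a second function that reuses the argument documentation of bmi(). The function bmi_category() and its cut points are hypothetical, chosen only to show the tags in use.

```r
#' Calculate body mass index
#'
#' @param weight Weight in kilograms.
#' @param height Height in meters.
#' @return A numeric vector of BMI values.
#' @export
bmi <- function(weight, height) {
  weight / height^2
}

#' Classify body mass index
#'
#' Bins BMI values into categories. The cut points are illustrative.
#'
#' @inheritParams bmi
#' @return A character vector with one category per input element.
#' @seealso [bmi()] for the underlying calculation.
#' @export
#' @examples
#' bmi_category(70, 1.75)
bmi_category <- function(weight, height) {
  groups <- cut(
    bmi(weight, height),
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("underweight", "normal", "overweight", "obese"),
    right = FALSE  # intervals are closed on the left: [18.5, 25)
  )
  as.character(groups)
}
```

Because of @inheritParams, the help page for bmi_category() shows the weight and height descriptions without repeating them in the source.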

Markdown in roxygen is enabled by adding Roxygen: list(markdown = TRUE) to DESCRIPTION. Then you can use **bold**, *italic*, `code`, and [function()] for cross-links.

devtools::document() converts roxygen comments to .Rd files in man/ and updates NAMESPACE. You write roxygen; you never touch man/ directly.

Tip: Opinion

Document the “why”, not just the “what”. @param x A numeric vector tells me the type. @param x Body mass in grams; must be positive tells me how to use it.

A common pattern in real packages is documenting multiple related functions on the same help page using @rdname:

#' Arithmetic operations
#'
#' @param x A numeric vector.
#' @return A numeric vector.
#' @name arithmetic
NULL

#' @rdname arithmetic
#' @export
square <- function(x) x^2

#' @rdname arithmetic
#' @export
cube <- function(x) x^3

Both ?square and ?cube now point to the same help page. This is useful when functions are closely related and easier to understand together.

Exercises

  1. Add roxygen documentation to the square() function from the previous exercise. Include @param, @return, @export, and @examples. Run devtools::document() and view the help with ?square.
  2. Create a second function cube() in the same package. Document both using @rdname so they share a help page. Run devtools::document() and verify with ?square.

32.5 Dependencies

Your package will use functions from other packages. How you declare that matters.

Imports: packages your code needs to work. Listed in DESCRIPTION. Use package::function() in your code:

# Good: explicit
clean_data <- function(df) {
  dplyr::filter(df, !is.na(value))
}

Suggests: packages needed only for tests, vignettes, or examples. Not installed automatically when a user installs your package.

Depends: rarely appropriate. Attaches the package when yours is attached. This means library(mypackage) also runs library(dependency), which pollutes the user’s search path. Almost always wrong; use Imports instead.

The rule: use Imports and call functions with ::. This makes dependencies explicit and avoids namespace collisions. If typing dplyr::filter() everywhere feels verbose, you can import specific functions into your namespace with the roxygen tag @importFrom dplyr filter. This writes importFrom(dplyr, filter) into NAMESPACE, letting you call filter() without the prefix. Use this sparingly: one or two heavily-used functions per dependency, not every function you call.

# Bad: which filter? stats::filter or dplyr::filter?
filter(data, x > 0)

# Good: unambiguous
dplyr::filter(data, x > 0)
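The ambiguity is not hypothetical: stats::filter() ships with R and does something entirely different from dplyr::filter(). A small sketch using only base R, with made-up input values:

```r
# stats::filter() applies a linear filter to a series; it does not subset rows.
x <- c(1, 2, 3, 4, 5)

# A centered three-point moving average. The end points have no complete
# window, so they come back as NA.
smoothed <- stats::filter(x, rep(1 / 3, 3))

as.numeric(smoothed)
class(smoothed)  # a time-series object, not a data frame
```

If package code calls a bare filter() and dplyr is not imported into the namespace, this is the function R will find.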

usethis::use_package("dplyr") adds to Imports in DESCRIPTION. usethis::use_package("ggplot2", type = "Suggests") adds to Suggests.

When you use a Suggested package in tests or vignettes, guard the code:

test_that("plotting works", {
  skip_if_not_installed("ggplot2")
  p <- ggplot2::ggplot(data, ggplot2::aes(x, y)) + ggplot2::geom_point()
  expect_s3_class(p, "ggplot")
})

skip_if_not_installed() prevents the test from failing on machines where the Suggested package is not available.
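Package functions that rely on a Suggested package need a similar guard, since the package may be missing at run time. A minimal sketch, assuming a hypothetical plot_bmi() helper that needs ggplot2 only for plotting:

```r
# Hypothetical helper: ggplot2 is listed in Suggests, so check for it first.
plot_bmi <- function(weight, height) {
  if (!requireNamespace("ggplot2", quietly = TRUE)) {
    stop(
      "Package \"ggplot2\" is needed for plot_bmi(). Please install it.",
      call. = FALSE
    )
  }
  df <- data.frame(height = height, bmi = weight / height^2)
  ggplot2::ggplot(df, ggplot2::aes(x = height, y = bmi)) +
    ggplot2::geom_point()
}
```

requireNamespace() loads the package without attaching it and returns FALSE instead of failing, which makes it suitable for this kind of check.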

Minimize dependencies. Every dependency is a potential breakage point. If you only need one function from a package, consider whether you can write it yourself.

A useful diagnostic: usethis::use_package() will refuse to add a package that is not installed, which catches typos early. And devtools::check() will flag (as a NOTE) a package that is listed in Imports but never used.

32.6 Testing with testthat

usethis::use_testthat(edition = 3) sets up the test infrastructure: tests/testthat/, tests/testthat.R, and the necessary DESCRIPTION fields.

usethis::use_test("bmi") creates tests/testthat/test-bmi.R. Tests are organized in test_that() blocks:

test_that("bmi computes correctly", {
  expect_equal(bmi(70, 1.75), 70 / 1.75^2)
  expect_length(bmi(c(70, 80), c(1.75, 1.80)), 2)
})

test_that("bmi rejects invalid input", {
  expect_error(bmi("a", 1.75))
})

Key expectations:

  • expect_equal(): compare with tolerance (good for floating-point results).
  • expect_identical(): compare exactly (good for integers, strings, logicals).
  • expect_true(), expect_false(): boolean checks.
  • expect_error(), expect_warning(), expect_message(): check for conditions.
  • expect_length(), expect_named(): structural checks.
  • expect_snapshot(): capture output and compare to a stored reference. Good for complex output that is hard to specify by hand.
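The difference between tolerant and exact comparison is easiest to see with base R's all.equal() and identical(), which behave analogously to expect_equal() and expect_identical() for these cases. This is an illustration of the idea, not a description of how testthat is implemented.

```r
x <- 0.1 + 0.2

# Exact comparison fails because of floating-point representation error.
identical(x, 0.3)

# Comparison with a small tolerance succeeds.
isTRUE(all.equal(x, 0.3))

# Exact comparison also distinguishes types: integer 2L is not double 2.
identical(2L, 2)
```

This is why expect_equal() is the safer default for computed numeric results, while expect_identical() suits values where type and exact content both matter.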

devtools::test() runs all tests. devtools::test_active_file() runs just the file you are editing.

How many tests should you write? Enough to cover the normal case, the edge cases, and the error cases. For a function like bmi(), that means: correct output for typical input, correct output for vectorized input, correct behavior for zero or negative values, and an error for non-numeric input. Four or five tests per function is a reasonable starting point.
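A sketch of what that coverage could look like. It assumes testthat is installed and, as a labeled assumption, uses a stricter bmi() that validates its input; the bmi() shown earlier in the chapter does no validation, so the zero and negative cases would not raise errors there.

```r
library(testthat)

# Hypothetical stricter version of bmi(): validates input before computing.
bmi <- function(weight, height) {
  if (!is.numeric(weight) || !is.numeric(height)) {
    stop("`weight` and `height` must be numeric.", call. = FALSE)
  }
  if (any(height <= 0, na.rm = TRUE)) {
    stop("`height` must be positive.", call. = FALSE)
  }
  weight / height^2
}

test_that("bmi handles typical and vectorized input", {
  expect_equal(bmi(70, 1.75), 70 / 1.75^2)
  expect_length(bmi(c(70, 80), c(1.75, 1.80)), 2)
})

test_that("bmi rejects zero or negative height", {
  expect_error(bmi(70, 0), "positive")
  expect_error(bmi(70, -1.75), "positive")
})

test_that("bmi rejects non-numeric input", {
  expect_error(bmi("a", 1.75), "numeric")
})
```

Passing a pattern as the second argument to expect_error() checks that the function fails for the intended reason, not just that it fails.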

Tip: Opinion

Test the contract, not the implementation. Your test should pass even if you rewrite the function body, as long as the inputs and outputs stay the same.

Exercises

  1. Write tests for the square() function: test that square(3) is 9, square(-2) is 4, square(0) is 0, and square(c(1, 2, 3)) returns c(1, 4, 9).
  2. Add a test that checks square() returns a numeric vector. (Hint: expect_type().)

32.7 Vignettes

Vignettes are long-form documentation: tutorials, workflows, explanations. They complement function docs (which are reference) by showing how functions work together.

usethis::use_vignette("getting-started")

This creates a template in vignettes/. R Markdown vignettes (.Rmd) are the standard: code chunks run during package build, and the output is HTML. Quarto vignettes (.qmd) are the newer option, with more features but requiring Quarto as a system dependency.

A good package has at least one vignette: “Getting Started” or “Introduction to mypackage.” It shows the reader how to use the package end-to-end, not just individual functions. The difference between a reference manual and a tutorial is the difference between a dictionary and a story: one lists definitions, the other shows them in context.

A vignette template looks like this:

---
title: "Getting Started with mypackage"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with mypackage}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Installation

Install from GitHub:

```r
pak::pak("username/mypackage")
```

## Basic usage

Load the package and run a simple example:

```{r}
library(mypackage)
result <- bmi(70, 1.75)
result
```

The YAML header is boilerplate. The content is yours. Write it as if the reader has never seen your package before and has five minutes to decide whether to use it.

32.8 R CMD check

R CMD check is the automated quality gate. It runs roughly 50 tests: does the NAMESPACE match the exports? Do all examples run without error? Are all arguments documented? Do the tests pass? Is the package installable?

devtools::check()

Three levels of feedback:

  • ERROR: something is broken. Must fix.
  • WARNING: something is wrong. Must fix for CRAN; should fix regardless.
  • NOTE: something is unusual. Fix if you can; explain if you can’t.

The goal: 0 errors, 0 warnings, 0 notes. Each of those roughly 50 tests exists because someone, somewhere, shipped a broken package. The strictness reflects the cost of breakage on CRAN, where 20,000+ packages depend on each other. When the check says NOTE, it is not being pedantic: it is remembering a bug someone else already found. A clean check means your package is well-formed. A check that passes on your machine but fails on another platform is still a problem, which is why CI (continuous integration) matters (see Section 32.9).

Common issues and how to fix them:

  • “Undocumented arguments”: add @param tags in your roxygen comments.
  • “Undefined global functions or variables”: you are calling a function from another package without ::. Use dplyr::filter() instead of filter(), or add @importFrom dplyr filter to your roxygen.
  • “Non-standard file/directory found”: add the file to .Rbuildignore with usethis::use_build_ignore("filename").
  • “No visible binding for global variable”: common with tidyverse code that uses unquoted column names. Fix with utils::globalVariables() or use .data$column from rlang.
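For the last item, a sketch of the .data approach. The function keep_positive() and the column name value are hypothetical; the code assumes dplyr and rlang are listed in Imports.

```r
#' Keep rows with positive values
#'
#' @param df A data frame with a numeric column named `value`.
#' @return `df` restricted to rows where `value > 0`.
#' @importFrom rlang .data
#' @export
keep_positive <- function(df) {
  # `.data$value` tells R CMD check that `value` is a column, not an
  # undefined global variable.
  dplyr::filter(df, .data$value > 0)
}

# Alternative (sketch): declare the name once, typically in R/zzz.R:
# utils::globalVariables("value")
```

With dplyr installed, keep_positive(data.frame(value = c(-1, 2, 3))) keeps the two rows with positive values. The .data form documents intent at the point of use, whereas globalVariables() silences the check for that name everywhere in the package.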

Run check() often during development, not just at the end. Each run should find at most one or two issues. If you wait until the end to run check() for the first time, you may face dozens of notes and warnings at once, which is demoralizing and hard to debug.

Exercises

  1. Introduce a deliberate error in your package: remove the @param tag for one argument. Run devtools::check() and find the resulting WARNING about undocumented arguments. Fix it and re-check.
  2. Call dplyr::filter() in a function without adding dplyr to Imports. Run devtools::check(). What feedback do you get?

32.9 Sharing your package

GitHub: push your package to GitHub. Users install with pak::pak("username/mypackage") or devtools::install_github("username/mypackage"). This is the lowest barrier to sharing.

pkgdown: usethis::use_pkgdown_github_pages() builds a website from your documentation, README, and vignettes. Automatic deployment with GitHub Actions. Professional documentation for free.

CRAN: the official repository. Submit with devtools::submit_cran(). Strict review: 0 errors, 0 warnings, ideally 0 notes. Human reviewers check CRAN policies. Worth it for packages with a broad audience.

R-universe (r-universe.dev): automated package hosting. Lower barrier than CRAN, still discoverable. You point it at your GitHub repo, and it builds binaries for all platforms automatically.

Version numbers follow semantic versioning (semver.org): MAJOR.MINOR.PATCH. PATCH for bug fixes, MINOR for new backward-compatible features, MAJOR for breaking changes. The tidyverse broadly follows this convention: when dplyr goes from 1.0 to 1.1, your code should keep working; when it goes to 2.0, check the changelog.
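Base R can parse and compare version numbers with package_version(). The helper classify_bump() below is a hypothetical illustration of the MAJOR.MINOR.PATCH rule and assumes both versions have exactly three components.

```r
# Versions compare component by component, not as strings.
package_version("1.10.0") > package_version("1.9.0")
"1.10.0" > "1.9.0"   # string comparison gets this wrong

# Hypothetical helper: name the kind of release between two versions.
classify_bump <- function(old, new) {
  old <- unclass(package_version(old))[[1]]  # integer vector, e.g. c(1, 0, 0)
  new <- unclass(package_version(new))[[1]]
  if (new[1] != old[1]) {
    "major"
  } else if (new[2] != old[2]) {
    "minor"
  } else {
    "patch"
  }
}

classify_bump("1.0.0", "1.1.0")
classify_bump("1.1.0", "2.0.0")
classify_bump("1.1.0", "1.1.1")
```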

The progression is natural: you build a package for yourself, put it on GitHub so colleagues can use it, add pkgdown so new users can find their way around, and eventually submit to CRAN if the package solves a general problem. Each step adds effort but also reach.

For GitHub-hosted packages, continuous integration is nearly free. usethis::use_github_action("check-standard") adds a GitHub Actions workflow that runs R CMD check on Linux, macOS, and Windows every time you push. If the check fails, GitHub shows a red X next to the commit. If it passes, you get a green checkmark and a badge for your README:

usethis::use_github_action("check-standard")

This catches platform-specific bugs you would never find on your own machine. Common cross-platform issues include path separators (Windows uses \, everything else uses /), case-sensitive file systems (Linux is case-sensitive; macOS and Windows are not by default), and system library availability.
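The path-separator problem is largely avoidable by never pasting separators by hand. A short sketch using base R; the file and directory names are made up for the example.

```r
# file.path() joins components with "/", which R accepts on every platform.
rel_path <- file.path("inst", "extdata", "measurements.csv")
rel_path

# Use tempdir() instead of a hardcoded location when writing scratch files.
tmp <- file.path(tempdir(), "measurements.csv")
write.csv(data.frame(weight = 70, height = 1.75), tmp, row.names = FALSE)
file.exists(tmp)

# Inside an installed package, locate bundled files with system.file(), e.g.
# system.file("extdata", "measurements.csv", package = "mypackage")
```

Keeping file names lowercase and matching their case exactly in code sidesteps the case-sensitivity difference as well.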

Exercises

  1. Push a package to GitHub and install it on another machine (or in a fresh R session) using pak::pak(). Does it install cleanly?

Tip: Opinion

Start with GitHub. Add pkgdown when your package has users. Submit to CRAN when it is stable and general-purpose. CRAN gives your package visibility, credibility, and a guarantee that it installs cleanly across platforms. If your package solves a real problem, aim for CRAN.