31 Connecting to other languages

Call sum() in R. You are calling C. Call qr() and you are calling LAPACK Fortran routines that predate R itself by decades. The entire numerical core of the language, from matrix multiplication to sorting to random number generation, lives in compiled code that R merely orchestrates from above, translating your high-level expressions into fast, low-level operations that crunch numbers the way hardware prefers. R was designed to be a glue language, a thin layer of expressiveness sitting on top of compiled muscle. So what happens when you need muscle of your own?

This chapter covers the practical interfaces: how to call C, C++, Rust, Python, and Fortran from R, with complete working examples for each. Chapter 29 introduced the SEXP type system and memory model; Section 28.6 in Chapter 28 showed when compiled code is worth the trouble. Here we go deeper into each interface, starting with the one that underlies all the others.

31.1 C via .Call()

.Call() is R’s native foreign function interface, the bedrock on which every other approach (Rcpp, extendr, all of them) ultimately rests. When Rcpp generates a wrapper or extendr produces a binding, what R actually loads and dispatches is a .Call() entry point. Understanding this interface means understanding the foundation that everything else stands on. Section 29.10 in Chapter 29 introduced the three mechanisms for calling compiled code (.Primitive(), .Internal(), .Call()); here we focus on .Call() with complete examples.

A C function callable from R takes SEXP arguments and returns a SEXP, where SEXP is a pointer to R’s internal object representation (see Chapter 29). Any R objects you allocate inside the function must be shielded from the garbage collector using PROTECT, then released with UNPROTECT before returning. Get the bookkeeping wrong and things go sideways fast.

Here is a complete example: a C function that computes the sum of a numeric vector.

```c
// sum_c.c
#include <R.h>
#include <Rinternals.h>

SEXP sum_c(SEXP x) {
  int n = length(x);
  double *px = REAL(x);
  double total = 0.0;
  for (int i = 0; i < n; i++) {
    total += px[i];
  }
  return ScalarReal(total);
}
```
Compile and load it from R:

```r
system("R CMD SHLIB sum_c.c")
dyn.load("sum_c.so")  # sum_c.dll on Windows
.Call("sum_c", as.numeric(1:1000))
```

R CMD SHLIB invokes the system C compiler with the correct flags and include paths, dyn.load() loads the shared library into R’s process, and .Call() dispatches to the function by name.
Notice what happens inside the function. REAL(x) extracts the underlying C double* array from a numeric SEXP; ScalarReal() wraps a C double back into a SEXP. Because this function does not allocate any new R objects, there is nothing to PROTECT. If it did allocate (say, a result vector via allocVector(REALSXP, n)), you would need to PROTECT that allocation and UNPROTECT(1) before returning.
That PROTECT/UNPROTECT protocol is the source of most bugs in hand-written C extensions. Forget a PROTECT and the garbage collector frees your object mid-computation, producing a segfault that may not surface until weeks later in someone else’s R session. Add a PROTECT without a matching UNPROTECT and you overflow the protection stack. The count passed to UNPROTECT must match the number of PROTECT calls exactly, which is precisely the kind of tedious, error-prone bookkeeping that higher-level interfaces were invented to eliminate. But the accessor macros underneath those interfaces are still worth knowing.
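To make the bookkeeping concrete, here is a sketch of a function that does allocate its result. The name scale_vec and the file name are hypothetical (not from base R); it multiplies a numeric vector by a scalar and needs exactly one PROTECT balanced by one UNPROTECT. Like the document's other C examples, it compiles via R CMD SHLIB, not standalone.

```c
// scale_vec.c — hypothetical example: returns x * factor as a new vector
#include <R.h>
#include <Rinternals.h>

SEXP scale_vec(SEXP x, SEXP factor) {
  int n = length(x);
  double f = asReal(factor);
  // allocVector() creates a fresh R object: PROTECT it until we return,
  // or the garbage collector may reclaim it mid-loop.
  SEXP out = PROTECT(allocVector(REALSXP, n));
  double *px = REAL(x);
  double *pout = REAL(out);
  for (int i = 0; i < n; i++) {
    pout[i] = px[i] * f;
  }
  UNPROTECT(1);  /* count must match the number of PROTECT calls */
  return out;
}
```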
Other accessors follow the same pattern: INTEGER(x) for integer vectors, LOGICAL(x) for logical vectors, STRING_ELT(x, i) for string vectors (which are arrays of CHARSXP pointers, not C strings), VECTOR_ELT(x, i) for list elements.
Writing raw C against R’s API is rarely the right choice for new code. The reason to learn it is to read existing code: base R, data.table, and hundreds of CRAN packages use .Call() directly, and knowing the interface makes their source legible in a way that no amount of documentation can substitute for.
Exercises
- Write a C function that takes an integer vector and returns its maximum value. Compile it with R CMD SHLIB, load it, and test it from R. Use INTEGER() instead of REAL().
- Modify the sum_c function to return a length-1 numeric vector allocated with allocVector(REALSXP, 1) instead of using ScalarReal(). You will need PROTECT and UNPROTECT. Verify it produces the same result.
31.2 C++ via Rcpp
The PROTECT/UNPROTECT dance is exactly the kind of ceremony that makes writing raw C against R’s API feel like defusing a bomb. What if something else handled the wiring? Rcpp wraps R’s C API in C++ classes that manage type conversion and memory automatically: no protection macros, no SEXP arithmetic, no accessor juggling. You write ordinary C++ and Rcpp translates it into the .Call() machinery underneath.
The same sum function, rewritten:
```cpp
// sum_rcpp.cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
double sum_rcpp(NumericVector x) {
  int n = x.size();
  double total = 0.0;
  for (int i = 0; i < n; i++) {
    total += x[i];
  }
  return total;
}
```

The // [[Rcpp::export]] attribute tells Rcpp::sourceCpp() to generate the .Call() wrapper automatically. From R:
```r
Rcpp::sourceCpp("sum_rcpp.cpp")
sum_rcpp(as.numeric(1:1000))
```

One function call compiles, links, loads, and registers the function; the turnaround from edit to test is seconds. The convenience extends beyond compilation, too, because Rcpp handles type conversion on both sides of the boundary.
Rcpp provides wrapper classes for all common R types: NumericVector, IntegerVector, CharacterVector, LogicalVector, List, DataFrame, NumericMatrix. These proxy the underlying SEXP without copying, so you pay no conversion cost on the way in, and return values are converted back to R objects automatically.
Sugar expressions are Rcpp’s vectorized operations, mirroring R’s vectorized functions in C++:
```cpp
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector abs_diff(NumericVector x, NumericVector y) {
  return abs(x - y);  // sugar: vectorized, no explicit loop
}

// [[Rcpp::export]]
LogicalVector above_threshold(NumericVector x, double threshold) {
  return x > threshold;  // sugar: vectorized comparison
}
```

Sugar covers abs, sum, mean, min, max, ifelse, which, any, all, pow, sqrt, and many more. When sugar is sufficient, your C++ code looks almost identical to R but runs at compiled speed.
RcppArmadillo adds the Armadillo linear algebra library:
```cpp
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::vec solve_system(arma::mat A, arma::vec b) {
  return arma::solve(A, b);
}
```

Armadillo provides matrix decompositions (QR, SVD, Cholesky, eigenvalues), sparse matrix support, and an expression template engine that fuses operations to avoid temporaries. If your bottleneck is linear algebra beyond what R’s BLAS provides, RcppArmadillo is the natural tool.
When Rcpp shines: tight loops over vector elements, element-wise operations that resist vectorization in R, recursive algorithms like tree traversal or dynamic programming, and any situation demanding fine-grained control over iteration. Where Rcpp cannot help is I/O-bound code or code that already calls optimized C internally. You cannot make sum() faster by rewriting it in Rcpp, because sum() is already C. So what about a language that offers the same low-level performance but catches your memory bugs before they happen?
Exercises
- Write an Rcpp function that computes the running maximum of a numeric vector (each element is the max of all elements up to that index). Compare its speed with cummax() from base R using bench::mark().
- Write an Rcpp function that takes a numeric vector and returns the indices of all values greater than the mean. Use a loop, not sugar. Then rewrite it using sugar (which(x > mean(x))). Which version is faster?
- Using RcppArmadillo, write a function that computes the ordinary least squares coefficients (X^T X)^{-1} X^T y for a matrix X and vector y. Compare with R’s lm.fit().
31.3 Rust via extendr
C and C++ give you PROTECT/UNPROTECT and hope you get the count right. Rust takes a different approach entirely: its type system prevents memory errors at compile time, enforcing ownership rules where every value has exactly one owner, references carry explicit lifetimes, and the compiler flatly rejects code that could produce dangling pointers or use-after-free bugs. No garbage collector needed, no protection stack to manage, no segfaults lurking in code that “usually works.”
The extendr crate bridges Rust and R, and the R package rextendr provides the interactive workflow:
```r
rextendr::rust_function("
  fn sum_rust(x: &[f64]) -> f64 {
    x.iter().sum()
  }
")
sum_rust(as.numeric(1:1000))
```

rust_function() compiles a single Rust function and loads it into R, similar to Rcpp::cppFunction(). For larger projects, rust_source() compiles an entire Rust file.
A slightly more involved example: computing the nth Fibonacci number iteratively.
```r
rextendr::rust_function("
  fn fib(n: i32) -> i32 {
    if n <= 1 { return n; }
    let mut a = 0i32;
    let mut b = 1i32;
    for _ in 2..=n {
      let tmp = a + b;
      a = b;
      b = tmp;
    }
    b
  }
")
fib(10)
#> [1] 55
```

No PROTECT, no UNPROTECT, no SEXP. The type conversion between R and Rust is handled by extendr’s #[extendr] macro (used in source files) or inferred by rust_function().
Rust’s ownership model says every value has exactly one owner; when ownership transfers, the old name becomes invalid. R’s copy-on-modify semantics (Section 28.3) say an object is shared until someone modifies it, at which point R copies it so the original is preserved. The two mechanisms look different — Rust enforces ownership at compile time; R defers to runtime copying — but they guarantee the same thing: a value will not change while someone else is looking at it. That guarantee is why Rust backends like the Polars data frame library can hand data to R through extendr without defensive copying.
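The contrast is easiest to see in a few lines of standalone Rust (a sketch outside extendr; the helper name total is hypothetical):

```rust
// Borrowing vs. ownership transfer, the two mechanisms described above.
fn total(v: &[f64]) -> f64 {
    // `&[f64]` borrows the data; the caller keeps ownership.
    v.iter().sum()
}

fn main() {
    let v = vec![1.0, 2.0, 3.0];

    let t = total(&v); // borrowed: v is still valid afterwards
    assert_eq!(t, 6.0);

    let w = v; // ownership moves to `w`; any later use of `v` is a
               // compile-time error, not a runtime segfault
    assert_eq!(w.len(), 3);
    println!("total = {t}");
}
```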
But does the safety justify the ecosystem trade-off?
Rcpp vs extendr: Rcpp has over 2,600 reverse dependencies on CRAN and has been battle-tested for 15 years; its documentation, Stack Overflow coverage, and library of examples are unmatched in the R world. extendr is younger, with a smaller ecosystem and fewer production examples. The trade-off is safety. Rcpp inherits C++’s memory model, where segfaults and undefined behavior remain possible if you make mistakes. Rust eliminates those entire categories of bug at compile time, which matters more the larger and more complex the compiled codebase becomes.
For package development, rextendr::use_extendr() sets up the directory structure and build configuration. The polars package (R bindings to the Polars data frame library) and gifski (GIF encoding) are production examples of extendr-based packages on CRAN.
If you are starting a new package today and the compiled code is non-trivial (more than a few hundred lines), Rust is worth serious consideration. The upfront cost of learning ownership semantics pays for itself in bugs you never have to debug. For quick one-off functions or small performance patches, Rcpp’s lower friction and larger community still win.
Exercises
- Install rextendr and write a Rust function that counts the number of values in a numeric vector that exceed a given threshold. Call it from R and verify the result.
- Compare the compile time of Rcpp::cppFunction() and rextendr::rust_function() for equivalent simple functions. Which has faster turnaround?
31.4 Python via reticulate
Calling Python from R is not about speed. Python’s interpreter is slower than R’s for numerical work. The reason to call Python is access: scikit-learn, PyTorch, TensorFlow, Hugging Face Transformers, spaCy, and hundreds of other libraries either have no R equivalent or their R equivalents lag behind by months or years. When a new deep learning architecture drops, the Python implementation comes first. Always.
The reticulate package embeds a Python interpreter inside R’s process, with no inter-process communication overhead; R and Python share the same memory space.
Importing modules:
```r
library(reticulate)
np <- import("numpy")
pd <- import("pandas")

x <- np$array(c(1, 2, 3, 4, 5))
np$mean(x)
#> [1] 3
```

The $ operator accesses Python attributes and methods. R vectors convert to NumPy arrays automatically, and NumPy arrays convert back to R vectors.
Sourcing Python scripts:
```r
# helpers.py contains:
# def normalize(x):
#     return (x - x.mean()) / x.std()
source_python("helpers.py")
normalize(c(1, 2, 3, 4, 5))
```

source_python() executes a Python file and makes its top-level functions available in R’s global environment as regular R functions.
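For illustration, here is a runnable pure-Python variant of that helper. The version above assumes a NumPy array; this hypothetical stand-in uses only the standard library and the population standard deviation, matching NumPy's default ddof=0.

```python
# helpers.py — hypothetical pure-Python stand-in for the NumPy version
def normalize(values):
    """Center to mean 0 and scale to standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n), as numpy's std() does.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

print(normalize([1, 2, 3, 4, 5]))
```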
Type conversion rules: R numeric vectors become NumPy arrays, R data frames become Pandas DataFrames, R lists become Python dicts, R TRUE/FALSE become Python True/False, and R NULL becomes Python None. These conversions happen automatically in most cases, and for large arrays the conversion is zero-copy when the memory layout is compatible (both R and NumPy store doubles as contiguous 64-bit IEEE 754 values).
Calling scikit-learn from R:
```r
sklearn <- import("sklearn")
linear_model <- import("sklearn.linear_model")

X <- matrix(rnorm(200), ncol = 2)
y <- X[, 1] * 3 + X[, 2] * -1 + rnorm(100, sd = 0.5)

model <- linear_model$LinearRegression()
model$fit(X, y)
model$coef_
#> [1] 2.98 -1.02
```

The model object lives in Python, but you interact with it from R using $. Predictions, coefficients, scores: all accessible through the same operator.
Managing Python environments: reticulate can use system Python, virtualenvs, or conda environments. Specify which Python to use before loading reticulate:
```r
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python3")
# or
reticulate::use_virtualenv("myproject")
# or
reticulate::use_condaenv("myenv")
```

Call this before import(). Once the Python interpreter starts, you cannot switch to a different one within the same R session.
Exercises
- Use reticulate to import Python’s collections module and call Counter on a character vector. Verify the result matches R’s table().
- Create a NumPy array of 1 million random values with np$random$standard_normal(), then pass it to an R function (e.g., mean()). Does reticulate copy the data or share it? Use bench::mark() with varying sizes to find out.
31.5 Fortran
Every linear model you have ever fitted in R ultimately ran Fortran code. Every call to qr(), svd(), chol(), and matrix multiplication (%*%) dispatches to BLAS and LAPACK routines written in a language that predates C by over a decade, and those routines have been optimized by numerical analysts for longer than most programming languages have existed. Fortran is also the oldest language in R’s foreign function toolkit. Understanding its interface explains something that puzzles people who benchmark R against “faster” languages: R’s numerical performance is competitive precisely because R does not do the numerical work itself.
The .Fortran() interface passes R vectors to Fortran subroutines by copying them in and out:
```fortran
! dot_product.f90
subroutine dotprod(x, y, n, result)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), y(n)
  double precision, intent(out) :: result
  integer :: i
  result = 0.0d0
  do i = 1, n
    result = result + x(i) * y(i)
  end do
end subroutine
```

```r
system("R CMD SHLIB dot_product.f90")
dyn.load("dot_product.so")
result <- .Fortran("dotprod",
                   x = as.double(1:5),
                   y = as.double(6:10),
                   n = 5L,
                   result = double(1))
result$result
#> [1] 130
```

.Fortran() returns a named list with all arguments, including outputs. This copy-in, copy-out semantics is straightforward but wasteful for large data; the newer .Call() interface with C wrappers around Fortran code avoids the copies.
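Here is a sketch of such a wrapper for the dotprod routine above. The file and function names are hypothetical; F77_NAME() is R's macro for the Fortran compiler's name-mangling convention (typically a trailing underscore). Compiling both sources together with R CMD SHLIB dot_product.f90 dotprod_call.c would let you invoke it as .Call("dotprod_call", x, y).

```c
// dotprod_call.c — hypothetical .Call() wrapper around the Fortran routine
#include <R.h>
#include <Rinternals.h>

// Declare the Fortran symbol so C can call it.
void F77_NAME(dotprod)(double *x, double *y, int *n, double *result);

SEXP dotprod_call(SEXP x, SEXP y) {
  int n = length(x);
  double result = 0.0;
  // REAL() hands the Fortran routine pointers into R's own storage,
  // so nothing is copied in either direction.
  F77_NAME(dotprod)(REAL(x), REAL(y), &n, &result);
  return ScalarReal(result);
}
```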
You will rarely write new Fortran for R. The interface matters for two reasons: reading legacy code (many statistical packages on CRAN have Fortran backends dating to the 1990s), and understanding why R’s numerical performance is strong despite its interpreted overhead. When someone says “R is slow,” they are talking about R’s interpreter loop, not the compiled Fortran that does the actual linear algebra. But with four modern interfaces available (C, C++, Rust, Python) plus Fortran, which one should you actually reach for?
31.6 When to use what
The answer depends on why you need another language in the first place.
You need speed in a tight loop: profile first (Section 28.1). If the bottleneck is a loop that cannot be vectorized, use Rcpp (largest ecosystem, fastest iteration cycle) or Rust via extendr (memory safety, better for large codebases). Raw C via .Call() if you want zero dependencies.
You need a library that only exists in Python: use reticulate. This includes deep learning (PyTorch, TensorFlow), NLP (spaCy, Transformers), and computer vision. Do not rewrite Python libraries in R; call them.
You need numerical linear algebra beyond base R: RcppArmadillo for dense matrices, RcppEigen for sparse matrices. Or call LAPACK directly via .Fortran() if you need a specific routine.
You are building a package with substantial compiled code: consider Rust if the team knows it, or is willing to learn. The compile-time safety checks catch bugs that would otherwise surface as sporadic segfaults in users’ R sessions. For smaller amounts of compiled code, Rcpp is fine.
You have legacy Fortran code: wrap it with .Fortran() or write a thin C wrapper and use .Call().
The ordering from Section 28.7 still applies: pure R first, vectorize, pre-allocate, switch engines (data.table, Arrow, DuckDB), then compiled code. Calling another language is a cost, one that adds build dependencies, complicates installation, and makes debugging harder. Pay that cost only when the benefit is clear.
R was designed from the start to sit on top of compiled code. The foreign function interface is the architecture, not a feature bolted on later.
31.7 Packages worth studying
Real-world packages show how these interfaces work at scale, and each of the following is open source; reading their src/ directory teaches more than any tutorial can.
C backends:
- data.table: the core grouping, joining, and sorting engine is C. The src/ directory is a masterclass in writing high-performance C against R’s API. fread() and fwrite() are C implementations of CSV reading and writing that outperform most alternatives.
- stringi: wraps the ICU (International Components for Unicode) C library. Provides fast, correct string operations covering Unicode normalization, collation, regex, and transliteration.
C++ backends:
- arrow: R bindings to the Apache Arrow C++ library. Columnar in-memory format for zero-copy data exchange between systems.
- torch: R bindings to LibTorch (PyTorch’s C++ backend). No Python dependency; the C++ library is called directly via .Call() with custom binding code.
- dplyr: the core verbs dispatch to C++ for grouped operations via vctrs and internal C code.
Rust backends:
- polars: R bindings to the Polars DataFrame library, built with extendr. A complete data manipulation engine written in Rust, exposed to R through generated .Call() wrappers.
- gifski: GIF encoding library, also via extendr. A small, clean example of a Rust-powered R package.
Python bridges:
- tensorflow and keras3: use reticulate to call TensorFlow/Keras. The R API mirrors the Python API closely.
- spacyr: wraps spaCy for natural language processing.
The pattern across all of these is the same: R provides the user-facing API (function names, argument handling, documentation, S3/S4 dispatch), while the compiled backend provides the computation. The interface layer is thin. A high-level language for expressiveness orchestrating a low-level language for speed: that is the architecture R was built for.
In Chapter 1, this book described two traditions: Church’s lambda calculus (expressions and functions) and Turing’s machine (instructions and state). R descends from Church, but it calls into Turing’s world every time it invokes C, Fortran, or Rust. The foreign function interfaces in this chapter are the bridge between those two models. You write the logic in R — composing functions, passing closures, piping transformations — and the performance-critical inner loops run in a language that manipulates memory directly. Church’s model gives you the expressiveness and composability to think clearly about what the code should do. Turing’s model gives you the control over memory layout and instruction scheduling to make it fast. R’s foreign function interfaces sit at the boundary, translating between a world of expressions and a world of addresses. The .Call() interface is where Church’s abstractions meet Turing’s machine, and the fact that R was designed with that boundary in mind is what makes the translation practical rather than painful.
What happens when you apply the same principle not to a single function but to an entire project?
31.8 References and sources
C (the foundation):
- R Core Team, Writing R Extensions, chapter 5 (“System and foreign language interfaces”). The official guide to .C(), .Call(), and .External().
- .Call() is the modern interface: pass SEXPs, return SEXPs, full control. .C() is the old interface (copies data, limited types). .External() is rarely used.
- R_RegisterCCallable() / R_GetCCallable(): sharing C functions between packages without linking.
C++ via Rcpp:
- Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp (2013). The standard reference.
- Dirk Eddelbuettel & Romain Francois, “Rcpp: Seamless R and C++ Integration” (2011, JSS).
- Hadley Wickham, Advanced R (2e), chapter 25.
Rust via extendr:
- extendr project (extendr.github.io). Rust extensions for R, inspired by PyO3.
- The rextendr package vignettes cover both interactive use and package integration.
Python via reticulate:
- Kevin Ushey et al., “reticulate: Interface to Python” (CRAN). Full documentation at rstudio.github.io/reticulate.
Fortran:
- R Core Team, Writing R Extensions, section 5.2. The .Fortran() interface.
- LAPACK Users’ Guide (netlib.org). The linear algebra library that R calls internally.