31 Connecting to other languages
R is a glue language. It was built to sit on top of compiled code, orchestrating fast libraries from a high-level interface. The entire numerical core of R (matrix multiplication, sorting, random number generation) is written in C and Fortran. When you call sum(), you are calling C. When you call qr(), you are calling Fortran routines from LINPACK (the default) or LAPACK. The R layer provides expressiveness; the compiled layer provides speed.
This chapter covers the practical side: how to call C, C++, Rust, Python, and Fortran from R. Chapter 29 introduced the SEXP type system and memory model. Section 28.6 in Chapter 28 showed when compiled code is worth the trouble. Here we go deeper into each interface, with complete working examples.
31.1 C via .Call()
.Call() is R’s native foreign function interface. Every other approach (Rcpp, extendr) eventually produces code that R loads the same way. Understanding .Call() means understanding the foundation. Section 29.9 in Chapter 29 introduced the three mechanisms for calling compiled code (.Primitive(), .Internal(), .Call()); here we go deeper into .Call() with complete examples.
A C function callable from R takes SEXP arguments and returns a SEXP. SEXP is a pointer to R’s internal object representation (see Chapter 29). You must protect any R objects you allocate from the garbage collector using PROTECT, and release them with UNPROTECT before returning.
Here is a complete example: a C function that computes the sum of a numeric vector.
// sum_c.c
#include <R.h>
#include <Rinternals.h>
// Sum a double vector. The caller must pass a numeric (REALSXP) vector.
SEXP sum_c(SEXP x) {
    int n = length(x);        // number of elements
    double *px = REAL(x);     // pointer to the vector's double data
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += px[i];
    }
    return ScalarReal(total); // wrap the C double in a length-1 numeric SEXP
}
Compile and load it from R:
system("R CMD SHLIB sum_c.c")
dyn.load("sum_c.so") # sum_c.dll on Windows
.Call("sum_c", as.numeric(1:1000))
R CMD SHLIB invokes the system C compiler with the correct flags and include paths. dyn.load() loads the shared library into R’s process. .Call() dispatches to the function by name.
A few things to notice. REAL(x) extracts the underlying C double* array from a numeric SEXP. ScalarReal() wraps a C double back into a SEXP. The only R object this function allocates is its return value, created in the final statement, so there is nothing to PROTECT. If it allocated anything earlier (say, a result vector with allocVector(REALSXP, n)), you would need to PROTECT that allocation and call UNPROTECT(1) before returning.
The PROTECT/UNPROTECT protocol is the source of most bugs in hand-written C extensions. Forget a PROTECT and the garbage collector may free your object mid-computation, causing a segfault or silent corruption. Add a PROTECT without a matching UNPROTECT and R warns about a stack imbalance when .Call() returns; do it inside a loop and you can overflow the protection stack. The count passed to UNPROTECT must match the number of PROTECT calls exactly. This bookkeeping is tedious and error-prone, which is precisely why higher-level interfaces exist.
Other accessor macros follow the same pattern: INTEGER(x) for integer vectors, LOGICAL(x) for logical vectors, STRING_ELT(x, i) for string vectors (which are arrays of CHARSXP pointers, not C strings), VECTOR_ELT(x, i) for list elements.
Writing raw C against R’s API is rarely the right choice for new code. The main reason to learn it is to read existing code: base R, data.table, and hundreds of CRAN packages use .Call() directly. Knowing the interface makes you a better reader of R’s source.
Exercises
- Write a C function that takes an integer vector and returns its maximum value. Compile it with R CMD SHLIB, load it, and test it from R. Remember to use INTEGER() instead of REAL().
- Modify the sum_c function to return a length-1 numeric vector allocated with allocVector(REALSXP, 1) instead of using ScalarReal(). You will need PROTECT and UNPROTECT. Verify it produces the same result.
31.2 C++ via Rcpp
Rcpp is the most popular way to write compiled code for R. It wraps R’s C API in C++ classes that handle type conversion and memory management automatically. No PROTECT/UNPROTECT, no SEXP arithmetic, no accessor macros. You write ordinary C++ and Rcpp translates it.
The same sum function in Rcpp:
// sum_rcpp.cpp
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
double sum_rcpp(NumericVector x) {
    int n = x.size();
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}
The // [[Rcpp::export]] attribute tells Rcpp::sourceCpp() to generate the .Call() wrapper automatically. From R:
Rcpp::sourceCpp("sum_rcpp.cpp")
sum_rcpp(as.numeric(1:1000))
One function call compiles, links, loads, and registers the function. The turnaround from edit to test is seconds.
Rcpp provides wrapper classes for all common R types: NumericVector, IntegerVector, CharacterVector, LogicalVector, List, DataFrame, NumericMatrix. These proxy the underlying SEXP without copying, so you pay no conversion cost on the way in as long as the R object already has the matching type (an integer vector passed as a NumericVector is coerced, which does copy). Return values are converted back to R objects automatically.
Sugar expressions are Rcpp’s vectorized operations. They mirror R’s vectorized functions in C++:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector abs_diff(NumericVector x, NumericVector y) {
    return abs(x - y); // sugar: vectorized, no explicit loop
}
// [[Rcpp::export]]
LogicalVector above_threshold(NumericVector x, double threshold) {
    return x > threshold; // sugar: vectorized comparison
}
Sugar covers abs, sum, mean, min, max, ifelse, which_max, any, all, pow, sqrt, and many more. When sugar is sufficient, your C++ code looks almost identical to R but runs at compiled speed.
RcppArmadillo adds the Armadillo linear algebra library:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::vec solve_system(arma::mat A, arma::vec b) {
    return arma::solve(A, b);
}
Armadillo provides matrix decompositions (QR, SVD, Cholesky, eigenvalues), sparse matrix support, and an expression template engine that fuses operations to avoid temporaries. If your bottleneck is linear algebra beyond what R’s BLAS provides, RcppArmadillo is the natural tool.
When Rcpp shines: tight loops over vector elements, element-wise operations that cannot be expressed as vectorized R, recursive algorithms (tree traversal, dynamic programming), and any situation where you need fine-grained control over iteration. Rcpp does not help with I/O-bound code or code that is already calling optimized C internally (you cannot make sum() faster by rewriting it in Rcpp, because sum() is already C).
Exercises
- Write an Rcpp function that computes the running maximum of a numeric vector (each element is the max of all elements up to that index). Compare its speed with cummax() from base R using bench::mark().
- Write an Rcpp function that takes a numeric vector and returns the indices of all values greater than the mean. Use a loop, not sugar. Then rewrite it using sugar (which(x > mean(x))). Which version is faster?
- Using RcppArmadillo, write a function that computes the ordinary least squares coefficients (X^T X)^{-1} X^T y for a matrix X and vector y. Compare with R’s lm.fit().
31.3 Rust via extendr
Rust is a systems language with a type system that prevents memory errors at compile time. Where C and C++ give you PROTECT/UNPROTECT and hope you get the count right, Rust enforces ownership rules: every value has exactly one owner, references have explicit lifetimes, and the compiler rejects code that could cause dangling pointers or use-after-free bugs. No garbage collector needed.
The extendr crate bridges Rust and R. The R package rextendr provides the interactive workflow:
rextendr::rust_function("
    fn sum_rust(x: &[f64]) -> f64 {
        x.iter().sum()
    }
")
sum_rust(as.numeric(1:1000))
rust_function() compiles a single Rust function and loads it into R, similar to Rcpp::cppFunction(). For larger projects, rust_source() compiles an entire Rust file.
A slightly more involved example: computing the nth Fibonacci number iteratively.
rextendr::rust_function("
    fn fib(n: i32) -> i32 {
        if n <= 1 { return n; }
        let mut a = 0i32;
        let mut b = 1i32;
        for _ in 2..=n {
            let tmp = a + b;
            a = b;
            b = tmp;
        }
        b
    }
")
fib(10)
#> [1] 55
No PROTECT, no UNPROTECT, no SEXP. The type conversion between R and Rust is handled by extendr’s #[extendr] macro (used in source files) or inferred by rust_function(). The i32 type matches R’s 32-bit integers, so this version is only valid up to fib(46); fib(47) no longer fits in 32 bits.
Rcpp vs extendr: Rcpp has over 2,600 reverse dependencies on CRAN and has been battle-tested for 15 years. Its documentation, Stack Overflow coverage, and library of examples are unmatched. extendr is considerably younger and has a smaller ecosystem. The trade-off is safety: Rcpp inherits C++’s memory model, where segfaults and undefined behavior are possible if you make mistakes. Safe Rust rules out those categories of bug at compile time.
For package development, rextendr::use_extendr() sets up the directory structure and build configuration. The polars package (R bindings to the Polars data frame library) is a production example of an extendr-based package; gifski (GIF encoding) is a smaller Rust-backed package that is also worth reading.
If you are starting a new package today and the compiled code is non-trivial (more than a few hundred lines), Rust is worth serious consideration. The upfront cost of learning ownership semantics pays for itself in bugs you never have to debug. For quick one-off functions or small performance patches, Rcpp’s lower friction and larger community still win.
Exercises
- Install rextendr and write a Rust function that counts the number of values in a numeric vector that exceed a given threshold. Call it from R and verify the result.
- Compare the compile time of Rcpp::cppFunction() and rextendr::rust_function() for equivalent simple functions. Which has faster turnaround?
31.4 Python via reticulate
Calling Python from R is not about speed. For numerical work, Python’s interpreter is not meaningfully faster than R’s; both rely on compiled libraries for the heavy lifting. The reason to call Python is access: scikit-learn, PyTorch, TensorFlow, Hugging Face Transformers, spaCy, and hundreds of other libraries have no R equivalent, or their R equivalents lag behind.
The reticulate package embeds a Python interpreter inside R’s process. There is no inter-process communication overhead; R and Python share the same memory space.
Importing modules:
library(reticulate)
np <- import("numpy")
pd <- import("pandas")
x <- np$array(c(1, 2, 3, 4, 5))
np$mean(x)
#> [1] 3
The $ operator accesses Python attributes and methods. Arguments are converted to Python objects on the way in (a plain R vector arrives as a Python list, which np$array() accepts), and NumPy arrays are converted back to R arrays on the way out.
Sourcing Python scripts:
# helpers.py contains:
#   import numpy as np
#   def normalize(x):
#       x = np.asarray(x, dtype=float)  # a plain R vector arrives as a list
#       return (x - x.mean()) / x.std()
source_python("helpers.py")
normalize(c(1, 2, 3, 4, 5))
source_python() executes a Python file and makes its top-level functions available in R’s global environment as regular R functions.
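One detail of normalize() is easy to miss when checking its output against R: NumPy’s std() divides by n by default, whereas R’s sd() divides by n - 1, so the two will not agree on the same input. The sketch below shows both conventions using only Python’s standard library. It is an illustration, not the helpers.py from above; the sample argument is an assumption added here to switch between the two.

```python
import statistics

def normalize(values, sample=False):
    """Center and scale a sequence of numbers.

    sample=False divides by n (the default behaviour of NumPy's std()).
    sample=True divides by n - 1 (the convention of R's sd()).
    """
    values = [float(v) for v in values]
    center = statistics.mean(values)
    if sample:
        spread = statistics.stdev(values)   # n - 1 denominator
    else:
        spread = statistics.pstdev(values)  # n denominator
    return [(v - center) / spread for v in values]

z_population = normalize([1, 2, 3, 4, 5])
z_sample = normalize([1, 2, 3, 4, 5], sample=True)
print(z_population)
print(z_sample)
```

If results from Python and from R’s scale() or sd() differ by a constant factor, this denominator is a likely cause.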
Type conversion rules: R matrices and arrays become NumPy arrays, while plain vectors become Python lists (or scalars, when they have length 1). R data frames become Pandas DataFrames. Named R lists become Python dicts; unnamed lists become Python lists. R TRUE/FALSE become Python True/False. R NULL becomes Python None. These conversions happen automatically in most cases. For numeric matrices and arrays, reticulate can hand R’s memory to NumPy without copying, because both sides store doubles as contiguous 64-bit IEEE 754 values; going from NumPy back to R generally produces a copy.
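When it is unclear what a particular R value turns into, one option is to ask Python directly. The helper below is a minimal sketch (describe() is a hypothetical name, not part of reticulate) that reports the Python type it received and, for containers, the types of the elements. Loaded with source_python(), it can be called from R with any argument to inspect the conversion.

```python
def describe(value):
    """Return the Python type name of value, plus element types for containers."""
    name = type(value).__name__
    if isinstance(value, dict):
        inner = sorted({type(v).__name__ for v in value.values()})
    elif isinstance(value, (list, tuple)):
        inner = sorted({type(v).__name__ for v in value})
    else:
        return name  # scalars, None, and anything else: just the type name
    return f"{name}[{', '.join(inner)}]"

print(describe([1.0, 2.0, 3.0]))      # list[float]
print(describe({"a": 1, "b": "x"}))   # dict[int, str]
print(describe(None))                 # NoneType
print(describe(True))                 # bool
```

The calls above run in plain Python; from R, something like describe(c(1, 2, 3)) or describe(list(a = 1)) would report how reticulate converted each argument.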
Calling scikit-learn from R:
sklearn <- import("sklearn")
linear_model <- import("sklearn.linear_model")
X <- matrix(rnorm(200), ncol = 2)
y <- X[, 1] * 3 + X[, 2] * -1 + rnorm(100, sd = 0.5)
model <- linear_model$LinearRegression()
model$fit(X, y)
model$coef_
#> [1] 2.98 -1.02
The model object lives in Python, but you interact with it from R using $. Predictions, coefficients, and scores are all accessible.
Managing Python environments: reticulate can use system Python, virtualenvs, or conda environments. Specify which Python to use before reticulate initializes the interpreter:
Sys.setenv(RETICULATE_PYTHON = "/usr/bin/python3")
# or
reticulate::use_virtualenv("myproject")
# or
reticulate::use_condaenv("myenv")
Set this early, before calling import(). Once the Python interpreter starts, you cannot switch to a different one within the same R session.
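To confirm which interpreter a session actually bound to, reticulate::py_config() prints the details from the R side. The same information is available from inside Python. The sketch below (interpreter_info() is an illustrative name) collects it with the standard library and could be loaded with source_python() as a quick sanity check.

```python
import sys

def interpreter_info():
    """Report which Python interpreter is running this code."""
    return {
        # Path to the Python binary; may be empty in some embedded setups.
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # Root directory of the active environment.
        "prefix": sys.prefix,
        # True inside a venv/virtualenv; conda environments report False here.
        "in_virtualenv": sys.prefix != sys.base_prefix,
    }

info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the reported prefix is not the environment you expected, the selection call most likely ran after Python had already started.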
Exercises
- Use reticulate to import Python’s collections module and call Counter on a character vector. Verify the result matches R’s table().
- Create a NumPy array of 1 million random values using np$random$standard_normal(). Pass it to an R function (e.g., mean()). Does reticulate copy the data or share it? Use bench::mark() with varying sizes to find out.
31.5 Fortran
R’s numerical backbone is Fortran. Calls to qr(), svd(), chol(), and matrix multiplication (%*%) dispatch to routines from BLAS, LAPACK, and (for the default qr()) LINPACK, all of which originated as Fortran libraries. Every linear model you have ever fitted in R ultimately ran Fortran code. Fortran is also the oldest language in R’s foreign function toolkit, and understanding its interface explains why R’s numerical performance is competitive with languages that look faster on paper.
The .Fortran() interface passes R vectors to Fortran subroutines by copying them in and out:
! dot_product.f90
subroutine dotprod(x, y, n, result)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: x(n), y(n)
  double precision, intent(out) :: result
  integer :: i
  result = 0.0d0
  do i = 1, n
    result = result + x(i) * y(i)
  end do
end subroutine

system("R CMD SHLIB dot_product.f90")
dyn.load("dot_product.so")
result <- .Fortran("dotprod",
                   x = as.double(1:5),
                   y = as.double(6:10),
                   n = 5L,
                   result = double(1))
result$result
#> [1] 130
.Fortran() returns a named list with all arguments, including outputs. This copy-in-copy-out semantics is simple but wasteful for large data. The newer .Call() interface with C wrappers around Fortran code avoids the copies.
You will rarely write new Fortran for R. The interface matters for two reasons: reading legacy code (many statistical packages on CRAN have Fortran backends dating to the 1990s), and understanding why R’s numerical performance is strong despite its interpreted overhead. When someone says “R is slow,” they are talking about R’s interpreter loop, not about the compiled Fortran that does the actual linear algebra.
31.6 When to use what
The decision depends on why you need another language in the first place.
You need speed in a tight loop: profile first (Section 28.1). If the bottleneck is a loop that cannot be vectorized, use Rcpp (largest ecosystem, fastest iteration cycle) or Rust via extendr (memory safety, better for large codebases). Raw C via .Call() if you want zero dependencies.
You need a library that only exists in Python: use reticulate. This includes deep learning (PyTorch, TensorFlow), NLP (spaCy, Transformers), and computer vision. Do not rewrite Python libraries in R; call them.
You need numerical linear algebra beyond base R: RcppArmadillo for dense matrices, RcppEigen for sparse matrices. Or call LAPACK directly via .Fortran() if you need a specific routine.
You are building a package with substantial compiled code: consider Rust if the team knows it, or is willing to learn. The compile-time safety checks catch bugs that would otherwise surface as sporadic segfaults in users’ R sessions. For smaller amounts of compiled code, Rcpp is fine.
You have legacy Fortran code: wrap it with .Fortran() or write a thin C wrapper and use .Call().
The ordering from Section 28.7 still applies: pure R first, vectorize, pre-allocate, switch engines (data.table, Arrow, DuckDB), then compiled code. Calling another language is a cost: it adds build dependencies, complicates installation, and makes debugging harder. Pay that cost only when the benefit is clear.
The fact that R makes it straightforward to call C, C++, Rust, and Python is one of its greatest strengths. Many languages treat foreign function interfaces as an afterthought. R was designed from the start to sit on top of compiled code. Use that design.
31.7 Packages worth studying
Real-world packages show how these interfaces work at scale. Each of these is open source; reading their src/ directory teaches more than any tutorial.
C backends:
- data.table: the core grouping, joining, and sorting engine is C. The src/ directory is a masterclass in writing high-performance C against R’s API. fread() and fwrite() are C implementations of CSV reading and writing that outperform most alternatives.
- stringi: wraps the ICU (International Components for Unicode) C library. Provides fast, correct string operations covering Unicode normalization, collation, regex, and transliteration.
C++ backends:
- arrow: R bindings to the Apache Arrow C++ library. Columnar in-memory format for zero-copy data exchange between systems.
- torch: R bindings to LibTorch (PyTorch’s C++ backend). No Python dependency; the C++ library is called directly via .Call() with custom binding code.
- dplyr: grouped operations in the core verbs run in compiled code, partly dplyr’s own C++ and partly the C code in vctrs.
Rust backends:
- polars: R bindings to the Polars DataFrame library, built with extendr. A complete data manipulation engine written in Rust, exposed to R through generated .Call() wrappers.
- gifski: GIF encoding library backed by Rust. A small, clean example of a Rust-powered R package.
Python bridges:
- tensorflow and keras3: use reticulate to call TensorFlow/Keras. The R API mirrors the Python API closely.
- spacyr: wraps spaCy for natural language processing.
The pattern across all of these is the same: R provides the user-facing API (function names, argument handling, documentation, S3/S4 dispatch), and the compiled backend provides the computation. The interface layer is thin. This division of labor is what makes R effective as a language for data analysis: you get the expressiveness of a high-level language with the performance of a low-level one.
31.8 References and sources
C (the foundation):
- R Core Team, Writing R Extensions, chapter 5 (“System and foreign language interfaces”). The official guide to .C(), .Call(), .External().
- .Call() is the modern interface: pass SEXPs, return SEXPs, full control. .C() is the old interface (copies data, limited types). .External() is rarely used.
- R_RegisterCCallable() / R_GetCCallable(): sharing C functions between packages without linking.
C++ via Rcpp:
- Dirk Eddelbuettel, Seamless R and C++ Integration with Rcpp (2013). The standard reference.
- Dirk Eddelbuettel & Romain Francois, “Rcpp: Seamless R and C++ Integration” (2011, JSS).
- Hadley Wickham, Advanced R (2e), chapter 25.
Rust via extendr:
- extendr project (extendr.github.io). Rust extensions for R, inspired by PyO3.
- The rextendr package vignettes cover both interactive use and package integration.
Python via reticulate:
- Kevin Ushey et al., “reticulate: Interface to Python” (CRAN). Full documentation at rstudio.github.io/reticulate.
Fortran:
- R Core Team, Writing R Extensions, section 5.2. The .Fortran() interface.
- LAPACK Users’ Guide (netlib.org). The linear algebra library that R calls internally.