pkgdown/mathjax-config.html

Skip to contents

Overview

This vignette serves three purposes:

  1. Build intuition: Use pixel morphing as a visual analogy for assignment problems

  2. Scale up: Demonstrate approximation strategies when exact LAP becomes infeasible

  3. Scientific applications: Show how matching applies to ecology, physics, and chemistry

This vignette is different: Unlike the other couplr documentation, it emphasizes understanding over doing. If you’re looking to solve a matching problem today, start with vignette("getting-started") or vignette("matching-workflows"). Come here when you want to understand why these algorithms work and when approximations are appropriate.

Who This Vignette Is For

Audience: Advanced users, researchers, algorithm developers, curious minds

Prerequisites:

What You’ll Learn:

  • Why exact LAP becomes infeasible for large n

  • Three approximation strategies and their trade-offs

  • How matching problems appear in ecology, physics, and chemistry

  • Mathematical connections to optimal transport theory

Time to complete: 45-60 minutes (conceptual reading)

Documentation Roadmap

Vignette Focus Difficulty
Getting Started Basic LAP solving, API introduction Beginner
Algorithms Mathematical foundations, solver selection Intermediate
Matching Workflows Production matching pipelines Intermediate
Pixel Morphing Scientific applications, approximations Advanced

You are here: Pixel Morphing (Advanced)

Why Pixels?

Pixels provide an ideal testbed for understanding assignment problems:

  • Each pixel is an entity with measurable properties

  • Color = feature (what it looks like)

  • Position = spatial location (where it is)

  • The matching is visually verifiable: you can see if it worked

The same algorithms that morph images smoothly also track particles in physics, align molecules in chemistry, and match vegetation plots in ecology.

Scientific Applications at a Glance

Domain Entities Features Spatial
Ecology Vegetation plots Species composition Geographic location
Physics Particles Intensity, size Predicted position
Chemistry Atoms Element type 3D coordinates
Images Pixels RGB color (x, y) position

The General Matching Problem

Problem Formulation

Given two sets of entities A={a1,,an}A = \{a_1, \ldots, a_n\} and B={b1,,bn}B = \{b_1, \ldots, b_n\}, find the optimal one-to-one correspondence by minimizing

minπi=1nci,π(i) \min_{\pi} \sum_{i=1}^{n} c_{i,\pi(i)}

where the cost combines feature similarity and spatial proximity:

cij=αdfeature(ai,bj)+βdspatial(𝐱i,𝐱j). c_{ij} = \alpha \, d_{\text{feature}}(a_i, b_j) + \beta \, d_{\text{spatial}}(\mathbf{x}_i, \mathbf{x}_j).

Feature distance dfeatured_{\text{feature}}: domain-specific similarity

  • Ecology: Bray-Curtis dissimilarity between species vectors

  • Physics: difference in particle intensity or size

  • Chemistry: penalty for mismatched atom types

  • Images: Euclidean distance in RGB color space

Spatial distance dspatiald_{\text{spatial}}: physical proximity

  • Ecology: geographic distance between plot centers

  • Physics: Euclidean distance accounting for predicted motion

  • Chemistry: 3D distance between atomic coordinates

  • Images: 2D pixel position distance

Weights α,β0\alpha, \beta \ge 0 balance feature matching vs. spatial coherence.

Computational Challenge

Exact solution: solve the full n×nn \times n LAP.

  • Complexity: O(n3)O(n^3) using Jonker-Volgenant

  • Feasible: up to n1000n \approx 1000 (about 30×3030 \times 30 images, or 10001000 plots/particles/atoms)

  • Prohibitive: for n=10000n = 10\,000 (100×100100 \times 100 images), runtime and memory become expensive

Real applications often involve

  • High-resolution images: 200×200=40000200 \times 200 = 40\,000 pixels

  • Large ecological surveys: 5000+5000+ plots

  • Particle tracking: 10000+10\,000+ particles per frame

  • Molecular dynamics: 100000+100\,000+ atoms

We therefore need approximations that are much faster but still produce high-quality matchings.

Visual Illustration: Pixel Morphing

To make the abstract ideas concrete, we visualize them using image morphing where

  • entities = pixels

  • features = RGB color values

  • spatial position = (x,y)(x, y) coordinates

We first show the static input images (all at 80×8080 \times 80 for display), then the animated morphs produced by different matching strategies.

Source image A: photographTarget image B: photographSource image A: circle iconTarget image B: circle icon

The first pair are real photographs, the second pair are simple geometric shapes. Internally, all matching is computed on logical 40×4040 \times 40 grids; we then upscale to 80×8080 \times 80 purely for clearer display.

Exact Pixel Matching

The exact pixel morph uses a full LAP solution on a 1600×16001600 \times 1600 cost matrix. For each pair of pixels (i,j)(i, j) we compute

cij=αRGBiARGBjB2+β(xi,yi)(xj,yj)2, c_{ij} = \alpha \,\lVert \text{RGB}_i^A - \text{RGB}_j^B \rVert_2 + \beta \,\lVert (x_i, y_i) - (x_j, y_j) \rVert_2,

where color distances are normalized to [0,3][0, \sqrt{3}] (RGB in [0,1][0,1]) and spatial distances to [0,1][0,1] using the image diagonal.

Animated GIF showing exact pixel morphing between two photographsAnimated GIF showing exact pixel morphing between two circle icons

This yields an optimal one-to-one assignment of pixels. The resulting animations are smooth and artifact-free but require solving the full LAP.

Feature Quantization Morph (Strategy 1)

Animated GIF showing feature quantization morphing between two photographsAnimated GIF showing feature quantization morphing between two circle icons

In the feature quantization morph, similar colors are grouped, and groups are matched rather than individual pixels. Colors move as coherent “bands,” preserving global color structure but losing fine-grained per-pixel detail.

Hierarchical Morph (Strategy 2)

Animated GIF showing hierarchical morphing between two photographsAnimated GIF showing hierarchical morphing between two circle icons

The hierarchical morph first matches large patches, then refines within patches. The motion is locally coherent and scales well to large problems, at the price of potentially missing some globally optimal cross-patch matches.

Three Approximation Strategies

We now describe the three approximation strategies in detail. The animations above correspond directly to these methods.

Strategy 1: Feature Quantization

Core idea: reduce problem size by grouping entities with similar features, then match groups.

Mathematical Formulation

  1. Quantize features

Map continuous feature space to a finite palette

quantize:d{1,,k}, \text{quantize}: \mathbb{R}^d \to \{1, \ldots, k\},

where knk \ll n (for example k64k \approx 64 for n=1600n = 1600).

  1. Group by palette

Form groups GA(c)={i:quantize(fi)=c} G_A^{(c)} = \{ i : \text{quantize}(f_i) = c \} and similarly for BB.

  1. Match groups

Solve a k×kk \times k LAP between palette entries with costs

cij=αd(pi,pj)+βd(𝐱i,𝐱j), c'_{ij} = \alpha \, d(p_i, p_j) + \beta \, d(\bar{\mathbf{x}}_i, \bar{\mathbf{x}}_j),

where pip_i is the palette color and 𝐱i\bar{\mathbf{x}}_i the centroid position of group ii.

  1. Assign entities

Every entity in GA(c)G_A^{(c)} is assigned according to the group-to-group match.

Complexity Reduction

  • Original: O(n3)O(n^3) for an n×nn \times n LAP

  • Quantized: O(k3+nk)O(k^3 + n k) for the k×kk \times k LAP plus group assignment

  • Speedup: approximately (n/k)3(n/k)^3

For example, with n=1600n = 1600 (a 40×4040 \times 40 image) and k=64k = 64 you get

(160064)3=25315000 \left(\frac{1600}{64}\right)^3 = 25^3 \approx 15\,000

times fewer LAP operations.

Quality Trade-offs

Advantages

  • Very large speedups for big nn

  • Preserves global structure (similar features stay together)

  • Produces smooth, band-like motion without large jumps

Disadvantages

  • Loses detail within each palette group

  • Quantization artifacts when kk is too small

  • May miss optimal local pairings between similar but distinct feature values

The corresponding GIFs are the color walk morphs shown earlier.

Strategy 2: Hierarchical Decomposition

Core idea: split the domain into smaller subproblems by spatial partitioning, solve subproblems, and combine.

Mathematical Formulation

  1. Spatial partitioning

Divide the domain into m×mm \times m patches (for example m=4m = 4 so you get 1616 patches). Denote the subset of entities of AA in patch kk by

PA(k)={ai:𝐱iPatchk}. P_A^{(k)} = \{ a_i : \mathbf{x}_i \in \text{Patch}_k \}.

  1. Patch-level matching

Form patch representatives: centroid position and mean features per patch. Solve an m2×m2m^2 \times m^2 LAP between patches, with costs defined using the same feature and spatial distances but now at patch level.

  1. Recursive refinement

Within each matched patch pair (PA(k),PB(l))(P_A^{(k)}, P_B^{(l)}):

  • If |PA(k)|τ\lvert P_A^{(k)} \rvert \le \tau (a threshold, e.g. τ=50\tau = 50) solve the subproblem exactly.

  • Otherwise, partition that patch pair again and repeat.

  1. Combine solutions

Concatenate assignments from all leaf subproblems to obtain the global matching.

Complexity (Sketch)

With dd levels of decomposition (each level splitting into four patches), the work can be made close to O(nlogn)O(n \log n) in practice, compared to O(n3)O(n^3) for a single full LAP. Intuitively, the LAPs near the leaves are very small, and the costly large LAP is replaced by a series of much smaller ones.

Quality Trade-offs

Advantages

  • Scales to very large nn (tens of thousands of entities)

  • Preserves local structure: nearby entities tend to be matched within the same spatial patch

  • No feature discretization, so feature precision is retained

Disadvantages

  • May miss globally optimal cross-patch matches

  • Quality depends on partitioning scheme and threshold τ\tau

  • Possible boundary artifacts if important structure crosses patch boundaries

High-Level Algorithm

// Pseudocode for hierarchical LAP matching

FUNCTION match_hierarchical(region_A, region_B, threshold, level):

  // Base case: region small enough for exact LAP
  IF size(region_A) <= threshold THEN
    cost ← compute_cost_matrix(region_A, region_B, α, β)
    RETURN lap_solve(cost)
  END IF

  // Divide into 2×2 spatial grid (4 patches)
  patches_A ← spatial_partition(region_A, grid = 2×2)
  patches_B ← spatial_partition(region_B, grid = 2×2)

  // Compute patch representatives
  FOR each patch p DO
    centroid[p]      ← mean(positions in p)
    mean_feature[p]  ← mean(features in p)
  END FOR

  // Match patches using 4×4 LAP
  patch_cost ← matrix(4, 4)
  FOR i = 1 TO 4 DO
    FOR j = 1 TO 4 DO
      patch_cost[i, j] ← α·distance(mean_feature_A[i], mean_feature_B[j]) +
                         β·distance(centroid_A[i], centroid_B[j])
    END FOR
  END FOR

  patch_assignment ← lap_solve(patch_cost)

  // Recursively solve within matched patches
  assignments ← []
  FOR i = 1 TO 4 DO
    j ← patch_assignment[i]
    sub_assignment ← match_hierarchical(
      patches_A[i],
      patches_B[j],
      threshold,
      level + 1
    )
    assignments ← append(assignments, sub_assignment)
  END FOR

  RETURN concatenate(assignments)
END FUNCTION

The couplr implementation adds pragmatic details such as normalization of color and spatial distances, conversion between (x,y)(x, y) coordinates and raster indexing, and handling remainder patches when the grid does not divide evenly.

Strategy 3: Resolution Reduction

Core idea: solve the LAP on a coarse grid, then lift/upscale the assignment to the full-resolution grid.

Mathematical Formulation

  1. Downscale

Reduce spatial resolution by a factor ss (for example s=2s = 2):

A=downsample(A,s),B=downsample(B,s). A' = \text{downsample}(A, s), \qquad B' = \text{downsample}(B, s).

Now AA' and BB' each have n=n/s2n' = n / s^2 entities.

  1. Solve at low resolution

Compute an exact LAP solution on the n×nn' \times n' problem:

π=argminπi=1nci,π(i). \pi' = \arg\min_{\pi'} \sum_{i=1}^{n'} c'_{i,\pi'(i)}.

  1. Upscale assignment

Map the low-resolution assignment back to full resolution:

π(i)=upscale(π(coarse_index(i)),s), \pi(i) = \text{upscale}\!\bigl(\pi'(\text{coarse\_index}(i)), s\bigr),

where each full-resolution entity inherits the assignment of its coarse cell.

Complexity

  • Original: O(n3)O(n^3)

  • Downscaled: O((n/s2)3)=O(n3/s6)O\bigl((n/s^2)^3\bigr) = O(n^3 / s^6)

  • Speedup: s6s^6

For s=2s = 2 this gives a 64×64\times reduction in LAP work.

Quality Trade-offs

Advantages

  • Very simple to implement

  • Exact LAP at the coarse level

  • Large speedups for moderate ss

Disadvantages

  • Loss of fine detail and blocky artifacts

  • Assignment is no longer a true permutation at pixel level (multiple fine pixels can map to the same coarse target)

  • Quality deteriorates quickly for larger ss

In practice, resolution reduction is most useful as a crude initialization step for very large problems (n>100000n > 100\,000).

Strategy Comparison

Approach Speedup (vs. exact) Quality Best for
Exact LAP 1×1\times Optimal n1000n \le 1000
Feature quantization (n/k)3(n/k)^3 Good global structure Distinct feature groups
Hierarchical n3/2\approx n^{3/2} Good local structure Large nn, strong spatial structure
Resolution reduction s6s^6 Moderate Very large nn, rough initialization

Practical rules of thumb

  • n<1000n < 1000: use the exact LAP.

  • 1000<n<50001000 < n < 5000: feature quantization or a shallow hierarchy.

  • n>5000n > 5000: hierarchical decomposition with 2-3 levels.

  • n>50000n > 50\,000: combine s=2s = 2 resolution reduction with a hierarchical method.

Implementation Details of Exact Pixel Matching

We now spell out the exact LAP-based morph more concretely.

We again use the cost

cij=αRGBiARGBjB2+β(xi,yi)(xj,yj)2. c_{ij} = \alpha \,\lVert \text{RGB}_i^A - \text{RGB}_j^B \rVert_2 + \beta \,\lVert (x_i, y_i) - (x_j, y_j) \rVert_2.

The algorithm:

// Pseudocode for exact pixel matching

// Step 1: Compute full cost matrix (normalized)
n_pixels ← height × width
cost ← matrix(0, n_pixels, n_pixels)

FOR i = 1 TO n_pixels DO
  FOR j = 1 TO n_pixels DO
    // RGB color distance (normalized to [0, sqrt(3)])
    color_dist ← sqrt((R_A[i] - R_B[j])^2 +
                      (G_A[i] - G_B[j])^2 +
                      (B_A[i] - B_B[j])^2) / (255 · sqrt(3))

    // Spatial distance (normalized to [0, 1] by diagonal)
    spatial_dist ← sqrt((x_A[i] - x_B[j])^2 +
                        (y_A[i] - y_B[j])^2) / diagonal_length

    // Combined cost
    cost[i, j] ← α · color_dist + β · spatial_dist
  END FOR
END FOR

// Step 2: Solve with Jonker-Volgenant
assignment ← lap_solve(cost, method = "jv")

// Step 3: Generate morph frames by linear interpolation
FOR frame_idx = 1 TO n_frames DO
  t ← frame_idx / n_frames  // Time parameter in [0, 1]

  FOR pixel_i = 1 TO n_pixels DO
    j ← assignment[pixel_i]  // Matched target pixel

    // Interpolate position
    x_new[pixel_i] ← (1 - t) · x_A[pixel_i] + t · x_B[j]
    y_new[pixel_i] ← (1 - t) · y_A[pixel_i] + t · y_B[j]

    // Keep source color (transport-only, no blending)
    RGB_new[pixel_i] ← RGB_A[pixel_i]
  END FOR

  frames[frame_idx] ← render(x_new, y_new, RGB_new)
END FOR

The couplr implementation handles indexing, raster layout, and shows or saves the resulting GIFs.

Approximate performance: up to about 100×100100 \times 100 (10 000 pixels) on typical hardware is fine with the exact LAP.

Application to Scientific Domains

We now return from pixel morphs to the scientific settings that motivated them.

Ecology: Vegetation Plot Matching

Problem: match nn vegetation plots surveyed at time tt to nn plots at time t+Δtt + \Delta t to track community dynamics.

Feature distance: Bray-Curtis dissimilarity between species abundance vectors

dBC(a,b)=s|asbs|s(as+bs), d_{\text{BC}}(a, b) = \frac{\sum_s \lvert a_s - b_s \rvert} {\sum_s (a_s + b_s)},

where as,bsa_s, b_s are abundances of species ss in plots aa and bb.

Spatial distance: geographic distance (e.g. in kilometers) between plot centers.

Exact solution for small studies (n<100n < 100):

// Pseudocode for ecological plot matching
FOR i = 1 TO n_plots_t DO
  FOR j = 1 TO n_plots_tplus DO

    // Bray-Curtis dissimilarity for species composition
    numerator   ← sum over species s of |abundance_t[i, s] - abundance_tplus[j, s]|
    denominator ← sum over species s of (abundance_t[i, s] + abundance_tplus[j, s])
    bc_distance ← numerator / denominator

    // Geographic distance (kilometers)
    geo_distance ← sqrt((x_t[i] - x_tplus[j])^2 +
                        (y_t[i] - y_tplus[j])^2)

    // Combined cost (α = 0.7 emphasizes species composition)
    cost[i, j] ← 0.7 · bc_distance + 0.3 · (geo_distance / max_distance)

  END FOR
END FOR

plot_correspondence ← lap_solve(cost)

For large studies (n>1000n > 1000) a hierarchical approach by region is more practical:

// Hierarchical decomposition by geographic region

// 1. Divide landscape into spatial grid (e.g. 10 km × 10 km cells)
regions_t     ← spatial_partition(plots_t,     grid_size = 10 km)
regions_tplus ← spatial_partition(plots_tplus, grid_size = 10 km)

// 2. Compute region representatives
FOR each region r DO
  mean_composition[r] ← average species vector across plots in r
  centroid[r]         ← geographic center of r
END FOR

// 3. Match regions (small LAP: ~100 regions)
region_cost       ← compute_cost(mean_composition, centroids, α = 0.7, β = 0.3)
region_assignment ← lap_solve(region_cost)

// 4. Within matched regions, solve plot-level LAP
full_assignment ← []
FOR r = 1 TO n_regions DO
  r_matched ← region_assignment[r]
  plots_A   ← plots in regions_t[r]
  plots_B   ← plots in regions_tplus[r_matched]

  // Local LAP (smaller problem, e.g. 50 × 50)
  cost_local       ← compute_plot_cost(plots_A, plots_B, α = 0.7, β = 0.3)
  local_assignment ← lap_solve(cost_local)

  full_assignment ← append(full_assignment, local_assignment)
END FOR

RETURN full_assignment

This allows tracking individual plot trajectories across time, distinguishing stable communities, successional trends, and invasion fronts.

Physics: Particle Tracking

Problem: track nn particles between frame tt and t+Δtt + \Delta t in experimental video.

Feature distance: differences in intensity, size, or shape.

Spatial distance: displacement relative to predicted motion:

dspatial(i,j)=𝐱i+𝐯iΔt𝐱j2, d_{\text{spatial}}(i, j) = \bigl\| \mathbf{x}_i + \mathbf{v}_i \Delta t - \mathbf{x}_j \bigr\|_2,

where 𝐯i\mathbf{v}_i is the estimated velocity from previous frames.

We also impose a maximum displacement dmaxd_{\max} beyond which matches are physically implausible.

Exact solution (moderate nn):

// Pseudocode for particle tracking with velocity prediction

// Initialize cost matrix as forbidden everywhere
cost ← matrix(Inf, n_particles_t, n_particles_tplus)

FOR i = 1 TO n_particles_t DO
  // Predict position using previous velocity
  x_predicted ← x_t[i] + v_x_t[i] · Δt
  y_predicted ← y_t[i] + v_y_t[i] · Δt

  FOR j = 1 TO n_particles_tplus DO
    // Distance from predicted position
    dx ← x_predicted - x_tplus[j]
    dy ← y_predicted - y_tplus[j]
    spatial_distance ← sqrt(dx^2 + dy^2)

    // Only consider physically plausible matches
    IF spatial_distance <= max_displacement THEN
      // Feature similarity (intensity, size, etc.)
      feature_distance ← |intensity_t[i] - intensity_tplus[j]|

      // Combined cost
      cost[i, j] ← α · feature_distance + β · spatial_distance
    END IF
  END FOR
END FOR

// Solve assignment (Inf entries are forbidden)
particle_tracks ← lap_solve(cost)

// Update velocities from assignments
FOR i = 1 TO n_particles_t DO
  j ← particle_tracks[i]
  velocity_new[i] ← (position_tplus[j] - position_t[i]) / Δt
END FOR

For dense tracking (n>5000n > 5000), we can first cluster particles:

// Two-stage: clustering then local matching

// Stage 1: spatial clustering
clusters_t     ← spatial_cluster(particles_t,     radius = 2 · pixel_size)
clusters_tplus ← spatial_cluster(particles_tplus, radius = 2 · pixel_size)

// Compute cluster representatives
FOR each cluster c DO
  centroid[c]       ← mean position of particles in c
  mean_intensity[c] ← mean intensity
  mean_velocity[c]  ← mean velocity (if available)
END FOR

// Match clusters
cluster_cost   ← compute_cluster_similarity(clusters_t, clusters_tplus)
cluster_tracks ← lap_solve(cluster_cost)

// Stage 2: within matched clusters, track individual particles
full_tracks ← []
FOR c = 1 TO n_clusters DO
  c_matched   ← cluster_tracks[c]
  particles_A ← particles in clusters_t[c]
  particles_B ← particles in clusters_tplus[c_matched]

  cost_local ← compute_particle_distance(
    particles_A, particles_B,
    max_displacement = 5,
    α = 0.3,
    β = 0.7
  )

  local_tracks ← lap_solve(cost_local)
  full_tracks  ← append(full_tracks, local_tracks)
END FOR

RETURN full_tracks

This yields efficient and robust trajectories even for very dense particle fields.

Chemistry: Molecular Conformation Alignment

Problem: align two conformations of the same molecule (e.g. a protein) with nn atoms to compute RMSD and analyze structural change.

Feature distance: strict element matching

delement(i,j)={0,if elementi=elementj,,otherwise. d_{\text{element}}(i, j) = \begin{cases} 0, & \text{if } \text{element}_i = \text{element}_j, \\ \infty, & \text{otherwise.} \end{cases}

Spatial distance: 3D Euclidean distance between atomic coordinates.

Exact LAP for small molecules:

// Pseudocode for molecular conformation alignment

n_atoms ← number of atoms in molecule
cost    ← matrix(0, n_atoms, n_atoms)

FOR i = 1 TO n_atoms DO
  FOR j = 1 TO n_atoms DO

    // Enforce strict element type matching
    IF element_type_A[i] ≠ element_type_B[j] THEN
      cost[i, j] ← Inf
    ELSE
      dx ← x_A[i] - x_B[j]
      dy ← y_A[i] - y_B[j]
      dz ← z_A[i] - z_B[j]
      cost[i, j] ← sqrt(dx^2 + dy^2 + dz^2)
    END IF

  END FOR
END FOR

// Solve alignment
alignment ← lap_solve(cost)

// Compute RMSD
sum_sq_dist ← 0
FOR i = 1 TO n_atoms DO
  j            ← alignment[i]
  sum_sq_dist ← sum_sq_dist + cost[i, j]^2
END FOR

rmsd ← sqrt(sum_sq_dist / n_atoms)

For large biomolecules, we again use a hierarchical strategy, this time by secondary structure elements (helices, sheets, loops, etc.), aligning segments first and then atoms within matched segments.

Implementation Notes

Customizing Morph Duration

The morphing examples use default settings, but you can customize the number of frames and speed:

# From inst/scripts/generate_examples.R
generate_morph <- function(assignment, pixels_A, pixels_B,
                           n_frames    = 30,   # number of frames
                           frame_delay = 0.1)  # delay between frames (seconds)
{
  frames <- lapply(seq(0, 1, length.out = n_frames), function(t) {
    interpolate_frame(t, assignment, pixels_A, pixels_B)
  })

  save_gif(frames, delay = frame_delay)
}

Total animation duration is n_frames * frame_delay seconds.

Using the Example Code

The morphing implementation is provided in inst/scripts/generate_examples.R:

# View the source
example_script <- system.file("scripts", "generate_examples.R", package = "couplr")
file.show(example_script)

# Or source to use its helpers
source(example_script)

# Apply to your own data
my_cost       <- build_cost_matrix(my_data_A, my_data_B)
my_assignment <- lap_solve(my_cost)

Regenerating Examples

To regenerate all demo GIFs and PNGs:

source("inst/scripts/generate_examples.R")

This will write the assets under inst/extdata.

Mathematical Foundation: Optimal Transport

The matching problems discussed here are discrete instances of optimal transport.

Monge Problem

The original Monge formulation (1781) seeks a transport map T:ABT: A \to B minimizing

Ac(𝐱,T(𝐱))dμ(𝐱). \int_A c(\mathbf{x}, T(\mathbf{x})) \,\mathrm{d}\mu(\mathbf{x}).

Kantorovich Relaxation

Kantorovich (1942) relaxed this to a transport plan γ\gamma on A×BA \times B:

minγA×Bc(𝐱,𝐲)dγ(𝐱,𝐲) \min_{\gamma} \int_{A \times B} c(\mathbf{x}, \mathbf{y}) \,\mathrm{d}\gamma(\mathbf{x}, \mathbf{y})

subject to marginal constraints on γ\gamma.

Discrete Linear Assignment

For discrete uniform distributions with nn points in AA and BB we obtain exactly the linear assignment problem:

minπSni=1nci,π(i), \min_{\pi \in S_n} \sum_{i=1}^n c_{i,\pi(i)},

which is what couplr solves efficiently.

Wasserstein Distance

With cij=d(𝐱i,𝐱j)c_{ij} = d(\mathbf{x}_i, \mathbf{x}_j) (often Euclidean distance), the optimal cost defines the 11‑Wasserstein distance:

W1(μ,ν)=minπSni=1nci,π(i). W_1(\mu, \nu) = \min_{\pi \in S_n} \sum_{i=1}^n c_{i,\pi(i)}.

This appears in

  • Earth mover’s distance for image retrieval

  • Distributional similarity in statistics

  • Generative modeling (e.g. Wasserstein GANs)

Further Reading

Optimal Transport Theory

  • Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport. Foundations and Trends in Machine Learning.

  • Villani, C. (2008). Optimal Transport: Old and New. Springer.

Scientific Applications

Ecology

  • Anderson, M. J. et al. (2011). Navigating the multiple meanings of beta diversity. Ecology Letters.

  • Legendre, P., & Legendre, L. (2012). Numerical Ecology. Elsevier.

Physics

  • Adrian, R. J., & Westerweel, J. (2011). Particle Image Velocimetry. Cambridge University Press.

  • Crocker, J. C., & Grier, D. G. (1996). Methods of digital video microscopy. Journal of Colloid and Interface Science.

Chemistry

  • Kabsch, W. (1976). A solution for the best rotation to relate two sets of vectors. Acta Crystallographica.

  • Coutsias, E. A. et al. (2004). Using quaternions to calculate RMSD. Journal of Computational Chemistry.

Assignment Algorithms

  • Burkard, R., Dell’Amico, M., & Martello, S. (2009). Assignment Problems. SIAM.

  • For implementation details in this package see vignette("algorithms").

Connection to couplr Workflows

The approximation strategies in this vignette become relevant when working with couplr’s practical matching functions.

When Do These Strategies Apply?

Strategy couplr Implementation When to Use
Exact LAP match_couples(method = "jv") n < 3,000
Feature quantization Implicit via scale = "robust" Reduces effective feature space
Hierarchical matchmaker(block_type = "cluster") n > 3,000, use blocking
Resolution reduction Custom code Very large n

Practical Recommendations

For n < 3,000: Use match_couples() with exact algorithms:

result <- match_couples(left, right, vars = c("x", "y", "z"), auto_scale = TRUE)

For 3,000 < n < 10,000: Use blocking to create smaller subproblems:

blocks <- matchmaker(left, right, block_type = "cluster", n_blocks = 10)
result <- match_couples(blocks$left, blocks$right, vars = vars, block_id = "block_id")

For n > 10,000: Use greedy matching:

result <- greedy_couples(left, right, vars = vars, strategy = "sorted")

For n > 50,000: Combine strategies (blocking + greedy within blocks), or implement custom approximations using the techniques in this vignette.

Limitations of Approximation Strategies

Each approximation trades accuracy for speed. Know the failure modes:

Strategy Works Well When Fails When
Feature quantization Features cluster naturally Continuous features, fine distinctions matter
Hierarchical Spatial locality is meaningful Optimal matches cross boundaries
Resolution reduction Coarse structure suffices Fine detail matters

Summary

This vignette explored optimal matching through the lens of pixel morphing and scientific applications.

Key Ideas:

  1. Assignment = matching: LAP finds optimal correspondences between two sets

  2. Scalability matters: O(n3)O(n^3) becomes prohibitive for n>3,000n > 3{,}000

  3. Three approximations: Feature quantization, hierarchical decomposition, resolution reduction

  4. Same math, different domains: Pixels, particles, plots, and atoms all use the same algorithms

The same algorithms that morph images smoothly also track particles in physics, align molecules in chemistry, and match vegetation plots in ecology. Together, the methods in couplr let you move between exact optimal matchings and principled approximations, depending on problem size and accuracy requirements.


See Also