Overview
The Hungarian algorithm was published in 1955. Decades later, most R packages still use nothing else.
couplr implements eighteen algorithms, including methods that exist in no other R package: Gabow-Tarjan, Orlin-Ahuja, Network Simplex, Ramshaw-Tarjan.
Who This Vignette Is For
Audience: Researchers curious about optimization algorithms, developers choosing the right method for their problem, anyone wondering why modern algorithms beat Hungarian by 20x or more.
Prerequisites:
Basic understanding of assignment problems (matching rows to columns)
Familiarity with cost matrices (you want to minimize total cost)
Comfort with big-O notation (helpful but not required)
What You’ll Learn:
Why Hungarian is slow and how later algorithms improved it
How primal-dual, auction, and network flow approaches work
Which algorithm to choose for your problem
How couplr’s automatic selection works
The Race
But first: the same problem, five different solutions.
When you run five different assignment algorithms on identical input, they all find the same optimal answer, but the fastest finishes 22 times quicker than the slowest.
The slowest happens to be Hungarian, the algorithm everyone learns in textbooks. CSA, the fastest here, came out four decades later. That gap represents years of algorithmic refinement that most production software never adopted.
Why would anyone need five ways to solve the same problem? Because they don’t all behave the same under different conditions. The Hungarian method that handles a 100×100 matrix without complaint becomes painfully slow at 1000×1000. The Auction algorithm that dominates large dense problems stumbles on small sparse ones. Different matrix sizes, different sparsity patterns, different cost distributions: each situation favors a different algorithm.
couplr gives you all of them, and it picks the right one automatically.
The Problem
Before the algorithms, the problem. It’s simple to state:
Given $n$ workers and $n$ jobs, where assigning worker $i$ to job $j$ costs $c_{ij}$, find the assignment that minimizes total cost.
Mathematically:
$$\min_{\pi} \sum_{i=1}^{n} c_{i,\pi(i)}$$
where $\pi$ is a permutation (each worker gets exactly one job, each job gets exactly one worker).
Simple to state. Not simple to solve efficiently.
There are $n!$ possible assignments. For $n = 20$, that’s 2.4 quintillion possibilities. Brute force is impossible. We need structure.
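To make the growth concrete, here is a minimal base-R sketch of brute-force enumeration. The helper names `all_perms` and `brute_force_lap` are hypothetical and are not part of couplr; this is only practical for very small matrices.

```r
# Recursively list every permutation of the vector v (base R only).
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in all_perms(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

# Score every permutation and keep the cheapest one.
brute_force_lap <- function(cost) {
  n <- nrow(cost)
  perms <- all_perms(seq_len(n))
  totals <- vapply(perms, function(p) sum(cost[cbind(seq_len(n), p)]), numeric(1))
  best <- which.min(totals)
  list(assignment = perms[[best]], total_cost = totals[best],
       n_checked = length(perms))
}

cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)
res <- brute_force_lap(cost)
res$assignment   # 3 2 1
res$total_cost   # 27
res$n_checked    # 6 = 3!
factorial(20)    # about 2.43e18 candidates for a 20 x 20 problem
```

Three workers need six evaluations; twenty workers would need more evaluations than any machine can enumerate, which is why the algorithms in this vignette exploit structure instead.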
The LP Formulation
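The assignment problem is a linear program. Let $x_{ij} \in \{0, 1\}$ indicate whether worker $i$ is assigned to job $j$:
$$\min \sum_{i,j} c_{ij} x_{ij} \quad \text{subject to} \quad \sum_{j} x_{ij} = 1 \;\; \forall i, \qquad \sum_{i} x_{ij} = 1 \;\; \forall j, \qquad x_{ij} \ge 0$$
The constraint matrix is totally unimodular: every square submatrix has determinant 0, 1, or -1. This guarantees integer solutions even when we relax $x_{ij} \in \{0, 1\}$ to $x_{ij} \ge 0$.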
Duality: The Key to Efficiency
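The dual LP introduces prices $u_i$ for workers and $v_j$ for jobs:
$$\max \sum_{i} u_i + \sum_{j} v_j \quad \text{subject to} \quad u_i + v_j \le c_{ij} \;\; \forall i, j$$
Complementary slackness links primal and dual: if $x_{ij} = 1$ in an optimal assignment, then $u_i + v_j = c_{ij}$ (the constraint is tight). This means optimal assignments use only tight edges where the dual constraint holds with equality.
The reduced cost of edge $(i, j)$ is $\bar{c}_{ij} = c_{ij} - u_i - v_j$. An edge is tight when $\bar{c}_{ij} = 0$.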
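These conditions are easy to check numerically. The sketch below takes the cost matrix and the dual values printed in this vignette's Dual Variables example (row prices 8, 10, 7 and column prices 0, 0, 0) and verifies feasibility, tightness, and equality of primal and dual objectives. The reduced cost is computed as `cost[i, j] - u[i] - v[j]`.

```r
cost <- matrix(c(10, 19, 8, 15, 10, 18, 7, 17, 13), nrow = 3, byrow = TRUE)
u <- c(8, 10, 7)   # row (worker) prices
v <- c(0, 0, 0)    # column (job) prices

# Reduced costs: cost[i, j] - u[i] - v[j]
reduced <- cost - outer(u, v, "+")

# Dual feasibility: no reduced cost is negative
all(reduced >= 0)                     # TRUE

# Tight edges are the zero entries of the reduced cost matrix
tight <- which(reduced == 0, arr.ind = TRUE)
tight                                 # (3,1), (2,2), (1,3)

# The tight edges form a complete assignment, so primal and dual values agree
primal_value <- sum(cost[tight])      # 7 + 10 + 8 = 25
dual_value <- sum(u) + sum(v)         # 25
```

Because a complete assignment exists on tight edges only, the duals certify that 25 is optimal without any search.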
Different algorithms exploit this duality in different ways.
The Classics
Hungarian Algorithm (1955)
The algorithm everyone learns. Published by Harold Kuhn, based on work by Hungarian mathematicians Kőnig and Egerváry.
The idea: Maintain dual prices $(u, v)$ such that $u_i + v_j \le c_{ij}$ for all pairs. Edges where equality holds are “tight”: the only edges that can appear in an optimal solution.
The algorithm:
Initialize prices. Find tight edges.
Find a maximum matching using only tight edges.
If the matching is complete, done.
Otherwise, update prices to create new tight edges. Repeat.
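The first step can be sketched in a few lines of base R. This is only the initialization (row and column reduction), not a full Hungarian implementation; the matching search and price updates are omitted. On the small matrix below the tight edges already contain a complete matching, so no update is needed.

```r
cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)

# Initial prices: subtract each row minimum, then each column minimum
u <- apply(cost, 1, min)                     # 8 10 9
row_reduced <- sweep(cost, 1, u)             # subtract u[i] from row i
v <- apply(row_reduced, 2, min)              # 0 0 0
reduced <- sweep(row_reduced, 2, v)          # cost[i, j] - u[i] - v[j]

# Tight edges have zero reduced cost
tight <- reduced == 0

# Here every row has exactly one tight edge and the columns are distinct,
# so the tight edges form a complete matching
target <- apply(tight, 1, which)             # 3 2 1
total <- sum(cost[cbind(seq_len(3), target)])  # 27
```

In general the tight edges after initialization do not contain a complete matching, and that is where the repeated price updates (and the cost of the method) come in.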
Complexity: $O(n^3)$
The problem: That $O(n^3)$ hides a large constant. The price updates and augmenting path searches are expensive. For a 1000×1000 matrix, you might wait 10+ seconds.
cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)
result <- lap_solve(cost, method = "hungarian")
print(result)
#> Assignment Result
#> =================
#>
#> # A tibble: 3 × 3
#> source target cost
#> <int> <int> <dbl>
#> 1 1 3 8
#> 2 2 2 10
#> 3 3 1 9
#>
#> Total cost: 27
#> Method: hungarian
Hungarian works. It’s clean and easy to teach. But in 1987, two Dutch researchers found something faster.
Jonker-Volgenant Algorithm (1987)
Roy Jonker and Anton Volgenant asked: what if we start with a good guess and fix it?
The key insight: Column reduction. Before any sophisticated search, greedily assign each row to its cheapest available column. This often gets most of the matching right immediately.
The algorithm:
Column reduction: For each column, find the two smallest costs. The difference is the “advantage” of the best row.
Reduction transfer: Assign rows to columns, handling conflicts by dual variable updates.
Augmentation: For any remaining unmatched rows, use Dijkstra-style shortest path search.
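A simplified sketch of the initialization idea follows. It is not the full JV procedure: it only scans columns, records each column minimum as that column's price, and keeps the conflict-free picks, leaving reduction transfer and augmentation out. The function name `column_reduction` is hypothetical.

```r
column_reduction <- function(cost) {
  n <- nrow(cost)
  v <- numeric(n)                        # column prices
  row_to_col <- rep(NA_integer_, n)      # NA means the row is still unmatched
  for (j in rev(seq_len(n))) {
    i <- which.min(cost[, j])            # cheapest row for column j
    v[j] <- cost[i, j]
    # Keep the pick only if that row has not been claimed already
    if (is.na(row_to_col[i])) row_to_col[i] <- j
  }
  list(v = v, row_to_col = row_to_col)
}

cost <- matrix(c(10, 19, 8, 15, 10, 18, 7, 17, 13, 16, 9, 14, 12, 19, 8, 18),
               nrow = 4, byrow = TRUE)
init <- column_reduction(cost)
init$row_to_col   # 1 3 4 NA
init$v            # 10 16 7 14
```

Three of the four rows are matched before any search happens; only row 4 is left for the shortest-path augmentation step.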
Complexity: Still $O(n^3)$, but with a much smaller constant. Often 10-50× faster than Hungarian in practice.
set.seed(123)
n <- 100
cost <- matrix(runif(n * n, 0, 100), n, n)
result <- lap_solve(cost, method = "jv")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 149.09
JV became the de facto standard. For dense problems up to a few thousand rows, it’s hard to beat.
But JV has a limitation: it’s fundamentally serial. Each augmenting path depends on the previous. For very large problems, we need a different approach.
The Scaling Revolution
In the late 1980s, researchers discovered a powerful trick called ε-scaling. The idea: relax the optimality requirement. Instead of demanding exact answers at every step, tolerate a small error ε. Start with a large ε, which lets you make big sloppy steps and rapid progress. Then shrink ε over multiple phases until it’s essentially zero. Now you have an exact answer.
This transforms how the algorithm behaves. Large ε means big steps and rapid progress; small ε means careful refinement. The total work can end up being less than doing everything exactly from the start.
Four algorithms exploit this insight: Auction, CSA, Gabow-Tarjan, and Orlin-Ahuja.
Auction Algorithm (1988)
Dimitri Bertsekas asked: what if we thought of assignment as an economics problem?
The metaphor: Workers are buyers. Jobs are goods. Each job has a price. Workers bid for their favorite jobs. Prices rise when there’s competition. Equilibrium = optimal assignment.
The algorithm:
Each unassigned worker finds their best job (highest value minus price).
The worker bids: new price = old price + (best value - second-best value) + ε.
If someone else held that job, they become unassigned.
Repeat until everyone is assigned.
Why ε matters: Without ε, two workers could bid infinitely against each other, each raising the price by 0. The ε ensures progress.
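The bidding loop fits in a short base-R function. This is a minimal sketch with a fixed ε, written for cost minimization by treating the value of a job as its negative cost; it is not couplr's implementation, and `auction_lap` is a hypothetical name. The default ε below $1/n$ is the standard choice that makes the result optimal for integer costs.

```r
auction_lap <- function(cost, eps = 1 / (nrow(cost) + 1)) {
  n <- nrow(cost)
  price <- numeric(n)
  owner <- rep(NA_integer_, n)    # owner[j]: worker currently holding job j
  job_of <- rep(NA_integer_, n)   # job_of[i]: job currently held by worker i
  while (anyNA(job_of)) {
    i <- which(is.na(job_of))[1]             # next unassigned worker
    net <- -cost[i, ] - price                # value minus price for each job
    best <- which.max(net)
    second <- if (n > 1) max(net[-best]) else net[best]
    # Bid: raise the price by the advantage over the runner-up, plus eps
    price[best] <- price[best] + (net[best] - second) + eps
    prev <- owner[best]
    if (!is.na(prev)) job_of[prev] <- NA_integer_   # previous holder is outbid
    owner[best] <- i
    job_of[i] <- best
  }
  list(assignment = job_of,
       total_cost = sum(cost[cbind(seq_len(n), job_of)]),
       price = price)
}

cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)
res <- auction_lap(cost)
res$assignment   # 3 2 1
res$total_cost   # 27
```

Every bid raises a price by at least ε, which is what rules out the zero-increment bidding war described above.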
Complexity: $O(nm \log(nC))$ with ε-scaling, where $m$ is the number of edges and $C$ is the cost range.
couplr offers three Auction variants:
| Variant | method = | Key Feature |
|---|---|---|
| Standard | "auction" | Adaptive ε, queue-based |
| Scaled | "auction_scaled" | ε-scaling phases |
| Gauss-Seidel | "auction_gs" | Sequential sweep |
set.seed(123)
n <- 100
cost <- matrix(runif(n * n, 0, 100), n, n)
result <- lap_solve(cost, method = "auction")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 149.09
Auction shines for large dense problems. But it’s sensitive to ε. Get it wrong and performance degrades, or the algorithm cycles forever.
The next algorithm makes ε-scaling systematic.
Cost-Scaling Algorithm / CSA (1995)
Andrew Goldberg and Robert Kennedy asked: what if we scale ε automatically?
The idea: Start with $\varepsilon = C$, the largest cost. In each phase, halve ε and refine the current solution. After $O(\log(nC))$ phases, ε is essentially zero: optimality.
Why it’s fast: Each phase is cheap because the previous phase’s solution is a good starting point. The algorithm exploits its own progress.
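The phase structure can be illustrated with the ε schedule alone. The sketch below assumes ε starts at the largest cost magnitude and is halved until it drops below $1/n$, the usual threshold for integer costs; the refine step performed inside each phase is deliberately left out, and `eps_schedule` is a hypothetical helper.

```r
eps_schedule <- function(cost) {
  n <- nrow(cost)
  eps <- max(abs(cost))        # assumed starting value: largest cost magnitude
  schedule <- eps
  while (eps >= 1 / n) {       # stop once eps is below 1/n
    eps <- eps / 2
    schedule <- c(schedule, eps)
  }
  schedule
}

cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)
sched <- eps_schedule(cost)
sched
# 19 9.5 4.75 2.375 1.1875 0.59375 0.296875
length(sched) - 1   # 6 halvings, roughly log2(n * max cost)
```

The number of phases grows only logarithmically with the cost range, so even very large costs add just a handful of refinement passes.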
Complexity: amortized, often faster in practice.
set.seed(456)
n <- 100
cost <- matrix(runif(n * n, 0, 100), n, n)
result <- lap_solve(cost, method = "csa")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 192.48
CSA often wins benchmarks for medium-large dense problems. It’s the workhorse.
But there’s an even stranger approach: what if instead of scaling costs, you scaled bits?
Gabow-Tarjan Algorithm (1989)
Harold Gabow and Robert Tarjan developed a clever algorithm based on binary representations. It’s also one of the most complex to implement.
The insight: Integer costs have a natural scale: binary digits. Process costs from most significant to least significant bit. At each scale, solve a simpler problem. Use that solution to warm-start the next scale.
The algorithm (simplified):
Initialize at the coarsest scale (most significant bit only).
Double the scale: multiply all costs by 2. This “doubles” the current solution’s slack.
Restore 1-feasibility (see below).
Use Hungarian-style search for augmenting paths.
Repeat until all bits are processed.
What is 1-feasibility? Standard dual feasibility requires $c_{ij} - u_i - v_j \ge 0$ for all edges. 1-feasibility relaxes this: we allow $c_{ij} - u_i - v_j \ge -1$. The “1” comes from the current bit position. At each scaling phase, we only need reduced costs to be within 1 of optimal. When we refine to the next bit, we tighten the bound. After processing all bits, the slack is less than 1, which for integers means exactly 0: true optimality.
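Two small helpers make the scaling idea tangible. This is an illustrative sketch only, with hypothetical names: `bit_scales` lists the cost matrix truncated to its leading bits at each scale, and `is_1_feasible` checks the relaxed condition `cost[i, j] - u[i] - v[j] >= -1`. The augmenting-path work done at each scale is not shown.

```r
# Cost matrices seen at each scale, from most significant bit to full precision
bit_scales <- function(cost) {
  bits <- floor(log2(max(cost))) + 1          # bits in the largest cost
  lapply(rev(seq_len(bits) - 1), function(shift) cost %/% 2^shift)
}

# Relaxed dual feasibility: reduced costs may dip to -1 but no lower
is_1_feasible <- function(cost, u, v) {
  all(cost - outer(u, v, "+") >= -1)
}

cost <- matrix(c(13, 6, 9, 4), nrow = 2, byrow = TRUE)
scales <- bit_scales(cost)
scales[[1]]   # leading bit only: rows (1, 0) and (1, 0)
scales[[2]]   # two leading bits: rows (3, 1) and (2, 1)

# Moving to the next scale doubles each cost and appends one bit (0 or 1)
step <- scales[[2]] - 2 * scales[[1]]

is_1_feasible(cost, u = c(7, 5), v = c(0, 0))   # TRUE: reduced costs reach -1
is_1_feasible(cost, u = c(8, 4), v = c(0, 0))   # FALSE: one reduced cost is -2
```

Since each refinement changes a cost by at most one unit after doubling, a solution that was good at the previous scale is only slightly off at the next one, which is what keeps each phase cheap.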
Complexity: $O(\sqrt{n}\, m \log(nC))$ where $m$ is the number of edges and $C$ is the maximum cost.
Rarely seen outside academic papers. The bookkeeping across scaling phases is complex enough that most implementations skip it.
set.seed(42)
n <- 50
# Use integer costs with large range - Gabow-Tarjan's strength
cost <- matrix(sample(1:100000, n * n, replace = TRUE), n, n)
result <- lap_solve(cost, method = "gabow_tarjan")
cat("Total cost:", get_total_cost(result), "\n")
#> Total cost: 151632
Gabow-Tarjan is primarily of theoretical interest. It provides the best known worst-case bounds for integer costs. But there’s one more scaling algorithm, with even better theoretical complexity.
Orlin-Ahuja Algorithm (1992)
James Orlin and Ravindra Ahuja developed a double-scaling algorithm that scales both costs AND capacities simultaneously.
The insight: Cost-scaling alone gives $O(nm \log(nC))$. But if we also scale flow capacities, we can exploit the structure of sparse graphs. At each scale, we only need to push $O(\sqrt{n})$ units of flow before refining.
The algorithm:
Scale costs as in Gabow-Tarjan (process bits from high to low).
At each cost scale, use capacity scaling:
Start with large capacity increments $\Delta$
Find augmenting paths that can carry $\Delta$ flow
Halve $\Delta$ and repeat until $\Delta < 1$
The capacity scaling limits work per phase to $O(\sqrt{n})$ augmentations.
Why $\sqrt{n}$ appears: The assignment problem has $n$ units of flow total. With capacity scaling, each phase handles $O(\sqrt{n})$ flow units, and there are $O(\sqrt{n})$ phases per cost scale. This geometric structure yields the improved bound.
Complexity: $O(\sqrt{n}\, m \log(nC))$ where $m$ is the number of edges.
For sparse problems where $m \ll n^2$, this is dramatically better than $O(n^3)$.
set.seed(111)
n <- 50
cost <- matrix(sample(1:100000, n * n, replace = TRUE), n, n)
result <- lap_solve(cost, method = "orlin")
cat("Total cost:", get_total_cost(result), "\n")
#> Total cost: 123035
Orlin-Ahuja provides the best theoretical bounds for sparse problems with large cost ranges. The implementation complexity is substantial: maintaining blocking flows across scaling phases requires careful data structure engineering. In practice, the overhead often makes it slower than CSA for dense problems. But for large sparse instances, its asymptotic bound is the strongest of the scaling methods covered here.
That’s four scaling algorithms, each trading precision for speed in a different way. But there’s a completely different way to think about the problem entirely.
The Network View
Every algorithm so far thinks in terms of assignments: matching workers to jobs. But assignment problems are secretly flow problems.
Model the assignment as a network:
A source node connected to all workers (capacity 1 each)
Workers connected to jobs (with costs)
Jobs connected to a sink node (capacity 1 each)
Find minimum-cost flow of value $n$
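The construction can be written out as an edge list. The sketch below is illustrative: the node labels, the helper name `assignment_network`, and the unit capacity on worker-to-job edges are assumptions made for the example, not couplr internals.

```r
assignment_network <- function(cost) {
  n <- nrow(cost)
  workers <- paste0("w", seq_len(n))
  jobs <- paste0("j", seq_len(n))
  rbind(
    # Source feeds one unit to each worker at no cost
    data.frame(from = "source", to = workers, capacity = 1, cost = 0),
    # Worker-to-job edges carry the assignment costs (column-major order)
    data.frame(from = rep(workers, times = n), to = rep(jobs, each = n),
               capacity = 1, cost = as.vector(cost)),
    # Each job drains one unit to the sink at no cost
    data.frame(from = jobs, to = "sink", capacity = 1, cost = 0)
  )
}

cost <- matrix(c(10, 19, 8, 15, 10, 11, 9, 12, 14), nrow = 3, byrow = TRUE)
net <- assignment_network(cost)
nrow(net)                                   # 15 edges: n^2 + 2n
net[net$from == "w1" & net$to == "j3", ]    # the edge with cost 8
```

A flow of value $n$ through this network sends exactly one unit through each worker and each job, so its cost equals the cost of the corresponding assignment.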
This perspective gives us two more algorithms.
Network Simplex
The simplex method, specialized for networks. The key insight: for network flow problems, the simplex basis corresponds to a spanning tree of the graph. This makes basis operations (pivoting) much faster than general LP.
The algorithm:
Initialize: Find a spanning tree that supports a feasible flow (e.g., all flow through a single hub).
Compute potentials: For tree edges $(i, j)$, set $c_{ij} - \pi_i + \pi_j = 0$. This determines all node potentials $\pi$ uniquely (up to a constant).
Price non-tree edges: For each edge $(i, j)$, compute reduced cost $\bar{c}_{ij} = c_{ij} - \pi_i + \pi_j$.
Pivot: If any non-tree edge has $\bar{c}_{ij} < 0$, adding it to the tree creates a cycle. Push flow around the cycle until some tree edge saturates. Remove that edge; the non-tree edge enters the basis.
Repeat until all reduced costs are non-negative (optimality).
Why trees?: In a tree, there’s exactly one path between any two nodes. This means:
Flow is uniquely determined by supplies/demands
Potentials can be computed in $O(n)$ by tree traversal
Each pivot changes potentials only in the subtree cut off by the leaving edge, not at every node
Complexity: typical, strongly polynomial with anti-cycling rules.
Best for: Problems where you need dual variables for sensitivity analysis. Problems already formulated as network flows. Cases where you want guaranteed finite convergence.
set.seed(789)
n <- 100
cost <- matrix(runif(n * n, 0, 100), n, n)
result <- lap_solve(cost, method = "network_simplex")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 156.88
Network Simplex is a standard tool in operations research. Not always the fastest, but reliable and provides rich dual information.
Push-Relabel Algorithm
Goldberg and Tarjan’s push-relabel algorithm (1988), adapted for minimum-cost flow.
The key idea: Traditional algorithms maintain a valid flow and search for augmenting paths. Push-relabel inverts this: maintain a preflow (flow that may violate conservation at intermediate nodes) and work locally to eliminate violations.
Key concepts:
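Excess: $e(v) = \sum_{u} f(u, v) - \sum_{w} f(v, w)$ (inflow minus outflow). Preflow allows $e(v) > 0$ at non-sink nodes.
Height function: $h : V \to \mathbb{N}$ with $h(s) = |V|$, $h(t) = 0$, and $h(u) \le h(v) + 1$ for residual edges $(u, v)$.
Admissible edge: Residual edge $(u, v)$ where $h(u) = h(v) + 1$ (flow goes “downhill”).
The algorithm:
Initialize: Saturate all edges from source. Set $h(s) = |V|$, $h(v) = 0$ for $v \ne s$.
While any intermediate node $v$ has excess $e(v) > 0$:
Push: If admissible edge $(v, w)$ exists, push $\min(e(v), r(v, w))$ units of flow along it, where $r(v, w)$ is the residual capacity.
Relabel: If no admissible edge, set $h(v) = 1 + \min\{h(w) : (v, w) \text{ is a residual edge}\}$.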
When no excess remains at intermediate nodes, we have a valid maximum flow.
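The max-flow version of these steps is short enough to sketch in base R. This is the generic push-relabel method on a dense capacity matrix, written for clarity rather than speed; it is not the min-cost variant and not couplr's code, and `push_relabel_maxflow` is a hypothetical name. Flow is stored skew-symmetrically, so `cap - flow` gives residual capacities in both directions.

```r
push_relabel_maxflow <- function(cap, src, snk) {
  n <- nrow(cap)
  flow <- matrix(0, n, n)       # skew-symmetric: flow[v, w] == -flow[w, v]
  h <- numeric(n)               # heights
  e <- numeric(n)               # excess at each node
  h[src] <- n
  for (w in which(cap[src, ] > 0)) {        # saturate all source edges
    flow[src, w] <- cap[src, w]
    flow[w, src] <- -cap[src, w]
    e[w] <- e[w] + cap[src, w]
  }
  repeat {
    active <- setdiff(which(e > 0), c(src, snk))
    if (length(active) == 0) break
    v <- active[1]
    residual <- cap[v, ] - flow[v, ]
    admissible <- which(residual > 0 & h[v] == h + 1)
    if (length(admissible) > 0) {
      w <- admissible[1]                    # push along a downhill edge
      delta <- min(e[v], residual[w])
      flow[v, w] <- flow[v, w] + delta
      flow[w, v] <- flow[w, v] - delta
      e[v] <- e[v] - delta
      e[w] <- e[w] + delta
    } else {
      h[v] <- 1 + min(h[residual > 0])      # relabel: lift just enough
    }
  }
  e[snk]                                    # flow that reached the sink
}

# Node 1 = source, 2-4 = workers, 5-7 = jobs, 8 = sink
cap <- matrix(0, 8, 8)
cap[1, 2:4] <- 1
cap[cbind(c(2, 2, 3, 4), c(5, 6, 5, 7))] <- 1   # allowed worker-job pairs
cap[5:7, 8] <- 1
push_relabel_maxflow(cap, src = 1, snk = 8)     # 3: every worker is matched
```

In this run the first worker initially grabs the job that the second worker also needs; the excess at that job is pushed back and rerouted, which shows how local operations repair a bad early choice without any global path search.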
For minimum-cost flow: Modify to only push along edges with zero reduced cost, and relabel using potential updates. This gives the cost-scaling push-relabel variant.
Complexity: $O(V^2 E)$ for max-flow, $O(V^2 E \log(VC))$ for min-cost flow with cost scaling.
Strengths: Highly parallelizable since pushes are local operations. Excellent cache behavior. Dominates in practice for max-flow; competitive for min-cost flow on dense graphs.
set.seed(222)
n <- 100
cost <- matrix(runif(n * n, 0, 100), n, n)
result <- lap_solve(cost, method = "push_relabel")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 169.7
Two network perspectives. Same problem. Different algorithmic approaches.
But all these algorithms assume dense, square matrices. Real problems are messier.
The Specialists
HK01: Binary Costs
When costs are only 0 or 1, we don’t need the full machinery.
The algorithm: Hopcroft-Karp for maximum cardinality matching, run on zero-cost edges first. Then add 1-cost edges as needed.
Complexity: $O(n^{2.5})$ for binary costs.
set.seed(101)
n <- 100
cost <- matrix(sample(0:1, n^2, replace = TRUE, prob = c(0.3, 0.7)), n, n)
result <- lap_solve(cost, method = "hk01")
cat("Total cost:", get_total_cost(result), "\n")
#> Total cost: 0
When you have binary costs and large $n$, HK01 is dramatically faster.
SAP and LAPMOD: Sparse Problems
When 80% of entries are forbidden (Inf or NA), why store them?
SAP (Shortest Augmenting Path) and LAPMOD use sparse representations: adjacency lists instead of dense matrices.
Complexity: where is the number of allowed edges.
set.seed(789)
n <- 100
cost <- matrix(Inf, n, n)
edges <- sample(1:(n^2), floor(0.2 * n^2)) # Only 20% allowed
cost[edges] <- runif(length(edges), 0, 100)
result <- lap_solve(cost, method = "sap")
cat("Total cost:", round(get_total_cost(result), 2), "\n")
#> Total cost: 794.94
For very sparse problems, SAP can be orders of magnitude faster than dense algorithms.
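What a sparse representation looks like can be shown with base R alone. The sketch below converts a dense matrix with `Inf` for forbidden pairs into an edge list and an adjacency list; it illustrates the data layout only and makes no claim about how couplr stores sparse problems internally.

```r
# A 4 x 4 problem where only 6 of the 16 pairs are allowed
cost <- matrix(Inf, 4, 4)
cost[cbind(c(1, 1, 2, 3, 4, 4), c(1, 3, 2, 3, 1, 4))] <- c(5, 2, 7, 1, 4, 6)

# Edge list: one record per allowed (row, col) pair
allowed <- which(is.finite(cost), arr.ind = TRUE)
edges <- data.frame(row = allowed[, "row"],
                    col = allowed[, "col"],
                    cost = cost[allowed])

# Adjacency list: the columns each row may be matched to
adjacency <- split(edges$col, edges$row)

nrow(edges)        # 6 stored edges instead of 16 dense entries
adjacency[["1"]]   # 1 3
adjacency[["4"]]   # 1 4
```

A search over this structure touches only the allowed pairs, which is where the savings on very sparse inputs come from.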
Ramshaw-Tarjan: Rectangular Problems (2012)
Most algorithms assume square matrices. When you have $n$ workers and $m$ jobs with $n < m$, standard approaches pad with $m - n$ dummy workers at zero cost. This works but wastes effort: the algorithm processes $m^2$ entries when only $nm$ matter.
Ramshaw and Tarjan (2012) developed an algorithm that handles rectangularity natively by exploiting the structure of unbalanced bipartite graphs.
The key insight: In a rectangular assignment, we match all $n$ rows but only $n$ of the $m$ columns. The dual problem has different structure: row duals are unconstrained, but column duals must satisfy $v_j = 0$ for unmatched columns.
The algorithm:
Maintain dual variables $(u, v)$ with $u_i + v_j \le c_{ij}$ and $v_j = 0$ for free columns.
Use a modified Dijkstra search that respects the asymmetric dual constraints.
When augmenting, update duals to preserve feasibility without padding.
Complexity: using Fibonacci heaps, or with simpler structures.
For highly rectangular problems (e.g., matching 100 treatments to 10,000 controls), this avoids the cost of padding to square.
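The overhead of padding is easy to quantify. This base-R sketch builds the padded square matrix that a square-only solver would need for a 30 × 100 problem and compares the number of entries; it does not call any solver.

```r
set.seed(333)
n_rows <- 30
n_cols <- 100
cost <- matrix(runif(n_rows * n_cols, 0, 100), n_rows, n_cols)

# Pad with zero-cost dummy rows until the matrix is square
padded <- rbind(cost, matrix(0, n_cols - n_rows, n_cols))

dim(padded)                    # 100 100
length(cost)                   # 3000 entries that actually matter
length(padded)                 # 10000 entries a square solver must process
length(padded) / length(cost)  # about 3.3x more work before any search starts
```

For the 100 × 10,000 case mentioned above the ratio is 100 to 1, so the padded problem is dominated by dummy entries.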
set.seed(333)
n_rows <- 30
n_cols <- 100 # Highly rectangular: 30 × 100
cost <- matrix(runif(n_rows * n_cols, 0, 100), n_rows, n_cols)
result <- lap_solve(cost, method = "ramshaw_tarjan")
# The result is a tibble with source/target/cost columns, so count matched targets
cat("Matched", sum(!is.na(result$target)), "of", n_rows, "rows\n")
#> Matched 30 of 30 rows
When you have significantly more columns than rows (or vice versa), Ramshaw-Tarjan avoids the wasted work of padding. It is the newest algorithm in couplr’s collection and is essential for large-scale matching problems where treatment and control pools have very different sizes.
Beyond Standard Assignment
couplr includes specialized solvers for variations on the assignment problem.
K-Best Solutions (Murty’s Algorithm)
What if you want the 2nd best assignment? The 3rd best? The k-th best?
cost <- matrix(c(10, 19, 8, 15, 10, 18, 7, 17, 13, 16, 9, 14, 12, 19, 8, 18),
nrow = 4, byrow = TRUE)
kbest <- lap_solve_kbest(cost, k = 5)
summary(kbest)
#> # A tibble: 5 × 4
#> rank solution_id total_cost n_assignments
#> <int> <int> <dbl> <int>
#> 1 1 1 49 4
#> 2 2 2 50 4
#> 3 3 3 50 4
#> 4 4 4 51 4
#> 5 5 5 51 4
Use cases: Sensitivity analysis. Alternative plans when the optimal is infeasible. Understanding how costs affect solutions.
Bottleneck Assignment
Minimize the maximum edge cost instead of the sum.
cost <- matrix(c(5, 9, 2, 10, 3, 7, 8, 4, 6), nrow = 3, byrow = TRUE)
result <- bottleneck_assignment(cost)
cat("Bottleneck (max edge):", result$bottleneck, "\n")
#> Bottleneck (max edge): 6
Use cases: Load balancing. Fairness constraints. Worst-case optimization.
Sinkhorn: Soft Assignment
Entropy-regularized optimal transport. Instead of hard 0/1 assignment, produce a doubly-stochastic transport plan.
cost <- matrix(c(1, 2, 3, 4), nrow = 2)
result <- sinkhorn(cost, lambda = 10)
print(round(result$transport_plan, 3))
#> [,1] [,2]
#> [1,] 0.25 0.25
#> [2,] 0.25 0.25
Use cases: Probabilistic matching. Domain adaptation. Wasserstein distances.
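The iteration behind a soft assignment is only a few lines. The sketch below is a plain base-R version of the Sinkhorn scaling loop, under two assumptions that are mine rather than confirmed couplr behaviour: `lambda` multiplies the cost inside the kernel `exp(-lambda * cost)`, and both marginals are uniform and sum to 1. `sinkhorn_plan` is a hypothetical name.

```r
sinkhorn_plan <- function(cost, lambda, n_iter = 100) {
  n <- nrow(cost)
  m <- ncol(cost)
  a <- rep(1 / n, n)                 # assumed uniform row marginals
  b <- rep(1 / m, m)                 # assumed uniform column marginals
  K <- exp(-lambda * cost)           # Gibbs kernel
  u <- rep(1, n)
  v <- rep(1, m)
  for (iter in seq_len(n_iter)) {
    u <- a / as.vector(K %*% v)      # rescale rows to match a
    v <- b / as.vector(t(K) %*% u)   # rescale columns to match b
  }
  # plan[i, j] = u[i] * K[i, j] * v[j]
  (u * K) * rep(v, each = n)
}

cost <- matrix(c(1, 2, 3, 4), nrow = 2)
plan <- sinkhorn_plan(cost, lambda = 10)
round(plan, 3)    # every entry is 0.25
rowSums(plan)     # 0.5 0.5
```

The uniform plan is expected here: this cost matrix is a sum of a row term and a column term, so every plan with the same marginals has the same transport cost and the entropy term alone picks the most spread-out one.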
Dual Variables
Extract dual prices for sensitivity analysis.
cost <- matrix(c(10, 19, 8, 15, 10, 18, 7, 17, 13), nrow = 3, byrow = TRUE)
result <- assignment_duals(cost)
cat("Row duals (u):", result$u, "\n")
#> Row duals (u): 8 10 7
cat("Col duals (v):", result$v, "\n")
#> Col duals (v): 0 0 0
Use cases: Shadow prices. Identifying critical assignments. Marginal cost analysis.
The Benchmark
You’ve seen what the algorithms do. Now: how fast?
For dense matrices: CSA and JV are consistently fastest. Hungarian falls behind rapidly. Auction and Network Simplex are solid middle-ground choices.
For sparse matrices: SAP and LAPMOD are 10× faster than dense algorithms. Use them.
Quick Reference
| Algorithm | Complexity | Best For | Method |
|---|---|---|---|
| Hungarian | $O(n^3)$ | Pedagogy, small $n$ | "hungarian" |
| Jonker-Volgenant | $O(n^3)$ expected | General purpose | "jv" |
| Auction | | Large dense | "auction" |
| Auction (Gauss-Seidel) | | Spatial structure | "auction_gs" |
| Auction (Scaled) | | Large dense, fastest | "auction_scaled" |
| CSA | amortized | Medium-large dense | "csa" |
| Cost-Scaling Flow | | General min-cost flow | "csflow" |
| Gabow-Tarjan | $O(\sqrt{n}\, m \log(nC))$ | Large integer costs | "gabow_tarjan" |
| Orlin-Ahuja | $O(\sqrt{n}\, m \log(nC))$ | Large sparse | "orlin" |
| Network Simplex | typical | Dual info needed | "network_simplex" |
| Push-Relabel | | Max-flow style | "push_relabel" |
| Cycle Canceling | | Theoretical interest | "cycle_cancel" |
| HK01 | | Binary costs only | "hk01" |
| SSAP (Dial) | | Integer costs, buckets | "ssap_bucket" |
| SAP | | Sparse (>50% forbidden) | "sap" |
| LAPMOD | | Sparse (>50% forbidden) | "lapmod" |
| Ramshaw-Tarjan | | Rectangular matrices | "ramshaw_tarjan" |
| Brute Force | $O(n!)$ | Tiny $n$ | "bruteforce" |
Or just use method = "auto" and let couplr choose.
References
Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly.
Jonker, R., & Volgenant, A. (1987). A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing.
Bertsekas, D. P. (1988). The auction algorithm: A distributed relaxation method. Annals of Operations Research.
Gabow, H. N., & Tarjan, R. E. (1989). Faster scaling algorithms for network problems. SIAM Journal on Computing.
Goldberg, A. V., & Kennedy, R. (1995). An efficient cost scaling algorithm for the assignment problem. Mathematical Programming.
Orlin, J. B., & Ahuja, R. K. (1992). New scaling algorithms for the assignment and minimum mean cycle problems. Mathematical Programming.
Ramshaw, L., & Tarjan, R. E. (2012). On minimum-cost assignments in unbalanced bipartite graphs. HP Labs Technical Report.
Goldberg, A. V., & Tarjan, R. E. (1988). A new approach to the maximum-flow problem. Journal of the ACM.
Murty, K. G. (1968). An algorithm for ranking all assignments in order of increasing cost. Operations Research.
Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. NeurIPS.
Burkard, R., Dell’Amico, M., & Martello, S. (2009). Assignment Problems. SIAM.