The Mathematical Theory of Optimal Transport and Its Applications
Introduction
Optimal transport is a beautiful mathematical theory that addresses a fundamental question: What is the most efficient way to move mass from one distribution to another? Originally formulated by Gaspard Monge in 1781 in the context of earthworks, this theory has experienced a renaissance in recent decades and now impacts numerous fields from economics to machine learning.
Historical Development
Monge's Original Problem (1781)
Monge asked: Given a pile of soil (source) and an excavation to fill (target), what is the cheapest way to transport the soil? Formally, given two probability measures μ and ν, find a transport map T that pushes μ forward to ν while minimizing the total transport cost.
Kantorovich's Relaxation (1942)
Leonid Kantorovich generalized Monge's problem by allowing "mass splitting," transforming the problem into a linear programming formulation. This relaxation made the problem more tractable and earned Kantorovich the Nobel Prize in Economics in 1975.
Mathematical Formulation
The Monge Problem
Given:
- A source measure μ on a space X
- A target measure ν on a space Y
- A cost function c(x, y) representing the cost of moving a unit of mass from x to y
Find a transport map T: X → Y that minimizes:
∫ c(x, T(x)) dμ(x)
subject to T#μ = ν (T pushes μ forward to ν)
The Kantorovich Problem
Instead of a deterministic map, consider transport plans γ (joint probability measures on X × Y with marginals μ and ν):
inf {∫∫ c(x,y) dγ(x,y) : γ ∈ Π(μ,ν)}
where Π(μ,ν) is the set of all couplings with marginals μ and ν.
Wasserstein Distances
When c(x,y) = d(x,y)^p for a metric d, the optimal transport cost defines the Wasserstein-p distance:
W_p(μ,ν) = (inf_{γ∈Π(μ,ν)} ∫∫ d(x,y)^p dγ(x,y))^(1/p)
This provides a natural metric on probability measures, turning the space of probability distributions into a metric space.
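For intuition, the Wasserstein-1 distance between one-dimensional empirical measures can be computed directly; a minimal sketch using scipy's `wasserstein_distance` (the two three-point samples are illustrative):

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two empirical 1-D distributions with equal weights on each point.
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])

# W_1 between the empirical measures: every point must move by 1,
# so the total (and average) transport cost is 1.
d = wasserstein_distance(x, y)
print(d)  # 1.0
```

In one dimension the optimal plan never crosses mass, so W_p reduces to comparing quantile functions, which is why this computation is so cheap.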
Key Theoretical Results
Brenier's Theorem (1991)
For the quadratic cost c(x,y) = |x-y|²/2 on ℝⁿ, if μ is absolutely continuous with respect to Lebesgue measure, then there exists a μ-almost-everywhere unique optimal transport map, and it is the gradient of a convex function: T(x) = ∇φ(x).
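In one dimension, "gradient of a convex function" means a nondecreasing map, and the Brenier map is the monotone rearrangement T = F_ν⁻¹ ∘ F_μ. A minimal numpy sketch for equally weighted samples (the Gaussian test data is illustrative):

```python
import numpy as np

# In 1-D the Brenier map for quadratic cost is the monotone rearrangement
# T = F_nu^{-1} o F_mu; for equally weighted samples this simply pairs the
# i-th smallest source point with the i-th smallest target point.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)            # samples from mu
y = rng.normal(loc=3.0, size=1000)   # samples from nu (mu shifted by 3)

order = np.argsort(x)
T = np.empty_like(x)
T[order] = np.sort(y)                # T(x_i) = matching target quantile

# The map is nondecreasing, i.e. the derivative of a convex function in 1-D.
assert np.all(np.diff(T[order]) >= 0)
# For translated Gaussians the optimal map is (approximately) a shift by 3.
print(np.mean(T - x))                # close to 3
```

Sorting both samples is an O(n log n) shortcut that only works in one dimension; in higher dimensions computing ∇φ requires solving the full transport problem.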
Monge-Ampère Equation
When μ and ν are identified with their densities, the Brenier potential φ satisfies a nonlinear PDE called the Monge-Ampère equation:

det(D²φ(x)) · ν(∇φ(x)) = μ(x)
This connects optimal transport to the theory of fully nonlinear elliptic PDEs.
Benamou-Brenier Formula
The Wasserstein-2 distance can be computed via:
W_2²(μ,ν) = inf ∫₀¹ ∫ |v_t(x)|² dμ_t(x) dt
where the infimum is over curves of measures μ_t and velocity fields v_t satisfying the continuity equation ∂_t μ_t + ∇·(μ_t v_t) = 0, with μ_0 = μ and μ_1 = ν.
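The curve realizing this infimum is McCann's displacement interpolation. In one dimension it has a closed form: the quantile function of μ_t is the linear interpolation of the endpoint quantile functions. A minimal sketch (the three-point measures are illustrative):

```python
import numpy as np

# Displacement interpolation, the geodesic realizing the Benamou-Brenier
# infimum: in 1-D, the quantiles of mu_t are (1 - t) * Q_mu + t * Q_nu.
x = np.array([0.0, 1.0, 2.0])   # sorted support of mu (equal weights)
y = np.array([4.0, 5.0, 6.0])   # sorted support of nu (equal weights)

t = 0.5
q_t = (1 - t) * x + t * y       # support of the midpoint measure mu_{1/2}
print(q_t)                      # [2. 3. 4.]: each atom halfway along its path
```

Note the contrast with the naive mixture (μ + ν)/2, which would keep mass at both endpoints; the Wasserstein geodesic instead moves each atom along a straight line.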
Applications
1. Economics and Game Theory
- Matching problems: Optimal assignment of workers to jobs
- Hedonic pricing: Understanding how product attributes determine prices
- Market equilibrium: Analyzing competitive equilibria in matching markets
2. Machine Learning and Data Science
Generative Adversarial Networks (GANs)
- Wasserstein GANs use optimal transport distances for more stable training
- Optimal transport provides meaningful loss functions even when distributions have disjoint supports

Domain Adaptation
- Transporting knowledge from source to target domains
- Color transfer between images using optimal transport maps

Clustering and Classification
- Wasserstein barycenters for averaging distributions
- Document classification using the earth mover's distance
3. Image Processing and Computer Vision
Image Registration
- Aligning medical images using optimal transport
- Non-rigid image matching

Texture Synthesis
- Generating textures by transporting exemplar distributions
- Style transfer in neural networks

Shape Analysis
- Comparing shapes via their mass distributions
- Interpolation between shapes
4. Computational Biology
Single-Cell Genomics
- Comparing cell populations across conditions
- Trajectory inference in developmental biology
- Waddington-OT for understanding cell differentiation

Population Genetics
- Analyzing genetic drift using optimal transport
- Comparing genomic distributions
5. Fluid Dynamics and Physics
Incompressible Euler Equations
- Geometric formulation as geodesics in Wasserstein space
- Understanding turbulence and fluid mixing

Plasma Physics
- Particle transport in fusion reactors
6. Urban Planning and Logistics
Transportation Networks
- Optimizing public transit routes
- Facility location problems
- Supply chain optimization

Traffic Flow
- Modeling congestion using mean-field games on Wasserstein space
7. Statistics and Probability
Goodness-of-Fit Tests
- Two-sample testing using Wasserstein distances
- Sensitive to geometric differences between distributions that traditional tests can miss

Uncertainty Quantification
- Comparing probability distributions in Bayesian inference
- Robust optimization under distributional uncertainty
8. Gradient Flows and PDEs
Many important PDEs can be viewed as gradient flows in Wasserstein space:
- Heat equation: gradient flow of the entropy
- Fokker-Planck equation: describes diffusion processes
- Porous medium equation: models groundwater flow
This perspective provides new analytical tools and numerical methods.
Computational Methods
Linear Programming
For discrete measures, optimal transport reduces to a linear program solvable by:
- The simplex method
- Network flow algorithms
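Concretely, the discrete Kantorovich problem minimizes ⟨C, P⟩ over plans P with prescribed row and column sums. A minimal sketch using scipy's `linprog` (the tiny two-point measures and cost matrix are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Kantorovich problem as a linear program:
# minimize <C, P> subject to row sums = a, column sums = b, P >= 0.
a = np.array([0.5, 0.5])          # source weights
b = np.array([0.5, 0.5])          # target weights
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])        # cost of moving mass from i to j

n, m = C.shape
A_eq = np.zeros((n + m, n * m))   # marginal constraints on the flattened plan
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # row i sums to a[i]
for j in range(m):
    A_eq[n + j, j::m] = 1.0            # column j sums to b[j]
b_eq = np.concatenate([a, b])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
P = res.x.reshape(n, m)
print(P)        # diagonal plan: mass stays where moving is free
print(res.fun)  # optimal cost 0.0
```

The dense constraint matrix here is only for clarity; serious solvers exploit the bipartite network structure instead.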
Sinkhorn Algorithm
Adding entropic regularization enables fast computation:
- Alternating projections (Sinkhorn-Knopp)
- Roughly O(n²) work per iteration, versus O(n³ log n) for exact linear-programming solvers
- Widely used in machine learning applications
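The Sinkhorn-Knopp iteration alternately rescales the rows and columns of a Gibbs kernel until both marginals match. A minimal numpy sketch (the regularization strength `eps`, iteration count, and two-point example are illustrative choices):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """Entropy-regularized OT via Sinkhorn-Knopp scaling (a sketch)."""
    K = np.exp(-C / eps)                # Gibbs kernel from the cost matrix
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)               # rescale to match column marginals
        u = a / (K @ v)                 # rescale to match row marginals
    return u[:, None] * K * v[None, :]  # diagonal scalings give the plan

a = np.array([0.5, 0.5])
b = np.array([0.5, 0.5])
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
P = sinkhorn(a, b, C)
print(P.sum(axis=1))  # row marginals match a
print((P * C).sum())  # near 0, the unregularized optimum for this cost
```

Each iteration is two matrix-vector products, which is where the per-iteration O(n²) cost comes from; in practice the divisions are done in log-space for numerical stability at small `eps`.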
Semi-Discrete Transport
When one measure is discrete and one is continuous:
- Reduces to solving a convex optimization problem
- Applications in quantization and clustering
Recent Developments
Computational Optimal Transport
- GPU implementations of Sinkhorn algorithm
- Multi-scale methods for large problems
- Neural network parameterizations of transport maps
Unbalanced Optimal Transport
Relaxing the mass conservation constraint:
- Hellinger-Kantorovich distance
- Applications where sources and targets have different total mass
Optimal Transport on Graphs and Networks
- Discrete optimal transport for network data
- Applications in graph matching and network alignment
Quantum Optimal Transport
- Extending classical OT to quantum states
- Applications in quantum information theory
Challenges and Open Problems
- Computational Complexity: Exact computation scales poorly with the number of support points
- Curse of Dimensionality: Statistical estimation rates degrade in high dimensions
- Non-Euclidean Spaces: Extending theory to manifolds and metric spaces
- Dynamical Formulations: Understanding time-dependent optimal transport
- Stochastic Problems: Incorporating uncertainty in the transport problem
Conclusion
Optimal transport has evolved from an 18th-century engineering problem into a central tool in modern mathematics, connecting geometry, analysis, probability, and PDEs. Its applications span an impressive range of fields, from theoretical physics to practical machine learning. The theory continues to develop rapidly, driven by computational advances and new application domains.
The elegance of optimal transport lies in its ability to provide both:
- Theoretical insights: deep connections between different areas of mathematics
- Practical tools: efficient algorithms for real-world problems
As computational power increases and new applications emerge, optimal transport theory is likely to play an increasingly important role in data science, artificial intelligence, and scientific computing.