Theses

Leveraging regularization, projections and elliptical distributions in optimal transport

Abstract : Comparing and matching probability distributions is a crucial task in numerous machine learning (ML) algorithms. Optimal transport (OT) defines divergences between distributions that are grounded in geometry: starting from a cost function on the underlying space, OT consists in finding a mapping or coupling between both measures that is optimal with respect to that cost. This geometric grounding makes OT particularly well suited to ML. Further, OT is the object of a rich mathematical theory. Despite those advantages, the applications of OT in data sciences have long been hindered by the mathematical and computational complexities of the underlying optimization problem. To circumvent these issues, one approach consists in focusing on particular cases that admit closed-form solutions or that can be efficiently solved. In particular, OT between elliptical distributions is one of the very few instances for which OT is available in closed form, defining the so-called Bures-Wasserstein (BW) geometry. This thesis builds extensively on the BW geometry, with the aim of using it as a basic tool in data science applications. To do so, we consider settings in which it is in turn employed as a basic tool for representation learning, enhanced using subspace projections, and smoothed further using entropic regularization.
In a first contribution, the BW geometry is used to define embeddings as elliptical probability distributions, extending the classical representation of data as vectors in R^d. In the second contribution, we prove the existence of transportation maps and plans that extrapolate maps restricted to lower-dimensional projections, and show that subspace-optimal plans admit closed forms in the case of Gaussian measures. Our third contribution consists in deriving closed forms for entropic OT between Gaussian measures scaled with a varying total mass, which constitute the first non-trivial closed forms for entropic OT and provide the first continuous test case for the study of entropic OT. Finally, in a last contribution, entropic OT is leveraged to tackle missing data imputation in a non-parametric and distribution-preserving way.
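
As an illustration of the closed form mentioned in the abstract, the sketch below evaluates the squared 2-Wasserstein (Bures-Wasserstein) distance between two Gaussian measures N(m_a, S_a) and N(m_b, S_b), using the standard formula ||m_a - m_b||^2 + tr(S_a + S_b - 2 (S_a^{1/2} S_b S_a^{1/2})^{1/2}). It is a minimal sketch and not code from the thesis: the function names psd_sqrt and bures_wasserstein_sq are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix (hypothetical helper)."""
    vals, vecs = np.linalg.eigh(mat)
    # Clip tiny negative eigenvalues that can appear from round-off.
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T


def bures_wasserstein_sq(mean_a, cov_a, mean_b, cov_b):
    """Squared 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    cov_a = np.asarray(cov_a, dtype=float)
    cov_b = np.asarray(cov_b, dtype=float)
    root_a = psd_sqrt(cov_a)
    # Cross term (S_a^{1/2} S_b S_a^{1/2})^{1/2} of the Bures metric.
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    bures_sq = np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    return float(np.sum((mean_a - mean_b) ** 2) + bures_sq)


if __name__ == "__main__":
    # Two isotropic Gaussians in the plane, chosen only for illustration.
    print(bures_wasserstein_sq([0.0, 0.0], np.eye(2), [3.0, 4.0], 4.0 * np.eye(2)))
```

When the two covariance matrices commute, the covariance term reduces to the squared Frobenius distance between their square roots, which gives a simple sanity check for the implementation.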
https://tel.archives-ouvertes.fr/tel-03084452
Contributor : Abes Star
Submitted on : Monday, December 21, 2020 - 10:46:08 AM
Last modification on : Thursday, May 6, 2021 - 4:19:44 PM

File

91826_MUZELLEC_2020_archivage....
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-03084452, version 1

Citation

Boris Muzellec. Leveraging regularization, projections and elliptical distributions in optimal transport. Optimization and Control [math.OC]. Institut Polytechnique de Paris, 2020. English. ⟨NNT : 2020IPPAG009⟩. ⟨tel-03084452⟩
