Aditya Makkar
Research

I am broadly interested in theoretical machine learning, nonparametric statistics and probability theory. The following are some of the topics I am working on right now:

Statistical Optimal Transport

Suppose \(X\) and \(Y\) are \(\mathbb R^d\)-valued random vectors on a probability space \((\Omega, \mathscr F, \mathbb P)\) with distributions \(\mu := \mathbb P \circ X^{-1}\) and \(\nu := \mathbb P \circ Y^{-1}\) respectively. It is often of interest to find, if it exists, a measurable map \(T \colon \mathbb R^d \to \mathbb R^d\) such that the pushforward \(T_{\#} \mu := \mu \circ T^{-1}\) of \(\mu\) under \(T\) equals \(\nu.\) It is possible that no such map exists: for example, if \(\mu = \delta_{x_0}\) for some \(x_0 \in \mathbb R^d\) and \(\nu\) is not a Dirac measure, then no map works, because for any measurable \(T\colon \mathbb R^d \to \mathbb R^d\) the pushforward \(T_{\#}\mu\) is \(\delta_{T(x_0)}.\)
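As a concrete one-dimensional illustration (the distributions and the map below are choices of mine, not from the text): if \(\mu\) is the uniform distribution on \((0,1)\) and \(T(x) = -\log(1-x)\) is the quantile function of the \(\mathrm{Exp}(1)\) distribution, then \(T_{\#}\mu = \mathrm{Exp}(1).\) A quick numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

# mu = Uniform(0, 1); T(x) = -log(1 - x) is the quantile function of
# Exp(1), so the pushforward T_# mu is the Exp(1) distribution.
def T(x):
    return -np.log1p(-x)

x = rng.uniform(size=100_000)   # samples from mu
y = T(x)                        # samples from T_# mu

# Exp(1) has mean 1 and variance 1; the pushed-forward
# samples should match both closely.
print(y.mean(), y.var())
```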

On the other hand, if multiple such maps exist, which one should we prefer? In Monge's formulation of the optimal transport problem, we are given a measurable cost function \(c \colon \mathbb R^d \times \mathbb R^d \to [0, \infty]\), and we want to find an optimal map

\[\begin{aligned} T_0 \in \argmin_{T_{\#}\mu = \nu} \int_{\mathbb R^d} c(x, T(x)) \, \mathrm{d}\mu(x),\end{aligned}\]
provided, of course, that it exists. It is not obvious when such a map exists; however, the following refinement by (McCann, 1995) of a result discovered independently by (Knott and Smith, 1984) and by (Brenier, 1987) tells us when an optimal map exists for the quadratic cost. As a piece of terminology, we call a set \(A \subseteq \mathbb R^d\) small if it has Hausdorff dimension at most \(d-1.\) So, for example, small sets are Lebesgue-negligible, but not conversely.

Theorem: Let \(\mu\) and \(\nu\) be two Borel probability measures on \(\mathbb R^d\), with \(d\) a positive integer, and let the cost function \(c\colon \mathbb R^d \times \mathbb R^d \to [0, \infty]\) be the squared Euclidean norm \(c(x,y) = \left\lVert x-y\right\rVert^2.\) Suppose that \(\mu\) does not give mass to small sets. Then there is exactly one (up to \(\mu\)-a.e. equality) map of the form \(T = \nabla \varphi\), with \(\varphi \colon \mathbb R^d \to \mathbb R\) convex, such that \(T_{\#} \mu = \nu\); moreover, when \(\mu\) and \(\nu\) have finite second moments, this \(T\) solves Monge's problem for \(c.\)
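A small numerical sketch of the theorem in one dimension (the Gaussian parameters below are arbitrary choices of mine): between \(N(m_1, s_1^2)\) and \(N(m_2, s_2^2)\), the increasing affine map \(T(x) = m_2 + (s_2/s_1)(x - m_1)\) is the gradient of the convex function \(\varphi(x) = m_2(x - m_1) + \frac{s_2}{2 s_1}(x - m_1)^2,\) and it pushes the first Gaussian onto the second:

```python
import numpy as np

rng = np.random.default_rng(1)

# mu = N(m1, s1^2), nu = N(m2, s2^2); T is increasing and affine,
# hence the gradient of a convex (quadratic) function.
m1, s1 = 0.0, 1.0
m2, s2 = 3.0, 2.0

def T(x):
    return m2 + (s2 / s1) * (x - m1)

x = rng.normal(m1, s1, size=200_000)  # samples from mu
y = T(x)                              # samples from T_# mu

# The pushed-forward samples should have mean ~ m2 and std ~ s2.
print(y.mean(), y.std())
```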

In statistical optimal transport, the measures \(\mu\) and \(\nu\) are not known explicitly; instead, only independent samples from them are observed. To that end, let \(X_1, X_2, \ldots\) be independent copies of \(X,\) and let \(Y_1, Y_2, \ldots\) be independent copies of \(Y.\) Let \(\mu_n := \frac{1}{n} \sum_{i=1}^n \delta_{X_i}\) denote the empirical measure based on the sample \((X_1, \ldots, X_n)\) of the first \(n\) observations, and likewise for \(\nu_n.\)
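When \(\mu_n\) and \(\nu_n\) each consist of \(n\) equally weighted atoms, Monge's problem between them reduces to a linear assignment problem over permutations of the atoms. A minimal sketch using SciPy (the two sample distributions here are hypothetical, purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n, d = 200, 2

# Hypothetical samples defining the empirical measures mu_n and nu_n.
X = rng.normal(size=(n, d))            # X_1, ..., X_n
Y = rng.normal(loc=2.0, size=(n, d))   # Y_1, ..., Y_n

# Quadratic cost matrix C[i, j] = ||X_i - Y_j||^2.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# Optimal matching X_i -> Y_{cols[i]}: the empirical Monge map.
rows, cols = linear_sum_assignment(C)
cost = C[rows, cols].mean()  # empirical transport cost
print(cost)
```

The optimal matching is, by construction, at least as cheap as any other pairing of the atoms, e.g. the identity pairing `X_i -> Y_i`.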

Deep Learning

References

Professional Activities
Reviewer

NeurIPS 2022, AISTATS 2022, ICLR 2022, NeurIPS 2021, AAAI 2021, ICML 2020

Teaching

Fall 2021, Spring 2021 — Master's-level introductory course on machine learning