Aditya Makkar
Research

I am broadly interested in theoretical machine learning, nonparametric statistics and probability theory. The following are some of the topics I am working on right now:

Statistical Optimal Transport

Suppose \(X\) and \(Y\) are \(\mathbb R^d\)-valued random vectors on a probability space \((\Omega, \mathscr F, \mathbb P)\) with distributions \(\mu := \mathbb P \circ X^{-1}\) and \(\nu := \mathbb P \circ Y^{-1}\) respectively. It is often of interest to find, if it exists, a measurable map \(T \colon \mathbb R^d \to \mathbb R^d\) such that the pushforward \(T_{\#} \mu := \mu \circ T^{-1}\) of \(\mu\) under \(T\) equals \(\nu.\) It is possible that no such map exists: for example, if \(\mu = \delta_{x_0}\) for some \(x_0 \in \mathbb R^d\) and \(\nu\) is not a Dirac measure, then no map works, because for any measurable \(T\colon \mathbb R^d \to \mathbb R^d\) the pushforward \(T_{\#}\mu\) is \(\delta_{T(x_0)}.\)
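As a concrete one-dimensional illustration (the distributions and the map below are choices of mine, not from the text): if \(\mu\) is the uniform distribution on \((0,1)\) and \(T(x) = -\log(1-x)\) is the quantile function of the \(\mathrm{Exp}(1)\) distribution, then \(T_{\#}\mu = \mathrm{Exp}(1).\) A quick numerical sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

# mu = Uniform(0, 1); T(x) = -log(1 - x) is the quantile function of
# Exp(1), so the pushforward T_# mu is the Exp(1) distribution.
def T(x):
    return -np.log1p(-x)

x = rng.uniform(size=100_000)   # samples from mu
y = T(x)                        # samples from T_# mu

# Exp(1) has mean 1 and variance 1; the pushed-forward
# samples should match both closely.
print(y.mean(), y.var())
```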

On the other hand, if multiple such maps exist, which one should we prefer? In Monge's formulation of the optimal transport problem, we are given a measurable cost function \(c \colon \mathbb R^d \times \mathbb R^d \to [0, \infty]\), and we want to find an optimal map

\[\begin{aligned} T_0 \in \argmin_{T_{\#}\mu = \nu} \int_{\mathbb R^d} c(x, T(x)) \, \mathrm{d}\mu(x),\end{aligned}\]
provided, of course, that it exists. It is not obvious when such a map exists; however, the following refinement by (McCann, 1995) of a result discovered independently by (Knott and Smith, 1984) and by (Brenier, 1987) tells us when an optimal map exists for the quadratic cost. As a piece of terminology, we call a set \(A \subseteq \mathbb R^d\) small if it has Hausdorff dimension at most \(d-1.\) So, for example, small sets are Lebesgue-negligible, but not conversely.

Theorem: Let \(\mu\) and \(\nu\) be two Borel probability measures on \(\mathbb R^d\), with \(d\) a positive integer, and let the cost function \(c\colon \mathbb R^d \times \mathbb R^d \to [0, \infty]\) be the squared Euclidean norm \(c(x,y) = \left\lVert x-y\right\rVert^2.\) Suppose that \(\mu\) does not give mass to small sets. Then there is exactly one (up to \(\mu\)-a.e. equality) map of the form \(T = \nabla \varphi\), with \(\varphi \colon \mathbb R^d \to \mathbb R\) convex, such that \(T_{\#} \mu = \nu\); moreover, when \(\mu\) and \(\nu\) have finite second moments, this \(T\) solves Monge's problem for \(c.\)
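A small numerical sketch of the theorem in one dimension (the Gaussian parameters below are arbitrary choices of mine): between \(N(m_1, s_1^2)\) and \(N(m_2, s_2^2)\), the increasing affine map \(T(x) = m_2 + (s_2/s_1)(x - m_1)\) is the gradient of the convex function \(\varphi(x) = m_2(x - m_1) + \frac{s_2}{2 s_1}(x - m_1)^2,\) and it pushes the first Gaussian onto the second:

```python
import numpy as np

rng = np.random.default_rng(1)

# mu = N(m1, s1^2), nu = N(m2, s2^2); T is increasing and affine,
# hence the gradient of a convex (quadratic) function.
m1, s1 = 0.0, 1.0
m2, s2 = 3.0, 2.0

def T(x):
    return m2 + (s2 / s1) * (x - m1)

x = rng.normal(m1, s1, size=200_000)  # samples from mu
y = T(x)                              # samples from T_# mu

# The pushed-forward samples should have mean ~ m2 and std ~ s2.
print(y.mean(), y.std())
```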

In statistical optimal transport, the measures \(\mu\) and \(\nu\) are not known explicitly; instead, only independent samples from them are observed. To that end, let \(X_1, X_2, \ldots\) be independent copies of \(X,\) and let \(Y_1, Y_2, \ldots\) be independent copies of \(Y.\) Let \(\mu_n := \frac{1}{n} \sum_{i=1}^n \delta_{X_i}\) denote the empirical measure based on the sample \((X_1, \ldots, X_n)\) of the first \(n\) observations, and likewise for \(\nu_n.\)
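When \(\mu_n\) and \(\nu_n\) each consist of \(n\) equally weighted atoms, Monge's problem between them reduces to a linear assignment problem over permutations of the atoms. A minimal sketch using SciPy (the two sample distributions here are hypothetical, purely for illustration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n, d = 200, 2

# Hypothetical samples defining the empirical measures mu_n and nu_n.
X = rng.normal(size=(n, d))            # X_1, ..., X_n
Y = rng.normal(loc=2.0, size=(n, d))   # Y_1, ..., Y_n

# Quadratic cost matrix C[i, j] = ||X_i - Y_j||^2.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# Optimal matching X_i -> Y_{cols[i]}: the empirical Monge map.
rows, cols = linear_sum_assignment(C)
cost = C[rows, cols].mean()  # empirical transport cost
print(cost)
```

The optimal matching is, by construction, at least as cheap as any other pairing of the atoms, e.g. the identity pairing `X_i -> Y_i`.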

Deep Learning

References

Professional Activities
Reviewer

NeurIPS 2022, AISTATS 2022, ICLR 2022, NeurIPS 2021, AAAI 2021, ICML 2020

Teaching

Fall 2021, Spring 2021 — Master's-level introductory course on machine learning