Eric Jang, Shixiang Gu, Ben Poole. Categorical Reparameterization with Gumbel-Softmax. CoRR, 2016.

Jang et al. introduce the Gumbel-Softmax distribution, which allows applying the reparameterization trick to categorical distributions, as used e.g. in variational auto-encoders. Given a distribution $\pi = (\pi_1,\ldots,\pi_k)$ over classes $1,\ldots,k$, a categorical sample is assumed to be given in one-hot encoding, i.e. if the class is $i$, the vector $z \in \mathbb{R}^k$ is zero except for $z_i$, which is one. In order to apply the reparameterization trick, which has to be differentiable with respect to the parameters of the distribution, they first draw values $g_1, \ldots, g_k$ from the Gumbel distribution. The probability density function of a Gumbel distribution with parameters $\mu$ and $\beta$ has the form

$\text{Gumbel}(\mu, \beta) = \frac{1}{\beta}\exp\left(-\frac{x - \mu}{\beta} - \exp\left(-\frac{x - \mu}{\beta}\right)\right)$.
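As a quick sanity check (a sketch using NumPy's `Generator.gumbel`, not part of the paper), the empirical mean of many standard Gumbel samples should be close to the Euler-Mascheroni constant $\gamma \approx 0.5772$, which is the mean of $\text{Gumbel}(0, 1)$:

```python
import numpy as np

# Draw a large number of standard Gumbel samples.
rng = np.random.default_rng(0)
samples = rng.gumbel(loc=0.0, scale=1.0, size=1_000_000)

# The mean of Gumbel(0, 1) is the Euler-Mascheroni constant, roughly 0.5772.
empirical_mean = samples.mean()
```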

For $g_i \sim Gumbel(0,1)$ categorical samples can be drawn as follows:

$z = \text{one\_hot}\left(\arg\max_i \left(g_i + \log \pi_i\right)\right)$.
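A minimal NumPy sketch of this Gumbel-max sampling (using the standard fact, also noted below, that $\text{Gumbel}(0,1)$ samples can be obtained as $-\log(-\log(u))$ for uniform $u$); the function name is my own:

```python
import numpy as np

def sample_gumbel_max(pi, rng):
    """Draw a one-hot categorical sample from probabilities pi via the Gumbel-max trick."""
    u = rng.uniform(size=len(pi))
    g = -np.log(-np.log(u))              # g_i ~ Gumbel(0, 1)
    index = np.argmax(g + np.log(pi))    # argmax_i (g_i + log pi_i)
    z = np.zeros(len(pi))
    z[index] = 1.0
    return z

rng = np.random.default_rng(0)
pi = np.array([0.1, 0.6, 0.3])
# Averaging many one-hot samples recovers pi approximately.
counts = sum(sample_gumbel_max(pi, rng) for _ in range(10000)) / 10000
```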

Note that $g_i \sim Gumbel(0,1)$ can be reparameterized as follows: $g_i = -\log(-\log(u_i))$ with $u_i \sim Uniform(0,1)$. In practice, the $\arg\max$ is approximated using the softmax function to sample vectors $y$:

$y_i = \frac{\exp\left(\frac{\log \pi_i + g_i}{\tau}\right)}{\sum_{j = 1}^k \exp\left(\frac{\log \pi_j + g_j}{\tau}\right)}$

The samples $y$ are then drawn from the so-called Gumbel-Softmax distribution which, for $\tau \to 0$, approaches the categorical distribution defined by $\pi$. As the distribution is smooth for $\tau > 0$, it allows applying the reparameterization trick to draw near-one-hot samples from discrete distributions such as a categorical or Bernoulli distribution.
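The effect of the temperature $\tau$ can be sketched as follows (a NumPy implementation under my own naming; the log-sum-exp shift for numerical stability is a standard addition, not from the paper):

```python
import numpy as np

def gumbel_softmax_sample(pi, tau, rng):
    """Draw a relaxed (near-one-hot) sample y from the Gumbel-Softmax distribution."""
    u = rng.uniform(size=len(pi))
    g = -np.log(-np.log(u))          # g_i ~ Gumbel(0, 1)
    logits = (np.log(pi) + g) / tau
    logits -= logits.max()           # shift for numerical stability
    y = np.exp(logits)
    return y / y.sum()               # softmax over (log pi_i + g_i) / tau

rng = np.random.default_rng(1)
pi = np.array([0.1, 0.6, 0.3])
y_soft = gumbel_softmax_sample(pi, tau=5.0, rng=rng)   # high tau: smoothed out
y_hard = gumbel_softmax_sample(pi, tau=0.01, rng=rng)  # low tau: close to one-hot
```

In practice, $\tau$ is often annealed from a high value towards zero during training, trading gradient variance against the gap to truly discrete samples.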

In experiments, they show that this technique allows training variational auto-encoders with a discrete latent code, for example consisting of several Bernoulli variables.

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.