Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio. Generative Adversarial Networks. CoRR, abs/1406.2661.

Goodfellow et al. introduce adversarial nets, a generative model based on neural networks. The idea is quite simple, which makes the approach all the more interesting. Two networks, a generator and a discriminator, are trained in a minimax fashion. Concretely, the generator network tries to learn the data distribution $p_{\text{data}}$ given noise input drawn from a noise distribution $p_z$. The discriminator network tries to distinguish samples from the true distribution $p_{\text{data}}$ from samples produced by the generator network. The corresponding objective looks as follows:

$\min_G \max_D V(D,G) = E_{x \sim p_{\text{data}}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ (1)

Here, it also becomes clear what is meant by "minimax fashion". In Equation (1), $D$ denotes the discriminator network, which tries to detect samples from $p_{\text{data}}$ with high confidence (i.e. $D(x)$ close to $1$ such that $\log D(x)$ is close to zero). The generator network $G$ tries to fool the discriminator network into thinking that the generated samples come from $p_{\text{data}}$. The simplest way to do this, of course, is to imitate (i.e. learn) $p_{\text{data}}$.
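As a concrete illustration (not taken from the paper), the mini-batch estimate of the value function in Equation (1) can be written directly in code. The following PyTorch sketch is my own; it assumes a discriminator that outputs probabilities, and the small smoothing constant is an added assumption for numerical stability:

```python
import torch

def value_function(d_real, d_fake, eps=1e-8):
    # Mini-batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    # d_real = D(x) for x ~ p_data, d_fake = D(G(z)) for z ~ p_z; both in (0, 1).
    # eps avoids taking log(0).
    return torch.mean(torch.log(d_real + eps)) + torch.mean(torch.log(1.0 - d_fake + eps))
```

The discriminator is trained to maximize this quantity while the generator is trained to minimize it, which is exactly the minimax game of Equation (1).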

To solve Equation (1), Goodfellow et al. alternately optimize $D$ with $G$ fixed (for a fixed number of $k$ iterations) and then optimize $G$ with $D$ fixed. The details are summarized in Algorithm 1.

for $t = 1,\ldots$
    for $i = 1,\ldots,k$
        sample a mini-batch $\{z^{(1)},\ldots,z^{(m)}\}$ from the noise distribution $p_z$
        sample a mini-batch $\{x^{(1)},\ldots,x^{(m)}\}$ from $p_{\text{data}}$
        update the discriminator by gradient ascent: $\nabla_{\theta_d} \frac{1}{m} \sum_{i = 1}^m \left[\log D(x^{(i)}) + \log(1 - D(G(z^{(i)})))\right]$
    sample a mini-batch $\{z^{(1)}, \ldots, z^{(m)}\}$ from the noise distribution $p_z$
    update the generator by gradient descent: $\nabla_{\theta_g} \frac{1}{m} \sum_{i = 1}^m \log(1 - D(G(z^{(i)})))$

Algorithm 1: Mini-batch stochastic gradient descent training of generative adversarial nets. Note that $\theta_d$ denotes the parameters of the discriminator and $\theta_g$ denotes the parameters of the generator.
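A minimal sketch of Algorithm 1 in PyTorch might look as follows. The network architectures, the toy data distribution, the optimizers and all hyper-parameters are illustrative assumptions on my part, not the authors' setup; the discriminator's gradient ascent on $V$ is implemented as gradient descent on $-V$:

```python
import torch
import torch.nn as nn

# Assumed toy setup: 2D data, small MLPs; these choices are illustrative only.
noise_dim, data_dim, m, k = 16, 2, 64, 1

G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_d = torch.optim.SGD(D.parameters(), lr=0.01)
opt_g = torch.optim.SGD(G.parameters(), lr=0.01)

def sample_data(m):
    # Placeholder for samples from p_data; here a shifted Gaussian.
    return torch.randn(m, data_dim) + 2.0

for t in range(1000):
    # Inner loop: k discriminator updates with G fixed
    # (ascent on V, implemented as descent on -V).
    for _ in range(k):
        x = sample_data(m)                      # mini-batch from p_data
        z = torch.randn(m, noise_dim)           # mini-batch from p_z
        d_loss = -(torch.log(D(x)).mean()
                   + torch.log(1.0 - D(G(z).detach())).mean())
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

    # Generator update with D fixed: descend E[log(1 - D(G(z)))].
    z = torch.randn(m, noise_dim)
    g_loss = torch.log(1.0 - D(G(z))).mean()
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

Note the `detach()` in the discriminator step: it keeps the discriminator's loss from propagating gradients into the generator, which corresponds to holding $G$ fixed during the inner loop.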

Interestingly, the evaluation of such generative models does not seem to be fully worked out (at least at the time of writing). Goodfellow et al. state that "this method of estimating the likelihood has somewhat high variance and does not perform well in high dimensional spaces but it is the best we know of". In particular, they estimate the test-set likelihood by fitting a Gaussian Parzen window to samples generated by $G$, following the approach in [1].
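To make this evaluation protocol concrete, here is a small sketch of a Gaussian Parzen window log-likelihood estimate; the function name and interface are my own, and in the paper the bandwidth $\sigma$ is chosen by cross-validation on a validation set:

```python
import math
import torch

def parzen_log_likelihood(samples, test_points, sigma):
    # Gaussian Parzen window estimate of log p(x) for each test point,
    # using samples generated by G as kernel centers with bandwidth sigma.
    n, d = samples.shape
    diff = (test_points.unsqueeze(1) - samples.unsqueeze(0)) / sigma   # (m, n, d)
    log_kernels = -0.5 * (diff ** 2).sum(dim=2)                        # (m, n)
    log_norm = math.log(n) + 0.5 * d * math.log(2.0 * math.pi) + d * math.log(sigma)
    return torch.logsumexp(log_kernels, dim=1) - log_norm              # (m,)
```

Averaging the returned values over a held-out test set gives the kind of log-likelihood estimate the authors report.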

  • [1] O. Breuleux, Y. Bengio, P. Vincent. Quickly generating representative samples from an RBM-derived process. Neural Computation, 23(8), 2011.