14th February 2017

Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron C. Courville, Yoshua Bengio. *Generative Adversarial Networks*. CoRR, abs/1406.2661.

Goodfellow et al. introduce adversarial nets, a generative, neural-network-based model. The idea is quite simple, which makes the approach all the more interesting. Two networks, a generator and a discriminator, are trained in a minimax fashion. Concretely, the generator network tries to learn the data distribution $p_{\text{data}}$ given noise input drawn from a noise distribution $p_z$. The discriminator network tries to distinguish samples from the true distribution $p_{\text{data}}$ from samples produced by the generator network. The corresponding objective looks as follows:

$\min_G \max_D V(D,G) = E_{x \sim p_{\text{data}}(x)}[\log D(x)] + E_{z \sim p_z(z)}[\log(1 - D(G(z)))]$ (1)

Here, it also becomes clear what is meant by "minimax fashion". In Equation (1), $D$ denotes the discriminator network, which tries to detect samples from $p_{\text{data}}$ with high confidence (i.e. $D(x)$ close to $1$, such that $\log D(x)$ is close to zero). The generator network $G$ tries to fool the discriminator into thinking that the generated samples come from $p_{\text{data}}$. The simplest way of doing this, of course, is to imitate (i.e. learn) $p_{\text{data}}$.
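To make the roles of the two terms in Equation (1) concrete, the following sketch (my own illustration, not code from the paper) estimates $V(D,G)$ from a batch of discriminator outputs on real and generated samples:

```python
import numpy as np

def value_function(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) from Equation (1).

    d_real: array of D(x) for samples x ~ p_data
    d_fake: array of D(G(z)) for noise samples z ~ p_z
    """
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# A confident discriminator (D(x) -> 1, D(G(z)) -> 0) drives V towards 0,
# its maximum; a completely fooled discriminator outputs 0.5 everywhere,
# which yields V = 2 * log(0.5), the value at the minimax equilibrium.
v_fooled = value_function(np.full(4, 0.5), np.full(4, 0.5))
v_confident = value_function(np.full(4, 0.99), np.full(4, 0.01))
```

The equilibrium value $2\log(0.5) = -\log 4$ is exactly the optimum Goodfellow et al. derive for $p_g = p_{\text{data}}$.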

To solve Equation (1), Goodfellow et al. alternately optimize $D$ with $G$ fixed (for a fixed number of $k$ iterations) and then optimize $G$ with $D$ fixed. The details are summarized in Algorithm 1.

Algorithm 1: Mini-batch stochastic gradient descent training for generative adversarial nets. Note that $\theta_d$ denotes the parameters of the discriminator and $\theta_g$ denotes the parameters of the generator.
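The alternating scheme of Algorithm 1 can be sketched in a toy 1-D setting (my own construction, not from the paper): $p_{\text{data}} = \mathcal{N}(4, 1)$, noise $p_z = \mathcal{N}(0, 1)$, an affine generator and a logistic affine discriminator, with hand-derived gradients. All hyperparameters are illustrative; the generator uses the non-saturating $\log D(G(z))$ update the paper recommends in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# D(x) = sigmoid(wd * x + bd), G(z) = wg * z + bg
wd, bd = 0.0, 0.0
wg, bg = 1.0, 0.0
lr, k, m = 0.05, 1, 64  # learning rate, D steps per G step, batch size

for _ in range(2000):
    for _ in range(k):
        # Ascend V in theta_d with G fixed
        x = rng.normal(4.0, 1.0, m)            # minibatch from p_data
        g = wg * rng.normal(0.0, 1.0, m) + bg  # minibatch from generator
        d_real = sigmoid(wd * x + bd)
        d_fake = sigmoid(wd * g + bd)
        # Gradient of log D(x) + log(1 - D(G(z))) w.r.t. (wd, bd)
        wd += lr * np.mean((1.0 - d_real) * x - d_fake * g)
        bd += lr * np.mean((1.0 - d_real) - d_fake)
    # Update G with D fixed (non-saturating: ascend log D(G(z)))
    z = rng.normal(0.0, 1.0, m)
    g = wg * z + bg
    d_fake = sigmoid(wd * g + bd)
    wg += lr * np.mean((1.0 - d_fake) * wd * z)
    bg += lr * np.mean((1.0 - d_fake) * wd)
```

After training, the generator's offset `bg` should have moved towards the data mean of $4$, since the discriminator's slope points the generator towards the region of high $D$.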

Interestingly, the evaluation of such generative models does not seem to be fully worked out (at least at the time of writing). Goodfellow et al. state that "this method of estimating the likelihood has somewhat high variance and does not perform well in high dimensional spaces but it is the best we know of". Specifically, they resort to the approach of [1]: fitting a Gaussian Parzen window to a set of generated samples and reporting the log-likelihood of held-out test data under this density estimate.
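A minimal sketch of such a Parzen window (kernel density) estimate, under my own naming and with a fixed bandwidth for illustration (the paper cross-validates $\sigma$ on a validation set):

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Mean log-likelihood of test_points under an isotropic Gaussian
    Parzen window centred on the generated samples.

    samples:     (n, d) array of generator outputs
    test_points: (m, d) array of held-out data
    sigma:       kernel bandwidth
    """
    n, d = samples.shape
    # Scaled squared distances between every test point and every sample
    diffs = test_points[:, None, :] - samples[None, :, :]       # (m, n, d)
    a = -np.sum(diffs ** 2, axis=2) / (2.0 * sigma ** 2)        # (m, n)
    # Numerically stable log-sum-exp over the n kernel centres
    mx = a.max(axis=1, keepdims=True)
    lse = (mx + np.log(np.exp(a - mx).sum(axis=1, keepdims=True))).ravel()
    # log p(x) = logsumexp - log n - (d/2) log(2 pi sigma^2)
    log_p = lse - np.log(n) - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    return float(np.mean(log_p))
```

The high variance Goodfellow et al. mention comes from the fact that this estimate depends strongly on which samples happen to be drawn and on the bandwidth, especially in high dimensions.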

[1] O. Breuleux, Y. Bengio, P. Vincent. Quickly generating representative samples from an RBM-derived process. Neural Computation, 23(8), 2011.