24th February 2018

READING

Daniel Jiwoong Im, Sungjin Ahn, Roland Memisevic, Yoshua Bengio. Denoising Criterion for Variational Auto-Encoding Framework. AAAI, 2017.

Im et al. introduce a denoising criterion for variational auto-encoders. Starting from the usual variational auto-encoder model, i.e. a prior $p(z)$ on the latent code, a generator $p_\theta(x|z)$ and an inference model $q_\psi(z|x)$, Im et al. additionally consider a corruption process $p(\widetilde{x}|x)$. In particular, the inference model is applied to the corrupted input, i.e. $q_\psi(z|\widetilde{x})$ replaces $q_\psi(z|x)$. Considering the objective optimized by variational auto-encoders, i.e.

$\log p_\theta(x) \geq E_{q_\psi(z|x)}\left[\log\frac{p_\theta(x,z)}{q_\psi(z|x)}\right]$,

they show that the following holds:

$\log p_\theta(x) \geq \mathcal{L}_{dvae} \geq \mathcal{L}_{cvae}$

where

$\mathcal{L}_{cvae} = E_{\widetilde{q}_\psi(z|x)}\left[\log\frac{p_\theta(x,z)}{\widetilde{q}_\psi(z|x)}\right]$

$\mathcal{L}_{dvae} = E_{\widetilde{q}_\psi(z|x)}\left[\log\frac{p_\theta(x,z)}{q_\psi(z|\widetilde{x})}\right]$

for the distribution $\widetilde{q}_\psi(z|x) = \int q_\psi(z|\widetilde{x})p(\widetilde{x}|x)d\widetilde{x}$. In other words, instead of maximizing $\mathcal{L}_{cvae}$, $\mathcal{L}_{dvae}$ can be maximized and might even result in a tighter lower bound. Regarding the practical implementation, this means that it is sufficient to adapt the general training procedure by sampling a corrupted input $\widetilde{x} \sim p(\widetilde{x}|x)$ before sampling the code $z \sim q_\psi(z|\widetilde{x})$ and the reconstruction from $p_\theta(x|z)$. The proof can be found in the appendix of the paper, and Im et al. additionally provide a good discussion of how the objective can be interpreted.
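To make the training recipe concrete, here is a minimal NumPy sketch of a single-sample estimate of $\mathcal{L}_{dvae}$. It relies on the decomposition $\mathcal{L}_{dvae} = E_{p(\widetilde{x}|x)}\left[E_{q_\psi(z|\widetilde{x})}[\log p_\theta(x|z)] - KL(q_\psi(z|\widetilde{x})\,\|\,p(z))\right]$. Everything specific here is an assumption for illustration and not taken from the paper or its code: the linear encoder and decoder, the unit-variance Gaussian likelihood, the standard normal prior, the additive Gaussian corruption, and all names (`corrupt`, `encode`, `decode`, `dvae_bound_estimate`, `noise_std`).

```python
import numpy as np

def corrupt(x, noise_std, rng):
    """Sample x_tilde ~ p(x_tilde | x); additive Gaussian noise is an assumed choice."""
    return x + noise_std * rng.standard_normal(x.shape)

def encode(x_tilde, W_mu, W_logvar):
    """Toy linear inference model q(z | x_tilde) = N(mu, diag(exp(logvar)))."""
    return x_tilde @ W_mu, x_tilde @ W_logvar

def decode(z, W_dec):
    """Toy linear generator: mean of p(x | z) = N(mean, I)."""
    return z @ W_dec

def dvae_bound_estimate(x, W_mu, W_logvar, W_dec, rng, noise_std=0.1):
    """Single-sample Monte Carlo estimate of L_dvae, one value per data point."""
    x_tilde = corrupt(x, noise_std, rng)           # 1. corrupt the input
    mu, logvar = encode(x_tilde, W_mu, W_logvar)   # 2. infer the code from x_tilde
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps            # reparameterized sample of z
    x_mean = decode(z, W_dec)                      # 3. reconstruct
    d = x.shape[1]
    # The likelihood is evaluated on the CLEAN input x, not on x_tilde.
    log_px_z = -0.5 * np.sum((x - x_mean) ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    # Analytic KL(q(z | x_tilde) || N(0, I)) for a diagonal Gaussian.
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=1)
    return log_px_z - kl

rng = np.random.default_rng(0)
x = rng.random((4, 8))                             # 4 toy inputs of dimension 8
W_mu = 0.1 * rng.standard_normal((8, 2))
W_logvar = 0.1 * rng.standard_normal((8, 2))
W_dec = 0.1 * rng.standard_normal((2, 8))
bound = dvae_bound_estimate(x, W_mu, W_logvar, W_dec, rng)
```

The only difference to a plain variational auto-encoder step is the first line of `dvae_bound_estimate`: the encoder sees `x_tilde`, while the reconstruction term still targets the clean `x`.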

 

They provide experiments on the MNIST and Frey Faces datasets. For the MNIST dataset, treating the input as binary data, salt and pepper noise was employed. For the Frey Faces dataset, treated as real-valued data, a Gaussian noise distribution was chosen.
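The two corruption processes could be sketched as follows. This is an illustrative NumPy version, not the authors' implementation; the function names and the parameters `flip_prob` and `std` are hypothetical, and the values used below are placeholders rather than the noise levels from the paper.

```python
import numpy as np

def salt_and_pepper(x, flip_prob, rng):
    """With probability flip_prob, overwrite a pixel with 0 or 1 (equal chance)."""
    corrupt_mask = rng.random(x.shape) < flip_prob
    random_bits = (rng.random(x.shape) < 0.5).astype(x.dtype)
    return np.where(corrupt_mask, random_bits, x)

def gaussian_noise(x, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std to real-valued data."""
    return x + std * rng.standard_normal(x.shape)

rng = np.random.default_rng(0)
binary_batch = (rng.random((5, 10)) < 0.5).astype(np.float64)  # stand-in for binary images
real_batch = rng.random((5, 10))                               # stand-in for real-valued images
noisy_binary = salt_and_pepper(binary_batch, flip_prob=0.1, rng=rng)
noisy_real = gaussian_noise(real_batch, std=0.1, rng=rng)
```

Salt and pepper noise keeps the corrupted input binary, which matches a Bernoulli treatment of the data, whereas Gaussian noise is the natural counterpart for real-valued inputs.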

 

Code of the proposed denoising variational auto-encoder can be found on GitHub: jiwoongim/DVAE.
