21^{th}JANUARY2017

I. Goodfellow, Y. Bengio, A. Courville. *Deep Learning*. Chapter 17, MIT Press, 2016.

What is **your opinion** on the summarized work? Or do you know related work that is of interest? **Let me know** your thoughts in the comments below or get in touch with me:

In chapter 17, after discussing structured predictions using graphical models in Chapter 16, Goodfellow et al. briefly introduce basic Monte Carlo methods for sampling. While these methods might not be new to most students in computer vision and machine learning, I found a repetition of the concepts quite useful. Nevertheless, I want to emphasize that there are more appropriate readings regarding the details of Monte Carlo methods.

The basic idea of Monte Carlo Sampling is NOT to sample from a distribution, but to approximate the expectation of a function $f(x)$ under a distribution $p(x)$. If it is possible to draw from $p(x)$, the basic approach is to use the estimator

$\hat{s}_n = \frac{1}{n} \sum_{i = 1}^n f(x^{(i)})$

with samples $x^{(i)}$ drawn from $p(x)$. This estimator is not biased, and given finite variance, i.e. $Var[f(x^{(i)})] < \infty$, the estimator converges to the true expected value for an increasing number of samples. If it is not possible to sample from $p(x)$, an alternative distribution $q(x)$ can be introduced and the same estimator can be used:

$\hat{s}_n = \frac{1}{n} \sum_{i = 1}^n \frac{p(x^{(i)}) f(x^{(i)})}{q(x^{(i)})}$ for $x^{(i)} \sim q$

The estimator is still unbiased, and the minimum variance is obtained for

$q^\ast(x) = \frac{p(x) |f(x)|}{Z}$

where $Z$ is the partition function. Generally, low variance is achieved for $q(x)$ being high whenever $p(x)|f(x)|$ is high.

Goodfellow et al. also briefly discuss Markov Chain Monte Carlo for sampling and Gibbs Sampling. However, I found that there are better resources to study these techniques.