IAM

October 5, 2019

READING

Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens. Adding Gradient Noise Improves Learning for Very Deep Networks. CoRR abs/1511.06807 (2015).

Neelakantan et al. study gradient noise for improving neural network training. In particular, they add Gaussian noise to the gradients in each iteration:

$\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma_t^2)$

where the variance $\sigma_t^2$ is annealed throughout training as follows:

$\sigma_t^2 = \frac{\eta}{(1 + t)^\gamma}$

where $\eta$ and $\gamma$ are hyper-parameters and $t$ is the current iteration. In experiments, the authors show that gradient noise has the potential to improve accuracy, especially when optimization is difficult, for example for very deep networks or poor initializations.
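The scheme above can be sketched in a few lines of NumPy. The following is a minimal illustration on a toy quadratic, not the authors' implementation; the default $\gamma = 0.55$ matches a value reported in the paper, while the learning rate, step count, and objective are hypothetical choices for the example:

```python
import numpy as np

def noisy_sgd(grad_f, w, lr=0.1, eta=0.01, gamma=0.55, steps=100, seed=0):
    """Gradient descent with annealed Gaussian gradient noise.

    At iteration t, noise drawn from N(0, sigma_t^2) with
    sigma_t^2 = eta / (1 + t)^gamma is added to every gradient
    component before the update (Neelakantan et al., 2015).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    for t in range(steps):
        sigma2 = eta / (1.0 + t) ** gamma  # decaying noise variance
        noise = rng.normal(0.0, np.sqrt(sigma2), size=w.shape)
        w = w - lr * (grad_f(w) + noise)
    return w

# Toy example: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w_final = noisy_sgd(lambda w: w, np.array([5.0, -3.0]))
```

Because the variance decays with $t$, the noise perturbs the trajectory early on (potentially escaping poor regions) while leaving the final iterations close to plain gradient descent.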

Also find this summary on ShortScience.org.

What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.