Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens. Adding Gradient Noise Improves Learning for Very Deep Networks. CoRR abs/1511.06807 (2015).

Neelakantan et al. study gradient noise for improving neural network training. In particular, they add Gaussian noise to the gradients in each iteration:

$\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$

where the variance $\sigma^2$ is annealed over the course of training as follows:

$\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$

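where $\eta$ and $\gamma$ are hyper-parameters and $t$ is the current iteration, so the noise decays as training progresses. In experiments, the authors show that gradient noise has the potential to improve accuracy, especially in settings where optimization is difficult, such as very deep networks.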
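
To make the noise schedule concrete, here is a minimal NumPy sketch of the two formulas above. It is not the authors' implementation: the function names `noise_variance` and `add_gradient_noise`, the default values for `eta` and `gamma`, the learning rate, and the toy quadratic objective are all assumptions chosen purely for illustration.

```python
import numpy as np


def noise_variance(t, eta=0.3, gamma=0.55):
    """Annealed variance sigma^2 = eta / (1 + t)^gamma (defaults are illustrative)."""
    return eta / (1.0 + t) ** gamma


def add_gradient_noise(grad, t, rng, eta=0.3, gamma=0.55):
    """Return grad plus zero-mean Gaussian noise with the annealed variance."""
    sigma = np.sqrt(noise_variance(t, eta, gamma))
    return grad + rng.normal(0.0, sigma, size=grad.shape)


# Toy usage (assumption): noisy gradient descent on f(w) = 0.5 * ||w||^2,
# whose exact gradient is simply w.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0])
learning_rate = 0.1
for t in range(1000):
    grad = w
    w = w - learning_rate * add_gradient_noise(grad, t, rng)
```

Because the variance shrinks with $t$, the perturbation is largest in early iterations and fades later in training, so the updates gradually approach plain gradient descent.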
