Neelakantan et al. study gradient noise for improving neural network training. In particular, they add Gaussian noise to the gradients in each iteration:

$\tilde{\nabla}f = \nabla f + \mathcal{N}(0, \sigma^2)$

where the variance $\sigma^2$ is adapted throughout training as follows:

$\sigma^2 = \frac{\eta}{(1 + t)^\gamma}$

where $\eta$ and $\gamma$ are hyper-parameters and $t$ is the current iteration, so that for $\gamma > 0$ the noise decays as training proceeds. In experiments, the authors show that gradient noise has the potential to improve accuracy, especially when optimization is otherwise difficult.
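To make the schedule concrete, the following is a minimal NumPy sketch of the annealed gradient noise described above, not the authors' implementation. The function names `noise_variance` and `add_gradient_noise`, the default values for `eta` and `gamma`, the learning rate, and the toy quadratic objective are all assumptions chosen for illustration only.

```python
import numpy as np


def noise_variance(t, eta=0.3, gamma=0.55):
    """Variance sigma^2 = eta / (1 + t)^gamma at iteration t.

    The defaults for eta and gamma are illustrative assumptions.
    """
    return eta / (1.0 + t) ** gamma


def add_gradient_noise(grad, t, rng, eta=0.3, gamma=0.55):
    """Return grad plus element-wise Gaussian noise N(0, sigma^2)."""
    sigma = np.sqrt(noise_variance(t, eta, gamma))
    return grad + rng.normal(0.0, sigma, size=grad.shape)


# Toy usage: noisy gradient descent on f(w) = ||w||^2 (hypothetical objective).
rng = np.random.default_rng(0)
w = np.array([5.0, -3.0])
learning_rate = 0.1
for t in range(200):
    grad = 2.0 * w  # exact gradient of the toy objective
    w = w - learning_rate * add_gradient_noise(grad, t, rng)
```

Because the variance shrinks with $t$, the updates are perturbed most strongly early in training and approach plain gradient descent later on.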

