R. Pascanu, T. Mikolov, Y. Bengio. On the difficulty of training recurrent neural networks. ICML, 2013.

Pascanu et al. discuss the problems of exploding and vanishing gradients in recurrent neural networks. While these problems were first and foremost discussed in the context of recurrent neural networks and backpropagation through time, the presented solutions, e.g. gradient clipping, are applicable to general (convolutional) neural networks as well (e.g. [], []). Assuming a first-order optimization method, they also derive sufficient and necessary conditions for the two problems based on the eigenvalues of the involved weight matrices. Without discussing their regularization approach against vanishing gradients, exploding gradients are mitigated using gradient clipping: whenever the gradient's norm exceeds a pre-defined threshold, the gradient is rescaled so that its norm equals that threshold. See the paper for details.

  • [] I. J. Goodfellow, Y. Bengio, A. C. Courville. Deep Learning. MIT Press, 2016.
  • [] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going Deeper with Convolutions. CoRR, 2014.
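The clipping step can be sketched in a few lines of numpy; this is a minimal illustration of norm-based gradient clipping, not the authors' reference implementation, and the threshold value of 1.0 is a hypothetical choice.

```python
import numpy as np

def clip_gradient(gradient, threshold=1.0):
    """Rescale the gradient if its L2 norm exceeds the threshold.

    The direction of the gradient is preserved; only its magnitude
    is reduced to the threshold. The threshold is a hyper-parameter.
    """
    norm = np.linalg.norm(gradient)
    if norm > threshold:
        gradient = gradient * (threshold / norm)
    return gradient

# Example: a gradient with norm 5 is rescaled to norm 1.
g = np.array([3.0, 4.0])
clipped = clip_gradient(g, threshold=1.0)
```

In a training loop, this function would be applied to the (flattened) gradient right after backpropagation and before the parameter update.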
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.