Pascanu et al. discuss the problems of exploding and vanishing gradients in recurrent neural networks. While these problems were first and foremost discussed in the context of recurrent neural networks and backpropagation through time, the presented solutions, e.g. gradient clipping, are applicable to general (convolutional) neural networks as well. Assuming a first-order optimization method, they also give conditions for the two problems based on the eigenvalues of the involved weight matrices: a sufficient condition for gradients to vanish and a necessary condition for gradients to explode. I will not discuss their regularization approach to vanishing gradients here; exploding gradients are mitigated using gradient clipping. This means that the gradient is rescaled whenever its norm exceeds a pre-defined threshold. See the paper for details.
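To make the clipping step concrete, here is a minimal sketch in plain Python, assuming the norm-based variant in which the whole gradient vector is rescaled to have norm equal to the threshold. The function name `clip_gradient_by_norm` and the example values are my own choices for illustration and are not taken from the paper.

```python
import math


def clip_gradient_by_norm(grad, threshold):
    """Return a copy of grad whose L2 norm is at most threshold.

    grad is a flat list of floats; threshold is a positive float.
    Both names are illustrative assumptions, not the paper's notation.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        # Rescale so the direction is kept but the norm equals the threshold.
        scale = threshold / norm
        return [g * scale for g in grad]
    # Gradients within the threshold are left unchanged.
    return list(grad)


# A gradient with norm 5 is rescaled to norm 1; a small one is untouched.
large = clip_gradient_by_norm([3.0, 4.0], 1.0)
small = clip_gradient_by_norm([0.3, 0.4], 1.0)
```

Rescaling the whole vector, rather than clipping each component separately, keeps the direction of the update and only limits the step size. In practice the threshold is a hyperparameter that has to be tuned.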
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below.