Xu et al. investigate the performance of different types of rectified linear activation functions on the Cifar-10 and Cifar-100 datasets. To this end, they consider the standard rectified linear unit (ReLU), the leaky ReLU (LReLU), the parametric ReLU (PReLU) and the randomized ReLU (RReLU). The first three types are illustrated in Figure 1. PReLU is a leaky rectified linear unit where the amount of leakage is learned during training using backpropagation; RReLU, in contrast, samples the leakage randomly during training and uses a fixed slope at test time.
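To make the differences concrete, here is a minimal NumPy sketch of the four variants. Note that the RReLU sampling range and the test-time averaging below are simplifying assumptions for illustration, not taken verbatim from the paper:

```python
import numpy as np

def relu(x):
    # Standard ReLU: max(0, x).
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # LReLU: fixed small slope alpha for negative inputs.
    return np.where(x >= 0, x, alpha * x)

def parametric_relu(x, alpha):
    # PReLU: same form as LReLU, but alpha is a learnable parameter;
    # here it is passed in explicitly, in a network it would be
    # updated by backpropagation together with the weights.
    return np.where(x >= 0, x, alpha * x)

def randomized_relu(x, lower=1/8, upper=1/3, training=True, rng=None):
    # RReLU: during training the negative slope is sampled uniformly
    # from [lower, upper]; at test time a fixed slope is used.
    # The bounds and the use of the interval midpoint at test time
    # are assumptions made for this sketch.
    if training:
        rng = rng if rng is not None else np.random.default_rng()
        alpha = rng.uniform(lower, upper, size=np.shape(x))
    else:
        alpha = (lower + upper) / 2.0
    return np.where(x >= 0, x, alpha * x)
```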
Their experiments show mixed results (best examined using Figures 2 - 4 in the paper, as the corresponding discussion is very limited). On Cifar-10, for example, LReLU with significant leakage does not lead to better error curves, while on Cifar-100 leakage does seem to be beneficial. Still, across the board they conclude that the original ReLU is outperformed by the three leaky variants. However, the experiments are quite limited.
What is your opinion on the summarized work? Do you know related work that is of interest? Let me know your thoughts in the comments below: