I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. C. Courville, Y. Bengio. Maxout Networks. ICML, 2013.

Goodfellow et al. propose maxout units as a better alternative to rectified linear units (ReLUs) when training with dropout. A maxout unit takes the maximum over $k$ affine feature maps, which amounts to max pooling across channels:

$h_i(x) = \max_{j \in [1, k]} z_{ij}$

$z_{ij} = x^T W_{\cdot ij} + b_{ij}$

with $W \in \mathbb{R}^{d \times m \times k}$ and $b \in \mathbb{R}^{m \times k}$, where $d$ is the input dimension, $m$ the number of maxout units and $k$ the number of affine pieces per unit. When training with dropout, the dropout mask is applied to the input $x$ immediately before the multiplication by the weights; the inputs to the max operation are not dropped. The authors also prove that maxout networks are universal approximators, and they show experimentally on several datasets that maxout units benefit both performance and training.
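To make the definition concrete, the following is a minimal NumPy sketch of a single maxout layer, not the authors' implementation. The function names `maxout` and `maxout_with_dropout`, the drop probability `p_drop` and the toy dimensions are assumptions chosen for illustration, and the dropout variant omits any rescaling of the activations.

```python
import numpy as np


def maxout(x, W, b):
    """Maxout layer: x has shape (d,), W (d, m, k), b (m, k); returns shape (m,)."""
    # z[i, j] = x^T W[:, i, j] + b[i, j]
    z = np.einsum("d,dmk->mk", x, W) + b
    # h[i] = max_j z[i, j], i.e., max pooling over the k pieces of each unit.
    return z.max(axis=1)


def maxout_with_dropout(x, W, b, p_drop, rng):
    """Illustrative training-time variant: mask the input before the weights."""
    # The mask is applied to x, so the inputs to the max are never dropped.
    mask = rng.random(x.shape) >= p_drop
    return maxout(x * mask, W, b)


# Toy dimensions (assumed): d inputs, m maxout units, k pieces per unit.
rng = np.random.default_rng(0)
d, m, k = 4, 3, 2
x = rng.standard_normal(d)
W = rng.standard_normal((d, m, k))
b = rng.standard_normal((m, k))
h = maxout(x, W, b)

# Special case: with k = 2 and the second piece fixed to zero,
# a maxout unit reduces to a ReLU, max(x^T w, 0).
W_relu = np.zeros((d, m, 2))
W_relu[:, :, 0] = rng.standard_normal((d, m))
b_relu = np.zeros((m, 2))
h_relu = maxout(x, W_relu, b_relu)
```

The last lines illustrate why maxout can be seen as a generalization of ReLUs: a ReLU is a maxout unit in which one of two affine pieces is constrained to be zero, whereas maxout learns all $k$ pieces.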
