I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. C. Courville, Y. Bengio. Maxout Networks. ICML, 2013.

Goodfellow et al. propose maxout units as a better alternative to rectified linear units (ReLUs) when training with dropout. A maxout unit takes the maximum over $k$ affine functions of its input, which amounts to max pooling across channels:

$h_i(x) = \max_{j \in [1, k]} z_{ij}$

$z_{ij} = x^T W_{\cdot ij} + b_{ij}$

with $W \in \mathbb{R}^{d \times m \times k}$ and $b \in \mathbb{R}^{m \times k}$, where $d$ is the input dimension, $m$ the number of maxout units and $k$ the number of linear pieces per unit. When training with dropout, the dropout mask is applied to the input $x$ prior to the multiplication by the weights. They also prove that maxout networks are universal approximators, and show the beneficial properties of maxout units for performance and training experimentally on several datasets.
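
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration of the forward pass, not the authors' implementation: the function name `maxout`, the batch dimension `n`, the `drop_prob` parameter and the inverted-dropout rescaling are all assumptions made for the example.

```python
import numpy as np

def maxout(x, W, b, drop_prob=0.0, rng=None):
    """Forward pass of a maxout layer (illustrative sketch).

    x: inputs of shape (n, d)
    W: weights of shape (d, m, k)
    b: biases of shape (m, k)
    Returns h of shape (n, m).
    """
    if drop_prob > 0.0:
        # Dropout acts on the layer input, before the multiplication by W.
        rng = np.random.default_rng() if rng is None else rng
        mask = rng.random(x.shape) >= drop_prob
        # Assumption: inverted-dropout rescaling keeps the expected input unchanged.
        x = x * mask / (1.0 - drop_prob)
    # z[n, i, j] = x[n, :] @ W[:, i, j] + b[i, j]
    z = np.einsum("nd,dmk->nmk", x, W) + b
    # h[n, i] = max over the k linear pieces of unit i
    return z.max(axis=-1)
```

With $k = 2$, one piece fixed to zero and zero biases, the unit reduces to a ReLU, $h_i(x) = \max(x^T W_{\cdot i 1}, 0)$, which makes it easy to see maxout as a learned generalization of the rectifier.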

