"Maxout Networks", Goodfellow et al. • David Stutz

APRIL 2017

READING

I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. C. Courville, Y. Bengio. Maxout Networks. ICML, 2013.

DEEP LEARNING

Goodfellow et al. propose maxout units as a better alternative to rectified linear units (ReLUs) when training with dropout. A maxout unit essentially performs max pooling across channels:

$h_i(x) = \max_{j \in [1, k]} z_{ij}$

$z_{ij} = x^T W_{\cdot ij} + b_{ij}$

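with input $x \in \mathbb{R}^d$, weights $W \in \mathbb{R}^{d \times m \times k}$ and biases $b \in \mathbb{R}^{m \times k}$, where $m$ is the number of maxout units and $k$ the number of linear pieces per unit. When training with dropout, the dropout mask is applied to the input immediately before the multiplication by the weights; the inputs to the max operator are not dropped. The authors also prove that maxout networks are universal approximators, and they show experimentally on several datasets that maxout units benefit both training and final performance.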
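To make the definition concrete, here is a minimal NumPy sketch of a single maxout layer for one input vector. It is not the authors' implementation: the function names `maxout` and `maxout_with_dropout`, the `keep_prob` parameter and the inverted dropout scaling (dividing by `keep_prob`) are assumptions chosen for illustration. The last lines check the well-known special case that a maxout unit with $k = 2$ and one piece fixed to zero reduces to a ReLU.

```python
import numpy as np


def maxout(x, W, b):
    """Maxout layer: h_i = max_j (x^T W[:, i, j] + b[i, j]).

    x: (d,) input, W: (d, m, k) weights, b: (m, k) biases; returns h: (m,).
    """
    # z[i, j] = sum over the input dimension of x * W[:, i, j], plus bias.
    z = np.einsum("d,dmk->mk", x, W) + b
    # Maximum over the k linear pieces of each unit.
    return z.max(axis=1)


def maxout_with_dropout(x, W, b, keep_prob, rng):
    """Apply a dropout mask to x before the weight multiplication.

    Assumption: inverted dropout (rescaling by 1 / keep_prob) is used here
    for convenience; the inputs to the max itself are never dropped.
    """
    mask = rng.random(x.shape) < keep_prob
    return maxout(x * mask / keep_prob, W, b)


rng = np.random.default_rng(0)
d, m, k = 4, 3, 2  # input dimension, number of units, pieces per unit
x = rng.normal(size=d)
W = rng.normal(size=(d, m, k))
b = rng.normal(size=(m, k))

h = maxout(x, W, b)
h_drop = maxout_with_dropout(x, W, b, keep_prob=0.5, rng=rng)

# Special case: k = 2 with the second piece fixed to zero gives a ReLU.
W_relu = np.zeros((d, m, 2))
b_relu = np.zeros((m, 2))
W_relu[:, :, 0] = W[:, :, 0]
b_relu[:, 0] = b[:, 0]
h_relu = maxout(x, W_relu, b_relu)
```

With learned second pieces instead of fixed zeros, each unit can represent other piecewise linear convex functions of its input, which is what distinguishes maxout from a fixed ReLU activation.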

