# DAVID STUTZ

12th April 2017

$h_i(x) = \max_{j \in [1, k]} z_{ij}$
$z_{ij} = x^T W_{\cdot ij} + b_{ij}$
with $W \in \mathbb{R}^{d \times m \times k}$ and $b \in \mathbb{R}^{m \times k}$, where $d$ is the input dimension, $m$ the number of maxout units and $k$ the number of linear pieces per unit. When training with dropout, the dropout mask is applied to the input $x$ before the multiplication by the weights; the inputs to the max operator are not dropped. The authors also prove that maxout networks are universal approximators and show experimentally, on several datasets, that maxout units benefit both performance and training.
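
The definition above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function names `maxout` and `dropout`, the single-example input shape `(d,)` and the use of inverted dropout scaling are assumptions made here for brevity.

```python
import numpy as np


def maxout(x, W, b):
    """Maxout layer for a single input.

    x: (d,) input, W: (d, m, k) weights, b: (m, k) biases.
    Returns h: (m,) with h_i = max_j (x^T W[:, i, j] + b[i, j]).
    """
    z = np.einsum("d,dmk->mk", x, W) + b  # z[i, j] = x^T W[:, i, j] + b[i, j]
    return z.max(axis=1)                  # maximum over the k linear pieces


def dropout(x, p_drop, rng):
    """Mask the input before the weight multiplication.

    Assumption: inverted dropout (rescaling by 1 / (1 - p_drop)) is used
    here so that no rescaling is needed at test time.
    """
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)


# With d = 1, m = 1, k = 2 and the two pieces +x and -x,
# a single maxout unit computes the absolute value |x|.
W_abs = np.array([[[1.0, -1.0]]])
b_abs = np.zeros((1, 2))
h = maxout(np.array([-3.0]), W_abs, b_abs)

# During training, dropout would be applied to x first, e.g.:
rng = np.random.default_rng(0)
h_train = maxout(dropout(np.array([-3.0]), 0.5, rng), W_abs, b_abs)
```

The absolute-value example hints at why maxout units are flexible: each unit is the maximum of $k$ affine functions and can therefore represent any convex piecewise linear function with up to $k$ pieces.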