João F. Henriques, Andrea Vedaldi. Warped Convolutions: Efficient Invariance to Spatial Transformations. ICML, 2017.

Henriques and Vedaldi propose warped convolutions, which build additional invariances into convolutional neural networks. The idea is as simple as it is brilliant. Given the regular convolution on continuous images

$H(u;I) = \int I(x - u)F(x)\,dx = \int I(t_u(x))F(x)\,dx$ with $t_u(x) = x - u$,

and allowing $t_u$ to be any invertible transformation parameterized by $u$, this generalized convolution can be rewritten as follows:

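$H(u;I) = \int I(t_u(x))F(x)\,dx$

$= \int I(t_u(t_v(x_0)))F(t_v(x_0))\left|\frac{dt_v(x_0)}{dv}\right|dv$

$= \int I(t_{u + v}(x_0))F(t_v(x_0))\left|\frac{dt_v(x_0)}{dv}\right|dv$

$= \int \hat{I}(u + v)\hat{F}(v)\,dv$,

with the warped image $\hat{I}(w) = I(t_w(x_0))$ and the warped kernel $\hat{F}(v) = F(t_v(x_0))\left|\frac{dt_v(x_0)}{dv}\right|$.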

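This holds as long as the transformations are additive, i.e. $t_u \circ t_v = t_{u + v}$, and $x_0$ is a so-called pivot point for which $u \mapsto t_u(x_0)$ is bijective. Bijectivity makes the change of variables $x = t_v(x_0)$ valid (with $\left|\frac{dt_v(x_0)}{dv}\right|$ the absolute Jacobian determinant), and additivity turns $t_u(t_v(x_0))$ into $t_{u + v}(x_0)$. In the context of convolutional neural networks, this means that the input image $I$ is warped into $\hat{I}$ and a regular convolution with the transformed kernel $\hat{F}$ is applied. Since the kernels are learned anyway, the network can learn $\hat{F}$ directly.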
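To sanity-check the derivation numerically, here is a minimal 1D sketch under my own assumptions: the transformation is the 1D scaling $t_u(x) = x s^u$ on the positive half-line with $s = 2$ and pivot $x_0 = 1$, and `image` and `kernel` are made-up smooth functions, not anything from the paper. It evaluates $H(u;I)$ once directly over $x$ and once as $\int \hat{I}(u + v)\hat{F}(v)\,dv$ over the warped coordinate $v$.

```python
import numpy as np

# Assumed setup (illustrative, not from the paper): 1D scaling t_u(x) = x * s**u
# on the positive half-line, with s = 2 and pivot x0 = 1.
S = 2.0
X0 = 1.0


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a specific NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def t(u, x):
    """Additive scale transformation: t(u, t(v, x)) == t(u + v, x)."""
    return x * S ** u


def image(x):
    """Hypothetical smooth 1D image on x > 0."""
    return np.exp(-(x - 2.0) ** 2)


def kernel(x):
    """Hypothetical smooth 1D filter on x > 0."""
    return x * np.exp(-x)


def generalized_conv(u):
    """H(u; I) = int I(t_u(x)) F(x) dx, integrated directly over x (truncated range)."""
    x = np.linspace(1e-6, 60.0, 200_001)
    return trapezoid(image(t(u, x)) * kernel(x), x)


def warped_conv(u):
    """int I_hat(u + v) F_hat(v) dv, integrated over the warped coordinate v."""
    v = np.linspace(-25.0, 6.0, 200_001)
    i_hat = image(t(u + v, X0))               # I_hat(w) = I(t_w(x0))
    jacobian = X0 * S ** v * np.log(S)        # |d t_v(x0) / dv|
    f_hat = kernel(t(v, X0)) * jacobian       # F_hat(v) = F(t_v(x0)) |d t_v(x0) / dv|
    return trapezoid(i_hat * f_hat, v)


for u in (-1.0, 0.0, 0.5):
    print(u, generalized_conv(u), warped_conv(u))
```

Both columns should agree up to quadrature error; the integration ranges are simply truncated where the integrands are negligible.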

They demonstrate warped convolutions on a few simple examples. For instance, scaling and aspect ratio changes can be implemented using the transformation

$t_u(x) = \left[\begin{array}{c}x_1 s^{u_1}\\x_2 s^{u_2}\end{array}\right]$

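where $s$ controls the degree of scaling. The exponential is necessary for the transformation to be additive, since $s^{u_i} s^{v_i} = s^{u_i + v_i}$. In addition, the domain has to be restricted to $\mathbb{R}_+^2$ in order to find a valid pivot point $x_0$: for example, with $x_0 = (1, 1)$ and $s > 0$, $s \neq 1$, the map $u \mapsto t_u(x_0) = (s^{u_1}, s^{u_2})$ is a bijection from $\mathbb{R}^2$ onto $\mathbb{R}_+^2$.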
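A minimal NumPy sketch of this scaling example, under assumptions of my own: $s = 2$, pivot $x_0 = (1, 1)$, a regular grid of spacing `STEP` in the warped domain, and a made-up continuous `image` (so warping needs no interpolation). The names `warp`, `correlate_valid` and `scaled_image` are hypothetical helpers. It illustrates the practical consequence of the derivation: rescaling the input along $x_1$ by $s^{a_1}$ becomes a plain shift of the warped image, so an ordinary convolution on the warped image responds to the scaling with a shifted output.

```python
import numpy as np

# Assumed parameters (illustrative, not taken from the paper):
S = 2.0           # scaling base s
STEP = 0.25       # grid spacing in the warped domain
X0 = (1.0, 1.0)   # pivot point x0 in R_+^2


def t(u1, u2, x1, x2):
    """Scale / aspect-ratio transformation t_u(x) = (x1 * s**u1, x2 * s**u2)."""
    return x1 * S ** u1, x2 * S ** u2


def image(x1, x2):
    """Hypothetical continuous image on R_+^2: a smooth anisotropic blob."""
    return np.exp(-((x1 - 3.0) ** 2 + 0.5 * (x2 - 2.0) ** 2))


def warp(img, n=32, offset=-4.0):
    """Warped image I_hat(w) = I(t_w(x0)), sampled on an n x n grid of w."""
    w = offset + STEP * np.arange(n)
    w1, w2 = np.meshgrid(w, w, indexing="ij")
    return img(*t(w1, w2, *X0))


def correlate_valid(a, k):
    """Plain 2D 'valid' cross-correlation, as computed by CNN convolution layers."""
    kh, kw = k.shape
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * k)
    return out


def scaled_image(x1, x2):
    """The input rescaled along x1, i.e. I(t_a(x)) with a = (2 * STEP, 0)."""
    return image(*t(2 * STEP, 0.0, x1, x2))


rng = np.random.default_rng(0)
f_hat = rng.standard_normal((3, 3))  # stands in for a learned warped kernel F_hat

out = correlate_valid(warp(image), f_hat)
out_scaled = correlate_valid(warp(scaled_image), f_hat)

# Rescaling the input shifts the response by two grid steps along axis 0.
print(np.allclose(out_scaled[:-2], out[2:]))  # True
```

The shift is an exact number of samples only because the scale factor was chosen as a whole multiple of the grid step; with a discrete input image, the warp itself would additionally require interpolation.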

What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.