Szegedy et al. propose GoogleNet, based on many of the discussed Inception modules, for image recognition and object detection. Their main contribution is the so-called Inception module, motivated by the work of Aurora et al. . The inception module is shown in Figure 1 where $1 \times 1$ convolutional layers in front of the $3 \times 3$ and $5 \times 5$ convolutional layers are supposed to reduce the dimensionality. There are two main motivations of the Inception module (as I see it):
The concrete incarnation of the Inception idea is presented and called GoogleNet. Using GoogleNet, trained on the ImageNet dataset, they show improved performance on ImageNet.