Szegedy et al. propose GoogleNet, based on many of the discussed Inception modules, for image recognition and object detection. Their main contribution is the so-called Inception module, motivated by the work of Aurora et al. . The inception module is shown in Figure 1 where $1 \times 1$ convolutional layers in front of the $3 \times 3$ and $5 \times 5$ convolutional layers are supposed to reduce the dimensionality. There are two main motivations of the Inception module (as I see it):
The concrete incarnation of the Inception idea is presented and called GoogleNet. Using GoogleNet, trained on the ImageNet dataset, they show improved performance on ImageNet.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: