Howard discusses several approaches to improving the performance of deep convolutional networks on the ImageNet dataset, building on the model of []. These approaches mostly concern data augmentation at training time and ensemble prediction at test time:
Data augmentation: In addition to the random crops, horizontal flips and random lighting changes employed in [], Howard adds brightness, contrast and color manipulations (unfortunately, the details are omitted). Furthermore, instead of cropping images to the training size, Howard rescales only the shortest side and then selects random crops within the remaining image.
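This augmentation pipeline can be sketched as follows. The crop size, jitter ranges and the nearest-neighbor rescaling are illustrative assumptions for brevity, not the exact settings from Howard's paper:

```python
import numpy as np

def augment(image, crop_size=224, rng=None):
    """Training-time augmentation sketch: rescale the shortest side to
    crop_size (nearest-neighbor, for brevity), take a random crop from the
    remaining image, flip horizontally with probability 0.5, and jitter
    brightness and contrast. Parameter ranges are assumptions."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w, _ = image.shape
    # Rescale so the shortest side equals crop_size (aspect ratio preserved).
    scale = crop_size / min(h, w)
    new_h = max(crop_size, round(h * scale))
    new_w = max(crop_size, round(w * scale))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    image = image[rows][:, cols]
    # Random crop anywhere within the rescaled image.
    top = rng.integers(0, new_h - crop_size + 1)
    left = rng.integers(0, new_w - crop_size + 1)
    crop = image[top:top + crop_size, left:left + crop_size].astype(np.float32)
    # Random horizontal flip.
    if rng.random() < 0.5:
        crop = crop[:, ::-1]
    # Brightness and contrast jitter (illustrative ranges).
    brightness = rng.uniform(-0.1, 0.1) * 255.0
    contrast = rng.uniform(0.8, 1.2)
    mean = crop.mean()
    crop = np.clip((crop - mean) * contrast + mean + brightness, 0.0, 255.0)
    return crop
```

Because only the shortest side is fixed, the random crop can fall anywhere along the longer side, which exposes the network to more of each image than center-cropping would.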
Testing: Howard averages the predictions over several different inputs. To this end, $90$ transformations are considered, including crops, translations and scales. A greedy algorithm then selects the subset of these $90$ transformations that yields the best results.
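Such a greedy forward selection might look as follows. The stopping rule and the held-out-accuracy criterion are assumptions; Howard's exact procedure may differ:

```python
import numpy as np

def greedy_select(preds, labels, max_size=10):
    """Greedily build a subset of test-time transformations. preds has shape
    (n_transforms, n_samples, n_classes): one prediction per transformation
    per held-out sample. At each step, add the transformation whose inclusion
    most improves the accuracy of the averaged prediction; stop when no
    candidate improves it or max_size is reached."""
    n_transforms = preds.shape[0]
    selected = []
    running_sum = np.zeros_like(preds[0])
    best_acc = -1.0
    while len(selected) < max_size:
        best_t, best_t_acc = None, best_acc
        for t in range(n_transforms):
            if t in selected:
                continue
            # Accuracy of the ensemble with candidate t added.
            avg = (running_sum + preds[t]) / (len(selected) + 1)
            acc = np.mean(avg.argmax(axis=1) == labels)
            if acc > best_t_acc:
                best_t, best_t_acc = t, acc
        if best_t is None:  # no remaining candidate improves accuracy
            break
        selected.append(best_t)
        running_sum += preds[best_t]
        best_acc = best_t_acc
    return selected, best_acc
```

The averaged prediction over the selected subset is then used at test time, which keeps the evaluation cost well below running all $90$ transformations per image.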
[] A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 2012.
What is your opinion on this article? Let me know your thoughts on Twitter @davidstutz92 or LinkedIn in/davidstutz92.