Howard discusses several approaches to improving the performance of deep networks on the ImageNet dataset, based on the model of []. These approaches mostly concern data augmentation for training and ensemble prediction for testing:

  • Data augmentation: In addition to the random crops, horizontal flipping and random lighting changes employed in [], Howard uses brightness, contrast and color manipulations (unfortunately, the details have been omitted). Furthermore, instead of cropping the images to the training size, Howard only re-scales the smallest side and then selects random crops within the resulting image.
  • Testing: Howard averages the predictions obtained from several transformed versions of each input. To this end, 90 transformations are considered, including crops, translations and scales. A greedy algorithm is used to select the subset of these 90 transformations that yields the best results.
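
The cropping scheme from the data augmentation item can be sketched as plain geometry, without any image library. This is a minimal sketch, not Howard's implementation: the function names `rescale_shortest_side` and `random_crop_box` are hypothetical, and the sizes used in the example (a 500x375 image, shortest side 256, crop size 224) are illustrative assumptions that the summary above does not state.

```python
import random


def rescale_shortest_side(width, height, target):
    """Return (width, height) after scaling so that the shorter side equals target."""
    scale = target / min(width, height)
    # max(...) guards against rounding the shorter side just below target.
    return max(target, round(width * scale)), max(target, round(height * scale))


def random_crop_box(width, height, crop, rng):
    """Pick a random crop x crop window; returns (left, top, right, bottom)."""
    if crop > width or crop > height:
        raise ValueError("crop is larger than the image")
    left = rng.randint(0, width - crop)   # randint is inclusive on both ends
    top = rng.randint(0, height - crop)
    return left, top, left + crop, top + crop


rng = random.Random(0)                    # fixed seed, only for reproducibility
w, h = rescale_shortest_side(500, 375, target=256)   # assumed example sizes
box = random_crop_box(w, h, crop=224, rng=rng)       # assumed crop size
```

Because only the shorter side is fixed, the longer side keeps its extra pixels, so random crops can cover regions that a fixed center crop to the training size would discard.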

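The summary does not spell out the greedy selection used at test time, so the following is only one plausible reading: forward selection that starts from the single best transformation and keeps adding the one that most improves the accuracy of the averaged prediction, stopping when nothing improves it. The names `greedy_select`, `accuracy` and `add_preds`, as well as the toy predictions, are hypothetical and serve purely as illustration.

```python
def accuracy(summed_probs, labels):
    """Fraction of samples whose highest-scoring class matches the label."""
    correct = 0
    for probs, label in zip(summed_probs, labels):
        pred = max(range(len(probs)), key=probs.__getitem__)
        correct += pred == label
    return correct / len(labels)


def add_preds(total, preds):
    """Element-wise sum of two lists of per-sample class scores."""
    return [[t + p for t, p in zip(trow, prow)] for trow, prow in zip(total, preds)]


def greedy_select(preds_by_transform, labels):
    """Greedy forward selection (assumed variant, see lead-in).

    preds_by_transform maps a transformation name to a list of per-sample
    class-probability lists. Sums are used instead of means because the
    argmax of both is identical.
    """
    n_classes = len(next(iter(preds_by_transform.values()))[0])
    total = [[0.0] * n_classes for _ in labels]
    remaining = dict(preds_by_transform)
    selected, best_acc = [], -1.0
    while remaining:
        cand_name, cand_acc = None, best_acc
        for name, preds in remaining.items():
            acc = accuracy(add_preds(total, preds), labels)
            if acc > cand_acc:            # only strict improvements count
                cand_name, cand_acc = name, acc
        if cand_name is None:             # no candidate helps: stop
            break
        total = add_preds(total, remaining.pop(cand_name))
        selected.append(cand_name)
        best_acc = cand_acc
    return selected, best_acc


# Toy data: 4 samples, 2 classes, 3 hypothetical transformations.
toy_labels = [0, 1, 0, 1]
toy_preds = {
    "a": [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]],
    "b": [[0.6, 0.4], [0.45, 0.55], [0.7, 0.3], [0.55, 0.45]],
    "c": [[0.45, 0.55], [0.3, 0.7], [0.45, 0.55], [0.3, 0.7]],
}
selected, best_acc = greedy_select(toy_preds, toy_labels)
print(selected, best_acc)
```

In this toy case the selection picks "b" first and then "c", whose errors complement each other, and leaves out "a". In practice the selection would have to be run on held-out validation data, since choosing the subset on the same images used for reporting results would overstate the gain.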