Hendrycks and Dietterich propose ImageNet-C and ImageNet-P benchmarks for corruption and perturbation robustness evaluation. Both datasets come in various sizes, and corruptions always come in different difficulties. The used corruptions include many common, realistic noise types such as various types of blur and random noise, brightness changes and compression artifacts. ImageNet-P differs from ImageNet-C in that sequences of perturbations are generated. This means, for a specific perturbation type, 30 different frames are generated; thus, less corruption types in total are used. The remainder of the paper introduces various evaluation metrics; these are usually based on the fact that the label of the corrupted image did not change. Finally, they also highlight some approaches to obtain more “robust” models against these corruptions. The list includes a variant of histogram equalization that is used to normalize the input images, the use of multi-scale or feature aggregation architectures and, surprisingly, adversarial logit pairing. Examples of ImageNet-C images can be found in Figure 1.
Figure 1: Examples of images in ImageNet-C.