Huang et al. study adversarial attacks on reinforcement learning policies. One of the main problems, in contrast to supervised learning, is that there might not be a reward in any time step, meaning there is no clear objective to use. However, this is essential when crafting adversarial examples as they are mostly based on maximizing the training loss. To avoid this problem, Huang et al. assume a well-trained policy; the policy is expected to output a distribution over actions. Then, adversarial examples can be computed by maximizing the cross-entropy loss using the most-likely action as ground truth.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below or get in touch with me: