Cao et al. propose KARMA, a method to defend against data poisoning in an online learning system where training examples are obtained through crowdsourcing. The setting, however, is somewhat constrained and can be described as human-in-the-loop. In particular, there is the system, which is maintained by an administrator, and there are users, some of whom may have malicious intent, i.e., attackers. KARMA consists of two steps: first, identifying (possibly polluted) training examples that cause misclassification of samples within a small, trusted oracle set; second, correcting these problems by removing the clusters of polluted samples around them.
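To make the two steps more concrete, here is a minimal toy sketch. It is not the authors' implementation: the 1-nearest-neighbor classifier, the "blame the nearest training example" rule, the radius-based clustering, and all names (`find_suspects`, `remove_clusters`, `radius`) are assumptions chosen purely for illustration, and the data is made up.

```python
from math import dist

def nearest(train, x):
    """Index of the training example closest to x (1-NN, an assumed toy model)."""
    return min(range(len(train)), key=lambda i: dist(train[i][0], x))

def accuracy(train, samples):
    """Fraction of samples that the 1-NN classifier labels correctly."""
    return sum(train[nearest(train, x)][1] == y for x, y in samples) / len(samples)

def find_suspects(train, oracle):
    """Step 1 (assumed rule): for every misclassified oracle sample,
    blame the training example that produced the wrong prediction."""
    suspects = set()
    for x, y in oracle:
        i = nearest(train, x)
        if train[i][1] != y:
            suspects.add(i)
    return suspects

def remove_clusters(train, suspects, radius):
    """Step 2 (assumed rule): drop each suspect together with all
    same-label training examples within `radius` of it."""
    polluted = set()
    for s in suspects:
        sx, sy = train[s]
        for i, (x, y) in enumerate(train):
            if y == sy and dist(x, sx) <= radius:
                polluted.add(i)
    return [ex for i, ex in enumerate(train) if i not in polluted]

# Made-up data: class 0 near the origin, class 1 near (5, 5).
train = [
    ((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 1.0), 0),
    ((5.0, 5.0), 1), ((6.0, 5.0), 1), ((5.0, 6.0), 1),
    # Poisoned examples: labeled 1 but placed inside the class-0 region.
    ((0.5, 0.5), 1), ((0.6, 0.5), 1), ((0.5, 0.6), 1),
]
# Small trusted oracle set, assumed to be labeled by the administrator.
oracle = [((0.45, 0.45), 0), ((5.2, 5.2), 1)]

suspects = find_suspects(train, oracle)
cleaned = remove_clusters(train, suspects, radius=0.3)

print("oracle accuracy before:", accuracy(train, oracle))
print("oracle accuracy after: ", accuracy(cleaned, oracle))
```

In this toy setup a single misclassified oracle sample is enough to expose the whole group of poisoned points, because the cluster removal in step 2 extends the fix beyond the one example that was blamed in step 1.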
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: