Carlini et al. propose several attacks to extract secrets form trained black-box models. Additionally, they show that state-of-the-art neural networks memorize secrets early during training. Particularly on the Penn treebank, after inserting a secret of specific format, the authors validate that the secret can be identified based on the models output probabilities (i.e., black-box access). Several metrics based on the log-perplexity of the secret show that secrets are memorized early during training and memorization happens for all popular architectures and training strategies; additionally, memorization also works for multiple secrets. Furthermore, the authors propose several attacks to extract secrets, most notably through shortest path search. Here, starting with an empty secret, the characters of the secret are identified sequentially in order to minimize log-perplexity. Using this attack, secrets such as credit card numbers are extractable from popular mail datasets.
What is your opinion on the summarized work? Or do you know related work that is of interest? Let me know your thoughts in the comments below: