This article is meant as an ad-hoc response to Ben Recht’s recent blog series on whether we need conformal prediction intervals. I have been thinking a lot about the use of conformal prediction myself and this seems like a good opportunity to share some thoughts and learnings from working on conformal prediction the past few years.
Last week, I presented our work on Monte Carlo conformal prediction — conformal prediction with ambiguous and uncertain ground truth — at the Vanderbilt Machine Learning Seminar Series. In this work, we show how to adapt standard conformal prediction if there are no unique ground truth labels available due to disagreement among experts during annotation. In this article, I want to share the slides of my talk.
I had the pleasure to present our work on evaluating and calibrating with uncertain ground truth at the seminar series of the PRECISE center at the University of Pennsylvania. Besides talking about our recent papers on evaluating AI models in health with uncertain ground truth and conformal prediction with uncertain ground truth, I also got to learn more about the research at PRECISE through post-doc and student presentations. In this article, I want to share the corresponding slides.
Conformal prediction uses a held-out, labeled set of examples to calibrate a classifier to yield confidence sets that include the true label with user-specified probability. But what happens if even experts disagree on the ground truth labels. Commonly, this is resolved by taking the majority voted label from multiple expert. However, in difficult and ambiguous tasks, the majority voted label might be misleading and a bad representation of the underlying true posterior distribution. In this paper, we introduce Monte Carlo conformal prediction which allows to perform conformal calibration directly against expert opinions or aggregate statistics thereof.
In supervised machine learning, we usually assume access to ground truth label for evaluation. In many applications, however, these ground truth labels are derived from expert opinions. Disagreement among these experts is typically ignored using simple majority voting or averaging. Unfortunately, this can have severe consequences by over-estimating performance or mis-guiding model selection. In our work presented in this article, we tackle this problem by introducing a statistical framework for aggregating expert opinions.
While attending the Heidelberg Laureate Forum this year, I got to meet Letitia Parcalabescu who is running a YouTube channel called the AI Coffee Break. Among other topics, we talked abou my PhD research on adversarial robustness. Part of our conversasion can now be found on her YouTube channel.
Similar to my article series on adversarial robustness, I was planning to have a series on bit errors robustness accompanied by PyTorch code. Instead, due to time constraints, I decided to condense the information into a single article. The code for the originally planned six articles is available on GitHub.
This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experince for young researchers: Mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year!
In September, I received the DAGM MVTec dissertation award 2023 for my PhD thesis. DAGM is the German association for pattern recognition and organizes the German Conference on Pattern Recognition (GCPR) which is Germany’s prime conference for computer vision and related research areas. I feel particularly honored by this award since my academic career started with my first paper published as part of the young researcher forum at GCPR 2015 in Aachen.
Another alternative to the regular Lp-constrained adversarial examples that is additionally less visible than adversarial patches or frames are adversarial transformations such as small crops, rotations and translations. Similar to Lp adversarial examples, adversarial transformations are often less visible unless the original image is available for direct comparison. In this article, I will include a PyTorch implementation and some results against adversarial training.