Monte Carlo Conformal Prediction

Quick links: OpenReview | Pre-Print | Code | StatML Slides | Vanderbilt Machine Learning Seminar Slides

Abstract

Figure 1: Short summary of the motivation and results of Monte Carlo conformal prediction.

Conformal Prediction (CP) allows one to perform rigorous uncertainty quantification by constructing a prediction set $C(X)$ satisfying $\mathbb{P}(Y \in C(X))\geq 1-\alpha$ for a user-chosen $\alpha \in [0,1]$, relying on calibration data $(X_1,Y_1),\ldots,(X_n,Y_n)$ from $\mathbb{P}=\mathbb{P}^{X} \otimes \mathbb{P}^{Y|X}$. It is typically implicitly assumed that $\mathbb{P}^{Y|X}$ is the "true" posterior label distribution. However, in many real-world scenarios, the labels $Y_1,\ldots,Y_n$ are obtained by aggregating expert opinions using a voting procedure, resulting in a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$. This is the case for most datasets, even well-known ones such as ImageNet. For such "voted" labels, CP guarantees are thus w.r.t. $\mathbb{P}_{vote}=\mathbb{P}^X \otimes \mathbb{P}_{vote}^{Y|X}$ rather than the true distribution $\mathbb{P}$. In cases with unambiguous ground truth labels, the distinction between $\mathbb{P}_{vote}$ and $\mathbb{P}$ is irrelevant. However, when experts do not agree because of ambiguous labels, approximating $\mathbb{P}^{Y|X}$ with a one-hot distribution $\mathbb{P}_{vote}^{Y|X}$ ignores this uncertainty. In this paper, we propose to leverage expert opinions to approximate $\mathbb{P}^{Y|X}$ using a non-degenerate distribution $\mathbb{P}_{agg}^{Y|X}$. We then develop Monte Carlo CP procedures which provide guarantees w.r.t. $\mathbb{P}_{agg}=\mathbb{P}^X \otimes \mathbb{P}_{agg}^{Y|X}$ by sampling multiple synthetic pseudo-labels from $\mathbb{P}_{agg}^{Y|X}$ for each calibration example $X_1,\ldots,X_n$. In a case study of skin condition classification with significant disagreement among expert annotators, we show that applying CP w.r.t. $\mathbb{P}_{vote}$ under-covers expert annotations: calibrated for $72\%$ coverage, it falls short by $10\%$ on average; our Monte Carlo CP closes this gap both empirically and theoretically. We also extend Monte Carlo CP to multi-label classification and to CP with calibration examples enriched through data augmentation.
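
To make the procedure concrete, below is a minimal sketch of Monte Carlo conformal calibration in Python. It assumes split CP with the conformity score $s(x,y) = 1 - \hat{p}_y(x)$ (one minus the model's softmax probability for label $y$), draws $m$ pseudo-labels per calibration example from an estimate of $\mathbb{P}_{agg}^{Y|X}$ (e.g., normalized expert vote counts), and pools all $n \cdot m$ scores before taking a conservative empirical quantile. It omits the ECDF correction from the paper, and all names (monte_carlo_calibrate, predict_set, plausibilities) are illustrative rather than the repository's API.

import numpy as np

def monte_carlo_calibrate(probs, plausibilities, alpha, m, rng):
    # probs: (n, K) softmax probabilities of the model on calibration inputs.
    # plausibilities: (n, K) estimate of P_agg(Y|X) per example; rows sum to 1.
    n, K = probs.shape
    scores = []
    for i in range(n):
        # Sample m synthetic pseudo-labels from the aggregated distribution.
        labels = rng.choice(K, size=m, p=plausibilities[i])
        # Conformity score: one minus the predicted probability of the label.
        scores.append(1.0 - probs[i, labels])
    scores = np.concatenate(scores)  # n * m pooled calibration scores
    # Conservative empirical quantile (a simple stand-in for the paper's
    # corrected threshold).
    level = min(1.0, np.ceil((1 - alpha) * (n * m + 1)) / (n * m))
    return np.quantile(scores, level, method="higher")

def predict_set(probs_test, tau):
    # Prediction set for one test example: all labels with score <= tau.
    return np.where(1.0 - probs_test <= tau)[0]

# Example usage, targeting 72% coverage as in the case study:
# rng = np.random.default_rng(0)
# tau = monte_carlo_calibrate(probs, plausibilities, alpha=0.28, m=10, rng=rng)
# C = predict_set(model_probs_for_x, tau)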

Download & Citing

The paper is available on OpenReview and arXiv:

Paper on OpenReview | Paper on arXiv

@article{StutzTMLR2023,
    title={Conformal prediction under ambiguous ground truth},
    author={David Stutz and Abhijit Guha Roy and Tatiana Matejovicova and Patricia Strachan and Ali Taylan Cemgil and Arnaud Doucet},
    journal={Transactions on Machine Learning Research},
    issn={2835-8856},
    year={2023},
    url={https://openreview.net/forum?id=CAd6V2qXxc},
}

Code

The code for this paper can be found on GitHub:

Code on GitHub

It allows reproducing the results from the paper on the toy dataset and on the multi-label MNIST variant, with the skin condition classification dataset to follow. Some of the included components:

  • The toy dataset from the paper
  • Implementation of plausibility regions as detailed in v1 of the paper on arXiv
  • Implementations of standard and (ECDF-corrected) Monte Carlo conformal prediction
  • Several utilities to combine p-values (see the sketch below)
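
As an illustration of the last item, here is a minimal, self-contained sketch in Python of two classical rules for combining $K$ p-values into a single one; these are standard methods, not necessarily the ones implemented in the repository: Fisher's method, which assumes independent p-values, and twice-the-average, which remains valid under arbitrary dependence (Vovk & Wang, 2020).

import numpy as np
from scipy.stats import chi2

def fisher_combine(pvalues):
    # Fisher's method: -2 * sum(log p) follows a chi-squared distribution with
    # 2K degrees of freedom under the null if the p-values are independent.
    pvalues = np.asarray(pvalues, dtype=float)
    stat = -2.0 * np.sum(np.log(pvalues))
    return float(chi2.sf(stat, df=2 * len(pvalues)))

def average_combine(pvalues):
    # Twice the arithmetic mean of valid p-values is again a valid p-value,
    # whatever the dependence between them (Vovk & Wang, 2020).
    return min(1.0, 2.0 * float(np.mean(pvalues)))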

Updates

Dec 2023: The code is now available on GitHub.

Oct 2023: The paper was accepted at TMLR.