Humanly certifying superhuman classifiers

Qiongkai Xu, Christian Walder, Chenchen Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference proceeding contribution › peer-reviewed


This paper addresses a key question in current machine learning research: if we believe that a model’s predictions might be better than those given by human experts, how can we (humans) verify these beliefs? In some cases, this “superhuman” performance is readily demonstrated; for example, by defeating top-tier human players in traditional two-player games. On the other hand, it can be challenging to evaluate classification models that potentially surpass human performance. Indeed, human annotations are often treated as a ground truth, which implicitly assumes the superiority of the human over any models trained on human annotations. In reality, human annotators are subjective and can make mistakes. Evaluating the performance with respect to a genuine oracle is more objective and reliable, even when querying the oracle is more expensive or sometimes impossible. In this paper, we first raise the challenge of evaluating the performance of both humans and models with respect to an oracle which is unobserved. We develop a theory for estimating the accuracy compared to the oracle, using only imperfect human annotations for reference. Our analysis provides an executable recipe for detecting and certifying superhuman performance in this setting, which we believe will assist in understanding the stage of current research on classification. We validate the convergence of the bounds and the assumptions of our theory on carefully designed toy experiments with known oracles. Moreover, we demonstrate the utility of our theory by meta-analyzing large-scale natural language processing tasks, for which an oracle does not exist, and show that under our mild assumptions a number of models from recent years have already achieved superhuman performance with high probability, suggesting that our new oracle-based performance evaluation metrics are overdue as an alternative to the widely used accuracy metrics that are naively based on imperfect human annotations.
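The core phenomenon the abstract describes can be illustrated with a toy simulation in the spirit of the paper's known-oracle experiments. The setup below is a minimal sketch under illustrative assumptions (binary labels, independent annotation errors, and the 20%/10% error rates are invented for the demo, not taken from the paper): a model that beats the human annotator with respect to the oracle nevertheless scores lower when accuracy is naively measured against the human annotations.

```python
import random

random.seed(0)

# Toy setup with a known oracle: binary labels, a human annotator that
# errs 20% of the time, and a model that errs only 10% of the time.
# Both error rates are illustrative assumptions, not figures from the paper.
N = 100_000
HUMAN_ERR = 0.20
MODEL_ERR = 0.10

oracle = [random.randint(0, 1) for _ in range(N)]
flip = lambda y, p: y ^ (random.random() < p)  # flip label y with prob. p
human = [flip(y, HUMAN_ERR) for y in oracle]
model = [flip(y, MODEL_ERR) for y in oracle]

acc = lambda a, b: sum(x == y for x, y in zip(a, b)) / len(a)

# Accuracy measured against the (usually unobservable) oracle:
print(f"model vs oracle: {acc(model, oracle):.3f}")  # ~0.90
print(f"human vs oracle: {acc(human, oracle):.3f}")  # ~0.80

# Accuracy naively measured against human annotations understates the
# superhuman model: with independent errors the expected agreement is
# (1-0.1)*(1-0.2) + 0.1*0.2 = 0.74, below the human's apparent 1.0.
print(f"model vs human:  {acc(model, human):.3f}")   # ~0.74
```

The gap between the last two numbers is why the paper argues for oracle-based evaluation: against human labels the superhuman model caps out near 74% agreement, while the human annotator trivially scores 100% against their own labels.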
Original language: English
Title of host publication: The Eleventh International Conference on Learning Representations
Subtitle of host publication: ICLR 2023
Place of publication: Appleton, WI
Publication status: Submitted - 2 Feb 2023
Externally published: Yes
Event: International Conference on Learning Representations (11th : 2023) - Hybrid
Duration: 1 May 2023 - 5 May 2023
Conference number: 11th


Conference: International Conference on Learning Representations (11th : 2023)
Abbreviated title: ICLR 2023

