Claims Reloaded

An online tool for estimating the probability of false outperformance claims

Claims Reloaded Mission

The mission of Claims Reloaded is to provide an open-source toolkit for assessing the validity of claims of superior performance in biomedical segmentation and classification tasks, enabling researchers and reviewers to detect unsupported claims.

Assessing the reliability of performance claims

Performance comparisons are fundamental in medical imaging AI research, often driving claims of superiority based on relative improvements in common performance metrics. However, such claims frequently rest on empirical mean performance alone, without accounting for statistical uncertainty. Claims Reloaded quantifies the probability of false claims using a Bayesian approach that combines the reported results with empirically estimated model congruence to estimate whether the observed ranking of methods is likely to have occurred by chance.
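The full procedure is described in the accompanying publication. As a rough intuition only, a simplified paired-difference version of the idea can be sketched in a few lines of Python; the normal model, the flat prior, and the function name prob_false_claim below are illustrative assumptions, not the tool's implementation:

```python
# Minimal illustrative sketch -- NOT the method implemented by Claims Reloaded.
# Assumes paired per-case scores and a normal model with a flat prior, so the
# posterior of the true mean difference is a Student-t distribution. Working
# on paired per-case differences implicitly captures the correlation
# ("congruence") between the two models' scores.
import numpy as np
from scipy import stats

def prob_false_claim(scores_new, scores_baseline):
    """P(true mean difference <= 0 | data): the posterior probability that
    a 'new method outperforms the baseline' claim is false."""
    d = np.asarray(scores_new) - np.asarray(scores_baseline)  # paired differences
    n = len(d)
    se = d.std(ddof=1) / np.sqrt(n)                           # standard error
    # Posterior of the mean difference: t with n-1 degrees of freedom,
    # centered at the observed mean; the claim is false where it lies <= 0.
    return stats.t.cdf(0.0, df=n - 1, loc=d.mean(), scale=se)

# Toy example: 50 cases, a small and noisy improvement in per-case Dice scores
rng = np.random.default_rng(0)
baseline = rng.normal(0.80, 0.05, size=50)
new = baseline + rng.normal(0.01, 0.03, size=50)
print(f"Estimated probability the claim is false: {prob_false_claim(new, baseline):.3f}")
```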

Generate a report in 4 simple steps

1. Select your scenario

Choose the researcher or the reviewer workflow, depending on your evaluation objective.

2. Specify the task type

Indicate whether your analysis concerns a classification or segmentation task to ensure correct methodological handling.

3. Provide your data

Researchers can upload score-level data in CSV format containing the per-case performance values (sample file); a hypothetical layout is sketched after these steps. Reviewers can instead enter the required experiment parameters directly.

4. Generate the report

Claims Reloaded processes the provided inputs and automatically generates a structured, statistically grounded report. You can then download the final document.
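
As noted in step 3, here is a hypothetical score-level CSV layout, with a few lines of Python for loading it. The column names (case_id, method, metric_value) and the file name scores.csv are illustrative assumptions; the sample file linked above defines the actual expected format:

```python
# Hypothetical score-level CSV (the real schema is given by the sample file):
#   case_id,method,metric_value
#   001,proposed,0.91
#   001,baseline,0.88
#   002,proposed,0.85
#   ...
import pandas as pd

df = pd.read_csv("scores.csv")  # illustrative file name
# One row per case, one column per method, ready for a paired comparison
wide = df.pivot(index="case_id", columns="method", values="metric_value")
print(wide.head())
```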

Claims Reloaded Publication

"Claims Reloaded" is based on the publication "False Promises in Medical Imaging AI? Assessing Validity of Outperformance Claims." The goal of the paper is to assess whether reported outperformance in medical imaging AI genuinely reflect superior methods. Although most studies claim improvements based on average metrics, these comparisons often overlook statistical uncertainty. They find that while over 80% of papers claim outperformance, many such claims are fragile: 86% of classification papers and 53% of segmentation papers have more than a 5% probability of being false. This highlights a major weakness in current benchmarking practices, where many reported advances lack solid statistical support.

Please cite our paper if you use our online tool. For more information, see Publication and Citation.