Possible use cases

In this section, we present two typical scenarios in which Claims Reloaded provides substantial value: one for reviewers evaluating performance results in scientific manuscripts, and one for researchers analyzing their own benchmarking outcomes. Both groups face a shared challenge revealed by our study: a large proportion of outperformance claims in medical imaging AI are insufficiently supported and may arise by chance rather than reflect genuine methodological progress.

You can benefit from Claims Reloaded if ...

  • … you are a reviewer evaluating a manuscript that claims state-of-the-art performance:

    Our study showed that although most medical imaging AI papers introducing novel methods claim outperformance, only a small proportion support these claims with statistical testing. Combined with the widespread reliance on mean performance metrics without reporting variability, this leads to a high probability of false claims of model superiority in both classification and segmentation tasks.

    As a reviewer, you can use our tool to quickly assess whether the improvements reported in a manuscript are statistically meaningful or likely attributable to random fluctuations, small test sets, or high congruence between the compared models (the toy simulation after this list illustrates how easily small gaps arise by chance).

  • … you are a researcher benchmarking novel algorithms:

    Our study showed that the performance differences typically claimed in medical imaging AI papers are very small (median ≈ 0.01 for both Accuracy and the Dice similarity coefficient, DSC), and that on test sets of commonly used sizes, differences of this magnitude are likely to arise by chance.

    By integrating our Bayesian approach into the development pipeline, researchers can avoid drawing incorrect conclusions and can report their findings with transparency and statistical credibility; a sketch of this style of analysis follows this list.
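
As a rough illustration of the first point, here is a minimal, self-contained simulation (not part of Claims Reloaded; the shared true accuracy of 0.85 and the test-set size of 200 are hypothetical). It estimates how often two classifiers of identical true quality differ by at least 0.01 in measured accuracy purely by chance:

    import numpy as np

    rng = np.random.default_rng(42)

    true_accuracy = 0.85   # hypothetical shared true accuracy of both models
    n_test = 200           # hypothetical test-set size
    n_trials = 10_000      # number of simulated benchmark runs

    # Draw the number of correctly classified cases for each model and trial.
    acc_a = rng.binomial(n_test, true_accuracy, size=n_trials) / n_test
    acc_b = rng.binomial(n_test, true_accuracy, size=n_trials) / n_test

    # Fraction of runs in which the apparent accuracy gap reaches 0.01.
    frac = np.mean(np.abs(acc_a - acc_b) >= 0.01)
    print(f"P(|accuracy gap| >= 0.01 by chance) ~ {frac:.2f}")

Note that this toy simulation treats the two models as independent; on a shared test set their errors are typically correlated (the model congruence mentioned above), which changes these odds but not the overall message.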
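
For the Bayesian analysis itself, the sketch below shows one simple variant of the general idea, a Bayesian bootstrap over paired per-case scores; it is an illustration on synthetic data, not the Claims Reloaded implementation, and all names and numbers in it are hypothetical:

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_superior(scores_new, scores_baseline, n_draws=10_000):
        """Posterior probability that the new model's mean per-case score
        (e.g., DSC) exceeds the baseline's, via a Bayesian bootstrap."""
        diffs = np.asarray(scores_new) - np.asarray(scores_baseline)
        # Bayesian bootstrap: Dirichlet(1, ..., 1) posterior weights over cases.
        weights = rng.dirichlet(np.ones(diffs.size), size=n_draws)
        posterior_mean_diff = weights @ diffs
        return float(np.mean(posterior_mean_diff > 0))

    # Synthetic example: a claimed improvement of ~0.01 DSC on 50 test cases.
    baseline = rng.normal(0.85, 0.05, size=50).clip(0.0, 1.0)
    new = (baseline + rng.normal(0.01, 0.05, size=50)).clip(0.0, 1.0)
    print(f"P(new model superior) = {prob_superior(new, baseline):.2f}")

A posterior probability near 1 would support a superiority claim, while values closer to 0.5 indicate that the observed gap is well within what chance alone produces.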

Sample Report and Further Information

You can download a sample report created by Claims Reloaded to explore its capabilities. For detailed information about the underlying methodology, continue to the “How does it work?” page:

How does it work?