False claims estimator
Using a Bayesian approach, this tool estimates the probability that a reported outperformance claim is false.
It evaluates whether the observed difference between two methods could have occurred by chance, given the test-set size
and the reported performance values. The methodology adapts to the type of task: for classification, it models
case-wise agreement patterns between methods; for segmentation, it incorporates both performance variability
and the correlation between the methods' per-case scores. The estimator currently supports accuracy-based comparisons for
classification and Dice Similarity Coefficient (DSC)-based comparisons for segmentation.
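As a rough illustration of the quantities mentioned above, the sketch below computes the four case-wise agreement patterns for two classifiers and the Pearson correlation between two methods' per-case DSC scores. It is not the estimator's Bayesian model, only a plausible summary of its inputs. The function names `agreement_counts` and `pearson` are hypothetical, and the inputs are the first four rows of the example tables in the format sections.

```python
from statistics import mean


def agreement_counts(truth, pred_a, pred_b):
    """Count the four case-wise agreement patterns between two classifiers."""
    counts = {"both_correct": 0, "only_a": 0, "only_b": 0, "both_wrong": 0}
    for t, a, b in zip(truth, pred_a, pred_b):
        a_ok, b_ok = (a == t), (b == t)
        if a_ok and b_ok:
            counts["both_correct"] += 1
        elif a_ok:
            counts["only_a"] += 1
        elif b_ok:
            counts["only_b"] += 1
        else:
            counts["both_wrong"] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equally long lists of per-case scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Illustrative inputs only: first four rows of the binary example table.
truth = [1, 0, 1, 1]
alg_01 = [1, 0, 1, 0]
alg_02 = [0, 1, 1, 1]
patterns = agreement_counts(truth, alg_01, alg_02)
print(patterns)

# Illustrative inputs only: first four rows of the segmentation example table.
dsc_01 = [0.9254, 0.6753, 0.8120, 0.9012]
dsc_02 = [0.8712, 0.7330, 0.7991, 0.8450]
print(round(pearson(dsc_01, dsc_02), 3))
```

The accuracy difference between the two classifiers equals `(only_a - only_b) / n`, so the cases where both methods agree do not change it.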
Select the classification mode
Expected CSV format
Segmentation task
Your CSV file must contain one row per test case.
The first column must be named case_id,
followed by one column per algorithm (alg_*).
Each value should report the Dice Similarity Coefficient (DSC) achieved by an algorithm on a single test case.
Please ensure that all DSC values lie within the interval [0, 1].
| case_id | alg_01 | alg_02 | alg_03 |
| 1 | 0.9254 | 0.8712 | 0.9701 |
| 2 | 0.6753 | 0.7330 | 0.8902 |
| 3 | 0.8120 | 0.7991 | 0.9405 |
| 4 | 0.9012 | 0.8450 | 0.9603 |
| ... | ... | ... | ... |
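A file can be checked against these rules before upload. The sketch below is a minimal example, assuming a comma-separated file with a header row. The helper name `load_dsc_csv` is hypothetical and is not part of the tool.

```python
import csv
import io


def load_dsc_csv(text):
    """Parse segmentation CSV text and return {algorithm: [DSC per case]}."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    if not fields or fields[0] != "case_id":
        raise ValueError("the first column must be named case_id")
    algs = fields[1:]
    if not algs or not all(name.startswith("alg_") for name in algs):
        raise ValueError("every column after case_id must be named alg_*")
    scores = {name: [] for name in algs}
    for row in reader:
        for name in algs:
            value = float(row[name])
            if not 0.0 <= value <= 1.0:
                raise ValueError(
                    f"DSC out of [0, 1] for case {row['case_id']}, {name}: {value}"
                )
            scores[name].append(value)
    return scores


# Two rows from the example table, written as comma-separated text (assumed delimiter).
sample = (
    "case_id,alg_01,alg_02,alg_03\n"
    "1,0.9254,0.8712,0.9701\n"
    "2,0.6753,0.7330,0.8902\n"
)
scores = load_dsc_csv(sample)
print(scores["alg_03"])
```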
Expected CSV format
Binary classification task
Your CSV file must contain one row per test case.
The first two columns must be named case_id and ground_truth,
followed by one column per algorithm (alg_*).
Ground-truth labels and predictions must use a consistent encoding {0, 1}.
| case_id | ground_truth | alg_01 | alg_02 | alg_03 |
| 1 | 1 | 1 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 |
| 3 | 1 | 1 | 1 | 0 |
| 4 | 1 | 0 | 1 | 0 |
| ... | ... | ... | ... | ... |
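The accuracy that the estimator compares can be derived from such a file as the fraction of cases where a prediction equals the ground truth. The sketch below is a minimal example, assuming a comma-separated file with a header row. The helper name `binary_accuracies` is hypothetical, and the sample contains only the four rows shown above.

```python
import csv
import io

ALLOWED = {"0", "1"}  # the binary encoding required above


def binary_accuracies(text):
    """Validate binary classification CSV text and return accuracy per algorithm."""
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    if fields[:2] != ["case_id", "ground_truth"]:
        raise ValueError("the first two columns must be case_id and ground_truth")
    algs = fields[2:]
    if not algs or not all(name.startswith("alg_") for name in algs):
        raise ValueError("every remaining column must be named alg_*")
    correct = {name: 0 for name in algs}
    n_cases = 0
    for row in reader:
        truth = row["ground_truth"].strip()
        if truth not in ALLOWED:
            raise ValueError(f"ground_truth must be 0 or 1, got {truth!r}")
        for name in algs:
            pred = row[name].strip()
            if pred not in ALLOWED:
                raise ValueError(f"{name} must be 0 or 1, got {pred!r}")
            correct[name] += int(pred == truth)
        n_cases += 1
    if n_cases == 0:
        raise ValueError("the file contains no test cases")
    return {name: correct[name] / n_cases for name in algs}


# The four example rows, written as comma-separated text (assumed delimiter).
sample = (
    "case_id,ground_truth,alg_01,alg_02,alg_03\n"
    "1,1,1,0,1\n"
    "2,0,0,1,0\n"
    "3,1,1,1,0\n"
    "4,1,0,1,0\n"
)
accuracies = binary_accuracies(sample)
print(accuracies)  # {'alg_01': 0.75, 'alg_02': 0.5, 'alg_03': 0.5}
```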
Expected CSV format
Multiclass classification task
Your CSV file must contain one row per test case.
The first two columns must be named case_id and ground_truth,
followed by one column per algorithm (alg_*).
Ground-truth labels and predictions must use a consistent encoding {0, 1, 2, …}.
| case_id | ground_truth | alg_01 | alg_02 | alg_03 |
| 1 | 0 | 0 | 1 | 0 |
| 2 | 2 | 2 | 2 | 1 |
| 3 | 1 | 1 | 0 | 1 |
| 4 | 3 | 3 | 2 | 3 |
| 5 | 2 | 2 | 2 | 3 |
| 6 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... |
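For an accuracy-based comparison, multiclass predictions can be reduced to one correct/incorrect indicator per case, after which they can be handled like the binary case. The sketch below shows that reduction on the six example rows above. It assumes this is a reasonable preprocessing step and does not claim to be the tool's internal procedure. The name `correctness` is hypothetical.

```python
def correctness(truth, preds):
    """Return 1 where the prediction equals the ground truth, else 0."""
    return [int(p == t) for p, t in zip(preds, truth)]


# The six example rows from the table above (illustrative only).
truth = [0, 2, 1, 3, 2, 0]
predictions = {
    "alg_01": [0, 2, 1, 3, 2, 0],
    "alg_02": [1, 2, 0, 2, 2, 0],
    "alg_03": [0, 1, 1, 3, 3, 0],
}

# Per-case indicators and the accuracy they imply for each algorithm.
indicators = {name: correctness(truth, p) for name, p in predictions.items()}
accuracy = {name: sum(flags) / len(flags) for name, flags in indicators.items()}

print(indicators["alg_02"])  # [0, 1, 0, 0, 1, 1]
print(accuracy["alg_02"])    # 0.5
```

The reduction only works if the same integer denotes the same class in `ground_truth` and in every `alg_*` column, which is why the encoding must be consistent.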