Dear @ULFEnCMICCAI2025Participants,
Thank you once again for your contributions to the ULF-EnC Challenge at MICCAI 2025 and for your patience as we worked through the validation and testing phases.
We would like to provide a short clarification:
1. **Pre-determined metrics**: The evaluation metrics (SSIM, PSNR, MAE, NMSE, with specified weights) were pre-determined and set up independently prior to the challenge. They were not adjusted in any way that would alter the ranking of teams.
2. **Software bug in score display**: Some participants may have noticed scores displayed in the “**_3.x_**” range during the early stages of validation. This was the result of a software bug in the way total scores were released. It was corrected later in the validation phase, as noted in our previous announcement. We apologise for any confusion this caused.
3. **Acknowledgment & improvement**: We sincerely thank all participants for their effort, understanding, and engagement. We acknowledge the confusion caused by the bug and will take this experience forward to improve communication in future editions of the challenge.
We greatly appreciate the innovative work and spirit of collaboration you have brought to the ULF-EnC 2025 Challenge.
With thanks and best regards,
ULF-EnC 2025 Organising Team
Hi Tohid,
I just read the discussion and as one of the participants I would also like to weigh in. There are a few important details here:
In the first few days of the validation phase, the leaderboard was not masked and a clear final score was shown, calculated without the /32 normalisation. Naturally, many participants focused on PSNR as the most important metric. I understand this may not have been the intended design, but it was visible to everyone and we all assumed this was the way to go.
PSNR is not on a linear, zero-based scale: even uploading a matrix of 0s would not give a PSNR of 0. In my opinion, this means that dividing by 32 is neither a mathematically sound nor a fair way of normalising the scores.
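To illustrate this concretely, here is a quick sketch of my own (not the official evaluation code), assuming images are normalised to a data range of 1.0. An all-zero submission still yields a PSNR of roughly 4–5 dB rather than 0, so PSNR has no natural zero point that a fixed divisor like 32 could map onto a 0–1 scale.

```python
# Minimal sketch (not the challenge's evaluation code): PSNR of an all-zero
# prediction against a reference image normalised to [0, 1].
import numpy as np

rng = np.random.default_rng(0)
reference = rng.random((64, 64))       # stand-in ground-truth slice, values in [0, 1)
prediction = np.zeros_like(reference)  # the "matrix of 0s"

mse = np.mean((reference - prediction) ** 2)
psnr = 10 * np.log10(1.0 / mse)        # data range assumed to be 1.0

print(f"PSNR of an all-zero prediction: {psnr:.2f} dB")  # ~4.8 dB, not 0 dB
```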
Changing the scoring approach at this late stage feels unfair to participants who optimised their models based on their initial understanding of the challenge. At the very least, consistency across the challenge would help maintain trust and integrity.
For these reasons, I would strongly encourage reconsidering the new scoring method and instead going back to the original scoring, even if it is not perfect. Consistency is more important than retroactively correcting the metric, especially since the communication of it was not ideal.
Kind regards

Hi Tohid,
I'm glad to hear you are planning to release the evaluation code. My apologies for any confusion I contributed to here.
You're right that releasing evaluation code in advance is not a formal requirement. However, it is generally standard practice. For example, both the BraTS and UNICORN Lighthouse challenges made their evaluation code available throughout the competition this year. Similarly, TrackRad, a smaller challenge whose organisation I was peripherally involved in, shared its evaluation scripts before submissions opened. In each of these cases, it helped to avoid misunderstandings during the test phase.
I'm happy to discuss my thoughts on the challenge over Zoom if you'd like to reach out. Overall, it was a really positive experience for me and my team. We learnt a lot and achieved results well beyond our initial expectations.
Best,
David

Hi Tohid,
Based on all the exchanges so far, I just want to note respectfully that I do not share your perspective.
Best,

Hi @asalehi09,
Thanks for your message. To clarify: there were no changes to the scoring metrics between validation and test phases. From the very beginning, the challenge documentation specified the weighted final scoring (SSIM 70%, PSNR 10%, MAE 10%, NMSE 10%). As noted in the discussion posted on 7/29/2025 (https://www.synapse.org/Synapse:syn65485242/discussion/threadId=12209), the validation leaderboard did not include a final score.
The test phase applied the same pre-defined weighted scoring, with PSNR normalised only to keep its contribution consistent with the intended 10%.
Kind regards,
Tohid

Hi Tohid,
I think almost everyone agrees that there was no adjustment or change to the scoring during the validation phase, but that changes were made in the test phase. If such changes are not openly announced, it does not feel fair, and it disregards the work we and other teams put into tuning model parameters based on the validation leaderboard.

Dear David,
Thank you for your message. To clarify, the response “under consideration” does not imply that we are not going to publish the evaluation code.
Regarding your note that “releasing the evaluation code before the test phase is standard practice” in MICCAI challenges, we are not aware of an official guideline requiring release before the test phase. If you have such documentation, we would greatly appreciate it if you could share it with us for future reference.
Best regards,
Tohid

Hi Tohid,
Thanks for your reply. I appreciate that an enormous amount of your time and effort has gone into the coordination and management of this challenge (not to mention obtaining the data in the first place).
On the evaluation code: the published challenge documentation (Code Availability section) states:
“The evaluation software used for ranking submissions will be publicly available to ensure transparency and reproducibility. The evaluation scripts, including metric computation (SSIM, PSNR, MAE, and NMSE), submission validation, and ranking procedures, will be hosted in the following public GitHub repository... The repository will be regularly updated to reflect any modifications or improvements in the ranking methodology.”
This is a clear commitment to releasing the code, not something “under consideration.” Could you please provide a timeline for when the repository will be made public?
Releasing the evaluation code before the test phase is standard practice in MICCAI challenges, and would have helped avoid the kind of communication issues experienced here around PSNR normalisation.
Best,
David

Hi David,
Thank you for your message. To clarify, PSNR normalisation (/32) was applied only to align PSNR with the same scale as the other metrics. This step does not alter the intended weighting; PSNR still contributes 10% to the final score, exactly as stated in the original challenge document. The adjustment simply prevents raw PSNR values (measured in dB) from overshadowing the other metrics because of scale differences. We understand that this detail may not have been communicated as clearly as it could have been, and we appreciate your patience and understanding.
SSIM has always been the dominant factor at 70%, and this balance was preserved. The normalisation was therefore consistent with the published scoring scheme and not a change to the rules.
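As an illustration only, a simplified sketch of this weighting is given below. This is not the evaluation script itself; in particular, how the error-based metrics MAE and NMSE enter the total is shown here under an assumed transform.

```python
# Simplified sketch of the published weighting (not the official evaluation
# script). SSIM is assumed to lie in [0, 1]; PSNR (in dB) is rescaled by /32;
# the (1 - value) treatment of MAE and NMSE is an illustrative assumption.
WEIGHTS = {"ssim": 0.70, "psnr": 0.10, "mae": 0.10, "nmse": 0.10}

def final_score(ssim: float, psnr_db: float, mae: float, nmse: float) -> float:
    psnr_scaled = psnr_db / 32.0              # keeps the PSNR term on a ~[0, 1] scale
    return (WEIGHTS["ssim"] * ssim
            + WEIGHTS["psnr"] * psnr_scaled
            + WEIGHTS["mae"] * (1.0 - mae)    # assumed conversion to "higher is better"
            + WEIGHTS["nmse"] * (1.0 - nmse)) # assumed conversion to "higher is better"

# Without the /32 rescaling, a typical PSNR of ~30 dB would contribute
# 0.1 * 30 = 3.0 to the total and swamp the SSIM term (at most 0.7).
print(final_score(ssim=0.90, psnr_db=30.0, mae=0.05, nmse=0.08))  # ~0.91
```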
The release of the evaluation code is still under consideration.
Kind regards,
Tohid

Hi,
Thanks for confirming that no changes were made after validation.
Could you please point us to the documentation or announcement where the PSNR normalisation (/32) was specified prior to the test phase? This would help reconcile the difference between the published challenge description (which lists raw PSNR at 10%) and the evaluation formula used in the test phase.
Could you also please publish the evaluation code used during scoring, as previously indicated?
Best,
David