Dear @ULFEnCMICCAI2025Participants,

We are pleased to share several important updates regarding the ULF-EnC Challenge.

**Testing Phase Results**

We have now completed the evaluation and scoring of the testing phase submissions. Congratulations to all participating teams for their efforts and contributions!

Leaderboard – Testing Phase: https://www.synapse.org/Synapse:syn65485242/wiki/634769

* All metrics were computed on masked brain regions only (background excluded).
* PSNR values were normalised by dividing by a maximum PSNR of 32.
* The final masked score for each submission was then calculated as:

$$\text{PSNR}_{\text{norm}} = \frac{\text{PSNR}_{\text{masked}}}{32.0}$$

$$\text{Final Score} = 0.7 \times \text{SSIM}_{\text{masked}} + 0.1 \times \text{PSNR}_{\text{norm}} + 0.1 \times \left(1 - \text{MAE}_{\text{masked}}\right) + 0.1 \times \left(1 - \text{NMSE}_{\text{masked}}\right)$$

**Short Paper Decisions**

Decisions on short paper submissions have been sent to the corresponding authors via CMT.

* Each paper was reviewed by at least three reviewers, followed by a meta-review, and finalised by the organising committee.
* Congratulations to all accepted papers, and thank you to every author for your contributions.

**Presentation Format**

* Top 3 teams (each): 12-minute oral presentation + 3-minute Q&A.
* All other teams: poster presentations, following the MICCAI poster guidelines.

**Camera-Ready Submission Instructions**

* Revise your paper in response to reviewer and meta-reviewer comments.
* Submit the final camera-ready version by September 19, 2025 (Pacific Time) via CMT.
* Please include your test phase results in the final version. (Teams unable to provide Docker test results should instead include updated validation results.)
* Ensure your manuscript follows the citation guidelines provided in the challenge documentation.
* Note the correction regarding scanner information: **_The challenge data was acquired using a Siemens Biograph mMR (3T) scanner and a Hyperfine Swoop (64 mT) scanner._** This corrects the earlier erroneous mention of a Siemens Skyra.

Dataset Information – ULF-EnC: https://www.synapse.org/Synapse:syn65485242/wiki/631229

Please update your manuscript accordingly if you referenced the scanner.

**Final Note**

We sincerely thank all teams for their hard work and participation. If you have any questions, concerns, or suggestions, please use the challenge discussion board to reach us. We look forward to your final camera-ready papers and to celebrating your contributions at MICCAI 2025!

Best regards,
The ULF-EnC 2025 Organising Team
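To make the formula above concrete, here is a minimal sketch of how the final masked score could be computed from the four per-submission metrics. The function and variable names are illustrative only and are not taken from the official evaluation script.

```python
# Minimal sketch of the final scoring described above. Assumes SSIM, PSNR,
# MAE, and NMSE have already been computed on the masked brain region;
# the function and variable names here are hypothetical.

def final_score(ssim_masked: float, psnr_masked: float,
                mae_masked: float, nmse_masked: float) -> float:
    """Combine the four masked metrics into the final challenge score."""
    psnr_norm = psnr_masked / 32.0  # normalise by the stated maximum PSNR of 32 dB
    return (0.7 * ssim_masked
            + 0.1 * psnr_norm
            + 0.1 * (1.0 - mae_masked)
            + 0.1 * (1.0 - nmse_masked))

# Example with hypothetical metric values: SSIM 0.90, PSNR 28 dB, MAE 0.02, NMSE 0.05
print(round(final_score(0.90, 28.0, 0.02, 0.05), 4))  # 0.9105
```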

Created by Kh Tohidul Islam (@KhTohidulIslam)
In light of this clarification, I believe that we should also be allowed to resubmit our results. I originally had a model that achieved better SSIM but slightly worse PSNR. Under the previously understood scoring formula, its final score was lower, but under the clarified scoring protocol I believe it would achieve a stronger result.
Hi @HaiMiao and @asalehi09,

Thank you for your message and for sharing your perspective. I'd like to clarify that no new metric was introduced after the competition. The evaluation has always been based on the same four metrics specified in the challenge document: SSIM (70%), PSNR (10%), MAE (10%), and NMSE (10%).

What we did was apply a normalisation step to PSNR (dividing by 32) to ensure it was on a comparable scale with the other metrics. This adjustment did not change the intended weighting of PSNR at 10%; it only prevented PSNR values (measured in decibels) from disproportionately influencing the final score due to scale differences. The relative importance of each metric remained exactly as published, with SSIM as the dominant factor.

I understand your concern about timing, and we acknowledge that the normalisation detail should have been communicated more clearly during the competition. For transparency, please refer to the Clarification on Scoring discussion: https://www.synapse.org/Synapse:syn65485242/discussion/threadId=12362, where we explained this in detail.

We very much appreciate your engagement and feedback, and will take this lesson forward to improve communication in future challenges.

Kind regards,
Tohid
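To illustrate the scale difference being described, the sketch below uses hypothetical metric values to compare the PSNR term's contribution to the final score with and without the /32 normalisation. The numbers are purely illustrative and are not taken from any submission.

```python
# Hypothetical metric values, purely for illustration (not from any submission).
ssim, psnr_db, mae, nmse = 0.90, 28.0, 0.02, 0.05

# Without normalisation, the PSNR term alone (0.1 * 28 = 2.8) dwarfs the
# SSIM term (0.7 * 0.90 = 0.63), despite its nominal 10% weight.
score_raw_psnr = 0.7 * ssim + 0.1 * psnr_db + 0.1 * (1 - mae) + 0.1 * (1 - nmse)

# With normalisation, the PSNR term (0.1 * 28/32 = 0.0875) is on the same
# scale as the other terms and SSIM dominates as intended.
score_norm_psnr = 0.7 * ssim + 0.1 * (psnr_db / 32) + 0.1 * (1 - mae) + 0.1 * (1 - nmse)

print(f"score with raw PSNR (dB):   {score_raw_psnr:.4f}")   # 3.6230
print(f"score with normalised PSNR: {score_norm_psnr:.4f}")  # 0.9105
```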
Thank you for your message and for the clarifications provided. We would like to share a short note from our side:

1. Validation leaderboard usage: During the validation stage, we updated our network and training based on the feedback from the leaderboard.
2. PSNR influence: We observed that PSNR had a stronger effect on the validation score compared to other metrics, and therefore focused our parameter tuning and model improvements in that direction.
3. Fairness concern: The main idea behind the validation leaderboard, as we understood it, was to know where we stand and adjust our models accordingly. Since many of our updates were guided by this process, I personally believe it would not be fair to change or reinterpret the scoring afterwards.

With appreciation for your efforts and for organising the Challenge,
@KhTohidulIslam Hello,

Thank you for your efforts in running the competition. What I find most unacceptable is the timing of the announcement of the new metric; it is simply too late. If the metric had been posted during the competition, we could have accepted it. I'd like to make an analogy: it's as if we ran a marathon, and after it was over, the judges said they would rank us based on our performance in the first hundred meters. I think that is ridiculous.

Best,
Haimiao
Hi @XiaoyuBai, @asalehi09, @levente1, @jgro4702525, and @dwaddington,

Thank you for raising your concerns and contributing to this discussion. We have already posted an announcement clarifying the details of the final scoring. To reiterate: no changes were made to the masking procedure or to the PSNR calculations after the validation phase. We hope this reassures you that the evaluation process was consistent across the validation and test phases. We also trust that the community understands that some variation in results between these phases (validation and test) is to be expected.

With appreciation,
ULF-EnC 2025 Organising Team
Hi,

I'd also like to extend my gratitude to the challenge organisers. I share the same concerns as other participants about the unannounced change in the weighted scoring, and reiterate that our team (among others) directly optimised for the originally published weighted score outlined in the challenge document and on GitHub. Had we been made aware of any changes in weighting, we would have dramatically changed our approach and optimisation.

We are also trying to understand our >1 dB dip in masked PSNR between validation and testing. If we were to run the same testing script on our submitted validation data, would it produce the same masked results as the validation leaderboard? Could you please update the GitHub repository with the final evaluation script (including the masking steps) as soon as possible?

Kind regards,
James
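For reference, here is a minimal sketch of what a PSNR computation restricted to a brain mask might look like. Since the official evaluation script has not yet been released, the function name, masking convention, and data range below are assumptions rather than the organisers' actual implementation.

```python
import numpy as np

def masked_psnr(reference: np.ndarray, prediction: np.ndarray,
                mask: np.ndarray, data_range: float = 1.0) -> float:
    """PSNR computed over voxels inside a binary brain mask only.

    Purely illustrative: the exact conventions of the official evaluation
    script (data range, mask handling) are not known.
    """
    diff = (reference.astype(np.float64) - prediction.astype(np.float64))[mask > 0]
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10((data_range ** 2) / mse)
```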
Hi,

Thank you to the organizers for all their efforts. I also write to extend support for the points raised. I understand the organizers' rationale for the updated score metric, as it prevents PSNR from being too dominant a factor in the final score. This has directly benefited teams such as ours, who decided to optimize for all voxel-wise metrics rather than primarily PSNR. Nevertheless, I am also surprised that this decision was made so late in the challenge; it would seem more appropriate to announce such a change before model submission, or otherwise to keep the score metric constant.

If we revert to the original equation, it seems 2 out of the 3 winners would still remain, with @asalehi09 rising to first place (3.718) and our team dropping to second (3.705). The more concerning effect of this change, however, is that the remaining winning team would drop out of the top 3, and the team currently in seventh place would rise to third place with a score of 3.700 (if anyone calculated this differently, please let me know). It would be unfortunate to revoke a winning placement, but also to not recognize a team that scored highly under the original equation. Is there any way the effort of both of these teams could be acknowledged?

Best,
Levente
Hi,

Thank you for all your work in organizing this challenge. I would also like to add my support to the comments raised about the adjustment of PSNR (dividing by 32) in the final evaluation. My last submission was tuned to improve PSNR, and from my point of view this change had a clear impact on the final results. I believe it would have been very helpful if this adjustment had been communicated earlier, so that participants could focus on the metrics that were ultimately important, just as was done when the switch to masked metrics was announced.
TL;DR: The published challenge rules weighted PSNR at 10%, but at test time PSNR was normalized and reduced to negligible weight. Many teams, including ours, optimized for PSNR under the published criteria. This change directly altered the leaderboard and the winner, which undermines confidence in the fairness of the result.

Hi,

Thanks for all your efforts coordinating the ULF-EnC challenge and for sharing the test phase results. I have two questions and a broader concern regarding the evaluation process:

1. Were there any changes made to masking or PSNR calculations after the validation phase? Our PSNR test values were lower than expected, and we're trying to understand whether any pipeline modifications could explain that.
2. Could you please release the evaluation code? You previously indicated this would be shared after Docker evaluation.

While we would not have placed under either the old or new scoring, I want to echo others' concerns about the change in scoring between validation and test time. This change lacked transparency and amounts to a dramatic shift in the goalposts that directly changed the outcome of the challenge. Teams, including ours, optimized their models for PSNR because that was the metric explicitly stated in the challenge design and used during validation. Replacing that with a normalized, near-zero-weighted version of PSNR after Docker submission not only undermines those efforts, but also retroactively redefines what success looks like.

This is not a minor adjustment: the team that won under the revised scoring would not have won under the original, published criteria. It is clear that the team who would have ranked first under the published rules should be recognized as such. Changing the scoring after submissions closed undermines confidence in the legitimacy of the result.

Best,
David
Hi,

Thank you very much for your reply. I understand that dividing PSNR by 32 was intended to align it with the other metrics, while keeping SSIM at 70% weight. The problem is that this adjustment was never mentioned. To be frank, in the competition my main concern is the final score, and this change has a significant impact on it.

During my model training, I strictly followed the released evaluation file to compute the final score, and that file did not include PSNR/32. At the same time, I also relied on the formula 0.7 × SSIM + 0.1 × PSNR + 0.1 × (1 − MAE) + 0.1 × (1 − NMSE) to compare my results against other participants on the validation leaderboard (and I recall that, before the final scores were removed from the leaderboard, they were also calculated using this same formula).

If the evaluation procedure is modified, I think you should notify us in a timely manner. For example, when the switch to using masked data for the metrics was made, there was an official email announcement. Otherwise, the evaluation and tuning work we perform locally loses its meaning.
Hi @XiaoyuBai, we applied the normalization (PSNR/32) only to align it with the other metrics, without changing the intended weightings (SSIM remains 70%).
Hi,

I would like to express a concern. In the initial evaluation description, there was no mention of PSNR normalization, and the emphasis was placed on SSIM as the most important metric. Applying normalization only at the final stage, without it being specified from the beginning, creates an inconsistency that may impact the transparency of the evaluation.

Thank you very much for your time and understanding.

Best,
Xiaoyu
