I am writing about a reproducible failure in the Task 1b (enhancement) scoring pipeline on Synapse. Several of our LISA_enhanced_predictions submissions have failed inside the scoring container, each time with the same crash in the no-reference BRISQUE metric, so results.json is never produced. Recent affected submission IDs include 9767243 and 9767291.

The traceback (from the Task 1b scoring job) ends in piq's BRISQUE:

    File "/app/scripts/task1b.py", line 1096, in score_task1b_case_level
      case_brisque = metrics.brisque_qc(case_slices)
    File "/app/scripts/task1b.py", line 619, in brisque_qc
      values.append(float(metric(batch).item()))
    ...
    File ".../piq/brisque.py", line 153, in _aggd_parameters
      assert (count_left > 0).all(), ...
    AssertionError: Expected input tensor (pairwise products of neighboring MSCN coefficients) with values below zero to compute parameters of AGGD

This assertion fires on a (near-)constant 2D slice, e.g. a low-contrast background/edge plane, for which the MSCN coefficients have no negative pairwise products. piq then raises and the whole job ends in permanentFail.

What we have verified on our side: every one of our 114 enhanced volumes is clean under the public piq 0.8.0. Running the real piq.brisque per 2D slice along all three axes, under both per-file and case-global [0,1] normalization, yields zero AGGD failures, with no NaN/Inf and no degenerate slice sizes. We can reproduce the historical failure exactly on older data and confirm that our current submission passes our oracle. In other words, our verification and the scoring container disagree on the same data.

We suspect the gap is one of:

1) The scoring container uses a different (older) piq version than 0.8.0: in the logs the BRISQUE weights are pulled from the piq v0.4.0 release, and the AGGD count_left behavior differs between versions; and/or
2) brisque_qc / score_task1b_case_level builds and normalizes case_slices differently than we assume (e.g. reference-based or fixed-range normalization, a specific slice axis, batching, or the data_range passed to piq.BRISQUE), which can push a low-contrast background slice below the degeneracy boundary that piq 0.8.0 tolerates.

Could you please share:

- the exact piq version (and any pinned dependencies) used in the Task 1b scoring container; and
- the relevant preprocessing in scripts/task1b.py, specifically how brisque_qc / score_task1b_case_level construct case_slices, the slicing axis, batching, the normalization applied, and the data_range passed to piq.BRISQUE.

Alternatively, would the team consider guarding the per-slice BRISQUE call so that a degenerate slice is skipped or penalized rather than crashing the entire scoring job? That would make the metric robust to background/edge slices for all participants.

I'm happy to share our verification script or a minimal reproducing example if helpful. Thank you very much for your time and for organizing the challenge.
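To make the guard suggestion concrete, here is a minimal sketch of what we have in mind. It is not the code in scripts/task1b.py, which we have not seen: the function name guarded_slice_scores, its metric and penalty parameters, and the skip-or-penalize policy are all hypothetical and only for illustration. The sketch assumes the metric signals a degenerate slice by raising AssertionError, as in the traceback above, or by returning a non-finite value.

```python
import math
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

S = TypeVar("S")  # whatever type one 2D slice has (e.g. a tensor)


def guarded_slice_scores(
    slices: Iterable[S],
    metric: Callable[[S], float],
    penalty: Optional[float] = None,
) -> Tuple[List[float], List[int]]:
    """Score each slice without letting one degenerate slice abort the case.

    Hypothetical helper: `metric` is any per-slice callable. A slice whose
    metric raises AssertionError or returns NaN/Inf is recorded in `failed`;
    it is skipped when `penalty` is None, otherwise scored as `penalty`.
    """
    scores: List[float] = []
    failed: List[int] = []
    for idx, one_slice in enumerate(slices):
        try:
            value = float(metric(one_slice))
        except AssertionError:
            # Same exception type as in the traceback (AGGD parameter fit).
            value = float("nan")
        if math.isfinite(value):
            scores.append(value)
        else:
            failed.append(idx)
            if penalty is not None:
                scores.append(penalty)
    return scores, failed
```

With piq, metric could be something like `lambda s: piq.brisque(s[None, None], data_range=1.0)` for a 2D tensor s already scaled to [0, 1]; how the container actually builds its batches and chooses data_range is exactly what we are asking about. Returning the failed indices alongside the scores would let the organizers log or penalize degenerate slices explicitly instead of dropping them silently.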

Created by Ujjwal Baid ujjwalbaid
Hi, could you please help me with submission ID 9767472 for Task 1(b)? For some reason, it has been stuck in the "Evaluation" phase for more than 12 hours. Thank you for your assistance.

Issue with Task 1(b) validation evaluation