Dear organisers,

We are writing to ask for some clarification regarding the evaluation of our submission:

- It is unclear to us which metrics will be used to evaluate our submission. We understood that the F1-score would be used at the pixel level and the AP at the sample level; is that correct? If so, can you confirm that the sample-level AP is computed by `process_file_samplewise` followed by `sklearn.metrics.average_precision_score`? And that the pixel-level F1-score is computed using `get_f1_score_clean_list`, meaning that `process_file_pixelwise` is never used?
- Could you please explain how we can obtain "X * Y * Z scores per sample (where X * Y * Z is the dimensionality of the data sample)" for the pixel level, as stated on Synapse? From our understanding, the given code does not compute X * Y * Z scores per sample, but rather a number of scores that depends on the number of ground-truth and predicted objects.
- Should we add _Medical Out of Distribution Analysis Challenge 2020 Organizers_ as viewers to our project, or is there a specific team for 2024?
- Lastly, can you confirm that we are allowed an unlimited number of submissions to the tasks "Toy Examples - Pixel-Level" and "Toy Examples - Sample-Level" to test the docker on the submission system and obtain the toy-test scores, whereas we are only allowed 3 submissions to the entire testing set?

Many thanks in advance,

Best wishes,
CB, TDV, HR & MS

Created by Maëlys Solal msolal
Hi,

Thanks for your questions. Let me try to clarify:

- Yes (to all of them). `process_file_pixelwise` is legacy code from 2020-2022.
- It simply means that you should give one score per pixel/voxel, i.e., X * Y * Z scores per sample, given that X is the width, Y the height, and Z the depth of a sample image.
- Please add _Medical Out of Distribution Analysis Challenge 2020 Organizers_.
- Yes, you have an unlimited number of submissions to the Toy queues, but please only use them to validate your submission and check that it is consistent with your own results. For the final test set you are allowed three submissions (this can be extended if something doesn't run through or fails; it is not a hard limit of the submission platform, but is mainly there to avoid congesting our submission pipeline). Please note that only your latest submission will count.

Cheers,
David
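To make the two levels concrete, here is a minimal sketch of what the reply describes: sample-level AP computed with `sklearn.metrics.average_precision_score` from one score per sample, and pixel-level output consisting of one score per voxel (an array of shape X × Y × Z). The labels, scores, and dimensions below are hypothetical placeholders, not the challenge's actual data or the internals of `process_file_samplewise`.

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Sample level: one ground-truth label and one predicted anomaly score per sample
# (hypothetical values for illustration only).
y_true = np.array([0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.8, 0.3, 0.6, 0.9])
ap = average_precision_score(y_true, y_score)

# Pixel level: one score per voxel, i.e. an array of shape (X, Y, Z) per sample,
# where X, Y, Z are the sample's width, height, and depth (hypothetical sizes here).
X, Y, Z = 4, 4, 4
voxel_scores = np.random.rand(X, Y, Z)  # exactly X * Y * Z scores for this sample
```

Here the three highest-scoring samples are exactly the three positives, so `ap` comes out to 1.0; `voxel_scores.size` equals X * Y * Z, matching the Synapse wording quoted in the question.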

Clarification about the evaluation