Dear Contestants,
Apologies for the delay in the data description. This year, we packaged the data as tar shards so you can load it directly with the WebDataset library: https://pypi.org/project/webdataset/
For the labeled training set, we ensured that each label appears in every tar shard file and did our best to distribute patches evenly across the shards. The patches were assigned to shards randomly with a fixed seed.
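For anyone curious how such a split can be produced, here is a minimal sketch of one way to spread patches over shards so that every label shows up in every shard. It is not the script we actually used; the function name `assign_shards`, the seed value, and the round-robin strategy are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def assign_shards(labels, num_shards, seed=0):
    """Return a list of shards, each a list of patch indices.

    Hypothetical helper: patches are shuffled within each label using a
    fixed seed, then dealt round-robin so shard sizes differ by at most one.
    A label reaches every shard only if it has at least num_shards patches.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible

    # Group patch indices by label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    shards = [[] for _ in range(num_shards)]
    offset = 0  # keep dealing where the previous label stopped
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            shards[(offset + i) % num_shards].append(idx)
        offset += len(members)
    return shards
```

Calling `assign_shards(labels, 4, seed=42)` twice on the same label list gives the same split both times, which is the point of fixing the seed.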
The current version of the data description includes an example of loading the packaged data with webdataset. See https://www.synapse.org/Synapse:syn74274097/wiki/639588
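If you want to peek inside a shard before setting up webdataset, the sketch below may help. It uses only the Python standard library and relies on the general WebDataset convention that consecutive files sharing a name up to the first dot form one sample. The function name `iter_samples` is hypothetical, and the file extensions inside our shards are not stated here, so please check the wiki example for the actual keys.

```python
import tarfile


def iter_samples(fileobj):
    """Yield (key, {extension: bytes}) for each sample in a tar shard.

    Hypothetical helper: consecutive members whose names match up to the
    first dot of the file name are grouped into one sample.
    """
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        current_key, sample = None, {}
        for member in tar:
            if not member.isfile():
                continue  # skip directories and other special entries
            dirname, _, basename = member.name.rpartition("/")
            stem, _, ext = basename.partition(".")
            key = f"{dirname}/{stem}" if dirname else stem
            if key != current_key and sample:
                yield current_key, sample  # previous sample is complete
                sample = {}
            current_key = key
            sample[ext] = tar.extractfile(member).read()
        if sample:
            yield current_key, sample  # last sample in the shard
```

To use it, open a shard in binary mode, for example `with open(shard_path, "rb") as f:`, and loop over `iter_samples(f)`, where `shard_path` is whichever shard you downloaded.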
I will update the data description in the coming days, but since some contestants have previous experience with the challenge, I think it is clearer to address the changes here as well.
For the evaluation, we follow the tradition from past years' challenges and ask you to submit ONE CSV file for the validation set, with one column for the patch names and another for the predicted labels.
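As a rough illustration of the two-column layout, a submission file could be written as in the sketch below. The column names `patch_name` and `label`, the header row itself, and the function name `write_submission` are assumptions; the exact format we expect will be given in the data description.

```python
import csv


def write_submission(predictions, path, header=("patch_name", "label")):
    """Write a dict of {patch name: predicted label} to a two-column CSV.

    Hypothetical helper: the header names are placeholders, not the
    official column names.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for name in sorted(predictions):  # stable, reproducible row order
            writer.writerow([name, predictions[name]])
```

For example, `write_submission({"patch_0001": 3, "patch_0002": 7}, "submission.csv")` produces a header row followed by one row per patch.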
We will release the validation set (in tar shards) and unlabeled training images (in tif files), along with scripts to patch and package the large tif images, in the coming days.
Feel free to ask any questions.
Cheers,
Jayden