Hi, thank you for organizing this interesting challenge. I have looked at the leaderboard table at https://www.synapse.org/Synapse:syn66330133/tables/ and I wonder why some submissions with a large number of found clusters (Clusters_Sel_50) and hits (Hits_Sel_50) are not among the top-ranked submissions. See for example id 9753352 from AUBioinformaticsLab (46 hits) and id 9753343 from KutumLab-T035 (42 hits). Kind Regards, Viktor

Created by Viktor Drgan viktor
Thanks for the information.
Yes, two submissions had impossibly good results and were removed from the leaderboard. They both identified >25 clusters in their top 50 compounds in Step 2, while the next best submission found 7 clusters and the one after that 4 clusters. If these amazing results can be reproduced with the test set where all SMILES are produced with the same algorithm (file Step3_TestData_Target2035.parquet - labels should of course be ignored), the submissions will for sure be put back on the leaderboard!
Hi Viktor, thanks for the interest. We discovered that for Step 2, the SMILES strings of the true positives in the test set were generated with different software than those of the negatives. This difference can be picked up, for instance, by ML models that use SMILES embeddings. We think this is what happened with the non-ranked teams (who did not fare as well on Step 1). We are working with them to clarify their predictions. Thank you for your understanding.
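To illustrate how this kind of leak can arise, here is a minimal sketch in plain Python. It is only a toy: the molecules, the labels, and the assumption that one tool writes Kekulé-style SMILES while the other writes aromatic lowercase SMILES are all hypothetical, and the actual difference between the two tools used in the challenge is not stated here. The helper names `style_features` and `leak_accuracy` are made up for this example.

```python
def style_features(smiles: str) -> dict:
    """Surface features that depend on how the SMILES was written,
    not on the molecule it encodes."""
    return {
        # Lowercase atom symbols mark aromatic notation (e.g. c1ccccc1).
        "aromatic_lowercase": any(ch in "cnos" for ch in smiles),
        # Kekulé-style writers spell out alternating double bonds.
        "explicit_double_bonds": smiles.count("="),
    }


def leak_accuracy(records) -> float:
    """Fraction of labels recovered by a rule that looks only at
    notation style: 'no aromatic lowercase atoms' -> predicted positive."""
    correct = 0
    for smiles, label in records:
        predicted = 0 if style_features(smiles)["aromatic_lowercase"] else 1
        correct += int(predicted == label)
    return correct / len(records)


# Hypothetical toy set: positives (label 1) written Kekulé-style by one
# tool, negatives (label 0) written in aromatic form by another tool.
TOY_RECORDS = [
    ("C1=CC=CC=C1O", 1),
    ("C1=CC=NC=C1", 1),
    ("C1=CC=C(C=C1)C(=O)O", 1),
    ("c1ccccc1N", 0),
    ("Cc1ccncc1", 0),
    ("Cc1ccccc1", 0),
]

if __name__ == "__main__":
    print(f"accuracy from notation style alone: {leak_accuracy(TOY_RECORDS):.2f}")
```

In this toy set the rule separates the two classes without using any chemistry, which is the same kind of shortcut a SMILES-based model could learn. Writing all SMILES with one algorithm, as in the Step 3 test file, removes that shortcut.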

Submissions with large number of found clusters and hits not selected as the best submission?