Created by dskhanirfan

It will be used to differentiate teams that have the same number of true positive clusters in their selections.

What is the importance of the 4th column for evaluation?

Yes, there should be 200 "1"s in the 2nd column and 500 "1"s in the 3rd column.

For the Step 1 submission file, covering all 339,258 RandomIDs from Step1_TestData_Target2035.parquet: the second column should be 1 if they are in the 200, and the third column should be 1 if they are in the 500, or 1 in both columns. Right?

Actually no. The 138 undisclosed (secret) true positives are in addition to the 14 public true positives.
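For the Step 1 submission format discussed above (one row per RandomID, 200 "1"s in the 2nd column, 500 "1"s in the 3rd column), a sanity check before uploading could look roughly like the sketch below. It is only an illustration: the function name `check_submission` and the in-memory row layout `(random_id, sel_200, sel_500)` are hypothetical, the 4th column is left out because the thread does not say what it contains, and the official file format and column names may differ.

```python
from typing import List, Sequence, Tuple

# Counts quoted in the thread; everything else here is an assumption.
N_IDS = 339_258   # RandomIDs in Step1_TestData_Target2035.parquet
N_SEL_200 = 200   # number of "1"s expected in the 2nd column
N_SEL_500 = 500   # number of "1"s expected in the 3rd column


def check_submission(
    rows: Sequence[Tuple[object, int, int]],
    n_ids: int = N_IDS,
    n_200: int = N_SEL_200,
    n_500: int = N_SEL_500,
) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed.

    Each row is (random_id, sel_200, sel_500). This layout is hypothetical.
    """
    problems: List[str] = []

    # Every RandomID should appear exactly once.
    ids = [row[0] for row in rows]
    if len(ids) != n_ids:
        problems.append(f"expected {n_ids} rows, found {len(ids)}")
    if len(set(ids)) != len(ids):
        problems.append("duplicate RandomIDs found")

    # Each selection column should hold only 0/1 and the right number of 1s.
    for col, label, expected in ((1, "2nd", n_200), (2, "3rd", n_500)):
        values = [row[col] for row in rows]
        if any(v not in (0, 1) for v in values):
            problems.append(f"{label} column contains values other than 0 or 1")
        ones = sum(1 for v in values if v == 1)
        if ones != expected:
            problems.append(f"{label} column has {ones} ones, expected {expected}")

    return problems


# Toy example with 10 IDs instead of 339,258, to keep it small.
toy = [(i, 1 if i < 2 else 0, 1 if i < 5 else 0) for i in range(10)]
print(check_submission(toy, n_ids=10, n_200=2, n_500=5))  # prints []
```

The check does not require the 200 selection to be a subset of the 500 selection, since the thread only says a compound may have a 1 in either or both columns.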
These are 14 true positives that are available in the published literature, in the PDB, and on other publicly available platforms. We compiled the list here so that you don't have to look for them.

What is meant by the 14 public domain compounds (SMILES) in the Step 1 training set?

It will be available soon; we are just finishing the terms & conditions.

@mschapira I am unable to download the files.

The ASMS WDR91 dataset is a small-scale screen of an unrelated library. This file should be ignored. If any hits were confirmed from this screen, they would be in the file 14_public_domain_WDR91_ligands.csv, available to all on Synapse.

@mschapira From the dataset you mentioned above: the WDR91 DEL dataset has 375595 rows, and there is also this ASMS WDR91 data, which has 5225 rows. Is that also input data?

Will a test set be made available? How do we deposit our predictions?

The training set is the WDR91 DEL dataset where the partner is HitGen.

There's a WDR91 dataset here: https://aircheck.ai/datasets. Is that the training set?

Hi, thanks for the interest. The challenge will launch at the end of the month; the data will be made available then and everybody will be informed.