This is an exciting event! Can we compute molecular features other than _'ATOMPAIR', 'MACCS', 'ECFP6', 'ECFP4', 'FCFP4', 'FCFP6', 'TOPTOR', 'RDK', 'AVALON'_, which are the ones listed in the wiki? Thanks.

Created by legend123
SMILES strings for the training set are proprietary to HitGen and cannot be shared. McCloskey et al. 2020 [https://doi.org/10.1021/acs.jmedchem.0c00452] showed in Supp. Fig. 3 that, for some protein targets, a DNN trained on ECFP4 fingerprints performed as well as a GCNN trained on SMILES. Our hope is that you will find clever strategies to demonstrate that training on fingerprints is indeed sufficient to deliver novel hits: maybe combining different types of fingerprints, maybe extracting signal from building block composition, maybe using many other tricks. And if you are in the top 3 in the retrospective step, we will cover all costs for the prospective step. In future DREAM challenges, we will use another DEL library and will disclose SMILES for the training set. But let's first see how far the community can push the boundaries of ML models trained on fingerprints.
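As a rough illustration of the "combining different types of fingerprints" idea, here is a minimal sketch that concatenates several fingerprints into one feature vector. It assumes each fingerprint is stored as a comma-separated string of integers keyed by its name; that storage format, the `record` example, and the helper names `parse_fingerprint` and `combine_fingerprints` are hypothetical and not taken from the challenge data.

```python
def parse_fingerprint(text):
    """Parse a comma-separated fingerprint string into a list of ints.

    The comma-separated format is an assumption made for this sketch.
    """
    return [int(value) for value in text.split(",")]


def combine_fingerprints(record, names):
    """Concatenate the named fingerprints of one compound, in the given order."""
    combined = []
    for name in names:
        combined.extend(parse_fingerprint(record[name]))
    return combined


# Toy record with made-up, very short fingerprints (real ones are much longer).
record = {"ECFP4": "0,1,0,2", "MACCS": "1,0,1"}
features = combine_fingerprints(record, ["ECFP4", "MACCS"])
print(features)  # [0, 1, 0, 2, 1, 0, 1]
```

Plain concatenation is only one option; whether it helps for a given target is something to check empirically.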
But will SMILES only be available for the test set and not for the training set? In that case we cannot train our models on SMILES. Would it be possible to share the SMILES strings for the training set as well?
SMILES for the test set will be available on May 28th, when everyone has submitted their Step #1 predictions.
@mschapira is there a timeline for access to the SMILES strings?
Sure! Once you have access to SMILES strings, you are free to generate and use any molecular feature.
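For anyone wondering what generating features from SMILES could look like, here is a minimal sketch using RDKit. The choice of RDKit, the `featurize` helper, and the particular descriptors are assumptions for illustration only; nothing here is prescribed by the challenge. A Morgan fingerprint with radius 2 is used as the usual RDKit counterpart of ECFP4.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, rdFingerprintGenerator


def featurize(smiles, radius=2, n_bits=2048):
    """Return a dict of example features for one SMILES string, or None if it cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None  # RDKit could not parse the SMILES string

    # Morgan bit fingerprint (radius 2 roughly corresponds to ECFP4).
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=radius, fpSize=n_bits)
    fingerprint = generator.GetFingerprint(mol)
    bits = [int(char) for char in fingerprint.ToBitString()]

    # A few physicochemical descriptors, chosen arbitrarily for this sketch.
    return {
        "morgan_bits": bits,
        "mol_wt": Descriptors.MolWt(mol),
        "logp": Descriptors.MolLogP(mol),
        "tpsa": Descriptors.TPSA(mol),
    }


example = featurize("CCO")  # ethanol
print(len(example["morgan_bits"]), round(example["mol_wt"], 2))
```

The returned dict can be flattened into a single vector and used alongside, or instead of, the provided fingerprints.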
