Dear @FirstDREAMTarget2035DrugDiscoveryChallenge
Together with the Step 2 test set information, we would like to share a pre-trained model, MMELON (Multi-view Molecular Embedding with Late Fusion), which uses SMILES and was developed for this challenge.
Participants are welcome to use the code or our predictions as a starting point for their own work.
See the [git repository for MMELON](https://github.com/BiomedSciAI/biomed-multi-view/tree/dream_challenge?tab=readme-ov-file), fine-tuned using the 14 molecules known to bind WDR91.
An example submission file from fine-tuning MMELON is also there:
https://github.com/BiomedSciAI/biomed-multi-view/blob/dream_challenge/Step2-submission-file.csv
Best
Pablo Meyer
Created by Pablo Meyer jeriscience
We are happy to share **a second foundation model**, [Mammal](https://github.com/BiomedSciAI/biomed-multi-alignment), which can make **zero-shot predictions** for any protein (given its amino acid sequence) and any molecule (given its SMILES), so it is a good fit to try on this dataset. See the drug-target interaction [link](https://github.com/BiomedSciAI/biomed-multi-alignment?tab=readme-ov-file#drug-target-interaction).
Here is the sequence for the target WDR91 shared by Matthieu:
PEQPFIVLGQEEYGEHHSSIMHCRVDCSGRRVASLDVDGVIKVWSFNPIMQTKASSISKSPLLSLEWATKRDRLLLLGSGVGTVRLYDTEAKKNLCEININDNMPRILSLACSPNGASFVCSAAAPSLTSQVPGRLLLWDTKTMKQQLQFSLDPEPIAINCTAFNHNGNLLVTGAADGVIRLFDMQQHECAMSWRAHYGEVYSVEFSYDENTVYSIGEDGKFIQWNIHKSGLKVSEYSLPSDATGPFVLSGYSGYKQVQVPRGRLFAFDSEGNYMLTCSATGGVIYKLGGDEKVLESCLSLGGHRAPVVTVDWSTAMDCGTCLTASMDGKIKLTTLLAHKA
with an N-terminal biotin tag and a C-terminal His-tag
Thanks
Hi Pablo, thanks so much for highlighting the aspects to consider. It did cross my mind that the SMILES of the DELs would be relevant and useful. Alternatively, the DELs of the SMILES would also be great -- this might help in identifying the useful and noisy synthon combinations. Would it be possible to share the BB (building block) decomposition of the ligands in the test set? Thanks to the organizers for sharing the SMILES.
Ashok
**Clarification**
A participant was surprised by the recent post suggesting that participants use IBM's MMELON foundation model, particularly because DELs are not used at all to train this model, nor can they (easily) be used for fine-tuning, given that the model needs SMILES and we don't have the SMILES of the DELs. In sharing the model, we wanted to give participants the opportunity to use it as a tool, with the goal of seeing what ideas and modifications they would add to it. We did not use any of the training data, as SMILES are indeed not available. We just fine-tuned the model with the 14 available molecules (and their SMILES) that come from the leak identifying them as WDR91 binders.
This is very little data for our model to perform as well as we would like, but there may be ways to improve it.
We hope this clarifies things and that you find the model useful.
Pablo
Got it, thanks for the clarification.
Hi @apalania, this is correct, thanks for the interest. Be aware that we don't know how well our model performed; we did not use any of the training data, as SMILES are not available. We just fine-tuned the model with the 14 available molecules (and their SMILES) that come from the leak identifying them as WDR91 binders. This is very little data for our model to perform as well as we would like, but there may be ways to improve it.
Thank you for sharing the foundational model. We were wondering whether the CSV linked above contains your predictions from fine-tuning MMELON for the DREAM dataset, using a training set (n=100) made by combining the 14 public-domain WDR91 ligands (label = 1) with 86 randomly selected compounds from the 339k set (label = 0).
Please let me know if my understanding is correct. Thanks
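For anyone who wants to experiment with the setup described in this question, here is a minimal sketch of how such a 100-compound training set could be assembled. It is not the organizers' code: the function name `build_training_set`, the seed, and the placeholder identifiers are all assumptions for illustration, and the real inputs would be the 14 ligand SMILES and the compound list from the challenge data.

```python
import random


def build_training_set(positive_smiles, candidate_smiles, n_negatives=86, seed=0):
    """Label known ligands 1 and a random sample of other compounds 0.

    Hypothetical helper, not taken from the MMELON repository.
    """
    # De-duplicate while keeping the original order.
    positives = list(dict.fromkeys(positive_smiles))
    known = set(positives)
    # Keep known ligands out of the negative pool so no compound gets both labels.
    pool = [s for s in dict.fromkeys(candidate_smiles) if s not in known]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    # random.sample raises ValueError if the pool has fewer than n_negatives items.
    negatives = rng.sample(pool, n_negatives)
    rows = [(s, 1) for s in positives] + [(s, 0) for s in negatives]
    rng.shuffle(rows)
    return rows


# Placeholder identifiers standing in for real SMILES strings (assumption).
positives = [f"LIGAND_{i}" for i in range(14)]
candidates = [f"COMPOUND_{i}" for i in range(1000)]
training_rows = build_training_set(positives, candidates)
```

Two caveats with this sketch: the comparison is by exact string, so the same molecule written as two different SMILES would not be detected unless the strings are canonicalized first, and the randomly sampled compounds are only assumed to be non-binders.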