Hi there, I thought about this question some more and realized I had been confused about which phase of the challenge you were asking about. For the leaderboard, we use a small random subset of the full set of null models to calculate your score. The goal is to add some noise to the score during the leaderboard phase, to prevent data leakage from multiple submissions. However, the score will still be in the correct neighborhood (especially as it is log-transformed). For the final round, we use the full set of null models to calculate a more accurate p-value. Apologies again for any confusion. Best, Robert
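
To make the difference between the two phases concrete, here is a minimal Python sketch. It is not the challenge's actual scoring code. It assumes the p-value is an empirical one (the fraction of null-model scores at least as good as yours, with a +1 correction) and that the log transform is `-log10(p)`. The function names, the subset size, and the correction are all hypothetical choices made for illustration.

```python
import math
import random


def empirical_p_value(score, null_scores):
    """Assumed definition: share of null scores >= the observed score.

    The +1 in numerator and denominator is an assumed correction that
    keeps the p-value above zero; the real scoring may differ.
    """
    n_at_least = sum(1 for s in null_scores if s >= score)
    return (n_at_least + 1) / (len(null_scores) + 1)


def leaderboard_score(score, null_scores, subset_size, rng):
    """Leaderboard phase: score against a small random subset of null models."""
    subset = rng.sample(null_scores, subset_size)
    return -math.log10(empirical_p_value(score, subset))


def final_score(score, null_scores):
    """Final round: score against the full set of null models."""
    return -math.log10(empirical_p_value(score, null_scores))


if __name__ == "__main__":
    nulls = list(range(1000))      # toy null-model scores
    rng = random.Random(0)         # seeded only to make the demo repeatable
    print("leaderboard:", leaderboard_score(950, nulls, 100, rng))
    print("final:      ", final_score(950, nulls))
```

With a subset, the p-value can only take a coarse set of values, so the leaderboard number moves around from draw to draw, while the full set gives a finer and more stable estimate.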

Hi there - I assume you are talking about Subchallenge 1. Let me know if this is not the case. For SC1, we evaluate the top ten predicted targets for each compound and compare their accuracy to that of the top ten predicted targets from each null model. We use the full set of null models, but only a *subset of the targets* from each (the top 10) to assess prediction accuracy against a null prediction. Apologies for any confusion from my wording. I hope this helps! Best, Robert
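
As a rough illustration of the top-ten comparison for a single compound, here is a minimal Python sketch. It is not the actual SC1 scoring code. It assumes "accuracy" means the fraction of the top ten predicted targets that are true targets, and that the comparison is an empirical p-value over the null models. The function names and the toy target identifiers are hypothetical.

```python
def top_k_accuracy(ranked_targets, true_targets, k=10):
    """Assumed metric: fraction of the top-k predicted targets that are true."""
    top = ranked_targets[:k]
    hits = sum(1 for t in top if t in true_targets)
    return hits / k


def compare_to_nulls(pred_ranked, null_rankings, true_targets, k=10):
    """Compare a submission's top-k accuracy to every null model's top-k accuracy.

    Returns the observed accuracy and an assumed empirical p-value: the
    share of null models that do at least as well (with a +1 correction).
    """
    observed = top_k_accuracy(pred_ranked, true_targets, k)
    null_acc = [top_k_accuracy(r, true_targets, k) for r in null_rankings]
    n_at_least = sum(1 for a in null_acc if a >= observed)
    return observed, (n_at_least + 1) / (len(null_acc) + 1)


if __name__ == "__main__":
    targets = [f"T{i}" for i in range(20)]      # toy target identifiers
    truth = {"T0", "T1", "T2"}
    submission = targets                         # ranks T0, T1, T2 first
    nulls = [targets[::-1], ["T0"] + targets[10:] + targets[1:10]]
    print(compare_to_nulls(submission, nulls, truth))
```

Only the first ten entries of each ranking are used, for the submission and for every null model alike, which matches the "subset of the targets" wording above.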

