Hi All,
We've activated the **unified scoring and ranking of teams** on the leaderboards. This is exactly the scoring and ranking method we will apply to your last submission on the final-round datasets after Sept 30, so it should give you a sense of how the final rankings will work.
**For each leaderboard TF/cell type**, you will see a rank for your team's latest submission in a new 'rank' column in the leaderboard table.
The smaller the rank, the better; i.e., rank 1 is the top-ranked submission.
If you see a score of NaN, it just means that submission was not your latest one (only the latest submission is used for ranking).
If you see no value for a rank, it just means the submission has not yet been ranked. Ranks will be updated daily using the latest submission from each team.
You will also see a **unified global leaderboard rank** for each team aggregated across all leaderboard TF/cell types on the main Leaderboard page here https://www.synapse.org/#!Synapse:syn6131484/wiki/402030 . These ranks are also updated daily.
================================================
**UNIFIED SCORING CODE:** We've released the code that will be used to calculate the final submission rankings. The code can be found at https://www.synapse.org/#!Synapse:syn7247157 . Note that it is heavily integrated with the Synapse leaderboard code, because it needs access to other submissions to calculate the merged scores, so it may require some modification to run standalone. We released it anyway so that you can look at the internals if you are curious.
At a high level, the rankings are calculated using the following method:
First, for each leaderboard, each of the 4 relevant evaluation measures (auROC, auPRC, recall at 10% FDR, and recall at 50% FDR) is calculated on each of 10 bootstrap samples, where each sample contains 90% of the chromosomes used in evaluation. The subsamples are chosen so that the label imbalance is the same in every sample. Then, submissions are ranked on each of these measures, and the leaderboard-specific score is calculated as:
$$\sum_{\text{all 4 measures}} \log\left(\min\left(0.5,\ \frac{\text{rank}}{\text{num\_submissions} + 1}\right)\right)$$
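A minimal Python sketch of this per-leaderboard score on a single bootstrap sample might look like the following. It assumes that higher is better for all four measures and that tied values share the best rank; the released code may break ties differently. The names `MEASURES`, `rank_desc`, and `leaderboard_scores` are hypothetical and not taken from the released code.

```python
import math
from typing import Dict, List

# Hypothetical keys for the four evaluation measures.
MEASURES = ["auROC", "auPRC", "recall_at_10_fdr", "recall_at_50_fdr"]


def rank_desc(values: List[float]) -> List[int]:
    """Rank values so the largest gets rank 1; tied values share the best rank
    (tie handling is an assumption, not taken from the released code)."""
    return [1 + sum(1 for other in values if other > v) for v in values]


def leaderboard_scores(measures: Dict[str, List[float]]) -> List[float]:
    """Combine per-measure ranks into one score per submission (lower is better).

    `measures` maps each measure name to one value per submission, all computed
    on the same bootstrap sample.
    """
    n = len(measures[MEASURES[0]])  # num_submissions
    scores = [0.0] * n
    for name in MEASURES:
        for i, rank in enumerate(rank_desc(measures[name])):
            # Cap the normalized rank at 0.5 before taking the log.
            scores[i] += math.log(min(0.5, rank / (n + 1)))
    return scores
```

Because of the `min(0.5, ...)` cap, every submission in roughly the bottom half on a measure contributes the same `log(0.5)`, so the score only differentiates among submissions near the top.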
Lower scores are better, since a smaller rank gives a more negative logarithm. These scores are then ranked within each bootstrap sample, and the second-lowest (i.e., second-best) rank among all bootstrap samples is reported as the leaderboard-specific rank.
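Continuing in the same spirit, ranking the scores within each bootstrap sample and keeping each submission's second-lowest rank could be sketched as below. Lower scores are treated as better, ties are assumed to share the best rank, and at least two bootstrap samples are required. The function names `rank_asc` and `leaderboard_rank` are hypothetical.

```python
from typing import List


def rank_asc(scores: List[float]) -> List[int]:
    """Rank scores so the smallest (best) gets rank 1; ties share the best rank
    (an assumption about tie handling)."""
    return [1 + sum(1 for other in scores if other < s) for s in scores]


def leaderboard_rank(scores_per_bootstrap: List[List[float]]) -> List[int]:
    """scores_per_bootstrap[b][i] is submission i's score on bootstrap sample b.

    Returns, for each submission, its second-lowest rank across the samples.
    """
    ranks_per_bootstrap = [rank_asc(scores) for scores in scores_per_bootstrap]
    n_subs = len(scores_per_bootstrap[0])
    result = []
    for i in range(n_subs):
        ranks = sorted(r[i] for r in ranks_per_bootstrap)
        result.append(ranks[1])  # index 1 = second-lowest (second-best) rank
    return result
```

Taking the second-best rather than the best rank means a single favorable bootstrap sample is not enough to move a submission up.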
The leaderboard-specific ranks are then used to compute the global ranks. The overall challenge scores are calculated by averaging each team's normalized leaderboard-specific ranks across all leaderboards, separately for each of the ten bootstrap samples. If a team does not submit to a particular leaderboard, it is assigned a default score of 0.5 for that leaderboard. score_lb is the minimum of these scores over the bootstrap samples, score_mean is their mean, and score_ub is their maximum. The reported rank is the minimum (best) rank across all of the bootstrap samples. For example, if a team has a rank of "2" on one bootstrap sample and "3" on the other nine, its assigned rank is "2".
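The global aggregation can be sketched in the same way. The post does not spell out exactly how the leaderboard-specific ranks are normalized, so this sketch simply takes normalized ranks as input rather than guessing the formula. `DEFAULT_SCORE`, `per_bootstrap_scores`, `summarize`, and `global_ranks` are hypothetical names, and ties are again assumed to share the best rank.

```python
from statistics import mean
from typing import Dict, List, Optional

DEFAULT_SCORE = 0.5  # used for any leaderboard a team did not submit to


def per_bootstrap_scores(
    team_lbs: List[Optional[List[float]]], n_boot: int
) -> List[float]:
    """Average one team's normalized leaderboard ranks, per bootstrap sample.

    team_lbs has one entry per leaderboard: a list of n_boot normalized ranks,
    or None if the team did not submit to that leaderboard.
    """
    filled = [lb if lb is not None else [DEFAULT_SCORE] * n_boot for lb in team_lbs]
    return [mean(lb[b] for lb in filled) for b in range(n_boot)]


def summarize(boot_scores: List[float]) -> Dict[str, float]:
    """Min, mean, and max of a team's per-bootstrap global scores."""
    return {
        "score_lb": min(boot_scores),
        "score_mean": mean(boot_scores),
        "score_ub": max(boot_scores),
    }


def global_ranks(boot_scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Rank teams on each bootstrap sample (lower score is better) and report
    each team's minimum (best) rank across all samples."""
    teams = list(boot_scores)
    n_boot = len(boot_scores[teams[0]])
    return {
        t: min(
            1 + sum(1 for u in teams if boot_scores[u][b] < boot_scores[t][b])
            for b in range(n_boot)
        )
        for t in teams
    }
```

For instance, a team that ranks 2nd on one sample and 3rd on the rest gets a reported rank of 2, matching the example above.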
Thanks to Nathan for implementing and activating this. If you have any questions, please post on the discussion board.
Thanks,
Anshul on behalf of the ENCODE-DREAM Challenge organizers