Dear CPTAC DREAM challenge participants, As some participants have pointed out, many models that make an individual prediction for each phosphosite require a lot of space in the Docker image and can hardly fit within the current size limit (10GB). We have therefore updated section 3.3 - Accessing Data with the list of phosphosites that need to be predicted, focusing on those with a missing rate below 30% in both the training and the testing data. Please make predictions for only those phosphosites; your performance will be evaluated only on this subset of phosphosites. Best, NCI-CPTAC Proteogenomics DREAM admins
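For anyone who wants to check locally which sites fall under the "below 30% missing in both training and testing" rule, here is a minimal sketch. It assumes each data set has already been loaded into a dict mapping a phosphosite ID to its per-sample values, with missing values stored as `None` or NaN; `missing_rate` and `sites_below_missing_rate` are hypothetical helpers, not the admins' scoring code, and the official site lists in 3.3 - Accessing Data remain authoritative.

```python
import math
from typing import Dict, List, Optional

# Hypothetical layout: phosphosite ID -> per-sample values, None/NaN = missing.
Matrix = Dict[str, List[Optional[float]]]


def missing_rate(values: List[Optional[float]]) -> float:
    """Fraction of entries that are missing (None or NaN)."""
    if not values:
        return 1.0  # treat an empty row as fully missing
    missing = sum(
        1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def sites_below_missing_rate(train: Matrix, test: Matrix, max_rate: float = 0.30) -> List[str]:
    """IDs whose missing rate is strictly below max_rate in BOTH train and test."""
    shared = train.keys() & test.keys()
    return sorted(
        site
        for site in shared
        if missing_rate(train[site]) < max_rate and missing_rate(test[site]) < max_rate
    )


train = {"S1": [1.0, None, 2.0, 3.0], "S2": [None, None, 1.0, 2.0], "S3": [1.0, 2.0, 3.0, 4.0]}
test = {"S1": [1.0, 2.0, 3.0, 4.0], "S2": [1.0, 2.0, 3.0, 4.0], "S3": [float("nan"), 1.0, 2.0, 3.0]}
print(sites_below_missing_rate(train, test))  # ['S1', 'S3']
```

In the toy data, S2 is dropped because half of its training values are missing, while S1 and S3 each have at most 25% missing in either set.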

Created by Zhi Li lz0718
Dear Hongyang, The scoring metric for sc2 is **at least 5 observations**. It is exactly the data we provided in the filtered and evaluation_data folders. Sorry for the confusion. Best, Mi
With this new data you had better recalculate the leaderboard. Maybe we are not on top. Then you will need to invite someone else to the conference.
> We didn't change the data.

Then why did you need to make a post?

> Considering only a subset of phosphosites with high coverage for final round scoring will make your model more tuned to real proteomic questions.

Do you believe what you are saying?
@deepimagine1 and hongyang, to be fair, I don't think we will do worse unless we make substantial mistakes in the final round. The rest of the teams are in a much worse situation: they have to take risks in their method and now, with the new change, also in infrastructure. It is to our great advantage that we only need to risk the infrastructure.

In this new situation, the rest of the teams are practically out of the game. This change is very sudden and unexpected for most of us.

However, I want to repeat my proposal of taking the last valid submission. @gustavo @saezrodriguez @pwang @thomas.yu The reason is simple, but let me put it plainly. This organizing team is incapable of setting things up correctly within their first 10 trials. If the current queue is set up correctly, I will eat a tube of toothpaste at this year's conference. You need us to submit early to help you identify the problems in the queues, so that you can actually cut off on the 20th. This needs to happen today or tomorrow at the latest to ensure things run through.
Can you clarify what the **sub2** scoring metric is? @lz0718 Is it genes with a **<30% missing rate** or **at least 5 observations** (3.6 Prediction Scoring Metrics)? The challenge has one week to go, but it is still not clear which scoring metric is used... Now we only evaluate 1,300 out of the original 10,000 (**13%, NOT** 80% or 90%) phosphosites. For teams that have focused on phosphosites with a high missing rate over the past 3 months, this is a disaster. I think you should at least consider scores under both the 30% and the 5-observation criteria.
I think this decision is not fair, because it only benefits participants who built an individual model for each phosphosite. As Yuanfang mentioned, so far we have gone through round 1 and round 2 fitting all the phosphosites, but now our performance will be determined on only a fraction of them. Is this consistent with what the challenge intended? Thanks.
So: 4908 lines in breast phospho, 1319 lines in ovarian phospho. Can we try now, and you take the last submission (without reporting scores/logs)? I am not confident your system works the way you expect. @thomas.yu
Dear Yuanfang, The format will remain the same. You only need to predict the following phosphosites in your predictions.tsv and confidence.tsv:

| Entity ID | Filename | Content |
|---|---|---|
| syn11422981 | breast_phosphosites.txt | 4907 phosphosites |
| syn11422982 | ovarian_phosphosites.txt | 1318 phosphosites |

You can download them from 3.3 - Accessing Data on the challenge page. Best, NCI-CPTAC Proteogenomics DREAM admins
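If a model already produces a full prediction matrix, trimming it down to the listed sites could look like the minimal sketch below. It assumes predictions.tsv and confidence.tsv are tab-separated, start with a header row, and carry the phosphosite ID in the first column; that layout is an assumption, not something stated in this thread, and `subset_predictions` is a hypothetical helper.

```python
import csv
import io
from typing import Iterable


def subset_predictions(tsv_text: str, keep_ids: Iterable[str]) -> str:
    """Return the header row plus only the rows whose first column is in keep_ids.

    Assumes (hypothetically) a tab-separated matrix with a header row and the
    phosphosite ID in column 0.
    """
    keep = set(keep_ids)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row and row[0] in keep]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


tsv = "id\tS1\tS2\nA\t1\t2\nB\t3\t4\nC\t5\t6\n"
trimmed = subset_predictions(tsv, ["A", "C"])
print(trimmed, end="")
```

If every listed site is present, the output has one line per site plus the header, so under the header-row assumption 4907 and 1318 sites would give 4908 and 1319 lines, which matches the counts quoted earlier in the thread. For real files, read them with `open(path, newline="")` and pass the text in.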
Our model was tuned for filling in missing data, but anyway. Can you tell me and the rest of the teams what the expected number of lines in each submission matrix is?
Dear Yuanfang, We didn't change the data. Considering only a subset of phosphosites with high coverage for final round scoring will make your model more tuned to real proteomic questions. We also hope every participant can enjoy these precious data without worrying about the Docker size limit. Again, you can use all the data, not limited to a certain missing rate, to maximize the power of your model as you prefer. We hope we have addressed your concerns. Best regards, NCI-CPTAC Proteogenomics DREAM admins
@gustavo
This is so annoying! This is the 12th time you have changed your data. Our models were trained for sites with missing values, using an unsupervised method for filling in the missing values that is supposed to work in subchallenge 1 but never worked there due to, I believe, a completely wrong simulation method @saezrodriguez. @saezrodriguez, how can you change the data at this stage? This means our last three months of work were in vain. If you have to do that, you will need to justify it and name separate winners for the original data lane and the new data lane. To get scored, can you please tell me the exact expected number of lines for each sub-challenge output file? Among your 12 versions?

Update on Phosphosites to be Predicted