Yes

Created by Thomas Yu (thomas.yu)
so sorry to linger on the same problem, ma/thomas, but i saw you removed the 0.99X entries, which is great and totally makes sense. but the thing is, if we can produce 0.99X, we can obviously produce any value between 0 and 0.99. i think we need a more principled strategy for removing entries than absolute score cutoffs, such as requiring that, when the method is run separately on each input, the results reproduce bit-for-bit (see the sketch below). was that already part of the current leaderboard validation?   thanks a ton. yuanfang
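for illustration, a rough R sketch of what such a check could look like; `predict_fn`, `inputs`, and `check_binary_reproducible` are hypothetical names, not part of the current harness:

```r
# Hypothetical check: the prediction for each observed matrix must be
# bit-identical whether the method sees all matrices at once or only
# that single matrix, so no information can leak across inputs.
check_binary_reproducible <- function(predict_fn, inputs) {
  joint    <- predict_fn(inputs)  # run once with every input visible
  separate <- lapply(inputs, function(x) predict_fn(list(x))[[1]])
  all(mapply(identical, joint, separate))  # TRUE only if bit-for-bit equal
}
```

here `predict_fn` is assumed to take a list of observed matrices and return one prediction matrix per input.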
but the submissions from mine down to the one at ~0.4 should all be cheating (just to different extents, from directly printing answers, as i did, to supervised training), all utilizing multiple matrices. how do you enforce that this doesn't happen in the next round, which has material incentives?   thanks.  
That's right, but according to the instructions the prediction matrix was not supposed to have NA values. We will make adjustments to the scoring metrics.
thanks hongyang for the clarification. does that #2 explain why i got almost perfect correlation but a totally off rmse?   maybe just release the leaderboards first, @thomas.yu, even if they might contain some error, because often we can figure out what is going wrong from the scores reported back.
Hi @pacificma I found the evaluation R code here: [scoringfunctions](https://github.com/Sage-Bionetworks/NCI-CPTAC-Challenge/blob/be15f9ffad1794dd2f629b0b9fc58ef1bc29f16e/scoring_harness/scoring_functions.R)

1. sub1 nrmsd: If I print "" instead of "0", the nrmsd is higher. This is because R reads "" as NA by default, and in line 11 of your evaluation code NA positions are excluded when calculating nrmsd. For example, if the predictions and true values of a gene are `pred=c(0,0,8,2,10)` and `true=c(0,0,16,2,13)`, the mean square (ms) is `mean((pred-true)^2, na.rm=T)` = 14.6. If I change pred to `c(NA,NA,8,2,10)`, the ms is 24.3: the sum of `(pred-true)^2` is divided by 3 instead of 5. My suggestion is to set NA to 0 first, e.g. add this between lines 9 and 10 of your code: `d.predict[is.na(d.predict)]=0`

2. sub1 correlation: According to the scoring metric formula, correlation is calculated over all missing positions (fake + true missing). However, line 23 of your evaluation code, `d.predict[d.true==0]=NA`, excludes the true missing positions, so only the fake missing positions are evaluated. Did I misunderstand the scoring metric?

Thanks!
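To make the effect in point 1 concrete, here is a minimal R sketch using the numbers from the example above:

```r
pred <- c(0, 0, 8, 2, 10)
true <- c(0, 0, 16, 2, 13)

# All five positions contribute: (0 + 0 + 64 + 0 + 9) / 5
mean((pred - true)^2, na.rm = TRUE)     # 14.6

# Printing "" turns the zeros into NA, which na.rm silently drops,
# so the same squared error is divided by 3 instead of 5.
pred_na <- c(NA, NA, 8, 2, 10)
mean((pred_na - true)^2, na.rm = TRUE)  # 24.33333

# Suggested fix: treat missing predictions as 0 before scoring.
pred_na[is.na(pred_na)] <- 0
mean((pred_na - true)^2, na.rm = TRUE)  # 14.6
```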
sorry, i will have to check with him when i return to work, the earliest probably Thursday. but shouldn't the system check for and filter out submissions that have missing values? the ones i submitted still look quite suspicious actually, don't you think, with such inconsistency in the scores?
did you mean print "" instead of "0" in the prediction files?
i understand now, Ma, with the help of my student. sorry for being stupid. but he said your scoring might still have a minor bug: 'If you print "" instead of "0", the NRMSD is higher than the true NRMSD'.
but training data sets only have one ground truth...
Thank you very much for the explanation. I will go through the training set again to see if this is the case.
Hi, I guess that is probably because the 100 truths for the 100 observed data sets are not exactly the same. I have clarified this in the description now. If that was the reason, sorry for any confusion.
that's what i said, either the scoring is wrong, or the truth file was printed wrongly....
Hi, a matrix exactly the same as the underlying truth will return an nrmse of 0 and a correlation of 1.
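For reference, a quick R sanity check, assuming nrmse is the RMSE divided by the range of the truth (the exact normalization in the scoring harness may differ):

```r
set.seed(1)
truth <- matrix(rnorm(20), nrow = 4)
pred  <- truth  # submitting the truth itself

sqrt(mean((pred - truth)^2)) / (max(truth) - min(truth))  # nrmse: 0
cor(as.vector(pred), as.vector(truth))                    # correlation: 1
```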
actually i didn't know this, so i practically just printed the gold standard... sorry about it... but somehow, the gold standard doesn't get an nrmse of 0, although the correlation is perfect. Do you think something went wrong in the evaluation code?
No, we can't. The input to the prediction function should be only one observed data matrix.

(Webinar #1) For subchallenge 1, can we borrow information between the 10 training datasets to train the prediction function?