Hi,
We are implementing the scoring function to test our different approaches against the data_true.txt of subchallenge 1:
$$\(NRMSE = \frac{\sqrt{(\sum_{i=1}^{n_{missing}} (y_i - x_i)^2)/n_{missing} }}{y_{max}-y_{min}}\)$$
Since almost every protein in data_true.txt has at least one zero among its samples, and the function is applied to each protein separately, $$\(y_{min}\)$$ will almost always be 0. That is equivalent to dividing by $$\(y_{max}\)$$ alone.
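A minimal Python sketch of this concern follows. It assumes one protein's true and imputed values are NumPy arrays and that the range is taken over all true values, zeros included. The function name `nrmse_naive` and the data are hypothetical:

```python
import numpy as np

def nrmse_naive(y_true, x_pred, missing):
    """NRMSE for one protein, with the range taken over ALL true values (zeros included)."""
    # Numerator: RMSE over the missing spots only.
    err = y_true[missing] - x_pred[missing]
    rmse = np.sqrt(np.mean(err ** 2))
    # Denominator: range over every true value, so any zero forces y_min = 0.
    return float(rmse / (y_true.max() - y_true.min()))

# Hypothetical values for one protein across 5 samples; one true value is 0.
y_true = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
x_pred = np.array([1.0, 2.0, 5.0, 6.0, 8.0])
missing = np.array([True, False, True, False, False])

print(nrmse_naive(y_true, x_pred, missing))
```

Here the RMSE over the two missing spots is 1 and the range is 8 - 0, so the result is simply RMSE / y_max.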
In your scoring code, do you exclude those zeros?
Thank you
Created by Mickael Leclercq
Mickael, you need to take the average of the squared prediction error over the missing spots only, including both non-zero and zero true values.
Hello All,
From the above, I understand that y_max and y_min are computed for each protein.
Suppose I want to calculate the NRMSE value of protein_1 for data_obs_1.txt,
where $$\(y_i, i = 1, \dots, 80\)$$ are the true values from data_true.txt
and $$\(x_i, i = 1, \dots, 80\)$$ are the computed values for data_obs_1.txt.
When calculating NRMSE, do I need to include the imputed values whose corresponding true values in data_true.txt are zero,
or only those imputed values whose corresponding true values are non-zero?
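The two options in this question can be compared with a small Python sketch. It assumes hypothetical NumPy arrays for one protein and a boolean mask of the spots that were NA in the observed file. Per the reply earlier in this thread, the numerator should use option 1, every missing spot with zeros included:

```python
import numpy as np

def rmse_missing(y_true, x_pred, missing):
    """Root-mean-square error over the spots selected by the boolean mask `missing`."""
    err = y_true[missing] - x_pred[missing]
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical values for one protein across four samples.
y_true = np.array([0.0, 3.0, 0.0, 5.0])
x_pred = np.array([2.0, 3.0, 1.0, 4.0])
missing = np.array([True, False, True, True])  # spots that were NA in the observed data

# Option 1: every missing spot, including those whose true value is 0.
print(rmse_missing(y_true, x_pred, missing))
# Option 2: only the missing spots whose true value is non-zero.
print(rmse_missing(y_true, x_pred, missing & (y_true != 0)))
```

With these numbers option 1 gives sqrt(2) while option 2 keeps a single spot and gives 1.0, so the choice can change scores noticeably.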
Yes, you are right.
For one protein, y~max~ and y~min~ should be based on the true (positive) values from **all samples**, not just the true (positive) values from **samples with NAs**. In other words, y in the denominator ranges over the full set of values, while y in the numerator ranges over a subset.
Is my understanding correct?
Please find the definitions of all those terms in the scoring metric document.
I have also copied the relevant sentence here in case you missed it:
'ymax and ymin are maximum and minimum of the same protein among all samples which the true intensities have **positive** value.'
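Putting the quoted definition together with the earlier reply, a per-protein NRMSE might be sketched in Python as below. This is not the organizers' scoring code; the function name `nrmse_protein`, the array layout and the error raised for a degenerate range are assumptions for illustration:

```python
import numpy as np

def nrmse_protein(y_true, x_pred, missing):
    """NRMSE for one protein, following the quoted definition (sketch only).

    y_true  : true intensities of this protein for ALL samples
    x_pred  : imputed values for the same samples
    missing : boolean mask of the spots that were NA in the observed data
    """
    # Numerator: RMSE over every missing spot, zero true values included.
    err = y_true[missing] - x_pred[missing]
    rmse = np.sqrt(np.mean(err ** 2))
    # Denominator: range over the POSITIVE true values of ALL samples.
    positive = y_true[y_true > 0]
    # Assumed guard (not from the definition): the range must be non-zero.
    if positive.size < 2 or positive.max() == positive.min():
        raise ValueError("need at least two distinct positive true values")
    return float(rmse / (positive.max() - positive.min()))

# Hypothetical values for one protein across 5 samples.
y_true = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
x_pred = np.array([1.0, 2.0, 5.0, 6.0, 8.0])
missing = np.array([True, False, True, False, False])
print(nrmse_protein(y_true, x_pred, missing))
```

For these values the range becomes 8 - 2 = 6 instead of 8, because the zero is excluded from the denominator but still counted in the numerator.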
Scoring function with zeros values as normalization