We have recently been working on the sub1 data of the NCI-CPTAC DREAM Proteogenomics Challenge, which concerns missing protein levels. We are trying to model protein abundance with our recently developed method.
Before doing that, we simply calculated the correlation of protein abundance between different samples. The average correlation turned out to be 0.976, which we consider extremely high. As a result, a simple method such as replacing each missing value with the mean would appear to work very well on these test datasets.
We just wanted to follow up with you on this problem. Do you have any update on the data for the Sub1 challenge? Or is this extremely high correlation expected?
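For readers who want to reproduce the kind of check described above, here is a minimal sketch in plain Python. It is not the code used in the post: the data are synthetic, and the function names (`pearson`, `mean_pairwise_sample_correlation`, `impute_with_protein_mean`) and the choice of a per-protein mean are assumptions made for illustration only.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    vx = sum((a - mx) ** 2 for a, _ in pairs)
    vy = sum((b - my) ** 2 for _, b in pairs)
    return cov / math.sqrt(vx * vy)

def mean_pairwise_sample_correlation(matrix):
    """matrix[p][s] is the abundance of protein p in sample s (None if missing)."""
    n_samples = len(matrix[0])
    columns = [[row[s] for row in matrix] for s in range(n_samples)]
    cors = [pearson(columns[i], columns[j])
            for i, j in combinations(range(n_samples), 2)]
    return sum(cors) / len(cors)

def impute_with_protein_mean(matrix):
    """Replace each missing value with the mean of that protein's observed values."""
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed)
        imputed.append([mean if v is None else v for v in row])
    return imputed

# Synthetic stand-in for the challenge data (assumption): each protein has its
# own baseline drawn from a wide range, plus small sample-specific noise.
rng = random.Random(0)
n_proteins, n_samples = 500, 8
matrix = []
for p in range(n_proteins):
    baseline = rng.uniform(0.0, 10.0)
    row = [baseline + rng.gauss(0.0, 0.5) for _ in range(n_samples)]
    if p % 5 == 0:
        row[p % n_samples] = None  # deterministic missing entries
    matrix.append(row)

avg_cor = mean_pairwise_sample_correlation(matrix)
filled = impute_with_protein_mean(matrix)
print(f"average sample-sample correlation: {avg_cor:.3f}")
```

On real data, `matrix` would be loaded from the challenge files instead of being simulated.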
Created by sujun li sujli
The high correlation between samples is normal, because protein abundances span a large range. Mean imputation is not a good strategy. We (you) can do much better than mean imputation. The current training data for the sub1 challenge is the final data.
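The point about the large abundance range can be illustrated with a small simulation. This is only a sketch on synthetic data, not an analysis of the challenge data: the baseline range, the noise level, and the helper names are all assumptions. When each protein has its own baseline level and the baselines differ widely, any two samples correlate strongly simply because they share those baselines. Subtracting each protein's mean across samples removes that shared component.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mean_sample_correlation(matrix):
    """Average Pearson correlation over all pairs of samples (columns)."""
    columns = list(zip(*matrix))
    cors = [pearson(a, b) for a, b in combinations(columns, 2)]
    return sum(cors) / len(cors)

def center_each_protein(matrix):
    """Subtract each protein's (row's) mean across samples."""
    centered = []
    for row in matrix:
        mean = sum(row) / len(row)
        centered.append([v - mean for v in row])
    return centered

# Synthetic data (assumption): wide protein-specific baselines, small
# independent noise per sample, and no real biological signal.
rng = random.Random(0)
n_proteins, n_samples = 500, 8
matrix = []
for _ in range(n_proteins):
    baseline = rng.uniform(0.0, 10.0)
    matrix.append([baseline + rng.gauss(0.0, 0.5) for _ in range(n_samples)])

raw = mean_sample_correlation(matrix)
centered = mean_sample_correlation(center_each_protein(matrix))
print(f"raw: {raw:.3f}, after per-protein centering: {centered:.3f}")
```

In this toy setting the raw sample-sample correlation is high even though the samples share nothing beyond the protein baselines, which is why a high value on its own does not show that mean imputation captures the sample-specific variation.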
Extremely High Correlation Observed in Sub1