I might be wrong, so please correct me if I am,
but I think there is a small oversight in the scoring scripts for SC2 and SC3.
Thankfully, I don't think it will have any big effect.
From: https://github.com/Sage-Bionetworks/NCI-CPTAC-Challenge/blob/master/scoring_harness/scoring_functions.R
```
correlation_by_row <- function(pred_path, truth_path) {
prediction <- read.csv( pred_path, row.names = 1 , check.names = F, sep="\t")
test_prot <- read.csv( truth_path, row.names = 1 , check.names = F, sep="\t")
prediction <- prediction[rownames(test_prot), colnames(test_prot)]
test_prot <- test_prot[rownames(test_prot), colnames(test_prot)]
mat1 <- as.matrix(prediction)
mat2 <- as.matrix(test_prot)
corr_vec <- c()
for(i in 1:length(mat1[ ,1]) ) {
c <- rbind(mat1[i, ], mat2[i, ]) ; c <- c[ ,complete.cases(c)]
if(length(which(apply(c, 1, var) == 0)) > 0) { corr_vec <- c(corr_vec , 0 ) } else
{
temp <- cor.test(mat1[ i, ], mat2[ i , ])
pcorr <- temp$estimate # pearson correlation
if (is.na(pcorr)) {pcorr<-0}
corr_vec <- c(corr_vec , pcorr)
}
}
names(corr_vec) <- rownames(mat1)
return(mean(corr_vec))
}
```
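I think what you wanted is `c <- c[ ,complete.cases(t(c))]` in place of `c <- c[ ,complete.cases(c)]`. `complete.cases` checks rows, and `c` has one row per vector (prediction and truth) and one column per sample, so without the transpose it returns only two flags, which R then recycles across the columns.
For example: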
```
prediction <- c(1,2,3,4,5,6)
test_prot <- c(12,NA,NA,23,NA,45)
c <- rbind(prediction,test_prot)
c[ ,complete.cases(c)]
#            [,1] [,2] [,3]
# prediction    1    3    5
# test_prot    12   NA   NA
# Instead, we want:
c[ ,complete.cases(t(c))]
#            [,1] [,2] [,3]
# prediction    1    4    6
# test_prot    12   23   45
```
This is unlikely to have made a difference except in obscure cases such as the following.
```
# If the observed protein has a missing value, the zero-variance check is made
# on only half of the values (alternate columns) for both prediction and test_prot.
prediction <- c(5.4,5.6,5.4,5.5,5.4,5.5) # true variance is non-zero but small
test_prot <- c(4.5,7.8,NA,6.5,5.8,5.7)
c <- rbind(prediction,test_prot)
c <- c[ ,complete.cases(c)]
c
#            [,1] [,2] [,3]
# prediction  5.4  5.4  5.4
# test_prot   4.5   NA  5.8
length(which(apply(c, 1, var) == 0)) > 0
# [1] TRUE
# So in this case, the correlation will be falsely set to zero.
```
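For comparison, here is a minimal sketch of the same row with the transposed call. The names `pair`, `keep` and `pair_ok` are only for illustration and are not in the scoring script. With one flag per column, the zero-variance check sees all five complete pairs, so this row would not be zeroed.

```
prediction <- c(5.4, 5.6, 5.4, 5.5, 5.4, 5.5)
test_prot  <- c(4.5, 7.8, NA, 6.5, 5.8, 5.7)
pair <- rbind(prediction, test_prot)

# One flag per column (sample): TRUE only where both values are present
keep <- complete.cases(t(pair))
pair_ok <- pair[, keep, drop = FALSE]

# Zero-variance check on the complete pairs only
zero_var <- any(apply(pair_ok, 1, var) == 0)
zero_var
# [1] FALSE

# Pearson correlation on the same complete pairs
r <- cor(pair_ok[1, ], pair_ok[2, ])
```

`cor.test` drops incomplete pairs on its own, so `r` here should match the `estimate` it reports for the full rows.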
Does this make sense? I am not complaining, but just reporting what I observed.
Thanks.
Created by Sunil Kalmady

Sunil, in the current setting, any issue with the correlation is set to zero.

Thanks, but in your current update there is no check for missing values.
...
c <- rbind(mat1[i, ], mat2[i, ])
...
If we don't check for `sum(complete.cases(t(c))) > 2`, we will get a "not enough finite observations" error in places where there are two or fewer valid observed protein values.

Good catch! We will update ASAP. Hopefully nothing big will change.
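The two points raised in this thread (checking complete cases per column, and requiring more than two complete pairs before correlating) could be combined roughly as follows. This is only a sketch under assumptions, not the organizers' actual update: `correlation_by_row_sketch` and `min_pairs` are hypothetical names, and the function takes two already-aligned numeric matrices instead of file paths.

```
# Hypothetical sketch: mat1 = predictions, mat2 = observed values,
# same dimensions, rows = proteins, columns = samples.
correlation_by_row_sketch <- function(mat1, mat2, min_pairs = 3) {
  corr_vec <- vapply(seq_len(nrow(mat1)), function(i) {
    pair <- rbind(mat1[i, ], mat2[i, ])
    keep <- complete.cases(t(pair))          # one flag per sample
    if (sum(keep) < min_pairs) return(0)     # too few pairs to correlate
    pair <- pair[, keep, drop = FALSE]
    v <- apply(pair, 1, var)
    if (anyNA(v) || any(v == 0)) return(0)   # constant row: score as zero
    r <- cor(pair[1, ], pair[2, ])           # Pearson correlation
    if (is.na(r)) 0 else r
  }, numeric(1))
  names(corr_vec) <- rownames(mat1)
  mean(corr_vec)
}

# Toy input: p3 has only two complete pairs, so it is scored as zero
pred  <- rbind(p1 = c(1, 2, 3, 4, 5, 6),
               p2 = c(5.4, 5.6, 5.4, 5.5, 5.4, 5.5),
               p3 = c(1, 2, 3, 4, 5, 6))
truth <- rbind(p1 = c(12, NA, NA, 23, NA, 45),
               p2 = c(4.5, 7.8, NA, 6.5, 5.8, 5.7),
               p3 = c(NA, NA, NA, NA, 1, 2))
score <- correlation_by_row_sketch(pred, truth)
```

Scoring rows with too few pairs as zero follows the existing convention that any issue with the correlation is set to zero.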