Missing gold standard samples in RNAseq evaluation data file of leaderboard 1 (SC2 and SC3)

Dear all,

I am using the RNAseq data file to obtain my predictions for SC2 and SC3. In my latest attempt, the failure message was the following:

FAILURE_REASON: predictions.tsv: All sample Ids in the goldstandard must also be in your file. You are missing: TCGA-23-1123,TCGA-24-1103,TCGA-24-1428,TCGA-24-1430,TCGA-24-1435

SC2 Express lane:
Object_id: 9642104
Log_file: syn11059071

Testing data RNAseq (both SC2 and SC3):
/evaluation_data/prospective_ova_rna_seq_sort_common_gene_15121.txt

I have looked for the reported missing gold standard sample IDs in the input RNAseq evaluation data file for SC2 and SC3, and I think they are missing there as well. Could anyone confirm this error?

Thanks.
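A check like the one described above could be sketched as follows. This is only an illustration: it assumes the RNAseq file is tab-delimited with sample IDs in its header line, and `missing_ids` is a hypothetical helper, not part of the challenge tooling. It reads the raw header line, so R's column-name checking does not interfere with the comparison.

```r
# Hypothetical helper (not part of the challenge tooling): return the expected
# sample IDs that do not appear in the header line of a delimited text file.
missing_ids <- function(path, expected, sep = "\t") {
  header <- strsplit(readLines(path, n = 1), sep, fixed = TRUE)[[1]]
  setdiff(expected, header)
}

# Toy file standing in for the real RNAseq table (made-up values).
toy <- tempfile(fileext = ".txt")
writeLines(c("gene\tTCGA-23-1123\tTCGA-24-1103",
             "A1BG\t1.5\t2.0"), toy)

reported <- c("TCGA-23-1123", "TCGA-24-1103", "TCGA-24-1428")
missing_ids(toy, reported)
# "TCGA-24-1428"
```

An empty result would mean every reported ID is present in the file as written, which points the search toward how the file is read or how the predictions are written out.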

Created by Alicia Amadoz aamadoz
Has this issue been solved or addressed? I am now stuck with a similar error in sub_2. Any input is appreciated. Regards, Vijay N
Dear all, Thanks for your help. This problem is finally solved. Besides the data import functions, the R function data.frame also has the check.names option. So, when converting a matrix to a data frame, the column names are checked as well and dashes are replaced with periods. I fixed this by setting check.names=F in the data.frame call, and my script was then scored with the training data. Regards, Alicia
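The matrix-to-data-frame behaviour described above can be reproduced with a small sketch. The matrix values and row names here are made up for illustration; only the two sample IDs come from the error message in this thread.

```r
# Toy prediction matrix with TCGA-style sample IDs as column names.
m <- matrix(c(0.1, 0.9, 0.4, 0.6), nrow = 2,
            dimnames = list(c("protein_A", "protein_B"),
                            c("TCGA-23-1123", "TCGA-24-1103")))

# Default: data.frame() makes the names syntactically valid, so "-" becomes ".".
df_default <- data.frame(m)
colnames(df_default)
# "TCGA.23.1123" "TCGA.24.1103"

# check.names = FALSE keeps the column names exactly as they are in the matrix.
df_kept <- data.frame(m, check.names = FALSE)
colnames(df_kept)
# "TCGA-23-1123" "TCGA-24-1103"
```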
I think the issue might be the file you have in /train_data/. Try using "/evaluation_data/prospective_ova_rna_seq_sort_common_gene_15121.txt". When the Docker image is run, it populates the test data in /evaluation_data/, so if you use the RNA-seq file from there, you should have the correct number of samples in the right order.
Dear all, Thanks for your answers and suggestions. However, I still get the same error when using the **express lane** of **sub-challenge 2**:
```
predictions.tsv: All sample Ids in the goldstandard must also be in your file. You are missing: TCGA-23-1123,TCGA-24-1103,TCGA-24-1428,TCGA-24-1430,TCGA-24-1435
```
This time I made several changes:
1) `/train_data/retrospective_ova_rna_seq_sort_common_gene_15121.txt` is my input test file, included in my Dockerfile.
2) `check.names=F` was included, and I double-checked the sample names in different parts of my code.
3) In addition, I checked that the reported missing samples are included in my input test file. Nevertheless, I also added them on purpose before printing predictions.tsv.
Despite all of these modifications, my test submission is **INVALID** (**ObjectID**: 9648831; **LogFile**: syn11334407), and I have some questions:
- How should I use the express lane to test my code properly?
- What is the difference between a predictions.tsv file being stored [Your prediction file has been stored, but you will not have access to it.] and it being valid?
- What are the requirements for a valid predictions.tsv file in both sub-challenge 2 and sub-challenge 3?
Thanks for your time and patience. Regards.
Dear all, It is important to note that the express lane data is just a copy of the training data, so the sample names are different from those of the test data. That being said, you are reading in the correct file. I looked at your prediction file, and it seems that when you read in the `evaluation_data`, R converts the dashes in column headers to periods. Please make sure you are doing `read.csv(check.names=F)` so that the correct headers are generated. Best, Tom
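The header conversion Tom describes can be seen in a minimal sketch. It assumes a tab-delimited file with samples in columns; the gene names and values are invented, and `read.delim` (the tab-separated sibling of `read.csv`) takes the same `check.names` argument.

```r
# Toy tab-delimited file: genes in rows, samples in columns (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("gene\tTCGA-23-1123\tTCGA-24-1103",
             "A1BG\t1.5\t2.0",
             "A2M\t0.3\t0.7"), tmp)

# Default: headers are passed through make.names(), so "-" becomes ".".
mangled <- read.delim(tmp)
colnames(mangled)
# "gene" "TCGA.23.1123" "TCGA.24.1103"

# check.names = FALSE keeps the headers exactly as written in the file.
intact <- read.delim(tmp, check.names = FALSE)
colnames(intact)
# "gene" "TCGA-23-1123" "TCGA-24-1103"
```

If the sample IDs written to predictions.tsv are taken from these column names, only the second form matches IDs such as those listed in the failure message.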
Also, Zhi: the example shows that the header comes from the prediction object. If the header isn't taken from the RNA-seq file, am I wrong in thinking that we don't know where the prediction object got its sample names? The error message is also complaining about missing TCGA-like headers.
I was having the same error until the docker server crashed. Alicia, do you have any other files named predictions* in /output? That was my only guess as to why this might be happening. If you do then that might be why, but I can't confirm until the server is back up.
Dear Zhi, Thank you for your answer. If the testing data have independent samples and don't have the TCGA prefix, then I don't understand the **failure reason** that has been reported: FAILURE_REASON: predictions.tsv: All sample Ids in the goldstandard must also be in your file. **You are missing: TCGA-23-1123,TCGA-24-1103,TCGA-24-1428,TCGA-24-1430,TCGA-24-1435** Our prediction method takes only the RNAseq file as input, and the samples in the /evaluation_data/prospective_ova_rna_seq_sort_common_gene_15121.txt file are the samples that appear in the predictions.tsv output file. What should I do to overcome this error and get my script validated by the evaluation system? Thanks.
Dear Alicia, The testing data are independent samples rather than samples from the TCGA collection, so they don't have the 'TCGA' prefix. Please refer to the dry-run script here as a good example: https://github.com/Sage-Bionetworks/NCI-CPTAC-Challenge-Examples/blob/master/sc2/Dry_Run_SC2.R. Best, Zhi
