Hi,
I have a few questions.
1. I have just found out about this challenge and it's quite possible I won't be able to complete my submission by the deadline this Sunday. Is it still possible to participate in the competition without the first submission?
I am looking at the Subchallenge 2 training data.
2. Only 4795 of the 7061 genes in the outcome tables (retrospective_ova_JHU_proteome_sort_common_gene_7061.txt and retrospective_ova_PNNL_proteome_sort_common_gene_7061.txt) are in the predictors tables (retrospective_ova_JHU_proteome_sort_common_gene_7061.txt and retrospective_ova_PNNL_proteome_sort_common_gene_7061.txt). This means that I can build models for only 4795 genes. Am I missing something?
3. The same goes for samples. Only 105 samples are in common between the outcome tables (retrospective_ova_JHU_proteome_sort_common_gene_7061.txt and retrospective_ova_PNNL_proteome_sort_common_gene_7061.txt) and the union of the samples in the predictors tables (retrospective_ova_JHU_proteome_sort_common_gene_7061.txt and retrospective_ova_PNNL_proteome_sort_common_gene_7061.txt). This means that I can only use these 105 samples for training/testing. Is this correct?
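The overlap counts in questions 2 and 3 can be reproduced with plain set operations. The following is only a minimal sketch: the function name `overlap_report` and the toy identifiers are hypothetical and are not taken from the challenge files, which would first have to be read into lists of gene or sample identifiers.

```python
def overlap_report(outcome_ids, predictor_ids):
    """Count how many outcome identifiers also appear among the predictors."""
    outcome = set(outcome_ids)
    predictors = set(predictor_ids)
    shared = outcome & predictors
    return {
        "outcome_total": len(outcome),
        "shared": len(shared),
        "outcome_only": len(outcome - predictors),
    }


# Toy identifiers for illustration only (assumed, not from the real tables).
proteome_genes = ["TP53", "BRCA1", "MUC16", "PAX8"]
rna_genes = ["TP53", "BRCA1", "WT1"]
cna_genes = ["BRCA1", "PAX8", "MYC"]

# Compare the outcome genes against the union of both predictor tables.
report = overlap_report(proteome_genes, set(rna_genes) | set(cna_genes))
print(report)
```

The same function works for sample identifiers: pass the column names of the outcome table and the union of the column names of the predictor tables.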
Thank you very much.
Created by Bogdan Done bogdandone
Hi,
2) If there is no corresponding RNA/CNA for a protein, you can still predict it from the RNA/CNA of other genes.
3) There is indeed no microarray data in the test data, but you can still use it to train the model if you want and see for yourself whether it is a good idea.
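As an illustration of point 2), one simple way to predict a protein that has no matching RNA/CNA measurement is to pick the predictor gene most correlated with it on the training samples and fit a one-variable linear model. This is only a sketch under stated assumptions, not the organizers' method: the helper names (`pearson`, `fit_best_single_predictor`) and the toy values are hypothetical, and selecting the predictor on the same data used for fitting can overfit, so it should be checked with cross-validation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def fit_best_single_predictor(predictors, target):
    """Pick the predictor with the largest |correlation| and fit y = slope*x + intercept.

    predictors: dict mapping gene name -> values across training samples
    target:     protein abundance across the same samples
    """
    best = max(predictors, key=lambda g: abs(pearson(predictors[g], target)))
    x = predictors[best]
    n = len(x)
    mx, my = sum(x) / n, sum(target) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, target))
    slope = sxy / sxx if sxx else 0.0  # constant predictor: fall back to the mean
    return best, slope, my - slope * mx


# Toy data (assumed): the protein has no RNA entry of its own.
rna = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [2.0, 1.0, 2.0, 1.0]}
protein_x = [2.1, 3.9, 6.2, 7.8]

gene, slope, intercept = fit_best_single_predictor(rna, protein_x)
print(gene, slope, intercept)
```

In practice a regularized multi-gene model would likely be a stronger baseline; the single-predictor version is used here only to keep the idea visible.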
Best,
Mi
Thank you Mi.
It appears that I did not name the predictor tables correctly in my question above.
I am still a little confused about the answer to question 2. The ovarian cancer RNA-Seq data and the CNA data contain 15782 gene identifiers in total (the union of the identifiers in the 2 tables), but only 6061 gene identifiers are in common with those in the outcome tables (retrospective_ova_JHU_proteome_sort_common_gene_7061.txt and retrospective_ova_PNNL_proteome_sort_common_gene_7061.txt). This means that I do not have predictors for 1000 genes (out of 7061), and I won't be able to produce models for them.
Regarding the answer to question 3, I have to use the RNA-Seq data instead of the microarray data because it appears that the input files in the testing phase have only RNA-Seq data and CNA data, and no microarray data (based on the names of the input files described for the testing procedure: prospective_ova_rna_seq_sort_common_gene_15121.txt and prospective_ova_CNA_median_sort_common_gene_11859.txt). What am I missing?
Thank you again for your help.
Hi,
1) There is no need to submit to the leaderboard; you can wait for the final round.
2) You have to predict the abundance of 7061 proteins, using models that you train with mRNA or CNV.
3) In case you use mRNA: if you use microarray as the predictor, you will have 174 samples for training. If you choose RNA-Seq, you will have 105.
You don't necessarily need to take those in common.
Best,
Mi