The data description states the following:
"Consistently with the choice made by the authors of the original publications [1,2], we decided to use RNA-seq (RSEM z-score, median aggregated) for breast cancer."
However, the original publications [1,2] used upper-quartile-normalized RSEM data, which were log2-transformed and median-centered.
Were the data for the challenge normalized the same way as in the publications? If not, could you describe the normalization that was applied?
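To make the question concrete, here is a rough sketch of the two pre-processing schemes being compared. It is not the challenge's or the publications' actual pipeline. The function names, the scaling target of 1000, the pseudocount of 1, and the choice of computing z-scores per gene across all samples are assumptions made for illustration only; the reference population used for the challenge z-scores is not stated in this thread.

```python
import numpy as np

def upper_quartile_log2_median_center(rsem, target=1000.0, pseudocount=1.0):
    """Sketch of 'upper-quartile normalized, log2-transformed, median-centered'.

    rsem: genes x samples matrix of RSEM estimates.
    target and pseudocount are illustrative assumptions.
    """
    rsem = np.asarray(rsem, dtype=float)
    scaled = np.empty_like(rsem)
    for j in range(rsem.shape[1]):
        col = rsem[:, j]
        # 75th percentile of the non-zero values in this sample
        uq = np.percentile(col[col > 0], 75)
        scaled[:, j] = col / uq * target
    logged = np.log2(scaled + pseudocount)
    # Center each gene (row) on its median across samples
    return logged - np.median(logged, axis=1, keepdims=True)

def gene_zscore(values):
    """Sketch of a per-gene z-score across all samples (assumed reference set)."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=1, keepdims=True)
    std = values.std(axis=1, ddof=1, keepdims=True)
    # Leave constant genes at zero instead of dividing by zero
    std = np.where(std > 0, std, 1.0)
    return (values - mean) / std

# Toy genes x samples matrix (made-up numbers)
toy = np.array([[10.0, 20.0, 30.0, 40.0],
                [5.0, 5.0, 10.0, 20.0],
                [100.0, 50.0, 25.0, 200.0]])
centered = upper_quartile_log2_median_center(toy)
zscores = gene_zscore(toy)
```

The two outputs are on different scales: the first keeps log2 units with each gene's median at zero, while the second rescales every gene to unit variance, so models trained on one are not directly comparable with models trained on the other.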
Created by Ábel Fóthi FAbika
Dear all,
I'm a bit confused here. So the RNAseq training and testing input files will have the same normalization and z-score pre-processing?
Thanks,
Dear all,
We apologize for the delayed response. It took some time to find the original data as they've been relocated on TCGA firehose since our initial download.
For the mRNAseq training data, please refer to the file CANCERTYPE.mRNAseq_Preprocess.Level_3.2016012800.0.0.tar.gz under each cancer type directory on TCGA firehose.
Thanks,
Zhi
Hi Mi,
So basically you chose the normalized dataset, applied the "mRNAseq_preprocessor", and calculated the z-score?
Hi Irene,
Here it is: http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/OV/20160128/
Best,
Mi
Hi Mi,
Could you please send us the link to RNAseq_RSEM_Z_Score?
I can only see the following files under mRNAseq; I cannot see the RSEM_Z_Score file.
illuminahiseq_rnaseqv2-RSEM_isoforms (MD5)
illuminahiseq_rnaseqv2-exon_quantification (MD5)
illuminahiseq_rnaseqv2-RSEM_genes_normalized (MD5)
illuminahiseq_rnaseq-exon_expression (MD5)
illuminahiseq_rnaseqv2-junction_quantification (MD5)
illuminahiseq_rnaseqv2-RSEM_isoforms_normalized (MD5)
mRNAseq_Preprocess (MD5)
illuminahiseq_rnaseqv2-RSEM_genes (MD5)
illuminahiseq_rnaseq-splice_junction_expression (MD5)
illuminahiseq_rnaseq-gene_expression (MD5)
The link I used, which is pointed to from the data description section, is:
http://firebrowse.org/?cohort=BRCA&download_dialog=true
Thank you,
Dear Abel,
We downloaded the RNAseq data directly from TCGA firehose (RNAseq_RSEM_Z_Score).
Best,
Mi