Hi, I'm trying to find more information on how these files (syn30821562) were processed. I understand that the provenance links contain this information, but I do not have access when I click on them.
My questions are:
1) Has APOE4 been regressed out (e.g., in ROSMAP_Residualized_counts_(diagnosis-sex).tsv, syn26967455)?
2) Does "diagnosis-sex" in the file name indicate that the variation due to diagnosis and sex has been added back to the data?
If possible, please share the GitHub link to the code that was used to process the files.
Created by Robert Valenzuela (rkvalenzuela)
Hi Will, thank you for your response and feedback. If I'm following correctly, according to the Covariate section of syn26967461 (the ROSMAP HTML report), the following were significant covariates, including APOE4:
individualID, diagnosis, tissue, race, spanish, apoe4_allele, sex, final_batch, pmi, RIN, RIN2, age_death, AlignmentSummaryMetrics_PCT_PF_READS_ALIGNED, RnaSeqMetrics_PCT_INTRONIC_BASES, RnaSeqMetrics_PCT_INTERGENIC_BASES and RnaSeqMetrics_PCT_CODING_BASES
In the Model development section (syn26967461), forward stepwise regression was used to select covariates: at each iteration, the covariate whose addition gave the lowest BIC was retained. This was repeated 10 times, producing the following 10-covariate model, which happened to be the same for all three cohorts (ROSMAP/syn26967461, MAYO/syn27024974, and MSBB/syn27068766):
diagnosis + (1 | final_batch) + tissue + scale(RIN) + sex + scale(RnaSeqMetrics_PCT_INTRONIC_BASES) + scale(RnaSeqMetrics_PCT_CODING_BASES) + scale(RnaSeqMetrics_PCT_INTERGENIC_BASES) + scale(RIN2) + scale(AlignmentSummaryMetrics_PCT_PF_READS_ALIGNED)
Hence, although APOE4 was a significant covariate, it was not ultimately controlled for in the final model.
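For anyone unfamiliar with the selection procedure described above, here is a minimal Python sketch of forward stepwise selection by BIC. It is not the pipeline's code (that lives in the repository Will links to), and it simplifies under several assumptions of my own: ordinary least squares on a single numeric response, every covariate treated as a numeric column (no dummy coding of factors such as tissue), no random batch intercept like `(1 | final_batch)`, and an early stop if no remaining covariate lowers BIC. The helper names `ols_bic` and `forward_stepwise_bic` are hypothetical.

```python
import numpy as np


def ols_bic(y, X):
    """BIC of an ordinary least-squares fit of y on design matrix X (intercept included)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Gaussian likelihood with constants dropped: BIC = n*log(RSS/n) + k*log(n)
    return n * np.log(rss / n) + k * np.log(n)


def forward_stepwise_bic(y, candidates, max_steps=10):
    """Greedy forward selection: each step adds the covariate giving the lowest BIC.

    candidates maps a covariate name to a numeric 1-D array of length len(y).
    Stops after max_steps additions, or earlier if no remaining covariate
    lowers BIC (the early stop is an assumption of this sketch).
    """
    n = len(y)
    selected = []
    X = np.ones((n, 1))  # start from the intercept-only model
    current_bic = ols_bic(y, X)
    for _ in range(max_steps):
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        # Score every remaining covariate by the BIC of the enlarged model.
        scores = {c: ols_bic(y, np.column_stack([X, candidates[c]])) for c in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= current_bic:
            break
        selected.append(best)
        X = np.column_stack([X, candidates[best]])
        current_bic = scores[best]
    return selected


# Toy data: y depends on "a" and "b" but not on "c".
rng = np.random.default_rng(0)
n = 200
toy = {name: rng.normal(size=n) for name in ("a", "b", "c")}
y = 2.0 * toy["a"] - 1.5 * toy["b"] + rng.normal(scale=0.5, size=n)
selected = forward_stepwise_bic(y, toy)
print(selected)
```

The point relevant to question 1 is that a covariate can be "significant" on its own and still never be chosen by this greedy procedure, so only the terms in the final formula are regressed out.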
Although it is not explicitly stated, it appears that age_death was also included as a covariate in two cases (i.e., *_Residualized_counts_(age-death).tsv and *_Residualized_counts_(diagnosis-sex-age-death).tsv), and its variation was then added back to the residuals (syn30821562). The residualized files, named by the covariate(s) whose variation was added back, are:
age-death
diagnosis
diagnosis-sex
diagnosis-sex-age-death
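To make "variation added back" concrete, here is a minimal Python sketch of one common way to do it, under stated assumptions: fit each gene on an intercept, the covariates to keep (e.g. diagnosis and sex), and the nuisance covariates together; take the residuals; then add back the fitted contribution of the intercept and the kept covariates. This is an illustrative ordinary-least-squares version with a hypothetical function name (`residualize_add_back`). The actual pipeline used a mixed model with a random batch effect and may differ in details, such as whether the intercept is added back.

```python
import numpy as np


def residualize_add_back(Y, X_keep, X_remove):
    """Remove nuisance covariates from each gene while keeping selected effects.

    Y: genes x samples matrix (assumed log-scale normalized expression).
    X_keep: samples x p1 covariates whose variation is added back (e.g. diagnosis, sex).
    X_remove: samples x p2 nuisance covariates (e.g. RIN, alignment metrics).
    Neither covariate matrix should contain an intercept column.
    """
    n_samples = Y.shape[1]
    p1 = X_keep.shape[1]
    X = np.column_stack([np.ones(n_samples), X_keep, X_remove])
    # One least-squares fit per gene, solved jointly: B is (1 + p1 + p2) x genes.
    B, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    residuals = Y.T - X @ B
    # Fitted contribution of the intercept and the kept covariates only.
    kept = X[:, : 1 + p1] @ B[: 1 + p1]
    return (residuals + kept).T


# Toy data: 5 genes driven by sex (kept) and RIN (removed).
rng = np.random.default_rng(1)
n_samples, n_genes = 100, 5
sex = rng.integers(0, 2, size=n_samples).astype(float)
rin = rng.normal(7.0, 1.0, size=n_samples)
Y = 3.0 * sex + 2.0 * rin + rng.normal(scale=0.3, size=(n_genes, n_samples))

adjusted = residualize_add_back(Y, sex[:, None], rin[:, None])

# Refit on the full design: RIN coefficient should be ~0, sex effect preserved.
X_check = np.column_stack([np.ones(n_samples), sex, rin])
coef, *_ = np.linalg.lstsq(X_check, adjusted.T, rcond=None)
print(np.round(coef, 3))
```

Refitting the adjusted matrix on the same design gives a RIN coefficient of essentially zero while the sex effect is kept, which is what a name such as ..._(diagnosis-sex).tsv appears to signal. A covariate absent from the design (APOE4 here) is neither removed nor specially preserved.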
How can I request additional residualized files in which the variation from specified covariates is added back to the residuals?
Thank you again for your help.
Robert
Hi @rkvalenzuela,
Source code for the data preparation can be found here: https://github.com/Sage-Bionetworks/ampad-rnaseq-reprocessing/blob/0e67c1119afea361f1c0a17cd2bae2a0730479e9/code/metadata_preprocessing/rosmap_preprocessing.R
I believe the HTML report at syn26967461 presents some results that may also help address your questions. From what I can gather, APOE4 was identified as a significant covariate and adjusted for. I hope these resources help; if you're still stuck, let me know and I can dig in deeper or reach out for help.
Best,
Will