The description of the normalization of the MSBB RNAseq data says, "Known covariate factors, including sex, race, age, RIN, PMI, batch and site, were corrected using a linear model to remove the confounding effects." https://www.synapse.org/#!Synapse:syn3157743
I see all of these covariates in the new metadata files except "site":
> names(rnaseq_covar)
[1] "synapseId" "sampleIdentifier" "fileName"
[4] "BrodmannArea" "barcode" "fileType"
[7] "individualIdentifier" "batch" "RIN"
[10] "TotalReads" "Mapped" "rRNA.rate"
> names(clinical_covar)
[1] "individualIdentifier" "PMI" "RACE"
[4] "AOD" "CDR" "SEX"
[7] "NP.1" "PlaqueMean" "bbscore"
[10] "Apo1" "Apo2"
Can anyone explain the "site" variable? I was thinking of site as the site of data collection, rather than brain area.
Thank you!
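The comparison in the question above can also be done programmatically. The following is a minimal sketch, assuming the column names exactly as printed by names(rnaseq_covar) and names(clinical_covar) and the seven covariates named in the description. Two choices here are assumptions and not something the thread confirms: "age" is mapped to the AOD column, and names are matched case-insensitively.

```python
# Covariates named in the normalization description (syn3157743).
described = ["sex", "race", "age", "RIN", "PMI", "batch", "site"]

# Column names as printed by names(rnaseq_covar) in the post above.
rnaseq_covar = [
    "synapseId", "sampleIdentifier", "fileName",
    "BrodmannArea", "barcode", "fileType",
    "individualIdentifier", "batch", "RIN",
    "TotalReads", "Mapped", "rRNA.rate",
]

# Column names as printed by names(clinical_covar) in the post above.
clinical_covar = [
    "individualIdentifier", "PMI", "RACE",
    "AOD", "CDR", "SEX",
    "NP.1", "PlaqueMean", "bbscore",
    "Apo1", "Apo2",
]

# Assumption: "age" in the description corresponds to AOD.
aliases = {"age": "AOD"}

# Lower-case every available column name for case-insensitive matching.
available = {name.lower() for name in rnaseq_covar + clinical_covar}


def is_available(covariate):
    """Return True if the covariate (or its alias) is a column in either file."""
    column = aliases.get(covariate, covariate)
    return column.lower() in available


missing = [c for c in described if not is_available(c)]
print(missing)
```

Under these assumptions the only covariate without a matching column is "site". BrodmannArea is the only column in either file that names a brain region, which matters if "site" turns out to mean brain region.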
Created by Dave Airey david_c_airey
Thanks. One more question about the RNAseq meta data file. RIN and BATCH are not present for re-sequenced samples. Should they both be carried forward from the first sequencing of a given sample? I can see this being so for RIN but I worry about BATCH. Can you clarify the missings in the meta data file?
Thanks @minghui.wang I'll update the page
That description is outdated. The data was corrected for PMI, RACE, Batch, SEX, RIN and Exonic rate.
I assume that is brain region - all samples came from the Sinai brain bank. @minghui.wang, or @noambeckmann - can you please confirm if the reference to 'site' here: syn3157743 should be brain region. See "Known covariate factors, including sex, race, age, RIN, PMI, batch and **site**, were corrected using a linear model to remove the confounding effects."
This sentence, "Known covariate factors, including sex, race, age, RIN, PMI, batch and **site**, were corrected using a linear model to remove the confounding effects.", in the description here: https://www.synapse.org/#!Synapse:syn3157743. I've copied the section from this description below. I'm assuming by 'site' they mean brain area or brain site?
RNA seq data processing
Normalization and covariates correction
Genes with at least 1 read count in at least 10 libraries were considered present, otherwise removed. The trimmed mean of M-values (TMM) normalization method in the R/Bioconductor edgeR package was employed to estimate scaling factors so as to adjust for differences in library sizes. Known covariate factors, including sex, race, age, RIN, PMI, batch and **site**, were corrected using a linear model to remove the confounding effects.
Covariates are provided: Demographic traits and neuropathological data were collected on the samples used for this project including postmortem interval, race, age of death, clinical dementia rating, clinical neuropathology diagnosis, CERAD, Braak, sex, and a series of neuropathological variables.
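The presence filter in the quoted normalization section can be sketched as follows. This is an illustration only, not the pipeline that was actually run: it assumes the rule means "at least 1 read in at least 10 libraries", the gene names and counts are made up, and the function name filter_present_genes is hypothetical. The TMM scaling and the linear-model covariate correction, which the description attributes to edgeR and a linear model, are not reproduced here.

```python
def filter_present_genes(counts, min_count=1, min_libraries=10):
    """Keep genes with at least min_count reads in at least min_libraries libraries.

    counts maps a gene name to its list of read counts, one per library.
    """
    kept = {}
    for gene, row in counts.items():
        # Number of libraries in which this gene reaches the read threshold.
        libraries_with_reads = sum(1 for c in row if c >= min_count)
        if libraries_with_reads >= min_libraries:
            kept[gene] = row
    return kept


# Toy data with 12 libraries per gene (hypothetical values).
counts = {
    "geneA": [5] * 12,            # reads in all 12 libraries: kept
    "geneB": [0] * 11 + [3],      # reads in only 1 library: removed
    "geneC": [1] * 10 + [0, 0],   # reads in exactly 10 libraries: kept
}

kept = filter_present_genes(counts)
print(sorted(kept))
```

Both thresholds are parameters so the effect of a stricter or looser presence rule can be checked on the same counts.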
Hi Dave, where do you see 'site'?