I am trying to analyze the proteomics data published on synapse.org. In the MayoRnaSeq data on neurodegenerative diseases (AD, PSP, pathological aging, etc.), there is one cohort with 5 batches and 3 conditions (AD, PSP, and control). In all 5 batches, some sample names contain the terms egis and mgis. I do not understand what these terms mean, or why these samples form separate clusters in my PCA, as outliers would (I am unable to upload the plot here). Is it advisable to remove these samples from the dataset, or do they carry important information?
Please see the sample IDs below as an illustration of my question. Such terms are present in all batches (b1, b2, b3, b4, b5):
mayo_b5_egis_02
mayo_b5_egis_24
mayo_b5_egis_45
mayo_b5_mgis_01
mayo_b5_mgis_23
mayo_b5_mgis_44
Also, these samples are not annotated with any condition (AD, PSP, or control).
The data has the following synapseID: syn7431988 and the file I am trying to analyze is Mayo_Proteomics_TC_proteinoutput.txt
I looked at Mayo_Proteomics_TC_searchparameters.xml but still could not understand the 'egis' and 'mgis' terms that appear in all 5 batches of the data.
I would be grateful for an explanation of these terms and for answers to my questions above.
Created by sumode
Thank you!
Hi @sumode,
The 'gis' files are global internal standards, which is why they group separately in your PCA. You can map the sampleIDs to the individual covariates (syn3817650) through syn9782771. We are working on providing a simpler set of metadata for this study.
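If it helps, here is a minimal Python sketch for separating the internal-standard IDs from the rest before running a PCA. It assumes the IDs follow the pattern seen in the examples above (mayo_b<batch>_egis_<number> or mayo_b<batch>_mgis_<number>); the helper name split_gis and the ID mayo_b5_sample_03 are hypothetical and only there for illustration. How the IDs are laid out in Mayo_Proteomics_TC_proteinoutput.txt is not shown here, so reading them from the file is left out.

```python
import re

# Assumed ID pattern, inferred only from the examples in the question:
# mayo_b<batch>_<egis|mgis>_<number>
GIS_PATTERN = re.compile(r"^mayo_b(\d+)_(egis|mgis)_(\d+)$")


def split_gis(sample_ids):
    """Split IDs into (gis_ids, other_ids), preserving input order."""
    gis_ids, other_ids = [], []
    for sid in sample_ids:
        if GIS_PATTERN.match(sid):
            gis_ids.append(sid)
        else:
            other_ids.append(sid)
    return gis_ids, other_ids


if __name__ == "__main__":
    # "mayo_b5_sample_03" is a made-up ID standing in for a non-GIS sample.
    ids = ["mayo_b5_egis_02", "mayo_b5_mgis_01", "mayo_b5_sample_03"]
    gis_ids, other_ids = split_gis(ids)
    print("internal standards:", gis_ids)
    print("remaining samples:", other_ids)
```

The remaining IDs can then be used to select the columns (or rows) that go into the PCA, while the internal standards stay available for any normalization that relies on them.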