Hi all,
I just started using the ROSMAP WGS data. The joint VCF files contain 1196 subjects, and judging by the metadata, they all seem to come from either the 'MAP' or the 'ROS' cohort. However, according to the cohort description here https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyDetails?Study=syn22264775, the number of samples should be as follows: ROSMAP: 1200 samples, MSBB: 349 samples, Mayo: 349 samples, which sum to 1898. This is the same number that appears in the VCF file names: NIA_JG_1898_samples_GRM_WGS_b37_JointAnalysisXXXX.
So what I would like to know is:
1) Do the 1196 subjects come from all three datasets?
2) Where are the missing ~700 samples?
Thank you very much
Created by Marianna Sanna marianna

Hi Jared,
Thank you for your reply.
I will then use those files.
Best wishes,
Marianna

Hi Marianna,
The VCF files for the three cohorts are split by cohort and by chromosome.
ROSMAP:
https://www.synapse.org/#!Synapse:syn11707419
Mayo:
https://www.synapse.org/#!Synapse:syn11707308
MSBB:
https://www.synapse.org/#!Synapse:syn11707204
A single VCF file covering everything would be very large (as you saw, ROSMAP alone is 25 GB), which is why the files are split.
Let me know if you have further questions.
Regards,
Jared
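
For anyone who wants to check how many samples a given VCF actually contains, here is a minimal Python sketch. It relies only on the standard VCF layout, where the `#CHROM` header line has nine fixed columns followed by one column per sample. The function name `vcf_sample_ids` and the file name in the usage note are hypothetical, chosen for illustration, and are not part of any Synapse tooling.

```python
import gzip


def vcf_sample_ids(path):
    """Return the sample IDs listed on the #CHROM header line of a VCF.

    Handles plain-text files and gzip/bgzip-compressed files (by .gz suffix).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT.
                # Everything after the ninth column is a sample ID.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached the variant records without finding a header line.
                break
    return []
```

Calling `len(vcf_sample_ids("example_chr22.vcf.gz"))` on a downloaded per-chromosome file (the file name here is a placeholder) gives the sample count for that file, which can then be compared against the per-cohort totals listed on the study page.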
Marianna,
Thanks, Marianna. I will look into the VCF file shortly.
Regards,
Jared

Hi Jared Hendrickson,
Thank you for getting back to me. One of the files is _syn11714389_.
Thanks for looking into this.
Best wishes,
Marianna

Hi Marianna Sanna,
Can you please direct me to the exact VCF file you are looking at, preferably by Synapse ID? From there, I can do some data exploration and contact other members of my team if needed.
Regards,
Jared