Hi,
I am working with dataset syn52335505 and trying to match the de-identified snATAC-seq libraries to the WGS genotype data in syn11707419.
1. The specimenIDs in the metadata file MIT_ROSMAP_Multiomics_assay_snATACseq_metadata.csv cannot be found among the specimenIDs in ROSMAP_biospecimen_metadata.csv. Are they supposed to be individualIDs instead? They are formatted like RXXXXXXX.183.
2. Assuming those are individual IDs: the publication reports 92 individuals. I found 92 unique snATAC-seq libraries in MIT_ROSMAP_Multiomics_assay_snATACseq_metadata.csv, but one individual ID appears twice with different suffixes (RXXXXXXX.183 and RXXXXXXX.184). Are they the same donor?
3. Not all sample names in the VCF files can be found in the WGS metadata file. For instance, some are named ROS/MAP XXXXXXXX. Are they project IDs instead?
4. After matching individual IDs (assuming my understanding of the naming convention above is correct), I end up with only 83 donors, not the reported 92.
Thanks in advance!
Created by Aaron Kwok (@aaronkwc)

Hi Aaron,
I do see specimenIDs shared between the two files in the Synapse preview. Yes, the specimenIDs have the format RXXXXXXX.183. The individualIDs instead have the form RXXXXXXX, without the period and the trailing digits. Please let me know which specimenIDs you cannot find in both the biospecimen metadata and the snATACseq metadata files so that I can investigate further.
An individualID uniquely identifies a donor, so yes, RXXXXXXX.183 and RXXXXXXX.184 are the same donor.
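In case it helps, here is a minimal Python sketch of the naming convention described above: it strips the trailing period and digits from a specimenID to recover the individualID, then lists donors that have more than one library. The IDs in the example are made-up placeholders, and the helper names are hypothetical, not part of any Synapse or ROSMAP tooling.

```python
from collections import defaultdict


def individual_id_from_specimen(specimen_id):
    """Drop a trailing '.<digits>' suffix (e.g. '.183') if one is present."""
    base, sep, suffix = specimen_id.rpartition(".")
    if sep and suffix.isdigit():
        return base
    # No numeric suffix found: assume the value is already an individualID.
    return specimen_id


def group_by_individual(specimen_ids):
    """Map each derived individualID to the specimenIDs that share it."""
    groups = defaultdict(list)
    for sid in specimen_ids:
        groups[individual_id_from_specimen(sid)].append(sid)
    return dict(groups)


# Placeholder IDs for illustration only, not real ROSMAP identifiers.
specimens = ["R0000001.183", "R0000001.184", "R0000002.183"]
groups = group_by_individual(specimens)

# Donors represented by more than one snATAC-seq library.
duplicates = {ind: sids for ind, sids in groups.items() if len(sids) > 1}
print(duplicates)
```

Counting the keys of `groups` rather than the raw specimenIDs gives the number of distinct donors, which is the figure to compare against the publication.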
I will forward your remaining two questions to our data contributor for this study.
Thank you,
Victor

Hello @aaronkwc,
Apologies for the slow reply. I've downloaded the respective files and made the same observations as you. Tagging Abby and Victor, who might have some insight into the metadata. @abby.vanderlinden @victor.baham do you have any ideas about the following?
- The specimenIDs in MIT_ROSMAP_Multiomics_assay_snATACseq_metadata.csv seem to be the ROSMAP individualIDs with an extension added to the end. Should this whole string be treated as a unique specimenID?
- There are sample IDs in the joint VCF files (syn11707419) that map to a ROSMAP project ID. Is there any more information about these samples?
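To make the second point reproducible, a rough Python sketch of the check is below. It relies only on the standard VCF layout, in which the `#CHROM` header line has nine fixed tab-separated columns followed by the sample names. The sample names and metadata IDs shown are invented placeholders, the function names are hypothetical, and which metadata column the IDs should come from is an assumption that would need confirming.

```python
def samples_from_vcf_header(header_line):
    """Return the sample names from a VCF '#CHROM' header line.

    The first nine tab-separated columns are fixed
    (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT);
    everything after them is a sample name.
    """
    fields = header_line.rstrip("\n").split("\t")
    return fields[9:]


def split_by_metadata(sample_names, metadata_ids):
    """Separate sample names into those present in the metadata and those not."""
    known = set(metadata_ids)
    matched = [s for s in sample_names if s in known]
    unmatched = [s for s in sample_names if s not in known]
    return matched, unmatched


# Placeholder header and IDs for illustration only.
header = (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tSAMPLE_A\tROSMAP_PLACEHOLDER_1\n"
)
metadata_ids = ["SAMPLE_A", "SAMPLE_B"]

samples = samples_from_vcf_header(header)
matched, unmatched = split_by_metadata(samples, metadata_ids)
print("matched:", matched)
print("unmatched:", unmatched)
```

The `unmatched` list is the set of VCF sample names that would need a separate mapping, for example through the project ID.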
I'm also tagging @hmathys for the questions regarding the donor/individual counts.
Best,
Will
Some individuals in snATAC-seq cannot be found in WGS metadata