Hi,
I am having a problem with the AMP-AD_ROSMAP_WGS data: I cannot link each row (i.e., each mutation) in the VCF files to a specific sample. For example, in https://www.synapse.org/#!Synapse:syn10997292 there is no WGS_id or project_id in the VCF file. As a result, I cannot tell whether a sample comes from a patient or a control, or whether the genomic DNA came from blood or brain. There is a file named rosmap_WGS_id_key.csv, but none of its WGS_id or projid values appear in the downloaded VCF files, so I cannot assign each mutation to a specific sample ID. Could you please show me how to resolve this? I saw threads about this problem in the forum from a couple of years ago, so I guess it may already have been solved. Thank you very much for your kind help!
Best wishes,
Fan
Created by Fan Mei meif2021
Hi Abby,
Thank you for the clarification, I was looking at the annotated files rather than the full VCF. The latter is exactly as you described with an allele count per sample. Thank you for your help!
Best,
Matthew
Hi Fan and Matthew,
The individual IDs in the VCF files are the column headers in the header line after INFO and FORMAT. Each column after "FORMAT" contains one individual, and the variant and associated information is included in the format specified by the VCF header.
Is it possible that you're looking at the annotated VCFs? Those look like population-level or site-only files that only contain the list of variants, and not individual columns. Take a look again at the [wiki](https://www.synapse.org/#!Synapse:syn10901595) and make sure you click **"see more"**, then scroll down to the very bottom of the section. This includes a table showing the different file types (e.g. variants.annotated.vcf.gz vs variants.vcf.gz). The individual IDs should be in the un-annotated VCF files.
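As a rough illustration of the layout described above, here is a minimal Python sketch that reads the individual IDs from the `#CHROM` header line and lists which individuals carry a non-reference allele at a given record. It assumes a standard tab-delimited VCF with a `GT` field in FORMAT; the sample names (`SAMPLE_A`, etc.), the example record, and the helper names are made up for illustration and are not taken from the ROSMAP files.

```python
import gzip

def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text (e.g. a *.vcf.gz file)."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")

def read_sample_ids(lines):
    """Return the per-individual column names from the '#CHROM' header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The 9 fixed columns are CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # everything after FORMAT is one column per individual.
            # A site-only (annotated) VCF stops at INFO, so this returns [].
            return fields[9:]
    return []

def carriers(record_line, sample_ids):
    """Return the IDs of individuals with at least one non-reference allele."""
    fields = record_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return []  # no per-individual columns in this file
    gt_index = fields[8].split(":").index("GT")
    found = []
    for sample_id, call in zip(sample_ids, fields[9:]):
        genotype = call.split(":")[gt_index]
        alleles = genotype.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            found.append(sample_id)
    return found

# Hypothetical three-individual VCF, held in memory for the example.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B\tSAMPLE_C\n",
    "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/0:30\t0/1:28\t./.:0\n",
]
ids = read_sample_ids(example)
print(ids)
print(carriers(example[2], ids))
```

For a real file you would iterate over `open_vcf(path)` instead of the in-memory list. A dedicated VCF library or bcftools is probably a better fit for the full-size files; this only shows where the per-individual information lives.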
Let me know if this gives you what you're looking for!
Best,
Abby
Dear Mette,
I am writing to ask whether there is any update on the issue mentioned in Fan's comment above. Thank you!
Best,
Matthew
Dear Mette,
Thanks again for the great effort! May I ask whether there is an update on the "INFO" column of the VCF file now that the data provider has been contacted? Is there any more information in the WGS data that links each variant to the individual biospecimen (syn10901595)? Thank you very much!
Best wishes,
Fan
Hi Mette,
Thank you very much for the information. It really helps a lot! I look forward to the response about linking variants to a given individual. Thanks again!
Best wishes,
Fan
We will contact the data provider and respond here. Best - Mette
Hi,
Thank you for your response! It does appear that the VCFs contain the specimen information, but it is in the header of the VCF, and there is no indication of which variant is found in a given individual (that information is missing from the "INFO" column of the VCF). This makes it challenging to genotype individuals based on whether they carry a loss-of-function variant of interest. Is there a version of the data in which this variant-level information is available? Or is there something in the VCF header that can be mapped to information in the individual rows? Thank you!
Best,
Matthew
Thank you both for your interest in the data.
First, to answer the question about syn10997292: that file does not contain specimen information; it is a genetic variant annotation file. See the VCF files for data on the specimens. The table on this page describes the data files: https://www.synapse.org/#!Synapse:syn10901595
To map the specimenIDs to the clinical data, see the biospecimen metadata file: syn21323366 (note that this has biospecimens used for all the assays and therefore contains more than the WGS specimens). Map the individualID (R..) from that file to the clinical metadata: syn3191087. Note the QC info on the samples: https://www.synapse.org/#!Synapse:syn12177996
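The two-step mapping described above (specimen to individual, then individual to clinical record) could be sketched roughly as follows in Python. The `specimenID` and `individualID` column names follow the wording of this post; the `tissue` and `diagnosis` columns, all the values, and the assumption that the VCF column headers match `specimenID` exactly are hypothetical placeholders, so check them against the actual files.

```python
import csv
import io

def index_by(rows, key):
    """Build a lookup dict from a list of row dicts, keyed on one column."""
    return {row[key]: row for row in rows}

def annotate_samples(sample_ids, biospecimen_rows, clinical_rows):
    """Join VCF sample IDs to biospecimen rows, then to clinical rows."""
    by_specimen = index_by(biospecimen_rows, "specimenID")
    by_individual = index_by(clinical_rows, "individualID")
    annotated = []
    for sample_id in sample_ids:
        bio = by_specimen.get(sample_id)  # None if the ID is not in the metadata
        clin = by_individual.get(bio["individualID"]) if bio else None
        annotated.append({"specimenID": sample_id, "biospecimen": bio, "clinical": clin})
    return annotated

# Tiny made-up stand-ins for the biospecimen (syn21323366) and
# clinical (syn3191087) metadata files; none of these values are real.
biospecimen_csv = io.StringIO(
    "specimenID,individualID,tissue\n"
    "SAMPLE_A,R0000001,blood\n"
    "SAMPLE_B,R0000002,brain\n"
)
clinical_csv = io.StringIO(
    "individualID,diagnosis\n"
    "R0000001,control\n"
    "R0000002,case\n"
)
biospecimen_rows = list(csv.DictReader(biospecimen_csv))
clinical_rows = list(csv.DictReader(clinical_csv))

# SAMPLE_X stands for a VCF column with no matching metadata row.
result = annotate_samples(["SAMPLE_A", "SAMPLE_B", "SAMPLE_X"], biospecimen_rows, clinical_rows)
for entry in result:
    print(entry)
```

Because the biospecimen file covers all assays, a real join would probably also need to filter to the WGS rows and handle individuals with more than one specimen; this sketch keeps only the last row seen for a repeated key.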
We are working on a biospecimen metadata file that includes the QC info for each sample (and maps more explicitly to the assay). If you are not already signed up, request to join the [AMP-AD_DataReleaseUpdates](https://www.synapse.org/#!Team:3372003) team. That team gets tagged in this discussion forum when we have a data release or make an update.
Hello,
I have been running into the same issue: I do not see a specimen, sample, or projid associated with each variant, in contrast to the microarray data, where the column names correspond to the specimen IDs given in the metadata. Is there a way to link the information from the WGS metadata (syn21314542) to the variants in the WGS data? Thank you!
Best,
Matthew
Inquiry about AMP-AD_ROSMAP_WGS data