Hi there, first of all thank you for sharing such a fantastic resource!
For the ChIP-seq metadata (syn25724249) in the SuperAgerEpiMap, the specimen IDs etc. don't match the names of the files in the ChIP-seq data repository (which are often labelled with single letters and numbers instead). Is there an updated metadata file that contains the new file names?
Thank you!
Created by Samuel Keat KeatSam
Thanks for your interest in our data, Sam, and thanks for the explanation, Laura. Yes, these data are all paired-end.
If you download them and keep the original file names, you will also find that the names are consistent: read 1 files are labelled R1 and read 2 files are labelled R2.
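A minimal sketch of pairing mates by that R1/R2 convention. It assumes the read number appears as an `R1` or `R2` token delimited by `_`, `.` or `-`; the exact naming pattern is not given in this thread, and the example file names and the `pair_reads` helper are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the read number is an "R1"/"R2" token surrounded by "_", "." or "-"
# (e.g. "sampleA_R1.fastq.gz"). Adjust the pattern to the real file names.
READ_TOKEN = re.compile(r"(?<=[_.\-])R([12])(?=[_.\-])")


def pair_reads(file_names):
    """Group file names into {stem: {"R1": name, "R2": name}}."""
    pairs = defaultdict(dict)
    for name in file_names:
        match = READ_TOKEN.search(name)
        if match is None:
            continue  # no read token found; leave for manual inspection
        # Swap the read token for a placeholder so both mates share one key.
        stem = name[:match.start()] + "R#" + name[match.end():]
        pairs[stem]["R" + match.group(1)] = name
    return dict(pairs)


if __name__ == "__main__":
    # Hypothetical file names, not taken from the repository.
    print(pair_reads(["sampleA_R1.fastq.gz", "sampleA_R2.fastq.gz"]))
```

Any stem that ends up with only one of the two keys is worth checking by hand, since it suggests a missing mate or a name that does not follow the assumed pattern.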
Best,
Pengfei
Hi both,
Never mind - I've worked out a way around it. Counting the total number of reads within the files makes them identifiable in the metadata, so I've been able to identify those last few synapseIDs.
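For anyone wanting to try the same workaround, here is a rough sketch of counting reads in a FASTQ file. It assumes standard four-line FASTQ records (no wrapped sequence lines) and that gzipped files end in `.gz`; `count_fastq_reads` is an illustrative helper, not something from the project.

```python
import gzip


def count_fastq_reads(path):
    """Return the number of reads in a FASTQ file, assuming 4 lines per record."""
    # Assumption: compressed files carry a ".gz" suffix; everything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

The resulting count can then be compared against whichever read-count field the metadata provides. The name of that field is not stated in this thread, so check the metadata file itself.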
Thanks a lot for your help!
Sam
Hi Laura,
Of course yes - I've given each of these in their constituent read pairs.
Hi @PFDong! This is for samples that are labelled with the same specimen ID, individual ID, tissue and batch, which makes them indistinguishable from one another. If I can get the assay target for each of these, I'll be able to tell them apart.
Thank you @lheath and @PFDong !
Sam
Hi @KeatSam,
I believe the portal curators assumed these sets of fastqs were paired-end reads. All the metadata we have on each synid is in the annotations, so if there is a missing identifier for these files then we will need input from the data contributor @PFDong for further details. I'm sorry we couldn't clarify this for you quickly and easily!
@PFDong, we have a data user interested in the SuperAgerEpiMap ChIP-seq data, but they are having trouble discerning which files to download to obtain H3K27me3 data. Could you please clarify which files are which for @KeatSam? For instance, consider the pair of files [syn25727442](syn25727442) and [syn25728399](syn25728399) (the full set of pairs in question is listed elsewhere in this thread). Aside from the file names, these two files have identical annotations. Are these paired-end reads, or is one an input file and the other H3K27me3 data? Thank you for your help!
Best,
Laura
Hi Laura,
I've been able to get the vast majority of assay targets and tissues etc. from the file names, but for these Synapse IDs (in pairs of reads), I am unsure as they have the same specimen and batch ID. Do you have the assay target (input, H3K4me3, H3K27ac) for these files? Then I can fully map every file.
Read1 Read2
syn25727442 syn25728399
syn25727619 syn25728585
syn25727645 syn25728610
syn25728062 syn25729014
syn25727692 syn25728659
syn25728128 syn25729082
syn25727466 syn25728423
syn25727702 syn25728669
syn25727733 syn25728695
syn25728163 syn25729115
syn25727764 syn25728726
syn25728195 syn25729139
syn25727481 syn25728440
syn25728225 syn25729162
syn25727876 syn25728838
syn25728301 syn25729240
syn25727907 syn25728875
syn25728329 syn25729263
Thank you!
Hi @KeatSam,
I can see from the annotations table that each individualID has multiple files, which is expected. If I sort by individualID, specimenID, assay, and tissue (since multiple brain regions were assayed), I can see from the ChIP-seq file names ('names' column in the annotations table) that there are multiple ChIP-seq fastq files per specimenID. If you can discern from the file names which files you want (some are labeled with 'input', some by batch, and still others by celltype/K23me3), you can pull the synapse IDs of those files directly from the annotations table.
If you cannot discern which files are useful from the annotations table, I think it might help me to have a specific example (set of synapseIDs) for what you are experiencing.
Best,
Laura
Hi Laura,
Thanks a lot for your reply. I've been able to access the annotations now - thank you.
However, I'm still left with the same problem. Most files are identifiable from their sequencing batch combined with their specimenID and individualID (each ID has up to 4 files associated with it). The problem is the files that share the same sequencing batch and specimenID and yet come from different assays. One example is two files that belong to the same specimenID, individualID and batch ID, where one is H3K27me3 and one is Input, with no way of distinguishing between the two. For these I can't tell which file is which. Is there a way around this?
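One way to list the affected files is to group the annotation rows by the shared identifiers and keep only the groups that hold more files than a single read pair. This is a sketch under assumptions: `sequencingBatch` is a guessed name for the batch column, `id` is assumed to hold the synapse ID, and two files per sample (one R1, one R2) is assumed to be the unambiguous case.

```python
from collections import defaultdict

# Assumption: these column names match the annotations table; "sequencingBatch"
# in particular is a guess and should be replaced with the real batch column.
GROUP_KEYS = ("individualID", "specimenID", "tissue", "sequencingBatch")


def find_ambiguous(rows, keys=GROUP_KEYS, files_per_sample=2):
    """Return {key tuple: [synapse IDs]} for groups larger than one read pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > files_per_sample}
```

Each group returned holds files whose annotations alone cannot separate the assays, so those are the ones that need extra information such as the file name or read count.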
Thank you!
Hi @KeatSam,
For this project, the specimenIDs for each ChIP-seq fastq file are in each fastq file's annotations. The annotations contain both the individualID and the specimenID, which will then match what is in the ChIP-seq metadata. The easiest way to get all the annotations is to go through the AD Knowledge Portal study page for this [study](https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyData?Study=syn25672226). Click on the "Study Data" tab, scroll down to the study data itself (below the metadata), and find the Download Options arrow (above and to the right of the pie charts). Click on this arrow and select the option to "Export Table." This table will contain the names of the files, the specimenIDs, and the individualIDs for all of the data files in this project, and you can filter based on assay (ChIPSeq) from here.
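Once the table is exported, the assay filter can also be applied in a short script. The sketch below assumes the export is a CSV file with `assay`, `individualID`, `specimenID` and `tissue` columns; the file name `study_data_export.csv` and the `load_chipseq_rows` helper are hypothetical.

```python
import csv

SORT_KEYS = ("individualID", "specimenID", "tissue")


def load_chipseq_rows(csv_path, assay="ChIPSeq"):
    """Read the exported table and return only the rows for the given assay."""
    with open(csv_path, newline="") as handle:
        rows = [row for row in csv.DictReader(handle) if row.get("assay") == assay]
    # Sort so that files from the same individual and specimen sit together.
    rows.sort(key=lambda row: tuple(row.get(k) or "" for k in SORT_KEYS))
    return rows


if __name__ == "__main__":
    # Hypothetical file name for the exported table.
    for row in load_chipseq_rows("study_data_export.csv"):
        print(row)
```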
I hope that helps!
Best,
Laura