Hello,
I need some help associating fastq files from the SEA-AD study to each individual, for all assays (snATAC, snRNA, multiome). I don't see key:value pairs to map fastq.gz file names to individual IDs (or specimen IDs) in biospecimen metadata or assay specific metadata files.
For example, for two Gene Expression fastq files, syn31115955 and syn31116392, which individualID or specimenID do these files correspond to in either the snRNA assay metadata, SEA-AD_assay_snRNAseq_metadata.csv (syn31454453), or the biospecimen metadata, SEA-AD_biospecimen_metadata.csv (syn31149119)?
I'm having the same issue for fastq files from ATAC/Epigenetics (syn31106991) and multiome (syn31114299).
Thanks in advance for any help,
Chris
Created by Chris Rhodes crhodes4
I apologize for not mentioning the Synapse IDs for the two files mentioned above. The Synapse ID is syn30961086 for 1001246846-raw_feature_bc_matrix.h5 and syn30961158 for M1TX_191204_103_F01.csv.
Thanks!
Hi team,
I am still having trouble mapping the data between the different files. For example, according to the manifest, the biospecimen file M1TX_191204_103_F01.csv maps to the data matrix file 1001246846-raw_feature_bc_matrix.h5. However, the number of cells in the CSV file differs from the number of cells in the h5 file. More importantly, only a fraction of the cell names (using only the prefix) match the column names of the Seurat object created from the h5 file. What am I doing wrong? How do I match the column names in the h5 file to the cell metadata information?
Dear team,
I may have figured this out. The error was happening because these two .h5 files were labelled as fastq in the manifest file. Also, the file naming for some of these .h5 files does not match the pattern seen in others (where part of the file name is also part of the sample_id in the corresponding csv file), but I guess that is not required.
Thanks!
Best regards,
Rajesh
Dear team,
I am having trouble mapping certain h5 files to the corresponding individual ID. For example, which individual or biospecimen does the h5 file syn30961086 map to?
Thanks.
Best regards,
Rajesh
Hi there, the AD Portal uses file annotations to map individual and specimen IDs to particular data files. When you bulk download data files from Synapse via the web browser or one of the programmatic clients, a download manifest is generated that has all the annotations associated with the files (as @qwang178 helpfully mentioned above!).
General documentation on Synapse annotations is [here](https://help.synapse.org/docs/Annotating-Data-With-Metadata.2667708522.html). However, the annotations for all files in a study are actually displayed in tabular format in the AD Knowledge Portal website, which you can use to explore and filter data. For example, you can find all the metadata and data files from the SEA-AD study on the "Study Data" tab of [this page](https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage?Study=syn26223298). We also have AD Portal help docs on how to [navigate and filter data in the portal](https://help.adknowledgeportal.org/apd/Accessing-Data.2137260765.html#AccessingData-Getstartedexploringdata), as well as [tutorials](https://help.adknowledgeportal.org/apd/Use-Cases.2426404941.html) demonstrating how to bulk download data and join clinical and biospecimen metadata to data files via annotations.
Please let me know if that helps -- I'm happy to answer any other questions you may have.
Best,
Abby
If you download the whole data directory (which contains h5, fastq and csv files), there is a file named SYNAPSE_METADATA_MANIFEST.tsv. It has the mapping information for all the files within the directory.
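For anyone who wants to do this lookup in a script, here is a minimal Python sketch of reading such a manifest and building a file name to ID mapping. It is only an illustration under some assumptions: the column names `name`, `specimenID` and `individualID` are guesses at what the manifest contains, so check the header of your own SYNAPSE_METADATA_MANIFEST.tsv and adjust them. The function name `load_manifest_mapping` and the sample rows are made up for the example.

```python
import csv
import io


def load_manifest_mapping(handle, key_column="name",
                          id_columns=("specimenID", "individualID")):
    """Map each file name in a tab-separated manifest to its ID annotations.

    The column names are assumptions; pass different ones if your
    manifest header uses other labels.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    mapping = {}
    for row in reader:
        key = row.get(key_column)
        if not key:
            continue  # skip rows that have no file name
        # Store empty cells as None so missing annotations are easy to spot.
        mapping[key] = {col: (row.get(col) or None) for col in id_columns}
    return mapping


# Tiny made-up manifest, used only to show the shape of the result.
example = (
    "path\tname\tspecimenID\tindividualID\n"
    "data/sample_A_R1.fastq.gz\tsample_A_R1.fastq.gz\tSPEC_A\tIND_1\n"
    "data/sample_B_R1.fastq.gz\tsample_B_R1.fastq.gz\tSPEC_B\t\n"
)
mapping = load_manifest_mapping(io.StringIO(example))
print(mapping["sample_A_R1.fastq.gz"])
```

With a real download, you would pass an open file instead of the in-memory example, e.g. `open("SYNAPSE_METADATA_MANIFEST.tsv", newline="")`. Keying on the file name rather than on a file format column seems safer here, since a post in this thread reports .h5 files that were labelled as fastq in the manifest.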