Hello! I am trying to download the FASTQ files from "ROSMAP RNAseq fastq files" (syn21589959) and was searching for the corresponding metadata, which I assume is in "ROSMAP_assay_RNAseq_metadata.csv" (syn21088596). However, I was unable to work out which FASTQ file corresponds to which specimen ID. Am I trying to match the wrong projects with each other? Any help would be greatly appreciated. Thank you!

Created by Young-Jun Jeon jeonrpm2020
So sorry, I think I figured this out. The project ID corresponds to each donor, so I was able to match everything up using the clinical metadata file, the study metadata CSV file, and the legend from the data dictionary here: [syn3191087](https://www.synapse.org/#!Synapse:syn3191087). Hope this helps others who may be trying to do the same thing.
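For anyone trying the same thing in code, here is a minimal Python sketch of that chain of lookups: assay specimen → donor → clinical record. It assumes (please verify against the data dictionary) that the biospecimen metadata has `specimenID` and `individualID` columns and that the clinical file shares a donor column with it; the `donor_key` parameter lets you switch to `projid` if that is the column your files actually share. The function names are hypothetical, not part of any Synapse tool.

```python
import csv
from typing import Dict, Iterable, List


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into dicts."""
    return list(csv.DictReader(lines))


def join_specimens(
    assay: Iterable[str],
    biospecimen: Iterable[str],
    clinical: Iterable[str],
    donor_key: str = "individualID",  # ASSUMPTION: column shared by biospecimen and clinical files
) -> List[Dict[str, str]]:
    """Attach donor-level clinical columns to every assay specimen row."""
    # specimenID -> donor ID, taken from the biospecimen metadata
    specimen_to_donor = {
        row["specimenID"]: row[donor_key] for row in read_rows(biospecimen)
    }
    # donor ID -> full clinical row
    clinical_by_donor = {row[donor_key]: row for row in read_rows(clinical)}

    joined = []
    for row in read_rows(assay):
        donor = specimen_to_donor.get(row["specimenID"], "")
        merged = dict(row)
        merged[donor_key] = donor
        # Copy clinical columns without overwriting assay columns;
        # specimens with no matching donor simply get no clinical fields.
        for key, value in clinical_by_donor.get(donor, {}).items():
            merged.setdefault(key, value)
        joined.append(merged)
    return joined
```

Open each CSV with `open(path, newline="")` and pass the three handles in. Plain dictionaries keep the sketch dependency-free; with pandas, two `DataFrame.merge` calls would do the same job.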
Hi again @abby.vanderlinden, I found a file called ROSMAP_clinical.csv that has project IDs. However, the individual IDs (from the ROSMAP metadata) do not match the sample IDs in the clinical data CSV. The project IDs do match, so can we assume that the project ID is unique to each sample? And for the bulk microglia ROSMAP gene expression data, should we expect 4 project IDs per sample (sequenced on 2 lanes for paired-end sequencing)? Sorry for the confusion. Please email me if you'd like to see the files I am referring to (medwards@umass.edu).
Hi @jaclynbeck @abby.vanderlinden, I am also struggling to find the appropriate metadata for my study (ROSMAP, gene expression, SYN11468526). I see the metadata CSV file with the sample IDs and accompanying barcodes, but it has no sample-level data such as sex and/or gender or age. Where can I find this data? It matters especially because my study focuses on sex differences. Thank you, Mélise Edwards
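Once specimens are linked to donors, sex and age can be read from the clinical file. Below is a hedged Python sketch; the column names `msex` and `age_death`, the coding 1 = male / 0 = female, and the representation of top-coded ages as strings like "90+" are all assumptions to check against the ROSMAP data dictionary (syn3191087) before relying on them. The helper names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Optional, Tuple

# ASSUMPTION: verify these codes and column names against the ROSMAP data dictionary.
SEX_CODES = {"1": "male", "0": "female"}
MISSING = {"", "NA"}


def parse_age(value: Optional[str]) -> Tuple[Optional[float], bool]:
    """Return (age, censored); "90+" -> (90.0, True), blank/NA -> (None, False)."""
    text = (value or "").strip()
    if text in MISSING:
        return None, False
    if text.endswith("+"):
        # Top-coded age: the true value is at least this number.
        return float(text[:-1]), True
    return float(text), False


def donor_demographics(
    clinical: Iterable[str], donor_key: str = "individualID"
) -> Dict[str, dict]:
    """Map each donor ID to decoded sex and (possibly censored) age at death."""
    result = {}
    for row in csv.DictReader(clinical):
        age, censored = parse_age(row.get("age_death"))
        result[row[donor_key]] = {
            "sex": SEX_CODES.get((row.get("msex") or "").strip(), "unknown"),
            "age_death": age,
            "age_censored": censored,
        }
    return result
```

Keeping the `age_censored` flag separate avoids silently treating top-coded ages as exact values in a sex-differences analysis.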
Hello, I just took a look at ROSMAP_biospecimen_metadata.csv and I do see an entry for RISK_100, on line 5168, in the "specimenID" column. I found it by loading the CSV into R as `tmp` and running `grep("RISK_100", tmp$specimenID)`. Could you try this (or something similar if you're not working in R) and check whether it finds that specimen ID on your end? Jaclyn
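If you are not working in R, a rough Python equivalent of that `grep` check might look like the sketch below. It uses only the standard `csv` module; the function name `find_specimen` is hypothetical.

```python
import csv
from typing import Iterable, List, Tuple


def find_specimen(
    lines: Iterable[str], specimen_id: str, column: str = "specimenID"
) -> List[Tuple[int, dict]]:
    """Return (file_line_number, row) for rows whose column exactly equals specimen_id.

    Line numbers count the header as line 1, assuming no field
    contains an embedded newline.
    """
    reader = csv.DictReader(lines)
    matches = []
    for row in reader:
        if (row.get(column) or "").strip() == specimen_id:
            # line_num = number of source lines read so far
            matches.append((reader.line_num, row))
    return matches
```

One design difference from the R call: `grep("RISK_100", ...)` does a pattern match, so it would also hit IDs such as "RISK_1000", whereas this sketch requires an exact match. The reported numbers are file line numbers (header = line 1), which can differ by one from a data-frame row index.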
Hello Jaclyn, I am still having trouble mapping the ROSMAP FASTQ files to their pathology states (AD or control). Taking the RISK_100_S63_R1_001.fastq.gz file as an example, I looked in both "ROSMAP_assay_RNAseq_metadata.csv" and "ROSMAP_biospecimen_metadata.csv", but I could not find the "RISK_100" ID in either file. So I cannot tell which FASTQ files correspond to AD samples and which to control samples. I would greatly appreciate your help. Thanks in advance! Best, Hongdong
The only information I can find is from the wiki page for [syn3388564](https://www.synapse.org/#!Synapse:syn3388564): "Then RNA-Seq data were processed by our parallelized and automatic pipeline. These pipeline include trimming the beginning and ending bases from each read, identifying and trimming adapter sequences from reads, detecting and removing rRNA reads, aligning reads to reference genome. We used the non-gapped aligner Bowtie to align reads to transcriptome reference and then applied RSEM to estimate expression levels for all transcripts." and the supplementary methods from the [published paper](https://doi.org/10.1038%2Fs41593-018-0154-9), which say the same thing. So it sounds like Bowtie was the main alignment program. The paper says they used GRCh37 to align the histone modification data, so I assume they used the same reference for the RNA-seq data, although they do not state this explicitly. If you need more in-depth information on their pipeline, I think you will need to contact the authors directly. I hope that helps! Jaclyn
Thank you for the reply. May I ask how the BAM files were mapped? Which programs and reference files were used during mapping? Thanks once again!
Hello! The FASTQ files should all be named in the format `<specimenID>_S<#>_R<1/2>_001.fastq.gz`. So, for example, files "RISK_100_S63_R1_001.fastq.gz" and "RISK_100_S63_R2_001.fastq.gz" are for specimenID "RISK_100" in the RNAseq_metadata file. Much of the data was provided as BAM files instead of FASTQ files, so the FASTQ folder will not have files for all the specimens listed in the metadata file. You can find those BAM files with the other specimen data here: [syn22333035](https://www.synapse.org/#!Synapse:syn22333035). I hope that helps! Let me know if you have any other questions, Jaclyn
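To apply this naming rule to a whole folder, the Python sketch below (hypothetical helper names) extracts the specimen ID from each file name and pairs the R1/R2 files. It assumes every FASTQ name follows exactly the `<specimenID>_S<#>_R<1/2>_001.fastq.gz` pattern; names that don't match are skipped rather than guessed at.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# ASSUMED pattern: <specimenID>_S<digits>_R<1|2>_001.fastq.gz
# The specimen ID is everything before the final _S.._R.._001 suffix.
FASTQ_NAME = re.compile(
    r"^(?P<specimen>.+)_S(?P<sample>\d+)_R(?P<read>[12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename: str) -> Optional[Tuple[str, int]]:
    """Return (specimenID, read number) or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return match.group("specimen"), int(match.group("read"))


def pair_fastqs(filenames: Iterable[str]) -> Dict[str, Dict[int, str]]:
    """Group files by specimenID as {specimenID: {1: R1 file, 2: R2 file}}."""
    pairs: Dict[str, Dict[int, str]] = defaultdict(dict)
    for name in filenames:
        parsed = parse_fastq_name(name)
        if parsed is not None:
            specimen, read = parsed
            pairs[specimen][read] = name
    return dict(pairs)
```

Specimen IDs that appear in ROSMAP_assay_RNAseq_metadata.csv but are absent from the result are likely the ones to look for among the BAM files in syn22333035.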


How to find which fastq file corresponds to which specimen ID