Hello, I have a question about the number of samples in the MayoRNAseq dataset, and I hope you can help. When I search for "syn9738945" and apply the filters "rnaSeqReprocessing" - "fastq" - "MayoRNAseq" - "Temporal Cortex" in that order, I get 552 fastq files (276 samples). According to Allen et al. (2016), this dataset should include 80 Control and 84 Alzheimer's Disease subjects. I downloaded all 552 fastq files and checked whether the numbers of AD and Control samples match those given in the article. Among the downloaded files, I found 82 AD and 78 Control samples, so it appears that 2 AD and 2 Control samples were removed from the dataset. Were those samples actually removed, or am I missing something? If they were removed, why? I would be grateful if you could explain the difference in the number of samples. Thank you so much for your time!

Created by Dilara
Hello @mxa24, thank you so much for your detailed explanation. It helped a lot. Best wishes, Dilara
Hello @Dilara, the data was shared early in the research cycle, before QC and analysis were complete. The most up-to-date information can therefore be found in the study documentation on Synapse, which is updated as more information becomes available.
- The count of 552 fastq files should be correct, because the two TCX samples that failed sex check were not included in the RNAseq reprocessing effort.
- The number of AD and control samples can be determined by linking the biospecimen and individual metadata files and referencing the diagnosis, exclude and excludeReason fields.
- There were 3 individuals (1950, 1925 and 1957) who were later identified as no longer being controls after updates to the neuropathology information. They were flagged in the exclude columns above and masked (NA) in the metadata files so that they would not be inadvertently included in analysis.
- RNAseq samples from these 3 individuals have an exclude reason of "(Pathology) - Does not meet control criteria (Braak > 3.0)".
- Two of these three "control" individuals were part of the TCX dataset (1950_TCX and 1925_TCX), which is why the updated metadata has 78 controls and not 80.
I hope that clears things up for you! Mariet
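The linking step described in the reply above could be sketched roughly as follows. This is a minimal illustration only, not the study's own procedure: the field names diagnosis, exclude and excludeReason come from the reply, while `individualID` and `specimenID`, the split of fields between the two files, the diagnosis labels, and every row of toy data are assumptions made up for the example. Check the real column names in the metadata files on Synapse before adapting it.

```python
from collections import Counter

# Toy stand-ins for the individual and biospecimen metadata files.
# All rows and IDs below are invented for illustration.
individual_rows = [
    {"individualID": "A", "diagnosis": "Alzheimer Disease"},
    {"individualID": "B", "diagnosis": "control"},
    {"individualID": "C", "diagnosis": None},  # masked (NA) after re-review
]
biospecimen_rows = [
    {"specimenID": "A_TCX", "individualID": "A",
     "exclude": False, "excludeReason": None},
    {"specimenID": "B_TCX", "individualID": "B",
     "exclude": False, "excludeReason": None},
    {"specimenID": "C_TCX", "individualID": "C", "exclude": True,
     "excludeReason": "(Pathology) - Does not meet control criteria (Braak > 3.0)"},
]


def count_diagnoses(biospecimen_rows, individual_rows):
    """Count non-excluded specimens per diagnosis after linking the two tables."""
    # Build a lookup from individual to diagnosis (the join key is assumed).
    diagnosis_by_individual = {
        row["individualID"]: row["diagnosis"] for row in individual_rows
    }
    counts = Counter()
    for specimen in biospecimen_rows:
        if specimen["exclude"]:
            continue  # flagged specimens are left out of the tally
        diagnosis = diagnosis_by_individual.get(specimen["individualID"])
        if diagnosis is None:
            continue  # masked or missing diagnosis
        counts[diagnosis] += 1
    return counts


counts = count_diagnoses(biospecimen_rows, individual_rows)
print(dict(counts))
```

With the real files, the same tally restricted to TCX specimens is what would be compared against the 78 controls mentioned above.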
Hello @abby.vanderlinden and @mxa24, thank you so much for your help. According to my calculations, two more subjects should be missing. According to Allen et al. (2016), 80 Control and 84 Alzheimer's Disease subjects should be included, but only 78 Control and 82 AD subjects are present. So 2 Control and 2 AD subjects are missing. You said that 2 samples were removed (132_TCX and 844_TCX). Is it possible that 2 more subjects are missing? By the way, I checked the flagged samples: 29 samples were flagged. Of these, 23 are TCX samples in the range [1005-1123], and they are already included in the 552 fastq files. The remaining 6 belong to progressive supranuclear palsy samples. Thank you so much for your time.
Hello @abby.vanderlinden and @Dilara, some samples were flagged because they failed various quality control metrics; these flags are provided in the biospecimen metadata file (syn20827192). Two TCX samples failed sex check, and it was decided to remove their data from the study (132_TCX and 844_TCX). There are 278 TCX samples with 2 fastq files per sample, which gives 556. Removing the 4 fastq files for the 2 samples that failed sex check leaves 552 TCX fastq files. I hope that helps! Mariet
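For anyone who wants to re-check the file count, the arithmetic in the reply above can be written out as a short sketch. The numbers are the ones quoted in the reply; the variable names are only illustrative.

```python
# Figures quoted in the reply above.
total_tcx_samples = 278
fastq_per_sample = 2
removed_samples = 2  # 132_TCX and 844_TCX, which failed sex check

fastq_before_removal = total_tcx_samples * fastq_per_sample
fastq_removed = removed_samples * fastq_per_sample
expected_fastq = fastq_before_removal - fastq_removed
remaining_samples = total_tcx_samples - removed_samples

print(fastq_before_removal, fastq_removed, expected_fastq, remaining_samples)
# prints: 556 4 552 276
```

The result matches both the 552 fastq files and the 276 samples reported in the original question.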
Hi there, it does seem like there are only 552 fastq files from the Mayo rnaSeqReprocessing study samples (the underlying folder is here: https://www.synapse.org/#!Synapse:syn8612203). I'm not sure why this doesn't match up with the numbers reported in the paper. Perhaps @mxa24 can help? Best, Abby

