Hi,

I'm having trouble finding out how some of the data were processed and which experiment each file corresponds to. I am after raw RNAseq read counts and am generally looking for .tsv files.

In the MayoRNAseq Study data section there are MayoRNAseq_RNAseq_TCX_geneCounts.tsv (syn4650257), MayoRNAseq_RNAseq_TCX_geneCounts_normalized.tsv (syn4650265), MayoRNAseq_RNAseq_CBE_geneCounts.tsv (syn5201012), and some others. When I open these files in Synapse, I can backtrack through the file paths to get some idea of what they correspond to. That way I was able to get to syn6126177, which lists all counts for the cerebellum samples of MayoRNAseq. In this particular case, there are 5 .tsv count files.

What might be the difference between MayoRNAseq_RNAseq_CBE_transcriptCounts.tsv (syn5600773) and MayoRNAseq_RNAseq_CBE_geneCounts.tsv (syn5201012)? There is also MayoRNAseq_RNAseq_CBE_transcriptCounts_normalized.tsv (syn6126177), but I cannot find out which normalisation was applied.

These files belong to the Gene Expression (RNAseq - SNAPR) section, whose methods section states:

"Explanation of available files and post-processing: The individual read count files produced by SNAPR are merged into a single file: "AMP-AD MayoRNAseq UFL-Mayo-ISB mRNA Alzheimers Disease IlluminaHiSeq2000 CBE geneExp raw count Homo sapiens" with combine_count_files.pl. These merged count files are normalized with the tmm_normalization.R script which uses the edgeR implementation of TMM to calculate CPM. Differences in library size were normalized across samples using the EdgeR function calcNormFactors. Normalized read counts were then converted to cpm with the cpm function, also in EdgeR. These normalized counts are saved as "AMP-AD MayoRNAseq UFL-Mayo-ISB mRNA Alzheimers Disease IlluminaHiSeq2000 CBE geneExp TMM normalized Homo sapiens"."

This does not help me much. I think there must be an easier way to find this out, but I'm just missing it.
Thanks, T

Created by Tapio Nevalainen newsky
Thanks a lot! T
Hello,

It looks like for the files that end in "transcriptCounts.tsv", reads were summarized/counted at the transcript level (where there may be multiple transcripts per gene), while the files that end in "geneCounts.tsv" were summarized at the gene level. Unless you need information about which transcripts within a gene are more or less abundant, the "geneCounts.tsv" files are probably what you want.

As for normalization, based on their description it sounds like they did this procedure in R (code not tested):

```
library(edgeR)
# counts_matrix: raw counts, genes in rows and samples in columns
y <- DGEList(counts = counts_matrix)
# TMM scaling factors to adjust for library-size differences
y <- calcNormFactors(y, method = "TMM")
# CPM computed with the TMM-adjusted library sizes
normalized_cpm <- cpm(y)
```

So the "normalized.tsv" count files should have values equivalent to counts per million (CPM) adjusted by a per-sample scaling factor that edgeR calculated. You can check whether my assumption is right by seeing if the column sums are all close to 1e6. They will not be exactly 1e6, because cpm divides by the library size multiplied by the TMM factor.

As for the names given in the methods writeup (like "AMP-AD MayoRNAseq UFL-Mayo-ISB mRNA Alzheimers Disease IlluminaHiSeq2000 CBE geneExp raw count Homo sapiens"), I think it's highly likely that when they put the data on Synapse they shortened the filenames to what we see now and forgot to update the description to match.

In summary, I think what you want are the files ending in "geneCounts.tsv", which are the raw read counts for the temporal cortex and cerebellum. Or, if you want to use their TMM-normalized values, you would want the "geneCounts_normalized.tsv" files instead.

Hopefully that answers your questions, but if I missed something or you have more questions, please reply here and I'll do my best to help.

Jaclyn
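The column-sum check suggested in the reply above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not the study's own code: it assumes the normalized file is tab-delimited with one numeric column per sample, and it simply ignores non-numeric columns such as gene identifiers (a numeric ID column would need to be dropped by hand). The helper name `check_cpm_colsums` and the 20% tolerance are hypothetical choices made for this example.

```r
# Hypothetical helper: summarise per-sample column sums of a CPM-like table.
# Assumes `expr` is a data frame with one numeric column per sample;
# non-numeric columns (e.g. gene identifiers) are ignored.
check_cpm_colsums <- function(expr, target = 1e6, tolerance = 0.2) {
  numeric_cols <- vapply(expr, is.numeric, logical(1))
  sums <- colSums(expr[, numeric_cols, drop = FALSE], na.rm = TRUE)
  data.frame(
    sample = names(sums),
    column_sum = unname(sums),
    # TRUE when the sum lies within `tolerance` (a fraction) of `target`
    near_target = unname(abs(sums - target) / target <= tolerance),
    row.names = NULL
  )
}

# Toy data standing in for a real file: three genes, two samples.
# sampleA looks like CPM values; sampleB clearly does not.
toy <- data.frame(
  gene = c("g1", "g2", "g3"),
  sampleA = c(500000, 300000, 200000),
  sampleB = c(10, 20, 30)
)
result <- check_cpm_colsums(toy)
print(result)

# For a file downloaded from Synapse (the local path is a placeholder):
# expr <- read.delim("MayoRNAseq_RNAseq_TCX_geneCounts_normalized.tsv",
#                    check.names = FALSE)
# check_cpm_colsums(expr)
```

If most samples come out near 1e6, that would be consistent with the TMM-adjusted CPM interpretation; sums that are far off would suggest the file holds something else, such as raw counts.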


Problems with linking data with the method descriptions