Hi,

I've been trying to map the reads in the raw MSBB fastq files to the hg38 genome instead of converting the mapped bam files back to fastq. The problem is that the fastq files are generally 20-40 times smaller than the corresponding bam files (~10 times smaller after decompression), which you can see clearly on the download page (https://www.synapse.org/#!Synapse:syn7416949).

I then looked at the FastQC results for the fastq files and compared the total number of reads reported there with the number reported in the meta file (MSBB_RNAseq_covariates.csv). Here are the distributions of the number of reads reported by FastQC and by the meta file:

FastQC
![Number of reads reported by FastQC](https://i.imgur.com/kZ9hsb2.png)

Meta file
![Number of reads reported in meta file](https://i.imgur.com/6coB651.png)

As you can see, the average number of reads reported by FastQC is less than 1 million, while the average reported in the meta file is around 40 million. Does this mean that the unmapped fastq files are somehow incomplete or damaged? If so, could you please upload complete fastq files as a replacement?

Thanks,
Yikai

Created by Yikai Luo yikai1014
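One way to check whether a fastq file is truncated or damaged, independently of FastQC's "Total Sequences" figure, is to count its records directly: each read is one four-line record, so a damaged file tends to end in a broken record. Below is a minimal Python sketch of that check. The function names are hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
from typing import Iterator, TextIO, Tuple


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_fastq(path: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) records, four lines per read."""
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:  # clean end of file
                return
            seq = handle.readline()
            sep = handle.readline()
            qual = handle.readline()
            # A truncated or corrupted file shows up here as a broken record.
            if not header.startswith("@") or not sep.startswith("+") or not qual:
                raise ValueError(f"malformed FASTQ record in {path}: {header!r}")
            yield (header.rstrip("\r\n"), seq.rstrip("\r\n"),
                   sep.rstrip("\r\n"), qual.rstrip("\r\n"))


def count_reads(path: str) -> int:
    """Count reads in a FASTQ file (one read = one four-line record)."""
    return sum(1 for _ in iter_fastq(path))
```

For example, `count_reads("sample_unmapped.fastq.gz")` (a hypothetical file name) returns the number of complete records. A count that is small but raises no error points to a file that is simply small, not damaged.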
Hi Dr. Peters,

Thanks a lot for the clarification! I will download the bam files remapped to hg38 as a replacement.

Yikai
Hi @yikai1014, the fastq files in syn7416949 contain only the unmapped reads, i.e. the reads not included in the aligned bam files. They were provided so that users can recreate the full fastq. You may want to use the data from the RNAseq reprocessing project (syn9702085) instead. This project is an AMP-AD consortium collaboration in which bam files from 3 large RNAseq studies (including MSBB) were converted back to fastq and aligned to hg38. A description of what was done and of the data files is available at syn17010685.
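Following the answer above, the full read set for a sample is the reads in the aligned bam plus the reads in the unmapped fastq. Here is a minimal Python sketch of recreating the full fastq under that assumption. It assumes the bam has already been converted to fastq (for example with `samtools fastq`), that the data are single-end (or that each mate is processed separately), and that a read name is the first token of its header line. The function and file names are hypothetical. It also checks that no read name occurs in both inputs, since the unmapped file should only hold reads that are absent from the bam.

```python
import gzip
from typing import Iterator, Tuple


def _open(path: str, mode: str):
    """Open plain or gzip-compressed text files with the same call."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def _records(path: str) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as 4-tuples of lines, each ending in a newline."""
    with _open(path, "rt") as handle:
        # zip(*[handle] * 4) pulls four consecutive lines per record; an
        # incomplete trailing record is silently dropped by zip.
        for record in zip(*[handle] * 4):
            if not record[0].startswith("@"):
                raise ValueError(f"unexpected FASTQ header in {path}: {record[0]!r}")
            yield tuple(line if line.endswith("\n") else line + "\n" for line in record)


def read_name(header: str) -> str:
    """Read name = first whitespace-delimited token of the header, minus '@'."""
    return header[1:].split()[0]


def rebuild_fastq(mapped_fastq: str, unmapped_fastq: str, out_path: str) -> int:
    """Concatenate bam-derived reads and unmapped reads into one FASTQ.

    Raises ValueError if a read name appears in both inputs.
    Returns the total number of reads written.
    """
    # The unmapped file is the small one, so its names are held in memory.
    unmapped_names = {read_name(rec[0]) for rec in _records(unmapped_fastq)}
    total = 0
    with _open(out_path, "wt") as out:
        for rec in _records(mapped_fastq):
            if read_name(rec[0]) in unmapped_names:
                raise ValueError(f"read {read_name(rec[0])} is in both inputs")
            out.writelines(rec)
            total += 1
        for rec in _records(unmapped_fastq):
            out.writelines(rec)
            total += 1
    return total
```

For example, `rebuild_fastq("sample.bam.fastq", "sample_unmapped.fastq.gz", "sample_full.fastq.gz")` would return the total read count. That count could then be compared with the read count for the sample in MSBB_RNAseq_covariates.csv, allowing for differences in whether reads or read pairs are counted.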

MSBB unmapped fastq files (syn7416949) have a much lower number of reads than reported in the meta file