Hi, I am trying to process the fastq files from the ROSMAP microglia RNA-seq and single-nucleus RNA-seq data (syn11578941, syn17055069). Have these fastq files already been trimmed, or do I still need to use fastx to trim the poor-quality reads?

Created by Junming Hu hjunming
@hjunming If you access the data through the AMP-AD Knowledge Portal (a site built on top of Synapse), it may be easier to see how the data are connected. [See here](https://adknowledgeportal.synapse.org/#/Explore/Studies?Study=syn3219045). Note the study summary and methods description for all the assays, and the links to the metadata and data files. At the very end are the related studies, including 'snRNAseqPFC_BA10'. That study is the data as published here: https://www.nature.com/articles/s41586-019-1195-2. Regarding which region syn16780177 comes from: under Study Data (for the main ROSMAP study), filter 'data type' by gene expression, then click the link to the scrnaSeq assay files and note the tissue annotation (it is all from the DLPFC). Brain region information is also available in the biospecimen metadata file.
Hi @meagan, thanks for your help! By the way, I noticed that there is another set of ROSMAP single-nucleus RNA-seq files stored at https://www.synapse.org/#!Synapse:syn18485175. May I ask what the difference between them is? Also, which brain region does syn16780177 come from? Many thanks.
Hi @hjunming -- I've updated this so that these files will now have the same display name and download-as name, both in a Cellranger-friendly format.
Hi @hjunming -- Thanks for bringing this up. We'll look into this and get back to you.
Hi @Mette @vilasmenon could you please check the download files?
Hi Mette, thanks for the reply. I saw the table before, but when I download the files using syn17055069, the file names are completely different.

These are the names I get when I download:

- **MFC-B1-2-Cog1-Path0-M_S13_L007_R2_001.fastq.gz**
- MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R2_001.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-M_S6_L007_R1_001.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-M_S7_L007_R2_001.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-F_S1_L006_R1_001.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-M_S5_L007_R1_001.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-M_S6_L007_R2_001.fastq.gz
- SYNAPSE_METADATA_MANIFEST.tsv
- MFC-B1-S1-Cdx1-pAD0-F_S1_L006_R2_001.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-M_S5_L007_R2_001.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-F_S3_L006_R1_001.fastq.gz
- batch2
- MFC-B1-S2-Cdx1-pAD0-M_S4_L007_R1_001.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-F_S2_L006_R1_001.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-F_S3_L006_R2_001.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R1_001.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-F_S2_L006_R2_001.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-M_S7_L007_R1_001.fastq.gz

These are the names shown on the website:

- MFC-B1-S1-Cdx1-pAD0-I1.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-R1.fastq.gz
- MFC-B1-S1-Cdx1-pAD0-R2.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-I1.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-R1.fastq.gz
- MFC-B1-S2-Cdx1-pAD0-R2.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-I1.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-R1.fastq.gz
- MFC-B1-S3-Cdx1-pAD1-R2.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-I1.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-R1.fastq.gz
- MFC-B1-S4-Cdx1-pAD1-R2.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-I1.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-R1.fastq.gz
- MFC-B1-S5-Cdx4-pAD0-R2.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-I1.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-R1.fastq.gz
- MFC-B1-S6-Cdx4-pAD0-R2.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-I1.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-R1.fastq.gz
- MFC-B1-S7-Cdx4-pAD1-R2.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-I1.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-R1.fastq.gz
- MFC-B1-S8-Cdx4-pAD1-R2.fastq.gz

If you look for the first file in my list, you will not find it on the website.
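For anyone comparing the two sets of names, here is a minimal Python sketch that splits a downloaded name into its parts. It assumes the downloaded files follow the usual Illumina/bcl2fastq pattern `<sample>_S<n>_L<lane>_<read>_001.fastq.gz`, which is the layout Cell Ranger generally expects. The function names `parse_fastq_name` and `group_by_sample` are made up for illustration and are not part of Synapse or Cell Ranger.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_S<n>_L<lane>_<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return sample, lane and read type, or None if the name does not match."""
    match = FASTQ_NAME.match(filename)
    if match is None:
        return None
    return {
        "sample": match["sample"],
        "lane": int(match["lane"]),
        "read": match["read"],
    }


def group_by_sample(filenames):
    """Map each sample name to a sorted list of (lane, read) pairs."""
    groups = defaultdict(list)
    for name in filenames:
        info = parse_fastq_name(name)
        if info is not None:  # skip manifests, folders, other naming schemes
            groups[info["sample"]].append((info["lane"], info["read"]))
    return {sample: sorted(pairs) for sample, pairs in groups.items()}


if __name__ == "__main__":
    downloaded = [
        "MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R1_001.fastq.gz",
        "MFC-B1-S3-Cdx1-pAD1-F_S1_L007_R2_001.fastq.gz",
        "SYNAPSE_METADATA_MANIFEST.tsv",
    ]
    for sample, pairs in group_by_sample(downloaded).items():
        print(sample, pairs)
```

Names in the website style (for example `MFC-B1-S1-Cdx1-pAD0-R1.fastq.gz`) do not match this pattern and come back as `None`, which makes it easy to spot which files use which naming scheme.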
See the methods section for the 'gene expression (single nucleus RNAseq)': https://adknowledgeportal.synapse.org/#/Explore/Studies?Study=syn3219045. See note about filenames. If you follow the link to the singlecell RNAseq datafiles you will see that files are annotated with a 'specimenID'. That specimen ID maps to the single cell RNAseq assay metadata and biospecimen metadata file under 'metadata files'
@vilasmenon Hi, I found something odd when trying to download the ROSMAP gene expression (single-nucleus RNA-seq) data (syn17055069). The file names I downloaded are completely different from the ones shown on the website, and some of them do not even match the instruction files. Could you please check whether something is wrong here? Many thanks.
@Mette @vilasmenon Absolutely Yes!
@vilasmenon thank you for your input. I second the encouragement to let us know about interesting findings. @hjunming we are always interested in how the data is being used
@hjunming you're very welcome! Please let us know if you find anything interesting with the single-nuc RNA-seq data with and without trimming the reads.
Great! Thanks!
@hjunming For the single-nucleus RNA-seq data here, a given sample was not run across multiple lanes, so there was no need to combine them after alignment. For the bulk RNA-seq, yes, samples run across different lanes can be combined either before or after alignment. If using RSEM, it is better to combine them before alignment.
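As a rough illustration of combining lanes before alignment, the sketch below appends the per-lane fastq.gz files of one sample into a single file. It relies on the fact that gzip files can be concatenated byte for byte and still decompress as one stream. The function name `concat_lanes` and the file names in the usage note are hypothetical, and this is not necessarily how the data here were processed.

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files, out_path):
    """Append gzip-compressed FASTQ files byte for byte, in the order given.

    For paired-end data, call this once for the R1 files and once for the
    R2 files, using the same lane order so that read pairs stay in sync.
    """
    with open(out_path, "wb") as out:
        for path in lane_files:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(out_path)
```

For example, `concat_lanes(["s1_L006_R1.fastq.gz", "s1_L007_R1.fastq.gz"], "s1_R1.fastq.gz")` would produce one R1 file for a sample run on two lanes.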
@vilasmenon One more question: for samples run across different lanes, did you combine them after alignment?
Thanks a lot, @vilasmenon.
Hi @hjunming - yes, for the bulk RNA-seq microglia data (syn11578941), the fastq files were trimmed and were not processed using cellranger. It's the single-cell/single-nucleus RNA-seq data from the 10x platform that does not use any explicit trimming, apart from what is in the cellranger pipeline.
Hi @vilasmenon, how about the microglia RNA-seq? The methods say you used fastx to trim the data. Have those fastq files already been trimmed?
@hjunming In general, we use the cellranger pipeline (mkfastq and count) to generate and align the fastqs. It does minimal trimming, as far as I'm aware, but we do not do further trimming offline. I would be interested to know if there's a difference in the alignment percentage with an intermediate trimming step before running cellranger count on this data.
Many Thanks!
@vilasmenon - please see the question above regarding the microglia RNAseq.

