Hi, I have noticed that under the same barcode there are some resequenced bam files, e.g., hB_RNA_9212.accepted_hits.sort.coord.bam ([syn5519241](syn5519241)) and hB_RNA_9212_resequenced.accepted_hits.sort.coord.bam ([syn5853553](syn5853553)). I would like to know how the MSBB team deals with these resequenced data in their subsequent analysis. Do they choose the higher-quality one or merge the two files together? Thanks, Wei Hong

Created by Wei Hong weihong1991
John, The batch information is correct. The file naming is unfortunately confusing due to various rounds of transferring/relocation. Minghui
Thank you @Mette for resolving this issue so quickly. I might also suggest you rename the RNAseq_covariates file with a "\_December2018Update" suffix to reflect that it includes this fix. We were confused a bit on our end that this issue, first reported on 12/17/2018, appeared to have been fixed in a file with "\_November2018Update".

I had another question about a similar issue. We have historically paired the ".accepted_hits.sort.coord.bam" file with the ".unmapped.fastq.gz" file based on the initial part of the filename. By paired, I mean treated as if they came from the same sequencing run, when we have done analysis that required realigning from scratch. However, with the inclusion of complete batch information (for all RNAseq runs) in the RNAseq_covariates file, we've used this information to pair the files. I found one exception of sorts.

Based on filename, these two go together:

hB_RNA_10892.accepted_hits.sort.coord.bam
hB_RNA_10892.unmapped.fastq.gz

And these two (maybe):

hB_RNA_10892_K77C014.accepted_hits.sort.coord.bam (just added recently by @karawoo)
hB_RNA_10892_resequenced.unmapped.fastq.gz

However, based on batch, the pairing appears to be different.

Filename|Batch|synapseID
---|---|---
hB_RNA_10892.accepted_hits.sort.coord.bam|E007C014|[syn5850147](syn5850147)
hB_RNA_10892.unmapped.fastq.gz|K77C014|[syn5519316](syn5519316)
hB_RNA_10892_K77C014.accepted_hits.sort.coord.bam|K77C014|[syn17013947](syn17013947)
hB_RNA_10892_resequenced.unmapped.fastq.gz|E007C014|[syn5853375](syn5853375)

Can you confirm the batch information is correct for these two sequencing runs? Might it be wise to rename the files if that is the case? The new filenaming including the batch (everything uploaded recently by @karawoo) is a great improvement in general!
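For anyone wanting to reproduce the batch-based pairing described in this post, here is a minimal sketch. It uses only the four rows quoted in the post. The function name `pair_by_batch`, the regular expression for the sample prefix, and the tuple layout are my own assumptions for illustration; they are not the actual column layout of the RNAseq_covariates file.

```python
import re
from collections import defaultdict

# Rows copied from the post above: (file name, batch, Synapse ID).
ROWS = [
    ("hB_RNA_10892.accepted_hits.sort.coord.bam", "E007C014", "syn5850147"),
    ("hB_RNA_10892.unmapped.fastq.gz", "K77C014", "syn5519316"),
    ("hB_RNA_10892_K77C014.accepted_hits.sort.coord.bam", "K77C014", "syn17013947"),
    ("hB_RNA_10892_resequenced.unmapped.fastq.gz", "E007C014", "syn5853375"),
]

def pair_by_batch(rows):
    """Group files by (sample prefix, batch) instead of by file name alone.

    Hypothetical helper: assumes every file name starts with 'hB_RNA_<digits>'
    and that each run has at most one bam and one fastq file. A second file of
    the same kind in the same run would overwrite the first.
    """
    runs = defaultdict(dict)
    for file_name, batch, synapse_id in rows:
        match = re.match(r"(hB_RNA_\d+)", file_name)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        kind = "bam" if file_name.endswith(".bam") else "fastq"
        runs[(match.group(1), batch)][kind] = (file_name, synapse_id)
    return dict(runs)

if __name__ == "__main__":
    for (sample, batch), files in sorted(pair_by_batch(ROWS).items()):
        print(sample, batch, files.get("bam"), files.get("fastq"))
```

Grouping on the batch column rather than the file name is what makes the "_resequenced" fastq land with the original bam for this sample, which is the mismatch the post asks the contributors to confirm.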
@Mette @minghui.wang Thank you very much!
@weihong1991 - please see a new version of the file in syn6100548. Thanks for pointing this out
Hi Wei, Thanks for the great question. In addition to RIN score and rRNA rate criteria, sequencing depth is another factor that is used to select the best sample from replicates. In this case, the one with more mapped reads will be selected. Regarding the mismatched batch id for hB_RNA_9212, it seems the current metadata file is not the right version. @Mette I think there is something wrong here. I will re-send you the updated version in case you can't find it. Minghui
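As a rough illustration of the selection rule described in this reply, the sketch below keeps replicates that pass the RIN and rRNA rate thresholds quoted elsewhere in this thread (RIN > 4, rRNA rate < 5%) and then picks the one with the most mapped reads. This is not the MSBB team's actual code. The `Replicate` class, its field names, and the numbers in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Replicate:
    file_name: str
    rin: float           # RNA integrity number
    rrna_rate: float     # fraction of rRNA reads, e.g. 0.05 means 5%
    mapped_reads: int    # number of mapped reads (sequencing depth)

def pick_best_replicate(
    replicates: Iterable[Replicate],
    min_rin: float = 4.0,
    max_rrna_rate: float = 0.05,
) -> Optional[Replicate]:
    """Return the QC-passing replicate with the most mapped reads, or None."""
    passing = [
        r for r in replicates
        if r.rin > min_rin and r.rrna_rate < max_rrna_rate
    ]
    if not passing:
        return None
    return max(passing, key=lambda r: r.mapped_reads)

if __name__ == "__main__":
    # Made-up QC values, for illustration only.
    candidates = [
        Replicate("hB_RNA_9212.accepted_hits.sort.coord.bam", 6.1, 0.02, 30_000_000),
        Replicate("hB_RNA_9212_resequenced.accepted_hits.sort.coord.bam", 5.4, 0.03, 45_000_000),
    ]
    best = pick_best_replicate(candidates)
    print(best.file_name if best else "no replicate passed QC")
```

The thresholds are parameters so they can be adjusted to whatever the QC description in [syn3157743](syn3157743) actually specifies.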
I will have to refer your question to the data contributor. @minghui.wang - can you please take a look at the question above? Thanks!
Thanks Mette, I have checked the metadata file, and both samples were marked as "Okay". In addition, both samples have RIN > 4 and rRNA rate < 5%, so they pass the QC thresholds described in [syn3157743](syn3157743). However, there is only one "hB_RNA_9212" in the expression matrix. Besides, some bam files and the corresponding unmapped fq files have different batch IDs. For example, according to [syn6100548](syn6100548), the batch ID of hB_RNA_9212.accepted_hits.sort.coord.bam ([syn5519241](syn5519241)) is E007C014, while that of hB_RNA_9212.unmapped.fq.gz ([syn5519720](syn5519720)) is E2C014. Since the two files came from the same raw sequencing data, I suppose they should have the same batch ID. Thank you very much, Wei Hong
I recommend first taking a look at the RNAseq metadata file. It contains information on which samples were used: syn6100548


Question about resequenced data in MSBB