Hello,

Thank you for releasing the training data! I have a question about the structure of the training dataset. Could you please confirm the following?

1. Are patches in each .tar shard drawn from disjoint sets of slides/cases, or do patches from the same slide appear across multiple shards?
2. If the latter, is a manifest available that maps each patch key (e.g. train_<16hex>) to its source slide or case ID? This would allow running slide-grouped cross-validation on the training data and producing a meaningful internal validation estimate before the submission window opens.
3. If no per-slide mapping is available, do you have a recommended way to construct a slide-respecting train/val split from the released training data?

Thanks!

Best regards, Lingyi

Created by Lingyi Zhao lzhao
Hi @jaydenyou, No problem! Thanks for the update. Best, Lingyi
Hi @lzhao, The update will take some time, and the file should be in the same place as the training dataset. Apologies for the delay. Cheers, Jayden
Hi @jaydenyou, Thanks for your detailed answers! They are very helpful. Will the mapping CSV file be released in the same place where the training dataset is? Best regards, Lingyi
Hi @lzhao, Thanks for the questions. During our patching process, we found that each slide yielded a different number of patches per label, and for some labels no ROI patches were extracted at all. For that reason, we did not prevent patches from the same slide from appearing in different tar shards; we treated each patch as a "case". For question 2, we have uploaded a mapping CSV file that contains the patient (anonymized ID) and slide (anonymized ID) for each patch. It should be released soon. Cheers, Jayden
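
For readers who want the slide-respecting split asked about in question 3, here is a minimal sketch of how the mapping CSV could be used once it is available. The CSV's actual layout has not been published in this thread, so the column names `patch_key`, `patient_id`, and `slide_id`, the sample rows, and the helper names are all assumptions for illustration. The idea is to assign every patch to a fold based on a stable hash of its patient ID, so that no patient (and, assuming each slide belongs to exactly one patient, no slide) is split between training and validation.

```python
import csv
import hashlib
import io
from collections import defaultdict


def assign_fold(group_id: str, n_folds: int = 5) -> int:
    """Deterministically map a group ID (e.g. a patient ID) to a fold index."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_folds


def grouped_folds(rows, group_col="patient_id", key_col="patch_key", n_folds=5):
    """Group patch keys into folds so each group lands in exactly one fold."""
    folds = defaultdict(list)
    for row in rows:
        folds[assign_fold(row[group_col], n_folds)].append(row[key_col])
    return folds


# Hypothetical stand-in for the mapping CSV; the real column names may differ.
sample_csv = """patch_key,patient_id,slide_id
train_0000000000000001,P1,S1
train_0000000000000002,P1,S2
train_0000000000000003,P2,S3
train_0000000000000004,P3,S4
train_0000000000000005,P3,S4
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
folds = grouped_folds(rows, n_folds=3)

for fold_index in sorted(folds):
    print(fold_index, folds[fold_index])
```

Hash-based assignment is reproducible across runs and machines, but it does not balance fold sizes or label distributions. If that matters, a grouped splitter that accounts for group sizes may be a better fit. To group by slide rather than patient, pass `group_col="slide_id"`.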
