I would like to better understand the rules for using Controlled Access data from the AD Knowledge Portal. I am a researcher, have been granted access, and have my Data Use Certificate.
As part of my work, I develop bioinformatics pipelines, and I would like to know whether any data on the portal, once anonymised (for example with different usernames, project IDs, etc.), can be used to put together a public test dataset for such a pipeline.
Let's say I have been analysing ROSMAP data with my pipeline and want to publish a methods paper on this very pipeline. Under what circumstances, if any, can I give my users the option of using this data to test the pipeline?
I understand that controlled access data cannot be shared. Does that still apply when the IDs have been changed?
If sharing is not possible, could you point me to completely open datasets that can be used as test sets?
Thank you.
Created by ELEONORE SCHNEEGANS ems2817
Hm, that's one of the studies that just provides nominated targets for the Agora results explorer, I don't think it has any data. What types of data are you looking for -- bulk RNAseq, whole genome sequencing, etc? I can try to point you to a model study that might fit.
Thank you for your replies!
@abby.vanderlinden, I have been looking at mouse model studies based on your suggestion.
The following study seems appropriate: https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyDetails?Study=syn25454171 . However, I haven't found its data on Synapse, and the Study Data tab appears to be empty. Is that expected?
Eléonore
Hello,
Thank you for including me. The Data Use Certificate (DUC) for the AD Knowledge Portal includes specific statements in the Terms and Conditions prohibiting distribution of controlled access data. This includes: "You agree to keep the Data confidential and not to distribute it **in any form** to any entity or individual other than to collaborators who have signed a Data Use Certificate subject to applicable law." I would interpret "in any form" to include changes to the IDs.
I hope that helps, but let me know if there are follow-up questions.
Thank you,
on behalf of the Synapse Access and Compliance Team
Hi Eléonore,
Thank you for this very thoughtful question. I'm going to tag our governance team in here -- @anthony.pena, do you have any guidance you can provide here for this type of scenario?
I'll also add that while all the human data in the AD Portal is controlled access, the model data is open access and does not have privacy concerns for sharing. You could potentially use data from one of the [mouse model studies](https://adknowledgeportal.synapse.org/Explore/Studies?QueryWrapper0=%7B%22sql%22%3A%22SELECT%20*%20FROM%20syn17083367%20ORDER%20BY%20isFeatured%20DESC%22%2C%22limit%22%3A25%2C%22offset%22%3A0%2C%22selectedFacets%22%3A%5B%7B%22concreteType%22%3A%22org.sagebionetworks.repo.model.table.FacetColumnValuesRequest%22%2C%22columnName%22%3A%22Species%22%2C%22facetValues%22%3A%5B%22Mouse%22%5D%7D%5D%7D) or [commercial iPSC studies](https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage/StudyDetails?Study=syn25914209) for this type of pipeline testing, as long as the original data contributors are properly acknowledged.