Read files directly from Synapse remote storage into local memory

Hello,
I'm curious whether it's possible to read data from the remote storage Synapse uses directly into memory, using something like the io module and file-stream approach in the accepted answer here:
https://stackoverflow.com/questions/44043036/how-to-read-image-file-from-s3-bucket-directly-into-memory
For a specific example, I would point to the Storage Location listed for syn26223298.
Is something like the above even possible through the API? Using the nf-core/fetchngs pipeline (https://nf-co.re/fetchngs/1.11.0), it looks like all raw data fetched for Synapse IDs is downloaded as a starting point. We're interested in reading directly from a bucket to avoid unnecessary transfer and storage of raw files.
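For illustration, this is roughly the pattern I have in mind, following the Stack Overflow answer above. It's only a sketch: the bucket name and key are placeholders, and it assumes the caller already has AWS credentials that can read the bucket.

import io
import boto3

# Placeholder bucket and key for illustration; assumes credentials with
# read access to the bucket are already configured.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="example-bucket", Key="path/to/raw_file.fastq.gz")

# Read the object body into an in-memory buffer instead of writing it to disk.
data = io.BytesIO(obj["Body"].read())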
Thanks,
Chris
Created by Chris Rhodes (crhodes4)

Hello @crhodes4,
This is a good question and something that I've thought about in the past. Ultimately, staging files by copying them from S3 is how we do all of our data processing at Sage, and I'm not sure how feasible it would be to load files directly into memory for large-scale processing. That said, there is a feature called an STS token that can grant you access to S3 objects without copying them through the Synapse client: https://python-docs.synapse.org/reference/client/?h=sts#synapseclient.Synapse.get_sts_storage_token
This might allow you to read the files directly using a combination of boto3 and the io module (a rough sketch is below). The major caveat with this feature is that STS must be enabled on a storage location before files are added to the folder. I suspect that the Synapse project you have linked to is not configured this way, but if you are interested in testing this idea then let me know and I will ask our team to look into the settings.
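If STS were enabled on that storage location, the rough shape would be something like the untested sketch below. The get_sts_storage_token call and its 'boto' output format come from the docs linked above; using syn26223298 as the entity and reading the bucket/key from the entity's _file_handle are assumptions for illustration only.

import io
import boto3
import synapseclient

syn = synapseclient.login()

# Temporary credentials scoped to the entity's STS-enabled storage location.
# output_format="boto" returns a dict of kwargs suitable for boto3.
creds = syn.get_sts_storage_token("syn26223298", permission="read_only",
                                  output_format="boto")
s3 = boto3.client("s3", **creds)

# Fetch the entity's metadata without downloading the file, then take the
# bucket and key from its file handle (field names assumed; adjust as needed).
entity = syn.get("syn26223298", downloadFile=False)
bucket = entity._file_handle["bucketName"]
key = entity._file_handle["key"]

# Stream the object straight into memory rather than onto disk.
obj = s3.get_object(Bucket=bucket, Key=key)
data = io.BytesIO(obj["Body"].read())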
Best,
Will