I have a largish collection of results from an NGS project that are already in an s3 bucket that I have successfully connected to synapse. However, files that existed prior to the integration (or, likely, files that are added to the bucket outside of synapse) are not visible in synapse. Is there a way to register these files into synapse without having to upload them again?
Thanks.
Created by Sean Davis sdavis2
As an update to the above, the Synapse R client does upload to the external S3 bucket:
```
library(synapseClient)
synapseLogin()
# The file is uploaded to the S3 bucket configured as the parent's storage location
f <- synStore(File("testdata.csv", parentId="syn8228256"))
```
In this example, syn8228256 has a separate S3 bucket backend. The `storageLocationId` is **NOT** optional. You can look it up with the following [REST call](http://docs.synapse.org/rest/GET/storageLocation.html) (shown here with the Python client):
```
syn.restGET("/storageLocation")
```
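If several storage locations have been created, the ID for a particular bucket can be picked out of that response. This is a minimal sketch, assuming `syn.restGET("/storageLocation")` returns a JSON list of storage location settings in which external S3 entries carry `concreteType`, `bucket` and `storageLocationId` fields; the helper name `find_storage_location_id` is hypothetical.

```python
def find_storage_location_id(settings, bucket_name):
    """Return the storageLocationId of the first external S3 setting for bucket_name.

    settings: the parsed list returned by syn.restGET("/storageLocation")
    (assumed layout). Returns None if no setting matches.
    """
    for setting in settings:
        # Skip non-S3 settings (e.g. Google Cloud) that may also have a 'bucket' field.
        is_external_s3 = setting.get('concreteType', '').endswith('ExternalS3StorageLocationSetting')
        if is_external_s3 and setting.get('bucket') == bucket_name:
            return setting['storageLocationId']
    return None
```

Usage might look like `find_storage_location_id(syn.restGET("/storageLocation"), "my-ngs-bucket")`, where the bucket name is a placeholder. It returns the first match, so if the same bucket was registered more than once with different base keys, comparing the `baseKey` field as well would be needed.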
An important note is that uploading files using the Synapse clients (Python ~~or R~~) to external S3 buckets has not been implemented yet. So, even if you connect an S3 bucket to a Synapse project in read/write mode and then create a file, it will go to the Synapse-managed storage backend. Hence, the only way to utilize the external bucket is to link existing files to Synapse through external file handles. We realize this is a significant limitation, but it is scheduled to be addressed in the next release of the Python client.
Hi Sean:
You can always add items to Synapse that already exist in an S3 bucket using something called an external [fileHandle](http://docs.synapse.org/rest/org/sagebionetworks/repo/model/file/S3FileHandle.html). See example code below. Also, files added to the bucket outside the Synapse API (for example, with the AWS CLI) have to be added to Synapse manually in the same way.
### Python code example (R code would be very similar):
To add the file to Synapse, you first have to create a file handle wrapper around it. This requires gathering some metadata about the file: its MD5 checksum, size and content type, plus the S3 bucket name, S3 key and storage location ID:
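```
import hashlib
import json
import mimetypes
import os
import synapseclient
from synapseclient import File
syn = synapseclient.login()
# A local copy of the object is used only to compute its md5, size and content type.
path = 'testdata.csv'
md5 = hashlib.md5()
with open(path, 'rb') as fh:
    for chunk in iter(lambda: fh.read(1024 * 1024), b''):  # hash in 1 MiB chunks
        md5.update(chunk)
filehandle = {'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle',
              'contentMd5': md5.hexdigest(),              # md5sum of the file
              'contentSize': os.path.getsize(path),       # size of the file in bytes
              'fileName': os.path.basename(path),         # name of the file
              # Must be an Internet media type: http://en.wikipedia.org/wiki/Internet_media_type
              'contentType': mimetypes.guess_type(path)[0] or 'application/octet-stream',
              'bucketName': 'my-bucket',        # placeholder: the S3 bucket where this file resides
              'key': 'path/to/testdata.csv',    # placeholder: the S3 key for this object
              'storageLocationId': 12345}       # placeholder: the ID you got when setting your project's storage location
filehandle = syn.restPOST('/externalFileHandle/s3', json.dumps(filehandle), syn.fileHandleEndpoint)
```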
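For objects that have no local copy (for example, ones added with the AWS CLI), most of these fields could instead be read from S3 itself with a HEAD request such as boto3's `head_object`. This is a sketch under stated assumptions: boto3 is installed and configured, and the object's ETag equals its MD5, which holds only for single-part uploads not encrypted with SSE-KMS or SSE-C. Multipart uploads (which the AWS CLI uses for large files by default) have an ETag containing a `-`, and the helper rejects those rather than send a wrong checksum. The function name `s3_filehandle_fields` is hypothetical.

```python
import posixpath


def s3_filehandle_fields(bucket, key, head, storage_location_id):
    """Build S3FileHandle fields from an S3 HEAD response (e.g. boto3 head_object).

    Assumes the ETag is the object's MD5; multipart ETags ("<hash>-<parts>")
    are not an MD5 of the data and raise ValueError.
    """
    etag = head['ETag'].strip('"')  # S3 returns the ETag wrapped in double quotes
    if '-' in etag:
        raise ValueError(f"{key}: multipart ETag {etag!r} is not an MD5; compute it from the data")
    return {
        'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle',
        'contentMd5': etag,
        'contentSize': head['ContentLength'],
        'fileName': posixpath.basename(key),
        'contentType': head.get('ContentType', 'application/octet-stream'),
        'bucketName': bucket,
        'key': key,
        'storageLocationId': storage_location_id,
    }

# Example (requires boto3 and AWS credentials; bucket and key are hypothetical):
# import boto3
# head = boto3.client('s3').head_object(Bucket='my-ngs-bucket', Key='runs/sample1.bam')
# fields = s3_filehandle_fields('my-ngs-bucket', 'runs/sample1.bam', head, storage_location_id)
```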
Once the file handle is created, you can add the File entity to Synapse with the regular store operation, passing the file handle ID to the File constructor as `dataFileHandleId`:
```
# dataFileHandleId links the new entity to the existing S3 object, so nothing is uploaded
f = syn.store(File(name='foo', parentId='syn123', dataFileHandleId=filehandle['id']))
```
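To register a whole collection, the two steps can be wrapped in a loop. This is a minimal sketch, assuming each object's key, md5 and size are already known; the record layout and the function name `register_s3_objects` are hypothetical. The entity class is passed in as `file_cls` (normally `synapseclient.File`) so the helper can be tried out with stand-ins.

```python
import json
import posixpath

S3_FILE_HANDLE = 'org.sagebionetworks.repo.model.file.S3FileHandle'


def register_s3_objects(syn, file_cls, parent_id, bucket, storage_location_id, objects):
    """Register existing S3 objects in Synapse without re-uploading them.

    objects: iterable of dicts with 'key', 'md5' and 'size', plus an optional
    'contentType' (hypothetical layout for this sketch).
    file_cls: the entity class to construct, e.g. synapseclient.File.
    Returns the stored entities in input order.
    """
    stored = []
    for obj in objects:
        name = posixpath.basename(obj['key'])
        handle = {
            'concreteType': S3_FILE_HANDLE,
            'contentMd5': obj['md5'],
            'contentSize': obj['size'],
            'fileName': name,
            'contentType': obj.get('contentType', 'application/octet-stream'),
            'bucketName': bucket,
            'key': obj['key'],
            'storageLocationId': storage_location_id,
        }
        # Step 1: create the external S3 file handle.
        created = syn.restPOST('/externalFileHandle/s3', json.dumps(handle),
                               syn.fileHandleEndpoint)
        # Step 2: create a File entity that points at the new handle.
        entity = file_cls(name=name, parentId=parent_id, dataFileHandleId=created['id'])
        stored.append(syn.store(entity))
    return stored
```

Usage might look like `register_s3_objects(syn, synapseclient.File, 'syn123', 'my-bucket', storage_location_id, records)`, with placeholder IDs. Entity names must be unique within a parent container, so keys that share a basename under different prefixes would collide if registered into the same parent; mirroring the S3 prefixes as Synapse folders is one way around that.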
"Registering" objects in s3 bucket after creation outside of Synapse.