I'm trying to do my first-ever download from Synapse. I followed the 'Programmatic Instructions' for downloading as best I could, but am not having success, either with the `synapse` command line or via the sample Python code.
I created an access token for authentication and set SYNAPSE_AUTH_TOKEN. That part seems to be fine, as it greets me by name when I run the Python code. But with either method, after a short time (15-30 seconds), 'Requests are too frequent for API call' errors appear. Below are the sample Python code and a run:
**The code:**
```
#!/usr/bin/env python3
import synapseclient
import synapseutils

# login() picks up the SYNAPSE_AUTH_TOKEN environment variable
syn = synapseclient.Synapse()
syn.login()
files = synapseutils.syncFromSynapse(syn, 'syn51365303')
```
**And the run:**
```
(synapseclient) -bash:uger-c012:~/MattM/Synapse 1032 $ ./download.py
Welcome, mmaher!
[WARNING] Requests are too frequent for API call: /filehandle/batch. Allowed 240 requests every 60 seconds....
[WARNING] Retrying in 16 seconds
[WARNING] Requests are too frequent for API call: /filehandle/batch. Allowed 240 requests every 60 seconds....
[WARNING] Retrying in 16 seconds
[WARNING] Requests are too frequent for API call: /filehandle/batch. Allowed 240 requests every 60 seconds....
and so on....
```
Can someone shed light on how to resolve this?
Thank You
Created by Matthew Maher (mmaher)

@YoungJ
I am sorry - I do not have any further suggestions around this at the moment. Overall stability of the Synapse Python/command-line client is one of my highest priorities. The best I can say for now is to continue the process as you have been.
Please keep an eye out for further releases where I will be addressing these larger issues. I will be updating https://python-docs.synapse.org/news/ with information for each release as the team makes improvements.

Thanks for your reply. @bfauble
What I mean by that is that the download speed is fast in the early stages, whereas the download speed for these specific files can be very, very slow.
For example,
```
Downloading [####################]100.00% 522.3MB/522.3MB (150.1kB/s) RNASE3_P12724_OID20203_v1_Cardiometabolic.tar.synapse_download_130970926 Done...
Downloading [####################]100.00% 520.1MB/520.1MB (109.0kB/s) RNF31_Q96EP0_OID30473_v1_Inflammation_II.tar.synapse_download_131009916 Done...
Downloading [###################-]96.92% 504.0MB/520.0MB (117.6kB/s) RNF41_Q9H4P4_OID21179_v1_Oncology.tar.synapse_download_130970811
```
The third file (RNF41_Q9H4P4_OID21179_v1_Oncology.tar) has been stuck like this for ten hours.

@YoungJ I'm not fully understanding what your problem is in this case.
You show a console log of it `Downloading` - does that not progress at all?
If you have partial files in your download directory, try removing
`if not os.path.exists(local_file_path):`
from your script, and add `ifcollision="keep.local"`.
ie:
```
def download_file(entity: dict, download_path: str) -> None:
    """
    Download the file to the specified path.

    Args:
        entity: The file entity to download
        download_path: The path to save the downloaded file
    """
    syn.get(
        entity=entity["id"],
        downloadFile=True,
        downloadLocation=download_path,
        ifcollision="keep.local",
    )
```
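For reference, `ifcollision` controls what happens when a file already exists at the download location: besides `keep.local` it also accepts `overwrite.local` and `keep.both`.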
See info on this function here: https://python-docs.synapse.org/reference/client/#synapseclient.Synapse.get

@thomas.yu @bfauble Sorry to bother you. Can you help me?
I wonder if it's a network problem or a file storage problem. There ends up being a file in my download directory that has started downloading but never completes, which makes it impossible to carry out the next download task.
Like this:
```
Downloading [#########-----------]46.45% 242.8MB/522.8MB (97.1kB/s) RFC4_P35249_OID31376_v1_Oncology_II.tar.synapse_download_131008851
```
I changed the above code because of a problem I encountered: a file left in the partial `.synapse_download` form interrupts the download process. When I download that file manually, the code continues to run until it hits the next file that breaks it. I don't know if it's a file storage problem or a network problem.
```
import os

import synapseclient

PARENT_CONTAINER_ID = "syn51365303"
PATH_TO_DOWNLOAD_CONTENT_TO = os.path.expanduser("/data_alluser/wx/Protein/test")

if not os.path.exists(PATH_TO_DOWNLOAD_CONTENT_TO):
    os.mkdir(PATH_TO_DOWNLOAD_CONTENT_TO)

syn = synapseclient.Synapse()
syn.login()


def download_file(entity: dict, download_path: str) -> None:
    """
    Download the file to the specified path.

    Args:
        entity: The file entity to download
        download_path: The path to save the downloaded file
    """
    local_file_path = os.path.join(download_path, entity["name"])
    if not os.path.exists(local_file_path):
        syn.get(
            entity=entity["id"],
            downloadFile=True,
            downloadLocation=download_path,
        )


# Loop over all files under the parent
children_under_parent_container = syn.getChildren(
    parent=PARENT_CONTAINER_ID, includeTypes=["file"]
)
for child_under_parent_container in children_under_parent_container:
    download_file(
        entity=child_under_parent_container,
        download_path=PATH_TO_DOWNLOAD_CONTENT_TO,
    )
```
Thanks for your reply. @thomas.yu

Hi @CallMeYoungJ,
There is currently no command-line solution that can avoid the rate limits, so you would have to take the Python code chunk that Bryan provided above (https://www.synapse.org/#!SynapseForum:threadId=10730&replyId=31279) and execute it. I see that it's the same project, so it should work directly.
We are looking to improve the Python client to reduce the rate-limit warnings, but the rate limits themselves are set by the platform.

Hi! I have also encountered the same problem. Is there any solution for the following command:
```
synapse get -r syn51365303
```
@bfauble Can you help?
Or is it that I'm being restricted due to frequent downloads?

@mmaher
> for your #2, this seems to have worked perfectly! It went off and downloaded file after file, even giving nice progress bars for each. My process got killed (unrelated to your script) about 2/3rds of the way through the file list, so I'm now restarting it, hoping it will incrementally finish the job. Would I be correct in thinking that it does some sort of file-size or hash-value check to decide that files already downloaded do not need to be downloaded again?
Correct. The Python client stores a `.cacheMap` file in `~/.synapseCache` which contains the file MD5 and modified timestamp used for a cache hit/miss. In your case it should find the already-downloaded files in the cache and not download them again.
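If you want to see what the cache knows about, here is a minimal sketch that walks `~/.synapseCache` and prints each `.cacheMap` entry. It assumes the cache map is a plain JSON file mapping local paths to modified timestamps - an implementation detail that may change between client versions:
```
import json
import os

# Assumption: each cached file-handle directory under ~/.synapseCache
# holds a ".cacheMap" JSON file mapping local paths -> modified timestamps.
cache_root = os.path.expanduser("~/.synapseCache")
for dirpath, _dirnames, filenames in os.walk(cache_root):
    if ".cacheMap" in filenames:
        with open(os.path.join(dirpath, ".cacheMap")) as f:
            entries = json.load(f)
        for local_path, modified in entries.items():
            status = "present" if os.path.exists(local_path) else "missing"
            print(f"{status}: {local_path} (modified {modified})")
```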
> So I presume some prior interrupted process was jamming things up? Is there some way to force the lock breaking when invoking a new command? Is there some way to know whether a lock is present and blocking the current process?
I am uncertain on this piece. I took a peek at the code and did not see much on the reporting front that can give you a "peek behind the curtain".
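One rough way to check by hand, assuming the client marks in-progress work with `*.lock` directories under the cache (again an implementation detail, so treat this strictly as a sketch):
```
import os
import time

# Assumption: synapseclient creates lock directories whose names end in
# ".lock" under ~/.synapseCache; their mtime gives the lock's age in
# seconds, matching the "Breaking lock whose age is: ..." message.
cache_root = os.path.expanduser("~/.synapseCache")
now = time.time()
for dirpath, dirnames, _filenames in os.walk(cache_root):
    for name in dirnames:
        if name.endswith(".lock"):
            lock_path = os.path.join(dirpath, name)
            print(f"{lock_path}: {now - os.path.getmtime(lock_path):.0f}s old")
```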
Hi! First, thanks for the rapid response the other day. My results with those were:
For your #1, that particular error message I had seen previously went away, but most of the time the process just hung for a while and then exited as if complete - but without downloading anything. One time it did download some files, but when I killed and restarted the process, it went back to hanging.
**For your #2, this seems to have worked perfectly!** It went off and downloaded file after file, even giving nice progress bars for each. My process got killed (unrelated to your script) about 2/3rds of the way through the file list, so I'm now restarting it, hoping it will incrementally finish the job. Would I be correct in thinking that it does some sort of file-size or hash-value check to decide that files already downloaded do not need to be downloaded again?
I also, after quite a while, got a message that said: "Breaking lock whose age is: 7247.881851434708". So I presume some prior interrupted process was jamming things up? Is there some way to force the lock breaking when invoking a new command?
Is there some way to know whether a lock is present and blocking the current process?
Thanks again for the assistance.
Hi @mmaher
I wanted to check in and make sure that you were unstuck, and whether either of the suggested solutions worked for your use case.

Hi @mmaher,
This `synapseutils.syncFromSynapse` is an area of focus I am planning to make changes to in order to resolve issues like this. Essentially, the crux of the problem is that it hits the rate limits very often due to the way multi-threading was set up between these utility functions and the core Synapse client.
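One possible stopgap rather than a fix, if you want to keep using `syncFromSynapse`: the client exposes a `max_threads` setting, and lowering it reduces client-side concurrency, which may slow how quickly you hit the `/filehandle/batch` limit (no guarantee it avoids it entirely). A minimal sketch:
```
import synapseclient
import synapseutils

syn = synapseclient.Synapse()
syn.login()
# max_threads is a documented client setting; reducing it lowers
# concurrent transfers, which *may* reduce how quickly the rate
# limit is hit. Not guaranteed to avoid the limit entirely.
syn.max_threads = 1
files = synapseutils.syncFromSynapse(syn, "syn51365303")
```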
Take a look at the code below - it will download all files one by one. Each individual file download is still multi-threaded, instead of the utils function's approach of downloading many files in parallel.
I suspect that, given the size of the files in the project you are trying to download, this will probably be faster.
```
import os

import synapseclient

PARENT_CONTAINER_ID = "syn51365303"
PATH_TO_DOWNLOAD_CONTENT_TO = os.path.expanduser("~/my_synapse_downloads")

if not os.path.exists(PATH_TO_DOWNLOAD_CONTENT_TO):
    os.mkdir(PATH_TO_DOWNLOAD_CONTENT_TO)

syn = synapseclient.Synapse()
syn.login()


def download_or_create_folder(
    entity: synapseclient.Entity, current_resolved_path: str
) -> None:
    """
    If entity is a folder, create it on disk and then recursively call this function to
    download all files and folders under this folder.

    If entity is a file, download it to disk.

    Args:
        entity: The entity to execute this function for (either a file or folder)
        current_resolved_path: The current path in the directory structure to download the content to
    """
    is_folder = (
        "type" in entity and entity["type"] == "org.sagebionetworks.repo.model.Folder"
    )
    is_file = (
        "type" in entity
        and entity["type"] == "org.sagebionetworks.repo.model.FileEntity"
    )

    if is_folder:
        new_resolved_path = os.path.join(current_resolved_path, entity["name"])
        # Create folder on disk
        if not os.path.exists(new_resolved_path):
            os.mkdir(new_resolved_path)

        # Recursively call this function to download all files and folders under this folder
        children_for_folder = syn.getChildren(
            parent=entity["id"], includeTypes=["file", "folder"]
        )
        for child_for_folder in children_for_folder:
            download_or_create_folder(
                entity=child_for_folder,
                current_resolved_path=new_resolved_path,
            )
    elif is_file:
        syn.get(
            entity=entity["id"],
            downloadFile=True,
            downloadLocation=current_resolved_path,
            ifcollision="keep.local",
        )


# Loop over all files and folders under the parent
children_under_parent_container = syn.getChildren(
    parent=PARENT_CONTAINER_ID, includeTypes=["file", "folder"]
)
for child_under_parent_container in children_under_parent_container:
    download_or_create_folder(
        entity=child_under_parent_container,
        current_resolved_path=PATH_TO_DOWNLOAD_CONTENT_TO,
    )
```
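One side effect worth noting: with `ifcollision="keep.local"`, re-running this script after an interruption should leave already-downloaded files in place rather than overwriting them, so it can incrementally finish a partially completed job.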