Hello everyone! I have noticed a few issues while uploading bam, bed and bigwig files to Synapse:
1. I cannot open more than one connection at a time. I understand why this happens, but it would be wonderful if I could parallelize my uploads, as this could cut hours off my data-syncing procedure.
2. While uploading a file, Synapse will suddenly, and seemingly at random, throw an error. The upload then restarts, gets further, then throws another error. After N attempts, the file usually uploads successfully.
It's worth noting that subsequent attempts always go much faster than their predecessors.
3. My script will successfully upload bam, bed and bigwig files, then suddenly stop completely when it hits some yet-to-be-determined roadblock. One example of such a roadblock is the following set of errors. Here, the file happened to eventually upload successfully, but that is not always the case. These errors occur during the synapse.store (or syn.store, as it appears in my code) call:
File "RELEASE_DATA_TO_STAGING_082216.py", line 108, in
bed = syn.store(bed, activity=bed_activity, forceVersion=False)
File "/hpc/packages/minerva-common/py_packages/2.7/lib/python2.7/site-packages/synapseclient/client.py", line 996, in store
fileSize=local_state.get('fileSize', None))
File "/hpc/packages/minerva-common/py_packages/2.7/lib/python2.7/site-packages/synapseclient/client.py", line 1845, in _uploadToFileHandleService
file_handle_id = multipart_upload(self, filename, contentType=mimetype)
File "/hpc/packages/minerva-common/py_packages/2.7/lib/python2.7/site-packages/synapseclient/multipart_upload.py", line 212, in multipart_upload
**kwargs)
File "/hpc/packages/minerva-common/py_packages/2.7/lib/python2.7/site-packages/synapseclient/multipart_upload.py", line 312, in _multipart_upload
mp = Pool(8)
File "/hpc/packages/minerva-common/python/2.7.6/lib/python2.7/multiprocessing/dummy/__init__.py", line 151, in Pool
return ThreadPool(processes, initializer, initargs)
File "/hpc/packages/minerva-common/python/2.7.6/lib/python2.7/multiprocessing/pool.py", line 714, in __init__
Pool.__init__(self, processes, initializer, initargs)
File "/hpc/packages/minerva-common/python/2.7.6/lib/python2.7/multiprocessing/pool.py", line 176, in __init__
self._task_handler.start()
File "/hpc/packages/minerva-common/python/2.7.6/lib/python2.7/threading.py", line 745, in start
_start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
[sloofl01@minerva4 psychencode_private_scripts]$
(32, 'EPIPE')
Encountered an exception: . Retrying...
403 Client Error: Forbidden
AccessDenied Request has expired 2016-08-23T01:32:01Z 2016-08-23T01:32:14Z F529AE70734D613F MzBeuMiMaaUyJmHyCtCILIuQd+93g2jpTZxGz1CpTlf1YunWSLfcAcYhQyuT5GbcXJ607i4GTUc=
Encountered an exception: . Retrying...
Uploading [####################]100.00% 3.6GB/3.6GB PEC_EpiMap_MSSM_ACC_Epigenomics_H3K27ac_NeuN+_HiSeq2500_CMC_HBCC_361.bam
Thank you for whatever help you can provide.
Laura
Created by Laura Sloofman lauragails
Hi @shwetha, I'm not sure what you're referring to. Could you give me some more information? Thanks!
What is the lockdown command?
Thanks, Kenneth, for posting the instructions. I don't have the mkvirtualenv command. I completed the first part. I will look into this tomorrow.
To get it set up using virtualenv, you should just be able to get the version of Python you want (possibly using the `module` method if your system has this). Assuming `virtualenv` is installed, then:
```
# standard location for virtual environments is ~/.virtualenvs
mkdir -p ~/.virtualenvs
# create a Python virtual environment called 'myenviron' - can change this
virtualenv ~/.virtualenvs/myenviron
# activate the virtual environment
source ~/.virtualenvs/myenviron/bin/activate
# install Synapse develop client
git clone https://github.com/Sage-Bionetworks/synapsePythonClient.git
cd synapsePythonClient
# make sure the develop branch is checked out
git checkout develop
python setup.py install
# do stuff.....
# when done, exit the virtual environment
deactivate
```
See if you have the `mkvirtualenv` command installed. It is part of [virtualenvwrapper](http://virtualenvwrapper.readthedocs.io/), a suite of helper functions for managing virtual envs. If you do, you can do:
```
# instead of calling virtualenv directly
mkvirtualenv myenviron
# instead of sourcing the 'activate' script
workon myenviron
```
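Once the install finishes, it can help to confirm which client version the environment actually picks up. The following is only a minimal sketch: `parse_version` and `has_threading_fix` are hypothetical helper names, the `1.6` threshold is the release this thread says will carry the threading fix, and a build installed from the develop branch may report a different version string than a tagged release.

```python
import re


def parse_version(text):
    """Turn a version string such as '1.5.1' into a tuple of ints, e.g. (1, 5, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component (e.g. a 'dev' suffix).
            break
        parts.append(int(match.group()))
    return tuple(parts)


def has_threading_fix(installed, minimum="1.6"):
    """Return True if `installed` is at least `minimum` (assumed here to be 1.6)."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import synapseclient
    except ImportError:
        print("synapseclient is not installed in this environment")
    else:
        version = synapseclient.__version__
        status = "1.6 or later" if has_threading_fix(version) else "older than 1.6"
        print("synapseclient %s (%s)" % (version, status))
```

Run it with the virtual environment activated so that the `import` resolves to the freshly installed client rather than the system-wide one.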
Let me know if this is successful!
https://www.synapse.org/#!Synapse:syn5637528/discussion/threadId=686&replyId=7239
I am using version 1.5.1 and Python 2.7.
Thanks for the suggestions. I will see if I can get the new client on our cluster so I can try it out.
We are aware of large file upload issues, and there are a number of ongoing improvements. What version of the Python client are you using? Recent versions of the client (current is 1.5.1) parallelize individual uploads, so this should make more efficient use of your bandwidth than uploading multiple files at the same time. The threading error is known and has been fixed in the develop branch (https://sagebionetworks.jira.com/browse/SYNPY-358); the fix will be available in version 1.6 of the Python client.
If you are interested in getting these improvements faster, you can install the client directly from GitHub and get the development branch (highly recommended to use a `virtualenv` virtual environment for this):
https://github.com/Sage-Bionetworks/synapsePythonClient#install-from-source
If you try this and it fixes your issues, please let us know here!
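Until the updated client is available on your cluster, one possible stopgap is to wrap each `syn.store` call so that a failed upload is retried instead of stopping the whole script. This is only a sketch under assumptions: `store_with_retries` is a hypothetical helper (not part of the Synapse client), the attempt count and delays are arbitrary, and retrying does not address the underlying threading error.

```python
import time


def store_with_retries(store, entity, attempts=5, base_delay=1.0,
                       sleep=time.sleep, **kwargs):
    """Call store(entity, **kwargs), retrying if it raises an Exception.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    tries and re-raises the last error if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return store(entity, **kwargs)
        except Exception as error:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print("Attempt %d failed (%s); retrying in %.0f s"
                  % (attempt, error, delay))
            sleep(delay)


# Usage, mirroring the call from the traceback in the original post:
# bed = store_with_retries(syn.store, bed, activity=bed_activity,
#                          forceVersion=False)
```

The `sleep` parameter exists only so the waiting can be swapped out when trying the helper without real delays.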
synapse.store is not working properly