Hello Synapse members,
I'm trying to upload a number of large (~GB) data files, but some of the MD5 checksums do not match. I would like to retry with one of the mismatching files, but when I delete and then reupload it, the system doesn't seem to actually take the file: there is only a brief pause for initialization, and then the system reports that the file has been uploaded (with the same mismatched MD5), skipping the minutes-long wait of the initial upload. This makes me think that the file has simply been undeleted rather than reuploaded. However, the same thing happens when I try "upload new version", and also when I first empty my trash can using the alpha feature. How can I force the reupload of large files?
Thanks and best regards.
Created by Yan Li yan_li2
I've received permission to send the original file, but how should I do so? It is a ~994 Mb *.gz file, of GWAS data I believe.
I have not personally seen any error messages while uploading. However, when my coworker was uploading from the same dataset before bringing me into it, he observed (and I witnessed over his shoulder) a few different oddities with the web upload. These include JavaScript error alerts (the file upload simply failed entirely), the progress bar fluctuating above and below 100% while uploading, and at least one instance where an uploaded file's displayed MD5 was that of a different file in the dataset. Unfortunately, we were not documenting these errors, so I can't describe them more precisely. They simply struck us as quite odd.
Hello @yan_li2 -- following up on what @brucehoff said earlier: Is it possible to get a copy of this file to examine, i.e. the file as it was _before_ being uploaded to Synapse? We would like to see the file size, MD5 and perhaps other metadata on the pre-uploaded version. Also, when the upload occurred, do you know if there was any sort of error message?
I am fairly sure that the files were not being changed while my coworker and I were uploading them. As far as I know, they were given to us as is, specifically to be uploaded to Synapse.
@yan_li2, we agree that the problem is transient: we cannot reproduce the error, even using the same file. We have an idea of how the problem might happen. If the file was modified while it was being uploaded (say, additional content was added to a large archive while the archive was in the process of uploading), then the wrong MD5 might be computed on your computer and sent to Synapse. Do you think that might have occurred, i.e. that your coworker uploaded files while they were in the midst of changing?
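One way to check the "file changed while it was being read" hypothesis locally is to hash the file while recording its size and modification time before and after. The following is only a minimal Python sketch: `md5_if_stable` is a hypothetical helper name, it inspects the local file only, and it makes no claim about what the Synapse web client does internally.

```python
import hashlib
import os


def md5_if_stable(path, chunk_size=1024 * 1024):
    """Hash `path` in chunks and report whether it changed meanwhile.

    Returns (hex_md5, stable). `stable` is False if the file's size or
    modification time differs between the start and end of hashing.
    """
    before = os.stat(path)
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    after = os.stat(path)
    stable = (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
    return digest.hexdigest(), stable
```

If `stable` comes back False, the digest should not be trusted, since the bytes that were hashed may not correspond to any single version of the file.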
@jay
@john-hill
I used the web browser's **Tools** -> **Upload or Link to a File** modal UI from the project folder. My browser is Chrome (Version 66.0.3359.181) and my computer is Windows 7.
This seems to be a transient thing, since I uploaded other files to that folder yesterday and they all displayed the expected checksums. But previously my coworker uploaded the same files (which I then deleted to retry) and got some wrong checksums. He also used the web UI.
Hi @yan_li2: Thank you for reporting this issue. I am looking into it and have a question: Can you describe how you uploaded the file(s)? For example, did you do so via the web browser or from a command-line session (e.g., using Python or R)? If the latter, can you say what version of the client you are using? Any information you can provide is helpful.
The checksum from the downloaded file matches my local checksum, but not what Synapse shows as MD5. I've sent a screenshot of the discrepancy to you. Perhaps Synapse is just displaying a different MD5?
Hi Yan:
This should be impossible, so I am glad you are pointing it out. What is the md5 if you redownload the file? I suggest downloading it through the web client to make sure that a fresh version is downloaded. That is, does the md5 of the download match what Synapse is reporting or what you were expecting? Is there any way you could share the ID of the file you are having issues with and the md5 you are expecting?
The MD5 values reported for my file on Linux using **md5sum** and on Windows PowerShell using **certutil -hashfile MD5** are consistent, which makes me think that my local checksum is correct. Also, my coworker, who was uploading different files from the same dataset, had similar issues with the Synapse checksum being different from the (Linux) checksum for some of his files, but he was able to delete and reupload his files to get a matching Synapse checksum, again suggesting that our local checksums are correct. He says he simply deleted and uploaded, so I was wondering why Synapse didn't seem to work the same way for me.
Hi Yan:
I worry that the md5 of the file you are uploading is not what you think it is. Synapse computes the md5 before it starts uploading and will reject the file after the upload completes if the md5 of the uploaded file doesn't match the md5 computed at the start. This is also the reason that you are unable to upload the file a second time: the long delay is the time spent computing the md5 of the local file, which is passed into the upload. Synapse then responds that this exact same file was already uploaded by you and indicates that the upload is complete.
What makes you think that the md5 doesn't match? What happens if you try the following?
```
md5 local/path/file #or md5sum local/path/file
```
Does the output match what is reported in Synapse?
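For anyone who wants a check that behaves the same on Linux, macOS and Windows, here is a minimal Python sketch that computes the MD5 of a large file in chunks, so the file never has to fit in memory. The function name `file_md5` and the 1 MiB chunk size are illustrative choices and not part of any Synapse client; the digest it prints should agree with what `md5sum` reports for the same file.

```python
import hashlib
import sys


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print "digest  path" for every file given on the command line.
    for file_path in sys.argv[1:]:
        print(file_md5(file_path), file_path)
```

Saved under a name of your choosing, it can be run with one or more file paths as arguments, and the printed value compared with the MD5 shown on the file's Synapse page.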