Hi, I am running some scripts with the Synapse client and synapseutils in Python. I uploaded a dataset using the client and, after some trial and error, everything was properly updated. Today we found an error in the dataset processing, so I went back, fixed it, regenerated the files under a new folder, and have been trying to use the same scripts to upload the files to Synapse. Some files didn't change, but most either did or at the very least have a new MD5 hash (part of the fix was that the same data was getting saved with a different hash each time; now, if data isn't changed between versions, we guarantee the same hash. I don't think this is relevant to what's happening, but I mention it in case I am wrong).

The issue is that some subset of files is not updating on the website, so the old versions remain there despite the local files having different MD5 hashes. Interestingly, I find that both the old and the new file are in the Synapse cache, which I thought would mean the new version got uploaded. Below is the general workflow of the code. Any thoughts or help would be greatly appreciated.

Best,
Jordan

Code:

Step 1: Run code that generates a manifest file based on the local files/folders that I want to upload to a project. Uses `synapseutils.generate_sync_manifest` and saves the manifest locally.

Step 2: Run code that takes the manifest and uploads the data using `synapseutils.syncToSynapse`. Perhaps where I might have created issues for myself: it is a fair amount of data and the uploads are not that fast, so I run parallel jobs. Each job takes the manifest file, chunks it into a specific section, saves a temporary file with that section, and then runs `syncToSynapse` on that temporary manifest file.
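For reference, a minimal sketch of Steps 1 and 2 as I understand them. The folder path, project ID, and chunk count are hypothetical placeholders, and the chunks are synced sequentially here rather than in parallel jobs:

```python
import csv
import math

def chunk_manifest(manifest_path, n_chunks):
    """Split a sync-manifest TSV into up to n_chunks temporary manifests,
    repeating the header row in each, and return the new file paths."""
    with open(manifest_path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    header, body = rows[0], rows[1:]
    chunk_size = math.ceil(len(body) / n_chunks)
    paths = []
    for i in range(n_chunks):
        part = body[i * chunk_size:(i + 1) * chunk_size]
        if not part:
            continue
        part_path = f"{manifest_path}.part{i}.tsv"
        with open(part_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t")
            writer.writerow(header)
            writer.writerows(part)
        paths.append(part_path)
    return paths

def run_chunked_upload(manifest_path, n_jobs=4):
    # Deferred imports so the chunking helper above runs without the
    # client installed; the ID and path below are hypothetical.
    import synapseclient
    import synapseutils

    syn = synapseclient.login()
    # Step 1: generate the manifest from the local folder tree.
    synapseutils.generate_sync_manifest(
        syn,
        directory_path="/abs/path/to/regenerated/data",  # hypothetical
        parent_id="syn00000000",                         # hypothetical
        manifest_path=manifest_path,
    )
    # Step 2: in the real setup, each chunk is handled by its own
    # parallel job; here they run one after another for simplicity.
    for part in chunk_manifest(manifest_path, n_jobs):
        synapseutils.syncToSynapse(syn, part, sendMessages=False)
```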
Running these parallel jobs does occasionally give me a "SynapseHTTPError: 409 Client Error: An entity with the name", but I am able to just rerun the jobs that failed to get the data onto Synapse.

Step 3: Run verification code that does something pretty similar to `synapseutils.walk`: I get all of the folders and files, check the folder structure, fetch each file with download set to False, and compare the MD5 locally vs. on Synapse. This code is how I found that the uploads were not always replacing files (perhaps 10% of files are still the old versions).

Note: I was previously running this with relative file paths, which it seems the code at some point replaces with absolute paths (the cache uses absolute paths), but I am now retrying with absolute paths at every stage.
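The Step 3 verification pass could be sketched roughly as below. The `syn_root_id` and `local_root` names are hypothetical, and the `contentMd5` field access reflects my reading of the client's file-handle metadata, so it may need adjusting:

```python
import hashlib
import os

def local_md5(path, block_size=1 << 20):
    """MD5 of a local file, streamed so memory stays bounded."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()

def find_stale_files(syn, syn_root_id, local_root):
    # Deferred import so local_md5 above works without the client installed.
    import synapseutils

    stale = []
    for dirpath, _dirs, files in synapseutils.walk(syn, syn_root_id):
        folder_name, _folder_id = dirpath
        for file_name, file_id in files:
            # Fetch metadata only; no download.
            ent = syn.get(file_id, downloadFile=False)
            remote_md5 = ent._file_handle.get("contentMd5")  # assumed field
            local_path = os.path.join(local_root, folder_name, file_name)
            if os.path.exists(local_path) and local_md5(local_path) != remote_md5:
                stale.append((file_id, local_path))
    return stale
```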

Created by Jordan Wilke wilke18

Synapse Utils syncToSynapse not updating files