actually I have tried the download several times now. Each time it appears to miss about 10% of the records without reporting any error, and it is extremely slow because I have to iterate over the table. Would it be possible to just zip everything into one file and upload it?
actually because of this, all your S3 servers are significantly slowed down today, and no Docker images can be pushed; pushes fail with "received unexpected HTTP status: 503 Slow Down".
Created by Yuanfang Guan (yuanfang.guan)

Some will be missing rest. Many will be missing return, because later versions of the app collect only outbound walking.

I don't know whether I have downloaded it correctly or whether it was an uploading problem on the organizers' side. Most records I have here in testing are missing rest and return. Is that what was expected?

The names of the files are completely arbitrary, i.e. some might have the extension ".tmp", others ".json", or even just ".". Also, the filenames in the table don't have to be unique. That is, if you flatten the hierarchy you could overwrite files from different rows - this is part of the reason why they get deposited into their own folders in the cache. As for the .cachemap files, these are used to track downloaded files, and if you have one the file is complete (not the other way around).
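In other words, the extension should not matter when reading the data; a cached ".tmp" file can be parsed just like a ".json" one. A minimal sketch, assuming the jsonlite package and using a shortened, hypothetical cache path:

```r
library(jsonlite)

# Hypothetical (shortened) path to a cached file; the ".tmp" extension is
# arbitrary, the content itself is plain JSON.
f <- "synapseCache/722/2323722/accel_walking_outbound.json.items-05d82d8e.tmp"

accel <- fromJSON(f)   # parses exactly as a ".json" file would
str(accel)
```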
P.S. You are right about the LIMIT clause not affecting anything - I just made the suggestion in general.

I found that for all the *.tmp files there is always a .cachemap file:
{"/home/gyuanfan/2017/PDDB/downloaded_rawdata_supplement/supplement_data_accel_walking_outbound/722/2323722/accel_w
alking_outbound.json.items-05d82d8e-ec55-47d2-b13b-749a341f555d2684344326748672112.tmp":"2017-09-13T03:46:02.000Z"}
which means the download is indeed incomplete - so how can I make it complete?
For the *.json files, there is no such .cachemap file. I don't think the 99999999 really affects anything, as the table has far fewer rows than that.
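As a sanity check on how much has actually been downloaded, one could count the entries recorded across the .cachemap files, since (per the note above) a .cachemap entry marks a completed download. A rough sketch, assuming each .cachemap is a flat JSON object mapping file path to download timestamp, as in the snippet above:

```r
library(synapseClient)
library(jsonlite)

# Scan the current cache location (as set via synapseCacheDir()) for cache maps.
# all.files = TRUE is needed because the cache-map filenames start with a dot.
cache_root <- synapseCacheDir()
cachemaps  <- list.files(cache_root, pattern = "\\.cachemap$",
                         recursive = TRUE, full.names = TRUE,
                         all.files = TRUE, ignore.case = TRUE)

# Collect the file paths recorded as fully downloaded.
completed <- unlist(lapply(cachemaps, function(f) names(fromJSON(f))))
length(completed)   # number of files the client considers complete
```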
>you say that you have to run the command 3 times. What is happening? Are you not getting the files you are asking for?
I think I have found one of the problems on my side, but again I am not clear whether it is a problem rooted in the download, because most of the files are named as:
XXX/accel_walking_outbound.json.items-XXXX.json.tmp
I then grepped for the *.tmp files and found that a small fraction are named as
XXX/accel_walking_outbound.json.items-XXXX.json
So does that mean the *.tmp files are not completely downloaded?
thanks
Sorry, I should have been clearer. When you call synDownloadTableColumns, Synapse packages the individual files together for you into a tar.gz file that is downloaded by the synapseClient (aka client). Once the tar.gz file arrives, the client also unzips all of those files into the synapseCache directory and then removes the temporary tar.gz file. By having the files in the synapseCache, the client knows which files it has already fetched and which are missing. If you move the files out of there, repeated calls to synDownloadTableColumns will refetch all of the files again. My suggestion is to leave the files in the cache, update the query_table data frame with a column containing the paths to the files, and use it like a manifest of the files (see the sketch below).
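For example, something along these lines could serve as that manifest - a minimal sketch, assuming (as in the synapseClient R package) that query_table@values holds the underlying data frame and that json_files comes back as a named vector/list keyed by the file handle IDs stored in the column:

```r
# Build a manifest: one row per recording, with the local path of the
# cached JSON file added as a new column.
manifest <- query_table@values
ids      <- as.character(manifest[["accel_walking_outbound.json.items"]])
manifest$outbound_walk_path <- unlist(json_files)[ids]

# Files can then be read straight out of the synapseCache, e.g.:
# accel <- jsonlite::fromJSON(manifest$outbound_walk_path[1])
```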
I also notice your other point - you say that you have to run the command 3 times. What is happening? Are you not getting the files you are asking for?
/Larsson
P.S. You can leave the LIMIT 9999... out of your query and it will still fetch everything - i.e.
```
query_table <- synTableQuery("SELECT 'accel_walking_outbound.json.items' FROM syn10733842")
```

Sorry, I really don't understand. This is my whole code. I have never seen any tar.gz file being downloaded; instead, I get individual small directories that I have to pull the files out of:
```
library(synapseClient)
synapseLogin(username = 'yuanfang.guan', password = '??????')

dest_dir <- './'

# point the synapseCache at the working directory
prevCacheLoc <- synapseCacheDir()
synapseCacheDir("./")

# query the table, then download the raw accelerometer files
query_table <- synTableQuery("SELECT 'accel_walking_outbound.json.items' FROM syn10733842 LIMIT 99999999")
json_files <- synDownloadTableColumns(query_table, "accel_walking_outbound.json.items")
```
I need to do it 3 times so that it covers ~99% of the data.
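Given that files already in the synapseCache are not downloaded again (see Larsson's explanation above), one hedged workaround for the missing ~10% is to simply repeat the query/download step until the returned mapping covers every row - the loop below is my own sketch, not an official synapseClient recipe:

```r
# Repeat the download; each pass should only fetch files still missing
# from the synapseCache, so the mapping grows until it covers the table.
for (i in 1:5) {
  query_table <- synTableQuery("SELECT 'accel_walking_outbound.json.items' FROM syn10733842 LIMIT 99999999")
  json_files  <- synDownloadTableColumns(query_table, "accel_walking_outbound.json.items")

  n_rows  <- nrow(query_table@values)
  n_files <- length(json_files)
  message(sprintf("pass %d: %d of %d files available", i, n_files, n_rows))

  if (n_files >= n_rows) break  # every referenced file has been resolved
}
```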
* The tar.gz file gets unpacked into the synapseCache directory once it is downloaded. The returned json_files contains the mapping between the identifiers in the rows of query_table and the file paths.
* Specifically, the request for download gets batched up into multiple tar.gz files (each of about 2 GB).

This is what I do:
```
query_table <- synTableQuery("SELECT 'accel_walking_outbound.json.items' FROM syn10733842 LIMIT 99999999")

# download the files
json_files <- synDownloadTableColumns(query_table, "accel_walking_outbound.json.items")
```
but the result is individual files rather than a tar/zip. Do you know why?

Hi Yuanfang:
If you use synDownloadTableColumns as shown in the [wiki](#!Synapse:syn8717496/wiki/448349), the download will use a tar.gz file.
/Larsson
Can you please provide each dataset as one zip file?