Hi,
I am running the R code from GitHub (https://github.com/Sage-Bionetworks/PDBiomarkerChallenge) on a computing cluster to download the data for the rest features and walking features. However, the download is extremely slow: after three days, only 5000 of the 34000 files have been downloaded. Is this expected, or am I missing something? Thanks in advance!
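For a rough sense of scale, the observed rate can be extrapolated linearly to the full file set. The sketch below is only back-of-the-envelope arithmetic using the numbers in this thread (5000 of 34000 files after 3 days, and the 1000-files-in-7-minutes test reported in a reply). It assumes a constant download rate; `projected_total_time` is a hypothetical helper, not part of the challenge code.

```python
def projected_total_time(done: int, total: int, elapsed: float) -> float:
    """Linear extrapolation: time needed for `total` files at the rate seen so far.

    The result is in the same unit as `elapsed` (days, minutes, ...).
    Assumes the download rate stays constant.
    """
    if done <= 0:
        raise ValueError("need at least one completed file to estimate a rate")
    return elapsed * total / done


# Numbers from the question: 5000 of 34000 files after 3 days.
days = projected_total_time(done=5000, total=34000, elapsed=3)
print(f"~{days:.1f} days in total, ~{days - 3:.1f} days still to go")

# Numbers from the 1000-file test in the thread: 7 minutes.
minutes = projected_total_time(done=1000, total=34000, elapsed=7)
print(f"~{minutes / 60:.1f} hours for all files at that rate")
```

At the rate in the question the full download would take roughly 20 days, versus about 4 hours at the reference rate, a gap of two orders of magnitude that points to a problem with the connection or setup rather than normal behaviour.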
Created by zhicheng ji zji

@sieberts One way to troubleshoot the issue is to measure the bandwidth to Amazon S3. @zji could place a large file in Amazon S3, then (1) download it to a machine in the Amazon cloud and (2) download it to his/her own system. If there is a large difference, the next step would be to consult the institution's IT group. You may recall that there was an incident at our own site in which Amazon data transfer speeds suddenly became slow. Once we showed that the problem was specific to our site, our site's IT group contacted Amazon. Hope this helps.

Hi @zji, that seems extremely slow. It should take on the order of hours to download the files. I just did a quick test of 1000 files on my cable modem and it took about 7 minutes (equivalent to about 4 hours for the entire dataset). Are you downloading many of the files in parallel across nodes in your cluster? That can backfire, because you may get throttled for opening too many concurrent connections. The data are stored on AWS, so download speed depends on your connection to those servers. @brucehoff, do you have any tips for speeding up downloads?