1- Can someone kindly explain the structure of the data to me? I downloaded the complete dataset (129 GB). Next, I found files in the Tables section such as "Sensor Data - Part I", "Task Scores - Part I", etc. As I currently understand it, the data_file_handle_id is linked to the downloaded data folders, so e.g. the file for data_file_handle_id '66518571' is in the folder '66518571'.
Please correct me if I am wrong here.
2- The data_file_handle_id values from "Sensor Data - Part I" are not available anywhere in the dataset that I downloaded. The complete downloaded dataset is 129 GB and contains all the data_file_handle_id values from the "Sensor Data - Part II" file, and 332 other data_file_handle entries are also there. But I can't find any of the data_file_handle_id values from "Sensor Data - Part I" in the downloaded data.
3- How do I join "Sensor Data - Part I" with "Task Scores - Part I"?
Actually, I can't work out which raw sensor data points the rows in "Task Scores - Part I" correspond to.
Created by umar khan ukhan.mscs18seecs

You can't find those file handles because the file handles in the data_file_handle_id column of the Sensor Data tables and the file handles of the file entities in the project (which are the ones you download when you call `synapseutils.syncFromSynapse`) are different file handles. They reference the same data, but their file handle IDs are different.
While it's possible to align the data which you downloaded with the metadata in the Sensor Data tables (by creating a file view and referencing the file annotations, specifically deviceLocation and devicePlatform, in conjunction with the file name for the `participant_day` and the name of the parent folder for the `subject_id`), it's not going to be easy. I highly recommend instead following the download instructions in step 6A [here](https://www.synapse.org/#!Synapse:syn20681023/wiki/594679).

Thank you for your help.
I downloaded the dataset using the following:
```
import synapseclient
import synapseutils

# cache_root_dir is the local directory where I wanted the data to be saved
syn = synapseclient.Synapse(cache_root_dir="D:\\HumanMotionEstimation\\nature\\data")
syn.login('synapse_username','password')
files = synapseutils.syncFromSynapse(syn, 'syn20681023')
```
I'm still stuck on point 2:
The data_file_handle_id values from "Sensor Data - Part I" are not available anywhere in the dataset that I downloaded. The complete downloaded dataset is 129 GB and contains all the data_file_handle_id values from the "Sensor Data - Part II" file, and 332 other data_file_handle entries are also there. But I can't find any of the data_file_handle_id values from "Sensor Data - Part I" in the downloaded data.
> I downloaded the complete dataset (129 GB). next i found in the tables section files like "Sensor Data - Part I", "Task Scores - Part I" etc. Currently as per my understanding, the data_file_handle_id is being linked with the downloaded data folders. so e.g. the file for data_file_handle_id '66518571' is in the folder '66518571'.
Yes, that's right, although it's not necessary to understand this relationship to use the data. It's much more convenient to work with the file paths in a dataframe. Here's example code which will download the data and add a field named `path` to the table:
```
import synapseclient as sc
syn = sc.login()
sensor_data_part_one = syn.tableQuery("select * from syn20681931")
sensor_data_part_one_paths = syn.downloadTableColumns(sensor_data_part_one, "data_file_handle_id")
sensor_data_part_one_df = sensor_data_part_one.asDataFrame()
sensor_data_part_one_df["path"] = sensor_data_part_one_df.data_file_handle_id.astype(str).map(sensor_data_part_one_paths)
```
> But i cant find any of the data_file_handle_id from "Sensor Data - Part I" in the downloaded data.
Do you mean that those files are missing from your Synapse cache directory? Without knowing how you downloaded the data I can't know why those files might be missing.
> How do I join "Sensor Data - Part I" with "Task Scores - Part I"
Use the `timestamp_start` and `timestamp_stop` fields from the Task Scores table to filter on those timestamps in the sensor data. I recommend concatenating each day's measurements into a single data frame for each subject_id/device/device_position and then filtering those timestamps when you want the sensor measurements for any given task. For example, if you concatenate the sensor measurements for days 1 through 4 for subject_id = 10_BOS, device = GENEActiv, and device_position = RightUpperLimb, then you can get the sensor measurements from that specific device/device_position by filtering timestamps which fall between `timestamp_start` and `timestamp_stop` for any given task performed by subject_id = 10_BOS.
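As a rough illustration of the filtering step described above, here is a minimal pandas sketch. It uses toy data rather than the real files, and it assumes the sensor measurements have a column named `timestamp` in the same units as `timestamp_start` and `timestamp_stop`; that column name, the `x` column, and the helper `sensor_rows_for_task` are hypothetical and may not match the actual files.

```python
import pandas as pd


def sensor_rows_for_task(sensor_df, task_row, time_col="timestamp"):
    """Return the sensor rows whose timestamp falls within the task window.

    `time_col` is an assumed column name; adjust it to the real sensor files.
    Both ends of the window are included.
    """
    in_window = sensor_df[time_col].between(
        task_row["timestamp_start"], task_row["timestamp_stop"]
    )
    return sensor_df[in_window]


# Toy stand-ins for two days of measurements from one
# subject_id/device/device_position combination.
day_1 = pd.DataFrame({"timestamp": [0.0, 1.0, 2.0], "x": [0.1, 0.2, 0.3]})
day_2 = pd.DataFrame({"timestamp": [3.0, 4.0, 5.0], "x": [0.4, 0.5, 0.6]})

# Concatenate the days into a single data frame, ordered by time.
sensor_df = pd.concat([day_1, day_2], ignore_index=True).sort_values("timestamp")

# Toy stand-in for one row of the Task Scores table.
task_scores = pd.DataFrame(
    {"subject_id": ["10_BOS"], "timestamp_start": [1.5], "timestamp_stop": [4.0]}
)

# Sensor measurements recorded while the first task was performed.
task_measurements = sensor_rows_for_task(sensor_df, task_scores.iloc[0])
print(task_measurements)
```

With real data you would build `sensor_df` from the files listed in the `path` column for a single subject_id/device/device_position, then call the helper once per row of the Task Scores table for that subject.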