I am currently working with the BRIGHTEN and START study datasets (for this project), but I have a few queries about their original designs. In particular, the BRIGHTEN study (I think this paper - https://innovations.bmj.com/content/2/1/14) mentions two apps, but this distinction is not made in the data available here.
Further queries I had were:
• I noticed UIDs are grouped by colours. Is there a meaning behind these?
• Are calendar dates available?
• Both datasets have a variable called tasktype, where the values are either Survey/Passive. I am not clear what these mean.
o If two activities occur in the same hour and both have values of “Survey”, are they the same activity? Or could they be different activities that are both survey related, i.e. the same type but not the same activity? If so, do you have data that helps distinguish between them?
o For task types of “passive-sensor”, what data is being collected, and through what mechanism (e.g. a different app on the phone such as Apple Health, or a device such as a Fitbit)?
• The time of day is a good indicator of each tasktype being performed, but another interesting indicator of engagement is likely to be the duration spent on each page/activity and the time taken to move between activities. This isn't possible to derive from the current data, but is there "less clean" data available that would allow it?
• For BRIGHTEN, an RCT, I see in the published paper that there were withdrawals during the study. Are these patients in the datasets, and if so, is there a way to identify them and when they withdrew?
• In START, it is not clear what "Casestatus" refers to.
• For the “list_of_healthCodes_tobe_removed” dataset, does the name mean the dataset is not relevant? Or does it refer to something that needs to be cleaned in these datasets?
I apologise if this is not appropriate for the discussion thread.
Thank you,
Jack