Dear organizers,
Could you please briefly recapitulate the data artifact that you observed in the training and test data? Also, could you describe how you adjusted for this in the scoring script, and release the new scoring script?
Thank you very much!
Wolfgang
Created by Wolfgang Kopp (wkopp)
Great job @YuanfangGuanandMarlenaDuda!
I think I used that field, but I don't think it really increased our result; probably 0.001 at most, by my gut feeling.
I think I added max, min, variance, and time at the last stage; in total they increased our score by about 0.01. I am just reading the webinar slides, and it looks like they actually dropped our performance, which I think is also possible.
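For concreteness, here is a minimal sketch of what summary-statistic features like these could look like. This is not the team's actual feature code; the column names (userAcceleration_x/y/z) and the timestamp argument are illustrative assumptions about the walking-data layout.

```python
import numpy as np
import pandas as pd

def summary_features(recording: pd.DataFrame, created_on: pd.Timestamp) -> dict:
    """Per-axis max/min/variance plus a coarse recording-time feature.

    Assumes `recording` has userAcceleration_x/y/z columns (illustrative names).
    """
    feats = {}
    for axis in ["userAcceleration_x", "userAcceleration_y", "userAcceleration_z"]:
        signal = recording[axis].to_numpy()
        feats[f"{axis}_max"] = float(np.max(signal))
        feats[f"{axis}_min"] = float(np.min(signal))
        feats[f"{axis}_var"] = float(np.var(signal))
    # A "time" feature like this tracks when the test was performed, which is
    # exactly the kind of metadata that can leak the artifact discussed
    # elsewhere in this thread.
    feats["created_on_days"] = (created_on - pd.Timestamp("2015-01-01")).days
    return feats
```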
I have to say I am shocked at the performance. I know I was at ~0.85 probably within the first two weeks of doing this challenge. In the end, I feel I was at 0.87 or so, as I didn't have the ideas/time to improve further. But I didn't expect the difference could be so big. Thanks all!
'They had minimal impact on the final performance (0.87 vs. 0.86) on the age range that is tested, and 0.89 versus 0.88 on all ages.' That is digit-for-digit consistent with my estimate.
Hi Wolfgang-
I'm not sure we can rule out the possibility that, even without explicitly including metadata, a feature set could still be picking up effects correlated with time/filehandleId. Or perhaps there's another reason a feature set isn't as predictive in this subset. We can certainly continue to explore issues related to confounding in the community phase. We're not claiming ours to be the optimal approach; it was instead a sanity check to make sure the winning models weren't driven by this confounding, and for that reason we decided not to award challenge placement based on this criterion.
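As an illustration of this kind of check (not the organizers' actual analysis), one could rank extracted features by how strongly they track the filehandleId ordering; the column names below are assumptions, not the challenge's actual table schema.

```python
import pandas as pd
from scipy.stats import spearmanr

def features_correlated_with_filehandle(features: pd.DataFrame,
                                        feature_cols: list) -> pd.DataFrame:
    """Rank features by how strongly they track the filehandleId ordering.

    Assumes one row per recordId and an integer "filehandleId" column
    (illustrative name). A high |rho| flags a feature that may be picking
    up the time artifact even though no metadata was used explicitly.
    """
    rows = []
    for col in feature_cols:
        rho, pval = spearmanr(features[col], features["filehandleId"])
        rows.append({"feature": col, "spearman_rho": rho, "p_value": pval})
    return (pd.DataFrame(rows)
            .sort_values("spearman_rho", key=abs, ascending=False))
```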
Solly, OK, but unless one explicitly uses the filehandleId (an integer value) to extract features, this artifact should be irrelevant, shouldn't it?
For instance, our team only used the time series as input, without any metadata. Still, the performance seems to have dropped dramatically.
Unless other properties of the dataset changed due to the subsetting, I don't understand where the performance discrepancy comes from.
Wolfgang The issue is that PD status is correlated with time, and thus with the filehandleId, which is an increasing numeric value related to the time a test was performed. This is strictly a data artifact and contains no information about the walking test itself, yet it is still predictive of PD status. To explore this, we selected a subset of the recordIds (deviceMotion_walking_outbound.json.items <= 2452481 in the training data and <= 2580471 in the test data) that shows no statistical difference between PD and control, so that this metadata should not be predictive in this subset.
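A minimal sketch of that subsetting, assuming pandas data frames with the filehandleId column named as above and a binary PD label in the training set; the Mann-Whitney test is an assumption, since the exact statistical test used is not stated in this thread.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

FILEHANDLE_COL = "deviceMotion_walking_outbound.json.items"
TRAIN_CUTOFF = 2452481   # threshold quoted above for the training data
TEST_CUTOFF = 2580471    # threshold quoted above for the test data

def restrict_to_unconfounded_subset(train: pd.DataFrame, test: pd.DataFrame):
    """Keep only records at or below the cutoffs, then check that the
    filehandleId no longer separates PD from control in the training subset.

    The "PD" column name (1 = PD, 0 = control) is an assumption.
    """
    train_sub = train[train[FILEHANDLE_COL] <= TRAIN_CUTOFF]
    test_sub = test[test[FILEHANDLE_COL] <= TEST_CUTOFF]

    # If the artifact has been removed, this test should not be significant.
    _, pval = mannwhitneyu(
        train_sub.loc[train_sub["PD"] == 1, FILEHANDLE_COL],
        train_sub.loc[train_sub["PD"] == 0, FILEHANDLE_COL],
    )
    return train_sub, test_sub, pval
```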
We're working on sharing the webinar slides and recording, as well as the scoreboard, so you also have that reference.
Solly
Robustness of the score for subchallenge 1