The email said: 'Baseline models (just including metadata: device, session, site, task, visit and deviceSide) do pretty well and you will have to beat these baseline models in your submissions.'
So I tried the baseline model. In fact, it does so well that my signal-extraction method falls far behind (even though it is above random). And because the baseline does so well, the signal-extraction features add nothing on top of it; they actually drop the performance of the metadata model.
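For concreteness, this is roughly what I mean by a metadata-only baseline. It is a minimal sketch under my own assumptions (scikit-learn, a random forest, and these column names), not the organizers' actual model:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# The metadata columns named in the organizers' email.
META_COLS = ["device", "session", "site", "task", "visit", "deviceSide"]

def metadata_baseline(df: pd.DataFrame, label_col: str) -> float:
    """Mean cross-validated score of a model that sees only the metadata."""
    model = make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),  # metadata is categorical
        RandomForestClassifier(n_estimators=500, random_state=0),
    )
    return cross_val_score(model, df[META_COLS], df[label_col], cv=5).mean()
```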
Looking at the features, it is clear, as another team pointed out, that site (BOS vs. NYC), which has zero biological meaning, carries the strongest signal. It overwrites all other factors, signal and metadata alike, and makes everything else insignificant. As a result, in my view it is very possible that no one will outperform the baseline, or an even stronger baseline built from only BOS/NYC and one or two other factors. In that case, would the organizers consider evaluating BOS and NYC separately (within BOS that feature is always 1, and within NYC it is always 0, so it no longer affects anything), to see whether the signal-extraction methods actually carry some signal?
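A minimal sketch of the per-site evaluation I am proposing (the `label` and `prediction` column names and `score_fn` are placeholders for whatever the challenge actually uses):

```python
import pandas as pd

def evaluate_per_site(df: pd.DataFrame, score_fn) -> dict:
    """Score BOS and NYC separately. Within one site the site feature
    is constant, so it can contribute nothing to the within-site score."""
    return {
        site: score_fn(grp["label"], grp["prediction"])
        for site, grp in df.groupby("site")
    }
```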
Thank you.
Created by Yuanfang Guan (yuanfang.guan)
I echo Carla's concern, but I don't know how to address it, since this information has already been released; it becomes impossible to enforce not using this data. To be frank, I am very frustrated too, because no matter what I do, I cannot beat the clinical-data baseline, although my feature extraction is actually meaningful.
Hi Carla:
You are completely right. Ideally this information would not influence the challenge scoring, and for generalizability we would consider only the accelerometer data. But since the data is already available, it is impossible to know whether this information was used to construct features. We chose to make it a requirement that you improve on this baseline model rather than regress it out.
Hello,
I have a concern about the use of the baseline data as features. I think the purpose of this challenge is to capture any information related to Bradykinesia, Dyskinesia, and Tremor using the wrist accelerometers. By using metadata such as device, session, site, task, visit, and deviceSide, you incorporate bias produced by how the experiment was designed. For example, if all visit = 2 sessions were in the "OFF" state, you will see higher Tremor scores for visit 2, which is totally unrelated to the potential of wrist devices to assess or quantify the level of tremor.
In my opinion, the metadata should be used to remove the effects of this information from the features, not used as features themselves, because it is too dependent on the data-collection protocol. For example, properly validated studies remove the effects of age and gender, which usually bias performance.
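Concretely, the removal I have in mind is simple residualization: regress each feature on the (one-hot encoded) metadata and keep only the residuals. A minimal sketch, assuming numpy and scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def residualize(features: np.ndarray, confounds: np.ndarray) -> np.ndarray:
    """Remove the linear effect of the confounds (e.g. one-hot encoded
    metadata, age, gender) from each feature column."""
    lm = LinearRegression().fit(confounds, features)  # one fit per feature column
    return features - lm.predict(confounds)
```

Note that the regression should be fit on training data only and then applied to held-out data; otherwise the confound leaks back in through the fit.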
Carla
Hi Yuanfang:
I am sure there is still some time to improve. That being said, I like your suggestion, and we will explore options if there is no improvement over the baseline.