My reading of the leaderboard and method descriptions:
Deep learning outperformed feature engineering in the mPower (iPhone) challenge, where a fair amount of data (tens of thousands of records) is available.
Thousands of wearable records are still not enough on their own, so data augmentation is beneficial.
When fewer than a thousand wearable sensor records per task are available, manual signal analysis and feature engineering outperform deep learning, though transfer learning (e.g. sharing DNN weights between tasks) was not explored.
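To make the augmentation point concrete, here is a minimal, dependency-free sketch of one way a tri-axial accelerometer record could be perturbed to create extra training examples. The specific transforms (a random rotation about one axis, a global scale factor, Gaussian jitter), the function name `augment_record`, and all parameter values are my own illustrative assumptions, not the methods the challenge teams actually used.

```python
import math
import random

def augment_record(record, rng, jitter_std=0.05, scale_range=(0.9, 1.1)):
    """Return a perturbed copy of one record, given as a list of (x, y, z) samples.

    All transforms and defaults here are illustrative assumptions.
    """
    # A random rotation about the z axis mimics a differently oriented device.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(angle), math.sin(angle)
    # One scale factor per record mimics sensor gain or movement-intensity differences.
    scale = rng.uniform(*scale_range)
    out = []
    for x, y, z in record:
        rx, ry = c * x - s * y, s * x + c * y
        # Independent Gaussian jitter on every axis of every sample.
        out.append(tuple(scale * v + rng.gauss(0.0, jitter_std) for v in (rx, ry, z)))
    return out

# Usage: generate five augmented variants of a synthetic record.
rng = random.Random(0)
record = [(math.sin(0.1 * t), math.cos(0.1 * t), 1.0) for t in range(200)]
augmented = [augment_record(record, rng) for _ in range(5)]
```

Each augmented copy keeps the label of the original record; whether these particular perturbations preserve the clinically relevant signal would need to be checked against the data.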
Any comments?
Created by Vladimir Morozov (vmorozov)
Another observation:
The DNN models built on top of the short-time Fourier transform were no better (probably slightly worse) than the DNN models using raw data. Nobody mentioned parameter tuning for the DNNs, though it would probably not be beneficial with little training data (i.e. the L-Dopa challenge).
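For readers less familiar with the short-time Fourier transform input mentioned above, the following is a rough, dependency-free sketch of how a one-dimensional sensor signal can be turned into a magnitude spectrogram. The function name `stft_magnitude`, the Hann window, the frame length, the hop size, and the 50 Hz sampling rate are all assumptions chosen for illustration; the participants' actual preprocessing is not described in this thread. In practice a library routine such as `scipy.signal.stft` would normally be used instead of a hand-written DFT.

```python
import cmath
import math

def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of frame_len // 2 + 1 bins per frame."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Direct DFT over the non-negative frequencies (slow, but clear).
        for k in range(frame_len // 2 + 1):
            coeff = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            spectrum.append(abs(coeff))
        frames.append(spectrum)
    return frames

# Usage: a synthetic 5 Hz oscillation sampled at an assumed 50 Hz.
fs = 50.0
tone = [math.sin(2 * math.pi * 5.0 * t / fs) for t in range(256)]
spec = stft_magnitude(tone)  # list of frames, each a list of bin magnitudes
```

The resulting frames-by-bins array is what a DNN would consume in place of the raw time series; the raw-data models skip this step and let the network learn its own filters.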
Your interpretation is correct, @vmorozov, from the machine learning perspective: deep learning, because of the many parameters involved, needs more examples. If there were enough data and the data collection were better controlled, I am convinced that deep learning would be the most appropriate approach for sensor-acquired data.
It also depends on who implements it. I think that with better techniques, even at the current data size, deep learning would outperform signal extraction. This is the technical barrier I will crack in January for my lab. If a voice challenge follows in future years, you will see the performance change.
What you suggest about transfer learning is insightful, and we are developing strategies to extend this to hospital/center-acquired data. I hope Larsson and Solly's side will support us in doing this in the long term.
This is a really good hypothesis. Another big difference between the two sub-challenges is the nature of the data. The L-Dopa dataset had much better control of what participants were doing, as the tasks were performed in the clinic under the supervision of staff. The mPower dataset required a lot more "correction" and/or filtering of tasks where people were not performing the exercise as prescribed or where there were confounders that might not be obvious (size of the room for the walk exercise, differences in footwear, differences in pocket location, etc.).