In other words: will feature selection be performed, or will all submitted features be used in training the model? Sample scenario: you've extracted 1,000 features. Some are correlated and others are uninformative. A 100-feature subset obtained with feature selection methods gives better results with most simple models (although ANNs may perform marginally better on the full feature set). Which set of features should we submit?

Created by Max Wang maxwg
This is something that you will have to decide for yourself. If your features perform worse in a cross-validation or leave-one-out analysis on the training data, they will likely perform worse on the test data as well. Some of the methods in the ensemble are highly regularized and might do fine with highly correlated features; others might not. We will be releasing some code in the next couple of days that has our scoring metric so that you can evaluate this using the exact methodology.
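Until the official scoring code is available, a rough way to run this kind of comparison yourself is sketched below. Everything in it is an assumption for illustration: the data is synthetic, the model is a plain ridge regression rather than the ensemble, mean squared error stands in for the actual scoring metric, and the helper names (`top_k_by_correlation`, `cv_mse`) are made up. The one point worth copying is that the feature selection is redone inside each training fold, so the held-out fold never influences which features are kept.

```python
import numpy as np

def top_k_by_correlation(X, y, k):
    """Indices of the k features with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[-k:]

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression on centered training data and predict on X_test."""
    mu = X_train.mean(axis=0)
    y_mu = y_train.mean()
    A = X_train - mu
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - y_mu))
    return (X_test - mu) @ w + y_mu

def cv_mse(X, y, k=None, n_folds=5, seed=0):
    """Cross-validated MSE. If k is given, keep only the top-k features,
    chosen from the training part of each fold (never from the held-out part)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        cols = np.arange(X.shape[1])
        if k is not None:
            cols = top_k_by_correlation(X[train_idx], y[train_idx], k)
        pred = ridge_fit_predict(X[train_idx][:, cols], y[train_idx],
                                 X[test_idx][:, cols])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in data (assumption): 50 features, only the first 5 informative.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(200, 50))
y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 1.0, -1.0]) + data_rng.normal(size=200)

mse_full = cv_mse(X, y)          # all features
mse_subset = cv_mse(X, y, k=10)  # top-10 subset, selected per fold
print(f"full feature set: {mse_full:.3f}")
print(f"10-feature subset: {mse_subset:.3f}")
```

Both runs use the same fold split (same `seed`), so the two numbers are directly comparable. Whichever set scores better under this kind of check on your own training data is the more defensible one to submit, with the caveat from the reply above that the ensemble's behavior may differ from a single simple model.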

Should we submit all features or just the best?