I would like to ask whether the following features can be used as model inputs: 1. race 2. age 3. NIH Racial Category 4. NIH Ethnicity Category

Hi Michael, A number of datasets were normalized and provided for the challenge, so there will certainly be a "project" effect. Keep in mind that the validation datasets were independently generated, so if you use the project column in your model, you will overfit to the datasets in the training data. You are welcome to use whatever information we provided in your model; just keep the issue of overfitting to the training data in mind. Cheers, Jim
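One way to check how much a model leans on the "project" effect is to hold out one whole project at a time during cross-validation, so every test fold looks like an independently generated dataset. The snippet below is only a minimal sketch of that idea in plain Python: the project labels are made up for illustration and are not taken from the challenge metadata, and `leave_one_project_out` is a hypothetical helper name. scikit-learn's `LeaveOneGroupOut` produces the same kind of split if you prefer a library routine.

```python
def leave_one_project_out(projects):
    """Yield (held_out, train_idx, test_idx), holding out one project at a time."""
    for held_out in sorted(set(projects)):
        # Every sample from the held-out project goes to the test fold.
        test_idx = [i for i, p in enumerate(projects) if p == held_out]
        # All remaining samples, from the other projects, form the training fold.
        train_idx = [i for i, p in enumerate(projects) if p != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical per-sample project labels (illustrative only).
projects = ["A", "A", "B", "C", "B", "C", "A"]

splits = list(leave_one_project_out(projects))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

If the score drops sharply on held-out projects compared with a random split, the model is probably picking up study-specific signal rather than signal that will carry over to the validation datasets.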
@james.costello I would like a bit more guidance on this statement. The metadata has a column entitled "project" that identifies the deidentified study from which each specimen is derived. This column receives a relatively high variable importance score (VARIMP) in my model tuning and feature selection. That said, I am unsure whether this column should be allowed in training the PTB prediction models, since it reflects study-level effects from the meta-analysis rather than a true indicator of PTB. The same question applies, more obviously, to the week of delivery (delivery_wk), which is also included in the metadata. Thanks, Michael
Hi Robin, Yes, if the data are provided, you are welcome to use them in your models. You are also encouraged to engineer new features of your own from the ones provided. Kind Regards, Jim on behalf of the organizing committee

