Dear organisers, I would like to know what the input for Task 3 will look like. Is it like the training cases, i.e., individual frames? Or do we need to process the entire video? Thanks!

Created by André Filipe Sousa Ferreira ShadowTwin41
The frames in the test set are sampled at the same frame rate as the validation set, and both start at frame 0. This means that the sampled frame numbers in the validation and test sets match. Regarding the time limit: we request that evaluations for a task complete within a day.
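As a rough illustration of what "same frame rate, starting at frame 0" implies, the sampled frame numbers can be enumerated as below. This is only a sketch: `stride` is a hypothetical parameter, since the actual sampling interval is not stated in this thread and would have to be read off the validation set.

```python
def sampled_frame_indices(total_frames: int, stride: int) -> list[int]:
    """Frame numbers kept when every `stride`-th frame is sampled, starting at frame 0.

    `stride` is a hypothetical value; infer the real one from the validation set.
    """
    if stride <= 0:
        raise ValueError("stride must be a positive integer")
    return list(range(0, total_frames, stride))
```

Because both sets start at frame 0, the same stride yields the same frame numbers for any two videos of equal length.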
Is there any way for us to know which frames were selected? What's the time limit?
Close, but not quite: yes, the output is one CSV file per video. Within a file, the format matches the MOT challenge format. The only difference is that we've added keypoints to each row. Each row holds information for one tracked object, so depending on how many tracked objects are in a frame, you may have more than one row per frame. Additionally, the test set is sampled at the same frame rate as the validation set, so you don't have to evaluate every frame. However, you will not be penalized for evaluating all frames: extra frames in your CSV that are not present in the ground truth file are dropped during HOTA calculation. Just be sure that you stay within the time limits. I'll add this clarification to the wiki page as well. I hope this answers your question, and don't hesitate to follow up if there are any other concerns.
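To make the row layout concrete, here is a minimal sketch, not the official writer or evaluator. It assumes the standard MOT challenge columns (frame, id, bb_left, bb_top, bb_width, bb_height, conf, x, y, z) with keypoints appended as x, y pairs; the exact keypoint layout is defined on the wiki page, so treat that part as an assumption. The helper names are hypothetical.

```python
import csv
import io


def make_row(frame, track_id, box, keypoints, conf=1.0):
    """Build one row for one tracked object in one frame.

    The first ten values follow the MOT challenge column order. Appending
    keypoints as flat x, y pairs is an assumption made for illustration.
    """
    left, top, width, height = box
    row = [frame, track_id, left, top, width, height, conf, -1, -1, -1]
    for kx, ky in keypoints:
        row.extend([kx, ky])
    return row


def keep_ground_truth_frames(rows, gt_frames):
    """Mimic the effect described above: rows for frames absent from the ground truth are dropped."""
    gt = set(gt_frames)
    return [row for row in rows if row[0] in gt]


def to_csv_text(rows):
    """Serialize rows as CSV text, one line per tracked object."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Two objects in frame 0 give two rows for that frame; frame 1 is an extra frame.
rows = [
    make_row(0, 1, (10, 20, 30, 40), [(12, 22)]),
    make_row(0, 2, (50, 60, 30, 40), [(55, 65)]),
    make_row(1, 1, (11, 21, 30, 40), [(13, 23)]),
]
```

Calling `keep_ground_truth_frames(rows, [0])` leaves only the two frame-0 rows, which is why submitting extra frames costs nothing except runtime.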
So, the output per video is a csv file with a row per frame for all frames?
Hi, the input will be the videos found directly in the /input directory given to the Docker container. More information on the input and output of all tasks can be found here: https://www.synapse.org/Synapse:syn66256386/wiki/631726
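For anyone wiring this up, a minimal sketch of discovering the videos inside the container could look like the following. The `/input` path comes from the reply above; the set of file extensions is an assumption, and `list_input_videos` is a hypothetical helper name, so check the wiki page for the actual file format.

```python
from pathlib import Path

# Assumed extensions; the real video format is specified on the wiki page.
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def list_input_videos(input_dir="/input"):
    """Return video files located directly in `input_dir`, sorted by path."""
    root = Path(input_dir)
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES
    )
```

Only the top level of the directory is scanned, matching the statement that the videos sit directly in /input.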

Task 3 input