1) Is there a relation between the values of "Donor ID" in the MTG and A9 data files? That is, does ID=1 mean the samples were taken from the same donor in both files? IDs 0 to 33 look like the same donors in both files, but for ID>33, entries like "Age at Death" start to differ between the two files.
2) Will the brain area be specified in the test and validation sets? If so, for a given donor, would there be RNA-seq data sampled from both MTG and A9?
Basically, I'm wondering whether to treat A9 and MTG files coherently, or treat them as distinct data sets.
Thanks!
Created by Nikhil Karthik nkck
Hi @ktravaglini,
Regarding your note:
"The test and validation datasets may be from different cortical areas and/or have different donors."
Could you please clarify whether the donors in the validation and test sets (used for the Leaderboard) are completely independent from the 84 donors in the training set?
If the validation and test set donors are also among the training set donors, a model that uses the clinical metadata (e.g., age, PMI) may produce biased predictions, which could make the benchmarking less meaningful.
Thank you very much for your clarification.
Hi @igarachv,
The final validation and test datasets each come from a single brain region, and you will be asked to make predictions on just that region when presented with that data. The region will remain unknown to the models. Whether the model you submit is trained on both MTG and DFC, or uses one for training and one for validation, is up to you!
-Kyle
Hi @ktravaglini and community,
Thank you for the clarification about donor IDs lining up between the MTG and A9 datasets. I have a follow-up question regarding model input and predictions:
When testing the model, will I receive RNA-seq data from both brain areas (MTG and A9) for a given donor and be asked to make a single prediction combining both? Or will we have only data from one area?
If the model receives data from only one area, will it be indicated which area the data comes from?
Essentially, I want to understand whether I should treat MTG and A9 as distinct datasets for modeling, or whether they should be coherently integrated for prediction purposes.
Thanks in advance!
Hi @nkck,
1) The donors and related metadata line up between the MTG and A9 datasets. I'm not sure how you are loading the files, but the IDs should look like this:
Categories (84, object): ['H19.33.004', 'H20.33.001', 'H20.33.002', 'H20.33.004', ..., 'H21.33.044',
'H21.33.045', 'H21.33.046', 'H21.33.047']
2) The test and validation datasets may be from different cortical areas and/or have different donors. The areas will have the same cellular taxonomy, have the same features/genes, and have the same measures of pathology. The actual AnnData file will be formatted identically.
Best,
Kyle
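One possible source of the mismatch described in the original question: if "Donor ID" is stored as a pandas categorical, its integer codes are assigned per file, so the same integer can refer to different donors in the MTG and A9 files. The sketch below matches donors on their string labels instead. It is a minimal illustration under assumptions, not the organizers' method: the `donor_table` and `compare_donors` helpers are hypothetical, the column names are taken from the posts above, and the ages are made-up toy values.

```python
import pandas as pd


def donor_table(obs, id_col="Donor ID", meta_cols=("Age at Death",)):
    """Collapse a per-cell table to one row per donor, keyed by the string label."""
    table = obs[[id_col, *meta_cols]].copy()
    # Cast to str so categorical integer codes never enter the comparison.
    table[id_col] = table[id_col].astype(str)
    return table.drop_duplicates().set_index(id_col).sort_index()


def compare_donors(obs_a, obs_b, **kwargs):
    """Return (shared donor labels, shared donors whose metadata differ)."""
    a = donor_table(obs_a, **kwargs)
    b = donor_table(obs_b, **kwargs)
    shared = a.index.intersection(b.index)
    differs = (a.loc[shared] != b.loc[shared]).any(axis=1)
    return sorted(shared), sorted(shared[differs.to_numpy()])


# Toy stand-ins for the two per-cell tables (ages are invented for illustration).
mtg = pd.DataFrame({
    "Donor ID": pd.Categorical(["H20.33.001", "H20.33.001", "H21.33.044"]),
    "Age at Death": [80, 80, 91],
})
a9 = pd.DataFrame({
    "Donor ID": pd.Categorical(["H19.33.004", "H20.33.001", "H21.33.044"]),
    "Age at Death": [75, 80, 91],
})

# The integer codes depend on which donors are present in each file:
# code 0 is H20.33.001 in the toy MTG table but H19.33.004 in the toy A9 table.
print(mtg["Donor ID"].cat.codes.tolist())
print(a9["Donor ID"].cat.codes.tolist())

shared, mismatched = compare_donors(mtg, a9)
print(shared, mismatched)
```

With the real files, the two inputs would presumably be the `.obs` tables of the loaded AnnData objects. An empty `mismatched` list would indicate that the shared donors carry the same metadata in both files.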