I am still a bit confused about the rule on not using the in situ data for the 84 genes. Here is what I'm thinking. There are 2 tasks:
1. I use the expression data from the dge normalized data file to select 20/40/60 genes out of the 84 genes
2. I use the expression data of the selected 20/40/60 genes to find the matching positions in the bdtnp file using some classification algorithm
Is this correct? Or should the 20/40/60 genes in the first step be genes other than the 84? Meaning I need to remove the 84 listed genes and then do feature extraction to select 20/40/60 new genes?
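To make the question concrete, here is a rough sketch of the two tasks as I picture them. It is only an illustration under my own assumptions: the data are synthetic stand-ins (not the real challenge files), I assume the dge matrix has genes as rows and cells as columns and the bdtnp reference has positions as rows and genes as columns, the function names are hypothetical, and variance ranking and Pearson correlation are placeholders for whatever selection and classification methods one would really use.

```python
import numpy as np

def select_genes(dge, k):
    """Task 1: pick k genes using ONLY the expression matrix (genes x cells).
    Highest-variance ranking is a placeholder selection criterion."""
    variances = dge.var(axis=1)
    top = np.argsort(variances)[::-1][:k]
    return np.sort(top)

def map_cells(dge_sel, ref_sel):
    """Task 2: for each cell, return the index of the reference position whose
    profile over the selected genes has the highest Pearson correlation.
    dge_sel: (k, n_cells); ref_sel: (n_positions, k)."""
    eps = 1e-12  # guards against division by zero for constant profiles
    cells = dge_sel.T  # (n_cells, k)
    k = cells.shape[1]
    cz = (cells - cells.mean(axis=1, keepdims=True)) / (cells.std(axis=1, keepdims=True) + eps)
    rz = (ref_sel - ref_sel.mean(axis=1, keepdims=True)) / (ref_sel.std(axis=1, keepdims=True) + eps)
    corr = cz @ rz.T / k  # (n_cells, n_positions) correlation matrix
    return corr.argmax(axis=1)

def demo(k=20, seed=0):
    """Run both tasks on synthetic data (stand-ins for the dge and bdtnp files)."""
    rng = np.random.default_rng(seed)
    n_genes, n_positions, n_cells = 84, 50, 30
    ref = rng.random((n_positions, n_genes))            # fake in situ reference
    true_pos = rng.integers(0, n_positions, n_cells)    # fake ground truth
    dge = ref[true_pos].T + 0.01 * rng.normal(size=(n_genes, n_cells))
    selected = select_genes(dge, k)                     # uses dge only
    pred = map_cells(dge[selected], ref[:, selected])   # uses reference only here
    return selected, pred, true_pos

if __name__ == "__main__":
    selected, pred, true_pos = demo()
    print("selected gene indices:", selected)
    print("fraction of cells mapped to their true position:", (pred == true_pos).mean())
```

The point of splitting it this way is that the gene selection step never sees the reference data; the reference only enters in the mapping step.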
Thank you.
Created by Bharata Kalbuaji barbarian
You can use geometry.csv indeed, I meant that you can only use geometry.txt and nothing else from the challenge. Not very useful though...
Hi, Pablo
Regarding your comment "No, you have to select your 20/40/60 genes wo using any other information from geometry.csv":
shouldn't 'geometry.csv' be 'bdtnp.txt'?
Can't we also use 'geometry.txt', together with 'dge_normalized.txt', for selecting the 20/40/60 genes?
Thanks
Yes, 2) seems OK. Remember not to overfit, as gene patterns are also important.
P
Thanks for your quick reply. So once the genes are selected, is it valid to do 2)?
(Correction to the previous post: the files are bdtnp.txt and geometry.txt, not bdtnp.csv and geometry.csv)
No, you have to select your 20/40/60 genes without using any other information from geometry.csv
P
Hi Pablo,
Is it valid to
1. use 'bdtnp.csv' and 'geometry.csv' to select the 20/40/60 genes?
2. learn a model using 'bdtnp.csv' and 'geometry.csv' to predict locations for the 'dge' files, with only the 20/40/60 genes selected above included in the model?
Thanks.
The way you are thinking about it seems right to me.
Thanks