Hi there,
I am a little bit confused about the dataset.
For example, we have a count matrix named `DS1C` processed from the scRNA-seq experiment. We get `DS1C_p10k` after downsampling by reads (to 10k in total).
Is any further processing applied to generate the input files for testing, such as randomly masking 50% of the cells to simulate dropout (`DS1C_p10k_p50mask`)?
Would the input/ground-truth pair be
- `DS1C_p10k_mask` - `DS1C_p10k`,
- `DS1C_p10k_mask` - `DS1C`,
or
- `DS1C_p10k` - `DS1C`?
Thanks so much!
Created by zoradeng

Hi @zoradeng,
For Task 1, the testing data in the `/input` folder were generated by downsampling the ground truth data either by reads or by cells (at different proportions). The ground truth data had been filtered to remove cells and genes with 0 counts, as well as mitochondrial genes. The downsampled input files are therefore organized as shown below. More details on how the data was prepared can be found on the [Data > Task 1: scRNA-seq wiki page](https://www.synapse.org/#!Synapse:syn26720920/wiki/620137).
```
/input/
├─ ds1c_p00625_n1.csv
├─ ds1c_p0125_n2.csv
├─ ds1c_p025_n3.csv
├─ ds1c_p07_n1.csv
├─ ...
├─ ds1c_p20k_n3.csv
```
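To make the preparation steps described above more concrete, here is a minimal Python sketch of the filtering and the two kinds of downsampling. It is not the challenge's actual preparation script. The function names, the genes x cells orientation, the boolean `is_mito` mask, and the reading of "downsampling by reads" as drawing a fixed total number of reads without replacement from the whole matrix are all assumptions made for illustration; the wiki page remains the reference for the real procedure.

```python
import numpy as np


def filter_counts(counts, is_mito):
    """Drop mitochondrial genes, then genes and cells with zero total counts.

    counts  : integer array, genes x cells (orientation is an assumption)
    is_mito : boolean array with one entry per gene (hypothetical input)
    """
    counts = counts[~np.asarray(is_mito), :]
    counts = counts[counts.sum(axis=1) > 0, :]   # remove all-zero genes
    counts = counts[:, counts.sum(axis=0) > 0]   # remove all-zero cells
    return counts


def downsample_reads(counts, target_total, rng):
    """Draw target_total reads without replacement from the whole matrix."""
    flat = counts.ravel().astype(np.int64)
    if target_total >= flat.sum():
        return counts.copy()
    sampled = rng.multivariate_hypergeometric(flat, target_total)
    return sampled.reshape(counts.shape)


def downsample_cells(counts, n_cells, rng):
    """Keep a random subset of n_cells columns, preserving their order."""
    keep = np.sort(rng.choice(counts.shape[1], size=n_cells, replace=False))
    return counts[:, keep]
```

Under these assumptions, a downsampled matrix never has a larger count than the filtered ground truth in any entry, so the input is always a thinned version of the ground truth.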
Thank you for your question! I hope it helps.