Hi @nrappapo,
Reposting your question here on the community forum.
```
Hi,
I'm looking at what I believe to be the raw counts from the Mayo clinic temporal cortex samples: Mayo_TCX_all_counts_matrix.txt, downloaded from here:
https://www.synapse.org/#!Synapse:syn12104376
The first 4 lines are the STAR output for the nonspecific binding (unmapped, multimapping, noFeature, ambiguous) and then, if I understand correctly, the specific counts per gene for each sample.
However, when I look at the counts, I am a little confused, as the numbers for the non specific binding are much higher. For example, for the first sample, 1005_TCX:
N_unmapped 2675822
N_multimapping 17597859
N_noFeature 6860299
N_ambiguous 3245162
But if I sum the counts for all genes in this sample, I get 129670, which is quite a low percentage. As far as I read online, this doesn't look right, and I was wondering if you had any insight about it.
Also, do you know how that differs from the way this file was obtained:
https://www.synapse.org/#!Synapse:syn4650257
```
For the first part of the question: looking at [syn8690799](https://www.synapse.org/#!Synapse:syn8690799), I get 40,185,486 counts (sum of rows 6:60730) for patient 1005_TCX. Do you mind sharing your code and confirming the file synID?
For the second part, the provenance for [syn8690799](https://www.synapse.org/#!Synapse:syn8690799) indicates that the counts were combined with [syn9757876](https://www.synapse.org/#!Synapse:syn9757876) and tabulated across the BAM files with this shell script: [run star mayo](https://www.synapse.org/#!Synapse:syn9757879). The input BAM files are here: [syn4894912](https://www.synapse.org/#!Synapse:syn4894912). The file MayoRNAseq_RNAseq_TCX_geneCounts.tsv ([syn4650257](https://www.synapse.org/#!Synapse:syn4650257)) doesn't appear to have any provenance associated with it, so I'm not sure how the counts were tabulated. It was created by @bheavner, so perhaps they can point us in the right direction!
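For comparison, here is a minimal R sketch of one way such a per-sample total could be computed. The inline table is toy data: only the four `N_*` values come from the question, the gene rows are invented, and the layout (a `feature` column followed by one column per sample) is an assumption about the file rather than something confirmed in this thread.

```r
# Toy stand-in for the counts matrix (tab-separated). The gene rows and
# the "feature" column name are invented for illustration.
toy <- "feature\t1005_TCX
N_unmapped\t2675822
N_multimapping\t17597859
N_noFeature\t6860299
N_ambiguous\t3245162
geneA\t120
geneB\t0
geneC\t35"

# check.names = FALSE keeps the sample name "1005_TCX" as-is instead of
# letting R rewrite it to "X1005_TCX".
counts <- read.table(text = toy, header = TRUE, sep = "\t",
                     stringsAsFactors = FALSE, check.names = FALSE)

# STAR's summary rows start with "N_"; keep them out of the gene total.
is_summary <- grepl("^N_", counts$feature)

gene_total    <- sum(counts[!is_summary, "1005_TCX"])
summary_total <- sum(counts[is_summary, "1005_TCX"])
```

Selecting rows by name pattern rather than by position avoids off-by-one mistakes if the number of header or summary rows differs between files.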
best,
Jake
Created by Jake Gockley jgockley
Hi, I am also wondering where the file MayoRNAseq_RNAseq_TCX_geneCounts.tsv (syn4650257) comes from and how it differs from syn8690799. Is there any follow-up?
Thanks
Giulio
Thank you, Jake!
I found my bug (R newbie here...), so in case someone ever faces the same issue: the solution was adding `stringsAsFactors = FALSE` to `read.table`. Without it, the columns loaded as factors, and converting them to numeric values gave wrong numbers. All good now!
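For anyone hitting the same thing, a small R sketch of why the conversion goes wrong. It assumes the count column ended up as a factor. Before R 4.0.0, `read.table` turned character columns into factors by default; from 4.0.0 on the default is `stringsAsFactors = FALSE`. The values below are illustrative only.

```r
# A count column that was (wrongly) stored as a factor.
x <- factor(c("2675822", "17597859", "120", "120"))

# as.numeric() on a factor returns the internal level codes,
# not the numbers printed in the labels.
codes <- as.numeric(x)

# Converting through character first recovers the actual values.
values <- as.numeric(as.character(x))
```

Here `codes` holds small integers (the positions of each label among the sorted levels), so a sum over them comes out far lower than the true total, while `values` holds the real counts.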