I remapped the ROSMAP RNAseq BAMs from "ROSMAP RNAseq BAM files" to GRCh38. I noticed that
150_120419_0_merged.bam syn4212582
and
150_120419.bam syn4212581
are the same file. In my processing this sample has ~74M uniquely mapped reads in genes.
However, in the reprocessed data from "AMP-AD human RNAseq re-processed data", with counts given in
rosmap_counts_matrix.txt syn6112813
this sample 150_120419 has ~145M uniquely mapped reads in genes, which is about twice as many as I calculated.
Did syn6112813 merge the data from syn4212581 and syn4212582, essentially double-counting this sample, or is one of syn4212581 and syn4212582 the wrong file? The rest of the samples I processed have total counts similar to those given in the reprocessed data. The next-largest total count among these samples is ~80M.
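For anyone who wants to run the same sanity check, here is a minimal sketch of how per-sample totals could be compared to spot a sample that looks double-counted (~145M is roughly twice ~74M). It assumes the counts matrix is tab-delimited, with a header row of sample IDs and one feature ID per row in the first column. That layout, the `N_` prefix used to skip summary rows, and the 1.5x threshold are assumptions for illustration, not details confirmed for syn6112813.

```python
import csv
import io
from statistics import median


def total_counts_per_sample(handle, skip_prefixes=("N_",)):
    """Sum counts per sample column of a tab-delimited matrix.

    Assumed layout: header row = feature column name + sample IDs,
    each following row = feature ID + one count per sample.
    Rows whose ID starts with any of skip_prefixes are ignored
    (assumption: summary rows such as unmapped-read tallies).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = dict.fromkeys(samples, 0.0)
    for row in reader:
        if not row or row[0].startswith(skip_prefixes):
            continue
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return totals


def flag_inflated(totals, ratio=1.5):
    """Return samples whose total exceeds ratio * median total."""
    mid = median(totals.values())
    return [s for s, t in totals.items() if t > ratio * mid]


# Toy matrix: sample C has about twice the counts of A and B.
toy = "feature\tA\tB\tC\nN_unmapped\t5\t5\t5\ng1\t10\t12\t21\ng2\t30\t28\t59\n"
print(flag_inflated(total_counts_per_sample(io.StringIO(toy))))  # ['C']
```

On a real file, the same functions could be called with `open(path, newline="")` in place of the `io.StringIO` handle.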
Created by Derek Drake ddrake

Hi @Raymond2018 - that does indeed look like a duplicate sample. The RNAseq reprocessing was done by converting the BAMs to FASTQ and then realigning. That may explain the small difference in the realigned data.

Hi, Derek & Dang,
I ran into a similar question with RNASeqReprocessing:
syn11253004: originated from 150_120419.bam [syn4212581]
syn11253751: originated from 150_120419_0_merged.bam [syn4212582]
Judging from the MD5 checksums, the two original BAM files should indeed be the same.
However, the quantification files from RNASeqReprocessing show small differences.
Could anyone here help explain this discrepancy?
Regards,
Raymond Yuan SHANG
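
The MD5 comparison mentioned above can be reproduced with a short script. This is a minimal sketch using only the Python standard library. The function names and the example file paths are hypothetical, and it assumes both BAM files have already been downloaded locally.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that large BAM files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True if the two files have identical MD5 digests."""
    return md5sum(path_a) == md5sum(path_b)


# Hypothetical usage with local copies of the two files:
# same_content("150_120419.bam", "150_120419_0_merged.bam")
```

Matching digests only show that the input files are byte-identical. They say nothing about whether downstream quantification of each copy will match exactly.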
Dear Derek,
Apologies for the delay; we have been looking into some issues with the reprocessed data. It is probably the case that the counts in syn6112813 are duplicated from the two copies of 150_120419.bam. However, this file is now deprecated and will be replaced with a new one to be released here: syn7554308. We will also remove one of the copies of 150_120419.bam. Thanks for bringing this to our attention.

Did this ever get resolved?
Thanks,
Derek

Hi Derek,
We are looking into this and will get back to you shortly.
Thanks,
Ben