Hello,
Our group downloaded and processed ROSMAP RNA-seq fastq files for samples in batch1 (DLPFC bulk tissue) in 2018. Now there are many more samples added to batch1 (before we had a little over 400 samples, and now batch 1 has a little over 600 samples).
However, the samples that we have not processed yet do not have fastq files, and instead have tophat aligned bam files. Can we get access to the raw fastq files for all batch1 DLPFC samples? Is there anyone we can contact for this data?
We would like this in order to process all samples consistently. We are finding that running bam to fastq software on the tophat aligned bam files is computationally intensive, and the resulting fastq files show discrepancies (they seem to be missing some reads).
Thank you.
Created by Grace Xiao gracexiao99

Hello,
Could you please help me understand the ROSMAP data available in the RNA-seq Harmonization Study that you linked? Specifically, I am interested in the bam files here: https://www.synapse.org/#!Synapse:syn8540863
It appears that for ROSMAP there are 639 bam files after reprocessing. Are these all from different brain regions? Also, since there are many more samples in the original ROSMAP cohort (even for 1 brain region), is there a specific reason why these 639 samples were chosen?
Thank you.
Hello,
Thank you for your help. The Synapse ID for the bam file I am referring to is syn4212541. I found that re-running the bam-to-fastq conversion with bedtools and with samtools does indeed give me the same results and number of reads (42.6 million), while the fastq file I downloaded a while ago had 22.1 million reads, which were a subset of the reads in the current tophat bam files.
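For anyone who wants to repeat this kind of check, here is a minimal Python sketch of how the "subset" comparison could be done. It is only an illustration: the function names `read_id_counts` and `compare_fastqs` are made up for this post, it assumes standard 4-line FASTQ records, and it assumes read names match once a trailing `/1` or `/2` mate suffix is removed.

```python
import gzip
from collections import Counter


def read_id_counts(fastq_path):
    """Count how often each read ID occurs in a FASTQ file (plain or gzipped).

    Assumes well-formed 4-line records (header, sequence, '+', qualities).
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # header line of each record
                # Drop the leading '@' and any description after whitespace.
                name = line[1:].split()[0]
                # Drop a trailing mate suffix so both naming styles compare equal.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                counts[name] += 1
    return counts


def compare_fastqs(old_path, new_path):
    """Summarize record counts and read ID overlap between two FASTQ files."""
    old, new = read_id_counts(old_path), read_id_counts(new_path)
    return {
        "old_records": sum(old.values()),
        "old_unique": len(old),
        "new_records": sum(new.values()),
        "new_unique": len(new),
        "shared_unique": len(old.keys() & new.keys()),
        "old_only": len(old.keys() - new.keys()),
    }
```

If `old_only` is 0, every read in the older file is also present in the converted one. A `new_records` value well above `new_unique` would suggest the same read was written out more than once during conversion. Note that this holds all read IDs in memory, which may be heavy for files with tens of millions of reads.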
I understand that you do not have the ability to share the original fastq files. I will use the harmonized grch38 bam files instead. Thank you.

Hi @gracexiao99,
Could you post the Synapse ID for one of the BAM files for which you are finding this discrepancy? I want to make sure that 1) we aren't accidentally sharing a second set of processed BAM files and 2) there isn't an issue with the files. Unfortunately, we do not have access to the original FastQ files, but the BAM files should contain all of the reads that you need. I suspect that these discrepancies are due to differences in the behavior of bedtools vs samtools (whether the conversion retains unaligned reads, secondary alignments, etc.), but I will take a look and see if I find anything.
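One way to test this suspicion is to tally the BAM records by SAM flag before converting. The sketch below is a rough illustration rather than a definitive diagnosis: `tally_sam_flags` is a hypothetical helper name, it expects SAM-formatted text lines (for example, the output of `samtools view`), and whether secondary or unmapped records actually explain the difference for these particular files is an assumption that would need to be checked.

```python
from collections import Counter

# Flag bits as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def tally_sam_flags(sam_lines):
    """Count alignment records by category from SAM-formatted text lines."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        counts["records"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        elif flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        else:
            # Neither secondary nor supplementary: one such record per original read.
            counts["primary"] += 1
            if flag & UNMAPPED:
                counts["unmapped"] += 1
    return counts
```

If `records` is much larger than `primary`, a converter that does not skip secondary or supplementary records would emit some reads more than once, which could inflate the converted read count relative to the original FastQ.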
Best,
Will

Hi Grace, I'm still waiting for input from a colleague on the read discrepancies between your fastqs and the bams, but in the meantime, we recently released harmonized GRCh38-aligned bams and raw gene counts data from the ROSMAP, MSBB, and Mayo RNAseq data. The original ROSMAP bams you asked about were part of this reprocessing effort. See [The RNAseq Harmonization Study](https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage?Study=syn21241740).

Thank you so much for your reply.
Is there a way to get fastq files for all samples in batch 1?
Here is an example of a discrepancy I see:
Originally, I had the fastq files for the sample labeled "x", which contain 22 million paired reads. When I download the tophat bam file for the same sample and run bam to fastq using 2 separate methods (bedtools and samtools), they give me 41.2 million reads and 42.6 million reads, respectively. This makes it very difficult to make the original fastq files comparable to the new fastq files.
It would be very helpful to have the original fastq files for all the data.
Thank you.

Hi there, unfortunately for batch 2 (syn21188662) of the ROSMAP bulk RNAseq, we only have bam files. @wpoehlm, can you help advise on the discrepancies after conversion to fastq?