I would like to download the bulk brain RNAseq data from the ROSMAP dataset.
As far as I've seen, only processed files are available now (BAMs, and FASTQs regenerated from those BAMs).
Are the unmapped reads preserved separately anywhere? Would it be possible to access them?
Thank you in advance for your reply.
Created by Javier Simón-Sánchez simonsanchezj

Thank you!

They were kept for the MayoRNAseq and MSBB studies as well. For MSBB they are in separate files.

Thank you so much!! @ben.logsdon Were they kept for the MayoRNAseq and MSBB studies as well?

Thank you!

The unaligned original reads were kept in the ROSMAP BAM files.

@millerh1 Thank you, I'll let you know when I test this.

I have not yet -- I think that they used bwa, so if they ran it with default settings I think unmapped reads will be present in the BAMs with a quality score of 0 and can be extracted using a samtools flag. [Relevant discussion](https://www.biostars.org/p/227079/). I haven't had a chance to test this yet with these files though.

I also have this question - @millerh1 did you find out whether the original reads were kept in the BAM files?

Thank you!

Thank you for looking into it! Do you have the information about what settings were used to run Bowtie during read alignment? Perhaps @larssono or @leiyu could chime in, since they appear to be the main contributors to the project dataset. Otherwise I do not know how to locate this information.

Those are the only files available.

I apologize for not being clearer -- the issue with those BAM files is that they were aligned with Bowtie in a manner which may have filtered out many reads of interest (I say 'may' because I cannot seem to locate any information about what settings were used in Bowtie for this study). They could be considered 'processed' BAM files, meaning that you are unable to extract the original reads from them. However, 'raw' BAM files would contain all the original reads and could be converted back to the original FASTQ files. Many studies for which only BAM files are available offer a set of 'raw' BAM files to allow users to recover the original FASTQ reads. I am wondering if this study offers any sort of access to 'raw' BAM files so that users can recover the original FASTQ reads.

The BAM files are available through the ROSMAP study: syn3388564

Thank you! I was wondering then, if the original FASTQ files are unavailable, are there 'raw' BAM files saved somewhere which did not undergo a filtering step, such that repetitive genomic elements and fusion reads would be included in them? We would likely be able to extract the original unfiltered reads from those if they exist.

@millerh1 and @simonsanchezj, the original FASTQ files for this study are not available, which is what necessitated the FASTQ regeneration.

I have the same question -- have you gotten an answer to this yet?

@Mette, thank you. Are there any unmapped FASTQs remaining from the very first processing and BAM file generation?
I would love to have a look at all reads, multimappers included, if it is possible. As I currently understand it, the BAM files (syn3388564) contain only uniquely, unambiguously mapped reads?

The data available are the BAM files generated by the contributor (syn3388564) and the regenerated FASTQ files from the rnaSeqReprocessing study (syn9702085).
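
For anyone who wants to try the samtools flag mentioned in the replies above, here is a minimal sketch of how the check could look. It is untested on the actual ROSMAP files. In the SAM format, bit 0x4 of the FLAG field marks a read as unmapped, and `samtools view -f 4` keeps only reads with that bit set. The file names and helper function names below are hypothetical, and whether a given BAM still contains unmapped reads depends on how the contributor ran the aligner.

```python
from typing import Iterable, List

# SAM FLAG bit 0x4 means "segment unmapped" (per the SAM specification).
UNMAPPED_FLAG = 0x4


def is_unmapped(flag: int) -> bool:
    """Return True if the SAM FLAG value has the unmapped bit set."""
    return bool(flag & UNMAPPED_FLAG)


def count_unmapped(sam_lines: Iterable[str]) -> int:
    """Count unmapped records in SAM text, e.g. the output of `samtools view`."""
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and is_unmapped(int(fields[1])):
            total += 1
    return total


def count_unmapped_command(bam_path: str) -> List[str]:
    """Build `samtools view -c -f 4 <bam>`, which prints the number of unmapped reads."""
    return ["samtools", "view", "-c", "-f", "4", bam_path]


def extract_unmapped_command(bam_path: str, out_path: str) -> List[str]:
    """Build `samtools view -b -f 4 -o <out> <bam>`, which writes unmapped reads to a new BAM."""
    return ["samtools", "view", "-b", "-f", "4", "-o", out_path, bam_path]
```

If samtools is installed, either command list can be passed to `subprocess.run`. A count of zero from the first command would suggest that the unmapped reads were dropped from that file.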