Hi @abby.vanderlinden,

Thank you for your reply. After looking more closely at the headers of some of the .bam files, I was able to figure out that they were already sorted using samtools and that the duplicates were marked using Picard. I used the following command to view the header:

```
samtools head R#######.bam
```

The header for sample R#######.bam looks like this:

```
@HD VN:1.4 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000249.1 LN:38502
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@SQ SN:NC_007605 LN:171823
@RG ID:50302428 PL:ILLUMINA LB:50302428_541 SM:50302428
@PG ID:MarkDuplicates PN:MarkDuplicates VN:1.738(86a30760afd8c3002421e207b7557896544a3805_1406042774) CL:picard.sam.MarkDuplicates INPUT=[bam/50302428_sorted.bam] OUTPUT=bam/50302428.bam METRICS_FILE=/dev/null REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
```

Toward the bottom, it shows that the file has already been sorted and has had its duplicates marked, as follows:

```
CL:picard.sam.MarkDuplicates INPUT=[bam/50302428_sorted.bam] OUTPUT=bam/50302428.bam
```

In that case, I think I've answered my own question. I also figured out how to remove the PCR duplicates afterward, while still retaining a fairly large file, by using the following command:

```
samtools rmdup -s R#######.bam R#######.nd.bam
```

If Drs. Xu or Klein can add any more information I need to know about these (?) .bam files before I proceed with other peak-calling tools, that would be great. Otherwise, thank you for your reply and have a good rest of your week.

-Phoebe
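
For anyone checking many of these files, the header inspection described above could be condensed into a small helper. This is only a sketch: `summarize_header` is a hypothetical name, and it assumes the header text is tab-delimited (as in a real SAM/BAM header) and arrives on standard input, for example from `samtools view -H`.

```shell
# Hypothetical helper: read SAM header text on stdin and print the
# sort order (SO: tag of @HD) and program names (PN: tag of each @PG).
summarize_header() {
  awk -F'\t' '
    /^@HD/ { for (i = 2; i <= NF; i++) if ($i ~ /^SO:/) print "sort_order=" substr($i, 4) }
    /^@PG/ { for (i = 2; i <= NF; i++) if ($i ~ /^PN:/) print "program=" substr($i, 4) }
  '
}

# Intended usage (assumes samtools is installed; not run here):
#   samtools view -H R#######.bam | summarize_header
```

A `sort_order=coordinate` line together with a `program=MarkDuplicates` line would match what the header above shows.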

Hi there,

Unfortunately, I don't have more details on the ROSMAP ChIPseq data, but hopefully someone from the Rush or Columbia teams can help. I see @xujishu uploaded the bam files originally, and @haklein's manuscript is cited in the reference. Drs. Xu or Klein, do you have any additional information you can share to help Phoebe with this ChIPseq data?

Thanks,
Abby

Clarification needed for syn5958425 BAM files