Hi @abby.vanderlinden,

Thank you for your reply. After looking more closely at the headers of some of the .bam files, I think I was able to figure out that they were already sorted using samtools and that the duplicates were marked using Picard. I used the following command to view the header:

```
samtools head R#######.bam
```

The header looks like this for sample R#######.bam:

```
@HD VN:1.4 SO:coordinate
@SQ SN:1 LN:249250621
@SQ SN:2 LN:243199373
@SQ SN:3 LN:198022430
@SQ SN:4 LN:191154276
@SQ SN:5 LN:180915260
@SQ SN:6 LN:171115067
@SQ SN:7 LN:159138663
@SQ SN:8 LN:146364022
@SQ SN:9 LN:141213431
@SQ SN:10 LN:135534747
@SQ SN:11 LN:135006516
@SQ SN:12 LN:133851895
@SQ SN:13 LN:115169878
@SQ SN:14 LN:107349540
@SQ SN:15 LN:102531392
@SQ SN:16 LN:90354753
@SQ SN:17 LN:81195210
@SQ SN:18 LN:78077248
@SQ SN:19 LN:59128983
@SQ SN:20 LN:63025520
@SQ SN:21 LN:48129895
@SQ SN:22 LN:51304566
@SQ SN:X LN:155270560
@SQ SN:Y LN:59373566
@SQ SN:MT LN:16569
@SQ SN:GL000207.1 LN:4262
@SQ SN:GL000226.1 LN:15008
@SQ SN:GL000229.1 LN:19913
@SQ SN:GL000231.1 LN:27386
@SQ SN:GL000210.1 LN:27682
@SQ SN:GL000239.1 LN:33824
@SQ SN:GL000235.1 LN:34474
@SQ SN:GL000201.1 LN:36148
@SQ SN:GL000247.1 LN:36422
@SQ SN:GL000245.1 LN:36651
@SQ SN:GL000197.1 LN:37175
@SQ SN:GL000203.1 LN:37498
@SQ SN:GL000246.1 LN:38154
@SQ SN:GL000249.1 LN:38502
@SQ SN:GL000196.1 LN:38914
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000244.1 LN:39929
@SQ SN:GL000238.1 LN:39939
@SQ SN:GL000202.1 LN:40103
@SQ SN:GL000234.1 LN:40531
@SQ SN:GL000232.1 LN:40652
@SQ SN:GL000206.1 LN:41001
@SQ SN:GL000240.1 LN:41933
@SQ SN:GL000236.1 LN:41934
@SQ SN:GL000241.1 LN:42152
@SQ SN:GL000243.1 LN:43341
@SQ SN:GL000242.1 LN:43523
@SQ SN:GL000230.1 LN:43691
@SQ SN:GL000237.1 LN:45867
@SQ SN:GL000233.1 LN:45941
@SQ SN:GL000204.1 LN:81310
@SQ SN:GL000198.1 LN:90085
@SQ SN:GL000208.1 LN:92689
@SQ SN:GL000191.1 LN:106433
@SQ SN:GL000227.1 LN:128374
@SQ SN:GL000228.1 LN:129120
@SQ SN:GL000214.1 LN:137718
@SQ SN:GL000221.1 LN:155397
@SQ SN:GL000209.1 LN:159169
@SQ SN:GL000218.1 LN:161147
@SQ SN:GL000220.1 LN:161802
@SQ SN:GL000213.1 LN:164239
@SQ SN:GL000211.1 LN:166566
@SQ SN:GL000199.1 LN:169874
@SQ SN:GL000217.1 LN:172149
@SQ SN:GL000216.1 LN:172294
@SQ SN:GL000215.1 LN:172545
@SQ SN:GL000205.1 LN:174588
@SQ SN:GL000219.1 LN:179198
@SQ SN:GL000224.1 LN:179693
@SQ SN:GL000223.1 LN:180455
@SQ SN:GL000195.1 LN:182896
@SQ SN:GL000212.1 LN:186858
@SQ SN:GL000222.1 LN:186861
@SQ SN:GL000200.1 LN:187035
@SQ SN:GL000193.1 LN:189789
@SQ SN:GL000194.1 LN:191469
@SQ SN:GL000225.1 LN:211173
@SQ SN:GL000192.1 LN:547496
@SQ SN:NC_007605 LN:171823
@RG ID:50302428 PL:ILLUMINA LB:50302428_541 SM:50302428
@PG ID:MarkDuplicates PN:MarkDuplicates VN:1.738(86a30760afd8c3002421e207b7557896544a3805_1406042774) CL:picard.sam.MarkDuplicates INPUT=[bam/50302428_sorted.bam] OUTPUT=bam/50302428.bam METRICS_FILE=/dev/null REMOVE_DUPLICATES=false VALIDATION_STRINGENCY=LENIENT PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
```

Toward the bottom, the `@PG` record indicates that the file had already been sorted and had its duplicates marked:

```
CL:picard.sam.MarkDuplicates INPUT=[bam/50302428_sorted.bam] OUTPUT=bam/50302428.bam
```

In that case I think I've answered my own question. I also figured out how to remove the PCR duplicates afterward, while still retaining a fairly large file, by using the following command:

```
samtools rmdup -s R#######.bam R#######.nd.bam
```

If Drs. Xu or Klein can add any more information that I need to know about these .bam files before I proceed with other peak-calling tools, that would be great. Otherwise, thank you for your reply and have a good rest of your week.

-Phoebe
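
For anyone repeating this check on many files, the header inspection described above could be sketched as a small shell helper. This is only a sketch under assumptions: `check_header` is a hypothetical function name, not part of samtools or Picard, and it only reports what the header declares (the `SO:` tag on the `@HD` line and any `@PG` record whose `ID` or `PN` is `MarkDuplicates`). It does not verify the reads themselves.

```shell
#!/bin/sh
# check_header (hypothetical helper): read a SAM header on stdin and report
#   1. the sort order declared by the SO: tag on the @HD line
#   2. whether any @PG record has ID or PN starting with MarkDuplicates
check_header() {
    awk '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) so = substr($i, 4)
        }
        $1 == "@PG" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^(ID|PN):MarkDuplicates/) md = "yes"
        }
        END {
            print "sort_order=" (so == "" ? "unknown" : so)
            print "markduplicates=" (md == "" ? "no" : md)
        }
    '
}

# Usage (assumes samtools is on PATH; the file name is the placeholder from the post):
#   samtools view -H R#######.bam | check_header
# To count reads that carry the duplicate flag (0x400 = 1024):
#   samtools view -c -f 1024 R#######.bam
```

For the header shown in the post, this would report a coordinate sort order and a MarkDuplicates record. Comparing the `-f 1024` count before and after duplicate removal is one way to see how many reads were dropped.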

Hi there,

Unfortunately, I don't have more details on the ROSMAP ChIPseq data, but hopefully someone from the Rush or Columbia teams can help. I see that @xujishu originally uploaded the bam files, and @haklein's manuscript is cited in the reference.

Drs. Xu or Klein, do you have any additional information you can share to help Phoebe with this ChIPseq data?

Thanks,
Abby

