Thanks for releasing the data promptly. However, I have a few questions regarding the raw FASTQ files in syn53254216.
From a quick glance, there are at least 28 files with very small file sizes:
syn53282231 NPSAD-20201022-A1-cDNA_AGAATACAGG_HH7C2DSXY_L004_001.R1.fastq.gz fileSize=107
syn53281539 NPSAD-20201019-C2-cDNA_GAGTGACCTA_HMTHNDSXY_L003_001.R1.fastq.gz fileSize=116
syn53282232 NPSAD-20201022-A1-cDNA_AGAATACAGG_HH7C2DSXY_L004_001.R2.fastq.gz fileSize=117
syn53285392 NPSAD-20201029-C2-cDNA_AGGTGTCTGC_HNYM3DSXY_L004_001.R1.fastq.gz fileSize=117
syn53295063 NPSAD-20201104-C1-cDNA_TGGCATTCAC_HLNJLDSXY_L004_001.R1.fastq.gz fileSize=118
syn53281540 NPSAD-20201019-C2-cDNA_GAGTGACCTA_HMTHNDSXY_L003_001.R2.fastq.gz fileSize=139
syn53285393 NPSAD-20201029-C2-cDNA_AGGTGTCTGC_HNYM3DSXY_L004_001.R2.fastq.gz fileSize=147
syn53295064 NPSAD-20201104-C1-cDNA_TGGCATTCAC_HLNJLDSXY_L004_001.R2.fastq.gz fileSize=157
syn53285854 NPSAD-20201030-A2-cDNA_CTACTAGAGT_HLMHHDSXY_L004_001.R1.fastq.gz fileSize=227
syn53285855 NPSAD-20201030-A2-cDNA_CTACTAGAGT_HLMHHDSXY_L004_001.R2.fastq.gz fileSize=249
syn53282023 NPSAD-20201021-A1-cDNA_GAATGTTGTG_HH7C2DSXY_L002_001.R1.fastq.gz fileSize=255
syn53282024 NPSAD-20201021-A1-cDNA_GAATGTTGTG_HH7C2DSXY_L002_001.R2.fastq.gz fileSize=342
syn53281352 NPSAD-20201015-C2-cDNA_TATGCGTGAA_HH7C2DSXY_L002_001.R1.fastq.gz fileSize=445
syn53281353 NPSAD-20201015-C2-cDNA_TATGCGTGAA_HH7C2DSXY_L002_001.R2.fastq.gz fileSize=453
syn53284206 NPSAD-20201029-A1-cDNA_CTCCTGCCAC_HMTHNDSXY_L004_001.R1.fastq.gz fileSize=559
syn53284207 NPSAD-20201029-A1-cDNA_CTCCTGCCAC_HMTHNDSXY_L004_001.R2.fastq.gz fileSize=723
syn53282028 NPSAD-20201021-A2-cDNA_TGTCGGGCAC_HH7C2DSXY_L003_001.R1.fastq.gz fileSize=1497
syn53282029 NPSAD-20201021-A2-cDNA_TGTCGGGCAC_HH7C2DSXY_L003_001.R2.fastq.gz fileSize=2003
syn53283376 NPSAD-20201027-C1-cDNA_AGGACGAAAC_HLNJLDSXY_L003_001.R1.fastq.gz fileSize=2727
syn53283377 NPSAD-20201027-C1-cDNA_AGGACGAAAC_HLNJLDSXY_L003_001.R2.fastq.gz fileSize=3517
syn53283996 NPSAD-20201028-C1-cDNA_AATACAACGA_HLM3WDSXY_L001_001.R1.fastq.gz fileSize=3943
syn53283605 NPSAD-20201028-A1-cDNA_GCTTGTCGAA_HLM3WDSXY_L001_001.R1.fastq.gz fileSize=4272
syn53283997 NPSAD-20201028-C1-cDNA_AATACAACGA_HLM3WDSXY_L001_001.R2.fastq.gz fileSize=5203
syn53283606 NPSAD-20201028-A1-cDNA_GCTTGTCGAA_HLM3WDSXY_L001_001.R2.fastq.gz fileSize=5482
syn53280952 NPSAD-20201013-A1-cDNA_CCGATGGTCT_HLM3WDSXY_L003_001.R1.fastq.gz fileSize=22677
syn53280953 NPSAD-20201013-A1-cDNA_CCGATGGTCT_HLM3WDSXY_L003_001.R2.fastq.gz fileSize=29780
syn53421304 NPSAD-158-C2-HTO_CTGAAGCT_HFWF5DSX2_L002_001.R1.fastq.gz fileSize=737526
syn53421305 NPSAD-158-C2-HTO_CTGAAGCT_HFWF5DSX2_L002_001.R2.fastq.gz fileSize=1421528
I am wondering whether any of these files could be truncated. The MD5 checksums match, but the downloaded FASTQ files are very short.
How do we make sure the remaining files are correct too?
Thanks,
Created by Qi Wang qwang178
That file needs to be updated. I sent the updated version to Sage earlier today so it should be fixed soon. I've sent it directly to you just now via e-mail. Briefly, each pool has been produced in replicate, so each sample is associated with 2 x 4 files (R1/R2 HTO and R1/R2 cDNA).
@bendl Thank you for the clarification. Another question about the metadata: in syn53446136 (NPS-AD_nucleus_hashing_metadata.csv), the 5th and 7th columns should list R2 reads, but they are all recorded as R1 in the table. Is this a typo?
Hello Qi,
Those files are not truncated - they are fine in terms of file integrity. However, they are associated with a small fraction of pools that failed due to various technical reasons (wet lab and/or sequencing center). We aimed to release all raw data "as is" so we included them in the data release.
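For anyone who wants to confirm file integrity locally, here is a minimal Python sketch, not the procedure used for the release. It assumes the files are already downloaded; the function names `md5_of_file`, `count_fastq_records`, and `check_fastq` are made up for illustration. A matching MD5 only shows that the download equals the uploaded file, so the sketch also decompresses the whole gzip stream, which fails if the stream is truncated, and checks that the line count is a multiple of 4.

```python
import gzip
import hashlib
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_fastq_records(path):
    """Decompress the whole file and return the number of FASTQ records.

    Reading to the end forces gzip to validate the stream, so a truncated
    file raises an exception instead of silently returning fewer reads.
    """
    lines = 0
    with gzip.open(path, "rb") as handle:
        for _ in handle:
            lines += 1
    if lines % 4 != 0:
        raise ValueError(f"{lines} lines is not a multiple of 4")
    return lines // 4


def check_fastq(path):
    """Summarize whether a gzipped FASTQ file decompresses cleanly."""
    try:
        records = count_fastq_records(path)
    except (EOFError, OSError, zlib.error, ValueError) as exc:
        return {"ok": False, "records": None, "error": str(exc)}
    return {"ok": True, "records": records, "error": None}
```

A file from a failed pool would be expected to report `ok` as true with a very small record count, whereas a truncated download would report `ok` as false.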
While responding to your questions, I ran some checks on the FASTQ files and noticed that we omitted two files from the upload (NPSAD-20201013-A1-B-HTO_TCTCGCGC_HLM3WDSXY_L003_001.R2.fastq.gz, NPSAD-20201016-A1-HTO-B_AGCGATAG_HLMHHDSXY_L002_001.R2.fastq.gz). We are submitting them now, but it might take a few days before they show up here. Please make sure you download them as well once they become available.
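A check of this kind can be sketched in a few lines of Python. This is only an illustration, not the script used for the release: `find_missing_mates` is a hypothetical helper, and it assumes that every file name ends in `.R1.fastq.gz` or `.R2.fastq.gz`, as the names in this thread do. Given a listing of file names, it reports each mate file that would be expected but is absent.

```python
def find_missing_mates(file_names):
    """Return the sorted names of R1/R2 mate files absent from the listing."""
    names = set(file_names)
    missing = set()
    for name in names:
        for read, mate in (("R1", "R2"), ("R2", "R1")):
            suffix = f".{read}.fastq.gz"
            if name.endswith(suffix):
                # Swap the read label to get the expected mate file name.
                expected = name[: -len(suffix)] + f".{mate}.fastq.gz"
                if expected not in names:
                    missing.add(expected)
    return sorted(missing)


# Illustrative listing: one complete pair, plus an R1 file whose R2 mate
# is absent (assuming the R1 counterpart of the omitted file was uploaded).
listing = [
    "NPSAD-20201022-A1-cDNA_AGAATACAGG_HH7C2DSXY_L004_001.R1.fastq.gz",
    "NPSAD-20201022-A1-cDNA_AGAATACAGG_HH7C2DSXY_L004_001.R2.fastq.gz",
    "NPSAD-20201013-A1-B-HTO_TCTCGCGC_HLM3WDSXY_L003_001.R1.fastq.gz",
]
print(find_missing_mates(listing))
```

Running the same function over the full list of file names in the folder would flag any other R1 or R2 file that lacks its mate.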
Hi Qi Wang,
I'm not entirely certain why some of these files seem so small; the data contributor, Jaro Bendl (@bendl), is likely the best person to answer your questions.
Best,
Jessica