I am wondering if there is more information for batches in syn27034471.
Currently there are two kinds of batch information provided for the entry, libraryBatch and sequencingBatch. For the former, 2,187 of the 2,826 rows have the column empty. For the latter, 1,194 rows have "NYGC3", while each of the other batches has fewer than 300 subjects. Are there any further batches within the NYGC3 sequencing batch? Should we use sequencingBatch for batch correction?
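For anyone who wants to reproduce these counts, here is a minimal pandas sketch. It assumes the metadata has already been loaded into a DataFrame with `libraryBatch` and `sequencingBatch` columns; the function name `summarize_batches` and the toy rows are made up for illustration and are not the real syn27034471 data.

```python
import pandas as pd


def summarize_batches(meta: pd.DataFrame) -> dict:
    """Count empty libraryBatch cells and tabulate sequencingBatch sizes."""
    lib = meta["libraryBatch"]
    # Treat both NaN/None and blank strings as "empty".
    missing = lib.isna() | (lib.astype(str).str.strip() == "")
    return {
        "n_rows": len(meta),
        "libraryBatch_missing": int(missing.sum()),
        "sequencingBatch_counts": meta["sequencingBatch"]
        .value_counts(dropna=False)
        .to_dict(),
    }


# Toy rows for illustration only (hypothetical values).
toy = pd.DataFrame(
    {
        "libraryBatch": [None, "", "A", " "],
        "sequencingBatch": ["NYGC3", "NYGC3", "9", "9"],
    }
)
print(summarize_batches(toy))
```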
Thanks so much for the response.
Created by Qi Wang qwang178
Ah, you're right, I misread my dataframe groupings. But yes, the sequencing batches are nested within the contribution batches.
Hi Abby,
Thanks for the reply. The 1,194 NYGC3 samples are actually all from data contribution batch 2. For data contribution batches 2-4, the corresponding sequencing batch distributions are as follows:

| Samples | sequencingBatch | Data contribution batch |
|---|---|---|
| 1194 | NYGC3 | 2 |
| 13 | NYGC1 | 2 |
| 273 | 9 | 2 |
| 94 | NYGC2 | 2 |
| 120 | RISK_3 | 3 |
| 136 | RISK_4 | 3 |
| 93 | RISK_2 | 3 |
| 168 | 15 | 4 (ROSMAP_CognitiveResilience) |
| 80 | 16 | 4 (ROSMAP_CognitiveResilience) |

So it looks like the sequencing batches are fully nested within the data contribution batches: each sequencing batch belongs to exactly one contribution batch, so adding the data contribution batch to them won't change the final batch structure. Nevertheless, thanks for the information; we'll use the sequencing batch for batch correction.
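In case it helps anyone checking the same thing, here is a rough sketch of the nesting check, rebuilt from the counts listed above. The column name `contributionBatch` and the helper `is_nested` are assumptions for illustration; in the real metadata the contribution batch would first have to be parsed out of the "notes" column.

```python
import pandas as pd

# (samples, sequencingBatch, data contribution batch), copied from the counts above.
counts = [
    (1194, "NYGC3", "2"), (13, "NYGC1", "2"), (273, "9", "2"), (94, "NYGC2", "2"),
    (120, "RISK_3", "3"), (136, "RISK_4", "3"), (93, "RISK_2", "3"),
    (168, "15", "4"), (80, "16", "4"),
]

# Expand the counts into one row per sample.
meta = pd.DataFrame(
    [(seq, contrib) for n, seq, contrib in counts for _ in range(n)],
    columns=["sequencingBatch", "contributionBatch"],  # second name is hypothetical
)


def is_nested(df: pd.DataFrame, inner: str, outer: str) -> bool:
    """True if every `inner` label occurs under exactly one `outer` label."""
    return bool((df.groupby(inner)[outer].nunique() <= 1).all())


nested = is_nested(meta, "sequencingBatch", "contributionBatch")
n_seq = meta["sequencingBatch"].nunique()
n_combined = meta.groupby(["contributionBatch", "sequencingBatch"]).ngroups
print(nested, n_seq, n_combined)
```

When the nesting holds, the combined grouping has the same number of levels as `sequencingBatch` alone, which is why adding the contribution batch does not change the batch structure for these samples.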
Qi
Hi Qi,
The original ROSMAP RNAseq data was provided at the very beginning of the AMP-AD project, before we had standardized metadata. We've done our best to retrofit it, but there are some areas where it is still a little vague. The "notes" column in the metadata for this study contains information on the "data contribution batch" or "data cut", i.e. which of the four times data was submitted to the AD portal. For the specimens in data contribution batch 1, the sequencingBatch number is the actual batch the samples were sequenced in. For data contribution batches 2-4, the sequencingBatch value is the same for all samples from that contribution batch, and so is not likely indicative of the actual sequencing batch.
Descriptions of library prep and sequencing parameters for the ROSMAP bulk brain RNAseq can be found on the folder wikis [here](https://www.synapse.org/#!Synapse:syn3388564) and [here](https://www.synapse.org/#!Synapse:syn24175554). I know our internal scientists working on the harmonized DEG analysis across these three studies use a combination of data contribution batch + sequencingBatch to create a "final batch" parameter, which is then used for batch correction.
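As a rough illustration of what such a combined parameter could look like (this is only a sketch, not the internal team's actual code), the two labels can be concatenated per sample. The column `contributionBatch`, the output column `finalBatch`, the `"unknown"` placeholder and the function name are all assumptions; the contribution batch would need to be extracted from the "notes" column beforehand, and that parsing is not shown because the format of the notes is not described here.

```python
import pandas as pd


def add_final_batch(
    meta: pd.DataFrame,
    contribution_col: str = "contributionBatch",  # hypothetical, parsed from "notes"
    sequencing_col: str = "sequencingBatch",
    sep: str = "_",
) -> pd.DataFrame:
    """Return a copy of `meta` with a combined 'finalBatch' label per sample."""

    def as_label(value) -> str:
        # Missing values get an explicit placeholder instead of silently becoming "nan".
        return "unknown" if pd.isna(value) else str(value)

    out = meta.copy()
    out["finalBatch"] = (
        out[contribution_col].map(as_label) + sep + out[sequencing_col].map(as_label)
    )
    return out


# Toy rows for illustration only (hypothetical values).
toy_meta = pd.DataFrame(
    {
        "contributionBatch": [1, 2, 2],
        "sequencingBatch": ["5", "NYGC3", None],
    }
)
print(add_final_batch(toy_meta)["finalBatch"].tolist())
```

The resulting `finalBatch` column can then be passed as the batch covariate to whichever batch-correction tool is being used.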
Abby