Hi, I am wondering whether anyone has information on the batch effect correction for these scRNAseq data. According to the Wiki and the paper (https://www.biorxiv.org/content/10.1101/2023.03.07.531493v1.full.pdf), "The following pipeline was executed on the RNA count matrix: normalization and scaling by SCTransform method (with variable.features.n=2000, conserve.memory=T, Seurat package version 445), dimensionality reduction by PCA (Seurat RunPCA, npcs=30), construction of k-NN neighbor graph (Seurat FindNeighbors, dims=1:30) and Louvain community detection clustering (Seurat FindClusters, resolution=0.2, algorithm=1)." There is nothing about how the 60+ batches of read counts from Cell Ranger were combined. Was Seurat's IntegrateLayers() or another tool used? Or were the count matrices from the different batches simply combined? I searched but could not find this information. Thank you very much! Li

Created by Li Sun rikku1983
@abby.vanderlinden @gilad.green @JessB Thank you very much for your replies, and thank you @gilad.green for the detailed explanation regarding this concern. This makes a lot more sense to me now. Thank you all again!
Hi @rikku1983, sorry for the delayed response! I did not use any integration tool; I simply merged the count matrices. Because any integration tool will "do" something to the data, we preferred to first check whether we had any serious batch effects and to correct only if so, rather than "correct" for something that might not be there. Overall, people from all batches had all major cell types, and we did not see any large differences by batch.
The question of whether there is a batch effect also relates to the subclustering analysis and its resolution. When I chose the final clustering structure, I also checked that there were no person- or batch-specific clusters. It is possible that if you took the data and clustered at a much higher resolution (or, as an extreme, were only interested in the k-NN graph), there would be splits by batch, and in that case you might want to perform some batch correction. For the clustering depth that we described, we didn't need that.
One other point to take into account with this decision is the batch design:
1. Each batch had 8 participants multiplexed in it.
2. Batches had 2 technical replicates (A/B).
3. Each batch had a mix of people with different characteristics (see Supp. Fig. 1a for the distribution of traits across the batches).
Hope this answers your question. Gilad Green
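For readers who want to run a similar sanity check on their own clustering, here is a minimal sketch of one way to look for batch-specific clusters. It is not the code used for the paper. The function names, the toy labels, and the 0.5 threshold are all assumptions made for illustration, and with 60+ batches a much lower threshold may be more sensible. It uses only the Python standard library and expects one cluster label and one batch label per cell, for example exported from the object's metadata.

```python
from collections import Counter, defaultdict


def batch_composition(clusters, batches):
    """Count cells per batch within each cluster.

    Both arguments are equal-length sequences with one entry per cell.
    Returns {cluster: Counter({batch: n_cells})}.
    """
    if len(clusters) != len(batches):
        raise ValueError("clusters and batches must have one entry per cell")
    composition = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        composition[cluster][batch] += 1
    return dict(composition)


def batch_dominated_clusters(clusters, batches, max_fraction=0.5):
    """Flag clusters in which a single batch contributes more than max_fraction of cells.

    max_fraction is an arbitrary illustrative threshold, not a value from the paper.
    Returns {cluster: (dominant_batch, fraction_of_cluster)}.
    """
    flagged = {}
    for cluster, counts in batch_composition(clusters, batches).items():
        total = sum(counts.values())
        batch, n_cells = counts.most_common(1)[0]
        fraction = n_cells / total
        if fraction > max_fraction:
            flagged[cluster] = (batch, fraction)
    return flagged


# Toy data (hypothetical): clusters "0" and "1" are spread evenly over four
# batches, while cluster "2" contains cells from batch "B3" only.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1", "2", "2", "2"]
batches = ["B1", "B2", "B3", "B4", "B1", "B2", "B3", "B4", "B3", "B3", "B3"]

flagged = batch_dominated_clusters(clusters, batches)
print(flagged)
```

In this toy example only cluster "2" is flagged. The same tabulation can be repeated with participant IDs in place of batch labels to look for person-specific clusters, and rerun at higher clustering resolutions to see whether batch-driven splits start to appear.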
@masashi Would you be able to help answer this question about how the count matrices for the most recent 465 ROSMAP snRNAseq samples were combined? Thanks! Abby
Hi Li, I'm sorry to hear that you weren't able to find the information you need in the preprint, or in the study description Wiki content. The author of the paper and contributor of the data, @gilad.green, should be able to answer your question. Regards, Jessica

[syn53366818] ROSMAP snRNAseq experiment 2 cell type objects: How are the batch effects corrected?