Hi,
When I plot the 1st and 2nd principal components of the normalized ROSMAP data, the batches (plates 1-6 and plates 7-8) form two clear clusters, as shown in the image https://homes.cs.washington.edu/~safiye/ROSMAP-PCs.JPG, even though batch effect correction was performed while normalizing the FPKM values. How can we explain this separation?
Thanks!
Created by safiye celik safiye
Plates 1-6 were sequenced around 2012, and plates 7-8 around the end of 2014. Some of our analyses used plates 1-6 only. You can correct all samples together if you prefer; there is no reason you cannot.
-Jishu
Hi Jishu,
Thanks for the reply. I am also wondering whether there is a specific reason why plates 1-6 and 7-8 were corrected separately, rather than all plates (1-8) being corrected together. Looking forward to your reply. Thanks.
Hi Ben and Safiye,
Here is our data processing pipeline.
Start with the RSEM FPKM matrix:
1) Filter out genes/isoforms if variance (sd()) < 0.1, mean/median < 0.1, and >80% of samples have 0 expression values.
2) Quantile normalization (normalize.quantiles() function in R)
3) Assign a small value, such as 0.0001, to values < 0.0001 in the post-quantile-normalized FPKM values, then take log2.
4) Run ComBat with batch as the only covariate, then take the anti-log2.
Hope it helps. If not, let me know.
Thanks
-Jishu
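Steps 1 to 3 of the pipeline above can be sketched roughly as follows. This is only an illustration in plain Python, not the Broad-Rush code: the function names and the toy matrix are hypothetical, the filter is read literally (a gene is dropped only when all three conditions hold, using the mean rather than the median), and ties in the quantile step are broken by position, which may differ from what normalize.quantiles() does in R. Step 4 (ComBat) is not reproduced here.

```python
import math
from statistics import mean, stdev

def filter_genes(rows, sd_cut=0.1, mean_cut=0.1, zero_frac=0.8):
    """Step 1 (assumed reading): drop a gene only if all three conditions hold."""
    kept = []
    for row in rows:
        frac_zero = sum(1 for v in row if v == 0) / len(row)
        is_low = stdev(row) < sd_cut and mean(row) < mean_cut and frac_zero > zero_frac
        if not is_low:
            kept.append(row)
    return kept

def quantile_normalize(rows):
    """Step 2: give every sample (column) the same distribution,
    namely the mean of the sorted columns at each rank."""
    n_genes, n_samples = len(rows), len(rows[0])
    columns = [[rows[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(sc[r] for sc in sorted_cols) for r in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Gene indices ordered by expression in this sample (ties by position).
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            out[g][s] = rank_means[rank]
    return out

def floor_and_log2(rows, floor=1e-4):
    """Step 3: raise values below the floor to the floor, then take log2."""
    return [[math.log2(max(v, floor)) for v in row] for row in rows]

# Toy FPKM matrix: rows are genes, columns are samples (made-up numbers).
fpkm = [
    [5.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],  # all zeros, so it fails the filter
    [1.0, 4.0, 6.0, 8.0],
    [0.0, 1.0, 2.0, 0.5],
]

kept = filter_genes(fpkm)
normalized = quantile_normalize(kept)
logged = floor_and_log2(normalized)
```

After the quantile step every sample column contains the same set of values, just assigned to different genes. The log2 matrix is what would then be passed to ComBat, with batch as the only covariate.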
Hi Safiye,
I have contacted @xujishu who may be able to answer your question more exactly since the Broad-Rush team performed that correction.
Thanks,
Ben
Hi Ben, another question I have is: Is there a specific reason why plates 1-6 and 7-8 were corrected separately, rather than all together? I really appreciate your response. Thank you.
Hi Ben,
As far as I know, ComBat was developed to remove batch effects from microarray data. Could you let me know the procedure you used to apply ComBat for batch effect removal on the FPKM values in the ROS/MAP RNA-Seq data? Did you perform a log-transformation first? That's what I did (I added 1 to all FPKMs and then log-transformed), but then ComBat removes all exact zeros: in the cleaned data they end up somewhere between -0.36 and 4.45 (after exp-transformation and subtracting 1). Having no zeros is expected, since ComBat standardizes the data first. However, in the cleaned ROS/MAP RNA-Seq data on the Synapse web site, there are lots of zeros, although that data is also batch effect corrected by ComBat. So, I am wondering what the difference is between your application of ComBat and mine. Thanks!
Thanks, Ben! This information helps a lot. After a global batch effect correction using plates 1-6 as batch 1 and plates 7-8 as batch 2, the clustering of those two batches is gone, which is expected. Here it is: https://homes.cs.washington.edu/~safiye/ROSMAP-PCs-aftercorrection.PNG
Dear Safiye,
Plates 1-6 and plates 7-8 were batch corrected separately, with the correction applied across the individual plates within each group (i.e. within plates 1-6 and within plates 7-8), but not across all plates.
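The observation earlier in the thread, that exact zeros do not survive a log(FPKM + 1) transform followed by batch adjustment, can be illustrated with a small sketch. Note that this is not ComBat: it uses a much simpler location-only adjustment (shifting each batch mean to the overall mean for one gene), and the function name and numbers are made up for illustration. It only shows why any per-batch shift in log space moves zeros away from zero, and can even make them negative after back-transformation.

```python
import math
from statistics import mean

def center_batches(values, batches):
    """Location-only adjustment (a stand-in, not ComBat): shift each batch
    so that its mean equals the overall mean of this gene."""
    grand = mean(values)
    batch_means = {
        b: mean(v for v, vb in zip(values, batches) if vb == b)
        for b in set(batches)
    }
    return [v - batch_means[b] + grand for v, b in zip(values, batches)]

# One gene, six samples, two batches (e.g. plates 1-6 vs plates 7-8).
fpkm = [0.0, 2.0, 4.0, 0.0, 10.0, 20.0]
batches = [1, 1, 1, 2, 2, 2]

logged = [math.log1p(v) for v in fpkm]        # log(FPKM + 1)
adjusted = center_batches(logged, batches)    # per-batch shift in log space
restored = [math.expm1(v) for v in adjusted]  # exp(x) - 1

# The two original zeros (positions 0 and 3) are shifted in opposite
# directions: one becomes positive, the other negative.
print(restored[0], restored[3])
```

The same sketch also shows why a global correction with two batches removes the separation in the leading principal components: after the adjustment, both batches have the same mean for the gene.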