I have tried using R, FileZilla, Python, and Unix to get the files I need and start a Seurat single nucleus analysis. None of these software packages are compatible, and I continue to get error messages for every single action I try. I can get the files by downloading them manually, transferring them with FileZilla, and then uploading them to Unity HPCC (https://unity.rc.umass.edu/). However, when I try to import the files for Seurat, nothing is recognized, even when using the exact same code others have shared in these discussions. Here is my code, for example, trying to use R and Python to retrieve data from syn21589957.
# Trying to read data in using R
```
data_dir <- '/home/medwards_umass_edu/ROSMAP_July/DATA/DLPFC_Study'
list.files(data_dir) # literally shows "barcodes.tsv", "genes.tsv", and "matrix.mtx"
> expression_matrix <- Read10X(data.dir = data_dir)
Error in Read10X(data.dir = data_dir) :
Barcode file missing. Expecting barcodes.tsv.gz
```
# Trying to get files directly using synapse environment and python just in case something weird happened when downloading files manually
```
>>> import synapseclient
>>> import synapseutils
>>> syn = synapseclient.Synapse()
>>> syn.login(authToken="")
Welcome, Melise9!
>>> files = synapseutils.syncFromSynapse(syn, ' syn21589957 ')
Error occurred while running in a sync context.
Traceback (most recent call last):
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseclient/core/async_utils.py", line 93, in wrap_async_to_sync
return asyncio.run(coroutine)
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseutils/sync.py", line 273, in _sync
entity_type = entity.get("concreteType", None)
AttributeError: 'str' object has no attribute 'get'
Downloading files: 0%| | 0.00/1.00 [00:00, ?B/s]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseutils/sync.py", line 173, in syncFromSynapse
root_entity = wrap_async_to_sync(
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseclient/core/async_utils.py", line 99, in wrap_async_to_sync
raise ex
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseclient/core/async_utils.py", line 93, in wrap_async_to_sync
return asyncio.run(coroutine)
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/home/medwards_umass_edu/.conda/envs/synapse-env/lib/python3.8/site-packages/synapseutils/sync.py", line 273, in _sync
entity_type = entity.get("concreteType", None)
AttributeError: 'str' object has no attribute 'get'
```
Can someone please explain why this might be happening? This has been incredibly frustrating, and I wish there were more documentation on why these files are in the format they are in rather than formats that Seurat recognizes for downstream analyses.
Created by Melise Edwards (Melise9)

Hello,
I can't tell exactly from your description what you are trying and why it's not working. I was able to do the steps I mentioned above to get all the metadata files and merge them, and create a Seurat object with the metadata, without any errors. Here is my code:
```
# ... first run the ReadMtx steps from my earlier reply to get a counts matrix
clin <- synGet("syn3191087", downloadLocation = "data/tmp")
bio <- synGet("syn21323366", downloadLocation = "data/tmp")
meta <- synGet("syn23554294", downloadLocation = "data/tmp")
clin <- read.csv(clin$path)
bio <- read.csv(bio$path)
meta <- read.csv(meta$path)
# Map each cell's specimenID to an individualID, then attach the clinical data
metadata <- merge(meta, bio[, c("individualID", "specimenID")], by = "specimenID")
metadata <- merge(metadata, clin, by = "individualID")
# ... then do any re-mapping to create a "diagnosis" field, as in my earlier reply
rownames(metadata) <- metadata$cell_name
metadata <- metadata[colnames(counts), ]
sobj <- CreateSeuratObject(counts, meta.data = metadata)
head(sobj@meta.data)
```
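One detail worth checking if the metadata still is not picked up: `CreateSeuratObject` matches `meta.data` rows to cells by row name, so every column name in `counts` needs a matching row name in `metadata`. The sketch below illustrates that check in plain Python rather than R, with made-up barcodes; `check_alignment` is a hypothetical helper, not part of Seurat or synapser.

```python
def check_alignment(cell_names, metadata_row_names):
    """Return (missing, extra): cells with no metadata row, and metadata rows with no cell."""
    cells = set(cell_names)
    rows = set(metadata_row_names)
    missing = sorted(cells - rows)  # cells that would end up without metadata
    extra = sorted(rows - cells)    # metadata rows that match no cell
    return missing, extra


# Made-up barcodes, purely for illustration.
cell_names = ["AAAC-1", "AAAG-1", "AAAT-1"]
metadata_row_names = ["AAAG-1", "AAAC-1", "CCCC-1"]

missing, extra = check_alignment(cell_names, metadata_row_names)
print("cells without metadata:", missing)
print("metadata rows without a cell:", extra)
```

If either list is non-empty on the real data, the names need fixing before the metadata is attached. The order of the rows does not matter for this check, only the names.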
I hope this works for you. If it doesn't, I would suggest finding someone local who knows R well who can go through it line by line with you to see where the error is.

Hey Jaclyn,
Okay, this is my final question, I promise!
I have been stuck for 2 days on how to integrate all of these metadata files for Seurat. This project has **clinical metadata** with diagnostic information, sex, genotype, etc., as well as **cell metadata** that we specify when making the Seurat object, and finally the **snRNAseq metadata** about things like batch, barcodes, ID, etc.
I have tried:
- adding metadata before the creation of the Seurat Object by using "substr" to extract barcode information from the cell column, using merge or left_join to join metadata by = ID or barcode.
- adding metadata after creating the Seurat Object by using "AddMetaData"
None of these has worked for me so far. Do you know how I might incorporate all these metadata files and get Seurat to recognize them for DE testing that includes sex and genotype as variables? I truly cannot thank you enough for your help. I feel so close to being able to finally move on with the analysis!!

No problem!
I wouldn't do two separate Seurat objects, I'd just load it all into one. The Cain paper already investigated and verified that the cells from different samples/diagnoses don't need to be integrated or batch corrected (and I have played with the data set and can confirm this as well), so you should be able to follow this vignette pretty closely to process the data: https://satijalab.org/seurat/articles/pbmc3k_tutorial, and you can look at the methods section of the Cain paper to see what arguments they used for their PCA/UMAP stuff.
I *think* that all the cells in the processed data set have passed the QC in the paper, but you should check things like mitochondrial genes just to be sure. The authors have provided annotations for each cell type in this file: https://www.synapse.org/Synapse:syn23554294, which you already had to download in the steps I provided above, so you should not need to annotate the cell types yourself. They have both broader types and finer subtypes. You should be able to add this metadata to the cells when you create the Seurat object. This should probably work:
```
library(synapser)
library(Seurat)
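# ... first run the code from my earlier reply that calls ReadMtx, giving you a counts matrix
meta_file <- synGet("syn23554294", downloadLocation="")
metadata <- read.csv(meta_file$path)  # synGet returns a file entity, so read the CSV from its path
# ... do any fixing of the metadata, like the diagnosis re-mapping from my earlier reply
rownames(metadata) <- metadata$cell_name  # meta.data rows must be named by cell
sobj <- CreateSeuratObject(counts, meta.data = metadata)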
head(sobj@meta.data)
Idents(sobj) <- sobj$
```
Then follow the vignette for PCA/clustering and the annotations should match pretty closely with the clusters.
For DE analysis, the vignette goes over this a little but as a shortcut you can use:
```
Idents(sobj) <- sobj$diagnosis
markers <- FindAllMarkers(sobj)
```
or
```
Idents(sobj) <- sobj$
markers <- FindAllMarkers(sobj)
```
depending on what you're trying to compare.
I'm not sure how helpful this will be because I had to jump through a lot of hoops on my thesis dataset and there's a ton of extra stuff in here, like integrating data sets and custom functions I wrote, but this is the github repo for the analysis I did for my thesis: https://github.com/jaclynrbeck/TCellsAD2022/
Step03 is probably the most relevant, and I think I might have some useful plotting functions in there as well.
I hope that's helpful!
Jaclyn
Thank you so much again for your help, Jaclyn. I'm so sorry to bother you once more, but do you have a good resource that can assist with single cell analysis? I have 2 weeks to finish this project and I am really struggling. While there are tons of resources, YouTube videos, etc. online, they don't help when you are working with a very specific dataset (e.g. human brain data) and struggle to get started with basic steps.
Right now, I am trying to figure out how to annotate my cells and assign these diagnoses to the cells for DE testing. Cell annotation resources (like cell marker, celldex, SingleR, Allen Institute, etc.) have been unhelpful so far and/or not specific to the brain. I tried manually annotating these cells instead of spending hours or days trying to troubleshoot how to upload an h5ad file to the Allen Institute "Map My Cells" site.
I am also confused about assigning groups for DE testing and when in the analysis pipeline this should happen. In the Satija lab vignettes (https://satijalab.org/seurat/archive/v3.0/immune_alignment.html), they upload their different treatment groups (control vs stimulated cells) into R as completely separate Seurat Objects. At this point, I am not sure if I should start over and create a new Seurat object for each diagnosis (e.g., AD, Non-AD...) or how to test AD cell types against non-AD cell types.
If you have a Github page or tools that helped you as you were new to single cell analyses I would be super grateful. Some Github pages don't include this step in their code or are coding in python and I'm not sure how to implement those steps in R.
Thank you so much again for taking the time to help me! I truly appreciate it.

Yep! You'll need two files from the main ROSMAP metadata folder:
Clinical metadata: syn3191087
Biospecimen metadata: syn21323366
Assuming you have used syn23554294 for the cell barcodes/metadata, the `specimenID` values in that file should all exist in the `specimenID` field of the biospecimen metadata. This gives you a conversion from `specimenID` -> `individualID`, which you can then use to get patient-level data like sex from the clinical metadata file. I usually just do a merge of barcodes/biospecimen by `specimenID` and then merge that combined table with the clinical metadata by `individualID`.
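As a rough illustration of that two-step join, here is a minimal sketch in plain Python rather than R. Only the column names `specimenID` and `individualID` come from the description above; the rows, the `cell_name` and `sex` columns, and the helper `merge_cell_metadata` are made up for the example.

```python
def merge_cell_metadata(cell_meta, biospecimen, clinical):
    """Attach clinical fields to each cell via specimenID -> individualID."""
    spec_to_ind = {row["specimenID"]: row["individualID"] for row in biospecimen}
    ind_to_clin = {row["individualID"]: row for row in clinical}
    merged = []
    for cell in cell_meta:
        individual = spec_to_ind.get(cell["specimenID"])
        if individual is None or individual not in ind_to_clin:
            continue  # drop unmatched cells, like the default inner join of R's merge()
        merged.append({**cell, **ind_to_clin[individual]})
    return merged


# Made-up rows, purely for illustration.
cell_meta = [
    {"cell_name": "cell_1", "specimenID": "spec_A"},
    {"cell_name": "cell_2", "specimenID": "spec_B"},
    {"cell_name": "cell_3", "specimenID": "spec_unknown"},
]
biospecimen = [
    {"specimenID": "spec_A", "individualID": "ind_1"},
    {"specimenID": "spec_B", "individualID": "ind_2"},
]
clinical = [
    {"individualID": "ind_1", "sex": "female"},
    {"individualID": "ind_2", "sex": "male"},
]

merged = merge_cell_metadata(cell_meta, biospecimen, clinical)
for row in merged:
    print(row["cell_name"], row["individualID"], row["sex"])
```

The result has one row per cell, with the patient-level fields repeated for every cell from the same individual. Cells whose `specimenID` has no match are dropped, so comparing the number of rows before and after is a quick sanity check.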
There is not a "diagnosis" per se in any of the metadata files. You can either use some combination of `braaksc`, `ceradsc`, `cogdx`, and/or `dcfdx_lv` (see https://www.synapse.org/#!Synapse:syn3191090 for a full description of what the numbers in each field mean), or you can use what the Cain paper assigned to the samples:
Sample names containing "Cdx1-pAD0" or "Cog1-Path0": Control
"Cdx1-pAD1" or "Cog1-Path1": Control w/ Pathology
"Cdx4-pAD0" or "Cog4-Path0": Non-AD Dementia
"Cdx4-pAD1" or "Cog4-Path1": AD
I use the latter for my work, and remapped with this code:
```
library(stringr)  # for str_split

# Map the sample-name suffix to the diagnosis label used in the Cain paper
dx_defs <- c("Cdx1-pAD0" = "Control",
             "Cog1-Path0" = "Control",
             "Cdx1-pAD1" = "Control w/ Pathology",
             "Cog1-Path1" = "Control w/ Pathology",
             "Cdx4-pAD0" = "Non-AD Dementia",
             "Cog4-Path0" = "Non-AD Dementia",
             "Cdx4-pAD1" = "AD",
             "Cog4-Path1" = "AD")
# Split each specimenID on "-" and rejoin the 4th and 5th pieces, e.g. "Cdx1-pAD0"
tmp <- as.data.frame(str_split(metadata$specimenID, "-", simplify = TRUE))
tmp <- paste(tmp$V4, tmp$V5, sep = "-")
metadata$diagnosis <- dx_defs[tmp]
```
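To make the string handling in that remapping concrete, here is a small sketch of the same idea in plain Python rather than R. The specimen IDs are invented; the only assumption carried over from the R code is that the 4th and 5th dash-separated fields form the key. `diagnosis_from_specimen_id` is a hypothetical helper name.

```python
DX_DEFS = {
    "Cdx1-pAD0": "Control",
    "Cog1-Path0": "Control",
    "Cdx1-pAD1": "Control w/ Pathology",
    "Cog1-Path1": "Control w/ Pathology",
    "Cdx4-pAD0": "Non-AD Dementia",
    "Cog4-Path0": "Non-AD Dementia",
    "Cdx4-pAD1": "AD",
    "Cog4-Path1": "AD",
}


def diagnosis_from_specimen_id(specimen_id):
    """Look up the diagnosis from the 4th and 5th dash-separated fields of the ID."""
    parts = specimen_id.split("-")
    if len(parts) < 5:
        return None  # ID does not have the assumed layout
    key = "-".join(parts[3:5])
    return DX_DEFS.get(key)  # None for unknown keys, where R would give NA


# Invented IDs that only mimic the assumed layout.
for specimen_id in ["X-B1-S2-Cdx4-pAD1", "X-B1-S3-Cog1-Path0", "not-a-match"]:
    print(specimen_id, "->", diagnosis_from_specimen_id(specimen_id))
```

Any ID that comes back empty points to a sample name that does not follow the assumed layout, which is worth checking before running DE tests on the diagnosis column.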
I hope that helps! Good luck :)
Jaclyn

@jaclynbeck do you have advice on how to assign or integrate metadata with these cells? Right now I am unsure how to assign sex and/or diagnosis to these individuals. Thank you so much for your time.

@jaclynbeck I cannot thank you enough for taking the time to share this with me! I know this will help many others too who are newer to single nucleus RNA seq analyses. Thank you again!!

I had this problem too! Here is what I did to solve it:
Install the `synapser` R package: `install.packages("synapser", repos = c("http://ran.synapse.org", "http://cran.fhcrc.org"))`
Then in R:
```
library(synapser)
library(Seurat)
library(stringr)  # for str_replace below
synLogin(authToken="")
barcodes <- synGet("syn23554294", downloadLocation="")
genes <- synGet("syn23554293", downloadLocation="")
mtx <- synGet("syn23554292", downloadLocation="")
counts <- ReadMtx(mtx = mtx$path,
                  cells = barcodes$path,
                  features = genes$path,
                  feature.column = 1,
                  skip.cell = 1,
                  skip.feature = 1)
# The column names and genes don't get read in correctly by ReadMtx because
# the input csvs aren't formatted as expected for MTX
colnames(counts) <- str_replace(colnames(counts), ",.*", "")
rownames(counts) <- str_replace(rownames(counts), ".*,", "")
```
Hopefully that works for you. These files aren't formatted exactly right to be read in by `Read10X`.
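For anyone unsure what the two `str_replace` patterns do, here is a small sketch in plain Python rather than R. It assumes each name comes through as a whole comma-separated line; the example strings and the helper names are made up.

```python
import re


def cell_name_from_line(line):
    # Same pattern as str_replace(x, ",.*", ""): remove the first comma and everything after it.
    return re.sub(r",.*", "", line, count=1)


def gene_name_from_line(line):
    # Same pattern as str_replace(x, ".*,", ""): ".*" is greedy, so this removes
    # everything up to and including the last comma.
    return re.sub(r".*,", "", line, count=1)


# Made-up lines that only mimic a comma-separated layout.
print(cell_name_from_line("barcode_1,spec_A,Ast"))
print(gene_name_from_line("1,GENE_A"))
```

The first keeps the text before the first comma (the cell name) and the second keeps the text after the last comma (the gene name).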
Jaclyn
Nothing is working! 5 weeks to finish before phd defense. Please help.