Hi,
There are several hundred duplicated specimenID records in the ROSMAP RNAseq metadata table (https://www.synapse.org/#!Synapse:syn21088596). The duplicates carry similar but not identical information, e.g. the two records for "01_120405":
specimenID | platform | RIN | rnaBatch | libraryBatch | sequencingBatch | libraryPrep | libraryPreparationMethod | isStranded | readStrandOrigin | runType | readLength
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
01_120405 | | 7.7 | NA | 2 | nan | polyAselection | nan | TRUE | nan | pairedEnd | 101
01_120405 | | 7.7 | NA | 2 | 2 | polyAselection | | TRUE | | pairedEnd | 101
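For anyone who wants to list the affected IDs themselves, here is a minimal sketch in Python using only the standard library. It assumes the metadata is a comma-separated file with a `specimenID` column; the function name `duplicated_ids` and the toy rows (including the made-up ID `02_120405`) are illustrative only and not taken from the real file.

```python
import csv
import io
from collections import Counter


def duplicated_ids(csv_text, id_column="specimenID"):
    """Return {id: count} for every ID that appears on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column].strip() for row in reader)
    return {key: n for key, n in counts.items() if n > 1}


# Toy input modelled on the two rows above; "02_120405" is a made-up ID.
sample = """specimenID,RIN,libraryBatch,sequencingBatch
01_120405,7.7,2,nan
01_120405,7.7,2,2
02_120405,8.1,3,3
"""

dupes = duplicated_ids(sample)
print(dupes)
```

To run it on the downloaded file, the text could be read with `open(path, newline="").read()` and passed to `duplicated_ids`.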
Can you please recreate a cleaner version?
Thanks,
Minghui
Created by Minghui Wang minghui.wang
Hi @nicole.kauer @Mette
Duplicated records in ROSMAP_assay_RNAseq_metadata.csv
I think I have downloaded the updated metadata, but I ran into the same problem.
The download command is `synapse get syn21088596`, and the Synapse client version is 1.9.4.
The md5 value of the downloaded syn21088596 file (f4cbf24f60de2794c043e01d7b1c3bd2) is the same as the md5 value listed for the dataset.
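As a side note, the md5 check described here can be reproduced locally with a short Python sketch like the one below. The helper name `file_md5` is hypothetical, and the throwaway file containing `hello` exists only so the snippet runs on its own.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Self-check on a throwaway file so the sketch runs without the real download.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_md5 = file_md5(tmp.name)
os.remove(tmp.name)
print(demo_md5)
```

For the real file, the result of `file_md5("ROSMAP_assay_RNAseq_metadata.csv")` would be compared with the value shown on Synapse. A match only confirms that the local copy is identical to the stored version; it says nothing about whether that version still contains duplicates.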
Thanks for your reply!
@zhangliandong, downloading the file without specifying the version should give you the latest version of the file. The clients do have ways to download different versions of files, which can be seen in the documentation ([python](https://python-docs.synapse.org/build/html/index.html#synapseclient.Synapse.get), [R](https://r-docs.synapse.org/reference/synGet.html)).
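A rough sketch of what that looks like with the Python client: `Synapse.get` accepts a `version` argument and returns the most recent version when it is left out. The wrapper `fetch_file` and the `_FakeClient` class below are hypothetical and only there so the snippet can run offline; the version number `1` is a placeholder, not a recommendation.

```python
def fetch_file(client, synapse_id, version=None):
    """Fetch a Synapse file; with version=None the client returns the latest."""
    if version is None:
        return client.get(synapse_id)
    return client.get(synapse_id, version=version)


# With the real Python client this might be used as (requires credentials):
#   import synapseclient
#   syn = synapseclient.login()
#   entity = fetch_file(syn, "syn21088596")             # latest version
#   older = fetch_file(syn, "syn21088596", version=1)   # placeholder version
#   print(entity.path)


class _FakeClient:
    """Stand-in used only to exercise the wrapper offline."""

    def get(self, entity, version=None):
        return {"id": entity, "version": version or "latest"}


result_latest = fetch_file(_FakeClient(), "syn21088596")
result_v1 = fetch_file(_FakeClient(), "syn21088596", version=1)
print(result_latest, result_v1)
```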
As for the issue mentioned in this thread of duplicated data, this is currently being fixed. Please join the team mentioned above to get updates.
Hi @nicole.kauer
The ROSMAP RNAseq metadata table (https://www.synapse.org/#!Synapse:syn21088596) was modified on 10/08/2020 8:24 PM.
I downloaded this file on 09/09/2020, but I find that the md5 value is the same.
How do I get the newest file? Should I download it again?
Thanks for your reply!
@jgockley, thanks! We are working on cleaning up the metadata.
@zhangliandong, if you join [the AMPAD_DataReleaseUpdates team](https://www.synapse.org/#!Team:3372003), you will get a notification when we release new data, including the updates to ROSMAP metadata.
@nicole.kauer & @Mette
The duplicate issue is back:
```
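# Download the latest version of the metadata file and read it in
foo <- read.csv(synapser::synGet('syn21088596')$path, header = TRUE, stringsAsFactors = FALSE)
# Show every row recorded for one specimenID
foo[as.character(foo$specimenID) %in% '492_120515', ]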
specimenID platform RIN rnaBatch libraryBatch sequencingBatch libraryPrep libraryPreparationMethod isStranded
492_120515 9.100000 NA 0 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 6 nan polyAselection nan True nan pairedEnd
492_120515 9.140738 NA 7 nan polyAselection nan True nan pairedEnd
492_120515 9.100000 NA 0 0 polyAselection True pairedEnd
492_120515 9.100000 NA 6 6 polyAselection True pairedEnd
492_120515 9.140738 NA 7 7 polyAselection True pairedEnd
```
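One possible way to triage output like the above, sketched in Python with the standard library: treat empty cells, `NA` and `nan` as the same "missing" marker, merge rows for an ID when they agree everywhere else, and flag IDs whose rows genuinely disagree. Whether those three markers really are equivalent is an assumption, as is the mapping of values to columns in the toy rows; `merge_duplicates` is a hypothetical helper, not how the maintainers clean the file.

```python
import csv
import io
from collections import defaultdict

MISSING = {"", "na", "nan"}  # assumption: all three mean "no value"


def merge_duplicates(csv_text, id_column="specimenID"):
    """Merge rows sharing an ID; report IDs whose rows truly disagree."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    # values[id][column] holds the distinct non-missing cells seen so far
    values = defaultdict(lambda: defaultdict(set))
    for row in reader:
        key = row[id_column].strip()
        for column in columns:
            cell = (row[column] or "").strip()
            if cell.lower() not in MISSING:
                values[key][column].add(cell)
    merged, conflicts = {}, {}
    for key, per_column in values.items():
        clashing = sorted(c for c in columns if len(per_column[c]) > 1)
        if clashing:
            conflicts[key] = clashing  # needs a human decision
        else:
            merged[key] = {c: next(iter(per_column[c]), "") for c in columns}
    return merged, conflicts


# Toy rows modelled on values quoted in this thread; column mapping is assumed.
sample = """specimenID,RIN,libraryBatch,sequencingBatch
01_120405,7.7,2,nan
01_120405,7.7,2,2
492_120515,9.100000,0,nan
492_120515,9.140738,7,7
"""

merged, conflicts = merge_duplicates(sample)
print(merged)
print(conflicts)
```

In this toy input the two `01_120405` rows collapse into one, while `492_120515` is reported as a conflict because its RIN and libraryBatch values differ between rows.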
@zhangliandong has a documented issue here: https://www.synapse.org/#!Synapse:syn2580853/discussion/threadId=6976
@minghui.wang The metadata file has been updated to have no duplicates. Thank you for bringing it to our attention!
@minghui.wang Thanks for the heads up. We will look into it.