Hi,
Quick question on the diversity cohorts VCFs. The metadata shows that there should be 604 WGS samples, but in syn51732523 there are SNP VCFs for only 591 individuals. Specifically, it looks like the missing specimen IDs are:
R2310801
R2814322
R3928849
R5143011
R5422277
R5597995
R6163593
R6619132
R6898471
R7829110
R8705042
R9047269
R9609047
R9693165
Were these filtered out due to low quality?
Thanks!
Created by Jake Gockley jgockley

Thanks @abby.vanderlinden!

Hey @jgockley, just wanted to let you know that the most recent version of all metadata files for the Diverse Cohorts study has been filtered to just the donors/samples for which we currently have sequencing data available. Sorry for the previous confusion!

Thanks @jaclynbeck!

It looks like the missing samples are missing because they do not have complete metadata yet (for example, they might be missing some biospecimen data, so the data files were not added yet). Additional sample data may be added if/when data curators are able to get full information for them. For now, you can ignore any sample info that does not have matching data files.

Thank you @jaclynbeck

Hello! Apologies for the delay. I'm still looking into this to try to get an answer for you. In the meantime, I did notice that the [WGS assay metadata](https://www.synapse.org/#!Synapse:syn51757644) also has 591 individuals instead of 604. My best guess right now is that the "missing" samples either were low quality or had an obvious X/Y chromosome mismatch. I'll try to get a more concrete answer for you soon!
Jaclyn
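
For anyone who wants to repeat the comparison from the original question, or to drop metadata rows that have no matching data file, here is a minimal sketch using only the Python standard library. It rests on a few assumptions that are not confirmed in this thread: that the VCF sample names in the `#CHROM` header line are the specimen IDs, that the metadata is a CSV, and that its ID column is called `specimenID`. The function names and the toy IDs (`S1`, `S2`, `S3`) are made up for illustration.

```python
import csv
import io


def vcf_sample_ids(vcf_text):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            # The first 9 tab-separated columns are fixed (CHROM .. FORMAT);
            # everything after them is one column per sample.
            return line.split("\t")[9:]
    return []


def metadata_specimen_ids(csv_text, column="specimenID"):
    """Return the IDs in one column of a CSV (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader]


def split_by_availability(metadata_ids, vcf_ids):
    """Split metadata IDs into those with and without a matching VCF sample."""
    have = set(vcf_ids)
    matched = [s for s in metadata_ids if s in have]
    missing = [s for s in metadata_ids if s not in have]
    return matched, missing


# Toy inputs standing in for the real metadata and VCF header.
toy_metadata = "specimenID,assay\nS1,wholeGenomeSeq\nS2,wholeGenomeSeq\nS3,wholeGenomeSeq\n"
toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS3\n"
)

matched, missing = split_by_availability(
    metadata_specimen_ids(toy_metadata), vcf_sample_ids(toy_vcf)
)
print("with data:", matched)
print("no matching data file:", missing)
```

For real files you would read the metadata CSV and the (possibly gzipped) VCF header from disk instead of the toy strings; only the header is needed, so there is no need to parse the variant records.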