Hello,
Could you provide the genotype data (https://www.synapse.org/#!Synapse:syn25671158)?
Yes, you have shared the vcf file. However, the number of SNPs is less than what you described.
Best,
Created by Benxia Hu BXH
@BxChang The easiest way is to use plink or plink2 (https://www.cog-genomics.org/plink/1.9/) to convert the binary format to text. You need all three files (.bed, .bim, and .fam), which in this case are:
PLINK_forSynapse_noCK_109ind.bed
PLINK_forSynapse_noCK_109ind.bim
PLINK_forSynapse_noCK_109ind.fam
Run:
plink --bfile PLINK_forSynapse_noCK_109ind --recode --out PLINK_forSynapse_noCK_109ind_text
The output will be two files:
PLINK_forSynapse_noCK_109ind_text.ped
PLINK_forSynapse_noCK_109ind_text.map
The .ped file is a flat text file with the genotype info (and the basic sample info, the same as in the .fam file).
The .map file is a variant information file similar to .bim.
The standard PLINK .ped file will be about 10 times larger than the binary file.
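To see how many variants and samples a PLINK binary fileset holds without converting it, one option is to count lines in the .bim file (one line per variant) and the .fam file (one line per sample). The sketch below assumes a POSIX shell with awk; the helper name plink_counts is hypothetical and not part of PLINK.

```shell
# Hypothetical helper (not a PLINK command): report variant and sample counts
# for a PLINK binary fileset, given its prefix (file name without extension).
plink_counts() {
    prefix="$1"
    # .bim has one line per variant; .fam has one line per sample.
    variants=$(awk 'END { print NR }' "${prefix}.bim")
    samples=$(awk 'END { print NR }' "${prefix}.fam")
    echo "variants=${variants} samples=${samples}"
}

# Usage, assuming the files are in the current directory:
# plink_counts PLINK_forSynapse_noCK_109ind
```

Counting lines in the two text files avoids creating the much larger .ped output just to check the size of the dataset.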
Roman.
Hello,
Thanks a lot.
I downloaded PLINK_forSynapse_noCK_109ind.bed, but it is not in a text format. Could you tell me how to open PLINK_forSynapse_noCK_109ind.bed?
Best,
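The .bed file is binary, so a text editor will not display it meaningfully. As a sanity check that a download is intact, a PLINK 1 .bed file starts with the bytes 0x6c 0x1b followed by a mode byte (0x01 for the usual variant-major layout). A minimal sketch, assuming a shell where head -c and od are available; is_plink_bed is a hypothetical helper name:

```shell
# Hypothetical helper: succeed (exit status 0) if the file begins with the
# PLINK 1 .bed magic bytes 0x6c 0x1b 0x01 (variant-major mode).
is_plink_bed() {
    # Read the first three bytes and print them as contiguous hex digits.
    magic=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "6c1b01" ]
}

# Usage:
# is_plink_bed PLINK_forSynapse_noCK_109ind.bed && echo "looks like a PLINK .bed file"
```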
@BxChang Good point. The vcf file (syn28284263, syn28284267) includes only 437,543 directly genotyped SNPs which were extracted from the dataset after additional filtering steps similar to the ones used for the analyses.
The PLINK files PLINK_forSynapse_noCK_109ind (syn26254707, syn26254710, syn26254711) include all 5,723,699 SNPs actually used in the analyses, including the imputed SNPs, after the same MAF and filtering steps as were used to create the vcf files above.
If you need all 571,496 directly genotyped SNPs, I can get those, but it will take some time as the data needs to be recovered from the archive. If we start from all 571,496 SNPs and apply the same QC and MAF filtering as applied for all QTL analyses, around 437,543 will be retained.
Thanks a lot, Roman.
Thanks a lot.
Thanks for clarifying. I tagged the data contributor above, so hopefully he is able to help answer your question.
Yes, the number of SNPs in the file is less than the number in the study methods description.
Hi @BXH, do you mean the number of SNPs in the file is less than the number in the study methods description, "571,496 SNPs retained after QC"?
@romanko Can you help with this?
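The SNP counts discussed in this thread can be checked against a downloaded file: every line in a vcf file that does not start with '#' is one variant record. A minimal sketch, assuming an uncompressed vcf and a POSIX shell with awk; vcf_variant_count is a hypothetical helper, and the file name in the usage comment is a placeholder rather than the actual Synapse file name.

```shell
# Hypothetical helper: count variant records in an uncompressed vcf file.
# Header and meta lines start with '#', so they are skipped.
vcf_variant_count() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Usage (placeholder file name):
# vcf_variant_count genotypes.vcf
```

For a gzip-compressed file, the same awk command could read from a decompression pipe instead of a file name.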