Overlap of samples with WGS and RNAseq in AMP-AD Diverse Cohorts

Hello, I was exploring the data in the diverse cohorts project ([syn51732482](syn51732482)) and noticed that not all samples that have RNAseq data also have WGS data and vice-versa. Looking at the biospecimen metadata ([syn51757645](syn51757645)), this is what I see: ``` > divco_biospec_meta <- read.csv("AMP-AD_DiverseCohorts_biospecimen_metadata.csv") > rnaSeq_iids <- unique(divco_biospec_meta[divco_biospec_meta$assay == "rnaSeq", ]$individualID) > wgs_iids <- unique(divco_biospec_meta[divco_biospec_meta$assay == "wholeGenomeSeq", ]$individualID) > length(wgs_iids) [1] 761 > length(rnaSeq_iids) [1] 844 > length(intersect(wgs_iids, rnaSeq_iids)) [1] 606 > length(setdiff(wgs_iids, rnaSeq_iids)) [1] 155 > length(setdiff(rnaSeq_iids, wgs_iids)) [1] 238 ``` I was wondering if I am missing something or doing something wrong in checking the overlap of samples with both assays, or if this is the case. Thanks in advance. Best regards, Upamanyu

Created by Upamanyu Ghose ug96
hello @ug96, Apologies for the very late reply. You are correct that there is not a perfect overlap of WGS and RNASeq donors in this study. Your numbers align with the published number of donors and content in the metadata, but I will double check with the raw fastq/bam files and let you know if I find any discrepancies. As for the joint calling effort, we unfortunately do not currently have the compute budget to reprocess all of the AMP 1.0 WGS data with an updated reference genome, so I am not surprised by the poor intersection. I am considering performing a genome liftover as a short term solution to make the old WGS harmonization data more useful, but don't yet have a concrete timeline for releasing this. Best, Will
Hello, I was wondering if someone is able to help me with the question in the previous message. Additionally, is there a planned analysis or release of jointly called WGS variants from the DivCo\_HS [syn68898203](syn68898203) and WGS\_Harmonization [syn22264775](syn22264775) studies? I want to merge the VCFs from the two, but the intersection between the common variants (MAF > 0.05) in the VCFs from "Diverse Cohorts + AMP 1.0" [syn68898203](syn68898203) and the WGS\_Harmonization [syn11707420](syn11707420) is quite small and I am not sure if it would be correct to merge the VCFs with such a small intersection. ## VCF Normalisation and MAF filter applied to both VCFs ```bash bcftools norm \ --no-version \ -m- \ -Ob \ --threads 4 \ -o "$SPLIT_VCF" \ "$INPUT_VCF" bcftools index "$SPLIT_VCF" echo echo "=========================" echo " Importing VCF into PLINK format..." echo "=========================" plink2 \ --bcf "$SPLIT_VCF" \ --set-all-var-ids "chr@_#_\$r_\$a_b38" \ --new-id-max-allele-len 1000 \ --make-pgen \ --out "${TEMP_DIR}/${BASE_NAME}" echo echo "=========================" echo " Applying MAF and HWE filters..." echo "=========================" plink2 --pfile "${TEMP_DIR}/${BASE_NAME}" \ --maf 0.05 \ --snps-only \ --rm-dup exclude-all \ --make-pgen \ --out "${TEMP_DIR}/${BASE_NAME}_filtered" plink2 --pfile "${TEMP_DIR}/${BASE_NAME}_filtered" \ --hwe 1e-6 midp \ --make-pgen \ --out "${TEMP_DIR}/${BASE_NAME}_hwe_filtered" plink2 --pfile "${TEMP_DIR}/${BASE_NAME}_hwe_filtered" \ --output-chr chrM \ --export bcf \ --out $OUTPUT_VCF ``` ## VCF intersection between DivCo\_HS and WGS\_Harmonization ```bash bcftools isec \ "$vcf1" "$vcf2" \ -p "$tmpdir/isec" VCF1_ONLY=$(bcftools view -H "$tmpdir/isec/0000.vcf" 2>/dev/null | wc -l) VCF2_ONLY=$(bcftools view -H "$tmpdir/isec/0001.vcf" 2>/dev/null | wc -l) SHARED=$(bcftools view -H "$tmpdir/isec/0002.vcf" 2>/dev/null | wc -l) TOTAL_VCF1=$(( VCF1_ONLY + SHARED )) TOTAL_VCF2=$(( VCF2_ONLY + SHARED )) OVERLAP_PCT=$(awk "BEGIN {printf \"%.2f\", 100*$SHARED/$TOTAL_VCF1}") echo "Overlap Results:" echo " VCF1 unique: $VCF1_ONLY" echo " VCF2 unique: $VCF2_ONLY" echo " Shared: $SHARED" echo " Overlap %: ${OVERLAP_PCT}%" ``` ### Sample output for chromosome 1: > Overlap Results: > VCF1 unique: 174922 > VCF2 unique: 41569 > Shared: 367566 > Overlap %: 67.76% Thank you. Best, Upamanyu

Your web browser must have JavaScript enabled in order for this application to display correctly.
If you are an automated web crawler from a search engine, follow this AJAX application crawl link

Drop files to upload

Overlap of samples with WGS and RNAseq in AMP-AD Diverse Cohorts page is loading…