Hi all!

I tried to find tools to help download pGWAS data from the UKB‑PPP project but could not find any suitable solutions. In addition, there is a large amount of data to download and process, while in the end one usually keeps only a tiny fraction of it, namely the significant QTLs above a given LOG10P threshold.

I have therefore created a Python package, `UKBPPP-DL`, for easy, robust, and memory‑efficient downloading of pGWAS data from UKB-PPP, with the option to filter on a given −log10(P) threshold on the fly.

I hope this package can help fellow scientists and save each of us from reinventing the wheel. If you spot any problems or think additional features would be useful, please don’t hesitate to contact me.

You can check the [GitHub repository](https://github.com/nglm/ukbppp_dl) and [documentation](https://ukbppp-dl.readthedocs.io/en/latest/index.html#) for more information.

Natacha Galmiche

### In short:

#### Installation

```bash
pip install ukbppp-dl
```

#### Usage

```python
from ukbppp_dl.pgwas import keep_significant_qtls_from_region, PGWAS_REGIONS

# Synapse directory containing pQTL summary statistics (here for Europe)
REGION = PGWAS_REGIONS["European"]

# Significance threshold for pQTLs (LOG10P > 7 corresponds to p-value < 1e-7)
LOG10P_THRESHOLD = 7

# Whether to create a log file
# (0: no log file, >0: create different levels of log files)
CREATE_LOG = 2

# Whether to have an output text describing the function's run
# (0: no text, >0: create different levels of verbosity)
VERBOSE = 3

# Set to a list of protein tar file names or Synapse IDs
# if you want to process only specific proteins
# PROTEIN_TO_PROCESS = ["ACOT13_Q9NPJ3_OID31522_v1_Oncology_II.tar", "syn52363271"]
# Otherwise set to None to process all proteins in the region
PROTEIN_TO_PROCESS = None

all_significant_qtls, log_reg = keep_significant_qtls_from_region(
    synapse_folder_id=REGION,
    download_location="./data",
    res_location="./results",
    log10p_threshold=LOG10P_THRESHOLD,
    create_log=CREATE_LOG,
    verbose=VERBOSE,
    delete_downloaded_tar=True,
    delete_chr_csv=True,
    protein_to_process=PROTEIN_TO_PROCESS,
    delete_tar_csv=False,
    delete_tar_log=False,
    delete_partial_logs=False,
    delete_partial_outputs=False,
)
```
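To give an idea of what filtering "on the fly" means, here is a minimal standard-library sketch that reads a summary-statistics file row by row and keeps only rows above a LOG10P threshold, so the full file never has to sit in memory. It is not the package's actual implementation: the function name `iter_significant_rows`, the space-delimited layout, the `ID` column, the strict `>` comparison, and the toy rows are all assumptions made for illustration. Only the `LOG10P` column name comes from the post above.

```python
import csv
import io
from typing import Iterable, Iterator


def iter_significant_rows(
    lines: Iterable[str],
    log10p_threshold: float,
    column: str = "LOG10P",
) -> Iterator[dict]:
    """Yield rows whose -log10(P) exceeds the threshold, one row at a time.

    Hypothetical helper: assumes a space-delimited file with a header line.
    """
    reader = csv.DictReader(lines, delimiter=" ")
    for row in reader:
        # Rows below the threshold are discarded immediately,
        # so memory use does not grow with the size of the input.
        if float(row[column]) > log10p_threshold:
            yield row


# Made-up toy rows for illustration only (not real UKB-PPP results).
toy_file = io.StringIO(
    "ID LOG10P\n"
    "variant_a 2.5\n"
    "variant_b 7.0\n"
    "variant_c 12.3\n"
)

significant = list(iter_significant_rows(toy_file, log10p_threshold=7))
print([row["ID"] for row in significant])  # ['variant_c']
```

Because the helper is a generator, the same loop could in principle be fed an open file handle instead of the in-memory toy data, and the rows that pass could be written out as they are found.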

Created by Natacha Galmiche

