HPN-DREAM breast cancer network inference challenge

Created By Laura Heiser lmheiser
Should you have any questions related to the challenge, please visit our Community Forum. Should you require help about Synapse itself, please see the dedicated Synapse Help Page.

The competitive phase of the challenge has ended. We are now receiving predictions for the collaborative round. We hope that teams in this phase will collaborate, building on what was learned in the competitive phase. Please see the competitive phase leaderboards for links to code and algorithms. Results of the submissions to the collaborative phase are published every other week in the "Collaborative Leaderboards".

We would like to remind all participants of the terms of use that every participant agreed to prior to gaining access to the data in the HPN-DREAM challenge:

The data are provided for use either within the HPN Challenge or within independent research projects. However, publication or presentation of analyses resulting from these data is embargoed until such time as (1) the results from the HPN Challenge and the best performing strategies used in that Challenge have been published and (2) the data generators have published their own analysis. You will be contacted by email through your Synapse-affiliated email address when these conditions have been met, which is expected to occur in Spring 2014. This information will also be posted within this Synapse project. Publication and presentation of analysis using these data is permitted after that date.

However, you may use the data and include it in grant submissions. For your grant submission citations, please use the following: "These data were provided by Laura Heiser and Joe Gray from Oregon Health Sciences University and were obtained through Synapse as part of the HPN-DREAM Breast Cancer Network Inference Challenge (syn1720047)".
You may share these data with collaborators within your Institution but, in doing so, you are responsible for assuring that all subsequent recipients comply with these terms as well as the Synapse Terms and Conditions.

Updates

October 23, 2013:
We are excited to announce the launch of the collaborative phase of the HPN-DREAM Challenge. The goal of the collaborative round is to foster interaction between challenge participants and further advancement of predictive algorithms through discussions on the forum, code sharing, and development of hybrid models. To that end, participants have been encouraged to share their code through the Synapse system, which can be accessed through the "hot links" on the final leaderboard tables for Subchallenges 1 and 2.
The challenge will run from October 23, 2013 through mid-January 2014. The leaderboard will be updated every two weeks. Predictions uploaded to the Synapse evaluation site by Monday, November 4 at 11:59pm Pacific time will have the chance to appear on the first leaderboard, scheduled for November 6.
If you have not yet shared the code associated with your final HPN-DREAM submission to Synapse, we encourage you to do so now. Directions are available here: https://www.synapse.org/#!Synapse:syn1977299

October 7: The polls are open for voting in Subchallenge 3!
We will use crowd-based peer review to identify the most promising ideas for visualizing high-dimensional timecourse data like the type used in this challenge. Voting closes Friday, October 11, 2013 at 5pm Pacific time. Please see the Subchallenge 3 page for more details.

September 23: Instructions for submitting code
The process for depositing your code can be found on the following Synapse page: Instructions for Submitting Code to Synapse. Please note that in order to be eligible for a prize, your final entry must include submission of the underlying code into Synapse. Code should be deposited into Synapse by Thursday, September 26 at 5pm Pacific time.
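
The code zip files are expected to follow the naming scheme given in Q4 of the September 3 update on this page. As a minimal sketch, that scheme can be written as a lookup table. `code_zip_name` is a hypothetical helper introduced here for illustration only; it is not part of Synapse or of any challenge tooling.

```python
# Suffixes copied from the naming scheme in Q4 of the September 3 update.
CODE_ZIP_SUFFIX = {
    "1A": "Network-Code.zip",
    "1B": "Network-Insilico-Code.zip",
    "2A": "Prediction-Code.zip",
    "2B": "Prediction-Insilico-Code.zip",
}


def code_zip_name(team_name: str, subchallenge: str) -> str:
    """Return the expected code zip file name (hypothetical helper)."""
    key = subchallenge.strip().upper()
    if key not in CODE_ZIP_SUFFIX:
        # The page gives no code naming scheme for other subchallenges.
        raise ValueError(f"no naming scheme given for subchallenge {subchallenge!r}")
    return f"{team_name}-{CODE_ZIP_SUFFIX[key]}"


if __name__ == "__main__":
    for sub in sorted(CODE_ZIP_SUFFIX):
        print(sub, code_zip_name("TeamName", sub))
```

Checking a file name against this table before uploading may help avoid a submission being overlooked because of a naming mismatch.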

September 9: Write-up templates are available to download
Please see the "Files" section at the bottom of this page.

September 3: Collaborative Bonus Round
After the close of the HPN-DREAM challenge, there will be a four-week "bonus round" focused on collaborative model sharing. The goal of this round is two-fold: to rapidly improve models of cell signaling, and to foster community building. In this round, participants will openly share their models and the underlying code with other HPN-DREAM participants. Participants will then be able to learn about alternative methods and approaches for solving the questions laid out in the Network Inference and Timecourse Challenges, and use this information to improve their own models, or develop hybrid models that draw on code from multiple teams. The HPN-DREAM Collaborative Sharing round will run from September 15 through October 16. Models will be scored using the methods developed in the main HPN-DREAM challenge, and feedback will be provided through a weekly leaderboard.

September 3: Information on final submission

Q1. How many final submissions are allowed?
For Subchallenges 1A, 1B, 2A, and 2B, we will accept one submission per team. For Subchallenge 3, we will accept 2 submissions per team.

Q2. What are the requirements for the final submission?
The final submission should use the same file formatting structure that has been used in the Leaderboard submissions. The only difference is that the write-up must be complete for the final submission. We will soon post a write-up template for participants to use.

Q3. What are the requirements for winning a prize?
To qualify for prizes, participants must submit code that will become publicly available as described in the Challenge Rules to which all participants agreed. This requirement is in the spirit of collaboration of this Open Challenge, and also serves to establish reproducibility for publication.

Q4. How do I submit my code?
The code used to generate the results for each subchallenge should be submitted as an additional zip file. Please note that the code should: (1) generate the submitted outcome from the corresponding challenge data, (2) be annotated with clear input-output help menus, (3) state the required platform for execution. These zip files should adhere to the following naming scheme:
Subchallenge 1A: TeamName-Network-Code.zip
Subchallenge 1B: TeamName-Network-Insilico-Code.zip
Subchallenge 2A: TeamName-Prediction-Code.zip
Subchallenge 2B: TeamName-Prediction-Insilico-Code.zip

Q5. When will the final leaderboard update take place?
The final leaderboard will be updated on Wednesday, September 11. Please remember that in order to appear on the leaderboard, submissions must be received by Monday, September 9 at 11:59pm Pacific time.

Q6. What is the deadline for final submission?
The final submission must be loaded into Synapse by Monday, September 16 at 11:59pm Pacific time.

August 12: Nature Methods has agreed to the submission of an overview paper
Nature Methods (NM), the leading journal in the field of Science Methodology, has agreed to the submission to NM of an overview paper describing an analysis of the results and broadly applicable insights that arise from the HPN-DREAM Breast Cancer Network Inference Challenge.

July 27: The Leaderboards are open for HPN-DREAM Subchallenge 1!
The leaderboards for Subchallenges 1A and 1B are open for submissions. Entries will be scored, and results will be emailed back to participants. The leaderboard will be updated with team rankings on Wednesday, July 31, 2013. The submission deadline for appearance on this week's leaderboard is Monday, July 29, 2013, 11:59pm (Pacific Time). The leaderboards will be updated weekly for the remainder of the Challenge season. Each team is permitted one entry to each leaderboard per week. Additional entries submitted in any given week will not be accepted.
The first 3 teams that score 2 standard deviations above the null model will each be awarded a $300 cash prize, generously donated by Heritage Provider Network.

July 22: Webinar video posted to YouTube
Thanks to all who attended Friday's webinar. We have now made a video recording of it available on YouTube: http://www.youtube.com/watch?v=g6ll05TVOKM&feature=em-upload_owner

July 16: Update on PRKCD
There is an annotation error in the main and full datafiles for the protein PRKCD. This error is present for the three cell lines that contain this protein (BT549, BT20, UACC812). Please note that the underlying data are correct, and that these data represent abundances of the phosphorylated form of PRKCD.
In the MIDAS files: DV:PKC-delta should be annotated DV:PKC-delta_pS664.
In the CSV files: PKC-delta should be annotated PKC-delta_pS664, and PRKCD should be annotated PRKCD_pS664.
In the AbNames_CellLines.csv file: the antibody PKC-delta should be annotated PKC-delta_pS664, and PRKCD should be annotated PRKCD_pS664.

The first leaderboard will open the week of July 15
In order to appear on the leaderboard, models must be submitted for evaluation by July 8. The most recent submission from each team will be scored.

June 20: HPN has provided $50,000 to be awarded as prizes!
We are announcing the cash awards for the best performing teams (see Incentives section below for details). Funds for these awards have been generously donated by the Heritage Provider Network to promote progress in cancer research. Please see the YouTube video: HPN-DREAM Prize Announced

June 20: New information on the experimental dataset
New validation experiments indicate that two of the phospho antibodies included in Main Data are of low quality. In light of this information, we will not include these phosphoproteins in the scoring of Sub-challenges 1A or 2A. Please see Additional Data Details for more information. Please do not use these data in your analyses, and exclude these nodes from your network model submissions.
The phosphoproteins in question are:

Antibody            HUGO                Cell Lines
TAZ_pS89            WWTR1_pS89          MCF7, BT549, BT20, UACC812
FOXO3a_pS318_S321   FOXO3_pS318_pS321   MCF7, BT20

Synopsis

The overall goal of the Heritage-DREAM breast cancer network inference challenge is to quickly and effectively advance our ability to infer causal signaling networks and predict protein phosphorylation dynamics in cancer. We provide extensive training data from experiments on four breast cancer cell lines stimulated with various ligands. The data comprise protein abundance time-courses under inhibitor perturbations. We propose three specific sub-challenges:
(1) Network Inference: Participants are asked to infer causal signaling networks from training data.
(2) Time-course Prediction: Participants are asked to use the training data to build models that can predict trajectories of protein levels following inhibitor perturbation(s) not seen in the training data.
(3) Visualization: Participants are asked to design a visualization strategy for high-dimensional molecular time-course data sets such as the ones used in this challenge.
In sub-challenges (1) and (2), we also provide a parallel set of challenges based on in silico data.

Background

Cells respond to their environment by activating signaling networks that trigger processes such as growth, survival, apoptosis (cell death), and migration. Post-translational modifications, notably phosphorylation, play a key role in signaling. In cancer cells, signaling networks frequently become compromised, leading to abnormal behaviors and responses to external stimuli. Many current and emerging cancer treatments are designed to block nodes in signaling networks, thereby altering signaling cascades. Although there is a wealth of literature describing canonical cell signaling networks, little is known about exactly how these networks operate in different cancer cells.
Advancing our understanding of how these networks are deregulated across cancer cells will ultimately lead to more effective treatment strategies for patients.

Motivation

This challenge is motivated by the following observations:
Causal signaling links and system dynamics can vary depending on lineage and (epi)genetic background, such that the same perturbation can lead to different signaling responses in different backgrounds.
There is an urgent need for computational approaches that can characterize causal signaling networks using data acquired in a specific background or context of interest, for example a specific cell line under defined culture conditions.
There is also a need to address the related task of predicting dynamical trajectories in specific contexts and under specific perturbations.
Despite advances in this field, inference of causal networks in mammalian biology remains challenging. Equally, building dynamical models that can generalize beyond training data to predict trajectories under unseen system perturbations remains highly non-trivial.

The set of challenges we propose, based on experimental and in silico data, is designed to assess the ability to learn causal signaling networks, predict dynamical trajectories, and visualize complex time-course data.

Data

Participants will be provided with an extensive training dataset comprising proteomics time-courses from four breast cancer cell lines, acquired under different ligand stimuli and under inhibition of network nodes, as well as an in silico dataset with similar characteristics. The data are explained in detail on the Data Description page, and the structure and content of the files are described on the Data Files page (the files can also be downloaded from this page).
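
When loading the data files, the PRKCD annotation correction (July 16 update) and the exclusion of the two low-quality antibodies (June 20 update) can be applied to the column labels in one pass. The following is a minimal sketch, assuming the labels are available as a list of strings that may carry a MIDAS-style prefix such as DV:. The function name `clean_columns` and the placeholder label `EXAMPLE_PROTEIN` are assumptions made for illustration; the actual file layout is described on the Data Files page and is not reproduced here.

```python
# Corrected annotations from the July 16 update: old label -> new label.
RENAME = {
    "PKC-delta": "PKC-delta_pS664",
    "PRKCD": "PRKCD_pS664",
}

# Low-quality antibodies from the June 20 update (antibody and HUGO names).
EXCLUDED = {
    "TAZ_pS89",
    "WWTR1_pS89",
    "FOXO3a_pS318_S321",
    "FOXO3_pS318_pS321",
}


def clean_columns(columns):
    """Return (original_index, corrected_label) pairs for columns to keep.

    Hypothetical helper: a prefix before the last ':' (e.g. 'DV:') is
    preserved, the remaining label is renamed if needed, and excluded
    antibodies are dropped.
    """
    kept = []
    for index, name in enumerate(columns):
        # 'DV:PKC-delta' -> ('DV', ':', 'PKC-delta'); 'PRKCD' -> ('', '', 'PRKCD')
        prefix, sep, label = name.rpartition(":")
        label = RENAME.get(label, label)
        if label in EXCLUDED:
            continue
        kept.append((index, prefix + sep + label))
    return kept


if __name__ == "__main__":
    # 'DV:EXAMPLE_PROTEIN' is a placeholder, not a real column name.
    header = ["DV:EXAMPLE_PROTEIN", "DV:PKC-delta", "DV:TAZ_pS89", "PRKCD"]
    for index, label in clean_columns(header):
        print(index, label)
```

Returning the original column indices alongside the corrected labels lets the same selection be applied to the data rows without re-parsing the header.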

Challenges

The challenge consists of the following three sub-challenges (click on the links for detailed descriptions):

1) Sub-challenge 1: Network Inference
The aim is to infer causal signaling networks using time-course data with perturbations on network nodes. This sub-challenge is split into two independent parts:
A - Breast cancer proteomic data.
B - In silico data.

2) Sub-challenge 2: Time-course Prediction
The aim is to build dynamical models that can predict trajectories of phospho-proteins. An important emphasis is on the ability of models to generalize beyond the training data by predicting trajectories under perturbations not seen in the training data. This sub-challenge is split into two independent parts:
A - Breast cancer proteomic data.
B - In silico data.

3) Sub-challenge 3: Visualization
The aim is to propose novel strategies to visualize these high-dimensional molecular time-course data.

For sub-challenges (1) and (2), a complete submission should include solutions to both parts A and B, but for the purpose of feedback to participants, performance on the two parts will be shown separately on the leaderboard. Both parts address the same question, in one case with experimentally derived data and in the other with data generated from a computational model.

Participants are allowed to use any other source of information to solve the challenges, including (but not limited to) known signaling biology and information regarding the specific cell lines. Participants may find the following resources useful: KEGG, BioCarta, and Science Cell Signaling.

Assessment

Networks and predictions will be rigorously assessed using unseen test data, and in the case of in silico sub-challenges tested against gold-standard networks or trajectories.

Incentives

Incentives for Sub-challenge 1:
$15,000 to the top-performing team (provided by HPN)
Development of the winning method as a Cytoscape Cyni App. The development will be contributed by B. Schwikowski's group in the context of The National Resource for Network Biology (PI: Trey Ideker)
Invitation to present results at the 2013 RECOMB/ISCB Regulatory and Systems Genomics/DREAM Conference.

Incentives for Sub-challenge 2:
$15,000 to the top-performing team (provided by HPN)
Invitation to present results at the 2013 RECOMB/ISCB Regulatory and Systems Genomics/DREAM Conference.

Incentives for Sub-challenge 3:
$5,000 to the top-performing team (provided by HPN)
Implementation of the concept (provided by Sage Bionetworks)
Invitation to present at the 2013 RECOMB/ISCB Regulatory and Systems Genomics/DREAM Conference.

Travel support
HPN has also donated $15,000 to help fund the travel of top scoring teams to attend the 2013 RECOMB/ISCB Regulatory and Systems Genomics/DREAM Conference (November 8-12 in Toronto, Canada), where the results and winners will be announced.

Manuscript
Nature Methods has agreed to consider for publication the submission of an overview paper describing the results and insights that arise from the HPN-DREAM Breast Cancer Network Inference Challenge. The challenge organizers will invite the best performing team to co-author the paper. The rest of the participants in the challenge will also be invited to co-sign the paper as part of the HPN-DREAM consortium. Publication is contingent on the outcome of the standard peer review process, embracing the ideas behind a blind challenge.

Credits

The NCI Division of Cancer Biology funded the generation of the experimental RPPA data in this challenge via an Integrative Cancer Biology CCSB grant to Gray/Spellman/Mukherjee/Mills, and these groups made the data available for this challenge. The Spellman and Gray labs at Oregon Health and Science University (OHSU) carried out the cell line experiments used for the challenge. The Mills lab at MD Anderson Cancer Center generated the proteomic data on their RPPA platform.
The Mukherjee lab at the Netherlands Cancer Institute (NKI) led the analyses underlying formulation of the experimental data challenges. The Koeppl lab (ETH) led the development of the in silico challenges. The challenge organization, development and formulation was a tight collaboration of several DREAMers, including: Laura Heiser (OHSU), Heinz Koeppl and Michael Unger (ETH), Sach Mukherjee and Steven Hill (NKI), Thea Norman, Bruce Hoff, Jay Hodgson, and Mike Kellen (Sage Bionetworks), Julio Saez-Rodriguez and Thomas Cokelaer (EBI), all under the leadership of Gustavo Stolovitzky (IBM). Trey Ideker and Benno Schwikowski of The National Resource for Network Biology will provide support to develop the best performing network inference method into a Cytoscape Cyni App. The Heritage Provider Network generously donated the funds for the challenge awards and for the logistics and organization of the challenge.

References

The cell lines have been well described in the following manuscripts:
Neve et al. 2006, Cancer Cell
Heiser et al. 2012, PNAS

DataRail and the MIDAS format are described in the following manuscript:
Saez-Rodriguez et al. 2008, Bioinformatics

List of links

At the top of the page you can find links to pages that describe the data, describe and provide the data files, and describe each sub-challenge.