The DREAM Toxicogenetics Challenge is complete! To view the results, please see the Final Scoring for Subchallenge 1 and the Final Scoring for Subchallenge 2. The leaderboard is closed and the extended training set has been released! A manuscript (and supplement, see syn1840307) describing the results is under review.

The Questions

This challenge is designed to build predictive models of cytotoxicity as mediated by exposure to environmental toxicants and drugs. To approach this question, we will provide a dataset containing cytotoxicity estimates measured in lymphoblastoid cell lines derived from 884 individuals following in vitro exposure to 156 chemical compounds. In Subchallenge 1, participants will be asked to model interindividual variability in cytotoxicity based on genomic profiles, in order to predict cytotoxicity in unknown individuals. In Subchallenge 2, participants will be asked to predict population-level parameters of cytotoxicity across chemicals based on structural attributes of compounds, in order to predict median cytotoxicity and interindividual variation in toxicity for unknown compounds.

Background

This challenge represents a groundbreaking new direction for toxicity testing and is intended to help understand how genetic variation affects individual response to exposure to environmental chemicals. For this Challenge, Sage Bionetworks and DREAM are teaming up with scientists at the University of North Carolina (UNC), the National Institute of Environmental Health Sciences (NIEHS), and the National Center for Advancing Translational Sciences (NCATS). These groups have worked in close partnership to generate the largest population-scale toxicity screen to date in a human in vitro model system, leveraging the 1000 Genomes Project. The NIEHS/NCATS/UNC team is providing Challenge participants with access to in vitro cytotoxicity screens for 156 drugs and environmental chemicals measured in lymphoblastoid cell lines derived from 884 participants in the 1000 Genomes Project.
These cell lines are derived from individuals representing 9 distinct geographic subpopulations across Europe, Africa, Asia, and the Americas. These data are paired with the extensive, publicly available genomic data from these cell lines, including DNA variation profiles from the 1000 Genomes Project and transcriptomic data from the Geuvadis project. The Toxicogenetics Challenge aims to ask a "crowd" of researchers to use these data to elucidate the extent to which adverse effects (e.g., cytotoxicity) of compounds can be inferred from genomic and/or chemical structure data. The computational models built within this Challenge could be considered in certain decision-making contexts to inform government agencies as to which environmental chemicals and drugs are of the greatest potential concern to human health.

The Challenge

Participants have the opportunity to take part in two distinct subchallenges that use the same data. Participants can choose to participate in one or both subchallenges.

Subchallenge 1: Predict interindividual variability in in vitro cytotoxicity based on genomic profiles of individual cell lines. For each compound, participants will be challenged to predict the absolute values and relative ranks of cytotoxicity across a set of unknown cell lines for which genomic data are available. For more information, click here.

Subchallenge 2: For each compound, predict the concentration at which median cytotoxicity occurs, as well as interindividual variation in cytotoxicity, described by the 5th-95th percentile range across the population. Each prediction will be scored based on the participant's ability to predict these two parameters for a set of compounds excluded from the training set. For more information, click here.

Data

A detailed description of the data used in this Challenge is available on the data description page. Descriptions of data file content and formats are available on the data file description page.
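To make the two Subchallenge 2 target parameters concrete, the sketch below computes a population median and a 5th-95th percentile range from per-individual cytotoxicity values using only the Python standard library. This is a minimal illustration, not the official scoring code; the function name and the toy data are ours.

```python
import statistics

def population_parameters(ec_values):
    """Summarize per-individual cytotoxicity values (e.g., effective
    concentrations) for one compound into the two Subchallenge 2
    target parameters: the population median and the spread between
    the 5th and 95th percentiles."""
    # n=100 yields 99 cut points that split the data into 100
    # equal-probability bins; cut point 5 is the 5th percentile
    # and cut point 95 is the 95th.
    pct = statistics.quantiles(ec_values, n=100, method="inclusive")
    median = statistics.median(ec_values)
    spread = pct[94] - pct[4]  # 95th percentile minus 5th percentile
    return median, spread

# Toy data: hypothetical effective concentrations (arbitrary units)
# measured across 11 cell lines for one compound.
median, spread = population_parameters([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
```

For this evenly spaced toy population the median is 6 and the 5th-95th percentile spread is 9.0; on the real screen each compound would contribute up to 884 per-individual values.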
Instructions for accessing the data are available on the Training Data page.

Timelines for participation

This Challenge will open on June 10, 2013 and will close on September 15, 2013 at 12:59 (Pacific Time). All models must be submitted by that time in order to be considered for any prize. We will be accepting final submissions for both subchallenges starting in late August. For Subchallenge 1, participants will have the opportunity to test predictions against a test data set through our online leaderboard starting on July 24, 2013. Submitted predictions will be scored, and all scores will be made publicly available on a leaderboard. In this way, participants can gauge the effectiveness of their models and compare them against those of other participants. Participants can submit any number of predictions to the Subchallenge 1 leaderboard on behalf of their team. The data used to score leaderboard predictions are distinct from the data that will be used to score final predictions and to select the winner. The leaderboard scoring data will be released in late August for use by participants in developing their final predictions. More information on scoring criteria and submission procedures is available on the Submitting Predictions page.

Incentives

The winner of each subchallenge will be invited to speak at the 2013 DREAM conference, which will be held on November 8-12 in Toronto, Canada, in conjunction with the RECOMB/ISCB Conference on Regulatory and Systems Genomics. Winners will also receive an award to cover the cost of travel. Nature Biotechnology, a leading journal in the fields of biotechnology and computational biology, has agreed to work with Sage Bionetworks and DREAM, enthusiastically supporting the submission of an overview paper describing an analysis of the results and broadly applicable insights arising from this NIEHS/NCATS/UNC DREAM Toxicogenetics Challenge.
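The precise scoring criteria are documented on the Submitting Predictions page. Since Subchallenge 1 asks for both absolute values and relative ranks of cytotoxicity, rank correlation is a natural way to think about how a leaderboard might compare a prediction against held-out data. The sketch below is a minimal Spearman rank correlation in pure Python, assuming no tied values; it is illustrative only and is not the Challenge's official metric.

```python
def ranks(values):
    """Rank values from smallest to largest (1-based; assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(predicted, observed):
    """Spearman rank correlation for untied data:
    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    where d is the per-cell-line difference in ranks."""
    n = len(predicted)
    d2 = sum((p - o) ** 2
             for p, o in zip(ranks(predicted), ranks(observed)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Predictions that order the cell lines exactly like the observed
# cytotoxicity score 1.0; a fully reversed ordering scores -1.0.
rho = spearman([0.1, 0.4, 0.7, 0.9], [10, 20, 30, 40])
```

Because rank correlation ignores the scale of the predictions, a real evaluation of "absolute values" would pair a metric like this with one sensitive to magnitude, such as root-mean-square error.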
The submitted paper will be rigorously peer reviewed and must match the quality of other published work in order to be published in Nature Biotechnology. As has been the case with previous publications resulting from DREAM challenges, the challenge organizers will invite the best-performing team from each subchallenge to co-author the paper. The remaining participants in the challenge will also be invited to co-sign the paper as part of the NIEHS/NCATS/UNC DREAM consortium.

Getting help

Community Forum

A Community Forum page has been created for Challenge participants and organizers. If you have any questions or comments about the Challenge, whether for the organizers or for other participants, this is the place to post them. The Challenge organizers will post answers to all received questions on the forum. In addition, we will use this site to provide additional details about the challenge as they are developed, including information about incentives, leaderboard availability, and intermediate prizes and competitions.

Synapse User Support

The Synapse software platform will be used exclusively to distribute data and to accept prediction submissions within the Challenge. Both of these activities can be performed by interfacing with Synapse through the website (this page) or programmatically using the R, Python, or command-line clients. For information on how to use Synapse, please refer to the Synapse User Guide. To learn more about the programmatic clients, see Getting started with the R synapseClient and Getting started with the Python client for Synapse.

Community participation

Participants are encouraged to compete as teams. Team membership is not fixed and can fluctuate throughout the Challenge. You will be asked to provide a team name when you submit a prediction to be scored by the Challenge organizers. There will be a limit on the number of submissions allowed per team.
You may participate on more than one team, but each team must have a distinct membership.

Credits

Data was generated by:

This Challenge is being run by:

Instructions for getting started are available on the Training Data page. Instructions for submitting predictions are available on the submissions page. All updates to Challenge information are listed here.
Accessing Training Data
Gold Standard - Test Data