NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge

Created by Lara Mangravite
We would like to remind all participants of the terms of use that every participant agreed to before gaining access to the data in the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge.

Cytotoxicity Data Terms of Use

The data are provided ONLY for participation within the Toxicogenetics Challenge. Data were contributed jointly by NIEHS, NCATS, and UNC investigators. Publication of analyses resulting from these data is embargoed until the results from the Toxicogenetics Challenge and the best-performing strategies used in that Challenge have been published. You will be contacted through your Synapse-affiliated email address when the publication embargo is lifted, which is expected to occur in Spring 2014. This information will also be posted within this Synapse project. Publication of analyses using these data is permitted after that date. However, you may include analyses of these data in presentations and grant submissions at any time. For your grant submission, please cite the data in the following manner: "These data were provided by investigators at NIEHS, NCATS, and UNC, and were obtained through Synapse as part of the NIEHS-NCATS-UNC DREAM Toxicogenetics Challenge (syn1761567)." By downloading the data, you agreed to use these data only within the context of the Toxicogenetics Challenge and to observe the terms of the publication embargo. You also agreed to send a copy of any published manuscript that uses these data to the data generators, care of Rebecca Boyles in the Office of Scientific Information Management at the National Institute of Environmental Health Sciences, at rebecca.boyles(at)nih.gov. You may share these data with collaborators within your institution but, in doing so, you are responsible for ensuring that all subsequent recipients comply with these terms as well as the Synapse Terms and Conditions.
Genotype Data Terms of Use

Genotypes from the 1000 Genomes Project are made available for the Challenge under the following terms of use. Challenge participants agree not to publish any analyses that use these data until after the Challenge has completed and the winning models have been published. In addition, a portion of the genotyped samples remains under a publication embargo by the 1000 Genomes Project for studies of genetic variation and related population differences. Until the data generators have published their first publication, you may use the data only for participation within the Toxicogenetics Challenge. Participants also agree to publish only findings pertaining to the prediction of variation in cytotoxicity response. In particular, studies of patterns of rare genetic variation, population genetic differences (other than summaries relevant to cytotoxicity), and genomic linkage disequilibrium patterns remain subject to the embargo. Publications using the provided Challenge genotype data should cite the 1000 Genomes Phase 1 publication (Nature 491, 56-65). Challenge organizers will notify you through your Synapse-affiliated email address when this embargo is lifted.

RNAseq Data Terms of Use

The RNAseq data provided here were generated as part of the E-GEUV-1 RNA-Seq dataset (http://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/) and are under publication embargo until late 2013. Challenge participants are permitted to use these data provided that they agree to cite the primary publication by the original data generators, once it has been published, in all future publications using the data. Participants agree to forgo submission of publications using these data until after the close of the Challenge in November 2013, and to use these data only to analyze and/or report models of cytotoxicity prediction.
Specifically, studies of RNA variation, genotype-expression relationships, and population differences in expression are not permitted until after the data generators have published. Challenge organizers will provide updates on the appropriate citation and the embargo expiration, as these become available, through your Synapse-associated email address.

Scoring of final submissions is complete! To view the results, please see the Final Scoring for Subchallenge 1 and Final Scoring for Subchallenge 2.

Leaderboard is closed and the extended training set released! The extended training set was released through Synapse on August 30th upon the closure of the leaderboard. Note that you will have to sign the data access agreement affiliated with these files online before you can download them. There are two files:
ToxChallengeCytotoxicityDataTrainSubchal1Extended.txt contains a 620 x 106 matrix with the EC10 values for the combination of the original training set and the leaderboard test set. It can be accessed at syn2186574.
ToxChallengeCytotoxicityDataTrainSubchal2Extended.txt contains a 106 x 3 matrix with the median, 5th percentile, and 95th percentile, as computed from these 620 samples. It can be accessed at syn2186866.

Final submissions for both Subchallenge 1 and Subchallenge 2 will be accepted from September 3rd to September 15th. For more details, please see the submissions page.

The Leaderboard for Subchallenge 1 has been updated with new scoring metrics! The scoring metrics provided are: mean ranking based on RMSE; overall RMSE; mean ranking based on the coefficient of determination; mean coefficient of determination; mean ranking based on Pearson correlation; and mean Pearson correlation. The leaderboard will close on August 30th at noon (Pacific). $250 cash awards will be given to the top two teams with the highest mean rank as determined by RMSE and by Pearson correlation (four prizes in total).
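The leaderboard metrics listed above can be illustrated with a short Python sketch. It assumes that predictions are scored separately for each compound across the held-out cell lines, that the coefficient of determination is 1 - SS_res/SS_tot, and that a team's mean rank is the average of its per-compound ranks. The organizers' actual scoring code, tie handling, and any normalization may differ, and the team names, compound names, and values below are hypothetical toy data.

```python
import math
from statistics import mean

def rmse(obs, pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))

def r_squared(obs, pred):
    """Coefficient of determination, taken here as 1 - SS_res / SS_tot (an assumption)."""
    mu = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mu) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def pearson(obs, pred):
    """Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def mean_rank(scores_by_team, higher_is_better):
    """Rank teams within each compound (1 = best), then average each team's ranks.

    scores_by_team maps team -> list of per-compound scores in a shared compound order.
    Ties keep insertion order in this sketch; the real scoring may treat them differently.
    """
    teams = list(scores_by_team)
    n_compounds = len(scores_by_team[teams[0]])
    ranks = {t: [] for t in teams}
    for c in range(n_compounds):
        ordered = sorted(teams, key=lambda t: scores_by_team[t][c], reverse=higher_is_better)
        for rank, team in enumerate(ordered, start=1):
            ranks[team].append(rank)
    return {t: mean(r) for t, r in ranks.items()}

# Hypothetical toy data: observed values for 3 cell lines and 2 compounds.
observed = {"cmpdA": [1.0, 2.0, 3.0], "cmpdB": [2.0, 4.0, 6.0]}
predictions = {
    "team1": {"cmpdA": [1.0, 2.0, 3.5], "cmpdB": [2.0, 4.0, 6.5]},
    "team2": {"cmpdA": [2.0, 2.5, 3.0], "cmpdB": [3.0, 4.0, 5.0]},
}
compounds = sorted(observed)

def per_compound(metric):
    """Apply a metric to every compound for every team."""
    return {t: [metric(observed[c], p[c]) for c in compounds] for t, p in predictions.items()}

rmse_rank = mean_rank(per_compound(rmse), higher_is_better=False)      # lower RMSE is better
r2_mean = {t: mean(s) for t, s in per_compound(r_squared).items()}     # mean R^2 per team
pearson_rank = mean_rank(per_compound(pearson), higher_is_better=True)  # higher r is better
print(rmse_rank, r2_mean, pearson_rank)
```

In this toy example, team2's predictions are perfectly linear in the observations but have the wrong scale and offset, so team2 ranks first on Pearson correlation while team1 ranks first on RMSE and has the higher mean coefficient of determination.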
Scores for predictions submitted to the Subchallenge 1 Leaderboard will continue to be available for viewing on the real-time leaderboard page. All updates to Challenge information are listed here.

This challenge is designed to build predictive models of cytotoxicity mediated by exposure to environmental toxicants and drugs. To approach this question, we will provide a dataset containing cytotoxicity estimates measured in lymphoblastoid cell lines derived from 884 individuals following in vitro exposure to 156 chemical compounds. In subchallenge 1, participants will be asked to model interindividual variability in cytotoxicity based on genomic profiles in order to predict cytotoxicity in unknown individuals. In subchallenge 2, participants will be asked to predict population-level parameters of cytotoxicity across chemicals based on structural attributes of compounds, in order to predict the median cytotoxicity and the inter-individual variation in cytotoxicity for unknown compounds.

Background

This challenge represents a groundbreaking new direction for toxicity testing and is intended to help understand how genetic variation affects individual response to environmental chemicals. For this Challenge, Sage Bionetworks and DREAM are teaming up with scientists at the University of North Carolina (UNC), the National Institute of Environmental Health Sciences (NIEHS), and the National Center for Advancing Translational Sciences (NCATS). These groups have been working in close partnership to generate the largest-ever population-scale toxicity screen in a human in vitro model system, leveraging the 1000 Genomes Project. The NIEHS/NCATS/UNC team is providing Challenge participants with access to in vitro cytotoxicity screens for 156 drugs and environmental chemicals measured in lymphoblastoid cell lines derived from 884 participants in the 1000 Genomes Project.
These cell lines are derived from individuals representing 9 distinct geographic subpopulations across Europe, Africa, Asia, and the Americas. These data are paired with the extensive, publicly available genomic data for these cell lines, including DNA variation profiles from the 1000 Genomes Project and transcriptomic data from the Geuvadis project. The Toxicogenetics Challenge asks a "crowd" of researchers to use these data to elucidate the extent to which adverse effects (e.g., cytotoxicity) of compounds can be inferred from genomic and/or chemical structure data. The computational models built within this Challenge could be considered in certain decision-making contexts to inform government agencies as to which environmental chemicals and drugs are of the greatest potential concern to human health.

The Challenge

Participants have the opportunity to take part in two distinct subchallenges that use the same data. Participants can choose to participate in one or both subchallenges.

Subchallenge 1: Predict interindividual variability in in vitro cytotoxicity based on genomic profiles of individual cell lines. For each compound, participants will be challenged to predict the absolute values and relative ranks of cytotoxicity across a set of unknown cell lines for which genomic data are available. For more information, click here.

Subchallenge 2: For each compound, predict the concentration at which median cytotoxicity would occur, as well as the inter-individual variation in cytotoxicity, described by the 5th-95th percentile range, across the population. Each prediction will be scored based on the participant's ability to predict these two parameters within a set of compounds excluded from the training set. For more information, click here.

Data

A detailed description of the data used in this Challenge is available on the data description page. Descriptions of data file content and formats are available on the data file description page.
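The Subchallenge 2 targets can be sketched in Python as a per-compound summary of the population distribution. This sketch assumes a matrix with cell lines as rows and compounds as columns (the exact layout of the extended Subchallenge 1 file is not specified here), uses None to mark missing values, computes percentiles by linear interpolation via `statistics.quantiles(..., method="inclusive")`, and takes the range as the 95th minus the 5th percentile. The Challenge's own percentile definition and range convention may differ, and the compound names and values are hypothetical.

```python
from statistics import median, quantiles

def population_summary(values):
    """Median, 5th and 95th percentiles, and 5th-95th percentile range for one compound.

    Percentiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive"); this is an assumed convention.
    """
    vals = sorted(v for v in values if v is not None)  # None marks a missing value (assumption)
    cuts = quantiles(vals, n=100, method="inclusive")  # 99 cut points: cuts[k - 1] is the k-th percentile
    p5, p95 = cuts[4], cuts[94]
    return {"median": median(vals), "p5": p5, "p95": p95, "range": p95 - p5}

def summarize_matrix(rows, compound_names):
    """Collapse a cell-line x compound matrix into one summary per compound."""
    columns = zip(*rows)  # transpose: one tuple of values per compound
    return {name: population_summary(col) for name, col in zip(compound_names, columns)}

# Hypothetical toy matrix: 21 cell lines x 2 compounds.
rows = [[float(i), 2.0 * i] for i in range(1, 22)]
summary = summarize_matrix(rows, ["cmpdA", "cmpdB"])
print(summary["cmpdA"])
```

Collapsing each column this way mirrors how a 106 x 3 summary (median, 5th, and 95th percentile) can be derived from a samples-by-compounds matrix, under the layout assumptions stated above.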
Instructions for accessing the data are available on the Training Data page.

Timelines for participation

This Challenge will open on June 10 2013 and will close on September 15 2013 at 12:59 (Pacific Time). All models must be submitted by that time in order to be considered for any prize. We will be accepting final submissions for both subchallenges starting in late August. For subchallenge 1, participants have the opportunity to test predictions against a test data set starting on July 24 2013 through our online leaderboard. Submitted predictions will be scored and all scores will be made publicly available on a leaderboard. In this way, participants can gauge the effectiveness of their models and make comparisons relative to other participants. Participants can make any number of submissions to the leaderboard for Subchallenge 1 on behalf of their team. The data used to score predictions for the leaderboard are distinct from the data that will be used to score final predictions and to select the winner. The data used to score the leaderboard will be released in late August for use by participants in developing their final predictions. More information on scoring criteria and submission procedures is available on the Submitting Predictions page.

Incentives

The winner of each subchallenge will be invited to speak at the 2013 DREAM conference. This conference will be held on November 8-12 in Toronto, Canada, in conjunction with the RECOMB/ISCB Conference on Regulatory and Systems Genomics. Winners will also be provided with an award to cover the cost of travel. Nature Biotechnology, the leading journal in the field of Biotechnology and Computational Biology, has agreed to work with Sage Bionetworks and DREAM, enthusiastically supporting the submission to the journal of an overview paper describing an analysis of the results and broadly applicable insights that arise from this NIEHS/NCATS/UNC DREAM Toxicogenetics Challenge.
The submitted paper will be rigorously peer reviewed and must meet the quality standards of other work published in Nature Biotechnology. As with previous publications resulting from DREAM challenges, the challenge organizers will invite the best-performing team from each subchallenge to co-author the paper. The rest of the participants in the challenge will also be invited to co-sign the paper as part of the NIEHS/NCATS/UNC DREAM consortium.

Getting help

Community Forum

A Community forum page has been developed for Challenge participants and organizers. If you have any questions or comments about the Challenge (for either the organizers or other participants), this is the place to post them. The Challenge organizers will post answers to all received questions on the forum. In addition, we will use this site to provide additional details about the challenge as they are developed, including information about incentives, leaderboard availability, and intermediate prizes and competitions.

Synapse User Support

The Synapse software platform will be used exclusively to distribute data and to receive submissions of predictions within the Challenge. Both of these activities can be performed through the Synapse website (this page) or programmatically using the R, Python, or command-line clients. For information on how to use Synapse, please refer to the Synapse User Guide. To learn more about the programmatic clients, see Getting started with the R synapseClient and Getting started with the Python client for Synapse.

Community participation

Participants are encouraged to compete as teams. Team membership is not fixed and can fluctuate throughout the Challenge. You will be asked to provide a team name when you submit a prediction to be scored by Challenge organizers. There will be a limit on the number of submissions allowed per team.
You may participate on more than one team, but every team must have a different team membership.

INSTRUCTIONS FOR GETTING STARTED ARE AVAILABLE ON THE TRAINING DATA PAGE. Instructions for submitting predictions are available on the submissions page.
