Whole-cell parameter estimation DREAM challenge

Description of the 2013 whole-cell parameter estimation DREAM challenge including a synopsis of the challenge, a summary of the whole-cell model, and participant instructions.
Created By Brian Bot BrianMBot
Sponsored by: Dialogue for Reverse Engineering Assessments and Methods (DREAM); Sage Bionetworks; IBM Research; Covert Lab, Stanford University; Numerate; MathWorks

News

May 28, 2015: Challenge summary published in PLoS Computational Biology
Oct 15: Best performers announced here
Sep 16: Deadline for final submissions is Sep 23. Sub-challenge #3 results posted to the leader board. Congrats to team The ICM Poland!
Sep 4: Sub-challenge #2 results posted to the leader board. Congrats to team The Whole-Sale Modelers!
Aug 20: Sub-challenge #1 results posted to the leader board. Congrats to teams crux, The Whole-Sale Modelers, and newDream!
Aug 20: BitMill limit increased to five (5) simultaneous simulations.
Aug 1: Intermediate sub-challenges and prizes announced! See Section 2.6 for more info about submitting.
    Aug 9: Milestone 1 -- Best prediction score. 1st place: $200; 2nd place: PLoS gear, MATLAB student license
    Aug 23: Milestone 2 -- Most creative method. 1st place: $300; 2nd place: PLoS gear, MATLAB student license
    Sep 6: Milestone 3 -- Best parameter score. 1st place: $400; 2nd place: PLoS gear, MATLAB student license
    Sep 23: Final -- Best combined score. 1st place: ISCB/RECOMB, PLoS invitations
July 11: Added support for HTTP proxies. Pull the latest code and edit getConfig.m to set up a proxy for BitMill.
July 10: Feedback survey posted. Please give us feedback on how the challenge is going!
July 10: Webinar slides and video posted
July 9: Added metabolic sub-model linear program files for use with programs like libSBML, openCOBRA, lpsolve, and gurobi
July 9: Added instructions on how to set enzyme copy numbers in the metabolic sub-model (see Section 3.4.4)
July 3: The first webinar will be held July 9 at 11am PDT. Register here
June 21: PLoS Computational Biology will publish manuscripts from the winning participants

1 Synopsis

Recently while tinkering in the lab, we (the organizers) made an exciting breakthrough!
Incredibly, we identified a mutant in silico strain of the Gram-positive bacterium Mycoplasma genitalium which grows 33% slower than wild-type! We challenge you, as a participant, to determine how the mutant strain differs from the wild-type and why it grows more slowly. Specifically, the organizers have changed the values of 15 parameters of a recent whole-cell model of M. genitalium (Karr et al., 2012) compared to those of the wild-type strain. These 15 parameters along with 15 unmodified parameters are listed in Table 1 (see Section 2.3). Your goal is to identify which 15 parameters we modified as well as their new values, given the model's structure, the wild-type parameter values which are distributed with the model code, and data obtained from in silico experiments on the mutant strain.

The challenge mimics a common scenario in scientific research where researchers need to tune a model's parameters to match experimental data in order to discover new biology. The goal of this challenge is to explore and compare innovative approaches to parameter estimation of large, heterogeneous computational models. Participants are encouraged to develop and/or apply optimization methods, including the selection of the most informative experiments. The organizers encourage participants to form teams to collaboratively solve the challenge.

1.1 Background

A central challenge in biology is to understand how phenotype arises from genotype. Despite decades of research which have produced vast amounts of biological data, a complete, predictive understanding of biological behavior remains elusive. Computational techniques are needed to assemble the rapidly growing amount of biological data into a unified understanding. Recently, researchers at Stanford University developed the first comprehensive dynamical "whole-cell" model of a living organism (Karr et al., 2012). The model broadly predicts the cell cycle dynamics of the Gram-positive bacterium M. genitalium from the level of individual molecules and their interactions, including its metabolism, transcription, translation, and replication. The model is composed of 28 sub-models of distinct cellular processes which were independently modeled at short time scales, and integrated together at longer time scales (Figure 1). The model was validated by broadly comparing its predictions to a wide range of experimental data across several biological processes and scales.

Figure 1. M. genitalium whole-cell model. Diagram schematically depicts the 28 sub-models (colored words) in the context of a single M. genitalium cell with its characteristic flask-like shape. Sub-models are connected through common metabolites, RNA, protein, and the chromosome, which are depicted as orange, green, blue, and red arrows, respectively. Reprinted from Karr et al., 2012.

In total the M. genitalium whole-cell model contains 1,468 quantitative parameters. Accurately identifying these parameters is essential to whole-cell modeling. Furthermore, identifying these parameters is challenging because the model takes approximately 24 core-hr to simulate one cell cycle and because the model is stochastic. Karr and his colleagues identified the model parameters by first assembling a training set of over 1,836 experimental observations from over 900 publications, and second heuristically tuning the parameter values to match the training data. More rigorous approaches to parameter estimation are critically needed to improve the accuracy of whole-cell models and to enable researchers to continue to develop increasingly complex models.

1.2 Free cloud computational resources

Participants can simulate the whole-cell model with candidate parameter values for free in the cloud using BitMill. Numerate generously adapted and donated BitMill to run the DREAM challenge. See Section 3.3.3 for instructions.
1.3 Prizes

1.3.1 Final challenge (due Sep 23, 2013)

The team with the highest overall scoring solution will be invited to present their approach at the 6th Annual RECOMB/ISCB conference on Regulatory and Systems Genomics in Toronto, Canada. The winning team will also be invited to submit a manuscript describing their methodology to PLoS Computational Biology, the leading computational biology journal. The manuscript will be invited, but will still be subject to the same rigorous peer review as all PLoS Computational Biology articles. See Sections 2.6 and 2.7 for information about submission and scoring.

1.3.2 Intermediate sub-challenges (due Aug 9, Aug 23, Sept 6)

We will award prizes for three intermediate sub-challenges:

Aug 9: Milestone 1 -- Best prediction score. 1st place: $200; 2nd place: PLoS gear, MATLAB student license
Aug 23: Milestone 2 -- Most creative method. Challenge organizers will judge the submitted write-ups and code. 1st place: $300; 2nd place: PLoS gear, MATLAB student license
Sep 6: Milestone 3 -- Best parameter score. 1st place: $400; 2nd place: PLoS gear, MATLAB student license

See Sections 2.6 and 2.7 for information about submission and scoring.

1.4 Timeline

Jun 10, 2013: Challenge launched
July 9, 11am PDT / 2pm EDT: Live webinar with challenge organizers. View slides. Video will be posted soon!
Intermediate submission deadlines
    Aug 9 11:59pm PST: Milestone 1 -- Best prediction score. 1st place: $200; 2nd place: PLoS gear
    Aug 23 11:59pm PST: Milestone 2 -- Most creative method. 1st place: $300; 2nd place: PLoS gear
    Sep 6 11:59pm PST: Milestone 3 -- Best parameter score. 1st place: $400; 2nd place: PLoS gear
Sep 23, 2013 11:59pm PST: Final solution submission deadline. 1st place: ISCB/RECOMB, PLoS invitations
Late Sep, 2013: Winners announced
Nov 8-12, 2013: Winning team presents their solutions at the 6th Annual RECOMB/ISCB conference on Regulatory and Systems Genomics in Toronto, Canada
Spring, 2014: Winning solution and meta-analysis published

2 The challenge

Participants are challenged to estimate the values of 15 unknown parameters from a set of 30 (10 promoter affinities, 10 RNA half lives, and 10 metabolic reaction kcats) of a recently published whole-cell model of M. genitalium (Karr et al., 2012) given the model's structure and simulated data. The 30 unknown parameters are associated with 10 mRNA-coding genes whose gene products catalyze 10 metabolic reactions. Specifically, the organizers have modified the values of 15 of these parameters compared to their values published in Karr et al., 2012 and distributed to participants through GitHub. Together the modified parameter values increase the average in silico cell cycle length by 33% from 9 to 12 h. The organizers have not modified the values of the other 15 parameters. Participants will not be told which 15 parameters have been modified. Rather participants are challenged to learn this information.

Participants are encouraged to develop and/or apply optimization methods, including the selection of the most informative experiments. Participants will be scored based on the distance between their estimated parameter values and the true parameter values, and on the distance between the in silico measurements obtained from their estimated parameter values and those obtained from the true parameter values. Below we describe the 30 parameters, the in silico data, the submission system, and the scoring algorithm.

2.1 The whole-cell model

The whole-cell model is composed of 28 sub-models (also referred to as modules or processes) each of which was modeled independently at short time scales using different mathematical representations (e.g. ODEs, Boolean, probabilistic, constraint-based, etc.).
As illustrated in Figure 2, the model integrates the sub-models in three steps. First, the sub-models are structurally integrated by linking their common inputs and outputs through 16 common cell state variables which together represent the in silico cell's instantaneous configuration: metabolite, RNA, and protein copy numbers; metabolic reaction fluxes; nascent DNA, RNA, and protein polymers; molecular machines; cell mass, volume, and shape; the external environment, including the host urogenital epithelium; and time. Second, the common inputs to the sub-models were computationally allocated at the beginning of each time step. Third, values of the sub-model parameters were semi-automatically tuned to match experimental data.

The whole-cell model is extensively described in Data S1 of Karr et al., 2012. Chapter 1 summarizes the model. Chapters 2 and 3 describe the mathematical and computational implementation of each cell state variable and process sub-model.

Figure 2. Whole-cell model simulation algorithm. The model integrates cellular function sub-models through 16 cell variables. First, simulations are randomly initialized to the beginning of the cell cycle (left gray arrow). Next, for each 1 s time step (dark black arrows), the sub-models retrieve the current values of the cellular variables, calculate their contributions to the temporal evolution of the cell variables, and update the values of the cellular variables. This is repeated thousands of times during the course of each simulation. For clarity, cell functions and variables are grouped into five physiologic categories: DNA (red), RNA (green), protein (blue), metabolite (orange), and other (black). Colored lines between the variables and sub-models indicate the cell variables predicted by each sub-model. The number of genes associated with each sub-model is indicated in parentheses. Finally, simulations are terminated upon cell division when the septum diameter equals zero (right gray arrow). Reprinted from Karr et al., 2012.

2.1.1 Metabolic sub-model

We encourage participants to solve the challenge by using the metabolic sub-model as a simplified surrogate of the entire whole-cell model. The metabolic sub-model (Figure 3) includes the 10 reactions associated with the 30 parameters, including the 15 unknown ones. The metabolic sub-model was implemented using flux-balance analysis. See Chapter 3 of Data S1 of Karr et al., 2012 for further information about the metabolic sub-model. See Section 3.4.1 for instructions on how to simulate the metabolic sub-model.

Figure 3. M. genitalium metabolic network. Reprinted from Karr et al., 2012.

2.2 Model parameters

The model contains a large number of quantitative and structural parameters. However, participants are only asked to estimate the values of 15 of these parameters from the subset of 30 parameters indicated above: 10 promoter affinities, 10 RNA half-lives, and 10 metabolic reaction kcats. The next section contains more information about the unknown parameters including how to set their values.

A table of all of the model's parameters including their biological meaning, value, and units is available here. Additionally, participants can use WholeCellKB to inspect the experimental data used to train the base value of each of the model's parameters. Figure 4 displays a screen shot of a representative WholeCellKB page of the thiamine kinase reaction. Participants can click on the "View in model" buttons highlighted in red to view the corresponding model properties highlighted in the MATLAB simulation code (Figure 5).

Figure 4. Screen shot of WholeCellKB highlighting the "View in model" button. Participants can use this button to inspect how the base values of the model's parameters (highlighted in MATLAB code in Figure 5) are trained using the experimental data organized in WholeCellKB including the reported reaction stoichiometry listed in this screen shot.

Figure 5.
Screen shot of a WholeCellKB "View in model" page. The page highlights the model property or properties associated with the experimental data named at the top of the page (e.g. Reaction: Stoichiometry).

2.3 Model parameters to be estimated

Participants are asked to estimate 30 unknown parameters of 3 types (Table 1): 10 promoter affinities, 10 RNA half-lives, and 10 metabolic reaction kinetics (kcats).

Table 1. Unknown RNA half-lives and reaction kcats to be estimated. Each row lists a gene, the operon the gene is transcribed with and its RNA half-life in seconds, and a reaction catalyzed by the gene's protein product and its forward kcat. The organizers have modified the values of 15 of the 30 quantitative parameters (operon affinity, RNA half life, reaction kcat) in this table. Participants are challenged to determine which parameters have been modified and to estimate their new values. The values listed in the table are the base values of the parameters published in Karr et al., 2012.

Gene ID   Operon ID   Operon half life (s)   Enzyme ID                   Reaction ID   Reaction kcat (1/s)
MG_006    TU_003      209                    MG_006_DIMER                Tmk           0.07
MG_023    TU_011      245                    MG_023_DIMER                Fba           23.34
MG_047    TU_027      170                    MG_047_TETRAMER             MetK          0.11
MG_111    TU_069      187                    MG_111_DIMER                Pgi           1218.79
MG_272    TU_180      401                    MG_271_272_273_274_192MER   AceE          1128.31
MG_299    TU_203      174                    MG_299_DIMER                Pta           1620.91
MG_330    TU_233      253                    MG_330_MONOMER              CmkA2         103.25
MG_357    TU_260      216                    MG_357_DIMER                AckA          100.59
MG_407    TU_294      282                    MG_407_DIMER                Eno           300.87
MG_431    TU_307      219                    MG_431_DIMER                TpiA          816.67

Table 2. Unknown operon affinity/RNA polymerase binding probability to be estimated. The perturbation labels were corrected on August 28. This update does not change the perturbations that were made. The update only corrects the labels. Previously the modified parameters were misreported due to an error in the code. The first column of the table below indicates the modified RNA polymerase affinity/promoter affinities. The second and third columns indicate the prior (incorrect) label for each perturbation. Note: The purchasable perturbation data still uses the incorrect labels for the purchasing options and file names. Please use the table below to interpret the true perturbations.

Operon ID   Original perturbation file label (TU)   Original perturbation file label (Gene)
TU_003      TU_003                                  MG_006
TU_012      TU_011                                  MG_023
TU_028      TU_027                                  MG_047
TU_070      TU_069                                  MG_111
TU_184      TU_180                                  MG_272
TU_209      TU_203                                  MG_299
TU_245      TU_233                                  MG_330
TU_272      TU_260                                  MG_357
TU_306      TU_294                                  MG_407
TU_319      TU_307                                  MG_431

The 30 unknown parameters are associated with 10 mRNA-coding genes (3 parameters per gene). Of these 30 parameters, the values of 15 have been changed compared to the distributed base parameter values. Together the modified parameter values increase the average in silico cell cycle length by 33% from 9 to 12 h. The values of the other 15 parameters have not been modified. Participants will not be told which 15 parameters have been modified. Rather participants are challenged to learn this information. In case this turns out to be too difficult, the organizers will reveal the identity of the 15 modified parameters.

In summary, 15 parameters were modified:

3 promoter affinities
3 RNA half lives
9 kcats

The values of 13 of the 15 modified parameters were decreased. The values of 2 of the 15 modified parameters were increased. The decreases range from 2.8% to 93.4%. The increases range from 11.7% to 90.6%.

2.3.1 Promoter affinities

The promoter affinities are represented by the transcriptionUnitBindingProbabilities property of the edu.stanford.covert.cell.sim.process.Transcription class. This property is a 335×1 numeric array. Because the entries represent probabilities their values are dimensionless and sum to 1. Each row corresponds to a transcription unit (also known as nascent RNA, operon, polycistronic RNA).
The row labels and probabilities can be retrieved and modified by evaluating

    rna = sim.state('Rna');
    trn = sim.process('Transcription');
    ids = rna.wholeCellModelIDs(rna.nascentIndexs);       % row labels (transcription unit IDs)
    probsArr = trn.transcriptionUnitBindingProbabilities; % probabilities as a numeric array
    probsStruct = sim.getRnaPolTuBindingProbs();          % probabilities as a struct keyed by ID
    sim.applyRnaPolTuBindingProbs(probsStruct);           % write (possibly edited) values back

Note: getRnaPolTuBindingProbs will automatically renormalize transcriptionUnitBindingProbabilities.

2.3.2 RNA half lives

The RNA half lives (s) are represented by the halfLives property of the edu.stanford.covert.cell.sim.state.Rna class. The property is a 2428×1 numeric array. Each row corresponds to a distinct RNA species. Although there are only 335 transcription units which are cleaved (also known as processed) into 347 mature RNA species, the property has length 2428 because it represents all of the forms (nascent, processed, mature, bound, misfolded, damaged, aminoacylated, intergenic) of each RNA gene product. The unknown RNA half lives all correspond to mature forms. The mature RNA half lives and row labels can be retrieved and modified by evaluating

    rna = sim.state('Rna');
    ids = rna.wholeCellModelIDs(rna.matureIndexs);   % row labels (mature RNA IDs)
    halfLivesArr = rna.halfLives(rna.matureIndexs);  % mature half lives as a numeric array
    halfLivesStruct = sim.getRnaHalfLives();         % half lives as a struct keyed by ID
    sim.applyRnaHalfLives(halfLivesStruct);          % write (possibly edited) values back

Note: The half lives of the processed, mature, and aminoacylated forms of each RNA species are constrained to be equal. The nascent half lives are equal to the average half lives of the component mature RNA species. The bound forms are constrained to have infinite half lives. The misfolded, damaged, and intergenic forms are constrained to have half lives equal to zero. The applyRnaHalfLives method automatically satisfies these constraints by updating the half lives of the processed, aminoacylated, and nascent forms in addition to the mature form.
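The half-life constraints in the note above can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the model's implementation: the variable names are made up, the two-product transcription unit is hypothetical, and the two half-life values are simply borrowed from Table 1 for illustration. It does not call the whole-cell model at all.

```matlab
% Toy illustration of the half-life constraints (hypothetical names and grouping).
% Assume one transcription unit whose transcript yields two mature RNA species.
matureHalfLives = [209; 245];                 % s, illustrative values only

% Processed and aminoacylated forms are constrained to equal the mature form.
processedHalfLives     = matureHalfLives;
aminoacylatedHalfLives = matureHalfLives;

% The nascent half life is the average of the component mature half lives.
nascentHalfLife = mean(matureHalfLives);

% Bound forms have infinite half lives.
boundHalfLives = Inf(size(matureHalfLives));

% Misfolded, damaged, and intergenic forms have half lives equal to zero.
misfoldedHalfLives  = zeros(size(matureHalfLives));
damagedHalfLives    = zeros(size(matureHalfLives));
intergenicHalfLives = zeros(size(matureHalfLives));
```

In practice participants would not compute these by hand, since the text states that applyRnaHalfLives enforces the constraints; the sketch only shows why changing one mature half life also changes the nascent value.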
2.3.3 Metabolic reaction kcats

The metabolic reaction kcats (reactions/enzyme/s) are represented by the enzymeBounds property of the edu.stanford.covert.cell.sim.process.Metabolism class. The property is a 645×2 numeric array. Each row corresponds to a reaction. The first column corresponds to the reverse kcats. The second column corresponds to the forward kcats. The row labels and kcats can be retrieved and modified by evaluating

    met = sim.process('Metabolism');
    ids = met.reactionWholeCellModelIDs;               % row labels (reaction IDs)
    kcatsArr = met.enzymeBounds;                       % [reverse, forward] kcats as a numeric array
    kcatsStruct = sim.getMetabolicReactionKinetics();  % kcats as a struct keyed by reaction ID
    sim.applyMetabolicReactionKinetics(kcatsStruct);   % write (possibly edited) values back

Note: The unknown kcat parameter values all represent forward reactions.

Note: The kcats are redundantly represented by the fbaEnzymeBounds property of the same class. Use the applyMetabolicReactionKinetics method of the edu.stanford.covert.cell.sim.Simulation class to set kcat values. Do not edit the enzymeBounds or fbaEnzymeBounds properties directly.

2.4 In silico "experimental" data for parameter estimation

Participants can use the eight data types below to estimate the 15 modified parameters of the 30 parameters listed above:

Single-cell data
    Dynamics: rows correspond to individual cells, columns correspond to time points 0..N (s).
        Growth (g/s)
        Mass (g)
        Volume (L)
        Note: The arrays which store the growth, mass, and volume data are NaN padded in the following way. Let Aij be the single-cell measurement of cell i at time point j. Then Aij is NaN when in silico cell i divided before time point j.
    Event times: rows correspond to individual cells.
        Replication initiation time (s)
        Replication termination time (s)
        Cell cycle length (s)
        Note: NaN values indicate that the in silico cell didn't complete the event. For example the cell cycle length data for cell i will be NaN if cell i didn't divide within the 65,000 s simulation.
Metabolite concentrations (M): Time and population average concentrations. Rows correspond to metabolite species. Rows are labeled by sim.state('Metabolite').wholeCellModelIDs.
DNA-seq (DNA molecules/nt): Time and population average DNA copy number of each 100 nt region of the chromosome. Row 1 corresponds to bases 1..100, row 2 corresponds to bases 101..200, etc.
RNA-seq (transcripts/nt): Time and population average number of mapped RNA transcripts of each 100 nt region of the chromosome. Row 1 corresponds to bases 1..100, row 2 corresponds to bases 101..200, etc.
ChIP-seq (protein molecules/nt): Time and population average DNA-bound protein density of each 100 nt region of the chromosome. Row 1 corresponds to bases 1..100, row 2 corresponds to bases 101..200, etc. Columns correspond to mRNA-coding genes and are labeled by sim.gene.wholeCellModelIDs(sim.gene.mRNAIndexs).
RNA expression array (M): Time and population average RNA concentrations by gene. Rows correspond to genes. Rows are labeled by sim.gene.wholeCellModelIDs.
Protein expression array (M): Time and population average protein concentrations. Rows correspond to protein-coding genes. Rows are labeled by sim.gene.wholeCellModelIDs(sim.gene.mRNAIndexs).
Metabolic reaction fluxes (rxn/s/gDCW): Time and population average reaction fluxes. Rows are labeled by sim.state('MetabolicReaction').reactionWholeCellModelIDs.

2.4.1 Initial wild-type data provided "free" to participants

The organizers have performed the eight in silico experiments described above on a population of 32 in silico cells using the 15 modified parameter values (compared to the base parameter values). Participants can download this data for "free" here.

2.4.2 Perturbation data available for "purchase"

To reflect conditions that exist in actual scientific practice, each individual participant will also receive 5,000 credits which they can use to "buy" additional "experimental" data measured from perturbed in silico cells.
The organizers created the perturbation data sets by individually increasing and decreasing the values of the 30 unknown parameters (see Table 1) by a factor of 2 compared to the unknown values. Participants can purchase the eight data types described above for each perturbation. In total 480 datasets are available for purchase (30 parameters × 2 perturbations × 8 data types). Each data set represents the average of eight in silico cells. Each data set costs 100 credits. Participants can form teams to pool perturbation data. Participants must use the "purchase" form to obtain additional data. Participants will purchase in silico data individually, not as teams.

Perturbation parameter sets were calculated using the following code:

    % Gene, transcription unit, and reaction IDs of the 10 genes (one per row)
    genesTusRxns = {
        'MG_006' 'TU_003' 'Tmk'
        'MG_023' 'TU_011' 'Fba'
        'MG_047' 'TU_027' 'MetK'
        'MG_111' 'TU_069' 'Pgi'
        'MG_272' 'TU_180' 'AceE'
        'MG_299' 'TU_203' 'Pta'
        'MG_330' 'TU_233' 'CmkA2'
        'MG_357' 'TU_260' 'AckA'
        'MG_407' 'TU_294' 'Eno'
        'MG_431' 'TU_307' 'TpiA'
        };
    parameterTypes = {
        'PromAffinity'
        'HalfLife'
        'RxnKcat'
        };
    % File name label and multiplicative factor of each perturbation
    parameterVals = {
        '05X' 0.5
        '2X'  2.0
        };

    % Load the simulation and the true ("gold") parameter values.
    % baseDir must already be set to the directory containing gold-parameters.mat.
    sim = edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil.load();
    goldParameters = load(fullfile(baseDir, 'gold-parameters.mat'));
    sim.applyAllParameters(goldParameters);
    rnaPolTuBindingProbs = sim.getRnaPolTuBindingProbs();
    rnaHalfLives = sim.getRnaHalfLives();
    rxnKinetics = sim.getMetabolicReactionKinetics();

    for i = 1:size(genesTusRxns, 1)
        for j = 1:numel(parameterTypes)
            for k = 1:size(parameterVals, 1)
                % Reset to the gold parameters, then perturb a single parameter
                sim.applyAllParameters(goldParameters);
                switch parameterTypes{j}
                    case 'PromAffinity'
                        tuId = genesTusRxns{i, 2};
                        sim.applyRnaPolTuBindingProbs(struct(tuId, parameterVals{k, 2} * rnaPolTuBindingProbs.(tuId)));
                    case 'HalfLife'
                        tuId = genesTusRxns{i, 2};
                        sim.applyRnaHalfLives(struct(tuId, parameterVals{k, 2} * rnaHalfLives.(tuId)));
                    case 'RxnKcat'
                        rxnId = genesTusRxns{i, 3};
                        sim.applyMetabolicReactionKinetics(struct(rxnId, struct('for', parameterVals{k, 2} * rxnKinetics.(rxnId).for)));
                end

                % Save the perturbed parameter set
                parameters = sim.getAllParameters();
                paramFileName = fullfile(baseDir, sprintf('parameters_%s_%s_%s.mat', genesTusRxns{i, 1}, parameterTypes{j}, parameterVals{k, 1}));
                save(paramFileName, '-struct', 'parameters');
            end
        end
    end

Participants can simulate these same perturbations themselves (see Section 3.3). However, participants will not know the true parameter values on which to base the perturbations.

2.4.3 Data access agreement

Participants will be required to accept the Data access agreement to obtain the challenge data.

2.5 Registering for the challenge

Participants must join to obtain free access to the cloud computing resources and to submit solutions. Participants will be required to create or join a team to complete the registration. After registering, participants can change their team affiliation at any time.

Note: Creating multiple accounts or teams solely to circumvent limits on the "purchased" in silico data is grounds for disqualification.

2.5.1 Teams

Participants are encouraged to solve the challenge in teams. Teams can be of any size, and teams are responsible for managing their membership and distributing any prizes among members. Participants can use the forum to recruit team members. Parameter sets can be submitted by any team member. Only one participant per team needs to submit their write-up and code.

2.6 Submitting solutions, code, & write-ups

After registering, teams will be able to submit their solution in two parts: (1) the estimated parameter values and (2) a 1-2 page write-up describing the methods they used to solve the challenge, together with all code used to solve the challenge. When teams submit their write-up they will also be required to accept a statement of participation acknowledging that their submission represents their own work. The code and write-up should be submitted by one participant per team.

Participants must submit their estimated parameter values using the MATLAB script postCloudSimulation.
This script will (1) simulate eight in silico cells in the cloud using BitMill, (2) return to participants their average in silico measurements, (3) return to participants the distance between the true and estimated parameter values and the distance between the in silico data from the true and estimated parameter values, and (4) for debugging purposes, return to participants the standard outputs and errors of their simulation job concatenated into two files (output and error). Weekly the organizers will separately rank participants by these two distances, and post rankings on the leader board. The two distances will be combined to form an overall score which will be used to award prizes. Prizes will be awarded to teams based on the highest overall scoring set of parameter values from among all team members. See Sections 2.4, 2.7, and 3.3.3 for more information about how the in silico "experimental" data is calculated, how the distances are computed and scored, and how to run simulations in the cloud.

One participant per team must submit their write-up and code using the procedure described in the "submission tutorial" above. It is up to teams to coordinate their code and write-ups.

1. Create a new Synapse project for each submission.
2. Upload a brief write-up (1-2 pages in text, Word, or PDF format) to the project. Write-ups can be informal and may contain pseudo-code describing the algorithm(s) used, work flows, etc.
3. Upload any code used to solve the challenge to the project.
4. Submit your project to the challenge.

To encourage participants to build on the best performing methods the organizers will post the leading write-ups and code after each milestone. The organizers will also use the write-ups and code to report the most successful parameter estimation strategies in a summary paper to be published in Spring 2014.

Note: Sage Bionetworks reserves the right to disqualify submissions from any participant or team at its sole discretion.
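How individual results roll up to team results can be sketched as follows. This is an illustrative toy example, not the organizers' ranking code: the participant names, team names, and scores are invented, and the only rule it assumes is the one stated above, namely that a team is judged by the highest overall score among its members.

```matlab
% Toy data: hypothetical participants, their teams, and overall scores
% (higher is better, per "highest overall scoring set of parameter values").
participants = {'alice', 'bob', 'carol', 'dave'};
teams        = {'teamA', 'teamB', 'teamA', 'teamB'};
overallScore = [1.2; 0.4; 2.5; 0.9];

% Map each team name to an integer group index.
[teamNames, ~, teamIdx] = unique(teams);

% Best overall score per team = maximum over that team's members.
bestPerTeam = accumarray(teamIdx(:), overallScore, [], @max);

% Rank teams from best to worst.
[sortedScores, order] = sort(bestPerTeam, 'descend');
rankedTeams = teamNames(order);

for t = 1:numel(rankedTeams)
    fprintf('%d. %s (best score %.2f)\n', t, rankedTeams{t}, sortedScores(t));
end
```

Because any team member can submit parameter sets, taking the maximum over members means a team is never penalized for a teammate's weaker exploratory submission.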
2.7 Scoring

Submissions will be scored according to two criteria: (1) the distance between the true and estimated parameter values and (2) the distance between the in silico data from the true and estimated parameter values. These criteria will be combined to form a single overall score. The distance and scoring calculations are implemented by the MATLAB class edu.stanford.covert.cell.sim.util.DreamScoring. The test_calcParameterAndPredictionScoring method of the edu.stanford.covert.cell.sim.DreamCompetitionTest class illustrates exactly how simulations will be run and scored. See below for information about how the individual distances and overall score are calculated. See Section 2.4 for more information about how the in silico "experimental" data is calculated.

2.7.1 Parameter distance

The parameter distance will be calculated as follows. Let v_i^{true} and v_i^{est} be the true and estimated values of parameter i, taken over all of the unknown parameters (promoter affinities, RNA half-lives, and metabolic reaction kinetics). Then the parameter distance is given by where N is the number of parameters.

2.7.2 Prediction distance

The prediction distance will be calculated as follows. Let v_i^{true} and v_i^{est} be the in silico measurements obtained using the true and estimated parameter values, including the in silico single-cell, DNA-seq, RNA-seq, ChIP-seq, metabolomics, transcriptomics, and proteomics data. Then the prediction distance is given by where N is the number of in silico measurements and σ_i^{true} is the standard deviation of each in silico measurement under the true parameter values.

2.7.3 Overall score

The prediction and parameter distances will be combined as follows. Let p_{param} and p_{pred} be the p-values of the parameter and prediction distances obtained by empirically sampling all of the submissions.
Specifically, we form an empirical distribution of the parameter distance by (1) sampling the value of each parameter separately from all of the submissions with uniform probability and (2) calculating the distance between the true and randomly chosen parameter values. Similarly, we form an empirical distribution of the prediction distance by (1) sampling the value of each in silico measurement separately from all of the submissions with uniform probability and (2) calculating the distance between the true and randomly chosen in silico measurements. Then the overall score is given by

2.7.4 Most creative method

The organizers will award the prize for the most creative method for the second milestone based on judging your submitted write-ups and code.

2.7.5 Scoring example

A complete example of how to score submissions, including generating sample submissions, is available here.

2.8 Leader board

Weekly the organizers will separately rank individual participants by the two distances described above, and post rankings on the leader board. The leader board will not rank teams; however, prizes will be awarded based on teams.

3 Using the whole-cell model

Here we describe how to install and run the whole-cell model. The whole-cell model is described extensively in Data S1 of Karr et al., 2012. See Section 3.5 for additional information about the whole-cell model and its computational implementation.

3.1 Installing the whole-cell model and required software

The following sections describe how to install the whole-cell model software on Linux, Mac, and/or Windows. Alternatively, participants can use the whole-cell virtual machine which already contains all of the required software. See Section 3.1.4 for more information.

Note: Participants must join the challenge to obtain the accounts, passwords, and keys needed to install the whole-cell model and cloud computing software.
After creating a Synapse account and joining the challenge, you will receive an email from bitmill-support@numerate.com notifying you that we have created an Amazon IAM account, an Amazon S3 bucket, and a BitMill account for you. At this point you can complete the installation instructions below.

3.1.1 Linux/Unix
Note: the following instructions were developed using Linux Mint 14.

1. Install MATLAB ≥ 2009 with the following toolboxes: Bioinformatics, Curve fitting, Image processing, Optimization, Signal processing, Statistics.
   Note: The Statistics toolbox is the only toolbox required to simulate the model. The other toolboxes are needed to construct the Simulation object and run some of the model analysis.
2. Install packages
       sudo apt-get install git
       sudo apt-get install curl
       sudo apt-get install python python-setuptools python-pip
       sudo pip install python-magic
3. Install s3cmd (the URL is quoted so that the shell does not interpret the "&" characters)
       wget -O s3cmd-1.5.0-alpha1.tar.gz "http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha1/s3cmd-1.5.0-alpha1.tar.gz?r=&ts=1369441316&use_mirror=superb-dca2"
       tar -xvvf s3cmd-1.5.0-alpha1.tar.gz
       cd s3cmd-1.5.0-alpha1
       sudo python setup.py install
       cd ..
       rm -rf s3cmd-1.5.0-alpha1
       rm s3cmd-1.5.0-alpha1.tar.gz
4. Configure s3cmd by executing
       python s3cmd --configure s3://<your_bucket_name>
   - Download your access and secret keys here
   - Enter the access and secret keys provided at registration (you will receive an email from bitmill-support@numerate.com; it might take some time)
   - Leave "encryption password" and "Path to GPG program" blank
   - Set "Use HTTPS protocol" to "Yes"
   - Leave "HTTP Proxy server name" blank
   - Yes, test access. This should result in a message "... Success"
   - Yes, save settings
5. Install bitmill-bash
       git clone https://github.com/Numerate/bitmill-bash.git ~/bitmill-bash
       cd ~/bitmill-bash
       rm bitmill.conf
       s3cmd get s3://<your_bucket_name>/bitmill.conf bitmill.conf   # replace <your_bucket_name> with your bucket's name
       ./gen_all_scripts.sh
6. Install and configure whole-cell model code
       git clone -b parameter-estimation-DREAM-challenge-2013 https://github.com/CovertLab/WholeCell.git ~/WholeCell
       matlab >> cd /path/to/WholeCell
       matlab >> install();
7. Configure path
   Bash shells: append to ~/.bashrc. Create file if necessary.
       export PATH=$PATH:~/bitmill-bash:~/bitmill-bash/dream
   csh shells: append to ~/.cshrc. Create file if necessary.
       set path = ($path ~/bitmill-bash ~/bitmill-bash/dream)
   tcsh shells: append to ~/.tcshrc. Create file if necessary.
       set path = ($path ~/bitmill-bash ~/bitmill-bash/dream)

3.1.2 Mac
1. Install MATLAB ≥ 2009 with the following toolboxes: Bioinformatics, Curve fitting, Image processing, Optimization, Signal processing, Statistics.
   Note: The Statistics toolbox is the only toolbox required to simulate the model. The other toolboxes are needed to construct the Simulation object and run some of the model analysis.
2. Download git client and install
3. Install python-setuptools
       curl -o setuptools-0.6c11-py2.7.egg "https://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg#md5=fe1f997bc722265116870bc7919059ea"
       sudo sh setuptools-0.6c11-py2.7.egg --prefix=~
       rm setuptools-0.6c11-py2.7.egg
4. Install pip
       curl -O http://pypi.python.org/packages/source/p/pip/pip-1.3.1.tar.gz
       tar xzf pip-1.3.1.tar.gz
       cd pip-1.3.1
       python setup.py install
       cd ..
       rm -rf pip-1.3.1
       rm pip-1.3.1.tar.gz
5. Install python-magic
       sudo pip install python-magic
6. Install s3cmd (if curl does not work, try wget instead, or download the file directly by pasting the link into your browser)
       curl -L -o s3cmd-1.5.0-alpha1.tar.gz "http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha1/s3cmd-1.5.0-alpha1.tar.gz?r=&ts=1369441316&use_mirror=superb-dca2"
       tar -xvvf s3cmd-1.5.0-alpha1.tar.gz
       cd s3cmd-1.5.0-alpha1
       sudo python setup.py install
       cd ..
       rm -rf s3cmd-1.5.0-alpha1
       rm s3cmd-1.5.0-alpha1.tar.gz
7. Configure s3cmd by executing
       python s3cmd --configure s3://<your_bucket_name>
   - Download your access and secret keys here
   - Enter the access and secret keys provided at registration (you will receive an email from bitmill-support@numerate.com; it might take some time)
   - Leave "encryption password" and "Path to GPG program" blank
   - Set "Use HTTPS protocol" to "Yes"
   - Leave "HTTP Proxy server name" blank
   - Yes, test access. This should result in a message "... Success"
   - Yes, save settings
8. Install bitmill-bash
   - Visit https://github.com/Numerate/bitmill-bash
   - Click "Clone in Mac"
   - After the repository downloads:
       cd ~/bitmill-bash
       rm bitmill.conf
       s3cmd get s3://<your_bucket_name>/bitmill.conf bitmill.conf   # replace <your_bucket_name> with your bucket's name
       ./gen_all_scripts.sh
9. Download whole-cell model code and install
   - Visit https://github.com/CovertLab/WholeCell
   - Click "Clone in Mac"
   - After the repository downloads, click on the repository in the GitHub client
   - Switch branch to parameter-estimation-DREAM-challenge-2013
   - Configure whole-cell software by following the on-screen instructions
       matlab >> cd /path/to/WholeCell
       matlab >> install();
10. Configure path
   Bash shells: Using an editor (Emacs or vi) append to ~/.bash_profile. Create file if necessary.
       export PATH=$PATH:~/bitmill-bash:~/bitmill-bash/dream
   csh shells: Using an editor append to ~/.cshrc. Create file if necessary.
       set path = ($path ~/bitmill-bash ~/bitmill-bash/dream)
   tcsh shells: Using an editor append to ~/.tcshrc. Create file if necessary.
       set path = ($path ~/bitmill-bash ~/bitmill-bash/dream)

3.1.3 Windows
1. Install MATLAB ≥ 2009 with the following toolboxes: Bioinformatics, Curve fitting, Image processing, Optimization, Signal processing, Statistics.
   Note: The Statistics toolbox is the only toolbox required to simulate the model. The other toolboxes are needed to construct the Simulation object and run some of the model analysis.
2. Download git client and install
3. Download Cygwin and install, following the on-screen instructions. When prompted, select the packages below. Note: ignore error cygutils.sh exit code 127.
   - Devel: git (1.7.9-1)
   - Editors: nano (2.2.6-1)
   - Net: curl (7.29.0-1)
   - Python: python (2.7.3-1)
   - Python: python-setuptools (0.6.34-1)
   - Web: wget (1.13.4-1)
4. Install s3cmd. Open Cygwin shell and execute (paths containing spaces and the URL are quoted):
       cd "/cygdrive/c/Program Files/"
       wget -O s3cmd-1.5.0-alpha1.tar.gz "http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-alpha1/s3cmd-1.5.0-alpha1.tar.gz?r=&ts=1369895321&use_mirror=superb-dca2"
       tar -xvvf s3cmd-1.5.0-alpha1.tar.gz
       rm s3cmd-1.5.0-alpha1.tar.gz
       cd "/cygdrive/c/Program Files/s3cmd-1.5.0-alpha1/"
       python setup.py install
5. Configure s3cmd. Open Cygwin shell and execute
       python s3cmd --configure s3://<your_bucket_name>
   - Download your access and secret keys here
   - Enter the access and secret keys provided at registration (you will receive an email from bitmill-support@numerate.com; it might take some time)
   - Leave "encryption password" and "Path to GPG program" blank
   - Set "Use HTTPS protocol" to "Yes"
   - Leave "HTTP Proxy server name" blank
   - Yes, test access. This should result in a message "... Success"
   - Yes, save settings
6. Install bitmill-bash. Open Cygwin shell and execute:
       git clone https://github.com/Numerate/bitmill-bash.git ~/bitmill-bash
       cd ~/bitmill-bash
       rm bitmill.conf
       s3cmd get s3://<your_bucket_name>/bitmill.conf bitmill.conf   # replace <your_bucket_name> with your bucket's name
       ./gen_all_scripts.sh
7. Download whole-cell model code and install
   - Visit https://github.com/CovertLab/WholeCell
   - Click "Clone in Windows"
   - After the repository downloads, click on the repository in the GitHub client
   - Switch branch to parameter-estimation-DREAM-challenge-2013
   - Configure whole-cell software by following the on-screen instructions
       matlab >> cd c:\path\to\WholeCell
       matlab >> install();
8. Configure Windows environment variables
   - Open control panel → System → Advanced system settings → Environment variables
   - Select "PATH" in the "System variables" section. Click "Edit...". Then append to the value
       ;c:\cygwin\bin
   - Click "New..." and set Variable name = "CYGWIN", Variable value = "nodosfilewarning"
   - Add the bitmill-bash directories to the Cygwin shell path by appending the following line to your shell startup file (as in Section 3.1.1)
       export PATH=$PATH:~/bitmill-bash:~/bitmill-bash/dream

3.1.4 Whole-cell virtual machine
1. Install VirtualBox
2. Download, import, and run the whole-cell virtual machine. See instructions for more information.
3. Configure s3cmd by executing
       python s3cmd --configure s3://<your_bucket_name>
   - Download your access and secret keys here
   - Enter the access and secret keys provided at registration (you will receive an email from bitmill-support@numerate.com; it might take some time)
   - Leave "encryption password" and "Path to GPG program" blank
   - Set "Use HTTPS protocol" to "Yes"
   - Leave "HTTP Proxy server name" blank
   - Yes, test access. This should result in a message "... Success"
   - Yes, save settings
4. Configure bitmill-bash
       cd ~/bitmill-bash
       git pull
       rm bitmill.conf
       s3cmd get s3://<your_bucket_name>/bitmill.conf bitmill.conf   # replace <your_bucket_name> with your bucket's name
       ./gen_all_scripts.sh
5. Update whole-cell software
       cd ~/WholeCell
       git pull

3.2 Instantiating simulations and setting parameter values
Follow the four steps below to instantiate the Simulation class and set the values of the model's parameters.

1. Set up MATLAB warnings and path
       setWarnings();
       setPath();
2. Instantiate the Simulation class with the base parameter values
       sim = edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil.load();
3. Optionally, set simulation options such as the simulation length (s) and the random number generator's seed.
   Get option values
       sim.getOptions();
   Set option values
       sim.applyOptions('lengthSec', 10, 'seed', 1);
4. Optionally, modify the model's parameter values. See Section 2.2 for more information about the model's parameters.
   Get current parameter values
       sim.getRnaPolTuBindingProbs();
       sim.getRnaHalfLives();
       sim.getMetabolicReactionKinetics();
   Modify parameter values
       sim.applyRnaPolTuBindingProbs(struct(...
           'TU_001', 0.0015, ...
           'TU_002', 0.0025 ...
           ))
       sim.applyRnaHalfLives(struct(...
           'TU_001', 146.9388, ...
           'TU_002', 152.9412 ...
           ))
       sim.applyMetabolicReactionKinetics(struct(...
           'AtpA', struct(...
               'for', 1, ...
               'rev', -1 ...
               ) ...
           ));
   Get a struct containing the values of all of the simulation's parameters
       parameterVals = sim.getAllParameters();

3.3 Simulating the model
After instantiating the model and setting the desired option and parameter values, the model can be run either locally on your own machine (using either MATLAB or the free MATLAB Component Runtime) or remotely on the cloud using BitMill, provided by Numerate. All three methods will execute the same code and conduct the same in silico "experiments". See Section 2.4 for more information about the in silico "experimental" data.
The in silico perturbation data available for "purchase" was generated using the same scripts outlined below. The only difference between the perturbation simulations and the simulations that participants will run is that the perturbation experiments used the true parameter values (and perturbations) chosen by the organizers, which are unknown to the participants.
Note: Simulations run on BitMill will be 65,000 s long (Simulation.lengthSec = 65000).

3.3.1 Simulating the model locally using MATLAB
Execute the following code to (1) simulate and measure individual in silico cells and (2) average the in silico "experimental" measurements over a population of individual cells.
Note: parameterVals = sim.getAllParameters() is a struct created by the final step of the previous section (Section 3.2).

Simulate and measure individual in silico cells
    simulateHighthroughputExperiments(...
        'seed', 1, ...
        'parameterVals', parameterVals, ...
        'simPath', 'output/sim-1.mat' ...
        );
Calculate population averages
    averageHighthroughputExperiments(...
        'simPathPattern', 'output/sim-*.mat' ...
        );
Note: Each simulation will require approximately 24-48 core-hours.

3.3.2 Simulating the model locally using the free MATLAB Component Runtime (MCR)
1. Download and install MCR
2. Compile code (preferred method) or download the 2012b MCR Linux binaries
       cd /path/to/WholeCell/
       ./build.sh simulateHighthroughputExperiments
       ./build.sh averageHighthroughputExperiments
3. Edit the MCR shell script (bin/averageHighthroughputExperiments/run_averageHighthroughputExperiments.sh) to prevent globbing
   - Add set -f after echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
   - Add set +f at end of file
4. Simulate individual cells (parameterValsPath may point to either a .mat or an .xml file)
       for i in 1 2; do
           bin/simulateHighthroughputExperiments/run_simulateHighthroughputExperiments.sh /path/to/runtime seed $i parameterValsPath /path/to/parameterValsPath.mat simPath output/sim-$i.mat
       done
   An example XML file is available here.
5. Average in silico experiments from multiple individual cells
       bin/averageHighthroughputExperiments/run_averageHighthroughputExperiments.sh /path/to/runtime simPathPattern 'output/sim-*.mat' avgValsPath output/sim-average.mat

3.3.3 Simulating the model remotely on the cloud using BitMill
After registering for the competition (see Section 2.5), participants will receive an email from bitmill-support@numerate.com with the login information for their BitMill account. After installing and configuring the bitmill-bash software (see Section 3.1), participants can use the commands below to submit candidate solutions to BitMill. These commands will trigger BitMill to execute the same code outlined in the previous section (3.3.2) in the cloud, and return to participants the same in silico "experiments" from a population of eight in silico cells. Users will receive an email from BitMill when the simulation results are available. Results will be stored in the participant's Amazon S3 bucket. The commands also enable participants to download, from their S3 bucket, the parameter and prediction distances between their parameter values and the true parameter values.
Note: Participants will only be able to perform five in silico experiments at a time in the cloud. Because simulations take approximately 1-2 days, we anticipate that participants will be able to perform approximately 3 in silico experiments per week.
Note: parameterVals = sim.getAllParameters() is a struct created by the final step of Section 3.2.

3.3.3.1 Running simulations
    simName = '<choose a short simulation name>';
    bucketUrl = 's3://<your bucket>';
    [jobId, status, errMsg] = postCloudSimulation(...
        'simName', simName, ...
        'bucketUrl', bucketUrl, ...
        'parameterVals', parameterVals ...
        );
Note: Simulations run on BitMill will be 65,000 s long (Simulation.lengthSec = 65000).
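As a rough illustration of the submission call above, the following sketch loops over several candidate parameter structs and submits no more than five of them, in line with the limit of five simultaneous in silico experiments. It is a sketch under stated assumptions, not part of the challenge software: the variable names candidates, maxSimultaneous, and jobIds and the 'cand-%02d' naming pattern are invented for this example, and it assumes parameterVals already exists as described in Section 3.2. Only postCloudSimulation and its three named arguments are taken from this description.

```matlab
% Sketch only: submit up to five candidate parameter sets to BitMill.
% Assumes setWarnings()/setPath() have been run and that parameterVals was
% created with sim.getAllParameters() (Section 3.2).
% `candidates`, `maxSimultaneous`, `jobIds`, and the naming pattern are
% illustrative names, not part of the challenge software.
bucketUrl = 's3://<your bucket>';
maxSimultaneous = 5;            % limit on simultaneous cloud simulations

% One struct per candidate solution; here just the current parameter values.
candidates = {parameterVals};

nSubmit = min(numel(candidates), maxSimultaneous);
jobIds = cell(nSubmit, 1);
for k = 1:nSubmit
    simName = sprintf('cand-%02d', k);
    [jobId, status, errMsg] = postCloudSimulation(...
        'simName', simName, ...
        'bucketUrl', bucketUrl, ...
        'parameterVals', candidates{k} ...
        );
    jobIds{k} = jobId;          % keep the id for later status checks
    % The contents of status and errMsg are not documented here,
    % so they are only displayed.
    disp(status);
    disp(errMsg);
end
```

Keeping the job identifiers in a cell array avoids assuming whether they are numbers or strings. Candidates beyond the first five would have to wait until earlier simulations finish.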
3.3.3.2 Checking simulation statuses
    getCloudSimulationStatus()
    getCloudSimulationStatus(jobId)

3.3.3.3 Canceling simulations
    cancelCloudSimulation(jobId)

3.3.3.4 Retrieving simulation results stored in Amazon S3
    downloadCloudSimulationResults(...
        'simName', simName, ...
        'bucketUrl', bucketUrl, ...
        'localFolder', 'output' ...
        );
This will download four files:
- <simName>.predictions.mat: Struct containing the average in silico experimental observations from a population of eight cells
- <simName>.distances.mat: Struct containing the distances from the gold-standard parameter values and predicted in silico experimental data
- <simName>.out: Concatenation of the standard output of the individual simulations
- <simName>.err: Concatenation of the standard error of the individual simulations

3.4 Simulating individual sub-models and subsets of sub-models
In addition to simulating the entire model, participants can simulate individual sub-models or groups of sub-models. Below we provide several illustrative examples.

3.4.1 Simulating the metabolic sub-model
    %set warnings and MATLAB path
    setWarnings();
    setPath();

    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %optionally, set simulation options
    sim.applyOptions('seed', 1);

    %optionally, set simulation parameter
    sim.applyMetabolicReactionKinetics(struct(...
        'AtpA', struct(...
            'for', 1, ...
            'rev', -1 ...
            ) ...
        ));

    %get handle to metabolism sub-model
    met = sim.process('Metabolism');

    %optionally, sample initial conditions
    sim.initializeState();

    %simulate dynamics for 100s
    lengthSec = 100;
    for i = 1:lengthSec
        met.evolveState();
    end
See Section 3.2 for more information about how to set the values of the model's parameters.
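The sub-model pattern above can be used for quick local experiments before spending cloud simulations. The sketch below runs the metabolic sub-model for a few trial values of one kinetic parameter and records the growth reported by the MetabolicReaction state, which this description reads as mr.growth in Section 3.4.3. The trial values in forVals and the names forVals and finalGrowth are arbitrary placeholders chosen for illustration. Reloading the simulation object for each trial is a cautious assumption, so that every run starts from the base parameter values.

```matlab
% Sketch only: compare metabolic sub-model growth for a few trial values of
% one kinetic parameter. All model calls are the ones shown in Sections
% 3.4.1 and 3.4.3; forVals holds arbitrary placeholder values.
setWarnings();
setPath();
import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;

forVals = [0.5 1 2];            % hypothetical trial values for AtpA 'for'
lengthSec = 100;                % simulate 100 s per trial
finalGrowth = zeros(size(forVals));

for k = 1:numel(forVals)
    % reload so that each trial starts from the base parameter values
    sim = CachedSimulationObjectUtil.load();
    sim.applyOptions('seed', 1);
    sim.applyMetabolicReactionKinetics(struct(...
        'AtpA', struct(...
            'for', forVals(k), ...
            'rev', -1 ...
            ) ...
        ));

    met = sim.process('Metabolism');
    mr = sim.state('MetabolicReaction');
    sim.initializeState();

    for t = 1:lengthSec
        met.evolveState();
    end
    finalGrowth(k) = mr.growth; % growth after the last simulated second
end

% one row per trial: [parameter value, final growth]
disp([forVals(:) finalGrowth(:)]);
```

Sub-model runs like this take far less time than full simulations, but they omit the other sub-models, so the results are best treated as a screening step rather than a prediction of whole-cell behavior.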
3.4.2 Simulating the metabolism and transcription sub-models
    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %get state and sub-model handles
    time = sim.state('Time');
    met = sim.process('Metabolism');
    transcription = sim.process('Transcription');

    %simulate
    lengthSec = 100;
    for i = 1:lengthSec
        time.values = i;

        met.copyFromState();
        met.evolveState();
        met.copyToState();

        transcription.copyFromState();
        transcription.evolveState();
        transcription.copyToState();
    end

3.4.3 Simulating the metabolism sub-model and logging predictions
    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %get handle to metabolism sub-model and metabolic reaction state
    met = sim.process('Metabolism');
    mr = sim.state('MetabolicReaction');

    %simulate dynamics for 100s
    lengthSec = 100;
    growth = zeros(lengthSec, 1);
    for i = 1:lengthSec
        met.evolveState();
        growth(i) = mr.growth;
    end

3.4.4 Simulating the metabolic sub-model and setting enzyme copy numbers
Note: in general, protein copy numbers are proportional to the promoter affinity times the RNA half-life.
    %set warnings and MATLAB path
    setWarnings();
    setPath();

    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %optionally, set simulation options
    sim.applyOptions('seed', 1);

    %optionally, set simulation parameter
    sim.applyMetabolicReactionKinetics(struct(...
        'AtpA', struct(...
            'for', 1, ...
            'rev', -1 ...
            ) ...
        ));

    %get handle to metabolism sub-model
    met = sim.process('Metabolism');

    %optionally, sample initial conditions
    sim.initializeState();

    %set enzyme copy numbers (in general, protein copy numbers are
    %proportional to the promoter affinity times the RNA half-life)
    met.enzymes(strcmp(met.enzymeWholeCellModelIDs, 'MG_006_DIMER')) = 10;

    %simulate dynamics for 100s
    lengthSec = 100;
    for i = 1:lengthSec
        met.evolveState();
    end

3.4.5 Simulating the metabolism sub-model and recording all predictions
    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;
    import edu.stanford.covert.cell.sim.util.DiskLogger;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %get handles to time state and metabolism sub-model
    time = sim.state('Time');
    met = sim.process('Metabolism');

    %set parameters
    sim.applyOptions('lengthSec', 100);

    %initialize
    sim.initializeState();

    %initialize logger
    outPath = 'output/ht-data-test';
    logFreqSec = 10;
    logger = DiskLogger(outPath, logFreqSec);
    logger.addMetadata(struct(...
        'shortDescription', '', ...
        'longDescription', '', ...
        'email', '', ...
        'firstName', '', ...
        'lastName', '', ...
        'affiliation', '', ...
        'knowledgeBaseWID', '', ...
        'revision', '', ...
        'differencesFromRevision', '', ...
        'userName', '', ...
        'hostName', '', ...
        'ipAddress', '' ...
        ));
    logger.initialize(sim);

    %simulate dynamics
    for t = 1:sim.lengthSec
        %set time
        time.values = t;

        %calculate metabolism
        met.evolveState();

        %log predictions
        logger.append(sim);
    end

    %finalize logger
    logger.finalize(sim);

3.4.6 Simulating the metabolism sub-model and recording high-throughput in silico data
    %import classes
    import edu.stanford.covert.cell.sim.util.CachedSimulationObjectUtil;
    import edu.stanford.covert.cell.sim.util.HighthroughputExperimentsLogger;

    %load simulation object
    sim = CachedSimulationObjectUtil.load();

    %get handles to time state and metabolism sub-model
    time = sim.state('Time');
    met = sim.process('Metabolism');

    %set parameters
    sim.applyOptions('lengthSec', 100);

    %initialize
    sim.initializeState();

    %initialize logger
    outPath = 'output/ht-data-test.mat';
    logger = HighthroughputExperimentsLogger(outPath);
    logger.initialize(sim);

    %simulate dynamics
    for t = 1:sim.lengthSec
        %set time
        time.values = t;

        %calculate metabolism
        met.evolveState();

        %log predictions
        logger.append(sim);
    end

    %finalize logger
    logger.finalize(sim);

3.5 Further information
The whole-cell model is described extensively in Data S1 of Karr et al., 2012. The following references provide additional information about the whole-cell model:
- Read the model user guide
- Browse the model documentation (doxygen, m2html)
- View a table listing the model's parameters
- View a table listing the metabolic reactions
- View the flux-balance analysis (FBA) metabolic model in several linear programming formats, which can be evaluated with programs such as the free lpsolve: Cplex .lpt, Gurobi .lp, Lindo .ltx, Lindo .lp, LPFML .xml, MathProg .mod, SBML .sbml, Xpress .lpx, ZIMPL .zpl
- View a table listing the model's states, including their types and sizes
- View a table listing the row and column labels of the model's states
- Browse or download the M. genitalium whole-cell knowledge base, WholeCellKB
- Browse the model frequently asked questions
- Browse the challenge forum
Still have a question?
Please post questions to the challenge organizers and other participants via the forum.

4 Questions? Comments?
First, please browse the forum. Still have a question? Please post questions to the challenge organizers and other participants via the forum.

5 Credits
The challenge was conceived by Markus Covert, Jonathan Karr, Pablo Meyer, and Gustavo Stolovitzky. Christian Basile, Po-Ru Loh, and Alejandro Villaverde provided valuable feedback on the challenge design. Jonathan Karr modified the model and provided the simulated data for the challenge. Christian Basile, Jonathan Karr, Kahn Rhrissorrakrai, and Pablo Meyer tested the model. Brandon Allgood, Jonathan Karr, Mike Kellen, Pablo Meyer, Simon Wilkinson, and Jessen Yu implemented the computational infrastructure provided to the participants. Brian Bot, Jonathan Karr, and Pablo Meyer developed the credit system. Jonathan Karr and Pablo Meyer developed the scoring methodology and the leader board and curated the challenge. The computational infrastructure was provided free of charge to participants by Numerate and Sage Bionetworks.

The organizers thank the following individuals for their help organizing the competition:
- Brandon Allgood, Numerate
- Christian Basile, Urban Green Energy
- Brian Bot, Sage Bionetworks
- Deepak Chandran, Autodesk
- Thomas Cokelaer, EMBL-EBI
- Markus Covert, Stanford University
- Bruce Hoff, Sage Bionetworks
- Jay Hodgson, Sage Bionetworks
- Jonathan Karr, Stanford University
- Mike Kellen, Sage Bionetworks
- Po-Ru Loh, MIT
- Thea Norman, Sage Bionetworks
- Kahn Rhrissorrakrai, IBM
- Pablo Meyer, IBM
- Julio Saez-Rodriguez, EMBL-EBI
- Gustavo Stolovitzky, IBM
- Alejandro Villaverde, Consejo Superior de Investigaciones Cientificas
- Simon Wilkinson, Numerate
- Jessen Yu, Numerate

6 References
- Karr et al. (2012) A whole-cell computational model predicts phenotype from genotype. Cell, 150, 389-401. PubMed
- Karr et al. (2013) WholeCellKB: model organism databases for comprehensive whole-cell models. Nucleic Acids Res, 41, D787-92. PubMed
- Karr et al. (2015) Summary of the DREAM8 Parameter Estimation Challenge: Toward Parameter Identification for Whole-Cell Models. PLoS Comput Biol, 11, e1004096. PLoS

7 Links & downloads
Challenge
- Webinar slides and video
- Sign up page
- .m file containing MATLAB code in the above challenge description
- Initial "experimental" data
- "Experimental" data "purchase" form
- Solution submission instructions (See Section 2.6)
- Write-up submission page
- Leader board
- Forum
Whole-cell model
- Description
- Metabolic reactions table
- FBA metabolic model in linear programming formats: Cplex .lpt, Gurobi .lp, Lindo .ltx, Lindo .lp, LPFML .xml, MathProg .mod, SBML .sbml, Xpress .lpx, ZIMPL .zpl
- Table of all model parameters
- State properties table
- State property row and column IDs table
- Code
- User guide
- Documentation: doxygen, m2html
- Whole-cell virtual machine and instructions
- 2012b MCR Linux binaries
- WholeCellKB: M. genitalium knowledge base
- WholeCellViz: Visualization software
- Frequently asked questions
Software
- BitMill cloud computing service
- Cygwin
- Git: Mac, Linux, Windows
- Linux Mint
- lpsolve
- MATLAB documentation
- MATLAB Component Runtime (MCR)
- s3cmd
- VirtualBox

AGREE TO DATA ACCESS RESTRICTIONS
Challenge forum
Challenge frequently asked questions
MatlabCodeFromChallengeDescription.m
Model code
Modified parameters.xlsx
Perturbation data (unrestricted access)
Resources
Software
Submissions
Unknown parameters.xlsx
gold-standard.predictions.mat
gold.parameters.mat
participant data
participant data (UPDATED)