ICGC-TCGA DREAM Mutation Calling challenge

Created By Kyle Ellrott kellrott
The ICGC-TCGA DREAM Genomic Mutation Calling Challenge (herein, The Challenge) is an international effort to improve standard methods for identifying cancer-associated mutations and rearrangements in whole-genome sequencing (WGS) data. Leaders of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) cancer genomics projects are joining with Sage Bionetworks and IBM-DREAM to initiate this innovative open crowd-sourced Challenge [1-3]. The goal of this somatic mutation calling (SMC) Challenge is to identify the most accurate mutation detection algorithms and establish the state of the art. The algorithms in this Challenge must take as input WGS data from tumour and normal samples and output mutation calls associated with cancer.

Registration is open! Should you have any questions related to the Challenge, please visit our ICGC-TCGA-DREAM SMC Challenge Community Forum. You may review the DREAM8.5 Challenge rules here: syn2295117

Background

Cancer is a disease of the genome [4], caused by disruptions in a person's DNA that alter specific gene functions in a population of cells, and specifically, their growth. As the population of cancer cells grows, it is believed the genetic content of the population is further altered by DNA breakages. A metastatic cancer, one that has spread to other parts of the body away from its origin, has evolved from a single cell having a specific DNA mutation or a set of mutations. Our understanding of the origin and progression of cancer and its mechanisms is still at an early stage. The advancement of cancer research depends mainly on our ability to read the DNA of cancer cells [5-7]. As genome sequencing technologies have evolved, next-generation sequencing (NGS) instruments have become able to determine millions of pieces of DNA sequence, or reads, which collectively span billions of single-letter genome locations. Today, DNA sequencers can produce terabytes of data in just a few hours.
Therefore, while the crux of the problem has shifted from the biologist to the computer scientist, the picture that explains cancer genomes remains elusive in many ways. Shattering of chromosomes has recently been associated with cancer [8], and complex chromosomal translocations [15] are being characterized in cancer research labs around the world in large cohorts of over 300 patients. Nevertheless, the ability to precisely localize a genomic breakage and resolve its association with cancer remains a challenge. In summary, the study of genomic alterations that drive cancer mutations has accelerated at an unprecedented rate with the advent of next-generation sequencing and related projects around the world. A genomics revolution now aims to systematically characterize every somatic variation in every tumor by sampling large cohorts. While somatic variations can be focused point mutations that create single nucleotide variations (SNVs), they can also be mid-scale copy-number alterations (CNVs) and large-scale intra- or inter-chromosomal rearrangements, i.e., structural variations (SVs) [5-7]. See our list of references below to learn more about this challenging and interesting field.

Motivation

By relating particular genomic variations in a patient's tumor to targetable genes, new drugs and treatments tailored to each patient will be developed; this is the essence and purpose of personalized medicine. However, accurately identifying these variants and rearrangements using NGS data remains an open problem, as recent studies indicate that the predictions of existing approaches overlap by only about 20%. As the solution to the cancer genome now hides behind the analysis of terabytes of sequencing data, there is an urgent need for reliable data mining and classification methods that can bring NGS into routine clinical practice. Therefore, we believe this Challenge is an excellent setting for bringing researchers around the world to focus on this particular cancer research problem.
In fact, we anticipate that the winning Challenge algorithms will become the standard off-the-shelf predictive approaches for the analysis of tens of thousands of cancer genomes sequenced over the next 5 years across many hospitals and bioinformatics labs worldwide.

Data

The ICGC-TCGA DREAM Sequence Analysis Challenge will use real data: 10 tumor/normal matched genomes from prostate and pancreatic cancers, 5 from each cancer type. Five simulated-sequencing tumors of increasing complexity will be released to provide easy "training" datasets, to help bring in participants from outside the field of cancer genomics. Each sample will reflect a treatment-naive primary tumor sequenced to ~50x coverage and a paired germ-line sequenced to ~30x coverage. See the Data Description page for a thorough description of the data used in this Challenge.

Data distribution through ICGC is governed by a set of procedures and principles designed to meet legal, ethical and regulatory standards for the sharing of human data. Access to raw data will be granted to ICGC DACO-approved participants only. All primary and validation data will be publicly available, even after the contest is completed, creating a gold-standard community resource. Results will be shared without restriction, but raw data will remain under the restrictions and regulations of the ICGC-DACO.

Challenges

The ICGC-TCGA DREAM Sequence Analysis Challenge will be open to all and will aggregate predictive models and source code as a community resource. We believe that the best approach to developing robust and accurate mutation predictions is to enable an open, diverse community where data access is simple and people are incentivized to share. The main advantage of such open challenges lies in encouraging a diversity of analytical approaches from skilled analysts across scientific disciplines, to solve inherently difficult but important questions together.
Intel-10 SNV Sub-Challenge

Single Nucleotide Variants (SNVs) are alterations of a single base within the DNA code, and often cause sensitivity to specific drugs. A typical cancer may contain tens of thousands of SNVs. SNV detection is more reproducible than SV detection, showing ~50-80% overlap in a set of published studies.

ITM1-10 SV Sub-Challenge

Structural Variations (SVs) are duplications, deletions and rearrangements of medium-size to large segments (>100 bp) of the genome. These variations can include one or several breakpoints, at which an adjacency or junction is defined, explaining the breakages that a normal genome would have to experience to become a cancer one. Such genomic rearrangements are often described as being the primary cause of cancer. Over the past few decades, clinical cytogeneticists have been able to link specific chromosome breakpoints to clinically defined cancers, including subtypes of leukemias, lymphomas, and sarcomas. Breakpoint detection in cancer genomes is, anecdotally, exceedingly hard. Our (unpublished) pilot study shows ~30% overlap in predictions across multiple calling methods.

Challenge Structure

Assessment

A simultaneous comparison to state-of-the-art simulation approaches will take place. The Challenge will run an unbiased validation: predictions will be experimentally tested after all Challenge entries have been submitted. Validation will be performed by the Boutros Lab at the Ontario Institute for Cancer Research (OICR), which is not entering predictive models into the Challenge. After the Challenge closes, at least 5,000 candidate somatic DNA mutations will be selected for validation by the Challenge organizers. Selection will be done using a public algorithm. Validation will use an independent technology that sequences the mutation and ~75 bp on either side of it. All mutations will be validated in all samples, allowing assessment of both false-negative and false-positive rates.
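Because every selected mutation is validated in every sample, each call made by a participant can be counted as a true positive, false positive, false negative or true negative. The following is a minimal sketch of how such counts translate into sensitivity, specificity and balanced accuracy under their standard definitions. The function name, the argument names and the representation of a site as a (chromosome, position) tuple are illustrative assumptions; this is not the Challenge's scoring code.

```python
def score_calls(predicted, validated_true, validated_sites):
    """Score one caller against experimentally validated sites.

    All three arguments are sets of (chromosome, position) tuples.
    The names and the site representation are hypothetical, chosen
    only for this illustration.
    """
    # Only sites that were actually validated can be scored.
    scored = predicted & validated_sites

    tp = len(scored & validated_true)                    # called and confirmed
    fp = len(scored - validated_true)                    # called but rejected
    fn = len(validated_true - scored)                    # confirmed but missed
    tn = len(validated_sites - validated_true - scored)  # rejected and not called

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Toy example: five validated sites, three of them confirmed as somatic.
validated_sites = {("1", 100), ("1", 200), ("2", 300), ("2", 400), ("3", 500)}
validated_true = {("1", 100), ("1", 200), ("2", 300)}
predicted = {("1", 100), ("1", 200), ("2", 400), ("X", 999)}

scores = score_calls(predicted, validated_true, validated_sites)
print(scores)
```

In this toy case the caller recovers 2 of the 3 confirmed mutations (sensitivity 2/3) and wrongly calls 1 of the 2 rejected candidates (specificity 1/2), for a balanced accuracy of 7/12, or about 0.58. The call at ("X", 999) is ignored because that site was never validated.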
The performance of the predictive algorithms from the participating Challenge teams will be ranked using the validation data: ranking will be based on sensitivity, specificity and balanced accuracy. A description of how participant algorithms and techniques will be compared can be found on our Algorithm's Performance page.

Incentives

The goal of this proposal is to identify the best mutation calling techniques. The main incentives are the following:

Publications will be coordinated in collaboration with Nature Publishing Group.

$7,500 in prize money has been contributed by two companies: $5,000 from Intel and $2,500 from Inova Translational Medicine Institute. The top-performing teams competing on the real tumor sub-challenge will split the winnings.

Intel will contribute software engineering resources to develop an optimized, parallelized, professional implementation of the winning algorithm of the SNV sub-challenge, provided as an open source software package to the community. This optimization will enable the winning method to be applied retrospectively to all existing TCGA and ICGC data, and we expect it will facilitate the method's adoption as a widely used community standard for genomic analysis.

Those methods identified as the best will be deployed for use in the ICGC/TCGA WG Pan-Cancer project that will commence next year. ICGC and TCGA have both recently announced they will jointly analyze over 2,000 whole genome (WG) datasets as part of the next Pan-Cancer effort, with the aim of comprehensively elucidating the genomic changes present in many forms of cancer. Thus, algorithms selected by this DREAM competition will be positioned to help address the needs of the WG Pan-Cancer effort in the coming year. This will provide the largest unified view of cancer genome variation to date.

How to Participate in The Challenge

Where Do I Start?

Submit your application to enter the Challenge and gain access to the data here.
You may review the Terms of Participation here: syn2295117. You may now register here!

Download the Data

Run your algorithms and get a list of genomic variant calls to submit.

My Analysis is Finished, Now What?

Parse your output to meet certain criteria (VCF v4.1)

Submit your results (page opening TBA)

Credits

Paul C. Boutros, Ontario Institute for Cancer Research
Lincoln D. Stein, Ontario Institute for Cancer Research
Josh Stuart, University of California, Santa Cruz
Gustavo Stolovitzky, IBM, DREAM
Stephen Friend, Sage Bionetworks
Adam Margolin, Sage Bionetworks
Thea Norman, Sage Bionetworks

References

1. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature 464, 993–998 (2010). http://icgc.org/
2. The Cancer Genome Atlas (TCGA). http://cancergenome.nih.gov/
3. Dialogue for Reverse Engineering Assessments and Methods (DREAM). http://www.the-dream-project.org/
4. Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
5. Meyerson, M., Gabriel, S. & Getz, G. Advances in understanding cancer genomes through second-generation sequencing. Nature Rev. Genet. 11, 685–696 (2010).
6. Alkan, C. et al. Genome structural variation discovery and genotyping. Nature Rev. Genet. (2011).
7. Medvedev, P. et al. Computational methods for discovering variation with next-generation sequencing. Nature Methods (2009).
8. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
9. Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337 (2012).
10. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
11. Cancer Genome Atlas Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
12. Cancer Genome Atlas Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
13. Cancer Genome Atlas Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519–525 (2012).
14. Cancer Genome Atlas Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N. Engl. J. Med. 368, 2059–2074 (2013).
15. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
16. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
17. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
18. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008).
19. Korbel, J. O. et al. Paired-end mapping reveals extensive structural variation in human genome. Science, 420–426 (2007).
20. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).

About DREAM Challenges

Sage Bionetworks and DREAM are convinced that running open computational Challenges focused on important unsolved questions in systems biomedicine can help advance basic and translational science. By presenting the research community with well-formulated questions that usually involve complex data, we effectively enable the sharing and improvement of predictive models, accelerating many-fold the analysis of such data. The ultimate goal, beyond the competitive aspect of these Challenges, is to foster collaborations of like-minded researchers who together will find solutions to the vexing problems that matter most to citizens and patients.

About Synapse

Synapse is an open computational platform designed to facilitate new ways for data scientists to work with data and with each other. Synapse reinforces the power of DREAM Challenges to catalyze a diverse community of researchers to nucleate around a particular scientific question.
Synapse's engaging features such as real-time leaderboards, code-sharing, and provenance tracking incentivize continuous participation in DREAM Challenges. Participants can accelerate scientific progress by generating, sharing, and evolving thousands of predictive models in real time that would have otherwise taken years to produce.
