A multilevel pan-cancer map links gene mutations to cancer hallmarks
© Knijnenburg et al. 2015
Received: 27 May 2015
Accepted: 7 July 2015
Published: 14 September 2015
A central challenge in cancer research is to create models that bridge the gap between the molecular level on which interventions can be designed and the cellular and tissue levels on which the disease phenotypes are manifested. This study was undertaken to construct such a model from functional annotations and explore its use when integrated with large-scale cancer genomics data.
We created a map that connects genes to cancer hallmarks via signaling pathways. We projected gene mutation and focal copy number data from various cancer types onto this map. We performed statistical analyses to uncover mutually exclusive and co-occurring oncogenic aberrations within this topology.
Our analysis showed that although the genetic fingerprint of tumor types could be very different, there were less variations at the level of hallmarks, consistent with the idea that different genetic alterations have similar functional outcomes. Additionally, we showed how the multilevel map could help to clarify the role of infrequently mutated genes, and we demonstrated that mutually exclusive gene mutations were more prevalent in pathways, whereas many co-occurring gene mutations were associated with hallmark characteristics.
Overlaying this map with gene mutation and focal copy number data from various cancer types makes it possible to investigate the similarities and differences between tumor samples systematically at the levels of not only genes but also pathways and hallmarks.
A central challenge in cancer research is to create models that bridge the gap between the molecular level on which interventions can be designed and the cellular and tissue levels on which the disease phenotypes are manifested. This is a daunting task. Cancer genomics research in the last decade has revealed the enormous complexity of this disease. Essential to the cancer phenotype and to its understanding are interactions between genes, between signaling pathways, and between cells. The latter interaction is exemplified by the important role of tumor heterogeneity [1, 2] and the relationship between the tumor and its environment [3, 4].
The complexity of cancer is reflected by the notion that cancer should not be considered as one disease but as a set of many diseases. In addition to traditional characteristics, including body location and morphology, cancers are distinguished by differences in their (epi)genomic signatures, gene and protein expression levels, and hyperactivated or deactivated pathways. Importantly, these differences at the molecular level are expected to enable personalized treatment strategies [5–7].
However, all cancers share the same set of deregulated biological processes, termed the hallmarks of cancer [8, 9]. How can we understand that tumors that are very different at the molecular level are similar when observed at a higher level of functional abstraction? More importantly, can this mapping that integrates the molecular characteristics and the disease phenotype lead to new hypotheses about biological mechanisms and therapy?
We have attempted to address these questions by creating a map that connects genes via pathways to hallmarks. By projecting gene mutation data from various cancer types on this map, we investigated the similarities and differences between these cancer types at the levels of not only genes but also pathways and hallmarks.
We considered mutually exclusive (ME) and co-occurring (CO) genes in the context of the multilevel map. In several studies, it has been observed that gene mutations that affect a pathway tend to be altered in an ME pattern . The rationale behind that observation is that once a gene involved in a pathway is mutated, a second mutation affecting that pathway does not confer a further selective advantage to the cancer cell. The large number of pathways in the multilevel map allowed us to systematically test whether there are indeed many ME mutations in the pathways. Interestingly, ME associations are typically expected within a pathway and not across pathways . This begs the question of whether there are ME associations between pairs of genes that are not part of the same pathway but link to the same hallmark, or whether there are many CO associations at the level of hallmarks. A CO association, which is on the other end of the spectrum from an ME association, means that genes are frequently found mutated together across cancer samples. The deregulation of distinct biological functions by these CO mutations may be necessary to acquire certain hallmark characteristics. Finally, we employed the map to assess whether genes that are not significantly frequently mutated (SFM) in a cancer type, but are mutated in a small number of samples, have a role in enabling cancer hallmark characteristics. Recent cancer genome studies have clearly demonstrated the extensive mutational heterogeneity in cancers ; relatively few genes are SFM (and can be detected as such by statistical approaches), whereas most genes are mutated in a small number of samples. The functional role of these infrequently mutated genes is unclear. Here, we employed the multilevel map to elucidate the functional role of these genes.
Mapping from hallmarks to Gene ontology (GO) processes
No. of genes
No. of pathways
Linked GO processes and function
Sustaining proliferative signaling
GO:0008283, cell proliferation
GO:0016049, cell growth
GO:0007049, cell cycle
GO:0051301, cell division
GO:0008284, positive regulation of cell proliferation
GO:0030307, positive regulation of cell growth
GO:0045787, positive regulation of cell cycle
GO:0051781, positive regulation of cell division
Evading growth suppressors
GO:0009968, negative regulation of signal transduction
GO:0008285, negative regulation of cell proliferation
GO:0030308, negative regulation of cell growth
GO:0045786, negative regulation of cell cycle
GO:0051782, negative regulation of cell division
Resisting cell death
GO:0012501, programmed cell death
GO:0043067, regulation of programmed cell death
GO:0090398, cellular senescence
GO:0032200, telomere organization
GO:0000723, telomere maintenance
GO:0032204, regulation of telomere maintenance
GO:0001302, replicative cell aging
GO:1900062, regulation of replicative cell aging
GO:2000772, regulation of cellular senescence
GO:0045765, regulation of angiogenesis
GO:2001212, regulation of vasculogenesis
GO:0008015, blood circulation
Tissue invasion and metastasis
GO:0007155, cell adhesion
GO:0001837, epithelial-to-mesenchymal transition
GO:0016477, cell migration
GO:0030155, regulation of cell adhesion
GO:0030030, cell projection organization
GO:0030036, actin cytoskeleton organization
GO:0030030, cell projection organization
GO:0034330, cell junction organization
GO:0007163, establishment or maintenance of cell polarity
GO:0006281, DNA repair
GO:0031570, DNA integrity checkpoint
GO:0045005, maintenance of fidelity involved in DNA-dependent DNA replication
GO:0006282, regulation of DNA repair
GO:0006954, inflammatory response
GO:0002367, cytokine production involved in immune response
GO:0002718, regulation of cytokine production involved in immune response
GO:0042060, wound healing
GO:0061041, regulation of wound healing
GO:0050727, regulation of inflammatory response
GO:0042533, tumor necrosis factor biosynthetic process
Reprogramming energy metabolism
GO:0006006, glucose metabolic process
GO:0046323, glucose import
GO:0071456, cellular response to hypoxia
Evading immune destruction
GO:0006955, immune response
GO:0002418, immune response to tumor cells
GO:0002837, regulation of immune response to tumor cells
GO:0020012, evasion or tolerance of host immune response
Ten tumor types and their abbreviations
No. of samples
Bladder urothelial carcinoma
Breast invasive carcinoma
Head and neck squamous cell carcinoma
Kidney renal clear cell carcinoma
Lung squamous cell carcinoma
Ovarian serous cystadenocarcinoma
Uterine corpus endometrioid carcinoma
Projecting mutation data on the multilevel map
The gene mutation data of the 2740 TCGA tumor samples were projected onto the map. We followed a straightforward strategy to propagate these mutation calls from the level of genes to the levels of pathways and hallmarks. If a sample had a mutation in at least one gene within a pathway, the mutational investment (MI) score of the sample in that pathway was set to 1; otherwise, it was set to 0. In other words, we implemented a logical OR function when going from genes to pathways, where at least one of the inputs (mutation calls for genes in the pathway) should be 1 to get an output of 1 (pathway MI). Similarly, if a sample had a mutation in at least one gene that links to a hallmark, the MI score of the sample in that hallmark was set to 1, and otherwise, it was set to 0. For each tumor sample, MI scores are thus binary calls at the levels of pathways and hallmarks, and they indicate the potential deregulation of the pathway and the potential enabling of the cancer hallmark, respectively. See “Random map rewiring” in the Additional file 1 section for details.
Mutual exclusivity and co-occurrence analysis
We employed a statistical analysis to detect ME and CO associations of pairs of genes across all cancer types. Within each cancer type, we determined the number of samples that have binary mutation calls for both members of a pair of genes. This “overlap” was assessed for ME associations, i.e., an overlap smaller than expected by chance, and for CO associations, i.e., an overlap larger than expected by chance. Gene pairs were grouped into three categories: (1) pairs of genes that were part of the same pathway for at least one of the pathways in the multilevel map, termed “pathway pairs;” (2) pairs of genes that were not part of the same pathway, yet impinged on the same hallmark for at least one of the hallmarks, termed “hallmark pairs;” and (3) pairs of genes that were neither part of the same pathway nor impinged on the same hallmark, termed “control pairs.” We tested all pairs of genes in which both genes had at least 25 mutations and either were part of the same pathway or were linked to the same hallmark. These analyses were performed for each cancer type separately. We used BiRewire  to create the appropriate null distribution for these tests. Specifically, for the binary mutation matrix of each cancer type, 10,000 permuted matrices were created. The observed overlap of mutated samples for a pair of genes in the original binary mutation matrix was compared with the overlap values derived from the 10,000 permuted matrices. Enhanced P value Estimation for Permutation Test (EPEPT) [18, 19] was used to compute P values for these permutation tests. Associations were called significant when P ≤ 1/n, where n was the total number of tests. If n was smaller than 20, the P value threshold was set to 0.05. This Bonferroni correction for multiple testing results in a per-family error rate of 1. ME and CO associations were tested separately. We did not test for CO for pairs of genes from the same chromosome to avoid spurious associations due to arm level copy number gain or loss.
The overall amount of detectable ME and CO associations was measured by the tail strength (TS) statistic . TS was determined from the list of P values obtained from the permutation tests for ME and CO associations in pathway, hallmark, and control pairs. We assumed these P values to be independently distributed such that the variance of TS can simply be estimated by 1 divided by the number of P values. The difference between two TSs follows a normal distribution, the mean of which can be estimated by the difference between the two TSs, and the standard deviation (SD) of which can be estimated by the sum of the two SD estimates. We took P values from the normal cumulative distribution function with this mean and SD to test for the difference in TSs between two groups. Groups were called significantly different when P ≤ 0.01.
Analyses were performed in MATLAB (MathWorks, Natick, MA, USA) and Python (open source).
Results and discussion
Mapping genes to pathways and to hallmarks
Some well-known cancer genes are hubs in the map, i.e., they appear in many pathways and influence multiple hallmarks. For example, phosphatidylinositol-4,5-biphosphate 3-kinase, catalytic subunit alpha (PIK3CA) appears in 39 pathways and signals to 8 hallmarks, and tumor protein p53 (TP53) is found in 9 pathways and links to 4 hallmarks. However, the majority of genes are part of 1 or 2 pathways and link to 1 or 2 hallmarks. See Additional file 2: Figure S1, Additional file 3: Figure S2, Additional file 4: Figure S3, Additional file 5: Figure S4, Additional file 6: Figure S5 for a detailed graphical overview of the connectivity in this map.
The multilevel mutational landscape
We compared the CoVs derived from this map with those derived from 1000 randomly rewired multilevel maps (see “Randomly rewired map” in the Additional file 1 section). For these randomly rewired maps, genes were connected to randomly selected pathways, and pathways were connected to randomly selected hallmarks. However, the in-degree and out-degree of the two bipartite graphs in this map, i.e., from genes to pathways and from pathways to hallmarks, were maintained. We observed substantially higher CoVs for several pathways for the actual map compared with the randomized map (Additional file 7: Figure S6). This indicates that cancer types are characterized by different MI scores at the levels of not only genes but also pathways. Although the CoVs were small at the level of hallmarks, we found that for some hallmarks they were slightly yet consistently larger than those for randomly rewired maps, hinting that cancer types might have different MIs in hallmarks.
In conclusion, the multilevel map populated with mutation data establishes the already intuitive answer to our question of how genetically different tumors can share the same hallmark characteristics; mutations in different genes impinge on the same or functionally related pathways and ultimately deregulate the same biological processes.
Mutual exclusivity and co-occurrence of gene mutations that enable hallmarks
Our strategy of propagating the binary mutation calls to the levels of pathways and hallmarks is based on the concept of “mutual exclusivity.” We investigated ME and CO genes in the context of the multilevel map.
This pattern was similar across most cancer types except for ovarian serous cystadenocarcinoma (OV), which was dominated by CO associations at the hallmark level (Fig. 3b). An overview of all significant associations is shown in Additional file 8: Table S1.
To investigate CO and ME associations further in pathways and hallmarks, we employed a complementary statistical analysis. Instead of looking only at the significant associations, we analyzed the complete distribution of P values using the TS statistic . The TS and its confidence interval (CI) were obtained for the P values derived from the ME and CO tests for pathway, hallmark, and control pairs, separately (Fig. 3c). A high TS indicates that there are more small P values than is expected by chance. Interestingly, for hallmark and control pairs, the TS was significantly larger for CO associations than for ME associations. Thus, although there were more significant individual ME associations (Fig. 3a), the overall distribution of P values was skewed towards low P values for CO associations (Fig. 3c). Other observations of the TS were in line with the previous analysis. Specifically, the TS for ME associations was the highest in pathway pairs, followed by hallmark pairs, for which the TS was significantly lower. Control pairs had the lowest TS. For CO associations, in contrast, there was no significant difference in the TS between pathway and hallmark pairs. This provides further evidence for the hypothesis that ME is more prevalent in pathway pairs than in hallmark pairs.
The role of infrequently mutated genes in pathway deregulation and hallmark activation
We employed the multilevel map to elucidate the role of genes that are not SFM in a cancer type.
First, we observed that many tumor samples (approximately 20% on average across all cancer types and hallmarks), which lacked mutations in SFM genes that were linked to hallmarks, had mutations in other genes that were linked to hallmarks and could potentially play a role in enabling them (Additional file 9: Figure S7).
We have created a map that connects genes to cancer hallmarks via pathways. We projected gene mutation and focal copy number alteration data from various cancer types onto this map. This allowed us to show that ME gene mutations are more prevalent in pathways compared with hallmarks, and conversely, that CO gene mutations are relatively important to enable hallmark characteristics. In addition, we demonstrated how the multilevel map can help to clarify the role of infrequently mutated genes.
However, making relevant predictions of how molecular events affect cellular and tissue phenotypes will involve computational multilevel models that are much richer in biological knowledge and data than the model presented here. We have employed a straightforward strategy to project the cancer gene mutation and copy number data onto the multilevel map. Future approaches will have to balance the added benefit of integrating additional knowledge and data with the added model complexity. We foresee several approaches that would yield more functional insight from the multilevel map when integrated with molecular data. First, information about genes can be formalized within the map. For example, many cancer genes can be classified as either tumor suppressor genes, which are inactivated by mutations or deletions, or oncogenes, which are activated by mutations or amplifications. These distinct roles will help to predict the downstream consequences of genomic aberrations. In addition, the position of a mutation in a gene potentially has an effect on its functional consequences , which could be taken into account. Second, currently unused information in the PID about the interactions between proteins (and other biomolecules), including positive and negative regulation, protein–protein binding, and other interactions, would enable the inference of regulatory relationships instead of simple statistical associations. Third, integration of additional molecular data, including gene and protein expression as well as epigenetic modifications, would make the information flow across the map more physiologically relevant. There are already some approaches that integrate different data types and interactions in a systematic and quantitative way [22, 23]. However, none of these approaches explicitly incorporate hallmarks into their framework.
In its current form, there are several important considerations about the multilevel map. First, the map is incomplete in terms of genes. A few frequently mutated genes, such as AT rich interactive domain 1A (ARID1A) and mucin 16 (MUC16), are not part of the map, simply because they are not part of PID. The use of pathway databases not only prevents the discovery of novel, relevant genes, but also limits the use of more recently discovered, and thus less studied, cancer genes. Second, certain cancer hallmarks are poorly characterized as evidenced by the very small number of genes and pathways that could be linked to these hallmarks. This seeming lack of annotation might be salvaged by updated pathway information. In addition, the large number of GO categories in the PID that could not be associated with cancer hallmarks could be revisited specifically with these underrepresented hallmarks in mind. Third, the map lacks certain well-established relationships. Specifically, the automated procedure was not able to map some well-known cancer genes, including cyclin-dependent kinase inhibitor 2A (CDKN2A), F-box and WD repeat domain containing 7 (FBXW7), and E2F transcription factor 3 (E2F3), to a hallmark. Although common knowledge would link the cell cycle regulator CDKN2A to sustained proliferative signaling and perhaps other hallmarks, these relations were not present as such in the PID and the GO.
Despite these shortcomings, we conjecture that multilevel maps, such as the one presented here, will help in the interpretation of large cancer genomic data sets. The use of three functional levels, i.e., genes, pathways, and hallmarks, facilitates an intuitive understanding. The pathways and hallmarks can be seen as conceptual tools that represent the functional levels crucial for an intelligible mapping from genes to the phenotype. The information in the map is easily assimilated with the researcher’s domain knowledge empowering the formulation of novel hypotheses and experiments.
TAK conceived of the study, participated in creating the multilevel map, integrated the mutation data with the multilevel map, performed the mutual exclusivity and co-occurrence analysis, and drafted the manuscript. TB contributed to study design and conception, participated in creating the multilevel map, wrote the code to parse the topology, performed the tail strength analysis, and helped to draft the manuscript. LFAW contributed to study design and conception, participated in creating the multilevel map, provided supervision, and helped to draft the manuscript. IS contributed to study design and conception, provided supervision, and helped to draft the manuscript. All authors read and approved the final manuscript.
The authors wish to thank Sander Canisius for fruitful discussions and suggestions. This work was supported in part by the National Cancer Institute (U24CA143835 to IS and TAK) and by the Netherlands Organization for Scientific Research—The Cancer System Biology Center (to LFAW and TB).
Compliance with ethical guidelines
Competing interests The authors declare that they have no competing interests.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Marusyk A, Polyak K. Tumor heterogeneity: causes and consequences. Biochim Biophys Acta. 2010;1805(1):105–17.PubMed CentralPubMedGoogle Scholar
- Longo DL. Tumor heterogeneity and personalized medicine. N Engl J Med. 2012;366(10):956–7.View ArticlePubMedGoogle Scholar
- Hanahan D, Coussens LM. Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell. 2012;21(3):309–22.View ArticlePubMedGoogle Scholar
- Kessenbrock K, Plaks V, Werb Z. Matrix metalloproteinases: regulators of the tumor microenvironment. Cell. 2010;141(1):52–67.PubMed CentralView ArticlePubMedGoogle Scholar
- van’t Veer LJ. Bernards R. Enabling personalized cancer medicine through analysis of gene-expression patterns. Nature. 2008;452(7187):564–70.View ArticlePubMedGoogle Scholar
- Hamburg MA, Collins FS. The path to personalized medicine. N Engl J Med. 2010;363(4):301–4.View ArticlePubMedGoogle Scholar
- Hood L, Friend SH. Predictive, personalized, preventive, participatory (P4) cancer medicine. Nat Rev Clin Oncol. 2011;8(3):184–7.View ArticlePubMedGoogle Scholar
- Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100(1):57–70.View ArticlePubMedGoogle Scholar
- Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74.View ArticlePubMedGoogle Scholar
- Leiserson MD, Vandin F, Wu HT, Dobson JR, Eldridge JV, Thomas JL, et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet. 2015;47(2):106–14.PubMed CentralView ArticlePubMedGoogle Scholar
- Vandin F, Upfal E, Raphael BJ. De novo discovery of mutated driver pathways in cancer. Genome Res. 2012;22(2):375–85.PubMed CentralView ArticlePubMedGoogle Scholar
- Kandoth C, McLellan MD, Vandin F, Ye K, Niu B, Lu C, et al. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502(7471):333–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, et al. PID: the Pathway Interaction Database. Nucleic Acids Res. 2009;37(Database issue):D674–9. doi:10.1093/nar/gkn653.PubMed CentralView ArticlePubMedGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.PubMed CentralView ArticlePubMedGoogle Scholar
- Plaisier CL, Pan M, Baliga NS. A miRNA-regulatory network explains how dysregulated miRNAs perturb oncogenic processes across diverse cancers. Genome Res. 2012;22(11):2302–14.PubMed CentralView ArticlePubMedGoogle Scholar
- Weinstein JN, Collisson EA, Mills GB, Shaw KRM, Ozenberger BA, Ellrott K, et al. The cancer genome atlas pan-cancer analysis project. Nat Genet. 2013;45(10):1113–20.PubMed CentralView ArticlePubMedGoogle Scholar
- Gobbi A, Iorio F, Dawson KJ, Wedge DC, Tamborero D, Alexandrov LB, et al. Fast randomization of large genomic datasets while preserving alteration counts. Bioinformatics. 2014;30(17):i617–23.PubMed CentralView ArticlePubMedGoogle Scholar
- Knijnenburg TA, Wessels LF, Reinders MJ, Shmulevich I. Fewer permutations, more accurate P-values. Bioinformatics. 2009;25(12):i161–8.PubMed CentralView ArticlePubMedGoogle Scholar
- Knijnenburg T, Lin J, Rovira H, Boyle J, Shmulevich I. EPEPT: a web service for enhanced P-value estimation in permutation tests. BMC Bioinform. 2011;12(1):411.View ArticleGoogle Scholar
- Taylor J, Tibshirani R. A tail strength measure for assessing the overall univariate significance in a dataset. Biostatistics. 2006;7(2):167–81.View ArticlePubMedGoogle Scholar
- Lubinski J, Phelan CM, Ghadirian P, Lynch HT, Garber J, Weber B, et al. Cancer variation associated with the position of the mutation in the BRCA2 gene. Fam Cancer. 2004;3(1):1–10.View ArticlePubMedGoogle Scholar
- Ylipää A, Yli-Harja O, Zhang W, Nykter M. Characterization of aberrant pathways across human cancers. BMC Syst Biol. 2013;7(Suppl 1):S1.PubMed CentralPubMedGoogle Scholar
- Vaske CJ, Benz SC, Sanborn JZ, Earl D, Szeto C, Zhu J, et al. Inference of patient-specific pathway activities from multi-dimensional cancer genomics data using PARADIGM. Bioinformatics. 2010;26(12):i237–45.PubMed CentralView ArticlePubMedGoogle Scholar