Prognostic gene network modules in breast cancer hold promise.

A substantial proportion of lymph node-negative patients who receive adjuvant chemotherapy do not derive any benefit from this aggressive and potentially toxic treatment. However, standard histopathological indices cannot reliably detect patients at low risk of relapse or distant metastasis. In the past few years several prognostic gene expression signatures have been developed and shown to potentially outperform histopathological factors in identifying low-risk patients in specific breast cancer subgroups with predictive values of around 90%, and therefore hold promise for clinical application. We envisage that further improvements and insights may come from integrative expression pathway analyses that dissect prognostic signatures into modules related to cancer hallmarks.


Background
An outstanding problem in the clinical management of breast cancer is overtreatment. It is estimated that approximately 55 to 75% of breast cancer patients who receive adjuvant chemotherapy would do equally well without it [1], but identifying this low-risk population with a high enough predictive value (≥90%) is not possible using standard prognostic factors such as lymph node status or tumour size. Several recently developed gene expression classifiers have shown promise of achieving the required predictive values.
One such classifier is Oncotype DX, a prognostic test based on the expression levels of 21 genes, which has been shown to identify low-risk patients with an accuracy of at least 90%, but is restricted to lymph node-negative oestrogen receptor-positive (ER+) breast cancer [2]. Another classifier is the 7-gene immune response (IR) module, which allows identification of low-risk patients in oestrogen receptor-negative (ER-) breast cancer [3]. Both of these signatures appear to be robust, demonstrating a high predictive value across many different breast cancer cohorts [2,3]. Gene Ontology (GO) analyses of prognostic signatures [2-6] have shown that specific biological processes play particularly important roles and that these roles are subgroup-specific. Thus, while cell proliferation is strongly prognostic in ER+ breast cancer [6], the clinical heterogeneity of ER- breast cancers appears to be explained mainly by differential expression of genes related to immune response pathways, highlighting the need to conduct survival analysis within specific breast cancer subgroups [7-10].
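The predictive value used as the benchmark here is, for a low-risk call, the fraction of patients classified as low risk who remain free of relapse (the negative predictive value). The following is a minimal sketch of that calculation; the function name and the cohort counts are hypothetical and chosen only to illustrate the 90% threshold, not taken from any of the cited studies.

```python
def negative_predictive_value(predicted_low_risk, relapsed):
    """Fraction of patients called low risk who stayed free of relapse."""
    # Keep only the outcomes of patients the classifier called low risk.
    outcomes = [r for p, r in zip(predicted_low_risk, relapsed) if p]
    if not outcomes:
        raise ValueError("no patient was classified as low risk")
    return sum(1 for r in outcomes if not r) / len(outcomes)


# Hypothetical cohort of 30 patients: 20 are called low risk, 10 high risk.
predicted_low_risk = [True] * 20 + [False] * 10
# Of the 20 low-risk calls, 18 stay relapse-free and 2 relapse;
# of the 10 high-risk calls, 6 relapse and 4 do not.
relapsed = [False] * 18 + [True] * 2 + [True] * 6 + [False] * 4

npv = negative_predictive_value(predicted_low_risk, relapsed)
print(f"Negative predictive value: {npv:.2f}")
```

In this toy cohort, two relapses among twenty low-risk calls sit exactly at the 90% benchmark; a third relapse would push the classifier below it.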

The article
In line with this, Li and colleagues [11] have recently conducted a novel bioinformatic analysis of existing breast cancer expression data sets in order to identify gene expression modules that may predict patients at low risk of distant metastasis in specific breast cancer subgroups. A common difficulty in identifying robust prognostic gene signatures is the presence of noise and spurious signals, which often render the resulting gene signatures unstable to perturbations. To overcome these limitations, Li and colleagues used a novel multiple survival screening (MSS) algorithm, designed to extract the most robust signals from the data [11]. Using this algorithm, Li and colleagues report novel prognostic gene modules that have high negative predictive values (87 to 100%) and are related to cancer hallmarks, notably cell cycle, apoptosis and cell adhesion [11]. The modules were derived using only one data set [12], and while individual modules were not prognostic across all eight validation sets examined, specific non-linear combinations of these modules were. This suggests that classifiers built from non-linear combinations of modules related to distinct cancer hallmarks may yield more powerful predictors than those based on individual modules or linear combinations thereof. The modules were shown to be specific for either ER+ or ER- breast cancer, further supporting the view that cancer subtype analyses are necessary [6,7].
Li and colleagues' work is also significant for two other reasons. First, by dissecting prognostic gene lists into various cancer hallmarks they identified prognostic modules (for example, apoptosis) that are normally only associated with the primary tumour and not with the subsequent risk of relapse or distant metastasis. A similar finding implicating apoptosis as a molecular determinant of distant metastasis was made in a different study [13]. Second, genes making up the prognostic modules were mapped to a protein interaction network and many were shown to directly interact with genes known to be mutated in breast cancer (COSMIC) [14]. Interestingly, the prognostic power of the modules could be related to these driver genes, as other sets of genes with the same GO annotations and also direct neighbours of the drivers were observed to be as prognostic as the original modules. This may explain the redundancy in prognostic gene sets, which is often observed in gene expression studies.

The viewpoint
The insights provided by Li and colleagues are important, yet there are some cautionary remarks. First, a predictive value of 90% is only of clinical relevance if this can be achieved using a genuine single-sample classifier. However, it would appear that the algorithm of Li and colleagues [11] is not a single-sample predictor, since the specific weights in the centroids are retrained in each of the validation sets. Thus, the validation presented is only of the actual gene sets, and not of a single classifier. Therefore, the predictive values quoted should be interpreted with caution. In contrast, the Oncotype DX and IR module classifiers are each defined by a single centroid and therefore constitute (modulo a trivial genewise recentering and rescaling) bona fide single-sample predictors. In other words, the Oncotype and IR module centroids used to classify test samples are unchanged and unique to the training set. Thus, it remains to be seen if a single-sample classifier derived with the MSS algorithm [11] could be applied in a clinical setting.
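To make the distinction concrete, the sketch below shows what a single-sample predictor looks like in its simplest nearest-centroid form: the centroids are computed once from the training set and then frozen, so that a new patient can be classified alone, without access to the rest of a validation cohort. This is an illustration under stated assumptions only. It is not the MSS algorithm, nor the actual Oncotype DX or IR module scoring; the gene values, class labels and function names are hypothetical, and the genewise recentering and rescaling step is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def train_centroids(samples, labels):
    """Average the expression profiles of each class in the training set."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def classify(sample, centroids):
    """Assign one sample to the class whose fixed centroid it correlates with best."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))


# Toy training data (hypothetical values): rows are patients, columns are genes.
train_x = [[2.0, 0.5, 0.1], [1.8, 0.7, 0.2], [0.2, 1.9, 2.1], [0.1, 2.2, 1.8]]
train_y = ["low_risk", "low_risk", "high_risk", "high_risk"]

# Centroids are trained once here and never refitted on validation data.
FROZEN = train_centroids(train_x, train_y)

# A single new patient is classified against the frozen centroids.
new_patient = [1.9, 0.4, 0.3]
prediction = classify(new_patient, FROZEN)
print(prediction)
```

Calling train_centroids again on each validation cohort would correspond to the retraining criticised above: the gene set would be the same, but the classifier applied to each cohort would not.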
Another important limitation of this study is the sole reliance on one data set [12] to infer prognostic gene sets. It is very likely that specific prognostic modules may have been missed, especially for the minority ER- subgroup, for which one data set would typically not provide the necessary sample numbers and power [7]. This may explain why Li and colleagues do not find a prognostic immune response module in ER- breast cancer, when in fact many other studies have reported such a signature [3,8,9].
Probably the most important insight from Li and colleagues' work is the observation that non-linear combinations of prognostic modules related to different cancer hallmark GO terms may significantly improve the prognostic power of gene expression signatures. This makes sense because more often than not gene expression signatures are biased towards those genes and GO terms with the largest effect sizes, thereby diluting the smaller yet equally important predictive effects of other genes or GO terms. Alternatively, the interactions between cancer hallmarks are likely to be non-linear and therefore non-linear combinatorial classifiers may be required to capture these effects. Thus, development of further algorithmic tools that incorporate such non-linearities may be a fruitful endeavour.
The observation by Li and colleagues that a common set of mutated driver genes may underlie these robust prognostic modules is an important one and supports similar observations made by at least two other groups [15,16]. We envisage that as the quality and coverage of protein interaction networks improve, and as we approach the completion of a final catalogue of mutated breast cancer genes, methods that interpret gene expression, protein expression and protein phosphorylation data in the context of these structural signalling networks and breast cancer genes are likely to provide us with the biological insights needed to drive this field forward.

Competing interests
The authors declare that they have no competing interests.