A multigene predictor of metastatic outcome in early stage hormone receptor-negative and triple-negative breast cancer

Introduction Various multigene predictors of breast cancer clinical outcome have been commercialized, but proved to be prognostic only for hormone receptor (HR) subsets overexpressing estrogen or progesterone receptors. Hormone receptor negative (HRneg) breast cancers, particularly those lacking HER2/ErbB2 overexpression and known as triple-negative (Tneg) cases, are heterogeneous and generally aggressive breast cancer subsets in need of prognostic subclassification, since most early stage HRneg and Tneg breast cancer patients are cured with conservative treatment yet invariably receive aggressive adjuvant chemotherapy. Methods An unbiased search for genes predictive of distant metastatic relapse was undertaken using a training cohort of 199 node-negative, adjuvant treatment naïve HRneg (including 154 Tneg) breast cancer cases curated from three public microarray datasets. Prognostic gene candidates were subsequently validated using a different cohort of 75 node-negative, adjuvant naïve HRneg cases curated from three additional datasets. The HRneg/Tneg gene signature was prognostically compared with eight other previously reported gene signatures, and evaluated for cancer network associations by two commercial pathway analysis programs. Results A novel set of 14 prognostic gene candidates was identified as outcome predictors: CXCL13, CLIC5, RGS4, RPS28, RFX7, EXOC7, HAPLN1, ZNF3, SSX3, HRBL, PRRG3, ABO, PRTN3, MATN1. A composite HRneg/Tneg gene signature index proved more accurate than any individual candidate gene or other reported multigene predictors in identifying cases likely to remain free of metastatic relapse. Significant positive correlations between the HRneg/Tneg index and three independent immune-related signatures (STAT1, IFN, and IR) were observed, as were consistent negative associations between the three immune-related signatures and five other proliferation module-containing signatures (MS-14, ONCO-RS, GGI, CSR/wound and NKI-70). Network analysis identified 8 genes within the HRneg/Tneg signature as being functionally linked to immune/inflammatory chemokine regulation. Conclusions A multigene HRneg/Tneg signature linked to immune/inflammatory cytokine regulation was identified from pooled expression microarray data and shown to be superior to other reported gene signatures in predicting the metastatic outcome of early stage and conservatively managed HRneg and Tneg breast cancer. Further validation of this prognostic signature may lead to new therapeutic insights and spare many newly diagnosed breast cancer patients the need for aggressive adjuvant chemotherapy.


Introduction
Hormone receptor-negative (HRneg) breast cancer accounts for 30% to 40% of all newly diagnosed breast malignancies and is clinically subdivided into either human epidermal growth factor receptor 2 (HER2/ ERBB2)-positive or triple-negative (Tneg) breast tumors, and about 60% of the latter consist of basal-like breast cancers [1][2][3][4]. When characterized by histology or protein-, RNA-or DNA-based assays, HRneg and Tneg breast cancers are consistently found to be aggressive and heterogeneous subgroups that defy prognostic substratification [5][6][7][8][9]. Tneg and basal-like breast cancers represent about 15% of all newly diagnosed breast cancers and preferentially arise in younger women, African-Americans, and BRCA1 mutation carriers. Given their reputation for more invasive and proliferative characteristics, even early-stage HRneg and Tneg breast primaries are invariably treated with adjuvant systemic therapy. Since Tneg breast tumors lack clinically validated prognostic or predictive biomarkers, their systemic therapy consists of empiric combinations of toxic chemotherapy.
The metastatic potential of HRneg and Tneg breast cancers, unlike hormone receptor-positive (HRpos) breast cancer, is usually manifest within 5 years of primary tumor diagnosis, with or without adjuvant chemotherapy intervention [10][11][12]. For example, despite both primary and systemic treatment, patients with Tneg breast cancer have a median time to metastatic recurrence of fewer than 3 years and are more than three times as likely to die from metastases within 5 years [12]. Despite this aggressive tumor behavior, nearly two thirds of newly diagnosed early-stage (T 1,2 N 0,1 ) Tneg patients conservatively managed without adjuvant chemotherapy remain disease-free 5 or more years after diagnosis, indicating that most do not require systemic therapy for curative intent and illustrating the clinical heterogeneity intrinsic to this otherwise-aggressive form of HRneg breast cancer [13]. Since more than 60% of incident breast cancers (including HRneg and Tneg cases) in the US are localized at the time of diagnosis and therefore are amenable to curative management without unnecessary systemic therapy [14], the failure of both traditional and modern high-throughput analytical methods to prognostically stratify HRneg and Tneg breast cancers for more personalized and conservative management points to a high-priority need for additional biomarker discovery [9].
Many multigene breast cancer classifiers and outcome predictors have been introduced to date, but none has become universally accepted, although several have been standardized and commercialized [8,9]. Given the diversity of genes in these signatures, it is surprising that they demonstrate nearly 80% classification concordance with routine pathology-based classifiers of breast cancer into HRpos, HRneg, HER2positive, and Tneg subgroups [9]. Owing to the predominance of HRpos breast cancers and the many molecular differences distinguishing good-risk (luminal A) from poor-risk (luminal B) HRpos breast cancers, most of the well-described multigene predictors contain gene modules known to regulate or execute cell proliferation [9,15]. Thus, these signatures are most effective at assigning recurrence risk to earlystage HRpos breast cancer patients whose prognoses can be estimated using a simple Ki67 index [15] or more accurately assessed using a multigene predictor enriched for regulators of DNA and cell cycle function [16]. Large-scale meta-analyses across heterogeneous breast cancer datasets analyzed on different expression microarray platforms of multigene signatures like the 70-gene Mammaprint signature (NKI-70) [6], Celera 14-gene metastasis score (MS-14) [16], 76-gene Veridex signature (EMC-76) [17], core serum response (CSR/wound) signature [18], Oncotype/Genomic Health recurrence score (ONCO-RS) [19], p53 [20], and genomic grade index (GGI) [21] have shown that their prognostic values are comparable when evaluated against HRpos breast cancers (with or without adjuvant treatment). Moreover, despite the disparity in their gene composition, their proliferation modules appear to be the common driving force behind their overall prognostic value [22,23]. As the majority of HRneg breast cancers are highly proliferative, these various multigene predictors fail to show any value in discriminating prognosis within this HR subtype, supporting the widespread call for newer prognostic signatures not dependent on proliferation modules [22,23].
One meta-analysis observed that higher expression of an immune response (IR) gene module associated with STAT1 (signal transducer and activator of transcription 1) mRNA expression was significantly associated with better HRneg clinical outcome by univariate and multivariate analyses, prompting recent speculation that impaired host IR drives the development of HRneg metastatic events [23]. Earlier investigators showed that a novel interferon (IFN)-regulated breast cancer gene cluster, including the transcriptional regulator STAT1, was associated with somewhat better prognosis cases relative to other basal-like breast cancers [24]. Shortly thereafter, a team employing a novel pattern recognition and gene selection method and interrogating three public microarray datasets (based on different platforms) containing 186 adjuvant therapy-naïve, regionally involved HRneg breast cancers identified a 98-gene IR cluster and a 7-gene IR module capable of specifying up to 25% of HRneg breast cancers (including several HERpos but few medullary breast cancers) with significantly reduced risk of distant metastasis [25]. While the larger IR-98 gene cluster contained a number of IFN-related genes, including STAT1, the compact IR-7 module appeared functionally related to, but prognostically distinct from, the two previously reported IFN and STAT1 gene clusters [23,24]. More recently, this IR-7 predictor was refined by assigning different weights to the individual genes, yielding a composite IR score whose value increases with better HRneg prognosis [26]. While the prognostic value of this IR score was thought to be independent of tumor infiltration by lymphocytes [26], high levels of lymphocyte infiltration have been found to be associated with reduced risk of metastasis in Tneg/ basal-like breast cancers [27]; thus, the prognostic contribution of host stromal and immune cell elements within the primary tumor remains an open question awaiting additional study. Meanwhile, the urgency to identify that vast majority of early-stage HRneg breast cancer patients not destined for metastatic relapse and to spare them unnecessary chemotherapy compelled a subsequent unbiased microarray search among nodenegative HRneg and Tneg breast tumors for genes predictive of distant metastatic relapse.
The present study describes a novel set of 14 such prognostic gene candidates identified from a training cohort of 199 node-negative, adjuvant treatment-naïve HRneg (154 Tneg) cases curated from three public expression microarray datasets generated on the same microarray platform. Independent validation of the unweighted multigene HRneg/Tneg prognostic index was performed on a different cohort of 75 nodenegative, adjuvant-naïve HRneg cases curated from three additional public datasets generated on two different microarray platforms. This novel HRneg/Tneg signature is able to better discriminate validation cases destined for metastatic relapse in comparison with eight other reported signatures. Interestingly, this HRneg/Tneg multigene index lacks any proliferation module and shows modest but significant correlations with the previously reported IR, IFN, and STAT1 module genes. Although the reported IR, IFN, or STAT1 module genes are not components of the HRneg/Tneg signature, one gene component of this index (CXCL13) correlates significantly with each of the 7 IR module genes, indicating surrogate representation of the IR-7 module within the 14-gene HRneg/Tneg index. In keeping with the immune ontology of both IR and IFN/STAT1 gene signatures, network analysis of the HRneg/Tneg signature reveals that half of the 14 index genes are functionally linked to immune/inflammatory cytokine regulation.

Selection of HRneg and Tneg prognostic gene candidates
A set of 199 adjuvant-naïve, node-negative (N 0 ), estrogen receptor-negative breast cancers annotated for distant metastasis-free survival (DMFS) was identified as HRneg training cases from three published microarray studies similarly analyzed on the Affymetrix (Santa Clara, CA, USA) U133A platform ([GEO:GSE2034] [17], [GEO:GSE5327] [28], and [GEO:GSE7390] [29]). Clinical parameters (grade and tumor size) available from each of these training data sources are summarized in Table  S1 in Additional file 1. Tumor HER2 status was assigned based on mean-centered, log2-transformed ERBB2 transcript levels (probe set ID 216836_s_at) within each data source, yielding 154 Tneg training cases.
For candidate discovery, an initial subset of 135 HRneg cases from [GEO:GSE2034 and GSE5327] was analyzed by two different biostatistical approaches. In the first, prediction analysis of microarrays (PAM) was applied to log2-transformed discovery data subset by data source. Approximately 300 top discriminating probes were identified within each data source, and common probes with PAM scores bearing the same sign within both data sources were selected. Additional candidates were selected on the basis of a Monte Carlo cross-validation procedure. The discovery data subset was Z-transformed independently within data source and combined. A minimum variation filter was applied, yielding approximately 14,000 probes where at least 10% of cases showed greater than twofold variation from the mean. The filtered data were randomly subdivided into learning and test groups controlled for the number of metastatic cases. Univariate Cox analysis was performed, and prognostic significance was assessed as the P value computed from the Wald statistic averaged over 100 iterations. Probes with a P value of less than 0.01 and consistent correlation with DMFS (that is, Cox coefficient bearing the same sign) in more than 80% of all paired learning and test groups over the 100 iterations were selected. Univariate and multivariate Cox regression of DMFS on Z-transformed gene expression was performed on the discovery data for candidates with known official gene symbol annotation identified from both approaches, and probes with consistent correlation with DMFS in both univariate and multivariate settings were chosen for further assessment in the remaining 64 HRneg training cases from [GEO:GSE7930]. Expression data from these 64 cases were RMA-normalized and meancentered in Bioconductor R, and univariate Cox regression of DMFS on gene expression was performed. Candidates with consistent correlation with DMFS in both subsets of the training cohort were selected as final HRneg prognostic gene candidates for further validation. Tneg-specific prognostic gene candidates were similarly selected from 154 Tneg training cases: 108 cases were from the initial discovery subset [GEO: GSE2034 and GSE5237], and 46 cases were from the additional training subset [GEO:GSE7930].
Prognostic assessment of HRneg/Tneg genes within the training cohort Mean-centered, log2-scaled data from the three independent studies comprising the 199 training cases were merged using distance-weighted discrimination (DWD) [30]. Prognostic performance of individual HRneg/Tneg prognostic markers was assessed using univariate and multivariate Cox analyses as well as Kaplan-Meier analysis of each marker dichotomized at its median expression level. Expression indices of the HRneg-and Tneg-specific markers as well as the combined set of HRneg/Tneg markers were computed for each patient as follows: where x is the DWD-transformed expression, n is the number of genes within the signature, and P and N are the set of markers with positive and negative correlations with increased hazard, respectively. Tumors were dichotomized into high-versus-low index groups by their respective indices (that is, 199 HRneg cases by HRneg index and 154 Tneg cases by Tneg index) as well as the combined HRneg/Tneg index using median and upper third quartile values, which were discovered to yield near-optimal signature performance. Kaplan-Meier analysis was performed and significance was assessed by the log-rank statistic. Also, Cox regression of DMFS on group identity was performed to estimate the hazard ratio (HR) between patient groups with highversus-low signature index. Candidates were also prioritized by stepwise variable analysis. Briefly, candidates were added one at a time to the signature beginning with the gene most strongly correlated with DMFS by univariate Cox analysis (largest coefficient or minimum P). With each step, expression indices were computed for all possible additions and scored by univariate Cox regression to determine the optimal order of addition and candidate subset (largest coefficient or minimum P). Likewise, candidates were subtracted one at a time from the combined 14-gene HRneg/Tneg signature for prioritization comparison.
Prognostic assessment of HRneg/Tneg gene signature within the validation cohort The independent validation cohort consisted of 75 untreated, node-negative HRneg primary breast cancers annotated for DMFS pooled from three independent datasets ([GEO:GSE6532] [31], [EBI:E-TABM-158] [7], and NKI-295 [32]). Clinical parameters (grade and tumor size) available from each of these validation data sources are summarized in Table S1 in Additional file 1. Of these cases, 38 were analyzed on the Affymetrix platform ([GEO:GSE6532], [EBI:E-TABM-158]) whereas the remaining 37 were assayed on the Agilent Technologies, Inc. (Santa Clara, CA, USA) HU25K platform (NKI-295). Data generated on the Affymetrix platform were normalized using RMA and mean-centered independently within each data source, and Agilent data were converted to log2-scale and mean-centered. Chip annotation files were obtained from the Broad Institute (Cambridge, MA, USA) website [33], and within each data source, expression data were collapsed by gene symbols such that the expression of genes represented by multiple probes was computed as the average across probes. Of the 14 gene candidates identified from the Affymetrix platform-based training cohort, only one (PRRG3) could not be identified on the Agilent array platform. Processed expression data were mapped across platforms using gene symbols prior to combination using DWD. HRneg/Tneg candidates were mapped to the combined validation dataset by gene symbol, and prognostic performance of the HRneg/Tneg signature as an index was assessed by Kaplan-Meier and Cox regression analyses of the validation cohort dichotomized at the upper third quartile cut-point, which was once again found to give near-optimal signature performance.

HRneg/Tneg signature comparisons with other multigene predictors
The HRneg/Tneg candidates were assessed in relation to eight other signatures: NKI-70 [6], MS-14 [16], CSR/ wound-response [18], ONCO-RS [19], GGI [21,31], IR-7 [25,26], STAT1 cluster [23], and IFN cluster [24] (Table  S2 in Additional file 2). Gene signatures were mapped to the training and validation datasets using gene symbols, and for each signature, an expression index was computed for each patient as follows: where x is the DWD-transformed expression, n is the number of genes (within the signature) that are mapped to the dataset, and P and N are the set of markers with previously reported positive and negative correlations with increased hazard, respectively. Prognostic comparison of the signatures was performed in the validation cohort to avoid training bias toward the HRneg/Tneg candidates. For each signature, tumors were dichotomized into high-versus-low index groups by median values. Here, to ensure fair comparisons and minimize bias toward the newly identified candidates, the upper third quartile cut-point, which yielded near-optimal HRneg/Tneg signature performance, was not employed. Kaplan-Meier analyses were performed and significance was assessed by log-rank statistic. Cox regression of DMFS on group identity was used to estimate the HR between patient groups with high-versus-low signature values. Pearson correlations between the signature indices were performed using both training and validation cohorts. In addition, T cell-and B cell-specific gene signatures derived from human peripheral blood (Table  S2 in Additional file 2) were employed to estimate the degree of leukocyte infiltration within the training and validation tumors [34]. Signatures were mapped to the training and validation datasets by gene symbol, and the average expression of each signature was computed. Pearson correlations between the T-cell and B-cell signatures and the HRneg/Tneg index and IR-7 signature were performed on both training and validation cohorts. To confirm these associations, the analyses were repeated using a more restricted set of lymphocyte genes consisting of classical T cell-and B cell-specific surface markers and co-receptors (highlighted in red/ yellow in Table S2 in Additional file 2).

Pathway analysis of HRneg/Tneg signature genes
Pathway Studio (Ariadne Inc., Rockville, MD, USA) was used to identify potential common upstream regulators and downstream effectors of the Tneg/HRneg candidates. Also, Ingenuity Systems (Redwood City, CA, USA) was employed to explore potential connections between candidates through the shortest path (at most, one additional node) of direct interactions.

Results
Training cohort selection and assessment of HRneg/Tneg prognostic candidates Following the multistep protocol described in Materials and methods, 11 probes, representing 11 unique genes (CLIC5, CXCL13, MATN1, RPS28/ANKRD47, ABO, EXOC7, HAPLN1, PRRG3, PRTN3, RFXDC2, and RGS4), were identified as HRneg prognostic candidates from the training cohort of 199 HRneg cases. Likewise, 7 probes, representing 7 unique genes (CLIC5, CXCL13, MATN1, RPS28/ANKRD47, HRBL, SSX3, and ZNF3), were identified as Tneg prognostic candidates from the subset of 154 Tneg cases within the training cohort. Altogether, a non-redundant set of 14 genes demonstrating prognostic value in either the full HRneg training data or the Tneg subset was identified as HRneg/ Tneg prognostic candidates ( Table 1). Each of these 14 HRneg/Tneg genes showed prognostic significance by univariate Cox analysis in the pooled training cohort, but only half retained prognostic significance by multivariate analysis (Table 1). Interestingly, all but 2 (HAPLN1 and RGS4) of the 14 genes yielded negative Cox coefficients, indicating that for the majority of the HRneg/Tneg genes, higher transcript expression is associated with better prognosis (Table 1). Kaplan-Meier analysis revealed that, except for 3 genes (RPS28/ ANKRD47, MATN1, and HAPLN1), all of the HRneg and Tneg candidates were able to dichotomize the training cohort into prognostic groups showing significant differences in DMFS (Figures S1 and S2 in Additional files 3 and 4).
To assess the prognostic value of these HRneg/Tneg genes taken together as a multigene signature, an index value was computed as the sign-corrected average expression of the individual candidates such that higher expression of the signature index would be expected to correlate with worse prognosis. Kaplan-Meier analysis revealed that index values computed from the 11 genes identified from the HRneg training cohort (HRneg index) or from the 7 genes identified from the Tneg subset (Tneg index) were able to dichotomize their corresponding training cohorts into significantly different DMFS outcomes using a median value cut-point (logrank P = 2.04 × 10 -7 and 1.73 × 10 -5 , respectively). The HRneg/Tneg index, comprising the non-redundant set of all 14 HRneg and Tneg prognostic candidates, achieved an even more significant curve separation (logrank P = 6.14 × 10 -8 in full training data and 1.63 × 10 -6 in Tneg subset). Cox regression confirmed that the hazard associated with the high 14-gene HRneg/Tneg index value (HR 4.23, 95% confidence interval [CI] 2.4 to 7.45; P = 6.2 × 10 -7 in full training data and HR 4.18, 95% CI 2.22 to 7.88; P = 9.7 × 10 -6 in Tneg subset) was greater than that associated with either the HRneg or the Tneg indices in their corresponding training cohorts (HR 3.93, 95% CI 2.25 to 6.86; P = 1.4 × 10 -6 and HR 3.56, 95% CI 1.92 to 6.61; P = 5.6 × 10 -5 , respectively).
Near-optimal curve separation was achieved using an upper third quartile (≥ 75th percentile) value as an HRneg/Tneg index cut-point ( Figure 1). The Kaplan-Meier curves in Figure 1a and 1b show the full HRneg training cohort and its Tneg subset dichotomized at this third quartile cut-point into groups with significantly different DMFS outcomes based on the combined 14gene HRneg/Tneg signature index. The Cox proportional HRs between high and low index groups were 9.13 (95% CI 5.5 to 15.2; P~0) in the full training data and 11 (95% CI 6.11 to 19.6; P~0) for the Tneg subset. As was observed using a median value cut-point, the prognostic performance of the combined 14 gene HRneg/Tneg index using a third quartile cut-point was superior or comparable to that of the individual HRneg or Tneg indices in their respective training cohorts (Figure S3 in Additional file 5).
Stepwise addition and subtraction analysis within the 199 training cohort prioritized the individual HRneg/ Tneg genes comprising the 14-gene signature and revealed that 4 genes (CLIC5, EXOC7, RFXDC2, and SSX3) were consistently identified as the most significant contributors to the signature's prognostic value. Despite its prognostic significance in the multivariate Cox analysis, HAPLN1 was identified in the stepwise analysis as not providing additional prognostic value to the full 14-gene HRneg/Tneg signature.

Validation cohort assessment of the HRneg/Tneg prognostic signature
An upper third quartile cut-point for the combined HRneg/Tneg signature index also proved near optimal in discriminating DMFS outcome within the 75-case validation cohort in which gene expression data were generated from two different microarray platforms. Figure 1c shows the Kaplan-Meier curves of the validation cohort dichotomized this way by the combined 14-gene HRneg/Tneg index. The Cox proportional HR between the high-versus-low HRneg/Tneg index groups in this validation cohort was 2.85 (95% CI 1.24 to 6.52; P = 0.013).

Comparison of HRneg/Tneg signature with other multigene predictors
To compare the prognostic value of different signatures within the same population, a median cut-point value was used for each signature to dichotomize the validation cohort. Kaplan-Meier comparisons revealed that, of the nine signatures tested, only the HRneg/Tneg signature was able to significantly discriminate DMFS outcome (Figure 2a). Proliferation module-containing signatures like the NKI-70 (Figure 2b) and MS-14 (Figure 2c), known to be predictors of HRpos outcome [23,24], did not produce any prognostic separation in this HRneg population. The previously reported IR module, IR-7, though developed as an HRneg outcome predictor, trended toward discriminating DMFS outcome in this HRneg population only (Figure 2d). The log-rank P values of the Kaplan-Meier analyses and the Cox proportional HRs between the high-versus-low index groups for all nine multigene predictors in this validation cohort are shown in Table 2.
All possible associations between the HRneg/Tneg index and the eight other signatures were explored in both the training (n = 199) and validation (n = 75) cohorts ( Figure 3). Signature correlations (Pearson correlation coefficient, or R p ) found to be significant and consistent between these two cohorts included the following: (a) positive associations between HRneg/Tneg and three different immune-related signatures (STAT1, IFN, and IR-7) and (b) positive associations among the five different proliferation module-containing signatures (MS-14, ONCO-RS, GGI, CSR/wound, and NKI-70). Consistent negative associations between the indices of the immune-related signatures and proliferation module-containing signatures were observed; however, some of these correlations did not reach significance.
To compare the relationships between the HRneg/ Tneg and IR-7 prognostic indices with the degree of immune cell infiltration within the training and validation tumor cohorts, average expressions of T cell-specific and B cell-specific gene signatures were computed for each cohort. Since all of the genes in these lymphocytespecific signatures are positively correlated with lymphocyte abundance, higher lymphocyte gene signature values within the cohorts represent higher degrees of lymphocytic infiltration. Owing to sign adjustments in the calculation of the HRneg/Tneg and IR-7 prognostic indices, HRneg, hormone receptor-negative; Tneg, triple-negative.
negative correlations were expected to reflect the extent to which these indices were derived from infiltrating T or B lymphocytes. As shown in Table 3 addition, common downstream target analysis placed 3 of these HRneg/Tneg genes (CXCL13, RGS4, and PRTN3) as upstream of immune-function mediators IL-10, CCR7, and CCL3 (Figure 4b). Additional pathway exploration conducted using Ingenuity Pathway Systems (Figure 4c) identified cytokine TNF as linked to 6 HRneg/Tneg genes within a network that includes transcription factor STAT3, a key mediator of acute-phase response. Altogether, these network analyses identified 8 of the 14 genes within the HRneg/Tneg index as being potentially linked to immune/inflammatory cytokine regulation.

Discussion
Our training (199 HRneg with 154 Tneg) and validation (75 HRneg with 46 Tneg) cohorts of node-negative, adjuvant treatment-naïve breast cancers showed distant metastatic event rates similar to that of another conservatively managed early-stage Tneg cohort [13]. Among    IR-7 [25,26] IFN Cluster [24] ONCO-RS [19] GGI [21,31] MS-14 [   HRneg, hormone receptor-negative; IR-7, immune response signature with 7-gene immune response module; R p , Pearson correlation coefficient; Tneg, triplenegative. the training set, 65 (33%) had an eventual metastatic relapse and 85% of these occurred within 5 years of diagnosis; among the validation set, 24 (32%) had a metastatic event, and 91% of these occurred within 5 years of diagnosis. Given this clinical behavior and the 77% preponderance of Tneg primary tumors in the training set, it may not be surprising that of the 11 top prognostic candidates entrained by the full HRneg cohort, four genes (CXCL13, CLIC5, RPS28, and MATN1) were also among the 7 top prognostic candidates independently entrained by the 154 Tneg tumors. The slightly different prognostic performances of individual genes within each training set are illustrated by CLIC5, MATN1, and RPS28, which were slightly more effective discriminators against the Tneg subset relative to the full set of HRneg tumors (Figures S1 and S2 in Additional files 3 and 4). It is interesting to note that higher expression of 12 of the 14 HRneg/Tneg genes is associated with better DMFS. This is consistent with observations among other HRneg outcome predictors (IR-7, STAT1, and IFN signatures) in which individual gene components are often more highly expressed in association with better prognosis [23][24][25][26], and this is in stark contrast with HRpos outcome predictors, in which elevated expression of the majority of gene components is associated with increased tumor proliferation and poor prognosis [22,23]. These inherent differences were taken into account during the computation of a composite signature index, such that a higher index value would be expected to correlate with worse DMFS outcome.

SP1
Relative to their gene-specific prognostic values, a composite signature score (index) based on all of the candidate genes proved to be a better discriminator of metastatic outcome. Despite their variable Cox coefficients, no attempt was made to individually weight each gene in generating the index. While indices computed from the HRneg and Tneg genes alone were able to dichotomize their respective training cohorts into groups with significant differences in DMFS ( Figure S3 in Additional file 5), a combined index comprising all 14 HRneg/ Tneg genes was able to achieve equivalent or better curve separation at both cut-points tested, thus providing a rationale for considering all 14 HRneg/Tneg genes together as a signature in further studies. Stepwise variable addition and removal analysis identified candidate subsets from which indices can be computed without loss of prognostic performance as assessed by univariate Cox analysis, suggesting that the HRneg/Tneg signature index has prognostic robustness. In particular, when the training cohort was dichotomized at a median value cutpoint, the removal of up to three genes (for example, SSX3, MATN1, and PRTN3) did not significantly alter the HRs between poor versus good prognosis groups calculated from the modified 11-gene index relative to the full 14-gene index (HR 3.6, 95% CI 2.07 to 6.26 and HR 4.23, 95% CI 2.4 to 7.45, respectively). Also, a truncated index consisting of only 7 selected genes (CXCL13, CLIC5, RGS4, RPS28, RFX7, EXOC7, and HRBL) minimally altered the Kaplan-Meier curves and did not significantly reduce the HR (HR 3.6, 95% CI 2.09 to 6.23). Since both training and validation cohorts were composed of cases clinically annotated as HRneg, we reassessed the prognostic performance of the HRneg/Tneg index after removing 35 HRneg cases from the training cohort and 11 cases from the validation cohort having potentially high enough estrogen receptor transcript levels to be considered potentially false HRneg annotations. After these adjustments were made, the prognostic value of the HRneg/Tneg index improved slightly, as seen in the adjusted HRs calculated for the median cutpoint dichotomized training and validation groups (HR 4.57, 95% CI 2.45 to 8.54 and HR 2.72, 95% CI 1.11 to 6.67, respectively).
Choice of cut-points may significantly influence a signature's prognostic performance. Thus, although significant curve separation was achieved in the full training data (and the Tneg subset) using the median index value as a cut-point, additional Kaplan-Meier analyses were conducted to identify an optimal cut-point that minimizes the log-rank P value. Care was taken to restrict these analyses to cut-points within the 20th and 80th percentiles to prevent extreme group sizes and reduce the likelihood of over-fitting the training data. Optimal curve separations were achieved at cut-points near the upper third quartile for the HRneg and Tneg indices in their respective training cohorts as well as the 14-gene HRneg/Tneg index in the full training data and the Tneg subset, suggesting that selecting an upper third quartile cut-point in future studies may yield optimal signature performance. This observation was independently confirmed in the validation cohort whose gene expression data were derived from two different expression microarray platforms; here, optimal curve separation was observed at an index value of 0.2087 and very close to the upper third quartile (0.2388). Interestingly, despite placing 75% of patients in the good-prognosis group, when accuracy was assessed as a function of the proportion of metastatic events based on high or low index values, the HRneg/Tneg index appeared highly accurate at identifying patients with good prognosis and less accurate at assigning patients to the poor-prognosis category, consistent with previous observations showing better negative predictive value and less-optimal positive predictive value for the IR gene signature [26]. These performance characteristics suggest that the HRneg/Tneg index may be well suited for identifying newly diagnosed, early-stage patients whose expected good outcome on conservative management would not mandate aggressive adjuvant chemotherapy. To explore this possibility further, we considered the distribution of HRneg/Tneg indices within the training and validation cohorts for those with either metastatic or non-metastatic outcomes ( Figure S4 in Additional file 6); these distributions indicate that 10% to 15% of these cohorts have tumors with very low HRneg/Tneg indices and less than 10% likelihood of metastatic recurrence. However, additional independent validation studies are needed to confirm the presence of this small subgroup, which was not detected by the current optimization protocol and its survival characteristics.
The prognostic performance of the HRneg/Tneg index was also compared with that of other well-validated multigene predictors ( Figure 2); in addition, the multiple predictive indices were compared with one another across both the training and validation cohorts ( Figure 3). Nonoptimized median values were used as cut-points for prognostic comparison purposes in the validation dataset to minimize bias toward the HRneg/Tneg signature. Despite these measures, only the HRneg/Tneg index demonstrated significant Kaplan-Meier curve separation, whereas the previously reported IR signature and two other immune-related signatures approached significance in our pooled validation cohort only. This is in contrast to signatures like ONCO-RS and MS-14 that were originally developed as HRpos outcome predictors and showed no prognostic value within this HRneg cohort and is in agreement with previous reports suggesting that prognoses of HRneg and HRpos breast cancers are driven by fundamentally different mechanisms [22,23]. While the strong correlations between different proliferation module-containing signatures were expected and in keeping with previous reports (as were the significant associations between the different immune-related signatures), the anti-correlations observed between the composite scores (index) of proliferation and immune signatures in HRneg breast cancers were not previously noted. It is worth noting that, owing to the adjustments made during index computations, these anti-correlations reflect a positive association between immune and proliferation function and may be in keeping with the growth-stimulatory effects of proinflammatory cytokine/chemokine signaling. These anti-correlations may also be attributed in part to the poor prognostic performances of proliferation module-containing signatures in HRneg breast cancers and may account for the lack of prognostic value of the HRneg/Tneg and IR signatures within a corresponding cohort of more than 400 node-negative, adjuvant-naïve HRpos breast cancers ([GEO:GSE2034, GSE7390], NKI-295) in which both the ONCO-RS and MS-14 signatures are significantly prognostic (data not shown).
Given these signature associations, it is not entirely surprising that network analysis of the HRneg/Tneg signature employing two different commercial pathway programs revealed no links to known proliferation pathways but showed direct and indirect connections to several immune/inflammatory nodes, with 8 of the 14 HRneg/Tneg signature genes functionally linked to chemokine regulation and expression (Figure 4). Although the IR, IFN, and STAT1 signature genes are not components of the HRneg/Tneg signature, one gene in this index, CXCL13, was found to correlate significantly with each of the 7 IR genes, suggesting surrogate representation of the IR-7 index within the HRneg/Tneg signature and probably accounting for the weak but consistent association observed between the HRneg/Tneg index and the three other immune-related signatures in both training and validation cohorts. The observation that these three other immune-related signatures correlated much more strongly among themselves supports the possibility that the 14-gene HRneg/Tneg signature contains other non-immune/inflammatory modules, although examples of such pathways were not apparent in our network analysis. To pursue this hypothesis further, we attempted to correlate both the HRneg/Tneg and IR-7 indices with an assessment of lymphocyte infiltration within the cohort tumors. Only a small subset of the dataset tumors was clinically annotated for degree of lymphocytic infiltration [6,29], and an initial analysis of these few Tneg cases suggested a possible trend between the HRneg/Tneg score and the degree of lymphocytic infiltration (data not shown). Therefore, using a reported set of T cell-and B cell-specific genes as surrogate signatures for lymphocytic infiltration in the training and validation cases (Table S2 in Additional file 2) [34], we demonstrated a modest correlation between the HRneg/ Tneg index and both T-cell and B-cell gene expression. By comparison (Table 3), the IR-7 index correlated much more strongly with these lymphocyte-specific gene expression signatures, indicating that the IR-7 index may largely represent the extent of tumor infiltration by T cells and B cells and the HRneg/Tneg index potentially reflects this as well as additional tumor epithelial characteristics.
Of the chemokine-associated genes in the HRneg/ Tneg index, CXCL13 (ligand for the chemokine receptor CXCR5) has been best studied in breast cancer and recently was shown to be the most significantly overexpressed (mRNA and protein) chemokine in a panel of early-stage human breast cancers following a survey of 84 different chemokines [35]. Surprisingly, in this study, breast cancer overexpression of CXCL13 did not correlate with tumor infiltration by leukocytes but instead was immunohistochemically localized to the cytoplasm of the malignant epithelial cells [35]. This study also illustrates the possibility that some HRneg/Tneg signature genes emerge as blood biomarkers since CXCL13 blood levels were found to be specifically increased in patients with breast cancer [35]. Another chemokineassociated HRneg/Tneg gene initially thought to be expressed only in activated neutrophils, PRTN3 (neutrophil-derived serine proteinase 3), was recently shown to be transcriptionally overexpressed in cytokine-exposed epithelial cells, although its expression has not yet been linked to cancer [36,37]. Other chemokine-associated genes in the HRneg/Tneg signature linked to both epithelial and cancer cell expression include EXOC7 (exocyst complex component 7) [38], ABO (blood group glycosylransferases A and B) [39], CLIC5 (chloride intracellular channel 5) [40], RPS28 (40S ribosomal protein S28) [41], HAPLN1 (hyaluronan-and proteoglycanlinked protein 1) [42], and RGS4 (regulator of G-protein signaling 4) [43][44][45][46]. Studies of the last gene also illustrate why transcriptome-derived cancer signatures cannot reliably be extrapolated to protein-based tumorigenic mechanisms without more in-depth evaluation. While RGS4 transcriptional upregulation has been associated with increased viability, invasion, and motility of thyroid cancer, glioma, ovarian ascites, and Tneg breast cancer cells, it has now been shown that RGS4 mRNA and protein levels do not correlate, since (despite high RGS4 transcript levels) RGS4 protein levels must be proteasomally downregulated to enable metastasis [46]. To date, none of the 6 other HRneg/Tneg signature genes not functionally linked to chemokine pathways (PRRG3, RFX7, MATN1, SSX3, HRBL, and ZNF3) has shown any reported association with cancer. IFN: interferon; IR: immune response; IR-7: immune response signature with 7-gene immune response module; MS-14: Celera 14-gene metastasis score; NKI-70: 70-gene Mammaprint signature; ONCO-RS: Oncotype/Genomic Health recurrence score; PAM: prediction analysis of microarrays; STAT1: signal transducer and activator of transcription 1; Tneg: triple-negative.