Novel 18-gene signature for predicting relapse in ER-positive, HER2-negative breast cancer
Breast Cancer Research volume 20, Article number: 103 (2018)
Several prognostic signatures for early oestrogen receptor-positive (ER+) breast cancer have been established with a 10-year follow-up. We tested the hypothesis that signatures optimised for 0–5-year and 5–10-year follow-up separately are more prognostic than a single signature optimised for 10 years.
Genes previously identified as prognostic or associated with endocrine resistance were tested in a publicly available microarray data set using Cox regression of 747 ER+/HER2− samples from post-menopausal patients treated with 5 years of endocrine therapy. RNA expression of the selected genes was assayed in primary ER+/HER2− tumours from 948 post-menopausal patients treated with 5 years of anastrozole or tamoxifen in the TransATAC cohort. Prognostic signatures for 0–10, 0–5 and 5–10 years were derived using a penalised Cox regression (elastic net). Signature comparison was performed with likelihood ratio statistics. Validation was done by a case-control (POLAR) study in 422 samples derived from a cohort of 1449.
Ninety-three genes were selected by the modelling of microarray data; 63 of these were significantly prognostic in TransATAC, most similarly across each time period. Contrary to our hypothesis, the derived early and late signatures were not significantly more prognostic than the 18-gene 10-year signature. The 18-gene 10-year signature was internally validated in the TransATAC validation set, showing prognostic information similar to that of Oncotype DX Recurrence Score, PAM50 risk of recurrence score, Breast Cancer Index and IHC4 (score based on four IHC markers), as well as in the external POLAR case-control set.
The derived 10-year signature predicts risk of metastasis in patients with ER+/HER2− breast cancer similar to commercial signatures. The hypothesis that early and late prognostic signatures are significantly more informative than a single signature was rejected.
Five years of adjuvant endocrine therapy is standard treatment for patients with primary oestrogen receptor-positive (ER+) breast cancer, and it clearly improves prognosis. Multiparametric molecular assays are increasingly used to estimate prognosis and guide treatment decisions for patients with primary ER+ breast cancer. These include the Oncotype DX (OncotypeIQ/Genomic Health, Inc., Redwood City, CA, USA) Recurrence Score (RS), Prosigna PAM50 (NanoString Technologies, Seattle, WA, USA), Breast Cancer Index (BCI), EndoPredict (Myriad Genetics, Zurich, Switzerland) and IHC4. All of them have been evaluated in the TransATAC series of samples that were established from patients with ER+ primary breast cancer randomised to treatment with 5 years of anastrozole or tamoxifen in the ATAC (Arimidex, Tamoxifen, Alone or in Combination) trial. It has become clear that, following surgery, the risk of recurrence in ER+ primary breast cancer is not constant, which is underlined by molecular differences. In TransATAC we have previously shown that the oestrogen module of RS was prognostic within 5 years of surgery (during endocrine therapy); however, it became non-informative for recurrences beyond 5 years, thus weakening the overall prognostic value of RS. In the same data set, patients with high ER expression by RT-PCR were twice as likely to have a relapse 5–10 years after surgery as within the first 5 years. Bianchini et al. reported risk stratification by integrating the mitotic kinase score (MKS) and an oestrogen receptor-related score (ERS), both based on genes constituting the proliferation and oestrogen modules of RS. Women with high MKS and ERS tumours were at greater risk of late recurrence. More recently, improved risk estimation beyond 5 years by RS was reported when integrated with dichotomised ER expression assessed by RT-PCR.
Extending endocrine therapy beyond 5 years has been shown to reduce the late-recurrence rate [11, 12]; however, those most likely to benefit from such therapy need to be identified. Although some of the widely used prognostic assays for ER+ patients have been shown to be prognostic for risk beyond 5 years [13,14,15,16], none of them have been optimised to quantify residual risk after 5 years free from recurrence, and their ability to predict late relapse varies substantially. The different time-dependent performance of multiparametric molecular signatures indicates that molecular features of ER+ breast cancers may be identified to improve prediction of residual risk in order to spare those patients with significantly low risk of late recurrence from extended endocrine therapy.
We therefore hypothesised that prognostic signatures optimised specifically for the early (0–5 years) and late (5–10 years) follow-up periods, respectively, would be more prognostic than a single signature optimised for the whole 10-year follow-up period. To test this hypothesis, we developed time-dependent prognostic signatures in patient samples from the TransATAC series for early, late and 10-year follow-up periods. The prognostic performance was tested in an independent sample set and against commercial signatures already assessed in TransATAC. Our primary aim was to compare the prognostic value of the newly developed signature(s) added to Clinical Treatment Score (CTS) with that of PAM50 risk of recurrence (ROR) based on subtype and proliferation added to CTS.
Our initial analysis drew from four published breast cancer cohorts (GSE6532, GSE9195, GSE17705, GSE26971) analysed on either of the Affymetrix Human Genome HG-U133A (GPL96) and HG-U133 Plus 2.0 (GPL570) microarray platforms (Affymetrix, Santa Clara, CA, USA). The two platforms shared 22,277 probes to which we restricted our analyses. This cohort had 747 unique patient samples that matched our selection criteria: ER+, HER2−, treated with 5 years of endocrine therapy, chemotherapy-naive, with information on either distant metastasis-free survival (DMFS) or relapse-free survival (RFS) available with a long follow-up. Details of the inclusion criteria are listed in Additional file 1: Methods, and a full list of samples included in the analysis is shown in Additional file 2: Table S1.
In the TransATAC cohort, RNA was available from 948 formalin-fixed, paraffin-embedded (FFPE) tumours from the ATAC trial, previously extracted by Genomic Health Inc. (GHI). Eligibility required hormone receptor-positive/HER2− disease, without chemotherapy treatment and at least 500 ng of RNA available. One hundred eighty-three recurrence events were recorded for this cohort. This study was approved by the South-East London Research Ethics Committee, and all patients gave informed consent.
The POLAR (Predictors Of early versus LAte Recurrence in ER+ breast cancer) samples were identified from archives of Royal Marsden Hospital (RMH), London, UK, and Lund University Hospital Biobank, Lund, Sweden. Eligibility criteria were patients with ER+/HER2− early breast cancer diagnosed between January 2000 and December 2004, treated with curative intent and with a follow-up data cut-off at May 2014. Patients must have received 5 years of adjuvant endocrine therapy (unless relapse occurred within this time); (neo)adjuvant chemotherapy was permitted. A 422-sample case-control design was used; control subjects were randomly selected according to matching criteria from among the remaining cohort of patients who did not relapse during follow-up. The total number of patients drawn upon was 1449. The following four matching criteria were used in this study: (1) age at diagnosis (< 50 or > 50 years), (2) Nottingham Prognostic Index (NPI) category (< 3.4, 3.4–5.4, > 5.4), (3) type of adjuvant endocrine therapy (tamoxifen only vs. any aromatase inhibitor [AI]) and (4) chemotherapy use (yes or no). Two hundred forty-seven recurrence events were recorded. The POLAR study was approved by the RMH Research Ethics Committee (CCR 4122) and the ethics committee of Lund University Hospital (LU 240-01).
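The matching step can be illustrated with a minimal sketch. It assumes one control per case drawn without replacement from the same stratum; the matching ratio, the handling of a patient aged exactly 50 and the field names (`age`, `npi`, `endocrine`, `chemo`) are assumptions made for illustration and are not taken from the study.

```python
import random
from collections import defaultdict


def stratum(patient):
    """Build the matching key from the four criteria listed in the text."""
    # Assumption: age exactly 50 is placed in the older band.
    age_band = "<50" if patient["age"] < 50 else ">=50"
    npi = patient["npi"]
    npi_band = "low" if npi < 3.4 else ("mid" if npi <= 5.4 else "high")
    return (age_band, npi_band, patient["endocrine"], patient["chemo"])


def select_controls(cases, pool, seed=0):
    """Randomly pair each case with one unused control from the same stratum.

    Returns a list of (case, control) tuples; control is None when the
    stratum has no remaining candidates.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for candidate in pool:
        by_stratum[stratum(candidate)].append(candidate)
    for candidates in by_stratum.values():
        rng.shuffle(candidates)

    matched = []
    for case in cases:
        candidates = by_stratum.get(stratum(case), [])
        matched.append((case, candidates.pop() if candidates else None))
    return matched
```

Drawing without replacement keeps each control unique, at the cost of leaving some cases unmatched when a stratum is small.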
The primary endpoint was time to any recurrence, which was defined as locoregional (ipsilateral breast, contralateral breast and regional lymph nodes) and/or distant recurrence. Secondary endpoint was time to distant recurrence, which was the time from diagnosis until metastasis from the primary tumour at distant organs, excluding contralateral disease and locoregional and ipsilateral recurrences. Death before recurrence was treated as a censoring event for both endpoints.
In the microarray data set, 454 probes representing 454 genes (Additional file 2: Table S3) were analysed at univariate level; those significant in univariate analyses in a particular setting were entered into multivariable analyses. Further details are provided in Additional file 1: Methods.
For TransATAC, RNA was extracted by GHI for the RS study. RNA (100 ng) was used with the nCounter platform (NanoString Technologies) to assay the 93 endogenous and 7 reference genes selected in the process of the microarray expression analysis in 948 TransATAC samples.
For POLAR, RNA was extracted from three 3 × 10-μm unstained sections with more than 40% tumour cellularity using the RNeasy FFPE kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. RNA was quantified by using a NanoDrop instrument (Thermo Fisher Scientific, Wilmington, DE, USA). Between 50 and 200 ng of RNA was used to profile the expression of 27 endogenous and 5 reference genes with the NanoString nCounter.
NanoString expression data were background-corrected by subtracting the mean of the eight negative control probes and normalised with the geometric mean of five reference genes that had a correlation of Pearson’s r > 0.8 with all endogenous genes. The data set was then logarithmically (base 2) transformed and z-score-transformed. The KIF20A gene was detected in < 10% of samples in the TransATAC cohort and was removed from the data set. CTS, which carries information on tumour size, nodal status, grade, age and type of endocrine therapy, was calculated as published previously.
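The order of these preprocessing steps can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the floor applied after background subtraction, the per-gene z-scoring across samples and all function and probe names are hypothetical, and the selection of reference genes by correlation is not shown.

```python
import math
from statistics import mean, stdev


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(mean(math.log(v) for v in values))


def normalise_sample(counts, negative_probes, reference_genes, floor=1.0):
    """Background-correct, reference-normalise and log2-transform one sample.

    counts: dict mapping probe name -> raw count for a single sample.
    floor:  assumed lower bound applied after background subtraction so that
            the logarithm is defined (not specified in the text).
    """
    background = mean(counts[p] for p in negative_probes)
    corrected = {
        gene: max(count - background, floor)
        for gene, count in counts.items()
        if gene not in negative_probes
    }
    scale = geometric_mean([corrected[g] for g in reference_genes])
    return {
        gene: math.log2(value / scale)
        for gene, value in corrected.items()
        if gene not in reference_genes
    }


def zscore_across_samples(log2_samples):
    """Z-score each gene across samples (list of dicts from normalise_sample)."""
    genes = list(log2_samples[0])
    scaled = [{} for _ in log2_samples]
    for gene in genes:
        values = [sample[gene] for sample in log2_samples]
        mu, sd = mean(values), stdev(values)
        for out, value in zip(scaled, values):
            out[gene] = (value - mu) / sd
    return scaled
```

In use, `normalise_sample` would be called once per sample with the eight negative control probes and five reference genes, and the resulting list passed to `zscore_across_samples`.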
We trained separate early, late and 10-year signatures by performing elastic net analysis in the TransATAC training cohort. Our objective was to test if the early and late signatures had statistically significantly more prognostic power than the 10-year signature. If so, we would test the validity of the early and late signatures in the non-chemotherapy-treated subpopulation of POLAR and also test their performance in the chemotherapy-treated POLAR cohort. If the early and late signatures were not statistically significantly more prognostic than the overall signature, we would test the validity of the overall signature in the chemotherapy-naive POLAR group and explore its performance in the chemotherapy-treated POLAR group.
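One common way to set up the three follow-up periods for time-to-event modelling is sketched below. The details are assumptions for illustration: the early window censors everyone still at risk at 5 years, and the late window includes only patients who are recurrence-free and under follow-up at 5 years. The study's exact handling is described in its supplementary methods and may differ; the names `restrict_to_window` and `build_window_datasets` are hypothetical.

```python
WINDOWS = {"early": (0, 5), "late": (5, 10), "10-year": (0, 10)}


def restrict_to_window(time_years, event, start, end):
    """Restrict one patient's follow-up to the window (start, end].

    Returns (time_since_window_start, event_flag), or None if the patient had
    already recurred or been censored by the start of the window.
    """
    if time_years <= start:
        return None
    if time_years > end:
        # Still at risk at the end of the window: censor there.
        return (end - start, 0)
    return (time_years - start, event)


def build_window_datasets(patients):
    """patients: iterable of (patient_id, time_years, event) tuples."""
    datasets = {}
    for name, (start, end) in WINDOWS.items():
        rows = []
        for patient_id, time_years, event in patients:
            restricted = restrict_to_window(time_years, event, start, end)
            if restricted is not None:
                rows.append((patient_id, *restricted))
        datasets[name] = rows
    return datasets
```

Under this construction the late data set is necessarily smaller than the 10-year one, because patients who recur or are censored in the first 5 years drop out of it.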
Statistical analyses of the cohort with microarray data were carried out at the Institute of Cancer Research (ICR) using R version 3.0.3 software (R Foundation for Statistical Computing, Vienna, Austria). Statistical analyses using the TransATAC cohort were performed at Queen Mary University of London with STATA version 13.1 (StataCorp, College Station, TX, USA) and R version 3.0.3 software. Statistical work on POLAR was carried out at RMH using the Statistical Analysis Plan version 2.0 and Prism 6.0c (GraphPad Software, La Jolla, CA, USA) software. Before data analysis took place, the statistical analysis plan for the TransATAC study was approved by the Long-term Anastrozole vs Tamoxifen Treatment Effects committee and that for the POLAR study was approved by the RMH Committee for Clinical Research, and these plans are described in Additional file 1: Methods. All statistical tests were two-sided.
We performed the following steps in our study. We used publicly available microarray data to generate lists of prognostic genes to be analysed in the TransATAC cohort. We developed early, late and 10-year prognostic signatures in a training data set (two-thirds of TransATAC) while setting aside a test set (one-third of TransATAC) so that the performance of the newly trained signatures could be evaluated. This internal validation included comparison with commercial signatures of BCI, Oncotype DX RS, PAM50 ROR and IHC4. Finally, we conducted an external validation in the POLAR case-control sample set.
Candidate gene selection and microarray expression data analysis
In order to derive time-dependent prognostic signatures, we shortlisted 585 candidate genes representing proliferation, oestrogen signalling, immune infiltration and immune signalling. These genes were tested for prognostic significance in publicly available gene expression sets of ER+ endocrine therapy-treated breast cancer. A flowchart illustrating the approach is shown in Fig. 1. Sixty-seven genes of interest that are part of the PAM50, Oncotype DX RS, EndoPredict and BCI profilers were also included. Additional genes likely to be related to benefit from endocrine therapy were identified from 81 patients by reanalysing our previously published neoadjuvant endocrine therapy-treated set of samples (https://www.synapse.org/#!Synapse:syn16243). From this dataset, we identified 164 candidate genes by examining correlation of individual gene expression from untreated biopsies with change in the following after 2 weeks of AI treatment: (1) Ki-67, (2) proliferation-associated gene cluster, (3) oestrogen-associated gene cluster, and (4) expression of the modified version of the Global Index of Dependence on Estrogen genes. An additional 354 genes were selected on the basis of literature searches. Genes from published gene modules of the proliferation-associated gene cluster, oestrogen-associated gene cluster and inflammatory response signature, the tumour invasion/metastasis module (PLAU) and IGG-14 module (immunoglobulin-gamma) were also included. The complete list of candidate genes and the reason for their inclusion are detailed in Additional file 2: Table S1.
Seven hundred forty-seven samples from the microarray expression dataset were compiled from four publicly available breast cancer cohorts to investigate the relationship between genes and outcome (Additional file 2: Table S2) [5, 23,24,25]. Expression data were available for 454 genes (Additional file 2: Table S3). We performed univariate Cox proportional hazards regression analyses for early, late and 10-year follow-up periods using RFS and DMFS as endpoints, respectively (six analyses), that identified 212 genes that were significant at p < 0.01 in any of the analyses (Additional file 2: Table S4). Genes significantly prognostic in a particular time period were taken forward for multivariable analyses performed by Cox proportional hazards regression with DMFS and RFS as endpoints, respectively, in the early, late and 10-year follow-up settings (six analyses). This resulted in 88 genes being selected in the models (Additional file 2: Table S5), of which 17 genes were removed owing to high correlation of expression with other candidates already selected (Additional file 2: Table S6). An additional 29 genes were added that included candidates without probes available in the microarray expression data analyses, some recently emerging candidates and also seven reference genes (Additional file 2: Table S7).
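The removal of highly correlated candidates can be illustrated with a greedy filter. This is only a sketch: the correlation threshold (0.8 here), the use of absolute Pearson correlation and the ranking of genes are assumptions, since the text does not state how "high correlation" was defined.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(expression, ranked_genes, threshold=0.8):
    """Keep a gene only if it is not highly correlated with any gene kept so far.

    expression:   dict mapping gene -> list of expression values across samples.
    ranked_genes: genes in order of priority (highest priority first).
    threshold:    hypothetical cut-off on |r|; not taken from the study.
    """
    kept = []
    for gene in ranked_genes:
        if all(
            abs(pearson_r(expression[gene], expression[other])) < threshold
            for other in kept
        ):
            kept.append(gene)
    return kept
```

Because the filter is greedy, the result depends on the order of `ranked_genes`; ranking by strength of association with outcome is one plausible choice.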
Expression profiling and signature building in TransATAC
Sample availability in TransATAC is shown in Fig. 2a. Expression data for the 100 selected genes (including housekeeping genes) (Additional file 2: Table S9) were obtained for 948 patient samples in TransATAC using the NanoString nCounter. We assessed the prognostic value of these molecular variables in TransATAC for early, late and 10-year time periods for RFS. Sixty-three genes were statistically significant in at least one of the time windows assessed (Additional file 2: Table S7, Additional file 3: Figure S1). We found different prognostic properties between early and late periods for 20 genes. Six genes were prognostic early but not in the late period (CD79, IL6ST, LRRC48, MPZL1, PGR and PIGV), and 14 genes were not significantly prognostic early but gained prognostic significance in the late setting (ANP32E, ANXA1, CTSL2, EPB41L2, ESR1, FOXA1, ICOS, IL17RB, MMP9, MYCBP2, NR2F1, PDZK1, SLAMF8 and TCF7L2).
The TransATAC cohort was then randomly split into two-thirds (n = 634) training and one-third (n = 314) validation sets while ensuring that the recurrence rate was similar in the two subgroups. Demographics for the training, validation and overall cohorts are presented in Table 1. We aimed to select prognostic variables independent of clinicopathological features that are commonly used for prognosis. To achieve this, on top of the 63 statistically significant genes in univariate analyses, CTS was also entered into multivariable selections for early, late and 10-year time-periods, respectively. Elastic net penalised Cox regression with leave-one-out cross-validation was used for feature selection in the TransATAC training set. CTS was selected in all three signatures in addition to 18 genes in the 10-year, 16 genes in the early, and 15 genes in the late follow-up analyses. The variables and their coefficients derived from the elastic net models are listed in Table 2. CTS had the highest coefficient in each of the time periods.
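A random split that keeps the recurrence rate similar in both subsets can be obtained by splitting patients with and without an event separately. The sketch below shows this idea only; it is not the procedure used in the study, its subset sizes will not necessarily reproduce the reported 634 and 314, and the `event` field name is hypothetical.

```python
import random


def stratified_split(patients, train_fraction=2 / 3, seed=0):
    """Split patients into training and validation sets, stratified by event.

    patients: list of dicts with an "event" key (1 = recurrence, 0 = none).
    """
    rng = random.Random(seed)
    train, valid = [], []
    for status in (0, 1):
        group = [p for p in patients if p["event"] == status]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


def event_rate(patients):
    """Proportion of patients with a recorded event."""
    return sum(p["event"] for p in patients) / len(patients)
```

Comparing `event_rate(train)` with `event_rate(valid)` after the split gives a quick check that the two subsets are balanced on outcome.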
Comparison of time period-optimised prognostic signatures in TransATAC validation set
The TransATAC validation set was used to validate and compare the prognostic information of the three time period-dependent signatures (Table 3). In the 0–10-year follow-up period, all three newly derived signatures were significantly prognostic, with the late signature being significantly less informative than the 10-year signature (10-year signature likelihood ratio chi-square test [LRχ2] = 28.0; early signature LRχ2 = 33.4; late signature LRχ2 = 18.1). In the 0–5-year period, the 10-year signature and early signature were equally prognostic and significantly more than the late signature (LRχ2 for 10-year signature = 14.1; LRχ2 for early signature = 14.9; LRχ2 for late signature = 8.9). In the late setting, the early signature was the most prognostic, followed by the 10-year and late signatures (LRχ2 for 10-year signature = 13.9; LRχ2 for early signature = 18.6; LRχ2 for late signature = 9.3). CTS was strongly prognostic in all three time periods (CTS 0–10-year LRχ2 = 48.7; CTS 0–5-year LRχ2 = 29; CTS 5–10-year LRχ2 = 19.8).
For the 0–10-year period, all three signatures added statistically significant prognostic information beyond that of the CTS (ΔLRχ2 for 10-year signature = 7.9; ΔLRχ2 for early signature = 10.3; ΔLRχ2 for late signature = 4.3). In the 0–5-year period none of the signatures added significant prognostic information to CTS. However, in the 5–10-year period, the 10-year and early signatures added statistically significant prognostic information to CTS (10-year signature ΔLRχ2 = 4.8; early signature ΔLRχ2 = 8.0; late signature ΔLRχ2 = 2.7).
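To make the ΔLRχ2 values easier to interpret, the following sketch converts them to p-values. It assumes that each ΔLRχ2 is the difference between the LRχ2 of the model with CTS plus a signature and that of CTS alone, and that it is referred to a chi-square distribution with 1 degree of freedom (one added covariate). The degrees of freedom are an assumption, although the reported significance pattern is consistent with the corresponding 5% critical value of about 3.84.

```python
import math


def chi2_sf_1df(statistic):
    """Upper-tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


def added_information(lr_combined, lr_baseline):
    """Return (delta LR chi-square, p-value) for one added covariate."""
    delta = lr_combined - lr_baseline
    return delta, chi2_sf_1df(delta)


# Delta LR chi-square values reported in the text for signatures added to CTS.
reported_delta = {
    "0-10 years": {"10-year": 7.9, "early": 10.3, "late": 4.3},
    "5-10 years": {"10-year": 4.8, "early": 8.0, "late": 2.7},
}

if __name__ == "__main__":
    for period, signatures in reported_delta.items():
        for name, delta in signatures.items():
            print(f"{period}, {name} signature: "
                  f"delta = {delta}, p = {chi2_sf_1df(delta):.4f}")
```

The identity used in `chi2_sf_1df` follows from the fact that a 1-df chi-square variable is the square of a standard normal variable.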
Given that the early and the late signatures were not statistically significantly more prognostic than the 10-year signature in the respective periods they were optimised for, we rejected our primary hypothesis that signatures optimised separately for the early and the late follow-up periods, respectively, are more prognostic than a 10-year signature, but we proceeded to assess the validity of the 18-gene, 10-year signature in an independent cohort and to compare its performance with that of commercial signatures.
Signature test of 10-year validity in POLAR cohort
A matched case-control set of samples was compiled from RMH and Lund University Hospital archives (POLAR) to validate the 10-year signature (Fig. 2b, Table 1). Our aims were to test the validity of the 10-year signature in an endocrine therapy-only cohort similar to the training set and also to explore if the prognostic property (if any) extends to a higher-risk, chemotherapy-treated population. The latter cohort was of interest in the 5–10-year period because of the potential for its use in selecting patients for extended adjuvant endocrine therapy.
Despite having matched cases and controls on NPI category, the CTS was still higher in cases than in control subjects: 201.9 ± 98 (SD) vs. 170.8 ± 87.6 (p = 0.0009), respectively. In a univariate analysis, CTS had an OR of 1.004 (95% CI, 1.001–1.006) for a one-unit increase. We assessed a multivariable model with CTS and the 10-year signature, and both were found to be statistically significant: 10-year signature OR = 1.851 (95% CI, 1.194–2.868), p = 0.006; CTS OR = 1.003 (1.001–1.005), p = 0.012.
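An OR of 1.004 per unit looks small only because one CTS unit is small relative to the spread of the score (SDs of roughly 90–100 above). The sketch below rescales the per-unit OR to larger differences, assuming the log-odds are linear in CTS as in a standard logistic model. Because the reported OR is rounded to three decimals, the rescaled values are approximate.

```python
def rescale_odds_ratio(or_per_unit, units):
    """Odds ratio for a difference of `units`, given the OR per one unit.

    Assumes a logistic model in which the log-odds are linear in the score,
    so odds ratios multiply across units.
    """
    return or_per_unit ** units


if __name__ == "__main__":
    # Difference in mean CTS between cases and controls reported in the text.
    mean_difference = 201.9 - 170.8
    print(f"OR per {mean_difference:.1f} units: "
          f"{rescale_odds_ratio(1.004, mean_difference):.2f}")
    print(f"OR per 100 units: {rescale_odds_ratio(1.004, 100):.2f}")
```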
We also assessed whether the 10-year signature added significant prognostic information above CTS alone using LR tests (Table 4, Additional file 4: Table S10). In the overall POLAR cohort (n = 422), CTS was prognostic across 10 years and in the early follow-up period (CTS 0–10-year period LRχ2 = 11.23; 0–5-year period LRχ2 = 22.09), but not in the 5–10-year period. The 10-year signature was prognostic in all three follow-up periods and contributed to CTS with significant prognostic information in the 10-year and early periods (0–10-year period ΔLRχ2, CTS + 10-year signature vs. CTS = 7.74; 0–5-year period ΔLRχ2, CTS + 10-year signature vs. CTS = 7.59), but not in the 5–10-year period. Both CTS and the 10-year signature were marginally more informative across the 10 years in the chemotherapy-treated POLAR cohort than in the endocrine therapy-only population, despite the latter having more patients and events (patients, n = 170 vs. n = 252; events, 99 vs. 148). Additionally, the 10-year signature added significantly more prognostic information to CTS in the chemotherapy-treated group (ΔLRχ2: CTS + 10-year signature vs. CTS = 6.71) than among those receiving endocrine therapy only (ΔLRχ2, CTS + 10-year signature vs. CTS = 2.47).
Prognostic properties of the 18 individual genes constituting the 10-year signature were assessed in POLAR and compared with data obtained in TransATAC. In POLAR, only 8 of the 18 genes were significantly prognostic at the univariate level (Fig. 3), but all genes except tumour necrosis factor-alpha (TNF) showed the same prognostic direction both in TransATAC and in POLAR.
Comparison of the 10-year signature with CTS, RS, PAM50 ROR, BCI and IHC4 in TransATAC
We have previously published data on the prognostic performance of CTS, RS, PAM50 ROR, BCI and IHC4 in TransATAC [6, 15, 18, 26]; data for all scores were available for 271 patients in the validation cohort. We assessed their prognostic information for 10 years after surgery using any recurrence and distant recurrence as endpoints, respectively, and compared them with the newly developed 10-year signature (Table 5). For both any and distant recurrence, the BCI provided the most added information beyond the CTS in this set (any recurrence, CTS LRχ2 = 37.4; BCI ΔLRχ2 = 9.5; distant recurrence, CTS LRχ2 = 46.7; BCI ΔLRχ2 = 14.5, respectively). The novel 10-year signature performed similarly to the other three scores in this respect.
We developed novel time-specific prognostic signatures for early, late and 10-year follow-up periods for ER+/HER2− patients treated with endocrine therapy alone to allow us to test the hypothesis that sequentially applying early and late signatures could be more prognostic for risk of relapse than a single newly developed 10-year signature. This hypothesis was based largely on our observation that the performance of some components in many of the commercially available signatures varied between these time periods. For example, we found that ESR1 and the oestrogen module overall in the RS were less prognostic in years 5–10 than in years 0–5. Analogous findings were reported by Bianchini et al. Very recently, the Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) published clinicopathological and limited immunohistochemical data on over 60,000 women who were treated with 5 years of endocrine therapy. Although progesterone receptor showed strong prognostic performance in years 0–5, it showed no significant relationship with prognosis thereafter. These data on markers associated with hormone responsiveness support the contention, but by no means prove, that cessation of endocrine treatment at 5 years may lead to increased recurrence risk in more hormonally responsive tumours. We therefore included in our assessment genes that we and others have found to be associated with the anti-proliferative response of primary ER+ breast cancer to oestrogen deprivation. Our work involved a discovery set of 747 samples; training and test sets of 634 and 314 TransATAC samples, respectively; and an independent case-control series from 1449 eligible samples. As such, this was one of the largest original gene expression analyses undertaken for evaluating prognosis in ER+ breast cancer.
Of the 92 genes selected from microarray data and assessed in univariate analyses in TransATAC, we found 63 to be significantly prognostic (p < 0.05) in any of the three time periods, which is considerably more than expected by chance after allowing for multiple testing errors. For most genes, the same prognostic pattern was observed for early and late periods; however, we observed some possibly different prognostic properties for 20 genes. Notably, consistent with the above arguments, higher levels of ESR1 and its pioneer factor FOXA1 showed a shift at 5 years to be associated with worse prognosis beyond 5 years, but surprisingly over the 10-year period, the two genes were associated with poor prognosis. The complementary role whereby upon stimulus ER binding to chromatin is dependent on the presence of FOXA1 is well established. In our dataset, FOXA1 and ESR1 correlated highly (Pearson’s r = 0.65); the possibility that increased expression of one or both may put patients at increased risk of late relapse merits further investigation, particularly with regard to whether the genes also identify patients who benefit from extended adjuvant therapy.
The optimised time-dependent signatures derived in the TransATAC training set were rather similar to one another in makeup. All genes in the 10-year signature featured in either (or both) of the early and late signatures with their coefficients being in the same direction. The early and late signatures had five and three variables, respectively, not present in the 10-year signature, suggesting that the early and late signatures may not have captured time-specific features or that such time-specific features that exist exert a minor modulatory influence on the overall prognosis over 10 years. It is notable that CTS was consistently the most prognostic variable in the three time-dependent models and that its contribution was similar in both early and late recurrence. This is consistent with the data of the EBCTCG that classical clinicopathological features retain their strong prognostic influence beyond 5 years.
Given that the 10-year signature captured prognostic features of both early and late events, it is perhaps not surprising that no improvement was seen in the use of early and late signatures compared with the overall 10-year signature, which led to the rejection of our hypothesis. Also, it should be noted that splitting of the 0–10-year time period into 0–5- and 5–10-year periods markedly reduces the power to detect prognostic contributions. At least a contributory factor for the lack of improvement may be the dominance of proliferation-related genes in our and other signatures. As shown in our earlier analysis of the RS, each of the individual proliferation genes and the integrated module are equally prognostic before and after 5 years. Notably, this is also supported by the observation by the EBCTCG that Ki-67 was equally prognostic before and after 5 years in their overview analysis of late recurrence.
The 10-year signature was nonetheless validated in the POLAR sample set and provided significant prognostic information in both chemotherapy-naive and chemotherapy-treated cohorts. Moreover, it added independent prognostic information beyond that of CTS in the POLAR cohort. Comparison of the information provided by each gene showed that 8 of the 18 genes were significantly prognostic at univariate level in POLAR (4 genes at P < 0.05, 2 genes at P < 0.01 and 3 genes at P < 0.001). TNF showed an opposite prognostic direction in training and validation sets, thus weakening the performance of the signature in POLAR. TNF is a versatile pro-inflammatory cytokine that has both pro- and anti-tumour activities promoting lymphocytic infiltration and activating the nuclear factor-κB, c-Jun N-terminal kinase and mitogen-activated protein kinase pathways, and it is capable of inducing apoptosis through TNF receptors 1 and 2. It may be that the inclusion of higher-risk, chemotherapy-treated patients in POLAR contributed to the difference in TNF’s prognostic pattern; further investigation is needed to explain the relationship of TNF and risk of relapse in these cohorts.
The 10-year signature was compared with established prognostic signatures in the TransATAC validation set. Importantly, the 10-year signature was developed for the endpoint of any recurrence, in contrast to the endpoint of distant recurrence used in the development of RS, PAM50 ROR, BCI and IHC4. In univariate assessments, BCI and the 10-year signature were the most informative for both any and distant recurrence. When added to CTS, all signatures assessed provided similar amounts of information, with CTS + BCI being the most informative for distant recurrence. This new signature did not outperform the established signatures, even though it was based on a large and wide-ranging analysis of both established prognostic genes and novel genes with a clear rationale for inclusion. Larger studies may be needed to fully optimise novel prognostic signatures with improved prognostic information; however, the data from our studies indicate that the gain is unlikely to be large. Other approaches, such as those that assess response to treatment, integrate mutational and DNA copy number profiles, or use circulating tumour DNA, are likely to be more fruitful.
The results presented here support the mounting evidence that better risk estimation can be achieved by combining molecular profilers with clinicopathological factors. For the three time-dependent signatures derived in TransATAC, CTS was the most prognostic in all three time-dependent signatures and provided more prognostic information than RS, ROR, BCI and IHC4, respectively. Additionally, all profilers added significant prognostic information to CTS, leading to combined signatures being significantly more informative. There is emerging evidence for genetic differences affecting outcome amongst various racial groups. Although this is an important question with practical consequences, the cohort presented here was > 99% Caucasian and did not provide us with the opportunity to examine this within TransATAC.
Our study has strengths and limitations. An advantage was that a large discovery cohort of 634 samples was used for signature training. All tumours were ER+/HER2− from post-menopausal patients who had received 5 years of endocrine therapy without chemotherapy. This was a homogeneous group of breast cancers, which reduced confounding factors such as tumour subtype and differing treatment lengths and types. Data for the clinical prognostic tests were obtained by the same methods as set out by the tests’ developers. The same batch of RNA was used for the newly developed signatures presented here and for the clinical prognostic tests used in the comparisons, reducing intra-sample variation. The clinical data were derived from a registration standard trial with comprehensive follow-up over 10 years. Limitations include that the candidate gene selection based on microarray data and associated clinical information from multiple studies did not allow the assessment of candidates by taking multiple clinical variables into account; this may have limited the performance of derived signatures that ultimately included CTS as a variable. Also, CTS, IHC4 and the 10-year signature were derived in TransATAC; therefore, their performance in the comparisons was slightly overestimated compared with what we would see in independent cohorts. Finally, although this study was relatively large compared with others, the splitting of the data into early and late signatures decreased the statistical power for comparisons within those time periods. The approach we have taken is likely to have somewhat overfitted the 10-year signature to the TransATAC population. An alternative approach for the derivation and validation of the 10-year signature would have been to fit the signature to the whole of the TransATAC cohort and validate it in the POLAR cohort. However, the approach we took allowed the comparison of the 10-year signature with commercially available signatures in the TransATAC test set. Had the 10-year signature not at least matched these, it would not have been worth proceeding further.
In summary, we found that early and late signatures are unlikely to be more informative for predicting relapse than a single signature optimised for 10 years. Larger studies may be needed to fully optimise novel gene expression signatures for prognosis in endocrine-treated patients with ER+ breast cancer; however, a substantial improvement in performance is unlikely.
ATAC: Arimidex, Tamoxifen, Alone or in Combination
BCI: Breast Cancer Index
CTS: Clinical Treatment Score
DMFS: Distant metastasis-free survival
EBCTCG: Early Breast Cancer Trialists’ Collaborative Group
ERS: Oestrogen receptor-related score
GHI: Genomic Health Inc.
IHC4: Score based on four IHC markers
LUH: Lund University Hospital
MKS: Mitotic kinase score
NPI: Nottingham Prognostic Index
POLAR: Predictors Of early versus LAte Recurrence in ER+ breast cancer
RMH: Royal Marsden Hospital
ROR: Risk of recurrence
RS: Oncotype DX Recurrence Score
TNF: Tumour necrosis factor
We acknowledge Professor Mårten Fernö and Professor Per Malmström for their work in providing data and samples for the Lund cohort. We thank Genomic Health Inc., NanoString Technologies and BioTheranostics for the data of their respective gene signatures. The authors acknowledge Kabir Mohammed’s work on calculating C-indices in the POLAR cohort.
This work was supported by Breast Cancer Now working in partnership with Walk the Walk, as well as by the National Institute for Health Research Royal Marsden/ICR Biomedical Research Centre. ARB was funded by Cancer Research UK (grant number C569/A16891). The study was supported by funds from Skåne County Council’s Research and Development Foundation, Governmental Funding of Clinical Research within the National Health Service (grant number ALFSKANE-350191 [to MK]), the Swedish Breast Cancer Association (BRO), the Mrs Berta Kamprad Foundation and The Inger Persson Research Foundation. IS and JC were supported by Cancer Research UK (programme grant C569/A10404).
Availability of data and materials
Please contact the corresponding author for data requests.
Ethics approval and consent to participate
The TransATAC study was approved by the South-East London Research Ethics Committee, and all patients gave informed consent. The POLAR study was approved by the RMH Research Ethics Committee (CCR: 4122) and the ethics committee of Lund University Hospital (LU 240-01), and all patients gave informed consent.
Consent for publication
Competing interests
MCUC reports patents, royalties, other intellectual property: PAM50 patent. The other authors declare that they have no competing interests.
Methods. Additional methods. (DOCX 21 kb)
Table S1. List of 585 candidate genes. Table S2. List and identifiers for the 747-patient microarray expression data cohort. Table S3. List of 454 Affymetrix probes studied. Table S4. List of 212 genes significantly prognostic (p < 0.01) in any of the three time periods in the microarray data. Table S5. List of 88 genes by multivariable selections in any of the three time periods in the microarray data. Nodal status was used as a covariate in the regressions. Table S6. List of 17 genes manually removed from the multivariable list. Table S7. List of 29 genes added to the candidate list. Table S8. Details of the 100-probe NanoString code set used in TransATAC. Table S9. HRs, CIs and p values for the 92 genes assessed in TransATAC in univariate analyses. (XLSX 130 kb)
Figure S1. Forest plot of HRs and CIs for the 92 genes assessed in TransATAC in univariate analyses. Asterisk denotes significance. (PDF 1497 kb)
Table S10. Likelihood ratio (LR) χ2 and p values for CTS and 10-year signature in three groups of POLAR validation set for 0–5 and 5–10 years of follow-up. Both univariate and multivariable analyses are presented for years 0–10, years 0–5, and years 5–10 separately. LR test based on Cox proportional hazards models for univariate and multivariable analyses. Differences in LR values (ΔLRχ2) were used. CTS was used as a covariate in the multivariable regressions. POLAR Molecular Predictors Of early versus LAte Recurrence in ER-positive breast cancer, CTS Clinical Treatment Score. (DOCX 15 kb)
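The ΔLRχ2 comparison described for Table S10 can be illustrated with a minimal sketch. It assumes that the signature enters the Cox model as a single continuous score, so that the added information over CTS is tested on 1 degree of freedom; the function names and the log partial likelihood values are hypothetical and are not taken from the study.

```python
import math


def lr_chi2(loglik_null: float, loglik_model: float) -> float:
    """Likelihood ratio chi-square: twice the gain in log partial likelihood."""
    return 2.0 * (loglik_model - loglik_null)


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def delta_lr_chi2(loglik_null: float, loglik_cts: float, loglik_full: float):
    """Added prognostic information of a signature over CTS alone.

    loglik_null: log partial likelihood of the Cox model with no covariates
    loglik_cts:  model with CTS only
    loglik_full: model with CTS plus the signature score (assumed 1 extra term)
    """
    lr_cts = lr_chi2(loglik_null, loglik_cts)
    lr_full = lr_chi2(loglik_null, loglik_full)
    delta = lr_full - lr_cts
    return delta, chi2_sf_1df(delta)


# Hypothetical log partial likelihoods, for illustration only.
delta, p_value = delta_lr_chi2(-520.0, -505.0, -498.0)
print(f"delta LR chi2 = {delta:.1f}, p = {p_value:.2g}")
```

In practice the log partial likelihoods would come from fitted Cox proportional hazards models; a signature contributing more than one term would need the chi-square tail for the corresponding degrees of freedom instead of the 1-df shortcut used here.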