  • Research article
  • Open access

Prognostic value of automated KI67 scoring in breast cancer: a centralised evaluation of 8088 patients from 10 study groups

Abstract

Background

The value of KI67 in breast cancer prognostication has been questioned due to concerns about the analytical validity of visual KI67 assessment and methodological limitations of published studies. Here, we investigate the prognostic value of automated KI67 scoring in a large, multicentre study, and compare this with pathologists’ visual scores available in a subset of patients.

Methods

We utilised 143 tissue microarrays containing 15,313 tumour tissue cores from 8088 breast cancer patients in 10 collaborating studies. A total of 1401 deaths occurred during a median follow-up of 7.5 years. Centralised KI67 assessment was performed using an automated scoring protocol. The relationship of KI67 levels with 10-year breast cancer specific survival (BCSS) was investigated using Kaplan–Meier survival curves and Cox proportional hazard regression models adjusted for known prognostic factors.

Results

Patients in the highest quartile of KI67 (>12 % positive KI67 cells) had a worse 10-year BCSS than patients in the lower three quartiles. This association was statistically significant for ER-positive patients (hazard ratio (HR) (95 % CI) at baseline = 1.96 (1.31–2.93); P = 0.001) but not for ER-negative patients (1.23 (0.86–1.77); P = 0.248) (P-heterogeneity = 0.064). In spite of differences in characteristics of the study populations, the estimates of HR were consistent across all studies (P-heterogeneity = 0.941 for ER-positive and P-heterogeneity = 0.866 for ER-negative). Among ER-positive cancers, KI67 was associated with worse prognosis in both node-negative (2.47 (1.16–5.27)) and node-positive (1.74 (1.05–2.86)) tumours (P-heterogeneity = 0.671). Further classification according to ER, PR and HER2 showed statistically significant associations with prognosis among hormone receptor-positive patients regardless of HER2 status (P-heterogeneity = 0.270) and among triple-negative patients (1.70 (1.02–2.84)). Model fit parameters were similar for visual and automated measures of KI67 in a subset of 2440 patients with information from both sources.

Conclusions

Findings from this large-scale multicentre analysis with centrally generated automated KI67 scores show strong evidence in support of a prognostic value for automated KI67 scoring in breast cancer. Given the advantages of automated scoring in terms of its potential for standardisation, reproducibility and throughput, automated methods appear to be promising alternatives to visual scoring for KI67 assessment.

Background

Despite endorsements by several international guidelines [1, 2], KI67 is yet to gain widespread application as a prognostic and/or predictive marker in breast cancer [3]. This is largely due to methodological variability in KI67 scoring (such as antibody type, specimen type, type of fixative, antigen retrieval methods and method of scoring) and to limitations in the design and analyses of studies that have reported on this marker [3–7].

In the majority of settings, KI67 is evaluated visually by a pathologist even though there is yet to be consensus regarding which regions to score between the invasive edge, hot spots or the entire spectrum of the whole section or tumour core [8]. As a result, both the intra-observer and, especially, the inter-observer reproducibility of visually derived KI67 scores have been shown to be poor [9–11]. This has not only hampered inter-study comparability for KI67, but has fuelled concerns regarding its analytical validity [3]. To address some of the methodological issues related to KI67 assessment, the International KI67 in Breast Cancer Working Group published recommendations aimed at the standardisation of the analytical processes for KI67 evaluation [8]. This panel, however, fell short of making recommendations regarding the preferred method of scoring for KI67 between visual and automated. Several reports suggest that automated methods could address some of the problems associated with visual scoring [11–19]. These methods are high throughput and are not limited by intra-observer variability. However, concerns exist regarding the accuracy of automated methods and the prognostic power of KI67 derived using these methods relative to that derived visually by pathologists. Few relatively small studies have reported a head-to-head comparison between scores derived using both methods in terms of prognostic properties, and the results from these are conflicting [11, 17–19].

The majority opinion regarding the prognostic property of KI67 derives mostly from reviews and meta-analyses, which support its prognostic role in breast cancer [4–7, 20]. The meta-analyses by de Azambuja et al. [6] involving 12,155 patients and by Stuart-Harris et al. [7] which included over 15,000 patients represent two comprehensive analyses on this subject. These are limited, however, by reported evidence of publication bias, by significant between-study heterogeneity and by the fact that most of the included studies utilised different methodological approaches for KI67 evaluation. Furthermore, while the analysis by de Azambuja et al. [6] was limited by its inclusion of only univariate hazard ratios, that by Stuart-Harris et al. [7] was limited by the small intersection between the sets of covariates in the included studies. In a population-based cohort of a cancer registry, Inwald et al. [21] examined the prognostic role of KI67 in 3658 patients for whom KI67 was routinely measured in clinical practice and reported significant associations between KI67 and overall survival. An important strength of this analysis was that it utilised routinely assessed KI67 measurements in a clinical setting. However, it was also limited by the heterogeneity of the KI67 analytical processes in the different laboratories involved in the study. Nonetheless, KI67 has found use in a variety of clinical and epidemiological scenarios, including its endorsement by a number of international guidelines for use in treatment decision-making in ER-positive breast cancer [1, 2] and its incorporation as part of emerging prognostic tools such as the IHC4 score [22, 23] and PREDICT, a breast cancer treatment benefit tool [24].

In this study, we evaluate the value and robustness of automated scoring of KI67 for large-scale, multicentre studies of breast cancer prognostication. We centrally generated an automated KI67 score from stained tissue microarrays (TMAs), and assessed its prognostic value overall and for different subtypes of breast cancer. We also compared the prognostic performance of automated and visually derived KI67 scores in a subset of patients.

Methods

Study population and study design

This analysis was conducted within the Breast Cancer Association Consortium [25], which is a large, ongoing collaborative project involving study groups across the globe. Figure 1 shows that we collected a total of 166 TMAs containing 19,039 cores, representing 10,005 patients from 13 study groups (Additional file 1: Table S1). Ten study groups provided unstained TMAs which were then stained and digitised in the Breakthrough Core Pathology laboratory at the Institute of Cancer Research (ICR) and the academic biochemistry laboratory of the Royal Marsden Hospital (RMH), London, UK. Two groups (MARIE and PBCS) provided pre-stained TMAs which were also digitised in our centre. One study (SEARCH) provided TMA images acquired using a similar Ariol technology (a digital image acquisition and analysis system) to the one adopted for this analysis. Of the 10,005 patients, 1917 were excluded on account of failing predefined quality control checks (N = 946) or due to absent data on follow-up times and/or vital status (N = 971). As a result, a total of 8088 patients from 10 study groups with a median follow-up of 7.5 years and a total of 1401 breast cancer specific deaths were used in the survival analysis involving automated KI67. Of these, 2440 patients with pathologists’ visual KI67 scores in addition to automated KI67 scores were used to extrapolate a visual from an automated cut-off point, following which comparative survival analyses involving visual and automated KI67 scores were conducted. Information on other clinico-pathological characteristics of tumours including histological grade, nodal status, tumour size, stage, adjuvant systemic therapy (endocrine therapy and/or chemotherapy) and other IHC markers (i.e. oestrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2)) were obtained from clinical records. 
Additional Ariol HER2 data were obtained for a subset of patients with missing clinical HER2 data but for whom data on ER and PR were available (N = 403). All patients provided written informed consent and all participating studies gained approval from the local ethical committees and institutional review boards.

Fig. 1

Study population and study design. We collected 166 TMAs containing 19,039 cores from 10,005 patients. Of these, 15 TMAs containing 1346 cores were selected as the training set and these were used to develop an automated scoring protocol that was validated against corresponding computer-assisted visual (CAV) scores. Ultimately, this protocol was applied to the scoring of all 166 TMAs. Following automated scoring, all cores that failed our a priori defined quality control checks (including total nuclei count >50 and <15,000, and KI67 score = 100 %) were excluded (N = 946 patients). For the purpose of survival analyses, all subjects with missing follow-up/survival data were also excluded (N = 971 patients). As a result, a total of 8088 patients were used in the survival analysis involving automated KI67 score. Furthermore, based on a subset of patients (N = 2440) with pathologists’ KI67 scores in addition to the automated KI67 scores, we extrapolated a visual from an automated cut-off point and used this to compare the prognostic performance of visual and automated KI67 scores in breast cancer. QC quality control, TMA tissue microarray
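The core-level quality control step described in the caption can be sketched in Python. This is a minimal illustration and not the study's pipeline: the `Core` record, the `passes_qc` helper and the example values are hypothetical, and the sketch assumes the stated bounds are the acceptance range (more than 50 and fewer than 15,000 nuclei), with a score of exactly 100 % treated as a failure.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical record for one TMA core."""
    patient_id: str
    nuclei_count: int
    ki67_percent: float


def passes_qc(core, min_nuclei=50, max_nuclei=15_000):
    """Keep cores with a plausible nuclei count and a score below 100 %.

    The thresholds follow one reading of the caption and are assumptions.
    """
    plausible_count = min_nuclei < core.nuclei_count < max_nuclei
    return plausible_count and core.ki67_percent < 100.0


cores = [
    Core("P1", 1200, 8.5),
    Core("P1", 30, 10.0),      # too few nuclei
    Core("P2", 20_000, 15.0),  # implausibly many nuclei
    Core("P3", 900, 100.0),    # score of 100 % treated as an artefact
]
valid_cores = [c for c in cores if passes_qc(c)]
print([c.patient_id for c in valid_cores])
```

A patient would drop out of the analysis only if none of their cores survived this filter, which is consistent with exclusions being reported at the patient level.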

KI67 immunostaining

Sections were dewaxed using xylene and rehydrated through graded alcohol (100, 90 and 70 %) to water. Slides were then placed in a preheated (5 min, 800 W microwave) solution of Dako Target Retrieval solution pH 6.0 (S1699), microwaved on high power for 10 min and then allowed to cool in this solution at room temperature for 10 min. In the next stage, the slides were placed on a Dako autostainer and stained using a standard protocol with Dako MIB-1 diluted 1/50 and visualised using the Dako REAL kit (K5001). The MIB-1 antibody was also adopted for the staining of the TMAs that were not part of those centrally stained at the ICR, but at varying concentrations (PBCS = 1:500; MARIE = 1:400 and SEARCH = 1:200) (Additional file 1: Table S2).

KI67 scoring

KI67 scoring has been described previously [26], but briefly all TMAs were digitised using the Ariol 50s digital scanner. Fifteen TMAs were selected as a training set. These were scored visually by a pathologist (MA) using a computer-assisted visual (CAV) counting method and used to validate the automated method. The CAV method relied upon built-in features of the Ariol digital system to count negative and positive nuclear populations within 250 μm × 250 μm squares separated by grids. The standard CAV approach entailed the counting of at least 1000 cells across the entire spectrum of each core. In the majority of cores, more than 1000 cells were counted even though fewer were counted in a small minority. Overall, cores with more than 500 cells were considered to be of satisfactory quality. The CAV method is precise, prevents double counting and was observed to have excellent intra-observer reproducibility when a random subset of cores (N = 111) was re-scored at an interval of 3 months from the first time they were scored (observed agreement/kappa = 96 %/0.90); good core-level agreements with two other independent scorers (observed agreement/kappa: CAV vs scorer 2 = 87 %/0.66; CAV vs scorer 3 = 84 %/0.59; scorer 2 vs scorer 3 = 89 %/0.69) were also observed in a randomly selected subset of 202 cores. Visual scoring in the external TMAs involved both quantitative and semi-quantitative methods. Each core from each patient was scored by two independent pathologists and the KI67 score for each patient was then taken as the average score from the two scorers across all cores for that patient.
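The agreement figures quoted above pair an observed agreement with a kappa value. As a rough sketch of how such a pair is computed, the Python below assumes Cohen's kappa on categorical (for example high/low) core-level calls; the text does not state which kappa variant was used, and the function name and example ratings are hypothetical.

```python
def observed_agreement_and_kappa(rater_a, rater_b):
    """Return (observed agreement, Cohen's kappa) for two raters' categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cores on which the two raters give the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical high/low calls on ten cores by two scorers.
scorer_1 = ["high", "high", "low", "low", "low", "low", "high", "low", "low", "low"]
scorer_2 = ["high", "low", "low", "low", "low", "low", "high", "low", "low", "high"]
agreement, kappa = observed_agreement_and_kappa(scorer_1, scorer_2)
print(f"observed agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa discounts the agreement that two scorers would reach by chance, which is why it is noticeably lower than the raw agreement when one category dominates.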

The automated scoring was performed using the Ariol machine which has functionality that allows for the automatic detection of malignant and non-malignant nuclei using shape and size characteristics. Using colour deconvolution, it also distinguishes between DAB-positive and DAB-negative (haematoxylin) malignant cells. To determine the negative and positive populations of cells, an appropriate region of interest of the malignant cell population in a core was demarcated and two colours were selected to indicate positive and negative nuclear populations. The appropriate colour pixels were then selected to represent the full range of hue, saturation and intensity that was considered representative of the positive and negative nuclear classes [26]. Subsequently, the best shape parameters that discriminated malignant and non-malignant cells according to their spot width, width, roundness, compactness and axis ratio were also selected. The data were divided into a training and a validation subset and the automated and visual scoring for KI67 showed good agreement (observed agreement = 87 %; Kappa = 0.64) and discriminatory accuracy (AUC = 85 %) in the validation subset, hence allowing for the adoption of this method for the scoring of all 166 TMAs.

Statistical methods

For patients with multiple cores from the same tumour, we used the average KI67 score across valid cores to represent the % positive cells in that tumour. Descriptive analyses of the distribution of KI67 according to clinical and pathological characteristics of the patients were conducted using the non-parametric Kruskal–Wallis equality of medians test for continuous measures and the chi-squared test for categorical measures. The relative survival probabilities for patients in different quartiles of the KI67 distribution were compared using Kaplan–Meier survival curves for the 10-year breast cancer specific survival (BCSS). To allow for prevalent cancers, time at risk was left-censored for study entry. It was decided, a priori, not to make any assumptions on a prognostic cut-off point for automated KI67 scores in our dataset but instead to use the continuous values to observe a prognostic threshold. As a result, we performed quartile analysis by dividing the continuous KI67 scores into quartiles (Q1–Q4) and examining the prognostic differences among the different quartiles for all patients in the study. The 10-year BCSS was determined using Kaplan–Meier survival curves and Cox proportional hazards regression models stratified by ER status (positive vs negative) and according to nodal status (positive vs negative) and other IHC markers. The univariate Cox models were partially adjusted for study group and age at diagnosis while the multivariate models had further adjustments for other known prognostic factors including histological grade, tumour size, nodal status, morphology, ER, PR, HER2 and adjuvant systemic therapy (endocrine and/or chemotherapy). In the multivariate models, missing values for other covariates were addressed using the multiple imputation plus outcome (MI+) approach [27].
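The per-patient averaging and quartile-based dichotomisation described above can be illustrated with a short Python sketch. The helper names and the example scores are hypothetical, and the quartile convention (the `inclusive` method of `statistics.quantiles`) is an assumption; the text does not say how quartile boundaries were computed.

```python
from collections import defaultdict
from statistics import mean, quantiles


def patient_scores(core_scores):
    """Average the KI67 score (% positive cells) across each patient's valid cores."""
    by_patient = defaultdict(list)
    for patient_id, score in core_scores:
        by_patient[patient_id].append(score)
    return {pid: mean(scores) for pid, scores in by_patient.items()}


def upper_quartile_cutoff(scores):
    """Return the boundary between Q3 and Q4 (the 75th percentile)."""
    _, _, q3 = quantiles(list(scores), n=4, method="inclusive")
    return q3


def dichotomise(scores_by_patient, cutoff):
    """Label a patient 'high' when the score exceeds the cut-off, else 'low'."""
    return {
        pid: ("high" if score > cutoff else "low")
        for pid, score in scores_by_patient.items()
    }


# Hypothetical (patient, core score) pairs; patients A and C have two cores.
cores = [("A", 2.0), ("A", 4.0), ("B", 10.0), ("C", 20.0),
         ("C", 30.0), ("D", 6.0), ("E", 1.0)]
scores = patient_scores(cores)
cutoff = upper_quartile_cutoff(scores.values())
status = dichotomise(scores, cutoff)
print(cutoff, status)
```

Deriving the cut-off from the data in this way makes it specific to the study population, so it would need to be re-derived or validated before being applied elsewhere.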
Because of observed violation of the proportionality assumption of the Cox model by automated KI67, it was modelled as a time-varying covariate using an extension of the Cox model that allows for the inclusion of a coefficient (T) that varied as an exponential function of time. The log of the coefficient is indicative of both the direction and the magnitude of change in hazard ratio with time, such that if log T < 1 then hazard falls with time, while if log T > 1 then hazard increases with time. Known violation of the proportional hazards assumption by ER was addressed in the same way. Consistency of hazard ratio (HR) estimates across the different study groups was evaluated using the I² statistic, derived by performing a fixed-effect meta-analysis of study-specific HR estimates. To enable direct comparison between the visual and automated KI67 scores, we extrapolated a visual from an automated cut-off point in a linear regression model and used the resulting cut-off point for all further analyses. All analyses were conducted using STATA statistical software version 10 (StataCorp, College Station, TX, USA). Statistical tests were two-sided and P < 0.05 was considered statistically significant.
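As an illustration of the between-study consistency check, the following Python sketch computes an inverse-variance fixed-effect pooled estimate, Cochran's Q and the I² statistic from study-specific hazard ratios. It uses the standard definition I² = max(0, (Q − df)/Q) and recovers standard errors from 95 % confidence limits; the function names and the study-level values are hypothetical and are not taken from this analysis.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.959964):
    """Convert an HR with its 95 % CI to a log HR and its standard error."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_i2(estimates):
    """Return (pooled log HR, Cochran's Q, I² in %) for (log HR, SE) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # I² is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical study-specific HRs with 95 % confidence limits.
studies = [(1.8, 1.1, 2.9), (2.1, 1.2, 3.7), (1.9, 0.9, 4.0)]
estimates = [log_hr_and_se(*s) for s in studies]
pooled, q, i2 = fixed_effect_i2(estimates)
print(f"pooled HR = {math.exp(pooled):.2f}, Q = {q:.2f}, I2 = {i2:.1f} %")
```

An I² of 0 % means the observed spread of study estimates is no larger than would be expected from sampling error alone.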

Results

Description of study population and association between automated KI67 score and other clinico-pathological characteristics of breast cancer patients (N = 8088)

In all, a total of 143 TMAs containing 15,313 cores from 8088 patients were used in this analysis, as shown in Fig. 1. The studies included in this analysis used different TMA designs (Table 1). More than half (4431/55 %) of the patients had KI67 scores on at least two cores and evaluation of dichotomous categories revealed concordant KI67 status in 83.7 % of the patients. When we examined the distribution of continuous KI67 scores among categories of the different clinical and pathological characteristics we observed this to differ according to histological grade, tumour size, morphology, ER status, PR status and HER2 status, but not nodal status or stage at diagnosis (Fig. 2). The distribution of these characteristics for patients with high KI67 (Q4 or >12 % positive cells) and low KI67 (Q1–Q3) are shown in Additional file 1: Tables S3 and S4 for ER-positive and ER-negative patients, respectively.

Table 1 Description of study populations, TMA designs and patient characteristics for the 8088 patients included in this analysis
Fig. 2

Distribution of continuous KI67 scores according to categories of other clinical and pathological variables. Significant differences were seen in the distribution of automated KI67 scores according to categories of histological grade, tumour size, morphology, ER status, PR status and HER2 status, but not nodal status or stage. ER oestrogen receptor, HER2 human epidermal growth factor receptor 2, PR progesterone receptor

Association between automated KI67 score and 10-year BCSS among 8088 patients

Using continuous measures of KI67 categorised into quartiles, we observed poorest survival in the highest quartile, corresponding to >12 % positive cells, but little difference in survival between the other three (Q1–Q3) quartiles (log-rank P = 1.2 × 10⁻⁵; Fig. 3a). As a result, the continuous KI67 value was dichotomised at the threshold of 12 % in subsequent analyses. High KI67 was significantly associated with worse 10-year BCSS overall (log-rank P = 3.1 × 10⁻⁷) and among ER-positive cancers (log-rank P = 1.3 × 10⁻³) but not among ER-negative cancers (log-rank P = 0.35) (Fig. 3b–d, respectively). Similarly, in multivariate models, high KI67 expression was significantly associated with worse 10-year BCSS among ER-positive cancers (HR at baseline = 1.96; 95 % CI = 1.31–2.93) but not ER-negative breast cancers (HR = 1.23; 95 % CI = 0.86–1.77; P-heterogeneity = 0.064) (Table 2). Further stratification of ER-positive cancers according to nodal status showed that high KI67 was associated with worse survival in both node-negative and node-positive cancers in multivariate analysis (node-negative 2.47 (1.16–5.27); node-positive 1.74 (1.05–2.86); P-heterogeneity = 0.67) (Table 2). The association between KI67 and survival was significant among ER-positive patients who did not receive chemotherapy (1.95 (1.18–3.21); P = 0.009) but not among those who did (1.89 (0.84–4.29); P = 0.124; P-heterogeneity = 0.60). We found no evidence of between-study heterogeneity in estimates of HR for ER-positive patients (I² = 0.0 %, P = 0.94) or ER-negative patients (I² = 0.0 %, P = 0.86) (Additional file 2: Figure S1). Among hormone receptor-positive breast cancers, the HR for KI67 was not significantly different according to HER2 status (Table 2; P-heterogeneity = 0.270). Modest evidence for a poorer prognosis among high, relative to low, KI67 was also seen for triple-negative breast cancers (1.70 (1.02–2.84); P = 0.04).
No significant associations with prognosis were found for KI67 among HER2-positive (i.e. ER–/PR–/HER2+) breast cancers (0.91 (0.60–1.36)) (Table 2).
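For readers less familiar with the survival curves referred to above, the Kaplan–Meier product-limit estimator can be sketched in a few lines of Python. This is a simplified illustration with right-censoring only: it does not implement the delayed entry at study recruitment or the 10-year follow-up limit used in this analysis, and the function name and follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if the patient died of breast cancer at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up (years) for five patients in one KI67 stratum.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

Curves estimated separately for the high and low KI67 strata are what the log-rank tests reported above compare.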

Fig. 3

Kaplan–Meier survival curves for the 10-year BCSS according to strata of automated KI67 scores, overall and by ER status. KM survival curves for the association between KI67 and 10-year BCSS among: (a) quartiles of KI67 (Q1, <25th percentile; Q2, 25th–50th percentile; Q3, >50th to 75th percentile; and Q4, >75th percentile; N = 8088); (b) dichotomous categories of KI67 (≤12 %/low and >12 %/high) overall (N = 8088 patients); (c) ER-positive cancers (N = 5520 patients); and (d) ER-negative cancers (N = 2049 patients)

Table 2 Hazard ratio (HR) and 95 % CI for the association between automated KI67 score and 10-year BCSS in partially and fully adjusted models: analysis stratified overall and according to ER, nodal status and other immunohistochemical markers (N = 8088 patients)

Comparison of 10-year BCSS among 2440 patients with both visual and automated quantitative KI67 scores

The automated cut-off point of 12 % positive cells corresponded to a visual cut-off point of 24.2 % based on a linear regression model comprising patients with quantitative data on both methods. The visual cut-off was rounded up to a cut-off point of 25 %. Strong evidence (P < 0.0001) in support of a positive linear correlation (r = 0.63) between automated and visual scores was observed and continuous automated scores showed good discriminatory accuracy against the visually determined binary classes (AUC = 82 %, 95 % CI = 80–84 %) (Additional file 3: Figure S2). Twenty-six percent of the patients were classified as having high visual KI67, in contrast to 29 % for the automated KI67 scores; cross-classification of visual and automated categories revealed better specificity (84 %) than sensitivity (65.6 %) for the automated score in classifying visually determined categories (Additional file 1: Table S5). High KI67 was associated with worse survival in Kaplan–Meier curves based on both automated (log-rank P = 9.8 × 10⁻⁶) and visual (log-rank P = 3.8 × 10⁻¹⁴) KI67 scores even though attenuation of the difference between strata was observed for automated KI67 scores (Additional file 4: Figure S3). In two separate models for visual and automated KI67 scores each adjusted for age at diagnosis and study group we observed stronger evidence for an association between KI67 and survival for the visual KI67 score than for the automated KI67 score (Table 3). Analysis of model fit revealed similar parameters for both scores, however, especially in ER-positive breast cancers (AIC/BIC: visual = 2656/2618; automated = 2675/2638) (Table 3).
When we performed further adjustments for other prognostic factors in multivariate Cox models of imputed datasets, we observed both visual and automated KI67 scores to be significantly associated with survival for all patients (HR (95 % CI): visual = 1.75 (1.23–2.49); automated = 1.61 (1.14–2.28)) and for ER-positive patients (visual = 2.30 (1.34–3.94); automated = 2.10 (1.28–3.47)), but not for ER-negative patients (visual = 1.63 (0.97–2.72); automated = 1.28 (0.79–2.05)) (Table 3).
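The mapping of an automated cut-off point onto the visual scale, and the subsequent cross-classification, can be sketched as follows. The Python below is a minimal illustration that assumes the visual score was regressed on the automated score by ordinary least squares (the direction of the regression is not stated in the text); the function names and the paired scores are hypothetical, and the resulting numbers are not the study's coefficients.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def sensitivity_specificity(reference_high, test_high):
    """Sensitivity and specificity of test calls against reference calls."""
    pairs = list(zip(reference_high, test_high))
    tp = sum(1 for r, t in pairs if r and t)
    fn = sum(1 for r, t in pairs if r and not t)
    tn = sum(1 for r, t in pairs if not r and not t)
    fp = sum(1 for r, t in pairs if not r and t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical paired scores (% positive cells) for six patients.
automated = [2.0, 5.0, 8.0, 12.0, 20.0, 30.0]
visual = [6.0, 10.0, 19.0, 24.0, 43.0, 58.0]

intercept, slope = fit_line(automated, visual)
automated_cutoff = 12.0
visual_cutoff = intercept + slope * automated_cutoff  # visual-scale equivalent

visual_high = [v > visual_cutoff for v in visual]         # reference classes
automated_high = [a > automated_cutoff for a in automated]
sens, spec = sensitivity_specificity(visual_high, automated_high)
print(f"visual cut-off = {visual_cutoff:.1f} %, "
      f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Treating the visual classification as the reference, as here, mirrors the way sensitivity and specificity are reported for the automated score in the text.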

Table 3 Univariate (partially adjusted) and multivariate (fully adjusted) hazard ratio (HR) and 95 % CI for the associations between automated and visual KI67 scores with survival in breast cancer (N = 2440)

Discussion

Findings from our analysis provide strong evidence in support of a prognostic relationship for automated KI67 scoring in ER-positive (node-negative and node-positive) patients that is independent of tumour grade and other prognostic factors. Even though our data suggested a larger magnitude of the association between KI67 and survival among the node-negative patients, the difference between node-positive and node-negative was not statistically significant. Involving over 8000 patients from multiple centres internationally, this represents the largest study that has evaluated the prognostic value of automated KI67 scoring in breast cancer to date. Furthermore, the large sample size allowed us to evaluate its prognostic value in a number of breast cancer subtypes including ER+ (node-negative and node-positive), ER–, ER+ and/or PR+ (HER2+ or HER2–), ER–/PR– and HER2+ (i.e. HER2-enriched) and triple-negative breast cancers.

Our findings suggest that automated KI67 scoring is an analytically valid approach to generating KI67 scores. This is particularly noteworthy given the growing need to incorporate measures of KI67 in prognostic tools such as the IHC4 score and PREDICT [23, 24]. These tools are relatively cheap, readily available and utilise routinely measured IHC markers and, in the case of PREDICT, other routinely available patient data to provide information that can help clinicians and patients make informed decisions regarding the course of treatment. It is acknowledged that prognostication in breast cancer is becoming increasingly more sophisticated and that a number of multigene assays [28, 29] have been validated for this purpose; however, their costs and proprietary concerns limit their use in a large number of settings. Moreover, findings from previous studies suggest that some multigene assays may not perform better than routinely measured IHC markers. For instance, Cuzick et al. [23] reported similar prognostic properties for the Genomic Health recurrence score (GHI-RS, Oncotype DX), a 21-gene panel test, and the IHC4 score in their analysis of 1125 women from the TransATAC study, and notably KI67 was assessed by image analysis in that study [23]. Nonetheless, the relative performance of visual and automated KI67 scores in relation to the IHC4 score or PREDICT can only be assessed in studies that are specifically designed for that purpose.

In addition to lack of analytical validity, the prognostic performance of KI67 has also been questioned due to the design and analysis of studies that have reported previously on this protein [3]. Our evaluation is a large-scale, multicentre analysis which has adopted the recommended laboratory processes for the staining and scoring of KI67 [8]. All TMAs in our analysis were stained using the MIB-1 antibody (even though not all of them were centrally stained in our centre) and scored using a single automated algorithm. Our estimates of ~2-fold and ~1.5-fold increased risk of mortality at baseline for high versus low KI67 in univariate and multivariate analyses, respectively, are similar to those reported by de Azambuja et al. (HR = 1.95) and Stuart-Harris et al. (HR = 1.42) [6, 7] in their univariate and multivariate meta-analyses, respectively. Stratification of our analysis according to other IHC markers (in addition to ER) showed automated KI67 to be prognostic in hormone receptor-positive cancers. These findings, together with our observation of the prognostic value of KI67 in both node-negative and node-positive ER-positive patients, support the decision by the St Gallen International Expert Consensus to endorse KI67 for treatment decision-making in ER-positive early (1–3 axillary nodes) breast cancer patients [1]. We also observed modest evidence in support of poorer survival outcomes among high, relative to low, KI67 expressing triple-negative subtypes of breast cancer. This finding is in support of a previous report by Keam et al. [30]. Our population of triple-negative breast cancers (N = 1001), however, was 9.5 times larger than that of Keam et al. (N = 105).

Comparative analysis of visual and automated KI67 scores showed a stronger survival association for the visual over the automated scores; however, differences were generally modest. Given the advantages of automated versus visual scoring in terms of its potential for standardisation, reproducibility and throughput, automated methods appear to be promising alternatives to visual scoring for KI67 assessment. A potential limitation to the adoption of automated KI67 scoring in the clinical setting is that misclassification of positive nuclei as negative or malignant nuclei as benign could lead to attenuation of prognostic associations, an observation that has been reported previously for ER and PR [31] and one which we have also observed for KI67 in this analysis. This can be mitigated, however, by stringent quality control processes or by the adoption of a synergistic approach that combines the benefits of both the automated and visual scoring methods. One such approach is the CAV scoring method which we developed for the visual counting of negative and positive malignant nuclei. This approach, a variation of which has been reported previously [15], exploits the advantages of both visual and digital imaging tools by enabling the visual counting of KI67-positive cells in well-defined areas of a tumour within a computer microenvironment. This method is limited, however, by the observation that it is time consuming; as such, it may not be efficient if adopted for the large-scale scoring of KI67 in epidemiological studies, clinical trials or biomarker discovery studies. Nonetheless, efforts are currently underway to standardise the methods for the visual scoring of KI67 in core-cuts.

We centrally generated KI67 scores on TMAs and determined a threshold of 12 % positive cells of prognostic relevance in our study population. However, due to possible variations in the distribution of KI67 scores according to specimen type and among different laboratories, this cut-off point may not apply to other types of clinical samples or to other laboratories. As a result, pending international standardisation of the KI67 analytical processes, setting local laboratory-specific cut-off points as recommended by international guidelines [1] remains a pragmatic approach to determining ‘high’ and ‘low’ KI67. Furthermore, although our automated cut-off point of 12 % positive cells was determined to correspond to a visual score of 25 %, this may be related, at least in part, to the fact that automated systems generally count more cells than the visual evaluator, a reason that has been proposed to explain differences in KI67 scores between visual and automated scoring and different automated scoring approaches [26]. Nonetheless, findings from a recent meta-analysis that assessed the prognostic value of different cut-off levels of KI67 suggest that a visual cut-off point >25 % provides greater discrimination in mortality risk than other cut-off points [32].

Some limitations of our analysis include the lack of data on specific chemotherapeutic or endocrine agents received by each patient, as a result of which we were unable to account for the impact of a specific treatment regimen on survival or to examine whether or not KI67 is predictive of response to specific chemotherapeutic and/or endocrine agents. We were, however, able to account for whether or not patients received adjuvant systemic treatment in all our analyses because more than two-thirds of the patients had information on treatment. This also allowed us to perform stratified analysis according to whether or not chemotherapy was administered. Also, we did not have data on disease-free survival which may have been a more informative end point than BCSS in early breast cancer. Our assessment of KI67 on TMAs may mean that direct inference cannot be drawn from our findings on other types of clinical samples, especially whole sections [8]. This is because KI67 scores are speculated to be lower for TMAs than for whole sections and not many studies have assessed the correlation between KI67 scores on TMAs and those on whole sections. However, one such study by Kobierzycki et al. [33] involving 51 archival paraffin blocks of invasive ductal carcinoma showed excellent correlation (r = 0.91) between the TMAs and whole sections. Their paper utilised three 0.6 mm core punches, however, and this may explain the high correlation between KI67 scores on TMAs and whole sections that was observed in that study. Nonetheless, the fact that more than half (4431/55 %) of the patients in our analysis had KI67 scores on two or more cores, with 83 % of these showing concordant KI67 status, should limit the impact of intra-tumour heterogeneity of KI67 scores on our findings.

Conclusion

Our large, multicentre study indicates that automated KI67 scoring provides prognostic information in breast cancer that is independent of standard parameters. In view of its potential for standardisation, throughput and reproducibility, the automated method appears to be a promising alternative to visual scoring for KI67. These findings are important given the increasing need to incorporate measures of KI67 into tools that refine prognostic scores for breast cancer patients; this is especially relevant for patients with ER-positive, node-negative tumours, in order to aid decisions on providing adjuvant chemotherapy. However, further work is needed to standardise the staining and scoring protocols for KI67. In doing so, the potential benefits and drawbacks of automated versus visual scoring systems merit consideration. In light of this, we welcome ongoing efforts by the International Working Party on KI67 in Breast Cancer aimed at standardisation of the analytical processes for KI67.

Abbreviations

AUC: Area under the curve

BCAC: Breast Cancer Association Consortium

BCSS: Breast cancer specific survival

CAV: Computer-assisted visual

DAB: Diaminobenzidine

ER: Oestrogen receptor

GHI-RS: Genomic Health recurrence score

HER2: Human epidermal growth factor receptor 2

HR: Hazard ratio

ICC: Intra-class correlation

ICR: Institute of Cancer Research

IHC: Immunohistochemical

MI: Multiple imputation

PR: Progesterone receptor

ROC: Receiver operating characteristic

TMA: Tissue microarray

References

  1. Goldhirsch A, Winer EP, Coates AS, Gelber RD, Piccart-Gebhart M, Thürlimann B, et al. Personalizing the treatment of women with early breast cancer: highlights of the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2013. Ann Oncol. 2013;24(9):2206–23.

  2. Senkus E, Kyriakides S, Penault-Llorca F, Poortmans P, Thompson A, Zackrisson S, et al. Primary breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2013;24 Suppl 6:vi7–23.

  3. Andre F, Arnedos M, Goubar A, Ghouadni A, Delaloge S. Ki67—no evidence for its use in node-positive breast cancer. Nat Rev Clin Oncol. 2015;12:296–301. doi:10.1038/nrclinonc.2015.46.

  4. Urruticoechea A, Smith IE, Dowsett M. Proliferation marker Ki-67 in early breast cancer. J Clin Oncol. 2005;23(28):7212–20.

  5. Yerushalmi R, Woods R, Ravdin PM, Hayes MM, Gelmon KA. Ki67 in breast cancer: prognostic and predictive potential. Lancet Oncol. 2010;11(2):174–83.

  6. de Azambuja E, Cardoso F, de Castro G, Colozza M, Mano MS, Durbecq V, et al. Ki-67 as prognostic marker in early breast cancer: a meta-analysis of published studies involving 12 155 patients. Br J Cancer. 2007;96(10):1504–13.

  7. Stuart-Harris R, Caldas C, Pinder S, Pharoah P. Proliferation markers and survival in early breast cancer: a systematic review and meta-analysis of 85 studies in 32,825 patients. Breast. 2008;17(4):323–34.

  8. Dowsett M, Nielsen TO, A’Hern R, Bartlett J, Coombes RC, Cuzick J, et al. Assessment of Ki67 in breast cancer: recommendations from the International Ki67 in Breast Cancer Working Group. J Natl Cancer Inst. 2011;103(22):1656–64.

  9. Polley M-YC, Leung SCY, McShane LM, Gao D, Hugh JC, Mastropasqua MG, et al. An International Ki67 Reproducibility Study. J Natl Cancer Inst. 2013;105(24):1897–906.

  10. Mikami Y, Ueno T, Yoshimura K, Tsuda H, Kurosumi M, Masuda S, et al. Interobserver concordance of Ki67 labeling index in breast cancer: Japan Breast Cancer Research Group Ki67 Ring Study. Cancer Sci. 2013;104(11):1539–43.

  11. Gudlaugsson E, Skaland I, Janssen EAM, Smaaland R, Shao Z, Malpica A, et al. Comparison of the effect of different techniques for measurement of Ki67 proliferation on reproducibility and prognosis prediction accuracy in breast cancer. Histopathology. 2012;61(6):1134–44.

  12. Pinder S, Wencyk P, Sibbering D, Bell J, Elston C, Nicholson R, et al. Assessment of the new proliferation marker MIB1 in breast carcinoma using image analysis: associations with other prognostic factors and survival. Br J Cancer. 1995;71(1):146.

  13. Pietiläinen T, Lipponen P, Aaltomaa S, Eskelinen M, Kosma V, Syrjänen K. The important prognostic value of Ki-67 expression as determined by image analysis in breast cancer. J Cancer Res Clin Oncol. 1996;122(11):687–92.

  14. Fasanella S, Leonardi E, Cantaloni C, Eccher C, Bazzanella I, Aldovini D, et al. Proliferative activity in human breast cancer: Ki-67 automated evaluation and the influence of different Ki-67 equivalent antibodies. Diagn Pathol. 2011;6 Suppl 1:S7.

  15. Laurinavicius A, Plancoulaine B, Laurinaviciene A, Herlin P, Meskauskas R, Baltrusaityte I, et al. A methodology to ensure and improve accuracy of Ki67 labelling index estimation by automated digital image analysis in breast cancer tissue. Breast Cancer Res. 2014;16:R35.

  16. Tuominen VJ, Ruotoistenmaki S, Viitanen A, Jumppanen M, Isola J. ImmunoRatio: a publicly available web application for quantitative image analysis of estrogen receptor (ER), progesterone receptor (PR), and Ki-67. Breast Cancer Res. 2010;12(4):R56.

  17. Konsti J, Lundin M, Joensuu H, Lehtimäki T, Sihto H, Holli K, et al. Development and evaluation of a virtual microscopy application for automated assessment of Ki-67 expression in breast cancer. BMC Clin Pathol. 2011;11(1):3.

  18. Mohammed ZMA, McMillan DC, Elsberger B, Going JJ, Orange C, Mallon E, et al. Comparison of visual and automated assessment of Ki-67 proliferative activity and their impact on outcome in primary operable invasive ductal breast cancer. Br J Cancer. 2012;106(2):383–8.

  19. Klauschen F, Wienert S, Schmitt WD, Loibl S, Gerber B, Blohmer J-U, et al. Standardized Ki67 diagnostics using automated scoring—clinical validation in the GeparTrio Breast Cancer Study. Clin Cancer Res. 2015;21(16):3651–7.

  20. Sheri A, Dowsett M. Developments in Ki67 and other biomarkers for treatment decision making in breast cancer. Ann Oncol. 2012;23 Suppl 10:x219–27.

  21. Inwald EC, Klinkhammer-Schalke M, Hofstädter F, Zeman F, Koller M, Gerstenhauer M, et al. Ki-67 is a prognostic parameter in breast cancer patients: results of a large population-based cohort of a cancer registry. Breast Cancer Res Treat. 2013;139(2):539–52.

  22. Cuzick J, Dowsett M, Pineda S, Wale C, Salter J, Quinn E, Zabaglo L, Mallon E, Green AR, Ellis IO, Howell A. Prognostic value of a combined estrogen receptor, progesterone receptor, Ki-67, and human epidermal growth factor receptor 2 immunohistochemical score and comparison with the Genomic Health recurrence score in early breast cancer. J Clin Oncol. 2011;29(32):4273–8.

  23. Cuzick J, Dowsett M, Pineda S, Wale C, Salter J, Quinn E, et al. Prognostic value of a combined estrogen receptor, progesterone receptor, Ki-67, and human epidermal growth factor receptor 2 immunohistochemical score and comparison with the Genomic Health recurrence score in early breast cancer. J Clin Oncol. 2011;29(32):4273–8.

  24. Wishart GC, Bajdik CD, Dicks E, Provenzano E, Schmidt MK, Sherman M, et al. PREDICT Plus: development and validation of a prognostic model for early breast cancer that includes HER2. Br J Cancer. 2012;107(5):800–7.

  25. BCAC. Breast Cancer Association Consortium. 19 Aug 2015. http://bcac.ccge.medschl.cam.ac.uk/. Accessed 12 Oct 2016.

  26. Abubakar M, Howat WJ, Daley F, Zabaglo L, McDuffus L-A, Blows F, et al. High-throughput automated scoring of Ki67 in breast cancer tissue microarrays from the Breast Cancer Association Consortium. J Pathol Clin Res. 2016;2(3):138–53. doi:10.1002/cjp2.42.

  27. Ali AMG, Dawson SJ, Blows FM, Provenzano E, Ellis IO, Baglietto L, et al. Comparison of methods for handling missing data on immunohistochemical markers in survival analysis of breast cancer. Br J Cancer. 2011;104(4):693–9.

  28. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004;351(27):2817–26.

  29. Filipits M, Nielsen TO, Rudas M, Greil R, Stöger H, Jakesz R, et al. The PAM50 risk-of-recurrence score predicts risk for late distant recurrence after endocrine therapy in postmenopausal women with endocrine-responsive early breast cancer. Clin Cancer Res. 2014;20(5):1298–305.

  30. Keam B, Im S-A, Lee K-H, Han S-W, Oh D-Y, Kim JH, et al. Ki-67 can be used for further classification of triple negative breast cancer into two subtypes with different response and prognosis. Breast Cancer Res. 2011;13(2):R22.

  31. Howat WJ, Blows FM, Provenzano E, Brook MN, Morris L, Gazinska P, et al. Performance of automated scoring of ER, PR, HER2, CK5/6 and EGFR in breast cancer tissue microarrays in the Breast Cancer Association Consortium. J Pathol Clin Res. 2015;1(1):18–32. doi:10.1002/cjp2.3.

  32. Petrelli F, Viale G, Cabiddu M, Barni S. Prognostic value of different cut-off levels of Ki-67 in breast cancer: a systematic review and meta-analysis of 64,196 patients. Breast Cancer Res Treat. 2015;153(3):477–91.

  33. Kobierzycki C, Pula B, Wojnar A, Podhorska-Okolow M, Dziegiel P. Tissue microarray technique in evaluation of proliferative activity in invasive ductal breast cancer. Anticancer Res. 2012;32(3):773–7.

Acknowledgements

The authors acknowledge support from Will Howat and Leigh-Anne McDuffus of the Cancer Research UK Cambridge Institute, University of Cambridge and from Lila Zabaglo of the Academic Biochemistry Laboratory of the Institute of Cancer Research, London during development of the automated scoring protocol. The authors wish to thank Heather Thorne, Eveline Niedermayr, all of the kConFab research nurses and staff, the heads and staff of the Family Cancer Clinics, and the Clinical Follow Up Study (which has received funding from the NHMRC, the National Breast Cancer Foundation, Cancer Australia, and the National Institute of Health (USA)) for their contributions to this resource, and the many families who contribute to kConFab. kConFab is supported by a grant from the National Breast Cancer Foundation, and previously by the National Health and Medical Research Council (NHMRC), the Queensland Cancer Fund, the Cancer Councils of New South Wales, Victoria, Tasmania and South Australia, and the Cancer Foundation of Western Australia.

Funding

The ABCS study was supported by the Dutch Cancer Society (grants NKI 2007–3839; 2009 4363); BBMRI-NL, which is a Research Infrastructure financed by the Dutch government (NWO 184.021.007); and the Dutch National Genomics Initiative.

The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. Additional cases were recruited in the context of the VERDI study, which was supported by a grant from the German Cancer Aid (Deutsche Krebshilfe).

The KBCP study was financially supported by the special Government Funding (EVO) of Kuopio University Hospital grants, Cancer Fund of North Savo, the Finnish Cancer Organizations, the Academy of Finland and by the strategic funding of the University of Eastern Finland.

The MARIE study was supported by the Deutsche Krebshilfe e.V. (70-2892-BR I, 106332, 108253, 108419), the Hamburg Cancer Society, the German Cancer Research Center (DKFZ) and the Federal Ministry of Education and Research (BMBF) Germany (01KH0402).

The MCBCS was supported by an NIH Specialized Program of Research Excellence (SPORE) in Breast Cancer (CA116201), the Breast Cancer Research Foundation, the Mayo Clinic Breast Cancer Registry and a generous gift from the David F. and Margaret T. Grohne Family Foundation and the Ting Tsung and Wei Fong Chao Foundation.

The ORIGO authors thank E. Krol-Warmerdam, and J. Blom; the contributing studies were funded by grants from the Dutch Cancer Society (UL1997-1505) and the Biobanking and Biomolecular Resources Research Infrastructure (BBMRI-NL CP16).

The PBCS study was funded by Intramural Research Funds of the National Cancer Institute, Department of Health and Human Services, USA.

The RBCS study was funded by the Dutch Cancer Society (DDHK 2004–3124, DDHK 2009–4318).

The SEARCH study is funded by a programme grant from Cancer Research UK (C490/A10124, C490/A16561) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. Part of this work was supported by the European Community’s Seventh Framework Programme under grant agreement number 223175 (grant number HEALTH-F2-2009-223175) (COGS). The authors acknowledge funds from Breakthrough Breast Cancer, UK, in support of MG-C at the time this work was carried out and funds from Cancer Research UK in support of MA at the Institute of Cancer Research, London.

Availability of supporting data

The data that support the findings of this study belong to the Breast Cancer Association Consortium (BCAC) and cannot be shared in a public repository because participants have not consented to sharing their data publicly.

Authors’ contributions

MA and MG-C conceived and carried out the analysis. MG-C supervised the work. FD carried out centralised laboratory work. MA developed the automated and CAV KI67 scoring protocols. NO, AJS and MD provided additional supervisory support. PC performed data management. MA and MG-C analysed the data. FB, HRA, PC, JB, RM, HB, CSt, AM, JC-C, AR, PS, FJC, PD, RAEMT, CSe, JF, MES, JL, SH, DE, MJH, AH, JWMM, CHMvD, MKB, QW, MJ, MS, AJS, DE, JW, LV‘tV, FEvL, MKS and PDP contributed to TMA/data collection and/or data management. All authors contributed to manuscript development and writing and gave final approval for its submission.

Authors’ information

Not applicable

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Each of the individual studies was approved by the local ethics committees and institutional review boards (IRBs) and written informed consent to participate in the study was obtained from each participant across all study groups. The ABCS study received approval from the Medical Ethical Committees of the Netherlands Cancer Institute-Antoni van Leeuwenhoek Hospital (NKI-AVL; Amsterdam, the Netherlands) and Leiden University Medical Center (LUMC; Leiden, the Netherlands). The ESTHER study was approved by the Ethics Committees of the Medical Faculty of the University of Heidelberg and the Medical Association of Saarland. The joint Ethics Committee of Kuopio University and Kuopio University Hospital approved the Kuopio Breast Cancer Project (KBCP). Approval for the MARIE study was obtained from the Ethics Committee of the University of Heidelberg, the Hamburg Medical Council and the Medical Board of the State of Rheinland-Pfalz. The MCBCS study was approved by the Ethics Committee of the Mayo Clinic College of Medicine. The Medical Ethical Review Boards of the Rotterdam Cancer Centre and academic cancer centre in Leiden approved the study protocol for the ORIGO study. The PBCS study protocol was reviewed and approved by local and the US National Cancer Institute (NCI) IRBs. The RBCS study was approved by the Ethical Committees of the University Hospital Rotterdam, Erasmus University Rotterdam and Leiden University Medical Centre, Leiden. The SEARCH study was approved by the Eastern multicentre research ethics committee. The kConFab study obtained human ethics approval at all the participating institutions through which subjects are recruited.

Author information

Corresponding author

Correspondence to Mustapha Abubakar.

Additional files

Additional file 1:

Table S1 presents a description of the study populations. Table S2 presents KI67 immunohistochemistry reagents and antigen retrieval protocols according to study groups. Table S3 presents the association of clinical and pathological characteristics with high (>12 %) and low (≤12 %) KI67 categories among 5520 ER-positive breast cancer cases. Table S4 presents the same association among 2049 ER-negative breast cancer cases. Table S5 presents the cross-classification of visual and automated KI67 score categories. Table S6 presents a multivariate model for the association of KI67 with 10-year BCSS among 5520 ER-positive patients, and Table S7 presents the corresponding model among 2049 ER-negative patients. (PDF 237 kb)

Additional file 2:

Figure S1 shows the meta-analysis of study-specific HRs stratified by ER status. (TIF 1182 kb)

Additional file 3:

Figure S2 shows the ROC curve for the discriminatory accuracy of continuous automated KI67 scores against binary visual categories (≤25 % and >25 %), based on a subset of patients with data on both visual and automated KI67 scores. (TIF 25 kb)

Additional file 4:

Figure S3 shows Kaplan–Meier survival curves for 10-year BCSS according to strata (high and low) of automated and visual KI67 scores (N = 2440). (TIF 1271 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Abubakar, M., Orr, N., Daley, F. et al. Prognostic value of automated KI67 scoring in breast cancer: a centralised evaluation of 8088 patients from 10 study groups. Breast Cancer Res 18, 104 (2016). https://doi.org/10.1186/s13058-016-0765-6

  • DOI: https://doi.org/10.1186/s13058-016-0765-6

Keywords