

  • Research article
  • Open Access

Risk prediction for estrogen receptor-specific breast cancers in two large prospective cohorts

Breast Cancer Research 2018 20:147

  • Received: 23 March 2018
  • Accepted: 4 November 2018
  • Published:


Background

Few published breast cancer (BC) risk prediction models consider the heterogeneity of predictor variables between estrogen-receptor positive (ER+) and negative (ER-) tumors. Using data from two large cohorts, we examined whether modeling this heterogeneity could improve prediction.

Methods

We built two models, for ER+ (ModelER+) and ER- tumors (ModelER-), respectively, in 281,330 women (51% postmenopausal at recruitment) from the European Prospective Investigation into Cancer and Nutrition cohort. Discrimination (C-statistic) and calibration (the agreement between predicted and observed tumor risks) were assessed both internally and externally in 82,319 postmenopausal women from the Women’s Health Initiative study. We performed decision curve analysis to compare ModelER+ and the Gail model (ModelGail) regarding their applicability in risk assessment for chemoprevention.

Results

Parity, number of full-term pregnancies, age at first full-term pregnancy and body height were associated only with ER+ tumors. Menopausal status, age at menarche and at menopause, hormone replacement therapy, postmenopausal body mass index, and alcohol intake were homogeneously associated with ER+ and ER- tumors. Internal validation yielded a C-statistic of 0.64 for ModelER+ and 0.59 for ModelER-. In external validation, the C-statistic was lower for ModelER+ (0.59) and was 0.57 for ModelGail. In external evaluation of calibration, ModelER+ outperformed ModelGail: the former overestimated the risk of ER+ tumors by 9%, whereas the latter underestimated the overall BC risk by 22%. Compared with the treat-all strategy, ModelER+ produced equal or higher net benefits irrespective of the benefit-to-harm ratio of chemoprevention, whereas ModelGail produced higher net benefits only when the benefit-to-harm ratio was below 50. The clinical applicability, i.e. the area defined by the net benefit curve and the treat-all and treat-none strategies, was 12.7 × 10⁻⁶ for ModelER+ and 3.0 × 10⁻⁶ for ModelGail.

Conclusions

Modeling heterogeneous epidemiological risk factors might yield little improvement in BC risk prediction. Nevertheless, a model specifically predictive of ER+ tumor risk could be more applicable than an omnibus model in risk assessment for chemoprevention.

Keywords

  • Breast cancer
  • Risk prediction
  • Estrogen receptor
  • Prospective cohort
  • EPIC
  • WHI

Background

Breast cancer (BC) screening and chemoprevention strategies should prioritize women who are expected to benefit from the interventions. Risk prediction models could be useful assessment tools for this purpose, provided that the models themselves have sufficient predictive power. More than 20 risk prediction models have been developed for BC since Gail published the first one in 1989 [1, 2]. The original Gail model (hereinafter ModelGail) was based on age, age at menarche and at first live birth, previous breast biopsy, and family history of BC, and yielded moderate discriminatory power (C-statistic) of 0.58 in external validations [3, 4]. New predictors, such as breast density, hormone replacement therapy (HRT), anthropometric measures, and lifestyle factors (e.g. alcohol intake), were successively introduced into later models, resulting in marginal improvements in prediction [5].

BC comprises etiologically distinct subtypes defined by molecular factors. Hormonal and reproductive factors, such as elevated circulating sex hormone levels, early menarche, delayed childbirth, and nulliparity, are associated only, or more strongly, with increased risk of subtypes expressing the estrogen receptor (ER+) and progesterone receptor (PR+) [6]. Further, ER+ breast tumors respond more favorably to hormone therapy than ER-/PR- tumors [6–8]. It has been hypothesized that combining etiologically distinct subtypes into a single outcome undermines BC prediction [9]. However, most published BC risk prediction models are omnibus models, and only one model differentiates risk associations by hormone receptor status [10].

In the current analysis, using data from the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Women’s Health Initiative (WHI) study in the USA, we examined whether modeling heterogeneous risk associations by ER status, which entails building ER-specific risk prediction models, could yield better prediction of BC risk.

Methods

Study population for model derivation and internal validation

The study population for model derivation consisted of women recruited into the EPIC cohort from 1992 to 2000 in 10 European countries (Norway, Sweden, Denmark, the UK, the Netherlands, Germany, France, Spain, Italy, and Greece) [11, 12]. Women with one or more of the following characteristics were excluded: (1) < 40 or > 70 years of age at recruitment (n = 49,410); (2) diagnosed with cancer before recruitment (n = 39,760); and (3) no information on censoring date and/or disease status (n = 142). All women recruited in the study center of Malmö, Sweden were also excluded due to lack of information on ER status for all BC diagnoses (n = 14,396). After these exclusions, 281,330 women (51% postmenopausal at recruitment) were included in the analysis.

Study population for external validation

The WHI study was launched in 1993 and recruited 161,808 postmenopausal women aged 50–79 years into either an observational study or one of the three clinical trials that tested the health effects of HRT, a low-fat diet, and calcium-vitamin D supplementation, respectively [13]. For the purpose of the present study, we excluded non-Caucasian women (n = 28,267), women in the HRT trial (n = 27,347), women who had mastectomy or a history of cancer at recruitment (n = 16,501), and women with incomplete information on the risk factors considered in our models (n = 29,431), resulting in a validation population of 82,319 women.

All women in the EPIC and WHI studies provided written informed consent. In the WHI study, Human Subjects Committee approval was obtained at each participating institution. The present study was approved by the Ethical Review Board of the International Agency for Research on Cancer (Lyon, France).

Risk factors and disease outcomes

Among the predictors most frequently included in current BC risk prediction models [5], the following variables were available in both EPIC and WHI and were therefore included in this study: menopausal status, age at menopause, age at menarche, duration of HRT, duration of breastfeeding, full-term pregnancy (FTP), number of FTPs, age at first FTP, body height, body mass index (BMI), interaction between BMI and menopausal status, alcohol intake, and country. Table 5 in the Appendix provides the coding of these predictor variables. We retained all women in the EPIC analysis and handled missing values by multiple imputation with chained equations, generating five imputed datasets [14]. Three predictor variables in the Gail model were not included in our models: family history of BC in first-degree relatives, previous breast biopsy, and history of atypical hyperplasia. In the EPIC study, family history of BC was available for only 49% of women, and information on previous breast biopsy and history of atypical hyperplasia was not collected.
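The text states that five imputed datasets were generated but not how the five sets of estimates were combined. Rubin's rules are the usual choice, so the sketch below assumes them. `pool_rubin` is a hypothetical helper, and the log-HRs and variances are made-up numbers, not EPIC results.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one regression coefficient across m imputed datasets (Rubin's rules).

    estimates -- per-imputation point estimates, e.g. log hazard ratios
    variances -- per-imputation squared standard errors
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 imputations")
    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (n - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Hypothetical log-HRs and squared SEs for one predictor from five imputed datasets.
log_hrs = [0.150, 0.162, 0.145, 0.158, 0.155]
sq_ses = [0.0016, 0.0017, 0.0016, 0.0016, 0.0017]

est, se = pool_rubin(log_hrs, sq_ses)
print(f"pooled HR = {math.exp(est):.2f} "
      f"(95% CI {math.exp(est - 1.96 * se):.2f} to {math.exp(est + 1.96 * se):.2f})")
```

The between-imputation term inflates the standard error to reflect uncertainty about the missing values. A single-imputation analysis would omit this term.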

Sensitivity analyses in the EPIC study that tested effect modification of parity by menopausal status showed no statistically significant interaction. Similarly, no effect modification was observed for HRT by BMI or for breastfeeding by parity. These interaction terms were therefore not retained.

Incident BC diagnoses among the EPIC women were ascertained through national cancer registries or a combination of health insurance records, pathology registries, and regular questionnaire surveys. The definition of positive hormone receptor status was standardized using the following criteria: ≥ 10% cells stained, any “plus-system description”, ≥ 20 fmol/mg, an Allred score of ≥ 3, immunoreactive score (IRS) ≥ 2, or an H-score ≥ 10. Among the WHI women, centrally trained, locally based physician adjudicators verified BC diagnoses by medical record and pathology report review, and positive hormone receptor status was defined as ≥ 10% cells stained [15].
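The harmonized EPIC positivity criteria can be written as a small classification function. This is an illustrative sketch. The function name and arguments are hypothetical. The rule that a tumor is positive when any recorded measurement meets its cut-off is an assumption, because the text does not say how conflicting measurements were resolved. A tumor with no recorded measurement comes out as unknown status.

```python
from typing import Optional


def receptor_positive(
    pct_cells_stained: Optional[float] = None,
    plus_system: Optional[str] = None,
    fmol_per_mg: Optional[float] = None,
    allred: Optional[int] = None,
    irs: Optional[int] = None,
    h_score: Optional[float] = None,
) -> Optional[bool]:
    """Apply the harmonized positivity cut-offs to whatever measurements exist.

    Returns True (positive), False (negative) or None (nothing recorded, i.e.
    unknown status). ASSUMPTION: positive if any recorded measurement meets its cut-off.
    """
    checks = []
    if pct_cells_stained is not None:
        checks.append(pct_cells_stained >= 10)   # >= 10% cells stained
    if plus_system is not None:
        checks.append("+" in plus_system)        # any "plus-system" grade, e.g. "+", "++"
    if fmol_per_mg is not None:
        checks.append(fmol_per_mg >= 20)         # ligand-binding assay, fmol/mg
    if allred is not None:
        checks.append(allred >= 3)               # Allred score
    if irs is not None:
        checks.append(irs >= 2)                  # immunoreactive score (IRS)
    if h_score is not None:
        checks.append(h_score >= 10)             # H-score
    if not checks:
        return None
    return any(checks)


print(receptor_positive(pct_cells_stained=35))  # True
print(receptor_positive(allred=2))              # False
print(receptor_positive())                      # None
```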

Absolute risk modeling

Using the EPIC data, we fitted cause-specific piecewise-constant hazards models [16] separately for ER+ and ER- tumors (hereinafter ModelER+ and ModelER-). The cut points of the piecewise-constant baseline hazards were placed at 45, 50, 55, 60, 65, 70, and 75 years of age. Whether a risk association was heterogeneous by ER status was examined using the likelihood ratio test [17].
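The sketch below shows a likelihood ratio test for heterogeneity. It assumes two nested fits on the same data: one with a common coefficient for ER+ and ER- tumors and one with ER-specific coefficients. The log-likelihood values are hypothetical. `chi2_sf` is a small standard-library helper written here to avoid a SciPy dependency. It is not taken from the article.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form Poisson sum
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus a finite half-integer series
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def heterogeneity_lrt(loglik_separate: float, loglik_common: float, extra_params: int):
    """Likelihood ratio test of ER-specific (separate) vs common coefficients."""
    stat = 2.0 * (loglik_separate - loglik_common)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods: giving body height separate coefficients for
# ER+ and ER- tumors adds one parameter to the model.
stat, p = heterogeneity_lrt(-50210.4, -50216.9, extra_params=1)
print(f"LR statistic = {stat:.1f}, p = {p:.2g}")
```

The degrees of freedom equal the number of extra coefficients. For a categorical factor such as age at first FTP with four contrasts, separating by ER status would add four parameters.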

Tumors with unknown ER status, primary cancers at other sites, and deaths from non-cancer causes were modeled as events competing with ER+ and ER- tumors. A Gompertz model with age as the time scale was fitted to all these competing events combined. In addition, ER+ and ER- tumors were treated as competing with each other.
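These pieces combine into an absolute risk in the usual competing-risks way. The pieces are the piecewise-constant cause-specific hazards for ER+ and ER- tumors, with cut points at ages 45 to 75, and a Gompertz hazard for all other competing events. The ER+ hazard is integrated against the probability of remaining free of every event. The sketch below assumes a simple fixed-step numerical integration. All parameter values are hypothetical and the function names are illustrative, not the authors' code. For external validation, the same function would take WHI baseline hazards in place of the EPIC ones while keeping the EPIC relative risks.

```python
import bisect
import math

# Age cut points of the piecewise-constant baseline hazards (from the Methods).
CUTS = [45, 50, 55, 60, 65, 70, 75]


def piecewise_hazard(age, rates):
    """rates: one annual baseline hazard per interval (<45, 45-50, ..., 70-75, >=75)."""
    return rates[bisect.bisect_right(CUTS, age)]


def gompertz_hazard(age, lam, gamma):
    """Combined competing-event hazard, lam * exp(gamma * age)."""
    return lam * math.exp(gamma * age)


def absolute_risk_er_pos(start_age, years, base_pos, rr_pos, base_neg, rr_neg,
                         lam, gamma, steps_per_year=100):
    """Probability that the first event in [start_age, start_age + years] is an ER+ tumor.

    rr_pos and rr_neg are exp(x * beta) relative risks from the cause-specific models.
    Hazards are held constant within each small step (evaluated at its midpoint).
    """
    dt = 1.0 / steps_per_year
    cum_hazard = 0.0   # cumulative hazard of any event up to the start of the step
    risk = 0.0
    for k in range(int(round(years * steps_per_year))):
        age = start_age + (k + 0.5) * dt
        h_pos = piecewise_hazard(age, base_pos) * rr_pos
        h_neg = piecewise_hazard(age, base_neg) * rr_neg   # ER- tumors compete with ER+
        h_comp = gompertz_hazard(age, lam, gamma)           # all other competing events
        h_all = h_pos + h_neg + h_comp
        # P(event-free so far) * P(some event in step) * P(that event is ER+)
        risk += math.exp(-cum_hazard) * (1.0 - math.exp(-h_all * dt)) * h_pos / h_all
        cum_hazard += h_all * dt
    return risk


# Hypothetical baseline hazards (per year) for the eight age intervals.
base_pos = [0.0008, 0.0012, 0.0016, 0.0020, 0.0024, 0.0026, 0.0027, 0.0027]
base_neg = [0.0002, 0.0003, 0.0003, 0.0004, 0.0004, 0.0004, 0.0004, 0.0004]
r = absolute_risk_er_pos(55, 5, base_pos, 1.0, base_neg, 1.0, lam=2e-5, gamma=0.085)
print(f"5-year ER+ risk from age 55: {r:.2%}")
```

Because competing events remove women from risk, doubling the ER+ relative risk slightly less than doubles the absolute risk.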

To evaluate the improvement in risk prediction by modeling the heterogeneous risk associations, an omnibus model was also fitted following the same methodology described above, treating ER+ and ER- tumors as one single disease outcome.

Model validation

We first validated our ER-specific models internally by fivefold cross-validation [18] and then externally using the WHI data. For external validation, we combined the model coefficients derived from the EPIC women with the ER-specific baseline hazards of the WHI women to project 5-year ER-specific absolute risks. We calculated C-statistics to assess discriminatory accuracy and the ratio of the expected to the observed number of tumors occurring in the first 5 years (E/O) to assess overall calibration. In the WHI women, we also projected the 5-year absolute risk of developing BC with ModelGail, enabling us to compare the performance of our models with that of ModelGail.
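The article does not specify which variant of the C-statistic was used; one option for censored survival data is Harrell's concordance. The sketch below is a simplified version that treats the 5-year outcome as binary and ignores censoring. It also computes the E/O ratio defined above. The data and function names are hypothetical.

```python
def c_statistic(risks, events):
    """Share of case/non-case pairs in which the case has the higher predicted risk.

    Simplification: treats the 5-year outcome as binary and ignores censoring.
    Ties in predicted risk count as one half.
    """
    cases = [r for r, e in zip(risks, events) if e]
    non_cases = [r for r, e in zip(risks, events) if not e]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for rc in cases:
        for rn in non_cases:
            if rc > rn:
                score += 1.0
            elif rc == rn:
                score += 0.5
    return score / (len(cases) * len(non_cases))


def expected_to_observed(risks, events):
    """E/O: sum of predicted 5-year risks divided by the number of observed tumors."""
    return sum(risks) / sum(events)


# Hypothetical 5-year risks and outcomes (1 = ER+ tumor within 5 years).
risks = [0.01, 0.02, 0.03, 0.04, 0.05]
events = [0, 0, 1, 0, 1]
print(round(c_statistic(risks, events), 3))           # 0.833
print(round(expected_to_observed(risks, events), 3))  # 0.075
```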

We performed decision curve analysis in the WHI women to compare the clinical applicability of ModelER+ and ModelGail for identification of women for chemoprevention.

Let B denote the benefit of receiving chemoprevention for an individual who would develop BC, H the harm of receiving chemoprevention for an individual who would never develop BC, and $p_i$ an individual's projected risk. The rationale of decision curve analysis is that a positive net benefit is guaranteed at the population level if chemoprevention covers only individuals whose risk projections $p_i$ exceed the risk threshold $p_t$, where:

$$p_t \times B = (1 - p_t) \times H \qquad \text{[19, 20]}$$
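Rearranging the threshold equation gives the benefit-to-harm ratio implied by any chosen threshold. The worked values below use thresholds that appear in the Results (0.55%, 1.67%, 2%, 2.5% and 4%). The ratios quoted in the text (180, 50, 40 and 25) are these values rounded.

```latex
\begin{aligned}
p_t B = (1 - p_t) H \quad &\Longrightarrow \quad \frac{B}{H} = \frac{1 - p_t}{p_t} \\[6pt]
p_t = 0.55\% \quad &\Longrightarrow \quad \frac{B}{H} = \frac{0.9945}{0.0055} \approx 181 \\
p_t = 1.67\% \quad &\Longrightarrow \quad \frac{B}{H} = \frac{0.9833}{0.0167} \approx 59 \\
p_t = 2\% \quad &\Longrightarrow \quad \frac{B}{H} = \frac{0.98}{0.02} = 49 \\
p_t = 2.5\% \quad &\Longrightarrow \quad \frac{B}{H} = \frac{0.975}{0.025} = 39 \\
p_t = 4\% \quad &\Longrightarrow \quad \frac{B}{H} = \frac{0.96}{0.04} = 24
\end{aligned}
```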

Because the values of B and H for chemoprevention remain unknown, net benefits are calculated across all possible risk thresholds between two extremes, zero and the maximal risk estimate, which correspond to a treat-all strategy and a treat-none strategy, respectively. The clinical applicability of a risk prediction model is indicated by how far the model's net benefit curve lies above those of the treat-all and treat-none strategies, i.e. by the area formed by the model's net benefit curve and the two extreme strategies.
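Net benefit is conventionally computed as in Vickers and Elkin's decision curve analysis. It is true positives per person minus false positives per person weighted by $p_t/(1 - p_t)$, which by the threshold equation equals H/B. The sketch below assumes that standard formula, since the article does not print its own. The risks and outcomes are hypothetical. Treat-none has a net benefit of 0 at every threshold.

```python
def net_benefit(risks, events, pt):
    """Net benefit of treating everyone whose projected risk is at or above pt."""
    n = len(risks)
    tp = sum(1 for r, e in zip(risks, events) if r >= pt and e)
    fp = sum(1 for r, e in zip(risks, events) if r >= pt and not e)
    return tp / n - (fp / n) * pt / (1.0 - pt)


def net_benefit_treat_all(events, pt):
    """Net benefit of treating everyone; treat-none is 0 at every threshold."""
    prevalence = sum(events) / len(events)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)


# Hypothetical projected risks and 5-year outcomes.
risks = [0.01, 0.02, 0.03, 0.04, 0.05]
events = [0, 0, 1, 0, 1]
for pt in (0.0, 0.015, 0.03, 0.045):
    print(f"pt={pt:.3f}  model={net_benefit(risks, events, pt):.4f}  "
          f"treat-all={net_benefit_treat_all(events, pt):.4f}  treat-none=0")
```

At a threshold of zero every woman is treated, so the model curve coincides with treat-all. The curves separate only above the lowest projected risk, as described for Fig. 2.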

Results

Cohort description

Country-specific distributions of the risk factors among the EPIC women are shown in Table 6 in the Appendix, and distributions of the same risk factors among the WHI women in Table 7 in the Appendix. During an average follow-up of 14.7 years, 12,067 BC cases (7210 ER+ tumors, 1598 ER- tumors, and 3259 tumors with unknown ER status), 16,929 primary cancers at other sites, and 6548 deaths from non-cancer causes were ascertained among the EPIC women (Table 1).
Table 1

Distribution of incident breast cancer (BC) by country, estrogen receptor (ER) status, and baseline menopausal status among the women from the European Prospective Investigation into Cancer and Nutrition (EPIC) and Women’s Health Initiative (WHI) studies

| Study | Age at recruitment (years) | Years of follow-up | Incident BC | Crude incidence rate (/10⁵ person-years) |
|---|---|---|---|---|
| EPIC study | | | | |
| WHI study | | | | |

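Table 1 reports crude incidence rates per 10⁵ person-years. The sketch below shows that calculation under a simple assumption: each woman contributes person-time from entry until diagnosis, death or censoring. The article does not state its exact person-time rules. The cohort records and the function name are hypothetical.

```python
def crude_incidence_per_100k(records):
    """Crude incidence rate per 100,000 person-years.

    records -- iterable of (age_at_entry, age_at_exit, incident_bc) tuples, one per
               woman; follow-up ends at diagnosis, death, or censoring.
    """
    person_years = 0.0
    cases = 0
    for entry_age, exit_age, incident_bc in records:
        person_years += exit_age - entry_age
        cases += int(incident_bc)
    return cases / person_years * 1e5


# Three hypothetical women contributing 30 person-years and one diagnosis.
cohort = [(50.0, 60.0, False), (55.0, 65.0, True), (60.0, 70.0, False)]
print(round(crude_incidence_per_100k(cohort), 1))  # 3333.3
```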
The ER-specific absolute risk models

Among the risk factors with identical associations by ER status (Table 2), being postmenopausal compared with premenopausal at recruitment was associated with reduced tumor risk after controlling for age (hazard ratio (HR) = 0.66, 95% confidence interval (CI) = 0.60 to 0.74). Among postmenopausal women, tumor risk increased statistically significantly and monotonically with older age at menopause compared with menopause at or before 45 years of age (p for trend < 0.001). No statistically significant association was observed between breastfeeding and BC risk among parous women. Later age at menarche (≥ 15 vs ≤ 11 years) was statistically significantly associated with decreased tumor risk (HR = 0.85, 95% CI = 0.79 to 0.92). Longer duration of HRT was statistically significantly associated with increased BC risk (p for trend < 0.001). BMI was associated with BC risk and showed a statistically significant interaction with menopausal status: for postmenopausal women, the HRs (95% CIs) for the BMI categories in ascending order were 1.11 (1.04 to 1.18), 1.21 (1.10 to 1.34), and 1.30 (1.11 to 1.53), respectively. For alcohol intake, consuming one or more drinks per day, compared with not drinking, was statistically significantly associated with increased BC risk.
Table 2

Risk associations for ER+ and ER- tumors in the EPIC and WHI studiesᵃ

Risk factors with homogeneous associations by ER status (one HR applies to both ER+ and ER- tumors)

| Risk factor | EPIC study (7210 ER+, 1598 ER- tumors), HR (95% CI) | WHI study (2276 ER+, 421 ER- tumors), HR (95% CI) |
|---|---|---|
| Menopausal status: postmenopausalᵇ vs premenopausal | 0.66 (0.60–0.74) | |
| Age at menopause, years: | | |
| 45.1–50.0 vs ≤ 45.0 | 1.16 (1.06–1.28) | 1.16 (1.05–1.29) |
| 50.1–55.0 vs ≤ 45.0 | 1.25 (1.13–1.38) | 1.41 (1.27–1.56) |
| > 55.0 vs ≤ 45.0 | 1.41 (1.21–1.63)ᵉ | 1.40 (1.20–1.62) |
| Breastfeeding, months: | | |
| 0.1–6 vs 0 | 1.01 (0.95–1.08) | 1.04 (0.94–1.14) |
| 6.1–12 vs 0 | 0.96 (0.88–1.04) | 1.04 (0.91–1.18) |
| > 12 vs 0 | 1.01 (0.93–1.11) | 1.07 (0.95–1.20) |
| Age at menarche, years: | | |
| 12 vs ≤ 11 | 1.06 (0.98–1.14) | 0.89 (0.80–0.99) |
| 13 vs ≤ 11 | 1.00 (0.93–1.07) | 0.82 (0.74–0.91) |
| 14 vs ≤ 11 | 0.97 (0.91–1.05) | 0.86 (0.75–0.98) |
| ≥ 15 vs ≤ 11 | 0.85 (0.79–0.92) | 0.78 (0.67–0.91) |
| HRT use, years: | | |
| 0.1–1.0 vs 0 | 1.17 (1.09–1.26) | 1.01 (0.86–1.19) |
| 1.1–2.0 vs 0 | 1.27 (1.15–1.40) | 1.17 (0.97–1.40) |
| 2.1–3.0 vs 0 | 1.39 (1.24–1.56) | 1.37 (1.13–1.65) |
| > 3.0 vs 0 | 1.55 (1.44–1.66)ᵉ | 1.53 (1.39–1.67)ᵉ |
| BMI, kg/m²: | | |
| 25.0–29.9 vs < 25.0 | 0.99 (0.92–1.07) | 1.02 (0.93–1.12) |
| 30.0–34.9 vs < 25.0 | 0.97 (0.85–1.10) | 1.14 (1.02–1.28) |
| ≥ 35.0 vs < 25.0 | 1.12 (0.92–1.36) | 1.23 (1.07–1.41) |
| BMI × menopauseᶜ: | | |
| 1 vs 0 | 1.11 (1.01–1.23) | |
| 2 vs 0 | 1.26 (1.07–1.47) | |
| 3 vs 0 | 1.17 (0.91–1.50) | |
| Alcohol intake, drinks per day: | | |
| < 1.0 vs 0 | 1.00 (0.94–1.07) | 1.08 (0.91–1.29) |
| 1.0–1.9 vs 0 | 1.14 (1.05–1.24) | 1.20 (0.98–1.47) |
| ≥ 2.0 vs 0 | 1.22 (1.12–1.33) | 1.26 (1.01–1.59) |

Risk factors with heterogeneous associations by ER status

| Risk factor | EPIC ER+, n = 7210, HR (95% CI) | EPIC ER-, n = 1598, HR (95% CI) | WHI ER+, n = 2276, HR (95% CI) | WHI ER-, n = 421, HR (95% CI) |
|---|---|---|---|---|
| FTPᵈ: yes vs no | 0.81 (0.71–0.91) | 0.97 (0.76–1.24) | 0.85 (0.67–1.08) | 0.65 (0.37–1.14) |
| Number of FTPs: | | | | |
| 2 vs 1 | 0.99 (0.92–1.06) | 1.05 (0.90–1.22) | 1.14 (0.97–1.34) | 1.15 (0.78–1.70) |
| ≥ 3 vs 1 | 0.87 (0.80–0.95) | 0.95 (0.81–1.13) | 0.99 (0.84–1.17) | 0.96 (0.65–1.41) |
| Age at first FTP, years: | | | | |
| 20.1–25.0 vs ≤ 20.0 | 1.05 (0.97–1.14) | 1.04 (0.89–1.23) | 1.03 (0.89–1.20) | 1.37 (0.97–1.95) |
| 25.1–30.0 vs ≤ 20.0 | 1.20 (1.10–1.31) | 0.93 (0.78–1.12) | 1.14 (0.97–1.33) | 1.34 (0.92–1.96) |
| 30.1–35.0 vs ≤ 20.0 | 1.32 (1.18–1.48) | 0.96 (0.75–1.23) | 1.59 (1.31–1.94) | 1.01 (0.58–1.75) |
| > 35.0 vs ≤ 20.0 | 1.46 (1.24–1.73)ᵉ | 0.91 (0.59–1.38) | 1.56 (1.16–2.09) | 1.10 (0.48–2.53) |
| Height, per 10-cm increment | 1.19 (1.15–1.24) | 1.06 (0.98–1.16) | 1.14 (1.06–1.22) | 1.04 (0.89–1.22) |

BMI body mass index, CI confidence interval, EPIC European Prospective Investigation into Cancer and Nutrition, ER estrogen receptor, FTP full-term pregnancy, HR hazard ratio, HRT hormone replacement therapy, WHI Women’s Health Initiative

ᵃ Heterogeneous risk associations among the EPIC women were examined using the likelihood ratio test. The resulting heterogeneous risk factor profiles were applied to the WHI women

ᵇ Age at menopause ≤ 45 years

ᶜ 0: premenopausal, or postmenopausal with BMI < 25 kg/m²; 1: postmenopausal with BMI 25.0–29.9 kg/m²; 2: postmenopausal with BMI 30.0–34.9 kg/m²; 3: postmenopausal with BMI ≥ 35 kg/m². Among postmenopausal women, the HRs (95% CIs) for BMI from low to high categories were 1.11 (1.04–1.18), 1.21 (1.10–1.34), and 1.30 (1.11–1.53)

ᵈ Number of FTPs = 1 and age at first FTP ≤ 20 years

ᵉ p for trend < 0.001
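Footnote c implies that, for postmenopausal women, the BMI hazard ratio is the product of the BMI main effect and the corresponding interaction term. For premenopausal women it is the main effect alone. This reading of the coding is our interpretation of footnote c. The sketch below reproduces the multiplication with the rounded EPIC values from Table 2. The postmenopausal products, about 1.10, 1.22 and 1.31, agree with the footnote's 1.11, 1.21 and 1.30 up to rounding of the published HRs. `bmi_hr` is a hypothetical helper.

```python
import math

# Rounded EPIC hazard ratios from Table 2: BMI main effects (vs < 25.0 kg/m2)
# and BMI x menopause interaction terms (codes 1-3 vs 0).
bmi_main = {"25.0-29.9": 0.99, "30.0-34.9": 0.97, ">=35.0": 1.12}
bmi_interaction = {"25.0-29.9": 1.11, "30.0-34.9": 1.26, ">=35.0": 1.17}


def bmi_hr(category: str, postmenopausal: bool) -> float:
    """HR for a BMI category vs < 25.0 kg/m2 under the interaction coding of footnote c."""
    log_hr = math.log(bmi_main[category])
    if postmenopausal:
        log_hr += math.log(bmi_interaction[category])   # coefficients add on the log scale
    return math.exp(log_hr)


for cat in bmi_main:
    print(cat, "pre:", round(bmi_hr(cat, False), 2), "post:", round(bmi_hr(cat, True), 2))
```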

Tests for heterogeneity showed differential risk associations by ER status for FTP, number of FTPs, age at first FTP, body height, and country (Table 2 and Table 8 in the Appendix). Parity (a single FTP, at age ≤ 20 years) compared with nulliparity was associated with a statistically significant reduction in ER+ tumor risk (HR = 0.81, 95% CI = 0.71 to 0.91). Among parous women, having three or more FTPs was associated with a further reduction in ER+ tumor risk compared with a single FTP (HR = 0.87, 95% CI = 0.80 to 0.95), and later age at first FTP was associated with increased ER+ tumor risk (p for trend < 0.001). In addition, every 10-cm increment in body height was associated with a 19% increase in ER+ tumor risk (HR = 1.19, 95% CI = 1.15 to 1.24). None of these factors, however, was statistically significantly associated with ER- tumor risk. Table 8 in the Appendix shows the coefficients for the different countries by ER status. Applying the same heterogeneous risk factor profiles to the WHI data (Table 2), we obtained risk associations largely comparable to those from the EPIC study, with the exceptions of age at menarche and, especially for ER- tumors, FTP, number of FTPs, and age at first FTP.

Model validation

Table 3 shows the predictive performance of the ER-specific models (C-statistic and E/O), corrected by fivefold cross-validation. ModelER+, ModelER- and the omnibus model shared a C-statistic of 0.68. Eliminating the country effect reduced the C-statistic markedly, to 0.64 for ModelER+, 0.59 for ModelER-, and 0.63 for the omnibus model. The C-statistic differed only slightly between premenopausal and postmenopausal women. The omnibus model had a higher C-statistic for ER+ than for ER- tumors (0.64 vs 0.59). ModelER+ statistically significantly overestimated the 5-year tumor risk by 10% (E/O = 1.10, 95% CI = 1.05 to 1.14), particularly among premenopausal women (13%). ModelER- non-significantly underestimated the risk, overall (E/O = 0.96, 95% CI = 0.88 to 1.05) and by menopausal status.
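The article reports E/O ratios with 95% CIs but does not describe how the intervals were obtained. The sketch below assumes one common approximation rather than taking it from the paper. It treats the observed count O as Poisson, so the standard error of log(E/O) is about 1/√O. The counts are hypothetical.

```python
import math


def eo_with_ci(expected: float, observed: int, z: float = 1.96):
    """E/O ratio with an approximate CI, treating the observed count O as Poisson.

    The standard error of log(E/O) is approximated by 1 / sqrt(O).
    """
    if observed <= 0:
        raise ValueError("need at least one observed tumor")
    ratio = expected / observed
    half_width = z / math.sqrt(observed)
    return ratio, ratio * math.exp(-half_width), ratio * math.exp(half_width)


# Hypothetical counts: 1100 tumors expected from summed 5-year risks, 1000 observed.
ratio, lower, upper = eo_with_ci(1100.0, 1000)
print(f"E/O = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under this approximation an interval that excludes 1, as for ModelER+ here, indicates statistically significant miscalibration, while ModelER-'s interval spans 1.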
Table 3

Internal validation of the estrogen receptor (ER)-specific risk prediction models (ModelER+ and ModelER-) by fivefold cross-validation, overall and by age, in the women from the European Prospective Investigation into Cancer and Nutrition (EPIC) study
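
| | ModelER+ | ModelER- | Omnibus model |
|---|---|---|---|
| C-statistic (95% CI) | | | |
| Before eliminating country effect | 0.68 (0.65–0.70) | 0.68 (0.64–0.72) | 0.68 (0.66–0.70) |
| After eliminating country effect | 0.64 (0.61–0.67) | 0.59 (0.54–0.64) | 0.63 (0.60–0.65) |
| After eliminating country effect, premenopausal | 0.64 (0.59–0.68) | 0.58 (0.51–0.66) | 0.62 (0.59–0.66) |
| After eliminating country effect, postmenopausal | 0.62 (0.59–0.66) | 0.60 (0.52–0.67) | 0.62 (0.59–0.65) |
| After eliminating country effect, ER+ tumors | | | 0.64 (0.62–0.67) |
| After eliminating country effect, ER- tumors | | | 0.59 (0.53–0.64) |
| Ratio of expected to observed, E/O (95% CI) | | | |
| Overall | 1.10 (1.05–1.14) | 0.96 (0.88–1.05) | 1.07 (1.03–1.11) |
| Premenopausal | 1.13 (1.06–1.20) | 0.97 (0.85–1.10) | 1.09 (1.02–1.15) |
| Postmenopausal | 1.07 (1.02–1.13) | 0.96 (0.84–1.08) | 1.06 (1.00–1.11) |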

External validation with the WHI data yielded a C-statistic of 0.59 (95% CI = 0.58 to 0.60) for ModelER+ and 0.53 (95% CI = 0.50 to 0.57) for ModelER- (Table 4). ModelGail yielded an overall C-statistic of 0.57 (95% CI = 0.56 to 0.59), with a markedly lower C-statistic of 0.53 (95% CI = 0.50 to 0.57) for ER- tumors. Regarding calibration, ER+ tumor risk was overestimated (E/O = 1.09, 95% CI = 1.03 to 1.14), whereas ER- tumor risk was underestimated, though not statistically significantly (E/O = 0.94, 95% CI = 0.82 to 1.06). ModelGail underestimated the overall BC risk by 22% (E/O = 0.78, 95% CI = 0.73 to 0.82). Among the EPIC women, the overestimation of ER+ tumor risk occurred largely in low-risk individuals (Fig. 1a); for ER- tumor risk, overestimation was observed mainly among low-risk individuals and underestimation mainly among high-risk individuals (Fig. 1b). Among the WHI women, the overestimation by ModelER+ and the underestimation by ModelGail were largely systematic (Fig. 1c and e). The statistically non-significant underestimation by ModelER- in the WHI women showed no clear pattern (Fig. 1d).
Table 4

External validation of the estrogen receptor (ER)-specific risk prediction models and the Gail model in women from the Women’s Health Initiative (WHI) study
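
| | ER-specific model: ModelER+ | ER-specific model: ModelER- | ModelGail |
|---|---|---|---|
| C-statistic (95% CI), overall | 0.59 (0.58–0.60) | 0.53 (0.50–0.57) | 0.57 (0.56–0.59) |
| C-statistic (95% CI), ER+ tumors | | | 0.58 (0.57–0.60) |
| C-statistic (95% CI), ER- tumors | | | 0.53 (0.50–0.57) |
| Ratio of expected to observed, E/O (95% CI) | 1.09 (1.03–1.14) | 0.94 (0.82–1.06) | 0.78 (0.73–0.82) |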

Fig. 1

Calibration of the risk prediction model of ER-positive tumors (ModelER+), risk prediction model of ER-negative tumors (ModelER-), and Gail risk prediction model (ModelGail) by risk deciles. a ModelER+ in women from the European Prospective Investigation into Cancer and Nutrition (EPIC); b ModelER- in the EPIC women; c ModelER+ in the women from the Women’s Health Initiative (WHI); d ModelER- in the WHI women; e ModelGail in the WHI women
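Fig. 1 compares predicted and observed risks by decile of predicted risk. The sketch below tabulates this on hypothetical data; the function name and grouping rule are illustrative, not the authors' code. In an overfitted model, E/O would tend to fall below 1 in the lowest groups and rise above 1 in the highest ones. A roughly constant E/O across groups instead indicates systematic miscalibration, the pattern described for ModelER+ and ModelGail in the WHI women.

```python
def calibration_by_group(risks, events, groups=10):
    """Mean predicted risk, observed proportion and E/O within groups of predicted risk.

    Women are sorted by predicted risk and split into `groups` nearly equal groups
    (deciles by default). E/O is None for a group with no observed tumors.
    """
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    rows = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        expected = sum(risks[i] for i in idx)
        observed = sum(events[i] for i in idx)
        rows.append((expected / len(idx), observed / len(idx),
                     expected / observed if observed else None))
    return rows


# Hypothetical data: 20 women with risks of 1% to 20%, four of whom develop a tumor.
risks = [i / 100 for i in range(1, 21)]
events = [1 if i % 5 == 0 else 0 for i in range(1, 21)]
for mean_pred, obs_prop, eo in calibration_by_group(risks, events):
    print(f"{mean_pred:.3f}  {obs_prop:.2f}  {eo}")
```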

Figure 2 shows the net benefit curves of ModelER+ and ModelGail. The net benefit curves of the two models started to diverge from the treat-all strategy at a risk threshold of 0.55%, which was roughly the minimal risk projected by both models. ModelER+ would yield higher net benefits than both the treat-all strategy and the treat-none strategy (denoted by the x-axis at y = 0) if the risk threshold lay between 0.55% and 2.5%, corresponding to an assumption that the benefit of chemoprevention is 180 to 40 times the harm. In contrast, ModelGail would yield lower net benefits than the treat-all strategy at any risk threshold below 2%, including 1.67%, the risk threshold currently adopted for chemoprevention in the USA, and would yield negative net benefits at risk thresholds above 4% (i.e. benefit ≈ 25 × harm). The clinical applicability of ModelGail, indicated by the sum of Area A and Area B in Fig. 2, was 3.0 × 10⁻⁶. The clinical applicability of ModelER+ was 12.7 × 10⁻⁶ (Area C).
Fig. 2

Net benefit curves for the risk prediction model of ER-positive tumors (ModelER+) (black solid line) and the Gail risk prediction model (ModelGail) (black broken line) applied to women from the Women’s Health Initiative study. Corresponding curves for the treat-all strategy are shown in gray (solid line for all breast cancer cases, broken line for ER+ tumors only). Area A = −7.84 × 10⁻⁶; Area B = 1.08 × 10⁻⁵; Area C = 1.27 × 10⁻⁵
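The article does not state exactly how Areas A to C were integrated. The sketch below assumes one plausible reading: a signed integral over thresholds of the model's net benefit minus the better of the treat-all and treat-none strategies. Under this reading, a negative area such as Area A marks thresholds where the model does worse than a reference strategy. The net benefit values are hypothetical. The last lines only check the reported arithmetic: Area A plus Area B gives the 3.0 × 10⁻⁶ quoted for ModelGail.

```python
def signed_area(thresholds, nb_model, nb_treat_all):
    """Trapezoid integral over thresholds of NB(model) - max(NB(treat-all), 0).

    Treat-none has net benefit 0 at every threshold, so max(., 0) picks the better
    of the two reference strategies. Negative contributions mark thresholds where
    the model does worse than the better reference strategy.
    """
    diffs = [m - max(a, 0.0) for m, a in zip(nb_model, nb_treat_all)]
    area = 0.0
    for k in range(1, len(thresholds)):
        width = thresholds[k] - thresholds[k - 1]
        area += 0.5 * width * (diffs[k] + diffs[k - 1])
    return area


# Hypothetical net benefit values on a coarse threshold grid.
pts = [0.00, 0.01, 0.02]
area = signed_area(pts, nb_model=[0.020, 0.015, 0.010], nb_treat_all=[0.020, 0.010, -0.001])
print(f"{area:.2e}")

# Arithmetic check of the reported areas: ModelGail = Area A + Area B.
gail_total = -7.84e-6 + 1.08e-5
print(f"{gail_total:.2e}")   # 2.96e-06, i.e. the 3.0 x 10^-6 reported in the text
```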

Discussion

The heterogeneous risk associations in our ER-specific risk prediction models are consistent with the established knowledge that FTP, number of FTPs, and delayed childbirth are associated with ER+ tumors but not with ER- tumors [6–8]. Our study also agrees with a large-scale meta-analysis of epidemiological data showing that BC risk increases with longer duration of HRT use [21]. Data from the WHI randomized trial showed a statistically significant increase in the incidence of, and mortality from, invasive BC in the estrogen-plus-progestin arm compared with the placebo arm [22, 23], whereas estrogen alone decreased BC incidence and mortality among postmenopausal women with prior hysterectomy [24, 25]. Stronger positive associations with BC have been reported for estrogen plus progestin than for estrogen alone [26, 27]. In the present study, we could not separate estrogen alone from estrogen plus progestin because the HRT compounds used by former users in EPIC were unknown. Among current HRT users at baseline, use of estrogen plus progestin was more common in EPIC than in the WHI cohort (76% vs 44%, respectively). Nevertheless, similar associations between the duration of lifetime HRT use and BC risk were observed in both the EPIC and the WHI study.

In the ER-specific risk models, statistically significant and homogeneous risk associations were fitted for age at menopause and age at menarche, in line with a pooled analysis of previous investigations in which nearly identical effects were observed for ER+ and ER- tumors [28]. The present study found no association between breastfeeding and BC risk, in contrast to previous investigations that reported inverse associations [6, 8, 29]. We note that most previous studies were case-control studies, which are susceptible to recall bias. Indeed, the inverse association disappeared in some cohort studies [30, 31]. In a more recent pooled analysis, breastfeeding was not associated with ER+ and/or PR+ tumors but was inversely associated with ER-/PR- tumors [32].

In a pooled analysis of prospective cohort data, every 10-cm increment in body height was statistically significantly associated with increased ER+ tumor risk (HR = 1.18) but was not associated with ER- tumor risk [33], supporting the way we modeled body height in the present study.

Prediction of ER+ tumor risk might be more useful in practice than prediction of overall BC risk [3], for two reasons. First, projecting subtype-specific risks allows more accurate estimation of the risk associations of etiologically heterogeneous factors and, as a result, might increase discriminatory power. Second, since currently used chemoprevention reduces only the risk of ER+ tumors [34], a model that specifically predicts the risk of developing ER+ tumors is needed.

In internal validation, the discriminatory accuracy of ModelER+ was no better than that of most current omnibus models based on questionnaire-derived data, suggesting limited improvement in discrimination from accounting for etiological heterogeneity. This was not surprising, given that ER+ tumors are the dominant subtype and that the omnibus model had parameters nearly equivalent to those of ModelER+ in the present study (data not shown). According to the only previous study that has modeled ER-specific risks, the discriminatory power of the ER+/PR+ model was moderately higher than that of the ER-/PR- model (0.64 vs 0.61) [10]. In that study, risk factors with heterogeneous associations included age, menopausal status, BMI, age at first birth, and past use of postmenopausal HRT, and the subtype-specific models were based on a relatively small number of tumors (1281 ER+/PR+ tumors, 417 ER-/PR- tumors). Notably, that study did not correct for potential overfitting by either internal or external validation.

When externally validated in the WHI cohort, ModelER+ showed moderate discriminatory accuracy comparable to that of ModelGail. In the USA, women with a 5-year BC risk of 1.67% or higher, as projected by ModelGail, are considered potentially eligible for chemoprevention [35]. This risk threshold would cover 36,265 (44.0%) women in our WHI validation population, of whom 1239 were subsequently diagnosed with ER+ tumors and 194 with ER- tumors. With ModelER+, a risk threshold of 1.97% would cover the same number of women while capturing 16 more women subsequently diagnosed with ER+ tumors and 2 fewer subsequently diagnosed with ER- tumors.
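The comparison at equal coverage can be reproduced by ranking women by projected risk. The risk of the woman ranked at the coverage count (36,265 in the WHI population) then serves as the threshold. This is presumably how the 1.97% threshold for ModelER+ was obtained, but the text does not describe the procedure, so it is an assumption. The function names and data below are hypothetical.

```python
def threshold_for_coverage(risks, n_covered):
    """Risk threshold at which the n_covered highest-risk women are selected (risk >= t).

    With tied risks at the threshold, slightly more than n_covered women may qualify.
    """
    if not 0 < n_covered <= len(risks):
        raise ValueError("n_covered must be between 1 and len(risks)")
    return sorted(risks, reverse=True)[n_covered - 1]


def tumors_captured(risks, events, threshold):
    """Number of women at or above the threshold who later developed the tumor."""
    return sum(1 for r, e in zip(risks, events) if r >= threshold and e)


# Hypothetical example: cover the 2 highest-risk women out of 5.
risks = [0.010, 0.030, 0.020, 0.050, 0.040]
events = [0, 1, 0, 1, 1]
t = threshold_for_coverage(risks, 2)
print(t, tumors_captured(risks, events, t))  # 0.04 2
```

Holding coverage fixed makes the two models directly comparable. Differences in captured tumors then reflect ranking quality rather than how many women are treated.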

The decision curve analysis provided further insight into the clinical applicability of ModelER+ and ModelGail. As indicated by the net benefit curves, ModelER+ would show no advantage over the treat-all strategy if the benefit-to-harm ratio of chemoprevention were higher than 180, equivalent to any risk threshold below the minimal risk projection (≈ 0.55%), whereas the corresponding boundary benefit-to-harm ratio for ModelGail was 50. Interestingly, the treat-all strategy would even outperform ModelGail at a risk threshold of 1.67%. In contrast to ModelGail, ModelER+ had a wider range of thresholds over which model-based decision-making yielded higher net benefits than either the treat-all or the treat-none strategy. Given the unknown benefit and harm of chemoprevention, ModelER+ thus has broader applicability than ModelGail, as indicated by the areas formed by the two models' net benefit curves and the two extreme strategies. As shown in Fig. 2, the lowest benefit-to-harm ratio at which chemoprevention against BC produces a positive net benefit is 25, whereas the corresponding ratio for chemoprevention against ER+ tumors is 40, suggesting that chemoprevention against ER+ tumors might be 1.6 times (40/25) more efficient than chemoprevention against all types of BC.

Among both the EPIC and the WHI women, ModelER+ overestimated the 5-year risk by about 10%, possibly because of misspecification of our models, such as imperfect fit of the baseline hazard functions (the baseline hazard estimates are given in Table 9 in the Appendix). More importantly, this overestimation was systematic rather than following the pattern typical of overfitting, in which risk is underestimated in low-risk individuals and overestimated in high-risk individuals [36].

We derived ER-specific models from a large prospective cohort and externally validated them in another large, independent cohort, a strong approach to robust parameterization and assessment of model performance. However, the present study has some limitations. Our models did not include some established risk factors, such as family history of BC (FHBC) and previous breast biopsy, because these variables were incomplete or not collected in the EPIC study. A complete-case analysis of EPIC women with known FHBC (n = 138,257, 49% of the sample) showed positive, homogeneous associations of FHBC with both tumor subtypes (HR for ER+ = 1.64, 95% CI = 1.49 to 1.81; HR for ER- = 1.50, 95% CI = 1.23 to 1.91; p for heterogeneity = 0.57), suggesting that including this factor would increase the predictive power of the model, though not differentially by tumor hormone receptor status. Another limitation was the underestimation of baseline hazards due to EPIC tumors with unknown ER status, which accounted for about 27% of BC diagnoses (3259 of 12,067). Under the assumption that ER status was missing at random, parameter estimates are expected to be unbiased, a necessary prerequisite for proper external validation, in which the underestimated baseline hazard is replaced by the actual baseline hazard function of the test population.

Conclusions

In summary, we found that modeling heterogeneous risk associations of epidemiological factors yields little improvement in BC risk prediction. Nevertheless, compared with the current omnibus models, a model specifically predictive of ER+ tumor risk could be more applicable in risk assessment for chemoprevention.


Abbreviations

BC: Breast cancer

BMI: Body mass index

CI: Confidence interval

EPIC: European Prospective Investigation into Cancer and Nutrition

E/O: Ratio of expected to observed

ER: Estrogen receptor

FHBC: Family history of breast cancer

FTP: Full-term pregnancy

HR: Hazard ratio

HRT: Hormone replacement therapy

IRS: Immunoreactive score

ModelER-: Risk prediction model of estrogen receptor-negative tumors

ModelER+: Risk prediction model of estrogen receptor-positive tumors

ModelGail: Gail model for breast cancer risk

PR: Progesterone receptor

WHI: Women’s Health Initiative



Acknowledgements

The authors are thankful to all women who participated in the EPIC study and the WHI study and all the supporting staff who contributed to data collection and management.


Funding

This study was undertaken while KL was a postdoctoral fellow at the International Agency for Research on Cancer, which was partially supported by the European Commission FP7 Marie Curie Actions–People-Co-funding of regional, national and international programs (COFUND). The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Research Center (DKFZ) and Federal Ministry of Education and Research (BMBF) (Germany); Hellenic Health Foundation (Greece); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence program on Food, Nutrition and Health (Norway); Health Research Fund (FIS) (PI13/00061 to Granada; PI13/01162 to EPIC-Murcia), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC (RD06/0020) (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (UK). The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C.

Availability of data and materials

The access policy to EPIC data and bio-specimens can be found at

Authors’ contributions

KL, PF, and MG designed the study. GA and RP provided access to the WHI data as sponsoring primary investigators. KL conducted the analyses, interpreted the results, and drafted the manuscript. All authors read and commented on the manuscript, and all authors read and approved the final manuscript.

Ethics approval and consent to participate

All women in the EPIC and WHI studies provided written informed consent. In the WHI study, approval was obtained from the Human Subjects Committee at each participating institution. The present study was approved by the Ethical Review Board of the International Agency for Research on Cancer (Lyon, France).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Authors’ Affiliations

Nutritional Methodology and Biostatistics Group, Nutrition and Metabolism Section, International Agency for Research on Cancer, 150 cours Albert Thomas, 69372 Lyon Cedex 08, France
Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, USA
Breast and Gynaecologic Cancer Registry of Côte d’Or, Georges-François Leclerc Comprehensive Cancer Care Centre, Dijon, France
EA 4184, Medical School, University of Burgundy, Dijon, France
CESP, INSERM U1018, Univ. Paris-Sud, UVSQ, Université Paris-Saclay, Villejuif, France
Gustave Roussy, Villejuif, France
Epidemiology and Prevention Unit, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
Cancer Registry and Histopathology Department, “Civic-M.P. Arezzo” Hospital, ASP, Ragusa, Italy
Escuela Andaluza de Salud Pública, Instituto de Investigación Biosanitaria ibs. GRANADA, Hospitales Universitarios de Granada/ Universidad de Granada, Granada, Spain
CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
Navarra Public Health Institute, Pamplona, Spain
IdiSNA, Navarra Institute for Health Research, Pamplona, Spain
CIBER de Epidemiología y Salud Pública (CIBERESP), Madrid, Spain
Department of Epidemiology, Regional Health Council, IMIB-Arrixaca, Murcia, Spain
Department of Health and Social Sciences, Universidad de Murcia, Murcia, Spain
Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology-IDIBELL, L’Hospitalet de Llobregat, Barcelona, Spain
Department of Epidemiology & Biostatistics, School of Public Health, Imperial College London, London, UK
Cancer Epidemiology Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
Department for Determinants of Chronic Diseases, National Institute for Public Health and the Environment, Bilthoven, The Netherlands
Department of Gastroenterology and Hepatology, University Medical Centre, Utrecht, The Netherlands
Department of Social & Preventive Medicine, Faculty of Medicine, University of Malaya, Kuala Lumpur, Malaysia
Hellenic Health Foundation, Athens, Greece
WHO Collaborating Center for Nutrition and Health, Unit of Nutritional Epidemiology and Nutrition in Public Health, Department of Hygiene, Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece
Division of Cancer Epidemiology, German Cancer Research Center, Heidelberg, Germany
Department of Population Health, New York University School of Medicine, New York, USA
Department of Environmental Medicine, New York University School of Medicine, New York, USA
Perlmutter Cancer Center, New York University School of Medicine, New York, USA
Department of Public Health and Clinical Medicine, Nutritional Research, Umeå University, Umeå, Sweden
Department of Surgical and Perioperative Sciences, Umeå University, Umeå, Sweden
Section for Epidemiology, Department of Public Health, Aarhus University, Aarhus, Denmark
Department of Cardiology, Aalborg University Hospital, Aalborg, Denmark
Department of Nutrition, Bjørknes University College, Oslo, Norway
Department of Research, Cancer Registry of Norway, Institute of Population-Based Cancer Research, Oslo, Norway
Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Genetic Epidemiology Group, Folkhälsan Research Center, Helsinki, Finland
Department of Community Medicine, University of Tromsø, The Arctic University of Norway, Tromsø, Norway
Nutritional Epidemiology Group, Nutrition and Metabolism Section, International Agency for Research on Cancer, Lyon, France
Biomarkers Group, Nutrition and Metabolism Section, International Agency for Research on Cancer, Lyon, France

References
  1. Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81(24):1879–86.
  2. Engelhardt EG, Garvelink MM, de Haes JH, van der Hoeven JJ, Smets EM, Pieterse AH, Stiggelbout AM. Predicting and communicating the risk of recurrence and death in women with early-stage breast cancer: a systematic review of risk prediction models. J Clin Oncol. 2014;32(3):238–50.
  3. Chlebowski RT, Anderson GL, Lane DS, Aragaki AK, Rohan T, Yasmeen S, Sarto G, Rosenberg CA, Hubbell FA. Predicting risk of breast cancer in postmenopausal women by hormone receptor status. J Natl Cancer Inst. 2007;99(22):1695–705.
  4. Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001;93(5):358–66.
  5. Anothaisintawee T, Teerawattananon Y, Wiratkapun C, Kasamesup V, Thakkinstian A. Risk prediction models of breast cancer: a systematic review of model performances. Breast Cancer Res Treat. 2012;133(1):1–10.
  6. Althuis MD, Fergenbaum JH, Garcia-Closas M, Brinton LA, Madigan MP, Sherman ME. Etiology of hormone receptor-defined breast cancer: a systematic review of the literature. Cancer Epidemiol Biomark Prev. 2004;13(10):1558–68.
  7. Yang XR, Chang-Claude J, Goode EL, Couch FJ, Nevanlinna H, Milne RL, Gaudet M, Schmidt MK, Broeks A, Cox A, et al. Associations of breast cancer risk factors with tumor subtypes: a pooled analysis from the Breast Cancer Association Consortium studies. J Natl Cancer Inst. 2011;103(3):250–63.
  8. Ma H, Bernstein L, Pike MC, Ursin G. Reproductive factors and breast cancer risk according to joint estrogen and progesterone receptor status: a meta-analysis of epidemiological studies. Breast Cancer Res. 2006;8(4):R43.
  9. Gierach GL, Yang XR, Figueroa JD, Sherman ME. Emerging concepts in breast cancer risk prediction. Curr Obstet Gynecol Rep. 2013;2(1):43–52.
  10. Colditz GA, Rosner BA, Chen WY, Holmes MD, Hankinson SE. Risk factors for breast cancer according to estrogen and progesterone receptor status. J Natl Cancer Inst. 2004;96(3):218–28.
  11. Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, Fahey M, Charrondiere UR, Hemon B, Casagrande C, Vignat J, et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5(6B):1113–24.
  12. Riboli E, Kaaks R. The EPIC project: rationale and study design. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol. 1997;26(Suppl 1):S6–14.
  13. Anderson GL, Manson J, Wallace R, Lund B, Hall D, Davis S, Shumaker S, Wang CY, Stein E, Prentice RL. Implementation of the Women's Health Initiative study design. Ann Epidemiol. 2003;13(9 Suppl):S5–17.
  14. Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20(1):40–9.
  15. Curb JD, McTiernan A, Heckbert SR, Kooperberg C, Stanford J, Nevitt M, Johnson KC, Proulx-Burns L, Pastore L, Criqui M, Daugherty S. Outcomes ascertainment and adjudication methods in the Women's Health Initiative. Ann Epidemiol. 2003;13(9 Suppl):S122–8.
  16. Royston P, Lambert PC. Flexible parametric survival analysis using Stata: beyond the Cox model. College Station: Stata Press; 2011.
  17. Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995;51(2):524–32.
  18. Verweij PJ, Van Houwelingen HC. Cross-validation in survival analysis. Stat Med. 1993;12(24):2305–14.
  19. Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak. 2006;26(6):565–74.
  20. Rousson V, Zumbrunn T. Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case-control studies. BMC Med Inform Decis Mak. 2011.
  21. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52,705 women with breast cancer and 108,411 women without breast cancer. Lancet. 1997;350(9084):1047–59.
  22. Chlebowski RT, Hendrix SL, Langer RD, Stefanick ML, Gass M, Lane D, Rodabough RJ, Gilligan MA, Cyr MG, Thomson CA, et al. Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: the Women's Health Initiative Randomized Trial. JAMA. 2003;289(24):3243–53.
  23. Chlebowski RT, Anderson GL, Gass M, Lane DS, Aragaki AK, Kuller LH, Manson JE, Stefanick ML, Ockene J, Sarto GE, et al. Estrogen plus progestin and breast cancer incidence and mortality in postmenopausal women. JAMA. 2010;304(15):1684–92.
  24. Stefanick ML, Anderson GL, Margolis KL, Hendrix SL, Rodabough RJ, Paskett ED, et al. Effects of conjugated equine estrogens on breast cancer and mammography screening in postmenopausal women with hysterectomy. JAMA. 2006;295(14):1647–57.
  25. Anderson GL, Chlebowski RT, Aragaki AK, Kuller LH, Manson JE, Gass M, Bluhm E, Connelly S, Hubbell FA, Lane D, et al. Conjugated equine oestrogen and breast cancer incidence and mortality in postmenopausal women with hysterectomy: extended follow-up of the Women's Health Initiative randomised placebo-controlled trial. Lancet Oncol. 2012;13(5):476–86.
  26. Beral V, Million Women Study Collaborators. Breast cancer and hormone-replacement therapy in the Million Women Study. Lancet. 2003;362(9382):419–27.
  27. Stahlberg C, Pedersen AT, Lynge E, Andersen ZJ, Keiding N, Hundrup YA, Obel EB, Ottesen B. Increased risk of breast cancer following different regimens of hormone replacement therapy frequently used in Europe. Int J Cancer. 2004;109(5):721–7.
  28. Collaborative Group on Hormonal Factors in Breast Cancer. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13(11):1141–51.
  29. Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and breastfeeding: collaborative reanalysis of individual data from 47 epidemiological studies in 30 countries, including 50302 women with breast cancer and 96973 women without the disease. Lancet. 2002;360(9328):187–95.
  30. Ritte R, Tikk K, Lukanova A, Tjonneland A, Olsen A, Overvad K, Dossus L, Fournier A, Clavel-Chapelon F, Grote V, et al. Reproductive factors and risk of hormone receptor positive and negative breast cancer: a cohort study. BMC Cancer. 2013;13:584.
  31. Phipps AI, Chlebowski RT, Prentice R, McTiernan A, Wactawski-Wende J, Kuller LH, Adams-Campbell LL, Lane D, Stefanick ML, Vitolins M, et al. Reproductive history and oral contraceptive use in relation to risk of triple-negative breast cancer. J Natl Cancer Inst. 2011;103(6):470–7.
  32. Islami F, Liu Y, Jemal A, Zhou J, Weiderpass E, Colditz G, Boffetta P, Weiss M. Breastfeeding and breast cancer risk by receptor status: a systematic review and meta-analysis. Ann Oncol. 2015;26(12):2398–407.
  33. Zhang B, Shu XO, Delahanty RJ, Zeng C, Michailidou K, Bolla MK, Wang Q, Dennis J, Wen W, Long J, et al. Height and breast cancer risk: evidence from prospective studies and Mendelian randomization. J Natl Cancer Inst. 2015;107(11):djv219.
  34. Nelson HD, Fu R, Griffin JC, Nygren P, Smith ME, Humphrey L. Systematic review: comparative effectiveness of medications to reduce risk for primary breast cancer. Ann Intern Med. 2009;151(10):703–15, W-226-35.
  35. Freedman AN, Graubard BI, Rao SR, McCaskill-Stevens W, Ballard-Barbash R, Gail MH. Estimates of the number of US women who could benefit from tamoxifen for breast cancer chemoprevention. J Natl Cancer Inst. 2003;95(7):526–32.
  36. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38.


© The Author(s). 2018