

Risk prediction for estrogen receptor-specific breast cancers in two large prospective cohorts



Few published breast cancer (BC) risk prediction models consider the heterogeneity of predictor variables between estrogen-receptor positive (ER+) and negative (ER-) tumors. Using data from two large cohorts, we examined whether modeling this heterogeneity could improve prediction.


We built two models, for ER+ (ModelER+) and ER- tumors (ModelER-), respectively, in 281,330 women (51% postmenopausal at recruitment) from the European Prospective Investigation into Cancer and Nutrition cohort. Discrimination (C-statistic) and calibration (the agreement between predicted and observed tumor risks) were assessed both internally and externally in 82,319 postmenopausal women from the Women’s Health Initiative study. We performed decision curve analysis to compare ModelER+ and the Gail model (ModelGail) regarding their applicability in risk assessment for chemoprevention.


Parity, number of full-term pregnancies, age at first full-term pregnancy and body height were only associated with ER+ tumors. Menopausal status, age at menarche and at menopause, hormone replacement therapy, postmenopausal body mass index, and alcohol intake were homogeneously associated with ER+ and ER- tumors. Internal validation yielded a C-statistic of 0.64 for ModelER+ and 0.59 for ModelER-. External validation reduced the C-statistic of ModelER+ (0.59) and ModelGail (0.57). In external evaluation of calibration, ModelER+ outperformed ModelGail: the former led to a 9% overestimation of the risk of ER+ tumors, while the latter yielded a 22% underestimation of the overall BC risk. Compared with the treat-all strategy, ModelER+ produced equal or higher net benefits irrespective of the benefit-to-harm ratio of chemoprevention, while ModelGail did not produce higher net benefits unless the benefit-to-harm ratio was below 50. The clinical applicability, i.e. the area defined by the net benefit curve and the treat-all and treat-none strategies, was 12.7 × 10⁻⁶ for ModelER+ and 3.0 × 10⁻⁶ for ModelGail.


Modeling heterogeneous epidemiological risk factors might yield little improvement in BC risk prediction. Nevertheless, a model specifically predictive of ER+ tumor risk could be more applicable than an omnibus model in risk assessment for chemoprevention.


Breast cancer (BC) screening and chemoprevention strategies should prioritize women who are expected to benefit from the interventions. Risk prediction models could be useful assessment tools to facilitate this strategy, as long as the models themselves possess sufficient predictive power. So far, more than 20 risk prediction models have been developed for BC since the first model developed by Gail in 1989 [1, 2]. Initially, the Gail model (hereinafter referred to as ModelGail) was based on age, age at menarche and at first live birth, previous breast biopsy, and family history of BC, yielding moderate discriminatory power (C-statistic) of 0.58 in external validations [3, 4]. New predictors, such as breast density, hormone replacement therapy (HRT), anthropometric measures, and lifestyle factors (e.g. alcohol intake), were continuously introduced into the succeeding models, resulting in marginal improvements in prediction [5].

BC comprises etiologically distinct subtypes defined by molecular factors. Hormonal and reproductive factors, such as elevated circulating sex hormone levels, early menarche, delayed childbirth, and nulliparity, are only or are more strongly related to increased risks of subtypes expressing estrogen receptor (ER+) and progesterone receptor (PR+) [6]. Further, ER+ breast tumors respond more favorably to hormone therapy than ER-/PR- tumors [6,7,8]. It has been hypothesized that combining etiologically distinct subtypes as one single outcome undermines BC prediction [9]. However, most of the published BC risk prediction models are omnibus models and only one model differentiates risk associations by hormone receptor status [10].

In the current analysis, using data from the European Prospective Investigation into Cancer and Nutrition (EPIC) and the Women’s Health Initiative (WHI) study in the USA, we examined whether modeling heterogeneous risk associations by ER status, which entails building ER-specific risk prediction models, could yield better prediction of BC risk.


Study population for model derivation and internal validation

The study population for model derivation consisted of women recruited into the EPIC cohort from 1992 to 2000 in 10 European countries (Norway, Sweden, Denmark, the UK, the Netherlands, Germany, France, Spain, Italy, and Greece) [11, 12]. Women with one or more of the following characteristics were excluded: (1) < 40 or > 70 years of age at recruitment (n = 49,410); (2) diagnosed with cancer before recruitment (n = 39,760); and (3) no information on censoring date and/or disease status (n = 142). All women recruited in the study center of Malmö, Sweden were also excluded due to lack of information on ER status for all BC diagnoses (n = 14,396). After these exclusions, 281,330 women (51% postmenopausal at recruitment) were included in the analysis.

Study population for external validation

The WHI study was launched in 1993 and recruited 161,808 postmenopausal women aged 50–79 years into either an observational study or one of the three clinical trials that tested the health effects of HRT, a low-fat diet, and calcium-vitamin D supplementation, respectively [13]. For the purpose of the present study, we excluded non-Caucasian women (n = 28,267), women in the HRT trial (n = 27,347), women who had mastectomy or a history of cancer at recruitment (n = 16,501), and women with incomplete information on the risk factors considered in our models (n = 29,431), resulting in a validation population of 82,319 women.

All women in the EPIC and WHI studies provided written informed consent. In the WHI study, Human Subjects Committee approval at each participating institution was provided. The present study was approved by the Ethical Review Board of the International Agency for Research on Cancer (Lyon, France).

Risk factors and disease outcomes

Among the most frequently included predictors in current BC risk prediction models [5], the following variables were available in EPIC and WHI, and were therefore included in this study: menopausal status, age at menopause, age at menarche, duration of HRT, duration of breastfeeding, full-term pregnancy (FTP), number of FTPs, age at first FTP, body height, body mass index (BMI), interaction between BMI and menopausal status, alcohol intake, and country. Table 5 in Appendix provides the coding of these predictor variables. We retained all the women for analysis and handled missing values by multiple imputation with chained equations, using five imputations [14]. Three predictor variables in the Gail model were not included in our models, i.e. family history of BC in first-degree relatives, previous breast biopsy, and history of atypical hyperplasia. In the EPIC study, family history of BC was only available for 49% of women, while information on previous breast biopsy and history of atypical hyperplasia was not collected.

Sensitivity analyses that included effect modification of parity by menopausal status in the EPIC study showed no evidence of statistically significant interactions. Similarly, no effect modifications were observed for HRT by BMI and breastfeeding by parity. These interactions were hence not retained further.

Incident BC diagnoses among the EPIC women were ascertained through national cancer registries or a combination of health insurance records, pathology registries, and regular questionnaire surveys. The definition of positive hormone receptor status was standardized using the following criteria: ≥ 10% cells stained, any “plus-system description”, ≥ 20 fmol/mg, an Allred score of ≥ 3, immunoreactive score (IRS) ≥ 2, or an H-score ≥ 10. Among the WHI women, centrally trained, locally based physician adjudicators verified BC diagnoses by medical record and pathology report review, and positive hormone receptor status was defined as ≥ 10% cells stained [15].

Absolute risk modeling

Using the EPIC data, we fitted cause-specific piecewise-constant hazards models [16] for ER+ and ER- tumors separately (hereinafter referred to as ModelER+ and ModelER-). The cutoffs were placed at 45, 50, 55, 60, 65, 70, and 75 years of age. Whether a risk association is heterogeneous by ER status was examined using the likelihood ratio test [17].

Tumors with unknown ER status, primary cancers at other sites, and deaths from non-cancer causes were modelled as competing events to ER+ tumors and ER- tumors. A Gompertz model with age as the time scale was fitted for all these competing events combined. In addition, ER+ and ER- tumors were considered mutually competing.
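To make the absolute risk calculation concrete, the following is a minimal sketch of how a 5-year risk can be projected from cause-specific hazards that are constant within age bands. It is an illustration under simplifying assumptions, not the fitted models: the competing-event hazard is also treated as piecewise constant here (the text uses a Gompertz model), the function names are hypothetical, and all hazard values passed in are placeholders rather than EPIC or WHI estimates.

```python
import math

# Age cut points as stated in the text; band 0 is "below 45", band 7 is "75 and over".
CUTS = [45, 50, 55, 60, 65, 70, 75]

def interval_index(age, cuts=CUTS):
    """Index of the age band that contains `age`."""
    return sum(age >= c for c in cuts)

def absolute_risk(age, horizon, target_hazards, competing_hazards,
                  rel_risk=1.0, cuts=CUTS):
    """Probability of the target event within `horizon` years after `age`.

    target_hazards / competing_hazards: one baseline hazard per age band
    (len(cuts) + 1 values each). rel_risk multiplies the target hazard only.
    ASSUMPTION: every hazard is constant within a band, so each band has a
    closed-form contribution.
    """
    risk, surv, t, end = 0.0, 1.0, age, age + horizon
    while t < end:
        i = interval_index(t, cuts)
        upper = cuts[i] if i < len(cuts) else math.inf
        nxt = min(upper, end)          # stop at the band edge or the horizon
        width = nxt - t
        h_target = target_hazards[i] * rel_risk
        h_total = h_target + competing_hazards[i]
        if h_total > 0:
            p_any = 1.0 - math.exp(-h_total * width)   # any event in this band
            risk += surv * (h_target / h_total) * p_any
            surv *= 1.0 - p_any                        # still event-free afterwards
        t = nxt
    return risk
```

With no competing events and a single constant hazard h, the function reduces to 1 − exp(−5h), which is a convenient sanity check; adding a competing hazard can only lower the projected risk.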

To evaluate the improvement in risk prediction by modeling the heterogeneous risk associations, an omnibus model was also fitted following the same methodology described above, treating ER+ and ER- tumors as one single disease outcome.

Model validation

First, we validated our ER-specific models internally by fivefold cross-validation [18] and then externally using WHI data. For external validation using the WHI data, we combined the model coefficients derived from the EPIC women and the ER-specific baseline hazards of the WHI women to project 5-year ER-specific absolute risks. We calculated C-statistics to assess discriminatory accuracy and the ratio of expected-to-observed number of tumors occurring in the first 5 years (E/O) to assess overall calibration. In the WHI women, the 5-year absolute risk of developing BC was projected using ModelGail, enabling us to compare the performance of our model with that of ModelGail.
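The two validation measures can be illustrated with a short sketch. It assumes a simple binary 5-year outcome per woman and ignores censoring; the exact estimators used in the analysis are not specified in the text, and the function names and toy values are hypothetical.

```python
def expected_observed_ratio(risks, events):
    """E/O: sum of projected risks divided by the number of observed events."""
    return sum(risks) / sum(events)

def c_statistic(risks, events):
    """Share of (case, non-case) pairs in which the case has the higher
    projected risk; tied pairs count one half."""
    cases = [r for r, e in zip(risks, events) if e]
    noncases = [r for r, e in zip(risks, events) if not e]
    concordant = 0.0
    for rc in cases:
        for rn in noncases:
            if rc > rn:
                concordant += 1.0
            elif rc == rn:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))

# Toy data (made up): four women, two of whom develop a tumor.
toy_risks = [0.1, 0.4, 0.35, 0.8]
toy_events = [0, 0, 1, 1]
print(c_statistic(toy_risks, toy_events))
print(expected_observed_ratio(toy_risks, toy_events))
```

An E/O above 1 indicates overestimation and below 1 underestimation, while a C-statistic of 0.5 corresponds to no discrimination. The pairwise loop is quadratic in cohort size, so a rank-based formula would be preferable for cohorts as large as those analysed here.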

We performed decision curve analysis in the WHI women to compare the clinical applicability of ModelER+ and ModelGail for identification of women for chemoprevention.

Let B denote the benefit of receiving chemoprevention for an individual who would develop BC, H the harm of receiving chemoprevention for an individual who would never develop BC, and pi an individual's projected risk. The rationale of decision curve analysis is that a positive net benefit is guaranteed at the population level if chemoprevention only covers individuals with risk projections pi above the risk threshold pt, where:

pt × B = (1 − pt) × H [19, 20].

Given the fact that quantities of B and H of chemoprevention remain unknown, net benefits are calculated through all the possible risk thresholds between two extremes, i.e. zero and the maximal risk estimate, representing a treat-all strategy and a treat-none strategy, respectively. The clinical applicability of a risk prediction model is indicated by how much the model’s net benefit curve is above the treat-all and treat-none strategies, i.e. the area formed by the model’s net benefit curve and the two extreme strategies.
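The threshold relation and the net benefit calculation can be sketched as follows. This uses the standard decision-curve definition of net benefit from Vickers and Elkin [19] (true positives minus false positives weighted by pt / (1 − pt), per person); whether the analysis used exactly this form or the overall net benefit variant of [20] is not stated, so treat the code as an assumption-laden illustration with hypothetical function names and made-up toy data.

```python
def benefit_to_harm(pt):
    """Rearranging pt * B = (1 - pt) * H gives B / H = (1 - pt) / pt."""
    return (1.0 - pt) / pt

def net_benefit(risks, events, pt):
    """Net benefit per person of treating everyone with projected risk >= pt.
    ASSUMPTION: ties at the threshold are treated."""
    n = len(risks)
    tp = sum(1 for r, e in zip(risks, events) if r >= pt and e)
    fp = sum(1 for r, e in zip(risks, events) if r >= pt and not e)
    return tp / n - (fp / n) * pt / (1.0 - pt)

def net_benefit_treat_all(events, pt):
    """Net benefit per person when everyone is treated regardless of risk."""
    prevalence = sum(events) / len(events)
    return prevalence - (1.0 - prevalence) * pt / (1.0 - pt)

# Toy data (made up): the model separates cases from non-cases at pt = 0.25.
toy_risks = [0.1, 0.2, 0.3, 0.4]
toy_events = [0, 0, 1, 1]
print(net_benefit(toy_risks, toy_events, 0.25))
print(net_benefit_treat_all(toy_events, 0.25))
```

Under this relation, risk thresholds of 0.55%, 2.5%, and 4% correspond to benefit-to-harm ratios of roughly 181, 39, and 24. The treat-none strategy has a net benefit of zero at every threshold.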


Cohort description

Country-specific distributions of the risk factors among the EPIC women are shown in Table 6 in Appendix. Distributions of the same risk factors among the WHI women are shown in Table 7 in Appendix. During an average follow-up period of 14.7 years, 12,067 BC cases (7210 ER+ tumors, 1598 ER- tumors, and 3259 tumors with unknown ER status), 16,929 primary cancers at other sites, and 6548 deaths from non-cancer causes were ascertained among the EPIC women, as reported in Table 1.

Table 1 Distribution of incident breast cancer (BC) by country, estrogen receptor (ER) status, and baseline menopausal status among the women from the European Prospective Investigation into Cancer and Nutrition (EPIC) and Women’s Health Initiative (WHI) studies

The ER-specific absolute risk models

Among the risk factors with identical associations by ER status (Table 2), being postmenopausal compared with premenopausal at recruitment was associated with a reduced tumor risk after controlling for age (hazard ratio (HR) = 0.66, 95% confidence interval (CI) = 0.60 to 0.74). For postmenopausal women, a statistically significant and monotonically increasing tumor risk was observed with older age at menopause compared with reaching menopause before the age of 45 years (ptrend < 0.001). No statistically significant association was observed for breastfeeding and breast cancer risk among parous women. Later age at menarche (≥ 15 vs ≤ 11 years of age) was statistically significantly associated with decreased tumor risk (HR = 0.85, 95% CI = 0.79 to 0.92). Duration of HRT was statistically significantly associated with increased breast cancer risk (ptrend < 0.001). BMI was associated with breast cancer and exhibited a statistically significant interaction with menopausal status: for postmenopausal women, HRs (95% CIs) for the BMI categories in ascending order were 1.11 (1.04 to 1.18), 1.21 (1.10 to 1.34), and 1.30 (1.11 to 1.53), respectively. For alcohol intake, exceeding one drink per day, compared with nondrinking, was statistically significantly associated with an increased breast cancer risk.

Table 2 Risk associations for ER+ and ER- tumors, the EPIC study and the WHI study

Tests for heterogeneity showed differential risk associations for FTP, number of FTPs, age at first FTP, body height, and country by ER status (Table 2 and Table 8 in Appendix). Parity (one single FTP, age at FTP ≤ 20 years) compared with nulliparity was associated with a statistically significant reduction in ER+ tumor risk (HR = 0.81, 95% CI = 0.71 to 0.91). Among parous women, having three or more FTPs was associated with a further risk reduction for ER+ tumors compared with one single FTP (HR = 0.87, 95% CI = 0.80 to 0.95), and delayed age at first FTP was associated with increased ER+ tumor risk (ptrend < 0.001). In addition, every 10-cm increment in body height was associated with a 19% increase in ER+ tumor risk (HR = 1.19, 95% CI = 1.15 to 1.24). None of these factors, however, was statistically significantly associated with ER- tumor risk. Table 8 in Appendix shows the coefficients for different countries by ER status. Based on the same heterogeneous risk factor profiles, we also estimated the risk associations using the WHI data (Table 2). These were largely comparable to those from the EPIC study, with the exception of age at menarche and, especially for ER- tumors, FTP, number of FTPs, and age at first FTP.

Model validation

Table 3 shows the predictive performance of the ER-specific models (C-statistic and E/O) corrected by the fivefold cross-validation. ModelER+, ModelER- and the omnibus model shared a C-statistic of 0.68. Elimination of the country effect reduced the C-statistic notably to 0.64 for ModelER+, 0.59 for ModelER-, and 0.63 for the omnibus model. A minor difference in C-statistic was observed between premenopausal and postmenopausal women. The omnibus model exhibited a higher C-statistic for ER+ than for ER- tumors (0.64 vs 0.59). ModelER+ significantly overestimated the 5-year tumor risk by 10% (E/O = 1.10, 95% CI = 1.05 to 1.14), particularly among premenopausal women (13%). ModelER- non-significantly underestimated the risk (E/O = 0.96, 95% CI = 0.88 to 1.05) overall and by menopausal status.

Table 3 Internal validation of the estrogen receptor (ER)-specific risk prediction models (ModelER+ and ModelER-) by fivefold cross-validation, overall and by age, in the women from the European Prospective Investigation into Cancer and Nutrition (EPIC) study

External validation with the WHI data resulted in a C-statistic of 0.59 (95% CI = 0.58 to 0.60) for ModelER+ and 0.53 (95% CI = 0.50 to 0.57) for ModelER- (Table 4). ModelGail yielded an overall C-statistic of 0.57 (95% CI = 0.56 to 0.59) with a markedly lower C-statistic of 0.53 (95% CI = 0.50 to 0.57) for ER- tumors. Regarding calibration, an overestimation was observed for ER+ tumors (E/O = 1.09, 95% CI = 1.03 to 1.14) whereas a statistically non-significant underestimation was observed for ER- tumors (E/O = 0.94, 95% CI = 0.82 to 1.06). ModelGail underestimated the overall BC risk by 22% (E/O = 0.78, 95% CI = 0.73 to 0.82). Among the EPIC women, the overestimation of ER+ tumor risk occurred largely in low-risk individuals (Fig. 1a); for ER- tumor risk, overestimation was observed mainly among low-risk individuals whereas underestimation was observed mainly among high-risk individuals (Fig. 1b). Among WHI women, the overestimation by ModelER+ and the underestimation by ModelGail were largely systematic (Fig. 1c and e). The statistically non-significant underestimation by ModelER- in the WHI women showed no clear pattern (Fig. 1d).

Table 4 External validation of the estrogen receptor (ER)-specific risk prediction models and the Gail model in women from the Women’s Health Initiative (WHI) study
Fig. 1

Calibration of the risk prediction model of ER-positive tumors (ModelER+), risk prediction model of ER-negative tumors (ModelER-), and Gail risk prediction model (ModelGail) by risk deciles. a ModelER+ in women from the European Prospective Investigation into Cancer and Nutrition (EPIC); b ModelER- in the EPIC women; c ModelER+ in the women from the Women’s Health Initiative (WHI); d ModelER- in the WHI women; e ModelGail in the WHI women

Figure 2 shows the net benefit curves of ModelER+ and ModelGail. The net benefit curves of the two models started to diverge from the treat-all strategies at the risk threshold of 0.55%, which was roughly the minimal risk projected by both models. ModelER+ would yield higher net benefits than both the treat-all strategy and the treat-none strategy (denoted by the x-axis at y = 0) if the risk threshold lay between 0.55% and 2.5%, corresponding to an assumption that the benefit of chemoprevention was 180 to 40 times the harm. In contrast, ModelGail would yield lower net benefits than the treat-all strategy if a risk threshold below 2% were selected, including 1.67%, the currently adopted risk threshold for chemoprevention in the USA, and would yield negative net benefits if a risk threshold above 4% (i.e. benefit ≈ 25 × harm) were selected. The clinical applicability of ModelGail, as indicated by the sum of Area A and Area B shown in Fig. 2, was 3.0 × 10⁻⁶. The clinical applicability of ModelER+ was 12.7 × 10⁻⁶ (Area C).

Fig. 2

Net benefit curves for the risk prediction model of ER-positive tumors (ModelER+) (black solid line) and the Gail risk prediction model (ModelGail) (black broken line) applied to women from the Women’s Health Initiative study. Corresponding curves for the treat-all strategy are represented in gray (solid line for all breast cancer cases, broken line for ER+ tumors only). Area A = −7.84 × 10⁻⁶; Area B = 1.08 × 10⁻⁵; Area C = 1.27 × 10⁻⁵
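As a quick arithmetic check, using only the area values given in the Fig. 2 caption, the two clinical applicability figures and their ratio work out as follows (the ratio is derived here for illustration and is not a quantity reported by the authors):

```latex
\begin{align*}
\text{Area A} + \text{Area B} &= -7.84 \times 10^{-6} + 1.08 \times 10^{-5}
  = 2.96 \times 10^{-6} \approx 3.0 \times 10^{-6} \\
\frac{\text{Area C}}{\text{Area A} + \text{Area B}} &= \frac{1.27 \times 10^{-5}}{2.96 \times 10^{-6}} \approx 4.3
\end{align*}
```

Area A enters with a negative sign because it covers the threshold range in which ModelGail falls below the treat-all strategy.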


The heterogeneous risk associations in our ER-specific risk prediction models are consistent with the established knowledge that FTP, number of FTPs, and delayed childbirth are associated with ER+ tumors but not with ER- tumors [6,7,8]. Our study also confirms a large-scale meta-analysis of epidemiological data showing that BC risk increases with prolonged duration of HRT use [21]. Data from the WHI randomized trial showed a statistically significant increase in the incidence and mortality of invasive BC in the estrogen-plus-progestin arm compared with the placebo arm [22, 23], whereas estrogen alone decreased BC incidence and mortality among postmenopausal women with prior hysterectomy [24, 25]. Stronger positive associations for estrogen plus progestin than for estrogen alone were reported for BC [26, 27]. In the present study, we could not separate estrogen alone and estrogen plus progestin because the HRT compounds used by former users in EPIC were unknown. Among current HRT users at baseline, use of estrogen plus progestin was more common in EPIC than in the WHI cohort (76% vs 44%, respectively). However, similar associations between the duration of lifetime HRT use and BC risk were observed in both the EPIC and the WHI study.

In ER-specific risk models, statistically significant and homogeneous risk associations were fitted for age at menopause and age at menarche, in line with a pooled analysis of previous investigations where nearly identical effects were observed for ER+ tumors and ER- tumors [28]. The present study demonstrated a null association between breastfeeding and BC risk, inconsistent with previous investigations where inverse associations were reported [6, 8, 29]. We note that most previous studies were case-control studies, which were subject to recall bias. In fact, the inverse association disappeared in some cohort studies [30, 31]. In a more recent pooled analysis, breastfeeding was not associated with ER+ and/or PR+ tumors but was inversely associated with ER-/PR- tumors [32].

In a pooled analysis of prospective cohort data, every 10-cm increment in body height was statistically significantly associated with ER+ tumor risk (HR = 1.18) but had null association with ER- tumor risk [33], supporting the way we modeled body height in the present study.

Prediction of ER+ tumor risk might be practically more useful than prediction of overall BC risk [3]. The reason for this is twofold. First, projecting subtype-specific risks allows for accurate estimation of the risk associations of factors that are etiologically heterogeneous and as a result might increase the discriminatory power. Second, since currently used chemoprevention only reduces the risk of ER+ tumors [34], there is a need for a model that can specifically predict the risk of developing ER+ tumors.

The discriminatory accuracy of ModelER+ in internal validation performed no better than most of the current omnibus models using questionnaire-derived data, suggesting limited improvement in discrimination after accounting for etiological heterogeneity. This was not surprising given that ER+ tumors are the dominant subtype and the omnibus model shared nearly equivalent parameters (data not shown) with ModelER+ in the present study. According to the only study so far that has modeled ER-specific risks, the discriminatory power of the ER+/PR+ model was moderately higher than that of the ER-/PR- model (0.64 vs 0.61) [10]. In that study, risk factors with heterogeneous associations included age, menopausal status, BMI, age at first birth, and past use of postmenopausal HRT, and its subtype-specific models were based on a relatively small number of tumors (1281 ER+/PR+ tumors, 417 ER-/PR- tumors). Notably, in that study there was no correction for potential overfitting by either internal or external approaches.

When externally validated in the WHI cohort, ModelER+ exhibited moderate discriminatory accuracy comparable to that of ModelGail. Women in the USA with 5-year BC risk of 1.67% or higher, projected by ModelGail, are considered potentially eligible for chemoprevention [35]. This risk threshold would lead to coverage of 36,265 (44.0%) women in our WHI validation population, of whom 1239 were subsequently diagnosed with ER+ tumors and 194 with ER- tumors. According to ModelER+, a risk threshold of 1.97% would cover the same number of women with 16 more prospective ER+ tumors and 2 fewer prospective ER- tumors.

The decision curve analysis provided some insight into the clinical applicability of ModelER+ and ModelGail. As indicated by the net benefit curves, ModelER+ would demonstrate no advantage over the treat-all strategy if the benefit-to-harm ratio of chemoprevention were higher than 180, equivalent to any risk threshold below the minimal risk projection (≈ 0.55%), while such a boundary benefit-to-harm ratio was 50 for ModelGail. Notably, the treat-all strategy would even outperform ModelGail when the risk threshold was situated at 1.67%. In contrast to ModelGail, ModelER+ had a wider threshold range where higher net benefits could be obtained by model-based decision-making than by either the treat-all or the treat-none strategy. Considering the unknown benefit and harm associated with chemoprevention, ModelER+ thus has broader applicability than ModelGail, as indicated by the areas formed by the two models’ net benefit curves and the two extreme strategies. As shown in Fig. 2, the lowest benefit-to-harm ratio for chemoprevention against BC to produce a positive net benefit is 25, whereas such a benefit-to-harm ratio for chemoprevention against ER+ tumors is 40, suggesting that chemoprevention against ER+ tumors might be 1.6 times (40/25) more efficient than chemoprevention against all types of BC.

Among both the EPIC women and the WHI women, ModelER+ overestimated the 5-year risk by about 10%, possibly due to misspecification of our models, such as imperfect fit of the baseline hazard functions (the baseline hazard estimates are given in Table 9 in the Appendix). More importantly, this overestimation was systematic rather than following an overfitting pattern, in which underestimation occurs in low-risk individuals and overestimation occurs in high-risk individuals [36].

We derived ER-specific models from a large prospective cohort and validated them externally in another large independent cohort. This is a strong approach to robust parameterization and assessment of model performance. However, the present study has some limitations. Our models did not include some established risk factors such as family history of BC (FHBC) and previous breast biopsy, as these variables were incomplete or not collected in the EPIC study. A complete-case analysis of EPIC women with known FHBC (n = 138,257, 49% of the sample) showed positive homogeneous associations between FHBC and tumor subtypes (HRER+ = 1.64, 95% CI = 1.49 to 1.81; HRER- = 1.50, 95% CI = 1.23 to 1.91; pheterogeneity = 0.57), suggesting that inclusion of this factor would increase the predictive power of the model, though not differentially across the hormone receptor status of the tumors. Another limitation of our study was the underestimation of baseline hazards due to EPIC tumors with unknown ER status, which accounted for about 25% of BC diagnoses. Under the assumption that ER-status data are missing at random, parameter estimates are expected to be unbiased, which is a prerequisite for proper external validation, whereas the underestimated baseline hazard would be replaced with the actual baseline hazard function of the test population.


In summary, we found that modeling heterogeneous risk associations of epidemiological factors yields little improvement in BC risk prediction. Nevertheless, compared with the current omnibus models, a model specifically predictive of ER+ tumor risk could be more applicable in risk assessment for chemoprevention.



BC:

Breast cancer

BMI:

Body mass index

CI:

Confidence interval

EPIC:

European Prospective Investigation into Cancer and Nutrition

E/O:

Ratio of expected to observed

ER:

Estrogen receptor

FHBC:

Family history of breast cancer

FTP:

Full-term pregnancy

HR:

Hazard ratio

HRT:

Hormone replacement therapy

IRS:

Immunoreactive score

ModelER-:

Risk prediction model of estrogen receptor-negative tumors

ModelER+:

Risk prediction model of estrogen receptor-positive tumors

ModelGail:

Gail model for breast cancer risk

PR:

Progesterone receptor

WHI:

Women’s Health Initiative


  1. 1.

    Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C, Mulvihill JJ. Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst. 1989;81(24):1879–86.

  2. 2.

    Engelhardt EG, Garvelink MM, de Haes JH, van der Hoeven JJ, Smets EM, Pieterse AH, Stiggelbout AM. Predicting and communicating the risk of recurrence and death in women with early-stage breast cancer: a systematic review of risk prediction models. J Clin Oncol. 2014;32(3):238–50.

  3. 3.

    Chlebowski RT, Anderson GL, Lane DS, Aragaki AK, Rohan T, Yasmeen S, Sarto G, Rosenberg CA, Hubbell FA. Predicting risk of breast cancer in postmenopausal women by hormone receptor status. J Natl Cancer Inst. 2007;99(22):1695–705.

  4. 4.

    Rockhill B, Spiegelman D, Byrne C, Hunter DJ, Colditz GA. Validation of the Gail et al. model of breast cancer risk prediction and implications for chemoprevention. J Natl Cancer Inst. 2001;93(5):358–66.

  5. 5.

    Anothaisintawee T, Teerawattananon Y, Wiratkapun C, Kasamesup V, Thakkinstian A. Risk prediction models of breast cancer: a systematic review of model performances. Breast Cancer Res Treat. 2012;133(1):1–10.

  6. 6.

    Althuis MD, Fergenbaum JH, Garcia-Closas M, Brinton LA, Madigan MP, Sherman ME. Etiology of hormone receptor-defined breast cancer: a systematic review of the literature. Cancer Epidemiol Biomark Prev. 2004;13(10):1558–68.

  7. 7.

    Yang XR, Chang-Claude J, Goode EL, Couch FJ, Nevanlinna H, Milne RL, Gaudet M, Schmidt MK, Broeks A, Cox A, et al. Associations of breast cancer risk factors with tumor subtypes: a pooled analysis from the Breast Cancer Association Consortium studies. J Natl Cancer Inst. 2011;103(3):250–63.

  8. 8.

    Ma H, Bernstein L, Pike MC, Ursin G. Reproductive factors and breast cancer risk according to joint estrogen and progesterone receptor status: a meta-analysis of epidemiological studies. Breast Cancer Res. 2006;8(4):R43.

  9. 9.

    Gierach GL, Yang XR, Figueroa JD, Sherman ME. Emerging concepts in breast cancer risk prediction. Curr Obstet Gynecol Rep. 2013;2(1):43–52.

  10. 10.

    Colditz GA, Rosner BA, Chen WY, Holmes MD, Hankinson SE. Risk factors for breast cancer according to estrogen and progesterone receptor status. J Natl Cancer Inst. 2004;96(3):218–28.

  11. 11.

    Riboli E, Hunt KJ, Slimani N, Ferrari P, Norat T, Fahey M, Charrondiere UR, Hemon B, Casagrande C, Vignat J, et al. European Prospective Investigation into Cancer and Nutrition (EPIC): study populations and data collection. Public Health Nutr. 2002;5(6B):1113–24.

  12. 12.

    Riboli E, Kaaks R. The EPIC project: rationale and study design. European Prospective Investigation into Cancer and Nutrition. Int J Epidemiol. 1997;26(Suppl 1):S6–14.

  13. 13.

    Anderson GL, Manson J, Wallace R, Lund B, Hall D, Davis S, Shumaker S, Wang CY, Stein E, Prentice RL. Implementation of the Women's Health Initiative study design. Ann Epidemiol. 2003;13(9 Suppl):S5–17.

  14. 14.

    Azur MJ, Stuart EA, Frangakis C, Leaf PJ. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res. 2011;20(1):40–9.

  15. 15.

    Curb JD, McTiernan A, Heckbert SR, Kooperberg C, Stanford J, Nevitt M, Johnson KC, Proulx-Burns L, Pastore L, Criqui M, Daugherty S. Outcomes ascertainment and adjudication methods in the Women's Health Initiative. Ann Epidemiol. 2003;13(9 Suppl):S122–8.

  16. 16.

    Royston P, Lambert PC. Flexible parametric survival analysis using Stata: beyond the Cox model. College Station: Stata Press; 2011.

  17. 17.

    Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995;51(2):524–32.

  18. 18.

    Verweij PJ, Van Houwelingen HC. Cross-validation in survival analysis. Stat Med. 1993;12(24):2305–14.

  19. 19.

    Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Mak. 2006;26(6):565–74.

  20. 20.

    Rousson V, Zumbrunn T. Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case-control studies. BMC Med Inform Decis Mak. 2011.

  21. 21.

    Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and hormone replacement therapy: collaborative reanalysis of data from 51 epidemiological studies of 52,705 women with breast cancer and 108,411 women without breast cancer. Lancet. 1997;350(9084):1047–59.

  22. 22.

    Chlebowski RT, Hendrix SL, Langer RD, Stefanick ML, Gass M, Lane D, Rodabough RJ, Gilligan MA, Cyr MG, Thomson CA, et al. Influence of estrogen plus progestin on breast cancer and mammography in healthy postmenopausal women: the Women's Health Initiative Randomized Trial. JAMA. 2003;289(24):3243–53.

  23.

    Chlebowski RT, Anderson GL, Gass M, Lane DS, Aragaki AK, Kuller LH, Manson JE, Stefanick ML, Ockene J, Sarto GE, et al. Estrogen plus progestin and breast cancer incidence and mortality in postmenopausal women. JAMA. 2010;304(15):1684–92.

  24.

    Stefanick ML, Anderson GL, Margolis KL, Hendrix SL, Rodabough RJ, Paskett ED, et al. Effects of conjugated equine estrogens on breast cancer and mammography screening in postmenopausal women with hysterectomy. JAMA. 2006;295(14):1647–57.

  25.

    Anderson GL, Chlebowski RT, Aragaki AK, Kuller LH, Manson JE, Gass M, Bluhm E, Connelly S, Hubbell FA, Lane D, et al. Conjugated equine oestrogen and breast cancer incidence and mortality in postmenopausal women with hysterectomy: extended follow-up of the Women's Health Initiative randomised placebo-controlled trial. Lancet Oncol. 2012;13(5):476–86.

  26.

    Beral V, Million Women Study Collaborators. Breast cancer and hormone-replacement therapy in the Million Women Study. Lancet. 2003;362(9382):419–27.

  27.

    Stahlberg C, Pedersen AT, Lynge E, Andersen ZJ, Keiding N, Hundrup YA, Obel EB, Ottesen B. Increased risk of breast cancer following different regimens of hormone replacement therapy frequently used in Europe. Int J Cancer. 2004;109(5):721–7.

  28.

    Collaborative Group on Hormonal Factors in Breast Cancer. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13(11):1141–51.

  29.

    Collaborative Group on Hormonal Factors in Breast Cancer. Breast cancer and breastfeeding: collaborative reanalysis of individual data from 47 epidemiological studies in 30 countries, including 50302 women with breast cancer and 96973 women without the disease. Lancet. 2002;360(9328):187–95.

  30.

    Ritte R, Tikk K, Lukanova A, Tjonneland A, Olsen A, Overvad K, Dossus L, Fournier A, Clavel-Chapelon F, Grote V, et al. Reproductive factors and risk of hormone receptor positive and negative breast cancer: a cohort study. BMC Cancer. 2013;13:584.

  31.

    Phipps AI, Chlebowski RT, Prentice R, McTiernan A, Wactawski-Wende J, Kuller LH, Adams-Campbell LL, Lane D, Stefanick ML, Vitolins M, et al. Reproductive history and oral contraceptive use in relation to risk of triple-negative breast cancer. J Natl Cancer Inst. 2011;103(6):470–7.

  32.

    Islami F, Liu Y, Jemal A, Zhou J, Weiderpass E, Colditz G, Boffetta P, Weiss M. Breastfeeding and breast cancer risk by receptor status—a systematic review and meta-analysis. Ann Oncol. 2015;26(12):2398–407.

  33.

    Zhang B, Shu XO, Delahanty RJ, Zeng C, Michailidou K, Bolla MK, Wang Q, Dennis J, Wen W, Long J, et al. Height and breast cancer risk: evidence from prospective studies and Mendelian randomization. J Natl Cancer Inst. 2015;107(11):djv219.

  34.

    Nelson HD, Fu R, Griffin JC, Nygren P, Smith ME, Humphrey L. Systematic review: comparative effectiveness of medications to reduce risk for primary breast cancer. Ann Intern Med. 2009;151(10):703–15, W-226-35.

  35.

    Freedman AN, Graubard BI, Rao SR, McCaskill-Stevens W, Ballard-Barbash R, Gail MH. Estimates of the number of US women who could benefit from tamoxifen for breast cancer chemoprevention. J Natl Cancer Inst. 2003;95(7):526–32.

  36.

    Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21(1):128–38.



The authors are thankful to all women who participated in the EPIC study and the WHI study and all the supporting staff who contributed to data collection and management.


This study was undertaken while KL was a postdoctoral fellow at the International Agency for Research on Cancer, which was partially supported by the European Commission FP7 Marie Curie Actions–People-Co-funding of regional, national and international programs (COFUND). The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Research Center (DKFZ) and Federal Ministry of Education and Research (BMBF) (Germany); Hellenic Health Foundation (Greece); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); ERC-2009-AdG 232997 and Nordforsk, Nordic Centre of Excellence program on Food, Nutrition and Health (Norway); Health Research Fund (FIS) (PI13/00061 to Granada; PI13/01162 to EPIC-Murcia), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, ISCIII RETIC (RD06/0020) (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C570/A16491 and C8221/A19170 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk, MR/M012190/1 to EPIC-Oxford) (UK). The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C.

Availability of data and materials

The access policy to EPIC data and bio-specimens can be found at

Author information

KL, PF, and MG designed the study. GA and RP provided access to the WHI data as sponsoring primary investigators. KL conducted the analyses, interpreted the results, and drafted the manuscript. All authors read and commented on the manuscript. All authors read and approved the final manuscript.

Correspondence to Pietro Ferrari.

Ethics declarations

Ethics approval and consent to participate

All women in the EPIC and WHI studies provided written informed consent. In the WHI study, Human Subjects Committee approval at each participating institution was provided. The present study was approved by the Ethical Review Board of the International Agency for Research on Cancer (Lyon, France).

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Description of the Gail model

The Gail model, also known as the Breast Cancer Risk Assessment Tool, has been adopted to estimate the 5-year absolute risk of developing invasive breast cancer among women aged 35 years or older. Women with a 5-year absolute risk of 1.67% or higher as projected by the Gail model are regarded as eligible for chemoprevention by tamoxifen. The Gail model includes the following predictor variables: age, ethnicity, age at menarche, age at first live birth, number of first-degree relatives with breast cancer, number of previous breast biopsies, and history of atypical hyperplasia. The relative risks of these risk factors were estimated from a case-control study within the Breast Cancer Detection Demonstration Project (BCDDP). The baseline age-specific hazard rates were also calculated from the BCDDP as the observed age-specific hazard rates times 1 minus the population attributable fraction [1]. Five-year breast cancer risk projection in the Women’s Health Initiative study using the Gail model has been detailed elsewhere [3].
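The baseline-hazard adjustment and the 1.67% eligibility threshold described above can be illustrated with a minimal Python sketch. It is not the Gail model itself: the function names, the input values, and the constant-hazard risk formula (which ignores competing mortality and age-varying hazards) are assumptions made purely for illustration.

```python
import math

# Threshold taken from the text: a 5-year absolute risk of 1.67% or higher.
GAIL_THRESHOLD = 0.0167


def baseline_hazard(observed_hazard: float, paf: float) -> float:
    """Baseline hazard = observed age-specific hazard x (1 - PAF)."""
    if not 0.0 <= paf < 1.0:
        raise ValueError("paf must lie in [0, 1)")
    return observed_hazard * (1.0 - paf)


def five_year_risk(h0: float, relative_risk: float, years: float = 5.0) -> float:
    """Simplified absolute risk (assumption): constant hazard over the
    interval and no competing causes of death."""
    return 1.0 - math.exp(-h0 * relative_risk * years)


def eligible_for_chemoprevention(risk: float, threshold: float = GAIL_THRESHOLD) -> bool:
    """True when the projected 5-year risk reaches the threshold."""
    return risk >= threshold


# Hypothetical inputs, not taken from the BCDDP or from this study.
h0 = baseline_hazard(observed_hazard=0.002, paf=0.5)
risk = five_year_risk(h0, relative_risk=4.0)
print(f"5-year risk: {risk:.4f}, eligible: {eligible_for_chemoprevention(risk)}")
```

The actual model combines age-specific baseline hazards, relative risks for each predictor, and competing mortality, so this sketch only shows how the two quantities named in the description relate to each other.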

Table 5 Coding for predictor variables
Table 6 Distributions of the risk factors among the EPIC women by country
Table 7 Distributions of the risk factors among the WHI women by ER status
Table 8 Coefficients for different countries by ER status in the women from the EPIC study
Table 9 Baseline piecewise constant hazard estimates for postmenopausal ER+ tumors and ER- tumors in the women from the EPIC and the WHI studies

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Li, K., Anderson, G., Viallon, V. et al. Risk prediction for estrogen receptor-specific breast cancers in two large prospective cohorts. Breast Cancer Res 20, 147 (2018).



  • Breast cancer
  • Risk prediction
  • Estrogen receptor
  • Prospective cohort
  • EPIC
  • WHI