Comparative validation of the BOADICEA and Tyrer-Cuzick breast cancer risk models incorporating classical risk factors and polygenic risk in a population-based prospective cohort of women of European ancestry

Background The Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) and the Tyrer-Cuzick breast cancer risk prediction models are commonly used in clinical practice and have recently been extended to include polygenic risk scores (PRS). In addition, BOADICEA has also been extended to include reproductive and lifestyle factors, which were already part of Tyrer-Cuzick model. We conducted a comparative prospective validation of these models after incorporating the recently developed 313-variant PRS. Methods Calibration and discrimination of 5-year absolute risk was assessed in a nested case-control sample of 1337 women of European ancestry (619 incident breast cancer cases) aged 23–75 years from the Generations Study. Results The extended BOADICEA model with reproductive/lifestyle factors and PRS was well calibrated across risk deciles; expected-to-observed ratio (E/O) at the highest risk decile :0.97 (95 % CI 0.51 − 1.86) for women younger than 50 years and 1.09 (0.66 − 1.80) for women 50 years or older. Adding reproductive/lifestyle factors and PRS to the BOADICEA model improved discrimination modestly in younger women (area under the curve (AUC) 69.7 % vs. 69.1%) and substantially in older women (AUC 64.6 % vs. 56.8%). The Tyrer-Cuzick model with PRS showed evidence of overestimation at the highest risk decile: E/O = 1.54(0.81 − 2.92) for younger and 1.73 (1.03 − 2.90) for older women. Conclusion The extended BOADICEA model identified women in a European-ancestry population at elevated breast cancer risk more accurately than the Tyrer-Cuzick model with PRS. With the increasing availability of PRS, these analyses can inform choice of risk models incorporating PRS for risk stratified breast cancer prevention among women of European ancestry. Supplementary Information The online version contains supplementary material available at 10.1186/s13058-021-01399-7.


Introduction
The Breast and Ovarian Analysis of Disease and Carrier Estimation Algorithm (BOADICEA) breast cancer model was originally developed to predict breast cancer risk for women using pedigree-level family history information and genetic testing results on rare pathogenic variants in high and moderate risk genes [1,2]. This model has been updated (version 5.0) to include reproductive and lifestyle factors and the recently developed polygenic risk score (PRS) based on 313 common germline variants [3] for applications in both general and high-risk populations [4,5]. The Tyrer-Cuzick or International Breast Intervention Study (IBIS) model [6], commonly used in clinical and research settings, also includes extensive family history and comprehensive risk factor information and has been updated to include information on PRS. We recently evaluated the performance of the Tyrer-Cuzick model (version 8.0) without PRS in a prospective cohort of women of European ancestry in the general population [7]. Here, we perform a comparative validation of the extended BOADICEA and Tyrer-Cuzick models incorporating the 313-variant PRS in the same prospective cohort.
The original BOADICEA and Tyrer-Cuzick models are considered in clinical guidelines [8,9] for management of women with a family history of breast cancer and have been implemented in user-friendly risk assessment tools that can incorporate PRS (BOADICEA: https://canrisk.org; IBIS: https://ibis.ikonopedia.com). Assessment of disease risk using PRS has been increasingly commercially available through genetic services and marketed to clinicians [10]. Given the widespread use of these models and their capabilities to incorporate PRS with other risk factors in flexible risk prediction tools, comparative prospective validation of these models and their extensions is critical to assess their ability to accurately identify women at different risk levels for risk-stratified screening, surveillance or prevention strategies.
We report results from a prospective comparative validation of the extended BOADICEA model (v.5) with risk factors and 313-variant PRS and Tyrer-Cuzick model (v.8) with the same PRS in the Generations Study, a population-based cohort study of UK women [11].

Methods
Data were used from a nested case-control sample within the Generations Study (2003-2012), a prospective cohort of over 113,000 UK women aged 16-102 years; details are elsewhere [7,11]. The comparative validation analyses of 5-year absolute risk of breast cancer were based on 1337 women aged 23-75 years, including 619 incident breast cancer patients within 5 years from study recruitment, with information on the PRS and the risk factors used in both the BOADICEA (v.5) and Tyrer-Cuzick (v.8) models ( Supplementary Fig. 1). Supplementary Table 1 summarizes the information on questionnairebased risk factors and 313-variant PRS for these women.
To update the original BOADICEA model, the relative risks for the risk factors and PRS were derived using the literature-based approach [3,7]; further details are given in Lee et al. [4]. In this model, the family history association, described by a residual polygenic component, was adjusted to account for the PRS explaining~20% of the breast cancer familial aggregation. The PRS was added to the Tyrer-Cuzick model (v.8) using the approach described in Brentnall et al. [12], where the associations of family history and PRS were unadjusted and assumed to be multiplicative on the risk of developing breast cancer. The comparative validation analyses were performed using the standardized model calibration and discrimination methods implemented in the Individualized Coherent Absolute Risk Estimator (iCARE) tool [13] (details in supplement). Briefly, model calibration was assessed in terms of relative and absolute risk by comparing the observed and expected quantities, overall, and within risk categories. The area under the curve (AUC) was estimated to assess model discrimination.

Results
For women younger than 50 years, the original and extended BOADICEA models (with PRS and with PRS and reproductive/lifestyle factors) showed good calibration of relative and absolute risk ( The original and extended BOADICEA models also showed good calibration of relative and absolute risk for women 50 years or older (Fig. 2) substantial improvements in AUC from 56.8 % (52.9 % − 60.6%) to 64.6 % (60.9 % − 68.2%). Adding risk factors substantially improved the risk discrimination of the original model (data not shown) and the extended model with PRS (Fig. 2). The Tyrer-Cuzick model with PRS had risk discrimination comparable to the extended BOADICEA model with PRS and risk factors; however, the former substantially overestimated risk for women at the highest risk decile [E/O : 1.73 (1.03 − 2.90)]. Overestimation of risk in high-risk deciles was present in models with or without the PRS (Supplementary Fig. 2).

Discussion
Our study shows that the extended BOADICEA model, which incorporated reproductive and lifestyle factors and a 313-SNP PRS to the familial aggregation information, predicted 5-year absolute risk of breast cancer more accurately than the Tyrer-Cuzick model with the same PRS, for women at the highest risk decile in the Generations study, a UK-based prospective cohort.
Previous studies in populations of women of European ancestry provided evidence of overestimation of absolute risk obtained from the Tyrer-Cuzick model without PRS for women in the highest risk decile [7,14]. Two recent studies that incorporated PRSs with fewer genetic variants to this model, showed good calibration in terms of relative risk, but did not evaluate absolute risk calibration [12,15]. Our results showed overestimation of the absolute risk for women at the high-risk deciles, possibly due to not attenuating the contribution of family history association to account for the substantial familial aggregation explained by the PRS. This can lead to inflated breast cancer risks, particularly for women with breast cancer family history who are more prevalent in highrisk deciles. Accounting for the correlation between the PRS and family history would likely reduce this overestimation and future studies are needed to investigate the extent of this reduction.
Strengths of the current analyses include the use of the Generations Study, a relatively recent populationbased cohort with a wide range of ages of participating women and the comparison of two widely used risk prediction tools that can incorporate PRS. With the increasing availability of PRS (e.g., in countries like US), such rigorous comparative evaluation of models incorporating PRS with other risk factor information is critical to assess their suitability in clinical and research applications. Moreover, model calibration was assessed both overall and within risk categories, in particular for women at the extremes of risk for whom prevention and screening are most relevant. The CanRisk tool has already implemented the BOADICEA model and its extensions. The current study provides some evidence of accurate risk predictions from this tool for the UK general population. Further evaluation of this tool in both general and high-risk populations is needed before widespread clinical applications. Moreover, future research is merited towards risk model building and validation for women of non-European ancestry.
To summarize, the extended BOADICEA model with PRS and reproductive/lifestyle factors identified women of European ancestry at elevated 5-year risk of breast cancer more accurately than the Tyrer-Cuzick model with PRS. As disease risk prediction with PRS is becoming more available through genetic services in some countries (e.g., the USA), these and other similar analyses will potentially inform the choice of risk models for developing risk-stratified breast cancer prevention and screening strategies for women of European ancestry.
Additional file 1: Supplementary Materials: Comparative validation of the BOADICEA and Tyrer-Cuzick breast cancer risk models incorporating classical risk factors and polygenic risk in a populationbased prospective cohort of women of European ancestry.Additional details on the definition of study follow-up, sources of genotype data, risk factors in BOADICEA and Tyrer-Cuzick models, model validation methods are given. Supplementary Fig. 1. shows the study design in the validation cohort. Supplementary Fig. 2. shows a comparative validation of Tyrer-Cuzick model with and without PRS in the Generations Study. Supplementary Table 1 shows the risk factor distribution in the Generation Study. Figure 1. Calibration and discrimination of five-year risk predictions of breast cancer for women younger than 50 years in the nested casecontrol sample of the Generations Study cohort with risk categories based on deciles of predicted five-year absolute risk. Figure 2. Calibration and discrimination of five-year risk predictions of breast cancer for women aged 50 years or older in the nested case-control sample of the Generations Study cohort with risk categories based on deciles of predicted five-year absolute risk.