Breast cancer risk prediction and individualised screening based on common genetic variation and breast density measurement

Introduction Over the last decade several breast cancer risk alleles have been identified which has led to an increased interest in individualised risk prediction for clinical purposes. Methods We investigate the performance of an up-to-date 18 breast cancer risk single-nucleotide polymorphisms (SNPs), together with mammographic percentage density (PD), body mass index (BMI) and clinical risk factors in predicting absolute risk of breast cancer, empirically, in a well characterised Swedish case-control study of postmenopausal women. We examined the efficiency of various prediction models at a population level for individualised screening by extending a recently proposed analytical approach for estimating number of cases captured. Results The performance of a risk prediction model based on an initial set of seven breast cancer risk SNPs is improved by additionally including eleven more recently established breast cancer risk SNPs (P = 4.69 × 10-4). Adding mammographic PD, BMI and all 18 SNPs to a Swedish Gail model improved the discriminatory accuracy (the AUC statistic) from 55% to 62%. The net reclassification improvement was used to assess improvement in classification of women into low, intermediate, and high categories of 5-year risk (P = 8.93 × 10-9). For scenarios we considered, we estimated that an individualised screening strategy based on risk models incorporating clinical risk factors, mammographic density and SNPs, captures 10% more cases than a screening strategy using the same resources, based on age alone. Estimates of numbers of cases captured by screening stratified by age provide insight into how individualised screening programs might appear in practice. Conclusions Taken together, genetic risk factors and mammographic density offer moderate improvements to clinical risk factor models for predicting breast cancer.


Introduction
Breast cancer screening aims to detect the disease early in women and thereby reduce mortality from breast cancer. It may not be cost-effective to screen all women equally often, but rather to allocate resources disproportionately across women at different risks of developing breast cancer. To identify high-and low-risk groups, a model for estimating a woman's individual risk is needed. One of the earliest and most widely used risk models for sporadic breast cancer is the Gail model [1]. The model uses the risk factors of current age, age at menarche, age at first live birth, number of previous breast biopsies and first-degree relatives with breast cancer and converts relative risk to absolute risk through use of baseline breast cancer incidence and mortality from other causes. Several studies have assessed the contribution of adding a measure of mammographic density to breast cancer risk prediction models [2][3][4] because mammographic density is one of the strongest risk factors for breast cancer with a high population attributable risk [5].
Over the past decade, several common, low penetrance risk alleles for breast cancer have been identified by genome-wide association studies (GWAS), which has led to a recent increased interest in individualised risk prediction for clinical purposes [6,7]. The potential impact of adding genetic information to the Gail model has been investigated by several researchers [2,[8][9][10]. Gail [2] added seven breast cancer risk-associated single-nucleotide polymorphisms (SNPs) to the standard Gail model and the discriminatory accuracy improved from an area under the receiver operating curve (AUC) of 60% to an AUC of 63%, which was however, less than the improvement found from adding mammographic density to the Gail model. A further 11 independent SNP associations have been recently validated in large GWAS and candidate gene studies, but their importance for risk prediction has not yet been thoroughly investigated [11][12][13][14][15][16][17][18][19][20].
Mealiffe et al. [8] studied prediction models based on the same seven SNPs as Gail, using data from the Women's Health Initiative clinical trial and a wider range of statistical methods. The authors studied changes in risk strata and provided evidence in favour of including genetic information in models for the prediction of breast cancer. Pharoah et al. [10] have further suggested that polygenic risk profiling may already provide sufficient information to justify targeting breast cancer screening to those women at highest risk. Based on a simple analytical strategy, Pashayan et al. [21] recently investigated the implications for individualised screening in England, using the 18 currently established breast cancer risk SNPs. The authors compared the efficiency of an individualised screening approach based on a polygenic profile, with the efficiency of a standard approach to screening, based on age alone.
We investigate the risk prediction performance of the currently established 18 breast cancer risk SNPs, empirically, in a well-characterised case-control study of breast cancer in Swedish women, with data available on mammographic density, BMI and Gail model variables. We evaluate performance of various prediction models by receiver operator characteristic curve analysis and by assessing reclassification of subjects into risk categories. We also evaluate the efficiency of individualised screening by extending the analytical strategy of Pashayan et al. [21] to incorporate non-genetic risk factors and to compare performance of screening programs based on equal resources with different risk-prediction models. Presentation of results stratified by age provides insight into how individualised screening programs might appear in practice.

Data
The individuals/subjects included in the current study are drawn from a population-based case-control study of postmenopausal breast cancer in women born in Sweden aged 50 to 74 years at the time of enrolment, which was between 1 October 1993 and 31 March 1995. Controls were randomly selected from the Swedish register of the total population and were frequency matched to the expected age distribution of the cases. Details on data collection and subjects have been described previously [22]. From the original case-control study, consisting of 3,345 cases and 3,454 controls, breast density measurements were available for 1,780 cases and 1,701 controls. In all, 1,569 breast cancer cases and 1,730 healthy controls, from the original casecontrol study, were included in a genetic study. Among these women breast density measurements were available for 1,022 cases and 868 controls. We carried out our analysis on three subsets: women with complete data on Gail, percentage density (PD) and body mass index (BMI) variables; women with complete data on Gail and SNP variables; and women with complete data on Gail, PD, BMI and SNP variables.
The process of collecting mammographic density in the cases and controls included in this study has been described elsewhere [23]. In short, medio-lateral oblique views were used. For controls, the side was chosen randomly, whereas for cases the side contralateral to the tumour was used. The density resolution was set at 12bit spatial resolution. Cumulus [24], a computer-assisted thresholding technique, was used to assess density on digitised film mammograms. For each image, a (single) trained observer set the appropriate gray-scale threshold levels defining the edge of the breast and distinguishing dense from non-dense tissue. The software calculated the total number of pixels within the entire region of interest and within the region identified as dense. The PD was then calculated from these values (dense area/ total breast area). The images were measured together with approximately the same amount of images for healthy, control women and the reader was blinded to case-control status. A random 10% of the images were included as replicates to assess the intra-observer reliability, which was high with a Spearman rank correlation coefficient of 0.92.
Genotyping was performed at the National University of Singapore. Approval of the study was given by the Institutional Review Boards in Sweden and the National University of Singapore.

Statistical analysis
Gail et al. [1] presented a method to estimate the probability that a woman, with a particular risk profile, in terms of age and other known risk factors will develop breast cancer during a specific time interval. The method can be used to combine case-control data with national registry data. Absolute risk is the probability that a subject who is free of the disease of interest at age a will be diagnosed with that disease in a subsequent age interval (a,a + δ], and can be written as: du} is the probability of surviving competing risks up to age t. In this equation the term S 2 (t)/S 2 (a) corresponds to the conditional probability of surviving other causes from age a to t, and the exponential term corresponds to surviving without breast cancer from age a to age t. At age t, there is an instantaneous probability h 1 (t)r(t)dt of developing breast cancer. The baseline hazard, h 1 (t), is estimated by multiplying age-specific breast cancer incidence rates, h * 1 (t), by a conversion factor equal to one minus the population attributable risk. The age-specific hazard of dying from other causes other than breast cancer is represented by h 2 (t) and is assumed to be the same for all individuals. The population attributable risk is a function of the relative risk model r(t), and can be determined according to an approach described by Bruzzi et al. [25]. In the current article, similarly to as in Gail et al. [1], we work under the simplifying assumptions that h 1 , h 2 and r are constant within five-year intervals. We estimated the age-specific breast cancer incidence rates and hazard of dying from other causes from the Swedish Cancer and Cause of Death registries, respectively [see Table A1 in Additional file 1], and treated these values as known, without error.
The Gail relative risk model [1] incorporates information on the risk factors age at menarche, age at first live birth, number of previous breast biopsies and firstdegree relatives. We did have information on age at menarche and age at first live birth, but used family history (binary) and benign breast disease (binary), respectively, as proxies for number of first-degree relatives and the number of previous breast biopsies. In our risk-prediction models, effect estimates for Gail risk factors [1], PD, BMI and the genetic markers were retrieved from literature, except for the two Gail proxy variables. We estimated the effect sizes of the proxy variables using our own data by fitting a logistic model with both main effects included in a model which included an offset term of combined effect from age at menarche and age of first live birth, based on published effect estimates. We assumed a multiplicative penetrance model for the breast cancer-associated SNPs. In order to provide relative odds of 1.0 or more for disease-associated alleles, where necessary the genotype scores were recoded such that the low-risk homozygote represented the baseline [2]. For SNPs with effect estimates from multiple sources [11][12][13][14][15][16][17][18][19][20], we used the inverse variance method ( [26]; pp.375) to obtain a weighted average of effect estimates from the separate studies.
Mammographic density has been consistently shown to be strongly associated with breast cancer and has previously been considered in breast cancer risk-prediction models [3][4][5]. Due to the strong negative correlation between body size and mammographic density, the effect of density on breast cancer risk is underestimated if body size is not adjusted for. We therefore included BMI, together with PD in our risk prediction models. We used effect sizes obtained externally from a large sample of postmenopausal women, from [27], as estimates of risk (odds ratios) of breast cancer according to percent mammographic density (six categories), adjusted for BMI and as estimates of risk of breast cancer according to BMI (five categories), adjusted for density (Table 1).
We used the Gail approach to estimate the 5-year and 10-year absolute risk of breast cancer based on age and various combinations of genetic and non-genetic risk factors. We evaluated various models for breast cancer risk based on subsets of women with data on (i) Gail, PD and BMI variables, (ii) Gail and SNP variables and (iii) Gail, PD, BMI and SNP variables.
We used the Hosmer-Lemeshow test to assess calibration of the prediction models based on comparing observed and expected outcomes within deciles of estimated risk. As in Mealiffe et al. [8], we first fitted a logistic regression model with a coefficient of one for the logit of the absolute risk to estimate a location parameter to account for the case-control design. We also evaluated Brier scores [see Additional file 2]. To assess discrimination we performed receiver operating characteristic curve analysis, calculating the AUC statistic, along with DeLong's non-parametric interval for AUC, and assessed departure from a model with no diagnostic capacity using the Mann-Whithney U test. We used the non-parametric approach of DeLong et al. [28] to test for differences in AUC.
To assess the ability of a new test to reclassify subjects accurately into higher or lower risk categories, we evaluated the two statistics suggested by Pencina et al. [29] for assessing improvement in model performance accomplished by adding new explanatory variables, the net reclassification improvement (NRI) and the integrated discrimination improvement (IDI). We also examined the predictiveness curve [30].
For the English population Pashayan et al. [21] have evaluated the efficiency of individualised screening strategies for breast cancer based on age and polygenic risk profiles. They evaluated the number of cases potentially detectable, along with the number of women eligible for screening (in the population of women aged 35 to 79 years) based on an individualised screening strategy of screening women aged 35 to 79 years with a 2.5% 10year risk evaluated as a function of age and polygenic profile. Their approach involves inferring points on the predictiveness curve in the population (aged 35 to 79 years) at large. We extended the procedure (see below) to evaluate the potential impact of individualised screening in Sweden. As mammography screening is offered to women aged 40 to 75 years in Sweden, we evaluated the performance of a number of individualised screening approaches against a baseline (age only) strategy of screening all women between aged 40 to 75 years. In Sweden, with its screening strategy, the 10-year absolute risk of breast cancer reaches 2.5% by age 40 years and is thereafter above 2.5% all the way up to age 75 years (absolute risk values derived from Table A1 [see Additional file 1], using (1); data not shown). In addition to a polygenic profile, we also incorporated Gail risk factors and PD in the calculation of individualised risk scores. To calculate individualised risk scores we simulated a population of 100,000 women aged 40 to 75 years, according to the age distribution of the Swedish population. For generating non-genetic risk factors for these women we sampled from our own controls, with replacement. As our data consist of postmenopausal women, for women aged younger than 50 years we were forced to make additional simplifying assumptions. We assumed that these women have the same age conditional risk factor distribution as women aged 50 years. We evaluated different screening strategies based on estimating the proportion of the population that has an individualised risk greater than a given threshold (1.5%, 2% or 2.5%) and the proportion of cases that are expected to occur within the high-risk subgroup. Evaluating at different thresholds enabled us to find screening strategies that use equal resources (% eligible for screening) but are based on different risk-prediction models. We stratified our results in five-year age intervals to shed light on how individualised screening strategies might appear in practice. Although the analytical approach for calculating the proportion of cases captured by screening does not explicitly model the process of evolving risk scores for individual women, dependence of the distributions of the non-genetic risk factors on age is incorporated and age stratification provides valuable insights. All statistical analyses were performed using the free statistical software R [31] and R packages ROCR and PredictABEL.

Results
We examined several models for predicting absolute risk. We examined the effects of including the four Gail variables, with modified variable definitions, which we refer to as the Swe-Gail variables, as well as the effect of including PD and BMI. In all, 18 breast cancer susceptibility loci with common risk alleles have been examined in this study (Table 1); referred to as The18 herein. We also selected out the earlier known subset of seven markers studied by Gail [2], referred to herein as The7.
We first examined the classification abilities of models with and without PD and BMI, but including Swe-Gail risk factors age at menarche, age at first live birth, family history, benign breast disease, in 1,739 cases and 1,672 controls (Table 2). Without PD and BMI we observed an AUC of 0.569 (95% confidence interval (CI) = 0.550 to 0.588), compared with an AUC of 0.602 (95% CI = 0.584 to 0.621) with PD and BMI. The difference in AUCs was statistically significant (ΔAUC = 0.033, P = 1.17 × 10 -7 ). Based on a subset of women with complete data on Gail variables and SNPs, a statistically significant improvement in AUCs was seen when adding The7 to the Swe-Gail model. Improvement was further enhanced when the recently discovered 11 SNPs were added (ΔAUC = 0.018, P = 4.69 × 10 -4 ). Furthermore, a gain in AUCs was observed from including these 11 SNPs when the baseline model also included PD and BMI. We finally selected a subset of women with complete data on Gail variables, SNPs, PD and BMI and compared the performances of the Swe-Gail model and a model additionally including PD, BMI and The18. The latter model, referred to as the full model herein, obtained an AUC of 0.619, improving the AUC by 0.067 (P = 3.24 × 10 -9 ). In this subset, with PD and BMI only we observed on AUC of 0.541 (95% CI = 0.515 to 0.568), with The18 only we observed an AUC of 0.589 (95% CI = 0.563 to 0.614) and with PD, BMI and The18 we observed an AUC of 0.600 (95% CI = 0.575 to 0.626).
The values of absolute five-year risk of breast cancer for the women included in our study, calculated at time of sampling/diagnosis based on the Swe-Gail model and the model additionally containing PD, BMI and The18 are plotted in Figure 1. Complementing the Gail model with PD and The18 increases the spread of the predicted absolute risks. A marked difference in distributions between cases and controls was observed for the full model. The means of the absolute five-year risks were 3.69% and 2.84% for cases and controls, respectively. Of the controls and the cases, 47.9% and 64.8%, respectively, had a five-year absolute risk higher than 2.5%. The difference in distributions between cases and controls was more subtle for the Swe-Gail model.
Assuming three risk categories, we used reclassification tables to compare pairs of models in terms of their assignment of women to low (0,ε 1 ), intermediate (ε 1 ,ε 2 ), and elevated risk categories (ε 2 ,1), based on five-year absolute risk estimates ( Table 3). As cut-off values we choose ε 1 (= 2.41%) to correspond to the first quartile of the estimated risk based on the Swe-Gail model and ε 2 (= 4.11%) to correspond to the third quartile. The NRI value for the comparison of the Swe-Gail model with the full model was 0.170 (Z = 5.750, P = 8.93 × 10 -9 ). In total, 46% of women were reclassified. Reclassification based on the full model was overall in the right direction, with an upward shift in risk categories for cases and a downward shift for controls. The global IDI measure was 0.004 (Z = 5.742, P = 9.33 × 10 -9 ). Using the cut-off values suggested by Mealiffe et al. [8], i.e. ε 1 = 1.5%,ε 2 = 2%, the NRI value for comparing the same two models was estimated to be 0.193 (Z = 8.229, P = 2.22 × 10 -16 ).
Model calibration was assessed using the Hosmer-Lemeshow approach and by calculating Brier scores. All models showed lack of fit; however, lack of model fit does not necessarily limit classification ability based on estimated risks [32]. Results for the Swe-Gail model and the full model are displayed in Tables A2 and A3 [ One way to assess the predictive power of a model is to estimate the proportions of cases that are accounted for by given percentages of the population at the highest risk [30]. For four personalised screening models, we compared the percentage of individuals eligible for screening and the percentage of cases potentially detectable by   screening, for eligible cases, against the (current) approach of screening all women aged 40 to 75 years at three cut-offs of eligibility [see Table A4 of Additional file 1]. The full model with eligibility for screening defined by an absolute risk cut-off value of 2% has a slightly lower level of eligibility than the Swe-Gail model with a 2.5% cut-off for eligibility (74% and 76%, respectively), but has a substantially higher catchment; for the full model 90% of the cases are potentially screen detectable, while for the latter only 85% are potentially detectable. As a consequence of adding SNPs, BMI and PD to the risk-prediction model, resources are more efficiently re-allocated to women with high-risk profiles.
With the aim of comparing screening strategies with equal resources, we also calculated the number of cases captured by screening based on the most efficient ageonly screening program, and based on individualised screening using the full model when confined to including only 76% of women aged 40 to 75 years. The percentage of all cases aged 40 to 75 years covered by the age-only based program was 81%, that is 4% less than the program based on the Swe-Gail model and this was 10% less than the program based on the full risk-prediction model, for which 91% of cases were screened. Results are summarised in Table 4. Age stratified percentages of cases eligible for screening together with the

Swe−Gail,PD,BMI,The18
Predicted risk Observed risk percentages of cases covered by screening, for the three programs with 76% coverage, along with a selection of models presented in Table A4, are presented in Table  A5 [see Additional file 1].

Discussion
In the present study we have investigated the potential gain in combining SNP information with clinical information and mammographic percentage density for the prediction of absolute risk of developing breast cancer, in the Swedish population, utilising the Gail approach [1]. We have examined several models for predicting absolute risk, in particular examining the importance of variables in the Gail model, mammographic percentage density, the seven SNPs studied by Gail [2], and an additional 11 SNPs that have recently been confirmed to be associated with breast cancer risk. We provide evidence that the AUC of the risk-prediction model based on the initial seven breast cancer risk SNPs is improved by additionally including the 11 more recently established breast cancer risk SNPs (P = 4.69 × 10 -4 ). We further show that including mammographic PD, BMI and the 18 SNPs, in the baseline Swe-Gail model, is strongly associated with positive reclassification (NRI = 0.170, P = 8.93 × 10 -9 ). The value of the AUC statistic, for assessing discrimination based on absolute risks calculated from the Gail model, which we observed is low compared with what has been observed in some studies carried out in the US. Rockhill et al. [33] observed an AUC of 0.58 based on the Nurse's Health Study and Gail [2] an AUC of 0.61 based on white women aged 50 years and over from the US National Health Interview Study. We note that for the standard Gail model, the standard deviation of the log relative risk estimated for our samples is lower than the value estimated in Gail [2] (0.34 compared with 0.36) and that generally an increase in variability in risk scores will be associated with an increased AUC value [34]. Studies of the original Gail model have reported AUC values ranging from 0.54 to 0.74, although the values at both ends of this interval have been observed in more 'extreme' populations (0.54 in a cohort of 70 year old and older US women [35] and 0.74 in a study of UK women aged 21 to 73 years from a UK family history clinic [36]).
In our study we observed improvements of 2 to 3% in AUC values as a result of adding mammographic density to risk-prediction models, which is slightly more than the 1% improvement observed by Tice et al. [3]. The increase is likely to be partially due to the good intra-observer reliability of the Cumulus method used for measuring percent density, compared with the BI-RADS method used by Tice et al. [3]; see [37]. Chen et al. [4] estimated an increase in excess of 4%, also using percent density. Mammograms/measurements of PD were available only on slightly fewer than 50% of the individuals in their study; statistical modelling was used to infer PD in the remaining subjects.
We examined the usefulness of the 18 markers on a population level, with respect to screening. Using published effect estimates for the 18 markers and the clinical variables we evaluated several approaches to individualised screening, against age only-based screening, in women aged 40 to 75 years (at different 10-year  risk cut-offs for defining eligibility for screening). We showed for the Swedish female population that a personalised screening approach based on a risk prediction model incorporating age, Gail model variables, PD, BMI and 18 SNPs captures significantly more breast cancer cases than screening approaches using equal resources based on age and Gail model variables and on age alone. The individualised screening strategies investigated here correspond very loosely to a strategy where all women are screened at baseline (e.g. at age 40 years) and at a small number of occasions (e.g. shortly after menopause, and at age 65 years), in order to ascertain personalised risk and between these occasions women are recommended to attend screening at intervals tailored to their personalised risk. In practice, rather than reducing the total number of mammograms, as in Pashayan et al. [21], and in the simulation study herein, individualised screening might in the first line reallocate existing resources unequally across women, according to their risk.
It is now recognised that stratification according to genetic risk scores may improve the efficiency of screening programs [10]. With on-going genotyping efforts by among others, the Breast Cancer Association Consortium [38], it is likely that in the near future the number of established breast cancer risk SNPs will increase markedly, potentially making a polygenic approach to disease prevention a reality [11]. In the future other novel risk factors could potentially be incorporated into the Gail approach, such as steroid hormone levels, more detailed reproductive history and novel measures of mammographic density, such as texture features [39].
The strengths of the present study are the populationbased setting with a high participation rate and the detailed information on key breast cancer risk factors, including mammographic density. To our knowledge this is the first study to assess the prediction performance of the currently established 18 breast cancer risk SNPs empirically.
There are limitations to the present study. Two of the variables in our prediction models varied slightly in definition from those used in the standard Gail model. For these variables we were forced to use internal effect estimates. Any bias in estimating discriminatory accuracy is, however, expected to be negligible. Further limitations are that the study is focused on postmenopausal women and that the family history variable used in our study is very crude. More sophisticated approaches have been described that more specifically describe the nature of the family history [40,41], using for example such variables as number and types of relatives affected with breast cancer (plus the ages at which they developed breast cancer), special risk factors such as BRCA1 and BRCA2 gene mutations and family history of cancers at other sites.
The approach used for assessing efficiency of individualised screening programs is simplistic. It assumes that women being screened are under constant surveillance and that cancer is instantaneously detectable without error. Moreover, our approach was based on further simplifying assumptions, for example that effect sizes are age independent. Related to this particular condition is our assumption that women aged less than 50 years have the same risk distribution of those women aged 50 years. However, in reality the relative risk associated with family history is higher at younger ages [42]. One way to relax our assumption would be to incorporate interaction effects between age and family history. Effects of other risk factors (e.g. breast density) may also vary with age, but the approach becomes unwieldy/estimates become unstable if we account for age-dependent effects of several risk factors. An approach to examining sensitivity of our results to our assumption, which addresses the issue more generally, is to investigate what happens when we increase/decrease the variance of the risk scores in the women aged less than 50 years by a fixed factor. When we increased the variance of the log relative risks by a fixed factor (10% increase) we observed an increase in the percentage of cases screened (approximately 1%) along with a very small increase (less than 1%) in the percentage of individuals eligible for screening, across all three considered prediction models, and advantages of the full model, compared with sub-models, were still observed (data not shown).
More refined approaches for evaluating screening strategies need to be developed and applied. It is important to incorporate breast cancer mortality as well as incidence and to at least partially reflect that breast cancer is a complex disease with a number of subtypes (which receive different treatments) and that patient survival outlooks vary. Accurately predicting an individuals risk of developing and dying from breast cancer remains a challenge. Microsimulation may prove a useful tool for accounting for the complicated processes of disease progression and detection when evaluating the efficiency of screening strategies [43]. Using microsimulation, it would be possible to assess refined strategies, for example, where screening intervals are defined as functions of breast cancer risk and to consider other aspects such as possible over-diagnosis and screening sensitivity.

Conclusions
Taken together, genetic risk factors and mammographic density offer moderate improvements to clinical risk factor models for predicting breast cancer.