Modeling risks of cardiovascular and cancer mortality following a diagnosis of loco-regional breast cancer

Background Many women with breast cancer also have a high likelihood of cardiovascular mortality, and while there are several cardiovascular risk prediction models, none have been validated in a cohort of breast cancer patients. We first compared the performance of commonly-used cardiovascular models, and then derived a new model where breast cancer and cardiovascular mortality were modeled simultaneously, to account for the competing risk endpoints and commonality of risk factors between the two events. Methods We included 20,462 women diagnosed with stage I–III breast cancer between 2000 and 2010 in Kaiser Permanente Northern California (KPNC) with follow-up through April 30, 2015, and examined the performance of the Framingham, CORE and SCOREOP cardiovascular risk models by area under the receiver operating characteristic curve (AUC), and observed-to -expected (O/E) ratio. We developed a multi-state model based on cause-specific hazards (CSH) to jointly model the causes of mortality. Results The extended models including breast cancer characteristics (grade, tumor size, nodal involvement) with CVD risk factors had better discrimination at 5-years with AUCs of 0.85 (95% CI 0.83, 0.86) for cardiovascular death and 0.80 (95% CI 0.78, 0.87) for breast cancer death compared with the existing cardiovascular models evaluated at 5 years AUCs ranging 0.71–0.78. Five-year calibration for breast and cardiovascular mortality from our multi-state model was also excellent (O/E = 1.01, 95% CI 0.91–1.11). Conclusion A model incorporating cardiovascular risk factors, breast cancer characteristics, and competing events, outperformed traditional models of cardiovascular disease by simultaneously estimating cancer and cardiovascular mortality risks. Supplementary Information The online version contains supplementary material available at 10.1186/s13058-021-01469-w.

well as the cardiotoxicity of many cancer therapies [4]. In order to reduce health disparities in cardio-oncology outcomes, it is essential that we have a better way of risk stratifying women based on their CVD risk and BC treatment history. It is likely that the risk of cardiovascular disease and mortality [4,5], as well as the risk of treatment related cardio-toxic effects in BC survivors is as a result underestimated by providers and women [4,6,7].
Given the expected long-term survival and increase in risk of cardiovascular events and mortality in BC survivors, prospective evaluation of the performance of risk models will help in clinical risk counseling by primary care physicians and oncologists. Currently, there are a number of CVD risk prediction models that have been developed for the general population [8][9][10][11][12]. However, some have been found to have poor performance in persons older than age 65 years, and especially in women [13][14][15]. The performance of these commonly-used CVD models in women with a history of BC has not been evaluated. To address this gap, we examined the performance of established CVD risk models in a non-metastatic BC population. We then developed an enhanced model to more accurately identify combinations of risk factors that may predict an increased risk for CVD versus BC mortality.

Study population and data collection
Our study population included all women who were diagnosed for the first time with stages I to III BC in the Kaiser Permanente Northern California (KPNC) healthcare delivery system from January 1, 2000 to December 31, 2010, with follow-up through April 30, 2015. The KPNC coverage area includes 23 counties in the San Francisco Bay Area and the Central Valley of California [16,17]. We linked data from the KPNC Cancer Registry to the electronic medical record (EMR) and administrative and clinical electronic databases within KPNC to obtain patient demographic, cancer specific, and model covariate information. This study including the analytic data plan was approved by the KPNC Institutional Review Board.
Using the medical record, we defined patient age as age at first diagnosis of BC, race as White, Black, Asian/ Pacific Islander, or Other, and ethnicity as Hispanic or non-Hispanic. Information on BC included type of surgery (lumpectomy, mastectomy (unilateral or bilateral)), stage, grade, lymph nodes (number examined and number positive), estrogen receptor (ER) and progesterone receptor (PR) status, HER2 status, tumor size (cm), and laterality. Information on treatment was also extracted from the medical record, including chemotherapy (within the first year of diagnosis) and hormonal therapy. Treatment variables included whether or not the patient received any radiation, chemotherapy, or hormonal therapy. For chemotherapy we also extracted intravenous medications received, and summarized by drug class. Using data from the medical record, we summarized the patients' other comorbidities based on the modified Charlson Comorbidity Index [18].
We extracted the first recorded measurement of CVD risk factors for this patient population in the time frame of 6 months prior to cancer diagnosis to up to 18 months post diagnosis. We extracted the following information needed for the CVD models: total cholesterol, HDL cholesterol, LDL cholesterol, systolic blood pressure (average of 2 measurements if available), smoking status, diabetes, and whether prescribed any blood pressure lowering medication. We also recorded history of CVD defined as the occurrence of any of the following ICD-9 codes at any time prior to, up through 6 months post, BC diagnosis [19] (ICD-9 codes: 410-414, 428, 431, 432, 434, 435, 440.21). We include both CVD mortality and morbidity.
We created person-time data based on date of diagnosis of BC until either disenrollment from KPNC (defined as > 90-day lapse in enrollment), date and cause of death (available in the KPNC Cancer Registry), or end of follow-up (April 30, 2015), whichever came first. Cardiovascular death also included ICD-10 codes: I00-I99.

CVD model selection
Models were first selected based on a systematic review of the literature [20] and were limited to those that could be validated based on availability of data from the KPNC EMR. A summary of included models can be found in Additional file 1: Table S1. The majority of the models were developed on the Framingham cohort and predicted the endpoint "hard CHD events", defined as coronary death or CHD events (ICD-9 codes 402, 410-414, 429.2, and 429.9, see Additional file: Table S2 for full details). Additional focus was given to any models developed specifically for an older population (age > 65 years), which included the Systematic Coronary Risk Evaluation (SCORE) [8,13] and Coronary Risk in the Elderly (CORE) [14]. Other CVD risk models that could not be validated due to their use of covariates that are not routinely captured in clinical practice included the Reynolds Risk Score, QRISK, PROCAM, and Framingham with additional covariates such as BMI and heart rate [21][22][23][24][25][26].

Statistical methods
We first compared patient and disease characteristics using descriptive statistics and calculated and plotted cumulative incidences of death by cause. In order to assess the performance of the CVD models, individual patient risk scores were calculated based on the original models either by use of score sheets [19] or direct calculation from the model's parameter estimates. We used Cox proportional hazards models to create the Framingham recalibrated model, keeping the established risk factor categories from the 2008 Framingham Model [9] and re-calculating the parameter estimates based on the current cohort.
We summarized discrimination of each model at the model's recommended prediction interval by the area under the ROC curve (AUC), with 95% confidence intervals calculated using the methods of DeLong [27]. Due to the variation in endpoint definition (e.g. morbidity and mortality endpoints) and prediction fixed-time horizons (e.g. 5 year versus 10 year) across models, we used two approaches to calculate the area under the ROC curve (AUC) for each model. The first approach we used was a naïve approach in which we excluded patients who were not followed for the full prediction horizon either due to experiencing a competing cause of death or shorter follow-up time. In this approach, patients who experienced an event after the prediction time frame were considered non-events. The second approach we applied used all patients, and those who did not have complete follow-up for the prediction horizon contributed data only up to the length of time that they were followed, such that competing events are not excluded from the validation sample, but rather censored at their last follow-up time [28]. We used the rmap package in R for this latter approach. For model validation analyses, if a patient was missing any covariate data necessary for the risk calculation, they were excluded from the validation of that specific model.
In creating a new model, we used a multi-state framework that allows for the simultaneous modeling of the competing causes of death from BC, CVD, or all other causes. Each cause of death is considered a state to which the patient can transition after diagnosis, with a given probability based on their individual risk profile. The transitions are modeled with stratified (or cause-specific) Cox proportional hazards models, using the mstate library in R [29][30][31][32]. We considered risk factors from the established CVD models as well as BC characteristics, treatment indicators, age, and history of CVD before BC. In order to include the greatest number of women in our new model, we allowed a "missing" category to be included for each covariate that contained missing EMR data. We included in the final models only risk factors that were statistically significantly associated with the transition of interest. As sensitivity analyses we also created separate Cox proportional hazards models and sub-distribution hazards models [33] for CVD and BC death. Lastly, to examine the impact of missing risk factor data, complete case analyses were run, which included only those observations with non-missing values for all risk factors in the models. We summarized model discrimination by averaging the AUC from 10 bootstrap samples of our original data. We used SAS version 9.4 for Windows (SAS Institute, Cary, NC) and R version 3.1.2 (http:// cran. us.r-proje ct. org/) for all analyses.

Results
There were 20,462 women diagnosed with stage I-III BC between 2000 and 2010 in KPNC ( Table 1). The mean age at diagnosis was 60 years (range 21 to 103 years). The majority of the sample was white (79%) and non-Hispanic (90%). Over half the sample (52%) was diagnosed with stage 1 disease, and nearly all were treated with surgery (57% lumpectomy, 41% mastectomy).
There were 17,773 (86.9%) women who were alive at last follow-up (in 2015 or at the time of censoring). Less than 1% were censored for disenrollment from KPNC, and the median follow-up for survivors was 7.5 years. There were 2729 (13.3%) deaths overall, with 842 (4.1% of the cohort or 38.5% of deaths) due to BC and 696 (3.4% of the cohort or 25.5% of deaths) due to cardiovascular events. Of the 1191 (5.8% of the cohort or 43.6% of deaths) deaths from other causes, the largest subgroups were due to other cancers (n = 321) and unspecified respiratory causes (n = 198). (See Additional file: Table S3 for a summary of the non-fatal cardiovascular events following a BC diagnosis.) In this cohort, there were also differences in cause of mortality by age and stage at BC diagnosis. For those less than 50 years of age at BC diagnosis, and those with stage 3 disease, BC was the leading cause of death (see Fig. 1). For those greater than 70 years at diagnosis with stage 1 and 2 disease, the cumulative incidences of mortality from CVD and all other causes were much higher than those due to BC. Women who were diagnosed in the 50-69 age range also had survival outcomes that were dependent upon their stage at BC diagnosis, with stage 2 and 3 disease having higher incidences of BC death. Table 2 summarizes the discrimination of existing CVD models whose AUCs ranged from 0.64 (Framingham model for hard events evaluated at 10 years) to 0.78 (Framingham model recalibrated to our data set, evaluated at 5 years) in the current cohort. The SCORE OP model, developed specifically for use on an older population (age > 65), did not outperform the models developed on a broader range of age groups (SCORE OP AUC = 0.76, 95% CI 0.73-0.81), nor did the CORE model which was the only model accounting for competing causes of death (AUC = 0.74, 95% CI 0.72, 0.76).
Despite the moderate discriminative ability of the existing models, the O/E ratios show that the models for 5 and 10-year predictions, those predicting fatal events only, and those developed on a European population were not well calibrated to this cohort of BC patients. The majority of the models tended to underestimate the number of events across most deciles of risk, while the recalibrated models over-estimated the number of events. Those with shorter prediction horizons (2-4 years) had better calibration, and may therefore be more useful in this patient population.

Multi-state model
To evaluate whether we could improve upon the previously published CVD models for a cohort of early-stage BC women, we fit a multi-state model including both established cardiovascular risk factors, and BC disease  and treatment characteristics (see Table 3). The corresponding hazard ratios appear in Additional file 1: Table S7. BC characteristics including grade, ER/PR status, tumor size, and nodal involvement were all significantly associated with risk of BC death (p < 0.05), as expected, but also associated with CVD and other-cause mortality as well. Older age, current and former smoking, HDL cholesterol < 35 mg/dl, ER negative disease, and having no surgery for BC were also associated with an increased risk of each cause of death. Prior history of CVD, higher Charlson comorbidity score, and lack of receipt of radiation or chemotherapy were associated with an increased risk of death from CVD and death from other causes, but not death due to BC. Due to small numbers of events within strata, and multivariate endpoint, we were unable to test individual drug classes in this model. In sensitivity analyses, the results from separate Cox models and sub-distribution hazards models for each cause of death, yielded similar results to the multistate model. The complete case analyses also provided similar inferences, though with decreased statistical power (results not shown).

Fig. 1 Cumulative incidence of cause of death, by stage and age at breast cancer diagnosis
To evaluate the performance of the model using the AUC, predicted probabilities for the outcomes of interest were calculated at 3, 5 and 10 years for all patients in the data set (Additional file 1: Table S4). Similar to the established CVD risk scores, the model performs slightly better for the shorter time frame, with AUCs of 0.84 and 0.82 at 3 years versus 0.82 and 0.77 at 10 years, for CVD mortality and BC mortality, respectively, though performance of the new model remains higher overall. The observed to expected number of combined BC and CVD deaths at 5 years predicted by the multi-state model was also quite accurate (O/E = 1.01, 95% CI 0.91-1.11).

Discussion
In this cohort of women diagnosed with breast cancer, the death rates from cardiovascular events and BC were roughly equivalent overall, though younger women were more likely to die of BC than cardiovascular events. Although our modeling was limited to a single health care organization, the KPNC cohort was very similar in age and other demographics to the U.S. SEER18 data (see Additional file: Table S8), with few exceptions (e.g., SEER18 has a higher percentage of Stage 3 (13.7 vs. 8.9) and women 75 + (19.8 vs. 15.5 compared to the KPNC cohort)).
Determining which BC survivors are at the greatest risk for CVD is of paramount importance in treating the entire patient, following her BC diagnosis, particularly given the shared risk factors for BC and CVD coupled with the cardiotoxic effects of many cancer therapies [4]. Yet, commonly used models for predicting CVD have not been prospectively validated in BC patients. Some important limitations of such models are that they were developed on cohorts that were younger and without a history of CVD events or comorbidities. The fact that none have been validated in a BC population is likely due in many settings to the lack of data on CVD risk factors such as blood pressure and lipids, collected around the time of diagnosis. To our knowledge, our study is the first to attempt to validate the use of these models following a loco-regional BC diagnosis. A key conclusion of our study is that an integrated model with both breast cancer characteristics and CVD risk factors improves prediction of CVD after a diagnosis of breast cancer.
We demonstrated that standard CVD prediction models showed moderate discrimination in our cohort,  though performance was widely influenced by the time horizon for prediction of CVD outcome with better performance in shorter time horizons, likely due to the smaller risk of a competing death in that time frame and more complete follow-up of patients. The models performed better using a shorter time window and with more recent risk factor information suggesting that they could be used clinically at regular windows while women are still undergoing BC treatment. The most common BC molecular subtype, hormone receptor positive (HR +) cancers, are typically treated for 5-10 years with endocrine therapies. This means that women are seen regularly for breast cancer treatment and thus our models which performed best in shorter time horizons could be feasible as many women are still being actively followed and treated. Therefore, in evaluating these models, we conclude that they may be useful in most women, particularly after age 65 years when women affected with BC may be more likely to die of CVD than BC [34], and recommend that no more than a 2-4 year prediction window be calculated based on these models. The limited utility of standard CVD prediction models for long-term risk prediction in BC survivors is concerning, especially given that higher CVD risk in BC survivors compared with the general population does not manifest itself until about 7 years after the diagnosis of BC [7].
Changes in definition of the outcome measure (e.g. morbidity versus mortality) across CVD models also present challenges in making comparisons of model performance. While moderate discrimination was seen in both the models predicting mortality and those predicting non-fatal events, it is not clear which endpoint may be more meaningful clinically in a cancer survivor population. For model validation, a mortality endpoint is less subjective, while non-fatal events may not be captured routinely outside of a clinical trial setting. Additionally, in our population, at least 14% of women already have a documented CVD event prior to their BC diagnosis and are likely being treated or monitored for a cardiovascular condition, thus predicting additional non-fatal events may be of less concern.
Another factor contributing to the poor performance of the established cardiovascular models could be differences in risk factor distribution, including missing risk factor data, between the original cohorts in which the models were developed, and our older, contemporary, comorbid cohort, as seen in Additional file: Table S5. For example, in the original Framingham cohort, 62% of women were non-smokers, versus only 36% in our current cohort, with an additional 37% missing smoking status and excluded from the validation analysis. Similarly, blood pressure-lowering medication was not routinely used in the original Framingham cohort, versus 27% of the current cohort having a history of its use, and subsequently, a lower overall distribution of blood pressure compared to the original Framingham women. However, even with recalibration of the Framingham model to our data set (Additional file 1: Table S6), model performance was not improved, likely due to the handling (ignoring) of deaths in the absence of a CVD event. This becomes especially problematic for longer-term prediction horizons, such as 5-10 years. For longer-term prediction it is important to account for competing causes of mortality, thus this question supports the use of a multi-state model that can simultaneously model the competing failure types. Our findings that increased stage of BC at diagnosis was also significantly associated with CVD and allcause mortality indicate that beyond traditional CVD risk factors and increased age. Whether this association between stage and CVD mortality reflects poorer underlying health in general by factors not measured in these models, barriers to health care access and quality (even within a large health care organization), and/ or treatment related factors is unclear. However, what this association does suggest though is that patients with more advanced disease at any age should be aware of their increased risk of mortality from other causes following their primary BC treatment. For women less than age 50 years at BC diagnosis, managing the risk of BC recurrence and mortality remains of primary concern, but those with increased cancer burden may also be at elevated risk of CVD mortality, despite their young age. This could be in part, due to the early and delayed cardio-toxicities of the cancer treatment [4]. For example, the risk of heart failure is shown to increase with increasing cumulative doses of anthracyclines, which have been commonly used to treat earlystage BC for decades (5% versus 48% risk at a dose of 400 mg/m 2 versus 700 mg/m 2 , respectively) (4). Incorporating detailed information about the treatment of BC into CVD risk prediction models, including the type, cumulative dose, and duration of therapy, might considerably improve performance. However, more work is needed in this area, given that we were unable to formally examine the relationship between individual drugs and outcomes using the multi-state model.

Conclusions
We found that many BC prognostic factors were significantly associated with both BC mortality and CVD mortality. However, our data were limited by the fact that many of the women had prior cardiovascular events, may not have had blood lipids measured within the desired time frame of the BC diagnosis, were subject to missing data, such as detailed smoking and chemotherapy information, and were lacking additional known risk factors including BMI, diet and alcohol use. While our multistate model requires further validation from an external source, it showed good discrimination in our cohort and might serve as a first line for identifying subgroups of patients that may be at increased risk of cardiovascular events based on readily accessible covariates.