Machine learning prediction of pathological complete response and overall survival of breast cancer patients in an underserved inner-city population

Abstract

Background

Generalizability of predictive models for pathological complete response (pCR) and overall survival (OS) in breast cancer patients requires diverse datasets. This study employed four machine learning models to predict pCR and OS up to 7.5 years using data from a diverse and underserved inner-city population.

Methods

Demographics, staging, tumor subtypes, income, insurance status, and data from radiology reports were obtained from 475 breast cancer patients on neoadjuvant chemotherapy in an inner-city health system (01/01/2012 to 12/31/2021). Logistic regression, Neural Network, Random Forest, and Gradient Boosted Regression models were used to predict outcomes (pCR and OS) with fivefold cross validation.

Results

pCR was not associated with age, race, ethnicity, tumor staging, Nottingham grade, income, or insurance status (p > 0.05). pCR was associated with tumor subtype, with ER−/HER2+ showing the highest pCR rate, followed by triple negative, ER+/HER2+, and ER+/HER2− (all p < 0.05), and with tumor size (p < 0.003) and background parenchymal enhancement (BPE) (p < 0.01). Machine learning models ranked ER+/HER2−, ER−/HER2+, tumor size, and BPE as top predictors of pCR (AUC = 0.74–0.76). OS was associated with race, pCR status, tumor subtype, and insurance status (p < 0.05), but not ethnicity or income (p > 0.05). Machine learning models ranked tumor stage, pCR, nodal stage, and triple-negative subtype as top predictors of OS (AUC = 0.83–0.85). When race and ethnicity were grouped by tumor subtype, neither OS nor pCR differed by race or ethnicity within any tumor subtype (p > 0.05).

Conclusion

Tumor subtypes and imaging characteristics were top predictors of pCR in our inner-city population. Insurance status, race, tumor subtypes and pCR were associated with OS. Machine learning models accurately predicted pCR and OS.

Introduction

Breast cancer is a complex disease with highly heterogeneous tumor characteristics and clinicopathological profiles [1]. Predicting response to neoadjuvant chemotherapy and overall survival in breast cancer patients remains a crucial challenge for disease management. In addition, racial, ethnic, and socioeconomic disparities could also influence breast cancer outcomes [2, 3], highlighting the need for diverse and inclusive datasets to develop more accurate predictive models.

Molecular subtypes of breast cancer exhibit distinct clinicopathological profiles [4]. These subtypes have varying responses to different treatment modalities, emphasizing the importance of tailoring therapy based on tumor subtype [5, 6]. Incorporating molecular subtype information into predictive models helps better predict treatment response and overall survival, guiding clinicians in making informed decisions. Racial and ethnic groups differ in their prevalence of tumor subtypes, which could contribute to inconsistent prognoses [7]. Most breast cancer clinical trials also lack racial and ethnic diversity, with Blacks and Hispanics largely underrepresented, presenting a barrier to precision medicine for these populations [7, 8]. Moreover, socioeconomic status could also affect outcomes.

Tumor characteristics, clinicopathological profiles, patient profiles, and other variables interact, making it challenging to identify independent risk factors that predict outcomes. Recent advancements in machine learning predictive modeling have shown promise in addressing this challenge [9, 10] because machine learning can deal with complex datasets without the need to specify a priori the complex relationship among the large number of variables. These models leverage algorithms that learn patterns from a vast array of patient data [9, 10], including demographic information, histopathological features, treatment regimens, molecular profiles, and socioeconomic factors. By harnessing the power of machine learning, robust and accurate models that integrate diverse populations and tumor subtypes can be developed, aiding in personalized medicine for breast cancer patients. However, machine learning also has the potential to exacerbate racial and ethnic disparities with imbalanced representation of demographics [11].

Pathologic complete response (pCR) serves as a surrogate marker for neoadjuvant treatment efficacy in breast cancer patients [12,13,14]. Achieving pCR, defined as the absence of invasive carcinoma in the breast and axillary lymph nodes following neoadjuvant treatment, is associated with improved overall survival (OS) [12,13,14]. Accurate prediction of pCR can guide treatment decisions, potentially sparing patients from unnecessary interventions or identifying those who may require additional therapies [15]. Overall survival reflects the long-term outcomes and effectiveness of treatment strategies [16]. Identifying predictors of OS can assist in tailoring therapy escalation or follow-up intervals to discrete risk factors. Machine learning predictive models offer the potential to integrate a large number of clinical, pathological, molecular, and socioeconomic variables to provide personalized treatments that improve pCR and OS for individual patients [17].

The goals of this study were to employ four machine learning models to identify key risk factors among a large array of clinicopathological variables, tumor subtypes, insurance status, income, and imaging characteristics in a racially, ethnically, and socioeconomically diverse population, and to predict pCR and OS at 7.5 years after diagnosis in breast cancer patients.

Methods

Data sources

This retrospective study was approved by our IRB (institution removed for blinded review but can be identified if needed) with waived informed consent (2020-12169). The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. The patient cohort comprised all patients diagnosed with invasive breast cancer between 01/01/2012 and 12/31/2021 within our institution’s health system, which serves an inner-city urban population, and treated with neoadjuvant chemotherapy followed by surgery in our health system. Data were obtained from the cancer registry of our institution and via chart review of the electronic medical records and radiology reports. Of 509 patients, 34 were excluded due to missing pCR outcome, leaving a final sample size of 475. Missing non-MRI data averaged 3.1%. Patients with missing data were excluded from ML modeling. Only 240 patients had MRI reports describing all relevant imaging elements. The sample size for each analysis is provided in the respective tables and figures.

The clinicopathological data included age, race (White, Black, Asian, others), ethnicity (Hispanic, non-Hispanic), clinical tumor (T) and nodal (N) stage by TNM staging, Nottingham grade (Nottingham Grade 1 (well differentiated), 2 (moderately differentiated), 3 (poorly differentiated)), tumor subtypes (ER−/HER2+, ER+/HER2+, ER+/HER2−, and triple negative), and radiological data from MRI radiology reports (background parenchymal enhancement, tumor size, multifocal lesions, skin involvement, satellite lesions, pectoralis muscle involvement, lymph node involvement, chest wall involvement, nipple involvement, and multicentric lesions). In addition, income quintiles and insurance status were also tabulated. The primary outcomes were pCR and OS.

Logistic regression

Logistic regression was performed to compute the odds ratios (ORs) of risk factors associated with outcomes (N = 475). Inputs for pCR ORs included demographics, tumor subtypes, insurance status, and income quintile. Inputs for overall survival ORs included demographics, tumor subtypes, insurance status, income quintile, and pCR status. Insurance status included private, Medicaid, and Medicare. Self-pay and uninsured status amounted to < 1% of the sample and were not included in the OR calculation.
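
As an illustration of how an odds ratio and its 95% confidence interval can be obtained, the following minimal Python sketch computes both from a 2×2 table using the Wald interval on the log scale. For a single binary predictor this equals the odds ratio from a univariate logistic regression; the multivariable ORs reported in this study would additionally adjust for the other inputs. The function name and the counts are hypothetical and are not study data.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_wald(20, 30, 10, 40)
print(f"OR = {or_:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes 1 corresponds to p < 0.05 for the Wald test of that predictor.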

Predicting pCR and OS

Four predictive models, Logistic Regression, Neural Network (NN), Random Forest (RF), and Gradient Boosted Regressor (GBR), were created to predict pCR in patients who received neoadjuvant chemotherapy.

Multivariate logistic regression was used as a baseline for comparison. The solver, or the algorithm used by the LR model for optimization, was newton-cg which uses the second-order Taylor Series to create an approximation for gradient optimization [18].
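
To make the second-order idea concrete, here is a minimal sketch of a plain Newton iteration for a logistic regression with an intercept and one predictor. It is a simplification under stated assumptions: newton-cg in sklearn approximates the Newton step with conjugate gradients rather than solving it exactly, and sklearn applies an L2 penalty by default, whereas this sketch is unpenalized. The function name and data are hypothetical.

```python
import math

def fit_logistic_newton(x, y, n_iter=25, tol=1e-10):
    """Unpenalized logistic regression (intercept + one feature) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p                # gradient contribution of the log-likelihood
            w = p * (1.0 - p)         # curvature (second-order) weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * delta = g exactly
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical binary predictor and outcome (not study data)
x = [1] * 50 + [0] * 50
y = [1] * 20 + [0] * 30 + [1] * 10 + [0] * 40
b0, b1 = fit_logistic_newton(x, y)
print(f"odds ratio = {math.exp(b1):.3f}")
```

With a single binary predictor, the exponentiated slope reproduces the cross-product odds ratio of the corresponding 2×2 table.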

For Neural Networks we used a fully connected feed-forward neural network with one hidden layer and one output layer [19]. The hidden layer contains 32 neurons, activation function of ReLU, and l2 regularization with regularization factor of 0.01. The NN utilizes a mean squared error loss function and Adam optimizer with a learning rate of 0.01.
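
The architecture and loss described above can be sketched in plain Python as follows. This is not the TensorFlow implementation used in the study; it only illustrates a forward pass through one 32-unit ReLU hidden layer and a mean squared error loss with an L2 penalty of 0.01. The sigmoid output, the choice to penalize only the hidden-layer weights, the weight initialization, and all names and inputs are assumptions for illustration.

```python
import math
import random

def init_params(n_inputs, n_hidden=32, seed=0):
    """Small random weights for one hidden layer and one output unit."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, w2, b2

def forward(x, params):
    """ReLU hidden layer followed by a sigmoid output (output activation assumed)."""
    W1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    z = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-z))

def regularized_mse(X, y, params, l2=0.01):
    """Mean squared error plus l2 * sum of squared hidden-layer weights."""
    W1 = params[0]
    mse = sum((forward(x, params) - yi) ** 2 for x, yi in zip(X, y)) / len(X)
    penalty = l2 * sum(w * w for row in W1 for w in row)
    return mse + penalty

# Hypothetical inputs (e.g. two subtype indicators and a tumor size in cm)
params = init_params(n_inputs=3)
X = [[1.0, 0.0, 2.5], [0.0, 1.0, 6.0]]
y = [1, 0]
loss = regularized_mse(X, y, params)
print(f"loss = {loss:.4f}")
```

An optimizer such as Adam would then adjust the weights to reduce this loss; that training loop is omitted here.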

A Random Forest algorithm was used with a max depth of 1 for the univariate analysis and a max depth of 10 for the multivariate analysis to limit overfitting [20]. The algorithm builds multiple decision trees and aggregates their predictions, which tends to give more stable results than a single tree in multivariate analysis.

Gradient Boosted Regression uses boosting, an ensembling technique that combines multiple weak learners (here, regression models) into a stronger regression model [21]. In our model, we used a max depth of 1, 50 estimators, and a learning rate of 0.001 for the univariate analysis, and a max depth of 3, 100 estimators, and a learning rate of 0.1 for the multivariate analysis.
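
The boosting idea can be illustrated with a simplified single-feature sketch: each round fits a depth-1 tree (a stump) to the current residuals and adds a shrunken copy of it to the ensemble. This is not the sklearn implementation used in the study; it assumes squared-error loss, and the data, function names, and the learning rate of 0.1 chosen for the demonstration are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # drop the largest value so the right side is non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1], best[2], best[3]

def boost(x, y, n_estimators=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss and depth-1 trees."""
    base = sum(y) / len(y)          # initial prediction: the mean outcome
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, residuals)
        stumps.append((t, lm, rm))
        pred = [pi + learning_rate * (lm if xi <= t else rm)
                for xi, pi in zip(x, pred)]
    return base, stumps

def predict(model, xi, learning_rate=0.1):
    base, stumps = model
    return base + learning_rate * sum(lm if xi <= t else rm for t, lm, rm in stumps)

# Hypothetical tumor sizes (cm) and binary outcomes, for illustration only
sizes = [1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.5, 6.0]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
model = boost(sizes, outcome)
mse = sum((predict(model, s) - o) ** 2 for s, o in zip(sizes, outcome)) / len(sizes)
print(f"training MSE = {mse:.3f}")
```

A smaller learning rate shrinks each stump's contribution, so more estimators are needed to reach the same fit.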

Hyperparameter tuning was conducted using the grid search method. For the neural networks, the grid search algorithm combined powers of 2 for the number of neurons and powers of 10 for the learning rate. For the Random Forest, the grid search algorithm combined numbers from 1 to 50 for the max depth. For the Gradient Boosted Regression algorithm, the grid search algorithm combined numbers from 1 to 100 for the depth and estimators and powers of 10 for the learning rate.
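
A grid search of this kind enumerates every combination of candidate values and keeps the best-scoring one. The sketch below builds a grid of powers of 2 for the number of neurons and powers of 10 for the learning rate. The exact ranges are not given in the text, so the bounds here are assumptions, and the scoring function is a toy stand-in for a cross-validated metric.

```python
import math
from itertools import product

# Assumed ranges; the text specifies only "powers of 2" and "powers of 10"
neurons = [2 ** k for k in range(3, 8)]            # 8, 16, 32, 64, 128
learning_rates = [10.0 ** -k for k in range(1, 5)]  # 0.1 down to 0.0001

nn_grid = [{"neurons": n, "learning_rate": lr}
           for n, lr in product(neurons, learning_rates)]

def grid_search(grid, score_fn):
    """Return the configuration with the highest score."""
    return max(grid, key=score_fn)

def toy_score(cfg):
    """Hypothetical score that peaks at 32 neurons and a learning rate of 0.01."""
    return -abs(cfg["neurons"] - 32) - abs(math.log10(cfg["learning_rate"]) + 2)

best = grid_search(nn_grid, toy_score)
print(len(nn_grid), best)
```

In practice the score for each configuration would be the mean validation AUC across folds rather than a closed-form function.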

These analyses were conducted using Python, specifically the TensorFlow library for the neural networks and the sklearn library for RF, Logistic Regression, and GBR models. An 80/20 train validation split was utilized with fivefold cross validation [22, 23]. Performance metrics (such as AUCs) were reported for test (validation) sets only using fivefold cross validation from which mean ± SD were obtained. A 50% probability threshold was used to calculate sensitivity/specificity. 95% confidence interval was chosen.
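
The evaluation procedure can be sketched in plain Python as follows: split the cohort into five folds so that each fold serves once as the 20% validation set, compute sensitivity and specificity at a 0.5 probability threshold, compute AUC, and summarize per-fold values as mean ± SD. Function names, the shuffling seed, the use of the sample standard deviation, and the example values are assumptions; the study's own code is not shown in the text.

```python
import random
import statistics

def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index lists; each fold is validation once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]

def sens_spec(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity after thresholding predicted probabilities."""
    tp = fn = tn = fp = 0
    for yt, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if yt == 1 and pred == 1:
            tp += 1
        elif yt == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_prob):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [p for yt, p in zip(y_true, y_prob) if yt == 1]
    neg = [p for yt, p in zip(y_true, y_prob) if yt == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-fold AUCs summarized as mean and sample SD
fold_aucs = [0.74, 0.76, 0.75, 0.73, 0.77]
summary = (statistics.mean(fold_aucs), statistics.stdev(fold_aucs))
print(f"AUC = {summary[0]:.3f} ± {summary[1]:.3f}")
```

With 240 patients, each fold in this sketch validates on 48 patients and trains on the remaining 192.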

Data used to predict pCR included demographics, clinical staging, tumor subtypes, and MRI data. Data used to predict OS included demographics, clinical staging, tumor subtypes, MRI data, and pCR status. OS was determined to be the proportion of patients alive 7.5 years after diagnosis. Insurance status and income quintiles were not used. The top 10 predictors were identified and used to evaluate performance indices.

Kaplan–Meier survival analysis

Kaplan–Meier survival analysis for patients with breast cancer was performed with stratification of pCR status, tumor subtypes, insurance, race, ethnicity, and income quintiles. For race and ethnicity, outcomes were also sub-stratified by pCR status.
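
For readers unfamiliar with the method, the Kaplan–Meier (product-limit) estimate can be sketched in a few lines of plain Python. At each distinct event time, the survival probability is multiplied by the fraction of at-risk patients who did not die; censored patients leave the risk set without changing the estimate. The function name and follow-up data are hypothetical, and the library actually used to generate the curves is not stated in the text.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times:  follow-up time for each patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# Hypothetical follow-up in years (not study data)
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Stratified curves, as in this analysis, are obtained by applying the same estimator separately to each subgroup.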

Statistical analysis

χ2 tests were performed using R Studio (version 3.1). Logistic regression analysis used R Studio or Python (version 3.10.9) for identifying risk factors and for predicting outcomes. Hazard ratios were obtained using Cox regression analysis in R Studio, and Kaplan–Meier curves were generated using Python. ANOVA was used for comparisons of three or more groups. A p < 0.05 was used to indicate statistical significance unless otherwise specified.
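
As a worked illustration of the χ2 test of association, the sketch below computes the Pearson statistic and p-value for a 2×2 table in plain Python. It assumes no continuity correction, whereas R's `chisq.test` applies the Yates correction to 2×2 tables by default, so the values may differ slightly from R output. The function name and counts are hypothetical and are not study data.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    Table layout: rows are groups, columns are outcome yes/no.
        a  b
        c  d
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper tail probability is erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts, for illustration only
chi2, p = chi2_2x2(20, 30, 10, 40)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```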

Results

pCR

Patient profiles stratified by pCR status are summarized in Table 1. pCR was not significantly associated with age (≥ 50yo vs < 50yo, p = 0.20), race (p = 0.87), ethnicity (p = 1.0), T-stage (p = 0.09), N stage (p = 0.31), or Nottingham grade (p = 0.09), but was significantly associated with tumor subtype (p < 0.001), with ER−/HER2+ (56.5%) having the highest pCR rate, followed by triple negative (31.0%), ER+/HER2+ (23.0%), and ER+/HER2− (8.5%).

Table 1 Patient profiles by pCR status (N = 475)

pCR was significantly associated with tumor size (p = 0.003), with tumors > 5 cm having a lower pCR rate (11.1%) compared with tumors measuring ≤ 2 cm and 2–5 cm (34.2% and 35.1%, respectively). Mild BPE had the highest rate of pCR (35.6%), followed by moderate (24.1%) and marked (0.0%) BPE (p < 0.03). This was unexpected; on further investigation, we found that patients with marked BPE consisted mostly of ER+/HER2− and ER+/HER2+ subtypes, with larger tumor size and poorer differentiation. Income and insurance status were not significantly associated with pCR (p > 0.05).

Table 2 shows the composition and pCR for different races and ethnicities grouped by tumor subtypes. Blacks had a higher proportion of triple negative (p < 0.05) and a lower proportion of the ER+/HER2+ and ER+/HER2− subtypes. There was, however, no significant difference in pCR by race (p > 0.05) or ethnicity (p > 0.05) within each tumor subtype. Note that there was a high proportion of HER2-positive breast cancers and a relatively low proportion of ER/PR-positive cases because ER/PR-positive patients are known to be less responsive to neoadjuvant chemotherapy and were not given neoadjuvant chemotherapy. When data were modeled individually for each of the four tumor subtype groups (Additional file 1: Table S1), radiographic tumor size and BPE were highly ranked among predictors, but T-stage ranked lower as a predictor of pCR.

Table 2 Percent of patients and pCR for race and ethnicity grouped by molecular subtypes (N = 475), (B) AUCs for all four univariate models across all 4 tumor subtypes

Table 3 shows the odds ratios for achieving pCR. Race (p > 0.05) and ethnicity (p > 0.05) were not associated with different odds of achieving pCR. Compared to ER−/HER2+, ER+/HER2− had the lowest likelihood of achieving pCR (OR = 0.085, [0.037,0.194], p < 0.0001), followed by ER+/HER2+ (OR = 0.285, [0.137,0.593], p < 0.0001) and triple negative (OR = 0.406, [0.219,0.754], p = 0.004). Income quintile (except the 1st quintile) and insurance status were not associated with lower odds of achieving pCR.

Table 3 Odds ratios for pCR as outcome for demographics, tumor subtypes, pCR, income quintile, and insurance status (N = 433 out of 475)

Table 4 summarizes the results of four different ML models. All 4 models consistently ranked ER+/HER2−, ER−/HER2+, radiographic tumor size, and BPE as top predictors, but ER+/HER2+ and triple negative were not top predictors. Accuracy ranged from 0.697 to 0.731, specificity 0.736 to 0.890, sensitivity 0.555 to 0.799 and AUC 0.743 to 0.755.

Table 4 Multivariate analysis of top predictors of pCR (N = 240)

Overall survival

Figure 1 shows the Kaplan–Meier survival analysis (N = 475). Patients who achieved pCR had a clear survival benefit compared to those who did not (HR = 0.1898, 95% CI: [0.08275–0.4354], p < 0.0001) (Fig. 1A). Tumor subtype was significantly associated with survival probability (Fig. 1B). Patients with triple negative disease were significantly more likely to die than those with ER+/HER2+ (HR = 0.3109, 95% CI: [0.1468–0.6582], p = 0.0023) or ER+/HER2− (HR = 0.4020, 95% CI: [0.2245–0.7198], p = 0.0022), whereas the difference relative to ER−/HER2+ did not reach significance (HR = 0.5077, 95% CI: [0.2487–1.0363], p = 0.0626). There were no significant differences in survival probability among ER−/HER2+, ER+/HER2+, and ER+/HER2− subtypes (p > 0.05). Insurance status was significantly associated with survival probability (Fig. 1C). Patients on Medicaid (HR = 3.29, 95% CI: [1.39–7.77], p = 0.007) and Medicare (HR = 6.93, 95% CI: [2.824–17.01], p < 0.0001) showed a higher hazard of mortality compared to those on private insurance. Note that patients on Medicare were significantly older (p < 0.05, ANOVA). There were no significant differences in survival probability by race (p > 0.05) or ethnicity (p > 0.05) when stratified by pCR (Fig. 1D, E). There was no significant difference in survival probability by income (p > 0.05) (Fig. 1F).

Fig. 1

Kaplan–Meier survival curves for patients with breast cancer (N = 475) by pCR status, tumor subtypes, insurance status, race, ethnicity, and income. Patients belonging to Asian and “other” race (n = 19 and n = 26, respectively) were grouped together for comparison with white (n = 136) and Black (n = 233) races. The median time to last contact was 3.83 years (IQR: 2.13–6.46). All patients had follow-up with a recorded date of last contact, among whom there were 85 events.

Patient profiles for OS at 7.5 years after diagnosis are summarized in Table 5. OS was significantly lower for those who were ≥ 50yo compared to < 50yo (p = 0.04). OS was significantly associated with T-stage (p < 0.0001), N stage (p < 0.001), and race (p = 0.03), but not ethnicity (p = 0.53) or Nottingham grade (p = 0.92). OS was significantly associated with tumor subtype (p < 0.001), with triple negative having the lowest survival (72.5%). OS was not significantly associated with BPE (p = 0.89), tumor size (p = 0.15), and income (p = 0.39), whereas OS was significantly associated with pCR (p < 0.0001), and insurance status (p = 0.0002) by χ2 analysis.

Table 5 Patient profiles by OS status at 7.5 years (N = 475)

Table 6 shows OS at 7.5 years for different races and ethnicities grouped by tumor subtypes. Blacks vs whites showed no differences in OS for any subtypes (p > 0.05). Hispanics vs non-Hispanics also showed no differences in OS across any subtypes (p > 0.05); however, non-Hispanic patients with triple-negative subtype were significantly less likely to survive (p < 0.05).

Table 6 OS at 7.5 years for race and ethnicity grouped by tumor subtypes

Table 7 shows the OS odds ratios for demographics, tumor subtypes, pCR, income quintile, and insurance status. Blacks and Asians had worse survival ORs compared to Whites (p < 0.05). Triple negative had worse OR compared to ER−/HER2+ (p = 0.025). The other subtypes showed no worse odds of OS compared to ER−/HER2+ (p > 0.05). OS was not associated with income quintiles, but patients on Medicaid and Medicare had worse ORs compared to those on private insurance. As noted above patients on Medicare were significantly older (p < 0.001, ANOVA).

Table 7 Odds ratios for OS at 7.5 years for demographics, tumor subtypes, pCR, income quintile, and insurance status (N = 433 out of 475)

Table 8 summarizes the results of the ML models. The top 10 predictors were similar for all 4 models, with high accuracy and specificity. AUC ranged from 0.84 to 0.85. Note that the models that included MRI data performed better than those that did not.

Table 8 Multivariate results of top predictors of OS for all 4 models utilizing top 10 predictors including MRI data (N = 240)

Discussion

This study employed multiple machine learning models to predict pCR and OS using patient demographics, clinicopathologic tumor characteristics, and MRI radiology report data from a racially and ethnically diverse patient population, many of whom had lower socioeconomic status. The major findings are: (1) pCR is associated with tumor subtype, tumor size, and BPE, but not race, ethnicity, income quintile, or insurance status; (2) ER−/HER2+ has the highest pCR rate, followed by triple negative, ER+/HER2+, and ER+/HER2−; (3) all 4 machine learning models consistently rank ER+/HER2−, ER−/HER2+, radiographic tumor size, and BPE as top predictors of pCR (AUC = 0.74–0.76); (4) OS is associated with pCR status, tumor subtype, tumor stage, some MRI data, insurance status, and race, but not ethnicity, and all 4 models consistently rank tumor stage, pCR, nodal stage, and triple-negative subtype as top predictors of OS (AUC = 0.83–0.84); (5) pCR, certain tumor subtypes, and private insurance status are associated with higher survival probability; (6) when race and ethnicity are grouped by tumor subtype, neither pCR nor OS differs by race or ethnicity within any tumor subtype.

pCR

Studies evaluating associations between pCR and race and ethnicity have reported conflicting results [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]. Studies utilizing data from the National Cancer Database (NCDB) have demonstrated lower pCR rates in Black women with triple negative or HER2+ disease [27, 34, 40]. However, as the NCDB does not capture specifics of treatment, these findings may reflect disparities in access to or quality of care between Black and white patients. Retrospective evaluations of patients treated at single institutions, or treated on multi-institution clinical trials, who likely received more uniform care, have shown differing results, with some demonstrating no association between race and pCR [25, 26, 33, 38], and others showing poorer outcomes for Black women [35]. Differences in treatment could account for the differing outcomes in the various studies. Knisley et al. found that white women were more likely to complete the recommended course of NAC treatment than were African American women [41]. Two studies by Griggs et al. demonstrated that Black women with early stage breast cancer are more likely to receive a substandard dose of chemotherapy, lower relative dose intensity, dose reductions in a treatment cycle, and a delay in the start of chemotherapy relative to white women [42, 43]. Black patients experience a higher rate of cardiotoxicity compared to white patients with adjuvant HER2-targeted therapy, resulting in incomplete treatment [44]. Enhanced cardiac surveillance, cardioprotective strategies, and early referral to cardiology when appropriate may be of benefit [44]. A prospective study where Black breast cancer patients received the same care as white breast cancer patients demonstrated equivalent disease-specific survival, illustrating that equal outcomes between Blacks and whites are achievable when treatment disparities are eliminated [45].

We found no evidence that pCR was associated with race or ethnicity per se in our healthcare system, after adjusting for covariates such as tumor subtypes. Instead, our data showed a strong association between tumor subtypes and pCR, consistent with multiple prior studies [46,47,48,49,50,51,52], with ER−/HER2+ tumors exhibiting the highest pCR rates, followed by triple negative, ER+/HER2+, and ER+/HER2−. When grouping race and ethnicity by tumor subtypes, Blacks showed a higher proportion of triple negative as expected, but there were no differences in pCR by race within each tumor subtype. Our results might reflect relatively equal healthcare access and treatment for breast cancer across the spectrum of racial groups in our healthcare system. Larger multi-center studies are needed to confirm these findings. Tumor differentiation, as expressed by Nottingham grade, was not predictive of pCR. Previous studies have reported both the presence and the absence of an association between tumor grade and pCR [47,48,49,50,51,52]. Insurance status and income quintile were not significantly associated with pCR.

Tumor biology consistently emerges as a factor linked to pCR [53, 54]. Previously reported racial disparities in survival may be due to factors which are potentially inter-related and would therefore be difficult to isolate from one another, such as socioeconomic differences, differences in insurance, and differences in treatment. Facilitating health care access and standardizing treatments across racial groups would help in eliminating such disparities.

MRI tumor size and BPE at presentation were significant predictors of pCR. Smaller tumor size was associated with higher pCR rates, suggesting that early detection and intervention may contribute to improved treatment outcomes. This is in keeping with several other studies [56,57,58]. Qian et al. found lower T scores and smaller tumor size correlated with higher pCR rates [54]. Goorts et al. reported that lower T stages had significantly higher pCR and found that cT3/cT4 were independent risk factors for decreased pCR [55]. Another study found that tumor size greater than 5 cm had a lower likelihood of pCR and that receptor status had the greatest impact on pCR, though both receptor status and tumor size were important [56]. That study also saw no significant relationship between tumor size and receptor status [56]. Of note, in machine learning analysis, tumor size was consistently predictive of pCR but tumor stage was less predictive. This discrepancy could arise because tumor size by longest diameter was obtained from the radiology report, which was a coarse measurement by a radiologist in a clinical setting. Mild BPE at presentation also correlated with higher pCR rates, indicating that the absence of extensive benign tissue may facilitate treatment response. These findings underscore the potential importance of imaging features in predicting treatment outcomes. There is no consensus in the literature on the association between BPE and pCR. One study showed BPE may be associated with pCR in limited circumstances, and another study showed BPE was associated with lower pCR in HR+/HER2− breast cancer patients [24, 31]. While tumor subtypes are invariant for each patient, tumor size, BPE, and other imaging characteristics are modulated by treatment across time; thus, the temporal evolution of imaging characteristics can provide additional and useful data to predict outcomes.

Four machine learning models consistently identified and ranked ER+/HER2−, ER−/HER2+, tumor size, and BPE as the top predictors of pCR, followed by Nottingham grade and nodal and tumor staging. This convergence among the models reinforces the significance of these variables in predicting treatment response. Given the small sample size, there were not sufficient data to rigorously test which machine learning model was superior. Although many prior studies have reported the predictive value of tumor subtypes for pCR, the accuracy of these predictions based on tumor subtypes alone ranged from modest to moderate [57, 58]. Our patient cohort is unique due to its diversity, lower socioeconomic status, and a higher prevalence of triple-negative cancer. Our institution is a National Cancer Institute designated cancer center university hospital where patients had access to clinical trials and state-of-the-art treatment, which may also explain why race was not a factor in pCR.

Finally, we noted that models including MRI data outperformed those predicting pCR without MRI data. Higher AUC was similarly achieved in a prior study by combining clinical and imaging data in predicting pCR with ML from a public dataset [37, 59]. Accurately determining which breast cancer patients are likely to respond to neoadjuvant chemotherapy can aid in targeting the type and dosing of medications to likely responders while minimizing unnecessary treatment of non-responders to maximize favorable outcomes.

Overall survival

OS has also been reported to be worse in minority and underserved populations [60, 61]. Reeder-Hayes et al., after adjusting for age, comorbidities, disease characteristics including type of locoregional therapy, and neighborhood poverty, found that Black women were 25% less likely to receive monoclonal antibody treatment than white women among Medicare beneficiaries with stage I to III HER2+ breast cancer diagnosed in 2010 and 2011 [62]. We found significant differences in OS due to race but not ethnicity by logistic regression analysis. OS was also significantly associated with tumor subtypes, with the triple-negative subtype exhibiting worse OS, emphasizing the aggressive nature of this subtype and the need for targeted treatment approaches. In addition to race and molecular subtypes, access to care and other factors could contribute to different outcomes. OS was also significantly associated with pCR, consistent with multiple clinical trials demonstrating improved breast cancer outcomes in patients who achieve pCR, with prognostic value greatest for aggressive tumor subtypes [47,48,49,50,51,52, 63].

When grouping race and ethnicity by tumor subtypes, we found no differences in OS that were due to race and ethnicity. This is consistent with the observation that OS was not associated with race after stratification by pCR (Fig. 1), corroborating that tumor subtypes play a more important role in pCR than race and ethnicity per se.

Patients on Medicare and Medicaid had worse OS outcomes than those on private insurance. Medicare patients were generally older, which could have contributed to worse outcomes. This is in accordance with prior studies that have shown disparities in outcomes based on insurance. Ayanian et al. showed that women without insurance or with Medicaid had worse overall survival with 49% and 40% higher risk of death, respectively [64]. Underinsured women may not be able to access ancillary services that have been shown to improve breast cancer outcomes, such as exercise programs, nutrition courses, and psychotherapy [65,66,67]. Additionally, insurance may be reflective of socioeconomic factors that may influence both oncologic and non-oncologic outcomes, such as medical insight, income, healthcare access, and nutritional status [68,69,70].

Our data showed OS was not associated with income quintiles. Several studies have reported associations between expansion of Medicaid coverage and improved survival in cancer patients, and other studies have found that greater levels of financial toxicity predict for poorer oncologic outcomes [71,72,73]. Association of income inequalities with increased mortality has been noted in other studies [74, 75]. One study revealed excess mortality hazard for breast cancer to be lower for individuals in higher income quintiles in their study population after adjusting for age, education, and occupation [76].

Machine learning models consistently identified and ranked tumor size, nodal stage, and pCR as the top predictors of OS, followed by some tumor subtypes. A meta-analysis of 21 studies showed that the number of circulating tumor cells detected before NAC in early breast cancer patients was markedly associated with tumor size and had a detrimental effect on overall survival and on distant disease-free survival but was not associated with receptor status or pCR [77]. This suggests that the tumor size and tumor microenvironment exert a significant effect on outcomes independent of receptor status and pCR [77].

Models that included MRI data outperformed those predicting OS without MRI data. This highlights the potential of MRI data as non-invasive tools to support treatment decision-making and improve prognostic accuracy. Note that some tumor subtypes and tumor size (but not clinical staging) were top independent predictors of pCR, whereas tumor size and clinical stage (but not tumor subtypes) were top independent predictors of OS. These differences could be due to potential interactions among variables. Note that pCR is not the top predictor of OS (only among the top 3 to 5, depending on the model). Machine learning approaches offer a means to account for covariates and interactions among variables.

Limitations

There are several limitations. Our sample sizes are small when stratified further by molecular subtypes; thus, those findings need to be interpreted with caution. The sample size for MRI radiology reports was small. We utilized MRI radiology reports as inputs rather than the actual images themselves. Future investigations could incorporate deep learning analysis of breast cancer images, which may further improve prediction accuracy and provide additional insights. Our cohort had a small proportion of Caucasians, which could contribute to differences between our findings and the literature.

Income data were based on zip codes, and an individual patient's income might differ from the zip-code-based estimate. Some patients might have had multiple insurance plans, and we used only the primary insurance in our analysis. Attrition due to relocation could result in missing mortality data; thus, it is possible that some patients died without their deaths being accounted for.

Changes in neoadjuvant therapies and post-neoadjuvant treatment may affect the validity of the predictive models, but they were not accounted for in our models. For example, the addition of immunotherapy to neoadjuvant chemotherapy for triple-negative breast cancer was not the standard of care at the time that our patient cohort was treated; therefore, our findings may not be generalizable to patients treated with immunotherapy. Axillary lymph node data have been used to predict pCR and OS [78,79,80].

Conclusion

This study employed multiple machine learning models to predict pCR and survival in a racially and ethnically diverse patient population from an underserved inner-city community. Incorporating imaging data alongside tumor subtypes enhanced the accuracy of predictions. Race, but not ethnicity, and insurance status, but not income, were associated with worse survival. These findings have implications for personalized cancer treatment strategies and emphasize the need for further research on cancer treatment outcomes with respect to health disparities.

Availability of data and materials

De-identified data used during the study are available from the corresponding author upon reasonable request.

References

  1. Turashvili G, Brogi E. Tumor heterogeneity in breast cancer. Front Med (Lausanne). 2017;4:227.

  2. Silber JH, Rosenbaum PR, Ross RN, Reiter JG, Niknam BA, Hill AS, Bongiorno DM, Shah SA, Hochman LL, Even-Shoshan O, et al. Disparities in breast cancer survival by socioeconomic status despite medicare and medicaid insurance. Milbank Q. 2018;96(4):706–54.

  3. Yedjou CG, Sims JN, Miele L, Noubissi F, Lowe L, Fonseca DD, Alo RA, Payton M, Tchounwou PB. Health and racial disparity in breast cancer. Adv Exp Med Biol. 2019;1152:31–49.

  4. Park S, Koo JS, Kim MS, Park HS, Lee JS, Lee JS, Kim SI, Park BW. Characteristics and outcomes according to molecular subtypes of breast cancer as classified by a panel of four biomarkers using immunohistochemistry. Breast. 2012;21(1):50–7.

  5. Carey LA, Dees EC, Sawyer L, Gatti L, Moore DT, Collichio F, Ollila DW, Sartor CI, Graham ML, Perou CM. The triple negative paradox: primary tumor chemosensitivity of breast cancer subtypes. Clin Cancer Res. 2007;13(8):2329–34.

  6. Desmedt C, Haibe-Kains B, Wirapati P, Buyse M, Larsimont D, Bontempi G, Delorenzi M, Piccart M, Sotiriou C. Biological processes associated with breast cancer clinical outcome depend on the molecular subtypes. Clin Cancer Res. 2008;14(16):5158–65.

  7. Kong X, Liu Z, Cheng R, Sun L, Huang S, Fang Y, Wang J. Variation in breast cancer subtype incidence and distribution by race/ethnicity in the United States from 2010 to 2015. JAMA Netw Open. 2020;3(10): e2020303.

  8. Aldrighetti CM, Niemierko A, Van Allen E, Willers H, Kamran SC. Racial and ethnic disparities among participants in precision oncology clinical studies. JAMA Netw Open. 2021;4(11): e2133205.

  9. Deo RC, Nallamothu BK. Learning about machine learning: the promise and pitfalls of big data and the electronic health record. Circ Cardiovasc Qual Outcomes. 2016;9(6):618–20.

  10. Mahoro E, Akhloufi MA. Applying deep learning for breast cancer detection in radiology. Curr Oncol. 2022;29(11):8767–93.

  11. Huang J, Galal G, Etemadi M, Vaidyanathan M. Evaluation and mitigation of racial bias in clinical machine learning models: scoping review. JMIR Med Inform. 2022;10(5): e36388.

  12. Pennisi A, Kieber-Emmons T, Makhoul I, Hutchins L. Relevance of pathological complete response after neoadjuvant therapy for breast cancer. Breast Cancer (Auckl). 2016;10:103–6.

  13. Sahoo S, Lester SC. Pathology of breast carcinomas after neoadjuvant chemotherapy: an overview with recommendations on specimen processing and reporting. Arch Pathol Lab Med. 2009;133(4):633–42.

  14. von Minckwitz G, Untch M, Blohmer JU, Costa SD, Eidtmann H, Fasching PA, Gerber B, Eiermann W, Hilfrich J, Huober J, et al. Definition and impact of pathologic complete response on prognosis after neoadjuvant chemotherapy in various intrinsic breast cancer subtypes. J Clin Oncol. 2012;30(15):1796–804.

  15. Zhang J, Wu Q, Yin W, Yang L, Xiao B, Wang J, Yao X. Development and validation of a radiopathomic model for predicting pathologic complete response to neoadjuvant chemotherapy in breast cancer patients. BMC Cancer. 2023;23(1):431.

  16. Seidman AD, Maues J, Tomlin T, Bhatnagar V, Beaver JA. The evolution of clinical trials in metastatic breast cancer: design features and endpoints that matter. Am Soc Clin Oncol Educ Book. 2020;40:1–11.

  17. Banu A, Ahmed R, Musleh S, Shah Z, Househ M, Alam T. Predicting overall survival in METABRIC cohort using machine learning. Stud Health Technol Inform. 2023;305:632–5.

  18. Royer CW, O’Neill M, Wright SJ. A Newton-CG algorithm with complexity guarantees for smooth unconstrained optimization. Math Program. 2019;180:451–88.

  19. P P: Neural network programming in python. Int J Innov Technol Explor Eng. 2019;8(6s4):373–377.

  20. Livingston F. Implementation of Breiman's random forest machine learning algorithm. ECE591Q Mach Learn. 2005:1–13.

  21. Zemel R, Pitassi T. A gradient-based boosting algorithm for regression problems. Neural Inf Process Syst. 2001.

  22. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M, et al. TensorFlow: a system for large-scale machine learning. OSDI. 2016;16:265–83.

  23. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: machine learning in python. J Mach Learn Res. 2011;12:2825–30.

  24. Arasu VA, Kim P, Li W, Strand F, McHargue C, Harnish R, Newitt DC, Jones EF, Glymour MM, Kornak J, et al. Predictive value of breast MRI background parenchymal enhancement for neoadjuvant treatment response among HER2− patients. J Breast Imaging. 2020;2(4):352–60.

  25. Chavez-Macgregor M, Litton J, Chen H, Giordano SH, Hudis CA, Wolff AC, Valero V, Hortobagyi GN, Bondy ML, Gonzalez-Angulo AM. Pathologic complete response in breast cancer patients receiving anthracycline- and taxane-based neoadjuvant chemotherapy: evaluating the effect of race/ethnicity. Cancer. 2010;116(17):4168–77.

  26. Dawood S, Broglio K, Kau SW, Green MC, Giordano SH, Meric-Bernstam F, Buchholz TA, Albarracin C, Yang WT, Hennessy BT, et al. Triple receptor-negative breast cancer: the effect of race on response to primary systemic treatment and survival outcomes. J Clin Oncol. 2009;27(2):220–6.

  27. Killelea BK, Yang VQ, Wang SY, Hayse B, Mougalian S, Horowitz NR, Chagpar AB, Pusztai L, Lannin DR. Racial differences in the use and outcome of neoadjuvant chemotherapy for breast cancer: results from the national cancer data base. J Clin Oncol. 2015;33(36):4267–76.

  28. Llanos AA, Chandwani S, Bandera EV, Hirshfield KM, Lin Y, Ambrosone CB, Demissie K. Associations between sociodemographic and clinicopathological factors and breast cancer subtypes in a population-based study. Cancer Causes Control. 2015;26(12):1737–50.

  29. Ma SJ, Serra LM, Yu B, Farrugia MK, Iovoli AJ, Yu H, Yao S, Oladeru OT, Singh AK. Racial/ethnic differences and trends in pathologic complete response following neoadjuvant chemotherapy for breast cancer. Cancers (Basel). 2022;14(3):534.

  30. Meti N, Saednia K, Lagree A, Tabbarah S, Mohebpour M, Kiss A, Lu FI, Slodkowska E, Gandhi S, Jerzak KJ, et al. Machine learning frameworks to predict neoadjuvant chemotherapy response in breast cancer using clinical and pathological features. JCO Clin Cancer Inform. 2021;5:66–80.

  31. Rella R, Bufi E, Belli P, Petta F, Serra T, Masiello V, Scrofani AR, Barone R, Orlandi A, Valentini V, et al. Association between background parenchymal enhancement and tumor response in patients with breast cancer receiving neoadjuvant chemotherapy. Diagn Interv Imaging. 2020;101(10):649–55.

  32. Saednia K, Lagree A, Alera MA, Fleshner L, Shiner A, Law E, Law B, Dodington DW, Lu FI, Tran WT, et al. Quantitative digital histopathology and machine learning to predict pathological complete response to chemotherapy in breast cancer patients using pre-treatment tumor biopsies. Sci Rep. 2022;12(1):9690.

  33. Sarma M, Perimbeti S, Nasir S, Attwood K, Kapoor A, O’Connor T, Early A, Levine EG, Takabe K, Kalinski P, et al. Lack of racial differences in clinical outcomes of breast cancer patients receiving neoadjuvant chemotherapy: a single academic center study. Breast Cancer Res Treat. 2022;192(2):411–21.

  34. Shubeck S, Zhao F, Howard FM, Olopade OI, Huo D. Response to treatment, racial and ethnic disparity, and survival in patients with breast cancer undergoing neoadjuvant chemotherapy in the US. JAMA Netw Open. 2023;6(3): e235834.

  35. Terman E, Sheade J, Zhao F, Howard FM, Jaskowiak N, Tseng J, Chen N, Hahn O, Fleming G, Huo D, et al. The impact of race and age on response to neoadjuvant therapy and long-term outcomes in Black and White women with early-stage breast cancer. Breast Cancer Res Treat. 2023;200(1):75–83.

  36. Tichy JR, Deal AM, Anders CK, Reeder-Hayes K, Carey LA. Race, response to chemotherapy, and outcome within clinical breast cancer subtypes. Breast Cancer Res Treat. 2015;150(3):667–74.

  37. Wang H, Yee D. I-SPY 2: a neoadjuvant adaptive clinical trial designed to improve outcomes in high-risk breast cancer. Curr Breast Cancer Rep. 2019;11(4):303–10.

  38. Warner ET, Ballman KV, Strand C, Boughey JC, Buzdar AU, Carey LA, Sikov WM, Partridge AH. Impact of race, ethnicity, and BMI on achievement of pathologic complete response following neoadjuvant chemotherapy for breast cancer: a pooled analysis of four prospective Alliance clinical trials (A151426). Breast Cancer Res Treat. 2016;159(1):109–18.

  39. Li F, Yang Y, Wei Y, He P, Chen J, Zheng Z, Bu H. Deep learning-based predictive biomarker of pathological complete response to neoadjuvant chemotherapy from histological images in breast cancer. J Transl Med. 2021;19(1):348.

  40. Balmanoukian A, Zhang Z, Jeter S, Slater S, Armstrong DK, Emens LA, Fetting JH, Wolff AC, Davidson NE, Jacobs L, et al. African American women who receive primary anthracycline- and taxane-based chemotherapy for triple-negative breast cancer suffer worse outcomes compared with white women. J Clin Oncol. 2009;27(22):e35-37 (author reply e38-39).

  41. Knisely AT, Michaels AD, Mehaffey JH, Hassinger TE, Krebs ED, Brenin DR, Schroen AT, Showalter SL. Race is associated with completion of neoadjuvant chemotherapy for breast cancer. Surgery. 2018;164(2):195–200.

  42. Griggs JJ, Sorbero ME, Stark AT, Heininger SE, Dick AW. Racial disparity in the dose and dose intensity of breast cancer adjuvant chemotherapy. Breast Cancer Res Treat. 2003;81(1):21–31.

  43. Griggs JJ, Culakova E, Sorbero ME, Poniewierski MS, Wolff DA, Crawford J, Dale DC, Lyman GH. Social and racial differences in selection of breast cancer adjuvant chemotherapy regimens. J Clin Oncol. 2007;25(18):2522–7.

  44. Litvak A, Batukbhai B, Russell SD, Tsai HL, Rosner GL, Jeter SC, Armstrong D, Emens LA, Fetting J, Wolff AC, et al. Racial disparities in the rate of cardiotoxicity of HER2-targeted therapies among women with early breast cancer. Cancer. 2018;124(9):1904–11.

  45. Leonard-Murali S, Nathanson SD, Springer K, Baker P, Susick L. Early breast cancer survival of black and white American women with equal diagnostic and therapeutic management. Eur J Surg Oncol. 2023;49(3):583–8.

  46. Cortazar P, Zhang L, Untch M, Mehta K, Costantino JP, Wolmark N, Bonnefoi H, Cameron D, Gianni L, Valagussa P, et al. Pathological complete response and long-term clinical benefit in breast cancer: the CTNeoBC pooled analysis. Lancet. 2014;384(9938):164–72.

  47. Spring L, Greenup R, Niemierko A, Schapira L, Haddad S, Jimenez R, Coopey S, Taghian A, Hughes KS, Isakoff SJ, et al. Pathologic complete response after neoadjuvant chemotherapy and long-term outcomes among young women with breast cancer. J Natl Compr Cancer Netw. 2017;15(10):1216–23.

  48. Cortazar P, Zhang L, Untch M, Mehta K, Costantino JP, Wolmark N, Bonnefoi H, Cameron D, Gianni L, Valagussa P, et al. Pathological complete response and long-term clinical benefit in breast cancer: the CTNeoBC pooled analysis. The Lancet. 2014;384(9938):164–72.

  49. van Uden DJP, van Maaren MC, Bult P, Strobbe LJA, van der Hoeven JJM, Blanken-Peeters C, Siesling S, de Wilt JHW. Pathologic complete response and overall survival in breast cancer subtypes in stage III inflammatory breast cancer. Breast Cancer Res Treat. 2019;176(1):217–26.

  50. Symmans WF, Wei C, Gould R, Yu X, Zhang Y, Liu M, Walls A, Bousamra A, Ramineni M, Sinn B, et al. Long-term prognostic risk after neoadjuvant chemotherapy associated with residual cancer burden and breast cancer subtype. J Clin Oncol. 2017;35(10):1049–60.

  51. Kuerer HM, Newman LA, Smith TL, Ames FC, Hunt KK, Dhingra K, Theriault RL, Singh G, Binkley SM, Sneige N, et al. Clinical course of breast cancer patients with complete pathologic primary tumor and axillary lymph node response to doxorubicin-based neoadjuvant chemotherapy. J Clin Oncol. 1999;17(2):460–9.

  52. Kuroi K, Toi M, Ohno S, Nakamura S, Iwata H, Masuda N, Sato N, Tsuda H, Kurosumi M, Akiyama F. Prognostic significance of subtype and pathologic response in operable breast cancer; a pooled analysis of prospective neoadjuvant studies of JBCRG. Breast Cancer (Tokyo, Japan). 2015;22(5):486–95.

  53. Untch M, Fasching PA, Konecny GE, Hasmuller S, Lebeau A, Kreienberg R, Camara O, Muller V, du Bois A, Kuhn T, et al. Pathologic complete response after neoadjuvant chemotherapy plus trastuzumab predicts favorable survival in human epidermal growth factor receptor 2-overexpressing breast cancer: results from the TECHNO trial of the AGO and GBG study groups. J Clin Oncol. 2011;29(25):3351–7.

  54. Qian B, Yang J, Zhou J, Hu L, Zhang S, Ren M, Qu X. Individualized model for predicting pathological complete response to neoadjuvant chemotherapy in patients with breast cancer: a multicenter study. Front Endocrinol (Lausanne). 2022;13: 955250.

  55. Goorts B, van Nijnatten TJ, de Munck L, Moossdorff M, Heuts EM, de Boer M, Lobbes MB, Smidt ML. Clinical tumor stage is the most important predictor of pathological complete response rate after neoadjuvant chemotherapy in breast cancer patients. Breast Cancer Res Treat. 2017;163(1):83–91.

  56. Livingston-Rosanoff D, Schumacher J, Vande Walle K, Stankowski-Drengler T, Greenberg CC, Neuman H, Wilke LG. Does tumor size predict response to neoadjuvant chemotherapy in the modern era of biologically driven treatment? A nationwide study of US breast cancer patients. Clin Breast Cancer. 2019;19(6):e741–7.

  57. Khan N, Adam R, Huang P, Maldjian T, Duong TQ. Deep learning prediction of pathologic complete response in breast cancer using MRI and other clinical data: a systematic review. Tomography. 2022;8(6):2784–95.

  58. Dammu H, Ren T, Duong TQ. Deep learning prediction of pathological complete response, residual cancer burden, and progression-free survival in breast cancer patients. PLoS ONE. 2023;18(1): e0280148.

  59. Syed A, Adam R, Ren T, Lu J, Maldjian T, Duong TQ. Machine learning with textural analysis of longitudinal multiparametric MRI and molecular subtypes accurately predicts pathologic complete response in patients with invasive breast cancer. PLoS ONE. 2023;18(1): e0280320.

  60. Hines RB, Johnson AM, Lee E, Erickson S, Rahman SMM. Trends in breast cancer survival by race-ethnicity in Florida, 1990–2015. Cancer Epidemiol Biomarkers Prev. 2021;30(7):1408–15.

  61. Mitchell E, Alese OB, Yates C, Rivers BM, Blackstock W, Newman L, Davis M, Byrd G, Harris AE. Cancer healthcare disparities among African Americans in the United States. J Natl Med Assoc. 2022;114(3):236–50.

  62. Reeder-Hayes K, Peacock Hinton S, Meng K, Carey LA, Dusetzina SB. Disparities in use of human epidermal growth hormone receptor 2-targeted therapy for early-stage breast cancer. J Clin Oncol. 2016;34(17):2003–9.

  63. Liu H, Lv L, Gao H, Cheng M. Pathologic complete response and its impact on breast cancer recurrence and patient’s survival after neoadjuvant therapy: a comprehensive meta-analysis. Comput Math Methods Med. 2021;2021:7545091.

  64. Ayanian JZ, Kohler BA, Abe T, Epstein AM. The relation between health insurance coverage and clinical outcomes among women with breast cancer. N Engl J Med. 1993;329(5):326–31.

  65. Andersen BL, Yang HC, Farrar WB, Golden-Kreutz DM, Emery CF, Thornton LM, Young DC, Carson WE 3rd. Psychologic intervention improves survival for breast cancer patients: a randomized clinical trial. Cancer. 2008;113(12):3450–8.

  66. Hoy MK, Winters BL, Chlebowski RT, Papoutsakis C, Shapiro A, Lubin MP, Thomson CA, Grosvenor MB, Copeland T, Falk E, et al. Implementing a low-fat eating plan in the Women’s Intervention Nutrition Study. J Am Diet Assoc. 2009;109(4):688–96.

  67. Ibrahim EM, Al-Homaidh A. Physical activity and survival after breast cancer diagnosis: meta-analysis of published studies. Med Oncol. 2011;28(3):753–65.

  68. Weeks JC, Cook EF, O’Day SJ, Peterson LM, Wenger N, Reding D, Harrell FE, Kussin P, Dawson NV, Connors AF Jr, et al. Relationship between cancer patients’ predictions of prognosis and their treatment preferences. JAMA. 1998;279(21):1709–14.

  69. Lundqvist A, Andersson E, Ahlberg I, Nilbert M, Gerdtham U. Socioeconomic inequalities in breast cancer incidence and mortality in Europe-a systematic review and meta-analysis. Eur J Public Health. 2016;26(5):804–13.

  70. Coates RJ, Clark WS, Eley JW, Greenberg RS, Huguley CM Jr, Brown RL. Race, nutritional status, and survival from breast cancer. J Natl Cancer Inst. 1990;82(21):1684–92.

  71. Ramsey SD, Bansal A, Fedorenko CR, Blough DK, Overstreet KA, Shankaran V, Newcomb P. Financial insolvency as a risk factor for early mortality among patients with cancer. J Clin Oncol. 2016;34(9):980–6.

  72. Ma SJ, Iovoli AJ, Attwood K, Wooten KE, Arshad H, Gupta V, McSpadden RP, Kuriakose MA, Markiewicz MR, Chan JM, et al. Association of significant financial burden with survival for head and neck cancer patients treated with radiation therapy. Oral Oncol. 2021;115: 105196.

  73. Klein J, Bodner W, Garg M, Kalnicki S, Ohri N. Pretreatment financial toxicity predicts progression-free survival following concurrent chemoradiotherapy for locally advanced non-small-cell lung cancer. Future Oncol. 2019;15(15):1697–705.

  74. Williams AD, Buckley M, Ciocca RM, Sabol JL, Larson SL, Carp NZ. Racial and socioeconomic disparities in breast cancer diagnosis and mortality in Pennsylvania. Breast Cancer Res Treat. 2022;192(1):191–200.

  75. Figueiredo F, Adami F. Income inequality and mortality owing to breast cancer: evidence from Brazil. Clin Breast Cancer. 2018;18(4):e651–8.

  76. Ingleby FC, Woods LM, Atherton IM, Baker M, Elliss-Brookes L, Belot A. An investigation of cancer survival inequalities associated with individual-level socio-economic status, area-level deprivation, and contextual effects, in a cancer patient cohort in England and Wales. BMC Public Health. 2022;22(1):90.

  77. Bidard FC, Michiels S, Riethdorf S, Mueller V, Esserman LJ, Lucci A, Naume B, Horiguchi J, Gisbert-Criado R, Sleijfer S, et al. Circulating tumor cells in breast cancer patients treated by neoadjuvant chemotherapy: a meta-analysis. J Natl Cancer Inst. 2018;110(6):560–7.

  78. Ren T, Cattell R, Duanmu H, Huang P, Li H, Vanguri R, Liu MZ, Jambawalikar S, Ha R, Wang F, et al. Convolutional neural network detection of axillary lymph node metastasis using standard clinical breast MRI. Clin Breast Cancer. 2020;20(3):e301–8.

  79. Ren T, Lin S, Huang P, Duong TQ. Convolutional neural network of multiparametric MRI accurately detects axillary lymph node metastasis in breast cancer patients with pre neoadjuvant chemotherapy. Clin Breast Cancer. 2022;22(2):170–7.

  80. Cattell RF, Kang JJ, Ren T, Huang PB, Muttreja A, Dacosta S, Li H, Baer L, Clouston S, Palermo R, et al. MRI volume changes of axillary lymph nodes as predictor of pathologic complete responses to neoadjuvant chemotherapy in breast cancer. Clin Breast Cancer. 2020;20(1):68–79.e61.

Funding

None.

Author information

Contributions

Study conception and design contributed by KD, AV, TM, RA, TD. Acquisition of data contributed by AE, KD. Statistical analysis contributed by KD, AV, WH, TD. Machine learning analysis contributed by AV, WH, TD. Tables 1, 3, 4 and 5 contributed by KD. Table 2, Fig. 1, Table 6, Supplementary tables contributed by AV. Writing, drafting, and reviewing manuscript contributed by KD, AV, TM, SF, JC, LH, DM, TD.

Corresponding author

Correspondence to Tim Q. Duong.

Ethics declarations

Ethics approval and consent to participate

All subjects gave their informed consent for inclusion before they participated in the study. The study was approved by [redacted for blind review] with waived consent #2020-12169.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Table S1. AUCs for all four univariate models across all 4 tumor types (output variable is pCR) (N = 240).

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Dell’Aquila, K., Vadlamani, A., Maldjian, T. et al. Machine learning prediction of pathological complete response and overall survival of breast cancer patients in an underserved inner-city population. Breast Cancer Res 26, 7 (2024). https://doi.org/10.1186/s13058-023-01762-w

  • DOI: https://doi.org/10.1186/s13058-023-01762-w

Keywords