- Open Access
An integrated deep learning model for the prediction of pathological complete response to neoadjuvant chemotherapy with serial ultrasonography in breast cancer patients: a multicentre, retrospective study
Breast Cancer Research volume 24, Article number: 81 (2022)
The biological phenotype of tumours evolves during neoadjuvant chemotherapy (NAC). Accurate prediction of pathological complete response (pCR) to NAC early in treatment or after treatment can help optimize treatment strategies or improve the breast-conserving rate. This study aimed to develop and validate an autosegmentation-based serial ultrasonography assessment system (SUAS) that incorporates serial ultrasonographic features acquired throughout NAC to predict pCR in breast cancer.
A total of 801 patients with biopsy-proven breast cancer were retrospectively enrolled from three institutions and split into a training cohort (242 patients), an internal validation cohort (197 patients), and two external test cohorts (212 and 150 patients). Based on autosegmentation by U-Net, three imaging signatures were constructed from serial ultrasonographic features acquired before NAC (pretreatment signature), during the first–second cycle of NAC (early-stage treatment signature), and after NAC (posttreatment signature). The SUAS was constructed by sequentially integrating the pre-, early-stage, and posttreatment signatures, and the incremental performance was analysed.
In the external test cohorts, the SUAS yielded favourable performance in predicting pCR, with areas under the receiver operating characteristic curve (AUCs) of 0.927 [95% confidence interval (CI) 0.891–0.963] and 0.914 (95% CI 0.853–0.976), compared with those of the clinicopathological prediction model [0.734 (95% CI 0.665–0.804) and 0.610 (95% CI 0.504–0.716)] and radiologist interpretation [0.632 (95% CI 0.570–0.693) and 0.724 (95% CI 0.644–0.804)]. Similar results were observed at the early stage of NAC [AUCs of 0.874 (0.793–0.955) and 0.897 (0.851–0.943) in the external test cohorts].
We demonstrate that the autosegmentation-based SUAS, integrating serial ultrasonographic features throughout NAC, can predict pCR with favourable performance, which may facilitate individualized treatment strategies.
Breast cancer is the most common cancer and the leading cause of cancer-related death in women. Neoadjuvant chemotherapy (NAC) is the standard of care for patients with locally advanced breast cancer, and it is increasingly used in patients with operable breast cancer to allow more conservative surgery of the breast and axilla. However, not all patients benefit from NAC, and reported rates of pathological complete response (pCR) are generally below 70% [3,4,5]. Accurate prediction of pCR allows early intervention in non-pCR patients to increase pCR rates and guides clinicians in choosing breast-conserving surgery. However, no reliable biomarkers currently exist to aid pCR prediction.
Magnetic resonance imaging (MRI) is one of the main imaging methods used to monitor response to NAC in breast cancer [7,8,9]. However, because of its high cost and limited flexibility, few patients can undergo MRI to monitor treatment response in every cycle of NAC. Ultrasound is widely used to evaluate treatment response in clinical practice because of its low cost and convenience, and guidelines recommend ultrasound to evaluate or re-evaluate lesions before, during, and after NAC [2, 10]. However, the performance of conventional ultrasound remains far from satisfactory, with a false negative rate (FNR) for pCR of up to 39.2%.
Recently, radiomics has been used for breast cancer diagnosis, treatment assessment, and prognosis prediction [12,13,14,15,16,17]. Radiomics, based on quantitative analysis of medical images, can noninvasively describe tumour phenotypes with more predictive power than routine clinical methods. However, in traditional quantitative image analysis, tumours are delineated manually by radiologists, which is time-consuming and subject to inter- and intraobserver variability [12, 17]. Deep learning, in contrast, offers advantages in segmentation speed and reduced variability. Nevertheless, most previous studies have focused on identifying imaging biomarkers at a single time point [16, 18, 19]. Because tumour biology is a dynamic ecosystem with various cellular contributions, tumour heterogeneity may not be fully captured at a single time point [20, 21]. Integrating serial ultrasound images acquired during NAC may therefore help monitor changes in tumour biological characteristics [12, 17, 22].
Thus, in this study, we developed and validated a deep learning-based serial ultrasonography assessment system (SUAS) for predicting the neoadjuvant chemotherapy response of breast cancer using serial ultrasound images.
Participants and data acquisition
The ethics committees of the Guangdong Provincial People’s Hospital (GPPH), Yunnan Cancer Hospital (YNCH), and Shanxi Province Cancer Hospital (SPCH) approved this multicentre retrospective study and waived the requirement for informed consent because of its retrospective nature. All data in the study were deidentified and anonymized.
Eligible female patients diagnosed with breast cancer who completed NAC, followed by surgery, were retrospectively recruited from May 2015 to June 2020 according to the inclusion and exclusion criteria (Additional file 1: SI and Fig. S1). Serial ultrasonographic images of the target lesions were acquired at three time points: (1) pretreatment ultrasonography, within one week before NAC (Phase 0); (2) early-stage ultrasonography, during the first–second cycle of NAC (Phase 1); and (3) posttreatment ultrasonography, after NAC and within 1 month before surgery (Phase 2) (Fig. 1A). The cross-sectional image slice with the largest dimension of the tumour was selected for subsequent analysis. All images were reviewed by a radiologist with 10 years of experience in breast imaging (Y.L.). Details of the ultrasound scanners and probes used in the three centres are summarized in Additional file 1: Table S1.
Because of the retrospective design, patients were not randomized into cohorts. YNCH, the institution with the largest number of cases, contributed the training cohort (patients enrolled between May 2015 and June 2018) and the internal validation cohort (patients enrolled between July 2018 and June 2020), while patients recruited from GPPH and SPCH formed two independent external test cohorts (Additional file 1: Fig. S1).
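The cohort assignment is a temporal split within one centre plus whole-centre external cohorts. The sketch below expresses that rule; the helper name, the use of a single "enrolment date" per patient, and the mapping of GPPH and SPCH to external test cohorts 1 and 2 (following the order in which the cohorts are listed) are assumptions for illustration.

```python
from datetime import date

# Enrolment window and YNCH cut-off as described in the text
ENROL_START = date(2015, 5, 1)
ENROL_END = date(2020, 6, 30)
YNCH_TRAIN_END = date(2018, 6, 30)  # training: May 2015 to June 2018


def assign_cohort(centre: str, enrolment_date: date) -> str:
    """Hypothetical helper: map a patient to a cohort by centre and enrolment date."""
    if not (ENROL_START <= enrolment_date <= ENROL_END):
        raise ValueError("enrolment date outside the study window")
    if centre == "YNCH":
        # Chronological (temporal) split within the largest centre
        return "training" if enrolment_date <= YNCH_TRAIN_END else "internal_validation"
    if centre == "GPPH":
        return "external_test_1"
    if centre == "SPCH":
        return "external_test_2"
    raise ValueError(f"unknown centre: {centre}")


print(assign_cohort("YNCH", date(2017, 3, 15)))  # training
```

A temporal split like this tests whether a model trained on earlier patients transfers to later ones at the same centre, which is a stricter check than a random split of the same period.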
Oestrogen receptor (ER) and progesterone receptor (PR) status was considered positive if ≥ 1% of tumour cells stained positive on immunohistochemistry (IHC). Ki-67 status was dichotomized at 20% (< 20% vs. ≥ 20%). Human epidermal growth factor receptor 2 (HER2) status was considered positive for an IHC score of 3+ and negative for a score of 0 or 1+. Tumours with an IHC score of 2+ underwent in situ hybridization (ISH) and were considered HER2-positive if amplified and HER2-negative if not amplified [24, 25].
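The receptor-status rules above are simple thresholds, and the subgroup analysis later relies on the same classification. A minimal sketch, assuming percentages as inputs and assuming that "HR(+)" means ER and/or PR positive (the text does not define HR explicitly); all function names are hypothetical.

```python
from typing import Optional


def er_pr_positive(percent_positive_cells: float) -> bool:
    # ER/PR positive if >= 1% of tumour cells stain positive on IHC
    return percent_positive_cells >= 1.0


def ki67_high(percent_ki67: float) -> bool:
    # Ki-67 dichotomized at 20%: >= 20% high, < 20% low
    return percent_ki67 >= 20.0


def her2_positive(ihc_score: int, ish_amplified: Optional[bool] = None) -> bool:
    if ihc_score == 3:
        return True
    if ihc_score in (0, 1):
        return False
    if ihc_score == 2:
        # Equivocal IHC 2+: status is decided by in situ hybridization
        if ish_amplified is None:
            raise ValueError("IHC 2+ requires an ISH result")
        return ish_amplified
    raise ValueError(f"IHC score must be 0-3, got {ihc_score}")


def molecular_subgroup(er_pct: float, pr_pct: float, ihc_score: int,
                       ish_amplified: Optional[bool] = None) -> str:
    # Assumption: HR(+) means ER and/or PR positive
    if her2_positive(ihc_score, ish_amplified):
        return "HER2(+)"
    if er_pr_positive(er_pct) or er_pr_positive(pr_pct):
        return "HER2(-)/HR(+)"
    return "triple-negative"


print(molecular_subgroup(er_pct=0, pr_pct=0, ihc_score=2, ish_amplified=False))  # triple-negative
```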
Assessment of pathological response to NAC
All patients received six or more cycles of taxane-based, anthracycline-based, or anthracycline- and taxane-based NAC (Table 1) according to the National Comprehensive Cancer Network (NCCN) and China Anti-Cancer Association breast cancer guidelines [10, 26]. HER2(+) patients additionally received trastuzumab (8 mg/kg loading dose, 6 mg/kg maintenance dose). Some HR(+)/HER2(−) patients also received concurrent neoadjuvant endocrine therapy, as recommended.
Pathological response was assessed postoperatively based on the American Joint Committee on Cancer staging system [27, 28]. pCR was defined as no residual invasive disease in the breast or lymph nodes, with or without residual ductal carcinoma in situ (ypT0/is ypN0). All specimens were evaluated by pathologists with at least 9 years of experience.
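The pCR endpoint (ypT0/is ypN0) can be written as a boolean rule over the pathological T and N categories. This is a simplified sketch: the string labels are hypothetical, and nodal sub-categories such as isolated tumour cells are deliberately not modelled.

```python
def is_pcr(ypT: str, ypN: str) -> bool:
    """ypT0/is ypN0: no residual invasive disease in breast or lymph nodes.

    Residual ductal carcinoma in situ (ypTis) still counts as pCR.
    """
    breast_clear = ypT in {"ypT0", "ypTis"}
    nodes_clear = ypN == "ypN0"
    return breast_clear and nodes_clear


print(is_pcr("ypTis", "ypN0"))  # True
print(is_pcr("ypT1a", "ypN0"))  # False
```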
To compare the predictive performance of the constructed models (see below) with radiologist interpretation, a board-certified radiologist with 10 years of experience (Y.L.), blinded to the clinical records, independently reviewed the posttreatment ultrasound images; patients without a visible target lesion were classified as pCR.
Manual tumour segmentation is time-consuming, especially for ultrasound images, in which acoustic interference, signal attenuation, and artefacts make delineation more difficult. We therefore proposed a deep learning segmentation model based on the 2D U-Net to automate tumour segmentation (Fig. 1B, C). To obtain the ground truth, a trained radiologist (M.L., 11 years of experience) manually delineated the regions of interest (ROIs) using ITK-SNAP (www.itksnap.org), and an expert radiologist (Y.W., 16 years of experience) confirmed them. Disagreements were adjudicated by a senior radiologist (Y.X.W., 20 years of experience). The tumour ROI included the surrounding chords and burrs. If the tumour was no longer visible after NAC, the tumour bed fibrosis, the biopsy marker, and/or surrounding anatomic landmarks identified before NAC were used as references for ROI placement.
The segmentation network was based on the U-Net architecture proposed by Ronneberger et al. The architecture consisted of two parts: (1) an encoding network of cascaded convolutional and max-pooling layers, which progressively reduces the resolution of the input image while extracting increasingly abstract features; and (2) a decoding network of convolutional and upsampling layers, which provides an expanding path that restores the feature maps to the spatial resolution of the input image, with skip connections passing encoder features to the decoder (Additional file 1: Fig. S2). Details are provided in Additional file 1: SI II–III and Fig. S2. We used the Dice similarity coefficient (DICE) to evaluate the accuracy of automated segmentation.
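The Dice similarity coefficient compares a predicted mask A with the ground-truth mask B as DICE = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal NumPy sketch; the small `eps` term, which makes two empty masks score 1, is an implementation choice, not something the text specifies.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))


# Two 4-pixel squares that share 2 pixels -> Dice = 2*2 / (4+4)
pred = np.zeros((4, 4), dtype=bool)
pred[0:2, 0:2] = True
truth = np.zeros((4, 4), dtype=bool)
truth[0:2, 1:3] = True
print(round(dice_coefficient(pred, truth), 3))  # 0.5
```

In practice the network's probability map would first be thresholded (for example at 0.5, an assumed value) to obtain the binary `pred` mask.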
Feature extraction and signature construction
Because ultrasound images were acquired with different machines at multiple centres, their intensity distributions differed considerably. We therefore first standardized the ultrasound images with the z-score method before extracting image features. We then used cosine similarity and Bland–Altman plots to compare the similarity and consistency of image features extracted by the Pyradiomics toolkit from automated and manual segmentations. Next, the ComBat model was used to reduce the batch effect caused by different acquisition machines. Finally, singular value decomposition and reconstruction (SVD-R), XGBoost, and support vector machine–recursive feature elimination (SVM-RFE) algorithms were used for feature selection. Details are provided in Additional file 1: SI IV–VI (Fig. 1G, Additional file 1: Fig. S3).
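Z-score standardization rescales each image to zero mean and unit standard deviation so that intensity offsets and gains from different scanners are removed before feature extraction. The text does not say whether the statistics come from the whole image or from the ROI, so this sketch supports both; the function name and `eps` guard are assumptions.

```python
from typing import Optional

import numpy as np


def zscore_standardize(image: np.ndarray, mask: Optional[np.ndarray] = None,
                       eps: float = 1e-8) -> np.ndarray:
    """Return (image - mean) / std, with statistics from the whole image or a mask."""
    img = np.asarray(image, dtype=float)
    # Reference pixels: the whole image by default, or only the masked region
    ref = img if mask is None else img[np.asarray(mask, dtype=bool)]
    return (img - ref.mean()) / (ref.std() + eps)


rng = np.random.default_rng(0)
scan = rng.normal(loc=100.0, scale=20.0, size=(64, 64))
standardized = zscore_standardize(scan)
print(round(float(standardized.mean()), 6), round(float(standardized.std()), 3))
```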
After feature selection, the optimal sets of pCR-associated features were selected for phase 0 (P0), phase 1 (P1), and phase 2 (P2). Each set was then used to build a single-time-point prediction signature (P0-Signature, P1-Signature, P2-Signature) by multivariable logistic regression.
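Each signature is a multivariable logistic regression over the selected features of one phase. The sketch below fits the model by Newton–Raphson (iteratively reweighted least squares) in NumPy and returns the linear predictor as the per-patient score. Whether the study used the log-odds or the predicted probability as the "prediction score" is not stated; using the log-odds here is an assumption, as are the function names and the synthetic data.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, n_iter: int = 50, tol: float = 1e-10) -> np.ndarray:
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Returns [intercept, coef_1, ..., coef_k].
    """
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X1 @ beta, -30, 30)))
        w = p * (1.0 - p)
        grad = X1.T @ (y - p)                # gradient of the log-likelihood
        hess = X1.T @ (X1 * w[:, None])      # Fisher information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def signature_score(X: np.ndarray, beta: np.ndarray) -> np.ndarray:
    # Linear predictor (log-odds of pCR) used as the per-patient signature score
    return beta[0] + np.asarray(X, dtype=float) @ beta[1:]


# Synthetic example: two "selected features" with known effects
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
true_beta = np.array([-0.5, 1.0, -2.0])
prob = 1.0 / (1.0 + np.exp(-(true_beta[0] + X @ true_beta[1:])))
y = (rng.random(2000) < prob).astype(int)
beta_hat = fit_logistic(X, y)
scores = signature_score(X, beta_hat)
```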
Model development and validation
Each single-time-point signature generated a prediction score for each patient, reflecting the characteristics of the tumour at that time point. The three prediction scores and clinicopathological factors were then used to build four models that form the SUAS (Fig. 1G) for predicting pCR in patients receiving NAC: (1) the clinicopathological prediction model, built on clinicopathological factors; (2) Model 1, built on the P0-Signature and potentially significant clinicopathological factors; (3) Model 2, the early-stage treatment model, built on Model 1 plus the P1-Signature; and (4) Model 3, built on Model 2 plus the P2-Signature.
The application of SUAS was also investigated in three molecular subtypes of breast cancer: the HER2(+), HER2(−)/HR(+), and triple-negative subgroups. To further validate model performance, we merged the three datasets into one superset and randomly split it into training, validation, and test cohorts at a 6:2:2 ratio (training cohort: n = 481; validation cohort: n = 160; test cohort: n = 160). We then evaluated the predictive performance of the P0-Signature, P1-Signature, P2-Signature, and SUAS model in each cohort.
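A 6:2:2 split of 801 patients does not divide evenly; rounding the validation and test sizes and assigning the remainder to training reproduces the reported 481/160/160. A sketch under that assumption; the seed is hypothetical, and the text does not say whether the split was stratified by pCR status (this one is not).

```python
import numpy as np


def random_split(n: int, ratios=(0.6, 0.2, 0.2), seed: int = 42):
    """Randomly partition indices 0..n-1 into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_val = int(round(n * ratios[1]))
    n_test = int(round(n * ratios[2]))
    n_train = n - n_val - n_test  # remainder goes to training
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = random_split(801)
print(len(train_idx), len(val_idx), len(test_idx))  # 481 160 160
```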
Continuous variables were expressed as mean ± standard deviation (SD) or median with interquartile range (IQR), as appropriate. Continuous and categorical variables were compared between groups using Student's t test and the χ2 test, respectively. All statistical analyses were performed in R (version 3.5.0). All tests were two-sided, and p < 0.05 was considered statistically significant.
Cosine similarity, Bland–Altman analysis, and the intraclass correlation coefficient (ICC) were used to assess the similarity, consistency, and agreement of image features from automated and manual segmentation. The area under the receiver operating characteristic curve (AUC) and other performance metrics (Brier score, accuracy, sensitivity, specificity, positive predictive value [PPV], and negative predictive value [NPV]) were used to compare the models (the clinicopathological model and Models 1–3) with human interpretation. 95% confidence intervals (95% CIs) were calculated by bootstrapping (2000 resamples). DeLong's test, decision curve analysis, the net reclassification improvement (NRI) test, and the integrated discrimination improvement (IDI) test were applied to compare the predictive performance of the models.
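The AUC equals the probability that a randomly chosen pCR patient receives a higher score than a randomly chosen NpCR patient (the Mann–Whitney formulation, with ties counted as one half). The sketch below computes it that way and derives a percentile bootstrap CI with 2000 resamples, matching the resample count in the text. The percentile method, the skipping of single-class resamples, and the function names are assumptions; the study's R implementation may differ.

```python
import numpy as np


def auc_mann_whitney(y, score) -> float:
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs."""
    y = np.asarray(y).astype(bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size))


def bootstrap_auc_ci(y, score, n_boot: int = 2000, alpha: float = 0.05, seed: int = 0):
    """Point AUC and percentile bootstrap (1 - alpha) CI."""
    y = np.asarray(y).astype(int)
    score = np.asarray(score, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)          # resample patients with replacement
        if y[idx].min() == y[idx].max():     # single-class resample: AUC undefined
            continue
        aucs.append(auc_mann_whitney(y[idx], score[idx]))
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auc_mann_whitney(y, score), (float(lo), float(hi))


print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```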
In total, 1395 consecutive female patients were potentially eligible for enrolment in the present study (YNCH, 718 patients; GPPH, 329 patients; SPCH, 348 patients), of whom 594 were excluded (YNCH, 279 patients; GPPH, 117 patients; SPCH, 198 patients). Therefore, 801 female patients (mean age 48 ± 9 years, range 25–75 years) were included in the final study cohorts: 242 (YNCH) in the training cohort, 197 (YNCH) in the internal validation cohort, and 212 (GPPH) and 150 (SPCH) in the two external test cohorts (Additional file 1: Fig. S1). Table 1 summarizes the clinicopathological characteristics of all patients.
There was no significant difference in pCR rates among the training, internal validation, and external test cohorts (29.3% vs. 28.9% vs. 36.2% [GPPH] vs. 25.3% [SPCH], p = 0.059). No significant differences in age, menstrual status, Ki-67 status, or NAC protocol (all p > 0.05) were observed between the pCR and non-pCR (NpCR) groups in any of the four cohorts. ER and PR status were significantly associated with pCR in all cohorts except the training cohort, and HER2 status was significantly associated with pCR in all cohorts except external test cohort 2.
Deep learning enables comparability between automated and manual tumour segmentation
The trained deep learning segmentation network achieved satisfactory segmentation accuracy (DICE > 0.750) in both external test cohorts (Fig. 1D). During NAC, residual tumour may become scattered foci within the tumour bed, which makes precise manual segmentation challenging. Nevertheless, our model maintained good segmentation accuracy across all three phases (DICE > 0.780) (Fig. 1E). The automated model also segmented both large lesions and small lesions (< 2 cm) effectively (Fig. 1F).
In addition, the image features obtained with the two segmentation methods showed favourable similarity and consistency. A total of 3535 quantitative image features were extracted from both the automated and manual segmentations. Based on cosine similarity (mean > 0.900, range 0.700–1.00) and the Bland–Altman test (mean difference ≤ 5.525e−11), the automated segmentation model produced results very close to the manual segmentation performed by experienced radiologists in each cohort (Additional file 1: SI VI, Table S2). We therefore used the features from automated segmentation in all subsequent analyses.
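Both agreement measures compare two equal-length vectors of feature values, one from automated and one from manual segmentation. The text does not state whether cosine similarity was computed per patient (across features) or per feature (across patients), so the helpers below accept any pair of vectors. Bland–Altman is shown with its standard outputs: the mean difference (bias) and the 95% limits of agreement, bias ± 1.96 SD of the differences. Function names are illustrative.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """cos(theta) = a.b / (|a| |b|); 1 means identical direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else float("nan")


def bland_altman(a, b):
    """Return (bias, (lower, upper)) with 95% limits of agreement."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = diff.mean()
    sd = diff.std(ddof=1)
    return float(bias), (float(bias - 1.96 * sd), float(bias + 1.96 * sd))


auto_feature = [1.02, 2.01, 2.98, 4.05]    # hypothetical values, automated masks
manual_feature = [1.00, 2.00, 3.00, 4.00]  # hypothetical values, manual masks
print(round(cosine_similarity(auto_feature, manual_feature), 4))
print(bland_altman(auto_feature, manual_feature))
```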
Feature assessment and SUAS construction and validation
Principal component analysis (PCA) and linear models showed that the ComBat model corrected the batch effect of the machines (Additional file 1: Fig. S4). After feature selection, 12, 11, and 9 features were selected from phases 0–2 to build the P0-Signature, P1-Signature, and P2-Signature, respectively (Additional file 1: Table S3). All three signatures differed significantly between the pCR and NpCR groups and were important predictors of pCR at multiple time points (Table 1, all p < 0.001). Unexpectedly, however, none of the clinicopathological factors (ER, PR, HER2, or Ki-67 status) was significant in the multivariable regression analysis (Additional file 1: Table S4, all p > 0.100). Because previous studies have shown that these variables are important biomarkers for pCR prediction [7, 32], they were nevertheless included in the clinicopathological prediction model using the forced entry method of regression analysis. Furthermore, the relative contribution of each clinicopathological factor to pCR prediction in Models 2–3 (0.77–13.63%) was much smaller than that of the radiomics signatures (25.21–46.85%) (Additional file 1: Table S4 and Fig. S5). Therefore, the clinicopathological factors were discarded from Models 1–3.
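The core idea of ComBat is to remove per-batch (here, per-scanner) shifts in location and scale of each feature. The sketch below is a simplified location/scale harmonization, not ComBat itself: full ComBat (for example, the `ComBat` function in the R sva package) additionally shrinks the per-batch estimates with empirical Bayes and can protect biological covariates such as pCR status, which this version does not. Without such covariates, genuine outcome differences that happen to be confounded with scanner would also be removed. Each batch needs at least two samples.

```python
import numpy as np


def location_scale_harmonize(X, batch) -> np.ndarray:
    """Align per-batch mean/SD of each feature (column) to the pooled values.

    Simplified sketch: no empirical Bayes shrinkage and no covariates.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    grand_mean = X.mean(axis=0)
    # Pooled within-batch SD from residuals around each batch mean
    resid = np.empty_like(X)
    for b in labels:
        m = batch == b
        resid[m] = X[m] - X[m].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).sum(axis=0) / (len(X) - len(labels)))
    out = np.empty_like(X)
    for b in labels:
        m = batch == b
        sd_b = X[m].std(axis=0, ddof=1)
        sd_b = np.where(sd_b > 0, sd_b, 1.0)   # guard constant features
        out[m] = (X[m] - X[m].mean(axis=0)) / sd_b * pooled_sd + grand_mean
    return out


rng = np.random.default_rng(0)
scanner_a = rng.normal(0.0, 1.0, size=(50, 3))
scanner_b = rng.normal(5.0, 3.0, size=(50, 3))   # shifted and rescaled "scanner"
features = np.vstack([scanner_a, scanner_b])
batch = np.array(["A"] * 50 + ["B"] * 50)
harmonized = location_scale_harmonize(features, batch)
```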
The single-time-point signatures predicted pCR significantly better than the clinical model (all p < 0.001) and radiologist evaluation (all p < 0.001) (Additional file 1: Table S5). Similar results for clinical usefulness were found for Models 2–3 (Additional file 1: Fig. S6). Furthermore, combining single-time-point signatures into multi-time-point models significantly improved pCR prediction (Fig. 2A, B, Table 2, Additional file 1: Table S6). In the analysis of relative variable contributions to SUAS, the posttreatment signature had the highest percentage contribution (46.16%), followed by the pretreatment signature (25.21%), the early-stage treatment signature (16.22%), and clinical factors (12.4%) (Additional file 1: Fig. S5). In clinical practice, early assessment of treatment response can help evaluate the effectiveness of treatment options. The early-stage treatment Model 2 (P0 + P1-Signatures) outperformed existing methods such as the clinical model and radiologist interpretation (Fig. 3A, B, Table 2), as did Model 3 (P0 + P1 + P2-Signatures), which was designed for preoperative evaluation after NAC (Fig. 3B, C, Table 2). Moreover, Model 2 correctly identified 468 of 558 (83.9%) patients with NpCR (136 of 171, 123 of 140, 115 of 135, and 94 of 112 patients in the training, internal validation, and two external test cohorts, respectively) (Fig. 3A), and Model 3 correctly identified 206 of 243 (84.8%) patients with pCR (63 of 71, 47 of 57, 66 of 77, and 30 of 38 patients in the training, internal validation, and two external test cohorts, respectively) (Fig. 3C). Finally, in the subgroup analysis of SUAS, Models 2 and 3 achieved better predictive performance in the HER2(+) and HER2(−)/HR(+) subgroups (all AUCs > 0.860) than in the triple-negative subgroup (AUCs < 0.860) (Fig. 3D, E, Additional file 1: Table S6).
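With pCR as the positive class, "NpCR correctly identified" corresponds to specificity and "pCR correctly identified" to sensitivity. The sketch below computes the threshold-based metrics listed in the statistics section plus the Brier score, and then pools the per-cohort counts reported above. The 0.5 probability cut-off and the function names are assumptions; the text does not state the operating threshold.

```python
import numpy as np


def classification_metrics(y_true, prob, threshold: float = 0.5) -> dict:
    """Accuracy, sensitivity, specificity, PPV, NPV (pCR = 1) and Brier score."""
    y = np.asarray(y_true).astype(int)
    p = np.asarray(prob, dtype=float)
    pred = (p >= threshold).astype(int)
    tp = int(((pred == 1) & (y == 1)).sum())
    tn = int(((pred == 0) & (y == 0)).sum())
    fp = int(((pred == 1) & (y == 0)).sum())
    fn = int(((pred == 0) & (y == 1)).sum())

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(y)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "brier": float(np.mean((p - y) ** 2)),  # mean squared error of probabilities
    }


def pooled_proportion(successes, totals) -> float:
    # Pool counts across cohorts before dividing (not an average of percentages)
    return sum(successes) / sum(totals)


spec_model2 = pooled_proportion([136, 123, 115, 94], [171, 140, 135, 112])  # 468/558
sens_model3 = pooled_proportion([63, 47, 66, 30], [71, 57, 77, 38])         # 206/243
print(f"{spec_model2:.1%}, {sens_model3:.1%}")  # 83.9%, 84.8%
```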
The superior performance of SUAS was further confirmed by the NRI test (all p < 0.001, Additional file 1: Table S7) and the IDI test (all p < 0.001, Additional file 1: Table S8). In addition, the superiority of SUAS in predicting pCR and NpCR status was validated in the randomly divided datasets (Additional file 1: Table S9).
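NRI and IDI quantify how much a new model (for example, Model 3) improves on an old one (for example, Model 2) in the predicted probabilities themselves. The text does not say whether a categorical or category-free NRI was used; this sketch assumes the category-free (continuous) NRI and omits the significance tests. IDI is the increase in mean predicted probability among pCR patients minus the increase among NpCR patients.

```python
import numpy as np


def continuous_nri(y, p_old, p_new) -> float:
    """Category-free NRI: net upward moves in events + net downward moves in non-events."""
    y = np.asarray(y).astype(bool)
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    up, down = p_new > p_old, p_new < p_old
    nri_events = up[y].mean() - down[y].mean()
    nri_nonevents = down[~y].mean() - up[~y].mean()
    return float(nri_events + nri_nonevents)


def idi(y, p_old, p_new) -> float:
    """Integrated discrimination improvement (difference in discrimination slopes)."""
    y = np.asarray(y).astype(bool)
    p_old = np.asarray(p_old, dtype=float)
    p_new = np.asarray(p_new, dtype=float)
    gain_events = p_new[y].mean() - p_old[y].mean()
    gain_nonevents = p_new[~y].mean() - p_old[~y].mean()
    return float(gain_events - gain_nonevents)


y_demo = [1, 1, 0, 0]
old_demo = [0.5, 0.5, 0.5, 0.5]
new_demo = [0.8, 0.6, 0.3, 0.4]
print(continuous_nri(y_demo, old_demo, new_demo), round(idi(y_demo, old_demo, new_demo), 3))  # 2.0 0.35
```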
Visualization and interpretability of SUAS
A Sankey diagram (Fig. 4A) was used to visualize the predictive accuracy of SUAS throughout NAC, showing, for each prognostic factor added, the proportions of true positive (TpCR), true negative (TnpCR), false positive (FpCR), and false negative (FnpCR) pCR predictions. As predictors were added to SUAS, patients shifted substantially and continuously among all four categories (e.g., from clinicopathological factors to the P0-Signature). The largest shift occurred in the transition from the P0-Signature to the P0 + P1-Signatures, in which the FpCR proportion decreased by 12.4% and the TnpCR proportion increased by 12.4%. Similar shifts were observed in the transition from the P0 + P1-Signatures to the P0 + P1 + P2-Signatures. In short, as information from multiple prognostic factors accumulated, the proportion of false positive predictions trended downwards, while the proportion of true negative predictions gradually increased.
We also quantified changes over time in three entropy-related features in the three institutions. Entropy decreased in patients with pCR and increased in patients with NpCR (Additional file 1: Fig. S7).
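First-order entropy measures the uncertainty of the grey-level histogram inside the ROI: Entropy = −Σ p(i) log2 p(i), where p(i) is the fraction of ROI pixels in grey-level bin i. This sketch approximates Pyradiomics' fixed-bin-width discretization (bin width 25 is its default, assumed here) and is not guaranteed to match Pyradiomics output exactly; the texture-matrix entropies (Run Entropy, Zone Entropy) are not implemented.

```python
import numpy as np


def first_order_entropy(image, mask, bin_width: float = 25.0) -> float:
    """Shannon entropy (bits) of the discretized grey levels inside the mask."""
    values = np.asarray(image, dtype=float)[np.asarray(mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("empty ROI")
    # Fixed-bin-width discretization anchored at a multiple of bin_width
    lo = np.floor(values.min() / bin_width) * bin_width
    bins = np.floor((values - lo) / bin_width).astype(int)
    counts = np.bincount(bins)
    p = counts[counts > 0] / values.size   # drop empty bins (0 * log 0 = 0)
    return float(-(p * np.log2(p)).sum())


two_level = np.array([[0, 0], [30, 30]])
roi = np.ones_like(two_level, dtype=bool)
print(first_order_entropy(two_level, roi))  # 1.0
```

A more homogeneous ROI concentrates the histogram in fewer bins and lowers entropy, which is the direction of change reported for patients with pCR.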
In addition, four patients (two with pCR and two with NpCR) were randomly selected from the study population to explore the interpretability of SUAS through entropy clustering during NAC (Fig. 4B). In patients with NpCR, entropy clustering within the tumour bed became increasingly scattered from Phase 0 to Phase 2, with chords and burrs at the periphery. In patients achieving pCR, entropy clustering within the tumour bed became increasingly compact across the three phases, with a well-defined margin. These observations suggest that SUAS may capture underlying biological changes in breast cancer throughout NAC.
This study showed that the autosegmentation-based SUAS, which integrates serial imaging biomarkers from multiple time points throughout NAC, accurately predicted pCR in the training, internal validation, and two external test cohorts. Moreover, the performance of SUAS was largely unaffected by molecular subtype. To the best of our knowledge, this is the first large-sample, multicentre study to incorporate pre-, early-stage, and posttreatment ultrasonographic imaging features. The superior performance of SUAS over the clinical model, human interpretation, and conventional single-time-point prediction models indicates its potential to facilitate individualized, noninvasive clinical decision-making before surgery in patients with breast cancer.
The most important finding of the present study was the value of serial (rather than single-time-point) assessment in predicting pCR in breast cancer. Breast cancer is a highly heterogeneous group of neoplasms that evolve continuously over space and time. In particular, the dynamic response to NAC may contain a large amount of information potentially associated with the pathological outcome. How to track these changes throughout NAC, and whether dynamic imaging profiling improves prediction performance, have therefore become major questions. In the present study, AUCs tended to increase as imaging signatures from successive time points throughout NAC were added to the model, particularly in Model 3, which included the posttreatment signature. The relative variable contribution analysis likewise showed that the posttreatment signature accounted for 46.16% of the predictive power of SUAS, followed by the pretreatment signature (25.21%), the early-stage treatment signature (16.22%), and clinical factors (12.4%). Two recently published studies also developed deep learning radiomic models from serial ultrasonographic data to predict response to NAC in patients with breast cancer [12, 17]. However, the study by Gu et al. focused mainly on early adjustment of the NAC treatment strategy, so ultrasonographic data were obtained before treatment and after the second and fourth courses, whereas the model of Jiang et al. focused only on preoperative prediction of pCR from pre- and posttreatment ultrasonographic data to guide surgical options. By integrating pre-, early-stage, and posttreatment information, our work achieved accurate prediction of pCR both at the early stage of NAC (AUCs of 0.874 and 0.897 in the two external test cohorts) and after NAC (AUCs of 0.927 and 0.914 in the two external test cohorts).
In the present study, the early prediction model (Model 2) correctly identified 83.9% (468 of 558) of NpCR patients, who may benefit from adjustment of the treatment regimen, and the preoperative model (Model 3) correctly identified 84.8% (206 of 243) of pCR patients, who may benefit from breast-conserving surgery and omission of axillary lymph node dissection. Both internal and external validation cohorts were included, ensuring a more robust assessment of model performance. In addition, we did not use deep learning for model construction, in order to avoid the so-called black-box problem, which makes our results more explicable. We also reviewed MRI-based radiomics studies predicting treatment response in breast cancer; most were based on MRI features from a single time point [7, 32, 34, 35], because it is difficult for patients to undergo repeated MRI examinations at short intervals. Given the accessibility and operational simplicity of ultrasonography, serial ultrasonographic assessment for predicting pCR in breast cancer merits consideration in clinical practice.
Another finding of this research was that the deep learning segmentation model enabled automated tumour segmentation comparable to manual segmentation, with satisfactory consistency and agreement, which can substantially reduce annotation time when applying SUAS in clinical settings. Manual tumour segmentation is laborious and subject to intra- and interobserver variability, especially for the large volume of data obtained at multiple time points throughout NAC. To make SUAS practical, quantitative analysis should be as user-friendly as possible; we therefore developed the automated segmentation method with a deep learning network (U-Net). Our findings suggest that SUAS has the potential to become an automated tool for pCR assessment before surgery.
In SUAS, we identified 12, 11, and 9 radiomics features from pre-, early-stage, and post-NAC ultrasound images, respectively, to discriminate between pCR and NpCR status. This suggests that the imaging biomarkers of the tumour change during NAC. However, the biological interpretation of these features remains an area of active investigation. The entropy-related features common to the serial images measured the complexity of grey-level intensity (Entropy, Run Entropy) and the heterogeneity of texture patterns (Zone Entropy), possibly reflecting cell proliferation and tissue hypoxia, which have been shown to be associated with response to neoadjuvant therapy. Among the 32 features, 15 (such as Large Dependence Low Grey Level Emphasis) were related to image grey level, evaluating overall and clustered low or high grey-level intensity values. Changes in greyscale may reflect fibrotic and aggressive tumour growth and are associated with poor treatment outcomes [38, 39]. Other features (such as Cluster Shade and Skewness) measure grey-level intensity and texture uniformity, reflecting intratumoural heterogeneity and subtle variation in tissue morphology within the tumour. Overall, the radiomic features evaluated in SUAS capture tumour heterogeneity at regional and local levels, which, depending on the type of feature matrix, could be linked with proliferation, angiogenesis, and necrosis.
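To make two of the named features concrete: Skewness is a first-order feature (third standardized moment of ROI intensities), while Cluster Shade is computed from a grey-level co-occurrence matrix (GLCM) P as Σ (i + j − μx − μy)³ P(i, j), where μx and μy are the marginal means. This sketch builds a symmetric, normalized GLCM for a single pixel offset; Pyradiomics by default aggregates over several directions, so values will not match it exactly. Function names are illustrative.

```python
import numpy as np


def glcm(levels, n_levels: int, offset=(0, 1)) -> np.ndarray:
    """Symmetric, normalized GLCM for a 2D array of integer grey levels in [0, n_levels)."""
    levels = np.asarray(levels, dtype=int)
    dr, dc = offset
    rows, cols = levels.shape
    P = np.zeros((n_levels, n_levels))
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                P[levels[r, c], levels[r2, c2]] += 1
    P = P + P.T                              # count each pair in both directions
    total = P.sum()
    return P / total if total > 0 else P


def cluster_shade(P: np.ndarray) -> float:
    i, j = np.indices(P.shape)
    mu_x, mu_y = (i * P).sum(), (j * P).sum()
    return float((((i + j - mu_x - mu_y) ** 3) * P).sum())


def skewness(values) -> float:
    """Third standardized moment; 0 for a flat region."""
    x = np.asarray(values, dtype=float).ravel()
    d = x - x.mean()
    m2, m3 = (d ** 2).mean(), (d ** 3).mean()
    return float(m3 / m2 ** 1.5) if m2 > 0 else 0.0


checker = np.array([[0, 1], [1, 0]])
P_demo = glcm(checker, n_levels=2)
print(cluster_shade(P_demo), round(skewness([0, 0, 0, 10]), 4))
```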
Currently, clinician assessment of pCR by interpreting ultrasound images is limited by insufficient accuracy, probably because tumour response is reflected more by changes in pathological composition and the microenvironment, such as necrosis and fibrosis, than by changes in size that can be readily perceived by eye. We compared the performance of SUAS with that of clinicians in all datasets and found SUAS to be far superior to the human experts. In contrast, the predictive performance of clinicopathological factors was unfavourable in the external test cohorts, and these factors contributed little to the predictive power of SUAS, probably because of the inconsistent distribution of molecular subtypes across the four cohorts.
Because different molecular subtypes of breast cancer may respond differently to NAC, we also performed subgroup analyses to determine the performance of SUAS in specific subtypes. First, SUAS accurately predicted the pathological outcome in the HER2(−)/HR(+) subgroup, with the highest performance among the three subtypes (AUC = 0.954 in the external validation cohort), even without the posttreatment signature (AUC = 0.916 in the external validation cohort). The HER2(−)/HR(+) subtype is conventionally considered insensitive to NAC, with a pCR rate < 10%. For patients with this subtype who have large tumours but still wish to conserve the breast, Model 2 of SUAS can help identify early those candidates who can truly benefit from NAC, sparing others the unnecessary toxicity and cost of chemotherapy. Second, patients in the HER2(+) and triple-negative subgroups are well known for their high probability of response to NAC. However, the predictive performance of Model 2 was relatively unsatisfactory in the triple-negative subgroup of the GPPH external test cohort (AUC = 0.798), probably because the training cohort had the lowest proportion of triple-negative breast cancer (TNBC) patients among the cohorts (only 7 patients, 9.9%). After the posttreatment ultrasonographic signature was integrated into the prediction model, the AUC increased to 0.802, consistent with the value of serial ultrasonographic assessment throughout NAC. Of note, the ultrasonography-based prediction model in our study outperformed the multiparametric MRI-based prediction model developed by Liu et al. in most subgroup analyses; that model achieved AUCs of 0.78–0.87 across its three external cohorts for the HER2(−)/HR(+) subgroup, 0.58–0.79 for the HER2(+) subgroup, and 0.79–0.84 for the triple-negative subgroup.
This was unexpected, because MRI has been considered the method of choice, providing the tumour size measurement that correlates best with pathological results. A possible explanation is that the multiphase biological and pathophysiological changes during NAC captured by serial ultrasonography contributed considerably to outcome prediction, outweighing the information obtained from single-time-point pretreatment MRI.
The present study had several limitations. First, the distribution of most patient characteristics and NAC regimens was not balanced among the four cohorts, which may have influenced the validation of SUAS. However, the AUCs of Models 1–3 were similar in the training cohort and the other three cohorts, suggesting that SUAS is generally applicable across various clinical situations. Second, for the early stage of treatment, only ultrasound images obtained during the first–second cycle of NAC were used for model construction, to simplify data acquisition and improve the feasibility of clinical evaluation; the relative variable contribution analysis showed that the pretreatment and posttreatment signatures contributed more to the prediction performance of SUAS. Third, SUAS is intended as a preliminary tool for patient stratification and should be applied with caution given the retrospective design and inherent selection bias. Given the complexity of patients’ clinical situations, surgical strategy must be based on a comprehensive assessment by a multidisciplinary team.
In conclusion, this proof-of-concept study developed a feasible model (the SUAS) that uses pre-, early-stage, and post-NAC ultrasonographic features derived from automated segmentation to predict pCR, thereby identifying patients with breast cancer who could benefit from optimized therapeutic management after NAC.
Availability of data and materials
All data needed to evaluate the conclusions in the paper are present in the paper and/or Additional file 1. Additional data related to this paper (including deidentified participant data with the data dictionary, original ultrasonographic images, study protocol, statistical analysis plan, etc.) will be made available to the scientific community upon publication on reasonable request to the corresponding authors. A signed data use agreement and institutional review board approval will be required before research data are released.
Abbreviations
SUAS: Serial ultrasonography assessment system
pCR: Pathological complete response
AUC: Area under the receiver operating characteristic curve
HER2: Human epidermal growth factor receptor 2
ROI: Region of interest
NRI: Net reclassification improvement
IDI: Integrated discrimination improvement
PPV: Positive predictive value
NPV: Negative predictive value
ICC: Intraclass correlation coefficient
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
Cardoso F, Kyriakides S, Ohno S, Penault-Llorca F, Poortmans P, Rubio IT, Zackrisson S, Senkus E, ESMO Guidelines Committee. Early breast cancer: ESMO Clinical Practice Guidelines for diagnosis, treatment and follow-up. Ann Oncol. 2019;30(10):1674.
Earl H, Provenzano E, Abraham J, Dunn J, Vallier AL, Gounaris I, Hiller L. Neoadjuvant trials in early breast cancer: pathological response at surgery and correlation to longer term outcomes—what does it all mean? BMC Med. 2015;13:234.
Sparano JA, Gray RJ, Makower DF, Pritchard KI, Albain KS, Hayes DF, Geyer CE Jr, Dees EC, Perez EA, Olson JA Jr, et al. Prospective validation of a 21-gene expression assay in breast cancer. N Engl J Med. 2015;373(21):2005–14.
Loibl S, Gianni L. HER2-positive breast cancer. Lancet. 2017;389(10087):2415–29.
Coudert B, Pierga JY, Mouret-Reynier MA, Kerrou K, Ferrero JM, Petit T, Kerbrat P, Dupre PF, Bachelot T, Gabelle P, et al. Use of [(18)F]-FDG PET to predict response to neoadjuvant trastuzumab and docetaxel in patients with HER2-positive breast cancer, and addition of bevacizumab to neoadjuvant trastuzumab and docetaxel in [(18)F]-FDG PET-predicted non-responders (AVATAXHER): an open-label, randomised phase 2 trial. Lancet Oncol. 2014;15(13):1493–502.
Liu Z, Li Z, Qu J, Zhang R, Zhou X, Li L, Sun K, Tang Z, Jiang H, Li H, et al. Radiomics of multiparametric MRI for pretreatment prediction of pathologic complete response to neoadjuvant chemotherapy in breast cancer: a multicenter study. Clin Cancer Res. 2019;25(12):3538–47.
Eun NL, Kang D, Son EJ, Park JS, Youk JH, Kim JA, Gweon HM. Texture analysis with 3.0-T MRI for Association of response to neoadjuvant chemotherapy in breast cancer. Radiology. 2020;294(1):31–41.
Kim SY, Cho N, Choi Y, Lee SH, Ha SM, Kim ES, Chang JM, Moon WK. Factors affecting pathologic complete response following neoadjuvant chemotherapy in breast cancer: development and validation of a predictive nomogram. Radiology. 2021;299(2):290–300.
CA A. Experts consensus of breast cancer neoadjuvant therapy in China (version 2019). China Oncol. 2019;29(5):390–400.
Baumgartner A, Tausch C, Hosch S, Papassotiropoulos B, Varga Z, Rageth C, Baege A. Ultrasound-based prediction of pathologic response to neoadjuvant chemotherapy in breast cancer patients. Breast. 2018;39:19–23.
Jiang M, Li CL, Luo XM, Chuan ZR, Lv WZ, Li X, Cui XW, Dietrich CF. Ultrasound-based deep learning radiomics in the assessment of pathological complete response to neoadjuvant chemotherapy in locally advanced breast cancer. Eur J Cancer. 2021;147:95–105.
Moustafa AF, Cary TW, Sultan LR, Schultz SM, Conant EF, Venkatesh SS, Sehgal CM. Color Doppler ultrasound improves machine learning diagnosis of breast cancer. Diagnostics. 2020;10(9):631.
Gao Y, Luo Y, Zhao C, Xiao M, Ma L, Li W, Qin J, Zhu Q, Jiang Y. Nomogram based on radiomics analysis of primary breast cancer ultrasound images: prediction of axillary lymph node tumor burden in patients. Eur Radiol. 2021;31(2):928–37.
Fleury EFC, Marcomini K. Impact of radiomics on the breast ultrasound radiologist’s clinical practice: from lumpologist to data wrangler. Eur J Radiol. 2020;131:109197.
DiCenzo D, Quiaoit K, Fatima K, Bhardwaj D, Sannachi L, Gangeh M, Sadeghi-Naini A, Dasgupta A, Kolios MC, Trudeau M, et al. Quantitative ultrasound radiomics in predicting response to neoadjuvant chemotherapy in patients with locally advanced breast cancer: results from multi-institutional study. Cancer Med. 2020;9(16):5798–806.
Gu J, Tong T, He C, Xu M, Yang X, Tian J, Jiang T, Wang K. Deep learning radiomics of ultrasonography can predict response to neoadjuvant chemotherapy in breast cancer at an early stage of treatment: a prospective study. Eur Radiol. 2021. https://doi.org/10.1007/s00330-021-08293-y.
Adrada BE, Candelaria R, Moulder S, Thompson A, Wei P, Whitman GJ, Valero V, Litton JK, Santiago L, Scoggins ME, et al. Early ultrasound evaluation identifies excellent responders to neoadjuvant systemic therapy among patients with triple-negative breast cancer. Cancer. 2021;127(16):2880–7.
Rix A, Piepenbrock M, Flege B, von Stillfried S, Koczera P, Opacic T, Simons N, Boor P, Thoröe-Boveleth S, Deckers R, et al. Effects of contrast-enhanced ultrasound treatment on neoadjuvant chemotherapy in breast cancer. Theranostics. 2021;11(19):9557–70.
Natrajan R, Sailem H, Mardakheh FK, Arias Garcia M, Tape CJ, Dowsett M, Bakal C, Yuan Y. Microenvironmental heterogeneity parallels breast cancer progression: A histology-genomic integration analysis. PLoS Med. 2016;13(2):e1001961.
Failmezger H, Muralidhar S, Rullan A, de Andrea CE, Sahai E, Yuan Y. Topological tumor graphs: A graph-based spatial model to infer stromal recruitment for immunosuppression in melanoma histology. Cancer Res. 2020;80(5):1199.
Bhardwaj D, Dasgupta A, DiCenzo D, Brade S, Fatima K, Quiaoit K, Trudeau M, Gandhi S, Eisen A, Wright F, et al. Early changes in quantitative ultrasound imaging parameters during neoadjuvant chemotherapy to predict recurrence in patients with locally advanced breast cancer. Cancers. 2022;14(5):1247.
Fujii T, Kogawa T, Dong W, Sahin AA, Moulder S, Litton JK, Tripathy D, Iwamoto T, Hunt KK, Pusztai L, et al. Revisiting the definition of estrogen receptor positivity in HER2-negative primary breast cancer. Ann Oncol. 2017;28(10):2420–8.
Wolff AC, Hammond MEH, Allison KH, Harvey BE, Mangu PB, Bartlett JMS, Bilous M, Ellis IO, Fitzgibbons P, Hanna W, et al. Human epidermal growth factor receptor 2 testing in breast cancer: American Society of Clinical Oncology/College of American Pathologists Clinical Practice Guideline Focused Update. J Clin Oncol. 2018;36(20):2105–22.
Allison KH, Hammond MEH, Dowsett M, McKernin SE, Carey LA, Fitzgibbons PL, Hayes DF, Lakhani SR, Chavez-MacGregor M, Perlmutter J, et al. Estrogen and progesterone receptor testing in breast cancer: ASCO/CAP guideline update. J Clin Oncol. 2020;38(12):1346–66.
Gradishar WJ, Anderson BO, Balassanian R, Blair SL, Burstein HJ, Cyr A, Elias AD, Farrar WB, Forero A, Giordano SH, et al. Breast cancer, Version 4.2017, NCCN clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2018;16(3):310–20.
Giuliano AE, Connolly JL, Edge SB, Mittendorf EA, Rugo HS, Solin LJ, Weaver DL, Winchester DJ, Hortobagyi GN. Breast cancer-major changes in the American Joint Committee on Cancer eighth edition cancer staging manual. CA Cancer J Clin. 2017;67(4):290–303.
Cserni G, Chmielik E, Cserni B, Tot T. The new TNM-based staging of breast cancer. Virchows Arch. 2018;472(5):697–703.
Eisenhauer EA, Therasse P, Bogaerts J, Schwartz LH, Sargent D, Ford R, Dancey J, Arbuck S, Gwyther S, Mooney M, et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer. 2009;45(2):228–47.
Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015). Springer; 2015. p. 234–41.
Choi J, Laws A, Hu J, Barry W, Golshan M, King T. Margins in breast-conserving surgery after neoadjuvant therapy. Ann Surg Oncol. 2018;25(12):3541–7.
Xiong Q, Zhou X, Liu Z, Lei C, Yang C, Yang M, Zhang L, Zhu T, Zhuang X, Liang C, et al. Multiparametric MRI-based radiomics analysis for prediction of breast cancers insensitive to neoadjuvant chemotherapy. Clin Transl Oncol. 2020;22(1):50–9.
Gerlinger M, Rowan AJ, Horswell S, Math M, Larkin J, Endesfelder D, Gronroos E, Martinez P, Matthews N, Stewart A, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366(10):883–92.
Braman N, Prasanna P, Whitney J, Singh S, Beig N, Etesami M, Bates DDB, Gallagher K, Bloch BN, Vulchi M, et al. Association of peritumoral radiomics with tumor biology and pathologic response to preoperative targeted therapy for HER2 (ERBB2)-positive breast cancer. JAMA Netw Open. 2019;2(4):e192561.
Cain EH, Saha A, Harowicz MR, Marks JR, Marcom PK, Mazurowski MA. Multivariate machine learning models for prediction of pathologic response to neoadjuvant therapy in breast cancer using MRI features: a study using an independent validation set. Breast Cancer Res Treat. 2019;173(2):455–63.
Incoronato M, Aiello M, Infante T, Cavaliere C, Grimaldi AM, Mirabelli P, Monti S, Salvatore M. Radiogenomic analysis of oncological data: a technical survey. Int J Mol Sci. 2017;18(4):805.
Henderson S, Purdie C, Michie C, Evans A, Lerski R, Johnston M, Vinnicombe S, Thompson AM. Interim heterogeneity changes measured using entropy texture features on T2-weighted MRI at 3.0 T are associated with pathological response to neoadjuvant chemotherapy in primary breast cancer. Eur Radiol. 2017;27(11):4602–11.
Braman NM, Etesami M, Prasanna P, Dubchuk C, Gilmore H, Tiwari P, Plecha D, Madabhushi A. Intratumoral and peritumoral radiomics for the pretreatment prediction of pathological complete response to neoadjuvant chemotherapy based on breast DCE-MRI. Breast Cancer Res. 2017;19(1):57.
Wu J, Gong G, Cui Y, Li R. Intratumor partitioning and texture analysis of dynamic contrast-enhanced (DCE)-MRI identifies relevant tumor subregions to predict pathological response of breast cancer to neoadjuvant chemotherapy. J Magn Reson Imaging. 2016;44(5):1107–15.
Dialani V, Chadashvili T, Slanetz PJ. Role of imaging in neoadjuvant therapy for breast cancer. Ann Surg Oncol. 2015;22(5):1416–24.
This work was funded by the Key-Area Research and Development Program of Guangdong Province (No. 2021B0101420006); National Natural Science Foundation of China (No. 82071892, 82271941, 82272088, 82171920); Guangdong Provincial Key Laboratory of Artificial Intelligence in Medical Image Analysis and Application (No. 2022B1212010011); the National Science Foundation for Young Scientists of China (Nos. 82102019, 82001986); Project Funded by China Postdoctoral Science Foundation (No. 2020M682643, 2021M700897); High-level Hospital Construction Project (DFJHBF202105).
Ethics approval and consent to participate
The ethics committee of each participating hospital approved this multicentre retrospective study and waived the requirement for informed consent because of the study’s retrospective nature.
Competing interests
The authors declare that they have no competing interests.
Additional file 1: SI. Inclusion and exclusion criteria. SII. Annotation of ultrasound images. SIII. Details of the automated segmentation model for breast ultrasound images. SIV. Data standardization and feature extraction. SV. Feature selection and model construction. SVI. Consistency evaluation of features by Bland-Altman analysis. Fig. S1. Inclusion and exclusion criteria. Fig. S2. U-net architecture. Fig. S3. Flowchart of feature assessment and SUAS construction. Fig. S4. Principal component analysis plot and linear model for evaluating batch effect correction across the three institutions. Fig. S5. Relative variable contribution in Model 2 and Model 3. Fig. S6. Clinical usefulness evaluation of Model 2 and Model 3. Fig. S7. Changing trend of entropy-related features in the three institutions. Table S1. Ultrasonographic image acquisition parameters at the participating centres. Table S2. Similarity and consistency evaluation between automated and manual segmentation. Table S3. Key features selected for construction of the phase-specific signatures based on ultrasound images at each single time point. Table S4. Multivariate analysis of the clinicopathological model, Model 2 (the early-treatment model) and Model 3 (the posttreatment model). Table S5. Prediction performance of the signatures. Table S6. AUCs of the early-treatment model (Model 2) and the posttreatment model (Model 3) in breast cancer subtypes. Table S7. NRI test for prediction improvements of the SUAS compared with single-time-point signatures in multiple cohorts. Table S8. IDI test for prediction improvements of the SUAS compared with single-time-point signatures in multiple cohorts. Table S9. Prediction performance of the signatures and SUAS in randomly split datasets.
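Tables S7 and S8 report NRI and IDI tests of the improvement that the SUAS provides over single-time-point signatures. The variant of NRI used is not stated here, so the sketch below assumes the category-free (continuous) NRI and the standard IDI, both computed from paired predicted probabilities of an "old" model (for example, one single-time-point signature) and a "new" model (for example, the SUAS) on the same patients. The function names and toy numbers are hypothetical, and the significance testing behind the reported tables is not reproduced.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (old-model probability, new-model probability)


def _split_by_outcome(p_old: Sequence[float], p_new: Sequence[float],
                      y: Sequence[int]) -> Tuple[List[Pair], List[Pair]]:
    """Separate paired predictions into pCR (y == 1) and non-pCR (y == 0) cases."""
    if not (len(p_old) == len(p_new) == len(y)):
        raise ValueError("inputs must have equal length")
    events = [(o, n) for o, n, t in zip(p_old, p_new, y) if t == 1]
    nonevents = [(o, n) for o, n, t in zip(p_old, p_new, y) if t == 0]
    if not events or not nonevents:
        raise ValueError("need at least one pCR and one non-pCR case")
    return events, nonevents


def continuous_nri(p_old, p_new, y) -> float:
    """Category-free NRI: (P(up|pCR) - P(down|pCR)) + (P(down|non-pCR) - P(up|non-pCR))."""
    events, nonevents = _split_by_outcome(p_old, p_new, y)
    up_ev = sum(n > o for o, n in events) / len(events)
    down_ev = sum(n < o for o, n in events) / len(events)
    up_ne = sum(n > o for o, n in nonevents) / len(nonevents)
    down_ne = sum(n < o for o, n in nonevents) / len(nonevents)
    return (up_ev - down_ev) + (down_ne - up_ne)


def idi(p_old, p_new, y) -> float:
    """IDI: mean probability gain among pCR cases minus mean gain among non-pCR cases."""
    events, nonevents = _split_by_outcome(p_old, p_new, y)
    gain_ev = sum(n - o for o, n in events) / len(events)
    gain_ne = sum(n - o for o, n in nonevents) / len(nonevents)
    return gain_ev - gain_ne


if __name__ == "__main__":
    # Hypothetical single-time-point signature (old) vs. integrated model (new).
    p_single = [0.6, 0.4, 0.3, 0.5]
    p_integrated = [0.8, 0.5, 0.2, 0.5]
    pcr = [1, 1, 0, 0]
    print(continuous_nri(p_single, p_integrated, pcr))  # 1.5
    print(round(idi(p_single, p_integrated, pcr), 3))   # 0.2
```

A positive NRI means the new model moves pCR cases up and non-pCR cases down more often than the reverse; a positive IDI means it widens the gap in mean predicted probability between the two outcome groups.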
About this article
Cite this article
Wu, L., Ye, W., Liu, Y. et al. An integrated deep learning model for the prediction of pathological complete response to neoadjuvant chemotherapy with serial ultrasonography in breast cancer patients: a multicentre, retrospective study. Breast Cancer Res 24, 81 (2022). https://doi.org/10.1186/s13058-022-01580-6
- Deep learning
- Breast cancer
- Neoadjuvant chemotherapy
- Serial ultrasonography