Magnetic resonance imaging and ultrasound for prediction of residual tumor size in early breast cancer within the ADAPT subtrials

Background Prediction of histological tumor size by post-neoadjuvant therapy (NAT) ultrasound and magnetic resonance imaging (MRI) was evaluated in different breast cancer subtypes. Methods Imaging was performed after 12-week NAT in patients enrolled into three neoadjuvant WSG ADAPT subtrials. Imaging performance was analyzed for prediction of residual tumor measuring ≤10 mm and summarized using positive (PPV) and negative (NPV) predictive values. Results A total of 248 and 588 patients had MRI and ultrasound, respectively. Tumor size was over- or underestimated by < 10 mm in 4.4% and 21.8% of patients by MRI and in 10.2% and 15.8% by ultrasound. Overall, NPV (proportion of correctly predicted tumor size ≤10 mm) of MRI and ultrasound was 0.92 and 0.83; PPV (correctly predicted tumor size > 10 mm) was 0.52 and 0.61. MRI demonstrated a higher NPV and lower PPV than ultrasound in hormone receptor (HR)-positive/human epidermal growth factor receptor 2 (HER2)-positive and in HR−/HER2+ tumors. Both methods had a comparable NPV and PPV in HR−/HER2− tumors. Conclusions In HR+/HER2+ and HR−/HER2+ breast cancer, MRI is less likely than ultrasound to underestimate tumor size, while ultrasound is associated with a lower risk of overestimating it. These findings may help to select the optimal imaging approach for planning surgery after NAT. Trial registration Clinicaltrials.gov, NCT01815242 (registered on March 21, 2013), NCT01817452 (registered on March 25, 2013), and NCT01779206 (registered on January 30, 2013). Supplementary Information The online version contains supplementary material available at 10.1186/s13058-021-01413-y.


Background
Neoadjuvant therapy (NAT) allows monitoring of tumor response to treatment, provides important prognostic information, and may permit breast-conserving surgery by downstaging cancer [1]. Efficacy of NAT can be measured using pathological complete response (pCR), most commonly defined as the absence of invasive cancer and in situ cancer in the breast and axillary nodes (ypT0 ypN0); absence of invasive cancer in the breast and axillary nodes, irrespective of ductal carcinoma in situ (ypT0/is ypN0); and absence of invasive cancer in the breast irrespective of ductal carcinoma in situ or nodal involvement (ypT0/is). A meta-analysis by Cortazar et al. demonstrated that pCR defined as either ypT0/is ypN0 or ypT0 ypN0 was associated with improved overall survival and event-free survival compared to ypT0/is without information on the nodal status. Nowadays, ultrasound (US) and magnetic resonance imaging (MRI) are commonly used to monitor tumor response to NAT, and several studies have investigated their application for the prediction of pCR and measurement of residual tumor. Assessment of tumor size after NAT helps to determine the best surgical approach, or alternatively, it may provide evidence to continue the same systemic therapy or switch to another regimen [2,3]. However, there is conflicting evidence regarding the accuracy of these two methods for the evaluation of residual tumors, with some studies indicating the superiority of MRI [4,5] while other reports suggest comparable accuracy of MRI and US [6][7][8]. The accuracy of the employed method in estimating tumor size has a profound impact on the success of surgery in terms of long-term outcomes and good cosmetic results. Only the most precise evaluation of the lesion allows complete tumor resection while keeping the removal of surrounding tissue to a minimum. 
Underestimation of lesion size carries the risk of resection with tumor-positive margins thus potentially worsening patient prognosis and requiring repeat surgery. Overestimation, however, may increase the likelihood of mastectomy in cases where breast-conserving surgery would have normally been recommended [9]. Therefore, the accuracy of the imaging method is of profound importance for disease prognosis and for patients' quality of life. Thus far, only a few studies have compared the post-NAT assessment of tumor size by MRI and US with histological measurements as the gold standard.
The present imaging subproject was performed within the framework of the Adjuvant Dynamic Marker-Adjusted Personalized Therapy Trial Optimizing Risk Assessment and Therapy Response Prediction in Early Breast Cancer (ADAPT) umbrella trial conducted by the West German Study Group (WSG). The aim of the ADAPT study was to identify early markers for therapy response to individualize NAT by avoiding over- and under-treatment. The primary objective of this analysis was to compare the value of MRI versus US performed at the end of neoadjuvant therapy (EoT) for prediction of pCR and residual disease and their accuracy in predicting histological tumor size in hormone receptor-positive, human epidermal growth factor receptor-2 positive (HR+/HER2+), HR−/HER2−, and HR−/HER2+ tumors from the respective ADAPT substudies.

Study design
Details on the design of WSG-ADAPT, a prospective, controlled, randomized, non-blinded, multi-center, and investigator-initiated umbrella clinical trial, and the results of three substudies, ADAPT triple-negative (TN, NCT01815242), ADAPT HER2+/HR+ (NCT01817452), and ADAPT HER2+/HR− substudies (NCT01779206), have been previously reported [10][11][12][13]. With regard to this analysis, three breast cancer subtypes were investigated: (i) HR+/HER2+ tumors after NAT including trastuzumab emtansine monotherapy (T-DM1), T-DM1+endocrine therapy (ET), or trastuzumab+ET; (ii) HR−/HER2− tumors after neoadjuvant nab-paclitaxel+gemcitabine or nab-paclitaxel+carboplatin; and (iii) HR−/HER2+ tumors after neoadjuvant trastuzumab+pertuzumab treatment with or without paclitaxel. Enrolled patients were examined with US (mandatory) and MRI (optional) before systemic therapy, at 3 and 6 weeks after the start of NAT and at EoT. Here, we investigated only those patients with MRI and/or US performed at EoT. After neoadjuvant therapy, surgery within 3 weeks or histologic confirmation of non-pCR by core needle biopsy was obligatory. In case of a histologically confirmed residual invasive tumor by core needle biopsy, patients received standard NAT according to the national guidelines and underwent surgery afterwards. Clinically node-positive patients underwent axillary dissection after completion of NAT. Sentinel node biopsy in clinically node-negative patients was performed either before or after NAT, at the investigator's discretion. Adjuvant therapy was administered according to the national guidelines.

Eligibility criteria for enrollment of the patients
Women aged ≥18 years, with histologically confirmed unilateral, primary invasive BC, and HR/HER2 status centrally confirmed at Institute of Pathology, University of Hannover, Germany, were eligible to participate in the study. HR-positive (≥1% of tumor nuclei staining positive for estrogen receptor and/or progesterone receptor) and HER2-positive status (immunohistochemistry (IHC) 3+ positive or in situ hybridization (ISH) positive) was required for participation in the ADAPT HER2+/HR+ substudy, HR-negative (< 1% of tumor nuclei staining positive for estrogen receptor and progesterone receptor) and HER2-positive status was required for the ADAPT HER2+/HR− substudy, and HR-negative and HER2-negative (IHC 1+ negative or IHC 0 negative, or ISH negative) status was required for participation in the ADAPT TN substudy. Eastern Cooperative Oncology Group Performance Status ≤1 or Karnofsky Performance Status ≥80%, normal organ function, and adequate hematologic parameters were required for inclusion. Detailed inclusion and exclusion criteria for participation in the ADAPT study have been described elsewhere [10][11][12][13]. All patients provided written informed consent prior to study enrollment.

MRI technique
Breast MRI was performed at 43 sites thus providing an overview regarding radiology practice in Germany. In order to obtain images of comparable quality across all study locations, the central MRI reading site (Department of Diagnostic and Interventional Radiology, University Hospital, RWTH Aachen, Germany) provided a standardized imaging protocol which consisted of basic sequences; there was no need for special hardware or software. Prior to study participation, images provided by local reading sites were evaluated by the central reading site to ensure that the required MRI technique standards were met.
The standardized imaging protocol of 1.5-T and 3.0-T systems with a dedicated breast multichannel surface coil consisted of an axial bilateral two-dimensional multi-section gradient-echo dynamic series (repetition time 250 ms; echo time 4.6 ms (1.5 T) or 2.3 ms (3 T); flip angle 90°) with a section thickness of 3 mm and full 512 × 512 acquisition matrix. The voxel size of all scans was kept constant across all exams with a maximum of 0.8. To ensure a high spatial resolution, the field of view was adapted to the individual breast size of the patient with a minimum of 280 mm and a maximum of 350 mm, keeping the voxel size within the intended range. The dynamic sequence was performed prior to and four times after bolus injection of macrocyclic gadolinium agent, gadobutrol (Gadovist®/Gadavist®, Bayer AG, Leverkusen, Germany; 0.1 mmol/kg body weight), followed by a saline flush. Depending on the site preference, fat suppression or image subtraction was used for visualization of enhancement in the T1 gradient-echo sequence. A standard center of k-space between 60 and 90 s was used in all exams. In addition, an axial T2-weighted fast spin-echo sequence without fat suppression and with identical anatomic parameters as the T1 gradient-echo sequence was performed.

MRI interpretation
A blinded analysis of locally acquired MRI images was conducted at the central reading site by two specialized breast radiologists with 13 (SS) and 23 years (CK) of experience in interpreting breast MRI images. Images were read according to BI-RADS guidelines (5th edition, [14]). Initially, the first (early) post-contrast subtracted or fat-suppressed T1 image was read for an overview of enhancing lesions and to assess background enhancement. Afterwards, the complete unsubtracted dynamic series pre- and post-contrast at each slice was thoroughly analyzed for characteristics of any enhancing lesion and evaluation of enhancing residual disease. Then, the T2-weighted series was analyzed for any structural changes and fluid-containing lesions/edema. Any lesions in the dynamic contrast-enhanced images were correlated with the T2-weighted series for morphology and signal intensity. Final evaluation for complete imaging response was based on the whole dynamic series, taking into account enhancement characteristics and morphologic criteria on pre- and post-contrast T1 images in the early and late phases, as well as signal intensity on T2. Lesion size was measured in the longest diameter on the unsubtracted images. In patients with no visible tumor after NAT, anatomic landmarks in non-subtracted T1- and T2-weighted images were analyzed to identify the site of the lesion.

US imaging
Before the first NAT cycle, patients underwent a systematic sonographic examination of both breasts and axillae by experienced gynecologists using breast US systems with at least 7.5 MHz and an electronic linear US probe. If possible, the tumor was measured in one to three diameters, and measurements were registered in the electronic case report forms. US was then repeated after one and two cycles of NAT and at EoT. All US images were read and interpreted at the local study site by experienced gynecologists. The lesions were described, and tumor size measured. The tumor was marked with a clip before the first cycle of NAT to be able to reliably identify the tumor region at the subsequent examinations.

Histological assessment
Post-NAT surgical specimens underwent local histopathological assessment, and the longest diameter of the residual tumor was documented. No histopathological evidence of residual invasive tumor cells, either in the breast or the axillary lymph nodes (ypT0/is ypN0), was denoted as pCR.

Statistical analyses
Accuracy of post-NAT US and MRI was analyzed for prediction of residual tumor size of ≤10 mm on histology (the gold standard). The concordance between imaging and histological assessment was summarized by computing Spearman's correlation coefficient, graphically displaying the difference in tumor diameter (mm) between imaging and histological assessment, as well as comparing the positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity in each tumor subtype. PPV was defined as the proportion of correctly predicted tumors measuring > 10 mm on post-NAT imaging compared to the final pathology specimen. NPV was defined as the proportion of correctly predicted tumors measuring ≤10 mm by imaging. In addition, we analyzed the value of US and MRI for the prediction of pCR by calculating PPV (probability that non-pCR was documented when no complete response was observed), NPV (probability that pCR was actually achieved following complete response on imaging), sensitivity, and specificity. Receiver operating characteristic (ROC) curves and ROC area under the curve (AUC) predicting a residual tumor from imaging were computed. Furthermore, patient characteristics at baseline were compared between the cohorts with tumor size overestimation or underestimation by imaging (difference > 10 mm for both) or with a ≤ 10 mm difference in tumor size between imaging and histological assessments.
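The predictive values defined above can be illustrated with a minimal sketch. It assumes paired imaging and histology diameters in millimetres and treats a measurement of > 10 mm as "positive"; the function and class names (`predictive_values`, `Performance`) and the example numbers are hypothetical and are not taken from the study data or from the SAS/Stata code used by the authors.

```python
from dataclasses import dataclass

THRESHOLD_MM = 10.0  # cut-off from the text: > 10 mm counts as "positive"


@dataclass
class Performance:
    ppv: float
    npv: float
    sensitivity: float
    specificity: float


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def predictive_values(imaging_mm, histology_mm, threshold=THRESHOLD_MM) -> Performance:
    """Cross-tabulate imaging vs histology at the threshold (hypothetical helper)."""
    if len(imaging_mm) != len(histology_mm):
        raise ValueError("imaging and histology lists must have equal length")
    tp = fp = tn = fn = 0
    for img, hist in zip(imaging_mm, histology_mm):
        img_pos = img > threshold    # imaging predicts residual tumor > 10 mm
        hist_pos = hist > threshold  # histology (reference) shows > 10 mm
        if img_pos and hist_pos:
            tp += 1
        elif img_pos:
            fp += 1  # imaging overcalls size relative to the threshold
        elif hist_pos:
            fn += 1  # imaging undercalls size relative to the threshold
        else:
            tn += 1
    return Performance(
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
    )


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    imaging = [15, 12, 8, 5, 20, 0, 9, 30, 4]
    histology = [14, 6, 11, 3, 25, 0, 2, 18, 1]
    print(predictive_values(imaging, histology))
```

Under this convention, 1 − NPV is the share of tumors called ≤10 mm by imaging that were larger on histology, and 1 − PPV is the share called > 10 mm that were not.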
Available data were analyzed and compared between the three groups: patients who underwent MRI (MRI group), patients who underwent US (US group), and patients who underwent both MRI and US (MRI and US group).
Statistical data analyses were performed with SAS software (version 9.4, SAS Institute, NC) and Stata (version 16.1/SE, StataCorp LLC, TX). Graphs, including the quadratic curves, were plotted using the GraphPad Prism 8 software (GraphPad Software, CA).
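The three concordance categories described in the statistical analyses (overestimation or underestimation by > 10 mm, or a difference of ≤ 10 mm) can be sketched as follows. This is an illustrative reading of that rule, not the authors' code; the names `classify_discrepancy` and `summarize_discrepancies` and the category labels are assumptions introduced here.

```python
from collections import Counter

TOLERANCE_MM = 10.0  # differences up to and including 10 mm count as concordant


def classify_discrepancy(imaging_mm: float, histology_mm: float,
                         tolerance_mm: float = TOLERANCE_MM) -> str:
    """Label one imaging/histology pair by the signed size difference."""
    diff = imaging_mm - histology_mm
    if diff > tolerance_mm:
        return "overestimated"   # imaging larger than histology by > 10 mm
    if diff < -tolerance_mm:
        return "underestimated"  # imaging smaller than histology by > 10 mm
    return "concordant"          # absolute difference <= 10 mm


def summarize_discrepancies(pairs, tolerance_mm: float = TOLERANCE_MM) -> dict:
    """Return the proportion of pairs in each category (hypothetical helper)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (imaging, histology) pair is required")
    counts = Counter(classify_discrepancy(i, h, tolerance_mm) for i, h in pairs)
    labels = ("overestimated", "underestimated", "concordant")
    return {label: counts[label] / len(pairs) for label in labels}


if __name__ == "__main__":
    # Made-up (imaging, histology) diameters in mm, for illustration only.
    example = [(25, 10), (5, 20), (20, 10), (12, 12)]
    print(summarize_discrepancies(example))
```

A difference of exactly 10 mm is counted as concordant here, matching the "≤ 10 mm difference" wording in the text.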
In the US group, estimated tumor size differed from histology by ≤10 mm in 69.2% of patients (n = 407/588, Table 2). US overestimated and underestimated [...] Table 1 and Supplementary Figure 1).

Prediction of residual tumor size > 10 mm by MRI and US
US yielded a higher specificity than MRI overall (0.72 vs 0.61) and in HR+/HER2+ tumors (0.75 vs 0.44) while both methods were comparable in HR−/HER2− and HR−/HER2+ BC (Table 3). MRI demonstrated a higher sensitivity and NPV than US among all tumors (sensitivity 0.89 vs 0.74; NPV 0.92 vs 0.83) and in HR+/HER2+ [...] Figure 2). Both MRI and US underestimated and overestimated histological tumor size (Fig. 3, Supplementary Figure 3). There was a tendency towards underestimating tumor size in larger tumors which appeared to be more pronounced for US than for MRI.

Discussion
The assessment of tumor response to NAT is of importance when planning surgery. In this study, we investigated post-NAT MRI and US for the prediction of pCR and analyzed the accuracy of these methods in the determination of residual tumor size. We found that MRI more often correctly predicted pCR in HR−/HER2+ followed by HR+/HER2+ than in HR−/HER2− tumors. Conversely, US more frequently correctly predicted pCR in HR−/HER2− tumors than in HR+/HER2+ and HR−/HER2+ BC, thus corroborating previously published results [15]. This suggests that MRI may less reliably identify pCR in HR−/HER2− tumors and that assessment of tumor response to NAT should rather be performed by US in this BC subtype. In contrast, Gampenrieder and colleagues reported that MRI correctly predicted pCR more often in HR−/HER2− and HR−/HER2+ tumors than in HR+/HER2+ BC [16]. Moreover, Scheel et al. found no impact of BC subtype on the prediction of pCR by MRI in the ACRIN 6657/I-SPY trial [17]. In our study, residual disease was correctly predicted by US in 77% of HR+/HER2+ and 75% of HR−/HER2− tumors; however, this approach identified only 55% of the cases of non-pCR in HR−/HER2+ BC. In contrast, MRI consistently displayed a high accuracy for the prediction of residual tumor presence across all BC subtypes analyzed (81-86%) with the highest value obtained in HR−/HER2+ tumors. This indicates that MRI misses fewer cases of non-pCR than US and appears to be a method of choice particularly in HR−/HER2+ BC. Nevertheless, other studies demonstrated a variable accuracy of MRI and US for residual disease prediction. For example, both US and MRI were shown to be more accurate for non-pCR prediction in HR+ than in HR− tumors [18]. Furthermore, Gampenrieder et al. found that MRI correctly predicted non-pCR much less frequently in HR−/HER2+ than in HR+/HER2+ and HR−/HER2− tumors [16]. 
However, the number of patients with HR−/HER2+ tumors was low in that study and in our analysis, which could impact the relative differences in predictive values between this and other BC subtypes.
The evidence regarding the optimal imaging method (MRI versus US) for the prediction of residual tumor size after NAT is conflicting. Studies investigating the correlation between tumor size by imaging and by histology have reported discrepant results with some showing a higher accuracy for MRI vs US [4,8,19] and others demonstrating a similar performance of both methods [6,18,20]. In our study, the correlation between imaging measurement and final pathology size was similar in the analysis of all tumors. However, we found that US measurements correlated with residual tumor size more closely than MRI in HR+/HER2+ tumors, whereas better correlation coefficients were obtained with MRI than with US in HR−/HER2+ and particularly in HR−/HER2− BC. Our results thus corroborate available evidence suggesting that the correlation between MRI-measured and histological tumor size is highest in HR−/HER2− tumors [21,22]. However, in a study by Scheel et al., the correlation between tumor size estimated by MRI and final pathology measurement was not affected by BC subtype [17]. Previously, MRI was shown to overestimate and US to underestimate residual tumor size [6] while in other analyses, both methods demonstrated a similar degree of overestimation [23]. According to the NPV values obtained in our study, MRI more often than US correctly predicted the presence of residual tumors measuring 0-10 mm in HR+/HER2+ and HR−/HER2+ tumors. Conversely, US was superior to MRI in terms of correctly estimating the tumor size in lesions measuring > 10 mm in these BC subtypes (as demonstrated by PPV values). Therefore, our results imply that MRI confers a lower risk of underestimating while US is less likely to overestimate residual tumor size in HR+/HER2+ and HR−/HER2+ BC. Compared to our study, Vriens et al. reported slightly higher risks of underestimation and lower probability of overestimation of residual tumor size by MRI and US. 
They reported that MRI and US were less likely to underestimate lesion size in HR− than in HR+ tumors measuring 0-10 mm (with results favoring US over MRI in this BC subtype, [18]). Conversely, the size of HR+ tumors in that study was less often overestimated than in HR− BC, particularly by US. Evaluation of residual disease by MRI was previously shown to depend on tumor phenotype with a lower rate of underestimation in solid tumors positive for HER2, negative for HR, and triple-negative subtype compared to HR+/HER2− tumors that frequently present as non-mass/diffuse enhancement [24,25]. Moreover, the extent of the response to NAT was proposed to affect the rates of overestimation by MRI [24]. The presence of enhancing tissue could be misinterpreted as residual disease, particularly in HR−/HER2− tumors with fibrosis and inflammation induced during the response to neoadjuvant chemotherapy, thus leading to size overestimation. Furthermore, treatment type could impact the imaging accuracy. For example, MRI may less accurately predict pCR after taxane-based therapy due to reduction of contrast enhancement [26]. This limitation should be taken into consideration in assessing the predictive value of MRI in HR−/HER2− tumors. Moreover, given the heterogeneity of NAT regimens in our combined analysis of three substudies, the findings attributed to tumor subtypes may in fact be at least partly attributable to the therapy administered. The NPV values obtained here suggest that MRI and US may underestimate the size of tumors measuring 0-10 mm in 8% and 17% of cases, respectively. However, the PPV values indicate that the risk of tumor size overestimation by MRI and US is far greater and may affect 48% and 39% of cases, respectively. 
The highest risk of tumor size overestimation was observed in HR−/HER2+ BC in which as much as 59% and 45% of tumors may be smaller in pathologic examination than on MRI and US, respectively. Inaccurate estimation of residual tumor size has implications for the success of the surgery. On the one hand, overestimation may result in excessive resection leading to poor cosmetic outcomes or even the decision for mastectomy instead of breast-conserving surgery. On the other hand, underestimation may lead to excision with tumor-positive margins which may require additional surgery or have a negative impact on long-term outcomes.
[Figure caption fragment: HER2+ tumors (d, h). Quadratic curve was fitted using the least-squares method; shaded areas represent the 95% confidence interval for fitted curves.]
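As a worked check, the overall under- and overestimation percentages quoted in this discussion appear to follow from the reported overall predictive values (NPV 0.92 and 0.83, PPV 0.52 and 0.61 for MRI and US) as simple complements. The symbols $d_{\mathrm{img}}$ and $d_{\mathrm{hist}}$ for the imaging and histological diameters are notation introduced here for illustration.

```latex
\begin{align*}
P\left(d_{\mathrm{hist}} > 10\,\mathrm{mm} \mid d_{\mathrm{img}} \le 10\,\mathrm{mm}\right)
  &= 1 - \mathrm{NPV}
  &&\Rightarrow\quad \text{MRI: } 1 - 0.92 = 0.08, \quad \text{US: } 1 - 0.83 = 0.17 \\
P\left(d_{\mathrm{hist}} \le 10\,\mathrm{mm} \mid d_{\mathrm{img}} > 10\,\mathrm{mm}\right)
  &= 1 - \mathrm{PPV}
  &&\Rightarrow\quad \text{MRI: } 1 - 0.52 = 0.48, \quad \text{US: } 1 - 0.61 = 0.39
\end{align*}
```

The first line corresponds to underestimation relative to the 10 mm threshold (8% and 17%) and the second to overestimation (48% and 39%).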
Our study has some limitations. First, although US was mandatory for tumor evaluation, not all patients could be included in this analysis due to missing data. Moreover, both MRI and US were only performed in 174/662 patients, which could influence the relative value of these techniques for the prediction of pCR and residual tumor size. Furthermore, although the central reading radiologists were blinded to the US results, the study protocol did not prespecify that the gynecologists should be blinded to MRI results. Additionally, the quality of MRI image interpretation was ensured by specialized central radiologists, whereas US was performed and interpreted by site gynecologists. Lastly, our study did not provide an insight into the impact of accuracy in the prediction of residual tumor size on successful breast conservation and unnecessary mastectomy rates.

Conclusions
Our study demonstrated that US and MRI were similarly accurate in predicting the presence of residual tumor after NAT in HR+/HER2+ and HR−/HER2− BC while MRI was more predictive in HR−/HER2+ tumors. The size of HR+/HER2+ and HR−/HER2+ tumors was less likely to be underestimated by MRI while US conferred a lower risk of overestimation in these BC subtypes. The risk of underestimating the size of HR−/HER2− tumors was similarly low for both MRI and US. However, both methods were prone to overestimate the size of each BC subtype, and particularly in HR−/HER2+ tumors. Our findings are clinically relevant for selecting the optimal imaging modality, interpretation of imaging results, and subsequent planning of the surgery in patients with incomplete imaging response to NAT.

Authors' contributions
KS participated in the collection and assembly of the data and manuscript writing. CH participated in the collection and assembly of the data and manuscript writing. LU participated in the collection and assembly of the data and manuscript writing. AF participated in the collection and assembly of the data. DRD participated in the collection and assembly of the data and manuscript writing. RW participated in the conception and design of the study, data analysis and interpretation, and manuscript writing. RC participated in the data analysis and interpretation, manuscript writing, and collection and assembly of the data and provided study materials or patients. CE participated in the data analysis and interpretation and manuscript writing. JA participated in the data analysis and interpretation and manuscript writing. HN participated in the collection and assembly of the data and manuscript writing. AP participated in the data analysis and interpretation and manuscript writing. SK participated in the data analysis and interpretation and provided study materials or patients. EMG participated in the collection and assembly of the data and data analysis and interpretation. HF participated in the collection and assembly of the data and provided study materials or patients. MB participated in the collection and assembly of the data and manuscript writing. JP participated in the data analysis and interpretation and provided study materials or patients. RS participated in the collection and assembly of the data. BA participated in the collection and assembly of the data and manuscript writing. CKL participated in the data analysis and interpretation and manuscript writing. NH participated in the conception and design of the study, collection and assembly of data, data analysis and interpretation, and manuscript writing and provided study materials or patients. CKK participated in the conception and design of the study, collection and assembly of the data, data analysis and interpretation, and manuscript writing. UN participated in the conception and design of the study, collection and assembly of data, data analysis and interpretation, manuscript writing, and administrative support and provided study materials or patients. All authors read and approved the final manuscript.

Funding
The analysis of MRI data presented in this manuscript was funded by Bayer AG Germany. ADAPT HER2+/HR+ and WSG-ADAPT HER2+/HR− trials were financially supported by Hoffmann la Roche; the WSG-ADAPT TN trial was financially supported by Celgene and Teva. The industry sponsors of the ADAPT trials had no role in the trial design, data collection, analysis, data interpretation, writing, or decision to submit the manuscript. Open Access funding enabled and organized by Projekt DEAL.

Availability of data and materials
Data used for this analysis are available upon reasonable request to the corresponding author.

Declarations
Ethics approval and consent to participate
The ADAPT study was performed in accordance with the Declaration of Helsinki. The substudy protocols were approved by the Ethics Committee of the Medical Faculty of the University of Cologne, Germany (approval number: 11-283), and national authorities. Written informed consent was obtained from each patient prior to study participation.

Consent for publication
Not applicable.

Competing interests
MG received honoraria from AstraZeneca and travel support from Daiichi Sankyo. OG has an ownership interest in WSG GmbH; received honoraria from Genomic Health, Roche, Celgene, Pfizer, Novartis, NanoString Technologies, and AstraZeneca; served in consulting/advisory role for Celgene, Exact Sciences, Lilly, MSD Brazil, Novartis pharma SAS, Pfizer Pharmaceuticals Israel, and Roche; and received travel support from Roche. LU received honoraria, travel support and served in consulting/advisory role for Siemens Healthcare, Bayer Healthcare, and received research funding from Siemens Healthcare. RW served in consulting/advisory role and received travel support from Agendia, Amgen, Aristo, AstraZeneca, Boehringer Ingelheim, Carl Zeiss,