Study population and treatment
The design of the study is presented in Fig. 1. The study population comprised a training cohort, validation cohort, and an external validation cohort (TCGA-BRCA cohort).
Training and validation cohorts were collected in Fudan University Shanghai Cancer Center. A total of 441 patients who presented with clinical stage II–III breast cancer and who underwent NAC and PORT from 2010 to 2015 were retrospectively reviewed. The study was approved by the Institutional Review Board, and the reviewed data included clinical, histopathological, and imaging data.
The inclusion criteria of this study are presented in the supplement. Finally, 278 patients were enrolled and were randomly divided at a 1:1 ratio into training and validation cohorts.
The baseline clinical characteristics and pathologic data were derived from patients’ medical records and included age, laterality, cT stage, cN stage, pT stage, pN stage, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor-2 (HER-2) status, treatment response, chemotherapy regimen, and RT plan. Preoperative breast MRI scans were recorded and extracted from the picture archiving and communication system (PACS) in the Department of Radiology in Fudan University Shanghai Cancer Center. The TNM staging for each patient was reclassified according to the eighth edition of the Cancer Staging Manual of the American Joint Committee on Cancer (AJCC)/International Union Against Cancer. The pathological results were reviewed and confirmed by the central laboratory of the Department of Pathology in Fudan University Shanghai Cancer Center. The pCR of the primary tumor after NAC was defined as the eradication of all invasive diseases in the breast and regional lymph nodes.
The systemic treatment, surgery, and RT plan were consistent with the previously published data [21]. The taxane-based PC (paclitaxel and carboplatin) regimen was the most commonly used NAC regimen. Most patients underwent mastectomy and axillary lymph node dissection (ALND). PORT was performed after the completion of adjuvant chemotherapy. Regional nodal irradiation (RNI) was delivered to the ipsilateral supraclavicular region and the infraclavicular region + / − internal mammary lymph nodes (IMNs). The dissected axillary region was excluded from the irradiation fields. The prescription dose for most patients who underwent mastectomy was 50 Gy in 25 fractions. For patients who underwent lumpectomy, a boost of 10 Gy in five fractions was delivered to the lumpectomy cavity. IMNI was given according to the discretion of radiation oncologists, and the prescription dose was 50 Gy in 25 fractions. RT fields were confirmed by reviewing each RT plan.
The external validation cohort (TCGA-BRCA cohort) was collected from The Cancer Imaging Archive (TCIA). Breast cancer cases from TCIA TCGA-BRCA cohort without enhanced MRI or those with had no clear lesions on MRI were excluded. As a result, 91 cases were enrolled.
Endpoints and follow-up
The primary endpoints of this study were DFS (defined as the interval from the date of curative surgery to disease recurrence, secondary malignancy, death, or the last visit) and RFS (recurrence-free survival, defined as the interval from the date of curative surgery to disease recurrence, death, or the last visit).
Patients treated in our center were followed up every 3 months in the first 2 years, every 6 months in the next 3 years, and once a year after that. Physical examination, laboratory test, and ultrasound of the breast, lymph nodes, and abdominal organs were performed at every follow-up. Radiological imaging such as chest CT and mammogram was performed once a year. Breast MRI was performed when necessary. Follow-up data for the external validation cohorts were obtained from the TCIA.
MRI acquisition and segmentation
Patients from our center underwent breast MRI in the 4 weeks before NAC. A detailed description of MRI acquisition is presented in the supplement.
MRI imaging segmentation were performed before feature extraction. To obtain the good contrast between tumor and surrounding breast parenchyma, the T1-enhanced MRI digital imaging and communication in medicine (DICOM) images that have been archived in PACS were acquired. The target MRI image was then loaded into personal computer for segmentation. Two experienced radiologists reviewed MRI images respectively. They were blinded to prognosis but aware that these patients were eventually diagnosed as breast cancer. Region of interest (ROI) of tumor was manually segmented along the lesion in every slice by the first reviewer and then reviewed by the second reviewer. The intra- and inter-observer agreement of features extraction was assessed by ICC. We randomly selected 20 patients from our center and calculated ICC of all radiomics features. Inter- and intra-ICC greater than 0.75 represent good agreement between reviewers.
MRI-based texture analysis
Image preprocessing, tumor segmentation, and feature extraction were performed via 3D Slicer (version 4.11.0; http://www.slicer.org) and its extension “slicer radiomics” derived from Pyradiomics. A detailed description is presented in the supplement.
Feature selection and radiomics score building
A three-step feature selection procedure was applied to the training cohort to construct the radiomics score (RS). First, features with both inter-observer and intra-observer ICC higher than 0.75 were selected for further analysis. Second, a univariable analysis was performed using the Cox proportional hazards model. All features were ranked according to the p value. Only features with p value < 0.05 were kept for further analysis. With the remaining features, the least absolute shrinkage and selection operator (LASSO) method was applied to select features from training cohort. LASSO regression is an appropriate feature selecting method for high-dimensional data which has been applied in many radiomics studies.
A time-dependent receiver operator characteristic curve (ROC) was applied to evaluate the predictive accuracy of rad-score and calculate the best cutoff value.
Prognostic value of radiomics score
The potential association between radiomics score and DFS was assessed in the training cohort and validation cohort. Patients were divided into a high-score group (radiomics score higher than the cutoff value) and a low-score group (radiomics score lower than cutoff value) groups according to the cutoff rad-score value. Kaplan–Meier survival analysis was performed in both cohorts. The baseline characteristics between the two groups were compared with Chi-square test.
Univariate and multivariate Cox proportional hazards model was applied in this study. Factors with a p value < 0.1 in univariate analysis and clinically significant variants were included in multivariate analysis.
External validation of radiomics score
The feasibility of the radiomics score was further validated using the TCGA-BRCA cohort from the TCIA (external validation). The same radiomics score calculation procedure was applied, and the same cutoff value was used to divide the cohort into the low- and high-score groups.
Predictive value of radiomics score
We constructed three prediction models to assess the incremental prognostic value of the radiomics score. The TNM model was based on cTNM staging, and the clinical model was based on all clinicopathological data, including the TNM stage, ER status, and PR status and subtype. Eventually, the radiomics model was based on radiomics and clinicopathological data. The concordance index (C-index) and AUC were calculated for each model.
The incremental prognostic value of the radiomics score to TNM staging was calculated using integrated discrimination improvement (IDI) and net reclassification improvement (NRI). Decision curve analysis (DCA) was performed to analyze the clinical usefulness of the radiomics model, which incorporated radiomics score and clinicopathological data via quantitatively measuring the net benefit at different threshold probabilities.
We further explored the prognostic value of the radiomics score in the non-pCR subgroup and the HR + /HR − subgroup. Subgroup analysis was performed in the training and validation cohorts. Kaplan–Meier survival analysis and multivariate analysis were applied.
Development and validation of radiomics nomogram
A radiomics nomogram was constructed based on the radiomics model. The calibration curve and time-dependent ROC were applied to assess the calibration of the nomogram.
Gene set enrichment analysis of groups with different radiomics scores
To examine the hypothesis that radiomics may reflect tumor heterogeneity, and to identify the association between radiomics score and gene expression, we extracted and incorporated genomic data with the radiomics score in the TCGA-BRCA cohort. Patients from the TCGA-BRCA cohort with RNA-seq and microRNA-seq were enrolled (n = 64 cases). RNA-seq data were also downloaded from the TCGA. Patients were divided into three groups according to the radiomics score. Patients with high radiomics scores (top 1/3; 21 cases) were defined as the high-score group, patients with low radiomics scores (bottom 1/3; 21 cases) were defined as the low-score group, and patients who did not fall into either category were defined as the intermediate-score group (22 cases).
Gene set enrichment analysis (GSEA) was performed to identify enriched biological pathways associated with the high- and low-score groups. FDR < 0.1 and p < 0.05 were considered statistically significant. False discovery rate was calculated using the Benjamini and Hochberg procedure. The gene expression data of more than 100 normal breast tissues in the TCGA were set as the baseline.
Identification of differentially expressed genes (DEGs) and functional annotation
DEGs (mRNA, lncRNA, and miRNA) between the high- and low-score groups were identified using “lemma” and “edgeR.” An absolute log2-fold change (|FC|) > 1 and an adjusted p value < 0.05 were set as the cutoff criteria. We conducted Gene Ontology (GO) enrichment, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses based on the DEGs using a web tool Metascape (http://metascape.org/) and the “clusterProfiler” package. GO terms and KEGG pathways with adjusted p values < 0.05 were considered statistically significant and visualized.
Prognostic value of DEGs
The identified DEGs were examined for their prognostic value. Survival analysis was performed based on TCGA-BRCA data. Data were obtained from UCSC Xena (https://xenabrowser.net/). Clinical and RNA-seq data were derived from TCGA-BRCA and GSE118527. GSE118527 (FUSCC-TNBC cohort) was from our center and applied to further validate the prognostic value of the DEGs [22].
Identification of the association of radiomics score with the tumor microenvironment and immunophenotype
To test the hypothesis that the radiomics score could reflect the heterogeneity of the tumor microenvironment, the ESTIMATE method was used to infer tumor purity of the high- and low-score groups [23]. Furthermore, The TCGA TIL MAP was applied to assess the spatial distribution of tumor infiltration lymphocytes (TILs) in these patients. The TIL map data were downloaded from the TCIA (https://cancerimagingarchive.net/datascope/TCGA_TilMap) [24]. The CIBERSORTx web tool was applied to characterize the abundance of 22 immune cell types based on the RNA-seq data of the high- and low-score groups [25]. Only samples with p values < 0.05 were kept for analysis because of the high reliability of the inferred cell composition. The abundance of the 22 infiltrative immune cells was compared between the two groups using Chi-square test. Finally, to characterize the immunophenotype difference between the high- and low-score groups, the immunophenscore (IPS) was calculated for each case and compared between the two radiomics groups using the Kolmogorov–Smirnov test (https://tcia.at/tools/toolsMain) [26].
Association between computational histopathology and radiomics
As implied by GO analysis results, we applied an open-source software QuPath to characterize computational histopathological features [27]. The whole slide images (WSIs) of patients from different score groups were obtained from https://portal.gdc.cancer.gov/.
Computational histopathological features were extracted by QuPath to represent the levels of tumor cell differentiation and tumor morphology. The features extraction procedure was as follows: ROI on WSIs was cut into tiles with the width and height set as 100 \(\mathrm{\mu m}\); next, given the size difference between tumor cells and TILs within ROI, the QuPath automated cell detection function was applied to detect tumor cells. The cell detection strategy was similar to that outlined in a previous study. The detailed thresholds were as follows: detection image, hematoxylin OD; requested pixel size, 0.3 µm; background radius, 8 µm; median filter radius, 0 µm; sigma, 1.5 µm; minimum cell area, 24 µm2; maximum cell area, 1,000 µm2; and threshold, 0.1. The quality control of the automated cell detection was confirmed by a pathologist. A total of 85 features based on tumor cells and tumor tiles were calculated by QuPath and then aggregated across the case-level tiles by the min, median, max, 25-quantiles, and 75-quantiles of the values.
The correlation between radiomics features and computational histopathological features was assessed. Computational histopathological features were compared between the two groups (low vs intermediate, low vs high). FDR < 0.1 and p < 0.05 was consider statistically significant (FDR was calculated using Benjaminiand Hochberg procedure).
Statistical analysis
The differences in baseline characteristics were examined using two-sample t test, Pearson’s Chi-square test, and Fisher’s exact test as appropriate. The Kaplan–Meier method and log-rank test were used to estimate DFS. Multivariate analysis was conducted using the Cox proportional hazards model. The C-index was applied to measure the accuracy of the predictive model. The cell type distribution was assessed by Chi-square test, and the proportion of immune cells was assessed by t-test.
Statistical analysis was performed using R software (version 4.0.2, www.Rproject.org) and SPSS (Chicago, v20). A two-sided p value < 0.05 was considered to be statistically significant.