Identification of typical medullary breast carcinoma as a genomic sub-group of basal-like carcinomas, a heterogeneous new molecular entity

Introduction: Typical medullary breast carcinoma (MBC) has recently been recognized to be part of the basal-like carcinoma spectrum, a feature in agreement with the high rate of TP53 mutations previously reported in MBCs. The present study was therefore designed to identify phenotypic and genetic alterations that distinguish MBCs from basal-like carcinomas (BLC).
Methods: Expression levels of estrogen receptor (ER), progesterone receptor (PR), ERBB2, TP53, cytokeratins (KRTs) 5/6, 14, 8/18, epidermal growth factor receptor and KIT, as well as TP53 gene sequence and high-density array comparative genomic hybridization (CGH) profiles, were assessed and compared in a series of 33 MBCs and 26 BLCs.
Results: All tumors were negative for ER, PR and ERBB2. KRTs 5/6 were more frequently expressed in MBCs (94%) than in BLCs (56%) (p = 0.0004). TP53 mutations were disclosed in 20/26 MBCs (77%) and 20/24 BLCs (83%). Array CGH analysis showed that a higher number of gains (95 regions) and losses (34 regions) was observed in MBCs than in BLCs (36 regions of gain; 13 regions of losses). In addition, gains of 1q and 8q, and losses of X were found to be common to the two groups, whereas gains of 10p (53% of the cases), 9p (30.8% of the cases) and 16q (25.8% of the cases), and losses of 4p (34.8% of the cases), and amplicons of 1q, 8p, 10p and 12p were the genetic alterations found to characterize MBC.
Conclusion: Our study has revealed that MBCs are part of the basal-like group and share common genomic alterations with BLCs, the most frequent being 1q and 8q gains and X losses; however, MBCs are a distinct entity within the basal-like spectrum, characterized by a higher rate of KRT 5/6 expression, a higher rate of gains and losses than BLCs, recurrent 10p, 9p and 16q gains, 4p losses, and 1q, 8p, 10p and 12p amplicons. Our results thus contribute to a better understanding of the heterogeneity in basal-like breast tumors and provide potential diagnostic tools.


Annex 3: unsupervised clustering and statistical comparison of groups
All data were initially log2 transformed. The ratio of each BAC locus of the sex chromosomes was standardized so that the median of the BAC loci distribution among tumors was centered on 1. We then verified that the between-tumor correlation for the BAC loci of the X and Y chromosomes was not significantly different from that generated in silico for Gaussian noise (data not shown). We concluded that the BAC loci of the sex chromosomes could be removed from the analysis without impairing the conclusions of the study. The BAC loci ratios of the non-sex chromosomes were then standardized within each tumor by subtracting the median ratio of the tumor and dividing by the median absolute deviation of the same tumor. The gain and loss BAC loci ratio thresholds were recalculated as the interval of values of the ratios in all tumors that covered 95% of their distribution (-1.5 for losses and +1.5 for gains). The threshold for amplicons was the 99th centile of the distribution of ratios (+5.25). Amplicon ratios were then removed before clustering.
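The within-tumor standardization and the thresholds quoted above can be illustrated with a minimal sketch. The function names, the toy values and the choice to treat the thresholds as inclusive are assumptions made for illustration; they are not taken from the study.

```python
import statistics

def standardize_within_tumor(log_ratios):
    """Subtract the tumor's median ratio and divide by its median absolute deviation."""
    med = statistics.median(log_ratios)
    mad = statistics.median(abs(x - med) for x in log_ratios)
    if mad == 0:
        raise ValueError("MAD is zero; the ratios cannot be scaled")
    return [(x - med) / mad for x in log_ratios]

def classify(z, loss=-1.5, gain=1.5, amplicon=5.25):
    """Label one standardized ratio using the thresholds quoted in the text.

    Assumption: thresholds are treated as inclusive bounds.
    """
    if z >= amplicon:
        return "amplicon"
    if z >= gain:
        return "gain"
    if z <= loss:
        return "loss"
    return "normal"

# Hypothetical log2 ratios for seven BAC loci of one tumor.
toy_tumor = [-1.0, -0.25, 0.0, 0.25, 0.5, 0.75, 4.0]
toy_z = standardize_within_tumor(toy_tumor)
toy_calls = [classify(z) for z in toy_z]
```

In this toy tumor the median is 0.25 and the median absolute deviation is 0.5, so the first locus is called a loss and the last an amplicon.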
For unsupervised clustering, the standard correlation between tumors was chosen as the criterion to define genome-wide similarity among tumors; the Ward linkage method was used as the criterion to aggregate tumors into groups that maximize their pair-wise correlation [1].
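As an illustration of this clustering step, the sketch below uses 1 minus the Pearson correlation as the dissimilarity between tumors and merges clusters with the Lance-Williams update for Ward linkage. It is a simplified stand-in under stated assumptions, not the implementation used in the study: the function names and toy profiles are hypothetical, and the dissimilarity is treated as a squared distance, which holds up to a constant factor when each profile is standardized to zero mean and unit variance.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ward_clusters(profiles, k):
    """Merge tumors until k clusters remain; returns a list of index sets."""
    n = len(profiles)
    clusters = {i: {i} for i in range(n)}
    # Pairwise dissimilarity 1 - r, keyed by (smaller id, larger id).
    d = {(i, j): 1.0 - pearson(profiles[i], profiles[j])
         for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > k:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) | clusters.pop(j)
        # Keep distances that do not involve the two merged clusters.
        new_d = {key: v for key, v in d.items()
                 if i not in key and j not in key}
        # Lance-Williams update for Ward linkage.
        for m, members in clusters.items():
            nm = len(members)
            dim = d[(min(i, m), max(i, m))]
            djm = d[(min(j, m), max(j, m))]
            new_d[(m, next_id)] = ((ni + nm) * dim + (nj + nm) * djm
                                   - nm * dij) / (ni + nj + nm)
        clusters[next_id] = merged
        d = new_d
        next_id += 1
    return sorted(clusters.values(), key=min)

# Hypothetical profiles: tumors 0 and 1 rise together, tumors 2 and 3 fall together.
toy_profiles = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11],
                [5, 4, 3, 2, 1], [10, 8, 6, 4, 1]]
toy_groups = ward_clusters(toy_profiles, 2)
```

With these toy profiles the two positively correlated pairs end up in separate clusters.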
Unsupervised clustering identified 4 groups of tumors (CL1, CL2, CL3-1, CL3-2). In order to assess the robustness of the 4 groups when the composition of genomic loci and tumor samples was modified, we removed one chromosome and one tumor at a time and recalculated the clustering dendrograms (1298 combinations using the same clustering parameters). Each time, we assessed the agreement between the 4 original groups CL1, CL2, CL3-1, CL3-2 and the high-level groups S1, S2, S3 and S4 of the new dendrogram.
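The figure of 1298 combinations is consistent with removing each of 22 non-sex chromosomes in turn together with each of 59 tumors (33 MBCs plus 26 BLCs), since 22 × 59 = 1298. The sketch below enumerates these pairs; the tumor labels are hypothetical placeholders, and the reading of the count as chromosomes times tumors is an inference from the numbers given in the text.

```python
from itertools import product

# Assumption: 22 autosomes, the sex chromosomes having been removed earlier.
autosomes = list(range(1, 23))
# Hypothetical tumor labels: 33 MBCs + 26 BLCs = 59 tumors.
tumor_ids = [f"T{i:02d}" for i in range(1, 60)]

def leave_one_out_pairs(chromosomes, tumors):
    """Yield every (chromosome, tumor) pair to remove before re-clustering."""
    for chrom, tumor in product(chromosomes, tumors):
        yield chrom, tumor

combos = list(leave_one_out_pairs(autosomes, tumor_ids))
```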
Agreement was tested via the KAPPA techniques (package IRR in R), by counting the number of tumors held in each original group that were found in the recalculated clustering groups (independent of the ordering of the groups, see Table 1 of Annex 3). When the KAPPA index was greater than or equal to 0.9, the two cluster dendrograms had a very high degree of resemblance and we assumed that the removal of the specific chromosome and tumor did not affect the original clustering configuration. The removal of chromosomes 3, 16, 19, 20, 21 and 22 did not affect clustering in combination with 50% to 60% of tumors (see Table 1 of Annex 3). The remaining chromosomes were instead critical to maintaining the original clustering configuration. Next, we removed chromosomes 3, 16, 19, 20, 21 and 22 and performed a new clustering on the whole set of tumors. As expected, this latest dendrogram showed 4 high-level groups, which mostly overlapped the original CL1 (black boxes), CL2 (blue boxes), CL3-1 (magenta boxes) and CL3-2 (orange boxes) clusters. Specifically, the corresponding CL1' and CL3-1' were significantly enriched in BLC and MBC (respectively 88% and 66%, p = 0.01). Figure 1 of Annex 3 shows the overlap between the groups of the two clusterings.
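The agreement measure can be sketched as follows, assuming that the index in question is Cohen's kappa and that "independent of the ordering of the groups" means the recalculated groups are matched to the original ones by the relabelling that gives the best agreement. Both points are assumptions, as are the function names and the toy labels; the sketch also assumes the recalculated clustering has no more groups than the original.

```python
from itertools import permutations

def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    labels = sorted(set(a) | set(b))
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

def best_kappa(original, recalculated):
    """Maximum kappa over all ways of mapping new group names onto original ones."""
    orig_labels = sorted(set(original))
    new_labels = sorted(set(recalculated))
    best = -1.0
    for perm in permutations(orig_labels, len(new_labels)):
        mapping = dict(zip(new_labels, perm))
        relabelled = [mapping[x] for x in recalculated]
        best = max(best, cohen_kappa(original, relabelled))
    return best

# Hypothetical example with eight tumors.
toy_original = ["CL1", "CL1", "CL2", "CL2", "CL3-1", "CL3-1", "CL3-2", "CL3-2"]
toy_same = ["S3", "S3", "S1", "S1", "S4", "S4", "S2", "S2"]
toy_moved = ["S3", "S3", "S1", "S1", "S4", "S4", "S2", "S4"]
kappa_same = best_kappa(toy_original, toy_same)
kappa_moved = best_kappa(toy_original, toy_moved)
```

In the toy data, moving a single tumor out of eight lowers the index from 1 to 5/6, which would fall below the 0.9 cut-off used above.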
We then used two already published techniques, the MultiScale Bootstrap Resampling (Suzuki and Shimodaira, 2006) and the Consensus Clustering Resampling (Monti et al., 2003) as an alternative approach to estimate the stability of the clusters.
In order to test the Multiscale Bootstrap Resampling methodology, we used the "pvclust" and "scaleboot" packages in R version 4.1.0 (Suzuki and Shimodaira, 2006). Following the authors' indications, we chose the number of clones of the experiment as the scaling dimension of the bootstrap algorithm; clones were scaled from half to one and a half times the size of the original data set (i.e. 3264 clones). Tumors were not resampled. The Approximately Unbiased probability-values for each dendrogram of the original cluster were estimated using the "scaleboot" package (Shimodaira H, 2006a R package technical notes). The algorithm calculates the Akaike information criterion (AIC) of goodness of fit for a specific set of numeric approximation methods (Shimodaira H, 2006b). Then, following the indications of the author, we used the third-order accuracy probability-values which minimized the AIC measure. We ran the algorithm with an increasing number of bootstrap resamples (respectively 1,000, 5,000, 10,000, 100,000 runs) to check the convergence of the procedure.
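The resampling scheme described above, in which clones are drawn with replacement at several sample sizes while tumors are left untouched, can be sketched as follows. This covers only the generation of bootstrap clone indices; it does not compute Approximately Unbiased probability-values and does not reproduce the "pvclust" or "scaleboot" algorithms. The step of 0.1 between scales, the function name and the seed are assumptions.

```python
import random

N_CLONES = 3264  # size of the original data set quoted in the text
# Assumption: scales from 0.5 to 1.5 in steps of 0.1.
SCALES = [i / 10 for i in range(5, 16)]

def multiscale_bootstrap_indices(n_clones, scales, n_boot, seed=0):
    """Yield (scale, clone indices) pairs, sampling clones with replacement."""
    rng = random.Random(seed)
    for scale in scales:
        size = round(scale * n_clones)
        for _ in range(n_boot):
            yield scale, [rng.randrange(n_clones) for _ in range(size)]

# Two resamples per scale keep the illustration small.
toy_samples = list(multiscale_bootstrap_indices(N_CLONES, SCALES, n_boot=2))
```

Each resampled index list would then be used to select clones before re-clustering the tumors, so that the frequency with which a cluster reappears can be recorded at each scale.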
The Approximately Unbiased probability-values of our four major clusters are shown in red on the right-hand side of each dendrogram in Figure 2 of Annex 3. Clusters CL1, CL3-1 and CL3-2, surrounded by a red box, had probability-values higher than the standard 95% confidence threshold (respectively 99%, 99%, 100%). Cluster CL2, surrounded by a blue box, was estimated at a lower 82% probability-value. This is consistent with our original conclusion regarding the robustness of the Basal-enriched and Medullary-enriched clusters (respectively CL1 and CL3-1, CL3-2), whereas CL2 represents a more heterogeneous group of tumors. to the different size of the clusters. Hence CL1 and CL3-1 stood out as the relatively more robust clusters even at a lower rate of resampling.
Next, the standard Welch t-test was used to determine the genomic regions that differentiated the clusters of tumors. Briefly, the normality of each BAC locus log-ratio was tested in each of the 4 groups using the Shapiro-Francia normality test (R-Normtest package).
The null hypothesis $H_0: \mu_{b,c} = 1$ was then tested for each group $c$ and each BAC locus $b$, with Benjamini-Hochberg correction for multiple tests over the total number of BAC loci [2]. Subsets of contiguous BAC loci whose mean ratio was significantly different from 1 and fell outside the fold threshold range for deletions and for gains were defined as "candidate" genomic regions of gain/loss in each tumor group. For the BAC loci contained in the "candidate" regions, we further tested whether any subsets could differentiate each cluster from the others by testing the null hypothesis $H_0: \mu_{b,c} = \mu_{b,g}$, where $\mu_{b,c}$ and $\mu_{b,g}$ represent the mean ratio of the BAC locus $b$ in clusters $c$ and $g$. The two mean ratios were considered significantly different when the False Discovery Rate of the test was less than or equal to 0.05 after Benjamini-Hochberg correction for multiple tests over the total number of BACs. Subsets of contiguous BAC loci whose mean ratios were significantly different and fell outside the fold threshold range in at least one of the two groups were considered to constitute "differential" genomic regions of gain/loss of each tumor group.
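The multiple-testing correction and the grouping of significant loci into contiguous regions can be illustrated with a minimal sketch. It assumes that the per-locus p-values from the Welch t-test have already been computed and that the loci are listed in genomic order along one chromosome; the function names and the toy numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def contiguous_regions(flags):
    """Return (first, last) index pairs for each run of consecutive True flags."""
    regions, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(flags) - 1))
    return regions

# Hypothetical p-values and mean standardized ratios for five adjacent BAC loci.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_means = [2.0, 1.8, 1.7, 0.2, 0.1]
toy_fdr = benjamini_hochberg(toy_p)
# A locus is kept when FDR <= 0.05 and its mean lies outside [-1.5, +1.5].
toy_flags = [q <= 0.05 and (mu <= -1.5 or mu >= 1.5)
             for q, mu in zip(toy_fdr, toy_means)]
toy_regions = contiguous_regions(toy_flags)
```

In the toy data the third locus has a mean above the gain threshold but an adjusted p-value just above 0.05, so only the first two loci form a candidate region.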