Clustering of intrinsic breast cancer subtypes according to protein expression
In total 477 breast tissue samples were successfully analyzed (from a total of over 600 samples) in duplicate using 2D-DIGE, allowing the profiling of several thousand proteins and isoforms/PTMs (post-translational modifications) per sample and the elimination of those with protein degradation. The dataset is described in “Methods” and Additional file 1: Table S1. In parallel, 370 of these tumors were analyzed for gene expression using microarrays [17] and were assigned to the breast cancer subgroups defined by Sørlie/PAM50 or the Hu classifications [6]: the remainder had extensive mRNA degradation and could not be analyzed. Unsupervised analysis of the protein data revealed striking agreement with the gene expression subtyping of the tumors. In Fig. 1, all samples with known gene expression profiles are displayed as a Sammon map (Fig. 1a) and as hierarchal clustering (Fig. 1b). Luminal A and basal-like tumors were most clearly defined by separate clusters. HER2-enriched tumors and normal-like tumors had some degree of clustering, whereas luminal B tumors were more evenly spread across the clusters. Tumors carrying a BRCA1 mutation (indicated by open rings) fell mostly within the basal-like cluster, in agreement with gene expression results. Strikingly, the BRCA1-mutated tumors assigned to other subtypes at a transcriptional level, all fell very close to the basal-like cluster in the Sammon map.
Figure 2 shows hierarchical clustering of all tissue included in the 2D-DIGE study including those without a corresponding mRNA analysis. The associated clinical parameters, including tissue type, hereditary mutation status, ER and PgR status, tumor grade, and the histological type of cancer, are indicated below the clusters. Three major branches emerged, one enriched with tumors classified as luminal A and one enriched with basal-like tumors. The normal-like tumors also tended to form a cluster within the third branch and interestingly, the non-malignant samples clustered close to the normal-like tumors, an effect that has also been seen at transcriptional level, which initially named the subgroup [4]. Luminal B and HER2-enriched tumors were more spread across the clusters. Out of the 39 tumors carrying a BRCA1 mutation, 27 fell within the basal-like cluster, which is a clear overrepresentation (Fig 1b). Out of the 27 BRCA1-mutated tumors falling within the basal-like protein cluster, 7 had not been analyzed for gene expression and 3 of these tumors were assigned to other subtypes in the gene expression analysis (HER2 or luminal B). Also most non-primary tumors, i.e., secondary tumors or metastases, were associated with the basal-like cluster. Tumors carrying a BRCA2 or a BRCAx mutation (familial but not BRCA1 or BRCA2) were more evenly spread across the cluster as would be expected, as BRCA2 tumors usually are associated with the luminal B subtype, which we could not distinguish in this analysis.
The tumors that had not been analyzed for gene expression due to RNA degradation are labeled in gray and are distributed over the entire cluster (Fig. 2). The 2D-PAGE images allowed rapid identification of protein degradation due to the appearance of characteristic low molecular-weight spots corresponding to protein breakdown products. These samples were also eliminated from the analysis (n = approximately 130). Sample collection was well-controlled in this study and we therefore conclude that degradation of some samples is due to long storage times in the freezer. The stratification of these tumors could be indicated using the clinical information available for the tumors. ER and PgR status was strongly associated with the different clusters, with the majority of the ER-positive and PgR-positive tumors falling within the cluster dominated by the luminal A tumors and the ER-negative and PgR-negative tumors falling within the basal-like cluster. Also, the majority of the tumors in the basal-like cluster were of grade 3 and all medullary tumors in the dataset were associated with the basal-like cluster.
Separation of molecular subtypes
Pairwise Wilcoxon analysis of the subtypes defined at the gene expression level was performed for the set of tumors analyzed for protein expression. This further confirmed the large differences in protein expression for these tumors. Figure 3 shows the distribution of p values for all protein spots for each pairwise group comparison. There was a clear overrepresentation of proteins with p values below 0.001, indicating that significant subsets of proteins are differentially expressed between the groups. The separation was strong for all pairwise comparisons except for the comparison between the HER2 and luminal B subtypes. The highest number of spots with a p value below 0.001 was seen for comparison of the basal-like and luminal A subtypes, with almost 700 protein spots. In summary, unsupervised analysis of the breast tumor proteome demonstrates remarkable similarities with the molecular portraits of breast cancer subtypes and thus, these molecular changes are also reflected at the protein expression level even at the level of abundant proteins represented on 2D gels.
In-depth protein characterization of molecular subtypes of breast cancer
In order to characterize the tumor groups further and to build a library of breast tumor proteins accessible to mass spectrometric analysis for molecular subtype classification, we performed a comprehensive protein identification analysis. A subset of tumors representing the five molecular subtypes, luminal A, luminal B, HER2, basal-like and normal-like, were selected (Additional file 2: Table S2). The PAM50, Hu and Sørlie gene classifications of all tumors were in concordance. In total 4255 classes of protein representing over 14,000 proteins were identified, which to our knowledge is the largest set of proteins identified from breast cancer tumors (Additional file 3: Table S3).
Gene expression enrichment and pathway analysis differences in the protein subtypes
An initial overview of the in-depth dataset analyzed using DAVID shows that of the 2734 identifications accepted as Swiss-/UniProt accessions, 768 are epithelial, and 161 are annotated as cancer-associated gene products. The latter proteins are enzymes clustered around drug metabolism, followed by immune system response, DNA repair, cell signaling, and cytoskeletal remodeling. A more thorough investigation using MetaCore™ to compare pathway differences amongst the various intrinsic molecular types showed clear differences.
Both luminal subtypes had strong upregulation of chromatin remodeling including increased expression of BAF53 [SwissProt:O96019] and histone acetyl-transferases indicating nucleosome disassembly. Several proteins in the Toll signaling pathway are also upregulated, probably in response to increased interleukin levels. Several DNA repair proteins RAD50 [SwissProt:Q92878], UBE2V1 [SwissProt:Q13404], and UBE2N [SwissProt:P61088] are increased, while C-Jun [SwissProt:P05412] and MTA1 (metastasis associated protein) [SwissProt:Q13330] are downregulated, leading to a decrease in Bcl-XL-induced apoptosis. The luminal B subtype differs strongly from the luminal A subtype, in that many cytoskeletal remodeling pathways and several clathrins involved with the HER2 receptor and vesicle transport are highly upregulated.
In the basal subtype alpha3, beta1 integrin [SwissProt:P26006 and SwissProt:P05556] are downregulated, which leads to lowering of the activity of focal adhesion kinase (FAK) and hence, many mitogen-activated protein (MAP) kinases (MAP2K4, MAPK14 and MAP2K3 [SwissProt:P45985, SwissProt:Q16539 and SwissProt:P46734]). This is reinforced by downregulation of c-JUN, which lowers PKC, which normally activates FAK. This is supported by downregulation of IKK-beta [SwissProt:O14920], which stops activation of NCOA3, which activates FAK. Poly (ADP-ribose) polymerase 1 (PARP1) [SwissProt:P09874], a protein that in recent years has been extensively studied in breast and ovarian cancer, is upregulated in the basal-like cluster. PARP1 is involved in DNA repair, as is the BRCA1 protein. Targeted PARP1 inhibition has proven especially effective as a treatment for patients carrying a BRCA1 mutation, and in triple-negative breast cancer (because of the similarities with BRCA1-mutated tumors) leading to an inhibition of dual DNA repair pathways leading to cell death [42]. Gene expression of S100-A11 has specifically been reported as upregulated in basal-type breast cancers compared to non-basal types [43], which agrees with our findings and S100-A11 [SwissProt: P31949] has been proposed as a diagnostic marker for breast cancer [44].
The HER2 subtype is defined by upregulation of the Her2 receptor [SwissProt:P04626]. BAF53 [SwissProt:O96019], and BAF60 [SwissProt:P51532] (both part of the CREST-BRG1 complex) are upregulated and this complex may be required for the activation of transcriptional programs associated with oncogene and proto-oncogene mediated growth induction. This is accompanied by downregulation of E3 ubiquitin-protein ligase TRAF2 also known as tumor necrosis factor type 2 receptor-associated protein (TRAF2) [Q12933], tumor necrosis factor receptor type 1-associated DEATH domain protein (TRADD), which usually acts as a tumor suppressor [Q15628], and NF-kappa-B essential modulator (NEMO) [SwissProt:Q9Y6K9], which usually form a complex activating receptor interacting protein (RIP1).
The normal-like subtype has clear upregulation of metabolic pathways, with an increase in tricarboxylic acid (TCA) cycle enzymes, glucose transport, oxidative phosphorylation, and especially the associated nicotinamide adenine dinucleotide (NADH) dehydrogenase subunits. The normal-like subtype also has blood coagulation proteins such as thrombin [SwissProt:P00734], fibrinogen alpha [SwissProt:P02671], Serpine 2 [SwissProt:P07093] and plasmin. Previous studies using highly purified and washed ovarian epithelial cells showed that these proteins were not just due to contaminating blood but that these proteins are either strongly bound to the plasma membrane or have been internalized and can be found in both the gel and shotgun experiments [22].
Targeted protein assay profiling for tumor classification
Out of the comprehensive set of proteins identified, 256 proteins were chosen for use in an SRM-based subtype classification. The proteins were among the top 20 % most intense in the MS analysis and had the greatest discriminatory power in pairwise group comparisons. A specific MS-based assay, SRM, was established for these proteins, which after refinement, contained 190 proteins represented by 253 peptides (Additional file 4: Table S4): 24 tumors from the original dataset were analyzed in duplicate using the assay, together with 17 tumors from an independent dataset, giving a total of 41 tumors (Additional file 5: Table S5). Of the 41 tumors, 26 were assigned to the same molecular subtype by all the three RNA classifiers. We reasoned that this core set of tumors is representative of the five intrinsic subtypes, as the classification of these was consensual and we used this set for hierarchal clustering using multi-group ANOVA comparison with a p value cutoff of 0.01 (Fig. 4). The hierarchal clustering demonstrates remarkably strong similarities to the molecular classification of breast tumors at the gene expression level using the intrinsic gene set [4]. Tumors of luminal and basal origin separated in the first branching followed by further sub branching of tumors assigned to the basal-like, HER2 and normal-like subtypes. The majority of tumors in this analysis clustered with tumors of the same RNA subtype, although there was a slight overlap between the HER2-enriched and normal-like cluster. There were only two tumors classified as luminal B available in this analysis and both clustered with the basal-like tumor.
Discriminatory power of the SRM signature
Based on ANOVA, the best discriminators, twelve proteins, were tested using hierarchal clustering of the entire dataset of 41 tumors (Fig. 5). In this total dataset, in addition to tumors with precise classification according to the three gene expression signatures, there were tumors not analyzed at all on gene expression (labeled in gray for all three classifiers) and tumors where the three different classifiers assigned tumors to different subclasses or in some cases, could not classify them at all (labeled in gray for that specific classifier). This dataset also displayed very high similarities to clusters at gene expression level. Luminal A tumors were separated from the remaining tumors of basal origin. Of these tumors, the basal-like subtype formed a clear cluster and HER2 and normal-like tumors formed a separate branch with relatively high separation between these two groups also. Again, the few luminal B tumors were spread across the cluster and it is clear that for proper analysis of this subset more samples would have been needed. Information on the status of BRCA1 methylation, BRCA, ER, and PgR was available: five of the six tumors in this dataset carrying either BRCA1 methylation (n = 3 tumors) or a BRCA1 mutation (n = 3 tumors) fell within the basal-like cluster. The rest of the familial (BRCAx) tumors were spread across the cluster. Eight out of nine tumors in the luminal A cluster were ER-positive (dark blue) and seven out of nine were PgR-positive (dark purple). The normal-like tumors seem to divide into two separate clusters dominated by ER/PgR-positive or ER/PgR-negative tumors, respectively.
A support vector machine and a leave one out cross-validation approach was used to create ROC curves for all pairwise comparisons. All three different RNA classifiers were tested. Eight out of the ten pairwise comparisons provided good ROC areas with p values below 0.05. For four of the pairwise comparisons the PAM50 classification provided the best ROC areas with p values below 0.02; ROC areas: basal versus luminal A = 0.95, basal versus normal = 0.83 HER2 versus luminal A = 1, and HER2 versus normal = 0.89. For basal versus luminal B the ROC area was 0.86 and for luminal A versus normal the ROC area was 0.89, using the Hu classifier. Using the Sörlie classifier the best ROC areas were for the basal versus HER2 comparison with a ROC area of 0.75 and a ROC area of 0.84 for the luminal A versus luminal B comparison providing. ROC areas for all classifiers and pairwise comparisons are provided in Additional file 6: Figure S1.