Amplification and high-level expression of heat shock protein 90 marks aggressive phenotypes of human epidermal growth factor receptor 2 negative breast cancer

Introduction: Although human epidermal growth factor receptor 2 (HER2) positive or estrogen receptor (ER) positive breast cancers are treated with clinically validated anti-HER2 or anti-estrogen therapies, intrinsic and acquired resistance to these therapies appears in a substantial proportion of breast cancer patients and new therapies are needed. Identification of additional molecular factors, especially those characterized by aggressive behavior and poor prognosis, could prioritize interventional opportunities to improve the diagnosis and treatment of breast cancer.
Methods: We compiled a collection of 4,010 breast tumor gene expression data derived from 23 datasets that have been posted on the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database. We performed a genome-scale survival analysis using Cox-regression survival analyses, and validated using Kaplan-Meier Estimates survival and Cox Proportional-Hazards Regression survival analyses. We conducted a genome-scale analysis of chromosome alteration using 481 breast cancer samples obtained from The Cancer Genome Atlas (TCGA), from which combined expression and copy number data were available. We assessed the correlation between somatic copy number alterations and gene expression using analysis of variance (ANOVA).
Results: Increased expression of each of the heat shock protein (HSP) 90 isoforms, as well as HSP transcriptional factor 1 (HSF1), was correlated with poor prognosis in different subtypes of breast cancer. High-level expression of HSP90AA1 and HSP90AB1, two cytoplasmic HSP90 isoforms, was driven by chromosome coding region amplifications and were independent factors that led to death from breast cancer among patients with triple-negative (TNBC) and HER2-/ER+ subtypes, respectively. Furthermore, amplification of HSF1 was correlated with higher HSP90AA1 and HSP90AB1 mRNA expression among the breast cancer cells without amplifications of these two genes. A collection of HSP90AA1, HSP90AB1 and HSF1 amplifications defined a subpopulation of breast cancer with up-regulated HSP90 gene expression, and up-regulated HSP90 expression independently elevated the risk of recurrence of TNBC and poor prognosis of HER2-/ER+ breast cancer.
Conclusions: Up-regulated HSP90 mRNA expression represents a confluence of genomic vulnerability that renders HER2 negative breast cancers more aggressive, resulting in poor prognosis. Targeting breast cancer with up-regulated HSP90 may potentially improve the effectiveness of clinical intervention in this disease.


Introduction
Despite the progress that has been made in reducing mortality rates of breast cancer in the most recent time period, more than 40,000 breast cancer deaths occur in the United States annually [1]. Substantial progress in treatment requires identification of a specific set of actionable genomic abnormalities that drive or facilitate tumorigenesis, resistance to a given treatment and recurrence. Although significant amounts of gene expression profile analyses have been performed in breast cancers, assessing expression levels as the primary parameter to characterize breast cancers may be confounded by the phenotypic heterogeneity that arises as a consequence of abnormal signaling nodes and extensive biological cross-talk and redundancy. On the other hand, copy number aberrations in cancer cells can quantitatively affect gene function [2], and multiple copy number aberrations collectively regulate clinical phenotypes and cancer prognosis [3]. Analyses of chromosomal copy number aberrations (CNAs) have been proposed as a critical indicator of the possible location of aggressive cancer phenotype related genes [4,5]. Therefore, we undertook an integrative analysis of copy number and gene expression in a large population study to identify molecular factors abundant in breast cancer cells, especially in those characterized by aggressive behavior and poor prognosis, by which to prioritize interventional opportunities to transform breast cancer diagnosis, characterization, treatment and ultimately prevention.
Among the many aberrant signaling pathways identified in breast cancer, heat shock protein 90 (HSP90) is of particular interest: it is one of the most abundant proteins in mammalian cells [6], plays an important role in folding newly synthesized proteins and in stabilizing and refolding denatured proteins after stress, and thereby influences a large number of signaling pathways. To date, more than 200 HSP90 clients have been identified, including key regulators in signal transduction and cell cycle control, steroid hormone receptors, and tyrosine and serine/threonine kinases [7-9]. HSP90 exists as multiple isoforms that include HSP90AA1 (an inducible form) and HSP90AB1 (a constitutive form) in cytoplasm, HSP90B1 in endoplasmic reticulum and TRAP1 in mitochondria [10]. However, unlike HSP90AA1 and HSP90AB1, the client proteins selectively interacting with HSP90B1 or TRAP1 chaperones have yet to be defined.
HSP90 contains an N-domain ATP binding site and its ATPase activity is necessary for all of its cellular functions [11]. In vivo, HSP90 does not function alone but acts in concert with co-chaperones such as Sba1/p23 and Cdc37 [8]. Interactions with co-chaperones are thought to be important to direct HSP90 function for specific physiological processes such as regulation of cell cycle progression, apoptotic responses, or kinase-mediated signaling cascades [10]. The protein is regulated both at the expression level and through post-translational modifications such as phosphorylation, acetylation and methylation. These processes control its ATPase activity, its ability to interact with its clients and co-chaperones, and its degradation [6,7]. In addition, HSP90 has a higher affinity for amino-terminal ligands in cancer cells, compared with the HSP90 in normal cells [12].
In breast cancer, HSP90 is required for the stabilization of many proteins in pathways that play key roles in cancer growth and survival, such as estrogen receptor (ER), progesterone receptor (PR), essential components of HER2 signaling (HER2, AKT, c-SRC, RAF and HIF-1α), and EGFR [9,13]. For example, HER2 is among the most sensitive client proteins of HSP90 [14,15], and HSP90 inhibition mediates degradation of HER2, as well as PI3K and AKT in HER2-overexpressing cancer cells [16]. Consequently, HSP90 inhibitors plus trastuzumab have significant anticancer activity in patients with HER2-positive, metastatic breast cancer previously progressing on trastuzumab [17]. Although a number of agents are in development for HER2+ and ER+ breast cancers, HSP90 inhibitors also represent therapeutic opportunities in other molecular subtypes. Triple negative breast cancer (TNBC) is defined by the clinical laboratory evaluation revealing a lack of expression of ER, PR and HER2 receptors, accounts for 10% to 20% of all breast cancer [18], and has a higher rate of distant recurrence and a poorer prognosis than other breast cancer subtypes [19,20]. Unfortunately, the lack of expression of a credentialed therapeutic target in this subtype of breast cancer limits the effective treatment options. Of interest, TNBCs often express increased EGFR protein, but in early clinical trials, response rates to EGFR inhibitors were minimal.
One potential therapeutic opportunity in tumor subtypes that do not have a known therapeutic target could include targeting HSP90 function. Although HSP90 protein expression was reported to be relatively low in TNBC compared to other subtypes, this early report only evaluated nine tumors [21]. More encouragingly, in pre-clinical models, TNBCs have been sensitive to HSP90 inhibitors [22,23]. Similarly to HER2 positive tumors, TNBCs were sensitive to HSP90 inhibition through down-regulation of components of the Ras/Raf/MAPK pathway in preclinical and in vitro studies [23]. Because HSP90 is a central integrator of multiple pathways, its activation may maintain the malignant phenotype, facilitate metastasis, and promote treatment resistance under the stress of cancer therapy in multiple breast cancer subtypes. It has been suggested that HSP90 up-regulation may be a sign of poor disease prognosis [24], and a recent study has demonstrated that co-expression of HSP90 and PI3K, or expression of HSP90 in combination with the loss of PTEN, was associated with significantly worse recurrence-free survival in patients with breast cancer [25]. However, adequately powered population studies correlating up-regulated HSP90 with prognosis in breast cancer patients have not been performed to date.
In this study, we exploited the availability of publicly available data and performed a genome scan for somatic copy number aberrations and gene expression profiling of primary breast tumors to address the general prognostic significance of gene amplification and high-level expression in breast cancer. We found that up-regulated HSP90 was one of the most significant poor prognosis factors in triple negative and HER2-/ER+ breast cancer subtypes. Our result suggested that targeting breast cancer with up-regulated HSP90 would potentially reduce the risk of lethal recurrence and distant metastasis.

Human breast tumor samples and data collection
A total of 4,010 breast cancer gene expression profiles were collected from 23 independent data sets (GSE22093, GSE17705, GSE11121, GSE12093, GSE7390, GSE5327, GSE6532, GSE1456, GSE2034, GSE3494, GSE26639, GSE20685, GSE23720, GSE21653, GSE16446, GSE23177, GSE19615, GSE12276, GSE9195, GSE17907, GSE16391, GSE22035 and GSE5460) available on the NCBI Gene Expression Omnibus (GEO). Primary breast tumor samples were obtained before treatment, and gene expression profiles were measured using the Affymetrix U133A or U133 Plus 2.0 expression array. Each dataset selected for this study had clinical outcome data and/or HER2, ER or PR status determined by immunohistochemistry (Additional file 1). Patients' unique IDs were also collected from the GEO series matrix files to ensure that no sample was included more than once. In addition, we successfully processed somatic copy number alterations (CNAs) of 481 breast invasive carcinoma samples that were measured using the Affymetrix Genome-Wide Human SNP Array 6.0; gene expression profiles of the same set of primary tumor samples were also measured using Agilent Expression 244 K microarrays by The Cancer Genome Atlas Project (TCGA).

Processing of gene expression data
Raw Affymetrix expression CEL files from each dataset were RMA (Robust Multi-array Average) normalized independently using Expression Console Version 1.1 (Affymetrix). All data were filtered to include only those probes on the HG-U133A platform. Assuming that the signal from the 69 Affymetrix control probes should be invariant, we captured the structure in those probes by taking their first 15 principal components, and then removed the contribution of those patterns from the expression of genes using Bayesian Factor Regression Modeling (BFRM) [26]. A Principal Component Analysis (PCA) and a heatmap were used to confirm dataset normalization (Figure 1 and Additional file 2). By this procedure, we generated a normalized gene expression dataset comprising 4,010 breast tumor samples.

Copy number analyses
Somatic copy number alterations (CNAs) of invasive breast cancer samples collected from 517 female patients were measured using the Affymetrix Genome-Wide Human SNP Array 6.0. CEL files were available from TCGA. SNP array data from matched blood lymphocytes or matched normal tissue were also available for 494 patients. We generated canonical genotype clusters using a data set of 799 Affymetrix Genome-Wide Human SNP 6.0 arrays measured on normal blood lymphocytes obtained from TCGA. In total, 1,831,105 SNP and copy number markers were analyzed to construct canonical clustering positions, and the Log R ratio (LRR) and B allele frequency (BAF) were calculated from raw CEL files using PennCNV-Affy [27]. Matched normal samples were genotyped using Affymetrix Genotyping Console (version 4) and all samples were compared to ensure there was no duplication. All copy number markers and SNPs with a genotype call rate higher than 90% were selected for tumor copy number analysis, and CNA calls were generated using genoCN software [28]. Genotype calls from normal tissues of the same individual were used in the genoCN analysis if they were available. Thirty-six samples for which parameter estimation failed after 200 iterations of the EM algorithm were removed from further study. All probe coordinates were mapped to the human genome assembly build 36 (hg18). In total, tumor copy number on chromosomes 1-22 and chromosome X was successfully measured in 481 TCGA breast tumor samples, and normalized gene expression data from the same set of samples were downloaded from TCGA.

Statistics analyses
We downloaded the Affymetrix U133A annotation file (hg18) from Affymetrix and removed probe sets that did not have a matched gene symbol or whose alignment did not match the chromosomal location of the gene (pseudogenes). Using all 4,010 samples, we classified the gene expression level at each probe set as low-level expression (bottom 10% of expression values), intermediate-level expression (middle 80%) or high-level expression (top 10%), and compared survival differences among those three groups using Cox-regression survival analyses. The sign of the regression coefficient was used to verify that high-level expression was associated with poor prognosis and low-level expression with better outcome. A total of 11,761 known genes were analyzed. Statistical analyses were performed using R Project for Statistical Computing (Augasse, Austria), Matlab (Natick, MA, USA) or STATISTICA (Tulsa, OK, USA). Kaplan-Meier survival analyses on selected genes were conducted using GraphPad (La Jolla, CA, USA).
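The three-level grouping described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the function names `percentile_cutoffs` and `categorize` and the toy values are hypothetical, and values tied with a cutoff are assigned to the extreme group, a detail the text does not specify.

```python
def percentile_cutoffs(values, low_frac=0.10, high_frac=0.10):
    """Return (low_cut, high_cut) so that roughly low_frac of the samples fall
    at or below low_cut and high_frac fall at or above high_cut."""
    ordered = sorted(values)
    n = len(ordered)
    n_low = round(n * low_frac)    # number of samples in the low group
    n_high = round(n * high_frac)  # number of samples in the high group
    low_cut = ordered[n_low - 1] if n_low > 0 else float("-inf")
    high_cut = ordered[n - n_high] if n_high > 0 else float("inf")
    return low_cut, high_cut


def categorize(values, low_frac=0.10, high_frac=0.10):
    """Label each expression value as 'low', 'intermediate' or 'high'."""
    low_cut, high_cut = percentile_cutoffs(values, low_frac, high_frac)
    labels = []
    for v in values:
        if v <= low_cut:
            labels.append("low")
        elif v >= high_cut:
            labels.append("high")
        else:
            labels.append("intermediate")
    return labels


# Toy example: 20 hypothetical expression values for one probe set.
expression = [float(x) for x in range(1, 21)]
groups = categorize(expression)
```

The resulting labels would then serve as the grouping variable in a survival model; the Cox regression itself is not reproduced here.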
To measure the correlation between copy number aberration and gene expression, we generated copy number calls at 1,794,774 probes on chromosomes 1-22 and chromosome X from all samples, including 857,551 SNPs and 937,223 CN markers. We classified the copy number call at each marker site as homozygous deletion (CN = 0), hemizygous deletion (CN = 1), normal copy number (CN = 2), low-level amplification (CN = 3) or high-level amplification (CN ≥4). We downloaded normalized expression data (level 2) from the TCGA database and analyzed the association between copy number and gene expression using analysis of variance (ANOVA). An associated region was required to cover at least five consecutive SNPs or CN markers and to be longer than 10 kb. A direct correlation was defined as amplification being associated with high-level expression and deletion being associated with low-level expression.
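As a rough illustration of this step, the sketch below maps integer copy number calls to the five states defined above and computes a one-way ANOVA F statistic for expression grouped by state. The helper names (`cn_state`, `group_by_copy_number`, `one_way_anova_f`) and the toy data are assumptions made for the example; converting F to a P value requires the F distribution from a statistics package and is left out.

```python
from collections import defaultdict


def cn_state(cn):
    """Map an integer copy number call to the state names used in the text."""
    if cn == 0:
        return "homozygous deletion"
    if cn == 1:
        return "hemizygous deletion"
    if cn == 2:
        return "normal"
    if cn == 3:
        return "low-level amplification"
    return "high-level amplification"  # CN >= 4


def group_by_copy_number(pairs):
    """Group expression values by copy number state.
    `pairs` is an iterable of (copy_number, expression) tuples."""
    groups = defaultdict(list)
    for cn, expr in pairs:
        groups[cn_state(cn)].append(expr)
    return dict(groups)


def one_way_anova_f(groups):
    """One-way ANOVA F statistic for a list of non-empty groups."""
    groups = [g for g in groups if g]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: (copy number, expression) for nine hypothetical samples.
samples = [(1, 1.0), (1, 2.0), (1, 3.0),
           (2, 2.0), (2, 3.0), (2, 4.0),
           (4, 5.0), (5, 6.0), (6, 7.0)]
by_state = group_by_copy_number(samples)
f_stat = one_way_anova_f(list(by_state.values()))
```

In this toy data expression rises with copy number, which is the "direct correlation" pattern the text describes.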

Analysis of 4,010 breast cancer samples
To conduct a genome-wide survey for poor prognosis-associated genes in breast cancer, we compiled a collection of breast tumor gene expression data (n = 4,010) derived from 23 datasets that were posted on the NCBI Gene Expression Omnibus (GEO, Table 1) and normalized by Bayesian Factor Regression Modeling (BFRM) to remove technical variation (Figure 1A; Additional file 2) [26]. In addition to the raw expression data, we also obtained clinical outcome data from a subset of the samples (Additional file 1), which included data on overall survival (n = 1,027), recurrence-free survival (n = 1,372), and distant metastasis-free survival (n = 2,187), as well as disease-specific survival (event of death from breast cancer, n = 395).
As shown in Table 1, the majority of samples lacked the molecular analysis of HER2, ER and PR expression as measured by immunohistochemistry (IHC) or fluorescent in situ hybridization (FISH) analysis. Nevertheless, we found significant correlations between mRNA expression level and reported HER2, ER or PR status measured by IHC (P < 1 × 10⁻⁸, Mann-Whitney U test, Additional file 3), which was consistent with previous reports that ER, HER2 and PR biochemical status was concordant with Affymetrix microarray data [29,30]. By fitting two normal distributions to the mRNA expression of the IHC-positive and IHC-negative groups, we identified a bimodal cutoff that represents the maximum likelihood of IHC status, using samples where the biochemical status of HER2 (n = 1,004), ER (n = 2,771) and PR (n = 1,559) was available [29], and then applied this predictive cutoff to the entire set of 4,010 samples (Additional file 4). Clinical outcomes of gene expression-defined subtypes were highly concordant with IHC subtypes (Additional file 4). When mRNA expression of HER2, ER and PR were applied together, the overall accuracy for HER2+, triple-negative and HER2-/ER+ was 91.7%, 91.5%, and 89.6%, respectively, compared with the biochemically defined breast cancer subtypes (Figure 1).
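One way to read the cutoff procedure is sketched below: a normal distribution is fitted to the expression values of the IHC-negative and IHC-positive samples, and the cutoff is the point between the two means where both fitted densities are equal, so that each side is assigned to the more likely status. This is an interpretation under assumptions, not the published implementation; `bimodal_cutoff` and the toy values are hypothetical, and the two classes are weighted equally.

```python
import math
from statistics import mean, stdev


def normal_logpdf(x, mu, sd):
    """Log density of a normal distribution with mean mu and standard deviation sd."""
    return -math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - mu) / sd) ** 2


def bimodal_cutoff(negative, positive, iterations=100):
    """Expression value between the two group means at which the fitted
    normal densities of the negative and positive groups are equal."""
    mu_n, sd_n = mean(negative), stdev(negative)
    mu_p, sd_p = mean(positive), stdev(positive)

    def diff(x):
        # > 0 where the positive group is more likely, < 0 where the negative is
        return normal_logpdf(x, mu_p, sd_p) - normal_logpdf(x, mu_n, sd_n)

    lo, hi = mu_n, mu_p
    if lo >= hi or diff(lo) >= 0 or diff(hi) <= 0:
        raise ValueError("groups are not separated enough to bracket a cutoff")
    for _ in range(iterations):  # bisection on the sign change of diff
        mid = (lo + hi) / 2
        if diff(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Toy example: hypothetical log-scale expression for samples with known IHC status.
ihc_negative = [1.0, 2.0, 3.0]
ihc_positive = [5.0, 6.0, 7.0]
cutoff = bimodal_cutoff(ihc_negative, ihc_positive)
predicted = ["positive" if x > cutoff else "negative" for x in [2.5, 4.5, 6.5]]
```

Samples without IHC data would then be called positive or negative by comparing their expression with `cutoff`.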

Genome-scan of copy number aberration in 481 breast cancer samples
Chromosomal aberrations reflect oncogene activation and loss of tumor suppressor genes. Surveys of DNA gain or loss have been considered a fertile area to search for determinants of treatment response and disease outcome in human cancer cells. In breast cancer, it has been reported that 44% to 62% of highly amplified genes were over-expressed [31,32] and at least 12% of the total variation in gene expression was directly attributed to copy number aberrations [33]. TCGA data provide a unique opportunity to enable different and potentially complementary forms of analysis of cancer phenotypes given the comprehensive nature of the datasets generated in this effort. We were particularly interested in the opportunity to link genomic copy number alterations with the observed gene expression profile and clinical data as a strategy to identify genomic determinants of poor prognosis. We therefore performed a genome-scale analysis of chromosome alteration using 481 breast cancer samples obtained from the TCGA project, from which combined expression and copy number data were available. We revealed the distribution of copy number amplifications and deletions across the entire genome ( Figure 2). As expected, we observed that 23.7% of breast cancer samples had amplification (CN ≥3) on the HER2 coding region. Although copy number abnormalities on chromosome 1, 8, 11 and 16 are more common in studied populations (n = 481), we found that in most chromosome regions, both amplifications (CN ≥3) and deletions (CN ≤1) occurred in approximately 10% of analyzed samples (Figure 2).

Identification of genes that were correlated with risk of death from breast cancer
The large cohort of 4,010 gene expression samples provided an opportunity to define a subpopulation of patients containing either extremely high or low expression levels of candidate genes and to identify genes whose high-level expression is predominant in a poor prognosis stage compared to a better prognosis stage.
To determine poor prognosis-associated genes, we performed two stage analyses. In the first stage, we selected a universal cut-off and assigned each of the 4,010 samples into low, intermediate and high expression categories for each of 11,761 known genes. Then, we carried out an unbiased, genome wide Cox-regression survival analysis, comparing the prognosis difference among those three groups. By doing this, poor prognosis-associated genes should show a poor prognosis in the high expression group and a better outcome in the low expression group. In the second stage, we further assessed the poor prognosis correlation of the identified genes using gene-expression as a continuous variable and sought to correlate copy number aberrations with gene expression by measuring if amplification was correlated with high-level expression and deletion was associated with low-level expression.
Starting with the extreme, we defined the lowest 10% of expression values across the entire 4,010 samples as low-level expression and the highest 10% of expression values as high-level expression. Using death from breast cancer as the incident event, we carried out a genome wide Cox-regression survival analysis and identified 152 genes whose high-level expression was significantly associated with higher risk of death from breast cancer (P < 0.01, Figure 2 and Additional file 5). In addition, we assigned each of the 4,010 samples into first quartile (lowest 25%), second quartile (intermediate 50%) and third quartile (highest 25%) subgroups according to the expression levels of the 152 identified genes, and compared prognosis differences among these subgroups. Furthermore, we applied expression signal as a continuous variable to measure the distribution of the identified genes. A total of 47 of the 152 genes showed linear correlation between increased expression and poor prognosis. The highest risk of death from breast cancer was observed in patients with either top 10% or 25% higher level gene expression (P < 0.05, Additional file 5).
Since amplifications or deletions are likely to control the expression of genes within the corresponding region, and the correlation between copy number and expression has been recently suggested as an approach to predict the authentic molecular drivers in carcinogenesis [34], we then extended this analysis of gene expression to assess the correlation between somatic copy number alterations and gene expression using 481 invasive breast cancer samples obtained from TCGA. We found that 26 of 47 poor prognosis-associated genes showed a significant correlation between copy number aberrations and mRNA expression (P < 1 × 10⁻⁸, ANOVA, Additional file 5 and Additional file 6). To support this modeling, we analyzed the expression of HER2, a well known oncogene associated with poor prognosis based on increased copy number and high gene expression. As expected, high-level expression of HER2 was driven by coding region amplification and was significantly associated with poor prognosis (Additional file 5). Importantly, we found both cytoplasmic HSP90 isoforms, HSP90AA1 and HSP90AB1, were among the most significant factors that led to higher risk of death from breast cancer, indicating that HSP90 plays an important role in modulating poor prognosis phenotypes in breast cancer (Additional file 5).

Increasing expression of HSP90 was correlated with poor prognosis of breast cancer
To address the extent to which HSP90 is a prognostic factor in breast cancer, we analyzed the correlation between HSP90 expression and clinical disease outcomes, such as survival, recurrence, and metastasis, in different subtypes of breast cancer. Other HSP90 isoforms, such as HSP90B1 and TRAP1, may affect treatment responses in specific subtypes of breast cancer and this effect could be largely diluted in the analysis of a heterologous population. Therefore, HSP90B1 and TRAP1, as well as HSP transcriptional factor 1 (HSF1), were also included.
We assessed the correlation between mRNA expression and poor prognosis in different breast cancer subtypes using Cox-regression survival analysis and compared survival differences between high-level expression (top 10% or 25%) and low-level expression groups using Kaplan-Meier survival analysis. To determine whether high-level expression of HSP90 isoforms was a truly independent prognostic factor, we conducted Cox Proportional-Hazards Regression (COXPH) survival analyses to quantify the hazard ratios associated with high expression and their significance when considered alongside other clinical variables, such as size, grade, nodal status, age, HER2, ER and PR, in the whole cohort and in the relevant subtype of cancer. We found that high-level expression of HSP90AA1 independently led to higher risk of death from breast cancer in TNBC, while HSP90AB1 caused poor survival among patients with the HER2-/ER+ breast cancer subtype through increased risk of distant metastasis (Table 2 and Additional file 7). High-level expression of HSP90AB1 was an independent factor affecting disease-specific survival (death from breast cancer) and overall survival of breast cancer (Table 2). In addition to these findings, we found that a higher risk of recurrence in HER2+ and HER2-/ER+ breast cancer subtypes was significantly correlated with increased expression of HSP90AA1 and HSP90B1, and that increasing expression of HSP90AA1 and HSP90AB1 was significantly associated with a higher chance of distant metastasis in patients with HER2-/ER+ tumors (Additional file 7).
Among patients with TNBC, higher expression of HSP90 isoforms (HSP90AA1, HSP90AB1, HSP90B1 and TRAP1) was correlated with higher risk of recurrence.
However, these significant interactions were not observed after adjusting for multiple clinical variables. This might reflect the fact that the entire set of clinical variables was only available in a small proportion of the samples. It also indicated that a single HSP90 isoform might have only a slight influence on disease outcome, such that the combined effect becomes clinically significant only when several interactions occur together. Nevertheless, high-level expression of HSF1 was an independent factor for recurrence in TNBC (Additional file 7).

Amplifications of HSP90AA1, HSP90AB1 and HSF1 collectively defined a subpopulation of breast cancer samples with up-regulated HSP90 gene expression
We found a significant association between gene expression and copy number aberrations in HSP90AA1, HSP90AB1, TRAP1 and HSF1 (P < 1 × 10⁻⁸, ANOVA; Figure 2) and a trend for significant correlation in HSP90B1 (P < 1 × 10⁻⁵, ANOVA; Figure 2), indicating that high-level expression of HSP90 and HSF1 was driven by gene amplification. Although hemizygous deletion of HSP90 isoforms and HSF1 was found in 4.37% to 18.09% of breast cancer samples, homozygous deletion was uncommon. Only 1 of 481 (0.2%) breast cancer samples had a two-allele deletion on the TRAP1 coding region, and no patients carried a homozygous deletion of other HSP90 isoforms or HSF1, suggesting that loss of expression of HSP90 is a rare event in breast cancer.
On the other hand, we found that amplification of HSP90AA1 and HSP90AB1 was a predominant genomic feature of the highest 10% of HSP90AA1 (P = 0.0001, n = 481, Fisher's exact test) and HSP90AB1 (P = 2.71 × 10⁻⁶, n = 481, Fisher's exact test) expressing tumors. High-level amplification of HSF1 (CN ≥4) was significantly enriched in the samples with the highest 20% of HSF1 (P = 3.30 × 10⁻¹⁰, n = 481, Fisher's exact test) expressing tumors. When samples with the highest 10% of HSP90AA1 and/or highest 10% of HSP90AB1 expressing tumors were combined with the highest 20% of HSF1 expressing tumors, this collective set of samples clearly captured the subpopulation of amplified HSP90 (P = 3.99 × 10⁻²⁵, n = 481, Fisher's exact test). Because high expression of HSP90AA1, HSP90AB1 and HSF1 was driven by amplification, and high-level amplification of HSF1 was associated with higher expression of HSP90 in unamplified HSP90 samples, we defined up-regulated HSP90 as a collection of samples with the top 10% high expression value of HSP90AA1 and/or HSP90AB1, and the top 20% higher expression of HSF1.
Using these definitions, up-regulated HSP90 accounted for 31% of the breast cancer population (Additional file 1) and up-regulated HSP90 was significantly correlated with higher expression of all HSP90 isoforms (P < 1 × 10⁻⁸, Mann-Whitney U test, Additional file 8).
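The definition and the enrichment test used in this section can be illustrated with a short sketch. It assumes that the three expression criteria are combined as a union (a sample is flagged if it meets any one of them), which is how the phrase "combined with" is read here, and it implements a standard two-sided Fisher's exact test from the hypergeometric distribution. All function names and toy values are hypothetical.

```python
from math import comb


def top_fraction_flags(values, frac):
    """True for samples whose value is within the top `frac` of the cohort."""
    k = max(1, round(len(values) * frac))
    threshold = sorted(values, reverse=True)[k - 1]
    return [v >= threshold for v in values]


def upregulated_hsp90(hsp90aa1, hsp90ab1, hsf1):
    """Flag samples in the top 10% of HSP90AA1 or HSP90AB1, or the top 20% of HSF1
    (union of the three criteria; this reading is an assumption)."""
    a = top_fraction_flags(hsp90aa1, 0.10)
    b = top_fraction_flags(hsp90ab1, 0.10)
    c = top_fraction_flags(hsf1, 0.20)
    return [x or y or z for x, y, z in zip(a, b, c)]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy cohort of 10 samples with hypothetical expression values.
aa1 = [10, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ab1 = [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
hsf = [1, 2, 10, 9, 3, 4, 5, 6, 7, 8]
flags = upregulated_hsp90(aa1, ab1, hsf)

# Toy 2x2 table: rows = high/low expression, columns = amplified/not amplified.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

With real data, the 2x2 table would cross-tabulate membership in a high-expression group against amplification status of the corresponding gene.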

Up-regulated HSP90 was independently correlated with poor prognosis in HER2 negative breast cancer subtypes
To investigate the correlation of up-regulated HSP90 and poor breast cancer prognosis, we performed a univariate Kaplan-Meier survival analysis and a multivariate Cox Proportional-Hazards Regression (COXPH) survival analysis using other poor clinical outcome-associated clinical cofactors, such as tumor size, grade, nodal status, age, and HER2, ER and PR status, as covariates. We found that up-regulated HSP90 was significantly associated with a higher risk of death from breast cancer (P = 0.0049, n = 395, Figure 3B) and poor overall survival in a subset of 1,027 patients in which overall survival data were available (P = 0.0034, log-rank Mantel-Cox test, Figure 3C). This poor prognosis phenotype was independent of clinical cofactors (P = 0.0062, n = 421, COXPH test, Table 3 and Additional file 9). Furthermore, we found that up-regulated HSP90 was significantly associated with a higher risk of recurrence and distant metastasis in TNBC and breast cancer with the HER2-/ER+ phenotype (Additional file 10). Up-regulated HSP90 was an independent factor that led to higher risk of death from breast cancer in the HER2-/ER+ breast cancer subtype (P = 0.0042, n = 421, COXPH test, Table 3), with a trend toward a significantly higher risk of distant metastasis in this subtype (Table 3). In particular, up-regulated HSP90 independently increased risk of recurrence in TNBC (P = 0.0101, n = 421, COXPH test, Table 3; Additional file 9), and more than 70% of TNBC patients with up-regulated HSP90 had disease recurrence within eight years after initial treatment (Additional file 10).
Figure caption (fragment): P values were calculated using the log-rank (Mantel-Cox) test. Tick marks indicate patients whose data were censored by the time of last follow-up.
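For readers unfamiliar with the univariate analysis used here, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is a textbook implementation, not the GraphPad procedure used in the study; `kaplan_meier` and the toy follow-up data are hypothetical, and events are coded 1 for an observed event and 0 for a censored observation.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    Returns a list of (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored subjects both leave the risk set
    return curve


# Toy follow-up data (years) for five hypothetical patients.
follow_up = [1, 2, 3, 4, 5]
event = [1, 0, 1, 0, 1]  # 1 = event observed, 0 = censored
km_curve = kaplan_meier(follow_up, event)
```

Censored patients reduce the number at risk without lowering the curve, which is why they appear as tick marks rather than steps on a survival plot.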

Discussion
The phenotypic heterogeneity of cancer arises as a consequence of numerous genetic abnormalities (such as somatic mutations and chromosomal aberrations) acquired during tumor development and results in the formation of a disease that is enormously complex and highly variable between patients. An ability to dissect this heterogeneity will facilitate a deeper understanding of the relevance of these alterations for disease phenotypes by which to develop rational therapeutic strategies that can be matched with the characteristics of the individual patient's tumor. In fact, this has already been achieved in some instances of breast cancer where HER2-positive tumors are treated with trastuzumab or lapatinib, and ER-positive tumors are treated with antihormonal therapy. To identify additional molecular characteristics for a more effective treatment of breast cancer, an approach to rapidly and efficiently leverage available breast cancer genomic data and correlate both genetic and clinical features and outcomes is urgently needed. Gene expression profiling has become a major tool for the study of breast cancer and substantial amounts of data are available from public databases. To date, microarray data from more than 6,000 primary breast cancer samples have been posted on the Gene Expression Omnibus (GEO) database. To capture the complexity of breast cancer heterogeneity and pinpoint molecular factors that can be therapeutically targeted, we compiled a large collection of breast tumor gene expression data (n = 4,010) derived from 23 datasets that were published from October 2005 to February 2011, including subsets of samples in which clinical prognosis data were available. We identified a series of genes whose high-level expression increased the risk of death from breast cancer, which may be exploited to improve the effectiveness of clinical intervention in this disease. 
We found that HSP90AA1 and HSP90AB1, two cytoplasmic HSP90 isoforms, were among the most significant factors of poor prognosis in different breast cancer subtypes. As one of the most abundant proteins in malignant cells and a key factor that stabilizes oncoproteins involved in cancer growth and survival, our results suggest that increased HSP90 expression may play an important role in promoting aggressive breast cancer phenotypes. Furthermore, we found that highly expressed HSP90AA1, HSP90AB1 and HSF1 were driven by somatic amplifications, which collectively were found in approximately 30% of tumors, which we classified as up-regulated HSP90. We revealed that up-regulated HSP90 was significantly associated with risk of death from breast cancer among patients with HER2-/ER+ breast cancer, and greatly increased the chance of disease recurrence in TNBC, and these interactions were independent of clinical variables.
Perhaps the most significant challenge presented by the complexity of breast cancer is the ability to design and develop therapeutic regimens that can match the characteristics of the individual patient's tumor to achieve the goal of personalized cancer treatment. In addition to the well-credentialed or previously described genes HER2 and GRB7, we found additional factors associated with an increased risk of death from breast cancer, such as CUTL1 [35], CTTN [36] and GINS2 [37], which have been previously linked with poor prognosis of breast cancer. This reflects the nature of cancer heterogeneity in which multiple mutations and alterations generate the cancer phenotype. The development of therapeutic strategies that can completely and precisely match the complexity of breast cancer with equally complex combinations of regimens will be clinically challenging, particularly considering the need to utilize combinations of drugs that must be shown to be safe when combined. A more practical approach would prioritize the more universal molecular factors associated with aggressive behavior and poor prognosis, upon which more general therapeutic regimens can be developed for use in combinations. Previous reports have indicated that high expression of HSP90, assessed by protein expression analysis, is associated with a poor overall prognosis in breast cancer patients [24]. High HSP90 expression was associated with high expression of HER2 and ER, large tumors, high nuclear grade, and lymph node involvement [9]. Our results demonstrated that up-regulation of multiple isoforms of HSP90 in primary breast cancer was an independent factor for poor prognosis, indicating that HSP90-targeted therapies, in combination with cytotoxic chemotherapies or other targeted agents, may improve diagnosis and treatment of highly aggressive breast cancers.
Because HSP90 is a key component of oncogenic signaling, an increasing number of candidate HSP90 inhibitors have been developed and evaluated, both in preclinical models and in clinical trials. Although HSP90 inhibitors have exhibited clinical activity in the treatment of breast and other cancers, targeting HSP90 alone generally results in cytostatic rather than cytotoxic effects on tumors. In the majority of patients, disease progression occurs following cessation of treatment with an HSP90 inhibitor [8]. Our results suggest that up-regulated HSP90 might not be an independent poor prognosis factor among patients with HER2-positive breast cancer, as no statistically significant correlation was observed between poor survival and high-level expression of any HSP90 isoform. This is consistent with the previous finding that the most common clinical response in patients with HER2-positive breast cancer who received HSP90 inhibitor monotherapy is stable disease. In contrast, multiple studies using cell-based or various tumor xenograft models of breast cancer have shown a large degree of synergy by combining HSP90 inhibitors with therapies targeting HER2 (such as trastuzumab or lapatinib) [38,39]. Indeed, in animal xenograft models, tumors often do not immediately re-grow upon drug withdrawal, and often significant tumor regression can be observed [17]. In clinical trials, chronic administration of the majority of HSP90 inhibitors is well tolerated by humans, with manageable toxicity. At first glance this seems surprising given the essential role of the protein in numerous normal cellular processes; however, the apparent lack of toxicity of HSP90 inhibitors may be related to the recent realization that cancer cells are addicted to HSP90, a prime example of tumor cell non-oncogene addiction [8]. This may provide a sufficiently large therapeutic window for the safe use of HSP90 inhibitors in cancer.
Additionally, there is evidence that oncogenic clients can alter the conformation of HSP90. Several inhibitors of the protein have been developed that only recognize this activated conformation [40,41], suggesting an even greater therapeutic index.
TNBC has been considered a more aggressive breast cancer subtype with a higher rate of distant recurrence and a poorer prognosis [19,20]. We found that increased expression of each of the HSP90 isoforms was correlated with a higher risk of recurrence, and that more than 70% of patients with up-regulated HSP90 experienced disease recurrence within eight years after initial treatment. This suggests that TNBC patients might benefit from therapies that target multiple HSP90 isoforms, such as HSP90AA1, HSP90AB1 and TRAP1. In fact, in preclinical models, TNBCs have been sensitive to HSP90 inhibitors [22,23]. Similar to HER2-positive tumors, TNBCs were sensitive to HSP90 inhibition through down-regulation of components of the Ras/Raf/MAPK pathway in preclinical and in vitro studies [23]. Furthermore, our results demonstrated that up-regulated HSP90 was also a significant prognostic factor in HER2-/ER+ breast cancers, suggesting a broad application of HSP90-targeted therapies in the 80% of breast cancers that do not overexpress HER2. In addition, other hormone receptors, such as the androgen receptor (AR), utilize HSP90, which provides a rationale for the use of HSP90 inhibitors and AR antagonists in the subset of AR+ breast cancers. Given that HSP90 is one of the most abundant proteins in breast cancer cells, and that HSP90 has been proposed as a potential therapeutic target for other cancers, including non-small cell lung cancer [42], our results indicate that HSP90 is an important oncogenic signaling node in breast cancer, whose high expression is associated with aggressive behavior and poor prognosis. Diagnostic and therapeutic strategies directed to cancers expressing high levels of HSP90 are warranted.