Recent translational research: microarray expression profiling of breast cancer – beyond classification and prognostic markers?

Genomic expression profiling has greatly improved our ability to subclassify human breast cancers according to shared molecular characteristics and clinical behavior. The logical next question is whether this technology will be similarly useful for identifying the dominant signaling pathways that drive tumor initiation and progression within each breast cancer subtype. A major challenge will be to integrate data generated from the experimental manipulation of model systems with expression profiles obtained from primary tumors. We highlight some recent progress and discuss several obstacles in the use of expression profiling to identify pathway signatures in human breast cancer.


Breast Cancer Research Vol 6 No 5 Wilson and Dering
Gene expression profiling has refined the classification of human breast cancers into distinct subtypes that can be recognized in separate patient cohorts even when different microarray platforms are used [1][2][3][4][5]. Although the estrogen receptor (ER) and the HER-2 gene (ERBB2) remain central classifiers, the contribution of cell type has emerged as a dominant feature in gene expression profiles that segregate primary human breast cancers (Fig. 1). The biological relevance of this classification scheme is validated by clinical observations. For example, ER-negative tumors expressing basal markers exhibit a poor clinical outcome, whereas ER-positive, luminal cancers are associated with a favorable prognosis [2,4,5,6].
A logical next step is to delineate the dominant signaling pathways that drive the pathogenesis of the different breast cancer subtypes. Will expression profiling of breast cancers help achieve this goal? Can this approach facilitate the identification of new drug targets and improve the efficacy of existing targeted therapies? We believe the answer is yes, but we recognize that many significant challenges remain. One of the most critical, in our view, is the integration of expression data from primary human breast cancers with data obtained from the experimental manipulation of model systems.
The response of human breast cancer cells to estrogen (E2) and anti-estrogens is thoroughly examined by gene expression profiling in two recent reports [7,8]. These new studies provide an opportunity to assess whether data generated in cell line models can be used to recognize the gene activity linked to important signaling pathways in primary tumors.
In the present commentary, we examine the feasibility of integrating microarray data generated from primary breast cancers with pathway-specific expression profiles generated experimentally. We critically explore several issues related to data quality, gene coverage and platform compatibility, as well as the confounding effect of cell type origin on the identification of the ER signaling pathway in gene expression profiles of human breast cancers.

How good are the data?
Specifically, are the data sufficiently quantitative to allow for the recognition of coordinated patterns of gene expression indicative of a particular signaling pathway? To determine what we might expect under the best circumstance, we examine selected genes whose expression should be particularly well coordinated in breast cancer cells.
ERBB2 is amplified and pathologically overexpressed in about 25-30% of breast cancers [9] along with the neighboring gene GRB7 [10]. The log ratios or intensity values for these two probes were downloaded from each of four publicly available primary breast cancer microarray data sets [3][4][5][11]. High positive correlation coefficients for ERBB2 and GRB7 co-expression, ranging from 0.633 to 0.910 (Table 1 and Fig. 2), were found in all four data sets. For each study, the corresponding graph in Fig. 2 provides a good indication of which tumors are amplified at the ERBB2 locus.
We also looked for co-expression of cytokeratins as another measure of data quality. Cytokeratins are abundant proteins that form the intermediate filaments of epithelial cells. The basic units of the fibers are heterodimers of one type I cytokeratin and one type II cytokeratin subfamily member [12], and distinct 'expression pairs' have been identified, including KRT5/KRT14 and KRT8/KRT18 [13]. Thus, these genes should show a high degree of co-expression. In every case where the probes were present, the correlation coefficients were high for the co-expression of KRT8 with KRT18 and for the co-expression of KRT5 with KRT14 (Table 1). These correlation coefficients were in a similar range of high significance to that observed for genes co-amplified with ERBB2 (GRB7 and STARD3).
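The co-expression checks described above amount to a correlation coefficient computed over the tumors in which both probes have a value. The following is a minimal sketch, assuming a Pearson coefficient (the text says only "correlation coefficient") and using a hypothetical helper `pearson_r` with made-up log ratios; it does not reproduce the values in Table 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation over samples where both probes have a value."""
    # Drop tumors with a missing value (None) for either probe.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a probe with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log ratios for six tumors (illustrative values, not study data).
erbb2 = [-0.2, -0.1, 1.1, 0.0, 0.9, None]
grb7 = [-0.3, 0.0, 0.9, -0.1, 1.0, 0.2]
r = pearson_r(erbb2, grb7)
print(f"ERBB2 vs GRB7: r = {r:.3f}")
```

Excluding incomplete pairs rather than imputing them keeps the coefficient tied to measured values, at the cost of a smaller sample when many values are missing.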
These and other examples confirm that microarray platforms have, in fact, generated high-quality gene expression data with a strong quantitative character for RNA isolated from human tumor samples. In general, the correlations and gene coverage were highest in the data of van't Veer and colleagues [11], which used a 60-mer oligonucleotide array platform representing approximately 25,000 genes. Also, these data had the fewest missing values. The reference in this study was a pool of RNA extracted from all 78 sporadic tumors. Interpreting the data intensity values in this case is simplified since it is intuitive to think of zero as the average expression of a particular gene in this breast cancer cohort [11]. We have focused on the van't Veer and colleagues data for tumor comparisons in the remaining discussions.
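The point about zero representing average expression follows from how a log ratio against a pooled reference behaves. As a small illustration, assuming hypothetical intensities and a hypothetical helper `log10_ratio` (neither is taken from the study):

```python
import math

def log10_ratio(tumor_intensity, reference_intensity):
    """Log10 of tumor signal over pooled-reference signal for one probe."""
    if tumor_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log10(tumor_intensity / reference_intensity)

# Hypothetical probe intensities; the reference channel stands for the tumor pool.
reference = 500.0
tumors = {"tumor_1": 5000.0, "tumor_2": 500.0, "tumor_3": 50.0}
ratios = {name: log10_ratio(value, reference) for name, value in tumors.items()}

for name, ratio in ratios.items():
    print(name, round(ratio, 3))
```

A tumor whose signal matches the pool gives a ratio near zero, a tenfold excess gives a value near +1, and a tenfold deficit gives a value near -1.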

The ER signaling pathway is obvious in breast cancer expression profiles - or is it?
It has often been reported that the gene expression patterns associated with ER status in breast cancer are remarkably distinct and that the set of ER classifiers comprises up to several hundred genes [3,11,14,15,16,17]. We believe that many or even most of these 'ER predictors' primarily distinguish tumors according to cell-type origin (i.e. those tumors with predominantly 'luminal features' from those tumors with predominantly 'basal features' [1]) rather than according to ER regulation. To make this argument, the expression ratios of estrogen receptor message (ESR1) and ERBB2 from the van't Veer and colleagues data [11] were used to divide the 78 sporadic tumors into five groups. The samples are arranged from highest to lowest ESR1 level in Fig. 3a, with the ERBB2 tumors grouped separately. The ERBB2 tumors were identified by positive values for both ERBB2 and GRB7 (Fig. 3b). A sixth group was defined based on BRCA1/BRCA2 mutation status (Fig. 3).

Figure 1
Cell-type origin model for the classification of human breast cancers. Illustration of the relationship between cell type and the two main branches of the tumor subclassification schema. ER, estrogen receptor.

Figure 3 highlights several important features of the data. The first is that ESR1 is expressed as a continuous variable, whereas ERBB2 and GRB7 have essentially a binary expression pattern due to gene amplification. Second, none of the ERBB2 samples has an above-average ESR1 value (a positive log ratio). This is consistent with other, larger data sets measuring ERBB2 and ER protein levels as continuous variables, where it has been suggested that low ER levels contribute to the reduced anti-estrogen sensitivity of ERBB2-amplified tumors [18]. Another feature is that the assignment of -0.5 as the cutoff for 'true' ER negativity results in 17/18 of the BRCA1 tumors being classified as ER-negative, as has been confirmed elsewhere [19]. Only the two BRCA2 samples (Fig. 3a) and a single BRCA1 tumor appear to express any ESR1.
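The grouping rules described above can be sketched as a small decision function. Only two rules come from the text: ERBB2 tumors have positive log ratios for both ERBB2 and GRB7, and -0.5 is the ESR1 cutoff for 'true' ER negativity. The boundaries between the remaining ESR1 groups (`weak_upper`, `moderate_upper`) and the group labels are hypothetical placeholders, since the text does not state them.

```python
def assign_group(esr1, erbb2, grb7,
                 er_negative_cutoff=-0.5,  # cutoff stated in the text
                 weak_upper=0.0,           # hypothetical boundary
                 moderate_upper=0.5):      # hypothetical boundary
    """Assign a sporadic tumor to one of five groups from its log ratios."""
    # ERBB2 tumors: positive values for both ERBB2 and its neighbor GRB7.
    if erbb2 > 0 and grb7 > 0:
        return "ERBB2"
    # 'True' ER-negative tumors fall below the ESR1 cutoff.
    if esr1 < er_negative_cutoff:
        return "ER-negative"
    # Remaining tumors are split by ESR1 level (boundaries are assumptions).
    if esr1 < weak_upper:
        return "weak ESR1"
    if esr1 < moderate_upper:
        return "moderate ESR1"
    return "high ESR1"

# Hypothetical log ratios (ESR1, ERBB2, GRB7) for four tumors.
examples = [(-1.0, 1.2, 0.8), (-0.8, -0.1, -0.2), (-0.2, 0.1, -0.1), (0.7, -0.1, 0.3)]
for esr1, erbb2, grb7 in examples:
    print(esr1, erbb2, grb7, "->", assign_group(esr1, erbb2, grb7))
```

Checking the ERBB2 rule first mirrors the description that ERBB2 tumors are grouped separately from the ESR1-ordered samples.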
We also classified the tumors from the breast cancer data sets [3][4][5] based on the level of ESR1 alone and compared these results with the subgroups generated by the various clustering methods (see Additional file 1). All of the tumors defined as basal using clustering methods were in the lowest ESR1 category, and nearly all luminal A or luminal 1 tumors were found in the highest ESR1 groups. It is clear that the tumor groups defined by the extremes of ESR1 expression are the most easily recognized and consistently observed subtypes of breast cancer. Tumors with mid-range ESR1 expression (luminal B/C, luminal 2) will require further analysis with larger numbers of tumors to define the dominant molecular signals driving their pathogenesis.

Table 1 footnotes
a. If more than one probe was present for a gene, the one with the highest correlation and the fewest missing data values was used.
b. Correlation coefficients calculated from log10 intensity ratios from van't Veer and colleagues [11]. The 78 sporadic tumors and 20 BRCA1/BRCA2 tumors were used to compute the correlations.
c. Correlation coefficients calculated from log2 intensity ratios from Sørlie and colleagues [4].
d. Correlation coefficients calculated from log2 intensity ratios from Sotiriou and colleagues [5].
e. Correlation coefficients calculated from intensity values from West and colleagues [3]. The R² value for the plot of ERBB2 and GRB7 in Fig. 2a is based on log10(intensity).
f. N/A, data for one or both of the two genes not included in the publicly distributed data set.
As previously reported, the signatures of luminal cell types versus basal cell types are dominant in the expression profiles obtained from primary breast cancer [1][2][3][4][5]. One explanation for these strikingly different patterns is that the cell type in which the oncogenic transformation took place is fundamentally different in these two tumor groups (Fig. 1). The use of even a small number of markers illustrates this distinction. We show the co-expression of the prototypical luminal cytokeratins (KRT8/KRT18) and the basal cytokeratins (KRT5/KRT14) from the van't Veer and colleagues study [11] in Fig. 4.
The samples clearly show coordinated expression of KRT8 and KRT18 (Fig. 4a). The reverse is true for the expression of KRT5 and KRT14 (Fig. 4b), as well as for KRT5 and KRT17 (data not shown). It is the 'truly' ER-negative tumors that have high basal cytokeratin expression, and the four groups of luminal tumors cluster together in the negative region of the plot. There are hundreds of genes that divide these tumor samples into these two main groups (luminal and basal), and these genes are often a major component of the various ER discriminator gene sets [3,11,14,15,16,17]. However, the role of these genes in ER signaling and their regulation by either E2 or the ER remain unconfirmed.

Functional identification of E2-responsive genes
The ligand-dependent genomic action of the ER is relatively well understood and the in vitro analysis of E2-responsive genes in breast cancer cells has been actively pursued for many years [20]. Part of the Cunliffe and colleagues study [7] was to characterize the dynamic transcriptional response of two different breast cancer cell lines (MCF-7, T47D) to 17β-estradiol and anti-estrogens (ICI 182,780 and 4-hydroxytamoxifen) using a custom-made 10K cDNA array. This resulted in the identification of 386 hormone-responsive genes.

Available online http://breast-cancer-research.com/content/6/5/192

Figure 2
ERBB2 and GRB7 co-expression in microarray profiling data from primary breast cancers. The log ratios or log intensity values were downloaded for the ERBB2 and GRB7 probes from each of four publicly available microarray profiling data sets of primary breast cancers. (a) Log10 ratios generated using 60-mer oligonucleotide arrays for the 98 node-negative tumors (78 sporadic tumors and 20 BRCA1/BRCA2 mutant tumors) versus a pooled reference of all 78 sporadic breast cancer RNA from the van't Veer and colleagues data set [11]. (b) Log2 ratios generated using cDNA microarrays for 115 breast cancers and seven nonmalignant breast samples versus a universal reference RNA (mixed human cell lines) from the Sørlie and colleagues data set [4]. (c) Log2 ratios generated from cDNA microarrays for 99 unselected breast cancers versus a universal reference (mixed human cell lines) from the Sotiriou and colleagues data set [5]. (d) Log10 intensity values generated using Affymetrix oligonucleotide arrays for 49 breast cancers from the West and colleagues data set [3].
The study by Frasor and colleagues [8] was undertaken to better understand the transcriptional activities of selective ER modulators in breast cancer cells. They report the transcriptional changes induced in MCF-7 cells by E2, and classify the genes according to their response to the pure anti-estrogen ICI 182,780 and to trans-hydroxytamoxifen and raloxifene, using the Affymetrix Hu95A array. Their analysis identified a highly focused E2-responsive gene signature of 129 genes.
We compared the MCF-7 expression data in these two recent studies in terms of the E2 and ICI treatment responses (Table 2). The overlap between the two gene lists is surprisingly small.

Figure 4
Cell-type-specific cytokeratin expression in breast cancer subgroups. The log ratios for the cytokeratin probes for the van't Veer and colleagues tumor samples [11] are colored by group as in Fig. 3.

The issue of cross-platform consistency has recently been explored in detail [21,22]. In one study, the correlation for gene expression data from breast cancer cell lines obtained using Affymetrix HG-U95v2 and a custom-made cDNA array was found to be in the range 0.66-0.76 [21]. The differences between platforms appear to result from errors in chip fabrication, ambiguities in gene annotation, specificity differences inherent in the hybridization of oligonucleotides versus cDNA clones, and alternative methods for data filtering and normalization [21]. The authors found that the biological differences between the cell lines (e.g. BT-474 versus MCF-7) were more prominent than the variation between platforms. However, it appears probable that platform variability actually exceeds the changes in expression induced by treating MCF-7 cells with E2. In this light, the failure to find a robust consensus gene signature for the E2 signaling pathway is not surprising.
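A cross-study comparison of this kind is only meaningful for genes represented on both arrays, so one way to frame it is to restrict each responsive-gene list to the shared gene coverage before intersecting. The sketch below assumes a hypothetical helper `shared_signature` and placeholder gene symbols (G1 to G8); it does not reproduce the gene lists or the overlap reported in either study.

```python
def shared_signature(hits_a, hits_b, platform_a, platform_b):
    """Return genes called responsive in both studies, plus a Jaccard index.

    Only genes present on both platforms are considered, so that a gene
    missing from one array is not counted as a disagreement.
    """
    def normalize(genes):
        # Crude symbol harmonization; real annotation mapping is harder.
        return {g.strip().upper() for g in genes}

    comparable = normalize(platform_a) & normalize(platform_b)
    a = normalize(hits_a) & comparable
    b = normalize(hits_b) & comparable
    common = a & b
    union = a | b
    jaccard = len(common) / len(union) if union else 0.0
    return sorted(common), jaccard

# Placeholder gene symbols standing in for array content and responsive genes.
platform_a = ["G1", "G2", "G3", "G4", "G5", "G6"]
platform_b = ["G3", "G4", "G5", "G6", "G7", "G8"]
hits_a = ["g1", "g3", "g4"]
hits_b = ["G4", "G5", "G8"]

common, jaccard = shared_signature(hits_a, hits_b, platform_a, platform_b)
print(common, round(jaccard, 3))
```

Reporting a Jaccard index alongside the shared genes separates a small overlap caused by limited common coverage from one caused by genuine disagreement between the studies.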
Despite the disappointing overlap, some of the genes identified in these two studies are still likely to be valid targets of ER signaling. In order to investigate the feasibility of integrating experimentally generated pathway responses with data from primary tumors, we focused on a few specific genes identified in each of the in vitro studies.
The expression values for each gene were plotted against the ESR1 level for the sporadic tumors (n = 78) in the data of van't Veer and colleagues (Fig. 5).
We compared LIV-1, AREG, TFF1 and PGR with the ESR1 expression levels in the van't Veer and colleagues data [11]. LIV-1 and AREG were identified as E2-induced and ICI-repressed in the Frasor and colleagues study [8], and TFF1 was found to be E2-induced and ICI-repressed in the Cunliffe and colleagues study [7]. PGR, the best known target of liganded ER, was not identified in either study. Each experimentally identified target gene showed a reasonably good correlation with ESR1 in the van't Veer and colleagues data (Table 1), and an interesting pattern is evident when the data are plotted using the color-coded groups (Fig. 5a-d). There is a trend for higher expression of these E2-stimulated genes in the samples that express more ESR1: moderate ESR1 samples > weak ESR1 samples. We also examined the expression of several of the genes identified by Frasor and colleagues as downregulated by E2 and on which ICI acted as an antagonist. Two of these genes, TGFβ2 and NDRG1, are negatively correlated with ESR1 in the data of van't Veer and colleagues [11], consistent with their being targets of repression by ER signaling (Table 1 and Fig. 5e,f).
The quantitative expression of ER has been shown to have clear clinical implications in both the adjuvant and the metastatic setting in terms of response to anti-estrogen therapy [23][24][25]. This is certainly consistent with the concentration-dependent occupancy of cis-regulatory sites being a fundamental aspect of DNA-binding transcription factors. It is interesting to note that about one-half of the tumors with the highest ESR1 expression do not show co-expression of PGR, AREG or TFF1 (Fig. 5b-d). The ER signaling pathway in the samples with very high ESR1 expression may thus be fundamentally different. In a patient cohort very similar to this one (node-negative patients, younger than 60 years old at diagnosis), where ER protein was measured quantitatively, a paradoxical reduction in overall survival was associated with very high ER levels [26]. The clinical outcome data in the van't Veer and colleagues study also exhibit a trend towards a worse prognosis in the highest ESR1 subgroup (see Additional file 2).

Conclusions
Figure 5
Estrogen-modulated genes and their co-expression in estrogen receptor-positive primary breast cancers. The log ratios of the four experimentally confirmed, estrogen-induced genes (a) LIV-1, (b) TFF1, (c) PGR and (d) AREG were plotted against ESR1 from the van't Veer and colleagues data [11]. Data for genes shown to be repressed by estrogen treatment in MCF-7 cells [8] are also plotted against ESR1 (e,f).

The E2 response in breast cancer cells would seem to provide a straightforward opportunity for microarrays to [...] several factors in addition to the cross-platform comparison issues already discussed. One longstanding technical obstacle is the over-reliance on a single ER-positive breast cancer cell line (MCF-7) or on only a few ER-positive breast cancer cell lines (ZR-75-1, T47D) as experimental models. We have suggested that the true ER signature may be much smaller than originally proposed due to a failure to appreciate the close association between luminal cell differentiation and ER activity. The data in Figure 4 suggest that this association may be circumstantial. Alternatively, it is possible that breast cancers arise from a transforming event in multipotent progenitor cells and that acquired alterations (e.g. high autocrine ER stimulation or ERBB2 amplification) drive the differentiation of the malignant cells towards a luminal fate.
Global gene expression profiling has fundamentally changed the way we look at cancer cells by providing simultaneous measures of the activity of thousands of genes. We are optimistic that the quantitative ER pathway data generated by microarray experiments will better predict patient prognosis and clinical response to anti-estrogen therapy. Although many significant challenges remain, these technologies will undoubtedly play an important role as we identify new therapeutic targets and improve the efficacy of current breast cancer treatments.

Competing interests
None declared.