Microarrays in the 2010s: the contribution of microarray-based gene expression profiling to breast cancer classification, prognostication and prediction

Breast cancer comprises a collection of diseases with distinctive clinical, histopathological, and molecular features. Importantly, tumors with similar histological features may display disparate clinical behaviors. Gene expression profiling using microarray technologies has improved our understanding of breast cancer biology and has led to the development of a breast cancer molecular taxonomy and of multigene 'signatures' to predict outcome and response to systemic therapies. The use of these prognostic and predictive signatures in routine clinical decision-making remains controversial. Here, we review the clinical relevance of microarray-based profiling of breast cancer and discuss its impact on patient management.

molecular subtypes, often with identical histopathological features, do exist [11]. Moreover, numerous multigene signatures associated with prognosis and response to systemic therapies have emerged [1][2][3]. Some of these signatures are commercially available (Table 1), and two of them (MammaPrint, Agendia BV, Amsterdam, The Netherlands, and Oncotype DX, Genomic Health, Redwood City, CA, USA) are currently being tested in randomized prospective clinical trials [14,15]. Here, we discuss the clinical relevance of gene profiling in breast cancer and its potential impact on patient care.

Molecular classification of breast cancer
That breast cancer comprises a heterogeneous and complex group of tumors has been known for decades, and attempts to develop standardized classification systems to account for the diversity of this disease were initiated in the late 1960s [16]. Nevertheless, clinical and translational investigators had historically considered breast cancer to be a single group of tumors in the context of clinical trials. The observation that tumors with similar histopathological characteristics behaved in distinct manners was often used to disregard the histological heterogeneity of breast cancer.
The whole landscape of breast cancer research changed with the publication of seminal class discovery, microarray-based gene expression profiling studies [11][12][13], in which the heterogeneity and complexity of breast cancers were rediscovered at the molecular level (Figure 2). To the average 'microarrayer' and bioinformatician, the experiments performed by Perou and colleagues [11] may now sound almost quaint, but in 2000 they had a major impact on how breast cancer was perceived, given that they demonstrated that (a) ER-positive and ER-negative breast cancers were fundamentally distinct at the transcriptomic level and, together with subsequent studies, that (b) breast cancer could be divided into at least five molecular subtypes: luminal A, luminal B, normal breast-like, HER2, and basal-like [12,17] (Figure 2).
Estimation of the risk of recurrence (prognostic factors) and of the benefit from systemic treatments (predictive factors)
Several groups have now demonstrated that ER-positive and ER-negative breast cancers have their prognosis governed by distinct biological processes [18,19] and that at least some of these subtypes (for example, basal-like) have distinct risk factors, clinical presentation, histological features, response to therapy, and outcome [2,3,20]. These data have led some experts in the field to suggest that traditional clinicopathological features and immunohistochemical markers be replaced by this molecular taxonomy [21]. The initial approach employed for the identification of the molecular subtypes was based on hierarchical clustering analysis. It should be noted, however, that this approach requires large datasets, is to some extent subjective, and cannot be employed for the prospective classification of individual samples [22][23][24][25]. Therefore, 'single sample predictors' (SSPs) were developed on the basis of the correlation between the expression profile of a given sample and the centroid of each molecular subtype (that is, the average expression profile of each molecular subtype) [13,17,26]. Over the last decade, three distinct SSPs were developed [13,17,26]. Furthermore, on the basis of this approach, Parker and colleagues [17] developed a quantitative reverse transcriptase-polymerase chain reaction (qRT-PCR)-based or NanoString-based method (PAM50) that can be used to classify formalin-fixed paraffin-embedded (FFPE) samples into the molecular subtypes. Our group [27] and others [28,29] have demonstrated that subtle variations in data normalization and centering, as well as in the proportion of samples from each of the subtypes, may lead to changes in the classification of samples using SSPs. Moreover, independent groups have demonstrated that the classification of tumors into the molecular subtypes, except for the basal-like subtype, is dependent on the SSP used [27,28].
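The centroid-correlation principle behind SSPs can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure (published SSPs differ in the similarity measure and in how the data are normalized and centred), and the four-gene centroids are invented for illustration; they are not the centroids of any published SSP, including PAM50.

```python
# Minimal nearest-centroid single sample predictor (SSP) sketch.
# Assumptions: Pearson correlation as the similarity measure and toy
# four-gene centroids invented for illustration (not published centroids).
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def classify_ssp(sample, centroids):
    """Return the subtype whose centroid correlates best with the sample,
    together with the correlation to every centroid."""
    scores = {name: pearson(sample, c) for name, c in centroids.items()}
    return max(scores, key=scores.get), scores


# Hypothetical gene-centred log-expression centroids; the gene order
# (ESR1, ERBB2, KRT5, MKI67) is only a readability aid.
CENTROIDS = {
    "luminal A": [1.5, -0.2, -1.0, -0.8],
    "luminal B": [1.0, 0.0, -1.0, 0.9],
    "HER2": [-0.5, 2.0, -0.5, 0.6],
    "basal-like": [-1.5, -0.5, 1.8, 1.0],
}

subtype, scores = classify_ssp([1.4, -0.3, -0.9, -0.6], CENTROIDS)
print(subtype)  # luminal A
```

Because the input vector is gene-centred before it is compared with the centroids, centring the same sample against a cohort with a different subtype mix changes the vector and can change the call, which is the sensitivity to normalization and centring noted above.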
This is best exemplified by the modest agreement (64%; kappa = 0.527, 95% confidence interval 0.456 to 0.597) obtained when a cohort of 295 breast cancers was classified into the molecular subtypes using the SSPs of Sorlie's [13,30] and Perou's [26,31] groups, the authors of the original studies on the molecular classification.
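Kappa corrects the raw agreement between two classifications for the agreement expected by chance, which is why it is lower than the 64% raw agreement. A minimal sketch of Cohen's kappa follows; the labels are invented for illustration and are not the 295-sample data.

```python
# Cohen's kappa: agreement between two classifications of the same samples,
# corrected for the agreement expected by chance. Labels below are invented.
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Return Cohen's kappa for two equal-length lists of class labels."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both classifiers assign the same label independently.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


SSP_1 = ["LumA"] * 4 + ["LumB"] * 3 + ["Basal"] * 3
SSP_2 = ["LumA", "LumA", "LumA", "LumB", "LumB",
         "LumA", "LumB", "Basal", "Basal", "Basal"]
print(round(cohens_kappa(SSP_1, SSP_2), 3))  # 0.697
```

In this toy example, 80% raw agreement corresponds to a kappa of about 0.70.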
Despite the enthusiasm for the use of this molecular taxonomy for the design of clinical trials and in routine oncology practice, several issues ought to be considered. First, the subdivision of luminal tumors into A and B is strongly dependent on the SSP used [27] and principally depends on the expression of proliferation-related genes [17,26,32]; there is burgeoning evidence to demonstrate that the expression of proliferation-related genes in luminal cancers forms a continuum [3,19,33] and that the division of these tumors into two subgroups on the basis of the currently available SSPs [13,17,26] may be artificial. The subclassification of ER-positive breast cancers into subtypes is not a challenge only for the 'intrinsic' subtype classification. In fact, given that proliferation is a continuum in ER-positive cancers and a strong determinant of outcome in this group of tumors, the allocation of ER-positive breast cancer patients into good or poor prognosis groups by other microarray-based methods (for example, MammaPrint and genomic grade index), or into low, intermediate, or high histological grade, should be considered arbitrary to some extent (see 'Multigene prognostic signatures' section). Second, normal breast-like cancers are now considered by some to be an invalid molecular subtype, given that these tumors are likely to constitute an artefact of frozen tissue procurement and representation (that is, samples with a disproportionately high content of normal breast and stromal cells) [3,17,26,27,34,35]. Third, the HER2 (or HER2-enriched) subtype as defined by microarrays does not encompass all cases classified as HER2-positive by clinically validated methods (that is, immunohistochemistry and in situ hybridization with methods approved by the US Food and Drug Administration), and not all cancers that are HER2-positive by clinical methods are classified as HER2 subtype by microarrays [3,17,21,36,37].
Figure 2. Perou and colleagues [11] carried out a cDNA microarray analysis of 38 invasive breast cancers, 1 ductal carcinoma in situ, 1 fibroadenoma, and 3 normal breast samples, as well as a number of biological replicates of tumors from the same patients, and defined an 'intrinsic gene' list (that is, genes that vary more between tumors from different patients than between samples from the same tumor/patient). Hierarchical clustering analysis using these 'intrinsic' genes led to the identification of four subtypes: luminal, normal breast-like, human epidermal growth factor receptor 2 (HER2), and basal-like. In subsequent studies, it was demonstrated that similar molecular subtypes of breast cancer could be identified in multiple cohorts and that luminal cancers could be subclassified into two groups (luminal A and B) [12] or three groups (luminal A, B, and C) [13]. The estrogen receptor (ER)-positive branch of the dendrogram contains the luminal tumors, which express low-molecular-weight cytokeratins 8/18, ER, and genes associated with an active ER pathway [2,3,11-13,17,26,34]. Luminal A tumors (dark blue) present high levels of expression of ER-activated genes and low proliferation rates and are associated with an excellent prognosis, whereas luminal B breast cancers (light blue) are more often of higher histological grade and have higher proliferation rates and a worse prognosis [2,3,11-13,17,26,34]. The ER-negative branch includes at least three subtypes: normal breast-like, HER2, and basal-like. HER2 tumors (purple) overexpress HER2 and genes associated with the HER2 amplicon on 17q12 (for example, GRB7) and/or the HER2 pathway [2,3,11-13,17,26,34]. Basal-like tumors (red) express genes usually found in normal basal/myoepithelial cells of the breast, including high-molecular-weight cytokeratins (5 and 17), caveolins 1 and 2, P-cadherin, nestin, CD44, and EGFR [20]. The morphological and immunohistochemical features of basal-like cancers are similar to those described for tumors arising in BRCA1 germ-line mutation carriers [20]. The HER2 and basal-like subgroups share an aggressive clinical behavior. Normal breast-like cancers (green) are still poorly characterized [3,22], and there is evidence to suggest that they may constitute an artefact of gene expression profiling associated with a disproportionately high content of normal breast tissue [3,17,26,34].
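The 'intrinsic gene' criterion described above (genes that vary more between tumors from different patients than between repeated samples of the same tumor) can be sketched as a variance ratio. This is a simplified illustration under stated assumptions: the plain between/within variance ratio is our choice, the published selection procedure differed in detail, and the gene names and values are hypothetical.

```python
# Simplified 'intrinsic gene' ranking: genes whose variation between patients
# is large relative to variation between repeated samples of one tumour.
# Assumption: a plain variance ratio; the published selection differed in detail.
from statistics import mean, pvariance


def intrinsic_score(replicates):
    """Between-patient / within-patient variance for one gene.

    `replicates` maps a patient ID to the expression values measured in
    repeated samples of that patient's tumour.
    """
    between = pvariance([mean(v) for v in replicates.values()])
    within = mean([pvariance(v) for v in replicates.values()])
    return between / within if within > 0 else float("inf")


def rank_intrinsic_genes(data):
    """Gene names sorted from most to least 'intrinsic'."""
    scores = {gene: intrinsic_score(reps) for gene, reps in data.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical log-ratios: two genes, three patients, two samples each.
DATA = {
    "GENE_X": {"p1": [2.0, 2.2], "p2": [-1.0, -0.8], "p3": [0.5, 0.4]},
    "GENE_Y": {"p1": [0.1, 1.0], "p2": [0.5, -0.4], "p3": [0.2, 0.9]},
}
print(rank_intrinsic_genes(DATA))  # ['GENE_X', 'GENE_Y']
```

GENE_X is stable within each patient but differs markedly between patients, so it ranks as 'intrinsic'; GENE_Y varies as much between replicates as between patients.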
The above discrepancies do not invalidate the existence of the 'intrinsic' subtypes. As recently pointed out by Perou and colleagues [38], this is an evolving classification system, and PAM50 [17], rather than the SSPs described by Sorlie and colleagues [13] or Hu and colleagues [26], should be employed. With the development of the PAM50 assay, prospective testing of this classification by independent groups will determine its prognostic and predictive power and its clinical utility beyond the clinicopathological parameters currently available.
The putative histogenetic implications of the molecular subtypes (that is, that luminal cancers would originate from luminal cells and basal-like cancers from basal/progenitor cells) [12,13,39-42] have proven incorrect. Although this does not directly affect the clinical utility of the 'intrinsic' molecular subtypes, these implications have led to the assumption that different subtypes of breast cancer originate from different cell types [13,39-42]. Importantly, there is direct, independent evidence that the likeliest cell of origin of basal-like breast cancers lies in the luminal progenitor population rather than in the 'basal' population of the normal breast [43,44].
Additional evidence to support the contention that the 'intrinsic' molecular taxonomy remains a working model in development stems from the recent identification of at least three additional molecular subtypes of ER-negative cancers: the 'interferon-rich' subtype [26,45], the 'molecular apocrine' subtype [46-48], and the 'claudin-low' subgroup [35,49] (Figure 2). The 'interferon-rich' subtype, first described by Hu and colleagues [26], is characterized by high expression of interferon-regulated genes, such as STAT1 [26,45]. The 'molecular apocrine' subtype is characterized by activation of androgen receptor signaling, frequently displays HER2 gene amplification, and may be associated with PTEN germline mutations [46-48]. The 'claudin-low' subgroup comprises tumors that express low levels of, or lack, E-cadherin and claudin mRNA; it displays an enrichment for the expression of genes often expressed in the process of epithelial-to-mesenchymal transition and of immune response genes, and it allegedly harbors features suggestive of a 'cancer stem cell-like' phenotype [35,49]. Intriguingly, more than 40% of these cancers do express E-cadherin and claudins at the protein level, despite the low expression levels of these genes by microarray analysis [35]. Importantly, a substantial proportion of tumors classified as claudin-low by the cell line-derived SSP described by Prat and colleagues [35] were previously classified as normal breast-like by other SSPs; these samples may have a disproportionately high content of stromal and normal breast cells. Hence, it remains to be determined whether breast cancers that express E-cadherin and claudins at the protein level but were classified as claudin-low by the SSP were classified as such owing to stromal cell contamination.
Another point for consideration is the overlap between the transcriptomic characteristics of the claudin-low subtype and those of spindle cell metaplastic breast carcinomas [49,50].
Given the above observations, and despite recent claims that PAM50 models derived from archival formalin-fixed RNA are 'a potential replacement for grade-, hormone receptor-, Ki67-, and HER2-based prognostic models' [21], we argue that the microarray-based gene classification of breast cancer is not yet ready for clinical use, in prognostic models or otherwise [1,3,8,27]. In fact, standardization of the definitions and methodologies for identifying the molecular subtypes, as well as prospective clinical trials to validate the contribution of the 'intrinsic' subtypes beyond the current clinicopathological parameters for the management of breast cancer patients, are still required [1,3,8,27]. Robust, independently validated methods for the identification of these subtypes are yet to be published.

First-generation signatures
The development of microarray-based multigene prognostic classifiers (also known as 'gene signatures') has been pursued by many groups in the last decade [51-58], with the aim of defining which patients would have such a good prognosis that they could forgo chemotherapy. The first prognostic gene signature [51] consisted of 70 genes and was shown to identify, among systemic therapy-naïve patients, a good-prognosis group with minimal risk of developing distant metastasis within 5 years. In a subsequent study, van de Vijver and colleagues [59] demonstrated that the 70-gene signature was a predictor of outcome independent of the current clinicopathological prognostic markers in a dataset comprising 295 cases (64 cases from the analysis that led to the development of the 70-gene signature and 231 new cases). Importantly, in that [59] and subsequent [60,61] studies, it has been repeatedly demonstrated that the 70-gene signature classifies more than 95% of ER-negative cancers as poor prognosis and that there is a strong correlation between 70-gene signature-defined poor prognosis and high histological grade. Furthermore, these studies demonstrated that the 70-gene signature would outperform the current methods based on clinicopathological parameters for selecting patients for chemotherapy [51,59]. This has led to the development of MammaPrint, a commercially available version of the 70-gene signature. Subsequent studies have led to the development of several other prognostic signatures, including the 76-gene signature [54,62] and the genomic grade index [55,63-65], which were also shown to be independent predictors of outcome.
MammaPrint is currently being tested in the MINDACT (Microarray In Node-negative and 1-3 positive lymph-node Disease may Avoid ChemoTherapy) trial [15] (Figure 3), which will determine whether this signature can actually replace clinicopathological parameters for the identification of patients who could be spared chemotherapy. Table 1 summarizes the most extensively studied prognostic signatures to date. For comprehensive reviews on microarray-based prognostic gene signatures, readers are referred to Sotiriou and Pusztai [2], Weigelt and colleagues [3], and Kim and Paik [66].
In parallel with the development of microarray-based prognostic signatures, Paik and colleagues [52] developed Oncotype DX, a qRT-PCR-based analysis of 21 genes (16 cancer-related and 5 reference genes), which can be used for risk stratification of ER-positive, node-negative breast cancers from patients treated with adjuvant tamoxifen. In contrast to microarray-based predictors, Oncotype DX can be applied to FFPE samples, and this test was developed and validated on the basis of a retrospective analysis of existing material from two randomized clinical trials (NSABP-B-20 and NSABP-B-14). The signature is based on the expression of genes associated with proliferation, ER signaling, HER2, and invasion [52]. The expression of these genes is summarized as a recurrence score (RS) ranging from 0 to 100, which provides an estimate of the 10-year distant recurrence risk. For clinical use, patients are separated into three categories: low risk (RS <18), intermediate risk (RS ≥18 and <31), and high risk (RS ≥31) [52]. Oncotype DX has been shown to be an independent prognostic factor for patients with ER-positive, node-negative breast cancers treated with tamoxifen and to outperform standard clinicopathological parameters for the prediction of 10-year distant recurrence risk [52]. Oncotype DX has subsequently been evaluated in other populations of breast cancer patients [67] and shown to be an independent prognostic parameter in patients with ER-positive tumors with up to three positive nodes receiving adjuvant chemotherapy [68] and in postmenopausal patients with ER-positive tumors treated with aromatase inhibitors (anastrozole) [69].
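The mapping from RS to risk category is a pair of cutoffs on a 0 to 100 scale. The sketch below is a minimal illustration (the function and parameter names are ours): the defaults reproduce the original categories, and the cutoffs can be changed, for example to the TAILORx intermediate range of 11 to 25 (Figure 4), treating scores as integers so that high risk begins at 26.

```python
# Map an Oncotype DX recurrence score (RS, 0-100) to a risk category.
# Defaults reproduce the original cutoffs (low < 18, intermediate 18-30,
# high >= 31); other cutoffs can be passed in, e.g. TAILORx-style ranges.
def rs_category(rs, low_cutoff=18, high_cutoff=31):
    """Return 'low', 'intermediate' or 'high' for a recurrence score."""
    if not 0 <= rs <= 100:
        raise ValueError("recurrence score must lie between 0 and 100")
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")
    if rs < low_cutoff:
        return "low"
    if rs < high_cutoff:
        return "intermediate"
    return "high"


for score in (15, 25, 35):
    original = rs_category(score)
    # TAILORx intermediate range 11 to 25 (integer scores), so high is >= 26.
    tailorx = rs_category(score, low_cutoff=11, high_cutoff=26)
    print(score, original, tailorx)
```

An RS of 15, for instance, is low risk under the original categories but falls within the TAILORx intermediate (randomized) range.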
Oncotype DX RSs have also been shown to correlate with the benefit patients derive from adjuvant chemotherapy in samples from clinical trials [70-72]. In fact, patients with high-RS tumors, despite their poor prognosis, derive significantly more benefit from chemotherapy than those with low-RS tumors. In addition, patients with low-RS cancers appear to derive negligible benefit from the addition of chemotherapy to tamoxifen [70,71]. Therefore, Oncotype DX has also been considered a predictive marker of benefit from chemotherapy.
Despite the numerous publications on first-generation signatures, level II evidence to support a prognostic role has been achieved only for Oncotype DX; for the remaining signatures, only level III evidence has been obtained so far. Given the level of evidence accrued, Oncotype DX has received approval from the American Society of Clinical Oncology [73] and was included in the National Comprehensive Cancer Network guidelines (Breast Cancer version 1.2011 [74]) as an option to evaluate prognosis and as a complement to clinicopathological features to predict response to chemotherapy in patients with ER-positive, node-negative breast cancer. None of the microarray-based prognostic signatures has been endorsed by these professional bodies.

Are the first-generation prognostic gene signatures ready for use in clinical practice?
Although the different first-generation signatures described above provide relevant information for outcome prediction, they have yet to be incorporated into clinical practice [1,3,8]. The reasons for this apparent failure are multifactorial, and no first-generation signature is currently supported by level I evidence for its prognostic power. This information will be available only after the completion of two randomized trials, MINDACT [15] and TAILORx (Trial Assigning IndividuaLized Options for Treatment Rx) [14] (Figures 3 and 4), which evaluate MammaPrint and Oncotype DX, respectively.
First-generation signatures have been shown not to be stable in terms of their constituent gene lists [75,76]; however, comparative studies and meta-analyses have demonstrated that, despite negligible overlap in their constituent genes, first-generation signatures tend to have similar performance and show relatively good concordance in their prognostic classification, identifying similar but not identical subgroups of patients with poor prognosis [31,33,77].
The ability of these signatures to determine prognosis seems to be directly related to the assessment of proliferation/cell cycle-related genes [18,33]. The fact that these first-generation signatures are arguably mere surrogates of proliferation poses some important problems for their use. First, given that proliferation has been shown to be prognostic in ER-positive but not in ER-negative cancers, first-generation signatures are applicable only for the prognostication of patients with ER-positive, HER2-negative breast cancers [18,54,60,61]. As the expression of proliferation-related genes in ER-positive cancers has been shown to follow a continuum rather than a bimodal distribution, the subdivision of ER-positive cancers into good-prognosis (that is, luminal A) and poor-prognosis (that is, luminal B) groups is artificial [18,33]. In fact, the continuous nature of the Oncotype DX RS is more representative of the range of prognoses of patients with ER-positive disease. It should be noted, however, that this approach to clinical decision-making may be problematic. For instance, the prognostication and management of patients with an intermediate RS remain unclear [78].
Therefore, the actual contribution of Oncotype DX to the management of this particular group of patients remains to be elucidated [78]. The lack of prognostic power of first-generation prognostic signatures in ER-negative breast cancer and their association with proliferation in ER-positive breast cancer have brought the limitations of histological grading to the forefront of cancer research. In a way akin to first-generation prognostic gene signatures, histological grade is not prognostic in ER-negative disease and is strongly associated with proliferation [18,79]. It should be noted, however, that the levels of intra- and interobserver agreement for histological grade remain suboptimal, despite numerous efforts to implement a standardized histological grading system [79]. It could be argued, on the basis of the above observations, that the major contribution of first-generation prognostic gene signatures is to provide a standardized proliferation assay for breast cancer.
Figure 3. MINDACT trial. Patients with concordant results are being treated accordingly (high risk: chemotherapy with or without endocrine therapy, depending on estrogen receptor (ER) status; low risk: hormonal therapy if ER-positive, without chemotherapy). Discordant cases are being randomly assigned to receive adjuvant therapy on the basis of either clinicopathological or 70-gene signature risk assessment. Launched in 2006, the trial intends to confirm the validity of the signature and demonstrate that its clinical use would reduce the number of patients receiving unnecessary treatments, but the results will probably take years to be revealed. Clinico-path, clinicopathological; N, lymph node; N0, lymph node-negative; RANDOM, randomization; TAM, tamoxifen; yrs, years.
A second limitation of first-generation prognostic signatures stems from the fact that most of them were developed to predict short-term distant recurrence (<5 years) and were shown to have a strong 'time dependence', with reduced prognostic value after 5 to 10 years of follow-up [61,80]. Hence, these signatures may merely be surrogates of early distant recurrence and are unable to predict late relapses with the same accuracy. Thus, there is still a need to develop signatures that could identify patients who have a higher risk of late relapse and who may benefit from prolonged therapy.
Another important consideration in relation to the currently available first-generation prognostic signatures is that they were derived from the analysis of tissue samples with varying contents of neoplastic cells, stromal cells, inflammatory infiltrate, and normal breast tissue. There is evidence to suggest that the percentage of non-neoplastic cells has a substantial impact on the final expression profile of a tumor and on the ability to derive biologically meaningful prognostic signatures [81]. It should be noted that, although stromal cells and inflammatory infiltrate may be integral parts of the expression profile of a tumor and provide important prognostic and predictive information, most studies employed samples with percentages of stromal cells, inflammatory infiltrate, and normal breast tissue ranging from 0% to 50%. It remains to be determined whether repeated samples of the same tumor with drastically different percentages of neoplastic cells (for example, 50% versus 100%) would be consistently allocated to the same prognostic subgroup. Therefore, methods to estimate the non-neoplastic cell content of samples, or tissue microdissection to standardize the proportion of neoplastic/non-neoplastic cells, would be desirable in the development of new microarray-based classifiers and the implementation of currently available gene expression signatures.
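How tumor purity could move a sample across a prognostic cutoff can be illustrated with a toy model. This is a sketch under loudly stated assumptions: bulk expression on the linear scale is taken to be the purity-weighted average of neoplastic and non-neoplastic profiles, the score is an unweighted mean of log2 expression, and the expression values and the cutoff are hypothetical; none of this corresponds to a published signature.

```python
# Toy model of how tumour purity can shift an expression-based score.
# Assumption (hypothetical): bulk expression on the linear scale is the
# purity-weighted average of neoplastic and non-neoplastic profiles.
from math import log2


def mixed_profile(tumour, normal, purity):
    """Expected linear-scale expression for a sample of given tumour purity."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    return [purity * t + (1.0 - purity) * n for t, n in zip(tumour, normal)]


def mean_log2(profile):
    """Toy signature score: mean log2 expression of the signature genes."""
    return sum(log2(v) for v in profile) / len(profile)


# Hypothetical linear-scale expression of three proliferation genes.
TUMOUR = [800.0, 600.0, 400.0]
NORMAL = [50.0, 40.0, 30.0]
CUTOFF = 8.5  # hypothetical good/poor prognosis threshold on the toy score

for purity in (1.0, 0.5):
    score = mean_log2(mixed_profile(TUMOUR, NORMAL, purity))
    group = "poor" if score >= CUTOFF else "good"
    print(f"purity {purity:.0%}: score {score:.2f} -> {group} prognosis")
```

Under these assumptions the same tumor scores about 9.17 at 100% purity and about 8.27 at 50% purity, so a fixed cutoff allocates the two samples to different groups.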
Despite the initial claims that these signatures would replace current clinicopathological parameters for the management of patients with breast cancer, clinicopathological variables have been shown to add prognostic information independent of that offered by first-generation signatures [1-3]. Therefore, these gene signatures should be perceived as ancillary tools that complement current methods based on the clinicopathological features of the tumors rather than as a replacement for them [1-3]. Importantly, the additional prognostic information provided by first-generation signatures appears to be limited when clinicopathological parameters are analyzed in a centralized fashion with standardized methods (that is, centralized reassessment of histological grade and standardized assessment of ER, PR, HER2, and proliferation rate as defined by Ki67 immunohistochemical analysis) [82]. Therefore, the true contribution of the commercially available first-generation signatures beyond tumor morphology and immunohistochemistry remains to be determined [8].
Recently, 'second-generation' signatures specific for the distinct subtypes of breast cancer have been reported, derived from studies of the breast cancer microenvironment or the host immune response [1,83-87]. Immune response-related signatures have been shown to be potential prognosticators in ER-negative or triple-negative breast cancers [83-85]. Although these signatures are promising, additional evidence in support of their use as predictors of outcome is still required.
Figure 4. TAILORx trial. Patients with an intermediate RS (11 to 25) were randomly assigned to receive either hormonal therapy alone or hormonal therapy plus chemotherapy. To minimize potential under-treatment in both the high-risk and the randomly assigned groups, the RS ranges for TAILORx were different from those originally defined (11 to 25 instead of 18 to 31). FFPE, formalin-fixed paraffin-embedded; N0, lymph node-negative; RANDOM, randomization.

Multigene predictive signatures
Beyond prognostic classifiers, an important challenge is to provide physicians with biomarkers that could predict response or lack of response to treatments and determine the most effective regimen for a specific patient or subgroup of patients. In clinical practice, only ER and HER2 are currently used as predictive markers for the selection of patients likely to respond to endocrine therapy and trastuzumab, respectively. In addition to Oncotype DX, whose RSs have been shown to be associated with benefit from the addition of chemotherapy to tamoxifen, other prognostic signatures have also been shown to have predictive value for the incremental benefit of chemotherapy [1-3,65,88,89]. However, unlike that of Oncotype DX, the predictive power of MammaPrint [88,89] and the genomic grade index [65] has only been tested in retrospective datasets from patients treated with multidrug chemotherapy regimens.

Gene expression signatures and response to chemotherapy
Given the clinical need for predictive markers for specific chemotherapy agents and multidrug regimens, several groups have developed multigene signatures specifically designed to predict response in patients receiving either chemotherapy or endocrine therapy. Using supervised approaches, several studies have attempted to identify multigene signatures of response to chemotherapy by comparing gene expression profiles between highly sensitive and poorly responsive tumors [90-93]. The majority of the studies focused on neoadjuvant chemotherapy and, by means of microarrays or RT-PCR, analyzed tumor samples obtained from biopsies taken at diagnosis, before the initiation of chemotherapy (Table 2). Chemotherapy sensitivity was usually estimated with the rate of pathological complete response to neoadjuvant therapy (pCR) as a surrogate of long-term benefit from the treatment. For example, the MD Anderson Cancer Center group developed a 30-gene signature in 82 breast cancer patients receiving T/FAC chemotherapy (paclitaxel, fluorouracil, doxorubicin, and cyclophosphamide) [90,92]. This DLDA-30 predictor was then validated in 51 independent patients and predicted pCR with higher sensitivity and negative predictive value than clinical variables based on age, grade, and ER status [92]. The accuracy of this predictor was confirmed in an independent study [94]. Despite these interesting preliminary results, the accuracy of the 30-gene predictor was not reproduced in a more recent study, in which it was not an independent predictor of pCR in multivariate analysis and did not perform better than clinical variables, calling into question its potential utility in the clinical setting [95].
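The 'DLDA' in DLDA-30 refers to diagonal linear discriminant analysis, a classifier that treats genes as independent and scales each gene's distance to the class mean by a pooled per-gene variance. The sketch below illustrates that classifier family and the two performance measures named above (sensitivity and negative predictive value). It is not the MD Anderson predictor: the 30-gene list, preprocessing, and decision threshold are not reproduced, equal class priors are assumed, and the two-gene data are hypothetical.

```python
# Diagonal linear discriminant analysis (DLDA) sketch, the classifier family
# behind the "DLDA-30" name. Toy two-gene data; equal class priors assumed.
from statistics import mean


def fit_dlda(samples, labels):
    """Estimate per-class gene means and pooled per-gene variances."""
    classes = sorted(set(labels))
    n_genes = len(samples[0])
    means = {}
    for c in classes:
        rows = [s for s, lab in zip(samples, labels) if lab == c]
        means[c] = [mean(r[j] for r in rows) for j in range(n_genes)]
    # 'Diagonal': genes are treated as independent, so only one pooled
    # within-class variance per gene is estimated (no covariances).
    dof = len(samples) - len(classes)
    variances = [
        sum((s[j] - means[lab][j]) ** 2 for s, lab in zip(samples, labels)) / dof
        for j in range(n_genes)
    ]
    return means, variances


def predict_dlda(model, x):
    """Assign x to the class with the smallest variance-scaled distance."""
    means, variances = model

    def distance(c):
        return sum((xj - mj) ** 2 / vj
                   for xj, mj, vj in zip(x, means[c], variances))

    return min(means, key=distance)


def sensitivity_npv(predicted, observed, positive="pCR"):
    """Sensitivity TP/(TP+FN) and negative predictive value TN/(TN+FN)."""
    pairs = list(zip(predicted, observed))
    tp = sum(p == positive and o == positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    return tp / (tp + fn), tn / (tn + fn)


# Hypothetical pre-treatment expression (two genes) and response labels.
TRAIN = [[2.0, 1.0], [2.2, 0.8], [1.8, 1.2],
         [0.0, -1.0], [0.2, -0.8], [-0.2, -1.2]]
LABELS = ["pCR"] * 3 + ["RD"] * 3  # RD: residual disease
model = fit_dlda(TRAIN, LABELS)
print(predict_dlda(model, [1.9, 0.9]))  # pCR
```

A high negative predictive value is what makes such a predictor clinically attractive: a 'no pCR' call would then reliably identify patients unlikely to benefit from the regimen.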
An alternative attempt to predict chemosensitivity to specific chemotherapy regimens made use of in vitro models [96]. The combination of in vitro signatures associated with drug sensitivity in cell lines was thought to provide composite signatures that could predict response to multidrug regimens and be translated to patients receiving multidrug chemotherapy [96]. These 'regimen-specific' signatures were tested in patients who, as participants in the European Organization for Research and Treatment of Cancer (EORTC) BIG00-01 clinical trial, received TET (docetaxel followed by epirubicin-docetaxel) or FEC (fluorouracil, epirubicin, and cyclophosphamide) chemotherapy, in a validation study published in 2007 [97]. Importantly, problems with the methodology of these studies have been identified [98-100], and serious concerns about the validity of the published results were raised [101,102]. Subsequently, after a series of investigations, the findings derived from the in vitro studies were considered invalid, and this led to the discontinuation of the clinical trials based on these prediction models. Furthermore, several high-profile publications have recently been retracted.
Another method of developing multigene classifiers of chemosensitivity is based on the use of metagenes (that is, groups of coexpressed genes associated with a small number of biological processes). A retrospective microarray analysis of prospectively collected ER-negative breast cancer samples demonstrated that increased stromal gene expression predicted resistance to FEC chemotherapy [103]. This 'stromal' multigene classifier was subsequently validated in two independent cohorts [103]. Further validation of this metagene is awaited.
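A common way to summarize a group of coexpressed genes is to standardize each gene across samples and average the resulting z-scores per sample. The sketch below assumes this averaging scheme for illustration; it does not reproduce the gene list, scoring method, or thresholds of the stromal metagene study cited above, and the gene names and values are hypothetical.

```python
# Metagene sketch: summarise a group of coexpressed genes as one score per
# sample by averaging their z-scores. Gene names and values are hypothetical.
from statistics import mean, pstdev


def zscores(values):
    """Standardise one gene across samples (population SD)."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("gene has constant expression across samples")
    return [(v - m) / sd for v in values]


def metagene_scores(expression, genes):
    """Average z-score of the member genes for each sample.

    `expression` maps a gene name to its values, one per sample, with the
    samples in the same order for every gene.
    """
    standardised = [zscores(expression[g]) for g in genes]
    return [mean(per_sample) for per_sample in zip(*standardised)]


EXPRESSION = {
    "STROMAL_GENE_1": [1.0, 2.0, 3.0, 4.0],
    "STROMAL_GENE_2": [2.0, 4.5, 5.5, 8.0],
    "UNRELATED_GENE": [3.0, 1.0, 4.0, 2.0],
}
scores = metagene_scores(EXPRESSION, ["STROMAL_GENE_1", "STROMAL_GENE_2"])
print([round(s, 2) for s in scores])
```

Standardizing first keeps a single highly expressed gene from dominating the summary; only the member genes contribute, so UNRELATED_GENE has no effect on the scores.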
Despite promising initial results, signatures of chemotherapy sensitivity have so far had limited use in clinical practice. Most of them have been developed in small convenience cohorts and require further external validation. None of the predictors of chemosensitivity is commercially available, and additional evidence is still required before they can be implemented in clinical practice. For a detailed discussion of the reasons for the limited success of the predictive signatures available to date, readers are referred to a recent review by Borst and Wessels [102]. Given the design employed in most of the studies, the predictive signatures for multidrug regimens are likely to capture the transcriptomic features of sensitivity/resistance to cytotoxic agents in general. These mechanisms may constitute convergent phenotypes [104] (that is, multiple genetic/epigenetic aberrations may lead to resistance to cytotoxic agents). The next generation of signatures ought to focus on specific drugs within a given subtype of breast cancer, as the predictors of response to chemotherapy in ER-positive and ER-negative breast cancers appear to be fundamentally different [19]. Furthermore, potential mechanisms of resistance to chemotherapy identified by orthogonal methods (for example, RNA interference screens [105], microarray-based comparative genomic hybridization [106,107], proteomic analyses [108], and hypothesis-driven studies [109]) may be used as the basis for the development of multigene predictive signatures. With multiple microarray datasets from retrospective cohorts and clinical trials available in the public domain, novel signatures derived from analyses using orthogonal methods can be tested in a timely fashion.

Predictive multigene markers of response to endocrine therapy
ER status has an important negative predictive value for evaluating the response to anti-estrogen therapy. Nevertheless, ER expression alone is not sufficient to predict which ER-positive tumors will respond, or be resistant, to different modalities of endocrine therapy. Microarray-based gene expression signatures to predict the outcome of tamoxifen-treated patients have been developed (Table 3). For example, Jansen and colleagues [110] identified a 44-gene signature by comparing gene expression profiles of patients with advanced ER-positive breast cancers treated with tamoxifen. Other hormone sensitivity tests, based on estradiol-induced genes in MCF-7 cell line cultures [111] or on clusters of correlated genes [112], have also been reported.
More recently, the sensitivity to endocrine therapy (SET) index was developed in a large series of ER-positive breast cancers [113]. The SET index is based on the principle that the expression of genes correlated with ER may predict response to endocrine treatment better than ER expression alone [113]. Microarray analysis of a discovery set of ER-positive tumors identified 165 genes coexpressed with ER; the SET index was devised and applied to a validation cohort to define three categories of sensitivity (low, intermediate, and high). The association between the SET index and outcome was then analyzed in three types of ER-positive cohorts: patients receiving adjuvant tamoxifen for 5 years, patients receiving neoadjuvant chemotherapy followed by endocrine therapy (tamoxifen or aromatase inhibition), and patients receiving no adjuvant systemic treatment. The SET index was significantly associated with the outcome of patients receiving any type of endocrine treatment (tamoxifen or chemoendocrine treatment) but had no prognostic value in untreated patients. Unlike other multigene signatures that evaluate proliferation in ER-positive tumors, the SET index seems to predict benefit from endocrine therapy independently of the inherent prognosis of the tumor. Of potential clinical relevance, the SET index identified subsets of tumors associated with an excellent prognosis and no relapse in the tamoxifen-treated group (node-negative tumors with a high SET index) and in the chemoendocrine group (tumors with a high or intermediate SET index). Further studies evaluating the clinical relevance of the SET index are warranted to define its indications in clinical practice.
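The principle behind the SET index (select genes coexpressed with ER in a discovery set, summarize them as a single score, and assign samples to low, intermediate, or high categories) can be sketched as follows. This is not the published SET algorithm [113]: the use of ESR1 as the ER reference gene, the Pearson correlation threshold of 0.5, the mean z-score summary, and the category cutpoints are all illustrative assumptions.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def er_coexpressed_genes(expression, er_gene="ESR1", min_r=0.5):
    """Discovery step: keep genes whose correlation with ER is >= min_r (assumed threshold)."""
    er = expression[er_gene]
    return [g for g, values in expression.items()
            if g != er_gene and pearson(values, er) >= min_r]


def zscore(values):
    """Standardize one gene across samples; constant genes map to 0."""
    mu, sd = mean(values), pstdev(values)
    return [((v - mu) / sd if sd else 0.0) for v in values]


def index_scores(expression, genes):
    """Summarize the selected genes as one score per sample (mean z-score)."""
    standardized = [zscore(expression[g]) for g in genes]
    return [mean(sample) for sample in zip(*standardized)]


def categorize(score, low_cut, high_cut):
    """Map a score to low / intermediate / high using hypothetical cutpoints."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "intermediate"
    return "high"


# Toy discovery set: ESR1 plus three made-up genes, four samples each.
discovery = {
    "ESR1": [1.0, 2.0, 3.0, 4.0],
    "GENE_1": [2.0, 3.0, 4.0, 5.0],  # r ~ 1 with ESR1
    "GENE_2": [4.0, 3.0, 2.0, 1.0],  # r ~ -1, excluded
    "GENE_3": [1.0, 3.0, 2.0, 4.0],  # r = 0.8
}
selected = er_coexpressed_genes(discovery)
print(selected)  # -> ['GENE_1', 'GENE_3']
print([categorize(s, -0.5, 0.5) for s in index_scores(discovery, selected)])
# -> ['low', 'intermediate', 'intermediate', 'high']
```

In a real study the gene list and cutpoints would be fixed in the discovery set and then applied unchanged to the validation cohort, mirroring the discovery/validation design described above.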

Predictors for specific targeted therapies
To date, only a few gene signatures have been developed to predict response to specific targeted therapies in breast cancer. Recently, Loi and colleagues [114] reported promising results focusing on PIK3CA (phosphoinositide-3-kinase, catalytic) gene mutations and the PI3K-AKT-mTOR signaling pathway, which is targeted by PI3K/mTOR (mammalian target of rapamycin) inhibitors. Using gene expression data from 1,800 breast cancers, they developed a gene expression signature associated with PIK3CA mutation (PIK3CA-GS). The signature predicted PIK3CA mutations in two independent breast cancer datasets and identified good-prognosis patients in the ER-positive, HER2-negative subgroup, even among highly proliferative tumors. The PIK3CA-GS was negatively correlated with mTORC1 signaling, making it a potential predictor of response to PI3K/mTOR inhibitors such as rapamycin, rapamycin analogs, or dual kinase inhibitors. Breast cancer cell lines with a high PIK3CA-GS were confirmed to be resistant to rapamycin [114]. This approach exemplifies how microarray-derived signatures might serve as predictive markers for tailored therapies.
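A signature trained to report a mutation status from expression data can be sketched as a two-step procedure: rank genes by the difference in mean expression between mutant and wild-type training tumors, then score new tumors by the mean of the top 'up' genes minus the mean of the top 'down' genes. The method used to derive the published PIK3CA-GS may differ [114]; the ranking statistic, the number of genes retained, the scoring rule, and the gene names below are assumptions for illustration only.

```python
from statistics import mean


def derive_signature(expression, is_mutant, n_genes=1):
    """Rank genes by mean(mutant) - mean(wild type); keep the top n 'up' and 'down' genes.

    expression: dict gene -> list of values (one per training tumor).
    is_mutant: list of booleans, one per training tumor; both groups must be
    non-empty. n_genes should be small relative to the number of genes so
    that the 'up' and 'down' lists do not overlap.
    """
    diffs = {}
    for gene, values in expression.items():
        mut = [v for v, m in zip(values, is_mutant) if m]
        wt = [v for v, m in zip(values, is_mutant) if not m]
        diffs[gene] = mean(mut) - mean(wt)
    ranked = sorted(diffs, key=diffs.get)  # most strongly 'down' genes first
    return ranked[-n_genes:], ranked[:n_genes]


def signature_score(sample, up, down):
    """Score one new tumor: mean of 'up' genes minus mean of 'down' genes."""
    return mean(sample[g] for g in up) - mean(sample[g] for g in down)


# Toy training set (made-up genes): two mutant and two wild-type tumors.
training = {
    "UP_GENE": [5.0, 6.0, 1.0, 2.0],
    "DOWN_GENE": [1.0, 1.0, 4.0, 5.0],
    "FLAT_GENE": [3.0, 3.0, 3.0, 3.0],
}
up, down = derive_signature(training, [True, True, False, False])
new_tumor = {"UP_GENE": 6.0, "DOWN_GENE": 1.0, "FLAT_GENE": 3.0}
print(up, down, signature_score(new_tumor, up, down))
# -> ['UP_GENE'] ['DOWN_GENE'] 5.0
```

A simple difference in means ignores within-group variance; a real analysis would more likely use a t-type or moderated statistic, but the difference keeps the example short.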

Conclusions
Microarray-based gene expression profiling analysis has undoubtedly had a dramatic impact on our understanding of breast cancer biology by bringing the concept of the heterogeneity of breast cancer to the forefront of breast cancer research and clinical practice. In fact, it is currently inconceivable to consider ER-positive and ER-negative breast cancers to be a single disease. However, how the information derived from the classification of breast cancer into the current molecular subtypes [17] will be used for breast cancer patient management remains unclear. First-generation prognostic signatures have led to the realization of the importance of proliferation for the prognostication of patients with ER-positive cancers [1][2][3]. However, despite the resources allocated to their development and validation, prognostic signatures have proven to add limited information to prognostic models based on clinicopathological parameters and standardized assessment of ER, PR, HER2, and proliferation. Gene signatures predictive of response to specific chemotherapy regimens have proven elusive.
With the development of massively parallel sequencing technologies, it has become possible to determine the repertoire of genetic aberrations a tumor harbors in a single experiment. Genetic information has been used successfully as a predictive marker for targeted therapies in breast cancer (for example, HER2 amplification as a predictive marker for anti-HER2 agents) and in tumors from other sites (for example, KIT and PDGFRA [platelet-derived growth factor receptor alpha] mutations as predictive markers of response to imatinib mesylate in gastrointestinal stromal tumors, and EML4-ALK gene rearrangements as predictive markers of response to ALK inhibitors in non-small cell lung cancer). It is therefore plausible that the next generation of classifiers based on sequencing information may have a greater impact on our ability to stratify breast cancer patients into predictive subgroups [115]. Integrative approaches combining genetic, transcriptomic, and proteomic information are likely to lead to breast cancer classification systems that better reflect the biology of the disease and are more clinically relevant [1]. Although the deluge of high-throughput data will certainly be a formidable challenge for the breast cancer research community, our ability to characterize tumors at an unprecedented level of detail will undoubtedly lead to novel paradigms for stratified medicine and tailored therapies.
Abbreviations: ALK, anaplastic lymphoma kinase; ER, estrogen receptor; FEC, fluorouracil, epirubicin, and cyclophosphamide; FFPE, formalin-fixed paraffin-embedded; HER2, human epidermal growth factor receptor 2; MINDACT, Microarray In Node-negative and 1-3 positive lymph-node Disease may Avoid ChemoTherapy; mTOR, mammalian target of rapamycin; NSABP, National Surgical Adjuvant Breast and Bowel Project; pCR, pathological complete response to neoadjuvant therapy; PIK3CA, phosphoinositide-3-kinase (catalytic); PR, progesterone receptor; qRT-PCR, quantitative reverse transcriptase-polymerase chain reaction; RS, recurrence score; RT-PCR, reverse transcriptase-polymerase chain reaction; SET, sensitivity to endocrine therapy; SSP, single sample predictor.