Prognostic molecular markers in early breast cancer

A multitude of molecules involved in breast cancer biology have been studied as potential prognostic markers. In the present review we discuss the role of established molecular markers, as well as potential applications of emerging new technologies. Those molecules used routinely to make treatment decisions in patients with early-stage breast cancer include markers of proliferation (e.g. Ki-67), hormone receptors, and the human epidermal growth factor receptor 2. Tumor markers shown to have prognostic value but not used routinely include cyclin D1 and cyclin E, urokinase-type plasminogen activator/plasminogen activator inhibitor, and cathepsin D. The level of evidence for other molecular markers is lower, in part because most studies were retrospective and not adequately powered, making their findings unsuitable for choosing treatments for individual patients. Gene microarrays have been successfully used to classify breast cancers into subtypes with specific gene expression profiles and to evaluate prognosis. RT-PCR has also been used to evaluate expression of multiple genes in archival tissue. Proteomics technologies are in development.


Introduction
Breast cancer is the most common malignancy in women, and it is highly curable if diagnosed at an early stage. Traditional prognostic factors include the axillary lymph node status, the tumor size, and the nuclear grade and histologic grade. Interest in novel prognostic markers is based on the fact that a significant number of patients with early-stage breast cancer harbor microscopic metastasis at the time of diagnosis. It is now well established that adjuvant systemic therapy improves survival in patients with early-stage breast cancer [1,2]. Treatment options for early-stage breast cancer include chemotherapy (e.g. anthracyclines, taxanes) and hormone therapy (e.g. tamoxifen, aromatase inhibitors). The use of trastuzumab is under investigation in the adjuvant setting for patients with human epidermal growth factor receptor (HER) 2 overexpressing breast cancer.
Systemic therapies are potentially toxic, however, and identifying the individual patients who are at high risk and likely to benefit remains a major challenge. For example, the risk of recurrence for a patient with negative axillary lymph nodes and a tumor measuring 1-2 cm is approximately 20-30%. Most patients in this group are currently offered adjuvant systemic therapy, although up to 70% of patients would not need it because they are already cured of their disease. Unfortunately, the histologic information is clearly not sufficient to accurately assess individual risk and to possibly avoid adjuvant systemic therapy. A large number of molecular markers have been studied to determine their ability to predict prognosis or response to therapy, or both (Table 1). Prognostic factors correlate with survival independent of systemic therapy, and are used to select patients at risk. Predictive factors correlate with response to therapy independent of prognosis, and have a significant impact in selected patient populations. Some molecular markers are associated with prognosis, some are associated with response to therapy, and some are associated with both.
Although a large number of molecules have been investigated as potential prognostic and predictive factors, the National Institutes of Health Consensus Development Conference held in 2000 stressed the need for validation and appropriate quality control for most of the markers studied to date [3]. The present article reviews the available data on established and investigational prognostic molecular markers in patients with early-stage breast cancer.

Proliferation markers
The tumor proliferation rate is an important prognostic factor in breast cancer. Several methods have been developed to estimate the proliferative rate of tumor cells. The S-phase fraction, as measured by flow cytometry, is a validated method for measuring tumor proliferation [4]. However, flow cytometry is not commonly used because of the amount of tissue consumed for the assay. Alternative methods for measuring tumor proliferation have been developed, including immunohistochemistry (IHC) to detect cell cycle-related antigens, that are better suited for the evaluation of small archival tissue samples.
Ki-67 is a nuclear antigen found in cells in the proliferative phases of the cell cycle (G1 phase, S phase, G2 phase, and M phase) but not cells in the resting phase (G0 phase). MIB-1 is a monoclonal antibody that identifies the Ki-67 protein in paraffin-embedded tissue. A strong correlation has been noted between the percentage of cells showing Ki-67 staining and the nuclear grade, age, and mitotic rate [5,6]. Patients whose tumors overexpress Ki-67 in more than 50% of the cells are at high risk of developing recurrent disease [7]. In addition, Ki-67 correlates with other well-characterized proliferation markers, such as the proliferating cell nuclear antigen [6]. Mitosin, a recently described 350-kDa nuclear phosphoprotein, is expressed in the late G1 phase, S phase, G2 phase, and M phase of the cell cycle, but not in the G0 phase [8]. Clark and colleagues [9] showed that mitosin is a proliferation marker that correlates with high S-phase fraction and negative estrogen receptor (ER)/progesterone receptor (PR) status. Although mitosin was not a predictor of survival in the study by Clark and colleagues, it was an independent predictor of recurrence. Additional studies are necessary to validate these findings.

Estrogen receptors and progesterone receptors
Estrogen mediates its functions through two specific intracellular receptors, the ERα and the ERβ, which act as hormone-dependent transcriptional regulators [10,11]. The ER pathway plays a critical role in the pathophysiology of human breast cancer. Overexpression of ERα is a well-established prognostic and predictive factor in breast cancer patients. The prognostic significance of ERβ is not well defined [12][13][14][15]. Overexpression of the PR serves as a functional assay because it indicates that the ER pathway is intact, even if the tumor is reported as ER-negative. When biochemical ligand-binding assays indicate concentrations of 10 fmol/mg cytosol protein or more, the tumors are generally considered ER-positive and PR-positive for clinical purposes.
The ER and PR status can be measured using IHC. The results of IHC correlate closely with biochemical ligand-binding assays and with clinical response rates to endocrine therapy [16]. Because IHC, unlike chemical assays, does not require the destruction of tissue specimens, and because it shows the tissue distribution of ER, it has become the preferred method for determining the ER/PR status in breast cancer specimens. Quantitative methods using computer-aided image analysis are being developed to improve the accuracy of IHC.
The value of ER status as an independent prognostic variable is diminished by its association with other established indicators of favorable prognosis. These include older age, low-grade histology, a favorable nuclear grade, a low S-phase fraction, a normal complement of DNA, a low proliferative index, and a low thymidine-labeling index [17]. In addition, ER-positive patients receive and benefit from either adjuvant or palliative hormone therapy so regularly that it is difficult to evaluate the prognosis apart from the influence of therapy.
In some studies, the higher disease-free survival (DFS) and overall survival rates of patients with ER-positive tumors are seen only in the presence of hormone therapy. Often the favorable effect of ER-positive status as a discriminant is lost after several years, suggesting that the influence of treatment is temporary [18,19]. When node-positive patients not receiving adjuvant hormone therapy were studied, the 5-year DFS rate was 20% higher for ER-positive patients compared with that for ER-negative patients. However, the 5-year DFS rate of the most favorable subgroup (i.e. patients with one to three positive nodes and ER-positive tumors) did not exceed 60% [20].
Among node-negative patients, small but statistically significant differences in DFS and overall survival rates have been found between ER-positive cases and ER-negative cases after various periods of follow-up [21]. The results of a multivariate analysis of prognostic factors by McGuire and colleagues [22], including the ER status for more than 3000 patients, showed the ER status to be more important for prognosis than tumor size in node-negative cases but not in node-positive cases. In one study, the ER status was found to be less important for predicting duration of DFS or overall survival than the nuclear grade and the number of positive nodes [23]. Allred and colleagues [24] showed that tamoxifen decreased the risk of local-regional recurrence in patients with ER-positive ductal carcinoma in situ.
The ER IHC assay is not standardized. Methods of tissue procurement, preservation, antigen retrieval, and, more importantly, the definition of positivity vary between different laboratories.
The prognostic role of ERβ is not well defined. Fuqua and colleagues [25] evaluated ERβ expression using IHC in 242 breast cancer patients. Their study showed that most tumors coexpressed both ERα and ERβ. Although ERα expression was positively correlated with low tumor grade, with diploidy, and with low S-phase fraction (all biological parameters of a good prognostic profile), ERβ trended toward an association only with aneuploidy. No association with tumor grade or S-phase fraction was seen for ERβ. Larger studies are needed to determine the clinical utility of ERβ expression in breast cancer.

Human epidermal growth factor receptor 2
The most frequently implicated receptors and growth factors in human breast cancer are members of the epidermal growth factor receptor subfamily of tyrosine kinase receptors. In addition to epidermal growth factor receptor, the type I subfamily includes HER-2, HER-3, and HER-4 [26,27]. These receptors share a common molecular architecture; they all possess a large glycosylated extracellular ligand-binding domain, a single hydrophobic transmembrane domain, and a cytoplasmic tyrosine kinase domain.
HER-2 (also known as c-erbB-2 or neu) is a proto-oncogene that encodes a 185-kDa tyrosine kinase glycoprotein. Allred and colleagues [35] evaluated HER-2 expression using IHC in 613 patients with node-negative breast cancer enrolled in the Intergroup Study 0011. In their study, patients were stratified into low-risk groups (n = 307) and high-risk groups (n = 306) on the basis of tumor size and ER status. Low-risk patients were defined as having small (<3 cm), ER-positive tumors and were observed without additional treatment after initial surgery. High-risk patients had either ER-negative tumors or large (≥3 cm), ER-positive tumors and were randomized to be observed (n = 146) or to receive adjuvant chemotherapy (n = 160) after surgery. In Allred and colleagues' study, HER-2 was overexpressed in 14.3% of all tumors combined, and overexpression was higher in invasive carcinomas associated with an extensive in situ component (21.5%) than in carcinomas without a significant noninvasive or in situ histologic component (11.2%; P < 0.0001). When patients with low-risk lesions not containing a significant in situ component (n = 179) were analyzed, HER-2 was a strong prognostic factor. Patients in this group with HER-2-positive tumors showed only 40% DFS at 5 years, compared with more than 80% in patients with HER-2-negative tumors (P < 0.0001).
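The stratification rule described above for the Intergroup Study 0011 analysis can be written as a short decision function. The Python sketch below is only an illustration of that rule as stated in the text; the function and argument names are hypothetical and are not taken from the study.

```python
def risk_group(tumor_size_cm: float, er_positive: bool) -> str:
    """Assign the risk group used for stratification, as described in the text.

    Low risk: small (<3 cm), ER-positive tumors.
    High risk: ER-negative tumors, or large (>=3 cm) ER-positive tumors.
    """
    if tumor_size_cm < 0:
        raise ValueError("tumor size must be non-negative")
    if er_positive and tumor_size_cm < 3.0:
        return "low"
    return "high"


# Illustrative inputs (made up, not patient data)
for size, er in [(1.5, True), (3.0, True), (1.5, False)]:
    print(size, er, risk_group(size, er))
```

In the study, only the high-risk group was randomized between observation and adjuvant chemotherapy; the function covers the grouping step alone.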
HER-2 overexpression has been associated with improved response to doxorubicin-based chemotherapy [36-40]. HER-2 overexpression does not seem to predict response to taxane-based chemotherapy [41]. The association between HER-2 overexpression and response to hormonal therapy is controversial [40,42,43]. Osborne and colleagues [44] reported an association between the ER coactivator AIB1 (SRC-3) and tamoxifen resistance, particularly in patients with HER-2-positive tumors treated with tamoxifen.
A main clinical use of tissue HER-2 measurement is to select patients with invasive breast cancer for therapy with the monoclonal antibody trastuzumab (Herceptin™; Genentech, San Francisco, CA, USA). The pivotal clinical trials of trastuzumab were conducted in patients with metastatic breast cancer overexpressing HER-2. The HER-2 status was determined by IHC using two monoclonal antibodies: CB-11 and 4D5. The HER-2 expression was scored as 0, 1+, 2+, or 3+, depending on the number of cells with membrane staining and on the intensity of the staining. If the tumor was 0 or 1+, it was considered HER-2-negative. If the tumor was 2+ or 3+, it was considered HER-2-positive, and patients received trastuzumab therapy. Retrospective studies showed that the response rate for trastuzumab therapy was higher among patients with an IHC score of 3+ than among patients with a score of 2+. Good correlation was noted between a 3+ score using IHC and the presence of HER-2 gene amplification using fluorescence in situ hybridization. Recent data indicate that fluorescence in situ hybridization may be a better predictor of response to trastuzumab-based therapy [45,46]. Trastuzumab is currently approved for treating patients with metastatic breast cancer. Adjuvant trials are ongoing to determine the safety and efficacy of trastuzumab in patients with early-stage breast cancer.
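The dichotomization of IHC scores used in the pivotal trastuzumab trials, as described above, amounts to a simple lookup. The following Python sketch illustrates it; the function name and the string encoding of the scores are assumptions made for the example, not part of any assay specification.

```python
# Mapping of IHC score to HER-2 status as described in the text:
# 0 or 1+ was considered negative, 2+ or 3+ was considered positive.
IHC_TO_STATUS = {
    "0": "negative",
    "1+": "negative",
    "2+": "positive",
    "3+": "positive",
}


def her2_status(ihc_score: str) -> str:
    """Return 'negative' or 'positive' for an IHC score of 0, 1+, 2+, or 3+."""
    try:
        return IHC_TO_STATUS[ihc_score.strip()]
    except KeyError:
        raise ValueError(f"unrecognized IHC score: {ihc_score!r}") from None


for score in ["0", "1+", "2+", "3+"]:
    print(score, her2_status(score))
```

The lookup treats 2+ and 3+ identically, which is exactly the simplification that the retrospective response data and the fluorescence in situ hybridization results mentioned above call into question.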

Plasminogen activators and inhibitors
Tumor cell invasion and metastasis is a multifactorial process that at each step may require the action of proteolytic enzymes, such as collagenases, cathepsins, plasmin, or plasminogen activators. Some of these molecules have been associated with specific prognoses and are now discussed in more detail.
The urokinase-type plasminogen activator (uPA) is a serine protease that plays an important role in the invasion and metastasis process through degradation of the extracellular matrix. High levels of tissue uPA and its inhibitors (plasminogen activator inhibitor [PAI]-1, PAI-2) measured using ELISAs have been correlated with poor outcome in node-negative breast cancer patients [47][48][49][50][51].
Janicke and colleagues [52] conducted a prospective, randomized multicenter clinical trial of adjuvant therapy versus observation for patients with node-negative breast cancer. In their study, patients whose primary tumors had low tumor levels of uPA and PAI-1 (low risk) did not receive adjuvant systemic therapy. Patients with elevated tumor levels of uPA and/or of PAI-1 (high risk) were randomized to receive cyclophosphamide, methotrexate, and 5-fluorouracil adjuvant chemotherapy or to receive no treatment. The first interim analysis showed an estimated 3-year recurrence rate of 6.7% in the low-risk group and of 14.7% in the high-risk group (P = 0.006). The intent-to-treat 3-year DFS rate for patients in the high-risk group assigned to chemotherapy or to observation was not statistically different. When the results were analyzed based on actual treatment delivered, however, the 3-year DFS rate for patients treated with adjuvant chemotherapy was 9% versus 19% for patients who did not receive chemotherapy (P = 0.016). The improvement in actuarial 3-year DFS was maintained at a median follow-up of 50 months [53].
Zemzoum and colleagues [54] showed that uPA/PAI-1 levels in primary tumor tissue are associated with an aggressive course of disease in lymph node-negative breast cancer, independent of HER-2 status. It has been suggested that patients with node-negative breast cancer and low levels of uPA and PAI-1 may be spared the trauma of adjuvant chemotherapy [55]. However, the ELISAs of uPA/PAI-1 require extracts of primary tumor tissue, and this is a major limitation for patients with small tumors.

Angiogenesis-related prognostic markers
It is now accepted that solid tumors must develop a vascular network to grow beyond 1 cm³, and they do so by stimulating the formation of new blood vessels (so-called angiogenesis). Angiogenesis is an active process, regulated by a large number of pro-angiogenic and anti-angiogenic molecules (Table 2). Interest in neovascularity as a prognostic factor was stimulated by the work of Folkman on tumor angiogenesis and by the potential for treatment with anti-angiogenic agents [56].
The prognostic relevance of tumor angiogenesis in breast cancer was first reported by Weidner and colleagues [57], who counted microvessels (veins and arteries) in the most densely vascularized areas of 49 invasive carcinomas and found their number and density significantly increased in [...]
Breast Cancer Research Vol 6 No 3 Esteva and Hortobagyi
Table 2: Pro-angiogenic and anti-angiogenic proteins
[...] [58] reported considerable variability in the microvessel count in different parts of the same tumor and between the readings of two evaluators. These investigators found no significant correlation between microvessel count and other tumor factors of prognostic value, and found no significant correlation between the microvessel count and DFS. Until these issues are resolved, microvessel count should not be used routinely for making treatment decisions in breast cancer patients.
Angiogenic growth factors have been identified that may have important prognostic utility. These include the vascular endothelial growth factor, the platelet-derived endothelial cell growth factor (also known as thymidine phosphorylase), and the fibroblast growth factor family [59].

Apoptosis-related prognostic markers
Programmed cell death, also known as apoptosis, is an endogenous cellular process whereby an external signal activates a metabolic pathway that results in cell death (Table 3) [60]. This form of cell death is commonly seen in breast cancer tissue. Apoptotic cells can be quantitated by light microscopy, and an apoptotic index can be calculated. However, the prognostic significance of the apoptotic index is not well defined. Wu and colleagues [61] reported a correlation between a low apoptotic index and decreased patient survival. However, other studies found no correlation between apoptotic index and prognosis [62][63][64].
Bcl-2 is a mitochondrial protein known to inhibit apoptosis triggered by chemotherapy and radiation therapy. Lower levels of apoptosis could lead to malignant cell accumulation and therefore to a more aggressive clinical course for the disease. Although Bcl-2 can block apoptosis in vitro, several studies have shown that Bcl-2 overexpression is associated with improved DFS rates [65]. This may be in part because of the close association between Bcl-2 expression and ER expression. Perhaps more important is the potential association between Bcl-2 expression and response to chemotherapy. Several studies have shown that patients with Bcl-2-negative breast cancer were more likely to respond to chemotherapy than patients with Bcl-2-positive tumors [66][67][68]. However, other studies found no association between Bcl-2 expression and the response to chemotherapy [69,70]. Further studies are needed to establish the role of Bcl-2 as a predictive factor of response to therapy.

Genomics
In addition to the markers already discussed, literally hundreds of other molecules have been evaluated as potential prognostic factors. Breast cancer is a complex heterogeneous disease, and therefore evaluation of a handful of genes and/or proteins provides only limited prognostic information. High-throughput gene expression profiling using microarray technology is a promising new technology that has been applied to the classification of breast cancers [71][72][73], to prognosis [74][75][76][77], and to prediction of response to treatment [78].
Using cDNA microarrays, Perou and colleagues [71] classified invasive breast carcinomas into five subtypes based on their distinct gene expression profile (Norway/Stanford dataset). These included a luminal epithelial cell phenotype (subtypes A and B), a basal epithelial cell type phenotype, a HER-2 (+) phenotype, and a group of cancers expressing a 'normal-like' gene profile. Sorlie and colleagues [79] showed that patients whose tumors exhibited the basal-like and HER-2-positive subtypes had the worst survival rates, while the luminal epithelial type was associated with improved survival rates. Although initially the luminal subtype correlated with ER positivity, Sorlie and colleagues noted that the ER levels were not uniform among tumors classified as luminal or basal types.
van't Veer and colleagues [74] used a different microarray platform and identified a 'poor prognosis signature' that included 70 genes involved in regulation of the cell cycle, in invasion, in metastasis, and in angiogenesis. The 70-gene prognostic profile was validated by the same investigators in 295 consecutive patients with primary breast cancer [75]. Among the 295 patients, 180 had a poor-prognosis signature and 115 had a good-prognosis signature; the mean overall 10-year survival rates were 54.6% and 94.5%, respectively. At 10 years, the probability of remaining free of distant metastases was 50.6% in the group with a poor-prognosis signature and was 85.2% in the group with a good-prognosis signature. The estimated hazard ratio for distant metastases in the group with a poor-prognosis signature as compared with the group with the good-prognosis signature was 5.1 (95% confidence interval, 2.9-9.0; P < 0.001). This ratio remained significant when the groups were analyzed according to their lymph-node status. This prognostic signature had a strong independent value on multivariate analysis. Ongoing studies are validating these results in commercially available microarrays for potential clinical and diagnostic applications.
Available online http://breast-cancer-research.com/content/6/3/109
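The hazard ratio and confidence interval quoted above can be checked for internal consistency on the logarithmic scale. The Python sketch below assumes that the interval is a Wald-type interval that is symmetric around log(HR), which is the usual output of a Cox model but is not stated in the text; the function name is hypothetical. Under that assumption, the geometric mean of the two bounds should approximate the point estimate, and the interval width gives the standard error of log(HR).

```python
import math


def log_scale_summary(hr, lower, upper, z_crit=1.96):
    """Summarize a hazard ratio and its 95% CI on the log scale.

    Assumes a Wald interval symmetric around log(HR) (an assumption,
    not something reported in the source).
    """
    # Midpoint of the interval on the log scale, back-transformed
    implied_hr = math.sqrt(lower * upper)
    # Half-width of the log-scale interval divided by the critical value
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    # Wald z statistic for the null hypothesis HR = 1
    z_stat = math.log(hr) / se_log_hr
    return implied_hr, se_log_hr, z_stat


implied_hr, se_log_hr, z_stat = log_scale_summary(5.1, 2.9, 9.0)
print(f"HR implied by CI bounds: {implied_hr:.2f}")
print(f"SE of log(HR): {se_log_hr:.3f}")
print(f"Wald z: {z_stat:.2f}")
```

The implied point estimate agrees with the reported value of 5.1 to rounding, and the z statistic lies well beyond the two-sided threshold for P < 0.001 (about 3.29), so the reported figures are mutually consistent under the stated assumption.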
Sorlie and colleagues [80] reanalyzed their Norway/Stanford dataset, including 84 tissue samples from their previously published work [71,79] and 38 additional tumor samples from patients with locally advanced breast cancer treated with preoperative chemotherapy. The first gene list and the list used for the reanalyzed report had approximately 200 genes in common, and tumors could be classified in the five main gene clusters as previously described. In addition, Sorlie and colleagues attempted to validate their findings in two independent datasets reported by van't Veer and colleagues [75] and by West and colleagues [81]. Ninety-seven tumors from the study by van't Veer and colleagues could be classified into the five subtypes, and these different breast cancer types were associated with prognosis. Patients with the luminal-A subtype had the best survival rates, while the worst survival rates were associated with the basal and HER-2 subtypes. However, van't Veer and colleagues based their analysis on 461 genes (out of 24,480). The dataset from West and colleagues, generated on an Affymetrix platform, could also be classified into the previously described subtypes after selecting 242 genes out of a total of 7129 genes.
One of the main shortcomings of microarray technology is the lack of validation of gene sets across platforms. For example, when Sorlie and colleagues [80] tested the prognostic impact of the 231 markers published by van't Veer and colleagues on the Norwegian cohort, the positive predictive value for DFS was only 47%. This may in part be due to the different patient cohorts and treatments. In fact, the differences in outcomes across studies are based on the subset of genes that was analyzed in all the studies, and the number of genes held in common across studies is limited.
Clinical trials are evaluating the prognostic and predictive value of gene expression profiles in patients with early-stage breast cancer. Chang and colleagues [82] evaluated gene expression profiles in tumors from 24 patients undergoing neoadjuvant docetaxel chemotherapy. Core biopsies were obtained prior to initiation of chemotherapy, and cDNA analysis of RNA extracted from biopsy samples was completed using the HgU95-Av2 GeneChip (Affymetrix, Santa Clara, CA, USA). Differential patterns of expression of 92 genes correlated with docetaxel response (P = 0.001). Symmans and colleagues [83] showed that fine-needle aspiration yielded sufficient RNA for gene expression profiling. Since most breast cancer in patients is diagnosed at an early stage, the fine-needle aspiration approach may become acknowledged as the optimal way to obtain tissue for gene profiling. Pusztai and colleagues [78] extracted RNA from fine-needle aspiration specimens and identified a group of genes that predicted pathologic complete response to neoadjuvant chemotherapy.
Transcriptional profiling could until recently only be completed using fresh or frozen tissue, not using tissue from paraffin blocks. To overcome this limitation, several groups are developing methods to extract RNA from formalin-fixed, paraffin-embedded tissue for genomics studies. Ma and colleagues [84] microdissected breast cancer cells from paraffin-embedded tumors and measured expression of more than 20,000 genes in cancer cells using an Affymetrix platform. The authors were able to correlate gene expression signatures with prognosis. This is a step forward, and these findings should be validated in groups of patients treated homogeneously or not treated with adjuvant systemic therapy at all.
Several groups are evaluating the prognostic and predictive value of a multigene RT-PCR assay using paraffin-embedded tissue (Oncotype DX™; Genomic Health, Redwood City, CA, USA). Sixteen genes had significant prognostic value in three preliminary studies that included patients with early-stage breast cancer treated with adjuvant tamoxifen and/or chemotherapy [85][86][87]. Five genes were added as reference genes, and a recurrence score was developed. Paik and colleagues [85] showed that the multigene RT-PCR assay had a strong predictive value in patients with a history of node-negative, ER-positive tumors treated with tamoxifen in the adjuvant setting. The 10-year distant recurrence rate was 6.8% for patients with a low recurrence score, was 14.3% for patients with an intermediate recurrence score, and was 30.5% for patients with a high recurrence score.
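To put the three recurrence rates reported above side by side, the following Python sketch computes the absolute difference and the crude ratio of each group's 10-year distant recurrence rate relative to the low-score group. These are simple arithmetic comparisons of the published percentages, not hazard ratios, and the variable and function names are hypothetical.

```python
# 10-year distant recurrence rates (%) by recurrence score group, from the text
REPORTED_RATES = {"low": 6.8, "intermediate": 14.3, "high": 30.5}


def compare_with_low(rates):
    """Return the absolute difference and crude ratio of each rate vs the low group."""
    baseline = rates["low"]
    return {
        group: {"difference": rate - baseline, "ratio": rate / baseline}
        for group, rate in rates.items()
    }


comparison = compare_with_low(REPORTED_RATES)
for group, values in comparison.items():
    print(
        f"{group}: +{values['difference']:.1f} percentage points, "
        f"{values['ratio']:.2f} times the low-score rate"
    )
```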
A smaller study conducted at MD Anderson Cancer Center showed no relationship between the Oncotype DX recurrence score and distant recurrence-free survival in patients with node-negative breast cancer who had not received any adjuvant systemic therapy [88]. Although there may be many explanations for this finding, it is also possible that the model is good at predicting response or lack of response to tamoxifen but has limited prognostic power. More studies are needed to establish the prognostic role of this assay in clinical management.

Proteomics
In the postgenome era, scientists have turned to proteomics to understand complex biological systems. Proteomics is defined as the identification, characterization, and quantification of all proteins involved in a particular tissue, organ, or organism to provide accurate and comprehensive data about that system.
One of the methods most commonly used to study differences in protein expression between two samples (e.g. cancer and normal tissue) is two-dimensional gel electrophoresis. Highly sensitive mass spectrometry methods are currently being used together to identify greater numbers of lower abundance proteins that are differentially expressed in defined cell populations. Matrix-assisted laser desorption/ionization time-of-flight and surface-enhanced laser desorption/ionization time-of-flight analyses enable high-throughput characterization of lysates from even a very few tumor cells, and they may be best suited for clinical biomarker studies [89,90].
Novel technologies still in developmental phases will enable identification of validated targets in small biopsy specimens, including high-density protein, antibody, and lysate arrays [91,92]. No proteomics-based assay for assessing prognosis in breast cancer patients has yet been developed.

Conclusion
Prognostic and predictive molecular markers commonly used in clinical practice include Ki-67, ER, PR, and HER-2.
From the National Institutes of Health overview it was clear that, once basic pathology had been excluded, there was very little else that had been appropriately validated and in which there was good quality control. This issue of quality control is one of the most important challenges for validation of most molecular markers discussed.
Prognostic indices that integrate clinical, histologic, and molecular parameters will need to be developed and validated in conjunction with novel bioinformatic methodologies (e.g. artificial intelligence) to aid clinical decision-making. High-throughput cDNA microarray technologies and tumor array technologies are allowing the expression of literally thousands of genes and proteins to be analyzed at one time. Validation of these technologies in adequately powered prospective clinical trials will allow the integration of multiple molecular factors in the risk assessment and management of individual patients with breast cancer.