Gene signatures of breast cancer progression and metastasis

Breast cancer is a heterogeneous disease. Patient outcome varies significantly, depending on prognostic features of patients and their tumors, including patient age, menopausal status, tumor size and histology, nodal status, and so on. Response to treatment also depends on a series of predictive factors, such as hormone receptor and HER2 status. Current treatment guidelines use these features to determine treatment. However, these guidelines are imperfect, and do not always predict response to treatment or survival. Evolving technologies are permitting increasingly large amounts of molecular data to be obtained from tumors, which may enable more personalized treatment decisions to be made. The challenge is to learn what information leads to improved prognostic accuracy and treatment outcome for individual patients.


Introduction
Breast cancer is a heterogeneous disease, including subtypes based on hormone receptor status and amplification of HER2 [1,2]. These subtypes have distinct underlying molecular defects that affect both their aggressiveness and the signaling pathways that are vulnerable to targeted therapies [3,4]. While these designations are extremely useful, breast cancer can also exhibit significant intratumoral heterogeneity, both between individual tumor cells and between tumor and stromal compartments. For example, tumors classified as hormone receptor positive may have different proportions of estrogen receptor (ER) or progesterone receptor (PR) positive cells. Thus, a tumor may contain some cells that are more responsive than others to a given treatment, or cells that are more likely than others to spread distantly. Breast cancer stem cells, which may be more resistant to therapies and/or more likely to metastasize, are thought to contribute to this intratumoral heterogeneity [5,6]. In addition, breast cancer is 'temporally heterogeneous', with cancers presenting at different stages of their evolution. In general, cancers detected early in progression are less dangerous and more amenable to treatment than those detected later.
Characterizing the nature of an individual breast cancer, in terms of both type of breast cancer and stage of progression, is crucial for estimating the prognosis of the patient and for predicting whether a given treatment will be successful. However, prognostic and predictive information is population-based. While useful, this information does not necessarily predict the fate of an individual with breast cancer. As a result, some women may be over-treated and others under-treated, or treated with therapy that will not offer benefit. Thus, improved ways to 'individualize' prognosis and treatment decisions are needed [7].
In an attempt to meet this need for more 'personalized' information to guide treatment, additional ways of classifying individual tumors are being studied, based on single biomarkers or more complex molecular signatures. Rapidly evolving technologies that enable detailed molecular profiling of tumors are raising hopes that breast cancer treatment decisions may become even more tailored to an individual breast cancer patient's tumor.
Here we discuss the role that some of these new profiling approaches may play in cancer patient management, and the role that tumor and patient heterogeneity may play in using this information to best benefit patients.
The prognosis, prediction and treatment of breast cancer are complicated by the diverse constellation of causative alterations within multiple biological pathways that lead to this heterogeneous disease. Initial strategies to treat breast cancer have therefore employed gene-specific, tissue-specific and whole genome approaches to identify specific signatures related to particular breast cancer types, which can then be exploited to optimize treatment targeting a specific patient's tumors. Some studies have evaluated the expression status of individual candidate genes in cell lines and/or tumor material in a tissue-specific manner. For example, significantly reduced levels of mRNA expression of the metastasis suppressor genes BRMS1, KISS1 (kisspeptin), KAI1 (CD82) and Mkk4 (MAP2K4; mitogen-activated protein kinase kinase 4) have been shown in breast cancer brain metastasis [8], with specific suppression of BRMS1 modifying several metastasis-related phenotypes [9]. Whole genome approaches using microarray platforms have identified more extensive gene sets that can predict a short interval to distant metastases (that is, a poor prognosis signature) [10,11] or have identified gene sets that mediate metastasis from a specific primary tissue to a tissue-specific host site [12,13]. Minn and coworkers [14] identified a complex 54-gene breast cancer set that marks and mediates breast cancer metastasis to the lungs. This set appeared to consist of at least two separate classes of genes: those that confer both breast tumorigenicity and lung metastagenicity, and those that are advantageous to cells in the lung environment. Additionally, Kang and co-workers [15] identified a functionally diverse gene set that, when overexpressed, cooperatively promotes the metastasis of breast cancer cells to bone.
Importantly, clinically significant 21-gene [16] and 70-gene signatures [10,17] have formed the basis for widely used molecular diagnostic tests that have been translated and validated as prognostic and predictive tools for guiding treatment decisions in specific breast cancer patient cohorts. These particular markers will be discussed in detail later in this review. Finally, several reports have addressed the contributions of altered epigenetic signatures in breast cancer models [18,19] and through the integration of multiple genetic and epigenetic multi-gene platforms [20].
These reports underscore the complexity of metastasis as a multigenic process and support the concept that heterogeneous, selectable subpopulations of cells in the primary tumor may possess specific gene sets that are permissive for metastasis and/or for the colonization and growth of those cells at specific secondary sites. The challenge for the clinician remains to identify the relevant gene sets and to exploit this information to permit better prognosis and personalized treatment options for individual patients.

Current prognostic and predictive factors: a clinical perspective
Traditional clinical prognostic factors are still commonly used to guide therapy. Pathologic subtyping is important. For example, pure infiltrating lobular [21], phyllodes [22], mucinous and tubular carcinomas [23] have a generally better prognosis than infiltrating ductal cancers, although the lobular cancers may have more late relapses. Increased nodal status, high tumor grade, high Ki67, increased tumor size and negative receptor status (especially PR) are associated with a poorer prognosis [24]. The increased use of sentinel lymph node dissection and the subsequent more detailed examination of fewer nodes have resulted in more nodes with micrometastases (>0.2 to ≤2.0 mm) being identified, resulting in a new category for nodal status in the American Joint Committee on Cancer (AJCC) Cancer Staging Manual [25]. Although micrometastases have been associated with a poorer prognosis [26], it is possible that their prognostic impact has been diluted or eliminated by the use of modern systemic therapy [27]. More recent classifications include HER2 status [28] and basal-like breast cancer [3]. Interestingly, the National Comprehensive Cancer Network (NCCN) and American Society of Clinical Oncology (ASCO) guidelines give discordant recommendations for use of HER2 status for prognosis [29,30]. Basal breast cancer is generally thought to have a poorer short-term but better long-term prognosis [31], but understanding of this variant is hampered by the absence of a universally accepted definition [3,32]. There is increasing evidence that prognosis also may be related to patient-specific factors, including very young age [33] and, in postmenopausal women, being overweight and excessive alcohol consumption [34,35]. Thus, environmental factors may have a role in determining recurrence of cancer.
Although race has been associated with poorer prognosis [36,37], this might be an epiphenomenon related to a complex interplay between socio-economic, cultural and biological factors [38]. Therefore, a better understanding of tumor biology may help discriminate among the relative importance of these factors. Research on prognostic markers would be more clinically relevant in the future if the REMARK (Reporting Recommendations for Tumor Marker Prognostic Studies) guidelines developed by the National Cancer Institute-European Organisation for Research and Treatment of Cancer (NCI-EORTC) were implemented [39]. However, a recent sampling of 50 studies from high impact journals indicated poor compliance with the recommendations [40]. These guidelines apply not only to single biomarkers, but also to panels of markers and profiles [41].
Guidelines for the use of predictive factors to target therapy have been published by the St Gallen's group [42], the National Comprehensive Cancer Network [29] and ASCO [30]. The Adjuvant! Online decision aid [43], although widely used, does not incorporate HER2 status and suffers from difficulties in interpretation of the comorbidity index, which may significantly affect the interpretation of benefit when overall, and not just cancer, mortality risks are considered. It also does not incorporate potentially important independent risk factors, such as presence of lymphatic or vascular invasion in node negative disease [43,44].
The most difficult areas of controversy are in deciding whether to give chemotherapy to postmenopausal women who have low or even intermediate grade ER or PR positive, HER2 negative breast cancers with one to three nodes positive, or to those with negative nodes and ER or PR positive, HER2 negative intermediate grade tumors [45,46]. There may also be subsets of women, especially those with HER2+ T1bN0 cancers, who might be at increased risk of relapse but for whom, at this time, there are no clear guidelines for treatment. Neoadjuvant chemotherapy is increasingly used in both clinical and research settings. Although pathologic complete response is an important surrogate endpoint, more useful functional and molecular imaging tools along with biological assessment of tissue are required [47]. It is in these areas that there is the greatest potential for the use of newer biologically derived profiling technologies. Finally, a greater understanding of molecular subtypes may allow for more rational use of chemotherapy in important subsets of breast cancers [48].
Molecular subtyping provides a 'snapshot' of a tumor at a single point in time. However, tumor status may change when metastases are compared to primary cancers. A meta-analysis of 8 observational studies totaling 658 paired ER samples and 418 paired PR samples comparing primary and metastatic tumors showed discordance rates of 29% and 27% for ER and PR, respectively [49]. Comparisons of HER2 status between primary and metastatic sites have given discordance rates between 0% and 13.6% in seven studies, suggesting somewhat higher concordance [50-52], although one other study had a 34% discordance rate [53]. Discordance in markers led to a change in management in 20% of patients, suggesting that repeat biopsies should be considered in patients with metastases [54]. Discordance in HER2 status also has been reported between primary tumors and bone marrow metastases [55] as well as circulating tumor cells [56,57], raising questions about treatment decisions based solely on the HER2 status of the primary tumor. Much remains to be learned about molecular alterations and gene expression patterns in primary tumors versus their metastases, but these studies are complicated by the frequent difficulty of obtaining matched tissue samples, especially when metastases may be detected long after a primary tumor has been resected. However, recent studies are beginning to document this heterogeneity [58-60]. How much these changes are driven by treatment, tumor progression, discrepancies in initial typing or intrinsic heterogeneity is unclear. It is clear that prognostic and predictive information obtained from the initial diagnosis of breast cancer and resection of the primary tumor may be an imperfect guide to the treatment of metastatic disease.

'First-generation' expression profiling as prognostic and predictive factors
As noted above, a small number of expression profiling strategies have been successfully developed and validated for clinical use, some of which are now commercially available [61,62]. These include the 70-gene expression signature as used in the MammaPrint® (Agendia, Amsterdam, The Netherlands) assay, and the 21-gene profile used in the Oncotype Dx® (Genomic Health, Redwood City, CA, USA) assay. Clinical evidence in hormone responsive breast cancer supports the abilities of these assays to distinguish between patients who will do well and do not benefit from chemotherapy added to hormone therapy, and patients who have poorer prognosis and who will benefit from added chemotherapy [61,62]. These assays are increasingly used in the clinical setting to help in treatment decisions. A comparison of four studies from the US and the Netherlands indicated that these assays led to changes in treatment decisions in 18 to 44% of cases, often in the direction of not giving chemotherapy to patients predicted not to benefit. However, it should be noted that a recent study by Parisi and colleagues [63], which compared protein levels of 14 markers used in the Oncotype Dx assay with nodal status, tumor size, nuclear grade and age, found that a combined model incorporating both molecular and standard clinical-pathological information provided better prognostic information than either system alone. There thus remain questions about the most effective use of molecularly based assays in the clinical setting.
Some of these questions will be addressed in two ongoing clinical trials, MINDACT (Microarray In Node negative Disease may Avoid ChemoTherapy) and TAILORx (Trial Assigning IndividuaLized Options for Treatment (Rx)). Both trials are designed to assess the abilities of molecularly based assays to determine the best adjuvant treatment for specific subsets of breast cancers, and in particular to determine which patients need chemotherapy and which are unlikely to benefit from it. These trials have been summarized in detail elsewhere [61,62,64].
The TAILORx trial is using the Oncotype Dx 21-gene assay in lymph node negative, ER and/or PR positive, HER2-negative tumors [62,65]. Women with low 'recurrence scores' (RS <11) will receive hormone treatment only, and women with high RS (>25) will receive chemotherapy plus hormone therapy, as current standard of care. Women with intermediate RS (11 to 25), where there is uncertainty about the need for chemotherapy, will be randomized to hormone therapy, plus or minus chemotherapy, to test the benefit of adding chemotherapy for this group of patients.
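The recurrence score cutoffs described above amount to a simple three-way rule, which can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in this review, not the trial's actual enrollment software; the function name `tailorx_arm`, the returned labels and the 0 to 100 range check are assumptions introduced here.

```python
def tailorx_arm(recurrence_score: int) -> str:
    """Map a recurrence score (RS) to the TAILORx arm described in the text.

    Hypothetical helper for illustration only; thresholds are those quoted
    in this review (low: RS < 11, intermediate: 11 to 25, high: RS > 25).
    """
    # Assumption: RS is reported on a 0-100 scale, so reject other values.
    if not 0 <= recurrence_score <= 100:
        raise ValueError("recurrence score must be between 0 and 100")
    if recurrence_score < 11:
        # Low RS: hormone treatment only.
        return "hormone therapy only"
    if recurrence_score <= 25:
        # Intermediate RS: benefit of chemotherapy is uncertain, so randomize.
        return "randomized: hormone therapy with or without chemotherapy"
    # High RS: chemotherapy plus hormone therapy.
    return "chemotherapy plus hormone therapy"


if __name__ == "__main__":
    for rs in (5, 11, 25, 26):
        print(rs, "->", tailorx_arm(rs))
```

Note that the boundary values 11 and 25 both fall in the randomized group, which is where the trial concentrates its uncertainty about the need for chemotherapy.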
The MINDACT trial will use the 70-gene profile (MammaPrint) on fresh tissue from women with node negative breast cancer, and will compare the utility of this assay with current clinical-pathological assessment, as defined by the Adjuvant! Online tool [66,67]. Women whose risk assessments are concordant using the two assays will receive current standard treatment for their risk groups. Women with discordant determinations from MammaPrint versus Adjuvant! Online will be randomized to receive either chemotherapy or no chemotherapy. Together, the MINDACT and TAILORx trials will provide prospective evidence about the utility of molecularly based tests, to help determine the need for adjuvant chemotherapy in some women and identify women who are unlikely to benefit from chemotherapy, thus providing more individualized treatment decisions for women with breast cancer [61,62,64]. Figure 1 diagrams the path from traditional clinical and prognostic factors, as well as currently available and evolving signatures, to clinical application for improved and more personalized treatment decisions, as illustrated by the examples discussed above.
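The MINDACT allocation logic described above (concordant risk assessments receive standard treatment, discordant ones are randomized) can likewise be sketched as a small function. This is a simplified illustration of the design as summarized in this review; the `Risk` enum, the function name `mindact_allocation` and the two-level low/high risk coding are assumptions made here, not part of the trial protocol.

```python
from enum import Enum


class Risk(Enum):
    """Two-level risk call; a simplifying assumption for this sketch."""
    LOW = "low"
    HIGH = "high"


def mindact_allocation(genomic: Risk, clinical: Risk) -> str:
    """Return the allocation for one patient, following the text's summary.

    genomic  -- risk call from the 70-gene profile (MammaPrint)
    clinical -- risk call from clinical-pathological assessment (Adjuvant! Online)
    """
    if genomic is clinical:
        # Concordant assessments: current standard treatment for that risk group.
        return f"standard treatment for the {genomic.value}-risk group"
    # Discordant assessments: randomized between chemotherapy and no chemotherapy.
    return "randomized: chemotherapy or no chemotherapy"


if __name__ == "__main__":
    for g in Risk:
        for c in Risk:
            print(f"genomic={g.value}, clinical={c.value}: {mindact_allocation(g, c)}")
```

Only the two discordant combinations reach the randomized arm, which is why the trial's informative comparisons come from the subset of women for whom the two assessments disagree.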

The road ahead: challenges and opportunities
The advent of next generation sequencing (NGS) technologies promises to provide powerful new tools to identify those individuals who may be at risk of developing primary or metastatic tumors, and has the potential to further enhance 'personalized' treatment decisions. NGS allows complete genomes to be sequenced in a matter of days, resulting in valuable, personalized information identifying mutations in patient or tumor DNA or RNA samples. While a full review of the technologies available today is beyond the scope of this work, readers are directed to excellent reviews that have been written on the subject [68,69].
A recent report [60] demonstrated how NGS can be used to characterize somatic mutations occurring during the development and progression of lobular breast cancer. Using DNA and RNA resequencing, 32 somatic non-synonymous mutations were found in a metastatic tumor, 19 of which were not present in the primary lesion. In addition, RNA sequencing detected two new RNA editing events that recode the amino acid sequences of two proteins, SRP9 and COG3. These compelling results demonstrate that heterogeneity at the single nucleotide level can be an inherent property of low to intermediate grade tumors, and that significant evolution can occur with progression of the disease.
In the clinical setting, testing for inherited loss of function mutations in tumor suppressor genes in women with a family history of breast or ovarian cancer is generally limited to the BRCA1 and BRCA2 genes. To address the fact that there are many other inherited mutations that may predispose one to these cancers, the authors of a recent report [70] developed an NGS assay to capture, sequence and detect all mutations in 21 genes (including BRCA1 and BRCA2) in women previously diagnosed with breast or ovarian cancer and carrying a mutation in at least one of the genes responsible for inherited predisposition to these diseases. They were able to detect all single nucleotide substitutions, indel mutations, and large duplications and deletions that had been previously confirmed, with no false positive calls. Taken together, their results showed that widespread genetic testing and personalized risk assessment in these patients is feasible.
The use of massively parallel sequencing technologies, however, is not without significant challenges that will have to be overcome if they are to be used extensively in the clinical setting. The foremost of these is the cost of sequencing [69], although, as the technologies evolve, it is expected that this cost will drop significantly, as was seen with microarray analyses. Indeed, the National Human Genome Research Institute in the US has announced a program with the ultimate goal of completely resequencing the human genome for $1,000 or less [60,71]. Secondly, the samples will rarely be purely tumor tissue, with the presence of 'contaminating' DNA or RNA derived from normal tissue, immune cells or stromal tissue making the acquisition of a 'true' tumor signature a challenge. Thirdly, an inherent issue in NGS is the sheer volume of data generated by these analyses and whether appropriate bioinformatics expertise is available to assess these vast datasets.
To date, no large scale studies analogous to those that led to the Oncotype Dx or MammaPrint assays have been performed using NGS technologies. However, efforts are underway to create a comprehensive database of genetic alterations in breast cancer, such as that being undertaken by the Breast Cancer International Cancer Genome Consortium [69,72]. Coupled with efforts to create a panel of 'normal' samples (for example, the 1000 Genomes Project [73]), these initiatives have the potential to define a panel of disease-specific genetic anomalies that may eventually be used in elucidating a 'genomic alteration signature'. These signatures may one day be tested in large scale clinical trials similar to the MINDACT or TAILORx studies referred to earlier. In addition, as the technologies and associated analyses are perfected, NGS information may be integrated with global gene expression studies on a personalized basis, allowing for a comprehensive and refined prognostic ability and treatment plan.

Challenges posed by heterogeneity
Perhaps the greatest challenge to successfully developing clinically valid gene signatures for breast cancer diagnosis, prognosis and prediction of treatment response relates to the multiple forms of heterogeneity in breast cancer. These exist at the level of the causative molecular pathway(s), with regard to the clonal composition of the tumor itself and in the context of genetic variability within the patient population. Tumor development is essentially Darwinian, in that any of a number of molecular pathways that have been selected for in a specific tumor cell can contribute to the 'successful' metastatic tumor [74]. Moreover, this heterogeneity is dynamic, as selective pressures change (for example, in the new environment encountered by a metastatic cell in a secondary tissue site) [75,76]. Thus, gene signatures may offer no more than a snapshot of a tumor's gene expression profile that is relevant for only a particular point in time. Furthermore, the presence of subpopulations of tumor cells that differ in their genetic makeup, metastatic potential, invasiveness and capacity to replicate may further compromise an already complex signature, in that the most clinically relevant signature may be masked by a 'non-lethal' signature that dominates the tumor's DNA or RNA sample. Lastly, the selection and fate of specific tumor cells and the susceptibility of these cells to appropriate treatments are also likely dependent on inherited genetic variations that can affect the patient's tumor and response to chemotherapy [77,78]. Taken together, these multiple aspects of biological, clonal and patient heterogeneity make the process of establishing gene signatures both challenging and complex. Thus, comprehensive genomic analysis of tumor subpopulations and of the host patient is likely the best way to effectively use gene signatures from both patient and tumor, so that treatment plans can be optimized.

Conclusion
Significant progress has been made over the past decade in using technical advances in molecular genetics to develop clinically relevant tools to aid in the prediction and treatment of breast cancer. However, even as these advances have been made, we are learning more about the complex biology that underlies this set of potentially devastating diseases. Several important challenges must be faced. First, it is clear that there will be no shortage of information available regarding clinical characteristics of the patient (for example, age, menopausal status) or the clinical and molecular characteristics of her/his tumor (ranging from tumor histology to genomic signatures). Instead, the clear challenge is to capture the clinically relevant signature(s) from the cacophony of molecular noise that exists, due to inherent issues related to tumor heterogeneity and disease complexity. In addition, these individual data sets must be linked directly with informative patient/tumor information that is specific to that individual. The selection advantage provided by a particular set of genetic changes is critically important to the survivability of that tumor cell, and ultimately that same set of information is critical in guiding the choice of an effective treatment regimen for that patient. As we move forward it is therefore necessary to link these new genetic signatures with specific patient subgroups, while concurrently developing the molecular therapies that target the specific disease-related genetic alterations identified in those signatures.
This article is part of a review series on Multiple gene prognostic factors, edited by Lewis Chodosh.

Abbreviations
ASCO, American Society of Clinical Oncology; ER, estrogen receptor; MINDACT, Microarray In Node negative Disease may Avoid ChemoTherapy; NGS, next generation sequencing; PR, progesterone receptor; RS, recurrence score; TAILORx, Trial Assigning IndividuaLized Options for Treatment (Rx).

Competing interests
The authors declare that they have no competing interests.