The breast cancer somatic 'muta-ome': tackling the complexity
Andrew E Teschendorff and Carlos Caldas

Acquired somatic mutations are responsible for approximately 90% of breast tumours. However, only one somatic aberration, amplification of the HER2 locus, is currently used to define a clinical subtype, one that accounts for approximately 10% to 15% of breast tumours. In recent years, a number of mutational profiling studies have attempted to identify further clinically relevant mutations. While these studies have confirmed the oncogenic or tumour suppressor role of many known suspects, they have exposed complexity as a main feature of the breast cancer mutational landscape (the 'muta-ome'). The two defining features of this complexity are (a) a surprising richness of low-frequency mutants contrasting with the relative rarity of high-frequency events and (b) the relatively large number of somatic genomic aberrations (approximately 20 to 50) driving an average tumour. Structural features of this complex landscape have begun to emerge from follow-up studies that have tackled the complexity by integrating the spectrum of genomic mutations with a variety of complementary biological knowledge databases. Among these structural features are the growing links between somatic gene disruptions and those conferring breast cancer risk, patterns of mutual exclusivity, coexistence, and synergy among mutations, and a clearly non-random distribution of mutations implicating specific molecular pathways in breast tumour initiation and progression. Recognising that a shift from a gene-centric to a pathway-centric approach is necessary, we envisage that further progress in identifying clinically relevant genomic aberration patterns and associated breast cancer subtypes will require not only multi-dimensional integrative analyses that combine mutational and functional profiles, but also larger profiling studies that use second- and third-generation sequencing technologies in order to fill the important gaps in the current mutational landscape.


The copy-number 'muta-ome'
The most prominent feature in the breast cancer copy-number muta-ome is amplification of the HER2 locus, present in about 10% to 15% of all breast tumours. It is remarkable that, since the discovery of this amplification, no further ERBB2-like oncogene has been conclusively identified.
Although two recent large-scale (145 and 171 tumours) genome-wide profiling studies combining high-resolution copy-number and matched gene expression data have confirmed candidate oncogenes in well-known regions of recurrent amplification (notably, 8p12, 8q24, 11q13-14, 17q21-24, and 20q13), none of these appears to be as frequently amplified as ERBB2 and they rarely exhibit amplification profiles that clearly point at a specific genomic location or target [1,2]. Instead, the amplification profiles are complex and multi-modal, suggesting that multiple targets may coexist within these regions. This identification problem is compounded by the fact that a relatively high proportion of variation at the gene expression level (approximately 20%) is driven by copy-number changes; thus, focusing on regions of expression bias that are driven by underlying amplifications (so-called 'hotspots') still leaves an unmanageably large number of targets. Nevertheless, by focusing within these hotspots on genes that are also druggable, Chin and colleagues [1] prioritised a smaller set of nine targets: ERBB2 together with FGFR1, IKBKB, PROSC, ADAM9, FNTA, ACACA, PNMT, and NR1D1. Confirming the robustness of these findings, all of these were also found to reside in amplification hotspots in an independent breast cancer cohort [2] (A.E. Teschendorff and C. Caldas, unpublished data). In spite of this agreement, the two studies were discordant when hotspots were associated with clinical outcome, mirroring the disagreements of initial gene expression studies. Thus, whereas in [1] associations with survival and recurrence were restricted to the amplicons on 8p11-12 and 17q11-12, in [2] outcome-associated amplicons were found on 8q22.3, 8q24.3, 8q24.11-13, and 11q14.
As explained in [2], this disagreement is most likely due to substantial differences in the clinical characteristics of the two cohorts, yet the lessons learned from mRNA expression microarray studies also suggest that larger studies may help to resolve such discrepancies. In fact, given the increased complexity of genomic breast cancer profiles relative to mRNA profiles, much larger sample sizes (approximately 500 to 1,000) might be needed before substantial overlap between prognostic copy-number signatures is found. In addition, given that among copy-number aberrations it is high-level amplifications that seem to carry most of the prognostic significance [1,2], the development of a future prognostic copy-number signature, similar to the expression signatures currently being tested in clinical trials, would benefit from higher-resolution studies such as those using single-nucleotide polymorphism (SNP) arrays or next-generation sequencing technologies, as these have the capability to detect more focal (that is, less than 20 to 200 kb) amplifications and homozygous deletions (HDs). Indeed, a recent high-resolution SNP array study (317,503 to 555,351 SNPs) of 45 breast tumours was able to detect a larger repertoire of copy-number amplifications and HDs, some smaller than 250 kb [3]. Thus, in addition to identifying focal aberrations encompassing known oncogenes (CCND1, CCNE1, and FGFR2) and tumour suppressors (CDKN2A and PTEN), Leary and colleagues [3] uncovered a number of other important genes with likely oncogenic or tumour suppressor roles (PCDH8, MRE11A, and HOXA3). The higher-resolution SNP array also allowed them to estimate an average of 18 copy-number aberrations (7 HDs and 11 high-level amplifications) that were altered significantly above the background mutation rate, implicating about 24 genes that are driving tumour progression.
Larger-scale profiling studies are also needed in order to more fully characterise the genomic landscape of breast cancer. So far, four main genomic subtypes that roughly correlate with the well-known intrinsic subtypes defined from gene expression studies have been identified [1,2]. Specifically, one subtype (called 'simple') consisted almost entirely of estrogen receptor-positive (ER+)/luminal-A tumours and was characterised mainly by a relatively simple genomic profile defined by a gain of 1q and 16p and a simultaneous loss of 16q. Another genomic subtype ('simple amplifier') was characterised mainly, but not exclusively, by amplifications on 11q13-14 and 17q11-13 and mapped roughly to the ER+/luminal-A and ER-/HER2+ expression subtypes, respectively. Thus, amplifications on 11q13-14 and 17q11-13 occurred in a largely mutually exclusive fashion, suggesting that these tumours are using distinct oncogenic mechanisms. A third genomic subtype ('complex amplifier') showed the highest degree of genomic instability (GI) and complex rearrangements, including frequent amplifications at the 8q24 and 8p12 loci. This genomic subtype had the worst prognosis and consisted mainly of ER-/basal and ER+/luminal-B tumours. A fourth genomic subtype consisted mainly of ER-/basal tumours but was characterised, surprisingly, by a low GI profile ('flat') [2]. It is doubtful, however, whether such classifications will have much relevance, as it is likely that the clinically relevant phenotypes are determined by specific combinations of different types of somatic mutations. Thus, to identify breast cancer genomic subtypes that are more relevant, we envisage that it will be necessary to derive and analyse multi-dimensional mutational profiles.

The point 'muta-ome'
Two recent landmark sequencing studies [4,5] have also revealed the remarkable complexity of the point mutational landscape of breast cancer. In addition to the high-frequency somatic point mutations in TP53 (53%), PIK3CA (26%), and CDH1 (21%) (the so-called gene 'mountains'), these studies have confirmed a surprisingly large number of other genes that also appear to be mutated more frequently than can be accounted for by chance, albeit at much lower frequencies than TP53 or PIK3CA. Specifically, for the 11 breast tumours considered in the discovery screen of [5], a total of 1,137 RefSeq genes contained a somatic mutation and 167 of these were also validated in an independent screen of 24 samples. When a number of different estimation procedures for passenger mutation rates were used, it was found that approximately 120 genes were more frequently mutated than the passenger rate, suggesting that these genes are more likely to carry driver mutations (so-called CAN genes). This translates into an average of 101 non-synonymous somatic mutations in RefSeq genes per breast tumour, and approximately 14 of these are thought to be located in CAN genes [5]. It could be argued that this is a gross overestimate due to the highly tumourigenic nature of the breast samples considered (all were ER- cell lines mostly representing metastases); however, it is also plausible that it would increase as larger-scale sequencing studies are completed. Combining this estimate with the approximately 24 genes implicated by driver amplifications/HDs in any given breast tumour [3], and assuming mutual exclusivity of mutation type, yields an average of approximately 40 (14 + 24 = 38) CAN genes in any given tumour.
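The logic of calling a gene 'more frequently mutated than the passenger rate' can be sketched with a simple binomial tail probability. This is only an illustrative simplification and not the scoring procedure used in [4,5]; the passenger rate, gene length, and tumour counts below are hypothetical values chosen to make the arithmetic concrete, and the function names are our own.

```python
from math import comb, expm1, log1p


def passenger_hit_probability(rate_per_base, coding_length):
    """Chance that one tumour carries at least one passenger mutation in a gene."""
    # 1 - (1 - rate)^length, computed stably for very small rates.
    return -expm1(coding_length * log1p(-rate_per_base))


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical inputs (assumptions, not values taken from the cited studies).
rate = 1e-6       # assumed passenger mutations per base per tumour
length = 1500     # assumed coding length of the gene, in bases
tumours = 30      # assumed number of tumours screened
mutated_in = 3    # assumed number of tumours with a mutation in this gene

p_hit = passenger_hit_probability(rate, length)
p_value = binomial_tail(mutated_in, tumours, p_hit)
print(f"per-tumour passenger hit probability: {p_hit:.6f}")
print(f"P(at least {mutated_in} of {tumours} tumours hit by chance): {p_value:.2e}")
```

A small tail probability flags the gene as a candidate driver under this toy model. Because the result scales directly with the assumed passenger rate, any uncertainty in that rate carries over to the list of candidate genes.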
Given that estimates of passenger mutation rates are only approximate and that a high mutation rate may reflect mechanisms that are only indirectly related to tumourigenesis or progression, it could be expected that many of the identified CAN genes are false positives. Nevertheless, several of the CAN genes identified in [4,5] (notably, IKBKB, IKBKA, CHD5, STK11, STK6, and BRAF) have been independently implicated in breast or other tumours, whereas other genes (for example, ATM and FGFR2) have been associated with germline mutations and an increased risk of breast cancer. Moreover, in an independent bioinformatics study, several breast cancer CAN genes (GAB1, NLE1, and CNTN6) were rediscovered as members of protein interaction subnetworks with combined expression levels that correlated with clinical outcome [6], consistent with the idea that mutations in these genes are disrupting important signalling pathways and thus fuelling malignant progression. Taken together, these observations provide strong support for the direct causal role of the identified CAN genes in tumour progression and, for some, probably tumourigenesis also.

Massively parallel sequencing: a fusion 'muta-ome'?
The emerging pattern of a few highly prevalent CAN genes ('mountains') with a much larger number of low-frequency events ('hills') parallels the aberration landscape pattern observed at the copy-number level. Thus, a question posed by these studies is whether the relative scarcity of cancer gene 'mountains' is a genuine feature of the cancer muta-ome or whether it merely reflects the inability of the technology to detect further high-frequency mutations. The latter scenario seems highly unlikely in the case of point mutations [5]. More focal (≤20 to 50 kb) amplifications and deletions will be discovered using higher-resolution SNP arrays and massively parallel sequencing, as already demonstrated with SNP arrays by Leary and colleagues [3], yet it also seems unlikely that these will constitute novel gene 'mountains'. Another possibility is that the missing high-frequency events involve other types of genomic rearrangements such as balanced translocations or inversions, which are not detectable using copy-number analysis but which may be discovered using massively parallel paired-end sequencing or transcriptome sequencing, as illustrated recently in lung cancer [7], prostate cancer [8], and breast cancer [9,10] cell lines. Using a combined strategy of short- and long-read transcriptomic sequencing of a metastatic prostate cancer cell line, Maher and colleagues [8] identified a variety of mechanistically different and novel gene fusions, including a novel read-through gene fusion chimaera SLC45A3-ELK4, which interestingly was also found to be aberrantly expressed in 7 out of 20 (35%) metastatic prostate cancer cell lines, 6 of which were negative for ETS gene fusions. Similarly, long-read transcriptomic sequencing of a breast cancer cell line (HCC1954) identified no fewer than seven chimaeric transcripts arising mostly from intragenic to intergenic gene fusion events leading to protein truncation.
Notably, among the seven rearrangements, one, t(4;11)(q32;q21), causes truncation of the DNA-binding domain of MRE11A, a candidate tumour suppressor involved in DNA damage repair, whereas two other fusions, t(5;8)(q35.3;q24.21) and t(5;8)(p15.33;q24.21), implicate the known cancer genes NSD1 and PVT1 and the 8q24.21 breast cancer susceptibility region. Although these findings are very encouraging, large-scale sequencing studies will be needed to evaluate the prevalence of these genomic fusion aberrations in breast cancer. If these studies were to confirm that specific inter- and intra-chromosomal gene fusions are highly frequent events in breast cancer (as already demonstrated in prostate cancer), it would vindicate the long-held belief that this type of rearrangement constitutes the most prevalent category of somatic aberrations in all tumours.

Tackling the complexity: structural features of the 'muta-ome'
The broad picture emerging from the large-scale low-resolution array-based comparative genomic hybridisation studies [1,2] and the smaller-scale higher-resolution sequencing studies [5,9,10] is that of a highly complex genomic landscape, not only in terms of the sheer numbers of genes that may be directly implicated, but also in terms of the different types and flavours of genomic rearrangements that seem to be causally involved. Clearly, building a vast catalogue of somatic mutations in breast cancer [11] constitutes only the first step on the long road to identifying novel drug targets and developing therapies that are more effective. Although the catalogue is still incomplete, it is comprehensive enough that various structural features are starting to emerge.
First, point mutations that are activating or inactivating often occur in genes that are also commonly amplified or deleted [3,12]. This observation is important because it identifies genes that are disrupted by multiple, and possibly equivalent, mechanisms and that therefore are more likely to drive tumour progression. Thus, it may be possible to categorise all somatic mutations into activating or inactivating aberrations and to carry this simplified picture forward when analysing the systems-level implications of these aberrations. A similar approach was taken by Leary and colleagues [3], who combined the point and copy-number muta-omes to identify pathways targeted for disruption.
A second structural feature to emerge has been the mutual exclusivity pattern of mutations [13], often involving genes in a common pathway or genes exhibiting high sequence similarity [14], as well as cooperative or synergistic interactions, often involving physically interacting proteins such as SMAD2 and SMAD3 in colorectal cancer [4]. In the case of breast cancer, some clear mutual exclusivity patterns are emerging (for example, those of PIK3CA and TP53 mutations; A.E. Teschendorff and C. Caldas, unpublished data). There are also independent hints of further exclusivity patterns, such as those of PIK3CA and RAS [13,15], and hints of cooperation or coexistence, such as that involving HER2 amplification and PTEN loss (including loss of PTEN expression) [15]. Although in most of these studies sample sizes are still too small for the results to be conclusive, the strong mutual exclusivity and cooperative patterns observed in, for example, a sequencing study of 188 lung tumours [12] suggest that similar mutational patterns will be seen in breast cancer.
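Whether two mutations co-occur less often (mutual exclusivity) or more often (cooperation) than expected by chance can be examined with a one-sided Fisher's exact test on a 2x2 table of tumour counts. The following is a minimal sketch of that test using only the Python standard library; it is not necessarily the procedure used in the cited studies, and the counts are hypothetical.

```python
from math import comb


def cooccurrence_pvalues(both, only_a, only_b, neither):
    """One-sided Fisher's exact p-values for a 2x2 table of tumour counts.

    Returns (p_exclusive, p_cooccur): the probability of seeing at most,
    or at least, `both` double-mutant tumours if A and B were independent.
    """
    n = both + only_a + only_b + neither
    a_total = both + only_a   # tumours with mutation A
    b_total = both + only_b   # tumours with mutation B

    def pmf(x):
        # Hypergeometric probability of exactly x double mutants.
        return comb(a_total, x) * comb(n - a_total, b_total - x) / comb(n, b_total)

    lowest = max(0, a_total + b_total - n)
    highest = min(a_total, b_total)
    p_exclusive = sum(pmf(x) for x in range(lowest, both + 1))
    p_cooccur = sum(pmf(x) for x in range(both, highest + 1))
    return p_exclusive, p_cooccur


# Hypothetical counts for two genes A and B across 50 tumours.
p_excl, p_co = cooccurrence_pvalues(both=1, only_a=12, only_b=14, neither=23)
print(f"mutual exclusivity p = {p_excl:.3f}, co-occurrence p = {p_co:.3f}")
```

With small cohorts the test has little power, which is consistent with the caution above that current sample sizes are often too small for conclusive results.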
Finally, a third feature to emerge is the evidently non-random distribution of mutations and affected genes in relation to specific protein domains, gene functions, and molecular pathways. In the case of somatic mutations, a strong association with spectrin repeat and fibronectin domains, GTPase activation, extracellular matrix, and cell-cell adhesion as well as calcium ion-binding functions was observed [14,16]. This reinforces the crucial role that dysregulation of the cytoskeleton, including cell-cell adhesion, plays in malignant progression of breast cancer, although it remains to be seen how this may depend on the subtype and stage of the specific tumours studied [14,16]. In the context of pathways, somatic mutations were found to be enriched in many pathways known to be pathogenic in breast cancer, including interferon signalling, cell-cycle checkpoint (G1-S), BRCA1/2 (breast cancer 1/2, early onset)-related DNA repair, and phosphatidylinositol 3-kinase (PIK3), v-akt murine thymoma viral oncogene homolog 1 (AKT), and transforming growth factor-beta (TGF-β) signalling [14,16]. In an integrative strategy, incorporating both point and copy-number aberrations in the enrichment analysis, Leary and colleagues [3] identified further important signalling pathways such as those of Notch, epidermal growth factor receptor (EGFR), fibroblast growth factor (FGF), and v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (ERBB2). Thus, these results should help to direct increasingly focused efforts that aim to develop therapies tailored to restoring these disrupted pathways. The study of Lin and colleagues [14] also illustrates nicely the added insights gained by adopting a more systems-level approach, mapping mutations onto well-curated protein interaction networks in order to identify subnetworks more prone to disruption.
Specifically, they were able to show that genes targeted by point mutations are more likely to occupy hubs in the corresponding protein interaction networks than genes that are not. Interestingly, of the 83 somatically mutated proteins in breast cancer [17], over half (59) were shown to be part of a large interaction cluster involving TP53, BRCA1, PIK3R1, and NFKB. Thus, a full interpretation and understanding of how the approximately 30 to 40 CAN genes driving a given tumour cooperate to disrupt important signalling pathways that themselves exhibit significant cross-talk will undoubtedly benefit from network-based approaches that try to give meaning to the aberrations in the context of the whole interaction network. Some further insights in this direction were recently offered by Taylor and colleagues [18], who showed that tumours are preferentially disrupted at 'intermodular' hubs, representing proteins that mediate signals between different cellular functions.
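The hub observation can be illustrated by comparing the number of interaction partners (the degree) of mutated and non-mutated genes in an interaction network. The sketch below is a deliberately simplified version of such an analysis, not the method of [14]; the gene labels G1 to G5 and the edge list are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean


def degrees(edges):
    """Number of distinct interaction partners for each gene in an edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {gene: len(partners) for gene, partners in neighbours.items()}


def mean_degree_by_status(edges, mutated):
    """Mean degree of mutated genes and of all other genes in the network.

    Both groups must be non-empty, otherwise statistics.mean raises an error.
    """
    deg = degrees(edges)
    hit = [d for gene, d in deg.items() if gene in mutated]
    rest = [d for gene, d in deg.items() if gene not in mutated]
    return mean(hit), mean(rest)


# Hypothetical toy network: G1 is the only mutated gene.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
mutated_mean, other_mean = mean_degree_by_status(toy_edges, {"G1"})
print(f"mean degree, mutated: {mutated_mean}; others: {other_mean}")
```

A real analysis would also need a significance test, for example by permuting the mutated labels, and would have to account for well-studied cancer genes tending to have more annotated interactions.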
The structural features of the complex muta-ome are only beginning to emerge, and full elucidation of these features will be crucial for translating the vast and growing catalogue of somatic mutations into improvements in therapy and eventual cures. Given the heterogeneity of breast cancer, accomplishing this goal will require upscaling the current Genome Atlas Research Project [19] to generate multi-dimensional high-resolution mutational and functional profiles of hundreds if not thousands of tumours.