
Genomic subtypes of breast cancer identified by array-comparative genomic hybridization display distinct molecular and clinical characteristics



Breast cancer is a profoundly heterogeneous disease with respect to biologic and clinical behavior. Gene-expression profiling has been used to dissect this complexity and to stratify tumors into intrinsic gene-expression subtypes, associated with distinct biology, patient outcome, and genomic alterations. Additionally, breast tumors occurring in individuals with germline BRCA1 or BRCA2 mutations typically fall into distinct subtypes.


We applied global DNA copy number and gene-expression profiling in 359 breast tumors. All tumors were classified according to intrinsic gene-expression subtypes and included cases from genetically predisposed women. The Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm was used to identify significant DNA copy-number aberrations and genomic subgroups of breast cancer.


We identified 31 genomic regions that were highly amplified in > 1% of the 359 breast tumors. Several amplicons were found to co-occur, the 8p12 and 11q13.3 regions being the most frequent combination besides amplicons on the same chromosomal arm. Unsupervised hierarchical clustering with 133 significant GISTIC regions revealed six genomic subtypes, termed 17q12, basal-complex, luminal-simple, luminal-complex, amplifier, and mixed subtypes. Four of them had striking similarity to intrinsic gene-expression subtypes and showed associations to conventional tumor biomarkers and clinical outcome. However, luminal A-classified tumors were distributed in two main genomic subtypes, luminal-simple and luminal-complex, the former group having a better prognosis, whereas the latter group included also luminal B and the majority of BRCA2-mutated tumors. The basal-complex subtype displayed extensive genomic homogeneity and harbored the majority of BRCA1-mutated tumors. The 17q12 subtype comprised mostly HER2-amplified and HER2-enriched subtype tumors and had the worst prognosis. The amplifier and mixed subtypes contained tumors from all gene-expression subtypes, the former being enriched for 8p12-amplified cases, whereas the mixed subtype included many tumors with predominantly DNA copy-number losses and poor prognosis.


Global DNA copy-number analysis integrated with gene-expression data can be used to dissect the complexity of breast cancer. This revealed six genomic subtypes with different clinical behavior and a striking concordance to the intrinsic subtypes. These genomic subtypes may prove useful for understanding the mechanisms of tumor development and for prognostic and treatment prediction purposes.


The accumulation of genomic aberrations is a fundamental part of solid-tumor development. Identification of patterns of DNA copy-number alterations (CNAs) and the genes that are targeted may reveal underlying mechanisms of disease evolution and potential candidates for therapeutic intervention. Breast cancer (BC) is a profoundly heterogeneous disease that encompasses several distinct disease entities, in which correct stratification of patients is critical for optimal disease management. Conventional markers for prognostication or treatment prediction or both include tumor size and lymph-node involvement, histologic grade, estrogen (ER) and progesterone receptor (PgR) expression, as well as human epidermal growth factor receptor 2 (HER2, HER2/neu, ERBB2) amplification status. Recent development of high-throughput molecular methods offers new opportunities to capture the wide range of genomic and biologic variability in tumors. A pioneering study by Perou et al. [1] used a gene-expression signature to disclose five intrinsic molecular subtypes of BC: basal-like, normal-like, HER2-enriched, and luminal A and B, suggested to reflect differences in cellular origin and divergent progression of tumors. Subsequent analysis confirmed the relevance of these molecular subtypes by showing correlations to clinical parameters and overall survival (OS) [2].

A study of BC pathophysiology approached the subject of BC heterogeneity by linking transcriptional and genomic profiles [3]. By classifying tumors according to their patterns of CNA, three groups were identified and referred to as the 1q/16q, amplifier, and complex subtypes. Tumors of the 1q/16q genomic subtype showed better disease-specific survival as compared with the amplifier and complex subtypes, whereas tumors with a basal-like gene-expression profile were found predominantly within the complex genomic subtype [3].

In the present study, we analyzed high-resolution genomic data from array-comparative genomic hybridization (aCGH) analysis of 359 BCs to gain further insight into patterns of CNA and loci that are specifically targeted by focal amplification or homozygous deletion. We combined these results with gene-expression data to reveal underlying mechanisms of disease evolution and to highlight genomic heterogeneity in BC. Hierarchical clustering analysis of significant genomic lesions revealed genomic subtypes that displayed different clinical outcomes and high similarity to the intrinsic gene-expression subtypes previously described [4]. Additionally, tumors from BRCA1 and BRCA2 mutation carriers were found in distinct genomic subtypes. In conclusion, we show that the genomic landscape of BC reveals subtypes that reflect biologic and clinical behavior.

Materials and methods

Patients and tumor material

Freshly frozen breast-tumor tissues (n = 359) were obtained from the Southern Sweden Breast Cancer Group tissue bank at the Department of Oncology, Skåne University Hospital, Lund; the Helsinki University Central Hospital; and Landspitali University Hospital. Of the 359 tumors, 346 were primary tumors, and the remaining cases were either local recurrences or lymph node metastases. Median OS follow-up time for patients from whom primary tumors were available was 8.1 years (range, 0.24 to 32 years). Tumor and patient characteristics are summarized in Table 1. Patients were diagnosed over a long time period at different institutions and were consequently not uniformly treated. The study was approved by the regional Ethical Committee in Lund (reg. no. LU240-01 and 2009/658), which waived the requirement for informed consent for the study; for the Icelandic data, by the Data Protection Committee and the National Bioethics Committee of Iceland; and for the Finnish data, by the Helsinki University Central Hospital Ethical Committee (207/E9/07). For Icelandic and Finnish patients, written informed consent was obtained according to the national guidelines.

Table 1 Patient and tumor characteristics for the 359 tumors

Gene-expression analysis

Global gene-expression analysis of breast tumors was performed by using oligonucleotide microarrays (Gene Expression Omnibus, GEO, [5] platform GPL5345) produced at the SCIBLU Genomics Centre at Lund University, Sweden [6], as described [7]. Data analysis and normalization [8] of the 359 tumors were performed, together with 218 other breast samples, as described (Additional File 1). Tumors were classified according to the intrinsic molecular subtypes reported by Hu et al. [4], a proliferation gene module [9], and a genomic-grade signature [10], as described [11] (Additional File 1). Gene-expression data for all 359 tumors are available through GEO as GSE22133.

aCGH analysis

BAC microarrays (GEO platform GPL4723) comprising approximately 32,000 clones were produced by the SCIBLU Genomics Centre, Lund University, Sweden, as described [7]. Labeling, hybridization, image analysis, and initial data analysis were performed as described [7]. Technical replicate experiments were performed on 15 tumors. Copy-number estimates (log2 ratios) for each array were normalized [12], and replicated samples were merged after normalization. Breakpoint analysis was performed by using circular binary segmentation (CBS) with α = 0.01 [13]. Only segments of four or more BAC probes were used in further analyses. Gains and losses were detected by applying sample adaptive thresholds, derived from 250-kbp smoothed data, to CBS log2 ratios, as described [12]. Recurrent high-level amplifications were defined as occurring in > 1% of tumors with a CBS log2 ratio ≥1. The fraction of the genome altered (FGA) was calculated as described [14]. CGH data for all 359 tumors are available through GEO as GSE22133.
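
The two summary quantities defined above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the exact FGA definition follows [14] and is not reproduced here, so the length-weighted fraction below is an assumption, and the fixed gain/loss thresholds are hypothetical stand-ins for the sample-adaptive thresholds described in the text. All segment values are invented.

```python
from typing import List, Tuple

# A segment is (start_bp, end_bp, log2_ratio), for example as produced by CBS.
Segment = Tuple[int, int, float]


def fraction_genome_altered(segments: List[Segment],
                            gain_thr: float, loss_thr: float) -> float:
    """Length-weighted fraction of segments called gained or lost.

    gain_thr and loss_thr are assumed, fixed stand-ins for the
    sample-adaptive thresholds used in the study.
    """
    total = sum(end - start for start, end, _ in segments)
    altered = sum(end - start for start, end, ratio in segments
                  if ratio >= gain_thr or ratio <= loss_thr)
    return altered / total if total else 0.0


def is_recurrent_high_level(locus_ratios: List[float],
                            ratio_cutoff: float = 1.0,
                            min_fraction: float = 0.01) -> bool:
    """True if the locus has log2 ratio >= ratio_cutoff in > min_fraction of tumors."""
    n_amplified = sum(1 for r in locus_ratios if r >= ratio_cutoff)
    return n_amplified / len(locus_ratios) > min_fraction


# Toy tumor: half of the segmented length is gained or lost.
toy_segments = [(0, 100, 0.0), (100, 150, 0.4), (150, 200, -0.5)]
fga = fraction_genome_altered(toy_segments, gain_thr=0.2, loss_thr=-0.2)

# In a cohort of 359 tumors, "> 1%" means at least four amplified tumors.
four_of_359 = is_recurrent_high_level([1.2] * 4 + [0.0] * 355)
three_of_359 = is_recurrent_high_level([1.2] * 3 + [0.0] * 356)
```

With 359 tumors, the > 1% criterion is met by four amplified tumors but not by three, which the last two lines make explicit.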

Identification of significant CNAs by using GISTIC

Genomic Identification of Significant Targets in Cancer (GISTIC) [15] was used to identify significant amplification and deletion peaks in the 359 tumors, as described (Additional File 1). Student's t tests performed on average scaled log2 ratios for significant GISTIC regions were used to identify regions associated with different clinical variables or molecular subtypes. Hierarchical clustering of significant GISTIC peaks was performed by using Pearson correlation and complete linkage on average scaled log2 ratios for each peak. Genomic coordinates for GISTIC regions are mapped to the UCSC Human Genome browser build 17 [16].
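
As an illustration of the clustering step, the following Python sketch performs agglomerative clustering with 1 minus Pearson correlation as the distance and complete linkage. It is a naive, standard-library-only implementation written for readability, not the software used in the study; the four toy tumor profiles are invented, and profiles are assumed to be non-constant so that the correlation is defined.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def complete_linkage(profiles: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Merge the closest clusters until n_clusters remain.

    Distance between tumors is 1 - Pearson r; distance between clusters
    is the maximum pairwise distance (complete linkage).
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Rows are tumors, columns are average scaled log2 ratios per region (toy values).
toy_profiles = [
    [1.0, 0.8, -0.5, -0.6],
    [0.9, 1.0, -0.4, -0.7],
    [-0.6, -0.5, 0.9, 1.0],
    [-0.7, -0.4, 1.0, 0.8],
]
toy_clusters = complete_linkage(toy_profiles, n_clusters=2)
```

In the toy data, tumors 0 and 1 share one gain/loss pattern and tumors 2 and 3 the opposite one, so two clusters are recovered.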

Construction of gene-expression centroids based on genomic subtypes

Gene-expression centroids for genomic subtypes were created based on genes used for classification according to the molecular subtypes by Hu et al. [4] (Additional File 1). The centroids were subsequently applied to a previously reported [14] combined BC data set (n = 1,881 tumors) comprising 11 public BC data sets generated on Affymetrix U133A arrays, including the Chin et al. data set [3] (Additional File 1). Complementary aCGH data for the Chin et al. data set were processed and analyzed as described [17].
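
The centroid approach can be sketched as follows. The details of the actual procedure are given in Additional File 1 and are not reproduced here; this Python example assumes that a centroid is the per-gene mean expression across tumors of a genomic subtype and that a new sample is assigned to the centroid with the highest Pearson correlation. Sample names and expression values are invented.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def build_centroids(expr: Dict[str, List[float]],
                    labels: Dict[str, str]) -> Dict[str, List[float]]:
    """Average the expression profiles of all samples sharing a subtype label."""
    grouped: Dict[str, List[List[float]]] = {}
    for sample, profile in expr.items():
        grouped.setdefault(labels[sample], []).append(profile)
    return {subtype: [sum(col) / len(col) for col in zip(*profiles)]
            for subtype, profiles in grouped.items()}


def classify(profile: List[float], centroids: Dict[str, List[float]]) -> str:
    """Assign the subtype whose centroid correlates best with the profile."""
    return max(centroids, key=lambda s: pearson(profile, centroids[s]))


# Toy training set: three genes, four tumors with known genomic subtype.
toy_expr = {
    "t1": [2.0, 1.0, -1.0],
    "t2": [1.0, 2.0, -2.0],
    "t3": [-1.0, -2.0, 2.0],
    "t4": [-2.0, -1.0, 1.0],
}
toy_labels = {"t1": "luminal-simple", "t2": "luminal-simple",
              "t3": "basal-complex", "t4": "basal-complex"}
toy_centroids = build_centroids(toy_expr, toy_labels)
predicted = classify([1.0, 0.5, -0.8], toy_centroids)
```

Because only the correlation with each centroid matters, the classifier can in principle be applied to data from a different platform once genes are matched, which is how the centroids are used on the Affymetrix data set in the text.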

Correlation of gene-expression data with genomic aberrations

Gene-expression data were compared with GISTIC aCGH log2 ratios for genomic subtypes by using Pearson correlation, as described [7]. GISTIC regions were expanded by one BAC probe in each direction to include borderline genes. A correlation cut-off representing a P value = 0.05 obtained from 100 permutations of aCGH sample labels was used to identify significantly correlated genes in GISTIC regions. Global correlation analysis for genomic subtypes by using genes mapped to individual BAC probes was performed similarly, with one modification; CBS log2 ratios were used for individual BAC probes.
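
A permutation-derived correlation cut-off of this kind can be sketched in Python as below. The exact procedure follows [7] and is not reproduced here; this sketch assumes that the sample order of the copy-number data is shuffled once per permutation, that the null correlations of all genes are pooled, and that the cut-off is the (1 - α) quantile of that pooled null distribution. Gene names and values are hypothetical.

```python
import random
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_cutoff(expr_by_gene: Dict[str, List[float]],
                       cn_by_gene: Dict[str, List[float]],
                       n_perm: int = 100, alpha: float = 0.05,
                       seed: int = 0) -> float:
    """Correlation value exceeded by roughly a fraction alpha of null correlations."""
    rng = random.Random(seed)
    n_samples = len(next(iter(cn_by_gene.values())))
    null: List[float] = []
    for _ in range(n_perm):
        order = list(range(n_samples))
        rng.shuffle(order)  # one shuffle of aCGH sample labels per permutation
        for gene, expr in expr_by_gene.items():
            cn = cn_by_gene[gene]
            null.append(pearson(expr, [cn[i] for i in order]))
    null.sort()
    idx = min(len(null) - 1, int((1 - alpha) * len(null)))
    return null[idx]


# Toy data: eight samples; GENE_A tracks its copy number, GENE_B does not.
toy_expr = {
    "GENE_A": [0.1, 0.5, 0.9, 1.4, 1.8, 2.1, 2.6, 3.0],
    "GENE_B": [1.2, 0.3, 2.2, 0.8, 1.9, 0.4, 1.1, 2.5],
}
toy_cn = {
    "GENE_A": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4],
    "GENE_B": [0.1, 0.0, 0.3, -0.2, 0.5, 0.2, -0.1, 0.4],
}
cutoff = permutation_cutoff(toy_expr, toy_cn)
observed = {g: pearson(toy_expr[g], toy_cn[g]) for g in toy_expr}
significant = [g for g, r in observed.items() if r > cutoff]
```

Fixing the seed makes the cut-off reproducible; with real data the number of samples and genes would be far larger than in this toy example.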

Survival analysis

Univariate and multivariate regression analyses of overall survival (OS) and distant metastasis-free survival (DMFS) were performed in R [18] by using the Survival package. OS or DMFS was the end point. Survival curves were compared by using Kaplan-Meier estimates and the log-rank test. The full follow-up time was used for log-rank tests and regression analyses, if not otherwise specified. Tick marks in Kaplan-Meier plots indicate censored follow-up.
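
For readers unfamiliar with the Kaplan-Meier estimate, the following Python sketch shows the product-limit calculation on invented follow-up data. The study itself used the Survival package in R; this standalone version is only an illustration and omits confidence intervals, the log-rank test, and regression modeling.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is True if the end point (for example death) was observed at
    times[i], and False if follow-up was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Toy follow-up in years; False marks censored follow-up (tick marks in a plot).
toy_times = [1, 2, 2, 3, 4]
toy_events = [True, False, True, True, False]
toy_curve = kaplan_meier(toy_times, toy_events)
```

In the toy data the estimate drops to 0.8 after the first event (1 of 5 at risk), to 0.6 at year 2 (1 of 4 at risk), and to 0.3 at year 3 (1 of 2 at risk); censored patients leave the risk set without lowering the curve.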


Comprehensive DNA copy-number analysis of BC

We used a tiling BAC array CGH to survey genome-wide CNAs in BC from BRCA1 (n = 22) and BRCA2 mutation carriers (n = 32), non-BRCA1/2 familial (n = 132), and sporadic cases (n = 173). The overall pattern of DNA CNAs displayed an extensive heterogeneity with most frequent CNAs on chromosomes 1q, 8q, and 16q (Figure 1). The GISTIC algorithm [15] was used to identify genomic changes that represented statistically significant events by using all 359 tumors. GISTIC identified 66 regions of gain and 67 regions of loss (Figure 1, Additional File 2).

Figure 1

Copy-number alterations (CNAs) observed in 359 breast cancers. Blue regions indicate positions of significant genomic aberrations (n = 133) identified by Genomic Identification of Significant Targets in Cancer (GISTIC) analysis. Green corresponds to loss, and red, to gain. Most common CNAs are observed on chromosomes 1q, 8p, 8q, 11q, and 16q, as indicated in the figure.

The most frequent high-level amplification peaks were found on chromosomes 17q12 (13%), 8p12 (7%), 8q24.21 (6%), 11q13.3 (6%), and 11q13.5 (4%), encompassing known oncogenes such as HER2, MYC, CCND1, and PAK1 (Table 1 in Additional File 3). To identify coamplified regions, we selected loci amplified in at least three cases and determined the fraction of coamplified samples (Figure 2a). Amplifications located on the same chromosome or chromosomal arm were more commonly coamplified; however, chromosomes 11q13 and 8p12 were also coamplified in a significant fraction (Figure 2a). Additionally, chromosome 12q15 was found to be coamplified with 8p12 and 11q13. Several other loci were coamplified but contained too few cases to draw any reliable conclusions. As expected, the coamplification pattern also was evident on the gene-expression level, pinpointing novel and known target genes (Figure 2b).

Figure 2

High-level amplifications in breast cancer (BC). (a) Coamplification patterns in BC. For each amplification (vertical axis), the fraction of samples with a coamplification (horizontal axis) is indicated in each box. Coamplification fractions smaller than 20% are excluded: for example, 30% of all 12q15-amplified samples also have 8p12 amplification, whereas the fraction of 8p12-amplified samples with 12q15 amplification is < 0.2 and is not displayed. (b) Overview of the coamplification pattern in chromosomes 8p12, 11q13, and 12q15. Amplification pattern is also evident on a gene-expression level, where a number of genes show a significant relation to gene-dosage effects.
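
The asymmetry of the coamplification fractions in panel (a) can be made concrete with a short Python sketch. The tumor identifiers and amplification calls below are invented so that the numbers match the 12q15/8p12 example in the caption; the three-case minimum and the 20% display cut-off are taken from the text.

```python
from typing import Dict, Set, Tuple


def coamplification_fractions(amplified: Dict[str, Set[str]],
                              min_cases: int = 3,
                              min_fraction: float = 0.2
                              ) -> Dict[Tuple[str, str], float]:
    """For each ordered pair (A, B), the fraction of A-amplified tumors
    that also carry amplification B; fractions below min_fraction are dropped."""
    loci = {locus: tumors for locus, tumors in amplified.items()
            if len(tumors) >= min_cases}
    fractions: Dict[Tuple[str, str], float] = {}
    for a, tumors_a in loci.items():
        for b, tumors_b in loci.items():
            if a == b:
                continue
            frac = len(tumors_a & tumors_b) / len(tumors_a)
            if frac >= min_fraction:
                fractions[(a, b)] = frac
    return fractions


# Toy calls: 20 tumors with 8p12 amplification, 10 with 12q15, 3 with both.
toy_amplified = {
    "8p12": {f"t{i}" for i in range(1, 21)},
    "12q15": {"t1", "t2", "t3"} | {f"t{i}" for i in range(21, 28)},
}
toy_fractions = coamplification_fractions(toy_amplified)
```

Here 3 of 10 12q15-amplified tumors (30%) also carry 8p12 amplification and are reported, whereas only 3 of 20 8p12-amplified tumors (15%) carry 12q15 amplification, so the reverse pair falls below the cut-off and is omitted.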

In a subsequent step we aimed to identify target genes within high-level amplifications. In total, we found 31 loci affected by high-level amplifications. Integration of gene-expression data with DNA copy-number data revealed a number of candidate genes in which mRNA expression was significantly correlated with gene dosage. Among significantly correlated genes were known oncogenes, such as HER2, FGFR1, CCND1, MYC, and MDM2; however, we also found, for example, MDM4, ELK4, PTPRK, GAB2, and RAB22A to be significantly correlated (Table 2). Among the top significant deletion GISTIC peaks were chromosomes 9p21.3 and 10q23.31, encompassing the CDKN2A and PTEN genes, respectively (Additional File 2).

Table 2 Regions amplified with a frequency of >1% in the 359 tumors

Molecular classification of BC by using DNA copy-number alterations

BC has been divided into intrinsic subtypes by using gene-expression profiling [1, 2]. Consistent with previous reports [3, 19-21], we found several chromosomal regions that were differentially altered between the intrinsic gene-expression subtypes, as well as for several clinical parameters (Figures S1 to S3 in Additional File 3). We used GISTIC regions derived from DNA copy-number data and hierarchical clustering to divide our cohort into six subtypes characterized by different FGA levels and CNAs (Figure 3a, b; Table 3; Additional File 4; Figure S4 in Additional File 3). To further characterize the identified genomic subtypes, we used available classification according to the intrinsic gene-expression subtypes, as well as other signatures derived from gene-expression analysis of all tumors.

Table 3 Patient and tumor characteristics for the six genomic subtypes
Figure 3

Unsupervised analysis of Genomic Identification of Significant Targets in Cancer (GISTIC) regions identifies six CGH subgroups of breast cancer (BC) associated with different clinical and molecular characteristics. (a) Hierarchic clustering of 133 GISTIC regions identifies six subtypes with different clinical and molecular characteristics, and genomic aberrations. Horizontal dashed line for S-phase indicates the average across all samples. (b) Fraction of the genome altered (FGA) for genomic subtypes indicating that basal-complex samples are genomically unstable, whereas luminal-simple tumors are genomically stable. (c) Overall survival (OS) for 339 patients, for whom primary tumors were available, classified according to genomic subtypes, mirrors results obtained for the intrinsic gene-expression subtypes.

Four of the genomic subtypes displayed striking similarity to previously described intrinsic subtypes derived from gene-expression profiling. The 17q12 subtype comprised 67% of all cases with HER2 amplification (segmented log2 ratio > 0.5), and 88% of all HER2-enriched intrinsic subtype-classified cases were found in this genomic subgroup. The basal-complex subtype included mainly basal-like classified tumors, as well as 77% of all BRCA1-mutated tumors. Furthermore, basal-complex tumors displayed higher S-phase fractions, higher FGA percentages, and higher correlation to a genomic-grade signature (Figure 3a). The luminal-simple subtype included mainly luminal A-classified tumors (72%) and was predominantly characterized by frequent losses on 16q, and gains on 1q and 16p. Furthermore, cases in this subtype displayed lower FGA levels, S-phase fractions, and genomic grade correlation. Finally, the luminal-complex subtype included 69% of all luminal B-classified cases, and 78% of all BRCA2-mutated tumors, but also 33% of all luminal A subtype-classified tumors. Moreover, two groups of tumors displayed mixed intrinsic subtype characteristics and heterogeneous genomic profiles (the amplifier and mixed subtypes). The mixed subtype showed heterogeneous losses across several chromosome arms, most frequently on 8p, 13q, and 17p, together with frequent gain on 1q (~60%) (Additional File 4). The amplifier subtype contained 48% of the 8p12-amplified cases (segmented log2 ratio > 0.5) as well as frequent gain of 1q, 11q13, 16p, 19p, and 19q and frequent losses on 8p, 11qter, 16q, and 17p (Additional File 4). Furthermore, the amplifier subtype contained the highest fraction of luminal B-classified cases after the luminal-complex subtype. However, several GISTIC regions distinguished amplifier-classified cases from luminal-complex cases in a supervised analysis (Figure S5 in Additional File 3).
Interestingly, 31% of all luminal-complex samples harbored 8p12-amplification; however, these tumors displayed differences in the genomic pattern compared with 8p12-amplified tumors in the amplifier subclass. First, 8p12-amplified tumors in the luminal-complex subtype showed higher FGA compared with 8p12-amplified tumors in the amplifier subtype (P = 0.02; t test). Second, in the amplifier subtype, 86% and 71% of 8p12-amplified tumors showed coamplification of the 11q13.5 and 11q13.3 GISTIC regions, respectively, as compared with 29% and 17% of 8p12-amplified tumors in the luminal-complex subtype.

To investigate whether the identified genomic subtypes showed an association with the outcome, we performed Kaplan-Meier analyses by using OS as end point. Not surprisingly, significant differences in OS between genomic subtypes were observed, with the luminal-simple subtype having the best outcome, and the 17q12 subtype, the worst outcome (Figure 3c).

Association of BRCA1/2 mutation status and genomic subtypes

The basal-complex and the luminal-complex subtypes contained 77% of all BRCA1- and 78% of all BRCA2-mutated tumors, respectively. Interestingly, by using supervised analysis, no GISTIC region was found to differ significantly between BRCA1 and non-BRCA1 tumors in the basal-complex subtype, although the former showed a significantly higher FGA (P = 0.01; t test). The five BRCA1-mutated tumors falling outside the basal-complex subtype did not differ significantly from other BRCA1 tumors regarding FGA, S-phase fraction or genomic grade correlation, with the exception that two of five tumors were ER positive. BRCA2 tumors in the luminal-complex subtype were characterized by losses on 3p21.31, 3p14.1, 6q16.2, 13q14.2, 14q24.3, and 22q13.31 and gains on 17q25.3 compared with non-BRCA2 tumors in the same genomic subtype, whereas the latter showed more-frequent gain of 11q13.3 (Figure 4a). Again, BRCA2 tumors in the luminal-complex subtype showed significantly higher FGA than did non-BRCA2 tumors in this subtype (P = 0.01; t test). The seven BRCA2-mutated tumors not belonging to the luminal-complex subtype were found in the basal-complex (n = 5), 17q12 (n = 1), and mixed (n = 1) subtypes. BRCA2 tumors in the basal-complex subtype were ER-negative (80%), showed higher FGA, higher genomic grade correlation than luminal-complex BRCA2 tumors, and displayed CNAs more similar to BRCA1 tumors.

Figure 4

Supervised analysis in luminal genomic subtypes. (a) Significant Genomic Identification of Significant Targets in Cancer (GISTIC) regions between BRCA2-mutated and non-BRCA2 tumors within the luminal-complex subtype. (b) Significant GISTIC regions between the luminal-complex and luminal-simple subtypes. (c) Distant metastasis-free survival (DMFS) for luminal A tumors stratified by classification as luminal-simple or non-luminal-simple in a combined Affymetrix gene-expression data set. Significant GISTIC regions were identified by Bonferroni-adjusted Student t test (P < 0.05); red indicates more-frequent gain, and green indicates more-frequent loss, in comparisons between GISTIC regions. Only significant regions with ≥20% CNA frequency are displayed.

DNA copy-number alterations divide luminal breast cancer in two entities

Supervised analyses were performed to investigate differences in characteristic alterations between luminal-complex and luminal-simple tumors. Luminal-simple tumors were primarily characterized by frequent loss on 16q, whereas luminal-complex tumors were characterized by losses on 3p, 8p, 9p, 11q25, and 13q and gains on 8q, 11q13.3, and 17q (Figure 4b). Interestingly, luminal A tumors in the luminal-simple subtype showed several distinct differences compared with luminal A tumors in the luminal-complex subtype, including lower FGA (P = 0.0005; t test), different CNA pattern (Figure S6 in Additional File 3), lower genomic-grade correlation (P = 0.02; t test), lower correlation to the luminal B gene-expression centroid [4] (P = 0.0005; t test), higher correlation to the luminal A gene-expression centroid [4] (P = 0.05; t test), and a trend toward better OS (log-rank, P = 0.06). By comparison, in the luminal-complex subtype, luminal A cases showed better OS than luminal B cases (log-rank, P = 0.02), as well as lower FGA, S-phase fractions, and genomic grade correlation (P < 0.02; t test).

To confirm the observation that luminal A cases in the luminal-simple subtype show a trend toward better clinical outcome than other luminal A cases, we created gene-expression centroids for the genomic subtypes based on genes from the Hu et al. [4] gene list (Additional File 1). Gene-expression centroids were subsequently applied to a previously reported combined Affymetrix BC data set [14]. For luminal A-classified tumors in the combined Affymetrix data set, improved DMFS was observed for luminal-simple samples compared with non-luminal-simple samples (Figure 4c), further supported by multivariate analysis (n = 225, P = 0.004; HR = 0.36; 95% CI = 0.18-0.73) using LN status, ER status, tumor size, and histologic grade (grade 3 versus 1 and 2) as covariates. To investigate whether luminal-simple-classified luminal A tumors also showed lower FGA in independent data sets, we analyzed the Chin et al. [3] aCGH data set by using the genomic-subtype classification from corresponding gene-expression data in the combined Affymetrix data set. Convincingly, for luminal A tumors in the Chin et al. [3] data set, luminal-simple cases showed lower FGA than did luminal-complex cases (P = 0.01; t test) and a trend towards better OS (log-rank, P = 0.16).

High-level amplifications in genomic subtypes

High-level amplifications were observed in 41 of the 66 GISTIC gain regions, involving 139 (39%) of 359 tumors. Interestingly, none of these belonged to the luminal-simple subtype. Occurrence of high-level amplifications was associated with a worse OS (log-rank, P = 0.0007). The 17q12 subtype showed the highest percentage of tumors with high-level amplifications (78%), followed by the amplifier (50%), luminal-complex (38%), basal-complex (34%), and mixed (26%) subtypes. Certain high-level amplicons were predominantly observed in specific subtypes: for example, 10p14 (basal-complex), 17q11.2 (17q12 subtype), 17q12 (17q12 subtype), 17q21.33 (17q12 subtype), and 19p13.11 (17q12 subtype). Other amplicons were observed in two subtypes: for example, 8p12 (amplifier, luminal-complex), 11q13.3 and 11q13.5 (luminal-complex, amplifier), 12p13.31 (basal-complex, amplifier), 6q23.3 (basal-complex, mixed) and 8q24.21 (luminal-complex, basal-complex).


BC is a biologically heterogeneous disease and has been stratified into molecular subtypes by using gene-expression profiling [1]. These subtypes were subsequently found to display different clinical outcomes with the HER2-enriched, basal-like, and luminal B as poor prognostic groups [2]. Several studies have used microarray-based genome-wide DNA copy-number analysis to describe recurrent, almost universally affected, chromosomal regions on 1q, 8, and 16, but also recognized that BC is a profoundly heterogeneous disease on the genomic level [3, 21-24].

Here, we used high-resolution BAC aCGH to confirm the frequent CNAs on chromosomes 1q, 8q, and 16, as well as recurrent high-level amplifications on 17q12, 11q13, 8p12, and 8q24, encompassing known oncogenes, as previously described [3]. Although these CNAs may have a major impact on the progression of a significant proportion of BC, other amplicons contribute to the genomic diversity of the disease. For instance, chromosome 1q32 was highly amplified in seven (2%) of the 359 tumors and includes the MDM4 gene. A significant correlation was found between gene copy number and transcript levels, suggesting that MDM4 is a potential oncogene in BC with a p53-inhibiting function similar to that of MDM2 [25]. Coamplification of chromosomes 8p12 and 11q13.3, occasionally combined with 11q13.5-q14.1 or 12q15, was found in a significant fraction of tumors. This suggests that the targeted genes or oncogenic pathways act synergistically and are advantageous for the tumor: for example, CCND1 (11q13.3) targeting the G1/S checkpoint, and MDM2 (12q15) targeting the p53 pathway. Previous studies also indicated frequent coamplification of 11q13 and 8p12, in which functional analyses revealed an extended complexity of these two cooperating genetic events [26]. The GISTIC algorithm [15] also captured other rare high-level gene amplifications such as the PIK3CA and MYB loci, genes frequently affected by alternative mechanisms [27, 28], supporting their role in BC development.

Unsupervised hierarchical clustering based on significant (GISTIC) CNAs categorized the cohort into six genomic subtypes with striking similarity to the previously described gene-expression subtypes and differences in clinical outcome [1]. Earlier studies recognized three genomic subtypes in BC, characterized by a 1q/16q, complex, and amplifier genotype, respectively [3, 21]. Their complex subtype was associated with the basal-like gene-expression subtype [3], corroborating our observation of a genomically homogeneous basal-complex subtype. The 1q/16q subtype most likely corresponds to our luminal-simple subtype, whereas we, because of a larger number of tumors, were able to divide the amplifier subtype [3, 21] further into one homogeneous 17q12 subtype comprising the majority of HER2-amplified cases, a luminal-complex, and an amplifier subtype. The 17q12 subtype was associated with the worst clinical outcome and was essentially unified by a few amplified regions on 17q, including the HER2 locus on 17q12, as discussed in more detail elsewhere [17]. Moreover, 8p12-amplified tumors were confined mainly to two genomic subtypes, the amplifier and luminal-complex subtypes, with high-level 8p12 amplification occurring primarily in the amplifier subtype. In agreement with the findings by Chin et al. [3], we found that tumors with gain or loss at the 8p12 GISTIC region displayed inferior survival as compared with tumors with normal 8p12 copy number (log-rank, P = 0.04 and P = 0.008, respectively).

As expected, 17 of 22 tumors from BRCA1 mutation carriers were located in the basal-complex subtype, confirming the complex genome of BRCA1 tumors [29, 30] and previous gene-expression studies [2]. Interestingly, no significant CNA difference was found between BRCA1 and non-BRCA1 tumors in the basal-complex subtype, indicating an extensive genomic homogeneity in this genomic subtype. We did observe a significantly higher FGA in BRCA1 tumors, although it should be noted that all basal-complex tumors had significantly higher FGA than did non-basal-complex tumors. These results may point toward a general DNA-repair deficiency of tumors in the basal-complex subtype, which may extend recent therapeutic opportunities beyond patients with a germline BRCA1 mutation [31]. Basal-complex tumors had frequent copy-number losses on chromosome 5q, in line with previous studies showing loss of heterozygosity (LOH) and physical deletions of 5q in BRCA1 tumors [30, 32]. Although a target gene in this region is still to be identified, several candidates do exist (for example, PIK3R1 located on chromosome 5q13.1, previously found to be homozygously deleted in a BRCA1-mutated tumor) [32].

The majority of BRCA2-mutated tumors were located in the luminal-complex subtype, the exception being a few predominantly basal-like and ER-negative BRCA2 tumors that were in the basal-complex subtype. Deletions on chromosomes 3p, 6q, and 13q14 and gains on 17q were more frequent in BRCA2 tumors as compared with other luminal-complex tumors, as was also shown by others [30, 33]; this is somewhat mirrored by the higher frequency of CNAs in BRCA2 tumors. Taken together, in contrast to the scenario for BRCA1, these results indicate that specific BRCA2-associated genomic aberrations exist.

In strong contrast, tumors of the luminal-simple genomic subtype displayed a stable genome without high-level amplifications and with CNAs primarily on chromosomes 1q and 16q. This subtype included almost exclusively ER- and PgR-positive tumors of low histologic grade, and included approximately one third of all luminal A-classified samples. However, another third of luminal A tumors were found in the luminal-complex subtype, and 13%, in the amplifier subtype, suggesting heterogeneity within the current gene-expression-based classification of luminal A tumors. Luminal A tumors in the luminal-complex subtype were characterized by a significantly higher FGA and higher correlation to the genomic-grade signature [10], as well as a different pattern of genomic alterations than luminal A tumors of the luminal-simple group. Most important, the latter cases showed a trend toward better survival, supporting the aim of a clinically meaningful stratification of luminal A tumors based on their pattern of DNA CNAs.

To test this hypothesis, we constructed gene-expression centroids for the different genomic subtypes and applied them to two independent breast cancer gene-expression data sets. Intriguingly, an improved clinical outcome was observed for luminal-simple-classified samples within the luminal A subtype in a large combined Affymetrix data set, as well as lower FGA in the Chin et al. [3] data set. Corroborating findings by Chin et al., this strongly suggests that the luminal A subtype could be further divided based on genomic alterations, warranting further investigation. Previously, reports suggested that a fraction of histologic grade 3 tumors progressed from grade 1 with accumulation of genomic aberrations [34, 35]. An interesting hypothesis would be that the luminal-simple and luminal-complex division reflects a tumor-progression pathway of luminal tumors, as the frequent genomic aberrations in luminal-simple cases (+1q, -16q) are also apparent in luminal-complex samples. Moreover, luminal-complex tumors have additional genomic aberrations not present in luminal-simple tumors, such as +8q, -11q, and -13q, suggesting that these represent late genomic events. However, more in-depth studies are needed to confirm this.


Six main groups of BC with distinct genomic-aberration patterns and striking similarity to gene-expression subtypes were identified. BRCA1 tumors were confined to a uniform subtype termed basal-complex, characterized by a high frequency of low-level CNAs and by basal-like, ER- and PgR-negative, histologic grade 3 tumors. BRCA2 tumors clustered among luminal-complex tumors, characterized mostly as luminal B, ER-positive, and histologic grade 2. The genomic subtypes were significantly associated with clinical outcome, and the observation that luminal-simple cases display a better disease outcome within the intrinsic luminal A subtype was validated in independent data sets. Finally, our data emphasize the profound molecular heterogeneity in BC. Understanding the underlying genomic and biologic mechanisms may prove useful for prognostic as well as treatment-prediction purposes.



aCGH: array-comparative genomic hybridization

BC: breast cancer

CBS: circular binary segmentation

DMFS: distant metastasis-free survival

ER: estrogen receptor

FGA: fraction of the genome altered

GEO: Gene Expression Omnibus

GISTIC: Genomic Identification of Significant Targets in Cancer

LN: lymph node

LOH: loss of heterozygosity

OS: overall survival

PgR: progesterone receptor.


  1. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge O, Pergamenschikov A, Williams C, Zhu SX, Lonning PE, Borresen-Dale AL, Brown PO, Botstein D: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.

  2. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J, Perou CM, Lonning PE, Brown PO, Borresen-Dale AL, Botstein D: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003, 100: 8418-8423. 10.1073/pnas.0932692100.

  3. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, Chen F, Feiler H, Tokuyasu T, Kingsley C, Dairkee S, Meng Z, Chew K, Pinkel D, Jain A, Ljung BM, Esserman L, Albertson DG, Waldman FM, Gray JW: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell. 2006, 10: 529-541. 10.1016/j.ccr.2006.10.009.

  4. Hu Z, Fan C, Oh DS, Marron JS, He X, Qaqish BF, Livasy C, Carey LA, Reynolds E, Dressler L, Nobel A, Parker J, Ewend MG, Sawyer LR, Wu J, Liu Y, Nanda R, Tretiakova M, Ruiz Orrico A, Dreher D, Palazzo JP, Perreard L, Nelson E, Mone M, Hansen H, Mullins M, Quackenbush JF, Ellis MJ, Olopade OI, Bernard PS, et al: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006, 7: 96. 10.1186/1471-2164-7-96.

  5. Gene Expression Omnibus. []

  6. SCIBLU Genomics. []

  7. Jonsson G, Staaf J, Olsson E, Heidenblad M, Vallon-Christersson J, Osoegawa K, de Jong P, Oredsson S, Ringner M, Hoglund M, Borg A: High-resolution genomic profiles of breast cancer cell lines assessed by tiling BAC array comparative genomic hybridization. Genes Chromosomes Cancer. 2007, 46: 543-558. 10.1002/gcc.20438.

  8. Yang YH, Dudoit S, Luu P, Lin DM, Peng V, Ngai J, Speed TP: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e15. 10.1093/nar/30.4.e15.

  9. Reyal F, Van Vliet MH, Armstrong NJ, Horlings HM, de Visser KE, Kok M, Teschendorff AE, Mook S, Van't Veer L, Caldas C, Salmon RJ, Van de Vijver MJ, Wessels LF: A comprehensive analysis of prognostic signatures reveals the high predictive capacity of proliferation, immune response and RNA splicing modules in breast cancer. Breast Cancer Res. 2008, 10: R93. 10.1186/bcr2192.

  10. Ivshina AV, George J, Senko O, Mow B, Putti TC, Smeds J, Lindahl T, Pawitan Y, Hall P, Nordgren H, Wong JE, Liu ET, Bergh J, Kuznetsov VA, Miller LD: Genetic reclassification of histologic grade delineates new clinical subtypes of breast cancer. Cancer Res. 2006, 66: 10292-10301. 10.1158/0008-5472.CAN-05-4414.

  11. Honeth G, Bendahl PO, Ringner M, Saal LH, Gruvberger-Saal SK, Lovgren K, Grabau D, Ferno M, Borg A, Hegardt C: The CD44+/CD24- phenotype is enriched in basal-like breast tumors. Breast Cancer Res. 2008, 10: R53. 10.1186/bcr2108.

  12. Staaf J, Jonsson G, Ringner M, Vallon-Christersson J: Normalization of array-CGH data: influence of copy number imbalances. BMC Genomics. 2007, 8: 382. 10.1186/1471-2164-8-382.

  13. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics. 2007, 23: 657-663. 10.1093/bioinformatics/btl646.

  14. Staaf J, Ringner M, Vallon-Christersson J, Jonsson G, Bendahl PO, Holm K, Arason A, Gunnarsson H, Hegardt C, Agnarsson BA, Luts L, Grabau D, Ferno M, Malmstrom PO, Johannsson OT, Loman N, Barkardottir RB, Borg A: Identification of subtypes in human epidermal growth factor receptor 2-positive breast cancer reveals a gene signature prognostic of outcome. J Clin Oncol. 2010, 28: 1813-1820. 10.1200/JCO.2009.22.8775.

  15. Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S, Du J, Kau T, Thomas RK, Shah K, Soto H, Perner S, Prensner J, Debiasi RM, Demichelis F, Hatton C, Rubin MA, Garraway LA, Nelson SF, Liau L, Mischel PS, Cloughesy TF, Meyerson M, Golub TA, Lander ES, Mellinghoff IK, et al: Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci USA. 2007, 104: 20007-20012. 10.1073/pnas.0710052104.

  16. UCSC Genome Browser. []

  17. Staaf J, Jönsson G, Ringner M, Vallon-Christersson J, Grabau D, Arason A, Gunnarsson H, Agnarsson B, Malmström P, Johannsson O, Loman N, Barkardottir R, Borg Å: High-resolution genomic and expression analyses of copy number alterations in HER2-amplified breast cancer. Breast Cancer Res. 2010, 12: R25. 10.1186/bcr2568.

  18. The R Project for Statistical Computing. []

  19. Loo LW, Grove DI, Williams EM, Neal CL, Cousens LA, Schubert EL, Holcomb IN, Massa HF, Glogovac J, Li CI, Malone KE, Daling JR, Delrow JJ, Trask BJ, Hsu L, Porter PL: Array comparative genomic hybridization analysis of genomic alterations in breast cancer subtypes. Cancer Res. 2004, 64: 8541-8549. 10.1158/0008-5472.CAN-04-1992.

  20. Natrajan R, Lambros MB, Rodriguez-Pinilla SM, Moreno-Bueno G, Tan DS, Marchio C, Vatcheva R, Rayter S, Mahler-Araujo B, Fulford LG, Hungermann D, Mackay A, Grigoriadis A, Fenwick K, Tamber N, Hardisson D, Tutt A, Palacios J, Lord CJ, Buerger H, Ashworth A, Reis-Filho JS: Tiling path genomic profiling of grade 3 invasive ductal breast cancers. Clin Cancer Res. 2009, 15: 2711-2722. 10.1158/1078-0432.CCR-08-1878.

  21. Fridlyand J, Snijders AM, Ylstra B, Li H, Olshen A, Segraves R, Dairkee S, Tokuyasu T, Ljung BM, Jain AN, McLennan J, Ziegler J, Chin K, Devries S, Feiler H, Gray JW, Waldman F, Pinkel D, Albertson DG: Breast tumor copy number aberration phenotypes and genomic instability. BMC Cancer. 2006, 6: 96. 10.1186/1471-2407-6-96.

  22. Naylor TL, Greshock J, Wang Y, Colligon T, Yu QC, Clemmer V, Zaks TZ, Weber BL: High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization. Breast Cancer Res. 2005, 7: R1186-1198. 10.1186/bcr1356.

  23. Melchor L, Honrado E, Garcia MJ, Alvarez S, Palacios J, Osorio A, Nathanson KL, Benitez J: Distinct genomic aberration patterns are found in familial breast cancer associated with different immunohistochemical subtypes. Oncogene. 2008, 27: 3165-3175. 10.1038/sj.onc.1210975.

  24. Bergamaschi A, Kim YH, Wang P, Sorlie T, Hernandez-Boussard T, Lonning PE, Tibshirani R, Borresen-Dale AL, Pollack JR: Distinct patterns of DNA copy number alteration are associated with different clinicopathological features and gene-expression subtypes of breast cancer. Genes Chromosomes Cancer. 2006, 45: 1033-1040. 10.1002/gcc.20366.

  25. Mancini F, Di Conza G, Pellegrino M, Rinaldo C, Prodosmo A, Giglio S, D'Agnano I, Florenzano F, Felicioni L, Buttitta F, Marchetti A, Sacchi A, Pontecorvi A, Soddu S, Moretti F: MDM4 (MDMX) localizes at the mitochondria and facilitates the p53-mediated intrinsic-apoptotic pathway. EMBO J. 2009, 28: 1926-1939. 10.1038/emboj.2009.154.

  26. Kwek SS, Roy R, Zhou H, Climent J, Martinez-Climent JA, Fridlyand J, Albertson DG: Co-amplified genes at 8p12 and 11q13 in breast tumors cooperate with two major pathways in oncogenesis. Oncogene. 2009, 28: 1892-1903. 10.1038/onc.2009.34.

  27. Saal LH, Holm K, Maurer M, Memeo L, Su T, Wang X, Yu JS, Malmstrom PO, Mansukhani M, Enoksson J, Hibshoosh H, Borg A, Parsons R: PIK3CA mutations correlate with hormone receptors, node metastasis, and ERBB2, and are mutually exclusive with PTEN loss in human breast carcinoma. Cancer Res. 2005, 65: 2554-2559. 10.1158/0008-5472.CAN-04-3913.

  28. Kauraniemi P, Hedenfalk I, Persson K, Duggan DJ, Tanner M, Johannsson O, Olsson H, Trent JM, Isola J, Borg A: MYB oncogene amplification in hereditary BRCA1 breast cancer. Cancer Res. 2000, 60: 5323-5328.

  29. Wessels LF, van Welsem T, Hart AA, van't Veer LJ, Reinders MJ, Nederlof PM: Molecular classification of breast carcinomas by comparative genomic hybridization: a specific somatic genetic profile for BRCA1 tumors. Cancer Res. 2002, 62: 7110-7117.

  30. Jonsson G, Naylor TL, Vallon-Christersson J, Staaf J, Huang J, Ward MR, Greshock JD, Luts L, Olsson H, Rahman N, Stratton M, Ringner M, Borg A, Weber BL: Distinct genomic profiles in hereditary breast tumors identified by array-based comparative genomic hybridization. Cancer Res. 2005, 65: 7612-7621.

  31. Fong PC, Boss DS, Yap TA, Tutt A, Wu P, Mergui-Roelvink M, Mortimer P, Swaisland H, Lau A, O'Connor MJ, Ashworth A, Carmichael J, Kaye SB, Schellens JH, de Bono JS: Inhibition of poly(ADP-ribose) polymerase in tumors from BRCA mutation carriers. N Engl J Med. 2009, 361: 123-134. 10.1056/NEJMoa0900212.

  32. Johannsdottir HK, Jonsson G, Johannesdottir G, Agnarsson BA, Eerola H, Arason A, Heikkila P, Egilsson V, Olsson H, Johannsson OT, Nevanlinna H, Borg A, Barkardottir RB: Chromosome 5 imbalance mapping in breast tumors from BRCA1 and BRCA2 mutation carriers and sporadic breast tumors. Int J Cancer. 2006, 119: 1052-1060. 10.1002/ijc.21934.

  33. van Beers EH, van Welsem T, Wessels LF, Li Y, Oldenburg RA, Devilee P, Cornelisse CJ, Verhoef S, Hogervorst FB, van't Veer LJ, Nederlof PM: Comparative genomic hybridization profiles in human BRCA1 and BRCA2 breast tumors highlight differential sets of genomic aberrations. Cancer Res. 2005, 65: 822-827.

  34. Korsching E, Packeisen J, Helms MW, Kersting C, Voss R, van Diest PJ, Brandt B, van der Wall E, Boecker W, Burger H: Deciphering a subgroup of breast carcinomas with putative progression of grade during carcinogenesis revealed by comparative genomic hybridisation (CGH) and immunohistochemistry. Br J Cancer. 2004, 90: 1422-1428. 10.1038/sj.bjc.6601658.

  35. Roylance R, Gorman P, Papior T, Wan YL, Ives M, Watson JE, Collins C, Wortham N, Langford C, Fiegler H, Carter N, Gillett C, Sasieni P, Pinder S, Hanby A, Tomlinson I: A comprehensive study of chromosome 16q in invasive ductal and lobular breast carcinoma using array CGH. Oncogene. 2006, 25: 6544-6553. 10.1038/sj.onc.1209659.

The present study was supported by the Swedish Cancer Society, the Knut & Alice Wallenberg Foundation, the Foundation for Strategic Research through the Lund Centre for Translational Cancer Research (CREATE Health), the Mrs. Berta Kamprad Foundation, the Gunnar Nilsson Cancer Foundation, the Swedish Research Council, the Lund University Hospital Research Funds, the American Cancer Society, the IngaBritt and Arne Lundberg Foundation, The Icelandic Centre for Research, the Research Fund of Landspitali-University Hospital, The Icelandic association: "Walking for Breast Cancer Research," The Nordic Cancer Union, the Helsinki University Hospital Research Fund, Academy of Finland (110663), the Finnish Cancer Society, and the Sigrid Juselius Foundation. The SCIBLU Genomics center is supported by governmental funding of clinical research within the national health services (ALF) and by Lund University.

We thank Dr. Kirsimari Aaltonen and R. N. Hanna Jäntti for their help with the patient data and gratefully acknowledge the Finnish, Icelandic, and Swedish Cancer Registries for the cancer data, and Professor Johannes Björnsson and his staff at the Department of Pathology, Landspitali-University Hospital, for support and help with tissue samples.

Corresponding author

Correspondence to Göran Jönsson.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

GJ, JS, and ÅB conceived of the study. JS, GJ, AA, JVC, KH, CH, OK, RF, CS, and HG performed array experiments. JS performed data analysis with support by MR, JVC, and GJ. GJ and JS wrote the manuscript with the assistance of MR, JVC, and ÅB. BA, NL, LL, OJ, KA, PH, CB, HO, PM, RB, and HN contributed samples and clinical information. All authors approved the final manuscript.

Göran Jönsson and Johan Staaf contributed equally to this work.

Electronic supplementary material


Additional file 1: A pdf document containing supplementary information about methods used and data processing. (PDF 80 KB)

Additional file 2: An Excel table listing the 133 significant GISTIC regions. (XLS 37 KB)


Additional file 3: A pdf file containing one supporting table and six supporting figures. The supporting table describes recurrent high-level amplifications found in the 359 tumors. Supporting Figure 1 describes differences in CNAs and FGA associated with clinical variables in the 359 tumors. Supporting Figure 2 describes differences in CNAs, FGA, and outcome associated with the intrinsic gene-expression subtypes in the 359 tumors. Supporting Figure 3 shows CNA frequency for the intrinsic gene-expression subtypes in the 359 tumors. Supporting Figure 4 describes CNAs associated with the genomic subtypes. Supporting Figure 5 describes differences in CNAs between the luminal-complex and amplifier genomic subtypes. Supporting Figure 6 describes differences and frequencies of CNAs between luminal A tumors classified as luminal-simple or luminal-complex, as well as luminal B tumors classified as luminal-complex. (PDF 826 KB)

Additional file 4: A pdf file showing CNA frequency in the genomic subtypes. (PDF 453 KB)

Jönsson, G., Staaf, J., Vallon-Christersson, J. et al. Genomic subtypes of breast cancer identified by array-comparative genomic hybridization display distinct molecular and clinical characteristics. Breast Cancer Res 12, R42 (2010).