Skip to main content
  • Research article
  • Open access
  • Published:

Prognostic stromal gene signatures in breast cancer



Global gene expression analysis of tumor samples has been a valuable tool to subgroup tumors and has the potential to be of prognostic and predictive value. However, tumors are heterogeneous, and homogenates will consist of several different cell types. This study was designed to obtain more refined expression data representing different compartments of the tumor.


Formalin-fixed paraffin-embedded stroma-rich triple-negative breast cancer tumors were laser-microdissected, and RNA was extracted and processed to enable microarray hybridization. Genes enriched in stroma were identified and used to generate signatures by identifying correlating genes in publicly available data sets. The prognostic implications of the signature were analyzed.


Comparison of the expression pattern from stromal and cancer cell compartments from three tumors revealed a number of genes that were essentially specifically expressed in the respective compartments. The stroma-specific genes indicated contribution from fibroblasts, endothelial cells, and immune/inflammatory cells. The gene set was expanded by identifying correlating mRNAs using breast cancer mRNA expression data from The Cancer Genome Atlas. By iterative analyses, 16 gene signatures of highly correlating genes were characterized. Based on the gene composition, they seem to represent different cell types. In multivariate Cox proportional hazard models, two immune/inflammatory signatures had opposing hazard ratios for breast cancer recurrence also after adjusting for clinicopathological variables and molecular subgroup. The signature associated with poor prognosis consisted mainly of C1Q genes and the one associated with good prognosis contained HLA genes. This association with prognosis was seen for other cancers as well as in other breast cancer data sets.


Our data indicate that the molecular composition of the immune response in a tumor may be a powerful predictor of cancer prognosis.


Gene expression profiling has enabled novel classification of breast cancer into different subgroups. At least five molecular subgroups (basal-like, normal-like, HER2-positive, and luminal A and luminal B) have been identified [1,2] and additional subclasses continue to be suggested [3-7].

Gene expression analyses of tumor samples are generally performed on whole tumor homogenates and thereby represent a pattern that reflects the expression from all cell types present in the tumor. The tumor microenvironment, comprising a large variety of cells, such as fibroblasts, immune cells, and endothelial cells, can constitute a significant part of the tumor and substantially contribute to observed expression patterns. Cells in the microenvironment can influence cancer progression [8-11] and have been shown to predict tumor outcome and therapy response in breast carcinomas [12-14]. One way to obtain cancer cell and stromal compartment-specific expression patterns is to isolate the compartments by laser capture microdissection (LCM).

For routine histological diagnosis of surgically removed tumors, formalin-fixed paraffin-embedded (FFPE) tissue is normally used and thus a wide range of FFPE tumors are available for gene expression analyses. However, fixation and embedding have a detrimental effect on RNA quality, resulting in fragmentation and chemical modifications and making it of less use for expression analyses [15]. In this study, we have established a procedure for global-gene expression analysis using LCM on FFPE triple-negative breast cancers. Isolation of stromal and cancer cell compartments of the tumor with subsequent analysis of the global mRNA expression revealed compartment-specific gene expression. Expanding the stroma-specific gene set by identifying genes with correlating expression levels using tumor data from The Cancer Genome Atlas (TCGA) database gave rise to 16 gene signatures of stromal genes with highly correlating gene expression. Two signatures, consisting of genes related to an immune response, are of particular interest since they are prognostic in multivariate Cox proportional hazard models in several breast cancer data sets and data sets of other cancers.


Tumor material

Tumor specimens, from subjects that had given informed consent, were obtained from Skåne University Hospital, Malmö. Ethical permission has been obtained from the local Research Ethics Committee in Lund (Dnr 2009/658). Information about the tumors was obtained from the pathology reports. All tumors were negative for estrogen and progesterone receptors and had no ERBB2 (HER2) amplification. Tumors with sufficient amount of cells in the stromal compartment to allow conclusive microarray analyses were selected. Two of the tumors used for microarray analysis were high-grade, grade III (3 + 3 + 3 for tubule formation, nuclear pleomorphism, and mitotic count, respectively), and one was low-grade, NHG (Nottingham histological grade) grade I (1 + 3 + 1 according to the NHG grading system). Two of the tumors were invasive ductal carcinomas, whereas one was reported as a medullary carcinoma.

Tissue preparation

Five consecutive sections (5 μm) of each tumor were prepared on a microtome and mounted onto polyethylene terephtalate (PET) membrane slides (Leica Microsystems, Wetzlar, Germany). When indicated, 30% ethanol was included. Mounted tissue sections were allowed to dry for 30 minutes in room temperature prior to incubation in −20°C for 24 hours for optimal adhesion. To preserve the RNA quality in the archived FFPE tissue specimens, the blocks used for biomarker analysis during the routine prognostic procedure were stored in 4°C.


During method development, staining of the tissue sections with either a cresyl violet LCM staining kit (Ambion, part of Thermo Fisher Scientific, Waltham, MA, USA) or standard hematoxylin-and-eosin staining was evaluated. For further analysis, cresyl violet staining was used. Tissue sections were initially rinsed 2 × 1 minute with xylene for deparaffinization and rinsed in 100%, 75%, and 50% ethanol for 30 seconds. Rehydration of the tissue with diethylpyrocarbonate (DEPC)-treated water was performed prior to 40-second incubation in staining solution, followed by additional rinsing with DEPC-treated water for 30 seconds and dehydration with 100% ethanol twice for 30 seconds. Sections were thereafter dried in room temperature and immediately used for LCM or stored at 4°C for up to 1 week.

Laser capture microdissection

LCM to isolate tumor compartments was performed on a Leica LMD6500. After microdissection, the tissue was collected in 0.5-mL polymerase chain reaction (PCR) tube caps containing Allprep RNA/DNA FFPE kit lysis buffer (Qiagen, Hilden, Germany) with Proteinase K.

RNA extraction and hybridization

RNA was extracted by using Allprep RNA/DNA FFPE kit (Qiagen) and evaluated with Nanodrop and Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) in accordance with standard procedures, including calculation of RNA integrity number (RIN) values as previously described [16]. Isolated RNA samples with a 260/280 ratio of at least 1.8 and RIN value of more than 2.0 were used for amplification with SensationPlus FFPE Amplification and WT labeling kit (Affymetrix, Santa Clara, CA, USA) in accordance with the protocol of the supplier. The resulting ds-cDNA was hybridized to a Human Gene 1.0 ST array (Affymetrix).

Data sets

TCGA expression data were downloaded November 2013 (breast cancer), January 2014 (kidney and lung cancer), and March 2014 (head and neck cancer) from the TCGA database [17]. The data were log2-transformed after the addition of 1 to each normalized value. Clinical and follow-up data were downloaded in May 2014. The basic patient characteristics of the TCGA breast cancer data used for follow-up analyses can be found in Additional file 1: Table S1. The NKI295 [18], Wang [19], and TransBig [20] datasets were downloaded in an assembled form, as described [21]. The original array data can be found at [22] (NKI295) or on the National Center for Biotechnology Information Gene Expression Omnibus website as GSE2034 and GSE7390.

Data analysis

All analyses were done with R by using the limma, survival, and cluster packages. Breast cancer subtypes were determined by using the nearest correlations with the PAM50 centroids [23]. For analyses of gene signatures from the Wang and TransBig data sets, probes with a mean log2 signal of more than 6 were included, and for the NKI295 dataset a threshold of −0.3 was used.


Isolation of RNA from laser capture-microdissected FFPE breast tumors

To obtain RNA of sufficient quality from tumor compartments isolated with LCM, sectioning, mounting, deparaffinization, and staining steps were performed with the highest possible purity to avoid RNase contamination. Adhesion of the tissue section to the membrane slide was obtained with 30% ice-cold ethanol (Additional file 2: Table S2) to avoid further degradation. As the standard histological staining with hematoxylin and eosin may influence the RNA integrity negatively [24], cresyl violet was used for tissue staining. We obtained similar RNA quality and amount from a cresyl violet-stained section as from a non-stained section either with or without rehydration (Additional file 3: Table S3 and Additional file 4: Figure S1). Since cresyl violet staining with rehydration was the only procedure that resulted in both an acceptable RNA quality and adequate tissue morphology, this was henceforth used as staining procedure.

To collect epithelial and stromal compartments from breast tumor tissue, five consecutive tissue sections from triple-negative tumors were laser-microdissected (Figure 1A-C). We found that collection of 25 to 28 mm2 from the cancer cell compartment yielded a sufficient amount of RNA (Table 1A). However, isolation of stromal compartments with or without inflammation showed that stroma with few inflammatory cells did not yield enough material for downstream applications and limited us to analyze only tumors with inflammatory stroma. Analysis of three triple-negative tumors with inflammatory stroma showed a similar relationship between dissected tissue area and extracted RNA amount for both compartments (Table 1B-D and Figure 1D-F).

Figure 1
figure 1

RNA integrity from laser capture-microdissected malignant epitheluim and surrounding stromal in breast cancer tumor sections. Cresyl violet staining of a representative triple-negative breast tumor section (A) with defined (B) and dissected (C) cancer cells. Electropherogram depicting RNA from microdissected cancer (D) and stromal breast tumor (E) cells in comparison with RNA from total tissue section (F). RNA length is indicated on the x-axis, and the y-axis corresponds to the fluorescence units (FU). RNA integrity number (RIN) values are calculated according to a standardized algorithm by using various features correlated to RNA integrity. nt, nucleotides.

Table 1 Collected RNA and RNA integrity number values of laser-microdissected breast tumor cells using Bioanalyzer and Nanodrop analysis

Identification of stroma- and epithelium-specific gene expression

Extracted and amplified samples were hybridized to an Affymetrix Human Gene ST 1.0 array. To identify genes that are selectively expressed in either the stromal or the cancer cell compartment, the probe list was shrunk by removing probes for which every sample had a log2 expression value below a threshold (less than 7) and by applying a variance filter, which removed probes with variance of less than 0.15. The remaining 5,971 probes were analyzed for difference in expression between stroma and cancer cells by using the limma package in R.

Even if only three tumors were analyzed, this approach identified 107 probes (Additional file 5: Table S4), with an adjusted P value of less than 0.05, that had higher levels in stroma and 48 probes (Additional file 6: Table S5) that were higher in cancer cells with the same adjusted P value (Additional file 7: Figure S2A-B). Several of the genes enriched in the stroma compartment are genes encoding matrix proteins, such as collagens (for example, COL1A1) and decorin (DCN). There was also enrichment of genes related to an immune response such as chemokines (CXCL12 and CCL19) along with their receptors (CXCR4 and CCR7), matrix metalloproteases (MMP2 and MMP9), the T cell-specific genes CD4 and granzyme K (GZMK), and the B cell-specific immunoglobulin genes such as both heavy (IGHA1, IGHA2, IGHV1-5, and IGHM) and light (IGLJ3, IGLV6-57, IGKC, IGKV1-5, and IGK@) chain-encoding genes. This indicates that the analysis has worked technically and that it identifies genes that are essentially stroma-specific in their expression. On the other hand, several of the genes in the cancer cell compartment are epithelial, such as the cadherin family member desmoglein (DSG2), intracellular junction protein desmoplakin (DSP), the epithelium-specific transcription factors (ELF3 and 5), keratin 7 (KRT7), claudin 4 and 7 (CLDN4 and 7), and integrin beta 8 (ITGB8).

Stroma-specific gene sets

The genes highly expressed in stroma most likely represent contributions from several different cell types that may be at different maturation stages that are located mainly in the stromal compartment. Therefore, the expression levels of these genes in a tumor homogenate may potentially reflect the combination of these cell types in the tumor. This raises the possibility to identify gene signatures that can be used as an estimate of the molecular and of the cellular composition of the stroma in a tumor. However, the method we have used, LCM of FFPE material followed by amplification and hybridization, is not optimal for an accurate estimate of RNA levels and will conceivably have low sensitivity. Therefore, to expand the set of genes that may constitute specific stromal signatures, we used TCGA mRNA data from 982 primary breast cancers and identified all genes whose expression level correlated with a correlation coefficient above 0.85 with at least one of the original genes identified as enriched in the stromal compartment of the tumors in our analysis. This set was further expanded by including genes that had a correlation coefficient above 0.89 with one of the genes in the expanded set. This led to an enlarged set comprising 361 genes. All of these genes were also found to be expressed at higher levels in the stroma in the laser capture-microdissected samples (Additional file 7: Figure S2C). To define signatures of highly correlating genes, an iterative correlation analysis was performed yielding clusters of genes, in which all genes in a cluster had an average correlation coefficient above 0.90 with all other genes in the set. Owing to the large amount of genes typical of an immune or an inflammatory response, the threshold was set to 0.91 for these genes. This step yielded 16 gene signatures which contained at least three genes (Table 2 and Additional file 7: Figure S2C).

Table 2 Stromal gene signatures of highly correlating genes

For each signature, an aggregated value was calculated for each tumor by taking the arithmetic mean of the log2 expression of the genes in the set. The breast cancer tumors in the TCGA database were thereafter clustered by using the signature scores as variables (Figure 2A). There was no obvious relationship between this clustering and the breast cancer molecular subtype, determined by the PAM50 centroids. However, the gene signatures were separated into three major groups: one with primarily matrix/fibroblast-related genes (signatures 1 and 2), one with endothelium-associated genes (signatures 4 and 5), and one with genes typical for immune/inflammatory cells. We thereafter compared the aggregated expression levels for the signatures in the molecular subtypes (Figure 2B demonstrates the signature with the largest number of genes, all signatures are shown in Additional file 8: Figure S3). Luminal B tumors were low in expression of all stromal signatures, whereas basal-like tumors were low in matrix/fibroblast and endothelial genes but high in immune/inflammatory signatures. HER2-enriched tumors were low in endothelial genes and luminal A in immune/inflammatory signatures. Thus, the tumor group associated with good prognosis (luminal A) was high in both matrix and endothelial genes.

Figure 2
figure 2

Expression levels of gene signatures in breast cancer tumors. (A) The means of the log2 expression levels of the genes in each signature were calculated for 982 breast cancers by using data from The Cancer Genome Atlas project. The tumors were subjected to non-supervised hierarchical clustering based on the signatures. Subtypes, determined by PAM50 centroids, are indicated in color. (B) The mean log2 expression levels of the genes in the signatures with the largest number of genes representing (ECM) genes (signature 1), endothelial genes (signature 4), and immune/inflammatory genes (signature 6) are shown for the indicated breast cancer subgroups. Asterisk indicates that the value is lower than in the groups without asterisk; P <10−7 (analysis of variance followed by Tukey’s honest significance test) except signature 4 (HER2 versus luminal A, P = 0.0016) and for signature 6 (luminal A versus HER2, P = 0.030; luminal B versus HER2, P = 0.000018; and luminal B versus normal, P = 0.0000062).

Prognostic value of gene signatures

To assess whether the identified signatures of highly correlated genes may have prognostic implications, univariate Cox proportional hazard regression analyses were performed for the tumors from the TCGA database that had follow-up data (Table 3). The standardized mean of log2 expression levels of all the genes in a set were computed and used for the analyses. None of the gene signatures, except gene set 5, which is a signature with endothelial-related genes, was associated with risk of recurrence. The hazard ratio (HR) of signature 5 was 0.714.

Table 3 Univariate Cox proportional hazard analysis of breast cancers using standardized mean values for the gene sets as variables and new tumor event as end point

In a full multivariate model, using all gene sets, some signatures had P values of less than 0.05 (not shown), indicating that a multivariate model may be more appropriate. Therefore, we performed both forwards and backwards selection to identify an optimal model. The P value of the likelihood ratio test of the model was used as an indicator of the quality of the model. This resulted in an optimal model that contained four signatures. A multivariate analysis with these signatures indicated that two (1 and 13) were associated with an increased and two (5 and 12) with a decreased risk for new tumor event (Table 4A). To analyze whether the HRs may be due to confounding effects of other parameters, we included tumor size, node stage, and estrogen receptor status in the model and stratified for stage (Table 4B). We also analyzed a model in which stratification was done for PAM50 subtype (Table 4C). This showed that the gene sets, as well as node stage and estrogen receptor status, are independent prognostic markers. The only exception was signature 5, which had a P value of more than 0.05 upon stratification for PAM50 subtype.

Table 4 Mulivariate Cox proportional hazard model using standardized mean values for the gene signatures as variables

Gene signatures 12 and 13 both have an inflammatory/immune profile with set 12 containing HLA genes and set 13 C1Q genes. These were the most influential signatures in the multivariate Cox model and their opposing HRs indicate that the characteristic of the immune response in a tumor has prognostic value. Multivariate analyses with only these sets demonstrated significant HRs for both sets in models both without or with adjustments for the clinicopathological parameters and PAM50 subtype (Table 4D-F).

Stromal gene signatures in other tumors

The analyses indicate that the profile of an immune/inflammatory response in the tumor has a prognostic value. To analyze whether this is general and also applies to other tumors, we used RNA HiSeq data from four other cancer forms available in the TCGA database (Figure 3A). For both renal clear cell carcinoma, with new tumor event as the end point, and lung squamous cell carcinoma, with death as the end point, a similar pattern with no significant roles of the gene signatures as isolated variables, but with opposing HRs in a multivariate model, was detected. A similar but not significant tendency was seen for lung adenocarcinoma and head and neck squamous cell carcinoma.

Figure 3
figure 3

Hazard ratio of the C1Q and HLA signatures in different tumor forms and other breast cancer data sets. Lines represent confidence intervals (95%) from uni- and multivariate Cox proportional hazard analyses using the C1Q and HLA gene signatures as variables of different tumors using The Cancer Genome Atlas data (A) or three other breast cancer data sets (B). End points are death, new tumor event (NTE), distant metastasis-free survival (DMFS), or recurrence-free survival (RFS). The numbers of cases and events for each data set, within parentheses, are shown for each data set.

We also analyzed the signatures in three other breast cancer data sets, in which the expression had been analyzed with microarray technology (Figure 3B). Here, a threshold of mean log2 expression level for the probes was applied, excluding a few genes from calculation of the gene signature values. For two of the three data sets (NKI295 [18] and TransBig [20]) analyzed, significant associations with opposing HRs was seen in analogy with the breast cancer TCGA data. In the third set (Wang [19]), a similar but, for the C1Q signature, not significant association was observed.

An immune/inflammatory score

The observed prognostic values of the C1Q and HLA signatures raises the possibility that a prognostic score, based on the ratio of the two signatures, could be calculated. Therefore, we defined a C1Q-HLA score as the difference in mean log2 expression of the two signatures. When Kaplan-Meier curves for the TCGA tumors were constructed on the basis of quantiles of this score, it is evident that the top 1/3 is clearly at higher risk than the bottom 1/3 for recurrence. In the latter group, there were hardly any new tumor events within five years after the initial treatment (Figure 4A).

Figure 4
figure 4

Kaplan-Meier curves based on the C1Q-HLA score. A score was calculated as the mean value of the C1Q signature minus the mean value of the HLA signature for each of 504 tumors with survival data from The Cancer Genome Atlas database. (A) All breast cancers were grouped in three quantiles of the C1Q-HLA score, and Kaplan-Meier curves were generated. Basal-like (B) and luminal B (C) breast cancers were grouped on the basis of the C1Q-HLA score above or below the median. Cox indicates the hazard ratio (HR) and P values obtained when the C1Q-HLA score was used as a continuous variable together with lymph node status in a multivariate Cox proportional hazard model.

We also evaluated the molecular subgroups. For HER2-enriched, luminal A, and normal-like tumors, the number of events were too few for relevant analysis, but for basal-like and luminal B tumors, Kaplan-Meier curves based on the C1Q-HLA score above or below the median were constructed (Figure 4B and C). We also evaluated the C1Q-HLA score together with lymph node status in a multivariate Cox proportional hazard model using the basal and luminal B tumors (HR and P values are shown in Figure 4). All of these analyses gave P values below 0.05 for the score as prognostic indicator, except the log-rank test in luminal B tumors, in which the P value was 0.070. Taken together, the analyses indicate that a C1Q-HLA score is of prognostic value also in these isolated subgroups.

Other immune/inflammatory genes

The opposing HRs obtained with the HLA and C1Q signatures in several cancer data sets indicate that the profile of the inflammatory components in a tumor has prognostic importance. To screen for other inflammatory genes that may provide prognostic information, we analyzed all immune/inflammation-related genes obtained in the expanded stromal gene set pairwise with both the C1Q and the HLA signatures in multivariate Cox proportional hazard models (Tables 5 and 6). Using a cutoff of 0.01 for the P value of the gene in the Cox models resulted in 26 genes with an HR of more than 1 in a multivariate model with the HLA signature and 53 genes with an HR of less than 1 in a model with the C1Q set. To further analyze these genes, a correlation matrix was generated by using the expression data from TCGA tumors (Figure 5). A pattern can be discerned: genes that are associated with higher risk in general are more correlated with each other than with genes that are associated with lower risk, and vice versa.

Table 5 Hazard ratios and P values for indicated genes obtained in multivariate Cox proportional hazard models together with the HLA signature
Table 6 Hazard ratios and P values for indicated genes obtained in multivariate Cox proportional hazard models together with the C1Q signature
Figure 5
figure 5

Correlation matrix of expression levels of genes found to provide altered hazard together with either the C1Q or the HLA signatures. The correlation coefficients were calculated by using the log2 expression levels of the genes that were found to be associated with either a higher (red bar) or a lower (green bar) hazard in a multivariate Cox analysis together with the HLA or C1Q signature, respectively. The Cancer Genome Atlas breast cancer data were used for the analysis. The genes are listed in Tables 5 and 6. HR, hazard ratio.


The tumor stroma is composed of several non-malignant cell types that together build up the tumor microenvironment which may promote both tumor progression and metastasis [8,10,11,25]. In this study, we have used LCM for isolation of epithelial and stromal compartments of FFPE triple-negative breast tumors to enable analysis of compartment-specific gene expression. The usage of FFPE tumors is advantageous because of the large amount of routinely stored tumors, but gene expression analysis of this material has been challenging because of chemical modifications and degradation of the RNA [26].

Methodology development has improved the possibilities of accurate analysis of whole transcript expression from FFPE material. Studies thus far have mainly been technically oriented [27,28], but also tumor grade-, prognostic-, and subtype-specific expression for identification of novel gene expression profiles has been analyzed [29-33].

In this study, we have optimized the RNA preparatory steps and used an amplification and labeling system specifically developed for analysis of degraded RNA, along with a microarray containing probes detecting sites across the whole transcript. The genes we thereby could detect as enriched in the stroma compartment could to a large extent be identified as typical for immune cells, such as the Ig genes, and extracellular matrix, conceivably emanating from fibroblasts. This indicates that the method, despite the limited quality and quantity of the material, is able to capture compartment-specific expression patterns.

We obtained a set of 107 genes expressed at substantially higher levels in stroma. By expanding this set using correlation analysis of TCGA tumor data and successive iterations, we could define 16 gene signatures with high intra-set correlation. All of the genes identified in the expansion were also found in our data to be expressed at higher levels in the stromal compartments, supporting that they are all related to stroma.

The gene signatures conceivably represent a combination of different stromal cell types or different maturation stages of the same cell type. Sets 1 and 2 mainly contain genes typical of extracellular matrix such as DCN, LUM, VCAN, and collagens. These genes have been seen to be highly expressed in stroma in other studies [10,11,34,35]. We also find FAP, a typical myofibroblast marker, as well as POSTN, which is synthesized by fibroblasts in set 1. It is likely that sets 1 and 2 could be used as indicators of the amount of fibroblasts and stroma in a tumor.

Sets 4 and 5 contain genes typical of endothelial tissue, both actual angiogenic regulators TIE1/TIE2 (TEK), ARHGEF15, ROBO4, and ELTD1 and other endothelial markers such as CD34, CLEC14A, and ESAM. Several (7/12) of the genes in gene sets 4 and 5 were identified in the angiogenesis signature obtained from 1,250 tumors from different cancer types [36].

The remaining gene sets contain genes typical of immune and inflammatory cells, such as T cell-associated CD3 (CD3G, CD3D and CD3E), CD4, ZAP70, GIMAP and granzymes (GZMA/GZMK and GZMM), major histocompatibility complex (MHC) II-encoding HLAs, and more general lymphocyte-associated genes such as LAIR1 and LST1.

Cox proportional hazard analyses revealed only limited value of the gene signatures in univariate models, which suggest that neither the amount of stroma nor the extent of vascularization or the magnitude of inflammatory components has prognostic information as isolated variables. However, in a multivariate model, four of the gene signatures were significant. Sets 1 and 13 are associated with an increased risk for new tumor event, whereas gene sets 5 and 12 negatively influenced the HR.

A typical fibroblast marker (FAP) was identified in gene set 1 together with fibroblast-expressed extracellular matrix-associated genes, such as collagens, whereas gene set 13 contained complement factor C1Q. FAP has been identified as a prognostic marker in various cancer studies, including breast and lung cancer, and has been suggested to be a potential target in solid tumors [37-40]. Complement initiator C1q proteins can be derived from various stromal cells and has, in addition to its role in immune complex recognition, been found to have a proangiogenic effect in wound healing [41,42] and to drive carcinogenesis [43]. Complement activation has, on the other hand, been considered to have tumor suppressive properties, and C1q can induce apoptosis in prostate cancer cells [44,45], indicating a complex and probably context-dependent role of C1Q in tumor development.

The gene sets with negative HR contain several MHC-II-encoding HLA genes (gene set 12) and angiogenesis-regulating genes, such as TEK and ELTD1 (gene set 5). ELTD1 expression has previously been shown to be associated with an improved outcome [36] which may explain why this signature negatively influences the hazard. Furthermore, higher levels of HLA-DR have been shown to predict better prognosis of invasive ductal carcinomas [46]. The signatures were also significant when lymph node status was included and when stratifying for molecular subtype, indicating that they are independent prognostic markers.

A more limited Cox model based on only the immune gene signatures 12 and 13, with or without lymph node status and stratification for subtype in the model, could also predict recurrence. The opposing effects by these signatures on the risk indicate that the composition of the immune response in the tumor is of importance for the progression. This conclusion is further supported by the fact that the signatures are predictive in a multivariate Cox model in several other tumor forms and other breast cancer data sets. It suggests that the importance of specific components of the immune response for progression of the disease may be a general phenomenon applicable to several cancer forms.

The prognostic signatures contain mainly C1Q and HLA genes, respectively. C1Q genes have been shown to be produced by a number of cells, including monocytes, macrophages, and dendritic cells [42]. However, the expression of C1Q has previously been reported to be highest in immature immune cells that are known to reflect a state of immune paralysis in cancer immunology [47,48]. Therefore, higher expression levels of these genes could potentially reflect a higher ratio of immature, non-functional antigen-presenting cells in the tumor. Likewise, higher expression levels of HLA genes may be indicative of a more active immune response. This assumption is further supported by the screen (Table 5) for immune genes that are associated with a higher risk for recurrence in a multivariate model with the HLA signature. Here, we found many genes typical for negative immune signaling. These include inhibitory co-receptors that are negative regulators of immune responses (LAIR1, LILRB1, LILRB2, and LILRB4) and macrophage genes (CD68, ITGB2, and CD86). On the other hand, genes that were associated with a lower risk in a multivariate model with the C1Q signature (Table 6) represented genes coding for proteins involved in a cytotoxic immune response (for example, GZMH, GZMA, GZMK, CD3D, CD3E, CD3G, CD247 (CD3ζ), CD8A, CD27, CD52, CD96, PYHIN1, SLAMF6, and IL12RB1). The number of genes with both high correlation of their expression and similar prognostic value indicates that it may be possible to identify fairly large gene sets that could be used as markers for the profile of an intra-tumor immune response. It thereby highlights potential factors determining this profile.

The results support the idea that the molecular profile of an immune response, rather than the amount of immune or inflammatory cells, is an important prognostic marker in breast cancer and in other cancer forms. Such a hypothesis is further supported by other results, including the finding that the amount of a specific set of CD4+ T cells, namely follicular helper T cells, predicts breast cancer survival [49].


We have developed a methodological procedure for isolation and characterization of compartment-specific genes by using LCM on FFPE triple-negative breast cancers. Through expansion of the gene list with data from TCGA of correlating genes, we could identify two immune/inflammatory signatures with prognostic information. The results further underscore the importance of the composition rather than the extent of the immune response as a prognostic indicator in cancer. It also provides novel signatures that are stable indicators of prognosis and valid for some other cancer types as well and highlights genes that may be of particular importance when analyzing the composition of an immune response in a tumor.





formalin-fixed paraffin-embedded


hazard ratio


laser capture microdissection


major histocompatibility complex


Nottingham histological grade


polyethylene terephtalate


RNA integrity number


The Cancer Genome Atlas


  1. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–52.

    Article  CAS  PubMed  Google Scholar 

  2. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, et al. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A. 2003;100:8418–23.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Farmer P, Bonnefoi H, Becette V, Tubiana-Hulin M, Fumoleau P, Larsimont D, et al. Identification of molecular apocrine breast tumours by microarray analysis. Oncogene. 2005;24:4660–71.

    Article  CAS  PubMed  Google Scholar 

  4. Herschkowitz JI, Simin K, Weigman VJ, Mikaelian I, Usary J, Hu Z, et al. Identification of conserved gene expression features between murine mammary carcinoma models and human breast tumors. Genome Biol. 2007;8:R76.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Hu Z, Fan C, Livasy C, He X, Oh DS, Ewend MG, et al. A compact VEGF signature associated with distant metastases and poor outcomes. BMC Med. 2009;7:9.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Prat A, Parker JS, Karginova O, Fan C, Livasy C, Herschkowitz JI, et al. Phenotypic and molecular characterization of the claudin-low intrinsic subtype of breast cancer. Breast Cancer Res. 2010;12:R68.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Sabatier R, Finetti P, Mamessier E, Raynaud S, Cervera N, Lambaudie E, et al. Kinome expression profiling and prognosis of basal breast cancers. Mol Cancer. 2011;10:86.

    Article  PubMed  PubMed Central  Google Scholar 

  8. Dumont N, Liu B, Defilippis RA, Chang H, Rabban JT, Karnezis AN, et al. Breast fibroblasts modulate early dissemination, tumorigenesis, and metastasis through alteration of extracellular matrix characteristics. Neoplasia. 2013;15:249–62.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Bissell MJ, Radisky DC, Rizki A, Weaver VM, Petersen OW. The organizing principle: microenvironmental influences in the normal and malignant breast. Differentiation. 2002;70:537–46.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Allinen M, Beroukhim R, Cai L, Brennan C, Lahti-Domenici J, Huang H, et al. Molecular characterization of the tumor microenvironment in breast cancer. Cancer Cell. 2004;6:17–32.

    Article  CAS  PubMed  Google Scholar 

  11. Ma XJ, Dahiya S, Richardson E, Erlander M, Sgroi DC. Gene expression profiling of the tumor microenvironment during breast cancer progression. Breast Cancer Res. 2009;11:R7.

    Article  PubMed  PubMed Central  Google Scholar 

  12. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, et al. Stromal gene expression predicts clinical outcome in breast cancer. Nat Med. 2008;14:518–27.

    Article  CAS  PubMed  Google Scholar 

  13. Cleator SJ, Powles TJ, Dexter T, Fulford L, Mackay A, Smith IE, et al. The effect of the stromal component of breast tumours on prediction of clinical outcome using gene expression microarray analysis. Breast Cancer Res. 2006;8:R32.

    Article  PubMed  PubMed Central  Google Scholar 

  14. Farmer P, Bonnefoi H, Anderle P, Cameron D, Wirapati P, Becette V, et al. A stroma-related gene signature predicts resistance to neoadjuvant chemotherapy in breast cancer. Nat Med. 2009;15:68–74.

    Article  CAS  PubMed  Google Scholar 

  15. Penland SK, Keku TO, Torrice C, He X, Krishnamurthy J, Hoadley KA, et al. RNA expression analysis of formalin-fixed paraffin-embedded tumors. Lab Invest. 2007;87:383–91.

    CAS  PubMed  Google Scholar 

  16. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, et al. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol. 2006;7:3.

    Article  PubMed  PubMed Central  Google Scholar 

  17. The Cancer Genome Atlas Research Network. Accessed date May 2, 2014.

  18. van de Vijver MJ, He YD, van’t Veer LJ, Dai H, Hart AA, Voskuil DW, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347:1999–2009.

    Article  PubMed  Google Scholar 

  19. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, et al. Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005;365:671–9.

    Article  CAS  PubMed  Google Scholar 

  20. Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, et al. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007;13:3207–14.

    Article  CAS  PubMed  Google Scholar 

  21. Mackay A, Weigelt B, Grigoriadis A, Kreike B, Natrajan R, A'Hern R, et al. Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement. J Natl Cancer Inst 2011; 103:662–673.

  22. The Computational Cancer Biology. Accessed date June 5, 2014.

  23. Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27:1160–7.

    Article  PubMed  PubMed Central  Google Scholar 

  24. Clement-Ziza M, Munnich A, Lyonnet S, Jaubert F, Besmond C. Stabilization of RNA during laser capture microdissection by performing experiments under argon atmosphere or using ethanol as a solvent in staining solutions. RNA. 2008;14:2698–704.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Quail DF, Joyce JA. Microenvironmental regulation of tumor progression and metastasis. Nat Med. 2013;19:1423–37.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. von Ahlfen S, Missel A, Bendrat K, Schlumpberger M. Determinants of RNA quality from FFPE samples. PLoS One. 2007;2:e1261.

    Article  Google Scholar 

  27. Mittempergher L, de Ronde JJ, Nieuwland M, Kerkhoven RM, Simon I, Rutgers EJ, et al. Gene expression profiles from formalin fixed paraffin embedded breast cancer tissue are largely comparable to fresh frozen matched tissue. PLoS One. 2011;6:e17163.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Thomas M, Poignee-Heger M, Weisser M, Wessner S, Belousov A. An optimized workflow for improved gene expression profiling for formalin-fixed, paraffin-embedded tumor samples. J Clin Bioinforma. 2013;3:10.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Gabrovska PN, Smith RA, Tiang T, Weinstein SR, Haupt LM, Griffiths LR. Development of an eight gene expression profile implicating human breast tumours of all grade. Mol Biol Rep. 2012;39:3879–92.

    Article  CAS  PubMed  Google Scholar 

  30. Nishio M, Naoi Y, Tsunashima R, Nakauchi C, Kagara N, Shimoda M, et al. 72-gene classifier for predicting prognosis of estrogen receptor-positive and node-negative breast cancer patients using formalin-fixed, paraffin-embedded tumor tissues. Clin Breast Cancer. 2013;14:e73–80.

    Article  PubMed  Google Scholar 

  31. Waddell N, Cocciardi S, Johnson J, Healey S, Marsh A, Riley J, et al. Gene expression profiling of formalin-fixed, paraffin-embedded familial breast tumours using the whole genome-DASL assay. J Pathol. 2010;221:452–61.

    CAS  PubMed  Google Scholar 

  32. Budczies J, Weichert W, Noske A, Muller BM, Weller C, Wittenberger T, et al. Genome-wide gene expression profiling of formalin-fixed paraffin-embedded breast cancer core biopsies using microarrays. J Histochem Cytochem. 2011;59:146–57.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Waldron L, Ogino S, Hoshida Y, Shima K, McCart Reed AE, Simpson PT, et al. Expression profiling of archival tumors for long-term health studies. Clin Cancer Res. 2012;18:6136–46.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Boersma BJ, Reimers M, Yi M, Ludwig JA, Luke BT, Stephens RM, et al. A stromal gene signature associated with inflammatory breast cancer. Int J Cancer. 2008;122:1324–32.

    Article  CAS  PubMed  Google Scholar 

  35. Leygue E, Snell L, Dotzlaw H, Hole K, Hiller-Hitchcock T, Roughley PJ, et al. Expression of lumican in human breast carcinoma. Cancer Res. 1998;58:1348–52.

    CAS  PubMed  Google Scholar 

  36. Masiero M, Simoes FC, Han HD, Snell C, Peterkin T, Bridges E, et al. A core human primary tumor angiogenesis signature identifies the endothelial orphan receptor ELTD1 as a key regulator of angiogenesis. Cancer Cell. 2013;24:229–41.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Santos AM, Jung J, Aziz N, Kissil JL, Pure E. Targeting fibroblast activation protein inhibits tumor stromagenesis and growth in mice. J Clin Invest. 2009;119:3613–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  38. Tchou J, Zhang PJ, Bi Y, Satija C, Marjumdar R, Stephen TL, et al. Fibroblast activation protein expression by stromal cells and tumor-associated macrophages in human breast cancer. Hum Pathol. 2013;44:2549–57.

    Article  CAS  PubMed  Google Scholar 

  39. Ariga N, Sato E, Ohuchi N, Nagura H, Ohtani H. Stromal expression of fibroblast activation protein/seprase, a cell membrane serine proteinase and gelatinase, is associated with longer survival in patients with invasive ductal carcinoma of breast. Int J Cancer. 2001;95:67–72.

    Article  CAS  PubMed  Google Scholar 

  40. Kraman M, Bambrough PJ, Arnold JN, Roberts EW, Magiera L, Jones JO, et al. Suppression of antitumor immunity by stromal cells expressing fibroblast activation protein-alpha. Science. 2010;330:827–30.

    Article  CAS  PubMed  Google Scholar 

  41. Bossi F, Tripodo C, Rizzi L, Bulla R, Agostinis C, Guarnotta C, et al. C1q as a unique player in angiogenesis with therapeutic implication in wound healing. Proc Natl Acad Sci U S A. 2014;111:4209–14.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  42. Ghebrehiwet B, Hosszu KK, Valentino A, Peerschke EI. The C1q family of proteins: insights into the emerging non-traditional functions. Front Immunol. 2012;3:52.

    Article  PubMed  PubMed Central  Google Scholar 

  43. Rutkowski MJ, Sughrue ME, Kane AJ, Mills SA, Parsa AT. Cancer and the complement cascade. Mol Cancer Res. 2010;8:1453–65.

    Article  CAS  PubMed  Google Scholar 

  44. Chen J, Xu XM, Underhill CB, Yang S, Wang L, Chen Y, et al. Tachyplesin activates the classic complement pathway to kill tumor cells. Cancer Res. 2005;65:4614–22.

    Article  CAS  PubMed  Google Scholar 

  45. Hong Q, Sze CI, Lin SR, Lee MH, He RY, Schultz L, et al. Complement C1q activates tumor suppressor WWOX to induce apoptosis in prostate cancer cells. PLoS One. 2009;4:e5755.

    Article  PubMed  PubMed Central  Google Scholar 

  46. da Silva GB, Silva TG, Duarte RA, Neto NL, Carrara HH, Donadi EA, et al. Expression of the classical and nonclassical HLA molecules in breast cancer. Int J Breast C. 2013;2013:250435.

    Google Scholar 

  47. Castellano G, Woltman AM, Nauta AJ, Roos A, Trouw LA, Seelen MA, et al. Maturation of dendritic cells abrogates C1q production in vivo and in vitro. Blood. 2004;103:3813–20.

    Article  CAS  PubMed  Google Scholar 

  48. Hargadon KM. Tumor-altered dendritic cell function: implications for anti-tumor immunity. Front Immunol. 2013;4:192.

    Article  PubMed  PubMed Central  Google Scholar 

  49. Gu-Trantien C, Loi S, Garaud S, Equeter C, Libin M, de Wind A, et al. CD4(+) follicular helper T cell infiltration predicts breast cancer survival. J Clin Invest. 2013;123:2873–92.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references


We thank Ivo Filinic for expert technical assistance during tissue sectioning and method development and Ingrid Magnusson Rading and Srinivas Veerla at SCIBLU (SWEGENE Centre for Integrative Biology at Lund University) for technical expertise and valuable discussions. This work was supported by grants from the Swedish Research Council; the Swedish Cancer Society; the Gunnar Nilsson, Ollie and Elof Ericsson, and Kock foundations; and Skåne University Hospital research funds. The results are based, in whole or part, on data generated by the TCGA Research Network (

Author information

Authors and Affiliations


Corresponding author

Correspondence to Christer Larsson.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

SW carried out all the experimental work and assembled the draft of the manuscript, participated in the design and interpretative discussions, and helped draft the manuscript. AE participated in the design and interpretative discussions, helped draft the manuscript, and conceived the study. CL participated in the design and interpretative discussions, helped draft the manuscript, conceived the study, and performed the statistical analyses. KL participated in interpretative discussions and helped draft the manuscript. All authors read and approved the final manuscript.

Anders Edsjö and Christer Larsson contributed equally to this work.

Additional files

Additional file 1: Table S1.

Basic patient and tumor characteristics of the breast cancer. The Cancer Genome Atlas (TCGA) cohort used for the follow-up analyses is shown.

Additional file 2: Table S2.

RNA integrity and concentrations after membrane mounting.

Additional file 3: Table S3.

RNA integrity after cresyl violet staining.

Additional file 4: Figure S1.

RNA integrity is maintained after cresyl violet staining. Electropherogram profiles of RNA from total non-mounted tissue (A), cresyl violet-stained mounted tissue with (B) and without (C) aqueous rinsing steps are shown. RNA length is indicated on the x-axis, and the y-axis corresponds to the fluorescence units. RNA integrity number (RIN) values are calculated according to a standardized algorithm by using various features correlated to RNA integrity.

Additional file 5: Table S4.

Genes enriched in stroma compartment, identified using the limma package in R. The magnitude of enrichment is indicated by the difference in log2 expression (logFC).

Additional file 6: Table S5.

Genes enriched in cancer cell compartment, identified by using the limma package in R. The magnitude of enrichment is indicated by the difference in log2 expression (logFC).

Additional file 7: Figure S2.

Identification of compartment-enriched genes and stroma gene sets. With the lmfit function in the limma package of R, probes that were enriched in either compartment were identified with an adjusted P value of less than 0.05 as cutoff. The scatter plots (A, B) show expression values for all probes of genes for which one probe was enriched in stroma (red) or cancer cell (blue) compartment. The black circles in (A) indicate genes for which no probe was significantly enriched in either compartment. (C) The expanded gene set (361 genes) are shown in red (original genes enriched in stroma) and blue (genes found after expansion with The Cancer Genome Atlas (TCGA) data). The multiplication sign denotes the genes in the final signatures.

Additional file 8: Figure S3.

Expression levels of gene signatures in breast cancer tumors. The mean of the log2 expression levels of the genes in each signature was calculated for 982 breast cancers using data from The Cancer Genome Atlas (TCGA) project. The aggregated values of each signature are shown for the indicated breast cancer subgroups.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Winslow, S., Leandersson, K., Edsjö, A. et al. Prognostic stromal gene signatures in breast cancer. Breast Cancer Res 17, 23 (2015).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: