Mouse mammary stem cells express prognostic markers for triple-negative breast cancer

Introduction: Triple-negative breast cancer (TNBC) is a heterogeneous group of tumours in which chemotherapy, the current mainstay of systemic treatment, is often initially beneficial but with a high risk of relapse and metastasis. There is currently no means of predicting which TNBC will relapse. We tested the hypothesis that the biological properties of normal stem cells are re-activated in tumour metastasis and that, therefore, the activation of normal mammary stem cell-associated gene sets in primary TNBC would be highly prognostic for relapse and metastasis.
Methods: Mammary basal stem and myoepithelial cells were isolated by flow cytometry and tested in low-dose transplant assays. Gene expression microarrays were used to establish expression profiles of the stem and myoepithelial populations; these were compared to each other and to our previously established mammary epithelial gene expression profiles. Stem cell genes were classified by Gene Ontology (GO) analysis and the expression of a subset analysed in the stem cell population at single cell resolution. Activation of stem cell genes was interrogated across different breast cancer cohorts and within specific subtypes and tested for clinical prognostic power.
Results: A set of 323 genes was identified that was expressed significantly more highly in the purified basal stem cells compared to all other cells of the mammary epithelium. A total of 109 out of 323 genes had been associated with stem cell features in at least one other study in addition to our own, providing further support for their involvement in the biology of this cell type. GO analysis demonstrated an enrichment of these genes for an association with cell migration, cytoskeletal regulation and tissue morphogenesis, consistent with a role in invasion and metastasis. Single cell resolution analysis showed that individual cells co-expressed both epithelial- and mesenchymal-associated genes/proteins. Most strikingly, we demonstrated that strong activity of this stem cell gene set in TNBCs identified those tumours most likely to rapidly progress to metastasis.
Conclusions: Our findings support the hypothesis that the biological properties of normal stem cells are drivers of metastasis and that these properties can be used to stratify patients with a highly heterogeneous disease such as TNBC.
Electronic supplementary material: The online version of this article (doi:10.1186/s13058-015-0539-6) contains supplementary material, which is available to authorized users.


INTRODUCTION
Breast cancer is a highly heterogeneous disease broadly classified on the basis of clinical parameters such as size, grade and node status, as well as histopathological criteria, primarily expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) [1]. While defined targeted therapeutic strategies have been developed for patients with ER+/PR+ and HER2+ disease, chemotherapy is currently the mainstay of systemic treatment for patients with triple-negative (ER-/PR-/HER2-) breast cancer (TNBC), which represents approximately 20% of all breast cancers [2].
Clinically, TNBC encompasses a heterogeneous group of aggressive tumours with poor prognosis [1,3-7], partly due to high recurrence within the first years and limited targeted therapy options. Although chemotherapy is often initially beneficial in these tumours, especially in the neoadjuvant setting, many TNBC have a high risk of relapse [8]. Since there is currently no means of predicting which TNBC will relapse, identification of subpopulations of TNBC that are most at risk is vital for the clinical management of these breast cancer patients.
Strong evidence is emerging supporting the hypothesis that cancer stem cells with similar features to normal tissue stem cells are resistant to standard chemotherapy and drive tumour regrowth after therapy finishes [9]. We hypothesised that biological properties of normal stem cells are re-activated in tumour cells to facilitate metastasis. Genes expressed in stem cells of the normal mammary gland might therefore carry prognostic information for relapse and metastasis in breast cancer. However, the development of such gene sets depends on the ability to isolate highly pure stem cells for analysis.
The mammary epithelium consists of two main layers, the luminal and basal layers. The luminal layer consists of ER- cells (mainly proliferative progenitors) and ER+ cells (mainly non-proliferative differentiated cells). The basal layer consists of myoepithelial cells and mammary stem cells, the latter characterised by their robust outgrowth activity in the cleared fat pad transplant assay. The relationship between these populations is summarised in Additional File 1A. Previous studies have analysed total basal breast epithelial cells, without further purification of the minority stem cell fraction [10], or used a dye label-retention strategy to identify asymmetrically-dividing cells (putative stem cells) in non-adherent mammosphere cultures [11]. Only one previous study has attempted to freshly purify basal stem cells and compare their gene expression profile to myoepithelial cells [12]; however, that study identified only four genes expressed >2 fold more highly in stem cells compared to myoepithelial cells, and none of these achieved statistical significance. Here, we have defined the first gene signature specific for highly purified, freshly isolated mammary stem cells and further enriched the stem cell specificity by excluding basal-associated genes common to both the stem and myoepithelial populations. Pathway analysis revealed that this signature was enriched in genes associated with cell migration, adhesion and tissue morphogenesis. Single cell resolution gene expression analysis showed that the stem cell population included cells which expressed both epithelial- and mesenchymal-associated genes. Strikingly, when the expression of the stem cell gene signature was interrogated in two large independent TNBC cohorts, tumours with an activated stem cell signature showed a higher propensity to relapse in the first years after diagnosis in comparison to TNBC with lower activation scores for the stem cell gene signature. 
In contrast, in three large independent ER+ breast cancer data sets, an activated stem cell signature identified tumours least likely to metastasise. The prognostic power of the stem cell gene signature when applied to expression profiling of total tumour material implies that in poor prognosis TNBC the cancer stem cell-like genetic programme is not restricted to a minority cell population but rather is driving the behaviour of the bulk of tumour cells.
Our findings show that the biology of normal mammary stem cells, as reflected in their gene expression profiles, is highly relevant for understanding the drivers of aggressive disease in TNBC.

Preparation of Mammary Epithelial Cells for Flow Cytometry
All animal work was carried out under UK Home Office project and personal licences following local ethical approval by the Institute of Cancer Research Animal Ethics Committee and in accordance with local and national guidelines. Single cells were prepared from fourth mammary fat pads of 8-10-week-old virgin female FVB mice as described [13] and stained with anti-CD24-FITC, anti-Sca-1-APC, anti-CD45-PE-Cy7, anti-CD49f-PE-Cy5 and anti-c-Kit-PE. Mammary epithelial cell subpopulations were defined as shown in Figure 1 and Additional File 1.

Cleared mammary fat pad transplantations
Transplantation of primary mouse mammary epithelial cells was carried out as described [13]. Sorted cells were transplanted at 200 cells per fat pad into the cleared fat pads of 21-day-old syngeneic FVB mice over three independent sort and transplant sessions for each population. Positive control transplants of total basal cells at 20,000 cells per fat pad were also included in each session. Fat pads were analysed by whole-mounting eight weeks after transplantation.

RNA isolations and gene expression analysis by quantitative real-time rtPCR (qrtPCR)
Freshly sorted primary cells were lysed in RLT buffer (Qiagen, Crawley, West Sussex, UK) and stored at -80 °C. Total RNA was extracted using an RNeasy MiniElute or MicroElute Kit (Qiagen), according to the manufacturer's instructions. qrtPCR reactions were performed as previously described using TaqMan probes (see Additional File 2) [14]. Results were either calculated using the Ct method and expressed as the mean fold gene expression difference in three independently isolated cell preparations over a comparator sample with 95% confidence intervals or, for single cell experiments, presented as raw 1/Ct values.
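The fold-difference calculation can be illustrated with a minimal sketch. It assumes the widely used comparative form (fold = 2^-ΔΔCt against a reference gene and a comparator sample), which the text does not spell out. The function names, the geometric mean, and the t-based 95% interval on the log2 scale are illustrative assumptions rather than the authors' exact procedure.

```python
import math
import statistics

# Two-sided 95% Student-t critical values for small samples (df -> t).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776}

def fold_changes(target_ct, reference_ct, comparator_delta_ct):
    """Return 2^-ddCt for each independent preparation (hypothetical helper)."""
    folds = []
    for target, reference in zip(target_ct, reference_ct):
        delta_ct = target - reference              # normalise to the reference gene
        folds.append(2 ** -(delta_ct - comparator_delta_ct))
    return folds

def mean_fold_with_ci(folds):
    """Geometric mean fold change with a 95% CI computed on the log2 scale."""
    logs = [math.log2(f) for f in folds]
    n = len(logs)
    mean = statistics.mean(logs)
    half_width = T_975[n - 1] * statistics.stdev(logs) / math.sqrt(n)
    return 2 ** mean, 2 ** (mean - half_width), 2 ** (mean + half_width)

# Illustrative Ct values for three preparations (not data from the study).
folds = fold_changes([24.0, 25.0, 26.0], [20.0, 20.0, 20.0], 6.0)
mean_fold, ci_low, ci_high = mean_fold_with_ci(folds)
```

With only three preparations the interval is wide, which is expected for a t-based interval with two degrees of freedom.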

Immunofluorescent staining of cells sorted on to slides
Samples of 50-200 cells were sorted directly on to poly-L-lysine-coated slides, air dried, and stored at -20 °C. The cells were fixed in 1:1 methanol:acetone for 5 minutes at -20 °C (Keratin 14/Keratin 18)  Biotechnik, Heidelberg, Germany; mouse anti-ER clone ID5, 9.9 µg/ml, #M7047, DAKO, Cambridge, UK) for 60 minutes at room temperature. Cells were washed with PBS three times for five minutes each before the Alexa-conjugated secondary antibodies (Alexa-488 and/or Alexa-555, each at 1:500, Invitrogen) were applied for 60 minutes at room temperature. The slides were washed three times for five minutes each in 0.01% PBS/DAPI, rinsed in water and mounted and coverslipped with Vectashield. Images were captured using a Leica TCS-SP2 microscope with images collected in three channels using Leica LCS software. 'No First Antibody' controls were used to set PMT levels. Controls using only one first antibody with both second antibodies were used to confirm lack of cross-reactivity of second antibody staining.

Immunocytochemical staining of mammary tissue
Paraffin-embedded tissue sections on poly-L-lysine-coated slides were dewaxed in xylene (2x 5 min) and rehydrated by washing in decreasing concentrations of ethanol: 2x 3 min in 100% ethanol, 1x 3 min 95% ethanol and 1x 3 min 75% ethanol. Antigen retrieval was carried out by incubating the slides in preheated citrate buffer (99.9°C, pH 6; ThermoFisher) for 20 minutes. Slides were then left to cool for 30 minutes at room temperature (RT). Slides were then incubated with a peroxidase blocking solution (Vimentin and SMA: 3% hydrogen peroxide, CK14: 1 in 60 hydrogen peroxide v/v in methanol) for 10 minutes at RT, followed by three 5 minute washes in washing buffer (Vimentin and CK14: 0.1% Tween in TBS, SMA: PBS). The slides were blocked with serum diluted in wash buffer (Vimentin and SMA: Normal Goat Serum, CK14: MOM diluents, MOM kit, Vector Labs) for 45 minutes at room temperature. The serum block was removed and slides were incubated immediately with the primary antibody in serum/wash buffer overnight at 4°C. Vimentin was detected using a goat polyclonal antibody (Santa Cruz SC-7557) at 1:300 dilution, SMA using a rabbit polyclonal (Abcam, #ab5694), and CK14 using a mouse monoclonal (Abcam #ab7800). Unbound primary antibody was removed by three 5 minute washes in wash buffer and the slides were then incubated for 1 hour at room temperature with the secondary antibody in serum/wash buffer (Vimentin and SMA: Anti-rabbit biotinylated (Dako), CK14: Anti-mouse (MOM kit)). The positive signal was amplified by incubating the slides for 30 minutes at room temperature with the Avidin-Biotin Complex (ABC) kit (Vector Labs), made up 30 minutes before it was applied, then positivity was visualised by incubating the slides with the DAB+ Chromogen reagent (EnVision™ kit). The reagent was applied to the slides for 5 minutes, and then removed by three washes with wash buffer. 
The slides were counterstained in haematoxylin for 60 seconds, followed by a wash in running water for 5 minutes. The slides were dehydrated by washing in increasing concentrations of alcohol, placed in xylene for 2x5 minutes and then mounted with a glass coverslip.

Affymetrix Transcriptome Analysis
RNA was isolated from three independent myoepithelial and seven independent MaSC isolations. Samples were submitted to the UCL Genomics facility (http://www.ucl.ac.uk/ich/services/lab-services/ucl_genomics) for amplification and hybridisation to the Affymetrix Mouse Genome 430 2.0 array. Total RNA was amplified using the NuGEN Ovation Pico WTA System (NuGEN, Leek, The Netherlands).
The resulting double-stranded cDNA was fragmented and labelled using the Affymetrix Genechip WT Terminal Labelling kit. Affymetrix Mouse Genome 430 2.0 chip arrays were hybridised and scanned according to the manufacturer's instructions.
Expression data were normalised and summarised by robust multi-array analysis (RMA) using the Affymetrix package in R. Probesets mapping to unknown or multiple genes were removed from analysis. Probesets were used for two-class unpaired comparison using the Significance Analysis of Microarrays (SAM) R package [15]. Genes that were enriched or depleted in the MaSC population compared to the myoepithelial population were determined using a local false discovery rate (FDR) of < 5%. For comparison to data from Kendrick et al. [16], all CEL files were RMA normalised together and two-class unpaired SAM using a local FDR of 5% was applied to each population compared to the MaSC population. Probesets were also used for a multiclass Significance Analysis of Microarrays [16] to determine if their mean expression was different across the four mammary epithelial cell subpopulations.
Hierarchical clustering based on Pearson's correlation with average linkage was performed in the software package Cladist of the ROCK database [17,18]. Pathway analysis was performed using the DAVID KEGG pathway analysis tool and the ROCK pathway analysis tool [17-20]. All analyses were carried out using default settings. Pathway gene sets with an enrichment FDR of 5% were considered significantly enriched. Overlaps between gene sets were visualised using VENNY [21]. MIAME-compliant data have been uploaded to ArrayExpress with the accession number E-MTAB-2741.
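As an illustration of the clustering step, the following minimal sketch performs agglomerative clustering with a Pearson-correlation distance (1 - r) and average linkage in plain Python. It is not the Cladist implementation. The function names, the stopping rule (a fixed number of clusters) and the toy profiles are assumptions made for the example.

```python
import math

def pearson_distance(x, y):
    """Distance defined as 1 - Pearson correlation between two profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - cov / (sd_x * sd_y)

def average_linkage(profiles, n_clusters):
    """profiles: {sample: [expression values]}. Merge until n_clusters remain."""
    names = list(profiles)
    dist = {(a, b): pearson_distance(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean of all pairwise distances between members.
                total = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Toy profiles (hypothetical values, four probesets per sample).
toy = {
    "luminal_1": [1.0, 2.0, 3.0, 4.0],
    "luminal_2": [1.1, 2.0, 3.2, 3.9],
    "basal_1": [4.0, 3.0, 2.0, 1.0],
    "basal_2": [4.2, 2.9, 2.1, 1.0],
}
groups = average_linkage(toy, n_clusters=2)
```

This brute-force version recomputes linkage distances at every merge, which is fine for a handful of samples but would be slow for large sample sets.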

Single Cell Resolution Gene Expression Analysis
Single MaSCs were subjected to PCR essentially as described previously [ In preliminary tests, monitoring of the amplification of the spiked control RNA (LTP4 added at 10^-2 pg, LTP6 at 10^-3 pg and TIM at 10^-4 pg; these values correspond to 8400, 900 and 90 molecules of RNA respectively) in single cell samples from the CommaDβgeo [23] mammary epithelial cell line demonstrated that, using the procedure, amplification was linear and preserved relative abundance of transcripts, although a small amount of variation was inherent to the second round of amplification (Additional File 3A). Furthermore, when qrtPCR for seven genes (Gapdh, Ubc, Jag1, Jag2, Wnt4, Wnt5a and Wnt10a; selected on the basis of probe availability) was carried out on amplification material from sixteen single CommaD cells, sixteen groups of ten CommaD cells and on unamplified cDNA collected from the bulk population, the mean of the expression levels of the single cells was not significantly different from the mean of expression in the sixteen groups of ten cells or from expression levels in the unamplified bulk cDNA. This analysis confirmed that relative expression levels were preserved upon amplification from a single cell, with a strong correlation in relative expression levels obtained when comparing single cell and 10-cell amplified material (R = 0.98) and single cell amplified and whole population unamplified material (R = 0.95) (Additional File 3B; R values calculated in Excel).

Breast cancer dataset collection
Three TNBC cohorts were used in this study. 579 cases described by Karn and colleagues on which we carried out RMA pre-processing followed by ComBat normalization to reduce batch effects [27]. For ER+ tumours, we retrieved the NKI295 [28], TRANSBIG [29] and GSE2990 [30] datasets and extracted those cases which were termed positive for ER by IHC, resulting in 226, 134 and 149 samples for NKI295, TRANSBIG and GSE2990, respectively. Clinico-pathological features for each of these data sets have been previously published in the original manuscripts referenced above, except for the Lehmann set, which is provided here as Additional File 4. Details of ethical approval for patient material can be found within the original publications relating to each data set.

Analysis of MaSC gene signature in breast cancer transcriptional profiles
The mouse MaSC gene signature derived from the SAM was converted to a human gene list using Biomart (www.biomart.org; Ensembl Genes 72; Mus musculus genes GRCm38.p1).
To establish the overall activity of the MaSC gene signature in human breast cancers, we applied our previously published Denoising Algorithm based on Relevance network Topology (DART) [31], which identifies genes within a signature with highly correlated expression levels and uses these to infer molecular pathway activity. We also tested median centring the gene expression of the dataset and establishing the activation of the whole MaSC gene signature by averaging the relative expression values for all genes for each tumour. The DART activation score or averaged gene expression for each sample in each cohort was determined and log-rank tests were performed dichotomising the samples using either top tertile or median cut-offs, depending on the data set. Kaplan-Meier plots were generated for each dataset to provide a visualization of survival stratification.
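The simpler of the two scoring approaches (median centring followed by averaging) and the subsequent dichotomised log-rank comparison can be sketched in plain Python. This is a sketch under stated assumptions and does not reproduce DART. The function names, the exact tertile rule and the toy data are hypothetical, and the log-rank p-value uses the standard chi-square approximation with one degree of freedom.

```python
import math
import statistics

def signature_scores(expr, signature):
    """expr: {gene: [value per tumour]}. Median-centre each signature gene,
    then average the centred values for each tumour."""
    centred = []
    for gene in (g for g in signature if g in expr):
        med = statistics.median(expr[gene])
        centred.append([v - med for v in expr[gene]])
    return [statistics.mean(col) for col in zip(*centred)]

def dichotomise(scores, cutoff="median"):
    """Flag 'high' tumours using a median or top-tertile cut-off."""
    ranked = sorted(scores)
    if cutoff == "median":
        threshold = statistics.median(ranked)
    else:  # top tertile: roughly the highest third of scores
        threshold = ranked[math.ceil(2 * len(ranked) / 3) - 1]
    return [s > threshold for s in scores]

def logrank_p(times, events, high):
    """Two-group log-rank test; events: 1 = event observed, 0 = censored."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        risk = [(h, ti == t and bool(e))
                for ti, e, h in zip(times, events, high) if ti >= t]
        n = len(risk)
        n_high = sum(1 for h, _ in risk if h)
        d = sum(1 for _, event in risk if event)
        d_high = sum(1 for h, event in risk if h and event)
        obs_minus_exp += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi_square = obs_minus_exp ** 2 / variance
    return math.erfc(math.sqrt(chi_square / 2))  # chi-square tail, 1 df

# Toy example (hypothetical genes and follow-up times, not study data).
toy_expr = {"GENE_A": [1, 2, 3, 4], "GENE_B": [2, 4, 6, 8], "GENE_C": [0, 0, 0, 0]}
scores = signature_scores(toy_expr, ["GENE_A", "GENE_B", "GENE_X"])
groups = dichotomise(scores)
p_value = logrank_p([2, 3, 4, 5, 8, 9, 10, 12], [1] * 8, [True] * 4 + [False] * 4)
```

In practice a survival analysis library would normally be used for the log-rank test and Kaplan-Meier curves; the hand-written version here only makes the arithmetic explicit.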

Breast cancer subtype classification
Centroid classification for the PAM50 molecular breast cancer subtypes was performed as described previously [25]. PAM50 and IntClust classifications were retrieved from the original publications [5,32]. TNBC subtypes described by Lehmann and colleagues were established using the online TNBC-type program [33]. To determine the four TNBC subtypes described by Burstein and colleagues [34], centroids for each subtype were extracted and correlation analysis performed.
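A centroid-based assignment of this kind can be sketched as a nearest-centroid rule: each tumour is given the subtype whose centroid correlates best with its expression profile. The sketch assumes Pearson correlation (the text does not state which correlation measure was used). The function names, subtype labels and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def assign_subtype(profile, centroids):
    """Return the subtype whose centroid has the highest correlation with the profile."""
    return max(centroids, key=lambda name: pearson(profile, centroids[name]))

# Hypothetical centroids over the same four genes, in the same order as the profile.
toy_centroids = {
    "subtype_A": [1.0, 2.0, 3.0, 4.0],
    "subtype_B": [4.0, 3.0, 2.0, 1.0],
}
call = assign_subtype([0.9, 2.1, 2.9, 4.2], toy_centroids)
```

The profile and every centroid must be restricted to the same genes in the same order before the correlation is computed.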

Statistics
All statistical tests were two-sided unless otherwise noted. Hypergeometric testing was used to establish the significance of overlap between two gene lists. All analyses were performed within the statistical R environment [35] unless otherwise noted.
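The hypergeometric overlap test mentioned above can be written directly from its definition: the probability of observing at least the given overlap when one gene list is drawn at random from a shared gene universe. The sketch below is a minimal version. The function name and the numbers in the example are illustrative assumptions, and the choice of universe (for instance, all genes measurable on both platforms) is left to the analyst.

```python
from math import comb

def overlap_p_value(universe, size_a, size_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap), where X is the number
    of list-A genes found among size_b genes drawn from the universe."""
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total

# Illustrative numbers only: two lists of 5 genes from a universe of 10, sharing all 5.
p_all_shared = overlap_p_value(10, 5, 5, 5)
p_any = overlap_p_value(10, 5, 5, 0)
```

Exact integer arithmetic with `math.comb` avoids rounding problems in the tail sum, at the cost of large intermediate integers for genome-scale universes.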
To confirm that the gating strategy isolated MaSCs, MYOs, luminal ER- cells and luminal ER+ cells, the populations were sorted and characterised by quantitative real-time rtPCR To finally validate the sorting strategy using the cleared mammary fat pad transplant assay, the MaSC, MYO, luminal ER+ and luminal ER- populations were sorted and transplanted into cleared fat pads at 200 cells per fat pad over three independent cell sorts and transplantations. After 8 weeks, glands were harvested, wholemounted, carmine stained to enable visualisation of outgrowths and scored. The results (Figure 1D) showed that only the CD24+/Low Sca-1- CD49 High c-Kit- MaSC population had the ability to repopulate a mammary fat pad with high efficiency and, when taken in conjunction with the qrtPCR and cell staining data, confirmed that the sorting strategy was able to isolate MaSCs, MYOs, luminal ER- and luminal ER+ cells.

MaSCs Have a Distinct Gene Expression Profile to Myoepithelial and Luminal Cells
MaSCs are localised in the basal cell layer of the mammary epithelium. While MaSCs exclusively show repopulation capacity, they share a number of features with the other, most numerous, cell type of the basal cell layer, the MYOs. For instance, both MaSCs and MYOs express K14, although Krt14 gene expression is slightly higher in MaSCs than MYOs [37].
Direct comparison between luminal cell gene expression and MaSC gene expression, even when using highly purified populations, will identify genes associated with the basal cell layer as a whole, as well as the MaSC genes. Therefore the comparison between the highly purified MaSC and myoepithelial populations is essential in identifying genes solely characterising the MaSC population.
We had previously profiled purified luminal ER+ and luminal ER- cells and the total basal epithelial population, which is dominated by MYOs [16]. To extract MaSC-specific genes, rather than common 'basal' genes or a MYO-dominated gene set, gene expression profiling of highly purified CD24+/Low Sca-1- CD49 High c-Kit- MaSCs and CD24+/Low Sca-1- CD49 Low c-Kit- MYOs was carried out using Affymetrix microarrays. These data were integrated with our previous work on the total basal and the two separated luminal ER- and luminal ER+ cell populations [16], analysed on the same Affymetrix gene expression platform. Analysis of the distribution of the raw data from both the previous arrays and our new analyses showed no batch effects between the data sets which might have skewed results (Additional File 6A). Unsupervised hierarchical clustering of gene expression data (Figure 2A) demonstrated that the individual samples of the total basal cells (CD24-/Low Sca-1-) from our previous analysis [16] and the new MYO (CD24+/Low Sca-1- CD49 Low c-Kit-) and MaSC (CD24+/Low Sca-1- CD49 High c-Kit-) samples were more similar to each other than to the two luminal populations. Notably, however, samples within each of the total basal, MYO and MaSC populations clustered with each other. In particular, the seven MaSC samples formed a distinct branch within the basal cluster. This suggested that the transcriptome of the MaSC samples was distinct from those of the luminal, total basal and MYO populations. By applying a series of two-class unpaired SAM comparisons [15], genes significantly upregulated in the MaSCs relative to all other populations were determined as follows. First, genes significantly upregulated in the MaSCs compared to the MYOs were identified, using a false discovery rate (FDR) of <5% and a fold change cut-off of 1.5. 
Such an approach will inevitably identify some genes which are expressed in MaSCs at a higher level than in myoepithelial cells but are in fact, when all cell populations are considered, much more highly expressed in luminal populations. This is partly due to the relative, rather than absolute, quantitative nature of the approach but is also likely to result from 'lineage priming' [38]. Therefore, the MaSC gene expression data were also separately compared to the luminal ER- (CD24 High Sca-1-) and

Comparison of the MaSC Signature to Previously Identified Stem Cell Signatures
Previous studies have identified human and mouse 'MaSC signatures' using either freshly isolated cells [10,12] or mammosphere cultures [11], with the caveats noted above. To establish whether any genes were common to these studies, their gene expression signatures were overlaid with the signature reported here (Figure 3B-E and Additional File 11). Only one gene was common between our MaSC signature and the genes identified by Stingl and colleagues [12], namely Fatty Acid Binding Protein 4, Adipocyte (Fabp4). This gene has recently been shown to mark a population of adipocyte progenitors but has not yet been linked functionally to MaSCs [48]. Fifty genes were found in common between our MaSC gene signature and the signature identified by Lim and colleagues [10]. Of the genes expressed in stem cells isolated from mammosphere cultures using a label-retaining strategy [11], seventeen were shared with the MaSC signature reported here (Figure 3B).
To assess whether adult MaSCs share gene expression profiles with their more primitive embryonic counterparts, the MaSC signature was compared to two embryonic mammary bud studies. One study profiled the gene expression signatures of the constituent tissues of purified mammary primordia, the Mammary Bud Epithelium (MBE) and Mammary Bud Mesenchyme (MBM) [40]. The other compared CD24 High CD49f High fetal MaSC (fMaSC) against either CD24 Med/Low/Neg fetal stromal cells or adult CD24 Low CD49f High MaSCs [39].
Only three genes overlapped between our MaSC signature and both embryonic epithelial profiles, namely Nkain2, Mtap7 and Mbp (Figure 3C; Additional File 11, highlighted in red). To determine whether the prognostic power of the signature could be extended to ER+  (Figure 6D,E). However, in the TRANSBIG cohort, no association of the MaSC DART activation score could be observed in the ER+ breast cancers (Figure 6F).

Breast cancer subtype-specific expression of the MaSC signature
Next we asked whether our MaSC gene signature was associated with a specific subtype across all breast cancers and within TNBCs. We made use of the comprehensive METABRIC breast cancer dataset and interrogated which of the PAM50 and IntClust subtypes were enriched for tumours with a high MaSC signature DART activation score.
Interestingly, tumours with a high MaSC signature were enriched in the normal-like subtype (Figure 7A), followed by the Claudin-low and Luminal A subgroups. With the IntClust classification, an enrichment of MaSC signature high tumours was observed in IntClust 3 and IntClust 4 (Figure 7B). These clusters encompass relatively genomically stable tumours and mainly include Luminal A tumours, although IntClust 4 also includes subsets of HER2 and basal-like tumours as well as the normal-like group, supporting the PAM50 analysis.
Tumours of the IntClust 3 and 4 subtypes have been associated with relatively good prognoses [5], in agreement with our results in the ER+ datasets (Figure 6D,E).

DISCUSSION
TNBC, as a whole, has a poor prognosis and unlike ER+ and HER2+ tumours currently lacks targeted therapies, leaving systemic chemotherapy as the only adjuvant treatment option. These immunohistochemically-defined breast carcinomas comprise a histologically, molecularly and clinically highly heterogeneous group of tumours, with some patients having low long-term recurrence rates and responding well to chemotherapy [55]. Thus, there exists a clinical need to stratify patients to ensure the most appropriate treatment is administered.
One approach to identification of high-risk disease subgroups in breast cancer is prognostication based on gene expression profiling of primary tumours [56]. Given large tumour cohorts, clinical outcome data and whole transcriptome expression profiles of tumours, it is possible to identify sets of genes whose expression has prognostic value. In ER+ disease, these have typically been genes associated with proliferation [57]. Recent studies have shown that the expression of immune-response genes [58], a metastasis regulator metagene [59] or a chromosomal instability metagene [60] may represent potential prognostic markers for TNBC.
Here, we have used our experience in separation of mammary epithelial cell subsets to isolate a highly purified population of MaSCs and derive a gene signature based on comparison to differentiated MYOs as well as to luminal ER+ and luminal ER-cells.
Remarkably, the a priori-defined MaSC gene signature was able to provide prognostic information when applied to gene expression profiles of human breast cancers which had undergone no purification protocols or microdissection of tumour tissue. Therefore, the biology of normal mammary stem cells, as reflected in their gene expression profiles, is highly relevant for understanding the drivers of aggressive disease in TNBC. The gene signature was able to identify TNBC patients with a particularly poor prognosis (especially within the recently identified BL-1, BL-2 and M subtypes) and who thus might benefit from a more aggressive therapy regime or potential enrolment on to clinical trials of new (targeted) therapies. Clearly, however, further extensive evaluation and refinement of the genes encompassed in this list to maximise its power and general applicability is required before it could be considered usable as a clinical tool.
In contrast to the TNBC data, in two data sets ER+ tumours with a high MaSC activation score had a better prognosis, rather than a worse one. The difference in behaviour of the MaSC gene signature in these tumour types is striking, but the reason for it remains unclear.
When all breast cancers in the METABRIC cohort were considered, the MaSC signature was Notably, hypoxia in tumours is thought to be an inducer of EMT and cancer stem cell-like activity [63]. Another recurrent theme in regulation of epithelial stem cells is the role of Wnt signalling [64]. For instance, Wnt pathway activation is required to maintain stem cell self-renewal in cultured mammary epithelial cells [65] and Wnt signalling was found to be suppressed in mammary stem cells after pregnancy [66], consistent with our MaSC gene signature (from virgin animals) being enriched for Wnt pathway-associated genes.
The MaSC signature is of a population with mixed mesenchymal/epithelial features (as We have derived our stem cell signature from basal stem cells defined operationally by their high transplant potential in a cleared mammary fat pad [69]. However, controversy exists as to whether the adult mammary epithelium has a single basal stem cell population which maintains both luminal and basal layers [70] or two distinct stem cell compartments, one basal and one luminal [71]. Evidence for a separate luminal stem cell population has been derived from lineage tracing experiments, as luminal cells have poor transplantation potential [71], although we have shown they can repopulate a cleared fat pad [13]. A recent study has once again provided support for the existence of a common multipotent basal mammary Matrigel [13]. In our hands, at least, Procr expression and transplantation potential only partially overlap, but understanding the detailed relationship between Procr+ basal cells and CD24+ CD49f High MaSCs will require extensive lineage tracking, flow sorting and transplantation studies. Nevertheless, the exact relationship between these cell types does not affect our findings that the gene expression signature of the cells we have defined as MaSCs is a strong predictor of outcome in TNBC and therefore defines a set of genes which includes some that must be drivers of aggressive behaviour in this tumour subtype. The basal cell population we have profiled has been selected for its potent transplantation ability [69]. The early events which occur following injection of single mammary basal stem cells into a cleared fat pad are obscure; however, cells which survive this process and can form outgrowths must, by definition, have the ability to survive being reduced to single cells and introduced to a new growth site at low density (indeed, in some experiments even as single cells) [13,73] and then be able to invade and remodel the surrounding environment, forming a new tissue. 
The parallels with cells which can initiate metastatic dissemination are clear, although not exact, and we speculate that this underlies the strong association between the MaSC gene signature (or, the 'transplantable basal stem cell gene signature') and TNBC with high metastatic potential.

CONCLUSIONS
We have tested the hypothesis that genes associated with normal mouse mammary stem cells would have prognostic power in human breast cancer, and we have found that this is indeed the case. Our findings suggest that, as tumour gene expression profiling is based on whole tumour sampling, invasive stem-cell-like potential is not limited to a small subset of cells in aggressive TNBC. Furthermore, we have highlighted overlaps between our dataset and those of other workers to show that the regulation of cytoskeletal function is a key aspect of mammary stem cell biology. Finally, we have demonstrated that MaSCs have a dual epithelial-mesenchymal identity. Our findings will not only advance our understanding of the molecular regulation of mammary stem cell biology and the relationship between the biology of mammary stem cells and of aggressive, poor prognosis TNBC but also have the potential to inform clinical management of breast cancer, particularly triple negative disease.
Figure legend (fragment): **p<0.01; *p<0.05; N.S., not significant, relative to the next highest expressing population indicated by brackets. The DART activation score of the MaSC gene signature is shown on the y-axis in both plots.