scRNA-seq reveals a unique quiescent population of stem-like cells in ER+ mammospheres
To study BCSCs, we isolated cells from mammospheres (MS), which we and others [35, 36] have shown are highly tumorigenic compared to 2D-cultured cells (Fig. 1A), and thus are enriched for BCSCs. We performed scRNA-seq on cells derived from MS of two ER+ breast cancer cell lines, MCF-7 and T47D, and then assessed BCSC properties using several widely recognized characteristics of stem cells: i) expression of classical stem cells markers and an enrichment for stem cell-associated gene signatures (Fig. 1), ii) proliferation status (Fig. 2 and Additional file 1: S. Fig 1), and (iii) enrichment in MS vs 2D-cultured cells (Fig. 3).
We first performed unsupervised clustering of cells derived from MCF-7 MS, which revealed 4 cell clusters with distinct transcriptomes (Fig. 1B, C). Analysis of classical stem marker expression across the 4 clusters revealed that: (i) expression of some stem markers was not detected in our dataset (e.g., ALDH1A1 and KRT5), (ii) some markers were expressed but to a similar level among the four clusters, (e.g., CD44, NANOG, POU5F1/Oct4), and (iii) some markers were differentially expressed across the clusters but not in the same cluster (e.g., CD24 and TWIST were expressed in Cluster 0, KLF4 was expressed in Cluster 1, Additional file 2: S.Table 1). Since cellular processes are often associated with changes in the expression of groups of genes that share common biological functions or properties, we reasoned that a more reliable approach to identify putative BCSCs will be to test a set of related genes (i.e., a gene signature) rather than rely on expression of any single gene, which could be affected by technical limitations of scRNA-seq, such as gene dropout [50]. For this purpose, we used MsigDB [42] as a source of stem cell-associated gene signatures, from which we selected 9 signatures that have been derived from or validated in in breast cancer models (a detailed description of each signature is presented in Additional file 2: S. Table 2). To identify which cluster(s) are enriched for stem cell-associated gene signatures, we applied Functional Enrichment Analysis (FEA), also known as Gene Set Enrichment Analysis, a widely used approach which was proven to detect even small but biologically significant changes in gene expression in different cancer datasets [40, 51]. We found that the majority of these stem cell-associated gene signatures were significantly enriched in MS Cluster 1 and MS Cluster 2 (Fig. 1D, Additional file 2: S. Table 3), suggesting that these clusters may be putative stem-like cell populations. To determine if similar putative stem-like populations are found in T47D MS, we performed an integration analysis of MCF-7 and T47D MS datasets, which resulted in 4 clusters, each consisting of cells from both lines (Fig. 1E, F), indicating transcriptomic similarities between MCF-7 MS and T47D MS. Two clusters of the integrated MCF-7/T47D dataset, Integrated Cluster 0 and Integrated Cluster 2, were found to be enriched for stem cell-associated gene signatures (Fig. 1G, Additional file 2: S. Table 4). To establish whether these two clusters are similar to the two stem-like clusters originally identified in MCF-7 MS, we used FEA for custom gene signatures generated from differentially expressed genes (DEGs) (Additional file 2: S. Tables 5.1–5.4) of MCF-7 MS Cluster 1 and Cluster 2 (Additional file 2: S. Tables 5.2, 5.3) (hereinafter referred to as “the MS Cluster 1 Signature” and “the MS Cluster 2 Signature”). FEA showed enrichment for the MS Cluster 1 Signature in the Integrated MCF-7/T47D Cluster 0, and the MS Cluster 2 signature was found to be enriched in the Integrated MCF-7/T47D Cluster 2 (Fig. 1H, I and Additional file 1: S. Fig 1). These results indicate that MS derived from different ER+ breast cancer cell lines have 2 common cell populations with stem-like gene signature enrichment.
We next focused on cell proliferation, since some studies have shown that stem cells remain primarily in a quiescent/low proliferative state to maintain their stemness [23, 52]. To understand which of the two MS stem-like populations are quiescent, we first performed a cell cycle scoring analysis. The majority of MS Cluster 1 cells were found to be in G1 cell cycle phase whereas MS Cluster 2 cells were split evenly between S and G2M (Fig. 2A, B). We observed a similar cell cycle distribution for the two putative stem-like populations in the integrated MCF-7/T47D dataset (Additional file 1: S. Fig 2). Next, we performed a functional EdU incorporation assay, which supported our bioinformatic analysis and showed a similar distribution of proliferative vs non-proliferative cells in MS for both cell lines (Fig. 2C, D and Additional file 1: S. Fig 2C, D). To identify which cluster is quiescent, we next utilized gene signatures derived specifically from cells in G0/quiescent state (a detailed description of signatures presented in Additional file 2: S. Table 6.1). FEA for these gene signatures showed enrichment exclusively in MCF-7 MS Cluster 1 (Fig. 2E–G) and in Cluster 0 of the integrated MCF-7/T47D dataset, indicating the presence of quiescent cells in these populations (Additional file 1: S. Fig 2 E–G, Additional file 2: S. Table 6.2).
Previously, Ben-Porath et al. showed that stem cell-associated signatures may inherit proliferation-related genes, which may explain the enrichment of both proliferative cells and stem cell-associated gene signatures in MS Cluster 2 [53]. Additionally Ben-Porath et al. showed that regression of proliferation-related genes from analysis did not affect stem cell-associated gene signature enrichment [53]. Based on these concepts, we decided to conduct cell cycle regression to eliminate the influence of proliferation from the MS dataset and re-cluster the cells to determine whether stem cell-associated gene signature enrichment was reliant on or independent of proliferation-associated genes (Fig. 2H–J). We found that only one cluster, Cluster 1, was predominantly enriched for stem cell-associated signatures after cell cycle regression (Fig. 2K, Additional file 2: S. Table 7). This cluster has up to 80% of genes in common with the original MS Cluster 1 identified before cell cycle regression and was enriched for the MS Cluster 1 Signature (Fig. 2L, M). However, no enrichment of the original MS Cluster 2 Signature was observed in this population (data not shown). These findings indicate that enrichment for stem cell-associated signatures in MS Cluster 1 is independent of proliferation-related genes and retained even after cell cycle regression, while enrichment for stem cell-associated gene signatures in MS Cluster 2 completely relies on the expression of proliferation-related genes.
Since it is well established that MS are enriched for BCSCs, we next examined whether either of the 2 putative stem-like populations are enriched in MS. To do this, we integrated the MCF-7 MS dataset with a dataset from MCF-7 cells in standard 2D culture (MS/2D integrated dataset) [39]. Unsupervised clustering revealed 3 clusters that are highly enriched for MS cells: Cluster 1, Cluster 2, and Cluster 4 (Fig. 3A, B). FEA revealed that the original MS Cluster 1 Signature is enriched in the MS/2D Integrated Cluster 1 (Fig. 3C), which consists primarily of MS cells. In contrast, the MS Cluster 2 Signature was enriched specifically in the MS/2D Integrated Cluster 3, which consists of both MS- and 2D-cultured cells (Fig. 3D and Additional file 1: S. Fig 3). As expected, cells from MS/2D Integrated Cluster 1 were predominantly in G1 cell cycle phase, consistent with quiescence, whereas cells from MS/2D Integrated Cluster 3 were found to be in S/G2M cell cycle phases (Fig. 3E), indicating this proliferative population can be found in both MS- and 2D-cultured cells. Using QPCR, we confirmed that MS Cluster 1 DEGs are in fact elevated in MS compared to 2D-cultured cells, while MS Cluster 2 DEGs were not different between MS and 2D (Fig. 3F). Collectively, these findings suggest that MS Cluster 1 represents a putative BCSC population based on its expression of stem cell-associated gene signatures, quiescence, and enrichment in MS.
ER and NFĸB are active in MS Cluster 1 and required for MS formation
To identify which genes/pathways are active and may promote stem properties in MS Cluster 1, we performed IPA network analysis on DEGs of this cluster (Fig. 4A). We found that top upregulated DEGs form a network with two central nodes, ESR1 and NFĸB complex, suggesting that both are active in MS Cluster 1. To assess this, we examined expression of ER and NFĸB family members, as well as ER and NFĸB activity based on target gene expression and target gene signature enrichment in the original MS Clusters. We found that ESR1 was most highly expressed in MS Clusters 0 and 2, whereas expression of ER target genes and enrichment of ER-associated gene signatures was observed in MS Clusters 1 and 2, suggesting that ER is active in MS Clusters 1 and 2 (Fig. 4B, C and Additional file 2: S. Table 8). NFĸB family members, RELA and REL, are highly expressed in MS Clusters 1 and 0, respectively, while expression of NFĸB target genes and enrichment of NFĸB-associated gene signatures were found in MS Clusters 1 and 3 (Fig. 4D, E and Additional file 2: S. Table 8). These findings confirm the IPA analysis indicating that both ER and NFĸB are active in MS Cluster 1.
To understand the function of ER and/or NFĸB in MS Cluster 1, we utilized the IPA network analysis to identify MS Cluster 1 DEGs that can potentially be regulated by ER or NFĸB independently, or by the two working together, interdependently. For independent mechanism(s), we performed a correlation analysis between Hallmark pathways in MsigDB and found that the HALLMARK_ESTROGEN_RESPONSE_LATE gene signature correlates with several lipid metabolism-associated pathways, whereas HALLMARK_TNFA_VIA_NFKB gene signature correlates primarily with inflammatory and immune response pathways (Fig. 4F and Additional file 2: S. Table 9). These findings correspond with previously published data showing ER and cholesterol pathways are associated with BCSC maintenance [54, 55], while NFĸB promotes stem properties by regulating pro-inflammatory cytokines [56]. As these pathways have been shown previously to be involved in regulation of stem cells, we decided to focus MS Cluster 1 DEGs that can be regulated by both ER and NFĸB working together. We found that GDF15 (growth/differentiation factor-15), a top 10 DEG for MS Cluster 1 (Additional file 2: S. Table 5.2), was predicted to be upregulated by both ER and NFĸB (Fig. 4A insert). GDF15 is a member of the TGFβ signaling pathway and has been shown to play role in promotion of stem-like features in multiple cancers [57,58,59]. In breast cancer cell lines, a recent study by Sasahara et al. showed that GDF15 treatment increases expression of stem-associated genes and mammosphere forming efficiency, which was reversed by a GDF15 blocking antibody [57]. Since GDF15 is a member of the TGFβ superfamily, we performed FEA for a TGFβ pathway gene signature and found it to be enriched in MS Cluster 1 (Fig. 4G). Moreover, we observed a correlation between the expression of the hallmark TGFβ gene signature and both the Hallmark ER and NFĸB gene signatures in MS Cluster 1 cells (Fig. 4H). These findings suggest that ER and NFĸB may promote stem-like features in MS Cluster 1 through regulating of GDF15 and activation of TGFβ signaling.
ER and NFĸB work together to promote stem-like features through GDF15
To validate our bioinformatic findings, we assessed ER and NFĸB activity in bulk MS cells (Additional file 1: S. Fig 4) and in isolated cell populations (Fig. 5, Additional file 1: S. Fig 5). In bulk MCF-7 and T47D MS, we found elevated ER and NFĸB activity based on target gene expression (Additional file 1: S. Fig 4A, 4E), DNA binding activity (Additional file 1: S. Fig 4B, 4F) and/or nuclear localization compared to 2D culture (Additional file 1: S. Fig 4C, 4G). To understand if ER and NFĸB activity are necessary for MS formation, we examined MS forming efficiency (MFE) in the presence of ICI182,780, an ER antagonist, and IKK7, an inhibitor of upstream kinases in the NFĸB pathway, IKKα/β. We found that each inhibitor reduced MFE in a dose-dependent manner (Additional file 1: S. Fig 4D, 4H), but that sublethal doses of ICI and IKK7 substantially decreased MFE compared to either inhibitor alone (Additional file 1: S. Fig 4I, 4J). These findings were further confirmed in 2 new ER+ breast cancer cell lines derived from patient-derived xenografts (PDX), UCD4 and UCD65 (Additional file 1: S. Fig 4K, 4L). These findings suggest that while both pathways are required, they may also work together cooperatively, as we have previously shown on target genes, MS formation, and BCSC genes [33, 60]. We also confirmed that GDF15 is regulated by ER and NFĸB to some extent in the bulk MCF-7 MS (Additional file 1: S. Fig 4M).
While these findings support a role for ER and NFĸB in MS in general, a population-specific approach is needed to uncover the role of ER and NFĸB in MS Cluster 1’s stem-like properties. Thus, in order to isolate MS Cluster 1 cells, we decided to take advantage of their ER and NFĸB activity. We established dual reporter ER+ breast cancer cell lines, where activation of NFĸB is detected by eGFP expression and activity of ER is indicated by mCherry expression. We confirmed that expression of fluorescent proteins corresponds to ER and NFĸB activation by E2 and TNFα, and that this activation is inhibited by ICI and IKK7, respectively (Fig. 5A, B and Additional file 1: S. Fig 5A, 5B). As expected, we found ER and NFĸB to be active in MS, with constant ER activity throughout MS development, while NFĸB activity was low initially but increased over time (Fig. 5C and Additional file 1: S. Fig 5C). In fully developed MS, 4 populations of cells with heterogeneous ER and NFĸB activity were observed: dual-negative (white), dual-positive (yellow), ER-active (red), and NFĸB-active (green) (Fig. 5D and Additional file 1: S. Fig 5D).
We next performed a sorting experiment to examine MS formation efficiency (MFE) and expression of the MS Cluster 1 DEGs (Fig. 5E) for each population. We found that MFE (Fig. 5F, Additional file 1: S. Fig 5E) were higher in two cell populations ER-active and dual-positive. Additionally, both of these populations showed similar abilities to form secondary MS with all 4 progeny: dual-negative, dual-positive, ER-active and NFĸB-active (Fig. 5G, Additional file 1: S. Fig 5F), suggesting that ER-active and dual-positive cell populations are not stable and able to repopulate a heterogenous MS. Along with MFE, expression of the MS Cluster 1 DEGs (Fig. 5H) were higher in the same two cell populations, ER-active and dual-positive. Taken together these findings suggest that MS Cluster 1, does not consist of dual-positive cells exclusively and may be distributed across both the ER-active and the dual-positive cell populations. To test this, we estimated the enrichment of MS Cluster 1 with each cell population based on ER and NFĸB activity. To calculate the enrichment, we first determined activation of ER and NFĸB pathways using z-scores of Hallmark NFĸB and ER gene signatures for each cell individually from MS Cluster 1 as an alternative approach to FEA for the entire cluster (Fig. 5I, J). We next used these signatures’ z-scores to assign each cell to a specific population (dual-negative, dual-positive, ER-active or NFĸB-active). Finally, enrichment for dual-positive, dual-negative, ER-active and NFĸB-active cells in MS Cluster 1 was calculated (Fig. 5K). Indeed, this approach showed that MS Cluster 1 is enriched for cells with high z-scores for both ER and NFĸB signatures and for ER signature alone, confirming the presence of both dual-positive and ER-active cell subpopulations in MS Cluster 1. However, the abundance of the dual-positive cells was significantly higher compared to the ER-active cells in MS Cluster 1 (Fig. 5K). The higher expression of GDF15 compared to other populations (Fig. 5L) suggested its involvement in the regulation of stem properties specifically in the dual-positive cells. To test this, we isolated and treated two cell populations, dual-positive and ER-active, with an antibody that inhibits GDF15 (anti-GDF15). It was found that inhibition of GDF15 reduces MS formation in the dual-positive cell population, but not in the ER-active cell population (Fig. 5M, N). These data suggest that the functional role of GDF15 in promotion of stem properties depends on activity of both ER and NFĸB and is restricted to a specific subpopulation of the MS Cluster 1 cells.
A gene signature derived from MS Cluster 1 is expressed in endocrine resistant and metastatic cell populations and is associated with aggressive disease.
Given the known role of BCSCs in therapy resistance, disease progression and metastasis, we next investigated whether MS Cluster 1 can be a predictor of aggressive ER+ disease. To answer this question, we took several approaches. First, we interrogated two integrated datasets derived from MCF-7 cells that are tolerant [32] or fully resistant to endocrine therapy [40] for expression of the MS Cluster 1 Signature (Fig. 6A, B and Additional file 1: S. Fig 6). We found that the signature was enriched in Integrated Cluster 2, which consists of tolerant and fully resistant cells (Fig. 6B, C), suggesting that the MS Cluster 1 is retained in a subpopulation of endocrine therapy tolerant/resistant cells. Each dataset was also tested individually and showed similar results (Additional file 1: S. Fig 6). We next investigated expression of the MS Cluster 1 Signature in Circulating Tumor Cells (CTCs) using a publicly available dataset of peripheral blood mononuclear cells derived from metastatic breast cancer patients [46]. Unsupervised clustering revealed 16 clusters, one of which represents CTCs (Cluster 13, Fig. 6D). FEA showed specific enrichment of the MS Cluster 1 Signature in the CTC population, suggesting the involvement of MS Cluster 1 cells in metastasis (Fig. 6E and Additional file 2: S. Table 10). To test this, we examined enrichment of the MS Cluster 1 Signature in primary vs. metastatic tumor cell populations. For that purpose, we utilized scRNA-seq datasets from primary and metastatic patient derived xenograft (PDX) tumors, UCD46 and UCD4, with low and high expression of ER, respectively. For the UCD46 PDX model, we identified 2 cell clusters and for UCD4 we identified 6 cell clusters (Fig. 6F, I), with each cluster consisting of a different ratio of primary or metastatic cells (Fig. 6G, J). FEA for the MS Cluster 1 Signature showed enrichment in Cluster 1 of UCD46, and Cluster 1 of UCD4, both of which consist of liver metastases (Fig. 6H, K), suggesting that MS Cluster 1 cells may be more abundant at metastatic sites than in primary tumors.
We next tested whether the MS Cluster 1 Signature enrichment can be detected in scRNA-seq datasets derived from primary ER+ breast cancer tumors [44] (Fig. 7A). We observed the same clustering pattern of cells by tumor identity, as shown by Wu et al. [44] (Fig. 7B, C). FEA showed that the MS Cluster 1 Signature was highly enriched in 3 clusters: Cluster 3, Cluster 4 and Cluster 7, which represent tumors CID4530N, CID4463, CID3948, and CID4461 (Fig. 7D). For other tumors, we found that individual cells demonstrate a high MS Cluster 1 signature score but that these cells did not drive clustering (Fig. 7E), so the overall signature score did not reach significance in any individual cluster. These data imply that cells expressing the MS Cluster 1 signature can be found in primary ER+ tumors to varying degrees, but in some tumors may only be identified at the single-cell resolution. Finally, we questioned whether the MS Cluster 1 Signature may be a predictor of poor outcome for breast cancer patients by interrogating a publicly available METABRIC dataset of ER+ tumors. It was found that tumors expressing the MS Cluster 1 Signature are more likely to be high grade, and of the Luminal B subtype, are associated with an increased risk of relapse, and showed a trend toward lower overall survival (Fig. 7F–I). Moreover, the higher signature score was observed in tumors with high histologic grade, Luminal B subtype and recurred tumors (Fig. 7J–M). Taken together, these findings indicate that the MS Cluster 1 Signature is associated with endocrine therapy resistance and metastases, and is predictive of poor patient outcome, implying that the MS Cluster 1 cell population represents a unique BCSC population that may be more abundant in aggressive ER+ disease.