Cis-acting super-enhancer lncRNAs as biomarkers to early-stage breast cancer

Background Increased breast cancer screening over the past four decades has led to a substantial rise in the diagnosis of ductal carcinoma in situ (DCIS). Although DCIS lesions precede invasive ductal carcinoma (IDC), they do not always transform into cancer. The current standard-of-care for DCIS is an aggressive course of therapy to prevent invasive and metastatic disease resulting in over-diagnosis and over-treatment. Thus, there is a critical need to identify functional determinants of progression of DCIS to IDC to allow discrimination between indolent and aggressive disease. Recent studies show that super-enhancers, in addition to promoting other gene transcription, are themselves transcribed producing super-enhancer associated long noncoding RNAs (SE-lncRNAs). These SE-lncRNAs can interact with their associated enhancer regions in cis and influence activities and expression of neighboring genes. Furthermore, they represent a novel, untapped group of therapeutic targets. Methods With an integrative analysis of enhancer loci with global expression of SE-lncRNAs in the MCF10A progression series, we have identified differentially expressed SE-lncRNAs which can identify mechanisms for DCIS to IDC progression. Furthermore, cross-referencing these SE-lncRNAs with patient samples in the The Cancer Genome Atlas (TCGA) database, we have unveiled 27 clinically relevant SE-lncRNAs that potentially interact with their enhancer to regulate nearby gene expression. To complement SE-lncRNA expression studies, we conducted an unbiased global analysis of super-enhancers that are acquired or lost in progression. Results Here we designate SE-lncRNAs RP11-379F4.4 and RP11-465B22.8 as potential markers of progression of DCIS to IDC through regulation of the expression of their neighboring genes (RARRES1 and miR-200b, respectively). Moreover, we classified 403 super-enhancer regions in MCF10A normal cells, 627 in AT1, 1053 in DCIS, and 320 in CA1 cells. Comparison analysis of acquired/lost super-enhancer regions with super-enhancer regions classified in 47 ER positive patients, 10 triple negative breast cancer (TNBC) patients, and 11 TNBC cell lines reveal critically acquired pathways including STAT signaling and NF-kB signaling. In contrast, protein folding, and local estrogen production are identified as major pathways lost in progression. Conclusion Collectively, these analyses identify differentially expressed SE-lncRNAs and acquired/lost super-enhancers in progression of breast cancer important for promoting DCIS lesions to IDC. Supplementary Information The online version contains supplementary material available at 10.1186/s13058-021-01479-8.

it the second most frequently diagnosed cancer behind skin cancer [1]. In 2021, an estimated 281,550 cases of invasive breast carcinoma are predicted to be diagnosed, and over 40,000 deaths are expected, accounting for almost 7% of all cancer mortality each year. Ductal carcinoma in situ (DCIS) is the presence of abnormal cells inside a milk duct in the breast and is a precursor to invasive cancer. DCIS accounts for 20% of breast cancer diagnoses per year [2], however, while not all DCIS lesions progress to invasive cancer, all are treated as such leading to overdiagnosis and overtreatment. In fact, DCIS lesions sometimes grow so slowly that even without treatment it would not affect a woman's health. Long-term studies have found that only 40% of women with untreated DCIS are ultimately diagnosed with invasive breast cancer [3]. The steep increase in diagnosis of DCIS over the past 30-40 years is believed to be a result of more frequent mammography [4]. However, because over half of these in situ lesions will not progress to invasive breast cancer, controversies have arisen about approaches to treatment.
As early screening is advocated and on the rise, better understanding of the progression of non-invasive to invasive breast cancer is a prerequisite for correct diagnosis of patients. There is a need to highlight functional determinants DCIS progression to invasive ductal carcinoma (IDC) thus allowing discrimination between indolent and potentially metastatic breast cancers. Understanding the mechanisms of transition of normal breast to invasive breast cancer can have significant implications for preventive and clinical management of breast cancer.
Transcriptome reprogramming is one of the crucial characteristics of cancer, where aberrant gene expression promotes tumor initiation, progression, and metastasis. This can be amplified by cis-element changes in noncoding genomic regions [5]. For example, super-enhancers (SEs), also known as stretch enhancers, are genomic regions where multiple enhancers are clustered together. They exert more potent effects than typical enhancers, are characterized by high levels of Mediator binding, and are associated primarily with tissue-specific genes [6]. Super-enhancers are most likely the major contributors to the expression of their associated genes [7]. There have also been several indications of links between superenhancers and diseases [8]. Recent studies have shown that SEs play key roles in determining cell identity in both healthy and pathological states. Over 25,000 enhancers were identified as differentially activated in renal, breast, and prostate tumor cells, as compared with normal cells. This suggests a network between malignancy and enhancer activity [9]. Likewise, cancer cells have been shown to acquire super-enhancers at oncogenes and cancerous phenotype relies on the abnormal transcription propelled by Ses [10]. Additionally, super-enhancer regions are transcribed generating long non-coding RNAs (lncRNAs) that may play a pivotal role in assisting the super-enhancer function [11].
Long non-coding RNAs are a large class of non-coding transcripts that are > 200 nucleotides in length and do not encode proteins [12]. Evidence has indicated that lncRNAs regulate gene expression at the levels of epigenetic modification, transcription, translation, and posttranslation [13]. At the transcription level, these RNAs can be associated with super-enhancer regions, and interact with enhancer sequences to influence activities of neighboring genes. A study published in 2011 shows that enhancer associated lncRNA (e-lncRNA) HOTTIP, which resides at the 5′ tip of the HOXA locus, regulates the transcription of various HOXA genes in vivo through chromosomal looping of its enhancer to the promoter region of these genes [14]. In the same fashion, Sigova et al. observe that nascent e-lncRNAs are necessary for the recruitment of Yin-Yang-1 transcription factor to its target enhancer [15]. As a result, e-lncRNAs have been indicated to play a vital role in engaging with transcription factors and localizing them to cognate enhancers. Enhancer lncRNAs are also involved in diverse tumor biological processes, including cell proliferation, apoptosis, invasion, metastasis, and angiogenesis as well as interacting with their enhancers to regulate genes specific to cell identity [16].
Previously, our laboratory has profiled global lncRNA expression in a unique patient-based model of breast cancer progression, wherein early DCIS lesions are directly contiguous with an IDC lesion. From this unbiased patient-based model, 132 lncRNAs were identified as differentially expressed in early breast cancer progression [17], of which 78 were transcribed from superenhancer regions. This statistically significant enrichment in enhancer associated lncRNAs suggests a core mechanism of breast cancer progression. In this study, we identify super-enhancer associated lncRNAs (SE-lncRNAs) that are differentially expressed between non-invasive and invasive breast cancer in the MCF10A progression series as well as DCIS and IDC patient samples. We also designate two of the most promising SE-lncRNAs from our list for their potential cis-acting capabilities in regulating nearby gene expression crucial for progression to IDC. Furthermore, we highlight super-enhancers that are acquired or lost in progression to IDC, giving insight about genes and pathways these super-enhancers regulate which may be necessary for progression.

Microarray
Arraystar Super-enhancer lncRNA arrays were used to systematically profile 7753 lncRNAs transcribed from super-enhancer (SE) regions along with 7040 corresponding SE-regulated protein coding genes. Briefly, an optimized mixture of oligo(dT) and random primers, each containing a T7 polymerase promoter, is annealed to the RNA. The cDNA is synthesized by reverse transcription followed by 5' adapter annealing and PCR amplification. Finally, cyanine 3-or cyanine 5-labeled cRNA is synthesized by in vitro transcription from the T7 promoter by T7 RNA polymerase.

Gene expression analysis
Gene expression was confirmed for the 27 potentially cis-acting SE-lncRNAs and 4 highest differentiated SE-lncRNAs. Forty-eight hours after seeding, total RNA was collected using Omega Bio-Tek E.Z.N.A. Total RNA Kit (Catalog No. R6-834-02). RNA was subsequently reversetranscribed using Applied Biosystems High-Capacity cDNA Reverse Transcription Kit (Life Technologies Catalog No. 4368813) and analyzed by quantitative reverse transcription PCR (qRT-PCR) with Sybr Green on the QuantStudio 12 K Flex (Thermo Fisher Scientific) for indicated genes. The ΔΔCt method was used to determine gene expression fold change with 18 s, Act-b, and GAPDH used for controls.

Cellular localization
Whole cell lysate, cytoplasmic fraction, and nuclear fraction were extracted from MCF10A (10A) and MCF10A-CA1 (CA1) cells using Protein and RNA Isolation System (PARIS) Kit (Life technologies Catalog No. AM1921). For whole cell lysate and for each cytoplasmic and nuclear fraction from 10A and CA1 cells, 87.5% of the sample was used for RNA extraction, while 12.5% was used for an immunoblot to ensure the fractionation was done correctly. The 7:1 ratio was used to maximize RNA content as lncRNAs are relatively lowly expressed. GAPDH (Cell Signaling 14C10) was utilized as control for cytoplasmic fraction, while Tri-Methyl Histone H3 (Lys 27) (Cell Signaling C36B11) was used as control for nuclear fraction for the immunoblot.
RNA was isolated from each fraction using Omega Bio-Tek E.Z.N.A. Total RNA Kit (Catalog No. R6-834-02). Reverse transcription was performed with the Applied Biosystems High-Capacity cDNA Reverse Transcription Kit (Life Technologies Catalog No. 4368813) on the extracted RNA from each compartment as well as the whole cell lysate and RT-qPCR were carried out to analyze the localization of these SE-lncRNAs within the 10A and CA1 cells as well as fold change. 18 s, Actin-B, HOTAIR were used as controls for localization. 18 s and Actin-B were also used as controls for fold change to validate the change in expression we saw in our microarray data.

Chromatin immunoprecipitation (ChIP)
MCF10A, MCF10A-AT1, DCIS.com, and MCF10A-CA1 cells were grown to a final count of 5 × 10 6 . Cells were chemically crosslinked by the addition of 1 mL of fresh 10% formaldehyde solution for 10 min at room temperature on a rocker. After 10 min 1/10 volume of 1.25 M Glycine was added to quench unreacted formaldehyde and incubated for 5 min on rocker. Cells were pelleted at 1000 g for 5 min, washed twice with 1 × PBS, flash frozen in liquid nitrogen, and stored at − 80 °C prior to use. Cells were resuspended, lysed in lysis buffer (50 mM HEPES, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Sodium deoxycholate, 0.1% SDS), and sonicated to solubilize and shear crosslinked DNA. Sonication conditions vary depending on cells, but cells were sonicated using a Diagenode Bioruptor Sonicator and sonicated at power 7 for 13 × 30 s pulses (30 s pause between pulses) at 4 °C while samples were immersed in an ice bath. The sonicated cells were centrifuged for 10 min at 8000 g at 4 °C and the supernatant collected to proceed with immunoprecipitation. The resulting whole-cell extract volume was divided into two, one for IgG and the other for H3K27ac targeting. Default parameters were used with q-value of 0.05 except to ensure broad regions were identified; broad peak calling was added with a broad cutoff of 0.05. Reads were normalized to mapped reads.

Super-enhancer identification
Super-enhancers were defined by stitching peaks using Rank Order Super-Enhancer (ROSE) (version python 2.7.3) with default parameters, except TSS exclusion zone size was adjusted to 250 bp. TSS exclusion was used because the H3K27ac signal is enriched for both active enhancers and promoters. However, several highthroughput reporter studies in mammals assessing either selected genomic regions (e.g., open chromatin regions or transcription factor binding sites) or human whole genomes have also found a substantial proportion of enhancers overlapping TSS-proximal regions [18,19]. Signals for super-enhancers identified in each cell line were quantified in progression. Briefly, super-enhancer bed files for each cell line were collapsed into AllSEs.bed using bedtools merge. Coverage of AllSEs.bed was quantified using bam files for each cell line in the progression series using bedtools intersect. Reads were normalized to the sequencing depth, and fold-change was calculated. H3K27ac ChIP seq data for ER + patients was obtained from European Nucleotide Archive under project no. PRJEB22757 and for TNBC patient samples and TNBC cell lines under project no. PRJEB33558.

Acquired/lost super-enhancer identification
Acquired super-enhancer regions were classified at the AT1 stage in progression by comparing super-enhancers ranked in AT1 cells with super-enhancers ranked in MCF10A cells and only keeping those that were ranked as super-enhancers in AT1 but not in MCF10A cells using bedtools intersect. Similarly, super-enhancers acquired at the DCIS stage were those that were ranked in DCIS cells but were not ranked in MCF10A and AT1 cells. Lastly, super-enhancers acquired at the CA1 stage were only ranked in CA1 cells but not in MCF10A, AT1, and DCIS cells. Lost super-enhancers were identified by comparing super-enhancers ranked in AT1, DCIS, and CA1 cells with super-enhancers in MCF10A cells and observing those that were not present in the corresponding cells but were in the normal MCF10A cells. Genes within 50 kb of the acquired/lost super-enhancers were classified using bedtools closest function. Gene Ontology (GO) analysis of these genes was performed using ENRICHR.

Statistical analysis
All experiments were run in triplicate, except ChIP seq, which were done in duplicates. Data are represented as the mean ± standard deviation (mean ± SD). All statistical analyses were carried out using GraphPad Prism 9 Software (La Jolla, CA, USA). Statistical significance for SE-lncRNAs and mRNAs expression in our progression series were characterized by one-way ANOVA with Tukey correction. All significant and non-significant results are shown in Additional file: 7 Table S4. Expression of 14 target SE-lncRNAs in FFPE DCIS and IDC patient samples was analyzed by unpaired Student's t-test and p-values are listed below in Fig. 4d and e for our two promising targets, while rest are shown in Additional file 2: Figure S2. Paired Student's t-test was used to analyze knockdown of the two promising SE-lncRNAs as well as the corresponding expression of their associated mRNA, p < 0.05 (compared with the control antisense oligonucleotide) was considered significant and is marked with an asterisk in the figures.

Global analysis of SE-lncRNAs acquired in IDC progression
Underlying mechanisms that support breast cancer progression have been well studied. However, clear functional determinants segregating non-invasive from invasive tumors have yet to be defined. Acquired lncR-NAs transcribed from super-enhancer loci can lead to discovering markers of progression, improving breast cancer diagnostics and treatment for patients. The MCF10A progression series mimics progression of breast cancer originating within the epithelial cells of the mammary ducts. The progression series was originally generated from MCF10A (10A) cells, a spontaneously immortalized mammary epithelial cell line derived from benign breast tissue from a woman with fibrocystic disease [20]. MCF10A cells were transformed with oncogenic HRAS to generate MCF10AT1 pre-malignant cells that form atypical ductal hyperplasia in mice [20]. MCF10DCIS.com (DCIS or DCIS.com) cells were derived from MCF10AT1 xenograft model and form predominantly comedo DCIS when injected into mice [21]. MCF10CA1 (CA1) cells, derived from MCF10AT1, form poorly differentiated malignant tumors in xenograft models [22]. Arraystar has developed a platform to comprehensively study super-enhancer lncRNAs and their downstream targets [23]. Taking advantage of this commercially available approach, we assessed SE-lncRNA expression within the MCF10A progression series. This analysis interrogated 7753 SE-lncRNAs (Fig. 1a), as well as 7040 associated mRNAs [24]. We found that super-enhancer associated lncRNAs are dynamically expressed during breast cancer progression. There are patterns of both SE-lncRNAs being acquired and lost as well as several SE-lncRNAs that exhibit a stepwise gain or loss in expression in progression (Additional file 4: Table S1). While the heat map from our analysis (Fig. 1a) demonstrates dynamic expression changes, the majority of these lncRNAs are not functionally defined.
Enhancer elements may become engaged either distally or locally to impact target gene expression. Transacting lncRNAs are transcribed, processed, and then vacate their sites of transcription to exert their function elsewhere, akin to mRNAs. Their final destination, be it in the cytoplasm or nucleus, does not depend on their transcription site [25]. By contrast, cis-acting lncRNAs are those whose activity is based at and dependent on the loci from which they are transcribed. Transcripts with the potential of acting in cis likely make up a substantial portion of known lncRNAs: the majority of lncRNAs are enriched in the chromatin fraction, and specifically are tethered to chromatin, presumably at their sites of transcription, through Pol II [25,26]. If functional, this would indicate that the effects of these lncRNAs are centered at these loci. In addition, the fairly low levels at which lncR-NAs are generally expressed, oftentimes just a few molecules per cell, naturally favor a cis mechanism of action, as diffusion or transport to other cellular compartments would render these transcripts too diluted to mediate a plausible function [27]. Identifying distally engaged enhancer elements is difficult since within the threedimensional structure of the nucleus there are a myriad of possible locations they can interact.
To classify super-enhancers that may become locally engaged and regulate cancer progression, we applied filters to identify potential cis-acting SE-lncRNAs (Fig. 1b). First, from the 7753 SE-lncRNAs screened (Fig. 1b), we sorted and extracted SE-lncRNAs that are up or down regulated from normal 10A cells to invasive CA1 cells (fold change cut-off: ± 2). Furthermore, from these SE-lncRNAs that met our cutoff, we identified those which have neighboring genes within 50 kilo-bases upstream or downstream of the lncRNA that also demonstrate changing expression. The 50 kb window upstream and downstream were assigned as enhancers tend to loop to and associate with adjacent genes in order to activate their transcription [28] and primarily these interactions occur within a distance of ∼ 50 kb of the enhancer locus [29]. This allowed us to identify 138 SE-lncRNAs (Additional file 5: Table S2). Gene Ontology (GO) assessment on the list of mRNAs was performed to provide insight into what pathways the SE-lncRNAs might be regulating (Fig. 1c). Many pathways that appear in our GO analysis, such as focal adhesion, p53 signaling pathway, and JAK-STAT signaling pathway, constitute a major group of related signaling pathways that control proliferation, survival, angiogenesis, and metastasis of breast cancer, suggesting these SE-lncRNAs could be regulating canonical cancer promoting genes.

Identifying clinically relevant potentially cis-acting SE-lncRNAs
To give clinical relevance to our data, these 138 SE-lncR-NAs were cross-referenced with The Atlas of Non-coding RNAs In Cancer (TANRIC), which compiles patient data from The Cancer Genome Atlas (TCGA), and data from Cancer Cell Line Encyclopedia (CCLE) [30]. From our list of 138 SE-lncRNAs that are associated with mRNAs within the MCF10A series, we identified 27 SE-lncRNAs that are annotated in patient samples within TANRIC (Fig. 2) (Additional file 6: Table S3). If a SE-lncRNA is cis-acting and impacting enhancer activity or associated mRNA expression, then a change in SE-lncRNA expression should be accompanied by a change in target mRNA expression. Therefore, we next identified mRNAs whose expression levels correlated in patient samples within all subtypes of breast cancer for each of these 27 SE-lncR-NAs (Pearson correlation coefficient ≥ 0.5 or ≤ − 0.5) (Additional file 6: Table S3). This analysis identified 11 SE-lncRNAs as potentially cis-acting (Fig. 3a). The remaining 16 SE-lncRNAs were further filtered out to focus on those whose neighboring mRNAs expression showed similar trends from normal to tumorous cells in patient samples and our progression series (Fig. 3b) (Additional file 6: Table S3). Lastly, we took the most differentiated SE-lncRNAs (Fig. 1b) within the progression series from our array (fold change cut-off: ± 10) and highlighted 4 that were annotated by TANRIC (Fig. 3c). One-way ANOVA with Tukey correction was performed to gauge statistical significance in expression changes in progression for all SE-lncRNAs presented in Fig. 3 and are reported in Additional file 7: Table S4. From our approach to identify potentially cis-acting SE-lncRNAs, we narrowed our list from 7753 to 31, with 27 promising targets as well as 4 SE-lncRNAs selected for follow up due to high differential expression within disease progression.

Characterization of potential cis-acting SE-lncRNAs
Our approach to identify cis-acting SE-lncRNAs in progression provided 27 potential targets, 11 of which are highly promising, as well as 4 SE-lncRNAs with the highest differential expression within disease progression. Cis-acting SE-lncRNAs are expected to be localized within the nucleus; thus, to focus and narrow our list of potentially cis-acting SE-lncRNAs that are active in progression of DCIS to IDC, we investigated their sub cellular location. Whole cell lysate, as well as separated cytoplasmic fraction and nuclear fraction were extracted from the 10A and CA1 cells. These cell lines were chosen to determine if localization changed during progression. For all samples, 87.5% of the sample was used for RNA extraction, while 12.5% was used for an immunoblot to ensure the fractionation was done correctly (Fig. 4a). The 7:1 ratio was used to maximize RNA content as lncRNAs are relatively lowly expressed. From the 31 SE-lncRNAs, 14 were primarily localized within the nucleus (Fig. 4b,  c). The remaining 17 were either localized within the cytoplasm or were expressed at low levels that localization could not be determined (Additional file 1: Figure  S1).
Next, we utilized 24 DCIS and 24 IDC Formalin-Fixed Paraffin-Embedded (FFPE) patient samples of varying subtypes to assess the expression levels of the 14 SE-lncRNAs. Of the 14 SE-lncRNAs we tested in patient samples, RP11-379F4.4 and RP11-465B22.8 stood out as their increase in expression from DCIS to IDC was statistically significant and the expression levels matched our MCF10A model (Fig. 4d, e). The remaining targets either did not show significant results (Additional file 2: Figure S2) or expression was not determined. Albeit RP11-465B22.8 was not identified with an associated mRNA, the fact that it was the most up-regulated in progression prompted us to scan 50 kilobases upstream and downstream of the gene for potential target genes it might regulate. We discovered that this SE-lncRNA neighbors the miR-200 family of genes of which miR-200b is well studied to have a role as a tumor-suppressor [31] and, in other cases, a tumor promoter [32]. Having narrowed our list to one potential cis-acting SE-lncRNA and one SE-lncRNA that was highest differentiated in our model, we performed knockdown of the two targets and observed the expression of their associated gene in DCIS and CA1 cells (Fig. 4f ). A two-fold increase was seen in the expression of RARRES1 48 h after knockdown of RP11-379F4.4 in DCIS and CA1 cells. Conversely, about a two-fold decrease in expression was observed for miR-200b 48 h post knockdown of RP11-465B22.8. These results illustrate that RP11-379F4.4 and RP11-465B22.8 are involved in regulating the expression of their neighboring gene. Importantly, this result shows that the lncRNA transcripts themselves are involved in regulating their neighboring gene and not the act of transcription of the SE-lncRNA itself. We have discovered several SE-lncRNAs that exhibit dynamic expression in  Top two are upregulated SE-lncRNAs, while bottom two are down regulated. One-way ANOVA with Tukey correction was carried out to evaluate statistical significance of gene expression between cell lines, n = 3, * = P < 0.05, error bars represent standard deviation. Full statistical analysis is presented in Additional file 6: Table S3 progression, but more importantly, our approach has identified 2 of the most promising, potentially cis-acting, target SE-lncRNAs (Fig. 4d, e) that, by regulating nearby gene expression, could be crucial in the progression of indolent DCIS to IDC.

Classification of super-enhancers in breast cancer progression
To complement our SE-lncRNA data, we identified super-enhancers in the MCF10A progression series. Enhancers are critical signaling elements regardless of their association with SE-lncRNAs, therefore we performed H3K27ac chromatin immunoprecipitation (ChIP) to identify global enhancer activity (Additional file 3: Figure S3). This analysis enabled us to identify superenhancers at each stage of progression as well as active elements that may be predictive of progression. H3K27ac has previously been demonstrated to identify 2-3 times more enhancer regions of interest than Med1 alone [33]. Super-enhancer regions were identified using the Rank Order Super-Enhancer (ROSE) algorithm [34] (Fig. 5a). Our analysis determined 403 super-enhancer regions in 10A, 627 in AT1, 1053 in DCIS, and 320 in CA1 cells (Additional file 8: Table S5). Interestingly, a stepwise increase in the number of super-enhancer regions was seen from MCF10As to DCIS cells, however in invasive CA1 cells the number of super-enhancers regions classified decreased. Although the number of classified superenhancers decreased in CA1 cells, the H3K signal is, on average, threefold more. In other words, even though the number of genomic regions that meet the threshold to be ranked as super-enhancers decrease from 10A to CA1 cells, a higher H3K signal is seen in CA1 cells indicating a higher expression of those genomic regions.
Despite being classified as a super-enhancer in one cell line, a region may not be classified as a super-enhancer in another because ROSE compares signal intensity within cell lines and not between cell lines to rank superenhancers. Consequently, we analyzed H3K27ac signal intensity of super-enhancer regions identified in each cell line in the MCF10A progression series and observed their signal in the corresponding cell lines (Fig. 5b). Many of the super-enhancers classified in MCF10As lose signal intensity in AT1, DCIS, and CA1 cells indicating a loss of genomic expression within these regions. Similar trends of loss and gain of signal of the super-enhancers classified in AT1 and DCIS cells are seen (Fig. 5b). In conjunction with the higher H3K27ac signal observed for all CA1 identified super-enhancers, most of these regions are upregulated for H3K27ac signal (Fig. 5b). Correspondingly, we scanned the ranked super-enhancer regions in the progression series for our 2 identified SE-lncRNAs (Fig. 4d, e), RP11-379F4.4 and RP11-465B22.8. H3K27ac signal intensity was observed to increase in a stepwise manner for both targets (Fig. 5c), demonstrating that these genomic regions increase in accessibility in progression and contribute to a higher expression of their neighboring genes. Furthermore, RP11-379F4.4 is ranked as a super-enhancer in DCIS (#8) as well as CA1 (#92) cells (Additional file 8: Table S5). Though the region containing RP11-465B22.8 was not classified as a super-enhancer in our progression, it still exhibits higher H3K27ac occupancy in DCIS and CA1 cells when compared to normal. Specifically, the region upstream of RP11-465B22.8 and the miR-200 family is higher in H3K27ac signal intensity. Super-enhancer regions recruit transcription machinery and tend to loop to promoter regions of genes they regulate [35]. Hence, that region (and its H3K27ac occupancy) is consistent with the profile of an active enhancer involved in regulation of the expression of miR-200 family of genes with the help of RP11-465B22.8. These changes in the enhancer activation of DNA within progression are crucial to helping understanding progression of normal tissue to cancerous.

Acquired/lost super-enhancers in breast cancer progression
We examined our super-enhancer list to unravel newly acquired super-enhancers at each stage in progression in addition to super-enhancers lost at each stage relative to MCF10A cells (Fig. 6). 383 super-enhancers were newly acquired at the AT1 stage, 684 were newly acquired in DCIS, while only 28 were newly acquired at the CA1 stage (Additional file 8: Table S5). Consistent with previously established trends, most of the newly acquired super-enhancers were classified in AT1 and DCIS cells. Gene ontology assessment on the closest genes to these regions reveal many acquired pathways including STAT signaling in AT1 and NF-kB signaling in DCIS, which are putative pathways known for promoting proliferation and tumorigenesis (Fig. 6). Conversely, 173, 120, and 259 super-enhancers were lost at AT1, DCIS, and CA1 stages, respectively, (Additional file 8: Table S5). Similarly, a gene ontology assessment on genes (50 kb up or downstream (100 kb total) as enhancers tend to regulate genes within approximately a 50 kb distance upstream and downstream of its locus [29]) near lost super-enhancers reveal protein folding and local estrogen production as major pathways lost in progression (Fig. 6).

Comparison of acquired/lost super-enhancers in the MCF10A model with super-enhancers classified in ER + and triple negative breast cancer patients
To couple our MCF10A series findings with patient data, we identified super-enhancers in 47 estrogen receptor positive (ER +) patient samples [36], 10 triple-negative breast cancer (TNBC) patient samples, and 11 triple-negative breast cancer cell lines (TNBCC) [37] (Additional file 9: Table S6). Comparison analysis of acquired superenhancer regions at each stage in progression was performed with super-enhancer regions in patient samples and TNBCC (Fig. 7a) (Additional file 9: Table S6). Superenhancers acquired/lost in progression that were most represented or absent in ER + and TNBC patients are listed with respect to their nearest gene (Fig. 7a). We highlight super-enhancer regions and their neighboring genes that were well represented in patient samples, TNBCC, and our progression series (Fig. 7b). Ephrin type-A receptor 2 (EphA2) region was classified as a super-enhancer at the DCIS stage in progression as well as being classified as a super-enhancer region in 34/47 ER + patients and 10/10 TNBC patients. Similarly, the region containing Cadherin 23 (CDH23) which was acquired at the CA1 stage in progression as a super-enhancer, was classified as a super-enhancer in 6/47 ER + patients and 10/10 TNBC patients. Interestingly, Transcription Factor AP-2 Alpha (TFAP2A) region was not ranked as a super-enhancer in any of the ER + patients while it was in all TNBC patients. Sumoylation of TFAP2A has been shown to block its ability to induce the expression of luminal genes and maintain a basal/triple-negative cancer subtype [38]. At the same time, genomic regions containing Glutaredoxin 2 (GLRX2), Laminin Subunit Alpha 2 (LAMA2), and Growth Arrest Specific 5 (GAS5), which are lost at the AT1 and DCIS stages, respectively, were not ranked as super-enhancers in any of the patients or TNBC cell lines. GAS5 is a well-studied down regulated lncRNA in breast cancer while GLRX2 is a protein that localizes to the mitochondria where it functions in mitochondrial redox homeostasis and is important for the protection against and recovery from oxidative stress [39]. Considering metabolic reprogramming is one of the pathways acquired (Fig. 6), epigenetic changes within the GLRX2 region could play a crucial role in progression. Super-enhancers are central to driving expression of genes controlling cell identity and stimulating oncogenic Here we have highlighted super-enhancers acquired/ lost in progression and cross-referencing these regions with super-enhancers in patient samples, we unravel epigenetic changes driving cell identity and progression. Lastly, this validates results from our model with patient samples and current literature demonstrating a robust discovery platform.

Discussion
Currently, functional determinants of DCIS progression to an invasive lesion are unknown [17]. This study profiles global SE-lncRNA expression in the MCF10A progression series giving insight into numerous SE-lncRNAs that are differentially expressed in progression. Furthermore, these SE-lncRNAs can play essential roles in transcriptional regulation through controlling SEs activity to regulate a broad range of physiological and pathological processes, especially tumorigenesis. Equally, SE-lncRNAs can regulate gene expression by affecting gene promoter activity. Although SE-lncRNAs significantly contribute to gene expression, the systematic identification of SE-lncR-NAs and their regulated genes still lacks comprehensive recognition [40] [41]. Hence, this study also elucidates the expression levels of mRNAs associated with those SE-lncRNAs. In addition, using stringent and comprehensive set of filters that combined Cancer Cell line data with patient data, we have highlighted 27 potentially cis-acting SE-lncRNAs and their target mRNAs coupled with 4 SE-lncRNAs that are highest differentiated in disease progression. From this list we have identified RP11-379F4.4 (AC080013.1) as a promising cis-acting SE-lncRNA to its target gene Retinoic acid receptor responder element 1 (RARRES1). Interestingly, RARRES1 functions as an invasion suppressor. This function of the gene was confirmed in metastatic prostate cancer (CaP) cell line (PC3M) by Oldridge et al. [42]. Likewise, RARRES1 is able to increase Sirtuin 1, while it decreases the mechanistic target of rapamycin (mTOR), two important regulators of energy homeostasis. RARRES1 is differentially expressed in metabolic diseases and is associated with biological hallmarks that require metabolic reprogramming. Metabolic reprogramming is now considered a hallmark of cancer etiology [43]. Although RARRES1 is among the most commonly methylated genes in multiple cancers, it is increased in basal-like hormone receptor negative breast cancer and in liver cirrhosis, a risk factor for hepatocellular cancer [44].
Similarly, we also have highlighted RP11-465B22.8 (AL390712.1) as the most differentiated SE-lncRNA in progression. Comparatively, miR-200b, neighboring RP11-465B22.8, is part of the well-known tumor suppressor miR200 family. The family of miR-200 includes five members: miR-200a, miR-200b, miR-200c, miR-429, and miR-141. miR-200b, which acts as an antioncogene, participates in the proliferation and metastasis inhibition of different kinds of cancers by downregulating target molecules. For instance, miR-200b inhibition promotes Rac1 activation and increases the metastatic potential of HBEC cells [45]. miR-200b can repress angiogenesis by targeting angiogenic factors and receptors [46]. It can inhibit the epithelial to mesenchymal transition (EMT) by inactivating transcription factors in breast cancer. miR-200b is associated with the estrogen receptor status of breast cancer cells [46,47]. Zheng et al. also highlight Fucosyltransferase IV (FUT4) could apply as a novel target for miR-200b that suppresses the proliferation and metastasis of breast cancer cells by reducing α1,3-fucosylation and LeY biosynthesis of glycoproteins [31]. Thus, SE-lncRNAs RP11-379F4.4 and RP11-465B22.8 and their respective potential targets are promising candidates for their cis-acting capabilities leading to progression. Further studies will have to be conducted to assess the mechanism of cis action for the SE-lncRNAs and how they promote DCIS lesions toward IDC.
In the past decade, increasing evidence has revealed that super-enhancers play a vital role in tumorigenesis and there is great interest in developing super-enhancer therapeutics, thus, this study also elucidates superenhancers that are acquired or lost in progression [6]. We profiled H3K27ac using ChIP in the MCF10A progression series. As enhancers and super-enhancers play an important role in driving cell identity, the alteration in intensity of the H3K27ac in progression paints a picture of epigenetic changes which could be leading to the hijacking of genes involved in various hallmarks of cancer.
Correspondingly, we uncover newly acquired superenhancers at each stage in progression in addition to super-enhancers lost at each stage relative to MCF10A cells. This analysis enabled us to highlight and distinguish loci that are activated/suppressed. Importantly, we identified 28 super-enhancers that are acquired from DCIS to CA1 transition and over 100 super-enhancers that are lost from DCIS to CA1. Furthermore, we classify genes neighboring these acquired/lost regions which identify pathways that contribute to progression. For example, STAT signaling is acquired in AT1 transition from normal cells, while NF-kB signaling is acquired in the transition to DCIS. Similarly, protein folding and local estrogen production pathways are lost overall in progression in addition to regions responsible for cell cycle regulation. These are canonical cancer pathways leading to proliferation and metastasis. Here we show how super-enhancers being activated/suppressed neighboring genes involved in these pathways play a pivotal role in their misregulation.
Comparison of acquired/lost super-enhancer regions with super-enhancer regions classified in 47 ER + patients, 10 TNBC patients, and 11 TNBC cell lines provides clinical relevance. This comprehensive analysis reveals epigenetic changes at the genome wide level in breast tumors. For example, Ephrin type-A receptor 2 (EphA2), is a receptor tyrosine kinase which binds ephrin-A family ligands residing on adjacent cells, leading to contact-dependent bidirectional signaling into neighboring cells. This gene has been known to regulate migration, integrin-mediated adhesion, proliferation, and differentiation of cells through DSG1/desmoglein-1 and inhibition of the ERK1/ERK2 signaling pathway [48]. EphA2 has been implicated in breast tumors and resistance of tumors to targeted therapies [48][49][50]. Targeting EphA2 has been shown to inhibit cell cycle progression and proliferation [48]. The genomic region containing EphA2 is an acquired super-enhancer at the DCIS stage in our progression model. Interestingly, it is also classified as a super-enhancer region in 34/47 ER + patient samples and 10/10 TNBC patients in our analysis. This explains the observed overexpression of this gene in breast tumors and can provide new targeting methods. Identically, the region containing GAS5 lncRNA is classified as a superenhancer on normal MCF10A cells, however, is lost at the AT1, DCIS, and CA1 stages. Neither is it classified as a super-enhancer in all 47 ER + and 9/10 TNBC patients implying a mechanism of the down-regulation seen of this lncRNA in breast tumors. GAS5 can bind to the DNA binding domain of the glucocorticoid receptor inactivating it and subsequently inhibiting the regulation of its target genes [51]. In addition, GAS5 can regulate the transcriptional activity of other receptors, such as androgen and progesterone and has been suggested as a potential tumor suppressor due to its pro-apoptotic function [52]. The epigenetic changes occurring within this region can explain the down-regulation of this lncRNA and reduce its effects in carrying out normal processes leading to tumorigenesis.
Lastly, we also analyzed the regions that contained our two identified targets, RP11-379F4.4 and RP11-465B22.8, to see if they were classified as super-enhancers in progression. RP11-379F4.4 was ranked as a super-enhancer in DCIS as well as CA1 cells and is seen to be acquired during progression. RP11-465B22.8 was not ranked as a super-enhancer by the ROSE algorithm in any of the cells, however, the signal intensity of H3K27ac saw a dramatic stepwise increase from normal to CA1 cells. This result verifies the enhanced transcription that is observed of these SE-lncRNAs and hints at a possible role that they may play to induce progression. We also examined if RP11-379F4.4, which was an acquired super-enhancer at the DCIS stage in progression, was classified as a superenhancer in any of the patient samples. Although RP11-379F4.4 was not classified as a super-enhancer in patient samples, its classification as an enhancer represents open DNA, and most likely, the cause of its higher expression seen in CA1 cells and patient tumors. Super-enhancers associate with key oncogenes in cancers and drive expression of genes that define cell identity. Additionally, cancer cells can acquire super-enhancers at oncogenes while losing super-enhancers at tumor suppressing genes. Understanding the alterations in the genomic landscape within breast tumors will uncover underlying biology that can be useful in diagnostic and targeted therapies.
The data presented here highlight several SE-lncRNAs that through their potential cis-acting abilities play an important role in progression of DCIS lesions into invasive IDC. Furthermore, we have identified two promising target SE-lncRNAs that may drive cancer progression through the regulation of their neighboring gene. Future studies will expand on understanding their potential cisacting functions. Equally, we reveal acquired/lost superenhancers in progression coupled with patient data that can help elucidate the epigenetic alterations promoting cancerous phenotypes. While there may not be one key protein that determines DCIS progression, understanding the networks of signaling pathways that change during progression unravel critical changes that push a DCIS lesion to be invasive.

Conclusion
Altogether, this comprehensive study of breast cancer cell lines coupled with patient samples provides a unique platform that identifies differentially expressed SE-lncR-NAs and acquired/lost super-enhancers in progression of breast cancer important for promoting DCIS lesions to IDC.