
Association of early menarche with breast tumor molecular features and recurrence

Abstract

Background

Early menarche is an established risk factor for breast cancer, but its molecular contribution to tumor biology and prognosis remains unclear.

Methods

We profiled transcriptome-wide gene expression in breast tumors (N = 846) and tumor-adjacent normal tissues (N = 666) from women in the Nurses’ Health Studies (NHS) to investigate whether early menarche (age < 12 years) is associated with tumor molecular and prognostic features in women with breast cancer. Multivariable linear regression and pathway analyses using competitive gene set enrichment analysis were conducted in both tumor and adjacent-normal tissue and externally validated in TCGA (N = 116). Subgroup analyses stratified by tumor estrogen receptor (ER) status were also performed. PAM50 signatures were used for tumor molecular subtyping and to generate proliferation and risk of recurrence scores. Using LASSO regression, we derived a 28-gene expression score capturing early menarche from genes in FDR-significant pathways in NHS breast tumor tissue, and tested its association with 10-year disease-free survival in both NHS (N = 836) and METABRIC (N = 952).

Results

Early menarche was significantly associated with 369 individual genes in adjacent-normal tissues, including genes implicated in extracellular matrix, cell adhesion, and invasion (FDR ≤ 0.1). Early menarche was associated with upregulation of cancer hallmark pathways (18 significant pathways in tumor, 23 in tumor-adjacent normal, FDR ≤ 0.1) related to proliferation (e.g. Myc, PI3K/AKT/mTOR, cell cycle), oxidative stress (e.g. oxidative phosphorylation, unfolded protein response), and inflammation (e.g. pro-inflammatory cytokines IFNα and IFNγ). Replication in TCGA showed similar trends. Early menarche was associated with significantly higher PAM50 proliferation scores (β = 0.082 [0.02–0.14]), higher odds of aggressive molecular tumor subtypes (basal-like, OR = 1.84 [1.18–2.85]; HER2-enriched, OR = 2.32 [1.46–3.69]), and higher PAM50 risk of recurrence scores (β = 4.81 [1.71–7.92]). Our NHS-derived early menarche gene expression signature was significantly associated with worse 10-year disease-free survival in METABRIC (N = 952, HR = 1.58 [1.10–2.25]).

Conclusions

Early menarche is associated with more aggressive molecular tumor characteristics and its gene expression signature within tumors is associated with worse 10-year disease-free survival among women with breast cancer. As the age of onset of menarche continues to decline, understanding its relationship to breast tumor characteristics and prognosis may lead to novel secondary prevention strategies.

Introduction

Early menarche, the onset of menstruation at an early age (< 12 years, the median age at menarche in the United States), is consistently associated with increased breast cancer risk [1, 2]. In large pooled studies and meta-analyses, each year younger at menarche was associated with a 5–9% increase in breast cancer risk [1,2,3]. The increased risk from a longer span of reproductive cycling is thought to arise from higher levels of, and longer exposure to, estrogens [1, 2], which are known mitogens and drive a number of cancers [4]. However, to our knowledge, no prior study has comprehensively investigated tumor molecular characteristics associated with early menarche, and the impact of early menarche on breast cancer prognosis also remains largely understudied. Data from the National Health and Nutrition Examination Survey (NHANES) show that average age at menarche declined by as much as 11 months between 1920 and 1980, with non-Hispanic Black women showing the largest decline (14 months) [5]. As this downward secular trend in age at menarche continues [6], understanding the mechanisms through which early menarche is associated with tumorigenesis could lead to novel prevention strategies. However, data linking early life exposures, such as age at menarche, with molecular tumor features that arise decades later are scarce. Here, we used long-term prospective epidemiological data and rich tumor molecular data from the Nurses’ Health Studies (NHS), with validation in The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets, to investigate how age at menarche may be associated with tumor molecular characteristics and prognosis in women with breast cancer.

Methods

Study population

We used data from two large prospective cohorts of female registered nurses in the United States, the NHS and NHSII. The NHS (established in 1976) enrolled 121,701 women aged 30–55 years, and NHSII (initiated in 1989) enrolled 116,429 women aged 25–42 years. In both cohorts, participants completed a baseline questionnaire and follow-up questionnaires every two years; cumulative follow-up rates exceeded 90% [7]. As described previously [8], invasive breast cancer cases were identified by questionnaire (from the start of follow-up to 2012) or, for non-respondents, by National Death Index search, and were confirmed by centralized medical record review using established protocols. No breast cancer case included in our analysis had a prior personal history of cancer. Completion of the questionnaire was considered to imply informed consent, as approved by the Institutional Review Boards of the Brigham and Women's Hospital (Boston, MA) and Harvard T.H. Chan School of Public Health (Boston, MA) in 1976 (NHS) and 1989 (NHSII). The NHS and NHSII were conducted in accordance with recognized ethical guidelines (Declaration of Helsinki).

Gene expression measurements

A total of 954 incident breast cancer cases in the study were eligible for transcriptomic analysis [10]; participant characteristics of cases with and without gene expression data were similar [11]. RNA was extracted from multiple 1 or 1.5 mm cores procured from formalin-fixed, paraffin-embedded (FFPE) tumor tissue (n = 1–3 cores) and from matched normal adjacent tissue taken > 1 cm from the tumor margin during surgery (n = 3–5 cores), using the Qiagen AllPrep RNA Isolation Kit. Transcriptome-wide gene expression was profiled using Affymetrix Glue Grant Human Transcriptome Array 3.0 (hGlue 3.0) and Human Transcriptome Array 2.0 (HTA 2.0) microarray chips. Data were normalized using robust multi-array average (RMA) and log2-transformed, and Affymetrix Power Tools probeset summarization-based metrics were used for quality control. After quality control and exclusions based on missing data, 846 tumor and 666 normal-adjacent tissues were used in this analysis. Gene expression data were deposited in Gene Expression Omnibus and are publicly available (GPL22920, GSE93601, GSE115577). When a gene was mapped by multiple probes, the most variable probe was selected to represent it. ComBat, an empirical Bayes method for batch effects, was used to adjust for technical variability [12]. Genes with expression in the lowest quartile (< 25%) were excluded from the analyses.
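The probe-to-gene collapsing and low-expression filter described above can be sketched in NumPy as follows. This is a hedged illustration rather than the study's pipeline (which used R/Bioconductor tooling): it assumes a probes × samples matrix, and it interprets "lowest quartile" as genes whose mean expression across samples falls below the 25th percentile of gene means. Both function names are hypothetical.

```python
import numpy as np


def collapse_to_most_variable_probe(expr, probe_genes):
    """Keep, for each gene, the probe with the highest variance across samples.

    expr: (n_probes, n_samples) log2 expression matrix.
    probe_genes: gene symbol for each probe row (several probes may share a gene).
    Returns (sorted gene list, gene-level matrix with one row per gene).
    """
    variances = expr.var(axis=1, ddof=1)
    best = {}
    for i, gene in enumerate(probe_genes):
        if gene not in best or variances[i] > variances[best[gene]]:
            best[gene] = i
    genes = sorted(best)
    return genes, expr[[best[g] for g in genes]]


def drop_lowest_quartile(genes, expr):
    """Drop genes whose mean expression is below the 25th percentile of gene means
    (an assumed reading of the lowest-quartile exclusion rule)."""
    mean_expr = expr.mean(axis=1)
    cutoff = np.quantile(mean_expr, 0.25)
    keep = mean_expr >= cutoff
    return [g for g, k in zip(genes, keep) if k], expr[keep]
```

Batch correction (ComBat) would sit between these two steps in practice and is not reproduced here.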

Exposure and covariates

Age at menarche (in years) and height (in feet and inches) were reported on the baseline questionnaire. Weight at age 18 was reported on the 1980 questionnaire for NHS and on the baseline questionnaire for NHSII. Race was reported on the 1992 questionnaire for NHS and on the baseline questionnaire for NHSII. History of oral contraceptive use, menopausal status, parity, age and calendar year at breast cancer diagnosis, and weight and physical activity level at diagnosis were obtained from the biennial NHS and NHSII questionnaires. BMI was calculated as weight in kilograms divided by height in meters squared (kg/m²). Tumor characteristics (stage and grade) and treatment information (chemotherapy, radiotherapy, and endocrine therapy) were obtained from medical records or supplemental questionnaires. Estrogen receptor (ER) status was determined by central review of breast cancer tissue microarrays (TMAs), or from pathology reports if TMA data were missing. Based on previous literature, we defined early menarche as menstruation beginning before age 12, the median age at menarche in the U.S. [9]. Age at menarche, our primary exposure, was therefore dichotomized as “early” (< 12 years) vs. “later” (≥ 12 years). Our analyses were restricted to complete cases for the following covariates, selected a priori: age at breast cancer diagnosis (continuous), year of diagnosis (continuous), tumor stage (1–4), chemotherapy (yes/no/unknown), radiation (yes/no/unknown), endocrine therapy (yes/no/unknown), oral contraceptive use (current user/past user/never user/unknown), race (White/non-White), parity (continuous), BMI at age 18 (continuous), BMI change (BMI at diagnosis – BMI at age 18), and physical activity at time of diagnosis (continuous).
Within our analytic cohort with gene expression measurements, 52 cases were excluded from analysis due to missing information on age at menarche; missing BMI and physical activity data were imputed using the median. Secondary analyses were also performed modeling age at menarche as continuous.
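A minimal sketch of the exposure and covariate derivations described above. It assumes weight is already recorded in kilograms and height as separate feet and inches fields; all helper names are hypothetical and not taken from the study code.

```python
import numpy as np

INCH_TO_M = 0.0254  # exact conversion factor


def height_m(feet, inches):
    """Convert height reported in feet and inches to meters."""
    return (np.asarray(feet) * 12 + np.asarray(inches)) * INCH_TO_M


def bmi(weight_kg, height_meters):
    """Body mass index in kg/m^2."""
    return np.asarray(weight_kg) / np.asarray(height_meters) ** 2


def bmi_change(bmi_at_dx, bmi_at_18):
    """BMI change covariate: BMI at diagnosis minus BMI at age 18."""
    return np.asarray(bmi_at_dx) - np.asarray(bmi_at_18)


def early_menarche(age_at_menarche):
    """Dichotomized exposure: 1 = early (< 12 years), 0 = later (>= 12 years)."""
    return (np.asarray(age_at_menarche) < 12).astype(int)


def median_impute(x):
    """Replace missing values (NaN) with the median of the observed values."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = np.nanmedian(x)
    return x
```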

Breast cancer recurrence

Breast cancer recurrences were ascertained as previously described [13]. Briefly, supplemental questionnaires were sent to living cohort members with a previously confirmed breast cancer diagnosis. Reports of new cancers of the liver, bone, brain, or lung (common sites of breast cancer metastasis) after the breast cancer diagnosis were considered breast cancer recurrences. Participants who died of breast cancer without a previously reported recurrence were also presumed to have had a recurrence. Disease-free survival time was thus defined as the time from initial diagnosis to either recurrence or end of follow-up, with participants who died of other causes censored at death. Deaths were most commonly reported by families, and deaths among nonrespondents were identified through the National Death Index, consistent with previous NHS analyses [14]. Once a death was reported, the specific cause was determined by medical record review or death certificate.
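The outcome definition above can be encoded per participant as a (time, event) pair. This is a hedged sketch with hypothetical field names and calendar times in years; it only restates the rules given in the text and is not the study's code.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Participant:
    dx_year: float                    # year of initial breast cancer diagnosis
    recurrence_year: Optional[float]  # first reported cancer at a metastatic site, if any
    death_year: Optional[float]       # year of death, if any
    died_of_breast_cancer: bool       # cause of death from records/death certificate
    last_followup_year: float         # end of follow-up


def disease_free_survival(p: Participant) -> Tuple[float, int]:
    """Return (time in years since diagnosis, event indicator)."""
    if p.recurrence_year is not None:
        # Reported recurrence at liver, bone, brain, or lung is an event.
        return p.recurrence_year - p.dx_year, 1
    if p.death_year is not None:
        # Breast cancer death without prior recurrence counts as a recurrence;
        # deaths from other causes are censored at death.
        return p.death_year - p.dx_year, int(p.died_of_breast_cancer)
    # Otherwise censored at end of follow-up.
    return p.last_followup_year - p.dx_year, 0
```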

Statistical analysis

Age at menarche and gene expression

We evaluated the association between age at menarche and transcriptome-wide gene expression for each gene using covariate-adjusted linear regression (limma) [15]. Each model was adjusted for confounders selected a priori and for surrogate variables generated from the transcriptome data (the "leek" method in the Bioconductor sva package in R) [16]. Analyses were performed separately in tumor and tumor-adjacent normal tissues. Subgroup analyses were stratified by tumor ER status. We used a false discovery rate (FDR) ≤ 0.1 to define transcriptome-wide significance [17]. Functional enrichment of biological pathways associated with age at menarche was assessed using Correlation Adjusted Mean Rank (CAMERA), a competitive gene set test [18], with an inter-gene correlation of 0.01, controlling for the same covariates used in the single-gene analysis. We used the 50 cancer “hallmark” gene sets from the Molecular Signatures Database (MSigDB; http://www.broadinstitute.org/gsea/msigdb/) in our pathway analyses, with FDR ≤ 0.1 defining statistical significance. For external validation, pathway analyses were replicated in a small subset of TCGA for which RNA-sequencing data and information on age at menarche were available (N = 116) [10, 19]. For this validation dataset, six TCGA sites were originally contacted, and data from the three sites that agreed to collect or provide breast cancer risk factor data on these cases (Roswell Park Cancer Institute, University of Pittsburgh, Mayo Clinic) were included, as previously described [20]. TCGA covariates were selected a priori to match those used in the NHS analysis as closely as possible, with some differences: age at breast cancer diagnosis (continuous), year of diagnosis (continuous), tumor stage (1–4), race (White/non-White), parity (continuous), BMI at diagnosis (continuous), ER status (positive/negative), and menopausal status (yes/no).
Pathway analyses were again performed using CAMERA as described above, and pathways with FDR ≤ 0.2 were considered significant for validation.
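The single-gene and pathway steps above can be sketched as follows. This is a simplified illustration, not the authors' code: the original analysis used limma's moderated t-statistics, sva surrogate variables, and limma's CAMERA implementation in R, whereas this NumPy sketch uses plain OLS t-statistics, a normal approximation for p-values, and a re-implementation of CAMERA's variance-inflation idea. Surrogate variables are assumed to be precomputed columns of the design matrix, and all function names are hypothetical.

```python
import math

import numpy as np


def genewise_ols_t(expr, design, coef):
    """OLS t-statistic for column `coef` of `design`, fitted for every gene.

    expr:   (n_samples, n_genes) log2 expression
    design: (n_samples, p) intercept, exposure, covariates, surrogate variables
    NOTE: ordinary t-statistics, not limma's empirical Bayes moderated t.
    """
    n, p = design.shape
    beta, _, _, _ = np.linalg.lstsq(design, expr, rcond=None)  # shape (p, n_genes)
    resid = expr - design @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (n - p)  # residual variance per gene
    xtx_inv = np.linalg.inv(design.T @ design)
    se = np.sqrt(sigma2 * xtx_inv[coef, coef])
    return beta[coef] / se


def two_sided_normal_p(stat):
    """Two-sided p-values via a normal approximation (reasonable for large df)."""
    return np.array([math.erfc(abs(s) / math.sqrt(2.0)) for s in np.ravel(stat)])


def bh_adjust(p):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def camera_like_test(stat, in_set, inter_gene_cor=0.01):
    """Competitive test of one gene set in the spirit of CAMERA.

    Compares the mean statistic of genes in the set with genes outside it,
    inflating the variance of the set mean by VIF = 1 + (m - 1) * inter_gene_cor.
    """
    stat = np.asarray(stat, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    g = stat.size
    m = int(in_set.sum())
    m2 = g - m
    delta = stat[in_set].mean() - stat[~in_set].mean()
    between_ss = delta ** 2 * m * m2 / g
    var_pooled = ((g - 1) * stat.var(ddof=1) - between_ss) / (g - 2)
    vif = 1.0 + (m - 1) * inter_gene_cor
    t = delta / math.sqrt(var_pooled * (vif / m + 1.0 / m2))
    p = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided, normal approximation
    return t, p
```

The variance inflation factor is what makes the test competitive yet correlation-aware: with an inter-gene correlation of 0.01 and a 200-gene hallmark set, the set-mean variance is inflated about threefold relative to assuming independent genes.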

Age at menarche and PAM50 scores

Proliferation and risk of recurrence scores based on the PAM50 assay were computed as described previously in the NHS [21]. Briefly, the proliferation score is the average expression of 11 genes (BIRC5, CCNB1, CDC20, NUF2, CEP55, NDC80, MKI67, PTTG1, RRM2, TYMS and UBE2C) [22]. The risk of recurrence (ROR) score combines expression of the 50 PAM50 genes with tumor size and nodal status to produce an integer score (0–100) proportional to risk of recurrence. The PAM50 ROR score has been found to be highly predictive of risk of distant relapse [23]. Multiple linear regression was performed with these scores as continuous dependent variables and age at menarche as the main predictor. PAM50 is frequently used to classify breast tumors by gene expression into five intrinsic molecular subtypes that differ in both biological characteristics and prognosis [24]. Multinomial regression was used to test the association of early menarche with PAM50 intrinsic subtypes (luminal B, normal-like, HER2-enriched, and basal-like) relative to the least aggressive subtype, luminal A [24]. All covariates described above were included for adjustment.
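The proliferation score described above can be sketched as a per-sample mean over the 11 genes. This is a simplified, hypothetical helper: the published PAM50 procedure [22] may apply gene centering or other normalization that is not reproduced here, and the ROR score is not shown.

```python
import numpy as np

# The 11 proliferation genes listed in the text.
PROLIFERATION_GENES = ["BIRC5", "CCNB1", "CDC20", "NUF2", "CEP55", "NDC80",
                       "MKI67", "PTTG1", "RRM2", "TYMS", "UBE2C"]


def proliferation_score(expr, genes):
    """Average expression of the 11 proliferation genes for each sample.

    expr: (n_genes, n_samples) log2 expression; genes: row labels of `expr`.
    Raises KeyError if any proliferation gene is absent.
    """
    index = {g: i for i, g in enumerate(genes)}
    missing = [g for g in PROLIFERATION_GENES if g not in index]
    if missing:
        raise KeyError(f"missing proliferation genes: {missing}")
    rows = [index[g] for g in PROLIFERATION_GENES]
    return expr[rows].mean(axis=0)
```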

Early menarche-derived gene expression signature and breast cancer recurrence

To examine associations between an early menarche-associated gene expression signature and breast cancer recurrence, we took individual genes from FDR-significant pathways in tumor tissue (not stratified by ER status) that showed nominal significance (p ≤ 0.05) and used them to create a gene expression score, calculated as ∑(z-transformed genes from positively regulated pathways) − ∑(z-transformed genes from negatively regulated pathways). LASSO regression, implemented in the glmnet package in R, was used to select the most predictive genes while preventing overfitting through shrinkage of the regression coefficients [25].
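The signed z-score sum can be sketched as follows. It assumes each gene is z-transformed across samples within a cohort using the sample standard deviation; the gene lists and function name are placeholders, and the LASSO selection step (glmnet in the original analysis) is not reproduced.

```python
import numpy as np


def signature_score(expr, genes, up_genes, down_genes):
    """Score = sum of z-scores of up-regulated genes minus sum for down-regulated genes.

    expr: (n_genes, n_samples) expression; genes: row labels of `expr`.
    Each gene is z-transformed across samples (sample SD, ddof=1).
    """
    index = {g: i for i, g in enumerate(genes)}
    mean = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, ddof=1, keepdims=True)
    z = (expr - mean) / sd
    up_rows = np.asarray([index[g] for g in up_genes], dtype=int)
    down_rows = np.asarray([index[g] for g in down_genes], dtype=int)
    return z[up_rows].sum(axis=0) - z[down_rows].sum(axis=0)
```

Because the z-transformation uses the cohort's own mean and SD, applying the same gene lists to another cohort (as with METABRIC) re-standardizes within that cohort under this assumption.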

Discovery cohort: Nurses’ Health Studies

We used Cox proportional hazards regression to examine the association between the menarche-associated gene expression signature and breast cancer recurrence among stage 1–3 breast cancer cases. Scores were modeled as categorical, using quartiles as cut points to assign scores from 1 to 4, with 1 representing the lowest score (most dissimilar to early menarche) and 4 the highest (most similar to early menarche). Hazard ratios and 95% confidence intervals were calculated. Recurrence-free survival was defined as the time between cancer diagnosis and either recurrence (cancer detected at common metastatic sites, such as bone, brain, lungs, and liver) or death from breast cancer without reported recurrence (12). We tested the proportional hazards assumption with a likelihood ratio test of an interaction term between score and the log of recurrence time. Because the assumption was violated, we applied a piecewise Cox model with the cut point at 10 years, where the Kaplan–Meier curves crossed.
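The quartile coding and the 10-year split used for the piecewise Cox model can be sketched as data preparation steps. This is a hedged illustration with hypothetical helper names; model fitting itself is left to any Cox implementation that accepts counting-process (start, stop] data.

```python
import numpy as np


def quartile_category(score):
    """Assign 1-4 using the cohort's own quartile cut points (1 = lowest quartile)."""
    score = np.asarray(score, dtype=float)
    cuts = np.quantile(score, [0.25, 0.5, 0.75])
    return np.digitize(score, cuts) + 1


def split_at(time, event, cut=10.0):
    """Split one participant's follow-up at `cut` years into (start, stop, event, period) rows.

    Period 1 covers [0, cut]; period 2 covers (cut, end]. An event is only
    credited to the row in which it occurs.
    """
    if time <= cut:
        return [(0.0, float(time), int(event), 1)]
    return [(0.0, float(cut), 0, 1), (float(cut), float(time), int(event), 2)]
```

Including a score-by-period interaction in a model fitted to these rows yields separate hazard ratios before and after 10 years, which is one common way to implement a piecewise Cox model.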

Validation cohort: METABRIC

To assess the generalizability of our menarche-associated gene expression signature, we used an independent dataset, the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Using the NHS-derived gene signature, we computed the early menarche-associated score for METABRIC tumors from the available gene expression data. We then used a Cox regression model to assess the association of the score with breast cancer recurrence, adjusting for age at diagnosis, ER status, batch, menopausal status, cancer stage, and treatment (chemotherapy, radiotherapy, and/or hormonal therapy). The analysis was restricted to complete cases, yielding 952 stage 1–3 breast cancer cases. The score was again modeled as categorical, using quartiles of the METABRIC score distribution as cut points. A test of the interaction between score and the log of recurrence time showed no violation of the proportional hazards assumption.

Results

Participant characteristics

Of the 846 NHS women with tumor gene expression data, 206 had early menarche (< 12 years of age) (Table 1). Demographics were similar between women with early and later menarche, including age and year at breast cancer diagnosis and race (the NHS cohort is predominantly White, ~95%). Nearly 70% of women in both groups were postmenopausal at diagnosis. Distributions of breast cancer stage were similar between groups, consisting mostly of stage 1–3 tumors. Treatment differed between groups: radiation therapy was more common among women with later than early menarche (52.8% vs. 33.5%), whereas chemotherapy was more common among women with early menarche (51.0% vs. 43.3%). Endocrine therapy and oral contraceptive use were similar between groups. Women in both groups had a median of two children and similar levels of physical activity.

Table 1 Participant characteristics of breast cancer cases in the NHS with tissue gene expression data (N = 846)

Gene expression and pathway analysis in women who experienced early menarche

A full study workflow is presented in Fig. 1. We did not observe a statistically significant association between early menarche and any individual gene in tumor tissues. However, upon stratification by ER status, we identified 28 genes associated with early menarche in ER-positive tumors (FDR ≤ 0.1) (Supplemental Fig. 1, Table S1). Among these were genes associated with Notch and TGF-β signaling (DMXL2 and LRRC32, respectively), DNA damage (HUS1), cell stress and metastatic cell survival (ERLEC1), extracellular matrix (COL16A1), and proliferation (PIK3C3). No individual genes were significant in ER-negative tumors. In matched tumor-adjacent normal tissue, 369 individual genes were significantly associated with early menarche (FDR ≤ 0.1) (Table S2). These included genes positively associated with extracellular matrix, cell adhesion, and invasion (e.g., RHOA, RAB1A, CTNND1, ITFG1, CAV1, CST3), cell cycle/proliferation (e.g., CDC16, CDC42, GSK3B, MAPKA5), and other genes with known roles in tissue transformation. Further stratification by ER status found that early menarche was significantly associated with 9 genes in normal tissues adjacent to ER+ tumors and none in normal tissues adjacent to ER− tumors. Of the 28 significant genes in ER+ tumor tissues and the 9 significant genes in ER+ adjacent normal tissues, 2 overlapped (HUS1 and PRKAG3).

Fig. 1

Study workflow of gene expression analyses

In multivariable-adjusted competitive gene set enrichment analysis, 18 cancer hallmark pathways were significantly associated (FDR ≤ 0.1) with early menarche in tumor tissues, and 23 pathways in tumor-adjacent normal tissues (Table 2). Fifteen enriched pathways overlapped between tumor and normal tissues. In both tissue types, early menarche was associated with upregulation of pathways related to proliferation (e.g. Myc, PI3K/AKT/mTOR, cell cycle), oxidative stress (e.g. oxidative phosphorylation, unfolded protein response), and inflammation (e.g. pro-inflammatory cytokines IFNα and IFNγ), as well as upregulation of adipogenesis. Normal tissues also showed enrichment of pathways not seen in tumors, such as epithelial-to-mesenchymal transition (EMT) and angiogenesis, features indicative of a cancer-promoting microenvironment and tissue transformation, as well as downregulation of myogenesis.

Table 2 Pathway enrichment analysis of age at menarche in breast tumors and normal-adjacent tissues

Stratified by estrogen receptor status, 19 significant pathways were observed in ER-positive tumor tissue and 21 in matched ER-positive normal tissue; all mirrored the unstratified analysis, with upregulation of pathways involved in proliferation, cell stress, and cancer cell metabolism (Table S3). Similar findings were observed in ER-negative tissues, with 9 pathways in tumor and 22 pathways in tumor-adjacent normal tissues detected at FDR ≤ 0.1 (Table S4). Analyses modeling age at menarche as continuous gave very similar results (Tables S5–S7): increasing age at menarche was associated with downregulation of proliferation and cellular stress pathways.

We replicated the pathway analysis in TCGA (Table 3, Table S8). Although this analysis was limited by sample size (N = 116), early menarche was associated with upregulation of many of the same proliferation-related signaling pathways (MTORC1 signaling: FDR = 0.005; PI3K-AKT-MTOR signaling: FDR = 0.05), cellular stress pathways (Protein Secretion: FDR = 0.02), and inflammation pathways (Interferon Gamma Response: FDR = 0.11) that we observed in NHS. We also observed upregulation of several biologically related pathways not identified in NHS, including those involved in estrogen response (Estrogen Response Early: FDR = 0.05; Estrogen Response Late: FDR = 0.06) and in inflammation and innate immune response (Allograft Rejection: FDR = 0.05; Complement: FDR = 0.06; Coagulation: FDR = 0.12).

Table 3 Pathway enrichment analysis of age at menarche in TCGA

In PAM50 analyses of molecular subtype (Table 4), early menarche was associated with increased odds of HER2-enriched (OR = 2.32 [1.46–3.69]) and basal-like (OR = 1.84 [1.18–2.85]) breast cancer subtypes relative to luminal A. Early menarche was also associated with a significantly higher PAM50 proliferation score (β = 0.082 [0.02–0.14], p = 0.009) and a higher PAM50 ROR score, indicating higher predicted risk of recurrence (β = 4.81 [1.71–7.92], p = 0.003).

Table 4 Early menarche is associated with more proliferative and aggressive tumors in breast cancer

Early menarche-derived gene expression signature and risk of breast cancer recurrence in NHS and METABRIC

We next created an early menarche signature based on 28 genes selected by LASSO regularized regression and examined its association with 10-year disease-free survival (Table 5). Most of the 28 genes in the signature were involved in three main biological processes: (1) cell stress response and metabolism (e.g. BAG1, HUS1, DRAM2); (2) cell proliferation, differentiation, and invasion (e.g. …); and (3) inflammation. In the NHS, women in the highest quartile of the early menarche-associated gene expression score had an 18% higher risk of recurrence than those in the lowest quartile, although this was not statistically significant (N = 836, Nevents = 105, HR = 1.18 [0.70–2.0], p-trend = 0.403, Fig. 2A). In METABRIC, after covariate adjustment, women in the highest quartile of the score (based on the same 28 genes identified in NHS) had a 58% higher risk of cancer recurrence than those in the lowest quartile (N = 952, Nevents = 310, HR = 1.58 [1.10–2.25], p-trend = 0.02, Fig. 2B, Table S9).

Table 5 Early menarche-derived gene expression signature
Fig. 2

Association between early menarche and 10-year disease-free survival. Covariate-adjusted marginal survivor curves computed from Cox proportional hazards models for 10-year disease-free survival and the early menarche gene expression score modeled as quartiles, with 1 representing the lowest level of expression (most dissimilar to early menarche) and 4 the highest (most representative of the early menarche signature), in the NHS (A) and METABRIC (B) cohorts. Covariates in NHS included age at breast cancer diagnosis (continuous), year of diagnosis (continuous), estrogen receptor status (yes/no), tumor stage (1–3), tumor grade (1–4), chemotherapy (yes/no/unknown), radiation (yes/no/unknown), endocrine therapy (yes/no/unknown), oral contraceptive use (current user/past user/never user/unknown), race (White/non-White), parity (continuous), BMI at 18 years of age (continuous), BMI change (BMI at diagnosis – BMI at 18 years of age), and physical activity at time of diagnosis (continuous). Covariates in METABRIC mirrored NHS covariates as closely as possible, though not all were available; they included age at diagnosis, estrogen receptor status, batch, menopausal status, cancer stage, and treatment (chemotherapy, radiotherapy, and/or hormonal therapy).

Discussion

In this analysis of women with breast cancer in the NHS and NHSII, early menarche was associated with cancer-promoting molecular changes in both tumor and normal-adjacent tissues, most notably enrichment of genes and signaling pathways that drive cell proliferation and are commonly dysregulated in cancer. Similar pathway-level differences were observed in a TCGA subset with available data on age at menarche, where several estrogen response pathways were also upregulated. Findings based on PAM50 metrics (molecular subtype, proliferation, risk of recurrence) were consistent with gene expression profiles suggesting more aggressive disease. Our 28-gene early menarche-associated gene expression score was suggestively associated with worse survival in the NHS and significantly associated with worse survival in a larger external dataset, METABRIC.

While previous work in the field has focused on the relationship between menarche and breast cancer risk, our gene expression analyses connect a classical early life breast cancer risk factor with molecular changes in tumors occurring decades later. We identified 369 individual genes significantly associated with early menarche in tumor-adjacent normal tissues, many of which were associated with cell transformation, invasion, and tumor-promoting changes to the tissue microenvironment. Further investigation of these genes, and of whether they offer biological insight into tumorigenesis in women who underwent menarche at an early age, is warranted. Because normal-adjacent tissues more closely resemble the normal breast, it should also be explored whether these genes and their signaling pathways represent changes that occur early in life. After stratification by ER status, 28 genes in ER-positive tumor tissues were significantly associated with early menarche; these genes varied in biological function, with common threads of cell adhesion, DNA damage, and metastatic cell survival. Robust associations linked early menarche with a host of signaling pathways, including proliferation, oxidative stress, cancer cell metabolism, DNA damage repair, and inflammation, in both tumor and matched normal tissues and in both ER-positive and ER-negative tissues. Of note, the similar pathway enrichment we observed in ER+ and ER− tissues is concordant with other studies showing that early menarche increases risk of hormone receptor-positive and hormone receptor-negative breast cancer subtypes similarly [1, 2, 26,27,28].
It is becoming increasingly apparent that the duration of reproductive exposure to estrogens may not, as previously believed, be the only factor underlying the heightened risk that accompanies early menarche; more mechanistic work is needed in this area.

In addition to proliferative, metabolic, and stress pathways, we observed associations between early menarche and pathways related to the tissue microenvironment, including downregulation of myogenesis. Tumor-derived cytokines have been shown to impair myogenesis and alter the skeletal muscle immune microenvironment [29]; consistent with this, early menarche was positively associated with pro-inflammatory cytokine pathways and negatively associated with myogenesis. Normal-adjacent tissues also showed upregulation of epithelial-mesenchymal transition and angiogenesis, both of which promote cancer spread.

Cancer-associated adipocytes are key players in breast cancer progression, undergoing metabolic reprogramming to support tumor cells through secretion of a variety of inflammatory and growth-promoting factors [30]. Early menarche was associated with significant upregulation of adipogenesis. Women with more adiposity during childhood tend to have earlier menarche, even though early life body fatness has been associated with reduced breast cancer risk [31]. In the NHS, we recently showed that higher body fatness during childhood and adolescence was associated with downregulation of pathways involved in cell stress, proliferation, and inflammation in breast tumors. Eleven pathways identified for early life body fatness overlapped with our findings for early menarche, including 6 proliferation-related, 3 cell stress-related, and 2 microenvironment-related pathways. Interestingly, all 11 showed associations in the opposite direction, consistent with the known complex relationships among early life body size, menarche, and breast cancer risk.

Though sample size was limited in TCGA, the tumor signaling pathway analyses showed trends and patterns similar to those in the NHS cohort. Four of the 10 significant pathways enriched in TCGA tumors directly overlapped with those identified in the NHS cohort, including upregulated cancer cell proliferation-, metabolism-, and inflammation-related pathways. Beyond these directly replicated pathways, tumors from women in the TCGA study population also showed significant upregulation of distinct but related pathways involved in innate immune response, including complement, coagulation, and allograft rejection. Interestingly, early and late estrogen response signaling was also upregulated in TCGA tumors from women with early menarche. As discussed above, the increased breast cancer risk associated with early menarche has been postulated to relate, in part, to higher levels of, and a longer exposure window to, mitogenic estrogens. This hypothesis is consistent with the increased estrogen signaling we detected in tumors from women who underwent early menarche, even after adjustment for ER status. Some recent evidence also suggests that among women who experience menarche at an early age, those who carry single nucleotide polymorphisms in genes involved in estrogen signaling are at higher risk of breast cancer than those who do not [32]. Overall, our findings in TCGA broadly mirror our data from the NHS cohort and support the conclusion that early menarche is associated with more proliferative and pro-tumorigenic breast tumor characteristics. While our study focuses on early menarche, which represents exposure to estrogen early in life, other reproductive factors (e.g. parity, breastfeeding, exogenous hormone use) also influence estrogen exposure across the life course.
Whether these factors affect tumor biology in similar or distinct ways warrants further investigation.

The increased proliferation and other tumor-promoting pathways identified in our gene expression analyses led us to hypothesize that tumors from women with early menarche may have a more aggressive phenotype. Using PAM50, we found that early menarche was associated with basal-like and HER2-enriched intrinsic subtypes, which are considered more aggressive and carry a worse prognosis [24], as well as with a higher tumor proliferation score and a higher risk of recurrence score, which is highly predictive of distant relapse and has outperformed other risk prediction methods [23]. Together, these findings support the hypothesis that these tumors have more aggressive molecular and pathological features. Concordantly, our NHS-derived gene expression signature, which captures molecular characteristics of tumors from women who experienced early menarche, showed a dose–response pattern with cancer recurrence that was suggestive but not statistically significant in the NHS and statistically significant in METABRIC. This suggests that the signature may capture molecular tumor features with prognostic relevance in breast cancer patients who experienced early menarche.

In this study, we investigated the molecular features of breast tumors in relation to age at menarche, an established epidemiologic risk factor for breast cancer. Early menarche was associated with more aggressive tumor molecular subtypes and characteristics, and our early menarche-derived gene expression signature was associated with a higher risk of breast cancer recurrence. Together, these results highlight how a common and increasingly prevalent early-life exposure may influence the molecular and pathological phenotype of breast tumors later in life, and how these changes relate to breast cancer prognosis. As the age at menarche continues to decline, a better understanding of its influence on breast tumor biology and prognosis may inform novel secondary prevention strategies.

Availability of data and materials

Gene expression data analyzed in this study were deposited in the Gene Expression Omnibus and are publicly available (platform GPL22920; series GSE93601 and GSE115577).
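For readers retrieving these data, the NCBI Gene Expression Omnibus (GEO) file server groups accessions into directories of 1,000, with the last three digits replaced by "nnn" (for example, series GSE93601 sits under `geo/series/GSE93nnn/`). The Python helper below builds the directory URL for each accession listed above, assuming that standard layout. The function name is hypothetical and the helper makes no network calls, so the resulting URLs should be confirmed against GEO itself.

```python
# Sketch only: build GEO file-server directory URLs from accession IDs,
# ASSUMING GEO's standard "bucket of 1000" directory layout.
_GEO_KINDS = {"GSE": "series", "GPL": "platforms", "GSM": "samples"}
_GEO_ROOT = "https://ftp.ncbi.nlm.nih.gov/geo"


def geo_dir(accession: str) -> str:
    """Return the HTTPS directory URL for a GEO GSE, GPL or GSM accession."""
    prefix, digits = accession[:3], accession[3:]
    if prefix not in _GEO_KINDS or not digits.isdigit():
        raise ValueError(f"unsupported GEO accession: {accession!r}")
    # Drop the last three digits and append "nnn": GSE93601 -> GSE93nnn,
    # GSE115577 -> GSE115nnn; accessions below 1000 fall under e.g. GSEnnn.
    bucket = prefix + digits[:-3] + "nnn"
    return f"{_GEO_ROOT}/{_GEO_KINDS[prefix]}/{bucket}/{accession}/"


if __name__ == "__main__":
    for acc in ("GPL22920", "GSE93601", "GSE115577"):
        print(acc, geo_dir(acc))
```

Building URLs rather than hard-coding them keeps the accession list in one place; a GEO client library could be used instead where network access and an extra dependency are acceptable.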

References

  1. Hamajima N, et al. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13:1141–51.

  2. Britt K. Menarche, menopause, and breast cancer risk. Lancet Oncol. 2012;13:1071–2.

  3. Fuhrman BJ, et al. Association of the age at menarche with site-specific cancer risks in pooled data from nine cohorts. Cancer Res. 2021;81:2246–55.

  4. Liang J, Shang Y. Estrogen and cancer. Annu Rev Physiol. 2013;75:225–40.

  5. McDowell MA, Brody DJ, Hughes JP. Has age at menarche changed? Results from the National Health and Nutrition Examination Survey (NHANES) 1999–2004. J Adolesc Health. 2007;40:227–31.

  6. Martinez GM. Trends and patterns in menarche in the United States: 1995 through 2013–2017. Natl Health Stat Report. 2020;2020:1–11.

  7. Poole EM, et al. Body size in early life and adult levels of insulin-like growth factor 1 and insulin-like growth factor binding protein 3. Am J Epidemiol. 2011;174:642–51.

  8. Peng C, et al. Prediagnostic 25-hydroxyvitamin D concentrations in relation to tumor molecular alterations and risk of breast cancer recurrence. Cancer Epidemiol Biomarkers Prev. 2020;29:1253–63.

  9. Martinez GM. National Health Statistics Reports, Number 146, September 10, 2020. https://www.cdc.gov/nchs/products/index.htm (1995).

  10. Heng YJ, et al. Molecular mechanisms linking high body mass index to breast cancer etiology in post-menopausal breast tumor and tumor-adjacent tissues. Breast Cancer Res Treat. 2019;173:667–77.

  11. Kensler KH, et al. PAM50 molecular intrinsic subtypes in the Nurses’ Health Study cohorts. Cancer Epidemiol Biomarkers Prev. 2019;28:798–806.

  12. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27.

  13. Kensler KH, et al. Androgen receptor expression and breast cancer survival: results from the Nurses’ Health Studies. J Natl Cancer Inst. 2019;111:700–8.

  14. Baer HJ, et al. Risk factors for mortality in the Nurses’ Health Study: a competing risks analysis. Am J Epidemiol. 2011;173:319.

  15. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004;3:3.

  16. Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3(9):e161. https://doi.org/10.1371/journal.pgen.

  17. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc Ser B (Methodol). 1995;57:289–300.

  18. Wu D, Smyth GK. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res. 2012;40(17):e133. https://doi.org/10.1093/nar/gks461.

  19. Heng YJ, et al. The association of modifiable breast cancer risk factors and somatic genomic alterations in breast tumors: The Cancer Genome Atlas network. Cancer Epidemiol Biomarkers Prev. 2020;29:599–605.

  20. Wang J, et al. Alcohol consumption and breast tumor gene expression. Breast Cancer Res. 2017;19:1–15.

  21. Wallden B, et al. Development and verification of the PAM50-based Prosigna breast cancer gene signature assay. BMC Med Genom. 2015. https://doi.org/10.1186/s12920-015-0129-6.

  22. Nielsen TO, et al. A comparison of PAM50 intrinsic subtyping with immunohistochemistry and clinical prognostic factors in tamoxifen-treated estrogen receptor-positive breast cancer. Clin Cancer Res. 2010. https://doi.org/10.1158/1078-0432.CCR-10-1282.

  23. Parker JS, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27:1160–7.

  24. Kensler KH, et al. PAM50 molecular intrinsic subtypes in the Nurses’ Health Study cohorts. Cancer Epidemiol Biomarkers Prev. 2019;28(4):798–806. https://doi.org/10.1158/1055-9965.EPI-18-0863.

  25. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1.

  26. Hamajima N, et al. Menarche, menopause, and breast cancer risk: individual participant meta-analysis, including 118 964 women with breast cancer from 117 epidemiological studies. Lancet Oncol. 2012;13:1141–51.

  27. Jung AY, et al. Distinct reproductive risk profiles for intrinsic-like breast cancer subtypes: pooled analysis of population-based studies. J Natl Cancer Inst. 2022;114:1706–19.

  28. Ritte R, et al. Height, age at menarche and risk of hormone receptor-positive and -negative breast cancer: a cohort study. Int J Cancer. 2013;132:2619–29.

  29. Hogan KA, et al. Tumor-derived cytokines impair myogenesis and alter the skeletal muscle immune microenvironment. Cytokine. 2018;107:9–17. https://doi.org/10.1016/j.cyto.2017.11.006.

  30. Wu Q, et al. Cancer-associated adipocytes: key players in breast cancer progression. J Hematol Oncol. 2019;12:1–15. https://doi.org/10.1186/s13045-019-0778-6.

  31. Baer HJ, et al. Body fatness during childhood and adolescence and incidence of breast cancer in premenopausal women: a prospective cohort study. Breast Cancer Res. 2005;7:1–12.

  32. Song SS, Kang S, Park S. Association of estrogen-related polygenetic risk scores with breast cancer and interactions with alcohol intake, early menarche, and nulligravida. Asian Pac J Cancer Prev. 2022;23:13–24.


Acknowledgements

We would like to thank the participants and staff of the Nurses’ Health Study and the Nurses’ Health Study II for their valuable contributions as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, WY. The authors assume full responsibility for analyses and interpretation of these data.

Funding

This work was supported by the National Institutes of Health/National Cancer Institute (UM1 CA186107, P01 CA87969, R01 CA49449, U01 CA176726, R01 CA67262, R01 CA50385, T32 CA009001, U19 CA148065, R01 CA166666, P30 CA016056, R35 CA253187, P50 CA116201) and National Institute on Aging (K01AG080030) and the Komen Foundation Grant (SAC110014). Dr. Ambrosone and Dr. Couch are recipients of funding from the Breast Cancer Research Foundation. Alexandra R. Harris is supported by the NCI Cancer Prevention Fellowship Program.

Author information

Contributions

Conceptualization: ARH, CP, AHE. Resources: YJH, RMT, AHE. Data Curation: CP. Software: CP. Formal Analysis: ARH, CP. Supervision: CP, AHE. Funding Acquisition: AHE. Validation: TW. Investigation: ARH, CP. Visualization: ARH. Methodology: YJH, GMB, JW, CA, AB, FJC, FM, CGS, CMV, SHE, BR, RMT, CP, AHE. Project Administration: ARH. Writing-Original Draft: ARH. Writing-Review and Editing: All authors.

Corresponding author

Correspondence to Alexandra R. Harris.

Ethics declarations

Ethics approval and consent to participate

Completion of the NHS questionnaire was considered to imply informed consent upon study protocol approval by the Institutional Review Boards of the Brigham and Women's Hospital (Boston, MA) and Harvard T.H. Chan School of Public Health (Boston, MA) in 1976 (NHS) and 1989 (NHSII). NHSI/II were conducted in accordance with recognized ethical guidelines (Declaration of Helsinki).

Consent for publication

Not applicable.

Competing interests

Rulla M. Tamimi, Sc.D. is a deputy editor at Breast Cancer Research; RMT was uninvolved in the review of or any editorial decisions pertaining to this manuscript. All remaining authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Harris, A.R., Wang, T., Heng, Y.J. et al. Association of early menarche with breast tumor molecular features and recurrence. Breast Cancer Res 26, 102 (2024). https://doi.org/10.1186/s13058-024-01839-0

  • DOI: https://doi.org/10.1186/s13058-024-01839-0

Keywords