Epidemiology of breast cancer subtypes in two prospective cohort studies of breast cancer survivors

Introduction The aim of this study was to describe breast tumor subtypes by common breast cancer risk factors and to determine correlates of subtypes using baseline data from two pooled prospective breast cancer studies within a large health maintenance organization. Methods Tumor data on 2544 invasive breast cancer cases subtyped by estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (Her2) status were obtained (1868 luminal A, 294 luminal B, 288 triple-negative, and 94 Her2-overexpressing tumors). Demographic, reproductive, and lifestyle information was collected either in person or by mailed questionnaire. Case-only odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression, adjusting for age at diagnosis, race/ethnicity, and study origin. Results Compared with luminal A cases, luminal B cases were more likely to be younger at diagnosis (P = 0.0001) and were less likely to consume alcohol (OR = 0.74, 95% CI = 0.56 to 0.98), use hormone replacement therapy (HRT) (OR = 0.66, 95% CI = 0.46 to 0.94), or use oral contraceptives (OR = 0.73, 95% CI = 0.55 to 0.96). Compared with luminal A cases, triple-negative cases tended to be younger at diagnosis (P ≤ 0.0001) and African American (OR = 3.14, 95% CI = 2.12 to 4.16); were more likely not to have breastfed if they had three or more children (OR = 1.68, 95% CI = 1.00 to 2.81); and, if premenopausal, were more likely to be overweight (OR = 1.82, 95% CI = 1.03 to 3.24) or obese (OR = 1.97, 95% CI = 1.03 to 3.77). Her2-overexpressing cases were more likely to be younger at diagnosis (P = 0.03) and Hispanic (OR = 2.19, 95% CI = 1.16 to 4.13) or Asian (OR = 2.02, 95% CI = 1.05 to 3.88), and less likely to use HRT (OR = 0.45, 95% CI = 0.26 to 0.79). Conclusions These observations suggest that investigators should consider tumor heterogeneity when examining associations with traditional breast cancer risk factors.
Important modifiable lifestyle factors that may be related to the development of a specific tumor subtype, but not all subtypes, include obesity, breastfeeding, and alcohol consumption. Future work that further categorizes triple-negative cases into basal and non-basal tumors may help clarify these associations.

Although many studies have examined common breast cancer risk factors in relation to race [30-36] and hormone receptor status [36-41], few have explored the relationship between these risk factors and the molecular subtypes of breast cancer [22,27,42,43] [see Additional data file 1]. We therefore set out to describe breast tumor subtypes by race/ethnicity and common breast cancer risk factors, and to determine correlates of breast cancer subtypes, using baseline data on 2544 invasive breast cancer cases from two large prospective breast cancer survivorship studies.

LACE Study
The Life After Cancer Epidemiology (LACE) Study consists of 2280 women diagnosed with invasive breast cancer between 1997 and 2000 and recruited primarily from the Kaiser Permanente Northern California (KPNC) Cancer Registry (82%) and the Utah Cancer Registry (12%). Further details on the LACE cohort have been reported previously [44]. Briefly, eligibility criteria included age between 18 and 70 years at enrollment; a diagnosis of early-stage primary breast cancer (stage I [≥1 cm], II, or IIIA); enrollment between 11 and 39 months after diagnosis; completion of breast cancer treatment (except for adjuvant hormonal therapy); no recurrence; and no history of other cancers in the five years before enrollment. Between January 2000 and April 2002, 2280 eligible women completed baseline questionnaires by mail. The mean time from diagnosis to enrollment was 22.8 months (range = 11.0 to 38.9 months). The study was approved by the institutional review boards (IRBs) of KPNC and the University of Utah. The present analysis includes data from 1821 KPNC breast cancer patients from the LACE Study with complete breast cancer subtype information.

Pathways Study
The Pathways Study is a prospective cohort study that has been actively recruiting women diagnosed with invasive breast cancer from the KPNC patient population since January 2006. Women are recruited as soon after diagnosis as possible (usually within two months), as described elsewhere [45]. Briefly, cases are rapidly ascertained on a daily basis by automatic scanning of electronic pathology reports, followed by verification of the cancer diagnosis and patient notification by a medical record analyst. Eligibility criteria include current KPNC membership; age of at least 21 years at diagnosis; a recent diagnosis of first primary invasive breast cancer (all stages); no prior history of any cancer; ability to speak English, Spanish, Cantonese, or Mandarin; and residence within a 65-mile radius of a field interviewer. In addition, passive consent is obtained from the patient's physician of record through an email notification stating our intention to contact the patient for study recruitment. Recruitment is ongoing; as of 20 October 2008, 2212 breast cancer patients had been enrolled via in-person interview. The mean time from diagnosis to enrollment is 1.9 months (range = 0 to 7.3 months). Written informed consent is obtained from all participants before enrollment, typically at the time of the in-person baseline interview. The study was approved by the IRBs of KPNC and all collaborating sites. To make these cases comparable with those from the LACE Study, the present analysis includes data from the first 723 women enrolled with a diagnosis of stage I (≥1 cm), II, or IIIA breast cancer and complete breast cancer subtype data.

Reproductive and lifestyle factors
In the mailed baseline questionnaire of the LACE Study and during the in-person baseline interview of the Pathways Study, participants provided detailed information on family history of cancer and reproductive history, including age at first full-term pregnancy, number of biological children, breastfeeding, and menopausal status. Additional information was collected on smoking, alcohol use, hormone use (oral contraceptives (OC) and hormone replacement therapy (HRT)), and demographics (age at breast cancer diagnosis, race/ethnicity, household income, and education). Self-reported height and weight one year before diagnosis (LACE) and around the time of diagnosis (Pathways) were used to calculate body mass index (BMI, kg/m²). Missing values were supplemented with concurrent information from KPNC electronic medical records.
[3,47-51]. Beginning in July 1999, if IHC staining for Her2 expression is equivocal (less than 30% strong staining but more than 10% weak staining), the specimen is sent for fluorescence in situ hybridization (FISH) at the KPNC regional cytogenetics laboratory. If the FISH score (ratio of Her2 signals to chromosome 17 centromere signals) is less than 2.0 [52], the tumor is classified as Her2-negative; if the FISH score is greater than 2.0, the tumor is classified as Her2-positive. Results from FISH analyses are not reported to the KPNC Cancer Registry and are obtained directly from the KPNC regional cytogenetics laboratory.
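The two derived measurements described above, BMI and FISH-based Her2 status, can be sketched as small helper functions. This is a minimal illustration, not the study's code: the function names are hypothetical, height is assumed to be in metres and weight in kilograms, and because the text does not say how a FISH ratio of exactly 2.0 was handled, the sketch returns None in that case rather than guessing.

```python
from typing import Optional


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI in kg/m^2 from self-reported weight (kg) and height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def her2_status_from_fish(ratio: float) -> Optional[str]:
    """Hypothetical helper: classify Her2 from the FISH ratio
    (Her2 signals / chromosome 17 centromere signals).

    Follows the thresholds stated in the text: below 2.0 -> negative,
    above 2.0 -> positive. A ratio of exactly 2.0 is not covered by the
    text, so None is returned instead of an assumed category.
    """
    if ratio < 2.0:
        return "negative"
    if ratio > 2.0:
        return "positive"
    return None


# Usage with illustrative values (not study data)
print(round(body_mass_index(70.0, 1.75), 1))
print(her2_status_from_fish(1.4), her2_status_from_fish(3.1))
```

In the study, FISH was only performed when IHC was equivocal, so a helper like `her2_status_from_fish` would apply to that subset of specimens only.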

Outcome classification
Although the presence of basal markers can significantly improve the prognostic value of the triple-negative phenotype [13], we did not have IHC data on CK5/6 and EGFR expression for this analysis. Thus, we were unable to further classify triple-negative cases into basal-like and non-basal-like breast tumors. Given this limitation, the tumor subtype groups in this analysis were defined as follows: ER positive and/or PR positive, and Her2 negative (luminal A); ER positive and/or PR positive, and Her2 positive (luminal B); ER negative, PR negative, and Her2 negative (triple negative); and ER negative, PR negative, and Her2 positive (Her2-overexpressing).
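The four subtype definitions above amount to a simple decision rule on three receptor results. A minimal sketch, assuming each marker is available as a boolean (True = positive) with None for a missing result; the function name is hypothetical. Any case with a missing marker returns None, mirroring the restriction of the analysis to cases with complete subtype information.

```python
from typing import Optional


def classify_subtype(er: Optional[bool], pr: Optional[bool],
                     her2: Optional[bool]) -> Optional[str]:
    """Hypothetical helper mapping ER, PR, and Her2 status to the four subtype groups."""
    if er is None or pr is None or her2 is None:
        return None  # incomplete marker data: excluded from the analysis
    hormone_receptor_positive = er or pr  # "ER positive and/or PR positive"
    if hormone_receptor_positive:
        return "luminal B" if her2 else "luminal A"
    return "Her2-overexpressing" if her2 else "triple negative"


# Usage: one example per subtype (illustrative marker combinations)
for markers in [(True, True, False), (False, True, True),
                (False, False, False), (False, False, True)]:
    print(markers, classify_subtype(*markers))
```

These are IHC-based surrogate groups; without CK5/6 and EGFR results, the "triple negative" label cannot be split into basal-like and non-basal-like tumors.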

Statistical analysis
Comparisons of demographic, reproductive, and lifestyle characteristics by cohort study and race/ethnicity were conducted using Pearson chi-square tests. Using the combined sample of 2544 breast cancer survivors, case-only odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression. The luminal A group was selected as the referent because the majority of invasive breast cancer cases are of this subtype. All models were adjusted for age at diagnosis, race/ethnicity, and study origin (Pathways or LACE), except when one of these covariates was the predictor of interest. We also examined whether the association between parity and tumor subtype varied by breastfeeding, and whether the association between BMI and tumor subtype varied by menopausal status, first by generating stratum-specific estimates and then by including an interaction term in the model to test for statistical significance. Associations with CIs excluding 1.00 or with P < 0.05 were considered statistically significant.
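The text reports ORs and 95% CIs but not the arithmetic behind them. A minimal sketch of the standard Wald-type conversions, assuming a fitted case-only logistic model (one subtype coded 1 versus luminal A coded 0) that yields a coefficient β and standard error SE; in practice these would come from whatever statistical package fits the model. For an interaction model logit = b0 + b1·X + b2·M + b3·X·M with a binary modifier M (for example, menopausal status coded 0/1), the OR for X is exp(b1) when M = 0 and exp(b1 + b3) when M = 1, and the interaction is tested with a Wald z test on b3. Function names and the coefficient values in the usage lines are hypothetical, not study estimates.

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def ci_excludes_one(lower: float, upper: float) -> bool:
    """Significance criterion used in the text: the 95% CI does not include 1.00."""
    return not (lower <= 1.0 <= upper)


def stratum_odds_ratios(b1: float, b3: float) -> Tuple[float, float]:
    """ORs for X in the M = 0 and M = 1 strata of a model with an X*M product term."""
    return math.exp(b1), math.exp(b1 + b3)


def wald_p_value(coef: float, se: float) -> float:
    """Two-sided P value for H0: coef = 0 under the normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Hypothetical coefficients for illustration only
or_, lo, hi = odds_ratio_ci(math.log(2.0), 0.10)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}", ci_excludes_one(lo, hi))
print(stratum_odds_ratios(math.log(2.0), math.log(1.5)))
print(round(wald_p_value(1.96, 1.0), 3))
```

Because the CI is built on the log-odds scale and then exponentiated, it is asymmetric around the OR, which is why ORs are usually reported with their exact lower and upper limits.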

Results
Demographic, reproductive, and lifestyle factors varied significantly by race/ethnicity in the combined studies (Table 1). African Americans (mean age = 56.2 years) and Asians (mean age = 54.8 years) tended to be diagnosed at a younger age, whereas whites tended to be diagnosed at an older age (mean age = 59.8 years). Asians were also less likely to be postmenopausal (59.1%) than whites (75.6%), African Americans (71.4%), and women of other races/ethnicities (72.9%). A positive family history of breast cancer was more common among whites (22.6%) and other races/ethnicities (24.1%) than among the remaining groups.

The majority of the case-only ORs were in the same direction as observed in the combined analysis, except for the association between BMI and the triple-negative subtype. In the LACE Study, triple-negative cases were more likely to have higher

Table 2 Distribution of breast cancer tumor subtypes by race/ethnicity in the combined LACE and Pathways Studies (n = 2544)

Table 4 Case-only odds ratios and 95% confidence intervals from logistic regression models of associations between breast cancer tumor subtypes and demographic, reproductive, and lifestyle risk factors, combined LACE and Pathways Studies (n = 2544)

Discussion
In a pooled analysis of 2544 breast cancer cases from two prospective cohort studies housed within a large health maintenance organization, we examined associations between breast cancer subtypes and various demographic, reproductive, and lifestyle factors. In case-case analyses with luminal A cases as the reference group, luminal B cases were more likely to be younger at diagnosis and less likely to consume alcohol, use HRT, or use OCs. Triple-negative cases tended to be younger at diagnosis and African American, and, if premenopausal, were more likely to be overweight or obese at diagnosis. Women with triple-negative tumors were also less likely to have breastfed for longer periods, and were more likely not to have breastfed if they had at least three children. Her2-overexpressing cases were more likely to be younger at diagnosis and Hispanic or Asian, and less likely to use HRT. We also found that Her2-overexpressing cases were more likely to be women with at least three children and no history of breastfeeding. These case-case observations suggest that associations with traditional breast cancer risk factors differ by tumor subtype.
Several studies have assessed risk factor profiles of tumor subtypes, including the Carolina Breast Cancer Study (CBCS; n = 1424 in situ and invasive cases) [22], the Polish Breast Cancer Study (PBCS; n = 804 invasive cases) [27], and a pooled analysis of two Washington State (WS) case-control studies (n = 1023 invasive cases) [42,43] [see Additional data file 1]. The CBCS and PBCS classified their triple-negative cases as basal-like or unclassified using CK5/6 and EGFR IHC expression data, whereas the WS study did not. The CBCS performed both case-case and case-control analyses, while the PBCS and WS study conducted case-control analyses only. Although we were unable to further classify triple-negative cases as basal-like or unclassified, our triple-negative cases, like the basal-like cases in the CBCS (case-case analysis) and PBCS (case-control analysis), were more likely to be younger at diagnosis and African American. We also observed that premenopausal triple-negative cases tended to have higher BMI, in agreement with the basal-like cases in the CBCS but not the PBCS, which found no association. Interestingly, the WS study (case-control analysis) reported a suggestive increase in risk of triple-negative tumors with increasing BMI among women currently using hormone therapy [42], but we did not observe such an association. The WS study (case-control analysis) also reported that breastfeeding for at least six months was associated with a reduced risk of triple-negative tumors [43]. Similarly, both the CBCS (case-case analysis) and our study found suggestive associations between shorter breastfeeding duration (less than four months) and triple-negative tumors.
Furthermore, both studies observed a strong positive association for triple-negative cases (basal-like cases in the CBCS) among women who had higher parity and had never breastfed; compared with luminal A cases, the CBCS reported a case-only OR of 1.9 (95% CI = 1.1 to 3.4) for having at least three children and no breastfeeding. The PBCS (case-control analysis) did not present data on breastfeeding in relation to tumor subtype.
As for luminal B and Her2-overexpressing cases, our results agree with those of the CBCS in that both subtypes tended to be younger at diagnosis than luminal A cases. In contrast to the CBCS results, we observed that these cases were less likely to use HRT and that luminal B cases were less likely to consume alcohol. No associations with these factors were observed in the PBCS, and the WS study did not separate luminal tumors into luminal A and luminal B subtypes. We found that Her2-overexpressing cases were more likely to be Hispanic or Asian, but not African American, an observation not reported in any of the other studies. In fact, the CBCS, which included only whites and African Americans, reported that Her2-overexpressing cases were slightly more likely to be African American. Finally, we observed that Her2-overexpressing cases were more likely to be women who had at least three children and had not breastfed, an association not seen in the CBCS.
Although our results largely agree with those of other studies, our study has several limitations.
Only case-case comparisons were conducted, and it must be emphasized that the associations reported here are all relative to having a luminal A tumor and should not be extended to risk of invasive breast cancer. Case-case analyses among tumor subtypes are a useful exploratory tool for examining etiologic heterogeneity between subtypes [53]. As previously mentioned, we had no data on the CK5/6 and EGFR tumor markers needed to further classify triple-negative tumors as basal-like or unclassified; however, with additional funding, we plan to conduct these IHC assays in triple-negative cases. Also, because the number of Her2-overexpressing tumors was limited (n = 94; 3.7%), results concerning this subtype should be interpreted with caution. Finally, although our study population of 2544 women with invasive breast cancer was more ethnically diverse (76.6% white, 6.1% African American, 7.8% Hispanic, 7.4% Asian, 2.1% other) than those of other studies that have examined breast cancer risk factors by tumor subtype, we were unable, unlike the CBCS, to examine risk factors separately within race/ethnicity groups (for example, among white and African American women) because of limited numbers. Our findings, especially those concerning Hispanic and Asian women, should be replicated in other population-based studies.
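Why a case-case OR describes heterogeneity rather than risk can be made precise under one standard assumption that this study could not check, since it had no cancer-free controls: that subtype-specific risk follows a polytomous logistic model comparing each subtype j with non-cases (Y = 0). The symbols α_j, β_j, and Y = A (luminal A) are introduced here for illustration. Subtracting the luminal A equation from the subtype j equation cancels the non-case term:

```latex
\begin{aligned}
\log\frac{P(Y=j \mid x)}{P(Y=0 \mid x)} &= \alpha_j + \beta_j x,\\
\log\frac{P(Y=j \mid x)}{P(Y=A \mid x)} &= (\alpha_j - \alpha_A) + (\beta_j - \beta_A)\,x,\\
\mathrm{OR}^{\text{case-case}}_{j\,\text{vs}\,A} &= e^{\beta_j - \beta_A} = \frac{\mathrm{OR}_j}{\mathrm{OR}_A}.
\end{aligned}
```

Under this model, a case-case OR of 1 means that x is associated equally with subtype j and luminal A (or with neither), not that x is unrelated to breast cancer risk.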

Conclusions
In summary, using a case-case analysis to assess associations between traditional breast cancer risk factors and breast cancer subtypes (luminal A, luminal B, triple negative, and Her2-overexpressing), we observed significant heterogeneity of associations by tumor subtype. These varying associations lend further support to the growing evidence that breast cancer is a heterogeneous disease, with subtypes defined by ER, PR, and Her2 expression that have distinct etiologic pathways and prognoses. Future research should focus on refining tumor subtypes into more homogeneous subgroups in order to best elucidate how risk factors vary by subtype. Important modifiable factors that may be related to the development of specific tumor subtypes include obesity and possibly breastfeeding (triple negative) and alcohol consumption (luminal B); no clear modifiable risk factor profile was apparent for the Her2-overexpressing subtype, owing to the limited sample size. Given this information, public health programs aimed at achieving a healthy weight and promoting breastfeeding might reduce the number of triple-negative tumors, which carry a poor prognosis, among all breast cancer cases, especially in the high-risk African American group.