Subjects
Subjects were drawn from a population-based case-control study, which has been described in detail previously [18]. The parent study consisted of women aged 50 to 74 years, born in Sweden and resident there between 1 October 1993 and 31 March 1995. An attempt was made to contact all incident cases of invasive breast cancer in this population. Cases were identified through six Swedish regional cancer registries, and written consent to be approached with a mailed questionnaire was requested from the women through their physicians. The participation rate, amongst 3,979 eligible cases detected, was 84%. Non-participation was attributed to either refusal by the physician (4%) or the patient (12%). Controls were frequency-matched to the cases by age. Of 4,188 controls who were randomly selected from a continuously updated Swedish register, 3,454 (82%) gave consent to participate in the study. Exclusions were made for women who were pre-menopausal (198 cases, 152 controls), or with unknown menopausal status (217 cases, 100 controls), or with a previous diagnosis of cancer (other than non-melanoma skin cancer or cancer in situ of the cervix) (112 cases, 91 controls). The final study group consisted of 2,818 cases and 3,111 controls. The ethical review board at the Karolinska Institute and the six ethical review boards in other regions of Sweden approved the study.
For the validation analysis, subjects were drawn from the population-based case-control MARIE (Mamma Carcinoma Risk factor Investigation) study which was carried out from August 2002 to September 2005 in two study regions in Germany (the Hamburg and Rhein-Neckar-Karlsruhe regions). Details of the study design can be found in Flesch-Janys et al. [19]. Briefly, the MARIE study included 3,464 postmenopausal and histologically confirmed incident breast cancer cases aged 50 to 74 at diagnosis with primary invasive or in situ tumours (International Classification of Diseases (ICD) 10: C50 and D05) and 6,657 controls, frequency matched by year of birth and study region. Two controls per case were randomly selected from the lists of residents provided by the population registries. For the present analysis, in situ cases were excluded. The study was approved by the ethics committees of the University of Heidelberg and the University of Hamburg. All study participants gave written informed consent.
Data collection
Data were obtained by means of an extensive mailed questionnaire requesting detailed information on established and possible breast cancer risk factors, including reproductive and menstrual history, family history of breast cancer, hormone replacement therapy (HRT) and anthropometric measures, such as body mass index (BMI). Information on lifestyle such as smoking (> 1 year or > 100 cigarettes), alcohol intake (g/day) and physical activity (none, less than one hour per week, one to two hours per week or more than two hours per week) was also collected from the questionnaire. Highest education level attained was available as a categorical variable (elementary school, junior secondary school, high school or university). Data on the consumption of coffee one year prior to interview, specified in cups per week, where a cup was equivalent to 1.5 dl, were also collected. Age at menopause was defined as the age of the last menstrual period or age at bilateral oophorectomy, if one year or more prior to data collection. The women were considered pre-menopausal if menopause occurred less than one year before data collection. Women with hysterectomy, menses due to HRT or missing information were considered post-menopausal if they had reached the 90th percentile of the age of natural menopause (54 years in current smokers and 55 years in non-smokers, regardless of case/control status), or otherwise as unknown. Subjects classified as post-menopausal in this manner (280 cases and 303 controls) were assigned an age at menopause according to their case/control and current smoking status corresponding to the mean age at natural menopause in the respective groups.
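The menopausal status rules above can be sketched as a small decision function. This is an illustrative reading of the stated rules, not the code used in the study: the function name, its arguments and the convention that a woman with no recorded menopause and no complicating factor is treated as still menstruating are assumptions made here.

```python
# 90th percentile of age at natural menopause, as stated in the text.
P90_CURRENT_SMOKER = 54
P90_NON_SMOKER = 55


def menopausal_status(age, menopause_age=None, undeterminable=False,
                      current_smoker=False):
    """Return 'post', 'pre' or 'unknown' (hypothetical helper).

    age            -- age at data collection, in years
    menopause_age  -- age at last menstrual period or bilateral oophorectomy;
                      None if no menopause was reported
    undeterminable -- True for hysterectomy, menses due to HRT or missing data
    current_smoker -- selects the smoking-specific 90th percentile
    """
    if undeterminable:
        # Age-threshold rule for women whose menopause cannot be observed.
        threshold = P90_CURRENT_SMOKER if current_smoker else P90_NON_SMOKER
        return "post" if age >= threshold else "unknown"
    if menopause_age is None:
        # Assumption: no reported menopause means still menstruating.
        return "pre"
    # Menopause must have occurred one year or more before data collection.
    return "post" if age - menopause_age >= 1 else "pre"
```

For example, under these assumptions a 54-year-old current smoker with a hysterectomy would be classified as post-menopausal, whereas a 54-year-old non-smoker with a hysterectomy would be classified as unknown.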
Information regarding the retrieval of hormone receptor status from the medical records of all participants from surgical and oncological units throughout Sweden has been presented in detail elsewhere [20, 21]. Although ER and PR content of breast tumours were routinely measured in Sweden at the time of the study, this was often not performed on tumours ≤ 1 cm in size due to lack of tumour tissue. Quantitative receptor content was thus only available for 65.4% (1,835 women) of the tumours for both ER and PR.
For the validation study (MARIE), information on potential risk factors for breast cancer was obtained in face-to-face interviews using a standardized questionnaire. Nutritional data were collected using a food frequency questionnaire with 176 food items regarding dietary habits in the year prior to date of diagnosis for cases and date of food frequency questionnaire completion for controls. The consumption of caffeine-containing coffee was calculated in cups per day based on the information on both portion size (non-consumer, 0.5, 1, 2, 3 cups) and frequency (non-consumer, once per month or less, two to three times per month, once per week, two to three times per week, four to six times per week, once per day, twice per day, three to four times per day, five times per day or more). The analysis was limited to women who answered both questions on portion size and frequency of caffeine-containing coffee consumption. The final study group comprised 5,395 controls and 2,651 cases. Information on tumour characteristics, such as ER and PR status, was obtained from medical records.
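As a rough illustration of how portion size and frequency could be combined into cups per day, the sketch below multiplies the portion size by a number of occasions per day. The per-day values assigned to each frequency category (for example, 2.5/30 for "two to three times per month") are illustrative midpoints chosen here, not values reported for the MARIE study, and the names `FREQUENCY_PER_DAY` and `cups_per_day` are hypothetical.

```python
# Assumed occasions per day for each questionnaire frequency category.
# These numeric scores are illustrative midpoints, not study values.
FREQUENCY_PER_DAY = {
    "non-consumer": 0.0,
    "once per month or less": 1 / 30,
    "two to three times per month": 2.5 / 30,
    "once per week": 1 / 7,
    "two to three times per week": 2.5 / 7,
    "four to six times per week": 5 / 7,
    "once per day": 1.0,
    "twice per day": 2.0,
    "three to four times per day": 3.5,
    "five times per day or more": 5.0,
}


def cups_per_day(portion_cups, frequency):
    """Cups per day = portion size (cups per occasion) x occasions per day."""
    return portion_cups * FREQUENCY_PER_DAY[frequency]


# Usage: a woman reporting 2 cups per occasion, twice per day.
daily_cups = cups_per_day(2, "twice per day")
```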
Statistical analysis
The variable for coffee consumption was categorized as follows: one cup or less per day; more than one to three cups/day; more than three to five cups/day; more than five cups/day. These categories were based on the distribution within the control group. Since very few women abstained from coffee, we combined abstainers and low consumers (up to one cup per day) into a single category. Women who consumed one cup or less of coffee per day served as the reference group for all regression analyses.
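The categorization can be sketched as follows for the Swedish data, where consumption was reported in cups per week. This is a minimal illustration, assuming that weekly cups are divided by seven and that the highest category begins strictly above five cups/day; the function and label names are hypothetical. The returned score (0 to 3) is the kind of value that could serve as a semi-continuous variable in a trend test.

```python
# Category labels in increasing order; index 0 is the reference group.
CATEGORY_LABELS = ("<=1 cup/day", ">1-3 cups/day", ">3-5 cups/day", ">5 cups/day")


def coffee_category(cups_per_week):
    """Return the category score (0-3) for a weekly coffee intake."""
    cups_per_day = cups_per_week / 7
    if cups_per_day <= 1:
        return 0  # abstainers and low consumers combined (reference)
    if cups_per_day <= 3:
        return 1
    if cups_per_day <= 5:
        return 2
    return 3


# Usage: 28 cups per week is 4 cups/day, i.e. the third category.
label = CATEGORY_LABELS[coffee_category(28)]
```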
Unconditional logistic regression models, adjusted for the matching factor, age at enrolment in years (continuous), were applied to evaluate whether established or possible breast cancer risk factors (including coffee consumption) had significantly different distributions/means (using the Wald test) between breast cancer cases and controls in this study.
The relationships between coffee consumption and other breast cancer risk factors were explored in the control population by treating coffee consumption as a covariate and using linear regression analysis for continuous risk factor variables (age at menarche (years), age at menopause (years), BMI (kg/m²) and alcohol consumption (g/day)); logistic regression analysis for binary risk factor variables (HRT, family history of breast cancer and smoking); or proportional odds logistic regression for categorical risk factor variables (parity/age at first birth (nulliparous; parous and age at first birth < 25 years; parous and age at first birth ≥ 25 and < 30 years; parous and age at first birth ≥ 30 years), highest education level (elementary school, junior secondary school, high school and university), and recent physical activity one year before enrolment (none, less than one hour per week, one to two hours per week, more than two hours per week)). The Wald test was used to determine the statistical significance of an overall linear trend for the association between coffee consumption, treated as a semi-continuous variable, and the breast cancer risk factor in the models fitted.
For models of breast cancer risk, covariates were considered to be potential confounders if they were found to be associated with both coffee consumption and breast cancer risk, and caused a shift of > 10% in the estimate for any coffee category when added to the model. ORs and corresponding 95% CI were estimated for the multivariate logistic regression models fitted to examine breast cancer risk, overall and stratified by ER and PR tumour subtypes. Three models were fitted for each outcome: adjusted for the matching factor only (age at enrolment); adjusted for age at enrolment, HRT, smoking and education; and adjusted for age at enrolment, HRT, smoking, education and daily alcohol consumption. The Wald test was used to determine the statistical significance of an overall linear trend for the association between coffee consumption, treated as a semi-continuous variable, and breast cancer risk.
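The confounder criterion above can be illustrated with a short check. This is a sketch under stated assumptions: the text does not say whether the > 10% shift was measured on the OR scale or the log-odds scale, so the OR scale is assumed here, and the function name and the example ORs are hypothetical.

```python
def is_potential_confounder(ors_without, ors_with,
                            associated_with_coffee, associated_with_risk,
                            threshold=0.10):
    """Apply the change-in-estimate rule to one candidate covariate.

    ors_without / ors_with -- coffee-category ORs before and after adding
                              the covariate (same category order)
    """
    if not (associated_with_coffee and associated_with_risk):
        return False
    # Relative shift per category, measured on the OR scale (assumption).
    shifts = [abs(after - before) / before
              for before, after in zip(ors_without, ors_with)]
    return any(shift > threshold for shift in shifts)


# Usage with made-up ORs: the last category shifts by 12.5%.
flag = is_potential_confounder([0.95, 0.90, 0.80], [0.96, 0.92, 0.70],
                               associated_with_coffee=True,
                               associated_with_risk=True)
```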
Since ER and PR status are strongly correlated (logistic regression P-value for association < 2.0 × 10⁻¹⁶), we assessed the extent to which coffee consumption drives each of the two tumour characteristics by fitting multinomial regression models for five outcomes (controls, ER-negative and PR-negative, ER-negative and PR-positive, ER-positive and PR-negative, ER-positive and PR-positive). We compared a model without parameter restrictions to models with parameters restricted such that coffee consumption was only allowed to be associated with one tumour characteristic at a time. Likelihood ratio tests, with two degrees of freedom, were used to test the null hypothesis that associations between coffee consumption and ER/PR status were due only to an association with either ER or PR status.
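The likelihood ratio comparison of a restricted and an unrestricted model can be sketched as below. The test statistic is twice the difference in maximized log-likelihoods, and for two degrees of freedom the chi-square tail probability has the closed form exp(-x/2), so no statistical library is needed. The function name and the log-likelihood values in the usage line are hypothetical and are not results from the study.

```python
import math


def lrt_two_df(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic and P-value for a 2-df comparison."""
    statistic = 2.0 * (loglik_unrestricted - loglik_restricted)
    # Guard against tiny negative values caused by numerical optimization.
    statistic = max(statistic, 0.0)
    # Survival function of the chi-square distribution with 2 df.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


# Usage with made-up log-likelihoods for the nested models.
statistic, p_value = lrt_two_df(-1002.0, -1000.0)
```

A small P-value would indicate that restricting coffee consumption to act through a single receptor fits the data worse than the unrestricted model.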
Associations between coffee consumption and hormone receptor status were evaluated in a case-only analysis, by fitting binary logistic regression models (for ER and PR status), treating ER or PR status as dependent variables, with coffee consumption included as a covariate. ORs and corresponding 95% CI were estimated for each coffee consumption category. P-values representing heterogeneity were obtained by performing one degree of freedom trend tests, treating coffee consumption as a semi-continuous variable. As there exists prior evidence that certain tumour characteristics such as ER status are associated with age at diagnosis [22], and that coffee consumption is significantly associated with age at diagnosis [23], every model fitted in the case-only analysis was also adjusted for age at diagnosis in years (continuous).
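For readers less familiar with how ORs, 95% CI and one degree of freedom Wald P-values follow from a fitted logistic regression coefficient, the sketch below shows the standard calculation from an estimate and its standard error. The function name and the numeric inputs are hypothetical; in practice these quantities would come from the fitted model output.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()            # mean 0, standard deviation 1
Z_975 = STANDARD_NORMAL.inv_cdf(0.975)    # about 1.96, for a 95% interval


def wald_summary(beta, se):
    """Return (OR, lower 95% limit, upper 95% limit, two-sided P-value)."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - Z_975 * se)
    upper = math.exp(beta + Z_975 * se)
    z = beta / se
    # Two-sided Wald test with one degree of freedom.
    p_value = 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))
    return odds_ratio, lower, upper, p_value


# Usage with a made-up coefficient for a trend (semi-continuous) variable.
odds_ratio, lower, upper, p_value = wald_summary(-0.10, 0.04)
```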
The validation analysis based on the MARIE study population was performed using Proc LOGISTIC in SAS version 9.2 (SAS Institute, Cary, NC, USA). The variable for coffee consumption was categorized in the same way as in the Swedish study, with women who consumed one cup or less of coffee per day as the reference group. Unconditional logistic regression models were used to estimate ORs and corresponding 95% confidence intervals. To test for trend, we entered the four categories of cups per day in the model statement as a single continuous scored variable.
All statistical computations for the Swedish study were performed using R version 2.8 [24]. All P-values presented are two-sided, with statistical significance assessed at the 5% level.