Association of germline genetic variants with breast cancer-specific survival in patient subgroups defined by clinic-pathological variables related to tumor biology and type of systemic treatment

Background Given the high heterogeneity among breast tumors, associations between common germline genetic variants and survival that may exist within specific subgroups could go undetected in an unstratified set of breast cancer patients. Methods We performed genome-wide association analyses within 15 subgroups of breast cancer patients based on prognostic factors, including hormone receptors, tumor grade, age, and type of systemic treatment. Analyses were based on 91,686 female patients of European ancestry from the Breast Cancer Association Consortium, including 7531 breast cancer-specific deaths over a median follow-up of 8.1 years. Cox regression was used to assess associations of common germline variants with 15-year and 5-year breast cancer-specific survival. We assessed the probability of these associations being true positives via the Bayesian false discovery probability (BFDP < 0.15). Results Evidence of associations with breast cancer-specific survival was observed in three patient subgroups, with variant rs5934618 in patients with grade 3 tumors (15-year-hazard ratio (HR) [95% confidence interval (CI)] 1.32 [1.20, 1.45], P = 1.4E−08, BFDP = 0.01, per G allele); variant rs4679741 in patients with ER-positive tumors treated with endocrine therapy (15-year-HR [95% CI] 1.18 [1.11, 1.26], P = 1.6E−07, BFDP = 0.09, per G allele); variants rs1106333 (15-year-HR [95% CI] 1.68 [1.39,2.03], P = 5.6E−08, BFDP = 0.12, per A allele) and rs78754389 (5-year-HR [95% CI] 1.79 [1.46,2.20], P = 1.7E−08, BFDP = 0.07, per A allele), in patients with ER-negative tumors treated with chemotherapy. Conclusions We found evidence of four loci associated with breast cancer-specific survival within three patient subgroups. There was limited evidence for the existence of associations in other patient subgroups. However, the power for many subgroups is limited due to the low number of events. Even so, our results suggest that the impact of common germline genetic variants on breast cancer-specific survival might be limited. Supplementary Information The online version contains supplementary material available at 10.1186/s13058-021-01450-7.


Introduction
Inherited common genetic variation is likely to influence survival in breast cancer patients [1]. Results from pre-clinical experiments have shown different metastatic behaviors in mice with different genetic backgrounds [2][3][4][5][6][7]. In addition, familial studies of breast cancer patients have shown that women with a first-degree relative with a poor prognosis breast cancer have a worse prognosis compared to women with a first-degree relative with a good prognosis cancer [8]. Moreover, genome-wide and candidate gene association studies have discovered common genetic variants associated with specific subtypes of breast cancer based on the expression of the estrogen receptor (ER) [9][10][11], progesterone receptor (PR), and the amplification of the human epidermal growth factor receptor 2 (HER2) [12,13], which are known breast cancer prognostic factors [14,15]. Finally, a number of studies have suggested that specific common germline genetic variants affect breast cancer prognosis both overall and within subgroups of patients [16][17][18][19][20][21][22][23][24].
Despite the supporting evidence, it remains challenging to identify common germline variants associated with breast cancer-specific survival. This may partially be explained by the good prognosis of breast cancer patients, which leads to underpowered analyses. Even large studies based on worldwide consortia cannot reach the number of breast cancer deaths necessary to detect small to moderate associations at a genome-wide significant level [19,20,25]. However, breast cancer is a heterogeneous disease, and it is possible that stronger associations between common germline variants and breast cancer-specific survival are present in certain patient subgroups, but cannot be detected in breast cancer overall. Previous studies provide modest evidence supporting this hypothesis [16,19,20].
The aim of our study was to evaluate the evidence for associations of inherited common genetic variants with breast cancer-specific survival within more homogeneous subgroups of breast cancer patients, defined by prognostic factors representative of tumor biology and/ or by the type of systemic treatment. To this end, we performed genome-wide association analyses within clinically relevant, defined subgroups of patients based on hormone receptors, tumor grade, age at diagnosis, and type of systemic treatment [26,27]. We also explored the subgroup-specific associations identified by previous studies [16,19,23,28,29], to confirm or refute those results.

Study sample
We selected female breast cancer patients of European ancestry from studies participating in the Breast Cancer Association Consortium (BCAC). We included patients with available information about vital status and number of years from diagnosis to last follow-up who were diagnosed with a primary invasive breast cancer of any stage and were at least 18 years old at diagnosis. The final study sample consisted of 91,686 breast cancer patients from 70 BCAC studies. A description of the included studies is given in Additional file 1: Supplementary Table  S1.
Information about histopathology, survival, and treatment was collected by individual studies and pooled and harmonized at the Netherlands Cancer Institute before incorporation into the BCAC database at the University of Cambridge (version 12, July 2019). All studies were approved by the relevant ethics committees and informed consent was obtained from all patients.

Patient subgroups
The subgroups of interest were defined based on age at diagnosis, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER2) status, tumor grade, and the use and type of systemic treatment, as available in the BCAC database. For age at diagnosis and tumor grade, we focused on subgroups characterized by worse prognosis. We thus defined 15 subgroups: (a) patients younger than age 40 years at diagnosis; (b) patients with grade 3 tumors; (c) patients with ER-positive (ER+) tumors, who received endocrine therapy (any kind); (d) patients diagnosed with ER-negative (ER−) tumors, who received chemotherapy (any kind); (e) patients with tumors that were hormone receptor (HR) positive (ER+ or PR+) and HER2-negative (HER2−); (f) patients with HR-positive (HR+), HER2− tumors, who received chemotherapy (any kind); (g) patients with HR+, HER2− tumors, who did not receive chemotherapy; (h) patients with HR+, HER2positive (HER2+) tumors; (i) patients with HR-negative (HR−), HER2+ tumors, (j) patients with HR−, HER2− tumors; (k) patients who received Tamoxifen; (l) patients who received an aromatase inhibitor; (m) patients who received a Cyclophosphamide Methotrexate Fluorouracil (CMF)-like chemotherapy regimen; (n) patients who received taxanes; (o) patients who received anthracyclines.
The rationale and references to the literature supporting the choice of each subgroup for inclusion in a genome-wide association study on survival are given in Additional file 1: Supplementary Table S2. We did not include the subgroup of HER2+ tumors treated with Trastuzumab because of a relatively small number of patients and low event rate, leading to analyses that are more underpowered than those presented.
Patients with metastatic breast tumors at diagnosis (1.1 % of all included patients) were excluded from the subgroup analyses whose definition was based on the use and type of systemic therapy as generally they are treated with palliative intent [15,30,31].
In addition to the subgroup analyses, we also performed a genome-wide analysis of 15-year breast cancerspecific survival in all breast cancer patients. We performed this analysis to evaluate whether associations between common germline variants and breast cancerspecific survival in subgroups could be detected in the full dataset of patients. Genome-wide analyses for survival (unstratified by subtype) were previously performed [20] based on 12 GWAS datasets, but these included fewer patients from iCOGS and OncoArray (n = 84,757), with shorter follow-up, than were available in the current dataset. We focused our analyses on the iCOGS and OncoArray datasets, because the remaining 10 GWAS datasets used in the previous study did not include information about tumor characteristics, beyond ER status, or treatment, which were crucial for the subgroup analyses.
Due to the presence of missing values in the variables used to define the subgroups, not all patients could be classified by each subgroup. The number of patients included in each subgroup, together with the number of breast cancer-specific deaths, patient/tumor characteristics, treatment, and follow-up information, are shown in Additional file 1: Supplementary Tables S3-S4, and Additional file 2: Supplementary Table S5.

Imputation of missing values in clinical and pathological variables
For secondary adjusted analyses, we imputed missing values in the clinical and pathological variables using the Multiple imputation by Chained Equations (MICE) R package (v. 3.2.0), as described in Additional file 2: Supplementary Methods. A list of imputed variables and  corresponding percentages of missing values and imputation methods is provided in Additional file 2: Supplementary Table S6.
Genotyping and imputation of genetic variants, ancestry analysis, and quality controls Methods related to genotyping and genotype imputation have been described previously [17,18,20]. In brief, patients were genotyped with two different arrays: iCOGS and OncoArray [17,32]. Only samples that were inferred to have European ancestry, based on genotype data, were included in the analyses. Non-genotyped variants were initially imputed based on the 1000 Genomes Project Phase 3 (October 2014) release as reference panel. More recently, non-genotyped single-nucleotide variants (SNVs) were re-imputed using a reference panel from the Haplotype Reference Consortium (HRC) [33] in order to improve imputation quality, especially for rarer variants. Analyses were performed on genotyped variants or imputed variants with a minor allele frequency (MAF) > 0.01. Imputed variants were included in the analyses if they had imputation r 2 > 0.7. Approximately 10 million variants were analyzed.

Statistical analyses
The outcome in the analyses was breast cancer-specific survival (time to death due to breast cancer). Hazard ratios (HR) and 95% confidence intervals (CI) were estimated using delayed entry Cox regression models, where the time at risk was considered as starting from the time of study entry if the study entry was after diagnosis (22.9% within 1 year after diagnosis, and 27.3% more than 1 year after diagnosis) and from diagnosis if the time of study entry was missing (24.5%), at diagnosis (16.9%) or before diagnosis (8.4%). The time-to-event was right censored at the time of last follow-up, or at 15 years after diagnosis, whichever came first. Patients who died of unknown cause or causes other than breast cancer were censored at the time of death if death occurred before 15 years from diagnosis or at 15 years otherwise.
With reference to the results of Early Breast Cancer Trialists' Collaborative Group (EBCTCG) [34], we additionally performed analyses within the subgroups whose definition was based on the use and type of systemic therapy, where we restricted the maximum follow-up time to 5 years after diagnosis (Additional file 2: Supplementary Table S5). The goal of those analyses was to investigate the potential shortterm effects of germline variants on patients who received specific types of systemic treatment, since the effect might not be constant over time and treatment plans tend to focus on the first 5 years after diagnosis [15,31].
Cox regression analysis was performed within each subgroup of interest, separately, and was stratified by country. All the analyses were performed separately by genotyping platform (iCOGS vs OncoArray), and the results were combined via a fixed-effects meta-analysis. The standard errors of the HR estimates were re-computed based on the likelihood ratio test statistic, as done previously [20] ( Figs. 1 and 2). For variants that satisfied the inclusion criteria (MAF>0.01 and r 2 > 0.7) on only one genotyping platform, we included the result for that specific platform (Tables 1 and 2). However, for variants with an association P < 5E−08, we also computed HR and 95% confidence interval in the other genotyping platform to verify that the direction of the association was the same (Additional file 2: Supplementary Table S7).
Inflation of the likelihood ratio test statistics was estimated, within each subgroup, by dividing the median of the observed test statistics values by the median of a χ 2 1 distribution (Additional file 2: Supplementary Figures S1 and S2). To assess the noteworthiness of the observed associations, we made use of the Bayesian false discovery probability (BFDP) measure [35]. To compute BFDPs, we set the prior probability of true association to 10 −4 [36,37], as done previously [20], and chose the prior distribution of the log hazard ratio of interest (effect size of a variant) to be a Normal distribution with mean 0 and standard error equal to 0.2 [36]. We describe associations with BFDP < 0.15 as "noteworthy" [20]. For each noteworthy result at a prior of 10 −4 , we also provided BFDPs under two, more restrictive, prior probabilities of true association (10 −5 and 10 −6 ; Additional file 3: Supplementary Tables S8-S9) [36,37]. In addition, we estimated the power to detect genetic variant associated with 15-year and 5-year breast cancer-specific survival by subgroup (Additional file 3: Supplementary Table S10 and S11) as described in Additional file 2: Supplementary Methods.
For each genome-wide significant (P < 5E−08) [38] and/or noteworthy (BFDP < 0.15) association observed in the primary unadjusted subgroup analyses, we performed secondary analyses adjusted for age at diagnosis, tumor characteristics, and type of systemic treatment not used in the definition of the specific subgroup in which the association was detected (Tables 1 and 2). Secondary adjusted analyses were performed to account for residual heterogeneity; we used imputed covariates in order to keep the same sample size.
For each genome-wide significant or noteworthy association in the primary unadjusted analyses, we looked at the functional annotation of the surrounding genomic area, using the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS) tool [39] (Additional file 2: Supplementary Figures S5 and S6). We also tested whether the expression of the nearest genes correlated with distant metastasis-free survival in breast cancer patients using KMplotter [40,41].
PancanQTL [42] was used to identify cis-expression quantitative trait locus (eQTLs), trans-eQTLs, and survival eQTLs in breast cancer to see whether the genome-wide significant or noteworthy genetic variants from the primary analyses could be linked with the expression levels of genes affecting survival. In addition, for all the genome-wide significant and/or noteworthy associations detected in the primary analyses, we . The x-axis shows the chromosome number, where chromosome 23 represents the X chromosome. The red line represents the genome-wide significant threshold 5E−08, the blue line corresponds to the threshold 1E−05. Breast cancer patients included in the analysis in panel: a younger than age 40 years at diagnosis; b diagnosed with a grade 3 tumor; c diagnosed with an ER+ tumor and treated with (any) endocrine therapy; d diagnosed with a ER− tumor and treated with (any) chemotherapy; e diagnosed with an ER+ or PR+, and HER2− tumor; f diagnosed with an ER+ or PR+, and HER2− tumor treated with (any) chemotherapy; g diagnosed with an ER+ or PR+, and HER2− tumor not treated with chemotherapy; h diagnosed with an ER+ or PR+, and HER2+ tumor; i diagnosed with an ER− and PR− and HER2+ tumor; j diagnosed with an ER− and PR− and HER2− tumor; k treated with tamoxifen; l treated with aromatase inhibitor; m treated with CMF-like chemotherapy; n treated with taxanes; o treated with anthracyclines; p all searched the GWAS catalog [43] to see whether there was already evidence of those being associated with breast cancer or other traits.

Results
After a median follow-up of 8.1 years, there were a total of 7531 breast cancer deaths among 91686 breast cancer patients (Additional file 1: Supplementary Tables S3-S4 and Additional file 2: Supplementary Table S5).
In the 15-year breast cancer-specific survival analyses, power for detecting genome-wide significant associations was < 0.45 for effect sizes (HRs) < 1.20 in all subgroups investigated and for all minor allele frequencies. The power was highest in the subgroups of grade 3 tumors, ER+ tumors treated with endocrine . The x-axis shows the chromosome number, where chromosome 23 represents the X chromosome. The red line represents the genome-wide significant threshold 5E−08, the blue line corresponds to the threshold 1E−05. Breast cancer patients included in the analysis in panel: a diagnosed with a ER+ tumor and treated with (any) endocrine therapy; b diagnosed with a ER− tumor and treated with (any) chemotherapy; c diagnosed with a ER+ or PR+, and HER2− tumor treated with (any) chemotherapy; d diagnosed with a ER+ or PR+, and HER2− tumor not treated with chemotherapy; e treated with tamoxifen; f treated with aromatase inhibitor; g treated with CMF-like chemotherapy; h treated with taxanes; i treated with anthracyclines Abbreviations: Chr chromosome, ER estrogen receptor, PR progesterone receptor, HER2 human epidermal growth factor receptor 2, ET Endocrine therapy, CT chemotherapy, AAF alternative allele frequency, HR hazard ratio, CI confidence interval, BFDP Bayesian False Discovery Probability Note: BFDP is computed assuming the prior probability of true association equal to 10 -4 for all variants, which implies a number of expected true associations in the order of 10 2 . Results with BFDP<0.15 in the adjusted analyses are bolded. a Reference/Alternative alleles, b Analyses only include OncoArray data since the variants had imputation r 2 <0.7 on iCOGS. More detailed analyses are reported in Table 2 and Supplementary Table 7 Table S10). In the 5-year breast cancer-specific survival analyses, power was highest in the subgroups of patients with an ER− tumor who received chemotherapy, patients who received tamoxifen, and patients who received anthracyclines (Additional file 3: Supplementary Table S11). Genome-wide significant and/or noteworthy associations with 15-year or 5-year breast cancer-specific survival were observed in the unadjusted analyses based on all patients (Additional file 2: Supplementary Table S7) and analyses of eight out of the 15 subgroups investigated (Tables 1 and 2; Figs. 1 and 2). The genomic inflation factor of the unadjusted genome-wide analyses varied from 0.981 to 1.028 (Additional file 2: Supplementary Figures S1-S4); it is therefore unlikely that the association results were affected by cryptic population substructure.
Two genome-wide significant associations were observed in the unstratified analysis based on patients genotyped using OncoArray (Additional file 2: Supplementary Table S7), namely variants rs57714252 (P = 4.7E−08) and rs4129285 (P = 4.9E−08), both situated in an intergenic region of chromosome 4 (Additional file 2: Supplementary Figure S5). These results were only based on the OncoArray data, since on iCOGS the variants did not satisfy the inclusion criteria for genotypes (iCOGS imputation r 2 = 0.62). The corresponding estimates in the iCOGS data were in the opposite direction compared to the OncoArray estimates, and the results from the meta-analysis were not genome-wide significant and showed a large BFDP (Additional file 2: Supplementary Table S7).
Genome-wide significant associations were observed in the analysis restricted to patients diagnosed with a grade 3 tumor, with five correlated variants (Tables 1  and 2 Table 2). The meta-analysis result remained substantially unchanged after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy (HR [95% CI] 1.31 [1.18, 1.44], P = 7.9E−08, BFDP = 0.05; Table 2). The variant was not associated with the outcome in lower-grade tumors or in all patients combined (heterogeneity by grade P = 1.5E−03; Additional file 2: Supplementary Figure S7). All the five variants overlap chromatin features H3K4me3 and H3K27ac (associated with active transcription start sites) in multiple mammary cell types from normal breast tissue (Additional file 2: Supplementary Figure S8) Figure S9). However, there was no evidence of association with TBL1X expression in normal breast tissue with any of the variants identified in our genomewide analyses (Additional file 2: Supplementary Figure  S10).
In the same subgroup of grade 3 tumors, we observed a noteworthy, non-genome-wide significant association with variant rs66871326, located on chromosome 2 in an intron of C2orf80. For variant rs66871326, the alternative A allele was associated with decreased risk of breast cancer death (HR [95% CI] 0.85 [0.80,0.90], P = 2.1E−07, BFDP = 0.11). The corresponding BFDP increased to 0.49 after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy (Table 1).
We identified six variants on chromosome 15 with genome-wide significant associations within the subgroup of patients diagnosed with an ER+ or PR+, HER2 − tumor (Table 1). We identified two independent variants, namely rs8030394 and rs112813972, both situated in an intronic region of THSD4 (Additional file 2: Supplementary Figure S5). For the most significant variant, rs8030394, the T allele was associated with increased risk of breast cancer death (HR [95% CI]: 2.47 [1.81, 3.37], P = 1.1E−08, BFDP = 0.42). For the second variant, rs112813972, the C allele was associated with decreased risk of death (HR [95% CI] 0.40 [0.28,0.55], P = 4.0E−08, BFDP = 0.70). These associations were not genome-wide significant after adjusting for age at diagnosis, additional tumor characteristics, and treatment with (neo)adjuvant chemotherapy, and the corresponding BFDPs increased to 0.72 and 0.90, respectively (Table 1).
We observed genome-wide significant associations from the 15-year breast cancer-specific analyses in the subgroups of patients with an ER+ or PR+ and HER2− tumor who did and did not receive chemotherapy. Three correlated variants on chromosome 2 were identified within the subgroup of patients who received chemotherapy. The most significant variant, rs62192052, is located in an intronic region of DNER (Additional file 2: Supplementary Figure S5) and was associated with decreased risk of death (HR [95% CI] 0.15 [0.08, 0.28], P = 2.6E−09, per T allele; Table 1). Although the result remained genome-wide significant after adjusting for age at diagnosis and additional tumor characteristics, the BFDP from both the unadjusted and adjusted analysis was ≥ 0.99 (Table 1; Supplementary Table S8), indicating that this association is almost certainly a false positive. Variant rs56248395, located on chromosome 11 in an intron of NAV2 (Additional file 2: Supplementary Figure  S5), was associated with breast cancer death in the subgroup of patients with an ER+ or PR+ and HER2− tumor who did not receive chemotherapy (HR [95% CI] 2.33 [1.72,3.15], P = 4.8E−08, per T allele; Table 1). This result had BFDP ≥ 0.59 and was only based on the OncoArray data, since on iCOGS the variants did not satisfy the inclusion criteria for genotypes (iCOGS imputation r 2 = 0.66; Additional file 2: Supplementary  Figure S11). The last association of interest was observed in the subgroup of patients who received anthracyclines with intergenic variant rs34072391 on chromosome 7, but was not noteworthy after adjustment for additional prognostic factors (Table 1).
In the 5-year survival analyses focused on the treatment subgroups, we observed two genome-wide significant associations within the subgroup of patients diagnosed with an ER− tumor who received chemotherapy. The most significant variant was rs78754389, located on chromosome 4 in an intronic region of gene ARAP2 (Additional file 2: Supplementary Figure S5 Figure S12). The second genome-wide significant variant was rs117685664, located on chromosome 8 (HR [95% CI] 0.26 [0.16,0.42], P = 4.6E−08, per T allele). This association was not genome-wide significant after accounting for the age at diagnosis and additional tumor characteristics (Table 1), and the corresponding BFDPs from unadjusted and adjusted analysis were 1.00 and 0.99, respectively, indicating a false positive finding.
We also observed one additional noteworthy but not genome-wide significant association in the 5-year breast cancer-specific survival analyses in the subgroup of patients who received Tamoxifen  We did not identify any genome-wide significant or noteworthy association in any of the remaining seven subgroups investigated (Figs. 1 and 2). In addition, none of the above reported associations have a BFDP < 0.15 when considering 10 −6 as prior probability of true association (Additional file 3: Supplementary Tables S8). Moreover, none of the subgroup-specific genome-wide significant associations detected by previous studies using a smaller version (both in terms of number of cases and of length of follow-up) of the iCOGS and/or OncoArray BCAC datasets were replicated at P < 0.001 (Additional file 3: Supplementary Table S12).

Discussion
We investigated the association of over 10 million common germline genetic variants with breast cancerspecific survival within 15 patient subgroups based on prognostic factors representative of tumor biology or related to the type of systemic treatment. Our hypothesis was that focusing on more homogeneous subgroups of breast cancer patients might reveal otherwise undetected associations. Besides type of systemic treatment, the definition of the subgroups was based on current clinically used biological characteristics for tumor subtyping and treatment decisions: patient's age, tumor histological grade, ER, PR, and HER2 status ( [31]; Supplementary table S2). We did not have gene expression or copy number aberration data available to classify tumors on the basis of specific biological processes [44,45]; however, relevant survival differences have been reported among the four subtypes based on ER, PR, and HER2 status [27,46,47].
A concern about performing GWAS on several subgroups of patients is the increased proportion of false discoveries, also known as type I errors. For this reason and to overcome additional limitations of the association p values [35][36][37], we made use of the BFDP approach and only considered as robust candidates those associations with BFDP < 0.15 at a prior probability of 10 −4 .
We found evidence of four loci potentially associated with breast cancer survival: one in the subgroup of patients diagnosed with a grade 3 tumor, one in the subgroup of patients with an ER+ tumor and treated with endocrine therapy, and two in the subgroup of patients with an ER− tumor and treated with chemotherapy.
The most significant variant identified in the subgroup of grade 3 tumors, rs5934618, is situated in intron 1 of TBL1X, a gene which encodes the Transducin (beta)-like 1X-linked protein. Both TBL1X and the closely related gene TBLR1 have been implicated in the activation of the Wnt/beta-catenin signaling pathway, which has been reported to be overactivated in the progression and proliferation of several tumors, including breast tumors, where it has been linked with reduced overall survival [48][49][50]. Rs5934618 and the other four correlated variants identified in our genome-wide analysis overlap with chromatin features H3K4me3 and H3K27ac in normal breast; these histone marks are generally characteristics of gene promoters and/or enhancers and might indicate that one or more of these variants act through modulating expression of TBL1X. There was no direct evidence that any of these variants are expression singlenucleotide polymorphisms (eSNPs) for TBL1X, but this might reflect the tissues examined or that the variants only regulate the gene in a specific context.
The remaining three variants potentially associated with breast cancer survival were as follows: rs4679741, identified in the subgroup of patients diagnosed with an ER+ tumor and treated with endocrine therapy; rs1106333 and rs78754389, identified in the subgroup of patients diagnosed with an ER− tumor and treated with chemotherapy. For variant rs4679741, it is unclear which the potential target genes might be, while there was no evidence linking the other two variants to the expression of the closest genes in breast cancer nor evidence of association between those genes and survival within the specific subgroups of breast cancer patients. Nevertheless, there are several mechanisms through which the four identified variants could affect survival. For example, they could act through regulation of an unannotated long noncoding RNA [51,52] or microRNA [53,54]. Further functional studies including epigenetic mechanisms are needed in order to gain more insights about the detected associations and to ascertain the potential underlying biological mechanisms.
We did not find strong evidence of germline variants associated with breast cancer-specific survival in any of the other subgroups of patients investigated. In addition, we did not replicate any of the subgroup-specific associations identified by previous studies. One of these associations, with variant rs4458204, was previously detected in the subgroup of patients with an ER− tumor who received chemotherapy [16]. The estimated HR (95% CI) was 1.81 (1.49-2.19) with association P = 1.9E−09. In our analysis of the same subgroup, we obtained a much lower HR estimate and the association was no longer statistically significant (HR (95% CI) 1.14 (0.99, 1.32), P = 6.0E−02), suggesting that the previous result was a false positive. Even though there is some overlap in terms of patients between the previous study and our current study, the latter is based on a substantially larger number of breast cancer patients and it includes more complete follow-up data.
A major strength of our study is the sample size, which was the largest to date and provided reasonable power to detect associations with breast cancer-specific survival within specific subgroups of patients. On the other hand, our study is subject to several limitations that are intrinsic to large consortium studies: these include variation in study design, time periods of diagnosis, and duration of follow-up, all of which can contribute to within subgroup heterogeneity. Some broad treatment-related subgroups, namely ER+ treated with any endocrine therapy and ER− treated with any chemotherapy, may include different treatments due to the wide period of diagnosis included in our study. On the other hand, the majority of patients were diagnosed between 2000 and 2009 (69.9% and 64.5% for ER+ treated with any endocrine therapy and ER− treated with any chemotherapy, respectively). If any impact on the results, the variation in treatment over time might have hampered the detection of associations between variants and survival in these subgroups. Several studies did not report the cause of death for all patients. Out of 14,606 deaths observed within the first 15 years after diagnosis, 7531 (51.6%) were due to breast cancer. Of the remaining 7075 deaths, 4905 (33.6%) were due to causes other than breast cancer, and for 2170 deaths (14.8%) it was unknown whether they were due to breast cancer or to other causes. This will have led to a loss of power, given that most of the deaths of unknown cause are likely to have been due to breast cancer. A related weakness of the study is its dependence on accuracy of cause of death certification and on coding practices of underlying cause of death in different countries. However, despite potential inaccuracies in cause of death, we considered it more valid to focus on deaths reported as due to breast cancer than by considering all deaths together, which would include those due to other causes. An additional limitation of the study is that in most subgroups we had very limited power to detect highly significant associations, particularly for small to moderate effect sizes (HRs 1.05-1.30), even for variants of relatively high minor allele frequency (MAF = 0.20). Therefore, we may have missed variants with low to moderate associations with survival.

Conclusions
In conclusion, we found evidence of four loci associated with breast cancer-specific survival within specific patient subgroups. The variants identified appear to be independent of known additional prognostic factors, as shown in the results of the adjusted analyses based on imputed clinic-pathological variables, and could, after proper validation, improve prognostic estimates and potentially help in better stratifying patients in treatment subgroups. However, the power for many subgroups is limited due to the low number of events. Even so, given the lack of evidence of strong associations in many of the patient subgroups investigated, and the fact that previously reported variants were not confirmed, our results suggest that the impact of common germline genetic variant on breast cancer-specific survival might be limited.   Table S7. Shows an overview of the GWAS significant associations (P < 5 E-08) and noteworthy (BFDP< 0.15) associations from the unadjusted 15-year and 5-year breast cancer-specific survival analyses by subgroup. Supplementary Figure S1. Shows the Q-Q plots of the meta-analysis results of all variants for 15-year breast cancerspecific survival. Supplementary Figure S2. Shows the Q-Q plots of the meta-analysis results of all variants for 5-year breast cancer-specific survival. Supplementary Figure S3. Shows the Q-Q plots of the metaanalysis results for 15-year breast cancer-specific survival and the corresponding genome inflation factors by minor allele frequency. Supplementary Figure S4. Shows the Q-Q plots of the meta-analysis results for 5-year breast cancer-specific survival and the corresponding genome inflation factors by minor allele frequency. Supplementary Figure S5. Shows the regional plots of genome-wide significant (P < 5E-08) independent associated variants from the 15-year and 5-year genome-wide breast cancer-specific survival analyses. Supplementary Figure S6. Shows the regional plots of noteworthy (BFDP< 0.15), non-genome-wide significant (P > 5E-08) variants from the 15-year and 5-year genome-wide breast cancer-specific survival analyses. Supplementary Figure S7 shows the unadjusted association of variant rs5934618 with 15-year breast cancer-specific survival by tumor grade and in all breast cancer patients. Supplementary Figure S8. Shows the functional annotation and position of variants rs5934618, rs4830644, rs3810742, rs4830642, and rs72611496 relative to TBL1X. Supplementary Figure S9. Shows Kaplan-Meier distant metastasis-free survival plots (based on KMPlotter data) for high versus low expression level of gene TBL1X by tumor grade. Supplementary Figure S10. Shows the association of genetic variants with TBL1X expression, based on GTEx v8 data on samples of normal breast tissue from 396 individuals (male and female). Supplementary Figure  S11. Shows a Kaplan-Meier distant metastasis-free survival plot for high versus low expression level of gene GRIP2, restricted to patients with an ER-tumor who received chemotherapy (based on KMPlotter data). Supplementary Figure S12. Shows a Kaplan-Meier distant metastasis-free survival plot for high versus low expression level of gene ARAP2, restricted to patients with an ER-tumor who received chemotherapy (based on KMPlotter data). Table S8, Supplementary Table  S9, Supplementary Table S10, Supplementary Table S11, and  Supplementary Table S12. Supplementary Table S8. Shows the BFDPs under two more restrictive prior probabilities of true association (10 -5 and 10 -6 ) for the results presented in Table 1. Supplementary  Table S9. Shows the BFDPs under two more restrictive prior probabilities of true association (10 -5 and 10 -6 ) for the results presented in Table 2. Supplementary Table S10. Shows power calculation by subgroup, at the two-sided 5E-08 level for varying genotype hazard ratio (GHR) and minor allele frequency (MAF), based on number of cases and event rate from the 15-year breast cancer-specific analyses. Supplementary Table S11. Shows power calculation by subgroup, at the two-sided 5E-08 level for varying genotype hazard ratio (GHR) and minor allele frequency (MAF), based on number of cases and event rate from the 5-year breast cancer-specific analyses. Supplementary Table S12 Cancer Biobank is acknowledged for providing infrastructure for the collection of blood samples for the cases. Investigators from the CPS-II cohort thank the participants and Study Management Group for their invaluable contributions to this research. They also acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, as well as cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. The authors would like to thank the California Teachers Study Steering Committee that is responsible for the formation and maintenance of the Study within which this research was conducted. A full list of California Teachers Study team members is available at https://www.calteachersstudy.org/team. DIETCOMPLYF thanks the patients, nurses and clinical staff involved in the study. The DietCompLyf study was funded by the charity Against Breast Cancer (Registered Charity Number 1121258) and the NCRN. We thank the participants and the investigators of EPIC (European Prospective Investigation into Cancer and Nutrition).      Availability of data and materials All estimates from the genome-wide survival analyses are available through the BCAC website: http://bcac.ccge.medschl.cam.ac.uk. The datasets analyzed during the current study are not publicly available due to protection of participant privacy and confidentiality, and ownership of the contributing institutions, but may be made available in an anonymized form via the corresponding author on reasonable request and after approval of the involved institutions. To receive access to the data, a concept form must be submitted, which will then be reviewed by the BCAC Data Access Coordination Committee (DACC); see http://bcac.ccge.medschl.cam.ac.uk/bcacdata/.