Common germline polymorphisms associated with breast cancer-specific survival

Introduction: Previous studies have identified common germline variants nominally associated with breast cancer survival. These associations have not been widely replicated in further studies. The purpose of this study was to evaluate the association of previously reported SNPs with breast cancer-specific survival using data from a pooled analysis of eight breast cancer survival genome-wide association studies (GWAS) from the Breast Cancer Association Consortium.
Methods: A literature review was conducted of all previously published associations between common germline variants and three survival outcomes: breast cancer-specific survival, overall survival and disease-free survival. All associations that reached the nominal significance level of P value <0.05 were included. Single nucleotide polymorphisms that had been previously reported as nominally associated with at least one survival outcome were evaluated in the pooled analysis of over 37,000 breast cancer cases for association with breast cancer-specific survival. Previous associations were evaluated using a one-sided test based on the reported direction of effect.
Results: Fifty-six variants from 45 previous publications were evaluated in the meta-analysis. Fifty-four of these were evaluated in the full set of 37,954 breast cancer cases with 2,900 events, and the two additional variants were evaluated in a reduced set of approximately 30,000 samples in order to ensure independence from the previously published studies. Five variants reached nominal significance (P <0.05) in the pooled GWAS data, compared to 2.8 expected under the null hypothesis. Seven additional variants were associated (P <0.05) with prognosis in ER-positive disease.
Conclusions: Although no variants reached genome-wide significance (P <5 × 10⁻⁸), these results suggest that there is some evidence of association between candidate common germline variants and breast cancer prognosis. Larger studies from multinational collaborations are necessary to increase the power to detect associations between common variants and prognosis at more stringent significance levels.
Electronic supplementary material: The online version of this article (doi:10.1186/s13058-015-0570-7) contains supplementary material, which is available to authorized users.


Introduction
Breast cancer is the most commonly diagnosed cancer in women worldwide, with an estimated 1.67 million new cases diagnosed in 2012. It is the second most common cause of cancer-related death in women in the more developed regions of the world and accounts for 15.4% of cancer-related deaths in women [1]. Breast cancer outcome is affected by several factors, including age, tumour size, tumour grade, extent of local and distal spread at diagnosis, oestrogen receptor (ER) status, human epidermal growth factor receptor 2 (HER2) status and treatment received. It is also likely that inherited host characteristics, such as genetic variants, are important [2].
The association between common germline genetic variation and breast cancer survival has been examined in many candidate gene studies investigating genes in pathways known to be involved in breast cancer [3]. These studies have identified numerous single nucleotide polymorphisms (SNPs) associated with outcome at nominal significance levels, but none have been widely replicated in further studies. The exceptions to this are three genome-wide association studies (GWAS) [4-6] and a study from the Breast Cancer Association Consortium, which had substantial power to detect associated variants with large effect sizes (hazard ratio (HR) >2) [7]. Two of those GWAS have reported significant associations for three polymorphisms (rs9934948, rs3784099, rs4778137) [4,6]. The aim of this study was to evaluate the association of previously reported SNPs with prognosis using data from a hypothesis-generating pooled analysis of eight breast cancer survival GWAS from ten studies including 37,954 breast cancer cases [8].

Literature review
Studies reporting common polymorphisms associated with breast cancer prognosis were identified by searching both Google Scholar and PubMed. We searched Google Scholar using the search terms 'breast cancer', 'survival', 'prognosis', 'polymorphisms' and 'SNPs'. The search terms for PubMed were 'breast cancer' AND ('survival' OR 'prognosis') AND ('polymorphism' OR 'SNP'). The reference lists of all identified studies were then checked individually for any additional studies. The search was last updated on 6 June 2014. We considered studies to be eligible for inclusion if they reported an association between a germline genetic variant and at least one of the following end points: overall survival, disease-free survival and breast cancer-specific survival (BCSS). Studies evaluating the prognostic importance of rare high-penetrance variants with minor allele frequency <2% in BRCA1, BRCA2 and CHEK2 were omitted from the review. Only one study conducted ER subtype-specific analyses.
For the purposes of comparison, all studies that used genetic models that grouped together two genotypes into a single category were defined as using 'dominance models'. This category includes both dominant and recessive models as each study's definition of a dominant or recessive model is dependent on which allele is the major or minor allele, whether they consider the effect allele to be bi-directional, or whether they focus on only the risk allele.
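To make the distinction between these genetic models concrete, the sketch below shows one common way of coding a genotype as a model covariate. The function name and the convention of counting copies of a designated effect allele are illustrative assumptions, not taken from any of the reviewed studies.

```python
def code_genotype(effect_allele_count, model):
    """Turn a count of effect alleles (0, 1 or 2) into a model covariate.

    'additive' is the co-dominant (log-additive) coding; 'dominant' and
    'recessive' both collapse two genotypes into one category, which is
    what the text groups together as 'dominance models'.
    """
    if effect_allele_count not in (0, 1, 2):
        raise ValueError("effect_allele_count must be 0, 1 or 2")
    if model == "additive":
        return effect_allele_count           # 0, 1, 2 copies
    if model == "dominant":
        return int(effect_allele_count >= 1)  # carriers vs non-carriers
    if model == "recessive":
        return int(effect_allele_count == 2)  # homozygotes vs all others
    raise ValueError(f"unknown model: {model}")


# The same heterozygous genotype is coded differently under each model.
codings = {m: code_genotype(1, m) for m in ("additive", "dominant", "recessive")}
```

Swapping which allele is treated as the effect allele turns a dominant coding into a recessive one, which is why the two are treated as a single category here.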
Genotype and sample quality control were carried out separately for each study. In short, SNPs were excluded based on low call rate, minor allele frequency <1% and significant deviation of genotype frequencies from Hardy-Weinberg equilibrium. Samples were excluded for low call rate, ambiguous gender, relatedness and extreme heterozygosity. Sample ancestry was determined separately for each GWAS included in the meta-analysis using either principal component analysis, multidimensional scaling or LAMP, based on ethnicities from HapMap samples. Samples with less than 90% European ancestry were excluded. As different genotyping arrays had been used for the different studies, imputation had been performed using a reference panel from the 1000 Genomes Project [8,20]. We utilised the imputed data for the SNPs of interest in this study. Details of the pooled studies are shown in Additional files 1 and 2.
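As an illustration of two of the SNP-level filters named above, the following sketch computes a minor allele frequency and a Pearson chi-square test for deviation from Hardy-Weinberg equilibrium from genotype counts. The choice of a 1-degree-of-freedom chi-square test and the function names are assumptions made for this example; the text does not state which test or threshold each contributing study applied.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    freq_a = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(freq_a, 1 - freq_a)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up genotype counts, for illustration only.
maf = minor_allele_frequency(810, 180, 10)
chi2, p_value = hwe_chi_square(810, 180, 10)
passes_maf_filter = maf >= 0.01  # the 1% threshold is the one given in the text
```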
Cox proportional hazards models were fitted to assess the association of genotype with breast cancer-specific mortality under a co-dominant (log-additive) genetic model using the likelihood ratio test. The models were adjusted for principal components in order to minimise the effect of population substructure, and the analysis of the Collaborative Oncological Gene-environment Study (COGS) [16] dataset was stratified by study. Each survival GWAS was analysed separately and the results were harmonised and combined using a standard inverse-variance weighted fixed-effects meta-analysis. In order to compare the results with the published associations we used a one-sided test based on the reported direction of effect. In the initial analysis the models for all 56 SNPs were unadjusted for prognostic factors. However, we conducted a multivariable analysis of the previously reported SNPs that were significantly associated with survival, adjusting for age, stage and grade, using 29,360 samples from the COGS study.
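The combination step described above can be sketched as follows, assuming each GWAS contributes a log hazard ratio and its standard error for a given SNP. The function name, the `reported_sign` encoding and the use of a Wald-type normal approximation for the one-sided P value are assumptions of this example rather than details given in the text. The function also returns Cochran's Q-based I², the heterogeneity statistic reported in the results.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(log_hrs, std_errs, reported_sign):
    """Inverse-variance weighted fixed-effects meta-analysis of log HRs.

    reported_sign is +1 if the original publication reported HR > 1 for
    the effect allele and -1 if it reported HR < 1 (hypothetical encoding).
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # One-sided P value in the previously reported direction of effect.
    z = pooled / pooled_se
    p_one_sided = 1.0 - NormalDist().cdf(reported_sign * z)

    # Cochran's Q and I^2 as a measure of between-study heterogeneity.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_hrs))
    df = len(log_hrs) - 1
    i2_percent = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # A 90% interval pairs naturally with a one-sided test at the 5% level.
    z90 = NormalDist().inv_cdf(0.95)
    ci90 = (math.exp(pooled - z90 * pooled_se), math.exp(pooled + z90 * pooled_se))
    return {"hr": math.exp(pooled), "ci90": ci90,
            "p_one_sided": p_one_sided, "i2_percent": i2_percent}


# Made-up per-study estimates, for illustration only.
example = fixed_effect_meta(
    log_hrs=[0.10, 0.05, 0.12],
    std_errs=[0.05, 0.04, 0.08],
    reported_sign=+1,
)
```

Testing in the reported direction means that an effect of the opposite sign, however strong, yields a large P value, so discordant directions have to be examined separately with a two-sided test.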

Literature review
We identified 46 publications reporting nominally significant associations between 62 germline variants and survival after a breast cancer diagnosis. Details of each variant and the reported association with breast cancer prognosis are shown in Additional file 3. The median sample size was 890 cases; the smallest study had 85 cases and the largest 25,853. Fifty-nine variants were from 44 candidate gene studies and three variants were identified through GWAS. The candidate genes were involved in the following pathways: DNA repair, cell cycle control, matrix metalloproteinases, immune response, drug response, tumour progression, vitamin D receptors and miscellaneous other pathways (Table 1). Findings from the identified publications were infrequently replicated; only six variants out of the 62 were reported in at least one subsequent publication.

Meta-analysis findings
Results from the GWAS meta-analysis included 58 of the 62 previously identified variants discussed above. One SNP (rs2886162) was replaced by a perfectly correlated tagSNP (rs2364725, r² = 1). Associations for four of the variants identified (rs4778137 in OCA2, rs3803662 in TOX3, rs1042522 in TP53 and rs2479717 in CCND1) were discovered in studies carried out by the Breast Cancer Association Consortium using sets of samples included in our GWAS meta-analysis. Therefore, we were unable to replicate these associations independently in the full dataset. The substantial sample overlap between the studies that identified associations with rs4778137 and rs3803662 means that there is little to be gained by attempting to replicate their associations in the additional samples included in the meta-analysis. However, the sample sizes in the studies identifying rs1042522 and rs2479717 were relatively small, so we evaluated their association with BCSS in the GWAS meta-analysis omitting the samples from studies used in the original publications. The two SNPs were evaluated in 29,224 and 31,434 samples respectively.
The results for the 56 SNPs evaluated in the meta-analysis are presented in Additional file 4. In the analysis of all cases, five SNPs (rs2981582, rs1800566, rs9934948, rs1800470 and rs3775775) were significant with one-sided P value <0.05; the remaining 51 SNPs were not significant at this nominal P value. The most significant association was for rs2981582 in FGFR2 (per G allele HR 1.09, 90% confidence interval (CI) 1.04 to 1.14, one-sided P value = 0.00085). All significantly associated SNPs had good imputation quality (r² = 0.9 to 1). The imputation r² for all 56 SNPs can be found in Additional file 4. No single SNP reached the stringent level of significance generally regarded as genome-wide significant (P value <5 × 10⁻⁸), but the number of nominally significant associations (5) was somewhat greater than that expected by chance (2.8). This is illustrated by the quantile-quantile plot shown in Figure 1.
Seven SNPs that were not significantly associated with prognosis in all patients were significant in ER-positive disease; that is, seven of the twelve SNPs nominally associated (P <0.05) with survival showed evidence of association specific to ER-positive disease. These SNPs were not previously identified in patients with specifically ER-positive disease; however, our observations may agree with the previously reported results as most breast cancers are ER positive.
We measured the level of heterogeneity between the studies included in the pooled analysis for the 12 SNPs associated with survival. There was moderate evidence of heterogeneity for the SNP rs2981582 (I² = 41.1%, P value = 0.084). For all other SNPs there was low heterogeneity (I² <25%, P value >0.2). Details of the SNPs nominally associated with BCSS are shown in Table 2. The results for the nominally associated SNPs adjusted for age, stage and grade are shown in Additional file 5. The HRs for some of the SNPs were attenuated after adjustment, whereas the associations with BCSS of SNPs rs3775775 and rs2333227 were stronger in the multivariable analysis.
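The chance expectation of 2.8 quoted for the analysis of all cases is simply 56 tests multiplied by the 0.05 significance level. The sketch below reproduces that figure and, as an additional illustration, computes the binomial probability of seeing five or more nominally significant results by chance. It assumes that the 56 tests are independent and that each has exactly a 5% false positive rate, which is a simplification (SNPs in linkage disequilibrium would not be independent); the binomial calculation is not part of the reported analysis.

```python
from math import comb


def prob_at_least(k, n, alpha):
    """P(X >= k) for X ~ Binomial(n, alpha)."""
    return sum(comb(n, i) * alpha ** i * (1 - alpha) ** (n - i)
               for i in range(k, n + 1))


n_tests = 56   # SNPs evaluated in the meta-analysis
alpha = 0.05   # nominal one-sided significance level

expected_hits = n_tests * alpha                    # 2.8, as quoted in the text
p_five_or_more = prob_at_least(5, n_tests, alpha)  # chance of >= 5 hits under the null
```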

Discussion
There have been few studies focused on the replication of sub-genome-wide significant associations identified previously. Previous replication studies have focused on reporting the SNPs with the strongest evidence of association. We have found some evidence to support previously reported associations between common germline genetic variants and breast cancer prognosis, and the moderate evidence for some variants provides a rationale for continued research efforts to identify such variants. Significant variants were for the most part candidates in cancer-related genes, as shown in Table 1.
Despite the larger sample size, and therefore increased power to detect true associations with prognosis in comparison to previous studies, a possible reason for associations failing to reach genome-wide significance may still be limited power. Figure 2a illustrates that for our analysis with 2,900 survival events from 37,954 cases, there is limited power to detect associations at stringent significance levels for modest effect sizes based on a variant with a minor allele frequency of 0.3. Figure 2b shows that almost five times as many events would be needed to detect, with 80% power at P value <10⁻⁸, an allele with a minor allele frequency of 0.3 that confers an HR of 1.1.
In a two-sided test, five of the previously reported associations with prognosis were significantly associated with BCSS in the GWAS meta-analysis but had directions of effect discordant with the original results. These discrepancies may be caused by differing ethnicity between the sample populations [21], as the meta-analysis is specific to patients with European ancestry whereas the five original studies considered non-European populations [6,22-24]. On the other hand, they may also represent false positive associations in both discovery and replication data.
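The dependence of power on the number of events can be illustrated with a Schoenfeld-type approximation for a Cox model with a log-additive genotype covariate. This is a rough sketch under stated assumptions (Hardy-Weinberg genotype variance 2p(1 − p), a two-sided test, no covariates or stratification); it is not the calculation behind Figure 2, and its output should not be expected to match the figure. The function name and parameters are hypothetical.

```python
import math
from statistics import NormalDist


def events_needed(hazard_ratio, maf, alpha_two_sided, power):
    """Approximate number of events needed to detect a per-allele HR.

    Uses d = (z_alpha + z_power)^2 / (Var(G) * ln(HR)^2), where Var(G) is
    the variance of the allele count under Hardy-Weinberg equilibrium.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha_two_sided / 2)
    z_power = NormalDist().inv_cdf(power)
    genotype_variance = 2 * maf * (1 - maf)
    return (z_alpha + z_power) ** 2 / (genotype_variance * math.log(hazard_ratio) ** 2)


# Scenario discussed in the text: HR 1.1, minor allele frequency 0.3,
# 80% power at P < 10^-8.
approx_events = events_needed(hazard_ratio=1.1, maf=0.3,
                              alpha_two_sided=1e-8, power=0.8)
```

Because the required number of events scales with 1 / ln(HR)², halving the log hazard ratio roughly quadruples the number of events needed, which is why modest effects are so demanding at stringent thresholds.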
Many previously published studies used a dominance model to evaluate associations. We used only a co-dominant model to detect association in the GWAS. This is justified because thousands of common variants [25] associated with a range of diseases have been identified using a co-dominant model with little or no evidence for dominance. It seems unlikely that breast cancer survival would differ substantially from other phenotypes in its true underlying genetic model. Where the true underlying model is co-dominant, this approach will maximise statistical power. While it is possible that some variants may be truly associated under a dominance model, for example through loss of heterozygosity of the specific germline variant in the tumour, we would still have reasonable power to detect such an association with the large sample size of the GWAS under a co-dominant model.
A further way to increase power to detect robust associations with prognosis is to reduce the level of heterogeneity in the phenotype. Studies focusing on identifying subtype-specific associations will have greater power to detect variants associated with a particular subtype than an analysis of all patients. In particular, studies considering disease subtypes, for example ER-negative disease, may provide valuable insight into the reasons for known prognostic differences between subtypes. We identified seven SNPs associated with ER-positive disease. These SNPs were not previously identified in specifically ER-positive disease; however, our observations may agree with the previously reported results as most breast cancers are ER positive. In addition, studies looking at interactions with specific treatments, most notably adjuvant chemotherapy, hormonal therapy and adjuvant radiotherapy, may further inform targeted treatment of subgroups of patients according to their inherited genetic information. Some of the previously reported associations with prognosis were found in specific subgroups of patients; however, as yet the sizes of these studies are limited. Large subtype-specific studies are needed in order to investigate interactions with particular subgroups effectively.
The generation of sufficiently large studies to deliver strongly significant results, with good outcome and treatment data to enable powerful subtype-specific analyses, will only be possible by combining data resources through large-scale global collaborations. Case-control studies including approximately 100,000 cases are now being conducted to identify common variants associated with risk. It seems a realistic goal to carry out case-cohort studies of a similar size.
Reliable identification of SNPs associated with breast cancer prognosis may help to understand the molecular mechanisms of tumour progression and metastasis. Ultimately, this may lead to the development of new therapeutic targets. Polygenic risk scores based on multiple risk alleles have been shown to have potentially useful discrimination [26]. Similar polygenic prognostic scores may improve discrimination of prognostic and treatment benefit tools such as PREDICT [27].

Conclusions
We have found limited evidence to support the assertion that germline genetic variation influences outcome after a diagnosis of breast cancer. Large studies with detailed clinical and follow-up information are needed in order to achieve sufficient statistical power to detect associations at stringent significance thresholds. In addition, power can be increased by reducing the level of phenotype heterogeneity, which will also provide valuable insights into prognostic differences between subgroups.

Additional files
Additional file 1: Study information for GWAS included in meta-analysis [8].
Additional file 2: Samples included in meta-analysis by study [8].
Additional file 3: Previously reported associations with breast cancer survival.
Additional file 4: Look-up of previously reported associations in meta-analysis.
Additional file 5: Multivariable analysis results adjusting for age, stage and grade in samples from the COGS dataset.

Competing interests
The authors declare they have no competing interests.