of breast cancer: results from the National Cancer Institute Breast and Prostate Cancer Cohort Consortium (BPC3)

Introduction Androgens have been hypothesised to influence risk of breast cancer through several possible mechanisms, including their conversion to estradiol or their binding to the oestrogen receptor and/or androgen receptor (AR) in the breast. Here, we report on the results of a large and comprehensive study of the association between genetic variation in the AR gene and risk of breast cancer in the National Cancer Institute Breast and Prostate Cancer Cohort Consortium (BPC3). Methods The underlying genetic variation was determined by first sequencing the coding regions of the AR gene in a panel of 95 advanced breast cancer cases. Second, a dense set of markers from the public database was genotyped in a panel of 349 healthy women. The linkage disequilibrium relationships (blocks) across the gene were then identified, and haplotype-tagging single nucleotide polymorphisms (htSNPs) were selected to capture the common genetic variation across the locus. The htSNPs were then genotyped in the nested breast cancer cases and controls from the Cancer Prevention Study II, European Prospective Investigation into Cancer and Nutrition, Multiethnic Cohort, Nurses' Health Study, and Women's Health Study cohorts (5,603 breast cancer cases and 7,480 controls). Results We found no association between any genetic variation (SNP, haplotype, or the exon 1 CAG repeat) in the AR gene and risk of breast cancer, nor were any statistical interactions with known breast cancer risk factors observed. Conclusion Among postmenopausal Caucasian women, common variants of the AR gene are not associated with risk of breast cancer.

The androgen receptor (AR) protein exists as two isoforms, both arising from the same DNA sequence on the X chromosome (Xq11-q12). The shorter form of the AR protein lacks the N-terminal region, which is coded by exon 1. Within exon 1 is a tri-nucleotide CAG repeat. Although this polymorphism is associated with AR transactivation activity [18][19][20][21] and prostate cancer risk in some studies [22][23][24][25][26][27][28], no clear association has been shown with breast cancer risk [29][30][31][32][33][34]. The 3' UTR (untranslated region) of the AR contains sequence elements that bind to proteins involved in regulation of mRNA stability. This and other sequence-specific characteristics of AR mRNA, including putative function of the repeats in exon 1, have recently been reviewed [35]. The AR is expressed in the normal breast, as well as in primary and metastatic breast cancer tumours, and both the expression and protein levels are correlated with tumour invasiveness [36].
We hypothesised that inherited polymorphisms in genes related to sex steroid hormone synthesis, metabolism, and cell signaling could alter the function of these genes and the proteins they encode, thereby altering breast cancer risk; in this report, we present results for the AR. We used a haplotype-tagging approach, which aims to capture common variants in the AR gene. Here, we present these haplotypes and describe their association with breast cancer risk in a pooled analysis of nested case-control studies from a large collaborative study, the Breast and Prostate Cancer Cohort Consortium (BPC3) [37], which includes 5,603 cases of breast cancer and 7,480 controls.

Study population
The BPC3 has been described in detail elsewhere [37]. Briefly, the consortium includes five large well-established cohorts assembled in the U.S. and Europe which have both DNA samples and extensive questionnaire information (the American Cancer Society Cancer Prevention Study II [38] [42]). Most women in these cohorts, with the exception of the MEC, were Caucasians of U.S. and European descent. Breast cancer cases were identified in each cohort by self-report with subsequent confirmation of the diagnosis from medical records or tumour registries and/or from linkage with population-based tumour registries (method of confirmation varied by cohort). Controls were matched to cases by ethnicity and age and, in some cohorts, additional criteria (such as country of residence in EPIC).

Genotyping
Coding regions of AR were sequenced in a panel of 95 advanced breast cancer cases from the MEC (19 of each ethnic group: African-American, Latino, Japanese, Native Hawaiian, and white). Thirty-two single nucleotide polymorphisms (SNPs) with minor allele frequency greater than 5% in any of the five ethnic groups or greater than 1% overall were selected from this resequencing as well as any SNP available in dbSNP to be used to select haplotype-tagging SNPs (htSNPs). These SNPs were genotyped in a reference panel of 349 healthy women from the MEC populations (including 70 whites) at the Broad Institute (Cambridge, MA, USA) using the Sequenom, Inc. (San Diego, CA, USA) and Illumina, Inc. (San Diego, CA, USA) platforms, and six htSNPs were selected to maximise $R_H^2$ (a measure of correlation between SNPs genotyped and the haplotypes they describe) among Caucasians, using the method of Stram et al. [43]. Genotyping of the six htSNPs in the breast cancer cases and controls was performed in three laboratories (University of Southern California, Los Angeles, CA, USA; Harvard School of Public Health, Boston, MA, USA; and International Agency for Research on Cancer, Lyon, France) using a fluorescent 5' endonuclease assay and the ABI-PRISM 7900 for sequence detection (Taqman) (Applied Biosystems, Foster City, CA, USA). Initial quality control checks of the SNP assays were performed at the manufacturer (Applied Biosystems); an additional 500 test reactions were run by the BPC3. Assay characteristics for the six htSNPs for AR are available on a public website [44]. Sequence validation for each SNP assay was performed and 100% concordance was observed [45]. To assess inter-laboratory variation, each genotyping centre ran assays on a designated set of 94 samples from the Coriell Biorepository (Camden, NJ, USA), showing completion and concordance rates of greater than 99% [45].
The internal quality of genotype data at each genotyping centre was assessed by typing 5%-10% blinded samples in duplicate or triplicate (depending on the study); the resulting concordance was greater than 99%. The genotyping success rate was 94% or greater for each of the six SNPs at each genotyping centre. No deviation from Hardy-Weinberg equilibrium was observed among the controls in each cohort (at the p < 0.01 level) for any given assay. An association among the exon 1 CAG repeat in AR, family history of breast cancer, and breast cancer risk was previously reported (1990-96 follow-up in the NHS, 617 cases and 960 controls [46]). The exon 1 CAG repeat was genotyped in an additional 376 cases and 540 controls from the NHS as well as 669 cases and 674 controls from the WHS, as previously described [46]. Given that no association was observed between the CAG repeat and breast cancer risk, and that the interaction between this polymorphism and family history was not observed in this larger combined sample set (1,662 cases and 2,174 controls), we decided not to expend the resources necessary to genotype the repeat in the remaining data sets.
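The Hardy-Weinberg check on control genotypes described above can be illustrated with a short sketch. It assumes a Pearson chi-square test with one degree of freedom, which is a common choice but is not stated in the text as the test actually used; the function name and the genotype counts are hypothetical. Because AR is X-linked, such a test applies to women only, who carry two copies of the locus.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Takes observed genotype counts for a biallelic SNP and returns
    (chi-square statistic, p-value). Assumes both alleles are present,
    so that every expected count is greater than zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical control genotype counts, for illustration only (not study data).
chi2, p_value = hwe_chi_square(490, 420, 90)
deviates_from_hwe = p_value < 0.01  # threshold quoted in the text
```

With small expected counts an exact test would usually be preferred over the chi-square approximation.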

Statistical analysis
We used conditional multivariate logistic regression to estimate odds ratios (ORs) for disease in subjects with a linear (additive) scoring for zero, one, or two copies of the minor allele of each SNP. We also used conditional logistic regression with additive scoring and the most common haplotype as the referent to estimate haplotype-specific ORs, using an expectation-substitution approach to assign haplotypes based on the unphased genotype data and to account for uncertainty in assignment [47,48]. Haplotype frequencies and expected subject-specific haplotype indicators were calculated separately for each cohort (as well as by country within EPIC and race in the MEC). To test the global null hypothesis of no association between variation in AR haplotypes and htSNPs and risk of breast cancer (or subtypes defined by receptor status), we used a likelihood ratio test comparing a model with additive effects for each common haplotype (treating the most common haplotype as the referent) to the intercept-only model. We combined rare haplotypes (those with estimated individual frequencies less than 5% in all cohorts) into a single category that comprised less than 1.5% of the controls.
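As an illustration of the additive scoring and expectation-substitution steps described above, the following minimal sketch computes a subject's minor-allele count and expected haplotype dosages. It assumes that posterior probabilities for each haplotype pair compatible with the unphased genotypes have already been estimated (for example, by an EM algorithm); the function names, haplotype labels, and probabilities are hypothetical and are not taken from the study.

```python
def additive_score(genotype, minor_allele):
    """Count copies (0, 1, or 2) of the minor allele in a two-allele genotype."""
    return sum(1 for allele in genotype if allele == minor_allele)

def expected_haplotype_dosage(pair_posteriors):
    """Expected number of copies of each haplotype for one subject.

    pair_posteriors maps a (haplotype, haplotype) pair to its posterior
    probability given the subject's unphased genotypes; the probabilities
    are assumed to sum to 1.
    """
    dosage = {}
    for (h1, h2), prob in pair_posteriors.items():
        for hap in (h1, h2):
            dosage[hap] = dosage.get(hap, 0.0) + prob
    return dosage

# Hypothetical subject whose genotypes are compatible with two phasings.
posteriors = {("A", "D"): 0.9, ("B", "C"): 0.1}
dosage = expected_haplotype_dosage(posteriors)
score = additive_score(("G", "A"), "A")
```

In a regression of the kind described, the expected dosages of the non-referent haplotypes would stand in for the unobserved haplotype counts as covariates.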
We considered both unadjusted conditional models and conditional models adjusting for known breast cancer risk factors. The covariates included to account for breast cancer risk factors were age at menarche (≤12 years, 13-14 years, 15+ years), menopausal status (pre-, peri-, and postmenopausal), parity (ever/never full-term pregnancy), body mass index (BMI) (in kg/m² as a continuous variable), and use of postmenopausal hormones (ever/never). Other common risk factors, including family history of breast cancer, personal history of benign breast disease, and age at menopause, were unavailable for large numbers of women and therefore were not included in the models. Because the results remained essentially unchanged regardless of the model used, we present results using the unadjusted conditional model. We also evaluated these covariates, restricting analyses of interaction to only those subjects with information available for variables such as family history, with categorical variables divided into quintiles. Interaction effects were evaluated using likelihood ratio testing, comparing models with the main effects of the genetic and risk variable to the model with these main effects and a multiplicative interaction term. Lastly, we tested whether the association between AR and breast cancer differed by menopausal status at diagnosis and tumour receptor (oestrogen receptor [ER] and progesterone receptor [PR]) status.
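The likelihood ratio tests described above compare the maximised log-likelihoods of two nested models. The sketch below shows the arithmetic only, assuming the log-likelihoods have already been obtained from fitted models; the function names and the numerical values are hypothetical, and the chi-square tail probability is computed with a standard recurrence for integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        k = 1
        q = math.erfc(math.sqrt(half))  # closed form for 1 df
    else:
        k = 2
        q = math.exp(-half)             # closed form for 2 df
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += math.exp((k / 2.0) * math.log(half) - half
                      - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (statistic, p-value) for two nested models.

    extra_params is the number of additional parameters in the full model,
    e.g. 1 for a single multiplicative interaction term.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)

# Hypothetical log-likelihoods: main effects only vs. main effects plus
# one interaction term.
stat, p_interaction = likelihood_ratio_test(-1000.0, -998.0, 1)
```

For the global haplotype test, `extra_params` would equal the number of non-referent haplotype terms in the full model.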
The exon 1 CAG repeat was analysed as previously reported [44]. Interaction p values between number of repeats and family history were calculated using likelihood ratio tests comparing the model with main effects for carrying at least one long repeat (cutoffs of ≥22, 23, 25, and 27 repeats) and family history with the model containing these main effects and an additional interaction term, with homozygotes of the ≥22 allele with no family history as the reference. Figure 1 shows the genomic structure of the region around AR. One very common (approximately 70%) haplotype exists, with six lower-prevalence haplotypes each defined by the htSNPs. The minimum $R_H^2$ for these six SNPs was 0.77 in the Japanese, white, and Latina samples from the SNP selection panel. However, these SNPs do not describe haplotype diversity among African-Americans (minimum $R_H^2$ = 0.03).

Results
A total of 5,603 cases and 7,480 controls were available for genotyping. Table 1 shows some of the baseline characteristics of these cases and controls. Genotyping success for each polymorphism was greater than 94%, and samples not yielding a genotype for a given SNP were removed from analyses for that SNP. Samples not yielding at least one genotype were removed from haplotype analyses, for a total of 5,584 cases and 7,459 controls. No associations with breast cancer or heterogeneity of risk estimates across the participating cohorts were observed for any individual SNP (Table 2) or haplotype tagged by these SNPs (Table 3). No differences in haplotype distribution were observed between ER+ (n = 2,543) and ER- (n = 590) cases (global p value = 0.61), PR+ (n = 2,158) and PR- (n = 860) cases (global p value = 0.51), or localised (n = 2,964) and metastatic (n = 1,646) cases (global p value = 0.43). No statistically significant interactions were observed between haplotypes and common breast cancer risk factors such as family history (yes/no), BMI (≤25, >25), age at first full-term pregnancy (nulliparous, ≤24, >24), or alcohol consumption (non-drinkers, ≤5 g/day, >5 g/day) (p interaction = 0.13, 0.16, 0.14, and 0.28, respectively). These results were not materially different after excluding African-American women from the MEC (344 cases and 426 controls).
Data from further follow-up of the NHS and the WHS did not support the previous findings of interaction between the AR CAG repeat and family history on breast cancer risk in the NHS with follow-up to 1996 [46]. No statistically significant interactions between longer AR CAG repeat length and positive family history were observed in either the further NHS follow-up or the WHS. The decrease in risk associated with shorter repeats among family history positive cases as previously reported [46] was not observed (Table 4).

Discussion
One of the main aims of the BPC3 was to overcome limitations of prior studies by increasing sample size and, therefore, power of the study. By choosing genes involved in the synthesis, metabolism, and signaling of sex hormones, we aimed to maximise the possibility of finding alleles that predispose to breast cancer. Although the AR gene is a likely candidate gene, no association between polymorphisms in the AR gene and breast cancer risk was observed, despite the large sample size (5,603 cases and 7,480 controls) and systematic approach of this study. In a previous study [46], shorter alleles of the CAG repeat polymorphism in exon 1 of the AR gene were associated with decreased risk of disease in women with a family history of breast cancer in the NHS. Adding samples from further follow-up cycles of the NHS, as well as samples from the WHS, we were unable to confirm this initial finding.
Figure. Linkage disequilibrium (LD) at androgen receptor (AR). Linkage disequilibrium plot generated by Haploview [50] showing correlation between all pairs of single nucleotide polymorphisms (SNPs) genotyped to select haplotype-tagging SNPs (htSNPs). The 32 SNPs genotyped among the 349 Multiethnic Cohort samples are shown in their physical position along the X chromosome (solid black horizontal line) and relative to the AR gene (depicted above the X chromosome plot). htSNPs selected are shown by arrows. Although two LD blocks are shown (black outline in LD plot), high correlation between the blocks allowed for the analysis of the htSNPs as one block.
Mutations in genes such as BRCA1 (breast cancer 1, early onset) and BRCA2, although highly penetrant, are of low prevalence in the general population. Very few common polymorphisms have been shown to be associated with breast cancer risk. Using a candidate gene approach to select genes of possible interest in breast cancer etiology has also yielded very few breast cancer-susceptibility loci. One possible explanation for the lack of consistent association between common polymorphisms and breast cancer risk in individual studies is that the change in risk associated with common variants is too low to detect in individual studies and results that are reported may reflect publication bias.
The 95% confidence intervals in our study were narrow and exclude a substantial association between common variants in the AR gene and breast cancer risk. A concern that is more specific to the AR gene is that, due to the gene's location on the X chromosome, X chromosome inactivation could bias risk estimates associated with a causal allele toward the null. Such bias would be especially likely if the same X chromosome (either maternal or paternal) were inactivated in all breast tissue within each woman. However, X chromosome inactivation occurs very early in embryonic development and differs between lobes within the same breast [49]. Assuming a low-penetrance allele (as hypothesised here), women who are heterozygous for a putative risk allele on the X chromosome are still at approximately half the risk of developing breast cancer as women who are homozygous for the same allele, as approximately half (from random inactivation) of the breast cells would not express or be exposed to the risk allele, compared with all breast cells expressing the risk allele among homozygotes. This somewhat limits the possibility that X chromosome inactivation patterns could bias risk estimates toward the null. Optimally, tumour tissue from heterozygous women would be analysed to determine which allele is inactivated; however, this is not possible in the present study, because tumour specimens are not available.
Due to the low numbers of premenopausal women in our study, we cannot exclude the AR gene as a susceptibility locus for breast cancer occurring before menopause. Additionally, although the MEC does provide information from non-Caucasian individuals, the number of samples is insufficient, and the htSNPs selected to describe genetic variation in Caucasians do not adequately describe variation among African-Americans; we therefore cannot definitively exclude polymorphisms in the AR gene as breast cancer-susceptibility alleles except in Caucasians. Among the latter, neither common variants nor the CAG repeat in exon 1 of the AR gene is associated with risk of postmenopausal breast cancer.
Table 3. Association between AR haplotypes and breast cancer risk in the NCI BPC3.

Conclusion
Common polymorphisms in the AR gene are not associated with breast cancer risk among postmenopausal Caucasian women.