Breast cancer risk assessment with five independent genetic variants and two risk factors in Chinese women

Introduction Recently, several genome-wide association studies (GWAS) have identified novel single nucleotide polymorphisms (SNPs) associated with breast cancer risk. However, most of these studies were conducted among Caucasians and only one among Chinese women.
Methods In the current study, we first tested whether 15 SNPs identified by previous GWAS were also breast cancer marker SNPs in a Chinese population. We then grouped the marker SNPs and modeled them together with clinical risk factors to assess the utility of these factors in breast cancer risk assessment. Two methods (risk factor counting and odds ratio (OR)-weighted risk scoring) were used to evaluate the cumulative effects of the five significant SNPs and two clinical risk factors (age at menarche and age at first live birth).
Results Five SNPs located at 2q35, 3p24, 6q22, 6q25 and 10q26 were consistently associated with breast cancer risk in both the testing set (878 cases and 900 controls) and the validation set (914 cases and 967 controls). Overall, all five SNPs contributed to breast cancer susceptibility in a dominant genetic model (2q35, rs13387042: adjusted OR = 1.26, P = 0.006; 3p24.1, rs2307032: adjusted OR = 1.24, P = 0.005; 6q22.33, rs2180341: adjusted OR = 1.22, P = 0.006; 6q25.1, rs2046210: adjusted OR = 1.51, P = 2.40 × 10^-8; 10q26.13, rs2981582: adjusted OR = 1.31, P = 1.96 × 10^-4). Risk score analysis (area under the curve (AUC): 0.649, 95% confidence interval (CI): 0.631 to 0.667; sensitivity = 62.60%, specificity = 57.05%) discriminated better than risk factor counting (AUC: 0.637, 95% CI: 0.619 to 0.655; sensitivity = 62.16%, specificity = 60.03%) (P < 0.0001). Absolute risk was then calculated by the modified Gail model, and an AUC of 0.658 (95% CI = 0.640 to 0.676) (sensitivity = 61.98%, specificity = 60.26%) was obtained for the combination of five marker SNPs, age at menarche and age at first live birth.
Conclusions This study shows that five GWAS-identified variants were consistently validated in this Chinese population and that combining these genetic variants with other risk factors can improve breast cancer risk prediction. However, more breast cancer-associated risk variants should be incorporated to optimize the risk assessment.


Introduction
Breast cancer is one of the most common cancers among women worldwide [1]. Although lifestyle- and environment-related factors are implicated in breast carcinogenesis, it is a complex polygenic disorder in which genetic makeup also plays an important role [2,3]. In the past decades, high-penetrance genes (for example, BRCA1, BRCA2, PTEN and TP53) have been found to be associated with familial breast cancer [4]. However, these genes account for less than 5% of all breast cancer patients, and most of the risk is likely attributable to a larger number of low-penetrance genetic variants [5][6][7].
Recently, several genome-wide association studies (GWAS) reported many novel breast cancer predisposing single nucleotide polymorphisms (SNPs) [8][9][10][11][12][13][14]. However, most of these studies were conducted among Caucasians [8][9][10][11][12][13] and only one among Chinese women [14], and whether these genetic variants are applicable marker SNPs in Asian women is unclear. Furthermore, evaluating risk-predicting models is an important topic in genetic studies of human diseases, including breast cancer. An effective risk-predicting model can assist physicians in disease prevention, diagnosis, prognosis and treatment [15]. Building on the GWAS findings for breast cancer, many studies have combined genetic markers with traditional risk factors to build and evaluate breast cancer risk-predicting models [16][17][18][19][20][21][22]. However, the performance of most of these models is unsatisfactory, and only one such study is available in Chinese women [17].
In the current study, a two-stage case-control study of 1,792 breast cancer cases and 1,867 cancer-free controls was conducted among Chinese women to replicate 15 selected SNPs identified from previous GWAS. Then, risk models were constructed and absolute risk was calculated to evaluate the combined effects of the significant SNPs and clinical risk factors.

Study subjects
This study was approved by the institutional review board of Nanjing Medical University. The hospital-based case-control study included 1,792 breast cancer cases and 1,867 cancer-free controls, and the subject recruitment process has been described in detail previously [23][24][25]. In brief, incident breast cancer patients were consecutively recruited from the First Affiliated Hospital of Nanjing Medical University, the Cancer Hospital of Jiangsu Province and the Gulou Hospital, Nanjing, China, between January 2004 and April 2010. Exclusion criteria included reported previous cancer history, metastasized cancer from other organs, and previous radiotherapy or chemotherapy. All breast cancer cases were newly diagnosed and histopathologically confirmed, without restrictions on age or histological type. Cancer-free control women, frequency-matched to the cases on age (± 5 years) and residential area (urban or rural), were randomly selected from a cohort of more than 30,000 participants in a community-based screening program for non-infectious diseases conducted in the same region. All participants were ethnic Han Chinese women. Of the eligible participants, 878 cases and 900 controls were randomly assigned to form the testing set, and the remaining 914 cases and 967 controls formed the validation set.
After providing informed consent, each woman was personally interviewed face-to-face by trained interviewers using a pre-tested questionnaire to obtain information on demographic data, menstrual and reproductive history, and environmental exposure history. After the interview, each subject provided 5 ml of venous blood. The estrogen receptor (ER) and progesterone receptor (PR) status of breast cancer was determined by immunohistochemistry examinations which were obtained from the medical records of the hospitals.

SNP selection and genotyping
The SNP selection procedure followed three criteria: (a) the SNP was reported as a marker SNP in previous GWAS (last search in November 2009); (b) its minor allele frequency (MAF) was ≥ 0.05 in Han Chinese in Beijing (CHB) based on the HapMap database (phase II, release 24, November 2008); (c) when multiple SNPs were found in the same region, only SNPs in low linkage disequilibrium (LD) with each other (r^2 < 0.8) were included. Overall, 15 SNPs (11 regions: 2q35, 3p24, 5p11, 5p12, 6q22, 6q25, 8q24, 10q26, 11p15, 16q12 and 17q23; Table 1) were selected and genotyped using the medium-throughput TaqMan OpenArray Genotyping Platform (Applied Biosystems Inc., Carlsbad, CA, USA) for testing set samples (878 cases and 900 controls) and TaqMan Assays on the ABI PRISM 7900 HT Platform (Applied Biosystems Inc.) for validation set samples (914 cases and 967 controls). For OpenArray Assays, normalized human DNA samples were loaded and amplified on customized arrays following the manufacturer's instructions. Each 48-sample array chip contained two no template controls (NTCs). For TaqMan Assays, approximately equal numbers of case and control samples were assayed in each 384-well plate. Two blank controls in each plate were used for quality control, and 96 randomly selected samples were genotyped in duplicate on the two platforms, with more than 97% concordance.

Statistical analyses
Differences between breast cancer cases and controls in demographic characteristics, risk factors and frequencies of SNPs were evaluated by Fisher's exact tests (for categorical variables) or by Student's t-test or t'-test (equal variances not assumed) (for continuous variables). Hardy-Weinberg equilibrium was evaluated by an exact test among the controls [26].
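To illustrate the Hardy-Weinberg check, the following minimal Python sketch computes an exact P value from the genotype counts of one SNP among controls. It assumes the commonly used exact test that sums the probabilities of all heterozygote counts that are no more likely than the observed one; whether this matches the procedure of [26] in every detail is an assumption. The function name `hwe_exact_p` and the example counts are hypothetical and are not data from this study.

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Exact Hardy-Weinberg P value from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # copies of allele A
    n_b = 2 * n - n_a               # copies of allele B

    def log_prob(het):
        # Log-probability of observing `het` heterozygotes given the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * log(2)
                + lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of n_a and cannot exceed either allele count.
    probs = {het: exp(log_prob(het)) for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_ab]
    # Sum every outcome that is no more likely than the observed one (small tolerance for rounding).
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-9)))


# Hypothetical control genotype counts, for illustration only.
p_value = hwe_exact_p(25, 50, 25)
```

Working with log-gamma terms avoids overflow in the factorials when the number of controls is in the hundreds or thousands.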
As shown in Additional file 1, three steps were performed to assess the breast cancer risk model. (1) SNP screening. Following a two-stage strategy, associations between SNPs and risk of breast cancer were estimated by computing odds ratios (ORs) and their 95% confidence intervals (CIs). (2) Risk model construction. For model parsimony, only genetic or clinical risk factors that were independently associated with breast cancer were included. Both the OR and the absolute risk (AR) were taken as indicators to evaluate the risk model. For the OR-based risk model, two different methods were used. The first method treated each risk allele/factor equally and combined them based on the counts of risk alleles/factors. The second method assessed the effects of the SNPs and risk factors using a risk score, a linear combination of the SNP genotypes or risk factors weighted by their individual ORs (the log odds at each SNP locus was additive in the number of minor alleles, and the log odds for the entire model was additive across SNPs and other risk factors). The risk score was then classified into four groups by its quartiles in controls. AR is the risk of developing a disease over a given time period. In this study, the AR for each woman was estimated by a modified Gail model [16,27]. This method uses a multiplicative model to derive the genotype relative risk from the allelic OR. The allelic OR for each SNP was obtained by logistic regression analysis assuming an additive genetic model. For each of the three genotypes at each SNP, the genotype relative risk was converted to the risk relative to the population. The overall risk relative to the population was derived by multiplying the individual's risks relative to the population for all SNPs and for the two clinical risk factors (age at menarche and age at first live birth).
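The conversion from allelic OR to risk relative to the population can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that genotype frequencies follow Hardy-Weinberg proportions and that the population average risk is the frequency-weighted mean of the genotype relative risks. The function names, the example ORs and the allele frequencies are all hypothetical.

```python
def population_relative_risks(allelic_or, risk_allele_freq):
    """Risks relative to the population average for 0, 1 and 2 risk alleles."""
    # Multiplicative model: each additional risk allele multiplies the risk by the allelic OR.
    genotype_rr = [1.0, allelic_or, allelic_or ** 2]
    # ASSUMPTION: genotype frequencies follow Hardy-Weinberg proportions.
    p = risk_allele_freq
    genotype_freq = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    # Population average risk is the frequency-weighted mean of the genotype risks.
    average_rr = sum(f * r for f, r in zip(genotype_freq, genotype_rr))
    return [r / average_rr for r in genotype_rr]


def overall_relative_risk(risk_allele_counts, snp_params, clinical_relative_risks=()):
    """Multiply the population-relative risks across SNPs and clinical factors."""
    overall = 1.0
    for count, (allelic_or, freq) in zip(risk_allele_counts, snp_params):
        overall *= population_relative_risks(allelic_or, freq)[count]
    for rr in clinical_relative_risks:
        overall *= rr
    return overall


# Hypothetical (allelic OR, risk allele frequency) pairs, not estimates from this study.
example_snps = [(1.20, 0.30), (1.40, 0.25)]
# One woman with 1 and 2 risk alleles at the two SNPs and one clinical factor of relative risk 1.1.
example_risk = overall_relative_risk([1, 2], example_snps, clinical_relative_risks=[1.1])
```

By construction, the frequency-weighted mean of the three values returned for a SNP is 1, so a woman with average genotype risk keeps the population risk.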
Finally, the AR for each woman was obtained based on the overall risk relative to the population, calibrated by the incidence rate of breast cancer for women (aged 20 to 85 years), and the mortality rate for all causes except breast cancer from the Shanghai registration system, China [28].
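As an illustration of how an overall risk relative to the population can be turned into an absolute risk, the sketch below accumulates breast cancer risk over consecutive age intervals while accounting for competing mortality. It assumes piecewise-constant hazards within each interval, which is a common way to discretize Gail-type calculations but is not confirmed as the authors' exact implementation. The function name and all rates are hypothetical placeholders and are not the Shanghai registry values.

```python
from math import exp


def absolute_risk(relative_risk, incidence_rates, other_mortality_rates, interval_years=1.0):
    """Cumulative probability of developing breast cancer over consecutive age intervals.

    incidence_rates: population breast cancer incidence per person-year, one per interval.
    other_mortality_rates: mortality from all other causes per person-year, one per interval.
    """
    survival = 1.0   # probability of being alive and cancer-free at the start of the interval
    risk = 0.0
    for incidence, mortality in zip(incidence_rates, other_mortality_rates):
        cancer_hazard = incidence * relative_risk
        total_hazard = cancer_hazard + mortality
        if total_hazard > 0:
            # Probability of leaving the cancer-free state in this interval,
            # times the share of exits that are due to breast cancer.
            leave = 1.0 - exp(-total_hazard * interval_years)
            risk += survival * (cancer_hazard / total_hazard) * leave
        survival *= exp(-total_hazard * interval_years)
    return risk


# HYPOTHETICAL flat yearly rates for ages 20 to 84 (65 one-year intervals), for illustration only.
incidence = [0.001] * 65
other_mortality = [0.005] * 65
risk_average = absolute_risk(1.0, incidence, other_mortality)
risk_doubled = absolute_risk(2.0, incidence, other_mortality)
```

In practice the rates would be age-specific, so each list would hold a different value per age interval rather than a constant.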
(3) Risk model discrimination. Model performance in classifying breast cancer cases and controls was evaluated by receiver operating characteristic (ROC) curves and the area under the curve (AUC). The difference between AUCs was tested by the non-parametric approach developed by DeLong et al. [29]. Furthermore, for the absolute risk-based risk models, we used 10-fold cross-validation to check the reliability of the models. All statistical tests were two-sided and were performed with Statistical Analysis System software (9.1.3; SAS Institute, Cary, NC, USA) and Stata (9.2; StataCorp LP, College Station, TX, USA), unless indicated otherwise.
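The discrimination measures used in this step can be sketched in a few lines of Python. The example computes the AUC as the probability that a randomly chosen case has a higher score than a randomly chosen control (ties count one half), and the sensitivity and specificity at a given cutoff. It does not implement the DeLong test for comparing AUCs or the cross-validation; the function names and the scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5   # ties contribute one half
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Classify a score at or above the cutoff as high risk."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


# Hypothetical risk scores, for illustration only.
cases = [0.2, 0.6, 0.8]
controls = [0.1, 0.3, 0.7]
example_auc = auc(cases, controls)
example_sens, example_spec = sensitivity_specificity(cases, controls, cutoff=0.5)
```

The pairwise loop is quadratic in the sample size, which is still manageable for a few thousand subjects; a rank-based formulation would scale better for larger data sets.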

Results
A total of 1,792 breast cancer cases and 1,867 cancer-free controls were included in the final analysis, and the characteristics of these subjects are summarized in Table 2. Age at menarche (P < 0.001) and age at first [...]
The results for the 15 selected SNPs and breast cancer risk in the testing set are presented in Table 1. The call rates of the 15 SNPs were all above 95% and the MAFs in the controls were all above 0.05. Five SNPs at 2q35, 3p24, 6q22, 6q25 and 10q26 were significantly associated with breast cancer risk (2q35: rs13387042, P = 0.039; 3p24.1: rs2307032, P = 0.017; 6q22.33: rs2180341, P = 0.040; 6q25.1: rs2046210, P = 1.26 × 10^-5; 10q26.13: rs2981582, P = 0.037). Therefore, these five SNPs were included in the further validation analyses.
The call rates of the five SNPs in the validation stage were all above 95% [...]
The cumulative effects of the five SNPs and the two risk factors (age at menarche and age at first live birth) on breast cancer risk were examined by two methods (Table 4). The first method was based on counting risk alleles/factors. Women carrying six or more risk alleles of the five SNPs (5.75% of case patients and 3.23% of control subjects) had a nearly three-fold increased risk of developing breast cancer compared with those carrying less than one of the risk alleles (11.08% of case subjects and 16.70% of control subjects). When age at menarche and age at first live birth were taken into consideration, the top group (having more than seven risk alleles/factors) had a 5.61-fold increased risk compared to the reference group (adjusted OR = 5.61, 95% CI = 4.16 to 7.56). The second method was based on the risk score, calculated as a linear combination of the SNP alleles or risk factors weighted by the individual ORs and then classified into four groups by quartiles. Subjects with a risk score in the highest quartile had a 91% increased breast cancer risk compared to those in the lowest quartile (adjusted OR = 1.91, 95% CI = 1.56 to 2.35, P for trend: 5.60 × 10^-10). Similarly, a 4.73-fold increased risk was observed when age at menarche and age at first live birth were taken into consideration (adjusted OR = 4.73, 95% CI = 3.80 to 5.88, P for trend: 2.27 × 10^-47). We then assessed the performance of the two risk prediction methods in discriminating cases from controls by ROC curve analyses. The AUC for the risk score analysis (0.649, 95% CI: 0.631 to 0.667; sensitivity = 62.60%, specificity = 57.05%, Figure 1) was significantly higher than that for the risk factor counting method (AUC: 0.637, 95% CI: 0.619 to 0.655; sensitivity = 62.16%, specificity = 60.03%, Figure 2) (P < 0.0001).
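The two scoring approaches compared above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the weights are the dominant-model adjusted ORs quoted in the abstract, used here only as example numbers, whereas the study's risk score was based on per-allele log odds; the function names and the example genotypes are hypothetical.

```python
from bisect import bisect_right
from math import log
from statistics import quantiles

# ILLUSTRATIVE weights only: dominant-model ORs from the abstract,
# not the per-allele ORs that an additive risk score would use.
ILLUSTRATIVE_ORS = {
    "rs13387042": 1.26,
    "rs2307032": 1.24,
    "rs2180341": 1.22,
    "rs2046210": 1.51,
    "rs2981582": 1.31,
}


def risk_allele_count(genotypes):
    """Counting method: every risk allele contributes equally."""
    return sum(genotypes.values())


def weighted_risk_score(genotypes, odds_ratios):
    """Risk score method: risk allele counts weighted by log(OR), summed across SNPs."""
    return sum(log(odds_ratios[snp]) * count for snp, count in genotypes.items())


def quartile_group(score, control_scores):
    """Return 0 to 3 according to the quartiles of the score among controls."""
    cut_points = quantiles(control_scores, n=4)   # three cut points
    return bisect_right(cut_points, score)


# Hypothetical woman: risk allele counts (0, 1 or 2) at each of the five SNPs.
woman = {"rs13387042": 1, "rs2307032": 0, "rs2180341": 0, "rs2046210": 2, "rs2981582": 1}
count = risk_allele_count(woman)
score = weighted_risk_score(woman, ILLUSTRATIVE_ORS)
```

Because the weights are log ORs, a SNP with a larger effect (for example rs2046210) moves the score more per allele than a SNP with a smaller effect, which the simple count cannot reflect.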
Absolute risk was also calculated with a modified Gail model to evaluate the combined effects of the five SNPs and the two risk factors, and a 65-year absolute risk of breast cancer among women aged 20 to 85 years was estimated for each subject. As shown in Table 5, more subjects were classified as high risk as the number of risk alleles/factors increased. However, the variation in the absolute risk distribution also increased with the number of factors used in the risk-predicting model. Compared to a uniform 65-year cumulative risk of 0.07 in the population (the risk for women carrying four risk factors, the group with the largest proportion of controls: 22.01%, Table 5), a wide spectrum of absolute risk estimates was found using these five markers and the two clinical risk factors (Figure 3). At a cutoff of 0.14 (two-fold the population median risk) or 0.21 (three-fold the population median risk), 26.57% or 10.43% of women were classified as high risk, respectively. We also used ROC curve analysis to evaluate how well absolute risk classified cases and controls. As shown in Figure 4, we obtained an AUC of 0.658 (95% CI: 0.640 to 0.676) (sensitivity = 61.98%, specificity = 60.26%) for five SNPs plus two risk factors. Cross-validation gave similar AUCs (0.572 (five SNPs only), 0.644 (two risk factors only) and 0.660 (five SNPs plus two risk factors)), which suggests that the models are reasonably reliable.
The stratified analyses of the five SNPs by ER or PR status are summarized in Additional file 2. No significant heterogeneity was observed in the effect of any SNP across ER or PR subgroups. A further stratified analysis of the cumulative effects of the five SNPs (coded 0 to 2 risk alleles as 0 and more than 3 risk alleles as 1) also found no heterogeneity between subgroups (Additional file 3).

Discussion
In our study involving 1,792 breast cancer cases and 1,867 cancer-free controls, 5 of the 15 variants identified in previous GWAS [8][9][10][11][12][13][14] were consistently associated with breast cancer risk in this Chinese population. Risk assessment models and absolute risk calculations combining the five SNPs and two clinical risk factors indicated that these markers have small effects in discriminating cases from controls. Overall, the results provide further evidence for the utility of GWAS-identified SNPs in breast cancer risk assessment in Chinese women. We summarized the associations of the 15 breast cancer SNPs identified by previous GWAS and subsequent replication studies (Additional file 4). SNP rs13387042 at 2q35 was identified as a breast cancer susceptibility SNP in two GWAS conducted among Europeans [12,13]. Significant associations were also observed in most of the later studies on Europeans [...]
Figure 1 The area under the curve (AUC) for breast cancer risk-predicting models calculated by the risk score method.
Traditional approaches to assessing disease risk rely primarily on non-genetic risk factors and have apparent limitations, and better prediction is expected if genetic determinants can be incorporated. Recently, several studies of such efforts have been published [16][17][18][19][20][21][22]. Zheng et al. conducted a validation study with 3,039 breast cancer cases and 3,082 controls for 12 GWAS-identified SNPs (nine regions) in Asian women [17], and built a risk assessment model with eight SNPs and five clinical risk factors. However, only five of the eight SNPs were significantly associated with breast cancer susceptibility in that study. In our current study, two more regions were incorporated (3p24.1, 17q23.2) and we found five susceptibility SNPs with a two-stage validation, although the performance of the risk assessment model was still limited.
Overall, a risk prediction model is not a diagnostic tool but provides an estimate of the likelihood of developing disease in the future. A well-evaluated risk model that combines genetic and clinical risk factors can be used as a screening tool to identify high-risk individuals in the general population. Women at high risk for breast cancer can be identified by choosing an optimal cutoff (for example, two-fold the population median risk), and these women should undergo regular breast cancer screening [48,49]. Results from this study suggest that GWAS-identified SNPs can be used to improve the prediction model. However, the current study has a number of limitations. First, several newly reported breast cancer risk-associated SNPs were not included in the current analysis [50]. Second, more breast cancer-associated risk factors should be evaluated, such as body mass index (BMI) and family history of breast cancer [14]. However, the effect of BMI on breast cancer risk could not be well evaluated in our study because of its retrospective design, and our moderate sample size limited our power to evaluate family history of breast cancer (only 101 cases (7.39%) and 3 controls (0.29%) had a positive family history). Third, the two-stage study design, although it helps to avoid false positive findings, may miss weak but true associations, because our overall sample size is moderate.

Conclusions
Overall, five GWAS identified variants were also consistently validated in this Chinese population. Risk assessment models that incorporate both a genetic risk score based on these SNPs and the established risk factors for breast cancer may be useful for identifying high-risk women for targeted cancer prevention. More genetic risk variants and other risk factors should be well evaluated and incorporated into the risk-predicting models to improve the ability of personalized risk assessment.