


A genome-wide association study identifies a breast cancer risk variant in ERBB4 at 2q34: results from the Seoul Breast Cancer Study



Although genome-wide association studies (GWAS) have identified approximately 25 common genetic susceptibility loci independently associated with breast cancer risk, the risk variants reported to date explain only a small fraction of the heritability of breast cancer. Furthermore, GWAS-identified loci were primarily discovered in women of European descent.


To evaluate previously identified loci in Korean women and to identify additional novel breast cancer susceptibility variants, we conducted a three-stage GWAS that included 6,322 cases and 5,897 controls.


In the validation study, which used the 2,273 cases and 2,052 controls of Stage I, seven GWAS-identified loci [5q11.2/MAP3K1 (rs889312 and rs16886165), 5p15.2/ROPN1L (rs1092913), 5p12/MRPS30 (rs7716600), 6q25.1/ESR1 (rs2046210 and rs3734802), 8q24.21 (rs1562430), 10q26.13/FGFR2 (rs10736303), and 16q12.1/TOX3 (rs4784227 and rs3803662)] were significantly associated with breast cancer risk in Korean women (Ptrend < 0.05). To identify additional genetic risk variants, we selected the 17 most promising SNPs from Stage I and replicated them in 2,052 cases and 2,169 controls (Stage II). Four SNPs were further evaluated in 1,997 cases and 1,676 controls (Stage III). SNP rs13393577 at chromosome 2q34, located in the Epidermal Growth Factor Receptor 4 (ERBB4) gene, showed a consistent association with breast cancer risk, with a combined odds ratio (95% CI) of 1.53 (1.37 to 1.70) (combined P for trend = 8.8 × 10^-14).


This study shows that seven breast cancer susceptibility loci, which were previously identified in European and/or Chinese populations, could be directly replicated in Korean women. Furthermore, this study provides strong evidence implicating rs13393577 at 2q34 as a new risk variant for breast cancer.

Introduction

Breast cancer, one of the most common malignancies among women worldwide, is a complex polygenic disease in which genetic factors play a significant role in etiology [1, 2]. To date, genome-wide association studies (GWASs) have reported over 40 common low-penetrance variants in 25 loci associated with breast cancer risk, as listed in the National Human Genome Research Institute catalog [3]. The most strongly and consistently associated single-nucleotide polymorphisms (SNPs) reside in intron 2 of the receptor tyrosine kinase gene FGFR2 at 10q26.13 (rs2981582) and near the 5' end of the TOX3 gene at 16q12.1 (rs3803662) [4-9]. With the exception of three studies conducted among Asian women [10-12], all previously published GWASs have been conducted primarily in women of European descent. Several studies, including our own, have evaluated loci first identified in European populations in other ethnic groups and validated the initial findings [13, 14]. However, newly discovered loci initially identified in women of European descent tend to be only weakly associated with breast cancer in women of Asian descent [10, 11], or cannot be confirmed in Asians because linkage disequilibrium (LD) patterns differ between ethnic populations, suggesting that additional genetic variants relevant to Asian women remain to be discovered.

In this study, we conducted a three-stage GWAS to identify common breast cancer susceptibility loci and to validate previously reported loci. In stage I, we analyzed 2,273 patients with breast cancer from the Seoul Breast Cancer Study (SeBCS) and 2,052 healthy controls from a large urban cohort participating in the Korea Genome Epidemiology Study (KoGES), genotyped with the Affymetrix Genome-Wide Human SNP Array 6.0 (Affymetrix, Inc., Santa Clara, CA, USA). By analyzing data from two replication stages comprising 4,049 cases and 3,845 controls, we found strong evidence for a new genetic variant that may be associated with breast cancer risk among Asian women.

Materials and methods

Study population

A genome-wide association scan (stage I) was conducted with 2,385 patients with breast cancer from the SeBCS and 2,392 healthy controls from a large urban cohort participating in the KoGES (Supplementary methods in Additional file 1). For stage II, 2,052 cases were selected from SeBCS participants who had not been included in stage I, and 2,169 controls were selected from another KoGES cohort recruited from two small cities containing both urban and rural areas. For stage III, 1,997 cases were selected from two independent breast cancer studies, the Korean Hereditary Breast Cancer (KOHBRA) study (n = 1,289) and the Yonsei Breast Cancer Study (n = 708), and 1,676 controls were selected from rural health examinees recruited to study risk factors for chronic diseases. Detailed descriptions of the participants are provided in Supplementary methods in Additional file 1, and descriptive statistics of the study subjects are shown in Table S2 of Additional file 2. The study protocols were approved by the institutional review boards of Seoul National University Hospital (institutional review board # H-0503-144-004) and of each collaborating institute (Description of study participants in Additional file 1). Informed consent was obtained from all participants.


Stage I genotyping was performed on a single platform, the Affymetrix Genome-Wide Human SNP Array 6.0 chip (Affymetrix, Inc.). In total, 4,777 samples were genotyped using 500 ng of genomic DNA from peripheral blood. Genotypes were called with the Birdseed V2 algorithm [15].

In total, 30 quality control (QC) samples were genotyped on the Affymetrix SNP Array 6.0, and the average concordance rate between QC samples was 99.8%. For internal validation of the Affymetrix SNP Array 6.0 platform, 12 SNPs were genotyped in all subjects with SNPstream UHT (12-plex, SNP-IT assay, Orchid Biosciences, Princeton, NJ). Samples with a genotype call rate below 95%, a high heterozygosity rate, or an incorrectly inferred sex were excluded. Genome-wide average identity by state (IBS) was calculated for each pair of individuals and used to identify apparent first-degree relatives, as well as more distant relatives whose clusters were tightly linked to first-degree relatives. Pairwise IBS was calculated with the PLINK software package from a subset of 74,965 pruned markers in approximate linkage equilibrium. Multidimensional scaling analysis based on pairwise IBS showed that, apart from some outliers, all subjects clustered closely with the HapMap Asian samples (Figure S2 of Additional file 3). Subjects with a history of cancer and patients with diagnosed benign breast cancer were subsequently excluded. Finally, 4,325 individuals remained in the association analyses (Table S1 of Additional file 2).
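
The average IBS statistic used above can be illustrated with a minimal Python sketch. It assumes genotypes coded as minor-allele counts (0, 1, 2) over the same set of pruned SNPs, with None for a missing call; at each SNP two individuals share 2 - |g1 - g2| alleles identical by state. The function names and the 0.9 cut-off are hypothetical illustrations, not PLINK's implementation or the threshold used by the authors.

```python
from itertools import combinations


def ibs_pair(g1, g2):
    """Average proportion of alleles shared identical by state (IBS).

    Genotypes are minor-allele counts (0, 1, 2); None marks a missing call.
    At each SNP the two individuals share 2 - |g1 - g2| alleles IBS.
    """
    shared = 0
    compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip SNPs missing in either individual
        shared += 2 - abs(a - b)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs called in both individuals")
    return shared / (2 * compared)


def flag_close_pairs(genotypes, threshold=0.9):
    """List pairs whose average IBS exceeds a hypothetical threshold.

    genotypes maps a sample ID to its genotype vector over the same pruned SNPs.
    """
    flagged = []
    for (id1, g1), (id2, g2) in combinations(genotypes.items(), 2):
        score = ibs_pair(g1, g2)
        if score > threshold:
            flagged.append((id1, id2, score))
    return flagged


samples = {
    "S1": [0, 1, 2, 1, 0],
    "S2": [0, 1, 2, 1, 0],  # identical profile to S1
    "S3": [2, 1, 0, 0, 2],
}
print(flag_close_pairs(samples))  # [('S1', 'S2', 1.0)]
```

Unrelated individuals from the same population already share most alleles IBS, so in practice any cut-off has to be chosen relative to the observed IBS distribution rather than fixed in advance.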

To ensure data quality, SNPs were excluded if they met any of the following QC criteria: (a) a Hardy-Weinberg equilibrium P value below 10^-6, (b) a genotype call rate below 95%, (c) a minor allele frequency (MAF) below 1%, (d) a poor cluster plot, (e) significant differential missingness between cases and controls (P < 10^-4), or (f) mapping to multiple positions or to the mitochondrial genome. In total, 555,525 Affymetrix SNP Array 6.0 SNPs were used for the final association analyses (Table S1 of Additional file 2).
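
The numeric filters (a), (b), (c), and (e) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Hardy-Weinberg test here is a 1-df chi-square goodness-of-fit test rather than an exact test, it is applied to all called samples (many pipelines use controls only), differential missingness is tested with a 2x2 chi-square test, and all function names are hypothetical. Criteria (d) and (f) require cluster plots and mapping information and are not modelled.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def hwe_pvalue(n_hom_ref, n_het, n_hom_alt):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2_1df_pvalue(stat)


def missingness_pvalue(case_calls, control_calls):
    """2x2 chi-square test of missing vs. called genotypes in cases vs. controls.

    Each argument is a (n_missing, n_called) tuple.
    """
    a, b = case_calls
    c, d = control_calls
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # no variation to test
    stat = n * (a * d - b * c) ** 2 / margins
    return chi2_1df_pvalue(stat)


def snp_passes_qc(counts, case_calls, control_calls):
    """Apply criteria (a), (b), (c) and (e) to one SNP.

    counts: (hom_ref, het, hom_alt) among all samples with a genotype call.
    """
    called = sum(counts)
    missing = case_calls[0] + control_calls[0]
    if called / (called + missing) < 0.95:                      # (b) call rate
        return False
    alt_freq = (2 * counts[2] + counts[1]) / (2 * called)
    if min(alt_freq, 1.0 - alt_freq) < 0.01:                    # (c) MAF
        return False
    if hwe_pvalue(*counts) < 1e-6:                              # (a) HWE
        return False
    if missingness_pvalue(case_calls, control_calls) < 1e-4:    # (e)
        return False
    return True


print(snp_passes_qc((25, 50, 25), (1, 49), (1, 49)))  # True
```

The MAF check runs before the Hardy-Weinberg test so that monomorphic SNPs, whose expected genotype counts include zeros, are rejected before any division by an expected count.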

Genotyping for stages II and III was performed with the 5' exonuclease assay (TaqMan) on the ABI Prism 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. Primers and probes were supplied by Applied Biosystems (Foster City, CA, USA) as Assays-By-Design. For QC, about 2.2% of the samples were genotyped in duplicate. Only SNPs with a concordance rate greater than 99% between duplicates and a genotype success rate greater than 99% were included in the subsequent replication phases.

Single-nucleotide polymorphism selection for the validation study using stage I

We selected SNPs that had been implicated in previous GWASs and reported in the National Human Genome Research Institute catalog [16]. We included a SNP only if it had an assigned reference allele, a defined MAF, and an estimated odds ratio (OR) or beta-coefficient with a 95% confidence interval (CI). For SNPs that were neither genotyped on the Affymetrix SNP Array 6.0 nor successfully imputed (imputation QC r2 < 0.3), we selected the best tagging SNPs on the basis of the LD metrics r2 and D'. Thus, we selected rs10736303 for the SNPs at 10q26.13/FGFR2 (rs1219648, rs2981572, rs2981585, and rs2981579) and rs10483813 for 14q24.1/RAD51L1 rs999737. We did not include SNP rs614367 at 11q13.3 (MYEOV, CCND1, ORAOV1, FGF19, FGF4, FGF3) because its MAF was very low in this study (MAF = 0.002) [9].
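
Choosing a proxy for an untyped SNP amounts to picking the typed candidate most strongly correlated with it. The minimal Python sketch below assumes genotype data for the target are available in some reference sample and approximates r2 as the squared Pearson correlation of genotype counts; Haploview instead estimates r2 from phased haplotype frequencies, so this is a simplification. The function names and the min_r2 cut-off are hypothetical.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two vectors of genotype counts (0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP cannot tag anything
    return sxy * sxy / (sxx * syy)


def best_tag(target, candidates, min_r2=0.5):
    """Return (snp_id, r2) for the candidate best correlated with target, or None.

    candidates maps SNP IDs to genotype vectors measured in the same individuals.
    min_r2 is a hypothetical cut-off, not a value taken from the study.
    """
    scored = [(genotype_r2(target, g), snp_id) for snp_id, g in candidates.items()]
    if not scored:
        return None
    r2, snp_id = max(scored)
    return (snp_id, r2) if r2 >= min_r2 else None


target = [0, 1, 2, 0, 1]
candidates = {"rsA": [0, 1, 2, 0, 1], "rsB": [1, 1, 1, 1, 0]}
print(best_tag(target, candidates))
```

Here rsA has the same genotypes as the target and is returned with r2 of 1, while rsB is only weakly correlated (r2 of about 0.02).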

Single-nucleotide polymorphism selection for replication in stage II

After stage I, SNPs for replication were selected from the 555,525 directly genotyped SNPs that had passed QC according to the following criteria: (a) an MAF greater than 5% in either cases or controls, (b) very clear genotyping clusters, (c) a Ptrend for the per-allele OR of no more than 5 × 10^-4, and (d) no strong LD with any of the GWAS-identified risk variants (r2 < 0.5). Additionally, to select SNPs residing in SNP clusters, we selected loci in which at least two SNPs had a Ptrend for the per-allele OR of no more than 5 × 10^-3. When multiple SNPs within 100 kb were in LD (r2 > 0.2), the SNP with the lowest Ptrend was selected for replication.
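
The selection rules above can be expressed as a filter followed by greedy LD pruning. The Python sketch below is one possible reading, not the authors' code: the dictionary field names are hypothetical, "locus" is assumed to mean the same 100-kb window used for pruning, and pairwise r2 values are supplied by a caller-provided function.

```python
def select_for_replication(snps, ld_r2, window=100_000):
    """Sketch of the stage II SNP selection described above.

    snps: list of dicts with keys id, chrom, pos, maf_case, maf_control,
          p_trend, clear_clusters (bool) and r2_known (max r2 with any
          GWAS-identified variant). These field names are hypothetical.
    ld_r2: function (id1, id2) -> r2 between two SNPs.
    """
    def near(s, t):
        return s["chrom"] == t["chrom"] and abs(s["pos"] - t["pos"]) <= window

    # Criteria (a)-(d).
    eligible = [
        s for s in snps
        if max(s["maf_case"], s["maf_control"]) > 0.05
        and s["clear_clusters"]
        and s["p_trend"] <= 5e-4
        and s["r2_known"] < 0.5
    ]
    # Cluster support: at least two SNPs (the candidate included) with
    # p_trend <= 5e-3 nearby. Using the 100-kb window as the locus is an assumption.
    eligible = [
        s for s in eligible
        if sum(1 for t in snps if near(s, t) and t["p_trend"] <= 5e-3) >= 2
    ]
    # LD pruning: walk from the smallest p_trend upward and drop any SNP that is
    # correlated (r2 > 0.2) with an already selected SNP within the window.
    selected = []
    for s in sorted(eligible, key=lambda x: x["p_trend"]):
        if not any(near(s, k) and ld_r2(s["id"], k["id"]) > 0.2 for k in selected):
            selected.append(s)
    return [s["id"] for s in selected]


def make_snp(snp_id, pos, p):
    return {"id": snp_id, "chrom": "2", "pos": pos, "maf_case": 0.2,
            "maf_control": 0.2, "p_trend": p, "clear_clusters": True,
            "r2_known": 0.0}


example = [make_snp("A", 1_000, 1e-5), make_snp("B", 5_000, 1e-4),
           make_snp("C", 500_000, 2e-4), make_snp("D", 1_000_000, 3e-4),
           make_snp("E", 1_050_000, 1e-3)]
pair_r2 = {frozenset(("A", "B")): 0.9}


def r2_lookup(a, b):
    return pair_r2.get(frozenset((a, b)), 0.0)


print(select_for_replication(example, r2_lookup))  # ['A', 'D']
```

In this toy example, B is dropped because it is in LD with the stronger signal A, C is dropped because no second SNP supports it, and D survives because E (too weak to be selected itself) still counts as supporting evidence.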

Single-nucleotide polymorphism imputation

To infer the genotypes of SNPs not included on the Affymetrix Genome-Wide Human SNP Array 6.0, imputation was carried out with the hidden Markov model implemented in MACH 1.0 [17, 18]. Imputation was based on the 555,525 autosomal SNPs genotyped in stage I that had passed QC, with the phased CHB + JPT data from HapMap Phase II (release 22), comprising over 2.4 million SNPs, as the reference panel. In total, 2,210,823 SNPs had an imputation quality score (r2) of at least 0.30. The average r2 of the SNPs not found on the array but included in the validation study (15 SNPs in Table 1) was 0.97.
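
MACH reports a per-SNP imputation quality score (Rsq) that is commonly described as the ratio of the observed variance of the imputed dosages to the variance expected if genotypes were known exactly, 2p(1 - p) under Hardy-Weinberg equilibrium. The Python sketch below implements that simplified variance-ratio form as an assumption; it is not MACH's source code, and the function names are hypothetical.

```python
def dosage_rsq(dosages):
    """Variance-ratio estimate of imputation quality for one SNP.

    dosages: expected allele counts (0.0-2.0) for each imputed individual.
    Compares the observed dosage variance with the 2p(1 - p) expected for
    perfectly known genotypes under Hardy-Weinberg equilibrium.
    """
    n = len(dosages)
    mean = sum(dosages) / n
    p = mean / 2.0
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0:
        return 0.0  # monomorphic in the imputed data
    observed_var = sum((d - mean) ** 2 for d in dosages) / n
    return observed_var / expected_var


def filter_imputed(dosage_table, min_rsq=0.30):
    """Keep SNP IDs whose estimated r2 reaches the 0.30 threshold used in the text."""
    return [snp_id for snp_id, d in dosage_table.items() if dosage_rsq(d) >= min_rsq]


table = {
    "confident": [0.0, 1.0, 2.0, 1.0],  # dosages equal to hard calls
    "uncertain": [0.9, 1.0, 1.1, 1.0],  # dosages shrunk toward the mean
}
print(filter_imputed(table))  # ['confident']
```

Poorly imputed SNPs have dosages pulled toward the population mean, which shrinks their variance and therefore their score.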

Table 1 Association of previously identified loci with breast cancer risk in 2,257 cases and 2,052 controls in the Seoul Breast Cancer Study


The LD metrics r2 and D' between SNPs were calculated with Haploview version 4.2 (Whitehead Institute, Cambridge, MA, USA) based on release 27, NCBI Build 36. Regional plots were drawn with LocusZoom standalone version 1.1 [19] based on HapMap Phase II JPT+CHB for all SNPs in Figure S3 of Additional file 3, except for rs10736303, for which the 1000 Genomes (August 2009) JPT+CHB data were used.
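
For two biallelic SNPs, both LD metrics follow from the four haplotype frequencies: D = p11 - pA pB, D' = |D| / Dmax, and r2 = D^2 / (pA(1 - pA) pB(1 - pB)), where pA and pB are the frequencies of allele 1 at each SNP. The Python sketch below starts from known haplotype frequencies, which is a simplifying assumption; Haploview typically estimates these frequencies from unphased genotypes first. The function name is hypothetical.

```python
def ld_metrics(h11, h12, h21, h22):
    """Return (D', r2) from the four haplotype frequencies of two biallelic SNPs.

    h11 is the frequency of the haplotype carrying allele 1 at both SNPs,
    h12 carries allele 1 at SNP A and allele 2 at SNP B, and so on.
    """
    total = h11 + h12 + h21 + h22
    h11, h12, h21, h22 = (h / total for h in (h11, h12, h21, h22))  # normalise
    p_a = h11 + h12        # frequency of allele 1 at SNP A
    p_b = h11 + h21        # frequency of allele 1 at SNP B
    d = h11 - p_a * p_b    # coefficient of linkage disequilibrium
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


print(ld_metrics(0.3, 0.0, 0.2, 0.5))
```

In this example one of the four haplotypes is absent, so D' equals 1, yet r2 is only about 0.43 because the two SNPs have different allele frequencies. This is why tagging decisions usually rely on r2 rather than on D' alone.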

Statistical analyses

Associations with breast cancer risk were estimated as ORs and 95% CIs by logistic regression under an additive model, adjusted for age. Stage I (GWAS) analyses were conducted primarily with PLINK version 1.06 [20] for directly genotyped SNPs and with MACH2dat [17, 18] for imputed SNPs. To compare the observed and expected distributions of test statistics, quantile-quantile analysis of 2-degrees-of-freedom logistic regression statistics was applied. The genomic inflation factor lambda (λ) was 1.043, suggesting that population substructure, if any, should not have a substantial effect on the results (Figure S1 of Additional file 3).
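
The genomic inflation factor is conventionally defined as the median of the observed 1-df association chi-square statistics divided by the median expected under the null hypothesis (about 0.455). The paper does not state which statistics λ was computed from, so the Python sketch below assumes two-sided 1-df P values (for example, from the trend test) as input. The function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def pvalue_to_chi2_1df(p):
    """Convert a two-sided P value (0 < p <= 1) into a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(pvalues):
    """Genomic control lambda: median observed chi-square over its null median."""
    return median(pvalue_to_chi2_1df(p) for p in pvalues) / CHI2_1DF_MEDIAN


null_pvalues = [i / 1000 for i in range(1, 1000)]  # evenly spread, as expected under H0
print(round(genomic_inflation(null_pvalues), 3))  # 1.0
```

Because λ uses the median, a handful of truly associated SNPs barely moves it, whereas population stratification, which inflates statistics genome-wide, pushes it above 1.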

To control the risk of false discoveries, multiple-comparison-adjusted P values for stage II, stage III, and the combined analysis were calculated with the Benjamini-Hochberg false discovery rate method [21]. Cochran's Q statistic was calculated to test for heterogeneity across stages, and the I2 statistic to quantify the proportion of total variation due to heterogeneity. Combined estimates were obtained by meta-analysis under a fixed-effects model because there was no evidence of heterogeneity. To assess associations according to estrogen receptor (ER) status, progesterone receptor (PR) status, or both, ORs and 95% CIs were estimated after stratification by hormone receptor status. Heterogeneity by receptor status was tested with a case-only analysis, using a polytomous logistic regression model with receptor status as the outcome variable.
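
The Benjamini-Hochberg adjustment can be sketched in a few lines of Python: sort the P values, scale each by m/rank, and enforce monotonicity from the largest P value downward. This is the standard step-up procedure; the function name and the input values below are illustrative and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


stage_p = [0.01, 0.04, 0.03, 0.005]  # illustrative values, not from the study
print([round(q, 3) for q in benjamini_hochberg(stage_p)])  # [0.02, 0.04, 0.04, 0.02]
```

Capping the running minimum at 1.0 also keeps every adjusted value within the valid range of a probability.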

The statistical power to detect the ORs reported in previous GWASs was calculated with Quanto version 1.2.4. All analyses other than the stage I GWAS were performed with SAS version 9.2 (SAS Institute Inc., Cary, NC, USA) and STATA version 11.2 (StataCorp LP, College Station, TX, USA).
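
To give a rough sense of how allele frequency and effect size drive power, the following Python sketch uses a textbook normal approximation for a two-sided per-allele test, assuming Hardy-Weinberg equilibrium and a multiplicative model. It is not the likelihood-based method implemented in Quanto and is not expected to reproduce the power values reported in this paper. The function name and the example inputs are hypothetical.

```python
from statistics import NormalDist


def allelic_power(control_freq, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided allelic (per-allele) test.

    Assumes Hardy-Weinberg equilibrium and a multiplicative model, so each
    individual contributes two independent alleles. Textbook normal
    approximation; the (negligible) opposite tail is ignored.
    """
    norm = NormalDist()
    # Risk-allele frequency in cases implied by the per-allele odds ratio.
    case_odds = odds_ratio * control_freq / (1.0 - control_freq)
    case_freq = case_odds / (1.0 + case_odds)
    a1, a0 = 2 * n_cases, 2 * n_controls  # allele counts
    pooled = (a1 * case_freq + a0 * control_freq) / (a1 + a0)
    se_null = (pooled * (1 - pooled) * (1 / a1 + 1 / a0)) ** 0.5
    se_alt = (case_freq * (1 - case_freq) / a1
              + control_freq * (1 - control_freq) / a0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = abs(case_freq - control_freq)
    return norm.cdf((diff - z_crit * se_null) / se_alt)


# Hypothetical inputs: a 5% risk-allele frequency and a per-allele OR of 1.10
# with the stage I sample size of 2,273 cases and 2,052 controls.
print(round(allelic_power(0.05, 1.10, 2273, 2052), 2))
```

Lowering the control allele frequency or the odds ratio shrinks the expected frequency difference between cases and controls, which is why rare alleles with modest effects are poorly powered at this sample size.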

Results

Validation of previously identified breast cancer susceptibility loci in Korean women

We found that multiple genomic locations were potentially associated with breast cancer risk (Figure 1). Table 1 presents the tests of association between the previously reported loci and breast cancer risk in 2,257 cases and 2,052 controls. Among the 27 SNPs reported in published GWASs, 10 SNPs located in seven loci showed significant associations with breast cancer risk (Ptrend < 0.05). Plots of regional LD and strength of association by chromosomal position for these seven loci are shown in Figure S3 of Additional file 3.

Figure 1

Manhattan plot for 555,525 genotyped single-nucleotide polymorphisms in 2,273 cases and 2,052 controls.

The strongest association was observed for 6q25.1/ESR1 rs2046210 (OR = 1.29; 95% CI = 1.18 to 1.41; Ptrend = 5.84 × 10^-8), followed by 16q12.1/TOX3 rs4784227 (OR = 1.27; 95% CI = 1.15 to 1.40; Ptrend = 1.51 × 10^-6); both estimates are similar in magnitude and direction to those of previous reports. The SNPs 6q25.1/ESR1 rs3734802 and 16q12.1/TOX3 rs3803662, which are in moderate LD with rs2046210 (r2 = 0.317 in CHB+JPT) and rs4784227 (r2 = 0.139 in CHB+JPT), respectively, were also significantly associated with breast cancer risk (OR = 1.20; 95% CI = 1.09 to 1.33; Ptrend = 1.8 × 10^-4 and OR = 1.24; 95% CI = 1.14 to 1.36; Ptrend = 2.41 × 10^-6, respectively). Although the allele frequencies of 5q11.2/MAP3K1 rs889312 (like those of 16q12.1/TOX3 rs3803662) differ substantially between Europeans and Koreans, the rs889312 C allele was also significantly associated with increased breast cancer risk (OR = 1.16; 95% CI = 1.06 to 1.27; Ptrend = 8.49 × 10^-4). Associations with breast cancer risk were also confirmed for 10q26.13/FGFR2 rs10736303 (proxy of rs2981579), 5p15.2/ROPN1L rs1092913, 5p12/MRPS30 rs7716600, and 8q24.21 rs1562430, although the effect sizes of the last three SNPs were smaller than in previous reports. However, 14q24.1/RAD51L1 rs10483813 (proxy of rs999737) showed no significant association, with an OR of 1.21. None of the remaining 17 SNPs was replicated in our study (P > 0.05), and their estimated effect sizes were below 1.10, lower than the estimates in women of European descent.

We further assessed these associations according to ER and PR status (Table S3 of Additional file 2). The 5p12/MRPS30 SNPs rs7716600 and rs4415084 showed stronger associations with ER+ than with ER- tumors (Pheterogeneity = 0.02 and 0.05, respectively). Although no overall associations were found for 2q35 rs13387042 and 10p15.1/ANKRD16, FBXO18 rs2380205, stratified analysis revealed statistically significant associations with ER+ tumors (Ptrend = 0.03 and 0.04, respectively) and PR+ tumors (Ptrend = 0.05 and 0.05, respectively), but not with ER- or PR- tumors. The test for heterogeneity was significant only for 10p15.1/ANKRD16, FBXO18 rs2380205 (Pheterogeneity = 0.030 for ER+ vs. ER- and 0.040 for PR+ vs. PR-). Additionally, two SNPs at 6q25.1/ESR1 (rs2046210 and rs3784805) showed stronger associations with ER- than with ER+ tumors, although the differences were not statistically significant. Associations did not differ by ER or PR status for the remaining SNPs.

Analysis of 2q34 rs13393577 and breast cancer risk

To search for additional independent genetic risk variants in Korean women, we selected the 17 most suggestive SNPs from stage I and genotyped them in an independent set of 2,052 cases and 2,169 controls (stage II). Among these 17 SNPs, only rs13393577 at 2q34 was significantly associated with breast cancer risk in stage II. The estimated ORs for rs13393577 were as follows: ORheterozygote = 1.39 (95% CI = 1.16 to 1.67; P = 2.05 × 10^-5), ORhomozygote = 5.57 (95% CI = 2.26 to 13.7; P = 4.10 × 10^-3), and ORper-allele = 1.51 (95% CI = 1.29 to 1.78; Ptrend = 1.1 × 10^-5). SNP rs13393577 and three other SNPs showing marginally significant associations in stage II (Ptrend < 0.10), namely 3q26.32 rs3806685, 6q25.1 rs9498283, and 17q24.3 rs11077488, were further evaluated in stage III, which included 1,997 cases and 1,676 controls. Again, rs13393577 was significantly associated with breast cancer risk (Table 2), and its Ptrend reached 8.8 × 10^-14 in the combined analysis (Figure 2). SNP rs3806685 at 3q26.32 was also associated with breast cancer risk in stage III (ORper-allele = 1.18; 95% CI = 1.04 to 1.34; Ptrend = 1.8 × 10^-2), but its combined association was not significant (Ptrend = 5.9 × 10^-1). The other two SNPs showed no significant association in stage III or in the combined analysis (P > 0.05). The test for heterogeneity suggested no difference in the genetic effect of rs13393577 across ER or PR status (data not shown). To capture additional signals, we investigated SNPs near rs13393577 (Figure 3). Among the directly genotyped SNPs, two were in strong LD (r2 > 0.8) with rs13393577: rs6756468 (r2 = 1.00; P = 1.2 × 10^-3) and rs16848753 (r2 = 0.81; P = 2.6 × 10^-2). SNP rs6756468, which is in tight LD with rs13393577, is located in the LD block containing the mir-548f-2 gene.
Several imputed SNPs in high LD with rs13393577 were nominally associated with breast cancer risk (P < 0.05) (Table S5 of Additional file 2).
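
The combined estimate in Figure 2 pools the per-stage per-allele ORs under a fixed-effects model, with Cochran's Q and I2 describing between-stage heterogeneity. The Python sketch below assumes inverse-variance weighting on the log-OR scale and recovers each stage's standard error from its 95% CI; the paper does not specify the weighting scheme beyond "fixed effects", and the example inputs are hypothetical, not the Table 2 estimates. The function name is also hypothetical.

```python
import math


def fixed_effect_meta(estimates, z=1.959964):
    """Inverse-variance fixed-effect meta-analysis of odds ratios.

    estimates: list of (odds_ratio, ci_lower, ci_upper) tuples, one per stage,
    with 95% confidence limits. Standard errors are recovered from the CI
    width on the log scale.
    """
    betas, weights = [], []
    for or_, lo, hi in estimates:
        beta = math.log(or_)
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        betas.append(beta)
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q and I2 quantify between-stage heterogeneity.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    z_stat = pooled / pooled_se
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se)),
        "p": math.erfc(abs(z_stat) / math.sqrt(2)),  # two-sided P value
        "q": q,
        "i2": i2,
    }


stages = [(1.40, 1.15, 1.70), (1.60, 1.30, 1.97)]  # hypothetical per-stage estimates
pooled = fixed_effect_meta(stages)
print(round(pooled["or"], 2), round(pooled["i2"], 2))
```

A fixed-effects model is appropriate when I2 is near zero, as reported for rs13393577; with substantial heterogeneity a random-effects model would be the usual alternative.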

Table 2 Results of four single-nucleotide polymorphisms and breast cancer risk identified in genome-wide association studies (stage I) and replication stages (stages II and III)
Figure 2

Forest plot. Result of pooled analysis of rs13393577 on the basis of estimated per-allele odds ratio from each stage. CI, confidence interval; ES, effect size.

Figure 3

Regional association plot of the 2q34 (rs13393577) locus. Association signals (-log P) are shown for directly genotyped (diamonds) and imputed (triangles) single-nucleotide polymorphisms (SNPs) in a genomic region extending 500 kb to either side of rs13393577. Red indicates linkage disequilibrium (r2) with the top SNP, with deeper red indicating higher r2. The blue bars show the recombination rate based on the HapMap Phase II release 22 JPT and CHB populations. The bottom panels show the locations of known genes. Genomic positions are based on the UCSC (University of California at Santa Cruz) Genome Browser assembly, March 2010. CHB, Han Chinese from Beijing; JPT, Japanese from Tokyo.

Discussion

In the present study, we conducted a three-stage GWAS in Korean women (6,322 cases and 5,897 controls). We not only confirmed loci previously identified in European and/or Chinese populations but also identified rs13393577 at 2q34/ERBB4 as a new breast cancer susceptibility variant in Korean women.

In the validation study, we used stage I to evaluate whether 27 SNPs in 20 GWAS-identified loci were also relevant in our population and found that 10 SNPs at seven loci were significantly associated with breast cancer risk. As anticipated, the strongest and most significant associations were observed for rs2046210 at 6q25.1/ESR1 and rs4784227 at 16q12.1/TOX3, and these results are similar in magnitude and direction to those of previous reports in Chinese populations [10, 11]. For rs2048671 at 7q32.3/NR and rs10822013 at 10q21.2/ZNF365, which were also identified in Asians, we recently reported significant associations with breast cancer risk in a multi-stage GWAS with a cumulative sample size of over 34,000 East Asian subjects (per-allele OR = 1.10; 95% CI = 1.07 to 1.14; Ptrend = 5.87 × 10^-9 and OR = 1.08; 95% CI = 1.04 to 1.11; Ptrend = 6.21 × 10^-6, respectively) [12]. However, the associations of these SNPs were not significant in this study, possibly because of limited statistical power.

Among the remaining 23 SNPs, which were initially identified in Europeans, eight showed significant associations: rs10736303 (10q26.13/FGFR2, proxy of rs2981579), rs3803662 (16q12.1/TOX3), rs7716600 (5p12/MRPS30), rs16886165 and rs889312 (5q11.2/MAP3K1), rs3734805 (6q25.1/ESR1), rs1562430 (8q24.21), and rs1092913 (5p15.2/ROPN1L). All of these associations were in the same direction as in the original reports except for rs1092913, for which the G allele was the risk allele. The effect sizes of the confirmed variants were similar to or smaller than those initially reported. This phenomenon has frequently been observed in validation studies conducted in ethnic populations different from the one used for the initial discovery [22, 23].

In addition, we could not evaluate the SNPs previously identified within intron 2 of FGFR2 at 10q26.13 (rs1219648, rs2981572, rs2981585, and rs2981579), because they were neither genotyped nor successfully imputed (imputation QC r2 < 0.3). We therefore selected rs10736303 as the best tagging SNP for 10q26.13/FGFR2, because in our data it is in high LD with the reported SNPs, with pairwise r2 values of 0.67 for rs2981579 (r2 = 0.48 in CHB+JPT; r2 = 0.74 in CEU), 0.57 for rs2981575 (r2 = 0.36 in CHB+JPT; r2 = 0.72 in CEU), and 0.53 for rs1219648 (r2 = 0.29 in CHB+JPT; r2 = 0.72 in CEU). Furthermore, rs10736303 is located in intron 2 of FGFR2 within sequences conserved across all placental mammals and has been suggested to be a functional variant that regulates FGFR2 expression by generating a putative ER-binding site [5]. In the present study, the rs10736303 G allele was significantly associated with increased breast cancer risk, with an effect size equal to that previously reported for rs2981579 [8]. However, rs10510102, a recently reported SNP located in the 300-kb region telomeric to intron 2 of FGFR2 that did not reach genome-wide significance (P = 1.6 × 10^-6), was not replicated in the present study [24].

Subgroup analysis revealed that some of the validated associations differed by ER or PR status. Recent studies have shown stronger associations with ER+ than with ER- tumors for several loci, namely rs13387042 (2q35), rs4973768 (3p24), rs889312 (5q11.2/MAP3K1), rs7716600 (5p12/MRPS30), rs13281615 (8q24), rs1219648 and rs2981582 (10q26.13/FGFR2), and rs3803662 (16q12), and a stronger association with PR+ than with PR- tumors for rs2981582 (10q26.13/FGFR2) [6, 25, 26]. Among these loci, rs13387042 (2q35), rs7716600, and rs4415084 (5p12/MRPS30) showed significantly different associations by ER status in our data, and rs4973768 (3p24) also showed a stronger association with ER+ tumors, although the test for heterogeneity was not significant. The association of rs2380205 (10p15.1/ANKRD16, FBXO18) with breast cancer risk was also stronger for ER+ or PR+ tumors than for receptor-negative tumors; this heterogeneity remains to be evaluated in other populations.

The stronger association of rs2046210 at 6q25.1/ESR1 with ER- than with ER+ tumors has been well documented [10]. In the present study, we also observed this pattern for rs2046210 and the nearby SNP rs3784805, although the differences were not statistically significant. Direct replication of loci showing significantly different associations according to ER or PR status further supports the hypothesis that intrinsic subtypes of breast cancer arise through different etiologic pathways and therefore have different polygenic components [27].

There are several potential reasons why some loci identified in women of European descent were not validated. First, some risk variants may have escaped detection because of limited statistical power resulting from low allele frequencies or very small effect sizes in the initial findings. The allele frequencies of several SNPs are substantially lower in Koreans than in Europeans: rs11249433 at 1p11.2/NOTCH2, FCGR1B (4% versus 39%), rs1011970 at 9p21.3/CDKN2A, CDKN2B (7% versus 17%), rs865686 at 9q31.2/KLF4, RAD23B, ACTL7A (7% versus 24%), rs10995190 at 10q21.2/ZNF365 (2% versus 15%), and rs10483813, the proxy of rs999737, at 14q24.1/RAD51L1 (3% versus 24%). With the current sample size, we therefore had only 8% to 30% statistical power to detect the reported effect sizes of 1.06 to 1.16 for these SNPs. We also cannot exclude the possibility that the original reports overestimated the ORs because of the 'winner's curse'. Second, differences in underlying genomic structure between ethnic groups could mean that SNPs that work well in women of European descent do not effectively tag the causal variants in Asian women. Another possibility is that some of the evaluated variants are not strongly associated with breast cancer risk in Asian women, as illustrated by the null association of rs2180341 (6q22.33). For rs2180341, we had 80% statistical power to detect an OR as small as 1.15, and a lack of association has also been reported in a Chinese population [13]. Moreover, the risk profiles of genetic variants could manifest differently across ethnic populations if the relative contribution of the risk variants to the carcinogenic pathways of breast cancer varies between populations.
Finally, interactions with environmental exposures, lifestyle, or other effect modifiers, and even differences in breast cancer prevalence, could affect the penetrance of these alleles.

ERBB4, which harbors rs13393577 in its first intron at 2q34, is a member of the epidermal growth factor (EGF/ERBB) family of receptor tyrosine kinases, which are key activators of signaling pathways involved in cell division, migration, adhesion, differentiation, and apoptosis [28]. ERBB4 has been reported to be frequently overexpressed in breast cancer, and expression of transcripts encoding the cleavable ERBB4 isoforms has been associated with ER expression and a high histological grade of differentiation [29]. Rokavec and colleagues [30] identified five germ-line variants in the ERBB4 5'-untranslated region and reported that one of them (ERBB4 -782T > G) was associated with breast cancer risk through allele-dependent differences in promoter activity. However, rs13393577 is not in LD with ERBB4 -782T > G, so any effect of rs13393577 is unlikely to be mediated through this previously reported variant.

We conducted an in silico functional analysis to assess the potential biological function of rs13393577. No transcription factor binding site was predicted for the rs13393577 C allele, whereas several transcription factors were predicted to bind the T allele, at six high-scoring binding sites (maximum score = 92.7 points; minimum score = 85.9 points) [31]. Consistent with this, FASTSNP scored rs13393577 as 1-2 (intronic enhancer) [32]. Additionally, Murabito and colleagues [33] showed that three SNPs in ERBB4 (rs905883, rs7564590, and rs7558615) were associated with breast cancer risk in a family-based GWAS that included 58 breast cancer cases, although none reached genome-wide significance. Among these variants, rs7564590 is in moderate LD with rs13393577 (r2 = 0.44 in CHB+JPT and r2 = 0.25 in CEU), whereas the other two SNPs (rs905883 and rs7558615) are in very weak LD with rs13393577 (all r2 < 0.02 in CHB+JPT and CEU). Thus, if neither rs13393577 nor rs7564590 is itself functional, both might be in high LD with the true causal variant. We also cannot exclude the possibility that the strong association of rs13393577 reflects the function of the mir-548f-2 gene harboring SNP rs6756468, which is in tight LD with rs13393577.

Conclusions

In summary, we confirmed 10 SNPs in seven breast cancer risk loci that were initially identified in European and/or Chinese populations, and we provided additional evidence of heterogeneity in risk across tumor subtypes for common breast cancer susceptibility variants. Moreover, we identified rs13393577 in ERBB4 at 2q34 as a new breast cancer susceptibility variant. Future studies, including fine mapping, functional assays, and replication studies with large sample sizes from diverse ethnic populations, are needed to validate our results.


Abbreviations

CEU: CEPH Utah residents with ancestry from Northern and Western Europe
CHB+JPT: Han Chinese from Beijing + Japanese from Tokyo
CI: confidence interval
ER: estrogen receptor
ERBB4: epidermal growth factor receptor 4
GWAS: genome-wide association study
IBS: identity by state
KoGES: Korea Genome Epidemiology Study
LD: linkage disequilibrium
MAF: minor allele frequency
OR: odds ratio
PR: progesterone receptor
QC: quality control
SeBCS: Seoul Breast Cancer Study
SNP: single-nucleotide polymorphism


  1. 1.

    Balmain A, Gray J, Ponder B: The genetics and genomics of cancer. Nat Genet. 2003, 33: 238-244. 10.1038/ng1107.

  2. 2.

    Nathanson KL, Wooster R, Weber BL: Breast cancer genetics: what we know and what we need. Nat Med. 2001, 7: 552-556. 10.1038/87876.

  3. 3.

    Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. 2009, 106: 9362-9367. 10.1073/pnas.0903103106.

  4. 4.

    Ahmed S, Thomas G, Ghoussaini M, Healey CS, Humphreys MK, Platte R, Morrison J, Maranian M, Pooley KA, Luben R, Eccles D, Evans DG, Fletcher O, Johnson N, dos Santos Silva I, Peto J, Stratton MR, Rahman N, Jacobs K, Prentice R, Anderson GL, Rajkovic A, Curb JD, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, et al: Newly discovered breast cancer susceptibility loci on 3p24 and 17q23.2. Nat Genet. 2009, 41: 585-590. 10.1038/ng.354.

  5. 5.

    Easton DF, Pooley KA, Dunning AM, Pharoah PD, Thompson D, Ballinger DG, Struewing JP, Morrison J, Field H, Luben R, Wareham N, Ahmed S, Healey CS, Bowman R, Meyer KB, Haiman CA, Kolonel LK, Henderson BE, Le Marchand L, Brennan P, Sangrajrang S, Gaborieau V, Odefrey F, Shen CY, Wu PE, Wang HC, Eccles D, Evans DG, Peto J, Fletcher O, et al: Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007, 447: 1087-1093. 10.1038/nature05887.

  6. 6.

    Stacey SN, Manolescu A, Sulem P, Rafnar T, Gudmundsson J, Gudjonsson SA, Masson G, Jakobsdottir M, Thorlacius S, Helgason A, Aben KK, Strobbe LJ, Albers-Akkers MT, Swinkels DW, Henderson BE, Kolonel LN, Le Marchand L, Millastre E, Andres R, Godino J, Garcia-Prats MD, Polo E, Tres A, Mouy M, Saemundsdottir J, Backman VM, Gudmundsson L, Kristjansson K, Bergthorsson JT, Kostic J, et al: Common variants on chromosomes 2q35 and 16q12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet. 2007, 39: 865-869. 10.1038/ng2064.

  7. 7.

    Stacey SN, Manolescu A, Sulem P, Thorlacius S, Gudjonsson SA, Jonsson GF, Jakobsdottir M, Bergthorsson JT, Gudmundsson J, Aben KK, Strobbe LJ, Swinkels DW, van Engelenburg KC, Henderson BE, Kolonel LN, Le Marchand L, Millastre E, Andres R, Saez B, Lambea J, Godino J, Polo E, Tres A, Picelli S, Rantala J, Margolin S, Jonsson T, Sigurdsson H, Jonsdottir T, Hrafnkelsson J, et al: Common variants on chromosome 5p12 confer susceptibility to estrogen receptor-positive breast cancer. Nat Genet. 2008, 40: 703-706. 10.1038/ng.131.

  8.

    Thomas G, Jacobs KB, Kraft P, Yeager M, Wacholder S, Cox DG, Hankinson SE, Hutchinson A, Wang Z, Yu K, Chatterjee N, Garcia-Closas M, Gonzalez-Bosquet J, Prokunina-Olsson L, Orr N, Willett WC, Colditz GA, Ziegler RG, Berg CD, Buys SS, McCarty CA, Feigelson HS, Calle EE, Thun MJ, Diver R, Prentice R, Jackson R, Kooperberg C, Chlebowski R, Lissowska J, et al: A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1). Nat Genet. 2009, 41: 579-584. 10.1038/ng.353.

  9.

    Turnbull C, Ahmed S, Morrison J, Pernet D, Renwick A, Maranian M, Seal S, Ghoussaini M, Hines S, Healey CS, Hughes D, Warren-Perry M, Tapper W, Eccles D, Evans DG, Hooning M, Schutte M, van den Ouweland A, Houlston R, Ross G, Langford C, Pharoah PD, Stratton MR, Dunning AM, Rahman N, Easton DF: Genome-wide association study identifies five new breast cancer susceptibility loci. Nat Genet. 2010, 42: 504-507. 10.1038/ng.586.

  10.

    Zheng W, Long J, Gao YT, Li C, Zheng Y, Xiang YB, Wen W, Levy S, Deming SL, Haines JL, Gu K, Fair AM, Cai Q, Lu W, Shu XO: Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat Genet. 2009, 41: 324-328. 10.1038/ng.318.

  11.

    Long JR, Cai QY, Shu XO, Qu SM, Li C, Zheng Y, Gu K, Wang WJ, Xiang YB, Cheng JR, Chen KX, Zhang LN, Zheng H, Shen CY, Huang CS, Hou MF, Shen HB, Hu ZB, Wang FR, Deming SL, Kelley MC, Shrubsole MJ, Khoo US, Chan KYK, Chan SY, Haiman CA, Henderson BE, Le Marchand L, Iwasaki M, Kasuga Y, et al: Identification of a functional genetic variant at 16q12.1 for breast cancer risk: results from the Asia Breast Cancer Consortium. PLoS Genet. 2010, 6: e1001002. 10.1371/journal.pgen.1001002.

  12.

    Cai Q, Long J, Lu W, Qu S, Wen W, Kang D, Lee JY, Chen K, Shen H, Shen CY, Sung H, Matsuo K, Haiman CA, Khoo US, Ren Z, Iwasaki M, Gu K, Xiang YB, Choi JY, Park SK, Zhang L, Hu Z, Wu PE, Noh DY, Tajima K, Henderson BE, Chan KY, Su F, Kasuga Y, Wang W, et al: Genome-wide association study identifies breast cancer risk variant at 10q21.2: results from the Asia Breast Cancer Consortium. Hum Mol Genet. 2011, 20: 4991-4999. 10.1093/hmg/ddr405.

  13.

    Long J, Shu XO, Cai Q, Gao YT, Zheng Y, Li G, Li C, Gu K, Wen W, Xiang YB, Lu W, Zheng W: Evaluation of breast cancer susceptibility loci in Chinese women. Cancer Epidemiol Biomarkers Prev. 2010, 19: 2357-2365. 10.1158/1055-9965.EPI-10-0054.

  14.

    Hutter CM, Young AM, Ochs-Balcom HM, Carty CL, Wang T, Chen CTL, Rohan TE, Kooperberg C, Peters U: Replication of breast cancer GWAS susceptibility loci in the Women's Health Initiative African American SHARe Study. Cancer Epidemiol Biomarkers Prev. 2011, 20: 1950-1959. 10.1158/1055-9965.EPI-11-0524.

  15.

    Birdseed V2 algorithm.

  16.

    The National Human Genome Research Institute catalog.

  17.

    Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 2010, 34: 816-834. 10.1002/gepi.20533.

  18.

    Li Y, Willer C, Sanna S, Abecasis G: Genotype imputation. Annu Rev Genomics Hum Genet. 2009, 10: 387-406. 10.1146/annurev.genom.9.081307.164242.

  19.

    Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ: LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010, 26: 2336-2337. 10.1093/bioinformatics/btq419.

  20.

    PLINK.

  21.

    Shaffer JP: Controlling the false discovery rate with constraints: the Newman-Keuls test revisited. Biom J. 2007, 49: 136-143. 10.1002/bimj.200610297.

  22.

    Zhang B, Beeghly-Fadiel A, Long JR, Zheng W: Genetic variants associated with breast-cancer risk: comprehensive research synopsis, meta-analysis, and epidemiological evidence. Lancet Oncol. 2011, 12: 477-488. 10.1016/S1470-2045(11)70076-6.

  23.

    Chang BL, Spangler E, Gallagher S, Haiman CA, Henderson B, Isaacs W, Benford ML, Kidd LR, Cooney K, Strom S, Ingles SA, Stern MC, Corral R, Joshi AD, Xu JF, Giri VN, Rybicki B, Neslund-Dudas C, Kibel AS, Thompson IM, Leach RJ, Ostrander EA, Stanford JL, Witte J, Casey G, Eeles R, Hsing AW, Chanock S, Hu JJ, John EM, et al: Validation of genome-wide prostate cancer associations in men of African descent. Cancer Epidemiol Biomarkers Prev. 2011, 20: 23-32. 10.1158/1055-9965.EPI-10-0698.

  24.

    Fletcher O, Johnson N, Orr N, Hosking FJ, Gibson LJ, Walker K, Zelenika D, Gut I, Heath S, Palles C, Coupland B, Broderick P, Schoemaker M, Jones M, Williamson J, Chilcott-Burns S, Tomczyk K, Simpson G, Jacobs KB, Chanock SJ, Hunter DJ, Tomlinson IP, Swerdlow A, Ashworth A, Ross G, Silva ID, Lathrop M, Houlston RS, Peto J: Novel breast cancer susceptibility locus at 9q31.2: results of a genome-wide association study. J Natl Cancer Inst. 2011, 103: 425-435. 10.1093/jnci/djq563.

  25.

    Garcia-Closas M, Hall P, Nevanlinna H, Pooley K, Morrison J, Richesson DA, Bojesen SE, Nordestgaard BG, Axelsson CK, Arias JI, Milne RL, Ribas G, Gonzalez-Neira A, Benitez J, Zamora P, Brauch H, Justenhoven C, Hamann U, Ko YD, Bruening T, Haas S, Dork T, Schurmann P, Hillemanns P, Bogdanova N, Bremer M, Karstens JH, Fagerholm R, Aaltonen K, Aittomaki K, et al: Heterogeneity of breast cancer associations with five susceptibility loci by clinical and pathological characteristics. PLoS Genet. 2008, 4: e1000054. 10.1371/journal.pgen.1000054.

  26.

    Broeks A, Schmidt MK, Sherman ME, Couch FJ, Hopper JL, Dite GS, Apicella C, Smith LD, Hammet F, Southey MC, Van 't Veer LJ, de Groot R, Smit VT, Fasching PA, Beckmann MW, Jud S, Ekici AB, Hartmann A, Hein A, Schulz-Wendtland R, Burwinkel B, Marme F, Schneeweiss A, Sinn HP, Sohn C, Tchatchou S, Bojesen SE, Nordestgaard BG, Flyger H, Orsted DD, et al: Low penetrance breast cancer susceptibility loci are associated with specific breast tumor subtypes: findings from the Breast Cancer Association Consortium. Hum Mol Genet. 2011, 20: 3289-3303. 10.1093/hmg/ddr228.

  27.

    Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Lonning PE, Borresen-Dale AL: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98: 10869-10874. 10.1073/pnas.191367098.

  28.

    Yarden Y, Sliwkowski MX: Untangling the ErbB signalling network. Nat Rev Mol Cell Biol. 2001, 2: 127-137. 10.1038/35052073.

  29.

    Junttila TT, Sundvall M, Lundin M, Lundin J, Tanner M, Harkonen P, Joensuu H, Isola J, Elenius K: Cleavable ErbB4 isoform in estrogen receptor-regulated growth of breast cancer cells. Cancer Res. 2005, 65: 1384-1393. 10.1158/0008-5472.CAN-04-3150.

  30.

    Rokavec M, Justenhoven C, Schroth W, Istrate MA, Haas S, Fischer HP, Vollmert C, Illig T, Hamann U, Ko YD, Glavac D, Brauch H: A novel polymorphism in the promoter region of ERBB4 is associated with breast and colorectal cancer risk. Clin Cancer Res. 2007, 13: 7506-7514. 10.1158/1078-0432.CCR-07-0457.

  31.

    Heinemeyer T, Wingender E, Reuter I, Hermjakob H, Kel AE, Kel OV, Ignatieva EV, Ananko EA, Podkolodnaya OA, Kolpakov FA, Podkolodny NL, Kolchanov NA: Databases on transcriptional regulation: TRANSFAC, TRRD and COMPEL. Nucleic Acids Res. 1998, 26: 362-367. 10.1093/nar/26.1.362.

  32.

    Yuan HY, Chiou JJ, Tseng WH, Liu CH, Liu CK, Lin YJ, Wang HH, Yao A, Chen YT, Hsu CN: FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucleic Acids Res. 2006, 34: W635-W641. 10.1093/nar/gkl236.

  33.

    Murabito JM, Rosenberg CL, Finger D, Kreger BE, Levy D, Splansky GL, Antman K, Hwang SJ: A genome-wide association study of breast and prostate cancer in the NHLBI's Framingham Heart Study. BMC Med Genet. 2007, 8 (Suppl 1): S6. 10.1186/1471-2350-8-S1-S6.

  34.

    Li J, Humphreys K, Heikkinen T, Aittomaki K, Blomqvist C, Pharoah PD, Dunning AM, Ahmed S, Hooning MJ, Martens JW, van den Ouweland AM, Alfredsson L, Palotie A, Peltonen-Palotie L, Irwanto A, Low HQ, Teoh GH, Thalamuthu A, Easton DF, Nevanlinna H, Liu J, Czene K, Hall P: A combined analysis of genome-wide association studies in breast cancer. Breast Cancer Res Treat. 2011, 126: 717-727. 10.1007/s10549-010-1172-9.

  35.

    Sehrawat B, Sridharan M, Ghosh S, Robson P, Cass CE, Mackey JR, Greiner R, Damaraju S: Potential novel candidate polymorphisms identified in genome-wide association study for breast cancer susceptibility. Hum Genet. 2011, 130: 529-537. 10.1007/s00439-011-0973-1.

  36.

    Gold B, Kirchhoff T, Stefanov S, Lautenberger J, Viale A, Garber J, Friedman E, Narod S, Olshen AB, Gregersen P, Kosarin K, Olsh A, Bergeron J, Ellis NA, Klein RJ, Clark AG, Norton L, Dean M, Boyd J, Offit K: Genome-wide association study provides evidence for a breast cancer risk locus at 6q22.33. Proc Natl Acad Sci USA. 2008, 105: 4340-4345. 10.1073/pnas.0800441105.


We thank all participants and investigators of the SeBCS, the Korean Hereditary Breast Cancer (KOHBRA) study, and the KoGES. This work was supported by a grant from the Ministry for Health, Welfare and Family Affairs, Republic of Korea (#2011-N73007-00) and by a grant from the BRL (Basic Research Laboratory) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (#2012-0000347). The KOHBRA study was supported by a grant from the National R&D Program for Cancer Control, Ministry for Health, Welfare and Family Affairs, Republic of Korea (#1020350).

Author information

Correspondence to Bok-Ghee Han or Daehee Kang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

DK helped to conceive and design the experiments and to write the paper. B-GH helped to conceive and design the experiments. JHO, D-JK, MP, E-hK, and W-YP helped to perform the experiments. H-cK and J-YC helped to write the paper. Ji-YL and HS helped to write the paper, to manage the genotyping data, and to perform statistical analyses. Jong-YL coordinated the genetic study. YJK helped to manage the genotyping data and to perform statistical analyses. MJG helped to manage the genotyping data. J-YC and SKP helped to perform statistical analyses and to direct the studies that contributed data or biological collection of original studies. K-ML, YSC, HM, HMK, JP, D-YN, S-HA, K-YY, LL, MHL, S-WK, JWL, B-WP, WH, MKK, S-AL, KM, C-YS, P-EW, C-NH, J-WK, J-PL, S-YJ, and H-LK helped to direct the studies that contributed data or biological collection of original studies. All authors read and approved the final manuscript.

Hyung-cheol Kim and Ji-Young Lee contributed equally to this work.

Electronic supplementary material

Additional file 1: Supplementary Methods. Description of study participants. Genotyping and quality control procedures. (DOC 77 KB)

Additional file 2: Supplementary Tables. Supplementary Table 1. The procedure and results of quality control (QC) for subjects and SNPs in the GWA scan. Supplementary Table 2. Summarized characteristics of study participants and number of SNPs analyzed in each stage. Supplementary Table 3. Per-allele ORs and 95% CIs for the associations between previously identified SNPs and breast cancer risk, by ER and PR status, in the SeBCS. Supplementary Table 4. Summarized results for the 17 SNPs included in Stage II. Supplementary Table 5. Associations between SNPs in the flanking region of rs13393577 and breast cancer risk. (DOC 166 KB)

Additional file 3: Supplementary Figures. Supplementary Figure 1. Quantile-quantile (QQ) plot of P-values from trend tests of 555,525 SNPs in 2,273 cases and 2,052 controls. Genomic control inflation factor (λ) = 1.043. Supplementary Figure 2. Plot of the first two dimensions from a multidimensional scaling (MDS) analysis based on pairwise identity-by-state (IBS). Gray: cases in this study; Black: controls in this study; Blue: HapMap Chinese; Red: HapMap Japanese; Green: HapMap CEU; Purple: HapMap YRI (HapMap samples based on HapMap phase 3). Supplementary Figure 3. Regional plots of -log P-values for 7 SNPs at replicated loci. Results (-log P) are shown for the association of directly genotyped and imputed SNPs in a 1 Mb region centered on the SNP reported in previous GWAS (diamond). An additional nearby SNP is represented as a square. (DOC 1004 KB)

About this article

Cite this article

Kim, H., Lee, J., Sung, H. et al. A genome-wide association study identifies a breast cancer risk variant in ERBB4 at 2q34: results from the Seoul Breast Cancer Study. Breast Cancer Res 14, R56 (2012).


  • Breast Cancer
  • Estrogen Receptor
  • Breast Cancer Risk
  • Progesterone Receptor Status
  • Korean Woman