Significant overlap between human genome-wide association-study nominated breast cancer risk alleles and rat mammary cancer susceptibility loci

Introduction: Human population-based genome-wide association (GWA) studies identify low penetrance breast cancer risk alleles; however, GWA studies alone do not definitively determine causative genes or mechanisms. Stringent genome-wide statistical significance level requirements, set to avoid false-positive associations, yield many false-negative associations. Laboratory rats (Rattus norvegicus) are useful to study many aspects of breast cancer, including genetic susceptibility. Several rat mammary cancer associated loci have been identified using genetic linkage and congenic strain-based approaches. Here, we sought to determine the amount of overlap between GWA study nominated human breast and rat mammary cancer susceptibility loci.
Methods: We queried published GWA studies to identify two groups of SNPs, one that reached genome-wide significance and one comprised of SNPs failing a validation step and not reaching genome-wide significance. Human genome locations of these SNPs were compared to known rat mammary carcinoma susceptibility loci to determine if risk alleles existed in both species. Rat genome regions not known to associate with mammary cancer risk were randomly selected as control regions.
Results: Significantly more human breast cancer risk GWA study nominated SNPs mapped at orthologs of rat mammary cancer loci than to regions not known to contain rat mammary cancer loci. The rat genome was useful to predict associations that had met human genome-wide significance criteria and weaker associations that had not.
Conclusions: Integration of human and rat comparative genomics may be useful to parse out false-negative associations in GWA studies of breast cancer risk.


Introduction
Breast cancer is a complex disease characterized by environmental, genetic, and epigenetic factors. Due to the complexity of developing this disease a woman's individual risk may vary greatly from population risk estimates. The familial relative risk of developing breast cancer increases with the number of affected relatives, suggesting that there is a strong genetic component associated with this disease [1,2]. High-penetrance breast cancer risk mutations such as those of BRCA1 and BRCA2 have been identified [3,4]. Population frequencies of mutations with high-penetrance toward risk are rare due to their severe effects on individuals and, thus, these mutations account for only a small percentage of population risk. Risk alleles with moderate penetrance and minor allele population frequencies of 0.005 to 0.01 (for example, PALB2) are estimated to account for approximately 3% of risk. Therefore, a majority of population-based breast cancer risk is likely explained by low penetrance alleles with rare to common population frequencies [5].
Genome-wide association (GWA) studies have been used to identify several low penetrance breast cancer risk alleles [6]. Due to a need to control for numerous multiple comparisons made in GWA studies, a Bonferroni correction based P-value cut-off of ≤1 × 10⁻⁷ is typically required for an association to be considered genome-wide significant. It has been suggested that this approach is too stringent as it may result in many false negative associations [7]. Furthermore, while GWA studies are unbiased approaches to identify genomic regions associated with breast cancer risk, these epidemiology-based approaches cannot easily determine risk genes or genetically determined mechanisms of susceptibility. Currently, only a small percentage of breast cancer heritability is explained by published studies, suggesting that considerable genetic variation associated with breast cancer risk remains to be identified [5,8].
Comparative genetics between rats and humans has also been used to identify breast cancer risk alleles [9]. In general, the laboratory rat is a good experimental organism to model breast cancer. In contrast to induced mammary tumors in mice, rats develop mammary carcinomas of ductal origin, which is similar to a majority of human breast cancers. Also, rat mammary tumors are responsive to estrogen, as are a majority of human breast tumors [10,11]. Most importantly, the laboratory rat is a versatile organism to study breast cancer susceptibility, as experiments can be controlled at genetic and environmental levels. Inbred rat strains exhibit differential susceptibility to chemically induced carcinogenesis using 7,12-dimethylbenz[a]anthracene (DMBA) [10,12-14]. Copenhagen (COP) and Wistar-Kyoto (WKY) rat strains are resistant to DMBA, N-Nitroso-N-methylurea (NMU), and oncogene induced mammary carcinomas, while the Wistar-Furth (WF) rat strain is susceptible.
Previous genetic studies using rats have identified eight Mammary carcinoma susceptibility (Mcs) loci. A (WFxCOP)F1 x WF backcross design was used to identify four of these loci. Copenhagen alleles at Mcs 1-3 are associated with decreased mammary tumor multiplicity, while the Mcs 4 COP allele is associated with increased tumor development [15]. Further analysis of the Mcs1 locus using WF.COP congenic lines, spanning different regions of the quantitative trait locus (QTL), identified three independent loci associated with mammary carcinoma susceptibility, named Mcs 1a-c [17]. Another linkage analysis study using WF and WKY rat strains revealed four additional QTLs associated with mammary carcinoma susceptibility, named Mcs 5-8. Additionally, a modifier of Mcs8, Mcsm1, partially counteracts the resistance phenotype conferred by Mcs8 [16,18]. Further analysis of the Mcs5 locus using WF.WKY congenic rat lines resulted in the identification of four subloci named Mcs5a1, Mcs5a2, Mcs5b and Mcs5c [9,19]. Additional linkage analysis using the SPRD-Cu3 rat strain (susceptible to DMBA-induced mammary carcinogenesis) and the resistant WKY rat strain resulted in the identification of three more rat QTLs associated with mammary cancer, named Mcstm1, Mcstm2/Mcsta2 and Mcsta1 [20,21]. Several rat genomic regions that associate with mammary cancer susceptibility were identified using beta-estradiol instead of DMBA to induce carcinogenesis. These QTLs were identified using the August Copenhagen Irish (ACI) rat strain, which is susceptible to beta-estradiol carcinogenesis, and the COP and Brown Norway (BN) rat strains, which are resistant. These loci are named Estrogen-induced mammary cancer loci, or Emca 1-2 and Emca 4-8 [22,23].
Comparative genomics between human breast and rat mammary cancer risk alleles will continue to be warranted, especially if appreciable overlap in genetic susceptibility exists between these species. In this study, genomic locations of human breast cancer risk GWA study-identified polymorphisms were compared to the rat genome to determine if positive associations were more often located at orthologs to rat mammary cancer risk loci than at randomly selected regions not known to be associated with rat mammary cancer susceptibility.

Methods
Converting rat mammary cancer associated loci to human orthologous regions
No research animals were used in this work. Previously published information on rat mammary cancer associated loci was used. Human orthologous regions of rat regions that associate with mammary cancer susceptibility listed in Table 1 were determined using the 'In other genomes (convert)' function available at the University of California Santa Cruz (UCSC) genome browser [24]. Rat Nov. 2004 (Baylor 3.4/rn4) and human Feb. 2009 (GRCh37/hg19) genome assemblies were used. If a rat mammary cancer locus split into multiple human orthologous regions, we noted all orthologous regions until they reached less than 1% of the bases and spanned less than 1% of the original rat mammary cancer locus using the UCSC genome browser.

Random rat regions
To determine if human GWA study-identified polymorphisms map to rat mammary cancer loci more frequently than to random regions of the rat genome, we selected rat genome segments that have not shown an association with mammary cancer risk. These rat genomic regions were named 'random rat regions' and are listed in Table 2. We initially focused on fourteen Mcs/Mcsm regions with an average size of 22,322,710 bps as these are generally smaller in size than other rat mammary cancer associated loci identified. Fourteen random rat genome regions, each 22,322,710 bps in size, were used for comparison. Random rat regions were selected by picking a chromosome using a random number generator function of Microsoft Excel. The range of chromosomes entered into the random number generator function was 1 through 21 (rats have 21 chromosomes, including a sex chromosome). The start position for each random rat region was also determined using a random number generator. The rat genome is approximately 2.75 billion bp [28], or 130,952,381 bp per chromosome if divided equally across chromosomes. Therefore, values for the rat genome start-position were chosen from 1 to 130,952,381 using a random number generator. Then, 22,322,709 bps were added to each random start position to obtain the desired full length. The 14 random rat genome regions were then entered into the UCSC genome browser, and the human orthologous regions were determined using the 'In other genomes (convert)' function, as described above [24]. Randomly generated rat genome segments were used as controls if the human orthologous segment did not contain a sequence that was also orthologous to a known rat mammary cancer associated locus. We also verified, using the UCSC genome browser, that human orthologous regions to random rat regions were not located at human centromeric regions, as genetic variation in these chromosomal regions is underrepresented in GWA studies [29,30].
Total sizes and percentages of rat genome covered by rat mammary cancer loci and random genome regions used are in Table 3.
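The selection procedure for random rat regions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the original work used the random number generator function of Microsoft Excel, and the function names and the seed below are hypothetical. The sketch draws candidate regions only; the subsequent checks described above (no orthology to a known rat mammary cancer locus, no human centromeric location) and the fact that real chromosomes differ in length are not modeled.

```python
import random

N_CHROMOSOMES = 21        # range of chromosomes used in the text (1 through 21)
MAX_START = 130_952_381   # upper bound for start positions, as given in the text
REGION_SIZE = 22_322_710  # average size of the fourteen Mcs/Mcsm regions (bp)


def draw_random_region(rng):
    """Draw one candidate region as (chromosome, start, end), 1-based inclusive."""
    chromosome = rng.randint(1, N_CHROMOSOMES)
    start = rng.randint(1, MAX_START)
    # Adding 22,322,709 bp to the start yields a span of 22,322,710 bp.
    end = start + REGION_SIZE - 1
    return chromosome, start, end


def draw_candidate_regions(n_regions=14, seed=0):
    """Draw n_regions candidates; the seed is a hypothetical choice for reproducibility."""
    rng = random.Random(seed)
    return [draw_random_region(rng) for _ in range(n_regions)]


if __name__ == "__main__":
    for chrom, start, end in draw_candidate_regions():
        print(f"chr{chrom}:{start}-{end}")
```

Each printed candidate would still need to be converted to its human orthologous region and screened before being accepted as a control.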

Determining human GWA study nominated polymorphisms
Human breast cancer risk GWA studies considered were published through March 2013. In the first round of analysis we picked GWA studies with a clearly defined study population of subjects of European descent. In the second round of analysis, the defined population was broader and included studies that tested populations of non-European descent. Studies that included non-European descent populations were subdivided into respective populations used. The GWA studies evaluated are listed in Tables 4 and 5. Results from GWA studies used consisted of multiple stages (two to four stages) to evaluate breast cancer risk association. In our analysis, all SNPs that entered the final stage of their respective study were compared in the rat genome. A tested SNP was called either 'associated' if it reached genome-wide significance in its respective study or 'potentially associated' if it failed to meet the respective study statistical criterion following the final stage of analysis. Conventionally, a P-value level for an association to be considered statistically significant in a GWA study is 1 × 10⁻⁷. This stringency is to protect from false-positives due to multiple comparisons on a genome-wide scale. It has been argued that this low P-value requirement results in numerous false negative associations [7]. Therefore, we queried supplemental material of each published GWA study considered and picked polymorphisms that failed the validation stage in the respective study. We also included polymorphisms that did reach genome-wide significance. We considered 533 SNPs from studies that included populations of European descent, and 285 SNPs from studies of non-European descent populations. All SNPs used in this analysis are listed in Additional file 1. Human genomic locations of polymorphisms were found using dbSNP (GRCh37 assembly) [31]. These were compared to locations of the human orthologous regions of rat mammary cancer loci and random rat regions.
If a polymorphism mapped to a region of interest, the name, location, odds ratio, 95% confidence interval, and P-value were noted.
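The comparison of SNP positions to orthologous regions amounts to an interval lookup. A minimal Python sketch follows, assuming each region is a closed interval on a human chromosome and each SNP a single position; the region names, SNP identifiers, and coordinates are hypothetical placeholders, not data from this study. A SNP that falls in two overlapping regions is reported with both region names but contributes once to the count.

```python
from collections import namedtuple

# A human orthologous region: closed interval [start, end] on one chromosome.
Region = namedtuple("Region", "name chrom start end")


def snps_in_regions(snps, regions):
    """Map each SNP id to the names of all regions that contain it.

    snps: dict of snp_id -> (chromosome, position)
    Only SNPs that fall in at least one region are returned.
    """
    hits = {}
    for snp_id, (chrom, pos) in snps.items():
        names = [r.name for r in regions
                 if r.chrom == chrom and r.start <= pos <= r.end]
        if names:
            hits[snp_id] = names
    return hits


# Hypothetical example data (not taken from the study).
example_regions = [
    Region("locus_A", "chr5", 1_000_000, 5_000_000),
    Region("locus_B", "chr5", 4_000_000, 8_000_000),  # overlaps locus_A
]
example_snps = {
    "snp_1": ("chr5", 4_500_000),  # inside both overlapping regions
    "snp_2": ("chr5", 9_000_000),  # outside every region
    "snp_3": ("chr2", 4_500_000),  # right position, wrong chromosome
}

example_hits = snps_in_regions(example_snps, example_regions)
unique_snp_count = len(example_hits)  # each SNP counted once, even if in two regions
```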

Statistics
The number of human polymorphisms that mapped to orthologous regions containing rat mammary cancer loci (observed) was compared to the number of human polymorphisms that mapped to random rat regions (expected) using a chi-square analysis with one degree of freedom. Several rat mammary cancer loci overlap extensively and subsequently several human polymorphisms mapped to multiple rat loci. Currently, it is not known if these overlapping rat mammary cancer loci would fine-map to the same locus or independent loci. For this study, human polymorphisms mapping to overlapping rat mammary cancer susceptibility associated sequences were counted only once. For analysis of associated (passed genome-wide significance level) versus potentially associated (did not pass genome-wide significance level) associations, a logistic regression was performed using SYSTAT 13 statistical software. A threshold of associated or potentially associated was used as the independent variable and outcome was either the SNP mapped to a rat mammary cancer locus or it mapped to a random rat region.
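As a worked illustration of the chi-square comparison, the sketch below treats the count at rat mammary cancer loci as the observed value and the count at random rat regions as the expected value, with one degree of freedom. Using a single-term statistic, (O − E)²/E, is an assumption about how the test was set up; the function name is hypothetical. The example uses the European-descent Mcs/Mcsm counts reported in the Results (66 observed, 51 expected).

```python
import math


def chi_square_observed_vs_expected(observed, expected):
    """Single-term chi-square statistic and its P-value for 1 degree of freedom.

    Assumption: the statistic is (O - E)^2 / E with the random-region count as E.
    For 1 df, the upper-tail probability of chi-square equals erfc(sqrt(x / 2)).
    """
    statistic = (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Counts from the Results: 66 SNPs at Mcs/Mcsm orthologs, 51 at random regions.
stat, p = chi_square_observed_vs_expected(66, 51)
print(f"chi-square = {stat:.2f}, P = {p:.3f}")
```

Under this reading the statistic is 225/51, roughly 4.4, which falls below the 0.05 level but not the 0.01 level, in line with the reported P-value <0.05.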

Results
Significantly more breast cancer risk GWA study nominated SNPs are located at orthologs of rat Mcs/Mcsm loci compared to random rat genomic regions
We picked 28 GWA studies of breast cancer risk in which well-defined populations were analyzed (Table 4). Physical locations of polymorphisms that failed the final validation step and polymorphisms that reached genome-wide significance were determined using dbSNP [31]. We included SNPs that failed the final validation step of the respective study, because it has been suggested that many true associations are ruled out in a GWA study due to stringent statistical analysis methods [7]. We determined if sequences containing these polymorphisms were located at either a human genome region orthologous to a known rat mammary cancer locus or to a randomly selected region of the rat genome. Our goal was to determine if GWA study-nominated potentially associated (did not pass final validation) and associated (genome-wide significant) risk polymorphisms, together, map more often to human orthologous regions of rat mammary cancer susceptibility loci than to randomly selected rat genome segments of similar size. If yes, it would suggest that human GWA information combined with rat genetic susceptibility information is broadly useful to determine true genetic associations. Overall, rat Mcs/Mcsm loci are mapped to shorter genomic segments than other rat mammary cancer risk loci; therefore, we first compared overlap between human GWA study nominated breast cancer risk SNPs and rat Mcs/Mcsm loci to overlap of human associated SNPs with randomly selected rat genomic regions not known to contain mammary cancer susceptibility loci (Figure 1). Human GWA studies were grouped by population of descent for comparison.
There was a significant difference between the number of GWA study nominated SNPs mapping to rat Mcs/Mcsm loci compared to random rat regions in studies analyzing populations of European descent (66 SNPs to 51 SNPs respectively, P-value <0.05). Human GWA study identified polymorphisms located at orthologs of rat loci are listed in Additional file 2. This result supports previous studies indicating rat genetic susceptibility is useful to predict and study human breast cancer risk loci. There was no difference in Asian or African-American descent populations. This is likely due to a limited number of published population-based breast cancer risk genetic-association studies using these populations.

Breast cancer risk GWA study nominated polymorphisms map more often to orthologs of all known rat mammary cancer loci than to randomly selected regions
Next, we included additional rat mammary cancer susceptibility loci that have been identified, but span large genomic segments. Loci added were Mcstm1, Mcstm2, Mcsta1, Emca1-2 and Emca4-8 [20-23]. The same random rat genomic regions used previously were used in this analysis to be consistent. Respectively, 179 and 51 GWA study nominated polymorphisms were located in human orthologous regions to rat mammary cancer loci and randomly selected rat regions (Figure 2A) when studies using populations of European descent were considered. This difference was statistically significant (P <0.01). Note that some rat mammary cancer loci identified in independent studies have long regions of overlap. Consequently, several human GWA study identified polymorphisms mapped to human sequence orthologous to overlapping rat susceptibility loci. As it is not known if these rat loci contain unique sub-loci, human risk associated polymorphisms mapping to overlapping rat regions were counted only once. The size of the rat genome covered by all known rat mammary cancer susceptibility loci compared to control loci was disproportionate (Table 3). However, the ratio of breast cancer risk associated human SNPs at orthologs to rat mammary cancer susceptibility loci to SNPs at random segments was higher than the ratio of susceptibility loci bases to random bases. Not surprisingly, only 179 of 533, or 33.6%, of the total human GWA study identified SNPs using populations of European descent were located at orthologs to rat mammary cancer associated loci. It is notable that 57 of the 533 total SNPs evaluated were reported in more than one GWA study; a majority of these were potential associations that failed the final validation step of the respective study.
These results further suggest that there are several breast cancer risk associated SNPs not reaching genome-wide statistical significance in human population-based genetic studies. Since more breast cancer risk polymorphisms nominated from GWA studies of populations of European descent mapped to orthologs of rat mammary cancer loci than to randomly selected regions of the rat genome, we determined if this was the case for association studies using non-European descent populations. We queried the nine GWA studies of populations of non-European ancestry that are listed in Table 5. These were GWA studies using populations of African, African-American, Ashkenazi Jewish, and Asian descent; however, only polymorphisms from studies using African-American, Ashkenazi Jewish and Asian descent populations mapped to any of the human orthologous segments to rat genomic regions picked for this study. First, results from all studies of non-European descent populations were combined (Figure 2A). Eighty-nine risk associated SNPs mapped to orthologs of rat mammary cancer loci and 26 SNPs were located at randomly selected rat regions. Next, studies using populations of Asian, Ashkenazi Jewish and African-American descent were considered separately. This resulted in 64 Asian descent population SNPs mapping to orthologs of rat mammary cancer loci and 18 SNPs to random rat regions. Twenty-four SNPs identified in studies of African-American descent populations were located at orthologs to rat mammary cancer loci and eight SNPs in random rat regions. The difference between rat mammary cancer loci and random regions was statistically significant (P <0.01) for both populations (Figure 2A). Interestingly, one SNP from a study of an Ashkenazi Jewish population mapped to the human orthologous region of rat Mcsta1, but no GWA study nominated SNP from that study mapped to a rat random region [53].
The lack of human SNPs mapping to orthologs of rat mammary cancer loci from populations of African and Ashkenazi Jewish descent may be due to a limited number of studies conducted on these populations. On the other hand, it may indicate that susceptibility alleles different from those currently identified in laboratory rats are segregating in these populations. Out of 285 SNPs considered from studies using populations of non-European descent, 89 SNPs or 31% mapped to orthologs of rat mammary cancer loci (Additional file 2). Fifteen risk-associated SNPs were represented in more than one human GWA study.
Next, GWA-study nominated variants from populations of European descent were separated by associated (reached genome-wide significance) and potentially associated (did not reach genome-wide significance after the final stage) variants (Figure 2B). Nineteen associated SNPs were located at rat mammary cancer loci compared to seven SNPs that mapped to random rat regions. Comparatively, 160 potentially associated SNPs mapped to rat mammary cancer susceptibility loci compared to 44 SNPs that mapped to random rat regions. A logistic regression was performed using threshold of association (associated or potentially associated) as the independent variable and rat genome location (ortholog of a rat mammary cancer risk locus or a randomly selected locus) as the dependent variable. Threshold of association was not a significant effect (P-value = 0.54). This result, that both associated and potentially associated breast cancer risk variants map more often to orthologs of rat mammary cancer risk loci than to rat regions not associated with susceptibility, strongly supports that comparative genomics between humans and rats may be an effective integrative approach to determine which potential associations nominated by human association studies are true positives.
Figure 2 Number of breast cancer risk GWA study nominated SNPs mapping to orthologs of rat mammary cancer loci or randomly selected rat genomic segments. Dark grey columns indicate GWA study nominated SNPs that map to human orthologous regions of rat mammary cancer loci. Light grey columns indicate GWA study nominated SNPs that mapped to human orthologous regions of randomly selected rat genomic regions. A) Studies by population descent. Asterisks indicate statistical significance (P <0.01). The difference between risk associated SNPs mapping to rat mammary cancer loci and random rat regions in studies of European, Asian and African-American descent populations was significant (P-values <0.01 using chi-square analysis with the number of SNPs mapping to rat mammary cancer loci set as the observed value and the number of SNPs mapping to random rat regions as the expected value). B) Associated and potentially associated SNPs identified in populations of European descent that mapped to rat regions of interest were compared using logistic regression. Threshold of association was not a significant predictor of whether a SNP mapped to an ortholog of a rat mammary cancer locus or a random rat region. 'ns' indicates a comparison was not statistically significant. GWA, genome-wide association.
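The logistic regression on associated versus potentially associated SNPs can be checked by hand, because a logistic model with a single binary predictor has a slope equal to the log odds ratio of the corresponding 2 × 2 table, with a standard error given by the square root of the summed reciprocal cell counts. The sketch below applies a Wald test to the four counts reported above (19 and 7 associated SNPs, 160 and 44 potentially associated SNPs). It is not the SYSTAT analysis itself; the function name is hypothetical, and the use of a Wald test is an assumption about which test produced the reported P-value.

```python
import math


def wald_test_2x2(a, b, c, d):
    """Wald test for the log odds ratio of a 2 x 2 table.

    a, b: associated SNPs at rat mammary cancer loci / at random rat regions
    c, d: potentially associated SNPs at rat mammary cancer loci / at random regions
    Returns (odds_ratio, z, two_sided_p).
    """
    log_odds_ratio = math.log((a * d) / (b * c))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_odds_ratio / standard_error
    # Two-sided P-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return math.exp(log_odds_ratio), z, p_value


odds_ratio, z_score, p_value = wald_test_2x2(19, 7, 160, 44)
print(f"odds ratio = {odds_ratio:.3f}, z = {z_score:.3f}, P = {p_value:.2f}")
```

With these counts the two-sided P-value is close to the reported 0.54, consistent with threshold of association not being a significant predictor of where a SNP maps.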
Human populations have been studied more extensively for breast cancer genetic risk than have rat populations; therefore, it is not surprising that human studies have yielded a considerable number of genome-wide significantly associated SNPs in alleles where it is not known if the rat genome contains a concordant allele. Interestingly, seven strongly associated human SNPs were in sequences orthologous to the randomly selected rat genome regions that are not known to associate with rat mammary cancer based on studies evaluating specific rat strains; thus, it is possible that a portion of the rat genome used in this study as rat random-genome control regions may actually associate with unidentified rat mammary cancer susceptibility loci. Thus, more rat genomic regions associated with mammary cancer risk may be identified with additional rat genetic studies. To date, only six inbred rat strains have been used to identify rat genomic regions associated with mammary cancer risk [15,16,[20][21][22][23]. Therefore, it is highly likely that more mammary cancer susceptibility loci may be identified by incorporating additional rat strains. It is also possible that more extensive analysis of previously studied rat strains may yield additional susceptibility loci by using a higher density of genetic markers for example.
Twenty-one of the 24 known rat mammary cancer associated loci are orthologous to human loci containing SNPs that are either associated or potentially associated with breast cancer risk. Fourteen of the known rat mammary cancer associated loci are orthologous to human risk alleles marked by GWA study nominated SNPs reaching genome-wide significance. Human GWA study designs do not definitively determine causative genes or mechanisms. The laboratory rat is a versatile experimental organism to complement human studies of breast cancer. For example, inbred rat strains provide a model with reduced genetic variation that can be genetically manipulated and environmentally controlled. The overlap between human breast and rat mammary cancer susceptibility associated loci suggests rats can be used extensively to study genetically determined mechanisms and environment interactions that will translate directly to human breast cancer risk and prevention.

Human GWA study nominated breast cancer risk SNPs map similarly to rat mammary cancer associated loci identified using 7,12-dimethylbenz[a]anthracene or beta-estradiol
Several rat mammary cancer loci used in this study were identified using DMBA to induce mammary tumors. These include Mcs5a1, Mcs5a2 and Mcsm1. The remaining rat mammary cancer loci considered were identified using beta-estradiol to induce mammary carcinogenesis. Estradiol-associated susceptibility loci are Emca1-2 and Emca4-8. While DMBA is representative of environmental polycyclic aromatic hydrocarbons, this synthesized mammary carcinogen is not found in nature. Conversely, estradiol is an endogenous environmental exposure associated with breast cancer risk. Human GWA study nominated SNPs mapping to orthologs of rat mammary cancer loci identified using DMBA were compared to those identified using beta-estradiol. We considered SNPs from all GWA studies, irrespective of the population used. We noted that many DMBA and beta-estradiol identified rat mammary cancer loci overlap. In fact, seven of the fourteen DMBA associated rat mammary cancer loci overlap at least one beta-estradiol associated rat mammary cancer risk locus, and five of the seven beta-estradiol loci overlap rat mammary cancer loci identified using DMBA. To account for this overlap, human SNPs mapping to overlapping rat mammary cancer loci, one identified using DMBA and the other using beta-estradiol, were included once in the 'DMBA' group and once in the 'beta-estradiol' group. These results are shown in Figure 3. A relatively similar number of GWA study nominated SNPs mapped to orthologs of rat mammary cancer loci that were identified using DMBA (181 SNPs) and beta-estradiol (146 SNPs). This suggests that different mammary carcinoma induction methods can effectively identify rat susceptibility loci relevant to human disease risk, and it also suggests that a plethora of carcinogenesis mechanisms may be genetically determined.
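The counting rule for the DMBA versus beta-estradiol comparison can be sketched as follows: a SNP is counted once per induction method, so a SNP at overlapping loci identified by both methods contributes to both groups, while a SNP at two overlapping loci identified by the same method contributes only once. The locus names, SNP identifiers, and assignments below are hypothetical placeholders used only to show the rule.

```python
def count_snps_by_induction_method(snp_to_loci, locus_method):
    """Count SNPs per induction method, at most once per method per SNP.

    snp_to_loci: dict of snp_id -> list of rat loci whose orthologs contain the SNP
    locus_method: dict of locus name -> induction method used to identify it
    """
    counts = {}
    for loci in snp_to_loci.values():
        # A set removes duplicates when several loci share one induction method.
        methods = {locus_method[locus] for locus in loci}
        for method in methods:
            counts[method] = counts.get(method, 0) + 1
    return counts


# Hypothetical example data (not taken from the study).
example_locus_method = {
    "dmba_locus_1": "DMBA",
    "dmba_locus_2": "DMBA",
    "estradiol_locus_1": "beta-estradiol",
}
example_snp_to_loci = {
    "snp_1": ["dmba_locus_1", "dmba_locus_2"],       # counted once under DMBA
    "snp_2": ["dmba_locus_2", "estradiol_locus_1"],  # counted once in each group
    "snp_3": ["estradiol_locus_1"],                  # counted under beta-estradiol
}

example_counts = count_snps_by_induction_method(example_snp_to_loci,
                                                example_locus_method)
```

Because SNPs at loci shared between methods enter both groups, the two group totals are not expected to sum to the number of unique SNPs.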

Discussion
It has been suggested that the use of Bonferroni-based correction procedures to protect against multiple comparisons in GWA studies is too stringent and results in an abundance of false negative associations with little recourse to sort these from true-negative associations. Therefore, we considered associated and potentially associated human SNPs from breast cancer risk GWA studies to determine if SNPs that failed validation and SNPs that reached genome-wide significance map to respective regions of the rat genome known to associate with rat mammary cancer risk more often than to regions of the rat genome that are not known to associate with susceptibility. Results presented here indicate that the rat genome is useful to prioritize and rank human alleles potentially associated with risk. The rat genome is useful regardless of the human population studied. Significantly more SNPs from GWA studies of populations of European, Asian, and African-American descent map to human orthologous regions of rat mammary cancer loci than to human orthologous regions of randomly selected rat genomic regions not known to associate with mammary cancer susceptibility. This supports the general idea that there are SNPs associated with breast cancer risk that are missed due to conservative statistical methods used in GWA studies, and that the rat is useful to parse out important genetic variation in susceptibility to mammary carcinogenesis.
Interestingly, we were unable to map GWA study nominated SNPs to 3 of the 24 known rat mammary cancer loci. These were Mcs1a, Mcs5a1 and Mcs5c. However, using a genome-targeted population-based genetic association study, a human SNP associated with breast cancer risk has been identified at human MCS5A1 [9]. The risk-associated SNPs at MCS5A1 are adjacent to a breast cancer risk-associated SNP at MCS5A2, which was identified in two independent human population-based studies [9,43]. Taken together, there is a high correlation between genetics of breast cancer susceptibility in humans and mammary cancer susceptibility in rats. Interestingly, there are several human genomic regions that are human GWA study nominated hotspots (for example, 19q13, FGFR2) that are not known to have concordant rat orthologs. One explanation is that human breast and rat mammary cancer susceptibility are controlled by overlapping and nonoverlapping genetic mechanisms. Another explanation is that there are rat genomic regions associated with mammary cancer risk yet to be discovered by using additional inbred strains, more extensive analysis of strains previously studied, and different methods of carcinogenesis induction.

Conclusions
There is extensive genomic overlap between human breast and rat mammary cancer susceptibility. The rat genome may provide utility to identify true-positive associations regardless of the human population used for a GWA study. The laboratory rat will continue to be an important model organism for researching genetically determined mechanisms of mammary cancer susceptibility that may translate directly to human susceptibility. An appreciable number of GWA study nominated SNPs not meeting genome-wide significance levels have genomic overlap with rat mammary cancer susceptibility loci. This supports the general idea that Bonferroni-based multiple-comparison correction procedures are too stringent and complementary approaches that integrate rat genomics would be highly efficacious to prioritize breast cancer risk associated alleles.

Additional files
Additional file 1: Table S1. List of GWAS nominated SNPs used in analysis.
Additional file 2: Table S2. Breast cancer risk associated polymorphisms from studies of European descent populations that map to rat mammary cancer loci and random rat regions. Table S3. Breast cancer risk associated polymorphisms from studies of non-European descent populations that map to rat mammary cancer loci and random rat regions.

Competing interests
The authors declare that they have no competing interests.
Figure 3 Number of breast cancer risk GWA study nominated SNPs mapping to regions identified using DMBA or beta-estradiol. Number of GWA study nominated SNPs mapping to rat mammary cancer loci separated by method of mammary carcinogenesis induction. Slightly more SNPs mapped to orthologs of rat loci that were identified using DMBA than beta-estradiol. DMBA, 7,12-dimethylbenz[a]anthracene; GWA, genome-wide association.