
Genome-wide and transcriptome-wide association studies of mammographic density phenotypes reveal novel loci



Mammographic density (MD) phenotypes, including percent density (PMD), area of dense tissue (DA), and area of non-dense tissue (NDA), are associated with breast cancer risk. Twin studies suggest that MD phenotypes are highly heritable. However, only a small proportion of their variance is explained by identified genetic variants.


We conducted a genome-wide association study, as well as a transcriptome-wide association study (TWAS), of age- and BMI-adjusted DA, NDA, and PMD in up to 27,900 European-ancestry women from the MODE/BCAC consortia.


We identified 28 genome-wide significant loci for MD phenotypes, including nine novel signals (5q11.2, 5q14.1, 5q31.1, 5q33.3, 5q35.1, 7p11.2, 8q24.13, 12p11.2, 16q12.2). Further, 45% of all known breast cancer SNPs were associated with at least one MD phenotype at p < 0.05. TWAS further identified two novel genes (SHOX2 and CRISPLD2) whose genetically predicted expression was significantly associated with MD phenotypes.


Our findings provided novel insight into the genetic background of MD phenotypes, and further demonstrated their shared genetic basis with breast cancer.


Heterogeneity of breast tissue composition can be observed through radiographic imaging using mammography. Epithelial and connective tissues are radiologically dense with white appearance, while adipose tissue is radiologically lucent with dark appearance on a mammogram [1]. Mammographic density (MD) has been widely established as one of the strongest risk factors for breast cancer [2,3,4], the most common cancer type among women in the USA [5]. Specifically, quantitative MD measures, including mammographic dense area (DA), non-dense area (NDA), and the percentage of dense area in the whole breast (PMD), have all been independently associated with breast cancer [3, 6,7,8]. In analyses adjusting for body mass index (BMI) and age, women with higher DA and PMD have an elevated risk of breast cancer, while NDA is associated with a decreased breast cancer risk.

Twin studies indicate that genetic factors explain a large fraction of the variation in MD phenotypes, with heritability estimates for DA, NDA, and PMD, after adjusting for age and individual-specific shared environmental factors, exceeding 60% [9,10,11]. Previous genome-wide association studies (GWAS) have identified 46 genetic variants that are significantly (p < 5 × 10⁻⁸) associated with at least one MD phenotype, including 27 associated with DA, 17 associated with NDA, and 20 associated with PMD [12,13,14,15,16,17]. Importantly, many of these variants have also been identified as breast cancer susceptibility loci, suggesting that MD plays a critical role as an intermediate phenotype for the disease. However, only a small fraction of the variance of MD phenotypes can be explained by these significant variants [14, 17]. To enhance our understanding of the genetic basis of MD, additional GWAS with larger sample sizes are needed.

In the present study, we conducted GWAS and a transcriptome-wide association study (TWAS) in up to 27,900 European ancestry women with the goal of identifying novel loci associated with MD phenotypes.


Study population and data collection

We conducted a GWAS for three MD phenotypes (DA, NDA, and PMD) using data from 21 studies which provided individual-level genotype and phenotype data (Additional file 2: Table S1) as well as nine additional studies which provided GWAS summary statistics (Additional file 2: Table S2), under the Breast Cancer Association Consortium (BCAC) and the Markers of Density Consortium (MODE). The overall study sample comprised 6666 breast cancer cases and 21,234 controls. All individuals had PMD data collected, while the DA and NDA measures were available only in a proportion of the study population. The final sample sizes used in the meta-analyses were 24,579 (DA), 24,689 (NDA), and 27,900 (PMD). For breast cancer cases, mammograms collected prior to the cancer diagnosis were used for density assessment. Study-specific approaches to obtain quantitative measures of MD phenotypes are summarized in Additional file 2: Table S1, and study-specific protocols for MD measurement are given in Additional file 1. Most of the studies included in our analysis used CUMULUS, a computer-assisted semi-automated thresholding software [18]. Age and BMI at time of mammogram collection were included as covariates in the GWAS. For participants with missing BMI at mammogram (N = 1767), self-reported BMI within five years of mammogram collection was used as an approximation.

Individual-level genotype data were generated with either the iCOGS [19] or OncoArray [20] arrays. We applied standard quality control filters as described elsewhere [19]. Genotype data were imputed to 1000 Genomes phase 3 version 5 using IMPUTE2 [21]. Genotype dosage, ranging between 0 and 2, was generated for imputed variants. Single nucleotide polymorphisms (SNPs) with low imputation quality (INFO < 0.3) or with a minor allele frequency (MAF) < 1% were excluded. Approximately 9.8 million variants were included in the association analysis. Genomic positions of the variants were based on Genome Reference Consortium GRCh37 (hg19).
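The dosage and variant filters described above can be illustrated with a minimal sketch. It assumes that each imputed genotype is available as three posterior probabilities and that the INFO score has already been reported by the imputation software; the `Variant` class, the function names, and the demo values are hypothetical and are not taken from the study pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    info: float           # imputation quality score, taken as given (assumption)
    genotype_probs: list  # per sample: (P(ref/ref), P(ref/alt), P(alt/alt))


def dosage(probs):
    """Expected number of alternate alleles for one sample, between 0 and 2."""
    _, p_het, p_hom_alt = probs
    return p_het + 2.0 * p_hom_alt


def minor_allele_frequency(dosages):
    """Frequency of the less common allele, estimated from dosages."""
    alt_freq = sum(dosages) / (2.0 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(variant, min_info=0.3, min_maf=0.01):
    """Keep variants with INFO >= 0.3 and MAF >= 1%, mirroring the stated exclusions."""
    dosages = [dosage(p) for p in variant.genotype_probs]
    return variant.info >= min_info and minor_allele_frequency(dosages) >= min_maf


# Hypothetical demo variant: four samples carrying 0, 1, 2 and 1 alternate alleles.
demo = Variant("demo_variant", 0.95, [(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 0)])
print(passes_qc(demo))
```

The thresholds are exposed as parameters so that the same check can be reused with stricter cut-offs.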

Genome-wide association study (GWAS)

We conducted study-specific multivariable adjusted linear regression analysis for each MD phenotype. All MD phenotypes were square-root transformed before analysis as this resulted in distributions that were close to normal. Age and 1/BMI at mammogram, and the first ten ancestry informative principal components, as previously described [14], were included as covariates in each regression model. Analyses were performed using R 3.6.1 (R Foundation). We then combined study-specific GWAS results with previously derived GWAS summary statistics using a sample-size weighted meta-analysis (the ‘SAMPLESIZE’ scheme as implemented in METAL [22]). To be included in the meta-analysis, a variant needed to have a valid Z-statistic from at least three individual studies and a minimum sample size of 3000. A regional association plot for each genome-wide significant locus in the meta-analysis was generated using the LocusZoom software [23].
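As an illustration of the sample-size weighted scheme, the sketch below combines per-study Z-statistics with weights equal to the square root of each study's sample size, which is the usual form of this approach. The function names and the demo numbers are hypothetical, the Z-statistics are assumed to be aligned to the same effect allele, and the minimum sample size of 3000 is interpreted here as the total across contributing studies, which is an assumption.

```python
import math


def sample_size_weighted_z(study_results):
    """Combine per-study Z-scores for one variant.

    study_results: list of (z_score, sample_size) tuples.
    """
    weights = [math.sqrt(n) for _, n in study_results]
    numerator = sum(w * z for w, (z, _) in zip(weights, study_results))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def eligible(study_results, min_studies=3, min_total_n=3000):
    """Inclusion rule: enough contributing studies and enough total samples (assumed total N)."""
    total_n = sum(n for _, n in study_results)
    return len(study_results) >= min_studies and total_n >= min_total_n


# Hypothetical per-study results for a single variant.
demo = [(2.1, 4000), (1.4, 2500), (2.8, 6000)]
if eligible(demo):
    z_meta = sample_size_weighted_z(demo)
    print(z_meta, two_sided_p(z_meta))
```

Because only Z-statistics and sample sizes are combined, this scheme does not yield pooled effect sizes or standard errors.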

Sensitivity and conditional analysis

The majority of studies included in our analysis were population-based or breast cancer nested case–control studies. To assess whether any identified SNP-MD association was an artifact resulting from oversampling of breast cancer cases in our population, we repeated the association analysis for all genome-wide significant SNPs in controls only (N = 21,234) as a sensitivity analysis. As mammographic NDA is strongly associated with BMI, we also assessed the association between significant NDA loci and BMI among 13,915 individuals with NDA, BMI, and genotype data available.

To quantify the number of independent signals in each significant GWAS locus, we performed a conditional analysis using the conditional and joint analysis tool implemented in the Genome-wide Complex Trait Analysis software (COJO-GCTA) [24]. Since COJO-GCTA requires beta and standard error estimates, which were not available in our sample-size weighted meta-analysis data, we performed a standard error weighted meta-analysis with the normalized square-rooted MD phenotypes (per study, [sqrt-MD − mean(sqrt-MD)]/stderr(sqrt-MD)) as the outcomes. For each locus, we defined the lead SNP as the first independent signal, and performed the conditional analysis for SNPs located within ±500 kb. The top-ranked SNP with conditional p value < 10⁻⁵ was added to the independent signal list, and the conditional analysis was run again for the rest of the SNPs. The conditional analysis was halted when no variant reached the threshold of conditional p value < 10⁻⁵. For the loci with multiple independent signals identified in the conditional analysis, all signals are annotated on the regional association plot.
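A standard error weighted meta-analysis is commonly implemented as a fixed-effect inverse-variance weighted average, which produces the pooled beta and standard error that a conditional analysis needs. The sketch below shows that calculation under this assumption; the function name and the demo estimates are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effect inverse-variance weighted meta-analysis for one variant.

    estimates: list of (beta, standard_error) tuples, one per study,
    all expressed for the same effect allele and phenotype scale.
    Returns the pooled (beta, standard_error).
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Hypothetical per-study effect estimates on the normalized phenotype scale.
beta, se = inverse_variance_meta([(0.05, 0.02), (0.08, 0.03), (0.04, 0.025)])
print(beta, se, beta / se)
```

Studies with smaller standard errors receive larger weights, so this scheme requires that all studies report effects on a comparable scale, which is why the phenotypes were normalized per study.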

Breast cancer association analysis

We examined the association between MD phenotype-associated SNPs and breast cancer risk, overall and by estrogen receptor (ER) status, using publicly available breast cancer GWAS summary statistics [19], based on 122,977 cases (including 69,501 ER-positive and 21,468 ER-negative cases) and 105,974 controls of European ancestry from the BCAC. We also assessed if known breast cancer SNPs [25] were associated with MD phenotypes.

Exploratory bioinformatics analysis

We used linkage disequilibrium (LD) score regression to estimate the SNP heritability (h²_SNP) of MD phenotypes [26]. We partitioned the h²_SNP by 74 functional genomic categories [27], and estimated the heritability enrichment for each category. We quantified the genome-wide genetic correlation between each MD phenotype and breast cancer [19, 28]. We also estimated the local genetic correlation between MD phenotypes and overall breast cancer using ρHESS [29, 30], which estimates the local shared heritability between two traits across 1703 independent genomic blocks, based on LD in European ancestry populations [31]. We defined statistically significant local genetic correlations as p < 0.05/1703 = 2.94 × 10⁻⁵.
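For orientation, the LD score regression step above rests on a linear relationship between each variant's association statistic and its LD score. The block below gives the standard univariate form from the LD score regression literature; the symbols N (sample size), M (number of variants), ℓ_j (LD score of variant j) and a (contribution of confounding) are notation introduced here for illustration rather than taken from this article.

```latex
\mathbb{E}\left[\chi^2_j \mid \ell_j\right]
  = \frac{N\, h^2_{\mathrm{SNP}}}{M}\, \ell_j + N a + 1
```

Under this model the slope of the regression of χ² statistics on LD scores estimates the SNP heritability, while the intercept (N a + 1) captures inflation that does not scale with LD, such as population stratification.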

Transcriptome-wide association study (TWAS)

To estimate the association between imputed gene expression and MD phenotypes, we conducted a transcriptome-wide association study (TWAS). We used genotype and gene expression data in mammary tissue from 396 individuals collected by the GTEx consortium (Release V8) to build gene-specific SNP prediction models of gene expression [32]. Predictive models were built based on variants located within ±500 kb of each gene, using three different approaches (Top 1, Elastic Net [33], and LASSO [34]). Gene-specific expression levels were then imputed with the model showing the highest predictive R² based on cross-validation. A total of 7284 genes with nominally significant (p < 0.01) heritability were included in the association analysis for each MD phenotype. Construction of predictive models and association analysis using GWAS summary statistics were performed using the R-based pipeline FUSION [34]. A significance threshold of p < 0.05/(7284 × 3) = 2.29 × 10⁻⁶ was used to identify statistically significant associations between imputed gene expression levels and MD phenotypes.
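The multiple-testing thresholds quoted in the methods follow from a simple Bonferroni correction. The short sketch below reproduces the arithmetic for the TWAS (7284 genes across three phenotypes) and for the local genetic correlation analysis (1703 genomic blocks); the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


# TWAS: 7284 genes tested against each of the three MD phenotypes.
twas_threshold = bonferroni_threshold(0.05, 7284 * 3)
# Local genetic correlation: 1703 approximately independent genomic blocks.
local_rg_threshold = bonferroni_threshold(0.05, 1703)

print(f"{twas_threshold:.2e}")      # 2.29e-06
print(f"{local_rg_threshold:.2e}")  # 2.94e-05
```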

Replication of novel GWAS and TWAS findings

Replication analyses of the novel GWAS loci were performed using data from a previous GWAS meta-analysis of mammographic density phenotypes in an independent population of 24,192 European ancestry women participating in the Kaiser Permanente Northern California (KPNC) Research Program on Genes, Environment and Health (RPGEH) [17]. Briefly, MD phenotypes were measured using Cumulus6 on a single craniocaudal view from 20,311 Hologic and 3881 GE full-field digital mammography (FFDM) exams. MD phenotypes were transformed separately within each cohort to attain standard normal distributions and to facilitate meta-analysis and interpretation of effect sizes in SD units. Genotypes were assayed using the Affymetrix Axiom array with > 650,000 variants, and imputed using the 1000 Genomes Project Phase III reference panel. Allele dosage effects were estimated using linear regression models adjusted for age at mammography, ln(BMI), the first ten principal components of European ancestry, genotyping reagent kit, and image batch separately in the Hologic and GE cohorts, and the estimates were combined using inverse-variance weighted meta-analysis.

Replication analyses of the novel TWAS loci were performed in 24,158 women from the Kaiser RPGEH mammographic density GWAS [17] with genotypes imputed using the Haplotype Reference Consortium reference panel for single-nucleotide variants, and 1000 Genomes Project Phase III reference panel for indels [35]. Expression levels of 7 genes (MTMR11, SHOX2, CRISPLD2, SMIM25, TMEM184B, EP300, and DESI1) were estimated using the PredictDB GTEx v8 Elastic Net models for mammary tissue [33], which did not include MRPL23-AS1. Associations of the predicted gene expression levels with the standardized MD phenotypes were estimated using linear regression models adjusted for age at mammography, ln(BMI), the first ten principal components of European ancestry, genotyping reagent kit, and image batch separately in the Hologic (n = 20,282) and GE (n = 3876) cohorts, and the estimates were combined using inverse-variance weighted meta-analysis.


Our study population was on average 56.6 years old and had an average BMI of 26.5 kg/m² at the time of mammogram. The mean DA, NDA, and PMD were 28.5 cm², 120.4 cm², and 23.4%, respectively. Age and BMI at mammogram, as well as the square-root transformed MD measures, all approximately followed a normal distribution (Additional file 2: Figure S1). Genomic inflation factors (λGC) were between 1.11 and 1.13 (Additional file 2: Figure S2), with LD-score regression intercepts between 1.05 and 1.06, suggesting that the observed genomic inflation is partly driven by the polygenic effects of many variants [26].
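The genomic inflation factor is conventionally computed as the median of the observed 1-degree-of-freedom χ² statistics divided by the median expected under the null (about 0.455). The sketch below follows that convention using only the standard library; the function name and the demo Z-scores are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
EXPECTED_MEDIAN_CHI2 = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(z_scores):
    """Lambda GC: median observed chi-square over the median expected under the null."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / EXPECTED_MEDIAN_CHI2


# Hypothetical Z-scores; a real analysis would use all tested variants.
print(genomic_inflation_factor([0.2, -0.5, 0.7, -1.1, 1.9, 0.05, -0.8]))
```

A value above 1 can reflect either confounding or true polygenic signal, which is why the LD score regression intercept is reported alongside it.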

GWAS of MD phenotypes

We identified 28 distinct loci associated with at least one MD phenotype at p < 5 × 10⁻⁸ (Table 1, Additional file 1: Figures S3–S8). Of these, 18 were associated with DA, six with NDA, and 15 with PMD (Fig. 1). SNPs in seven regions (1q21.2, 5q23.2, 5q35.1, 6q25.1, 11p15.5, 12q23.2, 19q13.33) were associated with both DA and PMD. SNPs at 8p11.23 were associated with both NDA and PMD; SNPs at 22q13.2 were associated with both DA and NDA; and SNPs at 10q21.2 were associated with all three MD phenotypes. The phenotypic variance explained by the lead SNPs of the genome-wide significant loci was 2.6% for DA, 0.8% for NDA, and 1.6% for PMD. Nine of the significant loci (5q11.2, 5q14.1, 5q31.1, 5q33.3, 5q35.1, 7p11.2, 8q24.13, 12p11.2, 16q12.2) had not previously been associated with MD phenotypes. Conditional analysis showed evidence that four DA-associated loci (5q35.1, 10q21.2, 20q13.13, 22q13.2) and one NDA-associated locus (8p11.23) had two independent signals at conditional p value < 10⁻⁵ (Additional file 2: Table S3).

Table 1 Lead SNPs of the genome-wide significant loci identified in the GWAS meta-analysis of mammographic dense area (DA), non-dense area (NDA) and percent mammographic density (PMD)
Fig. 1

Manhattan plots of the GWAS meta-analysis results for (a) mammographic dense area (DA, N = 24,579), (b) non-dense area (NDA, N = 24,689), and (c) percent mammographic density (PMD, N = 27,900). The p value thresholds for genome-wide significance (p = 5 × 10⁻⁸, red dashed line) and suggestive significance (p = 10⁻⁵, blue dashed line) are shown as horizontal lines. The gene closest to each lead variant is annotated. Novel loci are marked in red.

For the ten novel SNPs identified (corresponding to nine loci, as the 5q35.1 region was associated with both DA and PMD), we performed look-ups in an independent sample of 24,192 women of European ancestry [17]. Of the ten SNPs, seven replicated at p < 0.05 with the same direction of effect: three with DA (5q35.1, 8q24.13, 12p11.2) and four with PMD (5q31.1, 5q33.3, 5q35.1, 16q12.2) (Additional file 2: Table S4). One DA SNP (5q11.2) had a concordant direction of association, whereas one DA SNP (7p11.2) and the NDA SNP (5q14.1) had discordant directions of association in the look-up with p > 0.05.

Sensitivity analysis based on 21,234 controls showed consistent direction of effect and comparable effect size to the main analysis (Additional file 2: Table S5). None of the NDA-associated SNPs were associated with BMI at p < 0.05 (Additional file 2: Table S6), suggesting that observed SNP-NDA associations were not due to residual confounding with BMI.

We captured the lead SNPs from 46 distinct loci that have previously been reported to be associated with at least one MD phenotype. We investigated their associations (N = 63) with the corresponding MD phenotype in our study (Additional file 2: Table S7). We were able to replicate 26 out of 28 DA SNPs, 13 out of 16 NDA SNPs, and 15 out of 19 PMD SNPs at p < 0.05. Among these, 10 DA SNPs, 2 NDA SNPs, and 5 PMD SNPs reached genome-wide significance (p < 5 × 10⁻⁸).

Association of MD significant loci with breast cancer

We assessed whether the identified MD phenotype-associated SNPs were also associated with breast cancer risk. Of the 28 lead SNPs, 13 were associated with overall breast cancer risk at genome-wide significance (Table 2). In addition, one SNP (22q13.1) was significantly associated with ER-positive breast cancer (p = 5.6 × 10⁻⁸ for overall breast cancer). For nine of these 14 SNPs, the direction of association was consistent with that expected (i.e., the same direction for DA, PMD and breast cancer risk, and opposite direction for NDA and breast cancer risk), while for five SNPs the direction was the opposite; some of these conflicting results have been observed previously [14]. An additional seven lead SNPs were associated with breast cancer risk at p < 0.05. We also assessed the associations between 205 independent genome-wide significant variants for breast cancer [19] and MD phenotypes (Fig. 2, Additional file 2: Table S8). At p < 0.05, 63 breast cancer variants (31%; 48 with a consistent direction, as expected) were associated with DA, 36 (18%; 20 with the opposite direction, as expected) with NDA, and 62 (30%; 49 with a consistent direction, as expected) with PMD. In total, 92 breast cancer variants (45%; 67 with the expected direction) were associated with at least one MD phenotype.

Table 2 Association between MD significant loci and breast cancer risk (overall, ER-positive, and ER-negative)
Fig. 2

Manhattan-like plots showing the association between genome-wide significant breast cancer SNPs and the three mammographic density phenotypes: (a) DA, (b) NDA, and (c) PMD. The p value thresholds for genome-wide significance (p = 5 × 10⁻⁸, red line), suggestive significance (p = 10⁻⁵, blue line) and nominal significance (p = 0.05, green line) are shown as horizontal dashed lines. For signals with genome-wide significance for both MD phenotype and breast cancer, the nearest gene is annotated.

Exploratory bioinformatic analysis

We estimated the phenotypic variance attributable to common variants, as previously described [26]. SNP heritability (h²_SNP) was estimated as 0.32 (se = 0.04) for DA, 0.24 (se = 0.03) for NDA, and 0.27 (se = 0.03) for PMD. By partitioning the h²_SNP by 74 functional annotations [27], we observed that active enhancers marked by histone modification H3K27ac were enriched for all MD phenotypes (2.25-fold for DA, p = 7.10 × 10⁻¹¹; 2.11-fold for NDA, p = 9.71 × 10⁻⁷; and 2.22-fold for PMD, p = 7.71 × 10⁻¹⁰; Additional file 2: Table S9).

We further quantified the genetic correlation between MD phenotypes and breast cancer risk (Fig. 3). DA and PMD showed positive genetic correlations with overall breast cancer (DA: rg = 0.24, p = 1.11 × 10⁻⁴; PMD: rg = 0.29, p = 1.90 × 10⁻⁹), ER-positive (DA: rg = 0.21, p = 2.59 × 10⁻⁴; PMD: rg = 0.26, p = 4.71 × 10⁻⁸), and ER-negative breast cancer (DA: rg = 0.26, p = 1.04 × 10⁻³; PMD: rg = 0.27, p = 5.95 × 10⁻⁵). In contrast, NDA showed a negative genetic correlation with breast cancer (overall: rg = −0.17, p = 9.50 × 10⁻⁴; ER-positive: rg = −0.12, p = 0.021; ER-negative: rg = −0.17, p = 0.018).

Fig. 3

Genetic correlations between three MD phenotypes (DA, NDA, PMD) and breast cancer (overall, ER-positive, and ER-negative), estimated by LD score regression

We estimated the local genetic correlation between MD phenotypes and overall breast cancer by partitioning the genome into 1703 independent blocks. In total, we identified nine significant pairwise local genetic correlations between MD phenotypes and overall breast cancer (DA: 6q25.1, 10q21.2, 11p15.5, 12p11.2, 22q13.2; NDA: 8p11.23; PMD: 5q33.3, 6q25.1, 10q21.2) (Additional file 1: Figure S9). All nine regions harbored at least one genome-wide significant locus for an MD phenotype and were directionally consistent with the breast cancer association.

TWAS of MD phenotypes

Finally, we performed a TWAS investigating associations between the imputed expression of 7284 genes and MD phenotypes (Additional file 1: Figure S10, Additional file 2: Tables S10–S12), and identified significant associations with eight genes (Table 3). Six genes were either located in (MTMR11, SMIM25, and TMEM184B) or within 1 Mb of (EP300, DESI1, and MRPL23-AS1) the GWAS loci. The imputed expression of two additional genes was associated with MD phenotypes: SHOX2 (positively associated with NDA) and CRISPLD2 (negatively associated with PMD). We replicated our TWAS findings using individual-level data for 24,158 women from an independent GWAS to impute expression for seven of the identified genes with available models in PredictDB [17, 33]. Five of the seven genes (all except EP300 and DESI1) were replicated at p < 0.05 with a consistent direction of effect (Additional file 2: Table S13).

Table 3 Genes with significant association between genetically predicted gene expression and MD phenotypes (DA, NDA, PMD), based on transcriptome-wide association study (TWAS)


We conducted a GWAS for three MD phenotypes in 27,900 European ancestry women. We identified 28 distinct loci that were associated with at least one MD phenotype at genome-wide significance. Nine of these have not previously been reported to be associated with mammographic density. In addition, 14 of the 28 loci were also associated with breast cancer risk at genome-wide significance. We quantified the genetic correlation between MD phenotypes and breast cancer, further establishing the shared genetic basis between MD phenotypes and breast cancer risk. Finally, we conducted a TWAS and identified two additional novel associations between imputed expression level and MD phenotypes.

Previous GWAS based on data from MODE/BCAC identified 12 MD loci [12,13,14,15,16] and a recent GWAS of MD based on 24,192 women further discovered 31 novel loci [17]. In addition, GWAS investigating volumetric MD revealed one novel locus for percent dense volume (HABP2 at 10q25.3) and two loci for absolute dense volume (INHBB at 2q14.2, LINC01483 at 17q24.3) [36]. Previous studies support the association for DA lead SNP rs150249911, which is an intronic variant of the ITGA1 gene at 5q11.2. The ITGA1-encoded integrin α1 protein is upregulated following the expression of estrogen receptor β, a marker of breast cancer [37]. SNP rs413472 (5q14.1) is located in the SSBP2 gene, which has previously been implicated in breast cancer (p = 4.00 × 10⁻⁵) in Indonesian women [38]. DA lead SNP rs10155920 is in a long non-coding RNA located downstream of the EGFR gene at chromosome 7p11.2. EGFR signaling is one of the most well-studied pathways contributing to the invasion, dissemination, and metastasis of breast tumors [39]. Breast cancer fine-mapping analysis has identified DA lead SNP rs7297051 as one of the four independent association signals of breast cancer at chromosome 12p11 [40], which is approximately 50 kb upstream of the PTHLH gene. PTHLH encodes parathyroid hormone-related protein (PTHrP), which aids in normal mammary gland development [41]. PTHrP has also been related to the prognosis of breast cancer, as its occasional secretion by tumor cells may promote osteoclastic activity and contribute to osteolytic bone metastases [42]. PMD lead SNP rs11646715 is in the FTO gene at chromosome 16q12.2, which encodes an mRNA demethylase and is well known for its association with fat mass and obesity. Research has indicated that the FTO gene may play a role in cellular sensing of macronutrients and may be involved in the regulation of cell growth, which can at least partly explain its relationship with both obesity and breast cancer [43]. As we adjusted for BMI in our GWAS model, and further, rs11646715 was not associated with BMI in our data, the mechanisms underlying the associations between genetic variation in this region and MD likely differ from its effect on adiposity. Future studies are essential to elucidate potential biological mechanisms that link these genes to MD and ultimately breast cancer susceptibility. The novel DA locus at 8q24 has previously been associated with multiple types of cancer, including breast cancer [44]. To rule out the possibility that our finding was an artifact due to oversampling of breast cancer cases, we assessed the association with DA using controls only and observed a significant association (p = 5.64 × 10⁻⁴). Interestingly, rs58847541 is associated with breast cancer but in the opposite direction to the effect on DA.

Thirteen of the 28 GWAS loci were also associated with overall breast cancer risk with genome-wide significance, and showed little difference in effect by cancer subtype. Among these, we observed several unexpected inconsistencies in the direction of associations between MD-associated loci and breast cancer risk, including DA and PMD loci 1q21.2 and 2q14.2, DA loci 8q24.13 and 22q13.2, and NDA and PMD locus 8p11.23. The underlying biological mechanisms driving these discrepancies are unclear, but one potential explanation is that these loci may be involved in multiple pathways across life stages, which differentially affect breast development and the risk of breast cancer. Furthermore, the MD phenotypes we studied are radiologic reflections of the underlying breast tissue composition, which makes it difficult to distinguish the epithelial from the stromal tissue of the breast. We also investigated the association between 205 independent breast cancer SNPs and MD phenotypes and found that 45% of the variants were associated with at least one MD phenotype at p < 0.05. The local genetic correlation analysis in our study highlighted specific loci at which MD phenotypes and breast cancer showed evidence of shared heritability (DA: ESR1, ZNF365, LSP1, and MKL1; NDA: 8p11.23; PMD: ESR1 and ZNF365). These observations reinforce the strong shared genetic basis between mammographic density and breast cancer.

The SNP heritability (h²_SNP), which can be interpreted as the proportion of phenotypic variance explained by the additive effects of all genotyped variants, was estimated to be 0.32 for DA, 0.24 for NDA, and 0.27 for PMD. Our estimates were slightly lower than those previously reported [17, 36], perhaps due to differences in the study populations or methodology [45]. Twin studies have estimated the heritability of all three MD phenotypes to exceed 60% [9, 10]. The difference between the SNP-heritability estimates in our analysis and the estimates from twin studies may reflect the effect of rare variants that were not genotyped and are not in LD with any genotyped variants, or may be due to non-additive genetic effects, interactions between genetic variants and environmental factors, or uncontrolled shared environmental factors in the twin studies.

Our TWAS identified eight genes for which imputed expression levels were significantly associated with MD phenotypes. Six of these were either located in or close to the identified GWAS loci, suggesting the observed genotype–phenotype association may be mediated through gene expression. Two additional genes SHOX2 and CRISPLD2 were associated with NDA and PMD, respectively, and replicated in an independent study. Future studies are thus needed to elucidate the biological bases of these findings.

Our study has several strengths. It is the largest GWAS of mammographic density to date, enabling us to discover nine novel MD loci. We performed sensitivity analyses using controls only, which indicated that the significant associations were not spurious artifacts due to oversampling of cases. However, a few studies included in our analysis used a thresholding approach other than CUMULUS, which may have introduced inconsistency in the measurement of MD phenotypes and thus biased the results. Also, although previous studies have demonstrated that MD measurements obtained with CUMULUS are highly reproducible [8, 46], it is a reader-dependent approach and is therefore subject to measurement error. Another weakness of our study is the lack of diversity, as our study sample only included women of European ancestry. Considering that the risk of breast cancer attributable to mammographic density may differ among racial/ethnic groups [47], future efforts should be made to collect mammogram and genotype data from racially diverse populations.


In this study, we conducted a GWAS and TWAS of MD phenotypes using 27,900 women of European ancestry. Our study improved our understanding of the genetic background of MD phenotypes, and reinforced the evidence of their shared genetic basis with breast cancer risk.

Availability of data and materials

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.



BCAC: Breast Cancer Association Consortium
BMI: Body mass index
DA: Mammographic dense area
ER: Estrogen receptor
GWAS: Genome-wide association study
LD: Linkage disequilibrium
MAF: Minor allele frequency
MD: Mammographic density
NDA: Mammographic non-dense area
PMD: Percent mammographic density
SNP: Single nucleotide polymorphism
TWAS: Transcriptome-wide association study


  1. Boyd NF, Martin LJ, Yaffe MJ, Minkin S. Mammographic density and breast cancer risk: current understanding and future prospects. Breast Cancer Res. 2011;13(6):223.

    Article  PubMed  PubMed Central  Google Scholar 

  2. McCormack VA, dos Santos SI. Breast density and parenchymal patterns as markers of breast cancer risk: a meta-analysis. Cancer Epidemiol Biomark Prev. 2006;15(6):1159–69.

    Article  Google Scholar 

  3. Pettersson A, Graff RE, Ursin G, Santos Silva ID, McCormack V, Baglietto L, Vachon C, Bakker MF, Giles GG, Chia KS, et al. Mammographic density phenotypes and risk of breast cancer: a meta-analysis. J Natl Cancer Inst. 2014;106(5):dju078.

    Article  PubMed  PubMed Central  Google Scholar 

  4. Bond-Smith D, Stone J. Methodological challenges and updated findings from a meta-analysis of the association between mammographic density and breast cancer. Cancer Epidemiol Biomark Prev. 2019;28(1):22–31.

    Article  CAS  Google Scholar 

  5. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2018. CA Cancer J Clin. 2018;68(1):7–30.

    Article  PubMed  Google Scholar 

  6. Ursin G, Ma H, Wu AH, Bernstein L, Salane M, Parisky YR, Astrahan M, Siozon CC, Pike MC. Mammographic density and breast cancer in three ethnic groups. Cancer Epidemiol Biomark Prev. 2003;12(4):332–8.

    Google Scholar 

  7. Maskarinec G, Pagano I, Lurie G, Wilkens LR, Kolonel LN. Mammographic density and breast cancer risk: the multiethnic cohort study. Am J Epidemiol. 2005;162(8):743–52.

    Article  PubMed  Google Scholar 

  8. Pettersson A, Hankinson SE, Willett WC, Lagiou P, Trichopoulos D, Tamimi RM. Nondense mammographic area and risk of breast cancer. Breast Cancer Res. 2011;13(5):R100.

    Article  PubMed  PubMed Central  Google Scholar 

  9. Boyd NF, Dite GS, Stone J, Gunasekara A, English DR, McCredie MR, Giles GG, Tritchler D, Chiarelli A, Yaffe MJ, et al. Heritability of mammographic density, a risk factor for breast cancer. N Engl J Med. 2002;347(12):886–94.

    Article  PubMed  Google Scholar 

  10. Stone J, Dite GS, Gunasekara A, English DR, McCredie MR, Giles GG, Cawson JN, Hegele RA, Chiarelli AM, Yaffe MJ, et al. The heritability of mammographically dense and nondense breast tissue. Cancer Epidemiol Biomark Prev. 2006;15(4):612–7.


  11. Holowko N, Eriksson M, Kuja-Halkola R, Azam S, He W, Hall P, Czene K. Heritability of mammographic breast density, density change, microcalcifications, and masses. Cancer Res. 2020;80(7):1590–600.


  12. Lindstrom S, Vachon CM, Li J, Varghese J, Thompson D, Warren R, Brown J, Leyland J, Audley T, Wareham NJ, et al. Common variants in ZNF365 are associated with both mammographic density and breast cancer risk. Nat Genet. 2011;43(3):185–7.


  13. Stevens KN, Lindstrom S, Scott CG, Thompson D, Sellers TA, Wang X, Wang A, Atkinson E, Rider DN, Eckel-Passow JE, et al. Identification of a novel percent mammographic density locus at 12q24. Hum Mol Genet. 2012;21(14):3299–305.


  14. Lindstrom S, Thompson DJ, Paterson AD, Li J, Gierach GL, Scott C, Stone J, Douglas JA, dos-Santos-Silva I, Fernandez-Navarro P, et al. Genome-wide association study identifies multiple loci associated with both mammographic density and breast cancer risk. Nat Commun. 2014;5:5303.


  15. Brand JS, Li J, Humphreys K, Karlsson R, Eriksson M, Ivansson E, Hall P, Czene K. Identification of two novel mammographic density loci at 6q25.1. Breast Cancer Res. 2015;17:75.


  16. Fernandez-Navarro P, Gonzalez-Neira A, Pita G, Diaz-Uriarte R, Tais Moreno L, Ederra M, Pedraz-Pingarron C, Sanchez-Contador C, Vazquez-Carrete JA, Moreo P, et al. Genome wide association study identifies a novel putative mammographic density locus at 1q12-q21. Int J Cancer. 2015;136(10):2427–36.


  17. Sieh W, Rothstein JH, Klein RJ, Alexeeff SE, Sakoda LC, Jorgenson E, McBride RB, Graff RE, McGuire V, Achacoso N, et al. Identification of 31 loci for mammographic density phenotypes and their associations with breast cancer risk. Nat Commun. 2020;11(1):5116.


  18. Heine JJ, Scott CG, Sellers TA, Brandt KR, Serie DJ, Wu FF, Morton MJ, Schueler BA, Couch FJ, Olson JE, et al. A novel automated mammographic density measure and breast cancer risk. J Natl Cancer Inst. 2012;104(13):1028–37.


  19. Michailidou K, Lindstrom S, Dennis J, Beesley J, Hui S, Kar S, Lemacon A, Soucy P, Glubb D, Rostamianfar A, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 2017;551(7678):92–4.


  20. Amos CI, Dennis J, Wang Z, Byun J, Schumacher FR, Gayther SA, Casey G, Hunter DJ, Sellers TA, Gruber SB, et al. The OncoArray consortium: a network for understanding the genetic architecture of common cancers. Cancer Epidemiol Biomark Prev. 2017;26(1):126–35.


  21. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009;5(6):e1000529.


  22. Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–1.


  23. Pruim RJ, Welch RP, Sanna S, Teslovich TM, Chines PS, Gliedt TP, Boehnke M, Abecasis GR, Willer CJ. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics. 2010;26(18):2336–7.


  24. Yang J, Ferreira T, Morris AP, Medland SE, Genetic Investigation of ANthropometric Traits (GIANT) Consortium, DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden PA, Heath AC, Martin NG, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet. 2012;44(4):369–75, S361–63.

  25. Kapoor PM, Lindstrom S, Behrens S, Wang X, Michailidou K, Bolla MK, Wang Q, Dennis J, Dunning AM, Pharoah PDP, et al. Assessment of interactions between 205 breast cancer susceptibility loci and 13 established risk factors in relation to breast cancer risk, in the Breast Cancer Association Consortium. Int J Epidemiol. 2020;49(1):216–32.


  26. Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Schizophrenia Working Group of the Psychiatric Genomics Consortium, Patterson N, Daly MJ, Price AL, Neale BM. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47(3):291–5.

  27. Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh PR, Anttila V, Xu H, Zang C, Farh K, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet. 2015;47(11):1228–35.


  28. Bulik-Sullivan B, Finucane HK, Anttila V, Gusev A, Day FR, Loh PR, ReproGen Consortium, Psychiatric Genomics Consortium, Genetic Consortium for Anorexia Nervosa of the Wellcome Trust Case Control Consortium, Duncan L, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet. 2015;47(11):1236–41.

  29. Shi H, Kichaev G, Pasaniuc B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am J Hum Genet. 2016;99(1):139–53.


  30. Shi H, Mancuso N, Spendlove S, Pasaniuc B. Local genetic correlation gives insights into the shared genetic architecture of complex traits. Am J Hum Genet. 2017;101(5):737–51.


  31. Berisa T, Pickrell JK. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics. 2016;32(2):283–5.


  32. GTEx Consortium. The genotype-tissue expression (GTEx) project. Nat Genet. 2013;45(6):580–5.


  33. Gamazon ER, Wheeler HE, Shah KP, Mozaffari SV, Aquino-Michaels K, Carroll RJ, Eyler AE, Denny JC, GTEx Consortium, Nicolae DL, et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet. 2015;47(9):1091–8.

  34. Gusev A, Ko A, Shi H, Bhatia G, Chung W, Penninx BW, Jansen R, de Geus EJ, Boomsma DI, Wright FA, et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat Genet. 2016;48(3):245–52.


  35. Emami NC, Cavazos TB, Rashkin SR, Cario CL, Graff RE, Tai CG, Mefford JA, Kachuri L, Wan E, Wong S, et al. A large-scale association study detects novel rare variants, risk genes, functional elements, and polygenic architecture of prostate cancer susceptibility. Cancer Res. 2021;81(7):1695–703.


  36. Brand JS, Humphreys K, Li J, Karlsson R, Hall P, Czene K. Common genetic variation and novel loci associated with volumetric mammographic density. Breast Cancer Res. 2018;20(1):30.


  37. Lindberg K, Strom A, Lock JG, Gustafsson JA, Haldosen LA, Helguero LA. Expression of estrogen receptor beta increases integrin alpha1 and integrin beta1 levels and enhances adhesion of breast cancer cells. J Cell Physiol. 2010;222(1):156–67.


  38. Haryono SJ, Datasena IG, Santosa WB, Mulyarahardja R, Sari K. A pilot genome-wide association study of breast cancer susceptibility loci in Indonesia. Asian Pac J Cancer Prev. 2015;16(6):2231–5.


  39. Ali R, Wendt MK. The paradoxical functions of EGFR during breast cancer progression. Signal Transduct Target Ther. 2017;2.

  40. Zeng C, Guo X, Long J, Kuchenbaecker KB, Droit A, Michailidou K, Ghoussaini M, Kar S, Freeman A, Hopper JL, et al. Identification of independent association signals and putative functional variants for breast cancer risk through fine-scale mapping of the 12p11 locus. Breast Cancer Res. 2016;18(1):64.


  41. Hens JR, Dann P, Zhang JP, Harris S, Robinson GW, Wysolmerski J. BMP4 and PTHrP interact to stimulate ductal outgrowth during embryonic mammary development and to inhibit hair follicle induction. Development. 2007;134(6):1221–30.


  42. Akhtari M, Mansuri J, Newman KA, Guise TM, Seth P. Biology of breast cancer bone metastasis. Cancer Biol Ther. 2008;7(1):3–9.


  43. Akbari ME, Gholamalizadeh M, Doaei S, Mirsafa F. FTO gene affects obesity and breast cancer through similar mechanisms: a new insight into the molecular therapeutic targets. Nutr Cancer. 2018;70(1):30–6.


  44. Turnbull C, Ahmed S, Morrison J, Pernet D, Renwick A, Maranian M, Seal S, Ghoussaini M, Hines S, Healey CS, et al. Genome-wide association study identifies five new breast cancer susceptibility loci. Nat Genet. 2010;42(6):504–7.


  45. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.


  46. Rice MS, Rosner BA, Tamimi RM. Percent mammographic density prediction: development of a model in the nurses’ health studies. Cancer Causes Control. 2017;28(7):677–84.


  47. Bissell MCS, Kerlikowske K, Sprague BL, Tice JA, Gard CC, Tossas KY, Rauscher GH, Trentham-Dietz A, Henderson LM, Onega T, et al. Breast cancer population attributable risk proportions associated with body mass index and breast density by race/ethnicity and menopausal status. Cancer Epidemiol Biomark Prev. 2020.



See Supplementary Information.



Author information





HC, JS, GLG, RMT, CMV, and SL were involved in conceptualization and study design; SF, DJT, QW, JD, KM, JLH, MCS, TN, TLN, PAF, AB, GC, RAM, KA, AH, SA, FC, JO, RLM, GGG, CAH, GM, SW, EMJ, AK, HE, IA, DGE, WGN, PH, KC, AS, MJ, MP, PF, DSM, VNK, JHR, AMD, PDPP, DFE, GLG, RMT, and CMV helped in data curation; HC, SF, CS, and MKB contributed to data management and harmonization; HC, SF, JD, SL, and CS helped in statistical analysis; JHR, PW, LAH, and WS were involved in replication and validation; HC contributed to drafting of the manuscript; all authors helped in reviewing and editing the manuscript; SL was involved in funding acquisition; SL contributed to project supervision. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Sara Lindström.

Ethics declarations

Ethics approval and consent to participate

The use of data was approved by each participating study. See Supplementary Information.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supplementary Information, Funding, Acknowledgement, and Supplementary Figures.

Additional file 2.

Supplementary Tables.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Chen, H., Fan, S., Stone, J. et al. Genome-wide and transcriptome-wide association studies of mammographic density phenotypes reveal novel loci. Breast Cancer Res 24, 27 (2022).
