Impact of breast density on diagnostic accuracy in digital breast tomosynthesis versus digital mammography: results from a European screening trial

Background
The diagnostic accuracy of digital breast tomosynthesis (DBT) and digital mammography (DM) in breast cancer screening may vary by breast density subgroup. The purpose of this study was to evaluate which women, grouped by automatically assessed breast density, benefit most from DBT compared with DM in the prospective Malmö Breast Tomosynthesis Screening Trial.
Materials and methods
The prospective European Malmö Breast Tomosynthesis Screening Trial (n = 14,848, Jan. 27, 2010–Feb. 13, 2015) compared one-view DBT with two-view DM, with a consensus meeting before recall. In this secondary analysis, breast density was assessed with the automated software Laboratory for Individualized Breast Radiodensity Assessment. The diagnostic accuracy of DBT and DM was compared across quintiles of breast percent density (PD) and absolute dense area (DA) using confidence intervals (CI) and McNemar's test. The association between breast density and cancer detection was analyzed with logistic regression, adjusted for age (< 55 vs. ≥ 55 years) and previous screening participation.
Results
In total, 14,730 women (median age, 58 years; interquartile range, 16) were included in the analysis. In all density subgroups, DBT had higher sensitivity and lower specificity than DM. The highest PD quintile showed the largest differences. Sensitivity was 81.1% (95% CI 65.8–90.5) with DBT versus 43.2% (95% CI 28.7–59.1) with DM, p < 0.001. Specificity was 95.5% (95% CI 94.7–96.2) versus 97.2% (95% CI 96.6–97.8), p < 0.001. PD quintile was also positively associated with cancer detection by DBT (odds ratio 1.24, 95% CI 1.09–1.42, p = 0.001).
Conclusion
Women with the highest breast density benefited most from digital breast tomosynthesis compared with digital mammography, with increased sensitivity at the cost of slightly lower specificity. These results may influence the use of digital breast tomosynthesis in an individualized screening program stratified by, for instance, breast density.
Trial registration
https://www.ClinicalTrials.gov: NCT01091545, registered March 24, 2010.
Supplementary Information
The online version contains supplementary material available at 10.1186/s13058-023-01712-6.


Introduction
Digital breast tomosynthesis (DBT) has been investigated as an alternative to digital mammography (DM) in breast cancer screening and has increased cancer detection rates (CDR) in several prospective trials [1]. However, its impact on recall rates has varied across studies and screening settings [1]. Women with high breast density seem to benefit particularly from DBT, because it reduces the effect of overlapping tissue that limits DM [2]. Compared with women with low breast density, women with high breast density also have a higher risk of breast cancer, of missed cancers, and of false positive (FP) findings with DM [3–5].
Radiologists commonly classify breast density into four categories according to the Breast Imaging Reporting and Data System (BI-RADS) [6]. Yet this categorization is subject to both intra- and interobserver variation [7,8]. In previous studies investigating DBT, breast density was often dichotomized into dense and non-dense [9]. A more detailed assessment of breast density might better capture the risk of developing breast cancer and the reduced sensitivity of cancer detection caused by overlapping tissue [10]. Several automated quantitative breast density assessment algorithms have been developed, primarily to reduce observer variability [11]. One such program is the Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) [12,13].
Screening with DBT improves CDR compared with DM in women with dense breasts [9]. However, more detailed density sub-analyses of prospective trials have shown inconsistent CDR and recall rate results across density subgroups. These sub-analyses used either BI-RADS density classification or automated software assessment. Most of the data come from American rather than European populations [9, 14–16]. More evidence, especially from European populations, is therefore needed. Younger women also generally have higher breast density, and the sensitivity of DM for breast cancer detection is lower in this population than in older women [4,17]. Further, density sub-analyses from previous prospective DBT screening trials have not included women 40-49 years old [14–16].
The prospective Malmö Breast Tomosynthesis Screening Trial compared one-view wide-angle DBT alone with two-view DM and included women 40-74 years old [18]. The purpose of the current study is to evaluate which breast density subgroups, as assessed by automatic software, benefit most from digital breast tomosynthesis compared with digital mammography in the Malmö Breast Tomosynthesis Screening Trial. Women aged 40-49 years are evaluated separately.

Study participants
The prospective Malmö Breast Tomosynthesis Screening Trial was conducted between January 27, 2010 and February 13, 2015 at Skåne University Hospital in Malmö, Sweden. This secondary analysis was pre-specified and received ethical approval from the local ethics committee at Lund University (Dnr 2009/770; trial protocol at https://www.ClinicalTrials.gov: NCT01091545). A random sample of 21,691 women aged 40-74 years was selected from the Malmö screening registry and invited to participate. Those who provided written informed consent were enrolled (Fig. 1). Exclusion criteria were pregnancy and inability to speak Swedish or English. One-view (mediolateral oblique) wide-angle DBT and two-view (mediolateral oblique and craniocaudal) DM images were acquired at the same screening visit with Mammomat Inspiration (Siemens Healthineers, Erlangen, Germany). The authors had full control of the data and all information submitted for publication, and none were employed by Siemens Healthineers. Seven radiologists (among them SZ) with 2 to 40 years of breast imaging experience participated in the screen reading. Five of the readers each read more than 5000 screening examinations per year. All images were read in two separate reading arms, the DM reading arm and the DBT reading arm. Each arm used double reading, and consensus meetings took place before recall. Participants could be recalled from one or both reading arms (Fig. 1) [18,19]. Within the trial, the first reader categorized breast density for all participating women according to the BI-RADS 4th edition breast density categories [6], as part of the DM reading arm. The study sample has been investigated in several previous publications (Additional file 1). However, screening performance by automatically assessed breast density had not been investigated. For this study, breast density was assessed retrospectively with the automated software LIBRA (Fig. 2). Breast area and absolute dense area (DA) were analyzed for each processed DM view. This gave four analyzed images per woman (two for women with one breast), which were combined into a mean value. The mean breast percent density (PD) was calculated by dividing DA by breast area. Final exclusion criteria were failure of LIBRA to perform an analysis and the presence of breast implants.
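To make the density computation concrete, here is a minimal Python sketch of how per-view LIBRA outputs might be combined for one woman. It assumes each processed DM view yields a (dense area, breast area) pair in cm². It also assumes PD is the mean DA divided by the mean breast area, expressed as a percentage. Averaging per-view PD values instead would give a slightly different number. The helper `libra_summary` is hypothetical and is not part of LIBRA itself.

```python
from statistics import mean


def libra_summary(views):
    """Combine per-view LIBRA output for one woman (hypothetical helper).

    views: list of (dense_area_cm2, breast_area_cm2) pairs, one per processed
    DM view (four for two breasts, two for a woman with one breast).
    Returns (mean DA in cm2, mean breast area in cm2, PD in percent).
    """
    if not views:
        # No analysable view: such women were excluded from the study.
        raise ValueError("LIBRA produced no analysable views")
    mean_da = mean(da for da, _ in views)
    mean_area = mean(area for _, area in views)
    # Assumption: PD = mean DA / mean breast area, expressed as a percentage.
    pd_percent = 100.0 * mean_da / mean_area
    return mean_da, mean_area, pd_percent


views = [(30, 150), (34, 170), (32, 160), (36, 160)]  # illustrative values
print(libra_summary(views))
```

With these illustrative numbers, the mean DA is 33 cm² and the mean breast area is 160 cm², giving a PD of about 20.6%.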

Definitions
Previous screening was defined as participation in the regional screening program in Skåne, Sweden, in 2005 or later. Menopausal status was defined by age at DBT screening as premenopausal (< 55 years) or postmenopausal (≥ 55 years) [20].
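The two covariates defined above (age group as a proxy for menopausal status, and previous screening) are the adjustment terms in the study's logistic regression of cancer detection on breast density. The model can be sketched in plain Python using Newton-Raphson maximum likelihood. This is an illustrative re-implementation, not the authors' code. Two points are assumptions: entering the density quintile (1-5) as a single ordinal covariate, so that exp(β) is an odds ratio per quintile step, and the helper names `design_row`, `solve` and `fit_logistic`. In practice a statistics package would be used, which also reports standard errors.

```python
import math


def design_row(pd_quintile, age, previously_screened):
    """Hypothetical covariate coding: quintile (1-5), age < 55, prior screening."""
    return [pd_quintile, 1 if age < 55 else 0, 1 if previously_screened else 0]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    x: list of covariate rows (no intercept); y: list of 0/1 outcomes.
    Returns [intercept, beta_1, ..., beta_k] on the log-odds scale.
    """
    rows = [[1.0] + [float(v) for v in r] for r in x]
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score vector X'(y - p)
        info = [[0.0] * k for _ in range(k)]  # information matrix X'WX
        for r, yi in zip(rows, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * r[i]
                for j in range(k):
                    info[i][j] += w * r[i] * r[j]
        step = solve(info, grad)              # Newton step: (X'WX)^-1 X'(y - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

With a single binary covariate, the fitted odds ratio reduces to the familiar 2×2 cross-product ratio, which makes a convenient sanity check. In the study setting, each row would be `design_row(quintile, age, previously_screened)`. The outcome would be an indicator of cancer detected in a given reading arm, which is an assumed coding.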

Study outcomes
Outcomes, calculated per woman, were sensitivity, specificity, CDR for breast cancer per 1,000 women screened, FP rate, recall rate, biopsy rate, positive predictive value of recall, and positive predictive value of biopsy. A subgroup analysis was conducted for women aged 40-49 years.
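The per-woman outcomes listed above are simple ratios of counts. The sketch below assumes a 2×2 layout for each reading arm and density subgroup (true positives, false positives, false negatives including interval cancers, and true negatives), plus biopsy counts. The FP rate is taken as false positives per woman screened. That denominator is an assumption, since it is not restated here. The sketch also computes a Wilson score interval, one common choice for binomial proportions; the paper's CI method is not given in this section. The function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def screening_outcomes(tp, fp, fn, tn, biopsied, biopsy_cancers):
    """Per-woman screening outcomes for one reading arm (hypothetical helper).

    tp: screen-detected cancers; fp: recalled women without cancer;
    fn: cancers not detected (e.g. interval cancers); tn: correctly not recalled.
    """
    screened = tp + fp + fn + tn
    recalled = tp + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "cdr_per_1000": 1000 * tp / screened,
        "fp_rate": fp / screened,           # assumption: per woman screened
        "recall_rate": recalled / screened,
        "biopsy_rate": biopsied / screened,
        "ppv1": tp / recalled,              # positive predictive value of recall
        "ppv3": biopsy_cancers / biopsied,  # positive predictive value of biopsy
    }


def wilson_ci(k, n, z=Z95):
    """Wilson score interval for the binomial proportion k / n."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

As a check, take the hypothetical counts of 30 and 16 detected cancers out of 37. These give 81.1% (65.8–90.5%) and 43.2% (28.7–59.1%), matching the abstract's figures for the highest PD quintile. This is consistent with, but does not prove, the use of Wilson intervals; the actual counts are in Table 3.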

Statistics
The study participants were divided into quintiles of PD and DA by increasing density. For each breast density quintile, the outcomes of the DBT reading arm were compared with those of the DM reading arm. The density subgroups were not pre-specified in the study protocol. The sensitivity and specificity of DBT and DM were
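The quintile split and the paired comparison of the two reading arms can be sketched as follows. Quintile cut points are taken at the 20/40/60/80% rank positions, and tied values are kept in the same (lower) quintile. This is one way group sizes can end up slightly unequal, as with the 2945-2947 women per quintile reported under Results; the study's exact tie rule is not stated. Every woman was read in both arms, so the McNemar test named in the abstract compares the discordant women only. The exact binomial version is shown here; the paper may instead have used the chi-square approximation. The function names are hypothetical.

```python
import bisect
import math


def assign_quintiles(values):
    """Quintile labels 1-5 (1 = lowest density) for each value, in input order.

    Cut points are the values at the 20/40/60/80% rank positions. A value equal
    to a cut point goes to the lower quintile, so tied values share a quintile.
    """
    n = len(values)
    if n < 5:
        raise ValueError("need at least five values")
    s = sorted(values)
    cuts = [s[k * n // 5 - 1] for k in range(1, 5)]
    # bisect_left returns the number of cut points strictly below v.
    return [1 + bisect.bisect_left(cuts, v) for v in values]


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from discordant counts.

    b: women positive in the DBT arm only; c: women positive in the DM arm only.
    Under the null hypothesis, b ~ Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With 14,730 distinct values, the split gives exactly 2,946 women per quintile. As a hypothetical example, suppose 14 women in a quintile were detected only by DBT and none only by DM. The exact McNemar p-value is then 2 × 0.5¹⁴ ≈ 0.00012.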

Participant characteristics
After exclusions (95 women with breast implants and 23 with missing LIBRA values), 14,730 women were included in this study (Fig. 1). Their median age at inclusion was 58 years (interquartile range, 16). Further descriptive data are presented in Table 1. One woman who later presented with an interval cancer had been recalled at screening, but no cancer was found at follow-up. This woman is included both as an FP and as a participant with interval cancer.

Breast percent density and absolute dense area
The median PD and DA were 21.6% and 33.2 cm², respectively. Each quintile contained 2945-2947 women. Two women at the cut-off value between quintiles 3 and 4 had identical DA values. Descriptive data for all quintiles are presented in Table 2.

Logistic regression
In the logistic regression models, after adjustment for menopausal status and previous screening, higher PD

Cancer detection rate and false positives
CDR was higher with DBT than with DM in all five quintiles of both PD and DA. However, the CI for the difference included zero for all quintiles except the highest PD quintile (Fig. 4 and Additional file 2: Table S2). The largest differences between DBT and DM were found in the highest PD and DA quintiles. These showed 4.8 (95% CI 0.3 to 9.3) and 4.4 (95% CI −0.1 to 9.0) additional cancer detections per 1,000 women screened, respectively. FP rates were also higher for DBT than for DM in all PD and DA quintiles. However, the CI for the difference included zero for PD quintiles 1 and 4 and for DA quintiles 1 and 2.
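Both arms read the same women, so a difference in CDR (or FP rate) between DBT and DM is a difference of paired proportions. Only the discordant women contribute to it. The sketch below uses the standard Wald interval for paired proportions. It is an illustration and not necessarily the method behind the intervals reported above. The function name and arguments are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def paired_rate_difference(b, c, n, per=1000, z=Z95):
    """DBT-minus-DM difference in a per-woman rate, with a paired Wald CI.

    b: women positive in the DBT arm only; c: women positive in the DM arm only;
    n: women screened in the subgroup. Women positive in both arms (or neither)
    cancel out of the difference. Returns (difference, lower, upper) per `per` women.
    """
    diff = (b - c) / n
    # Variance of a difference of paired proportions: [(b + c) - (b - c)^2 / n] / n^2
    var = ((b + c) - (b - c) ** 2 / n) / n ** 2
    half = z * math.sqrt(var)
    return per * diff, per * (diff - half), per * (diff + half)
```

Setting `per=100` gives the difference in percentage points, the scale used for FP rates. As a hypothetical example, take 10 women detected only by DBT and 2 only by DM among 1,000 screened. This gives 8.0 additional detections per 1,000, with a 95% CI of roughly 1.2 to 14.8.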

Recall, biopsy rate, positive predictive value for recall, and positive predictive value for biopsy
For both DBT and DM, recall rates were highest in the highest PD and DA quintiles (Fig. 5). Biopsy rates were higher for DBT than for DM in all PD and DA quintiles. However, the CI for the difference included zero for PD quintiles 1-4 and DA quintiles 1-3 (Additional file 2: Table S3). The positive predictive values of recall and biopsy were similar for DBT and DM across all PD and DA quintiles.

Exploratory analysis
An exploratory analysis examined which quintiles showed the largest gain in CDR with DBT. For PD, the largest gain was in quintile 5 alone, so no further testing was done. For DA, the largest gain in CDR occurred in quintiles 4 and 5. When these two quintiles were analyzed together, DBT detected 4.1 (95% CI 0.7-7.4) additional cancers per 1,000 women screened compared with DM. In DA quintiles 4 and 5, the corresponding incremental FP rate with DBT compared with DM was 1.4 percentage points (95% CI 0.7-2.0).

Women 40-49 years old
For women aged 40-49 years, the median PD and DA were 35.8% and 43.9 cm², respectively. Additional file 2:

Outcome by BI-RADS density category
For completeness and reference, outcomes by BI-RADS density category are presented in Additional file 2: Tables S6-S9. These data include FP, CDR, and BI-RADS density distribution results, and parts of them have been published in previous studies [18, 21–23].

Discussion
The 3) additional women with breast cancer identified per 1000 screened. The highest PD quintile also showed the largest difference in specificity between DM and DBT, with lower specificity for DBT. However, DBT specificity remained high (95.5%). Among women aged 40-49, the sensitivity of DBT was higher than that of DM in most density categories for both PD and absolute dense area (DA).
In the USA, DBT has been widely used in screening for several years, especially among women with dense breasts. However, in 2021, the European Commission Initiative on Breast Cancer published a conditional recommendation for DBT in screening women with dense breasts, with "very low certainty of the evidence" [24]. Both the European [24] and the American recommendations [25] dichotomized breast density. Two prospective trials performed more detailed density sub-analyses using automatic breast density assessment: the Oslo Tomosynthesis Screening Trial [14] and the Tomosynthesis trial in Bergen [15]. Neither found a significantly higher CDR for DBT than for DM in women with the densest breasts. In the Oslo Tomosynthesis Screening Trial, however, the higher CDR with DBT in the densest group (21.7% (95% CI 3.0-41.9), p = 0.06) was of similar magnitude to the incremental rate in the subgroup with the second highest breast density (22.6% (95% CI 12.9-32.9), p < 0.001) [14]. In the Tomosynthesis trial in Bergen, no difference in CDR between DBT and DM was found in any density subgroup [15]. These differences from the present findings could stem from the smaller sizes of the densest subgroups in both trials. In addition, the Tomosynthesis trial in Bergen did not find any difference in CDR overall [26], in contrast to several other European trials [1]. The prospective Tomosynthesis plus Synthesized Mammography trial performed a detailed density sub-analysis using BI-RADS density categorization. It found a significantly higher CDR with DBT than with DM for women with the highest breast density (OR 3.8 (95% CI 1.5-11.1)), which agrees with the present study's findings [16]. Neither the Tomosynthesis trial in Bergen nor the Oslo Tomosynthesis Screening Trial found any significant difference in FP between DBT and DM among women with the highest breast density [14,15]. These discrepancies could again reflect the smaller sample sizes of the densest subgroups. A further factor is that the Oslo Tomosynthesis Screening Trial derived its FP rate before the consensus meeting.
Automated breast density assessment enables reproducibility. LIBRA can assess breast density in both raw and processed images [12]. This is advantageous because clinical settings often store only processed images [27]. Whether PD or DA should be used for breast density assessment is still debated [28], although it has been suggested that PD correlates more strongly with breast cancer risk [29]. In the current study, PD and DA gave similar results. However, in the exploratory analysis, DA identified a larger group that benefited from the increased CDR of DBT. Still, this study was not designed to compare the two breast density measures.
The current study has limitations. The original trial was not powered for the density subgroup division or the post hoc analyses, although significant differences were still found in the higher breast density subgroups. DM's FP rate may also have been underestimated in this trial. DBT images were available at the consensus meeting, which would have favored DM. The LIBRA assessments were not manually reviewed. However, LIBRA has previously been validated for Siemens images and showed a strong association with radiologists' density assessments (r = 0.89) [20]. Images for which LIBRA failed, owing to poor breast positioning, were excluded from the study. However, the number of failed readings was low (n = 23). Further, LIBRA measured density by area on DM images. A stronger association with breast cancer has previously been shown for volumetric measurements from DBT [30]. Finally, the subgroup of women aged 40-49 was small, so these results should be interpreted with caution.
The findings in this study add important knowledge to the scarce evidence on DBT screening in women with the densest breasts, showing the greatest impact for

Conclusion
In conclusion, women with high mammographic density, as assessed with automatic density software, had the greatest benefit from digital breast tomosynthesis screening compared with digital mammography, as it improved cancer detection for 20-40% of the screening population at the cost of a small decrease in specificity.These results may influence digital breast tomosynthesis's use in a future individualized screening program stratified by, for instance, breast density.

Fig. 2 Participant images with density assessment. Images from the Laboratory for Individualized Breast Radiodensity Assessment (LIBRA) of a 47-year-old woman without cancer who participated in the Malmö Breast Tomosynthesis Screening Trial. The woman was not recalled from screening. Breast density assessment with LIBRA placed her in the fourth quintile of both breast percent density and absolute dense area. Left images show the craniocaudal (upper) and mediolateral oblique (lower) views from digital mammography without density assessment. Right images show the same projections with density assessment. The total breast areas are marked in red and the dense areas in green.

Fig. 3 a–d Graphs of sensitivity and specificity. Graphs of (a and b) sensitivity (sens) and (c and d) specificity (spec) by breast percent density (PD) and absolute dense area (DA) quintile for digital breast tomosynthesis (DBT) and digital mammography (DM), with 95% confidence intervals shown as vertical lines. Dotted lines mark overall sensitivity and specificity for DBT and DM.

Fig. 4 a and b Graphs of differences in cancer detection and false positives.Graph of differences in cancer detection rate (CDR) per 1000 women screened and false positives (FP) in percentage points between digital breast tomosynthesis and digital mammography for all (a) breast percent density (PD) and (b) absolute dense area (DA) quintiles.Dotted lines mark overall difference in CDR and FP

Table 1
Descriptive data of the study population. LIBRA, Laboratory for Individualized Breast Radiodensity Assessment; PPV-1, positive predictive value of recall; PPV-3, positive predictive value of biopsy; SD, standard deviation; IQR, interquartile range. (a) In total, 929 women did not have a recorded Breast Imaging Reporting and Data System (BI-RADS) 4th ed. breast density measurement.

Table 2
(a) Descriptive statistics of breast percent density quintiles. (b) Descriptive statistics of absolute dense area quintiles. PD, breast percent density; BI-RADS, Breast Imaging Reporting and Data System 4th ed.; DA, absolute dense area.

Table 3
Sensitivity of digital breast tomosynthesis and digital mammography in all quintiles. DBT, digital breast tomosynthesis; CI, confidence interval; DM, digital mammography; PD, breast percent density; DA, absolute dense area.

Table 4
Multivariable logistic regression for detected breast cancers and false positive recall. DBT, digital breast tomosynthesis; OR, odds ratio; CI, confidence interval; DM, digital mammography; FP, false positive; PD, breast percent density; DA, absolute dense area.

Table 5
Sensitivity, specificity, and cancer detection rate among women 40-49 years old. DBT, digital breast tomosynthesis; CI, confidence interval; DM, digital mammography; CDR, cancer detection rate per 1000 women screened; PD, breast percent density; DA, absolute dense area.