Overdiagnosis in the population-based organized breast cancer screening program estimated by a non-homogeneous multi-state model: a cohort study using individual data with long-term follow-up
Breast Cancer Research volume 20, Article number: 153 (2018)
Abstract
Background
Overdiagnosis, defined as the detection of a cancer that would not become clinically apparent in a woman’s lifetime without screening, has become a growing concern. Similar underlying risk of breast cancer in the screened and control groups is a prerequisite for unbiased estimates of overdiagnosis, but a contemporary control group is usually not available in organized screening programs.
Methods
We estimated the frequency of overdiagnosis of breast cancer due to screening in women 50–69 years old by using individual screening data from the population-based organized screening program in Stockholm County 1989–2014. A hidden Markov model with four latent states and three observed states was constructed to estimate the natural progression of breast cancer and the test sensitivity. Piecewise-constant transition rates were used to account for time-varying transition rates. The expected number of detected non-progressive breast cancer cases was calculated.
Results
During the study period, 2,333,153 invitations were sent out; on average, the participation rate in the screening program was 72.7% and the average recall rate was 2.48%. In total, 14,648 invasive breast cancer cases were diagnosed; among the 8305 screen-detected cases, the expected number of non-progressive breast cancer cases was 35.9, which is equivalent to 0.43% (95% confidence interval (CI) 0.10%–2.2%) overdiagnosis. The corresponding estimates for the prevalent and subsequent rounds were 15.6 (0.87%, 95% CI 0.20%–4.3%) and 20.3 (0.31%, 95% CI 0.07%–1.6%), respectively. The likelihood ratio test showed that the non-homogeneous model fitted the data better than an age-homogeneous model (P < 0.001).
Conclusions
Our findings suggest that overdiagnosis in the organized biennial mammographic screening for women 50–69 in Stockholm County is a minor phenomenon. The frequency of overdiagnosis in the prevalent screening round was higher than that in subsequent rounds. The non-homogeneous model performed better than the simpler, traditional homogeneous model.
Background
The population-based organized screening program with mammography in Stockholm County started in 1989. During the first two years (1989–1990), 80,000 women per year were invited; during the latest five years (2010–2014), the number of women invited per year increased to 100,000. In a meta-analysis in which 13 areas within nine counties in Sweden were combined, a 43% mortality reduction was found for women actually screened in the screening epoch compared with the pre-screening epoch after adjustment for self-selection bias. In the four areas in Stockholm County, the mortality reductions were 36%–54% and 18%–41% in women screened and invited, respectively [1].
Harms of screening, especially overdiagnosis, have become a growing concern. Overdiagnosis is defined as the detection of a cancer that would not become clinically apparent in a woman’s lifetime without screening. It can result either from detecting a non-progressive cancer or from detecting a progressive cancer in a patient who dies before the cancer becomes clinically detectable. However, on the individual level, it is currently impossible to determine whether a screen-detected cancer has been overdiagnosed or not. The frequency of overdiagnosis can be estimated only at a group level, which complicates estimation.
Ideally, the frequency of overdiagnosis can be estimated by comparing the excess cumulative breast cancer (BC) incidence between screened and unscreened women. Similar underlying risk of BC in the two groups is a prerequisite for unbiased estimates, which might be the case in randomized controlled trials (RCTs), but a contemporary control group is usually not available in organized screening since the entire population is invited [2]. One possibility is to estimate the incidence rate in the pre-screening period and then extrapolate to obtain an expected incidence in the absence of screening. However, the incidence trend in the screening period in the absence of screening is unknown, and assumptions made in extrapolating the incidence rate (such as type of regression model, duration of pre-screening period, and screened age range) will have an impact on the estimates [3]. An alternative is to estimate the frequency of overdiagnosis using multi-state models by estimating the natural history of BC during the screening epoch only.
Multi-state models have been widely used to characterize the natural course of diseases. The simplest model describing the progression of BC includes three states: free of BC, preclinical screen-detectable phase (PCDP) (asymptomatic but detectable by screening), and clinical phase (CP) (disease with clinical symptoms). The rate of moving to another state is called the transition rate, and the duration of staying in the PCDP is called the sojourn time. Cancers with long sojourn times can be thought of as slow-growing or non-progressive cancers that may be overdiagnosed cases if detected by screening. The three-state model can be extended to a four-state model by dividing the preclinical phase into progressive and non-progressive PCDPs to represent, respectively, the true early detected cases and the overdiagnosed cases if they were detected by screening. Several multi-state models using constant rates have been developed to estimate the frequency of overdiagnosis in BC screening [4,5,6,7]. In this study, we developed a non-homogeneous model to cope with age-specific transition rates. In a recent study, the estimates from this model were validated and found to be comparable to the results from the cumulative incidence approach in a randomized trial setting [8]. We applied this method to estimate the frequency of overdiagnosis in the organized screening program in Stockholm County, Sweden, using individual data on screening history and mode of detection.
Methods
Study population
Population-based, organized screening with mammography started in Stockholm in 1989. Women 50–69 years old were invited to screening every 24 months. Between 2005 and 2009, the screening program was gradually extended to women 40–49 years old; from 2012, women 70–74 years old were also invited. To estimate the frequency of overdiagnosis for women 50–69 years old, women born in 1920–59 and invited to screening in 1989–2014 (N = 417,710) were considered; that is, women 40–49 years old at invitation were not included in the study. Only 9.65% of women born in 1938–44 were invited to screening after age 69 (N = 40,308).
From the start of screening, individual screening information on invitation, participation, recall for further assessment, and screening results, as well as findings from the diagnostic procedures following a positive screening result, were regularly recorded in the regional screening register [9]. The unique identification number for each woman was used to link the screening data to the Stockholm-Gotland Cancer Register (a regional part of the national cancer register founded in 1958) to identify BC cases. International Classification of Disease (ICD) site code 170 or C50 and histopathological code C-24; 096, 146, 196, 896, and 996 were used to define invasive BC cases. Women with BC diagnosed before their first invitation date were excluded.
Statistical methods
The yearly number of women invited, screened, and recalled for further assessment as well as the participation and recall rates are presented in Table 1. Mode of detection based on the EU guidelines was determined for BC cases by using the individual screening histories and outcome of screening and categorized into screen-detected case at the prevalent and the subsequent screening rounds, interval cancer (IC), and non-participant (NP) [10]. Individual person-years were calculated from the date of first invitation to the date of BC diagnosis, two years after the last invitation, or December 31, 2014, whichever came first. The IC ratio was calculated as the number of ICs divided by the total number of prevalent screen-detected cases (PSDs), subsequent screen-detected cases (SSDs), and ICs. The age-specific BC incidence rate is reported in four age groups: 50–54, 55–59, 60–64, and 65–69 (Table 2). For women who were invited to screening after age 69, the person-years and number of BC cases were included in the 65–69 age group.
The four-state Markov model
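The person-year and IC-ratio definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the handling of 29 February, and the 365.25-day year are choices made here and are not taken from the study.

```python
from datetime import date

STUDY_END = date(2014, 12, 31)  # end of follow-up stated in the text


def add_years(d, years):
    """Shift a date by whole years (assumed convention: 29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def person_years(first_invitation, last_invitation, diagnosis=None):
    """Years at risk from first invitation to the earliest stopping date."""
    stops = [add_years(last_invitation, 2), STUDY_END]
    if diagnosis is not None:
        stops.append(diagnosis)
    end = min(stops)
    return max((end - first_invitation).days, 0) / 365.25


def ic_ratio(n_ic, n_psd, n_ssd):
    """Interval cancers as a share of PSDs, SSDs, and ICs combined."""
    return n_ic / (n_ic + n_psd + n_ssd)
```

For example, a hypothetical woman first invited on 1 January 2000, last invited on 1 January 2004, and diagnosed on 1 January 2003 would contribute about three person-years.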
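To model the data collected from the screening program, a hidden Markov model with four latent states and three observed states was used (Fig. 1, redrawn and modified from the original in [8]). Let X(t) denote the underlying disease process, which is unobserved (hidden) and is assumed to follow the Markov property: the future status depends only on the current status and is independent of all earlier states. The four latent states are (1) free of BC, (2) progressive PCDP, (3) CP, and (4) non-progressive PCDP. If the progression of BC is irreversible, the only possible transitions are from state 1 to state 2, from state 1 to state 4, and from state 2 to state 3, and the transition rate matrix Λ(t) at time t is as follows.
\[ \Lambda(t)=\begin{pmatrix} -\left(\lambda_{12}(t)+\lambda_{14}(t)\right) & \lambda_{12}(t) & 0 & \lambda_{14}(t) \\ 0 & -\lambda_{23}(t) & \lambda_{23}(t) & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
The transition rate from state i to state j (i ≠ j) at age t is defined by
\[ \lambda_{ij}(t)=\lim_{\Delta t\to 0}\frac{\Pr\left(X(t+\Delta t)=j \mid X(t)=i\right)}{\Delta t} \]
The transition probability from state i to state j is defined by P_{ij}(s, t) = Pr(X(t) = j | X(s) = i) for 0 ≤ s ≤ t.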
We assumed that the initiation time (t_{0}) of the disease process is age 40. This implies that women who participated in the screening program were assumed to be free of BC before age 40. The transition rates were assumed to be constant within intervals, so the transition rate matrix could be expressed as Λ(t) = Λ^{(l)} with entries \( {\lambda}_{ij}(t)={\lambda}_{ij}^{(l)} \). In our model, three age intervals, [40, 50), [50, 60), and [60, ∞), denoted by l = 1, 2, and 3, were defined. In addition, we reparametrized the transition rate from state 1 to state 4 by assuming that it was proportional to the rate from state 1 to state 2 over time. The ratio of the two transition rates was represented by \( r=\frac{\lambda_{14}(t)}{\lambda_{12}(t)} \).
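To make the piecewise-constant construction concrete, the sketch below computes transition probabilities P(s, t) by multiplying closed-form matrices for each age interval (the Chapman-Kolmogorov relation). It is an illustration only: the rate values, the names `INTERVALS`, `R`, `constant_rate_matrix`, and `transition_matrix` are hypothetical, and the closed forms assume the irreversible structure 1→2, 1→4, 2→3 with state 4 absorbing.

```python
import math

# Hypothetical rates per woman-year, for illustration only (not the study's estimates).
# Each entry: (age_start, age_end, lambda12, lambda23); lambda14 = R * lambda12.
INTERVALS = [
    (40.0, 50.0, 0.0015, 0.40),
    (50.0, 60.0, 0.0028, 0.45),
    (60.0, math.inf, 0.0038, 0.30),
]
R = 0.002  # assumed ratio lambda14 / lambda12


def constant_rate_matrix(l12, l23, r, dt):
    """4x4 transition probabilities over dt years with constant rates."""
    l14 = r * l12
    out = l12 + l14                        # total rate of leaving state 1
    p11 = math.exp(-out * dt)              # still free of BC
    p22 = math.exp(-l23 * dt)              # still in progressive PCDP
    p14 = l14 / out * (1.0 - p11)          # entered non-progressive PCDP
    # Entered progressive PCDP and not yet clinical; requires out != l23.
    p12 = l12 / (out - l23) * (p22 - p11)
    p13 = 1.0 - p11 - p12 - p14            # progressed to the clinical phase
    return [
        [p11, p12, p13, p14],
        [0.0, p22, 1.0 - p22, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(s, t):
    """P(s, t) as the ordered product of per-interval matrices."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for start, end, l12, l23 in INTERVALS:
        lo, hi = max(s, start), min(t, end)
        if hi > lo:
            result = matmul(result, constant_rate_matrix(l12, l23, R, hi - lo))
    return result
```

Within an interval the sojourn time in the progressive PCDP is exponential, so the mean sojourn time there is 1/λ_{23}.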
The observed states and measurement error
The individual screening history, including invitation and participation, combined with the outcome of screening indicated the subject’s observed disease states in the screening period. The information was described by three observed states denoted by Y(t): (1) negative finding, (2) screen-detected case, and (3) clinical case. The observed states depend not only on the true disease states but also on the accuracy of the screening program. A preclinical BC case might be misclassified as a negative finding of mammography (false-negative case). The probability of detecting a progressive or a non-progressive PCDP case was defined as the test sensitivity (S). In contrast, a subject free of BC may be misclassified as an abnormal result of mammography (false-positive case). However, as further diagnostic examinations were performed to confirm the diagnosis, the probability of being misclassified as a cancer case, Pr(Y(t) = 2 | X(t) = 1), was set to zero. We assumed that the cases in the CP will seek medical care and thus can be identified in the cancer register. The misclassification matrix (E), with rows indexed by the latent states 1–4 and columns by the observed states 1–3, was defined as follows:
\[ E=\begin{pmatrix} 1 & 0 & 0 \\ 1-S & S & 0 \\ 0 & 0 & 1 \\ 1-S & S & 0 \end{pmatrix} \]
We assumed the misclassification matrix to be constant over age and the test sensitivity of progressive and non-progressive PCDP cases to be the same due to non-differential pathological findings (that is, Pr(Y(t) = 2 | X(t) = 2) = Pr(Y(t) = 2 | X(t) = 4) = S). In addition, a false-negative case was assumed to be detected in the next screening round (if not detected clinically before that) to simplify the likelihood function [8].
Maximum likelihood estimation
The individual observed states identified from the screening histories combined with the outcomes of screening were used to construct the likelihood function through the transition probabilities. The transition probabilities can be derived from the transition rates by solving the forward Kolmogorov equation [11]. The likelihood function was similar to that used by Wu et al. [8]. In brief, the likelihood contribution of a sequence of observations on an individual subject can be represented by transition probabilities and misclassification probabilities according to the observed states [12]. Since individual screening histories were collected from age 50, the transition rates before age 50 were intractable. The transition rate from state 1 to state 2 for the time interval [40, 50) was therefore obtained from the average age-specific incidence in 1989–2004 in Stockholm as reported in the cancer register [13]. Because closed-form solutions for the parameters did not exist, a numerical procedure was required to maximize the likelihood. The quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno) and Nelder–Mead methods were used to find the maximum likelihood estimates (MLEs) using the package optimx in R software [14]. We applied the Karush–Kuhn–Tucker conditions to determine whether the minimization of −2 × log-likelihood had indeed converged [15]. The standard errors of the parameter estimates were obtained from the inverse of the Hessian matrix of the maximized log-likelihood function. A homogeneous Markov model based on constant transition rates was also estimated, and a likelihood ratio test was used to compare the homogeneous and non-homogeneous models. All calculations were performed using the R statistical software.
To check whether the model fitted the data, the observed and expected cumulative incidence curves in the ever-attenders were plotted. The expected number of BC cases was calculated on the basis of the individual screening histories and MLEs of parameters. The annual observed incidence rate was first calculated, and the expected number of BC cases in each year was calculated by summing, over all at-risk subjects, the probability that an individual was in PCDP or CP given her previous states. Follow-up of clinically detected cases was continued until the next supposed examination time (two years after the latest scheduled time) [16].
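The way transition and misclassification probabilities combine into one subject's likelihood contribution can be illustrated with the generic forward recursion for hidden Markov models. This is a sketch of the general technique, not the simplified likelihood used in the study (which additionally assumes that false negatives are detected at the next round); the matrix `P_TOY`, the vector `INITIAL`, and the function name are hypothetical.

```python
S = 0.88  # test sensitivity reported in the Results

# EMISSION[x][y] = Pr(Y = y | X = x); rows: latent states 1-4,
# columns: observed states 1-3 (negative, screen-detected, clinical).
EMISSION = [
    [1.0, 0.0, 0.0],
    [1.0 - S, S, 0.0],
    [0.0, 0.0, 1.0],
    [1.0 - S, S, 0.0],
]

# Hypothetical latent-state distribution at the first screen and a
# hypothetical two-year transition matrix, for illustration only.
INITIAL = [0.98, 0.012, 0.0, 0.008]
P_TOY = [
    [0.99, 0.006, 0.003, 0.001],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def forward_likelihood(initial, transitions, observations):
    """Probability of an observed sequence (0-based observed-state indices).

    transitions[k] is the latent transition matrix between observation k and k+1.
    """
    n = len(initial)
    alpha = [initial[x] * EMISSION[x][observations[0]] for x in range(n)]
    for P, y in zip(transitions, observations[1:]):
        alpha = [
            sum(alpha[i] * P[i][j] for i in range(n)) * EMISSION[j][y]
            for j in range(n)
        ]
    return sum(alpha)


# Negative finding at the first screen, screen-detected case at the second.
example = forward_likelihood(INITIAL, [P_TOY], [0, 1])
```

In a full analysis, the log of such contributions would be summed over subjects and passed to a numerical optimizer, as described above.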
Estimation of frequency of overdiagnosis
There are several different definitions of overdiagnosis in the literature because of the choice of denominator [17]. In the present study, overdiagnosis due to screen detection of non-progressive cancer was estimated. We used the number of screen-detected cases as the denominator and the expected number of detected non-progressive BCs as the numerator. Here, the expected number was calculated as the number of screen-detected cases at each screening round multiplied by the estimated probability that a detected BC would be non-progressive. The estimated probability was calculated as follows:
Here, k denotes the screening round and t_{k} the corresponding age, \( {\widehat{P}}_{ij}\left({t}_{k-1},{t}_k\right) \) represents the estimated transition probability from state i at time t_{k − 1} to state j at time t_{k}, and \( \widehat{S} \) denotes the estimated test sensitivity. The first part of the denominator represents the probability that a subject stays in state 1 before t_{k − 1}, transits to either state 4 (A_{1}) or state 2 (B_{1}) between t_{k − 1} and t_{k}, and then is detected by screening. The second part of the denominator represents the probability that a subject transits to state 4 (A_{2}) or state 2 (B_{2}) between t_{k − 2} and t_{k − 1} but is not detected at time t_{k − 1} and stays in the same state at time t_{k}. Because the probability of staying in state 4 is one, it is omitted from the equation. The numerator represents the probability that a subject transits from state 1 to state 4 between either t_{k − 1} and t_{k} (A_{1}) or t_{k − 2} and t_{k − 1} (A_{2}).
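Read literally, the description above suggests an expression of the following shape. It is a sketch reconstructed from the prose and should not be taken as the published formula. The grouping into A_{1}, A_{2}, B_{1}, and B_{2} follows the text; the exact factors are assumptions, in particular the use of \( \widehat{P}_{11}(t_0,\cdot) \) for remaining free of BC since the initiation age t_{0}, and the absence of a second sensitivity factor for cases missed at t_{k − 1} (which relies on the assumption that false negatives are detected at the next round).

\[
\widehat{\Pr}\left(\text{non-progressive} \mid \text{screen-detected at } t_k\right)=\frac{A_1+A_2}{\left(A_1+B_1\right)+\left(A_2+B_2\right)}
\]

\[
\begin{aligned}
A_1 &= \widehat{P}_{11}(t_0,t_{k-1})\,\widehat{P}_{14}(t_{k-1},t_k)\,\widehat{S}, &
B_1 &= \widehat{P}_{11}(t_0,t_{k-1})\,\widehat{P}_{12}(t_{k-1},t_k)\,\widehat{S},\\
A_2 &= \widehat{P}_{11}(t_0,t_{k-2})\,\widehat{P}_{14}(t_{k-2},t_{k-1})\,(1-\widehat{S}), &
B_2 &= \widehat{P}_{11}(t_0,t_{k-2})\,\widehat{P}_{12}(t_{k-2},t_{k-1})\,(1-\widehat{S})\,\widehat{P}_{22}(t_{k-1},t_k).
\end{aligned}
\]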
The 95% confidence interval (CI) of the frequency of overdiagnosis was estimated by simulation. We drew parameter values from the multivariate normal distribution (with mean vector equal to the MLEs of the parameters and covariance matrix equal to the estimated covariance matrix) 1000 times and computed the expected number of non-progressive BCs for each draw to reflect the sampling variation of overdiagnosis [18].
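The simulation step can be sketched as follows. This is a minimal illustration under stated assumptions: the two-parameter vector on the log scale, the covariance matrix, the statistic, and the percentile rule are all hypothetical choices made here, and the study's own parametrization may differ.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def simulate_ci(mle, cov, statistic, n_draws=1000, seed=1):
    """Percentile interval of statistic(theta), theta ~ N(mle, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    values = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in mle]
        draw = [m + sum(L[i][k] * z[k] for k in range(i + 1))
                for i, m in enumerate(mle)]
        values.append(statistic(draw))
    values.sort()
    return values[int(0.025 * n_draws)], values[int(0.975 * n_draws) - 1]


# Hypothetical MLEs (log lambda12, log lambda14) and covariance matrix.
MLE = [math.log(0.0028), math.log(0.000005)]
COV = [[0.01, 0.002], [0.002, 0.25]]


def share_nonprogressive(theta):
    """Share of preclinical onsets that are non-progressive."""
    l12, l14 = math.exp(theta[0]), math.exp(theta[1])
    return l14 / (l12 + l14)


lower, upper = simulate_ci(MLE, COV, share_nonprogressive)
```

Drawing from the asymptotic normal distribution of the MLEs is far cheaper than refitting the model on resampled data, which is the trade-off against bootstrapping noted in the limitations.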
Results
During the study period 1989–2014, 2,333,153 invitations were sent and 1,695,872 women participated in the screening program (Table 1). The average participation rate was 72.7%. The yearly recall rate varied between 2.02% and 3.30% (median 2.48%).
The incidence rates of invasive BC in women 50–54, 55–59, 60–64, and 65–69 years old were 260, 279, 350, and 367 per 100,000 women, respectively (Table 2). Almost half (48.6%) of the PSDs were found in women 50–54. The IC ratio was higher in women 50–59 (33.3%) than in women 60 or older (27.3%).
The transition rates from free of BC to progressive PCDP were 276 and 381 per 100,000 women-years for women 50–59 and 60–69, respectively (Table 3). The mean sojourn times (MSTs) at ages 40–49, 50–59, and 60–69 were 2.60 (95% CI 2.31–2.89), 2.16 (95% CI 2.03–2.29), and 3.52 (95% CI 3.31–3.73) years, respectively. The ratio of λ_{14}(t) to λ_{12}(t) was 0.00182 (95% CI 0–0.00523), and the test sensitivity was 88% (95% CI 85.2%–90.9%). The likelihood ratio test showed that the non-homogeneous model fitted the data better than the homogeneous model (chi-squared = 854 with 3 degrees of freedom, P < 0.001), which is visually confirmed by the cumulative observed and expected incidence curves (Fig. 2). The results showed that the homogeneous model overestimated the risk of BC at ages 50–59 and underestimated the risk at ages above 60. The expected cumulative incidence curve in the non-homogeneous model was close to the observed incidence curve, indicating that the model fit was adequate. Table 4 shows the estimation of overdiagnosis from non-progressive detected BC cases. Among the 8305 screen-detected invasive cases, the expected number of non-progressive BC cases detected by screening was 35.9, which corresponds to 0.43% (95% CI 0.10%–2.18%) overdiagnosis. There were 15.6 (0.87%, 95% CI 0.20%–4.31%) and 20.3 (0.31%, 95% CI 0.07%–1.59%) estimated non-progressive BC cases in the prevalent and subsequent rounds of screening, respectively.
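Two of the figures above can be checked with a few lines of arithmetic. The sketch below evaluates the upper-tail probability of a chi-squared variable with 3 degrees of freedom using its closed form, and recomputes the overall overdiagnosis percentage; the function name is hypothetical, and the closed form shown is specific to 3 degrees of freedom.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Likelihood ratio statistic reported for non-homogeneous vs homogeneous model.
p_value = chi2_sf_df3(854.0)

# Expected non-progressive cases among screen-detected invasive cases.
overdiagnosis_percent = 100.0 * 35.9 / 8305
```

The resulting p-value is far below 0.001, and the percentage rounds to 0.43, in line with the reported results.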
Discussion
We used a non-homogeneous multi-state model to estimate the frequency of overdiagnosis from the expected number of non-progressive BCs among the screen-detected cancer cases in the population-based organized screening program with mammography in women 50–69 in Stockholm County, Sweden. We found that only 0.43% of the screen-detected invasive cases were overdiagnosed. The frequency of overdiagnosis in the prevalent round was almost three times as high as that in the subsequent rounds. We showed that the non-homogeneous model fitted the data better than the homogeneous model.
Published estimates of the frequency of overdiagnosis have varied because of type of screening program, study design, choice of control group, estimation method, and adjustment for lead time (the time by which screening advances the diagnosis compared with absence of screening) [2, 19,20,21,22]. Overdiagnosis based on Swedish data has been estimated in four RCTs and two observational studies.
Estimation of overdiagnosis based on the Swedish RCTs
In the Stockholm trial, two screening rounds were performed for 40,318 women 40 to 64 years old, and 20,000 controls were also invited to one screening round at the end of the trial. Two estimates of overdiagnosis were made on the basis of the Stockholm trial. Gotzsche found 49% overdiagnosis by comparing the relative risk of all BCs in the screening period [23]. However, failing to separate out the increase in incidence due to earlier detection of cancer (the so-called lead-time problem) results in overestimation of overdiagnosis. Moss found that the invasive (0.81 versus 0.85 per 1000 person-years in the screened and in the control groups) and all BC cumulative incidence rates (0.88 versus 0.91 per 1000 person-years in the screened and the control groups) were similar between the two arms over 15 years of follow-up [24]. It should be noted that because the control group was also invited to a single screen, which might lead to some overdiagnosis in the control group, overdiagnosis probably was underestimated in this approach. Our findings on the frequency of overdiagnosis in the subsequent screening rounds of the organized screening program were consistent with Moss’s finding in the Stockholm trial that showed no evidence of overdiagnosis as a result of incident screens [25].
Similar results were found in the Two-County trial and the Gothenburg trial; absolute excess cumulative incidences of all BCs of −0.02 and −0.03 per 1000 were found in the screened group in the two trials, respectively [24]. Duffy et al. applied a homogeneous four-state model to quantify the overdiagnosis [4]. The frequencies of overdiagnosis in the prevalent and the two subsequent rounds were 3.1%/4.2% and 0.3%/0.3%, respectively, in the Two-County/Gothenburg trials. These estimates are in line with our estimates and confirm a low level of overdiagnosis. Using 29-year follow-up data from one county of the Two-County trial, Yen et al. further confirmed that screening did not lead to excess incidence of BC in the screened group (risk ratio 1.00), where 100,000 additional screens were performed compared with the control group. No evidence of overdiagnosis for invasive or in situ BC was found [26].
The estimate for overdiagnosis based on the Malmö I trial has been considered reliable because of the stop-screen design (women in the control group were never invited to screening) and an adequate follow-up time [27, 28]. Zackrisson et al. estimated the frequency of overdiagnosis at 10% for all BCs and 7% for invasive BCs in women 55 to 69 at random assignment by comparing the incidence rate between the invited and control groups at 15 years of follow-up, after the end of the trial [28]. The United Kingdom BC panel recalculated the estimated overdiagnosis by comparing the excess numbers with different denominators, such as the number of cancers diagnosed over the whole follow-up period in the control group/invited group or the number of cancers/screen-detected cancers diagnosed in the screening period in the invited group [27]. The estimates varied from 11% to 29%. Although RCTs may provide a good opportunity to quantify overdiagnosis, the generalizability to the current organized screening program remains dubious.
Estimation of overdiagnosis based on the organized screening programs
Zahl et al. used the age-specific incidence of invasive BC during 1971–2000 to quantify the increasing incidence after introduction of mammographic screening in Sweden [29]. They estimated that the frequency of overdiagnosis in women 50–69 years old was 45%; however, lead time was not properly adjusted for and the increase in the incidence over time was not considered. Jonsson et al. also applied the incidence rate approach to quantify overdiagnosis in 11 out of 20 countries after implementing organized screening [30]. The pre-screening incidence (15 years before the start of screening) was used to calculate the expected incidence in the absence of screening during the screening period until year 2000. In the stable phase, overdiagnosis rates were estimated at 54% and 21% for the 50–59 and 60–69 age groups, respectively, after lead-time adjustment. It should be noted that the increased incidence might result from a prevalent screening effect among newcomers, potential changes in risk factors leading to changing trends, and so on; therefore, it should not be attributed entirely to overdiagnosis [30]. However, the data from the organized screening program in Stockholm County were not included in this study and the choice of pre-screening period might have influenced the estimation of overdiagnosis [3]. Therefore, it was difficult to compare with our findings.
Estimation of mean sojourn time
Our estimates of the MSTs for women ages 40–49, 50–59, and 60–69 (2.60, 2.16, and 3.52 years) were, except at ages 40–49, lower than previously reported MSTs in the Two-County trial (2.44, 3.70, and 4.17 years) [31]. There are several reasons for the shorter MSTs in our study. First, the sojourn time in our model represents the sojourn time in the progressive BCs. The non-progressive BCs, which have infinite sojourn time, were separated. Second, it has been shown that there is an association between hormone replacement therapy (HRT), BC risk, and sojourn time [32, 33]. In Sweden, the use of HRT increased starting in 1990 and decreased after 2002, and the majority of HRT use was in the 50–59 age group [34]. HRT use increases the risk of invasive lobular carcinoma, which has a shorter sojourn time than ductal carcinoma [35]. This might explain why we obtained lower estimates of MST in women 50–59 years old. Another explanation could be that Duffy et al. used age at random assignment to classify the population into age groups regardless of how old they were at the end of the study, and the MST was estimated separately for these groups. For example, women 50–59 at random assignment were 57–66 years old at the end of the trial and thus were an average of 3–4 years older during the study period. The longer MST in the 50–59 age group found by Duffy et al. may be partially attributable to the longer MST observed at ages 60–69. In contrast, in our model, a woman might contribute to the likelihood for estimation of MST in different age intervals as she moves through the age groups.
Concerns about the in situ cancers
The incidence of ductal carcinoma in situ has increased significantly since the introduction of the organized screening program [36]. This increase has been considered to be a marker of overdiagnosis [37]. A sensitivity analysis was performed by combining the in situ and invasive BCs in the same state to estimate overdiagnosis (Additional files 1 and 2). There were 51.6 non-progressive detected BC cases found among 9631 in situ and invasive BC cases, corresponding to 0.54% overdiagnosis (Additional file 3). Although slightly higher overdiagnosis was found, the frequency of overdiagnosis was still low. Similar findings were demonstrated by the six-state model in women 40–49, suggesting that the majority of screen-detected in situ cancers would have presented clinically in the absence of screening [7, 38].
Strengths and limitations
Our estimate of overdiagnosis based on a non-homogeneous model and large-scale screening data has several strengths. First, to the best of our knowledge, this is the first study using individual screening histories to quantify the frequency of overdiagnosis in the Stockholm organized screening program. Aggregated data used in most studies cannot reflect the actual exposure of screening. In the Stockholm screening program, individual screening histories were collected from the start, quality-checked, and regularly stored in the register. Date of and status of participation, mammographic results, follow-up assessments, and cancer outcomes were prospectively recorded [9, 39]. Second, the unscreened population (expected number of BCs in the absence of screening) was obtained from natural history modeling that provides the same characteristics (risks) between the unscreened population and screened population. Bias, which results from the choice of control groups, could thus be prevented. Third, age-specific incidence and sojourn time were taken into account in our model. Piecewise constant transition rates were used in the non-homogeneous model and fitted the data better than the traditional homogeneous model with constant rates.
There are some limitations which may have influenced our estimates of overdiagnosis. First, the detection mode of BC cases might be misclassified. For example, women born in 1920–1941 might have been invited to at least one screening round in the Stockholm trial. The SSDs might be misclassified as PSDs. Lidbrink et al. found that, in the organized screening in Stockholm, the tumor size in the screening units performing prevalent screening was similar to that in the unit where the trial was conducted [39]. Mammography performed outside of the organized screening program might also bias our results. The participation rate in the more densely populated counties in Sweden, including Stockholm County, has been lower than in other counties in Sweden, which might be due to higher access to private mammography, in particular during the first years of the program [1, 40]. The cancer cases diagnosed in the private sector might be misclassified as NPs in the organized screening program. The risk of progressing to the CP might be overestimated, leading to an underestimation of overdiagnosis.
Second, overdiagnosis resulting from the detection of progressive cancers in women who died before the cancer became symptomatic was not counted in our study. A possible extension of the model is to consider death as a separate state [7, 41]. In Stockholm, the all-cause mortality rates in women 50–59 and 60–69 from 1989 to 2014 were 3.2 and 7.98 per 1000 women-years, respectively [42]. Thus, during a 2.16-year sojourn time, 6.91 deaths per thousand women might be expected among the progressive screen-detected BC cases at ages 50–59. In other words, a reasonable estimate of overdiagnosis due to death would be approximately 0.69% for women 50–59. The corresponding estimate for women 60–69 would be 2.8% given a 3.52-year sojourn time. The true value will be lower after considering the difference between lead time and sojourn time.
Third, the assumptions made in modeling of the natural history should be considered. We assumed the test sensitivity to be constant over time, age, and type of BC. Owing to lack of data, it was not possible to take into account improvements in screening tools, such as digital mammography. In addition, false-negative cases were assumed to be detected in the next screening round to simplify the likelihood function. This might lead to a slight overestimation of the test sensitivity. Furthermore, our non-homogeneous model requires a certain initiation time where the true state either is known or can be modeled [12]. We restricted the model so that the risk of BC was zero before the age of 40 years. A sensitivity analysis was performed to check this model assumption. The results were similar when the initiation time was assumed to be 35 or 45 years (not shown). The average incidence rate from the Stockholm cancer register for the years 1989–2004 was used to approximate the transition rate from free of BC to progressive PCDP in the 40–49 age group. The underlying preclinical incidence rate might be higher than the clinical incidence rate since the incidence increases with age. This might affect other estimates of parameters, especially the MST in women 40–49, if the rate does not represent the background incidence.
Fourth, a more robust CI of the estimate of overdiagnosis could be calculated through the bootstrapping method. However, since the individual screening history for over 400,000 women was used for estimation, the estimation procedure was very time-consuming.
Another important issue is that the likelihood function might be flat and lead to an identifiability problem. Even if our model fitted the data well, we cannot exclude a misspecification of the model. The genuine progression of BC could have been oversimplified in our four-state model. Owing to insufficient or incomplete (censored) data, it might be difficult to get the correct estimates. To successfully estimate other parameters and to further quantify the overdiagnosis, our model relies on the information from clinically detected cancers, including ICs and NPs (which were the most informative cases in the dataset because the exact transition time to CP is known). Earlier, we showed that a certain proportion (5–10%) of an unscreened group, like never-attenders, can stabilize the model [8]. Therefore, the inspection of the likelihood function seems to be necessary. We have checked the Karush–Kuhn–Tucker conditions to make sure the optimization algorithm indeed converged.
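The back-of-the-envelope calculation in this paragraph multiplies an all-cause mortality rate by a mean sojourn time. A minimal sketch follows; the function name is hypothetical, and the linear rate-times-time approximation is the one implied by the text rather than an exact survival calculation.

```python
def deaths_during_sojourn(mortality_per_1000_wy, sojourn_years):
    """Expected deaths per 1000 women over the sojourn time (linear approximation)."""
    return mortality_per_1000_wy * sojourn_years


deaths_50_59 = deaths_during_sojourn(3.2, 2.16)    # per 1000 women aged 50-59
deaths_60_69 = deaths_during_sojourn(7.98, 3.52)   # per 1000 women aged 60-69

# Convert "per 1000" to percent.
percent_50_59 = deaths_50_59 / 10.0
percent_60_69 = deaths_60_69 / 10.0
```

These reproduce the approximate 0.69% and 2.8% figures quoted in the text.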
Our findings provide the following insights for future research. First, it would be valuable to assess period effects on the BC risk, sojourn time, and sensitivity in order to investigate how exposure to HRT and the introduction of digital mammography have influenced the frequency of overdiagnosis. Second, the overdiagnosis estimates should be compared with those from other evaluation methods, like the cumulative incidence method, in the same dataset to confirm the findings and provide more solid evidence for policy makers [21].
The frequency of overdiagnosis in an organized screening program depends not only on the latent proportion of nonprogressive BC cases but also on the screening program itself: a higher participation rate or improved sensitivity due to better screening instruments will lead to a higher frequency of overdiagnosis. The balance between the benefit and harm of screening should be considered, and thus regular monitoring of the mortality reduction and overdiagnosis will be necessary [43].
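As a worked check of how the reported frequency is defined, the overall figure given in the Results of the abstract follows from dividing the expected number of detected nonprogressive cases by the number of screen-detected cases. The symbols below are shorthand introduced here for illustration, not notation from the model specification:

```latex
\[
\text{frequency of overdiagnosis}
  = \frac{E\!\left[N_{\text{nonprogressive, detected}}\right]}{N_{\text{screen-detected}}}
  = \frac{35.9}{8305}
  \approx 0.0043 = 0.43\%
\]
```

Because the numerator counts only nonprogressive cases that are actually detected, it grows with both participation and test sensitivity, which is the dependence on the screening program noted above.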
Conclusions
Our findings suggest that overdiagnosis in the population-based organized biennial mammographic screening for women 50–69 in Stockholm County, Sweden, is a minor phenomenon. The frequency of overdiagnosis in the prevalent screening round was higher than that in subsequent rounds but still low. The nonhomogeneous model fitted the data better than the simpler, traditional homogeneous model.
Abbreviations
BC: Breast cancer
CI: Confidence interval
CP: Clinical phase
HRT: Hormone replacement therapy
IC: Interval cancer
MLE: Maximum likelihood estimate
MST: Mean sojourn time
NP: Nonparticipant
PCDP: Preclinical screen-detectable phase
PSD: Prevalent screen-detected case
RCT: Randomized controlled trial
SSD: Subsequent screen-detected case
References
 1.
Swedish Organised Service Screening Evaluation Group. Reduction in breast cancer mortality from organized service screening with mammography: 1. Further confirmation with extended data. Cancer Epidemiol Biomark Prev. 2006;15:45–51.
 2.
Etzioni R, Gulati R, Mallinger L, Mandelblatt J. Influence of study features and methods on overdiagnosis estimates in breast and prostate cancer screening. Ann Intern Med. 2013;158:831–8.
 3.
Ripping TM, Verbeek AL, Ten Haaf K, van Ravesteyn NT, Broeders MJ. Extrapolation of pre-screening trends: Impact of assumptions on overdiagnosis estimates by mammographic screening. Cancer Epidemiol. 2016;42:147–53.
 4.
Duffy SW, Agbaje O, Tabar L, Vitak B, Bjurstam N, Bjorneld L, et al. Overdiagnosis and overtreatment of breast cancer: estimates of overdiagnosis from two trials of mammographic screening for breast cancer. Breast Cancer Res. 2005;7:258–65.
 5.
Olsen AH, Agbaje OF, Myles JP, Lynge E, Duffy SW. Overdiagnosis, sojourn time, and sensitivity in the Copenhagen mammography screening program. Breast J. 2006;12:338–42.
 6.
Yen MF, Tabar L, Vitak B, Smith RA, Chen HH, Duffy SW. Quantifying the potential problem of overdiagnosis of ductal carcinoma in situ in breast cancer screening. Eur J Cancer. 2003;39:1746–54.
 7.
Gunsoy NB, Garcia-Closas M, Moss SM. Modelling the overdiagnosis of breast cancer due to mammography screening in women aged 40 to 49 in the United Kingdom. Breast Cancer Res. 2012;14:R152.
 8.
Wu WYY, Nyström L, Jonsson H. Estimation of overdiagnosis in breast cancer screening using a nonhomogeneous multistate model: A simulation study. J Med Screen. 2018;25:183–90.
 9.
Lind H, Svane G, Kemetli L, Tornberg S. Breast Cancer Screening Program in Stockholm County, Sweden - Aspects of Organization and Quality Assurance. Breast Care (Basel). 2010;5:353–7.
 10.
Perry N, Broeders M, de Wolf C, Tornberg S, Holland R, von Karsa L. European guidelines for quality assurance in breast cancer screening and diagnosis. Fourth edition - summary document. Ann Oncol. 2008;19:614–22.
 11.
Cox DR, Miller HD. The Theory of Stochastic Processes. London: Chapman & Hall; 1965. p. 146–202.
 12.
Jackson CH, Sharples LD, Thompson SG, Duffy SW, Couto E. Multistate Markov models for disease progression with classification error. J Roy Stat Soc D-Sta. 2003;52:193–209.
 13.
The National Board of Health and Welfare in Sweden [Available from: http://www.socialstyrelsen.se/statistik/statistikdatabas/cancer]. Accessed 19 Jan 2015.
 14.
Nash JC, Varadhan R. Unifying optimization algorithms to aid software system users: optimx for R. J Stat Softw. 2011;43:1–14. https://www.jstatsoft.org/article/view/v043i09.
 15.
Nash JC. Nonlinear Parameter Optimization Using R Tools. 1st ed. Somerset: Wiley; 2002.
 16.
Titman AC, Sharples LD. Model diagnostics for multistate models. Stat Methods Med Res. 2010;19:621–51.
 17.
de Gelder R, Heijnsdijk EA, van Ravesteyn NT, Fracheboud J, Draisma G, de Koning HJ. Interpreting overdiagnosis estimates in population-based mammography screening. Epidemiol Rev. 2011;33:111–21.
 18.
Aalen OO, Farewell VT, De Angelis D, Day NE, Gill ON. A Markov model for HIV disease progression including the effect of HIV diagnosis and treatment: application to AIDS prediction in England and Wales. Stat Med. 1997;16:2191–210.
 19.
Biesheuvel C, Barratt A, Howard K, Houssami N, Irwig L. Effects of study methods and biases on estimates of invasive breast cancer overdetection with mammography screening: a systematic review. Lancet Oncol. 2007;8:1129–38.
 20.
Puliti D, Duffy SW, Miccinesi G, de Koning H, Lynge E, Zappa M, et al. Overdiagnosis in mammographic screening for breast cancer in Europe: a literature review. J Med Screen. 2012;19(Suppl 1):42–56.
 21.
Ripping TM, ten Haaf K, Verbeek AL, van Ravesteyn NT, Broeders MJ. Quantifying overdiagnosis in cancer screening: A systematic review to evaluate the methodology. J Natl Cancer Inst. 2017;109:djx060. https://academic.oup.com/jnci/article/109/10/djx060/3845953.
 22.
Carter JL, Coletti RJ, Harris RP. Quantifying and monitoring overdiagnosis in cancer screening: a systematic review of methods. BMJ. 2015;350:g7773.
 23.
Gotzsche PC. On the benefits and harms of screening for breast cancer. Int J Epidemiol. 2004;33:56–64; discussion 69–73.
 24.
Moss S. Overdiagnosis and overtreatment of breast cancer: overdiagnosis in randomised controlled trials of breast cancer screening. Breast Cancer Res. 2005;7:230–4.
 25.
Moss S, Waller M, Anderson TJ, Cuckle H. Trial Management Group. Randomised controlled trial of mammographic screening in women from age 40: predicted mortality based on surrogate outcome measures. Br J Cancer. 2005;92:955–60.
 26.
Yen AM, Duffy SW, Chen TH, Chen LS, Chiu SY, Fann JC, et al. Long-term incidence of breast cancer by trial arm in one county of the Swedish Two-County Trial of mammographic screening. Cancer. 2012;118:5728–32.
 27.
Independent UK Panel on Breast Cancer Screening. The benefits and harms of breast cancer screening: an independent review. Lancet. 2012;380:1778–86.
 28.
Zackrisson S, Andersson I, Janzon L, Manjer J, Garne JP. Rate of overdiagnosis of breast cancer 15 years after end of Malmo mammographic screening trial: follow-up study. BMJ. 2006;332:689–92.
 29.
Zahl PH, Strand BH, Maehlen J. Incidence of breast cancer in Norway and Sweden during introduction of nationwide screening: prospective cohort study. BMJ. 2004;328:921–4.
 30.
Duffy SW, Lynge E, Jonsson H, Ayyaz S, Olsen AH. Complexities in the estimation of overdiagnosis in breast cancer screening. Br J Cancer. 2008;99:1176–8.
 31.
Duffy SW, Day NE, Tabar L, Chen HH, Smith TC. Markov models of breast tumor progression: some age-specific results. J Natl Cancer Inst Monogr. 1997;1997:93–7.
 32.
Beral V. Million Women Study Collaborators. Breast cancer and hormone-replacement therapy in the Million Women Study. Lancet. 2003;362:419–27.
 33.
Weedon-Fekjaer H, Vatten LJ, Aalen OO, Lindqvist B, Tretli S. Estimating mean sojourn time and screening test sensitivity in breast cancer mammography screening: new results. J Med Screen. 2005;12:172–8.
 34.
Suhrke P, Maehlen J, Zahl PH. Hormone therapy use and breast cancer incidence by histological subtypes in Sweden and Norway. Breast J. 2012;18:549–56.
 35.
Tabar L, Duffy SW, Vitak B, Chen HH, Prevost TC. The natural history of breast carcinoma: what have we learned from screening? Cancer. 1999;86:449–62.
 36.
Virnig BA, Tuttle TM, Shamliyan T, Kane RL. Ductal carcinoma in situ of the breast: a systematic review of incidence, treatment, and outcomes. J Natl Cancer Inst. 2010;102:170–8.
 37.
Jorgensen KJ, Gotzsche PC, Kalager M, Zahl PH. Breast Cancer Screening in Denmark: A Cohort Study of Tumor Size and Overdiagnosis. Ann Intern Med. 2017;166:313–23.
 38.
Evans AJ, Pinder SE, Ellis IO, Wilson AR. Screen detected ductal carcinoma in situ (DCIS): overdiagnosis or an obligate precursor of invasive disease? J Med Screen. 2001;8:149–51.
 39.
Lidbrink EK, Tornberg SA, Azavedo EM, Frisell JO, Hjalmar ML, Leifland KS, et al. The general mammography screening program in Stockholm. Organisation and first-round results. Acta Oncol. 1994;33:353–8.
 40.
Olsson S, Andersson I, Karlberg I, Bjurstam N, Frodis E, Hakansson S. Implementation of service screening with mammography in Sweden: from pilot study to nationwide programme. J Med Screen. 2000;7:14–8.
 41.
Taghipour S, Caudrelier LN, Miller AB, Harvey B. Using Simulation to Model and Validate Invasive Breast Cancer Progression in Women in the Study and Control Groups of the Canadian National Breast Screening Studies I and II. Med Decis Mak. 2017;37:212–23.
 42.
Statistics Sweden [Available from: http://www.statistikdatabasen.scb.se]. Accessed 24 Oct 2018.
 43.
Paci E, EUROSCREEN Working Group. Summary of the evidence of breast cancer service screening outcomes in Europe and first estimate of the benefit and harm balance sheet. J Med Screen. 2012;19(Suppl 1):5–13.
Acknowledgments
The simulations were performed using resources provided by the Swedish National Infrastructure for Computing (SNIC) at the High Performance Computing Center North (HPC2N). The authors would like to thank Anna Stoltenberg for support on data collection.
Funding
This work was supported by the Swedish Research Council, the Nordic Cancer Union, Lion’s Cancer Research Foundation in Umeå University, the Cancer Research Foundation in Northern Sweden, and ALF grants in Västerbotten.
Availability of data and materials
Not applicable.
Author information
Contributions
WYW, LN, and HJ substantially contributed to the study concept. WYW developed the model, performed the analysis, and wrote the first draft. ST and KME contributed to the data collection. XL developed the algorithm to accelerate the estimation. HJ supervised and directed the project. All authors contributed to data interpretation and revised the manuscript critically and have read and approved the final version for submission.
Corresponding author
Correspondence to Wendy Yi-Ying Wu.
Ethics declarations
Ethics approval and consent to participate
The present study was approved by the ethics committee in Umeå, Sweden (Dnr 2015/57–31 and Dnr 2017–16632 M).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Table S1. Number of person-years, in situ and invasive breast cancer cases by detection mode, interval cancer ratio, and breast cancer incidence. (DOCX 20 kb)
Additional file 2:
Table S2. Maximum-likelihood estimates (MLEs) and 95% confidence intervals (CIs) based on the nonhomogeneous and the homogeneous multistate models of in situ and invasive breast cancer. (DOCX 29 kb)
Additional file 3:
Table S3. Number of screen-detected cases, expected number of detected nonprogressive in situ combined with invasive breast cancers (nonprogressive breast cancer, or NPBC), and the frequency of overdiagnosis (percentage) by round of screening. (DOCX 13 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Wu, W.Y., Törnberg, S., Elfström, K.M. et al. Overdiagnosis in the population-based organized breast cancer screening program estimated by a nonhomogeneous multistate model: a cohort study using individual data with long-term follow-up. Breast Cancer Res 20, 153 (2018). https://doi.org/10.1186/s13058-018-1082-z
Keywords
 Overdiagnosis
 Breast cancer
 Organized screening program
 Mammography
 Multistate model