Overdiagnosis in breast cancer screening: the importance of length of observation period and lead time

Background Overdiagnosis in breast cancer screening is a controversial topic. One difficulty in estimating overdiagnosis is separating it from lead time, that is, the advance in the time of diagnosis of cancers, which confers an artificial increase in incidence when a screening programme is introduced. Methods We postulated a female population aged 50-79 with an age structure and age-specific breast cancer incidence similar to those in England and Wales before the screening programme. We then imposed a two-yearly screening programme for women aged 50-69, running for twenty years, with exponentially distributed lead time averaging 40 months in screen-detected cancers. We imposed no effect of the screening on incidence other than lead time. Results Comparison of age- and time-specific incidence between the screened and unscreened populations showed a major effect of lead time, which could only be adjusted for by follow-up for more than two decades and including ten years after the last screen. From lead time alone, twenty-year observation at ages 50-69 would confer an observed excess incidence of 37%. The excess would only fall below 10% with 25 years or more follow-up. For the excess to be nullified, we would require 30-year follow-up including observation up to 10 years above the upper age limit for screening. Conclusion Studies using shorter observation periods will overestimate overdiagnosis by including cancers diagnosed early due to lead time among the nominally overdiagnosed tumours.


Introduction
The issue of overdiagnosis in breast cancer screening is a topic of much interest and controversy [1][2][3]. Overdiagnosis is usually defined as the diagnosis, as a result of screening, of cancer that would never have been diagnosed in the woman's lifetime in the absence of screening [1]. In theory, overdiagnosis can be estimated by comparison of incidence in a randomized trial of screening, but this would require that the control group was never screened and that both intervention and control groups were followed up to expiry or for more than two decades. In practice, none of the trials satisfy both of these criteria [4]. Consequently, overdiagnosis rates are usually estimated on the basis of changes in breast cancer incidence following the introduction of screening services in a population setting [1][2][3][5,6]. There is no uniform method of estimation, and, consequently, estimates vary considerably, from less than 5% to around 50% [1].
One of the first observable effects of breast screening is the capability to diagnose cancer before it would have occurred symptomatically, since screening works by diagnosing breast cancer at an earlier and more treatable stage [7]. The temporal advance in the time of diagnosis is known as the lead time. A major complication of estimating overdiagnosis is taking account of lead time to disentangle overdiagnosed from early-diagnosed cases [1]. Overdiagnosis can be thought of as cases whose lead time exceeds their remaining years of life. Cases whose lead time does not exceed their future lifetime will cause an increase in incidence to be observed but should not be included in the estimation of overdiagnosis. There is disagreement about the importance of lead time [8,9], but there is no doubt that the phenomenon exists, since incidence of breast cancer after a screen is considerably lower than in the absence of screening, indicating that many tumors have their period of diagnosis advanced [10].
In this article, we generate a set of breast cancer incidence figures, by individual year of age and individual calendar year, in the absence of screening. These are based approximately on the population of age 50 to 79 and the age-specific incidence of breast cancer in the late 1980s in the UK, just before the screening program started. We then generate the effect on occurrence of breast cancer of a 2-yearly screening program from age 50 to 69, assuming only lead time and no overdiagnosis. We use the difference between the two to evaluate the reliability of different strategies for adjusting for lead time in estimation of overdiagnosis.

Materials and methods
We postulated a population of age 50 to 79 and of size and structure similar to those prevailing in the female population in England and Wales in the late 1980s [11,12]. We further postulated that invasive breast cancer incidence rates in the absence of screening would increase from 158.4 per 100,000 person-years at age 50 to 272.7 per 100,000 at age 79, again corresponding roughly to those prevailing in England and Wales in the years immediately prior to the screening program [11,12]. The population, incidence rates, and numbers of cases are shown by individual year of age in Table 1. For simplicity, we assumed that in the absence of screening, the population age structure and age-specific breast cancer incidence rates remain stable for the following 30 years. Thus, the same number of cases in each individual year of age occurred for the following 30 years.
We then assumed the introduction of a screening program with 2-yearly screening applied to all women currently 50 to 69 years old, at years 1, 3, 5, and so on up to year 19 (that is, 10 rounds of screening in all). We assumed that the only effect of screening on incidence of breast cancer was one of lead time. We postulated that in screening a population age x in year y we would find 86% of the cancers which would have occurred in age x+1, year y+1, 64% of tumors which would have occurred in age x+2, year y+2, and so on up to 7% of the tumors from age x+10, y+11, as shown in Table 2. These figures correspond to an average lead time of around 40 months, which is consistent with the estimates for this age group [13][14][15][16], most of which also found screening sensitivities of close to 100%. For mathematical details, see the Appendix.
We then compared the numbers of cancers between the screening and no-screening scenarios, for different age groups and periods of observation, to estimate the excess incidence that would be observed as a result of lead time alone and to evaluate the likely efficacy of various methods of taking account of lead time. We also calculated the resulting estimates for different strategies used in the past to take lead time into account. These include considering incidence up to 5 and 10 years after the upper age limit for screening and comparing breast cancer incidence in the screened population with the corresponding incidence of subjects in the unscreened population who are 2 to 5 years older [5,6]. This was solely a modeling study that involved no experimental work on human or animal subjects. No ethical approval or consent was required.
Table 3 shows the numbers of cancers diagnosed over 30 years, by calendar year and year of age, under the scenario of 2-yearly screening of women 50 to 69 years old for 10 rounds (the first 20 years) as described above. The corresponding numbers of cancers in the absence of screening would simply be the column of cancers in Table 1, 401, 410, and so on, repeated 30 times. The lead-time effect can be seen for age 50, year 1, for example, as 401 + 0.86 × 410 + 0.64 × 420 + ··· + 0.07 × 561 = 1,812.
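The percentages quoted above can be checked against an exponential lead-time distribution with a mean of 40 months. The following is a minimal sketch, not the calculation in the Appendix: it assumes that cancers due in the k-th year after a screen arise, on average, k − 0.5 years after it, and the function name is illustrative only.

```python
import math

MEAN_LEAD_TIME_YEARS = 40 / 12  # average lead time of 40 months, from the text


def fraction_advanced(k, mean=MEAN_LEAD_TIME_YEARS):
    """Probability that an exponential lead time exceeds k - 0.5 years.

    Assumption (not from the source): cancers due in the k-th year after
    the screen are treated as arising at the midpoint of that year.
    """
    return math.exp(-(k - 0.5) / mean)


# Years 1, 2 and 4 after the screen
for k in (1, 2, 4):
    print(k, round(fraction_advanced(k), 2))
```

Under this midpoint assumption the sketch gives 0.86 and 0.64 for the first two years, matching the 86% and 64% quoted above, and 0.35 for cancers due 3 to 4 years after a screen. It is not guaranteed to match every entry of Table 2.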

Results
Thus, the effect of lead time is to shift cancers along the diagonal of the table, upward and leftward. This reflects the fact that earlier detection brings forward the diagnosis of the tumor both in calendar period and age. For example, without screening in year 1, there would be 4,540 cancers diagnosed at ages 50 to 59; with screening, there would be 20,756. Conversely, in year 10 at ages 70 to 79, there would be 5,452 cancers diagnosed without screening but only 3,268 with screening. This shift in age and time of diagnosis also means that if incidence is increasing with age, with time, or with both, then the observed excess due to lead time will be larger.
Figure 1 shows the cumulative numbers of breast cancer cases observed in the cohort of age 50 at year 1, over 30 years, with and without screening. The incidences separate dramatically in the first 10 years, and the disparity remains roughly constant over the following 10 years; then the two cumulative graphs come together at 30 years. This demonstrates that the only phenomenon generated by the screening in this scenario is lead time. The screening diagnoses many cancers earlier than they would have been diagnosed in the absence of screening, but after the screening stops, there is a deficit of cases in the screened group and the unscreened group 'catches up'.
Some researchers attempt to control for lead time by comparing incidence up to 10 years above the upper age limit for screening [5]. The rationale for this approach is that an excess of cases observed at screening ages because of lead time will be balanced by a corresponding deficit in cases above the screening age range. Table 4 shows the cumulative numbers of cancers with and without screening, by years of follow-up and upper age limit. Clearly, when the age range for screening alone is observed, lead time confers a substantial increase in observed cases, which remains substantial 10 years after the program has stopped.
From lead time alone, 20-year observation at ages 50 to 69 would confer an observed excess incidence of 37% in the screened population. The excess would only fall below 10% with a follow-up of 25 years or more. For the excess to be nullified, or almost nullified, we would require a 30-year follow-up, including observation up to 10 years above the upper age limit for screening. This is because lead time shifts the diagnosis not only in terms of age of the subject but also in terms of the calendar year of diagnosis.
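The way the screened and unscreened cumulative counts for a single cohort diverge and then reconverge can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of Tables 3 and 4: the annual case counts are hypothetical (they are not the Table 1 values), sensitivity is taken as 100%, no diagnosis is advanced by more than 10 years, and the rule for allocating cases across successive screens is our own reading of an exponential lead-time model.

```python
import math

MEAN_LEAD = 40 / 12                    # mean lead time in years (40 months, from the text)
MAX_ADVANCE = 10                       # assumption: diagnoses advanced by at most 10 years
AGES = list(range(50, 80))             # cohort followed from age 50 to age 79
SCREEN_AGES = list(range(50, 70, 2))   # 10 two-yearly screens at ages 50, 52, ..., 68


def fraction_advanced(k):
    """Share of cancers due k years after a screen whose lead time reaches back to it."""
    if k > MAX_ADVANCE:
        return 0.0
    return math.exp(-(k - 0.5) / MEAN_LEAD)


# Hypothetical annual case counts rising with age (NOT the Table 1 values).
due = {age: 400 + 10 * (age - 50) for age in AGES}

# Move each year's cases to the earliest screen that would have detected them.
screened = {age: 0.0 for age in AGES}
for age, cases in due.items():
    already = 0.0  # share of these cases picked up by earlier screens
    for s in SCREEN_AGES:
        if s >= age:
            break
        reach = fraction_advanced(age - s)
        screened[s] += cases * (reach - already)
        already = reach
    screened[age] += cases * (1.0 - already)  # remainder diagnosed when due


def cumulative(counts):
    """Running total of cases by age, in age order."""
    total, out = 0.0, []
    for age in AGES:
        total += counts[age]
        out.append(total)
    return out


cum_unscreened = cumulative(due)
cum_screened = cumulative(screened)
excess = [s - u for s, u in zip(cum_screened, cum_unscreened)]
# Excess cumulative cases after 10, 20 and 30 years of follow-up
print(round(excess[9]), round(excess[19]), round(excess[29]))
```

In this sketch the total number of cancers is identical in the two scenarios; only their timing differs, so the excess is positive while screening is running and returns to zero once every advanced cancer has passed its original due date.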
Another strategy sometimes employed to account for lead time is to compare the incidence in the screened population with the corresponding incidence of subjects in the unscreened population who are 2 to 5 years older [5,6]. Table 5 shows the incidence per 100,000 person-years in 5-year age groups and time periods in the screened and unscreened populations. Note that because alternate 5-year periods contain two screens and three screens, the overall incidence in the screened population fluctuates from period to period. Note also that within 5-year periods there is no monotonic increase in incidence with age, because some of the older age groups have cases shifted to younger groups because of lead time. If we compare the cumulative incidence over 10 years between the age group of 50 to 69 in the screened population and the age group of 55 to 74 in the unscreened population, using Table 5 data, we would observe 3,000 cases per 100,000 women in the screened population and 2,190 in the unscreened population, an excess of 37%. At 20 years, we would observe 5,415 cases per 100,000 women in the screened population and 4,380 in the unscreened population, an excess of 24%. At 30 years, the cumulative incidence figures would be 6,865 and 6,570, an excess of 4%.
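The excess percentages in the preceding paragraph follow directly from the quoted cumulative incidence figures. A short sketch of the arithmetic, with an illustrative function name:

```python
def percent_excess(screened, unscreened):
    """Excess cumulative incidence in the screened group, as a percentage of the unscreened."""
    return 100 * (screened - unscreened) / unscreened


# Cumulative cases per 100,000 women quoted in the text,
# keyed by years of follow-up: (screened 50-69, unscreened 55-74)
comparisons = {10: (3000, 2190), 20: (5415, 4380), 30: (6865, 6570)}

for years, (screened, unscreened) in comparisons.items():
    print(years, round(percent_excess(screened, unscreened)))
```

Rounded to whole percentages, this gives 37, 24, and 4 at 10, 20, and 30 years.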
A more exact 'aging' adjustment for lead time might be to compare the cumulative incidence of a screened cohort with that of an unscreened cohort that is 5 years older. Figure 2 shows the cumulative incidence per 100,000 women up to year 25 (not 30, since we do not have data for ages 80 and above) for ages 50 to 54 at the start of the follow-up period in the screened population and 55 to 59 in the unscreened population. The incidence curves come together only at 25 years.
Figure 1. Expected cumulative incidence over ages 50 to 79 in a cohort of women of age 50 at the start, with and without 2-yearly screening from age 50 to 69.

Discussion
We posited a population of age 50 to 79 with age distribution and breast cancer incidence similar to those that prevailed in the UK before the NHS Breast Screening Programme was initiated. We assumed that age-specific incidence in an unscreened population would remain unchanged over 30 years. We then imposed a screening program with 2-yearly screening for 20 years from age 50 to 69 (as is common in Europe). We calculated the incidence in the screened population, assuming that the only influence of screening on incidence was lead time, using lead-time parameters similar to those observed in a large breast screening trial [13,14]. Thus, we calculated the expected effects on incidence from lead time alone, with no influence of overdiagnosis or changes in underlying risk.
The most important finding was that lead time can be expected to add substantially to observed incidence in a screened population. Although the increase is greatest in the early years of the program, there is still a substantial excess at long-term follow-up. From Table 4, in the screened age group, 50 to 69, a 37% excess cumulative incidence was apparent at 20 years, and a 15% excess at 30 years, 10 years after the program stopped. Thus, the first conclusions from this work must be that lead time contributes a considerable observed increase in incidence in screening age groups and that for an estimate of overdiagnosis to be reliable, it must correct for lead time.
Secondly, the results give us some qualifications to the use of common methods of correction for lead time and suggest that correction for lead time in our estimates and those of others may be inadequate [2,3,5,6]. The method of considering incidence up to an age in excess of the upper age limit for screening can be reliable but only if more than two decades of follow-up are available. From Table 4, with incidence to age 74, the excess from lead time alone at 20 years was 18%. With incidence to age 79, the excess at 20 years was 14%. It is only at 25 to 30 years of follow-up that the excess incidence decreased to a figure close to zero. It should be noted that our estimates pertain to a population that is 100% screened. In the screening trials, results are usually given for the group randomly assigned to the offer of screening, which will contain some non-attenders. Both lead-time effects and overdiagnosis will be smaller in an invited group, and the attenuation depends on the proportion of non-attenders.
At the end of 10 years, from lead time alone there was an excess cumulative incidence at ages 50 to 79 of 25% in the screened population. Kalager and colleagues [5], who used this method of controlling for lead time, observed an 18% excess incidence in the Norwegian breast screening program at approximately 10 years. With 77% participation in the Norwegian program [5], one would expect 19% (0.77 × 25%) at 10 years. Thus, all of the excess incidence in that study might be explained by lead time. The excess incidence in the screened ages at a follow-up of only 10 years amounts to 52,020 cases, whereas the deficit above the screened ages is only 12,962 cases (Table 4). In the age group of 70 to 79 in the first 10 years, 70% of the cancers diagnosed are in women who have never been screened (Table 3). Our study [3] had a 15-year follow-up, but even at 15 years, our results here indicate that there would still be residual lead-time effects when this type of correction is used.
We assumed that the incidence rates would have remained stable over time in the absence of screening, for simplicity of calculation. If we had assumed that underlying incidence rates were increasing, as indeed they were almost everywhere in the world in the late 20th century, the effect of lead time and the potential to overestimate overdiagnosis would have been even greater.
Our lead-time model is based on the exponential sojourn time model of Walter and Day [15], whose article gives a full discussion of the mathematical details and modeling assumptions involved in estimating cancer screening parameters. Walter and Day gave a range of estimates of the mean preclinical screen detectable period from 1.7 to 3.1 years; however, more recent estimates have been close to our estimate of 40 months [13][14][15][16].
The recent UK review of breast cancer screening estimated overdiagnosis of 10% to 20% depending on the denominator used, from the difference between study and control group incidence in three of the randomized screening trials [16]. For two of the trials, the incidence extended only 6 years beyond the end of screening in the study group, so the overdiagnosis estimates in the review are also likely to be overestimates.
Similarly, the attempt by Kalager and colleagues [5] to control for lead time by comparing the screened population with subjects in the unscreened population who are 2 to 5 years older is inadequate for adjusting for lead time. Our results show that at 10 years, even with the 5-year adjustment, lead time would confer a 37% increase in incidence. Correcting for the participation rate of 77%, we would expect a 28% increase in incidence, and the increase observed by Kalager and colleagues [5] was actually smaller than this (18%).
Morrell and colleagues [6] applied a similar 5-year aging correction for lead time in the New South Wales breast screening program and, at about 10 years after the program started and with 60% participation, observed a 30% excess incidence after adjustment. From lead time alone, we would expect a 22% increase (0.6 × 37%), suggesting that overdiagnosis accounts for an 8% excess and that the remainder of the 30% is a residual lead-time effect. It is also worth noting that the estimate by Morrell and colleagues [6] is not based on cumulative incidence but only on the three years from 1999 to 2001 when the program was mature. More importantly, it is worth noting that the estimate of more than 30% overdiagnosis in the UK program by Jørgensen and Gøtzsche [2] has no adjustment for lead time at all and therefore cannot be regarded as a reliable estimate of overdiagnosis.
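The participation adjustments used in these comparisons are simple proportional scalings of the lead-time excess for a fully screened population. A minimal sketch of that arithmetic follows; the function name is illustrative, and the linear scaling assumes that non-attenders contribute no lead-time excess and have the same underlying incidence as attenders.

```python
def expected_excess(full_participation_excess, participation):
    """Lead-time excess (%) expected in an invited population.

    Assumption: the excess for a 100% screened population scales
    linearly with the participation rate.
    """
    return full_participation_excess * participation


# Norwegian program, 77% participation, incidence observed to age 79 (25% excess)
norway_to_79 = expected_excess(25, 0.77)
# Norwegian program, 77% participation, 5-year aging adjustment (37% excess)
norway_aged = expected_excess(37, 0.77)
# New South Wales program, 60% participation, 5-year aging adjustment (37% excess)
nsw_aged = expected_excess(37, 0.60)

# Part of the observed 30% excess in New South Wales not explained by lead time
nsw_residual = 30 - round(nsw_aged)

print(round(norway_to_79), round(norway_aged), round(nsw_aged), nsw_residual)
```

Rounded, these reproduce the 19%, 28%, and 22% expected excesses quoted in the Discussion and the 8% residual attributed to overdiagnosis in the New South Wales comparison.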
Further empirical evidence that overdiagnosis is a smaller problem than generally thought comes from the Swedish Two-County Trial of breast cancer screening. In one county, the cumulative incidence over 29 years was identical between study and control groups [17]. Since the control group was subject to screening 7 years later than the study group, this does not rule out overdiagnosis, but the fact that a population experiencing an additional 100,000 screening episodes had no increase whatsoever in incidence suggests that overdiagnosis is a minor phenomenon.
It should be noted that our estimates are based on incidence of invasive cancer only. We have blocked the time into discrete years and posited simultaneous screening of all subjects rather than a screening round taking several months to cover the population. Also, it could be argued that the average sojourn time (the duration of the preclinical screen detectable period), and therefore lead time, is smaller than used in our estimates. We did, however, use an estimate of just over 3 years, which is commonly observed for ages of 50 or more [13,14,18,19]. Indeed, some colleagues have estimated considerably greater sojourn times, albeit with lesser sensitivity [20]. Also, the general point will still hold that lead time confers an excess incidence even with the common corrections used up to 20 years after the program starts. Improvements to screening technology may change the sojourn time and sensitivity parameters. However, they are likely to increase rather than reduce lead time, so the qualitative implications of our results will remain relevant. It should also be noted that although we posited an average lead time of 40 months, the range of individual values is wide. For example, 35% of cancers that would have occurred 3 to 4 years after a screen have their diagnosis advanced to the screening year.
We estimated effects on incidence of lead time alone. Of course, there will be other influences, including changes in underlying incidence because of trends in risk factor exposure and because of overdiagnosis. However, our results show that estimation of either of the latter will be contaminated by lead time unless there is very long (greater than two decades) observation. The major implication of this is that it is easy to overestimate overdiagnosis, particularly in uncontrolled observation of incidence before and after screening. To avoid such overestimation, it is necessary to estimate lead time explicitly or to have a very long period of observation. To conclude that large numbers of breast cancers would never progress in the absence of treatment is a major biological leap of faith, and every effort should be made to rule out alternative explanations of increases in breast cancer incidence.

Conclusions
The results here indicate that to estimate overdiagnosis using either the compensatory drop above the age range for screening or an age inflation, very-long-term follow-up is required to avoid bias from residual lead-time effects. Previous estimates of overdiagnosis are likely to be overestimates.