The design features of the two trials have been described in detail elsewhere [1, 8]. Briefly, in the Swedish Two-county Trial, 77,080 women aged 40 to 74 years were randomised to regular invitation to screening, and 55,985 to no invitation. Screening was by single-view mammography, with an interscreening interval of 2 years in women aged 40 to 49 years and 33 months in women aged 50 to 74 years at randomisation. The trial began in late 1977. Around 7 years later, after approximately 3 rounds of screening in the older group and 4 rounds in the younger, a mortality reduction of 30% was observed and published [9]. The control group was then invited to screening and the screening phase of the trial was closed. Follow-up was continued for mortality from the tumours diagnosed during the screening phase [1].

In the Gothenburg Trial, 21,650 women aged 39 to 59 years were randomised to invitation to screening and 29,961 to no invitation [8]. Screening was by two-view mammography at the first screen, with the number of views thereafter dependent on breast density. Screening took place at 18-month intervals. The trial began in 1982. After five rounds of screening in the 1933 to 1944 birth cohorts (approximately the 39 to 49 year age group at randomisation), the corresponding control group members were offered screening and the screening phase of the trial was closed. In the 1923 to 1932 birth cohorts (the 50 to 59 year age group), the control group was invited to screening after four rounds. As in the Swedish Two-county Trial, follow-up has continued for mortality from the tumours diagnosed during the screening phase of the trial.

In both trials, the control group was offered screening at the close of the screening phase, so we cannot estimate overdiagnosis by a simple comparison of long-term incidence rates in the study and control groups. We can, however, study the size and timing of excess incidence during the screening phase to obtain clues as to when overdiagnosis may occur. Accordingly, our first analysis was to estimate cumulative incidence rates of invasive, *in situ* and total cancers in the study and control groups of each trial. It has already been noted that in both trials incidence equalised between study and control groups with the first screen of the control group, suggesting that if there is overdiagnosis, it occurs mainly at the first screen [2, 8].

In the Gothenburg Trial, each individual year-of-birth cohort (from 1923 to 1944) was randomised in succession, with a study to control ratio chosen on the basis of the capacity of the mammography facilities to screen the study group [8]. The variation of the randomisation ratio by year of birth induced an age imbalance (albeit a very small one) between study and control groups. To take account of this, the Gothenburg study group incidence is compared not with the raw control group incidence but with the standardised incidence that would have been observed in the control group if it had had exactly the same year-of-birth distribution as the study group [8].
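The standardisation described above can be illustrated with a minimal sketch, assuming it is a direct standardisation in which each birth-year-specific control incidence is weighted by the study group's share of that birth year. The function name and all numbers below are hypothetical and are not trial data.

```python
def standardised_rate(control_rates, study_counts):
    """Directly standardise control-group rates to the study group's
    year-of-birth distribution.

    control_rates: {birth_year: incidence rate in the control group}
    study_counts:  {birth_year: number of women in the study group}
    """
    total = sum(study_counts.values())
    # Weight each birth-year-specific control rate by the proportion of
    # the study group born in that year.
    return sum(control_rates[year] * n / total
               for year, n in study_counts.items())


# Hypothetical illustrative values (NOT from the Gothenburg Trial).
control_rates = {1923: 0.020, 1924: 0.018}
study_counts = {1923: 1000, 1924: 3000}

expected_control_rate = standardised_rate(control_rates, study_counts)
print(expected_control_rate)
```

With these made-up inputs, the 1924 cohort carries three quarters of the weight, so the standardised rate lies closer to 0.018 than to 0.020.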

Our second analysis involved explicit estimation of the incidence of 'real' and 'overdiagnosed' cases from the numbers of cases detected at screening and between screens in the two trials. We assumed a uniform annual incidence *I* of preclinical but screen-detectable, truly progressive cancers, an exponential distribution of time from inception of these to clinical symptoms with rate *λ*, and a screening test sensitivity *S*. In addition, we assumed exponential incidence of overdiagnosed (non-progressive) preclinical screen-detectable cancers, with rate *μ*. Because a tumour is only overdiagnosed if it is actually detected at screening, we defined the screening test sensitivity to be 100% for overdiagnosed cancers. In this model, there are four states: no detectable disease, non-progressive (overdiagnosed) preclinical disease, progressive preclinical disease, and clinical symptomatic disease. The expected rates of cancers diagnosed at first, second and third screens, and in the intervals following those screens, with an average interval time of *t*, are as follows.

First screen:

where *a* is average age (50 years in the Gothenburg Trial and 58 years in the Swedish Two-county Trial). The second component in the expected rate represents the overdiagnosed cancers.

This allows a constant incidence rate of non-progressive disease from birth to age at first screen. This is arbitrary, biologically unverifiable and it may be wrong. However, the expected rates predicted for any multiplier of *μ* from 15 or 20 years upwards are very similar, and it seemed to us less arbitrary to allow the subjects' age to dictate our time limit than to choose one ourselves, given the current low level of knowledge of non-progressive disease.

Between first and second screen:

As these are symptomatic tumours there is no term for overdiagnosis.

Second screen:

The second component in the expected rate represents the overdiagnosed cancers.

Between second and third screen:

As these are symptomatic tumours there is no term for overdiagnosis.

Third screen:

The second component in the expected rate represents the overdiagnosed cancers.

Interval after third screen:

Since these are symptomatic tumours there is no term for overdiagnosis.

From the data on screen-detected and interval cancers, we estimated *I*, *λ*, *S* and *μ* by fitting Poisson distributions to the numbers of cases at the three screens and in the three intervals with expectations as above. For the Swedish Two-county Trial, *t* = 2.56 years (the average interval for the 19,844 women younger than 50 years and the 57,236 women aged 50 to 74 years). For the Gothenburg Trial, *t* = 1.5 years. The estimation algorithm used was Markov Chain Monte Carlo (MCMC), implemented in the computer programme WinBUGS [10]. The diagnostic criteria of Geweke, Raftery and Lewis, and Heidelberger and Welch in Convergence Diagnostics and Output Analysis Software (CODA) were used to assess convergence of the MCMC parameters [11]. The results for the chain provided no evidence against convergence for any of the parameters. We intentionally chose uninformative prior distributions to approximate a maximum likelihood solution. Results are presented as posterior means and 95% credible intervals. The WinBUGS program updated a single chain with 15,000 samples (with thinning of 1), from which the first 5,000 samples were discarded (burn-in) and the remaining 10,000 samples were used in estimation. Prior distributions used for the parameters *I*, *λ*, *S* and *μ* were as follows: *I*, lognormal(0.0, 0.0001); *λ*, gamma(0.01, 0.01); *S*, logit(*S*) = *α*, *α* ~ normal(0.0, 0.0001); *μ*, lognormal(0.0, 0.01). Note that the second parameter in the normal and lognormal distributions is the precision, and not the variance or the standard deviation [10].
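Two of the figures quoted above can be checked with simple arithmetic. The sketch below assumes that the Two-county value of *t* is a weighted mean of the 2-year and 33-month intervals, weighted by the numbers of women in each age group, and that a precision τ corresponds to a standard deviation of 1/√τ on the scale on which the normal distribution is specified (the log scale for the lognormal priors, the logit scale for *α*). The helper names are hypothetical.

```python
import math


def mean_interval(groups):
    """Weighted mean screening interval in years.

    groups: list of (number_of_women, interval_in_years) pairs.
    """
    total = sum(n for n, _ in groups)
    return sum(n * interval for n, interval in groups) / total


def precision_to_sd(tau):
    """Convert a precision (1 / variance) to a standard deviation."""
    return 1.0 / math.sqrt(tau)


# Swedish Two-county Trial: 2 years under age 50, 33 months at ages 50 to 74.
t_two_county = mean_interval([(19844, 2.0), (57236, 33 / 12)])
print(round(t_two_county, 2))  # 2.56

# Standard deviations implied by the stated precisions.
prior_sd = {
    "log(I)": precision_to_sd(0.0001),  # about 100 on the log scale
    "alpha": precision_to_sd(0.0001),   # about 100 on the logit scale
    "log(mu)": precision_to_sd(0.01),   # about 10 on the log scale
}
print(prior_sd)
```

Read this way, the priors are very diffuse, which is consistent with the stated aim of approximating a maximum likelihood solution.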