### Setting: data

In 1995 the Norwegian Government initiated an organized population-based service screening program [10], in which mammography results and interval cancer cases are carefully registered by the Cancer Registry of Norway. The Norwegian Breast Cancer Screening Program (NBCSP) originally included four counties. Other counties were subsequently included, and by 2004 the screening program achieved nationwide coverage. All women between 50 and 69 years of age receive a written invitation biannually, and two-view mammograms from participating women are independently evaluated by two readers.

A high-quality population-based Cancer Registry and a unique personal identity number for each inhabitant in the country enables close follow-up over time [11], and the possibility to link data from several sources (Figure 1). Reporting cancer cases to the Cancer Registry is mandatory, and information is obtained separately from clinicians, pathologists, and death certificates.

The present study includes screening data from 1995 to 2002. A total of 78% of the invited women attended the screening program during this period, resulting in 364,731 screened women 50 to 69 years of age. Among these women, 336,533 answered a question regarding former screening experience – and 113,238 reported no previous (private or public) mammography experience before entering NBCSP. While interval data in this study include the two subsequent years following the first NBCSP attendance of all participating woman, we have chosen to only include screening data from the first NBCSP attendance of women having reported no previous mammography. Eligible women receive a new invitation to mammography screening 16 to 24 months after their previous screening (with most women receiving their invitation 22 to 23 months after the previous screening). All observations are censored 2 days after the new invitation was mailed (or on death, emigration, or after 2 years of observation for women passing the NBCSP upper age limit of 69 years of age). An overview of the data used in the estimation is shown in Figure 2.

To make the results comparable with estimates provided in previous studies [5, 12–15], all cases of ductal carcinoma in situ (DCIS) – a noninvasive form of breast tumor – were included. In addition, estimates were also deduced excluding DCIS cases to check the potential effect of DCIS cases. Several tumors detected at the same time in one woman were counted as one case, with size measurements given for the largest tumor. Only new primary breast cancers were included in this study.

In the NBCSP, tumor measurements are performed on pathological sections after surgery, and tumors are measured diagonally between the outer edges. All measurements were performed in a standardized manner according to specifications given in a national quality assurance manual. Tumor size measurements were available for 92% of the cancers detected at screening. There were several reasons for missing tumor measurements: some tumors were torn up at the surgical operation before tumor measurements were taken, others were difficult to measure on pathological cross-sections, and some tumors had grown into the outer skin. In addition, a substantial part had received tumor-reducing treatment before the pathological material was removed. Tumors of unknown size are therefore probably somewhat different from tumors with an observed tumor size. Patients who received tumor-reducing treatment will typically have larger tumors, which in practice could have biased our estimates – leading to higher growth rates. Sensitivity analyses related to possible bias in tumor sizes were therefore performed.

Tumor size measurements of clinical breast cancers that emerge without screening are needed for the tumor growth model suggested in the present article. Since women who do not attend screening represent a selected group, possibly with different alertness to early symptoms, tumor size measurements made before the start of the official screening program were used. The Cancer Registry of Norway did not receive reliable information on tumor size prior to the official screening program. At Haukeland University Hospital (covering Bergen, Norway's second largest city), however, a good database for tumor measurements of clinical invasive breast cancers exists [16]. We were able to use these data, where 503 women aged 50 to 69 years were diagnosed with breast cancer between 1985 and 1994. Among these cases, 433 women (86%) had registered tumor measurements in millimeters. A comparison of tumor measurements found at screening and in the Haukeland University Hospital database of clinically detected cases is shown in Figure 2.

### Growth model specification

Although the growth rates vary throughout the lifespan of each tumor, a smoothly increasing function is likely to serve as a good model for growth rates at the population level, as departures from one individual to the next probably are smoothed out at the population level. For small tumors, growth is mostly governed by the cell reproduction rate of the given tumor cells. This constantly higher growth rate leads to an exponential growth curve with constant doubling times. When tumors grow larger, growth velocity is likely to decrease with the increasing burden on the host, as the tumor receives more limited nutrition. One family of curves starting with near-exponential growth, before gradually leveling off below a given maximum level, is the general logistic function (see examples in Figure 3).

Several studies have examined growth curves, both in general and for human breast cancer tumors in particular. The conclusion has often been that the growth curves can be described by either a logistic function [17] or a Gompertz function [9, 18]. For the range of tumor sizes that are relevant for screening, there are only minor differences between logistic and Gompertz growth given probable parameters. Spratt and colleagues used a variant of the general logistic growth curve with a maximum tumor volume of 40 cell doublings, equaling a ball of 128 mm in diameter, after testing several models on a clinical dataset that mostly consisted of overlooked tumors [9, 19]. To make the comparison with Spratt and colleagues' observations [9, 19], we used the same variant of the log-normal logistic growth model in the present study. This implies an almost exponentional growth for the smallest tumors, with decelerating growth as the tumors approaches the maximum of 128 mm in diameter (see examples in Figure 3). In addition to the chosen model, model fits for several alternative choices of maximum tumor volume were evaluated, with moderate effects on the estimated values.

Growth rates vary between individual tumors, and both a study of overlooked cancers [9] and a thymidine-labeling study of tumors observed in a laboratory [20] found that variations in net productive growth rates (cell production minus cell death) can be described by a log-normal distribution. We therefore modeled the individual growth rates, *κ*_{
i
}, by a log-normal distribution with two variables; the mean *α*_{1}, and the variance *α*_{2}. Mathematically, this gives the following specification of tumor volume, **V**_{
i
}**(** *t* **)**, as a function of time, *t*, for a given woman (*i*):

{\text{V}}_{\text{i}}(\text{t})=\frac{{\text{V}}_{\mathrm{max}\phantom{\rule{0.25em}{0ex}}}}{{\left[1+\left({\left(\frac{{\text{V}}_{\mathrm{max}\phantom{\rule{0.25em}{0ex}}}}{{\text{V}}_{\text{cell}}}\right)}^{0.25}-1\right)\xb7{e}^{-0.25\xb7{\kappa}_{\text{i}}\xb7\text{t}}\right]}^{4}}

(1)

where *κ*_{
i
} is a log normally distributed growth rate with mean *α*_{1} and variance *α*_{2}, V_{max} is the maximum tumor volume (set to a tumor of 128 mm in diameter), and V_{cell} is the volume of one cell. (As all calculations in the present paper use a relative cancer time, the choice of V_{cell} does not affect the given estimates.)

Overall, this can be seen as a mixed effects model with individual logistic growth curves and a log-normally distributed random effect.

Assuming tumors have a ball shape, tumor volumes can be calculated from the tumor diameter, *X*_{
i
} **(** *t* **)**, by:

{\text{V}}_{\text{i}}(\text{t})=\frac{4}{3}\pi {\left[\frac{{\text{X}}_{\text{i}}(\text{t})}{2}\right]}^{3}

(2)

As tumor measurements in the NBCSP are the maximum diameter, the real tumor volume will in practice be smaller. The most important part of the model, however, is the modeled growth curve, and sensitivity analyses show little effects of a general reduction in modeled tumor volume as a function of tumor diameter.

### Screening test sensitivity model specification

Since larger tumors are easier to detect on mammograms than smaller tumors, the STS was modeled as an increasing function of the tumor size, *X*, in millimeters. As used for the tumor growth curve, a variant of the logistic function was used for the STS. Mathematically, the modeled STS, *S*(*X*), can be written as:

S(X)=\frac{\mathrm{exp}\phantom{\rule{0.25em}{0ex}}\left(\frac{X-{\beta}_{2}}{{\beta}_{1}}\right)}{1+\mathrm{exp}\phantom{\rule{0.25em}{0ex}}\left(\frac{X-{\beta}_{2}}{{\beta}_{1}}\right)}

(3)

where *β*_{1} defines how fast tumor sensitivity increases by tumor size and *β*_{2} relates STS to tumor size, with *β*_{2} = 0 equaling *S*(0) = 0.5 (places the sensitivity curve in relation to tumor size).

### Parameter estimation

Since mammography screening detects a higher proportion of the larger prevalent tumors compared with the smaller prevalent tumors, the pool of undiagnosed tumors is expected to have a clear overrepresentation of small tumors shortly after screening. One would suspect this could lead to relatively small tumors detected shortly after screening, followed by gradually increasing tumor sizes with the time since last screening. This trend is severely damped, however, as each tumor before detection must reach a certain individual size to produce sufficient symptoms to alarm the woman. In practice, the relationship between tumor size and clinical detection results in only a vague trend in interval cancer tumor sizes by time since screening (correlation = 0.01 in the NBCSP), whereas the number of interval cancers increases sharply. We have therefore chosen to disregard the size distribution of interval cancers, and build our estimation procedure on the observed frequency of interval cancers by the time since screening, the number of cases found at screening, the tumor size distribution of screening cancers, the assumed background incidence, and the size distribution of clinical tumors without screening (based on historical data).

Combining these data with our model, the model parameters can be estimated by maximum likelihood calculation. As the full likelihood includes several integrals, the actual maximum likelihood calculations are performed discretely, grouping the data into sufficiently small time and tumor size intervals.

To ease the calculations, the likelihood contributions from the screening and interval data have been taken as independent. This is possible since the number of cases is small relative to the total population of screened women, and since there probably are considerable variations in tumor growth with several screening detected cancers arising after the observed interval. To test the assumption in a relevant setting, we performed a simulation of the suggested growth model, without the independence assumption, using the estimated parameter values and a 100% overlap in screening and interval populations. This revealed only a weak correlation of 0.019 between the total number of screening and interval cancers (based on 10,000 simulations), giving no indication of problems with the assumed independence. Conditional on the assumed background incidence without screening and the clinical distribution of tumor sizes, the likelihood of a given dataset can be written as:

\begin{array}{c}L(\text{data}|{\alpha}_{1},{\alpha}_{2},{\beta}_{1},{\beta}_{2})\\ \approx P(\text{observedno .ofcasesatscreeningindifferentsizeintervals}|{\alpha}_{1},{\alpha}_{2},{\beta}_{1},{\beta}_{2})\\ \cdot {\displaystyle \prod _{\begin{array}{c}\text{allobserved}\\ \text{intervals}j\end{array}}P(\text{observedno .ofintervalcancers}j\phantom{\rule{0.25em}{0ex}}\text{monthsafterscreening}|{\alpha}_{1},{\alpha}_{2}},{\beta}_{1},{\beta}_{2})\end{array}

(4)

where the first part is calculated by a multinomial distribution:

\begin{array}{l}P(\text{observedno .ofcasesatscreeningindifferentsizegroups}|{\alpha}_{1},{\alpha}_{2},{\beta}_{1},{\beta}_{2})\hfill \\ =\frac{sn!}{{\displaystyle \prod _{i=1}^{sn}s{c}_{i}!}}\cdot {\displaystyle \prod _{i=1}^{sn}s{p}_{i}^{s{c}_{i}}}\hfill \end{array}

(5)

where *i* is an indicator for the size group, *sn* is the number of screened women, *sc*_{
i
} is the number of screening cases in size group *i*, and *sp*_{
i
} is the probability of a woman having a tumor in size group *i* at screening, given the parameter set {*α*_{1}, *α*_{2}, *β*_{1}, *β*_{2}}.

Similarly, the second part of the likelihood, concerning the rate of interval cancers, follows a Poisson distribution:

\begin{array}{l}P(\text{observedno .ofintervalcancers}j\phantom{\rule{0.25em}{0ex}}\text{monthsafterscreening}|{\alpha}_{1},{\alpha}_{2},{\beta}_{1},{\beta}_{2})\hfill \\ \approx \frac{{e}^{-i{e}_{j}}\cdot i{e}_{j}^{i{c}_{j}}}{i{c}_{j}!}\hfill \end{array}

(6)

where *ic*_{
j
} is the observed number of cancers *j* months after screening and *ie*_{
j
} is the expected number of cancers *j* months after screening, given the parameter set {*α*_{1}, *α*_{2}, *β*_{1}, *β*_{2}}.

The probability of finding a cancer in a given size group at screening (*sp*_{
i
} ) and the expected number of interval cases (*ie*_{
j
} ), given a set of known parameters, (*α*_{1}, *α*_{2}, *β*_{1}, *β*_{2}), are therefore needed for the estimation of model parameters. There is no available knowledge regarding the number of tumors initiated at different ages that have the potential of becoming screening or clinically detected cancers later on. The expected number of cases given a known tumor growth rate cannot therefore be deduced directly. It is possible, however, to calculate the expected number of cases at screening using back calculations from the expected number of clinical cancers seen without screening. This idea is not unlike the theory behind Markov models of cancer screening [12], utilizing known quantities regarding the expected number of future cancers to calculate the expected number of cases at an earlier stage.

Given a set of tumor growth parameters, we can calculate the probability that a tumor arising clinically at a given age without screening would have been in a given tumor size group some months earlier. Combining this with given STS parameters, we can calculate the probability that a tumor arising clinically at a given time without screening is found (earlier) at a given screening examination. Applying this on the expected number of future clinical cancers for all size groups separately, we can calculate the expected number of cancers that would be found at screening and, consequently, the reduction in cancers seen after screening. The probability of finding a given number of cancers in different size groups at screening, and a given number of interval cases each month after screening, can therefore be calculated for a given set of model parameters:

s{p}_{i}=S(\text{Cancerofsize}i|{\beta}_{1},{\beta}_{2}){\displaystyle \sum _{\begin{array}{c}\text{alltime}\\ \text{intervalsf}\end{array}}r\cdot g{s}_{f,\text{i}}}

(7)

where *S*(...) is the STS defined in equation (3), and *r* is the expected breast cancer rate per time unit (month) without screening – to simplify calculations, the rate is assumed constant over time as in the earlier used Markov model [5, 12], probably giving a good approximation in the limited time span used in the estimation – and *gs*_{f,i}, is the probability that a clinical cancer is in size group *i f* months before clinical detection. Using our assumed tumor growth function, *gs*_{f,i}can be calculated using back calculation of tumor sizes:

g{s}_{f,\text{i}}={\displaystyle \sum _{\begin{array}{c}\text{all}\\ \text{sizegroups}g\end{array}}{p}_{g}\cdot P(\text{Tumorofsize}g\phantom{\rule{0.25em}{0ex}}\text{wasofsize}i\phantom{\rule{0.25em}{0ex}}f\phantom{\rule{0.25em}{0ex}}\text{monthsearlier}|{\alpha}_{1},{\alpha}_{2})}

(8)

where *p*_{
g
} is the relative proportion of breast cancers of size *g* without screening.

Similarly, *ie*_{
j
} can be found by:

i{e}_{j}=PY{R}_{j}\cdot r\cdot {\displaystyle \sum _{\begin{array}{c}\text{all}\\ \text{sizegroups}\phantom{\rule{0.25em}{0ex}}g\end{array}}{p}_{g}\cdot f{s}_{j,g}}

(9)

where *PYR*_{
j
} is the number of person years in interval *j* and *fs*_{j,g}is the probability that a clinical cancer in size group *g* would have been found if screened *j* months earlier.

Using back calculation of tumor sizes, *fs*_{j,g}can be expressed as:

f{s}_{j,g}={\displaystyle \sum _{\begin{array}{c}\text{all}\\ \text{sizegroups}\phantom{\rule{0.25em}{0ex}}gs\end{array}}\begin{array}{c}P(\text{Tumorofsize}gs\phantom{\rule{0.25em}{0ex}}\text{wasofsizegjmonthsearlier}|{\alpha}_{1},{\alpha}_{2})\\ \cdot S(gs|{\beta}_{1},\beta )\end{array}}

(10)

In practice, both *P*(tumor of size *g* was of size *i f* months earlier |*α*1, *α*_{2}) in equation (8) and *P*(tumor of size *gs* was of size *g j* months earlier |*α*1, *α*_{2}) in equation (10) can be calculated in the following three stages. First, by rearranging the growth formula equation (1), expressing earlier tumor size as a function of present tumor size and tumor growth rate (*κ*_{
i
} ). Then calculating upper and lower limits for tumor growth (*κ*_{
i
} ), constituting the requested probability. Finally, calculating the probability for tumor growth within the given limits using the log-normal distribution and assumed growth parameters {*α*_{1}, *α*_{2}}.

Combining these formulas, maximum likelihood estimates of the observed dataset can be deduced by numerically maximizing the log-likelihood.

### Modeling choices: specifications

While the number of cancers in the interval between screenings can be observed directly, the expected number without screening has to be estimated. As the NBCSP offers screening to all women in a defined population, no parallel control group is available to carry out this estimation. In addition, commitment to screening can, and probably does, vary with individual risk factors, so those who do not attend are not a suitable control group either.

The background incidence was therefore calculated from historical data combined with an estimated time trend. In practice, data from 1990 to 1994 were used with time trend estimates from an age-period cohort model with additional screening parameters [21]. Incidence rates vary among age groups and counties, and the estimate was therefore weighted by the number of person-years in each combination of age group and county. Further, it may be a problem that the sharply increased use of hormone replacement therapy (HRT) in the 1990s [22] has influenced the historical time trends in breast cancer incidence. HRT is known to increase breast cancer risk [23], and Bakken and colleagues [22] found a relative risk of breast cancer of 2.1 for current versus never users in Norway. Combining sales figures with risk estimates, Bakken and colleagues estimated the proportion of breast cancer cases that could be attributed to HRT use as 27% among Norwegian women 45 to 64 years of age. HRT use increased sharply from the period that was used to calculate the expected incidence without screening (1990 to 1994) to our estimating period (1996 to 2002). Therefore 21% was added to the estimated background incidence (when otherwise not noted), on the basis of information regarding increased breast cancer risk and HRT sale figures found in Bakken and colleagues [22]. With this correction, the expected incidence without screening was estimated as 190 cases/100,000 person-years for women 50 to 59 years of age, and as 219 cases/100,000 person-years for women 60 to 69 years of age.

When calculating the expected number of cases at screening, we cannot include an infinite number of future time intervals. We therefore limited the growth rates to realistic levels given the women's current age, and reweighted the distribution. Experiences with different limits show that the choice of growth limit had little effect on the estimated values.

### Statistical calculations

All calculations, simulations, and plots were performed using the R statistical package [24]. Data were transformed from the Norwegian breast cancer screening database and were summarized using a combination of SQL commands and the statistical package S-PLUS (Insightful, Seattle, USA). To double check the implemented R functions, new datasets were simulated and the results compared with the expected number of cases.

Maximum likelihood estimates were found by optimization over all four parameters simultaneously, using the optimize function found in the R package [24]. For these calculations, time intervals of 1 month were used. Tumor sizes were categorized to 1 mm, 2 mm, 5 mm, 10 mm, 15 mm, ..., 100+ mm, as the background data revealed that many pathologists approximated tumor sizes to the nearest 5 mm, 10 mm, 15 mm, ..., 100 mm (data not shown). To look at possible age differences, estimates were calculated separately for women aged 50 to 59 years and women aged 60 to 69 years, in addition to all age groups combined. Calculations were very computer intensive, with a huge number of probability calculations needed to calculate the expected number of cases for a given parameter set.

The main estimates are presented with (pointwise) confidence intervals showing their (random) uncertainty. Robust 95% confidence intervals were calculated by 1,000 smoothed bias-corrected parametric bootstrap replications [25], resampling all of the observed data except the assumed breast cancer incidence without screening. Simulations were used to deduce the overall STS and the mean sojourn time. As a validation of the model fit, observed values versus expected values were plotted. In addition, the traditional Markov model of breast cancer screening [5, 12, 26] was compared with the new method using one-fifth holdout cross-validation, measuring the weighted mean square differences. For evaluation of cross-validation results, *P* values calculated from 50 parametric bootstrap replications were used.