Table 2 Pseudocode of nested cross-validation for model selection and model assessment

From: Benefits of biomarker selection and clinico-pathological covariate inclusion in breast cancer prognostic models

Repeat 100 times:
    Divide the data into 10 outer folds
    Repeat 10 times:
        Keep 1 outer fold for testing
        Select the remaining 9 outer folds for training
        Divide the 9 outer training folds into 10 inner folds
        Repeat 10 times:
            Keep 1 inner fold for testing
            Select the remaining 9 inner folds for training
            Move all variables into the list of available variables
            Create an empty list of nested model variables
            Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
                Train Cox models on the inner training set. Each Cox model contains all available variables except 1, leaving out each variable in turn
                Select the variable that contributes the least to the model likelihood
                Move the selected variable from the list of available variables to the top of the list of nested model variables
            Move the last available variable to the top of the list of nested model variables
            Iterate over the list of nested model variables:
                Train the Cox model containing the present variable and the variables above it in the list of nested model variables, using the inner training set
                Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) h of the present Cox model using the 1 inner testing fold
                Record the variable usage U in the present Cox model and the size n of the model. U_X(v_m) = 1 if v_m is in model X, 0 otherwise
        Estimate:
            - the expected model size <n> = Σ_X (h_X n_X) / Σ_X (h_X)
            - the (inner) variable stability score for each variable v_m: <v_m> = Σ_X (h_X U_X(v_m)) / Σ_X (h_X)
        Train the Cox model containing the <n> most stable variables using the outer training set
        Evaluate the ATD-AUCROC k of the present Cox model using the 1 outer testing fold
        Record the variable usage T in the present Cox model and the size s of the model. T_X(v_m) = 1 if v_m is in model X, 0 otherwise
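
The inner-loop steps of the pseudocode (backward selection into a nested variable list, then the AUC-weighted estimates of model size and variable stability) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function names, the `fit_score` callback (a hypothetical stand-in for the likelihood of a Cox model trained on the inner training set), the toy variable names and the toy AUC values are all assumptions. Rounding <n> to the nearest integer before picking the most stable variables is also an assumption, since the table does not say how <n> is converted to a variable count.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_order(variables: Sequence[str],
                   fit_score: Callable[[List[str]], float]) -> List[str]:
    """Backward selection: return the nested list, most important variable on top.

    fit_score(subset) is a HYPOTHETICAL stand-in for the likelihood of a Cox
    model trained on the inner training set with the given variables.
    """
    available = list(variables)
    nested: List[str] = []
    while len(available) > 1:
        # The variable contributing least is the one whose removal
        # leaves the best-scoring reduced model.
        weakest = max(available,
                      key=lambda v: fit_score([u for u in available if u != v]))
        available.remove(weakest)
        nested.insert(0, weakest)       # move to the top of the nested list
    nested.insert(0, available[0])      # last available variable goes on top
    return nested


def nested_models(nested: List[str]) -> List[List[str]]:
    """Model i holds the variable at position i plus every variable above it."""
    return [nested[:i + 1] for i in range(len(nested))]


def weighted_estimates(records: List[Tuple[float, List[str]]],
                       variables: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Compute <n> and <v_m> from records of (h_X, variables in model X)."""
    total_h = sum(h for h, _ in records)
    expected_size = sum(h * len(model) for h, model in records) / total_h
    stability = {v: sum(h for h, model in records if v in model) / total_h
                 for v in variables}
    return expected_size, stability


def most_stable(stability: Dict[str, float], expected_size: float) -> List[str]:
    """Pick the <n> variables with the highest stability score."""
    # ASSUMPTION: <n> is rounded to the nearest integer, with a minimum of 1.
    k = max(1, round(expected_size))
    ranked = sorted(stability, key=lambda v: stability[v], reverse=True)
    return ranked[:k]


# Toy example with made-up numbers (not data from the paper).
importance = {"age": 3.0, "grade": 2.0, "geneA": 1.0}
order = backward_order(list(importance),
                       lambda subset: sum(importance[v] for v in subset))
models = nested_models(order)
toy_auc = [0.5, 0.75, 0.75]             # hypothetical h_X, one per nested model
size, stability = weighted_estimates(list(zip(toy_auc, models)), order)
selected = most_stable(stability, size)
```

In the full procedure the records would be pooled over all 10 inner folds before the estimates are computed, and the selected variables would then be used to train the Cox model on the outer training set.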