Table 2 Pseudocode of nested cross-validation for model selection and model assessment

From: Benefits of biomarker selection and clinico-pathological covariate inclusion in breast cancer prognostic models

Repeat 100 times:
  Divide the data into 10 outer folds
  Repeat 10 times:
   Keep 1 outer fold for testing
   Select the remaining 9 outer folds for training
   Divide the 9 outer training folds into 10 inner folds
   Repeat 10 times:
    Keep 1 inner fold for testing
    Select the remaining 9 inner folds for training
    Move all variables into the list of available variables
    Create an empty list of nested model variables
    Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
     Train Cox models on the inner training set. Each Cox model contains all available variables except one, with each variable left out in turn
     Select the variable that contributes the least to the model likelihood
     Move the selected variable from the list of available variables to the top of the list of nested model variables
    Move the last available variable to the top of the list of nested model variables
    Iterate over the list of nested variables:
     Train the Cox model containing the present variable and the variables above it in the list of nested variables using the inner training set.
     Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) h of the present Cox model using the 1 inner testing fold.
     Record the variable usage U in the present Cox model X and the size n of the model: U_X(v_m) = 1 if v_m is in model X, 0 otherwise.
   Estimate:
   - the expected model size <n> = Σ_X (h_X n_X) / Σ_X h_X
   - the (inner) variable stability score for each variable v_m: <v_m> = Σ_X (h_X U_X(v_m)) / Σ_X h_X
   Train the Cox model containing the <n> most stable variables using the outer training set.
   Evaluate the ATD-AUCROC k of the present Cox model using the 1 outer testing fold.
   Record the variable usage T in the present Cox model X and the size s of the model: T_X(v_m) = 1 if v_m is in model X, 0 otherwise.
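
The inner-loop steps above (backward selection into a nested variable list, then the AUC-weighted estimates of model size and variable stability) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the `fit_score` callback (a stand-in for the likelihood of a trained Cox model), the toy variable names and the AUC values are all hypothetical. Rounding <n> to the nearest integer is also an assumption, since the table does not state how a fractional <n> is handled.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def backward_nesting(variables: Sequence[str],
                     fit_score: Callable[[List[str]], float]) -> List[str]:
    """Order variables by backward elimination, most important first.

    fit_score(vars) is a hypothetical callback that returns the fit
    (e.g. the partial log-likelihood) of a model trained on `vars`.
    """
    available = list(variables)
    nested: List[str] = []
    while len(available) > 1:
        # The variable contributing least is the one whose removal
        # leaves the best-fitting reduced model.
        drop = max(available,
                   key=lambda v: fit_score([u for u in available if u != v]))
        available.remove(drop)
        nested.insert(0, drop)       # move to the top of the nested list
    nested.insert(0, available[0])   # the last remaining variable goes on top
    return nested


def weighted_estimates(models: Sequence[Tuple[float, Set[str]]]
                       ) -> Tuple[float, Dict[str, float]]:
    """Return (<n>, stability scores) from (h_X, variables in X) pairs."""
    total_h = sum(h for h, _ in models)
    expected_size = sum(h * len(vs) for h, vs in models) / total_h
    all_vars = set().union(*(vs for _, vs in models))
    stability = {v: sum(h for h, vs in models if v in vs) / total_h
                 for v in sorted(all_vars)}
    return expected_size, stability


def most_stable(stability: Dict[str, float], expected_size: float) -> List[str]:
    """Pick the <n> variables with the highest stability scores."""
    # Assumption: <n> is rounded to the nearest integer, with a minimum of 1.
    k = max(1, round(expected_size))
    return sorted(stability, key=lambda v: (-stability[v], v))[:k]


# Toy example: the additive "fit" below replaces real Cox model training.
toy_weight = {"age": 3.0, "grade": 2.0, "ki67": 1.0}
order = backward_nesting(list(toy_weight),
                         lambda vs: sum(toy_weight[v] for v in vs))

# Nested models are the prefixes of `order`; the AUC values h are made up.
aucs = [0.70, 0.80, 0.75]
models = [(h, set(order[:i + 1])) for i, h in enumerate(aucs)]
n_hat, stability = weighted_estimates(models)
print(order, round(n_hat, 3), most_stable(stability, n_hat))
```

In the full procedure, `models` would collect the nested models from all 10 inner folds, so a variable's stability score reflects how often it appears in models that also predict well. The same weighted sums, applied to k, T and s, would presumably summarise the outer loop.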