Repeat 100 times:
  Divide the data into 10 outer folds
  Repeat 10 times:
    Keep 1 outer fold for testing
    Select the remaining 9 outer folds for training
    Divide the 9 outer training folds into 10 inner folds
    Repeat 10 times:
      Keep 1 inner fold for testing
      Select the remaining 9 inner folds for training
      Move all variables into the list of available variables
      Create an empty list of nested model variables
      Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
        Train Cox models on the inner training set. Each Cox model contains all available variables except one, with a different variable left out each time
        Select the variable that contributes the least to the model likelihood
        Move the selected variable from the list of available variables to the top of the list of nested model variables
      Move the last available variable to the top of the list of nested model variables
      Iterate over the list of nested variables:
        Train the Cox model containing the present variable and the variables above it in the list of nested variables using the inner training set.
        Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) $h$ of the present Cox model using the 1 inner testing fold.
        Record the variable usage $U$ in the present Cox model and the size $n$ of the model. $U_X(v_m) = 1$ if $v_m$ is in model $X$, 0 otherwise.
    Estimate:
      - the expected model size $\langle n \rangle = \sum_X h_X n_X / \sum_X h_X$
      - the (inner) variable stability score for each variable $v_m$: $\langle v_m \rangle = \sum_X h_X U_X(v_m) / \sum_X h_X$
    Train the Cox model containing the most stable $\langle n \rangle$ variables using the outer training set.
    Evaluate the ATD-AUCROC $k$ of the present Cox model using the 1 outer testing fold.
    Record the variable usage $T$ in the present Cox model and the size $s$ of the model. $T_X(v_m) = 1$ if $v_m$ is in model $X$, 0 otherwise.
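The inner-loop bookkeeping above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the variable names (`age`, `stage`, `bmi`), and the toy scores are hypothetical. The `loglik` argument stands in for fitting a Cox model and returning its likelihood. Reading "contributes the least" as "removal leaves the highest likelihood" and rounding the expected model size to an integer are both assumptions that the procedure does not state.

```python
from collections import defaultdict


def backward_nested_order(variables, loglik):
    """Order variables by backward selection, most important first.

    `loglik` is a hypothetical callable that takes a list of variables and
    returns the likelihood of a model fitted on them (higher is better).
    """
    available = list(variables)
    nested = []
    while len(available) > 1:
        # Assumption: the least useful variable is the one whose removal
        # leaves the model with the highest likelihood.
        drop = max(
            available,
            key=lambda v: loglik([u for u in available if u != v]),
        )
        available.remove(drop)
        nested.insert(0, drop)  # move to the top of the nested list
    nested.insert(0, available[0])  # the last remaining variable goes on top
    return nested


def expected_model_size(models):
    """Performance-weighted mean size over (h, variables) pairs."""
    total_h = sum(h for h, _ in models)
    return sum(h * len(used) for h, used in models) / total_h


def stability_scores(models):
    """Performance-weighted usage frequency of each variable."""
    total_h = sum(h for h, _ in models)
    weighted_usage = defaultdict(float)
    for h, used in models:
        for v in used:
            weighted_usage[v] += h  # U_X(v) = 1 only for variables in model X
    return {v: w / total_h for v, w in weighted_usage.items()}


def most_stable_variables(models):
    """Pick the top variables by stability score.

    Assumption: the expected size is rounded to the nearest integer
    (at least 1); ties in the score are broken alphabetically.
    """
    n = max(1, round(expected_model_size(models)))
    scores = stability_scores(models)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:n]


# Toy example: an additive stand-in for the model likelihood.
weights = {"age": 3.0, "stage": 2.0, "bmi": 1.0}
order = backward_nested_order(weights, lambda vs: sum(weights[v] for v in vs))

# Nested models built from the ordering, each paired with a made-up AUC h.
toy_h = [0.8, 0.7, 0.6]
models = [(h, order[: i + 1]) for i, h in enumerate(toy_h)]

print(order)
print(expected_model_size(models))
print(most_stable_variables(models))
```

In a real run, `models` would collect one `(h, variables)` pair per nested model from all 10 inner folds before the two estimates are computed for the outer fold.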