Benefits of biomarker selection and clinico-pathological covariate inclusion in breast cancer prognostic models

Parisi, Fabio; González, Ana M; Nadler, Yasmine; Camp, Robert L; Rimm, David L; Kluger, Harriet M; Kluger, Yuval

doi:10.1186/bcr2633

Breast Cancer Research

Table 2 Pseudocode of nested cross-validation for model selection and model assessment


Repeat 100 times:
	Divide the data into 10 outer folds
	Repeat 10 times:
		Keep 1 outer fold for testing
		Select the remaining 9 outer folds for training
		Divide the 9 outer training folds into 10 inner folds
		Repeat 10 times:
			Keep 1 inner fold for testing
			Select the remaining 9 inner folds for training
			Move all variables into the list of available variables
			Create an empty list of nested model variables
			Iterate this backward selection procedure until only 1 variable is left in the list of available variables:
				Train Cox models on the inner training set. Each Cox model contains all available variables except one, leaving out each variable in turn
				Select the variable that contributes the least to the model likelihood
				Move the selected variable from the list of available variables to the top of the list of nested model variables
			Move the last available variable to the top of the list of nested model variables
			Iterate over the list of nested variables:
				Train the Cox model containing the present variable and the variables above it in the list of nested variables using the inner training set.
				Evaluate the average time-dependent area under the receiver operating characteristic curve (ATD-AUCROC) h of the present Cox model using the 1 inner testing fold.
				Record the variable usage U in the present Cox model and the size n of the model. U_X(v_m) = 1 if v_m is in model X, 0 otherwise.
		Estimate:
		- the expected model size <n> = Σ_X(h_X n_X)/Σ_X(h_X)
		- the (inner) variable stability score for each variable v_m: <v_m> = Σ_X(h_X U_X(v_m))/Σ_X(h_X)
		Train the Cox model containing the most stable <n> variables using the outer training set.
		Evaluate the ATD-AUCROC k of the present Cox model using the 1 outer testing fold.
		Record the variable usage T in the present Cox model and the size s of the model.
		T_X(v_m) = 1 if v_m is in model X, 0 otherwise.
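
The inner-loop steps above (backward ordering of variables, then performance-weighted estimates of model size and variable stability) can be sketched in Python roughly as follows. This is an illustrative sketch only, not the authors' implementation: `fit_score` is a hypothetical stand-in for fitting a Cox model and returning its (partial) log-likelihood, the toy weights and AUC values are made up, and rounding <n> to the nearest integer is an assumption, since the table does not say how a non-integer <n> is handled.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def backward_order(variables: Sequence[str],
                   fit_score: Callable[[List[str]], float]) -> List[str]:
    """Order variables so that the first k entries form the size-k nested model."""
    available = list(variables)
    nested: List[str] = []
    while len(available) > 1:
        # Score each leave-one-out model; the variable whose omission leaves
        # the highest score contributes the least to the fit.
        drop = max(available,
                   key=lambda v: fit_score([u for u in available if u != v]))
        available.remove(drop)
        nested.insert(0, drop)      # move to the top of the nested list
    nested.insert(0, available[0])  # the last remaining variable goes on top
    return nested


def weighted_summaries(records: Sequence[Tuple[float, Set[str]]],
                       variables: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Return (<n>, {v_m: <v_m>}) from (h_X, variables of model X) records."""
    total = sum(h for h, _ in records)
    size = sum(h * len(model) for h, model in records) / total
    stability = {v: sum(h for h, model in records if v in model) / total
                 for v in variables}
    return size, stability


def most_stable(records: Sequence[Tuple[float, Set[str]]],
                variables: Sequence[str]) -> List[str]:
    """Pick the <n> most stable variables (rounding <n> is an assumption)."""
    size, stability = weighted_summaries(records, variables)
    n = max(1, round(size))
    return sorted(variables, key=lambda v: stability[v], reverse=True)[:n]


# Toy stand-in for model fit: the "likelihood" is a sum of made-up weights.
weights = {"age": 0.5, "ER": 2.0, "HER2": 1.0}
order = backward_order(list(weights),
                       lambda subset: sum(weights[v] for v in subset))

# Nested models: the top variable alone, then the top two, and so on.
models = [set(order[:k]) for k in range(1, len(order) + 1)]

# Hypothetical ATD-AUCROC values h for the three nested models.
aucs = [0.6, 0.8, 0.6]
records = list(zip(aucs, models))

size, stability = weighted_summaries(records, order)
selected = most_stable(records, order)
```

In a real analysis the records would be pooled over all 10 inner folds of one outer training set before `most_stable` is called, and `fit_score` would wrap an actual Cox model fit on the inner training data.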


ISSN: 1465-542X
