Machine learning techniques for personalized breast cancer risk prediction: comparison with the BCRAT and BOADICEA models

Table 3 Performance, measured as area under the ROC curve (AU-ROC), of BCRAT and ML algorithms predicting breast cancer lifetime risk from simulated datasets (n = 1200) and the US population-based sample (n = 1143); standard deviations for the ML algorithms are given in parentheses

Dataset	BCRAT	ML: random forest	ML: logistic regression	ML: adaptive boosting	ML: linear model	ML: K-nearest neighbors	ML: linear discriminant	ML: quadratic discriminant	ML: MCMC GLMM
A. Sim_no_signal	0.5333	0.5016 (0.0231)	0.5133 (0.0271)	0.5067 (0.0307)	0.5015 (0.0220)	0.5054 (0.0211)	0.5158 (0.0276)	0.5133 (0.0323)	0.5090 (0.0210)
B. Sim_artificial_signal	0.5261	0.9308 (0.0171)	0.9417 (0.0103)	0.9292 (0.0095)	0.7859 (0.0197)	0.9125 (0.0109)	0.9312 (0.0154)	0.9188 (0.0111)	0.9329 (0.0087)
C. Sim_artificial_signal + 20% missing	0.5068	0.9275 (0.0179)	0.9217 (0.0259)	0.9258 (0.0113)	0.7807 (0.0227)	0.9012 (0.0120)	0.9213 (0.0202)	0.9104 (0.0237)	0.9191 (0.0210)
D. Sim_artificial_signal + 20% missing + imputation	0.5035	0.9167 (0.0184)	0.9300 (0.0111)	0.9213 (0.0119)	0.7824 (0.0200)	0.9058 (0.0117)	0.9275 (0.0148)	0.9121 (0.0081)	0.9232 (0.0099)
US population-based sample	0.6240	0.8889 (0.0201)	0.7192 (0.0314)	0.8828 (0.0229)	0.6813 (0.0378)	0.8089 (0.0217)	0.8692 (0.0284)	0.8675 (0.0241)	0.8234 (0.0189)
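
For readers unfamiliar with the metric, entries of the form "AU-ROC (standard deviation)" can be read as a summary over repeated evaluations of one model. The sketch below is illustrative only and is not the authors' code. It assumes the standard deviation is taken across evaluation folds, which this table does not state; the function names `auc_roc` and `summarize` and the toy data are hypothetical. AU-ROC is computed here as the fraction of case/control pairs in which the case receives the higher predicted risk, with ties counted as one half.

```python
from statistics import mean, stdev


def auc_roc(labels, scores):
    """AU-ROC as the share of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]  # predicted risks of cases
    neg = [s for y, s in zip(labels, scores) if y == 0]  # predicted risks of controls
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(aucs):
    """Mean and sample standard deviation of AU-ROC values from repeated evaluations."""
    return mean(aucs), stdev(aucs)


# Hypothetical toy data (not from the study): three folds of (labels, predicted risks).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4]),
    ([1, 1, 0, 0], [0.8, 0.3, 0.5, 0.1]),
    ([0, 1, 0, 1], [0.3, 0.7, 0.7, 0.9]),
]
fold_aucs = [auc_roc(y, s) for y, s in folds]
auc_mean, auc_sd = summarize(fold_aucs)
print(f"{auc_mean:.4f} ({auc_sd:.4f})")  # same "value (SD)" layout as the table cells
```

An AU-ROC near 0.5, as in row A, corresponds to ranking cases and controls no better than chance, while values near 1.0 indicate that cases almost always receive higher predicted risk than controls.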

ISSN: 1465-542X