Machine learning techniques for personalized breast cancer risk prediction: comparison with the BCRAT and BOADICEA models

Table 4 Area under the ROC curve (AU-ROC) of the BOADICEA model and ML algorithms (standard deviation in parentheses) for predicting lifetime breast cancer risk in the simulated datasets (n = 2500) and the Swiss clinic-based sample (n = 112,587 women from 2481 families)

Dataset	BOADICEA model	ML: random forest	ML: logistic regression	ML: adaptive boosting	ML: linear model	ML: K-nearest neighbors	ML: linear discriminant	ML: quadratic discriminant	ML: MCMC GLMM
A.Sim_no_signal	0.5103	0.5020 (0.0197)	0.5093 (0.0210)	0.5029 (0.0177)	0.5151 (0.0190)	0.5254 (0.0199)	0.5094 (0.0241)	0.5002 (0.0216)	0.5075 (0.0201)
B.Sim_artificial_signal	0.5392	0.9101 (0.0148)	0.9233 (0.0172)	0.9321 (0.0122)	0.6659 (0.0164)	0.9301 (0.0159)	0.9109 (0.0187)	0.9244 (0.0166)	0.9219 (0.0151)
C.Sim_artificial_signal + 20% missing	0.5022	0.8977 (0.0183)	0.9100 (0.0293)	0.9291 (0.0156)	0.6407 (0.0257)	0.9232 (0.0180)	0.8982 (0.0276)	0.9209 (0.0297)	0.9088 (0.0219)
D.Sim_artificial_signal + 20% missing + imputation	0.5115	0.9028 (0.0127)	0.9203 (0.0157)	0.9299 (0.0110)	0.6463 (0.0147)	0.9276 (0.0140)	0.9035 (0.0159)	0.9220 (0.0141)	0.9154 (0.0137)
Swiss clinic-based sample	0.5931	0.8535 (0.0214)	0.8271 (0.0189)	0.9017 (0.0162)	0.6921 (0.0202)	0.8377 (0.0156)	0.7899 (0.0188)	0.8369 (0.0192)	0.8932 (0.0149)
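
For readers unfamiliar with the metric, the following is a minimal sketch of how an AU-ROC value and an accompanying standard deviation of the kind reported in Table 4 could be computed. It assumes the standard deviation is taken across repeated resamples or cross-validation folds, which this table does not state; the function names `auc_roc` and `summarize` are hypothetical, and the numbers in the usage note are illustrative and not taken from the study.

```python
from statistics import mean, stdev


def auc_roc(labels, scores):
    """AU-ROC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as half a win (Mann-Whitney formulation)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def summarize(aucs):
    """Mean and sample standard deviation of AU-ROC values obtained
    from repeated resamples (assumed procedure, not confirmed here)."""
    return mean(aucs), stdev(aucs)
```

For example, `auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` ranks three of the four case-control pairs correctly and returns 0.75. An AU-ROC near 0.5, as in row A of the table, corresponds to scores that rank cases above controls no more often than chance.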

ISSN: 1465-542X