Letter to the editor: Response to Giardiello D, Antoniou AC, Mariani L, Easton DF, Steyerberg EW

Ming, Chang; Viassolo, Valeria; Probst-Hensch, Nicole; Chappuis, Pierre O.; Dinov, Ivo D.; Katapodi, Maria C.

doi:10.1186/s13058-020-01274-x

Letter
Open access
Published: 10 April 2020

Chang Ming¹, Valeria Viassolo², Nicole Probst-Hensch³, Pierre O. Chappuis²,⁴, Ivo D. Dinov⁵,⁶,⁷,⁸ & Maria C. Katapodi¹,⁸

Breast Cancer Research volume 22, Article number: 35 (2020)

The Original Article was published on 20 June 2019

We appreciate the opportunity to submit a response to the Letter to the Editor by Giardiello and colleagues [1] addressing our publication in Breast Cancer Research [2].

Giardiello and colleagues noted that our machine learning (ML) models were not specific to survival data. BCRAT and BOADICEA were developed and validated using survival data with binary outcomes and retrospective case-control/cross-sectional data, respectively [3]. Their clinical application requires only cross-sectional data. In each comparison, our ML models used the same risk factors and data structure as BCRAT/BOADICEA. To avoid overstating what the ML models can do, we estimated the probability that a woman at a given age would develop breast cancer in her lifetime, rather than her risk over a specific time frame (5-year or 10-year risk).

Giardiello and colleagues stated that our validation was unfair because we applied only internal validation processes. Cross-validation is not equivalent to internal validation; it is a statistical out-of-sample testing technique that pools results across many iterations, and training and testing data are never mixed within any fold or iteration. A slight bias (also known as the surrogate problem) arises because the cross-validation training sets are smaller than the original dataset. A 10-fold cross-validation relies on training sets that include 90% of the original dataset. In our study, this translated into two considerable sample sizes, n₁ = 1029 from the US population-based data and n₂ = 2233 from the Swiss clinic-based data. This lower-sample-size bias often translates into more conservative fit/prediction estimates [4].
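The fold structure described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `k_fold_indices` and the dataset size of 1000 are hypothetical, chosen only to show that each of the 10 training sets holds 90% of the data and never overlaps its test fold.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Training data is everything outside the held-out fold.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

n = 1000  # assumed dataset size, for illustration only
for train_idx, test_idx in k_fold_indices(n, k=10):
    # Training and testing indices are disjoint in every fold.
    assert not set(train_idx) & set(test_idx)
    # Each training set contains 90% of the samples.
    assert len(train_idx) == int(0.9 * n)
```

Every sample is used for testing exactly once across the 10 folds, which is why the pooled estimate is an out-of-sample one.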

Giardiello and colleagues stated that a fair comparison of the final models requires reporting parameter estimates and calibration. Reporting parameter estimates and their confidence intervals in the final model is not always possible [5]. We generated 80 parameter estimates for each risk factor based on different ML algorithms and different cross-validation summary approaches. The interpretation and usefulness of these estimates for each risk factor are limited without reference values from BCRAT/BOADICEA. Moreover, better/worse calibration does not lead to better/worse class-based or probability-based predictions [6]. Calibration comparisons were not our aim. ML may generate “aggressive” prediction calibration for minority classes due to “increased” sample size through rebalancing processes. Several recalibration methods can be applied and significantly improve some of the ML calibrations and predicted probabilities [6], making calibration comparisons of ML to BCRAT/BOADICEA unfair. Calibrated predicted probabilities should also fit clinically meaningful sensitivity and specificity for patient stratification, instead of one cutoff (cancer/no cancer) [7].
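As an illustration of what a recalibration method does, the sketch below implements isotonic regression with the pool-adjacent-violators algorithm, one of the methods examined in reference [6]. It is an assumption made for illustration, not a description of any recalibration performed in our study; the function name `isotonic_fit` and the toy scores are hypothetical.

```python
def isotonic_fit(scores, labels):
    """Recalibrate scores with isotonic regression (pool-adjacent-violators).

    Returns the scores in ascending order and the fitted, non-decreasing
    probabilities aligned with them. Hypothetical helper for illustration.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum of labels, number of samples]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block mean exceeds the current one.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every sample in a block gets the block mean
    return [p[0] for p in pairs], fitted

# Toy example: raw model scores and observed outcomes (1 = event).
xs, probs = isotonic_fit([0.4, 0.1, 0.3, 0.2], [1, 0, 0, 1])
print(xs)     # [0.1, 0.2, 0.3, 0.4]
print(probs)  # [0.0, 0.5, 0.5, 1.0]
```

The mapping keeps the ranking of the scores (ties aside), so it changes calibration while leaving rank-based discrimination essentially unchanged.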

A prediction model cannot be developed, validated, and tested for utility all at once. However, the development and validation of our ML models improved predictive accuracy efficiently, i.e., using less time and fewer resources. Investing in promising new analytic approaches would improve research in the field of disease prediction and significantly further our knowledge about the potential application of ML in personalized medicine.

Availability of data and materials

Not applicable.

References

1. Giardiello D, Antoniou AC, Mariani L, Easton DF, Steyerberg EW. Letter to the editor: a response to Ming’s study on machine learning techniques for personalized breast cancer risk prediction. Breast Cancer Res. 2020;22(1):17.
2. Ming C, Viassolo V, Probst-Hensch N, Chappuis PO, Dinov ID, Katapodi MC. Machine learning techniques for personalized breast cancer risk prediction: comparison with the BCRAT and BOADICEA models. Breast Cancer Res. 2019;21(1):75.
3. Wang X, Huang Y, Li L, Dai H, Song F, Chen K. Assessment of performance of the Gail model for predicting breast cancer risk: a systematic review and meta-analysis with trial sequential analysis. Breast Cancer Res. 2018;20(1):18.
4. Steyerberg EW, Harrell FE, Borsboom GJJM, Eijkemans MJC, Vergouwe Y, Habbema JDF. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001;54(8):774–81.
5. Luo W, Phung D, Tran T, Gupta S, Rana S, Karmakar C, et al. Guidelines for developing and reporting machine learning predictive models in biomedical research: a multidisciplinary view. J Med Internet Res. 2016;18(12):e323.
6. Niculescu-Mizil A, Caruana R. Predicting good probabilities with supervised learning. Proceedings of the 22nd international conference on Machine learning. Bonn: Association for Computing Machinery; 2005. p. 625–32.
7. Brinton JT, Hendrick RE, Ringham BM, Kriege M, Glueck DH. Improving the diagnostic accuracy of a stratified screening strategy by identifying the optimal risk cutoff. Cancer Causes Control. 2019;30(10):1145–55.

Acknowledgments

Not applicable.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Clinical Research, Faculty of Medicine, University of Basel, Missionstrasse 64, 2 OG – Room 007, 4055, Basel, Switzerland
Chang Ming & Maria C. Katapodi
Oncogenetics and Cancer Prevention, Geneva University Hospitals, Geneva, Switzerland
Valeria Viassolo & Pierre O. Chappuis
Swiss Tropical and Public Health Institute, University of Basel, Basel, Switzerland
Nicole Probst-Hensch
Genetic Medicine, Geneva University Hospitals, Geneva, Switzerland
Pierre O. Chappuis
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Ivo D. Dinov
Michigan Institute for Data Science, University of Michigan, Ann Arbor, MI, USA
Ivo D. Dinov
Statistics Online Computational Resource, University of Michigan, Ann Arbor, MI, USA
Ivo D. Dinov
University of Michigan School of Nursing, Ann Arbor, MI, USA
Ivo D. Dinov & Maria C. Katapodi

Contributions

Manuscript preparation: CM, ID, MK. Manuscript editing: VV, NP, PC. The authors read and approved the final manuscript.

Corresponding author

Correspondence to Chang Ming.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Ming, C., Viassolo, V., Probst-Hensch, N. et al. Letter to the editor: Response to Giardiello D, Antoniou AC, Mariani L, Easton DF, Steyerberg EW. Breast Cancer Res 22, 35 (2020). https://doi.org/10.1186/s13058-020-01274-x

Received: 13 March 2020
Accepted: 02 April 2020
Published: 10 April 2020
DOI: https://doi.org/10.1186/s13058-020-01274-x