Letter to the Editor
Copyright © The Author(s) 2022. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Gastroenterol. Feb 7, 2022; 28(5): 605-607
Published online Feb 7, 2022. doi: 10.3748/wjg.v28.i5.605
Machine learning models and over-fitting considerations
Paris Charilaou, Robert Battat
Paris Charilaou, Robert Battat, Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, New York, NY 10021, United States
Author contributions: Charilaou P and Battat R drafted and edited the manuscript, and reviewed the intellectual content.
Conflict-of-interest statement: The authors have no conflict of interest to declare.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Robert Battat, MD, Assistant Professor, Jill Roberts Center for Inflammatory Bowel Disease - Division of Gastroenterology & Hepatology, Weill Cornell Medicine, 1315 York Avenue, New York, NY 10021, United States. rob9175@med.cornell.edu
Received: October 26, 2021
Peer-review started: October 26, 2021
First decision: December 27, 2021
Revised: December 29, 2021
Accepted: January 14, 2022
Article in press: January 14, 2022
Published online: February 7, 2022

Machine learning models may outperform traditional statistical regression algorithms for predicting clinical outcomes. Proper validation when building such models and tuning their underlying algorithms is necessary to avoid over-fitting and poor generalizability, problems to which smaller datasets are more prone. To educate readers interested in artificial intelligence and in model-building based on machine-learning algorithms, we outline important details of cross-validation techniques that can enhance the performance and generalizability of such models.

Keywords: Machine learning, Over-fitting, Cross-validation, Hyper-parameter tuning

Core Tip: Machine learning models are increasingly being used in clinical medicine to predict outcomes. Proper validation techniques for these models are essential to avoid over-fitting and poor generalization to new data.
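
As an illustration of why validation on held-out data matters, the following minimal sketch compares the apparent (training-set) accuracy of a model with its k-fold cross-validated accuracy on a small dataset. Everything in it is an assumption made for illustration and is not taken from the letter: the data are synthetic with labels unrelated to the single feature, the model is a 1-nearest-neighbour classifier chosen because it memorizes its training data, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def predict_1nn(train_x, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbour)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(predict_1nn(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def cross_validated_accuracy(xs, ys, k=5, seed=0):
    """Average accuracy over k folds, each fold held out from training once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        scores.append(accuracy([xs[i] for i in train], [ys[i] for i in train],
                               [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / len(scores)


# Synthetic small dataset (assumption): 40 samples, labels independent of the feature.
rng = random.Random(42)
xs = [rng.random() for _ in range(40)]
ys = [rng.randint(0, 1) for _ in range(40)]

apparent = accuracy(xs, ys, xs, ys)         # evaluated on the data it was trained on
cv = cross_validated_accuracy(xs, ys, k=5)  # evaluated only on held-out folds

print(f"apparent accuracy:        {apparent:.2f}")
print(f"cross-validated accuracy: {cv:.2f}")
```

Because the labels carry no signal, the perfect apparent accuracy reflects memorization only, and the cross-validated estimate is the more honest indication of how such a model would behave on new data. Any tuning of a model (for example, choosing the number of neighbours) would need to take place inside the training folds for the held-out estimate to remain unbiased.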