Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H, Fei SJ. Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection. World J Gastroenterol 2025; 31(11): 102387 [PMID: 40124266 DOI: 10.3748/wjg.v31.i11.102387]
Corresponding Author of This Article
Su-Juan Fei, MD, Chief Physician, Professor, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, No. 99 West Huaihai Road, Xuzhou 221002, Jiangsu Province, China. xyfyfeisj99@163.com
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Retrospective Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Gastroenterol. Mar 21, 2025; 31(11): 102387. Published online Mar 21, 2025. doi: 10.3748/wjg.v31.i11.102387
Construction and validation of machine learning-based predictive model for colorectal polyp recurrence one year after endoscopic mucosal resection
Yi-Heng Shi, Jun-Liang Liu, Cong-Cong Cheng, Wen-Ling Li, Han Sun, Xi-Liang Zhou, Hong Wei, Su-Juan Fei
Yi-Heng Shi, Jun-Liang Liu, Cong-Cong Cheng, Wen-Ling Li, Su-Juan Fei, Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Yi-Heng Shi, Cong-Cong Cheng, Wen-Ling Li, The First Clinical Medical College of Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China
Han Sun, Xi-Liang Zhou, Department of Gastroenterology, Xuzhou Central Hospital, The Affiliated Xuzhou Hospital of Medical College of Southeast University, Xuzhou 221009, Jiangsu Province, China
Hong Wei, Department of Gastroenterology, Xuzhou New Health Hospital, North Hospital of Xuzhou Cancer Hospital, Xuzhou 221007, Jiangsu Province, China
Author contributions: Shi YH and Liu JL conceived and designed the study; Shi YH, Liu JL, Cheng CC, Li WL and Sun H participated in data processing and statistical analysis; Shi YH, Liu JL, Cheng CC, Li WL, Sun H, Zhou XL, Wei H and Fei SJ drafted the manuscript; Shi YH and Liu JL contributed to data analysis and interpretation; Fei SJ supervised the review of the study; all authors critically revised and approved the final manuscript.
Institutional review board statement: The study was designed in accordance with the Declaration of Helsinki and was conducted according to the TRIPOD guidelines, with ethical approval granted by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University under the approval number XYFY2023-KL360-01.
Informed consent statement: Written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Xuzhou Medical University.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Data sharing statement: No additional data are available.
Received: October 16, 2024
Revised: January 25, 2025
Accepted: February 14, 2025
Published online: March 21, 2025
Processing time: 148 Days and 5.6 Hours
Abstract
BACKGROUND
Colorectal polyps are precancerous lesions of colorectal cancer, and their early detection and resection can effectively reduce colorectal cancer mortality. Endoscopic mucosal resection (EMR) is a common polypectomy procedure in clinical practice, but it has a high postoperative recurrence rate. Currently, there is no predictive model for the recurrence of colorectal polyps after EMR.
AIM
To construct and validate a machine learning (ML) model for predicting the risk of colorectal polyp recurrence one year after EMR.
METHODS
This study retrospectively collected data from 1694 patients at three medical centers in Xuzhou. An additional 166 patients were enrolled to form a prospective validation set. Feature variables were screened using univariate and multivariate logistic regression analyses, and five ML algorithms were used to construct the predictive models. The models were compared on multiple performance metrics to identify the optimal one. Decision curve analysis (DCA) and SHapley Additive exPlanation (SHAP) analysis were performed to assess clinical applicability and predictor importance.
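As an illustration of what DCA measures, the following is a minimal sketch of the standard net-benefit calculation at a single threshold probability. It is not the authors' code; the function names and the toy data are hypothetical and chosen only for this example.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold.

    Standard DCA definition: TP/n - (FP/n) * pt / (1 - pt),
    where pt is the threshold probability.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of the reference strategy that intervenes on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 1 = recurrence within one year, 0 = no recurrence.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.8]

model_nb = net_benefit(y_true, y_prob, 0.5)
all_nb = treat_all_net_benefit(y_true, 0.5)
```

A decision curve repeats this calculation over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).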
RESULTS
Multivariate logistic regression analysis identified 8 independent risk factors for colorectal polyp recurrence one year after EMR (P < 0.05). Among the models, eXtreme Gradient Boosting (XGBoost) demonstrated the highest area under the curve (AUC) in the training set, internal validation set, and prospective validation set, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. DCA indicated favorable clinical utility for the XGBoost model. SHAP analysis identified smoking history, family history, and age as the top three most important predictors in the model.
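For readers unfamiliar with how an AUC and its confidence interval can be obtained, here is a minimal sketch using the rank-based (Mann-Whitney) definition of AUC and a percentile bootstrap. The abstract does not state how the reported intervals were computed, so the bootstrap is an assumption made for illustration; the function names and toy data are hypothetical.

```python
import random


def auc(y_true, y_score):
    """AUC as the probability that a positive case outranks a negative case."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    # Count pairwise wins; ties contribute one half.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the authors')."""
    auc(y_true, y_score)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy data: 1 = recurrence, 0 = no recurrence.
labels = [1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.8]

point_estimate = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores, n_boot=200)
```

With only four toy observations the interval is very wide; the sketch is meant to show the mechanics, not to reproduce the reported values.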
CONCLUSION
The XGBoost model has the best predictive performance and can assist clinicians in providing individualized colonoscopy follow-up recommendations.
Core Tip: This study is the first to use machine learning methods to construct and validate a prediction model for one-year recurrence of colorectal polyps after endoscopic mucosal resection. Key predictors included age, smoking, family history, diarrhea, hazard classification, Helicobacter pylori infection, and the number and size of polyps. According to receiver operating characteristic curves, sensitivity, specificity, accuracy, precision, and F1 scores, the eXtreme Gradient Boosting model has the best performance. Based on this model, an online web calculator was built to help clinicians better distinguish high-risk groups and provide patients with personalized colonoscopy follow-up recommendations.