INTRODUCTION
The use of artificial intelligence (AI) and machine learning (ML) in the medical field has shown promising advancements, particularly in the detection and prediction of colorectal polyps and colorectal cancer. These technologies are being integrated into colonoscopy procedures to enhance the detection rates of polyps, which are precursors to colorectal cancer, and to predict the recurrence of colorectal cancer after treatment. By leveraging electronic health records, imaging data, and histopathological features, ML models aim to enhance the early detection of colonic polyps and improve patient outcomes. A new way to use ML was demonstrated in the study by Shi et al[1] recently published in the World Journal of Gastroenterology, in which they tried to predict polyp recurrence within one year after polypectomy.
Several studies have developed ML models to predict the presence of colorectal polyps using noninvasive methods. For instance, one study used electronic health records to create a diagnostic model based on the adaptive boosting machine algorithm, achieving an area under the curve (AUC) of 0.675, indicating moderate predictive performance for colorectal polyps[2]. Another study focused on deep learning algorithms for real-time polyp detection during colonoscopies, achieving high sensitivity and specificity with an AUC of 0.984, demonstrating the potential of ML in enhancing colonoscopy accuracy[3]. ML has also been used to differentiate between benign and premalignant polyps on computed tomography colonography. A random forest model achieved an AUC of 0.91, showing promise in noninvasively distinguishing polyp types, which is crucial for determining appropriate treatment strategies[4].
According to recent meta-analyses, AI-based systems have significantly improved the detection rates of adenomas and polyps during colonoscopy. Studies have shown that colonoscopies utilizing AI have higher adenoma detection rates and polyp detection rates than colonoscopies without AI. Specifically, AI systems increased adenoma detection rates to 29.6% compared to 19.3% without AI, and polyp detection rates to 45.4% compared to 30.6% without AI, demonstrating a substantial improvement in detection capabilities[5,6]. These systems are particularly effective in identifying small non-advanced adenomas, although they do not show a significant difference in detecting advanced adenomas[5]. Moreover, these systems have demonstrated high accuracy in histology prediction, with an AUC of 0.96, and have outperformed non-expert endoscopists in both detection and characterization tasks[7]. Timely detection and removal of colonic polyps with subsequent appropriate surveillance programs are crucial for reducing mortality due to colon cancer. The recurrence rate of colonic polyps after endoscopic mucosal resection (EMR) varies depending on the technique. Generally, recurrence rates are higher with standard EMR than with advanced techniques. The local recurrence rate for polyps ≥ 10 mm removed with standard EMR varies between 11% and 20% up to 12 months follow-up according to different meta-analyses[8,9]. Advanced EMR techniques, such as cold snare EMR[10], argon plasma coagulation, snare tip soft coagulation[11], and endoscopic submucosal dissection[12] significantly reduce recurrence rates.
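The pooled adenoma and polyp detection rates quoted above can be restated as absolute and relative gains with simple arithmetic. The following is a minimal illustrative sketch; the function name `rate_improvement` is hypothetical, and the inputs are only the percentages cited in the text.

```python
def rate_improvement(with_ai, without_ai):
    """Return (absolute gain in percentage points, relative ratio) for two rates in %."""
    absolute = with_ai - without_ai   # difference in percentage points
    relative = with_ai / without_ai   # how many times higher the rate is with AI
    return round(absolute, 1), round(relative, 2)

# Rates (in %) as quoted in the text for colonoscopy with vs without AI
adr = rate_improvement(29.6, 19.3)   # adenoma detection rate
pdr = rate_improvement(45.4, 30.6)   # polyp detection rate
print(adr, pdr)
```

Under these figures, both rates are roughly one and a half times higher with AI assistance.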
Unfortunately, the high risk of metachronous polyps after bowel screening polypectomy requires a surveillance program that is determined by the characteristics of the removed polyp, such as size, number, and morphology. According to these criteria, patients are classified as high-risk or intermediate/low-risk, and corresponding intervals for surveillance are established by guidelines[13-18]. Despite the obvious impact of surveillance on the early diagnosis and curative treatment of colon cancer, it has been widely criticized for its poor cost efficacy, low compliance, high demand for resources, and underestimation of the role of patient characteristics in the risk of metachronous polyps after bowel screening polypectomy. Up to 20% of total colonoscopies are performed for surveillance after polypectomy, but only 36.6% of them were prescribed correctly[19]; meanwhile, during the same period, the demand for colonoscopy for average-risk screening increased nearly 3-fold[20]. This means that more resources will be necessary for screening, while at least one-fifth of them are already used for surveillance programs that are often non-compliant. Compliance with surveillance guidelines varies significantly. In Israel, only 57.4% of the recommendations for surveillance were compatible with the guidelines, whereas 37% of the recommendations were for a shorter interval[21]. Some studies reported high adherence rates, such as 86.5% compliance with British guidelines[22], while others indicated much lower adherence, such as 13.8% compliance with American guidelines[23]. Interestingly, in the latter study, 25.5% of the patients underwent surveillance endoscopy earlier than recommended, and none were diagnosed with malignancy. However, 45.8% of the patients had surveillance scopes later than recommended or were lost to follow-up. Among these patients, two were diagnosed with malignancy 3 and 5 years after their recommended surveillance scope date, respectively[23].
A recent meta-analysis showed that 38% of surveillance colonoscopies were performed earlier than their respective national clinical guidelines suggested[24]. Analysis of Medicare beneficiaries in the United States showed that at five years after bowel screening polypectomy only 45.7% received another colonoscopy, with 32.3% of procedures including polypectomy[25]. Moreover, the use of colonoscopy for surveillance decreased over the four-year study period. Coupled with other data showing the overuse of follow-up colonoscopy in patients without polyps, there appears to be a significant discordance between guidelines and actual practice. Compliance variability may contribute to the poor cost-efficacy of surveillance programs, which is one of the clear disadvantages that have become evident in recent years. A decision analysis model comparing strategies for performing or not performing one-year endoscopic surveillance in 60-year-old patients who underwent an initial endoscopic polypectomy demonstrated that 345 colonoscopies per year are needed to detect one colorectal cancer case and 1437 colonoscopies to prevent one colorectal cancer-related death[26].
Extending intervals for surveillance colonoscopy for high-risk patients to 3 years, which is accepted in the majority of guidelines, definitely reduces costs but does not seem to increase the efficacy of surveillance. A retrospective analysis involving 33011 patients who had adenomas removed during colonoscopies at 17 hospitals in the United Kingdom revealed that, in the absence of surveillance, the incidence of colorectal cancer was comparable to that of the general population for both low-risk [standardized incidence ratios: 0.86, 95% confidence interval (CI): 0.73-1.02] and intermediate-risk (1.16, 0.97-1.37) groups. However, it was notably higher among high-risk patients (1.91, 1.39-2.56)[27]. Notably, only 9% of the study population was classified as high-risk. The authors concluded that low- or intermediate-risk patients could be managed by screening rather than surveillance. In line with these data, updated guidelines in many countries simplify the findings of an index colonoscopy into two categories: “low-risk”, in which surveillance is not needed, and “high-risk”, for which surveillance is recommended[13,18,28]. However, the United States Multi-Society Task Force classifies them into six risk categories with different recommendations[17].
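For readers less familiar with the metric, the standardized incidence ratio quoted above is conventionally defined as the ratio of observed to expected cases. The symbols O and E are notation introduced here for illustration and are not taken from the cited study.

```latex
\mathrm{SIR} = \frac{O}{E}
```

Here O is the number of colorectal cancer cases observed in the cohort and E is the number expected if the cohort had the same age- and sex-specific incidence as the general population. A value of 1 therefore indicates the same incidence as the general population, and a 95%CI that includes 1 (as for the low- and intermediate-risk groups) is compatible with no difference.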
A multicenter retrospective study conducted in the United Kingdom, which included patients who underwent polypectomy during screening colonoscopy (2009-2016) followed by surveillance, showed that the rate of non-advanced and advanced metachronous polyps was higher in British Society of Gastroenterology (BSG) 2020 high-risk vs low-risk patients (44.4% vs 35.4% for non-advanced and 15.7% vs 11.8% for advanced, P < 0.001), but the colorectal cancer rate was similar (0.6% vs 1.2%)[29]. This means that the BSG 2020 criteria correlated with metachronous polyps but did not differentiate between advanced and non-advanced lesions and were not predictive of cancer. The results of these studies indicate that risk stratification may benefit from refinement. It seems logical to evaluate the addition of a panel of novel risk factors for metachronous lesion development to the existing risk scores based on polyp number and morphology. Multiple factors such as patient characteristics or proteomic and genomic features of the index polyp tissue may be used[30], with the aim of increasing the positive yield of surveillance colonoscopy and reducing unnecessary procedures for those at a lower risk.
According to a meta-analysis, the detection of colorectal cancer and advanced polyps during surveillance colonoscopy in older individuals was higher than that in younger controls; however, the absolute risk increase for both was small[31]. In most guidelines, the age of the patient is used as a criterion for restricting surveillance rather than as a risk factor for polyp recurrence. For example, the guidelines of the BSG indicate that surveillance should only be performed in people whose life expectancy is greater than 10 years, and in general not in people older than 75 years[13]. The European Society of Gastrointestinal Endoscopy guidelines recommend cessation of surveillance at the age of 80 years or if life expectancy is short due to comorbidities[28]. In contrast, Japanese guidelines do not mention patient age, as surveillance endoscopy is continued even for patients in their 80s because of the country's life expectancy, the longest in the world[14].
Obesity and metabolic syndrome components are also considered important risk factors for metachronous polyps after bowel screening polypectomies. In a retrospective cohort study including 7473 participants with a median follow-up of 8.5 years after index polypectomy, 619 participants (8.3%) developed advanced colorectal neoplasms. Weight gain of ≥ 3% from baseline was reported as an independent risk factor for metachronous advanced colorectal neoplasm in both men and women, regardless of age[32]. Interestingly, weight loss due to bariatric surgery mitigates the risk of metachronous polyps, mainly in men, which coincided with improvement in metabolic syndrome parameters[33,34]. However, components of metabolic syndrome may be associated with different types of lesions. A case-control study of 828 subjects without diabetes and with no family or personal history of colorectal cancer showed that abdominal obesity, hypertension, and high HbA1c percentage were independently associated with adenomas, whereas a high triglyceride to high-density lipoprotein cholesterol ratio was associated with serrated polyps[35]. These patient characteristics [such as body mass index (BMI), metabolic syndrome components, and routine blood analyses] are easily accessible through electronic health records and may help identify patients at high risk of metachronous polyps after bowel screening polypectomy.
Smoking is a significant risk factor for recurrence of colon polyps after polypectomy. The risk is notably higher in those with a history of heavy smoking, as indicated by pack-years[36-38]. Current smokers also show increased odds of developing hyperplastic polyps, particularly in the distal colon[39]. While smoking cessation reduces the risk slightly, former smokers still face a higher risk of recurrence than never-smokers[36,40]. This finding suggests that the effects of smoking on polyp recurrence are not entirely reversible and underscores the importance of smoking cessation programs as part of post-polypectomy care to mitigate the risk of recurrence.
Whether adding these or more patient characteristics to the established risk factors for metachronous polyps after bowel screening polypectomy will improve the efficacy of surveillance is unknown. However, the results of a study recently published in the World Journal of Gastroenterology, in which Shi et al[1] constructed an ML-based predictive model for the recurrence of colonic polyps one year after polypectomy, help clarify this point. Data from 1694 patients who underwent their first EMR for colorectal polyp removal with a one-year follow-up colonoscopy were retrospectively collected at three medical centers. In addition, 166 patients were prospectively enrolled to test the generalizability of the model. The dataset from the retrospective cohort was randomly divided into training and validation sets. The training set was used to develop the model, allowing it to learn data patterns and extract effective features, whereas the validation set was used to evaluate the model’s performance and identify any overfitting. To build the predictive models, various ML algorithms were utilized, including support vector machine, random forest, decision trees, and Extreme Gradient Boosting (XGBoost). Finally, an interactive and visual web-based calculator was developed.
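The random division of the retrospective cohort described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 70/30 split ratio, the fixed seed, the function name `split_cohort`, and the use of integer patient identifiers are all hypothetical, as the text does not specify how the authors performed the split.

```python
import random

def split_cohort(patient_ids, train_fraction=0.7, seed=42):
    """Randomly partition patient IDs into training and validation sets.

    train_fraction and seed are illustrative assumptions, not values from the study.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)          # fixed seed makes the split reproducible
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

# 1694 is the size of the retrospective cohort quoted in the text
train_ids, valid_ids = split_cohort(range(1694))
print(len(train_ids), len(valid_ids))
```

Because each patient appears in exactly one of the two sets, performance measured on the validation set is not inflated by records the model has already seen, which is what allows overfitting to be detected.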
In constructing the model, the authors used polyp-related features that are typical of all modern surveillance guidelines, together with patient-related variables such as age, sex, family history, BMI, Helicobacter pylori (H. pylori) infection, smoking, hematochezia, constipation, diarrhea, diabetes, hypertension, coronary heart disease, hyperlipidemia, and alcohol consumption. They also used serum levels of uric acid, total bilirubin, total bile acid, hypersensitive C-reactive protein, carcinoembryonic antigen, and carbohydrate antigens (CA, including CA724, CA199, and CA242). Multivariate analysis revealed that eight variables were independent predictors of colorectal polyp recurrence one year after EMR. These variables included age, family history, smoking, diarrhea, polyp size, number of polyps, H. pylori infection, and hazard classification (non-neoplastic polyps as reference, non-progressive adenoma, and progressive adenoma). Among the models, XGBoost demonstrated the highest AUC in the training, internal validation, and prospective validation sets, with AUCs of 0.909 (95%CI: 0.89-0.92), 0.921 (95%CI: 0.90-0.94), and 0.963 (95%CI: 0.94-0.99), respectively. The importance ranking of the feature variables in the XGBoost model, from highest to lowest, was as follows: smoking, family history, age, number of polyps ≥ 3, progressive adenoma, diarrhea, H. pylori infection, polyp size > 1 cm, non-progressive adenoma, and polyp size 0.5-1 cm.
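To make the reported metric concrete, the sketch below computes an AUC and a confidence interval on invented toy data. It is an assumption-laden illustration: the text does not state how the 95%CIs were obtained, so a percentile bootstrap is used here as one common option, and the function names, labels, and scores are all hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                          # skip one-class resamples
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented toy data: 1 = recurrence at one year, scores = predicted risk
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.50, 0.90]
point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(point, lower, upper)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of patients with and without recurrence, which is why values above 0.9 are regarded as high.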
Among the four ML algorithms used by the authors, XGBoost demonstrated the highest AUCs for all datasets. However, different ML models have been used successfully in other studies. The random forest ML model demonstrated good performance, with an AUC of 0.859 for young-onset colorectal cancer risk stratification[41]. The Light Gradient Boosting Machine algorithm was successfully used for the development and internal validation of an ML-based colorectal cancer risk prediction model[42]. The adaptive boosting machine model exhibited the best performance among nine ML models in the development and validation of ML algorithms for the prediction of colorectal polyps based on electronic health records[2]. Therefore, it is a good approach to test as many ML algorithms suitable for the selected task as possible and to choose the one that demonstrates the best performance.
The high accuracy in predicting recurrent polyps is based on the unique approach in which ML provides a weighted importance ranking of all risk factors against each other[1]. This can be clearly demonstrated by entering different values for patient variables into the developed application (https://webcalculatorsyh.shinyapps.io/XGBoost/) (Figure 1). From Figure 1, it is clear that a 60-year-old patient with fewer than three non-neoplastic polyps (≥ 0.5 cm) has a 10% chance of recurrence after 1 year. However, patients with the same polyp features and age, but who are H. pylori-positive smokers with diarrhea, have a higher chance of recurrence (> 86%). Smokers who differed only in age (60 years old vs 30 years old) showed a threefold difference in the chance of recurrence (39.1% vs 12.8%). Interestingly, the importance ranking of the patient-related variables in some scenarios was higher than that of the polyp-related variables.
A clear advantage of this approach is the opportunity to personalize surveillance. According to guidelines based only on polyp features, all patients shown in Figure 1 would be excluded from the surveillance program and moved to screening, even when the predicted chance of recurrence is 86%. A limitation of this model is the dataset on which it was trained. The model provides excellent results in predicting recurrent colonic polyps in the Chinese population dataset on which it was trained, but will it also be effective in other populations? In China, a large multicenter study found that recurrent polyps after removal occur almost entirely within the first year, with a rate approaching 60%[43]. Therefore, the surveillance intervals recommended by the Chinese expert consensus on colonoscopy are significantly shorter than those recommended by other guidelines. If a similar model is developed for other populations, it will be necessary to create representative datasets, and it is highly likely that the list of independent predictors of polyp recurrence will be different and that their importance ranking will not be similar to that of this model. For example, in economically developed countries in the western hemisphere, the prevalence of H. pylori is much lower than that in China, and the prevalence of obesity and metabolic syndrome is higher, which may change their importance ranking in prediction models for colonic polyp recurrence.
Developing ML models for medical applications faces several significant barriers to generalization, that is, to the ability of a model to perform effectively across diverse patient populations and clinical settings. Medical data often exhibit substantial variability due to differences in patient demographics, disease prevalence, and clinical practices across institutions. This heterogeneity can lead to models that perform well on training data but fail to generalize to new settings. In addition, issues such as incomplete or inconsistent data impede model reliability[44]. Variations in data collection methods, equipment, and electronic health record systems across healthcare institutions introduce inconsistencies that hinder model generalization. Differences in coding definitions, laboratory assays, and imaging protocols can result in models that are not transferable between settings[45]. Fortunately, for an ML model predicting the recurrence of colonic polyps, some barriers are not insurmountable. Endoscopic equipment in many countries is comparable in terms of technical capabilities, and guidelines for colon preparation, colonoscopy procedure protocols, and polyp description and classification are well standardized, so at least the part of the ML algorithm that uses polyp-related features should be less hindered in generalization. If the training data are not representative of the broader patient population, ML models may perpetuate existing biases, leading to disparities in healthcare outcomes. For example, models trained predominantly on data from one demographic or ethnic group may underperform when applied to under-represented populations. This type of barrier can be overcome only by collecting a new dataset from a representative population and repeating the protocol of Shi et al[1].
In other words, distributing toolkits or shareable data science notebooks with which researchers can train and validate models locally seems to be the only way to effectively implement ML models for predicting the recurrence of colonic polyps in different settings.
Integrating AI into medical decision making raises several important ethical considerations that must be carefully addressed to ensure patient safety, equity, and trust within healthcare systems. First of all, patient privacy and data security have emerged as significant concerns owing to the large volumes of sensitive patient data required by ML. Current data protection frameworks may not fully safeguard against unauthorized access or misuse, highlighting the need for stronger protection to prevent privacy breaches and data exploitation[46]. This is a major problem for any ML study that uses patient factors, which sometimes extend far beyond the depersonalized results of a food frequency questionnaire or the number of cigarettes smoked. Thus, the concept of informed consent requires careful consideration. Patients should not only be explicitly informed about AI involvement in their care, including the role of AI, potential benefits, and associated risks, but should also decide which of their data may be used for ML training, ensuring respect for patient autonomy. However, more important for the implementation of the results of the study is the problem of dataset bias, as AI-driven systems have the potential to perpetuate biases found in historical healthcare data, leading to disparities in diagnosis, treatment, prediction of recurrence, and overall care[47]. Therefore, it is essential to develop and implement algorithms trained on diverse and representative datasets to mitigate these biases and promote fairness. Shi et al[1] attempted to reduce the dataset bias by randomly dividing the dataset from a retrospective cohort of 1694 patients into training and validation sets. However, this does not exclude the bias related to the ethnically and geographically homogeneous population included in the dataset, and the time frame selected by the authors for retrospective inclusion of cases may be even more important for dataset bias.
Endoscopic diagnostic techniques used during the last 2 years are far more sophisticated than those used 10 years ago; therefore, bias may arise from differences in the evaluation of polyps, because the authors used polyp-related features in constructing the model. The authors selected a 5-year period for inclusion of cases from three hospitals in the dataset[1], which seems minimally acceptable considering the very fast implementation of innovations in endoscopy. For many AI-driven systems, the issue of transparency and explainability arises due to the “black box” nature of certain ML models, potentially limiting healthcare professionals’ ability to interpret and trust AI-driven recommendations. Therefore, enhancing the interpretability of AI systems is critical for maintaining accountability and supporting informed clinical decision-making. Fortunately, for the ML model used by Shi et al[1], the association of every selected factor with increased colonic polyp recurrence can be easily explained and is supported by a number of published studies. However, it is more interesting why other factors, such as BMI or metabolic syndrome components, which are known risk factors for colonic polyps, were not selected during the ML process. It is widely agreed that AI should function as a support rather than a replacement for healthcare professionals’ judgment, and professional oversight remains crucial to ensure that clinical decisions integrate AI recommendations with clinical expertise and patient values[48]. In the case of the prediction of polyp recurrence, it is clear that the ML algorithm proposed by Shi et al[1] already includes the polyp-related risk factors on which existing guidelines for surveillance are based, but it provides healthcare professionals with a more personalized and better risk assessment tool rather than replacing existing standards of care.
The advent of ML techniques for colonic polyp detection has generated a significant shift towards enhancing the efficacy and cost-effectiveness of screening procedures. AI-assisted colonoscopy has consistently demonstrated the ability to increase adenoma detection rates and reduce adenoma miss rates compared with standard colonoscopy. Studies have shown that AI systems can detect more adenomas per colonoscopy and improve the polyp detection rate, particularly for diminutive and serrated lesions, which are often missed during conventional procedures[49-51]. By enhancing detection accuracy, AI reduces the likelihood of interval colorectal cancers, which develop between screening intervals, thereby lowering the long-term costs associated with cancer treatment[52]. AI systems enable real-time characterization of polyps, distinguishing between neoplastic and non-neoplastic lesions with high accuracy. This capability supports the adoption of “leave-in-situ” and “resect-and-discard” strategies, which avoid unnecessary polypectomies and histopathological examinations for benign lesions[53,54]. For instance, a study in Spain found that AI-assisted colonoscopy avoided 173 polypectomies and 370 histopathologies per 1000 patients, leading to significant cost savings[55].
AI systems provide real-time assistance during colonoscopy, reducing the time required for examination by optimizing lesion detection and characterization. This increased efficiency enables endoscopists to perform more procedures within the same timeframe, thereby improving productivity and reducing operational costs[56]. Additionally, AI can standardize quality metrics such as withdrawal time and bowel preparation adequacy, further enhancing the overall efficiency of colonoscopy services[57]. AI can enhance the performance of noninvasive screening tests, such as fecal immunochemical tests, by improving the detection of advanced adenomas and early-stage cancers. A novel AI-based algorithm combining stool biomarkers and fecal immunochemical test analysis achieved high sensitivity (82.2%) for advanced adenomas and specificity (90.1%) for non-neoplastic lesions, reducing the need for unnecessary colonoscopies and optimizing the diagnostic workflow[58,59].
Multiple cost-effectiveness analyses have demonstrated that AI-assisted colonoscopy is a cost-effective strategy for colorectal cancer screening. For example, a Canadian study found that AI-assisted colonoscopy was the dominant strategy, meaning that it was both more effective and less costly than conventional colonoscopy[60]. Similarly, an Italian study reported that the implementation of AI systems resulted in cost savings per patient, primarily due to the reduced costs associated with colorectal cancer care[53]. AI-assisted surveillance programs can optimize the colonoscopy capacity by reducing the number of procedures required for low-risk patients. Personalized risk stratification based on polyp characteristics and patient risk factors allows for tailored surveillance intervals (lengthened or shortened) and is cost-effective[61]. This approach not only reduces healthcare costs but also alleviates the burden on endoscopic resources, making screening programs more scalable, especially in low-income and middle-income countries[54]. The study performed by Shi et al[1] is the first in which an ML algorithm was developed and validated for personalized surveillance. Only further studies can demonstrate whether it is cost-effective compared with existing surveillance programs. There are two possible scenarios. In countries such as China or Japan, where shorter intervals for colonoscopy after polyp removal are usually recommended, ML-based surveillance programs may reduce the number of unnecessary colonoscopies. In contrast, in the United States and the EU, where longer intervals for surveillance or even allocation of low-risk patients into screening programs are recommended, ML-based surveillance programs may achieve cost efficacy by decreasing colon cancer incidence due to a higher rate of detection of recurrent polyps. Despite the promising cost-effectiveness of AI-assisted screening and surveillance programs, several challenges remain.
The high upfront costs of AI systems and the lack of reimbursement frameworks in many healthcare systems pose significant barriers to their widespread adoption[62]. Additionally, the long-term clinical benefits of AI in reducing colorectal cancer incidence and mortality have not yet been fully established, necessitating further research to validate its impact.