Minireviews
Copyright ©The Author(s) 2021.
World J Gastroenterol. May 7, 2021; 27(17): 1920-1935
Published online May 7, 2021. doi: 10.3748/wjg.v27.i17.1920
Table 2 Artificial intelligence in assessment of disease severity in inflammatory bowel disease

| Ref. | AI classifier vs comparator | IBD type | Study design and sample size | Modality | Outcomes | Study results/validation cohort |
| --- | --- | --- | --- | --- | --- | --- |
| Kumar et al[40], 2012 | Support vector machines (SVM) vs human observers | CD | Cross-sectional, 50000 images (number of patients not given) | Small bowel capsule endoscopy | Endoscopic inflammation | Database of 47 studies including 50000 capsule endoscopy images evaluating severity of small bowel lesions. The method showed good precision (> 90%) and recall (> 90%) for detecting lesions of varying severity. Validation cohort included |
| Biasci et al[41], 2019 | Logistic regression with an adaptive elastic-net penalty. No comparator | CD/UC | Prospective cohort, 118 IBD patients | Transcriptomics from purified CD8 T cells and/or whole blood | Disease severity, medication escalation | A 17-gene qPCR-based classifier stratified patients into two distinct subgroups. IBDhi patients experienced significantly more aggressive disease than IBDlo patients (analogous to IBD2), with earlier need for treatment escalation [HR = 2.65 (CD), 3.12 (UC)] and more escalations over time [for multiple escalations within 18 months: sensitivity = 72.7% (CD), 100% (UC); negative predictive value (NPV) = 90.9% (CD), 100% (UC)]. Validation cohort included |
| Waljee et al[42], 2019 | Random forest (RF). No comparator | CD | Post-hoc analysis of prospective clinical trials, 401 CD patients | Clinical and laboratory data from publicly available clinical trials (UNITI-1, UNITI-2, and IM-UNITI) | Crohn's disease remission, C-reactive protein < 5 mg/L | A prediction model using the week-6 albumin to C-reactive protein ratio had an area under the curve (AUC) of 0.76 [95% confidence interval (CI): 0.71-0.82]. Validation cohort included |
| Mahapatra et al[43], 2016 | RF. No comparator | CD | Cross-sectional, 35 CD patients | Abdominal magnetic resonance imaging | Segmentation of diseased colon (intestinal inflammation) | Model segmentation accuracy ranged from 82.7% to 92.2%. Validation cohort included |
| Reddy et al[44], 2019 | Gradient boosting machines vs logistic regression | CD | Retrospective, 3335 CD patients | Electronic medical record | Severity of intestinal inflammation (by C-reactive protein) | Machine-learning methods such as gradient boosting machines predicted inflammation severity with very high accuracy (AUC = 92.82%). Validation cohort included |
| Douglas et al[45], 2018 | RF. No comparator | Pediatric CD | Cross-sectional, 20 CD patients, 20 healthy controls | Shotgun metagenomics (MGS), 16S rRNA gene sequencing | Disease state (relapse/remission) | MGS modules significantly classified samples by disease state (accuracy = 68.4%, P = 0.043 and accuracy = 65.8%, P = 0.03, respectively); the 16S datasets reached a maximum accuracy for disease state of 68.4% (P = 0.016) at the strain level. Validation cohort included |
| Maeda et al[46], 2019 | SVM vs human reader | UC | Retrospective cohort, 187 UC patients | Endocytoscopy | Histologic inflammation | Computer-aided diagnosis (CAD) of histologic inflammation provided diagnostic sensitivity, specificity, and accuracy of 74% (95%CI: 65-81), 97% (95%CI: 95-99), and 91% (95%CI: 83-95), respectively. Its reproducibility was perfect (κ = 1). Validation cohort included |
| Charisis et al[47], 2016 | SVM vs human reader | CD | Retrospective cohort, 13 CD patients | Wireless capsule endoscopy (WCE) images | Endoscopic inflammation | The hybrid adaptive filtering with differential lacunarity analysis (HAF-DLac) SVM approach clearly outperformed related methods for automated lesion detection in WCE image analysis, achieving up to 93.8% accuracy, 95.2% sensitivity, 92.4% specificity, and 92.6% precision. Validation cohort included |
| Klang et al[48], 2020 | Convolutional neural network (CNN) vs human reader | CD | Retrospective cohort, 49 CD patients | WCE images | Endoscopic inflammation | The dataset included 17640 capsule endoscopy images from 49 patients: 7391 images with mucosal ulcers and 10249 images of normal mucosa. For randomly split images, the AUC was 0.99, with accuracies ranging from 95.4% to 96.7%. For individual patient-level experiments, the AUCs were 0.94-0.99. Validation cohort included |
| Ungaro et al[49], 2021 | Random survival forest. No comparator | Pediatric CD | Retrospective case-control, 265 pediatric CD patients | Protein biomarkers using a proximity extension assay (Olink Proteomics) | Penetrating and stricturing complications | A model with 5 protein markers predicted penetrating complications with an AUC of 0.79 (95%CI: 0.76-0.82), compared to 0.69 (95%CI: 0.66-0.72) for serologies and 0.74 (95%CI: 0.71-0.77) for clinical variables. A model with 4 protein markers predicted stricturing complications with an AUC of 0.68 (95%CI: 0.65-0.71), compared to 0.62 (95%CI: 0.59-0.65) for serologies and 0.52 (95%CI: 0.50-0.55) for clinical variables. Validation cohort included |
| Barash et al[50], 2021 | Ordinal CNN. No comparator | CD | Retrospective cohort, 49 CD patients | WCE images | Ulcer severity grading | The classification accuracy of the algorithm was 0.91 (95%CI: 0.867-0.954) for grade 1 vs grade 3 ulcers, 0.78 (95%CI: 0.716-0.844) for grade 2 vs grade 3, and 0.624 (95%CI: 0.547-0.701) for grade 1 vs grade 2. Validation cohort included |
| Lamash et al[51], 2019 | CNN vs semi-supervised and active learning models | CD | Retrospective cohort, 23 CD patients | Magnetic resonance imaging | Active Crohn's disease | The CNN exhibited Dice similarity coefficients of 75% ± 18%, 81% ± 8%, and 97% ± 2% for the lumen, wall, and background, respectively. The extracted markers of wall thickness at the location of minimal radius (P = 0.0013) and the median value of relative contrast enhancement (P = 0.0033) differentiated active from nonactive disease segments. Other extracted markers differentiated segments with strictures from segments without strictures (P < 0.05). Validation cohort included |
| Takenaka et al[52], 2020 | Deep neural networks vs human reader (endoscopist) | UC | Prospective cohort, 2012 UC patients | Colonoscopy images | Endoscopic inflammation | The deep neural network identified patients with endoscopic remission with 90.1% accuracy (95%CI: 89.2-90.9) and a kappa coefficient of 0.798 (95%CI: 0.780-0.814), using findings reported by endoscopists as the reference standard. Validation cohort included |
| Bossuyt et al[53], 2020 | Computer algorithm based on red density (RD) vs blinded central readers | UC | Prospective cohort, 29 UC patients, 6 healthy controls | Colonoscopy images | Endoscopic and histologic inflammation | In the construction cohort, RD correlated with the Robarts histopathology index (RHI) (r = 0.74, P < 0.0001), Mayo endoscopic subscores (r = 0.76, P < 0.0001), and endoscopic index of severity scores (r = 0.74, P < 0.0001). The sensitivity to change of RD had a standardized effect size of 1.16. In the validation set, RD correlated with RHI (r = 0.65, P = 0.00002). Validation cohort included |
| Bhambhvani et al[54], 2021 | CNN vs human reader (endoscopist) | UC | Retrospective cohort, 777 UC patients | Colonoscopy images | Mayo endoscopic score (MES) | The final model classified MES 3 disease with an AUC of 0.96, MES 2 disease with an AUC of 0.86, and MES 1 disease with an AUC of 0.89. Overall accuracy was 77.2%. Across MES 1, 2, and 3, average specificity was 85.7%, average sensitivity was 72.4%, average positive predictive value (PPV) was 77.7%, and average NPV was 87.0%. Validation cohort included |
| Ozawa et al[55], 2019 | CNN vs human reader (endoscopist) | UC | Retrospective cohort, 841 UC patients | Colonoscopy images | MES | The CNN-based CAD system showed a high level of performance, with AUCs of 0.86 and 0.98 for identifying Mayo 0 and Mayo 0-1, respectively. Performance was better for the rectum than for the right and left sides of the colon when identifying Mayo 0 (AUC = 0.92, 0.83, and 0.83, respectively). Validation cohort included |
| Bossuyt et al[56], 2021 | Automated CAD algorithm vs human reader | UC | Prospective cohort, 48 UC patients | Colonoscopy images with confocal laser endomicroscopy | Histologic remission | The automated CAD algorithm detected histologic remission with high performance (sensitivity 0.79, specificity 0.90) compared with the ulcerative colitis endoscopic index of severity (UCEIS) (sensitivity 0.95, specificity 0.69) and the MES (sensitivity 0.98, specificity 0.61). No validation cohort included |
| Stidham et al[57], 2019 | CNN vs human reader | UC | Retrospective cohort, 3082 UC patients | Colonoscopy images | Endoscopy severity | The CNN was excellent at distinguishing endoscopic remission from moderate-to-severe disease, with an AUC of 0.966 (95%CI: 0.960-0.972), a PPV of 0.87 (95%CI: 0.85-0.88), a sensitivity of 83.0% (95%CI: 80.8-85.4), a specificity of 96.0% (95%CI: 95.1-97.1), and an NPV of 0.94 (95%CI: 0.93-0.95). No validation cohort included |
| Gottlieb et al[58], 2021 | Neural network vs human central reader | UC | Prospective cohort, 249 UC patients | Colonoscopy images | Endoscopy severity | The model's agreement metrics were excellent, with a quadratic weighted kappa of 0.844 (95%CI: 0.787-0.901) for the endoscopic Mayo score and 0.855 (95%CI: 0.80-0.91) for the UCEIS. No validation cohort included |
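The studies above report a recurring set of performance measures: AUC, sensitivity, specificity, and, for ordinal endoscopic scores, the quadratic weighted kappa. As a brief orientation, the minimal sketch below shows how such metrics are typically computed with scikit-learn; the labels, scores, and 0.5 cutoff are synthetic placeholders for illustration only, not data from any of the cited studies.

```python
# Illustration of the evaluation metrics reported in Table 2. All data
# below are synthetic placeholders, not values from the cited studies.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

rng = np.random.default_rng(seed=0)

# Binary task (e.g., endoscopic remission vs active disease).
y_true = rng.integers(0, 2, size=200)                       # ground-truth labels
y_score = np.clip(0.6 * y_true + rng.normal(0.2, 0.25, 200), 0.0, 1.0)

auc = roc_auc_score(y_true, y_score)                        # area under ROC curve
y_pred = (y_score >= 0.5).astype(int)                       # hypothetical cutoff
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                                # true-positive rate
specificity = tn / (tn + fp)                                # true-negative rate

# Ordinal task (e.g., Mayo endoscopic subscore 0-3). Quadratic weighting
# penalizes large grading disagreements more than near-misses, which is why
# it is commonly reported for ordinal endoscopic scores.
reader_grades = rng.integers(0, 4, size=200)                # central reader
model_grades = np.clip(reader_grades + rng.integers(-1, 2, size=200), 0, 3)
qwk = cohen_kappa_score(reader_grades, model_grades, weights="quadratic")

print(f"AUC={auc:.3f} sens={sensitivity:.3f} spec={specificity:.3f} QWK={qwk:.3f}")
```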