Copyright
©The Author(s) 2021.
World J Gastroenterol. May 7, 2021; 27(17): 1920-1935
Published online May 7, 2021. doi: 10.3748/wjg.v27.i17.1920
Table 2 Artificial Intelligence in assessment of disease severity in inflammatory bowel disease
Ref. | AI classifier vs comparator | IBD type | Study design and sample size | Modality | Outcomes | Study results/validation cohort |
Kumar et al[40], 2012 | Support vector machines (SVM) vs human observers | CD | Cross-sectional, 50000 images (number of patients not given) | Small bowel capsule endoscopy | Endoscopic Inflammation | Database of 47 studies including 50000 capsule endoscopy images evaluating severity of small bowel lesions. Method had good precision (> 90% for lesion detection) and recall (> 90%) for lesions of varying severity. Validation cohort included |
Biasci et al[41], 2019 | Logistic regression with an adaptive Elastic-Net penalty. No comparator | CD/UC | Prospective cohort, 118 IBD patients | Transcriptomics from purified CD8 T cells and/or whole blood | Disease severity, medication escalation | A 17-gene qPCR-based classifier stratified patients into two distinct subgroups. IBDhi patients experienced significantly more aggressive disease than IBDlo patients (analogous to IBD2), with earlier need for treatment escalation [HR 2.65 (CD), 3.12 (UC)] and more escalations over time [for multiple escalations within 18 months: sensitivity = 72.7% (CD), 100% (UC); negative predictive value = 90.9% (CD), 100% (UC)]. Validation cohort included |
Waljee et al[42], 2019 | RF. No comparator | CD | Post-hoc analysis of prospective clinical trials, 401 CD patients | Clinical and laboratory data from publicly available clinical trials (UNITI-1, UNITI-2, and IM-UNITI) | Crohn's disease remission, C-reactive protein < 5 mg/L | A prediction model using the week-6 albumin to C-reactive protein ratio had an AUC of 0.76 [95% confidence interval (CI): 0.71-0.82]. Validation cohort included |
Mahapatra et al[43], 2016 | RF. No comparator | CD | Cross-sectional, 35 CD patients | Abdominal magnetic resonance imaging | Segmentation of diseased colon (intestinal inflammation) | Model segmentation accuracy ranged from 82.7% to 92.2%. Validation cohort included |
Reddy et al[44], 2019 | Gradient boosting machines vs logistic regression | CD | Retrospective, 3335 CD patients | Electronic medical record | Severity of intestinal inflammation (by C-reactive protein) | Machine-learning-based analytic methods such as gradient boosting machines predicted inflammation severity with high accuracy (AUC = 92.82%). Validation cohort included |
Douglas et al[45], 2018 | RF. No comparator | Peds CD | Cross-sectional, 20 CD patients, 20 healthy controls | Shotgun metagenomics (MGS), 16S rRNA gene sequencing | Disease State (Relapse/Remission) | MGS modules significantly classified samples by disease state (accuracy = 68.4%, P = 0.043 and accuracy = 65.8%, P = 0.03, respectively), 16S datasets had a maximum accuracy of 68.4% and P = 0.016 based on strain level for disease state. Validation cohort included |
Maeda et al[46], 2019 | SVM vs human reader | UC | Retrospective cohort, 187 UC patients | Endocytoscopy | Histologic inflammation | Computer-aided diagnosis (CAD) of histologic inflammation provided diagnostic sensitivity, specificity, and accuracy as follows: 74% (95%CI: 65-81), 97% (95%CI: 95-99), and 91% (95%CI: 83-95), respectively. Its reproducibility was perfect (κ = 1). Validation cohort included |
Charisis et al[47], 2016 | SVM vs human reader | CD | Retrospective cohort, 13 CD patients | Wireless capsule endoscopy (WCE) images | Endoscopic Inflammation | In experiments and in comparison with related methods, the hybrid adaptive filtering with differential lacunarity analysis (HAF-DLac) approach, classified via SVM, outperformed those methods for automated lesion detection in WCE images, with classification results of up to 93.8% (accuracy), 95.2% (sensitivity), 92.4% (specificity) and 92.6% (precision). Validation cohort included |
Klang et al[48], 2020 | Convolutional neural network (CNN) vs human reader | CD | Retrospective cohort, 49 CD patients | WCE images | Endoscopic Inflammation | Dataset included 17640 CE images from 49 patients: 7391 images with mucosal ulcers and 10249 images of normal mucosa. For randomly split images results, AUC was 0.99 with accuracies ranging from 95.4% to 96.7%. For individual patient-level experiments, the AUCs were 0.94-0.99. Validation cohort included |
Ungaro et al[49], 2021 | Random survival forest. No comparator | Peds CD | Retrospective case-control, 265 peds CD patients | Protein biomarkers using a proximity extension assay (Olink Proteomics) | Penetrating and stricturing complications | A model with 5 protein markers predicted penetrating complications with an AUC of 0.79 (95%CI: 0.76-0.82) compared to 0.69 (95%CI: 0.66-0.72) for serologies and 0.74 (95%CI: 0.71-0.77) for clinical variables. A model with 4 protein markers predicted stricturing complications with an AUC of 0.68 (95%CI: 0.65-0.71) compared to 0.62 (95%CI: 0.59-0.65) for serologies and 0.52 (95%CI: 0.50-0.55) for clinical variables. Validation cohort included |
Barash et al[50], 2021 | Ordinal CNN. No comparator | CD | Retrospective cohort, 49 CD patients | WCE images | Ulcer Severity Grading | The classification accuracy of the algorithm was 0.91 (95%CI: 0.867-0.954) for grade 1 vs grade 3 ulcers, 0.78 (95%CI: 0.716-0.844) for grade 2 vs grade 3, and 0.624 (95%CI: 0.547-0.701) for grade 1 vs grade 2. Validation cohort included |
Lamash et al[51], 2019 | CNN vs semi-supervised and active learning models | CD | Retrospective cohort, 23 CD patients | Magnetic resonance imaging | Active Crohn’s Disease | CNN exhibited Dice similarity coefficient of 75% ± 18%, 81% ± 8%, and 97% ± 2% for the lumen, wall, and background, respectively. The extracted markers of wall thickness at the location of min radius (P = 0.0013) and the median value of relative contrast enhancement (P = 0.0033) could differentiate active and nonactive disease segments. Other extracted markers could differentiate between segments with strictures and segments without strictures (P < 0.05). Validation cohort included |
Takenaka et al[52], 2020 | Deep neural networks vs human reader (endoscopist) | UC | Prospective cohort, 2012 UC patients | Colonoscopy images | Endoscopic inflammation | Deep neural network identified patients with endoscopic remission with 90.1% accuracy (95%CI: 89.2-90.9) and a kappa coefficient of 0.798 (95%CI: 0.780-0.814), using findings reported by endoscopists as the reference standard. Validation cohort included |
Bossuyt et al[53], 2020 | Computer algorithm based on red density (RD) vs blinded central readers | UC | Prospective cohort, 29 UC patients, 6 healthy controls | Colonoscopy Images | Endoscopic and histologic inflammation | In the construction cohort, RD correlated with RHI (r = 0.74, P < 0.0001), Mayo endoscopic subscores (r = 0.76, P < 0.0001) and Endoscopic index of severity scores (r = 0.74, P < 0.0001). The RD sensitivity to change had a standardized effect size of 1.16. In the validation set, RD correlated with RHI (r = 0.65, P = 0.00002). Validation cohort included |
Bhambhvani et al[54], 2021 | CNN vs human reader (endoscopist) | UC | Retrospective cohort, 777 UC patients | Colonoscopy images | Mayo Endoscopic Scores (MES) | The final model classified MES 3 disease with an AUC of 0.96, MES 2 disease with an AUC of 0.86, and MES 1 disease with an AUC of 0.89. Overall accuracy was 77.2%. Across MES 1, 2, and 3, average specificity was 85.7%, average sensitivity was 72.4%, average PPV was 77.7%, and average NPV was 87.0%. Validation cohort included |
Ozawa et al[55], 2019 | CNN vs human reader (endoscopist) | UC | Retrospective cohort, 841 UC patients | Colonoscopy images | MES | The CNN-based CAD system showed a high level of performance with AUC of 0.86 and 0.98 to identify Mayo 0 and 0-1, respectively. The performance of the CNN was better for the rectum than for the right side and left side of the colon when identifying Mayo 0 (AUC = 0.92, 0.83, and 0.83, respectively). Validation cohort included |
Bossuyt et al[56], 2021 | Automated CAD Algorithm vs human reader | UC | Prospective cohort, 48 UC patients | Colonoscopy images with confocal laser endomicroscopy | Histologic Remission | The current automated CAD algorithm detects histologic remission with a high performance (sensitivity of 0.79 and specificity of 0.90) compared with the UCEIS (sensitivity of 0.95 and specificity of 0.69) and MES (sensitivity of 0.98 and specificity of 0.61). No validation cohort included |
Stidham et al[57], 2019 | CNN vs human reader | UC | Retrospective cohort, 3082 UC patients | Colonoscopy images | Endoscopy severity | The CNN was excellent for distinguishing endoscopic remission from moderate-to-severe disease with an AUC of 0.966 (95%CI: 0.967-0.972); a PPV of 0.87 (95%CI: 0.85-0.88) with a sensitivity of 83.0% (95%CI: 80.8-85.4) and specificity of 96.0% (95%CI: 95.1-97.1); and NPV of 0.94 (95%CI: 0.93-0.95). No validation cohort included |
Gottlieb et al[58], 2021 | Neural network vs human central reader | UC | Prospective cohort, 249 UC patients | Colonoscopy images | Endoscopy severity | The model's agreement metric was excellent, with a quadratic weighted kappa of 0.844 (95%CI: 0.787-0.901) for endoscopic Mayo Score and 0.855 (95%CI: 0.80-0.91) for UCEIS. No validation cohort included |
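Note on the metrics in Table 2: several rows report sensitivity, specificity, PPV, NPV, accuracy, and the Dice similarity coefficient. As a reading aid, the minimal Python sketch below shows how these quantities are conventionally computed from confusion-matrix counts and from binary segmentation masks. The function names and the example counts are hypothetical and are not taken from any of the cited studies; the individual studies may have used different software or averaging schemes.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    tp, fp, tn, fn are counts of true/false positives/negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # also called recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # also called precision
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def dice_coefficient(pred, truth):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match (an assumption).
    return 2 * overlap / total if total else 1.0


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])
```

With these illustrative counts, sensitivity is 80/100 and specificity is 90/100, while PPV and NPV depend additionally on how many positives and negatives are in the sample, which is why predictive values from cohorts with different disease prevalence are not directly comparable.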
- Citation: Gubatan J, Levitte S, Patel A, Balabanis T, Wei MT, Sinha SR. Artificial intelligence applications in inflammatory bowel disease: Emerging technologies and future directions. World J Gastroenterol 2021; 27(17): 1920-1935
- URL: https://www.wjgnet.com/1007-9327/full/v27/i17/1920.htm
- DOI: https://dx.doi.org/10.3748/wjg.v27.i17.1920