Retrospective Study
Copyright ©The Author(s) 2025.
World J Gastroenterol. Jun 7, 2025; 31(21): 107601
Published online Jun 7, 2025. doi: 10.3748/wjg.v31.i21.107601
Table 1 The data distribution for the classification of normal mucosa, erosions/erythema, ulcers, and polyps

| | Training set | Testing set | Total |
|---|---|---|---|
| Normal mucosa | 916 | 228 | 1144 |
| Erosions/erythema | 272 | 68 | 340 |
| Ulcers | 151 | 37 | 188 |
| Polyps | 501 | 125 | 626 |
| Total images | 1840 | 458 | 2298 |

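
The counts in Table 1 imply a roughly 80/20 split within each class and a noticeably imbalanced dataset. The following is a minimal sketch of that arithmetic; the names `counts` and `split_summary` are illustrative and not taken from the study.

```python
# Image counts per class, copied from Table 1: (training, testing).
counts = {
    "Normal mucosa": (916, 228),
    "Erosions/erythema": (272, 68),
    "Ulcers": (151, 37),
    "Polyps": (501, 125),
}


def split_summary(counts):
    """Return the grand total and, per class, the total, test fraction, and dataset share."""
    grand_total = sum(train + test for train, test in counts.values())
    summary = {}
    for name, (train, test) in counts.items():
        total = train + test
        summary[name] = {
            "total": total,
            "test_fraction": test / total,       # share of the class held out for testing
            "class_share": total / grand_total,  # share of the class in the whole dataset
        }
    return grand_total, summary


grand_total, summary = split_summary(counts)
for name, row in summary.items():
    print(f"{name}: n={row['total']}, "
          f"test={row['test_fraction']:.1%}, share={row['class_share']:.1%}")
```

Every class has a test fraction within one percentage point of 20%, while normal mucosa makes up about half of all images and ulcers under a tenth.
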
Table 2 The hyperparameter values of the deep learning models

| Type of hyper-parameter | DenseNet121 | VGG16 | ResNet50 | ViT |
|---|---|---|---|---|
| Number of epochs | 100 | 100 | 100 | 300 |
| Batch size | 32 | 32 | 32 | 16 |
| Learning rate | 1 × 10⁻³ | 1 × 10⁻³ | 1 × 10⁻³ | 6 × 10⁻³ |
| Weight decay | 4 × 10⁻⁴ | 4 × 10⁻⁴ | 4 × 10⁻⁴ | 4 × 10⁻⁴ |
| Momentum | 0.8 | 0.8 | 0.8 | 0.8 |
| Optimizer | SGD | SGD | SGD | SGD |

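
The settings in Table 2 could be captured as a small configuration object, as in the sketch below. `TrainConfig`, `CONFIGS`, and `updates_per_epoch` are hypothetical names introduced for illustration, and the batch count assumes the 1840 training images of Table 1 are used once per epoch with a final partial batch kept, which the tables do not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 2 (hypothetical container, not the authors' code)."""
    epochs: int
    batch_size: int
    learning_rate: float
    weight_decay: float = 4e-4  # identical for all four models
    momentum: float = 0.8       # identical for all four models
    optimizer: str = "SGD"      # identical for all four models


CONFIGS = {
    "DenseNet121": TrainConfig(epochs=100, batch_size=32, learning_rate=1e-3),
    "VGG16": TrainConfig(epochs=100, batch_size=32, learning_rate=1e-3),
    "ResNet50": TrainConfig(epochs=100, batch_size=32, learning_rate=1e-3),
    "ViT": TrainConfig(epochs=300, batch_size=16, learning_rate=6e-3),
}


def updates_per_epoch(n_train_images, cfg):
    """Mini-batches per epoch, counting a final partial batch (assumption)."""
    return -(-n_train_images // cfg.batch_size)  # ceiling division


for name, cfg in CONFIGS.items():
    print(name, cfg, updates_per_epoch(1840, cfg))
```

Under these assumptions the ViT differs from the three convolutional networks only in epochs, batch size, and learning rate.
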
Table 3 Baseline demographic and clinical characteristics of enrolled patients (n = 162), n (%)

| Patient characteristics | Value |
|---|---|
| Gender | |
| Female | 56 (35) |
| Male | 106 (65) |
| Age (years), median (IQR) | 11.00 (9.00, 13.00) |
| Chief complaint | |
| Abdominal pain | 101 (62) |
| Diarrhea | 31 (19) |
| Anemia | 2 (1.2) |
| Hematochezia | 18 (11) |
| Vomiting | 13 (8.0) |
| Fever | 8 (4.9) |
| Oral ulceration | 4 (2.5) |
| Mucocutaneous hyperpigmentation of the mouth and lips | 5 (3.1) |
| Perianal abscess | 10 (6.2) |
| VCE swallowed by the patient | 137 (85) |
| VCE placed endoscopically | 25 (15) |
| Stomach transit time (minutes), median (IQR) | 22 (5, 55) |
| Small bowel transit time (minutes), median (IQR) | 282 (217, 377) |
| Number of lesions per patient | |
| None | 69 (43) |
| Single | 33 (20) |
| Multiple | 60 (37) |
| Diagnosis | |
| Crohn disease | 36 (22) |
| Ulcerative colitis | 2 (1.2) |
| Suspected inflammatory bowel disease | 30 (19) |
| Behcet disease | 2 (1.2) |
| Disorder of gut-brain interaction | 31 (19) |
| Polyps | 12 (7.4) |
| Gastroenteritis | 35 (22) |
| Others | 14 (8.6) |

Table 4 Accuracy of different deep learning models in detecting lesions of normal mucosa, ulcers, erosions/erythema, and polyps

| Model | Overall accuracy (%) (95% CI) | Normal mucosa (%) (95% CI) | Ulcers (%) (95% CI) | Erosions/erythema (%) (95% CI) | Polyps (%) (95% CI) |
|---|---|---|---|---|---|
| DenseNet121 | 90.6 (89.2-92.0) | 98.6 (96.0-100) | 83.3 (75.6-91.1) | 81.9 (74.2-89.6) | 100 (100-100) |
| VGG16 | 88.3 (87.9-88.8) | 92.2 (89.7-94.6) | 91.6 (89.9-93.3) | 72.1 (63.6-80.6) | 75.0 (66.8-83.2) |
| ResNet50 | 90.5 (89.9-91.1) | 98.1 (96.8-99.4) | 87.0 (82.3-91.7) | 77.3 (72.4-82.2) | 100 (100-100) |
| ViT | 88.1 (86.7-89.6) | 93.2 (88.5-97.9) | 87.4 (77.6-97.3) | 73.0 (61.3-84.8) | 87.9 (76.3-99.5) |

Table 5 The overall performances of the deep learning models in the training set

| Model | Accuracy (%) (95% CI) | Precision (%) (95% CI) | Recall (%) (95% CI) | F1-score (%) (95% CI) | AU-ROC (%) (95% CI) |
|---|---|---|---|---|---|
| DenseNet121 | 90.6 (89.2-92.0) | 91.8 (89.6-94.0) | 91.0 (89.8-92.1) | 91.2 (89.4-92.9) | 93.7 (92.9-94.5) |
| VGG16 | 88.3 (87.9-88.8) | 83.0 (80.7-85.3) | 82.8 (81.0-84.6) | 82.6 (81.9-83.3) | 89.2 (88.1-90.3) |
| ResNet50 | 90.5 (89.9-91.2) | 92.5 (91.6-93.3) | 90.7 (90.0-91.3) | 91.3 (90.7-91.9) | 93.4 (93.1-93.8) |
| ViT | 88.1 (86.7-89.6) | 84.4 (78.5-90.3) | 85.4 (81.0-89.7) | 84.6 (80.0-89.2) | 90.4 (88.1-92.7) |

Table 6 Pairwise comparisons of the models’ overall performances in the training set (P value)

| Evaluated metrics | DenseNet121 vs VGG16 | DenseNet121 vs ResNet50 | DenseNet121 vs ViT | VGG16 vs ResNet50 | VGG16 vs ViT | ResNet50 vs ViT |
|---|---|---|---|---|---|---|
| Accuracy | 0.004 | 0.999 | 0.002 | 0.006 | 0.978 | 0.003 |
| Precision | 0.001 | 0.981 | 0.003 | 0.001 | 0.85 | 0.001 |
| Recall | 0.001 | 0.995 | 0.002 | 0.001 | 0.212 | 0.003 |
| F1-score | 0.001 | 0.999 | 0.001 | 0.001 | 0.421 | 0.001 |
| AU-ROC | 0.001 | 0.984 | 0.001 | 0.001 | 0.304 | 0.002 |

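
To read Table 6 at a glance, the P values can be screened against a significance threshold. The sketch below assumes the conventional threshold of 0.05, which the table itself does not state; `ALPHA`, `PAIRS`, `P_VALUES`, and `significant_pairs` are illustrative names.

```python
ALPHA = 0.05  # assumed significance threshold; not given in Table 6

# Column order of Table 6.
PAIRS = [
    "DenseNet121 vs VGG16",
    "DenseNet121 vs ResNet50",
    "DenseNet121 vs ViT",
    "VGG16 vs ResNet50",
    "VGG16 vs ViT",
    "ResNet50 vs ViT",
]

# P values copied from Table 6, one list per metric, in PAIRS order.
P_VALUES = {
    "Accuracy": [0.004, 0.999, 0.002, 0.006, 0.978, 0.003],
    "Precision": [0.001, 0.981, 0.003, 0.001, 0.85, 0.001],
    "Recall": [0.001, 0.995, 0.002, 0.001, 0.212, 0.003],
    "F1-score": [0.001, 0.999, 0.001, 0.001, 0.421, 0.001],
    "AU-ROC": [0.001, 0.984, 0.001, 0.001, 0.304, 0.002],
}


def significant_pairs(p_values, pairs, alpha=ALPHA):
    """Map each metric to the model pairs whose P value is below alpha."""
    return {
        metric: [pair for pair, p in zip(pairs, ps) if p < alpha]
        for metric, ps in p_values.items()
    }


for metric, flagged in significant_pairs(P_VALUES, PAIRS).items():
    print(f"{metric}: {', '.join(flagged)}")
```

At this threshold the same four pairs are flagged for every metric; DenseNet121 vs ResNet50 and VGG16 vs ViT are never flagged.
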
Table 7 The overall performances of the deep learning models in the testing set

| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-score (%) | AU-ROC (%) |
|---|---|---|---|---|---|
| DenseNet121 | 88.6 | 87.5 | 79.0 | 82.5 | 87.1 |
| VGG16 | 85.5 | 87.0 | 73.3 | 77.3 | 83.6 |
| ResNet50 | 89.7 | 87.8 | 81.0 | 83.8 | 88.5 |
| ViT | 86.6 | 81.3 | 80.0 | 80.5 | 87.5 |
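
The testing-set figures in Table 7 can be compared column by column to see which model scores highest on each metric. This is a minimal sketch of that lookup; `TEST_METRICS`, `METRIC_NAMES`, and `best_model_per_metric` are illustrative names, and picking the highest point estimate says nothing about statistical significance.

```python
METRIC_NAMES = ["Accuracy", "Precision", "Recall", "F1-score", "AU-ROC"]

# Testing-set percentages copied from Table 7, in METRIC_NAMES order.
TEST_METRICS = {
    "DenseNet121": [88.6, 87.5, 79.0, 82.5, 87.1],
    "VGG16": [85.5, 87.0, 73.3, 77.3, 83.6],
    "ResNet50": [89.7, 87.8, 81.0, 83.8, 88.5],
    "ViT": [86.6, 81.3, 80.0, 80.5, 87.5],
}


def best_model_per_metric(table, names):
    """Return, for each metric, the model with the highest point estimate."""
    return {
        name: max(table, key=lambda model: table[model][i])
        for i, name in enumerate(names)
    }


best = best_model_per_metric(TEST_METRICS, METRIC_NAMES)
for metric, model in best.items():
    print(f"{metric}: {model}")
```

On these numbers ResNet50 has the highest value in all five columns.
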