Table 1 Strengths and weaknesses of machine learning methods in the development of artificial intelligence models for gastrointestinal pathology

| AI model | Advantages | Disadvantages |
| --- | --- | --- |
| Traditional ML (supervised) | Produces a data output from a previously labeled training set; users can incorporate domain-knowledge features | Labeling big data can be time-consuming and challenging; accuracy depends heavily on the quality of feature extraction |
| Traditional ML (unsupervised) | Users do not label any data or supervise the model; can detect patterns automatically; saves time | Input data are unknown and not labeled by users; users cannot get precise information on how the data were sorted; interpretation can be challenging |
| CNN | Detects important information and features without labeling; high performance in image recognition | A large training dataset is required; lack of interpretability ("black box") |
| FCN | Computationally fast; automatically eliminates background noise | Requires large amounts of labeled data for training; high labeling cost |
| RNN | Can decide which information to remember from its past experience; a deep learning model for sequential data | Harder to train; high computational cost |
| MIL | Does not require detailed annotation; can be applied to large datasets | A large amount of training data is required; high computational cost |
| GAN | Generates new realistic data resembling the original data | Harder to train |
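
To make the supervised CNN workflow contrasted in Table 1 concrete, below is a minimal PyTorch sketch that trains a small image classifier on labeled histology patches. The folder layout, architecture, and hyperparameters are illustrative assumptions, not the setup of any study cited here.

```python
# Minimal sketch of supervised CNN training on labeled patches.
# Folder layout, architecture, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Assumes patches sorted into one folder per diagnostic label, e.g.
# patches/train/normal, patches/train/adenoma, patches/train/carcinoma.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("patches/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# A deliberately small CNN: two conv blocks and a linear classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, len(train_set.classes)),
)

criterion = nn.CrossEntropyLoss()          # supervised: requires labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(5):                     # epoch count arbitrary here
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                    # backpropagate prediction error
        optimizer.step()                   # update the learned features
```

The loop illustrates both Table 1 rows for CNNs: features are learned rather than hand-engineered, but nothing in the trained weights explains which image regions drove a given prediction.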
Table 2 Artificial intelligence-based applications in gastric cancer

| Ref. | Task | No. of cases/dataset | Method | Performance |
| --- | --- | --- | --- | --- |
| Duraipandian et al[89] | Classification | 700 slides | GastricNet | Accuracy (100%) |
| Cosatto et al[72] | | > 12000 WSIs | MIL | AUC (0.96) |
| Sharma et al[31] | | 454 cases | CNN | Accuracy (69%) |
| Qu et al[90] | | 9720 images | DL | AUCs (up to 0.97) |
| Yoshida et al[32] | | 3062 gastric biopsies | ML | Overall concordance rate (55.6%) |
| León et al[91] | | 40 images | CNN | Accuracy (up to 89.7%) |
| Liang et al[92] | | 1900 images | DL | Accuracy (91.1%) |
| Sun et al[93] | | 500 images | DL | Accuracy (91.6%) |
| Tomita et al[94] | | 502 images^1 | Attention-based DL | Accuracy (83%) |
| Wang et al[95] | | 608 images | Recalibrated multi-instance DL | Accuracy (86.5%) |
| Iizuka et al[33] | | 1746 biopsy WSIs | CNN, RNN | Accuracy (95.6%), AUCs (up to 0.98) |
| Bollschweiler et al[41] | Prognosis | 135 cases | ANN | Accuracy (93%) |
| Hensler et al[42] | | 4302 cases | QUEEN technique | Accuracy (72.73%) |
| Jagric et al[43] | | 213 cases | Learning vector quantization NN | Sensitivity (71%), specificity (96.1%) |
| Lu et al[36] | | 939 cases | MMHG | Accuracy (69.28%) |
| Jiang et al[37] | | 786 cases | SVM classifier | AUCs (up to 0.83) |
| Liu et al[40] | | 432 tissue samples | SVM classifier | Accuracy (up to 94.19%) |
| Korhani Kangi and Bahrampour[38] | | 339 cases | ANN, BNN | Sensitivity (88.2% for ANN; 90.3% for BNN); specificity (95.4% for ANN; 90.9% for BNN) |
| Zhang et al[39] | | 669 cases | ML | AUCs (up to 0.831) |
| García et al[44] | Tumor-infiltrating lymphocytes | 3257 images | CNN | Accuracy (96.9%) |
| Kather et al[56] | Genetic alterations | 1147 cases^2 | Deep residual learning | AUC (0.81 for gastric cancer) |
| Kather et al[47] | | > 1000 cases^3 | NN | AUC (up to 0.8) |
| Fu et al[57] | | > 1000 cases^4 | NN | Variable across tumors/gene alterations; strongest relations in whole-genome duplications |
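
Two of the slide-level approaches in Table 2 (Cosatto et al[72]; Wang et al[95]) use multiple-instance learning, in which a whole-slide image is treated as a "bag" of patches and only a single slide-level label is available. The module below is a generic attention-based MIL pooling sketch, assumed for illustration rather than taken from either cited study; the feature dimension and patch count are arbitrary.

```python
# Generic attention-based MIL pooling sketch (not from any cited study).
# A slide is a bag of patch embeddings; only the slide label is known.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Pools patch embeddings into one slide-level logit via attention."""
    def __init__(self, feat_dim=512, hidden_dim=128):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, 1)  # e.g., benign vs malignant

    def forward(self, patch_feats):                      # (n_patches, feat_dim)
        weights = torch.softmax(self.attention(patch_feats), dim=0)
        slide_feat = (weights * patch_feats).sum(dim=0)  # weighted average
        return self.classifier(slide_feat), weights

# Usage with embeddings from any patch-level CNN (dimensions assumed):
patch_feats = torch.randn(1000, 512)   # 1000 patches, 512-dim features each
logit, attn = AttentionMIL()(patch_feats)
```

A practical side benefit is that the attention weights indicate which patches drove the slide-level call, partially mitigating the black-box problem noted in Table 1.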
Table 3 Artificial intelligence-based applications in colorectal cancer

| Ref. | Task | No. of cases/dataset | Method | Performance |
| --- | --- | --- | --- | --- |
| Xu et al[96] | Classification | 717 patches (N, ADC subtypes) | AlexNet | Accuracy (97.5%) |
| Awan et al[97] | | 454 cases (N, ADC grades LG vs HG) | NN | Accuracy (97% for 2-class; 91% for 3-class) |
| Haj-Hassan et al[98] | | 30 multispectral image patches (N, AD, ADC) | CNN | Accuracy (99.2%) |
| Kainz et al[99] | | 165 images (benign vs malignant) | CNN (LeNet-5) | Accuracy (95%-98%) |
| Korbar et al[34] | | 697 cases (N, AD subtypes) | ResNet | Accuracy (93.0%) |
| Yoshida et al[100] | | 1328 colorectal biopsy WSIs | ML | Accuracy (90.1% for adenoma) |
| Wei et al[35] | | 326 slides (training), 25 slides (validation), 157 slides (internal set) | ResNet | 157 slides: accuracy 93.5% vs 91.4% (pathologists); 238 slides: accuracy 87.0% vs 86.6% (pathologists) |
| Ponzio et al[101] | | 27 WSIs (13500 patches) (N, AD, ADC) | VGG16 | Accuracy (96%) |
| Kather et al[47] | | 94 WSIs^1 | ResNet18 | AUC (> 0.99) |
| Yoon et al[102] | | 57 WSIs (10280 patches) | VGG | Accuracy (93.5%) |
| Iizuka et al[33] | | 4036 WSIs (N, AD, ADC) | CNN/RNN | AUCs (0.96, ADC; 0.99, AD) |
| Sena et al[103] | | 393 WSIs (12565 patches) (N, HP, AD, ADC) | CNN | Accuracy (80%) |
| Bychkov et al[45] | Prognosis | 420 cases | RNN | HR of 2.3, AUC (0.69) |
| Kather et al[46] | | 1296 WSIs | VGG19 | Accuracy (94%-99%) |
| Kather et al[46] | | 934 cases | DL (comparison of 5 networks) | HR for overall survival of 1.63-1.99 |
| Geessink et al[104] | | 129 cases | NN | HR of 2.04 for disease-free survival |
| Skrede et al[105] | | 2022 cases | Neural networks with MIL | HR 3.04 |
| Kather et al[47] | Genetic alterations | TCGA-DX (93408 patches)^1, TCGA-KR (60894 patches) | ResNet18 | AUC (0.77, TCGA-DX); AUC (0.84, TCGA-KR) |
| Echle et al[55] | | 8836 cases (MSI) | ShuffleNet DL | AUC (0.92-0.96 in two cohorts) |
| Kather et al[47] | Tumor microenvironment analysis | 86 WSIs (100000)^1 | VGG19 | Accuracy (94%-99%) |
| Shapcott et al[48] | | 853 patches and 142 TCGA images | CNN with a grid-based attention network | Accuracy (65%-84% in two sets) |
| Swiderska-Chadaj et al[49] | | 28 WSIs | FCN/LSM/U-Net | Sensitivity (74.0%) |
| Alom et al[106] | | 21135 patches | DCRN/R2U-Net | Accuracy (91.9%) |
| Sirinukunwattana et al[107] | Molecular subtypes | 1206 cases | NN with domain-adversarial learning | AUC (0.84-0.95 in the two validation sets) |
| Weis et al[50] | Tumor budding | 401 cases | CNN | Correlation R (0.86) |
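
Most of the Table 3 classifiers start from networks pretrained on natural images (AlexNet, VGG, ResNet) and adapt them to pathology patches. The sketch below shows this transfer-learning pattern with ResNet18, the backbone reported by Kather et al[47]; the frozen-backbone strategy, class names, and learning rate are assumptions for illustration, not the published training setups.

```python
# Transfer-learning sketch: adapt an ImageNet-pretrained ResNet18 to
# tissue-patch classes. Freezing choice, labels, and lr are illustrative.
import torch
import torch.nn as nn
from torchvision import models

classes = ["tumor", "stroma", "lymphocytes", "normal"]  # hypothetical labels

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained features
for param in model.parameters():
    param.requires_grad = False                    # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, len(classes))  # new trainable head

# Only the replacement head is optimized; the loss and training loop can
# follow the same pattern as the supervised sketch after Table 1.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

Transfer learning is also the first remedy suggested in Table 4 for the limited size of most pathology datasets.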
Table 4 Summary of challenges and suggested solutions in the development process of artificial intelligence applications

| Process | Challenges | Suggested solutions |
| --- | --- | --- |
| Ethical considerations | Lack of patient approval for commercial use | Approval for both research and product development |
| Design of AI models | Underestimation of end-users' needs | Collaboration with stakeholders |
| Optimization of datasets | CNN: large numbers of images required | Augmentation techniques, transfer learning (see the sketch after this table) |
| | Rare tumors: limited number of images | Global data sharing |
| | Variations in preanalytical and analytical phases | AI algorithms to standardize staining, color properties, and WSI quality |
| Annotation of datasets | Interobserver variation in diagnosis; discrepancies among the performances of trained algorithms | MIL algorithms |
| Validation | Ground truth lacking objectivity | Multicenter evaluations that include many pathologists and datasets |
| Regulation | Lack of current regulatory guidance specific to AI tools | New guidelines and regulations for safer and more effective AI tools |
| Implementation | Changes in workflow | Selection of AI applications that speed up the workflow |
| | IT infrastructure investment | Augmented microscopy directed to a cloud network service |
| | Relative inexperience of pathologists | Training in AI; integration of AI into medical education |
| | AI applications that lack interpretability ("black box") | Construction of interpretable models; generation of attention heat maps |
| | Lack of external quality assurance | A scheme for this purpose should be designed |
| | Legal implications | The performance of AI algorithms should be assured for reporting |
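
As a concrete example of the augmentation remedy referenced in Table 4, the pipeline below expands a small training set with label-preserving transforms and uses color jitter to crudely mimic stain and scanner variability. The transform choices and strengths are illustrative assumptions; dedicated stain-normalization methods go further than color jitter.

```python
# Augmentation sketch for two Table 4 challenges: limited images and
# stain/color variability. Parameter values are illustrative, not tuned.
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),      # histology patches have no
    transforms.RandomVerticalFlip(),        # canonical orientation
    transforms.RandomRotation(90),
    transforms.ColorJitter(brightness=0.1,  # roughly simulates H&E stain
                           contrast=0.1,    # and scanner color variation
                           saturation=0.1,
                           hue=0.02),
    transforms.ToTensor(),
])
# Passing `augment` as the training transform multiplies the effective
# dataset size without any new slides; purpose-built stain normalization
# (e.g., Macenko-style methods) addresses color variation more directly.
```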