Letter to the Editor Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Clin Oncol. Aug 24, 2025; 16(8): 109893
Published online Aug 24, 2025. doi: 10.5306/wjco.v16.i8.109893
Deep learning models for pathological classification and staging of oesophageal cancer
Himanshu Agrawal, Department of Surgery, University College of Medical Sciences (University of Delhi), GTB Hospital, Delhi 110095, India
Nikhil Gupta, Department of Surgery, Atal Bihari Vajpayee Institute of Medical Sciences and Dr. Ram Manohar Lohia Hospital, Delhi 110001, India
ORCID number: Himanshu Agrawal (0000-0001-7994-2356); Nikhil Gupta (0000-0001-7265-8168).
Author contributions: Agrawal H and Gupta N were responsible for research conception and design, data acquisition, data analysis and interpretation, drafting of the manuscript, critical revision of the manuscript, supervision, and approval of the final manuscript.
Conflict-of-interest statement: There is no conflict of interest associated with the senior author or any of the coauthors who contributed their efforts to this manuscript.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Nikhil Gupta, MD, Professor, Department of Surgery, Atal Bihari Vajpayee Institute of Medical Sciences and Dr. Ram Manohar Lohia Hospital, BKS Marg, Delhi 110001, India. nikhil_ms26@yahoo.co.in
Received: May 26, 2025
Revised: June 4, 2025
Accepted: July 7, 2025
Published online: August 24, 2025
Processing time: 88 Days and 0.4 Hours

Abstract

This letter comments on Wei et al's study applying the Wave-Vision Transformer for oesophageal cancer classification. Highlighting its superior accuracy and efficiency, we discuss its potential clinical impact, limitations in dataset diversity, and the need for explainable artificial intelligence to enhance adoption in pathology and personalized treatment.

Key Words: Deep learning; Esophageal neoplasms; Pathological classification; Cancer staging; Artificial intelligence

Core Tip: This letter highlights Wei et al’s study on the Wave-Vision Transformer (Wave-ViT), an advanced deep learning model integrating frequency-domain analysis for accurate, efficient pathological classification and staging of oesophageal cancer. Wave-ViT shows superior performance and clinical potential in early cancer detection and personalized treatment, though broader validation and explainability remain essential.



TO THE EDITOR

We read with interest the recent article by Wei et al[1] in the World Journal of Gastroenterology on deep learning models for pathological classification and staging of oesophageal cancer, focusing on the Wave-Vision Transformer (Wave-ViT). The authors investigate the use of advanced deep learning methods to improve accuracy and efficiency in diagnosing oesophageal adenocarcinoma. Given the global burden of oesophageal cancer and the importance of early detection, this study contributes valuable insights to medical image analysis and cancer diagnosis.

In this commentary, rather than describing the original study in detail, we critically evaluate its strengths and limitations. While we acknowledge the model's innovative integration of wavelet-based frequency-domain analysis with the transformer architecture, we focus on areas where the study could benefit from further exploration.

CRITICAL EVALUATION
Generalizability and data diversity

Wei et al's study[1] demonstrates the Wave-ViT model's promising performance on the Hyper Kvasir dataset. However, a significant limitation of the study is its reliance on data from a single geographic region. The model's performance might be influenced by regional and ethnic differences in oesophageal pathology. While the Hyper Kvasir dataset is highly regarded, its applicability to a broader, multi-center, and multi-ethnic population remains uncertain. This matters for clinical deployment, as artificial intelligence (AI) systems must remain robust across different population demographics.

We recommend that future studies expand their datasets to include multi-center and multi-ethnic cohorts. Such diversity is critical to ensure that Wave-ViT performs reliably across various populations. Recent research[2] underscores the importance of diverse datasets for training AI models to ensure their generalizability in real-world clinical settings. Furthermore, validating the model in clinical trials across multiple healthcare institutions will improve its external validity.

Performance across cancer subtypes and disease stages

The study by Wei et al[1] primarily focuses on early-stage adenocarcinoma, but oesophageal cancer includes multiple subtypes such as squamous cell carcinoma, which has distinct histological features. Testing the model’s ability to detect and classify other subtypes, as well as its performance across different disease stages, is essential to demonstrate its broader clinical value. Evaluating the model across stages of adenocarcinoma and squamous cell carcinoma will provide deeper insights into its diagnostic capabilities and help tailor treatment strategies based on cancer type and stage.

We believe future work should evaluate the performance of Wave-ViT on a broader range of oesophageal cancer subtypes, including both early and late-stage cases. This will clarify whether the model’s superior performance can be generalized across various cancer stages and histological types. Moreover, assessing its accuracy in identifying metastatic lesions or distinguishing between benign and malignant tissues would further solidify its clinical utility.

LITERATURE ENGAGEMENT AND EMERGING AI TECHNIQUES

While Wei et al[1] contribute to the ongoing development of deep learning for pathological classification, there is a growing body of literature on transformer-based models and multimodal diagnostic frameworks that have shown promise in similar domains. We therefore situate the study among recent developments in AI and medical imaging that employ explainable AI (XAI) techniques to enhance model transparency.

Relevant work on transformer-based pathology models and multimodal diagnostic systems deserves mention here. For instance, recent studies on multimodal approaches[2] combine imaging, histopathological data, and patient demographics to provide a more holistic view of disease progression. Furthermore, XAI techniques such as gradient-weighted class activation mapping (Grad-CAM) and SHapley Additive exPlanations (SHAP) can improve model interpretability, which is a crucial consideration for clinical adoption. These techniques can help clinicians better understand the model's decision-making process and support its integration into healthcare systems.

MODEL INTERPRETABILITY

Interpretability remains a challenge for clinical AI models, including Wave-ViT. While the authors present some feature visualizations, we argue that a more in-depth exploration of model transparency is needed. In clinical settings, AI models must provide explanations that clinicians can trust and act upon.

To address these concerns, we recommend incorporating XAI methods such as Grad-CAM or SHAP. These techniques highlight important image regions influencing the model's predictions, which could be especially useful for clinicians during decision-making. By improving model transparency, we can increase clinician confidence in AI systems, facilitating broader acceptance and integration in clinical workflows.
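To make the idea concrete, the sketch below illustrates the core class-activation-mapping computation in NumPy. It is a minimal sketch under stated assumptions: for a network ending in global average pooling followed by a linear classifier, Grad-CAM reduces to this weighted sum of feature-map channels. The feature maps and weights here are random placeholders, not Wave-ViT's actual internals, which are not reproduced in this letter.

```python
import numpy as np

def class_activation_map(fmap, class_weights, target_class):
    """Heatmap of image regions supporting `target_class`.

    fmap          : (C, H, W) feature maps from the last convolutional block
    class_weights : (n_classes, C) weights of the GAP + linear classification head
    """
    w = class_weights[target_class]           # (C,) per-channel importance
    cam = np.tensordot(w, fmap, axes=(0, 0))  # weighted channel sum -> (H, W)
    cam = np.maximum(cam, 0.0)                # ReLU: keep only positive evidence
    return cam / (cam.max() + 1e-8)           # normalize to [0, 1] for display

# Placeholder activations and head weights (hypothetical values for illustration)
rng = np.random.default_rng(0)
fmap = rng.random((16, 8, 8))
weights = rng.standard_normal((2, 16))
heatmap = class_activation_map(fmap, weights, target_class=1)
print(heatmap.shape)
```

In practice the resulting heatmap would be upsampled to the input-image resolution and overlaid on the histopathology slide, letting the clinician see which tissue regions drove the prediction.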

CONCLUSION

In conclusion, while Wei et al's study[1] presents a significant step forward in the use of AI for oesophageal cancer diagnosis, there are areas that warrant further investigation. Multi-center and multi-ethnic validation, broader testing across disease subtypes and stages, and improvements in model interpretability will enhance the clinical applicability of Wave-ViT. Incorporating diverse datasets and XAI techniques will contribute to the model's broader adoption in clinical settings, ultimately improving patient care. We look forward to the continued advancement of AI tools like Wave-ViT in transforming diagnostic practices for oesophageal cancer.

Footnotes

Provenance and peer review: Unsolicited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Oncology

Country of origin: India

Peer-review report’s classification

Scientific Quality: Grade C

Novelty: Grade C

Creativity or Innovation: Grade C

Scientific Significance: Grade C

P-Reviewer: Zhang J S-Editor: Lin C L-Editor: A P-Editor: Zhao YQ

References
1. Wei W, Zhang XL, Wang HZ, Wang LL, Wen JL, Han X, Liu Q. Application of deep learning models in the pathological classification and staging of esophageal cancer: A focus on Wave-Vision Transformer. World J Gastroenterol. 2025;31:104897.
2. Shafi S, Parwani AV. Artificial intelligence in diagnostic pathology. Diagn Pathol. 2023;18:109.