Revised: April 2, 2025
Accepted: April 27, 2025
Published online: June 8, 2025
Cancer care faces challenges due to tumor heterogeneity and rapidly evolving therapies, necessitating artificial intelligence (AI)-driven clinical decision support. While general-purpose models like ChatGPT offer adaptability, domain-specific systems (e.g., DeepSeek) may better align with clinical guidelines. However, their comparative efficacy in oncology remains underexplored. This study hypothesized that domain-specific optimization would yield superior technical accuracy, while general-purpose training would favor patient-facing communication.
To compare ChatGPT and DeepSeek in oncology decision support for diagnosis, treatment, and patient communication.
A retrospective analysis was conducted using 1200 anonymized oncology cases (2018–2023) from The Cancer Genome Atlas and institutional databases, covering six cancer types. Each case included histopathology, imaging, genomic profiles, and treatment histories. Both models generated diagnostic interpretations, staging assessments, and therapy recommendations. Performance was evaluated against NCCN/ESMO guidelines and expert oncologist panels using F1-scores, Cohen's κ, Likert-scale ratings, and readability metrics. Statistical significance was assessed via analysis of variance and post-hoc Tukey tests.
DeepSeek demonstrated superior performance in diagnostic accuracy (F1-score: 89.2% vs ChatGPT's 76.5%, P < 0.001) and treatment alignment with guidelines (κ = 0.82 vs 0.67, P = 0.003). ChatGPT exhibited strengths in patient communication (readability score: 8.2/10 vs 6.5/10, P = 0.012). Both models underperformed in rare cancer subtypes (F1 < 60%).
Domain-specific AI (DeepSeek) excels in technical precision, while general-purpose models (ChatGPT) enhance patient engagement. A hybrid system integrating both approaches may optimize oncology workflows, contingent on expanded training for rare cancers and real-time guideline updates.
Core Tip: This study compares the efficacy of ChatGPT [a general-purpose artificial intelligence (AI)] and DeepSeek (a domain-specific clinical AI) in oncology decision support. DeepSeek demonstrated superior diagnostic accuracy (F1-score: 89.2% vs 76.5%) and guideline adherence (Cohen’s κ: 0.82 vs 0.67), while ChatGPT excelled in patient communication (readability score: 8.2/10 vs 6.5/10). Both models underperformed in rare cancer subtypes (F1 < 60%), highlighting the need for hybrid systems integrating technical precision and patient-centered communication. This work advocates for AI models tailored to oncology’s heterogeneous demands, with dynamic updates to address evolving clinical guidelines and rare malignancies.
- Citation: Sun M, Yu J, Zhou JW, Ye M, Ye F, Ding M. Can ChatGPT and DeepSeek help cancer patients: A comparative study of artificial intelligence models in clinical decision support. Artif Intell Cancer 2025; 6(1): 106356
- URL: https://www.wjgnet.com/2644-3228/full/v6/i1/106356.htm
- DOI: https://dx.doi.org/10.35713/aic.v6.i1.106356
Cancer remains a leading cause of global mortality, with an estimated 19.3 million new cases and 10 million deaths annually[1]. The complexity of cancer care—driven by tumor heterogeneity, genomic variability, and rapidly evolving therapies—demands decision support capable of synthesizing multimodal clinical data. Artificial intelligence has already demonstrated expert-level performance in related tasks, including image-based diagnosis, molecular tumor classification, and radiology workflows[2-6].
Recent advances in large language models, such as ChatGPT (GPT-4) and domain-specific systems like DeepSeek, have demonstrated promise in medical applications. However, their comparative efficacy in oncology remains underexplored. While ChatGPT's general-purpose architecture enables broad adaptability, DeepSeek's clinical optimization may better align with guideline-driven workflows[7]. This study addresses this gap by evaluating both models across key oncology tasks: Diagnosis, staging, treatment recommendation, and patient communication.
A total of 1200 de-identified cases (2018–2023) were sourced from The Cancer Genome Atlas and tertiary oncology centers, stratified into six cancer types: Breast (n = 300), lung (n = 250), colorectal (n = 200), prostate (n = 200), and pancreatic and hematologic malignancies (combined n = 250).
Future iterations will incorporate additional rare cancer subtypes (e.g., cholangiocarcinoma, sarcoma) from international collaborations (e.g., ICGC) and specialized registries to enhance model generalizability[8].
ChatGPT-4.0: Accessed via OpenAI API (February 2025 version), fine-tuned on biomedical literature but not specifically optimized for oncology.
DeepSeek: A domain-specific model trained on 2.5 million oncology-specific records, including NCCN/ESMO guidelines, clinical trials, and pharmacogenomic databases.
Diagnostic accuracy: F1-score relative to gold-standard diagnoses.
Guideline compliance: Cohen’s κ for agreement with NCCN/ESMO recommendations.
Clinical utility: Likert-scale ratings (1–5) by 15 oncologists.
Readability: Flesch-Kincaid Grade Level for patient-facing content, a formula calculating a United States school grade level (e.g., 8.0 = 8th-grade reading level) based on sentence length and syllable count. Lower scores indicate simpler language[9].
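As an illustration of how these four metrics can be computed, the sketch below uses scikit-learn for the F1-score and Cohen's κ and the third-party textstat package for the Flesch-Kincaid Grade Level. The labels, recommendations, and patient-facing text are illustrative assumptions, not the study's actual evaluation pipeline.

```python
# A minimal sketch of the metric computations described above, assuming
# predictions and gold-standard labels are available as Python lists.
# Uses scikit-learn (F1, Cohen's kappa) and the third-party textstat
# package (Flesch-Kincaid). All data below are illustrative.
from sklearn.metrics import f1_score, cohen_kappa_score
import textstat

# Hypothetical model outputs vs gold-standard diagnoses
gold = ["luminal_a", "her2_pos", "tnbc", "luminal_a"]
pred = ["luminal_a", "her2_pos", "luminal_a", "luminal_a"]

# Diagnostic accuracy: macro-averaged F1 across diagnosis classes
f1 = f1_score(gold, pred, average="macro")

# Guideline compliance: Cohen's kappa between model recommendations
# and NCCN/ESMO-concordant recommendations
guideline = ["trastuzumab", "chemotherapy", "chemotherapy", "trastuzumab"]
model_rec = ["trastuzumab", "chemotherapy", "trastuzumab", "trastuzumab"]
kappa = cohen_kappa_score(guideline, model_rec)

# Readability: Flesch-Kincaid Grade Level of patient-facing text,
# i.e., 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
explanation = "Your tumor tests positive for HER2. A targeted drug can help."
fk_grade = textstat.flesch_kincaid_grade(explanation)

print(f"F1 = {f1:.2f}, kappa = {kappa:.2f}, FK grade = {fk_grade:.1f}")
```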
SPSS 29.0 was used for analysis of variance (ANOVA), post-hoc Tukey tests, and Fleiss' κ for inter-rater reliability. Statistical significance was set at P < 0.05.
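The same workflow can be reproduced with open-source equivalents of these SPSS procedures. The following sketch uses SciPy and statsmodels on illustrative data and assumes oncologist ratings are available as a cases × raters matrix; the group labels and values are placeholders, not the study's data.

```python
# A sketch of the statistical workflow (ANOVA, post-hoc Tukey tests,
# Fleiss' kappa) using SciPy and statsmodels; all data are illustrative.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical per-case F1 scores for three groups
deepseek = np.array([0.91, 0.88, 0.90, 0.87])
chatgpt = np.array([0.78, 0.75, 0.80, 0.74])
baseline = np.array([0.70, 0.68, 0.72, 0.69])

# One-way ANOVA across the groups
f_stat, p_val = f_oneway(deepseek, chatgpt, baseline)

# Post-hoc Tukey HSD to locate which pairwise differences drive the effect
scores = np.concatenate([deepseek, chatgpt, baseline])
groups = ["deepseek"] * 4 + ["chatgpt"] * 4 + ["baseline"] * 4
tukey = pairwise_tukeyhsd(scores, groups)

# Fleiss' kappa for inter-rater reliability: rows = cases,
# columns = raters, values = categorical ratings (e.g., Likert 1-5)
ratings = np.array([[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2]])
table, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(table)

print(f"ANOVA p = {p_val:.4f}")
print(tukey.summary())
print(f"Fleiss kappa = {kappa:.2f}")
```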
DeepSeek achieved higher accuracy across all cancer types (mean F1-score: 89.2% vs 76.5%, P < 0.001) (Table 1). The largest disparities occurred in pancreatic cancer (DeepSeek: 85.4% vs ChatGPT: 68.1%, P = 0.002), likely due to its reliance on updated genomic biomarkers.
Table 1 Comparative diagnostic accuracy of DeepSeek and ChatGPT by cancer type

| Cancer type | DeepSeek (F1%) | ChatGPT (F1%) | P value |
|---|---|---|---|
| Breast | 92.3 | 80.1 | < 0.001 |
| Lung | 88.7 | 75.6 | 0.001 |
| Colorectal | 90.5 | 78.3 | 0.003 |
| Prostate | 87.9 | 74.8 | 0.004 |
| Pancreatic | 85.4 | 68.1 | 0.002 |
| Hematologic | 91.0 | 79.4 | < 0.001 |
DeepSeek showed stronger guideline alignment (κ = 0.85 vs 0.68, P = 0.002). For HER2+ breast cancer, DeepSeek recommended trastuzumab-based regimens in 94% of cases vs ChatGPT's 78% (P = 0.012).
Clinicians rated DeepSeek's outputs as more actionable (4.3/5 vs 3.7/5, P = 0.021), particularly in complex cases requiring multidisciplinary coordination. Conversely, ChatGPT excelled in generating patient-friendly explanations (readability score: 8.2/10 vs 6.5/10, P = 0.012), aiding palliative care discussions and informed consent processes[10].
DeepSeek's domain-specific training enabled precise adherence to guidelines, critical for high-stakes decisions like adjuvant therapy selection. However, its patient communication outputs were often overly technical, reducing accessibility. ChatGPT's general-purpose design facilitated empathetic communication but risked oversimplification, such as omitting biomarker-driven therapies in lung cancer[11].
The suboptimal performance of both models in rare cancers (e.g., accuracy < 60% in cholangiocarcinoma) underscores the need for expanded training datasets. Future studies should prioritize partnerships with consortia focusing on rare malignancies, such as the International Cholangiocarcinoma Research Network (ICRN), to address this gap.
To maximize AI's impact, developers should prioritize:
- Hybrid systems: Integrating DeepSeek's precision with ChatGPT's communicative strengths (a routing sketch follows this list).
- Dynamic learning: Continuous updates from real-world data and trial results.
- Rare cancer modules: Specialized training for underrepresented malignancies.
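To make the hybrid-system recommendation concrete, the purely hypothetical sketch below illustrates one possible routing layer that dispatches guideline-critical tasks to a domain-specific model and patient-facing tasks to a general-purpose model. The task taxonomy and the query_domain_model/query_general_model functions are placeholders, not real APIs.

```python
# A purely hypothetical sketch of a hybrid routing layer. The task
# taxonomy and the two query_* functions are placeholders standing in
# for calls to a domain-specific and a general-purpose model.
from dataclasses import dataclass

TECHNICAL_TASKS = {"diagnosis", "staging", "treatment_recommendation"}
COMMUNICATION_TASKS = {"patient_explanation", "consent_summary"}

@dataclass
class OncologyQuery:
    task: str      # e.g., "treatment_recommendation"
    payload: str   # case summary or patient question

def query_domain_model(text: str) -> str:
    """Stub for a precision-oriented, guideline-trained model."""
    return f"[domain-model answer for: {text[:40]}]"

def query_general_model(text: str) -> str:
    """Stub for a readability-oriented, general-purpose model."""
    return f"[general-model answer for: {text[:40]}]"

def route(query: OncologyQuery) -> str:
    """Dispatch a query to the model best suited to its task type."""
    if query.task in TECHNICAL_TASKS:
        return query_domain_model(query.payload)
    if query.task in COMMUNICATION_TASKS:
        return query_general_model(query.payload)
    raise ValueError(f"Unknown task type: {query.task}")

print(route(OncologyQuery("patient_explanation", "Explain HER2+ status")))
```

In practice, the routing decision could itself be learned or overseen by clinicians; the static task map here is only the simplest possible dispatcher.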
Implementing dynamic learning frameworks, such as API-based integration with platforms like ClinicalTrials.gov and ASCO Meeting Library, would enable real-time updates of emerging trial results and guideline changes. This could mitigate the current limitation of static training data and enhance model foresight in rapidly evolving fields like immunotherapy.
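As a minimal sketch of such an integration, the following code polls the public ClinicalTrials.gov v2 REST API for recently updated trials in a given condition. The endpoint, parameter, and field names follow the v2 schema as publicly documented and should be verified against current documentation before use; a comparable integration for the ASCO Meeting Library would need its own arrangement and is not shown.

```python
# A sketch of the dynamic-learning idea: periodically pull newly updated
# trials for a condition from the public ClinicalTrials.gov API (v2).
# Parameter and field names follow the documented v2 schema; verify
# against current documentation before relying on them.
import requests

def fetch_recent_trials(condition: str, page_size: int = 10) -> list[str]:
    """Return brief titles of recently updated trials for a condition."""
    resp = requests.get(
        "https://clinicaltrials.gov/api/v2/studies",
        params={
            "query.cond": condition,
            "sort": "LastUpdatePostDate:desc",  # newest updates first
            "pageSize": page_size,
        },
        timeout=30,
    )
    resp.raise_for_status()
    studies = resp.json().get("studies", [])
    return [
        s["protocolSection"]["identificationModule"].get("briefTitle", "")
        for s in studies
    ]

if __name__ == "__main__":
    for title in fetch_recent_trials("cholangiocarcinoma"):
        print(title)
```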
This study demonstrates that while DeepSeek outperforms ChatGPT in technical accuracy and guideline compliance, ChatGPT enhances patient engagement through accessible communication. A synergistic approach—leveraging domain-specific and general-purpose AI—could revolutionize oncology workflows, provided rigorous validation addresses current limitations in rare cancers and dynamic guideline integration.
Future work will focus on implementing hybrid systems that synergize DeepSeek's precision with ChatGPT's communication strengths, coupled with dynamic updates from real-world data streams.
We thank the International Cancer Genome Consortium for their support of future data collaborations.
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin. 2021;71:209-249.
2. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115-118.
3. Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, Kanada K, de Oliveira Marinho G, Gallegos J, Gabriele S, Gupta V, Singh N, Natarajan V, Hofmann-Wellenhof R, Corrado GS, Peng LH, Webster DR, Ai D, Huang SJ, Liu Y, Dunn RC, Coz D. A deep learning system for differential diagnosis of skin diseases. Nat Med. 2020;26:900-908.
4. Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, Shen R, Taylor AM, Cherniack AD, Thorsson V, Akbani R, Bowlby R, Wong CK, Wiznerowicz M, Sanchez-Vega F, Robertson AG, Schneider BG, Lawrence MS, Noushmehr H, Malta TM; Cancer Genome Atlas Network, Stuart JM, Benz CC, Laird PW. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell. 2018;173:291-304.e6.
5. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18:500-510.
6. Jiang F, Jiang Y, Zhi H, Dong Y, Li H, Ma S, Wang Y, Dong Q, Shen H, Wang Y. Artificial intelligence in healthcare: past, present and future. Stroke Vasc Neurol. 2017;2:230-243.
7. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44-56.
8. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med. 2021;27:582-584.
9. Kocaballi AB, Quiroz JC, Rezazadegan D, Berkovsky S, Magrabi F, Coiera E, Laranjo L. Responses of Conversational Agents to Health and Lifestyle Prompts: Investigation of Appropriateness and Presentation Structures. J Med Internet Res. 2020;22:e15823.
10. Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern Med. 2023;183:589-596.
11. Xu J, Glicksberg BS, Su C, Walker P, Bian J, Wang F. Federated Learning for Healthcare Informatics. J Healthc Inform Res. 2021;5:1-19.