Published online Apr 30, 2024. doi: 10.35712/aig.v5.i1.90503
Revised: March 27, 2024
Accepted: April 16, 2024
Published online: April 30, 2024
Processing time: 144 Days and 2.3 Hours
Small intestinal bacterial overgrowth (SIBO) poses diagnostic and treatment challenges due to its complex management and evolving guidelines. Patients often seek online information related to their health, prompting interest in large language models, like GPT-4, as potential sources of patient education.
To investigate ChatGPT-4's accuracy and reproducibility in responding to patient questions related to SIBO.
A total of 27 patient questions related to SIBO were curated from professional societies, Facebook groups, and Reddit threads. Each question was entered into GPT-4 twice on separate days to examine reproducibility of accuracy on separate occasions. GPT-4 generated responses were independently evaluated for accuracy and reproducibility by two motility fellowship-trained gastroenterologists. A third senior fellowship-trained gastroenterologist resolved disagreements. Accuracy of responses were graded using the scale: (1) Comprehensive; (2) Correct but inadequate; (3) Some correct and some incorrect; or (4) Completely incorrect. Two responses were generated for every question to evaluate reproducibility in accuracy.
In evaluating GPT-4's effectiveness at answering SIBO-related questions, it provided responses with correct information to 18/27 (66.7%) of questions, with 16/27 (59.3%) of responses graded as comprehensive and 2/27 (7.4%) responses graded as correct but inadequate. The model provided responses with incorrect information to 9/27 (33.3%) of questions, with 4/27 (14.8%) of responses graded as completely incorrect and 5/27 (18.5%) of responses graded as mixed correct and incorrect data. Accuracy varied by question category, with questions related to “basic knowledge” achieving the highest proportion of comprehensive responses (90%) and no incorrect responses. On the other hand, the “treatment” related questions yielded the lowest proportion of comprehensive responses (33.3%) and highest percent of completely incorrect responses (33.3%). A total of 77.8% of questions yielded reproducible responses.
Though GPT-4 shows promise as a supplementary tool for SIBO-related patient education, the model requires further refinement and validation in subsequent iterations prior to its integration into patient care.
Core Tip: ChatGPT-4 demonstrates promise in enhancing patient understanding of basic concepts related to small intestinal bacterial overgrowth (SIBO). However, it exhibits limitations in accurately addressing questions about the diagnosis and treatment of SIBO, which are areas where up-to-date medical guidance is crucial. As such, artificial intelligence can be beneficial for general patient education but should not replace professional medical advice, especially for conditions with complex care protocols. Continuous refinement and updating of Chat-GPT’s knowledge are essential for its safe and effective application in healthcare. Rigorous scrutiny of artificial intelligence-generated content is imperative to prevent the dissemination of potentially harmful misinformation.