Copyright
©The Author(s) 2025.
World J Gastroenterol. Feb 14, 2025; 31(6): 102090
Published online Feb 14, 2025. doi: 10.3748/wjg.v31.i6.102090
Table 5 Median scores for answers from three large language models
Groups | Items | ChatGPT-4.0 | Gemini-1.5-Pro | Claude-3-Opus |
Expert assessment | Accuracy, median (Q1, Q3) | 4 (4, 4) | 4 (4, 4) | 3 (3, 4) |
Expert assessment | Completeness, median (Q1, Q3) | 4 (4, 5) | 4 (4, 5) | 4 (3, 4) |
Expert assessment | Correlation, median (Q1, Q3) | 5 (4, 5) | 5 (4, 5) | 4 (3, 4) |
Patient assessment | Comprehensibility, median (Q1, Q3) | 4 (3, 5) | 4 (4, 5) | 5 (4, 5) |
Objective evaluation | FRE score, median (Q1, Q3) | 31.10 (27.50, 34.30) | 32.79 (30.53, 42.61) | 51.47 (49.82, 56.09) |
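The objective row can be reproduced in principle from text counts. The sketch below assumes that "FRE" in the table denotes the standard Flesch Reading Ease score; the function name and its arguments are illustrative and are not taken from the study, which does not state here how words, sentences, or syllables were counted.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch Reading Ease formula (assumed to be the FRE in Table 5).

    FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for a single answer, chosen only to illustrate the call.
score = flesch_reading_ease(total_words=100, total_sentences=5, total_syllables=150)
print(round(score, 2))
```

Higher values indicate text that is easier to read, so under this assumption the Claude-3-Opus answers, with the highest median FRE in the table, would be the most readable of the three.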
- Citation: Zhang Y, Wan XH, Kong QZ, Liu H, Liu J, Guo J, Yang XY, Zuo XL, Li YQ. Evaluating large language models as patient education tools for inflammatory bowel disease: A comparative study. World J Gastroenterol 2025; 31(6): 102090
- URL: https://www.wjgnet.com/1007-9327/full/v31/i6/102090.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i6.102090