Copyright
©The Author(s) 2025.
World J Gastroenterol. Feb 14, 2025; 31(6): 102090
Published online Feb 14, 2025. doi: 10.3748/wjg.v31.i6.102090
Table 4 Mean scores for answers from three large language models
| Groups | Items | ChatGPT-4.0 | Gemini-1.5-Pro | Claude-3-Opus |
|---|---|---|---|---|
| Expert assessment | Accuracy, mean (SD) | 4.06 (0.61) | 4.06 (0.62) | 4.02 (0.66) |
| | Completeness, mean (SD) | 4.24 (0.64) | 4.20 (0.66) | 4.27 (0.58) |
| | Correlation, mean (SD) | 4.57 (0.62) | 4.54 (0.66) | 4.52 (0.66) |
| Patient assessment | Comprehensibility, mean (SD) | 4.02 (0.75) | 4.07 (0.75) | 4.56 (0.66) |
| Objective evaluation | FRE score, mean (SD) | 32.25 (6.91) | 36.92 (8.99) | 54.44 (8.22) |
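Assuming the FRE score in the objective-evaluation row denotes the standard Flesch Reading Ease metric (higher values mean easier-to-read text), a minimal sketch of how such a score can be computed is below. The syllable counter is a naive vowel-group heuristic chosen for illustration; published readability tools use more refined syllable rules, so exact values will differ.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count contiguous vowel groups; every word gets at least 1.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    # FRE = 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

On this scale, scores in the 30s (as ChatGPT-4.0 and Gemini-1.5-Pro obtained) correspond to text readable mainly by college-level readers, while Claude-3-Opus's score in the 50s falls in the "fairly difficult" band typical of 10th-12th-grade material.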
- Citation: Zhang Y, Wan XH, Kong QZ, Liu H, Liu J, Guo J, Yang XY, Zuo XL, Li YQ. Evaluating large language models as patient education tools for inflammatory bowel disease: A comparative study. World J Gastroenterol 2025; 31(6): 102090
- URL: https://www.wjgnet.com/1007-9327/full/v31/i6/102090.htm
- DOI: https://dx.doi.org/10.3748/wjg.v31.i6.102090