Observational Study
Copyright © The Author(s) 2025.
World J Transplant. Sep 18, 2025; 15(3): 103536
Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Table 7 Aggregated performance of ChatGPT and GPT-4 in clinical scenarios across published and unpublished cases, categorized by task type, n (%)
| Type of task | Overall ChatGPT agreement level | Overall GPT-4 agreement level | ChatGPT renal transplantation agreement level | GPT-4 renal transplantation agreement level | ChatGPT liver transplantation agreement level | GPT-4 liver transplantation agreement level |
| --- | --- | --- | --- | --- | --- | --- |
| DD that includes final diagnosis | A: 22/30 (73.3) | A: 27/30 (90) | A: 13/16 (81.3) | A: 15/16 (93.8) | A: 9/14 (64.3) | A: 12/14 (85.7) |
| | PA: 1/30 (3.3) | PA: 1/30 (3.3) | PA: 1/16 (6.3) | PA: 1/16 (6.3) | PA: 0/14 (0) | PA: 0/14 (0) |
| Final diagnosis prediction | A: 11/31 (35.5) | A: 20/31 (64.5) | A: 7/17 (41.2) | A: 13/17 (76.5) | A: 4/14 (28.6) | A: 7/14 (50) |
| | PA: 2/31 (6.5) | PA: 2/31 (6.5) | PA: 1/17 (5.9) | PA: 1/17 (5.9) | PA: 1/14 (7.1) | PA: 1/14 (7.1) |
| Appropriate next diagnostic test | A: 8/19 (42.1) | A: 15/19 (78.9) | A: 6/13 (46.2) | A: 11/13 (84.6) | A: 2/6 (33.3) | A: 4/6 (66.7) |
| | PA: 8/19 (42.1) | PA: 2/19 (10.5) | PA: 5/13 (38.5) | PA: 1/13 (7.7) | PA: 3/6 (50) | PA: 1/6 (16.7) |
| Appropriate treatment | A: 11/21 (52.4) | A: 15/21 (71.4) | A: 5/8 (62.5) | A: 7/8 (87.5) | A: 6/13 (46.2) | A: 4/6 (66.7) |
| | PA: 9/21 (42.9) | PA: 4/21 (19) | PA: 2/8 (25) | PA: 1/8 (12.5) | PA: 7/13 (53.8) | PA: 1/6 (16.7) |
| Prediction of prognosis | A: 3/5 (60) | A: 5/5 (100) | A: 1/1 (100) | A: 1/1 (100) | A: 2/4 (50) | A: 4/4 (100) |
| | PA: 1/5 (20) | PA: 0/5 (0) | PA: 0/0 (0) | PA: 0/0 (0) | PA: 1/4 (25) | PA: 0/4 (0) |
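
Each cell in the table is a count over a denominator, followed by the corresponding percentage. As a minimal sketch of that arithmetic, the Python snippet below recomputes the percentages for the overall "A" entries from their counts. The helper name `pct` and the dictionary layout are illustrative assumptions and not part of the study. Reading "A" as full agreement is also an assumption, since the table does not define the abbreviation here.

```python
def pct(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place.

    Hypothetical helper for illustration; not taken from the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * count / total, 1)


# "A" counts from the two "Overall" columns of the table: (count, denominator).
overall_agreement = {
    "DD that includes final diagnosis": {"ChatGPT": (22, 30), "GPT-4": (27, 30)},
    "Final diagnosis prediction": {"ChatGPT": (11, 31), "GPT-4": (20, 31)},
    "Appropriate next diagnostic test": {"ChatGPT": (8, 19), "GPT-4": (15, 19)},
    "Appropriate treatment": {"ChatGPT": (11, 21), "GPT-4": (15, 21)},
    "Prediction of prognosis": {"ChatGPT": (3, 5), "GPT-4": (5, 5)},
}

for task, cells in overall_agreement.items():
    # Format each cell the way the table does: count/denominator (percentage).
    parts = [f"{model}: {a}/{n} ({pct(a, n)})" for model, (a, n) in cells.items()]
    print(f"{task} -> " + ", ".join(parts))
```

The same helper can be applied to any other cell. For example, `pct(2, 6)` gives the percentage for two agreements out of six liver transplantation cases.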