Copyright
©The Author(s) 2025.
World J Transplant. Sep 18, 2025; 15(3): 103536
Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Table 1 Overall ChatGPT performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%)
Actual label | Assigned label, GI | Assigned label, diagnosis | Assigned label, DD | Assigned label, treatment | Assigned label, prognosis | Assigned label, total |
GI | 33 (11.22) | 20 (6.8) | 6 (2.04) | 16 (5.44) | 3 (1.02) | 78 (26.53) |
Diagnosis | 8 (2.72) | 33 (11.22) | 5 (1.7) | 0 (0) | 2 (0.68) | 48 (16.33) |
DD | 0 (0) | 11 (3.74) | 11 (3.74) | 0 (0) | 0 (0) | 22 (7.48) |
Treatment | 9 (3.06) | 10 (3.4) | 2 (0.68) | 64 (21.77) | 1 (0.34) | 86 (29.25) |
Prognosis | 12 (4.08) | 12 (4.08) | 3 (1.02) | 3 (1.02) | 30 (10.2) | 60 (20.41) |
Total | 62 (21.09) | 86 (29.25) | 27 (9.18) | 83 (28.23) | 36 (12.24) | 294 (100) |
Table 2 Overall GPT-4 performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%)
Actual label | Assigned label, GI | Assigned label, diagnosis | Assigned label, DD | Assigned label, treatment | Assigned label, prognosis | Assigned label, total |
GI | 42 (14.29) | 14 (4.76) | 1 (0.34) | 20 (6.8) | 1 (0.34) | 78 (26.53) |
Diagnosis | 2 (0.68) | 44 (14.97) | 1 (0.34) | 1 (0.34) | 0 (0) | 48 (16.33) |
DD | 0 (0) | 15 (5.1) | 5 (1.7) | 2 (0.68) | 0 (0) | 22 (7.48) |
Treatment | 5 (1.7) | 7 (2.38) | 1 (0.34) | 73 (24.83) | 0 (0) | 86 (29.25) |
Prognosis | 10 (3.4) | 11 (3.74) | 5 (1.7) | 7 (2.38) | 27 (9.18) | 60 (20.41) |
Total | 59 (20.07) | 91 (30.95) | 13 (4.42) | 103 (35.03) | 28 (9.52) | 294 (100) |
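Tables 1 and 2 are confusion matrices (rows: actual label, columns: assigned label), so the agreement with the predefined labels can be read off the diagonal. The following is a minimal sketch of that arithmetic, using only the counts transcribed from the two tables. The function names are illustrative, and the resulting percentages are derived here from the counts rather than quoted from the article.

```python
# Counts transcribed from Tables 1 and 2.
# Rows are the actual labels, columns the assigned labels, in LABELS order.
LABELS = ["GI", "Diagnosis", "DD", "Treatment", "Prognosis"]

CHATGPT = [
    [33, 20, 6, 16, 3],
    [8, 33, 5, 0, 2],
    [0, 11, 11, 0, 0],
    [9, 10, 2, 64, 1],
    [12, 12, 3, 3, 30],
]

GPT4 = [
    [42, 14, 1, 20, 1],
    [2, 44, 1, 1, 0],
    [0, 15, 5, 2, 0],
    [5, 7, 1, 73, 0],
    [10, 11, 5, 7, 27],
]


def overall_agreement(matrix):
    """Return (matching cases, total cases, percent) from the diagonal."""
    total = sum(sum(row) for row in matrix)
    matching = sum(matrix[i][i] for i in range(len(matrix)))
    return matching, total, round(100 * matching / total, 2)


def per_label_agreement(matrix):
    """Percent of cases of each actual label that received the same assigned label."""
    return {
        LABELS[i]: round(100 * row[i] / sum(row), 1)
        for i, row in enumerate(matrix)
    }


if __name__ == "__main__":
    for name, matrix in (("ChatGPT", CHATGPT), ("GPT-4", GPT4)):
        matching, total, pct = overall_agreement(matrix)
        print(f"{name}: {matching}/{total} ({pct}%)")
        print(per_label_agreement(matrix))
```

Under these counts, the diagonal sums to 171 of 294 cases for ChatGPT and 191 of 294 for GPT-4. The row sums also reproduce the "total" column of each table, which is a quick check that the counts were transcribed correctly.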
Table 3 Comparative performance of ChatGPT and GPT-4 in case reports on renal transplantation, detailing agreement levels by task type
Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
Alharbi et al[21] | 1 | Provide a list of suitable antibiotics for Pseudomonas aeruginosa urinary tract infection. | Provided a list of suitable antibiotics including the one used by physicians (meropenem)/provided a list of suitable antibiotics including the one used by physicians (meropenem) | Meropenem was administered | A/A |
2 | Suggest the next diagnostic test(s) needed for the patient | Suggested a renal ultrasound and a stool culture/suggested a renal ultrasound, abdominal CT, blood cultures, and a stool culture | Abdominal ultrasound and abdominal CT scan were conducted | PA/A | |
3 | Identify the most probable diagnosis for the patient | Renal allograft malignancy/renal allograft malignancy | Eosinophilic chromophobe renal cell carcinoma was confirmed by the histopathological examination of the graft | A/A | |
Rubin et al[22] | 4 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV viremia | A/A |
5 | Provide the most likely diagnosis for the patient | Post-influenza bacterial pneumonia/CMV reactivation | CMV viremia was demonstrated by antigenemia and PCR assay | D/A | |
6 | Suggest treatment for the patient | Suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir)/suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferable ganciclovir or valganciclovir) | Intravenous ganciclovir followed by oral valganciclovir at a dose of 900 mg/day was administered | A/A | |
Molina-Andújar et al[23] | 7 | Provide a DD for the patient | Provided a DD that included the final diagnosis/ Provided a DD that included the final diagnosis | Acute post-infectious glomerulonephritis | A/A |
8 | Provide the most likely diagnosis for the patient | Acute post-infectious glomerulonephritis/acute post-infectious glomerulonephritis | Acute post-infectious glomerulonephritis | A/A | |
Baker et al[24] | 9 | Provide the next step in the patient’s management | Suggested hemodynamic stabilization with transfusion of blood products and bleeding control including surgical intervention, if necessary, followed by continuous monitoring/suggested hemodynamic stabilization with transfusion of blood products and surgical exploration if bleeding is suspected to be within the surgical site. Suggested medication reevaluation focused on anticoagulants, prophylactic treatment for infection prevention and continuous monitoring. | The patient was taken back to theatre for exploration where ligation of the bleeding artery, removal of blood clots and blood transfusion took place. Postoperative monitoring was performed | A/A |
10 | Suggest the next diagnostic test needed for the patient | Suggested an abdominal CT scan or an ultrasound/suggested imaging such as abdominal CT with contrast, an ultrasound or an angiogram. Suggested evaluating the patient with new laboratory tests and for the need of re-exploration | An urgent CT angiogram was performed | PA/A | |
11 | Provide a DD for the bleeding | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Bleeding from a small branch of the renal artery | A/A | |
12 | Provide the most likely diagnosis for the patient | Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery/Failure or dislodgement of a surgical clip: Bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery | Bleeding was noticed from a small branch of the renal artery | A/A | |
Gewehr et al[25] | 13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Fungal infection | A/A |
14 | Provide the most likely diagnosis for the patient | Fungal infection/fungal infection, and specifically sporotrichosis | Fungal infection (sporotrichosis) | A/A | |
Vassallo et al[26] | 15 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Active hepatitis E virus infection | A/A |
16 | Provide the most likely diagnosis for the patient | NAFLD/NAFLD or drug induced | Active hepatitis E virus infection | D/D | |
17 | Suggest the next diagnostic test needed for the patient | Suggested liver biopsy along with further imaging and laboratory investigations/suggested liver biopsy along with further imaging and laboratory investigations | Liver biopsy | A/A | |
18 | Suggest the next diagnostic test needed for the patient after the biopsy results | Suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function/suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function | A more extensive viral screen was conducted | A/A | |
Olsen et al[27] | 19 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Epstein-Barr virus-negative, diffuse, large B-cell lymphoma | A/A |
20 | Provide the most likely diagnosis for the patient | Suggested that infectious aetiologies such as disseminated tuberculosis or fungal infections are more likely. It implied that diagnosis is difficult without further diagnostic investigations/identified PTLD as the most likely diagnosis followed by infectious aetiologies | Epstein-Barr virus-negative, diffuse, large B cell lymphoma | D/A | |
21 | Suggest the next diagnostic test needed for the patient | Suggested sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy/suggested liver biopsy, sputum and/or BAL cultures, Mantoux test or IGRA, Blood tests, further imaging and laboratory tests, and lung biopsy | Biopsy from one of the liver lesions | D/A | |
Allam et al[28] | 22 | Suggest the next diagnostic test needed for the patient | Suggested a kidney biopsy/suggested a kidney biopsy and further laboratory tests | Transplant biopsy was performed | A/A |
23 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD including vascular complications such as vein stenosis | Biopsy-induced arteriovenous fistula and venous stenosis | D/PA | |
24 | Suggest treatment for the patient | Suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)/suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting) | Embolization of fistula (coil occlusion) | A/A | |
Subramanian et al[29] | 25 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | A small basal ganglia infarct and an infarct of the spinal cord was found | D/A |
26 | Provide the most likely diagnosis for the patient | Suggested ischemic injury or infarction of the spinal cord/suggested spinal cord ischemia or infarction | A small basal ganglia infarct and an infarct of the spinal cord was found | A/A | |
27 | Suggest the next diagnostic test needed for the patient | Suggested spine MRI, NCS and EMG to assess peripheral nerves and muscles, lumbar puncture if infections suspected, and transplant biopsy if rejection or ischemia is suspected/suggested spine MRI-MRA, neurophysiological studies (SSEP, NCS and EMG), lumbar puncture if infections suspected | A CTAP, and spine/brain MRI were performed | PA/PA | |
Ainsworth et al[30] | 28 | Provide a DD for the patient | Provided a DD that included immune-mediated hemolysis but did not specifically include PLS/provided a DD that included the final diagnosis | PLS | PA/A |
29 | Provide the most likely diagnosis for the patient | Suggested hemolysis due to mismatched blood type of the donor/suggested PLS | PLS | D/A |
Table 4 Comparative performance of ChatGPT and GPT-4 in case reports on liver transplantation, detailing agreement levels by task type
Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
Rubin et al[22] | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV | A/A |
2 | Provide the most likely diagnosis for the patient | Suggested post-transplant infection, particularly a viral infection (CMV, EBV, or VZV)/CMV | CMV | PA/A | |
3 | Justify the recurrence of CMV infections despite treatment | Suggested resistance to ganciclovir/suggested resistance to ganciclovir or/and inadequate duration of initial treatment-secondary infections | Ganciclovir resistant infection | A/A | |
4 | Suggest alternative treatment for the patient | Suggested foscarnet/suggested foscarnet or cidofovir or letermovir or/and CMV immunoglobulins | Foscarnet was administered | A/A | |
Okeke et al[31] | 5 | Case presentation/suggest treatment for the patient given no arterial flow in the liver | Suggested interventional radiology procedures or/and surgical revascularization/suggested interventional radiology procedures or/and surgical revascularization (thrombectomy or re-anastomosis) | Interventional radiology procedure (thrombolysis) was performed. Then revascularization was achieved intraoperatively (infra-aortic jump was performed) | PA/PA |
6 | Suggest the diagnostic tests needed for the patient following re-thrombosis | Suggested doppler ultrasound, CT angiogram, coagulation profile-thrombophilia testing/suggested thrombophilia workup, repeat imaging (doppler ultrasound, CT/MRI angiography), and autoimmune markers | Hypercoagulable workup was performed | A/A | |
7 | Provide a DD behind re-thrombosis | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Antiphospholipid syndrome | A/A | |
8 | Provide the most likely diagnosis for the patient | Suggested hepatic artery thrombosis/suggested antiphospholipid syndrome | Antiphospholipid syndrome | D/A | |
Eubank et al[32] | 9 | Case presentation/determine the most likely microorganism to be identified by the swab | Suggested Staphylococcus aureus, Streptococcus species, Enterococcus species, and Pseudomonas aeruginosa, and fungi like Candida albicans/suggested Staphylococcus aureus, Enterococcus species, Pseudomonas aeruginosa, Escherichia coli, fungi like Candida or Aspergillus, viruses like CMV, and mycobacteria | 94% Enterococcus faecalis, 93% Rhizopus oryzae, and 5% Aspergillus flavus | D/PA |
10 | Suggest treatment for the patient given the pathogens identified | Suggested intravenous liposomal amphotericin B at an appropriate dosage, along with surgical debridement of infected tissue/suggested intravenous liposomal amphotericin B at an appropriate dosage, oral posaconazole along with surgical debridement of infected tissue. | Oral posaconazole 300 mg and IV amphotericin B and micafungin daily. Amphotericin B deoxycholate irrigation in the wound vacuum | PA/A | |
Kim et al[33] | 11 | Case presentation/provide a DD for the patient’s shock | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
12 | Provide the most likely diagnosis for the patient | Suggested a surgical complication, specifically duodenal perforation/suggested duodenal perforation or drug-induced kidney injury/neutropenia | GVHD | D/D | |
13 | Suggest the further diagnostic tests needed for the patient | Suggested blood cultures, peritoneal fluid analysis, endoscopy or upper GI imaging/suggested blood and urine cultures, viral and fungal tests, peritoneal fluid analysis, laboratory tests, and endoscopy or upper GI imaging | Mixed chimerism studies and skin biopsy were performed | D/D | |
14 | Suggest further treatment for the patient given the mixed chimerism studies results | The following treatment options were suggested: Systemic corticosteroids, adjusting tacrolimus dose, consider additional immunosuppressives such as mycophenolate, and phototherapy/suggested considering the following treatment options: High-dose corticosteroids, ATG, ECP, infliximab, ruxolitinib, MSC transplantation, additional immunosuppressive agents, and IL-2 diphtheria toxin | Steroids were administered for 4 days followed by ruxolitinib due to patient not responding to treatment | PA/A | |
15 | Guess the survival of the patient | Suggested that the patient did not, most likely, survive/suggested that the patient did not, most likely, survive | The patient died on day 16 of re-admission, 45 days following transplantation | A/A | |
Kim et al[33], (b) | 16 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
17 | Provide the most likely diagnosis for the patient | Suggested Clostridioides difficile colitis/suggested GVHD | GVHD | D/A | |
18 | Suggest treatment for the patient | The following treatment options were suggested: Glucocorticoids, CNIs, ATG, T-cell depleting agents such as basiliximab/high-dose corticosteroids, adjust immunosuppression, consider second line treatments such as ATG, ECP, sirolimus, infliximab, and basiliximab | Steroids were administered for 2 days followed by ruxolitinib due to patient not responding to treatment | PA/PA | |
19 | Guess the survival of the patient | Declined to make a prediction/suggested that the patient did not, most likely, survive | The patient died 29 days after transplant | D/A | |
Ramírez de la Piscina et al[34] | 20 | Case presentation/Provide a DD for the patient | Provided a DD that included the final diagnosis/ provided a DD that included the final diagnosis | Budd-Chiari syndrome secondary to ADPKD | A/A |
21 | Provide the most likely diagnosis for the patient | Suggested Budd-Chiari syndrome/suggested Budd-Chiari syndrome secondary to the compression from ADPKD cysts | Budd-Chiari syndrome secondary to ADPKD | A/A | |
22 | Suggest treatment for the patient | Provided a list of suitable treatment options including only liver transplantation/provided a list of suitable treatment options including combined transplantation | A combined liver and renal transplantation was performed | PA/A | |
Arstikyte et al[35] | 23 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Venous air embolism | A/A |
24 | Provide the most likely diagnosis for the patient | Suggested that information given is insufficient to single out a specific diagnosis/suggested that based on given information hemorrhage or venous air embolism are the two most likely diagnoses | Venous air embolism | D/A | |
25 | Suggest appropriate diagnostic test for the patient | Suggested TEE/suggested TEE | TEE | A/A | |
Aucejo et al[36] | 26 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D |
27 | Provide the most likely diagnosis for the patient | Suggested adhesions, anastomotic leakage, or biliary complications/suggested PVT | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D | |
28 | Given the RHV stenosis diagnosis, suggest treatment for the patient | Suggested considering stent placement, TIPS or surgical revision/suggested considering stent placement, TIPS or surgical revision | A wall stent 14 mm in diameter by 40 mm in length was placed across the RHV stenosis | A/A | |
Ichimura et al[37] | 29 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | VOD/SOS | A/A |
30 | Provide the most likely diagnosis for the patient | Suggested GVHD/suggested VOD/SOS | VOD/SOS | D/A | |
31 | Suggest treatment for the patient given VOD/SOS | Suggested considering defibrotide, anticoagulant medications, and liver transplantation/suggested considering defibrotide, anticoagulant medications, TIPS, and liver transplantation | The physicians performed a liver transplantation since defibrotide had not yet been approved | A/A | |
32 | Provide a new differential diagnosis for the patient’s deterioration postoperatively | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD, several infections | D/A | |
Trevizoli et al[38] | 33 | Case presentation/suggest appropriate treatment for the patient | Suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, diuretics, variceal bleeding prophylaxis and liver transplant evaluation/suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, consider surgical management (colectomy), diuretics, variceal bleeding prophylaxis and liver transplant evaluation | Sodium restriction, diuretic therapy, hydrocortisone 300 mg was started without adequate response, vedolizumab | PA/PA |
34 | Suggest appropriate treatment for the patient given the DVT progression | Suggested LMWH and IVF/suggested LMWH | He underwent hemodynamic intervention with the placement of a vena cava filter | A/D |
Table 5 Comparative performance of ChatGPT and GPT-4 in department cases on renal transplantation, detailing agreement levels by task type
Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
1 | 1 | Case presentation/provide the diagnostic tests needed to investigate refractory ascites in patient with ADPKD | Suggested abdominal ultrasound, paracentesis with fluid analysis, LF tests, tumor marker tests, CT scan, serologic testing, genetic testing/suggested paracentesis with fluid analysis, LF tests, abdominal ultrasound, CT scan, echocardiogram, and endoscopy, further evaluation for elevated markers | Paracentesis (ascites fluid was sent for cytology, culture, TB investigation, SAAG calculation), abdominal CT, liver ultrasound, LF tests, tumor marker tests, serologic testing, echocardiogram, and endoscopy | PA/A |
2 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Tuberculous peritonitis | A/A | |
3 | Provide the most likely diagnosis for the patient | Suggested malignancy (most likely ovarian cancer) or SBP are the most likely diagnoses/suggested tuberculous peritonitis or malignancy or SBP as the most likely diagnoses | Tuberculous peritonitis | D/A | |
2 | 4 | Case presentation/provide a differential diagnosis for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute PE | A/A |
5 | Provide the most probable diagnosis for the patient | Suggested myocardial infarction as the most probable diagnosis/suggested PE as the most probable diagnosis | Acute PE | D/A | |
6 | What diagnostic test is more suitable for this patient | Suggested CTPA and ECG be performed/suggested CTPA, ECG, and d-dimers tests be performed | CTPA was performed | A/A | |
7 | What treatment do you recommend for this patient, given PE is confirmed | Suggested a choice among LMWH, DOACs, and warfarin. No discrimination between short and long-term anticoagulation was made/suggested initial anticoagulation with either LMWH or DOACs including apixaban followed by a long-term anticoagulation with either a DOAC or warfarin | 10 mg apixaban BD was commenced followed by 5 mg BD for 6 months | PA/A | |
3 | 8 | Case presentation/provide a DD given the post-operative signs/symptoms of the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-operative bleeding | A/A |
9 | Provide the most probable diagnosis | Suggested exacerbation or progression of her underlying thrombocytopenic disorder/suggested post-transplant acute thrombotic microangiopathy | Post-operative bleeding | D/D | |
10 | Predict the next diagnostic test that the patient requires | Suggested coagulation studies, renal function test, peripheral blood smear, infectious testing and imaging including ultrasound and CT/suggested peripheral blood smear, LDH level, Coombs test, renal function, immunosuppressive level tests, and infection screening. | Abdominal ultrasound and abdomen/pelvis CT with contrast | PA/D | |
11 | Appropriate treatment given the evidence of active bleeding | Suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring/suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring | The patient was transfused and was re-explored | A/A | |
4 | 12 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute graft thrombosis due to renal vein thrombosis | A/A |
13 | Provide the most probable diagnosis | Suggested acute graft thrombosis due to either renal artery or vein thrombosis/suggested acute graft thrombosis due to renal vein thrombosis | Acute graft thrombosis due to renal vein thrombosis | A/A | |
14 | Provide the most suitable diagnostic test | Suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy/suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy | Transplant doppler US | A/A | |
15 | Given the transplant US findings, provide the patient’s diagnosis | Acute renal allograft rejection/acute renal artery thrombosis or artery stenosis | Renal vein thrombosis | D/D | |
16 | Given the transplant US findings, suggest a diagnostic modality that could verify diagnosis | Renal biopsy/suggested CT angiography | CT angiography was performed | D/A | |
17 | Suggest treatment options for the patient | Suggested considering high-dose corticosteroids, antithymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, basiliximab or alemtuzumab, and plasmapheresis with intravenous immunoglobulin/suggested surgical revascularization | Patient was re-explored | D/A | |
18 | Findings of reperfusion during benchwork after explantation | Suggested inadequate restoration of tissue perfusion and significant vascular compromise and tissue damage/suggested extensive, vascular thrombosis with poor kidney perfusion, and evidence of parenchymal damage | Artery perfusion required high pressure, kidney became turgid, swollen, and a capsular tear was seen | A/A | |
5 | 19 | Case presentation/provide DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-transplant obstructive LUTS due to clot retention | A/A |
20 | Provide most probable diagnosis | Suggested urinary tract obstruction most probably at the site of the anastomosis/suggested urinary tract obstruction due to blood clot formation as the most probable diagnosis | Post-transplant obstructive LUTS due to clot retention | PA/A | |
21 | Suggest next diagnostic test to verify the diagnosis | Suggested considering transplant US, abdominal CT or renal scintigraphy/suggested transplant US as the first-line imaging modality. Suggested that other options include abdominal CT, MRI, and nuclear medicine scans | A transplant US was performed | A/A | |
22 | Given findings of US/suggest a suitable treatment option for the patient | Suggested considering manual irrigation, catheter flushing, cystoscopic clot evaluation, and monitoring/suggested replacing the foley catheter to flush out smaller clots, cystoscopic clot evaluation, consider percutaneous nephrostomy, and monitoring | A 3-way irrigation system was applied | PA/PA | |
23 | Despite resolved hematuria patient’s clearance did not improve/provide a DD | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute kidney injury with acute tubular necrosis | A/A | |
24 | Provide most probable diagnosis | Suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis/suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis | Acute kidney injury with acute tubular necrosis | A/A | |
25 | Case progression update/poor renal function 3 months post-operatively provide DD for patient’s signs and symptoms | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Recurrence of underlying disease | A/A | |
26 | Provide most probable diagnosis | Suggested chronic allograft dysfunction as the most probable diagnosis/suggested chronic allograft dysfunction and recurrence of the underlying disease as the two most probable diagnoses | Recurrence of underlying disease | D/PA |
Table 6 Comparative performance of ChatGPT and GPT-4 in department cases on liver transplantation, detailing agreement levels by task type
Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
1 | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Early anastomotic bile leak | A/A |
2 | Provide the most probable diagnosis | Suggested a biliary complication including bile leak as the most probable diagnosis/suggested bile leak as the most probable diagnosis | Early anastomotic bile leak | A/A | |
3 | Suggest a suitable diagnostic test to confirm the diagnosis | Suggested considering abdominal US or CT, and MRCP/suggested considering abdominal US or CT, fluid drain analysis, and MRCP | Abdominal CT and fluid drain analysis were performed | PA/A | |
4 | Suggest a suitable treatment for this patient | Suggested considering percutaneous drainage, ERCP, surgical intervention, and antibiotics if there are signs of infection/suggested considering as a first line less invasive treatments such as percutaneous drainage and ERCP and proceed with re-exploration if those fail, while covering the patient with antibiotics | Antibiotics were commenced, followed by an ERCP which did not resolve the bile leak and the patient was re-explored | A/A | |
2 | 5 | Case presentation/calculate CP score, MELD score, and MELD-sodium score | Accurately calculated CP score and MELD score, underestimated MELD-sodium score/accurately calculated the required scores | CP score = 13, MELD score = 34, and MELD-sodium score = 37 | PA/A |
6 | Patient’s pre-operative assessment findings presented/evaluate patient’s eligibility to proceed with transplantation | Suggested that it’s likely that the operation was postponed or deferred until the patient's condition improved/suggested that given the findings the transplant team would have opted to delay the liver transplantation until active issues were adequately addressed | Transplantation did not proceed | A/A | |
3 | 7 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | PLS | D/D |
8 | Provide the most probable diagnosis | Suggested acute cellular rejection as the most probable diagnosis/suggested acute hemolytic transfusion reaction | PLS | D/D | |
9 | Suggest treatment options for the patient | Suggested high-dose of intravenous corticosteroids, other anti-rejection medications, and plasmapheresis/suggested not transfusing the patient further, administering corticosteroids, and monitoring the patient | Patient was treated with high-dose corticosteroids, plasmapheresis, and intravenous immunoglobulin | PA/D | |
10 | Given the patient’s 3-month new signs/symptoms (recurrent ascites, low-grade fever etc.), provide a new DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PTLD | A/A | |
11 | Provide the most probable diagnosis | Suggested PTLD as the most probable diagnosis/suggested nephrotic syndrome as the most probable diagnosis | PTLD | A/D | |
4 | 12 | Case presentation/ suggest the most suitable diagnostic test | Brain imaging was suggested/suggested brain imaging, EEG, and tacrolimus level test | A brain CT, EEG, and tacrolimus level test were performed | PA/A |
13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PRES | A/A | |
14 | Provide the most probable diagnosis | Suggested PRES as the most probable diagnosis/suggested tacrolimus neurotoxicity as the most probable diagnosis | PRES | A/D | |
5 | 15 | Case presentation/provide DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | GVHD | A/A |
16 | Provide most probable diagnosis | Suggested CMV infection as the most probable diagnosis/suggested CMV infection as the most probable diagnosis | GVHD | D/D | |
17 | Suggest appropriate diagnostic tests | Suggested CMV testing, biopsy, and imaging studies/suggested CMV testing, imaging studies, and skin biopsy | Peripheral blood flow cytometry, colonoscopy, and skin biopsy were performed | PA/PA |
Table 7 Aggregated performance of ChatGPT and GPT-4 in clinical scenarios across published and unpublished cases, categorized by task type, n (%)
Type of task | Overall ChatGPT agreement level | Overall GPT-4 agreement level | ChatGPT renal transplantation agreement level | GPT-4 renal transplantation agreement level | ChatGPT liver transplantation agreement level | GPT-4 liver transplantation agreement level |
DD that includes final diagnosis | A: 22/30 (73.3) | A: 27/30 (90) | A: 13/16 (81.3) | A: 15/16 (93.8) | A: 9/14 (64.3) | A: 12/14 (85.7) |
PA: 1/30 (3.33) | PA: 1/30 (3.3) | PA: 1/16 (6.3) | PA: 1/16 (6.2) | PA: 0/14 (0) | PA: 0/14 (0) | |
Final diagnosis prediction | A: 11/31 (35.5) | A: 20/31 (64.5) | A: 7/17 (41.2) | A: 13/17 (76.5) | A: 4/14 (28.6) | A: 7/14 (50) |
PA: 2/31 (6.45) | PA: 2/31 (6.5) | PA: 1/17 (5.9) | PA: 1/17 (5.9) | PA: 1/14 (7.1) | PA: 1/14 (7.1) | |
Appropriate next diagnostic test | A: 8/19 (42.1) | A: 15/19 (78.9) | A: 6/13 (46.2) | A: 11/13 (84.6) | A: 2/6 (33.3) | A: 4/6 (66.7) |
PA: 8/19 (42.1) | PA: 2/19 (10.5) | PA: 5/13 (38.5) | PA: 1/13 (7.7) | PA: 3/6 (50) | PA: 1/6 (16.7) | |
Appropriate treatment | A: 11/21 (52.4) | A: 15/21 (71.4) | A: 5/8 (62.5) | A: 7/8 (87.5) | A: 6/13 (46.2) | A: 4/6 (66.7) |
PA: 9/21 (42.9) | PA: 4/21 (19) | PA: 2/8 (25) | PA: 1/8 (12.5) | PA: 7/13 (53.8) | PA: 1/6 (16.7) | |
Prediction of prognosis | A: 3/5 (60) | A: 5/5 (100) | A: 1/1 (100) | A: 1/1 (100) | A: 2/4 (50) | A: 4/4 (100) |
PA: 1/5 (20) | PA: 0/5 (0) | PA: 0/0 (0) | PA: 0/0 (0) | PA: 1/4 (25) | PA: 0/4 (0) |
- Citation: Christou CD, Sitsiani O, Boutos P, Katsanos G, Papadakis G, Tefas A, Papalois V, Tsoulfas G. Comparison of ChatGPT-3.5 and GPT-4 as potential tools in artificial intelligence-assisted clinical practice in renal and liver transplantation. World J Transplant 2025; 15(3): 103536
- URL: https://www.wjgnet.com/2220-3230/full/v15/i3/103536.htm
- DOI: https://dx.doi.org/10.5500/wjt.v15.i3.103536