Observational Study
Copyright ©The Author(s) 2025.
World J Transplant. Sep 18, 2025; 15(3): 103536
Published online Sep 18, 2025. doi: 10.5500/wjt.v15.i3.103536
Table 1 Overall ChatGPT performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%)
| Actual label | Assigned GI | Assigned diagnosis | Assigned DD | Assigned treatment | Assigned prognosis | Total |
| --- | --- | --- | --- | --- | --- | --- |
| GI | 33 (11.22) | 20 (6.8) | 6 (2.04) | 16 (5.44) | 3 (1.02) | 78 (26.53) |
| Diagnosis | 8 (2.72) | 33 (11.22) | 5 (1.7) | 0 (0) | 2 (0.68) | 48 (16.33) |
| DD | 0 (0) | 11 (3.74) | 11 (3.74) | 0 (0) | 0 (0) | 22 (7.48) |
| Treatment | 9 (3.06) | 10 (3.4) | 2 (0.68) | 64 (21.77) | 1 (0.34) | 86 (29.25) |
| Prognosis | 12 (4.08) | 12 (4.08) | 3 (1.02) | 3 (1.02) | 30 (10.2) | 60 (20.41) |
| Total | 62 (21.09) | 86 (29.25) | 27 (9.18) | 83 (28.23) | 36 (12.24) | 294 (100) |
Table 2 Overall GPT-4 performance in assigning context labels across 294 virtual cases, highlighting agreement with predefined labels, n (%)
| Actual label | Assigned GI | Assigned diagnosis | Assigned DD | Assigned treatment | Assigned prognosis | Total |
| --- | --- | --- | --- | --- | --- | --- |
| GI | 42 (14.29) | 14 (4.76) | 1 (0.34) | 20 (6.8) | 1 (0.34) | 78 (26.53) |
| Diagnosis | 2 (0.68) | 44 (14.97) | 1 (0.34) | 1 (0.34) | 0 (0) | 48 (16.33) |
| DD | 0 (0) | 15 (5.1) | 5 (1.7) | 2 (0.68) | 0 (0) | 22 (7.48) |
| Treatment | 5 (1.7) | 7 (2.38) | 1 (0.34) | 73 (24.83) | 0 (0) | 86 (29.25) |
| Prognosis | 10 (3.4) | 11 (3.74) | 5 (1.7) | 7 (2.38) | 27 (9.18) | 60 (20.41) |
| Total | 59 (20.07) | 91 (30.95) | 13 (4.42) | 103 (35.03) | 28 (9.52) | 294 (100) |
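The diagonal of each matrix in Tables 1 and 2 holds the cases where the assigned label matched the predefined one. The sketch below, which is an illustration and not the authors' analysis code, recomputes overall and per-label agreement from the counts in the two tables. The function names and the choice of "diagonal divided by row total" as the per-label measure are assumptions made for this example.

```python
# Rows: actual label; columns: assigned label.
# Order for both axes: GI, Diagnosis, DD, Treatment, Prognosis.
LABELS = ["GI", "Diagnosis", "DD", "Treatment", "Prognosis"]

# Counts copied from Table 1 (ChatGPT) and Table 2 (GPT-4).
CHATGPT = [
    [33, 20, 6, 16, 3],
    [8, 33, 5, 0, 2],
    [0, 11, 11, 0, 0],
    [9, 10, 2, 64, 1],
    [12, 12, 3, 3, 30],
]
GPT4 = [
    [42, 14, 1, 20, 1],
    [2, 44, 1, 1, 0],
    [0, 15, 5, 2, 0],
    [5, 7, 1, 73, 0],
    [10, 11, 5, 7, 27],
]


def overall_agreement(matrix):
    """Share of all cases whose assigned label equals the actual label."""
    total = sum(sum(row) for row in matrix)
    matched = sum(matrix[i][i] for i in range(len(matrix)))
    return matched / total


def per_label_agreement(matrix):
    """For each actual label, the share of its cases labeled correctly."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(LABELS)
    }


for name, matrix in (("ChatGPT", CHATGPT), ("GPT-4", GPT4)):
    print(f"{name}: overall agreement {overall_agreement(matrix):.2%}")
    for label, share in per_label_agreement(matrix).items():
        print(f"  {label}: {share:.2%}")
```

With these counts, the diagonal sums to 171 of 294 cases for ChatGPT and 191 of 294 for GPT-4.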
Table 3 Comparative performance of ChatGPT and GPT-4 in case reports on renal transplantation, detailing agreement levels by task type
| Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians' course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
| --- | --- | --- | --- | --- | --- |
| Alharbi et al[21] | 1 | Provide a list of suitable antibiotics for Pseudomonas aeruginosa urinary tract infection | Provided a list of suitable antibiotics including the one used by physicians (meropenem)/provided a list of suitable antibiotics including the one used by physicians (meropenem) | Meropenem was administered | A/A |
| | 2 | Suggest the next diagnostic test(s) needed for the patient | Suggested a renal ultrasound and a stool culture/suggested a renal ultrasound, abdominal CT, blood cultures, and a stool culture | Abdominal ultrasound and abdominal CT scan were conducted | PA/A |
| | 3 | Identify the most probable diagnosis for the patient | Renal allograft malignancy/renal allograft malignancy | Eosinophilic chromophobe renal cell carcinoma was confirmed by the histopathological examination of the graft | A/A |
| Rubin et al[22] | 4 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV viremia | A/A |
| | 5 | Provide the most likely diagnosis for the patient | Post-influenza bacterial pneumonia/CMV reactivation | CMV viremia was demonstrated by antigenemia and PCR assay | D/A |
| | 6 | Suggest treatment for the patient | Suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferably ganciclovir or valganciclovir)/suggested ganciclovir, valganciclovir, foscarnet, and cidofovir (most preferably ganciclovir or valganciclovir) | Intravenous ganciclovir followed by oral valganciclovir at a dose of 900 mg/day was administered | A/A |
| Molina-Andújar et al[23] | 7 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute post-infectious glomerulonephritis | A/A |
| | 8 | Provide the most likely diagnosis for the patient | Acute post-infectious glomerulonephritis/acute post-infectious glomerulonephritis | Acute post-infectious glomerulonephritis | A/A |
| Baker et al[24] | 9 | Provide the next step in the patient’s management | Suggested hemodynamic stabilization with transfusion of blood products and bleeding control including surgical intervention, if necessary, followed by continuous monitoring/suggested hemodynamic stabilization with transfusion of blood products and surgical exploration if bleeding is suspected to be within the surgical site. Suggested medication reevaluation focused on anticoagulants, prophylactic treatment for infection prevention, and continuous monitoring | The patient was taken back to theatre for exploration, where ligation of the bleeding artery, removal of blood clots, and blood transfusion took place. Postoperative monitoring was performed | A/A |
| | 10 | Suggest the next diagnostic test needed for the patient | Suggested an abdominal CT scan or an ultrasound/suggested imaging such as abdominal CT with contrast, an ultrasound, or an angiogram. Suggested evaluating the patient with new laboratory tests and for the need for re-exploration | An urgent CT angiogram was performed | PA/A |
| | 11 | Provide a DD for the bleeding | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Bleeding from a small branch of the renal artery | A/A |
| | 12 | Provide the most likely diagnosis for the patient | Failure or dislodgement of a surgical clip: bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery/failure or dislodgement of a surgical clip: bleeding from a small branch of the renal artery where a surgical clip had come off during the re-exploration surgery | Bleeding was noticed from a small branch of the renal artery | A/A |
| Gewehr et al[25] | 13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Fungal infection | A/A |
| | 14 | Provide the most likely diagnosis for the patient | Fungal infection/fungal infection, specifically sporotrichosis | Fungal infection (sporotrichosis) | A/A |
| Vassallo et al[26] | 15 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Active hepatitis E virus infection | A/A |
| | 16 | Provide the most likely diagnosis for the patient | NAFLD/NAFLD or drug-induced | Active hepatitis E virus infection | D/D |
| | 17 | Suggest the next diagnostic test needed for the patient | Suggested liver biopsy along with further imaging and laboratory investigations/suggested liver biopsy along with further imaging and laboratory investigations | Liver biopsy | A/A |
| | 18 | Suggest the next diagnostic test needed for the patient after the biopsy results | Suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function/suggested extensive viral serologic tests, PCR for suspected viruses, immunostaining of liver biopsy, and continuous monitoring of liver function | A more extensive viral screen was conducted | A/A |
| Olsen et al[27] | 19 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Epstein-Barr virus-negative, diffuse, large B-cell lymphoma | A/A |
| | 20 | Provide the most likely diagnosis for the patient | Suggested that infectious aetiologies such as disseminated tuberculosis or fungal infections are more likely. It implied that diagnosis is difficult without further diagnostic investigations/identified PTLD as the most likely diagnosis, followed by infectious aetiologies | Epstein-Barr virus-negative, diffuse, large B-cell lymphoma | D/A |
| | 21 | Suggest the next diagnostic test needed for the patient | Suggested sputum and/or BAL cultures, Mantoux test or IGRA, blood tests, further imaging and laboratory tests, and lung biopsy/suggested liver biopsy, sputum and/or BAL cultures, Mantoux test or IGRA, blood tests, further imaging and laboratory tests, and lung biopsy | Biopsy from one of the liver lesions | D/A |
| Allam et al[28] | 22 | Suggest the next diagnostic test needed for the patient | Suggested a kidney biopsy/suggested a kidney biopsy and further laboratory tests | Transplant biopsy was performed | A/A |
| | 23 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD including vascular complications such as vein stenosis | Biopsy-induced arteriovenous fistula and venous stenosis | D/PA |
| | 24 | Suggest treatment for the patient | Suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting)/suggested intervention to address the arteriovenous fistula and stenosis of the main renal vein (embolization, angioplasty, stenting) | Embolization of fistula (coil occlusion) | A/A |
| Subramanian et al[29] | 25 | Provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | A small basal ganglia infarct and an infarct of the spinal cord were found | D/A |
| | 26 | Provide the most likely diagnosis for the patient | Suggested ischemic injury or infarction of the spinal cord/suggested spinal cord ischemia or infarction | A small basal ganglia infarct and an infarct of the spinal cord were found | A/A |
| | 27 | Suggest the next diagnostic test needed for the patient | Suggested spine MRI, NCS and EMG to assess peripheral nerves and muscles, lumbar puncture if infection is suspected, and transplant biopsy if rejection or ischemia is suspected/suggested spine MRI-MRA, neurophysiological studies (SSEP, NCS, and EMG), and lumbar puncture if infection is suspected | A CTAP and spine/brain MRI were performed | PA/PA |
| Ainsworth et al[30] | 28 | Provide a DD for the patient | Provided a DD that included immune-mediated hemolysis but did not specifically include PLS/provided a DD that included the final diagnosis | PLS | PA/A |
| | 29 | Provide the most likely diagnosis for the patient | Suggested hemolysis due to mismatched blood type of the donor/suggested PLS | PLS | D/A |
Table 4 Comparative performance of ChatGPT and GPT-4 in case reports on liver transplantation, detailing agreement levels by task type
| Ref. | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians' course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
| --- | --- | --- | --- | --- | --- |
| Rubin et al[22] | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | CMV | A/A |
| | 2 | Provide the most likely diagnosis for the patient | Suggested post-transplant infection, particularly a viral infection (CMV, EBV, or VZV)/CMV | CMV | PA/A |
| | 3 | Justify the recurrence of CMV infections despite treatment | Suggested resistance to ganciclovir/suggested resistance to ganciclovir and/or inadequate duration of initial treatment-secondary infections | Ganciclovir-resistant infection | A/A |
| | 4 | Suggest alternative treatment for the patient | Suggested foscarnet/suggested foscarnet or cidofovir or letermovir and/or CMV immunoglobulins | Foscarnet was administered | A/A |
| Okeke et al[31] | 5 | Case presentation/suggest treatment for the patient given no arterial flow in the liver | Suggested interventional radiology procedures and/or surgical revascularization/suggested interventional radiology procedures and/or surgical revascularization (thrombectomy or re-anastomosis) | Interventional radiology procedure (thrombolysis) was performed. Then revascularization was achieved intraoperatively (infra-aortic jump was performed) | PA/PA |
| | 6 | Suggest the diagnostic tests needed for the patient following re-thrombosis | Suggested Doppler ultrasound, CT angiogram, coagulation profile-thrombophilia testing/suggested thrombophilia workup, repeat imaging (Doppler ultrasound, CT/MRI angiography), and autoimmune markers | Hypercoagulable workup was performed | A/A |
| | 7 | Provide a DD for the re-thrombosis | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Antiphospholipid syndrome | A/A |
| | 8 | Provide the most likely diagnosis for the patient | Suggested hepatic artery thrombosis/suggested antiphospholipid syndrome | Antiphospholipid syndrome | D/A |
| Eubank et al[32] | 9 | Case presentation/determine the most likely microorganism to be identified by the swab | Suggested Staphylococcus aureus, Streptococcus species, Enterococcus species, and Pseudomonas aeruginosa, and fungi like Candida albicans/suggested Staphylococcus aureus, Enterococcus species, Pseudomonas aeruginosa, Escherichia coli, fungi like Candida or Aspergillus, viruses like CMV, and mycobacteria | 94% Enterococcus faecalis, 93% Rhizopus oryzae, and 5% Aspergillus flavus | D/PA |
| | 10 | Suggest treatment for the patient given the pathogens identified | Suggested intravenous liposomal amphotericin B at an appropriate dosage, along with surgical debridement of infected tissue/suggested intravenous liposomal amphotericin B at an appropriate dosage and oral posaconazole, along with surgical debridement of infected tissue | Oral posaconazole 300 mg and IV amphotericin B and micafungin daily. Amphotericin B deoxycholate irrigation in the wound vacuum | PA/A |
| Kim et al[33] | 11 | Case presentation/provide a DD for the patient’s shock | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
| | 12 | Provide the most likely diagnosis for the patient | Suggested a surgical complication, specifically duodenal perforation/suggested duodenal perforation or drug-induced kidney injury/neutropenia | GVHD | D/D |
| | 13 | Suggest the further diagnostic tests needed for the patient | Suggested blood cultures, peritoneal fluid analysis, endoscopy or upper GI imaging/suggested blood and urine cultures, viral and fungal tests, peritoneal fluid analysis, laboratory tests, and endoscopy or upper GI imaging | Mixed chimerism studies and skin biopsy were performed | D/D |
| | 14 | Suggest further treatment for the patient given the mixed chimerism studies results | The following treatment options were suggested: systemic corticosteroids, adjusting tacrolimus dose, additional immunosuppressives such as mycophenolate, and phototherapy/suggested considering the following treatment options: high-dose corticosteroids, ATG, ECP, infliximab, ruxolitinib, MSC transplantation, additional immunosuppressive agents, and IL-2 diphtheria toxin | Steroids were administered for 4 days, followed by ruxolitinib because the patient did not respond to treatment | PA/A |
| | 15 | Guess the survival of the patient | Suggested that the patient did not, most likely, survive/suggested that the patient did not, most likely, survive | The patient died on day 16 of re-admission, 45 days following transplantation | A/A |
| Kim et al[33], (b) | 16 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD | D/A |
| | 17 | Provide the most likely diagnosis for the patient | Suggested Clostridioides difficile colitis/suggested GVHD | GVHD | D/A |
| | 18 | Suggest treatment for the patient | The following treatment options were suggested: glucocorticoids, CNIs, ATG, T-cell depleting agents such as basiliximab/high-dose corticosteroids, adjust immunosuppression, consider second-line treatments such as ATG, ECP, sirolimus, infliximab, and basiliximab | Steroids were administered for 2 days, followed by ruxolitinib because the patient did not respond to treatment | PA/PA |
| | 19 | Guess the survival of the patient | Declined to make a prediction/suggested that the patient did not, most likely, survive | The patient died 29 days after transplant | D/A |
| Ramírez de la Piscina et al[34] | 20 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Budd-Chiari syndrome secondary to ADPKD | A/A |
| | 21 | Provide the most likely diagnosis for the patient | Suggested Budd-Chiari syndrome/suggested Budd-Chiari syndrome secondary to the compression from ADPKD cysts | Budd-Chiari syndrome secondary to ADPKD | A/A |
| | 22 | Suggest treatment for the patient | Provided a list of suitable treatment options including only liver transplantation/provided a list of suitable treatment options including combined transplantation | A combined liver and renal transplantation was performed | PA/A |
| Arstikyte et al[35] | 23 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Venous air embolism | A/A |
| | 24 | Provide the most likely diagnosis for the patient | Suggested that the information given is insufficient to single out a specific diagnosis/suggested that, based on the given information, hemorrhage or venous air embolism are the two most likely diagnoses | Venous air embolism | D/A |
| | 25 | Suggest an appropriate diagnostic test for the patient | Suggested TEE/suggested TEE | TEE | A/A |
| Aucejo et al[36] | 26 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D |
| | 27 | Provide the most likely diagnosis for the patient | Suggested adhesions, anastomotic leakage, or biliary complications/suggested PVT | Narrowing of the RHV at the level of the cava-caval anastomosis | D/D |
| | 28 | Given the RHV stenosis diagnosis, suggest treatment for the patient | Suggested considering stent placement, TIPS or surgical revision/suggested considering stent placement, TIPS or surgical revision | A wall stent 14 mm in diameter by 40 mm in length was placed across the RHV stenosis | A/A |
| Ichimura et al[37] | 29 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | VOD/SOS | A/A |
| | 30 | Provide the most likely diagnosis for the patient | Suggested GVHD/suggested VOD/SOS | VOD/SOS | D/A |
| | 31 | Suggest treatment for the patient given VOD/SOS | Suggested considering defibrotide, anticoagulant medications, and liver transplantation/suggested considering defibrotide, anticoagulant medications, TIPS, and liver transplantation | The physicians performed a liver transplantation since defibrotide had not yet been approved | A/A |
| | 32 | Provide a new DD for the patient’s deterioration postoperatively | Provided a DD that did not include the final diagnosis/provided a DD that included the final diagnosis | GVHD, several infections | D/A |
| Trevizoli et al[38] | 33 | Case presentation/suggest appropriate treatment for the patient | Suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, diuretics, variceal bleeding prophylaxis and liver transplant evaluation/suggested considering corticosteroids, aminosalicylates, immunomodulators such as azathioprine, biologic agents such as infliximab, surgical management (colectomy), diuretics, variceal bleeding prophylaxis and liver transplant evaluation | Sodium restriction, diuretic therapy, hydrocortisone 300 mg was started without adequate response, vedolizumab | PA/PA |
| | 34 | Suggest appropriate treatment for the patient given the DVT progression | Suggested LMWH and IVF/suggested LMWH | He underwent hemodynamic intervention with the placement of a vena cava filter | A/D |
Table 5 Comparative performance of ChatGPT and GPT-4 in department cases on renal transplantation, detailing agreement levels by task type
| Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians' course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | Case presentation/provide the diagnostic tests needed to investigate refractory ascites in a patient with ADPKD | Suggested abdominal ultrasound, paracentesis with fluid analysis, LF tests, tumor marker tests, CT scan, serologic testing, genetic testing/suggested paracentesis with fluid analysis, LF tests, abdominal ultrasound, CT scan, echocardiogram, and endoscopy, further evaluation for elevated markers | Paracentesis (ascites fluid was sent for cytology, culture, TB investigation, SAAG calculation), abdominal CT, liver ultrasound, LF tests, tumor marker tests, serologic testing, echocardiogram, and endoscopy | PA/A |
| | 2 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Tuberculous peritonitis | A/A |
| | 3 | Provide the most likely diagnosis for the patient | Suggested malignancy (most likely ovarian cancer) or SBP as the most likely diagnoses/suggested tuberculous peritonitis or malignancy or SBP as the most likely diagnoses | Tuberculous peritonitis | D/A |
| 2 | 4 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute PE | A/A |
| | 5 | Provide the most probable diagnosis for the patient | Suggested myocardial infarction as the most probable diagnosis/suggested PE as the most probable diagnosis | Acute PE | D/A |
| | 6 | What diagnostic test is more suitable for this patient | Suggested CTPA and ECG be performed/suggested CTPA, ECG, and D-dimer tests be performed | CTPA was performed | A/A |
| | 7 | What treatment do you recommend for this patient, given PE is confirmed | Suggested a choice among LMWH, DOACs, and warfarin. No discrimination between short- and long-term anticoagulation was made/suggested initial anticoagulation with either LMWH or DOACs including apixaban, followed by long-term anticoagulation with either a DOAC or warfarin | 10 mg apixaban BD was commenced followed by 5 mg BD for 6 months | PA/A |
| 3 | 8 | Case presentation/provide a DD given the post-operative signs/symptoms of the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-operative bleeding | A/A |
| | 9 | Provide the most probable diagnosis | Suggested exacerbation or progression of her underlying thrombocytopenic disorder/suggested post-transplant acute thrombotic microangiopathy | Post-operative bleeding | D/D |
| | 10 | Predict the next diagnostic test that the patient requires | Suggested coagulation studies, renal function test, peripheral blood smear, infectious testing and imaging including ultrasound and CT/suggested peripheral blood smear, LDH level, Coombs test, renal function, immunosuppressive level tests, and infection screening | Abdominal ultrasound and abdomen/pelvis CT with contrast | PA/D |
| | 11 | Suggest appropriate treatment given the evidence of active bleeding | Suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring/suggested stabilization with intravenous fluids and blood products, surgical intervention, and close monitoring | The patient was transfused and was re-explored | A/A |
| 4 | 12 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute graft thrombosis due to renal vein thrombosis | A/A |
| | 13 | Provide the most probable diagnosis | Suggested acute graft thrombosis due to either renal artery or vein thrombosis/suggested acute graft thrombosis due to renal vein thrombosis | Acute graft thrombosis due to renal vein thrombosis | A/A |
| | 14 | Provide the most suitable diagnostic test | Suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy/suggested choosing among transplant duplex US, CT angiography, and renal scintigraphy | Transplant Doppler US | A/A |
| | 15 | Given the transplant US findings, provide the patient’s diagnosis | Acute renal allograft rejection/acute renal artery thrombosis or artery stenosis | Renal vein thrombosis | D/D |
| | 16 | Given the transplant US findings, suggest a diagnostic modality that could verify the diagnosis | Renal biopsy/suggested CT angiography | CT angiography was performed | D/A |
| | 17 | Suggest treatment options for the patient | Suggested considering high-dose corticosteroids, antithymocyte globulin, calcineurin inhibitors, mycophenolate mofetil, basiliximab or alemtuzumab, and plasmapheresis with intravenous immunoglobulin/suggested surgical revascularization | Patient was re-explored | D/A |
| | 18 | Findings of reperfusion during benchwork after explantation | Suggested inadequate restoration of tissue perfusion and significant vascular compromise and tissue damage/suggested extensive vascular thrombosis with poor kidney perfusion, and evidence of parenchymal damage | Artery perfusion required high pressure, kidney became turgid, swollen, and a capsular tear was seen | A/A |
| 5 | 19 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Post-transplant obstructive LUTS due to clot retention | A/A |
| | 20 | Provide the most probable diagnosis | Suggested urinary tract obstruction, most probably at the site of the anastomosis/suggested urinary tract obstruction due to blood clot formation as the most probable diagnosis | Post-transplant obstructive LUTS due to clot retention | PA/A |
| | 21 | Suggest the next diagnostic test to verify the diagnosis | Suggested considering transplant US, abdominal CT or renal scintigraphy/suggested transplant US as the first-line imaging modality. Suggested that other options include abdominal CT, MRI, and nuclear medicine scans | A transplant US was performed | A/A |
| | 22 | Given findings of US/suggest a suitable treatment option for the patient | Suggested considering manual irrigation, catheter flushing, cystoscopic clot evaluation, and monitoring/suggested replacing the Foley catheter to flush out smaller clots, cystoscopic clot evaluation, considering percutaneous nephrostomy, and monitoring | A 3-way irrigation system was applied | PA/PA |
| | 23 | Despite resolved hematuria, the patient’s clearance did not improve/provide a DD | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Acute kidney injury with acute tubular necrosis | A/A |
| | 24 | Provide the most probable diagnosis | Suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis/suggested acute kidney injury with acute tubular necrosis as the most probable diagnosis | Acute kidney injury with acute tubular necrosis | A/A |
| | 25 | Case progression update/poor renal function 3 months post-operatively, provide a DD for the patient’s signs and symptoms | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Recurrence of underlying disease | A/A |
| | 26 | Provide the most probable diagnosis | Suggested chronic allograft dysfunction as the most probable diagnosis/suggested chronic allograft dysfunction and recurrence of the underlying disease as the two most probable diagnoses | Recurrence of underlying disease | D/PA |
Table 6 Comparative performance of ChatGPT and GPT-4 in department cases on liver transplantation, detailing agreement levels by task type
| Case ID | Question number | Task | Performance, ChatGPT/GPT-4 | Physicians' course of action/ground truth | Agreement status, ChatGPT/GPT-4 |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | Early anastomotic bile leak | A/A |
| | 2 | Provide the most probable diagnosis | Suggested a biliary complication, including bile leak, as the most probable diagnosis/suggested bile leak as the most probable diagnosis | Early anastomotic bile leak | A/A |
| | 3 | Suggest a suitable diagnostic test to confirm the diagnosis | Suggested considering abdominal US or CT, and MRCP/suggested considering abdominal US or CT, fluid drain analysis, and MRCP | Abdominal CT and fluid drain analysis were performed | PA/A |
| | 4 | Suggest a suitable treatment for this patient | Suggested considering percutaneous drainage, ERCP, surgical intervention, and antibiotics if there are signs of infection/suggested considering less invasive treatments such as percutaneous drainage and ERCP as first line and proceeding with re-exploration if those fail, while covering the patient with antibiotics | Antibiotics were commenced, followed by an ERCP which did not resolve the bile leak, and the patient was re-explored | A/A |
| 2 | 5 | Case presentation/calculate CP score, MELD score, and MELD-sodium score | Accurately calculated CP score and MELD score, underestimated MELD-sodium score/accurately calculated the required scores | CP score = 13, MELD score = 34, and MELD-sodium score = 37 | PA/A |
| | 6 | Patient’s pre-operative assessment findings presented/evaluate patient’s eligibility to proceed with transplantation | Suggested that it is likely that the operation was postponed or deferred until the patient’s condition improved/suggested that, given the findings, the transplant team would have opted to delay the liver transplantation until active issues were adequately addressed | Transplantation did not proceed | A/A |
| 3 | 7 | Case presentation/provide a DD for the patient | Provided a DD that did not include the final diagnosis/provided a DD that did not include the final diagnosis | PLS | D/D |
| | 8 | Provide the most probable diagnosis | Suggested acute cellular rejection as the most probable diagnosis/suggested acute hemolytic transfusion reaction | PLS | D/D |
| | 9 | Suggest treatment options for the patient | Suggested high-dose intravenous corticosteroids, other anti-rejection medications, and plasmapheresis/suggested not transfusing the patient further, administering corticosteroids, and monitoring the patient | Patient was treated with high-dose corticosteroids, plasmapheresis, and intravenous immunoglobulin | PA/D |
| | 10 | Given the patient’s new signs/symptoms at 3 months (recurrent ascites, low-grade fever, etc.), provide a new DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PTLD | A/A |
| | 11 | Provide the most probable diagnosis | Suggested PTLD as the most probable diagnosis/suggested nephrotic syndrome as the most probable diagnosis | PTLD | A/D |
| 4 | 12 | Case presentation/suggest the most suitable diagnostic test | Brain imaging was suggested/suggested brain imaging, EEG, and tacrolimus level test | A brain CT, EEG, and tacrolimus level test were performed | PA/A |
| | 13 | Provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | PRES | A/A |
| | 14 | Provide the most probable diagnosis | Suggested PRES as the most probable diagnosis/suggested tacrolimus neurotoxicity as the most probable diagnosis | PRES | A/D |
| 5 | 15 | Case presentation/provide a DD for the patient | Provided a DD that included the final diagnosis/provided a DD that included the final diagnosis | GVHD | A/A |
| | 16 | Provide the most probable diagnosis | Suggested CMV infection as the most probable diagnosis/suggested CMV infection as the most probable diagnosis | GVHD | D/D |
| | 17 | Suggest appropriate diagnostic tests | Suggested CMV testing, biopsy, and imaging studies/suggested CMV testing, imaging studies, and skin biopsy | Peripheral blood flow cytometry, colonoscopy, and skin biopsy were performed | PA/PA |
Table 7 Aggregated performance of ChatGPT and GPT-4 in clinical scenarios across published and unpublished cases, categorized by task type, n (%)
| Type of task | Overall ChatGPT agreement level | Overall GPT-4 agreement level | ChatGPT renal transplantation agreement level | GPT-4 renal transplantation agreement level | ChatGPT liver transplantation agreement level | GPT-4 liver transplantation agreement level |
| --- | --- | --- | --- | --- | --- | --- |
| DD that includes final diagnosis | A: 22/30 (73.3) | A: 27/30 (90) | A: 13/16 (81.3) | A: 15/16 (93.8) | A: 9/14 (64.3) | A: 12/14 (85.7) |
| | PA: 1/30 (3.3) | PA: 1/30 (3.3) | PA: 1/16 (6.3) | PA: 1/16 (6.3) | PA: 0/14 (0) | PA: 0/14 (0) |
| Final diagnosis prediction | A: 11/31 (35.5) | A: 20/31 (64.5) | A: 7/17 (41.2) | A: 13/17 (76.5) | A: 4/14 (28.6) | A: 7/14 (50) |
| | PA: 2/31 (6.5) | PA: 2/31 (6.5) | PA: 1/17 (5.9) | PA: 1/17 (5.9) | PA: 1/14 (7.1) | PA: 1/14 (7.1) |
| Appropriate next diagnostic test | A: 8/19 (42.1) | A: 15/19 (78.9) | A: 6/13 (46.2) | A: 11/13 (84.6) | A: 2/6 (33.3) | A: 4/6 (66.7) |
| | PA: 8/19 (42.1) | PA: 2/19 (10.5) | PA: 5/13 (38.5) | PA: 1/13 (7.7) | PA: 3/6 (50) | PA: 1/6 (16.7) |
| Appropriate treatment | A: 11/21 (52.4) | A: 15/21 (71.4) | A: 5/8 (62.5) | A: 7/8 (87.5) | A: 6/13 (46.2) | A: 4/6 (66.7) |
| | PA: 9/21 (42.9) | PA: 4/21 (19) | PA: 2/8 (25) | PA: 1/8 (12.5) | PA: 7/13 (53.8) | PA: 1/6 (16.7) |
| Prediction of prognosis | A: 3/5 (60) | A: 5/5 (100) | A: 1/1 (100) | A: 1/1 (100) | A: 2/4 (50) | A: 4/4 (100) |
| | PA: 1/5 (20) | PA: 0/5 (0) | PA: 0/0 (0) | PA: 0/0 (0) | PA: 1/4 (25) | PA: 0/4 (0) |
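Each cell in Table 7 is a count of agreement statuses over the number of questions of that task type, followed by a percentage. As a minimal sketch of how such cells could be tallied from the "ChatGPT/GPT-4" status pairs in Tables 3 to 6, the example below splits each pair and counts A, PA, and D per model. The helper names `tally` and `format_cell` are hypothetical, the sample holds only the first five status pairs of Table 3, and the authors' actual aggregation procedure is not described in this section.

```python
from collections import Counter

# Illustrative sample: agreement statuses (ChatGPT/GPT-4) for questions 1-5
# of Table 3. A = agreement, PA = partial agreement, D = disagreement.
SAMPLE_STATUSES = ["A/A", "PA/A", "A/A", "A/A", "D/A"]


def tally(statuses):
    """Count statuses separately for ChatGPT (left) and GPT-4 (right)."""
    chatgpt, gpt4 = Counter(), Counter()
    for pair in statuses:
        left, right = pair.split("/")
        chatgpt[left] += 1
        gpt4[right] += 1
    return chatgpt, gpt4


def format_cell(count, total):
    """Render a count as 'count/total (percent)' with one decimal place."""
    return f"{count}/{total} ({100 * count / total:.1f})"


chatgpt_counts, gpt4_counts = tally(SAMPLE_STATUSES)
n = len(SAMPLE_STATUSES)
for status in ("A", "PA", "D"):
    print(
        f"{status}: ChatGPT {format_cell(chatgpt_counts[status], n)}, "
        f"GPT-4 {format_cell(gpt4_counts[status], n)}"
    )
```

Grouping the status pairs by task type before calling `tally` would give one row pair of Table 7 per group.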