Basic Study
Copyright ©The Author(s) 2025.
World J Gastroenterol. Jan 21, 2025; 31(3): 101092
Published online Jan 21, 2025. doi: 10.3748/wjg.v31.i3.101092
Table 1 Quality indicators (scientific adequacy) for answers from ChatGPT-3.5, ChatGPT-4.0, and Google Gemini
| Common questions | Sources of answers | Answer lengths, 1st run | Answer lengths, 2nd run | Answer lengths, 3rd run | Grades, mean | Grades, P value |
| --- | --- | --- | --- | --- | --- | --- |
| Overall (mean) | ChatGPT-3.5 | 275 | 366 | 352 | 3.50 | 0.2 |
| | ChatGPT-4.0 | 274 | 252 | 238 | 3.69 | |
| | Google Gemini | 307 | 322 | 325 | 3.53 | |
| Risk factors | | | | | | |
| What are the transmission modes of hepatitis B virus? | ChatGPT-3.5 | 189 | 316 | 400 | 3.67 | 0.296 |
| | ChatGPT-4.0 | 358 | 241 | 220 | 4 | |
| | Google Gemini | 264 | 291 | 291 | 3.33 | |
| Clinical manifestation | | | | | | |
| What are the symptoms of hepatitis B infection? | ChatGPT-3.5 | 247 | 333 | 356 | 3.67 | 0.216 |
| | ChatGPT-4.0 | 269 | 276 | 295 | 3.67 | |
| | Google Gemini | 226 | 349 | 352 | 3 | |
| Diagnosis | | | | | | |
| What is the most accurate test for diagnosing Hepatitis B infection? | ChatGPT-3.5 | 223 | 341 | 348 | 3.67 | 0.027 |
| | ChatGPT-4.0 | 307 | 349 | 280 | 4 | |
| | Google Gemini | 281 | 281 | 281 | 3 | |
| Treatment | | | | | | |
| Can hepatitis B infection be cured clinically? | ChatGPT-3.5 | 334 | 357 | 395 | 3.67 | 0.216 |
| | ChatGPT-4.0 | 271 | 324 | 264 | 3.67 | |
| | Google Gemini | 268 | 360 | 359 | 3 | |
| What are the indications of antiviral therapy for patients infected with hepatitis B virus? | ChatGPT-3.5 | 368 | 367 | 402 | 3.67 | 0.296 |
| | ChatGPT-4.0 | 351 | 334 | 296 | 3.33 | |
| | Google Gemini | 385 | 384 | 392 | 3 | |
| Can patients infected with hepatitis B virus be pregnant during antiviral treatment? | ChatGPT-3.5 | 341 | 392 | 383 | 3.33 | 0.079 |
| | ChatGPT-4.0 | 319 | 247 | 242 | 4 | |
| | Google Gemini | 369 | 352 | 351 | 4 | |
| Do patients diagnosed with chronic hepatitis B during pregnancy need antiviral therapy? | ChatGPT-3.5 | 366 | 416 | 383 | 3 | 0.296 |
| | ChatGPT-4.0 | 230 | 313 | 256 | 3.33 | |
| | Google Gemini | 325 | 330 | 375 | 3.67 | |
| Can patients diagnosed with chronic hepatitis B during lactation be treated with antiviral therapy? | ChatGPT-3.5 | 366 | 419 | 391 | 3.33 | 0.296 |
| | ChatGPT-4.0 | 245 | 190 | 218 | 3.67 | |
| | Google Gemini | 362 | 328 | 330 | 4 | |
| Prevention | | | | | | |
| How long should a newborn receive the first dose of hepatitis B vaccine after birth? | ChatGPT-3.5 | 133 | 392 | 185 | 3.67 | 0.296 |
| | ChatGPT-4.0 | 182 | 146 | 146 | 3.33 | |
| | Google Gemini | 193 | 207 | 201 | 4 | |
| Can pregnant women receive hepatitis B vaccine? | ChatGPT-3.5 | 181 | 397 | 338 | 4 | |
| | ChatGPT-4.0 | 179 | 183 | 149 | 4 | |
| | Google Gemini | 277 | 318 | 318 | 4 | |
| How often should patients with hepatitis B virus infection be reexamined? | ChatGPT-3.5 | 209 | 421 | 328 | 3 | 0.027 |
| | ChatGPT-4.0 | 275 | 171 | 205 | 3.33 | |
| | Google Gemini | 330 | 334 | 334 | 4 | |
| Prognosis | | | | | | |
| What are the complications of hepatitis B infection? | ChatGPT-3.5 | 343 | 235 | 305 | 3.33 | 0.216 |
| | ChatGPT-4.0 | 300 | 245 | 280 | 4.00 | |
| | Google Gemini | 405 | 326 | 318 | 3.33 | |
Table 2 Performance of ChatGPT-3.5, ChatGPT-4.0 and Google Gemini on hepatitis B infection test questions by different subfields, n (%)
| Test questions by subfields | ChatGPT-3.5, correct | ChatGPT-3.5, incorrect | ChatGPT-4.0, correct | ChatGPT-4.0, incorrect | Google Gemini, correct | Google Gemini, incorrect |
| --- | --- | --- | --- | --- | --- | --- |
| All test questions (n) | 52 | | 52 | | 52 | |
| 1st run | 34 (65.4) | 18 (34.6) | 43 (82.7) | 9 (17.3) | 37 (71.1) | 15 (28.9) |
| 2nd run | 30 (57.7) | 22 (42.3) | 41 (78.9) | 11 (21.1) | 38 (73.1) | 14 (26.9) |
| 3rd run | 34 (65.4) | 18 (34.6) | 42 (80.8) | 10 (19.2) | 39 (75) | 13 (25) |
| Concordance among 3 runs | 41 (78.9) | | 46 (88.4) | | 50 (96.2) | |
| Total accuracy (%) | 62.9 | | 80.8 | | 73.1 | |
| Risk factors (n) | 5 | | 5 | | 5 | |
| 1st run | 5 (100) | 0 (0) | 5 (100) | 0 (0) | 5 (100) | 0 (0) |
| 2nd run | 5 (100) | 0 (0) | 5 (100) | 0 (0) | 5 (100) | 0 (0) |
| 3rd run | 5 (100) | 0 (0) | 5 (100) | 0 (0) | 5 (100) | 0 (0) |
| Concordance among 3 runs | 5 (100) | | 5 (100) | | 5 (100) | |
| Total accuracy (%) | 100 | | 100 | | 100 | |
| Clinical manifestation (n) | 7 | | 7 | | 7 | |
| 1st run | 2 (28.6) | 5 (71.4) | 4 (57.1) | 3 (42.9) | 5 (71.4) | 2 (28.6) |
| 2nd run | 2 (28.6) | 5 (71.4) | 4 (57.1) | 3 (42.9) | 5 (71.4) | 2 (28.6) |
| 3rd run | 3 (42.9) | 4 (57.1) | 4 (57.1) | 3 (42.9) | 5 (71.4) | 2 (28.6) |
| Concordance among 3 runs | 5 (71.4) | | 6 (85.7) | | 7 (100) | |
| Total accuracy (%) | 33.3 | | 57.1 | | 71.4 | |
| Diagnosis (n) | 18 | | 18 | | 18 | |
| 1st run | 9 (50) | 9 (50) | 15 (83.3) | 3 (16.7) | 13 (72.2) | 5 (27.8) |
| 2nd run | 8 (44.4) | 10 (55.6) | 15 (83.3) | 3 (16.7) | 14 (77.8) | 4 (22.2) |
| 3rd run | 11 (61.1) | 7 (38.9) | 15 (83.3) | 3 (16.7) | 15 (83.3) | 3 (16.7) |
| Concordance among 3 runs | 12 (66.7) | | 16 (88.9) | | 16 (88.9) | |
| Total accuracy (%) | 51.9 | | 83.3 | | 77.8 | |
| Treatment (n) | 11 | | 11 | | 11 | |
| 1st run | 11 (100) | 0 (0) | 10 (90.9) | 1 (9.1) | 9 (81.9) | 2 (18.1) |
| 2nd run | 10 (90.9) | 1 (9.1) | 10 (90.9) | 1 (9.1) | 9 (81.9) | 2 (18.1) |
| 3rd run | 10 (90.9) | 1 (9.1) | 11 (100) | 0 (0) | 9 (81.9) | 2 (18.1) |
| Concordance among 3 runs | 10 (90.9) | | 10 (90.9) | | 11 (100) | |
| Total accuracy (%) | 93.9 | | 93.9 | | 81.9 | |
| Prevention (n) | 7 | | 7 | | 7 | |
| 1st run | 4 (57.1) | 3 (42.9) | 6 (85.7) | 1 (14.3) | 3 (42.9) | 4 (57.1) |
| 2nd run | 3 (42.9) | 4 (57.1) | 4 (57.1) | 3 (42.9) | 3 (42.9) | 4 (57.1) |
| 3rd run | 3 (42.9) | 4 (57.1) | 4 (57.1) | 3 (42.9) | 3 (42.9) | 4 (57.1) |
| Concordance among 3 runs | 6 (85.7) | | 5 (71.4) | | 7 (100) | |
| Total accuracy (%) | 47.6 | | 66.7 | | 42.9 | |
| Prognosis (n) | 4 | | 4 | | 4 | |
| 1st run | 3 (75) | 1 (25) | 3 (75) | 1 (25) | 2 (50) | 2 (50) |
| 2nd run | 2 (50) | 2 (50) | 3 (75) | 1 (25) | 2 (50) | 2 (50) |
| 3rd run | 2 (50) | 2 (50) | 3 (75) | 1 (25) | 2 (50) | 2 (50) |
| Concordance among 3 runs | 3 (75) | | 4 (100) | | 4 (100) | |
| Total accuracy (%) | 58.3 | | 75 | | 50 | |
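The "Total accuracy (%)" rows in Table 2 appear to pool the correct answers over the three runs and divide by the total number of attempts. That reading is inferred from the reported counts and is not stated in the table itself. A minimal Python sketch of the arithmetic under this assumption (the function name `total_accuracy` is illustrative only):

```python
def total_accuracy(correct_per_run, n_questions):
    """Pooled accuracy (%) across runs: all correct answers / all attempts.

    Assumption: every run uses the same n_questions items.
    """
    attempts = n_questions * len(correct_per_run)
    return round(100 * sum(correct_per_run) / attempts, 1)


# ChatGPT-4.0, all test questions: 43, 41 and 42 correct out of 52 per run
print(total_accuracy([43, 41, 42], 52))  # 80.8

# Google Gemini, all test questions: 37, 38 and 39 correct out of 52 per run
print(total_accuracy([37, 38, 39], 52))  # 73.1
```

Both results match the reported values. Some other cells may differ from a recomputation in the last digit, presumably because of how the published figures were rounded.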
Table 3 Comparison of readability of answers from ChatGPT-3.5 with the 8th grade reading level, mean ± SD
| Subfield | GFI | GFI, P value | FKGL | FKGL, P value |
| --- | --- | --- | --- | --- |
| Risk factors | 16.73 ± 1.77 | 0.013 | 12.60 ± 1.72 | 0.043 |
| Clinical manifestation | 13.68 ± 2.12 | 0.043 | 10.75 ± 1.72 | 0.109 |
| Diagnosis | 15.46 ± 1.65 | 0.016 | 12.12 ± 1.61 | 0.048 |
| Treatment | 21.22 ± 1.99 | < 0.001 | 17.22 ± 1.47 | < 0.001 |
| Prevention | 18.89 ± 1.80 | < 0.001 | 15.53 ± 1.72 | < 0.001 |
| Prognosis | 18.52 ± 1.85 | 0.010 | 15.51 ± 2.17 | 0.027 |
| Overall | 18.93 ± 3.03 | < 0.001 | 15.31 ± 2.67 | < 0.001 |
Table 4 Comparison of readability of answers from ChatGPT-4.0 with the 8th grade reading level, mean ± SD
| Subfield | GFI | GFI, P value | FKGL | FKGL, P value |
| --- | --- | --- | --- | --- |
| Risk factors | 14.79 ± 0.24 | < 0.001 | 11.45 ± 0.35 | 0.003 |
| Clinical manifestation | 11.05 ± 0.89 | 0.027 | 9.06 ± 0.73 | 0.130 |
| Diagnosis | 14.40 ± 0.42 | 0.001 | 11.28 ± 0.47 | < 0.001 |
| Treatment | 18.18 ± 1.45 | < 0.001 | 14.57 ± 1.27 | < 0.001 |
| Prevention | 16.49 ± 1.27 | < 0.001 | 13.49 ± 1.09 | < 0.001 |
| Prognosis | 16.10 ± 0.52 | 0.001 | 13.18 ± 0.05 | < 0.001 |
| Overall | 16.39 ± 2.38 | < 0.001 | 13.19 ± 1.96 | < 0.001 |
Table 5 Comparison of readability of answers from Google Gemini with the 8th grade reading level, mean ± SD
| Subfield | GFI | GFI, P value | FKGL | FKGL, P value |
| --- | --- | --- | --- | --- |
| Risk factors | 14.54 ± 0.46 | 0.002 | 10.73 ± 0.16 | 0.001 |
| Clinical manifestation | 13.06 ± 0.42 | 0.002 | 9.81 ± 0.68 | 0.043 |
| Diagnosis | 17.71 ± 0.30 | < 0.001 | 13.54 ± 0.24 | < 0.001 |
| Treatment | 19.93 ± 1.44 | < 0.001 | 15.65 ± 1.06 | < 0.001 |
| Prevention | 15.63 ± 1.96 | < 0.001 | 11.71 ± 1.82 | < 0.001 |
| Prognosis | 14.81 ± 0.62 | 0.003 | 12.37 ± 0.27 | 0.001 |
| Overall | 17.22 ± 2.86 | < 0.001 | 13.32 ± 2.44 | < 0.001 |
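For orientation, GFI and FKGL in Tables 3 to 5 are taken here to be the Gunning Fog Index and the Flesch-Kincaid Grade Level, both of which estimate the school grade needed to read a text. The sketch below uses the standard published formulas for the two scores. How the authors counted words, sentences, syllables and complex words is not given in these tables, so the counts are passed in directly, and the example numbers are hypothetical rather than taken from the study.

```python
def fkgl(words, sentences, syllables):
    """Flesch-Kincaid Grade Level (standard formula)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning Fog Index (standard formula).

    complex_words: number of words with three or more syllables.
    """
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# Hypothetical answer: 300 words, 15 sentences, 540 syllables, 60 complex words
print(round(fkgl(300, 15, 540), 2))        # 13.45
print(round(gunning_fog(300, 15, 60), 2))  # 16.0
```

Scores well above 8 on either index, as reported for most subfields, indicate text that reads above the 8th grade reference level.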