Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study

doi:10.3748/wjg.v31.i3.101092



Journal Information of This Article

Publication Name: World Journal of Gastroenterology
ISSN: 1007-9327
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Basic Study

World J Gastroenterol. Jan 21, 2025; 31(3): 101092
Published online Jan 21, 2025. doi: 10.3748/wjg.v31.i3.101092

Table 2 Performance of ChatGPT-3.5, ChatGPT-4.0 and Google Gemini on hepatitis B infection test questions by different subfields, n (%)

Test questions by subfields	ChatGPT-3.5, correct	ChatGPT-3.5, incorrect	ChatGPT-4.0, correct	ChatGPT-4.0, incorrect	Google Gemini, correct	Google Gemini, incorrect
All test questions (n)	52		52		52
1st run	34 (65.4)	18 (34.6)	43 (82.7)	9 (17.3)	37 (71.1)	15 (28.9)
2nd run	30 (57.7)	22 (42.3)	41 (78.9)	11 (21.1)	38 (73.1)	14 (26.9)
3rd run	34 (65.4)	18 (34.6)	42 (80.8)	10 (19.2)	39 (75)	13 (25)
Concordance among 3 runs	41 (78.9)		46 (88.4)		50 (96.2)
Total accuracy (%)	62.9		80.8		73.1
Risk factors (n)	5		5		5
1st run	5 (100)	0 (0)	5 (100)	0 (0)	5 (100)	0 (0)
2nd run	5 (100)	0 (0)	5 (100)	0 (0)	5 (100)	0 (0)
3rd run	5 (100)	0 (0)	5 (100)	0 (0)	5 (100)	0 (0)
Concordance among 3 runs	5 (100)		5 (100)		5 (100)
Total accuracy (%)	100		100		100
Clinical manifestation (n)	7		7		7
1st run	2 (28.6)	5 (71.4)	4 (57.1)	3 (42.9)	5 (71.4)	2 (28.6)
2nd run	2 (28.6)	5 (71.4)	4 (57.1)	3 (42.9)	5 (71.4)	2 (28.6)
3rd run	3 (42.9)	4 (57.1)	4 (57.1)	3 (42.9)	5 (71.4)	2 (28.6)
Concordance among 3 runs	5 (71.4)		6 (85.7)		7 (100)
Total accuracy (%)	33.3		57.1		71.4
Diagnosis (n)	18		18		18
1st run	9 (50)	9 (50)	15 (83.3)	3 (16.7)	13 (72.2)	5 (27.8)
2nd run	8 (44.4)	10 (55.6)	15 (83.3)	3 (16.7)	14 (77.8)	4 (22.2)
3rd run	11 (61.1)	7 (38.9)	15 (83.3)	3 (16.7)	15 (83.3)	3 (16.7)
Concordance among 3 runs	12 (66.7)		16 (88.9)		16 (88.9)
Total accuracy (%)	51.9		83.3		77.8
Treatment (n)	11		11		11
1st run	11 (100)	0 (0)	10 (90.9)	1 (9.1)	9 (81.9)	2 (18.1)
2nd run	10 (90.9)	1 (9.1)	10 (90.9)	1 (9.1)	9 (81.9)	2 (18.1)
3rd run	10 (90.9)	1 (9.1)	11 (100)	0 (0)	9 (81.9)	2 (18.1)
Concordance among 3 runs	10 (90.9)		10 (90.9)		11 (100)
Total accuracy (%)	93.9		93.9		81.9
Prevention (n)	7		7		7
1st run	4 (57.1)	3 (42.9)	6 (85.7)	1 (14.3)	3 (42.9)	4 (57.1)
2nd run	3 (42.9)	4 (57.1)	4 (57.1)	3 (42.9)	3 (42.9)	4 (57.1)
3rd run	3 (42.9)	4 (57.1)	4 (57.1)	3 (42.9)	3 (42.9)	4 (57.1)
Concordance among 3 runs	6 (85.7)		5 (71.4)		7 (100)
Total accuracy (%)	47.6		66.7		42.9
Prognosis (n)	4		4		4
1st run	3 (75)	1 (25)	3 (75)	1 (25)	2 (50)	2 (50)
2nd run	2 (50)	2 (50)	3 (75)	1 (25)	2 (50)	2 (50)
3rd run	2 (50)	2 (50)	3 (75)	1 (25)	2 (50)	2 (50)
Concordance among 3 runs	3 (75)		4 (100)		4 (100)
Total accuracy (%)	58.3		75		50
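
The table does not state how "Total accuracy (%)" is derived. A minimal sketch, assuming it is the number of correct answers pooled over the three runs divided by the total number of attempts (3 × n), is given below. The counts are copied from Table 2; the names TABLE2 and pooled_accuracy are illustrative only and do not come from the article. Under this assumption the recomputed values agree with the reported ones to within about 0.1 percentage point.

```python
# Correct-answer counts for the 1st, 2nd and 3rd run, copied from Table 2.
# Layout (hypothetical): subfield -> (n questions, {model: [run1, run2, run3]})
TABLE2 = {
    "All test questions": (52, {"ChatGPT-3.5": [34, 30, 34],
                                "ChatGPT-4.0": [43, 41, 42],
                                "Google Gemini": [37, 38, 39]}),
    "Risk factors": (5, {"ChatGPT-3.5": [5, 5, 5],
                         "ChatGPT-4.0": [5, 5, 5],
                         "Google Gemini": [5, 5, 5]}),
    "Clinical manifestation": (7, {"ChatGPT-3.5": [2, 2, 3],
                                   "ChatGPT-4.0": [4, 4, 4],
                                   "Google Gemini": [5, 5, 5]}),
    "Diagnosis": (18, {"ChatGPT-3.5": [9, 8, 11],
                       "ChatGPT-4.0": [15, 15, 15],
                       "Google Gemini": [13, 14, 15]}),
    "Treatment": (11, {"ChatGPT-3.5": [11, 10, 10],
                       "ChatGPT-4.0": [10, 10, 11],
                       "Google Gemini": [9, 9, 9]}),
    "Prevention": (7, {"ChatGPT-3.5": [4, 3, 3],
                       "ChatGPT-4.0": [6, 4, 4],
                       "Google Gemini": [3, 3, 3]}),
    "Prognosis": (4, {"ChatGPT-3.5": [3, 2, 2],
                      "ChatGPT-4.0": [3, 3, 3],
                      "Google Gemini": [2, 2, 2]}),
}


def pooled_accuracy(correct_per_run, n_questions):
    """Percentage of correct answers pooled over all runs (assumed definition)."""
    attempts = n_questions * len(correct_per_run)
    return 100.0 * sum(correct_per_run) / attempts


# Print the recomputed total accuracy for every subfield and model.
for subfield, (n, models) in TABLE2.items():
    for model, runs in models.items():
        print(f"{subfield}\t{model}\t{pooled_accuracy(runs, n):.1f}")
```

Concordance among the three runs cannot be recomputed this way, because it depends on per-question answers that the table does not report.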

Citation: Li Y, Huang CK, Hu Y, Zhou XD, He C, Zhong JW. Exploring the performance of large language models on hepatitis B infection-related questions: A comparative study. World J Gastroenterol 2025; 31(3): 101092
URL: https://www.wjgnet.com/1007-9327/full/v31/i3/101092.htm
DOI: https://dx.doi.org/10.3748/wjg.v31.i3.101092