Dalal S, Onyema EM, Malik A. Hybrid XGBoost model with hyperparameter tuning for prediction of liver disease with better accuracy. World J Gastroenterol 2022; 28(46): 6551-6563 [PMID: 36569269 DOI: 10.3748/wjg.v28.i46.6551]
Corresponding Author of This Article
Edeh Michael Onyema, Lecturer, Head of Department, Mathematics and Computer Science, Coal City University, Coal City University Emene, Enugu 400102, Nigeria. michael.edeh@ccu.edu.ng
Research Domain of This Article
Engineering, Biomedical
Article-Type of This Article
Clinical and Translational Research
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Gastroenterol. Dec 14, 2022; 28(46): 6551-6563 Published online Dec 14, 2022. doi: 10.3748/wjg.v28.i46.6551
Hybrid XGBoost model with hyperparameter tuning for prediction of liver disease with better accuracy
Surjeet Dalal, Edeh Michael Onyema, Amit Malik
Surjeet Dalal, Department of CSE, Amity University, Gurugram 122413, Haryana, India
Edeh Michael Onyema, Department of Mathematics and Computer Science, Coal City University, Enugu 400102, Nigeria
Amit Malik, Department of CSE, SRM University, Delhi-NCR, Sonipat 131001, Haryana, India
Author contributions: Onyema EM contributed to the introduction, background, results, and analysis; Dalal S contributed to the design, methods, conclusion, and background; Malik A contributed to the discussion, data collection, and review of the final draft.
Institutional review board statement: There was no ethical approval required.
Clinical trial registration statement: The results of this study were generated from open-access data, and the study does not involve any clinical trial.
Informed consent statement: Informed consent from patients was not required for this study because the dataset is publicly available on the open-access Kaggle website.
Conflict-of-interest statement: All the authors report having no relevant conflicts of interest for this article.
Data sharing statement: The supporting data may be provided by the corresponding author upon reasonable request.
CONSORT 2010 statement: The authors have read the CONSORT 2010 Statement, and the manuscript was prepared and revised according to the CONSORT 2010 Statement.
Received: June 30, 2022
Peer-review started: June 30, 2022
First decision: July 13, 2022
Revised: July 27, 2022
Accepted: November 21, 2022
Article in press: November 21, 2022
Published online: December 14, 2022
Processing time: 161 Days and 2.4 Hours
ARTICLE HIGHLIGHTS
Research background
Liver disease is a leading cause of mortality in the United States and is regarded as a life-threatening condition across the world. It is possible for people to develop liver disease at a young age.
Research motivation
A modified eXtreme Gradient Boosting model with hyperparameter tuning can predict liver disease with greater precision, accuracy, and reliability than the chi-square automatic interaction detection (CHAID) and classification and regression tree models.
Research objectives
This study had four objectives. The first was to identify the symptoms of liver disease and their impact on the patient. The second was to study various machine learning approaches for predicting liver disease and to evaluate the performance of decision tree algorithms on this task. The third was to propose a modified eXtreme Gradient Boosting model with a hyperparameter tuning mechanism. The fourth was to validate the performance of the proposed model against the existing models.
Research methods
A hybrid eXtreme Gradient Boosting model with hyperparameter tuning was designed using data from patients who had liver disease and patients who were healthy.
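The highlights do not state which tuning procedure was used. As an illustration only, the sketch below shows one common approach: an exhaustive grid search scored by k-fold cross-validated accuracy. Every name in it (`k_fold_indices`, `grid_search`, `fit_threshold`, `predict_threshold`) is hypothetical, the data are synthetic, and the one-parameter threshold classifier is a toy stand-in; in practice a gradient boosting classifier and its own parameters (for example tree depth or learning rate) would take its place.

```python
import itertools
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def grid_search(fit, predict, X, y, param_grid, k=5):
    """Return (best_params, best_score) ranked by mean k-fold accuracy.

    fit(X_train, y_train, **params) must return a model object.
    predict(model, X_test) must return a list of predicted labels.
    """
    folds = k_fold_indices(len(X), k)
    names = sorted(param_grid)
    best_params, best_score = None, -1.0
    # Try every combination of the candidate hyperparameter values.
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            preds = predict(model, [X[i] for i in held_out])
            correct = sum(p == y[i] for p, i in zip(preds, held_out))
            scores.append(correct / len(held_out))
        mean_score = statistics.mean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy stand-in model (assumption): the "model" is just the threshold itself,
# so fitting ignores the training data. A real learner would use it.
def fit_threshold(X_train, y_train, threshold):
    return threshold


def predict_threshold(model, X_test):
    return [1 if row[0] > model else 0 for row in X_test]


# Synthetic data (not from the article): label is 1 when the feature exceeds 6.
rng = random.Random(42)
X = [[rng.uniform(0, 10)] for _ in range(200)]
y = [1 if row[0] > 6.0 else 0 for row in X]

best_params, best_score = grid_search(
    fit_threshold, predict_threshold, X, y,
    {"threshold": [2.0, 4.0, 6.0, 8.0]},
)
print(best_params, best_score)
```

Scoring each candidate on held-out folds rather than on the training data keeps the selected hyperparameters from simply reflecting how well the model memorized the training set.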
Research results
The experimental results demonstrated that the accuracy levels of the CHAID and classification and regression tree models were 71.36% and 73.24%, respectively. The proposed model was designed with the aim of gaining a sufficient level of accuracy, and it achieved 93.65% accuracy.
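For reference, classification accuracy is the fraction of cases whose predicted label matches the true label. The sketch below is a minimal illustration of that definition; the function name `accuracy` and the label lists are hypothetical and are not taken from the study's data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: 1 = liver disease, 0 = healthy.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"{accuracy(y_true, y_pred):.2%}")  # 6 of 8 correct
```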
Research conclusions
The existing machine learning models, i.e., the CHAID model and the classification and regression tree model, do not achieve a high enough accuracy level. The proposed model predicted liver disease with 93.65% accuracy. The model offers real-time adaptability and is cost-effective for liver disease prediction.
Research perspectives
The proposed model can better predict liver-related disease by identifying the disease causes and suggesting better treatment options.