Atsawarungruangkit A, Laoveeravat P, Promrat K. Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database. World J Hepatol 2021; 13(10): 1417-1427 [PMID: 34786176 DOI: 10.4254/wjh.v13.i10.1417]
Corresponding Author of This Article
Amporn Atsawarungruangkit, MD, Academic Fellow, Instructor, Research Fellow, Division of Gastroenterology, Warren Alpert Medical School, Brown University, 593 Eddy Street, POB 240, Providence, RI 02903, United States. amporn_atsawarungruangkit@brown.edu
Research Domain of This Article
Gastroenterology & Hepatology
Article-Type of This Article
Retrospective Study
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Amporn Atsawarungruangkit, Kittichai Promrat, Division of Gastroenterology, Warren Alpert Medical School, Brown University, Providence, RI 02903, United States
Passisd Laoveeravat, Division of Digestive Diseases and Nutrition, University of Kentucky College of Medicine, Lexington, KY 40536, United States
Kittichai Promrat, Division of Gastroenterology and Hepatology, Providence VA Medical Center, Providence, RI 02908, United States
Author contributions: Atsawarungruangkit A and Laoveeravat P contributed equally to this work, including study design, data analysis, result interpretation, and manuscript writing; Promrat K critically revised the manuscript and provided supervision.
Institutional review board statement: The National Health and Nutrition Examination Survey protocol was approved by the National Center for Health Statistics Research Ethics Review Board (Hyattsville, MD, United States).
Informed consent statement: In NHANES III, the consent form was signed by participants in the survey.
Conflict-of-interest statement: No conflict of interest exists.
Data sharing statement: The dataset used in this manuscript is NHANES III, which is a publicly available dataset.
Received: March 7, 2021; Peer-review started: March 7, 2021; First decision: May 2, 2021; Revised: May 11, 2021; Accepted: September 19, 2021; Article in press: September 19, 2021; Published online: October 27, 2021; Processing time: 229 days and 11.1 hours
ARTICLE HIGHLIGHTS
Research background
Nonalcoholic fatty liver disease (NAFLD) is the most common chronic liver disease and can progress to more severe liver disease.
Research motivation
Early patient identification using a simple method is highly desirable for preventing the progression of NAFLD.
Research objectives
To create machine learning models for predicting NAFLD in the general United States population.
Research methods
This study was designed as a retrospective cohort study using NHANES III (1988-1994). Adults (aged 20 years and above) with gradable ultrasound results were included.
Research results
Based on F1, the ensemble of random undersampling boosted trees was the top performer (accuracy 71.1% and F1 0.56), while a simple model (coarse trees) had an accuracy of 74.9% and an F1 of 0.33.
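The results above show a model with higher accuracy but lower F1 than the top performer. A minimal sketch of how this can happen when the positive class is the minority is given below. The function name and all counts are hypothetical and chosen only for illustration; they are not taken from the study or from NHANES III.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 for the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: fraction of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true positives that the model detects.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical cohort (illustration only): 1000 people, 250 with the condition.
# Model A flags few cases, so it rarely errs on the majority (negative) class.
model_a = accuracy_and_f1(tp=50, fp=20, fn=200, tn=730)
# Model B flags more cases, trading some accuracy for much better recall.
model_b = accuracy_and_f1(tp=170, fp=190, fn=80, tn=560)

print(f"Model A: accuracy={model_a[0]:.2f}, F1={model_a[1]:.2f}")
print(f"Model B: accuracy={model_b[0]:.2f}, F1={model_b[1]:.2f}")
```

With these made-up counts, Model A reaches an accuracy of roughly 0.78 but an F1 of roughly 0.31, whereas Model B reaches roughly 0.73 and 0.56. This mirrors the pattern reported above: accuracy rewards correct calls on the majority class, while F1 penalizes missed positive cases.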
Research conclusions
Although the simpler coarse trees model was not the top performer, it used only two predictors: fasting C-peptide and waist circumference. This simplicity makes it practical for use in clinical practice.
Research perspectives
The findings from this study can facilitate clinical decision-making for clinicians and also allow researchers to investigate the developed machine learning models. This will lead to proper investigation and treatment selection for specific individuals at risk, helping to maximize healthcare resource utilization.