Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database

doi:10.4254/wjh.v13.i10.1417

Journal Information

Publication Name: World Journal of Hepatology
ISSN: 1948-5182
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Retrospective Study

World J Hepatol. Oct 27, 2021; 13(10): 1417-1427
Published online Oct 27, 2021. doi: 10.4254/wjh.v13.i10.1417

Amporn Atsawarungruangkit, Passisd Laoveeravat, Kittichai Promrat

Amporn Atsawarungruangkit, Kittichai Promrat, Division of Gastroenterology, Warren Alpert Medical School, Brown University, Providence, RI 02903, United States

Passisd Laoveeravat, Division of Digestive Diseases and Nutrition, University of Kentucky College of Medicine, Lexington, KY 40536, United States

Kittichai Promrat, Division of Gastroenterology and Hepatology, Providence VA Medical Center, Providence, RI 02908, United States

Author contributions: Atsawarungruangkit A and Laoveeravat P contributed equally to this work including study design, data analysis, result interpretation, and manuscript writing; Promrat K critically revised the manuscript and provided supervision.

Institutional review board statement: The National Health and Nutrition Examination Survey protocol was approved by the National Center for Health Statistics Research Ethics Review Board (Hyattsville, MD, United States).

Informed consent statement: In NHANES III, the consent form was signed by participants in the survey.

Conflict-of-interest statement: No conflict of interest exists.

Data sharing statement: The dataset used in this manuscript is NHANES III, which is a publicly available dataset.

Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/Licenses/by-nc/4.0/

Corresponding author: Amporn Atsawarungruangkit, MD, Academic Fellow, Instructor, Research Fellow, Division of Gastroenterology, Warren Alpert Medical School, Brown University, 593 Eddy Street, POB 240, Providence, RI 02903, United States. amporn_atsawarungruangkit@brown.edu

Received: March 7, 2021
Peer-review started: March 7, 2021
First decision: May 2, 2021
Revised: May 11, 2021
Accepted: September 19, 2021
Article in press: September 19, 2021
Published online: October 27, 2021
Processing time: 229 Days and 11.1 Hours

Abstract

BACKGROUND

Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, affecting over 30% of the United States population. Early patient identification using a simple method is highly desirable.

AIM

To create machine learning models for predicting NAFLD in the general United States population.

METHODS

Using data from the NHANES 1988-1994 (NHANES III), we included thirty NAFLD-related factors. The dataset was divided into training (70%) and testing (30%) datasets. Twenty-four machine learning algorithms were applied to the training dataset. The best-performing models and one additional interpretable model (i.e., coarse trees) were then evaluated on the testing dataset.
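The 70%/30% division and the random undersampling (RUS) idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stratified split, the function names `stratified_split` and `random_undersample`, the fixed seed, and the toy labels are all assumptions made for illustration. In RUS boosting, undersampling is repeated inside each boosting round; the sketch shows only a single undersampling step.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in both parts.

    Stratification is an assumption of this sketch; the source only
    states that the data were divided 70%/30%.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))
        train_idx.extend(idxs[:cut])
        test_idx.extend(idxs[cut:])
    return sorted(train_idx), sorted(test_idx)


def random_undersample(indices, labels, seed=0):
    """Keep a random subset so every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)


# Toy labels (hypothetical, not NHANES data): 1 = NAFLD, 0 = no NAFLD.
labels = [1] * 30 + [0] * 70
train_idx, test_idx = stratified_split(labels)
balanced_idx = random_undersample(train_idx, labels)
print(len(train_idx), len(test_idx), len(balanced_idx))
```

With these toy labels, the training part holds 70 rows and the testing part 30; undersampling the training part leaves both classes at the size of the minority class.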

RESULTS

A total of 3235 participants met the inclusion criteria. In the training phase, the ensemble of random undersampling (RUS) boosted trees had the highest F1 (0.53). In the testing phase, we compared selected machine learning models with existing NAFLD indices. Based on F1, the ensemble of RUS boosted trees remained the top performer (accuracy 71.1% and F1 0.56), followed by the fatty liver index (accuracy 68.8% and F1 0.52). A simple model (coarse trees) had an accuracy of 74.9% and an F1 of 0.33.
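Accuracy and F1 can rank models differently when one class is less common, which is why a model can have the higher accuracy and the lower F1. The following sketch computes both from confusion-matrix counts. The function name `classification_metrics` and all counts are hypothetical and chosen only to show the pattern; they are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for 100 people, 25 of whom have the condition.
# A "cautious" model rarely predicts the positive class.
cautious = classification_metrics(tp=5, fp=2, fn=20, tn=73)
# A "sensitive" model finds most positives at the cost of more false alarms.
sensitive = classification_metrics(tp=20, fp=25, fn=5, tn=50)

for name, m in (("cautious", cautious), ("sensitive", sensitive)):
    print(f"{name}: accuracy={m['accuracy']:.2f}, F1={m['f1']:.2f}")
```

In this toy example the cautious model has the higher accuracy because it is usually right about the majority class, while the sensitive model has the higher F1 because it recovers far more of the positive cases.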

CONCLUSION

Not every machine learning model is complex. Using a simpler model such as coarse trees, we can create an interpretable model for predicting NAFLD with only two predictors: fasting C-peptide and waist circumference. Although the simpler model does not have the best performance, its simplicity is useful in clinical practice.
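To show why a two-predictor tree needs no software to apply, the sketch below writes such a tree as two nested comparisons. The tree shape (split on fasting C-peptide first, then on waist circumference), the class name `TwoPredictorRule`, and the cutoff values in the usage lines are all hypothetical placeholders: this abstract does not report the fitted splits, and the numbers below are not clinical thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPredictorRule:
    """A hypothetical shallow decision tree over two predictors.

    Both cutoffs must be supplied by the caller; no default is given
    because the fitted thresholds are not stated in this abstract.
    """
    c_peptide_cutoff: float   # same units as the measured fasting C-peptide
    waist_cutoff_cm: float    # waist circumference cutoff in centimetres

    def predict(self, fasting_c_peptide: float, waist_cm: float) -> bool:
        """Return True when the rule predicts NAFLD."""
        # Assumed first split: low fasting C-peptide goes to the negative leaf.
        if fasting_c_peptide < self.c_peptide_cutoff:
            return False
        # Assumed second split: waist circumference decides the remaining cases.
        return waist_cm >= self.waist_cutoff_cm


# Placeholder cutoffs for illustration only.
rule = TwoPredictorRule(c_peptide_cutoff=1.0, waist_cutoff_cm=100.0)
print(rule.predict(fasting_c_peptide=2.0, waist_cm=110.0))
print(rule.predict(fasting_c_peptide=0.5, waist_cm=110.0))
```

A rule of this shape can be read directly from a printed flowchart, which is the sense in which a coarse tree is interpretable.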

Keywords: Artificial intelligence; Machine learning; Non-alcoholic fatty liver disease; Fatty liver; United States population; NHANES

Core Tip: A simple method with good accuracy for identifying patients with non-alcoholic fatty liver disease is highly desirable. Among 24 machine learning models, the ensemble of random undersampling boosted trees was the top performer (accuracy 71.1% and F1 0.56). A simple model (coarse trees) with only two predictors (fasting C-peptide and waist circumference) had an accuracy of 74.9% and an F1 of 0.33. Not every machine learning model is complex. Using a simple model such as coarse trees, physicians can easily integrate a machine learning model into their practice without any software implementation.