Chen IC, Chou LJ, Huang SC, Chu TW, Lee SS. Machine learning-based comparison of factors influencing estimated glomerular filtration rate in Chinese women with or without non-alcoholic fatty liver. World J Clin Cases 2024; 12(15): 2506-2521 [PMID: 38817230 DOI: 10.12998/wjcc.v12.i15.2506]
Corresponding Author of This Article: Shang-Sen Lee, PhD, Chief Physician, Department of Urology, Taichung Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, No. 88 Section 1, Fengxing Road, Tanzi Dist., Taichung 427, Taiwan. j520037@yahoo.com.tw
Research Domain of This Article: Health Care Sciences & Services
Article-Type of This Article: Retrospective Study
Open-Access Policy of This Article: This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
Table 2 Comparison of MAPE, SMAPE, RAE, RRSE, and RMSE between multiple linear regression and machine learning methods
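| Group | Method | MAPE | SMAPE | RAE | RRSE | RMSE |
| --- | --- | --- | --- | --- | --- | --- |
| NAFLD+ group with age | Linear | 0.139 | 0.132 | 0.845 | 0.842 | 13.959 |
| NAFLD+ group with age | SGB | 0.138 | 0.131 | 0.841 | 0.834 | 13.825 |
| NAFLD+ group with age | XGBoost | 0.139 | 0.132 | 0.845 | 0.842 | 13.946 |
| NAFLD+ group with age | Elasticnet | 0.139 | 0.132 | 0.845 | 0.842 | 13.954 |
| NAFLD- group with age | Linear | 0.133 | 0.128 | 0.868 | 0.862 | 14.671 |
| NAFLD- group with age | SGB | 0.132 | 0.126 | 0.855 | 0.857 | 14.59 |
| NAFLD- group with age | XGBoost | 0.132 | 0.126 | 0.853 | 0.857 | 14.58 |
| NAFLD- group with age | Elasticnet | 0.134 | 0.128 | 0.868 | 0.862 | 14.673 |
| NAFLD+ group without age | Linear | 0.154 | 0.14 | 0.872 | 0.897 | 15.606 |
| NAFLD+ group without age | SGB | 0.153 | 0.139 | 0.865 | 0.888 | 15.444 |
| NAFLD+ group without age | XGBoost | 0.153 | 0.14 | 0.869 | 0.891 | 15.49 |
| NAFLD+ group without age | Elasticnet | 0.154 | 0.14 | 0.872 | 0.897 | 15.596 |
| NAFLD- group without age | Linear | 0.134 | 0.13 | 0.905 | 0.906 | 15.149 |
| NAFLD- group without age | SGB | 0.133 | 0.129 | 0.895 | 0.892 | 14.915 |
| NAFLD- group without age | XGBoost | 0.133 | 0.129 | 0.895 | 0.893 | 14.916 |
| NAFLD- group without age | Elasticnet | 0.134 | 0.13 | 0.904 | 0.905 | 15.119 |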
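The five error metrics reported in Table 2 are not defined in this excerpt. As a rough illustration only, the sketch below computes them under commonly used definitions: mean absolute percentage error (MAPE), symmetric MAPE (SMAPE, here with the mean of |y| and |ŷ| in the denominator), relative absolute error (RAE), root relative squared error (RRSE), and root mean squared error (RMSE). The function name `regression_metrics` and the sample values are hypothetical, and the authors' exact formulas may differ, particularly for SMAPE, which has several variants.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAPE, SMAPE, RAE, RRSE and RMSE for paired observations.

    Assumed definitions (not taken from the source article). Values in
    y_true must be non-zero and must not all be identical, otherwise the
    ratios below are undefined.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]

    # Errors of the naive baseline that always predicts the mean of y_true.
    abs_dev = [abs(t - mean_true) for t in y_true]
    sq_dev = [(t - mean_true) ** 2 for t in y_true]

    return {
        "MAPE": sum(e / abs(t) for e, t in zip(abs_err, y_true)) / n,
        "SMAPE": sum(
            2 * e / (abs(t) + abs(p))
            for e, t, p in zip(abs_err, y_true, y_pred)
        ) / n,
        "RAE": sum(abs_err) / sum(abs_dev),
        "RRSE": math.sqrt(sum(sq_err) / sum(sq_dev)),
        "RMSE": math.sqrt(sum(sq_err) / n),
    }


# Hypothetical eGFR values, for illustration only.
metrics = regression_metrics([100.0, 80.0, 60.0], [90.0, 80.0, 70.0])
```

Under these definitions RAE and RRSE compare a model against a baseline that always predicts the mean, so values below 1 indicate an improvement over that baseline. RMSE is expressed in the units of the outcome, while the other four metrics are unitless ratios.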
Table 3 Average importance of risk factors derived from stochastic gradient boosting, extreme gradient boosting, and elastic net in the NAFLD+ group (Model 1, including age)
| Variables | SGB | XGBoost | Elasticnet | Average | Rank |
| --- | --- | --- | --- | --- | --- |
| Age | 100 | 100 | 15.08 | 71.69 | 1 |
| Income | 0.14 | 0 | 0 | 0.05 | |
| Body fat | 3.94 | 1.9 | 3.27 | 3.04 | |
| Systolic blood pressure | 1.37 | 0.67 | 1.01 | 1.02 | |
| Diastolic blood pressure | 2.67 | 0.67 | 1.8 | 1.71 | |
| Leukocyte | 0.72 | 0.33 | 4.32 | 1.79 | |
| Hemoglobin | 1 | 0 | 0 | 0.33 | |
| Platelets | 3.64 | 1.62 | 0.31 | 1.86 | |
| Fasting plasma glucose | 2.92 | 1.18 | 0.66 | 1.59 | |
| Total bilirubin | 6.86 | 2.29 | 40.24 | 16.46 | 6 |
| Albumin | 1.84 | 0.54 | 49.85 | 17.41 | 5 |
| Globulin | 0.28 | 0.29 | 17.83 | 6.13 | |
| Alkaline phosphatase | 0.99 | 0.15 | 0 | 0.38 | |
| Serum glutamic oxaloacetic transaminase | 1.63 | 0.36 | 0 | 0.66 | |
| Serum glutamic pyruvic transaminase | 3.83 | 1.94 | 0.82 | 2.20 | |
| Serum γ-glutamyl transpeptidase | 1.89 | 1.33 | 0.48 | 1.23 | |
| Lactate dehydrogenase | 23.21 | 23.25 | 0.86 | 15.77 | |
| Uric acid | 27.03 | 24.05 | 60.42 | 37.17 | 2 |
| Triglyceride | 0.84 | 0 | 0.02 | 0.29 | |
| High density lipoprotein cholesterol | 1.99 | 0.8 | 1.16 | 1.32 | |
| Low density lipoprotein cholesterol | 1.78 | 0.14 | 0.1 | 0.67 | |
| Calcium | 3.97 | 2.87 | 65 | 23.95 | 4 |
| Phosphorus | 0.79 | 0.36 | 0 | 0.38 | |
| Thyroid stimulating hormone | 6.92 | 3.99 | 4.91 | 5.27 | |
| C-reactive protein | 0.61 | 0.22 | 6.99 | 2.61 | |
| Forced expiratory volume in one second | 6.63 | 3.41 | 100 | 36.68 | 3 |
| Drink area | 0.11 | 0 | 0.07 | 0.06 | |
| Smoke area | 0.25 | 0 | 0 | 0.08 | |
| Betel nut area | 0 | 0 | 0 | 0.00 | |
| Sport area | 0.45 | 0.2 | 0.28 | 0.31 | |
| Sleep time | 0.14 | 0 | 0 | 0.05 | |
| Marriage | 0 | 0 | 1.99 | 0.66 | |
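The Average column in the importance tables matches the arithmetic mean of the three per-model scores, rounded to two decimals, and the Rank column orders variables by that mean. The sketch below reproduces this calculation for a few rows of Table 3. The function name `rank_by_average_importance` is hypothetical, and the tie handling (dense ranking, where tied variables share a rank and the next rank follows immediately) is an assumption inferred from Table 5, in which two variables share rank 3 and are followed by rank 4.

```python
def rank_by_average_importance(importances):
    """Average per-model importance scores and rank variables.

    importances: dict mapping variable name -> list of scores, one per model.
    Returns a list of (variable, average, rank) tuples, highest average first.
    Ties on the rounded average share a rank (dense ranking, an assumption).
    """
    averages = {
        name: round(sum(scores) / len(scores), 2)
        for name, scores in importances.items()
    }
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)

    ranked = []
    rank = 0
    previous = None
    for name, avg in ordered:
        if avg != previous:
            rank += 1  # advance only when the average changes
            previous = avg
        ranked.append((name, avg, rank))
    return ranked


# Scores (SGB, XGBoost, Elasticnet) copied from selected rows of Table 3.
table3_subset = {
    "Age": [100, 100, 15.08],
    "Uric acid": [27.03, 24.05, 60.42],
    "Forced expiratory volume in one second": [6.63, 3.41, 100],
    "Calcium": [3.97, 2.87, 65],
    "Lactate dehydrogenase": [23.21, 23.25, 0.86],
}

for name, avg, rank in rank_by_average_importance(table3_subset):
    print(f"{rank}. {name}: {avg:.2f}")
```

Because the three models report importance on different scales in practice (each column here is scaled so that its top variable scores 100), a simple mean lets a single model dominate the ranking of a variable: forced expiratory volume in one second ranks third in Table 3 almost entirely because of its elastic net score.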
Table 4 Average importance of risk factors derived from stochastic gradient boosting, extreme gradient boosting, and elastic net in the NAFLD- group (Model 1, including age)
| Variables | SGB | XGBoost | Elasticnet | Average | Rank |
| --- | --- | --- | --- | --- | --- |
| Age | 100 | 100 | 15.19 | 71.73 | 1 |
| Income | 0 | 0 | 1.41 | 0.47 | |
| Body fat | 3.69 | 1.05 | 4.15 | 2.96 | |
| Systolic blood pressure | 0.46 | 0.09 | 0.75 | 0.43 | |
| Diastolic blood pressure | 4.19 | 3.09 | 2.96 | 3.41 | |
| Leukocyte | 1.21 | 0.34 | 4.52 | 2.02 | |
| Hemoglobin | 4.81 | 1 | 11.57 | 5.79 | |
| Platelets | 2.61 | 1.06 | 0.2 | 1.29 | |
| Fasting plasma glucose | 0.42 | 0.21 | 0.17 | 0.27 | |
| Total bilirubin | 3.11 | 1.84 | 19.24 | 8.06 | |
| Albumin | 2.28 | 1.34 | 69.53 | 24.38 | 4 |
| Globulin | 0.42 | 0.12 | 3.03 | 1.19 | |
| Alkaline phosphatase | 1.84 | 0.22 | 0.04 | 0.70 | |
| Serum glutamic oxaloacetic transaminase | 0.52 | 0 | 0 | 0.17 | |
| Serum glutamic pyruvic transaminase | 3.79 | 1.94 | 1.12 | 2.28 | |
| Serum γ-glutamyl transpeptidase | 1.16 | 0 | 0.38 | 0.51 | |
| Lactate dehydrogenase | 21.99 | 18.24 | 0.97 | 13.73 | 5 |
| Uric acid | 26.62 | 22.99 | 76.35 | 41.99 | 2 |
| Triglyceride | 1.59 | 0.31 | 0 | 0.63 | |
| High density lipoprotein cholesterol | 1.4 | 0.25 | 0.65 | 0.77 | |
| Low density lipoprotein cholesterol | 2.37 | 0.26 | 0.18 | 0.94 | |
| Calcium | 1.66 | 0.64 | 29.96 | 10.75 | 6 |
| Phosphorus | 2.05 | 2.07 | 10.98 | 5.03 | |
| Thyroid stimulating hormone | 11.86 | 8.69 | 5.02 | 8.52 | |
| C-reactive protein | 0.42 | 0 | 2.73 | 1.05 | |
| Forced expiratory volume in one second | 5.06 | 3.01 | 100 | 36.02 | 3 |
| Drink area | 0 | 0.25 | 0 | 0.08 | |
| Smoke area | 0.37 | 0.17 | 0.79 | 0.44 | |
| Betel nut area | 0 | 0 | 0 | 0.00 | |
| Sport area | 1.13 | 0.65 | 1.55 | 1.11 | |
| Sleep time | 0 | 0 | 0 | 0.00 | |
| Marriage | 0.13 | 0 | 5.51 | 1.88 | |
Table 5 Average importance of risk factors derived from stochastic gradient boosting, extreme gradient boosting, and elastic net in the NAFLD+ group (Model 2, excluding age)
| Variables | SGB | XGBoost | Elasticnet | Average | Rank |
| --- | --- | --- | --- | --- | --- |
| Income | 12.88 | 15.99 | 8.06 | 12.31 | |
| Body fat | 22.25 | 16.83 | 3.85 | 14.31 | 5 |
| Systolic blood pressure | 11.29 | 9.24 | 0.41 | 6.98 | |
| Diastolic blood pressure | 9.17 | 6.85 | 1.19 | 5.74 | |
| Leukocyte | 1.73 | 0.58 | 1.9 | 1.40 | |
| Hemoglobin | 2.57 | 0.31 | 4.27 | 2.38 | |
| Platelets | 17.83 | 14.87 | 0.32 | 11.01 | |
| Fasting plasma glucose | 9.72 | 6.71 | 0.2 | 5.54 | |
| Total bilirubin | 9.6 | 3.56 | 28.65 | 13.94 | |
| Albumin | 12.05 | 9.81 | 100 | 40.62 | 3 |
| Globulin | 1.85 | 1.25 | 10.62 | 4.57 | |
| Alkaline phosphatase | 4.28 | 0.99 | 0.06 | 1.78 | |
| Serum glutamic oxaloacetic transaminase | 3.45 | 3.05 | 1.45 | 2.65 | |
| Serum glutamic pyruvic transaminase | 16.27 | 11.92 | 1.57 | 9.92 | |
| Serum γ-glutamyl transpeptidase | 1.14 | 0.65 | 0.28 | 0.69 | |
| Lactate dehydrogenase | 100 | 100 | 0.6 | 66.87 | 1 |
| Uric acid | 50.16 | 45.66 | 30.15 | 41.99 | 2 |
| Triglyceride | 6.23 | 3.56 | 0.14 | 3.31 | |
| High density lipoprotein cholesterol | 0.86 | 0.82 | 0.05 | 0.58 | |
| Low density lipoprotein cholesterol | 9.6 | 6.73 | 0.46 | 5.60 | |
| Calcium | 12.48 | 9.07 | 53.48 | 25.01 | 4 |
| Phosphorus | 0.79 | 1.87 | 1.48 | 1.38 | |
| Thyroid stimulating hormone | 16.42 | 11.23 | 2.23 | 9.96 | |
| C-reactive protein | 0 | 0 | 2.11 | 0.70 | |
| Forced expiratory volume in one second | 39.39 | 44.32 | 38.15 | 40.62 | 3 |
| Drink area | 0 | 0 | 0.26 | 0.09 | |
| Smoke area | 0 | 0 | 0 | 0.00 | |
| Betel nut area | 0 | 0 | 0 | 0.00 | |
| Sport area | 3.95 | 3.83 | 1.49 | 3.09 | |
| Sleep time | 0.86 | 0.5 | 4.04 | 1.80 | |
| Marriage | 0 | 0 | 2.38 | 0.79 | |
Table 6 Average importance of risk factors derived from stochastic gradient boosting, extreme gradient boosting, and elastic net in the NAFLD- group (Model 2, excluding age)
| Variables | SGB | XGBoost | Elasticnet | Average | Rank |
| --- | --- | --- | --- | --- | --- |
| Income | 2.49 | 1.95 | 1.61 | 2.02 | |
| Body fat | 7.57 | 2.68 | 1.48 | 3.91 | |
| Systolic blood pressure | 28.66 | 30.68 | 0.59 | 19.98 | 6 |
| Diastolic blood pressure | 18.44 | 21.96 | 1.73 | 14.04 | |
| Leukocyte | 9.07 | 5.26 | 6.63 | 6.99 | |
| Hemoglobin | 12.51 | 1.95 | 3.14 | 5.87 | |
| Platelets | 12.13 | 8.68 | 0.23 | 7.01 | |
| Fasting plasma glucose | 6.67 | 4.96 | 0.7 | 4.11 | |
| Total bilirubin | 9.07 | 5.16 | 3.37 | 5.87 | |
| Albumin | 21.95 | 20.16 | 100 | 47.37 | 4 |
| Globulin | 1.32 | 0 | 0 | 0.44 | |
| Alkaline phosphatase | 2.75 | 0 | 0.02 | 0.92 | |
| Serum glutamic oxaloacetic transaminase | 4.06 | 3.15 | 1.59 | 2.93 | |
| Serum glutamic pyruvic transaminase | 9.09 | 6.48 | 1.74 | 5.77 | |
| Serum γ-glutamyl transpeptidase | 1.11 | 0 | 0.11 | 0.41 | |
| Lactate dehydrogenase | 100 | 100 | 0.63 | 66.88 | 1 |
| Uric acid | 66.92 | 63.24 | 36.68 | 55.61 | 2 |
| Triglyceride | 12.39 | 8 | 0.34 | 6.91 | |
| High density lipoprotein cholesterol | 2.64 | 0.67 | 0.17 | 1.16 | |
| Low density lipoprotein cholesterol | 14.18 | 10.11 | 0.51 | 8.27 | |
| Calcium | 3.82 | 2.4 | 21.04 | 9.09 | |
| Phosphorus | 5.65 | 6.52 | 0.1 | 4.09 | |
| Thyroid stimulating hormone | 34.32 | 24.16 | 2.56 | 20.35 | 5 |
| C-reactive protein | 2.68 | 0 | 0 | 0.89 | |
| Forced expiratory volume in one second | 61.02 | 64.4 | 26.93 | 50.78 | 3 |
| Drink area | 1.01 | 1.21 | 0 | 0.74 | |
| Smoke area | 2.24 | 1.22 | 0.85 | 1.44 | |
| Betel nut area | 0 | 0 | 0 | 0.00 | |
| Sport area | 13.32 | 11.44 | 2.46 | 9.07 | |
| Sleep time | 0.79 | 0.47 | 10.69 | 3.98 | |
| Marriage | 8.88 | 7.85 | 30.55 | 15.76 | |