Copyright
©The Author(s) 2023.
World J Clin Cases. Nov 26, 2023; 11(33): 7951-7964
Published online Nov 26, 2023. doi: 10.12998/wjcc.v11.i33.7951
Table 1 Participant demographics
Variables | mean ± SD | N |
Age | 67.38 ± 9.69 | 556 |
Body mass index | 26.16 ± 3.9 | 556 |
Duration of diabetes | 13.69 ± 7.94 | 556 |
Systolic blood pressure | 131.14 ± 15.42 | 493 |
Diastolic blood pressure | 73.32 ± 10.15 | 493 |
Hemoglobin | 12.92 ± 1.68 | 444 |
Triglyceride | 153.74 ± 45.85 | 539 |
Glycated hemoglobin | 7.79 ± 1.36 | 538 |
High density lipoprotein cholesterol | 122.65 ± 74.34 | 535 |
Low density lipoprotein cholesterol | 49.65 ± 14.75 | 498 |
Alanine aminotransferase | 23.87 ± 13.94 | 537 |
Creatinine | 1.16 ± 1 | 536 |
Microalbumin creatinine ratio | 194.18 ± 733.73 | 526 |
Homeostasis model assessment of insulin resistance (HOMA-IR) | 0.63 ± 0.34 | 366 |
Homeostasis model assessment of insulin secretion (HOMA-B) | 1.71 ± 0.37 | 366 |
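Table 1 does not state how the HOMA indices were computed. As a hedged illustration only, the sketch below implements the widely used original HOMA approximations, assuming fasting glucose in mg/dL and fasting insulin in µU/mL. The units, function names and choice of formula are assumptions, and raw indices from these formulas are not necessarily on the same scale as the values reported in the table.

```python
def homa_ir(glucose_mg_dl: float, insulin_uu_ml: float) -> float:
    """Original HOMA approximation of insulin resistance (assumed formula).

    Glucose in mg/dL, insulin in microunits/mL. The constant 405 is the
    mg/dL equivalent of 22.5 used with glucose in mmol/L (22.5 * 18).
    """
    return glucose_mg_dl * insulin_uu_ml / 405.0


def homa_b(glucose_mg_dl: float, insulin_uu_ml: float) -> float:
    """Original HOMA approximation of beta-cell function (%B), assumed formula.

    360 and 63 are the mg/dL equivalents of 20 and 3.5 mmol/L.
    """
    if glucose_mg_dl <= 63.0:
        raise ValueError("HOMA-B approximation is undefined for glucose <= 63 mg/dL")
    return 360.0 * insulin_uu_ml / (glucose_mg_dl - 63.0)


# Hypothetical patient values, not taken from the study
print(round(homa_ir(150.0, 12.0), 2))  # 4.44
print(round(homa_b(150.0, 12.0), 1))   # 49.7
```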
Table 2 Participant demographics: sex, smoking and summed stress score
Variable | n (%) | N |
Sex | | 556 |
0 | 287 (51.62) | |
1 | 269 (48.38) | |
Smoking | | 310 |
0 | 202 (65.16) | |
1 | 108 (34.84) | |
Summed stress score | | 556 |
0 | 180 (32.37) | |
1 | 376 (67.63) | |
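Each n (%) entry in Table 2 is a category count with its share of the non-missing total for that variable; smoking, for example, is expressed out of 310 respondents rather than 556. The table does not define what codes 0 and 1 mean, so they are kept as-is. A minimal sketch of the arithmetic, using a hypothetical helper name, reproduces the published percentages:

```python
from __future__ import annotations


def percent_table(counts: dict[str, int]) -> dict[str, str]:
    """Format each category as 'n (%)', using the variable's own non-missing total."""
    total = sum(counts.values())
    return {cat: f"{n} ({100 * n / total:.2f})" for cat, n in counts.items()}


print(percent_table({"0": 287, "1": 269}))  # sex, N = 556
print(percent_table({"0": 202, "1": 108}))  # smoking, N = 310
print(percent_table({"0": 180, "1": 376}))  # summed stress score, N = 556
```

The first call prints `{'0': '287 (51.62)', '1': '269 (48.38)'}`, matching the sex rows above.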
Table 3 Summary of the best hyperparameter values for the random forest (RF), classification and regression tree (CART), Naïve Bayes (NB) and eXtreme gradient boosting (XGBoost) classifiers
Methods | Hyperparameters | Best value | Meaning |
RF | Mtry | 8 | The number of variables randomly sampled as split candidates at each node |
 | Ntree | 500 | The number of trees in the forest |
CART | Minsplit | 20 | The minimum number of observations that must exist in a node before a split is attempted |
 | Minbucket | 7 | The minimum number of observations in any terminal node |
 | Maxdepth | 10 | The maximum depth of any node in the final tree |
 | Xval | 10 | The number of cross-validation folds |
 | Cp | 0.03588 | Complexity parameter: the minimum improvement in overall fit that a split must achieve to be attempted |
XGBoost | Nrounds | 100 | The number of boosting iterations (trees) |
 | Max_depth | 3 | The maximum depth of a tree |
 | Eta | 0.4 | Learning rate: the shrinkage applied to each new tree's contribution |
 | Gamma | 0 | The minimum loss reduction required to make a further split on a leaf node |
 | Subsample | 0.75 | Subsample ratio of the training instances (rows) used to grow each tree |
 | Colsample_bytree | 0.8 | Subsample ratio of columns when constructing each tree |
 | Rate_drop | 0.5 | Fraction of previous trees dropped during a boosting iteration |
 | Skip_drop | 0.05 | Probability of skipping the dropout procedure during a boosting iteration |
 | Min_child_weight | 1 | The minimum sum of instance weight (hessian) needed in a child node |
NB | Fl | 0 | Laplace correction factor (0 means no smoothing) |
 | Usekernel | TRUE | Use a kernel density estimate for continuous variables instead of a Gaussian density estimate |
 | Adjust | 1 | Bandwidth adjustment factor for the kernel density estimate |
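Rate_drop and Skip_drop are parameters of the DART booster in the xgboost library and have no effect with the default tree booster, so the XGBoost rows in Table 3 appear to describe a DART model; that reading is an assumption. The sketch below collects the tuned values as a parameter dictionary using xgboost's native parameter names. The `booster` and `objective` entries, the constant names and the hypothetical `check_ranges` helper are assumptions, not settings reported in the table.

```python
# Table 3 XGBoost settings under xgboost's native parameter names.
# Assumptions: booster="dart" (implied by rate_drop/skip_drop) and a binary
# logistic objective for the normal/abnormal thallium scan outcome.
XGB_DART_PARAMS = {
    "booster": "dart",
    "objective": "binary:logistic",
    "max_depth": 3,
    "eta": 0.4,
    "gamma": 0.0,
    "subsample": 0.75,        # fraction of training rows sampled per tree
    "colsample_bytree": 0.8,  # fraction of columns sampled per tree
    "rate_drop": 0.5,         # fraction of existing trees dropped per iteration
    "skip_drop": 0.05,        # probability of skipping dropout in an iteration
    "min_child_weight": 1,
}
NUM_BOOST_ROUND = 100  # "Nrounds" in Table 3


def check_ranges(params: dict) -> dict:
    """Raise ValueError if a ratio parameter lies outside xgboost's documented range."""
    for name in ("subsample", "colsample_bytree"):
        if not 0.0 < params[name] <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    for name in ("eta", "rate_drop", "skip_drop"):
        if not 0.0 <= params[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    return params


check_ranges(XGB_DART_PARAMS)
# With xgboost installed and a DMatrix `dtrain` built from the study data:
#   import xgboost as xgb
#   booster = xgb.train(XGB_DART_PARAMS, dtrain, num_boost_round=NUM_BOOST_ROUND)
```

Keeping the values in one dictionary makes it easy to compare them line by line with Table 3 before training.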
Table 4 The average performance of the logistic regression (LGR), random forest (RF), Naïve Bayes (NB), classification and regression tree (CART) and eXtreme gradient boosting (XGBoost) methods
Method | Accuracy | Sensitivity | Specificity | AUC |
LGR | 0.685 ± 0.072 | 0.687 ± 0.152 | 0.683 ± 0.114 | 0.703 ± 0.057 |
CART | 0.541 ± 0.074 | 0.546 ± 0.078 | 0.529 ± 0.670 | 0.540 ± 0.070 |
RF | 0.707 ± 0.047 | 0.711 ± 0.100 | 0.678 ± 0.099 | 0.707 ± 0.037 |
XGBoost | 0.712 ± 0.072 | 0.727 ± 0.139 | 0.674 ± 0.088 | 0.719 ± 0.062 |
NB | 0.692 ± 0.059 | 0.702 ± 0.116 | 0.669 ± 0.090 | 0.704 ± 0.056 |
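Table 4 reports accuracy, sensitivity, specificity and AUC as mean ± SD without showing how they are computed. A minimal sketch of the standard definitions follows, assuming 0/1 labels with 1 as the positive (abnormal scan) class, a 0.5 probability cutoff, and ± values that are sample SDs over repeated evaluations. The helper names, the toy data and the cutoff are hypothetical.

```python
from statistics import mean, stdev


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format repeated-evaluation results as 'mean ± SD' using the sample SD."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Hypothetical toy data, not from the study
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_metrics(y_true, y_pred))  # accuracy 4/6, sensitivity 2/3, specificity 2/3
print(auc(y_true, scores))                # 8/9
print(summarize([0.70, 0.72, 0.74]))      # 0.720 ± 0.020
```

AUC uses the predicted scores directly, whereas the other three metrics depend on the chosen cutoff.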
Table 5 Variable importance of the risk factors derived from the machine learning methods, with average importance and rank
Variables | RF | XGBoost | NB | Average | Rank |
Sex | 100.0 ± 0 | 100.0 ± 0 | 100.0 ± 0 | 100.0 | 1.0 |
Body mass index | 54.2 ± 6.6 | 61.1 ± 14.7 | 86.2 ± 6.8 | 67.1 | 2.0 |
Age | 13.1 ± 7.6 | 78.3 ± 13.2 | 67.9 ± 6.5 | 53.1 | 3.0 |
Low density lipoprotein cholesterol | 30.4 ± 3.1 | 8.4 ± 12.8 | 71.0 ± 7.8 | 36.6 | 4.0 |
Glycated hemoglobin | 15.4 ± 5.9 | 12.8 ± 11.9 | 48.0 ± 8.3 | 25.4 | 5.0 |
Smoking | 12.2 ± 2.7 | 28.8 ± 9.2 | 34.5 ± 6.6 | 25.2 | 6.0 |
Creatinine | 10.1 ± 2.3 | 5.3 ± 9.12 | 53.1 ± 7.3 | 22.8 | 7.0 |
Duration of diabetes | 6.3 ± 4.61 | 41.5 ± 8.6 | 10.1 ± 8.9 | 19.3 | 8.0 |
Hemoglobin | 8.0 ± 4.16 | 16.6 ± 8.9 | 17.0 ± 5.7 | 13.8 | 9.0 |
Blood urea nitrogen | 9.0 ± 8.15 | 6.5 ± 6.79 | 17.3 ± 9.6 | 11.0 | 10.0 |
Systolic blood pressure | 4.2 ± 1.03 | 21.6 ± 5.1 | 6.4 ± 2.88 | 10.7 | 11.0 |
Triglyceride | 5.4 ± 17.5 | 15.0 ± 4.4 | 11.1 ± 12.3 | 10.5 | 12.0 |
Microalbumin | 4.3 ± 2.23 | 3.6 ± 3.83 | 22.7 ± 6.9 | 10.2 | 13.0 |
Diastolic blood pressure | 2.5 ± 5.91 | 18.9 ± 3.7 | 5.6 ± 9.33 | 9.0 | 14.0 |
Alanine aminotransferase | 3.2 ± 5.96 | 6.9 ± 3.90 | 13.0 ± 12.6 | 7.7 | 15.0 |
High density lipoprotein cholesterol | 1.3 ± 3.60 | 9.8 ± 3.29 | 7.3 ± 8.41 | 6.1 | 16.0 |
HOMA-IR | 5.7 ± 2.85 | 2.2 ± 2.52 | 10.2 ± 8.1 | 6.0 | 17.0 |
HOMA-B | 4.3 ± 2.22 | 0.0 ± 0.00 | 7.4 ± 8.831 | 3.9 | 18.0 |
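The Average column in Table 5 is the mean of the RF, XGBoost and NB importances, and Rank orders the variables by that mean. The sketch below, with a hypothetical helper name, assumes each method's importances are already scaled to a 0 to 100 range (sex scores 100 under every method) and reproduces the ordering for a few rows. Recomputing from the rounded values shown can differ in the last digit, for example 67.2 rather than 67.1 for body mass index.

```python
from statistics import mean


def average_and_rank(importance):
    """Return (rank, variable, average) tuples sorted by descending mean importance.

    `importance` maps a variable name to its (RF, XGBoost, NB) importance scores.
    """
    averages = {var: mean(scores) for var, scores in importance.items()}
    ordered = sorted(averages, key=averages.get, reverse=True)
    return [(rank, var, round(averages[var], 1)) for rank, var in enumerate(ordered, start=1)]


# A subset of Table 5 (mean importance per method)
subset = {
    "Age": (13.1, 78.3, 67.9),
    "Sex": (100.0, 100.0, 100.0),
    "Low density lipoprotein cholesterol": (30.4, 8.4, 71.0),
    "Body mass index": (54.2, 61.1, 86.2),
}
for row in average_and_rank(subset):
    print(row)
```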
- Citation: Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11(33): 7951-7964
- URL: https://www.wjgnet.com/2307-8960/full/v11/i33/7951.htm
- DOI: https://dx.doi.org/10.12998/wjcc.v11.i33.7951