Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes

doi:10.12998/wjcc.v11.i33.7951

Journal Information of This Article

Publication Name: World Journal of Clinical Cases
ISSN: 2307-8960
Publisher: Baishideng Publishing Group Inc, 7041 Koll Center Parkway, Suite 160, Pleasanton, CA 94566, USA

Retrospective Cohort Study

World J Clin Cases. Nov 26, 2023; 11(33): 7951-7964
Published online Nov 26, 2023. doi: 10.12998/wjcc.v11.i33.7951

Table 1 Participant demographics

Variables	mean ± SD	N
Age	67.38 ± 9.69	556
Body mass index	26.16 ± 3.9	556
Duration of diabetes	13.69 ± 7.94	556
Systolic blood pressure	131.14 ± 15.42	493
Diastolic blood pressure	73.32 ± 10.15	493
Hemoglobin	12.92 ± 1.68	444
Triglyceride	153.74 ± 45.85	539
Glycated hemoglobin	7.79 ± 1.36	538
High density lipoprotein cholesterol	122.65 ± 74.34	535
Low density lipoprotein cholesterol	49.65 ± 14.75	498
Alanine aminotransferase	23.87 ± 13.94	537
Creatinine	1.16 ± 1	536
Microalbumin creatinine ratio	194.18 ± 733.73	526
Homeostasis model assessment of insulin resistance	0.63 ± 0.34	366
Homeostasis model assessment of insulin secretion	1.71 ± 0.37	366

Table 2 Participant demographics: sex, smoking, and summed stress score

Variable	N (%)	N
Sex		556
0	287 (51.62)
1	269 (48.38)
Smoking		310
0	202 (65.16)
1	108 (34.84)
Summed stress score		556
0	180 (32.37)
1	376 (67.63)

Table 3 Summary of the best hyperparameter values for the random forest, classification and regression tree, Naïve Bayes classifier, and eXtreme gradient boosting methods

Methods	Hyperparameters	Best value	Meaning
RF	Mtry	8	The number of randomly selected features considered at each split
	Ntree	500	The number of trees in the forest
CART	Minsplit	20	The minimum number of observations required to attempt a split in a node
	Minbucket	7	The minimum number of observations in a terminal node
	Maxdepth	10	The maximum depth of any node in the final tree
	Xval	10	Number of cross-validations
	Cp	0.03588	Complexity parameter: the minimum improvement required in the model at each node
XGBoost	Nrounds	100	The number of boosting iterations (trees)
	Max_depth	3	The maximum depth of a tree
	Eta	0.4	Learning rate (shrinkage applied to each tree's contribution)
	Gamma	0	The minimum loss reduction required to make a further split
	Subsample	0.75	Subsample ratio of training observations (rows) when building each tree
	Colsample_bytree	0.8	Subsample ratio of columns when constructing each tree
	Rate_drop	0.5	Rate of trees dropped
	Skip_drop	0.05	Probability of skipping the dropout procedure during a boosting iteration
	Min_child_weight	1	The minimum sum of instance weight needed in a child node
NB	fL	0	Adjustment of the Laplace smoother
	Usekernel	TRUE	Whether to use a kernel density estimate for continuous variables instead of a Gaussian density estimate
	Adjust	1	Adjustment of the kernel density bandwidth

CART: Classification and regression tree; RF: Random forest; XGBoost: eXtreme gradient boosting; NB: Naïve Bayes.
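For readers who want to reuse the tuned values from Table 3, they can be transcribed into a plain configuration structure. The sketch below is only a convenience transcription in Python: the dictionary name, the key spellings (assumed to follow the R-style parameter names in the table), and the helper `describe` are illustrative assumptions, and no modelling library is called.

```python
# Best hyperparameter values transcribed from Table 3.
# Key spellings are assumptions based on the R-style names in the table.
BEST_HYPERPARAMETERS = {
    "RF": {"mtry": 8, "ntree": 500},
    "CART": {
        "minsplit": 20,
        "minbucket": 7,
        "maxdepth": 10,
        "xval": 10,
        "cp": 0.03588,
    },
    "XGBoost": {
        "nrounds": 100,
        "max_depth": 3,
        "eta": 0.4,
        "gamma": 0,
        "subsample": 0.75,
        "colsample_bytree": 0.8,
        "rate_drop": 0.5,
        "skip_drop": 0.05,
        "min_child_weight": 1,
    },
    "NB": {"fL": 0, "usekernel": True, "adjust": 1},
}


def describe(method):
    """Return a one-line summary of the tuned values for one method (hypothetical helper)."""
    params = BEST_HYPERPARAMETERS[method]
    return method + ": " + ", ".join(f"{name}={value}" for name, value in params.items())


summary = describe("RF")
total_parameters = sum(len(params) for params in BEST_HYPERPARAMETERS.values())
```

Keeping the values in one structure makes it easier to pass them to whichever library is used to refit the models, although the exact argument names will differ between implementations.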

Table 4 The average performance of the logistic regression, classification and regression tree, random forest, eXtreme gradient boosting, and Naïve Bayes methods

	Accuracy	Sensitivity	Specificity	AUC
LGR	0.685 ± 0.072	0.687 ± 0.152	0.683 ± 0.114	0.703 ± 0.057
CART	0.541 ± 0.074	0.546 ± 0.078	0.529 ± 0.670	0.540 ± 0.070
RF	0.707 ± 0.047	0.711 ± 0.100	0.678 ± 0.099	0.707 ± 0.037
XGBoost	0.712 ± 0.072	0.727 ± 0.139	0.674 ± 0.088	0.719 ± 0.062
NB	0.692 ± 0.059	0.702 ± 0.116	0.669 ± 0.090	0.704 ± 0.056

AUC: Area under the curve; LGR: Logistic regression; CART: Classification and regression tree; RF: Random forest; XGBoost: eXtreme gradient boosting; NB: Naïve Bayes.
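The four metrics in Table 4 have standard definitions: accuracy, sensitivity, and specificity come from the counts in a confusion matrix, and the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following Python sketch illustrates those definitions only. The function names, counts, and scores are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc_from_scores(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores for illustration only (not study data).
metrics = classification_metrics(tp=70, fp=20, tn=40, fn=30)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With these made-up inputs the accuracy is 110/160, the sensitivity is 70/100, the specificity is 40/60, and three of the four positive/negative pairs are ordered correctly.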

Table 5 The variable importance and rank of the importance of the risk factors derived from machine learning methods

Variables	RF	XGBoost	NB	Average	Rank
Sex	100.0 ± 0	100.0 ± 0	100.0 ± 0	100.0	1.0
Body mass index	54.2 ± 6.6	61.1 ± 14.7	86.2 ± 6.8	67.1	2.0
Age	13.1 ± 7.6	78.3 ± 13.2	67.9 ± 6.5	53.1	3.0
Low density lipoprotein cholesterol	30.4 ± 3.1	8.4 ± 12.8	71.0 ± 7.8	36.6	4.0
Glycated hemoglobin	15.4 ± 5.9	12.8 ± 11.9	48.0 ± 8.3	25.4	5.0
Smoking	12.2 ± 2.7	28.8 ± 9.2	34.5 ± 6.6	25.2	6.0
Creatinine	10.1 ± 2.3	5.3 ± 9.12	53.1 ± 7.3	22.8	7.0
Duration	6.3 ± 4.61	41.5 ± 8.6	10.1 ± 8.9	19.3	8.0
Hemoglobin	8.0 ± 4.16	16.6 ± 8.9	17.0 ± 5.7	13.8	9.0
Blood urea nitrogen	9.0 ± 8.15	6.5 ± 6.79	17.3 ± 9.6	11.0	10.0
Systolic blood pressure	4.2 ± 1.03	21.6 ± 5.1	6.4 ± 2.88	10.7	11.0
Triglyceride	5.4 ± 17.5	15.0 ± 4.4	11.1 ± 12.3	10.5	12.0
Microalbumin	4.3 ± 2.23	3.6 ± 3.83	22.7 ± 6.9	10.2	13.0
Diastolic blood pressure	2.5 ± 5.91	18.9 ± 3.7	5.6 ± 9.33	9.0	14.0
Alanine aminotransferase	3.2 ± 5.96	6.9 ± 3.90	13.0 ± 12.6	7.7	15.0
High density lipoprotein cholesterol	1.3 ± 3.60	9.8 ± 3.29	7.3 ± 8.41	6.1	16.0
HOMA-IR	5.7 ± 2.85	2.2 ± 2.52	10.2 ± 8.1	6.0	17.0
HOMA-B	4.3 ± 2.22	0.0 ± 0.00	7.4 ± 8.831	3.9	18.0

Ranks 1 to 6 mark the six most important variables. RF: Random forest; XGBoost: eXtreme gradient boosting; NB: Naïve Bayes; HOMA-IR: Homeostasis model assessment of insulin resistance; HOMA-B: Homeostasis model assessment of beta-cell function.
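The Average and Rank columns of Table 5 appear to be the unweighted mean of the three per-method importance values and the position of that mean in descending order. This reading is an assumption, although it agrees with the printed values. A minimal Python sketch using six rows copied from the table (variable names in the code are illustrative):

```python
# Scaled importance values (RF, XGBoost, NB) copied from six rows of Table 5.
importance = {
    "Sex": (100.0, 100.0, 100.0),
    "Body mass index": (54.2, 61.1, 86.2),
    "Age": (13.1, 78.3, 67.9),
    "Low density lipoprotein cholesterol": (30.4, 8.4, 71.0),
    "Glycated hemoglobin": (15.4, 12.8, 48.0),
    "Smoking": (12.2, 28.8, 34.5),
}

# Assumption: "Average" is the unweighted mean across the three methods.
averages = {name: sum(vals) / len(vals) for name, vals in importance.items()}

# Assumption: "Rank" orders variables from highest to lowest average.
ranked = sorted(averages, key=averages.get, reverse=True)
ranks = {name: position for position, name in enumerate(ranked, start=1)}
```

Recomputed averages can differ from the printed ones in the last digit, because the per-method values in the table are already rounded.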

Citation: Yang CC, Peng CH, Huang LY, Chen FY, Kuo CH, Wu CZ, Hsia TL, Lin CY. Comparison between multiple logistic regression and machine learning methods in prediction of abnormal thallium scans in type 2 diabetes. World J Clin Cases 2023; 11(33): 7951-7964
URL: https://www.wjgnet.com/2307-8960/full/v11/i33/7951.htm
DOI: https://dx.doi.org/10.12998/wjcc.v11.i33.7951