Published online Jan 15, 2024. doi: 10.4239/wjd.v15.i1.43
Peer-review started: August 24, 2023
First decision: November 9, 2023
Revised: November 25, 2023
Accepted: December 25, 2023
Article in press: December 25, 2023
Published online: January 15, 2024
Processing time: 141 Days and 9.9 Hours
Among older adults, type 2 diabetes mellitus (T2DM) is widely recognized as one of the most prevalent diseases. Diabetic nephropathy (DN) is a frequent com
To investigate the factors that impact T2DM complicated with DN and utilize this information to develop a predictive model.
The clinical data of 210 patients diagnosed with T2DM and admitted to the First People’s Hospital of Wenling between August 2019 and August 2022 were retrospectively analyzed. Patients were divided into a DN group (T2DM complicated with DN) and a non-DN group (T2DM without DN). Multivariate logistic regression analysis was used to identify factors associated with DN in patients with T2DM. The data were randomly split into a training set (n = 147) and a test set (n = 63) in a 7:3 ratio. The training set was used to construct the nomogram, decision tree, and random forest models, and the test set was used to evaluate the predictive performance of each model by comparing sensitivity, specificity, accuracy, recall, precision, and the area under the receiver operating characteristic curve.
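As an illustration of the evaluation step described above, the following minimal Python sketch shows a seeded 7:3 random split and the confusion-matrix metrics named in the methods. The function names (`split_7_3`, `binary_metrics`), the seed, and the toy labels are assumptions made for illustration; they are not the study's code or data.

```python
import random

def split_7_3(n_total, seed=0):
    """Randomly assign indices 0..n_total-1 to a training set (70%) and a test set (30%)."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)  # hypothetical seed, used only for reproducibility
    n_train = n_total * 7 // 10           # integer arithmetic avoids floating-point rounding
    return indices[:n_train], indices[n_train:]

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = DN, 0 = non-DN).

    Assumes both classes occur in y_true and at least one positive prediction exists.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),  # identical to recall
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
    }

# Split sizes for a cohort of 210 patients
train_idx, test_idx = split_7_3(210)
print(len(train_idx), len(test_idx))

# Toy labels and predictions (made up, not study data)
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(binary_metrics(toy_true, toy_pred))
```

Because recall and sensitivity are the same quantity, the sketch reports them under a single key.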
Among the 210 patients with T2DM, 74 (35.34%) had DN. In the test set, the accuracies of the nomogram, decision tree, and random forest models in predicting DN in patients with T2DM were 0.746, 0.714, and 0.730, respectively; the sensitivities were 0.710, 0.710, and 0.806, respectively; the specificities were 0.844, 0.875, and 0.844, respectively; and the areas under the receiver operating characteristic curve (AUCs) were 0.811, 0.735, and 0.850, respectively. The DeLong test showed that the AUC of the decision tree model was lower than those of the random forest and nomogram models (P < 0.05), whereas the difference in AUC between the random forest and nomogram models was not statistically significant (P > 0.05).
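For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen patient with DN receives a higher predicted risk than a randomly chosen patient without DN. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the toy scores are assumptions for illustration only, and the DeLong test used to compare AUCs is not implemented here.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Toy predicted risks (made up, not study data)
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of patients, which is acceptable for a test set of 63 but would normally be replaced by a rank-based computation on larger data.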
Among the three prediction models, the random forest model performed best and can help identify patients with T2DM at high risk of DN.
Core Tip: Machine learning is widely used in medical prediction models. Logistic regression (nomogram), decision tree, and random forest models are three important machine learning techniques. However, few studies have compared the predictive efficacy of these three models in patients with type 2 diabetes mellitus and diabetic nephropathy. Here, we established three risk prediction models (nomogram, decision tree, and random forest) for comparison and found that the random forest model had the strongest overall predictive performance.