1
Fu XY, Song YQ, Lin JY, Wang Y, Wu WD, Peng JB, Ye LP, Chen K, Li SW. Developing a Prognostic Model for Primary Biliary Cholangitis Based on a Random Survival Forest Model. Int J Med Sci 2024; 21:61-69. [PMID: 38164345 PMCID: PMC10750344 DOI: 10.7150/ijms.88481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Accepted: 10/11/2023] [Indexed: 01/03/2024] Open
Abstract
Background: Primary biliary cholangitis (PBC) is a rare autoimmune liver disease with few effective treatments and a poor prognosis, and its incidence is on the rise. There is an urgent need for more targeted treatment strategies to accurately identify high-risk patients. The use of random survival forest models in machine learning is an innovative approach to constructing a prognostic model for PBC that can improve the prognosis by identifying high-risk patients for targeted treatment. Method: Based on the inclusion and exclusion criteria, the clinical data and follow-up data of patients diagnosed with PBC-associated cirrhosis between January 2011 and December 2021 at Taizhou Hospital of Zhejiang Province were retrospectively collected and analyzed. Data analyses and random survival forest model construction were based on the R language. Result: Through a Cox univariate regression analysis of 90 included samples and 46 variables, 17 variables with p-values <0.1 were selected for initial model construction. The out-of-bag (OOB) performance error was 0.2094, and K-fold cross-validation yielded an internal validation C-index of 0.8182. Through model selection, cholinesterase, bile acid, the white blood cell count, total bilirubin, and albumin were chosen for the final predictive model, with a final OOB performance error of 0.2002 and C-index of 0.7805. Using the final model, patients were stratified into high- and low-risk groups, which showed significant differences with a P value <0.0001. The area under the curve was used to evaluate the predictive ability for patients in the first, third, and fifth years, with respective results of 0.9595, 0.8898, and 0.9088. Conclusion: The present study constructed a prognostic model for PBC-associated cirrhosis patients using a random survival forest model, which accurately stratified patients into low- and high-risk groups. Treatment strategies can thus be more targeted, leading to improved outcomes for high-risk patients.
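The out-of-bag performance error and the C-index quoted in the abstract above are closely related: many random survival forest implementations report the error as 1 minus Harrell's concordance index. The sketch below is a minimal illustration under that assumption. The study's own R code is not part of this listing, so the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ended in an event.
        # Tied times are skipped here (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]

c = harrell_c_index(times, events, risk)
error = 1.0 - c  # assumed convention: performance error = 1 - C-index
print(f"C-index = {c:.3f}, error = {error:.3f}")
```

Under this convention, an OOB error of 0.2002 would correspond to an OOB concordance of about 0.80. The C-index values in the abstract come from cross-validation and need not match that figure.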
Affiliation(s)
- Xin-yu Fu
  - Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Ya-qi Song
  - Department of Gastroenterology, Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Jia-ying Lin
  - Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Yi Wang
  - Department of Gastroenterology, Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Wei-dan Wu
  - Department of Gastroenterology, Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Jin-bang Peng
  - Department of Gastroenterology, Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Li-ping Ye
  - Key Laboratory of Minimally Invasive Techniques & Rapid Rehabilitation of Digestive System Tumor of Zhejiang Province, Taizhou Hospital Affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
- Kai Chen
  - Taizhou Chinese Traditional Hospital, Jiaojiang, Zhejiang, China
- Shao-wei Li
  - Department of Gastroenterology, Taizhou Hospital of Zhejiang Province affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
  - Key Laboratory of Minimally Invasive Techniques & Rapid Rehabilitation of Digestive System Tumor of Zhejiang Province, Taizhou Hospital Affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
  - Institute of Digestive Disease, Taizhou Hospital of Zhejiang Province Affiliated to Wenzhou Medical University, Linhai, Zhejiang, China
2
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Overinterpretation of findings in machine learning prediction model studies in oncology: a systematic review. J Clin Epidemiol 2023; 157:120-133. [PMID: 36935090 PMCID: PMC11913775 DOI: 10.1016/j.jclinepi.2023.03.012] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 09/22/2022] [Revised: 03/03/2023] [Accepted: 03/14/2023] [Indexed: 03/19/2023]
Abstract
OBJECTIVES In biomedical research, spin is the overinterpretation of findings, and it is a growing concern. To date, the presence of spin has not been evaluated in prognostic model research in oncology, including studies developing and validating models for individualized risk prediction. STUDY DESIGN AND SETTING We conducted a systematic review, searching MEDLINE and EMBASE for oncology-related studies that developed and validated a prognostic model using machine learning published between 1st January, 2019, and 5th September, 2019. We used existing spin frameworks and described areas of highly suggestive spin practices. RESULTS We included 62 publications (including 152 developed models; 37 validated models). Reporting was inconsistent between methods and the results in 27% of studies due to additional analysis and selective reporting. Thirty-two studies (out of 36 applicable studies) reported comparisons between developed models in their discussion and predominantly used discrimination measures to support their claims (78%). Thirty-five studies (56%) used an overly strong or leading word in their title, abstract, results, discussion, or conclusion. CONCLUSION The potential for spin needs to be considered when reading, interpreting, and using studies that developed and validated prognostic models in oncology. Researchers should carefully report their prognostic model research using words that reflect their actual results and strength of evidence.
Affiliation(s)
- Paula Dhiman
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jie Ma
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Constanza L Andaur Navarro
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Benjamin Speich
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; Meta-Research Centre, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Garrett Bullock
  - Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Johanna A A Damen
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Shona Kirtley
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK
- Richard D Riley
  - Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, UK, ST5 5BG
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands; EPI-centre, KU Leuven, Leuven, Belgium
- Karel G M Moons
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gary S Collins
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford OX3 7LD, UK; NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
3
Petinrin OO, Saeed F, Toseef M, Liu Z, Basurra S, Muyide IO, Li X, Lin Q, Wong KC. Machine learning in metastatic cancer research: Potentials, possibilities, and prospects. Comput Struct Biotechnol J 2023; 21:2454-2470. [PMID: 37077177 PMCID: PMC10106342 DOI: 10.1016/j.csbj.2023.03.046] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/17/2023] [Revised: 03/26/2023] [Accepted: 03/27/2023] [Indexed: 03/31/2023] Open
Abstract
Cancer has received extensive recognition for its high mortality rate, with metastatic cancer being the top cause of cancer-related deaths. Metastatic cancer involves the spread of the primary tumor to other body organs. As much as the early detection of cancer is essential, the timely detection of metastasis, the identification of biomarkers, and treatment choice are valuable for improving the quality of life for metastatic cancer patients. This study reviews the existing studies on classical machine learning (ML) and deep learning (DL) in metastatic cancer research. Since the majority of metastatic cancer research data are collected as PET/CT and MRI images, deep learning techniques are heavily involved. However, the black-box nature and high computational cost of deep learning are notable concerns. Furthermore, the generalizability of existing models may be overestimated because of the non-diverse populations in clinical trial datasets. Therefore, research gaps are itemized; follow-up studies should be carried out on metastatic cancer using machine learning and deep learning tools with data in a symmetric manner.
Affiliation(s)
- Faisal Saeed
  - DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Muhammad Toseef
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
- Zhe Liu
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
- Shadi Basurra
  - DAAI Research Group, Department of Computing and Data Science, School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK
- Xiangtao Li
  - School of Artificial Intelligence, Jilin University, Jilin, China
- Qiuzhen Lin
  - School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
  - Hong Kong Institute for Data Science, City University of Hong Kong, Kowloon Tong, Kowloon, Hong Kong SAR
4
Predicting Overall Survival in Patients with Nonmetastatic Gastric Signet Ring Cell Carcinoma: A Machine Learning Approach. Computational and Mathematical Methods in Medicine 2022; 2022:4862376. [PMID: 36148015 PMCID: PMC9489421 DOI: 10.1155/2022/4862376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 08/16/2022] [Accepted: 08/24/2022] [Indexed: 11/30/2022]
Abstract
Background and Aims Accurate prediction is essential for the survival of patients with nonmetastatic gastric signet ring cell carcinoma (GSRC) and medical decision-making. Current models rely on prespecified variables, limiting their performance and not being suitable for individual patients. Our study is aimed at developing a more precise model for predicting 1-, 3-, and 5-year overall survival (OS) in patients with nonmetastatic GSRC based on a machine learning approach. Methods We selected 2127 GSRC patients diagnosed from 2004 to 2014 from the Surveillance, Epidemiology, and End Results (SEER) database and then randomly partitioned them into a training and validation cohort. We compared the performance of several machine learning-based models and finally chose the eXtreme gradient boosting (XGBoost) model as the optimal method to predict the OS in patients with nonmetastatic GSRC. The model was assessed using the receiver operating characteristic curve (ROC). Results In the training cohort, for predicting OS rates at 1-, 3-, and 5-year, the AUCs of the XGBoost model were 0.842, 0.831, and 0.838, respectively, while in the testing cohort, the AUCs of 1-, 3-, and 5-year OS rates were 0.749, 0.823, and 0.829, respectively. Besides, the XGBoost model also performed better when compared with the American Joint Committee on Cancer (AJCC) stage. The performance for this model was stably maintained when stratified by age and ethnicity. Conclusion The XGBoost-based model accurately predicts the 1-, 3-, and 5-year OS in patients with nonmetastatic GSRC. Machine learning is a promising way to predict the survival outcomes of tumor patients.
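The 1-, 3-, and 5-year AUCs reported above summarize how well predicted risks rank patients who died within a horizon above those who survived past it. As a minimal sketch of that idea, assuming every patient's status at the horizon is known (no censoring before the horizon), the AUC can be computed as the fraction of positive/negative pairs that are ranked correctly. The function name `auc_from_scores` and the toy data are hypothetical and are not taken from the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died within the horizon, 0 = survived past it.
died_within_horizon = [1, 1, 0, 0, 1, 0]
predicted_risk = [0.8, 0.6, 0.3, 0.7, 0.9, 0.2]

auc = auc_from_scores(died_within_horizon, predicted_risk)
print(f"AUC = {auc:.3f}")
```

Patients censored before the horizon have unknown status, so real survival analyses typically rely on time-dependent ROC methods rather than this simplified pairwise count.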
5
Li R, Zhang C, Du K, Dan H, Ding R, Cai Z, Duan L, Xie Z, Zheng G, Wu H, Ren G, Dou X, Feng F, Zheng J. Analysis of Prognostic Factors of Rectal Cancer and Construction of a Prognostic Prediction Model Based on Bayesian Network. Front Public Health 2022; 10:842970. [PMID: 35784233 PMCID: PMC9247333 DOI: 10.3389/fpubh.2022.842970] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 12/24/2021] [Accepted: 05/20/2022] [Indexed: 11/13/2022] Open
Abstract
Background: The existing prognostic models of rectal cancer after radical resection ignored the relationships among prognostic factors and their mutual effects on prognosis. Thus, a new modeling method is required to remedy this defect. The present study aimed to construct a new prognostic prediction model based on the Bayesian network (BN), a machine learning tool for data mining, clinical decision-making, and prognostic prediction. Methods: From January 2015 to December 2017, the clinical data of 705 patients with rectal cancer who underwent radical resection were analyzed. The entire cohort was divided into training and testing datasets. A new prognostic prediction model based on BN was constructed and compared with a nomogram. Results: A univariate analysis showed that age, carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), carbohydrate antigen 125 (CA125), preoperative chemotherapy, macropathology type, tumor size, differentiation status, T stage, N stage, vascular invasion, KRAS mutation, and postoperative chemotherapy were associated with overall survival (OS) of the training dataset. Based on the above-mentioned variables, a 3-year OS prognostic prediction BN model of the training dataset was constructed using the Tree Augmented Naïve Bayes method. In addition, age, CEA, CA19-9, CA125, differentiation status, T stage, N stage, KRAS mutation, and postoperative chemotherapy were identified as independent prognostic factors of the training dataset through multivariate Cox regression and were used to construct a nomogram. Then, based on the testing dataset, the two models were evaluated using the receiver operating characteristic (ROC) curve. The results showed that the area under the curve (AUC) of ROC of the BN model and nomogram was 80.11% and 74.23%, respectively. Conclusion: The present study established a BN model for prognostic prediction of rectal cancer for the first time, which was demonstrated to be more accurate than a nomogram.
Affiliation(s)
- Ruikai Li
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Chi Zhang
  - Department of Industrial Engineering, School of Mechantronics, Northwestern Polytechnical University, Xi'an, China
- Kunli Du
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Hanjun Dan
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Ruxin Ding
  - Department of Cell Biology and Genetics, Medical College of Yan'an University, Yan'an, China
- Zhiqiang Cai
  - Department of Industrial Engineering, School of Mechantronics, Northwestern Polytechnical University, Xi'an, China
- Lili Duan
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Zhenyu Xie
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Gaozan Zheng
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Hongze Wu
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Guangming Ren
  - Graduate Work Department, Xi'an Medical University, Xi'an, China
- Xinyu Dou
  - Graduate Work Department, Xi'an Medical University, Xi'an, China
- Fan Feng
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
- Jianyong Zheng
  - Department of Gastrointestinal Surgery, Xijing Hospital, Fourth Military Medical University, Xi'an, China
  - *Correspondence: Jianyong Zheng
6
Dhiman P, Ma J, Andaur Navarro CL, Speich B, Bullock G, Damen JAA, Hooft L, Kirtley S, Riley RD, Van Calster B, Moons KGM, Collins GS. Methodological conduct of prognostic prediction models developed using machine learning in oncology: a systematic review. BMC Med Res Methodol 2022; 22:101. [PMID: 35395724 PMCID: PMC8991704 DOI: 10.1186/s12874-022-01577-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 11/21/2021] [Accepted: 03/18/2022] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Describe and evaluate the methodological conduct of prognostic prediction models developed using machine learning methods in oncology. METHODS We conducted a systematic review in MEDLINE and Embase between 01/01/2019 and 05/09/2019, for studies developing a prognostic prediction model using machine learning methods in oncology. We used the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement, Prediction model Risk Of Bias ASsessment Tool (PROBAST) and CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) to assess the methodological conduct of included publications. Results were summarised by modelling type: regression-, non-regression-based and ensemble machine learning models. RESULTS Sixty-two publications met inclusion criteria developing 152 models across all publications. Forty-two models were regression-based, 71 were non-regression-based and 39 were ensemble models. A median of 647 individuals (IQR: 203 to 4059) and 195 events (IQR: 38 to 1269) were used for model development, and 553 individuals (IQR: 69 to 3069) and 50 events (IQR: 17.5 to 326.5) for model validation. A higher number of events per predictor was used for developing regression-based models (median: 8, IQR: 7.1 to 23.5), compared to alternative machine learning (median: 3.4, IQR: 1.1 to 19.1) and ensemble models (median: 1.7, IQR: 1.1 to 6). Sample size was rarely justified (n = 5/62; 8%). Some or all continuous predictors were categorised before modelling in 24 studies (39%). 46% (n = 24/62) of models reporting predictor selection before modelling used univariable analyses, a common method across all modelling types. Ten out of 24 models for time-to-event outcomes accounted for censoring (42%). A split sample approach was the most popular method for internal validation (n = 25/62, 40%). Calibration was reported in 11 studies. Less than half of models were reported or made available. CONCLUSIONS The methodological conduct of machine learning based clinical prediction models is poor. Guidance is urgently needed, with increased awareness and education of minimum prediction modelling standards. Particular focus is needed on sample size estimation, development and validation analysis methods, and ensuring the model is available for independent validation, to improve quality of machine learning based clinical prediction models.
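The events-per-predictor figures quoted in the abstract above are a simple ratio: the number of outcome events in the development data divided by the number of candidate predictors considered. A minimal sketch of that arithmetic follows. The function name `events_per_predictor` is hypothetical, and the predictor count of 25 is an illustrative assumption, since the abstract reports only the median number of events (195), not a matching predictor count.

```python
def events_per_predictor(n_events, n_candidate_predictors):
    """Ratio of outcome events to candidate predictors in the development data."""
    if n_candidate_predictors <= 0:
        raise ValueError("number of candidate predictors must be positive")
    return n_events / n_candidate_predictors


# 195 is the median number of development events from the abstract;
# 25 candidate predictors is a hypothetical value chosen for illustration.
epp = events_per_predictor(195, 25)
print(f"events per predictor = {epp:.1f}")
```

A low ratio means each candidate predictor is supported by few events, which is one reason the review flags sample size estimation as needing attention.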
Affiliation(s)
- Paula Dhiman
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
- Jie Ma
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Constanza L Andaur Navarro
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Benjamin Speich
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
  - Basel Institute for Clinical Epidemiology and Biostatistics, Department of Clinical Research, University Hospital Basel, University of Basel, Basel, Switzerland
- Garrett Bullock
  - Nuffield Department of Orthopaedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Oxford, UK
- Johanna A A Damen
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Lotty Hooft
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Shona Kirtley
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
- Richard D Riley
  - Centre for Prognosis Research, School of Medicine, Keele University, Staffordshire, ST5 5BG, UK
- Ben Van Calster
  - Department of Development and Regeneration, KU Leuven, Leuven, Belgium
  - Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, the Netherlands
  - EPI-centre, KU Leuven, Leuven, Belgium
- Karel G M Moons
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Cochrane Netherlands, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gary S Collins
  - Centre for Statistics in Medicine, Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, OX3 7LD, UK
  - NIHR Oxford Biomedical Research Centre, Oxford University Hospitals NHS Foundation Trust, Oxford, UK
7
Kaur I, Doja M, Ahmad T. Data Mining and Machine Learning in Cancer Survival Research: An Overview and Future Recommendations. J Biomed Inform 2022; 128:104026. [DOI: 10.1016/j.jbi.2022.104026] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/13/2021] [Revised: 02/07/2022] [Accepted: 02/09/2022] [Indexed: 12/29/2022]
8
Tang M, Gao L, He B, Yang Y. Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort. Cancer Manag Res 2022; 14:25-35. [PMID: 35018119 PMCID: PMC8742582 DOI: 10.2147/cmar.s340739] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/23/2021] [Accepted: 12/01/2021] [Indexed: 12/16/2022] Open
Abstract
Purpose The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC), which can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. The possibility of improving prediction accuracy using nonlinear methods compared to linear methods was investigated. Patients and Methods A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. The model performance was evaluated using the area under the receiver operating characteristic curve (AUC) values. Results The XGBoost approach showed the highest AUC values of 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with the AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance for the CSS and R&M models were different, and the higher model accuracy was associated with more prognostic predictors. Conclusion Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that the model performance can be improved when more prognostic predictors are considered.
Affiliation(s)
- Mo Tang
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Lihao Gao
  - Smart City Business Unit, Baidu Inc., Beijing, People's Republic of China
- Bin He
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
- Yufei Yang
  - Oncology Department, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing, People's Republic of China
9
Gensheimer MF, Aggarwal S, Benson KRK, Carter JN, Henry AS, Wood DJ, Soltys SG, Hancock S, Pollom E, Shah NH, Chang DT. Automated model versus treating physician for predicting survival time of patients with metastatic cancer. J Am Med Inform Assoc 2021; 28:1108-1116. [PMID: 33313792 DOI: 10.1093/jamia/ocaa290] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/07/2020] [Accepted: 11/09/2020] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE Being able to predict a patient's life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians' performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model. MATERIALS AND METHODS A machine learning model was trained using 14 600 metastatic cancer patients' data to predict each patient's distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015-2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots. RESULTS The radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. Physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63-0.81), 0.77 (0.73-0.81), and 0.68 (0.65-0.71), respectively. CONCLUSIONS The machine learning model's predictions were more accurate than those of the treating physician or a traditional model.
Affiliation(s)
- Sonya Aggarwal
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Kathryn R K Benson
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Justin N Carter
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- A Solomon Henry
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Douglas J Wood
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Scott G Soltys
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Steven Hancock
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Erqi Pollom
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
- Nigam H Shah
  - Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
- Daniel T Chang
  - Department of Radiation Oncology, Stanford University, Stanford, CA, USA
10
Yakar M, Etiz D. Artificial intelligence in rectal cancer. Artif Intell Gastroenterol 2021; 2:10-26. [DOI: 10.35712/aig.v2.i2.10] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/23/2021] [Revised: 03/03/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023] Open
11
Wang M, Jing X, Cao W, Zeng Y, Wu C, Zeng W, Chen W, Hu X, Zhou Y, Cai X. A non-lab nomogram of survival prediction in home hospice care patients with gastrointestinal cancer. BMC Palliat Care 2020; 19:185. [PMID: 33287827 PMCID: PMC7722330 DOI: 10.1186/s12904-020-00690-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/23/2020] [Accepted: 11/24/2020] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Patients suffering from gastrointestinal cancer comprise a large group receiving home hospice care in China; however, little is known about the prediction of their survival time. This study aimed to develop a gastrointestinal cancer-specific non-lab nomogram predicting survival time in home-based hospice. METHODS We retrospectively studied patients with gastrointestinal cancer from a home-based hospice between 2008 and 2018. General baseline characteristics, disease-related characteristics, and related assessment scale scores were collected from the case records. The data were randomly split into a training set (75%) for developing a predictive nomogram and a testing set (25%) for validation. A non-lab nomogram predicting the 30-day and 60-day survival probability was created using the least absolute shrinkage and selection operator (LASSO) Cox regression. We evaluated the performance of our predictive model using the area under the receiver operating characteristic curve (AUC) and calibration curves. RESULTS A total of 1618 patients were included and divided into two sets: 1214 patients (110 censored) as the training dataset and 404 patients (33 censored) as the testing dataset. The median survival time for all included patients was 35 days (IQR, 17-66). Of the 28 initial variables, the 5 most significant prognostic variables were selected to construct the nomogram: Karnofsky Performance Status (KPS), abdominal distention, edema, quality of life (QOL), and duration of pain. In training dataset validation, the AUCs at 30 days and 60 days were 0.723 (95% CI, 0.694-0.753) and 0.733 (95% CI, 0.702-0.763), respectively. Similarly, the AUC was 0.724 (0.673-0.774) at 30 days and 0.725 (0.672-0.778) at 60 days in the testing dataset validation. Further, the calibration curves revealed good agreement between the nomogram predictions and actual observations in both the training and testing datasets.
CONCLUSION This non-lab nomogram may be a useful clinical tool. It needs prospective multicenter validation as well as testing with Chinese clinicians in charge of hospice patients with gastrointestinal cancer to assess acceptability and usability.
Affiliation(s)
- Muqing Wang
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Xubin Jing
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Weihua Cao
- Department of Hospice, The First Affiliated Hospital of Shantou University Medical College, Shantou, Guangdong, 515041, People's Republic of China
| | - Yicheng Zeng
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Chaofen Wu
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Weilong Zeng
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Wenxia Chen
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Xi Hu
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Yanna Zhou
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China
| | - Xianbin Cai
- Department of Gastroenterology, The First Affiliated Hospital of Shantou University Medical College, 57 Changping Road, Shantou, Guangdong, 515041, People's Republic of China.
|
12
|
Zhao B, Gabriel RA, Vaida F, Eisenstein S, Schnickel GT, Sicklick JK, Clary BM. Using machine learning to construct nomograms for patients with metastatic colon cancer. Colorectal Dis 2020; 22:914-922. [PMID: 31991031 PMCID: PMC8722819 DOI: 10.1111/codi.14991] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 10/15/2019] [Accepted: 01/21/2020] [Indexed: 02/06/2023]
Abstract
AIM Patients with synchronous colon cancer metastases have highly variable overall survival (OS), making accurate predictive models challenging to build. We aim to use machine learning to more accurately predict OS in these patients and to present this predictive model in the form of nomograms for patients and clinicians. METHODS Using the National Cancer Database (2010-2014), we identified right colon (RC) and left colon (LC) cancer patients with synchronous metastases. Each primary site was split into training and testing datasets. Nomograms predicting 3-year OS were created for each site using Cox proportional hazard regression with lasso regression. Each model was evaluated by both calibration (comparison of predicted vs observed OS) and validation (degree of concordance as measured by the c-index) methodologies. RESULTS A total of 11 018 RC and 8346 LC patients were used to construct and validate the nomograms. After stratifying each model into five risk groups, the predicted OS was within the 95% CI of the observed OS in four out of five risk groups for both the RC and LC models. Externally validated c-indexes at 3 years for the RC and LC models were 0.794 and 0.761, respectively. CONCLUSIONS Utilization of machine learning can result in more accurate predictive models for patients with metastatic colon cancer. Nomograms built from these models can assist clinicians and patients in the shared decision-making process of their cancer care.
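Note: the abstract reports model discrimination as a c-index but does not say how it was computed. As a rough illustration only, the following is a minimal sketch of Harrell's concordance index for right-censored data. The function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied observed times are simply skipped, which is a simplifying assumption rather than the authors' method.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's c-index: share of comparable pairs ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped (simplifying assumption of this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported values of 0.794 and 0.761.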
Affiliation(s)
- Beiqun Zhao
- Department of Surgery, University of California San Diego
| | | | - Florin Vaida
- Department of Family Medicine and Public Health, University of California San Diego
| | | | | | | | - Bryan M. Clary
- Department of Surgery, University of California San Diego
|