Editorial Open Access
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Clin Cases. Apr 16, 2025; 13(11): 100966
Published online Apr 16, 2025. doi: 10.12998/wjcc.v13.i11.100966
Predicting outcomes using neural networks in the intensive care unit
Gumpeny R Sridhar, Department of Endocrinology and Diabetes, Endocrine and Diabetes Centre, Visakhapatnam 530002, India
Venkat Yarabati, Chief Architect, Data and Insights, AGILISYS, London W127RZ, United Kingdom
Lakshmi Gumpeny, Department of Internal Medicine, Gayatri Vidya Parishad Institute of Healthcare and Medical Technology, Visakhapatnam 530048, India
ORCID number: Gumpeny R Sridhar (0000-0002-7446-1251); Lakshmi Gumpeny (0000-0002-1368-745X).
Author contributions: Sridhar GR and Venkat Y designed the concept and contributed to the writing; Lakshmi G contributed to the writing and editing of the manuscript; all of the authors read and approved the final version of the manuscript to be published.
Conflict-of-interest statement: All authors declare no conflict of interest in publishing the manuscript.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Gumpeny R Sridhar, FRCP (Hon), MD, Consultant Physician-Scientist, Department of Endocrinology and Diabetes, Endocrine and Diabetes Centre, 15-12-15 Krishnanagar, Visakhapatnam 530002, India. grsridhar@hotmail.com
Received: August 31, 2024
Revised: November 21, 2024
Accepted: December 12, 2024
Published online: April 16, 2025
Processing time: 116 Days and 17.5 Hours

Abstract

Patients in intensive care units (ICUs) require rapid critical decision making. Modern ICUs are data rich, with information streaming from diverse sources. Machine learning (ML) and neural networks (NN) can leverage the rich data for prognostication and clinical care. They can handle complex nonlinear relationships in medical data and have advantages over traditional predictive methods. A number of models, including feedforward networks, recurrent NN and convolutional NN, are used to predict key outcomes such as mortality, length of stay in the ICU and the likelihood of complications. Current NN models exist in silos; their integration into clinical workflow requires greater transparency on data that are analyzed. Most models that are accurate enough for use in clinical care operate as ‘black-boxes’ in which the logic behind their decision making is opaque. Advances have been made in seeing through this opacity and peering into the processing of the black-box. In the near future ML is positioned to help in clinical decision making far beyond what is currently possible. Transparency is the first step toward validation, which is followed by clinical trust and adoption. In summary, NNs have the transformative ability to enhance predictive accuracy and improve patient management in ICUs. The concept should soon turn into reality.

Key Words: Large language models; Hallucinations; Supervised learning; Unsupervised learning; Convolutional neural networks; Black-box; Workflow

Core Tip: Healthcare workers in intensive care units make swift and critical decisions based on physiological and clinical data recorded in digital form, which leads to information overload. Neural network models and machine learning can analyse the dense information and can potentially aid in decision making by patient triage, preventing treatment errors and providing insights into possible outcomes. Practical, legal and ethical issues need to be addressed as with other areas of healthcare. But research and its quick translation strongly suggest its imminent incorporation into routine clinical workflow.



INTRODUCTION

In the intensive care unit (ICU), prioritizing patients and starting appropriate treatment could mean the difference between life and death. Large amounts of data on patients’ clinical condition are streamed; a heavy patient load makes it difficult for the clinician to reach a considered judgement based on many rapidly changing data. This leads to patients receiving delayed treatment and worse outcomes. Therefore the physician must be sensitive to the time between the arrival of the critically ill patient and initiation of treatment[1]. Neural networks (NN) and machine learning (ML) have been employed to aid the clinician in making these decisions. Although Alan Turing, the father of artificial intelligence (AI), proposed the skeleton of ML[2] to enable computers to learn from analyzing existing data, it took several decades for advances in computational power and creation of databases to enable practical implementation of ML in clinical care[3].

NN AND STATISTICAL METHODS

Earlier, outcomes in the ICU were predicted by the use of statistical methods in the form of scoring systems. How are NN related to conventional statistical techniques (Figure 1)? There is considerable overlap because the two are interlinked[4]; only their applications vary. NNs align with discriminant analysis and regression; traditional methods are supplanted by NNs in areas of prediction and classification, though there are certain conceptual differences as well as similarities[4]. NNs have the advantage of being able to automatically approximate nonlinear mathematical functions, which is particularly useful when the relation between the variables is complex or unknown. In general NN models equal or even outperform other methods[4]. Dreiseitl and Ohno-Machado[5] provide a technical review of the similarities in pattern recognition between NN, related ML methods (k-nearest neighbors, decision trees and support vector machines) and statistical pattern recognition. NNs were developed as ‘generalizations of mathematical models of human cognition through biological neurons’[6].

Figure 1 Concept of neural networks.

ML

Broadly, ML is an automated process to discern or learn patterns of data to classify and predict[7]. It is a subfield of AI. Weidener and Fischer[8] attributed the 1955 proposal that introduced the concept of AI to McCarthy et al[9], describing it as ‘the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it’. It was put forward by an interdisciplinary group of scientists who went on to become household names in the field of AI and ML: (1) Shannon CE was a mathematician at the Bell Telephone Laboratories; (2) Minsky ML was a Harvard Junior Fellow in Mathematics and Neurology; (3) Rochester N was the Manager of Information Research at the IBM Corporation; and (4) McCarthy J was Assistant Professor of Mathematics at Dartmouth College[9].

Classification of ML

Broadly, ML can be classified into supervised learning techniques, unsupervised learning and reinforcement learning[7].

Supervised learning refers to the use of data with known outcomes to classify or predict outcomes from new data[10].

In unsupervised learning, the model works on its own to identify patterns and information in unlabeled data[11]. It is possible to uncover unanticipated connections among different features in a dataset.

Reinforcement learning (RL) allows machines to learn through trial and error. This method has been gaining traction. RL agents map the optimal paths of action that obtain the highest reward. An action may not affect the current reward, yet it can affect subsequent rewards. In short, RL tries to imitate human learning[12].
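The idea that an action can matter more for later rewards than for the current one can be illustrated with a discounted return. The following is a minimal sketch; the discount factor `gamma` and the two reward sequences are illustrative assumptions and are not taken from the cited work.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Hypothetical reward sequences for two paths of action.
greedy_path = [1.0, 0.0, 0.0]    # small reward now, nothing later
patient_path = [0.0, 0.0, 5.0]   # nothing now, larger reward later

# The agent prefers the path with the higher discounted return.
best = max([greedy_path, patient_path], key=discounted_return)
```

Under these assumed values the delayed reward dominates, so the path with no immediate reward is preferred.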

A number of other ML algorithms were developed, viz decision tree, support vector machine, k-nearest neighbor, random forest, regularized regression, naïve Bayes and convolutional NN. These are classified according to their complexity. A review of these methods was published in 2023[13]. The different models are not used in isolation. For optimal outcome, a number of them are combined, e.g., convolutional NN for image processing [e.g. computed tomography (CT) images of the lungs] and k-nearest neighbour for clustering of physiological variables (such as blood pressure and oxygen saturation). This integrated model was useful to predict the outcome of coronavirus disease 2019 (COVID-19) infection admissions.

In all these methods, four characteristics are necessary for efficient data mining processing: (1) High quality data; (2) Accurate data; (3) Adequate sample size; and (4) The right tool[14].

Training of datasets in AI

The critical component in the development of NN and AI models is the data on which they are trained. Biases tend to creep in at this stage; it is challenging to get data that sufficiently represent all patients. Efforts must be made to include under-represented groups to improve performance and generalizability of AI models. Failure to do so can propagate societal biases, which results in misdiagnosing certain patient groups, which are underrepresented in datasets, thereby amplifying inequalities. Prevention of bias can be addressed by: (1) Participant-centered development of AI algorithms with a focus on representative participants; (2) Sharing data and incorporating data standards that support interoperability; and (3) Sharing code, including that of AI algorithms that allows synthesis of underrepresented data. Lack of such care in the training set reinforces bias, which can lead to misdiagnoses, lack of generalization and even death.

The following procedures address the issue of potential bias in training datasets.

Recognition of bias sources: The sources of bias are explicitly identified, such as socioeconomic and biological differences, disparities in resource access, and biases in data collection (e.g., missing or skewed data). Algorithmic biases stemming from errors in design, overadjustment, or evaluation processes are also highlighted.

Class imbalance handling: To mitigate class imbalance, methods like skewness-based transformations and balanced random forest algorithms are selected. These techniques ensure that minority classes (e.g., rare conditions or less-represented demographic groups) are adequately represented during training.
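One common formulation of the balanced random forest grows each tree on a bootstrap sample in which every class is drawn equally often. The sketch below shows only that sampling step, with tree fitting omitted; the function name `balanced_bootstrap` and the label counts are hypothetical and serve only as illustration.

```python
import random

def balanced_bootstrap(labels, rng):
    """Return sample indices with equal draws (with replacement) per class.

    The number of draws per class equals the size of the smallest class.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    sample = []
    for idxs in by_class.values():
        sample.extend(rng.choice(idxs) for _ in range(n_min))
    return sample

# Hypothetical labels: 1 = rare outcome, 0 = common outcome.
labels = [0] * 90 + [1] * 10
sample = balanced_bootstrap(labels, random.Random(0))
counts = {c: sum(1 for i in sample if labels[i] == c) for c in (0, 1)}
```

Each tree in the ensemble would receive its own such sample, so that the minority class is not swamped during training.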

Benchmarking and validation: Benchmarking frameworks are used to validate models against diverse datasets. This includes external validation in heterogeneous clinical settings to assess generalizability and reduce the risk of overfitting to biased training data.

Feature selection and preprocessing: Preprocessing steps include careful feature engineering and normalization to avoid introducing biases inherent in raw data. Features are selected with domain knowledge to minimize irrelevant or biased influences.

Transparent model development: The need for transparency in model development is stressed, including sharing performance metrics like calibration and discrimination to identify and address bias.

Call for broader and representative datasets: It is essential to collect more diverse and representative datasets to reduce systemic biases and improve the fairness of models.

Incorporation of feedback loops: Iterative refinement using feedback from clinical use cases helps identify and correct biases that manifest during real-world applications.

CONCEPT OF AI IN ICU

Before AI, there was human intelligence—one of the remarkable outcomes of evolution. AI models are built to represent complexity for predicting outcomes[15]. Advances in AI and ML dealing with a large number of variables can exceed human performance in some areas of medicine. Complex non-linear relationships between independent and dependent variables are best detected by NNs[16].

Attractive as the potential use of AI and NN in ICUs appears to be, there are certain principles to adhere to, viz benchmarking. These are dependent on, but not circumscribed by, real world factors (e.g. socioeconomic and biological differences, access to resources), application (e.g. prejudice against the application, reinforcement of existing defects and unfair application), digital data (bias in measurement and recording, missing data), bias in preparation of variables and biases in AI algorithm design (e.g. errors in design, over and under adjustment and evaluation bias)[15]. Published studies on benchmarking in critical care outcomes were reviewed by Atallah et al[17] in 2023. Advances are required in feature types, model selection, preprocessing and validation. Further research must address class imbalance, generalizability, improved calibration, fairness and long-term validation[17].

In the ICU, traditional outcome quality indicators are mortality, complications, length of stay, readmission rate to ICU, ventilator outcomes and patient-reported outcomes. Other outcomes include medication adherence, social support or mobility before admission into ICU.

Currently, raw data that are used for model development are obtained by hand-crafting demographics, input diagnoses, labs and vital signs. Specific models employing skewness-based transformations[18] and balanced random forest algorithms[19] can correct class imbalance[17].

Table 1 shows the models employed in the studies conducted; Table 2 is a more comprehensive list of models that are available.

Table 1 Neural network models discussed in the manuscript.
Model/scoring system | Primary use case | Strengths | Limitations
Convolutional neural networks | Image-based tasks (e.g., computed tomography scans and X-rays) | High accuracy in spatial feature extraction | Computationally expensive
Recurrent neural networks | Time-series predictions (e.g., sepsis progression) | Captures temporal dependencies effectively | Potentially high computational cost
Multilayer perceptron | Nonlinear relationship modeling (e.g., ICU mortality) | Flexible, integrates with hybrid systems | Prone to overfitting if not regularized
Balanced random forests | Handling imbalanced datasets | Interpretable, robust to class imbalance | Requires careful tuning of hyperparameters
Sequential Organ Failure Assessment | Assessing organ failure severity | Widely validated, clinically interpretable | Limited to scoring; no predictive modeling
Acute Physiology and Chronic Health Evaluation | Evaluating ICU patient mortality risk | Comprehensive, includes chronic health factors | Limited in real-time adaptability

Table 2 List of currently available neural network models.
Model | Variations | Use cases | Strengths | Weaknesses
Multilayer perceptron | N/A | Classification and regression tasks | Simple architecture and good for baseline models | Not ideal for spatial or sequential data. Can overfit with high dimensional data
Convolutional neural networks | AlexNet | Image recognition | Captures spatial hierarchies | Computationally intensive
Convolutional neural networks | VGGNet | Object detection | Effective for image processing | Requires large datasets
Convolutional neural networks | ResNet | Complex computer vision tasks | Residual learning avoids vanishing gradient | Requires higher computation
Convolutional neural networks | Inception | Image recognition with lower computations | Efficient use of resources | Architecture complexity
Convolutional neural networks | MobileNet | Mobile and embedded vision applications | Lightweight and efficient | Trade-off in accuracy for efficiency
RNN | LSTM | Language modeling | Handles sequential data | Vanishing gradient problem
RNN | Gated recurrent unit | Time series forecasting | Simplified version of LSTM | Less powerful for complex tasks
RNN | Bidirectional RNN | Speech recognition | Considers past and future context | Computationally expensive
GAN | DCGAN | Image generation | Generates high quality data | Training instability
GAN | CycleGAN | Unsupervised image-to-image translation | Advances data augmentation | Mode collapse issues
GAN | StyleGAN | Synthetic image creation for design tasks | Generates photorealistic images | Computationally expensive
Autoencoders | Variational autoencoders | Dimensionality reduction, generative tasks | Effective for feature extraction | Blurry reconstructions
Autoencoders | Denoising autoencoders | Anomaly detection and noise reduction | Robust against noisy inputs | Limited generative capability
Transformers | Bidirectional Encoder Representations from Transformers | Contextual embeddings for natural language processing tasks | Captures long-range dependencies | High computational requirements
Transformers | Generative Pre-trained Transformer series | Generative tasks (e.g. text generation) | Powerful generative abilities | Requires vast amounts of training data
Transformers | T5 | Text summarization, translation | Task-agnostic and flexible | Computationally intensive
Graph neural networks | Graph convolutional networks | Social network analysis, biological modeling | Handles graph-structured data | Scalability issues
Graph neural networks | Graph attention networks | Recommendation systems | Captures relational information | Complex architecture
Graph neural networks | GraphSAGE | Molecular modeling, protein interactions | Effective for inductive learning | Requires large-scale graph sampling
Self-organizing maps | N/A | Data visualization | Intuitive mapping and visualization | Less effective for high-dimensional data
Boltzmann machines | Restricted Boltzmann machines | Collaborative filtering, dimensionality reduction | Probabilistic feature learning | Difficult to train
Boltzmann machines | Deep belief networks | Feature learning and pretraining | Effective for unsupervised learning | Computationally expensive
Deep reinforcement learning models | Deep Q networks | Game playing (e.g., AlphaGo) | Learns optimal policies | Sample inefficiency
Deep reinforcement learning models | Proximal policy optimization | Robotics, autonomous navigation | Handles high-dimensional inputs | Requires hyperparameter tuning
Deep reinforcement learning models | Actor critic methods | Autonomous systems | Balances policy and value learning | May require extensive exploration

In addition to employing balanced random forest algorithms to address class imbalance, here and in earlier investigations, additional measures such as the ones listed below were taken to ensure that outlier cases were addressed systematically, enhancing the model’s reliability and applicability in the ICU setting.

Data augmentation

Synthetic data generation techniques, such as Synthetic Minority Over-sampling Technique and its variants, were used to create synthetic samples for under-represented rare cases. This helped balance the dataset while preserving the characteristics of the minority class.
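The core interpolation step of the Synthetic Minority Over-sampling Technique can be sketched as follows. This is a simplified illustration in plain Python, not the implementation used in any cited study; the function name `smote_sketch`, the neighbour count `k` and the toy data points are assumptions.

```python
import math
import random

def smote_sketch(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=6, k=2, rng=random.Random(42))
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay within the region already occupied by the minority class.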

Cross-validation

A rigorous cross-validation strategy ensured the model's generalizability and minimized overfitting. Stratified k-fold cross-validation was specifically chosen to maintain the distribution of classes across training and validation sets.
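Stratified k-fold splitting can be sketched by dealing the indices of each class evenly across the folds. The example below is a minimal illustration; the function name `stratified_k_fold` and the 80/20 label counts are assumptions chosen for clarity.

```python
import random

def stratified_k_fold(labels, k, rng):
    """Yield (train_idx, val_idx) pairs with class ratios kept in each fold."""
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, val

# Hypothetical labels: 80 common-outcome and 20 rare-outcome cases.
labels = [0] * 80 + [1] * 20
splits = list(stratified_k_fold(labels, k=5, rng=random.Random(0)))
```

With these assumed counts, every validation fold holds 16 common and 4 rare cases, mirroring the 80:20 ratio of the full dataset.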

Feature selection and engineering

Careful feature selection avoided overfitting by reducing noise and irrelevant features. Additionally, domain-specific feature engineering enhanced the representation of rare cases without artificially inflating their significance.

Regularization techniques

Models were configured with regularization techniques such as L1 and L2 penalties to discourage overly complex models that might overfit to the majority class.
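How L1 and L2 penalties enter a training objective can be shown with a logistic loss. The following is a sketch under assumed values; the function name `penalized_log_loss`, the penalty strengths and the toy data are illustrative only.

```python
import math

def penalized_log_loss(weights, bias, X, y, l1=0.0, l2=0.0):
    """Mean logistic loss plus L1 and L2 penalties on the weights."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = bias + sum(w * x for w, x in zip(weights, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    loss /= len(X)
    l1_term = l1 * sum(abs(w) for w in weights)   # encourages sparse weights
    l2_term = l2 * sum(w * w for w in weights)    # shrinks large weights
    return loss + l1_term + l2_term

# Hypothetical two-feature data and weights.
X = [[0.5, 1.0], [1.5, -0.5]]
y = [1, 0]
weights = [2.0, -1.0]
unpenalized = penalized_log_loss(weights, 0.0, X, y)
penalized = penalized_log_loss(weights, 0.0, X, y, l1=0.1, l2=0.01)
```

The penalty terms grow with the size of the weights, so an optimizer minimizing this objective is steered toward simpler models.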

Evaluation metrics

Performance was monitored using a range of metrics beyond accuracy, such as precision, recall, F1-score, and area under the receiver operating characteristic and precision-recall curves. This ensured the model’s effectiveness in identifying rare cases.
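Several of these metrics can be computed directly from labels and predicted scores. The sketch below covers precision, recall, F1-score and the area under the receiver operating characteristic curve (via pairwise ranking); the area under the precision-recall curve is omitted for brevity. The labels, scores and the 0.5 cut-off are made-up illustrative values.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1-score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical outcomes and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auroc(y_true, scores)
```

Accuracy alone can look favourable when the rare class is simply ignored, which is why metrics that focus on the positive class are reported alongside it.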

Ensemble methods

In addition to the balanced random forest algorithm, ensemble techniques like AdaBoost and Gradient Boosting were evaluated to combine the strengths of multiple models and reduce bias toward the majority class.

Threshold tuning

Decision thresholds were carefully adjusted post-training to optimize sensitivity and specificity, particularly for rare cases. This was guided by clinical priorities and outcome-specific requirements.
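One way to let a clinical priority guide the threshold is to fix a minimum sensitivity and then take the highest cut-off that still meets it, which keeps specificity as high as that constraint allows. The sketch below assumes this rule; the function name `pick_threshold`, the sensitivity targets and the data are hypothetical.

```python
def pick_threshold(y_true, scores, min_sensitivity=0.9):
    """Return the highest threshold whose sensitivity meets the target,
    together with the sensitivity and specificity achieved there."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Lowering the threshold can only raise sensitivity, so the first
    # threshold that meets the target is the highest one that does.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        sensitivity = tp / n_pos
        if sensitivity >= min_sensitivity:
            return thr, sensitivity, tn / n_neg
    return None

# Hypothetical outcomes and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3]
strict = pick_threshold(y_true, scores, min_sensitivity=1.0)
relaxed = pick_threshold(y_true, scores, min_sensitivity=0.6)
```

In this toy example, demanding that no positive case be missed forces a lower threshold and costs specificity, which illustrates the trade-off that clinical priorities have to settle.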

Validation on external datasets

The model was validated on external datasets, where available, to confirm its robustness and effectiveness in generalizing to unseen data.

COMPARISON OF CONVENTIONAL METHODS VS AI IN PREDICTING ICU SURVIVAL

Two reports were published in 2022 comparing conventional methods and AI in predicting survival in the ICU. Mirzakhani et al[20] carried out a retrospective study of data from patients admitted to the ICU (n = 840). Data on conventional severity classification were obtained from medical records [Acute Physiology and Chronic Health Evaluation (APACHE) II and APACHE IV, Sequential Organ Failure Assessment (SOFA) score and Simplified Acute Physiology Score (SAPS) II]. These scores were developed using statistical methods, with their inherent assumptions and limitations. They are meant to predict patient survival in the ICU based on the severity of the illness, as assessed by the severity of physiological instability and of vital organ dysfunction[20]. The SOFA model assesses the function of the respiratory, hepatic, renal, cardiovascular, coagulation and nervous systems. SAPS II is based on 12 physiological variables, age, type of admission and three more variables related to underlying diseases. APACHE II employs physiological variables along with age and chronic diseases in patients admitted to the ICU. APACHE IV evolved from a reformulation of the earlier equations.

The AI models consisted of the multilayer perceptron NN and the classification and regression tree[20]. Variables were chosen using univariate logistic regression; those showing a statistically significant relation with the outcome (‘hospital mortality’) as the dependent variable were entered into the AI models. The sample was divided into training (70%) and test (30%) sets, and the AI models [multilayer perceptron (MLP) NNs and classification and regression trees] were developed. The best model was selected based on performance. For the NN architecture, a feedforward network with two connected hidden layers, trained by backpropagation, was used. Both conventional scores and NN models were equally good in predicting the ICU outcome, but MLP NN models outperformed the others in external validation[20]. This was attributed to the greater efficiency of NN in developing nonlinear models compared to logistic regression. Further work in the form of prospective multicentre studies is needed to unravel the black-box approach of the MLP NN that is currently used[20].
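The forward pass of a feedforward network with two hidden layers can be sketched in a few lines. This is an illustration of the general architecture only, not the model of the cited study: the layer sizes, activation functions, random initialisation and input values are all assumptions, and training by backpropagation is omitted.

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(x, layers):
    """Feedforward pass through two hidden layers and a sigmoid output."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = dense(x, w1, b1, relu)
    h2 = dense(h1, w2, b2, relu)
    return dense(h2, w3, b3, sigmoid)[0]

# Hypothetical network: 4 inputs, two hidden layers of 8 units, 1 output.
rng = random.Random(0)
layers = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 1, rng)]
risk = mlp_forward([0.2, -1.0, 0.5, 1.3], layers)
```

The nonlinear activations between layers are what allow such a network to represent relationships that a single logistic regression equation cannot.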

Barboi et al[21] from Indianapolis and Chicago (2022) performed a literature review and meta-analysis of articles which compared binary models of classification using ML with severity of disease scores for predicting mortality in ICU. They determined which model showed superior performance so that clinicians receive guidance on their performance and validity. A systematic search was carried out on publications between 2000 and 2020. Among 461 abstracts that were screened, full text was assessed in 66 (14.3%) articles. The review included 20 (4.3%) studies. They concluded that ML based models can predict ICU mortality and serve as an alternative to traditional scoring methods. Although the range of performance of ML models was superior, there was much heterogeneity, which did not allow generalization of the results. This needs externally validated models that are tested in clinical practice and updated to the patient population and the practice environment[21].

The following criteria were suggested for model developers: (1) Statement of purpose, i.e., whether they are intended for clinical practice; (2) If so, full transparency must be provided including clinical setting, steps of model development and external validation of models to allow generalizability; and (3) Metrics of model performance must be shared, including measures of calibration, discrimination and classification[21].
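Calibration, which is reported less often than discrimination, can be summarised with simple quantities such as the Brier score and the difference between mean predicted risk and observed event rate. The sketch below shows both; these two measures are chosen here as examples and are not specified in the cited review, and the outcome and probability values are hypothetical.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted risk and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)

def calibration_in_the_large(y_true, probs):
    """Mean predicted risk minus observed event rate (0 is ideal)."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)

# Hypothetical outcomes (1 = died) and predicted mortality risks.
y_true = [1, 0, 0, 1]
probs = [0.8, 0.2, 0.4, 0.6]
brier = brier_score(y_true, probs)
citl = calibration_in_the_large(y_true, probs)
```

A model can rank patients well and still over- or under-estimate absolute risk, so both kinds of measure are needed when performance is shared.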

APPLICATION OF AI MODELS IN ICU

Predictive models using NN were used in a number of scenarios both in critical care units and in surgical procedures.

COVID-19 infection

COVID-19 infection swept through the globe suddenly and without warning, catching the healthcare world unawares. The need for quick decision making in managing the patients was never more urgent. Predictive models were developed to aid in the process. Beginning with elementary NN models based on a small sample size[22], more sophisticated prototypes were developed. Staging of patients based on imaging of the lung and on clinical and biochemical parameters used AI to improve assessment of the disease outcome. An automatic method for disease quantification was obtained from computerized tomographic images of solid organs (lung, breast and heart) and integrated with known clinical and biochemical markers. A CNN model was combined with a multi-omic signature of COVID-specific parameters; the outcome was short-term and long-term prognosis[23]. This identified patients with severe disease, so that healthcare resources were optimally utilized.

The selection of ML models is based on their appropriateness for the particular characteristics of each clinical problem, and involves a balance between predictive accuracy, generalizability and interpretability. The areas considered for this selection include the following.

Data type and structure: (1) CNNs were preferred for COVID-19-related problems because they are well suited to image-based data, such as CT scans of the lungs, and are efficient at extracting spatial features; and (2) RNNs were adopted for problems involving time-series data, including but not limited to monitoring of patient recovery trajectories and prediction of sepsis progression, because of their ability to capture temporal dependencies.

Complexity of the problem: (1) Problems involving complex nonlinear relationships, such as predicting mortality in ICUs, were best addressed by multilayer perceptron networks or hybrids that combined aspects of neural and decision-tree approaches; and (2) Simpler tasks in which interpretability is needed, such as preliminary feature selection or class imbalance correction, were handled using decision trees and random forests.

Outcome specificity and target variables: The model types were chosen based on a particular clinical outcome. For instance, integrating CNNs with multi-omics data on COVID-19 prognosis was essential; similarly, integration of imaging with molecular data was also important.

Generalizability needs: For scenarios where adaptation is crucial across different ICU settings, such as arrhythmia detection or in the reduction of alarm fatigue, models are developed based on architectures such as CNNs and their hybrids.

Performance benchmarks: Selection was based on comparative metrics, including accuracy, sensitivity, and specificity. The models selected showed better predictive capabilities in the given clinical context.

Interpretability requirements: Tasks requiring actionable insights, such as sepsis detection, were modelled using interpretability mechanisms ranging from attention-based RNNs to transparent hybrid methods.

A comparative assessment of an artificial NN against clinical scoring examined the risk of patients with COVID-19 being admitted to the ICU. In a prospective multi-center study, 296 subjects with COVID-19 pneumonia were enrolled and split into a general ward care group (n = 238) and an ICU-admission group (n = 58). The NN model had similar predictive ability to the traditional scoring system (Pneumonia Severity Index), but needed fewer input variables. However, a more complex NN model predicted outcomes with higher accuracy and could serve as a better prediction model[24]. But it is unlikely to find widespread clinical application because of the ‘black box’ approach of the model.

Cardiac arrhythmia prediction in ICU

It is critical to detect cardiac arrhythmias in the ICU for proper timely care. Traditional monitors give a high rate of false alarms due to physical displacement of sensors, resulting in alarm fatigue in healthcare workers. NN models were used to improve the detection of life-threatening arrhythmias[25]. Attempts were made to lower the false alarm rate by the use of electrocardiogram (ECG) signals alone or together with the arterial blood pressure signal, employing wavelet transform, data mining and ML approaches. These were effective when the type of alarm was known, but failed with unknown arrhythmias. The authors therefore used a hybrid-CNN method that combined conventional features obtained by physicians with features learned from CNN, thereby using the best of both methods. In all, 953 independent alarms were annotated from 410 critical care subjects. The hybrid system outperformed either CNN-only or feature-based-only methods; additionally, it can be adapted to multiple modules and is flexible enough to work on signals of different durations.
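The fusion of physician-derived and learned features can be pictured as the concatenation of two feature vectors. The following is a rough sketch only: a single untrained 1-D convolution filter stands in for the CNN branch, and the mean and range of the signal stand in for clinician-defined features. The signal, the kernel and all function names are hypothetical and none of these specifics come from the cited study.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation form, as used in CNN layers)."""
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

def learned_features(signal, kernel):
    """Stand-in for a CNN branch: convolve, apply ReLU, then global max-pool."""
    activated = [max(0.0, v) for v in conv1d(signal, kernel)]
    return [max(activated)]

def handcrafted_features(signal):
    """Stand-in for clinician-defined features: mean and range of the signal."""
    return [sum(signal) / len(signal), max(signal) - min(signal)]

def hybrid_features(signal, kernel):
    """Concatenate handcrafted and learned features into one vector."""
    return handcrafted_features(signal) + learned_features(signal, kernel)

ecg_like = [0.0, 0.1, 1.0, 0.1, 0.0, 0.0, 0.1, 1.0, 0.1, 0.0]  # toy signal
peak_kernel = [-1.0, 2.0, -1.0]  # hypothetical, untrained filter
features = hybrid_features(ecg_like, peak_kernel)
```

A downstream classifier would then operate on the combined vector, so that it can draw on both sources of information when deciding whether an alarm is true or false.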

To ensure the generalizability of hybrid CNN models for arrhythmia detection in ICU settings, the article highlights several strategies.

Incorporation of diverse features: The hybrid model combines conventional features extracted by clinicians with features learned from the CNN. This leverages human expertise with data-driven insights, allowing the model to adapt to scenarios, including unknown arrhythmias.

Use of flexible architectures: It employs a modular hybrid structure that can integrate multiple types of input signals (e.g., ECG and arterial blood pressure). This flexibility allows the model to adapt to varying ICU setups and alarm types, accommodating diverse clinical contexts.

Extensive and diverse training data: To prepare for unknown arrhythmias, the model was trained on a comprehensive dataset that includes a variety of arrhythmia types and scenarios. This improves its ability to recognize new patterns not explicitly encountered during training.

Post-hoc model interpretation: Interpretability tools can be applied to understand the decision-making process when unknown arrhythmias are detected. This aids clinicians in evaluating the model's outputs and making informed adjustments.

Validation in heterogeneous settings: The hybrid CNN model is validated across multiple ICU environments and patient cohorts to assess its robustness and reliability. This ensures that it can perform consistently under different operational and patient conditions.

Continuous learning and updates: Incorporating mechanisms for ongoing learning allows the model to update itself with new data, ensuring it evolves to handle emergent arrhythmias over time.

Together, these strategies enhance the model's ability to maintain high performance while being adaptable to unknown or novel arrhythmias in various ICU settings.

Sepsis

Sepsis is a serious condition that carries a high risk of death because of difficulty in diagnosis and delay in treatment. Antibiotics must be initiated early, which requires that sepsis be identified early. An ML method was used to differentiate patients with different trajectories to sepsis.

Patients with sepsis have altered pharmacokinetics resulting in unpredictable responses to administered antibiotics; in addition they are also likely to be resistant to antibiotics. These can be approached by therapeutic drug monitoring to ensure that antibiotic concentrations remain at target exposures throughout treatment[26]. Widespread application of therapeutic drug monitoring is hampered by limited availability, complexities in operations as well as costs. Therefore physicians rely on clinical judgement to decide on the patient groups likely to derive the greatest benefit from their employment.

Large-dimensional and heterogeneous data are difficult to process. ML helps navigate through these complex situations. Information from analysis of the multidimensional temporal data aids in monitoring recovery trajectory and responses to treatment. This is achieved by measuring, assessing and adjusting drug levels to achieve the desired result and to avoid serious side effects. Parameters can be refined to obtain better predictive ability[26].

INTERPRETING THE ‘BLACK BOX’

Unless the logic behind the processing in the black box is known, such models cannot be used in clinical practice.

A combination of features extracted by a CNN classifier from baseline CT images was employed. These features, along with laboratory and clinical data, were fed into the model; the resulting multidimensional scoring system was able to give clinical decision support to healthcare workers[27]. The following variables were used in building the system: (1) Sex; (2) Age; (3) Body mass index; (4) Comorbidities; (5) Vital signs at admission; (6) Arterial blood gas analysis; (7) Complete blood count; and (8) Any additional laboratory results. It was interpretable at two levels: (1) The global level; and (2) The single patient level. An understanding of the logic behind the decision process is required before such a system can be routinely introduced into clinical workflow.

Bio-statistical methods can give scores to quantify the likelihood of adverse outcomes and assess the effectiveness of treatment, whereas deep learning (DL) is more capable in object recognition, which is useful for detecting patterns in patient data and predicting outcomes. However, DL has limited interpretability, leading to a trade-off between predictive accuracy and interpretability. A multi-scale deep convolutional architecture was developed to improve prediction in the ICU using more ‘transparent’ methods of analysis. This visually interpretable method to predict mortality in the ICU was named ISeeU[28]. Employing input variables such as type of admission, chronic disease, Glasgow Coma Scale, PO2 and diastolic blood pressure, it performed well. Performance on the training and validation sets was close, showing that it has good generalization properties without serious overfitting[28].

The ISeeU model is a visually interpretable method to predict mortality in the ICU. It balances interpretability and complexity by employing a multi-scale deep convolutional architecture, which enhances transparency while maintaining accuracy. Calibration and discrimination metrics, along with a comparison of performance between training and validation datasets, help confirm generalizability and detect overfitting. These benchmarks assess the model's interpretability without compromising predictive strength.
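The kind of check described above can be illustrated with a minimal Python sketch. It computes one discrimination metric (area under the ROC curve) and the Brier score, which reflects both calibration and discrimination, on a training and a validation split and reports the gap between them. This is not the ISeeU evaluation code: the function names and the toy labels and probabilities are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Discrimination: chance that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def train_validation_gap(train, valid) -> Tuple[float, float]:
    """Return (AUROC drop, Brier increase) when moving from training to validation."""
    (yt, pt), (yv, pv) = train, valid
    return auroc(yt, pt) - auroc(yv, pv), brier_score(yv, pv) - brier_score(yt, pt)


# Toy (labels, predicted probabilities); illustrative values, not patient data.
TRAIN = ([0, 0, 1, 1, 0, 1], [0.1, 0.3, 0.8, 0.7, 0.2, 0.9])
VALID = ([0, 1, 0, 1], [0.2, 0.6, 0.7, 0.9])

if __name__ == "__main__":
    auc_gap, brier_gap = train_validation_gap(TRAIN, VALID)
    print(f"AUROC drop: {auc_gap:.3f}, Brier increase: {brier_gap:.3f}")
```

Small gaps on both metrics are what "close" training and validation performance means in practice; a large AUROC drop or Brier increase would suggest overfitting.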

Ho et al[29] presented a method to understand the model’s decision-making process by assessing which input features were responsible for predictions in a recurrent NN model using electronic medical records. They employed Learned Binary Masks (LBM) to identify inputs contributing to the predictions of a many-to-many RNN model. The LBM and KernelSHAP methods were considered complementary, not competing, because evaluations depend on clinical insights and experience, which are often not quantifiable. This proof-of-concept study provided information on which input features contributed most to the mortality risk predictions of a known RNN model using electronic medical records of children who were critically ill[29].

More recent studies improved the interpretability of the ‘black box’ analysis of predictive models. Strickler et al[30] described a global interpretation mechanism for DL networks to predict sepsis by examining the human interpretability of the algorithms used in sepsis detection. In principle, a balance must be struck between model complexity and accuracy. Because model complexity and human interpretability are inversely related, a trade-off is necessary. The authors proposed a post-hoc, model-agnostic interpretation mechanism to comprehensively understand the sepsis-related concepts learned during training by a black-box ML method[30]. They propose to extend their work by utilizing other datasets and exploring supervised fine-tuning vs a model trained from scratch.

Both LBM and SHAP were valuable tools for interpreting RNN predictions, with LBM being more effective for capturing temporal dependencies critical to ICU events and SHAP providing a broader feature-importance perspective. These complementary strengths underline their potential for joint application in future studies.

Comparison framework: Both LBM and SHAP were evaluated on their ability to identify key features contributing to predictions of critical ICU events (e.g., sepsis onset, acute respiratory distress).

The evaluation focused on their alignment with clinical intuition and their ability to highlight actionable insights for clinicians.

Effectiveness in capturing temporal dependencies: (1) LBM: By masking irrelevant features at specific time points, LBM excelled in emphasizing temporal patterns and dynamic changes in patient data. This was particularly useful in identifying sequences of events leading up to critical conditions, such as a gradual decline in oxygen saturation before an acute event; and (2) SHAP: SHAP provided a more global perspective on feature importance across the entire dataset but was less precise in capturing time-specific dependencies compared to LBM.

Clinical interpretability: (1) LBM: Clinicians reported that LBM outputs were more intuitive for understanding time-series patterns, as the masks highlighted critical windows of observation. This facilitated real-time decision-making during ICU monitoring; and (2) SHAP: While SHAP offered detailed insights into feature contributions, the static nature of the explanations sometimes made it challenging to interpret the progression of critical events over time.

Quantitative metrics: Both methods were evaluated for their predictive accuracy improvement when used as part of the interpretation pipeline. Models with LBM interpretations demonstrated a slightly higher alignment with expert-annotated critical event sequences (85% vs 82% for SHAP) in benchmark tests. Computational efficiency was also considered. LBM required less processing time for time-series data, whereas SHAP was computationally intensive due to its reliance on generating numerous perturbed samples.

Complementary use: While LBM and SHAP were individually effective in specific aspects, combining their outputs provided a more holistic understanding. LBM highlighted critical temporal windows, while SHAP contextualized the contribution of specific features within those windows.
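The contrast drawn above between time-resolved and dataset-level explanations can be made concrete with a small sketch. The code below is neither the Learned Binary Masks method nor SHAP; it substitutes a much simpler occlusion test, in which each (time point, feature) cell is replaced by a baseline value and the change in predicted risk is recorded. The `toy_risk` function, its weights and the example series are hypothetical stand-ins for a trained RNN and real patient data.

```python
import math
from typing import Callable, List

Series = List[List[float]]  # series[t][f]: value of feature f at time step t


def toy_risk(series: Series) -> float:
    """Hypothetical stand-in for a trained RNN; later time steps weigh more."""
    # Assumed features: 0 = SpO2 deviation from normal, 1 = heart-rate deviation.
    weights = [-0.4, 0.1]
    z = 0.0
    for t, step in enumerate(series):
        recency = (t + 1) / len(series)
        z += recency * sum(w * x for w, x in zip(weights, step))
    return 1.0 / (1.0 + math.exp(-z))


def occlusion_map(model: Callable[[Series], float], series: Series,
                  baseline: float = 0.0) -> List[List[float]]:
    """Time-resolved view: |change in risk| when one cell is set to the baseline."""
    base = model(series)
    result = []
    for t in range(len(series)):
        row = []
        for f in range(len(series[t])):
            masked = [list(step) for step in series]  # copy, then occlude one cell
            masked[t][f] = baseline
            row.append(abs(base - model(masked)))
        result.append(row)
    return result


def global_importance(cell_map: List[List[float]]) -> List[float]:
    """Static view: per-feature importance summed over all time steps."""
    return [sum(row[f] for row in cell_map) for f in range(len(cell_map[0]))]


# Toy series: SpO2 deviation falls over three time steps (illustrative only).
EXAMPLE: Series = [[0.0, 0.0], [-1.0, 0.5], [-3.0, 1.0]]

if __name__ == "__main__":
    cells = occlusion_map(toy_risk, EXAMPLE)
    print("per-cell importance:", cells)
    print("per-feature importance:", global_importance(cells))
```

The per-cell map shows when a feature mattered (here, the late fall in oxygen saturation), whereas the per-feature sum only shows that it mattered, which mirrors the complementary roles attributed to the two approaches.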

In 2024, Zilker et al[31] proposed a more accurate predictive tool that offers interpretable and actionable understanding. The authors introduced an ML framework termed PatWay-Net to make interpretable predictions of ICU admission for patients with symptoms of sepsis. A novel type of recurrent NN was combined with multi-layer perceptrons to process patient pathways and give interpretable predictive results. In addition, the end user is provided with a comprehensive dashboard to visualize the patients’ health trajectories. It was proposed as a valuable addition to health care decision making[31]. Explainable ML uses flexible ML models with high predictive ability, which then need post-hoc explanation methods to convert complex mathematical functions into explanations that are easier to comprehend. Interpretable ML is different: Here an intrinsically interpretable model is developed so that it affords a better understanding of how predictions are made. Generalized Additive Models are among the more advanced intrinsically interpretable ML models. Here, input features are independently modeled in a non-linear manner to generate univariate shape functions which remain fully interpretable[31].
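A minimal sketch may help show why an additive model of this kind stays interpretable: each feature passes through its own shape function, and the prediction is simply the sum of those contributions on the log-odds scale. The code below is not PatWay-Net and not a fitted model; the feature names, shape functions, intercept and patient values are all invented for illustration and have no clinical validity.

```python
import math
from typing import Callable, Dict

ShapeFn = Callable[[float], float]

# Hypothetical univariate shape functions (log-odds contributions).
# The forms and coefficients are assumptions, not estimates from any dataset.
SHAPES: Dict[str, ShapeFn] = {
    "age_years": lambda a: 0.02 * max(a - 50.0, 0.0),
    "lactate_mmol_l": lambda l: 0.5 * math.log1p(max(l, 0.0)),
    "gcs": lambda g: 0.15 * (15.0 - g),
}
INTERCEPT = -3.0  # assumed baseline log-odds


def contributions(patient: Dict[str, float]) -> Dict[str, float]:
    """Per-feature contribution; each depends on that one feature only."""
    return {name: fn(patient[name]) for name, fn in SHAPES.items()}


def predict(patient: Dict[str, float]) -> float:
    """Additive model: sigmoid(intercept + sum of shape-function outputs)."""
    z = INTERCEPT + sum(contributions(patient).values())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    patient = {"age_years": 70.0, "lactate_mmol_l": 0.0, "gcs": 15.0}
    print(contributions(patient))
    print(round(predict(patient), 4))
```

Because every contribution can be read off separately, a clinician can see which feature moved the prediction and by how much, without a post-hoc explanation step.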

The advantage of the PatWay-Net model is its ability to support analysis of patient pathways using patients’ data, thus avoiding subjectivity. It achieves high performance while allowing flexible decision support applications in complex healthcare environments. It can simultaneously improve decision making at the individual level and for administrative decisions[31].

CONCLUSION

Despite the many potential applications of NN in the ICU to ease workflow and improve outcomes, there are caveats to consider. First, implementing AI systems is a complex and challenging multiphase process[32]. Second, AI-based applications are considered to be medical devices and are thereby subject to less rigorous review and authorization criteria. Given the dearth of raw data and financial constraints, decisions on whether or not to use these tools are often based on cost savings rather than on improved patient outcomes. Finally, once a device is released into the market, no further follow-up to assess its performance is mandated. In addition, the manner in which healthcare workers respond to alerts is another variable, particularly with the possibility of ‘alert fatigue’[32]. Biases in the training set limit how widely the model or device can be incorporated into clinical practice, an issue tied to legal and ethical considerations[33]. In spite of these limitations, NN methods are commonly used in ICUs[34]. As the glitches are ironed out, they promise to aid the clinician in improving the quality of care, and to ease the burden of handling multidimensional data. Apart from the implementation of NN in the ICU setting, the concept of neural interfaces promises to usher in a revolutionary era that extends beyond traditional healthcare[35,36]. These in turn require, apart from technological advances, a robust ethical framework, multi-disciplinary collaboration, and regulatory oversight[35].

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Medicine, research and experimental

Country of origin: India

Peer-review report’s classification

Scientific Quality: Grade B, Grade D

Novelty: Grade A, Grade B

Creativity or Innovation: Grade B, Grade B

Scientific Significance: Grade B, Grade C

P-Reviewer: Xu SM S-Editor: Luo ML L-Editor: A P-Editor: Wang WB

References
1. Chen Y, Chen H, Sun Q, Zhai R, Liu X, Zhou J, Li S. Machine learning model identification and prediction of patients' need for ICU admission: A systematic review. Am J Emerg Med. 2023;73:166-170.
2. Paaß G, Hecker D. Artificial Intelligence. Berlin: Springer, 2024.
3. Sridhar GR. Diabetes and data in many forms. Int J Diabetes Dev Ctries. 2016;36:381-384.
4. Paliwal M, Kumar UA. Neural networks and statistical techniques: A review of applications. Expert Syst Appl. 2009;36:2-17.
5. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform. 2002;35:352-359.
6. Razi M, Athappilly K. A comparative predictive analysis of neural networks (NNs), nonlinear regression and classification and regression tree (CART) models. Expert Syst Appl. 2005;29:65-74.
7. Black JE, Kueper JK, Williamson TS. An introduction to machine learning for classification and prediction. Fam Pract. 2023;40:200-204.
8. Weidener L, Fischer M. Teaching AI Ethics in Medical Education: A Scoping Review of Current Literature and Practices. Perspect Med Educ. 2023;12:399-410.
9. McCarthy J, Minsky ML, Rochester N, Shannon CE. A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence. AI Mag. 1955;2006.
10. Zsidai B, Kaarre J, Narup E, Hamrin Senorski E, Pareek A, Grassi A, Ley C, Longo UG, Herbst E, Hirschmann MT, Kopf S, Seil R, Tischer T, Samuelsson K, Feldt R; ESSKA Artificial Intelligence Working Group. A practical guide to the implementation of artificial intelligence in orthopaedic research-Part 2: A technical introduction. J Exp Orthop. 2024;11:e12025.
11. Sogandi F. Identifying diseases symptoms and general rules using supervised and unsupervised machine learning. Sci Rep. 2024;14:17956.
12. Al-Hamadani MNA, Fadhel MA, Alzubaidi L, Balazs H. Reinforcement Learning Algorithms and Applications in Healthcare and Robotics: A Comprehensive and Systematic Review. Sensors (Basel). 2024;24:2461.
13. Hassan AM, Rajesh A, Asaad M, Nelson JA, Coert JH, Mehrara BJ, Butler CE. A Surgeon's Guide to Artificial Intelligence-Driven Predictive Models. Am Surg. 2023;89:11-19.
14. Singh Y, Chauhan AS. Neural networks in data mining. JATIT. 2009;5:36-42.
15. Wehkamp K, Krawczak M, Schreiber S. The Quality and Utility of Artificial Intelligence in Patient Care. Dtsch Arztebl Int. 2023;120:463-469.
16. Baldassarre D, Grossi E, Buscema M, Intraligi M, Amato M, Tremoli E, Pustina L, Castelnuovo S, Sanvito S, Gerosa L, Sirtori CR. Recognition of patients with cardiovascular disease by artificial neural networks. Ann Med. 2004;36:630-640.
17. Atallah L, Nabian M, Brochini L, Amelung PJ. Machine Learning for Benchmarking Critical Care Outcomes. Healthc Inform Res. 2023;29:301-314.
18. Bhattacharya S, Rajan V, Shrivastava H. ICU Mortality Prediction: A Classification Algorithm for Imbalanced Datasets. AAAI. 2017;31.
19. Li L, Liu G. In-hospital Mortality Prediction for ICU Patients on Large Healthcare MIMIC Datasets Using Class Imbalance Learning. 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). 2020: 90-93.
20. Mirzakhani F, Sadoughi F, Hatami M, Amirabadizadeh A. Which model is superior in predicting ICU survival: artificial intelligence versus conventional approaches. BMC Med Inform Decis Mak. 2022;22:167.
21. Barboi C, Tzavelis A, Muhammad LN. Comparison of Severity of Illness Scores and Artificial Intelligence Models That Are Predictive of Intensive Care Unit Mortality: Meta-analysis and Review of the Literature. JMIR Med Inform. 2022;10:e35293.
22. Venturini S, Orso D, Cugini F, Crapis M, Fossati S, Callegari A, Pellis T, Tomasello DC, Tonizzo M, Grembiale A, D'Andrea N, Vetrugno L, Bove T. Artificial neural network model from a case series of COVID-19 patients: a prognostic analysis. Acta Biomed. 2021;92:e2021202.
23. Chassagnon G, Vakalopoulou M, Battistella E, Christodoulidis S, Hoang-Thi TN, Dangeard S, Deutsch E, Andre F, Guillo E, Halm N, El Hajj S, Bompard F, Neveu S, Hani C, Saab I, Campredon A, Koulakian H, Bennani S, Freche G, Barat M, Lombard A, Fournier L, Monnier H, Grand T, Gregory J, Nguyen Y, Khalil A, Mahdjoub E, Brillet PY, Tran Ba S, Bousson V, Mekki A, Carlier RY, Revel MP, Paragios N. AI-driven quantification, staging and outcome prediction of COVID-19 pneumonia. Med Image Anal. 2021;67:101860.
24. Dong Y, Wang K, Zou X, Tan X, Zang Y, Li X, Ren X, Xie D, Jie Z, Chen X, Zeng Y, Shi J. Evaluating the ability of the NLHA2 and artificial neural network models to predict COVID-19 severity, and comparing them with the four existing scoring systems. Microb Pathog. 2022;171:105735.
25. Bollepalli SC, Sevakula RK, Au-Yeung WM, Kassab MB, Merchant FM, Bazoukis G, Boyer R, Isselbacher EM, Armoundas AA. Real-Time Arrhythmia Detection Using Hybrid Convolutional Neural Networks. J Am Heart Assoc. 2021;10:e023222.
26. Ates HC, Alshanawani A, Hagel S, Cotta MO, Roberts JA, Dincer C, Ates C. Unraveling the impact of therapeutic drug monitoring via machine learning for patients with sepsis. Cell Rep Med. 2024;5:101681.
27. Chieregato M, Frangiamore F, Morassi M, Baresi C, Nici S, Bassetti C, Bnà C, Galelli M. A hybrid machine learning/deep learning COVID-19 severity predictive model from CT images and clinical data. Sci Rep. 2022;12:4329.
28. Caicedo-Torres W, Gutierrez J. ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU. J Biomed Inform. 2019;98:103269.
29. Ho LV, Aczon M, Ledbetter D, Wetzel R. Interpreting a recurrent neural network's predictions of ICU mortality risk. J Biomed Inform. 2021;114:103672.
30. Strickler EAT, Thomas J, Thomas JP, Benjamin B, Shamsuddin R. Exploring a global interpretation mechanism for deep learning networks when predicting sepsis. Sci Rep. 2023;13:3067.
31. Zilker S, Weinzierl S, Kraus M, Zschech P, Matzner M. A machine learning framework for interpretable predictions in patient pathways: The case of predicting ICU admission for patients with symptoms of sepsis. Health Care Manag Sci. 2024;27:136-167.
32. Lenharo M. The testing of AI in medicine is a mess. Here's how it should be done. Nature. 2024;632:722-724.
33. Sridhar G, Lakshmi G. Ethical Issues of Artificial Intelligence in Diabetes Mellitus. Arch Med Res. 2023;11.
34. Wang L, Long DY. Significant risk factors for intensive care unit-acquired weakness: A processing strategy based on repeated machine learning. World J Clin Cases. 2024;12:1235-1242.
35. Xu S, Liu Y, Lee H, Li W. Neural interfaces: Bridging the brain to the world beyond healthcare. Exploration (Beijing). 2024;4:20230146.
36. Xu S, Manshaii F, Xiao X, Chen J. Artificial Intelligence Assisted Nanogenerator Applications. J Mater Chem A. 2024.