1. McCradden MD, Thai K, Assadi A, Tonekaboni S, Stedman I, Joshi S, Zhang M, Chevalier F, Goldenberg A. What makes a 'good' decision with artificial intelligence? A grounded theory study in paediatric care. BMJ Evid Based Med 2025:bmjebm-2024-112919. PMID: 39939160. DOI: 10.1136/bmjebm-2024-112919.
Abstract
OBJECTIVE To develop a framework for good clinical decision-making using machine learning (ML) models for interventional, patient-level decisions. DESIGN Grounded theory qualitative interview study. SETTING Primarily single-site at a major urban academic paediatric hospital, with external sampling. PARTICIPANTS Sixteen participants with experience working in acute care environments, representing physicians (n=10), nursing (n=3), respiratory therapists (n=2) and an ML specialist (n=1), were identified through purposive sampling. Individuals were recruited to represent a spectrum of ML knowledge (three expert, four knowledgeable and nine non-expert) and years of experience (median=12.9 years postgraduation). Recruitment proceeded through snowball sampling, with individuals approached to represent a diversity of fields, levels of experience and attitudes towards artificial intelligence (AI)/ML. A member-check step and consultation with patients were undertaken to vet the framework, resulting in minor revisions to its wording and framing. INTERVENTIONS A semi-structured virtual interview simulating an intensive care unit handover for a hypothetical patient case, using a simulated ML model and seven visualisations based on known methods for model interpretability in healthcare. Participants were asked to make an initial care plan for the patient, then were presented with a model prediction followed by the seven visualisations to explore their judgement, the visualisations' potential influence, and their understanding of the visualisations. Two visualisations contained contradictory information to probe participants' process for resolving the conflicting information. The ethical justifiability and clinical reasoning process were explored. MAIN OUTCOME A comprehensive framework that is grounded in established medicolegal and ethical standards and accounts for the incorporation of inference from ML models. RESULTS We found that, to make good decisions, participants reflected across six main categories: evidence, facts and medical knowledge relevant to the patient's condition; how that knowledge may be applied to this particular patient; patient-level, family-specific and local factors; facts about the model, its development and testing; the patient-level knowledge sufficiently represented by the model; and the model's incorporation of relevant contextual factors. This judgement was anchored most heavily in the overall balance of benefits and risks to the patient, framed by the goals of care. We found evidence of automation bias: many participants assumed that, if the model's explanation conflicted with their prior knowledge, their own judgement was incorrect; others concluded the exact opposite, drawing on their medical knowledge base to reject the incorrect information provided in the explanation. Regarding knowledge about the model, participants most consistently wanted to know about the model's historical performance in the cohort of patients in the local unit where the hypothetical patient was situated. CONCLUSION Good decisions using AI tools require reflection across multiple domains. We provide an actionable framework and question guide to support clinical decision-making with AI.
Affiliation(s)
- Melissa D McCradden: The Hospital for Sick Children, Toronto, Ontario, Canada; SickKids Research Institute, Toronto, Ontario, Canada; Australian Institute for Machine Learning, Adelaide, South Australia, Australia
- Kelly Thai: SickKids Research Institute, Toronto, Ontario, Canada
- Azadeh Assadi: The Hospital for Sick Children, Toronto, Ontario, Canada; Faculty of Applied Sciences and Engineering, University of Toronto, Toronto, Ontario, Canada
- Sana Tonekaboni: SickKids Research Institute, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for AI, Toronto, Ontario, Canada
- Minfan Zhang: Vector Institute for AI, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
- Fanny Chevalier: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Anna Goldenberg: SickKids Research Institute, Toronto, Ontario, Canada; Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for AI, Toronto, Ontario, Canada; CIFAR, Toronto, Ontario, Canada
2. Heuser S, Steil J, Salloch S. AI Ethics beyond Principles: Strengthening the Life-world Perspective. Science and Engineering Ethics 2025; 31:7. PMID: 39928281. PMCID: PMC11811459. DOI: 10.1007/s11948-025-00530-7.
Abstract
The search for ethical guidance in the development of artificial intelligence (AI) systems, especially in healthcare and decision support, remains a crucial effort. So far, principles usually serve as the main reference points for achieving ethically sound implementations. Reviewing classical criticism of principle-based ethics, and taking into account the severity and potentially life-changing relevance of decisions assisted by AI-driven systems, we argue for strengthening a complementary perspective that focuses on the life-world as an ensemble of practices that shape people's lives. This perspective centres on the notion of ethical judgment sensitive to life forms, arguing that principles alone do not guarantee ethicality in a moral world that is a joint construction of reality rather than a matter of mere control. We conclude that it is essential to support and supplement the implementation of moral principles in the development of AI systems for decision-making in healthcare by recognizing the normative relevance of life forms and practices in ethical judgment.
Affiliation(s)
- Stefan Heuser: Institute for Protestant Theology and Religious Education, Technische Universität Braunschweig, Braunschweig, Germany
- Jochen Steil: Institute for Robotics and Process Control, Technische Universität Braunschweig, Braunschweig, Germany
- Sabine Salloch: Hannover Medical School, Institute for Ethics, History and Philosophy of Medicine, Hannover, Germany
3. Rademakers FE, Biasin E, Bruining N, Caiani EG, Davies RH, Gilbert SH, Kamenjasevic E, McGauran G, O'Connor G, Rouffet JB, Vasey B, Fraser AG. CORE-MD clinical risk score for regulatory evaluation of artificial intelligence-based medical device software. NPJ Digit Med 2025; 8:90. PMID: 39915308. PMCID: PMC11802784. DOI: 10.1038/s41746-025-01459-8.
Abstract
The European CORE-MD consortium (Coordinating Research and Evidence for Medical Devices) proposes a score for medical devices incorporating artificial intelligence or machine learning algorithms. Its domains are summarised as valid clinical association, technical performance, and clinical performance. High scores indicate that extensive clinical investigations should be undertaken before regulatory approval, whereas lower scores indicate devices for which less pre-market clinical evaluation may be balanced by more post-market evidence.
Affiliation(s)
- Elisabetta Biasin: Researcher in Law, Center for IT & IP Law (CiTiP), KU Leuven, Leuven, Belgium
- Nico Bruining: Department of Cardiology, Erasmus Medical Center, Thorax Center, Rotterdam, the Netherlands
- Enrico G Caiani: Department of Electronics, Information and Biomedical Engineering, Politecnico di Milano, Milan, Italy; IRCCS Istituto Auxologico Italiano, Milan, Italy
- Rhodri H Davies: Institute of Cardiovascular Science, University College London, London, UK
- Stephen H Gilbert: Professor for Medical Device Regulatory Science, Else Kröner Fresenius Center for Digital Health, TUD Dresden University of Technology, Dresden, Germany
- Eric Kamenjasevic: Doctoral researcher in Law and Ethics, Center for IT & IP Law (CiTiP), KU Leuven, Leuven, Belgium
- Gearóid McGauran: Medical Officer, Medical Devices, Health Products Regulatory Authority, Dublin, Ireland
- Gearóid O'Connor: Medical Officer, Medical Devices, Health Products Regulatory Authority, Dublin, Ireland
- Jean-Baptiste Rouffet: Policy Advisor, European Affairs, European Federation of National Societies of Orthopaedics and Traumatology, Rolle, Switzerland
- Baptiste Vasey: Nuffield Department of Surgical Sciences, University of Oxford, Oxford, UK; Department of Surgery, Geneva University Hospital, Geneva, Switzerland
- Alan G Fraser: Consultant Cardiologist, University Hospital of Wales, and Emeritus Professor of Cardiology, School of Medicine, Cardiff University, Heath Park, Cardiff, UK; Cardiovascular Imaging and Dynamics, KU Leuven, Leuven, Belgium
4. Jiang S, Bukhari SMA, Krishnan A, Bera K, Sharma A, Caovan D, Rosipko B, Gupta A. Deployment of Artificial Intelligence in Radiology: Strategies for Success. AJR Am J Roentgenol 2025:1-11. PMID: 39475198. DOI: 10.2214/ajr.24.31898.
Abstract
Radiology, as a highly technical and information-rich medical specialty, is well suited for artificial intelligence (AI) product development, and many U.S. FDA-cleared AI medical devices are authorized for uses within the specialty. In this Clinical Perspective, we discuss the deployment of AI tools in radiology, exploring regulatory processes, the need for transparency, and other practical challenges. We further highlight the importance of rigorous validation, real-world testing, seamless workflow integration, and end-user education. We emphasize the role of continuous feedback and robust monitoring processes in guiding AI tools' adaptation and helping to ensure sustained performance. Traditional standalone and alternative platform-based approaches to radiology AI implementation are considered. The presented strategies will help achieve successful deployment and fully realize AI's potential benefits in radiology.
Affiliation(s)
- Sirui Jiang: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
- Syed M A Bukhari: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
- Arjun Krishnan: Department of Biology, Cleveland State University, Cleveland, OH
- Kaustav Bera: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
- Avishkar Sharma: Department of Radiology, Jefferson Einstein Philadelphia Hospital, Philadelphia, PA
- Danielle Caovan: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
- Beverly Rosipko: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
- Amit Gupta: Department of Radiology, University Hospitals Cleveland Medical Center, 11100 Euclid Ave, Cleveland, OH 44106
5. Bradshaw TJ, Tie X, Warner J, Hu J, Li Q, Li X. Large Language Models and Large Multimodal Models in Medical Imaging: A Primer for Physicians. J Nucl Med 2025; 66:173-182. PMID: 39819692. DOI: 10.2967/jnumed.124.268072.
Abstract
Large language models (LLMs) are poised to have a disruptive impact on health care. Numerous studies have demonstrated promising applications of LLMs in medical imaging, and this number will grow as LLMs further evolve into large multimodal models (LMMs) capable of processing both text and images. Given the substantial roles that LLMs and LMMs will have in health care, it is important for physicians to understand the underlying principles of these technologies so they can use them more effectively and responsibly and help guide their development. This article explains the key concepts behind the development and application of LLMs, including token embeddings, transformer networks, self-supervised pretraining, fine-tuning, and others. It also describes the technical process of creating LMMs and discusses use cases for both LLMs and LMMs in medical imaging.
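Two of the primer's core building blocks, token embeddings and transformer self-attention, can be illustrated in a few lines. A minimal NumPy sketch follows; the vocabulary size, dimensions, token ids and random weights are illustrative assumptions, not the article's implementation.

```python
import numpy as np

# Toy illustration: an embedding table maps token ids to vectors, then a single
# head of scaled dot-product self-attention mixes context across the sequence.
rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16

embedding = rng.normal(size=(vocab_size, d_model))  # token embedding table
tokens = np.array([12, 47, 3, 88, 9])               # a toy 5-token sequence
x = embedding[tokens]                               # (5, d_model)

# Random projections stand in for learned query/key/value weights.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
q, k, v = x @ W_q, x @ W_k, x @ W_v

scores = q @ k.T / np.sqrt(d_model)                     # pairwise token similarities
scores -= scores.max(axis=-1, keepdims=True)            # numerically stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
contextual = weights @ v                                # context-aware token representations
```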
Affiliation(s)
- Tyler J Bradshaw: Department of Radiology, University of Wisconsin-Madison, Madison, Wisconsin
- Xin Tie: Department of Radiology, University of Wisconsin-Madison, Madison, Wisconsin
- Joshua Warner: Department of Radiology, University of Wisconsin-Madison, Madison, Wisconsin
- Junjie Hu: Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
- Quanzheng Li: Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
- Xiang Li: Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts
6. Anderson JW, Visweswaran S. Algorithmic individual fairness and healthcare: a scoping review. JAMIA Open 2025; 8:ooae149. PMID: 39737346. PMCID: PMC11684587. DOI: 10.1093/jamiaopen/ooae149.
Abstract
Objectives Statistical and artificial intelligence algorithms are increasingly being developed for use in healthcare. These algorithms may reflect biases that magnify disparities in clinical care, and there is a growing need to understand how algorithmic biases can be mitigated in pursuit of algorithmic fairness. We conducted a scoping review on algorithmic individual fairness (IF) to understand the current state of research in the metrics and methods developed to achieve IF and their applications in healthcare. Materials and Methods We searched four databases: PubMed, ACM Digital Library, IEEE Xplore, and medRxiv for algorithmic IF metrics, algorithmic bias mitigation, and healthcare applications. Our search was restricted to articles published between January 2013 and November 2024. We identified 2498 articles through database searches and seven additional articles, of which 32 were included in the review. Data from the selected articles were extracted, and the findings were synthesized. Results Based on the 32 articles in the review, we identified several themes, including philosophical underpinnings of fairness, IF metrics, mitigation methods for achieving IF, implications of achieving IF on group fairness and vice versa, and applications of IF in healthcare. Discussion We find that research on IF is still in its early stages, particularly in healthcare, as evidenced by the limited number of relevant articles published between 2013 and 2024. While healthcare applications of IF remain sparse, the number of publications has grown steadily since 2012. The limitations of group fairness further emphasize the need for alternative approaches like IF. However, IF itself is not without challenges, including subjective definitions of similarity and potential bias encoding from data-driven methods. These findings, coupled with the limitations of the review process, underscore the need for more comprehensive research on the evolution of IF metrics and definitions to advance this promising field. Conclusion While significant work has been done on algorithmic IF in recent years, the definition, use, and study of IF remain in their infancy, especially in healthcare. Future research is needed to comprehensively apply and evaluate IF in healthcare.
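A widely cited formalisation of IF in this literature is the Lipschitz-style condition of Dwork et al.: similar individuals should receive similar predictions. A minimal sketch follows, in which the risk model, similarity metric and Lipschitz constant are stand-in assumptions for illustration.

```python
import numpy as np

def if_violation(f, x1, x2, d_x, L=1.0):
    """Flag a pair whose outcome gap exceeds L times their task-similarity distance."""
    return abs(f(x1) - f(x2)) > L * d_x(x1, x2)

# Stand-ins: a logistic risk score and Euclidean similarity between feature vectors.
f = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))
d_x = lambda a, b: float(np.linalg.norm(a - b))

patient_a = np.array([0.20, 0.40, 0.10])
patient_b = np.array([0.22, 0.38, 0.11])   # a clinically similar patient
print(if_violation(f, patient_a, patient_b, d_x))  # True would flag unfair treatment of the pair
```

In practice the hard part is the one the review calls out as subjective: choosing the similarity metric d_x for a given clinical task.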
Affiliation(s)
- Joshua W Anderson: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213, United States
- Shyam Visweswaran: Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA 15213, United States; Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15213, United States
7. Pham TT, Brecheisen J, Wu CC, Nguyen H, Deng Z, Adjeroh D, Doretto G, Choudhary A, Le N. ItpCtrl-AI: End-to-end interpretable and controllable artificial intelligence by modeling radiologists' intentions. Artif Intell Med 2025; 160:103054. PMID: 39689443. PMCID: PMC11757032. DOI: 10.1016/j.artmed.2024.103054.
Abstract
Using Deep Learning in computer-aided diagnosis systems has been of great interest due to its impressive performance in the general domain and medical domain. However, a notable challenge is the lack of explainability of many advanced models, which poses risks in critical applications such as diagnosing findings in CXR. To address this problem, we propose ItpCtrl-AI, a novel end-to-end interpretable and controllable framework that mirrors the decision-making process of the radiologist. By emulating the eye gaze patterns of radiologists, our framework initially determines the focal areas and assesses the significance of each pixel within those regions. As a result, the model generates an attention heatmap representing radiologists' attention, which is then used to extract attended visual information to diagnose the findings. By allowing the directional input, our framework is controllable by the user. Furthermore, by displaying the eye gaze heatmap which guides the diagnostic conclusion, the underlying rationale behind the model's decision is revealed, thereby making it interpretable. In addition to developing an interpretable and controllable framework, our work includes the creation of a dataset, named Diagnosed-Gaze++, which aligns medical findings with eye gaze data. Our extensive experimentation validates the effectiveness of our approach in generating accurate attention heatmaps and diagnoses. The experimental results show that our model not only accurately identifies medical findings but also precisely produces the eye gaze attention of radiologists. The dataset, models, and source code will be made publicly available upon acceptance.
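As a rough illustration of the gaze-modelling idea (not the ItpCtrl-AI code; the Gaussian kernel, grid size and fixation coordinates below are invented), fixation points can be turned into a normalised attention heatmap by summing a kernel at each fixation:

```python
import numpy as np

def gaze_heatmap(fixations, shape=(64, 64), sigma=4.0):
    """Sum an isotropic Gaussian at each (row, col, dwell-time) fixation."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    heat = np.zeros(shape)
    for r, c, dur in fixations:
        heat += dur * np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma**2))
    return heat / heat.max()  # normalise so the map can weight image features

# Hypothetical fixations: (row, col, dwell time) on a downsampled chest X-ray.
fixations = [(20, 30, 1.5), (22, 33, 0.8), (45, 12, 0.4)]
heatmap = gaze_heatmap(fixations)
```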
Affiliation(s)
- Trong-Thang Pham: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA
- Jacob Brecheisen: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA
- Carol C Wu: MD Anderson Cancer Center, Houston, TX 77079, USA
- Hien Nguyen: Department of ECE, University of Houston, TX 77204, USA
- Zhigang Deng: Department of CS, University of Houston, TX 77204, USA
- Donald Adjeroh: Department of CSEE, West Virginia University, WV 26506, USA
- Arabinda Choudhary: University of Arkansas for Medical Sciences, Little Rock, AR 72705, USA
- Ngan Le: AICV Lab, Department of EECS, University of Arkansas, AR 72701, USA
8. Savardi M, Signoroni A, Benini S, Vaccher F, Alberti M, Ciolli P, Di Meo N, Falcone T, Ramanzin M, Romano B, Sozzi F, Farina D. Upskilling or deskilling? Measurable role of an AI-supported training for radiology residents: a lesson from the pandemic. Insights Imaging 2025; 16:23. PMID: 39881013. PMCID: PMC11780016. DOI: 10.1186/s13244-024-01893-4.
Abstract
OBJECTIVES This article evaluates the use and effects of an artificial intelligence system supporting a critical diagnostic task during radiology resident training, addressing a research gap in this field. MATERIALS AND METHODS We involved eight residents evaluating 150 CXRs in three scenarios: no AI, on-demand AI, and integrated AI. The task was the assessment of a multi-regional severity score of lung compromise in patients affected by COVID-19. The chosen artificial intelligence tool, fully integrated into the RIS/PACS, demonstrated superior scoring performance compared with the average radiologist. Using quantitative metrics and questionnaires, we measured the 'upskilling' effects of AI support and residents' resilience to 'deskilling', i.e., their ability to overcome AI errors. RESULTS Residents requested AI support in 70% of cases when left free to choose. AI support significantly reduced severity-score errors and increased inter-rater agreement by 22%. Residents were resilient to AI errors above an acceptability threshold. Questionnaires indicated high tool usefulness, reliability, and explainability, with a preference for collaborative AI scenarios. CONCLUSION With this work, we gathered quantitative and qualitative evidence of the beneficial use of a high-performance AI tool, well integrated into the diagnostic workflow, as a training aid for radiology residents. CRITICAL RELEVANCE STATEMENT Balancing educational benefits and deskilling risks is essential to exploit AI systems as effective learning tools in radiology residency programs. Our work highlights metrics for evaluating these aspects. KEY POINTS Insights into AI tools' effects in radiology resident training are lacking. Metrics were defined to observe residents using an AI tool in different settings. This approach is advisable for evaluating AI tools in radiology training.
Affiliation(s)
- Mattia Savardi: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
- Alberto Signoroni: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy
- Sergio Benini: Department of Information Engineering, University of Brescia, Brescia, Italy
- Filippo Vaccher: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Matteo Alberti: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Pietro Ciolli: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Nunzia Di Meo: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Teresa Falcone: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Marco Ramanzin: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Barbara Romano: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Federica Sozzi: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
- Davide Farina: Department of Medical and Surgical Specialties, Radiological Sciences and Public Health, University of Brescia, Brescia, Italy; Radiology Unit 2, ASST Spedali Civili di Brescia, Brescia, Italy
9. Niraula D, Cuneo KC, Dinov ID, Gonzalez BD, Jamaluddin JB, Jin JJ, Luo Y, Matuszak MM, Ten Haken RK, Bryant AK, Dilling TJ, Dykstra MP, Frakes JM, Liveringhouse CL, Miller SR, Mills MN, Palm RF, Regan SN, Rishi A, Torres-Roca JF, Yu HHM, El Naqa I. Intricacies of human-AI interaction in dynamic decision-making for precision oncology. Nat Commun 2025; 16:1138. PMID: 39881134. PMCID: PMC11779952. DOI: 10.1038/s41467-024-55259-x.
Abstract
AI decision support systems can assist clinicians in planning adaptive treatment strategies that react dynamically to an individual's cancer progression for effective personalized care. However, AI's imperfections can lead to suboptimal therapeutics if clinicians over- or under-rely on AI. To investigate this collaborative decision-making process, we conducted a human-AI interaction study on response-adaptive radiotherapy for non-small cell lung cancer and hepatocellular carcinoma. We investigated two levels of collaborative behavior, model-agnostic and model-specific, and found that human-AI interaction is multifactorial, depending on the complex interrelationship between prior knowledge and preferences, the patient's state, disease site, treatment modality, model transparency, and the AI's learned behavior and biases. In summary, some clinicians may disregard AI recommendations due to skepticism; others will critically analyze AI recommendations on a case-by-case basis; clinicians will adjust their decisions if they find AI recommendations beneficial to patients; and clinicians will disregard AI recommendations deemed harmful or suboptimal and seek alternatives.
Affiliation(s)
- Dipesh Niraula: Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, USA
- Kyle C Cuneo: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Ivo D Dinov: Department of Health Behavior and Biological Sciences, University of Michigan, Ann Arbor, MI, USA
- Brian D Gonzalez: Department of Health Outcomes and Behavior, Moffitt Cancer Center, Tampa, FL, USA
- Jamalina B Jamaluddin: Department of Nuclear Engineering and Radiological Sciences, Moffitt Cancer Center, Tampa, FL, USA
- Jionghua Judy Jin: Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, MI, USA
- Yi Luo: Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, USA
- Martha M Matuszak: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Randall K Ten Haken: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Alex K Bryant: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA; Department of Radiation Oncology, Veterans Affairs Ann Arbor Healthcare System, Ann Arbor, MI, USA
- Thomas J Dilling: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Michael P Dykstra: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Jessica M Frakes: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Casey L Liveringhouse: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Sean R Miller: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Matthew N Mills: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Russell F Palm: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Samuel N Regan: Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, USA
- Anupam Rishi: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Javier F Torres-Roca: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Hsiang-Hsuan Michael Yu: Department of Radiation Oncology, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Issam El Naqa: Department of Machine Learning, Moffitt Cancer Center, Tampa, FL, USA
10. Liu X, Liu H, Yang G, Jiang Z, Cui S, Zhang Z, Wang H, Tao L, Sun Y, Song Z, Hong T, Yang J, Gao T, Zhang J, Li X, Zhang J, Sang Y, Yang Z, Xue K, Wu S, Zhang P, Yang J, Song C, Wang G. A generalist medical language model for disease diagnosis assistance. Nat Med 2025. PMID: 39779927. DOI: 10.1038/s41591-024-03416-6.
Abstract
The delivery of accurate diagnoses is crucial in healthcare and represents the gateway to appropriate and timely treatment. Although recent large language models (LLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, their effectiveness in clinical diagnosis remains unproven. Here we present MedFound, a generalist medical language model with 176 billion parameters, pre-trained on a large-scale corpus derived from diverse medical text and real-world clinical records. We further fine-tuned MedFound to learn physicians' inferential diagnosis using a self-bootstrapping chain-of-thought strategy and introduced a unified preference alignment framework to align it with standard clinical practice. Extensive experiments demonstrate that our medical LLM outperforms other baseline LLMs and specialized models in in-distribution (common diseases), out-of-distribution (external validation) and long-tailed distribution (rare diseases) scenarios across eight specialties. Further ablation studies indicate the effectiveness of key components of our training approach. We conducted a comprehensive evaluation of the clinical applicability of LLMs for diagnosis, comprising an artificial intelligence (AI)-versus-physician comparison, an AI-assistance study and a human evaluation framework. The framework incorporates eight clinical evaluation metrics, covering capabilities such as medical record summarization, diagnostic reasoning and risk management. Our findings demonstrate the model's feasibility in assisting physicians with disease diagnosis as part of the clinical workflow.
Affiliation(s)
- Xiaohong Liu: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Hao Liu: Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China
- Guoxing Yang: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Zeyu Jiang: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Shuguang Cui: School of Science and Engineering (SSE), Future Network of Intelligence Institute (FNii) and Guangdong Provincial Key Laboratory of Future Networks of Intelligence, Chinese University of Hong Kong, Shenzhen, China
- Zhaoze Zhang: Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China
- Huan Wang: Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China
- Liyuan Tao: Research Center of Clinical Epidemiology, Peking University Third Hospital, Beijing, China
- Yongchang Sun: Department of Respiratory and Critical Care Medicine, Peking University Third Hospital and Research Center for Chronic Airway Diseases, Peking University Health Science Center, Beijing, China
- Zhu Song: Department of Respiratory and Critical Care Medicine, Peking University Third Hospital and Research Center for Chronic Airway Diseases, Peking University Health Science Center, Beijing, China
- Tianpei Hong: Department of Endocrinology and Metabolism, Peking University Third Hospital, Beijing, China
- Jin Yang: Department of Endocrinology and Metabolism, Peking University Third Hospital, Beijing, China
- Tianrun Gao: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Jiangjiang Zhang: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Xiaohu Li: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Jing Zhang: Department of Cardiology, The First College of Clinical Medical Science, China Three Gorges University and Yichang Central People's Hospital, Yichang, China
- Ye Sang: Department of Cardiology, The First College of Clinical Medical Science, China Three Gorges University and Yichang Central People's Hospital, Yichang, China
- Zhao Yang: Peking University First Hospital and Research Center of Public Policy, Peking University, Beijing, China
- Kanmin Xue: Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
- Song Wu: South China Hospital, Medical School, Shenzhen University, Shenzhen, China
- Ping Zhang: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
- Jian Yang: Department of Cardiology, The First College of Clinical Medical Science, China Three Gorges University and Yichang Central People's Hospital, Yichang, China
- Chunli Song: Department of Orthopedics, Peking University Third Hospital & Beijing Key Laboratory of Spinal Disease & Engineering Research Center of Bone and Joint Precision Medicine, Beijing, China
- Guangyu Wang: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
11. Weber S, Wyszynski M, Godefroid M, Plattfaut R, Niehaves B. How do medical professionals make sense (or not) of AI? A social-media-based computational grounded theory study and an online survey. Comput Struct Biotechnol J 2024; 24:146-159. PMID: 38434249. PMCID: PMC10904922. DOI: 10.1016/j.csbj.2024.02.009.
Abstract
To investigate the opinions and attitudes of medical professionals towards adopting AI-enabled healthcare technologies in their daily work, we used a mixed-methods approach. Study 1 employed a qualitative computational grounded theory approach, analyzing 181 threads from the subreddit r/medicine. Utilizing an unsupervised machine learning clustering method, we identified three key themes: (1) consequences of AI, (2) the physician-AI relationship, and (3) a proposed way forward. In particular, posts related to the first two themes indicated that medical professionals' fear of being replaced by AI and skepticism toward AI played a major role in the argumentation. Moreover, the results suggest that this fear is driven by little or moderate knowledge about AI. Posts related to the third theme focused on factual discussions about how AI and medicine have to be designed to become broadly adopted in health care. Study 2 quantitatively examined the relationship between fear of AI, knowledge about AI, and medical professionals' intention to use AI-enabled technologies in more detail. Results based on a sample of 223 medical professionals who participated in an online survey revealed that the intention to use AI technologies increases with increasing knowledge about AI and that this effect is moderated by the fear of being replaced by AI.
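One common realisation of the unsupervised clustering step in computational grounded theory, offered here only as a hedged sketch rather than the authors' pipeline, is TF-IDF vectorisation followed by k-means, with the resulting clusters interpreted into themes by the researcher:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Invented stand-ins for scraped Reddit posts.
posts = [
    "AI will replace radiologists within a decade",
    "I worry my job disappears once these models are deployed",
    "the model flagged a nodule my attending missed",
    "we need regulation and workflow integration before adoption",
]

X = TfidfVectorizer(stop_words="english").fit_transform(posts)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster ids, to be read and named as themes by the analyst
```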
Affiliation(s)
- Sebastian Weber: University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marc Wyszynski: University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
- Marie Godefroid: University of Siegen, Information Systems, Kohlbettstr. 15, 57072 Siegen, Germany
- Ralf Plattfaut: University of Duisburg-Essen, Information Systems and Transformation Management, Universitätsstr. 9, 45141 Essen, Germany
- Bjoern Niehaves: University of Bremen, Digital Public, Bibliothekstr. 1, 28359 Bremen, Germany
12. Wahid KA, Kaffey ZY, Farris DP, Humbert-Vidan L, Moreno AC, Rasmussen M, Ren J, Naser MA, Netherton TJ, Korreman S, Balakrishnan G, Fuller CD, Fuentes D, Dohopolski MJ. Artificial intelligence uncertainty quantification in radiotherapy applications - A scoping review. Radiother Oncol 2024; 201:110542. PMID: 39299574. PMCID: PMC11648575. DOI: 10.1016/j.radonc.2024.110542.
Abstract
BACKGROUND/PURPOSE The use of artificial intelligence (AI) in radiotherapy (RT) is expanding rapidly. However, there exists a notable lack of clinician trust in AI models, underscoring the need for effective uncertainty quantification (UQ) methods. The purpose of this study was to scope existing literature related to UQ in RT, identify areas of improvement, and determine future directions. METHODS We followed the PRISMA-ScR scoping review reporting guidelines. We utilized the population (human cancer patients), concept (utilization of AI UQ), context (radiotherapy applications) framework to structure our search and screening process. We conducted a systematic search spanning seven databases, supplemented by manual curation, up to January 2024. Our search yielded a total of 8980 articles for initial review. Manuscript screening and data extraction were performed in Covidence. Data extraction categories included general study characteristics, RT characteristics, AI characteristics, and UQ characteristics. RESULTS We identified 56 articles published from 2015 to 2024. Ten domains of RT applications were represented; most studies evaluated auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Twelve disease sites were represented, with head and neck cancer the most common independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo dropout the most commonly implemented UQ method (32%), followed by ensembling (16%). 55% of studies did not share code or datasets. CONCLUSION Our review revealed a lack of diversity in UQ for RT applications beyond auto-contouring. Moreover, we identified a clear need to study additional UQ methods, such as conformal prediction. Our results may incentivize the development of guidelines for the reporting and implementation of UQ in RT.
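Monte Carlo dropout, the most commonly implemented UQ method in the reviewed studies, keeps dropout active at inference and reads uncertainty off the spread of repeated stochastic forward passes. A minimal PyTorch sketch follows; the toy network, feature sizes and sample count are assumptions, not drawn from any reviewed study:

```python
import torch
import torch.nn as nn

# A toy regressor standing in for, e.g., a contour- or dose-prediction model.
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 1))

def mc_dropout(model, x, n_samples=30):
    model.train()  # keeps dropout stochastic; in practice freeze batch-norm statistics
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(0), draws.std(0)  # predictive mean and uncertainty

x = torch.randn(8, 16)            # stand-in for image-derived features
mean, std = mc_dropout(model, x)  # a high std can flag cases for human review,
                                  # matching the review's dominant use: failure detection
```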
Affiliation(s)
- Kareem A Wahid: Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zaphanlene Y Kaffey: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David P Farris: Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Laia Humbert-Vidan: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Amy C Moreno: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Jintao Ren: Department of Oncology, Aarhus University Hospital, Denmark
- Mohamed A Naser: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Tucker J Netherton: Department of Radiation Physics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Stine Korreman: Department of Oncology, Aarhus University Hospital, Denmark
- Clifton D Fuller: Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- David Fuentes: Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Michael J Dohopolski: Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX, USA
13. Prinster D, Mahmood A, Saria S, Jeudy J, Lin CT, Yi PH, Huang CM, Wolfe S. Care to Explain? AI Explanation Types Differentially Impact Chest Radiograph Diagnostic Performance and Physician Trust in AI. Radiology 2024; 313:e233261. PMID: 39560483. PMCID: PMC11605106. DOI: 10.1148/radiol.233261.
Abstract
Background It is unclear whether artificial intelligence (AI) explanations help or hurt radiologists and other physicians in AI-assisted radiologic diagnostic decision-making. Purpose To test whether the type of AI explanation and the correctness and confidence level of AI advice impact physician diagnostic performance, perception of AI advice usefulness, and trust in AI advice for chest radiograph diagnosis. Materials and Methods A multicenter, prospective randomized study was conducted from April 2022 to September 2022. Two types of AI explanations prevalent in medical imaging-local (feature-based) explanations and global (prototype-based) explanations-were a between-participant factor, while AI correctness and confidence were within-participant factors. Radiologists (task experts) and internal or emergency medicine physicians (task nonexperts) received a chest radiograph to read; then, simulated AI advice was presented. Generalized linear mixed-effects models were used to analyze the effects of the experimental variables on diagnostic accuracy, efficiency, physician perception of AI usefulness, and "simple trust" (ie, speed of alignment with or divergence from AI advice); the control variables included knowledge of AI, demographic characteristics, and task expertise. Holm-Sidak corrections were used to adjust for multiple comparisons. Results Data from 220 physicians (median age, 30 years [IQR, 28-32.75 years]; 146 male participants) were analyzed. Compared with global AI explanations, local AI explanations yielded better physician diagnostic accuracy when the AI advice was correct (β = 0.86; P value adjusted for multiple comparisons [Padj] < .001) and increased diagnostic efficiency overall by reducing the time spent considering AI advice (β = -0.19; Padj = .01). While there were interaction effects of explanation type, AI confidence level, and physician task expertise on diagnostic accuracy (β = -1.05; Padj = .04), there was no evidence that AI explanation type or AI confidence level significantly affected subjective measures (physician diagnostic confidence and perception of AI usefulness). Finally, radiologists and nonradiologists placed greater simple trust in local AI explanations than in global explanations, regardless of the correctness of the AI advice (β = 1.32; Padj = .048). Conclusion The type of AI explanation impacted physician diagnostic performance and trust in AI, even when physicians themselves were not aware of such effects. © RSNA, 2024 Supplemental material is available for this article.
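The analysis recipe described above, mixed-effects regression over repeated readings with participants as the grouping factor followed by Holm-Sidak correction, can be sketched with statsmodels. The synthetic data frame and column names below are assumptions, and a linear mixed model stands in for the study's generalized variant:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

# Synthetic stand-in for a long-format table: one row per physician-case reading.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "physician_id": rng.integers(0, 40, n),              # random-effects grouping
    "explanation_type": rng.choice(["local", "global"], n),
    "ai_correct": rng.integers(0, 2, n),
    "accuracy": rng.random(n),                           # per-reading outcome
})

# Fixed effects cross explanation type with AI correctness; physicians are the
# grouping factor, echoing the between/within-participant design.
fit = smf.mixedlm("accuracy ~ explanation_type * ai_correct",
                  data=df, groups=df["physician_id"]).fit()

# Holm-Sidak adjustment across the family of fixed-effect p-values.
pvals = fit.pvalues.dropna()  # drop the variance term's undefined p-value
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm-sidak")
```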
Affiliation(s)
- From the Department of Computer Science, Johns Hopkins University, 3400 N Charles St, Baltimore, MD 21218 (D.P., A.M., S.S., C.M.H.); Bayesian Health, New York, NY (S.S.); Department of Diagnostic Radiology, University of Maryland School of Medicine, Baltimore, Md (J.J., P.H.Y.); Department of Radiology, St Jude Children’s Research Hospital, Memphis, Tenn (P.H.Y.); and Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, Md (C.T.L.)
14. Reis M, Reis F, Kunde W. Influence of believed AI involvement on the perception of digital medical advice. Nat Med 2024; 30:3098-3100. PMID: 39054373. PMCID: PMC11564086. DOI: 10.1038/s41591-024-03180-7.
Abstract
Large language models offer novel opportunities to seek digital medical advice. While previous research primarily addressed the performance of such artificial intelligence (AI)-based tools, public perception of these advancements received little attention. In two preregistered studies (n = 2,280), we presented participants with scenarios of patients obtaining medical advice. All participants received identical information, but we manipulated the putative source of this advice ('AI', 'human physician', 'human + AI'). 'AI'- and 'human + AI'-labeled advice was evaluated as significantly less reliable and less empathetic compared with 'human'-labeled advice. Moreover, participants indicated lower willingness to follow the advice when AI was believed to be involved in advice generation. Our findings point toward an anti-AI bias when receiving digital medical advice, even when AI is supposedly supervised by physicians. Given the tremendous potential of AI for medicine, elucidating ways to counteract this bias should be an important objective of future research.
Affiliation(s)
- Moritz Reis: Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany; Judge Business School, University of Cambridge, Cambridge, UK
- Florian Reis: Medical Affairs, Pfizer Pharma GmbH, Berlin, Germany
- Wilfried Kunde: Institute of Psychology, Julius-Maximilians-Universität Würzburg, Würzburg, Germany
15. Kenny R, Fischhoff B, Davis A, Canfield C. Improving Social Bot Detection Through Aid and Training. Human Factors 2024; 66:2323-2344. PMID: 37963198. PMCID: PMC11382440. DOI: 10.1177/00187208231210145.
Abstract
OBJECTIVE We test the effects of three aids on individuals' ability to detect social bots among Twitter personas: a bot indicator score, a training video, and a warning. BACKGROUND Detecting social bots can prevent online deception. We use a simulated social media task to evaluate the three aids. METHOD Lay participants judged whether each of 60 Twitter personas was a human or a social bot in a simulated online environment, with agreement between three machine learning algorithms used to estimate the probability of each persona being a bot. Experiment 1 compared a control group with two intervention groups, one provided with a bot indicator score for each tweet and the other with a warning about social bots. Experiment 2 compared a control group with two intervention groups, one receiving the bot indicator scores and the other a training video focused on heuristics for identifying social bots. RESULTS The bot indicator score intervention improved predictive performance and reduced overconfidence in both experiments. The training video was also effective, although somewhat less so. The warning had no effect. Participants rarely reported willingness to share content from a persona that they labeled as a bot, even when they agreed with it. CONCLUSIONS Informative interventions improved social bot detection; warning alone did not. APPLICATION We offer an experimental testbed and methodology that can be used to evaluate and refine interventions designed to reduce vulnerability to social bots. We show the value of two interventions that could be applied in many settings.
Affiliation(s)
- Ryan Kenny: United States Army, Fayetteville, NC, USA
- Alex Davis: Carnegie Mellon University, Pittsburgh, PA, USA
- Casey Canfield: Missouri University of Science and Technology, Rolla, MO, USA
16. Mortlock R, Lucas C. Generative artificial intelligence (Gen-AI) in pharmacy education: Utilization and implications for academic integrity: A scoping review. Exploratory Research in Clinical and Social Pharmacy 2024; 15:100481. PMID: 39184524. PMCID: PMC11341932. DOI: 10.1016/j.rcsop.2024.100481.
Abstract
Introduction Generative artificial intelligence (Gen-AI), exemplified by the widely adopted ChatGPT, has garnered significant attention in recent years. Its application spans various health education domains, including pharmacy, where its potential benefits and drawbacks have become increasingly apparent. Despite the growing adoption of Gen-AI such as ChatGPT in pharmacy education, there remains a critical need to assess and mitigate associated risks. This review explores the literature and potential strategies for mitigating risks associated with the integration of Gen-AI in pharmacy education. Aim To conduct a scoping review identifying the implications of Gen-AI in pharmacy education and its use and emerging evidence, with a particular focus on strategies that mitigate potential risks to academic integrity. Methods A scoping review strategy was employed in accordance with the PRISMA-ScR guidelines. Databases searched included PubMed, ERIC (Education Resources Information Center), Scopus and ProQuest from August 2023 to 20 February 2024, covering all relevant records from 1 January 2000 to 20 February 2024 relating specifically to LLM use within pharmacy education. A grey literature search was also conducted due to the emerging nature of this topic. Policies, procedures, and documents from institutions such as universities and colleges, including standards, guidelines, and policy documents, were hand-searched and reviewed in their most updated form; these documents were not published in the scientific literature or indexed in academic search engines. Results Articles (n = 12) were derived from the scientific databases and records (n = 9) from the grey literature. Potential uses and benefits of Gen-AI within pharmacy education were identified in all included published articles; however, there was a paucity of published articles considering the potential risks to academic integrity. Grey literature records held the largest proportion of risk-mitigation strategies, largely focusing on increased academic and student education and training relating to the ethical use of Gen-AI, as well as considerations for redesigning current assessments for which Gen-AI use poses a risk to academic integrity. Conclusion Drawing upon existing literature, this review highlights the importance of evidence-based approaches to address the challenges posed by Gen-AI such as ChatGPT in pharmacy education settings. Although mitigation strategies are suggested, primarily drawn from the grey literature, there is a paucity of traditionally published scientific literature outlining strategies for the practical and ethical implementation of Gen-AI within pharmacy education. Further research on the responsible and ethical use of Gen-AI in pharmacy curricula, and studies of strategies adopted to mitigate risks to academic integrity, would be beneficial.
Affiliation(s)
- R. Mortlock: Graduate School of Health, Faculty of Health, University of Technology, Sydney, Australia
- C. Lucas: Graduate School of Health, Faculty of Health, University of Technology, Sydney, Australia; School of Population Health, Faculty of Medicine and Health, University of NSW, Sydney, Australia; Connected Intelligence Centre (CIC), University of Technology Sydney, Australia
17. Topff L, Steltenpool S, Ranschaert ER, Ramanauskas N, Menezes R, Visser JJ, Beets-Tan RGH, Hartkamp NS. Artificial intelligence-assisted double reading of chest radiographs to detect clinically relevant missed findings: a two-centre evaluation. Eur Radiol 2024; 34:5876-5885. PMID: 38466390. PMCID: PMC11364654. DOI: 10.1007/s00330-024-10676-w.
Abstract
OBJECTIVES To evaluate an artificial intelligence (AI)-assisted double reading system for detecting clinically relevant missed findings on routinely reported chest radiographs. METHODS A retrospective study was performed in two institutions, a secondary care hospital and tertiary referral oncology centre. Commercially available AI software performed a comparative analysis of chest radiographs and radiologists' authorised reports using a deep learning and natural language processing algorithm, respectively. The AI-detected discrepant findings between images and reports were assessed for clinical relevance by an external radiologist, as part of the commercial service provided by the AI vendor. The selected missed findings were subsequently returned to the institution's radiologist for final review. RESULTS In total, 25,104 chest radiographs of 21,039 patients (mean age 61.1 years ± 16.2 [SD]; 10,436 men) were included. The AI software detected discrepancies between imaging and reports in 21.1% (5289 of 25,104). After review by the external radiologist, 0.9% (47 of 5289) of cases were deemed to contain clinically relevant missed findings. The institution's radiologists confirmed 35 of 47 missed findings (74.5%) as clinically relevant (0.1% of all cases). Missed findings consisted of lung nodules (71.4%, 25 of 35), pneumothoraces (17.1%, 6 of 35) and consolidations (11.4%, 4 of 35). CONCLUSION The AI-assisted double reading system was able to identify missed findings on chest radiographs after report authorisation. The approach required an external radiologist to review the AI-detected discrepancies. The number of clinically relevant missed findings by radiologists was very low. CLINICAL RELEVANCE STATEMENT The AI-assisted double reader workflow was shown to detect diagnostic errors and could be applied as a quality assurance tool. Although clinically relevant missed findings were rare, there is potential impact given the common use of chest radiography. KEY POINTS • A commercially available double reading system supported by artificial intelligence was evaluated to detect reporting errors in chest radiographs (n=25,104) from two institutions. • Clinically relevant missed findings were found in 0.1% of chest radiographs and consisted of unreported lung nodules, pneumothoraces and consolidations. • Applying AI software as a secondary reader after report authorisation can assist in reducing diagnostic errors without interrupting the radiologist's reading workflow. However, the number of AI-detected discrepancies was considerable and required review by a radiologist to assess their relevance.
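As a concrete reading of these figures, the short sketch below recomputes the cascade from AI-flagged discrepancies to confirmed missed findings using the counts reported in the abstract (simple arithmetic only, not code from the study):

```python
# Cascade from AI-flagged discrepancies down to confirmed missed findings,
# using the counts reported in the abstract.
total_radiographs = 25_104
ai_flagged = 5_289           # discrepancies between image and report
externally_relevant = 47     # deemed clinically relevant by the external radiologist
confirmed = 35               # confirmed by the institution's radiologists

print(f"AI discrepancy rate:       {ai_flagged / total_radiographs:.1%}")    # 21.1%
print(f"Relevant among flagged:    {externally_relevant / ai_flagged:.1%}")  # 0.9%
print(f"Confirmed among selected:  {confirmed / externally_relevant:.1%}")   # 74.5%
print(f"Confirmed among all cases: {confirmed / total_radiographs:.2%}")     # ~0.14%, reported as 0.1%
```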
Affiliation(s)
- Laurens Topff
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands.
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands.
- Sanne Steltenpool
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands
- Erik R Ranschaert
- Department of Radiology, St. Nikolaus Hospital, Eupen, Belgium
- Ghent University, Ghent, Belgium
- Naglis Ramanauskas
- Oxipit UAB, Vilnius, Lithuania
- Department of Radiology, Nuclear Medicine and Medical Physics, Institute of Biomedical Sciences, Faculty of Medicine, Vilnius University, Vilnius, Lithuania
- Renee Menezes
- Biostatistics Centre, Department of Psychosocial Research and Epidemiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- Jacob J Visser
- Department of Radiology and Nuclear Medicine, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Regina G H Beets-Tan
- Department of Radiology, Netherlands Cancer Institute, Amsterdam, The Netherlands
- GROW School for Oncology and Reproduction, Maastricht University, Maastricht, The Netherlands
- Nolan S Hartkamp
- Department of Radiology, Elisabeth-TweeSteden Hospital, Tilburg, The Netherlands

18
Desolda G, Dimauro G, Esposito A, Lanzilotti R, Matera M, Zancanaro M. A Human-AI interaction paradigm and its application to rhinocytology. Artif Intell Med 2024; 155:102933. [PMID: 39094227 DOI: 10.1016/j.artmed.2024.102933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2023] [Revised: 07/17/2024] [Accepted: 07/19/2024] [Indexed: 08/04/2024]
Abstract
This article explores Human-Centered Artificial Intelligence (HCAI) in medical cytology, with a focus on enhancing the interaction with AI. It presents a Human-AI interaction paradigm that emphasizes explainability and user control of AI systems. The paradigm frames the interaction as an iterative negotiation process based on three strategies that aim to (i) elaborate the system's outcomes through iterative steps (Iterative Exploration), (ii) explain the AI system's behavior or decisions (Clarification), and (iii) allow non-expert users to trigger simple retraining of the AI model (Reconfiguration). This interaction paradigm is exploited in the redesign of an existing AI-based tool for microscopic analysis of the nasal mucosa. The resulting tool was tested with rhinocytologists. The article discusses the results of this evaluation and outlines lessons learned that are relevant for AI in medicine.
Affiliation(s)
- Giuseppe Desolda
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Giovanni Dimauro
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Andrea Esposito
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Rosa Lanzilotti
- Department of Computer Science, University of Bari Aldo Moro, Via E. Orabona 4, Bari, 70125, Italy.
- Maristella Matera
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Via Ponzio 34/5, Milan, 20133, Italy.
- Massimo Zancanaro
- Department of Psychology and Cognitive Science, University of Trento, Corso Bettini 31, Rovereto, 38068, Italy; Fondazione Bruno Kessler, Povo, Trento, 38123, Italy.

19
McCradden MD, Stedman I. Explaining decisions without explainability? Artificial intelligence and medicolegal accountability. Future Healthc J 2024; 11:100171. [PMID: 39371527 PMCID: PMC11452834 DOI: 10.1016/j.fhj.2024.100171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Accepted: 08/06/2024] [Indexed: 10/08/2024]
Abstract
[Graphical abstract; no text abstract available.]
Affiliation(s)
- Melissa D. McCradden
- Australian Institute for Machine Learning, University of Adelaide, Australia
- Women's and Children's Hospital, Adelaide, Australia
- SickKids Research Institute, Toronto, Canada
- Ian Stedman
- School of Public Policy and Administration at York University, Toronto, Ontario, Canada

20
Moosavi A, Huang S, Vahabi M, Motamedivafa B, Tian N, Mahmood R, Liu P, Sun CL. Prospective Human Validation of Artificial Intelligence Interventions in Cardiology: A Scoping Review. JACC Adv 2024; 3:101202. [PMID: 39372457 PMCID: PMC11450923 DOI: 10.1016/j.jacadv.2024.101202] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 03/22/2024] [Revised: 07/09/2024] [Accepted: 07/11/2024] [Indexed: 10/08/2024]
Abstract
Background Despite the potential of artificial intelligence (AI) in enhancing cardiovascular care, its integration into clinical practice is limited by a lack of evidence on its effectiveness with respect to human experts or gold standard practices in real-world settings. Objectives The purpose of this study was to identify AI interventions in cardiology that have been prospectively validated against human expert benchmarks or gold standard practices, assess their effectiveness, and identify future research areas. Methods We systematically reviewed Scopus and MEDLINE to identify peer-reviewed publications that involved prospective human validation of AI-based interventions in cardiology from January 2015 to December 2023. Results Of 2,351 initial records, 64 studies were included. Among these studies, 59 (92.2%) were published after 2020, and 11 (17.2%) were randomized controlled trials. AI interventions in 44 articles (68.75%) reported definite clinical or operational improvements over human experts. These interventions were mostly used in imaging (n = 14, 21.9%), ejection fraction (n = 10, 15.6%), arrhythmia (n = 9, 14.1%), and coronary artery disease (n = 12, 18.8%) application areas. Convolutional neural networks were the most common predictive model (n = 44, 69%), and images were the most used data type (n = 38, 54.3%). Only 22 (34.4%) studies made their models or data accessible. Conclusions This review identifies the potential of AI in cardiology, with models often performing as well as human counterparts for specific, clearly scoped tasks suitable for such models. Nonetheless, the limited number of randomized controlled trials emphasizes the need for continued validation, especially in real-world settings that closely examine joint human-AI decision-making.
Affiliation(s)
- Amirhossein Moosavi
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Steven Huang
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Maryam Vahabi
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Bahar Motamedivafa
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Nelly Tian
- Marshall School of Business, University of Southern California, Los Angeles, California, USA
- Rafid Mahmood
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- Peter Liu
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada
- Christopher L.F. Sun
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- University of Ottawa Heart Institute, University of Ottawa, Ottawa, Ontario, Canada

21
Dingel J, Kleine AK, Cecil J, Sigl AL, Lermer E, Gaube S. Predictors of Health Care Practitioners' Intention to Use AI-Enabled Clinical Decision Support Systems: Meta-Analysis Based on the Unified Theory of Acceptance and Use of Technology. J Med Internet Res 2024; 26:e57224. [PMID: 39102675 PMCID: PMC11333871 DOI: 10.2196/57224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2024] [Revised: 05/03/2024] [Accepted: 05/13/2024] [Indexed: 08/07/2024] Open
Abstract
BACKGROUND Artificial intelligence-enabled clinical decision support systems (AI-CDSSs) offer potential for improving health care outcomes, but their adoption among health care practitioners remains limited. OBJECTIVE This meta-analysis identified predictors influencing health care practitioners' intention to use AI-CDSSs based on the Unified Theory of Acceptance and Use of Technology (UTAUT). Additional predictors were examined based on existing empirical evidence. METHODS The literature search using electronic databases, forward searches, conference programs, and personal correspondence yielded 7731 results, of which 17 (0.22%) studies met the inclusion criteria. Random-effects meta-analysis, relative weight analyses, and meta-analytic moderation and mediation analyses were used to examine the relationships between relevant predictor variables and the intention to use AI-CDSSs. RESULTS The meta-analysis results supported the application of the UTAUT to the context of the intention to use AI-CDSSs. The results showed that performance expectancy (r=0.66), effort expectancy (r=0.55), social influence (r=0.66), and facilitating conditions (r=0.66) were positively associated with the intention to use AI-CDSSs, in line with the predictions of the UTAUT. The meta-analysis further identified positive attitude (r=0.63), trust (r=0.73), anxiety (r=-0.41), perceived risk (r=-0.21), and innovativeness (r=0.54) as additional relevant predictors. Trust emerged as the most influential predictor overall. The results of the moderation analyses show that the relationship between social influence and use intention becomes weaker with increasing age. In addition, the relationship between effort expectancy and use intention was stronger for diagnostic AI-CDSSs than for devices that combined diagnostic and treatment recommendations. Finally, the relationship between facilitating conditions and use intention was mediated through performance and effort expectancy. CONCLUSIONS This meta-analysis contributes to the understanding of the predictors of intention to use AI-CDSSs based on an extended UTAUT model. More research is needed to substantiate the identified relationships and explain the observed variations in effect sizes by identifying relevant moderating factors. The research findings bear important implications for the design and implementation of training programs for health care practitioners to ease the adoption of AI-CDSSs into their practice.
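For orientation, pooled correlations of this kind are commonly obtained by random-effects meta-analysis on Fisher z-transformed coefficients. The sketch below shows a standard DerSimonian-Laird pooling; the correlations and sample sizes are invented placeholders, not data from this meta-analysis:

```python
import numpy as np

def pool_correlations(rs, ns):
    """DerSimonian-Laird random-effects pooling of Pearson correlations."""
    z = np.arctanh(rs)                  # Fisher z-transform
    v = 1.0 / (np.asarray(ns) - 3)      # within-study variance of z
    w = 1.0 / v
    z_fixed = np.sum(w * z) / np.sum(w)
    q = np.sum(w * (z - z_fixed) ** 2)  # heterogeneity statistic Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)   # between-study variance
    w_star = 1.0 / (v + tau2)                  # random-effects weights
    z_re = np.sum(w_star * z) / np.sum(w_star)
    return np.tanh(z_re)                # back-transform to the r scale

# Illustrative inputs only
print(pool_correlations(rs=np.array([0.70, 0.61, 0.58]), ns=[120, 85, 200]))
```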
Affiliation(s)
- Julius Dingel
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Anne-Kathrin Kleine
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Julia Cecil
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Anna Leonie Sigl
- Department of Liberal Arts and Sciences, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Eva Lermer
- Human-AI-Interaction Group, Center for Leadership and People Management, Ludwig Maximilian University of Munich, Munich, Germany
- Department of Liberal Arts and Sciences, Technical University of Applied Sciences Augsburg, Augsburg, Germany
- Susanne Gaube
- Human Factors in Healthcare, Global Business School for Health, University College London, London, United Kingdom

22
Rainey C, Bond R, McConnell J, Hughes C, Kumar D, McFadden S. Reporting radiographers' interaction with Artificial Intelligence-How do different forms of AI feedback impact trust and decision switching? PLOS Digit Health 2024; 3:e0000560. [PMID: 39110687 PMCID: PMC11305567 DOI: 10.1371/journal.pdig.0000560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/22/2024] [Accepted: 06/22/2024] [Indexed: 08/10/2024]
Abstract
Artificial Intelligence (AI) has been increasingly integrated into healthcare settings, including the radiology department, to aid radiographic image interpretation, including reporting by radiographers. Trust has been cited as a barrier to effective clinical implementation of AI. Appropriately calibrated trust will be important in the future with AI to ensure the ethical use of these systems for the benefit of the patient, clinician and health services. Means of explainable AI, such as heatmaps, have been proposed to increase AI transparency and trust by elucidating which parts of an image the AI 'focussed on' when making its decision. The aim of this novel study was to quantify the impact of different forms of AI feedback on expert clinicians' trust. Whilst this study was conducted in the UK, it has potential international application and impact for AI interface design, either globally or in countries with similar cultural and/or economic status to the UK. A convolutional neural network was built for this study; trained, validated and tested on a publicly available dataset of MUsculoskeletal RAdiographs (MURA), with binary diagnoses and Gradient Class Activation Maps (GradCAM) as outputs. Reporting radiographers (n = 12) were recruited to this study from all four regions of the UK. Qualtrics was used to present each participant with a total of 18 complete examinations from the MURA test dataset (each examination contained more than one radiographic image). Participants were presented with the images first, images with heatmaps next, and finally an AI binary diagnosis, in sequential order. Perception of trust in the AI systems was obtained following the presentation of each heatmap and binary feedback. The participants were asked to indicate whether they would change their mind (or decision switch) in response to the AI feedback. Participants disagreed with the AI heatmaps for the abnormal examinations 45.8% of the time and agreed with binary feedback on 86.7% of examinations (26/30 presentations). Only two participants indicated that they would decision switch in response to all AI feedback (GradCAM and binary) (0.7%, n = 2) across all datasets. 22.2% (n = 32) of responses agreed with the localisation of pathology on the heatmap. The level of agreement with the GradCAM and binary diagnosis was found to be correlated with trust (GradCAM: -.515 to -.584, a significant large negative correlation (p < .01); binary diagnosis: -.309 to -.369, a significant medium negative correlation (p < .01)). This study shows that the extent of agreement with both AI binary diagnosis and heatmap is correlated with trust in AI for the participants in this study, where greater agreement with the form of AI feedback is associated with greater trust in AI, in particular for the heatmap form of AI feedback. Forms of explainable AI should be developed with cognisance of the need for precision and accuracy in localisation to promote appropriate trust in clinical end users.
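For readers unfamiliar with the GradCAM heatmaps used as feedback here, the following is a minimal PyTorch sketch of how such a map is generated; the ResNet-50 backbone, target layer and random input are stand-ins, not the study's musculoskeletal model:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

model = resnet50(weights=None).eval()
target_layer = model.layer4            # last convolutional stage

# Capture activations on the forward pass and gradients on the backward pass
activations, gradients = {}, {}
target_layer.register_forward_hook(lambda m, i, o: activations.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: gradients.update(g=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # stand-in radiograph
score = model(x)[0].max()              # score of the top-scoring class
score.backward()

w = gradients["g"].mean(dim=(2, 3), keepdim=True)       # per-channel weights
cam = F.relu((w * activations["a"]).sum(dim=1))         # weighted activation map
cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:], mode="bilinear")
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalise to [0, 1]
```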
Affiliation(s)
- Clare Rainey
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland
- Raymond Bond
- Ulster University, School of Computing, York St, Belfast, Northern Ireland
- Ciara Hughes
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland
- Devinder Kumar
- School of Medicine, Stanford University, California, United States of America
- Sonyia McFadden
- Ulster University, School of Health Sciences, York St, Belfast, Northern Ireland

23
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. J Am Coll Radiol 2024; 21:1292-1310. [PMID: 38276923 DOI: 10.1016/j.jacr.2023.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2024]
Abstract
Artificial intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama; American College of Radiology Data Science Institute, Reston, Virginia
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, California; Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, California
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany; Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts; Tufts University Medical School, Boston, Massachusetts; Commission on Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia; College of Medicine and Public Health, Flinders University, Adelaide, Australia

24
Montomoli J, Bitondo MM, Cascella M, Rezoagli E, Romeo L, Bellini V, Semeraro F, Gamberini E, Frontoni E, Agnoletti V, Altini M, Benanti P, Bignami EG. Algor-ethics: charting the ethical path for AI in critical care. J Clin Monit Comput 2024; 38:931-939. [PMID: 38573370 PMCID: PMC11297831 DOI: 10.1007/s10877-024-01157-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2023] [Accepted: 03/22/2024] [Indexed: 04/05/2024]
Abstract
The integration of Clinical Decision Support Systems (CDSS) based on artificial intelligence (AI) in healthcare is a groundbreaking evolution with enormous potential, but its development and ethical implementation present unique challenges, particularly in critical care, where physicians often deal with life-threatening conditions requiring rapid action and patients unable to participate in the decision-making process. Moreover, development of AI-based CDSS is complex and should address different sources of bias, including data acquisition, health disparities, domain shifts during clinical use, and cognitive biases in decision-making. In this scenario, algor-ethics is mandatory; it emphasizes the integration of 'Human-in-the-Loop' and 'Algorithmic Stewardship' principles and the benefits of advanced data engineering. The establishment of Clinical AI Departments (CAID) is necessary to lead AI innovation in healthcare, ensuring ethical integrity and human-centered development in this rapidly evolving field.
Affiliation(s)
- Jonathan Montomoli
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy.
- Health Services Research, Evaluation and Policy Unit, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy.
- Maria Maddalena Bitondo
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Marco Cascella
- Unit of Anesthesia and Pain Medicine, Department of Medicine, Surgery and Dentistry "Scuola Medica Salernitana," University of Salerno, Baronissi, Salerno, Italy
- Emanuele Rezoagli
- School of Medicine and Surgery, University of Milano-Bicocca, Via Cadore, 48, Monza, 20900, Italy
- Dipartimento di Emergenza e Urgenza, Terapia intensiva e Semintensiva adulti e pediatrica, Fondazione IRCCS San Gerardo dei Tintori, Via Pergolesi, 33, Monza, 20900, Italy
- Luca Romeo
- Department of Economics and Law, University of Macerata, Macerata, 62100, Italy
- Valentina Bellini
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy
- Federico Semeraro
- Department of Anesthesia, Intensive Care and Prehospital Emergency, Ospedale Maggiore Carlo Alberto Pizzardi, Largo Bartolo Nigrisoli, 2, Bologna, 40133, Italy
- Emiliano Gamberini
- Department of Anesthesia and Intensive Care, Infermi Hospital, Romagna Local Health Authority, Viale Settembrini 2, Rimini, 47923, Italy
- Emanuele Frontoni
- Department of Political Sciences, Communication and International Relations, University of Macerata, Macerata, 62100, Italy
- Vanni Agnoletti
- Department of Surgery and Trauma, Anesthesia and Intensive Care Unit, Maurizio Bufalini Hospital, Romagna Local Health Authority, Viale Giovanni Ghirotti, 286, Cesena, 47521, Italy
- Mattia Altini
- Hospital Care Sector, Emilia-Romagna Region, Via Aldo Moro, 21, Bologna, 40127, Italy
- Paolo Benanti
- Pontifical Gregorian University, Piazza della Pilotta 4, Roma, 00187, Italy
- Elena Giovanna Bignami
- Anesthesiology, Critical Care and Pain Medicine Division, Department of Medicine and Surgery, University of Parma, Via Gramsci 14, Parma, 43125, Italy

25
Kostick-Quenet K, Lang BH, Smith J, Hurley M, Blumenthal-Barby J. Trust criteria for artificial intelligence in health: normative and epistemic considerations. J Med Ethics 2024; 50:544-551. [PMID: 37979976 PMCID: PMC11101592 DOI: 10.1136/jme-2023-109338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Accepted: 11/02/2023] [Indexed: 11/20/2023]
Abstract
Rapid advancements in artificial intelligence and machine learning (AI/ML) in healthcare raise pressing questions about how much users should trust AI/ML systems, particularly for high-stakes clinical decision-making. Ensuring that user trust is properly calibrated to a tool's computational capacities and limitations has both practical and ethical implications, given that overtrust or undertrust can lead to over-reliance or under-reliance on algorithmic tools, with significant implications for patient safety and health outcomes. It is, thus, important to better understand how variability in trust criteria across stakeholders, settings, tools and use cases may influence approaches to using AI/ML tools in real settings. As part of a 5-year, multi-institutional Agency for Healthcare Research and Quality-funded study, we identify trust criteria for a survival prediction algorithm intended to support clinical decision-making for left ventricular assist device therapy, using semistructured interviews (n=40) with patients and physicians, analysed via thematic analysis. Findings suggest that physicians and patients share similar empirical considerations for trust, which were primarily epistemic in nature, focused on accuracy and validity of AI/ML estimates. Trust evaluations considered the nature, integrity and relevance of training data rather than the computational nature of algorithms themselves, suggesting a need to distinguish 'source' from 'functional' explainability. To a lesser extent, trust criteria were also relational (endorsement from others) and sometimes based on personal beliefs and experience. We discuss implications for promoting appropriate and responsible trust calibration for clinical decision-making using AI/ML.
Affiliation(s)
- Kristin Kostick-Quenet
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Benjamin H Lang
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Department of Philosophy, University of Oxford, Oxford, Oxfordshire, UK
- Jared Smith
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA
- Meghan Hurley
- Center for Medical Ethics and Health Policy, Baylor College of Medicine, Houston, Texas, USA

26
Chang JY, Makary MS. Evolving and Novel Applications of Artificial Intelligence in Thoracic Imaging. Diagnostics (Basel) 2024; 14:1456. [PMID: 39001346 PMCID: PMC11240935 DOI: 10.3390/diagnostics14131456] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2024] [Revised: 07/01/2024] [Accepted: 07/06/2024] [Indexed: 07/16/2024] Open
Abstract
The advent of artificial intelligence (AI) is revolutionizing medicine, particularly radiology. With the development of newer models, AI applications are demonstrating improved performance and versatile utility in the clinical setting. Thoracic imaging is an area of profound interest, given the prevalence of chest imaging and the significant health implications of thoracic diseases. This review aims to highlight the promising applications of AI within thoracic imaging. It examines the role of AI, including its contributions to improving diagnostic evaluation and interpretation, enhancing workflow, and aiding in invasive procedures. Next, it further highlights the current challenges and limitations faced by AI, such as the necessity of 'big data', ethical and legal considerations, and bias in representation. Lastly, it explores the potential directions for the application of AI in thoracic radiology.
Affiliation(s)
- Jin Y Chang
- Department of Radiology, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Mina S Makary
- Department of Radiology, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Division of Vascular and Interventional Radiology, Department of Radiology, The Ohio State University Wexner Medical Center, Columbus, OH 43210, USA

27
Day TG, Matthew J, Budd SF, Venturini L, Wright R, Farruggia A, Vigneswaran TV, Zidere V, Hajnal JV, Razavi R, Simpson JM, Kainz B. Interaction between clinicians and artificial intelligence to detect fetal atrioventricular septal defects on ultrasound: how can we optimize collaborative performance? Ultrasound Obstet Gynecol 2024; 64:28-35. [PMID: 38197584 DOI: 10.1002/uog.27577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 12/19/2023] [Accepted: 12/30/2023] [Indexed: 01/11/2024]
Abstract
OBJECTIVES Artificial intelligence (AI) has shown promise in improving the performance of fetal ultrasound screening in detecting congenital heart disease (CHD). The effect of giving AI advice to human operators has not been studied in this context. Giving additional information about AI model workings, such as confidence scores for AI predictions, may be a way of further improving performance. Our aims were to investigate whether AI advice improved overall diagnostic accuracy (using a single CHD lesion as an exemplar), and to determine what, if any, additional information given to clinicians optimized the overall performance of the clinician-AI team. METHODS An AI model was trained to classify a single fetal CHD lesion (atrioventricular septal defect (AVSD)), using a retrospective cohort of 121,130 cardiac four-chamber images extracted from 173 ultrasound scan videos (98 with normal hearts, 75 with AVSD); a ResNet50 model architecture was used. Temperature scaling of model prediction probability was performed on a validation set, and gradient-weighted class activation maps (grad-CAMs) produced. Ten clinicians (two consultant fetal cardiologists, three trainees in pediatric cardiology and five fetal cardiac sonographers) were recruited from a center of fetal cardiology to participate. Each participant was shown 2000 fetal four-chamber images in a random order (1000 normal and 1000 AVSD). The dataset comprised 500 images, each shown in four conditions: (1) image alone without AI output; (2) image with binary AI classification; (3) image with AI model confidence; and (4) image with grad-CAM image overlays. The clinicians were asked to classify each image as normal or AVSD. RESULTS A total of 20 000 image classifications were recorded from 10 clinicians. The AI model alone achieved an accuracy of 0.798 (95% CI, 0.760-0.832), a sensitivity of 0.868 (95% CI, 0.834-0.902) and a specificity of 0.728 (95% CI, 0.702-0.754), and the clinicians without AI achieved an accuracy of 0.844 (95% CI, 0.834-0.854), a sensitivity of 0.827 (95% CI, 0.795-0.858) and a specificity of 0.861 (95% CI, 0.828-0.895). Showing a binary (normal or AVSD) AI model output resulted in significant improvement in accuracy to 0.865 (P < 0.001). This effect was seen in both experienced and less-experienced participants. Giving incorrect AI advice resulted in a significant deterioration in overall accuracy, from 0.761 to 0.693 (P < 0.001), which was driven by an increase in both Type-I and Type-II errors by the clinicians. This effect was worsened by showing model confidence (accuracy, 0.649; P < 0.001) or grad-CAM (accuracy, 0.644; P < 0.001). CONCLUSIONS AI has the potential to improve performance when used in collaboration with clinicians, even if the model performance does not reach expert level. Giving additional information about model workings such as model confidence and class activation map image overlays did not improve overall performance, and actually worsened performance for images for which the AI model was incorrect.
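The temperature scaling step named in the methods is a one-parameter calibration of model probabilities fit on a validation set; a generic sketch (with placeholder logits and labels, not the study's code) looks like this:

```python
import torch

def fit_temperature(logits, labels, iters=200):
    """Minimise the NLL of softmax(logits / T) over a single scalar T."""
    log_t = torch.zeros(1, requires_grad=True)        # optimise log T so T stays positive
    opt = torch.optim.LBFGS([log_t], max_iter=iters)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# Placeholder validation outputs for a binary normal-vs-AVSD classifier
val_logits = torch.randn(256, 2)
val_labels = torch.randint(0, 2, (256,))
T = fit_temperature(val_logits, val_labels)
calibrated = torch.softmax(val_logits / T, dim=1)     # calibrated probabilities
```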
Affiliation(s)
- T G Day
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J Matthew
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- S F Budd
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- L Venturini
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Wright
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- A Farruggia
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- T V Vigneswaran
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- V Zidere
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- Harris Birthright Research Centre, King's College London NHS Foundation Trust, London, UK
- J V Hajnal
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- R Razavi
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- J M Simpson
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Congenital Heart Disease, Evelina London Children's Healthcare, Guy's and St Thomas' NHS Foundation Trust, London, UK
- B Kainz
- School of Biomedical Engineering and Imaging Sciences, Faculty of Life Sciences and Medicine, King's College London, London, UK
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Department of Computing, Faculty of Engineering, Imperial College London, London, UK

28
Chen H, Ma X, Rives H, Serpedin A, Yao P, Rameau A. Trust in Machine Learning Driven Clinical Decision Support Tools Among Otolaryngologists. Laryngoscope 2024; 134:2799-2804. [PMID: 38230948 DOI: 10.1002/lary.31260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2023] [Revised: 11/29/2023] [Accepted: 12/20/2023] [Indexed: 01/18/2024]
Abstract
BACKGROUND Machine learning driven clinical decision support tools (ML-CDST) are on the verge of being integrated into clinical settings, including in Otolaryngology-Head & Neck Surgery. In this study, we investigated whether such CDST may influence otolaryngologists' diagnostic judgement. METHODS Otolaryngologists were recruited virtually across the United States for this experiment on human-AI interaction. Participants were shown 12 different video-stroboscopic exams from patients with previously diagnosed laryngopharyngeal reflux or vocal fold paresis and asked to determine the presence of disease. They were then exposed to a random diagnosis purportedly resulting from an ML-CDST and given the opportunity to revise their diagnosis. The ML-CDST output was presented with no explanation, a general explanation, or a specific explanation of its logic. The ML-CDST impact on diagnostic judgement was assessed with McNemar's test. RESULTS Forty-five participants were recruited. When participants reported less confidence (268 observations), they were significantly (p = 0.001) more likely to change their diagnostic judgement after exposure to ML-CDST output compared to when they reported more confidence (238 observations). Participants were more likely to change their diagnostic judgement when presented with a specific explanation of the CDST logic (p = 0.048). CONCLUSIONS Our study suggests that otolaryngologists are susceptible to accepting ML-CDST diagnostic recommendations, especially when less confident. Otolaryngologists' trust in ML-CDST output is increased when accompanied with a specific explanation of its logic. LEVEL OF EVIDENCE 2.
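McNemar's test, as used here, compares paired judgements before and after exposure to the CDST output and bases its inference on the discordant pairs. A hypothetical sketch with invented counts (not the study's data):

```python
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of paired judgements: rows = before CDST, cols = after CDST
#                after correct   after incorrect
table = [[180, 12],    # before correct
         [34,  42]]    # before incorrect

# Exact binomial test on the discordant cells (12 vs 34)
result = mcnemar(table, exact=True)
print(f"statistic={result.statistic}, p={result.pvalue:.4f}")
```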
Affiliation(s)
- Hannah Chen
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Xiaoyue Ma
- Division of Biostatistics, Department of Population Health Sciences, Weill Cornell Medical College, New York, New York, USA
- Hal Rives
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Aisha Serpedin
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Peter Yao
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA
- Anaïs Rameau
- Sean Parker Institute for the Voice, Department of Otolaryngology-Head and Neck Surgery, Weill Cornell Medicine, New York, New York, USA

29
Kotter E, Pinto Dos Santos D. [Ethics and artificial intelligence]. Radiologie (Heidelb) 2024; 64:498-502. [PMID: 38499692 DOI: 10.1007/s00117-024-01286-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Accepted: 02/26/2024] [Indexed: 03/20/2024]
Abstract
The introduction of artificial intelligence (AI) into radiology promises to enhance efficiency and improve diagnostic accuracy, yet it also raises manifold ethical questions. These include data protection issues, the future role of radiologists, liability when using AI systems, and the avoidance of bias. To prevent data bias, the datasets need to be compiled carefully and to be representative of the target population. Accordingly, the upcoming European Union AI act sets particularly high requirements for the datasets used in training medical AI systems. Cognitive bias occurs when radiologists place too much trust in the results provided by AI systems (overreliance). So far, diagnostic AI systems are used almost exclusively as "second look" systems. If diagnostic AI systems are to be used in the future as "first look" systems or even as autonomous AI systems in order to enhance efficiency in radiology, the question of liability needs to be addressed, comparable to liability for autonomous driving. Such use of AI would also significantly change the role of radiologists.
Affiliation(s)
- Elmar Kotter
- Klinik für Diagnostische und Interventionelle Radiologie, Universitätsklinikum Freiburg, Hugstetterstr. 55, 79106, Freiburg, Germany.
- Daniel Pinto Dos Santos
- Institut für Diagnostische und Interventionelle Radiologie, Uniklinik Köln, Kerpener Str. 62, 50937, Köln, Germany.
- Institut für Diagnostische und Interventionelle Radiologie, Universitätsklinik Frankfurt, Theodor-Stern-Kai 7, 60596, Frankfurt am Main, Germany.

30
Hasani AM, Singh S, Zahergivar A, Ryan B, Nethala D, Bravomontenegro G, Mendhiratta N, Ball M, Farhadi F, Malayeri A. Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports. Eur Radiol 2024; 34:3566-3574. [PMID: 37938381 DOI: 10.1007/s00330-023-10384-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Revised: 09/01/2023] [Accepted: 09/08/2023] [Indexed: 11/09/2023]
Abstract
OBJECTIVE Radiology reporting is an essential component of clinical diagnosis and decision-making. With the advent of advanced artificial intelligence (AI) models like GPT-4 (Generative Pre-trained Transformer 4), there is growing interest in evaluating their potential for optimizing or generating radiology reports. This study aimed to compare the quality and content of radiologist-generated and GPT-4 AI-generated radiology reports. METHODS A comparative study design was employed in the study, where a total of 100 anonymized radiology reports were randomly selected and analyzed. Each report was processed by GPT-4, resulting in the generation of a corresponding AI-generated report. Quantitative and qualitative analysis techniques were utilized to assess similarities and differences between the two sets of reports. RESULTS The AI-generated reports showed comparable quality to radiologist-generated reports in most categories. Significant differences were observed in clarity (p = 0.027), ease of understanding (p = 0.023), and structure (p = 0.050), favoring the AI-generated reports. AI-generated reports were more concise, with 34.53 fewer words and 174.22 fewer characters on average, but had greater variability in sentence length. Content similarity was high, with an average Cosine Similarity of 0.85, Sequence Matcher Similarity of 0.52, BLEU Score of 0.5008, and BERTScore F1 of 0.8775. CONCLUSION The results of this proof-of-concept study suggest that GPT-4 can be a reliable tool for generating standardized radiology reports, offering potential benefits such as improved efficiency, better communication, and simplified data extraction and analysis. However, limitations and ethical implications must be addressed to ensure the safe and effective implementation of this technology in clinical practice. CLINICAL RELEVANCE STATEMENT The findings of this study suggest that GPT-4 (Generative Pre-trained Transformer 4), an advanced AI model, has the potential to significantly contribute to the standardization and optimization of radiology reporting, offering improved efficiency and communication in clinical practice. KEY POINTS • Large language model-generated radiology reports exhibited high content similarity and moderate structural resemblance to radiologist-generated reports. • Performance metrics highlighted the strong matching of word selection and order, as well as high semantic similarity between AI and radiologist-generated reports. • Large language model demonstrated potential for generating standardized radiology reports, improving efficiency and communication in clinical settings.
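Each of the reported similarity metrics can be reproduced with standard libraries; the sketch below computes cosine similarity, the SequenceMatcher ratio and a smoothed BLEU score on two toy report strings (illustrative only, not the study's data):

```python
from difflib import SequenceMatcher
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

radiologist = "No focal consolidation. Heart size is normal."
generated = "Heart size normal, with no focal consolidation identified."

# Cosine similarity over TF-IDF vectors of the two reports
tfidf = TfidfVectorizer().fit_transform([radiologist, generated])
print("cosine:", cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Character-level sequence similarity
print("sequence:", SequenceMatcher(None, radiologist, generated).ratio())

# BLEU with smoothing, since short texts often lack higher-order n-gram matches
smooth = SmoothingFunction().method1
print("bleu:", sentence_bleu([radiologist.split()], generated.split(),
                             smoothing_function=smooth))
```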
Affiliation(s)
- Amir M Hasani
- Laboratory of Translational Research, National Heart, Lung, and Blood Institute, NIH, Bethesda, MD, USA
- Shiva Singh
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Aryan Zahergivar
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Beth Ryan
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Daniel Nethala
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Neil Mendhiratta
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Mark Ball
- Urology Oncology Branch, National Cancer Institute, NIH, Bethesda, MD, USA
- Faraz Farhadi
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA
- Ashkan Malayeri
- Radiology & Imaging Sciences Department, Clinical Center, NIH, Bethesda, MD, USA.

31
Yuan W, Du Z, Han S. Semi-supervised skin cancer diagnosis based on self-feedback threshold focal learning. Discov Oncol 2024; 15:180. [PMID: 38776027 PMCID: PMC11111630 DOI: 10.1007/s12672-024-01043-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/22/2024] [Accepted: 05/17/2024] [Indexed: 05/25/2024] Open
Abstract
Worldwide, the prevalence of skin cancer necessitates accurate diagnosis to alleviate public health burdens. Although the application of artificial intelligence in image analysis and pattern recognition has improved the accuracy and efficiency of early skin cancer diagnosis, existing supervised learning methods are limited by their reliance on large amounts of labeled data. To overcome the limitations of data labeling and enhance the performance of diagnostic models, this study proposes a semi-supervised skin cancer diagnostic model based on Self-feedback Threshold Focal Learning (STFL), capable of utilizing partially labeled data and a large pool of unlabeled medical images for training models in unseen scenarios. The proposed model dynamically adjusts the selection threshold for unlabeled samples during training, effectively filtering reliable unlabeled samples, and uses focal learning to mitigate the impact of class imbalance in further training. The study is experimentally validated on the HAM10000 dataset, which includes images of various types of skin lesions, with experiments conducted across different scales of labeled samples. With just 500 annotated samples, the model demonstrates robust performance (0.77 accuracy, 0.6408 Kappa, 0.77 recall, 0.7426 precision, and 0.7462 F1-score), showcasing its efficiency with limited labeled data. Further, comprehensive testing validates the semi-supervised model's significant advancements in diagnostic accuracy and efficiency, underscoring the value of integrating unlabeled data. This model offers a new perspective on medical image processing and contributes robust scientific support for the early diagnosis and treatment of skin cancer.
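The core mechanism described (a dynamically adjusted confidence threshold for pseudo-labelling combined with a focal loss) can be loosely sketched as follows; the threshold update rule and all names here are assumptions for illustration, not the authors' STFL implementation:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: down-weights easy examples to counter class imbalance."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                       # probability of the true class
    return ((1 - pt) ** gamma * ce).mean()

def pseudo_label_step(model, unlabeled_x, threshold):
    """Select confident pseudo-labelled samples and update the threshold."""
    with torch.no_grad():
        probs = torch.softmax(model(unlabeled_x), dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf >= threshold                  # keep only reliable samples
    # A self-feedback rule might raise the threshold as mean confidence
    # grows; this linear update is purely illustrative.
    new_threshold = 0.9 * threshold + 0.1 * conf.mean().item()
    return unlabeled_x[mask], pseudo[mask], new_threshold
```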
Affiliation(s)
- Weicheng Yuan
- College of Basic Medicine, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China
- Zeyu Du
- School of Health Science, University of Manchester, Sackville Street, Manchester, 610101, England, UK
- Shuo Han
- Department of Anatomy, Hebei Medical University, Zhongshan East, Shijiazhuang, 050017, Hebei, China.

32
Rosen S, Saban M. Evaluating the reliability of ChatGPT as a tool for imaging test referral: a comparative study with a clinical decision support system. Eur Radiol 2024; 34:2826-2837. [PMID: 37828297 DOI: 10.1007/s00330-023-10230-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 07/28/2023] [Accepted: 08/01/2023] [Indexed: 10/14/2023]
Abstract
OBJECTIVES As the technology continues to evolve and advance, we can expect to see artificial intelligence (AI) being used in increasingly sophisticated ways to make diagnoses and decisions, such as suggesting the most appropriate imaging referrals. We aim to explore whether Chat Generative Pretrained Transformer (ChatGPT) can provide accurate imaging referrals for clinical use that are at least as good as the ESR iGuide. METHODS A comparative study was conducted in a tertiary hospital. Data was collected from 97 consecutive cases that were admitted to the emergency department with abdominal complaints. We compared the imaging test referral recommendations suggested by the ESR iGuide and ChatGPT and analyzed cases of disagreement. In addition, we selected cases where ChatGPT recommended a chest abdominal pelvis (CAP) CT (n = 66), and asked four specialists to grade the appropriateness of the referral. RESULTS ChatGPT recommendations were consistent with the recommendations provided by the ESR iGuide. No statistical differences were found between the appropriateness of referrals by age or gender. Using a sub-analysis of CAP cases, a high agreement between ChatGPT and the specialists was found. Cases of disagreement (12.4%) were further analyzed and presented themes of vague recommendations such as "it would be advisable" and "this would help to rule out." CONCLUSIONS ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide. Clinical, ethical, and regulatory implications still need to be addressed prior to clinical implementation. Further studies are needed to confirm these findings. CLINICAL RELEVANCE STATEMENT The article explores the potential of using advanced language models, such as ChatGPT, in healthcare as a clinical decision support (CDS) tool for selecting appropriate imaging tests. Using ChatGPT can improve the efficiency of the decision-making process. KEY POINTS: • ChatGPT recommendations were highly consistent with the recommendations provided by the ESR iGuide. • ChatGPT's ability to guide the selection of appropriate tests may be comparable to some degree with the ESR iGuide's.
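Agreement between two recommenders such as ChatGPT and the ESR iGuide is typically summarized as raw agreement, optionally with a chance-corrected statistic; a small sketch with invented labels (not the study's cases):

```python
from sklearn.metrics import cohen_kappa_score

# Invented recommendations for five hypothetical cases
esr_iguide = ["CAP CT", "US abdomen", "CAP CT", "none", "CAP CT"]
chatgpt    = ["CAP CT", "US abdomen", "CAP CT", "CAP CT", "CAP CT"]

# Raw agreement: fraction of cases with identical recommendations
agreement = sum(a == b for a, b in zip(esr_iguide, chatgpt)) / len(chatgpt)
print(f"raw agreement: {agreement:.1%}")                      # 80.0%

# Chance-corrected agreement
print(f"Cohen's kappa: {cohen_kappa_score(esr_iguide, chatgpt):.2f}")
```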
Affiliation(s)
- Shani Rosen
- Department of Health Technology and Policy Evaluation, Gertner Institute for Epidemiology and Health Policy, Institute of Epidemiology & Health Policy Research, Sheba Medical Center, Tel HaShomer, Ramat-Gan, Israel
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Mor Saban
- Nursing Department, School of Health Sciences, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.

33
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement From the ACR, CAR, ESR, RANZCR & RSNA. Can Assoc Radiol J 2024; 75:226-244. [PMID: 38251882 DOI: 10.1177/08465371231222229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2024] Open
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
- Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- Data Science Institute, American College of Radiology, Reston, VA, USA
- Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
- John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, CA, USA
- Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia
- Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
- An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, QC, Canada
- Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- American College of Radiology, Reston, VA, USA
- John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, SA, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia

34
Cecil J, Lermer E, Hudecek MFC, Sauer J, Gaube S. Explainability does not mitigate the negative impact of incorrect AI advice in a personnel selection task. Sci Rep 2024; 14:9736. [PMID: 38679619] [PMCID: PMC11056364] [DOI: 10.1038/s41598-024-60220-5]
Abstract
Despite the rise of decision support systems enabled by artificial intelligence (AI) in personnel selection, their impact on decision-making processes is largely unknown. Consequently, we conducted five experiments (N = 1403 students and Human Resource Management (HRM) employees) investigating how people interact with AI-generated advice in a personnel selection task. In all pre-registered experiments, we presented correct and incorrect advice. In Experiments 1a and 1b, we manipulated the source of the advice (human vs. AI). In Experiments 2a, 2b, and 2c, we further manipulated the type of explainability of AI advice (2a and 2b: heatmaps and 2c: charts). We hypothesized that accurate and explainable advice improves decision-making. Task performance, perceived advice quality, and confidence ratings were regressed on the independent variables. The results consistently showed that incorrect advice negatively impacted performance, as people failed to dismiss it (i.e., overreliance). Additionally, we found that the effects of source and explainability of advice on the dependent variables were limited. The lack of reduction in participants' overreliance on inaccurate advice when the systems' predictions were made more explainable highlights the complexity of human-AI interaction and the need for regulation and quality standards in HRM.
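The overreliance the authors describe has a simple operational form: compare how often advice is followed when it is correct versus when it is incorrect. Below is a minimal sketch of that comparison, assuming a per-trial data layout (`followed`, `advice_correct`) of our own devising, not the authors' actual variables:

```python
import numpy as np

def reliance_rates(followed, advice_correct):
    """Rate of following advice, split by whether the advice was correct.

    followed: per-trial flags, True if the final decision matched the advice.
    advice_correct: per-trial flags, True if the advice was correct.
    Overreliance shows up as a high follow rate on incorrect-advice trials.
    """
    followed = np.asarray(followed, dtype=bool)
    advice_correct = np.asarray(advice_correct, dtype=bool)
    return followed[advice_correct].mean(), followed[~advice_correct].mean()

# Toy example: advice followed on five of six trials.
followed = [True, True, False, True, True, True]
advice_correct = [True, True, True, False, False, False]
rate_ok, rate_bad = reliance_rates(followed, advice_correct)
print(f"followed correct advice: {rate_ok:.2f}; followed incorrect advice: {rate_bad:.2f}")
```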
Affiliation(s)
Julia Cecil
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
Eva Lermer
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- Department of Business Psychology, Technical University of Applied Sciences Augsburg, Augsburg, Germany
Matthias F C Hudecek
- Department of Experimental Psychology, University of Regensburg, Regensburg, Germany
Jan Sauer
- Department of Business Administration, University of Applied Sciences Amberg-Weiden, Weiden, Germany
Susanne Gaube
- Department of Psychology, LMU Center for Leadership and People Management, LMU Munich, Munich, Germany
- UCL Global Business School for Health, University College London, London, UK
35
Jiang T, Chen C, Zhou Y, Cai S, Yan Y, Sui L, Lai M, Song M, Zhu X, Pan Q, Wang H, Chen X, Wang K, Xiong J, Chen L, Xu D. Deep learning-assisted diagnosis of benign and malignant parotid tumors based on ultrasound: a retrospective study. BMC Cancer 2024; 24:510. [PMID: 38654281] [PMCID: PMC11036551] [DOI: 10.1186/s12885-024-12277-8]
Abstract
BACKGROUND To develop a deep learning (DL) model utilizing ultrasound images, and to evaluate its efficacy in distinguishing between benign and malignant parotid tumors (PTs), as well as its practicality in assisting clinicians with accurate diagnosis. METHODS A total of 2211 ultrasound images of 980 pathologically confirmed PTs (training set: n = 721; validation set: n = 82; internal test set: n = 89; external test set: n = 88) from 907 patients were retrospectively included in this study. Five DL networks of varying depths were constructed; the optimal model was selected, and its diagnostic performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). Radiologists of different seniority were then compared with and without the assistance of the optimal model. Additionally, the diagnostic confusion matrix of the optimal model was calculated, and the characteristics of misjudged cases were analyzed and summarized. RESULTS ResNet18 demonstrated superior diagnostic performance, with an AUC of 0.947, accuracy of 88.5%, sensitivity of 78.2%, and specificity of 92.7% in the internal test set, and an AUC of 0.925, accuracy of 89.8%, sensitivity of 83.3%, and specificity of 90.6% in the external test set. The PTs were subjectively assessed twice by six radiologists, with and without the assistance of the model. With the model's assistance, both junior and senior radiologists demonstrated enhanced diagnostic performance: in the internal test set, AUC values increased by 0.062 and 0.082 for the two junior radiologists, and by 0.066 and 0.106 for the two senior radiologists. CONCLUSIONS The DL model based on ultrasound images demonstrates exceptional capability in distinguishing between benign and malignant PTs, assists radiologists of varying expertise levels in achieving heightened diagnostic performance, and can serve as a noninvasive imaging adjunct for clinical diagnosis.
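For readers less familiar with the metrics quoted above, the sketch below shows how AUC, accuracy, sensitivity, and specificity are conventionally derived from a classifier's scores and a decision threshold. It is illustrative only, not the study's code; the 0.5 threshold and toy data are assumptions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def binary_metrics(y_true, y_score, threshold=0.5):
    """AUC plus threshold-based accuracy, sensitivity, and specificity."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "auc": roc_auc_score(y_true, y_score),
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true-positive rate (malignant found)
        "specificity": tn / (tn + fp),  # true-negative rate (benign found)
    }

# Toy example with 8 cases (1 = malignant, 0 = benign).
print(binary_metrics([0, 0, 0, 0, 1, 1, 1, 0],
                     [0.1, 0.3, 0.2, 0.6, 0.8, 0.7, 0.4, 0.2]))
```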
Affiliation(s)
Tian Jiang
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
Chen Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Yahan Zhou
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Shenzhou Cai
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Yuqi Yan
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Lin Sui
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Min Lai
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Second Clinical College, Zhejiang University of Traditional Chinese Medicine, 310022, Hangzhou, Zhejiang, China
Mei Song
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
Xi Zhu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Qianmeng Pan
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Hui Wang
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Xiayi Chen
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
Kai Wang
- Dongyang Hospital Affiliated to Wenzhou Medical University, 322100, Jinhua, Zhejiang, China
Jing Xiong
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 518000, Shenzhen, Guangdong, China
Liyu Chen
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
Dong Xu
- Department of Diagnostic Ultrasound Imaging & Interventional Therapy, Zhejiang Cancer Hospital, Hangzhou Institute of Medicine (HIM), Chinese Academy of Sciences, 310022, Hangzhou, Zhejiang, China
- Postgraduate training base Alliance of Wenzhou Medical University (Zhejiang Cancer Hospital), 310022, Hangzhou, Zhejiang, China
- Zhejiang Provincial Research Center for Cancer Intelligent Diagnosis and Molecular Technology, 310022, Hangzhou, Zhejiang, China
- Wenling Big Data and Artificial Intelligence Institute in Medicine, 317502, Taizhou, Zhejiang, China
- Taizhou Key Laboratory of Minimally Invasive Interventional Therapy & Artificial Intelligence, Taizhou Campus of Zhejiang Cancer Hospital (Taizhou Cancer Hospital), 317502, Taizhou, Zhejiang, China
36
Vaidya A, Chen RJ, Williamson DFK, Song AH, Jaume G, Yang Y, Hartvigsen T, Dyer EC, Lu MY, Lipkova J, Shaban M, Chen TY, Mahmood F. Demographic bias in misdiagnosis by computational pathology models. Nat Med 2024; 30:1174-1190. [PMID: 38641744] [DOI: 10.1038/s41591-024-02885-z]
Abstract
Despite increasing numbers of regulatory approvals, deep learning-based computational pathology systems often overlook the impact of demographic factors on performance, potentially leading to biases. This concern is all the more important as computational pathology has leveraged large public datasets that underrepresent certain demographic groups. Using publicly available data from The Cancer Genome Atlas and the EBRAINS brain tumor atlas, as well as internal patient data, we show that whole-slide image classification models display marked performance disparities across different demographic groups when used to subtype breast and lung carcinomas and to predict IDH1 mutations in gliomas. For example, when using common modeling approaches, we observed performance gaps (in area under the receiver operating characteristic curve) between white and Black patients of 3.0% for breast cancer subtyping, 10.9% for lung cancer subtyping and 16.0% for IDH1 mutation prediction in gliomas. We found that richer feature representations obtained from self-supervised vision foundation models reduce performance variations between groups. These representations provide improvements upon weaker models even when those weaker models are combined with state-of-the-art bias mitigation strategies and modeling choices. Nevertheless, self-supervised vision foundation models do not fully eliminate these discrepancies, highlighting the continuing need for bias mitigation efforts in computational pathology. Finally, we demonstrate that our results extend to other demographic factors beyond patient race. Given these findings, we encourage regulatory and policy agencies to integrate demographic-stratified evaluation into their assessment guidelines.
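The demographic-stratified evaluation the authors encourage reduces to computing the same metric within each subgroup and reporting the gap. A minimal sketch, with hypothetical labels, scores, and group assignments (not the paper's data or code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_by_group(y_true, y_score, groups):
    """Per-group AUC and the largest pairwise gap, for bias auditing."""
    y_true, y_score, groups = map(np.asarray, (y_true, y_score, groups))
    aucs = {g: roc_auc_score(y_true[groups == g], y_score[groups == g])
            for g in np.unique(groups)}
    gap = max(aucs.values()) - min(aucs.values())
    return aucs, gap

# Toy example: predictions for two demographic groups.
y_true  = [1, 0, 1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.3, 0.6, 0.5, 0.55, 0.4]
groups  = ["A", "A", "A", "A", "B", "B", "B", "B"]
aucs, gap = auc_by_group(y_true, y_score, groups)
print(aucs, f"gap = {gap:.3f}")
```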
Affiliation(s)
Anurag Vaidya
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Health Sciences and Technology, Harvard-MIT, Cambridge, MA, USA
Richard J Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Drew F K Williamson
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology and Laboratory Medicine, Emory University School of Medicine, Atlanta, GA, USA
Andrew H Song
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Guillaume Jaume
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Yuzhe Yang
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
Thomas Hartvigsen
- School of Data Science, University of Virginia, Charlottesville, VA, USA
Emma C Dyer
- T.H. Chan School of Public Health, Harvard University, Cambridge, MA, USA
Ming Y Lu
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
Jana Lipkova
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Muhammad Shaban
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Tiffany Y Chen
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
Faisal Mahmood
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Cancer Data Science Program, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Data Science Initiative, Harvard University, Cambridge, MA, USA
37
Balagopalan A, Baldini I, Celi LA, Gichoya J, McCoy LG, Naumann T, Shalit U, van der Schaar M, Wagstaff KL. Machine learning for healthcare that matters: Reorienting from technical novelty to equitable impact. PLOS Digit Health 2024; 3:e0000474. [PMID: 38620047] [PMCID: PMC11018283] [DOI: 10.1371/journal.pdig.0000474]
Abstract
Despite significant technical advances in machine learning (ML) over the past several years, the tangible impact of this technology in healthcare has been limited. This is due not only to the particular complexities of healthcare, but also due to structural issues in the machine learning for healthcare (MLHC) community which broadly reward technical novelty over tangible, equitable impact. We structure our work as a healthcare-focused echo of the 2012 paper "Machine Learning that Matters", which highlighted such structural issues in the ML community at large, and offered a series of clearly defined "Impact Challenges" to which the field should orient itself. Drawing on the expertise of a diverse and international group of authors, we engage in a narrative review and examine issues in the research background environment, training processes, evaluation metrics, and deployment protocols which act to limit the real-world applicability of MLHC. Broadly, we seek to distinguish between machine learning ON healthcare data and machine learning FOR healthcare-the former of which sees healthcare as merely a source of interesting technical challenges, and the latter of which regards ML as a tool in service of meeting tangible clinical needs. We offer specific recommendations for a series of stakeholders in the field, from ML researchers and clinicians, to the institutions in which they work, and the governments which regulate their data access.
Affiliation(s)
Aparna Balagopalan
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
Ioana Baldini
- IBM Research; Yorktown Heights, New York, United States of America
Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology; Cambridge, Massachusetts, United States of America
- Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center; Boston, Massachusetts, United States of America
- Department of Biostatistics, Harvard T.H. Chan School of Public Health; Boston, Massachusetts, United States of America
Judy Gichoya
- Department of Radiology and Imaging Sciences, School of Medicine, Emory University; Atlanta, Georgia, United States of America
Liam G. McCoy
- Division of Neurology, Department of Medicine, University of Alberta; Edmonton, Alberta, Canada
Tristan Naumann
- Microsoft Research; Redmond, Washington, United States of America
Uri Shalit
- The Faculty of Data and Decision Sciences, Technion; Haifa, Israel
Mihaela van der Schaar
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge; Cambridge, United Kingdom
- The Alan Turing Institute; London, United Kingdom
38
Simmons C, DeGrasse J, Polakovic S, Aibinder W, Throckmorton T, Noerdlinger M, Papandrea R, Trenhaile S, Schoch B, Gobbato B, Routman H, Parsons M, Roche CP. Initial clinical experience with a predictive clinical decision support tool for anatomic and reverse total shoulder arthroplasty. Eur J Orthop Surg Traumatol 2024; 34:1307-1318. [PMID: 38095688] [DOI: 10.1007/s00590-023-03796-4]
Abstract
PURPOSE Clinical decision support tools (CDSTs) are software that generate patient-specific assessments that can be used to better inform healthcare provider decision making. Machine learning (ML)-based CDSTs have recently been developed for anatomic (aTSA) and reverse (rTSA) total shoulder arthroplasty to facilitate more data-driven, evidence-based decision making. Using this shoulder CDST as an example, this external validation study provides an overview of how ML-based algorithms are developed and discusses the limitations of these tools. METHODS An external validation for a novel CDST was conducted on 243 patients (120F/123M) who received a personalized prediction prior to surgery and had short-term clinical follow-up from 3 months to 2 years after primary aTSA (n = 43) or rTSA (n = 200). The outcome score and active range of motion predictions were compared to each patient's actual result at each timepoint, with the accuracy quantified by the mean absolute error (MAE). RESULTS This external validation demonstrates that the CDST's accuracy is similar to (within 10% of) or better than the MAEs from the published internal validation. A few predictive models were observed to have substantially lower MAEs than in the internal validation, specifically the Constant score (31.6% better), active abduction (22.5% better), global shoulder function (20.0% better), active external rotation (19.0% better), and active forward elevation (16.2% better), which is encouraging; however, the sample size was small. CONCLUSION A greater understanding of the limitations of ML-based CDSTs will facilitate more responsible use and build trust and confidence, potentially leading to greater adoption. As CDSTs evolve, we anticipate greater shared decision making between the patient and surgeon with the aim of achieving even better outcomes and greater levels of patient satisfaction.
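The validation metric used here, the mean absolute error between predicted and observed outcomes at each follow-up timepoint, is straightforward to reproduce. A minimal sketch, assuming a simple (timepoint, predicted, actual) record layout of our own devising rather than the study's data schema:

```python
import numpy as np

def mae_by_timepoint(records):
    """Mean absolute error of outcome predictions, grouped by follow-up time.

    records: list of (timepoint, predicted, actual) tuples, e.g. predicted
    vs. observed active forward elevation in degrees.
    """
    maes = {}
    for tp in {r[0] for r in records}:
        errs = [abs(pred - actual) for t, pred, actual in records if t == tp]
        maes[tp] = float(np.mean(errs))
    return maes

# Toy example: predictions at 3-month and 1-year follow-up.
records = [("3m", 120, 110), ("3m", 135, 140), ("1y", 150, 155), ("1y", 145, 138)]
print(mae_by_timepoint(records))  # {'3m': 7.5, '1y': 6.0}
```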
Affiliation(s)
Chelsey Simmons
- University of Florida, PO Box 116250, Gainesville, FL, 32605, USA
- Exactech, 2320 NW 66th Court, Gainesville, FL, 32653, USA
William Aibinder
- University of Michigan, 1500 E. Medical Center Drive, Ann Arbor, MI, 48109, USA
Mayo Noerdlinger
- Atlantic Orthopaedics and Sports Medicine, 1900 Lafayette Road, Portsmouth, NH, USA
Bradley Schoch
- Mayo Clinic, Florida, 4500 San Pablo Rd., Jacksonville, FL, 32224, USA
Bruno Gobbato
- R. José Emmendoerfer, 1449, Nova Brasília, Jaraguá do Sul, SC, 89252-278, Brazil
Howard Routman
- Atlantis Orthopedics, 900 Village Square Crossing, #170, Palm Beach Gardens, FL, 33410, USA
Moby Parsons
- 333 Borthwick Ave Suite #301, Portsmouth, NH, 03801, USA
39
Ciet P, Eade C, Ho ML, Laborie LB, Mahomed N, Naidoo J, Pace E, Segal B, Toso S, Tschauner S, Vamyanmane DK, Wagner MW, Shelmerdine SC. The unintended consequences of artificial intelligence in paediatric radiology. Pediatr Radiol 2024; 54:585-593. [PMID: 37665368] [DOI: 10.1007/s00247-023-05746-y]
Abstract
Over the past decade, there has been a dramatic rise in interest relating to the application of artificial intelligence (AI) in radiology. Originally only 'narrow' AI tasks were possible; however, with the increasing availability of data, teamed with ease of access to powerful computer processing capabilities, we are becoming more able to generate complex and nuanced prediction models and elaborate solutions for healthcare. Nevertheless, these AI models are not without their failings, and sometimes the intended use for these solutions may not lead to predictable impacts for patients, society or those working within the healthcare profession. In this article, we provide an overview of the latest opinions regarding AI ethics, bias, limitations, challenges and considerations that we should all contemplate in this exciting and expanding field, with special attention to how this applies to the unique aspects of a paediatric population. By embracing AI technology and fostering a multidisciplinary approach, it is hoped that we can harness the power AI brings whilst minimising harm and ensuring a beneficial impact on radiology practice.
Affiliation(s)
Pierluigi Ciet
- Department of Radiology and Nuclear Medicine, Erasmus MC - Sophia's Children's Hospital, Rotterdam, The Netherlands
- Department of Medical Sciences, University of Cagliari, Cagliari, Italy
Mai-Lan Ho
- University of Missouri, Columbia, MO, USA
Lene Bjerke Laborie
- Department of Radiology, Section for Paediatrics, Haukeland University Hospital, Bergen, Norway
- Department of Clinical Medicine, University of Bergen, Bergen, Norway
Nasreen Mahomed
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
Jaishree Naidoo
- Paediatric Diagnostic Imaging, Dr J Naidoo Inc., Johannesburg, South Africa
- Envisionit Deep AI Ltd, Coveham House, Downside Bridge Road, Cobham, UK
Erika Pace
- Department of Diagnostic Radiology, The Royal Marsden NHS Foundation Trust, London, UK
Bradley Segal
- Department of Radiology, University of Witwatersrand, Johannesburg, South Africa
Seema Toso
- Pediatric Radiology, Children's Hospital, University Hospitals of Geneva, Geneva, Switzerland
Sebastian Tschauner
- Division of Paediatric Radiology, Department of Radiology, Medical University of Graz, Graz, Austria
Dhananjaya K Vamyanmane
- Department of Pediatric Radiology, Indira Gandhi Institute of Child Health, Bangalore, India
Matthias W Wagner
- Department of Diagnostic Imaging, Division of Neuroradiology, The Hospital for Sick Children, Toronto, Canada
- Department of Medical Imaging, University of Toronto, Toronto, ON, Canada
- Department of Neuroradiology, University Hospital Augsburg, Augsburg, Germany
Susan C Shelmerdine
- Department of Clinical Radiology, Great Ormond Street Hospital for Children NHS Foundation Trust, Great Ormond Street, London, WC1H 3JH, UK
- Great Ormond Street Hospital for Children, UCL Great Ormond Street Institute of Child Health, London, UK
- NIHR Great Ormond Street Hospital Biomedical Research Centre, 30 Guilford Street, Bloomsbury, London, UK
- Department of Clinical Radiology, St George's Hospital, London, UK
40
Anderson JW, Visweswaran S. Algorithmic Individual Fairness and Healthcare: A Scoping Review. medRxiv [Preprint] 2024:2024.03.25.24304853. [PMID: 38585746] [PMCID: PMC10996729] [DOI: 10.1101/2024.03.25.24304853]
Abstract
Objective Statistical and artificial intelligence algorithms are increasingly being developed for use in healthcare. These algorithms may reflect biases that magnify disparities in clinical care, and there is a growing need for understanding how algorithmic biases can be mitigated in pursuit of algorithmic fairness. Individual fairness in algorithms constrains algorithms to the notion that "similar individuals should be treated similarly." We conducted a scoping review on algorithmic individual fairness to understand the current state of research in the metrics and methods developed to achieve individual fairness and its applications in healthcare. Methods We searched three databases, PubMed, ACM Digital Library, and IEEE Xplore, for algorithmic individual fairness metrics, algorithmic bias mitigation, and healthcare applications. Our search was restricted to articles published between January 2013 and September 2023. We identified 1,886 articles through database searches and one additional article manually; 30 articles were included in the review. Data from the selected articles were extracted, and the findings were synthesized. Results Based on the 30 articles in the review, we identified several themes, including philosophical underpinnings of fairness, individual fairness metrics, mitigation methods for achieving individual fairness, implications of achieving individual fairness on group fairness and vice versa, fairness metrics that combined individual fairness and group fairness, software for measuring and optimizing individual fairness, and applications of individual fairness in healthcare. Conclusion While there has been significant work on algorithmic individual fairness in recent years, the definition, use, and study of individual fairness remain in their infancy, especially in healthcare. Future research is needed to apply and evaluate individual fairness in healthcare comprehensively.
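The constraint that "similar individuals should be treated similarly" is commonly formalized as a Lipschitz condition: the difference between a model's outputs for two individuals should be bounded by the distance between the individuals themselves. A minimal audit sketch follows, in which the Euclidean similarity metric and the bound L are illustrative assumptions, not choices endorsed by the reviewed literature:

```python
import numpy as np

def individual_fairness_violations(X, scores, L=1.0):
    """Count pairs violating |f(x_i) - f(x_j)| <= L * d(x_i, x_j).

    X: (n, d) feature matrix; scores: model outputs f(x); L: Lipschitz bound.
    Uses Euclidean distance as the similarity metric (a modelling choice).
    """
    X, scores = np.asarray(X, float), np.asarray(scores, float)
    violations = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            d = np.linalg.norm(X[i] - X[j])
            if abs(scores[i] - scores[j]) > L * d:
                violations.append((i, j))
    return violations

# Toy example: two near-identical individuals with very different scores.
X = [[0.0, 1.0], [0.05, 1.0], [3.0, 2.0]]
scores = [0.2, 0.9, 0.5]
print(individual_fairness_violations(X, scores))  # [(0, 1)]
```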
Affiliation(s)
Shyam Visweswaran
- Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
41
Wei ML, Tada M, So A, Torres R. Artificial intelligence and skin cancer. Front Med (Lausanne) 2024; 11:1331895. [PMID: 38566925] [PMCID: PMC10985205] [DOI: 10.3389/fmed.2024.1331895]
Abstract
Artificial intelligence is poised to rapidly reshape many fields, including that of skin cancer screening and diagnosis, both as a disruptive and assistive technology. Together with the collection and availability of large medical data sets, artificial intelligence will become a powerful tool that can be leveraged by physicians in their diagnoses and treatment plans for patients. This comprehensive review focuses on current progress toward AI applications for patients, primary care providers, dermatologists, and dermatopathologists, explores the diverse applications of image and molecular processing for skin cancer, and highlights AI's potential for patient self-screening and improving diagnostic accuracy for non-dermatologists. We additionally delve into the challenges and barriers to clinical implementation, paths forward for implementation and areas of active research.
Affiliation(s)
Maria L. Wei
- Department of Dermatology, University of California, San Francisco, San Francisco, CA, United States
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
Mikio Tada
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, United States
Alexandra So
- School of Medicine, University of California, San Francisco, San Francisco, CA, United States
Rodrigo Torres
- Dermatology Service, San Francisco VA Health Care System, San Francisco, CA, United States
42
Campion JR, O'Connor DB, Lahiff C. Human-artificial intelligence interaction in gastrointestinal endoscopy. World J Gastrointest Endosc 2024; 16:126-135. [PMID: 38577646] [PMCID: PMC10989254] [DOI: 10.4253/wjge.v16.i3.126]
Abstract
The number and variety of applications of artificial intelligence (AI) in gastrointestinal (GI) endoscopy is growing rapidly. New technologies based on machine learning (ML) and convolutional neural networks (CNNs) are at various stages of development and deployment to assist patients and endoscopists in preparing for endoscopic procedures, in detection, diagnosis and classification of pathology during endoscopy and in confirmation of key performance indicators. Platforms based on ML and CNNs require regulatory approval as medical devices. Interactions between humans and the technologies we use are complex and are influenced by design, behavioural and psychological elements. Due to the substantial differences between AI and prior technologies, important differences may be expected in how we interact with advice from AI technologies. Human–AI interaction (HAII) may be optimised by developing AI algorithms to minimise false positives and designing platform interfaces to maximise usability. Human factors influencing HAII may include automation bias, alarm fatigue, algorithm aversion, learning effect and deskilling. Each of these areas merits further study in the specific setting of AI applications in GI endoscopy and professional societies should engage to ensure that sufficient emphasis is placed on human-centred design in development of new AI technologies.
Affiliation(s)
John R Campion
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
Donal B O'Connor
- Department of Surgery, Trinity College Dublin, Dublin D02 R590, Ireland
Conor Lahiff
- Department of Gastroenterology, Mater Misericordiae University Hospital, Dublin D07 AX57, Ireland
- School of Medicine, University College Dublin, Dublin D04 C7X2, Ireland
43
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Pinto Dos Santos D, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: Practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. J Med Imaging Radiat Oncol 2024; 68:7-26. [PMID: 38259140] [DOI: 10.1111/1754-9485.13612]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Affiliation(s)
Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, Alabama, USA
- American College of Radiology Data Science Institute, Reston, Virginia, USA
Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, Ontario, Canada
Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Nina Kottler
- Radiology Partners, El Segundo, California, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, California, USA
John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, USA
Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, South Australia, Australia
Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montreal, Quebec, Canada
Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, Massachusetts, USA
- Tufts University Medical School, Boston, Massachusetts, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Reston, Virginia, USA
John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, South Australia, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia
44
Groh M, Badri O, Daneshjou R, Koochek A, Harris C, Soenksen LR, Doraiswamy PM, Picard R. Deep learning-aided decision support for diagnosis of skin disease across skin tones. Nat Med 2024; 30:573-583. [PMID: 38317019] [PMCID: PMC10878981] [DOI: 10.1038/s41591-023-02728-3]
Abstract
Although advances in deep learning systems for image-based medical diagnosis demonstrate their potential to augment clinical decision-making, the effectiveness of physician-machine partnerships remains an open question, in part because physicians and algorithms are both susceptible to systematic errors, especially for diagnosis of underrepresented populations. Here we present results from a large-scale digital experiment involving board-certified dermatologists (n = 389) and primary-care physicians (n = 459) from 39 countries to evaluate the accuracy of diagnoses submitted by physicians in a store-and-forward teledermatology simulation. In this experiment, physicians were presented with 364 images spanning 46 skin diseases and asked to submit up to four differential diagnoses. Specialists and generalists achieved diagnostic accuracies of 38% and 19%, respectively, but both specialists and generalists were four percentage points less accurate for the diagnosis of images of dark skin as compared to light skin. Fair deep learning system decision support improved the diagnostic accuracy of both specialists and generalists by more than 33%, but exacerbated the gap in the diagnostic accuracy of generalists across skin tones. These results demonstrate that well-designed physician-machine partnerships can enhance the diagnostic accuracy of physicians, illustrating that success in improving overall diagnostic accuracy does not necessarily address bias.
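Diagnostic accuracy in this design is whether the ground-truth disease appears among a clinician's submitted differentials, computed separately per skin-tone group. A minimal sketch of that stratified top-k accuracy, with a hypothetical record layout that is not the study's schema:

```python
def topk_accuracy_by_group(cases, k=4):
    """Fraction of cases whose ground-truth diagnosis appears among the
    first k submitted differentials, stratified by a group key (skin tone).

    cases: dicts with 'truth', 'differentials' (ordered list), and 'group'
    keys (an illustrative layout of our own devising).
    """
    hits, totals = {}, {}
    for c in cases:
        g = c["group"]
        totals[g] = totals.get(g, 0) + 1
        if c["truth"] in c["differentials"][:k]:
            hits[g] = hits.get(g, 0) + 1
    return {g: hits.get(g, 0) / totals[g] for g in totals}

# Toy example with two cases, one per group.
cases = [
    {"truth": "psoriasis", "differentials": ["eczema", "psoriasis"], "group": "light"},
    {"truth": "lichen planus", "differentials": ["eczema"], "group": "dark"},
]
print(topk_accuracy_by_group(cases))  # {'light': 1.0, 'dark': 0.0}
```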
Affiliation(s)
Matthew Groh
- Northwestern University Kellogg School of Management, Evanston, IL, USA
- MIT Media Lab, Cambridge, MA, USA
Omar Badri
- Northeast Dermatology Associates, Beverly, MA, USA
Roxana Daneshjou
- Stanford Department of Biomedical Data Science, Stanford, CA, USA
- Stanford Department of Dermatology, Redwood City, CA, USA
Luis R Soenksen
- Wyss Institute for Bioinspired Engineering at Harvard, Boston, MA, USA
P Murali Doraiswamy
- MIT Media Lab, Cambridge, MA, USA
- Duke University School of Medicine, Durham, NC, USA
45
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, Dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, purchasing, implementing and monitoring AI tools in radiology: practical considerations. A multi-society statement from the ACR, CAR, ESR, RANZCR & RSNA. Insights Imaging 2024; 15:16. [PMID: 38246898] [PMCID: PMC10800328] [DOI: 10.1186/s13244-023-01541-3]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools.
Key points
• The incorporation of artificial intelligence (AI) in radiological practice demands increased monitoring of its utility and safety.
• Cooperation between developers, clinicians, and regulators will allow all involved to address ethical issues and monitor AI performance.
• AI can fulfil its promise to advance patient well-being if all steps from development to integration in healthcare are rigorously evaluated.
Affiliation(s)
Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
Daniel Pinto Dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
46
Nguyen T. ChatGPT in Medical Education: A Precursor for Automation Bias? JMIR Med Educ 2024; 10:e50174. [PMID: 38231545] [PMCID: PMC10831594] [DOI: 10.2196/50174]
Abstract
Artificial intelligence (AI) in health care has the promise of providing accurate and efficient results. However, AI can also be a black box, where the logic behind its results is nonrational. There are concerns if these questionable results are used in patient care. As physicians have the duty to provide care based on their clinical judgment in addition to their patients' values and preferences, it is crucial that physicians validate the results from AI. Yet, there are some physicians who exhibit a phenomenon known as automation bias, where there is an assumption from the user that AI is always right. This is a dangerous mindset, as users exhibiting automation bias will not validate the results, given their trust in AI systems. Several factors impact a user's susceptibility to automation bias, such as inexperience or being born in the digital age. In this editorial, I argue that these factors and a lack of AI education in the medical school curriculum cause automation bias. I also explore the harms of automation bias and why prospective physicians need to be vigilant when using AI. Furthermore, it is important to consider what attitudes are being taught to students when introducing ChatGPT, which could be some students' first time using AI, prior to their use of AI in the clinical setting. Therefore, in attempts to avoid the problem of automation bias in the long-term, in addition to incorporating AI education into the curriculum, as is necessary, the use of ChatGPT in medical education should be limited to certain tasks. Otherwise, having no constraints on what ChatGPT should be used for could lead to automation bias.
Affiliation(s)
Tina Nguyen
- The University of Texas Medical Branch, Galveston, TX, United States
47
Dot G, Gajny L, Ducret M. [The challenges of artificial intelligence in odontology]. Med Sci (Paris) 2024; 40:79-84. [PMID: 38299907] [DOI: 10.1051/medsci/2023199]
Abstract
Artificial intelligence has numerous potential applications in dentistry, as these algorithms aim to improve the efficiency and safety of several clinical situations. While the first commercial solutions are being proposed, most of these algorithms have not been sufficiently validated for clinical use. This article describes the challenges surrounding the development of these new tools, to help clinicians to keep a critical eye on this technology.
Affiliation(s)
Gauthier Dot
- UFR odontologie, université Paris Cité, Paris, France
- AP-HP, hôpital Pitié-Salpêtrière, service de médecine bucco-dentaire, Paris, France
- Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
Laurent Gajny
- Institut de biomécanique humaine Georges Charpak, école nationale supérieure d'Arts et Métiers, Paris, France
Maxime Ducret
- Faculté d'odontologie, université Claude Bernard Lyon 1, hospices civils de Lyon, Lyon, France
48
Brady AP, Allen B, Chong J, Kotter E, Kottler N, Mongan J, Oakden-Rayner L, dos Santos DP, Tang A, Wald C, Slavotinek J. Developing, Purchasing, Implementing and Monitoring AI Tools in Radiology: Practical Considerations. A Multi-Society Statement from the ACR, CAR, ESR, RANZCR and RSNA. Radiol Artif Intell 2024; 6:e230513. [PMID: 38251899] [PMCID: PMC10831521] [DOI: 10.1148/ryai.230513]
Abstract
Artificial Intelligence (AI) carries the potential for unprecedented disruption in radiology, with possible positive and negative consequences. The integration of AI in radiology holds the potential to revolutionize healthcare practices by advancing diagnosis, quantification, and management of multiple medical conditions. Nevertheless, the ever-growing availability of AI tools in radiology highlights an increasing need to critically evaluate claims for its utility and to differentiate safe product offerings from potentially harmful, or fundamentally unhelpful ones. This multi-society paper, presenting the views of Radiology Societies in the USA, Canada, Europe, Australia, and New Zealand, defines the potential practical problems and ethical issues surrounding the incorporation of AI into radiological practice. In addition to delineating the main points of concern that developers, regulators, and purchasers of AI tools should consider prior to their introduction into clinical practice, this statement also suggests methods to monitor their stability and safety in clinical use, and their suitability for possible autonomous function. This statement is intended to serve as a useful summary of the practical issues which should be considered by all parties involved in the development of radiology AI resources, and their implementation as clinical tools. This article is simultaneously published in Insights into Imaging (DOI 10.1186/s13244-023-01541-3), Journal of Medical Imaging and Radiation Oncology (DOI 10.1111/1754-9485.13612), Canadian Association of Radiologists Journal (DOI 10.1177/08465371231222229), Journal of the American College of Radiology (DOI 10.1016/j.jacr.2023.12.005), and Radiology: Artificial Intelligence (DOI 10.1148/ryai.230513). Keywords: Artificial Intelligence, Radiology, Automation, Machine Learning. Published under a CC BY 4.0 license. ©The Author(s) 2024. Editor's Note: The RSNA Board of Directors has endorsed this article. It has not undergone review or editing by this journal.
Affiliation(s)
Bibb Allen
- Department of Radiology, Grandview Medical Center, Birmingham, AL, USA
- American College of Radiology Data Science Institute, Reston, VA, USA
Jaron Chong
- Department of Medical Imaging, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
Elmar Kotter
- Department of Diagnostic and Interventional Radiology, Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
Nina Kottler
- Radiology Partners, El Segundo, CA, USA
- Stanford Center for Artificial Intelligence in Medicine & Imaging, Palo Alto, CA, USA
John Mongan
- Department of Radiology and Biomedical Imaging, University of California, San Francisco, USA
Lauren Oakden-Rayner
- Australian Institute for Machine Learning, University of Adelaide, Adelaide, Australia
Daniel Pinto dos Santos
- Department of Radiology, University Hospital of Cologne, Cologne, Germany
- Department of Radiology, University Hospital of Frankfurt, Frankfurt, Germany
An Tang
- Department of Radiology, Radiation Oncology, and Nuclear Medicine, Université de Montréal, Montréal, Québec, Canada
Christoph Wald
- Department of Radiology, Lahey Hospital & Medical Center, Burlington, MA, USA
- Tufts University Medical School, Boston, MA, USA
- Commission On Informatics, and Member, Board of Chancellors, American College of Radiology, Virginia, USA
John Slavotinek
- South Australia Medical Imaging, Flinders Medical Centre Adelaide, Adelaide, Australia
- College of Medicine and Public Health, Flinders University, Adelaide, Australia
49
Teneggi J, Yi PH, Sulam J. Examination-Level Supervision for Deep Learning-based Intracranial Hemorrhage Detection on Head CT Scans. Radiol Artif Intell 2024; 6:e230159. [PMID: 38294324] [PMCID: PMC10831525] [DOI: 10.1148/ryai.230159]
Abstract
Purpose To compare the effectiveness of weak supervision (ie, with examination-level labels only) and strong supervision (ie, with image-level labels) in training deep learning models for detection of intracranial hemorrhage (ICH) on head CT scans. Materials and Methods In this retrospective study, an attention-based convolutional neural network was trained with either local (ie, image level) or global (ie, examination level) binary labels on the Radiological Society of North America (RSNA) 2019 Brain CT Hemorrhage Challenge dataset of 21 736 examinations (8876 [40.8%] ICH) and 752 422 images (107 784 [14.3%] ICH). The CQ500 (436 examinations; 212 [48.6%] ICH) and CT-ICH (75 examinations; 36 [48.0%] ICH) datasets were employed for external testing. Performance in detecting ICH was compared between weak (examination-level labels) and strong (image-level labels) learners as a function of the number of labels available during training. Results On examination-level binary classification, strong and weak learners did not have different area under the receiver operating characteristic curve values on the internal validation split (0.96 vs 0.96; P = .64) and the CQ500 dataset (0.90 vs 0.92; P = .15). Weak learners outperformed strong ones on the CT-ICH dataset (0.95 vs 0.92; P = .03). Weak learners had better section-level ICH detection performance when more than 10 000 labels were available for training (average f1 = 0.73 vs 0.65; P < .001). Weakly supervised models trained on the entire RSNA dataset required 35 times fewer labels than equivalent strong learners. Conclusion Strongly supervised models did not achieve better performance than weakly supervised ones, which could reduce radiologist labor requirements for prospective dataset curation. Keywords: CT, Head/Neck, Brain/Brain Stem, Hemorrhage Supplemental material is available for this article. © RSNA, 2023 See also commentary by Wahid and Fuentes in this issue.
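Weak (examination-level) supervision of this kind typically aggregates per-image scores into a single examination-level prediction, so only one label per examination is needed for training. The sketch below shows two common pooling choices; it is a simplified stand-in under our own assumptions, not the authors' attention-based architecture:

```python
import numpy as np

def exam_level_score(image_scores, mode="attention", temperature=1.0):
    """Aggregate per-slice ICH scores into one examination-level score.

    image_scores: 1D array of per-image probabilities from a CNN.
    'max' pooling flags the single most suspicious slice; 'attention'
    uses softmax weights so high-scoring slices dominate smoothly.
    """
    s = np.asarray(image_scores, dtype=float)
    if mode == "max":
        return float(s.max())
    weights = np.exp(s / temperature)
    weights /= weights.sum()          # softmax attention over slices
    return float(np.dot(weights, s))  # weighted mean of slice scores

scan = [0.02, 0.05, 0.91, 0.10]       # one suspicious slice in the exam
print(exam_level_score(scan, "max"))        # 0.91
print(exam_level_score(scan, "attention"))  # ~0.43, pulled toward the high slice
```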
Collapse
Affiliation(s)
- Jacopo Teneggi, Paul H. Yi, Jeremias Sulam
- From the Department of Computer Science (J.T.), Department of Biomedical Engineering (J.S.), and Mathematical Institute for Data Science (MINDS) (J.S., J.T.), Johns Hopkins University, 3400 N Charles St, Clark Hall, Suite 320, Baltimore, MD 21218; and University of Maryland Medical Intelligent Imaging Center (UM2ii), Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, Baltimore, Md (P.H.Y.)
|
50
|
Jabbour S, Fouhey D, Shepard S, Valley TS, Kazerooni EA, Banovic N, Wiens J, Sjoding MW. Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study. JAMA 2023; 330:2275-2284. [PMID: 38112814 PMCID: PMC10731487 DOI: 10.1001/jama.2023.22295] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 10/11/2023] [Indexed: 12/21/2023]
Abstract
Importance Artificial intelligence (AI) could support clinicians when diagnosing hospitalized patients; however, systematic bias in AI models could worsen clinician diagnostic accuracy. Recent regulatory guidance has called for AI models to include explanations to mitigate errors made by models, but the effectiveness of this strategy has not been established. Objectives To evaluate the impact of systematically biased AI on clinician diagnostic accuracy and to determine if image-based AI model explanations can mitigate model errors. Design, Setting, and Participants Randomized clinical vignette survey study administered between April 2022 and January 2023 across 13 US states involving hospitalist physicians, nurse practitioners, and physician assistants. Interventions Clinicians were shown 9 clinical vignettes of patients hospitalized with acute respiratory failure, including their presenting symptoms, physical examination, laboratory results, and chest radiographs. Clinicians were then asked to determine the likelihood of pneumonia, heart failure, or chronic obstructive pulmonary disease as the underlying cause(s) of each patient's acute respiratory failure. To establish baseline diagnostic accuracy, clinicians were shown 2 vignettes without AI model input. Clinicians were then randomized to see 6 vignettes with AI model input with or without AI model explanations. Among these 6 vignettes, 3 vignettes included standard-model predictions, and 3 vignettes included systematically biased model predictions. Main Outcomes and Measures Clinician diagnostic accuracy for pneumonia, heart failure, and chronic obstructive pulmonary disease. Results Median participant age was 34 years (IQR, 31-39) and 241 (57.7%) were female. Four hundred fifty-seven clinicians were randomized and completed at least 1 vignette, with 231 randomized to AI model predictions without explanations, and 226 randomized to AI model predictions with explanations. Clinicians' baseline diagnostic accuracy was 73.0% (95% CI, 68.3% to 77.8%) for the 3 diagnoses. When shown a standard AI model without explanations, clinician accuracy increased over baseline by 2.9 percentage points (95% CI, 0.5 to 5.2) and by 4.4 percentage points (95% CI, 2.0 to 6.9) when clinicians were also shown AI model explanations. Systematically biased AI model predictions decreased clinician accuracy by 11.3 percentage points (95% CI, 7.2 to 15.5) compared with baseline and providing biased AI model predictions with explanations decreased clinician accuracy by 9.1 percentage points (95% CI, 4.9 to 13.2) compared with baseline, representing a nonsignificant improvement of 2.3 percentage points (95% CI, -2.7 to 7.2) compared with the systematically biased AI model. Conclusions and Relevance Although standard AI models improve diagnostic accuracy, systematically biased AI models reduced diagnostic accuracy, and commonly used image-based AI model explanations did not mitigate this harmful effect. Trial Registration ClinicalTrials.gov Identifier: NCT06098950.
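The headline effects combine by simple percentage-point arithmetic, reproduced in the short Python sketch below. This is a back-of-the-envelope illustration only, not the trial's statistical analysis (the study modeled accuracy with clinician-level clustering); it prints a mitigation of 2.2 points rather than the reported 2.3 because the published deltas are rounded.

    # Reported point estimates from the abstract, in percentage points (pp).
    baseline = 73.0  # baseline diagnostic accuracy without AI input, %

    effect_pp = {
        "standard AI, no explanations":   +2.9,
        "standard AI, with explanations": +4.4,
        "biased AI, no explanations":    -11.3,
        "biased AI, with explanations":   -9.1,
    }

    for condition, delta in effect_pp.items():
        print(f"{condition:32s} -> {baseline + delta:5.1f}% accuracy")

    # How much of the harm from biased predictions did explanations recover?
    mitigation = (effect_pp["biased AI, with explanations"]
                  - effect_pp["biased AI, no explanations"])
    # ~2.2 pp here (2.3 pp reported before rounding); its 95% CI of
    # -2.7 to 7.2 pp includes zero, so the mitigation was nonsignificant.
    print(f"mitigation by explanations: {mitigation:.1f} pp")

The arithmetic makes the key finding easy to see: explanations recovered only a small, statistically nonsignificant fraction of the roughly 11-point accuracy loss caused by systematically biased predictions.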
Affiliation(s)
- Sarah Jabbour: Computer Science and Engineering, University of Michigan, Ann Arbor
- David Fouhey: Computer Science and Engineering, University of Michigan, Ann Arbor; now with the Courant Institute (Computer Science) and the Tandon School of Engineering (Electrical and Computer Engineering), New York University, New York
- Thomas S. Valley: Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
- Ella A. Kazerooni: Department of Radiology, University of Michigan Medical School, Ann Arbor
- Nikola Banovic: Computer Science and Engineering, University of Michigan, Ann Arbor
- Jenna Wiens: Computer Science and Engineering, University of Michigan, Ann Arbor
- Michael W. Sjoding: Pulmonary and Critical Care Medicine, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor
|