Alkhawaldeh IM, Albalkhi I, Naswhan AJ. Challenges and limitations of synthetic minority oversampling techniques in machine learning. World J Methodol 2023; 13(5): 373-378 [PMID: 38229946 DOI: 10.5662/wjm.v13.i5.373]
Corresponding Author of This Article
Abdulqadir Jeprel Naswhan, MSc, RN, Director, Research Scientist, Senior Lecturer, Senior Researcher, Nursing for Education and Practice Development, Hamad Medical Corporation, Rayyan Road, Doha 3050, Qatar. anashwan@hamad.qa
Research Domain of This Article
Methodology
Article-Type of This Article
Editorial
Open-Access Policy of This Article
This article is an open-access article which was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
World J Methodol. Dec 20, 2023; 13(5): 373-378. Published online Dec 20, 2023. doi: 10.5662/wjm.v13.i5.373
Challenges and limitations of synthetic minority oversampling techniques in machine learning
Ibraheem M Alkhawaldeh, Ibrahem Albalkhi, Abdulqadir Jeprel Naswhan
Ibraheem M Alkhawaldeh, Faculty of Medicine, Mutah University, Karak 61710, Jordan
Ibrahem Albalkhi, Department of Neuroradiology, Alfaisal University, Great Ormond Street Hospital NHS Foundation Trust, London WC1N 3JH, United Kingdom
Abdulqadir Jeprel Naswhan, Nursing for Education and Practice Development, Hamad Medical Corporation, Doha 3050, Qatar
Author contributions: Alkhawaldeh IM, Albalkhi I, and Naswhan AJ contributed to writing and editing the manuscript, preparing the illustrations, and reviewing the literature; Alkhawaldeh IM and Naswhan AJ designed the overall concept and outline of the manuscript.
Conflict-of-interest statement: All the authors report no relevant conflicts of interest for this article.
Received: September 21, 2023; Peer-review started: September 21, 2023; First decision: September 29, 2023; Revised: September 30, 2023; Accepted: November 3, 2023; Article in press: November 3, 2023; Published online: December 20, 2023; Processing time: 89 Days and 20.3 Hours
Core Tip
Core Tip: Class imbalance in medical datasets calls for a cautious approach in machine learning applications. Oversampling methods such as the synthetic minority oversampling technique (SMOTE) are widely used, but their limitations must be recognized: they may introduce synthetic instances that do not accurately represent the minority class, which can lead to overfitting and unreliable results in real-world medical scenarios. Alternatives such as ensemble learning-based methods, including XGBoost and Easy Ensemble, have shown promise in mitigating bias and providing more robust performance. Designing and validating these techniques in collaboration with data science specialists and medical professionals is essential to ensure that they are reliable and effective in medical applications.
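To make the concern about unrepresentative synthetic instances concrete, the following is a minimal sketch, not the authors' method and not a library implementation. It assumes a simplified form of the synthetic minority oversampling technique in which each new point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_like`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Hypothetical, simplified SMOTE-style oversampler.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # relative position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority class (assumed data): two small, well-separated clusters.
minority = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
synthetic = smote_like(minority, n_new=20, k=2)

# With k=2, a point's second-nearest neighbour sits in the other cluster,
# so some synthetic points can land in the empty region between clusters.
in_gap = [p for p in synthetic if 1.0 < p[0] < 9.0]
print(f"{len(in_gap)} of {len(synthetic)} synthetic points fall between the clusters")
```

In this toy setting, any synthetic point that falls between the two clusters occupies a region where no real minority case was observed. If majority-class cases lie there, a model trained on the augmented data may learn a decision boundary that does not hold on real data, which is the kind of risk the Core Tip describes.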