Case Control Study
Copyright ©The Author(s) 2025. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Clin Cases. Jul 16, 2025; 13(20): 104556
Published online Jul 16, 2025. doi: 10.12998/wjcc.v13.i20.104556
Prediction of genomic biomarkers for endometriosis using the transcriptomic dataset
Zeynep Kucukakcali, Sami Akbulut, Cemil Colak
Zeynep Kucukakcali, Sami Akbulut, Cemil Colak, Department of Biostatistics and Medical Informatics, Inonu University Faculty of Medicine, Malatya 44280, Türkiye
Sami Akbulut, Surgery and Liver Transplant Institute, Inonu University Faculty of Medicine, Malatya 44280, Türkiye
Author contributions: Akbulut S and Kucukakcali Z collected the data; Kucukakcali Z and Colak C performed the statistical analysis; Akbulut S and Kucukakcali Z wrote the manuscript; Akbulut S and Kucukakcali Z developed the project and reviewed the final version.
Institutional review board statement: This study was reviewed and approved by the Inonu University institutional review board for non-interventional studies (Approval No: 2022/3842).
Informed consent statement: Not applicable, as this study was retrospective.
Conflict-of-interest statement: The authors declare that they have no conflicts of interest regarding this study.
STROBE statement: The authors have read the STROBE Statement—checklist of items, and the manuscript was prepared and revised according to the STROBE Statement—checklist of items.
Data sharing statement: There are no additional data available for this study.
Open Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Sami Akbulut, MD, PhD, Professor, Surgery and Liver Transplant Institute, Inonu University Faculty of Medicine, Elazig Yolu 10. Km, Malatya 44280, Türkiye. akbulutsami@gmail.com
Received: December 24, 2024
Revised: March 3, 2025
Accepted: March 13, 2025
Published online: July 16, 2025
Processing time: 106 Days and 8.2 Hours
Abstract
BACKGROUND

Endometriosis is a clinical condition characterized by the presence of endometrial glands outside the uterine cavity. Although its incidence remains largely uncertain, endometriosis affects around 180 million women worldwide. Several epidemiological and clinical explanations have been proposed, but the precise mechanism underlying the disease remains unclear. In recent years, researchers have examined the hereditary dimension of the disease. Genetic research has aimed to discover the gene or genes responsible for the disease through association or linkage studies involving candidate genes or DNA mapping techniques.

AIM

To identify genetic biomarkers linked to endometriosis by applying machine learning (ML) approaches.

METHODS

This case-control study used an open-access transcriptomic dataset comprising endometriosis patients and a control group. We included data from 22 controls and 16 endometriosis patients. For classification, we used AdaBoost, XGBoost, Stochastic Gradient Boosting, and Bagged Classification and Regression Trees (CART) with five-fold cross-validation. We evaluated model performance using accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score.
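As a rough illustration of what five-fold cross-validation means for a cohort of this size, the sketch below assigns 22 control and 16 endometriosis labels to five folds. It assumes a stratified split (class proportions kept similar across folds), which the abstract does not specify; the function name `stratified_folds` and the seed are hypothetical, and no real expression data are used.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, spreading every class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices round-robin; the offset keeps fold sizes balanced.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    return folds


# Cohort sizes taken from the abstract; the labels themselves are placeholders.
labels = ["control"] * 22 + ["endometriosis"] * 16
folds = stratified_folds(labels)

for i, test_idx in enumerate(folds):
    # Each fold serves once as the held-out set; the other four form the training set.
    train_idx = [j for f in folds if f is not test_idx for j in f]
    counts = Counter(labels[j] for j in test_idx)
    print(f"fold {i}: train={len(train_idx)} test={len(test_idx)} {dict(counts)}")
```

With 38 samples, each held-out fold contains only seven or eight samples, so a single misclassification shifts a fold's accuracy by more than ten percentage points.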

RESULTS

Bagged CART yielded the best classification metrics: 85.7%, 85.7%, 100%, 75%, 75%, 100% and 85.7% for accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score, respectively. Based on variable importance in the model, the genes CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2 and NKG7, along with other transcripts whose gene names are inaccessible, may serve as potential biomarkers for endometriosis.
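The performance measures listed above all derive from a 2 × 2 confusion matrix. The sketch below uses the standard definitions; the counts (3 true positives, 0 false negatives, 1 false positive, 3 true negatives) are hypothetical, since the abstract does not report a confusion matrix, and were chosen only because they are consistent with the reported sensitivity, specificity, predictive values, accuracy and F1 score.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall for the disease class
    specificity = tn / (tn + fp)   # recall for the control class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical counts (an assumption, not taken from the study).
metrics = classification_metrics(tp=3, fn=0, fp=1, tn=3)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under this definition, balanced accuracy is the mean of sensitivity and specificity, which for these counts is 87.5%; the reported 85.7% may therefore reflect a different definition or averaging scheme.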

CONCLUSION

This study identified possible genomic biomarkers for endometriosis using transcriptomic data from patients with and without endometriosis. The applied ML model classified endometriosis successfully, yielding a highly accurate diagnostic prediction model. Future genomic studies could explain the underlying pathology of endometriosis, and a non-invasive diagnostic method could replace the current invasive ones.

Keywords: Endometriosis; RNA-seq; Transcriptomics; Machine learning; Classification

Core Tip: Genetic research has aimed to discover the gene or genes responsible for the disease through association or linkage studies involving candidate genes or DNA mapping techniques. This study aimed to determine genomic biomarkers associated with endometriosis by using machine learning models (AdaBoost, XGBoost, Stochastic Gradient Boosting, Bagged Classification and Regression Trees). According to variable importance in the modeling, the genes CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2, and NKG7, along with other transcripts whose gene names are inaccessible, can be used as candidate biomarkers for endometriosis.