Retrospective Study Open Access
Copyright ©The Author(s) 2024. Published by Baishideng Publishing Group Inc. All rights reserved.
World J Clin Pediatr. Dec 9, 2024; 13(4): 98472
Published online Dec 9, 2024. doi: 10.5409/wjcp.v13.i4.98472
Prediction of cyanotic and acyanotic congenital heart disease using machine learning models
Sana Shahid, Department of Statistics, Bahauddin Zakariya University, Multan 60000, Punjab, Pakistan
Haris Khurram, Apiradee Lim, Department of Mathematics and Computer Science, Faculty of Science and Technology, Prince of Songkla University, Pattani Campus, Pattani 94000, Thailand
Haris Khurram, Department of Science and Humanities, National University of Computer and Emerging Sciences, Chiniot-Faisalabad Campus, Chiniot 35400, Punjab, Pakistan
Muhammad Farhan Shabbir, Department of Cardiology, Chaudhry Pervaiz Elahi Institute of Cardiology, Multan 60000, Punjab, Pakistan
Baki Billah, School of Public Health and Preventive Medicine, Monash University, Melbourne 3000, Victoria, Australia
ORCID number: Haris Khurram (0000-0003-1814-4742).
Co-first authors: Sana Shahid and Haris Khurram.
Co-corresponding authors: Haris Khurram and Apiradee Lim.
Author contributions: Shahid S, Khurram H, and Lim A conceptualized and designed the research; Shahid S and Khurram H organized the dataset, performed statistical analysis and data interpretation, and wrote the first draft of the manuscript with the help of Lim A; Khurram H and Lim A played important and indispensable roles in the experimental design, data interpretation, and manuscript preparation as the co-corresponding authors; Shahid S and Khurram H made crucial and indispensable contributions towards the completion of the project and were thus qualified as the co-first authors of the paper; Lim A proofread the draft and gave valuable suggestions to improve the manuscript; Shabbir MF reviewed the final draft from a medical perspective. Billah B reviewed the final draft from a statistical perspective. All authors contributed to the manuscript revision, and read and approved the final draft.
Institutional review board statement: The study was reviewed and approved by the Advance Studies & Research Board, Bahauddin Zakariya University, Multan, Pakistan (No. 8973).
Informed consent statement: All study participants or their legal guardians gave informed verbal consent prior to study inclusion.
Conflict-of-interest statement: All authors have no conflicts of interest to disclose.
Data sharing statement: The data and code of R language are available from the corresponding author [Email: Hariskhurram2@gmail.com; haris.khurram@nu.edu.pk] upon reasonable request.
Open-Access: This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/Licenses/by-nc/4.0/
Corresponding author: Haris Khurram, PhD, Assistant Professor, Postdoctoral Fellow, Department of Mathematics and Computer Science, Prince of Songkla University, 181 Village No. 6 Charoen Pradit Road, Rusamilae, Mueang Pattani District, Pattani 94000, Thailand. haris.khurram@nu.edu.pk
Received: June 27, 2024
Revised: August 28, 2024
Accepted: September 23, 2024
Published online: December 9, 2024
Processing time: 125 Days and 8.2 Hours

Abstract
BACKGROUND

Congenital heart disease is most commonly seen in neonates and it is a major cause of pediatric illness and childhood morbidity and mortality.

AIM

To identify and build the best model for predicting, during pregnancy, cyanotic and acyanotic congenital heart disease in children, and to identify the potential risk factors.

METHODS

The data were collected from the Pediatric Cardiology Department at Chaudhry Pervaiz Elahi Institute of Cardiology Multan, Pakistan from December 2017 to October 2019. A sample of 3900 mothers whose children were diagnosed with cyanotic or acyanotic congenital heart disease was taken. Multivariate outlier detection methods were used to identify the potential outliers. Different machine learning models were compared, and the best-fitted model was selected using the area under the curve, sensitivity, and specificity of the models.

RESULTS

Of the 3900 patients included, about 69.5% had acyanotic and 30.5% had cyanotic congenital heart disease. Males accounted for more cases of both acyanotic (53.6%) and cyanotic (54.5%) congenital heart disease than females. The odds of having cyanotic congenital heart disease were 1.28 times higher for children whose mothers frequently ate fast food during pregnancy. The artificial neural network model was selected as the best predictive model, with an area under the curve of 0.9012, sensitivity of 65.76%, and specificity of 97.23%.

CONCLUSION

Children having a positive family history are at very high risk of having cyanotic and acyanotic congenital heart disease. Males are more at risk, and their mothers need more care, good food, and physical activity during pregnancy. The best-fitted model for predicting cyanotic and acyanotic congenital heart disease is the artificial neural network. The results obtained and the best model identified will help medical practitioners and public health scientists make informed decisions about earlier diagnosis and improve the health of children in Pakistan.

Key Words: Congenital heart disease; Cyanotic heart disease; Acyanotic heart disease; Logistic regression model; Artificial neural network

Core Tip: In this study, to identify and build the best model for predicting cyanotic and acyanotic congenital heart disease in children during pregnancy and identify their risk factors, we employed machine learning models and compared their performance to choose the best one. We also used multivariate outlier detection methods to determine the outlier cases. The best fit model for congenital heart disease was the artificial neural network model. Children having a positive family history are at very high risk of having cyanotic and acyanotic congenital heart disease.



INTRODUCTION

Congenital heart disease (CHD) is most commonly seen in neonates[1] and is a major cause of pediatric illness and childhood morbidity and mortality[2]. CHD is usually the result of the abnormal development of a normal structure during the early stage of embryonic or fetal development[3]. The incidence of CHD is 8 to 10 per 1000 births in Pakistan, and about 50000 children are affected by CHD each year[4]. The prevalence of CHD was 4 per 1000 live births in Karachi, Pakistan, where 41.7% of children had cyanotic CHD and 58.3% had acyanotic CHD[5]. Acyanotic CHD was more common than cyanotic CHD, and both conditions were found to have a higher incidence in males than in females[6,7].

In underdeveloped countries, families of children with CHD face many health care and socioeconomic problems[1]. Late diagnosis of CHD carries a high risk of avoidable morbidity, mortality, and handicap. Identifying and addressing the problem at an early stage is crucial for avoiding complications, improving quality of life, and reducing mortality[2]. Awareness of the disease among parents can reduce delays in its identification, which can help prevent mortality and morbidity[8]. In rural areas of Pakistan, the prevalence of CHD was very high compared with urban areas[9]. Several fetal factors were associated with CHD, such as premature birth, stillbirth, and low birth weight. Low birth weight, family history of CHD, maternal co-morbidities, and consanguineous marriage were associated with CHD[10]. Physical activity, nutrition, partner interaction, access to basic health care facilities, calories in food, environment, and housing conditions during pregnancy are associated with the risk of cyanotic and acyanotic CHD[11]. The prevalence of CHD was 9.3 per 1000 live births in Asia and 8 to 10 per 1000 live births worldwide; 60.6% of cases were acyanotic CHD and 38.6% were cyanotic CHD[12]. The prevalence of CHD for Whites was significantly higher than for Blacks or Mexican Americans. The prevalence of CHD in children aged 5 to 15 years has been reported as 2 per 1000 in Sudan, 3 per 1000 in Uganda, and 3.6 per 1000 in Nigeria[13]. In India, the prevalence of CHD was reported as 8.5 to 13.6 per 1000 live births, and 10% of infant mortality was due to CHD. Acyanotic CHD was present in 79% of children with CHD, 21% had cyanotic CHD, and 82.9% were diagnosed between 0 and 3 years of age. Parental age, illness during pregnancy, and advanced maternal age were found to be risk factors for CHD[14,15]. The prevalence of CHD was 5 to 10 per 1000 live births and 10.01 per 1000 in school children in Alexandria, Egypt. Parental consanguinity, positive family history, and maternal health during pregnancy were high-risk factors for CHD[16]. CHD was the most common birth defect in China, with a prevalence of 7 to 8 per 1000 live births, corresponding to about 100000 to 150000 new cases annually. Maternal mental stress, number of previous pregnancies, maternal infection, and the mother’s education level were risk factors for CHD[17]. CHD risk was higher among children who had a family history of heart disease[18].

The CHD prevalence in Asia, Europe, and Africa was found to be 9.3 per 1000, 8.2 per 1000, and 1.9 per 1000 live births, respectively. The CHD prevalence was reported to be higher in the Asian region as compared with other regions[19]. Smoking status in mothers and mental stress in mothers during pregnancy were found highly associated with CHD in children[20]. The increase in the risk of CHD was associated with poor socioeconomic status, family income, occupation, and education level of mothers[21]. In Brazil, CHD was most common in newborns and reached 1% of the population of Brazil. Socioeconomic status and family income are important factors in child development, and indirectly, they affect the process and outcome of child development with home type, nutrition quality, availability of school, health care, and medical facilities[22]. CHD is associated with physical inactivity and obesity in children and adolescents. To reduce the risk of obesity and heart disease in children and adolescents, it must be necessary to adopt a healthy lifestyle[23]. The environment and lifestyle factors also influence children with CHD[24].

In recent years, machine learning models have emerged as crucial tools in disease prediction and diagnosis. These advanced analytical models have transformed the way that healthcare professionals approach patient care, enabling early detection, accurate diagnosis, and personalized treatment[25]. Artificial neural network (ANN) models are vital in disease prediction due to their ability to learn from vast amounts of complex data, identify patterns and correlations that may elude human clinicians, adapt to new data and improve prediction accuracy over time, and provide tailored recommendations for patient care[26]. ANN models, inspired by the structure and function of the human brain, have shown remarkable promise in disease prediction because of their capacity to analyze complex relationships between variables and to identify non-linear patterns and interactions[27]. The present study aimed to identify the risk factors for cyanotic and acyanotic CHD in children, predict cyanotic and acyanotic CHD in children at the time of pregnancy, and suggest the best machine learning-based predictive model.

MATERIALS AND METHODS
Study design and sample

A retrospective study design was used, and data were collected from the outpatient department, inpatient department, and ward of the Pediatric Department at the Institute of Cardiology Multan, Pakistan from December 2017 to October 2019. The data were collected from 3900 mothers whose children were diagnosed with cyanotic or acyanotic CHD by echocardiography. With this sample size, the study attained a statistical power greater than 80%.

Patients’ consent and ethics approval

The study was approved by the Departmental Ethics Committee and Board of Advanced Studies, Bahauddin Zakariya University, Multan, Pakistan. Permission was also obtained from the hospital. All the families included in the study were volunteers and were well informed about the study and the confidentiality of their identity.

Operational definition of variables

The data were collected by the principal author. Following discussion with medical practitioners and a literature survey, the following factors were identified: diabetes in the family (diabetes in first-order relatives); smoking status in the family (smoking in first-order relatives); family history of heart disease (heart disease in first-order relatives); anemia in the mother during pregnancy; physical activity of the mother during pregnancy (the mother walked for at least two and a half hours per week); use of fast food, low-calorie food, and staple food during pregnancy (mothers eating fast food more than once a week, mothers consuming less than 2000 calories per day, and mothers using cereal grains and tubers as staple food); nutrition status during pregnancy (good nutrition: protein more than 5 ounces, fruits up to 2 cups, vegetables up to 3 cups, grains up to 6 ounces, and dairy up to 3 cups; normal nutrition: protein up to 5 ounces, fruits up to 1.5 cups, vegetables up to 2.5 cups, grains up to 5 ounces, and dairy up to 2.5 cups; poor nutrition: less than normal nutrition); monthly income of the family; education level of the parents; dwelling area (rural or urban); home environment during pregnancy; health condition of other people living in the home (respiratory infections, asthma, lead poisoning, injuries, and mental health); the mother’s interaction with her partner during pregnancy; quality of basic health care facilities during pregnancy (well-trained and motivated staff; accurate medical records; functional, reliable, and safe water, energy, sanitation, hand hygiene, and waste disposal facilities; and adequate stocks of medicines, supplies, and equipment, with care that is safe, effective, timely, efficient, and equitable); access to health care facilities (whether a government hospital or medical unit and a government doctor were available nearby); housing tenure (rented or owned); and housing condition (a house in good condition was dry, safe, and hygienic, with good ventilation, sanitation, heating, lighting, and cooking facilities, suitable storage for food, and good access to shops and facilities). The dependent (outcome) variable was the type of CHD, categorized as cyanotic or acyanotic.

Data management and analysis

For data analyses, R was used. Categorical data are presented as frequencies and percentages. The data were randomly divided into two parts for modeling and validation: The first part (85%) was used for training the model, and the second part (15%) was used for validation of the model. For multivariate outlier detection in the generalized linear model, several diagnostic measures were used, i.e., Cook’s distance[28], modified Cook’s distance[29], leverage[30], the Andrews-Pregibon statistic[31], Welsch’s distance[32], and covariance ratio[33]. Cases that were jointly identified by all of these measures were considered outliers. The performance in predicting the type of CHD was evaluated using subset logistic regression (SLR)[34], subset logistic regression after deletion of the outliers (SLRAD), and the machine learning model ANN[35]. The performance of the models was compared using the area under the receiver operating characteristic (ROC) curve (AUC) and its 95% confidence interval, sensitivity, and specificity. In ANN models, the best generalization is achieved by using a model whose complexity is the most appropriate to produce an adequate fit of the data. The mathematical and procedural details of the outlier diagnostic measures and of all models are described in the Supplementary material.
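
The split and the comparison criteria described above can be illustrated with a minimal sketch. The study itself used R and its code is not reproduced here, so the Python below is only an illustration: the function names, the fixed seed, the 0.5 classification threshold, and the rank-based AUC formula are assumptions made for the example, not the authors' implementation.

```python
import random


def split_indices(n, train_fraction=0.85, seed=0):
    """Randomly partition range(n) into training and validation index lists.

    The 85% training fraction follows the text; the seed is an assumption.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity at a given probability threshold.

    y_true uses 1 for the positive class (e.g., cyanotic) and 0 otherwise.
    Assumes both classes are present. The 0.5 threshold is an assumption.
    """
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train_idx, valid_idx = split_indices(3900)
    print(len(train_idx), len(valid_idx))

    # Toy labels and predicted probabilities, invented for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.05]
    print(sensitivity_specificity(labels, scores), auc(labels, scores))
```

With 3900 records, an 85% training fraction gives 3315 training and 585 validation cases. The pairwise AUC is quadratic in the sample size, which is acceptable for a validation set of this size.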

RESULTS

Among children with acyanotic CHD, 53.6% were male and 46.4% were female; among those with cyanotic CHD, 54.5% were male and 45.5% were female. A family history of diabetes was present in 36.0% of children with acyanotic CHD and in 40.3% of children with cyanotic CHD. The results of univariate analyses are presented in Table 1.

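Table 1 Descriptive analysis of categorical data of congenital heart disease, n (%).
Variable | Category | Acyanotic | Cyanotic
Gender | Female | 1258 (46.4) | 518 (45.5)
Gender | Male | 1452 (53.6) | 672 (54.5)
Diabetes | No | 1734 (64.0) | 710 (59.7)
Diabetes | Yes | 976 (36.0) | 480 (40.3)
Smoking | No | 1318 (48.6) | 664 (55.8)
Smoking | Yes | 1392 (51.4) | 526 (44.2)
Family history | No | 858 (31.7) | 510 (42.9)
Family history | Yes | 1852 (68.3) | 680 (57.1)
Anemia during pregnancy | No | 2598 (95.9) | 1150 (96.6)
Anemia during pregnancy | Yes | 112 (4.1) | 40 (3.4)
Inactive | No | 800 (29.5) | 342 (28.7)
Inactive | Yes | 1910 (70.5) | 848 (71.3)
Fast food during pregnancy | No | 1356 (50.0) | 486 (40.8)
Fast food during pregnancy | Yes | 1354 (50.0) | 704 (59.2)
Low-calorie food during pregnancy | No | 1036 (38.2) | 472 (39.7)
Low-calorie food during pregnancy | Yes | 1674 (61.8) | 718 (60.3)
Nutrition during pregnancy | Poor | 1558 (57.5) | 492 (41.3)
Nutrition during pregnancy | Normal | 532 (19.6) | 382 (32.1)
Nutrition during pregnancy | Good | 620 (22.9) | 316 (26.6)
Staple food during pregnancy | No | 1720 (63.5) | 692 (58.2)
Staple food during pregnancy | Yes | 990 (36.5) | 498 (41.8)
Income | < 10000 | 20 (0.7) | 18 (1.5)
Income | 10000 to 20000 | 2076 (76.6) | 956 (80.3)
Income | > 20000 | 614 (22.7) | 216 (18.2)
Mother’s education | Uneducated | 1492 (55.1) | 690 (58.0)
Mother’s education | Primary/middle | 1002 (37.0) | 404 (33.9)
Mother’s education | Secondary/higher | 200 (7.4) | 90 (7.6)
Mother’s education | Graduate | 8 (0.3) | 6 (0.5)
Mother’s education | Masters or higher | 8 (0.3) | 0 (0.0)
Father’s education | Uneducated | 1102 (40.7) | 702 (59.0)
Father’s education | Primary/middle | 1220 (45.0) | 354 (29.7)
Father’s education | Secondary/higher | 326 (12.0) | 118 (9.9)
Father’s education | Graduate | 52 (1.9) | 12 (1.0)
Father’s education | Masters or higher | 10 (0.4) | 4 (0.3)
Father’s occupation | Dead/unemployed | 4 (0.1) | 4 (0.3)
Father’s occupation | Labour/former | 1866 (68.9) | 826 (69.4)
Father’s occupation | Private job | 194 (7.2) | 20 (1.7)
Father’s occupation | Small business | 620 (22.9) | 328 (27.6)
Father’s occupation | Civil servant | 26 (1.0) | 12 (1.0)
Area | Rural | 1604 (59.2) | 878 (73.8)
Area | Urban | 1106 (40.8) | 312 (26.2)
Home environment | Poor | 1650 (60.9) | 518 (43.5)
Home environment | Normal | 590 (21.8) | 400 (33.6)
Home environment | Good | 470 (17.3) | 272 (22.9)
Health condition | Poor | 602 (22.2) | 402 (33.8)
Health condition | Normal | 1660 (61.3) | 522 (43.9)
Health condition | Good | 448 (16.5) | 266 (22.4)
Interaction with partner during pregnancy | Poor | 1592 (58.7) | 498 (41.8)
Interaction with partner during pregnancy | Normal | 558 (20.6) | 392 (32.9)
Interaction with partner during pregnancy | Good | 560 (20.7) | 300 (25.2)
Health care quality | Poor | 1540 (56.8) | 506 (42.5)
Health care quality | Normal | 660 (24.4) | 414 (34.8)
Health care quality | Good | 510 (18.8) | 270 (22.7)
Health care access | No | 2034 (75.1) | 770 (64.7)
Health care access | Yes | 676 (24.9) | 420 (35.3)
Housing tenure | Owned | 2628 (97.0) | 1164 (97.8)
Housing tenure | Rented | 82 (3.0) | 26 (2.2)
Housing condition | Poor | 630 (23.2) | 528 (44.4)
Housing condition | Normal | 1742 (64.3) | 548 (46.1)
Housing condition | Good | 338 (12.5) | 114 (9.6)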

Figure 1 shows the graphs of influential diagnostic measures for CHD. In this figure, the circles show the observations, the red line shows the cut point of the measure, and the points labeled with their observation number that lie beyond the cut point were identified as influential observations for that measure. We deleted the observations that were identified as outliers by all the diagnostic measures.

Figure 1 Graphs of influential diagnostic measures. A: Detection using Cook’s distance method; B: Detection using leverage method; C: Detection using covariance ratio method; D: Detection using modified Cook’s distance method; E: Detection using Andrews-Pregibon method; F: Detection using Welsch’s distance method.
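
The deletion rule used here, removing only the observations flagged by every diagnostic measure, amounts to a set intersection. The sketch below illustrates that rule only; the observation numbers and the function name are hypothetical, and the computation of each diagnostic and its cut point is not shown.

```python
def jointly_flagged(flags_by_measure):
    """Return the sorted observation numbers flagged by every measure.

    flags_by_measure maps a measure name to the observation numbers
    that exceeded that measure's cut point.
    """
    flag_sets = [set(obs) for obs in flags_by_measure.values()]
    if not flag_sets:
        return []
    return sorted(set.intersection(*flag_sets))


# Hypothetical observation numbers, invented for illustration only.
flags = {
    "cooks_distance": [12, 48, 301, 977],
    "leverage": [12, 301, 560],
    "covariance_ratio": [12, 77, 301],
}
outliers = jointly_flagged(flags)
print(outliers)  # [12, 301]
```

Requiring agreement among all measures is a conservative choice: an observation that is extreme on only one diagnostic is retained.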

The results of the logistic regression analysis are given in Table 2. The results of SLR showed that family history of heart disease, use of fast food during pregnancy, use of staple food during pregnancy, poor nutrition during pregnancy, low family monthly income, uneducated parents, urban area, poor quality of health care facilities, rented house, and poor housing condition were significant risk factors for CHD. The results of SLRAD showed that family history of heart disease, use of fast food during pregnancy, poor nutrition during pregnancy, low family monthly income, uneducated father, urban area, poor quality of health care facilities, rented house, and poor housing conditions were significant risk factors for CHD.

Table 2 Multivariate logistic regression models by using stepwise selection approach.
Variable | Categories⁴ | SLR OR | SLR 95%CI | SLRAD OR | SLRAD 95%CI
(Intercept) | - | 2.006 | 0.300-13.400 | 0.000 | -
History | Yes | 0.551¹ | 0.463-0.656 | 0.541¹ | 0.454-0.646
Fast food | Yes | 1.289² | 1.027-1.618 | 1.331² | 1.056-1.677
Staple food | Yes | 0.794³ | 0.609-1.034 | 0.803 | 0.615-1.049
Nutrition | Normal | 0.668 | 0.345-1.294 | 0.698 | 0.382-1.273
Nutrition | Poor | 0.621² | 0.409-0.942 | 0.571¹ | 0.382-0.853
Children | - | 1.368¹ | 1.267-1.476 | 1.365¹ | 1.264-1.473
Family income | < 20000 | 0.276¹ | 0.124-0.614 | 0.263¹ | 0.118-0.588
Family income | 10000 to 20000 | 0.396² | 0.181-0.866 | 0.390² | 0.177-0.857
Mother education | Master or higher | 0.000 | 0-2.96E+163 | 0.998 | -
Mother education | Primary/middle | 0.474 | 0.129-1.744 | 3.96E+05 | -
Mother education | Secondary/higher | 0.781 | 0.208-2.933 | 7.09E+05 | -
Mother education | Uneducated | 0.300³ | 0.082-1.106 | 2.58E+05 | -
Father’s education | Master or higher | 0.391 | 0.053-2.907 | 0.000 | -
Father’s education | Primary/middle | 1.984 | 0.779-5.053 | 2.630³ | 0.907-7.624
Father’s education | Secondary/higher | 1.950 | 0.775-4.909 | 2.599³ | 0.902-7.489
Father’s education | Uneducated | 6.221¹ | 2.395-16.160 | 8.516¹ | 2.875-25.225
Father’s occupation | Dead/unemployed | 0.672 | 0.101-4.478 | 8.65E+05 | -
Father’s occupation | Labour/former | 0.336² | 0.123-0.914 | 0.736 | 0.196-2.768
Father’s occupation | Private job | 0.082¹ | 0.026-0.259 | 0.145¹ | 0.035-0.611
Father’s occupation | Small business | 0.465 | 0.168-1.292 | 0.986 | 0.259-3.761
Dwelling area | Urban | 0.582¹ | 0.478-0.710 | 0.554¹ | 0.453-0.678
Partner interaction | Normal | 1.060 | 0.583-1.927 | - | -
Partner interaction | Poor | 0.732 | 0.496-1.081 | - | -
Quality of health care facilities | Normal | 0.468² | 0.25-0.878 | 0.409¹ | 0.226-0.743
Quality of health care facilities | Poor | 0.682 | 0.421-1.104 | 0.503¹ | 0.316-0.802
Housing tenure | Rented | 0.511² | 0.276-0.946 | 0.437² | 0.227-0.841
Housing condition | Normal | 1.498² | 1.053-2.131 | 1.554² | 1.086-2.225
Housing condition | Poor | 2.852¹ | 2.04-3.987 | 3.004¹ | 2.136-4.225

Figure 2 shows the weight of each input variable, and the weights were obtained by the ANN model for CHD through normalizing importance. According to importance, the most important risk factors for CHD were obtained: Father's education, family income, father’s occupation, health condition, mother’s education, nutrition, and number of children in family. In all the important factors, mother's education, nutrition status, and number of children in family had positive weight, while father’s education, family income, father’s occupation, and health condition had negative weight.

Figure 2 Weights according to importance of variables by artificial neural network.

Figure 3 demonstrates the sequence of each predictor and describes the final ANN fitted model for CHD, which was generated by plotting each risk factor by normalized importance. In the ANN model for CHD, there were 20 input variables, 4 hidden variables, and 1 output variable.

Figure 3 Modeling structure of artificial neural network with weights of each node.
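
The layout described for the fitted ANN (20 inputs, 4 hidden units, 1 output) can be sketched as a single forward pass. The weights of the fitted model are not given in the text, so the sketch below uses random placeholder weights; the sigmoid activations, the bias terms, and all function names are assumptions made for illustration and may differ from the authors' R model.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (an assumed choice, not confirmed by the source)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs=20, n_hidden=4, seed=0):
    """Create a 20-4-1 network with small random placeholder weights."""
    rng = random.Random(seed)
    return {
        "hidden_weights": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                           for _ in range(n_hidden)],
        "hidden_bias": [0.0] * n_hidden,
        "output_weights": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "output_bias": 0.0,
    }


def count_parameters(net):
    """Total number of weights and biases in the network."""
    n_hidden_weights = sum(len(row) for row in net["hidden_weights"])
    return (n_hidden_weights + len(net["hidden_bias"])
            + len(net["output_weights"]) + 1)


def predict_probability(net, x):
    """Forward pass: inputs -> hidden layer -> single output probability."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(net["hidden_weights"], net["hidden_bias"])
    ]
    z = sum(w * h for w, h in zip(net["output_weights"], hidden))
    return sigmoid(z + net["output_bias"])


if __name__ == "__main__":
    net = init_network()
    # One hypothetical case with 20 binary-coded predictors.
    case = [1, 0] * 10
    print(count_parameters(net), predict_probability(net, case))
```

Under these assumptions the network has 20 × 4 + 4 + 4 + 1 = 89 trainable parameters, which is small relative to the 3315 training cases.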

Table 3 shows the comparison of all models by AUC and its 95% confidence interval, sensitivity, and specificity. The results showed that the ANN model had the highest AUC at 0.901 (95%CI: 0.892-0.910), with a sensitivity and specificity of 65.76% and 97.23%, respectively. The SLRAD model had the second-highest AUC at 0.886 (95%CI: 0.876-0.896), with a sensitivity of 57.69% and specificity of 98.69%. The SLR model had the third-highest AUC at 0.860 (95%CI: 0.849-0.871), with a sensitivity of 49.62% and specificity of 98.38%. Figure 4 also shows that the ANN model had the highest diagnostic accuracy for CHD.

Figure 4 Receiver operating characteristic curves for comparison of subset logistic regression, subset logistic regression after deletion, and artificial neural network model. SLR: Subset logistic regression; SLRAD: Subset logistic regression after deletion; ANN: Artificial neural network.

Table 3 Performance comparison of models.
Model | AUC (%) | 95%CI | Sensitivity (%) | Specificity (%)
SLR | 86.01 | 0.849-0.871 | 49.62 | 98.38
SLRAD | 88.57 | 0.876-0.896 | 57.69 | 98.69
ANN | 90.12 | 0.892-0.910 | 65.76 | 97.23

DISCUSSION

The results of the current study show that acyanotic CHD is more common in children than cyanotic CHD, which is consistent with the findings of a previous study done in Pakistan[6]. Our results show that the odds of having cyanotic CHD were 1.28 times higher for children whose mothers ate fast food during pregnancy than for those whose mothers did not. The odds of having cyanotic CHD were 6.22 times higher for children whose father was uneducated than for those whose father was educated. The odds of having cyanotic CHD were 1.49 times higher for children whose mothers had normal housing conditions than for those whose mothers had good housing conditions. Children who had a family history of heart disease had 0.55 times the odds of having acyanotic CHD compared with those who had not. Children whose mothers used staple food during pregnancy had 0.79 times the odds of having acyanotic CHD compared with those whose mothers did not. Male children were more affected by cyanotic and acyanotic CHD than female children. A study in China has similar findings[17]. The result of our study shows that family history of heart disease is a risk factor for CHD, in agreement with the results of the studies in Egypt and China[16,18]. The result of the model comparison shows that the ANN model had the highest diagnostic accuracy. The analysis based on the ANN, the best-selected model, shows that father’s education, family income, father’s occupation, health condition of other people living in the home, mother’s education, nutrition, and number of children in the family are risk factors for cyanotic and acyanotic CHD in children. A study in China also concluded that mother’s education level is a risk factor for CHD[17,21]. A study in Pakistan also supports our findings, i.e., health condition of other people living in the home, and quality of and access to basic health care facilities are risk factors of cyanotic and acyanotic CHD in children[11].
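
To make the odds ratios quoted above concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from the fast-food counts in Table 1. This is an illustration rather than the authors' analysis: the reported value comes from a multivariable logistic regression that adjusts for other factors, so the crude estimate is expected to differ from it. The function name and the use of the Wald interval are assumptions.

```python
import math


def crude_odds_ratio(exposed_cases, exposed_controls,
                     unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Returns (odds_ratio, lower_limit, upper_limit). z = 1.96 gives an
    approximate 95% interval.
    """
    odds_ratio = ((exposed_cases * unexposed_controls)
                  / (exposed_controls * unexposed_cases))
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / unexposed_cases + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from Table 1, fast food during pregnancy:
# cyanotic: 704 yes, 486 no; acyanotic: 1354 yes, 1356 no.
odds_ratio, lower, upper = crude_odds_ratio(704, 1354, 486, 1356)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

The crude odds ratio from these counts is about 1.45, larger than the adjusted value in Table 2, which is consistent with part of the unadjusted association being explained by the other covariates in the model.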

The field of machine learning has undergone significant advancements in recent years, leading to a surge in the development of innovative models that can accurately predict disease[36]. The ANN and machine learning models can analyze medical images, genetic data, and patient information to predict the risk factors of disease, detect early warning signs, and recommend preventive measures[37]. In the current study, we used different machine learning models to predict cyanotic and acyanotic CHD in children. One recent study reported that the neural network model is an accurate decision support tool in diagnosing CHD[38]. Another study shows that the ANN model yields the best accuracy while predicting CHD in children[39]. The results of another study show that the best predictive model for CHD children was machine learning models and the AUC values for those models ranged from 0.81 to 0.83[40].

CONCLUSION

Children having a family history of heart disease are at very high risk of developing cyanotic and acyanotic CHD. The incidence of cyanotic CHD can be reduced by limiting fast food during pregnancy. Similarly, reducing the number of children can also minimize the incidence of CHD. Moreover, mothers with an uneducated partner and poor housing conditions are at high risk of birthing a child having cyanotic CHD. Similarly, the incidence of acyanotic CHD can be reduced by adopting good dietary habits (high nutrition food and rich calorie food) during pregnancy. Families with low income, uneducated mothers, and those living in urban areas are at higher risk of birthing a child having cyanotic CHD. The best fit model for our data is ANN, which can be used for earlier diagnostics. This prediction model can help medical practitioners and experts to identify the risk and make earlier diagnoses of cyanotic and acyanotic CHD during pregnancy, which will improve healthcare.

Limitations and future directions

The accuracy of the models may be limited by the quality and availability of the regional data. This can be improved by using large nationwide data. For future studies, investigating new features and feature engineering techniques can help improve model performance. Developing models that are more interpretable can help clinicians understand why certain predictions are made. Validating the models prospectively can help establish their clinical utility. Comparing the performance of different machine learning models can help identify the best approach.

ACKNOWLEDGEMENTS

We thank the Prince of Songkla University for providing Haris Khurram with the post-doctoral fellowship. We also want to thank Ms. Sadia Ashfaq, Lecturer, National University of Computer Science, CFD Campus, Pakistan, for providing the honorary service for language editing, which greatly enhanced the quality of our manuscript.

Footnotes

Provenance and peer review: Invited article; Externally peer reviewed.

Peer-review model: Single blind

Specialty type: Cardiac and cardiovascular systems

Country of origin: Thailand

Peer-review report’s classification

Scientific Quality: Grade D

Novelty: Grade C

Creativity or Innovation: Grade C

Scientific Significance: Grade C

P-Reviewer: Jeyaram K S-Editor: Liu JH L-Editor: Wang TQ P-Editor: Yu HG

References
1.  Shabana NA, Shahid SU, Irfan U. Genetic Contribution to Congenital Heart Disease (CHD). Pediatr Cardiol. 2020;41:12-23.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 25]  [Cited by in F6Publishing: 25]  [Article Influence: 6.3]  [Reference Citation Analysis (0)]
2.  Mohammad N, Shaikh S, Memon S, Das H. Spectrum of heart disease in children under 5 years of age at Liaquat University Hospital, Hyderabad, Pakistan. Indian Heart J. 2014;66:145-149.  [PubMed]  [DOI]  [Cited in This Article: ]  [Cited by in Crossref: 8]  [Cited by in F6Publishing: 14]  [Article Influence: 1.4]  [Reference Citation Analysis (0)]
3.  Balat MS, Sahu SK. Congenital heart disease: factor affecting it and role of RBSK in dealing with situation. Int J Community Med Public Health. 2018;5:4437.  [PubMed]  [DOI]  [Cited in This Article: ]
4.  Humayun KN, Atiq M. Clinical profile and outcome of cyanotic congenital heart disease in neonates. J Coll Physicians Surg Pak. 2008;18:290-293.  [PubMed]  [DOI]  [Cited in This Article: ]
5.  Masood N, Sharif M, Asghar R, Qamar M, Hussain I. Frequency of congenital heart diseases at Benazir Bhutto Hospital Rawalpindi. Ann Pak Inst Med Sci. 2010;6:120-123.  [PubMed]  [DOI]  [Cited in This Article: ]
6.  Farooqui R, Haroon UF, Niazi A, Rehan N, Butt T, Niazi M. Congenital heart diseases in neonates. JRMC. 2010;14:31-33.  [PubMed]  [DOI]  [Cited in This Article: ]
7.  Pathan IH, Bangash SK, Khawaja AM. Spectrum of heart defects in children presenting for paediaric cardiac surgery. Pak Heart J. 2016;49.  [PubMed]  [DOI]  [Cited in This Article: ]
8.  Hussain M, Hussain S, Krishin J, Abbasi S. Presentation of congestive cardiac failure in children with ventricular septal defect. J Ayub Med Coll Abbottabad. 2010;22:135-138.
9.  Rizvi SF, Mustafa G, Kundi A, Khan MA. Prevalence of congenital heart disease in rural communities of Pakistan. J Ayub Med Coll Abbottabad. 2015;27:124-127.
10.  Ul Haq F, Jalil F, Hashmi S, Jumani MI, Imdad A, Jabeen M, Hashmi JT, Irfan FB, Imran M, Atiq M. Risk factors predisposing to congenital heart defects. Ann Pediatr Cardiol. 2011;4:117-121.
11.  Shahid S, Akbar A. Conventional and non-conventional risk factors of cyanotic and acyanotic congenital heart diseases in children of southern Punjab, Pakistan. Pak Heart J. 2020;53.
12.  Pate N, Jawed S, Nigar N, Junaid F, Wadood AA, Abdullah F. Frequency and pattern of congenital heart defects in a tertiary care cardiac hospital of Karachi. Pak J Med Sci. 2016;32:79-84.
13.  Aman W, Sherin A, Hafizullah M. Frequency of congenital heart diseases in patients under the age of twelve years at Lady Reading Hospital Peshawar. JPMI. 2006;20.
14.  Kapoor R, Gupta S. Prevalence of congenital heart disease, Kanpur, India. Indian Pediatr. 2008;45:309-311.
15.  Abqari S, Gupta A, Shahab T, Rabbani MU, Ali SM, Firdaus U. Profile and risk factors for congenital heart defects: A study in a tertiary care hospital. Ann Pediatr Cardiol. 2016;9:216-221.
16.  Settin A, Almarsafawy H, Alhussieny A, Dowaidar M. Dysmorphic Features, Consanguinity and Cytogenetic Pattern of Congenital Heart Diseases: a pilot study from Mansoura Locality, Egypt. Int J Health Sci (Qassim). 2008;2:101-111.
17.  Liu S, Liu J, Tang J, Ji J, Chen J, Liu C. Environmental risk factors for congenital heart disease in the Shandong Peninsula, China: a hospital-based case-control study. J Epidemiol. 2009;19:122-130.
18.  Pei L, Kang Y, Zhao Y, Yan H. Prevalence and risk factors of congenital heart defects among live births: a population-based cross-sectional survey in Shaanxi province, Northwestern China. BMC Pediatr. 2017;17:18.
19.  van der Linde D, Konings EE, Slager MA, Witsenburg M, Helbing WA, Takkenberg JJ, Roos-Hesselink JW. Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysis. J Am Coll Cardiol. 2011;58:2241-2247.
20.  Feng Y, Yu D, Yang L, Da M, Wang Z, Lin Y, Ni B, Wang S, Mo X. Maternal lifestyle factors in pregnancy and congenital heart defects in offspring: review of the current evidence. Ital J Pediatr. 2014;40:85.
21.  Yu D, Feng Y, Yang L, Da M, Fan C, Wang S, Mo X. Maternal socioeconomic status and the risk of congenital heart defects in offspring: a meta-analysis of 33 studies. PLoS One. 2014;9:e111056.
22.  Mari MA, Cascudo MM, Alchieri JC. Congenital Heart Disease and Impacts on Child Development. Braz J Cardiovasc Surg. 2016;31:31-37.
23.  Barbiero SM, D'Azevedo Sica C, Schuh DS, Cesa CC, de Oliveira Petkowicz R, Pellanda LC. Overweight and obesity in children with congenital heart disease: combination of risks for the future? BMC Pediatr. 2014;14:271.
24.  Wacker-Gussmann A, Oberhoffer-Fritz R. Cardiovascular Risk Factors in Childhood and Adolescence. J Clin Med. 2022;11.
25.  Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, Aldairem A, Alrashed M, Bin Saleh K, Badreldin HA, Al Yami MS, Al Harbi S, Albekairy AM. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023;23:689.
26.  Byeon H, Gc P, Hannan SA, Alghayadh FY, Soomar AM, Soni M, Bhatt MW. Deep neural network model for enhancing disease prediction using auto encoder based broad learning. SLAS Technol. 2024;29:100145.
27.  Taherdoost H. Deep Learning and Neural Networks: Decision-Making Implications. Symmetry. 2023;15:1723.
28.  Belle V, Papantonis I. Principles and Practice of Explainable Machine Learning. Front Big Data. 2021;4:688969.
29.  Shahid S. Statistical Modeling of Epidemiology of Cardiovascular Diseases in Children (Doctoral dissertation, Bahauddin Zakariya University Multan, Pakistan). 2022.
30.  Belsley DA, Kuh E, Welsch RE. Regression diagnostics: Identifying influential data and sources of collinearity. New York and Chichester: John Wiley & Sons, 1980.
31.  Bagheri A, Midi H, Imon A. The Effect of Collinearity-influential Observations on Collinear Data Set: A Monte Carlo Simulation Study. J App Sci. 2010;10:2086-2093.
32.  Van den Broeck J, Cunningham SA, Eeckels R, Herbst K. Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Med. 2005;2:e267.
33.  Ullah MA, Pasha GR. The origin and developments of influence measures in regression. PJS. 2009;25.
34.  Hosmer DW, Lemeshow S, Sturdivant RX. Applied logistic regression. Canada: John Wiley & Sons, 2013.
35.  Shahid S, Khurram H, Billah B, Akbar A, Shehzad MA, Shabbir MF. Machine learning methods for predicting major types of rheumatic heart diseases in children of Southern Punjab, Pakistan. Front Cardiovasc Med. 2022;9:996225.
36.  Shivahare BD, Singh J, Ravi V, Chandan RR, Alahmadi TJ, Singh P, Diwakar M. Delving into Machine Learning's Influence on Disease Diagnosis and Prediction. TOPHJ. 2024;17.
37.  Kumar Y, Koul A, Singla R, Ijaz MF. Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. J Ambient Intell Humaniz Comput. 2023;14:8459-8486.
38.  Hoodbhoy Z, Jiwani U, Sattar S, Salam R, Hasan B, Das JK. Diagnostic Accuracy of Machine Learning Models to Identify Congenital Heart Disease: A Meta-Analysis. Front Artif Intell. 2021;4:708365.
39.  Rani S, Masood S. Predicting congenital heart disease using machine learning techniques. JDMSC. 2020;23:293-303.
40.  Guo K, Fu X, Zhang H, Wang M, Hong S, Ma S. Predicting the postoperative blood coagulation state of children with congenital heart disease by machine learning based on real-world data. Transl Pediatr. 2021;10:33-43.