Published online Mar 20, 2025. doi: 10.5662/wjm.v15.i1.98626
Revised: August 15, 2024
Accepted: August 29, 2024
Processing time: 90 Days and 3.3 Hours
There is a growing body of clinical research on the utility of synthetic data derivatives, an emerging research tool in medicine. In nephrology, clinicians can use machine learning and artificial intelligence as powerful aids in their clinical decision-making while also preserving patient privacy. This is especially important given the epidemiology of chronic kidney disease, renal oncology, and hypertension worldwide. However, there remains a need to create a framework for guidance regarding how to better utilize synthetic data as a practical appli
Core Tip: The application of synthetic data, used alongside traditional, established research datasets, may help accelerate research. However, this tool is still in its early stages in this clinical area. The current literature focuses on three major areas: renal cell carcinoma, chronic kidney disease, and blood pressure and hypertension.
- Citation: Jamal A, Singh S, Qureshi F. Synthetic data as an investigative tool in hypertension and renal diseases research. World J Methodol 2025; 15(1): 98626
- URL: https://www.wjgnet.com/2222-0682/full/v15/i1/98626.htm
- DOI: https://dx.doi.org/10.5662/wjm.v15.i1.98626
There has been tremendous growth in the use of artificial intelligence as a clinical and translational modality in medicine[1]. Used in conjunction with traditional statistical analytics, artificial intelligence can enhance the identification of trends, patterns, and relationships within data[2]. In nephrology, artificial intelligence provides clinicians with a powerful tool to aid their clinical decision-making across a breadth of pathology, including hemodialysis management and transplantation medicine[3-7]. This supports a move towards more precision-based healthcare for patients with kidney disease, given the need for more personalized management of renal function and the public health burden of kidney disease. As this move continues, however, there remains a need to evaluate approaches for designing appropriate research investigations in nephrology and hypertension. This article aims to describe an emerging research tool, synthetic healthcare data, for improving precision-based healthcare and to develop a framework for its application in nephrology and hypertension research.
Predictive modeling of renal disease has historically relied on data from clinician experience and statistical analysis[8-10]. While these statistical models have been the foundation for numerous guideline-based interventions and management strategies, obstacles remain, such as nonresponse during data collection or nonadherence to assigned treatments in clinical trials[11,12]. These obstacles require adjustments to statistical analyses and can limit the strength of a study’s conclusions. However, as global integration grows, collaboration among clinicians and researchers using large-scale datasets, or “Big Data”, can help reduce these obstacles and improve epidemiological surveillance and predictive analytics as well as genomic and translational research[13].
However, developing and maintaining these large-scale datasets can require considerable financial and time investment[14,15]. Moreover, the availability of these resources for clinical investigation in nephrology and hypertension lags behind that in other clinical fields[16,17]. Similarly, a common issue encountered when designing statistical models in nephrology is the availability of data that protect patient privacy and can be consistently kept de-identified. This can restrict users' ability to share data and collaborate with outside parties, thereby lengthening the timeline towards potential research breakthroughs.
One potential tool for overcoming such obstacles, supported by a growing body of clinical evidence, is synthetic data. Briefly, this type of data is developed through statistical and algorithmic modeling of real-world healthcare data[18,19]. Specifically, the real data are used to train artificial intelligence and deep learning models, which generate a new dataset (i.e., synthetic data) that aims to preserve the patterns and structures found within the original dataset[18-20]. In essence, synthetic data make it possible to move beyond the patient de-identification process: because synthetic records no longer correspond directly to an individual patient’s data, patient privacy is protected. Likewise, because the risk of patient reidentification is reduced in a synthetic dataset, clinicians and researchers have greater ability to collaborate with outside parties and improve research productivity.
The application of synthetic data in nephrology and hypertension research may offer advantages for clinicians and researchers to consider. However, the current body of literature applying synthetic data in nephrology and hypertension is limited compared with other areas of internal medicine. Given this emerging role, this paper proposes a framework spanning several research interests.
There are over 400000 newly diagnosed cases of renal cell carcinoma (RCC) annually[21]. Moreover, a bibliometric analysis of RCC has suggested that some of the most in-demand topics within RCC include drug-related clinical trials and immunotherapy[22]. Because clinical trials are generally susceptible to nonadherence to assigned treatments, synthetic data could help support, or at least provide a comparison for, the findings of clinical trials. In addition, Sabharwal (2023)[23] developed a synthetic image generation tool, trained on pathology slides from surgical resections, that can aid in the detection of RCC; this adds to the current literature on artificial intelligence in renal histopathology[24,25]. Given this evidence, future studies could consider designing models that compare synthetic histological data of RCC with clinical data and clinical trial data to further justify its utility.
Chronic kidney disease affects approximately 1 in every 7 individuals in the United States[26]. Moreover, a bibliometric analysis of chronic kidney disease from 2011 to 2020 suggests that modifiable risk factors, including diet management and obesity, have been areas of clinical investigation[27]. Because these clinical studies often use electronic health record data, there is heightened awareness of the need to protect patient privacy and ensure compliant de-identification. This provides an opportunity to use synthetic data as a research tool. Moreover, a recent study that evaluated the performance of synthetically generated data against real patient data using multiple supervised machine learning algorithms reported high accuracy for a model[28]. Given this evidence, future clinical studies ought to consider using synthetically generated data to evaluate currently established trends in the literature. This is particularly important given that nephrology research leveraging synthetically generated datasets continues to emerge across related areas, including dialysis and kidney transplantation[29,30].
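Comparing models trained on synthetic data with models trained on real data, both evaluated on real patient records, is often described as "train on synthetic, test on real" (TSTR). The Python sketch below is a hypothetical illustration of that comparison, not the method of the cited study[28]: the toy records, the two features, the per-class independent Gaussian generator, and the nearest-centroid classifier are all assumptions chosen to keep the example small and self-contained.

```python
import random
import statistics


def fit_class_gaussians(rows, labels):
    """Per class, record (mean, SD) of each feature; features are treated as independent."""
    model = {}
    for cls in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        model[cls] = [(statistics.fmean(col), statistics.stdev(col)) for col in zip(*members)]
    return model


def sample_synthetic(model, n_per_class, seed=0):
    """Generate labelled synthetic rows from the fitted per-class model."""
    rng = random.Random(seed)
    rows, labels = [], []
    for cls, feats in model.items():
        for _ in range(n_per_class):
            rows.append([rng.gauss(mu, sd) for mu, sd in feats])
            labels.append(cls)
    return rows, labels


def train_nearest_centroid(rows, labels):
    """A deliberately simple classifier: one mean feature vector per class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        centroids[cls] = [statistics.fmean(col) for col in zip(*members)]
    return centroids


def accuracy(centroids, rows, labels):
    """Share of rows whose nearest centroid (squared Euclidean distance) has the true label."""
    def predict(row):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))
    return sum(predict(r) == lab for r, lab in zip(rows, labels)) / len(labels)


# Hypothetical toy records: [serum creatinine (mg/dL), hemoglobin (g/dL)]; label 1 = CKD.
real_train = [[0.8, 14.5], [0.9, 13.8], [1.0, 15.0], [0.7, 13.2],
              [2.1, 10.5], [3.0, 9.8], [1.8, 11.2], [2.6, 10.0]]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
real_test = [[0.9, 14.1], [1.1, 14.8], [3.4, 9.1], [2.2, 11.0]]
test_labels = [0, 0, 1, 1]

syn_rows, syn_labels = sample_synthetic(fit_class_gaussians(real_train, train_labels), 500, seed=7)
trtr = accuracy(train_nearest_centroid(real_train, train_labels), real_test, test_labels)
tstr = accuracy(train_nearest_centroid(syn_rows, syn_labels), real_test, test_labels)
print(f"train-real/test-real: {trtr:.2f}, train-synthetic/test-real: {tstr:.2f}")
```

If the synthetic data preserve the class structure of the real data, the two accuracies should be similar; a large gap would suggest the generator has lost clinically relevant patterns. A real evaluation would use far larger datasets, several algorithms, and a properly held-out test set.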
Elevated blood pressure and hypertension are well established in the clinical literature, affecting over 1 billion individuals worldwide[31]. Bibliometric analyses show an increase of over 40% in published research articles related to hypertension over the previous 2 decades[32]. The United States alone has contributed heavily to this tremendous body of research compared with other countries. Additionally, because the epidemiological contributors to hypertension can vary across countries (e.g., modifiable risk factors and socioeconomics), a large-scale dataset from one country may not be as clinically applicable to populations in other countries[33,34]. Moreover, current evidence suggests that synthetic data can achieve accuracy comparable to real data in blood pressure monitoring and prediction, but further evaluation is needed before stronger clinical correlation or application across multiple study populations[33-35]. Nevertheless, synthetic data may help identify and refine these contributors in other countries where modifiable risk factors and socioeconomic conditions differ (e.g., developing vs developed countries). Synthetic data make this possible because artificial intelligence can be used to create large-scale datasets where comparable real-world datasets are not freely accessible across countries. Finally, synthetic data remain primarily a research tool at this time, as using these datasets to guide clinical decision-making would be premature without effective clinician involvement.
The application of artificial intelligence, machine learning, and deep learning continues to emerge in nephrology and hypertension research. Synthetic data can serve as an appropriate research tool to further enrich this body of literature, from histopathology to population health. Specifically, the framework proposed here focuses on renal oncology, chronic kidney disease, and blood pressure and hypertension. However, there are other promising avenues for implementing synthetic data in nephrology and hypertension research, such as autoimmune kidney diseases, dialysis, and renal-associated syndromes, which were not discussed because of a relative paucity of literature on synthetic data applications in these areas. These gaps may create ample opportunity to accelerate research development.
1. Guo Y, Hao Z, Zhao S, Gong J, Yang F. Artificial Intelligence in Health Care: Bibliometric Analysis. J Med Internet Res. 2020;22:e18228.
2. Pappada SM. Machine learning in medicine: It has arrived, let's embrace it. J Card Surg. 2021;36:4121-4124.
3. Niel O, Bastard P. Artificial Intelligence in Nephrology: Core Concepts, Clinical Applications, and Perspectives. Am J Kidney Dis. 2019;74:803-810.
4. Yuan Q, Zhang H, Deng T, Tang S, Yuan X, Tang W, Xie Y, Ge H, Wang X, Zhou Q, Xiao X. Role of Artificial Intelligence in Kidney Disease. Int J Med Sci. 2020;17:970-984.
5. Chaudhuri S, Long A, Zhang H, Monaghan C, Larkin JW, Kotanko P, Kalaskar S, Kooman JP, van der Sande FM, Maddux FW, Usvyat LA. Artificial intelligence enabled applications in kidney disease. Semin Dial. 2021;34:5-16.
6. Loftus TJ, Shickel B, Ozrazgat-Baslanti T, Ren Y, Glicksberg BS, Cao J, Singh K, Chan L, Nadkarni GN, Bihorac A. Artificial intelligence-enabled decision support in nephrology. Nat Rev Nephrol. 2022;18:452-465.
7. Büllow RD, Marsh JN, Swamidass SJ, Gaut JP, Boor P. The potential of artificial intelligence-based applications in kidney pathology. Curr Opin Nephrol Hypertens. 2022;31:251-257.
8. Tangri N, Stevens LA, Griffith J, Tighiouart H, Djurdjev O, Naimark D, Levin A, Levey AS. A predictive model for progression of chronic kidney disease to kidney failure. JAMA. 2011;305:1553-1559.
9. Krishnamurthy S, Ks K, Dovgan E, Luštrek M, Gradišek Piletič B, Srinivasan K, Li YJ, Gradišek A, Syed-Abdul S. Machine Learning Prediction Models for Chronic Kidney Disease Using National Health Insurance Claim Data in Taiwan. Healthcare (Basel). 2021;9.
10. Chuah A, Walters G, Christiadi D, Karpe K, Kennard A, Singer R, Talaulikar G, Ge W, Suominen H, Andrews TD, Jiang S. Machine Learning Improves Upon Clinicians' Prediction of End Stage Kidney Disease. Front Med (Lausanne). 2022;9:837232.
11. Shiovitz TM, Bain EE, McCann DJ, Skolnick P, Laughren T, Hanina A, Burch D. Mitigating the Effects of Nonadherence in Clinical Trials. J Clin Pharmacol. 2016;56:1151-1164.
12. Jones J. The effects of non-response on statistical inference. J Health Soc Policy. 1996;8:49-62.
13. Ristevski B, Chen M. Big Data Analytics in Medicine and Healthcare. J Integr Bioinform. 2018;15.
14. Reith C, Landray M, Devereaux PJ, Bosch J, Granger CB, Baigent C, Califf RM, Collins R, Yusuf S. Randomized clinical trials--removing unnecessary obstacles. N Engl J Med. 2013;369:1061-1065.
15. Greenberg JK, Landman JM, Kelly MP, Pennicooke BH, Molina CA, Foraker RE, Ray WZ. Leveraging Artificial Intelligence and Synthetic Data Derivatives for Spine Surgery Research. Global Spine J. 2023;13:2409-2421.
16. Kaur N, Bhattacharya S, Butte AJ. Big Data in Nephrology. Nat Rev Nephrol. 2021;17:676-687.
17. Saez-Rodriguez J, Rinschen MM, Floege J, Kramann R. Big science and big data in nephrology. Kidney Int. 2019;95:1326-1337.
18. Ive J. Leveraging the potential of synthetic text for AI in mental healthcare. Front Digit Health. 2022;4:1010202.
19. Gonzales A, Guruswamy G, Smith SR. Synthetic data in health care: A narrative review. PLOS Digit Health. 2023;2:e0000082.
20. Rajotte JF, Bergen R, Buckeridge DL, El Emam K, Ng R, Strome E. Synthetic data as an enabler for machine learning applications in medicine. iScience. 2022;25:105331.
21. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2020. CA Cancer J Clin. 2020;70:7-30.
22. Zhou H, Cui F, Lv D, Gong Q, Wen J, Shuang W. Top 100 most-cited articles on renal cell carcinoma: A bibliometric analysis. Medicine (Baltimore). 2023;102:e32926.
23. Sabharwal Y. NephroNet: A Novel Program for Identifying Renal Cell Carcinoma and Generating Synthetic Training Images with Convolutional Neural Networks and Diffusion Models. Electrical Engineering and Systems Science. 2023.
24. Kowalewski KF, Egen L, Fischetti CE, Puliatti S, Juan GR, Taratkin M, Ines RB, Sidoti Abate MA, Mühlbauer J, Wessels F, Checcucci E, Cacciamani G; Young Academic Urologists (YAU)-Urotechnology-Group. Artificial intelligence for renal cancer: From imaging to histology and beyond. Asian J Urol. 2022;9:243-252.
25. Chen RJ, Lu MY, Chen TY, Williamson DFK, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng. 2021;5:493-497.
26. Lv JC, Zhang LX. Prevalence and Disease Burden of Chronic Kidney Disease. Adv Exp Med Biol. 2019;1165:3-15.
27. Yin T, Chen Y, Tang L, Yuan H, Zeng X, Fu P. Relationship between modifiable lifestyle factors and chronic kidney disease: a bibliometric analysis of top-cited publications from 2011 to 2020. BMC Nephrol. 2022;23:120.
28. M GH, Shenoy PD, R VK. Performance Analysis of Real and Synthetic Data using Supervised ML Algorithms for Prediction of Chronic Kidney Disease. In: 2022 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT). India: IEEE, 2022: 1-6.
29. Kim HW, Heo SJ, Kim JY, Kim A, Nam CM, Kim BS. Dialysis adequacy predictions using a machine learning method. Sci Rep. 2021;11:15417.
30. Seyahi N, Ozcan SG. Artificial intelligence and kidney transplantation. World J Transplant. 2021;11:277-289.
31. Mills KT, Stefanescu A, He J. The global epidemiology of hypertension. Nat Rev Nephrol. 2020;16:223-237.
32. Devos P, Ménard J. Trends in Worldwide Research in Hypertension Over the Period 1999-2018: A Bibliometric Study. Hypertension. 2020;76:1649-1655.
33. Arora A, Arora A. Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset. PLoS One. 2023;18:e0283094.
34. Visco V, Izzo C, Mancusi C, Rispoli A, Tedeschi M, Virtuoso N, Giano A, Gioia R, Melfi A, Serio B, Rusciano MR, Di Pietro P, Bramanti A, Galasso G, D'Angelo G, Carrizzo A, Vecchione C, Ciccarelli M. Artificial Intelligence in Hypertension Management: An Ace up Your Sleeve. J Cardiovasc Dev Dis. 2023;10.
35. Chaikijurajai T, Laffin LJ, Tang WHW. Artificial Intelligence and Hypertension: Recent Advances and Future Outlook. Am J Hypertens. 2020;33:967-974.