Synthetic data within a common data model for artificial intelligence applications in maternal health: experience report in the Colombian context

Ever Augusto Torres-Silva, Juan José Gaviria-Jiménez, Ana María Guevara-Zambrano, Laura Herrera-Almanza, José Flórez-Arango

Keywords: electronic health records, maternal health, pregnancy, artificial intelligence

Abstract

Introduction. Synthetic data offer healthcare an alternative way to generate clinical records that resemble those recorded in real clinical settings. Their benefits include greater data volume, the possibility of representing specific patient populations, protection of the privacy of real patients' data, and easier data sharing among different stakeholders.
Objective. To formulate a synthetic data generation model for the gestational care process in Colombia and adapt it to the Observational Medical Outcomes Partnership (OMOP) common data model to facilitate its integration into artificial intelligence applications in maternal health.
Materials and methods. We conducted a case study in formulating fully synthetic data that covered some of the most frequent conditions and outcomes during gestation, based on a typical care process for pregnant women in Colombia. This approach was complemented by adapting the data to a common data model, to facilitate their integration into future artificial intelligence applications or complementary systems that benefit from a standardized language, regardless of the source system or classification scheme.
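As a rough illustration of what adapting simulated records to a common data model involves, the sketch below maps condition records onto a subset of the columns of the OMOP CDM `CONDITION_OCCURRENCE` table. It is not the authors' pipeline: the record shape, the source codes, the helper `to_condition_occurrence`, and the integer IDs in `HYPOTHETICAL_CONCEPT_MAP` are all placeholders rather than real OMOP vocabulary entries. The only OMOP convention it relies on is that concept ID 0 marks a source code with no matching standard concept.

```python
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL lookup from a source code to an OMOP standard concept_id.
# Both the source codes and the integer IDs are placeholders; a real ETL
# would resolve codes against the OMOP standardized vocabularies.
HYPOTHETICAL_CONCEPT_MAP = {
    "SRC-GDM": 1001,           # placeholder: gestational diabetes
    "SRC-PREECLAMPSIA": 1002,  # placeholder: preeclampsia
}
NO_MATCHING_CONCEPT = 0  # OMOP convention for codes with no standard concept


@dataclass
class SourceCondition:
    """A condition record as a simulator might export it (hypothetical shape)."""
    patient_id: str
    code: str
    start: date


@dataclass
class ConditionOccurrence:
    """A subset of the columns of the OMOP CDM CONDITION_OCCURRENCE table."""
    condition_occurrence_id: int
    person_id: int
    condition_concept_id: int
    condition_start_date: date
    condition_source_value: str


def to_condition_occurrence(records: list[SourceCondition],
                            person_ids: dict[str, int]) -> list[ConditionOccurrence]:
    """Map source records to OMOP rows, keeping the original code as source_value."""
    rows = []
    for row_id, rec in enumerate(records, start=1):
        rows.append(ConditionOccurrence(
            condition_occurrence_id=row_id,
            person_id=person_ids[rec.patient_id],
            condition_concept_id=HYPOTHETICAL_CONCEPT_MAP.get(rec.code, NO_MATCHING_CONCEPT),
            condition_start_date=rec.start,
            condition_source_value=rec.code,
        ))
    return rows


records = [
    SourceCondition("p-1", "SRC-GDM", date(2024, 5, 2)),
    SourceCondition("p-2", "SRC-UNKNOWN", date(2024, 6, 9)),
]
rows = to_condition_occurrence(records, {"p-1": 1, "p-2": 2})
```

Keeping the original code in `condition_source_value` while the standardized concept goes in `condition_concept_id` is what lets downstream tools query by a shared vocabulary regardless of how the source system classified the record.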
Results. We formulated a model for the synthetic generation of clinical data, applicable to real clinical settings, that spans the entire course of gestational care through the perinatal period. The model included the most frequent clinical conditions and outcomes, which were diagrammed in the Synthea™ tool, each with a probability of occurrence based on the published literature or the usual practice of obstetric specialists in Colombia.
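Synthea modules are JSON state machines in which a state can branch to several next states with fixed probabilities (a `distributed_transition`) or move to a single next state (a `direct_transition`). The sketch below is a toy, plain-Python imitation of that idea, not Synthea itself and not the study's pregnancy model: the state names and the probabilities 0.85/0.10/0.05 are illustrative placeholders, and `check_distributions`, `next_state`, and `simulate_patient` are hypothetical helpers.

```python
import random
from collections import Counter

# Toy module loosely modeled on Synthea's JSON module format. State names and
# probabilities are illustrative placeholders, NOT the study's clinical values.
TOY_MODULE = {
    "Initial": {"type": "Initial", "direct_transition": "Pregnancy"},
    "Pregnancy": {
        "type": "Simple",
        "distributed_transition": [
            {"distribution": 0.85, "transition": "Uncomplicated"},
            {"distribution": 0.10, "transition": "Gestational_Diabetes"},
            {"distribution": 0.05, "transition": "Preeclampsia"},
        ],
    },
    "Uncomplicated": {"type": "Simple", "direct_transition": "Delivery"},
    "Gestational_Diabetes": {"type": "Simple", "direct_transition": "Delivery"},
    "Preeclampsia": {"type": "Simple", "direct_transition": "Delivery"},
    "Delivery": {"type": "Terminal"},
}


def check_distributions(module):
    """Raise ValueError if any distributed transition's probabilities do not sum to 1."""
    for name, state in module.items():
        options = state.get("distributed_transition")
        if options is not None:
            total = sum(o["distribution"] for o in options)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"{name}: probabilities sum to {total}")


def next_state(state, rng):
    """Return the next state's name, or None when the state is terminal."""
    if "direct_transition" in state:
        return state["direct_transition"]
    options = state.get("distributed_transition")
    if options:
        names = [o["transition"] for o in options]
        weights = [o["distribution"] for o in options]
        return rng.choices(names, weights=weights, k=1)[0]
    return None


def simulate_patient(module, rng):
    """Walk from the Initial state to a terminal state; return the visited path."""
    path = ["Initial"]
    name = next_state(module["Initial"], rng)
    while name is not None:
        path.append(name)
        name = next_state(module[name], rng)
    return path


check_distributions(TOY_MODULE)
rng = random.Random(42)  # seeded so reruns produce the same cohort
paths = [simulate_patient(TOY_MODULE, rng) for _ in range(10_000)]
branch_counts = Counter(p[2] for p in paths)  # branch chosen after "Pregnancy"
```

In this sketch, each simulated patient follows one branch, so over many patients the branch frequencies in `branch_counts` approach the configured probabilities; this is how per-condition probabilities taken from the literature or expert practice become population-level frequencies in the generated records.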
Conclusions. This study demonstrates that generating synthetic data for the gestational care process in Colombia is feasible, and it represents a pioneering contribution in the region.

How to Cite
Torres-Silva EA, Gaviria-Jiménez JJ, Guevara-Zambrano AM, Herrera-Almanza L, Flórez-Arango J. Synthetic data within a common data model for artificial intelligence applications in maternal health: experience report in the Colombian context. Biomed. [Internet]. 2025 Dec. 10 [cited 2026 Jan. 11];45(Sp. 3):71-92. Available from: https://revistabiomedicaorg.biteca.online/index.php/biomedica/article/view/7937

Published
2025-12-10