TY - JOUR
T1 - Data science without borders
T2 - bridging the divide in data science capacity across African health institutions
AU - Kiragga, Agnes N.
AU - Iddi, Samuel
AU - Walekhwa, Abel W.
AU - Barasa, Miranda
AU - Cygu, Steve
AU - Odhiambo, Rachel
AU - Gningue, Moctar
AU - Mboup, Aminata
AU - Onana, Anicet
AU - Adnew, Bethlehem
AU - Alemu Ashuro, Akililu
AU - Hudson, Simon
AU - Greenfield, Jay
AU - Todd, Jim
AU - Bhattacharjee, Tathagata
AU - Sharan, Malvika
AU - Sonabend, Raphael
AU - Kadengye, Damazo
AU - Mbatchou, Bertrand Hugo
AU - Abdissa, Alemseged
AU - Sarr, Moussa
AU - Nabende, Joyce Nakatumba
AU - Bamutura, Moses
AU - Tamurat, Bekure
AU - Dereje, Nebiyu
AU - Temfack, Elvis
N1 - Publisher Copyright:
Copyright © 2025 Kiragga, Iddi, Walekhwa, Barasa, Cygu, Odhiambo, Gningue, Mboup, Onana, Adnew, Alemu Ashuro, Hudson, Greenfield, Todd, Bhattacharjee, Sharan, Sonabend, Kadengye, Mbatchou, Abdissa, Sarr, Nabende, Bamutura, Tamurat, Dereje and Temfack.
PY - 2025
Y1 - 2025
AB - Background: Effective public health data science in Africa requires a comprehensive understanding of institutional capabilities across multiple dimensions. This study conducted a multidimensional assessment of three African health institutions to examine the availability of health data, healthcare training, data governance needs, and infrastructure capabilities, to inform the use of data science tools to address health challenges. Methods: The study is a baseline assessment for the Data Science Without Borders Project—a three-year multi-country project implemented in three African health institutions: the Institute for Health Research, Epidemiological Surveillance and Training (IRESSEF) in Senegal, Armauer Hansen Research Institute (AHRI) in Ethiopia, and the Douala General Hospital (DGH) in Cameroon. We designed a baseline structured needs assessment survey to assess: (1) health data availability across sixteen (16) dataset categories; (2) training needs across seven (7) domains, data governance considerations; and (3) infrastructure capabilities, including computing resources, connectivity, and service availability. We then conducted an integrated analysis to identify patterns, gaps, and opportunities across various dimensions, informing project implementation. Results: The assessment revealed different institutional profiles with complementary strengths and limitations, which are critical for the effective use of data science tools. IRESSEF demonstrated rich data resources (particularly in genomics, maternal health, and geographical health differences), moderate infrastructure limitations (8GB RAM, 67% service capability), and high training needs (data & analytics: 4.7/5.0, data governance: 4.0/5.0). AHRI exhibited superior computing resources (512GB RAM, 64 CPU cores), specialized surveillance data (9.9%), and moderate training needs (average: 3.0/5.0). 
DGH demonstrated focused strengths in infectious disease research (3.3%), moderate computing resources (32 GB RAM), and substantial opportunities to use electronic health records for research. Common priorities across institutions included the need to enhance data & analytical capabilities (average: 4.3/5.0), the use of advanced artificial intelligence and machine learning analysis techniques (IRESSEF: 5.0, AHRI: 4.0, DGH: 5.0), and, importantly, the need to establish data governance structures that increase the partners' capacity to share data for collaborative consortium analyses across Africa. Conclusion: Our integrated assessment suggests that effective capacity building requires moving beyond standardized approaches to embrace a phased model that leverages institutional needs and complementarities. We recommend: (1) establishing robust data governance frameworks as a foundation; (2) implementing a phased and customized approach where institutions receive training according to their immediate demands and strengths; and (3) addressing critical infrastructure gaps to support data science projects in Africa that support federated analyses to maintain data sovereignty. This approach offers potential for a varying African approach to health data science, which could extend to AI adoption and broader continental collaboration.
KW - African health informatics
KW - capacity building
KW - cross-institutional collaboration
KW - data governance
KW - federated infrastructure
UR - https://www.scopus.com/pages/publications/105025379003
U2 - 10.3389/fpubh.2025.1695907
DO - 10.3389/fpubh.2025.1695907
M3 - Article
C2 - 41426703
AN - SCOPUS:105025379003
SN - 2296-2565
VL - 13
JO - Frontiers in Public Health
JF - Frontiers in Public Health
M1 - 1695907
ER -