DHS AI/ML Toolkit
The DHS AI/ML Toolkit is a suite of interoperable tools designed to enable scalable, explainable, and efficient AI workflows tailored for Demographic and Health Surveys (DHS) data. Built with a focus on open science, public health impact, and technical reproducibility, the toolkit supports every stage of the data pipeline, from ingestion and transformation to modeling, visualization, and policy application.
The project was inspired by a 2020 study by Bitew et al., which highlighted the untapped potential of applying data science and machine learning methods to DHS survey data. In response, our team launched this initiative to promote standardized, reproducible AI frameworks and support graduate students, researchers, and organizations working with DHS data.
Whether you are building child mortality risk models, spatial dashboards, or Bayesian inference systems, the DHS AI/ML Toolkit offers powerful components to support data-driven development and decision-making in low-resource and research settings.
Toolkit Components
- DHS-To-Database-dhs2CSVTables-simplified (Open Source)
  Converts DHS datasets into clean CSV tables that can be stored in relational databases such as SQLite and PostgreSQL. It simplifies the data engineering process and makes DHS survey data ready for analysis and model training.
- CIAO BAYESIAN
  An Explainable AI (XAI) system for Bayesian modeling and analysis of DHS data. It supports modeling under-5 mortality risks across clusters with features like uncertainty estimation and Bayesian diagnostics.
- KILIMA TULIP AI
  A DHS-based machine learning model that predicts the survival of children under 5 years across five countries in Africa with more than 95% accuracy. Its predictions feed into the FLOWER dashboard for visual interpretation.
- DEEP MINTILO AI
  A model designed to predict the survival status of a child using 36 variables. It trains on DHS data and includes a detailed workflow for model evaluation and classification accuracy.
- WATOTO SURVIVAL (Open Source)
  An R-based application that uses survival analysis techniques such as Kaplan–Meier and Cox regression to estimate under-five mortality risks across different countries and genders using DHS data.
- MINTILO AI (Open Source)
  A predictive model for estimating the survival of children under 5 using household features, spatial indicators, and survey metadata. Offers tools to explore feature relevance and country comparisons.
- DHS AI Genesis
  DHS AI Genesis was our initial step toward building a platform that applies data science and machine learning to DHS data. It originated as a space to demonstrate how machine learning algorithms can be used to extract insights from household survey data. Genesis marked the beginning of our broader work on the DHS AI/ML Toolkit and remains a reference point for researchers interested in this field.
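
To make the CSV-to-database step described for DHS-To-Database-dhs2CSVTables-simplified more concrete, here is a minimal sketch of how a CSV table might be loaded into SQLite using only the Python standard library. It is not the tool's actual code or schema: the `load_csv_table` helper, the `children` table, and the column names `cluster_id`, `child_id`, and `child_alive` are hypothetical placeholders, and every column is stored as text for simplicity.

```python
import csv
import io
import sqlite3


def load_csv_table(conn, table_name, csv_file):
    """Create a table from a CSV file object and return the number of rows loaded.

    Hypothetical helper for illustration; all columns are stored as TEXT.
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first CSV line holds the column names

    # Quote identifiers so unusual column or table names do not break the SQL.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    quoted_table = '"{}"'.format(table_name.replace('"', '""'))
    placeholders = ", ".join("?" for _ in header)

    conn.execute("CREATE TABLE IF NOT EXISTS {} ({})".format(quoted_table, columns))
    rows = list(reader)
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )
    conn.commit()
    return len(rows)


# Made-up sample standing in for one exported CSV table (not real DHS data).
SAMPLE_CSV = "cluster_id,child_id,child_alive\n1,1,1\n1,2,0\n2,1,1\n"

conn = sqlite3.connect(":memory:")
n_rows = load_csv_table(conn, "children", io.StringIO(SAMPLE_CSV))

# Count recorded deaths per cluster; values are text, so compare against '0'.
deaths_by_cluster = conn.execute(
    "SELECT cluster_id, SUM(child_alive = '0') "
    "FROM children GROUP BY cluster_id ORDER BY cluster_id"
).fetchall()
print(n_rows, deaths_by_cluster)
```

With a real export, the in-memory sample would be replaced by an opened CSV file (for example `open(path, newline="")`), and a typed schema would likely be preferable to all-text columns before model training.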