Open research · Community tools

DHS AI/ML Toolkit

Interoperable, explainable AI workflows for Demographic and Health Surveys—built for reproducibility, open science, and researchers everywhere.

Why this toolkit exists

We share methods and software so graduate students, researchers, and organizations can work with DHS data in a standardized, reproducible way.

  1. DHS survey data
  2. Pipelines & models
  3. Research & policy

What it is

The DHS AI/ML Toolkit is a suite of interoperable tools for scalable, explainable AI workflows on Demographic and Health Surveys (DHS) data, from ingestion and transformation to modeling, visualization, and policy use.

Where it started

Inspired by a 2020 study by Bitew et al. on the potential of data science and ML for DHS data, we launched this initiative to promote reproducible frameworks for students and researchers.

Who it’s for

Whether you are building child mortality risk models, spatial dashboards, or Bayesian systems, these components support data-driven development in low-resource and research settings.

Toolkit components

Explore each building block

Each item links to more detail in our Newsletter archive—jump in where your work begins.

STORYTELLER

A Python toolkit and CLI (storyteller-dhs) that turns DHS databases into an interactive Datasette experience with reusable workflows for querying, exports, and full-text search — designed as a storytelling companion for the DHS AI/ML Toolkit.

It also converts DHS datasets into clean CSV tables that can be loaded into relational databases such as SQLite and PostgreSQL. This simplifies data engineering and makes DHS survey data ready for analysis and model training.
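
As a rough illustration of the CSV-to-database step described above, the sketch below loads a small CSV extract into an in-memory SQLite table using only the Python standard library. It is not the storyteller-dhs implementation: the function name load_csv_into_sqlite, the table name births, and the column names are all hypothetical and chosen only for the example.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_file):
    """Create `table` from the CSV header and insert every row as text."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so column names with unusual characters still work.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
    return header


# Hypothetical three-row extract; the column names are illustrative only.
SAMPLE_CSV = """cluster_id,child_age_months,child_alive
101,14,1
101,3,0
102,27,1
"""

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "births", io.StringIO(SAMPLE_CSV))

# Count surviving children per cluster with plain SQL.
alive_by_cluster = conn.execute(
    "SELECT cluster_id, SUM(CAST(child_alive AS INTEGER)) "
    "FROM births GROUP BY cluster_id ORDER BY cluster_id"
).fetchall()
print(alive_by_cluster)
```

Every column is stored as text here to keep the sketch short; a real pipeline would likely map survey variables to typed columns.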

CIAO BAYESIAN

An Explainable AI (XAI) system for Bayesian modeling and analysis of DHS data. It supports modeling under-5 mortality risks across clusters with features like uncertainty estimation and Bayesian diagnostics.
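
To make "uncertainty estimation" concrete, here is a minimal sketch of a Beta-Binomial model for cluster-level mortality risk. It is a textbook conjugate model used for illustration, not a description of how CIAO BAYESIAN works; the function name posterior_summary, the uniform Beta(1, 1) prior, and the cluster counts are assumptions made for the example.

```python
import random


def posterior_summary(deaths, births, prior_a=1.0, prior_b=1.0,
                      draws=20000, seed=0):
    """Beta-Binomial posterior for one cluster's under-5 mortality risk.

    Returns the exact posterior mean and a Monte Carlo 95% credible interval.
    """
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta posterior.
    a = prior_a + deaths
    b = prior_b + (births - deaths)
    mean = a / (a + b)
    # Approximate the interval by sampling from the posterior.
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int(0.025 * draws)]
    upper = samples[int(0.975 * draws) - 1]
    return mean, lower, upper


# Hypothetical clusters: (observed under-5 deaths, recorded births).
clusters = {"A": (3, 40), "B": (0, 12)}
summaries = {name: posterior_summary(d, n) for name, (d, n) in clusters.items()}

for name, (mean, lower, upper) in summaries.items():
    print(f"cluster {name}: mean={mean:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

The smaller cluster B ends up with a wider interval than cluster A, which is the kind of uncertainty a point estimate alone would hide.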

KILIMA TULIP AI

A DHS-based machine learning model that predicts the survival of children under 5 years across five countries in Africa with more than 95% accuracy. Its predictions feed into the FLOWER dashboard for visual interpretation.

DEEP MINTILO AI

A model designed to predict the survival status of a child using 36 variables. It trains on DHS data and includes a detailed workflow for model evaluation and classification accuracy.
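
A model-evaluation workflow for a survival classifier typically reports more than overall accuracy. The sketch below is a generic example, not the DEEP MINTILO AI workflow: the function evaluate_binary and the label convention (1 = child survived, 0 = child died) are assumptions. It computes accuracy, sensitivity, and specificity from a confusion matrix on hypothetical labels.

```python
def evaluate_binary(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual positives (survivors) predicted correctly.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Share of actual negatives (deaths) predicted correctly.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical labels: 95 children survived, 5 died,
# and a trivial baseline predicts "survived" for everyone.
y_true = [1] * 95 + [0] * 5
y_pred = [1] * 100
report = evaluate_binary(y_true, y_pred)
print(report)
```

Because survival is the majority outcome, this trivial baseline scores high accuracy while identifying none of the deaths, so class-wise metrics are worth reporting next to accuracy.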

WATOTO SURVIVAL

Open source

An R-based application that uses survival analysis techniques such as Kaplan–Meier and Cox regression to estimate under-five mortality risks across different countries and genders using DHS data.
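
For readers new to survival analysis, the following sketch shows the Kaplan-Meier product-limit estimator in plain Python. WATOTO SURVIVAL itself is R-based; this is only an illustration of the technique, and the function name kaplan_meier and the follow-up data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct death time.

    durations: follow-up time per child (for example, months).
    events: 1 if a death was observed at that time, 0 if censored.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all children who share the same follow-up time.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the share surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored children both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored observation.
durations = [2, 5, 5, 12, 24, 36, 60, 60]
events = [1, 1, 0, 1, 0, 1, 0, 0]
km_curve = kaplan_meier(durations, events)

for time, prob in km_curve:
    print(f"month {time}: S(t) = {prob:.3f}")
```

Censored children still count toward the risk set until their last observed time, which is what separates this estimate from a simple proportion of survivors.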

MINTILO AI

Open source

A predictive model for estimating the survival of children under 5 using household features, spatial indicators, and survey metadata. Offers tools to explore feature relevance and country comparisons.

DHS AI Genesis

DHS AI Genesis was our initial step toward building a platform that applies data science and machine learning to DHS data. It originated as a space to demonstrate how machine learning algorithms can be used to extract insights from household survey data. Genesis marked the beginning of our broader work on the DHS AI/ML Toolkit and remains a reference point for researchers interested in this field.