AI Initiative for Public Health and Tropical Medicine


The DHS AI WebApp is a user-friendly, SQLite-based database repository specifically designed for DHS survey data. It serves as a powerful resource for Data Science projects, providing researchers, data analysts, and graduate students with easy-to-use querying capabilities tailored to their specific research subjects.

Project Initiative

  • Our project initiative aims to promote the use of data science and machine learning techniques with DHS survey data. It was inspired by a study by Bitew et al. (2020) that highlighted the untapped potential of these methods.
  • We are committed to supporting graduate students interested in machine learning and data science. Through our project, we provide valuable resources, guidance, and assistance to help these students explore the vast opportunities offered by DHS survey data in their research endeavors.
  • Additionally, we strive to encourage the adoption of a consistent and unified framework for working with DHS survey data when developing machine learning algorithms. By advocating for standardized practices, we aim to improve the reproducibility and comparability of results across different studies and research projects. Our goal is to establish a solid foundation that enables researchers to effectively harness the power of DHS survey data in their machine learning endeavors.

The DHS Program

The Demographic and Health Surveys Program (DHS) is a crucial initiative managed by ICF International and sponsored by the United States Agency for International Development (USAID), with valuable contributions from esteemed organizations like UNICEF, UNFPA, WHO, and UNAIDS. Its primary objective is to collect and disseminate accurate, nationally representative data on health and population.

Through its extensive work spanning more than 400 surveys conducted in over 90 countries, the DHS Program has successfully gathered, processed, and shared reliable and comprehensive data on critical aspects such as population dynamics, health indicators, HIV/AIDS prevalence, and nutrition. This program has been instrumental in advancing our understanding of these important areas, enabling policymakers, researchers, and organizations to make informed decisions and implement effective interventions.

DISCLAIMER

Please be aware that authorized access to survey data from the Demographic and Health Surveys (DHS) Program is granted solely for the purpose of statistical reporting and analysis. If you intend to use the data for any other purpose, you must register a new research project. All DHS data must be handled confidentially, and no attempt should be made to identify any interviewed households or individual respondents. For detailed terms of use, please refer to: DHS - Terms of Use. Users are also required to submit an electronic copy (PDF) of any reports or publications resulting from the use of DHS data files to: references@dhsprogram.com.

IMPORTANT: We have made the SQLite database available on our DHS AI WebApp platform to enhance data science and machine learning approaches specifically for DHS survey data. However, it is essential to clarify that our objective is not to provide access to DHS data to users who are unfamiliar with the DHS Program. If you plan to publish your research study, we strongly advise you to request a new data download directly from the DHS Program. Once you have been granted access to the DHS Program website, KofiyaTech can assist you in converting your data into SQLite databases suitable for data science and machine learning studies.

ETL (Extract, Transform, Load) Process

DHS-To-Database by Harry Gibson is a Python software program designed to parse and load data obtained from the Demographic and Health Surveys (DHS) Program into a relational database format. This tool enables the seamless integration and analysis of multiple surveys on a large scale, facilitating reliable and repeatable pooling and cross-sectional analysis.

Harry Gibson, the creator of this ETL (Extract, Transform, Load) methodology, has generously shared the code and related resources in a dedicated GitHub repository. In addition to developing the software, he has also conducted reverse-engineering of the CSPro data format and thoroughly explored the DHS survey structure.

The methods presented and demonstrated in this repository have been successfully utilized in numerous studies, resulting in the collection and preparation of datasets that have been extensively published. Researchers have benefited from the application of these methods to investigate a wide range of topics within the DHS survey domain.

How to access and view the database after download

Please follow the instructions below to access and view the database after downloading the zipped file:

  • Database Type: The database is in the SQLite format, ensuring ease of use and compatibility.
  • Extracting the Data: Begin by extracting the contents of the downloaded file. We recommend using the 7-zip software tool to unzip the file efficiently.
  • Opening the Database: Once the file is successfully unzipped, you can access the database using the DB Browser for SQLite software tool. This user-friendly tool allows you to explore and interact with the database effectively.
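For readers who prefer a script to the graphical tools above, the same two steps can be sketched in Python using only the standard library. This is a minimal sketch, not part of the DHS AI WebApp itself: it assumes the download is a standard .zip archive, and the function names `extract_archive` and `open_read_only` are hypothetical helpers introduced here for illustration.

```python
import sqlite3
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every file in a .zip archive and return the extracted paths.

    Assumption: the download is a standard .zip file. Other archive
    formats would need a different tool, such as 7-zip.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return [target / name for name in names]


def open_read_only(db_path):
    """Open a SQLite file read-only so the data cannot be changed by accident."""
    # SQLite accepts "file:" URIs; the mode=ro parameter requests read-only access.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

A typical call would pass the path of the downloaded archive to `extract_archive`, then pass the extracted database file to `open_read_only`. Opening the file read-only is a design choice in this sketch: it keeps the downloaded survey data unchanged while you explore it.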

Within the database, each DHS data type (e.g., HR, IR, KR, etc.) is organized into three distinct types of tables:

  • Main Analysis Table: This table holds the raw survey data used for statistical analysis. For the Household Record (HR), for example, it is named "ET_2005_DHS_07082021_1930_58107.ETHR51FL.RECORD1".
  • Variable Name Metadata Table: This table describes each variable name used in the survey data, helping researchers understand the context and meaning of each variable. For the Household Record (HR), for example, it is named "ET_2005_DHS_07082021_1930_58107.ETHR51FL.FlatRecordSpec".
  • Variable Value Metadata Table: This table provides the recoding values for the categorical variables in the survey data, so that researchers can interpret and analyze them accurately. For the Household Record (HR), for example, it is named "ET_2005_DHS_07082021_1930_58107.ETHR51FL.FlatValuesSpec".
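Because these table names contain dots, they must be wrapped in double quotes in SQL; otherwise SQLite reads the part before a dot as a schema name. The sketch below shows one way to list the tables, group them by survey file, and preview a few rows using Python's built-in sqlite3 module. The helper names are hypothetical, and the column layout of the metadata tables is not assumed: `column_names` reads it from the database instead.

```python
import sqlite3


def quote_identifier(name):
    """Wrap a table name in double quotes so that dots are treated as part of the name."""
    return '"' + name.replace('"', '""') + '"'


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def group_by_survey_file(table_names):
    """Group table names by the text before the last dot.

    For example, "<prefix>.RECORD1" and "<prefix>.FlatRecordSpec"
    end up in the same group under "<prefix>".
    """
    groups = {}
    for name in table_names:
        prefix, _, suffix = name.rpartition(".")
        groups.setdefault(prefix, []).append(suffix)
    return groups


def column_names(conn, table_name):
    """Look up a table's column names rather than assuming them."""
    sql = f"PRAGMA table_info({quote_identifier(table_name)})"
    return [row[1] for row in conn.execute(sql).fetchall()]


def preview(conn, table_name, limit=5):
    """Return the first few rows of a table."""
    sql = f"SELECT * FROM {quote_identifier(table_name)} LIMIT ?"
    return conn.execute(sql, (limit,)).fetchall()
```

With a connection to the downloaded database, calling `group_by_survey_file(list_tables(conn))` shows which main and metadata tables belong together, and `column_names` can then be used on a metadata table to see which columns hold the variable names and descriptions.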

Please do not hesitate to reach out to us if you require any assistance with the software installation process. Our team is here to support you and ensure that you can easily set up the necessary tools for accessing and utilizing the database.

DEMO

In this workshop, we discussed the DHS AI WebApp and its project initiative, followed by a live demo of how to download and work with data.


Support

At Kofiya Technologies, we have developed the DHS AI WebApp platform with the primary objective of supporting and empowering Data Science research projects. We understand the significance of providing easy access to DHS survey data in a simplified and user-friendly database format, particularly for graduate students and academics worldwide.

If you have any questions or need help with research projects involving the database, please contact us. Our team is here to provide guidance and support so that your work progresses smoothly and you can make the most of the resources available.