Join us in transforming healthcare in Africa through Artificial Intelligence!

Watch the tutorials below on how to use ChatGPT for Data Analysis and use our Cheat Sheets for guidance on prompting.

Contact us for opportunities to partner in using AI tools to improve the accuracy and efficiency of data analysis:

Unlocking the Power of Data to Improve Health

Healthcare systems in low- and middle-income countries (LMICs) struggle to make sense of vast amounts of data due to scarcity of analytical skills. By using new Artificial Intelligence (AI) technology like ChatGPT, we aim to bridge this data interpretation gap, thereby supporting decision-making and strengthening health care systems.

We used ChatGPT-4 to analyse and interpret routine health programme data, comparing AI-generated outputs with traditional analysis and statistical software. Working closely with the South African Department of Health, we developed an AI-supported data-analysis ‘cheat sheet’ (available to download below) to scale up this skill.

Embracing ChatGPT as a tool to assist with complex data analysis is a big step forward in improving how data is understood and utilised, especially in LMICs. The AI toolkit can be adapted and used for data interpretation in any sector. While individuals using the toolkit must have an in-depth understanding of the programme, they do not need to be specialists in data analytics.

ChatGPT Data Analysis Toolkit

Tutorial Videos on how to set up and use ChatGPT for Data Analysis

Tutorial 1: Setup

Background and Basic Setup of ChatGPT

Watch Tutorial

Tutorial 2: Upload

Preparing and Uploading Data to ChatGPT

Watch Tutorial

Tutorial 3: Cleaning

Cleaning and Downloading Data with ChatGPT

Watch Tutorial

Tutorial 4: Basic Analysis

Basic Data Analysis and Visualization with ChatGPT

Watch Tutorial

Tutorial 5: Insights

Data Insights with ChatGPT

Watch Tutorial

Tutorial 6: Performance Insights

Programme Performance Insights from ChatGPT

Watch Tutorial

Tutorial 7: Strategic Insights

Strategic Insights from ChatGPT

Watch Tutorial

Tutorial 8: Ethics and AI

Ethics and Limitations of AI

Watch Tutorial

SOP and Prompt Sheets for Data Analysis with ChatGPT

Hardware and software requirements:

  1. A functional computer, either laptop or desktop.
  2. A reliable and uninterrupted internet connection.
  3. A web browser compatible with the ChatGPT platform.

Subscription to ChatGPT-4:

  1. Using a credit card, subscribe to the latest version (currently ChatGPT-4) for a monthly fee of USD 20.
  2. To subscribe, visit https://chat.openai.com

Small data sets or simple tables:

  1. Drag and drop the dataset directly into the ChatGPT chat window.

Large data sets:

  1. For large data sets or multiple files, use the Attachment icon situated within the chat interface.
  2. The system allows for the simultaneous upload of multiple datasets or multiple workbooks within a single dataset.

Supported file formats:

  1. ChatGPT-4 supports a range of file formats, including but not limited to CSV, Excel, PDF and Stata (.dta).

Using the prompt sheets:

Refer to the prompt sheets provided in the toolkit for specific data analysis instructions. The sheets are curated to offer guidance tailored to ChatGPT-4’s capabilities.

Data description and cleaning:

Utilize the prompts specified for data description, sorting and grouping. 

Data analysis:

The prompt sheet offers tailored prompts for both patient-level and cumulative dataset analysis.

Merging datasets:

Refer to the prompt sheet for instructions on merging various data sets.

Data visualization:

Utilize the visualization prompts from the prompt sheet to generate insightful charts and graphs.

Generating reports and insights:

The prompt sheet provides prompts for generating analysis insights, comprehensive reports, summaries, and presentation materials.

NOTES:

  1. Ensure that sensitive data is anonymized or encrypted before uploading for analysis. ChatGPT-4 respects user privacy, but it’s always a good practice to maintain data confidentiality.

  2. Regularly check the OpenAI platform for updates or changes in features and functionalities.

  • Calculating Proportions
  • Sorting Data
  • Presenting Data

The prompts and examples below are based on a DHIS2 Data Export for the TB control programme.
Be sure to use the indicators and descriptions as they appear in your data set.

DATA AGGREGATION (Collecting data into a summary form to allow further analysis)

Before starting, ask ChatGPT to review the data by giving a prompt such as:

►   Describe the dataset.

Calculating proportions

►   Calculate proportions for the following TB indicators per District or Facility:

o   TB screening coverage (proportion of all facility attendants that were screened for TB)

o   Clients eligible for TB test / Symptomatic clients

o   TB test using GeneXpert / Investigation

o   TB bacteriologically confirmed / TB confirmed.

o   Proportion of Clinically diagnosed TB clients / all people diagnosed with TB.

o   TB total diagnosed.

o   Treatment Coverage (Nr of clients who started TB Treatment / TB Treatment Initiation)

o   Initial loss to follow up.

►   Display results in a table and show both raw numbers and proportions as %.
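One way to spot-check the proportions ChatGPT returns is to recompute a few of them by hand. The sketch below is a minimal illustration only. The district names, the field names (headcount, screened, symptomatic) and all counts are made up, and using "clients screened" as the denominator for the symptomatic rate is an assumption that should be checked against the indicator definitions in your own data set.

```python
# Hypothetical aggregate counts per district (illustrative values only).
districts = {
    "District A": {"headcount": 1000, "screened": 900, "symptomatic": 90},
    "District B": {"headcount": 800, "screened": 600, "symptomatic": 30},
}


def proportion(numerator, denominator):
    """Return numerator / denominator as a percentage, or None if undefined."""
    if not denominator:
        return None
    return round(100 * numerator / denominator, 1)


def tb_proportions(counts):
    """Compute screening coverage and symptomatic rate for one district."""
    return {
        # Screening coverage: clients screened out of all facility attendees.
        "screening_coverage_pct": proportion(counts["screened"], counts["headcount"]),
        # Symptomatic rate: symptomatic clients out of those screened (assumed denominator).
        "symptomatic_pct": proportion(counts["symptomatic"], counts["screened"]),
    }


results = {name: tb_proportions(c) for name, c in districts.items()}

# Show raw numbers next to the proportions, as the prompt asks ChatGPT to do.
for name, row in results.items():
    print(name, districts[name], row)
```

If a hand-computed value differs from the ChatGPT table, the usual cause is a different numerator or denominator, so it helps to ask ChatGPT to state the formula it used.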

Sorting Data

►   Rank the Districts / Facilities by Headcount / Screening rate / Number of TB investigations / TB test rate / DS-TB total diagnosed rate in descending order. Display the results in a table.

►   Create a table of Headcounts, Number screened and Screening rate for each District / Facility. Rank the Districts / Facilities in order of Screening Rate.
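The ranking requested in the prompts above amounts to computing a rate per facility and sorting in descending order. The following is a small sketch of that step, assuming made-up facility names and counts; it is meant for checking a ChatGPT ranking, not as a description of how ChatGPT works internally.

```python
# Hypothetical (headcount, number screened) per facility; illustrative values only.
facilities = {
    "Clinic A": (500, 450),
    "Clinic B": (400, 200),
    "Clinic C": (250, 240),
}

# Build one row per facility with its screening rate as a percentage.
rows = [
    {
        "facility": name,
        "headcount": headcount,
        "screened": screened,
        "screening_rate_pct": round(100 * screened / headcount, 1),
    }
    for name, (headcount, screened) in facilities.items()
]

# Sort in descending order of screening rate.
ranked = sorted(rows, key=lambda r: r["screening_rate_pct"], reverse=True)

for rank, row in enumerate(ranked, start=1):
    print(rank, row["facility"], row["headcount"], row["screened"],
          f'{row["screening_rate_pct"]}%')
```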

Presenting Data

►   Display this in a bar graph showing Number of patients screened for TB, Number symptomatic for each District / Facility.

►   Add a secondary axis for Symptomatic Rate as a line graph.

 

  • Data Description
  • Trends
  • Data Interpretation
  • Programme Performance
  • Insights into Predictive Analytics

The prompts and examples below are based on a DHIS Data Export for DS-TB.

Be sure to use the indicators and descriptions as they appear in your data set.

DATA INSIGHTS

(This is the analysis of patterns and statistics within data. Interpretation of the patterns can give insight into the TB programme and inform decision making.)

Data Description

These questions may be asked after any data set upload, data aggregation, data insight, or advanced data analysis output from ChatGPT.

►   Provide key findings from this data set.

►   Describe the findings.

►   Interpret results and provide a simple explanation.

Finding Trends

►   Describe the trends between the four time periods shown in the data set.

►   Describe the trends in Headcount / Screening Numbers / Screening Rate / Number confirmed.

►   Describe how the trends vary across the Districts / Facilities.

Data Interpretation (Arriving at conclusions from reviewing data)

Calculate the indicators first, then proceed to prompts below.

►   Calculate Screening Rate and Symptomatic Rate.

►   Explain what a Low Screening Rate / Low Symptomatic Rate / Low TB Test Rate / Low TB Diagnosed Rate / Low Treatment Initiation Rate indicates.

►   Explain what a Screening Rate >100% / Eligible for Investigation / Symptomatic Rate >100% / Investigation Rate >100% / TB Confirmed Rate >100% / Treatment Initiation Rate >100% indicates.

►   Explain the relationship between TB Test Rate and TB Diagnosed Rate.

Programme performance

►   Indicate which District / Facility has the highest number of TB Diagnosed in relation to Headcount for the year.

►   Indicate which District / Facility has the lowest number of TB Diagnosed in relation to Headcount for the year.

►   Indicate which 5 Districts / Facilities have the Highest Positivity Yield of TB Investigations.

►   Indicate the top 5 performing Districts / Facilities based on this data set. Good performance is defined as Screening Rate >95%, Investigation Rate >95% and Treatment Initiation Rate >95%. Exclude Districts / Facilities with a Screening Rate >100% from good performance.

►   Indicate which Districts / Facilities are showing an improving trend in TB Programme Performance for the year.

►   Based on this data set, indicate how the TB programme performance has progressed over the year.

►   Based on this data set, indicate what are the strengths and weaknesses in the TB Programme.
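The "good performance" rule used in the top 5 prompt above is simple enough to apply by hand when verifying ChatGPT's answer. The sketch below applies it to made-up district rates; the district names, field names and values are hypothetical.

```python
THRESHOLD = 95.0  # "good performance" cut-off from the prompt above, in %

# Hypothetical rates per district, in % (illustrative values only).
rates = {
    "District A": {"screening": 97.0, "investigation": 96.0, "initiation": 98.0},
    "District B": {"screening": 104.0, "investigation": 99.0, "initiation": 99.0},
    "District C": {"screening": 96.0, "investigation": 90.0, "initiation": 97.0},
}


def is_good_performer(r):
    """All three rates above 95%, and the screening rate not above 100%."""
    return (
        THRESHOLD < r["screening"] <= 100.0
        and r["investigation"] > THRESHOLD
        and r["initiation"] > THRESHOLD
    )


good = [name for name, r in rates.items() if is_good_performer(r)]
print(good)
```

Here District B is excluded because its Screening Rate is above 100%, which usually points to a data quality problem rather than strong performance, and District C is excluded because its Investigation Rate is below the threshold.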

Insights into Predictive Analytics

For the best results, use the Data Insights prompts above first.

►   Based on this data set, indicate the likely trends in DS-TB confirmed numbers / DS-TB Treatment Initiation numbers for the Next Quarter / Next Year.

The prompts and examples below are based on a DHIS Data Export for DS-TB.

Be sure to use the indicators and descriptions as they appear in your data set.

PROGRAMME STRATEGY

Once data has been analysed and interpreted, ChatGPT can also assist in addressing gaps and strategizing to improve TB Programme performance.

Programme Strategy Prompts

►   Based on the findings, indicate which interventions may improve TB programme performance in the Districts.

►   Indicate how the Screening Rate / TB Test Rate / Treatment Start Rate / TB Programme Performance can be improved in District X / Districts / District X and Y.

►   List ways in which the Screening Rate >100% can be addressed.

►   Indicate what the barriers to TB Programme performance are in the Districts / Province / District X.

►   List ways in which the barriers to the TB Programme can be addressed in District / Province / District X.

►   Indicate how the population demographics in District / Province impacts access to healthcare.

►   Indicate ways TB Case Finding can be optimized in District / Province / District X.

  • Calculating Proportions
  • Sorting Data
  • Presenting Data

The prompts and examples below are based on an EDR Web Data Export.

Be sure to use the indicators and descriptions as they appear in your data set.

PLEASE NOTE: When using Patient level data, it is critical that Patient Identifying Information (e.g. Names) is removed and the data is anonymized.
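As an illustration of what removing identifying information can look like before upload, the sketch below drops identifying columns and replaces the patient identifier with a salted hash. The column names, the sample rows and the pseudonymize helper are all hypothetical and must be adapted to the actual export. Note that this is pseudonymization rather than full anonymization: indirect identifiers such as exact dates of birth or small-area addresses may still need to be removed or generalized.

```python
import hashlib
import secrets

# Hypothetical column names; adjust to the headers in the actual export.
IDENTIFYING_COLUMNS = {"name", "surname", "id_number", "phone", "address"}


def pseudonymize(rows, id_column, salt):
    """Drop identifying columns and replace the patient ID with a salted hash."""
    cleaned = []
    for row in rows:
        # Keep only non-identifying columns.
        out = {
            key: value
            for key, value in row.items()
            if key not in IDENTIFYING_COLUMNS and key != id_column
        }
        # Same patient + same salt gives the same code, so records stay linkable.
        digest = hashlib.sha256((salt + str(row[id_column])).encode("utf-8"))
        out["patient_code"] = digest.hexdigest()[:12]
        cleaned.append(out)
    return cleaned


# Keep the salt private; it should never be uploaded with the data.
salt = secrets.token_hex(16)

patients = [
    {"patient_id": "P001", "name": "Example Person", "age": 34, "hiv_status": "Positive"},
    {"patient_id": "P002", "name": "Another Person", "age": 61, "hiv_status": "Negative"},
]

safe_rows = pseudonymize(patients, "patient_id", salt)
print(safe_rows)
```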

DATA AGGREGATION

Collecting data into a summary form to allow further analysis.

Before starting, request ChatGPT to review the data by giving a prompt such as:

►   Describe the dataset.

Calculations

►   Calculate proportions as a % for the following DR-TB indicators per District / Facility:

o   HIV positive status

o   Gender

o   Age < 5 years

o   Age > 65 years

o   Patient category: New

o   Patient category: previous first line treatment

o   Patient category: previous second line treatment

o   Pulmonary TB

o   Extrapulmonary TB

o   BMI under 18.5

o   RR-TB

o   MDR-TB

o   XDR-TB

o   Short Regimen

o   Long / Individualised Regimen

►   Display the results in a table. Include raw numbers and proportion as %.

o   Example: Calculate the proportion of HIV positive and Extrapulmonary TB for each DR-TB District. Display the results in a table and include raw numbers and proportion as %.

►   Indicate HIV status distribution by Age Group in each District. Display the results in a table including proportions as % and raw numbers.

o   Example: Create a variable called “Treatment Success” by adding Cured and Treatment Completed outcomes.

►   Calculate Treatment Success Rate for each DR-TB District. Rank the DR-TB Districts by Treatment Success Rate. Display the results in a table including rates as % and raw numbers.

o   Example: Display the numbers and proportions of each Final Treatment Outcome. Create a variable called “Not Died” by adding the following final Treatment Outcomes: Treatment Completed, Cured, Loss to Follow Up, Treatment Failure, Transferred Out and Moved Out outcomes.

►   Display Treatment Success Rate by HIV Status / Age Group / Gender / BMI <18.5 and BMI ≥18.5 / Regimen Type short or individualised / Patient Category: New, Relapse, Treatment after Lost To Follow Up. Display this in a table. Display this in a graph.

o   Example: Display Treatment Success Rate by Short and Individualized Regimen. Display this in a table and a graph.

o   Example: Display Treatment Success Rate by HIV status. Display this in a table.

o   Example: Display Treatment Success Rate by BMI <18.5 and BMI ≥18.5. Display this in a table.

o   Example: Display Lost To Follow Up Rate by the patient categories New, Previous 1st line Treatment, Previous 2nd line treatment and Previous Lost To Follow Up. Display this in a table.
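To verify a Treatment Success Rate reported by ChatGPT, the calculation can be repeated on a small extract. The sketch below follows the definition used in the examples above (Cured plus Treatment Completed) and assumes the denominator is every patient in the district cohort with a recorded final outcome. The districts and records are made up.

```python
from collections import defaultdict

# Outcomes counted as "Treatment Success", as defined in the example above.
SUCCESS_OUTCOMES = {"Cured", "Treatment Completed"}

# Hypothetical anonymized patient-level rows: (district, final treatment outcome).
records = [
    ("District A", "Cured"),
    ("District A", "Treatment Completed"),
    ("District A", "Died"),
    ("District A", "Loss to Follow Up"),
    ("District B", "Cured"),
    ("District B", "Treatment Completed"),
    ("District B", "Cured"),
    ("District B", "Treatment Failure"),
]

totals = defaultdict(int)
successes = defaultdict(int)
for district, outcome in records:
    totals[district] += 1
    if outcome in SUCCESS_OUTCOMES:
        successes[district] += 1

# One row per district: raw numbers plus the rate as a percentage.
table = [
    {
        "district": district,
        "success": successes[district],
        "cohort": totals[district],
        "success_rate_pct": round(100 * successes[district] / totals[district], 1),
    }
    for district in totals
]

# Rank districts by Treatment Success Rate, highest first.
table.sort(key=lambda row: row["success_rate_pct"], reverse=True)

for row in table:
    print(row)
```

The same grouping pattern works for the other breakdowns in the prompts (HIV Status, Age Group, Regimen Type) by replacing the district field with the variable of interest.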

►   Identify the HIV positive people whose ART start date was after DR-TB treatment start date. Calculate the average time to ART start date.

►   Calculate average Time To Death in days.

►   Calculate average Time To Death by HIV Status.

►   Calculate average time to death for BMI under 18.5 and BMI ≥18.5.

►   Calculate Time To Death by Age under or over 65 years.

►   Calculate the average Days On Treatment to Lost To Follow Up Outcome.
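The time-based prompts above (time to ART start, Time To Death, Days On Treatment) all reduce to subtracting two dates and averaging the result over a filtered group. The sketch below shows this for time to ART start, counting only HIV positive patients whose ART start date falls after the DR-TB treatment start date. Field names and dates are hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical anonymized rows; field names and dates are illustrative only.
patients = [
    {"hiv": "Positive", "tb_start": date(2023, 1, 10), "art_start": date(2023, 1, 24)},
    {"hiv": "Positive", "tb_start": date(2023, 2, 1), "art_start": date(2022, 6, 1)},
    {"hiv": "Positive", "tb_start": date(2023, 3, 5), "art_start": date(2023, 4, 4)},
    {"hiv": "Negative", "tb_start": date(2023, 3, 9), "art_start": None},
]

# Days from DR-TB treatment start to ART start, only where ART started afterwards.
delays = [
    (p["art_start"] - p["tb_start"]).days
    for p in patients
    if p["hiv"] == "Positive"
    and p["art_start"] is not None
    and p["art_start"] > p["tb_start"]
]

average_delay = mean(delays) if delays else None
print(len(delays), "patients; average days to ART start:", average_delay)
```

Missing or implausible dates (for example an outcome date before the treatment start date) are worth listing separately, since they change the average and often indicate capture errors.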

Sorting Data

►   Rank the DR-TB Districts by cohort numbers. Display this in a table.

►   Calculate the proportion in % on the Short Regimen for each DR-TB District. Rank the districts by proportion on the Short Regimen and display this in a table.

Presenting Data

►   Create a bar graph of numbers on the Short Regimen, with a secondary axis for proportion on Short Regimen as a line graph.

  • Data Description
  • Data Trends
  • Data Interpretation
  • Programme Performance
  • Insights into Predictive Analytics
  • Advanced Data Analysis
  • Key findings

The prompts and examples below are based on an EDR Web Data Export.

Be sure to use the indicators and descriptions as they appear in your data set.

PLEASE NOTE: When using Patient level data, it is critical that Patient Identifying Information (e.g. Names) is removed and the data is anonymized.

DATA INSIGHTS

This is the analysis of patterns and statistics within data. Interpretation of the patterns can give insight into the TB programme and inform decision making.

Data Description

►   Indicate the key findings from this data set.

►   Interpret results and provide a simple explanation.

►   These questions may be asked after any data set upload, data aggregation, data insight, or advanced data analysis output from ChatGPT.

  • Example: Display Treatment Success Rate by HIV Status in a table. List the key findings.

Data Trends

►   Describe the trends shown in the data set.

  • Example: Indicate if there is a trend in Cohort Numbers during Time Period / Year.
  • Example: Indicate if there is a seasonal pattern to Cohort Numbers or Treatment Outcomes.

►   Indicate the trends in [insert variable / indicator].

  • Example: Indicate the trends in HIV Positive Rate / Short Regimen Rates / Treatment Success Rates / LTFU Rates / Died Rates.

►   Indicate how [insert variable / indicator] varies by [insert variable / indicator].

  • Example: Indicate how HIV Status / Age Group / Gender / BMI / Regimen Type: Short and Long (Individualized) / Final Treatment Outcomes vary by DR-TB District.
  • Example: Indicate how Time in Days to LTFU Outcome varies by Age Group.
  • Example: Indicate how final Treatment Outcome varies by TB Type (Pulmonary and Extrapulmonary).

►   List the DR-TB Districts / Facilities that are showing an improving trend in Treatment Success Rate.

Data Interpretation (Arriving at conclusions from reviewing data)

►   Indicate if there is an association between [insert variable / indicator] and [insert variable / indicator].

  • Example: Indicate if BDQ-Containing Regimens are associated with better Treatment Success Rates.

►   Provide HIV distribution by gender, include proportions and total row. Display results in a table for each variable. Indicate if there is an association between HIV Status and Gender.

►   Describe the relationship between [insert variable / indicator] and [insert variable / indicator]:

o   HIV Status

o   Age Group

o   Gender

o   BMI

o   Regimen Choice: Short or Individualised

  • Example: Describe the relationship between culture conversion and final Treatment Outcomes.
  • Example: Describe the relationship between BDQ-containing regimens and final Treatment Outcomes.
  • Example: Describe the relationship between a Signed Consent Form and Data Completeness.
  • Example: Describe the relationship between HIV Status and BMI < 18.5.
  • Example: Display Treatment Success Rate by HIV Status in a table. Describe the relationship between Treatment Success Rate and HIV Status.

►   Indicate if [insert variable / indicator] impacts [insert variable / indicator].

  • Example: Review the Final Treatment Outcomes and the Regimen Type: Short or Individualized. Indicate if the Regimen Type impacts the Lost to Follow Up Rates.
  • Example: Review the Final Treatment Outcomes and Gender. Indicate if Gender impacts the Lost to Follow Up Rates.
  • Example: Review Gender and BMI. Indicate if Gender impacts BMI.

►   Indicate if [insert variable / indicator] is influenced by [insert variable / indicator].

  • Example: Calculate average time to death. Indicate if Time to Death is influenced by Regimen Type: Short or Individualized.

►   Indicate if the difference between [insert variable / indicator] and [insert variable / indicator] is statistically significant.

  • Example: Indicate if the difference between Male and Female Gender in the Cohort is statistically significant.
  • Example: Indicate if the difference in Time to Death by HIV Status is statistically significant.

►   Indicate if there is a statistically significant difference in [Insert Final Treatment Outcome] rates by HIV Status / Age Group / Gender / BMI <18.5 or ≥18.5 / Regimen type (short or individualized) / Patient Category.
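When ChatGPT reports that a difference between two groups is statistically significant, it is worth asking which test it used and checking the result on the underlying counts. As one possible check for a 2x2 table, the sketch below computes a Pearson chi-square test without continuity correction using only the Python standard library. The chi_square_2x2 function and the counts are illustrative assumptions, not output from a real data set. When expected cell counts are small, an exact test is generally preferred.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p_value). With 1 degree of freedom the p-value
    equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (illustrative only).
# Rows: HIV positive, HIV negative. Columns: Lost to Follow Up, not Lost to Follow Up.
chi2, p = chi_square_2x2(30, 70, 15, 85)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```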

Programme performance

►   List Districts / Facilities that have the best data completeness.

►   Based on this data indicate how the DR-TB Programme Performance progressed between [insert dates or time periods].

►   List the strengths and weaknesses in the DR-TB Programme based on this data set.

►   List the gaps in TB Programme Performance based on this data.

►   Describe how Treatment Outcomes compare with global benchmarks.

Insights into Predictive Analytics

For the best results, use the Data Insights prompts above first followed by these prompts:

►   Indicate the predictors of a Treatment Success outcome.

►   Indicate the predictors of an LTFU Outcome.

►   Indicate the predictors of a Died Outcome.

Advanced Data Analysis

►   Calculate odds ratio for Gender and Final Treatment Outcomes. Show results in a table.

►   Calculate and add odds ratios and p-values for each variable and display all results in a table, showing individual contingency tables.

►   Create a multivariate logistic regression model using Death as the dependent (outcome) variable. Use the following as independent variables: Previous Drug History and HIV Status. Calculate odds ratios and p-values for each variable and display all results in a table, showing individual contingency tables.

►   Indicate the predictors of Death. Show all results in a table. Add RR and 95% CI.

►   Provide a detailed STATA profile for all the work done.
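The odds ratios and RR with 95% CI requested in the prompts above can be cross-checked from a single contingency table. The sketch below computes crude (unadjusted) estimates with log-based confidence intervals; the function names and the counts are hypothetical, and the results will not match adjusted estimates from a regression model.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% confidence interval


def _log_ci(estimate, se_log):
    """Confidence interval built on the log scale, then back-transformed."""
    return (
        math.exp(math.log(estimate) - Z_95 * se_log),
        math.exp(math.log(estimate) + Z_95 * se_log),
    )


def odds_ratio(a, b, c, d):
    """Crude odds ratio with a Woolf 95% CI for the table [[a, b], [c, d]]."""
    if 0 in (a, b, c, d):
        raise ValueError("a zero cell leaves the odds ratio or its CI undefined")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (estimate, *_log_ci(estimate, se_log))


def relative_risk(a, b, c, d):
    """Crude risk ratio (RR) with a log-based 95% CI for the same table."""
    if a == 0 or c == 0:
        raise ValueError("no events in one group; the log-based CI is undefined")
    estimate = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return (estimate, *_log_ci(estimate, se_log))


# Hypothetical counts (illustrative only).
# Rows: HIV positive, HIV negative. Columns: died, did not die.
a, b, c, d = 30, 70, 15, 135

or_est, or_low, or_high = odds_ratio(a, b, c, d)
rr_est, rr_low, rr_high = relative_risk(a, b, c, d)
print(f"OR = {or_est:.2f} (95% CI {or_low:.2f} to {or_high:.2f})")
print(f"RR = {rr_est:.2f} (95% CI {rr_low:.2f} to {rr_high:.2f})")
```

A confidence interval that excludes 1 suggests an association at the 5% level, which gives a quick consistency check against the p-values ChatGPT reports.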

Key Findings

For the best results on Key Findings, use the Data Insights prompts above first followed by these prompts:

►   What are the key findings from this data set?

►   Provide a one paragraph summary.

The prompts and examples below are based on an EDR Web Data Export.

Be sure to use the indicators and descriptions as they appear in your data set.

PLEASE NOTE: When using Patient level data, it is critical that Patient Identifying Information (e.g. Names) is removed and the data is anonymized.

PROGRAMME STRATEGY

Once data has been analysed and interpreted, ChatGPT can also assist in addressing gaps and strategizing to improve TB Programme performance.

Programme Strategy Prompts

►   Based on the findings in this data set (or data aggregation / data insight / advanced data analysis), indicate which interventions may improve the DR-TB programme performance.

►   Indicate how the Loss To Follow Up Rate / Died Rate / Extrapulmonary TB Rates / Malnourished Rates / TB Programme Performance / Adverse Event Management can be improved in District / Province.

►   List ways in which Missing Data Capture can be addressed.

►   Indicate what the barriers to DR-TB programme performance are in District / Province.

►   List ways in which the barriers to the DR-TB programme can be addressed in District / Province.

►   List the population demographics in District / Province and how they impact access to healthcare.

►   Indicate ways DR-TB case finding can be optimized in District / Province.

►   Provide input on whether or not patient education programmes can improve DR-TB outcomes.
