The rapid expansion of mobile phone usage in low-income and middle-income countries has created unprecedented opportunities for applying AI to improve individual and population health.
At benshi.ai, a non-profit funded by the Bill and Melinda Gates Foundation, the goal is to transform health outcomes in resource-poor countries through advanced AI applications. We aim to do so by providing medical care teams and frontline workers with personalized predictions and recommendations that support diagnosis, and by nudging patients through personalized incentives towards better disease treatment management and general wellness.
To this end, we have built an operational machine learning platform that provides personalized content and interventions in real time. Multiple engineering and machine learning decisions have been made to overcome different challenges and to build an experimentation engine and a centralized data and model management system for global health. Databricks served as a cornerstone upon which all our data/ML services were built. In particular, MLflow and dbx (an open-source tool from Databricks) have been crucial for the training, tracking and management of our end-to-end model pipelines. From the data science perspective, our challenges involved causal inference analysis, behavioral time series forecasting, micro-randomized trials, and contextual bandits-based experimentation at the individual level.
This talk will focus on how we overcame the technical challenges to build a state-of-the-art machine learning platform that serves to improve global health outcomes.
Africa Perianez: Hello everyone. Thank you, organizers. It is really a pleasure to be here today. My name is Africa Perianez and I’m the CEO and founder of benshi.ai. Benshi.ai is a nonprofit founded during the pandemic, together with the Bill and Melinda Gates Foundation, to bring the latest personalization techniques to the most underserved communities. We are building a machine learning platform to provide individual recommendations, individual predictions and incentives to healthcare teams and patients. So our business is accelerating and democratizing behavioral machine learning for low- and middle-income countries to reduce health inequalities.
More specifically, we focus on providing real-time, just-in-time personalized incentives and recommendations to frontline health workers and patients. We use these data to move from the individual to the collective, to shape strategies for collective behaviors and in this way help healthcare leaders.
We have many challenges, but here we prioritize the ones that we focus on the most, we can say. We want to operationalize those large-scale digital traces from mobile health devices, be able to turn those behavioral logs into robust, personalized results, and move towards causal analysis, beyond correlations.
How are we doing this? Here, in this slide, I show a schema of how we connect with our partners. First, we receive the data, the logs of the different apps, either patient data or logs from the healthcare workers using the app. We combine those logs with different contextual information. Then, with this information, we apply different machine learning models and experimentation to find out what is going to happen next, or how we can incentivize, how we can nudge behavior, and the reaction and the impact we are having are fed back into the system.
Who is behind benshi.ai? We are a team of scientists and engineers, [inaudible] healthcare specialists, [inaudible]. We are a diverse team. This is crucial for every data science team, but particularly in this field.
We need our models to be unbiased, hopefully. That is why it’s important that we have engineers and data scientists coming from many different cultures and many different backgrounds, to be able to at least reflect that in the models we put into production. I myself, and most of the leadership team, come from the same background: the video game industry. Why am I mentioning this at all? How are video games related to healthcare?
We can say that video games are probably the industry that knows best how to motivate, how to nudge behavior, and most successful apps have game design principles in their app design. So I’m talking about moving beyond gamification; I’m talking about game design principles like intrinsic motivation, [inaudible] of complexity or difficulty in playing games, and progressing in the app with new content, with new features. These kinds of techniques, and this kind of knowledge of how to motivate, have been applied in many places. In the end, this is what we want to do. We want to engage the different users, the caregivers, the patients, with the right level of engagement, with the goal of improving different outcomes.
But this is what we want to find: what are the key elements that are going to motivate users and improve their healthcare outcomes. Every app has different ones, and finding them is what lets the app succeed and become really strong.
Here, I’m showing a schema of how we interact, how we communicate with our partners. I really want to highlight the first component of communication, which is the front-end dashboard, where our partners have full transparency, full knowledge of which [inaudible] is in production and which transformations or functions are being applied to the data. Which model is in production, and when did we change the version? What is the accuracy? What are the different features that are feeding those models? All this information is available in the dashboard, which I will show later.
The second way of communication is through an API or SDK, directly with the end user. Here I want to talk about one of our partners, Maternity Foundation. Maternity Foundation has developed the Safe Delivery App, which is an online learning app to improve the training and support of skilled birth attendants. It is mainly directed at midwives. I want to talk a bit about midwives, indeed. Midwives save lives. A well-trained midwife is able to reduce maternal and neonatal mortality by up to two-thirds. So the impact that organizations such as Maternity Foundation are having in saving lives is crucial, because they focus on improving the learning skills of new [inaudible] around the world, many in low-income settings. So, this is how we are working with them.
We are working with them on [inaudible], and we create an adaptive learning journey with the right content, with the right recommendation, at the right time, to help them in their work and to help them continue learning with the right information, for everyone at the right pace.
Here, I’m showing you some data from users in Ethiopia. On the right, we can see some predictions of learning progress and of the usage of this app. We can see that there is a change of behavior for the users who are able to progress beyond level five. We observed that, even if most of the users are not able to progress beyond that level, those who do are mainly midwives or physicians.
And also here we are showing predicted progression level versus predicted connection time. We observe, and this makes a lot of sense, but this is why it’s important to continue engaging them, that the predicted progression level increases with the predicted connection time. And most of the users who are able to progress above level five will spend at least three hours using the app. So the fact that they engage with the app, that they enjoy using the app, makes them learn more and, in the end, save lives.
So this is just an example of the partners with whom we are working and the kind of app that we are integrating. I come back here to the system schema, this time with the infrastructure applied; I just want to show a very high-level view of the infrastructure.
Our back end is a Kubernetes cluster. We are using Databricks for our data pipelines and model training, and MLflow for our model management system. Here I am showing some of the products that we have incorporated, that are in production, and here we cover the full cycle from data ingestion to the action and the implementation of the different incentives. First of all, we have the data ingestion module, where our partners can come and, in a self-service way, upload the schema, define the schema and define the different projects they have. And they have different KPIs to see how the ingestion is going. Then there is the core data structure: this is where all the data from all the different partners take the same aggregation. Well, not exactly the same aggregation, because it depends from partner to partner, but they are uniform.
From there, I want to highlight that we have different data pipeline processing for the different metrics, KPIs and features. In point 4, I want to highlight three pillars that are very important. These are the different ways that we treat the data in the end. One is analytics. Here we focus on past information. This information is used mainly for validation and metric monitoring purposes, but also to provide our partners with actionable insights about what happened in the past that can be useful for them; we realized that this is already useful information that not so many partners have analyzed deeply. The most important ones are predictions and recommendations. These are the machine learning models that we have in production. The first one, [inaudible]: who is going to suffer what in the future? Who is going to suffer a disease? Who is going to disengage from the app, and when is this going to happen?
But okay, we can predict this, and what do we do with this information? Then we have the machine learning models of the recommendation system, where the data tell us what is the most likely incentive that is going to work with every individual, with every person, and what is the right stage at which we want to nudge them. Once we have the what, do we apply this directly? Of course not, because our mission, in the end, is to democratize experimentation. We need to assess the impact that our hypotheses, that our predictions, are having. The only way of doing it is through experimentation. This is why we have built the experimentation engine. Now we have all the material, all the information, all the randomization, in different ways, ready to be used, and where this becomes reality is in the action registration, what we call the nudging space. This is where, through the API, we send the information to every user at the right moment.
So here in this slide, I also want to show how we are keeping track, keeping full control, of the machine learning models that we put into production. We really need to have very good model governance. We do this by taking control of two main pillars.
The first one is the code, related to the type of hardware we are using and the type of parameters. Any change in the features that we use, or any change in any parameter, is going to be tracked here. It is going to be a different version or a different model, depending on the degree of the changes. Then we have the control in terms of the data: which data is being introduced to train those models, whether we are taking more timeframes, or whether any [inaudible] has been applied. All this information is going to be tracked separately from the type of hardware and from any change in the model parameters or features, because it is a very different type of change: it concerns the data that we are putting in. We have full control of everything we change that can make a different impact [inaudible], hopefully, in the platform.
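The two governance pillars described above (code and parameters on one side, training data on the other) can be illustrated with a minimal sketch. This is not the benshi.ai implementation, which the talk says is built on MLflow; it only assumes that each pillar can be summarized as a JSON-serializable dictionary and fingerprinted independently. The function names, field names and values (`fingerprint`, `model_version`, `cpu-small`, the dates) are all hypothetical.

```python
import hashlib
import json


def fingerprint(payload: dict) -> str:
    """Return a short, stable hash of a JSON-serializable dictionary."""
    # Sorting keys makes the hash independent of dictionary insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def model_version(code_config: dict, data_config: dict) -> dict:
    """Version the code/parameter pillar and the data pillar separately."""
    return {
        "code_version": fingerprint(code_config),
        "data_version": fingerprint(data_config),
    }


# Hypothetical configuration for one trained model.
code_cfg = {
    "hardware": "cpu-small",
    "params": {"max_depth": 4},
    "features": ["sessions_7d", "level"],
}
data_cfg = {
    "train_start": "2021-01-01",
    "train_end": "2021-06-30",
    "transforms": ["dedupe"],
}

v1 = model_version(code_cfg, data_cfg)
# Changing a parameter only moves the code version.
v2 = model_version({**code_cfg, "params": {"max_depth": 6}}, data_cfg)
# Extending the training window only moves the data version.
v3 = model_version(code_cfg, {**data_cfg, "train_end": "2021-09-30"})
```

Keeping two identifiers rather than one makes it possible to tell whether a change in model behavior comes from a new parameter or feature set, or from new training data.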
I also wanted to talk about data pipeline development. We have also developed dbx 2. That is, of course, based on dbx, which was developed by Databricks. Indeed, we started using dbx before it was open-sourced by Databricks, which was last November. What we have done is add our in-house customizations to fit different development needs. For instance, we have added data connectors to different data sources, and we have changed the configuration file from JSON to YAML. We also use it because it is very important for us to work locally, with good coding practices, while also having the power of Databricks for data manipulation and easy access to data in the cloud.
Here I go into a bit more detail on what I talked about before, and I have added one more block, which I didn’t mention in the schema I was showing before, about the machine learning models.
So, as I said, we are using behavioral prediction models to predict who is going to suffer what, and when. To predict actions: who is going to connect, and when? What kind of content are they going to get? When are they going to disengage, in terms of time in the app or in terms of progression in the app? Or, for instance, to predict location.
In the end, what we are defining here is a sample, the sample of my experiment. The second block is the recommendation: which action makes more sense for everyone at the different stages of the life cycle in the app, or given the different events they are going to suffer, what we recommend based on the data. All this information is provided by the recommendation system. Then we have a third block that I mentioned before, which is contextual factor forecasting. Contextual information is crucial for our recommendations.
It is not the same if we are giving a recommendation, for instance, to a midwife in the dry season or in the rainy season, or if there is a particular epidemiological outbreak or a different event happening in that area. So this contextual information needs to be up to date. In many cases, however, this information is not updated, so we need to work with projections or with a forecast to have the information available to support the recommendations based on the behavioral logs.
Then it’s also important that in some cases, mainly for ethical reasons, we cannot perform experiments. In those cases, we also provide our set of tools to perform observational studies, to try to infer as much as we can about the impact of different interventions on the metric of interest. We have also, of course, incorporated, and this is crucial for this kind of [inaudible] platform, a full analysis of the impact of the experiments or actions we are taking, and the validation of the machine learning models in production.
Now I want to go a bit deeper into the experimentation engine we are building, and this is why I added this sentence by Dewey. I’m sure everyone knows the importance, mainly nowadays, of experimentation and of clinical trials, but just to remark on this importance: if a scientific woman be asked what is the truth, she will reply: that which is accepted upon adequate evidence. If she is asked for a description of adequacy of evidence, she will certainly refer to matters of observation and experiment. To exclude these considerations means to throw away the key to understanding knowledge and its objects.
If we really want to know the impact of any action we take, or how it is affecting users, we need to perform an experiment. This is why it is crucial to implement these kinds of technologies, to bring the latest state of the art in experimentation to the most underserved settings. So in our platform we are implementing A/B testing, of course, and moving into micro-randomized trials, bandits and contextual bandits.
More specifically, MRTs, micro-randomized trials, involve multiple randomizations and enable causal modeling of the proximal effects of the randomized intervention components.
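To make the idea of multiple randomizations concrete, here is a minimal simulation sketch, not taken from the talk. It assumes each user is re-randomized at every decision point with a constant probability, and that the proximal outcome is a short-term engagement score; the outcome model, the true effect of 0.5 and all names are hypothetical. With a constant randomization probability, a plain difference in means recovers the average proximal effect; analyses of real MRTs normally use more careful estimators that account for availability and time-varying context.

```python
import random
from statistics import mean


def run_mrt(n_users, n_decision_points, p_treat, rng):
    """Simulate a micro-randomized trial: one randomization per user per decision point."""
    records = []
    for user in range(n_users):
        for t in range(n_decision_points):
            # Each decision point is randomized independently of the previous ones.
            treated = rng.random() < p_treat
            # Hypothetical outcome model: noisy baseline plus a true proximal effect of 0.5.
            outcome = rng.gauss(1.0, 1.0) + (0.5 if treated else 0.0)
            records.append((user, t, treated, outcome))
    return records


def proximal_effect(records):
    """Difference in mean proximal outcome between treated and untreated decision points."""
    treated = [y for _, _, a, y in records if a]
    control = [y for _, _, a, y in records if not a]
    return mean(treated) - mean(control)


records = run_mrt(n_users=200, n_decision_points=30, p_treat=0.5, rng=random.Random(0))
estimate = proximal_effect(records)
```

Because every user contributes many randomized decision points, the trial yields far more randomizations than a classical A/B test with the same number of participants.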
So in the end, what we want to evaluate is when and for whom these interventions are effective, and what factors are maximizing or reducing the impact of those interventions. The contextual bandit setting, in turn, is a system that learns optimal decisions and is able to adapt on the fly, depending on the user, depending on the patient, depending on the healthcare worker. In the end, this [inaudible] is connecting more classical experimentation with truly continuously adapting systems that provide personalized and, very importantly, contextualized online interventions.
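The talk does not say which bandit algorithm the platform uses. As an illustration of a system that learns decisions per context and adapts on the fly, the sketch below uses epsilon-greedy, one of the simplest contextual bandit strategies, with a running mean reward per (context, action) pair. The contexts, actions and reward rates are hypothetical and chosen only for the simulation.

```python
import random
from collections import defaultdict


class EpsilonGreedyContextualBandit:
    """Keeps a running mean reward for every (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = rng if rng is not None else random.Random()
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def best(self, context):
        """Action with the highest estimated reward for this context."""
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def choose(self, context):
        """Explore with probability epsilon, otherwise exploit the current best."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best(context)

    def update(self, context, action, reward):
        """Incrementally update the mean reward for the chosen pair."""
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical probabilities that a user engages with each type of content.
TRUE_RATES = {
    ("midwife", "quiz"): 0.8,
    ("midwife", "video"): 0.2,
    ("physician", "quiz"): 0.2,
    ("physician", "video"): 0.8,
}

rng = random.Random(42)
bandit = EpsilonGreedyContextualBandit(["quiz", "video"], epsilon=0.1, rng=rng)
for _ in range(5000):
    context = rng.choice(["midwife", "physician"])
    action = bandit.choose(context)
    reward = 1.0 if rng.random() < TRUE_RATES[(context, action)] else 0.0
    bandit.update(context, action, reward)
```

After the simulation, `bandit.best("midwife")` and `bandit.best("physician")` return the action with the highest estimated reward for each context. Unlike a fixed A/B split, the allocation shifts towards the better action while the experiment is still running.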
And here, as I mentioned before, I just wanted to show you how the machine learning platform we are building looks. Here we are constantly implementing new features, new services and new code. Here is the landing page, where we can see different KPIs and the tiles of the different options. Here is the ingestion product: the different schemas, how the ingestion is going. Here is the model management: the models that are in production, the different versions running, the different features incorporated, and the validation. For every type of model, every class of model, we have a different validation, so it is showing things like [inaudible] or C-index or scatterplots. Here we have a summary of the different experiments we are testing.
We can monitor the impact as it is happening, so here we can see the average behavior and whether it is having an impact on different metrics. And here, we can observe how different individual users behave compared to the average and, in the end, whether the experiment was significant or not. If you want to create an experiment, it is easy to do, et cetera. Just very quickly, I wanted to show you how they look.
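The talk does not specify how the dashboard decides whether an experiment was significant. As one common possibility for a simple A/B comparison of a binary metric, the sketch below runs a two-sided two-proportion z-test using only the Python standard library. The counts and the 0.05 threshold are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions (B minus A)."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: users who completed a learning module in control (A) and treatment (B).
z, p_value = two_proportion_z_test(success_a=120, n_a=1000, success_b=165, n_b=1000)
significant = p_value < 0.05
```

This normal approximation is reasonable only when both groups are fairly large; adaptive designs such as bandits need different analysis methods, because the allocation depends on past outcomes.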
Just to finish: our goal is to boost health outcomes in resource-poor countries through personalized incentives to nudge behavior. To do that, we are building a machine learning platform focusing on individual behavior predictions, recommendations and reinforcement-learning-based experimentation.
So, as a last statement, I want to say that in the end our mission is to democratize behavioral machine learning. Right now we are exposing APIs, but we want to move forward and develop SDKs to fully integrate with the different apps and to make it easy for every app in low-income settings to use it.
Thank you very much for your attention. Just to mention that we are hiring engineers and data scientists, so if you’re interested in joining us or collaborating with us, please let us know. Thank you very much.
Africa is the founder and CEO of benshi.ai, a non-profit funded by the Bill & Melinda Gates Foundation, that focuses on reducing health inequalities through behavioral machine learning in low-income ...