The Importance of Model Fairness and Interpretability in AI Systems

Machine learning model fairness and interpretability are critical for data scientists, researchers and developers to explain their models and understand the value and accuracy of their findings. Interpretability is also important to debug machine learning models and make informed decisions about how to improve them. In this session, Francesca will go over a few methods and tools that enable you to “unpack” machine learning models, gain insights into how and why they produce specific results, assess your AI system’s fairness and mitigate any observed fairness issues.

Using open source fairness and interpretability packages, attendees will learn how to:

  • Explain model predictions by generating feature importance values for the entire model and/or individual datapoints.
  • Achieve model interpretability on real-world datasets at scale, during training and inference.
  • Use an interactive visualization dashboard to discover patterns in data and explanations at training time.
  • Leverage additional interactive visualizations to assess which groups of users might be negatively impacted by a model and compare multiple models in terms of their fairness and performance.

Watch more Spark + AI sessions here
Try Databricks for free

Video Transcript

– Hi, everyone. Thank you so much for joining my session, The Importance of Model Fairness and Interpretability in AI Systems.

I’m Francesca Lazzeri. I’m a senior cloud advocate at Microsoft, where I lead a team of data scientists and AI developers who build end-to-end solutions on Azure. Specifically, during this session we’re going to talk about how you, as a data scientist or a developer, can build end-to-end responsible machine learning solutions. We have been writing numerous articles around these important topics. The latest article that you can find is called Machine Fairness. I put the link there, so you can check it out. There are also many other resources that we’re going to talk about during this session. Some of these resources are open source, so you can see that we put several different GitHub repos there, such as the InterpretML GitHub repo, but also the Fairlearn GitHub repo. So you can check out all of the useful links and resources after this session and dig in. If you want to talk more, feel free to ask me questions. The agenda for today is going to be divided into three main parts. In the first part we’re going to talk about what responsible AI is, what we mean by responsible artificial intelligence. And then we’re going to look at the two main packages. One is called the InterpretML package, and the second one is called the Fairlearn toolkit.

So we are in a moment of history where we are all leveraging data in order to make significant decisions that really are going to affect individual lives in different types of domains, such as healthcare, justice, finance, education, marketing, but also HR and employment. So it is very important for us, and specifically for our customers, to ensure the safe, ethical and responsible use of artificial intelligence. We know that AI has the potential to drive big changes in the way we do business. And on the other side, like all great technological innovations in the past, it is also important to keep in mind that this type of technological innovation is going to have a very broad impact on society as well.

For all these reasons, we think that it is important, when you build AI solutions and machine learning algorithms, to ask yourself the following questions. How can I, as a data scientist or developer, design, build and use AI systems that create a positive impact on individuals and society? Another important question, for example, is how do we best ensure that AI is safe and also reliable? How can we attain the benefits of AI while also respecting privacy? So these are all open questions that developers and data scientists have. And at Microsoft, we have been building different tools that you can use and leverage in your existing solutions, or to build a new solution, in order to assess the fairness level of your machine learning application.

Why Responsible AI?

Moreover, we are also seeing that many customers are facing the same challenge. There are many different reports, so here I’m putting the recent (indistinct) report, which showed that nine out of 10 organizations are facing ethical issues in the implementation of AI systems. These organizations cited different reasons, including the pressure to quickly implement AI, the failure to consider ethics when implementing AI systems, and the lack of resources dedicated to ethical AI systems.

Microsoft’s AI Principles

At Microsoft, we built a set of principles that is our foundation to guide our thinking as data scientists. We defined six ethical principles that AI systems should follow. As you can see from this slide, the first four are fairness, reliability and safety, privacy and security, and finally inclusiveness. These are really key properties that each AI system should achieve. The other two are transparency and accountability. These somehow sit underneath all of the other principles, and guide how we design, implement and operationalize AI systems.

So let’s start with transparency: what we mean by it and how we can implement it.

What do we actually mean by transparency? We mean two main things. First of all, AI systems should be explainable. And also, AI systems should have algorithms that are accountable, meaning that you can actually understand why they are producing specific results. There are a few use cases for machine learning interpretability, and we can group them into two different categories.

Machine Learning Interpretability use cases

So you see the first one is model designers and evaluators. This is more at training time. And the second one is end users, or providers of the solution to end users. This is at inferencing time, which is the time when your AI applications are consumed. There are many different use cases that are important to keep in mind for both categories. For example, a data scientist needs to explain the output of a model to stakeholders, usually business users and clients, in order to build trust. Another very important and popular use case, I would say, is when data scientists need tools to verify whether a model’s behavior matches the pre-declared objectives. Finally, data scientists also need to ensure the fairness of their trained models. The other use cases, which fall more into the inferencing-time category, are when your AI predictions, so the results that you get from your AI application, need to be explained at inferencing time. Some of the most popular cases are in the healthcare and finance industries. For example, why did a model classify a patient, Fabio, as at risk for colon cancer? Another important question from the finance industry that we receive very often is why a specific client, in this case we call her Rosie, was denied a mortgage loan, or why her investment portfolio carries a higher risk. So these are all questions that, somehow, you have to know how to answer. And specifically, you have to know why you got those specific results.

So that’s why at Microsoft we developed the interpretability toolkit.

So this is a toolkit that really helps data scientists to interpret and explain their models. We put together this toolkit in order to explain machine learning models globally, meaning on all the data, or locally, on a specific data point, using really state-of-the-art technology that is very easy to use.

Second, we wanted to incorporate the cutting-edge interpretability solutions that are developed by Microsoft, but also leverage the open source community. So this aspect is very important. And finally, we were able to create a common API and data structure across the integrated libraries, and integrate these with the Azure services.

InterpretML is a toolkit that really gives you access to state-of-the-art interpretability techniques through an open and unified API, and also provides you with a lot of visualizations that you can use in order to understand better why your model is predicting a specific result. With this toolkit, you can understand the model using a wide range of explainers and techniques, with really interactive visuals. You can also choose algorithms and experiment with different combinations of algorithms. As a data scientist, you can also explore different model attributes, for example if you are more interested in the performance or in the global and local features, and you can compare multiple models at the same time. So this is also very nice. To find more information you can, again, look at the GitHub repo, and also remember that you can run what-if analysis as you manipulate the data and view the impact on your model. So why did we start this project?

In the InterpretML repo you will see that the interpret-community was able to extend Interpret, an open source Python package from Microsoft Research that is used to train interpretable models and to help explain black box systems. And in just a few minutes we are going to see what we mean by a black box system. So the interpret-community was able to extend these Interpret capabilities with additional interpretability techniques and utility functions to handle, I would say, real-world datasets and workflows. So with this package you can train interpretable glass box models and explain black box systems. And you can also use this package to understand your model’s global behavior, or to understand the reason behind each individual prediction.

As you can see, Azure Machine Learning (indistinct), so we call it AzureML Interpret, and it extends Interpret because it helps you save explanations to the run history, and run remote and parallel computation of explanations on AzureML compute. So these are additional capabilities that AzureML can offer you, and it is also able to create the scoring explainer for you. And most importantly, if you want to push your model to production, it operationalizes the explainer for you.

In the GitHub repo we will also see what we call Interpret-Text, which builds on Interpret. We have added extensions to support text models.

There are two different types of models that are supported. As you can see, there is what we call the glass box explanations. These are, for example, explainable boosting machines, linear models, decision trees and rule systems. And we also have black box explanations like LIME, SHAP, partial dependence and sensitivity analysis.

Interpretability Approaches

Black box models, for example deep neural networks, are challenging to understand. Black box explainers can analyze the relationship between input features and output predictions to interpret models. As I mentioned on my previous slide, some of the examples include LIME and SHAP.

So talking about SHAP, let’s take a closer look at it. SHAP is a game theory approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations, using what we call the classic Shapley values from game theory and their related extensions. So let’s see together how we can actually apply SHAP to a real sample machine learning use case.

So let’s consider a black box that predicts the price of a condo or an apartment based on all these features. As you can see, there is the proximity to a green area, such as a park. And also whether the building itself is pet friendly or not; in this case, the feature is negative. So with these features in mind, our model predicts that the price of the apartment is 300k.

How much has each of these features contributed to the prediction compared to the average prediction? As you can see, we have different information, such as the house price prediction, which is about 300,000 euros. We have an average house price prediction for all apartments, and this is about 310,000 euros. So the delta here is negative, minus 10,000 euros.

So these are the Shapley values. As you can see, we have different features, starting with the park and how it contributed to this result. So we have plus 10k. Then we have the fact that the cat ban also contributed in a negative way, minus 50k, so the fact that the building itself is not pet friendly. The size of the apartment is also a very important feature in this case, and we see that it contributed 10k. And then we also see that there is a final feature, the fact that the apartment is on the second floor, that had a zero net contribution. So this final feature was not really impacting our model result.

So how does SHAP actually calculate all these values?

First, we take a feature of interest, for example the cat ban, and we remove it from the feature set. Second, we take the remaining features, and we generate all possible coalitions. And finally, we add and remove the feature of interest to each of the coalitions, and we calculate the difference that it makes. So this is really how SHAP works. This is really the logic that is behind SHAP.
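The three steps just described can be sketched in a few lines of plain Python. This is a minimal illustration of the exact Shapley computation, not the SHAP library itself: the `shapley_values` helper, the `toy_price` value function and all of its numbers are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every coalition of the other features."""
    n = len(features)
    result = {}
    for feature in features:
        # Step 1: remove the feature of interest from the feature set.
        others = [f for f in features if f != feature]
        total = 0.0
        # Step 2: generate all possible coalitions of the remaining features.
        for size in range(len(others) + 1):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Step 3: compare the value with and without the feature.
                without = value_fn(frozenset(coalition))
                with_feature = value_fn(frozenset(coalition) | {feature})
                total += weight * (with_feature - without)
        result[feature] = total
    return result


def toy_price(coalition):
    """Hypothetical predicted price (in thousands) when only `coalition` is known."""
    price = 310.0  # made-up average prediction when nothing is known
    if "park_nearby" in coalition:
        price += 20.0
    if "cats_banned" in coalition:
        price -= 40.0
    if "park_nearby" in coalition and "cats_banned" in coalition:
        price -= 10.0  # made-up interaction between the two features
    # "second_floor" never changes the price in this toy model.
    return price


features = ["park_nearby", "cats_banned", "second_floor"]
phi = shapley_values(features, toy_price)
for name, value in phi.items():
    print(f"{name}: {value:+.1f}k")
```

In this toy setup the interaction term is split evenly between the two features involved, the feature that never changes the price receives zero, and the values sum to the gap between the full prediction and the average, which is the additivity property that makes Shapley values attractive. The nested loops also make the cost visible: the number of coalitions doubles with every extra feature.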

Of course, there are some pros and cons that are important to keep in mind when you decide to use SHAP. For example, SHAP is great because it is based on solid theory, and it also distributes the effects in a very fair way. It also allows contrastive explanations: instead of comparing the prediction to the average prediction of the entire dataset, you could compare it to a subset or even to a single data point. In terms of the cons of SHAP, there is computation time: there are 2^k possible coalitions of the feature values for k features. Sometimes it’s difficult to understand, so it can be misinterpreted. And finally, the inclusion of unrealistic data instances when the features are correlated is also very possible. So it’s a risk that you should keep in mind when you decide to use SHAP.

So, as I said, there are also different models that you can use, and there are different tools, so the interpretability approach depends on how you want to use these different models. In terms of the glass box models, these are models that are interpretable due to their structure, for example explainable boosting machines, linear models and decision trees. Glass box models produce lossless explanations and are editable by domain experts, which is something very, very nice to have when you want to use and leverage this glass box type of model.

Linear Models

So in terms of GLM, this is a generalized linear model. As you can see, this is a flexible generalization of linear regression that allows, of course, (indistinct) to have distribution models other than a normal distribution. The main characteristics of generalized linear models are that they are the current standard for interpretable models, and that they learn an additive relationship between data and response.
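As a hedged illustration of a response that is not normally distributed, the sketch below fits a Poisson GLM with scikit-learn's `PoissonRegressor` on synthetic count data. The data, the coefficient values and the sample size are assumptions made up for this example; the talk does not show any GLM code.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic data (an assumption for illustration): two features and a count
# response whose log-mean is an additive, linear function of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
true_intercept = 0.5
true_coef = np.array([0.8, -0.3])
y = rng.poisson(np.exp(true_intercept + X @ true_coef))

# alpha=0.0 switches off the default L2 penalty so the fit is a plain GLM.
glm = PoissonRegressor(alpha=0.0, max_iter=300).fit(X, y)

# Each coefficient is that feature's additive effect on the log of the
# expected count, which is what makes the model directly readable.
print("intercept:", glm.intercept_)
print("coefficients:", glm.coef_)
```

Because the fitted coefficients act additively on the link scale, the contribution of each feature to a prediction can be read off without any separate explainer.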

Explainable Boosting Machine

Another example is the explainable boosting machine, or EBM. This is a sort of interpretable model that has been developed by Microsoft Research. It is a very interesting model because it uses modern machine learning techniques like bagging, gradient boosting, and automatic interaction detection to improve the traditional generalized additive model.

This is why explainable boosting machines are actually very accurate, and they are considered comparable to very good techniques like, for example, random forests and gradient boosted trees.

So in this second part of the presentation, we are actually going to focus on fairness.

Microsoft’s AI Principles

We are going to look at fairness, which aims to tackle the question of how we can ensure that AI systems treat everyone in a fair way.

Fairness has the main goal of providing more positive outcomes, and avoiding harmful outcomes of AI systems, for different groups of people.

Types of harm

There are different types of harms, as you can see from this slide. Broadly speaking, I would say that we defined these different types of harms based on the taxonomy that Microsoft Research created, and there are five different types of harms that you can see in a machine learning system. While I have the definitions of all of them on the slide, for the scope of this project we’ll actually just focus on the first two. The first is allocation: as you can see, this is the harm that can occur when the system extends or, I would say, withholds opportunities, resources or information to specific groups of people. And then we have the other one, which is quality of service. This is whether a system works as well for one person as it does for another person. The example of facial recognition, from many different applications, is probably one of the most important examples of quality of service. For this fairness part, Microsoft developed a new toolkit that is called Fairlearn. This is a new approach to measuring and mitigating unfairness in systems that make predictions, serve users, or make decisions about allocating resources, opportunities, or information.

There are many ways that AI systems can behave unfairly. For example, AI can impact quality of service, which is, again, whether a system works as well for one person as it does for another. And AI can also impact allocation, which is, again, the harm that occurs when an AI system extends or withholds opportunities, resources or information to specific groups of people.
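One simple way to make these two harms measurable is to compare, per group, how often the model hands out the positive outcome (allocation) and how often it is correct (quality of service). The sketch below does this in plain Python. It does not use the Fairlearn API, the `rates_by_group` helper is hypothetical, and the labels, predictions and group memberships are made-up illustrative data.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Per-group selection rate (allocation) and accuracy (quality of service)."""
    counts = defaultdict(lambda: {"n": 0, "selected": 0, "correct": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        entry = counts[group]
        entry["n"] += 1
        entry["selected"] += int(pred == 1)     # received the positive outcome
        entry["correct"] += int(pred == truth)  # prediction matched reality
    return {
        group: {
            "selection_rate": entry["selected"] / entry["n"],
            "accuracy": entry["correct"] / entry["n"],
        }
        for group, entry in counts.items()
    }


# Made-up data for eight applicants split across two groups.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

report = rates_by_group(y_true, y_pred, groups)
selection_gap = abs(report["a"]["selection_rate"] - report["b"]["selection_rate"])
accuracy_gap = abs(report["a"]["accuracy"] - report["b"]["accuracy"])
print(report)
print("selection-rate gap:", selection_gap, "accuracy gap:", accuracy_gap)
```

A large selection-rate gap points at a possible allocation harm, while a large accuracy gap points at a possible quality-of-service harm. Which gap matters, and how large is too large, depends on the application.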

So, as you can see, in this toolkit there are different types of focuses and, I would say, different types of capabilities. The main goal of this toolkit is to empower developers of artificial intelligence systems to assess their systems’ fairness and mitigate any observed fairness issues. Most importantly, it helps users identify and mitigate unfairness in their machine learning models, with a focus on group fairness. So now let’s actually jump into a demo. I want to show you how you can use the interpretability toolkit.

So in this demo, we’re going to see how you can use the interpretability toolkit for tabular data in Azure Databricks. As you can see, we’re going to see how you can use the toolkit, download the explanation results from an explanation experiment, and also visualize the feature importances.

Implementing an advanced analytics solution, in every organization and for each of our customers, is, I would say, a four-step process. You first need to ingest the data from a variety of data sources, including batch and streaming data. As you can see, there are different options here, as this architecture on Azure shows. Then the most important part is that, of course, you need to take in and store the data that is being ingested, regardless of the data volume, variety and velocity. Here you can do it, of course, with different types of products. When you get into the prep and train stage, you can use Azure Databricks to train and deploy your model. As you can see, in Databricks we have an option called Runtime ML, which includes a variety of popular ML libraries. The libraries are updated with each release to include new features. Databricks (indistinct) a subset of the supported libraries as top-tier libraries. For these libraries, Databricks provides a faster update cadence, updating to the latest package releases with each runtime release. So this is very good for data scientists as well.

Demo dataset

In terms of dataset, we are going to use the Breast Cancer Wisconsin dataset, which is a public dataset. Here you can see the different attributes that we’re going to use for this demo: not only the ID number and diagnosis, which are probably the most important attributes, but also the real-valued features that are computed for each cell nucleus, which we are going to analyze in this specific dataset.

First of all, you need to install the AzureML Interpret and AzureML Contrib Interpret packages.

Next, you need to train a sample model in a local Jupyter notebook. As you can see, you can again use the breast cancer dataset, and then you can split the data into training and test sets. Third, you can call the explainer locally.
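The training step can be sketched with scikit-learn, which ships a copy of the Breast Cancer Wisconsin dataset. The choice of a random forest, the 80/20 split and the random seeds are assumptions for illustration; the transcript does not say which model or split the demo uses.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the public Breast Cancer Wisconsin dataset bundled with scikit-learn.
data = load_breast_cancer()

# Hold out 20% of the rows for testing (an illustrative choice).
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)

# Train a sample model; any scikit-learn style classifier would work here.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

accuracy = model.score(x_test, y_test)
print(f"features: {x_train.shape[1]}, test accuracy: {accuracy:.3f}")
```

The fitted `model`, the training data, and the feature and class names in `data.feature_names` and `data.target_names` are the kinds of inputs an explainer needs in the next step.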

Call the explainer locally

Here you need to initialize an explainer object, and pass your model and some training data to the explainer’s constructor. In order to make the explanations and visualizations more informative, you can also choose to pass in feature names and output class names, for example if you’re doing classification. The code that you see on these slides actually shows you how you can instantiate an explainer object with different types of explainers. Here, specifically, you have a tabular explainer and a PFI explainer in a local environment.

Explain the entire model behavior (global explanation)

Then, if you want to explain the entire model behavior, you can call what we call the global explanation. This is actually going to give you a sort of visualization that you can leverage to better understand and interpret your models. Some of these visualizations, again, are produced from your code just by using these packages. I want to show you some of the data visualizations that this package can create for you. As you can see, there is an overall view of the trained model, along with its predictions and explanations. First we have the data exploration, which displays an overview of the dataset along with the prediction values. Then we have the global importance, which aggregates the feature importance values of individual data points to show the model’s overall top K important features. K is configurable, so you can change the number of important features shown and also get some understanding of the underlying model’s overall behavior. Then we have the explanation exploration, which demonstrates how a feature affects the change in the model’s prediction values, or the probability of the prediction values. It’s a very good visualization if you want to show the impact of feature interaction. Finally, we have the summary importance, which uses the individual feature importance values across all the data points to show the distribution of each feature’s impact on the prediction value. By using this diagram you can investigate, for example, in what direction the feature values affect the prediction value.
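To give a feel for a global, top-K importance ranking without depending on the Azure packages, the sketch below uses scikit-learn's `permutation_importance` as a stand-in. This is a swapped-in technique, related in spirit to the PFI explainer mentioned in the talk, and is not the toolkit's own API; the model, the split and the value of `top_k` are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(x_train, y_train)

# Shuffle one feature at a time on held-out data and measure how much the
# model's score drops; a larger drop means a more important feature.
result = permutation_importance(model, x_test, y_test, n_repeats=5, random_state=0)

top_k = 5  # illustrative choice of K
ranked = sorted(
    zip(data.feature_names, result.importances_mean),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:top_k]:
    print(f"{name}: {importance:.4f}")
```

Changing `top_k` mirrors the configurable K on the dashboard. Note that permutation importance measures the effect on model score, whereas the dashboard's global importance is described as an aggregate of per-prediction importance values, so the two rankings need not agree.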

Another way to understand better what your model is actually doing, is by using the local explanation.

Explain an individual prediction (local explanation)

You can see that here you can get the individual feature importance values of different data points by calling the explanation for (indistinct), or for a group of instances. Here we have different types of visualizations that are created. First of all, we have the local importance. This shows the top K important features for any individual prediction. And it’s very helpful when a data scientist wants to illustrate the local behavior of the underlying model on a specific data point.

Then we have the perturbation exploration, which is a sort of what-if analysis. As you can see, this visualization allows changes to the feature values of the selected data point, and lets you observe the resulting changes to the prediction value.

Finally, another important visualization that I want to share with you is called the individual conditional expectation. This visualization allows a feature value to change from a minimum value to a maximum value. So it’s very helpful when a data scientist needs to illustrate how a data point’s prediction changes when a feature changes.
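The idea behind an individual conditional expectation curve can be sketched by hand: take one row, sweep a single feature across its observed range, and record the model's prediction at each step. The `ice_curve` helper below is hypothetical, and the model, the chosen row and the chosen feature are assumptions for illustration, not the dashboard's implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    data.data, data.target
)


def ice_curve(model, X, row_index, feature_index, n_points=20):
    """Sweep one feature of one row from its minimum to its maximum observed value."""
    column = X[:, feature_index]
    grid = np.linspace(column.min(), column.max(), n_points)
    # Copy the chosen row once per grid value, then overwrite the swept feature.
    rows = np.tile(X[row_index], (n_points, 1))
    rows[:, feature_index] = grid
    # Probability of class 1, which is "benign" in scikit-learn's copy of the data.
    return grid, model.predict_proba(rows)[:, 1]


feature_index = list(data.feature_names).index("mean radius")
grid, probabilities = ice_curve(model, data.data, row_index=0, feature_index=feature_index)
for value, probability in zip(grid, probabilities):
    print(f"mean radius = {value:6.2f} -> P(benign) = {probability:.2f}")
```

Changing a single grid value by hand instead of sweeping the whole range gives the what-if style perturbation for one data point. As with SHAP, sweeping one feature while holding correlated features fixed can produce unrealistic rows, so the curve is best read with that in mind.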

Again, this was just an overview of what the interpretability toolkit can do for you, and how you can leverage it on your data and in your solutions. I also want to share the contacts of the product team who work on these toolkits. As you can see, you can find their names and their emails there, in case you want to follow up offline with the product team who put together all these toolkits that I presented today.

Again, this is one of the articles that you can use to learn more, and also to find some of the resources that have been used today during this session. And in terms of resources, I just want to share those with you one more time. These are all the packages and GitHub repos that have been used during this session. And you can also find me on Twitter and GitHub, and (indistinct).

About Francesca Lazzeri


Francesca Lazzeri, PhD is an experienced scientist and machine learning practitioner with over 12 years of both academic and industry experience. She is author of the book “Machine Learning for Time Series Forecasting with Python” (Wiley) and many other publications, including technology journals and conferences. Francesca is Adjunct Professor of AI and machine learning at Columbia University and Principal Cloud Advocate Manager at Microsoft, where she leads an international team (across USA, Canada, UK and Russia) of cloud AI developer advocates and engineers, managing a large portfolio of customers in the research/academic/education sector and building intelligent automated solutions on the cloud. Before joining Microsoft, she was a research fellow at Harvard University in the Technology and Operations Management Unit.