Explainable AI, or XAI, refers to techniques that help people understand how an AI system arrived at a specific output. It is especially relevant to machine learning and deep learning, where models learn patterns from data instead of following human-written rules.
As models become more powerful, their decisions can become harder to trace. Deep learning models may contain billions of parameters, making it difficult to understand why they approved a transaction, flagged fraud, denied a loan or detected an abnormality in an MRI. This is often called the “black box” problem.
XAI helps open that box by giving teams ways to evaluate whether a model is:
As AI takes on more consequential decisions, understanding why a model reached an answer matters as much as the answer itself. This article covers the main XAI methods, the techniques data and AI teams rely on and how to choose between them.
Decisions in domains such as lending, hiring, healthcare, fraud detection or insurance can have major consequences for individuals. People have a right to know why an application was rejected, a transaction was flagged or a particular treatment was recommended, especially if AI was involved. Lack of transparency isn’t just inconvenient; in many contexts it can be a liability. Here are four practical reasons why XAI methods matter:
Model behavior can also shift over time as real-world data changes. Explainability supports ongoing monitoring.
XAI methods generally fall into two categories: models that are explainable by design, and methods that explain a model after the fact. In the first category, the model's structure is simple enough to read directly. Examples include decision trees, linear regressions and rule-based systems.
In the second, the model is too complex to read directly, so a separate technique is applied after training to probe what the model is doing. Example techniques might include running experiments on an already-trained model, approximating the model with something simpler or tracing which inputs had the most influence on a specific output.
In either case, the analysis doesn't change the model; it interrogates it.
The basic workflow looks like this:
Before diving into specific methods, it helps to know four terms that come up frequently in XAI discussions.
| Term | What it means | Example |
|---|---|---|
| Interpretable model | A model that's simple enough for a human to follow on its own — no extra tool needed. | A decision tree or linear regression whose logic you can read directly. |
| Explainable model | A complex model paired with a separate technique that explains the model’s behavior after it has been trained. | A deep neural network analyzed with SHAP or LIME. |
| Global explanation | Describes how a model behaves overall, across all inputs. | "Income and credit score are the top two drivers across all loan decisions." |
| Local explanation | Describes why a model made one specific prediction. | "This applicant was denied because their debt-to-income ratio was too high." |
XAI methods are typically grouped by how they generate explanations. The three descriptions that follow cover the major techniques currently in use, as well as the trade-offs you have to consider regarding transparency, accuracy and practical fit.
Intrinsically interpretable models are transparent by design. The structure of the model itself reveals how it makes decisions, so no additional tool or technique is required to analyze the model’s logic. Examples include decision trees, which follow a flowchart of yes/no rules you can walk through by hand, and linear and logistic regression, which assign a numerical weight to each input so you can see exactly how each feature contributes to the output. Generalized additive models and rule-based systems work similarly.
The trade-off here is accuracy. Interpretable models are easy to explain but often less accurate than complex models for hard problems like image recognition or understanding language. However, for highly regulated industries where every decision must be defensible, they're often the default choice.
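To make "reading the model directly" concrete, here is a minimal sketch of a logistic regression whose explanation is just its own arithmetic. The feature names, coefficients and intercept are hypothetical values chosen for illustration, not taken from any trained model.

```python
import math

# Hypothetical coefficients for illustration only; a real model learns these from data.
WEIGHTS = {
    "credit_score_scaled": 1.8,
    "debt_to_income": -2.5,
    "years_employed_scaled": 0.6,
}
INTERCEPT = -0.2


def explain_logistic(features):
    """Return the approval probability and each feature's additive log-odds contribution."""
    contributions = {name: WEIGHTS[name] * value for name, value in features.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


applicant = {"credit_score_scaled": 0.7, "debt_to_income": 0.45, "years_employed_scaled": 0.3}
probability, contributions = explain_logistic(applicant)

# Rank features by the size of their effect on the log-odds.
for name, value in sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {value:+.3f} log-odds")
print(f"approval probability: {probability:.3f}")
```

Because each contribution is simply weight times value, the explanation is exact rather than approximate, which is the property post hoc methods cannot offer.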
Post hoc methods are applied after a model is trained. When most people say XAI, this is what they mean. Tools like SHAP, LIME and counterfactuals all qualify.
Post hoc methods are usually the only option for deep learning models, large language models (LLMs) and other complex systems where the underlying math is too complex to read directly. The trade-off, however, is that post hoc explanations are approximations, not exact internal calculations.
This category refers to methods that produce a visual output showing which part of the input drove the model's decision. Examples include saliency maps and Grad-CAM, which both highlight which pixels in an image mattered most. Attention visualizations highlight which words in a sentence the model focused on. For image and text models, a heatmap or highlight is often more intuitive than a list of numbers, making these methods especially useful when communicating results to nontechnical stakeholders. As with other post hoc methods, visualization outputs should be treated as informative signals, not definitive proof.
The table below summarizes the most widely used XAI methods, followed by more detailed descriptions of the five techniques practitioners use most frequently.
| Method | Scope | Model-agnostic? | Output | Best for |
|---|---|---|---|---|
| SHAP | Local + global | Yes | Numeric contribution of each feature to a prediction | Tabular models, tree-based models, broad use |
| LIME | Local | Yes | A simple surrogate model explaining one prediction | Quick local explanations across model types |
| LRP | Local | No (needs neural net internals) | Relevance scores traced back through network layers | Deep neural networks, image models |
| Integrated gradients | Local | No (needs model gradients) | Pixel- or token-level attribution | Neural networks, images and text |
| Saliency maps / Grad-CAM | Local | No | Heatmap over an image showing influential regions | Computer vision models |
| Counterfactual explanations | Local | Yes | "What would need to change for a different outcome?" | Decisions affecting individuals (loans, hiring) |
| Partial dependence plots (PDP) | Global | Yes | Chart showing how one feature affects predictions on average | Understanding overall model behavior |
| Permutation feature importance | Global | Yes | Ranked list of which features matter most overall | Model debugging, feature selection |
| Anchors | Local | Yes | "If-then" rules that lock in a prediction | Rule-style explanations for end users |
| TCAV | Global | No | How much a high-level concept influences predictions | Image models, concept-level audits |
| Attention visualization | Local | No (needs transformer internals) | Highlighting which tokens the model focused on | LLMs, transformers, NLP models |
The XAI method known as SHapley Additive exPlanations (SHAP) assigns each input feature a numeric score showing how much it moved a prediction up or down compared to a baseline. Ask SHAP why a loan was denied and it might tell you that the applicant's debt-to-income ratio reduced the approval probability by 22 percentage points while their employment history added 8. The method is rooted in Shapley values from cooperative game theory, a principled way of distributing credit fairly among contributors, which gives SHAP a stronger theoretical foundation than most alternatives.
Key strengths of SHAP are that it is model-agnostic and it produces both local (single prediction) and global (overall model) explanations. It is also the primary explainability tool supported by Databricks AutoML and MLflow autologging. The trade-off is compute cost. SHAP can be slow on large datasets or complex models, so that cost should be budgeted for accordingly.
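The idea behind Shapley values can be sketched without any library. The example below computes exact Shapley values for a tiny hypothetical scoring function by averaging each feature's marginal contribution over every coalition of the other features; features left out of a coalition take a baseline value. The function `toy_model`, the instance and the baseline are all assumptions made up for illustration, and this brute-force approach is not how the SHAP library itself computes attributions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features absent from a coalition take their baseline value."""
    n = len(instance)
    values = [0.0] * n

    def value_of(coalition):
        mixed = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                values[i] += weight * (value_of(without_i | {i}) - value_of(without_i))
    return values


def toy_model(x):
    """Hypothetical scoring function with one interaction term."""
    income, debt_ratio, years_employed = x
    return 0.4 * income - 0.6 * debt_ratio + 0.1 * years_employed + 0.2 * income * years_employed


instance = [1.0, 0.8, 0.5]
baseline = [0.5, 0.3, 0.2]
phi = shapley_values(toy_model, instance, baseline)

for name, value in zip(["income", "debt_ratio", "years_employed"], phi):
    print(f"{name}: {value:+.4f}")
# The attributions add up to the gap between this prediction and the baseline prediction.
print(f"sum: {sum(phi):+.4f}  gap: {toy_model(instance) - toy_model(baseline):+.4f}")
```

The nested loops evaluate the model once per coalition, so the work grows exponentially with the number of features. That growth is the root of the compute cost described above, and it is why practical implementations rely on sampling or model-specific shortcuts.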
The Local Interpretable Model-agnostic Explanations (LIME) method of XAI picks one prediction you want to understand, then builds a smaller, easy-to-read model to analyze how it generates that prediction. To do this, LIME tweaks the input slightly, many times over, and observes how the model's output changes. It uses those results to fit a simplified surrogate, typically a linear model, that approximates the AI it is analyzing. The output is a ranked list of features and their directional influence on the prediction.
LIME works on any model type and produces one-off explanations quickly. The trade-off is that the explanations can be unstable. Because LIME uses random perturbations, running it twice on the same prediction can produce meaningfully different results, which can be a real concern in high-stakes contexts or wherever auditing is required.
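The perturb, weight and fit loop can be sketched in plain Python. This is a LIME-style illustration under simplifying assumptions (Gaussian perturbations, an exponential proximity kernel, an unregularized weighted linear fit), not the LIME library's implementation. The names `black_box`, `scale` and `kernel_width` are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def lime_style_explanation(model, instance, num_samples=500, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance; return its slopes."""
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(num_samples):
        offsets = [rng.gauss(0.0, scale) for _ in range(d)]
        sample = [v + o for v, o in zip(instance, offsets)]
        # Samples closer to the instance get more influence on the fit.
        weight = math.exp(-sum(o * o for o in offsets) / kernel_width ** 2)
        row = [1.0] + offsets  # intercept column plus features centered on the instance
        y = model(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    return solve_linear_system(xtwx, xtwy)[1:]


def black_box(x):
    """Hypothetical nonlinear model standing in for a trained classifier."""
    return 1.0 / (1.0 + math.exp(-(4.0 * x[0] - 3.0 * x[1] ** 2)))


instance = [0.5, 0.6]
run_a = lime_style_explanation(black_box, instance, seed=0)
run_b = lime_style_explanation(black_box, instance, seed=1)
print("seed 0:", [round(c, 4) for c in run_a])
print("seed 1:", [round(c, 4) for c in run_b])
```

The two runs agree on direction but not on exact values, which is the instability described above in miniature. Fixing the random seed and increasing the sample count are common ways to make results more repeatable.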
A counterfactual explanation answers a direct question: What would have needed to change for the model to make a different decision? For example, the statement "If your annual income were $10,000 higher, this application would have been approved" is a counterfactual.
This type of XAI resonates with nontechnical audiences because it is actionable. Counterfactuals fit naturally with how people already think about cause and effect, and they give people something to do with the information. They also work well within regulatory frameworks that include a right to an explanation, such as GDPR Article 22. The trade-off is typically practical. A counterfactual is only useful if the suggested change is realistic and within the person's control. "If you were 10 years younger" is not an actionable explanation.
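A minimal sketch of the idea: search for the smallest change to one actionable feature that flips the decision, while leaving everything else fixed. The decision rule `approve`, its threshold and the field names are hypothetical stand-ins for a trained model; real counterfactual tools typically search over several features and penalize unrealistic changes.

```python
def approve(applicant):
    """Hypothetical decision rule standing in for a trained model (integer points)."""
    points = applicant["annual_income"] // 1000 - applicant["debt_to_income_pct"]
    return points >= 20


def find_income_counterfactual(decide, applicant, step=1_000, max_increase=50_000):
    """Return the smallest income increase that flips a denial, or None if none is found."""
    if decide(applicant):
        return 0
    for increase in range(step, max_increase + step, step):
        # Only income changes; every other field is copied unchanged.
        candidate = dict(applicant, annual_income=applicant["annual_income"] + increase)
        if decide(candidate):
            return increase
    return None


applicant = {"annual_income": 52_000, "debt_to_income_pct": 40, "age": 41}
increase = find_income_counterfactual(approve, applicant)
print(increase)  # 8000
```

Restricting the search to income reflects the point about actionability: the function never proposes a change to `age`, and it returns `None` rather than suggesting an increase beyond the stated limit.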
Saliency maps and Grad-CAM are visual XAI techniques for image-based models. They produce a heatmap overlaid on the original image showing which pixels or regions the model focused on when making its prediction. In a medical imaging context, a Grad-CAM output on an X-ray classification might show the model focused on a certain region of the lung, which is exactly what a radiologist needs to see before trusting the result.
These methods are widely used in computer vision, medical imaging, autonomous systems and industrial quality control. Research has shown that saliency maps can look convincing while not accurately reflecting what the model is doing. Treat them as one signal, not a definitive output.
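The core of a basic saliency map is the sensitivity of the model's score to each pixel. The sketch below estimates that sensitivity with finite differences on a hypothetical 3x3 "image" and a made-up logistic scorer. Real saliency methods obtain gradients through backpropagation, and Grad-CAM works on convolutional feature maps instead, so treat this only as an illustration of the underlying idea.

```python
import math

# Hypothetical per-pixel weights for a toy classifier; the center pixel dominates.
PIXEL_WEIGHTS = [
    [0.0, 0.2, 0.0],
    [0.2, 2.0, 0.2],
    [0.0, 0.2, 0.0],
]


def toy_classifier(image):
    """Logistic score over a weighted sum of pixel intensities."""
    total = sum(
        w * p
        for w_row, p_row in zip(PIXEL_WEIGHTS, image)
        for w, p in zip(w_row, p_row)
    )
    return 1.0 / (1.0 + math.exp(-total))


def saliency_map(model, image, eps=1e-4):
    """Approximate |d score / d pixel| for every pixel with central differences."""
    rows, cols = len(image), len(image[0])
    saliency = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            plus = [row[:] for row in image]
            minus = [row[:] for row in image]
            plus[r][c] += eps
            minus[r][c] -= eps
            saliency[r][c] = abs(model(plus) - model(minus)) / (2 * eps)
    return saliency


image = [
    [0.1, 0.5, 0.1],
    [0.4, 0.9, 0.3],
    [0.0, 0.6, 0.2],
]
heatmap = saliency_map(toy_classifier, image)
for row in heatmap:
    print(" ".join(f"{value:.3f}" for value in row))
```

In this toy case the heatmap mirrors the weights because the model is nearly linear. With a deep network there is no such guarantee, which is one reason a convincing heatmap should not be taken as proof of what the model is doing.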
Transformer models provide the architecture behind most modern LLMs, and have built-in attention mechanisms that weight how much each input token contributes to each output token. Attention visualizations turn those weights into a highlight map over the text, showing which input words the model relied on most when generating a specific response.
The visualizations are readable without specialized expertise, which makes them one of the more accessible explainability tools for LLMs. However, they aren’t always a faithful explanation of the final output. Research has found that high attention weights don't always identify the inputs that actually drove the model's decision.
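For intuition, the weights behind such a highlight map can be sketched as scaled dot-product attention for a single query over a handful of tokens. The token list and the two-dimensional key and query vectors below are made-up values; a real transformer has many heads and layers, and tools usually aggregate across them before drawing anything.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over a list of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["the", "loan", "was", "denied"]
keys = [[0.1, 0.0], [0.9, 0.4], [0.0, 0.1], [0.7, 0.8]]  # hypothetical key vectors
query = [1.0, 1.0]  # hypothetical query vector

weights = attention_weights(query, keys)
for token, weight in zip(tokens, weights):
    # A text bar chart as a stand-in for a color highlight.
    print(f"{token:>7} {'#' * round(weight * 40)} {weight:.3f}")
```

The weights always sum to 1, so a highlight shows relative focus within one attention computation. It does not by itself show how much that focus changed the final output.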
Choosing the right XAI method depends on the model, the audience and the question you're trying to answer. The following framework can help guide your decision:
XAI methods are powerful, but they're not perfect. Anyone deploying them in production should understand the limitations.
Most post hoc methods such as SHAP, LIME or saliency maps approximate what the model is doing rather than revealing the exact internal computation. Two different methods applied to the same prediction may produce different explanations. Treat XAI outputs as evidence, not proof.
As mentioned, methods like SHAP and integrated gradients can be slow on large datasets or complex models. Running full explanations on every prediction in a high-volume production system may not be feasible, and selectively applying them raises questions about representativeness. Budget compute cost as well as modeling costs when considering which XAI method to choose.
Some methods, especially LIME, produce different results across repeated runs on the same prediction because of random sampling in the perturbation process. This instability is a real concern for auditable or regulated contexts. Adversarial attacks can also manipulate post hoc explanations to obscure actual model behavior. While research into countermeasures is ongoing, such attacks are another reason not to treat explanations as tamper-proof.
The most interpretable models are often the least accurate on complex problems, and the most accurate models are often the hardest to explain. This isn't a solvable engineering problem; it's a deliberate design choice. Organizations need to assess their priorities. Do they want a less accurate but fully transparent model, or a more accurate black-box model with XAI tooling layered on top? The answer should be driven by the importance of the decision. High-stakes domains such as healthcare, lending or criminal justice often warrant prioritizing explainability even at some cost to raw accuracy.
XAI methods are already in production across regulated and high-stakes industries. Here's how different methods tend to be used across industries:
MLflow, the open source ML lifecycle platform created by Databricks, supports model tracking, versioning and logging explanation artifacts alongside the model itself. For supported model flavors, MLflow autologging can capture SHAP values and feature importance scores, which keeps explanations attached to the specific model version and training run that produced them. Databricks AutoML also auto-generates SHAP plots and Shapley value notebooks for the models it produces, giving teams a starting point for explainability without manual setup.
Unity Catalog provides the governance layer that makes explanations auditable over time. This layer includes model lineage, versioning, centralized access control and audit logs that let teams trace which data trained which model and who accessed it. Together, MLflow and Unity Catalog give data and AI teams the infrastructure to build explainability into the model lifecycle rather than bolting it on at the end.
Are XAI explanations always accurate?
No. Most XAI methods, especially post hoc techniques like SHAP and LIME, produce approximations of model behavior, not exact reconstructions of internal computation. Two methods applied to the same prediction may yield different explanations. Treat XAI outputs as evidence, not conclusive proof. Validating explanations against domain expertise and combining multiple methods gives a more reliable picture.
What is the difference between XAI and interpretable AI?
Interpretable AI refers to models that are transparent by design and whose structure is simple enough to follow directly. Explainable AI is broader and includes interpretable models, as well as complex black-box models paired with separate techniques that explain their behavior after the fact. An interpretable model doesn't need XAI tools, but an explainable model does.
What is the difference between global and local explanations?
A global explanation describes how the model behaves across all inputs, such as which features matter most overall or what patterns drive predictions in general. A local explanation describes why the model made one specific prediction for one specific input. Both types are useful, and the best XAI practice uses global methods to understand the model and local methods to explain individual decisions.
What's the difference between XAI and responsible AI?
Responsible AI is the broader discipline, which covers fairness, safety, privacy, transparency and accountability across the full AI lifecycle. Explainable AI is the set of methods that make model behavior transparent and auditable. So, explainability is necessary for responsible AI but not sufficient on its own. A model can be explainable and still be biased, unsafe or misused.
Can XAI methods be used on generative AI?
Yes, though the techniques differ from those used on traditional ML models. For LLMs and other transformer-based systems, attention visualization is the most widely used approach. LIME can also be applied to text inputs. That said, generative AI presents harder explainability challenges than tabular or image models because outputs are more varied, context windows are longer and the relationship between input tokens and generated text is more complex. Explainability for generative AI is an active area of research, and current methods should be treated as partial signals rather than complete explanations.
XAI methods give data and AI teams the tools to build systems people can understand, trust and audit. Choosing the right method depends on the model, the audience and the importance of the output decision, but the underlying goal is the same: make AI behavior visible enough to act on with confidence.
Learn more about how Databricks supports responsible, governed AI in our enterprise data governance framework or the Databricks AI governance framework.