What is Agentic AI? There are multiple answers to this question, not surprisingly since we’re still in the early stages of this complex technology. Wikipedia emphasizes the autonomous nature of agentic AI in its definition: Agentic AI is “a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks without human intervention.”
McKinsey noted that AI is now “moving from thought to action” in a July 2024 report that explained the difference between generative and agentic AI. “We are beginning an evolution from knowledge-based, gen-AI-powered tools — say, chatbots that answer questions and generate content — to gen AI-enabled agents that use foundation models to execute complex, multistep workflows across a digital world. … Gen AI agents eventually could act as skilled virtual coworkers, working with humans in a seamless and natural manner.”
Similarly, IBM answers the question “What is Agentic AI” by writing: “Agentic systems provide the flexibility of LLMs, which can generate responses or actions based on nuanced, context-dependent understanding, with the structured, deterministic and reliable features of traditional programming. This approach allows agents to think and do in a more human-like fashion.”
For decades we’ve been developing applications using increasingly sophisticated programming platforms and high-level languages. In their early days, computers were relatively slow and had small amounts of main memory, so we generally used assembly languages, which had a very strong correspondence between an instruction in the language and the individual instructions of the machine’s architecture.
That’s how I first learned to program in the early 1960s. One of the hardest parts of programming was debugging, that is, the process of finding out why the program crashed or otherwise didn’t work, fixing the bug, and resubmitting the program to the computer. Debugging a program in those days often required taking a core dump, which involved recording the state of the computer’s memory when the program crashed and going over it instruction by instruction until the problem was found and fixed.
Programming became considerably less labor-intensive with the development of higher-level, domain-specific languages, and of compilers that translated the high-level language into machine code. Examples include Fortran, a language particularly suited for scientific, engineering, and other numeric applications; Cobol, primarily intended for business applications; PL/1, a more general-purpose language widely used in systems programming; and Java, a high-level, object-oriented language that could run on multiple platforms without the need to recompile.
Agentic AI is now taking application development to a whole new level. Instead of carefully specifying every step of an application, as we do with deterministic languages that are supposed to do only what they were explicitly programmed to do, AI agents are intended to make decisions and execute tasks on their own, working autonomously alongside humans as skilled virtual coworkers unless an agent needs additional human guidance.
In the emerging world of agentic systems, application programmers are becoming something closer to system engineers, whose job is to design and clearly specify the overall system they're building, including the agents or assistants responsible for its various components. The human’s job is to define the goals and actions the agents should follow under a variety of conditions, using LLMs and other tools, as well as to coordinate the execution of the overall system to make sure that it’s working as intended.
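To make that shift a bit more concrete, here is a minimal sketch, in plain Python rather than any particular agent framework, of what such a specification might look like: a goal, a set of permitted tools, and a rule for when the agent should hand control back to a human. The tool, the stubbed-out planner, and the confidence threshold are all illustrative assumptions, not part of any real product’s API.

```python
# A minimal sketch (not any specific framework's API) of how a "system
# engineer" might specify an agent: a goal, the tools it may call, and a
# rule for when to hand control back to a human.

from dataclasses import dataclass, field
from typing import Callable


def search_orders(query: str) -> str:
    """Hypothetical tool the agent is allowed to use."""
    return f"results for {query!r}"


@dataclass
class AgentSpec:
    goal: str
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)
    confidence_threshold: float = 0.7   # below this, ask a human


def plan_next_step(goal: str, history: list[str]) -> tuple[str, str, float]:
    """Placeholder for the LLM planner: returns (tool, argument, confidence)."""
    return "search_orders", goal, 0.9


def run(spec: AgentSpec, max_steps: int = 3) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        tool, arg, confidence = plan_next_step(spec.goal, history)
        if confidence < spec.confidence_threshold or tool not in spec.tools:
            history.append("escalated to human for guidance")
            break
        history.append(spec.tools[tool](arg))
    return history


agent = AgentSpec(goal="find late shipments", tools={"search_orders": search_orders})
print(run(agent))
```

The point of the sketch is the division of labor: the human specifies the goal, the allowed tools, and the escalation rule, while the (here stubbed) LLM planner decides the individual steps.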
A few weeks ago I posted a blog on “The Future of Agentic Ecosystems,” based on a November 2024 article by senior technology executive Eric Broda on his Medium platform. “Billions of dollars of investment by some of the largest firms on the planet are flowing into tools that will make it easy to build autonomous agents,” wrote Broda. “And if this huge investment, and the recent headlines, are any indication, we will soon have many, many autonomous agents collaborating in a dynamic ecosystem.”
The key questions will not be “how to build autonomous agents,” he added, “but rather, how do we manage this burgeoning ecosystem of autonomous agents. How does one find an autonomous agent that does what we want? How does one interact with an autonomous agent? And if we want to transact with an autonomous agent, how does that happen? And how does it happen safely?”
More recently, Broda raised another important question in “Agent Explainability: The Foundation for Trust in the Agent Ecosystem,” an April 2025 Medium article. Given that AI agents are designed to make decisions and perform tasks with limited human intervention, making sure that they do what we want them to do is likely to be significantly more difficult than it’s been with traditional deterministic application development over the past several decades.
“Widespread agent adoption will only occur when we trust them,” he wrote. “But what does it mean to trust an agent? In the simplest sense, trust means believing that an agent will do what it is meant to do.”
“Trust, in human relationships, is built over time through observable behavior, consistent actions, and shared understanding. We trust people not only because of what they do, but because we understand why they do it — we can usually infer motives, judge consistency, and form expectations. And when that trust is broken, we seek explanations.”
But agents don’t have our shared, intuitive human understanding. Thus, as agents become more embedded in business and industrial operations and as we delegate more and more important responsibilities to agents, the opportunity for serious errors increases.
Unlike deterministic programs, agents are driven by AI-based LLMs, which are both statistical and opaque. That is, if an AI agent generates unreliable, error-prone outputs, it can be quite difficult to figure out what it did and why it did it. While debugging a complex deterministic application can also be very difficult, we have decades of experience and a wide variety of tools to help us do so, unlike the much more recent LLMs and agentic AI systems.
This is very challenging, said Broda, because in order to trust agents “we must open the opaque LLM box that powers them. Explainability makes an agent’s plans transparent, understandable, traceable — and trusted.”
In his Medium article, Broda proposed a model for agent explainability.
First, he reminds us that over the years we’ve developed all kinds of methods to help us determine the causes of machine failure in the physical world. For example, commercial airplanes are all equipped with flight data and cockpit voice recorders that run silently in the background. When an accident occurs, investigators use these data to determine the root causes of the accident and recommend corrective actions. Similarly, modern factories are full of embedded sensors that monitor operations to detect early signs of a potential failure and take corrective action.
These are both examples of explainability systems that “literally create a trail of evidence of operational behaviour that can be inspected, verified, and explained. This is what we should expect from agents. If we are to trust them with increasingly complex tasks, agents need to do more than produce outputs. We need agents that, like their counterparts in aviation and industry, leave behind a trail of data — an explainable record of what they intended to do, how they did it, and why it turned out the way it did.”
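As a rough illustration of that “flight recorder” idea, the sketch below appends each intent, action, and outcome to a log that survives the task, so a run can be inspected after the fact. The class, event kinds, and example entries are my own shorthand, not part of any existing agent platform.

```python
# A rough sketch of a "flight recorder" for an agent: every intent, action,
# and outcome is appended to a persistent log that can be reviewed later.

import json
import time


class FlightRecorder:
    def __init__(self) -> None:
        self.events: list[dict] = []

    def record(self, kind: str, detail: str) -> None:
        # kind is one of "intent", "action", or "outcome"
        self.events.append({"t": time.time(), "kind": kind, "detail": detail})

    def dump(self) -> str:
        return json.dumps(self.events, indent=2)


recorder = FlightRecorder()
recorder.record("intent", "look up late shipments via search_orders")
recorder.record("action", "called search_orders('late shipments')")
recorder.record("outcome", "3 results returned; summarized for user")
print(recorder.dump())
```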
Today, a GenAI-based system relies on LLMs to drive an end-to-end process: “it creates an internal opaque plan, uses hidden logic to ingest prompts and inputs, uses unseen execution capabilities, and finally tosses the plan out once the task is complete. These plans are ephemeral.”
Agent explainability requires capturing several elements of the agent’s task plans: the detailed steps the agent intends to take, including its interactions with other agents; the rationale behind the agent’s selections; the other agents it intends to interact with and the tools it expects to use; the instructions used to create the task plan; and the parameters that were provided to the agent.
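One way to picture this is as a record the agent persists before execution begins. The sketch below maps that list onto a simple Python data structure; the field and type names are illustrative shorthand for Broda’s elements, not a standard schema.

```python
# An illustrative record of an agent's task plan, persisted before execution
# so that the plan can be inspected and explained later.

from dataclasses import dataclass, field


@dataclass
class PlannedStep:
    description: str                                         # what the agent intends to do
    collaborators: list[str] = field(default_factory=list)   # other agents it will interact with
    tools: list[str] = field(default_factory=list)           # tools it expects to use


@dataclass
class TaskPlanRecord:
    instructions: str            # the instructions used to create the plan
    parameters: dict[str, str]   # parameters provided to the agent
    steps: list[PlannedStep]     # the detailed steps, in order
    selection_rationale: str     # why this plan and these tools were chosen


plan = TaskPlanRecord(
    instructions="Resolve the customer's refund request",
    parameters={"order_id": "12345"},
    steps=[
        PlannedStep("check order status", tools=["orders_api"]),
        PlannedStep("confirm refund policy", collaborators=["policy_agent"]),
    ],
    selection_rationale="Order lookup is required before the policy check",
)
```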
“An agent ecosystem must include explainability but not as an optional feature or afterthought, rather as a core design principle,” wrote Broda in conclusion. “However, to do this explainability must be designed in from the start: agents must be built with transparent task planning, traceable execution, and observable metrics from the ground up. The challenge is clear: opaque agents will inevitably undermine trust. The solution is equally clear: we must design agents that explain themselves.”