Trust: The Cornerstone of AI Implementation — How to Measure It, Improve It, and Design for It

To increase trust in AI, don’t search for one perfect system; combine systems that break in different ways.

The core ideas of this article were first presented in the talk “Beyond Quality: Measuring Trust in AI Outcomes” at Software Quality Days 2026 in Vienna.

Trust Architecture Canvas by Alexis Savkin - Trust comes from combining systems that break in different ways.

Download the Trust Architecture Canvas as a PDF template.

Trust Architecture Canvas: Design Reliable Systems, Including AI-Based Ones

Can We Trust AI? — The Fundamental Question of All AI Implementations

All discussions about AI eventually end up with the same question:

Can we trust AI?

Some people say they cannot use AI officially because they work in a regulated industry. Others have tried tools like Cursor or GitHub Copilot and found that they work really well. Either way, the discussion ends with the same question: “Can we trust AI?”

Trust Is Everywhere, but What Is Trust?

Think about two shopping carts:

  • One has a coin chain, so you need to put a coin in before using it.
  • The other one does not require anything.

Shopping cart as an example of trust implementation: how systems translate trust to their stakeholders

In the first case, it looks like the supermarket does not trust me to return the cart without a deposit.

In the second case, the supermarket trusts me enough to return the cart to the right place without inconveniencing other drivers.

It is a small example, but it shows how systems communicate trust to their stakeholders: in this case, me, doing my weekly shopping at one supermarket or the other.

Trust Is Gradual, Subjective, and Contextual

These are the basic properties of trust.

  • Trust is not binary; it is a matter of degree.
  • Trust is not an intrinsic property of the system; someone trusts something, for a specific purpose, in a specific context.

Why We Use Trust to Complement Quality

We rely on trust because the operating domain imposes natural constraints: when only limited information is available, trust lets us make decisions faster.

Quality Metrics Break Down When the Domain Becomes Too Complex

For less complex business domains, the cost of measurement is acceptable, so we can use classical quality metrics. As the complexity of the domain increases, classical measurement becomes too costly.

Why Do We Complement “Quality” with “Trust”?

At this point, we have a choice. We can continue trying to base decisions only on hard metrics, or we can use something that we group under the trust umbrella: perceptions, social proof, probabilities, and other proxies.

Cybersecurity Shows How Quality Turns Into Trust

A decade ago, cybersecurity was relatively easy to quantify and measure: brute-force time, basic internal controls…

Starting in 2024, attack vectors changed, and we started talking much more about the need to analyze third parties in the supply chain.

Are we still measuring the quality of cybersecurity controls, or are we increasingly measuring trust?

A typical third-party vulnerability assessment is more about relying on trust indicators demonstrated by the partner than on hard quality and security metrics.

Cybersecurity Example: Why we complement quality with trust

Humans and AI Are Both Breakable

Here you have the Munker–White illusion.

Munker–White illusion: can we trust humans?

The illusion shows that two colors can be objectively the same, but we still see them as different. This is just one example of how, as human beings, we can be tricked.

Humans are not a perfect reference. We also fail, and we also need controls around our judgment.

That was an illusion: the colors are actually the same.

AI Can Be Tricked as Well

As for AI, a classic example is asking whether to walk or drive to a nearby car wash.

A car wash example where AI hallucinates.

AI may take the question literally and suggest walking, overlooking that the car has to get to the car wash too.

The realistic question is not “trust or don’t trust?” but “where does this system break?”

How Do We Measure Trust?

Absolute trust numbers probably do not make much sense, because we simply do not have a real unit of measurement for trust. Relative numbers are much more useful.

It is useful to understand whether trust is higher or lower in one setup than in another. This helps us compare systems and explain decisions.

Trust metrics help us speak with stakeholders. Instead of saying, “I feel it works,” we can explain why a certain AI setup is acceptable or why additional controls are needed.

How Do We Improve Trust?

My practical framework includes three levels:

  • Level one: personal trust
  • Level two: systematic trust
  • Level three: architectural trust

Level One: Personal Trust

Personal trust is intuitive: you form your own view of whether you can trust the system.

With AI, this means putting your hands on it. You test it, give it tasks, see where it breaks.

How Trust Is Quantified and Measured

  • One metric is the time you spend writing prompts.
  • Another is the time you spend fixing the result.

If you spend a lot of time prompting or repairing output, that tells you something about your real trust level.
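The two metrics above are easy to log. Below is a minimal sketch, assuming you note for each task how many minutes you spent prompting and how many you spent fixing the output; the `TaskLog` record, the `effort_per_task` helper, and the tool names are hypothetical. In line with the point that relative numbers are more useful than absolute ones, it compares two setups rather than producing a single trust score.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class TaskLog:
    """One task done with an AI tool (hypothetical logging format)."""
    prompt_minutes: float  # time spent writing and refining prompts
    fix_minutes: float     # time spent repairing the output until it is usable


def effort_per_task(logs: List[TaskLog]) -> dict:
    """Average prompting, fixing, and total overhead per task, in minutes."""
    prompting = mean(log.prompt_minutes for log in logs)
    fixing = mean(log.fix_minutes for log in logs)
    return {"prompting": prompting, "fixing": fixing, "total": prompting + fixing}


# Hypothetical logs for two setups used on similar tasks
tool_a = [TaskLog(5, 20), TaskLog(3, 15), TaskLog(4, 25)]
tool_b = [TaskLog(6, 5), TaskLog(4, 8), TaskLog(5, 2)]

for name, logs in (("tool_a", tool_a), ("tool_b", tool_b)):
    print(name, effort_per_task(logs))
```

Here tool_b needs 10 minutes of overhead per task against 24 for tool_a, which suggests a higher personal trust level in tool_b for this kind of task. What counts as acceptable overhead remains your own judgment.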

Action Plan

  • Test AI in your own work.
  • Watch where it helps, where it breaks, and how much effort you need to make the result usable.

Level Two: Systematic Trust

At the systematic level, we move from personal experience to scale. It is no longer only “I know where AI breaks”; it is “let’s test this at scale for a specific domain or a specific class of tasks.”

Basically, we do the same as on level one, but now with more cases, more structure, and more statistics.

How Trust Is Quantified and Measured

  • The proxy for trust becomes the probability of correct output.

You calculate it as the number of correct outputs divided by the total number of test cases, and add a confidence interval whose width depends on the number of test cases.

Action Plan

  • Use public benchmarks when relevant.
  • Use your own datasets for specific domains.

Add random sampling and human review to understand whether the statistical result matches your real domain needs.
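The random-sampling step can be sketched in a few lines. This assumes each test case already has an automated verdict (for example, from a benchmark grader) and that reviewers record their own verdict on a random subset; the function names, case IDs, and fixed seed are illustrative assumptions. The agreement rate between automated and human verdicts is one way to check whether the statistical result reflects what your domain actually needs.

```python
import random
from typing import Dict, List


def draw_review_sample(case_ids: List[str], k: int, seed: int = 2026) -> List[str]:
    """Pick k distinct cases uniformly at random for human review.

    A fixed seed makes the sample reproducible for later audits.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(case_ids, k))


def agreement_rate(auto: Dict[str, bool], human: Dict[str, bool], sample: List[str]) -> float:
    """Share of sampled cases where the human reviewer agrees with the automated verdict."""
    agreed = sum(auto[c] == human[c] for c in sample)
    return agreed / len(sample)


cases = [f"case-{i:03d}" for i in range(200)]
sample = draw_review_sample(cases, k=20)

# Hypothetical verdicts: the reviewer disagrees with the grader on one sampled case
auto = {c: True for c in sample}
human = dict(auto)
human[sample[0]] = False
print(agreement_rate(auto, human, sample))
```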

Level Three: Architectural Trust

At the architectural level, the question changes again. We do not trust AI 100%, and probably we never will. But:

Can we build something trustworthy using systems that we do not trust 100%?

The answer is “yes.” The Internet is one example: the physical networks underneath it cannot be trusted 100%, yet we managed to build the Internet on top of them.

How Trust Is Quantified and Measured

First you measure how each system performs separately. Then you measure how they perform together.

  • The important metric is shared failure rate: cases where all systems fail at the same time.

Action Plan

  • Identify the key systems in the pipeline: AI, humans, policies, validations, controls.
  • Measure their individual trust levels.
  • Test the whole architecture to see whether the combined system gives a higher trust level than each part alone.
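The measurement steps above can be sketched as a small joint analysis over one shared set of cases. This is a minimal illustration, assuming you have a correct/incorrect verdict for every system on every case, and assuming the pipeline as a whole is correct whenever at least one system is correct (for example, a human reviewer catches an AI error). Under that assumption, combined trust equals one minus the shared failure rate. The system names and verdicts are hypothetical.

```python
from typing import Dict, List, Tuple


def joint_analysis(results: Dict[str, List[bool]]) -> Tuple[Dict[str, float], float, float]:
    """Return (individual trust per system, shared failure rate, combined trust).

    results maps a system name to its per-case verdicts (True = correct);
    every system must be evaluated on the same cases, in the same order.
    """
    verdicts = list(results.values())
    n_cases = len(verdicts[0])
    if any(len(v) != n_cases for v in verdicts):
        raise ValueError("every system must be evaluated on the same cases")

    individual = {name: sum(v) / n_cases for name, v in results.items()}
    # A shared failure is a case where no system produced a correct result
    shared = sum(1 for case in zip(*verdicts) if not any(case)) / n_cases
    # Assumption: the pipeline succeeds whenever at least one system is correct
    combined = 1 - shared
    return individual, shared, combined


T, F = True, False
results = {
    "ai_model":     [T, T, F, T, F, T, T, F, T, T],
    "human_review": [T, F, T, T, T, F, T, F, T, T],
    "validation":   [F, T, T, T, F, T, F, F, T, T],
}
individual, shared, combined = joint_analysis(results)
print(individual, shared, combined)
print(combined > max(individual.values()))  # does the architecture beat its best part?
```

In this made-up data, each part alone reaches 60–70%, but they all fail together on only one case out of ten, so the combined setup reaches 90%.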

Increasing Trust by Combining Systems That Break in Different Ways

Combined trust depends on how systems fail together.

If we have system A and system B, each with its own trust level, what happens when we combine them?

  • We cannot simply add their trust levels, because the total could exceed 100%.
  • We also cannot simply take the minimum or maximum.

The answer depends on how the systems are designed and how they fail together.
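Elementary probability makes the two bullets above precise. Let T_A and T_B be the probabilities that systems A and B produce a correct result on a case, and F_{AB} the shared failure rate (both fail on the same case). Assuming, as a simplification, that the combined setup is correct whenever at least one of the two systems is correct:

```latex
\begin{aligned}
T_{AB} &= P(A \text{ correct} \lor B \text{ correct}) = 1 - F_{AB} \\
       &= T_A + T_B - P(A \text{ correct} \land B \text{ correct}) \\
\max(T_A, T_B) &\;\le\; T_{AB} \;\le\; \min(1,\; T_A + T_B)
\end{aligned}
```

The sum T_A + T_B is only an upper bound, capped at 100%. The maximum is only a lower bound, reached when every failure of the stronger system is also a failure of the weaker one. Where T_{AB} actually lands between the two is set by the shared failure rate, which is why it has to be measured.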

An example of using the Trust Architecture Canvas: deciding not to use a candidate system because it breaks in the same way as an existing one.

Joint Analysis Shows the Combined Trust Level

To understand combined trust, we need joint analysis. We test system A and system B separately, then we also look at how they behave on the same cases.

For example, system A has 84% trust and system B has 91%. But when we combine them, the overall trust becomes 95%, because the shared failure rate is only 5%. They do not always fail on the same cases, and this is the important part.
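As a check on these numbers, take T_A = 0.84, T_B = 0.91 and a shared failure rate F_{AB} = 0.05, and assume the combined setup is correct whenever at least one system is correct. The last two lines are a hypothetical comparison: if A and B failed completely independently, their shared failure rate would be the product of their individual failure rates.

```latex
\begin{aligned}
T_{AB} &= 1 - F_{AB} = 1 - 0.05 = 0.95 \\
P(A \text{ correct} \land B \text{ correct}) &= T_A + T_B - T_{AB} = 0.84 + 0.91 - 0.95 = 0.80 \\
F_{AB}^{\text{indep}} &= (1 - T_A)(1 - T_B) = 0.16 \times 0.09 = 0.0144 \\
T_{AB}^{\text{indep}} &= 1 - 0.0144 = 0.9856
\end{aligned}
```

The observed 5% shared failure rate is well above the 1.44% that independent failures would produce, so A and B are partly correlated. They still catch enough of each other's errors to lift combined trust above the stronger system's 91%.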

Trustworthy Architecture Uses Overlapping Safety Nets

Peer review in software engineering works the same way: another person may catch something you missed.

In aviation, we also see redundancy in controls and procedures.

Simply duplicating controls will not increase trust much. What we are looking for is diverse redundancy: orchestrating systems that break in different ways.

Not Every Redundancy Is Realistic

Some redundancy is useful in theory but not realistic in practice. In taxi services, for example, a second driver in every car would probably make the service safer, but it is not a realistic option.

So instead, we build a network of different systems: regulations, policies, driver ratings, app controls, reporting mechanisms. All these systems combine and contribute to the overall trust level.

Human in the Loop Is One More Trust System

We can think of a human in the loop as another trust system. Humans bring intuition and common sense, and they work on different principles than AI systems do. That difference, not perfection, is what makes humans such a valuable trust factor.

The Architecture Matters More Than Individual Trust Scores

Two strong systems can still fail together if they fail in the same way.

At the same time, two imperfect systems can create a stronger combined system if they compensate for each other. So the core design question is: do these systems break differently?
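This design question can be sketched as choosing, among candidate systems, the one that maximizes combined trust with an existing system on the same test cases, which is the same as minimizing their shared failures. This is a minimal illustration, assuming the combined setup is correct whenever at least one system is correct; the system names, verdicts, and the `combined_trust` and `best_partner` helpers are hypothetical.

```python
from typing import Dict, List


def combined_trust(a: List[bool], b: List[bool]) -> float:
    """Share of cases where at least one of the two systems is correct."""
    if len(a) != len(b):
        raise ValueError("both systems must be evaluated on the same cases")
    return sum(x or y for x, y in zip(a, b)) / len(a)


def best_partner(existing: List[bool], candidates: Dict[str, List[bool]]) -> str:
    """Name of the candidate whose failures overlap least with the existing system."""
    return max(candidates, key=lambda name: combined_trust(existing, candidates[name]))


T, F = True, False
existing_ai = [F, F, F, T, T, T, T, T, T, T]  # 70% on its own
candidates = {
    "strong_but_similar":      [F, F, T, T, T, T, T, T, T, T],  # 80%, fails where the AI fails
    "imperfect_but_different": [T, T, T, T, T, F, F, F, F, T],  # 60%, fails elsewhere
}

for name, verdicts in candidates.items():
    print(name, sum(verdicts) / len(verdicts), combined_trust(existing_ai, verdicts))
print(best_partner(existing_ai, candidates))
```

The stronger candidate lifts the pair only to 80%, while the weaker but different one covers every failure of the AI. Combining the AI with an exact copy of itself would return its own 70%, which is the point about duplicated controls: redundancy only helps when the systems break differently.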

Have We Just Reinvented Reliability?

Not exactly.

Reliability is part of trust, but trust covers a wider set of ideas. We are not only asking whether one component works; we are looking at the whole decision pipeline, including AI, humans, policies, controls, and business context.

Trust also involves more stakeholders: users, managers, regulators, business owners, humans in the loop, and people affected by the decision.

Core Takeaway

Trustworthy AI is not about finding one perfect AI model. All systems are breakable, including AI systems, human systems, business systems, policies, and controls.

The idea is to:

  1. Understand how systems break and then
  2. Combine systems that break in different ways.

This is the way to get to something trustworthy.

Cite as: Alexis Savkín, "Trust: The Cornerstone of AI Implementation — How to Measure It, Improve It, and Design for It," BSC Designer, May 15, 2026, https://bscdesigner.com/measure-trust.htm.
