Why do we expect our AI products to be flawless?

I was going to title this ‘Why can’t we let self-driving cars kill anyone?’ but I thought that might be a bit too much.

Nonetheless, the facts don’t lie. Human drivers kill 1.3 million people every year. Think about that number for a moment.

Meanwhile, although it is early days, numerous studies suggest that self-driving cars are often safer. Yet there is uproar if a self-driving car is anywhere near a fatal accident. One death, and we’ve pushed the acceptance of the technology back years.

In fact, this pattern is common throughout history: humans hold new technology to a much higher standard than they hold other humans to, even when there is overwhelming evidence that the automation is better. Automation is judged harshly for minor imperfections, whilst major human errors go unnoticed. There are important implications here for anyone designing AI products. When we study this, we learn that there are multiple human biases at play, and they show up consistently across many examples:

  • Automated elevators: People wouldn’t use them without a human operator, despite significant reductions in accidents. People complained about doors occasionally opening slightly off-level with the floor (human operators were worse), or about momentary hesitation and slight jolting before the car moved.
  • Airline autopilot: Massively improved flight safety, far exceeding human pilot reliability. Yet mistrusted by pilots for minor imperfections such as slight deviations from assigned altitude, slightly jerky movements during course corrections, and (can you imagine!) overly cautious approaches and landings.
  • ATMs: Criticised for slight delays in dispensing cash (despite the huge queue inside the building) and slow user interfaces. Customers preferred human tellers, even though tellers made many more mistakes.

The list of new, better technologies held to previously unseen standards and judged for minor imperfections goes on:

  • 1900s Automatic telephone switchboards
  • 1930s Automated traffic signals
  • 1970s Industrial robotics in car manufacturing
  • 1980s Automated stock trading systems
  • 1980s Automated subway trains
  • 1990s Digital medical diagnostic tools
  • 2000s Automated voting machines

I’ve been thinking about this a lot because we see a similar, fascinating pattern when new customers try Fin. In evaluating what Fin can do, and how well it performs, many customers hold Fin to a much higher standard than they hold their human team to. Even when Fin is faster than humans, and accurate more often, the feedback is ‘Fin is too slow’ or ‘Fin made too many mistakes’.

For example, Fin can issue refunds to customers. To do so, Fin needs to:

  • Check the product purchase history
  • Check the refund policies
  • Check the customer record
  • Approve the refund
  • Talk to the payment system to issue the refund
  • Get back to the customer to tell them the refund has been approved and issued
  • Update all customer records
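
To make the shape of that workflow concrete, here is a minimal, self-contained sketch of how such a refund flow might be orchestrated. Every function, type, and value below is a hypothetical stub, not Fin’s actual tooling; in a real agent each stub would be a tool call to the order, policy, payments, or CRM system.

```python
from dataclasses import dataclass


@dataclass
class Purchase:
    product_id: str
    payment_id: str
    amount: float
    days_since_purchase: int


@dataclass
class RefundPolicy:
    refund_window_days: int

    def allows(self, purchase: Purchase) -> bool:
        # Approve only if the purchase is still inside the refund window
        return purchase.days_since_purchase <= self.refund_window_days


# Stubbed integrations: in a real agent each of these would be a tool call.
def fetch_purchase(order_id: str) -> Purchase:
    return Purchase(product_id="prod_1", payment_id="pay_1", amount=49.0, days_since_purchase=10)


def fetch_policy(product_id: str) -> RefundPolicy:
    return RefundPolicy(refund_window_days=30)


def fetch_customer(customer_id: str) -> dict:
    return {"id": customer_id, "refunds_this_year": 0}


def issue_refund(payment_id: str, amount: float) -> str:
    return "refund_123"  # reference returned by the payment system


def notify_customer(message: str) -> None:
    print(message)


def update_records(order_id: str, refund_ref: str) -> None:
    print(f"records updated for {order_id}: {refund_ref}")


def handle_refund_request(customer_id: str, order_id: str) -> bool:
    purchase = fetch_purchase(order_id)          # check the product purchase history
    policy = fetch_policy(purchase.product_id)   # check the refund policies
    customer = fetch_customer(customer_id)       # check the customer record

    if not policy.allows(purchase):              # approve or decline against the policy
        notify_customer("Sorry, this purchase is outside the refund window.")
        return False

    refund_ref = issue_refund(purchase.payment_id, purchase.amount)  # talk to the payment system
    notify_customer("Your refund has been approved and issued.")     # get back to the customer
    update_records(order_id, refund_ref)                             # update all customer records
    return True


if __name__ == "__main__":
    handle_refund_request("customer_7", "order_42")
```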

For any AI agent to do all of that accurately and consistently is impressive. But it might take 90 seconds.

Few humans can do this in 90 seconds. They take way longer. Sometimes up to a day.

Fin will do this in 90 seconds, and often less, every time. When we should be thinking ‘This technology is like magic’, we judge it harshly instead. It’s like everyone’s experience with airplane wifi. Instead of thinking ‘it’s amazing I have the internet up here in this flying box of metal’, we think ‘this internet speed is shit’.

Sometimes Fin makes a mistake, or has a minor misunderstanding. And people think ‘Fin made a mistake with that answer, this technology isn’t good enough yet’. Yet their human team makes far more mistakes, all the time.

Why is this? And how do we design for it?

We can’t persuade people to abandon psychological biases that have existed for hundreds of years. But we can study them, and design around them. In this case, there are three main ones:

Automation bias leads us to over-scrutinise automated systems. Our brains naturally offload cognitive tasks, and we feel discomfort when we lose perceived control; that discomfort magnifies minor technological imperfections.

Possible solutions: 

  1. Products need to clearly show users how and why automated decisions were made. If you’ve wondered why AI tools expose so many events, even when they flash past at incomprehensible speed, now you know. We’re exploring different ways of making Fin’s reasoning visible and understandable; a rough sketch of what an exposed decision trace might look like follows this list.

  2. Remind people of the facts, showing real comparisons between AI and human performance. We do a lot of this in Fin Reporting.
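
Here is a minimal sketch of the kind of decision trace a product could surface to users so they can see how and why a decision was made. The structure and field names are illustrative assumptions, not Fin’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentEvent:
    step: str    # e.g. "checked refund policy"
    detail: str  # human-readable note on what was found or decided
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class DecisionTrace:
    decision: str
    events: list[AgentEvent] = field(default_factory=list)

    def record(self, step: str, detail: str) -> None:
        self.events.append(AgentEvent(step, detail))

    def summary(self) -> str:
        # A plain-language rundown of every step behind the decision
        lines = [f"Decision: {self.decision}"]
        lines += [f"  - {event.step}: {event.detail}" for event in self.events]
        return "\n".join(lines)


trace = DecisionTrace(decision="refund approved")
trace.record("checked purchase history", "order_42, purchased 10 days ago")
trace.record("checked refund policy", "30-day refund window applies")
trace.record("issued refund", "refund_123 sent to the payment provider")
print(trace.summary())
```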

Status quo bias makes us instinctively resistant to change because we fear losses more intensely than we desire gains, leading to irrational preferences for familiar human processes, even when automation is demonstrably better.

Possible solutions:

  1. Introduce automation incrementally, mixing familiar interactions with new automated experiences to ease users into acceptance.
  2. Leverage familiarity by designing automated interfaces and interactions that closely mirror familiar human workflows, reducing cognitive friction.

We designed Fin to blend deterministic and generative workflows. This makes it easy for many customers to incorporate Fin into their existing workflows as a first step. We encourage customers to rethink their whole setup, but sometimes this is the best way to get started.

The availability heuristic causes us to judge the likelihood of events based primarily on how readily specific examples come to mind, rather than relying on accurate statistical assessments. This cognitive shortcut arises because the human brain prioritises information that’s vivid, emotional, or easily recalled. As a result, rare but dramatic incidents, such as a single highly publicised automated mistake, are disproportionately memorable and thus perceived as more frequent or dangerous. Conversely, everyday human errors, though statistically far more common, lack the emotional intensity and vividness necessary for easy recall, making them significantly underestimated in our risk perception.

Possible solutions:

  1. Prioritise extra effort in designing initial user interactions to be smooth and error-free.
  2. Amplify positive experiences to create memorable narratives.
  3. When errors occur, transparently communicate how rare they are compared to overall success.

Remember that AI products are new. Many will run up against these biases and more, and if we want fast adoption, we will need to design around them for the products to be successful.