Production Has Opinions

The bottleneck in AI is rarely the algorithm. It's everything around it: the data, the workflows, the ownership, the incentives. Getting something to actually work in production is still surprisingly hard.

Most organizations are not struggling with models.

They are struggling with systems.

Over the past decade, I have spent a lot of time building AI and data science systems inside large enterprises. One pattern keeps repeating. The tools keep getting better. The models keep getting better. The demos look more impressive every year.

But getting something to actually work in production, in real workflows, with real data, with real users, is still surprisingly hard.

The Bottleneck Is Rarely the Algorithm

The bottleneck is everything around it:

  • Messy, fragmented data that was never designed to support the kind of inference you need
  • Workflows that do not map cleanly to model inputs and outputs
  • Unclear ownership across teams, where the people who built the model are not the people who have to live with the results
  • Systems that were never designed for AI-driven decisions: built for reporting, not for acting
  • Incentives that reward pilots and discourage the kind of long-term platform investment that production actually requires

None of these are model problems. They are organizational and systems problems, and no amount of better algorithms fixes them.

What Generative AI Has Made More Visible

Generative AI has made this gap more visible.

The barrier to a working demo has never been lower. But building something reliable enough to support real decisions still takes platform thinking, not just model thinking.

The demo almost always works. Production has opinions. Production is where real users make real decisions, where edge cases accumulate, where latency matters, where the mismatch between what the model was trained on and what the world actually looks like starts to show.

Why This Led to Toutami

This gap is one of the reasons I started Toutami. Not because the models were not good enough, but because the layer between the model and the decision was missing.

Building that layer means treating reliability as a first-class concern from the start, not something you bolt on after the demo gets approved.

The interesting work is not getting the model to run.

It is making the system reliable enough that people are willing to depend on it.