Why I Had to Change How I Build AI Systems

The way I built AI systems two years ago no longer works. Not because I was doing it wrong then, but because the underlying technology changed in ways that made the old approach a liability.

Before

The pattern I used consistently, and that most teams around me also used:

Break the problem into steps. Assign each step to a model call. Build explicit pipelines that pass outputs from one call to the next. Manage state outside the system, in a database or in application code.
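The pattern can be sketched in a few lines. This is a minimal illustration, not a real system: the step functions are hypothetical stand-ins for model calls, and a plain dict stands in for the external database.

```python
# Sketch of the old pattern: explicit steps, one model call per step,
# outputs passed forward, state managed entirely outside the pipeline.
# All names here are illustrative placeholders, not a real API.

def extract_constraints(user_input: str) -> dict:
    # In the real system, this would be a model call.
    return {"destination": user_input.split()[-1], "budget": None}

def find_options(constraints: dict) -> list:
    # Another model call, fed only the previous step's output.
    return [f"option for {constraints['destination']}"]

def rank_options(options: list, state: dict) -> str:
    # Ranking sees the external state, never the full conversation.
    return options[0]

def run_pipeline(user_input: str, state: dict) -> str:
    constraints = extract_constraints(user_input)
    state["constraints"] = constraints  # state lives outside the flow
    options = find_options(constraints)
    return rank_options(options, state)

state = {}
answer = run_pipeline("Plan a trip to Lisbon", state)
```

Each function is testable on its own, which is exactly the appeal. The cost shows up later: every new situation needs a new branch through these functions.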

This worked.

It was predictable. You could test each step independently. When something broke, you knew where to look.

But it required constant structural maintenance.

Every variation in the problem required a new branch in the pipeline. Every edge case required an explicit rule. Every change in context, a date shift, a new constraint, a user preference update, meant updating the state externally and routing it back through the right part of the flow.

The system was brittle in proportion to how much the real world didn't behave the way the pipeline expected.

The workaround phase

The instinct, when these systems broke, was to add more structure.

More rules. More pipeline stages. More explicit handling for the cases that fell through.

That made the systems more complete.

It also made them harder to maintain and slower to adapt. Each addition was a new dependency. Each rule created an interaction with the other rules. The system grew, but not in the direction of the problem. It grew in the direction of its own complexity.

I built a version of this for a travel planning flow. It handled a reasonable range of cases.

It did not handle the moment a user changed their mind.

When the context shifted, the pipeline didn't update. The state was stale. The system produced answers that were technically correct for a problem that no longer existed.

What changed

Three things shifted the calculus:

Context windows got long enough to hold a full problem in a single pass. Not a summarized version of it. The actual history, the constraints, the decisions made, the reasons behind them.

Tool use became reliable enough to use in production. Models could call external systems, check real data, take action, and reason about what came back, without requiring explicit orchestration logic around every call.

Structured outputs became consistent. You could ask for a specific format and get it, reliably, which meant downstream processing stopped being a guessing game.
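The third shift is the one that most directly changes downstream code. A minimal sketch of what it buys, with a hardcoded string standing in for a real model response and a schema that is purely illustrative:

```python
import json

# Sketch: when the model reliably returns the requested format,
# downstream code can parse strictly instead of scraping text.
# The response string and required keys are illustrative stand-ins.

REQUIRED_KEYS = {"destination", "nights", "budget_eur"}

def parse_plan(model_response: str) -> dict:
    plan = json.loads(model_response)  # fails loudly, not silently
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return plan

response = '{"destination": "Lisbon", "nights": 4, "budget_eur": 1200}'
plan = parse_plan(response)
```

The point is not the parsing itself. It is that strict parsing used to be a gamble and is now a reasonable default.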

Together, these shifted the design question.

Before: how do I break this problem into steps the model can handle?

Now: what does the model need to reason well about this problem as a whole?

What I stopped doing

I stopped building pipelines as the primary unit of design.

A pipeline encodes assumptions about how a problem unfolds. When those assumptions hold, it works well. When they don't, it fails in ways that are hard to recover from because the structure itself resists adaptation.

I stopped managing state entirely outside the system.

Putting all context in the application layer and feeding the model only what it needed for the current step felt clean. It also meant the model was always reasoning without the full picture. Small disconnects accumulated.

I stopped writing rules for edge cases first.

Rules are expensive to maintain and they encode a static view of the problem. Most edge cases I wrote rules for were edge cases in my model of the problem, not in the problem itself.

What I started doing

I start by asking what the model needs to reason well, not what sequence of calls will produce the right output.

I keep more context live. Not unlimited, but enough that the system can track how a situation has evolved, not just what it currently looks like.

I design for adaptation rather than completion. The goal is not a system that produces an answer. It is a system that can update its answer when the situation changes, without losing what it already knows.

This changes the architecture substantially.

Fewer handoffs. Less external state management. More investment in how context is structured and surfaced.

How this shows up in Voyami

Voyami is a travel planning system. The core problem is not finding options.

It is tracking a decision that evolves.

A user starts with a destination. Then a budget changes. Then someone else joins the trip. Then the dates shift.

Every one of those changes affects every prior decision. Under the old pipeline model, each change required re-entering the funnel. The system had no memory of what had been decided or why.

Under the current approach, the system holds the full context of the planning session. When something changes, it reasons about what that change affects, surfaces the relevant tradeoffs, and updates the plan without discarding what was already decided.
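One way to picture the session-level state this requires: each decision records what it depends on and why it was made, so a change invalidates only what it actually affects. The names here are illustrative, not Voyami's real data model.

```python
from dataclasses import dataclass, field

# Sketch of context kept live across a planning session: decisions
# carry their reasons and dependencies, so a change can be traced to
# the decisions it invalidates instead of restarting the funnel.

@dataclass
class Decision:
    name: str
    value: object
    reason: str
    depends_on: tuple = ()

@dataclass
class PlanningSession:
    decisions: dict = field(default_factory=dict)

    def decide(self, name, value, reason, depends_on=()):
        self.decisions[name] = Decision(name, value, reason, depends_on)

    def change(self, name, value, reason):
        """Update one decision; return the names it invalidates."""
        self.decide(name, value, reason)
        return [d.name for d in self.decisions.values()
                if name in d.depends_on]

session = PlanningSession()
session.decide("dates", "May 3-10", "only week everyone is free")
session.decide("hotel", "Alfama guesthouse", "walkable, under budget",
               depends_on=("dates",))
stale = session.change("dates", "June 1-8", "companion's dates shifted")
```

When the dates move, the hotel is flagged as stale but the reasons behind it survive, which is the part the pipeline version threw away.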

That is not a feature. It is a different architecture.

It required rethinking how state is represented, how context is passed, how the system handles partial information and conflicting constraints.

It also required accepting that the system would sometimes be uncertain, and that surfacing that uncertainty explicitly was better than producing a confident answer that was based on stale context.

What is still hard

Evaluation is harder under this model.

When systems are pipeline-based, you can test each stage independently. Pass/fail at every step.

When context flows continuously through a system that reasons across it, the failure modes are subtler. The system produces reasonable-looking outputs that are slightly off in ways that compound.

You need evaluation at the decision level, not the output level.

That means defining what a good decision looks like, which requires understanding the problem well enough to know. For travel planning, it means knowing whether the suggested itinerary actually reflects the user's stated constraints, not just whether the format is correct.
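Concretely, decision-level evaluation looks less like schema validation and more like checking the output against the user's stated constraints. A sketch, with an illustrative constraint set and itinerary shape:

```python
# Sketch of decision-level evaluation: instead of asking whether the
# output parses, ask whether it honors what the user actually said.
# The constraint fields and itinerary shape are illustrative.

def violations(itinerary: dict, constraints: dict) -> list:
    problems = []
    if itinerary["total_cost"] > constraints["budget"]:
        problems.append("over budget")
    if itinerary["nights"] != constraints["nights"]:
        problems.append("wrong trip length")
    banned = set(constraints.get("avoid", [])) & set(itinerary["activities"])
    if banned:
        problems.append(f"includes avoided activities: {sorted(banned)}")
    return problems

constraints = {"budget": 1500, "nights": 5, "avoid": ["casinos"]}
itinerary = {"total_cost": 1700, "nights": 5,
             "activities": ["museum", "casinos"]}
problems = violations(itinerary, constraints)
```

A check like this passes well-formatted answers that are wrong and fails confident answers built on stale constraints, which is the failure mode that matters here.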

That kind of evaluation is harder to automate and harder to maintain as the system evolves.

The other hard part is latency. Longer context, more reasoning, more tool calls, all of it costs time. The systems I build now are more capable and slower than the ones I built before.

That tradeoff is real. Managing it, without reverting to simpler architectures that don't actually solve the problem, is still something I am figuring out.

What I am watching now

Even this approach is not stable yet.

There are still open questions that will likely change how these systems are built again.

When to rely on a single model with more context, and when to break the problem into coordinated steps. How much reasoning should happen inside the model, and how much should be explicitly structured outside it. Where memory should live, in the model's context, in external systems, or across both. How to handle longer interactions where context grows, but relevance does not.

These are not settled.

In practice, they show up as tradeoffs.

A system that is too flexible becomes inconsistent. A system that is too structured becomes brittle again.

Most of the work right now is finding where that balance holds.

Why this matters

The shift is not from old AI to new AI.

It is from systems designed around model limitations to systems designed around model capabilities.

For a long time, most of the architecture was compensating for what the model couldn't do.

Now more of it can be designed around what the model can do.

That changes which problems are worth attempting, how you structure the work, and what makes a system good.

I am still adjusting to that.

But I am building differently than I was.