The problems I work on do not present themselves as modeling problems.
They present themselves as workflow problems. Organizational problems. Decision problems where no one is quite sure what decision is actually being made.
The model comes later. Usually much later.
I start by trying to understand the decision clearly: not in the abstract, but in detail.
Who is making it. When it is made. What information is available at that moment. What constraints will never appear in the data.
And then: what happens when it is wrong?
That last question changes almost everything. It reveals how much tolerance there is for error, who absorbs the consequences, and whether the system will ever be trusted enough to matter. Until those are clear, building anything is premature.
From there, I try to understand what actually happens today.
Not what the process documentation says. Not what the workflow is supposed to be.
What people actually do.
Where they get information. How they combine it. Where judgment enters because the data is insufficient. Where they work around systems that are supposed to help but don't.
This part is rarely documented. It has to be observed.
It is also usually where the real problem becomes visible, not in the model, but in the gap between how the decision is supposed to be made and how it actually is.
The next question is whether a system could change the outcome.
Not whether it could produce useful output. Whether it could actually change what happens.
There is a difference. Many systems generate information that gets reviewed and then set aside. The decision happens the same way it always did. The system is present but not load-bearing.
If that is the likely trajectory, the problem is usually upstream: in how the output fits into the workflow, in whether the person using it has any reason to change behavior, in whether the format matches how the decision actually gets made.
Identifying this early determines whether it is worth building at all.
Once I understand the decision and where a system could genuinely influence it, I think about ownership.
Who is accountable for this decision today. What would change for them if the system existed. Whether using it creates clarity or creates risk.
If ownership is unclear, or if the system would shift accountability in ways nobody has agreed to, the system will not hold regardless of how good it is.
This is one of the earliest filters. It has saved a lot of time.
Only then do I think about data.
Not what data exists; that question usually has an optimistic answer. What data is actually available, usable, and stable enough to support the decision as it is actually made.
Where it comes from. How often it changes. What happens when it is wrong.
This is usually where the first real constraints appear. Not because data is missing, but because the data that exists does not behave the way the system requires. It encodes different assumptions, is owned by different teams, and updates on a cadence that does not match the decision workflow.
At that point, the shape of the system becomes clearer: what context needs to persist, what can change independently, what the failure mode looks like and who handles it.
I also think about evaluation from the beginning. Not as a separate phase. As part of the design.
What signals would tell us whether decisions are improving? How long does it take to observe them? What is the closest proxy we can track in the short term without losing sight of what actually matters?
If I cannot answer those questions before building, the evaluation will default to whatever is easiest to measure. And that is almost never the right thing.
None of this follows a clean sequence in practice.
The questions interact. A data constraint changes how the decision can be framed. An ownership issue may change whether the system should exist at all. A failure mode may redefine what good enough looks like.
The system emerges through those tradeoffs, not around them.
What has changed over time is not the questions. Most of them have been the same for years.
What has changed is how early they need to be answered.
When models were harder to build, the model was the constraint, and it made sense to start there.
Now that capable models are available quickly, the constraint has moved. It is not in the prediction. It is in whether the prediction lands inside a real decision, owned by someone, in an environment ready to use it.
That is where the work actually is.
And it requires a different set of questions at the start.