5 min read

AI agent context still misses the product layer

AI coding agents now have AGENTS.md, memory banks, harnesses, evals, and monitors. They still lack product context: what the product promises to do and which behaviors must stay true.

product-behavior-contract · ai-coding · product-intelligence · developer-experience

If you've been following AI coding tools closely, you've probably noticed the conversation changing.

A year ago, most of the discourse was about prompts, context windows, and whether agents could reliably finish tasks at all.

Now the serious work is happening one layer above the model:

- Repo instruction files like AGENTS.md
- Memory banks that persist what agents learn
- Harnesses that carry agents through long tasks
- Evals that catch bad outputs
- Monitors that flag suspicious behavior

Taken together, these point to the same conclusion: the model is only part of the system. Reliable agentic coding depends on the surrounding stack.

That's real progress. But it still leaves one missing layer.

The modern agent stack is getting better at telling agents how to work. It still does a poor job telling them what the product must continue to do.

What AI agent context solves today

Most serious AI-native repos are already building some version of the same stack: repo instructions (AGENTS.md and similar), memory banks, harnesses, evals, and monitors.

Each layer solves a real problem.

Repo instructions reduce workflow mistakes. Memory reduces repeated exploration. Harnesses help agents make progress across long tasks. Evals and monitors catch bad outputs and suspicious behavior.

If your goal is better software engineering execution, this stack makes sense.

Why better harnesses still don't protect product decisions

Here's the problem: an agent can follow every repo rule, use the right harness, pass the tests, and still break the product.

Not by writing obviously bad code. By changing something that looked reasonable from the code alone.

That happens because most product decisions are not explicit in the repo.

The agent sees implementation. It does not automatically see product intent, trust level, or business significance.

That is why teams end up saying the same thing after an agent made a "wrong" change: the code was plausible, but it violated something the team had already decided.

This is not a prompt quality problem. It is a missing artifact problem.

What is missing from AI agent context today?

The missing layer is product truth.

Not a PRD. Not a sprint spec. Not a memory log. Not a test suite.

Product truth answers a narrower and more durable question:

What does this product actually promise to do right now, and which behaviors are confirmed enough that agents should treat them as protected?

That layer needs to capture things like:

- which behaviors are confirmed and should be treated as protected
- which behaviors are provisional or still under exploration
- which flows carry business or compliance weight, such as billing, entitlements, and permissions

Without that layer, every agent is forced to infer product meaning from implementation details.

Sometimes that works. Sometimes it silently introduces product drift.
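To make that concrete, here is a hypothetical sketch of a single product-truth entry, written as a plain Python dict. The field names and values are invented for illustration; they are not the actual PBC format.

```python
# A hypothetical product-truth entry. Field names are illustrative,
# not the real PBC schema.
billing_limit = {
    "behavior": "Free-tier workspaces are capped at 3 active projects",
    "status": "confirmed",  # confirmed | provisional | exploring
    "area": "billing",
    "rationale": "Promised on the pricing page; changing it is a "
                 "product decision, not a refactor",
}
```

An agent that reads an entry like this knows the limit is a promise, not an arbitrary constant it happened to find in the code.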

Why AGENTS.md and memory banks are not enough

This is where teams get confused, because all of these artifacts look similar from the outside. They're usually text files in the repo. They're all readable by both humans and agents. They all seem like "context."

But they operate at different levels:

- AGENTS.md tells the agent how to behave in this repo
- Memory banks record what the agent has learned so far
- Harnesses govern how a task gets executed
- Evals and monitors judge whether the output looks acceptable

None of those directly answer: which product behaviors are intentional, protected, and safe to build on top of?

You can have all five and still leave the core product layer implicit.

That's why a repo can feel "well-instrumented" for agents and still be fragile when they touch billing, entitlements, onboarding logic, permissions, or compliance-sensitive flows.

What a Product Behavior Contract adds

A Product Behavior Contract adds the missing product layer without replacing the rest of the stack.

It sits alongside your existing agent context and makes the behavioral contract explicit:

- which behaviors are confirmed product promises
- which behaviors are provisional and safe to revisit
- which areas are still under exploration

That changes the quality of agent decisions.

When the contract says a billing limit is confirmed, the agent stops treating it as an arbitrary number it can refactor freely. When the contract says the permission model is still under exploration, the agent stops extending it as if the design were settled. When a behavior is marked provisional, humans and agents both know not to overfit around it.
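Mechanically, that can be as simple as mapping a contract entry's status to an edit policy. A minimal sketch, assuming a status field with three values (the statuses and policy names here are illustrative assumptions, not part of any real spec):

```python
# Map a contract entry's status to how an agent should treat edits.
# Statuses and policy names are illustrative assumptions.
POLICY = {
    "confirmed": "do_not_change_without_human_approval",
    "provisional": "change_allowed_but_flag_in_review",
    "exploring": "change_freely",
}

def edit_policy(entry: dict) -> str:
    """Return the edit policy for a contract entry; unknown statuses
    fall back to asking a human rather than guessing."""
    return POLICY.get(entry.get("status", ""), "treat_as_unknown_and_ask")
```

The important design choice is the fallback: when the contract says nothing, the safe default is to escalate, not to assume the behavior is free to change.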

This is the difference between code context and product context.

Code context tells the agent what exists. Product context tells the agent what must remain true.
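A hypothetical side-by-side of the two kinds of context (the constant and the entry are both invented):

```python
# Code context: the agent sees that a constant exists.
FREE_TIER_PROJECT_LIMIT = 3

# Product context: a hypothetical contract entry saying the behavior
# behind that constant must remain true.
FREE_TIER_PROMISE = {
    "behavior": "Free workspaces are limited to 3 active projects",
    "status": "confirmed",  # a product decision, not a tunable number
}
```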

The agent stack is converging. The product layer is next.

My read of the current ecosystem is that OpenAI, Anthropic, and the broader tool market are all converging on the same architecture: a capable model wrapped in repo instructions, memory, harnesses, evals, and monitors.

That is the right direction.

But as agents get better at execution, the cost of missing product truth goes up, not down.

A stronger coding agent can now move faster, touch more files, and refactor more confidently. If the product layer is still implicit, that extra capability just lets it make bigger product mistakes more efficiently.

The next generation of mature AI-native repos will not stop at workflow rules, harnesses, and evals. They will also include a durable product artifact that says what the software is actually supposed to do.

That's the role of a product behavior contract.

The format is open source. The PBC viewer lets you browse structured contracts in the browser. And Stewie is the product built to help teams generate and maintain that contract from real code.


Stewie reads your codebase and helps you author a living product behavior spec. We're onboarding a small group of product and engineering teams before public launch. Request early access →