
Why AI agents keep violating your product rules

Your coding agent ships correct-looking code that breaks product promises. The problem isn't capability — it's a missing context layer that AGENTS.md and memory banks were never designed to provide.

Tags: product-behavior-contract, ai-coding, saas, developer-experience

I tracked every time a coding agent made a "wrong" decision in my codebase over the course of a month. Twenty-three times.

Nineteen of those weren't bugs. The agent was correct based on what it could see. It just didn't know what I knew — product decisions, trade-offs, things I'd explicitly confirmed or rejected that existed nowhere in the code.

The agent saw a Stripe integration. It didn't know I was actively evaluating switching to Polar. It saw a simple email/password auth flow and didn't know that was intentionally minimal because I hadn't decided on the permission model yet. It saw a 24-hour refund window and couldn't tell whether that was a deliberate product decision or a temporary hack.

These aren't coding mistakes. They're product context failures.

The state of the art is good — and it's not enough

The ecosystem for giving agents code-level context has gotten genuinely impressive in 2026:

- AGENTS.md and .cursorrules encode how code should be written in a repo
- Memory banks (Cline's memory bank pattern, claude-progress.txt) carry context across sessions
- Spec-driven development tools (spec-kit, Kiro) turn feature intent into implementation plans

All of these solve the "how should I write code in this repo" problem. And they're getting really good at it.

What none of them solve: what should this product do and why.

The harness engineering gap

Both OpenAI and Anthropic published engineering posts in early 2026 saying the same thing: the model isn't the bottleneck — the harness is.

OpenAI's harness engineering series describes building their entire product with zero hand-written code. The secret was "structured documentation as a single source of truth" with "mechanical enforcement." Their advice: "Give agents maps, not 1,000-page manuals."

Anthropic's post on effective harnesses for long-running agents describes the same pattern — every new agent session starts with no memory, so you need persistent structured context that agents can quickly load and reason about.

Both companies built these harnesses by hand for their own teams. Neither has productized them.

That's validation from the two biggest AI companies, but it also reveals the gap. They're describing code-level harnesses. Neither addresses the product truth layer: the decisions, rules, and constraints that define what the product promises to do.

Four layers of agent context

When you look at the tools available today, there are really four distinct layers of context that agents need:

| # | Layer | What it provides | Example tools |
|---|-------|------------------|---------------|
| 1 | Code conventions | How to write code in this repo | AGENTS.md, .cursorrules |
| 2 | Session memory | What happened in prior sessions | Cline memory bank, claude-progress.txt |
| 3 | Implementation specs | What to build (feature level) | spec-kit, Kiro, SDD |
| 4 | Product truth | What the product promises: what's decided, what's uncertain, and why | Missing |

Each layer is useful. None replaces the others. But right now, most repos have layers 1-3 covered (or at least have tools for them) and layer 4 is completely absent.
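
In repo terms, that coverage might look something like this. A sketch only: directory names vary by tool, and product.pbc.md is a hypothetical name for the missing fourth layer:

```
repo/
├── AGENTS.md          # layer 1: code conventions
├── .cursorrules       # layer 1: editor-specific rules
├── memory-bank/       # layer 2: session memory (Cline-style)
├── specs/             # layer 3: feature-level specs (spec-kit-style)
└── product.pbc.md     # layer 4: product truth (absent in most repos today)
```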

That's the layer where your product decisions live — and it's the layer agents violate most often.

What product violations look like

When an agent has AGENTS.md but no product truth layer, it knows how to work but not what to protect. So it:

- "cleans up" values that are actually confirmed product promises, like that 24-hour refund window
- builds new features on foundations that were left intentionally minimal, like the undecided permission model
- deepens coupling to dependencies that are under active evaluation, like the Stripe integration

The agent isn't being careless. It's following the instructions it has. The problem is that nobody wrote down which behaviors are off-limits.

Product truth as a behavioral contract

The fix isn't more documentation. It's a different kind of artifact — one that captures product-level decisions in a format both humans and agents can reason about.

That's what a Product Behavior Contract (PBC) is. It's Markdown-first, machine-readable, and designed to sit alongside your existing agent context:
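
Here's a minimal illustrative sketch, reusing the decisions from earlier in this post (the section names and wording are my own shorthand, not the spec's canonical format):

```markdown
# Product Behavior Contract: billing and auth

## Decided
- The refund window is 24 hours. Confirmed product decision,
  not a temporary value. Do not change without human sign-off.

## Exploring
- Payments run on Stripe, but Polar is under active evaluation.
  Avoid deepening Stripe-specific coupling.
- Auth is intentionally minimal (email/password only). The
  permission model is undecided; do not build role assumptions
  on top of it.
```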

A PBC doesn't replace AGENTS.md or memory banks or feature specs. It fills the specific gap those tools leave open — the product truth layer.

When your agent can read a behavior contract that says "the 24-hour refund window is a confirmed product decision, not a temporary value," it stops treating that number as refactorable. When it sees "auth is intentionally minimal — permission model is still in exploration," it stops building assumptions on top of an undecided foundation.
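
If your harness already loads AGENTS.md at the start of every session, a short pointer is enough to put the contract in the loop. This is one way to wire it up, not prescribed usage (product.pbc.md is the hypothetical file name from the layout above):

```markdown
<!-- in AGENTS.md -->
## Product truth
Before changing user-facing behavior, read product.pbc.md.
Entries under "Decided" are off-limits without human sign-off.
Entries under "Exploring" must not be treated as settled.
```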

Code tells you what is. Product truth tells you what should be.

The distinction matters because code is descriptive — it shows the current state. Product truth is prescriptive — it says what the product promises and where those promises are still uncertain.

An agent that only reads code will always assume the current implementation is intentional. An agent that also reads a product contract can distinguish between "this is a deliberate design decision" and "this is a temporary hack that should be replaced."

That's the difference between an agent that writes correct code and an agent that builds the right product.


The PBC spec is open source at github.com/stewie-sh/pbc-spec. You can browse example contracts in the PBC viewer.

If you're using AGENTS.md, memory banks, or spec-driven development and still finding that your agents miss the product layer — this is the artifact that's missing.

Stewie reads your codebase and helps you author a living product behavior spec. We're onboarding a small group of technical founders before public launch. Request early access →