I tracked every time a coding agent made a "wrong" decision in my codebase over the course of a month. Twenty-three times.
Nineteen of those weren't bugs. The agent was correct based on what it could see. It just didn't know what I knew — product decisions, trade-offs, things I'd explicitly confirmed or rejected that existed nowhere in the code.
The agent saw a Stripe integration. It didn't know I was actively evaluating switching to Polar. It saw a simple email/password auth flow and didn't know that was intentionally minimal because I hadn't decided on the permission model yet. It saw a 24-hour refund window and couldn't tell whether that was a deliberate product decision or a temporary hack.
These aren't coding mistakes. They're product context failures.
## The state of the art is good — and it's not enough
The ecosystem for giving agents code-level context has gotten genuinely impressive in 2026:
- AGENTS.md / CLAUDE.md — Claude Code, Codex CLI, and others read these for conventions, build commands, architecture patterns
- Cline's memory bank — structured markdown files persisted across sessions for project knowledge
- Spec-driven development — GitHub's spec-kit (72k+ stars), Kiro, and others turning specs into agent instructions
- Agent discovery — both Claude Code and Codex CLI now do initial codebase exploration before they start working
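For reference, the first of these layers is the most familiar: an AGENTS.md is typically a short file of conventions and commands. A sketch of what one might contain (contents illustrative, not from any particular repo):

```markdown
# AGENTS.md

## Build & test
- `pnpm install && pnpm test` before committing

## Conventions
- TypeScript strict mode; no default exports
- Tests live next to source as `*.test.ts`
```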
All of these solve the "how should I write code in this repo" problem. And they're getting really good at it.
What none of them solve: what should this product do and why.
## The harness engineering gap
Both OpenAI and Anthropic published engineering posts in early 2026 saying the same thing: the model isn't the bottleneck — the harness is.
OpenAI's harness engineering series describes building their entire product with zero hand-written code. The secret was "structured documentation as a single source of truth" with "mechanical enforcement." Their advice: "Give agents maps, not 1,000-page manuals."
Anthropic's post on effective harnesses for long-running agents describes the same pattern — every new agent session starts with no memory, so you need persistent structured context that agents can quickly load and reason about.
Both companies built these harnesses by hand for their own teams. Neither productized it.
This validates the problem from the two biggest AI companies — but it also reveals the gap. They're describing code-level harnesses. Neither addresses the product truth layer: the decisions, rules, and constraints that define what the product promises to do.
## Four layers of agent context
When you look at the tools available today, there are really four distinct layers of context that agents need:
| Layer | What it provides | Example tools |
|---|---|---|
| Code conventions | How to write code in this repo | AGENTS.md, .cursorrules |
| Session memory | What happened in prior sessions | Cline memory bank, claude-progress.txt |
| Implementation specs | What to build (feature level) | spec-kit, Kiro, SDD |
| Product truth | What the product promises — what's decided, what's uncertain, and why | Missing |
Each layer is useful. None replaces the others. But right now, most repos have the first three layers covered (or at least have tools for them), and the fourth layer is completely absent.
That's the layer where your product decisions live — and it's the layer agents violate most often.
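To make the layers concrete, here's one way they might sit side by side in a repo. The file names beyond AGENTS.md are illustrative — each tool has its own conventions:

```text
repo/
├── AGENTS.md               # layer 1: code conventions, build commands
├── .memory/progress.md     # layer 2: session memory (tool-specific)
├── specs/checkout.md       # layer 3: implementation spec for one feature
└── product.pbc.md          # layer 4: product truth — the missing piece
```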
## What product violations look like
When an agent has AGENTS.md but no product truth layer, it knows how to work but not what to protect. So it:
- Refactors billing logic and silently changes the grace period from 14 days to 30
- "Simplifies" an auth check that was handling a deliberate edge case
- Auto-fills a field that the product intentionally leaves empty for compliance reasons
- Removes a validation rule because it looks redundant — but it was load-bearing
- Infers behavior from code patterns and builds confidently on wrong assumptions
The agent isn't being careless. It's following the instructions it has. The problem is that nobody wrote down which behaviors are off-limits.
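Most of these violations would be blocked by a single written-down rule. For the grace-period example, a contract entry might look like this — the field names and layout are illustrative, not a fixed schema:

```markdown
## Behavior: Billing grace period

- **Status:** confirmed
- **Must:** expired subscriptions keep access for exactly 14 days
- **Must not:** change the grace period as a side effect of a refactor
```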
## Product truth as a behavioral contract
The fix isn't more documentation. It's a different kind of artifact — one that captures product-level decisions in a format both humans and agents can reason about.
That's what a Product Behavior Contract (PBC) is. It's Markdown-first, machine-readable, and designed to sit alongside your existing agent context:
- What must happen — required outcomes for each behavior
- What must not happen — forbidden actions and states
- Edge cases — the exceptions that look like bugs but are deliberate
- Trust levels — which behaviors are confirmed vs. provisional vs. still being explored
A PBC doesn't replace AGENTS.md or memory banks or feature specs. It fills the specific gap those tools leave open — the product truth layer.
When your agent can read a behavior contract that says "the 24-hour refund window is a confirmed product decision, not a temporary value," it stops treating that number as refactorable. When it sees "auth is intentionally minimal — permission model is still in exploration," it stops building assumptions on top of an undecided foundation.
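A sketch of what those two entries might look like in a contract file — again, the field names here are illustrative, not the exact schema from the spec:

```markdown
## Behavior: Refund window

- **Status:** confirmed (deliberate product decision)
- **Must:** refunds are accepted within 24 hours of purchase
- **Must not:** extend or shorten the window without a new product decision

## Behavior: Authentication

- **Status:** exploring — permission model undecided
- **Must:** email/password login only, intentionally minimal
- **Must not:** add roles, scopes, or permission checks yet
```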
Code tells you what is. Product truth tells you what should be.
The distinction matters because code is descriptive — it shows the current state. Product truth is prescriptive — it says what the product promises and where those promises are still uncertain.
An agent that only reads code will always assume the current implementation is intentional. An agent that also reads a product contract can distinguish between "this is a deliberate design decision" and "this is a temporary hack that should be replaced."
That's the difference between an agent that writes correct code and an agent that builds the right product.
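"Machine-readable" is what makes this enforceable rather than aspirational. A minimal sketch of how a harness could pull trust levels out of a contract file, assuming the illustrative `## Behavior:` / `**Status:**` layout above (the real spec's schema may differ):

```python
import re

def behavior_statuses(contract_md: str) -> dict[str, str]:
    """Map each '## Behavior:' section to its Status value.

    The section and field names are illustrative, not the
    exact schema from the PBC spec.
    """
    statuses = {}
    current = None
    for line in contract_md.splitlines():
        heading = re.match(r"##\s*Behavior:\s*(.+)", line)
        if heading:
            current = heading.group(1).strip()
            continue
        status = re.match(r"-\s*\*\*Status:\*\*\s*(\w+)", line)
        if status and current:
            statuses[current] = status.group(1).lower()
    return statuses

contract = """\
## Behavior: Refund window
- **Status:** confirmed
## Behavior: Authentication
- **Status:** exploring
"""

statuses = behavior_statuses(contract)

# A harness could warn before an agent builds on top of anything
# not yet marked as confirmed:
unstable = [name for name, s in statuses.items() if s != "confirmed"]
print(unstable)  # ['Authentication']
```

The point isn't the parser — it's that because the contract is plain Markdown with a predictable shape, this kind of mechanical check is a few lines, not a platform.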
The PBC spec is open source at github.com/stewie-sh/pbc-spec. You can browse example contracts in the PBC viewer.
If you're using AGENTS.md, memory banks, or spec-driven development and still finding that your agents miss the product layer — this is the artifact that's missing.