
Beyond CLAUDE.md and AGENTS.md: when your coding agent needs a behavior spec

CLAUDE.md and AGENTS.md tell agents how to work in your repo. They don't tell agents what your product promises. That's a different problem — and it needs a different artifact.


TL;DR: CLAUDE.md and AGENTS.md are excellent at steering how agents write code. They were never designed to capture what the product promises to do. When agents refactor, extend, or integrate code, they need a behavior spec — not more workflow instructions. That's the layer above instruction files.

Your agent refactors the billing module and changes the grace period from 14 days to 30. The code is cleaner. The tests pass. The product promise is broken.

Your agent simplifies the auth flow and removes an edge case check. It looked redundant in the code — but it was handling a deliberate compliance requirement that existed nowhere except in your head.

You had CLAUDE.md or AGENTS.md in the repo. You had coding conventions written down. The agent followed those perfectly — and still broke the product.

This isn't a model capability problem. The models are better than they've ever been. It's a context architecture problem — instruction files tell agents how to write code, but not what the product promises to do.

What CLAUDE.md and AGENTS.md actually are

These files work. They solve a real problem. But it's worth being precise about which problem.

CLAUDE.md tells Claude Code how to operate in your repo. Use pnpm, not npm. Run tests before committing. Don't modify generated files. Keep responses concise. It's a workflow configuration — process knowledge that changes when your conventions change.

AGENTS.md does the same for Codex and other agents: coding conventions, build commands, architecture patterns, file organization rules. OpenAI's AGENTS.md spec has been adopted across tens of thousands of repos because it solves the "how to work here" problem well.

Both are instruction files. They answer: "How should an agent behave in this repo?"

That's a useful question. But it's not the question that causes production incidents.

The question they don't answer

The question that causes production incidents is: "What does this product promise to do?"

When an agent refactors your billing module, it doesn't need to know whether to use pnpm or npm. It needs to know that the 14-day grace period is a confirmed product decision — not a magic number to clean up. It needs to know that the empty tax_id field is intentionally blank for compliance reasons — not a bug to fix. It needs to know that the auth flow is deliberately minimal because the permission model hasn't been decided yet — not because nobody got around to adding OAuth.

CLAUDE.md doesn't have this information. Neither does AGENTS.md. Not because they're badly designed — because they were designed for a different purpose.

Where instruction files hit their ceiling

The ceiling isn't one thing. It's a pattern that shows up in three ways:

1. No semantic structure. CLAUDE.md and AGENTS.md are freeform prose. A human reads them and infers what matters. An agent gets no explicit signal about which lines carry more weight. "Use TypeScript" and "never change the grace period without product owner approval" have the same format: a bullet point. One is a preference. The other is load-bearing.

2. No trust signal. Everything in an instruction file has the same status: written down. There's no way to distinguish a confirmed product decision from a provisional assumption from an active experiment. An agent treats them all as current truth — and they aren't.

3. No verification path. After an agent runs, there's no way to check whether it honored the product constraints. You can lint code style. You can run tests. But "did the agent preserve the billing contract?" requires a human to review the diff line by line and remember every product decision in their head.

These aren't flaws in CLAUDE.md or AGENTS.md. They're the natural limits of instruction files trying to carry behavior specs they were never built for.
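To make the first two limits concrete, here is a minimal sketch of what a typed rule with a trust signal could look like. The `Rule`, `Kind`, and `Status` names are hypothetical and chosen for illustration; they are not taken from the PBC spec or from any tool.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    PREFERENCE = "preference"  # style or workflow choice
    INVARIANT = "invariant"    # product promise that must not silently change


class Status(Enum):
    CONFIRMED = "confirmed"      # decided by the product owner
    PROVISIONAL = "provisional"  # current assumption, may change
    EXPERIMENT = "experiment"    # actively being tested


@dataclass(frozen=True)
class Rule:
    text: str
    kind: Kind
    status: Status


def load_bearing(rules: list[Rule]) -> list[Rule]:
    """Return only confirmed invariants: rules an agent must not change on its own."""
    return [
        r for r in rules
        if r.kind is Kind.INVARIANT and r.status is Status.CONFIRMED
    ]


# In a flat instruction file these three lines would all be plain bullets.
rules = [
    Rule("Use TypeScript", Kind.PREFERENCE, Status.CONFIRMED),
    Rule("Grace period is 14 days", Kind.INVARIANT, Status.CONFIRMED),
    Rule("Auth flow stays minimal until the permission model is decided",
         Kind.INVARIANT, Status.PROVISIONAL),
]

for rule in load_bearing(rules):
    print(rule.text)
```

Once kind and status are explicit fields rather than something a reader infers, separating a preference from a load-bearing promise becomes a filter instead of a judgment call.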

What sits above instruction files

The layer above instruction files is a behavior spec — an artifact that captures what the product promises to do, in a format that's both human-reviewable and machine-readable.

A behavior spec (.pbc.md — formally a Product Behavior Contract) sits in your repo alongside CLAUDE.md and AGENTS.md, but it answers different questions:

| | CLAUDE.md / AGENTS.md | .pbc.md |
| --- | --- | --- |
| Answers | "How should the agent work here?" | "What does the product promise?" |
| Contains | Conventions, commands, patterns | Behaviors, rules, states, edge cases |
| Changes | When your workflow changes | When a product decision changes |
| Audience | Agents + new contributors | Everyone: product owner, eng, QA, agents |
| Structure | Freeform prose | Markdown with typed semantic blocks |

Here's what the same knowledge looks like in each format:

In CLAUDE.md:

# Billing rules
- Grace period is 14 days
- Don't change billing logic without approval
- Tax ID is required for invoices

In a .pbc.md behavior spec:

## Grace period enforcement

### When
A subscription payment fails

### Then
- System enters a 14-day grace period
- User retains full access during grace period
- Daily retry attempts are made against the payment method
- On day 14, if no successful payment: downgrade to free tier

### Invariants
- Grace period duration must be exactly 14 days — not configurable per plan
- No data deletion occurs during grace period
- Grace period cannot be extended manually by support

### Edge cases
- If user upgrades plan during grace period: new payment attempt immediately
- If payment method is removed during grace period: grace period continues (retry stops)

The CLAUDE.md version tells an agent "don't touch this." The behavior spec tells the agent (and the product owner, and QA, and the next developer) exactly what the product promises — in enough detail to verify whether the promise is still being kept.
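As an illustration of "enough detail to verify", the grace-period spec above could be turned into executable checks along these lines. This is a sketch under stated assumptions: `Subscription`, `access_tier`, and `should_retry_today` are hypothetical stand-ins for your real billing code, and days are counted from 0 on the day the payment fails, so "day 14" means 14 full days have passed.

```python
from dataclasses import dataclass

# Invariant from the spec: exactly 14 days, not configurable per plan.
GRACE_PERIOD_DAYS = 14


@dataclass
class Subscription:
    """Hypothetical stand-in for a real subscription record."""
    days_since_failed_payment: int
    payment_recovered: bool = False
    payment_method_present: bool = True


def in_grace_period(sub: Subscription) -> bool:
    return (not sub.payment_recovered
            and sub.days_since_failed_payment < GRACE_PERIOD_DAYS)


def access_tier(sub: Subscription) -> str:
    """Tier the user should have, following the 'Then' block."""
    if sub.payment_recovered or in_grace_period(sub):
        return "paid"  # full access during the grace period
    return "free"      # day 14 with no successful payment: downgrade


def should_retry_today(sub: Subscription) -> bool:
    """Daily retries run only during grace and only if a payment method exists."""
    return in_grace_period(sub) and sub.payment_method_present


def check_grace_period_contract() -> None:
    # Invariant: duration is exactly 14 days.
    assert GRACE_PERIOD_DAYS == 14
    # Then: full access through day 13, downgrade on day 14.
    assert access_tier(Subscription(13)) == "paid"
    assert access_tier(Subscription(14)) == "free"
    # Edge case: payment method removed, grace continues but retries stop.
    removed = Subscription(5, payment_method_present=False)
    assert access_tier(removed) == "paid"
    assert not should_retry_today(removed)


check_grace_period_contract()
print("grace period contract holds")
```

If an agent changes the constant to 30 during a refactor, `check_grace_period_contract` fails on its first assertion, and the failure points at a named product decision rather than a magic number.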

The market knows something is missing

This isn't a theoretical gap. The pain is already showing up across the ecosystem.

The vocabulary is fragmenting — guardrails, governance, policies, behavior, intent — but the underlying need is converging: teams need a structured way to specify what agents are and aren't allowed to do, above the instruction file layer.

How to start

You don't need to spec your entire product on day one. Start with the module where an agent mistake would hurt most — usually billing, auth, or entitlements.

  1. Create billing.pbc.md in your repo
  2. Write the 3-5 behaviors that are non-negotiable (grace period, refund window, upgrade logic)
  3. For each behavior: what must happen, what must not happen, edge cases
  4. Point your CLAUDE.md or AGENTS.md at it: "Read *.pbc.md files before modifying any billing, auth, or entitlement logic"

That last step is the bridge — your existing instruction files become the pointer to the behavior spec. They work together, not against each other.
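The bridge in step 4 is easy to forget, so it may be worth checking mechanically. The sketch below assumes the instruction files live at the repo root and that any mention of `.pbc.md` counts as a pointer. The function names are hypothetical and not part of any existing tool.

```python
import tempfile
from pathlib import Path

INSTRUCTION_FILES = ("CLAUDE.md", "AGENTS.md")


def find_specs(repo: Path) -> list[Path]:
    """All behavior specs anywhere under the repo."""
    return sorted(repo.rglob("*.pbc.md"))


def instruction_files_point_to_specs(repo: Path) -> bool:
    """True if at least one root-level instruction file mentions .pbc.md."""
    for name in INSTRUCTION_FILES:
        path = repo / name
        if path.is_file() and ".pbc.md" in path.read_text(encoding="utf-8"):
            return True
    return False


def check_repo(repo: Path) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not find_specs(repo):
        problems.append("no *.pbc.md files found")
    elif not instruction_files_point_to_specs(repo):
        problems.append("CLAUDE.md / AGENTS.md never mention .pbc.md")
    return problems


# Demo on a throwaway directory rather than a real repo.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "billing.pbc.md").write_text(
        "## Grace period enforcement\n", encoding="utf-8")
    (repo / "CLAUDE.md").write_text("Use pnpm, not npm.\n", encoding="utf-8")
    print(check_repo(repo))  # spec exists, but nothing points at it

    (repo / "CLAUDE.md").write_text(
        "Read *.pbc.md files before modifying any billing logic.\n",
        encoding="utf-8")
    print(check_repo(repo))  # pointer added
```

A check like this could run in CI or a pre-commit hook. It only confirms that the pointer exists, not that an agent actually reads the spec.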

The stack, not the replacement

The right mental model isn't "PBC replaces CLAUDE.md." It's a stack:

Layer 4: Behavior specs (.pbc.md)                  ← product truth — what it promises
Layer 3: Feature specs / PRDs                      ← what we plan to build
Layer 2: Session memory / context                  ← what we're doing now
Layer 1: Instruction files (CLAUDE.md, AGENTS.md)  ← how to work here

Each layer is useful. None replaces the others. Most repos have layers 1-3 covered. Layer 4 is the one that prevents the production incident where an agent does exactly what it was told — and breaks a product promise nobody wrote down.


The PBC spec is open source at github.com/stewie-sh/pbc-spec. You can browse example contracts in the PBC viewer.

If your instruction files are working for code conventions but failing for product decisions — this is the layer that's missing.


Stewie reads your codebase and helps you author a living product behavior spec. We're onboarding a small group of product and engineering teams before public launch. Request early access →