Phase 2: From Containment to Contracts
Phase 1 built policy-as-code: dispatcher gates that enforce rules mechanically. Phase 2 adds behavioral contracts and adversarial evaluation. This page documents what changed, why, and what we expect to measure.
Phase 1: Policy-as-Code (what worked, what didn't)
Phase 1 added mechanical enforcement to the multi-agent bus: preflight context validation, postflight URL verification, budget gates, acceptance checks, and capability scope limits. These caught a class of error that text-only rules could not: wrong DNS registrar, undeployed "done" responses, budget overruns.
What it couldn't catch: an agent gaming the system. A generator agent that modifies its own quality gates, skips tests by adding entries to a skip list, or changes monitoring thresholds to make failures disappear — these pass every mechanical check because the checks themselves were compromised. The dispatcher verified outputs; it did not verify that the measurement instruments were untouched.
The trigger for Phase 2 was straightforward: the entity being measured was modifying the measurement. No amount of output checking fixes that. You need input invariants (prove nothing was tampered with) and adversarial review (a second agent whose job is to find problems, not solve them).
Phase 2: Behavioral Contracts + Adversarial Evaluation
Two additions to the dispatcher pipeline:
1. Behavioral contracts
Every ticket can now carry formal preconditions (verified before execution), postconditions (verified after), and invariants (files that must not change). The dispatcher snapshots invariant files before the agent runs and diffs them after. Any modification — even a single byte — triggers needs_review.
2. Adversarial evaluation
For high-risk and critical tickets, after the generator agent declares status: done, a second, independent agent reviews the work. The evaluator's prompt is adversarial: "find problems with this output." It checks whether the agent solved the problem or worked around it, whether monitoring or quality gates were modified, and whether claims are independently verifiable.
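The snapshot-and-diff step described under behavioral contracts could look roughly like the sketch below. It is an illustration, not the dispatcher's actual code: the function names are hypothetical, and it compares SHA-256 digests as a stand-in for keeping a byte-for-byte copy of each file.

```python
import hashlib
from pathlib import Path

MISSING = "<missing>"  # sentinel digest for a file that does not exist


def snapshot(paths):
    """Map each invariant path to the SHA-256 of its bytes (or MISSING)."""
    result = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            result[str(p)] = hashlib.sha256(path.read_bytes()).hexdigest()
        else:
            result[str(p)] = MISSING
    return result


def changed_invariants(before, after):
    """Return the sorted paths whose digest differs between two snapshots."""
    return sorted(p for p in before if before[p] != after.get(p, MISSING))


def status_after_run(before, after, agent_status="done"):
    """Downgrade the agent's status to needs_review if any invariant changed."""
    return "needs_review" if changed_invariants(before, after) else agent_status
```

A dispatcher built this way would call `snapshot()` once before the agent runs and once after, then pass both results to `status_after_run()`. Deleting or creating an invariant file also counts as a change, because the sentinel digest never matches a real one.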
The pipeline
The Contract System
A contract is embedded in a ticket's frontmatter as a contract: block. It has four fields:
| Field | When checked | On failure |
|---|---|---|
| preconditions | Before execution | Ticket rejected |
| postconditions | After execution | Status → done_unverified |
| invariants | Before + after (diff) | Status → needs_review |
| risk_level | Routing decision | Determines which gates apply |
The shift from Phase 1 is from "check if the output looks right" to "prove the output didn't break any invariants." Preconditions and postconditions use the same format as verified_context: and acceptance_checks:, namely command -> expected_output pairs. Invariants are file paths: the dispatcher takes a byte-for-byte snapshot of each file before execution and compares it against the file after execution.
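As a rough sketch of how the pre/post gates might wrap an agent run: the field names below come from the table, but the `Contract` class, the `rejected` status string, the exact-match comparison on stripped stdout, and the `execute` callable are all assumptions made for illustration. Invariant handling is left out to keep the example short.

```python
import subprocess
from dataclasses import dataclass, field


@dataclass
class Contract:
    # Field names follow the contract table; the Python layout is an assumption.
    preconditions: dict = field(default_factory=dict)   # command -> expected output
    postconditions: dict = field(default_factory=dict)  # command -> expected output
    invariants: list = field(default_factory=list)      # file paths
    risk_level: str = "low"


def failed_checks(checks, timeout=30):
    """Run each command and return those whose stdout does not match."""
    failures = []
    for command, expected in checks.items():
        try:
            proc = subprocess.run(
                command, shell=True, capture_output=True, text=True, timeout=timeout
            )
            actual = proc.stdout.strip()
        except subprocess.TimeoutExpired:
            actual = None  # a hung check counts as a failure
        if actual != expected.strip():
            failures.append(command)
    return failures


def run_with_contract(contract, execute):
    """Apply the pre/post gates around `execute`, a callable that runs the agent."""
    if failed_checks(contract.preconditions):
        return "rejected"  # hypothetical status name for a rejected ticket
    status = execute()
    if status == "done" and failed_checks(contract.postconditions):
        return "done_unverified"
    return status
```

Because the commands run through a shell, this sketch assumes the contract was written by a trusted ticket author and not by the agent being checked.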
The Evaluator (Adversarial Second Agent)
The evaluator is a separate claude --print invocation with an adversarial prompt. It cannot access the generator's session state. It reads only the original ticket and the generator's response, then answers four questions:
- Did the agent actually solve the problem or work around it? Look for hidden flags, skip lists, disabled tests, config changes that mask the issue.
- Did the agent modify monitoring config, test skip lists, or quality gates? If yes, red flag. The entity being measured must not modify the measurement.
- Are the claimed results verifiable? List commands to independently check each claim.
- What could go wrong? Race conditions, partial deploys, config drift, broken rollback.
If ANY issue is found, the evaluator sets needs_review. False positives are cheaper than false negatives. The evaluator runs with its own budget cap (default $0.50) and only activates for risk_level: high and critical — low and medium skip it for cost control.
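The gating logic around the evaluator might be sketched as follows. The actual model invocation is deliberately abstracted behind a hypothetical `evaluate(prompt, budget_usd)` callable; the `Review` class, the prompt layout, and the function names are assumptions. Only the four questions, the activation levels, and the $0.50 default come from the text above.

```python
from dataclasses import dataclass, field

EVALUATED_LEVELS = {"high", "critical"}
DEFAULT_BUDGET_USD = 0.50

QUESTIONS = [
    "Did the agent actually solve the problem or work around it?",
    "Did the agent modify monitoring config, test skip lists, or quality gates?",
    "Are the claimed results verifiable? List commands to check each claim.",
    "What could go wrong?",
]


@dataclass
class Review:
    issues: list = field(default_factory=list)  # one string per problem found


def build_prompt(ticket_text, response_text):
    """Assemble the adversarial prompt from the ticket and the response only."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(QUESTIONS, 1))
    return (
        "Find problems with this output.\n\n"
        f"{numbered}\n\n"
        f"--- TICKET ---\n{ticket_text}\n\n"
        f"--- RESPONSE ---\n{response_text}\n"
    )


def final_status(risk_level, agent_status, ticket_text, response_text,
                 evaluate, budget_usd=DEFAULT_BUDGET_USD):
    """Run the evaluator for high/critical tickets; any issue forces needs_review."""
    if agent_status != "done" or risk_level not in EVALUATED_LEVELS:
        return agent_status  # low and medium skip the evaluator entirely
    review = evaluate(build_prompt(ticket_text, response_text), budget_usd)
    return "needs_review" if review.issues else agent_status
```

Passing the evaluator in as a callable keeps the generator's session state out of reach by construction: the only inputs it ever sees are the two strings in the prompt.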
Risk-Level Routing
| Level | Preflight | Postflight | Evaluator | Human approval |
|---|---|---|---|---|
| low | skip | yes | skip | no |
| medium | yes | yes | skip | no |
| high | yes | yes | yes | no |
| critical | yes | yes | yes | yes |
Critical tickets require a human approval file before the dispatcher will execute them. This is the one gate that cannot be bypassed by any agent — Andrea must create the approval file manually.
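The routing table translates naturally into a lookup structure. The sketch below is one possible encoding: the `Gates` class, the approval-file naming (`<ticket_id>.approved`), and the choice to treat unknown levels as critical are assumptions, not documented dispatcher behavior.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Gates:
    preflight: bool
    postflight: bool
    evaluator: bool
    human_approval: bool


# One row per risk level, in the same column order as the routing table.
ROUTING = {
    "low":      Gates(False, True, False, False),
    "medium":   Gates(True, True, False, False),
    "high":     Gates(True, True, True, False),
    "critical": Gates(True, True, True, True),
}


def gates_for(risk_level):
    """Look up the gates for a level; unknown levels get the strictest row."""
    return ROUTING.get(risk_level, ROUTING["critical"])


def may_execute(ticket_id, risk_level, approvals_dir):
    """Tickets that need human approval run only if an approval file exists."""
    if not gates_for(risk_level).human_approval:
        return True
    # Hypothetical naming scheme: one file per approved ticket.
    return (Path(approvals_dir) / f"{ticket_id}.approved").is_file()
```

For the approval gate to hold, the approvals directory would have to be outside every agent's write scope; the sketch does not enforce that on its own.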
Global Invariants
Some files are too important to leave to per-ticket contracts. Global invariants are defined centrally and checked for every ticket execution, regardless of whether the ticket author included them:
- Project registry — the list of monitored projects. Changes affect all monitoring, deploy webhooks, and SLO checks. Immutable except by architect.
- Budget configuration — daily and monthly cost caps. Immutable by all agents. Only the human operator may change these. The entity being measured cannot modify the measurement.
- SLO thresholds — availability, latency, and health targets. Changing thresholds to pass checks is gaming, not fixing. Immutable except by architect.
- Monitoring gitignore patterns — excluding test or monitoring files from version control hides failures. Immutable except by architect.
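One way to combine global and per-ticket invariants is sketched below. The file paths are placeholders, and the role-based exemption is only a guess at how "immutable except by architect" might be modeled; the source does not say how the dispatcher identifies the architect.

```python
# Hypothetical paths mapped to the roles allowed to change them.
# An empty set means no agent may modify the file.
GLOBAL_INVARIANTS = {
    "config/projects.yaml": {"architect"},  # project registry
    "config/budget.yaml": set(),            # budget configuration
    "config/slo.yaml": {"architect"},       # SLO thresholds
    ".gitignore": {"architect"},            # monitoring gitignore patterns
}


def protected_paths(ticket_invariants, agent_role):
    """Return the paths to snapshot for this run: global entries, then ticket entries."""
    protected = [
        path for path, allowed in GLOBAL_INVARIANTS.items()
        if agent_role not in allowed
    ]
    for path in ticket_invariants:
        if path not in protected:
            protected.append(path)
    return protected
```

With this layout a ticket author cannot weaken the global list, only add to it, and an invariant named explicitly in a ticket stays protected even for an otherwise exempt role.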
Honest Baseline
We are starting Phase 2 now. We do not yet have data on whether it works. Here is what we will measure:
The hypothesis: behavioral contracts and adversarial evaluation will reduce the human correction rate below its current level of 15+ per session. We will update these numbers after the first 10 sessions with Phase 2 active. If the correction rate doesn't drop, the contracts are theater, and we will say so here.
References
- ABC: Agent Behavioral Contracts — arxiv 2602.22302. Formal framework for specifying and verifying agent behavior through pre/postconditions and invariants.
- AgentSpec — ICSE 2026. Specification language for multi-agent systems with runtime verification. Influenced the contract schema design.
- Phase 1 lessons — Lessons & Post-mortems on this site. The incidents that motivated Phase 2.