Concord blog · Part 1 of 3

Why agents need a contract

Durable execution tells you what ran. It doesn't tell you what it meant. That gap is the problem Concord exists to close — and it's getting wider as agents start to initiate real work.


If you're already convinced

Skim ahead. Part 2 draws the layer boundaries (Concord vs DBOS vs LangGraph vs OPA). Part 3 walks a hotel booking through the contract end-to-end. Otherwise, start here.

A 3 a.m. story

It's 3:14 a.m. The on-call engineer is paged. A customer says they were charged twice for the same hotel reservation, and they can only see one of the two charges in their dashboard. The engineer opens their laptop.

They have a lot of evidence. Slack has the agent's reasoning trace. Postgres has two reservation rows. Stripe has two payment intents. The agent's memory store has a preferred_hotel write that doesn't seem related but might be. The OTel dashboard says the workflow completed successfully.

What they don't have is the answer to a basic question: which of those two reservations was authorized?

Each piece of evidence answers only its own question. Slack tells you what the agent thought it was doing. Postgres tells you what landed in the database. Stripe tells you what cleared the processor. None of them answers all of these:

Did the user approve both bookings, or just one?
Did the first attempt fail silently and a retry create the second?
Did a duplicate webhook fire from the hotel API?
Is the second charge an idempotent replay of the first, or is it a real duplicate?
Can the second one be compensated without undoing the first?

The data formats don't agree — different IDs, different timestamps, different shapes — so the engineer has to reconstruct the cross-cutting story by hand. By 4 a.m. they have a hypothesis. By 5 a.m. they've issued a refund out of caution. The actual root cause comes out three days later in a postmortem.

This isn't an unusual situation. It's what happens by default. Modern systems make execution reliable but leave the meaning of that execution to be reconstructed from logs.

Scattered evidence — see for yourself

The simulation below has an agent that fires four side effects: a Slack message, a Postgres write, a Stripe charge, and a memory write. Run it, investigate the failure, and toggle Concord on. What changes isn't what happened — it's whether the meaning survived.

[Interactive demo: Scattered evidence vs unified contract. The agent hotel_bot fires effects into four systems: Slack (external), Postgres (database), Stripe (payments), and vector_store (memory). A toggle switches Concord on or off.]

Questions at 3 a.m.

Who authorized this booking?
Which policy allowed it?
Is the retry idempotent with the original?
Can the duplicate be compensated?
Which artifact maps to which charge?

Switch Concord on and the four side effects still happen. The agent still sends Slack, Postgres still writes, Stripe still charges. The difference is that each one is also a row in domain_events, with a command_id tying them together and a purpose column saying what the row is for. The duplicate booking that caused the page never happened — policy caught it before the second effect fired.
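To make that concrete, here is a minimal sketch of what such rows could look like, using an in-memory SQLite table. Only the names domain_events, command_id, and purpose come from this post; the system column, the cmd_001 identifier, the purpose values, and the effects_for helper are assumptions made up for illustration, not Concord's actual schema.

```python
import sqlite3

# Hypothetical schema: only domain_events, command_id, and purpose
# are named in the post. Everything else is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE domain_events (
        event_id   INTEGER PRIMARY KEY,
        command_id TEXT NOT NULL,  -- ties every effect to one command
        system     TEXT NOT NULL,  -- where the effect landed (assumed column)
        purpose    TEXT NOT NULL   -- what the row is for
    )
    """
)

# The four side effects from the simulation, recorded under one command.
rows = [
    ("cmd_001", "slack", "notify_user"),
    ("cmd_001", "postgres", "record_reservation"),
    ("cmd_001", "stripe", "charge_payment"),
    ("cmd_001", "vector_store", "remember_preference"),
]
conn.executemany(
    "INSERT INTO domain_events (command_id, system, purpose) VALUES (?, ?, ?)",
    rows,
)


def effects_for(command_id: str) -> list[tuple[str, str]]:
    """Return (system, purpose) pairs for one command, in insertion order."""
    cur = conn.execute(
        "SELECT system, purpose FROM domain_events "
        "WHERE command_id = ? ORDER BY event_id",
        (command_id,),
    )
    return cur.fetchall()


effects = effects_for("cmd_001")
for system, purpose in effects:
    print(f"{system}: {purpose}")
```

With a shape like this, the 3 a.m. investigator runs one query by command_id instead of joining four systems by hand.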

Why now

This problem has been around as long as distributed systems. So why am I writing about it in 2026?

Because the trigger condition used to be a human. A human clicked a button. The system executed. The human's intent was implicit context — easy to recover by asking them. The system didn't have to record intent explicitly because of course Alex booked the hotel — Alex pressed the book button.

The system has to record intent explicitly now because the intent didn't come from a person. It came from an agent.

And the agent's reasoning is a transient prompt-and-completion exchange. There is no log to point to that says "this is what the agent intended." If you don't capture it at the moment it's proposed, it's gone. The model that produced the proposal might not even exist next quarter.

The mismatch is structural. Old runtimes assumed human ingress. New runtimes have to assume agent ingress. The contract that used to live in the human's head now has to live in a table.

And this is not a solved problem someone else is quietly shipping. As of late 2025 the published verdict, including from research groups inside the companies selling agents, is that meaningful oversight of agent actions, whether real-time or after the fact, remains an open problem [1]. Recent academic work agrees: today's systems lack shared accountability [2]. The contract layer is the missing piece both admissions point at.

The five questions

The mental model I keep coming back to is five questions. Every consequential action has to answer all five:

1. What is this action?
2. Who or what is allowed to do it?
3. What does it affect?
4. How should it execute?
5. What must be recorded?

Each question has a home in a mature stack:

What is this action — Concord. The command type, payload schema, declared effects.
Who is allowed — a policy engine (OPA, Cedar, or Concord's own policy bundles).
What it affects — Concord's effect, artifact, and memory primitives.
How it executes — a durable runtime. DBOS today; Temporal as an alternate adapter.
What gets recorded — Concord's domain_events (with OTel mirroring traces).

The novelty isn't any single answer. Workflows exist. Policies exist. Audit logs exist. The novelty is making all five share the same action contract, so that a deterministic vendor sync and an agentic hotel booking show up with the same shape — and a 3 a.m. investigator can answer all five from one place.
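One way to picture a shared action contract is a single record type whose fields line up with the five questions. This is a sketch under stated assumptions: the CommandContract class, its field names, the command type strings, the vendor_sync_v1 bundle, and the unanswered helper are all hypothetical. Only hotel_booking_v3, domain_events, and DBOS as the runtime appear in this post.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CommandContract:
    """Hypothetical contract shape: one field per question."""

    command_type: str                    # 1. What is this action?
    policy_bundle: str                   # 2. Who or what is allowed to do it?
    declared_effects: tuple[str, ...]    # 3. What does it affect?
    runtime: str                         # 4. How should it execute?
    audit_table: str = "domain_events"   # 5. What must be recorded?
    origin: str = "deterministic"        # "deterministic" or "agent" (assumed)


def unanswered(contract: CommandContract) -> list[str]:
    """List the fields left empty, i.e. the questions with no answer."""
    return [f.name for f in fields(contract) if not getattr(contract, f.name)]


# A deterministic vendor sync and an agentic hotel booking share one shape.
vendor_sync = CommandContract(
    command_type="vendor.sync",
    policy_bundle="vendor_sync_v1",
    declared_effects=("postgres.write",),
    runtime="dbos",
)
hotel_booking = CommandContract(
    command_type="hotel.book",
    policy_bundle="hotel_booking_v3",
    declared_effects=("slack.message", "postgres.write", "stripe.charge", "memory.write"),
    runtime="dbos",
    origin="agent",
)

# A contract with no policy bundle leaves question 2 unanswered.
incomplete = CommandContract("hotel.book", "", ("stripe.charge",), "dbos")
print(unanswered(incomplete))
```

The design choice worth noticing is that origin is just another field. The agentic command does not get a different type, only a different value.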

Three counterarguments I take seriously

I've made this case enough times to know which counterarguments come up. Here are the three I think most about.

"Can't OTel do this?"

OTel records what executed. It does not record authority or consequence. A trace says "this span ran for 312 ms and produced this output." It does not say "this action was authorized by Alex under policy bundle hotel_booking_v3 with an approval recorded at 03:14:03." Traces are a strict subset of audit, and the subset they cover is the part that was already easy.

This isn't an argument against OTel. I want Concord to emit OTel traces, and the two are complementary. It's an argument against using OTel as the system of record for governance decisions. Spans are forgetful by design: they are commonly sampled and aged out under retention limits.
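A rough sketch of how the two could complement each other: the audit record is the durable system of record, and the span carries only a pointer back to it. The AuditRecord class, the span_attributes helper, and the concord.command_id attribute key are assumptions for illustration. No real OTel API is used here, just plain Python data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical durable record of authority for one command."""

    command_id: str
    authorized_by: str
    policy_bundle: str
    approved_at: str


def span_attributes(record: AuditRecord, duration_ms: int) -> dict[str, object]:
    """Build the attributes a trace span might carry (assumed key names).

    The span describes execution and links to the audit row by command_id.
    The authority fields stay in the audit record, not in the span.
    """
    return {
        "concord.command_id": record.command_id,
        "duration_ms": duration_ms,
    }


record = AuditRecord(
    command_id="cmd_001",
    authorized_by="Alex",
    policy_bundle="hotel_booking_v3",
    approved_at="03:14:03",
)
attrs = span_attributes(record, duration_ms=312)
print(attrs)
```

If the span is sampled away or expires, the answer to "who authorized this?" is still in the audit record.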

"Can't the agent framework do this?"

Agent frameworks govern the agent's reasoning loop. They know which tool the agent chose and what arguments it produced. They generally don't know what happened after the tool call reached the connector — and they don't unify deterministic code paths and agentic code paths under one contract.

Concord is willing to govern a non-agentic command (a scheduled vendor sync) and an agentic one (a hotel booking) with the same primitive. Same status machine, same audit table, same compensation graph. That's the thing the agent framework was not designed to do.

"Isn't this CQRS / event sourcing?"

Close. I borrow from that lineage. Two differences worth naming:

The agent-proposal mode. CQRS commands typically come from human ingress and represent immediate intent. Concord commands often originate from an agent's proposal and pass through policy and approval before they become real. The "command" in CQRS is closer to Concord's command + approved + planned state, all collapsed into one moment. Concord pulls those moments apart.
The cross-cutting nature. CQRS is typically per-aggregate. Concord's command shape applies across deterministic functions, agentic loops, connector calls, and approvals. The shape is the contract.
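The first difference can be sketched as a small status machine in which proposal, approval, and planning are separate, checkable steps. This is an illustration, not Concord's actual status machine: the approved and planned states are named above, while proposed, rejected, the ALLOWED table, and the advance function are assumptions.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"   # assumed name for an agent's not-yet-real proposal
    APPROVED = "approved"
    PLANNED = "planned"
    REJECTED = "rejected"   # assumed terminal state for a policy denial


# Legal transitions. A proposal cannot jump straight to planned.
ALLOWED: dict[Status, set[Status]] = {
    Status.PROPOSED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.PLANNED},
    Status.PLANNED: set(),
    Status.REJECTED: set(),
}


def advance(current: Status, target: Status) -> Status:
    """Move to target if the transition is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


# An agent's proposal passes through each moment separately.
history = [Status.PROPOSED]
for step in (Status.APPROVED, Status.PLANNED):
    history.append(advance(history[-1], step))
print([s.value for s in history])
```

In a classic CQRS handler these three moments would typically be one function call. Keeping them apart gives policy and approval a place to say no.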

If you're already running event sourcing and your aggregates already authorize their own commands, you're more than halfway to Concord. The extra mile is the cross-cutting registry and the agent gateway.

Where I might be wrong

I'll close with three ways I know Concord could be the wrong choice, and one question I haven't settled.

This is overkill for many apps

True. If you don't have agents and you don't have multi-system side effects, you don't need this. Concord's break-even is around the point where you have a connector call, an approval, and an agent-or-scheduled trigger in the same workflow. Below that line, vanilla DBOS or a simple Postgres + cron is plenty.

The contract might calcify

Schemas drift. Tool catalogs change. Policies get rewritten. Concord versions all of these in principle, but I haven't shipped a migration story I'm proud of. If I get this wrong, the contract becomes the thing teams route around instead of through — and the whole point falls apart.

The ceremony tax

Audit tiers (Part 2 covers them) let trivial actions stay trivial. But the impulse to mark everything Tier 3 — "just to be safe" — is real and will show up in code review. Concord doesn't enforce restraint. The team has to.

What I haven't decided

Whether Concord should ship with an opinionated policy DSL or stay strictly bring-your-own-policy (BYO). There's a strong case for each: an opinionated DSL keeps small teams unstuck, while BYO keeps Concord adoptable in regulated industries. I lean BYO today, but I expect to be wrong about this at least once.

References

[1] Microsoft Research, New Future of Work Report 2025 (MSR-TR-2025-58), December 2025. Names meaningful human oversight of agents, real-time or post-hoc, as an open challenge, and the "goal-plan-execution gap" as a core sociotechnical problem.
[2] Workshop on Human-Agent Collaboration, CHI 2026, April 2026. Finds current systems "lack critical characteristics of effective collaboration, such as mutual awareness, mixed-initiative interactions, and shared accountability."

Coming up in Part 2

Contract, not runtime. The boundary between Concord, DBOS, LangGraph, OPA, OTel, and Postgres. Three small interactive demos: a layer stack, an audit-tier picker, and a clickable capability graph.