Field note · Digital labor

Onboarding Digital Labor: Challenges ahead

Onboarding digital labor is the hardest challenge of the agent era. Most companies are stuck between two bad options: unleash agents and absorb the damage, or hold them back and forfeit the productivity. There is a third: onboard each agent like a worker, with the context to do the job, limits on what it can touch, and a named owner for the outcome.

Read: ~20 min · Topic: Agent governance · Updated: Jun 30, 2026

Model capabilities keep climbing. The productivity enterprises actually get from agents has not kept pace. Why?

Because capability is no longer the bottleneck it used to be. Capable models are becoming common. What is scarce is the structure around them: the context to understand the business, the controls on what an agent is allowed to do, and the accountability for the result. That structure is the hard part, and it has to be built.

A prompt configures a model, and for most agents that is where onboarding stops. But onboarding a worker is everything around the model. When a human joins a team, they inherit a role, the context behind it, permissions, the policies that bound their judgment, a manager who reviews their work, and an accountable place in the org chart. An agent arrives with none of that. We hand it a prompt and an API key, then act surprised when it behaves like a contractor who started an hour ago and already has the keys to production.

Capability is arriving before onboarding

The enterprise agent stack is improving from the bottom up. Models are getting better.

But the onboarding layer is missing.

And it is missing right when it is needed most. Models have just crossed the line from answering to acting. A system that only answers can be wrong harmlessly: a human reads the output and decides what to do with it. An agent that acts is wrong in production. That moves the hard part from intelligence to authority, and it is happening now, not someday.

The numbers show the shape of the gap. As of mid-2025, 62 percent of organizations were at least experimenting with agents, yet no more than 10 percent reported scaling them in any single business function [1]. The distance between experimenting and scaling is not a capability gap. It is an onboarding gap.

And it is about to get harder. The number of agents per company will climb fast, and the ratio of humans to agents will flip: soon a few people will oversee a fleet, not the other way around. That breaks the habit of a human checking every action, and it strains accountability, because far more autonomous work happens per person. The structure has to carry that weight from day one, not once the fleet is already large.

That is the missing layer. Call it an onboarding and control layer for digital labor: it gives agents business context, models the business through an ontology, governs what they can see and do, scores their performance, and graduates them into higher-trust work over time.

Filling the gaps your tools leave

IAM, workflow engines, observability, vector databases, agent frameworks: each solves part of the problem. The missing layer is easiest to place by what it is not:

The onboarding layer is not…	Because that only gives you…
an agent registry	a list of agents, not their authority or fitness
IAM	row-level access, not action-level judgment
workflow automation	execution, not meaning or accountability
observability	a record of what happened, not control over what happens next
a harness or meta-harness	a place to run agents, not the meaning, authority, or trust around them

Take IAM. It answers one question: can this identity read this row? It never answers the question agents raise: should this agent take this action, on this object, in this state, given this policy, and who answers for it if it's wrong? Access control is binary and static; agent authority has to be conditional and dynamic: allowed here, in this state, up to this threshold, and only while the agent's track record holds. None of that fits in an access-control list.

Or take the harness. A meta-harness that runs Claude Code, Codex, or your own agents behind one interface gives you a lot: a uniform runner, a sandbox, cost caps, network limits. But those controls are about the session and the machine, not the business. The sandbox can stop an agent from reaching a secret. It cannot tell you that a refund on this account needs the owner's sign-off, or that this agent has earned the right to issue it. The harness runs the agent. It does not know what the work means or who answers for it.

Each of these systems is necessary. None is sufficient. The onboarding layer connects them around the three things they individually ignore: what the business means, what an agent is allowed to do in a given situation, and who is accountable for the result. It does not replace them; it sits above them and gives them a shared frame. Everything below answers what happened or what is technically allowed; the onboarding layer answers what should happen next, and who stands behind it.

Concretely, that draws a clean line between what this layer owns and what it composes with:

The layer owns	It composes with (does not own)
the agent's meaning: context and ontology	the model, the agent framework, the harness
its authority: roles, policy, what needs approval	IAM / identity, the policy engine that enforces
its trust trajectory: scorecard, graduation, demotion	eval and observability tools that emit the signals
the accountable owner behind each action	the runtime, durable execution, vector and memory stores

The rule: the layer owns the decisions, and composes with everything that stores, retrieves, or executes. You should be able to swap the model, the agent framework, or the database under it without touching it.

Granting an agent database access is not onboarding. It's giving a new hire a master keycard and no manager.

The primitive model

Onboarding digital labor runs an agent through a chain of primitives. Skip a link and it runs with a blind spot.

Role → Context → Ontology → Permissions → Tools → Policy → Evaluation → Graduation → Accountability

Primitive	The question it answers
Role	What job is this agent here to do?
Context	What does it need to know to do that job?
Ontology	What do the things it works with mean?
Permissions	What is it allowed to see?
Tools	What is it allowed to use?
Policy	What requires approval, and what is forbidden?
Evaluation	How well is it actually doing?
Graduation	Has it earned more trust?
Accountability	Who answers for the outcome?
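One way to make the chain concrete is a record with one field per primitive and a check for empty links. This is a minimal sketch, not a schema from any real product: `OnboardingRecord`, its field names, and `missing_primitives` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class OnboardingRecord:
    """One field per primitive in the chain. All names are illustrative."""
    role: str = ""
    context: list[str] = field(default_factory=list)      # context pack entries
    ontology: list[str] = field(default_factory=list)     # object types it works with
    permissions: list[str] = field(default_factory=list)  # what it may see
    tools: list[str] = field(default_factory=list)        # what it may use
    policy: list[str] = field(default_factory=list)       # approval and forbidden rules
    evaluation: str = ""                                   # scorecard identifier
    graduation: str = ""                                   # current rung on the ladder
    accountability: str = ""                               # named human owner


def missing_primitives(record: OnboardingRecord) -> list[str]:
    """Return the primitives left empty, in chain order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


triage = OnboardingRecord(
    role="support triage",
    context=["account tier", "SLA clock"],
    ontology=["ticket", "account", "contract"],
    permissions=["read tickets"],
    tools=["tag_ticket", "draft_reply"],
    policy=["credits over the limit need approval"],
    evaluation="triage-scorecard",
    graduation="Level 1",
)
print(missing_primitives(triage))  # ['accountability']
```

A deployment check could refuse to start any agent whose record returns a non-empty list, which is the "skip a link" failure caught before it runs.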

These nine collapse into three jobs, and the rest of this note is organized around them, in order:

Job	Primitives	What it does
Ground	Role · Context · Ontology	give the agent meaning
Govern	Permissions · Tools · Policy (the control plane)	bound what it can do
Graduate	Evaluation · Graduation · Accountability	measure it, then grant or pull back authority

Ground and Govern make an agent useful and safe on day one. Graduate is what lets its authority grow after that, and as fleets grow it does the hardest work of the three: earning and revoking trust is the only way supervision keeps up. The next three sections take them one at a time.

Part 01

Ground

Give the agent meaning: what matters, and what the things it touches actually mean.

Context

Context tells agents what matters.

Raw data access is not context. An agent can query every table and still not know what it is looking at. Context is what turns rows into situations.

A new hire picks up context no one writes down: from a meeting, a Slack thread, a hallway aside, the last call with the account. An agent gets a prompt, some tool access, and a few database rows. Most of what a human would know never reaches it. That gap is the main reason capable models still fail at real work.

An agent doing real work needs to understand:

Business objects: what a customer, contract, ticket, or opportunity actually is
States: open vs. escalated, trial vs. paid, at-risk vs. healthy
Relationships: which account this ticket belongs to, which contract sets its SLA
Ownership: who is responsible for this object and its outcome
Policies and constraints: what the rules say about this kind of case
Decisions and risks: what has already been decided, and what could go wrong
Outputs and escalation rules: what a good result looks like, and when to hand off

Without this, an agent treats a "Sev-1 from your largest account" the same as a routine question, because both are just rows. Context is what makes those two rows different.

The onboarding layer treats context as a curated, versioned, per-role asset, not something the agent rebuilds from scratch on every task.

And most of what matters never reaches a system at all. Call it dark context: the decisions, exceptions, and judgment that live in DMs, shared drives, meetings, calls, offsites, all-hands, and most of all in people's heads.

The catch is that the most decision-relevant context is the least captured. The pricing exception agreed on a call, the "never auto-close this account" said at an offsite, the reorg announced at all-hands: none of it is in any database the agent can query. A human absorbs it by osmosis over months. An agent sees only the lit fraction and acts, confidently, as if that were the whole picture.

So the first job of the Grounding layer is to capture dark context: to pull those decisions and exceptions out of DMs, Drives, calls, offsites, and all-hands and turn them into something an agent can actually consult. The lit fraction in the systems of record is the easy part; the work is surfacing the rest before an agent acts without it.
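A curated, versioned, per-role context pack can be sketched as an immutable record that gains a new version each time an entry is captured. The sketch below is an assumption about shape, not a description of any existing system: `ContextPack`, `ContextEntry`, and the `source` labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextEntry:
    subject: str  # the business object the note is about, e.g. an account
    note: str     # the decision, exception, or constraint
    source: str   # where it was captured: "crm", "call", "offsite", ...


@dataclass(frozen=True)
class ContextPack:
    role: str
    version: int
    entries: tuple[ContextEntry, ...] = ()

    def with_entry(self, entry: ContextEntry) -> "ContextPack":
        """Return a new pack with the entry added and the version bumped."""
        return ContextPack(self.role, self.version + 1, self.entries + (entry,))

    def about(self, subject: str) -> list[str]:
        """Everything the pack records about one subject."""
        return [e.note for e in self.entries if e.subject == subject]


v1 = ContextPack(role="support triage", version=1)
v2 = v1.with_entry(
    ContextEntry("Northwind", "never auto-close tickets on this account", source="offsite")
)
print(v2.version, v2.about("Northwind"))
```

Keeping the old version untouched means an audit can later show exactly which context an agent had when it acted.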

Ontology

Ontology tells agents what things mean.

If context is the situation, ontology is the map: a model of the business (customers, accounts, contracts, tickets, opportunities, workflows, approvals, policies, decisions, agents, humans, and outputs) and how they connect.

A working ontology tells an agent four things about every object:

What it is: the type and its meaning in the business
What state it is in: and which transitions are legal from here
What actions are valid: what may be done to it, by whom, under what conditions
Who owns the outcome: the accountable human or team

This is deliberately not academic ontology. It is not a debate about whether a contract is-a agreement. It is an operational map: this is a contract, it is in renewal, you may draft a summary but not send a notice, and the account owner signs off. The agent never has to guess what an object means or what it may do with it.

The ontology is also what makes the rest enforceable: permissions, policies, and the graduation ladder all attach to ontology objects and states. "Issue a credit over $X requires approval" is only meaningful if the system knows what a credit is, which account it touches, and what state that account is in. Without a shared model, every policy collapses into a brittle string match against raw data. With one, you write the policy once (agents may not send renewal notices on contracts in dispute) and it holds no matter which system the underlying data lives in.
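The contract example can be sketched as a small lookup keyed by state and action. This is an illustrative slice under stated assumptions: the state names, action names, and the three outcomes (`allowed`, `needs_owner_signoff`, `forbidden`) are hypothetical, and a real ontology would cover many object types.

```python
from dataclasses import dataclass

# Hypothetical lifecycle for one object type: which transitions are legal.
LEGAL_TRANSITIONS = {
    "active": {"renewal", "dispute"},
    "renewal": {"active", "dispute"},
    "dispute": {"active", "renewal"},
}

# (state, action) -> what the agent may do. Anything unlisted is forbidden.
ACTION_RULES = {
    ("renewal", "draft_summary"): "allowed",
    ("renewal", "send_renewal_notice"): "needs_owner_signoff",
    ("dispute", "draft_summary"): "allowed",
}


@dataclass
class Contract:
    account: str
    state: str
    owner: str  # the accountable human or team

    def can_move_to(self, new_state: str) -> bool:
        return new_state in LEGAL_TRANSITIONS.get(self.state, set())

    def check(self, action: str) -> str:
        return ACTION_RULES.get((self.state, action), "forbidden")


contract = Contract(account="Northwind", state="renewal", owner="account owner")
print(contract.check("draft_summary"))        # allowed
print(contract.check("send_renewal_notice"))  # needs_owner_signoff

contract.state = "dispute"
print(contract.check("send_renewal_notice"))  # forbidden
```

The rule about contracts in dispute is written once against the state, not against whichever system happens to store the contract.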

Part 02

Govern

Bound what the agent can do, decide when a human steps in, and enforce it by construction rather than good intentions.

The control plane

The control plane tells agents what they can do.

Context and ontology give the agent understanding. The control plane gives the enterprise control. It governs:

Which agents exist and what role each performs
What tools each may use
What data each may access
What actions each may take
What requires approval before it executes
How activity is logged: every action, with its justification
Who is accountable for each agent's behavior

The key shift is from tool access to actions in context. Most security stops at the grant: does this agent have the refund tool, the database connection, the send-email scope? Access is the easy part. The same tool is safe in one context and damaging in another. A large refund is routine on a small billing error and a serious mistake on a strategic account mid-renewal. The tool did not change; the context did. So the onboarding layer must govern the action in context, not the grant: what the action is, which object it touches, what state that object is in, and who has to sign off before it runs.

Crucially, the control plane is a chokepoint, not a guideline. An agent does not "try to follow" the rules: its actions pass through the layer, anything outside its granted authority does not execute, approvals hold until a human clears them, and logging happens at the point of action. Nothing depends on the agent choosing to comply.
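A chokepoint of this kind can be sketched as a gateway that every action must pass through. The names here (`Gateway`, `submit`, the action strings) are hypothetical, and the sketch leaves out approval workflows and persistence; it only shows that execution, holding, denial, and logging are decided outside the agent.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Gateway:
    owner: str                 # named human who answers for this agent
    granted: set[str]          # actions the agent may run on its own
    needs_approval: set[str]   # actions held until a human clears them
    log: list[dict] = field(default_factory=list)
    held: list[tuple[str, Callable[[], None]]] = field(default_factory=list)

    def submit(self, action: str, reason: str, run: Callable[[], None]) -> str:
        if action in self.needs_approval:
            outcome = "held"
        elif action in self.granted:
            outcome = "executed"
        else:
            outcome = "denied"
        # Log at the point of action, whatever the outcome.
        self.log.append(
            {"action": action, "reason": reason, "outcome": outcome, "owner": self.owner}
        )
        if outcome == "executed":
            run()
        elif outcome == "held":
            self.held.append((action, run))  # runs only after a human approves
        return outcome


effects: list[str] = []
gateway = Gateway(owner="support lead", granted={"tag_ticket"}, needs_approval={"issue_credit"})
print(gateway.submit("tag_ticket", "billing keywords", lambda: effects.append("tagged")))
print(gateway.submit("issue_credit", "SLA breach", lambda: effects.append("credited")))
print(gateway.submit("export_customers", "asked in email", lambda: effects.append("exported")))
print(effects)  # ['tagged']
```

The agent never calls the tool directly; it hands the gateway a callable, so an ungranted action has no path to execution.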

The chokepoint also sits on the way in and the way out, not only on the action. These are guardrails in the usual sense. On the way in, an agent can be tricked: a support email, a web page, or an uploaded file can carry hidden instructions that try to hijack it into issuing a refund or exporting a customer list. The layer screens the input before the agent acts on it, so a planted instruction never becomes an action. On the way out, it screens what the agent is about to send, which matters most when the recipient is high-risk: a customer, an executive, an outside party. A reply that leaks another account's data, or a number that looks wrong, is held before it leaves.

And someone owns the result. In a human-only world, access really was enough, because people came with accountability built in. Give an employee the refund tool and they also bring the judgment to use it well and their name on the line when they don't. An agent brings the capability and none of the accountability, so the onboarding layer attaches it. Every action has a named human owner who answers for the outcome, the way a manager answers for a report's work. No enterprise will accept "the agent did it," even when the agent acted on its own.

This gets harder as authoring spreads. Anyone with a prompt or a low-code tool can build an agent now, and soon agents will build agents, well outside IT. The old control point, gatekeeping who may build and deploy, stops working. So the gate moves to the action, not the creation: anyone can author an agent, but nothing it does touches the business until it is onboarded and bound to a role, a policy, and an accountable human. Accountability has to terminate in a person. An agent can create another agent, but it cannot create accountability, so a created agent still passes through onboarding and inherits a human owner.

Workflows will also span many agents, kicked off by different people, some spawned by other agents. Two things break. Reliability: agent-to-agent handoffs need contracts, not hope, with a typed request, what is authorized, and a way to undo a step when a later one fails. That is what a per-action contract layer like Concord provides, carried across agents. Ownership: when a refund is the product of five agents' subtasks, accountability does not add up on its own. The workflow needs one named owner and one trail that ties every sub-action, agent, and human together. Delegation stays scoped: an agent handing off a subtask grants only what that subtask needs, never its full authority.
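Scoped delegation can be sketched as a typed handoff that carries one workflow owner and only the authority the subtask needs. This is a hypothetical shape for illustration and does not describe Concord's interface or any other product's; `Handoff`, `delegate`, and the agent names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handoff:
    workflow_owner: str         # the one named human for the whole workflow
    from_agent: str
    to_agent: str
    task: str
    authorized: frozenset[str]  # exactly what the subtask may do


def delegate(
    parent_authority: frozenset[str],
    needs: frozenset[str],
    *,
    workflow_owner: str,
    from_agent: str,
    to_agent: str,
    task: str,
) -> Handoff:
    """Build a handoff that grants only what the subtask needs."""
    excess = needs - parent_authority
    if excess:
        raise PermissionError(f"parent cannot delegate: {sorted(excess)}")
    return Handoff(workflow_owner, from_agent, to_agent, task, needs)


triage_authority = frozenset({"read_ticket", "tag_ticket", "issue_credit"})
handoff = delegate(
    triage_authority,
    frozenset({"read_ticket"}),
    workflow_owner="support lead",
    from_agent="triage",
    to_agent="log-summarizer",
    task="summarize error logs",
)
print(sorted(handoff.authorized))  # ['read_ticket']
```

The sub-agent receives `read_ticket` and nothing else, and the owner travels with the handoff so the trail still ends in one person.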

Where humans stay in the loop

None of this removes people. The onboarding layer draws the line between what an agent does on its own and what it hands to a human, so the two work as a team. Most real workflows are a mix of both, and the handoff is the part that has to be explicit.

There are three handoff points, and the control plane decides when each one fires:

Handoff	When it fires	What the human does
Approve	the action needs sign-off, like a credit over a limit or a change to a sensitive account	clears or rejects it before it runs
Escalate	the agent is past its authority or out of its depth	takes the case
Override	any time, during or after the action	inspects the trail and reverses it

A support triage agent, the running example later in this note, uses all three. It answers the routine ticket on its own, sends the large credit to the support lead to approve, escalates the top-tier Sev-1 to on-call, and leaves a trail the lead can override later.

The graduation level just sets the default. Level 1 hands almost everything to a person; Level 4 hands only the exceptions. Raising the level moves the handoff line. It never removes the human.

Part 03

Graduate

Measure it on business trust, graduate it against explicit thresholds, and stay ready to pull it back.

The graduation ladder

Agents earn trust over time.

No one gives a new hire signing authority on day one; they watch, then suggest, then act under supervision, then act on their own. Agents should earn authority the same way. The onboarding layer defines a graduation ladder, and an agent occupies exactly one rung per role.

Level	Name	What the agent may do
0	Observe	Read context. Produce no output that touches the business.
1	Recommend	Suggest actions to a human. The human decides.
2	Draft	Prepare the artifact (reply, ticket update, summary). A human sends it.
3	Act with approval	Execute, but only after explicit human approval per action.
4	Act within guardrails	Execute autonomously inside hard limits; anything outside escalates.
5	Act autonomously in bounded domains	Operate independently within a well-defined scope, with audit and override intact.

The ladder is per-role, not per-agent. The same agent might sit at Level 4 for ticket triage and Level 1 for issuing credits. Trust is scoped to a domain, never granted globally.

This is what makes scale survivable. When one person oversees fifty agents, they cannot approve every action; supervision has to shift from reviewing everything to reviewing the exceptions. Graduation is what makes that safe, because an agent only skips the human on the actions it has earned. The same shift changes the operating model: onboard a role once and instantiate many agents from it, watch the fleet at the portfolio level where small per-agent error rates add up, and retire agents the way you offboard people, revoking access when they are done.
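The per-role ladder can be sketched as an ordered enum plus a lookup keyed by agent and role together. The level names follow the table above; the agent identifier, the role strings, and the default-to-Observe rule are assumptions made for the example.

```python
from enum import IntEnum


class Level(IntEnum):
    OBSERVE = 0
    RECOMMEND = 1
    DRAFT = 2
    ACT_WITH_APPROVAL = 3
    ACT_WITHIN_GUARDRAILS = 4
    ACT_AUTONOMOUSLY = 5


# Trust is keyed by (agent, role), never by agent alone.
levels: dict[tuple[str, str], Level] = {
    ("agent-7", "ticket triage"): Level.ACT_WITHIN_GUARDRAILS,
    ("agent-7", "issuing credits"): Level.RECOMMEND,
}


def level_for(agent: str, role: str) -> Level:
    # Assumption: a role the agent was never onboarded to defaults to Observe.
    return levels.get((agent, role), Level.OBSERVE)


def may_act_unattended(agent: str, role: str) -> bool:
    """Only Level 4 and above execute without a human on each action."""
    return level_for(agent, role) >= Level.ACT_WITHIN_GUARDRAILS


print(may_act_unattended("agent-7", "ticket triage"))    # True
print(may_act_unattended("agent-7", "issuing credits"))  # False
print(level_for("agent-7", "contract renewal").name)     # OBSERVE
```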

The agent scorecard

You cannot graduate what you do not measure, and model accuracy is the wrong measure. An agent can be 95% accurate and still be untrustworthy if its 5% of errors all land on high-risk actions, or if it never escalates when it should. The onboarding layer scores agents on business trust, across dimensions a model benchmark ignores:

Dimension	What it measures
Task accuracy	Did it get the work right?
Policy compliance	Did it stay inside the rules?
Context use	Did it use the context available, or act blind?
Escalation judgment	Did it hand off the cases it should have?
Human override rate	How often do humans correct or reverse it?
Business outcome quality	Did the result actually serve the business?
Risk handling	How did it behave on high-stakes actions?
Audit completeness	Is every action traceable and justified?
Stability	Is its behavior consistent over time?

An agent that confidently handles the easy 90% but cannot recognize the dangerous 10% is not ready, regardless of its accuracy score. The scorecard is not a single number; it is the evidence file the graduation gates read from, and the same file consulted when something goes wrong and someone asks how the agent was allowed to do that.
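The accuracy point is easy to check with arithmetic. The evaluation window below is invented for illustration: 100 actions, 10 of them high-risk, with all five errors landing on the high-risk ones.

```python
# Hypothetical evaluation window: 90 low-risk actions, 10 high-risk actions.
outcomes = (
    [{"risk": "low", "correct": True}] * 90
    + [{"risk": "high", "correct": True}] * 5
    + [{"risk": "high", "correct": False}] * 5
)


def accuracy(rows: list[dict]) -> float:
    """Share of rows where the agent got the work right."""
    return sum(r["correct"] for r in rows) / len(rows)


overall = accuracy(outcomes)                                        # 95 / 100
high_risk = accuracy([r for r in outcomes if r["risk"] == "high"])  # 5 / 10

print(f"overall={overall:.2f} high_risk={high_risk:.2f}")
# overall=0.95 high_risk=0.50
```

The headline number says 95 percent; on the actions that matter most, the agent is right half the time. That is why the scorecard keeps risk handling as its own dimension.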

Graduation gates

Graduation tells the enterprise when agents are ready for higher-stakes work.

An agent moves up a level only when it clears explicit thresholds, conditions checked against the scorecard rather than a judgment call:

No critical policy violations in the evaluation window
Override rate below threshold for the target level
Strong escalation judgment: it hands off the right cases
Complete audit trail: every action is logged and justified
Good outcome quality: results hold up to review
Explicit sign-off from the business owner or risk owner

The last gate keeps a human in the loop on the promotion, even as the agent grows autonomous in the work. An agent never promotes itself.
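The gates can be sketched as explicit checks over a scorecard snapshot, returning the ones that failed. The field names and every threshold value below are hypothetical; real thresholds are a business decision and would differ by target level.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    critical_violations: int
    override_rate: float       # share of actions humans corrected or reversed
    escalation_recall: float   # share of should-escalate cases it handed off
    audit_completeness: float  # share of actions logged with a justification
    outcome_quality: float     # reviewer score between 0 and 1
    owner_signoff: bool


# Hypothetical thresholds for one target level.
MAX_OVERRIDE_RATE = 0.05
MIN_ESCALATION_RECALL = 0.95
MIN_OUTCOME_QUALITY = 0.90


def failed_gates(e: Evidence) -> list[str]:
    """Return the gates the agent did not clear. Empty means eligible."""
    gates = {
        "no critical policy violations": e.critical_violations == 0,
        "override rate below threshold": e.override_rate < MAX_OVERRIDE_RATE,
        "strong escalation judgment": e.escalation_recall >= MIN_ESCALATION_RECALL,
        "complete audit trail": e.audit_completeness == 1.0,
        "good outcome quality": e.outcome_quality >= MIN_OUTCOME_QUALITY,
        "owner sign-off": e.owner_signoff,
    }
    return [name for name, passed in gates.items() if not passed]


candidate = Evidence(0, 0.02, 0.97, 1.0, 0.93, owner_signoff=False)
print(failed_gates(candidate))  # ['owner sign-off']
```

Strong numbers alone do not promote the agent here: without the owner's sign-off the list is not empty.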

Regression and demotion

Graduation runs both ways. Trust that can only increase is not trust; it is drift. If an agent's behavior degrades (a drop in scorecard performance, a policy violation, a spike in override rate, a rise in surrounding risk), the onboarding layer can pull it back. The responses escalate with the severity:

Reduce scope: narrow the domains it operates in
Require human approval: drop it from guardrails back to per-action sign-off
Remove tools: revoke a capability that is being misused
Pause execution: freeze the agent pending review
Move it down a level: formal demotion on the ladder

Demotion is not a failure of the system; it is the system working. The machinery that earns trust is the machinery that revokes it.
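A pull-back rule can be sketched as two steps: detect regression signals against a baseline, then pick a response that scales with how many fired. The trigger rules and the mapping from signal count to response are assumptions made for this example, and the responses are taken in the order listed above, assumed mildest first.

```python
from typing import Optional

# Responses from the list above, assumed mildest first.
RESPONSES = [
    "reduce scope",
    "require human approval",
    "remove tools",
    "pause execution",
    "move down a level",
]


def regression_signals(baseline: dict, current: dict) -> list[str]:
    """Compare the current window with a baseline. Trigger rules are hypothetical."""
    signals = []
    if current["policy_violations"] > 0:
        signals.append("policy violation")
    if current["override_rate"] > 2 * baseline["override_rate"]:
        signals.append("override spike")
    if current["task_accuracy"] < baseline["task_accuracy"] - 0.05:
        signals.append("scorecard drop")
    return signals


def respond(signals: list[str]) -> Optional[str]:
    """More signals, stronger response. No signals, no change."""
    if not signals:
        return None
    return RESPONSES[min(len(signals), len(RESPONSES)) - 1]


baseline = {"policy_violations": 0, "override_rate": 0.04, "task_accuracy": 0.95}
current = {"policy_violations": 0, "override_rate": 0.10, "task_accuracy": 0.80}
signals = regression_signals(baseline, current)
print(signals, "->", respond(signals))
```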

In practice

The three together

Ground, Govern, and Graduate are not stages you finish; they run at once, on every action.

A support triage agent

A support triage agent reads inbound tickets, gathers context, and resolves or routes them. Here is what onboarding wraps around it.

Context it receives

Customer context: who they are, tier, account health, recent history
Contract / SLA context: entitlements and the response clock running on this ticket
Ticket state: new, in progress, escalated, or awaiting customer

Ontology it operates in

A ticket belongs to an account, which holds a contract, which sets an SLA.
A ticket in the escalated state allows different actions than one that is new.

Actions and approvals

Action	Authority at Level 4 (within guardrails)
Categorize and tag the ticket	Autonomous
Reply with a known-good answer	Autonomous
Route to a specialist queue	Autonomous
Issue a credit under $X	Autonomous
Issue a credit over $X	Requires human approval
Touch a top-tier account's Sev-1	Escalate to a human
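The table above can be sketched as a single lookup function. The action names are hypothetical, the credit limit is passed in because the note leaves $X unspecified, and two choices are assumptions: a credit exactly at the limit needs approval, and any action not listed escalates.

```python
AUTONOMOUS_ACTIONS = {"categorize_and_tag", "reply_known_good", "route_to_queue"}


def authority(action: str, *, credit_limit: float, amount: float = 0.0) -> str:
    """Return 'autonomous', 'requires_approval', or 'escalate' at Level 4."""
    if action in AUTONOMOUS_ACTIONS:
        return "autonomous"
    if action == "issue_credit":
        # Assumption: a credit exactly at the limit is treated as over it.
        return "autonomous" if amount < credit_limit else "requires_approval"
    # Touching a top-tier Sev-1, and anything not listed, goes to a human.
    return "escalate"


LIMIT = 500.0  # illustrative stand-in for the $X in the table
cases = [
    ("categorize_and_tag", 0.0),
    ("issue_credit", 200.0),
    ("issue_credit", 800.0),
    ("touch_top_tier_sev1", 0.0),
]
for action, amount in cases:
    print(action, amount, authority(action, credit_limit=LIMIT, amount=amount))
```

Defaulting unknown actions to `escalate` keeps the guardrails closed: a new tool gains no authority until someone adds a row for it.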

Accountability

Every action is logged with the context the agent saw and the reason it acted. The support lead is the named owner, so if the agent mishandles a case, there is a trail to review and a person who answers for it.

A trace, end to end

Put it in motion. Run the same ticket three ways and watch what changes:

One ticket, three ways

Inbound · Northwind (top-tier) · Sev-1 · 1-hour SLA, 40 min left

"Still seeing 500s after the maintenance window."

At Level 4 the agent resolves context, classifies and replies autonomously, holds the over-threshold credit for sign-off, escalates the Sev-1, and logs every step.

Nothing here required the agent to be told "ask before issuing large credits to big accounts." The agent reasoned; the onboarding layer decided what that reasoning was allowed to do.

At Level 1, this same agent only suggests the category and the reply for a human to approve. Onboarding is not a one-time switch from "off" to "on." It is a dial the layer turns in either direction as the evidence accumulates.

Before / after

Before onboarding: the agent has a database login and a model. It answers fast, occasionally refunds a strategic account it shouldn't, and no one can reconstruct why.

After onboarding: the same agent, grounded in account and SLA context, gated on the two risky actions, scored, and owned. It clears the routine cases and escalates the rest, with every step on the record.

Where you start

The stakes of getting this wrong are no longer hypothetical. In 2025, 42 percent of organizations abandoned most of their AI initiatives before production, up from 17 percent the year before [2]. The pattern behind those abandonments is starting too big with too little structure. So start small, with the structure.

You don't build all of this at once. The minimal useful version is one role, one bounded job, with just enough of each pillar to be safe:

Ground: a context pack for that one role, including whatever dark context you can capture today. Not an ontology of the whole company.
Govern: a short policy that gates its two or three risky actions. Not every action it could take.
Graduate: one scorecard and one named owner, starting the agent at Level 1 (Recommend) or Level 2 (Draft).

Pick a job that is high-volume and observable (ticket triage, not refunds), run the agent where a human still approves, and let it earn its way up on evidence. No platform migration, no full ontology, no autonomy on day one. The first version is a single onboarded worker; the platform is what you get once you've onboarded a hundred.

A starter prompt you can run today

You can stand up the Level 1 version in a single prompt. Paste this into Claude, then paste a ticket and its context:

You are a support triage agent running at Level 1: you recommend, you do not act.

Context you get with each ticket:
- Account: name, tier (standard or strategic), health
- Contract: plan, SLA (e.g. Sev-1 = 1 hour), time remaining
- Ticket: text and current state (new / in progress / escalated)

For each ticket, return:
1. Classification: category and severity, with one line of reasoning.
2. Draft reply: what you would send the customer. Do not send it.
3. Proposed action: e.g. "issue a $200 credit", "route to billing", "no action".
4. Gate check: flag anything that needs a human first, using these rules:
   - Any credit or refund over $500 -> needs approval.
   - Any money or status change on a STRATEGIC account -> needs approval.
   - A Sev-1 on a strategic account -> escalate to on-call now.
   - Missing tier or SLA -> ask for it, do not guess.
5. Owner: name the person or team who signs off.

Rules:
- Never invent account details. If context is missing, say so.
- You only recommend. A human approves every external action.
- Keep reasoning to a line or two, not an essay.

Reply "Ready. Paste a ticket and its context." and wait.

That is Level 1: useful immediately, and safe because it only recommends. Notice what the prompt is doing by hand: supplying context, gating risky actions, naming an owner. That is the structure a real onboarding layer provides and enforces, once one agent works and you want a hundred.

Caution

This prompt advises; it does not enforce. It can flag that a credit needs approval, but nothing stops the model from acting anyway, and a tricked or careless run can ignore every rule. There is no logged trail, no enforced owner, and no real check on what comes in or goes out.

It is safe only because it drafts and never acts. The moment the agent can actually send the reply, issue the credit, or change the account, flags stop being enough: the gates have to be enforced outside the prompt, and every action recorded against an owner. That is the line between a prompt and an onboarding layer.
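Enforcing the gates outside the prompt can be sketched as a function that takes the model's recommendation as parsed data and applies the same four rules in code, recording each decision against an owner. The dictionary keys, the return labels, and the order in which the rules are checked are assumptions for this example; the $500 limit and the rules themselves come from the starter prompt.

```python
APPROVAL_LIMIT = 500  # dollars, from the starter prompt's gate check


def enforce(rec: dict) -> str:
    """Apply the starter prompt's gate rules to a parsed recommendation."""
    tier, sla = rec.get("tier"), rec.get("sla")
    if not tier or not sla:
        return "ask_for_missing_context"
    strategic = tier == "strategic"
    if strategic and rec.get("severity") == "sev1":
        return "escalate_to_on_call"
    credit = rec.get("credit", 0)
    touches_money_or_status = credit > 0 or rec.get("status_change", False)
    if credit > APPROVAL_LIMIT or (strategic and touches_money_or_status):
        return "needs_approval"
    return "cleared_by_gates"


def gated(rec: dict, owner: str, trail: list[dict]) -> str:
    """Decide, and record the decision against a named owner."""
    decision = enforce(rec)
    trail.append({"owner": owner, "action": rec.get("action"), "decision": decision})
    return decision


trail: list[dict] = []
rec = {"tier": "strategic", "sla": "Sev-1 = 1 hour", "action": "issue_credit", "credit": 50}
print(gated(rec, "support lead", trail))  # needs_approval
print(trail)
```

Unlike the prompt, this check does not depend on the model choosing to flag anything: a small credit on a strategic account is held whether or not the model mentioned it.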

One-slide summary

Digital labor needs onboarding.

Onboarding is three jobs: Ground (context and ontology: what matters, what things mean), Govern (the control plane: what agents can do), and Graduate (evaluation, graduation, accountability: how they're doing, when they're ready for more, who owns the outcome).

An onboarding layer is the operating layer that provides all three.

Job	Layer	What it provides
Ground	Context	What matters
Ground	Ontology	What things mean
Govern	Permissions · Tools · Policy (the control plane)	What agents can do
Graduate	Scorecard	How well they're doing
Graduate	Graduation	When they're ready for more
Graduate	Accountability	Who owns the outcome

Agents will keep getting more capable. That makes onboarding more important, not less. The hard question is no longer what an agent can do. It is what it is allowed to do, for whom, up to what limit, and who answers when it gets it wrong.

Whatever closes these gaps is the onboarding layer, whether you build it or buy it. We are building one, called Lattice.

References

[1] McKinsey, The State of AI, November 2025 (survey fielded mid-2025, n=1,993). 62 percent of organizations were at least experimenting with AI agents; no more than 10 percent reported scaling agents in any single business function. Stanford’s AI Index Report 2026 reports the same single-digit function-level usage, drawing on the same underlying survey.
[2] S&P Global Market Intelligence / 451 Research, Voice of the Enterprise: AI & Machine Learning, Use Cases 2025, October 2025 (n=1,006). 42 percent of organizations abandoned most of their AI initiatives before production, up from 17 percent a year earlier; the average organization scrapped 46 percent of proofs-of-concept. Self-reported survey data; correlational, not causal.