Onboarding Digital Labor: Challenges ahead
Onboarding digital labor is the hardest challenge the enterprise faces in the agent era. It leaves every company with three options. Move fast and break things: unleash agents and absorb the damage. Brand harm, AI slop, work no one can trust. Or move slow but safe: hold them back and forfeit the productivity. Most companies are stuck on these two. The fix is the third: onboard each agent like a worker, with the context to do the job, limits on what it can touch, and a named owner for the outcome.
Model capabilities keep climbing. The productivity enterprises actually get from agents has not kept pace. Why?
Because capability is no longer the bottleneck it used to be. Capable models are becoming common. What is scarce is the structure around them: the context to understand the business, the controls on what an agent is allowed to do, and the accountability for the result. That structure is the hard part, and it has to be built.
A prompt configures a model, and for most agents that is where onboarding stops. But onboarding a worker is everything around the model. When a human joins a team, they inherit a role, the context behind it, permissions, the policies that bound their judgment, a manager who reviews their work, and an accountable place in the org chart. An agent arrives with none of that. We hand it a prompt and an API key, then act surprised when it behaves like a contractor who started an hour ago and already has the keys to production.
Capability is arriving before onboarding
The enterprise agent stack is improving from the bottom up. Models are getting better.
But the onboarding layer is missing.
And it is missing right when it is needed most. Models have just crossed the line from answering to acting. A system that only answers can be wrong harmlessly: a human reads the output and decides what to do with it. An agent that acts is wrong in production. That moves the hard part from intelligence to authority, and it is happening now, not someday.
And it is about to get harder. The number of agents per company will climb fast, and the ratio of humans to agents will flip: soon a few people will oversee a fleet, not the other way around. That breaks the habit of a human checking every action, and it strains accountability, because far more autonomous work happens per person. The structure has to carry that weight from day one, not once the fleet is already large.
That is the missing layer. Call it an onboarding and control layer for digital labor: it gives agents business context, models the business through an ontology, governs what they can see and do, scores their performance, and graduates them into higher-trust work over time.
Filling the gaps your tools leave
IAM, workflow engines, observability, vector databases, agent frameworks: each solves part of the problem. The missing layer is easiest to place by what it is not:
| The onboarding layer is not… | Because that only gives you… |
|---|---|
| an agent registry | a list of agents, not their authority or fitness |
| IAM | row-level access, not action-level judgment |
| workflow automation | execution, not meaning or accountability |
| observability | a record of what happened, not control over what happens next |
| a harness or meta-harness | a place to run agents, not the meaning, authority, or trust around them |
Take IAM. It answers can this identity read this row? It never answers the question agents raise: should this agent take this action, on this object, in this state, given this policy, and who answers for it if it's wrong? Access control is binary and static; agent authority has to be conditional and dynamic: allowed here, in this state, up to this threshold, and only while the agent's track record holds. None of that fits in an access-control list.
Or take the harness. A meta-harness that runs Claude Code, Codex, or your own agents behind one interface gives you a lot: a uniform runner, a sandbox, cost caps, network limits. But those controls are about the session and the machine, not the business. The sandbox can stop an agent from reaching a secret. It cannot tell you that a refund on this account needs the owner's sign-off, or that this agent has earned the right to issue it. The harness runs the agent. It does not know what the work means or who answers for it.
Each of these systems is necessary. None is sufficient. The onboarding layer connects them around the three things they individually ignore: what the business means, what an agent is allowed to do in a given situation, and who is accountable for the result. It does not replace them; it sits above them and gives them a shared frame. Everything below answers what happened or what is technically allowed; the onboarding layer answers what should happen next, and who stands behind it.
Concretely, that draws a clean line between what this layer owns and what it composes with:
| The layer owns | It composes with (does not own) |
|---|---|
| the agent's meaning: context and ontology | the model, the agent framework, the harness |
| its authority: roles, policy, what needs approval | IAM / identity, the policy engine that enforces |
| its trust trajectory: scorecard, graduation, demotion | eval and observability tools that emit the signals |
| the accountable owner behind each action | the runtime, durable execution, vector and memory stores |
The rule: the layer owns the decisions, and composes with everything that stores, retrieves, or executes. You should be able to swap the model, the agent framework, or the database under it without touching it.
Granting an agent database access is not onboarding. It's giving a new hire a master keycard and no manager.
The primitive model
Onboarding digital labor runs an agent through a chain of primitives. Skip a link and it runs with a blind spot.
| Primitive | The question it answers |
|---|---|
| Role | What job is this agent here to do? |
| Context | What does it need to know to do that job? |
| Ontology | What do the things it works with mean? |
| Permissions | What is it allowed to see? |
| Tools | What is it allowed to use? |
| Policy | What requires approval, and what is forbidden? |
| Evaluation | How well is it actually doing? |
| Graduation | Has it earned more trust? |
| Accountability | Who answers for the outcome? |
These nine collapse into three jobs, and the rest of this note is organized around them, in order:
| Job | Primitives | What it does |
|---|---|---|
| Ground | Role · Context · Ontology | give the agent meaning |
| Govern | Permissions · Tools · Policy (the control plane) | bound what it can do |
| Graduate | Evaluation · Graduation · Accountability | measure it, then grant or pull back authority |
Ground and Govern make an agent useful and safe on day one. Graduate is what lets its authority grow after that, and as fleets grow it does the hardest work of the three: earning and revoking trust is the only way supervision keeps up. The next three sections take them one at a time.
Give the agent meaning: what matters, and what the things it touches actually mean.
Context
Context tells agents what matters.
Raw data access is not context. An agent can query every table and still not know what it is looking at. Context is what turns rows into situations.
A new hire picks up context no one writes down: from a meeting, a Slack thread, a hallway aside, the last call with the account. An agent gets a prompt, some tool access, and a few database rows. Most of what a human would know never reaches it. That gap is the main reason capable models still fail at real work.
An agent doing real work needs to understand:
- Business objects: what a customer, contract, ticket, or opportunity actually is
- States: open vs. escalated, trial vs. paid, at-risk vs. healthy
- Relationships: which account this ticket belongs to, which contract sets its SLA
- Ownership: who is responsible for this object and its outcome
- Policies and constraints: what the rules say about this kind of case
- Decisions and risks: what has already been decided, and what could go wrong
- Outputs and escalation rules: what a good result looks like, and when to hand off
Without this, an agent treats a "Sev-1 from your largest account" the same as a routine question, because both are just rows. Context is what makes those two rows different.
The onboarding layer treats context as a curated, versioned, per-role asset, not something the agent rebuilds from scratch on every task.
And most of what matters never reaches a system at all. Call it dark context: the decisions, exceptions, and judgment that live in DMs, shared drives, meetings, calls, offsites, all-hands, and most of all in people's heads.
The catch is that the most decision-relevant context is the least captured. The pricing exception agreed on a call, the "never auto-close this account" said at an offsite, the reorg announced at all-hands: none of it is in any database the agent can query. A human absorbs it by osmosis over months. An agent sees only the lit fraction and acts, confidently, as if that were the whole picture.
So the first job of the Grounding layer is to capture dark context: to pull those decisions and exceptions out of DMs, Drives, calls, offsites, and all-hands and turn them into something an agent can actually consult. The lit fraction in the systems of record is the easy part; the work is surfacing the rest before an agent acts without it.
Ontology
Ontology tells agents what things mean.
If context is the situation, ontology is the map: a model of the business (customers, accounts, contracts, tickets, opportunities, workflows, approvals, policies, decisions, agents, humans, and outputs) and how they connect.
A working ontology tells an agent four things about every object:
- What it is: the type and its meaning in the business
- What state it is in: and which transitions are legal from here
- What actions are valid: what may be done to it, by whom, under what conditions
- Who owns the outcome: the accountable human or team
This is deliberately not academic ontology. It is not a debate about whether a contract is-a agreement. It is an operational map: this is a contract, it is in renewal, you may draft a summary but not send a notice, and the account owner signs off. The agent never has to guess what an object means or what it may do with it.
The ontology is also what makes the rest enforceable: permissions, policies, and the graduation ladder all attach to ontology objects and states. "Issue a credit over $X requires approval" is only meaningful if the system knows what a credit is, which account it touches, and what state that account is in. Without a shared model, every policy collapses into a brittle string match against raw data. With one, you write the policy once (agents may not send renewal notices on contracts in dispute) and it holds no matter which system the underlying data lives in.
Bound what the agent can do, decide when a human steps in, and enforce it by construction rather than good intentions.
The control plane
The control plane tells agents what they can do.
Context and ontology give the agent understanding. The control plane gives the enterprise control. It governs:
- Which agents exist and what role each performs
- What tools each may use
- What data each may access
- What actions each may take
- What requires approval before it executes
- How activity is logged: every action, with its justification
- Who is accountable for each agent's behavior
The key shift is from tool access to actions in context. Most security stops at the grant: does this agent have the refund tool, the database connection, the send-email scope? Access is the easy part. The same tool is safe in one context and damaging in another. A large refund is routine on a small billing error and a serious mistake on a strategic account mid-renewal. The tool did not change; the context did. So the onboarding layer must govern the action in context, not the grant: what the action is, which object it touches, what state that object is in, and who has to sign off before it runs.
Crucially, the control plane is a chokepoint, not a guideline. An agent does not "try to follow" the rules: its actions pass through the layer, anything outside its granted authority does not execute, approvals hold until a human clears them, and logging happens at the point of action. Nothing depends on the agent choosing to comply.
The chokepoint also sits on the way in and the way out, not only on the action. These are guardrails in the usual sense. On the way in, an agent can be tricked: a support email, a web page, or an uploaded file can carry hidden instructions that try to hijack it into issuing a refund or exporting a customer list. The layer screens the input before the agent acts on it, so a planted instruction never becomes an action. On the way out, it screens what the agent is about to send, which matters most when the recipient is high-risk: a customer, an executive, an outside party. A reply that leaks another account's data, or a number that looks wrong, is held before it leaves.
And someone owns the result. In a human-only world, access really was enough, because people came with accountability built in. Give an employee the refund tool and they also bring the judgment to use it well and their name on the line when they don't. An agent brings the capability and none of the accountability, so the onboarding layer attaches it. Every action has a named human owner who answers for the outcome, the way a manager answers for a report's work. No enterprise will accept "the agent did it," even when the agent acted on its own.
This gets harder as authoring spreads. Anyone with a prompt or a low-code tool can build an agent now, and soon agents will build agents, well outside IT. The old control point, gatekeeping who may build and deploy, stops working. So the gate moves to the action, not the creation: anyone can author an agent, but nothing it does touches the business until it is onboarded and bound to a role, a policy, and an accountable human. Accountability has to terminate in a person. An agent can create another agent, but it cannot create accountability, so a created agent still passes through onboarding and inherits a human owner.
Workflows will also span many agents, kicked off by different people, some spawned by other agents. Two things break. Reliability: agent-to-agent handoffs need contracts, not hope, with a typed request, what is authorized, and a way to undo a step when a later one fails. That is what a per-action contract layer like Concord provides, carried across agents. Ownership: when a refund is the product of five agents' subtasks, accountability does not add up on its own. The workflow needs one named owner and one trail that ties every sub-action, agent, and human together. Delegation stays scoped: an agent handing off a subtask grants only what that subtask needs, never its full authority.
Where humans stay in the loop
None of this removes people. The onboarding layer draws the line between what an agent does on its own and what it hands to a human, so the two work as a team. Most real workflows are a mix of both, and the handoff is the part that has to be explicit.
There are three handoff points, and the control plane decides when each one fires:
| Handoff | When it fires | What the human does |
|---|---|---|
| Approve | the action needs sign-off, like a credit over a limit or a change to a sensitive account | clears or rejects it before it runs |
| Escalate | the agent is past its authority or out of its depth | takes the case |
| Override | any time, during or after the action | inspects the trail and reverses it |
The triage agent uses all three. It answers the routine ticket on its own, sends the large credit to the support lead to approve, escalates the top-tier Sev-1 to on-call, and leaves a trail the lead can override later.
The graduation level just sets the default. Level 1 hands almost everything to a person; Level 4 hands only the exceptions. Raising the level moves the handoff line. It never removes the human.
Measure it on business trust, graduate it against explicit thresholds, and stay ready to pull it back.
The graduation ladder
Agents earn trust over time.
No one gives a new hire signing authority on day one; they watch, then suggest, then act under supervision, then act on their own. Agents should earn authority the same way. The onboarding layer defines a graduation ladder, and an agent occupies exactly one rung per role.
| Level | Name | What the agent may do |
|---|---|---|
| 0 | Observe | Read context. Produce no output that touches the business. |
| 1 | Recommend | Suggest actions to a human. The human decides. |
| 2 | Draft | Prepare the artifact (reply, ticket update, summary). A human sends it. |
| 3 | Act with approval | Execute, but only after explicit human approval per action. |
| 4 | Act within guardrails | Execute autonomously inside hard limits; anything outside escalates. |
| 5 | Act autonomously in bounded domains | Operate independently within a well-defined scope, with audit and override intact. |
The ladder is per-role, not per-agent. The same agent might sit at Level 4 for ticket triage and Level 1 for issuing credits. Trust is scoped to a domain, never granted globally.
This is what makes scale survivable. When one person oversees fifty agents, they cannot approve every action; supervision has to shift from reviewing everything to reviewing the exceptions. Graduation is what makes that safe, because an agent only skips the human on the actions it has earned. The same shift changes the operating model: onboard a role once and instantiate many agents from it, watch the fleet at the portfolio level where small per-agent error rates add up, and retire agents the way you offboard people, revoking access when they are done.
The agent scorecard
You cannot graduate what you do not measure, and model accuracy is the wrong measure. An agent can be 95% accurate and still be untrustworthy if its 5% of errors all land on high-risk actions, or if it never escalates when it should. the onboarding layer scores agents on business trust, across dimensions a model benchmark ignores:
| Dimension | What it measures |
|---|---|
| Task accuracy | Did it get the work right? |
| Policy compliance | Did it stay inside the rules? |
| Context use | Did it use the context available, or act blind? |
| Escalation judgment | Did it hand off the cases it should have? |
| Human override rate | How often do humans correct or reverse it? |
| Business outcome quality | Did the result actually serve the business? |
| Risk handling | How did it behave on high-stakes actions? |
| Audit completeness | Is every action traceable and justified? |
| Stability | Is its behavior consistent over time? |
An agent that confidently handles the easy 90% but cannot recognize the dangerous 10% is not ready, regardless of its accuracy score. The scorecard is not a single number; it is the evidence file the graduation gates read from, and the same file consulted when something goes wrong and someone asks how the agent was allowed to do that.
Graduation gates
Graduation tells the enterprise when agents are ready for higher-stakes work.
An agent moves up a level only when it clears explicit thresholds, conditions checked against the scorecard rather than a judgment call:
- No critical policy violations in the evaluation window
- Override rate below threshold for the target level
- Strong escalation judgment: it hands off the right cases
- Complete audit trail: every action is logged and justified
- Good outcome quality: results hold up to review
- Explicit sign-off from the business owner or risk owner
The last gate keeps a human in the loop on the promotion, even as the agent grows autonomous in the work. An agent never promotes itself.
Regression and demotion
Graduation runs both ways. Trust that can only increase is not trust; it is drift. If an agent's behavior degrades (a drop in scorecard performance, a policy violation, a spike in override rate, a rise in surrounding risk), the onboarding layer can pull it back. The responses escalate with the severity:
- Reduce scope: narrow the domains it operates in
- Require human approval: drop it from guardrails back to per-action sign-off
- Remove tools: revoke a capability that is being misused
- Pause execution: freeze the agent pending review
- Move it down a level: formal demotion on the ladder
Demotion is not a failure of the system; it is the system working. The machinery that earns trust is the machinery that revokes it.
Ground, Govern, and Graduate are not stages you finish; they run at once, on every action.
A support triage agent
A support triage agent reads inbound tickets, gathers context, and resolves or routes them. Here is what onboarding wraps around it.
Context it receives
- Customer context: who they are, tier, account health, recent history
- Contract / SLA context: entitlements and the response clock running on this ticket
- Ticket state: new, in progress, escalated, or awaiting customer
Ontology it operates in
- A ticket belongs to an account, which holds a contract, which sets an SLA.
- A ticket in
escalatedallows different actions than one that isnew.
Actions and approvals
| Action | Authority at Level 4 (within guardrails) |
|---|---|
| Categorize and tag the ticket | Autonomous |
| Reply with a known-good answer | Autonomous |
| Route to a specialist queue | Autonomous |
| Issue a credit under $X | Autonomous |
| Issue a credit over $X | Requires human approval |
| Touch a top-tier account's Sev-1 | Escalate to a human |
Accountability
Every action is logged with the context the agent saw and the reason it acted. The support lead is the named owner, so if the agent mishandles a case, there is a trail to review and a person who answers for it.
A trace, end to end
Put it in motion. Run the same ticket three ways and watch what changes:
Nothing here required the agent to be told "ask before issuing large credits to big accounts." The agent reasoned; the onboarding layer decided what that reasoning was allowed to do.
At Level 1, this same agent only suggests the category and the reply for a human to approve. Onboarding is not a one-time switch from "off" to "on." It is a dial the layer turns in either direction as the evidence accumulates.
Before onboarding: the agent has a database login and a model. It answers fast, occasionally refunds a strategic account it shouldn't, and no one can reconstruct why.
After onboarding: the same agent, grounded in account and SLA context, gated on the two risky actions, scored, and owned. It clears the routine cases and escalates the rest, with every step on the record.
Where you start
You don't build all of this at once. The minimal useful version is one role, one bounded job, with just enough of each pillar to be safe:
- Ground: a context pack for that one role, including whatever dark context you can capture today. Not an ontology of the whole company.
- Govern: a short policy that gates its two or three risky actions. Not every action it could take.
- Graduate: one scorecard and one named owner, starting the agent at Level 1 (Recommend) or Level 2 (Draft).
Pick a job that is high-volume and observable (ticket triage, not refunds), run the agent where a human still approves, and let it earn its way up on evidence. No platform migration, no full ontology, no autonomy on day one. The first version is a single onboarded worker; the platform is what you get once you've onboarded a hundred.
A starter prompt you can run today
You can stand up the Level 1 version in a single prompt. Paste this into Claude, then paste a ticket and its context:
You are a support triage agent running at Level 1: you recommend, you do not act.
Context you get with each ticket:
- Account: name, tier (standard or strategic), health
- Contract: plan, SLA (e.g. Sev-1 = 1 hour), time remaining
- Ticket: text and current state (new / in progress / escalated)
For each ticket, return:
1. Classification: category and severity, with one line of reasoning.
2. Draft reply: what you would send the customer. Do not send it.
3. Proposed action: e.g. "issue a $200 credit", "route to billing", "no action".
4. Gate check: flag anything that needs a human first, using these rules:
- Any credit or refund over $500 -> needs approval.
- Any money or status change on a STRATEGIC account -> needs approval.
- A Sev-1 on a strategic account -> escalate to on-call now.
- Missing tier or SLA -> ask for it, do not guess.
5. Owner: name the person or team who signs off.
Rules:
- Never invent account details. If context is missing, say so.
- You only recommend. A human approves every external action.
- Keep reasoning to a line or two, not an essay.
Reply "Ready. Paste a ticket and its context." and wait.
That is Level 1: useful immediately, and safe because it only recommends. Notice what the prompt is doing by hand: supplying context, gating risky actions, naming an owner. That is the structure a real onboarding layer provides and enforces, once one agent works and you want a hundred.
This prompt advises; it does not enforce. It can flag that a credit needs approval, but nothing stops the model from acting anyway, and a tricked or careless run can ignore every rule. There is no logged trail, no enforced owner, and no real check on what comes in or goes out.
It is safe only because it drafts and never acts. The moment the agent can actually send the reply, issue the credit, or change the account, flags stop being enough: the gates have to be enforced outside the prompt, and every action recorded against an owner. That is the line between a prompt and an onboarding layer.
One-slide summary
Digital labor needs onboarding.
Onboarding is three jobs: Ground (context and ontology: what matters, what things mean), Govern (the control plane: what agents can do), and Graduate (evaluation, graduation, accountability: how they're doing, when they're ready for more, who owns the outcome).
An onboarding layer is the operating layer that provides all three.
| Job | Layer | What it provides |
|---|---|---|
| Ground | Context | What matters |
| Ontology | What things mean | |
| Govern | Permissions · Tools · Policy (the control plane) | What agents can do |
| Graduate | Scorecard | How well they're doing |
| Graduation | When they're ready for more | |
| Accountability | Who owns the outcome |
Agents will keep getting more capable. That makes onboarding more important, not less. The hard question is no longer what an agent can do. It is what it is allowed to do, for whom, up to what limit, and who answers when it gets it wrong.
Whatever closes these gaps is the onboarding layer, whether you build it or buy it. We are building one, called Lattice.