15 Coordinating Multiple Agents
The bottleneck in parallel agent work is never how fast agents can generate code — it’s how fast you can review what they produce.
Chapter 14 established phase-level ownership: who owns requirements, infrastructure, security, documentation, and remediation when agents touch the whole delivery lifecycle. This chapter inherits that ownership discipline and narrows it: phase-level boundaries are necessary but not sufficient once multiple agents touch the same phase concurrently. The operational layer now becomes file/path ownership, parent-session gate ownership, and explicit handoff ownership before more than one agent edits against shared repository state.
Concrete failure case: one agent drafts the API contract for a billing-change requirement while another agent, in the same phase of implementation planning, edits the handler and tests from an older contract. Phase-level ownership says “product owns requirements” and “engineering owns implementation,” but concurrency breaks that boundary unless one parent session owns the gate and each worker owns a declared file/path slice. The move from phase ownership to file/path ownership is therefore not bureaucracy; it is what keeps two correct-looking agent outputs from becoming one incoherent change.
This is the parallel agent execution case that Chapter 1 distinguished from sequential tool switching. Sequential tool switching means choosing Cursor for one moment and Claude Code or Codex for another; parallel agent execution means concurrent execution whose outputs must be reviewed, merged, or rejected against shared repository state. The bottleneck in this chapter is therefore not portfolio choice across tools; it is review throughput once multiple workers are producing changes against the same system. The shift from conductor to orchestrator is real, but orchestration is not simply “run more agents.” The practitioners who get real leverage from parallel agents understand a fundamental asymmetry: agents can generate code at effectively unlimited speed, but your capacity to review, integrate, and verify that code remains stubbornly finite [272], [273]. Every multi-agent pattern in this chapter exists to manage that asymmetry.
The term multi-agent covers two very different things. Most of what this chapter describes is fan-out parallelism: independent agents working on separable subproblems, each in its own sandbox, with the human picking a winner or merging a branch. True orchestration is the thinner extension: agents sharing a goal through boundary contracts, ownership maps, handoffs, merge discipline, or stateful coordination. The early sections of this chapter show where fan-out pays off; the later ones show where coordination has to stop living in prose and become executable structure.
The ownership boundary from Chapter 14 becomes sharper once more than one agent is involved. The SDLC question was “who approves infrastructure changes, who owns the remediation queue?” The multi-agent version is “which parent session owns the gate, which worker owns each file boundary, and who merges the handoff?” If those ownership lines are implicit, fan-out turns one unclear responsibility into five unclear responsibilities. This chapter formalizes the branch-level and path-level boundary before discussing coordination mechanics.
The practical question is never “can I run more agents?” but “will running more agents produce work I can actually absorb?” Senior engineers with existing habits of managing parallel workstreams — handling interrupts, context-switching between reviews, triaging competing priorities — are naturals for this. Junior and mid-level engineers often struggle, not because the tools are harder, but because the cognitive load shifts from doing work to directing and reviewing work [273].
15.1 When Parallel Agents Pay Off
The single most useful framework for deciding when to parallelize is the read/write distinction. Read tasks — research, exploration, codebase analysis, library comparison — parallelize cleanly because they don’t modify shared state. Write tasks — code generation, file editing, refactoring — create coordination problems the moment two agents touch overlapping files [274]. When parallel agents make independent architectural choices without shared awareness — one picks a different API style, another assumes a different error pattern — integration produces conflicts that cost more to resolve than the parallelism saved [275]. Almost all of this is fan-out: you are running N agents independently and picking or merging results, not coordinating N agents toward a shared goal.
Research tasks are the clearest win. Need to evaluate whether a real-time sync library can work with your Python backend? Want to understand how your authentication system handles edge cases across twelve files? Exploring three competing libraries? Fire up parallel agents. Each operates in its own context window, reads what it needs, and returns a summary — no merge conflicts, no coordination overhead, and the results are immediately useful as context for your next decision [276]. Context isolation is the durable move: a subagent that returns a two-paragraph summary leaves your main loop sharp, where inlining the same read would degrade every subsequent turn with tool-result noise [274].
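That read-only fan-out can be sketched as plain concurrent dispatch. In this sketch, research() is a hypothetical stand-in for a real read-only agent call (for example, the SDK's query()); the point is the shape — independent prompts, no shared state, only summaries returned:

```python
import asyncio

# Stand-in for a real read-only agent call; in practice each worker
# runs in its own context window and returns only a short summary.
async def research(prompt: str) -> str:
    await asyncio.sleep(0)  # simulate I/O-bound agent latency
    return f"summary({prompt})"

async def fan_out(prompts: list[str]) -> list[str]:
    # Read tasks share no mutable state, so plain gather() is safe:
    # no merge conflicts are possible, only summaries come back.
    return await asyncio.gather(*(research(p) for p in prompts))

summaries = asyncio.run(fan_out([
    "Evaluate the real-time sync library against our Python backend",
    "Map authentication edge cases across the files under auth/",
    "Compare the three candidate libraries for migration cost",
]))
```

Each summary lands in the parent as a short final message, which is exactly the context-isolation payoff described above: the main loop stays sharp because the tool-result noise never leaves the workers.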
Proof-of-concept work parallelizes for the same reason — you’re not committing to any of the generated code, you’re answering questions. “Can this approach work?” is a read task disguised as a write task. Low-stakes maintenance — deprecation warnings, minor dependency bumps — also runs well in parallel because the blast radius of any single failure is small.
Fan-out isn’t only a local pattern. Cloud dispatch — one practitioner supervising five to ten cloud sessions, each in its own sandbox — is the same pattern at higher scale [277], [278]. Codex Cloud paired with a local CLI, and Jules-style cloud VMs running parallel sessions, are two concrete surfaces. The durable rule across all of them: parallel generation without parallel review is just a queue of half-inspected code. Reach for cloud multi-session when you have a backlog of independent tasks that don’t share files and a review pipeline that can honestly absorb five to ten PRs a day; skip it when the tasks share files, when the work is on the critical path, or when the reviewer can’t read those PRs carefully. Supervision shifts shape when sessions run asynchronously or in the background: without a foreground REPL to watch, you need explicit status checks, visible budget and turn counters, and a fast cancellation path so a runaway worker doesn’t burn quota before you notice.
You pay for this. Parallel agents earn their keep on research and exploration, where the speed advantage is dramatic and review cost is low; they become questionable for implementation tasks unless you have clear task boundaries and robust automated verification [273].
15.2 Role-Based Specialization
The mechanics of authoring a named subagent — a slug-resolved markdown file with a description, scoped tool list, and behavioral prompt — live in Chapter 4. This section is about what you do with those definitions once coordination is the goal: which dispatch patterns actually reduce the review bottleneck, and when read-only fan-out beats write-capable fan-out.
The most reliable multi-agent pattern isn’t running three generalist agents in parallel — it’s giving each agent a focused role with constrained context. Subagents solve context pollution: when a single agent handles research, implementation, and testing in one session, intermediate outputs from each phase contaminate the context for the next. Splitting these into isolated agents, each seeing only the context it needs, produces better results per phase [279], [280]. The tractability payoff is concrete: the parent receives only the subagent’s final message rather than its full tool trace, so noisy exploration stays inside the specialist instead of polluting the orchestrator’s context [274].
The Architect/Editor split is the simplest and most proven role-based pattern. One agent reasons about the problem; another translates that design into code changes. Aider showed this concretely: separating reasoning from code manipulation pushed benchmark performance to 85% — the highest on their code editing benchmark — because each model could specialize in what it does best [281]. The same principle applies across model tiers: lead on a powerful model for planning, teammates on a faster, cheaper model for execution [281]. Don’t split a 30-second task across two models and wait 90 seconds for round-trips — the split earns its keep when edit failures are dominating your runs, not when a single model would finish in one turn.
The immediately useful template is the three-reviewer fan-out. One subagent reviews the diff for security issues, one for missing tests, and one for maintainability or design drift. Each sees only the diff and the relevant rubric, not the whole repo conversation. The parent waits for all three final messages, then summarizes the overlaps and disagreements into one human-facing review note. Reasoning-based security scanning catches what pattern matchers miss — business-logic flaws, broken access control, subtle data-flow bugs — which is precisely the kind of work that benefits from a dedicated reviewer with a tight tool surface [270]. That pattern gives you specialization without shared-state complexity, and it is usually enough for the outer-loop review and integration tasks that Chapter 8 treats differently from inner-loop editing work.
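The dispatch shape of that fan-out can be sketched with a stub review_diff() standing in for one read-only subagent per rubric; the rubric names and the report format here are illustrative, not a specific tool's API:

```python
# Hypothetical stand-in for dispatching one read-only subagent
# (Read/Grep only) with a single rubric over the same diff.
def review_diff(diff: str, rubric: str) -> str:
    return f"[{rubric}] no blocking issues in {len(diff)} changed chars"

RUBRICS = ["security", "missing-tests", "maintainability"]

def three_reviewer_fanout(diff: str) -> str:
    # Each reviewer sees only the diff and its rubric, never the
    # parent's conversation; the parent sees only final messages.
    findings = [review_diff(diff, r) for r in RUBRICS]
    # Collapse three reports into one human-facing review note.
    return "\n".join(f"- {f}" for f in findings)

note = three_reviewer_fanout("--- a/app.py\n+++ b/app.py\n+print('hi')")
```

In a real pipeline the three calls would run concurrently, but the structure is the same: three bounded inputs, three final messages, one merged note for the human.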
The planner-generator-evaluator pattern takes this further. A planner decomposes the task, a generator produces the code, and an evaluator grades the output against explicit criteria before accepting it. Separating the agent doing work from the agent evaluating it dramatically improves output quality — generators consistently over-praise their own work, but external evaluators can be tuned to skepticism [282]. This directly addresses the review bottleneck: the evaluator catches the most obvious failures before they reach you. The same principle shows up in plan-review pipelines where one agent drafts the plan and a second, independently configured agent reviews it before any code is written, with two review rounds and explicit exit conditions outperforming five rounds of fuzzy agreement [283]. The operator caveat transfers: don’t hand an autonomous loop a problem you couldn’t specify crisply to a junior engineer.
Parallel read-only subagents are safe by construction: they cannot create merge conflicts or silently clobber shared state [275]. The moment two write-capable subagents can touch the same config file, migration, or shared module, coordination overhead rises with every shared file and can erase the speed gain. Use parallel write-capable subagents only when each one has a disjoint ownership boundary declared up front and automated conflict detection runs before any merge is attempted. That up-front ownership boundary is not the same thing as the runtime permission contract in Chapter 16: ownership says which agent owns which files and decisions; permissions decide which tools and mutations are allowed inside those declared boundaries. Five-role persona stacks on a one-hour bugfix are ceremony without payoff; save role specialization for work where planning, review, and QA phases actually differ. Where those stacks do pay off — full role-sequence frameworks — the mechanics live in Chapter 9, not here.
15.3 Coordination Patterns
Every multi-agent system needs a coordination mechanism, and the practitioner consensus reduces to a short ladder: isolate the workspace, choose the session model, define the handoff artifact, gate the approval at the parent, and escalate to a runtime only when those four are no longer enough. Walk it in that order.
1. Workspace isolation comes first. Each agent operates in its own branch or worktree, produces commits as output, and the human orchestrator merges [277], [272]. The simplest setup is dual clones — two separate checkouts, each with its own agent session. Git worktrees are the more scalable variant, and several tools now support them natively. Once you’re running more than two or three parallel agents, CLI ceremony starts to leak: you lose which pane holds which branch, cancelling one agent means hunting for its terminal, and comparing diffs turns into git gymnastics. A manager UI replaces that ceremony with a visible surface for spawning, observing, comparing, and cancelling agents, with workspace-level isolation still enforced underneath. Claude Squad-style terminal multiplexers and the parallel-agents UIs now shipping in IDE-class tools are two concrete instantiations [284]. Don’t reach for a manager surface if a CLI-plus-worktrees setup already works for two or three agents; it earns its keep when the tmux-pane count stops scaling and when non-CLI teammates want to dispatch and compare runs without writing a worktree script.
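The worktree ceremony a manager UI replaces can be sketched as a small dry-run launcher; the branch and path naming scheme here is hypothetical, and a real launch script would execute these commands from the repository root:

```python
import shlex

# Dry-run sketch: one branch and one isolated checkout per agent.
# The human merges the resulting commits; the agents never merge.
def worktree_commands(agents: list[str], base: str = "main") -> list[str]:
    cmds = []
    for name in agents:
        branch = f"agent/{name}"   # hypothetical naming convention
        path = f"../wt-{name}"     # checkout outside the main tree
        cmds.append(
            f"git worktree add -b {shlex.quote(branch)} "
            f"{shlex.quote(path)} {base}"
        )
    return cmds

for cmd in worktree_commands(["security-review", "test-gaps"]):
    print(cmd)
```

At two or three agents this script is all the ceremony you need; the manager UI earns its keep when the list of names stops fitting in your head.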
2. Choose the session model next. Session isolation is the multi-agent analog of thread safety. If each worker is handling an independent subtask, default to a stateless SDK entry point such as query() so no worker inherits another worker’s context by accident. Use a stateful client such as ClaudeSDKClient only when the orchestrator itself needs to accumulate context across pipeline steps and hand the result into the next turn. Shared mutable state in one long-lived session requires coordination; isolated stateless workers are safe by construction. When a worker hits max_turns or a spend cap mid-pipeline, resume or fork the session deliberately rather than letting the parent silently absorb its half-finished context.
# Stateless worker: bounded review task, no history to preserve
review = query(
    "Review this diff for missing tests only.",
    allowedTools=["Read", "Grep"],
)

# Stateful orchestrator: multi-step run with resumable session state
client = ClaudeSDKClient()
session = client.session()
result = session.query("Plan the migration, then supervise implementation.")
session_id = result.session_id

# Resume the pipeline after a turn or budget stop
resumed_client = ClaudeSDKClient(resume=session_id)
resumed_session = resumed_client.session()
resumed_result = resumed_session.query(
    "Continue from the last checkpoint and summarize unfinished work."
)

# Fork before a risky branch so the original checkpoint survives intact
fork_client = ClaudeSDKClient(resume=session_id, fork_session=True)
fork_session = fork_client.session()
fork_result = fork_session.query(
    "Try the schema-change branch and stop if tests fail."
)

The transferable rule: a stateless worker is the multi-agent analog of a pure function — bounded inputs, bounded outputs, no shared state to debug — while a stateful orchestrator is the analog of a long-lived service that must explicitly checkpoint, resume, and fork. Choose by whether the next step actually needs context the previous one produced. If the stateless worker is enough, prefer it. If the orchestrator must carry context across planning, delegation, and integration, use the stateful client and capture the session identifier immediately so a turn or budget stop becomes a resume rather than a restart. Treat startup latency as a separate concern: if subagent spawn time is on the critical path in a TypeScript orchestrator, pre-warm the subprocess pool with startup() before dispatch.
3. Define the handoff artifact. Each agent hands back a branch or commit plus a short summary of what changed, what checks ran, and what still looks risky. Shared filesystem access only matters insofar as every agent can inspect the same repository state; the real handoff should still be a reviewable diff, not chat prose [272]. For dependency-aware coordination, keep shared state minimal: a shared task list plus an inter-agent messaging tool, with task files on disk and a single send-message channel, lets agents self-claim tasks from a queue and execute in dependency-aware waves without piling up shared memory beyond the queue itself. At higher scale, add a lightweight git-level synchronization step so two agents cannot claim the same problem simultaneously [285]. Goal-based, idempotent steps — each producing a new artifact rather than mutating an input — make the whole pipeline retryable from the last good commit instead of restartable only from scratch [286].
4. Place the approval gate at the parent, not the children. The full taxonomy of approval modes, pause-point design, and unattended-launch recipes lives in Chapter 16; the multi-agent-specific rule is structural. In a fan-out pipeline the gate sits on the parent’s canUseTool / can_use_tool callback, because that is the only place where every child’s tool call is actually evaluated — children inherit the parent’s posture and cannot narrow it per child. In a Terraform provider-version migration, for example, three subagents under one parent inventory modules, diff provider changes, and draft a PR sequence [261], [265]; the parent callback auto-approves Read, Grep, and terraform plan while pausing on terraform apply, IAM touches, secret reads, or any plan that destroys a stateful resource. Encode anything that must stay blocked even under a bypass posture as a parent-level deny entry; deny rules survive bypass while allow lists do not.
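The Terraform example's parent gate can be sketched as a plain decision function. The tool names and the allow/ask/deny return values are illustrative rather than a specific SDK's callback signature, but the placement is the point: this logic lives on the parent, and every child's tool call flows through it:

```python
# Parent-level gate for the provider-migration fan-out.
# DENY models parent-level deny rules that survive even a bypass
# posture; AUTO_ALLOW keeps read-only work from stalling on prompts.
DENY = {"terraform destroy"}
AUTO_ALLOW = {"Read", "Grep", "terraform plan"}

def can_use_tool(tool: str, args: dict) -> str:
    if tool in DENY:
        return "deny"    # blocked even under bypass
    if tool in AUTO_ALLOW:
        return "allow"   # safe reads and plans flow through
    # Everything mutating (apply, IAM, secret reads) pauses for a human.
    return "ask"
```

Because children inherit the parent's posture, this one function is the whole policy surface for the fan-out; no child can widen it, and none needs its own copy.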
5. Escalate to runtime only when markdown cannot carry the load. Every rung above can live in a spec doc, a PR template, and an SDK script as long as a human is in the loop. Move beyond that only when you need runtime-enforced ownership, crash-resumable handoffs, or service-style execution — that is the threshold to reach for a graph runtime for explicit state transitions between nodes, or a persistent session store on top of the stateful client for resumable sessions and approval callbacks in code. Versioned structured state (a database that supports diffs and rollbacks, not just a JSON file in Git) becomes the right move once concurrent agents actually need to share writes; below that scale, branches and commits are still the simpler primitive [285], [287].
15.4 The Boundary Between Orchestration and Autonomy
Orchestration ends where agents continue working without waiting for human review. If each agent returns a branch or commit for a person to inspect before merge, you are still orchestrating. Once agents are allowed to retry, re-plan, or merge based on automated checks alone, you have crossed into autonomous execution [288]. Two agents each returning a branch for you to review is orchestration; an agent that keeps retrying after tests fail until an evaluator clears it is autonomy.
The consequence unique to multi-agent pipelines is permission inheritance: a child cannot be safer than the parent that spawned it. Launch a parent in bypassPermissions so an overnight refactor doesn’t stall on prompts, fan out a security-reviewer subagent whose definition declares allowedTools: ["Read", "Grep"], and the child still inherits the parent’s bypass — its allowedTools narrows what the model is encouraged to use, not what the runtime will let it use [279]. Any audit of “what can these agents touch?” starts with the parent’s permission mode, not the children’s tool lists. The full container, autonomy-mode, deny-list, and safe-default unattended-launch policy lives in Chapter 16; this chapter’s contribution is just the inheritance rule and the placement of the gate.
15.5 Multi-Agent Coordination Artifacts
Parallel agents introduce coordination problems that no single-agent workflow encounters, and single-agent context engineering (Chapter 5, Chapter 3) is necessary but insufficient here. The artifacts that matter are the ones a human can inspect quickly before merging. A workable set for two agents on one feature looks like this.
Boundary contract. Write down the shared interface before either agent starts. Example: Agent A owns the backend invite-redemption endpoint, Agent B owns the signup form, and both agree that POST /api/invites/redeem returns 201 {userId, teamId}, 409 {code: "invite_used"}, or 422 {fieldErrors: ...}. That contract keeps the agents from inventing incompatible payloads or error semantics in parallel [278].
Ownership map. Declare which files each agent may edit. In the same feature, Agent A might own server/invites/*, the migration, and API tests; Agent B might own web/signup/*, the form components, and browser tests. Shared files such as an OpenAPI schema stay human-owned unless you explicitly reopen the contract [275], [279].
Handoff summary. Each agent should return the same package: branch name or commit hash, files changed, checks run, known limitations, and any contract changes that would force the other agent to rebase. Once handoffs have that structure, integration becomes mechanical instead of interpretive [272].
Merge gate. Merge only when the contract still matches, the declared checks pass, and no undeclared files were touched. Frequent integration catches contract drift early; infrequent integration gives agents more room to diverge before the human sees the conflict [275].
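The "no undeclared files" precondition is mechanically checkable before any human reads the diff. A sketch, assuming the changed-file list comes from git diff --name-only on the agent's branch and the ownership map uses glob patterns; the paths mirror the invite-feature example above:

```python
import fnmatch

# Ownership map from the invite feature: patterns are globs over
# repository-relative paths. Shared files stay out of both lists.
OWNERSHIP = {
    "agent-a": ["server/invites/*", "migrations/*", "tests/api/*"],
    "agent-b": ["web/signup/*", "tests/browser/*"],
}

def undeclared(agent: str, changed_files: list[str]) -> list[str]:
    # Any changed file matching no declared pattern blocks the merge.
    patterns = OWNERSHIP[agent]
    return [f for f in changed_files
            if not any(fnmatch.fnmatch(f, p) for p in patterns)]

# Agent A edited the shared OpenAPI schema without reopening the contract:
violations = undeclared("agent-a",
                        ["server/invites/redeem.py", "openapi.yaml"])
```

A non-empty result means the contract was silently reopened; the gate fails closed and the human decides whether to amend the ownership map or reject the branch.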
Filenames don’t matter. What matters is that every agent gets the same four artifacts — boundary contract, ownership map, standard handoff summary, explicit merge gate. Those artifacts are what turn parallel generation into coordinated engineering.
15.6 Parallelization Checklist
Treat the substance arguments earlier in the chapter as the why; this checklist is the launch routine you run before you spin agents up.
Agent count and brief:
- 2 agents is the comfortable default for supervised work; 3-5 is the upper band where explicit boundaries and pre-decomposed tasks become mandatory; beyond that, treat it as escalation, not flex — the review queue erases the speed gain unless an evaluator agent pre-filters [282]
- Each agent gets a focused brief naming the files it owns, the interfaces it must conform to, and its acceptance criteria; shared rules and contracts live in the rules file, not the brief
- Excluded by construction: other agents’ full context, intermediate outputs, anything outside the agent’s declared scope

Isolation checks before launch:
- Each agent on its own branch or worktree
- Scoped credentials per agent — short-lived tokens limited to that agent’s repo or service task, not your personal all-access key
- Parent permission mode audited and committed in the launch script — children inherit it, so the parent boundary is the real control; container isolation, autonomy-mode selection, and deny-list policy come from Chapter 16

Merge preconditions:
- Automated tests pass on the agent’s branch
- No conflicts with other agents’ branches
- A human reviewed the diff, or an evaluator agent pre-screened it with explicit pass/fail criteria
- No files outside the agent’s declared ownership were modified
The single rule the rest of the checklist serves: parallelism only earns its keep when ownership is declared up front and the parent boundary actually controls what the children can touch. Fan out wider than that and you are not orchestrating — you are queuing review debt with extra steps.
15.7 Takeaways
- Reach for cloud multi-session parallel agents only when you have a backlog of independent tasks that don’t share files and a review pipeline that can honestly absorb five to ten PRs per day; skip it when tasks share files, when the work is on the critical path, or when reviewers can’t read those PRs carefully — and when sessions run asynchronously, add explicit status checks, visible budget and turn counters, and a fast cancellation path so a runaway worker doesn’t burn quota before you notice.
- Use the Architect/Editor split only when recurring edit failures or genuine planning/execution separation justify the extra coordination cost: let the stronger model design, then hand execution to the faster editor model.
- Run a three-reviewer fan-out on each diff: one read-only subagent for security, one for missing tests, and one for maintainability or design drift; the parent collects all three final messages and summarizes overlaps and disagreements into one human-facing review note before you do final diff review.
- Before launching parallel agents on a shared feature, write down a boundary contract (the exact shared API shape), an ownership map (which files each agent may edit), a required handoff summary structure (branch, files changed, checks run, known risks), and a merge gate (contract still matches, checks pass, no undeclared files touched) — these four artifacts are what turn parallel generation into coordinated engineering.
- Audit and commit the parent’s permission mode in the launch script before fanning out, because children inherit the parent’s posture and cannot narrow it — a child’s allowedTools constrains what the model is encouraged to use, not what the runtime will permit.
- Default to stateless workers for independent subtasks so they cannot inherit one another’s context by accident; switch to a stateful orchestrator only when a later step genuinely needs context produced by an earlier one, and resume or fork that session deliberately when limits interrupt the pipeline.
- Escalate from branches-and-scripts to a graph runtime or persistent session store only when you need runtime-enforced ownership, crash-resumable handoffs, or service-style execution — below that threshold, branches and commits are the simpler primitive.