10 Connectors, Commands, and Automation Hooks
Every connector you bolt onto an agent is a capability — and every capability is also a permission, a source of context-budget pressure, and a place where review can quietly disappear.
10.1 MCP and External Systems
A coding agent with only file edit and bash is an island. Real software delivery touches issue trackers, observability, deployment platforms, design tools, and databases, and the connectors that reach those systems are how an agent stops being an editor companion and starts being a workflow participant. The argument of this chapter is narrow and load-bearing: connectors matter because they let agents operate in real systems, but they only pay off when you treat them as permission-bearing surfaces of the workflow rather than as integration glue. Three surfaces deserve operator-grade attention — MCP servers for structured external access, slash commands and headless invocations for repeated entry points, and lifecycle hooks for deterministic enforcement. Skills, plugins, and subagents are mentioned only where they sharpen one of those three decisions; their full treatment lives in Chapter 4 and Chapter 2.
The way agents reach external systems has converged on one protocol. The Model Context Protocol (MCP) defines a client–server contract where servers expose tools, resources, and prompts, and any MCP-aware agent can call them through the same interface [168], [169]. The point is not the protocol; it is that wiring an agent to GitHub or Postgres collapses from “build a custom integration per tool” into a config block any team can read [170], [171], [172].
The minimal pattern lives in `.mcp.json` (or its equivalent) at the project root:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    },
    "postgres": {
      "command": "uvx",
      "args": ["mcp-server-postgres", "${DATABASE_URL}"]
    }
  }
}
```

Sidebar — same shape, two scopes. What you write at the project root is the developer-side view of an integration. The organizational-side view is an admin-published registry that enumerates which servers users are allowed to connect at all, served over HTTPS and refreshed on a fixed cycle [92], [173]. Both views speak the same transport vocabulary — `stdio` for locally spawned subprocesses, `streamable-http` (and the older `sse`) for remote URLs — and the same per-tool permission shape; the registry adds an envelope and an enforcement cycle, not a different model. Repo-local config is the right default for solo work and small co-located teams. The moment you cross out of pilot, expect a registry to take ownership of which servers can run. Full registry mechanics live in Chapter 21 — for now, write your `.mcp.json` so a future migration to a managed allow-list is a mechanical move, not a redesign.
Each server’s tools surface to the agent as `mcp__<server>__<tool>` — `mcp__github__list_issues`, `mcp__postgres__query`. That naming convention is what `allowedTools` filters against, and the filter shape is where most teams trip. `allowedTools: ["mcp__postgres__query"]` grants only the read tool; `allowedTools: ["mcp__postgres__*"]` silently grants `mcp__postgres__execute` alongside it the moment the server adds a write capability. Wildcards are tempting and dangerous; enumerate explicit tools whenever the server exposes anything destructive [174]. The opposite failure is just as common: the server registers, its tools appear in the model’s tool list, and not a single one ever fires because `allowedTools` was left empty. There is no runtime error — only an agent that mysteriously can’t call GitHub.
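The discipline is easier to hold when the contrast is written down. A minimal sketch, using the same options-array shape as the SDK examples later in this chapter (the `get_issue` tool name is illustrative):

```typescript
// Explicit enumeration: the agent can read issues and run SELECTs, nothing else.
const allowedTools = [
  "mcp__github__list_issues",
  "mcp__github__get_issue", // illustrative tool name
  "mcp__postgres__query",
];

// Wildcard: means the same thing today, until the postgres server ships
// mcp__postgres__execute; this line grants it silently the moment it exists.
const allowedToolsRisky = ["mcp__postgres__*", "mcp__github__*"];
```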
Adding an MCP server is therefore a permission expansion, not just an integration. The server’s tools join the same callable surface as built-in Read, Edit, and Bash, so they need the same allowlist and denylist discipline. allowedTools can name MCP tools directly, such as mcp__github__create_issue or mcp__postgres__query, which lets a review bot read issues without granting it database writes.
Transports matter too. stdio runs the server as a local subprocess and is the right default for tools you trust [168], [170]. streamable-http (and legacy sse) point at a remote URL and let the agent use cloud-hosted services like Vercel MCP, which adds OAuth consent on every reconnection, client allowlists, and a deliberately read-only initial scope to keep the lethal trifecta — untrusted content + sensitive data + outbound reach — from snapping shut [175], [176]. The syntax should look different in your config, not just in your mental model:
```bash
# Claude Code remote HTTP MCP
claude mcp add --transport http secure-api https://api.example.com/mcp \
  --header "Authorization: Bearer your-token"
```

```typescript
// Claude Agent SDK remote HTTP MCP
for await (const message of query({
  prompt: "Explain our deployment logs",
  options: {
    mcpServers: {
      "deploy-logs": {
        type: "http",
        url: "https://logs.example.com/mcp"
      }
    },
    allowedTools: ["mcp__deploy-logs__*"]
  }
})) {
  // ...
}
```

When you flip a connector from read to write you are not adding a feature; you are arming the trifecta. Treat it as a security review.
The other axis where MCP quietly fails is context budget pressure. Six MCP servers exposing 60 tools consume roughly 47,000 tokens of definition before the agent reads a single file; in real sessions, MCP descriptions can eat a third of the effective window and force compactions that would not otherwise happen [177], [178]. Two patterns help. The first is dynamic discovery — instead of preloading every tool, expose a single CLI like mcp-cli grep/call that lets the agent find and invoke tools on demand, dropping MCP context from ~47k to ~400 tokens [177]. The second is static generation — vendor the tool definitions you actually need into your codebase, version them, and stop pulling fresh schemas from upstream every session [178]. Both move the unit of governance from “the live MCP server” to “what your repo says the agent can do,” which matters because unversioned MCP servers will trim and rename tools to save their own tokens, breaking your agent’s behavior with no commit you can blame [179].
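What dynamic discovery looks like from the agent’s side, as a hedged sketch: the `grep` and `call` verbs come from the reported pattern [177]; the `describe` step and every name below are illustrative.

```bash
# Nothing below is preloaded into the context window; the agent runs these
# commands only when it actually needs an external capability.

# 1. Search the catalog for a capability by keyword.
mcp-cli grep "sentry issue"
# -> mcp__sentry__get_issue    (tool name shown here is illustrative)

# 2. Pull the one schema that matched, instead of 60 definitions up front.
mcp-cli describe mcp__sentry__get_issue

# 3. Invoke by name with JSON arguments.
mcp-cli call mcp__sentry__get_issue '{"issue_id": "PROJ-1234"}'
```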
That instability is why some practitioners are walking away from MCP for low-frequency cases and replacing it with skills — short, model-written instructions that wrap a CLI and load only when needed [179], [180]. A Sentry skill that documents sentry-cli calls fits in a few hundred tokens and breaks only when you change the CLI, not when an upstream MCP server rewrites its descriptions. Roughly: use MCP when the agent needs structured, schema-rich access to a system you don’t control (Postgres, GitHub Issues, Vercel deploys) and the latency of a server process is acceptable; use a skill or shell command when the integration is a thin wrapper over a good CLI and you’d rather pay the maintenance cost yourself [181], [179]. This isn’t a war between connector types; it’s a tool-selection question with a real answer per integration. The skill versus subagent versus rules-file taxonomy is owned by Chapter 4; what matters here is only the choice between MCP and a skill-wrapped CLI.
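For concreteness, a sketch of what that Sentry skill could look like, assuming the SKILL.md-with-frontmatter layout skills use (Chapter 4 owns the format); the `sentry-cli` invocations are illustrative, not checked against the real CLI:

```markdown
---
name: sentry-triage
description: Look up and summarize Sentry issues with sentry-cli. Use when a
  task references a Sentry issue ID or a sentry.io link.
---

# Working with Sentry

- Authenticate from the SENTRY_AUTH_TOKEN environment variable; never paste tokens.
- List recent issues for the project: `sentry-cli issues list`
- When summarizing an issue, always include: error type, first-seen and
  last-seen timestamps, and event count.
```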
An MCP server’s value is set by the gap between its tools and the underlying API surface — the bigger the gap, the less the agent has to reorchestrate the same multi-step flow on every conversation, and the fewer chances each call has to drift [182]. Low-level wrappers — one tool per API endpoint — collapse that gap to zero and force the agent to relearn your domain from scratch every turn. The DoltHub team hit this from the inside: confusing CLI behavior that broke at a 50% rate kept failing under longer CLAUDE.md files and better prompting until they wrapped it in MCP tools with explicit branch semantics, at which point the failures stopped [183]. When agents repeatedly stumble over the same tool, redesign the tool. Stripe operationalizes the same principle at scale through Toolshed: ~500 internal MCP tools, none of them low-level wrappers, with any given agent receiving only a curated subset because “agents perform best when given a smaller box” [173], [184]. Curation is the same lesson as wrapping — both raise the abstraction the agent sees above the raw API, just from different ends of the catalog.
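What “explicit branch semantics” can look like as a tool definition: a sketch using the MCP TypeScript SDK, with the server name, tool, and Dolt behavior invented for illustration rather than taken from DoltHub’s actual implementation.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "dolt-workflow", version: "0.1.0" });

// One tool per workflow, not per endpoint. The branch handling the agent kept
// fumbling is now a documented parameter instead of a multi-step CLI dance.
server.tool(
  "commit_to_branch",
  "Commit staged changes to the named branch, creating it from main if it " +
    "does not exist. Never commits to main directly.",
  { branch: z.string(), message: z.string() },
  async ({ branch, message }) => {
    // ...checkout-or-create the branch, commit, capture the new hash...
    return { content: [{ type: "text", text: `committed "${message}" to ${branch}` }] };
  }
);

await server.connect(new StdioServerTransport());
```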
For canonical patterns rather than live data, expose a reference application as an MCP server [185]. The agent can pull recent commits from your golden-path repo and detect drift in services that copied the template months ago — but only when the comparison is scoped to specific commits, not “compare these two codebases.” Open-ended drift detection produces noise; commit-scoped detection produces patches. AI adds value here only for context-sensitive transformations like normalizing diverse log statements into a shared format; for rule-based changes a codemod still wins.
10.2 From Repo-Local Config to Org-Managed Registry
Per-developer MCP freedom works until it doesn’t, and the inflection point is sharp. MCP servers run with the agent’s credentials, read environment variables, and reach the network without further prompts after registration; one developer with an unvetted server is one supply-chain incident away from exposing the codebase [186]. Teams typically graduate from repo-local .mcp.json to an org-managed allow-list the first time security review asks “what external services can our agent connect to?” and the honest answer is “whatever each developer chose” — or when onboarding instructions live in a Slack thread nobody owns, or a post-incident review names a server the platform team had never heard of.
The graduation target is a registry, not a policy document: an admin-published list of approved servers that agents fetch on launch and re-check on a fixed cadence (24 hours in Amazon Q Developer’s first-mover implementation), terminating servers that drift off the list and relaunching with the registry’s pinned version when entries disagree [92], [173].
The reason this graduation is mechanical rather than disruptive is that the registry speaks the same transport vocabulary your repo-local config already uses. Servers are described as stdio (a package the client launches locally), streamable-http (a remote endpoint over HTTPS), or sse (the older server-sent-events remote shape). Tool names, transport names, and the developer-facing prompt surface stay identical; what changes is who decides which servers are reachable and at what pinned version [173].
One caveat worth naming up front: enforcement is client-side, so a teammate running a different agent client, or one who hand-edits config outside the governed tool, bypasses the registry entirely [92] — meaning network egress controls, IAM scoping, and audit logging still have to do real work underneath [186].
Full registry mechanics — profile precedence, version pinning policy, migration sequencing, exception workflow, and decommission of repo-local entries — belong in Chapter 21. For now, the actionable rule for an individual team is narrower: write your repo-local .mcp.json in the shape a future registry will accept (explicit tool allow-lists, pinned versions, env-var credentials, no wildcards on writes) so graduation is a copy, not a rewrite.
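Concretely, a registry-ready entry might look like the sketch below, with `X.Y.Z` standing in for a real pinned release (the pinning mechanism is just the package manager’s `name@version` syntax); the explicit tool allow-list lives in the harness’s permission config rather than in this file:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github@X.Y.Z"],
      "env": { "GITHUB_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```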
10.3 Commands and Reusable Entry Points
The second extension surface is for repeated work that you control: turning ad-hoc prompts into named, version-controlled entry points. The temptation when the prompt-to-result loop feels good is to keep building — a custom router here, a workflow engine there, a queue for parallel runs — until you have written half of a CI system inside your agent config. Don’t. Once you outgrow the built-in primitives, you inherit your own permission model, retry logic, review checkpoints, and debugging burden. Compose the platform’s primitives before inventing a new abstraction layer.
Slash commands, skills, and subagents are three different answers to “what should this repeated prompt become” — what to invoke, what to load, where to isolate — and the full taxonomy, including how the three compose, lives in Chapter 4. What matters at this layer is the packaging decision and the escalation path between it and the other two surfaces in this chapter.
A compact decision rule for which surface a repeated prompt becomes:
| Trigger | Surface |
|---|---|
| Same prompt shape, typed by more than one human more than a few times | Slash command (.claude/commands/*.md) |
| Procedure the agent should pull in only when the situation matches | Skill (see Chapter 4) |
| Rule that should fire whether the agent thinks of it or not | Hook (PreToolUse / PostToolUse) |
| Structured access to an external system you don’t control | MCP server (with explicit allowedTools) |
| Focused task that should not pollute the main thread | Subagent (see Chapter 4) |
| Caller is a CI step, scheduler, or webhook — not a human | Headless CLI flag or SDK call |
Pick the lowest-ceremony surface that survives the rule’s actual lifecycle. The trap is jumping straight to a custom MCP server or bespoke tool when a slash command would have done the job. Run the table top-to-bottom and stop at the first surface that fits: start with a slash command for any prompt your team types more than three times; promote it to a skill when the procedure should fire only conditionally; promote it to a hook when the rule must fire even when the agent forgets; build a custom MCP tool only when the gap between “what the agent should call” and “what the API exposes” is wide enough that the agent keeps reorchestrating the same multi-step flow [182]; and reach for headless invocation only when the caller stops being a human at a keyboard. Cross that last line too early and you’ve built a CI system from scratch; cross it too late and your scheduled job is somebody pasting prompts into a chat window every morning.
The simplest place to start is the slash command itself. A reusable /triage-bug command lives as a markdown file at .claude/commands/triage-bug.md, gets committed to the repo, and ships with the team:
```markdown
---
description: Triage an incoming bug report into a structured issue draft
argument-hint: <issue_url_or_number>
---
You are triaging bug report $1 in this repository.

1. Run `gh issue view $1 --json title,body,labels,comments` to load the report.
2. Reproduce locally only if the report includes a clear repro recipe; otherwise skip.
3. Produce a structured draft with: suspected component, severity rationale,
   smallest reproduction we have, and one open question for the reporter.
4. Write the draft to `triage/$1.md` and stop.

Do not edit production code, do not close or label the issue, and do not
post comments. A human owns those decisions.
```

Anyone on the team types `/triage-bug 4821` and the agent runs the team’s agreed shape, not whatever framing the current prompter happens to use [174], [184]. The shape here is just an entry point that turns a repeat prompt into a versioned artifact; the same template works for `/draft-changelog`, `/scaffold-endpoint`, `/review-pr`, and any other operation a team performs more than a handful of times. What makes a good review or a good changelog is the subject of Chapter 11 — the packaging primitive is independent of the procedure being packaged.
Outside the session, the cross-platform contract that lets these primitives run unattended is converging on a three-condition shape: authentication satisfied from environment variables, all tool permissions pre-declared at invocation, and a non-interactive entry point that replaces the chat box. GitHub Copilot CLI ships this as `copilot --prompt "fix the failing test in auth.test.ts" --allow-all-tools` [187]. Continue’s equivalent is explicit enough to paste into CI:
```bash
export CONTINUE_API_KEY=your-key-here
cn -p "Review the current git changes for bugs and suggest improvements" --auto
```

Cline goes further with a programmatic SDK lifecycle — `initialize`, `authenticate`, `newSession`, `prompt`, `setSessionMode` to gate plan-versus-act, `cancel`, `shutdown` — that lets an orchestrator drive the agent like any other process:

```typescript
const initializeResponse = await agent.initialize({
  protocolVersion: 1,
  clientCapabilities: {
    fs: { readTextFile: true, writeTextFile: true },
    terminal: true,
  },
});
await agent.authenticate({ methodId: "cline-oauth" });
const { sessionId } = await agent.newSession({ cwd: process.cwd(), mcpServers: [] });
await agent.setSessionMode({ sessionId, modeId: "plan" });
await agent.prompt({
  sessionId,
  prompt: [{ type: "text", text: "Draft the migration plan only" }],
});
await agent.shutdown();
```

The simultaneous, independent shipment across three vendors is the signal: env-var auth + pre-declared permissions + flag-or-SDK invocation is the contract, not a vendor-specific quirk. Permission scoping and stop conditions for these unattended runs are the subject of Chapter 16; treat the recipe itself as the durable lesson here.
Pick the shape that fits the call site. For a one-shot CI step that takes a prompt and returns a result, a stateless query is the simplest path. For a multi-turn pipeline where step two needs the context step one built — analyze auth module, then refactor with JWT — a stateful client is cheaper and more reliable than re-feeding the same files twice. The Claude Agent SDK names the choice directly with `query()` for one-shot and `ClaudeSDKClient` for multi-turn, with `max_turns` and `max_budget_usd` as the bounds and `resume`/`fork_session` as the recovery primitives [188]:
```python
# One-shot — stateless. Fits a CI step or webhook handler.
from claude_agent_sdk import query

result = await query(
    prompt="Triage this Sentry alert and propose a patch",
    max_turns=5,
)

# Multi-turn — stateful. Step 2 reuses what step 1 already loaded.
from claude_agent_sdk import ClaudeSDKClient

async with ClaudeSDKClient(max_budget_usd=2.00) as client:
    await client.query("Analyze the auth module and summarize the JWT flow")
    await client.query("Now refactor LoginController to use the JWT helper")
```

Classic gotchas are silent: a resumed session whose `cwd` differs from the original starts fresh with no warning, an `interrupt()` whose buffered messages aren’t drained leaks stale output into the next call, and TypeScript’s `startup()` pre-warm only helps if you actually call it before the latency-sensitive request [188]. Chapter 15 goes deeper on orchestration topology — the point here is that you have a clean shape per integration.
What this unlocks is the difference between editor-bound agents and pipeline-bound ones. A scheduled refactor sweep across many repos, a per-PR review pass that runs in a sandbox before a human looks, a webhook handler that triages Sentry alerts — these only work when the agent runs without a human at the terminal [189], [187], [190]. Vercel Agent and Copilot Cloud Agent are concrete examples of the contract: an agent invoked as a service, sandboxed before its work hits a diff, automating branch creation, commits, and PR opening for routine tasks while the developer context-switches [189], [187]. Devin extends the pattern with explicit guidance that tasks with verifiable success criteria — passing tests, green CI, matching an existing pattern — yield substantially better results than open-ended instructions, and that an exploratory “Ask Devin” pass to scope the work before invocation pays for itself [190], [191], [192]. The recurring lesson is structural: scope the task before invoking the agent, design the call site so the agent can self-verify before any human reviews output, and treat an unattended entry point without a self-check as a hidden tax on the next reviewer in line. Verification mechanics live in Chapter 12; the point at this layer is just that the entry point and the check have to ship together.
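A sketch of the webhook shape under loud assumptions (the import path, the Sentry payload fields, and the `maxTurns` option name are all assumed rather than verified); the durable part is the structure: a scoped prompt, pre-declared permissions, and a self-check baked into the task.

```typescript
import { createServer } from "node:http";
// Package name assumed to match the Agent SDK examples in this chapter.
import { query } from "@anthropic-ai/claude-agent-sdk";

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", async () => {
    const alert = JSON.parse(body); // hypothetical Sentry payload shape
    // Scoped task with a built-in self-check: the agent must run the tests
    // before writing its triage note, and may not push or open a PR.
    for await (const message of query({
      prompt:
        `Triage Sentry issue ${alert.issueId}: reproduce with a failing test, ` +
        `propose a minimal patch, run "npm test", then write triage/${alert.issueId}.md. ` +
        `Do not push, do not open a PR.`,
      options: {
        allowedTools: ["Read", "Edit", "Bash"], // pre-declared, no wildcards
        maxTurns: 25, // bound the unattended run (option name assumed)
      },
    })) {
      console.log(JSON.stringify(message)); // audit trail for the run
    }
    res.writeHead(202);
    res.end();
  });
}).listen(3000);
```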
Headless mode does not work for everything. Exploratory debugging, where you steer each step and inspect intermediate state, belongs in the interactive session [181], [193]. Tasks that depend on visual feedback an automated runner can’t evaluate — does this UI actually look right — need a human in the loop [190]. Irreversible operations should never run unattended without explicit checkpoints; the cost of a wrong decision in CI is a force-pushed branch, a dropped table, or a deleted file with no rollback path. Use Chapter 16 to decide which calls are safe headless before you wire the trigger.
10.4 Automation Hooks
The third surface sits between intention and action. Hooks register callbacks at named lifecycle events — primarily PreToolUse before a tool fires and PostToolUse after it returns — and let you make programmatic allow/deny/escalate decisions, transform inputs, or trigger side effects without touching the agent’s prompt. Their structural position outside the model means they apply consistently regardless of how the agent reached a decision, and they cost zero context tokens.
Pre-tool hooks are the most reliable way to enforce policy the model cannot override, because they run before the action and the model sees only the result [186], [181]. A path guard against `**/.env` or `**/secrets/*` is far more reliable than a CLAUDE.md instruction asking the agent to “please don’t read credentials.” Cline’s hook system runs JS/TS files from `.clinerules/hooks/` and lets the hook return a structured response — `cancel: true` aborts the action and optionally injects a reason the model reads on the next turn. Windsurf takes a more shell-friendly route with `bash -c` and `powershell` runners configured in `.windsurf/hooks.json` (workspace) or `~/.codeium/windsurf/hooks.json` (user), and exposes named events like `pre_read_code` and `post_write_code` instead of generic tool hooks. The split between workspace and user scope matters: team policy (“always run lint after a write”) belongs in the workspace config and gets committed; personal preference (“notify me when a long task finishes”) belongs in the user config and stays out of the repo.
Multiple hooks chain in declaration order, and the priority rule is deny-wins-over-allow across the entire chain — a convenience hook cannot override a policy hook. That ordering is correct for governance: you want the lint-or-block gate to take precedence over an “allow git push” exception. Hooks are also evaluated first in the harness’s permission pipeline, before deny rules, permission modes, and the user-callback layer covered in Chapter 2. Use them when the rule is structural (block writes to a path, run a fast check after every edit, log every shell command) and lives somewhere that survives prompt edits. Don’t use them to change what the agent does during a task — that’s instruction territory — and don’t put expensive side effects on high-frequency events; running a full build on every file read is a workflow killer.
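Registration is what binds a script to an event. A minimal sketch, assuming the Claude Code convention of a `hooks` block in `.claude/settings.json` with per-tool matchers; the three command paths point at the scripts developed below:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [{ "type": "command", "command": ".claude/hooks/pre-bash.sh" }]
      },
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": ".claude/hooks/pre-edit-path-guard.sh" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Edit",
        "hooks": [{ "type": "command", "command": ".claude/hooks/post-edit-lint.sh" }]
      }
    ]
  }
}
```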
A PreToolUse hook for Bash should inspect the command for known-dangerous patterns, return a structured deny with an explanation, and log the attempt:
```bash
#!/usr/bin/env bash
# .claude/hooks/pre-bash.sh — PreToolUse handler for the Bash tool.
# Reads JSON event on stdin, writes JSON decision on stdout.
input=$(cat)
cmd=$(echo "$input" | jq -r '.tool_input.command')

case "$cmd" in
  *"rm -rf "*|*"curl "*"| sh"*|*"git push --force"*|*" eval "*|*"sudo "*)
    reason="blocked dangerous pattern: $cmd"
    echo "$(date -Iseconds) BLOCKED $cmd" >> ~/.claude/blocked.log
    jq -n --arg r "$reason" '{cancel: true, reason: $r}'
    exit 0
    ;;
esac

echo '{"cancel": false}'
```

The file-path policy variant is just as important because many teams care more about where the write lands than which command triggered it:
```bash
#!/usr/bin/env bash
# .claude/hooks/pre-edit-path-guard.sh
input=$(cat)
path=$(echo "$input" | jq -r '.tool_input.file_path // empty')

case "$path" in
  *.env|*/.env|*/secrets/*)
    jq -n --arg r "blocked write to protected path: $path" '{cancel: true, reason: $r}'
    exit 0
    ;;
esac

echo '{"cancel": false}'
```

A matching PostToolUse hook on Edit can trigger a fast check and, on failure, push the result back as a tool message so the agent can react on the next turn instead of moving on:
```bash
#!/usr/bin/env bash
# .claude/hooks/post-edit-lint.sh
input=$(cat)

if ! npm run lint --silent >/tmp/agent-lint.log 2>&1; then
  jq -n --rawfile out /tmp/agent-lint.log \
    '{cancel: false, additionalContext: ("Lint failed after edit:\n" + $out)}'
  exit 0
fi

echo '{"cancel": false}'
```

Failure modes are silent and worth memorizing. A hook script with a bad path quietly skips and the agent proceeds unchecked. A hook that exits non-zero without returning the structured cancel signal lets the action through. A hook that fails its check but cannot push the failure back into the agent’s context is effectively a no-op — the lint failed, the agent never heard about it, and the next turn lands on top of broken code. The DoltCash work surfaces the same lesson from the other side: post-execution checks that surface “actual vs expected” back into the loop produced more reliable behavior with off-the-shelf agents than pre-execution guardrails alone [194]. What a sufficient verification check actually is belongs in Chapter 12; what matters here is that the hook is the deterministic place to attach one and the deterministic place to surface the result. The same lifecycle pattern extends to hosted CI through GitHub Copilot Cloud Agent’s workflow-scoped hooks — which is what makes hooks suitable for org-wide policy, not just local IDE quirks [187].
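The second of those failure modes, a crashing hook that silently lets the action through, has a mechanical countermeasure: wrap the policy script so any non-zero exit becomes a structured deny. A sketch assuming the same stdin/stdout JSON contract as the scripts above:

```bash
#!/usr/bin/env bash
# .claude/hooks/fail-closed.sh — hypothetical wrapper: if the inner policy
# hook exits non-zero, deny the action instead of silently allowing it.
input=$(cat)
if out=$(printf '%s' "$input" | .claude/hooks/pre-bash.sh 2>>~/.claude/hook-errors.log); then
  printf '%s\n' "$out"
else
  jq -n '{cancel: true, reason: "policy hook crashed; failing closed"}'
fi
```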
Hooks are the deterministic enforcement layer, but they are not the whole permission story. The layer below them — allowedTools, disallowedTools, default-deny posture, sandbox topology, and how the harness orders deny rules against permission modes and user callbacks — is the subject of Chapter 16. The combination — sandbox + permission allowlist + hooks for policy and observability — is defense in depth, and any single layer fails predictably; what belongs in this chapter is just the hook leg of that stool and how it interacts with the connectors and entry points above.
The deeper rule connecting all three sections: every connector you bolt onto an agent expands its tool surface, and the tool surface is the agent’s permission surface. An MCP server, a custom command, a hook that lets a tool through — each is a decision about what the agent can do without asking. The shape that works treats these as composable primitives, not as raw material for a custom platform: a small .mcp.json with explicit allowedTools, a versioned set of slash commands under .claude/, a CI invocation path with env-var auth and pre-declared permissions, and a hook layer that enforces what your prompt cannot. Teams that resist the urge to grow this into a half-built workflow engine ship faster and audit cleaner than teams that don’t. Chapter 4 picks up the specialization side of these primitives; Chapter 16 covers the policy framework for setting scope across task types; Chapter 15 is where these primitives compose into orchestration; Chapter 21 is where the registry, the sandbox policy, and the deny-list become someone’s full-time job.
10.5 Takeaways
- Enumerate explicit MCP tool names in `allowedTools` instead of using wildcards — a wildcard like `mcp__postgres__*` silently grants every write tool the server adds in future updates.
- Write repo-local `.mcp.json` with explicit tool allow-lists, pinned versions, and env-var credentials so that graduating to an org-managed registry is a mechanical copy, not a redesign.
- Treat adding a write-capable MCP connector as a security review, not a feature addition — read-to-write expands the agent’s permission surface into the lethal trifecta of untrusted content, sensitive data, and outbound reach.
- When an agent repeatedly fails on the same MCP tool, redesign the tool with explicit domain semantics rather than prompting harder or lengthening the rules file.
- Use the lowest-ceremony decision table to route repeated work: slash command for any prompt your team types more than three times, skill when it should fire conditionally, hook when it must fire even if the agent forgets, MCP only when the agent would otherwise keep reorchestrating the same multi-step flow, and headless invocation only when the caller stops being a human at a keyboard.
- Satisfy the three-condition unattended-invocation contract — env-var auth, all tool permissions pre-declared at invocation, and a non-interactive entry point — before wiring any agent to CI, a scheduler, or a webhook.
- Register `PreToolUse` hooks for path guards and dangerous-command blocks rather than relying on prompt instructions — hooks run before the action and the model cannot override them.
- Wire `PostToolUse` hooks to push verification failures back into the agent’s context as tool messages so the agent can correct on the next turn instead of building on top of broken output.