9  Process Frameworks for Coding Agents

A process framework doesn’t make the model smarter. It packages the rules, commands, roles, artifacts, and gates that turn raw agent capability into a repeatable team workflow — and the question is always whether the packaging earns its maintenance cost.

9.1 Ceremony and Autonomy: A 2D Map

Hold two axes steady before naming any framework. Ceremony is how many process artifacts you require — none for vibe coding, a short brief for lightweight workflows, a reviewed plan for spec-driven work, multi-phase artifact chains for full ceremony. Autonomy is orthogonal: how much the agent runs without you. Brand names mix the two, which is why a flat ladder hides the real choice. Two frameworks at the same ceremony tier can sit at opposite ends of the autonomy axis [144], [145].

| Ceremony | Low autonomy (you drive each step) | Mid autonomy (mixed gates) | High autonomy (agent runs unattended) |
| --- | --- | --- | --- |
| Zero ceremony | Vibe coding | | |
| Lightweight | GSD, RIPER-5, CONVENTIONS.md, AGENTS.md | | Autonomous loops |
| Mid | | OpenSpec, Spec Kit, Superpowers, SuperClaude | |
| High | BMAD, Kiro chain | | |

BMAD and autonomous loops aren’t two points on one ladder — they sit in opposite corners. BMAD keeps humans at every gate and demands maximum artifact overhead. Autonomous loops strip artifacts to a single prompt file and let the agent run for hours against a monotonic success signal [146], [144]. Most production work lands in the mid-ceremony / mid-autonomy cell, which is exactly where OpenSpec, Spec Kit, Superpowers, and SuperClaude compete.

Compare any overlay on four questions: what it packages, what native surface could replace it, what failure mode it prevents, and what it costs to keep alive.

| Framework family | What it ships | Native substitute | Failure it prevents | Recurring maintenance cost |
| --- | --- | --- | --- | --- |
| Conventions overlay (CONVENTIONS.md, .clinerules, AGENTS.md) | one markdown file loaded each session | none — this is the minimum | retyping the same standards every chat | keeping the file aligned with the codebase |
| GSD / RIPER-style discipline | a few mode labels and a task-brief habit | saved slash command + project rules | vague tasks and skipped acceptance | mostly cultural, not artifact-heavy |
| Superpowers / SuperClaude | installable skill library, ~19 slash commands, ~9 personas across design/dev/analysis/ops phases [147] | a handful of skill files + .claude/agents/ definitions | specialist behavior depending on memory or copy-pasted prompts | curating the procedure library as the team learns |
| Spec Kit / OpenSpec | /specify, /plan, /tasks commands plus a project constitution file | Plan Mode + a hand-written implementation_plan.md | implementation starting before intent is reviewed | keeping the constitution and command playbooks current |
| Kiro-style chain | three-document chain: requirements.md -> design.md -> tasks.md | one spec doc + a checklist | requirements/design/tasks failing silently inside one artifact | maintaining three separate documents per feature |
| BMAD | analyst/PM/architect/developer/QA roles, workflow specs, builder system, installable directory [148] | subagents + explicit review gates | multi-phase work skipping requirements/design/review boundaries | versioning role files and workflow specs as the project grows |

The thread that runs through every tier is specification-document-driven decomposition: producing an explicit reviewable plan before execution and directing the agent to execute against that artifact. Plan Mode, /deep-planning, Spec Kit, OpenSpec, Kiro, and BMAD are the same pattern at different formalization levels [149], [150]. The framework choice is mostly a choice about how durable that artifact must be:

| Decomposition artifact | Trigger | Review surface | Persistence | Safe autonomy after approval |
| --- | --- | --- | --- | --- |
| Conversation plan (Claude Code Plan Mode) | small change, one reviewer, cheap rollback | approve in chat | session-local | low: supervised execution |
| implementation_plan.md (Cline /deep-planning) | multi-file change, session handoff | diffable plan file | survives restart | medium: bounded execution |
| requirements.md -> design.md -> tasks.md (Kiro / BMAD-adjacent) | ambiguous requirements, team handoff | staged artifact reviews | durable record | higher: each gate compounds trust |

The heavier the artifact, the more reviewable state lives outside the agent’s current context window, and the more autonomy you can hand back after the gate passes. That is the operator decision: pick the lightest artifact that still survives the handoffs and reviews this task actually needs.
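To make the middle tier concrete, a minimal implementation_plan.md can be as small as the sketch below. The headings and the feature are invented for illustration, not a mandated Cline format:

```
# Implementation Plan: rate-limit the password-reset endpoint

## Scope
- Add a per-IP limiter to POST /auth/reset; nothing else changes.

## Approach
- Reuse the existing middleware layer; sliding window, 5 requests / 15 min.

## Tasks
1. Add limiter config and wire it to the route.
2. Return 429 with a Retry-After header on breach.
3. Tests: under-limit passes, over-limit gets 429, window resets.

## Out of scope
- Account-level lockout, CAPTCHA.
```

The value is the durability properties, not the prose: the file is diffable, survives a session restart, and bounds what the agent may do after approval.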

A worked example sharpens the packaging comparison. Say the task is “add per-user notification preferences” — schema, API, UI, tests. Run it through three overlays:

  • Conventions overlay only. A CONVENTIONS.md says “use httpx, type hints required, tests live next to source” [151]. The diff comes back stylistically right but architecturally improvised: notification logic lands in three different modules. The overlay enforced standards, not sequencing. Maintenance cost: keep the file aligned with the codebase.
  • Spec Kit. /specify captures “users can enable/disable email notifications, persisted per-account, default off.” /plan produces architecture and trade-offs. /tasks decomposes implementation. The project constitution rejects the plan if it violates documented invariants. You review the plan before code exists and execute against the approved tasks. Packaged surfaces: three slash commands plus the constitution file. Native substitute: Plan Mode plus a hand-written implementation_plan.md. Maintenance cost: keep the constitution and command playbooks current as conventions evolve.
  • BMAD. The analyst writes the requirements artifact. The PM gates scope. The architect produces a design artifact. Bounded developer tasks are filed by the orchestrator, implemented one by one, and a QA role checks each before the next phase opens [148]. Same feature, four artifacts and five role transitions instead of three commands. Maintenance cost: version the role files and workflow specs as the project grows. You pay this cost only when handoff risk — wrong requirements, drifted design, missed QA — would cost more than the ceremony.

Adopt a framework only for what it packages better than your agent already does natively. If native rules, slash commands, skills, and .claude/agents/ give you the same gate with less ceremony, prefer the lighter shape [152], [153]. If the framework provides shared artifact vocabulary that prevents skipped gates across a team, pay the overhead deliberately.

Native subagent vs framework role: the decision rule. Use native subagent definitions (.claude/agents/, OpenCode’s permission-scoped agents) when you need role: reusable specialist behavior — a security reviewer, a research assistant, a style checker — with bounded tools and a focused prompt [153], [152]. Adopt a framework role system (BMAD’s analyst/architect/developer/QA, SuperClaude’s command-and-persona pairs) when you need role plus gate: phase sequencing, artifact expectations, and human approval points between handoffs [147], [148]. Many BMAD/SuperClaude personas predate native subagent support and are effectively implementing the role half of that pair by hand. If your goal is just named specialists, .claude/agents/ plus a few skill files now wins on ceremony cost. For coordination mechanics across multiple specialists, see Chapter 15; for subagent definition itself, see Chapter 4.
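For the role half on its own, a native subagent definition is a single small file. This sketch assumes the .claude/agents/ markdown-with-frontmatter shape (a name, a description, an allowed-tool list, then the role prompt); treat the exact keys and tool names as illustrative rather than authoritative:

```
---
name: security-reviewer
description: Reviews diffs for injection, authz, and secret-handling issues.
tools: Read, Grep, Glob
---
You are a security reviewer. Examine only the diff under review.
Flag unparameterized queries, missing authorization checks, and any
secret that appears in code or config. Report findings; do not edit files.
```

Note what is absent: no phase sequencing, no artifact expectations, no approval points. That missing half is the gate that framework role systems add.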

Surface map: what overlays actually expose. Frameworks ship as combinations of slash commands (/specify, /plan, /tasks, /deep-planning), skill files (Superpowers procedures), rules files (CLAUDE.md, AGENTS.md, .clinerules), persona definitions, and occasional MCP connectors that bridge to external systems [154], [155]. Any overlay can be unbundled into these surfaces, which is what makes the maintenance cost real and what makes the native-substitute column on the comparison table possible. The deeper mechanics of MCP servers, hooks, and connectors are out of scope for this chapter.

The ceremony trap. A framework you’ve half-adopted is worse than ad-hoc practice. You pay the setup cost and still skip the review phase under deadline pressure. Don’t adopt BMAD because it signals seriousness — adopt it when you’ll actually do the artifacts. And treat overlay files as code: a conventions file or persona stack that encodes patterns the codebase abandoned six months ago will generate plausible-looking output that conflicts with current state until someone updates the overlay [156].

9.2 Vibe Coding as the Baseline, Not the Framework

Vibe coding is the zero-ceremony anchor on the map: conversational generation with no durable artifact, appropriate when rollback is cheap and you can spot failures by inspection. A Claude Code + Remotion video that renders in under ten minutes is fine to vibe-code; the artifact is throwaway and you iterate by re-prompting [157]. The escalation signal is not “I feel uncertain” but “I cannot verify this from the diff and roll back if I’m wrong.” Once that line is crossed, the next overlay up earns its keep. The mode-switching mechanics themselves — when to slide between modes mid-task — live in Chapter 6.

9.3 Lightweight: GSD, Task Briefs, and Conventions Overlays

The lightweight tier pairs two things: a task brief that names “done” in a few bullets, and a conventions overlay that externalizes the recurring instructions you keep retyping. The framework-specific question here is what packaged overlay buys you above the native equivalent — task-sizing mechanics belong to Chapter 8.

The minimum viable overlay is a single markdown file. Aider’s CONVENTIONS.md shows the move concretely: with the file loaded, agent output flips from requests to httpx and gains type hints; without it, it does neither [151]. Vercel Agent now auto-detects AGENTS.md, CLAUDE.md, .cursorrules, and .github/copilot-instructions.md for the same reason — declarative guideline files have become the standard mechanism for feeding agents codebase-specific expectations [155]. Empirical work on Claude Code system prompts shows that prompt-only refinement yields ~5% improvement on general coding tasks and ~10% on repository-specific tasks — the conventions overlay is doing measurable work, not decorative work [158].
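A sketch of such a file, extrapolated from the httpx and type-hint example above; every rule beyond those two is invented for illustration:

```
# CONVENTIONS.md
- Use httpx, never requests, for HTTP calls.
- All functions carry type hints; mypy must pass.
- Tests live next to the source they cover (foo.py -> test_foo.py).
- No module-level side effects; configuration is injected, not imported.
```

The entire overlay is this one file, loaded at the start of each session.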

GSD- and RIPER-style methodologies layer task-brief discipline on top of the conventions file: state the goal, list two or three acceptance criteria, label the phase (Research, Innovate, Plan, Execute, Review) [150], [159]. The overlay-specific value-add is small: a saved slash command and a shared vocabulary, not installable artifacts. If your team already keeps tight task briefs and a clean rules file, you don’t need a labeled methodology on top — that is exactly the cell where the native substitute wins.
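A task brief in this style fits in a handful of lines; the feature and acceptance criteria below are invented for illustration:

```
## Task brief
Goal: unknown profile ids return 404, not 500.
Phase: Plan
Acceptance:
- GET /users/<id> with an unknown id returns 404 and a JSON error body
- existing profile tests still pass
- the 200 response shape is unchanged
```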

The lightweight tier covers most day-to-day work: bug fixes, boilerplate, single-module refactors, contained features. The overlay maintenance cost is whatever it takes to keep one markdown file aligned with the codebase, which is small if you treat it like code [156]. Heavier overlays such as Superpowers and BMAD build on the same idea — they just add more procedure libraries, more role structure, and more explicit handoff stages.

9.4 Spec Kits and Process Scaffolds

Specification mechanics — when to write a spec, what goes in it, and how to review it — belong to Chapter 7. Decomposition mechanics belong to Chapter 8. The overlay-specific question here is narrower: what does a scaffold add beyond ordinary spec discipline?

The answer is packaging. A hand-written spec can be enough for one person in one session. A process scaffold earns its keep when the sequence, memory, and handoff state need to survive teammates, resets, and repeated failures. The useful additions are concrete:

  • Named command sequence. Spec Kit packages /specify, /plan, and /tasks as repeatable commands backed by prompt templates and scripts [160], [161]. Cline’s /deep-planning packages the same gate as a durable implementation_plan.md. Plan Mode gives you the lightest native version. The value is not that the command is magic; it is that the team stops relying on memory to run intent review before execution.
  • A maintained constitution or rules memory. Spec Kit’s .specify/memory/ constitution captures invariants separately from any one feature, so naming rules, architecture constraints, and breaking-change definitions can be reviewed as process assets [161]. The native substitute is a strong rules file; the scaffold turns it into a first-class artifact with its own maintenance burden [156].
  • Staged artifacts and handoff packets. Kiro-style chains split requirements, design, and tasks so each stage can be accepted or rejected independently. BMAD adds role-shaped handoff packets. AWS Transform uses a config.json, generated plan, clean Git baseline, and validation checkpoints for modernization work [162]. The common benefit is making skipped gates visible.
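As a concrete shape for a constitution, the fragment below shows the kind of cross-feature invariants such a file holds. The rules are invented, and Spec Kit's own template differs in layout:

```
# Constitution: project invariants

- Public API routes are versioned under /v1/; breaking changes require a new version.
- No new runtime dependency lands without a one-page ADR.
- Every database migration ships with a tested rollback.
- Error responses use the shared envelope: { "error": { "code", "message" } }.
```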

OpenSpec and Spec Kit are easy to confuse because both sound like “spec-driven development.” OpenSpec is the lighter shape: a small checked-in spec artifact the agent reads and respects, without a constitution, multi-stage templates, or checklist machinery [163], [164]. Spec Kit treats process itself as something the team maintains: command sequence, templates, memory, and review checklists. Choose OpenSpec when the spec needs to survive sessions. Choose Spec Kit when invariants are repeatedly violated across features, or when the team needs a shared vocabulary for what counts as a valid plan.
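The lighter OpenSpec-style shape is just a checked-in spec the agent reads before touching code. The file below is an invented example of that shape, not OpenSpec's actual directory layout or template:

```
# spec: notification-preferences

## Why
Users need per-account control over email notifications.

## What changes
- accounts gain a notification_prefs field, default all-off
- PATCH /v1/account/preferences toggles individual channels

## Out of scope
- push and SMS channels
```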

The adoption rule is to buy the smallest scaffold that fixes the actual failure:

  • If one reviewer can hold the task in working memory, use Plan Mode or a single implementation_plan.md.
  • If the spec must survive resets or handoffs, use an OpenSpec-style checked-in spec.
  • If team invariants keep being missed, add a Spec Kit-style constitution and templates.
  • If requirements, design, implementation, and QA fail at different handoff points, use a Kiro/BMAD-style artifact chain or an AWS Transform-style plan/checkpoint pipeline.

The maintenance cost rises sharply along that list. A stale constitution, outdated command playbook, or half-adopted role chain is worse than a lightweight native workflow because it teaches the team to ignore the gate. The right scaffold is the lightest one whose artifacts the team will actually re-read on the next change; anything heavier degrades into ritual, and anything lighter loses the reviewable-before-execution property that gives the spectrum its value.

9.5 Skills-as-Methodology: Superpowers and Persona Packs

Superpowers, SuperClaude, and BMAD all answer the same question — how do you package specialist behavior so it doesn’t depend on memory or copy-paste — but they answer it at different ceremony levels.

SuperClaude / Superpowers ship as installable skill libraries: roughly nineteen slash commands paired with nine personas across design, development, analysis, and operations phases [147]. Invoke a command and the matching persona prompt, tool list, and procedure load together. The native substitute is a handful of skill files plus .claude/agents/ definitions: a description, a tool list, a role prompt [152], [153]. The framework adds a curated procedure library and a shared invocation vocabulary at the cost of installing and maintaining the overlay. Adopt it when specialists recur across sessions and teammates and when the cost of every developer reinventing the same skills exceeds the cost of keeping the overlay current. Stick with native skills and subagents when only a handful of procedures recur and the team can keep them in sync without a wrapper.

Persona-driven QA patterns illustrate the same packaging at smaller scale: a QA persona constrained to UI-only access, mandatory screenshots, and continue-after-finding-bugs behavior produces more thorough testing than generic instructions, but the value comes from packaged constraints plus reviewable artifacts, not from any specific role name [165].

BMAD sits at the full-ceremony end. It packages four things together: named roles (analyst, PM, architect, developer, QA), an artifact chain (requirements -> design -> tasks -> implementation -> review), handoff rules between roles, and explicit approval gates. A builder system lets teams extend the framework with custom roles and workflows [148]. The value is not that the prompts are magical; it’s that the process makes skipped handoffs visible. The artifact chain prevents the common large-agent failure mode: locally correct code with no coherent system-level design.

Detailed practitioner experience reports show that with proper AGENTS.md configuration and constraint-heavy prompts, agents can reliably generate production-quality code [166]. The gating still has to come from somewhere. For large, dense projects, the human needs to know what they want and how to do it first, then guide the agent through explicit upfront design and checkpoints. BMAD’s contribution is making the chain — requirements, architecture, implementation tasks, QA checks — and the named approval points between them a shared, version-controlled artifact rather than tacit team knowledge. Choose it when the cost of drift exceeds the cost of running it.

Role-based prompting is itself a spectrum. BMAD sits at the heavy end. gstack-style prompting sits much lighter: stack a few perspectives inside one prompt — CEO, eng-manager, engineer, QA, release-manager, devops — when you want internal pushback without creating a separate artifact chain [167]. Restated for the heavier end of that spectrum, the decision rule from the map applies: native .claude/agents/ subagents cover role — a reusable specialist with bounded tools. Framework role systems cover role plus gate — phase sequencing, artifact expectations, and approval points wired together. Many BMAD/SuperClaude personas predate native subagent support and are effectively implementing the role half by hand. If you’re adopting them primarily for named specialists, evaluate whether .claude/agents/ plus a few skill files now gives you the same benefit with less ceremony [153]. For the coordination mechanics across specialists, see Chapter 15.

Where autonomous loops fit. Full ceremony and autonomous loops are adjacent on the map but in opposite corners. Full ceremony keeps humans at every gate. Autonomous loops — a shell loop re-feeding a single prompt file until a stop condition fires — strip artifacts and let the agent run unattended against a monotonic verifier [146]. The methodological point here is only why they live in different cells; the execution mechanics belong to Chapter 16. Raise autonomy only when requirements are stable, rollback is cheap, verification is monotonic, external-state risk is low, and the agent has visible budget or permission limits [144].
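The loop itself is mechanically trivial, which is the point: all the safety lives in the verifier and the budget. The sketch below is runnable but uses stand-ins for the agent call and the test suite; a real loop would invoke your agent CLI in agent_step and run your actual tests in verifier_passes.

```shell
PROMPT_FILE="PROMPT.md"
MAX_ITERATIONS=25          # visible budget limit: the loop cannot run forever
i=0

# Stand-in for the agent invocation; a real loop would call the agent CLI
# here, re-feeding the same prompt file each pass.
agent_step() {
  echo "iteration $1: agent runs against $PROMPT_FILE"
}

# Stand-in for the monotonic verifier; a real loop would run the test
# suite. Here we pretend the suite goes green on the third pass.
verifier_passes() {
  [ "$1" -ge 3 ]
}

while [ "$i" -lt "$MAX_ITERATIONS" ]; do
  i=$((i + 1))
  agent_step "$i"
  if verifier_passes "$i"; then
    echo "stop condition fired after $i iterations"
    break
  fi
done
```

Raising autonomy safely means hardening verifier_passes and tightening MAX_ITERATIONS, not changing the loop.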

9.6 Choosing Your Tier

Three factors set the ceremony level: reversibility, blast radius, and complexity. Autonomy is a separate dial — once the ceremony level is set, decide how much execution the agent runs without you. The full switching-signal logic — how to slide between modes mid-task — lives in Chapter 6; the framework-specific question is which overlay to adopt for which kind of work and what it costs to keep alive.

| Factor | Vibe coding | Lightweight | Spec kit | Full ceremony (BMAD) |
| --- | --- | --- | --- | --- |
| Reversibility | disposable | minutes to fix | hours to unwind | days, or irreversible |
| Blast radius | only your time | one module, caught by tests | multiple modules | user data, security, regulatory |
| Packaged surfaces | none | one rules file | commands + constitution + plan artifact | roles + artifact chain + gates + builder |
| Maintenance cost | none | keep one file current | keep constitution and command playbooks current | version role files and workflow specs per project |
| Native substitute | n/a | CLAUDE.md + saved prompt | Plan Mode + implementation_plan.md | subagents + explicit review gates |

The map is a toolkit, not a ladder. Pick a point per task and shift mid-task when the risk profile turns out different from what you expected — a lightweight task that surfaces a cross-module dependency should escalate to spec-kit gating before the agent digs deeper into the wrong design. Most production work lands in the spec-kit middle, and that’s fine. The skill is recognizing which tasks need more or less, and being honest about whether you’ll actually do the artifacts the heavier framework demands. A half-adopted framework is the worst outcome on the map — you pay the maintenance cost and still ship the failure the gates were meant to catch.

9.7 Takeaways

  • Before evaluating any process framework, place it on two independent axes — ceremony (how many artifacts you must produce) and autonomy (how much the agent runs without you) — because frameworks that look equivalent on a single ladder can sit in opposite corners of the map.
  • Use the trigger ‘I cannot verify this from the diff and roll back if I’m wrong’ — not ‘I feel uncertain’ — as the precise signal that vibe coding must give way to the next overlay tier.
  • Apply the four-level scaffold adoption rule — Plan Mode or a single implementation_plan.md → checked-in OpenSpec → Spec Kit constitution and templates → Kiro/BMAD artifact chain — buying only the level that fixes the failure mode your team is actually experiencing.
  • Use native .claude/agents/ subagent definitions when you need reusable specialist behavior (role); adopt a framework role system such as BMAD or SuperClaude only when you also need phase sequencing, artifact expectations, and human approval points wired between handoffs (role plus gate).
  • Treat overlay files — conventions files, persona stacks, and role definitions — as code: a file that encodes patterns the codebase abandoned six months ago will generate plausible-looking output that conflicts with current state until someone updates it.
  • Treat autonomy as a separate dial from ceremony level: raise agent autonomy only when requirements are stable, rollback is cheap, verification is monotonic, external-state risk is low, and the agent has visible budget or permission limits.