May 10, 2026

Spec-Driven Development: A Practical Guide for Software Engineers Building With AI

If you've used any AI coding agent — Claude Code, Cursor, Copilot, Gemini CLI — beyond a quick script, you've hit this pattern: you describe a feature, the AI generates several hundred lines that look correct, and hours later you're untangling code that subtly missed the actual intent.

This isn't an AI failure. It's a communication failure. And it's the problem Spec-Driven Development (SDD) is designed to solve.

This guide is for software engineers. It covers what SDD actually is at a conceptual level, why the methodology has emerged, the four-phase workflow that makes it work with any AI agent, and a complete worked example using Claude Code so you can see exactly how the theory maps to practice. The methodology is universal — Claude Code is just the example.

What Spec-Driven Development Actually Is

Spec-Driven Development inverts the traditional workflow by treating specifications as the source of truth and code as a generated or verified secondary artifact.

Read that carefully. It's not "write better documentation." It's an inversion of which artifact is primary.

In a traditional codebase, code is the truth. Documentation describes the code; when they disagree, you trust the code. In SDD, the spec is the truth. Code is one of several artifacts derived from the spec — alongside tests, type definitions, API contracts, and observability hooks. When code and spec disagree, you fix the code (or deliberately update the spec).

This inversion sounds academic. It has very practical consequences for how you organize work, review code, and onboard new engineers.

Why This Matters Now

AI agents are not search engines. They're closer to brilliant junior engineers who take everything you say literally. Vague prompts produce code that satisfies the prompt's literal interpretation — not the requirement you had in your head.

When the marginal cost of code generation collapses, the bottleneck shifts. Three years ago, your value as an engineer was reasoning plus typing speed. Today, your value is reasoning plus the ability to express that reasoning precisely enough that an agent can execute it without drift.

This is what the discipline is really about. SDD isn't a productivity hack — it's the rebalancing of where engineering judgment sits when AI writes most of the code.

Three Levels of Specification Rigor

Not every project needs the same rigor. There are three levels of SDD adoption, and choosing the right level for your context matters:

Spec-first is the lightest form. A well-thought-out spec is written first, then used in the AI-assisted development workflow. Code becomes the working artifact afterward, with the spec as the reference document. This is where most teams should start, and where most teams should stay.

Spec-anchored keeps the spec alive after the feature ships. The spec is kept even after the task is complete, for evolution and maintenance of the respective feature. When the feature changes, you update the spec first, then regenerate or modify the code to match.

Spec-as-source is the furthest end of the spectrum. The spec is the main source file over time — only the spec is edited by the human, the human never touches the code. Few teams operate here yet; the tooling is still maturing.

For most engineering teams, spec-first is the practical starting point. Everything below assumes that level.

The Four-Phase Workflow

The methodology converges on a four-phase loop that's universal across tools: Specify, Plan, Tasks, Implement.

Each phase has a specific job, and you don't move to the next one until the current one is validated.

Specify — A high-level description of what you're building and why. The agent generates a detailed specification with edge cases enumerated. Output: requirements.md.

Plan — Translate the spec into architectural decisions: data model, API contract, integration points, error handling strategy. Output: design.md.

Tasks — Break the plan into atomic, testable units. Each task is small enough to implement and validate in isolation. Output: tasks.md.

Implement — Execute tasks one at a time under the spec's constraints. You review focused, scoped diffs instead of thousand-line code dumps.

The phase gates matter. Skipping the plan because "the spec is clear enough" is the most common failure mode. Catching an architectural mistake at the planning stage costs minutes. Catching it after implementation costs hours or days.

Writing Requirements That AI Can Execute: EARS

The single biggest leverage point in any SDD workflow is how you write requirements. Vague requirements produce vague code, regardless of which AI you're using.

EARS — Easy Approach to Requirements Syntax — is the notation that solves this. It was developed at Rolls-Royce, first published in 2009, and has been adopted by many organisations across the world. EARS reduces or even eliminates common problems found in natural language requirements.

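It uses five simple patterns that eliminate ambiguity:

Ubiquitous: "The system shall [response]" (always active, no keyword)
Event-driven: "WHEN [trigger], the system shall [response]"
State-driven: "WHILE [state], the system shall [response]"
Unwanted behaviour: "IF [condition], THEN the system shall [response]"
Optional feature: "WHERE [feature is included], the system shall [response]"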

Compare these two requirement statements:

❌ Vague: "Handle invalid invites properly."
✅ EARS: "IF the invite link is expired or already used, THEN the system shall return HTTP 410 with the message 'This invite is no longer valid' and offer to request a new one."

The second is executable. The AI knows the status code, the message, and the response shape. The first is wishful thinking — the AI will guess, and the guess might not match yours.
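As a rough illustration of how mechanical the EARS patterns are, the sketch below classifies a requirement by its leading keyword. The function name `classifyEars` and the pattern labels are hypothetical choices for this example, and keyword matching is only a first-pass check: it says nothing about whether the requirement itself is complete or correct.

```typescript
// The five EARS patterns plus a fallback for statements that fit none of them.
type EarsPattern =
  | "ubiquitous"
  | "event-driven"
  | "state-driven"
  | "unwanted-behaviour"
  | "optional-feature"
  | "not-ears";

// Classify one requirement by its leading keyword. An EARS statement
// always contains "shall", so anything without it is rejected outright.
function classifyEars(requirement: string): EarsPattern {
  const text = requirement.trim().toLowerCase();
  if (!/\bshall\b/.test(text)) return "not-ears";
  if (/^if\b/.test(text)) {
    // IF must be paired with THEN in the unwanted-behaviour pattern.
    return /\bthen\b/.test(text) ? "unwanted-behaviour" : "not-ears";
  }
  if (/^when\b/.test(text)) return "event-driven";
  if (/^while\b/.test(text)) return "state-driven";
  if (/^where\b/.test(text)) return "optional-feature";
  return "ubiquitous";
}

console.log(classifyEars("Handle invalid invites properly.")); // prints: not-ears
console.log(
  classifyEars(
    "IF the invite link is expired or already used, THEN the system shall return HTTP 410."
  )
); // prints: unwanted-behaviour
```

A check like this could run over a draft requirements file to flag statements that need rewriting before the plan phase starts.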

The principle is straightforward: specs with checkable success criteria — "npm test passes" or "curl returns 200" — produce more reliable outcomes than specs with interpretable criteria like "well-structured code" or "good performance." "Every function must have a docstring. Maximum function length: 50 lines" is followed more consistently than "write clean, well-structured code."
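One way to make the checkable-versus-interpretable distinction concrete is a small lint over acceptance criteria. The sketch below is an assumption-laden illustration: the word list in `VAGUE_TERMS` and the function `findVagueTerms` are invented for this example, not taken from any standard, and a real team would tune the list to its own vocabulary.

```typescript
// Words that tend to signal an interpretable (uncheckable) criterion.
// The list is an illustrative assumption, not an established standard.
const VAGUE_TERMS = [
  "properly",
  "clean",
  "well-structured",
  "good",
  "fast",
  "robust",
  "user-friendly",
];

// Return the vague terms found in a criterion. An empty array means the
// criterion passed this deliberately shallow check.
function findVagueTerms(criterion: string): string[] {
  const text = criterion.toLowerCase();
  return VAGUE_TERMS.filter((term) => new RegExp("\\b" + term + "\\b").test(text));
}

console.log(findVagueTerms("write clean, well-structured code")); // two hits
console.log(findVagueTerms("Maximum function length: 50 lines")); // no hits
```

Passing the lint does not make a criterion checkable; it only catches the most common wording that leaves the agent guessing.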

The Worked Example: SDD with Claude Code

The methodology is universal. The implementation details vary by tool. Let's walk through a complete example using Claude Code — the patterns translate directly to Cursor, Copilot, Aider, or any other agent that supports project-level instructions.

Project Structure

This is where SDD becomes concrete. The structure below is a synthesis of patterns from production teams using Claude Code at scale:

your-project/
├── CLAUDE.md                    # Loaded every session — project rules
├── .claude/
│   ├── settings.json           # Permissions and hooks (checked into git)
│   ├── commands/               # Custom slash commands
│   │   ├── spec-create.md
│   │   ├── spec-design.md
│   │   ├── spec-tasks.md
│   │   └── spec-implement.md
│   └── skills/                 # Project-specific skills (optional)
│       └── review/
│           └── SKILL.md
├── .claudeignore               # Files to exclude from context
├── specs/                      # All feature specs live here
│   └── [feature-name]/
│       ├── requirements.md     # WHAT — user stories + EARS criteria
│       ├── design.md           # HOW — architecture and contracts
│       └── tasks.md            # WHEN — ordered, atomic tasks
├── src/
│   └── ...
└── tests/
    └── ...

A few principles drove this layout:

CLAUDE.md at the root. Claude Code reads it automatically every session. This is your project constitution — the rules and conventions that apply to all work. Keep it short. Under 50 lines is a good ceiling.

specs/[feature-name]/ for each feature. Each feature gets its own directory with three files: requirements.md (user stories with EARS acceptance criteria), design.md (technical architecture and implementation guidance), and tasks.md (incremental coding tasks). This three-file pattern is common across SDD frameworks for Claude Code.

Subdirectory CLAUDE.md files for module-specific context. You can drop additional CLAUDE.md files in subdirectories. Claude Code picks these up when it's working in that part of the tree, so you can give module-specific context without bloating the root file. Use this for things like component naming conventions in src/components/, business logic patterns in src/lib/, or auth requirements in src/api/.

.claudeignore to exclude noise. Generated files, build outputs, large fixtures — exclude anything that pollutes context without adding value.
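The three-file convention is simple enough to check mechanically, which is one way to enforce the phase gates described earlier. The sketch below is a minimal illustration: `specPathsFor` and `missingSpecFiles` are hypothetical helpers, the kebab-case rule is an assumption based on the `team-invites` example, and the caller is assumed to supply the list of files that exist (for instance from a directory listing).

```typescript
// The three spec files every feature directory is expected to contain.
const SPEC_FILES = ["requirements.md", "design.md", "tasks.md"];

// Build the expected spec paths for a feature. The kebab-case rule is an
// assumption chosen to match the "team-invites" example.
function specPathsFor(featureName: string): string[] {
  if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(featureName)) {
    throw new Error("Feature name must be kebab-case: " + featureName);
  }
  return SPEC_FILES.map((file) => "specs/" + featureName + "/" + file);
}

// Given the files that actually exist, list the spec files still missing.
function missingSpecFiles(featureName: string, existingPaths: string[]): string[] {
  return specPathsFor(featureName).filter(
    (path) => existingPaths.indexOf(path) === -1
  );
}

// Only requirements.md exists so far, so design and tasks are reported.
console.log(missingSpecFiles("team-invites", ["specs/team-invites/requirements.md"]));
```

A non-empty result is a cheap signal that implementation is starting before the plan or task breakdown has been written.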

Setting Up `CLAUDE.md`

Your CLAUDE.md should answer three questions for any agent that picks up your project: what is this, how do we work, and what should you never do. A practical starter:

# Project Overview 
SaaS analytics dashboard. Next.js 14 (App Router), TypeScript,
PostgreSQL with Drizzle ORM, Tailwind for styling.

# Workflow Rules 
- Always read the relevant spec in /specs before writing code
- Never exceed defined scope without asking first
- Write tests for every requirement
- Update the spec if requirements change mid-implementation
- Use AskUserQuestion when intent is unclear

# Code Standards 
- TypeScript strict mode, no `any`
- Functions max 50 lines, files max 300 lines
- Every public function needs a JSDoc comment
- Use existing patterns in /src/lib before creating new ones

# Testing 
- Unit tests with Vitest, co-located as *.test.ts
- Run `npm test` and `npm run lint` before declaring done

# Commands 
- npm run dev — start dev server
- npm run test:unit — unit tests
- npm run test:e2e — end-to-end tests
- npm run lint — linter

A few things to notice:

Be specific about commands. Don't write "we use Vitest." Write npm run test:unit and npm run test:e2e. The agent will run the command you give it, so give it the right one.

Call out the weird stuff. If there's some legacy decision that looks wrong but is intentional, say so. Otherwise the agent will "fix" it for you.

Keep it current. A stale CLAUDE.md actively misleads. If you switched from yarn to pnpm six months ago, update the file. It takes two minutes.

Writing the Spec: A Real Example

Let's walk through building a "team invites" feature. We'll use the interview pattern — instead of writing the spec yourself, have the agent interview you first.

Step 1: Start the spec session.

I want to build a team invites feature.

Interview me in detail using the AskUserQuestion tool. Ask about technical implementation, UI/UX, edge cases, concerns, and tradeoffs. Don't ask obvious questions — dig into the hard parts I might not have considered.

Keep interviewing until we've covered everything, then write a complete spec to specs/team-invites/requirements.md in EARS format.

The agent will surface questions you didn't think to ask:

"What happens if the invite email bounces?"
"Should the invite link be invalidated after first use?"
"What if the invitee's email matches a deleted account?"
"Do invites expire? After how long?"
"Should admins be able to revoke an invite before acceptance?"

Answer these. They become bugs three sprints later if you don't.

Step 2: The agent writes requirements.md.

# Feature: Team Invites

## Problem
Workspace admins need to invite teammates by email so they
can collaborate without sharing accounts.

## Users
- Admins (can send invites)
- Invitees (receive emails, click links to join)

## Functional Requirements (EARS)

### Invite Creation
THE SYSTEM SHALL restrict invite creation to users with
the `admin` role.

WHEN an admin submits an invite form with a valid email,
THE SYSTEM SHALL create an invite record and send an email
containing a signed link valid for 7 days.

IF the email already belongs to a workspace member,
THEN THE SYSTEM SHALL return "User already in workspace"
without creating an invite.

### Invite Acceptance
WHILE an invite is unaccepted and within 7 days of creation,
THE SYSTEM SHALL allow the invitee to complete signup.

IF the invite link is expired or already used,
THEN THE SYSTEM SHALL display "This invite is no longer
valid" and offer to request a new one.

## Non-goals
- Bulk invites (next iteration)
- Custom invite messages
- Role assignment beyond default member role

## Acceptance Criteria
- [ ] POST /invites returns 201 with invite ID
- [ ] Email arrives within 30 seconds
- [ ] Expired links return HTTP 410
- [ ] All cases covered by integration tests
- [ ] `npm test` passes
- [ ] `npm run lint` passes
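To see why requirements written this way are executable, here is a sketch of how the acceptance rules above might translate into code. Everything named here is an assumption for illustration: the `Invite` fields, the `checkInvite` function, the 200 status for a valid invite, and the choice to treat an invite at exactly seven days old as still valid.

```typescript
// Minimal invite record. Field names are assumptions for illustration.
interface Invite {
  email: string;
  createdAt: Date;
  usedAt: Date | null;
}

// "a signed link valid for 7 days", expressed in milliseconds.
const INVITE_TTL_MS = 7 * 24 * 60 * 60 * 1000;

interface InviteCheck {
  status: 200 | 410;
  message: string;
}

// IF the invite link is expired or already used, THEN respond 410 with the
// message from the spec. Otherwise the invitee may complete signup.
function checkInvite(invite: Invite, now: Date): InviteCheck {
  const ageMs = now.getTime() - invite.createdAt.getTime();
  const expired = ageMs > INVITE_TTL_MS; // exactly 7 days old still counts as valid
  const used = invite.usedAt !== null;
  if (expired || used) {
    return { status: 410, message: "This invite is no longer valid" };
  }
  return { status: 200, message: "Invite is valid" };
}

const sampleInvite: Invite = {
  email: "new.teammate@example.com",
  createdAt: new Date("2026-05-01T00:00:00Z"),
  usedAt: null,
};
console.log(checkInvite(sampleInvite, new Date("2026-05-09T00:00:00Z")).status); // prints: 410
```

Each branch maps to one line of the spec, so a reviewer can compare the two directly, and each acceptance criterion becomes an obvious test case.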

Step 3: Generate design.md.

After approving requirements, ask the agent to plan the architecture:

Read specs/team-invites/requirements.md.

Write specs/team-invites/design.md covering:
- Data model changes (tables, columns, indexes)
- API contract (endpoints, request/response shapes)
- Integration points (email service, auth)
- Error handling strategy
- Migration plan if touching existing data

Do not write code yet. Wait for my approval.

Read the design carefully before approving. This is your last cheap checkpoint.

Step 4: Generate tasks.md.

Based on requirements.md and design.md, write specs/team-invites/tasks.md as an ordered list of atomic tasks. Each task must:

- Be implementable in under 30 minutes
- Have clear acceptance criteria
- Be independently testable
- Reference which requirement(s) it satisfies

Order tasks by dependency.

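You'll get something like the following (the REQ-n identifiers assume the requirements in requirements.md have been numbered in order):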

# Tasks: Team Invites

## T1: Database migration
- Add `invites` table (id, email, workspace_id, token,
  expires_at, used_at, created_at)
- Index on (token), (email, workspace_id)
- Satisfies: REQ-1, REQ-2

## T2: Invite service
- `createInvite(email, workspaceId)` — returns invite + token
- `validateInvite(token)` — returns invite or throws
- `acceptInvite(token, userId)` — marks used, joins workspace
- Satisfies: REQ-2, REQ-4, REQ-5

## T3: API endpoints
- POST /invites (admin-only)
- GET /invites/:token
- POST /invites/:token/accept
- Satisfies: REQ-2, REQ-3, REQ-4

## T4: Email integration
- Send invite email with signed link via SendGrid
- Satisfies: REQ-2

## T5: Tests
- Integration tests covering all REQ-* paths
- Satisfies: All
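Ordering tasks by dependency is a topological sort, and it can be worth verifying that the agent's ordering is actually valid. The sketch below assumes each task carries an explicit `dependsOn` list, which the sample tasks.md does not include; that field, the `SpecTask` type, and the dependencies used in the usage lines are all hypothetical.

```typescript
interface SpecTask {
  id: string;
  // Hypothetical field: the sample tasks.md implies order only by position.
  dependsOn: string[];
}

// Repeatedly emit the tasks whose dependencies are all complete.
// Throws if a dependency is unknown or the tasks form a cycle.
function orderTasks(tasks: SpecTask[]): string[] {
  const ids = tasks.map((task) => task.id);
  tasks.forEach((task) =>
    task.dependsOn.forEach((dep) => {
      if (ids.indexOf(dep) === -1) {
        throw new Error(task.id + " depends on unknown task " + dep);
      }
    })
  );

  const done: string[] = [];
  let remaining = tasks.slice();
  while (remaining.length > 0) {
    const ready = remaining.filter((task) =>
      task.dependsOn.every((dep) => done.indexOf(dep) !== -1)
    );
    if (ready.length === 0) throw new Error("Dependency cycle among tasks");
    ready.forEach((task) => done.push(task.id));
    remaining = remaining.filter((task) => done.indexOf(task.id) === -1);
  }
  return done;
}

// Assumed dependencies for the team-invites tasks, listed out of order.
const inviteTasks: SpecTask[] = [
  { id: "T5", dependsOn: ["T3", "T4"] },
  { id: "T3", dependsOn: ["T2"] },
  { id: "T4", dependsOn: ["T2"] },
  { id: "T2", dependsOn: ["T1"] },
  { id: "T1", dependsOn: [] },
];
console.log(orderTasks(inviteTasks).join(" ")); // prints: T1 T2 T3 T4 T5
```

A cycle or an unknown dependency usually means the task breakdown needs another pass before implementation begins.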

Implementation: One Task at a Time

Here's a critical move that separates serious SDD users from casual ones: start a fresh session for implementation.

The same principle applies if you delegate tasks to subagents: each subagent gets a fresh context window focused entirely on its specific task, reads what it needs, implements, and returns. This means the main agent won't run out of context even for larger refactors with dozens of tasks.

Context bleed from the spec phase is the most common cause of drift during execution. The spec should carry the intent, not the conversation history.

# Terminal session 1: Spec phase
$ claude
> [interview, write requirements.md, design.md, tasks.md]
$ exit

# Terminal session 2: Implementation (FRESH context)
$ claude
> Read specs/team-invites/. Implement T1.
  Stay strictly within scope. Run tests when done.
  Commit with message "feat(invites): T1 - database migration".

# [Claude implements T1, runs tests, commits]

> Implement T2.
> Implement T3.
# ...

This is where SDD pays off. Instead of reviewing a 1000-line PR, you review focused 50–100 line changes that map cleanly to a numbered task and a numbered requirement.
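The mapping from commit to task only helps if the commit messages stay consistent. As a sketch, the check below parses messages shaped like the one in the session above ("feat(invites): T1 - database migration"). The `TaskCommit` type and `parseTaskCommit` function are hypothetical, and the regular expression encodes only that one observed shape, not any formal convention.

```typescript
interface TaskCommit {
  type: string;
  scope: string;
  taskId: string;
  summary: string;
}

// Matches messages shaped like: feat(invites): T1 - database migration
const TASK_COMMIT = /^([a-z]+)\(([a-z0-9-]+)\): (T\d+) - (.+)$/;

// Return the parsed parts, or null when the message does not name a task.
function parseTaskCommit(message: string): TaskCommit | null {
  const match = TASK_COMMIT.exec(message);
  if (match === null) return null;
  return {
    type: match[1],
    scope: match[2],
    taskId: match[3],
    summary: match[4],
  };
}

const parsedCommit = parseTaskCommit("feat(invites): T1 - database migration");
console.log(parsedCommit === null ? "no task" : parsedCommit.taskId); // prints: T1
```

Wired into a commit hook or CI step, a null result would flag a change that cannot be traced back to an entry in tasks.md.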

Custom Slash Commands

Once you've done this a few times, encode the workflow in custom commands. Drop these in .claude/commands/:

.claude/commands/spec-create.md:

Create a new feature spec.

Usage: /spec-create [feature-name] "[brief description]"

Steps:
1. Create directory specs/[feature-name]/
2. Interview the user using AskUserQuestion about technical
   implementation, UI/UX, edge cases, concerns, and tradeoffs
3. Write requirements.md in EARS format
4. Confirm before moving to design phase

.claude/commands/spec-design.md:

Generate the design document for an existing spec.

Usage: /spec-design [feature-name]

Steps:
1. Read specs/[feature-name]/requirements.md
2. Write specs/[feature-name]/design.md covering:
   - Data model
   - API contract
   - Integration points
   - Error handling
3. Do NOT write code. Wait for approval.

Now the workflow is invokable as /spec-create team-invites "team invitation flow". The pattern survives team handoffs because it's encoded in the repo, not in someone's head.

What Code Review Looks Like Under SDD

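Code review changes shape under SDD. Instead of a single review checking everything, you have two layers: a review of the spec documents before implementation, and a lighter review of each diff against the spec.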

Review the spec, not the diffs. The diffs should be obviously correct given the spec. If they're not, the bug is either in the spec or the spec wasn't followed — both of which are easier to spot than reading 400 lines of generated code line by line.

The transformation is significant: instead of being interrupted dozens of times during implementation, you review three documents upfront (requirements, design, tasks) and then let the implementation agent work. Your approval count during implementation drops because the important decisions are already made.

Recovery When Things Go Wrong

Here's a real benefit nobody mentions until they've experienced it. When a session goes sideways or context gets polluted, you don't lose everything, because you have a document that captures the full intent and design decisions. The spec acts as a recovery point. If the implementation hits errors, open a new chat, point the agent at the spec document, and paste the error. The agent can pick up from there without you rebuilding context or re-explaining the architecture.

Without a spec, an interrupted session means starting over. With a spec, you resume.

When NOT to Use SDD

Be honest about where it doesn't earn its keep. Skip SDD entirely when the task is a single-file bug fix, a formatting change, or a well-understood CRUD operation that a single prompt can handle.

A useful heuristic: if you'd be annoyed to have the agent interpret requirements differently than you meant, write the spec. If you could fix any drift in the output with a quick follow-up prompt, just prompt directly.

The discipline only earns its keep on features complex enough to deserve a real code review. Don't write a 30-page spec for a button color change.

The Failure Modes Teams Underestimate

SDD is not a silver bullet. There are four predictable ways teams fail at it:

Failure 1: Spec as bureaucracy. If writing the spec feels heavier than writing the code, the team will quietly stop and revert to prompt-driven coding. The fix is investing in templates, examples, and slash commands that make spec authoring fast.

Failure 2: Over-specification. A spec that prescribes implementation details rather than contracts removes the leverage AI provides. "Use Redis with 60-second TTL" is implementation. "The system shall cache invite lookups; cache invalidation must occur within 60 seconds of mutation" is a spec. Let the agent choose Redis.

Failure 3: Spec/code drift. When engineers patch code without updating the spec, the spec becomes a lie and the team loses the regeneration property that makes SDD valuable. You need CI checks that fail when code and spec diverge — or, at minimum, a team norm of "update the spec before merging."

Failure 4: Cultural resistance. Engineers who built their identity around writing elegant code can feel displaced when the elegant artifact is a spec. The teams that navigate this well frame SDD as elevation rather than replacement — engineers are designing systems at a higher level, not being demoted to prompt operators.
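For the drift described under Failure 3, one crude but workable CI check is to confirm that every requirement identifier in the spec is mentioned somewhere in the tests. The sketch below assumes requirements carry IDs of the form REQ-n and that tests cite those IDs in their titles or comments. Both conventions, and the function names, are assumptions for illustration. It detects missing coverage, not behavioural divergence.

```typescript
// Collect every distinct "REQ-<n>" identifier mentioned in a piece of text.
function requirementIds(text: string): string[] {
  const found: string[] = [];
  const pattern = /\bREQ-\d+\b/g;
  let match = pattern.exec(text);
  while (match !== null) {
    if (found.indexOf(match[0]) === -1) found.push(match[0]);
    match = pattern.exec(text);
  }
  return found;
}

// Requirements named in the spec that no test source mentions.
function untestedRequirements(specText: string, testSources: string[]): string[] {
  const tested = requirementIds(testSources.join("\n"));
  return requirementIds(specText).filter((id) => tested.indexOf(id) === -1);
}

const specExcerpt = "REQ-1 admin only. REQ-2 create invite. REQ-3 reject existing members.";
const testExcerpts = [
  'it("REQ-1: rejects non-admin users", () => {});',
  'it("REQ-3: returns an error for existing members", () => {});',
];
console.log(untestedRequirements(specExcerpt, testExcerpts)); // only REQ-2 is unreferenced
```

Failing the build on a non-empty result turns "update the spec before merging" from a norm into a gate, at the cost of requiring disciplined ID usage in both files.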

A Starter Checklist

## Before any non-trivial feature
- [ ] Created /specs/[feature]/ folder
- [ ] Wrote requirements.md with EARS-formatted requirements
- [ ] Defined explicit non-goals
- [ ] Listed acceptance criteria as checkable items
- [ ] Reviewed CLAUDE.md is current

## Before implementation
- [ ] design.md exists and is approved
- [ ] tasks.md breaks work into <30-min units
- [ ] Started a fresh AI session

## After each task
- [ ] Code references the requirement(s) it satisfies
- [ ] Tests pass
- [ ] Committed with clear message
- [ ] Updated spec if requirements changed

## Before merging
- [ ] All tasks complete
- [ ] All acceptance criteria met
- [ ] Spec and code agree

The Bottom Line

Spec-Driven Development is what AI-assisted engineering looks like when it grows up. It accepts that AI will write most of the code, and responds by making the human contribution sharper, more leveraged, and more durable.

Most teams using AI coding agents are flying blind. They prompt, receive output, patch errors, and repeat — a cycle that degrades in quality as codebase complexity grows. The root cause is not the AI. It's the absence of a disciplined framework for communicating intent to the agent.

The methodology is universal. The example here used Claude Code because it has clean primitives — CLAUDE.md, slash commands, the Tasks system — that map directly to SDD phases. The same patterns work in Cursor with .cursorrules, in Copilot with custom instructions, in Aider with CONVENTIONS.md. The tool is interchangeable. The discipline is not.

The engineers shipping the best AI-assisted software aren't the ones with the best prompts. They're the ones with the best specs.

Start your next non-trivial feature by creating specs/feature-name/requirements.md. The rest follows from there.

jaugusto.dev