A methodology for AI-native software engineering. One spec. Two agents. Complete traceability.
MARKDOWN · NO FRAMEWORK · NO SDK · OPEN PROCESS
01
Context
WHAT IS SDD?
Spec Driven Development has roots in formal specification, design-by-contract, and the broader shift toward treating specifications as executable artefacts. StrongDM, Anthropic, and others are already building production software from specs using AI agents.
This presentation introduces a practical method for putting SDD into practice.
The unique contribution is provenance — the mechanism that makes specification-driven workflows auditable, traceable, and verifiable. How agents communicate through documentation. How reasoning is captured as a byproduct of building. How that creates compliance-ready evidence without additional effort.
SDD EXISTS
The concept of building software from specifications using AI agents is established and in production today.
WHAT'S NEW HERE
A concrete, open method — the provenance chain, the builder-tester separation, and the templates to implement it today.
WHAT YOU'LL LEAVE WITH
A workflow, three templates, and the understanding of why provenance is the core innovation.
02
The Problem
NOT ALL AI CODING IS EQUAL.
Dan Shapiro's Five Levels of Vibe Coding maps where teams actually operate — vs where they think they are.
← 90% of developers are here
03
The Opportunity
SDD GETS YOU TO LEVEL 3–4.
Level 5 — the fully autonomous dark factory — that's a whole different problem set. Different tooling, different organisational structure, different economics.
But the method described in this deck can move you from L0–2 to L3–4 today. Write specs. Delegate to agents. Verify through provenance. That's the transition from writing code to directing agents.
L0–2 — WHERE YOU ARE
AI suggests, you accept or reject. You write code alongside an AI assistant. The human is still the bottleneck.
L3–4 — WHERE SDD TAKES YOU
You specify, the agent builds, a separate agent verifies. Provenance captures the reasoning. The human writes specs and reviews results.
L5 — DARK FACTORY
Fully autonomous. No humans in the loop. Digital twins, external scenarios, thousand-dollar daily compute budgets. A different talk.
04
The Insight
IT'S ALL ABOUT THE SPEC.
With agentic AI, the spec you hand the agent is the implementation instruction. The agent reads your spec and builds the software. If the spec is structured well enough, it also defines the verification scenarios — not as a side effect, but inherently.
The spec is the single source of truth for what the software does, how it gets built, and how it gets verified.
THE OLD BOTTLENECK
Implementation speed. Can we build it fast enough? Can we hire enough engineers?
THE NEW BOTTLENECK
Specification quality. Can we describe what needs to exist precisely enough that agents can build it?
The skill that matters now is the ability to specify — clearly, completely, and verifiably.
05
The Architecture
THE INFINITY LOOP
The spec sits at the top. Scenarios sit at the bottom. Provenance is the crossing point where the two agents communicate. Code is the canonical context — the reality everything else orbits.
LEFT LOOP — BUILDER
Reads spec → Writes code → Produces provenance
RIGHT LOOP — TESTER
Reads spec + provenance → Writes scenarios & tests → Appends to provenance
CROSSING — PROVENANCE
Both agents read and write. Neither talks directly. The document is the interface.
Self-describing. Any agent reads provenance to understand why, reads code to understand what.
06
The Spec — What It Contains
THE SPEC SPECIFIES EVERYTHING.
The spec defines both the product and the process that produces the product. Not just what to build — but who does the work, what each worker is responsible for, and how they communicate.
REQUIREMENTS
Functional, non-functional, constraints, assumptions. Specific enough that an agent can implement them and a separate agent can verify them.
ARCHITECTURE
Component boundaries, data flow, interfaces. Enough for the builder to make decisions and enough for the tester to know what to probe.
AGENT ROLES
Builder and tester roles defined in the spec itself. At prompt time: "You are the builder. Here is the spec. Do your job."
PROCESS
What each agent reads, what it produces, where it writes. The spec is the orchestrator. No LangGraph. No supervisor agent. The workflow is the spec.
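As one possible shape, a minimal spec skeleton might look like this. Section names and requirement IDs are illustrative (FR-007 is borrowed from the compliance example later in this deck), not a prescribed format:

```markdown
# Spec: <project name>

## Requirements
- FR-007: All protected routes require authentication.
- NFR-001: Non-functional constraints, stated verifiably.

## Architecture
- Component boundaries, data flow, interfaces.
- Enough for the builder to decide and the tester to probe.

## Agent Roles
- **Builder**: reads this spec, writes code, records provenance as it works. Does not write tests.
- **Tester**: reads this spec and the provenance (never the code), writes prose scenarios, then derives executable tests.

## Process
- Builder commits spec + code + provenance as one atomic unit.
- Tester appends findings to the provenance. Never overwrites.
```

The spec file itself carries the workflow. No orchestration engine reads it; the agents do.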
07
The Builder Agent
BUILD AND SHOW YOUR WORKING.
The builder doesn't just write code. It produces evidence of how it interpreted the spec — every assumption, every ambiguity, every decision. It's not marking its own homework. It's handing in its homework with its working shown.
01
Read the spec
Full specification, prerequisites, current state
02
Build the software
Implement as specified
03
Write provenance as you go
Assumptions, ambiguities, decisions — not after the fact
04
Commit spec + code + provenance
One atomic unit. Never separated.
✕
Do not write tests
That is not your role
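A builder provenance entry might look like this minimal sketch. The field names are illustrative; the content (A1, the 30-minute expiry, FR-007) comes from the example used throughout this deck:

```markdown
## Provenance — Builder

### A1 — Assumption: token expiry duration
- **Requirement**: FR-007 (all protected routes require authentication)
- **Spec says**: authentication required; silent on expiry duration.
- **Decision**: JWT validation with a 30-minute expiry.
- **Reasoning**: conservative default; flagged for spec clarification.
```

Written as the decision is made, not reconstructed afterwards.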
08
The Testing Agent
CHALLENGE EVERY ASSUMPTION.
A separate agent because the builder has blind spots. If the builder misunderstood the spec, it will write code that reflects the misunderstanding and tests that confirm it. Everything passes. Everything's wrong. The testing agent reads the spec and provenance — never the code — and finds the daylight between them.
01
Read the spec
Full specification — the authority
02
Read the provenance
What the builder claims it did and why
03
Find the daylight
Gaps, assumptions, ambiguities, silences
04
Write prose scenarios
Plain language — what's being tested and why
05
Implement tests from scenarios
Executable code, derived from the prose
06
Update the provenance
Findings, results, recommendations — append, never overwrite
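The tester's appended entry might look like this sketch, again with illustrative field names. It references the builder's assumption A1 and scenario S-003 from the worked example in this deck:

```markdown
## Provenance — Tester (appended)

### C1 — Challenge to assumption A1
- **Finding**: spec is silent on expiry; builder chose 30 minutes.
- **Scenario**: S-003 (token expiry handling) written to probe this.
- **Result**: FAIL — expired token returned 200; no expiry validation found.
- **Recommendation**: implement the expiry check; update the spec with an explicit duration.
```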
09
Provenance
THE REASONING RECORD. NOT A MAP.
Code is self-describing — any agent can read a codebase and understand its structure, patterns, and dependencies. What the code can't tell you is why. Why is the timeout 30 minutes? Why this library and not that one? Why does this module exist at all? That's the provenance. The reasoning layer that answers the questions code can't answer about itself.
CODE — THE WHAT
Self-describing context
Any agent can read and navigate it
The canonical reality of the system
What exists right now
PROVENANCE — THE WHY
Decisions made, assumptions held
Ambiguities interpreted
Layered: builder writes, tester appends
Why it exists this way
TOGETHER
Agent reads code → understands what
Agent reads provenance → understands why
Context window rebuilt from scratch each session
Provenance is pre-loaded understanding
10
The Testing Agent
THE CROSS-EXAMINATION.
The testing agent never sees the code. It has two inputs: the spec (what was intended) and the provenance (what the builder says it did). Its job is to find the daylight between those two documents.
Gaps
Requirements the provenance doesn't address
Assumptions
Decisions the builder made where the spec was silent — primary targets
Ambiguities
Places the builder interpreted unclear requirements
Silences
Things the builder didn't mention at all — red flags
11
Scenarios — Prose First
PROSE FIRST. CODE SECOND.
The testing agent writes a markdown scenario — plain language explaining what's being tested and why — before it writes a single line of test code. The code is derived from the prose, not the other way around.
A product owner can read the scenario. A regulator can read it. A client who knows nothing about code can say "yes, that's the right question to ask" or "actually, don't test for that — update the spec."
S-003: TOKEN EXPIRY HANDLING
TRIGGERED BY: ASSUMPTION A1
The spec requires authentication on all protected routes. The provenance states the builder implemented JWT validation with a 30-minute expiry, but the spec is silent on expiry duration.
EXPECTS:
Expired tokens return 401
Expiry period is documented
FAILS IF:
Expired token returns 200
No expiry validation exists
TEST: tests/auth/token-expiry.test.ts#L12
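A hedged sketch of what the derived test might contain. `handleProtectedRoute` is a hypothetical stand-in for the route under test, since the deck doesn't show the implementation; a real test would import the actual handler:

```typescript
// Sketch of a test derived from scenario S-003 (token expiry handling).
// handleProtectedRoute is illustrative, not the deck's actual code.

const EXPIRY_MS = 30 * 60 * 1000; // builder's assumption A1: 30-minute expiry

// Minimal stand-in: reject tokens older than the expiry window.
function handleProtectedRoute(tokenIssuedAt: number, nowMs: number): number {
  return nowMs - tokenIssuedAt > EXPIRY_MS ? 401 : 200;
}

const now = Date.now();

// EXPECTS: expired tokens return 401
console.log(handleProtectedRoute(now - 31 * 60 * 1000, now)); // 401

// FAILS IF: an expired token returns 200
console.log(handleProtectedRoute(now, now)); // fresh token → 200
```

The point is the direction of derivation: the prose scenario came first, and the code is mechanically traceable back to it.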
12
The Loop — Closing It
FAILING TESTS ARE WORK ORDERS, NOT BUG REPORTS.
A failing test with provenance is a diagnosis. The builder doesn't get "line 47 assertion failed." It gets the prose scenario, the gap between spec and provenance, and a recommendation for what to fix.
FAIL
Scenario S-003 fails. Token expiry not validated.
→
READ
Builder reads provenance. Tester's findings explain what and why.
→
FIX
Builder fixes code. Updates provenance with new entry.
→
PASS
Tester re-runs. Scenario passes. Loop continues.
No human touched the code. No human wrote a test. No human triaged a bug. A human wrote a spec. Everything else is derived.
13
The Provenance Chain
FIVE ARTEFACTS. FIVE PURPOSES. COMPLETE LINEAGE.
SPEC
Intent
What should exist and why.
CODE
Reality
Self-describing. Canonical context.
PROVENANCE
Reasoning
Why it's this way. The decisions made.
SCENARIOS
Challenge
Plain language. What's being verified.
TESTS
Execution
Derived from scenarios. Pass or fail.
The spec describes what the code should be. The code describes itself. The provenance explains why the code is the way it is. The scenarios challenge the code based on the gap between spec and provenance. The tests execute against the code.
Code is the reality every other artefact exists in relation to. Provenance is the reasoning that makes the code navigable.
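One way the five artefacts might sit in a repository. The layout is illustrative, not prescribed by the method:

```
repo/
├── spec.md           # Intent — what should exist and why
├── src/              # Reality — the code, self-describing
├── provenance.md     # Reasoning — builder writes, tester appends
├── scenarios/        # Challenge — prose, e.g. S-003.md
└── tests/            # Execution — derived from scenarios
```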
14
Why It's Different
MARKDOWN AND A PROCESS.
THAT'S IT.
No SDK. No framework. No vendor lock-in. No orchestration engine. Anyone with access to an AI agent and a markdown editor can implement SDD today. The entire methodology fits in three templates.
MULTI-AGENT ORCHESTRATION
SDD uses two roles talking through documents. If you need dozens of agents coordinating dynamically in real-time, LangGraph, CrewAI, and AutoGen solve that problem. Most teams aren't there yet.
REAL-TIME AGENT MONITORING
SDD tests the software, not the agent. If you need continuous evaluation of agent behaviour in production — drift, hallucination, alignment — Arize, Braintrust, and Bloom address that.
DYNAMIC TOOL DISCOVERY
SDD specs are static documents. If your agents need to discover and compose tools at runtime, MCP servers and tool registries are built for that.
LONG-TERM AGENT MEMORY
SDD uses provenance as persistent context, but it doesn't give you vector stores, RAG pipelines, or cross-session memory systems.
These technologies solve real problems. But they solve advanced problems. SDD sharpens the thinking that makes every other tool more effective.
Start with the spec. Graduate to complexity when the problem demands it.
17
Live Demo
LIVE
DEMO
SDD in action. One spec. Two agents. Provenance, scenarios, and tests — generated live.
01 — THE SPEC
A real spec with agent roles, requirements, and constraints. Markdown. Nothing else.
02 — THE BUILDER
Hand the spec to the builder agent. Watch it build and produce provenance — assumptions, decisions, ambiguities — in real time.
03 — THE TESTER
Hand the spec and provenance to a separate agent. Watch it find the daylight, write prose scenarios, and generate executable tests.
THE BOTTLENECK HAS MOVED FROM CODE TO SPECIFICATION.
SDD IS THE METHOD.
One spec. Two agents. Five artefacts. Complete traceability from intent to verification. No framework required. Start today.
Kevin Ryan & Associates
kevinryan.io
sddbook.com
19
Bonus Content
BONUS
CONTENT
Provenance, audit, and regulation — why SDD is a compliance asset for SOC 2, ISO 27001, and the EU AI Act.
SOC 2 · ISO 27001 · EU AI ACT · AUGUST 2026
20
Provenance & Compliance
YOUR AUDITOR'S AI IS GOING TO ASK HOW THIS WAS BUILT.
When a human builds software, the reasoning exists in their head, in Slack threads, in PR comments. You can reconstruct it — badly — after the fact. When an agent builds software, the reasoning exists in the context window. The session ends. The reasoning evaporates. Unless you capture it.
You cannot retrospectively create provenance for decisions that were never documented. SDD means you never have to.
SOC 2
Change Management Controls
Auditors require evidence that changes are authorised, documented, and traceable. SDD's provenance chain is that evidence — spec to code to verification.
ISO 27001
Annex A — Secure Development
Requires documented development procedures, separation of duties, and design review records. SDD's builder-tester separation and layered provenance satisfy this structurally.
EU AI ACT
Articles 11, 12, 19 — Full Force August 2026
Technical documentation, automatic record-keeping, and retained logs for high-risk AI systems. Provenance is all three — generated in real time, not written after the fact.
21
Provenance & Compliance
COMPLIANCE AS A BYPRODUCT. NOT A PROJECT.
Most organisations will try to bolt compliance documentation onto AI-built systems after the fact. They will hire consultants to retrospectively construct the design history that auditors and regulators demand. SDD produces that documentation as an inherent part of the build. You don't retrofit it. It already exists.
AUDITOR ASKS
Why does this system behave this way?
→
SCENARIO
This test exists because provenance entry C1 challenged assumption A1.
→
PROVENANCE
A1 was assumed because the spec was silent on expiry. Builder chose 30 minutes.
→
SPEC
Requirement FR-007: all protected routes require authentication.
€35M
EU AI Act — or 7% of turnover
Prohibited practices
SOC 2 FAIL
Lost contracts, lost clients
Unrecoverable trust damage
ISO 27001 NC
Non-conformity finding
Certification at risk
SDD solves the documentation and traceability problem across all three frameworks — the one that requires you to have been recording decisions from the start.