A methodology for AI-native software engineering. One spec. Two agents. Complete traceability.
MARKDOWN · NO FRAMEWORK · NO SDK · OPEN PROCESS
01
Context
WHAT IS SDD?
Spec Driven Development has roots in formal specification, design-by-contract, and the broader shift toward treating specifications as executable artefacts. StrongDM, Anthropic, and others are already building production software from specs using AI agents.
This presentation introduces a practical method for putting SDD into practice.
The unique contribution is provenance — the mechanism that makes specification-driven workflows auditable, traceable, and verifiable. How agents communicate through documentation. How reasoning is captured as a byproduct of building. How that creates compliance-ready evidence without additional effort.
SDD EXISTS
The concept of building software from specifications using AI agents is established and in production today.
WHAT'S NEW HERE
A concrete, open method — the provenance chain, the builder-tester separation, and the templates to implement it today.
WHAT YOU'LL LEAVE WITH
A workflow, three templates, and the understanding of why provenance is the core innovation.
02
The Problem
NOT ALL AI CODING IS EQUAL.
Dan Shapiro's Five Levels of Vibe Coding maps where teams actually operate — vs where they think they are.
← 90% of developers are here
03
The Opportunity
SDD GETS YOU TO LEVEL 3–4.
Level 5 — the fully autonomous dark factory — that's a whole different problem set. Different tooling, different organisational structure, different economics.
But the method described in this deck can move you from L0–2 to L3–4 today. Write specs. Delegate to agents. Verify through provenance. That's the transition from writing code to directing agents.
L0–2 — WHERE YOU ARE
AI suggests, you accept or reject. You write code alongside an AI assistant. The human is still the bottleneck.
L3–4 — WHERE SDD TAKES YOU
You specify, the agent builds, a separate agent verifies. Provenance captures the reasoning. The human writes specs and reviews results.
L5 — DARK FACTORY
Fully autonomous. No humans in the loop. Digital twins, external scenarios, thousand-dollar daily compute budgets. A different talk.
04
The Insight
IT'S ALL ABOUT THE SPEC.
With agentic AI, the spec you hand the agent is the implementation instruction. The agent reads your spec and builds the software. If the spec is structured well enough, it also defines the verification scenarios — not as a side effect, but inherently.
The spec is the single source of truth for what the software does, how it gets built, and how it gets verified.
THE OLD BOTTLENECK
Implementation speed. Can we build it fast enough? Can we hire enough engineers?
THE NEW BOTTLENECK
Specification quality. Can we describe what needs to exist precisely enough that agents can build it?
The skill that matters now is the ability to specify — clearly, completely, and verifiably.
05
The Architecture
THE INFINITY LOOP
The spec sits at the top. Scenarios sit at the bottom. Provenance is the crossing point where the two agents communicate. Code is the canonical context — the reality everything else orbits.
LEFT LOOP — BUILDER
Reads spec → Writes code → Produces provenance
RIGHT LOOP — TESTER
Reads spec + provenance → Writes scenarios & tests → Appends to provenance
CROSSING — PROVENANCE
Both agents read and write. Neither talks directly. The document is the interface.
Self-describing. Any agent reads provenance to understand why, reads code to understand what.
06
The Spec — What It Contains
THE SPEC SPECIFIES EVERYTHING.
The spec defines both the product and the process that produces the product. Not just what to build — but who does the work, what each worker is responsible for, and how they communicate.
REQUIREMENTS
Functional, non-functional, constraints, assumptions. Specific enough that an agent can implement them and a separate agent can verify them.
ARCHITECTURE
Component boundaries, data flow, interfaces. Enough for the builder to make decisions and enough for the tester to know what to probe.
AGENT ROLES
Builder and tester roles defined in the spec itself. At prompt time: "You are the builder. Here is the spec. Do your job."
PROCESS
What each agent reads, what it produces, where it writes. The spec is the orchestrator. No LangGraph. No supervisor agent. The workflow is the spec.
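As one possible shape, a minimal spec skeleton might look like this. Section names and requirement IDs are illustrative (FR-007 is borrowed from the compliance example later in this deck), not a prescribed format:

```markdown
# Spec: <project name>

## Requirements
- FR-007: All protected routes require authentication.
- NFR-001: Non-functional constraints, stated verifiably.

## Architecture
- Component boundaries, data flow, interfaces.
- Enough for the builder to decide and the tester to probe.

## Agent Roles
- **Builder**: reads this spec, writes code, records provenance as it works. Does not write tests.
- **Tester**: reads this spec and the provenance (never the code), writes prose scenarios, then derives executable tests.

## Process
- Builder commits spec + code + provenance as one atomic unit.
- Tester appends findings to the provenance. Never overwrites.
```

The spec file itself carries the workflow. No orchestration engine reads it; the agents do.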
07
The Builder Agent
BUILD AND SHOW YOUR WORKING.
The builder doesn't just write code. It produces evidence of how it interpreted the spec — every assumption, every ambiguity, every decision. It's not marking its own homework. It's handing in its homework with its working shown.
01
Read the spec
Full specification, prerequisites, current state
02
Build the software
Implement as specified
03
Write provenance as you go
Assumptions, ambiguities, decisions — not after the fact
04
Commit spec + code + provenance
One atomic unit. Never separated.
✕
Do not write tests
That is not your role
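A builder provenance entry might look like this minimal sketch. The field names are illustrative; the content (A1, the 30-minute expiry, FR-007) comes from the example used throughout this deck:

```markdown
## Provenance — Builder

### A1 — Assumption: token expiry duration
- **Requirement**: FR-007 (all protected routes require authentication)
- **Spec says**: authentication required; silent on expiry duration.
- **Decision**: JWT validation with a 30-minute expiry.
- **Reasoning**: conservative default; flagged for spec clarification.
```

Written as the decision is made, not reconstructed afterwards.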
08
The Testing Agent
CHALLENGE EVERY ASSUMPTION.
A separate agent because the builder has blind spots. If the builder misunderstood the spec, it will write code that reflects the misunderstanding and tests that confirm it. Everything passes. Everything's wrong. The testing agent reads the spec and provenance — never the code — and finds the daylight between them.
01
Read the spec
Full specification — the authority
02
Read the provenance
What the builder claims it did and why
03
Find the daylight
Gaps, assumptions, ambiguities, silences
04
Write prose scenarios
Plain language — what's being tested and why
05
Implement tests from scenarios
Executable code, derived from the prose
06
Update the provenance
Findings, results, recommendations — append, never overwrite
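The tester's appended entry might look like this sketch, again with illustrative field names. It references the builder's assumption A1 and scenario S-003 from the worked example in this deck:

```markdown
## Provenance — Tester (appended)

### C1 — Challenge to assumption A1
- **Finding**: spec is silent on expiry; builder chose 30 minutes.
- **Scenario**: S-003 (token expiry handling) written to probe this.
- **Result**: FAIL — expired token returned 200; no expiry validation found.
- **Recommendation**: implement the expiry check; update the spec with an explicit duration.
```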
09
Provenance
THE REASONING RECORD. NOT A MAP.
Code is self-describing — any agent can read a codebase and understand its structure, patterns, and dependencies. What the code can't tell you is why. Why is the timeout 30 minutes? Why this library and not that one? Why does this module exist at all? That's the provenance. The reasoning layer that answers the questions code can't answer about itself.
CODE — THE WHAT
Self-describing context
Any agent can read and navigate it
The canonical reality of the system
What exists right now
PROVENANCE — THE WHY
Decisions made, assumptions held
Ambiguities interpreted
Layered: builder writes, tester appends
Why it exists this way
TOGETHER
Agent reads code → understands what
Agent reads provenance → understands why
Context window rebuilt from scratch each session
Provenance is pre-loaded understanding
10
The Testing Agent
THE CROSS-EXAMINATION.
The testing agent never sees the code. It has two inputs: the spec (what was intended) and the provenance (what the builder says it did). Its job is to find the daylight between those two documents.
Gaps
Requirements the provenance doesn't address
Assumptions
Decisions the builder made where the spec was silent — primary targets
Ambiguities
Places the builder interpreted unclear requirements
Silences
Things the builder didn't mention at all — red flags
11
Scenarios — Prose First
PROSE FIRST. CODE SECOND.
The testing agent writes a markdown scenario — plain language explaining what's being tested and why — before it writes a single line of test code. The code is derived from the prose, not the other way around.
A product owner can read the scenario. A regulator can read it. A client who knows nothing about code can say "yes, that's the right question to ask" or "actually, don't test for that — update the spec."
S-003: TOKEN EXPIRY HANDLING
TRIGGERED BY: ASSUMPTION A1
The spec requires authentication on all protected routes. The provenance states the builder implemented JWT validation with a 30-minute expiry, but the spec is silent on expiry duration.
EXPECTS:
Expired tokens return 401
Expiry period is documented
FAILS IF:
Expired token returns 200
No expiry validation exists
TEST: tests/auth/token-expiry.test.ts#L12
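A hedged sketch of what the derived test might contain. `handleProtectedRoute` is a hypothetical stand-in for the route under test, since the deck doesn't show the implementation; a real test would import the actual handler:

```typescript
// Sketch of a test derived from scenario S-003 (token expiry handling).
// handleProtectedRoute is illustrative, not the deck's actual code.

const EXPIRY_MS = 30 * 60 * 1000; // builder's assumption A1: 30-minute expiry

// Minimal stand-in: reject tokens older than the expiry window.
function handleProtectedRoute(tokenIssuedAt: number, nowMs: number): number {
  return nowMs - tokenIssuedAt > EXPIRY_MS ? 401 : 200;
}

const now = Date.now();

// EXPECTS: expired tokens return 401
console.log(handleProtectedRoute(now - 31 * 60 * 1000, now)); // 401

// FAILS IF: an expired token returns 200
console.log(handleProtectedRoute(now, now)); // fresh token → 200
```

The point is the direction of derivation: the prose scenario came first, and the code is mechanically traceable back to it.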
12
The Loop — Closing It
FAILING TESTS ARE WORK ORDERS, NOT BUG REPORTS.
A failing test with provenance is a diagnosis. The builder doesn't get "line 47 assertion failed." It gets the prose scenario, the gap between spec and provenance, and a recommendation for what to fix.
FAIL
Scenario S-003 fails. Token expiry not validated.
→
READ
Builder reads provenance. Tester's findings explain what and why.
→
FIX
Builder fixes code. Updates provenance with new entry.
→
PASS
Tester re-runs. Scenario passes. Loop continues.
No human touched the code. No human wrote a test. No human triaged a bug. A human wrote a spec. Everything else is derived.
13
The Provenance Chain
FIVE ARTEFACTS. FIVE PURPOSES. COMPLETE LINEAGE.
SPEC
Intent
What should exist and why.
CODE
Reality
Self-describing. Canonical context.
PROVENANCE
Reasoning
Why it's this way. The decisions made.
SCENARIOS
Challenge
Plain language. What's being verified.
TESTS
Execution
Derived from scenarios. Pass or fail.
The spec describes what the code should be. The code describes itself. The provenance explains why the code is the way it is. The scenarios challenge the code based on the gap between spec and provenance. The tests execute against the code.
Code is the reality every other artefact exists in relation to. Provenance is the reasoning that makes the code navigable.
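One way the five artefacts might sit in a repository. The layout is illustrative, not prescribed by the method:

```
repo/
├── spec.md           # Intent — what should exist and why
├── src/              # Reality — the code, self-describing
├── provenance.md     # Reasoning — builder writes, tester appends
├── scenarios/        # Challenge — prose, e.g. S-003.md
└── tests/            # Execution — derived from scenarios
```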
14
Why It's Different
MARKDOWN AND A PROCESS.
THAT'S IT.
No SDK. No framework. No vendor lock-in. No orchestration engine. Anyone with access to an AI agent and a markdown editor can implement SDD today. The entire methodology fits in three templates.
MULTI-AGENT ORCHESTRATION
SDD uses two roles talking through documents. If you need dozens of agents coordinating dynamically in real-time, LangGraph, CrewAI, and AutoGen solve that problem. Most teams aren't there yet.
REAL-TIME AGENT MONITORING
SDD tests the software, not the agent. If you need continuous evaluation of agent behaviour in production — drift, hallucination, alignment — Arize, Braintrust, and Bloom address that.
DYNAMIC TOOL DISCOVERY
SDD specs are static documents. If your agents need to discover and compose tools at runtime, MCP servers and tool registries are built for that.
LONG-TERM AGENT MEMORY
SDD uses provenance as persistent context, but it doesn't give you vector stores, RAG pipelines, or cross-session memory systems.
These technologies solve real problems. But they solve advanced problems. SDD sharpens the thinking that makes every other tool more effective.
Start with the spec. Graduate to complexity when the problem demands it.
17
Live Demo
LIVE
DEMO
SDD in action. One spec. Two agents. Provenance, scenarios, and tests — generated live.
01 — THE SPEC
A real spec with agent roles, requirements, and constraints. Markdown. Nothing else.
02 — THE BUILDER
Hand the spec to the builder agent. Watch it build and produce provenance — assumptions, decisions, ambiguities — in real time.
03 — THE TESTER
Hand the spec and provenance to a separate agent. Watch it find the daylight, write prose scenarios, and generate executable tests.
THE BOTTLENECK HAS MOVED FROM CODE TO SPECIFICATION.
SDD IS THE METHOD.
One spec. Two agents. Five artefacts. Complete traceability from intent to verification. No framework required. Start today.
Kevin Ryan & Associates
kevinryan.io
sddbook.com
19
Bonus Content
BONUS
CONTENT
Provenance, audit, and regulation — why SDD is a compliance asset for SOC 2, ISO 27001, and the EU AI Act.
SOC 2 · ISO 27001 · EU AI ACT · AUGUST 2026
20
Provenance & Compliance
YOUR AUDITOR'S AI IS GOING TO ASK HOW THIS WAS BUILT.
When a human builds software, the reasoning exists in their head, in Slack threads, in PR comments. You can reconstruct it — badly — after the fact. When an agent builds software, the reasoning exists in the context window. The session ends. The reasoning evaporates. Unless you capture it.
You cannot retrospectively create provenance for decisions that were never documented. SDD means you never have to.
SOC 2
Change Management Controls
Auditors require evidence that changes are authorised, documented, and traceable. SDD's provenance chain is that evidence — spec to code to verification.
ISO 27001
Annex A — Secure Development
Requires documented development procedures, separation of duties, and design review records. SDD's builder-tester separation and layered provenance satisfy this structurally.
EU AI ACT
Articles 11, 12, 19 — Full Force August 2026
Technical documentation, automatic record-keeping, and retained logs for high-risk AI systems. Provenance is all three — generated in real time, not written after the fact.
21
Provenance & Compliance
COMPLIANCE AS A BYPRODUCT. NOT A PROJECT.
Most organisations will try to bolt compliance documentation onto AI-built systems after the fact. They will hire consultants to retrospectively construct the design history that auditors and regulators demand. SDD produces that documentation as an inherent part of the build. You don't retrofit it. It already exists.
AUDITOR ASKS
Why does this system behave this way?
→
SCENARIO
This test exists because provenance entry C1 challenged assumption A1.
→
PROVENANCE
A1 was assumed because the spec was silent on expiry. Builder chose 30 minutes.
→
SPEC
Requirement FR-007: all protected routes require authentication.
€35M
EU AI Act — or 7% of turnover
Prohibited practices
SOC 2 FAIL
Lost contracts, lost clients
Unrecoverable trust damage
ISO 27001 NC
Non-conformity finding
Certification at risk
SDD solves the documentation and traceability problem across all three frameworks — the one that requires you to have been recording decisions from the start.