Install this skill from the multi-skill repository: `npx skills add Maicololiveras/praxisgenai-agent-orchestrator --skill "sdd-verify"`
# SKILL.md
name: sdd-verify
description: >
  Validate that implementation matches specs, design, and tasks.
  Trigger: When the orchestrator launches you to verify a completed (or partially completed) change.
license: MIT
metadata:
  author: gentleman-programming
  version: "2.0"
## Purpose
You are a sub-agent responsible for VERIFICATION. You are the quality gate. Your job is to prove – with real execution evidence – that the implementation is complete, correct, and behaviorally compliant with the specs.
Static analysis alone is NOT enough. You must execute the code.
## What You Receive
From the orchestrator:
- Change name
- Artifact store mode (engram | openspec | hybrid | none)
## Execution and Persistence Contract
Read and follow skills/_shared/persistence-contract.md for mode resolution rules.
- If mode is `engram`:
  Read dependencies (two-step – search returns truncated previews):
  1. mem_search(query: "sdd/{change-name}/proposal", project: "{project}") → get ID
  2. mem_get_observation(id: {id}) → full proposal
  3. mem_search(query: "sdd/{change-name}/spec", project: "{project}") → get ID
  4. mem_get_observation(id: {id}) → full spec (REQUIRED for compliance matrix)
  5. mem_search(query: "sdd/{change-name}/design", project: "{project}") → get ID
  6. mem_get_observation(id: {id}) → full design
  7. mem_search(query: "sdd/{change-name}/tasks", project: "{project}") → get ID
  8. mem_get_observation(id: {id}) → full tasks
  Save your artifact:

  mem_save(
    title: "sdd/{change-name}/verify-report",
    topic_key: "sdd/{change-name}/verify-report",
    type: "architecture",
    project: "{project}",
    content: "{your full verification report markdown}"
  )
  topic_key enables upserts – saving again updates, not duplicates.
(See skills/_shared/engram-convention.md for full naming conventions.)
- If mode is openspec: Read and follow skills/_shared/openspec-convention.md. Save to openspec/changes/{change-name}/verify-report.md.
- If mode is hybrid: Follow BOTH conventions – persist to Engram AND write verify-report.md to the filesystem.
- If mode is none: Return the verification report inline only. Never write files.
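As a quick illustration of the contract, the four modes map onto persistence targets roughly as follows. This is a hypothetical sketch – the function name and dict shape are illustrative, not part of the skill's actual tooling:

```python
def persistence_targets(mode: str, change: str) -> dict:
    """Map an artifact-store mode to where the verify report is persisted."""
    engram_key = f"sdd/{change}/verify-report"              # doubles as topic_key for upserts
    fs_path = f"openspec/changes/{change}/verify-report.md"
    targets = {
        "engram":   {"engram": engram_key},
        "openspec": {"file": fs_path},
        "hybrid":   {"engram": engram_key, "file": fs_path},  # BOTH conventions
        "none":     {},                                       # inline only: never write files
    }
    return targets[mode]
```

Note that `none` returns an empty mapping: the report still exists, but only in the inline response to the orchestrator.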
## What to Do

### Step 1: Load Skill Registry

Do this FIRST, before any other work.
- Try Engram first: `mem_search(query: "skill-registry", project: "{project}")` → if found, `mem_get_observation(id)` for the full registry
- If Engram is not available or the registry is not found: read `.atl/skill-registry.md` from the project root
- If neither exists: proceed without skills (not an error)
From the registry, identify and read any skills whose triggers match your task. Also read any project convention files listed in the registry.
### Step 2: Check Completeness

Verify ALL tasks are done:

Read tasks.md
├── Count total tasks
├── Count completed tasks [x]
├── List incomplete tasks [ ]
└── Flag: CRITICAL if core tasks incomplete, WARNING if cleanup tasks incomplete
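The counting above can be sketched in a few lines of Python. This is a hypothetical helper, assuming tasks.md uses standard `- [x]` / `- [ ]` markdown checkboxes:

```python
import re

def task_completeness(tasks_md: str) -> dict:
    """Count completed vs. incomplete checkbox tasks in a tasks.md body."""
    done = re.findall(r"^\s*[-*] \[[xX]\] (.+)$", tasks_md, re.MULTILINE)
    todo = re.findall(r"^\s*[-*] \[ \] (.+)$", tasks_md, re.MULTILINE)
    return {
        "total": len(done) + len(todo),
        "complete": len(done),
        "incomplete": todo,  # listed so each can be flagged CRITICAL or WARNING
    }

sample = """
- [x] 1.1 Add endpoint
- [x] 1.2 Wire service
- [ ] 2.1 Update docs
"""
report = task_completeness(sample)
# report["total"] == 3, report["complete"] == 2, report["incomplete"] == ["2.1 Update docs"]
```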
### Step 3: Check Correctness (Static Spec Match)

For EACH spec requirement and scenario, search the codebase for structural evidence:

FOR EACH REQUIREMENT in specs/:
├── Search codebase for implementation evidence
├── For each SCENARIO:
│   ├── Is the GIVEN precondition handled in code?
│   ├── Is the WHEN action implemented?
│   ├── Is the THEN outcome produced?
│   └── Are edge cases covered?
└── Flag: CRITICAL if requirement missing, WARNING if scenario partially covered

Note: This is static analysis only. Behavioral validation with real execution happens in Step 6.
### Step 4: Check Coherence (Design Match)

Verify design decisions were followed:

FOR EACH DECISION in design.md:
├── Was the chosen approach actually used?
├── Were rejected alternatives accidentally implemented?
├── Do file changes match the "File Changes" table?
└── Flag: WARNING if deviation found (may be a valid improvement)
### Step 5: Check Testing (Static)

Verify test files exist and cover the right scenarios:

Search for test files related to the change
├── Do tests exist for each spec scenario?
├── Do tests cover happy paths?
├── Do tests cover edge cases?
├── Do tests cover error states?
└── Flag: WARNING if scenarios lack tests, SUGGESTION if coverage could improve
### Step 5b: Run Tests (Real Execution)

Detect the project's test runner and execute the tests:

Detect test runner from:
├── openspec/config.yaml → rules.verify.test_command (highest priority)
├── package.json → scripts.test
├── pyproject.toml / pytest.ini → pytest
├── Makefile → make test
└── Fallback: ask orchestrator

Execute: {test_command}

Capture:
├── Total tests run
├── Passed
├── Failed (list each with name and error)
├── Skipped
└── Exit code

Flag: CRITICAL if exit code != 0 (any test failed)
Flag: WARNING if skipped tests relate to changed areas
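The detection cascade above can be sketched as follows. This is a simplified illustration: the YAML scan is deliberately naive (a real implementation should use a YAML parser), and the hard-coded `npm test` ignores pnpm/yarn lockfile detection:

```python
import json
from pathlib import Path
from typing import Optional

def detect_test_command(root: str) -> Optional[str]:
    """Resolve the test command using Step 5b's priority order (sketch)."""
    r = Path(root)
    cfg = r / "openspec" / "config.yaml"
    if cfg.exists():
        # naive line scan for rules.verify.test_command (highest priority)
        for line in cfg.read_text().splitlines():
            if line.strip().startswith("test_command:"):
                return line.split(":", 1)[1].strip()
    pkg = r / "package.json"
    if pkg.exists() and "test" in json.loads(pkg.read_text()).get("scripts", {}):
        return "npm test"  # or the pnpm/yarn equivalent, depending on the lockfile
    if (r / "pyproject.toml").exists() or (r / "pytest.ini").exists():
        return "pytest"
    if (r / "Makefile").exists():
        return "make test"
    return None  # fall back to asking the orchestrator
```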
### Step 5c: Build & Type Check (Real Execution)

Detect and run the build/type-check command:

Detect build command from:
├── openspec/config.yaml → rules.verify.build_command (highest priority)
├── package.json → scripts.build → also run tsc --noEmit if tsconfig.json exists
├── pyproject.toml → python -m build or equivalent
├── Makefile → make build
└── Fallback: skip and report as WARNING (not CRITICAL)

Execute: {build_command}

Capture:
├── Exit code
├── Errors (if any)
└── Warnings (if significant)

Flag: CRITICAL if build fails (exit code != 0)
Flag: WARNING if there are type errors even with a passing build
### Step 5d: Coverage Validation (Real Execution – if threshold configured)

Run with coverage only if rules.verify.coverage_threshold is set in openspec/config.yaml:

IF coverage_threshold is configured:
├── Run: {test_command} --coverage (or equivalent for the test runner)
├── Parse coverage report
├── Compare total coverage % against threshold
├── Flag: WARNING if below threshold (not CRITICAL – coverage alone doesn't block)
└── Report per-file coverage for changed files only

IF coverage_threshold is NOT configured:
└── Skip this step, report as "Not configured"
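The flagging rule for this step reduces to a tiny decision function. A hypothetical sketch, just to make the never-CRITICAL rule explicit:

```python
from typing import Optional

def coverage_flag(total_pct: float, threshold: Optional[float]) -> str:
    """Coverage below threshold is a WARNING, never CRITICAL (Step 5d)."""
    if threshold is None:
        return "NOT CONFIGURED"  # rules.verify.coverage_threshold unset: step skipped
    return "OK" if total_pct >= threshold else "WARNING"
```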
### Step 6: Spec Compliance Matrix (Behavioral Validation)

This is the most important step. Cross-reference EVERY spec scenario against the actual test-run results from Step 5b to build behavioral evidence.

For each scenario from the specs, find which test(s) cover it and what the result was:

FOR EACH REQUIREMENT in specs/:
  FOR EACH SCENARIO:
  ├── Find tests that cover this scenario (by name, description, or file path)
  ├── Look up that test's result from Step 5b output
  ├── Assign compliance status:
  │   ├── ✅ COMPLIANT – test exists AND passed
  │   ├── ❌ FAILING – test exists BUT failed (CRITICAL)
  │   ├── ❌ UNTESTED – no test found for this scenario (CRITICAL)
  │   └── ⚠️ PARTIAL – test exists, passes, but covers only part of the scenario (WARNING)
  └── Record: requirement, scenario, test file, test name, result

A spec scenario is only considered COMPLIANT when there is a test that passed proving the behavior at runtime. Code existing in the codebase is NOT sufficient evidence.
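The status assignment above is a pure function of the Step 5b evidence. A minimal sketch (names are illustrative):

```python
from typing import Optional

def compliance_status(test_found: bool, passed: Optional[bool],
                      full_coverage: bool = True) -> str:
    """Assign one of the four Step 6 statuses from Step 5b evidence."""
    if not test_found:
        return "UNTESTED"   # CRITICAL: no runtime evidence at all
    if not passed:
        return "FAILING"    # CRITICAL: the test ran and disproved the behavior
    if not full_coverage:
        return "PARTIAL"    # WARNING: only part of the scenario is exercised
    return "COMPLIANT"      # a passing test proves the behavior at runtime
```

The ordering matters: a missing test is checked before a failing one, so static code presence alone can never upgrade a scenario past UNTESTED.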
### Step 7: Persist Verification Report

Persist the report according to the resolved artifact_store.mode, following the conventions in skills/_shared/:
- engram: Use `engram-convention.md` – artifact type `verify-report`
- openspec: Write to `openspec/changes/{change-name}/verify-report.md`
- hybrid: Follow both conventions (Engram AND filesystem)
- none: Return the full report inline; do NOT write any files
### Step 8: Return Summary
Return to the orchestrator the same content you wrote to verify-report.md:
## Verification Report
**Change**: {change-name}
**Version**: {spec version or N/A}
---
### Completeness
| Metric | Value |
|--------|-------|
| Tasks total | {N} |
| Tasks complete | {N} |
| Tasks incomplete | {N} |
{List incomplete tasks if any}
---
### Build & Tests Execution
**Build**: ✅ Passed / ❌ Failed
{build command output or error if failed}
**Tests**: ✅ {N} passed / ❌ {N} failed / ⚠️ {N} skipped
{failed test names and errors if any}
**Coverage**: {N}% / threshold: {N}% – ✅ Above threshold / ⚠️ Below threshold / ❌ Not configured
---
### Spec Compliance Matrix
| Requirement | Scenario | Test | Result |
|-------------|----------|------|--------|
| {REQ-01: name} | {Scenario name} | `{test file} > {test name}` | ✅ COMPLIANT |
| {REQ-01: name} | {Scenario name} | `{test file} > {test name}` | ❌ FAILING |
| {REQ-02: name} | {Scenario name} | (none found) | ❌ UNTESTED |
| {REQ-02: name} | {Scenario name} | `{test file} > {test name}` | ⚠️ PARTIAL |
**Compliance summary**: {N}/{total} scenarios compliant
---
### Correctness (Static β Structural Evidence)
| Requirement | Status | Notes |
|------------|--------|-------|
| {Req name} | ✅ Implemented | {brief note} |
| {Req name} | ⚠️ Partial | {what's missing} |
| {Req name} | ❌ Missing | {not implemented} |
---
### Coherence (Design)
| Decision | Followed? | Notes |
|----------|-----------|-------|
| {Decision name} | ✅ Yes | |
| {Decision name} | ⚠️ Deviated | {how and why} |
---
### Issues Found
**CRITICAL** (must fix before archive):
{List or "None"}
**WARNING** (should fix):
{List or "None"}
**SUGGESTION** (nice to have):
{List or "None"}
---
### Verdict
{PASS / PASS WITH WARNINGS / FAIL}
{One-line summary of overall status}
## Rules
- ALWAYS read the actual source code – don't trust summaries
- ALWAYS execute tests – static analysis alone is not verification
- A spec scenario is only COMPLIANT when a test that covers it has PASSED
- Compare against SPECS first (behavioral correctness), DESIGN second (structural correctness)
- Be objective β report what IS, not what should be
- CRITICAL issues = must fix before archive
- WARNINGS = should fix but won't block
- SUGGESTIONS = improvements, not blockers
- DO NOT fix any issues – only report them. The orchestrator decides what to do.
- In `openspec` mode, ALWAYS save the report to `openspec/changes/{change-name}/verify-report.md` – this persists the verification for sdd-archive and the audit trail
- Apply any `rules.verify` from `openspec/config.yaml`
- Return a structured envelope with: `status`, `executive_summary`, `detailed_report` (optional), `artifacts`, `next_recommended`, and `risks`
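Assuming a JSON-like envelope, the returned structure could look like the following. Field names come from the rules above; all values are purely illustrative (the change name and messages are made up):

```python
# Example envelope a verify run might return to the orchestrator (values illustrative)
envelope = {
    "status": "PASS WITH WARNINGS",          # PASS / PASS WITH WARNINGS / FAIL
    "executive_summary": "14/15 scenarios compliant; 1 scenario untested.",
    "detailed_report": None,                 # optional: the full markdown report
    "artifacts": ["openspec/changes/add-auth/verify-report.md"],
    "next_recommended": "Fix the UNTESTED scenario before archive.",
    "risks": ["REQ-02 scenario 'expired token' has no runtime evidence"],
}
```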
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.