macos-qa-loop

Author: eovidiu

# Install this skill:

npx skills add eovidiu/agents-skills --skill "macos-qa-loop"

The --skill flag installs this specific skill from the multi-skill repository.

# Description

Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".

# SKILL.md

name: macos-qa-loop
author: Ovidiu Eftimie
description: Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".

macOS QA Verification Loop

Overview

This skill runs an autonomous quality assurance loop that verifies a macOS application matches its specification and design mocks. It builds the app, captures screenshots of every relevant screen, compares them against design mocks using Claude's vision capabilities, checks spec requirements, and produces a structured gap report. The loop repeats after fixes until the app is fully compliant — or declares exactly what remains.

This is the bridge between "tests pass" and "it's actually correct."

When to Use This Skill

Use this skill when:
- A macOS app implementation is complete (or near-complete) and needs verification against spec
- Design mocks exist (screenshots, Figma exports, or reference images) and you need to confirm the built app matches them
- You want automated visual regression checking during development
- You need a structured gap report showing what matches spec and what doesn't
- Final pre-release QA before handing off to macos-cicd-distributor

Do NOT use this skill when:
- You're still writing initial code (use macos-tdd-expert for TDD during development)
- No spec or mocks exist (nothing to verify against)
- The app doesn't build yet (fix build errors first)

Trigger phrases:
- "Verify this app against the mocks"
- "Run QA loop on the macOS app"
- "Check if the app matches the spec"
- "Compare the app against design mocks"
- "Is the app done per spec?"

Prerequisites

Before running the QA loop, ensure:

- Spec exists: A specification document describing the app's requirements (markdown, structured text, or project-curator format)
- Mocks exist: Design reference images for each screen/state (PNG, JPEG, or PDF)
- App builds: xcodebuild succeeds without errors
- Xcode project path: Know the .xcodeproj or .xcworkspace location
- Scheme name: The Xcode scheme to build and run

The QA Loop

Phase 1: Intake — Parse Spec and Mocks

Read the specification and produce a verification checklist: a structured list of every testable requirement.

## Verification Checklist

| ID   | Requirement                          | Type     | Screen     | Status  |
|------|--------------------------------------|----------|------------|---------|
| R001 | Sidebar shows 5 navigation items     | Visual   | Main       | pending |
| R002 | Clicking "Settings" opens prefs pane | Behavior | Main       | pending |
| R003 | Dark mode inverts all backgrounds    | Visual   | Main+Prefs | pending |
| R004 | Export button is disabled when empty | State    | Main       | pending |

Types:
- Visual — Verified by comparing screenshot against mock
- Behavior — Verified by interacting with the app (XCUITest or accessibility API)
- State — Verified by checking specific UI states (enabled/disabled, visible/hidden)
- Data — Verified by checking displayed data matches expected values

Reference: references/spec-checklist-format.md — Detailed guide on parsing specs into verification checklists.
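
Once the checklist exists as a markdown table, later phases need to read it back. The following is a minimal sketch, assuming the table uses the five columns shown above and that no cell contains a literal `|`. The function names `checklist_rows` and `pending_count` are hypothetical and not part of the skill's scripts.

```shell
# Hypothetical helper: print "ID<TAB>Type<TAB>Screen<TAB>Status" for each
# requirement row of the markdown checklist table stored in the file $1.
checklist_rows() {
  awk -F'|' '
    # Requirement rows have an ID such as R001 in the first cell;
    # the header row and the |----| separator row do not match.
    $2 ~ /^[ \t]*R[0-9]+[ \t]*$/ {
      for (i = 2; i <= 6; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      printf "%s\t%s\t%s\t%s\n", $2, $4, $5, $6
    }
  ' "$1"
}

# Hypothetical helper: count rows of type $2 that are still pending.
pending_count() {
  checklist_rows "$1" |
    awk -F'\t' -v t="$2" '$2 == t && $4 == "pending"' |
    wc -l | tr -d ' '
}
```

For example, `pending_count checklist.md Visual` would report how many visual requirements still need a screenshot comparison, where `checklist.md` is an assumed file name.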

Phase 2: Build

Build the application using xcodebuild:

xcodebuild -project "$PROJECT_PATH" \
  -scheme "$SCHEME" \
  -configuration Debug \
  -derivedDataPath "$DERIVED_DATA" \
  build 2>&1

If build fails, STOP. Report the build error. Do not proceed to screenshots.
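
The stop-on-failure rule can be enforced by checking the build command's exit status rather than scanning its output. This is a sketch under stated assumptions: `build_or_stop` is a hypothetical wrapper, and the log file location is up to the caller.

```shell
# Hypothetical helper: run a build command, keep its full log, and report failure.
# Usage: build_or_stop LOG_FILE COMMAND [ARGS...]
build_or_stop() {
  log_file=$1
  shift
  if "$@" >"$log_file" 2>&1; then
    echo "BUILD OK"
    return 0
  fi
  echo "BUILD FAILED (log: $log_file)"
  # Print the last lines of the log as a starting point for the error report.
  tail -n 20 "$log_file"
  return 1
}
```

A caller might write `build_or_stop "$OUTPUT_DIR/build.log" xcodebuild -project "$PROJECT_PATH" -scheme "$SCHEME" build || exit 1` so that the screenshot phase is never reached after a failed build.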

Build output location:

APP_PATH="$DERIVED_DATA/Build/Products/Debug/$APP_NAME.app"

Phase 3: Launch and Screenshot

Launch the app and capture screenshots of every screen/state defined in the mocks:

# Launch the app
open "$APP_PATH"
sleep 2  # Allow launch animation to complete

# Capture the main window
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
  "$OUTPUT_DIR/screen-main.png"

For multi-screen apps, navigate to each screen and capture:

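# Use osascript to navigate via accessibility (System Events UI scripting).
# "AppName" is a placeholder for the process name. The menu item title must match
# the app's menu exactly; many apps use a single ellipsis character, not three dots.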
osascript -e '
  tell application "System Events"
    tell process "AppName"
      click menu item "Preferences..." of menu "AppName" of menu bar 1
    end tell
  end tell
'
sleep 1
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
  "$OUTPUT_DIR/screen-preferences.png"

Reference: references/verification-workflow.md — Complete screenshot capture workflow including window targeting, state navigation, and multi-monitor handling.

Script: scripts/screenshot-app.sh — Ready-to-use build + launch + screenshot utility.
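
The fixed `sleep` calls in the capture steps above assume the app is ready within a set time. One alternative, sketched here with a hypothetical helper named `wait_until`, is to poll a readiness check until it succeeds or a timeout expires. This is an illustration, not the behavior of scripts/screenshot-app.sh.

```shell
# Hypothetical helper: run COMMAND about once per second until it succeeds,
# giving up after ATTEMPTS tries.
# Usage: wait_until ATTEMPTS COMMAND [ARGS...]
wait_until() {
  remaining=$1
  shift
  while [ "$remaining" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    remaining=$((remaining - 1))
    sleep 1
  done
  return 1
}
```

For instance, `wait_until 10 pgrep -x "$APP_NAME"` waits for the process to appear, and the same helper can wrap the `osascript` window-ID query so that a capture is only attempted once a window exists.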

Phase 4: Visual Comparison

Compare each captured screenshot against its corresponding design mock using Claude's vision.

How this works:
1. Read the design mock image (the reference)
2. Read the captured screenshot (the actual)
3. Ask Claude to compare them semantically

Comparison dimensions:
- Layout — Element positioning, spacing, alignment
- Typography — Font sizes, weights, line heights (approximate)
- Colors — Background, text, accent colors match
- Components — All expected UI elements are present
- States — Correct enabled/disabled, selected/unselected states
- Content — Placeholder text, icons, labels match expectations

Comparison output format:

### Screen: Main Window

**Mock**: mocks/main-window.png
**Actual**: screenshots/screen-main.png

| Aspect     | Match | Issue                                         |
|------------|-------|-----------------------------------------------|
| Layout     | ⚠️    | Sidebar is 20px wider than mock               |
| Typography | ✅    | —                                             |
| Colors     | ✅    | —                                             |
| Components | ❌    | Missing "Export" button in toolbar             |
| States     | ✅    | —                                             |
| Content    | ⚠️    | Placeholder text still shows "Lorem ipsum"    |

Reference: references/visual-comparison-guide.md — Detailed comparison methodology, tolerance thresholds, and edge cases.
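
Before any comparison runs, each screenshot has to be paired with its mock. The sketch below assumes screenshots are saved under the same file name as the mock they correspond to, which is an assumption for illustration and differs from the example file names above. `pair_screens` is a hypothetical name.

```shell
# Hypothetical helper: for every PNG screenshot in SHOTS_DIR, report whether a
# mock with the same file name exists in MOCKS_DIR.
# Output lines: "<name><TAB>compare" or "<name><TAB>missing-mock"
pair_screens() {
  shots_dir=$1
  mocks_dir=$2
  for shot in "$shots_dir"/*.png; do
    [ -e "$shot" ] || continue   # the glob matched nothing
    name=$(basename "$shot")
    if [ -f "$mocks_dir/$name" ]; then
      printf '%s\tcompare\n' "$name"
    else
      # No reference image: skip the visual check and note it in the report.
      printf '%s\tmissing-mock\n' "$name"
    fi
  done
}
```

Only the `compare` rows would be handed to the vision comparison; the `missing-mock` rows go straight into the gap report.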

Phase 5: Behavioral Verification

For requirements typed as Behavior or State, verify by interacting with the running app:

Option A: XCUITest (preferred if tests exist)

xcodebuild test \
  -project "$PROJECT_PATH" \
  -scheme "${SCHEME}UITests" \
  -configuration Debug \
  -derivedDataPath "$DERIVED_DATA" \
  2>&1

Option B: Accessibility API via osascript

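tell application "System Events"
  tell process "AppName"
    -- Check that the button exists before reading its enabled state
    if exists button "Export" of toolbar 1 of window 1 then
      set exportBtn to button "Export" of toolbar 1 of window 1
      return {buttonExists:true, buttonEnabled:(enabled of exportBtn)}
    else
      return {buttonExists:false, buttonEnabled:false}
    end if
  end tell
end tell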

Option C: Manual verification prompt
When automation cannot verify a requirement, produce a clear manual verification step:

**Manual Check Required:**
- [ ] R007: "Drag and drop reorders sidebar items" — Launch app, drag item 3 above item 1, verify order persists after restart

Phase 6: Gap Report

Produce a structured report combining all verification results:

# QA Verification Report

**App**: MyApp v1.0
**Date**: 2025-02-01
**Spec**: docs/spec.md
**Mocks**: mocks/

## Summary

| Status | Count |
|--------|-------|
| ✅ Pass   | 12    |
| ⚠️ Partial | 3     |
| ❌ Fail   | 2     |
| ⏭️ Skipped | 1     |
| **Total** | **18** |

**Verdict**: NOT READY — 2 failures, 3 partial matches

## Failures

### R005: Export button in toolbar
- **Expected**: Toolbar contains "Export" button with document.arrow.up icon
- **Actual**: Button is missing from toolbar
- **Fix**: Add NSToolbarItem with identifier "export" to toolbar configuration

### R011: Dark mode background colors
- **Expected**: All backgrounds invert to #1E1E1E in dark mode
- **Actual**: Sidebar background stays white (#FFFFFF)
- **Fix**: Use the semantic .windowBackgroundColor for the sidebar background instead of a fixed white

## Partial Matches

### R001: Sidebar width
- **Expected**: 240px fixed width
- **Actual**: ~260px (approximately 20px wider)
- **Fix**: Check sidebar width constraint, should be 240

## Passes
[List of all passing requirements with ✅]

## Manual Checks Required
- [ ] R007: Drag and drop reorder (cannot automate)

Template: assets/templates/qa-report.md — Copy-ready gap report template.
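
The summary table and verdict can be derived mechanically from the per-requirement statuses. A minimal sketch, assuming statuses are recorded as the plain words `pass`, `partial`, `fail`, and `skipped`, one per line. The function name and the exact output wording are hypothetical.

```shell
# Hypothetical helper: read one status per line from stdin and print the
# summary counts followed by a verdict line.
summarize_statuses() {
  awk '
    { count[$1]++; total++ }
    END {
      printf "pass=%d partial=%d fail=%d skipped=%d total=%d\n",
        count["pass"], count["partial"], count["fail"], count["skipped"], total
      # READY only when every requirement passed.
      if (total > 0 && count["pass"] == total) {
        print "Verdict: READY"
      } else {
        printf "Verdict: NOT READY: %d failures, %d partial matches\n",
          count["fail"], count["partial"]
      }
    }
  '
}
```

Fed the statuses from the example report above (12 pass, 3 partial, 2 fail, 1 skipped), this prints a total of 18 and a NOT READY verdict with 2 failures and 3 partial matches.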

Phase 7: Loop Decision

After producing the gap report:

If all pass (✅ only):

✅ QA COMPLETE — App matches spec and mocks.
Ready for macos-cicd-distributor.

If failures exist (❌ or ⚠️):

1. Send the gap report to the coding agent / orchestrator.
2. Wait for fixes.
3. Re-run from Phase 2 (build).

Loop limits:
- Maximum 5 iterations per QA session
- If iteration 5 still has failures, STOP and escalate to Ovidiu
- Each iteration should fix at least 1 issue — if no progress after 2 iterations, STOP
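
The loop limits above amount to a small decision function. The sketch below is one way to express them, assuming "progress" means the count of open issues (failures plus partial matches) went down. The names `loop_decision` and `next_stalled` are hypothetical.

```shell
# Hypothetical helper: decide what to do after an iteration.
# Arguments: ITERATION (1-based), OPEN_ISSUES (fail + partial),
#            STALLED (consecutive iterations without progress), MAX (default 5).
loop_decision() {
  iteration=$1
  open_issues=$2
  stalled=$3
  max_iterations=${4:-5}
  if [ "$open_issues" -eq 0 ]; then
    echo "COMPLETE"
  elif [ "$stalled" -ge 2 ]; then
    echo "STOP_NO_PROGRESS"
  elif [ "$iteration" -ge "$max_iterations" ]; then
    echo "STOP_MAX_ITERATIONS"
  else
    echo "CONTINUE"
  fi
}

# Hypothetical helper: update the stalled counter after an iteration.
# It resets to 0 when the open-issue count dropped, otherwise it grows by 1.
next_stalled() {
  previous_open=$1
  current_open=$2
  stalled=$3
  if [ "$current_open" -lt "$previous_open" ]; then
    echo 0
  else
    echo $((stalled + 1))
  fi
}
```

Both STOP outcomes correspond to the escalation cases; only `CONTINUE` sends the report back for fixes and re-enters Phase 2.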

Verification Dimensions

Visual Fidelity (What You See)

Compares the rendered app against design mocks across these axes:

| Axis            | What to Check                         | Tolerance                        |
|-----------------|---------------------------------------|----------------------------------|
| Layout          | Element positions, spacing, alignment | ±5px                             |
| Sizing          | Component widths, heights             | ±5px                             |
| Colors          | Backgrounds, text, accents            | Semantic match (not pixel-exact) |
| Typography      | Font size, weight, style              | Approximate match                |
| Icons           | Correct icon, correct size            | Present/absent + approximate     |
| Shadows/Effects | Drop shadows, blur, vibrancy          | Present/absent                   |

Tolerance philosophy: This is semantic comparison, not pixel diffing. "The sidebar is roughly the right width and has the right items" matters more than "the sidebar is exactly 240.0px." Flag significant deviations, ignore rendering engine differences.
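
Where a numeric estimate is available (for example, a sidebar width read from the accessibility API or estimated from the screenshot), the ±5px tolerance reduces to a simple check. A sketch with a hypothetical helper, assuming whole-number values:

```shell
# Hypothetical helper: succeed when |ACTUAL - EXPECTED| <= TOLERANCE.
# Usage: within_tolerance EXPECTED ACTUAL [TOLERANCE]   (integers, default 5)
within_tolerance() {
  expected=$1
  actual=$2
  tolerance=${3:-5}
  diff=$((actual - expected))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi
  [ "$diff" -le "$tolerance" ]
}
```

With the sidebar example from the gap report, `within_tolerance 240 260` fails (a 20px deviation), while `within_tolerance 240 243` succeeds.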

Spec Compliance (What It Does)

Every requirement in the spec maps to a verification:

| Requirement Type             | Verification Method                         |
|------------------------------|---------------------------------------------|
| "User can X"                 | Behavioral test (XCUITest or accessibility) |
| "Screen shows X"             | Visual comparison                           |
| "X is disabled when Y"       | State check via accessibility API           |
| "Data persists after Z"      | Behavioral test with app restart            |
| "Performance: X under Y ms"  | Instrumented timing                         |

Behavioral Correctness (How It Works)

For interactive requirements, verify the app responds correctly:
- Navigation between screens
- Button actions produce expected results
- Menu items trigger correct behavior
- Keyboard shortcuts work
- Window resize behavior matches expectations

Integration with Other Skills

┌─────────────────────────────────────────────────────────┐
│                    Development Flow                      │
│                                                          │
│  macos-senior-engineer  →  Writes the code               │
│  macos-tdd-expert       →  TDD during development        │
│  macos-qa-loop          →  Verifies against spec & mocks │
│  macos-cicd-distributor →  Signs and ships                │
│                                                          │
│  project-curator        →  Provides the spec             │
│  macos-senior-ux        →  Provides the design rationale │
│  macos-app-architect    →  Provides architecture context │
└─────────────────────────────────────────────────────────┘

Handoff from macos-tdd-expert: When unit/integration tests pass, invoke macos-qa-loop to verify the built app visually and behaviorally.

Handoff to macos-cicd-distributor: When QA loop reports all-pass, the app is ready for signing, notarization, and distribution.

Workflow Quick Start

Step 1: Gather Inputs

# Identify your inputs
PROJECT_PATH="./MyApp.xcodeproj"  # or .xcworkspace
SCHEME="MyApp"
SPEC_PATH="./docs/spec.md"
MOCKS_DIR="./mocks/"
OUTPUT_DIR="./qa-output/"

mkdir -p "$OUTPUT_DIR"
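
It can help to fail early when an input is wrong rather than partway through the loop. The following sketch checks the four required variables. `validate_inputs` is a hypothetical name, and the messages are illustrative.

```shell
# Hypothetical helper: print one line per problem; return non-zero if any.
validate_inputs() {
  problems=0
  # .xcodeproj and .xcworkspace bundles are directories on disk.
  [ -d "${PROJECT_PATH:-}" ] || { echo "missing project: ${PROJECT_PATH:-}"; problems=1; }
  [ -n "${SCHEME:-}" ]       || { echo "SCHEME is empty"; problems=1; }
  [ -f "${SPEC_PATH:-}" ]    || { echo "missing spec: ${SPEC_PATH:-}"; problems=1; }
  [ -d "${MOCKS_DIR:-}" ]    || { echo "missing mocks dir: ${MOCKS_DIR:-}"; problems=1; }
  return "$problems"
}
```

Calling `validate_inputs || exit 1` right after setting the variables stops the session before any build time is spent.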

Step 2: Run the Loop

Invoke the skill:

Verify MyApp against spec at docs/spec.md and mocks in mocks/ directory.
Project: MyApp.xcodeproj, Scheme: MyApp

The skill will:
1. Parse the spec into a verification checklist
2. Build the app
3. Screenshot every screen referenced in mocks
4. Compare screenshots against mocks
5. Run behavioral checks
6. Produce a gap report
7. Loop if needed

Step 3: Review Report

The gap report lands in $OUTPUT_DIR/qa-report.md. Review and either:
- Accept (all pass) → proceed to distribution
- Fix (failures exist) → re-run after fixes

Configuration

Required Inputs

| Input          | Description                             | Example             |
|----------------|-----------------------------------------|---------------------|
| `PROJECT_PATH` | Path to .xcodeproj or .xcworkspace      | `./MyApp.xcodeproj` |
| `SCHEME`       | Xcode build scheme                      | `MyApp`             |
| `SPEC_PATH`    | Path to specification document          | `./docs/spec.md`    |
| `MOCKS_DIR`    | Directory containing design mock images | `./mocks/`          |
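
Because `PROJECT_PATH` may point at either a project or a workspace, the `xcodebuild` invocation needs `-project` in one case and `-workspace` in the other. A small sketch, using a hypothetical helper named `project_flag`:

```shell
# Hypothetical helper: print the xcodebuild flag that matches the path type.
project_flag() {
  case $1 in
    *.xcworkspace|*.xcworkspace/) echo "-workspace" ;;
    *.xcodeproj|*.xcodeproj/)     echo "-project" ;;
    *)
      echo "unsupported project path: $1" >&2
      return 1
      ;;
  esac
}
```

The build command could then start with `xcodebuild "$(project_flag "$PROJECT_PATH")" "$PROJECT_PATH" -scheme "$SCHEME"`.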

Optional Inputs

| Input            | Description                            | Default            |
|------------------|----------------------------------------|--------------------|
| `OUTPUT_DIR`     | Where to write reports and screenshots | `./qa-output/`     |
| `MAX_ITERATIONS` | Maximum QA loop iterations             | `5`                |
| `CONFIGURATION`  | Xcode build configuration              | `Debug`            |
| `UI_TEST_SCHEME` | Scheme for XCUITest suite              | `${SCHEME}UITests` |
| `APPEARANCE`     | Light, Dark, or Both                   | `Both`             |
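
In a shell wrapper, these defaults map naturally onto parameter expansion, which only assigns a value when the caller has not already set one. This is a sketch, and the `appearances` helper with its lowercase output values is hypothetical.

```shell
# Apply the documented defaults only where no value has been provided.
: "${OUTPUT_DIR:=./qa-output/}"
: "${MAX_ITERATIONS:=5}"
: "${CONFIGURATION:=Debug}"
: "${APPEARANCE:=Both}"
# UI_TEST_SCHEME derives from SCHEME, which is a required input.
: "${UI_TEST_SCHEME:=${SCHEME:-}UITests}"

# Hypothetical helper: expand APPEARANCE into the appearances to capture.
appearances() {
  case $1 in
    Light) echo "light" ;;
    Dark)  echo "dark" ;;
    Both)  echo "light dark" ;;
    *)
      echo "unknown APPEARANCE: $1" >&2
      return 1
      ;;
  esac
}
```

A capture loop could then iterate with `for mode in $(appearances "$APPEARANCE")` and take one screenshot set per mode.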

Mock File Naming Convention

Mocks should be named to match screens:

mocks/
├── main-window.png           # Main app window
├── main-window-dark.png      # Main window in dark mode
├── preferences.png           # Preferences pane
├── preferences-dark.png      # Preferences in dark mode
├── empty-state.png           # Main window with no data
└── error-dialog.png          # Error alert

The skill matches mock filenames to screen identifiers in the verification checklist.
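
How that matching is implemented is not specified here. One possible reading of the convention, sketched with hypothetical helpers, treats a trailing `-dark` as the appearance and the rest of the file name as the screen identifier, with no suffix meaning light mode (an assumption):

```shell
# Hypothetical helper: derive the screen identifier from a mock file name.
mock_screen() {
  name=$(basename "$1")
  name=${name%.*}        # drop the file extension
  echo "${name%-dark}"   # drop a trailing "-dark" if present
}

# Hypothetical helper: derive the appearance from a mock file name.
mock_appearance() {
  name=$(basename "$1")
  case ${name%.*} in
    *-dark) echo "dark" ;;
    *)      echo "light" ;;
  esac
}
```

Under this reading, `mocks/main-window-dark.png` maps to screen `main-window` in `dark` appearance, and `mocks/preferences.png` maps to `preferences` in `light`.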

Error Handling

| Error                          | Response                                                                                                 |
|--------------------------------|----------------------------------------------------------------------------------------------------------|
| Build fails                    | STOP. Report build error. Do not screenshot.                                                             |
| App crashes on launch          | STOP. Report crash log.                                                                                  |
| Screenshot capture fails       | Retry once. If it still fails, report and skip visual check for that screen.                             |
| Mock image missing             | Skip visual comparison for that screen. Note in report.                                                  |
| Accessibility API blocked      | Report permission requirement. Suggest granting in System Settings > Privacy & Security > Accessibility. |
| No progress after 2 iterations | STOP. Escalate to Ovidiu with full gap history.                                                          |
| Max iterations (5) reached     | STOP. Produce final report with remaining gaps.                                                          |
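
The "retry once" response for screenshot capture can be wrapped in a tiny helper. A sketch, assuming a one-second pause between attempts is acceptable. `retry_once` is a hypothetical name.

```shell
# Hypothetical helper: run COMMAND, and retry it exactly once if it fails.
# The exit status is that of the last attempt.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying: $1" >&2
  sleep 1
  "$@"
}
```

A capture step could be written as `retry_once screencapture -l "$WINDOW_ID" "$OUTPUT_DIR/screen-main.png"`, where `WINDOW_ID` is an assumed variable holding the window ID obtained earlier. If the call still fails, the screen is reported and its visual check skipped.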

Resources

References:
- references/verification-workflow.md — Complete loop mechanics, phase transitions, and state management
- references/visual-comparison-guide.md — How to compare screenshots vs mocks using vision, tolerance thresholds, edge cases
- references/spec-checklist-format.md — How to parse specs into testable verification checklists

Scripts:
- scripts/screenshot-app.sh — Build + launch + screenshot utility for macOS apps

Templates:
- assets/templates/qa-report.md — Gap report template
- assets/templates/spec-checklist.md — Specification verification checklist template
