npx skills add eovidiu/agents-skills --skill "macos-qa-loop"
Install specific skill from multi-skill repository
# Description
Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".
# SKILL.md
name: macos-qa-loop
author: Ovidiu Eftimie
description: Autonomous QA verification loop for macOS applications. Builds the app, screenshots every screen, compares against design mocks using vision, verifies spec compliance, and loops until the app matches or reports remaining gaps. Use when validating a macOS app against its specification and design mocks before release. Triggers on "verify app against mocks", "QA loop", "check app matches spec", "validate macOS app".
macOS QA Verification Loop
Overview
This skill runs an autonomous quality assurance loop that verifies a macOS application matches its specification and design mocks. It builds the app, captures screenshots of every relevant screen, compares them against design mocks using Claude's vision capabilities, checks spec requirements, and produces a structured gap report. The loop repeats after fixes until the app is fully compliant, or it declares exactly what remains.
This is the bridge between "tests pass" and "it's actually correct."
When to Use This Skill
Use this skill when:
- A macOS app implementation is complete (or near-complete) and needs verification against spec
- Design mocks exist (screenshots, Figma exports, or reference images) and you need to confirm the built app matches them
- You want automated visual regression checking during development
- You need a structured gap report showing what matches spec and what doesn't
- Final pre-release QA before handing off to macos-cicd-distributor
Do NOT use this skill when:
- You're still writing initial code (use macos-tdd-expert for TDD during development)
- No spec or mocks exist (nothing to verify against)
- The app doesn't build yet (fix build errors first)
Trigger phrases:
- "Verify this app against the mocks"
- "Run QA loop on the macOS app"
- "Check if the app matches the spec"
- "Compare the app against design mocks"
- "Is the app done per spec?"
Prerequisites
Before running the QA loop, ensure:
- Spec exists: A specification document describing the app's requirements (markdown, structured text, or project-curator format)
- Mocks exist: Design reference images for each screen/state (PNG, JPEG, or PDF)
- App builds: xcodebuild succeeds without errors
- Xcode project path: Know the .xcodeproj or .xcworkspace location
- Scheme name: The Xcode scheme to build and run
The QA Loop
Phase 1: Intake (Parse Spec and Mocks)
Read the specification and produce a verification checklist: a structured list of every testable requirement.
## Verification Checklist
| ID | Requirement | Type | Screen | Status |
|------|--------------------------------------|----------|-----------|---------|
| R001 | Sidebar shows 5 navigation items | Visual | Main | pending |
| R002 | Clicking "Settings" opens prefs pane | Behavior | Main | pending |
| R003 | Dark mode inverts all backgrounds | Visual | Main+Prefs| pending |
| R004 | Export button is disabled when empty | State | Main | pending |
Types:
- Visual: Verified by comparing screenshot against mock
- Behavior: Verified by interacting with the app (XCUITest or accessibility API)
- State: Verified by checking specific UI states (enabled/disabled, visible/hidden)
- Data: Verified by checking displayed data matches expected values
Reference: references/spec-checklist-format.md (detailed guide on parsing specs into verification checklists).
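Once the checklist exists, later phases need to pick out the rows they are responsible for (Visual rows for Phase 4, Behavior and State rows for Phase 5). A minimal sketch of that filtering, assuming the checklist is saved as a markdown table with exactly the column order shown above and that requirement text contains no `|` characters; the function name `checklist_ids_by_type` is hypothetical and not part of the skill:

```shell
#!/usr/bin/env bash
# Print the IDs of checklist rows whose Type column equals the second argument.
# Assumed column order: | ID | Requirement | Type | Screen | Status |
checklist_ids_by_type() {
  local checklist_file="$1" wanted_type="$2"
  awk -F'|' -v wanted="$wanted_type" '
    # Only data rows have an ID such as R001; header and separator rows are skipped.
    $2 ~ /^[ \t]*R[0-9]+[ \t]*$/ {
      id = $2; kind = $4
      gsub(/^[ \t]+|[ \t]+$/, "", id)     # trim padding around the ID
      gsub(/^[ \t]+|[ \t]+$/, "", kind)   # trim padding around the Type
      if (kind == wanted) print id
    }
  ' "$checklist_file"
}
```

For example, `checklist_ids_by_type checklist.md Visual` would list the requirement IDs that need a screenshot comparison.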
Phase 2: Build
Build the application using xcodebuild:
xcodebuild -project "$PROJECT_PATH" \
-scheme "$SCHEME" \
-configuration Debug \
-derivedDataPath "$DERIVED_DATA" \
build 2>&1
If build fails, STOP. Report the build error. Do not proceed to screenshots.
Build output location:
APP_PATH="$DERIVED_DATA/Build/Products/Debug/$APP_NAME.app"
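The build command above passes `-project`, while the prerequisites also allow a `.xcworkspace`, which xcodebuild expects via `-workspace`. The sketch below picks the flag from the path extension, composes the expected `.app` location, and stops on a failed build. The helper names are hypothetical, and it assumes the built product is named `$APP_NAME.app`, as in the path shown above.

```shell
#!/usr/bin/env bash
# Pick the xcodebuild container flag from the path's extension.
xcode_container_flag() {
  case "$1" in
    *.xcworkspace) echo "-workspace" ;;
    *)             echo "-project" ;;
  esac
}

# Compose the expected .app location under the derived data directory.
app_path_for() {
  local derived_data="$1" configuration="$2" app_name="$3"
  echo "$derived_data/Build/Products/$configuration/$app_name.app"
}

# Build the app and return non-zero on failure so the caller can stop
# before the screenshot phase. Requires Xcode; defined here but not invoked.
build_app() {
  local flag
  flag="$(xcode_container_flag "$PROJECT_PATH")"
  if ! xcodebuild "$flag" "$PROJECT_PATH" \
       -scheme "$SCHEME" \
       -configuration "${CONFIGURATION:-Debug}" \
       -derivedDataPath "$DERIVED_DATA" \
       build; then
    echo "Build failed; skipping screenshots." >&2
    return 1
  fi
}
```

A caller could then run `build_app || exit 1` followed by `APP_PATH="$(app_path_for "$DERIVED_DATA" Debug "$APP_NAME")"`.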
Phase 3: Launch and Screenshot
Launch the app and capture screenshots of every screen/state defined in the mocks:
# Launch the app
open "$APP_PATH"
sleep 2 # Allow launch animation to complete
# Capture the main window
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
"$OUTPUT_DIR/screen-main.png"
For multi-screen apps, navigate to each screen and capture:
# Use osascript to navigate via accessibility
osascript -e '
tell application "System Events"
tell process "AppName"
click menu item "Preferences..." of menu "AppName" of menu bar 1
end tell
end tell
'
sleep 1
screencapture -l $(osascript -e "tell app \"$APP_NAME\" to id of window 1") \
"$OUTPUT_DIR/screen-preferences.png"
Reference: references/verification-workflow.md (complete screenshot capture workflow including window targeting, state navigation, and multi-monitor handling).
Script: scripts/screenshot-app.sh (ready-to-use build + launch + screenshot utility).
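The fixed `sleep 2` after launch can be too short on a slow machine and wasteful on a fast one. One possible alternative, not taken from scripts/screenshot-app.sh, is to poll until the process appears. The helper name `wait_until` is hypothetical:

```shell
#!/usr/bin/env bash
# Poll a command until it succeeds or the attempt budget runs out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0            # the condition became true
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1                # budget exhausted without success
}
```

A launch step might then read `open "$APP_PATH" && wait_until 20 0.5 pgrep -x "$APP_NAME"`. A running process does not guarantee that the first window has finished drawing, so a short settle delay before `screencapture` may still be needed.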
Phase 4: Visual Comparison
Compare each captured screenshot against its corresponding design mock using Claude's vision.
How this works:
1. Read the design mock image (the reference)
2. Read the captured screenshot (the actual)
3. Ask Claude to compare them semantically
Comparison dimensions:
- Layout: Element positioning, spacing, alignment
- Typography: Font sizes, weights, line heights (approximate)
- Colors: Background, text, accent colors match
- Components: All expected UI elements are present
- States: Correct enabled/disabled, selected/unselected states
- Content: Placeholder text, icons, labels match expectations
Comparison output format:
### Screen: Main Window
**Mock**: mocks/main-window.png
**Actual**: screenshots/screen-main.png
| Aspect | Match | Issue |
|------------|-------|-----------------------------------------------|
| Layout | ⚠️ | Sidebar is 20px wider than mock |
| Typography | ✅ | - |
| Colors | ✅ | - |
| Components | ❌ | Missing "Export" button in toolbar |
| States | ✅ | - |
| Content | ⚠️ | Placeholder text still shows "Lorem ipsum" |
Reference: references/visual-comparison-guide.md (detailed comparison methodology, tolerance thresholds, and edge cases).
Phase 5: Behavioral Verification
For requirements typed as Behavior or State, verify by interacting with the running app:
Option A: XCUITest (preferred if tests exist)
xcodebuild test \
-project "$PROJECT_PATH" \
-scheme "${SCHEME}UITests" \
-configuration Debug \
-derivedDataPath "$DERIVED_DATA" \
2>&1
Option B: Accessibility API via osascript
tell application "System Events"
tell process "AppName"
-- Check button exists and is enabled
set exportBtn to button "Export" of toolbar 1 of window 1
return {exists:exists of exportBtn, enabled:enabled of exportBtn}
end tell
end tell
Option C: Manual verification prompt
When automation cannot verify a requirement, produce a clear manual verification step:
**Manual Check Required:**
- [ ] R007: "Drag and drop reorders sidebar items". Launch app, drag item 3 above item 1, verify order persists after restart
Phase 6: Gap Report
Produce a structured report combining all verification results:
# QA Verification Report
**App**: MyApp v1.0
**Date**: 2025-02-01
**Spec**: docs/spec.md
**Mocks**: mocks/
## Summary
| Status | Count |
|--------|-------|
| ✅ Pass | 12 |
| ⚠️ Partial | 3 |
| ❌ Fail | 2 |
| ⏭️ Skipped | 1 |
| **Total** | **18** |
**Verdict**: NOT READY (2 failures, 3 partial matches)
## Failures
### R005: Export button in toolbar
- **Expected**: Toolbar contains "Export" button with document.arrow.up icon
- **Actual**: Button is missing from toolbar
- **Fix**: Add NSToolbarItem with identifier "export" to toolbar configuration
### R011: Dark mode background colors
- **Expected**: All backgrounds invert to #1E1E1E in dark mode
- **Actual**: Sidebar background stays white (#FFFFFF)
- **Fix**: Use the .windowBackgroundColor semantic color for the sidebar background instead of a fixed color
## Partial Matches
### R001: Sidebar width
- **Expected**: 240px fixed width
- **Actual**: ~260px (approximately 20px wider)
- **Fix**: Check sidebar width constraint, should be 240
## Passes
[List of all passing requirements with ✅]
## Manual Checks Required
- [ ] R007: Drag and drop reorder (cannot automate)
Template: assets/templates/qa-report.md (copy-ready gap report template).
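The verdict line in the sample report follows mechanically from the summary counts. A small sketch of that derivation, assuming that skipped or manual items are reported but do not block readiness by themselves (the skill text does not say either way); `qa_verdict` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Derive the report verdict from the summary counts.
# Usage: qa_verdict <pass> <partial> <fail> <skipped>
qa_verdict() {
  local pass="$1" partial="$2" fail="$3" skipped="$4"
  local total=$((pass + partial + fail + skipped))
  if [ "$fail" -eq 0 ] && [ "$partial" -eq 0 ]; then
    # Assumption: skipped items are surfaced in the verdict but do not block it.
    echo "READY: $pass of $total passed, $skipped skipped"
  else
    echo "NOT READY: $fail failures, $partial partial matches"
  fi
}
```

With the counts from the sample report, `qa_verdict 12 3 2 1` prints `NOT READY: 2 failures, 3 partial matches`.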
Phase 7: Loop Decision
After producing the gap report:
If all pass (✅ only):
✅ QA COMPLETE: App matches spec and mocks.
Ready for macos-cicd-distributor.
If failures exist (❌ or ⚠️):
Report the gap report to the coding agent / orchestrator.
Wait for fixes.
Re-run from Phase 2 (build).
Loop limits:
- Maximum 5 iterations per QA session
- If iteration 5 still has failures, STOP and escalate to Ovidiu
- Each iteration should fix at least 1 issue; if no progress after 2 iterations, STOP
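The loop limits above can be expressed as a small decision function over the number of open issues (❌ plus ⚠️) recorded after each iteration. This is a sketch under one reading of "no progress after 2 iterations": two consecutive iterations in which the open-issue count did not drop. The function name and the result labels are hypothetical.

```shell
#!/usr/bin/env bash
# Decide the next step from the open-issue count of each iteration so far.
# Usage: loop_decision <count_after_iter1> [<count_after_iter2> ...]
loop_decision() {
  local max_iterations="${MAX_ITERATIONS:-5}"
  local iterations="$#"
  local prev="" stalled=0 count
  if [ "$iterations" -eq 0 ]; then
    echo "CONTINUE"                    # nothing has run yet
    return 0
  fi
  for count in "$@"; do
    if [ -n "$prev" ]; then
      if [ "$count" -ge "$prev" ]; then
        stalled=$((stalled + 1))       # this iteration fixed nothing
      else
        stalled=0                      # progress resets the stall counter
      fi
    fi
    prev="$count"
  done
  if [ "$prev" -eq 0 ]; then
    echo "COMPLETE"
  elif [ "$stalled" -ge 2 ]; then
    echo "STOP_NO_PROGRESS"
  elif [ "$iterations" -ge "$max_iterations" ]; then
    echo "STOP_MAX_ITERATIONS"
  else
    echo "CONTINUE"
  fi
}
```

For instance, `loop_decision 5 3` yields `CONTINUE`, while `loop_decision 5 5 5` yields `STOP_NO_PROGRESS`.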
Verification Dimensions
Visual Fidelity (What You See)
Compares the rendered app against design mocks across these axes:
| Axis | What to Check | Tolerance |
|---|---|---|
| Layout | Element positions, spacing, alignment | ±5px |
| Sizing | Component widths, heights | ±5px |
| Colors | Backgrounds, text, accents | Semantic match (not pixel-exact) |
| Typography | Font size, weight, style | Approximate match |
| Icons | Correct icon, correct size | Present/absent + approximate |
| Shadows/Effects | Drop shadows, blur, vibrancy | Present/absent |
Tolerance philosophy: This is semantic comparison, not pixel diffing. "The sidebar is roughly the right width and has the right items" matters more than "the sidebar is exactly 240.0px." Flag significant deviations, ignore rendering engine differences.
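Where a numeric measurement is available (for example a width read through the accessibility API), the ±5px tolerance from the table reduces to a simple absolute-difference check. This is only a sketch of that arithmetic with integer pixel values, and it does not replace the semantic comparison; `within_tolerance` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeed when |actual - expected| <= tolerance (integer pixel values).
# Usage: within_tolerance <expected> <actual> [tolerance, default 5]
within_tolerance() {
  local expected="$1" actual="$2" tolerance="${3:-5}"
  local diff=$((actual - expected))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))       # absolute value
  fi
  [ "$diff" -le "$tolerance" ]
}
```

Applied to the sample report, `within_tolerance 240 260` fails (20px off), whereas `within_tolerance 240 243` succeeds.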
Spec Compliance (What It Does)
Every requirement in the spec maps to a verification:
| Requirement Type | Verification Method |
|---|---|
| "User can X" | Behavioral test (XCUITest or accessibility) |
| "Screen shows X" | Visual comparison |
| "X is disabled when Y" | State check via accessibility API |
| "Data persists after Z" | Behavioral test with app restart |
| "Performance: X under Y ms" | Instrumented timing |
Behavioral Correctness (How It Works)
For interactive requirements, verify the app responds correctly:
- Navigation between screens
- Button actions produce expected results
- Menu items trigger correct behavior
- Keyboard shortcuts work
- Window resize behavior matches expectations
Integration with Other Skills
┌─────────────────────────────────────────────────────────┐
│ Development Flow                                        │
│                                                         │
│ macos-senior-engineer  → Writes the code                │
│ macos-tdd-expert       → TDD during development         │
│ macos-qa-loop          → Verifies against spec & mocks  │
│ macos-cicd-distributor → Signs and ships                │
│                                                         │
│ project-curator        → Provides the spec              │
│ macos-senior-ux        → Provides the design rationale  │
│ macos-app-architect    → Provides architecture context  │
└─────────────────────────────────────────────────────────┘
Handoff from macos-tdd-expert: When unit/integration tests pass, invoke macos-qa-loop to verify the built app visually and behaviorally.
Handoff to macos-cicd-distributor: When QA loop reports all-pass, the app is ready for signing, notarization, and distribution.
Workflow Quick Start
Step 1: Gather Inputs
# Identify your inputs
PROJECT_PATH="./MyApp.xcodeproj" # or .xcworkspace
SCHEME="MyApp"
SPEC_PATH="./docs/spec.md"
MOCKS_DIR="./mocks/"
OUTPUT_DIR="./qa-output/"
mkdir -p "$OUTPUT_DIR"
Step 2: Run the Loop
Invoke the skill:
Verify MyApp against spec at docs/spec.md and mocks in mocks/ directory.
Project: MyApp.xcodeproj, Scheme: MyApp
The skill will:
1. Parse the spec into a verification checklist
2. Build the app
3. Screenshot every screen referenced in mocks
4. Compare screenshots against mocks
5. Run behavioral checks
6. Produce a gap report
7. Loop if needed
Step 3: Review Report
The gap report lands in $OUTPUT_DIR/qa-report.md. Review and either:
- Accept (all pass) → proceed to distribution
- Fix (failures exist) → re-run after fixes
Configuration
Required Inputs
| Input | Description | Example |
|---|---|---|
| PROJECT_PATH | Path to .xcodeproj or .xcworkspace | ./MyApp.xcodeproj |
| SCHEME | Xcode build scheme | MyApp |
| SPEC_PATH | Path to specification document | ./docs/spec.md |
| MOCKS_DIR | Directory containing design mock images | ./mocks/ |
Optional Inputs
| Input | Description | Default |
|---|---|---|
| OUTPUT_DIR | Where to write reports and screenshots | ./qa-output/ |
| MAX_ITERATIONS | Maximum QA loop iterations | 5 |
| CONFIGURATION | Xcode build configuration | Debug |
| UI_TEST_SCHEME | Scheme for XCUITest suite | ${SCHEME}UITests |
| APPEARANCE | Light, Dark, or Both | Both |
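The required and optional inputs map naturally onto shell variables with default values. The following is a sketch of how a wrapper script might apply the documented defaults and report missing required inputs. Both function names are hypothetical, and the skill itself may read its inputs differently.

```shell
#!/usr/bin/env bash
# Fill optional inputs with the defaults from the table above.
apply_qa_defaults() {
  OUTPUT_DIR="${OUTPUT_DIR:-./qa-output/}"
  MAX_ITERATIONS="${MAX_ITERATIONS:-5}"
  CONFIGURATION="${CONFIGURATION:-Debug}"
  UI_TEST_SCHEME="${UI_TEST_SCHEME:-${SCHEME:-}UITests}"
  APPEARANCE="${APPEARANCE:-Both}"
}

# Print the names of required inputs that are unset or empty, space-separated.
missing_required_inputs() {
  local name value missing=""
  for name in PROJECT_PATH SCHEME SPEC_PATH MOCKS_DIR; do
    # eval is safe here: the names come from the fixed list above.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  echo "${missing# }"
}
```

A wrapper could refuse to start when `missing_required_inputs` prints anything, and call `apply_qa_defaults` otherwise.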
Mock File Naming Convention
Mocks should be named to match screens:
mocks/
├── main-window.png # Main app window
├── main-window-dark.png # Main window in dark mode
├── preferences.png # Preferences pane
├── preferences-dark.png # Preferences in dark mode
├── empty-state.png # Main window with no data
└── error-dialog.png # Error alert
The skill matches mock filenames to screen identifiers in the verification checklist.
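The naming convention can be read as a simple mapping from a screen identifier and an appearance to a file path. A minimal sketch of that mapping, assuming PNG mocks only and the `-dark` suffix shown in the listing; `mock_path_for` is a hypothetical helper, not the skill's actual matching logic:

```shell
#!/usr/bin/env bash
# Map a screen identifier and appearance to the expected mock file path.
# Usage: mock_path_for <mocks_dir> <screen_id> [light|dark]
mock_path_for() {
  local mocks_dir="${1%/}" screen="$2" appearance="${3:-light}"
  case "$appearance" in
    dark|Dark) echo "$mocks_dir/$screen-dark.png" ;;
    *)         echo "$mocks_dir/$screen.png" ;;
  esac
}
```

A caller would typically test the result with `[ -f "$(mock_path_for "$MOCKS_DIR" main-window dark)" ]` and skip the visual comparison for that screen when the file is missing, in line with the error-handling rules.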
Error Handling
| Error | Response |
|---|---|
| Build fails | STOP. Report build error. Do not screenshot. |
| App crashes on launch | STOP. Report crash log. |
| Screenshot capture fails | Retry once. If still fails, report and skip visual check for that screen. |
| Mock image missing | Skip visual comparison for that screen. Note in report. |
| Accessibility API blocked | Report permission requirement. Suggest granting in System Settings > Privacy & Security > Accessibility. |
| No progress after 2 iterations | STOP. Escalate to Ovidiu with full gap history. |
| Max iterations (5) reached | STOP. Produce final report with remaining gaps. |
Resources
References:
- references/verification-workflow.md: Complete loop mechanics, phase transitions, and state management
- references/visual-comparison-guide.md: How to compare screenshots vs mocks using vision, tolerance thresholds, edge cases
- references/spec-checklist-format.md: How to parse specs into testable verification checklists
Scripts:
- scripts/screenshot-app.sh: Build + launch + screenshot utility for macOS apps
Templates:
- assets/templates/qa-report.md: Gap report template
- assets/templates/spec-checklist.md: Specification verification checklist template