Use when testing Claude Code customizations (agents, commands, skills, or hooks), or when validating that changes have not broken existing behavior.
Install this specific skill from the multi-skill repository:

`npx skills add philoserf/claude-code-setup --skill "test-runner"`
# Description
Runs systematic tests on Claude Code customizations. Executes sample queries, validates responses, generates test reports, and identifies edge cases for agents, commands, skills, and hooks.
# SKILL.md
---
name: test-runner
description: Runs systematic tests on Claude Code customizations. Executes sample queries, validates responses, generates test reports, and identifies edge cases for agents, commands, skills, and hooks.
allowed-tools: [Read, Write, Glob, Grep, Bash, Skill]
model: inherit
---
## Reference Files

This skill uses reference materials:

- `examples.md` - Concrete test case examples for different customization types
- `common-failures.md` - Catalog of common failure patterns
## Focus Areas
- Sample Query Generation - Creating realistic test queries based on descriptions
- Expected Behavior Validation - Verifying outputs match specifications
- Regression Testing - Ensuring changes don't break existing functionality
- Edge Case Identification - Finding unusual scenarios and boundary conditions
- Integration Testing - Validating customizations work together
- Performance Assessment - Analyzing context usage and efficiency
## Test Framework

### Test Types
#### Functional Tests

**Purpose**: Verify core functionality works as specified

**Process**:
- Generate test queries from description/documentation
- Execute customization with test input
- Compare actual output to expected behavior
- Record PASS/FAIL for each test case
#### Integration Tests

**Purpose**: Ensure customizations work together

**Process**:
- Test hook interactions (PreToolUse, PostToolUse chains)
- Verify skills can invoke sub-agents
- Check commands delegate correctly
- Validate settings.json configuration (see the probe sketch after this list)
- Test tool permission boundaries
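A lightweight, read-only probe for the settings check is to confirm the file parses and to list which hook events are registered. A minimal sketch using `jq`, assuming hooks are registered under a top-level `hooks` key; adjust the path for project-level settings:

```bash
# Read-only probe: does settings.json parse, and which hook events are registered?
settings="$HOME/.claude/settings.json"   # or .claude/settings.json in a project
jq empty "$settings" && echo "settings.json is valid JSON"
jq -r '(.hooks // {}) | keys[]' "$settings"   # e.g. PreToolUse, PostToolUse
```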
#### Usability Tests

**Purpose**: Assess user experience quality

**Process**:
- Evaluate error messages (are they helpful?)
- Check documentation completeness
- Test edge cases (what breaks it?)
- Assess output clarity
- Verify examples work as shown
### Test Execution Strategy

#### For Skills
- Discovery Test: Generate queries that should trigger the skill
- Invocation Test: Actually invoke the skill with sample query
- Output Test: Verify skill produces expected results
- Tool Test: Confirm only allowed tools are used
- Reference Test: Check that references load correctly (a sketch follows this list)
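The reference test in particular is easy to automate: confirm that every file the skill says it uses exists alongside SKILL.md. A small sketch using this skill's own reference files; the directory path is illustrative:

```bash
# Verify referenced materials exist next to SKILL.md.
skill_dir="skills/test-runner"   # illustrative path
for ref in examples.md common-failures.md; do
  [ -f "$skill_dir/$ref" ] && echo "OK: $ref" || echo "MISSING: $ref"
done
```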
#### For Agents
- Frontmatter Test: Validate YAML structure
- Invocation Test: Invoke agent with test prompt
- Tool Test: Verify agent uses appropriate tools
- Output Test: Check output format and quality
- Context Test: Measure context usage
#### For Commands
- Delegation Test: Verify command invokes correct agent/skill
- Usage Test: Test with valid and invalid arguments
- Documentation Test: Verify usage instructions are accurate
- Output Test: Check output format and clarity
#### For Hooks
- Input Test: Verify JSON stdin handling
- Exit Code Test: Confirm 0 (allow) and 2 (block) work correctly (see the sketch after this list)
- Error Handling Test: Verify graceful degradation
- Performance Test: Check execution speed
- Integration Test: Test hook chain behavior
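As a concrete illustration of the exit-code test, the sketch below pipes JSON to a hook and checks the code it returns. The hook path and payload fields are assumptions; adapt them to the hook under test.

```bash
# Hypothetical exit-code test for a PreToolUse hook that reads JSON on stdin.
HOOK=hooks/validate.sh   # placeholder path

# Input the hook should allow -> expect exit code 0
echo '{"tool_name": "Read", "tool_input": {"file_path": "README.md"}}' | "$HOOK"
echo "allow case exit code: $?"

# Input the hook should block -> expect exit code 2
echo '{"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}' | "$HOOK"
echo "block case exit code: $?"
```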
## Test Process

### Step 1: Identify Customization Type

Determine what to test (a directory-based check is sketched after this list):
- Agent (in agents/)
- Command (in commands/)
- Skill (in skills/)
- Hook (in hooks/)
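When the target is given as a file path, the parent directory usually identifies the type. A minimal sketch, assuming the standard `~/.claude/` layout (project-level `.claude/` directories follow the same pattern):

```bash
# Classify a customization by the directory it lives in.
target="$HOME/.claude/skills/test-runner/SKILL.md"   # example path
case "$target" in
  */agents/*)   echo "type: agent" ;;
  */commands/*) echo "type: command" ;;
  */skills/*)   echo "type: skill" ;;
  */hooks/*)    echo "type: hook" ;;
  *)            echo "type: unknown" ;;
esac
```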
### Step 2: Read Documentation

Use the Read tool to examine:
- Primary file content
- Frontmatter/configuration (see the extraction sketch after this list)
- Usage instructions
- Examples (if provided)
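Frontmatter can be pulled out for inspection before any tests are generated. A sketch using awk, assuming the file opens with a `---`-delimited YAML block; the required keys mirror this skill's own frontmatter:

```bash
# Print the YAML frontmatter (between the first pair of --- lines) and flag missing keys.
file="skills/test-runner/SKILL.md"   # example path
awk '/^---$/ { n++; next } n == 1' "$file" > /tmp/frontmatter.yml
cat /tmp/frontmatter.yml
for key in name description; do
  grep -q "^$key:" /tmp/frontmatter.yml || echo "MISSING: $key"
done
```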
### Step 3: Generate Test Cases
Based on description and documentation:
**For Skills:**
- Extract trigger phrases from description
- Create 5-10 sample queries that should trigger
- Create 3-5 queries that should NOT trigger
- Identify edge cases from description
**For Agents:**
- Create prompts based on focus areas
- Generate scenarios agent should handle
- Identify scenarios outside agent scope
**For Commands:**
- Test with documented arguments
- Test with no arguments
- Test with invalid arguments
**For Hooks** (captured as JSON fixtures in the sketch after this list):
- Create sample tool inputs that should pass
- Create inputs that should block
- Create malformed inputs to test error handling
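For hooks, the generated cases can be saved as small JSON fixtures so the same inputs are reused on every run. A sketch under assumed payload field names and a scratch `tests/` directory:

```bash
# Hypothetical hook test fixtures; field names and paths are assumptions.
mkdir -p tests

# Should pass -> hook exits 0
cat > tests/pass-read.json <<'EOF'
{"tool_name": "Read", "tool_input": {"file_path": "docs/guide.md"}}
EOF

# Should block -> hook exits 2
cat > tests/block-dangerous.json <<'EOF'
{"tool_name": "Bash", "tool_input": {"command": "rm -rf /"}}
EOF

# Malformed input to exercise error handling
printf 'not json' > tests/malformed.json
```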
### Step 4: Execute Tests

**Read-Only Testing (default):**
- Analyze whether customization would work
- Check configurations and settings
- Verify documentation accuracy
- Assess expected behavior
**Active Testing (when appropriate)** (see the execution sketch after this list):
- Actually invoke skills with sample queries
- Run commands with test arguments
- Trigger hooks with test inputs
- Record actual outputs
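Continuing the hook example, active execution can be scripted: run each fixture through the hook and record the exit code it actually produced. This sketch reuses the assumed paths from Step 3:

```bash
# Run every fixture through the hook and record actual exit codes.
HOOK=hooks/validate.sh
: > tests/actual.tsv
for fixture in tests/*.json; do
  "$HOOK" < "$fixture" > /dev/null 2>&1
  code=$?
  printf '%s\t%s\n' "$(basename "$fixture")" "$code" >> tests/actual.tsv
done
cat tests/actual.tsv
```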
### Step 5: Compare Results

For each test (a scripted comparison example follows this list):
- Expected: What should happen (from docs/description)
- Actual: What did happen (from testing)
- Status: PASS (matched) / FAIL (didn't match) / EDGE CASE (unexpected)
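For scripted cases such as the hook fixtures above, the comparison can be mechanical: pair each recorded exit code with the expected one and mark PASS or FAIL. A sketch, assuming a hand-written `tests/expected.tsv` that lists each fixture name and its expected exit code:

```bash
# Compare expected vs actual exit codes and tally results.
pass=0; fail=0
while IFS=$'\t' read -r name expected; do
  actual=$(awk -F'\t' -v n="$name" '$1 == n { print $2 }' tests/actual.tsv)
  if [ "$actual" = "$expected" ]; then
    status=PASS; pass=$((pass + 1))
  else
    status=FAIL; fail=$((fail + 1))
  fi
  printf '%-24s expected=%s actual=%s %s\n' "$name" "$expected" "$actual" "$status"
done < tests/expected.tsv
echo "Passed: $pass  Failed: $fail"
```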
### Step 6: Generate Test Report
Create structured report following output format.
## Output Format
# Test Report: {name}
**Type**: {agent|command|skill|hook}
**File**: {path}
**Tested**: {YYYY-MM-DD HH:MM}
**Test Mode**: {read-only|active}
## Summary
{1-2 sentence overview of what was tested and overall results}
## Test Results
**Total Tests**: {count}
**Passed**: {count} ({percentage}%)
**Failed**: {count} ({percentage}%)
**Edge Cases**: {count}
## Functional Tests
### Test 1: {test name}
- **Input**: {test input/query}
- **Expected**: {expected behavior}
- **Actual**: {actual behavior}
- **Status**: PASS | FAIL | EDGE CASE
- **Notes**: {observations}
### Test 2: {test name}
...
## Integration Tests
{If applicable - tests with other customizations}
### Integration 1: {integration name}
- **Components**: {what was tested together}
- **Expected**: {expected interaction}
- **Actual**: {actual interaction}
- **Status**: PASS | FAIL
- **Notes**: {observations}
## Usability Assessment
- **Documentation**: CLEAR | UNCLEAR | MISSING
- **Error Messages**: HELPFUL | CONFUSING | ABSENT
- **Examples**: WORKING | BROKEN | MISSING
- **Overall UX**: EXCELLENT | GOOD | NEEDS WORK | POOR
## Edge Cases Discovered
{Unusual scenarios or boundary conditions found during testing}
1. {edge case 1}
- **Impact**: {severity}
- **Recommendation**: {how to handle}
2. {edge case 2}
...
## Performance Metrics
{If applicable}
- **Context Usage**: ~{token estimate}
- **Execution Time**: {estimate or N/A}
- **Resource Impact**: LOW | MEDIUM | HIGH
## Failures and Issues
{Detailed analysis of failed tests}
### Failure 1: {test name}
- **Why It Failed**: {root cause}
- **Expected vs Actual**: {specific diff}
- **Fix Recommendation**: {how to resolve}
### Failure 2: {test name}
...
## Recommendations
### Priority 1 (Critical)
{Must-fix issues that prevent functionality}
1. {critical fix 1}
2. {critical fix 2}
### Priority 2 (Important)
{Should-fix issues that impact usability}
1. {important fix 1}
2. {important fix 2}
### Priority 3 (Nice-to-Have)
{Could-fix issues that polish the experience}
1. {enhancement 1}
2. {enhancement 2}
## Next Steps
{Specific actions to address failures and improve tests}
## Test Case Examples

For detailed test case examples for skills, agents, commands, and hooks, see `examples.md`.
## Best Practices for Testing
- Test Both Paths: Success cases AND failure cases
- Edge Cases Matter: Test boundaries and unusual inputs
- Clear Expected Behavior: Document what should happen
- Realistic Queries: Use natural language users would actually type
- Integration Testing: Don't just test in isolation
- Performance Aware: Note if tests are slow or heavy
- Regression Testing: Re-test after changes
- Document Failures: Explain why tests failed, not just that they did
- Actionable Recommendations: Provide specific fixes
- Version Testing: Note which version was tested
## Common Test Failures

For a catalog of common failure patterns by customization type, see `common-failures.md`.
## Tools Used
This skill uses these tools for testing:
- Read - Examine customization files
- Write - Generate and save test reports to files
- Grep - Search for patterns and configurations
- Glob - Find files and customizations
- Bash - Execute read-only commands for analysis
- Skill - Invoke skills for active testing
In read-only mode (the default), no customizations are actually invoked; the skill analyzes configurations and documentation to assess expected behavior. In active mode, skills are invoked with test queries to verify actual behavior.
Test reports are written to `~/.claude/logs/evaluations/tests/` using the Write tool.
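If the directory does not exist yet, creating it and choosing a timestamped filename keeps runs separate. The naming pattern below is an assumption, not part of the skill:

```bash
# Hypothetical report path convention: one timestamped file per run.
report_dir="$HOME/.claude/logs/evaluations/tests"
mkdir -p "$report_dir"
report="$report_dir/test-runner-$(date +%Y%m%d-%H%M).md"
echo "Report will be written to: $report"
```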
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.