# test-runner

A Claude Code skill by philoserf.
# Install this skill

To install this specific skill from the multi-skill repository:

```sh
npx skills add philoserf/claude-code-setup --skill "test-runner"
```

# Description

Runs systematic tests on Claude Code customizations. Executes sample queries, validates responses, generates test reports, and identifies edge cases for agents, commands, skills, and hooks.

# SKILL.md


---
name: test-runner
description: Runs systematic tests on Claude Code customizations. Executes sample queries, validates responses, generates test reports, and identifies edge cases for agents, commands, skills, and hooks.
allowed-tools: [Read, Write, Glob, Grep, Bash, Skill]
model: inherit
---


## Reference Files

This skill uses two reference materials, both described later in this document:

  • examples.md - detailed test case examples for each customization type
  • common-failures.md - a catalog of common failure patterns

## Focus Areas

  • Sample Query Generation - Creating realistic test queries based on descriptions
  • Expected Behavior Validation - Verifying outputs match specifications
  • Regression Testing - Ensuring changes don't break existing functionality
  • Edge Case Identification - Finding unusual scenarios and boundary conditions
  • Integration Testing - Validating customizations work together
  • Performance Assessment - Analyzing context usage and efficiency

## Test Framework

### Test Types

#### Functional Tests

Purpose: Verify core functionality works as specified

Process:

  1. Generate test queries from description/documentation
  2. Execute customization with test input
  3. Compare actual output to expected behavior
  4. Record PASS/FAIL for each test case
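A minimal sketch of that loop in Python. `run_customization` is a hypothetical stand-in for however the target is actually invoked (the Skill tool, a command, a hook script), and the substring comparison is a deliberate simplification:

```python
# Minimal functional-test loop: run each case, compare actual output to the
# expected behavior, and record PASS/FAIL. `run_customization` is a
# hypothetical stand-in for the real invocation mechanism.
from typing import Callable

def run_functional_tests(
    cases: list[dict],
    run_customization: Callable[[str], str],
) -> list[dict]:
    results = []
    for case in cases:
        actual = run_customization(case["input"])
        # Substring match is a simplification of "compare to expected behavior".
        status = "PASS" if case["expected"] in actual else "FAIL"
        results.append({**case, "actual": actual, "status": status})
    return results
```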

#### Integration Tests

Purpose: Ensure customizations work together

Process:

  1. Test hook interactions (PreToolUse, PostToolUse chains)
  2. Verify skills can invoke sub-agents
  3. Check commands delegate correctly
  4. Validate settings.json configuration
  5. Test tool permission boundaries
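For step 4, a sketch of a settings.json sanity check. The layout assumed here (a top-level `hooks` map keyed by event name, each holding a list of entries) is an assumption about the configuration shape, not a verified schema:

```python
# Sketch: confirm settings.json parses and hook entries are well-formed.
# The expected key layout is an assumption, not a verified schema.
import json
from pathlib import Path

def check_settings(path: str = "~/.claude/settings.json") -> list[str]:
    problems = []
    try:
        settings = json.loads(Path(path).expanduser().read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"settings.json unreadable or invalid JSON: {exc}"]
    hooks = settings.get("hooks", {})
    for event in ("PreToolUse", "PostToolUse"):
        entries = hooks.get(event, [])
        if not isinstance(entries, list):
            problems.append(f"{event}: expected a list of hook entries")
    return problems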

#### Usability Tests

Purpose: Assess user experience quality

Process:

  1. Evaluate error messages (are they helpful?)
  2. Check documentation completeness
  3. Test edge cases (what breaks it?)
  4. Assess output clarity
  5. Verify examples work as shown
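Step 5 can be partly automated. A small sketch that pulls fenced code blocks out of a customization's documentation so each example can be verified by hand (or executed in active mode):

```python
# Extract fenced code examples from a markdown document so each one can be
# verified. Returns (language, code) pairs; untagged fences become "text".
import re

def extract_examples(markdown_text: str) -> list[tuple[str, str]]:
    fence = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)
    return [(lang or "text", code) for lang, code in fence.findall(markdown_text)]
```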

### Test Execution Strategy

#### For Skills

  1. Discovery Test: Generate queries that should trigger the skill
  2. Invocation Test: Actually invoke the skill with sample query
  3. Output Test: Verify skill produces expected results
  4. Tool Test: Confirm only allowed tools are used
  5. Reference Test: Check that references load correctly
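A rough way to approximate the discovery test without invoking anything: score each sample query against vocabulary taken from the skill's description. The keyword-overlap heuristic is an assumption standing in for the model's real routing behavior, not a reproduction of it:

```python
# Crude discovery check: does a sample query share vocabulary with the
# skill description? Approximates, but does not reproduce, real triggering.
def likely_triggers(description: str, query: str, threshold: int = 2) -> bool:
    vocab = set(description.lower().split())
    overlap = vocab & set(query.lower().split())
    return len(overlap) >= threshold

description = "Runs systematic tests on Claude Code customizations."
assert likely_triggers(description, "run tests on my claude code skill")
assert not likely_triggers(description, "what is the capital of France?")
```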

#### For Agents

  1. Frontmatter Test: Validate YAML structure
  2. Invocation Test: Invoke agent with test prompt
  3. Tool Test: Verify agent uses appropriate tools
  4. Output Test: Check output format and quality
  5. Context Test: Measure context usage
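A sketch for the frontmatter test, assuming agent files carry standard `---`-delimited YAML frontmatter; the required key set shown is an assumption about what agent files must declare:

```python
# Parse and validate an agent file's YAML frontmatter. Requires PyYAML.
# The required keys are an assumption, not a confirmed specification.
import yaml

def validate_frontmatter(text: str, required=("name", "description")) -> list[str]:
    parts = text.split("---", 2)
    if len(parts) < 3:
        return ["no ----delimited frontmatter block found"]
    try:
        meta = yaml.safe_load(parts[1]) or {}
    except yaml.YAMLError as exc:
        return [f"frontmatter is not valid YAML: {exc}"]
    return [f"missing required key: {key}" for key in required if key not in meta]
```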

#### For Commands

  1. Delegation Test: Verify command invokes correct agent/skill
  2. Usage Test: Test with valid and invalid arguments
  3. Documentation Test: Verify usage instructions are accurate
  4. Output Test: Check output format and clarity

#### For Hooks

  1. Input Test: Verify JSON stdin handling
  2. Exit Code Test: Confirm 0 (allow) and 2 (block) work correctly
  3. Error Handling Test: Verify graceful degradation
  4. Performance Test: Check execution speed
  5. Integration Test: Test hook chain behavior
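The input and exit-code tests map directly onto a subprocess harness. A sketch, assuming the hook is an executable script that reads a JSON payload on stdin and signals allow/block via exit code 0/2 as described above; the script path and payload fields are illustrative:

```python
# Drive a hook script with a JSON payload on stdin and check its exit code:
# 0 = allow, 2 = block (per the contract above). The payload fields and the
# ./hooks/guard.sh path are illustrative, not a verified schema.
import json
import subprocess

def run_hook(hook_path: str, payload: dict, timeout: float = 5.0) -> int:
    proc = subprocess.run(
        [hook_path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode

# Expected to pass (exit 0):
assert run_hook("./hooks/guard.sh", {"tool_name": "Read", "tool_input": {}}) == 0
# Expected to block (exit 2):
assert run_hook("./hooks/guard.sh", {"tool_name": "Bash",
                                     "tool_input": {"command": "rm -rf /"}}) == 2
```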

## Test Process

### Step 1: Identify Customization Type

Determine what to test:

  • Agent (in agents/)
  • Command (in commands/)
  • Skill (in skills/)
  • Hook (in hooks/)
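This lookup is mechanical. A sketch, assuming the directory layout above:

```python
# Classify a customization by its parent directory, per the layout above.
from pathlib import Path

def customization_type(path: str) -> str:
    parents = {p.name for p in Path(path).parents}
    for kind in ("agents", "commands", "skills", "hooks"):
        if kind in parents:
            return kind.rstrip("s")  # "skills" -> "skill", etc.
    return "unknown"

assert customization_type("skills/test-runner/SKILL.md") == "skill"
```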

### Step 2: Read Documentation

Use Read tool to examine:

  • Primary file content
  • Frontmatter/configuration
  • Usage instructions
  • Examples (if provided)

### Step 3: Generate Test Cases

Based on description and documentation:

For Skills:

  • Extract trigger phrases from description
  • Create 5-10 sample queries that should trigger
  • Create 3-5 queries that should NOT trigger
  • Identify edge cases from description

For Agents:

  • Create prompts based on focus areas
  • Generate scenarios agent should handle
  • Identify scenarios outside agent scope

For Commands:

  • Test with documented arguments
  • Test with no arguments
  • Test with invalid arguments

For Hooks:

  • Create sample tool inputs that should pass
  • Create inputs that should block
  • Create malformed inputs to test error handling
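However the cases are generated, it helps to keep them in one uniform structure that Step 4 can execute and Step 5 can score. A sketch of that shape, with hypothetical example data for a skill:

```python
# One uniform shape for generated test cases, whatever the target type.
# The example data below is hypothetical.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    input: str             # query, prompt, arguments, or stdin payload
    expected: str          # expected behavior, from docs/description
    should_trigger: bool = True

cases = [
    TestCase("happy path", "test my new skill", "produces a test report"),
    TestCase("off topic", "book me a flight", "skill not invoked",
             should_trigger=False),
    TestCase("edge: empty query", "", "graceful error message"),
]
```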

### Step 4: Execute Tests

Read-Only Testing (default):

  • Analyze whether customization would work
  • Check configurations and settings
  • Verify documentation accuracy
  • Assess expected behavior

Active Testing (when appropriate):

  • Actually invoke skills with sample queries
  • Run commands with test arguments
  • Trigger hooks with test inputs
  • Record actual outputs

### Step 5: Compare Results

For each test:

  • Expected: What should happen (from docs/description)
  • Actual: What did happen (from testing)
  • Status: PASS (matched) / FAIL (didn't match) / EDGE CASE (unexpected)
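A sketch of the comparison. The substring match and the edge-case heuristic (the run produced something anomalous rather than a clean match or miss) are simplifications:

```python
# Score one test: PASS if expected behavior shows up in actual output,
# EDGE CASE if the run was anomalous (empty output or an exception),
# FAIL otherwise. Both heuristics are simplifications.
def score(expected: str, actual: str | None, error: Exception | None = None) -> str:
    if error is not None or not actual:
        return "EDGE CASE"
    return "PASS" if expected.lower() in actual.lower() else "FAIL"
```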

### Step 6: Generate Test Report

Create structured report following output format.
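A sketch of how the summary block of the report could be filled in from scored results; the field names follow the template below:

```python
# Fill in the report's summary block from scored results. The `or 1`
# guards the percentage math against an empty test run.
def summary_block(results: list[dict]) -> str:
    total = len(results) or 1
    passed = sum(r["status"] == "PASS" for r in results)
    failed = sum(r["status"] == "FAIL" for r in results)
    edges = sum(r["status"] == "EDGE CASE" for r in results)
    return (
        f"**Total Tests**: {len(results)}\n"
        f"**Passed**: {passed} ({100 * passed // total}%)\n"
        f"**Failed**: {failed} ({100 * failed // total}%)\n"
        f"**Edge Cases**: {edges}\n"
    )
```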

## Output Format

```markdown
# Test Report: {name}

**Type**: {agent|command|skill|hook}
**File**: {path}
**Tested**: {YYYY-MM-DD HH:MM}
**Test Mode**: {read-only|active}

## Summary

{1-2 sentence overview of what was tested and overall results}

## Test Results

**Total Tests**: {count}
**Passed**: {count} ({percentage}%)
**Failed**: {count} ({percentage}%)
**Edge Cases**: {count}

## Functional Tests

### Test 1: {test name}

- **Input**: {test input/query}
- **Expected**: {expected behavior}
- **Actual**: {actual behavior}
- **Status**: PASS | FAIL | EDGE CASE
- **Notes**: {observations}

### Test 2: {test name}

...

## Integration Tests

{If applicable - tests with other customizations}

### Integration 1: {integration name}

- **Components**: {what was tested together}
- **Expected**: {expected interaction}
- **Actual**: {actual interaction}
- **Status**: PASS | FAIL
- **Notes**: {observations}

## Usability Assessment

- **Documentation**: CLEAR | UNCLEAR | MISSING
- **Error Messages**: HELPFUL | CONFUSING | ABSENT
- **Examples**: WORKING | BROKEN | MISSING
- **Overall UX**: EXCELLENT | GOOD | NEEDS WORK | POOR

## Edge Cases Discovered

{Unusual scenarios or boundary conditions found during testing}

1. {edge case 1}
   - **Impact**: {severity}
   - **Recommendation**: {how to handle}

2. {edge case 2}
   ...

## Performance Metrics

{If applicable}

- **Context Usage**: ~{token estimate}
- **Execution Time**: {estimate or N/A}
- **Resource Impact**: LOW | MEDIUM | HIGH

## Failures and Issues

{Detailed analysis of failed tests}

### Failure 1: {test name}

- **Why It Failed**: {root cause}
- **Expected vs Actual**: {specific diff}
- **Fix Recommendation**: {how to resolve}

### Failure 2: {test name}

...

## Recommendations

### Priority 1 (Critical)

{Must-fix issues that prevent functionality}

1. {critical fix 1}
2. {critical fix 2}

### Priority 2 (Important)

{Should-fix issues that impact usability}

1. {important fix 1}
2. {important fix 2}

### Priority 3 (Nice-to-Have)

{Could-fix issues that polish the experience}

1. {enhancement 1}
2. {enhancement 2}

## Next Steps

{Specific actions to address failures and improve tests}

```

## Test Case Examples

For detailed test case examples for skills, agents, commands, and hooks, see examples.md.

## Best Practices for Testing

  1. Test Both Paths: Success cases AND failure cases
  2. Edge Cases Matter: Test boundaries and unusual inputs
  3. Clear Expected Behavior: Document what should happen
  4. Realistic Queries: Use natural language users would actually type
  5. Integration Testing: Don't just test in isolation
  6. Performance Aware: Note if tests are slow or heavy
  7. Regression Testing: Re-test after changes
  8. Document Failures: Explain why tests failed, not just that they did
  9. Actionable Recommendations: Provide specific fixes
  10. Version Testing: Note which version was tested

## Common Test Failures

For a catalog of common failure patterns by customization type, see common-failures.md.

## Tools Used

This skill uses these tools for testing:

  • Read - Examine customization files
  • Write - Generate and save test reports to files
  • Grep - Search for patterns and configurations
  • Glob - Find files and customizations
  • Bash - Execute read-only commands for analysis
  • Skill - Invoke skills for active testing

In read-only mode (the default), no customizations are actually invoked; the skill analyzes configurations and documentation to assess expected behavior. In active mode, skills are invoked with test queries to verify actual behavior.

Test reports are written to ~/.claude/logs/evaluations/tests/ using the Write tool.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.