# Story Refiner

by bobchao
# Install this skill:
npx skills add bobchao/pm-skills-rfp-to-stories --skill "Story Refiner"

Installs this specific skill from the multi-skill repository.

# Description

Evaluates User Story quality and automatically corrects items not meeting standards. Reviews from developer, QA, and stakeholder perspectives, directly producing improved versions for low-quality Stories, reducing manual intervention.

# SKILL.md


name: "Story Refiner"
description: "Evaluates User Story quality and automatically corrects items not meeting standards. Reviews from developer, QA, and stakeholder perspectives, directly producing improved versions for low-quality Stories, reducing manual intervention."


Story Refiner Skill

Language Preference

Default: Respond in the same language as the user's input or as explicitly requested by the user.

If the user specifies a preferred language (e.g., "請用中文回答", "Reply in Japanese"), use that language for all outputs. Otherwise, match the language of the provided Stories.


Role Definition

You simultaneously play three roles to review User Stories:

  1. Senior Developer: Evaluates technical feasibility and estimation clarity
  2. QA Engineer: Evaluates testability and acceptance criteria clarity
  3. Product Stakeholder: Evaluates requirement coverage and value clarity

Core Principles

Correction Over Reporting

  • Don't just point out problems; fix them directly
  • Every flagged issue must have a corresponding improved version
  • Humans only need final confirmation, not manual correction

Conservative Correction

  • Only correct Stories with "obvious problems"
  • Don't correct for the sake of correcting
  • Stories that already pass don't need changes

Transparent Annotation

  • Clearly explain why corrections were made
  • Provide original vs. improved version comparison
  • Let humans choose to accept the improvement or keep the original version

Input Format

This Skill accepts the following inputs:

  1. Story Writer output (recommended)
  2. A list of User Stories in any format
  3. Original RFP + Stories (enables cross-referencing requirement coverage)

Evaluation Criteria Reference

All scoring and evaluation must follow the standards defined in references/evaluation-criteria.md.

This document defines:
- Three scoring dimensions (Development Clarity, Testability, Value Clarity)
- Detailed scoring criteria for each dimension (1-5 points)
- Specific checkpoints and common deduction patterns
- Final score calculation method

Important: Both Quick Scan (Phase 1) and Detailed Evaluation (Phase 2) use these same criteria, with different levels of depth.


Evaluation Flow

Phase 1: Quick Scan

Score each Story initially (1-5 points) using the three dimensions from references/evaluation-criteria.md:

Scoring Method:
1. Quickly assess each dimension (Development Clarity, Testability, Value Clarity) on a 1-5 scale
2. Calculate final score: round((Development Clarity + Testability + Value Clarity) / 3)
3. Use the scoring criteria tables in references/evaluation-criteria.md as reference

Quick Assessment Focus:
- Development Clarity: Is action specific? Scope clear? Dependencies clear?
- Testability: Can write test cases? Acceptance criteria present? Value verifiable?
- Value Clarity: Value clear? Role correct? Maps to requirements?

| Score | Level | Action |
|-------|-------|--------|
| 5 | Excellent | Keep, no modification |
| 4 | Good | Keep, may have minor suggestions |
| 3 | Passing | Mark for observation, may need minor adjustments |
| 2 | Insufficient | Must correct |
| 1 | Severely insufficient | Must rewrite |

Only Stories scoring ≤ 3 enter Phase 2 detailed evaluation.
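
To make the arithmetic concrete, here is a minimal sketch of the Phase 1 scoring and gating logic (Python, with hypothetical helper names; the per-dimension scores themselves come from references/evaluation-criteria.md):

```python
from statistics import mean

# Sketch only: the dimension scores (1-5) are assumed to come from
# applying references/evaluation-criteria.md to a single Story.
def quick_scan_score(dev_clarity: int, testability: int, value_clarity: int) -> int:
    """Final score = round((Development Clarity + Testability + Value Clarity) / 3)."""
    return round(mean([dev_clarity, testability, value_clarity]))

def needs_detailed_review(final_score: int) -> bool:
    """Only Stories scoring <= 3 enter Phase 2 detailed evaluation."""
    return final_score <= 3

# Example: a Story rated 4 / 3 / 2 averages 3.0 -> final score 3 -> detailed review.
assert needs_detailed_review(quick_scan_score(4, 3, 2))
```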

Phase 2: Multi-Perspective Detailed Evaluation

For Stories needing review, perform detailed evaluation from three perspectives using the Specific Checkpoints and Common Deduction Patterns defined in references/evaluation-criteria.md.

👨‍💻 Developer Perspective

Reference: references/evaluation-criteria.md - Dimension 1: Development Clarity

Detailed Checkpoints (from evaluation-criteria.md):
- [ ] Is action description specific?
  - 5 points: "Upload JPG/PNG format images, limited to 5MB"
  - 3 points: "Upload images"
  - 1 point: "Handle images"
- [ ] Does scope have boundaries?
  - 5 points: "Edit article title and content"
  - 3 points: "Edit article"
  - 1 point: "Manage articles"
- [ ] Are dependencies clear?
  - 5 points: Clearly marked "requires US-001 login feature completed first"
  - 3 points: Implied dependency but not marked
  - 1 point: Confusing or circular dependencies

Common Problems (see evaluation-criteria.md for deduction patterns):
- Vague verbs: "manage", "handle", "maintain" (-1~2 points)
- No scope boundary: "all settings", "various reports" (-1~2 points)
- Compound features: "create and edit" (-1 point)
- Technical details mixed in: "load using AJAX" (-1 point)

🧪 QA Perspective

Reference: references/evaluation-criteria.md - Dimension 2: Testability

Detailed Checkpoints (from evaluation-criteria.md):
- [ ] Are acceptance criteria clear?
  - 5 points: Has specific Given-When-Then or checklist
  - 3 points: Has general direction but not specific
  - 1 point: No acceptance criteria, or vague like "should be user-friendly"
- [ ] Is value verifiable?
  - 5 points: "so that I can find target article within 3 seconds" (measurable)
  - 3 points: "so that I can find articles faster" (relative but comparable)
  - 1 point: "so that I can have a better experience" (not measurable)
- [ ] Are error scenarios considered?
  - 5 points: Clearly states error handling
  - 3 points: Only happy path, but error handling can be inferred
  - 1 point: Error scenarios not considered at all, even though they matter for this feature

Common Problems (see evaluation-criteria.md for deduction patterns):
- No acceptance criteria: None at all (-1~2 points, important features deduct more)
- Vague criteria: "should be fast", "should look good" (-1 point)
- Untestable value: "so that I can have better experience" (-2 points)

👀 Stakeholder Perspective

Reference: references/evaluation-criteria.md - Dimension 3: Value Clarity

Detailed Checkpoints (from evaluation-criteria.md):
- [ ] Does "so that..." state real value?
- 5 points: "so that I can pull up data within 10 seconds when customer calls"
- 3 points: "so that I can quickly view data"
- 1 point: "so that I can use this feature" (circular reasoning)
- [ ] Is role correct?
- 5 points: Role is clear and is the true beneficiary of this feature
- 3 points: Role too generic (e.g., "user" covers too much)
- 1 point: Wrong role (e.g., giving admin feature to regular user)
- [ ] Maps to original requirements?
- 5 points: Can directly trace to a specific RFP paragraph
- 3 points: Is reasonably derived implied requirement
- 1 point: Can't see connection to original requirements

Common Problems (see evaluation-criteria.md for deduction patterns):
- Circular reasoning: "so that I can use this feature" (-2 points)
- Role too generic: Everything is "user" (-1 point)
- Technical task disguised: "As a developer" (-3 points)
- Deviates from original requirements: Features RFP didn't mention (-1~2 points)

Phase 3: Auto-Correction

For Stories scoring ≤ 3, execute corrections based on problem type:

Correction Strategies

| Problem Type | Correction Method |
|--------------|-------------------|
| Scope too large | Split into multiple Stories |
| Scope vague | Add specific operation description |
| Value unclear | Rewrite "so that..." part |
| Not testable | Add specific acceptance criteria |
| Format issue | Adjust to standard format |
| Wrong role | Correct to proper role |
| Improper granularity | Split or merge |
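
As a compact illustration, the same mapping could be sketched as a lookup (the problem-type labels are illustrative, not a fixed taxonomy defined by this skill):

```python
# Sketch: problem type -> correction strategy, mirroring the table above.
# Labels are illustrative only; the skill describes these strategies in prose.
CORRECTION_STRATEGIES = {
    "scope_too_large":      "split into multiple Stories",
    "scope_vague":          "add a specific operation description",
    "value_unclear":        "rewrite the 'so that...' part",
    "not_testable":         "add specific acceptance criteria",
    "format_issue":         "adjust to the standard format",
    "wrong_role":           "correct to the proper role",
    "improper_granularity": "split or merge",
}

def pick_strategy(problem_type: str) -> str:
    # Unknown problem types fall back to manual review rather than guessing.
    return CORRECTION_STRATEGIES.get(problem_type, "flag for manual review")
```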

Correction Principles

  1. Minimum change: If small change works, don't make big changes
  2. Preserve intent: Don't change original requirement intent
  3. Clear annotation: Explain what was changed and why

Phase 4: Iterative Validation (Max 3 Rounds)

Corrected Stories need re-evaluation to ensure quality meets standards. This is the core of iterative refinement.

Why Iteration Is Needed

| Situation | Single-Pass Refinement Problem | Iterative Solution |
|-----------|-------------------------------|--------------------|
| Story is split | New Stories aren't evaluated | ✅ Next round evaluates new Stories |
| Over-correction | Might break something | ✅ Next round catches and fine-tunes |
| Acceptance criteria still not specific | Passes through | ✅ Next round strengthens |

Iteration Flow

Round 1: Evaluate all Stories → Correct low-scoring items → Produce corrected version
    ↓
Round 2: Evaluate "corrected" + "newly generated" Stories → Correct again if needed
    ↓
Round 3: (If still issues) Final fine-tuning
    ↓
Terminate: Output final version

Termination Conditions (Stop when any is met)

  1. Quality achieved: All Stories score ≥ 4
  2. No corrections needed: This round had no Story corrections
  3. Limit reached: Already executed 3 rounds
  4. Convergence failed: Same Story corrected 2 rounds in a row but score didn't improve
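
A sketch of how these termination conditions might be expressed in code (hypothetical data shapes and names; the skill applies the same rules in prose):

```python
# Sketch: termination check for the iterative refinement loop.
# `scores` maps Story IDs to their current 1-5 score;
# `corrections_this_round` counts corrections made in the round just finished;
# `stalled` is True when a Story was corrected two rounds in a row
# without its score improving. All names are illustrative.
def should_terminate(scores: dict[str, int],
                     corrections_this_round: int,
                     round_number: int,
                     stalled: bool,
                     max_rounds: int = 3,
                     pass_threshold: int = 4) -> bool:
    if all(s >= pass_threshold for s in scores.values()):   # 1. quality achieved
        return True
    if corrections_this_round == 0:                          # 2. no corrections needed
        return True
    if round_number >= max_rounds:                           # 3. round limit reached
        return True
    if stalled:                                               # 4. convergence failed
        return True
    return False
```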

Iteration Rules

| Rule | Description |
|------|-------------|
| Progressive convergence | Each round should reduce problems, not increase them |
| History memory | Track each Story's correction history, avoid back-and-forth changes |
| Correction limit | Same Story can only be majorly changed once, then only fine-tuned |
| New Story priority | From round 2, prioritize evaluating Stories generated in previous round |

Decreasing Correction Intensity

| Round | Allowed Correction Types |
|-------|--------------------------|
| Round 1 | All corrections (split, rewrite, add acceptance criteria, etc.) |
| Round 2 | Moderate corrections (add acceptance criteria, adjust wording, minor splits) |
| Round 3 | Fine-tuning only (word corrections, add details, no splitting or rewriting) |

This design ensures:
- Round 1 solves structural problems
- Round 2 handles omissions and fine-tuning
- Round 3 is just wrap-up, avoiding infinite modification
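
Expressed as data, the intensity schedule above might look like this (a sketch; the correction-type names are illustrative):

```python
# Sketch: allowed correction types per refinement round,
# mirroring the "Decreasing Correction Intensity" table above.
ALLOWED_CORRECTIONS = {
    1: {"split", "rewrite", "add_acceptance_criteria", "adjust_wording", "fine_tune"},
    2: {"add_acceptance_criteria", "adjust_wording", "minor_split", "fine_tune"},
    3: {"fine_tune"},  # wrap-up only: word fixes and added details
}

def is_allowed(round_number: int, correction_type: str) -> bool:
    return correction_type in ALLOWED_CORRECTIONS.get(round_number, set())
```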

Iteration Summary Output

Record at end of each round:

### Round N Refinement Summary

| Metric | Value |
|--------|-------|
| Stories Evaluated | XX |
| Corrections Made | XX |
| New (from splits) | XX |
| Average Score Improvement | +X.X |

**This Round's Corrections**:
- US-XXX: [Correction summary]
- US-XXX: [Correction summary]

**Continue?**: [Yes/No, reason]

Output Format

Structure Overview

# Story Refinement Report

## 📊 Refinement Summary

### Overall Results
- Original Story Count: XX
- Final Story Count: XX (including split additions)
- Refinement Rounds: X / 3
- Termination Reason: [Quality achieved / No corrections needed / Limit reached]

### Per-Round Statistics
| Round | Evaluated | Corrected | Added | Average Score |
|-------|-----------|-----------|-------|---------------|
| Round 1 | XX | XX | XX | X.X |
| Round 2 | XX | XX | XX | X.X |
| ... | ... | ... | ... | ... |

## 🔄 Refinement History
[Per-round correction summaries, collapsible]

## ✅ Final Passing Stories
[Stories scoring ≥ 4]

## 🔧 Corrected Stories
[Original → Final version comparison, noting correction round]

## ➕ Split-Generated Stories
[New Stories from splits]

## 🗑️ Recommended for Removal
[Stories not matching requirements or duplicates]

## 📋 Final Story List
[Complete integrated list, ready for use]

Correction Detail Format

### 🔧 US-XXX: [Title]

**Original Version**:
> As a [role], I want [action], so that [value].

**Problem Diagnosis**:
- 🧪 QA Perspective: Acceptance criteria unclear, can't write tests
- 👨‍💻 Developer Perspective: Scope includes multiple independent features

**Correction Method**: Split into two Stories + add acceptance criteria

**Improved Version**:

**US-XXX-A**: As a [role], I want [action A], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1
  - [ ] Condition 2

**US-XXX-B**: As a [role], I want [action B], so that [value].
- Acceptance Criteria:
  - [ ] Condition 1

---

Special Situation Handling

Situation 1: Large Number of Stories Need Correction (>50%)

This may indicate systematic issues in the Story Writer phase:

  1. Don't correct one by one (too inefficient)
  2. Identify common problem patterns
  3. Propose systematic suggestions
  4. Recommend re-running Story Writer

Situation 2: Discovered Missing Features

If comparing against the RFP reveals features not covered by any Story:

  1. Mark as "recommended addition"
  2. Produce suggested Story
  3. Mark source (derived from which part of RFP)

Situation 3: Discovered Duplicate Stories

  1. Mark duplicate items
  2. Recommend which to keep (or merge)
  3. Explain judgment basis

Situation 4: Story Quality Is Excellent

If all Stories score ≥ 4:

  1. Briefly confirm "Quality is good, no corrections needed"
  2. Can provide minor optimization suggestions (not mandatory)
  3. Directly output final list

Output Example

Refer to assets/refine-example.md for complete output example.


Reference Documents

  • Evaluation Criteria: references/evaluation-criteria.md - Defines detailed scoring standards for all three dimensions
  • Output Example: assets/refine-example.md - Complete refinement report example

Integration with Other Skills

Standard Flow

[rfp-analyzer] → [story-writer] → [story-refiner] → Final output

Usage: After Story Writer produces the User Stories draft, use Story Refiner to evaluate their quality and automatically correct low-scoring Stories. This is a separate step that should be invoked explicitly when refinement is needed.


Quality Threshold Settings

Default Threshold

  • Pass threshold: ≥ 4 points
  • Must correct: ≤ 2 points
  • Observation zone: 3 points (optional correction)

Strict Mode

When the user requests a "strict check" or the project risk is higher:

  • Pass threshold: 5 points
  • Must correct: ≤ 3 points
  • All Stories must have acceptance criteria

Lenient Mode

When the user requests a "quick pass" or the project is an MVP/POC:

  • Pass threshold: ≥ 3 points
  • Only correct severe issues scoring ≤ 1 point
  • Acceptance criteria optional
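
The three threshold profiles could also be summarized as configuration, for example (a sketch; field names are illustrative, not part of the skill's interface):

```python
from dataclasses import dataclass

# Sketch: the three quality-threshold profiles described above.
@dataclass(frozen=True)
class ThresholdProfile:
    pass_threshold: int              # score at or above which a Story passes
    must_correct_at_or_below: int    # score at or below which correction is mandatory
    acceptance_criteria_required: bool

PROFILES = {
    # Default mode does not state an acceptance-criteria rule; False is an assumption here.
    "default": ThresholdProfile(pass_threshold=4, must_correct_at_or_below=2,
                                acceptance_criteria_required=False),
    "strict":  ThresholdProfile(pass_threshold=5, must_correct_at_or_below=3,
                                acceptance_criteria_required=True),
    "lenient": ThresholdProfile(pass_threshold=3, must_correct_at_or_below=1,
                                acceptance_criteria_required=False),
}
```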

Checklist

After completing refinement, confirm the following items:

  • [ ] All Stories scoring ≤ 2 points have been corrected or rewritten
  • [ ] Corrected Stories meet INVEST principles
  • [ ] Split-generated new Stories have proper numbering
  • [ ] Final list has no duplicates
  • [ ] All original requirement coverage preserved
  • [ ] Clear annotation of which are original vs. improved versions
  • [ ] Termination reason is reasonable (not forced stop from reaching limit)
  • [ ] No Story was changed back-and-forth across multiple rounds

Iterative vs. Single-Pass Refinement

When to Use Iterative (Default)

  • Formal projects
  • Story count > 10
  • Has split operations
  • Higher quality requirements

When to Use Single-Pass

When the user explicitly says "quick refine" or "one pass only":

  • MVP/POC projects
  • Time pressure
  • Story count < 10
  • General quality requirements

Why the 3-Round Limit

  1. Rule of thumb: Most problems are resolved within 2 rounds
  2. Diminishing returns: Round 3+ corrections are usually nitpicking
  3. Avoid over-engineering: Infinite refinement may drift from original requirements
  4. Time cost: Each round requires processing time

If large numbers of low-scoring Stories remain after 3 rounds:
1. Output current results with annotations
2. Suggest returning to Story Writer to regenerate
3. Analyze whether RFP itself has systematic issues

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.