npx skills add miles-knowbl/orchestrator --skill "estimation"
Install specific skill from multi-skill repository
# Description
Estimate effort, complexity, and duration for systems and features. Provides frameworks for sizing work, accounting for risk, and calibrating estimates over time. Supports planning and expectation setting throughout the engineering loop.
# SKILL.md
name: estimation
description: "Estimate effort, complexity, and duration for systems and features. Provides frameworks for sizing work, accounting for risk, and calibrating estimates over time. Supports planning and expectation setting throughout the engineering loop."
phase: INIT
category: core
version: "1.0.0"
depends_on: []
tags: [planning, estimation, sizing, effort]
Estimation
Predict effort before you build.
When to Use
- New system: Estimate before starting implementation
- Feature planning: Size work for prioritization
- Sprint planning: Break down into time-boxed chunks
- Stakeholder communication: Set realistic expectations
- Resource allocation: Plan team capacity
- Trade-off decisions: Compare build vs buy, now vs later
Reference Requirements
MUST read before applying this skill:
| Reference | Why Required |
|---|---|
| estimation-methods.md | Different estimation approaches |
| estimate-template.md | Format for estimate documentation |
Read if applicable:
| Reference | When Needed |
|---|---|
| complexity-factors.md | When assessing complexity |
Verification: Check calibration data before finalizing estimate.
Required Deliverables
| Deliverable | Location | Condition |
|---|---|---|
| ESTIMATE.md | Project root | Always |
Core Concept
Estimation answers: "How much effort will this take?"
ESTIMATION

  INPUT                 OUTPUT
  FeatureSpec   -->     Complexity: Large
  Context       -->     Effort: 40-60 hours
  Constraints   -->     Duration: 2-3 weeks
                        Risk: Medium (1.5x buffer)
                        Confidence: Medium

  Estimation is NOT a commitment; it's a forecast with uncertainty.
Estimation Dimensions
| Dimension | What It Measures | Units |
|---|---|---|
| Complexity | How hard is this? | S / M / L / XL |
| Effort | How much work? | Person-hours or person-days |
| Duration | How long on calendar? | Days or weeks |
| Risk | How uncertain? | Multiplier (1.2x - 3x) |
| Confidence | How sure are we? | High / Medium / Low |
Complexity vs Effort vs Duration
COMPLEXITY ≠ EFFORT ≠ DURATION

Example: Database migration
  Complexity: Small (straightforward script)
  Effort: 4 hours (write, test, document)
  Duration: 2 weeks (needs DBA review, maintenance window)

Example: New microservice
  Complexity: Large (many moving parts)
  Effort: 80 hours
  Duration: 2 weeks (if 1 person) or 1 week (if 2 people)
The Estimation Process
1. UNDERSTAND SCOPE
   - Read FeatureSpec thoroughly
   - Identify all capabilities
   - Note interfaces and integrations
   - List unknowns and assumptions
2. BREAK DOWN
   - Decompose into estimable chunks
   - Each chunk should be < 1 day of work
   - Identify dependencies between chunks
3. SIZE EACH CHUNK
   - Apply estimation method
   - Note complexity factors
   - Record assumptions
4. ACCOUNT FOR RISK
   - Identify uncertainties
   - Apply risk multiplier
   - Consider worst-case scenarios
5. AGGREGATE
   - Sum effort estimates
   - Calculate duration (accounting for parallelism)
   - State confidence level
6. COMMUNICATE
   - Present as range, not point
   - Explain assumptions and risks
   - Update as you learn more
Applying Calibration Data
IMPORTANT: Before finalizing any estimate, check for historical calibration data.
Step 1.5: Load Calibration
After understanding scope (Step 1), load calibration data:
1. Check for calibration file:
Path: learning/calibration.json
2. If found, extract relevant multipliers:
- adjustments.global.agenticMultiplier (if agentic execution)
- adjustments.byComplexity[SIZE] (S/M/L/XL)
- adjustments.byPhase[PHASE] (per-phase adjustments)
- adjustments.byCategory[CATEGORY] (domain-specific)
3. Check confidence levels:
- < 3 samples: Do NOT apply (flag for future tracking)
- 3-5 samples: Apply cautiously with ±30% range note
- 6+ samples: Apply with confidence
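The loading steps above can be sketched in Python as follows. This is a minimal sketch, not the orchestrator's actual loader: the skill does not specify the layout of each entry in `learning/calibration.json`, so the `{"multiplier": ..., "samples": ...}` shape and the helper names `load_calibration` and `pick_multiplier` are assumptions made for illustration.

```python
import json
from pathlib import Path

MIN_SAMPLES = 3  # below this, the skill says not to apply an adjustment


def load_calibration(path="learning/calibration.json"):
    """Return the parsed calibration file, or None if it does not exist."""
    file = Path(path)
    if not file.exists():
        return None
    return json.loads(file.read_text(encoding="utf-8"))


def pick_multiplier(entry):
    """Return (multiplier, note) for one adjustment entry.

    Assumes the hypothetical entry shape {"multiplier": float, "samples": int}.
    """
    if entry is None or entry.get("samples", 0) < MIN_SAMPLES:
        return 1.0, "skipped (n<3), flag for tracking"
    if entry["samples"] <= 5:
        return entry["multiplier"], "applied cautiously (±30% range)"
    return entry["multiplier"], "applied with confidence"


# In-memory stand-in for the file contents (hypothetical values).
calibration = {
    "adjustments": {
        "byPhase": {"IMPLEMENT": {"multiplier": 0.75, "samples": 12}},
        "byCategory": {"MCP": {"multiplier": 1.1, "samples": 2}},
    }
}
adjustments = calibration["adjustments"]
print(pick_multiplier(adjustments["byPhase"].get("IMPLEMENT")))
print(pick_multiplier(adjustments["byCategory"].get("MCP")))
```

Returning `1.0` for skipped entries keeps downstream arithmetic uniform: every adjustment can be multiplied in, and the note records why nothing changed.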
Applying Phase Multipliers
When using Skill-Phase Estimation, apply historical adjustments:
## Calibrated Skill-Phase Estimate
| Phase | Base Hours | Multiplier (samples) | Adjusted | Confidence |
|-------|-----------|----------------------|----------|------------|
| spec | 0.5h | 1.15 (8) | 0.58h | Medium |
| architect | 1h | 1.25 (8) | 1.25h | Medium |
| implement | 6h | 0.75 (12) | 4.5h | High |
| test | 2h | 0.80 (10) | 1.6h | Medium |
| verify | 0.5h | 1.0 (3) | 0.5h | Low |
| other phases | 2.5h | 1.0 (0) | 2.5h | None |
| **Total** | **12.5h** | | **10.9h** | |
Calibration impact: -13% from historical data
Document Calibration in ESTIMATE.md
Add a calibration section to every estimate:
## Calibration Applied
| Adjustment | Multiplier | Samples | Confidence | Applied |
|------------|------------|---------|------------|---------|
| Global agentic | 0.3x | 6 | Medium | Yes (±20%) |
| Complexity (M) | 0.85x | 4 | Low | Yes (±30%) |
| Phase: IMPLEMENT | 0.75x | 12 | High | Yes |
| Phase: TEST | 0.80x | 10 | Medium | Yes (±20%) |
| Category: MCP | 1.1x | 2 | Very Low | No (n<3) |
### Unadjusted Estimate
- Total: 12.5 hours
### Calibrated Estimate
- Total: 10.9 hours
- Adjustments applied: 4
- Adjustments skipped (low confidence): 1
- Overall confidence: Medium-High
Confidence Rules
| Samples | Confidence | Action |
|---|---|---|
| 0 | None | Use 1.0x (no adjustment) |
| 1-2 | Very Low | Do NOT apply, flag for tracking |
| 3-5 | Low | Apply with ±30% range note |
| 6-10 | Medium | Apply with ±20% range |
| 11+ | High | Apply with confidence |
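The confidence rules can be expressed as a small lookup, shown here as a sketch applied to the phase rows from the calibrated example earlier in this section. The function names are illustrative, and treating exactly 10 samples as Medium is an assumption about how the tiers are meant to meet.

```python
def confidence_for(samples):
    """Map a sample count to (confidence, range_pct, apply) per the rules table."""
    if samples <= 0:
        return "None", None, False
    if samples <= 2:
        return "Very Low", None, False
    if samples <= 5:
        return "Low", 30, True
    if samples <= 10:  # assumption: exactly 10 samples counts as Medium
        return "Medium", 20, True
    return "High", None, True


def adjust(base_hours, multiplier, samples):
    """Apply a calibration multiplier only when the sample count allows it."""
    confidence, range_pct, apply = confidence_for(samples)
    adjusted = base_hours * multiplier if apply else base_hours
    return adjusted, confidence, range_pct


# (phase, base hours, multiplier, samples)
phases = [
    ("spec", 0.5, 1.15, 8),
    ("architect", 1.0, 1.25, 8),
    ("implement", 6.0, 0.75, 12),
    ("test", 2.0, 0.80, 10),
    ("verify", 0.5, 1.0, 3),
]
for name, base, multiplier, samples in phases:
    hours, confidence, range_pct = adjust(base, multiplier, samples)
    note = f" (±{range_pct}%)" if range_pct else ""
    print(f"{name}: {base}h -> {hours:.2f}h, confidence {confidence}{note}")
```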
No Calibration Data
If learning/calibration.json doesn't exist or has no relevant data:
## Calibration Applied
No historical calibration data available for this domain.
Using base estimates. After completion:
- Actual hours will be recorded
- Calibration multipliers will be calculated
- Future estimates will benefit from this data
This estimate contributes to: First calibration baseline
Estimation Methods
1. T-Shirt Sizing
Quick relative sizing for early planning.
| Size | Relative Effort | Typical Duration | Example |
|---|---|---|---|
| S | 1x | < 1 day | Add a field, fix a bug |
| M | 2-3x | 1-3 days | New endpoint, simple feature |
| L | 5-8x | 1-2 weeks | New service, complex feature |
| XL | 13-20x | 2-4 weeks | Major system, many integrations |
When to use: Initial scoping, backlog grooming, rough planning.
2. Analogous Estimation
Compare to similar past work.
## Analogous Estimate
**Item:** User notification service
**Similar to:** Email service (completed 3 months ago)
| Aspect | Email Service | Notification Service | Adjustment |
|--------|---------------|---------------------|------------|
| Core logic | 20 hours | Similar | 20 hours |
| Integrations | 2 (SMTP, templates) | 4 (push, SMS, email, in-app) | 40 hours (+20) |
| Testing | 8 hours | More channels | 16 hours |
| **Total** | **40 hours** | | **76 hours** |
**Confidence:** Medium (similar but more integrations)
When to use: You've done similar work before.
3. Parametric Estimation
Calculate based on countable units.
## Parametric Estimate
**Item:** REST API for Order Service
| Component | Count | Hours Each | Total |
|-----------|-------|------------|-------|
| Endpoints | 8 | 3 | 24 |
| Database models | 4 | 2 | 8 |
| Integration tests | 8 | 1.5 | 12 |
| Documentation | 8 | 0.5 | 4 |
| **Subtotal** | | | **48** |
| Setup/config | | | 8 |
| **Total** | | | **56 hours** |
**Basis:** Historical average of 3 hours per endpoint (including tests)
When to use: Repetitive, well-understood work.
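As a sketch, the parametric arithmetic is a count-times-rate sum plus fixed setup. The figures below are the ones from the Order Service example; the variable names are illustrative.

```python
# (component, count, hours each) from the Order Service example
components = [
    ("Endpoints", 8, 3.0),
    ("Database models", 4, 2.0),
    ("Integration tests", 8, 1.5),
    ("Documentation", 8, 0.5),
]
SETUP_HOURS = 8  # fixed setup/config cost, not tied to a count

subtotal = sum(count * hours_each for _, count, hours_each in components)
total = subtotal + SETUP_HOURS

print(f"Subtotal: {subtotal:g} hours")  # 48
print(f"Total: {total:g} hours")        # 56
```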
4. Three-Point Estimation
Account for uncertainty with optimistic/likely/pessimistic.
## Three-Point Estimate
**Item:** Payment integration
| Scenario | Estimate | Notes |
|----------|----------|-------|
| Optimistic (O) | 24 hours | Clean API, good docs, no issues |
| Most Likely (M) | 40 hours | Typical integration challenges |
| Pessimistic (P) | 80 hours | Poor API, compliance issues, rework |
**PERT Estimate:** (O + 4M + P) / 6 = (24 + 160 + 80) / 6 = **44 hours**
**Standard Deviation:** (P - O) / 6 = 9.3 hours
**Range:** 35-53 hours (±1 SD)
**Confidence:** Medium
When to use: Significant uncertainty, need to communicate risk.
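The PERT arithmetic in the payment integration example can be reproduced with a short helper. This is a sketch, and the function name `pert` is illustrative.

```python
def pert(optimistic, most_likely, pessimistic):
    """Return (expected, standard deviation) using the PERT weighting."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev


expected, sd = pert(24, 40, 80)
print(f"PERT estimate: {expected:.0f} hours")   # 44
print(f"Standard deviation: {sd:.1f} hours")    # 9.3
print(f"Range (±1 SD): {expected - sd:.0f}-{expected + sd:.0f} hours")  # 35-53
```

The most likely value carries four times the weight of either extreme, so a long pessimistic tail moves the expected value only moderately (here from 40 to 44 hours).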
5. Bottom-Up Estimation
Sum detailed task estimates.
## Bottom-Up Estimate
**Item:** Work Order Service
### Capability 1: Work Order CRUD
| Task | Hours |
|------|-------|
| Database schema | 2 |
| Model and repository | 3 |
| Create endpoint | 2 |
| Read endpoints (list, detail) | 3 |
| Update endpoint | 2 |
| Delete endpoint | 1 |
| Unit tests | 4 |
| Integration tests | 3 |
| **Subtotal** | **20** |
### Capability 2: Assignment
| Task | Hours |
|------|-------|
| Assignment logic | 4 |
| Availability check | 3 |
| Notification trigger | 2 |
| Tests | 4 |
| **Subtotal** | **13** |
[... more capabilities ...]
### Summary
| Capability | Hours |
|------------|-------|
| CRUD | 20 |
| Assignment | 13 |
| Status transitions | 10 |
| Completion flow | 12 |
| **Implementation total** | **55** |
| Scaffolding | 4 |
| Documentation | 6 |
| Code review / fixes | 8 |
| **Grand total** | **73 hours** |
When to use: Detailed planning, accurate forecasts, sprint commitment.
6. Skill-Phase Estimation
Estimate by engineering skill/phase for calibration accuracy.
## Skill-Phase Estimate
**Item:** Work Order Service
### Phase Breakdown
| Phase | Skill | Estimated | Notes |
|-------|-------|-----------|-------|
| Specification | spec | 0.5h | FEATURESPEC.md |
| Specification | estimation | 0.25h | This document |
| Architecture | architect | 1h | ARCHITECTURE.md |
| Setup | scaffold | 0.5h | Project structure |
| Implementation | implement | 6h | 4 capabilities |
| Testing | test-generation | 2h | Unit + integration |
| Verification | code-verification | 0.5h | Lint, types, tests |
| Validation | code-validation | 0.5h | Full system check |
| Documentation | document | 0.5h | README, API docs |
| Review | code-review | 0.5h | Self-review, PR |
| Ship | deploy | 0.25h | PR, merge |
| **Total** | | **12.5h** | |
### Per-Capability Breakdown
For skills called multiple times (implement, test-generation, code-verification):
| ID | Capability | Implement | Test | Verify | Total |
|----|------------|-----------|------|--------|-------|
| C1 | Work Order CRUD | 90m | 30m | 10m | 130m |
| C2 | Assignment | 60m | 20m | 5m | 85m |
| C3 | Status transitions | 45m | 15m | 5m | 65m |
| C4 | Completion flow | 60m | 20m | 5m | 85m |
| **Total** | | 4.25h | 1.4h | 0.4h | 6.1h |
Why use this:
- Maps directly to skillsLog for calibration
- Calibration compares estimate vs actual per-skill
- Identifies which phases we systematically over/under-estimate
When to use: All agentic execution. This is the primary estimation format.
→ See references/estimation-methods.md
Complexity Factors
Apply multipliers for conditions that increase difficulty:
| Factor | Impact | Multiplier |
|---|---|---|
| New technology | Learning curve, unknowns | +50-200% |
| Integration complexity | Each external system | +20-40% per integration |
| Security requirements | Auth, encryption, audit | +30-50% |
| Performance requirements | Optimization, caching | +20-40% |
| Regulatory/compliance | Documentation, controls | +50-100% |
| UI complexity | Complex interactions, polish | +20-50% |
| Data migration | ETL, validation, rollback | +30-100% |
| Legacy code | Understanding, compatibility | +30-50% |
| Distributed system | Coordination, consistency | +40-80% |
| Real-time requirements | WebSockets, streaming | +30-50% |
Applying Factors
## Complexity-Adjusted Estimate
**Base estimate:** 40 hours
**Applicable factors:**
- New technology (learning Kafka): +50% → +20 hours
- 2 integrations (Auth, Inventory): +30% each → +24 hours
- Security (handles PII): +30% → +12 hours
**Adjusted estimate:** 40 + 20 + 24 + 12 = **96 hours**
Note: Factors may overlap; apply judgment to avoid double-counting.
→ See references/complexity-factors.md
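The adjustment in the example is additive: each factor is a percentage of the base estimate, and the additions are summed rather than compounded. A minimal sketch of that calculation, with an illustrative function name and a `(label, percent, count)` tuple layout chosen for this example:

```python
def apply_complexity_factors(base_hours, factors):
    """Add each factor as a percentage of the base estimate.

    `factors` is a list of (label, percent, count) tuples; `count` covers
    per-item factors such as "+30% per integration".
    """
    additions = [
        (label, base_hours * percent * count / 100)
        for label, percent, count in factors
    ]
    adjusted = base_hours + sum(hours for _, hours in additions)
    return adjusted, additions


adjusted, additions = apply_complexity_factors(
    40,
    [
        ("New technology (learning Kafka)", 50, 1),
        ("Integrations (Auth, Inventory)", 30, 2),
        ("Security (handles PII)", 30, 1),
    ],
)
for label, hours in additions:
    print(f"{label}: +{hours:g} hours")
print(f"Adjusted estimate: {adjusted:g} hours")  # 96
```

Summing against the base keeps overlapping factors from inflating each other. Compounding the same three factors would give a noticeably larger figure.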
Risk and Uncertainty
Risk Categories
| Category | Examples | Impact |
|---|---|---|
| Technical | New tech, complex algorithms, performance | High variance |
| Integration | Third-party APIs, legacy systems | Dependencies |
| Requirements | Unclear scope, changing needs | Rework |
| Resource | Key person unavailable, skill gaps | Delays |
| External | Vendor delays, regulatory changes | Blockers |
Risk Multipliers
| Confidence | Risk Level | Multiplier | When to Apply |
|---|---|---|---|
| High | Low | 1.0-1.2x | Well-understood, done before |
| Medium | Medium | 1.3-1.5x | Some unknowns, new elements |
| Low | High | 1.5-2.0x | Many unknowns, new territory |
| Very Low | Very High | 2.0-3.0x | Unprecedented, research-like |
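One way to turn the multiplier table into a reportable range is to scale the base estimate by both ends of the band. This sketch assumes the multiplier is applied to the whole base estimate. The dictionary and function names are illustrative.

```python
# Confidence level -> (low multiplier, high multiplier), from the table above
RISK_MULTIPLIERS = {
    "High": (1.0, 1.2),
    "Medium": (1.3, 1.5),
    "Low": (1.5, 2.0),
    "Very Low": (2.0, 3.0),
}


def risk_adjusted_range(base_hours, confidence):
    """Return (low, high) hours after applying the risk multiplier band."""
    low, high = RISK_MULTIPLIERS[confidence]
    return base_hours * low, base_hours * high


low, high = risk_adjusted_range(40, "Medium")
print(f"I estimate {low:.0f}-{high:.0f} hours (confidence: Medium)")
```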
Communicating Uncertainty
Always present estimates as ranges:
❌ "It will take 40 hours"
✅ "I estimate 30-50 hours, most likely around 40"
❌ "We'll be done in 2 weeks"
✅ "Target is 2 weeks; risk factors could push to 3 weeks"
Estimate Output Format
# Estimate: [System/Feature Name]
## Summary
| Dimension | Value |
|-----------|-------|
| Complexity | [S/M/L/XL] |
| Effort | [X-Y hours] |
| Duration | [X-Y days/weeks] |
| Confidence | [High/Medium/Low] |
| Risk Multiplier | [1.Xx] |
## Scope
[What's included]
- Capability 1
- Capability 2
- [...]
[What's NOT included]
- Out of scope item 1
- [...]
## Breakdown
| Component | Base Hours | Factors | Adjusted |
|-----------|------------|---------|----------|
| [Component 1] | X | [factors] | Y |
| [Component 2] | X | [factors] | Y |
| **Total** | | | **Z** |
## Assumptions
- [Assumption 1]
- [Assumption 2]
## Risks
| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| [Risk 1] | [H/M/L] | [H/M/L] | [Action] |
## Dependencies
- Requires [X] to be complete
- Blocked by [Y] until [date]
## Historical Comparison
[Similar past work and how this compares]
---
*Estimated by: [Agent/Person]*
*Date: [Date]*
*Valid until: [Date or "requirements change"]*
→ See references/estimate-template.md
Calibration
Track estimates vs actuals to improve over time.
Tracking Template
## Estimate Retrospective
**System:** Work Order Service
**Estimated:** 73 hours
**Actual:** 92 hours
**Variance:** +26%
### What Was Different?
| Component | Estimated | Actual | Variance | Why |
|-----------|-----------|--------|----------|-----|
| CRUD | 20 | 18 | -10% | Went smoothly |
| Assignment | 13 | 24 | +85% | Availability logic more complex |
| Status | 10 | 14 | +40% | Edge cases discovered |
| Completion | 12 | 15 | +25% | Signature handling tricky |
| Other | 18 | 21 | +17% | Normal variance |
### Lessons Learned
1. Assignment logic always more complex than expected → increase multiplier
2. Need to account for edge case discovery → add 20% buffer
3. Integration tests took longer than unit tests → adjust ratio
### Adjustment for Future
- Assignment/scheduling features: apply 1.5x multiplier
- Add 15% buffer for edge case discovery
Calibration Metrics
| Metric | Calculation | Target |
|---|---|---|
| Accuracy | Actual / Estimated | 0.9 - 1.1 |
| Precision | Std dev of (Actual / Estimated) | < 0.3 |
| Bias | Average (Actual - Estimated) | ~0 |
Consistent underestimate → Increase base estimates
Consistent overestimate → Decrease base estimates
High variance → Break down further, reduce unknowns
→ See references/calibration-guide.md
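The three metrics can be computed from the retrospective data with the standard library. This is a sketch under two assumptions: accuracy is taken over the totals, and precision uses the sample standard deviation of the per-component ratios. The skill does not say which variants it intends.

```python
from statistics import mean, stdev

# (component, estimated hours, actual hours) from the Work Order Service retrospective
history = [
    ("CRUD", 20, 18),
    ("Assignment", 13, 24),
    ("Status", 10, 14),
    ("Completion", 12, 15),
    ("Other", 18, 21),
]

estimated_total = sum(est for _, est, _ in history)
actual_total = sum(act for _, _, act in history)
ratios = [act / est for _, est, act in history]

accuracy = actual_total / estimated_total           # target: 0.9 - 1.1
precision = stdev(ratios)                           # target: < 0.3
bias = mean(act - est for _, est, act in history)   # target: ~0

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Bias: {bias:+.1f} hours per component")
```

For this data the accuracy is about 1.26 and the bias is positive, which matches the "consistent underestimate" case.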
Common Estimation Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Forgetting overhead | Only count coding time | Add 20-30% for meetings, reviews, context switching |
| Optimism bias | Assume best case | Use three-point or add buffer |
| Anchoring | First number sticks | Estimate independently, then compare |
| Scope creep | Requirements grow | Document assumptions, re-estimate on change |
| Hero estimates | "I could do it in X" | Estimate for average developer |
| Ignoring dependencies | Assume parallel work | Map dependencies, account for handoffs |
| Not updating | Stale estimates | Re-estimate as you learn |
Relationship to Other Skills
| Skill | Relationship |
|---|---|
| spec | Provides scope to estimate |
| triage | Uses estimates for prioritization |
| entry-portal | Estimates feed into queue planning |
| loop-controller | Estimates inform session planning |
Key Principles
Estimate in ranges. Point estimates are false precision.
Break it down. Smaller pieces are easier to estimate.
Document assumptions. They're as important as the number.
Track and learn. Calibrate based on actuals.
Update when things change. Estimates have a shelf life.
Communicate uncertainty. Stakeholders need to understand risk.
Mode-Specific Behavior
Estimation approach differs by orchestrator mode:
Greenfield Mode
| Aspect | Behavior |
|---|---|
| Scope | Full system: all capabilities and layers |
| Approach | Comprehensive: bottom-up + skill-phase |
| Patterns | Free choice: establish estimation baselines |
| Deliverables | Full estimate with risk factors |
| Validation | Historical comparison from similar projects |
| Constraints | Minimal; medium to high uncertainty expected |
Greenfield estimation:
- Estimate all layers (data, service, API, UI)
- Include scaffolding and setup time
- Account for learning curve on new patterns
- Plan for comprehensive test coverage
- Include documentation time
Greenfield risk factors:
Base estimate: X hours
+ New technology learning: +30-50%
+ Architecture decisions: +20-30%
+ Comprehensive testing: +20-30%
+ Documentation: +10-15%
= Adjusted estimate: 1.8x - 2.2x base
Brownfield-Polish Mode
| Aspect | Behavior |
|---|---|
| Scope | Gap-specific: missing capabilities only |
| Approach | Extend existing: gap-based estimation |
| Patterns | Should match existing velocity patterns |
| Deliverables | Gap-based estimate with compatibility buffer |
| Validation | Velocity in this codebase |
| Constraints | Low to medium uncertainty; known territory |
Polish estimation:
- Estimate only what's missing
- Include time to understand existing code
- Account for maintaining compatibility
- Reduced testing (fill gaps only)
- Minimal documentation updates
Polish estimation formula:
For each gap:
Understanding time: 0.5-2 hours (existing code review)
Implementation time: Based on gap complexity
Testing time: Match existing coverage
Integration time: Ensure compatibility
Total = Sum of gaps × 1.2 (compatibility buffer)
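A minimal sketch of the Polish formula: sum the four time components for each gap, then apply the 1.2 compatibility buffer once to the total. The gap values and the dictionary keys are hypothetical.

```python
COMPATIBILITY_BUFFER = 1.2  # from the formula above


def polish_estimate(gaps):
    """Sum per-gap hours, then apply the compatibility buffer to the total."""
    raw_hours = sum(sum(gap.values()) for gap in gaps)
    return raw_hours * COMPATIBILITY_BUFFER


# Hypothetical gaps; hours per component are made up for illustration.
gaps = [
    {"understanding": 1.0, "implementation": 4.0, "testing": 2.0, "integration": 1.0},
    {"understanding": 0.5, "implementation": 2.0, "testing": 1.0, "integration": 0.5},
]
total = polish_estimate(gaps)
print(f"Polish estimate: {total:.1f} hours")
```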
Polish-specific factors:
| Factor | Impact |
|--------|--------|
| Code quality | Low quality = +30-50% |
| Test coverage | Low coverage = +20-40% |
| Documentation | Poor docs = +20-30% |
| Coupling | High coupling = +20-40% |
Brownfield-Enterprise Mode
| Aspect | Behavior |
|---|---|
| Scope | Change-specific: single change only |
| Approach | Surgical: change-impact analysis |
| Patterns | Must conform exactly to team velocity |
| Deliverables | Change estimate with review cycles |
| Validation | Team velocity in this system |
| Constraints | Low uncertainty; constrained scope |
Enterprise estimation:
- Estimate the specific change
- Include impact analysis time
- Account for review cycles
- Plan for comprehensive testing (regression)
- Include CI/CD pipeline time
Enterprise estimation formula:
Impact analysis: 1-4 hours
Implementation: Based on change size
Regression testing: Proportional to risk
Review cycles: 2-4 hours per cycle
CI/CD: Fixed (pipeline duration)
Buffer for process: +20%
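The Enterprise formula can be sketched the same way. The skill does not say what the +20% process buffer is applied to. This sketch assumes it covers the whole subtotal, and the input values are hypothetical.

```python
PROCESS_BUFFER = 1.2  # "+20%" buffer for process; applied to the subtotal (assumption)


def enterprise_estimate(impact_analysis, implementation, regression_testing,
                        review_cycles, hours_per_review, pipeline_hours):
    """Sum the fixed and variable components, then add the process buffer."""
    subtotal = (
        impact_analysis
        + implementation
        + regression_testing
        + review_cycles * hours_per_review
        + pipeline_hours
    )
    return subtotal * PROCESS_BUFFER


# Hypothetical change: 2h analysis, 8h implementation, 3h regression testing,
# two review cycles at 2h each, and a 1h pipeline run.
total = enterprise_estimate(2, 8, 3, review_cycles=2, hours_per_review=2,
                            pipeline_hours=1)
print(f"Enterprise estimate: {total:.1f} hours")
```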
Enterprise constraints:
- Fixed time for security review
- Fixed time for compliance checks
- Multiple approval stages
- Scheduled deployment windows
Mode Comparison
| Aspect | Greenfield | Polish | Enterprise |
|---|---|---|---|
| Typical multiplier | 1.8x - 2.2x | 1.2x - 1.5x | 1.1x - 1.3x |
| Biggest uncertainty | Architecture | Compatibility | Process |
| Estimation unit | Capabilities | Gaps | Changes |
| Calibration source | Similar projects | This codebase | This system |
Estimation Output by Mode
Greenfield estimate structure:
## Estimate: [System Name]
Complexity: L (new system)
Base effort: 80 hours
Risk multiplier: 1.8x
Adjusted: 120-160 hours
Confidence: Medium
Polish estimate structure:
## Estimate: [Gap Fill]
Gaps identified: 5
Base effort: 24 hours
Compatibility buffer: 1.2x
Adjusted: 28-32 hours
Confidence: Medium-High
Enterprise estimate structure:
## Estimate: [Change Request]
Change scope: Minimal (2 files)
Implementation: 8 hours
Review/process: 6 hours
Total: 14-16 hours
Confidence: High
References
- references/estimation-methods.md: Detailed method explanations
- references/complexity-factors.md: Factor catalog with examples
- references/estimate-template.md: Standard estimate document
- references/calibration-guide.md: Improving estimates over time