Use when identifying riskiest assumptions, designing pretotypes or MVPs, validating solutions before engineering investment, or building an evidence-based product development practice.
Install this specific skill from the multi-skill repository:
npx skills add DipakMajhi/product-management-skills --skill "assumption-testing"
# Description
Map, score, and design tests for product assumptions before building. Covers assumption extraction across desirability/usability/viability/feasibility, Strategyzer Test Card and Learning Card, Riskiest Assumption Test (RAT), David Bland's Testing Business Ideas methodology, Alberto Savoia pretotyping (XYZ hypothesis, YODA vs OPD), evidence hierarchy (5 levels from opinion to controlled experiment), experiment type decision tree (12 methods ordered by cost and fidelity), sample size guidance for qualitative vs quantitative, learning velocity metrics, experiment portfolio management, common validation mistakes (confirmation bias, survivorship bias, false positives), and kill criteria design. Use when identifying riskiest assumptions, designing pretotypes or MVPs, validating solutions before engineering investment, or building an evidence-based product development practice.
# SKILL.md
name: assumption-testing
description: "Map, score, and design tests for product assumptions before building. Covers assumption extraction across desirability/usability/viability/feasibility, Strategyzer Test Card and Learning Card, Riskiest Assumption Test (RAT), David Bland's Testing Business Ideas methodology, Alberto Savoia pretotyping (XYZ hypothesis, YODA vs OPD), evidence hierarchy (5 levels from opinion to controlled experiment), experiment type decision tree (12 methods ordered by cost and fidelity), sample size guidance for qualitative vs quantitative, learning velocity metrics, experiment portfolio management, common validation mistakes (confirmation bias, survivorship bias, false positives), and kill criteria design. Use when identifying riskiest assumptions, designing pretotypes or MVPs, validating solutions before engineering investment, or building an evidence-based product development practice."
argument-hint: "[describe the product idea or solution, the desired outcome it supports, what you already know vs. what you are guessing, and any constraints on testing]"
Assumption Testing Framework
You help PMs avoid building the wrong thing by surfacing and testing the riskiest assumptions before investing in development. Most products fail not because of poor execution but because the underlying assumptions were wrong.
Apply this skill to: $ARGUMENTS
The Build Trap
The build trap occurs when teams ship features based on opinion rather than evidence. The antidote is assumption testing: identifying what must be true for a solution to work, and testing the riskiest assumptions cheaply before building.
Research indicates roughly 70% of features fail to deliver expected value. Assumption testing is the discipline that prevents wasted effort.
Step 1: Assumption Extraction
For any product idea or solution, extract assumptions across four categories:
Desirability Assumptions (Do customers want this?)
- Users have the problem we think they have
- Users experience this problem frequently enough to seek a solution
- Users are willing to change their current behavior or switch from existing solutions
- The problem is painful enough that users will pay for a solution
- Our target segment is large enough to build a business around
- Users perceive our solution as addressing their problem (not a different one)
Usability Assumptions (Can customers use this?)
- Users can discover and understand the feature without assistance
- Users can complete the key action without confusion or errors
- The solution fits into their existing workflow and tools
- The learning curve is acceptable for the target user sophistication level
- Users understand the value they are getting from the solution
- Error recovery is intuitive (users can fix mistakes without support)
Viability Assumptions (Does this work for the business?)
- This can be built within budget and timeline constraints
- The pricing users will accept generates sufficient margin
- We can acquire customers at a sustainable CAC
- LTV:CAC ratio is at least 3:1 at projected scale
- This does not cannibalize existing revenue streams in a net-negative way
- Regulatory and compliance requirements can be met
- The solution strengthens (or at least does not weaken) our competitive position
Feasibility Assumptions (Can we build this?)
- We have (or can acquire) the technical capability
- We can build this within the required timeline
- Key dependencies (APIs, partners, data sources) are accessible and reliable
- The solution can scale to the required user load
- We can maintain and iterate on this solution with available resources
- Data quality and availability support the solution's core logic
Extraction from Lean Canvas
Every cell on the Lean Canvas contains testable assumptions:
| Canvas Cell | Example Assumptions |
|---|---|
| Problem | Users actually face this problem frequently |
| Customer Segments | This segment is large enough and reachable |
| Unique Value Proposition | Users perceive this as differentiated |
| Solution | This approach solves the problem better than alternatives |
| Channels | We can reach customers through these channels at acceptable cost |
| Revenue Streams | Users will pay this price at this frequency |
| Cost Structure | Unit economics work at scale |
| Key Metrics | These metrics actually indicate product health |
| Unfair Advantage | This advantage is durable and hard to replicate |
Step 2: Assumption Scoring
Score each assumption on two dimensions (1-5 each):
| Dimension | 1 | 3 | 5 |
|---|---|---|---|
| Impact if wrong | Solution still works, minor adjustment | Significant rework required | Entire solution fails, total write-off |
| Certainty (Evidence) | Pure guess, no supporting evidence | Indirect evidence or analogy | Strong data or direct evidence |
Risk Score = Impact x (6 - Certainty). A high-impact pure guess scores the maximum of 5 x (6 - 1) = 25; a low-impact, well-evidenced assumption scores 1 x (6 - 5) = 1.
Highest risk scores get tested first. Plot assumptions on a 2x2 matrix:
| Quadrant | Impact | Certainty | Action |
|---|---|---|---|
| Leap of Faith | High | Low | TEST IMMEDIATELY -- these determine success or failure |
| Known Risk | High | High | MONITOR -- important but currently validated |
| Unknown | Low | Low | DEFER -- test later or accept the risk |
| Safe | Low | High | IGNORE -- low risk, well understood |
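As a sketch, the scoring rule and the quadrant matrix above can be encoded directly. The helper names and the high/low cutoff of 3 are our assumptions, not part of the skill:

```python
def risk_score(impact: int, certainty: int) -> int:
    """Risk Score = Impact x (6 - Certainty); both inputs are scored 1-5,
    where certainty 1 is a pure guess and 5 is strong direct evidence."""
    assert 1 <= impact <= 5 and 1 <= certainty <= 5
    return impact * (6 - certainty)

def quadrant(impact: int, certainty: int, threshold: int = 3) -> str:
    """Place an assumption on the 2x2 matrix (>= threshold counts as high)."""
    if impact >= threshold and certainty < threshold:
        return "Leap of Faith"   # test immediately
    if impact >= threshold:
        return "Known Risk"      # monitor
    if certainty < threshold:
        return "Unknown"         # defer
    return "Safe"                # ignore

# A high-impact pure guess is the riskiest possible assumption:
print(risk_score(5, 1), quadrant(5, 1))  # -> 25 Leap of Faith
```

Sorting the assumption inventory by `risk_score` descending reproduces the "highest risk scores get tested first" rule.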
Assumption Mapping Workshop
To facilitate assumption mapping with a team:
- Use "We believe that..." format for each assumption
- Color-code by category (green = desirability, blue = usability, yellow = viability, red = feasibility)
- Each participant writes assumptions individually (3 minutes)
- Share and cluster as a group (10 minutes)
- Vote on impact and certainty using dot stickers
- Plot on the 2x2 matrix
- Select top 3 leap-of-faith assumptions for immediate testing
Step 3: Evidence Hierarchy
Not all evidence is equal. Know the strength of what you have:
| Level | Evidence Type | Strength | Example |
|---|---|---|---|
| 1 | Opinion and intuition | Weakest | "I think users want this" |
| 2 | Secondary research (OPD) | Low-Medium | Market reports, competitor analysis, published studies |
| 3 | Qualitative primary research | Medium | Customer interviews (5-15 users), observation |
| 4 | Quantitative behavioral data | Medium-High | Analytics, A/B tests, smoke tests measuring real behavior |
| 5 | Controlled experiments | Strongest | Randomized controlled tests, longitudinal studies |
Key principle (Alberto Savoia): Your Own Data (YODA) is always more valuable than Other People's Data (OPD). OPD may be biased, outdated, or collected from a different context. Pretotyping and direct experiments generate YODA.
Progression rule: Start with the cheapest evidence level that provides a useful signal. Only invest in higher-fidelity evidence when lower-cost methods are inconclusive or when the stakes justify it.
Step 4: Experiment Design
For each high-priority assumption, design the cheapest test that provides sufficient signal. Use the Riskiest Assumption Test (RAT) principle: test the assumption, not the full solution.
Experiment Types (Ordered by Cost and Fidelity)
| Method | Cost | Time | Best For | Signal Strength |
|---|---|---|---|---|
| Data mining | Free | Hours | Checking if existing data already answers the question | Medium-High |
| Desk research | Free | Hours | Market size, competitive landscape, regulatory requirements | Low-Medium |
| Customer interviews (5-15) | Low | Days | Desirability assumptions, understanding motivations | Medium |
| Survey (50+ responses) | Low | Days | Quantifying demand, willingness to pay, preferences | Medium |
| Smoke test / Fake door | Low | Days | Measuring real demand for a feature that does not yet exist | High |
| Landing page test | Low | Days-Week | Conversion intent with ad spend driving traffic | High |
| Concierge MVP | Medium | Weeks | Delivering the solution manually to validate value and WTP | High |
| Wizard of Oz | Medium | Weeks | Simulating automation with humans behind the scenes | High |
| Prototype test | Medium | Weeks | Usability assumptions with clickable prototype and real users | High |
| Technical spike | High | Weeks | Feasibility requiring actual code to validate | High |
| Pilot / Beta | High | Months | Full solution validation with limited user group | Very High |
| A/B test (production) | High | Weeks-Months | Measuring causal impact of a change at scale | Very High |
Experiment Selection Decision Tree
- Can existing data answer this? If yes, mine the data first.
- Is this a desirability question? Start with interviews (5-8 users), then validate with a smoke test.
- Is this a usability question? Build a prototype and test with 5 users.
- Is this a viability question? Test pricing with a landing page or concierge MVP.
- Is this a feasibility question? Run a technical spike.
- Is the risk high enough to justify investment? If not, accept the assumption and monitor.
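The decision tree above can be sketched as a small lookup. The function and argument names are ours, chosen for illustration:

```python
def pick_experiment(category: str, has_existing_data: bool = False,
                    risk_justifies_investment: bool = True) -> str:
    """Map an assumption to the cheapest experiment per the decision tree."""
    if has_existing_data:
        return "mine the existing data first"
    if not risk_justifies_investment:
        return "accept the assumption and monitor"
    return {
        "desirability": "interviews (5-8 users), then a smoke test",
        "usability": "prototype test with 5 users",
        "viability": "landing page or concierge MVP pricing test",
        "feasibility": "technical spike",
    }[category]

print(pick_experiment("usability"))  # -> prototype test with 5 users
```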
Sample Size Guidance
| Method | Minimum Sample | Why |
|---|---|---|
| Qualitative interviews | 5-8 per segment | Saturation reached at 12-13; 5 users reveal ~85% of usability issues (Nielsen) |
| Surveys | 50-200 responses | Needed for statistical confidence on proportions |
| Smoke tests | 100-500 exposures | Need enough traffic for meaningful click-through signal |
| A/B tests | Calculate per MDE | Use power calculation based on baseline rate and minimum detectable effect |
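For the last row, a standard per-arm power calculation for two proportions (the normal-approximation formula from general statistics, not something this skill prescribes) looks like:

```python
import math
from statistics import NormalDist

def ab_sample_size(baseline: float, mde: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-arm sample size to detect an absolute lift of `mde` over the
    `baseline` conversion rate at the given significance and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Detecting a 1-point absolute lift on a 5% baseline needs roughly 8,000+ users per arm:
print(ab_sample_size(0.05, 0.01))
```

Note how quickly the requirement shrinks as the minimum detectable effect grows, which is why small optimizations need far more traffic than big bets.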
Step 5: Strategyzer Test Card and Learning Card
Test Card (Before the Experiment)
Document before running any experiment:
| Field | Content |
|---|---|
| Hypothesis | We believe [assumption] |
| Test | To verify, we will [describe the experiment] |
| Metric | We will measure [specific metric] |
| Success criteria | We are right if [threshold] |
| Failure criteria | We are wrong if [threshold] |
| Timeline | This test will take [duration] |
| Cost | [Resources, budget, people needed] |
Critical: Define success and failure criteria BEFORE running the test. This prevents post-hoc rationalization.
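One way to make that pre-commitment concrete is to freeze the Test Card in code before the experiment runs. This is a hypothetical structure; the field names are ours:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: criteria cannot be edited after seeing results
class TestCard:
    hypothesis: str
    test: str
    metric: str
    success_threshold: float   # we are right if the metric meets or exceeds this
    failure_threshold: float   # we are wrong if the metric falls at or below this
    timeline_days: int

    def decide(self, observed: float) -> str:
        if observed >= self.success_threshold:
            return "validated"
        if observed <= self.failure_threshold:
            return "invalidated"
        return "inconclusive: run a follow-up test"

card = TestCard(
    hypothesis="Visitors want the export feature",
    test="Fake door in the main navigation for two weeks",
    metric="click-through rate",
    success_threshold=0.05,
    failure_threshold=0.01,
    timeline_days=14,
)
print(card.decide(0.03))  # falls between the thresholds -> inconclusive
```

The gap between the two thresholds deliberately forces a third outcome, matching the Learning Card's "run follow-up test" decision.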
Learning Card (After the Experiment)
Document after each experiment:
| Field | Content |
|---|---|
| Hypothesis tested | What we set out to validate |
| Observation | What actually happened, with data |
| Learnings | Specific insight gained |
| Decision | Pivot / Persevere / Stop / Run follow-up test |
| Confidence change | How much did our certainty increase? |
Step 6: Pretotyping (Alberto Savoia)
Pretotyping answers "Should we build this?" before prototyping answers "Can we build this?"
XYZ Hypothesis Format
"At least [X percent] of [Y target segment] will [Z measurable action]."
Example: "At least 20% of packaged sushi buyers will buy second-day sushi if it is half the price of fresh sushi."
This format forces specificity: who, how many, what behavior. No vague "users will like it."
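To judge an XYZ hypothesis from observed behavior rather than the raw percentage alone, one option (our choice of method, not part of Savoia's framework) is the Wilson lower confidence bound: the hypothesis passes only if even the pessimistic estimate still clears X.

```python
import math
from statistics import NormalDist

def wilson_lower(successes: int, n: int, confidence: float = 0.95) -> float:
    """Lower bound of the Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = successes / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

def xyz_validated(x: float, successes: int, n: int) -> bool:
    """'At least X of Y will Z': true if the pessimistic rate still clears X."""
    return wilson_lower(successes, n) >= x

# 65 of 250 sushi buyers (26%) took the half-price offer; does "at least 20%" hold?
print(xyz_validated(0.20, 65, 250))  # -> True
```

A naive comparison of 26% against 20% would pass trivially; the interval check guards against small samples producing flattering point estimates.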
Pretotype Techniques
| Technique | How It Works | Example |
|---|---|---|
| Mechanical Turk | Replace complex automation with humans doing the work manually; users do not know | Zappos founders manually buying and shipping shoes before building e-commerce |
| Pinocchio | Non-functional physical or visual representation to gauge reaction | Jeff Hawkins carrying a wooden block sized like a Palm Pilot |
| Fake Door | Button, menu item, or link for a feature that does not exist; measure clicks | "Coming soon" feature in navigation; track click-through rate |
| One-Night Stand | Temporary, bare-minimum version of the experience | Pop-up shop, one-day event, manual email sequence |
| Re-Label | Take an existing product and repackage/reposition it | Rename an existing feature and test if new framing increases adoption |
| Infiltrator | Place your product idea inside an existing platform to test demand | List a product on a marketplace before building it |
Pretotype Success Metrics
Do not rely on stated intent ("Would you use this?"). Measure actual behavior:
| Signal | Benchmark | Strength |
|---|---|---|
| Click-through on fake door | 2-5% indicates real interest | Medium |
| Email signup for waitlist | 5-10% of landing page visitors | Medium-High |
| Pre-order or deposit | Any meaningful conversion | Very High (strongest signal) |
| Time spent engaging | Above baseline for similar content | Medium |
| Repeat behavior | Users return without prompting | High |
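The fake-door row of the benchmark table can be applied as a rough heuristic. The 5% "strong" cutoff is our extrapolation from the 2-5% band, and the minimum-exposure floor follows the smoke-test sample guidance above:

```python
def fake_door_signal(clicks: int, exposures: int) -> str:
    """Interpret fake-door click-through against the benchmark table."""
    if exposures < 100:
        return "insufficient exposures: keep the test running"
    ctr = clicks / exposures
    if ctr >= 0.05:
        return "strong interest: escalate to a higher-fidelity test"
    if ctr >= 0.02:
        return "real interest"
    return "weak signal: likely curiosity clicks at best"

print(fake_door_signal(12, 400))  # 3% CTR -> real interest
```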
Step 7: Common Validation Mistakes
| Mistake | What Goes Wrong | Prevention |
|---|---|---|
| Confirmation bias | Testing to validate rather than to learn; cherry-picking positive results | Pre-commit to success/failure criteria; have a skeptic review the test design |
| Leading questions | Interview questions that suggest the desired answer | Use open-ended, neutral questions; have someone else review your script |
| Survivorship bias | Only interviewing current users (not churned or non-adopters) | Include churned users and people who evaluated but did not adopt |
| False positives from smoke tests | Curiosity clicks mistaken for purchase intent | Measure depth of engagement, not just clicks; require multi-step commitment |
| Stated vs. revealed preference | People say they would use it but behavior says otherwise | Always measure behavior over stated intent |
| Testing safe assumptions | Validating what you already know instead of what is risky | Use the assumption scoring matrix; test high-impact, low-certainty first |
| Insufficient sample | Drawing conclusions from 2-3 data points | Follow sample size guidance per method |
| Frankenstein MVP | Assembling bits from various sources without coherent experience | Ensure the test artifact delivers a coherent value proposition, even if minimal |
| Moving goalposts | Changing success criteria after seeing results | Write Test Cards before experiments; do not change criteria post-hoc |
Step 8: Learning Velocity and Experiment Portfolio
Learning Velocity
Learning velocity measures how fast the team converts uncertainty into validated knowledge.
| Metric | What It Measures | Healthy Range |
|---|---|---|
| Assumptions tested per sprint | Rate of learning | 1-2 per sprint |
| Learning-to-experiment ratio | Quality of experiments | 1:1 (each experiment produces a usable learning) |
| Time from hypothesis to decision | Cycle time | Days to weeks, not months |
| Percentage of solutions discarded based on evidence | Willingness to learn vs. confirm | 30-50% (teams that never discard are not testing risky assumptions) |
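Given a simple experiment log, the metrics in this table can be computed directly. The log schema and the example entries are hypothetical:

```python
from datetime import date

# Each entry: (hypothesis date, decision date, produced a usable learning?, solution discarded?)
log = [
    (date(2024, 3, 1),  date(2024, 3, 8),  True,  False),
    (date(2024, 3, 4),  date(2024, 3, 20), True,  True),
    (date(2024, 3, 10), date(2024, 4, 2),  False, False),
]

learning_ratio = sum(1 for e in log if e[2]) / len(log)              # target ~1:1
avg_cycle_days = sum((d - h).days for h, d, _, _ in log) / len(log)  # days, not months
discard_rate = sum(1 for e in log if e[3]) / len(log)                # healthy: 0.30-0.50

print(round(learning_ratio, 2), round(avg_cycle_days, 1), round(discard_rate, 2))
```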
Portfolio Approach to Experiments
Run a portfolio of experiments simultaneously across different risk levels:
| Risk Level | Experiment Investment | Expected Outcome |
|---|---|---|
| High risk, high reward | 20% of experiment budget | Some will fail; the ones that succeed create breakthrough value |
| Medium risk | 50% of experiment budget | Most will produce useful learning |
| Low risk (optimization) | 30% of experiment budget | Incremental improvements on validated features |
Key insight: Learning velocity (converting uncertainty into validated knowledge) matters more than experiment velocity (number of experiments run). A team running 10 inconclusive experiments per sprint is less effective than a team running 2 decisive ones.
Step 9: Decision Framework
After running experiments, make an explicit decision:
| Evidence Pattern | Decision | Next Step |
|---|---|---|
| Core desirability validated, feasibility confirmed | Build | Move to full product development |
| Desirability validated but usability concerns | Iterate on design | Run prototype tests before building |
| Desirability strong, viability uncertain | Test pricing | Conduct willingness-to-pay research |
| Core desirability assumption invalidated | Pivot | Return to opportunity discovery; reframe the problem |
| Multiple assumptions invalidated | Kill | Stop investing; document learnings for future reference |
| Mixed signals, insufficient data | Run follow-up test | Design higher-fidelity experiment targeting the ambiguous assumption |
Kill Criteria
Define kill criteria before starting experiments:
"We will abandon this idea if ANY of the following are true after testing:
- Fewer than [X%] of target users express the problem
- Willingness to pay is below [$Y]
- Technical feasibility requires more than [Z months] of engineering
- [Specific competitive barrier] cannot be overcome"
Build Trigger
"We will commit to full development when ALL of the following are validated:
- Desirability: [Specific evidence threshold]
- Usability: [Specific evidence threshold]
- Viability: [Specific evidence threshold]
- Feasibility: [Specific evidence threshold]"
Step 10: Integration with Continuous Discovery
Assumption testing is most effective embedded in a continuous discovery cadence:
- Weekly customer interviews surface new assumptions and validate existing ones
- Opportunity Solution Trees connect assumptions to specific opportunities and solutions
- Assumption backlogs are maintained and groomed alongside the product backlog
- Test velocity becomes a team metric: how many assumptions are we validating per sprint?
- Learning logs accumulate institutional knowledge about what works and what does not
Output Template
ASSUMPTION MAP
Solution/Idea: [Description]
Desired Outcome: [What business goal this supports]
Date: [When this analysis was created]
ASSUMPTION INVENTORY
| ID | Assumption | Category | Impact (1-5) | Certainty (1-5) | Risk Score | Priority |
|----|-----------|----------|-------------|-----------------|------------|----------|
| A1 | [Text] | [D/U/V/F]| X | X | X | Test Now / Monitor / Defer / Ignore |
| A2 | ... | | | | | |
EVIDENCE INVENTORY
| Assumption | Current Evidence | Evidence Level (1-5) | Gaps |
|-----------|-----------------|---------------------|------|
| A1 | [What we know] | [Level] | [What we need] |
LEAP-OF-FAITH ASSUMPTIONS (Top 3 to Test)
A[N]: [Assumption]
- Category: [Desirability / Usability / Viability / Feasibility]
- Why it is risky: [Impact explanation + lack of evidence]
- Cheapest valid test: [Test type + detailed description]
- Test Card:
- Hypothesis: We believe [statement]
- Test: To verify, we will [method]
- Metric: We will measure [specific metric]
- Success criteria: [Threshold that means proceed]
- Failure criteria: [Threshold that means pivot]
- Timeline: [Duration]
- Cost: [Resources needed]
- Sample size: [N and rationale]
RECOMMENDED TESTING SEQUENCE
[Ordered list: which assumptions to test first and why, including dependencies]
KILL CRITERIA
[What combined evidence pattern would cause us to abandon this idea entirely]
BUILD TRIGGER
[What combined evidence across D/U/V/F would give confidence to commit to full development]
LEARNING LOG (updated after each experiment)
| Date | Assumption | Method | Result | Confidence Change | Decision |
|------|-----------|--------|--------|-------------------|----------|
| [Date] | [A1] | [Method] | [Finding] | [+/-] | [Proceed / Pivot / Kill] |
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.