
Load Testing Scenario Designer

```shell
# Install this skill:
npx skills add williamzujkowski/cognitive-toolworks --skill "Load Testing Scenario Designer"
```

Install specific skill from multi-skill repository

# Description

Design load testing scenarios using k6, JMeter, Gatling, or Locust with ramp-up patterns, think time modeling, and performance SLI validation.

# SKILL.md


---
name: "Load Testing Scenario Designer"
slug: testing-load-designer
description: "Design load testing scenarios using k6, JMeter, Gatling, or Locust with ramp-up patterns, think time modeling, and performance SLI validation."
capabilities:
  - generate_load_test_scripts
  - design_ramp_up_patterns
  - model_think_time
  - validate_performance_sli
  - configure_distributed_load
inputs:
  target_service:
    type: string
    description: "URL or endpoint to test (e.g., https://api.example.com/checkout)"
    required: true
  test_type:
    type: enum
    description: "Type of load test to design"
    enum: ["load", "stress", "spike", "soak"]
    required: true
  sli_requirements:
    type: object
    description: "Performance SLI thresholds (p95_latency_ms, throughput_rps, error_rate_percent)"
    required: true
  tool:
    type: enum
    description: "Load testing tool to generate script for"
    enum: ["k6", "jmeter", "gatling", "locust"]
    required: false
    default: "k6"
  scenario_details:
    type: object
    description: "User count, duration, ramp-up time, think time distribution"
    required: false
outputs:
  test_script:
    type: code
    description: "Executable load test script for specified tool"
  test_config:
    type: json
    description: "Test configuration with VUs, duration, ramp-up, stages"
  assertions:
    type: json
    description: "SLI validation thresholds and success criteria"
  execution_plan:
    type: markdown
    description: "How to execute the test with prerequisites and validation steps"
keywords:
  - load-testing
  - performance-testing
  - k6
  - jmeter
  - gatling
  - locust
  - sli-validation
  - ramp-up
  - stress-testing
  - spike-testing
  - soak-testing
  - performance-engineering
version: 1.0.0
owner: william@cognitive-toolworks
license: MIT
security:
  pii: false
  secrets: false
  sandbox: recommended
links:
  - https://k6.io/docs/
  - https://grafana.com/docs/k6/latest/using-k6/
  - https://gatling.io/docs/
  - https://jmeter.apache.org/usermanual/
  - https://docs.locust.io/
  - https://sre.google/sre-book/monitoring-distributed-systems/
---


## Purpose & When-To-Use

Trigger conditions:

  • Validating application performance before production deployment
  • Establishing performance baselines and capacity planning
  • Testing system behavior under peak load, stress, or spike conditions
  • Validating SLI/SLO compliance for latency, throughput, and error rates
  • Simulating realistic user behavior with ramp-up and think time
  • Testing distributed system resilience under sustained load (soak testing)

Use this skill when you need to design realistic, repeatable load testing scenarios with clear performance thresholds, appropriate ramp-up patterns, and tool-specific implementations for k6, JMeter, Gatling, or Locust.


## Pre-Checks

Before execution, verify:

  1. Time normalization: NOW_ET = 2025-10-26T02:31:21-04:00 (NIST/time.gov semantics, America/New_York)
  2. Input schema validation:
    • target_service is a valid URL with protocol (http/https)
    • test_type is one of: load, stress, spike, soak
    • sli_requirements contains numeric values for at least one metric
    • tool (if provided) is one of: k6, jmeter, gatling, locust
    • scenario_details (if provided) has valid numeric ranges
  3. Source freshness: All cited sources accessed on NOW_ET; verify links resolve
  4. Tool compatibility: Confirm target service is accessible and testable

Abort conditions:

  • Target service URL is unreachable or requires complex authentication not specified
  • SLI requirements are contradictory (e.g., "10ms p95 latency" for external API)
  • Test type and scenario details conflict (e.g., "spike test" with gradual ramp-up)
  • Tool selection is incompatible with test requirements (e.g., complex distributed scenarios in basic Locust setup)
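The schema checks above can be sketched in plain JavaScript (the `validateInputs` helper and its error messages are illustrative, not part of any tool's API):

```javascript
// Illustrative pre-check validation for the skill's declared inputs.
const TEST_TYPES = ['load', 'stress', 'spike', 'soak'];
const TOOLS = ['k6', 'jmeter', 'gatling', 'locust'];

function validateInputs({ target_service, test_type, sli_requirements, tool }) {
  const errors = [];

  // target_service must be a valid http(s) URL
  try {
    const url = new URL(target_service);
    if (!['http:', 'https:'].includes(url.protocol)) {
      errors.push(`unsupported protocol: ${url.protocol}`);
    }
  } catch {
    errors.push(`invalid URL: ${target_service}`);
  }

  if (!TEST_TYPES.includes(test_type)) {
    errors.push(`test_type must be one of: ${TEST_TYPES.join(', ')}`);
  }
  if (tool !== undefined && !TOOLS.includes(tool)) {
    errors.push(`tool must be one of: ${TOOLS.join(', ')}`);
  }

  // sli_requirements needs at least one finite numeric metric
  const numericMetrics = Object.values(sli_requirements ?? {}).filter(
    (v) => typeof v === 'number' && Number.isFinite(v)
  );
  if (numericMetrics.length === 0) {
    errors.push('sli_requirements must contain at least one numeric metric');
  }

  return { ok: errors.length === 0, errors };
}
```

A failed check maps directly to an abort condition: surface `errors` to the caller instead of generating a script.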

## Procedure

### T1: Fast Path (≤2k tokens)

Goal: Generate basic load test script with simple ramp-up and assertions.

  1. Parse inputs and apply defaults:
    • Determine tool (default: k6)
    • Extract test type and map to pattern:
      • load: Gradual ramp-up to target VUs, sustain, ramp-down
      • stress: Gradual ramp-up beyond capacity to find breaking point
      • spike: Rapid jump to high VUs, sustain briefly, drop
      • soak: Low/moderate VUs sustained for extended duration
    • Parse SLI requirements (p95_latency_ms, throughput_rps, error_rate_percent)
  2. Generate basic test script (k6 example per k6 docs):

    ```javascript
    import http from 'k6/http';
    import { check, sleep } from 'k6';

    export const options = {
      stages: [
        { duration: '2m', target: 100 }, // Ramp-up
        { duration: '5m', target: 100 }, // Sustain
        { duration: '2m', target: 0 },   // Ramp-down
      ],
      thresholds: {
        http_req_duration: ['p(95)<500'], // 95% <500ms
        http_req_failed: ['rate<0.01'],   // <1% errors
      },
    };

    export default function () {
      const res = http.get('https://api.example.com/checkout');
      check(res, { 'status 200': (r) => r.status === 200 });
      sleep(1); // Think time
    }
    ```

  3. Output initial configuration:

    ```json
    {
      "test_config": {
        "tool": "k6",
        "virtual_users": 100,
        "duration_minutes": 9,
        "ramp_up_minutes": 2,
        "think_time_seconds": 1
      },
      "assertions": {
        "p95_latency_ms": 500,
        "error_rate_percent": 1
      }
    }
    ```

Token budget: ≤2k tokens


### T2: Extended Analysis (≤6k tokens)

Goal: Generate realistic scenarios with advanced patterns, distributed load, and comprehensive assertions.

  1. Design realistic ramp-up pattern based on test type:
    • Load test (per k6 Load Testing):
      • Gradual ramp-up: 0 → target VUs over 10-20% of total test time
      • Sustain at target: 60-70% of total test time
      • Gradual ramp-down: 10-20% of total test time
    • Stress test:
      • Multi-stage ramp: 0 → 50% → 75% → 100% → 125% → 150% → find breaking point
      • Shorter sustain periods at each stage (2-3 minutes)
    • Spike test:
      • Instant jump: 0 → peak VUs in <30 seconds
      • Brief sustain: 1-2 minutes at peak
      • Instant drop: Return to baseline
    • Soak test:
      • Moderate VUs (50-70% of capacity)
      • Extended duration (2-24 hours)
      • Monitor for memory leaks, degradation
  2. Model think time distribution (per Google SRE Book - Load Testing):
    • Use realistic user behavior patterns, not uniform sleep()
    • Apply randomization: sleep(Math.random() * 3 + 1) for 1-4s range
    • Consider page type: landing (5-10s), checkout (30-60s), browse (2-5s)
    • Add variance with percentile-based think time (p50: 3s, p90: 10s, p99: 30s)
  3. Map SLI requirements to tool-specific assertions:
    • k6: Use thresholds object with percentile syntax
    • JMeter: Configure Assertions (Response Assertion, Duration Assertion)
    • Gatling: Use assertions DSL with percentile checks
    • Locust: Custom stats collection and failure conditions
  4. Generate tool-specific advanced script:
    • Add request tagging/grouping for multi-endpoint scenarios
    • Include custom metrics (business transactions, funnel completion)
    • Configure distributed execution parameters if needed
    • Add data parameterization (CSV for users, JSON for payloads)
    • Reference the JMeter User Manual for JMeter-specific patterns
    • Reference the Gatling Documentation for the Gatling DSL
    • Reference the Locust Documentation for Locust class-based tests
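The think-time heuristics above can be condensed into a small sampler (the `sampleThinkTime` helper and its per-page ranges are illustrative; in a k6 script you would pass the result to `sleep()`):

```javascript
// Illustrative think-time sampler: per-page-type ranges with uniform
// randomization, instead of a fixed sleep(1) for every request.
const THINK_TIME_RANGES = {
  browse:   { min: 2,  max: 5 },   // quick scanning between pages
  landing:  { min: 5,  max: 10 },  // reading a landing page
  checkout: { min: 30, max: 60 },  // filling out forms
};

function sampleThinkTime(pageType, rng = Math.random) {
  // Fall back to a generic 1-4s range for unknown page types
  const { min, max } = THINK_TIME_RANGES[pageType] ?? { min: 1, max: 4 };
  return min + rng() * (max - min); // seconds, uniform in [min, max)
}
```

Passing a seeded `rng` instead of `Math.random` keeps runs reproducible, which matters for the determinism checks in Quality Gates.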

Token budget: ≤6k tokens total (including T1)


### T3: Deep Dive (≤12k tokens)

Goal: Advanced patterns including distributed load, custom protocols, and comprehensive monitoring integration.

  1. Design distributed load generation:
    • k6 Cloud/Enterprise: Configure multiple load zones (US-East, US-West, EU-West)
    • JMeter Distributed: Master-slave configuration with RMI
    • Gatling Enterprise: Injection distribution across multiple nodes
    • Locust Distributed: Master-worker architecture with load distribution
  2. Add advanced test patterns:
    • Breakpoint testing: Incrementally increase load until the system breaks
    • Capacity testing: Find maximum sustainable throughput
    • Endurance patterns: Multi-day soak with scheduled load variations
    • Recovery testing: Inject load spikes, measure recovery time
  3. Integrate with observability stack (per Google SRE - Monitoring):
    • Configure Prometheus remote-write for k6 metrics
    • Set up Grafana dashboards for real-time visualization
    • Add CloudWatch/Datadog integration for cloud metrics correlation
    • Configure distributed tracing correlation (OpenTelemetry)
  4. Generate comprehensive execution plan:
    • Pre-test validation: Smoke test, baseline collection
    • Test execution: Monitoring checklist, abort criteria
    • Post-test analysis: Report generation, SLI compliance validation
    • Iterative tuning: Adjust VUs/duration based on results
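For distributed execution, one recurring chore is dividing a total VU target across workers or zones; a minimal sketch (the `splitVus` helper is illustrative, and the master/worker framing follows Locust's model):

```javascript
// Illustrative helper: split a total VU count across N workers as evenly
// as possible, giving the first `remainder` workers one extra VU.
function splitVus(totalVus, workers) {
  if (workers < 1) throw new Error('need at least one worker');
  const base = Math.floor(totalVus / workers);
  const remainder = totalVus % workers;
  return Array.from({ length: workers }, (_, i) => base + (i < remainder ? 1 : 0));
}
```

For example, `splitVus(1000, 3)` yields `[334, 333, 333]`, so no worker is more than one VU off the mean and the shares always sum to the total.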

Token budget: ≤12k tokens total (including T1 + T2)


## Decision Rules

Test type selection guidance:

  • Load test: Normal expected traffic + 20-50% headroom
  • Stress test: 2-3x expected peak load to find breaking point
  • Spike test: 5-10x sudden traffic surge (flash sale, DDoS simulation)
  • Soak test: 50-70% capacity sustained 2-24 hours (memory leak detection)
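These heuristics translate mechanically into stage arrays; a sketch in k6's `stages` shape (the `buildStages` helper and its percentage splits are this document's heuristics, not a k6 API):

```javascript
// Illustrative stage generator: maps a test type to a ramp plan.
function buildStages(testType, peakVus, totalMinutes) {
  switch (testType) {
    case 'load': // gradual ramp, long sustain, gradual ramp-down
      return [
        { duration: `${Math.round(totalMinutes * 0.2)}m`, target: peakVus },
        { duration: `${Math.round(totalMinutes * 0.6)}m`, target: peakVus },
        { duration: `${Math.round(totalMinutes * 0.2)}m`, target: 0 },
      ];
    case 'stress': // step up in 25% increments past expected capacity
      return [0.5, 0.75, 1.0, 1.25, 1.5].map((f) => ({
        duration: '3m',
        target: Math.round(peakVus * f),
      }));
    case 'spike': // near-instant jump, brief sustain, instant drop
      return [
        { duration: '30s', target: peakVus },
        { duration: '2m', target: peakVus },
        { duration: '30s', target: 0 },
      ];
    case 'soak': // moderate load held for the whole window
      return [
        { duration: '5m', target: peakVus },
        { duration: `${totalMinutes}m`, target: peakVus },
        { duration: '5m', target: 0 },
      ];
    default:
      throw new Error(`unknown test type: ${testType}`);
  }
}
```

For a 10-minute load test at 100 VUs this produces a 2m ramp-up, 6m sustain, and 2m ramp-down, matching the 20/60/20 split above.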

VU calculation (requests per second → virtual users):

VUs = (target_RPS × response_time_seconds) / (1 - think_time_ratio)

where think_time_ratio = think_time / (think_time + response_time). By Little's law this simplifies to VUs = target_RPS × (response_time + think_time).

Example:
- Target: 1000 RPS
- Response time: 200ms (0.2s)
- Think time: 1s per request
- think_time_ratio = 1 / 1.2 ≈ 0.833
- VUs = (1000 × 0.2) / (1 - 0.833) = 200 / 0.167 ≈ 1200 VUs (rounding the ratio to 0.83 instead gives the commonly quoted ≈1176)
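The same arithmetic in plain JavaScript (the `requiredVus` helper is illustrative; applying Little's law directly avoids the ratio rounding and gives 1200 for the worked example):

```javascript
// Little's law: concurrency = arrival rate × time in system, so
// VUs = target_RPS × (response_time_s + think_time_s), rounded to whole VUs.
function requiredVus(targetRps, responseTimeS, thinkTimeS) {
  return Math.round(targetRps * (responseTimeS + thinkTimeS));
}
```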

Tool selection matrix:

| Feature | k6 | JMeter | Gatling | Locust |
|---|---|---|---|---|
| Ease of use | High | Medium | Medium | High |
| Protocol support | HTTP/WebSocket/gRPC | Any (plugins) | HTTP/WebSocket/JMS | HTTP/Custom |
| Distributed | Cloud/Enterprise | Built-in (RMI) | Enterprise | Built-in |
| Scripting | JavaScript | GUI + Groovy | Scala DSL | Python |
| Best for | Modern APIs, DevOps | Legacy/complex protocols | JVM apps, high load | Python devs, simple APIs |

SLI threshold recommendations (from Google SRE Book):

  • Latency: p50 <100ms, p95 <500ms, p99 <1s (API endpoints)
  • Throughput: Based on capacity planning (RPS per instance × instance count)
  • Error rate: <0.1% (99.9% success rate, "three nines"), <1% (99% success rate, "two nines")
  • Availability: 99.9% (43.2 min/month downtime), 99.95% (21.6 min/month)
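The downtime figures follow from simple error-budget arithmetic over a 30-day month (43,200 minutes); a sketch:

```javascript
// Allowed downtime per 30-day month for a given availability target.
// 99.9% → 43.2 min, 99.95% → 21.6 min, matching the figures above.
function downtimeMinutesPerMonth(availabilityPercent) {
  const minutesPerMonth = 30 * 24 * 60; // 43,200
  return minutesPerMonth * (1 - availabilityPercent / 100);
}
```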

Stop conditions:

  • If target service returns 5xx errors during smoke test: abort and fix service
  • If SLI requirements are unattainable (require <10ms p95 for external API): renegotiate
  • If test script complexity exceeds tool capabilities: recommend tool change

## Output Contract

Required fields (all outputs):

```typescript
interface LoadTestScript {
  tool: "k6" | "jmeter" | "gatling" | "locust";
  script_content: string;          // Executable test script
  script_language: string;         // "javascript", "xml", "scala", "python"
  entry_point: string;             // How to execute (e.g., "k6 run script.js")
}

interface TestConfig {
  tool: string;
  test_type: "load" | "stress" | "spike" | "soak";
  virtual_users: number | object;  // Number or stages array
  duration_minutes: number;
  ramp_up_pattern: Array<{
    stage: number;
    duration_seconds: number;
    target_vus: number;
  }>;
  think_time_config: {
    min_seconds: number;
    max_seconds: number;
    distribution: "uniform" | "normal" | "exponential";
  };
  distributed_config?: {
    enabled: boolean;
    load_zones?: string[];
    workers?: number;
  };
}

interface Assertions {
  latency_thresholds: {
    p50_ms?: number;
    p95_ms: number;
    p99_ms?: number;
  };
  throughput_threshold?: {
    min_rps: number;
  };
  error_rate_threshold: {
    max_percent: number;
  };
  custom_checks?: Array<{
    metric: string;
    operator: "lt" | "lte" | "gt" | "gte" | "eq";
    value: number;
  }>;
}

interface ExecutionPlan {
  prerequisites: string[];         // Required setup steps
  smoke_test_command: string;      // Pre-flight validation
  full_test_command: string;       // Main execution
  monitoring_checklist: string[];  // What to observe during test
  abort_criteria: string[];        // When to stop test early
  success_criteria: string[];      // How to validate results
  report_generation?: string;      // Post-test analysis steps
}
```

Format:

  • test_script: Valid code for specified tool (JavaScript for k6, XML for JMeter, Scala for Gatling, Python for Locust)
  • test_config: Valid JSON
  • assertions: Valid JSON with numeric values
  • execution_plan: Markdown with code blocks for commands

Validation:

  • Script is syntactically valid for target tool
  • VU counts and durations are positive integers
  • Thresholds are achievable (p95 < p99, error_rate <100%)
  • Think time min < max
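The validation rules above can be sketched against the Assertions contract (the `validateAssertions` helper is illustrative):

```javascript
// Illustrative sanity checks for an Assertions object: percentile
// thresholds must be ordered and the error rate must be a valid percentage.
function validateAssertions(a) {
  const errors = [];
  const { p50_ms, p95_ms, p99_ms } = a.latency_thresholds;
  if (p50_ms !== undefined && p50_ms >= p95_ms) errors.push('p50 must be < p95');
  if (p99_ms !== undefined && p95_ms >= p99_ms) errors.push('p95 must be < p99');
  const pct = a.error_rate_threshold.max_percent;
  if (!(pct >= 0 && pct < 100)) errors.push('max_percent must be in [0, 100)');
  return { ok: errors.length === 0, errors };
}
```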

## Examples

### Example 1: k6 E-Commerce Checkout Load Test (T2)

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '3m', target: 1000 },  // Ramp-up
    { duration: '10m', target: 1000 }, // Sustain at peak
    { duration: '2m', target: 0 },     // Ramp-down
  ],
  thresholds: {
    http_req_duration: ['p(95)<800'], // p95 latency <800ms
    http_req_failed: ['rate<0.005'],  // <0.5% errors
    http_reqs: ['rate>500'],          // throughput >500 RPS
  },
};

export default function () {
  const payload = JSON.stringify({
    cart_id: '123',
    payment: 'card',
  });
  const res = http.post(
    'https://api.example.com/checkout',
    payload,
    { headers: { 'Content-Type': 'application/json' } }
  );
  check(res, {
    'status 200': (r) => r.status === 200,
    'checkout success': (r) => r.json('success') === true,
  });
  sleep(Math.random() * 3 + 2); // 2-5s think time
}
```

## Quality Gates

Token budgets (mandatory):

  • T1 ≤ 2k tokens (basic script + simple assertions)
  • T2 ≤ 6k tokens (realistic scenarios + think time modeling)
  • T3 ≤ 12k tokens (distributed load + monitoring integration)

Safety checks:

  • [ ] No hardcoded credentials or API keys in test scripts
  • [ ] No production data in test payloads (use synthetic/anonymized data)
  • [ ] Load test targets are non-production environments (unless explicitly approved)
  • [ ] Distributed tests include rate limiting to prevent accidental DDoS

Auditability:

  • [ ] All sources cited with access date = NOW_ET
  • [ ] VU calculations include methodology and assumptions
  • [ ] SLI thresholds tied to business requirements or SRE standards
  • [ ] Test results are reproducible with same script + config

Determinism:

  • [ ] Same inputs produce same script structure (±10% VU variance acceptable)
  • [ ] Ramp-up patterns follow documented heuristics
  • [ ] Think time distributions use seeded randomness where possible

Validation checklist:

  • [ ] Script executes without syntax errors
  • [ ] Assertions align with SLI requirements
  • [ ] VU count and duration are realistic for target infrastructure
  • [ ] Think time modeling prevents unrealistic "robot" traffic

## Resources

Primary sources (accessed 2025-10-26):

  1. k6 Documentation: https://k6.io/docs/
    Official k6 load testing tool documentation with test lifecycle, scripting, and thresholds.

  2. k6 Using Guide: https://grafana.com/docs/k6/latest/using-k6/
    Comprehensive guide on test types, scenarios, executors, and distributed testing with k6.

  3. Gatling Documentation: https://gatling.io/docs/
    Gatling load testing framework docs covering Scala DSL, simulation design, and reports.

  4. JMeter User Manual: https://jmeter.apache.org/usermanual/
    Apache JMeter user manual with test plan creation, distributed testing, and protocols.

  5. Locust Documentation: https://docs.locust.io/
    Locust Python-based load testing framework docs with distributed mode and custom tasks.

  6. Google SRE Book - Monitoring Distributed Systems: https://sre.google/sre-book/monitoring-distributed-systems/
    Google SRE principles for SLI/SLO definition, load testing strategies, and performance validation.

Additional templates:

  • See examples/load-test-example.js for complete k6 workflow example
  • See resources/jmeter-template.jmx for JMeter test plan template
  • See resources/gatling-template.scala for Gatling simulation template

Related skills:

  • observability-slo-calculator (for defining SLI/SLO before load testing)
  • testing-chaos-designer (for resilience testing under load)
  • observability-stack-configurator (for monitoring during load tests)

End of SKILL.md

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.