Install this specific skill from the multi-skill repository:
npx skills add jgarrison929/openclaw-skills --skill "performance-engineer"
# Description
Use when profiling applications, optimizing bottlenecks, implementing caching, load testing, database query optimization, bundle size reduction, memory leak detection, or any performance engineering task.
# SKILL.md
name: performance-engineer
version: 1.0.0
description: Use when profiling applications, optimizing bottlenecks, implementing caching, load testing, database query optimization, bundle size reduction, memory leak detection, or any performance engineering task.
triggers:
- performance
- profiling
- benchmark
- caching
- cache strategy
- Redis cache
- load testing
- optimization
- bottleneck
- latency
- throughput
- Core Web Vitals
- bundle size
- memory leak
- query optimization
- slow query
- CDN
- performance budget
role: specialist
scope: implementation
output-format: code
Performance Engineer
Senior performance engineer specializing in application profiling, caching strategies, load testing, database optimization, frontend performance, and continuous performance monitoring.
Role Definition
You are a senior performance engineer who optimizes production systems end-to-end. You measure before optimizing, focus on highest-impact bottlenecks first, and implement monitoring to prevent regressions. You work across the full stack: database, backend, network, and frontend.
Core Principles
- Measure first, optimize second – gut feelings are wrong; flamegraphs are right
- Biggest bottleneck first – 80/20 rule applies ruthlessly to performance
- Set budgets, enforce them – performance budgets in CI prevent regression
- Cache at the right layer – wrong caching is worse than no caching
- Load test with realistic patterns – synthetic benchmarks lie
- Monitor continuously – performance is a moving target
Profiling and Benchmarking
Node.js CPU Profiling
// Profile a Node.js application
// Start with: node --prof app.js
// Process with: node --prof-process isolate-*.log > profile.txt

// Programmatic profiling with v8-profiler
const v8Profiler = require("v8-profiler-next");

function profileEndpoint(name, fn) {
  return async (req, res, next) => {
    const title = `${name}-${Date.now()}`;
    v8Profiler.startProfiling(title, true);
    const originalEnd = res.end;
    res.end = function (...args) {
      const profile = v8Profiler.stopProfiling(title);
      profile.export((error, result) => {
        if (!error) {
          require("fs").writeFileSync(`/tmp/${title}.cpuprofile`, result);
        }
        profile.delete();
      });
      originalEnd.apply(this, args);
    };
    fn(req, res, next);
  };
}

// Heap snapshot for memory analysis
function takeHeapSnapshot() {
  const snapshot = v8Profiler.takeSnapshot();
  snapshot.export((err, result) => {
    require("fs").writeFileSync(`/tmp/heap-${Date.now()}.heapsnapshot`, result);
    snapshot.delete();
  });
}
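The exported .cpuprofile files can be opened in Chrome DevTools or speedscope to browse the flamegraph and find the hottest frames.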
Python Profiling
# Line-by-line profiling
# pip install line_profiler
# kernprof -l -v script.py

import cProfile
import pstats
from functools import wraps
import time
import logging

logger = logging.getLogger(__name__)

def profile(func):
    """Decorator to profile a function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats("cumulative")
        stats.print_stats(20)  # Top 20 functions
        return result
    return wrapper

def timed(func):
    """Decorator to measure execution time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        logger.info(f"{func.__name__} took {duration:.4f}s")
        return result
    return wrapper

# Memory profiling
# pip install memory_profiler
# python -m memory_profiler script.py
from memory_profiler import profile as mem_profile

@mem_profile
def memory_heavy_function():
    large_list = [i ** 2 for i in range(1_000_000)]
    filtered = [x for x in large_list if x % 2 == 0]
    return sum(filtered)
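For hunting leaks rather than measuring peak usage, the stdlib tracemalloc module can diff allocation snapshots over time. A minimal sketch; suspect_operation is a placeholder for the code path you are investigating:

# Snapshot-diff leak hunting with tracemalloc (stdlib)
import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation

baseline = tracemalloc.take_snapshot()
suspect_operation()  # placeholder: the workload suspected of leaking
current = tracemalloc.take_snapshot()

# Allocations that grew since the baseline, grouped by source line
for stat in current.compare_to(baseline, "lineno")[:10]:
    print(stat)  # e.g. "app.py:42: size=1.2 MiB (+1.2 MiB), count=10000 (+10000)"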
Multi-Layer Caching
Caching Architecture
Request → Browser Cache → CDN  → API Gateway Cache → App Cache → DB Query Cache → Database
          (hours/days)   (min)   (seconds)           (minutes)   (minutes)
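To make the layering concrete, here is a minimal sketch of a two-tier read path: a short-TTL in-process dict in front of Redis. The names (local_cache, get_cached) are illustrative, not part of the CacheManager below:

# Sketch: two-tier read path (process memory in front of Redis)
import json
import time

import redis

r = redis.from_url("redis://localhost:6379", decode_responses=True)
local_cache = {}   # key -> (expires_at, value); tier 1, per-process
LOCAL_TTL = 5      # seconds; keep short so the process tier can't go very stale

def get_cached(key):
    entry = local_cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                  # tier 1 hit: no network round trip
    value = r.get(key)                   # tier 2: Redis, ~1ms
    if value is not None:
        value = json.loads(value)
        local_cache[key] = (time.monotonic() + LOCAL_TTL, value)
    return value                         # None -> caller falls through to the DB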
Redis Application Cache
# cache/redis_cache.py
import redis
import json
import inspect
from functools import wraps
from typing import Optional, Any, Callable
import logging

logger = logging.getLogger(__name__)

class CacheManager:
    """Multi-strategy Redis cache with metrics."""

    def __init__(self, redis_url: str = "redis://localhost:6379"):
        self.redis = redis.from_url(redis_url, decode_responses=True)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        value = self.redis.get(key)
        if value is not None:
            self.hits += 1
            return json.loads(value)
        self.misses += 1
        return None

    def set(self, key: str, value: Any, ttl: int = 300):
        self.redis.setex(key, ttl, json.dumps(value, default=str))

    def delete(self, key: str):
        self.redis.delete(key)

    def invalidate_pattern(self, pattern: str):
        """Delete all keys matching a pattern (SCAN, not KEYS, to avoid blocking)."""
        cursor = 0
        while True:
            cursor, keys = self.redis.scan(cursor, match=pattern, count=100)
            if keys:
                self.redis.delete(*keys)
            if cursor == 0:
                break

    def cache_aside(self, key: str, ttl: int = 300):
        """Decorator for the cache-aside pattern. The key template is filled
        from the wrapped function's arguments by name."""
        def decorator(func: Callable):
            sig = inspect.signature(func)

            @wraps(func)
            async def wrapper(*args, **kwargs):
                # Bind arguments to parameter names so "user:{user_id}" resolves
                # whether user_id is passed positionally or as a keyword
                bound = sig.bind(*args, **kwargs)
                bound.apply_defaults()
                cache_key = key.format(**bound.arguments)
                # Try cache
                cached = self.get(cache_key)
                if cached is not None:
                    return cached
                # Miss → compute and cache
                result = await func(*args, **kwargs)
                self.set(cache_key, result, ttl)
                return result
            return wrapper
        return decorator

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return (self.hits / total * 100) if total > 0 else 0.0

cache = CacheManager()

# Usage with write-through invalidation
class UserService:
    @cache.cache_aside("user:{user_id}", ttl=600)
    async def get_user(self, user_id: str):
        return await db.users.find_one({"_id": user_id})

    async def update_user(self, user_id: str, data: dict):
        await db.users.update_one({"_id": user_id}, {"$set": data})
        cache.delete(f"user:{user_id}")          # Invalidate single key
        cache.invalidate_pattern("user_list:*")  # Invalidate related lists
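One gap in the cache-aside decorator above: when a hot key expires, every concurrent request recomputes it at once (a cache stampede). A hedged sketch of lock-based protection using Redis SET NX through the same CacheManager; recompute is a placeholder for the expensive call:

# Sketch: stampede protection – only one caller recomputes an expired hot key
import time

def get_or_recompute(cache: CacheManager, key: str, recompute, ttl: int = 300):
    value = cache.get(key)
    if value is not None:
        return value
    # SET NX EX: the first caller wins a short-lived lock; losers back off and retry
    if cache.redis.set(f"lock:{key}", "1", nx=True, ex=10):
        try:
            value = recompute()
            cache.set(key, value, ttl)
            return value
        finally:
            cache.redis.delete(f"lock:{key}")
    time.sleep(0.05)  # brief backoff, then re-read whatever the winner cached
    return get_or_recompute(cache, key, recompute, ttl)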
HTTP Caching Headers
// middleware/cacheHeaders.js
function cacheHeaders(options = {}) {
  return (req, res, next) => {
    const { maxAge = 0, sMaxAge, isPrivate = false, immutable = false } = options;
    const directives = [];

    if (isPrivate) {
      directives.push("private");
    } else {
      directives.push("public");
    }
    directives.push(`max-age=${maxAge}`);
    if (sMaxAge !== undefined) {
      directives.push(`s-maxage=${sMaxAge}`);
    }
    if (immutable) {
      directives.push("immutable");
    }
    res.set("Cache-Control", directives.join(", "));

    // ETag for conditional requests
    const originalEnd = res.end;
    res.end = function (body, encoding) {
      if (body && res.statusCode === 200) {
        const etag = `"${require("crypto").createHash("md5").update(body).digest("hex")}"`;
        res.set("ETag", etag);
        if (req.headers["if-none-match"] === etag) {
          res.status(304);
          return originalEnd.call(this, null, encoding);
        }
      }
      originalEnd.call(this, body, encoding);
    };

    next();
  };
}

// Apply different strategies per route type
app.use("/api/v1/config", cacheHeaders({ maxAge: 3600, sMaxAge: 7200 }));   // Config: 1h client, 2h CDN
app.use("/api/v1/users/me", cacheHeaders({ maxAge: 0, isPrivate: true }));  // User-specific: no cache
app.use("/static", cacheHeaders({ maxAge: 31536000, immutable: true }));    // Static assets: 1 year
app.use("/api/v1/products", cacheHeaders({ maxAge: 300, sMaxAge: 600 }));   // Products: 5min/10min
Database Query Optimization
PostgreSQL Query Analysis
-- Enable timing and analyze queries
SET track_io_timing = on;

-- Analyze a slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Key things to look for in EXPLAIN output:
--   Seq Scan on large tables            → add index
--   Nested Loop with high row estimates → consider Hash Join
--   Sort with high cost                 → add index matching ORDER BY
--   Buffers: read >> shared hit         → data not cached, consider more memory

-- Create targeted indexes
CREATE INDEX CONCURRENTLY idx_orders_status_created
  ON orders (status, created_at DESC)
  WHERE status IN ('pending', 'processing');  -- Partial index

CREATE INDEX CONCURRENTLY idx_orders_customer_id
  ON orders (customer_id)
  INCLUDE (status, total_amount);  -- Covering index

-- Find slow queries (requires pg_stat_statements)
SELECT
  query,
  calls,
  mean_exec_time::numeric(10,2) AS avg_ms,
  total_exec_time::numeric(10,2) AS total_ms,
  rows / GREATEST(calls, 1) AS avg_rows,
  shared_blks_hit + shared_blks_read AS total_blocks
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Find missing indexes
SELECT
  schemaname, tablename,
  seq_scan, idx_scan,
  seq_tup_read, idx_tup_fetch,
  CASE WHEN seq_scan > 0
       THEN round(seq_tup_read::numeric / seq_scan, 0)
       ELSE 0
  END AS avg_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;

-- Find unused indexes (wasting write performance)
SELECT
  indexrelname AS index_name,
  relname AS table_name,
  idx_scan AS times_used,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50
ORDER BY pg_relation_size(indexrelid) DESC;
N+1 Query Detection
# Detect and fix N+1 queries (SQLAlchemy)

# ❌ BAD: N+1 – one query per order for the customer name
orders = session.query(Order).filter(Order.status == "pending").all()
for order in orders:
    print(order.customer.name)  # Triggers a query per iteration!

# ✅ GOOD: Eager loading with joinedload
from sqlalchemy.orm import joinedload

orders = (
    session.query(Order)
    .options(joinedload(Order.customer))
    .filter(Order.status == "pending")
    .all()
)
for order in orders:
    print(order.customer.name)  # Already loaded

# ✅ BEST: Projection – only fetch what you need
from sqlalchemy import select

results = session.execute(
    select(Order.id, Order.total_amount, Customer.name)
    .join(Customer)
    .where(Order.status == "pending")
).all()
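N+1 regressions can also be caught automatically: count the statements the ORM emits during a test and assert a ceiling. A sketch using SQLAlchemy's cursor-execute event; the engine object and the commented usage lines are assumptions:

# Count SQL statements in a block to catch N+1 regressions in tests
from contextlib import contextmanager
from sqlalchemy import event

@contextmanager
def assert_max_queries(engine, limit: int):
    statements = []

    def before_cursor_execute(conn, cursor, statement, params, context, executemany):
        statements.append(statement)

    event.listen(engine, "before_cursor_execute", before_cursor_execute)
    try:
        yield statements
    finally:
        event.remove(engine, "before_cursor_execute", before_cursor_execute)
    assert len(statements) <= limit, (
        f"{len(statements)} queries executed (limit {limit}):\n" + "\n".join(statements)
    )

# Usage: the eager-loaded version above should need a single SELECT
# with assert_max_queries(engine, limit=1):
#     orders = session.query(Order).options(joinedload(Order.customer)).all()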
Load Testing
k6 Load Test
// load-tests/api-load.js
import http from "k6/http";
import { check, sleep, group } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("errors");
const apiLatency = new Trend("api_latency", true);

export const options = {
  stages: [
    { duration: "2m", target: 50 },   // Ramp up
    { duration: "5m", target: 50 },   // Steady state
    { duration: "2m", target: 200 },  // Spike
    { duration: "5m", target: 200 },  // Sustained load
    { duration: "2m", target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<1000"], // 95th < 500ms, 99th < 1s
    errors: ["rate<0.01"],                          // Error rate < 1%
    api_latency: ["avg<200"],                       // Average < 200ms
  },
};

const BASE_URL = __ENV.API_URL || "http://localhost:3000/api/v1";
const AUTH_TOKEN = __ENV.AUTH_TOKEN;

export default function () {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${AUTH_TOKEN}`,
  };

  group("Browse Products", () => {
    const res = http.get(`${BASE_URL}/products?page=1&limit=20`, { headers });
    apiLatency.add(res.timings.duration);
    check(res, {
      "status is 200": (r) => r.status === 200,
      "has products": (r) => JSON.parse(r.body).data.length > 0,
      "response time < 500ms": (r) => r.timings.duration < 500,
    });
    errorRate.add(res.status !== 200);
  });

  sleep(Math.random() * 2 + 1); // Think time: 1-3s

  group("View Product Detail", () => {
    const productId = Math.floor(Math.random() * 1000) + 1;
    const res = http.get(`${BASE_URL}/products/${productId}`, { headers });
    apiLatency.add(res.timings.duration);
    check(res, {
      "status is 200 or 404": (r) => [200, 404].includes(r.status),
    });
  });

  sleep(Math.random() * 3 + 1);

  group("Create Order", () => {
    const payload = JSON.stringify({
      productId: Math.floor(Math.random() * 100) + 1,
      quantity: Math.floor(Math.random() * 5) + 1,
    });
    const res = http.post(`${BASE_URL}/orders`, payload, { headers });
    check(res, {
      "order created": (r) => r.status === 201,
      "has order id": (r) => JSON.parse(r.body).data.id !== undefined,
    });
    errorRate.add(res.status >= 400);
  });

  sleep(1);
}
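Run it with `k6 run --env API_URL=https://staging.example.com/api/v1 --env AUTH_TOKEN=<token> load-tests/api-load.js`, ideally against a staging environment with production-like data volumes so the thresholds mean something.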
Frontend Performance (Core Web Vitals)
// Performance monitoring snippet
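// Assumes a reportMetric(name, value) helper that ships metrics to your backend,
// e.g. via navigator.sendBeacon – not defined here.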
function measureWebVitals() {
  // Largest Contentful Paint (LCP) – target: < 2.5s
  new PerformanceObserver((list) => {
    const entries = list.getEntries();
    const lcp = entries[entries.length - 1];
    console.log("LCP:", lcp.startTime, "ms");
    reportMetric("lcp", lcp.startTime);
  }).observe({ type: "largest-contentful-paint", buffered: true });

  // First Input Delay (FID) – target: < 100ms (superseded by INP as a Core Web Vital)
  new PerformanceObserver((list) => {
    const entry = list.getEntries()[0];
    console.log("FID:", entry.processingStart - entry.startTime, "ms");
    reportMetric("fid", entry.processingStart - entry.startTime);
  }).observe({ type: "first-input", buffered: true });

  // Cumulative Layout Shift (CLS) – target: < 0.1
  let clsScore = 0;
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (!entry.hadRecentInput) {
        clsScore += entry.value;
      }
    }
    console.log("CLS:", clsScore);
    reportMetric("cls", clsScore);
  }).observe({ type: "layout-shift", buffered: true });

  // Interaction to Next Paint (INP) – target: < 200ms
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.interactionId) {
        reportMetric("inp", entry.duration);
      }
    }
  }).observe({ type: "event", durationThreshold: 16, buffered: true });
}
Bundle Size Optimization
// vite.config.js – optimized build
import { defineConfig } from "vite";
import { visualizer } from "rollup-plugin-visualizer";

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          vendor: ["react", "react-dom"],
          ui: ["@radix-ui/react-dialog", "@radix-ui/react-popover"],
          charts: ["recharts"],
        },
      },
    },
    chunkSizeWarningLimit: 250, // KB
  },
  plugins: [
    visualizer({ filename: "dist/bundle-stats.html", gzipSize: true }),
  ],
});
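After a build, open dist/bundle-stats.html to see which modules dominate each chunk; pairing manualChunks with route-level dynamic import() keeps rarely visited pages out of the initial bundle.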
Performance Budgets in CI
# .github/workflows/perf-budget.yml
name: Performance Budget
on: [pull_request]

jobs:
  bundle-size:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build
      - name: Check bundle sizes
        run: |
          MAX_MAIN=250  # KB
          MAX_VENDOR=500
          MAIN_SIZE=$(stat -f%z dist/assets/main*.js 2>/dev/null || stat -c%s dist/assets/main*.js)
          MAIN_KB=$((MAIN_SIZE / 1024))
          echo "Main bundle: ${MAIN_KB}KB (limit: ${MAX_MAIN}KB)"
          if [ $MAIN_KB -gt $MAX_MAIN ]; then
            echo "❌ Main bundle exceeds budget!"
            exit 1
          fi
          echo "✅ Bundle sizes within budget"

  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lighthouse CI
        uses: treosh/lighthouse-ci-action@v10
        with:
          configPath: .lighthouserc.json
          uploadArtifacts: true
// .lighthouserc.json
{
  "ci": {
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "first-contentful-paint": ["warn", { "maxNumericValue": 1500 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
        "total-blocking-time": ["warn", { "maxNumericValue": 300 }]
      }
    }
  }
}
Memory Leak Detection
// Node.js memory monitoring
const v8 = require("v8");

function monitorMemory(intervalMs = 30000) {
  setInterval(() => {
    const heap = v8.getHeapStatistics();
    const usage = process.memoryUsage();
    const metrics = {
      heapUsedMB: Math.round(usage.heapUsed / 1024 / 1024),
      heapTotalMB: Math.round(usage.heapTotal / 1024 / 1024),
      rssMB: Math.round(usage.rss / 1024 / 1024),
      externalMB: Math.round(usage.external / 1024 / 1024),
      heapUsedPct: Math.round((heap.used_heap_size / heap.heap_size_limit) * 100),
    };
    console.log("Memory:", JSON.stringify(metrics));

    // Alert on high usage
    if (metrics.heapUsedPct > 85) {
      console.error(`⚠️ Heap usage at ${metrics.heapUsedPct}% – potential leak`);
    }
  }, intervalMs);
}
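To confirm a suspected leak, capture two heap snapshots (see takeHeapSnapshot above) several minutes apart under steady load and diff them in Chrome DevTools' Memory panel; object types whose counts only ever grow are the candidates.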
Anti-Patterns to Avoid
- ❌ Optimizing without profiling – you'll fix the wrong thing
- ❌ Premature caching – adds complexity; only cache proven bottlenecks
- ❌ Caching without an invalidation strategy – stale data is worse than slow data
- ❌ Load testing with uniform patterns – real traffic has spikes and variety
- ❌ Missing indexes on JOIN/WHERE columns – full table scans kill databases
- ❌ Ignoring N+1 queries – they hide in ORMs and explode at scale
- ❌ No performance budgets – regressions creep in silently over sprints
- ❌ Micro-optimizing cold paths – focus on hot paths that run millions of times
- ❌ Synchronous I/O in async code – blocks the event loop/thread pool
- ❌ Ignoring frontend performance – backend can be fast while UX feels slow
Adapted from buildwithclaude by Dave Poon (MIT)
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.