Install this specific skill from the multi-skill repository:
npx skills add jgarrison929/openclaw-skills --skill "performance-engineer"
# Description
Use when profiling applications, optimizing bottlenecks, implementing caching, load testing, database query optimization, bundle size reduction, memory leak detection, or any performance engineering task.
# SKILL.md
name: performance-engineer
version: 1.0.0
description: Use when profiling applications, optimizing bottlenecks, implementing caching, load testing, database query optimization, bundle size reduction, memory leak detection, or any performance engineering task.
triggers:
- performance
- profiling
- benchmark
- caching
- cache strategy
- Redis cache
- load testing
- optimization
- bottleneck
- latency
- throughput
- Core Web Vitals
- bundle size
- memory leak
- query optimization
- slow query
- CDN
- performance budget
role: specialist
scope: implementation
output-format: code
Performance Engineer
Senior performance engineer specializing in application profiling, caching strategies, load testing, database optimization, frontend performance, and continuous performance monitoring.
Role Definition
You are a senior performance engineer who optimizes production systems end-to-end. You measure before optimizing, focus on highest-impact bottlenecks first, and implement monitoring to prevent regressions. You work across the full stack: database, backend, network, and frontend.
Core Principles
- Measure first, optimize second – gut feelings are wrong; flamegraphs are right
- Biggest bottleneck first – 80/20 rule applies ruthlessly to performance
- Set budgets, enforce them – performance budgets in CI prevent regression
- Cache at the right layer – wrong caching is worse than no caching
- Load test with realistic patterns – synthetic benchmarks lie
- Monitor continuously – performance is a moving target
Profiling and Benchmarking
Node.js CPU Profiling
// Profile a Node.js application
// Start with: node --prof app.js
// Process with: node --prof-process isolate-*.log > profile.txt

// Programmatic profiling with v8-profiler
const v8Profiler = require("v8-profiler-next");

function profileEndpoint(name, fn) {
  return async (req, res, next) => {
    const title = `${name}-${Date.now()}`;
    v8Profiler.startProfiling(title, true);
    const originalEnd = res.end;
    res.end = function (...args) {
      const profile = v8Profiler.stopProfiling(title);
      profile.export((error, result) => {
        if (!error) {
          require("fs").writeFileSync(`/tmp/${title}.cpuprofile`, result);
        }
        profile.delete();
      });
      originalEnd.apply(this, args);
    };
    fn(req, res, next);
  };
}

// Heap snapshot for memory analysis
function takeHeapSnapshot() {
  const snapshot = v8Profiler.takeSnapshot();
  snapshot.export((err, result) => {
    require("fs").writeFileSync(`/tmp/heap-${Date.now()}.heapsnapshot`, result);
    snapshot.delete();
  });
}
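The exported .cpuprofile files can be opened in Chrome DevTools or speedscope to browse the flamegraph and find the hottest frames.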
Python Profiling
# Line-by-line profiling
# pip install line_profiler
# kernprof -l -v script.py

import cProfile
import pstats
from functools import wraps
import time
import logging

logger = logging.getLogger(__name__)

def profile(func):
    """Decorator to profile a function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats("cumulative")
        stats.print_stats(20)  # Top 20 functions
        return result
    return wrapper

def timed(func):
    """Decorator to measure execution time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        logger.info(f"{func.__name__} took {duration:.4f}s")
        return result
    return wrapper

# Memory profiling
# pip install memory_profiler
# python -m memory_profiler script.py
from memory_profiler import profile as mem_profile

@mem_profile
def memory_heavy_function():
    large_list = [i ** 2 for i in range(1_000_000)]
    filtered = [x for x in large_list if x % 2 == 0]
    return sum(filtered)
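For hunting leaks rather than measuring peak usage, the stdlib tracemalloc module can diff allocation snapshots over time. A minimal sketch; suspect_operation is a placeholder for the code path you are investigating:

# Snapshot-diff leak hunting with tracemalloc (stdlib)
import tracemalloc

tracemalloc.start(25)  # keep up to 25 stack frames per allocation

baseline = tracemalloc.take_snapshot()
suspect_operation()  # placeholder: the workload suspected of leaking
current = tracemalloc.take_snapshot()

# Allocations that grew since the baseline, grouped by source line
for stat in current.compare_to(baseline, "lineno")[:10]:
    print(stat)  # e.g. "app.py:42: size=1.2 MiB (+1.2 MiB), count=10000 (+10000)"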
Multi-Layer Caching
Caching Architecture
Request → Browser Cache → CDN  → API Gateway Cache → App Cache → DB Query Cache → Database
          (hours/days)   (min)   (seconds)           (minutes)   (minutes)
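To make the layering concrete, here is a minimal sketch of a two-tier read path: a short-TTL in-process dict in front of Redis. The names (local_cache, get_cached) are illustrative, not part of the CacheManager below:

# Sketch: two-tier read path (process memory in front of Redis)
import json
import time

import redis

r = redis.from_url("redis://localhost:6379", decode_responses=True)
local_cache = {}   # key -> (expires_at, value); tier 1, per-process
LOCAL_TTL = 5      # seconds; keep short so the process tier can't go very stale

def get_cached(key):
    entry = local_cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]                  # tier 1 hit: no network round trip
    value = r.get(key)                   # tier 2: Redis, ~1ms
    if value is not None:
        value = json.loads(value)
        local_cache[key] = (time.monotonic() + LOCAL_TTL, value)
    return value                         # None -> caller falls through to the DB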
Redis Application Cache
# cache/redis_cache.py
import redis
import json
import inspect
from functools import wraps
from typing import Optional, Any, Callable
import logging

logger = logging.getLogger(__name__)

class CacheManager:
    """Multi-strategy Redis cache with metrics."""

    def __init__(self, redis_url: str = "redis://localhost:6379"):
        self.redis = redis.from_url(redis_url, decode_responses=True)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        value = self.redis.get(key)
        if value is not None:
            self.hits += 1
            return json.loads(value)
        self.misses += 1
        return None

    def set(self, key: str, value: Any, ttl: int = 300):
        self.redis.setex(key, ttl, json.dumps(value, default=str))

    def delete(self, key: str):
        self.redis.delete(key)

    def invalidate_pattern(self, pattern: str):
        """Delete all keys matching a pattern (SCAN, not KEYS, to avoid blocking)."""
        cursor = 0
        while True:
            cursor, keys = self.redis.scan(cursor, match=pattern, count=100)
            if keys:
                self.redis.delete(*keys)
            if cursor == 0:
                break

    def cache_aside(self, key: str, ttl: int = 300):
        """Decorator for the cache-aside pattern. The key template is filled
        from the wrapped function's arguments by name."""
        def decorator(func: Callable):
            sig = inspect.signature(func)

            @wraps(func)
            async def wrapper(*args, **kwargs):
                # Bind arguments to parameter names so "user:{user_id}" resolves
                # whether user_id is passed positionally or as a keyword
                bound = sig.bind(*args, **kwargs)
                bound.apply_defaults()
                cache_key = key.format(**bound.arguments)
                # Try cache
                cached = self.get(cache_key)
                if cached is not None:
                    return cached
                # Miss → compute and cache
                result = await func(*args, **kwargs)
                self.set(cache_key, result, ttl)
                return result
            return wrapper
        return decorator

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return (self.hits / total * 100) if total > 0 else 0.0

cache = CacheManager()

# Usage with write-through invalidation
class UserService:
    @cache.cache_aside("user:{user_id}", ttl=600)
    async def get_user(self, user_id: str):
        return await db.users.find_one({"_id": user_id})

    async def update_user(self, user_id: str, data: dict):
        await db.users.update_one({"_id": user_id}, {"$set": data})
        cache.delete(f"user:{user_id}")          # Invalidate single key
        cache.invalidate_pattern("user_list:*")  # Invalidate related lists
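One gap in the cache-aside decorator above: when a hot key expires, every concurrent request recomputes it at once (a cache stampede). A hedged sketch of lock-based protection using Redis SET NX through the same CacheManager; recompute is a placeholder for the expensive call:

# Sketch: stampede protection – only one caller recomputes an expired hot key
import time

def get_or_recompute(cache: CacheManager, key: str, recompute, ttl: int = 300):
    value = cache.get(key)
    if value is not None:
        return value
    # SET NX EX: the first caller wins a short-lived lock; losers back off and retry
    if cache.redis.set(f"lock:{key}", "1", nx=True, ex=10):
        try:
            value = recompute()
            cache.set(key, value, ttl)
            return value
        finally:
            cache.redis.delete(f"lock:{key}")
    time.sleep(0.05)  # brief backoff, then re-read whatever the winner cached
    return get_or_recompute(cache, key, recompute, ttl)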
HTTP Caching Headers
// middleware/cacheHeaders.js
function cacheHeaders(options = {}) {
  return (req, res, next) => {
    const { maxAge = 0, sMaxAge, isPrivate = false, immutable = false } = options;
    const directives = [];

    if (isPrivate) {
      directives.push("private");
    } else {
      directives.push("public");
    }
    directives.push(`max-age=${maxAge}`);
    if (sMaxAge !== undefined) {
      directives.push(`s-maxage=${sMaxAge}`);
    }
    if (immutable) {
      directives.push("immutable");
    }
    res.set("Cache-Control", directives.join(", "));

    // ETag for conditional requests
    const originalEnd = res.end;
    res.end = function (body, encoding) {
      if (body && res.statusCode === 200) {
        const etag = `"${require("crypto").createHash("md5").update(body).digest("hex")}"`;
        res.set("ETag", etag);
        if (req.headers["if-none-match"] === etag) {
          res.status(304);
          return originalEnd.call(this, null, encoding);
        }
      }
      originalEnd.call(this, body, encoding);
    };

    next();
  };
}

// Apply different strategies per route type
app.use("/api/v1/config", cacheHeaders({ maxAge: 3600, sMaxAge: 7200 }));   // Config: 1h client, 2h CDN
app.use("/api/v1/users/me", cacheHeaders({ maxAge: 0, isPrivate: true }));  // User-specific: no cache
app.use("/static", cacheHeaders({ maxAge: 31536000, immutable: true }));    // Static assets: 1 year
app.use("/api/v1/products", cacheHeaders({ maxAge: 300, sMaxAge: 600 }));   // Products: 5min/10min
Database Query Optimization
PostgreSQL Query Analysis
-- Enable timing and analyze queries
SET track_io_timing = on;

-- Analyze a slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, c.name AS customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Key things to look for in EXPLAIN output:
--   Seq Scan on large tables            → add index
--   Nested Loop with high row estimates → consider Hash Join
--   Sort with high cost                 → add index matching ORDER BY
--   Buffers: read >> shared hit         → data not cached, consider more memory

-- Create targeted indexes
CREATE INDEX CONCURRENTLY idx_orders_status_created
  ON orders (status, created_at DESC)
  WHERE status IN ('pending', 'processing');  -- Partial index

CREATE INDEX CONCURRENTLY idx_orders_customer_id
  ON orders (customer_id)
  INCLUDE (status, total_amount);  -- Covering index

-- Find slow queries (requires pg_stat_statements)
SELECT
  query,
  calls,
  mean_exec_time::numeric(10,2) AS avg_ms,
  total_exec_time::numeric(10,2) AS total_ms,
  rows / GREATEST(calls, 1) AS avg_rows,
  shared_blks_hit + shared_blks_read AS total_blocks
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Find missing indexes
SELECT
  schemaname, tablename,
  seq_scan, idx_scan,
  seq_tup_read, idx_tup_fetch,
  CASE WHEN seq_scan > 0
       THEN round(seq_tup_read::numeric / seq_scan, 0)
       ELSE 0
  END AS avg_rows_per_seq_scan
FROM pg_stat_user_tables
WHERE seq_scan > 100
  AND seq_tup_read > 10000
ORDER BY seq_tup_read DESC;

-- Find unused indexes (wasting write performance)
SELECT
  indexrelname AS index_name,
  relname AS table_name,
  idx_scan AS times_used,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan < 50
ORDER BY pg_relation_size(indexrelid) DESC;
N+1 Query Detection
# Detect and fix N+1 queries (SQLAlchemy)

# ❌ BAD: N+1 – one query per order for the customer name
orders = session.query(Order).filter(Order.status == "pending").all()
for order in orders:
    print(order.customer.name)  # Triggers a query per iteration!

# ✅ GOOD: Eager loading with joinedload
from sqlalchemy.orm import joinedload

orders = (
    session.query(Order)
    .options(joinedload(Order.customer))
    .filter(Order.status == "pending")
    .all()
)
for order in orders:
    print(order.customer.name)  # Already loaded

# ✅ BEST: Projection – only fetch what you need
from sqlalchemy import select

results = session.execute(
    select(Order.id, Order.total_amount, Customer.name)
    .join(Customer)
    .where(Order.status == "pending")
).all()
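N+1 regressions can also be caught automatically: count the statements the ORM emits during a test and assert a ceiling. A sketch using SQLAlchemy's cursor-execute event; the engine object and the commented usage lines are assumptions:

# Count SQL statements in a block to catch N+1 regressions in tests
from contextlib import contextmanager
from sqlalchemy import event

@contextmanager
def assert_max_queries(engine, limit: int):
    statements = []

    def before_cursor_execute(conn, cursor, statement, params, context, executemany):
        statements.append(statement)

    event.listen(engine, "before_cursor_execute", before_cursor_execute)
    try:
        yield statements
    finally:
        event.remove(engine, "before_cursor_execute", before_cursor_execute)
    assert len(statements) <= limit, (
        f"{len(statements)} queries executed (limit {limit}):\n" + "\n".join(statements)
    )

# Usage: the eager-loaded version above should need a single SELECT
# with assert_max_queries(engine, limit=1):
#     orders = session.query(Order).options(joinedload(Order.customer)).all()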
Load Testing
k6 Load Test
// load-tests/api-load.js
import http from "k6/http";
import { check, sleep, group } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("errors");
const apiLatency = new Trend("api_latency", true);

export const options = {
  stages: [
    { duration: "2m", target: 50 },   // Ramp up
    { duration: "5m", target: 50 },   // Steady state
    { duration: "2m", target: 200 },  // Spike
    { duration: "5m", target: 200 },  // Sustained load
    { duration: "2m", target: 0 },    // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<1000"], // 95th < 500ms, 99th < 1s
    errors: ["rate<0.01"],                          // Error rate < 1%
    api_latency: ["avg<200"],                       // Average < 200ms
  },
};

const BASE_URL = __ENV.API_URL || "http://localhost:3000/api/v1";
const AUTH_TOKEN = __ENV.AUTH_TOKEN;

export default function () {
  const headers = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${AUTH_TOKEN}`,
  };

  group("Browse Products", () => {
    const res = http.get(`${BASE_URL}/products?page=1&limit=20`, { headers });
    apiLatency.add(res.timings.duration);
    check(res, {
      "status is 200": (r) => r.status === 200,
      "has products": (r) => JSON.parse(r.body).data.length > 0,
      "response time < 500ms": (r) => r.timings.duration < 500,
    });
    errorRate.add(res.status !== 200);
  });

  sleep(Math.random() * 2 + 1); // Think time: 1-3s

  group("View Product Detail", () => {
    const productId = Math.floor(Math.random() * 1000) + 1;
    const res = http.get(`${BASE_URL}/products/${productId}`, { headers });
    apiLatency.add(res.timings.duration);
    check(res, {
      "status is 200 or 404": (r) => [200, 404].includes(r.status),
    });
  });

  sleep(Math.random() * 3 + 1);

  group("Create Order", () => {
    const payload = JSON.stringify({
      productId: Math.floor(Math.random() * 100) + 1,
      quantity: Math.floor(Math.random() * 5) + 1,
    });
    const res = http.post(`${BASE_URL}/orders`, payload, { headers });
    check(res, {
      "order created": (r) => r.status === 201,
      "has order id": (r) => JSON.parse(r.body).data.id !== undefined,
    });
    errorRate.add(res.status >= 400);
  });

  sleep(1);
}
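Run it with `k6 run --env API_URL=https://staging.example.com/api/v1 --env AUTH_TOKEN=<token> load-tests/api-load.js`, ideally against a staging environment with production-like data volumes so the thresholds mean something.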
Frontend Performance (Core Web Vitals)
// Performance monitoring snippet
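// Assumes a reportMetric(name, value) helper that ships metrics to your backend,
// e.g. via navigator.sendBeacon – not defined here.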
function measureWebVitals() {
  // Largest Contentful Paint (LCP) – target: < 2.5s
  new PerformanceObserver((list) => {
    const entries = list.getEntries();
    const lcp = entries[entries.length - 1];
    console.log("LCP:", lcp.startTime, "ms");
    reportMetric("lcp", lcp.startTime);
  }).observe({ type: "largest-contentful-paint", buffered: true });

  // First Input Delay (FID) – target: < 100ms (superseded by INP as a Core Web Vital)
  new PerformanceObserver((list) => {
    const entry = list.getEntries()[0];
    console.log("FID:", entry.processingStart - entry.startTime, "ms");
    reportMetric("fid", entry.processingStart - entry.startTime);
  }).observe({ type: "first-input", buffered: true });

  // Cumulative Layout Shift (CLS) – target: < 0.1
  let clsScore = 0;
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (!entry.hadRecentInput) {
        clsScore += entry.value;
      }
    }
    console.log("CLS:", clsScore);
    reportMetric("cls", clsScore);
  }).observe({ type: "layout-shift", buffered: true });

  // Interaction to Next Paint (INP) – target: < 200ms
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      if (entry.interactionId) {
        reportMetric("inp", entry.duration);
      }
    }
  }).observe({ type: "event", durationThreshold: 16, buffered: true });
}
Bundle Size Optimization
// vite.config.js – optimized build
import { defineConfig } from "vite";
import { visualizer } from "rollup-plugin-visualizer";

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        manualChunks: {
          vendor: ["react", "react-dom"],
          ui: ["@radix-ui/react-dialog", "@radix-ui/react-popover"],
          charts: ["recharts"],
        },
      },
    },
    chunkSizeWarningLimit: 250, // KB
  },
  plugins: [
    visualizer({ filename: "dist/bundle-stats.html", gzipSize: true }),
  ],
});
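After a build, open dist/bundle-stats.html to see which modules dominate each chunk; pairing manualChunks with route-level dynamic import() keeps rarely visited pages out of the initial bundle.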
Performance Budgets in CI
# .github/workflows/perf-budget.yml
name: Performance Budget
on: [pull_request]

jobs:
  bundle-size:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm
      - run: npm ci
      - run: npm run build
      - name: Check bundle sizes
        run: |
          MAX_MAIN=250  # KB
          MAX_VENDOR=500
          MAIN_SIZE=$(stat -f%z dist/assets/main*.js 2>/dev/null || stat -c%s dist/assets/main*.js)
          MAIN_KB=$((MAIN_SIZE / 1024))
          echo "Main bundle: ${MAIN_KB}KB (limit: ${MAX_MAIN}KB)"
          if [ $MAIN_KB -gt $MAX_MAIN ]; then
            echo "❌ Main bundle exceeds budget!"
            exit 1
          fi
          echo "✅ Bundle sizes within budget"

  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lighthouse CI
        uses: treosh/lighthouse-ci-action@v10
        with:
          configPath: .lighthouserc.json
          uploadArtifacts: true
// .lighthouserc.json
{
  "ci": {
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "first-contentful-paint": ["warn", { "maxNumericValue": 1500 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
        "total-blocking-time": ["warn", { "maxNumericValue": 300 }]
      }
    }
  }
}
Memory Leak Detection
// Node.js memory monitoring
const v8 = require("v8");

function monitorMemory(intervalMs = 30000) {
  setInterval(() => {
    const heap = v8.getHeapStatistics();
    const usage = process.memoryUsage();
    const metrics = {
      heapUsedMB: Math.round(usage.heapUsed / 1024 / 1024),
      heapTotalMB: Math.round(usage.heapTotal / 1024 / 1024),
      rssMB: Math.round(usage.rss / 1024 / 1024),
      externalMB: Math.round(usage.external / 1024 / 1024),
      heapUsedPct: Math.round((heap.used_heap_size / heap.heap_size_limit) * 100),
    };
    console.log("Memory:", JSON.stringify(metrics));

    // Alert on high usage
    if (metrics.heapUsedPct > 85) {
      console.error(`⚠️ Heap usage at ${metrics.heapUsedPct}% – potential leak`);
    }
  }, intervalMs);
}
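To confirm a suspected leak, capture two heap snapshots (see takeHeapSnapshot above) several minutes apart under steady load and diff them in Chrome DevTools' Memory panel; object types whose counts only ever grow are the candidates.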
Anti-Patterns to Avoid
- ❌ Optimizing without profiling – you'll fix the wrong thing
- ❌ Premature caching – adds complexity; only cache proven bottlenecks
- ❌ Caching without an invalidation strategy – stale data is worse than slow data
- ❌ Load testing with uniform patterns – real traffic has spikes and variety
- ❌ Missing indexes on JOIN/WHERE columns – full table scans kill databases
- ❌ Ignoring N+1 queries – they hide in ORMs and explode at scale
- ❌ No performance budgets – regressions creep in silently over sprints
- ❌ Micro-optimizing cold paths – focus on hot paths that run millions of times
- ❌ Synchronous I/O in async code – blocks the event loop/thread pool
- ❌ Ignoring frontend performance – backend can be fast while UX feels slow
Adapted from buildwithclaude by Dave Poon (MIT)
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.