Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language...
Use when a user wants you to discover and optionally install new agent skills for a task, and you must get explicit consent before any global install into Codex.
Validates Terminal User Interface (TUI) output using freeze for screenshot capture and LLM-as-judge for semantic validation. Supports both visual (PNG/SVG) and text-based validation modes.
Create comprehensive user guides, tutorials, how-to documentation, and step-by-step instructions with screenshots and examples. Use when writing user documentation, tutorials, or getting started guides.
Expert guidance on X's (Twitter's) open-sourced recommendation algorithm from the official xai-org/x-algorithm repository. Use this skill to create content that maximizes algorithmic...
Capture, organize, and develop ideas using structured thinking frameworks
Generate quick, effective responses for chat and messaging platforms
Document API changes, breaking changes, migration guides, and version history for APIs. Use when documenting API versioning, breaking changes, or creating API migration guides.
Make better decisions using structured frameworks and mental models
This skill is for interface design: dashboards, admin panels, apps, tools, and interactive products. NOT for marketing design (landing pages, marketing sites, campaigns).
Classify code tasks and execute task-specific checklists with quality gates. Route to WRITE, DEBUG, REVIEW, OPTIMIZE, REFACTOR, SIMPLIFY, or SECURE workflows, each invoking relevant CC and APOSD...
Automates browser interactions for web testing, form filling, screenshots, and data extraction. Use when the user needs to navigate websites, interact with web pages, fill forms, take screenshots,...
CTO Co-Pilot: strategic technical leadership, architecture decisions, infrastructure optimization, and engineering team coordination
This skill should be used when the user asks to "calculate TAM",
Schema compatibility and breaking change analysis
Use when exploring alternative scenarios, testing assumptions through "what if" questions, understanding causal relationships, conducting pre-mortem analysis, stress testing decisions, or when...