Accelerate LLM inference using speculative decoding, Medusa's multiple decoding heads, and lookahead decoding. Use when optimizing inference speed (1.5-3.6× speedup), reducing latency for...
Professional code review with auto CHANGELOG generation, integrated with Codex AI
Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse...
Expert in legacy Windows PowerShell 5.1. Specializes in WMI, ADSI, COM automation, and maintaining backward compatibility with Windows Server environments. Use for Windows-specific automation on...
Expert guidance for fine-tuning LLMs with LLaMA-Factory - no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA, multimodal support
Run the Codex Readiness unit test report. Use when you need deterministic checks plus in-session LLM evals for AGENTS.md/PLANS.md.
Run the Codex Readiness integration test. Use when you need an end-to-end agentic loop with build/test scoring.
Summarize huge articles (URL or local file) via a Codex CLI-driven chunk→reduce pipeline, keeping only the final short summary in context and saving it to summaries/*.md.
Intelligently delegate code generation, boilerplate creation, and automation tasks to OpenAI Codex CLI for rapid prototyping and development.
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
Provides guidance for interpreting and manipulating neural network internals using nnsight with optional NDIF remote execution. Use when needing to run interpretability experiments on massive...
Multi-agent orchestration framework for autonomous AI collaboration. Use when building teams of specialized agents working together on complex tasks, when you need role-based agent collaboration...
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track...
Provides guidance for mechanistic interpretability research using TransformerLens to inspect and manipulate transformer internals via HookPoints and activation caching. Use when...
Large Language and Vision Assistant. Enables visual instruction tuning and image-based conversations. Combines CLIP vision encoder with Vicuna/LLaMA language models. Supports multi-turn image...