This skill should be used when the user asks to "optimize context", "reduce token costs", "improve context efficiency", "implement KV-cache optimization", "partition context", or mentions context...
This skill should be used when the user asks to "understand context", "explain context windows", "design agent architecture", "debug context issues", "optimize context usage", or discusses context...
Transform markdown/HTML into styled 3:4 ratio images for Xiaohongshu
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on...
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without...
Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable...
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+...
Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines -...
Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory...
Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI, Anthropic, Google), 500+ integrations, ReAct agents, tool calling, memory...
State-space model with O(n) complexity vs Transformers' O(nΒ²). 5Γ faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2...
Track website and app analytics with Google Analytics 4's comprehensive platform.
Layer 4 artifact for Behavior-Driven Development test scenarios using Gherkin Given-When-Then format
Orchestrate comprehensive planning phase from ideation to development-ready specifications using 4 specialized agents
Systematic debugging methodology with 4-phase process, root cause tracing, and elite observability standards. No fixes without investigation.
Expert guidance for OpenAI API development including GPT models, Assistants API, function calling, embeddings, and best practices for production applications.
Use OpenAI's Codex CLI to invoke GPT models from the command line. Use when you need a second AI opinion, want to verify your work, or need OpenAI-specific capabilities.
Master of Modern PHP (8.4-8.6+), specialized in Property Hooks, Partial Function Application, and High-Performance Engine optimization.
Senior expert in Tailwind CSS 4.0+, CSS-First architecture, and modern Design Systems. Use when configuring themes, migrating from v3, or implementing native container queries.
Conduct comprehensive research on any topic by coordinating 2-4 specialized researcher agents in parallel, then synthesizing findings into a detailed report via mandatory report-writer agent delegation