🤖 AI & LLM

LLM integrations, prompt engineering, and AI orchestration

7,400 skills

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism,...
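A minimal illustrative DeepSpeed config, assuming ZeRO stage 2 with optimizer-state offload to CPU; the keys follow DeepSpeed's documented JSON schema, but the values here are placeholders, not tuned recommendations:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```

A file like this is typically passed to `deepspeed.initialize(...)` or via the `--deepspeed` launcher flag; stage 3 additionally shards the parameters themselves.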

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision,...
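The sharding idea behind FSDP can be shown with a toy sketch (plain Python, not the FSDP API): each rank stores only its slice of a flat parameter vector and "all-gathers" the full vector just before it is needed.

```python
# Toy illustration of fully sharded data parallelism:
# parameters are split evenly across ranks, then reassembled on demand.

def shard(params, rank, world_size):
    """Return the contiguous slice of `params` owned by `rank`."""
    per = (len(params) + world_size - 1) // world_size  # ceil division
    return params[rank * per:(rank + 1) * per]

def all_gather(shards):
    """Reassemble the full parameter vector from every rank's shard."""
    full = []
    for s in shards:
        full.extend(s)
    return full

params = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
world_size = 2
shards = [shard(params, r, world_size) for r in range(world_size)]
assert shards[0] == [0.1, 0.2, 0.3]   # rank 0 holds only half the weights
assert all_gather(shards) == params   # full weights recovered before compute
```

Real FSDP does this per-module with `torch.distributed` collectives, freeing the gathered copies after each forward/backward pass so peak memory stays near the sharded size.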

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of...

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when...

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on...
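As a point of reference for what these methods improve on, naive round-to-nearest groupwise quantization can be sketched in plain Python (illustrative only; GPTQ adds error-compensating weight updates and AWQ adds activation-aware scaling on top of this baseline):

```python
def quantize_group(weights, bits=4):
    """Round-to-nearest symmetric quantization of one weight group.

    Returns (int codes, scale); dequantize with code * scale.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    return [c * scale for c in codes]

group = [0.02, -0.14, 0.06, 0.012]
codes, scale = quantize_group(group)
recon = dequantize_group(codes, scale)
err = max(abs(a - b) for a, b in zip(group, recon))
assert all(-8 <= c <= 7 for c in codes)   # codes fit in 4 bits
assert err <= scale / 2                   # RTN error bounded by half a step
```

Grouping (e.g. one scale per 128 weights) keeps the scale close to each group's actual range, which is why post-training 4-bit schemes lose so little accuracy.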

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision...

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment,...

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs,...

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production...

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production...
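The PagedAttention idea — the KV cache carved into fixed-size blocks handed out from a shared pool, so sequences of different lengths don't fragment memory — can be sketched as a toy allocator (illustrative only, not vLLM's internals):

```python
class PagedKVCache:
    """Toy KV-cache block allocator in the spirit of PagedAttention."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # shared pool of free block ids
        self.table = {}                      # seq -> list of block ids
        self.tokens = {}                     # seq -> tokens cached so far

    def append_token(self, seq):
        """Reserve cache space for one more token of sequence `seq`."""
        n = self.tokens.get(seq, 0)
        if n % self.block_size == 0:         # current block full: grab a new one
            if not self.free:
                raise MemoryError("KV cache exhausted")
            self.table.setdefault(seq, []).append(self.free.pop())
        self.tokens[seq] = n + 1

    def release(self, seq):
        """Sequence finished: return its blocks to the pool."""
        self.free.extend(self.table.pop(seq, []))
        self.tokens.pop(seq, None)

cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(20):                  # 20 tokens -> 2 blocks
    cache.append_token("a")
for _ in range(5):                   # 5 tokens -> 1 block
    cache.append_token("b")
assert len(cache.table["a"]) == 2 and len(cache.table["b"]) == 1
cache.release("a")                   # freed blocks are reusable immediately
assert len(cache.free) == 3
```

Continuous batching then admits new sequences whenever blocks free up, which is how this style of serving keeps the GPU busy instead of waiting for the longest request in a batch.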

Track ML experiments, manage model registry with versioning, deploy models to production, and reproduce experiments...

Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile...