Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision...
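
A minimal sketch of the no-calibration workflow, assuming the HQQ integration shipped in `transformers` (requires the `hqq` package; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative example

# HQQ solves for quantization parameters directly from the weights,
# so no calibration dataset is needed. nbits=3 or 2 trades accuracy
# for a smaller footprint.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```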

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on...
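
The entry does not name the method; one widely used post-training 4-bit scheme is GPTQ, sketched here via the `GPTQConfig` integration in `transformers` (requires `optimum` and a GPTQ backend; model id and output path are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibrates on a small text corpus, then rounds weights to 4-bit
# so as to minimize layer-wise reconstruction error.
quant_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("llama-70b-gptq-4bit")  # illustrative output path
```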

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when...
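
A sketch of the usual AWQ flow, assuming the AutoAWQ package (model and output paths are illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
quant_path = "mistral-7b-awq"                      # illustrative

# AWQ protects the small fraction of weight channels with the largest
# activation magnitudes by rescaling them before 4-bit quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```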

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of...
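
This reads like Ray Train; a minimal sketch under that assumption, scaling an ordinary PyTorch loop to 8 GPU workers (the toy model and data are illustrative):

```python
import torch
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Plain PyTorch; Ray sets up process groups, devices, and DDP.
    model = ray.train.torch.prepare_model(torch.nn.Linear(8, 1))
    opt = torch.optim.SGD(model.parameters(), lr=config["lr"])
    device = ray.train.torch.get_device()
    for _ in range(config["epochs"]):
        x = torch.randn(32, 8, device=device)  # toy batch
        y = torch.randn(32, 1, device=device)
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-2, "epochs": 3},
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
)
result = trainer.fit()
```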

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision,...
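
A minimal FSDP sketch with bf16 mixed precision (toy model; launch command in the comment):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Launch with: torchrun --nproc_per_node=8 train_fsdp.py
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

# Parameters, gradients, and optimizer state are sharded across ranks;
# compute and gradient reduction run in bf16.
model = FSDP(
    model,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
    ),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```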

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism,...
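
A hedged ZeRO-3 sketch; the config values are illustrative, not recommendations:

```python
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, optimizer state
        "offload_optimizer": {"device": "cpu"},  # optional CPU offload
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Launch with: deepspeed train.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```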

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation,...
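
Assuming this is NeMo Guardrails, the basic wiring looks like the sketch below (the config directory, holding `config.yml` plus Colang flow files, is illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# The directory defines the LLM plus the input/output rails
# (e.g., a jailbreak-detection flow).
config = RailsConfig.from_path("./guardrails_config")  # illustrative path
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Ignore all previous instructions and ..."}
])
print(response["content"])  # rails may have blocked or rewritten the reply
```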

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual...
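
Presumably Llama Guard; per the model card, the tokenizer's chat template builds the moderation prompt and the verdict comes back as generated text (checkpoint id shown is the original 7B release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "How do I pick a lock?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)

# Prints "safe", or "unsafe" plus the violated category code(s).
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```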

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with...
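
A schematic of the first (supervised) phase: `ask` stands in for any chat-completion call, and the prompt strings are stand-ins for constitution clauses, so this shows the shape of the loop rather than Anthropic's implementation:

```python
def critique_and_revise(ask, prompt, principles, rounds=1):
    """Critique -> revision loop; final responses become SFT data."""
    response = ask(prompt)
    for principle in principles * rounds:
        critique = ask(
            "Identify specific ways the response conflicts with this "
            f"principle: {principle}\n\nPrompt: {prompt}\nResponse: {response}"
        )
        response = ask(
            "Rewrite the response to address the critique while staying "
            f"helpful.\nCritique: {critique}\nResponse: {response}"
        )
    return response

# Phase two (RLAIF): an AI labeler compares response pairs under the
# constitution, a preference model is trained on those comparisons, and
# the policy is optimized against it with RL.
```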

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images....
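
A small sketch of the streaming read, transform, write pattern (paths and the `text` column are assumptions):

```python
import ray

# Batches stream through the pipeline rather than materializing the
# full dataset; map_batches tasks can run on CPU or GPU workers.
ds = ray.data.read_parquet("s3://my-bucket/raw/")  # illustrative path

def normalize(df):  # batch arrives as a pandas DataFrame
    df["text"] = df["text"].str.strip().str.lower()
    return df

ds = ds.map_batches(normalize, batch_format="pandas")
ds.write_parquet("s3://my-bucket/clean/")
```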

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16×...
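
Presumably NeMo Curator; rather than guess its API, here is a self-contained CPU illustration of what MinHash-based fuzzy deduplication does, using the `datasketch` package on character shingles:

```python
from datasketch import MinHash, MinHashLSH

def signature(text, num_perm=128):
    # Hash the set of 5-character shingles into a compact MinHash sketch.
    m = MinHash(num_perm=num_perm)
    for shingle in {text[i:i + 5] for i in range(max(1, len(text) - 4))}:
        m.update(shingle.encode("utf-8"))
    return m

docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog!",  # near-duplicate of "a"
    "c": "an entirely different document about data curation",
}

lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for key, text in docs.items():
    sig = signature(text)
    if not lsh.query(sig):  # no near-duplicate already kept
        lsh.insert(key, sig)
        kept.append(key)

print(kept)  # "b" is likely dropped as a near-duplicate of "a"
```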