zechenzhangAGI / ai-research-skills-rwkv-architecture exact

RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel), infer like RNN (sequential). Linux Foundation AI project. In production on Windows,...
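
If the constant-state decoding is the draw, here is a minimal sketch using the Hugging Face transformers integration; the checkpoint name is an assumption (any RWKV checkpoint on the Hub with transformers support behaves the same way):

```python
# RWKV generation via transformers: O(n) decoding driven by a fixed-size
# recurrent state instead of a growing KV cache.
from transformers import AutoTokenizer, AutoModelForCausalLM

name = "RWKV/rwkv-4-169m-pile"  # assumption: small public RWKV checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("RWKV replaces attention with", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```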

zechenzhangAGI / ai-research-skills-whisper exact

OpenAI's general-purpose speech recognition model. Supports 99 languages, transcription, translation to English, and language identification. Six model sizes from tiny (39M params) to large (1550M...
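
A minimal transcription sketch, assuming the openai-whisper package; the audio path is a placeholder:

```python
import whisper

model = whisper.load_model("base")        # tiny / base / small / medium / large
result = model.transcribe("audio.mp3")    # language is auto-detected
print(result["language"], result["text"])

# Translation to English reuses the same call:
# result = model.transcribe("audio.mp3", task="translate")
```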

zechenzhangAGI / ai-research-skills-mamba-architecture exact

State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV cache. Selective SSM with hardware-aware design. Mamba-1 (d_state=16) and Mamba-2...
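
A block-level sketch following the mamba_ssm package's documented usage; the fused kernels require a CUDA GPU, and the tensor sizes are illustrative:

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 256
x = torch.randn(batch, length, dim, device="cuda")

block = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state size (the Mamba-1 default noted above)
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")

y = block(x)      # (batch, length, dim) in, same shape out
assert y.shape == x.shape
```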

zechenzhangAGI / ai-research-skills-chroma exact

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text search, filter by metadata. Simple 4-function API. Scales from notebooks to...
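
The core add/query workflow, sketched with the chromadb client; collection names and documents are illustrative:

```python
import chromadb

client = chromadb.Client()  # in-memory; use PersistentClient for disk
docs = client.create_collection(name="docs")

docs.add(
    ids=["a", "b"],
    documents=["Mamba is a state-space model", "Whisper transcribes speech"],
    metadatas=[{"topic": "ssm"}, {"topic": "asr"}],
)

hits = docs.query(
    query_texts=["linear-time sequence models"],
    n_results=1,
    where={"topic": "ssm"},  # metadata filter
)
print(hits["documents"])
```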

zechenzhangAGI / ai-research-skills-openrlhf-training exact

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

zechenzhangAGI / ai-research-skills-awq-quantization exact

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster...
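
A quantization sketch following AutoAWQ's documented pattern; the model and output paths are placeholders:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder checkpoint
quant_path = "mistral-7b-awq"

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Activation-aware calibration picks per-channel scales before 4-bit packing.
model.quantize(tokenizer, quant_config={
    "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM",
})
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```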

zechenzhangAGI / ai-research-skills-modal-serverless-gpu exact

Serverless GPU cloud platform for running ML workloads. Use when you need on-demand GPU access without infrastructure management, deploying ML models as APIs, or running batch jobs with automatic scaling.
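
A minimal sketch with the modal SDK; the GPU type and function body are illustrative:

```python
import modal

app = modal.App("gpu-demo")
image = modal.Image.debian_slim().pip_install("torch")

@app.function(gpu="A10G", image=image)
def device_name() -> str:
    import torch
    return torch.cuda.get_device_name(0)

@app.local_entrypoint()
def main():
    # `modal run this_file.py` provisions the GPU remotely and scales to zero after.
    print(device_name.remote())
```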

zechenzhangAGI / ai-research-skills-autogpt-agents exact

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents, deploying persistent autonomous agents, or building complex multi-step AI...

zechenzhangAGI / ai-research-skills-langsmith-observability exact

LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building...
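
A tracing sketch with the langsmith SDK, assuming LANGSMITH_API_KEY is set and tracing is enabled in the environment:

```python
from langsmith import traceable

@traceable  # each call is logged as a trace with inputs, outputs, and latency
def answer(question: str) -> str:
    # call your LLM here; nested @traceable functions show up as child runs
    return "42"

answer("What is 6 x 7?")
```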

zechenzhangAGI / ai-research-skills-nemo-curator exact

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII...

zechenzhangAGI / ai-research-skills-distributed-llm-pretraining-torchtitan exact

Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use when pretraining Llama 3.1, DeepSeek V3, or custom models at scale from 8 to 512+...

zechenzhangAGI / ai-research-skills-verl-rl-training exact

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with...

zechenzhangAGI / ai-research-skills-skypilot-multi-cloud-orchestration exact

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or...
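
A launch sketch with the sky Python API; the resource spec and commands are illustrative, and newer releases may return a request handle to await rather than blocking:

```python
import sky

task = sky.Task(
    setup="pip install torch",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="A100:1", use_spot=True))

# SkyPilot picks the cheapest cloud/region with availability; spot
# auto-recovery applies when this runs as a managed job.
sky.launch(task, cluster_name="train-cluster")
```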

zechenzhangAGI / ai-research-skills-phoenix-observability exact

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring...
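
The quickest way in, per Phoenix's documented entry point (pip install arize-phoenix); instrumenting a specific framework is a separate step:

```python
import phoenix as px

session = px.launch_app()  # starts the local Phoenix UI
print(session.url)         # open in a browser to inspect traces and evals
```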

zechenzhangAGI / ai-research-skills-ray-train exact

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of nodes. Built-in hyperparameter tuning with Ray Tune, fault tolerance, elastic...
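
A scaling sketch with Ray Train's TorchTrainer; the loop body is a placeholder for ordinary PyTorch code:

```python
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker():
    # ordinary PyTorch here; ray.train.torch.prepare_model /
    # prepare_data_loader wrap things for distributed execution
    ...

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```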

zechenzhangAGI / ai-research-skills-quantizing-models-bitsandbytes exact

Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, you need to fit larger models, or you want faster inference. Supports INT8, NF4,...
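
A 4-bit NF4 loading sketch via the transformers integration; the model name is a placeholder for any causal LM on the Hub:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```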

zechenzhangAGI / ai-research-skills-training-llms-megatron exact

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters and you need maximum GPU efficiency (47% MFU on...

zechenzhangAGI / ai-research-skills-hqq-quantization exact

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when...
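
A calibration-free sketch via transformers' HqqConfig integration (pip install hqq); the model name is a placeholder:

```python
from transformers import AutoModelForCausalLM, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # no calibration dataset needed
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",                    # placeholder checkpoint
    quantization_config=quant_config,
    device_map="cuda",
)
```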

zechenzhangAGI / ai-research-skills-fine-tuning-with-trl exact

Fine-tune LLMs using reinforcement learning with TRL: SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF,...
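
The SFT entry point, following TRL's documented quickstart; the dataset and model names are the ones its docs use and can be swapped freely:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-sft"),
)
trainer.train()
# DPOTrainer, GRPOTrainer, and RewardTrainer follow the same pattern for
# preference alignment, reward optimization, and reward-model training.
```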

zechenzhangAGI / ai-research-skills-optimizing-attention-flash exact

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory...
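
A direct-call sketch with the flash-attn package; shapes are illustrative, and inputs must be fp16/bf16 on a CUDA device:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 4096, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Tiled, IO-aware kernel: never materializes the (seqlen x seqlen) score matrix.
out = flash_attn_func(q, k, v, causal=True)
```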