Search: gpu | AgentSkillsRepo

zechenzhangAGI / ai-research-skills-skypilot-multi-cloud-orchestration exact

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or...

★ 1,712 ai

ai ai-research claude claude-code

skypilot-multi-cloud-orchestration 0.09

Ianfr13 / claude-code-plugins-skypilot-multi-cloud-orchestration exact

Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or...

★ 0 ai

serving-llms-vllm 0.09

ovachiever / droid-tings-serving-llms-vllm exact

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with...

★ 19 ai

serving-llms-vllm 0.09

zechenzhangAGI / ai-research-skills-serving-llms-vllm exact

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with...

★ 1,712 ai

ai ai-research claude claude-code

faiss 0.09

ovachiever / droid-tings-faiss exact

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN...

★ 19 ai

faiss 0.09

zechenzhangAGI / ai-research-skills-faiss exact

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors, GPU acceleration, and various index types (Flat, IVF, HNSW). Use for fast k-NN...

★ 1,712 ai

ai ai-research claude claude-code

openrlhf-training 0.09

ovachiever / droid-tings-openrlhf-training exact

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

★ 19 ai

openrlhf-training 0.09

zechenzhangAGI / ai-research-skills-openrlhf-training exact

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

★ 1,712 ai

ai ai-research claude claude-code

nanogpt 0.09

ovachiever / droid-tings-nanogpt exact

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture...

★ 19 ai

nanogpt 0.09

zechenzhangAGI / ai-research-skills-nanogpt exact

Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for learning transformers. By Andrej Karpathy. Perfect for understanding GPT architecture...

★ 1,712 ai

ai ai-research claude claude-code

nemo-curator 0.09

zechenzhangAGI / ai-research-skills-nemo-curator exact

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII...

★ 1,712 ai

ai ai-research claude claude-code

nemo-curator 0.09

ovachiever / droid-tings-nemo-curator exact

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII...

★ 19 ai

pytorch-lightning 0.09

ovachiever / droid-tings-pytorch-lightning exact

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard),...

★ 19 ai

ray-data 0.09

ovachiever / droid-tings-ray-data exact

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s...

★ 19 data

ray-data 0.09

zechenzhangAGI / ai-research-skills-ray-data exact

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images. Integrates with Ray Train, PyTorch, TensorFlow. Scales from single machine to 100s...

★ 1,712 ai

ai ai-research claude claude-code

pytorch-lightning 0.09

K-Dense-AI / claude-scientific-skills-pytorch-lightning exact

Deep learning framework (PyTorch Lightning). Organize PyTorch code into LightningModules, configure Trainers for multi-GPU/TPU, implement data pipelines, callbacks, logging (W&B, TensorBoard),...

★ 6,907 ai

ai-scientist bioinformatics chemoinformatics claude

tensorrt-llm 0.09

ovachiever / droid-tings-tensorrt-llm exact

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than...

★ 19 ai

tensorrt-llm 0.09

zechenzhangAGI / ai-research-skills-tensorrt-llm exact

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than...

★ 1,712 ai

ai ai-research claude claude-code

training-llms-megatron 0.09

ovachiever / droid-tings-training-llms-megatron exact

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on...

★ 19 ai

training-llms-megatron 0.09

zechenzhangAGI / ai-research-skills-training-llms-megatron exact

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models >1B parameters, need maximum GPU efficiency (47% MFU on...

★ 1,712 ai

ai ai-research claude claude-code
