Search: grpo | AgentSkillsRepo

grpo-rl-training 0.30

zechenzhangAGI / ai-research-skills-grpo-rl-training exact

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

★ 1,712 ai

ai ai-research claude claude-code

grpo-rl-training 0.23

ovachiever / droid-tings-grpo-rl-training exact

Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training

★ 19 ai

verl-rl-training 0.15

zechenzhangAGI / ai-research-skills-verl-rl-training exact

Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when implementing RLHF, GRPO, PPO, or other RL algorithms for LLM post-training at scale with...

★ 1,712 ai

ai ai-research claude claude-code

openrlhf-training 0.15

zechenzhangAGI / ai-research-skills-openrlhf-training exact

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

★ 1,712 ai

ai ai-research claude claude-code

fine-tuning-with-trl 0.15

zechenzhangAGI / ai-research-skills-fine-tuning-with-trl exact

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF,...

★ 1,712 ai

ai ai-research claude claude-code

axolotl 0.15

zechenzhangAGI / ai-research-skills-axolotl exact

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

★ 1,712 ai

ai ai-research claude claude-code

torchforge-rl-training 0.09

zechenzhangAGI / ai-research-skills-torchforge-rl-training exact

Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms. Use when you want clean RL abstractions, easy algorithm experimentation, or...

★ 1,712 ai

ai ai-research claude claude-code

slime-rl-training 0.09

zechenzhangAGI / ai-research-skills-slime-rl-training exact

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM...

★ 1,712 ai

ai ai-research claude claude-code

miles-rl-training 0.09

zechenzhangAGI / ai-research-skills-miles-rl-training exact

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring...

★ 1,712 ai

ai ai-research claude claude-code

simpo-training 0.09

zechenzhangAGI / ai-research-skills-simpo-training exact

Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use...

★ 1,712 ai

ai ai-research claude claude-code

openrlhf-training 0.08

ovachiever / droid-tings-openrlhf-training exact

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

★ 19 ai

fine-tuning-with-trl 0.08

ovachiever / droid-tings-fine-tuning-with-trl exact

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF,...

★ 19 ai

model_finetuning 0.08

DonggangChen / antigravity-agentic-skills-model-finetuning exact

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF,...

★ 2 ai

model_finetuning 0.08

vuralserhat86 / antigravity-agentic-skills-model-finetuning exact

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF,...

★ 27 ai

model-trainer 0.08

eugenepyvovarov / mcpbundler-agent-skills-marketplace-model-trainer exact

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward...

★ 5 ai

agent-skill agent-skills claude codex

hugging-face-model-trainer 0.08

huggingface / skills-hugging-face-model-trainer exact

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward...

★ 1,015 ai

slime-user 0.08

yzlnew / infra-skills-slime-user exact

Guide for using SLIME (LLM post-training framework for RL Scaling). Use when working with SLIME for reinforcement learning training of language models, including setup, configuration, training...

★ 51 ai

axolotl 0.07

ovachiever / droid-tings-axolotl exact

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

★ 19 ai

constitutional-ai 0.07

zechenzhangAGI / ai-research-skills-constitutional-ai exact

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety...

★ 1,712 ai

ai ai-research claude claude-code

phoenix-observability 0.07

zechenzhangAGI / ai-research-skills-phoenix-observability exact

Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring...

★ 1,712 ai

ai ai-research claude claude-code
