Research-aligned self-consistency for debugging. Spawns independent solver agents that each explore and debug the problem from scratch. Uses majority voting. Based on "Self-Consistency Improves...
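The majority-voting step described above can be sketched as follows. This is a minimal illustration, not the skill's implementation. `solve` is a hypothetical stand-in for one independent solver agent, and the integer seed is an assumed way to make runs differ. Voting only works if each solver's answer (for example, the diagnosed root cause) is normalized to a comparable, hashable form, because free-form text rarely matches exactly.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(answers: Sequence[T]) -> Tuple[T, float]:
    """Return the most common answer and its share of the votes.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)


def self_consistent_solve(solve: Callable[[int], T], n_solvers: int = 5) -> Tuple[T, float]:
    # Hypothetical: each call models one solver agent exploring from scratch.
    answers = [solve(seed) for seed in range(n_solvers)]
    return majority_vote(answers)
```

For example, `majority_vote(["off-by-one in loop", "null deref", "off-by-one in loop"])` picks `"off-by-one in loop"` with two thirds of the votes. The vote share can also serve as a rough confidence signal for whether to spawn more solvers.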
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM...
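A multi-agent orchestration workflow for in-depth research: splits a research goal into sub-goals that can run in parallel and runs each one as a subprocess in Claude Code's non-interactive mode (`claude -p`). For web access and data collection it prefers installed skills first and MCP tools second. It aggregates the sub-results with scripts and refines them chapter by chapter, finally delivering "the finished report file path +...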
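The fan-out and aggregation steps above can be sketched as follows, assuming each sub-goal becomes one `claude -p "<prompt>"` call. The helper names (`run_claude`, `fan_out`, `aggregate`) and the one-section-per-sub-goal layout are hypothetical. The real workflow's prompts, skill/MCP routing, and chapter refinement are not shown.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def run_claude(prompt: str) -> str:
    """Run one sub-goal through Claude Code's non-interactive print mode."""
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True,
        text=True,
        check=True,  # raise if the subprocess fails
    )
    return result.stdout.strip()


def fan_out(
    subgoals: Sequence[str],
    runner: Callable[[str], str] = run_claude,
    max_workers: int = 4,
) -> Dict[str, str]:
    # pool.map yields results in input order, so outputs line up with sub-goals.
    # Sub-goals are assumed to be distinct strings (duplicates would collapse).
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(runner, subgoals))
    return dict(zip(subgoals, outputs))


def aggregate(results: Dict[str, str]) -> str:
    # Hypothetical aggregation: one draft section per sub-goal, refined later.
    sections = [f"## {goal}\n\n{body}" for goal, body in results.items()]
    return "\n\n".join(sections)
```

Making `runner` injectable keeps the orchestration logic testable without launching real `claude` processes. Threads suffice here because each worker mostly waits on a subprocess.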
Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with self-critique/revision, then RLAIF (RL from AI Feedback). Use for safety...
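The self-critique/revision loop of the supervised phase can be sketched as follows. This is a minimal illustration, assuming `model` is any hypothetical prompt-to-completion callable and that one principle is sampled per round. The prompt wording is illustrative only. The fine-tuning step and the RLAIF phase are not shown.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[str], str]  # hypothetical: prompt in, completion out


def critique_and_revise(
    model: Model,
    prompt: str,
    principles: Sequence[str],
    n_rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    rng = rng or random.Random(0)
    response = model(prompt)  # initial, possibly harmful draft
    for _ in range(n_rounds):
        principle = rng.choice(principles)  # sample one principle per round
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response


def build_sl_dataset(
    model: Model, prompts: Sequence[str], principles: Sequence[str],
    n_rounds: int = 2, seed: int = 0,
) -> List[Tuple[str, str]]:
    # (prompt, final revision) pairs used as supervised fine-tuning targets.
    rng = random.Random(seed)
    return [(p, critique_and_revise(model, p, principles, n_rounds, rng)) for p in prompts]
```

Sampling a principle per round, rather than applying all of them at once, keeps each critique focused.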
Use when executing implementation plans with independent tasks in the current session