Parameter-efficient fine-tuning for LLMs using LoRA, QLoRA, and 25+ methods. Use when fine-tuning large models...
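A minimal sketch of the kind of LoRA setup this covers, using Hugging Face PEFT (the model name and hyperparameters below are illustrative):

    # LoRA adapters with Hugging Face PEFT (illustrative values).
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    lora_config = LoraConfig(
        r=16,                                  # rank of the low-rank update
        lora_alpha=32,                         # scaling factor
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()         # typically well under 1% trainable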
High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models...
RNN+Transformer hybrid with O(n) inference. Linear time, infinite context, no KV cache. Train like GPT (parallel),...
Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO,...
Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM...
Expert guidance for GRPO/RL fine-tuning with TRL for reasoning and task-specific model training.
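A hedged sketch of what a GRPO run looks like in TRL (the dataset and reward function here are toy placeholders):

    # GRPO with TRL: a toy reward scored over sampled completions.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    def reward_len(completions, **kwargs):
        # Toy reward: prefer completions close to 200 characters.
        return [-abs(200 - len(c)) for c in completions]

    trainer = GRPOTrainer(
        model="Qwen/Qwen2-0.5B-Instruct",
        reward_funcs=reward_len,
        args=GRPOConfig(output_dir="grpo-out"),
        train_dataset=load_dataset("trl-lib/tldr", split="train"),  # has a "prompt" column
    )
    trainer.train()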
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization.
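A hedged sketch of Unsloth's usual 4-bit QLoRA entry point (model name and ranks are illustrative); training then proceeds with a standard TRL trainer:

    # Unsloth 4-bit load plus LoRA adapters (illustrative values).
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )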
Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen,...
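A small sketch of LitGPT's Python API for loading and prompting one of those checkpoints (the checkpoint name is illustrative; fine-tuning and pretraining are typically driven through the litgpt CLI):

    # Load a supported checkpoint and generate with LitGPT's Python API.
    from litgpt import LLM

    llm = LLM.load("microsoft/phi-2")   # downloads the checkpoint on first use
    print(llm.generate("What is parameter-efficient fine-tuning?", max_new_tokens=64))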
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4...
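The core objective is simple enough to sketch directly: the implicit reward is the length-normalized log-likelihood of a response under the policy (no reference model), with a target margin gamma between chosen and rejected responses. The beta and gamma values below are placeholders.

    # SimPO loss sketch: length-normalized log-likelihood as implicit reward,
    # no reference model, and a target reward margin gamma.
    import torch.nn.functional as F

    def simpo_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens,
                   beta=2.0, gamma=1.0):
        # *_logps: summed token log-probs per response; *_lens: response lengths in tokens.
        chosen_reward = beta * chosen_logps / chosen_lens
        rejected_reward = beta * rejected_logps / rejected_lens
        return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()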
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use...
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when...
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms....
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k...
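A minimal sketch of the typical SentencePiece workflow, training a BPE model on raw text and round-tripping a sentence (file names and vocab size are illustrative):

    # Train a BPE SentencePiece model on raw Unicode text, then encode/decode.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input="corpus.txt",          # one sentence per line
        model_prefix="bpe_8k",
        vocab_size=8000,
        model_type="bpe",            # or "unigram"
    )
    sp = spm.SentencePieceProcessor(model_file="bpe_8k.model")
    pieces = sp.encode("Hello, world!", out_type=str)
    print(pieces, "->", sp.decode(pieces))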
Expert guidance for fine-tuning LLMs with LLaMA-Factory - WebUI no-code, 100+ models, 2/3/4/5/6/8-bit QLoRA,...
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for...
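For orientation, a sketch of instantiating the model it defines (run from inside the repo, where model.py lives; the shape shown matches the GPT-2 124M defaults):

    # Build nanoGPT's GPT-2 (124M)-shaped model from its model.py.
    from model import GPT, GPTConfig

    config = GPTConfig(block_size=1024, vocab_size=50304,
                       n_layer=12, n_head=12, n_embd=768)
    model = GPT(config)
    print(sum(p.numel() for p in model.parameters()) / 1e6, "M parameters")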
Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training...
Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from...
State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV...
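A hedged sketch of a single Mamba block from the mamba_ssm package (requires a CUDA GPU; dimensions are illustrative):

    # One Mamba block: input and output are (batch, seq_len, d_model).
    import torch
    from mamba_ssm import Mamba

    block = Mamba(d_model=512, d_state=16, d_conv=4, expand=2).to("cuda")
    x = torch.randn(2, 4096, 512, device="cuda")
    y = block(x)            # same shape; cost grows linearly with seq_len
    print(y.shape)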
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds....
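A minimal sketch of training a byte-level BPE with the tokenizers library (file name and vocab size are illustrative):

    # Train a byte-level BPE tokenizer from plain-text files, then encode.
    from tokenizers import Tokenizer, models, pre_tokenizers, trainers

    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
    trainer = trainers.BpeTrainer(vocab_size=30000, special_tokens=["[UNK]", "[PAD]"])
    tokenizer.train(files=["corpus.txt"], trainer=trainer)
    print(tokenizer.encode("Fast tokenization in Rust.").tokens)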
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment,...
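A hedged sketch of the two most common TRL entry points, SFT followed by DPO (model and dataset names are illustrative):

    # SFT for instruction tuning, then DPO on (prompt, chosen, rejected) pairs.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

    model_id = "Qwen/Qwen2-0.5B-Instruct"
    model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    sft = SFTTrainer(
        model=model,
        args=SFTConfig(output_dir="sft-out"),
        train_dataset=load_dataset("trl-lib/Capybara", split="train"),
        processing_class=tokenizer,
    )
    sft.train()

    dpo = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-out"),
        train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
        processing_class=tokenizer,
    )
    dpo.train()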