Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4...
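A minimal sketch of the length-normalized, reference-free objective SimPO describes (assuming the standard formulation; the `beta` and `gamma` values below are illustrative, not taken from this skill):

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_len, rejected_len,
               beta=2.0, gamma=0.5):
    """SimPO-style loss: average log-prob as implicit reward, no reference model."""
    # Implicit reward = beta * (summed log-prob / response length).
    r_chosen = beta * chosen_logps / chosen_len
    r_rejected = beta * rejected_logps / rejected_len
    # Bradley-Terry preference loss with a target reward margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```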
Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images....
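Assuming this refers to Ray Data, a rough usage sketch (paths and the transform are made up for illustration):

```python
import ray

# Lazily read a Parquet dataset; execution streams block-by-block across the cluster.
ds = ray.data.read_parquet("s3://my-bucket/raw/")         # illustrative path

def add_feature(batch):                                   # runs in parallel on workers
    batch["scaled"] = batch["value"] * 2.0
    return batch

ds = ds.map_batches(add_feature, batch_format="pandas")
ds.write_parquet("/tmp/processed")                        # illustrative output path
```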
Provides PyTorch-native distributed LLM pretraining using torchtitan with 4D parallelism (FSDP2, TP, PP, CP). Use...
Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual...
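A sketch of the usual transformers-based invocation for this kind of moderation model (checkpoint name and prompt are assumptions; the real model is gated on the Hugging Face Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

chat = [{"role": "user", "content": "Tell me how to pick a lock."}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids=input_ids, max_new_tokens=32)
# The completion starts with "safe" or "unsafe" plus the violated category code.
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```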
Provides guidance for PyTorch-native agentic RL using torchforge, Meta's library separating infra from algorithms....
Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies....
Simplest distributed training API. 4 lines to add distributed support to any PyTorch script. Unified API for...
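Assuming this is Hugging Face Accelerate, the "4 lines" are roughly: import, create an `Accelerator`, `prepare()` your objects, and swap `loss.backward()` for `accelerator.backward(loss)` (toy model and data below are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)

accelerator = Accelerator()                                               # create
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)  # wrap objects
for x, y in loader:
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)                                            # replaces loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The same script then runs on one GPU or many via `accelerate launch script.py`.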
Provides guidance for training LLMs with reinforcement learning using verl (Volcano Engine RL). Use when...
High-level PyTorch framework with Trainer class, automatic distributed training (DDP/FSDP/DeepSpeed), callbacks...
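Assuming this entry is PyTorch Lightning, a toy sketch of the `Trainer` pattern (model, data, and the `strategy` choice are illustrative):

```python
import torch
import lightning as L
from torch.utils.data import DataLoader, TensorDataset

class LitRegressor(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(10, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-3)

loader = DataLoader(TensorDataset(torch.randn(64, 10), torch.randn(64, 1)), batch_size=8)
# strategy="ddp" / "fsdp" / "deepspeed" selects the distributed backend.
trainer = L.Trainer(max_epochs=1, accelerator="auto", devices="auto")
trainer.fit(LitRegressor(), loader)
```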
Language-independent tokenizer treating text as raw Unicode. Supports BPE and Unigram algorithms. Fast (50k...
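Assuming SentencePiece, a short train-then-encode sketch (file names and vocab size are illustrative):

```python
import sentencepiece as spm

# Train directly on raw text, one sentence per line; no language-specific pre-tokenization.
spm.SentencePieceTrainer.train(
    input="corpus.txt",
    model_prefix="spm_demo",
    vocab_size=8000,
    model_type="unigram",   # or "bpe"
)

sp = spm.SentencePieceProcessor(model_file="spm_demo.model")
print(sp.encode("Hello world", out_type=str))  # subword pieces
print(sp.encode("Hello world", out_type=int))  # token ids
```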
Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances...
Expert guidance for fine-tuning LLMs with LLaMA-Factory - no-code WebUI, 100+ models, 2/3/4/5/6/8-bit QLoRA,...
Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or...
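Assuming this is SkyPilot, a heavily simplified sketch of its Python API (task contents, accelerator string, and cluster name are assumptions; the YAML file plus `sky launch` CLI is the more common path):

```python
import sky

# Describe the job: setup, run command, and the GPUs it needs.
task = sky.Task(
    setup="pip install -r requirements.txt",
    run="python train.py",
)
task.set_resources(sky.Resources(accelerators="A100:8"))

# SkyPilot picks the cheapest cloud/region that satisfies the request.
sky.launch(task, cluster_name="train-cluster")
```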
Educational GPT implementation in ~300 lines. Reproduces GPT-2 (124M) on OpenWebText. Clean, hackable code for...
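For orientation, the GPT-2 small (124M) shape that nanoGPT reproduces, written as a config dataclass in the spirit of its `GPTConfig` (the fields here are a summary, not a copy of the repo):

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    block_size: int = 1024   # context length
    vocab_size: int = 50257  # GPT-2 BPE vocabulary
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # hidden size; these dims give ~124M parameters
```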
Expert guidance for fast fine-tuning with Unsloth - 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization
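A rough sketch of the usual Unsloth flow (load a 4-bit base model, then attach LoRA adapters); the model name and LoRA hyperparameters are illustrative:

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Only the low-rank adapter weights are trained; the 4-bit base stays frozen.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```

The resulting model is typically handed to a standard trainer (e.g. TRL's SFTTrainer) for the actual fine-tune.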
Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from...
State-space model with O(n) complexity vs Transformers' O(n²). 5× faster inference, million-token sequences, no KV...
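A minimal sketch of a single selective state-space block via the `mamba_ssm` package (dimensions are illustrative; the kernels require a CUDA device):

```python
import torch
from mamba_ssm import Mamba

block = Mamba(
    d_model=256,  # model width
    d_state=16,   # SSM state size
    d_conv=4,     # local convolution width
    expand=2,     # inner expansion factor
).cuda()

x = torch.randn(2, 1024, 256, device="cuda")  # (batch, seq_len, d_model)
y = block(x)                                  # same shape; compute scales linearly in seq_len
print(y.shape)
```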
Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds....
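Assuming Hugging Face `tokenizers`, a small train-from-scratch sketch following its quicktour pattern (corpus path and vocab size are illustrative):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=30000, special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

encoding = tokenizer.encode("Hello, tokenizers!")
print(encoding.tokens, encoding.ids)
```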
Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment,...
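A minimal SFT sketch with TRL (dataset and model ids follow TRL's docs but are assumptions here, and the exact argument set shifts between TRL releases; `DPOTrainer` follows the same pattern with a preference dataset):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")   # illustrative chat dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",                              # loaded for you from the Hub
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-demo", max_steps=100),
)
trainer.train()
```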
Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism,...
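A bare-bones ZeRO stage-2 sketch (config values are illustrative; real runs go through the `deepspeed` launcher):

```python
import torch
import deepspeed

model = torch.nn.Linear(10, 1)  # stand-in for a real model

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},   # partition optimizer state + gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-3}},
}

# Returns a wrapped engine whose backward()/step() handle ZeRO partitioning.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```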