zechenzhangAGI / ai-research-skills-training-llms-megatron

Trains large language models (2B-462B parameters) using NVIDIA Megatron-Core with advanced parallelism strategies. Use when training models with >1B parameters or when you need maximum GPU efficiency (47% MFU on...

zechenzhangAGI / ai-research-skills-slime-rl-training

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM...

zechenzhangAGI / ai-research-skills-optimizing-attention-flash

Optimizes transformer attention with Flash Attention for 2-4x speedup and 10-20x memory reduction. Use when training/running transformers with long sequences (>512 tokens), encountering GPU memory...

zechenzhangAGI / ai-research-skills-sparse-autoencoder-training

Provides guidance for training and analyzing Sparse Autoencoders (SAEs) using SAELens to decompose neural network activations into interpretable features. Use when discovering interpretable...

zechenzhangAGI / ai-research-skills-implementing-llms-litgpt

Implements and trains LLMs using Lightning AI's LitGPT with 20+ pretrained architectures (Llama, Gemma, Phi, Qwen, Mistral). Use when you need clean model implementations, educational understanding of...

zechenzhangAGI / ai-research-skills-sglang

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster...

zechenzhangAGI / ai-research-skills-deepspeed

Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

ovachiever / droid-tings-moe-training

Train Mixture of Experts (MoE) models using DeepSpeed or HuggingFace. Use when training large-scale models with limited compute (5× cost reduction vs dense models), implementing sparse...

amitlals / sap-rpt1-oss-predictor

Use the SAP-RPT-1-OSS open-source tabular foundation model for predictive analytics on SAP business data. Handles classification and regression tasks including customer churn prediction, delivery...

huggingface / skills-hugging-face-model-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward...

huggingface / skills-hugging-face-jobs

This skill should be used when users want to run any workload on Hugging Face Jobs infrastructure. Covers UV scripts, Docker-based jobs, hardware selection, cost estimation, authentication with...

huggingface / skills-hugging-face-paper-publisher

Publish and manage research papers on Hugging Face Hub. Supports creating paper pages, linking papers to models/datasets, claiming authorship, and generating professional markdown-based research articles.

zechenzhangAGI / ai-research-skills-pytorch-fsdp

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP: parameter sharding, mixed precision, CPU offloading, FSDP2

zechenzhangAGI / ai-research-skills-guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance - Microsoft Research's constrained...

zechenzhangAGI / ai-research-skills-lambda-labs-gpu-cloud

Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node...

zechenzhangAGI / ai-research-skills-unsloth

Expert guidance for fast fine-tuning with Unsloth: 2-5x faster training, 50-80% less memory, LoRA/QLoRA optimization

zechenzhangAGI / ai-research-skills-openrlhf-training

High-performance RLHF framework with Ray+vLLM acceleration. Use for PPO, GRPO, RLOO, DPO training of large models (7B-70B+). Built on Ray, vLLM, ZeRO-3. 2× faster than DeepSpeedChat with...

Orchestra-Research / ai-research-skills-ml-paper-writing

Write publication-ready ML/AI papers for NeurIPS, ICML, ICLR, ACL, AAAI, COLM. Use when drafting papers from research repos, structuring arguments, verifying citations, or preparing camera-ready...

zechenzhangAGI / ai-research-skills-nemo-guardrails

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation, fact-checking, hallucination detection, PII filtering, toxicity detection. Uses...

zechenzhangAGI / ai-research-skills-outlines

Guarantee valid JSON/XML/code structure during generation, use Pydantic models for type-safe outputs, support local models (Transformers, vLLM), and maximize inference speed with Outlines -...