Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision...
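
A minimal sketch of the no-calibration workflow, assuming the HQQ integration shipped in `transformers` (requires the `hqq` package; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative example

# HQQ solves for quantization parameters directly from the weights,
# so no calibration dataset is needed. nbits=3 or 2 trades accuracy
# for a smaller footprint.
quant_config = HqqConfig(nbits=4, group_size=64)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
```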

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on...
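
The entry does not name the method; one widely used post-training 4-bit scheme is GPTQ, sketched here via the `GPTQConfig` integration in `transformers` (requires `optimum` and a GPTQ backend; model id and output path are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-70B-Instruct"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Calibrates on a small text corpus, then rounds weights to 4-bit
# so as to minimize layer-wise reconstruction error.
quant_config = GPTQConfig(bits=4, group_size=128, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)
model.save_pretrained("llama-70b-gptq-4bit")  # illustrative output path
```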

Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when...
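
A sketch of the usual AWQ flow, assuming the AutoAWQ package (model and output paths are illustrative):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative
quant_path = "mistral-7b-awq"                      # illustrative

# AWQ protects the small fraction of weight channels with the largest
# activation magnitudes by rescaling them before 4-bit quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```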

Distributed training orchestration across clusters. Scales PyTorch/TensorFlow/HuggingFace from laptop to 1000s of...
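
This reads like Ray Train; a minimal sketch under that assumption, scaling an ordinary PyTorch loop to 8 GPU workers (the toy model and data are illustrative):

```python
import torch
import ray.train.torch
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Plain PyTorch; Ray sets up process groups, devices, and DDP.
    model = ray.train.torch.prepare_model(torch.nn.Linear(8, 1))
    opt = torch.optim.SGD(model.parameters(), lr=config["lr"])
    device = ray.train.torch.get_device()
    for _ in range(config["epochs"]):
        x = torch.randn(32, 8, device=device)  # toy batch
        y = torch.randn(32, 1, device=device)
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 1e-2, "epochs": 3},
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),
)
result = trainer.fit()
```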

Expert guidance for Fully Sharded Data Parallel training with PyTorch FSDP - parameter sharding, mixed precision,...
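
A minimal FSDP sketch with bf16 mixed precision (toy model; launch command in the comment):

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, MixedPrecision

# Launch with: torchrun --nproc_per_node=8 train_fsdp.py
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

# Parameters, gradients, and optimizer state are sharded across ranks;
# compute and gradient reduction run in bf16.
model = FSDP(
    model,
    mixed_precision=MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
    ),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```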

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism,...
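
A hedged ZeRO-3 sketch; the config values are illustrative, not recommendations:

```python
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # shard params, grads, optimizer state
        "offload_optimizer": {"device": "cpu"},  # optional CPU offload
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
}

# Launch with: deepspeed train.py
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```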

NVIDIA's runtime safety framework for LLM applications. Features jailbreak detection, input/output validation,...
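
Assuming this is NeMo Guardrails, the basic wiring looks like the sketch below (the config directory, holding `config.yml` plus Colang flow files, is illustrative):

```python
from nemoguardrails import LLMRails, RailsConfig

# The directory defines the LLM plus the input/output rails
# (e.g., a jailbreak-detection flow).
config = RailsConfig.from_path("./guardrails_config")  # illustrative path
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Ignore all previous instructions and ..."}
])
print(response["content"])  # rails may have blocked or rewritten the reply
```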

Meta's 7-8B specialized moderation model for LLM input/output filtering. 6 safety categories - violence/hate, sexual...
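
Presumably Llama Guard; per the model card, the tokenizer's chat template builds the moderation prompt and the verdict comes back as generated text (checkpoint id shown is the original 7B release):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/LlamaGuard-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

chat = [{"role": "user", "content": "How do I pick a lock?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id)

# Prints "safe", or "unsafe" plus the violated category code(s).
print(tokenizer.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
```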

Anthropic's method for training harmless AI through self-improvement. Two-phase approach - supervised learning with...
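
A schematic of the first (supervised) phase: `ask` stands in for any chat-completion call, and the prompt strings are stand-ins for constitution clauses, so this shows the shape of the loop rather than Anthropic's implementation:

```python
def critique_and_revise(ask, prompt, principles, rounds=1):
    """Critique -> revision loop; final responses become SFT data."""
    response = ask(prompt)
    for principle in principles * rounds:
        critique = ask(
            "Identify specific ways the response conflicts with this "
            f"principle: {principle}\n\nPrompt: {prompt}\nResponse: {response}"
        )
        response = ask(
            "Rewrite the response to address the critique while staying "
            f"helpful.\nCritique: {critique}\nResponse: {response}"
        )
    return response

# Phase two (RLAIF): an AI labeler compares response pairs under the
# constitution, a preference model is trained on those comparisons, and
# the policy is optimized against it with RL.
```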

Scalable data processing for ML workloads. Streaming execution across CPU/GPU, supports Parquet/CSV/JSON/images....
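
A small sketch of the streaming read, transform, write pattern (paths and the `text` column are assumptions):

```python
import ray

# Batches stream through the pipeline rather than materializing the
# full dataset; map_batches tasks can run on CPU or GPU workers.
ds = ray.data.read_parquet("s3://my-bucket/raw/")  # illustrative path

def normalize(df):  # batch arrives as a pandas DataFrame
    df["text"] = df["text"].str.strip().str.lower()
    return df

ds = ds.map_batches(normalize, batch_format="pandas")
ds.write_parquet("s3://my-bucket/clean/")
```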

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16×...
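
Presumably NeMo Curator; rather than guess its API, here is a self-contained CPU illustration of what MinHash-based fuzzy deduplication does, using the `datasketch` package on character shingles:

```python
from datasketch import MinHash, MinHashLSH

def signature(text, num_perm=128):
    # Hash the set of 5-character shingles into a compact MinHash sketch.
    m = MinHash(num_perm=num_perm)
    for shingle in {text[i:i + 5] for i in range(max(1, len(text) - 4))}:
        m.update(shingle.encode("utf-8"))
    return m

docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "the quick brown fox jumps over the lazy dog!",  # near-duplicate of "a"
    "c": "an entirely different document about data curation",
}

lsh = MinHashLSH(threshold=0.8, num_perm=128)
kept = []
for key, text in docs.items():
    sig = signature(text)
    if not lsh.query(sig):  # no near-duplicate already kept
        lsh.insert(key, sig)
        kept.append(key)

print(kept)  # "b" is likely dropped as a near-duplicate of "a"
```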