Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when...
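The core trick behind activation-aware quantization can be shown in a few lines of plain Python: weights in channels with large activation magnitudes are scaled up before quantization and scaled back down afterwards, so they land on a finer effective grid within the group's shared scale. This is a toy sketch of the idea, not AWQ's actual API; the function names and data are illustrative.

```python
def quantize_group(w, n_bits=4):
    """Symmetric round-to-nearest quantization with one scale shared
    by the whole group (as in group-wise 4-bit weight quantization)."""
    qmax = 2 ** (n_bits - 1) - 1                  # 7 for 4-bit
    scale = (max(abs(x) for x in w) / qmax) or 1.0
    return [round(x / scale) * scale for x in w]

def awq_like(w, act_mag, alpha=0.5, n_bits=4):
    """Scale salient channels up before quantization, undo afterwards.
    Channels with large activation magnitudes lose less precision."""
    s = [m ** alpha for m in act_mag]             # per-channel scales
    q = quantize_group([x * si for x, si in zip(w, s)], n_bits)
    return [x / si for x, si in zip(q, s)]

# A weight of 0.1 paired with a large activation is reconstructed far more
# accurately with the activation-aware scaling than with plain quantization.
w = [0.1, 1.0, -0.5, 0.3]
act = [10.0, 1.0, 1.0, 1.0]
plain = quantize_group(w)
aware = awq_like(w, act)
```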

Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on...

Half-Quadratic Quantization for LLMs without calibration data. Use when quantizing models to 4/3/2-bit precision...

Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment,...

Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs,...
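Prefix caching in the spirit of RadixAttention can be sketched with a simple trie over token IDs: requests that share a prompt prefix reuse the cached portion and only recompute the suffix. This is an illustrative toy (a plain trie rather than a compressed radix tree), not SGLang's API.

```python
class PrefixCache:
    """Toy prefix cache: maps cached token prefixes to reuse lengths."""

    def __init__(self):
        self.root = {}                  # token -> child node

    def insert(self, tokens):
        """Record a prompt's tokens as cached."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match(self, tokens):
        """Length of the longest cached prefix of `tokens`; only the
        remaining suffix would need KV computation."""
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node, n = node[t], n + 1
        return n
```

A new request sharing a long system prompt with an earlier one would `match` nearly its whole prefix, which is where the serving speedup comes from.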

Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production...

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production...
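The idea behind PagedAttention can be sketched without any GPU code: the KV cache is split into fixed-size physical blocks, and each sequence keeps a block table mapping its logically contiguous tokens to scattered blocks, so memory is allocated on demand and freed blocks are reusable. All names here are illustrative, not vLLM's API.

```python
class PagedKVCache:
    """Toy paged KV cache: per-sequence block tables over a shared pool."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.free = list(range(num_blocks))       # pool of physical blocks
        self.blocks = {i: [] for i in range(num_blocks)}
        self.tables = {}                          # seq_id -> [block ids]

    def append(self, seq_id, kv):
        """Append one token's KV entry, allocating a block only when the
        current one is full (no up-front contiguous reservation)."""
        table = self.tables.setdefault(seq_id, [])
        if not table or len(self.blocks[table[-1]]) == self.block_size:
            table.append(self.free.pop(0))
        self.blocks[table[-1]].append(kv)

    def tokens(self, seq_id):
        """Reassemble the logically contiguous KV stream from its blocks."""
        return [kv for b in self.tables.get(seq_id, []) for kv in self.blocks[b]]

    def release(self, seq_id):
        """Return a finished sequence's blocks to the pool."""
        for b in self.tables.pop(seq_id, []):
            self.blocks[b].clear()
            self.free.append(b)
```

Because blocks are fixed-size and released immediately, many concurrent sequences of varying lengths can share the pool, which is what makes continuous batching memory-efficient.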

Track ML experiments, manage a versioned model registry, deploy models to production, and reproduce experiments...

Visualize training metrics, debug models with histograms, compare experiments, visualize model graphs, and profile...

Autonomous AI agent platform for building and deploying continuous agents. Use when creating visual workflow agents,...

Framework for building LLM-powered applications with agents, chains, and RAG. Supports multiple providers (OpenAI,...

Data framework for building LLM applications with RAG. Specializes in document ingestion (300+ connectors),...

Open-source embedding database for AI applications. Store embeddings and metadata, perform vector and full-text...

Facebook's library for efficient similarity search and clustering of dense vectors. Supports billions of vectors,...
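What a flat (exact) similarity index computes can be written in a few lines of pure Python: for each query, return the indices and squared-L2 distances of the k closest database vectors. FAISS produces the same result with SIMD/GPU kernels and offers approximate indexes to scale to billions of vectors; this sketch is only the brute-force reference, not FAISS's API.

```python
def knn_l2(database, queries, k):
    """Exact k-nearest-neighbor search under squared L2 distance.
    Returns (distances, indices), one list per query."""
    results_d, results_i = [], []
    for q in queries:
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(v, q)), i)
            for i, v in enumerate(database)
        )[:k]
        results_d.append([d for d, _ in scored])
        results_i.append([i for _, i in scored])
    return results_d, results_i
```

This is O(n) per query over the whole database; the point of a library like FAISS is to get the same (or approximately the same) answer in sublinear time.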

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense +...