Half-Quadratic Quantization (HQQ) for LLMs. Use when quantizing models to 4/3/2-bit precision without needing calibration datasets, for fast quantization workflows, or when...
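A minimal sketch of calibration-free HQQ quantization through transformers' `HqqConfig` integration; the checkpoint name is illustrative, and exact kwargs depend on your transformers/hqq versions:

```python
# Sketch: 4-bit HQQ quantization applied at load time via transformers'
# HqqConfig integration (no calibration data needed). Requires the `hqq`
# package alongside transformers; the model name is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig

quant_config = HqqConfig(nbits=4, group_size=64)  # 2/3/4-bit are supported

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",   # illustrative checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
    quantization_config=quant_config,      # weights quantized on the fly
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
```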
Expert skill for AI model quantization and optimization. Covers 4-bit/8-bit quantization, GGUF conversion, memory optimization, and quality-performance tradeoffs for deploying LLMs in...
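The quality-performance tradeoff starts with bytes per parameter; a back-of-envelope sketch of weight memory at each precision (weights only, ignoring KV cache and activation memory, which matter at long contexts):

```python
# Back-of-envelope weight memory for a model at different precisions.
def weight_gib(n_params_billion: float, bits_per_param: float) -> float:
    return n_params_billion * 1e9 * bits_per_param / 8 / 2**30

for bits, name in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
    print(f"7B @ {name}: {weight_gib(7, bits):.1f} GiB")
# 7B @ fp16: 13.0 GiB, int8: 6.5 GiB, 4-bit: 3.3 GiB
```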
GGUF format and llama.cpp quantization for efficient CPU/GPU inference. Use when deploying models on consumer hardware, Apple Silicon, or when needing flexible quantization from 2-8 bit without...
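Conversion typically runs llama.cpp's convert_hf_to_gguf.py followed by llama-quantize (script and binary names vary across llama.cpp versions); a minimal sketch loading the resulting file with llama-cpp-python, path illustrative:

```python
# Sketch: run a quantized GGUF model with llama-cpp-python. Q4_K_M is a
# common quality/size middle ground; the model path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-Q4_K_M.gguf",  # illustrative path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers if a GPU/Metal backend is compiled in
)
out = llm("Q: What is quantization? A:", max_tokens=64)
print(out["choices"][0]["text"])
```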
Activation-aware weight quantization for 4-bit LLM compression with 3x speedup and minimal accuracy loss. Use when deploying large models (7B-70B) on limited GPU memory, when you need faster...
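A hedged sketch using AutoAWQ, one popular AWQ implementation; the model path and quant settings are illustrative:

```python
# Sketch: 4-bit AWQ quantization with AutoAWQ. AWQ scales weights using a
# short activation-statistics pass, so quantize() runs an internal
# calibration step on a default dataset unless one is supplied.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.1-8B-Instruct"   # illustrative
quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized("llama-3.1-8b-awq")
tokenizer.save_pretrained("llama-3.1-8b-awq")
```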
Quantizes LLMs to 8-bit or 4-bit for 50-75% memory reduction with minimal accuracy loss. Use when GPU memory is limited, when you need to fit larger models, or when you want faster inference. Supports INT8, NF4,...
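A minimal sketch of NF4 loading through transformers' `BitsAndBytesConfig`; the checkpoint is illustrative:

```python
# Sketch: NF4 4-bit loading with bitsandbytes via transformers.
# Double quantization trims roughly a further 0.4 bits/param of overhead.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.bfloat16,   # matmuls run in bf16
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```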
Optimize AgentDB performance with quantization (4-32x memory reduction), HNSW indexing (150x faster search), caching, and batch operations. Use when optimizing memory usage, improving search...
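Not AgentDB's actual API, but an illustrative numpy sketch of the memory math behind the 4-32x range: int8 scalar quantization gives 4x over float32, 1-bit binary quantization gives 32x:

```python
import numpy as np

vecs = np.random.randn(10_000, 768).astype(np.float32)

# Scalar (int8) quantization: map each dimension's range onto [-127, 127].
scale = np.abs(vecs).max(axis=0) / 127.0
q8 = np.round(vecs / scale).astype(np.int8)           # 4x smaller

# Binary quantization: keep only the sign bit, packed 8 dims per byte.
qbin = np.packbits(vecs > 0, axis=1)                  # 32x smaller

print(vecs.nbytes / q8.nbytes, vecs.nbytes / qbin.nbytes)  # 4.0, 32.0
```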
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with...
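A minimal offline-batching sketch with vLLM's Python API (model name illustrative); for API serving, recent versions also ship an OpenAI-compatible `vllm serve` entrypoint:

```python
# Sketch: offline batched generation with vLLM. PagedAttention and
# continuous batching are handled internally by the engine.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")   # illustrative model
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = ["Explain PagedAttention in one sentence.",
           "Why does continuous batching raise throughput?"]
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```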
Post-training 4-bit quantization for LLMs with minimal accuracy loss. Use for deploying large models (70B, 405B) on consumer GPUs, when you need 4× memory reduction with <2% perplexity...
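A hedged sketch via transformers' `GPTQConfig` (backed by optimum); unlike calibration-free methods, GPTQ consumes calibration text, here the built-in "c4" option. Names are illustrative:

```python
# Sketch: GPTQ post-training 4-bit quantization through transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"   # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)
model.save_pretrained("llama-3.1-8b-gptq")      # reloadable 4-bit weights
```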
Optimizes LLM inference with NVIDIA TensorRT for maximum throughput and lowest latency. Use for production deployment on NVIDIA GPUs (A100/H100), when you need 10-100x faster inference than...
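Recent TensorRT-LLM releases expose a high-level LLM API that compiles the TensorRT engine on first load; treat this sketch as an assumption, since the API surface shifts between versions and the model name is illustrative:

```python
# Sketch: TensorRT-LLM high-level API (recent releases). Engine build
# happens on first load and can take several minutes.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(max_tokens=64)

for out in llm.generate(["Summarize TensorRT in one line."], params):
    print(out.outputs[0].text)
```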
Runs LLM inference on CPU, Apple Silicon, and consumer GPUs without NVIDIA hardware. Use for edge deployment, M1/M2/M3 Macs, AMD/Intel GPUs, or when CUDA is unavailable. Supports GGUF quantization...
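A chat-completion sketch with llama-cpp-python; with a Metal build, `n_gpu_layers=-1` offloads to the Apple GPU, and the same code runs on CPU-only builds. The model path is illustrative:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/qwen2.5-7b-instruct-Q5_K_M.gguf",  # illustrative
    n_gpu_layers=-1,   # Metal offload on Apple Silicon; ignored on CPU builds
    n_threads=8,       # CPU worker threads
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from an M2 Mac!"}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```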
Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.
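The three knobs behind the latency/recall/memory tradeoff, sketched with hnswlib; most vector databases expose the same parameters under similar names:

```python
import numpy as np
import hnswlib

dim, n = 384, 100_000
data = np.random.randn(n, dim).astype(np.float32)

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(
    max_elements=n,
    M=16,                 # graph degree: higher = better recall, more memory
    ef_construction=200,  # build-time beam width: better graph, slower build
)
index.add_items(data)

index.set_ef(64)          # query-time beam width: raise for recall, lower for latency
labels, dists = index.knn_query(data[:5], k=10)
```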
Qdrant vector database: collections, points, payload filtering, indexing, quantization, snapshots, and Docker/Kubernetes deployment.
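A hedged sketch creating a collection with int8 scalar quantization via qdrant-client, assuming a local Qdrant instance (e.g. the official Docker image); collection name and vector size are illustrative:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",                  # illustrative name
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(type=ScalarType.INT8, always_ram=True)
    ),
)
```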