howlonghasitBen

ollama-router

0
0
# Install this skill:
npx skills add howlonghasitBen/openclaw-skills --skill "ollama-router"

Install specific skill from multi-skill repository

# Description

Smart message routing to local Ollama models to reduce Claude API costs. Classifies messages as simple (ollama), medium (haiku), or complex (opus) and routes accordingly.

# SKILL.md


name: ollama-router
version: 0.1.0
description: Smart message routing to local Ollama models to reduce Claude API costs. Classifies messages as simple (ollama), medium (haiku), or complex (opus) and routes accordingly.
metadata: {"openclaw":{"emoji":"πŸ¦™","category":"cost-optimization","requires":{"bins":["node","curl"]}}}


Ollama Router

Route simple messages to local Ollama models instead of burning Claude tokens.

Overview

The router classifies incoming messages and suggests which model should handle them:
- ollama - Simple greetings, vibes, casual chat β†’ local model (FREE)
- haiku - Simple reasoning tasks β†’ cheap Claude
- opus - Complex coding/reasoning β†’ full Claude

Quick Start

1. Install Ollama

curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:8b-instruct-q4_0

2. Create Custom Model (Optional)

# Create a modelfile with your personality
cat > ~/.ollama/mymodel.modelfile << 'EOF'
FROM llama3.1:8b-instruct-q4_0
PARAMETER temperature 0.8
SYSTEM "You are a helpful assistant with a chill vibe."
EOF

ollama create mymodel -f ~/.ollama/mymodel.modelfile

3. Start the Router

cd ~/.openclaw/skills/ollama-router
npm install
node server.mjs &

Router runs on http://localhost:3333

4. Test Classification

# Simple greeting β†’ ollama
curl "localhost:3333/classify?m=hey%20whats%20up"
# {"route":"ollama","model":{"provider":"ollama","model":"surfgod"}}

# Complex task β†’ opus
curl "localhost:3333/classify?m=write%20a%20smart%20contract"
# {"route":"opus","model":{"provider":"anthropic","model":"claude-opus-4"}}

API Endpoints

GET /classify?m=

Returns routing recommendation:

{
  "route": "ollama|haiku|opus",
  "model": {"provider": "...", "model": "..."}
}

GET /stats

Returns routing statistics:

{
  "total": 100,
  "ollama": 65,
  "haiku": 20,
  "opus": 15
}

Classification Logic

Routes to Ollama (free)

  • Greetings: "hey", "gm", "sup", "yo"
  • Vibes: "nice", "based", "lol", "lmao"
  • Simple questions: "what time", "how are you"
  • Acknowledgments: "thanks", "ok", "cool"

Routes to Opus (expensive but necessary)

  • Code keywords: "code", "function", "debug", "error"
  • File operations: "edit", "create", "write file"
  • Tool usage: "search", "fetch", "browser"
  • Complex reasoning: "explain", "analyze", "compare"

Routes to Haiku (middle ground)

  • Everything else that doesn't match above patterns

Integration with OpenClaw

Agent-Level (Manual)

Add to your AGENTS.md:

## Smart Routing
Before responding, query `localhost:3333/classify?m=<message>`
- If "ollama": Generate via Ollama, respond directly
- If "haiku/opus": Use Claude as normal

Gateway-Level (Preprocessor)

For true zero-token routing, use the OpenClaw fork with preprocessor:
1. Clone and build the fork
2. Set OPENCLAW_PREPROCESSOR=1
3. Router intercepts before Claude sees the message

Customization

Edit classifier.mjs to adjust:
- OLLAMA_PATTERNS - Patterns that route to local model
- OPUS_PATTERNS - Patterns that need full Claude
- Model names and providers

Stats Tracking

The router tracks classification distribution. Goal: >70% to Ollama for casual conversations.

curl localhost:3333/stats

Files

  • server.mjs - HTTP server
  • classifier.mjs - Classification logic
  • package.json - Dependencies

Tips

  1. Tune for your use case - Add patterns specific to your conversations
  2. Monitor stats - Check if routing makes sense
  3. Custom models - Train Ollama models with your personality
  4. GPU helps - 8B models run great on 4GB+ VRAM

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.