Install this specific skill from the multi-skill repository:

```bash
npx skills add michaelboeding/skills --skill "model-council"
```
# Description
This skill should be used when the user asks for "model council", "multi-model", "compare models", "ask multiple AIs", "consensus across models", "run on different models", or wants to get solutions from multiple AI providers (Claude, GPT, Gemini, Grok) and compare results. Orchestrates parallel execution across AI models/CLIs and synthesizes the best answer.
# SKILL.md
```yaml
name: model-council
description: This skill should be used when the user asks for "model council", "multi-model", "compare models", "ask multiple AIs", "consensus across models", "run on different models", or wants to get solutions from multiple AI providers (Claude, GPT, Gemini, Grok) and compare results. Orchestrates parallel execution across AI models/CLIs and synthesizes the best answer.
```
# Model Council: Multi-Model Consensus

Run the same problem through multiple AI models in parallel, collect only their analyses, and let Claude Code synthesize the results and decide on the best approach.
Unlike code-council (which uses one model with multiple approaches), model-council leverages different model architectures for true ensemble diversity.
## Critical: Analysis-Only Mode

**IMPORTANT:** External models provide analysis and recommendations ONLY. They do NOT make code changes.
- External models: Analyze, suggest, reason, compare options
- Claude Code: Synthesizes all inputs, makes final decision, implements changes
This ensures:
1. Claude Code remains in control of the codebase
2. No conflicting changes from multiple sources
3. Best ideas from all models, unified execution
## Why Multi-Model?
Different models have different:
- Training data and knowledge cutoffs
- Reasoning patterns and biases
- Strengths (math, code, creativity, etc.)
When multiple independent models agree, you can have high confidence that the answer is correct.
## Execution Modes

### Mode 1: CLI Agents (Uses Your Existing Accounts)

Call CLI tools that use your logged-in accounts, leveraging your existing subscriptions.
| CLI Tool | Model | Status |
|---|---|---|
| `claude` | Claude (this session) | ✅ Already running |
| `codex` | OpenAI Codex | Requires setup |
| `gemini` | Google Gemini | Requires setup |
| `aider` | Multi-model | Requires setup |
#### CLI Setup Instructions

**OpenAI Codex CLI:**

```bash
# Install via npm
npm install -g @openai/codex

# Login (uses browser auth)
codex auth

# Verify
codex --version
```
**Google Gemini CLI:**

```bash
# Install via npm
npm install -g @google/gemini-cli

# Or use gcloud with Vertex AI
gcloud auth application-default login

# Verify
gemini --version
```
**Aider (Multi-Model, Recommended):**

```bash
# Install via pip
pip install aider-chat

# Configure with your API keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Run with a specific model
aider --model gpt-4o --message "analyze this code"
```
**Check What's Installed:**

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/model-council/scripts/detect_clis.py
```
### Mode 2: API Calls (Pay Per Token)

Direct API calls are more reliable and work without any CLI setup, but they cost money per token.

Required environment variables:
- `ANTHROPIC_API_KEY` - Claude API (https://console.anthropic.com/)
- `OPENAI_API_KEY` - GPT-4 API (https://platform.openai.com/api-keys)
- `GOOGLE_API_KEY` - Gemini API (https://aistudio.google.com/apikey)
- `XAI_API_KEY` - Grok API (https://console.x.ai/)
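Availability detection for this mode can be as simple as checking the environment. The sketch below is illustrative (the helper and mapping are assumptions, not part of the shipped scripts); it uses the exact variable names listed above:

```python
import os

# Providers mapped to the environment variables listed above.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
    "xai": "XAI_API_KEY",
}

def available_api_providers() -> list[str]:
    """Return the providers whose API key is set (non-empty) in the environment."""
    return [name for name, var in PROVIDER_KEYS.items() if os.environ.get(var)]
```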
## Configuration

### User Model Selection

Users can specify models inline:

```text
model council with claude, gpt-4o, gemini: solve this problem
model council (claude + codex): fix this bug
model council all: use all available models
```
### Default Models

If the user doesn't specify models, use all that are available:
1. Check which CLI tools are installed
2. Check which API keys are set
3. Use what's available
### Config File (Optional)

Users can create `~/.model-council.yaml`:
```yaml
# Preferred models (in order)
models:
  - claude      # Use Claude Code CLI (current session)
  - codex       # Use Codex CLI if installed
  - gemini-cli  # Use Gemini CLI if installed

# Fall back to APIs if CLIs are not available
fallback_to_api: true

# API models to use when falling back
api_models:
  anthropic: claude-sonnet-4-20250514
  openai: gpt-4o
  google: gemini-2.0-flash
  xai: grok-3

# Timeout per model (seconds)
timeout: 120

# Run in parallel or sequentially
parallel: true
```
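A minimal sketch of loading this file over the defaults (assumes PyYAML is installed; the function name and merge logic are illustrative, not the skill's actual loader):

```python
from pathlib import Path

import yaml  # PyYAML

DEFAULTS = {
    "models": [],             # empty means "detect what's available"
    "fallback_to_api": True,
    "api_models": {},
    "timeout": 120,           # seconds per model
    "parallel": True,
}

def load_council_config() -> dict:
    """Merge ~/.model-council.yaml (if present) over the defaults."""
    path = Path.home() / ".model-council.yaml"
    config = dict(DEFAULTS)
    if path.exists():
        config.update(yaml.safe_load(path.read_text()) or {})
    return config
```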
## Workflow

### Step 1: Parse Model Selection

Determine which models to use:
1. Check user's inline specification (e.g., "with claude, gpt-4o")
2. If none specified, check config file
3. If no config, detect available CLIs and APIs
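A sketch of how the inline specification might be parsed (the regex and function are illustrative assumptions, not the skill's actual parser; the "all" keyword would be handled separately):

```python
import re

def parse_inline_models(request: str) -> list[str]:
    """Extract model names from 'model council with a, b: ...' or 'model council (a + b): ...'."""
    match = re.search(r"model council\s+(?:with\s+|\()([\w.,+\s-]+?)[\):]", request, re.I)
    if not match:
        return []  # fall through to the config file, then auto-detection
    return [name.strip() for name in re.split(r"[,+]", match.group(1)) if name.strip()]
```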
### Step 2: Prepare the Prompt

Format the problem for each model with analysis-only instructions:

```text
Analyze the following problem and provide your recommendations.
DO NOT output code changes directly.

Instead, provide:
1. Your analysis of the problem
2. Recommended approach(es)
3. Potential issues or edge cases to consider
4. Trade-offs between different solutions

Problem:
[user's problem here]
```
Key rules:
- Keep the core problem identical across models
- Explicitly request analysis, not implementation
- Include relevant context (code snippets, error messages)
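Following these rules, a prompt builder could look like this sketch (the template mirrors the one above; the function itself is an illustrative assumption):

```python
ANALYSIS_TEMPLATE = """\
Analyze the following problem and provide your recommendations.
DO NOT output code changes directly.

Instead, provide:
1. Your analysis of the problem
2. Recommended approach(es)
3. Potential issues or edge cases to consider
4. Trade-offs between different solutions

Problem:
{problem}
"""

def build_analysis_prompt(problem: str, context: str = "") -> str:
    """Wrap the user's problem (plus optional code snippets or error messages) in the template."""
    body = f"{problem}\n\n{context}".strip()
    return ANALYSIS_TEMPLATE.format(problem=body)
```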
### Step 3: Execute in Parallel

Use the API council script to query multiple models:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/model-council/scripts/api_council.py \
  --prompt "Analyze this problem and recommend solutions (do not implement): [problem]" \
  --models "claude-sonnet,gpt-4o,gemini-flash"
```
Available models:
- `claude-sonnet`, `claude-opus` - Anthropic
- `gpt-4o`, `gpt-4-turbo`, `o1` - OpenAI
- `gemini-flash`, `gemini-pro` - Google
- `grok` - xAI
List all models:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/model-council/scripts/api_council.py --list-models
```
### Step 4: Collect Responses
Gather all model responses with metadata:
- Model name and version
- Response time
- Token usage (if available)
- Full response
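A record for this metadata might look like the following sketch (a hypothetical dataclass; the field names follow the list above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelResponse:
    """One model's answer plus the metadata collected alongside it."""
    model: str                          # model name and version
    response: str                       # full response text
    elapsed_seconds: float              # response time
    tokens_used: Optional[int] = None   # token usage, if the provider reports it
    error: Optional[str] = None         # set if the model failed or timed out
```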
### Step 5: Analyze Consensus
Compare responses looking for:
- Agreement: Do models produce the same answer/approach?
- Unique insights: Does one model catch something others missed?
- Disagreements: Where do models differ and why?
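One simple way to quantify agreement (an assumed heuristic, not one prescribed by the skill) is to count how many models land on the same approach label:

```python
from collections import Counter

def consensus_score(approaches: dict[str, str]) -> tuple[float, str]:
    """Given {model: one-line approach label}, return (agreement ratio, majority label).

    Labels come from Claude Code's own reading of each analysis, e.g.
    {"claude-sonnet": "hash map", "gpt-4o": "hash map", "gemini-flash": "sort first"}.
    """
    counts = Counter(label.strip().lower() for label in approaches.values())
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_count / len(approaches), majority_label
```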
### Step 6: Claude Code Synthesizes and Decides
Claude Code (this session) uses ultrathink to:
1. Evaluate each model's analysis
2. Identify the strongest reasoning and recommendations
3. Note where models agree (high confidence) vs disagree (investigate further)
4. Make the final decision on approach
5. Implement the solution - only Claude Code makes code changes
This is the key difference from just asking one model:
- Multiple perspectives inform the decision
- Claude Code remains the single source of truth for implementation
- No conflicting changes from different models
### Step 7: Deliver Results
Provide:
1. Final synthesized answer (best combined solution)
2. Consensus score (how many models agreed)
3. Individual responses (for transparency)
4. Insights (what each model contributed)
## CLI Detection

To check available CLIs:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/model-council/scripts/detect_clis.py
```
This checks for:
- `claude` - Claude Code CLI
- `codex` - OpenAI Codex CLI
- `gemini` - Gemini CLI
- `aider` - Aider (multi-model)
- `cursor` - Cursor AI (if applicable)
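Detection can be as simple as checking the PATH, as in this sketch (illustrative; not necessarily how detect_clis.py is implemented):

```python
import shutil

CLI_TOOLS = ["claude", "codex", "gemini", "aider", "cursor"]

def detect_clis() -> dict[str, bool]:
    """Report which council-capable CLIs are available on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in CLI_TOOLS}
```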
## Comparison: code-council vs model-council
| Aspect | code-council | model-council |
|---|---|---|
| Models used | Claude only | Multiple (Claude, GPT, Gemini, etc.) |
| Diversity source | Different approaches | Different architectures |
| Cost | Free (uses current session) | Free (CLIs) or paid (APIs) |
| Speed | Fast (single model) | Slower (parallel calls) |
| Best for | Quick iterations | High-stakes decisions |
### When to Use Each
Use code-council when:
- You want fast iterations
- The problem is well-defined
- You trust Claude's reasoning
Use model-council when:
- High-stakes code (production, security)
- You want architectural diversity
- Models might have different knowledge
- You want to verify Claude's answer
## Error Handling

- **CLI not found:** Skip that model, log a warning, and continue with the others.
- **API key missing:** Skip that provider; fall back to its CLI if available.
- **Timeout:** Return partial results and note which models timed out.
- **No models available:** Fail with an error that includes setup instructions.
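A sketch of how these rules might look around a single CLI call (illustrative only; the function and the argument shape are assumptions, not the skill's actual code):

```python
import subprocess
from typing import Optional

def run_cli_model(tool: str, prompt: str, timeout: int = 120) -> Optional[str]:
    """Run one CLI on the prompt; return None (skip) on the failure modes above."""
    try:
        proc = subprocess.run(
            [tool, prompt],  # argument shape is hypothetical; real flags vary per CLI
            capture_output=True, text=True, timeout=timeout, check=True,
        )
        return proc.stdout
    except FileNotFoundError:
        return None  # CLI not found: skip this model and continue with others
    except subprocess.TimeoutExpired:
        return None  # timeout: note this model in the partial results
    except subprocess.CalledProcessError:
        return None  # non-zero exit: treat as a failed model, keep going
```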
## Example Output

```markdown
## Model Council Analysis Results

### Consensus: HIGH (3/3 models agree on approach)

### Summary of Recommendations:
All models identify an off-by-one error and recommend fixing the loop termination condition.

Key considerations raised:
- Handle null/empty input (Claude, GPT-4o)
- Use forEach to avoid manual index errors (GPT-4o)
- Add input validation (all models)

### Individual Analyses:

#### Claude Sonnet (API)
Analysis: The bug is caused by an off-by-one error in the loop boundary.
Recommendation: Change `i <= len` to `i < len`
Edge cases noted: Empty array, single element
Confidence: High

#### GPT-4o (API)
Analysis: The loop iterates one element past the array bounds.
Recommendation: Fix the loop condition and add a bounds check
Additional insight: Could also use forEach to avoid index errors
Confidence: High

#### Gemini Flash (API)
Analysis: Array index out of bounds on the final iteration.
Recommendation: Adjust the loop termination condition
Reference: Similar to common off-by-one patterns
Confidence: High

### Claude Code Decision:
Based on the consensus, implementing the fix with:
- Loop condition change (`i < len`)
- Added null check for robustness
- Unit tests for edge cases

[Claude Code now implements the solution]
```
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.