grahama1970 / lean4-prove
# Install this skill:
npx skills add grahama1970/agent-skills --skill "lean4-prove"

Install specific skill from multi-skill repository

# Description

> Retrieval-augmented Lean4 proof generation.

# SKILL.md


---
name: lean4-prove
description: >
  Retrieval-augmented Lean4 proof generation. Queries 94k+ exemplars from DeepSeek-Prover V1+V2,
  uses hybrid search (BM25 + semantic + graph), generates via Claude, compiles in Docker, retries on failure.
allowed-tools: Bash, Read, Docker
triggers:
  - prove this
  - lean4 proof
  - generate proof
  - verify lean4
  - lean4-prove
  - formalize this requirement
metadata:
  short-description: Retrieval-augmented Lean4 proof generation
---


lean4-prove

Retrieval-augmented Lean4 proof generation for engineering requirements. Uses 94,000+ proven theorems from DeepSeek-Prover V1+V2 to guide proof synthesis via hybrid search (BM25 + semantic + graph traversal).

Architecture

Requirement + Tactics + Persona
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 1. RECALL similar proofs  β”‚  ← Hybrid search on ArangoDB
β”‚    from 94k+ exemplars    β”‚     (BM25 + semantic + graph)
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 2. BUILD support pack     β”‚
β”‚    - Validated imports    β”‚
β”‚    - Tactic patterns      β”‚
β”‚    - Similar proofs       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 3. GENERATE N candidates  β”‚  ← Claude with exemplar context
β”‚    constrained by corpus  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
        β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ 4. COMPILE each in        β”‚  ← lean_runner Docker
β”‚    lean_runner container  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
        β”‚
   β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
   β”‚         β”‚
 Success   Failure
   β”‚         β”‚
   β–Ό         β–Ό
 Return   Retry with error feedback
          (up to max_retries)
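The generate/compile/retry portion of the pipeline (steps 3–5 above) can be sketched as follows. `generate_candidates` and `compile_in_docker` are hypothetical stand-ins for the skill's internals, not its actual function names; the result shape mirrors the JSON output documented below.

```python
# Sketch of the generate -> compile -> retry loop.
# generate_candidates() and compile_in_docker() are hypothetical stand-ins
# for the skill's Claude and lean_runner calls.

def prove(requirement, generate_candidates, compile_in_docker,
          n_candidates=3, max_retries=3):
    errors = []
    candidates = generate_candidates(requirement, n=n_candidates)
    for i, code in enumerate(candidates):
        for attempt in range(1, max_retries + 1):
            ok, err = compile_in_docker(code)
            if ok:
                return {"success": True, "code": code,
                        "attempts": attempt, "candidate": i, "errors": None}
            errors.append(f"Candidate {i} attempt {attempt}: {err}")
            # Regenerate this candidate with the compiler error as feedback.
            code = generate_candidates(requirement, n=1, error=err)[0]
    return {"success": False, "code": None,
            "attempts": len(errors), "errors": errors}
```

With the defaults (3 candidates, 3 retries each), a total failure reports 9 attempts, matching the failure output shown below.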

Why Retrieval-Augmented?

  1. Determinism - Exact provenance: "used these 3 proofs as templates"
  2. Version alignment - Exemplars use imports that actually work
  3. Fewer hallucinations - Constrained to lemmas that exist
  4. Tactic idioms - Transfers working simp sets and proof patterns

Usage

# Basic proof
./run.sh --requirement "Prove n + 0 = n"

# With tactics preference
./run.sh -r "Prove commutativity of addition" -t "simp,ring,omega"

# With persona context
./run.sh -r "Prove message integrity" -p "cryptographer"

# Via stdin (JSON)
echo '{"requirement": "Prove n + 0 = n", "tactics": ["rfl"]}' | ./run.sh

# Custom settings
./run.sh -r "Prove theorem" --candidates 5 --retries 5 --model claude-sonnet-4-20250514

Output

{
  "success": true,
  "code": "import Mathlib\n\ntheorem list_append_length (xs ys : List Ξ±) :\n    (xs ++ ys).length = xs.length + ys.length := by\n  induction xs with\n  | nil => simp\n  | cons x xs ih => simp [List.cons_append, ih]",
  "attempts": 1,
  "candidate": 0,
  "errors": null,
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "aesop", "norm_num", "intro"],
    "imports_count": 3
  }
}

On failure:

{
  "success": false,
  "code": null,
  "attempts": 9,
  "errors": [
    "Candidate 0 attempt 1: unknown identifier 'natAdd'",
    "Candidate 1 attempt 1: type mismatch..."
  ],
  "retrieval": {
    "retrieved": 5,
    "tactics_added": ["simp", "exact"],
    "imports_count": 3
  }
}

Parameters

| Parameter | Default | Description |
|---|---|---|
| `--requirement`, `-r` | (required) | Theorem to prove |
| `--tactics`, `-t` | none | Comma-separated preferred tactics |
| `--persona`, `-p` | none | Persona context for generation |
| `--candidates`, `-n` | 3 | Parallel proof candidates |
| `--retries` | 3 | Max retries per candidate |
| `--model` | claude-sonnet-4-20250514 | Claude model |
| `--container` | lean_runner | Docker container name |
| `--timeout` | 120 | Compilation timeout (seconds) |

Environment Variables

# Proof generation
LEAN4_CONTAINER=lean_runner      # Docker container
LEAN4_TIMEOUT=120                # Compile timeout
LEAN4_MAX_RETRIES=3              # Retries per candidate
LEAN4_CANDIDATES=3               # Parallel candidates
LEAN4_PROVE_MODEL=opus           # Claude model (opus recommended for proofs)

# Retrieval (requires ArangoDB with ingested dataset)
LEAN4_RETRIEVAL=1                # Enable/disable retrieval (default: 1)
LEAN4_RETRIEVAL_K=5              # Number of exemplars to retrieve
ARANGO_URL=http://127.0.0.1:8529 # ArangoDB connection
ARANGO_DB=memory                 # Database name (same as memory skill)

Dataset Setup

The skill uses 94,000+ theorems from DeepSeek-Prover V1+V2 for retrieval:
- V1: 27,503 theorems (status: "proven")
- V2: 66,708 theorems (status: "ok" for 11,689 proven + others)

One-time ingest:

# Ingest full dataset (~5 min)
./ingest.sh

# Or limit for testing
./ingest.sh --limit 1000

This populates the lean_theorems collection in ArangoDB with:
- formal_statement - The theorem statement
- formal_proof - Working proof code
- header - Validated imports (Mathlib, Aesop, etc.)
- tactics - Extracted tactic names
- source - "deepseek-prover-v1" or "deepseek-prover-v2"
- status - "proven" (V1) or "ok"/"failed"/etc. (V2)
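As a sketch of how the keyword (BM25) leg of retrieval could query this collection: the exact AQL the skill runs is not shown in this document, and the ArangoSearch view name `lean_theorems_view` is an assumption, but the field and collection names match the ingested schema above.

```python
# Sketch of a BM25 keyword lookup against the lean_theorems collection.
# The ArangoSearch view name "lean_theorems_view" is an assumption; field
# names (formal_statement, formal_proof, header, tactics, status) match
# the ingested schema described above.

def build_theorem_query(keywords, k=5):
    """Return an AQL string plus bind variables for a BM25 search."""
    aql = """
    FOR doc IN lean_theorems_view
      SEARCH ANALYZER(doc.formal_statement IN TOKENS(@q, "text_en"), "text_en")
      FILTER doc.status IN ["proven", "ok"]
      SORT BM25(doc) DESC
      LIMIT @k
      RETURN {
        statement: doc.formal_statement,
        proof: doc.formal_proof,
        header: doc.header,
        tactics: doc.tactics
      }
    """
    return aql, {"q": " ".join(keywords), "k": k}
```

The pair would then be executed with python-arango via `db.aql.execute(aql, bind_vars=binds)`.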

Authentication

Uses Claude Code CLI (claude -p) in headless non-interactive mode.

The CLI is called with:
- -p flag for print/headless mode
- --output-format text for plain text output
- --max-turns 1 for single-turn operation

Environment variables CLAUDE_CODE and CLAUDECODE are cleared to avoid recursion detection when called from within Claude Code.

No separate API key required - authentication is handled via your Claude subscription.
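The headless invocation described above can be sketched as follows. The `-p`, `--output-format text`, and `--max-turns 1` flags and the cleared environment variables come from this document; the `--model` flag and prompt passing are assumptions about how the skill wires them together.

```python
import os
import subprocess

def build_claude_cmd(prompt, model="claude-sonnet-4-20250514"):
    """Command and environment for a headless Claude Code CLI call."""
    env = dict(os.environ)
    # Clear recursion-detection variables when running inside Claude Code.
    env.pop("CLAUDE_CODE", None)
    env.pop("CLAUDECODE", None)
    cmd = ["claude", "-p", "--output-format", "text",
           "--max-turns", "1", "--model", model, prompt]
    return cmd, env

def run_claude_headless(prompt, **kwargs):
    cmd, env = build_claude_cmd(prompt, **kwargs)
    return subprocess.run(cmd, env=env, capture_output=True,
                          text=True, timeout=120).stdout
```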

Requirements

  1. Docker with lean_runner container running (Lean4 + Mathlib installed)
  2. Claude Code CLI (claude) in PATH with valid authentication
  3. ArangoDB running locally (for retrieval - optional but recommended)
  4. Dataset ingested via ./ingest.sh (one-time setup)

Tactics

Common Lean4/Mathlib tactics to suggest:

| Tactic | Use for |
|---|---|
| `rfl` | Reflexivity proofs |
| `simp` | Simplification |
| `ring` | Ring arithmetic |
| `omega` | Linear arithmetic |
| `decide` | Decidable propositions |
| `exact` | Exact term construction |
| `apply` | Apply lemmas |
| `induction` | Inductive proofs |
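Small illustrative uses of these tactics (written for this document, not drawn from the skill's corpus):

```lean
import Mathlib

example : 2 + 2 = 4 := by rfl                       -- rfl: reflexivity
example (n : Nat) : n + 0 = n := by simp            -- simp: simplification
example (a b : Int) :
    (a + b) ^ 2 = a ^ 2 + 2 * a * b + b ^ 2 := by ring  -- ring arithmetic
example (n : Nat) : n < n + 1 := by omega           -- omega: linear arithmetic
example : 7 < 10 := by decide                       -- decide: decidable props
example (n : Nat) : n ≤ n := by exact Nat.le_refl n -- exact: term construction
example (n : Nat) : 0 + n = n := by                 -- induction: inductive proofs
  induction n with
  | zero => rfl
  | succ k ih => rw [Nat.add_succ, ih]
```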

Examples

Engineering: List operations

./run.sh -r "Prove length(xs ++ ys) = length(xs) + length(ys)" -t "simp,induction"

Engineering: State machine property

./run.sh -r "When mux_enable is false, output shall equal default_value" \
  -p "embedded systems engineer" -t "simp,cases,decide"

Engineering: Protocol correctness

./run.sh -r "Prove message append preserves checksum: checksum(msg ++ data) = update(checksum(msg), data)" \
  -t "simp,induction,ring"

Cryptography

./run.sh -r "Prove that XOR is self-inverse: a βŠ• a = 0" -p "cryptographer" -t "simp,decide"

Complex theorem

./run.sh -r "Prove the sum of first n natural numbers equals n*(n+1)/2" \
  -t "induction,simp,ring" \
  --candidates 5 \
  --retries 5

Difference from lean4-verify

| Skill | Purpose |
|---|---|
| lean4-verify | Compile-only. Takes Lean4 code, returns pass/fail |
| lean4-prove | Full pipeline. Takes a requirement, generates + compiles + retries |

Use lean4-verify when you already have Lean4 code to check.
Use lean4-prove when you need to generate the proof from a requirement.

Advanced: Memory Integration

The skill integrates with the memory project for hybrid retrieval (BM25 + semantic + graph traversal):

# One-time setup: embed theorems and create edges
ARANGO_PASS=yourpass ARANGO_DB=memory python integrate_memory.py

This creates:
- Embeddings: 39k+ theorem embeddings in lesson_embeddings for semantic search
- Tactic edges: 10k+ edges between theorems sharing primary tactics
- Similarity edges: Edges between semantically similar theorems (cosine > 0.7)

This enables:
- Semantic search: Find theorems by meaning, not just keywords
- Multi-hop traversal: "What proofs use similar tactics?"
- Impact analysis: "If I change lemma X, what breaks?"
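A multi-hop traversal over those edges could be built like this. The edge collection name `theorem_edges` is an assumption (the source mentions tactic and similarity edges without naming the collections), as is the bind-variable layout.

```python
# Sketch of a multi-hop traversal over tactic/similarity edges.
# The edge collection name "theorem_edges" is an assumption; the source
# describes the edges but does not name the collection.

def build_related_query(theorem_id, depth=2):
    """AQL to find theorems linked by shared tactics or similarity."""
    aql = """
    FOR v, e, p IN 1..@depth ANY @start theorem_edges
      RETURN DISTINCT {
        statement: v.formal_statement,
        tactics: v.tactics
      }
    """
    return aql, {"start": theorem_id, "depth": depth}
```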

Lemma Dependency Graph (Planned)

Extract lemma dependencies during compilation to build an impact graph:

Query: "What's affected if mux_default changes?"

Answer:
  mux_default
    └── system_safe_state (NEEDS RE-VERIFY)
          └── requirement_42 (NEEDS RE-VERIFY)

Engineering Use Cases

  1. Ambiguity detection: Formalization fails β†’ requirement is vague
  2. Completeness check: "Are all state transitions covered?"
  3. Regression impact: "Change to X affects theorems Y, Z"
  4. Cross-reference: "Which requirements share this lemma?"

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.