Install this specific skill from the multi-skill repository:
npx skills add ahmedibrahim085/Claude-Multi-Agent-Research-System-Skill --skill "semantic-search"
# Description
Semantic search for finding code by meaning using natural language queries. Orchestrates semantic-search-reader (search/find-similar/list-projects) and semantic-search-indexer (index/reindex/status) agents. Use for understanding unfamiliar codebases, finding similar implementations, or locating functionality by description rather than exact keywords. (project)
# SKILL.md
name: semantic-search
description: Semantic search for finding code by meaning using natural language queries. Orchestrates semantic-search-reader (search/find-similar/list-projects) and semantic-search-indexer (index/reindex/status) agents. Use for understanding unfamiliar codebases, finding similar implementations, or locating functionality by description rather than exact keywords. (project)
allowed-tools: Bash, Read, Glob, Grep
Semantic Search Skill
Orchestrator for Semantic Code Intelligence via Agent Delegation
This skill orchestrates two specialized agents for semantic search operations. It provides bash scripts that import Python modules from the claude-context-local library (NOT an MCP server - no server process runs, just Python imports via PYTHONPATH). Unlike traditional text-based search (Grep) or pattern matching (Glob), semantic search understands the meaning of content, finding functionally similar text even when using different wording, variable names, or patterns.
The skill uses the library's venv Python interpreter to import merkle, chunking, and embedding modules, enabling semantic search, indexing, and similarity finding across any text content (code, docs, markdown, configs).
Orchestration Instructions
When this skill is active, you MUST spawn the appropriate agent via the Task tool.
This skill uses a 2-agent architecture for token optimization:
- semantic-search-reader: Handles READ operations (search, find-similar, list-projects)
- semantic-search-indexer: Handles WRITE operations (index, incremental-reindex, status)
Decision Logic: Which Agent to Spawn?
| User Request Contains | Operation Type | Agent to Spawn |
|---|---|---|
| "find X", "search for Y", "where is Z" | search | semantic-search-reader |
| "find similar to...", "similar chunks" | find-similar | semantic-search-reader |
| "what projects", "list indexed", "show projects" | list-projects | semantic-search-reader |
| "index this", "create index", "full reindex" | index | semantic-search-indexer |
| "incremental reindex", "auto reindex", "update index" | incremental-reindex | semantic-search-indexer |
| "check index", "index status", "is it indexed" | status | semantic-search-indexer |
Agent Spawn Examples
Example 1: Search Operation (semantic-search-reader)
Task(
subagent_type="semantic-search-reader",
description="Search project semantically",
prompt="""You are the semantic-search-reader agent.
Operation: search
Query: "user authentication logic"
K: 10
Project: /path/to/project
Execute the search operation using scripts/search and return interpreted results with explanations."""
)
Example 2: Index Operation (semantic-search-indexer)
Task(
subagent_type="semantic-search-indexer",
description="Index project for semantic search",
prompt="""You are the semantic-search-indexer agent.
Operation: index
Directory: /path/to/project
Full: true
Execute the indexing operation using scripts/incremental-reindex and return interpreted results with statistics."""
)
Example 3: Incremental Reindex Operation (semantic-search-indexer)
Task(
subagent_type="semantic-search-indexer",
description="Incremental reindex with change detection",
prompt="""You are the semantic-search-indexer agent.
Operation: incremental-reindex
Directory: /path/to/project
Max Age: 360 # minutes (6 hours)
Execute smart auto-reindexing using scripts/incremental-reindex.
This will detect changed files using Merkle tree, then auto-fallback to full reindex.
Return statistics showing total files indexed and total chunks."""
)
Example 4: Find Similar (semantic-search-reader)
Task(
subagent_type="semantic-search-reader",
description="Find similar content chunks",
prompt="""You are the semantic-search-reader agent.
Operation: find-similar
Chunk ID: "src/auth.py:45-67:function:authenticate"
K: 5
Project: /path/to/project
Execute the find-similar operation using scripts/find-similar and return interpreted results."""
)
Example 5: Status Check (semantic-search-indexer)
Task(
subagent_type="semantic-search-indexer",
description="Check semantic index status",
prompt="""You are the semantic-search-indexer agent.
Operation: status
Project: /path/to/project
Execute the status operation using scripts/status and return interpreted results with statistics."""
)
Important Notes
- NEVER run bash scripts directly - always spawn the appropriate agent
- Agents handle error interpretation - they convert JSON errors to natural language
- Token optimization: Agent execution happens in separate context (saves YOUR tokens)
- Wait for agent completion - agents return summarized results, not raw JSON
When to Use This Skill
✅ Use Semantic Search When:
1. Exploring Unfamiliar Projects
- "How does this codebase handle user authentication?"
- "Where is database connection pooling implemented?"
- "Show me error handling patterns in this project"
- "Find documentation about the architecture"
2. Finding Functionality Without Keywords
- Looking for implementations but don't know the exact function names
- Need to find code that "does X" without knowing how it's named
- Searching across multiple languages/frameworks with different conventions
3. Discovering Similar Code
- "Find code similar to this payment processing logic"
- "Are there other implementations of rate limiting?"
- "What other modules use this pattern?"
4. Cross-Reference Discovery
- Finding all authentication methods in a polyglot codebase
- Locating retry logic across different services
- Identifying validation patterns in various modules
5. Searching Documentation & Configuration
- "Find documentation explaining the deployment process"
- "Locate configuration examples for database connections"
- "Search for troubleshooting guides or setup instructions"
- "Find ADRs (Architecture Decision Records) about API design"
- "Locate markdown files about testing strategies"
6. Cross-Format Content Discovery
- "Find all references to environment variables (across code, docs, configs)"
- "Search for rate limiting mentions in any format"
- "Locate authentication documentation and implementation together"
- "Find deployment guides and deployment scripts"
❌ Do NOT Use Semantic Search When:
Use Grep instead for:
- Exact string matching: "import React"
- Known variable/function names: "getUserById"
- Regex patterns: "function.*export"
- File content search with known keywords
Use Glob instead for:
- Finding files by name pattern: "**/*.test.js"
- Locating configuration files: "**/config.yml"
- File system navigation: "src/components/**/*.tsx"
Use Read instead for:
- Reading specific known files
- Examining file contents after Grep/Glob narrowed results
- Sequential file analysis
Prerequisites
Required: Python Library Dependency
IMPORTANT: This is NOT an MCP server - it's a Python library dependency. No server process runs. Our scripts import Python modules via PYTHONPATH.
This skill requires the claude-context-local Python library for semantic indexing:
# Clone Python library to standard location (5 minutes)
git clone https://github.com/FarhanAliRaza/claude-context-local.git ~/.local/share/claude-context-local
# Set up Python virtual environment and install dependencies
cd ~/.local/share/claude-context-local
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
What this installs:
- Merkle tree change detection (80KB)
- Multi-language code chunking (192KB) - supports 15+ languages
- Embedding generation (76KB) - wraps sentence-transformers
- Dependencies: faiss-cpu, sentence-transformers, tree-sitter
Installation location:
- macOS/Linux: ~/.local/share/claude-context-local
- Windows: %LOCALAPPDATA%\claude-context-local
License: claude-context-local is GPL-3.0. We import via PYTHONPATH (dynamic linking), which preserves our Apache 2.0 license. See docs/architecture/MCP-DEPENDENCY-STRATEGY.md for details.
Index Creation
This skill provides an index script that creates and updates the semantic content index. The index is stored in ~/.claude_code_search/projects/{project_name}_{hash}/ and contains:
- code.index - FAISS vector index
- metadata.db - SQLite database with chunk metadata
- chunk_ids.pkl - Chunk ID mappings
- stats.json - Index statistics
You can verify an index exists using the status script or the list-projects script.
Auto-Reindex System
Automatic Index Management (Updated v3.0.x - First-Prompt Architecture)
The semantic-search skill now automatically maintains index freshness via the First-Prompt hook, eliminating the need for manual reindexing after code changes. The reindex runs in the background after your first prompt, allowing instant session startup.
How It Works
Background Trigger Logic: The first user prompt after session start spawns a detached background process that checks for changes and updates the index:
| Trigger | Index State | Action | Duration |
|---|---|---|---|
| First prompt | Never indexed | Full index (background) | 3-10 min |
| First prompt | Indexed before | Smart reindex (background) | 3-10 min (Merkle check: 3.5s) |
| Post-write hook | File modified | Incremental update (synchronous) | ~2.7 sec |
| Session start | Any | State initialization only | <100ms (no reindex) |
Key Benefits:
- ✅ Instant Session Start: Session starts in ~0.5s (no blocking on reindex)
- ✅ Background Processing: Full reindex completes in 3-10 minutes while you work
- ✅ Automatic: No manual reindexing required after code changes
- ✅ Smart: Uses a Merkle tree to detect when files changed (3.5s check)
- ✅ Non-blocking: Hook exits in <100ms; the background process continues independently
- ✅ Simple: IndexFlatIP full reindex - proven, reliable (same as MCP)
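The detached-spawn step behind these benefits can be sketched in Python. This is an illustrative sketch, not the hook's actual entry point; `spawn_detached` is an assumed name:

```python
import subprocess

def spawn_detached(cmd):
    """Spawn a background process that outlives a short-lived hook.

    start_new_session=True detaches the child from the hook's session,
    so the hook can exit in <100ms while indexing continues.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,  # background work stays silent
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # detach from the hook's process group
    )
    return proc.pid                 # a PID like this is what the lock file records
```

For example, a hook might call `spawn_detached(["scripts/incremental-reindex", "/path/to/project"])` and exit immediately.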
6-Hour Cooldown Protection
Prevents expensive full reindex spam during rapid restarts:
Problem: User workflow pattern:
10:00 - First startup → Full index (3 min)
10:05 - Close IDE, fix typo
10:07 - Reopen IDE → Would do full index again (waste 3 min)
10:10 - Close IDE, test change
10:12 - Reopen IDE → Would do full index again (waste 3 min)
Solution: Cooldown logic:
10:00 - First startup → Full index (~3 min)
10:05 - Close IDE, fix typo
10:07 - Reopen IDE → Smart reindex (fast, cooldown active)
10:10 - Close IDE, test change
10:12 - Reopen IDE → Smart reindex (fast, cooldown active)
11:05 - Restart after major refactor → Index exists, incremental anyway
Result: Saves 6 minutes in this example scenario.
Note: The cooldown only affects whether a full index is CHOSEN; it cannot prevent a full index when the Merkle snapshot is missing (the snapshot is stored at ~/.claude_code_search/projects/{project}_{hash}/index/merkle_snapshot.json). If the entire index directory is deleted, the Merkle snapshot goes with it, and the incremental-reindex script falls back to a full reindex regardless of the cooldown.
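A minimal sketch of the decision made on session start. The function name and exact rules are illustrative; the real script's internals may differ:

```python
def reindex_action(index_exists, merkle_exists, age_minutes, max_age=360):
    """Pick a reindex action: 'full', 'smart' (Merkle check), or 'skip'."""
    if not index_exists or not merkle_exists:
        return "full"   # nothing to diff against: full index regardless of cooldown
    if age_minutes <= max_age:
        return "skip"   # index fresh enough (default threshold: 360 min / 6 hours)
    return "smart"      # stale: Merkle diff (~3.5s), full rebuild only if files changed
```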
Concurrent Execution Protection
PID-Based Lock Files: Prevents duplicate indexing when multiple Claude Code windows opened simultaneously:
- Lock file: ~/.claude_code_search/projects/{project}_{hash}/indexing.lock
- Contains: Process ID (PID) of the running index operation
- Validation: Checks if process still alive before spawning new one
- Stale lock cleanup: Automatically removes locks from dead processes
- Graceful handling: Shows message if indexing already in progress
Behavior:
Window 1: Opens → Spawns background index → Creates lock
Window 2: Opens → Checks lock → PID alive → Skips, shows "already in progress"
Window 1: Index completes → Removes lock
Window 3: Opens → No lock → Proceeds normally
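The PID-lock behavior above can be sketched as follows; this is an illustrative sketch of the described rules, not the script's actual code:

```python
import os

def try_acquire_lock(lock_path):
    """Acquire the per-project indexing lock, cleaning up stale locks."""
    if os.path.exists(lock_path):
        try:
            with open(lock_path) as f:
                pid = int(f.read().strip())
        except ValueError:
            pid = None                   # corrupt lock file: treat as stale
        if pid is not None:
            try:
                os.kill(pid, 0)          # signal 0: existence check, kills nothing
                return False             # holder alive: indexing already in progress
            except ProcessLookupError:
                pass                     # holder dead: fall through and clean up
        os.remove(lock_path)             # stale lock cleanup
    # O_CREAT | O_EXCL makes creation atomic if two windows race here
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.write(fd, str(os.getpid()).encode())
    os.close(fd)
    return True
```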
State File Management
Prerequisites State: logs/state/semantic-search-prerequisites.json
- Purpose: Controls conditional enforcement in user-prompt-submit hook
- Updated by: scripts/check-prerequisites
- Read by: First-prompt hook (fast check, <5ms)
- Content:
{
"SEMANTIC_SEARCH_SKILL_PREREQUISITES_READY": true,
"last_checked": "2025-12-03T12:00:00Z",
"last_check_details": {
"total_checks": 23,
"passed": 23,
"failed": 0,
"warnings": 0
}
}
Index State: ~/.claude_code_search/projects/{project}_{hash}/index_state.json
- Purpose: Tracks indexing timestamps and Merkle tree state
- Updated by: scripts/incremental_reindex.py (after any reindex operation)
- Read by: Background reindex process (determine if reindex needed)
- Content:
{
"last_full_index": "2025-12-03T10:00:00Z",
"last_incremental_index": "2025-12-03T10:15:00Z",
"project_path": "/Users/.../project"
}
Indexing Lock: ~/.claude_code_search/projects/{project}_{hash}/indexing.lock
- Purpose: Prevent concurrent indexing operations
- Contains: PID of running process
- Lifecycle: Created on spawn, updated by script with its PID, removed on completion
- Validation: Checks process alive via os.kill(pid, 0) (doesn't actually kill)
Conditional Enforcement
Prerequisites-Based: The user-prompt-submit hook checks prerequisites before enforcing semantic-search skill:
- If prerequisites TRUE: Enforcement active, semantic-search skill suggested/required
- If prerequisites FALSE: Enforcement skipped, Claude uses Grep/Glob naturally (graceful degradation)
- Default behavior: TRUE if state file missing (backward compatible, lazy initialization works)
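The graceful-degradation rules above amount to a small check like this. The path and key come from this document; the helper name is illustrative:

```python
import json
import os

def prerequisites_ready(state_path):
    """Read the prerequisites state file, defaulting to ready when absent."""
    if not os.path.exists(state_path):
        return True  # missing file: backward compatible, lazy initialization works
    try:
        with open(state_path) as f:
            state = json.load(f)
    except (OSError, json.JSONDecodeError):
        return True  # unreadable state: degrade gracefully rather than block
    return bool(state.get("SEMANTIC_SEARCH_SKILL_PREREQUISITES_READY", True))
```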
Why This Matters:
- First-time users: Can work immediately with Grep/Glob while setup completes
- Missing model: Graceful fallback, no errors
- Network issues: System remains functional
Manual Control
You can still manually trigger indexing operations:
# Force full reindex (ignores cooldown, always does full)
scripts/incremental-reindex /path/to/project --full
# Smart incremental (respects age threshold, default 360min / 6 hours)
scripts/incremental-reindex /path/to/project
# Custom age threshold (reindex if >30min old)
scripts/incremental-reindex /path/to/project --max-age 30
# Check if reindex needed without executing
scripts/incremental-reindex /path/to/project --check-only
Performance Characteristics
Session Start: ~0.5 seconds (no reindex blocking)
- Setup & initialization: <400ms
- Session logging: <50ms
- State initialization: <50ms
First-Prompt Hook Overhead: <100ms
- Session state check: <10ms (single file read)
- Background spawn: <50ms (Popen, detached, non-blocking)
- State update: <10ms (mark as shown)
- User message: <10ms (stdout print)
Background Reindex Process (runs independently, optimized with cache):
- Merkle tree change detection: 3.5 seconds
- Full reindex (with cache, 51 files): 13.67 seconds (first run)
- Incremental reindex (1 file edit): 4.33 seconds (3.2x faster!)
- Lock acquisition: <10ms (atomic file create)
- Lock release: <1ms
Incremental Cache Performance (v3.0):
- Cache hit rate: 98% (50/51 files on measured project)
- Embedding saved: 9.46s (from caching 50 files)
- Model reload avoided: ~0.8s (class-level model caching)
- Rebuild from cache: ~5-6s (clears bloat, no re-embedding)
- Overall speedup: 3.2x (13.67s → 4.33s for 1 file edit)
Post-Write Hook (synchronous, incremental cache enabled):
- Kill-and-restart lock: <50ms
- Incremental reindex: ~4-5 seconds (with cache benefits)
- User sees: "✅ Semantic search index updated"
Troubleshooting
Auto-reindex not triggering?
- Check prerequisites: scripts/check-prerequisites
- Verify state file: cat logs/state/semantic-search-prerequisites.json
- Set prerequisites manually: scripts/set-prerequisites-ready
Index not updating after changes?
- Check last index time: scripts/status --project /path/to/project
- Trigger manual reindex: scripts/incremental-reindex /path/to/project
- Force full reindex: scripts/incremental-reindex /path/to/project --full
Concurrent indexing message?
- Another window already indexing (wait for completion)
- Stale lock from crashed process (will auto-cleanup on next attempt)
- Check lock file: cat ~/.claude_code_search/projects/{project}_{hash}/indexing.lock
Incremental Cache System (v3.0)
Embedding Cache with Lazy Deletion - Optimizes reindexing by caching embeddings and avoiding expensive re-computation.
How It Works
The incremental cache system stores computed embeddings on disk and reuses them across reindex operations:
Cache Structure:
~/.claude_code_search/projects/{project}_{hash}/index/
βββ code.index # FAISS vector index (IndexFlatIP)
βββ metadata.db # SQLite database with chunk metadata
βββ embeddings.pkl # Embedding cache (NEW - Phase 2)
βββ merkle_snapshot.json # Merkle DAG for change detection
βββ stats.json # Index statistics
Lazy Deletion Strategy:
- When files are modified, chunks are deleted from metadata + cache
- Vectors remain in FAISS index (creates "bloat")
- When bloat exceeds the threshold → auto-rebuild from cache
- Rebuild is fast (~5-6s) because embeddings are cached
Performance Gains
Before Incremental Cache (Phase 1):
Full reindex (50 files): 246s
After 1 file edit: 246s (full reindex)
After 10 file edits: 246s (full reindex)
After Incremental Cache + Model Caching (Phase 2 + Phase 3):
Full reindex (51 files): 13.67s (with model loading)
Incremental (1 file edit): 4.33s (3.2x faster!)
Rebuild from cache: ~5-6s (no re-embedding)
Key Improvements:
- ✅ 3.2x speedup on single file edits (13.67s → 4.33s)
- ✅ 98% cache hit rate (50/51 files cached)
- ✅ 9.34s saved from avoided embeddings + model reload
- ✅ Automatic bloat management via rebuild triggers
Bloat Tracking & Auto-Rebuild
Bloat Calculation:
Bloat % = (Stale Vectors / Active Chunks) × 100
Example:
- Active chunks: 250
- Stale vectors: 50 (from lazy deletions)
- Bloat: 50/250 = 20%
Auto-Rebuild Triggers (Test-Driven Calibration):
Rebuild if EITHER:
1. Bloat β₯ 30% (fallback threshold - critical quality level)
OR
2. Bloat β₯ 20% AND stale_count β₯ 400 (primary threshold - efficiency trigger)
Threshold Rationale (Evidence-Based from Test Validation):
- Small projects (20% + <400 stale): No rebuild (avoids overhead)
- Medium projects (20-30% + 400+ stale): Rebuild triggered (efficiency)
- Critical bloat (30%+ any count): Always rebuild (quality threshold)
- Quality: Ensures search accuracy doesn't degrade over time
Note: Thresholds derived from test requirements, not intuition (see docs/phase-3-honest-review.md)
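The two triggers above combine into a single decision function; a minimal sketch, with the function name assumed:

```python
def should_rebuild(active_chunks, stale_vectors):
    """Apply the calibrated auto-rebuild triggers described above."""
    if active_chunks == 0:
        return stale_vectors > 0                     # index is pure bloat
    bloat_pct = stale_vectors / active_chunks * 100
    if bloat_pct >= 30:                              # fallback: critical quality level
        return True
    return bloat_pct >= 20 and stale_vectors >= 400  # primary: efficiency trigger
```

For the worked example above (250 active chunks, 50 stale vectors = 20% bloat), no rebuild fires because the stale count is under 400.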
Model Caching Optimization (Phase 3)
Problem: Model reload overhead (~0.8s per reindex) prevented the expected speedup even though the embedding cache was working.
Solution: Class-level embedder caching
# First indexer instance loads model
indexer1 = FixedIncrementalIndexer(project_path) # Loads model (~0.8s)
# Subsequent instances reuse cached model
indexer2 = FixedIncrementalIndexer(project_path) # Reuses model (~0.001s)
Impact:
- Eliminates model reload on every reindex
- Saves ~0.8s per operation
- Enables 3.2x speedup achievement
Memory Management:
# Optional: Cleanup cached model to free memory
FixedIncrementalIndexer.cleanup_shared_embedder()
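The pattern behind this is a class-level attribute shared across instances. The sketch below uses an illustrative class name and a placeholder object in place of the real sentence-transformers model load:

```python
class CachedModelIndexer:
    """Class-level model cache pattern (illustrative; the real class is
    FixedIncrementalIndexer in scripts/incremental_reindex.py)."""

    _shared_embedder = None  # shared by all instances in this process

    def __init__(self, project_path):
        self.project_path = project_path
        if CachedModelIndexer._shared_embedder is None:
            # First instance pays the ~0.8s model load; a placeholder
            # object stands in for the real embedding model here.
            CachedModelIndexer._shared_embedder = object()
        self.embedder = CachedModelIndexer._shared_embedder  # reuse: ~0.001s

    @classmethod
    def cleanup_shared_embedder(cls):
        cls._shared_embedder = None  # drop the cached model to free memory
```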
Cache Benefits by Project Size
Effectiveness varies with project scale:
| Project Size | Files | Cache Hit Rate | Expected Speedup | Recommendation |
|---|---|---|---|---|
| Tiny | <20 | Low (~50%) | 1.5-2x | Cache helps, but modest |
| Small | 20-50 | Good (~80%) | 2-3x | ✅ Cache recommended |
| Medium | 50-200 | High (~90%) | 3-5x | ✅ Strong cache benefits |
| Large | 200+ | Very High (~95%) | 5-10x+ | ✅ Maximum cache benefits |
Measured on 51-file project: 3.2x speedup, 98% cache hit rate
Cache Operations
View Cache Statistics:
# Check cache effectiveness
scripts/status --project /path/to/project
# Output includes: cached_chunks, cache_hit_rate, bloat_percentage
Force Rebuild from Cache:
# Clears bloat, rebuilds using cached embeddings
scripts/rebuild-from-cache /path/to/project
Manual Bloat Check:
from scripts.incremental_reindex import FixedIncrementalIndexer
indexer = FixedIncrementalIndexer('/path/to/project')
bloat_info = indexer.get_bloat_info()
print(f"Bloat: {bloat_info['bloat_percentage']:.1f}%")
print(f"Stale: {bloat_info['stale_count']} vectors")
Cache Validation
Integrity Checks (Automatic):
- Cache version verification
- Embedding dimension validation
- Metadata consistency checks
- Automatic recovery on corruption
Backup System:
- Auto-backup before rebuilds
- Stored in index/backup/
- Rollback on rebuild failure
Quick Start
Operation 1: Index a Project
When to use: Create or update the semantic index for a project
# Full index (recommended on first run or after major changes)
scripts/incremental-reindex /path/to/project --full
# Auto-reindex (detects changes via Merkle tree, then full reindex if needed)
scripts/incremental-reindex /path/to/project
# Custom project name
scripts/incremental-reindex /path/to/project --project-name my-project --full
Output: JSON with indexing statistics (files added/modified/removed, chunks indexed, time taken).
Operation 2: Incremental Reindex (RECOMMENDED)
When to use: Smart automatic reindexing with auto-fallback to full reindex
What it does: Uses Merkle tree change detection to identify when files have changed. Auto-fallback: IndexFlatIP doesn't support incremental vector updates, so the script automatically performs a full reindex (clears index and rebuilds from scratch). This is the same approach used by MCP (proven, reliable, works on all platforms including Apple Silicon).
# Auto-detect changes and reindex if >360min old / 6 hours (default)
scripts/incremental-reindex /path/to/project
# Custom age threshold (reindex if >30min old)
scripts/incremental-reindex /path/to/project --max-age 30
# Force full reindex regardless of age
scripts/incremental-reindex /path/to/project --full
# Check if reindex needed without executing
scripts/incremental-reindex /path/to/project --check-only
Output: JSON with detailed statistics:
{
"success": true,
"full_index": true,
"files_indexed": 205,
"chunks_added": 6152,
"total_chunks": 6152,
"time_taken": 195.46
}
Key Benefits:
- ✅ Simple: Uses IndexFlatIP (same as MCP - proven, reliable)
- ✅ Compatible: Works on all platforms including Apple Silicon (mps:0)
- ✅ Smart: Merkle tree detects when files changed (triggers full reindex)
- ✅ Safe: Full reindex guarantees no stale data or desynchronization
- ✅ Automatic: Can be triggered by hooks based on age threshold
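The change detection this operation relies on can be illustrated with a flat path→hash snapshot. The real library uses a Merkle tree so unchanged subtrees are skipped; this sketch shows only the diffing idea:

```python
import hashlib
import os

def snapshot(root):
    """Map each file under root to the SHA-256 of its contents."""
    snap = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                snap[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return snap

def diff(old, new):
    """Compare two snapshots: (added, removed, modified) file lists."""
    added = [p for p in new if p not in old]
    removed = [p for p in old if p not in new]
    modified = [p for p in new if p in old and new[p] != old[p]]
    return added, removed, modified
```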
Operation 3: List Indexed Projects
When to use: See all projects that have been indexed
scripts/list-projects
Output: JSON with array of projects including paths, hashes, creation dates, and index statistics.
Operation 4: Check Index Status
When to use: Verify index exists and inspect statistics for a project
scripts/status --project /path/to/project
Output: JSON with index statistics (chunk count, embedding dimension, files indexed, top folders, chunk types).
Operation 5: Search by Natural Language Query
When to use: Find content by describing what it does or contains
# Basic search (returns top 5 results)
scripts/search --query "user authentication logic" --project /path/to/project
# More results
scripts/search --query "error handling patterns" --k 10 --project /path/to/project
# Search across all indexed projects (omit --project)
scripts/search --query "database queries" --k 5
Output: JSON with ranked results including file paths, line numbers, kind, similarity scores, chunk IDs, and snippets.
Operation 6: Find Similar Content Chunks
When to use: Discover content semantically similar to a reference chunk
# Find similar implementations (use chunk_id from search results)
scripts/find-similar --chunk-id "src/auth.py:45-67:function:authenticate" --project /path/to/project
# More results
scripts/find-similar --chunk-id "lib/utils.py:120-145:method:retry" --k 10 --project /path/to/project
Output: JSON with reference chunk and array of similar chunks ranked by semantic similarity.
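Chunk IDs like the ones above can be split into their parts. The `path:start-end:kind:name` layout is inferred from the examples in this document; treat it as an assumption, not a stable API:

```python
def parse_chunk_id(chunk_id):
    """Split a chunk ID like 'src/auth.py:45-67:function:authenticate'."""
    # rsplit keeps the file path intact even though it contains slashes
    path, line_range, kind, name = chunk_id.rsplit(":", 3)
    start, end = (int(n) for n in line_range.split("-"))
    return {"path": path, "start": start, "end": end, "kind": kind, "name": name}
```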
JSON Output Format
All scripts output standardized JSON:
Success:
{
"success": true,
"data": {
"results": [...],
"query": "user authentication",
"total_results": 5
}
}
Error:
{
"success": false,
"error": "Index not found",
"suggestion": "Run indexing first or check storage-dir path",
"path": ".code-search-index"
}
Typical Workflow
Step 1: Index the Project (One-Time Setup)
scripts/incremental-reindex /path/to/project --full
Step 2: Verify Index Status
scripts/status --project /path/to/project
# or
scripts/list-projects
Step 3: Broad Semantic Search
scripts/search --query "authentication methods" --k 10 --project /path/to/project
Step 4: Find Similar Implementations
# Using chunk_id from search results
scripts/find-similar --chunk-id "src/auth/oauth.py:34-56:function:oauth_login" --project /path/to/project
Step 5: Reindex After Changes
# Auto-reindex (detects changes via Merkle tree, then full reindex)
scripts/incremental-reindex /path/to/project
# Force full reindex (explicit request)
scripts/incremental-reindex /path/to/project --full
Step 6: Narrow with Traditional Tools
# After identifying relevant files, use Read/Grep for details
Reference Documentation
For detailed guidance, see the references/ directory:
- effective-queries.md: Query patterns, good/bad examples, domain-specific tips
- troubleshooting.md: Common errors, corner cases, compatibility notes
- performance-tuning.md: Optimizing k values, large codebase strategies
Arguments Reference
index
- DIRECTORY (required): Directory to index (positional argument)
- --project-name NAME (optional): Custom project name (default: directory basename)
- --full (optional): Do full reindex (default: incremental)
- -h, --help: Show usage information
list-projects
- No arguments required
- Lists all indexed projects with statistics
status
- --project PATH (optional): Project path to check status for (default: current project or error)
search
- --query "QUERY" (required): Natural language search query
- --k NUM (optional, default: 5): Number of results (5-50 recommended)
- --project PATH (optional): Project path to search in (default: all projects)
find-similar
- --chunk-id "CHUNK_ID" (required): Reference chunk identifier from search results
- --k NUM (optional, default: 5): Number of similar chunks to return
- --project PATH (optional): Project path to search in (default: current project)
Learning Path
Beginners: Start with effective-queries.md to learn query patterns
Troubleshooting: Consult troubleshooting.md for common issues
Performance: Read performance-tuning.md for large codebases (>10k files)
Design Rationale
Why Bash Orchestrators for Python Library Imports?
Clarification: We use bash scripts to import Python modules, NOT an MCP server. No MCP protocol is used.
- Simplicity: Bash scripts import existing Python modules directly - no reimplementation needed
- Reusability: Imports merkle, chunking, embeddings modules (same IndexFlatIP as MCP)
- Auto-venv: Scripts automatically use claude-context-local's venv Python interpreter
- Token Efficiency: Scripts are compact (~50 lines each) vs bundling 352KB of Python code
- Composability: Scripts output JSON, enabling shell pipelines and automation
- License-safe: Dynamic linking via PYTHONPATH preserves Apache 2.0 license (GPL-compliant)
Orchestrator Pattern
Each bash script:
1. Sets VENV_PYTHON to ~/.local/share/claude-context-local/.venv/bin/python
2. Sets PYTHONPATH for Python imports: export PYTHONPATH="~/.local/share/claude-context-local"
3. Imports Python modules: from merkle import ..., from chunking import ..., etc.
4. Runs indexing code (IndexFlatIP - same as MCP)
NOT Using MCP Protocol:
- ❌ No MCP server process runs (ps aux | grep claude-context-local returns nothing)
- ❌ No MCP protocol communication (stdio/SSE/HTTP)
- ✅ Pure Python module imports via sys.path.insert() and PYTHONPATH
- ✅ This preserves the Apache 2.0 license (dynamic linking is GPL-safe)
Notes
- Scripts use the venv Python from claude-context-local library installation
- All errors are output from Python module imports (no MCP server involved)
- Chunk IDs are stable only within a single index build (reindexing may change IDs)
- Index location: ~/.claude_code_search/projects/{project_name}_{hash}/
- Uses FAISS IndexFlatIP (same as MCP - simple, proven, works on all platforms)
- Embedding model: google/embeddinggemma-300m (768 dimensions)
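For intuition, IndexFlatIP's exact inner-product search is equivalent to this NumPy sketch. The real index is FAISS; NumPy stands in purely for illustration:

```python
import numpy as np

def top_k_inner_product(index_vectors, query, k=5):
    """Exact inner-product search, the operation IndexFlatIP performs.

    With L2-normalized embeddings, inner product equals cosine similarity.
    """
    scores = index_vectors @ query          # one dot product per indexed chunk
    top = np.argsort(-scores)[:k]           # highest-scoring matches first
    return [(int(i), float(scores[i])) for i in top]
```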
Platform Compatibility
Apple Silicon: Fully supported! Model loads on MPS (Metal Performance Shaders) with mps:0 device.
All Platforms: IndexFlatIP works reliably on macOS (Intel + Apple Silicon), Linux, and Windows (via WSL).
Next Steps:
- For creating searchable indices: scripts/incremental-reindex /path/to/project --full
- For auto-reindex (detects changes, then full reindex): scripts/incremental-reindex /path/to/project
- Then explore with semantic search queries using scripts/search
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.