```bash
npx skills add grahama1970/agent-skills --skill "embedding"
```

Install a specific skill from a multi-skill repository.
# Description

Persistent embedding service for semantic search.
# SKILL.md

```yaml
---
name: embedding
description: >
  Standalone embedding service for semantic search. Runs as a persistent
  FastAPI server for millisecond-latency embeddings. Supports model swapping
  via env vars. Use when you need vectors for any database (ArangoDB,
  Pinecone, etc.).
allowed-tools: Bash, WebFetch
triggers:
  - embed this
  - embed text
  - start embedding service
  - get embeddings
  - generate vectors
  - semantic search vectors
metadata:
  short-description: Persistent embedding service for semantic search
---
```
# Embedding Skill

Standalone embedding service for semantic search across any database.
## Architecture
```
┌─────────────────────────────────────────┐
│ embedding service (:8602)               │
│ Model: EMBEDDING_MODEL env var          │
│ Device: auto (CPU/GPU)                  │
└───────────────────┬─────────────────────┘
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
  memory      edge-verifier   your-project
  skill       searches        ArangoDB/etc
```
## Quick Start
```bash
# Start the service (first run loads model, ~5-10s)
./run.sh serve

# Embed text (CLI)
./run.sh embed --text "your query here"

# Embed via HTTP (after the service is running)
curl -X POST http://127.0.0.1:8602/embed -H "Content-Type: application/json" \
  -d '{"text": "your query here"}'
```
## Commands

| Command | Description |
|---|---|
| `./run.sh serve` | Start persistent FastAPI server |
| `./run.sh embed --text "..."` | Embed single text (uses service if running) |
| `./run.sh embed --file input.txt` | Embed file contents |
| `./run.sh info` | Show model, device, service status |
## Configuration

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model name |
| `EMBEDDING_DEVICE` | `auto` | Device: `auto`, `cpu`, `cuda`, `mps` |
| `EMBEDDING_PORT` | `8602` | Service port |
| `EMBEDDING_SERVICE_URL` | `http://127.0.0.1:8602` | Client connection URL |
## Swapping Models

```bash
# Use a different model for this project
export EMBEDDING_MODEL="nomic-ai/nomic-embed-text-v1"
./run.sh serve

# Or, for GPU-accelerated embedding
export EMBEDDING_MODEL="intfloat/e5-large-v2"
export EMBEDDING_DEVICE="cuda"
./run.sh serve
```
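Note that different models produce vectors of different dimensions, so embeddings stored under one model are not comparable with queries embedded under another; re-embed stored vectors after a swap. A quick check against the `/info` endpoint documented below:

```python
import httpx

# Confirm which model the restarted service actually loaded
info = httpx.get("http://127.0.0.1:8602/info").json()
print(f"loaded {info['model']} ({info['dimensions']} dimensions)")
```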
## API Endpoints
### POST /embed

Embed single text.

```
{"text": "query to embed"}
→ {"vector": [0.1, 0.2, ...], "model": "all-MiniLM-L6-v2", "dimensions": 384}
```
### POST /embed/batch

Embed multiple texts.

```
{"texts": ["query 1", "query 2"]}
→ {"vectors": [[...], [...]], "model": "...", "count": 2}
```
### GET /info

Service status and configuration.

```json
{
  "model": "all-MiniLM-L6-v2",
  "device": "cuda",
  "dimensions": 384,
  "status": "ready"
}
```
## Integration Examples
### ArangoDB Semantic Search
```python
import httpx

# Get an embedding from the service
resp = httpx.post("http://127.0.0.1:8602/embed", json={"text": "find similar docs"})
resp.raise_for_status()
vector = resp.json()["vector"]

# Use it in an AQL query; the embedding is passed as the @vector bind variable
aql = """
FOR doc IN my_collection
  LET score = COSINE_SIMILARITY(doc.embedding, @vector)
  FILTER score > 0.7
  SORT score DESC
  RETURN doc
"""
```
### From Memory Skill
The memory skill can consume this service by setting:

```bash
export EMBEDDING_SERVICE_URL="http://127.0.0.1:8602"
```
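In the same spirit, any consumer can resolve the service URL from that variable. A hypothetical `embed()` helper, not part of this skill's code:

```python
import os
import httpx

BASE_URL = os.environ.get("EMBEDDING_SERVICE_URL", "http://127.0.0.1:8602")

def embed(text: str) -> list[float]:
    """Return the embedding vector for `text` from the shared service."""
    resp = httpx.post(f"{BASE_URL}/embed", json={"text": text})
    resp.raise_for_status()
    return resp.json()["vector"]
```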
## Cold Start
First invocation loads the model (~5-10 seconds). After that, embeddings are millisecond-latency. The service logs progress:
```
[embedding] Loading model: all-MiniLM-L6-v2...
[embedding] Model loaded in 6.2s
[embedding] Service ready on http://127.0.0.1:8602
```
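If you script against the service, you can poll the `/info` endpoint until the model is loaded instead of sleeping a fixed interval. The `wait_until_ready` helper below is illustrative, not part of the skill:

```python
import time
import httpx

def wait_until_ready(base_url: str = "http://127.0.0.1:8602", timeout: float = 30.0) -> None:
    """Block until GET /info reports status 'ready' (covers the cold start)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if httpx.get(f"{base_url}/info", timeout=2.0).json().get("status") == "ready":
                return
        except httpx.HTTPError:
            pass  # service not accepting connections yet
        time.sleep(0.5)
    raise TimeoutError("embedding service did not become ready in time")
```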