# grahama1970 / embedding
# Install this skill

```bash
npx skills add grahama1970/agent-skills --skill "embedding"
```

This installs a specific skill from a multi-skill repository.

# Description

> Persistent embedding service for semantic search.

# SKILL.md


```yaml
---
name: embedding
description: >
  Standalone embedding service for semantic search. Runs as a persistent FastAPI
  server for millisecond-latency embeddings. Supports model swapping via env vars.
  Use when you need vectors for any database (ArangoDB, Pinecone, etc.).
allowed-tools: Bash, WebFetch
triggers:
  - embed this
  - embed text
  - start embedding service
  - get embeddings
  - generate vectors
  - semantic search vectors
metadata:
  short-description: Persistent embedding service for semantic search
---
```


# Embedding Skill

Standalone embedding service for semantic search across any database.

## Architecture

```
┌──────────────────────────────────────────┐
│        embedding service (:8602)         │
│  Model: EMBEDDING_MODEL env var          │
│  Device: auto (CPU/GPU)                  │
└───────────────────┬──────────────────────┘
                    │
    ┌───────────────┼───────────────┐
    ▼               ▼               ▼
  memory      edge-verifier   your-project
  skill       searches        ArangoDB/etc
```

## Quick Start

```bash
# Start the service (first run loads model ~5-10s)
./run.sh serve

# Embed text (CLI)
./run.sh embed --text "your query here"

# Embed via HTTP (after service is running)
curl -X POST http://127.0.0.1:8602/embed -H "Content-Type: application/json" \
  -d '{"text": "your query here"}'
```

## Commands

| Command | Description |
| --- | --- |
| `./run.sh serve` | Start persistent FastAPI server |
| `./run.sh embed --text "..."` | Embed single text (uses service if running) |
| `./run.sh embed --file input.txt` | Embed file contents |
| `./run.sh info` | Show model, device, service status |

## Configuration

| Variable | Default | Description |
| --- | --- | --- |
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | Sentence-transformers model name |
| `EMBEDDING_DEVICE` | `auto` | Device: `auto`, `cpu`, `cuda`, `mps` |
| `EMBEDDING_PORT` | `8602` | Service port |
| `EMBEDDING_SERVICE_URL` | `http://127.0.0.1:8602` | Client connection URL |
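
Client code can honor `EMBEDDING_SERVICE_URL` rather than hardcoding the address. A minimal sketch (the `embed` helper and its fallback default are illustrative, not part of the skill):

```python
import os

import httpx

# Resolve the service URL from the environment, falling back to the
# documented default.
SERVICE_URL = os.environ.get("EMBEDDING_SERVICE_URL", "http://127.0.0.1:8602")


def embed(text: str) -> list[float]:
    """Return the embedding vector for a single text via POST /embed."""
    resp = httpx.post(f"{SERVICE_URL}/embed", json={"text": text})
    resp.raise_for_status()
    return resp.json()["vector"]
```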

## Swapping Models

```bash
# Use a different model for this project
export EMBEDDING_MODEL="nomic-ai/nomic-embed-text-v1"
./run.sh serve

# Or for GPU-accelerated inference
export EMBEDDING_MODEL="intfloat/e5-large-v2"
export EMBEDDING_DEVICE="cuda"
./run.sh serve
```
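
Keep in mind that different models emit vectors of different dimensionality (all-MiniLM-L6-v2 emits 384-dimensional vectors, for example), so embeddings stored under one model are not comparable with another. A quick sanity check against `GET /info` before querying, assuming you know the dimension your stored vectors use:

```python
import httpx

EXPECTED_DIMENSIONS = 384  # the dimension your stored vectors / index use

info = httpx.get("http://127.0.0.1:8602/info").json()
if info["dimensions"] != EXPECTED_DIMENSIONS:
    raise RuntimeError(
        f"Model {info['model']!r} emits {info['dimensions']}-d vectors, "
        f"but stored vectors are {EXPECTED_DIMENSIONS}-d; re-embed before querying."
    )
```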

## API Endpoints

### POST /embed

Embed single text.

```
{"text": "query to embed"}
→ {"vector": [0.1, 0.2, ...], "model": "all-MiniLM-L6-v2", "dimensions": 384}
```

### POST /embed/batch

Embed multiple texts.

```
{"texts": ["query 1", "query 2"]}
→ {"vectors": [[...], [...]], "model": "...", "count": 2}
```

### GET /info

Service status and configuration.

```json
{
  "model": "all-MiniLM-L6-v2",
  "device": "cuda",
  "dimensions": 384,
  "status": "ready"
}
```
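
Since the model loads lazily on startup (see Cold Start below), a client can poll this endpoint until `status` reports `ready`. A sketch of such a wait loop:

```python
import time

import httpx


def wait_until_ready(url: str = "http://127.0.0.1:8602", timeout: float = 30.0) -> dict:
    """Poll GET /info until the service reports status "ready"."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            info = httpx.get(f"{url}/info", timeout=2.0).json()
            if info.get("status") == "ready":
                return info
        except httpx.HTTPError:
            pass  # service still starting up
        time.sleep(0.5)
    raise TimeoutError("embedding service never became ready")
```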

## Integration Examples

```python
import httpx

# Get embedding
resp = httpx.post("http://127.0.0.1:8602/embed", json={"text": "find similar docs"})
vector = resp.json()["vector"]

# Use in AQL query
aql = """
FOR doc IN my_collection
  LET score = COSINE_SIMILARITY(doc.embedding, @vector)
  FILTER score > 0.7
  SORT score DESC
  RETURN doc
"""
```

## From Memory Skill

The memory skill can consume this service by setting:

```bash
export EMBEDDING_SERVICE_URL="http://127.0.0.1:8602"
```

## Cold Start

The first invocation loads the model (~5-10 seconds). After that, embeddings return with millisecond latency. The service logs progress:

```
[embedding] Loading model: all-MiniLM-L6-v2...
[embedding] Model loaded in 6.2s
[embedding] Service ready on http://127.0.0.1:8602
```

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.