Install this specific skill from the multi-skill repository:

```bash
npx skills add RSHVR/unofficial-cohere-best-practices --skill "cohere-embeddings"
```
# Description
Cohere embeddings reference for vector search, semantic similarity, and RAG. Covers Embed v4 (multimodal, Matryoshka dimensions), input types (CRITICAL for search quality), batch processing, and LangChain integration.
# SKILL.md

```yaml
name: cohere-embeddings
description: Cohere embeddings reference for vector search, semantic similarity, and RAG. Covers Embed v4 (multimodal, Matryoshka dimensions), input types (CRITICAL for search quality), batch processing, and LangChain integration.
```
# Cohere Embeddings Reference

## Official Resources

- Docs & Cookbooks: https://github.com/cohere-ai/cohere-developer-experience
- API Reference: https://docs.cohere.com/reference/about
## Models Overview

| Model | Context | Dimensions | Features |
|---|---|---|---|
| `embed-v4.0` | 128K tokens | 256/512/1024/1536 | Multimodal (text+image), Matryoshka |
| `embed-english-v3.0` | 512 tokens | 1024 | English-only, fast |
| `embed-multilingual-v3.0` | 512 tokens | 1024 | 100+ languages |
| `embed-english-light-v3.0` | 512 tokens | 384 | Lightweight, fastest |
## Input Types (CRITICAL)

Using the wrong `input_type` will silently degrade search quality. Cohere uses asymmetric embeddings, where documents and queries are embedded differently.
| Input Type | Use Case |
|---|---|
| `search_document` | Documents stored in a vector DB for retrieval |
| `search_query` | User queries searching against documents |
| `classification` | Text classification tasks |
| `clustering` | Clustering similar documents |
| `image` | Image inputs (Embed v4 only) |
### Example: Search Pipeline

```python
import cohere

co = cohere.ClientV2()

documents = ["Cohere builds enterprise AI.", "Embeddings map text to vectors."]
user_query = "What do embeddings do?"

# INDEXING: use search_document for docs you're storing
doc_response = co.embed(
    model="embed-english-v3.0",
    texts=documents,
    input_type="search_document",  # MUST use for storage
    embedding_types=["float"],
)

# QUERYING: use search_query for user queries
query_response = co.embed(
    model="embed-english-v3.0",
    texts=[user_query],
    input_type="search_query",  # MUST use for retrieval
    embedding_types=["float"],
)
```
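Once both sides are embedded, ranking is a client-side similarity computation. A minimal sketch with NumPy (the `doc_response`, `query_response`, and `documents` names come from the block above; cosine similarity is computed explicitly rather than assuming the vectors are pre-normalized):

```python
import numpy as np

doc_vecs = np.array(doc_response.embeddings.float_)
query_vec = np.array(query_response.embeddings.float_[0])

# Cosine similarity between the query and every document
scores = doc_vecs @ query_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
)

# Document indices, best match first
for idx in np.argsort(scores)[::-1]:
    print(f"{scores[idx]:.3f}  {documents[idx]}")
```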
## Native SDK Embeddings

### Basic Text Embedding

```python
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello world", "Machine learning is cool"],
    input_type="search_document",
    embedding_types=["float"],
)

embeddings = response.embeddings.float_
print(f"Embedding shape: {len(embeddings)} x {len(embeddings[0])}")
```
### Embed v4 with Matryoshka Dimensions

```python
# High precision (default)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=1536,
)

# Balanced (3x faster search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=512,
)

# Compact (6x faster search)
response = co.embed(
    model="embed-v4.0",
    texts=["text"],
    input_type="search_document",
    embedding_types=["float"],
    output_dimension=256,
)
```
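Matryoshka training packs the most important information into the leading dimensions, which is what makes the smaller `output_dimension` values usable. A sketch of the generic Matryoshka truncate-and-renormalize trick for vectors you already hold client-side (a general property of Matryoshka models, not confirmed Cohere guidance; verify against Cohere's docs before relying on it):

```python
import numpy as np

# Assume `full` holds a 1536-dim vector from the high-precision call above
full = np.array(response.embeddings.float_[0])

# Keep the leading 256 dims, then re-normalize for cosine search
compact = full[:256]
compact = compact / np.linalg.norm(compact)
```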
### Different Embedding Types

```python
response = co.embed(
    model="embed-english-v3.0",
    texts=["Hello"],
    input_type="search_document",
    embedding_types=["float", "int8", "uint8", "binary", "ubinary"],
)

float_emb = response.embeddings.float_
int8_emb = response.embeddings.int8
binary_emb = response.embeddings.binary
```
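The compressed types trade precision for memory and speed. For example, `ubinary` packs bits into uint8 values, so similarity can be scored by Hamming distance. A minimal sketch, assuming the SDK exposes the packed vectors as `response.embeddings.ubinary` (parallel to `.binary` above):

```python
import numpy as np

a = np.array(response.embeddings.ubinary[0], dtype=np.uint8)
b = np.array(response.embeddings.ubinary[0], dtype=np.uint8)  # compared with itself here

# Hamming distance: XOR the packed bytes, then count the set bits
hamming = np.count_nonzero(np.unpackbits(a ^ b))
print(hamming)  # 0 for identical vectors; lower = more similar
```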
## Multimodal Embeddings (Embed v4)

### Image Embeddings

```python
import base64

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode()
image_uri = f"data:image/jpeg;base64,{image_base64}"

response = co.embed(
    model="embed-v4.0",
    images=[image_uri],
    input_type="image",
    embedding_types=["float"],
)
```
### Mixed Content

```python
response = co.embed(
    model="embed-v4.0",
    inputs=[
        {"text": "A description of the product"},
        {"image": image_uri},
        {"text": "Another text chunk"},
    ],
    input_type="search_document",
    embedding_types=["float"],
)
```
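Because image and text vectors from Embed v4 live in the same space, a plain-text query can retrieve images. A sketch under that assumption, reusing `image_uri`'s embedding from above and scoring by cosine similarity:

```python
import numpy as np

# Stored side: image embedding (from the Image Embeddings block above)
img_vec = np.array(response.embeddings.float_[0])

# Query side: plain text, embedded as a search query
q = co.embed(
    model="embed-v4.0",
    texts=["a red running shoe"],
    input_type="search_query",
    embedding_types=["float"],
)
q_vec = np.array(q.embeddings.float_[0])

score = img_vec @ q_vec / (np.linalg.norm(img_vec) * np.linalg.norm(q_vec))
```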
## Batch Processing

### Hard Limit: 96 Items Per Request

```python
def embed_in_batches(texts: list, batch_size: int = 96):
    """Embed texts in batches of 96 (Cohere API limit)."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = co.embed(
            model="embed-english-v3.0",
            texts=batch,
            input_type="search_document",
            embedding_types=["float"],
        )
        all_embeddings.extend(response.embeddings.float_)
    return all_embeddings
```
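Usage is a drop-in replacement for a single oversized call, e.g. on a corpus that exceeds the 96-item cap:

```python
corpus = [f"document {i}" for i in range(500)]  # > 96 items
vectors = embed_in_batches(corpus)
assert len(vectors) == len(corpus)
```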
## Embed Jobs API (Large Datasets)

```python
job = co.embed_jobs.create(
    model="embed-english-v3.0",
    dataset_id="your-dataset-id",
    input_type="search_document",
)

status = co.embed_jobs.get(job.job_id)
print(status.status)  # "processing", "complete", "failed"
```
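Job creation returns immediately, so completion has to be polled. A minimal polling sketch using only the `embed_jobs.get` call shown above (the sleep interval is an arbitrary choice):

```python
import time

while True:
    status = co.embed_jobs.get(job.job_id)
    if status.status in ("complete", "failed"):
        break
    time.sleep(30)  # arbitrary interval; tune for your dataset size
```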
## LangChain Integration

### Basic Usage

```python
from langchain_cohere import CohereEmbeddings

embeddings = CohereEmbeddings(model="embed-english-v3.0")

vector = embeddings.embed_query("What is machine learning?")
vectors = embeddings.embed_documents(["Document 1", "Document 2"])
```
### With Vector Store

```python
from langchain_cohere import CohereEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = CohereEmbeddings(model="embed-english-v3.0")
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
docs = text_splitter.split_documents(your_documents)

vectorstore = FAISS.from_documents(docs, embeddings)
results = vectorstore.similarity_search("your query", k=5)
```
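The same store plugs into the rest of LangChain as a retriever via the standard `as_retriever` vector-store API:

```python
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke("your query")
```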
## Best Practices

- Match input types: Always use `search_document` for stored docs and `search_query` for queries
- Batch efficiently: Hard limit of 96 texts per request
- Choose dimensions wisely: Lower dimensions = faster search but slightly less precision
- Chunk long texts: Consider chunking at ~6000 chars (texts auto-truncate at 8K); see the splitter below
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=6000,
    chunk_overlap=200,
)
chunks = splitter.split_text(long_document)
```
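The last two practices compose: chunk first, then push the chunks through the batched embedder defined earlier (a sketch; `embed_in_batches` and `chunks` come from the blocks above):

```python
chunk_vectors = embed_in_batches(chunks)

# Keep text and vector together for indexing in a vector DB
indexed = list(zip(chunks, chunk_vectors))
```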