# RSHVR / cohere-rerank

# Install this skill:
npx skills add RSHVR/unofficial-cohere-best-practices --skill "cohere-rerank"

Installs a specific skill from a multi-skill repository.

# Description

Cohere reranking reference for two-stage retrieval, semantic search improvement, and RAG pipelines. Covers Rerank v4 models, structured data reranking, and LangChain integration.

# SKILL.md


---
name: cohere-rerank
description: Cohere reranking reference for two-stage retrieval, semantic search improvement, and RAG pipelines. Covers Rerank v4 models, structured data reranking, and LangChain integration.
---


Cohere Rerank Reference

Official Resources

  • Docs & Cookbooks: https://github.com/cohere-ai/cohere-developer-experience
  • API Reference: https://docs.cohere.com/reference/about

Models Overview

| Model | Context | Languages | Notes |
|-------|---------|-----------|-------|
| rerank-v4.0-pro | 32K tokens | 100+ | Best quality, slower |
| rerank-v4.0-fast | 32K tokens | 100+ | Optimized for speed |
| rerank-v3.5 | 4K tokens | 100+ | Good balance |
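
The context figures bound how much of each document the model can consider per request, so very long documents are typically split before reranking. A minimal word-based chunking sketch follows; the `split_into_chunks` helper, the 400-word chunk size, the `handbook.txt` file, and the sample query are illustrative assumptions rather than Cohere tooling:

import cohere

co = cohere.ClientV2()

def split_into_chunks(text, max_words=400):
    """Naive word-based splitter; production pipelines usually chunk by tokens or sentences."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# Hypothetical long document that would exceed the model's context limit
long_document = open("handbook.txt").read()
chunks = split_into_chunks(long_document)

response = co.rerank(
    model="rerank-v3.5",
    query="vacation policy",
    documents=chunks,
    top_n=3
)
best_chunks = [chunks[r.index] for r in response.results]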

Two-Stage Retrieval

The proven pattern for production search:
1. Stage 1: Fast retrieval (embeddings/BM25) for top 30 candidates
2. Stage 2: Precise reranking for final top 10 results

import cohere

co = cohere.ClientV2()  # reads the API key from the CO_API_KEY environment variable

# Stage 1: Cast a wide net with embeddings
# (assumes `vectorstore` is an existing vector store and `query` is already defined)
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: Precise reranking narrows to best results
reranked = co.rerank(
    model="rerank-v4.0-fast",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=10
)

# Map reranked indices back to the original candidate documents
final_docs = [candidates[r.index] for r in reranked.results]

Native SDK Reranking

Basic Reranking

query = "What is machine learning?"
documents = [
    "Machine learning is a subset of AI that enables systems to learn from data.",
    "The weather today is sunny with clear skies.",
    "Deep learning uses neural networks with many layers.",
]

response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3
)

for result in response.results:
    print(f"Index: {result.index}, Score: {result.relevance_score:.4f}")
    print(f"Document: {documents[result.index]}\n")

With Return Documents

response = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=documents,
    top_n=3,
    return_documents=True
)

for result in response.results:
    print(f"Score: {result.relevance_score:.4f}")
    print(f"Text: {result.document.text}\n")

Structured Data Reranking

JSON/Dict Documents

import yaml

documents = [
    {"title": "ML Guide", "author": "John", "content": "Machine learning basics..."},
    {"title": "Weather Report", "author": "Jane", "content": "Today's forecast..."},
]

yaml_docs = [yaml.dump(doc, sort_keys=False) for doc in documents]

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning tutorial",
    documents=yaml_docs,
    top_n=2
)

Specify Rank Fields

response = co.rerank(
    model="rerank-v3.5",
    query="machine learning",
    documents=[
        {"title": "ML Guide", "author": "John", "text": "Introduction to ML..."},
        {"title": "Weather", "author": "Jane", "text": "Sunny skies..."}
    ],
    rank_fields=["title", "text"]  # Only consider these fields
)

LangChain Integration

Basic Usage

from langchain_cohere import CohereRerank
from langchain_core.documents import Document

reranker = CohereRerank(model="rerank-v3.5", top_n=3)

documents = [
    Document(page_content="Machine learning is a subset of AI..."),
    Document(page_content="The weather is sunny today..."),
]

reranked = reranker.compress_documents(
    documents=documents,
    query="What is machine learning?"
)

for doc in reranked:
    print(f"Score: {doc.metadata['relevance_score']:.4f}")

With Contextual Compression Retriever

from langchain_cohere import CohereEmbeddings, CohereRerank
from langchain_community.vectorstores import FAISS
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever

embeddings = CohereEmbeddings(model="embed-english-v3.0")
vectorstore = FAISS.from_documents(docs, embeddings)  # `docs`: your list of Document objects to index
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 20})

reranker = CohereRerank(model="rerank-v3.5", top_n=5)
retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=base_retriever
)

results = retriever.invoke("Your query here")

Score Interpretation

Relevance scores are normalized to [0, 1]:
- 0.9+: Highly relevant
- 0.5-0.9: Moderately relevant
- <0.5: Low relevance

threshold = 0.5
relevant = [r for r in response.results if r.relevance_score >= threshold]
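
Continuing the snippet above, a small defensive fallback for when no result clears the threshold; falling back to the single best hit is an assumption about the pipeline's needs, not Cohere guidance:

# Results are returned sorted by relevance_score, so the first entry is the best hit
if not relevant and response.results:
    relevant = [response.results[0]]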

Best Practices

  1. Use two-stage retrieval: Embeddings for recall, rerank for precision
  2. Batch large requests: Max 10,000 documents per request (see the batching sketch after this list)
  3. Use YAML for structured data: yaml.dump(doc, sort_keys=False) preserves field order
  4. Filter by score threshold: Don't use low-relevance results
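
For corpora larger than one request allows, here is a minimal batching sketch reusing the client setup from earlier examples; `rerank_in_batches`, its batch size, and the merge logic are illustrative helpers rather than part of the Cohere SDK, and merging assumes scores are comparable across requests:

import cohere

co = cohere.ClientV2()

def rerank_in_batches(query, documents, model="rerank-v3.5", batch_size=10_000, top_n=10):
    """Split a large corpus into per-request batches, rerank each, and merge by score."""
    scored = []
    for start in range(0, len(documents), batch_size):
        batch = documents[start:start + batch_size]
        response = co.rerank(model=model, query=query, documents=batch, top_n=len(batch))
        for result in response.results:
            # Map the in-batch index back to the global document index
            scored.append((start + result.index, result.relevance_score))
    # Scores are normalized per request, so cross-batch ordering is an approximation
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Usage (assumes `all_documents` is your full list of candidate strings):
# top_indices_and_scores = rerank_in_batches("machine learning", all_documents)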

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.