Install this specific skill from the multi-skill repository:

```bash
npx skills add RSHVR/unofficial-cohere-best-practices --skill "cohere-best-practices"
```
# Description
Production best practices for Cohere AI APIs. Covers model selection, API configuration, error handling, cost optimization, and architectural patterns for chat, RAG, and agentic applications.
# SKILL.md
```yaml
---
name: cohere-best-practices
description: Production best practices for Cohere AI APIs. Covers model selection, API configuration, error handling, cost optimization, and architectural patterns for chat, RAG, and agentic applications.
---
```
## Cohere Best Practices Reference

### Official Resources

- Docs & Cookbooks: https://github.com/cohere-ai/cohere-developer-experience
- API Reference: https://docs.cohere.com/reference/about

### Model Selection Guide
| Use Case | Model | Notes |
|---|---|---|
| General chat/reasoning | `command-a-03-2025` | Latest Command A model |
| RAG with citations | `command-r-plus-08-2024` | Excellent grounded generation |
| Cost-sensitive tasks | `command-r-08-2024` | Good balance of quality/cost |
| Embeddings (English) | `embed-english-v3.0` | Best for English-only |
| Embeddings (Multilingual) | `embed-multilingual-v3.0` | 100+ languages |
| Reranking | `rerank-v3.5` | Good balance |
| Reranking (Quality) | `rerank-v4.0-pro` | Best quality, slower |
| Reranking (Speed) | `rerank-v4.0-fast` | Optimized for latency |
### API Configuration Best Practices

#### Use Client V2

```python
import cohere

# Correct: use ClientV2 for all new projects
co = cohere.ClientV2()

# Deprecated: don't use the old client
# co = cohere.Client()  # Avoid
```
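`ClientV2()` also accepts an explicit `api_key=...` argument; with no argument it falls back to the key in the environment (the `CO_API_KEY` variable in recent SDK versions).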
#### Temperature Settings

```python
# For agents/tool calling: lower temperature for reliability
co.chat(model="command-a-03-2025", temperature=0.3, ...)

# For creative tasks: higher temperature
co.chat(model="command-a-03-2025", temperature=0.7, ...)

# For deterministic outputs: zero temperature
co.chat(model="command-a-03-2025", temperature=0, ...)
```
### Embedding Best Practices

#### Always Specify input_type

```python
# For documents being indexed
doc_embeddings = co.embed(
    texts=documents,
    model="embed-english-v3.0",
    input_type="search_document",  # Critical!
    embedding_types=["float"],
)

# For search queries
query_embedding = co.embed(
    texts=[query],
    model="embed-english-v3.0",
    input_type="search_query",  # Must match at query time
    embedding_types=["float"],
)
```
**Critical:** A mismatched `input_type` between indexing and querying will significantly degrade search quality.
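A minimal sketch of putting the two calls above to work for search, scoring documents against the query with cosine similarity. It assumes the SDK exposes the float vectors as `embeddings.float_` (the attribute name can vary across SDK versions) and uses `numpy`:

```python
import numpy as np

# Convert the responses above to arrays (attribute name is an assumption;
# check your SDK version if this differs)
docs = np.array(doc_embeddings.embeddings.float_)   # shape: (n_docs, dim)
q = np.array(query_embedding.embeddings.float_[0])  # shape: (dim,)

# Cosine similarity between the query and every document
scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))

# Indices of the top-5 most similar documents
top_idx = np.argsort(scores)[::-1][:5]
for i in top_idx:
    print(f"{scores[i]:.3f}  {documents[i][:80]}")
```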
### RAG Best Practices

#### Two-Stage Retrieval Pattern

```python
# Stage 1: broad retrieval with embeddings
candidates = vectorstore.similarity_search(query, k=30)

# Stage 2: precise reranking
reranked = co.rerank(
    model="rerank-v3.5",
    query=query,
    documents=[doc.page_content for doc in candidates],
    top_n=5,
)

# Use the reranked results for generation
final_docs = [candidates[r.index] for r in reranked.results]
```
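The wide first stage (`k=30`) optimizes for recall; the reranker then restores precision, so the generator only sees the five most relevant passages. Raise `k` if recall matters more than latency, and keep `top_n` small to control prompt length.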
#### Grounded Generation with Citations

```python
response = co.chat(
    model="command-r-plus-08-2024",
    messages=[{"role": "user", "content": question}],
    documents=[
        {"id": f"doc_{i}", "data": {"text": doc.page_content}}
        for i, doc in enumerate(final_docs)
    ],
)

# Access citations
for citation in response.message.citations:
    print(f"'{citation.text}' from {citation.sources}")
```
### Error Handling

```python
import time

from cohere.core import ApiError

def safe_chat(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return co.chat(
                model="command-a-03-2025",
                messages=messages,
            )
        except ApiError as e:
            if e.status_code == 429:  # Rate limited: back off exponentially
                time.sleep(2 ** attempt)
                continue
            elif e.status_code >= 500:  # Server error: brief pause, retry
                time.sleep(1)
                continue
            else:
                raise  # Client errors (4xx) are not retryable
    raise Exception("Max retries exceeded")
```
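A quick usage sketch for the wrapper above (the prompt is illustrative, and `co` is the `ClientV2` instance from earlier):

```python
reply = safe_chat([{"role": "user", "content": "Summarize this quarter's roadmap."}])
print(reply.message.content[0].text)
```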
### Cost Optimization

- Use appropriate models: Don't use Command A for simple tasks
- Batch embeddings: Embed multiple texts in one call (up to 96 texts; see the batching sketch after this list)
- Cache embeddings: Store computed embeddings in a vector database
- Use reranking wisely: Only rerank when quality matters
- Stream for UX: Streaming doesn't cost more but improves perceived latency
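A minimal batching sketch for the 96-text embed limit mentioned above; the helper name is illustrative, and the `embeddings.float_` attribute is an assumption that may vary by SDK version:

```python
def embed_in_batches(texts, batch_size=96):
    """Embed a large corpus in chunks of up to 96 texts per API call."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        resp = co.embed(
            texts=batch,
            model="embed-english-v3.0",
            input_type="search_document",
            embedding_types=["float"],
        )
        vectors.extend(resp.embeddings.float_)  # attribute name may vary by SDK version
    return vectors
```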
### Production Checklist

- [ ] Use `ClientV2` for all API calls
- [ ] Set an appropriate `temperature` for your use case
- [ ] Always specify `input_type` for embeddings
- [ ] Implement retry logic with exponential backoff
- [ ] Use two-stage retrieval for RAG
- [ ] Cache embeddings to reduce API calls
- [ ] Monitor token usage and costs (see the usage sketch below)
- [ ] Handle rate limits gracefully
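For usage monitoring, the v2 chat response carries billing metadata; this sketch assumes the SDK exposes it as `response.usage.billed_units` (verify the attribute path in your SDK version):

```python
response = co.chat(
    model="command-a-03-2025",
    messages=[{"role": "user", "content": "Hello"}],
)

# Billed token counts (attribute path is an assumption; verify in your SDK)
usage = response.usage.billed_units
print(f"input tokens:  {usage.input_tokens}")
print(f"output tokens: {usage.output_tokens}")
```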
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.