Install the skill from the multi-skill repository:
npx skills add jrajasekera/claude-skills --skill "venice-ai-api"
# Description
Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.
# SKILL.md
name: venice-ai-api
description: Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.
Venice.ai API Skill
Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.
Quick Reference
Base URL: https://api.venice.ai/api/v1
Auth: Authorization: Bearer VENICE_API_KEY
SDK: Use OpenAI SDK with custom base URL
API Key Types: ADMIN (full access) or INFERENCE (inference only)
Setup
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.VENICE_API_KEY,
  baseURL: 'https://api.venice.ai/api/v1'
});
Account Tiers
| Tier | Qualification | Rate Limits | Use Case |
|---|---|---|---|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |
API Capabilities
1. Chat Completions
Text inference with multimodal support (text, images, audio, video).
completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)
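Streaming is also available for long responses via the standard OpenAI-style stream flag; a minimal sketch, assuming the usual chunk.choices[].delta response shape:
stream = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Write a haiku about Venice."}],
    stream=True  # tokens arrive incrementally instead of in one response
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)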
Popular Models:
- llama-3.3-70b - Balanced performance (Tier M, 128K context)
- zai-org-glm-4.7 - Complex tasks, deep reasoning (Tier L, 128K context)
- mistral-31-24b - Vision + function calling (Tier S, 131K context)
- venice-uncensored - No content filtering (Tier S, 32K context)
- deepseek-ai-DeepSeek-R1 - Advanced reasoning, math, coding (Tier L, 64K context)
- qwen3-235b - Massive MoE reasoning (Tier L)
- qwen3-4b - Fast, lightweight (Tier XS, 40K context)
Venice Parameters (passed via extra_body in Python, or directly in the request body in JS; see the example after this list):
- enable_web_search: "off" | "on" | "auto"
- enable_web_scraping: boolean
- enable_web_citations: boolean – adds ^index^ citation format
- include_venice_system_prompt: boolean (default: true)
- strip_thinking_response: boolean
- disable_thinking: boolean
- character_slug: string
- prompt_cache_key: string – routing hint for cache hits
- prompt_cache_retention: "default" | "extended" | "24h"
See references/chat-completions.md for full parameter reference.
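For example, enabling web search with citations looks like this (a sketch; it reuses the venice_parameters / extra_body pattern shown in the AI Characters section below):
completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Summarize this week's AI news."}],
    extra_body={
        "venice_parameters": {
            "enable_web_search": "auto",     # let Venice decide when to search
            "enable_web_citations": True,    # adds ^index^ citation markers
            "include_venice_system_prompt": False
        }
    }
)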
2. Image Generation
Generate images from text prompts.
import os
import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
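To save a result, decode the base64 payload from the response (a sketch; assumes the first entry of the images array):
import base64

data = response.json()
with open("sunset.png", "wb") as f:
    f.write(base64.b64decode(data["images"][0]))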
Image Models:
| Model | Best For | Pricing |
|-------|----------|---------|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |
3. Image Upscaling
Enhance image resolution 2x or 4x.
import base64

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)

# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)
Pricing: $0.02 (2x), $0.08 (4x)
4. Image Editing (Inpainting)
Modify existing images with AI-powered instructions.
import base64

with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or a URL starting with http/https
    }
)

# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)
Model: Uses Qwen-Image. Pricing: ~$0.04/edit.
See references/image-api.md for all parameters and style presets.
5. Video Generation
Async queue-based video generation. Always call /video/quote first for pricing.
Full Workflow:
import base64
import os
import time

import requests

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")
# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]
# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)
# Step 4: Cleanup (optional – deletes from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)
Image-to-Video:
with open("image.png", "rb") as f:
img_b64 = base64.b64encode(f.read()).decode("utf-8")
queue_resp = requests.post(
"https://api.venice.ai/api/v1/video/queue",
headers=headers,
json={
"model": "wan-2.5-preview-image-to-video",
"prompt": "Animate this scene with gentle motion",
"image_url": f"data:image/png;base64,{img_b64}",
"duration": "5s",
"resolution": "720p"
}
)
Video Models:
| Model | Type | Features |
|-------|------|----------|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |
See references/video-api.md for full parameter reference.
6. Text-to-Speech
Convert text to audio with 60+ voices.
response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,  # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)

with open("speech.mp3", "wb") as f:
    f.write(response.content)
Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more.
Pricing: $3.50 per 1M characters.
7. Speech-to-Text
Transcribe audio files.
with open("audio.mp3", "rb") as f:
response = requests.post(
"https://api.venice.ai/api/v1/audio/transcriptions",
headers={"Authorization": f"Bearer {api_key}"},
files={"file": f},
data={
"model": "nvidia/parakeet-tdt-0.6b-v3",
"response_format": "json", # json or text
"timestamps": "true"
}
)
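The JSON response carries the transcript; a minimal sketch, assuming an OpenAI-style text field:
result = response.json()
print(result["text"])  # timestamp details are included when timestamps is enabled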
Formats: WAV, FLAC, MP3, M4A, AAC, MP4.
Pricing: $0.0001 per audio second.
8. Embeddings
Generate vector embeddings for RAG and semantic search.
response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)
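Embedding vectors are typically compared with cosine similarity for semantic search; a minimal sketch, assuming the OpenAI-compatible data[].embedding response shape:
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query_vector = response.json()["data"][0]["embedding"]
# Rank previously stored document vectors against the query vector:
# scores = sorted(((cosine_similarity(query_vector, v), doc) for doc, v in index), reverse=True)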
9. Vision (Multimodal)
Analyze images with vision-capable models.
response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
10. Function Calling
Define tools for the model to call.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)
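When the model chooses to call a tool, execute it locally and send the result back in a tool message; a sketch assuming the OpenAI-compatible tool_calls response shape and a hypothetical local get_weather implementation:
import json

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(args["location"])  # hypothetical local function

    followup = client.chat.completions.create(
        model="zai-org-glm-4.7",
        messages=[
            {"role": "user", "content": "Weather in SF?"},
            message,  # assistant message containing the tool call
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        ],
        tools=tools
    )
    print(followup.choices[0].message.content)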
11. Structured Outputs
Get guaranteed JSON schema responses.
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)
Requirements: strict: true, additionalProperties: false, and every property listed in required.
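Because the schema is enforced, the message content can be parsed directly (sketch):
import json

data = json.loads(response.choices[0].message.content)
print(data["answer"])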
12. AI Characters
Interact with predefined AI personas.
# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)
13. Model Discovery
Query available models and capabilities programmatically.
# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use trait as model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)
Error Handling
Error Codes
| Status | Error Code | Meaning | Action |
|---|---|---|---|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | – | Insufficient balance | Add USD or stake VVV |
| 403 | – | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | – | Payload too large | Reduce request size |
| 415 | – | Invalid content type | Use application/json |
| 422 | – | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Back off and wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | – | Model at capacity | Retry later or switch model |
| 504 | – | Timeout | Use streaming for long responses |
Abuse Protection
Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.
Retry with Exponential Backoff (Python)
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # 1s, 2s, 4s
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session

session = create_venice_session()
response = session.post(url, json=payload, headers=headers)
Retry with Exponential Backoff (JavaScript)
async function veniceRequest(url, options, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.ok) return response;
    if ([429, 500, 502, 503, 504].includes(response.status)) {
      if (attempt < maxRetries) {
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
        await new Promise(r => setTimeout(r, delay));
        continue;
      }
    }
    throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
  }
}
Rate Limit-Aware Client (Python)
import time

import requests

class VeniceClient:
    """Wrapper that respects rate limits using response headers."""

    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp
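Example usage of the wrapper (the chat payload is illustrative):
import os

client = VeniceClient(os.getenv("VENICE_API_KEY"))
resp = client.request(
    "POST", "/chat/completions",
    json={
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
print(resp.json()["choices"][0]["message"]["content"])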
Response Headers
Monitor these headers for production:
- x-ratelimit-remaining-requests – Requests left in window
- x-ratelimit-remaining-tokens – Tokens left in window
- x-ratelimit-reset-requests – Timestamp when request count resets
- x-venice-balance-usd – USD balance
- x-venice-balance-diem – DIEM balance
- x-venice-is-blurred – Image was blurred (safe mode)
- x-venice-is-content-violation – Content policy violation
- x-venice-model-deprecation-warning – Deprecation notice
- x-venice-model-deprecation-date – Sunset date
- CF-RAY – Request ID for support
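A lightweight helper for surfacing balance and deprecation info in logs (a sketch; resp is any requests response from the API):
def log_venice_headers(resp):
    """Print Venice balance and model-deprecation headers if present."""
    for header in ("x-venice-balance-usd",
                   "x-venice-balance-diem",
                   "x-venice-model-deprecation-warning",
                   "x-venice-model-deprecation-date"):
        value = resp.headers.get(header)
        if value:
            print(f"{header}: {value}")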
Rate Limits by Model Tier
Text Models:
| Tier | RPM | TPM | Example Models |
|------|-----|-----|----------------|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |
Other Endpoints:
| Endpoint | RPM |
|----------|-----|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
API Key Management
# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'
# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
-H "Authorization: Bearer $VENICE_API_KEY"
# List keys
curl https://api.venice.ai/api/v1/api_keys \
-H "Authorization: Bearer $VENICE_API_KEY"
# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
-H "Authorization: Bearer $VENICE_API_KEY"
Reference Files
- references/chat-completions.md β Full chat API parameters
- references/image-api.md β Image generation, editing, upscaling details
- references/video-api.md β Video generation workflow and parameters
- references/models.md β Available models, tiers, pricing, and capabilities
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents; see the SKILL.md standard documentation for how to use it with your preferred agent.