jrajasekera

venice-ai-api

# Install this skill:
npx skills add jrajasekera/claude-skills --skill "venice-ai-api"

Installs this skill from the multi-skill repository.

# Description

Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.

# SKILL.md


---
name: venice-ai-api
description: Venice.ai API integration for privacy-first AI applications. Use when building applications with Venice.ai API for chat completions, image generation, video generation, text-to-speech, speech-to-text, or embeddings. Triggers on Venice, Venice.ai, uncensored AI, privacy-first AI, or when users need OpenAI-compatible API with uncensored models.
---


Venice.ai API Skill

Venice.ai provides privacy-first AI infrastructure with uncensored models and zero data retention. The API is OpenAI-compatible, allowing use of the OpenAI SDK with Venice's base URL. Inference runs on a decentralized network (DePIN) where nodes are disincentivized from retaining user data.

Quick Reference

- Base URL: https://api.venice.ai/api/v1
- Auth: Authorization: Bearer VENICE_API_KEY
- SDK: use the OpenAI SDK with a custom base URL
- API key types: ADMIN (full access) or INFERENCE (inference only)

Setup

Python:

import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("VENICE_API_KEY"),
    base_url="https://api.venice.ai/api/v1"
)

JavaScript:

import OpenAI from 'openai';

const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: 'https://api.venice.ai/api/v1'
});

Account Tiers

| Tier | Qualification | Rate Limits | Use Case |
|------|---------------|-------------|----------|
| Explorer | Pro subscription | Low RPM/TPM (~15-25 req/day) | Testing, prototyping |
| Paid | USD balance or staked VVV (Diems) | Standard production limits | Commercial apps |
| Partner | Enterprise agreement | Custom high-volume | Enterprise SaaS |

API Capabilities

1. Chat Completions

Text inference with multimodal support (text, images, audio, video).

completion = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"}
    ]
)

Popular Models:
- llama-3.3-70b - Balanced performance (Tier M, 128K context)
- zai-org-glm-4.7 - Complex tasks, deep reasoning (Tier L, 128K context)
- mistral-31-24b - Vision + function calling (Tier S, 131K context)
- venice-uncensored - No content filtering (Tier S, 32K context)
- deepseek-ai-DeepSeek-R1 - Advanced reasoning, math, coding (Tier L, 64K context)
- qwen3-235b - Massive MoE reasoning (Tier L)
- qwen3-4b - Fast, lightweight (Tier XS, 40K context)

Venice Parameters (via extra_body in Python, direct in JS):
- enable_web_search: "off" | "on" | "auto"
- enable_web_scraping: boolean
- enable_web_citations: boolean — adds ^index^ citation format
- include_venice_system_prompt: boolean (default: true)
- strip_thinking_response: boolean
- disable_thinking: boolean
- character_slug: string
- prompt_cache_key: string — routing hint for cache hits
- prompt_cache_retention: "default" | "extended" | "24h"

See references/chat-completions.md for full parameter reference.
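The venice_parameters envelope is easy to mistype, so a small helper can build the extra_body payload and reject unknown keys early. This is an illustrative sketch, not part of any SDK; the parameter names are the ones listed above, and the nesting matches the character_slug example later in this document.

```python
def venice_extra_body(**params):
    """Wrap Venice-specific options in the venice_parameters envelope
    passed via the OpenAI SDK's extra_body argument."""
    allowed = {
        "enable_web_search", "enable_web_scraping", "enable_web_citations",
        "include_venice_system_prompt", "strip_thinking_response",
        "disable_thinking", "character_slug",
        "prompt_cache_key", "prompt_cache_retention",
    }
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"Unknown Venice parameters: {sorted(unknown)}")
    return {"venice_parameters": params}

# Usage with the client from Setup:
# client.chat.completions.create(
#     model="llama-3.3-70b",
#     messages=[{"role": "user", "content": "Latest AI news?"}],
#     extra_body=venice_extra_body(enable_web_search="auto",
#                                  enable_web_citations=True),
# )
```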

2. Image Generation

Generate images from text prompts.

import os
import requests

response = requests.post(
    "https://api.venice.ai/api/v1/image/generate",
    headers={"Authorization": f"Bearer {os.getenv('VENICE_API_KEY')}"},
    json={
        "model": "venice-sd35",
        "prompt": "A sunset over mountains",
        "width": 1024,
        "height": 1024
    }
)
# Response contains base64 images in images array
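Unlike the upscale and edit endpoints below, generate returns JSON rather than raw bytes. Assuming the images array noted above holds base64 strings (the response shape is as described in this document), a small helper can decode and save them:

```python
import base64

def save_b64_images(images, prefix="venice"):
    """Decode base64 strings from an image-generation response's
    images array and write them to numbered PNG files.
    Returns the list of paths written."""
    paths = []
    for i, b64 in enumerate(images):
        path = f"{prefix}_{i}.png"
        with open(path, "wb") as f:
            f.write(base64.b64decode(b64))
        paths.append(path)
    return paths

# e.g. save_b64_images(response.json()["images"])
```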

Image Models:
| Model | Best For | Pricing |
|-------|----------|---------|
| qwen-image | Highest quality, editing | Variable |
| venice-sd35 | General purpose (default) | ~$0.01/image |
| hidream | Fast generation | ~$0.01/image |
| flux-2-pro | Professional quality | ~$0.04/image |
| flux-2-max | High-quality output | ~$0.02/image |
| nano-banana-pro | Photorealism, 2K/4K support | $0.18-$0.35 |

3. Image Upscaling

Enhance image resolution 2x or 4x.

import base64

with open("image.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/upscale",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "image": image_base64,
        "scale": 4  # 2 or 4
    }
)
# Returns raw image binary
with open("upscaled.png", "wb") as f:
    f.write(response.content)

Pricing: $0.02 (2x), $0.08 (4x)

4. Image Editing (Inpainting)

Modify existing images with AI-powered instructions.

import base64

with open("photo.jpg", "rb") as f:
    image_base64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "https://api.venice.ai/api/v1/image/edit",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "prompt": "Change the sky to a sunset",
        "image": image_base64  # or URL starting with http/https
    }
)
# Returns raw image binary
with open("edited.png", "wb") as f:
    f.write(response.content)

Model: Uses Qwen-Image. Pricing: ~$0.04/edit.

See references/image-api.md for all parameters and style presets.

5. Video Generation

Async queue-based video generation. Always call /video/quote first for pricing.

Full Workflow:

import os
import requests
import time
import base64

api_key = os.getenv("VENICE_API_KEY")
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Step 1: Get price quote
quote = requests.post(
    "https://api.venice.ai/api/v1/video/quote",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
print(f"Estimated cost: ${quote.json()['quote']}")

# Step 2: Queue the job (text-to-video)
queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "prompt": "A serene forest with sunlight filtering through trees",
        "negative_prompt": "low quality, blurry",
        "duration": "10s",
        "resolution": "720p",
        "aspect_ratio": "16:9",
        "audio": True
    }
)
queue_id = queue_resp.json()["queueid"]

# Step 3: Poll until complete
while True:
    status_resp = requests.post(
        "https://api.venice.ai/api/v1/video/retrieve",
        headers=headers,
        json={
            "model": "kling-2.5-turbo-pro-text-to-video",
            "queueid": queue_id,
            "delete_media_on_completion": False
        }
    )
    if (status_resp.status_code == 200
            and status_resp.headers.get("Content-Type") == "video/mp4"):
        with open("output.mp4", "wb") as f:
            f.write(status_resp.content)
        print("Video saved!")
        break
    else:
        status = status_resp.json()
        print(f"Status: {status['status']}, Duration: {status['executionDuration']}ms")
        time.sleep(10)

# Step 4: Cleanup (optional — deletes from Venice storage)
requests.post(
    "https://api.venice.ai/api/v1/video/complete",
    headers=headers,
    json={
        "model": "kling-2.5-turbo-pro-text-to-video",
        "queueid": queue_id
    }
)

Image-to-Video:

with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode("utf-8")

queue_resp = requests.post(
    "https://api.venice.ai/api/v1/video/queue",
    headers=headers,
    json={
        "model": "wan-2.5-preview-image-to-video",
        "prompt": "Animate this scene with gentle motion",
        "image_url": f"data:image/png;base64,{img_b64}",
        "duration": "5s",
        "resolution": "720p"
    }
)

Video Models:
| Model | Type | Features |
|-------|------|----------|
| kling-2.5-turbo-pro | Text/Image-to-Video | Fast, high quality |
| wan-2.5-preview | Image-to-Video | Animation specialist |
| ltx-2-full | Text/Image-to-Video | Full quality |
| veo3-fast | Text/Image-to-Video | Speed-optimized |
| sora-2 | Image-to-Video | High-end quality |

See references/video-api.md for full parameter reference.

6. Text-to-Speech

Convert text to audio with 60+ voices.

response = requests.post(
    "https://api.venice.ai/api/v1/audio/speech",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "input": "Hello, welcome to Venice.",
        "model": "tts-kokoro",
        "voice": "af_sky",
        "speed": 1.0,            # 0.25 to 4.0
        "response_format": "mp3"  # mp3, opus, aac, flac, wav, pcm
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)

Voices: af_sky, af_nova, am_liam, bf_emma, zf_xiaobei, jm_kumo, and 50+ more.
Pricing: $3.50 per 1M characters.
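Because TTS is billed per character, cost is easy to estimate before sending a request; a one-line sketch using the listed rate:

```python
def estimate_tts_cost(text: str, rate_per_million_chars: float = 3.50) -> float:
    """Estimate Venice TTS cost in USD at $3.50 per 1M characters."""
    return len(text) / 1_000_000 * rate_per_million_chars

# estimate_tts_cost("x" * 10_000) is about $0.035
```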

7. Speech-to-Text

Transcribe audio files.

with open("audio.mp3", "rb") as f:
    response = requests.post(
        "https://api.venice.ai/api/v1/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": f},
        data={
            "model": "nvidia/parakeet-tdt-0.6b-v3",
            "response_format": "json",  # json or text
            "timestamps": "true"
        }
    )

Formats: WAV, FLAC, MP3, M4A, AAC, MP4.
Pricing: $0.0001 per audio second.

8. Embeddings

Generate vector embeddings for RAG and semantic search.

response = requests.post(
    "https://api.venice.ai/api/v1/embeddings",
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "text-embedding-bge-m3",
        "input": "Privacy-first AI infrastructure",
        "encoding_format": "float"  # or "base64"
    }
)

9. Vision (Multimodal)

Analyze images with vision-capable models.

response = client.chat.completions.create(
    model="mistral-31-24b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)

10. Function Calling

Define tools for the model to call.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Weather in SF?"}],
    tools=tools
)

11. Structured Outputs

Get guaranteed JSON schema responses.

response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[...],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "my_response",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"answer": {"type": "string"}},
                "required": ["answer"],
                "additionalProperties": False
            }
        }
    }
)

Requirements: strict: true, additionalProperties: false, all fields in required.
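These rules are easy to violate in nested schemas. As a pre-flight sanity check (an illustrative helper, not part of the API), this walks a JSON schema and reports anything the strict requirements would reject:

```python
def check_strict_schema(schema: dict) -> list:
    """Return violations of the strict-mode rules: every object level
    needs additionalProperties: false, and every declared property
    must appear in required."""
    problems = []

    def walk(node, path="$"):
        if not isinstance(node, dict):
            return
        if node.get("type") == "object":
            if node.get("additionalProperties") is not False:
                problems.append(f"{path}: additionalProperties must be false")
            props = node.get("properties", {})
            missing = set(props) - set(node.get("required", []))
            if missing:
                problems.append(f"{path}: missing from required: {sorted(missing)}")
            for name, sub in props.items():
                walk(sub, f"{path}.{name}")
        elif node.get("type") == "array":
            walk(node.get("items", {}), f"{path}[]")

    walk(schema)
    return problems
```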

12. AI Characters

Interact with predefined AI personas.

# List characters
characters = requests.get(
    "https://api.venice.ai/api/v1/characters",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"categories": "philosophy", "limit": 50}
).json()

# Chat with a character
response = client.chat.completions.create(
    model="venice-uncensored",
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    extra_body={
        "venice_parameters": {"character_slug": "alan-watts"}
    }
)

13. Model Discovery

Query available models and capabilities programmatically.

# List models by type
models = requests.get(
    "https://api.venice.ai/api/v1/models",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}  # text, image, audio, video, embedding
).json()

# Get model traits for auto-selection
traits = requests.get(
    "https://api.venice.ai/api/v1/models/traits",
    headers={"Authorization": f"Bearer {api_key}"},
    params={"type": "text"}
).json()
# e.g. {"default": "zai-org-glm-4.7", "fastest": "qwen3-4b", "uncensored": "venice-uncensored"}

# Use trait as model ID for automatic routing
response = client.chat.completions.create(
    model="fastest",  # Venice routes to the current fastest model
    messages=[...]
)

Error Handling

Error Codes

| Status | Error Code | Meaning | Action |
|--------|------------|---------|--------|
| 400 | INVALID_REQUEST | Bad parameters | Check payload schema |
| 401 | AUTHENTICATION_FAILED | Invalid API key | Verify key and balance |
| 402 | | Insufficient balance | Add USD or stake VVV |
| 403 | | Unauthorized access | Check key type (ADMIN vs INFERENCE) |
| 413 | | Payload too large | Reduce request size |
| 415 | | Invalid content type | Use application/json |
| 422 | | Content policy violation | Modify prompt |
| 429 | RATE_LIMIT_EXCEEDED | Too many requests | Backoff, wait for reset |
| 500 | INFERENCE_FAILED | Model error | Retry with backoff |
| 503 | | Model at capacity | Retry later or switch model |
| 504 | | Timeout | Use streaming for long responses |

Abuse Protection

Sending >20 failed requests in 30 seconds triggers a 30-second IP block. Always implement backoff.
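Retries handle individual failures, but a separate client-side guard against the failure threshold itself is also worth having. This sketch (illustrative, with a margin below the documented limit of 20) tracks failures in a sliding window and reports how long to pause before the next attempt:

```python
import time
from collections import deque

class FailureThrottle:
    """Track recent failed requests and pause before the abuse
    threshold (20 failures in 30 seconds) can be reached."""

    def __init__(self, max_failures=15, window_seconds=30.0):
        self.max_failures = max_failures  # margin under Venice's 20
        self.window = window_seconds
        self.failures = deque()

    def _prune(self, now):
        # drop failures that have aged out of the window
        while self.failures and now - self.failures[0] > self.window:
            self.failures.popleft()

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures.append(now)
        self._prune(now)

    def wait_time(self, now=None):
        """Seconds to sleep before the next attempt is safe."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        if len(self.failures) < self.max_failures:
            return 0.0
        return self.window - (now - self.failures[0])
```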

Retry with Exponential Backoff (Python)

import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_venice_session():
    """Create a requests session with automatic retry and backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,
        backoff_factor=1,  # 1s, 2s, 4s
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST", "GET"]
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    return session

session = create_venice_session()
response = session.post(url, json=payload, headers=headers)

Retry with Exponential Backoff (JavaScript)

async function veniceRequest(url, options, maxRetries = 3) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        const response = await fetch(url, options);

        if (response.ok) return response;

        if ([429, 500, 502, 503, 504].includes(response.status)) {
            if (attempt < maxRetries) {
                const delay = Math.pow(2, attempt) * 1000;
                console.log(`Retry ${attempt + 1} in ${delay}ms (status ${response.status})`);
                await new Promise(r => setTimeout(r, delay));
                continue;
            }
        }

        throw new Error(`Venice API error: ${response.status} ${response.statusText}`);
    }
}

Rate Limit-Aware Client (Python)

import time
import requests

class VeniceClient:
    """Wrapper that respects rate limits using response headers."""
    def __init__(self, api_key):
        self.api_key = api_key
        self.base_url = "https://api.venice.ai/api/v1"
        self.session = create_venice_session()
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }

    def request(self, method, path, **kwargs):
        resp = self.session.request(
            method, f"{self.base_url}{path}",
            headers=self.headers, **kwargs
        )
        remaining = resp.headers.get("x-ratelimit-remaining-requests")
        if remaining and int(remaining) <= 1:
            reset = resp.headers.get("x-ratelimit-reset-requests")
            if reset:
                wait = max(0, float(reset) - time.time())
                time.sleep(wait)
        resp.raise_for_status()
        return resp

Response Headers

Monitor these headers for production:
- x-ratelimit-remaining-requests — Requests left in window
- x-ratelimit-remaining-tokens — Tokens left in window
- x-ratelimit-reset-requests — Timestamp when request count resets
- x-venice-balance-usd — USD balance
- x-venice-balance-diem — DIEM balance
- x-venice-is-blurred — Image was blurred (safe mode)
- x-venice-is-content-violation — Content policy violation
- x-venice-model-deprecation-warning — Deprecation notice
- x-venice-model-deprecation-date — Sunset date
- CF-RAY — Request ID for support
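All of these arrive as strings, so a small parser keeps monitoring code tidy. A sketch over the fields listed above (plain-dict lookup shown here; requests' real header mapping is case-insensitive):

```python
def parse_venice_headers(headers: dict) -> dict:
    """Extract Venice monitoring fields from response headers.
    Missing headers come back as None."""
    def num(name, cast=float):
        v = headers.get(name)
        return cast(v) if v is not None else None

    return {
        "remaining_requests": num("x-ratelimit-remaining-requests", int),
        "remaining_tokens": num("x-ratelimit-remaining-tokens", int),
        "reset_requests": num("x-ratelimit-reset-requests"),
        "balance_usd": num("x-venice-balance-usd"),
        "balance_diem": num("x-venice-balance-diem"),
        "request_id": headers.get("CF-RAY"),  # quote this in support tickets
    }
```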

Rate Limits by Model Tier

Text Models:
| Tier | RPM | TPM | Example Models |
|------|-----|-----|----------------|
| XS | 500 | 1,000,000 | qwen3-4b, llama-3.2-3b |
| S | 75 | 750,000 | mistral-31-24b, venice-uncensored |
| M | 50 | 750,000 | llama-3.3-70b, qwen3-next-80b |
| L | 20 | 500,000 | zai-org-glm-4.7, deepseek-ai-DeepSeek-R1 |

Other Endpoints:
| Endpoint | RPM |
|----------|-----|
| Image Generation | 20 |
| Audio Synthesis | 60 |
| Audio Transcription | 60 |
| Embeddings | 500 |
| Video Queue | 40 |
| Video Retrieve | 120 |
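As a rule of thumb from the tables above, a single-threaded client that spaces requests evenly at 60/RPM seconds stays under its tier's request limit:

```python
def min_interval_seconds(rpm: int) -> float:
    """Minimum spacing between requests to stay under an RPM limit."""
    return 60.0 / rpm

# Tier L text models (20 RPM): one request every 3.0 seconds
```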

API Key Management

# Create key programmatically (requires ADMIN key)
curl -X POST https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"apiKeyType": "INFERENCE", "description": "My App", "consumptionLimit": {"usd": 100}}'

# Check rate limits and balance
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"

# List keys
curl https://api.venice.ai/api/v1/api_keys \
  -H "Authorization: Bearer $VENICE_API_KEY"

# Delete key
curl -X DELETE "https://api.venice.ai/api/v1/api_keys?id={key_id}" \
  -H "Authorization: Bearer $VENICE_API_KEY"

Reference Files

- references/chat-completions.md: full chat completion parameter reference
- references/image-api.md: image parameters and style presets
- references/video-api.md: full video parameter reference

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.