Refactor high-complexity React components in Dify frontend. Use when `pnpm analyze-component...
npx skills add vercel-labs/vercel-plugin --skill "ai-gateway"
Install specific skill from multi-skill repository
# Description
Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.
# SKILL.md
name: ai-gateway
description: Vercel AI Gateway expert guidance. Use when configuring model routing, provider failover, cost tracking, or managing multiple AI providers through a unified API.
Vercel AI Gateway
You are an expert in the Vercel AI Gateway β a unified API for calling AI models with built-in routing, failover, cost tracking, and observability.
Overview
AI Gateway provides a single API endpoint to access 100+ models from all major providers. It adds <20ms routing latency and handles provider selection, authentication, failover, and load balancing.
Setup
The AI SDK automatically uses the AI Gateway when you pass a model string in "provider/model" format:
import { generateText } from 'ai'
import { gateway } from 'ai' // Available since AI SDK 5.0.36+
const result = await generateText({
model: gateway('openai/gpt-5-mini'),
prompt: 'Hello!',
})
No additional package needed β the gateway provider is built into the ai package.
Provider Routing
Configure how AI Gateway routes requests across providers:
const result = await generateText({
model: gateway('anthropic/claude-sonnet-4-6'),
prompt: 'Hello!',
providerOptions: {
gateway: {
// Try providers in order; failover to next on error
order: ['bedrock', 'anthropic'],
// Restrict to specific providers only
only: ['anthropic', 'vertex'],
// Fallback models if primary model fails
models: ['openai/gpt-5-mini', 'google/gemini-2.5-flash'],
// Track usage per end-user
user: 'user-123',
// Tag for cost attribution and filtering
tags: ['feature:chat', 'env:production', 'team:growth'],
},
},
})
Routing Options
| Option | Purpose |
|---|---|
order |
Provider priority list; try first, failover to next |
only |
Restrict to specific providers |
models |
Fallback model list if primary model unavailable |
user |
End-user ID for usage tracking |
tags |
Labels for cost attribution and reporting |
Cache-Control Headers
AI Gateway supports response caching to reduce latency and cost for repeated or similar requests:
const result = await generateText({
model: gateway('openai/gpt-5-mini'),
prompt: 'What is the capital of France?',
providerOptions: {
gateway: {
// Cache identical requests for 1 hour
cacheControl: 'max-age=3600',
},
},
})
Caching strategies
| Header Value | Behavior |
|---|---|
max-age=3600 |
Cache response for 1 hour |
max-age=0 |
Bypass cache, always call provider |
s-maxage=86400 |
Cache at the edge for 24 hours |
stale-while-revalidate=600 |
Serve stale for 10 min while refreshing in background |
When to use caching
- Static knowledge queries: FAQs, translations, factual lookups β cache aggressively
- User-specific conversations: Do not cache β each response depends on conversation history
- Embeddings: Cache embedding results for identical inputs to save cost
- Structured extraction: Cache when extracting structured data from identical documents
Cache key composition
The cache key is derived from: model, prompt/messages, temperature, and other generation parameters. Changing any parameter produces a new cache key.
Per-User Rate Limiting
Control usage at the individual user level to prevent abuse and manage costs:
const result = await generateText({
model: gateway('openai/gpt-5-mini'),
prompt: userMessage,
providerOptions: {
gateway: {
user: userId, // Required for per-user rate limiting
tags: ['feature:chat'],
},
},
})
Rate limit configuration
Configure rate limits in the Vercel dashboard under AI Gateway settings:
- Requests per minute per user: Throttle individual users (e.g., 20 RPM)
- Tokens per day per user: Cap daily token consumption (e.g., 100K tokens/day)
- Concurrent requests per user: Limit parallel calls (e.g., 3 concurrent)
Handling rate limit responses
When a user exceeds their limit, the gateway returns HTTP 429:
import { generateText, APICallError } from 'ai'
try {
const result = await generateText({
model: gateway('openai/gpt-5-mini'),
prompt: userMessage,
providerOptions: { gateway: { user: userId } },
})
} catch (error) {
if (APICallError.isInstance(error) && error.statusCode === 429) {
const retryAfter = error.responseHeaders?.['retry-after']
return new Response(
JSON.stringify({ error: 'Rate limited', retryAfter }),
{ status: 429 }
)
}
throw error
}
Budget Alerts and Cost Controls
Tagging for cost attribution
Use tags to track spend by feature, team, and environment:
providerOptions: {
gateway: {
tags: [
'feature:document-qa',
'team:product',
'env:production',
'tier:premium',
],
user: userId,
},
}
Setting up budget alerts
In the Vercel dashboard:
- Navigate to AI Gateway β Usage & Budgets
- Set monthly budget thresholds (e.g., $500/month warning, $1000/month hard limit)
- Configure alert channels (email, Slack webhook, Vercel integration)
- Optionally set per-tag budgets for granular control
Hard spending limits
When a hard limit is reached, the gateway returns HTTP 402 (Payment Required). Handle this gracefully:
if (APICallError.isInstance(error) && error.statusCode === 402) {
// Budget exceeded β degrade gracefully
return fallbackResponse()
}
Cost optimization patterns
- Use cheaper models for classification/routing, expensive models for generation
- Cache embeddings and static queries (see Cache-Control above)
- Set per-user daily token caps to prevent runaway usage
- Monitor cost-per-feature with tags to identify optimization targets
Audit Logging
AI Gateway logs every request for compliance and debugging:
What's logged
- Timestamp, model, provider used
- Input/output token counts
- Latency (routing + provider)
- User ID and tags
- HTTP status code
- Failover chain (which providers were tried)
Accessing logs
- Vercel Dashboard: AI Gateway β Logs β filter by model, user, tag, status, date range
- Vercel API: Query logs programmatically:
curl -H "Authorization: Bearer $VERCEL_TOKEN" \
"https://api.vercel.com/v1/ai-gateway/logs?projectId=$PROJECT_ID&limit=100"
- Log Drains: Forward AI Gateway logs to Datadog, Splunk, or other providers via Vercel Log Drains for long-term retention and custom dashboards
Compliance considerations
- AI Gateway does not log prompt or completion content by default
- Enable content logging in project settings if required for compliance
- Logs are retained per your Vercel plan's retention policy
- Use
userfield consistently to support audit trails
Error Handling Patterns
Provider unavailable
When a provider is down, the gateway automatically fails over if you configured order or models:
const result = await generateText({
model: gateway('anthropic/claude-sonnet-4-6'),
prompt: 'Summarize this document',
providerOptions: {
gateway: {
order: ['anthropic', 'bedrock'], // Bedrock as fallback
models: ['openai/gpt-5-mini'], // Final fallback model
},
},
})
Quota exceeded at provider
If your provider API key hits its quota, the gateway tries the next provider in the order list. Monitor this in logs β persistent quota errors indicate you need to increase limits with the provider.
Invalid model identifier
// Bad β model doesn't exist
model: gateway('openai/gpt-99') // Returns 400 with descriptive error
// Good β use models listed in Vercel docs
model: gateway('openai/gpt-5-mini')
Timeout handling
Gateway has a default timeout per provider. For long-running generations, use streaming:
import { streamText } from 'ai'
const result = streamText({
model: gateway('anthropic/claude-sonnet-4-6'),
prompt: longDocument,
})
for await (const chunk of result.textStream) {
process.stdout.write(chunk)
}
Complete error handling template
import { generateText, APICallError } from 'ai'
async function callAI(prompt: string, userId: string) {
try {
return await generateText({
model: gateway('openai/gpt-5-mini'),
prompt,
providerOptions: {
gateway: {
user: userId,
order: ['openai', 'azure-openai'],
models: ['anthropic/claude-haiku-4-5'],
tags: ['feature:chat'],
},
},
})
} catch (error) {
if (!APICallError.isInstance(error)) throw error
switch (error.statusCode) {
case 402: return { text: 'Budget limit reached. Please try again later.' }
case 429: return { text: 'Too many requests. Please slow down.' }
case 503: return { text: 'AI service temporarily unavailable.' }
default: throw error
}
}
}
Gateway vs Direct Provider β Decision Tree
Use this to decide whether to route through AI Gateway or call a provider SDK directly:
Need failover across providers?
ββ Yes β Use Gateway
ββ No
Need cost tracking / budget alerts?
ββ Yes β Use Gateway
ββ No
Need per-user rate limiting?
ββ Yes β Use Gateway
ββ No
Need audit logging?
ββ Yes β Use Gateway
ββ No
Using a single provider with provider-specific features?
ββ Yes β Use direct provider SDK
ββ No β Use Gateway (simplifies code)
When to use direct provider SDK
- You need provider-specific features not exposed through the gateway (e.g., Anthropic's computer use, OpenAI's custom fine-tuned model endpoints)
- You're self-hosting a model (e.g., vLLM, Ollama) that isn't registered with the gateway
- You need request-level control over HTTP transport (custom proxies, mTLS)
When to always use Gateway
- Production applications β failover and observability are essential
- Multi-tenant SaaS β per-user tracking and rate limiting
- Teams with cost accountability β tag-based budgeting
Supported Providers
- OpenAI (GPT-5.x, o-series)
- Anthropic (Claude 4.x)
- Google (Gemini)
- xAI (Grok)
- Mistral
- DeepSeek
- Amazon Bedrock
- Azure OpenAI
- Cohere
- Perplexity
- Alibaba (Qwen)
- Meta (Llama)
- And many more (100+ models total)
Pricing
- Vercel-managed keys: Tokens at provider list price, no markup
- Bring Your Own Key: 0% markup on token costs
- Free tier: $5 credits every 30 days on any Vercel account
Multimodal Support
Text, image, and video generation all route through the gateway:
// Text
const { text } = await generateText({
model: gateway('openai/gpt-5-mini'),
prompt: 'Hello',
})
// Image
const { image } = await generateImage({
model: gateway('openai/dall-e-3'),
prompt: 'A sunset',
})
Key Benefits
- Unified API: One interface for all providers, no provider-specific code
- Automatic failover: If a provider is down, requests route to the next
- Cost tracking: Per-user, per-feature attribution with tags
- Observability: Built-in monitoring of all model calls
- Low latency: <20ms routing overhead
- No lock-in: Switch models/providers by changing a string
When to Use AI Gateway
| Scenario | Use Gateway? |
|---|---|
| Production app with AI features | Yes β failover, cost tracking |
| Prototyping with single provider | Optional β direct provider works fine |
| Multi-provider setup | Yes β unified routing |
| Need provider-specific features | Use direct provider SDK + Gateway as fallback |
| Cost tracking and budgeting | Yes β user tracking and tags |
| Multi-tenant SaaS | Yes β per-user rate limiting and audit |
| Compliance requirements | Yes β audit logging and log drains |
Official Documentation
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.