# langfuse-observability

# Install this skill

```bash
npx skills add phrazzld/claude-config --skill "langfuse-observability"
```

Installs this specific skill from the phrazzld/claude-config multi-skill repository.


# SKILL.md


```yaml
---
name: langfuse-observability
description: |
  Query Langfuse traces, prompts, and LLM metrics. Use when:
  - Analyzing LLM generation traces (errors, latency, tokens)
  - Reviewing prompt performance and versions
  - Debugging failed generations
  - Comparing model outputs across runs
  Keywords: langfuse, traces, observability, LLM metrics, prompt management, generations
---
```


# Langfuse Observability

Query traces, prompts, and metrics from Langfuse. Requires env vars:
- LANGFUSE_SECRET_KEY
- LANGFUSE_PUBLIC_KEY
- LANGFUSE_HOST (e.g., https://us.cloud.langfuse.com)
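
A quick way to confirm the environment is configured before running any script (a minimal sketch, not part of the skill's own scripts):

```typescript
// Fail fast if any required Langfuse variable is missing.
for (const key of ["LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST"]) {
  if (!process.env[key]) {
    throw new Error(`Missing required environment variable: ${key}`);
  }
}
```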

## Quick Start

All commands run from the skill directory:

```bash
cd ~/.claude/skills/langfuse-observability
```

### List Recent Traces

```bash
# Last 10 traces
npx tsx scripts/fetch-traces.ts --limit 10

# Filter by name pattern
npx tsx scripts/fetch-traces.ts --name "quiz-generation" --limit 5

# Filter by user
npx tsx scripts/fetch-traces.ts --user-id "user_abc123" --limit 10
```

### Get Single Trace Details

```bash
# Full trace with spans and generations
npx tsx scripts/fetch-trace.ts <trace-id>
```

### Get Prompt

```bash
# Fetch specific prompt
npx tsx scripts/list-prompts.ts --name scry-intent-extraction

# With label
npx tsx scripts/list-prompts.ts --name scry-intent-extraction --label production
```

### Get Metrics Summary

```bash
# Summary for recent traces
npx tsx scripts/get-metrics.ts --limit 50

# Filter by trace name
npx tsx scripts/get-metrics.ts --name "quiz-generation" --limit 100
```

## Output Formats

All scripts output JSON to stdout for easy parsing.
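
To consume that output programmatically, you can spawn a script and parse its stdout. A minimal TypeScript sketch (paths as in Quick Start; run from the skill directory so the relative script path resolves):

```typescript
import { execFileSync } from "node:child_process";

// Run a skill script and parse its JSON output.
const stdout = execFileSync("npx", ["tsx", "scripts/fetch-traces.ts", "--limit", "10"], {
  encoding: "utf8",
});
const traces: Array<{ id: string; name: string }> = JSON.parse(stdout);
console.log(`Fetched ${traces.length} traces`);
```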

### Trace List Output

```json
[
  {
    "id": "trace-abc123",
    "name": "quiz-generation",
    "userId": "user_xyz",
    "input": {"prompt": "..."},
    "output": {"concepts": [...]},
    "latencyMs": 3200,
    "createdAt": "2025-12-09T..."
  }
]
```

### Single Trace Output

Includes full nested structure: trace → observations (spans + generations) with token usage.
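
As a rough sketch of that shape (illustrative types inferred from the fields shown on this page, not the SDK's official definitions):

```typescript
interface TraceDetail {
  id: string;
  name: string;
  userId?: string;
  input: unknown;
  output: unknown;
  observations: Array<{
    id: string;
    type: "SPAN" | "GENERATION";
    name: string;
    model?: string; // generations only
    usage?: { promptTokens: number; completionTokens: number; totalTokens: number };
    startTime: string;
    endTime?: string;
  }>;
}
```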

### Metrics Output

```json
{
  "totalTraces": 50,
  "successCount": 48,
  "errorCount": 2,
  "avgLatencyMs": 2850,
  "totalTokens": 125000,
  "byName": {"quiz-generation": 30, "phrasing-generation": 20}
}
```

## Common Workflows

### Debug Failed Generation

```bash
cd ~/.claude/skills/langfuse-observability

# 1. Find recent traces
npx tsx scripts/fetch-traces.ts --limit 10

# 2. Get details of specific trace
npx tsx scripts/fetch-trace.ts <trace-id>
```

### Monitor Token Usage

```bash
# Get metrics for cost analysis
npx tsx scripts/get-metrics.ts --limit 100
```

### Check Prompt Configuration

```bash
npx tsx scripts/list-prompts.ts --name scry-concept-synthesis --label production
```

## Cost Tracking

### Calculate Costs

```typescript
// Get metrics with cost calculation
const metrics = await langfuse.getMetrics({ limit: 100 });

// Pricing per 1M tokens (update as needed)
const pricing: Record<string, { input: number; output: number }> = {
  "claude-3-5-sonnet": { input: 3.0, output: 15.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function calculateCost(model: string, inputTokens: number, outputTokens: number) {
  // Fall back to $1/M for unknown models rather than silently dropping them.
  const p = pricing[model] ?? { input: 1, output: 1 };
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}
```
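
For example, a claude-3-5-sonnet call with 1,200 input and 800 output tokens:

```typescript
const cost = calculateCost("claude-3-5-sonnet", 1200, 800);
// (1200 * 3.0 + 800 * 15.0) / 1_000_000 = 0.0156
console.log(`$${cost.toFixed(4)}`); // $0.0156
```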

### Daily/Monthly Spend

```bash
# Get traces for date range
npx tsx scripts/fetch-traces.ts --from "2025-12-01" --to "2025-12-07" --limit 1000

# Calculate spend (parse output and sum costs)
```
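
A sketch of that summing step, combining the command's output with `calculateCost()` from above (the `model` and `usage` field names here are assumptions about the trace output shape):

```typescript
import { execFileSync } from "node:child_process";

const raw = execFileSync(
  "npx",
  ["tsx", "scripts/fetch-traces.ts", "--from", "2025-12-01", "--to", "2025-12-07", "--limit", "1000"],
  { encoding: "utf8" },
);

// Assumed shape: each trace carries its model name and token usage.
const traces: Array<{
  model?: string;
  usage?: { promptTokens: number; completionTokens: number };
}> = JSON.parse(raw);

const total = traces.reduce(
  (sum, t) =>
    sum +
    calculateCost(t.model ?? "unknown", t.usage?.promptTokens ?? 0, t.usage?.completionTokens ?? 0),
  0,
);
console.log(`Spend 2025-12-01..2025-12-07: $${total.toFixed(2)}`);
```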

### Cost Alerts

Set up alerts in the Langfuse dashboard:
1. Go to Dashboard → Alerts
2. Create an alert for `daily_cost > X` or `cost_per_trace > Y`
3. Configure the notification channel (email, Slack webhook)

Or implement in code:

```typescript
async function checkCostBudget() {
  const dailyMetrics = await langfuse.getMetrics({ since: "24h" });
  const dailyCost = calculateTotalCost(dailyMetrics);

  if (dailyCost > DAILY_BUDGET) {
    await notifySlack(`⚠️ LLM daily spend ($${dailyCost}) exceeded budget ($${DAILY_BUDGET})`);
  }
}
```
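
`calculateTotalCost` and `notifySlack` are placeholders. A minimal Slack notifier using an incoming webhook might look like this (assumes a `SLACK_WEBHOOK_URL` env var):

```typescript
// Post a plain-text message to a Slack incoming webhook.
async function notifySlack(text: string) {
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
}
```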

## Production Best Practices

### 1. Trace Everything

```typescript
import { Langfuse } from "langfuse";

const langfuse = new Langfuse({
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  secretKey: process.env.LANGFUSE_SECRET_KEY,
});

// Wrap every LLM call (currentUser, selectedModel, and llm are app-level placeholders)
async function tracedLLMCall(name: string, messages: Message[]) {
  const trace = langfuse.trace({
    name,
    userId: currentUser.id,
    metadata: { environment: process.env.NODE_ENV },
  });

  const generation = trace.generation({
    name: "chat",
    model: selectedModel,
    input: messages,
  });

  try {
    const response = await llm.chat({ model: selectedModel, messages });

    generation.end({
      output: response.choices[0].message,
      usage: {
        promptTokens: response.usage.prompt_tokens,
        completionTokens: response.usage.completion_tokens,
      },
    });

    return response;
  } catch (error: any) {
    generation.end({ level: "ERROR", statusMessage: error.message });
    throw error;
  }
}
```
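
Callers then use the wrapper as a drop-in replacement for the raw client call (the message shape below assumes an OpenAI-style role/content object; adjust to your `Message` type):

```typescript
// Each call produces one trace with a single generation attached.
const response = await tracedLLMCall("quiz-generation", [
  { role: "user", content: "Generate five quiz questions about photosynthesis" },
]);
```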

### 2. Add Context

```typescript
// Include useful metadata for debugging
const trace = langfuse.trace({
  name: "user-query",
  userId: user.id,
  sessionId: session.id, // Group related traces
  metadata: {
    userPlan: user.plan,
    feature: "chat",
    version: "v2.1",
  },
  tags: ["production", "chat-feature"],
});
```

### 3. Score Outputs

```typescript
// Track quality metrics
generation.score({
  name: "user-feedback",
  value: userRating, // 1-5
});

// Or automated scoring
generation.score({
  name: "response-length",
  value: response.content.length < 500 ? 1 : 0,
});
```

### 4. Flush Before Exit

```typescript
// Important for serverless environments
await langfuse.flushAsync();
```
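
In a serverless handler, a `finally` block guarantees the flush runs even when the request fails (a sketch; `processRequest` stands in for your app logic):

```typescript
export async function handler(event: unknown) {
  try {
    return await processRequest(event); // placeholder for your request handling
  } finally {
    // Events are buffered in memory; flush before the runtime freezes or exits.
    await langfuse.flushAsync();
  }
}
```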

## Promptfoo Integration

### Trace → Eval Case Workflow

1. Find interesting traces in Langfuse (failures, edge cases)
2. Export as test cases for Promptfoo
3. Add to regression suite to prevent future issues

```typescript
// Export failed traces as test cases
const failedTraces = await langfuse.getTraces({ level: "ERROR", limit: 50 });

const testCases = failedTraces.map(trace => ({
  vars: trace.input,
  assert: [
    { type: "not-contains", value: "error" },
    { type: "llm-rubric", value: "Response should address the user's question" },
  ],
}));

// Add to promptfooconfig.yaml
```
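
One way to wire those cases in is to write them to a file the config references (a sketch assuming the `yaml` npm package; the file name and `file://` reference are illustrative, so check Promptfoo's docs for the exact syntax):

```typescript
import { writeFileSync } from "node:fs";
import { stringify } from "yaml";

// Persist the exported cases, then point promptfooconfig.yaml at them,
// e.g. `tests: file://langfuse-regressions.yaml`.
writeFileSync("langfuse-regressions.yaml", stringify(testCases));
```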

### Langfuse Callback in Promptfoo

```yaml
# promptfooconfig.yaml
defaultTest:
  options:
    callback: langfuse
    callbackConfig:
      publicKey: ${LANGFUSE_PUBLIC_KEY}
      secretKey: ${LANGFUSE_SECRET_KEY}
```

## Alternatives Comparison

| Feature | Langfuse | Helicone | LangSmith |
|---------|----------|----------|-----------|
| Open Source | ✅ | ✅ | ❌ |
| Self-Host | ✅ | ✅ | ❌ |
| Free Tier | ✅ Generous | ✅ 10K/mo | ⚠️ Limited |
| Prompt Mgmt | ✅ | ❌ | ✅ |
| Tracing | ✅ | ✅ | ✅ |
| Cost Track | ✅ | ✅ | ✅ |
| A/B Testing | ⚠️ | ❌ | ✅ |

**Choose Langfuse when:** Self-hosting needed, cost-conscious, want prompt management.

**Choose Helicone when:** Proxy-based setup preferred, simple integration.

**Choose LangSmith when:** LangChain ecosystem, enterprise support needed.

## Related Skills & Commands

- `llm-evaluation` - Promptfoo for testing; pairs well with Langfuse for observability
- `llm-gateway-routing` - OpenRouter/LiteLLM for model routing
- `ai-llm-development` - Overall LLM development patterns
- `/llm-gates` - Audit LLM infrastructure, including observability gaps
- `/observe` - General observability audit

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.