phoenix-cli

by @Arize-ai in AI & LLM

8,402

702

# Install this skill:

npx skills add Arize-ai/phoenix --skill "phoenix-cli"

Install specific skill from multi-skill repository

# Description

Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.

# SKILL.md

name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
license: Apache-2.0
metadata:
author: arize-ai
version: "1.0"

Phoenix CLI

Debug and analyze LLM applications using the Phoenix CLI (px).

Quick Start

Installation

npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli

Configuration

Set environment variables before running commands:

export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

CLI flags override environment variables when specified.

Debugging Workflows

Debug a failing LLM application

Fetch recent traces to see what's happening:

px traces --limit 10

Find failed traces:

px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'

Get details on a specific trace:

px trace <trace-id>

Look for errors in spans:

px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'

Find performance issues

Get the slowest traces:

px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'

Analyze span durations within a trace:

px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'

Analyze LLM usage

Extract models and token counts:

px traces --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'

Review experiment results

List datasets:

px datasets

List experiments for a dataset:

px experiments --dataset my-dataset

Analyze experiment failures:

px experiment <experiment-id> --format raw --no-progress | \
  jq '.[] | select(.error != null) | {input: .input, error}'

Calculate average latency:

px experiment <experiment-id> --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'

Command Reference

px traces

Fetch recent traces from a project.

px traces [directory] [options]

Option	Description
`[directory]`	Save traces as JSON files to directory
`-n, --limit <number>`	Number of traces (default: 10)
`--last-n-minutes <number>`	Filter by time window
`--since <timestamp>`	Fetch since ISO timestamp
`--format <format>`	`pretty`, `json`, or `raw`
`--include-annotations`	Include span annotations

px trace

Fetch a specific trace by ID.

px trace <trace-id> [options]

Option	Description
`--file <path>`	Save to file
`--format <format>`	`pretty`, `json`, or `raw`
`--include-annotations`	Include span annotations

px datasets

List all datasets.

px datasets [options]

px dataset

Fetch examples from a dataset.

px dataset <dataset-name> [options]

Option	Description
`--split <name>`	Filter by split (repeatable)
`--version <id>`	Specific dataset version
`--file <path>`	Save to file

px experiments

List experiments for a dataset.

px experiments --dataset <name> [directory]

Option	Description
`--dataset <name>`	Dataset name or ID (required)
`[directory]`	Export experiment JSON to directory

px experiment

Fetch a single experiment with run data.

px experiment <experiment-id> [options]

px prompts

List all prompts.

px prompts [options]

px prompt

Fetch a specific prompt.

px prompt <prompt-name> [options]

Output Formats

pretty (default): Human-readable tree view
json: Formatted JSON with indentation
raw: Compact JSON for piping to jq or other tools

Use --format raw --no-progress when piping output to other commands.

Trace Structure

Traces contain spans with OpenInference semantic attributes:

{
  "traceId": "abc123",
  "spans": [{
    "name": "chat_completion",
    "span_kind": "LLM",
    "status_code": "OK",
    "attributes": {
      "llm.model_name": "gpt-4",
      "llm.token_count.prompt": 512,
      "llm.token_count.completion": 256,
      "input.value": "What is the weather?",
      "output.value": "The weather is sunny..."
    }
  }],
  "duration": 1250,
  "status": "OK"
}

Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.

Key attributes for LLM spans:
- llm.model_name: Model used
- llm.provider: Provider name (e.g., "openai")
- llm.token_count.prompt / llm.token_count.completion: Token counts
- llm.input_messages.*: Input messages (indexed, with role and content)
- llm.output_messages.*: Output messages (indexed, with role and content)
- input.value / output.value: Raw input/output as text
- exception.message: Error message if failed

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.