Arize-ai

phoenix-cli

# Install this skill:
npx skills add Arize-ai/phoenix --skill "phoenix-cli"

# Description

Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.

# SKILL.md


---
name: phoenix-cli
description: Debug LLM applications using the Phoenix CLI. Fetch traces, analyze errors, review experiments, and inspect datasets. Use when debugging AI/LLM applications, analyzing trace data, working with Phoenix observability, or investigating LLM performance issues.
license: Apache-2.0
metadata:
  author: arize-ai
  version: "1.0"
---


Phoenix CLI

Debug and analyze LLM applications using the Phoenix CLI (px).

Quick Start

Installation

npm install -g @arizeai/phoenix-cli
# Or run directly with npx
npx @arizeai/phoenix-cli

Configuration

Set environment variables before running commands:

export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

CLI flags override environment variables when specified.
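
Before running a batch of commands, it can help to confirm that the variables above are actually set. The following is a minimal sketch, not part of the CLI: `missing_config` and the `EXPECTED_VARS` tuple are hypothetical names, and treating `PHOENIX_HOST` and `PHOENIX_PROJECT` as expected (with `PHOENIX_API_KEY` optional) is an assumption drawn from the comments in the example above.

```python
import os

# Assumption: host and project are expected; the API key is only needed
# when authentication is enabled, so it is not checked here.
EXPECTED_VARS = ("PHOENIX_HOST", "PHOENIX_PROJECT")


def missing_config(env=None):
    """Return the expected variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in EXPECTED_VARS if not env.get(name)]


# Report anything missing in the current shell environment.
for name in missing_config():
    print(f"warning: {name} is not set")
```

Because CLI flags take precedence, a missing variable is only a problem when the corresponding flag is also omitted.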

Debugging Workflows

Debug a failing LLM application

  1. Fetch recent traces to see what's happening:
px traces --limit 10
  2. Find failed traces:
px traces --limit 50 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
  3. Get details on a specific trace:
px trace <trace-id>
  4. Look for errors in spans:
px trace <trace-id> --format raw | jq '.spans[] | select(.status_code != "OK")'
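
When the jq filters grow unwieldy, the same triage can be done in a short script over saved raw output. This is a sketch under assumptions: the field names (`traceId`, `status`, `spans`, `status_code`, `exception.message`) follow the Trace Structure section of this document, the helper names are hypothetical, and the sample data is made up for illustration.

```python
import json


def failed_traces(traces):
    """Return traces whose top-level status is "ERROR"."""
    return [t for t in traces if t.get("status") == "ERROR"]


def failing_spans(trace):
    """Return spans in one trace whose status_code is not "OK"."""
    return [s for s in trace.get("spans", []) if s.get("status_code") != "OK"]


def error_report(traces):
    """Map each failed trace ID to its failing spans and their error messages."""
    report = {}
    for trace in failed_traces(traces):
        report[trace.get("traceId")] = [
            {
                "span": span.get("name"),
                "message": span.get("attributes", {}).get("exception.message"),
            }
            for span in failing_spans(trace)
        ]
    return report


# Made-up sample shaped like the raw trace output.
sample_traces = [
    {"traceId": "t1", "status": "OK", "spans": [
        {"name": "chat_completion", "status_code": "OK", "attributes": {}},
    ]},
    {"traceId": "t2", "status": "ERROR", "spans": [
        {"name": "retrieve_docs", "status_code": "OK", "attributes": {}},
        {"name": "chat_completion", "status_code": "ERROR",
         "attributes": {"exception.message": "rate limit exceeded"}},
    ]},
]

print(json.dumps(error_report(sample_traces), indent=2))
```

To run it on real data, redirect the output of step 2's `px traces` command (without the jq stage) to a file and load that file with `json.load` in place of `sample_traces`.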

Find performance issues

  1. Get the slowest traces:
px traces --limit 20 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:5]'
  2. Analyze span durations within a trace:
px trace <trace-id> --format raw | jq '.spans | sort_by(-.duration_ms) | .[0:5] | .[] | {name, duration_ms, span_kind}'
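
The span ranking can also be sketched in Python, which makes it easier to tolerate spans that lack a duration. The function name `slowest_spans` is hypothetical, and the sketch assumes each span carries a numeric `duration_ms` field, as the jq filter above implies; spans without one are ranked as zero.

```python
def slowest_spans(trace, top=5):
    """Return the `top` spans of a trace, longest duration first."""
    spans = trace.get("spans", [])
    # Treat a missing or null duration as 0 so sorting never fails.
    ranked = sorted(spans, key=lambda s: s.get("duration_ms") or 0, reverse=True)
    return [
        {
            "name": s.get("name"),
            "duration_ms": s.get("duration_ms"),
            "span_kind": s.get("span_kind"),
        }
        for s in ranked[:top]
    ]


# Made-up trace for illustration.
sample_trace = {
    "traceId": "abc123",
    "spans": [
        {"name": "retrieve_docs", "span_kind": "RETRIEVER", "duration_ms": 120},
        {"name": "chat_completion", "span_kind": "LLM", "duration_ms": 900},
        {"name": "format_answer", "span_kind": "CHAIN", "duration_ms": 15},
    ],
}

for row in slowest_spans(sample_trace, top=2):
    print(row)
```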

Analyze LLM usage

Extract models and token counts:

px traces --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | {model: .attributes["llm.model_name"], prompt_tokens: .attributes["llm.token_count.prompt"], completion_tokens: .attributes["llm.token_count.completion"]}'
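
The jq filter above emits one object per LLM span. To get totals per model instead, a small aggregation along these lines may help. It is a sketch: `token_usage_by_model` is a hypothetical helper, the attribute keys are the ones used in the filter above, and missing token counts are counted as zero.

```python
from collections import defaultdict


def token_usage_by_model(traces):
    """Sum call counts and token counts of LLM spans, grouped by model name."""
    totals = defaultdict(
        lambda: {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0}
    )
    for trace in traces:
        for span in trace.get("spans", []):
            if span.get("span_kind") != "LLM":
                continue  # only LLM spans carry token counts
            attrs = span.get("attributes", {})
            entry = totals[attrs.get("llm.model_name", "unknown")]
            entry["calls"] += 1
            entry["prompt_tokens"] += attrs.get("llm.token_count.prompt") or 0
            entry["completion_tokens"] += attrs.get("llm.token_count.completion") or 0
    return dict(totals)


# Made-up traces for illustration.
sample_traces = [
    {"traceId": "t1", "spans": [
        {"span_kind": "CHAIN", "attributes": {}},
        {"span_kind": "LLM", "attributes": {
            "llm.model_name": "gpt-4",
            "llm.token_count.prompt": 512,
            "llm.token_count.completion": 256,
        }},
    ]},
    {"traceId": "t2", "spans": [
        {"span_kind": "LLM", "attributes": {
            "llm.model_name": "gpt-4",
            "llm.token_count.prompt": 100,
            "llm.token_count.completion": 50,
        }},
    ]},
]

print(token_usage_by_model(sample_traces))
```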

Review experiment results

  1. List datasets:
px datasets
  2. List experiments for a dataset:
px experiments --dataset my-dataset
  3. Analyze experiment failures:
px experiment <experiment-id> --format raw --no-progress | \
  jq '.[] | select(.error != null) | {input: .input, error}'
  4. Calculate average latency:
px experiment <experiment-id> --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'
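
The average in step 4 divides by the number of runs, so it fails on an empty experiment and is skewed if some runs have a null `latency_ms`. The sketch below computes the same summary while skipping missing latencies. It assumes the raw output is a list of run objects with `error` and `latency_ms` fields, as the filters above imply; `summarize_runs` is a hypothetical name and the sample runs are made up.

```python
def summarize_runs(runs):
    """Summarize failures and latency for a list of experiment runs."""
    failures = [r for r in runs if r.get("error") is not None]
    # Skip runs without a recorded latency instead of counting them as zero.
    latencies = [r["latency_ms"] for r in runs if r.get("latency_ms") is not None]
    return {
        "runs": len(runs),
        "failures": len(failures),
        "failure_rate": len(failures) / len(runs) if runs else 0.0,
        "avg_latency_ms": sum(latencies) / len(latencies) if latencies else None,
    }


# Made-up runs for illustration.
sample_runs = [
    {"input": "q1", "latency_ms": 100, "error": None},
    {"input": "q2", "latency_ms": 300, "error": None},
    {"input": "q3", "latency_ms": None, "error": "timeout"},
]

print(summarize_runs(sample_runs))
```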

Command Reference

px traces

Fetch recent traces from a project.

px traces [directory] [options]

| Option | Description |
| --- | --- |
| [directory] | Save traces as JSON files to directory |
| -n, --limit <number> | Number of traces (default: 10) |
| --last-n-minutes <number> | Filter by time window |
| --since <timestamp> | Fetch since ISO timestamp |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |

px trace

Fetch a specific trace by ID.

px trace <trace-id> [options]

| Option | Description |
| --- | --- |
| --file <path> | Save to file |
| --format <format> | pretty, json, or raw |
| --include-annotations | Include span annotations |

px datasets

List all datasets.

px datasets [options]

px dataset

Fetch examples from a dataset.

px dataset <dataset-name> [options]

| Option | Description |
| --- | --- |
| --split <name> | Filter by split (repeatable) |
| --version <id> | Specific dataset version |
| --file <path> | Save to file |

px experiments

List experiments for a dataset.

px experiments --dataset <name> [directory]

| Option | Description |
| --- | --- |
| --dataset <name> | Dataset name or ID (required) |
| [directory] | Export experiment JSON to directory |

px experiment

Fetch a single experiment with run data.

px experiment <experiment-id> [options]

px prompts

List all prompts.

px prompts [options]

px prompt

Fetch a specific prompt.

px prompt <prompt-name> [options]

Output Formats

  • pretty (default): Human-readable tree view
  • json: Formatted JSON with indentation
  • raw: Compact JSON for piping to jq or other tools

Use --format raw --no-progress when piping output to other commands.
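
The same advice applies when calling the CLI from a script rather than a shell pipe. The wrapper below is a sketch, not an official client: `build_px_command` and `run_px` are hypothetical helpers, and `run_px` assumes `px` is installed and on `PATH`.

```python
import json
import subprocess


def build_px_command(*args):
    """Build a px argv list that emits compact JSON with no progress output."""
    return ["px", *args, "--format", "raw", "--no-progress"]


def run_px(*args):
    """Run px and return its parsed JSON output (requires px on PATH)."""
    completed = subprocess.run(
        build_px_command(*args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if px exits non-zero
    )
    return json.loads(completed.stdout)


# Show the command that would be executed for a trace listing.
print(build_px_command("traces", "--limit", "50"))
```

With a Phoenix server reachable, `run_px("traces", "--limit", "50")` would return the parsed trace list.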

Trace Structure

Traces contain spans with OpenInference semantic attributes:

{
  "traceId": "abc123",
  "spans": [{
    "name": "chat_completion",
    "span_kind": "LLM",
    "status_code": "OK",
    "attributes": {
      "llm.model_name": "gpt-4",
      "llm.token_count.prompt": 512,
      "llm.token_count.completion": 256,
      "input.value": "What is the weather?",
      "output.value": "The weather is sunny..."
    }
  }],
  "duration": 1250,
  "status": "OK"
}

Key span kinds: LLM, CHAIN, TOOL, RETRIEVER, EMBEDDING, AGENT.

Key attributes for LLM spans:
- llm.model_name: Model used
- llm.provider: Provider name (e.g., "openai")
- llm.token_count.prompt / llm.token_count.completion: Token counts
- llm.input_messages.*: Input messages (indexed, with role and content)
- llm.output_messages.*: Output messages (indexed, with role and content)
- input.value / output.value: Raw input/output as text
- exception.message: Error message if failed
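
Because message attributes are flattened into indexed keys, it is often convenient to regroup them into a list of messages. The following is a minimal sketch. It assumes keys of the form `llm.input_messages.0.message.role` and `llm.input_messages.0.message.content`; that exact layout is an assumption, so the pattern also accepts keys without the `message.` segment. `collect_messages` is a hypothetical helper and the sample attributes are made up.

```python
import re


def collect_messages(attributes, prefix="llm.input_messages"):
    """Regroup flattened, indexed message attributes into an ordered list."""
    pattern = re.compile(
        rf"^{re.escape(prefix)}\.(\d+)\.(?:message\.)?(role|content)$"
    )
    messages = {}
    for key, value in attributes.items():
        match = pattern.match(key)
        if match:
            index, field = int(match.group(1)), match.group(2)
            messages.setdefault(index, {})[field] = value
    # Order by message index so the conversation reads top to bottom.
    return [messages[i] for i in sorted(messages)]


# Made-up span attributes for illustration.
sample_attributes = {
    "llm.model_name": "gpt-4",
    "llm.input_messages.0.message.role": "system",
    "llm.input_messages.0.message.content": "You are helpful.",
    "llm.input_messages.1.message.role": "user",
    "llm.input_messages.1.message.content": "What is the weather?",
}

for message in collect_messages(sample_attributes):
    print(message)
```

Passing `prefix="llm.output_messages"` applies the same regrouping to output messages.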
