# kotsmiltos / schema-scout
# Install this skill

```bash
npx skills add kotsmiltos/mk-cc-resources --skill "schema-scout"
```

This installs a specific skill from a multi-skill repository.

# Description

Explore the schema and values of any data file (XLSX, CSV, JSON) using the scout CLI. Use when the user asks to examine, index, or explore a data file's structure.

# SKILL.md

---
name: schema-scout
description: Explore the schema and values of any data file (XLSX, CSV, JSON) using the scout CLI. Use when the user asks to examine, index, or explore a data file's structure.
---


## Overview

Analyzes XLSX, CSV, and JSON files, building a schema tree with type detection, value statistics, null analysis, and automatic JSON-in-JSON expansion. Use this skill whenever the user wants to understand the structure, fields, types, or values in a data file.
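The JSON-in-JSON expansion mentioned above can be pictured with a short sketch. This is an illustrative stand-in, not scout's actual code: treat a column as a JSON column only if every non-null value parses as JSON, then collect the nested keys that would seed the expanded schema subtree.

```python
import json

def expand_json_column(values):
    """Illustrative sketch (not scout's implementation): if every non-null
    value in a column parses as a JSON object, return the union of its keys;
    otherwise return None to signal a plain (non-JSON) column."""
    keys = set()
    for v in values:
        if v is None:
            continue  # nulls don't disqualify the column
        try:
            obj = json.loads(v)
        except (TypeError, ValueError):
            return None  # one non-JSON value means this is not a JSON column
        if isinstance(obj, dict):
            keys.update(obj)
    return sorted(keys)
```

A real implementation would recurse into nested objects and arrays to build the full subtree; this sketch only shows the detect-then-expand decision at one level.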


## Quick start

1. Index the file: `scout index <file>`
2. View the schema: `scout schema <file>`

If `scout` is not on PATH, install it with `uv tool install ./tool/ --force` (run from this skill's directory).


## Setup

Before using scout, verify it is available:

```bash
which scout
```

If scout is not found, install it from the bundled tool directory (relative to this skill):

```bash
uv tool install ./tool/ --force
```

This installs the `scout` command globally via uv. No virtual environment activation is needed; `scout` will be on PATH after installation.

Dependencies (installed automatically): openpyxl, typer, rich.


## Commands

Index a file (analyze and save a reusable index):

```bash
scout index <file>
scout index <file> --force          # Re-index even if an index exists
scout index <file> --sheet "Sheet1" # Index a specific XLSX sheet
scout index <file> --max-rows 5000  # Limit rows scanned
```

Show the full schema tree (types, values, and nulls at every level):

```bash
scout schema <file>
```

Query a specific field (detailed stats for one path):

```bash
scout query <file> --path "field.subfield"
scout query <file> --path "items[].name"
```

List all field paths (a flat list of every path in the schema):

```bash
scout list-paths <file>
```

Output formats: all commands support `--format`:

- `rich` (default): colored terminal output with tables and trees
- `json`: machine-readable JSON to stdout
- `plain`: plain text, suitable for piping

```bash
scout schema <file> --format json
scout list-paths <file> --format plain
```
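The `plain` format is meant for piping into other tools. A hypothetical consumer, assuming (the docs do not pin this down) that `scout list-paths <file> --format plain` prints one field path per line:

```python
# Hypothetical post-processing of `scout list-paths <file> --format plain`.
# Assumption (not stated in the docs): plain output is one field path per line.
sample_output = """status
payload.items[].name
payload.items[].type
"""

# Keep non-empty lines as paths; "[]" marks a path that traverses an array.
paths = [line for line in sample_output.splitlines() if line.strip()]
array_paths = [p for p in paths if "[]" in p]
```

The same filtering is a one-liner in a shell pipeline (`... --format plain | grep '\[\]'`); the Python version is shown because it composes with further scripted analysis.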


## Workflow

Recommended workflow for exploring an unknown data file:

1. Index the file to create a `.scout-index.json` alongside it:

   ```bash
   scout index data.xlsx
   ```

2. View the schema tree to understand the overall structure:

   ```bash
   scout schema data.xlsx
   ```

3. List all paths if you need a flat reference of available fields:

   ```bash
   scout list-paths data.xlsx
   ```

4. Query specific fields to drill into values, types, and distributions:

   ```bash
   scout query data.xlsx --path "status"
   scout query data.xlsx --path "payload.items[].type"
   ```

Index files are saved as `<filename>.scout-index.json` next to the source. Subsequent commands reuse the index automatically; there is no need to re-scan.


## Notes

- Auto-cleanup: null-only columns are pruned, XLSX overflow columns are trimmed, and sparse `_col_N` columns (less than 5% non-null) are removed
- Encoding repair: double-encoded UTF-8 (common from Excel/ODBC pipelines) is auto-detected and fixed
- JSON detection: columns containing JSON strings are automatically expanded into nested schema trees
- Supported formats: `.xlsx`, `.csv`, `.json`, `.ndjson`, `.jsonl`
- Index reuse: pass `--force` to re-index; otherwise the existing index is loaded
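The double-encoded UTF-8 repair mentioned in the notes is a standard round-trip trick: UTF-8 bytes that were mistakenly decoded as Latin-1 (producing mojibake like "CafÃ©") can be re-encoded and decoded correctly. A minimal sketch, not scout's actual implementation:

```python
def repair_double_utf8(text: str) -> str:
    """Sketch of the classic double-encoding repair: re-encode as Latin-1
    to recover the original bytes, then decode them as UTF-8."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not double-encoded; leave unchanged
```

A production version (as the notes imply scout has) also needs a detection step, since some strings round-trip successfully without being mojibake; this sketch only shows the repair itself.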


## Success criteria

Exploration is successful when:

- `scout index` creates a `.scout-index.json` alongside the source file
- `scout schema` emits a readable schema tree with types and null rates
- Field queries return the type, null rate, and value distribution for the specified path
- The user can see the structure and understand the data

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.