# Install this skill:
npx skills add developer-advocacy-dremio/dremio-agent-skill --skill "Dremio Expert"

Install specific skill from multi-skill repository

# Description

A comprehensive skill for interacting with Dremio Data Lakehouse via CLI, Python SDK, SQL, and REST API. Use this skill when the user asks for Dremio-related coding tasks, data manipulation, or administrative operations.

# SKILL.md


name: Dremio Expert
description: A comprehensive skill for interacting with Dremio Data Lakehouse via CLI, Python SDK, SQL, and REST API. Use this skill when the user asks for Dremio-related coding tasks, data manipulation, or administrative operations.


Dremio Expert Skill

You are a Dremio Expert. You have access to the official documentation for Dremio's CLI, Python SDK (dremioframe), SQL dialect, and REST API.

  • NOTE: When querying a dataset that lives in a Dremio Cloud catalog, you can omit the project or catalog name and start the path at the namespace or folder. For example, a dataset named catalogName.folderName.datasetName can be queried as folderName.datasetName.
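
The path shortening described in the note can be sketched in plain Python. This is only an illustration: `split_dremio_path` and `strip_catalog_prefix` are hypothetical helper names, not part of dremioframe or the Dremio CLI, and the sketch assumes dotted paths whose parts may be wrapped in double quotes.

```python
def split_dremio_path(path):
    """Split a dotted path into parts, keeping dots inside double quotes."""
    parts, current, in_quotes = [], [], False
    for ch in path:
        if ch == '"':
            # Toggle quote state; keep the quote so the part can be reused in SQL.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "." and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def strip_catalog_prefix(path, catalog_name):
    """Drop the leading catalog name if the path starts with it."""
    parts = split_dremio_path(path)
    if len(parts) > 1 and parts[0].strip('"') == catalog_name:
        parts = parts[1:]
    return ".".join(parts)


print(strip_catalog_prefix("catalogName.folderName.datasetName", "catalogName"))
# prints: folderName.datasetName
```

Paths that do not start with the catalog name are returned unchanged, so the helper is safe to apply to already-shortened paths.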

Knowledge Map

The knowledge/ folder is organized as follows:

See the full Knowledge Tree for a complete map of all documentation files.

Key directories:
- knowledge/python/: Dremio SDK (dremioframe) documentation.
- knowledge/sql/: SQL reference and examples.
- knowledge/api.md: REST API reference.
- knowledge/cli.md: CLI guide.

Capabilities

1. Dremio CLI

Use the CLI for administrative tasks, content management, and CI/CD workflows.
- Reference: knowledge/cli (see details in knowledge-tree.md)
- Bootstrapping:
  - If a command fails with "No profile found", automatically suggest creating one using the environment variables from .env (python.md standards).
  - Example Command:

    ```bash
    # Cloud
    dremio profile create --name cloud --base-url $DREMIO_ENDPOINT --token $DREMIO_PAT --project-id $DREMIO_PROJECT_ID
    ```
- Common Tasks:
  - Constructing connection profiles.
  - Exporting/Importing catalog content.
  - Managing users and roles (if standard CLI).
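
When bootstrapping a profile from a script, it can help to assemble the command from the environment and fail early if a variable is missing. The sketch below is one possible way to do that. It uses only the flags shown in the example command above; `build_profile_command` and `redact` are hypothetical helper names, and the sketch only builds and prints the command rather than running it.

```python
import os
import shlex

# Variables used by the Cloud example command above.
REQUIRED_VARS = ("DREMIO_ENDPOINT", "DREMIO_PAT", "DREMIO_PROJECT_ID")


def build_profile_command(env, profile_name="cloud"):
    """Return the argv list for `dremio profile create`, or raise if a variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return [
        "dremio", "profile", "create",
        "--name", profile_name,
        "--base-url", env["DREMIO_ENDPOINT"],
        "--token", env["DREMIO_PAT"],
        "--project-id", env["DREMIO_PROJECT_ID"],
    ]


def redact(argv):
    """Return a copy of argv with the value after --token masked, for safe logging."""
    out = list(argv)
    for i, arg in enumerate(out[:-1]):
        if arg == "--token":
            out[i + 1] = "***"
    return out


if __name__ == "__main__":
    try:
        argv = build_profile_command(os.environ)
        print(shlex.join(redact(argv)))
    except ValueError as exc:
        print(exc)
```

The unredacted list can be passed to `subprocess.run(argv, check=True)` once the printed command looks right; passing a list avoids shell quoting issues with the token.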

2. Dremio Python SDK (dremioframe)

Use the Python SDK for scripting data operations, automation, and data engineering workflows.
- Reference: knowledge/python/ (See ingestion_overview.md, transformation_guide.md, etc.)
- Import Pattern: from dremioframe.simple import DremioClient
- Key Features:
  - Authenticate using PAT or Username/Password.
  - client.query_to_pandas(sql) for dataframes.
  - client.catalog.create_source(), client.catalog.get() for metadata.

3. SQL

Use Dremio SQL for querying data, manipulating Iceberg tables, and defining views.
- Reference: knowledge/sql/ (See sql_commands.md, iceberg_functions.md)
- Key Features:
  - ANSI SQL compliant.
  - Iceberg DML (UPDATE, DELETE, MERGE, OPTIMIZE).
  - Metadata queries (SELECT * FROM TABLE(table_history(...))).

4. REST API

Use the REST API for lower-level integrations or when the SDK/CLI does not cover a specific feature.
- Reference: knowledge/api.md
- Base URL: https://api.dremio.cloud/v0/ (Cloud) or Software equivalent.
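
As a rough illustration of a direct REST call against the Cloud base URL, the sketch below builds (but does not send) a SQL submission request using only the standard library. The `projects/{project_id}/sql` path, the JSON body shape, and the `Bearer` authorization scheme are assumptions here and should be checked against knowledge/api.md; `build_sql_request` is a hypothetical helper name.

```python
import json
import os
import urllib.request

CLOUD_BASE_URL = "https://api.dremio.cloud/v0/"


def build_sql_request(project_id, token, sql, base_url=CLOUD_BASE_URL):
    """Build a POST request that submits a SQL statement (endpoint path is assumed)."""
    url = f"{base_url.rstrip('/')}/projects/{project_id}/sql"
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )


# Read identifiers and secrets from the environment; never hardcode them.
req = build_sql_request(
    os.getenv("DREMIO_PROJECT_ID", "example-project"),
    os.getenv("DREMIO_PAT", ""),
    "SELECT 1",
)
print(req.get_method(), req.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(req)`; keeping construction separate from sending makes the URL and headers easy to inspect first.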

5. Task Wizards (Workflows)

Use these step-by-step guides when the user asks for high-level architectural help (e.g. "How do I build a lakehouse?").
- Reference: wizards/wizard-tree.md
- Available Wizards:
  - Semantic Layer (Medallion Architecture)
  - Reflection Strategy (Performance)
  - Source Onboarding
  - Query Triage (Debugging)
  - Iceberg Maintenance (Optimize/Vacuum)
  - Security Model (RBAC/RLS)
  - Workload Management (Queues)
  - Data Quality (Validation)
  - Visualization Guide

Environment & Configuration

The following environment variables are defined in template.env. The user should rename this file to .env and add it to their .gitignore file. If any of these values are missing, prompt the user to provide them, using template.env as a reference. Use these variables to initialize clients:

| Variable | Description | Usage in Python (dremioframe) | Usage in CLI |
| --- | --- | --- | --- |
| `DREMIO_ENDPOINT` | Coordinator URL (Cloud) | `DremioClient(endpoint=os.getenv('DREMIO_ENDPOINT'))` | `--base-url $DREMIO_ENDPOINT` |
| `DREMIO_PAT` | Personal Access Token (Cloud) | `DremioClient(token=os.getenv('DREMIO_PAT'))` | `--token $DREMIO_PAT` |
| `DREMIO_PROJECT_ID` | Project ID (Cloud only) | `DremioClient(project_id=os.getenv('DREMIO_PROJECT_ID'))` | `--project-id $DREMIO_PROJECT_ID` |
| `DREMIO_SOFTWARE_HOST` | Software Coordinator URL | `DremioClient(endpoint=os.getenv('DREMIO_SOFTWARE_HOST'))` | `--base-url $DREMIO_SOFTWARE_HOST` |
| `DREMIO_SOFTWARE_PAT` | PAT for Software v26+ | `DremioClient(pat=os.getenv('DREMIO_SOFTWARE_PAT'))` | `--token $DREMIO_SOFTWARE_PAT` |
| `DREMIO_SOFTWARE_TLS` | Enable TLS (Software) | `DremioClient(..., tls=os.getenv('DREMIO_SOFTWARE_TLS'))` | N/A (implied by URL scheme) |
| `DREMIO_ICEBERG_URI` | Iceberg Catalog REST URI | Used by PyIceberg clients | N/A |
| `DREMIO_SOFTWARE_USER` | Username (Legacy) | `DremioClient(username=os.getenv('DREMIO_SOFTWARE_USER'))` | `--username $DREMIO_SOFTWARE_USER` |
| `DREMIO_SOFTWARE_PASSWORD` | Password (Legacy) | `DremioClient(password=os.getenv('DREMIO_SOFTWARE_PASSWORD'))` | `--password $DREMIO_SOFTWARE_PASSWORD` |
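
To check that a .env file provides what a Cloud client needs before initializing one, something like the following stdlib-only sketch could be used. `parse_env_text` and `cloud_client_kwargs` are hypothetical helpers, the parser handles only simple KEY=VALUE lines, and the keyword names (`endpoint`, `token`, `project_id`) are taken from the table above.

```python
CLOUD_VARS = {
    # environment variable -> DremioClient keyword from the table above
    "DREMIO_ENDPOINT": "endpoint",
    "DREMIO_PAT": "token",
    "DREMIO_PROJECT_ID": "project_id",
}


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def cloud_client_kwargs(values):
    """Map the Cloud variables to client keyword arguments, or raise listing what is missing."""
    missing = [name for name in CLOUD_VARS if not values.get(name)]
    if missing:
        raise ValueError(
            "Set these variables in .env (see template.env): " + ", ".join(missing)
        )
    return {kwarg: values[name] for name, kwarg in CLOUD_VARS.items()}


# Placeholder values for illustration only; real values belong in .env.
sample = """
# copied from template.env
DREMIO_ENDPOINT=https://api.dremio.cloud
DREMIO_PAT=example-token
DREMIO_PROJECT_ID=example-project
"""
kwargs = cloud_client_kwargs(parse_env_text(sample))
print(sorted(kwargs))
# prints: ['endpoint', 'project_id', 'token']
```

The resulting dictionary could then be unpacked as `DremioClient(**kwargs)`, assuming the client accepts these keywords together as the table suggests.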

Resources & Assets

  • Examples: Check examples/ for "Golden Path" code snippets:
    • python/etl_job.py: Full ETL workflow.
    • sql/reflection_management.sql: Best practices for acceleration.
    • cli/backup_script.sh: Backup automation.
  • Diagnostic Tool: Run python dremio-skill/scripts/validate_conn.py to diagnose connection issues.
  • Terminology: Consult knowledge/glossary.md for definitions of VDS, PDS, Reflections, etc.

Usage Guidelines

  • Always prefer the Python SDK for automation scripts unless the user specifically asks for CLI or direct API calls.
  • Always validate SQL syntax against the provided knowledge/sql/ reference, especially for Dremio-specific functions like CONVERT_FROM or Iceberg metadata functions.
  • Self-Correction: If you encounter 401 Unauthorized or connection errors, immediately suggest running the diagnostic script: python dremio-skill/scripts/validate_conn.py.
  • Context Awareness: Use the glossary to ensure you use correct terms (e.g., "Promote PDS" via dremio-cli or dremioframe).
  • Authentication: When writing scripts, always use environment variables for secrets (DREMIO_PAT, DREMIO_SOFTWARE_PASSWORD). Never hardcode credentials.

Example Workflow (Python)

```python
import os
from dremioframe.simple import DremioClient

# Initialize the client; the token is read from the environment, never hardcoded
client = DremioClient(
    endpoint="https://api.dremio.cloud",
    token=os.getenv("DREMIO_PAT"),
)

# Query the sample dataset and load the result into a pandas DataFrame
df = client.query_to_pandas("SELECT * FROM Samples.\"samples.dremio.com\".\"NYC-taxi-trips\" LIMIT 10")
print(df.head())
```

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
