Install a specific skill from a multi-skill repository:

```bash
npx skills add vsvale/skills_databricks_assistent_agent --skill "databricks_connect"
```
# Description
Set up and use Databricks Connect for Python to run local code against Databricks compute. Covers installation, authentication, cluster or serverless configuration, and running PySpark from IDEs such as PyCharm or VS Code. Use when connecting a local IDE to Databricks, developing Spark apps locally, debugging against a remote cluster, or when the user mentions Databricks Connect, local development with Databricks, or running PySpark on Databricks from a laptop.
# SKILL.md
name: databricks_connect
description: Set up and use Databricks Connect for Python to run local code against Databricks compute. Covers installation, authentication, cluster or serverless configuration, and running PySpark from IDEs such as PyCharm or VS Code. Use when connecting a local IDE to Databricks, developing Spark apps locally, debugging against a remote cluster, or when the user mentions Databricks Connect, local development with Databricks, or running PySpark on Databricks from a laptop.
## Databricks Connect for Python
Databricks Connect lets you run Python and PySpark code from a local IDE (PyCharm, VS Code, Jupyter) against a Databricks cluster or serverless compute. Code executes on Databricks; your environment only needs the client and auth.
### When to Use This Skill
Use when the user needs to:
- Connect a local Python environment or IDE to a Databricks workspace
- Develop or debug Spark/PySpark code locally while execution runs on Databricks
- Set up Databricks Connect for the first time or fix connection issues
- Choose and configure cluster vs serverless for Databricks Connect
### Requirements (Summary)
- Workspace: Unity Catalog must be enabled. Target compute must be a cluster (access mode Assigned or Shared) or serverless.
- Local: Python 3.10+ (the exact version depends on the Databricks Connect version and compute type). Authentication to Databricks must be configured (e.g. OAuth U2M via the Databricks CLI).
- Version rule: the Databricks Connect package version must match or be compatible with the Databricks Runtime of the target cluster or serverless compute. See references/REFERENCES.md for the compatibility table; a quick local check follows this list.
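As a sanity check before installing anything else, a sketch like the following (standard library only) prints the local Python and client versions so you can compare them against that table:

```python
# Sanity-check sketch: print the local Python and databricks-connect
# versions to compare against the compatibility table.
import sys
from importlib.metadata import PackageNotFoundError, version

print(f"Python: {sys.version_info.major}.{sys.version_info.minor}")
try:
    print("databricks-connect:", version("databricks-connect"))
except PackageNotFoundError:
    print("databricks-connect: not installed in this environment")
```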
### Step-by-Step Setup
#### 1. Activate a Virtual Environment
Use a dedicated venv or Poetry env for each Python version you use with Databricks Connect.
```bash
# Example with venv
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
```
#### 2. Remove Conflicting PySpark
Databricks Connect bundles and manages Spark; a separate PySpark install conflicts. Uninstall it first:
```bash
pip uninstall pyspark
```
#### 3. Install Databricks Connect
Match the client version to your cluster or serverless runtime (e.g. 16.4 for DBR 16.4.x). Prefer a “latest patch” spec so you get compatible fixes:
```bash
# Replace 16.4 with your cluster/serverless runtime major.minor
pip install --upgrade "databricks-connect==16.4.*"
```
With Poetry, pin the same version in `pyproject.toml`, e.g. `poetry add "databricks-connect==16.4.*"` (or `~16.4.0` for patch-level updates).
#### 4. Configure Authentication
OAuth user-to-machine (U2M) is the typical option for interactive use. Use the Databricks CLI to log in and create a profile that includes cluster (or serverless) config:
```bash
databricks auth login --configure-cluster --host https://<workspace-name>.cloud.databricks.com
```
Follow the prompts to pick or create a cluster and save the profile. Databricks Connect will use this profile by default.
For CI or headless use, switch to OAuth M2M or another supported auth method and set the same parameters via environment variables or a config file. See references/REFERENCES.md for links.
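As a hedged sketch of the headless path: with the standard unified-auth environment variables exported (`DATABRICKS_HOST`, `DATABRICKS_CLIENT_ID`, and `DATABRICKS_CLIENT_SECRET` for OAuth M2M, plus `DATABRICKS_CLUSTER_ID` for the target cluster), no explicit configuration is needed in code:

```python
# Headless sketch: assumes DATABRICKS_HOST, DATABRICKS_CLIENT_ID,
# DATABRICKS_CLIENT_SECRET (OAuth M2M) and DATABRICKS_CLUSTER_ID are
# exported; the client picks them up from the environment.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
print(spark.range(1).count())  # smoke test: should print 1
```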
#### 5. Point to Compute (Cluster or Serverless)
If you did not use `--configure-cluster`, or you want to override the default, set connection options explicitly, as in the sketch after this list.
- Cluster: set `cluster_id` (or use the cluster ID from the profile).
- Serverless: set the serverless compute options (workspace, resource ID, etc.) as described in the compute configuration docs. Serverless is supported from Databricks Connect 15.1+; check version compatibility for your runtime.
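A minimal sketch of both options, assuming a recent client whose session builder exposes `profile()`, `clusterId()`, and `serverless()` (check your client version's docs; the cluster ID below is a placeholder):

```python
from databricks.connect import DatabricksSession

# Cluster: pin the session to an explicit cluster ID, overriding the
# profile default ("0123-456789-abcdefgh" is a placeholder).
spark = (
    DatabricksSession.builder
    .profile("DEFAULT")
    .clusterId("0123-456789-abcdefgh")
    .getOrCreate()
)

# Serverless (Databricks Connect 15.1+): no cluster ID needed.
# spark = DatabricksSession.builder.serverless(True).getOrCreate()
```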
#### 6. Validate the Connection
```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
spark.range(10).count()  # 10
```
If this runs without error, the client is talking to your Databricks compute.
### Common Code Patterns
#### Create a Spark Session
```python
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()
```
#### Use Unity Catalog Tables
```python
# Three-level namespace
df = spark.table("main.default.my_table")
# Or
df = spark.sql("SELECT * FROM main.default.my_table")
```
#### Run SQL
spark.sql("CREATE TABLE IF NOT EXISTS main.default.out AS SELECT 1 AS id")
#### DataFrame API
Standard PySpark DataFrame API works; execution is on Databricks:
```python
from pyspark.sql import functions as F

df = spark.range(100).withColumn("double", F.col("id") * 2)
df.write.format("delta").mode("overwrite").saveAsTable("main.default.example")
```
### IDE and Tools
- VS Code: Use the Databricks extension; it can install and use Databricks Connect and adds a “Run on Databricks” option for Python files.
- PyCharm: Configure the project interpreter to the venv where `databricks-connect` is installed; run scripts as usual. They use the session from `DatabricksSession.builder.getOrCreate()` (see the sketch after this list).
- Jupyter: Install `databricks-connect` in the kernel’s environment and use `DatabricksSession` in notebooks.
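A pattern that works across all three tools: a tiny helper that builds the session from a named CLI auth profile, so the same script runs unchanged from an IDE run configuration or the terminal. `get_spark` and the `"dev"` profile name are illustrative placeholders, not part of the Databricks Connect API:

```python
from databricks.connect import DatabricksSession


def get_spark(profile: str = "dev"):
    """Return a session bound to a Databricks CLI auth profile.

    get_spark and the "dev" profile name are placeholders for this sketch.
    """
    return DatabricksSession.builder.profile(profile).getOrCreate()


if __name__ == "__main__":
    spark = get_spark()
    spark.range(5).show()  # executes on Databricks compute
```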
### Edge Cases and Troubleshooting
| Issue | What to do |
|---|---|
| PySpark / “conflicting PySpark” errors | Ensure `pyspark` is uninstalled in the same env as `databricks-connect`; a quick check is sketched below this table. |
| “Cluster not found” or “invalid compute” | Check that the cluster exists, is running, and has access mode Assigned or Shared; confirm `cluster_id` or the serverless config. |
| Version / “incompatible runtime” errors | Align the Databricks Connect version with the cluster/serverless runtime; use `pip show databricks-connect` and compare to the release notes. |
| Auth errors (e.g. “not authenticated”) | Run `databricks auth login --configure-cluster --host <workspace-url>` and rerun; for non-interactive use, configure env vars or a config file for the chosen auth type. |
| Unity Catalog / permission errors | Confirm the workspace has Unity Catalog enabled and the identity has the required catalog/schema/table privileges. |
| UDFs behave differently or fail | Use a local Python minor version that matches the Databricks Runtime Python; see references/REFERENCES.md. |
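For the first row in particular, this quick diagnostic sketch (standard library only) shows whether a standalone `pyspark` is installed alongside the client:

```python
# Diagnostic sketch: if both packages report a version, there is a
# conflict; uninstall pyspark (see step 2 above).
from importlib.metadata import PackageNotFoundError, version

for pkg in ("databricks-connect", "pyspark"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```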
### Best Practices
- One env per runtime: Use a separate virtualenv per Databricks Runtime (or major.minor) you target.
- Version alignment: Keep `databricks-connect==<major>.<minor>.*` in sync with the cluster/serverless runtime.
- No standalone PySpark: Do not install `pyspark` in the same environment as `databricks-connect`.
- Prefer serverless for ephemeral workloads: If your workspace supports it, serverless can simplify “no cluster to manage” workflows; check compatibility for your Databricks Connect version.
For detailed requirements, version matrix, and official doc links, see references/REFERENCES.md.
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.