Install this specific skill from the multi-skill repository:

```shell
npx skills add dey-abhishek/genie-code-migration-skills --skill "greenplum-to-databricks"
```
# Description
Migrate Greenplum (GPDB) databases to Databricks Lakehouse. Use when planning or executing Greenplum migrations, converting GP SQL to Spark SQL, mapping data types, migrating stored procedures, external tables, distribution strategies, or ETL pipelines from Greenplum to Databricks Unity Catalog and Spark Declarative Pipelines.
# SKILL.md
```yaml
name: greenplum-to-databricks
description: Migrate Greenplum (GPDB) databases to Databricks Lakehouse. Use when planning or executing Greenplum migrations, converting GP SQL to Spark SQL, mapping data types, migrating stored procedures, external tables, distribution strategies, or ETL pipelines from Greenplum to Databricks Unity Catalog and Spark Declarative Pipelines.
```
## Greenplum to Databricks Migration

Expert guidance for migrating Greenplum (GPDB/Pivotal) data warehouses to the Databricks Lakehouse Platform, covering schema conversion, SQL dialect translation, performance optimization, and pipeline modernization.

### Capabilities
- Convert Greenplum DDL to Databricks SQL with Delta Lake
- Map Greenplum data types to Databricks/Spark types
- Translate GP-specific SQL (window functions, arrays, regex) to Spark SQL
- Migrate distribution/partitioning strategies to Liquid Clustering
- Convert external tables (gpfdist, PXF, S3) to Unity Catalog external locations
- Migrate PL/pgSQL stored procedures to Databricks SQL scripting (DBR 16.3+), stored procedures (DBR 17.0+), Python UDFs, or notebooks
- Modernize ETL pipelines from GP workflows to Spark Declarative Pipelines (SDP)
- Migrate Greenplum transactions to Databricks transaction modes (non-interactive and interactive)
- Convert temporary tables from GP to Databricks temporary tables (SQL warehouse)
- Plan phased migration with validation checkpoints
### Workflow

#### Phase 1: Assessment
- Inventory GP schemas, tables, views, functions, external tables
- Identify data volumes, distribution keys, partition strategies
- Catalog PL/pgSQL procedures and their complexity
- Map dependencies and execution order
- Reference: 1-assessment-checklist.md
#### Phase 2: Schema & DDL Migration
- Convert GP DDL to Databricks SQL (Delta tables in Unity Catalog)
- Apply data type mappings (see reference)
- Replace `DISTRIBUTED BY` with `CLUSTER BY` (Liquid Clustering)
- Convert range/list partitions to Delta partitioning or clustering
- Reference: 2-schema-ddl-migration.md
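The DDL conversion in this phase can be sketched as a before/after pair. This is a minimal illustration, not output from the reference; the `orders` table, `main` catalog, and column names are made up:

```sql
-- Greenplum source DDL
CREATE TABLE sales.orders (
    order_id   bigserial,
    customer   text,
    amount     numeric(12,2),
    order_date timestamp with time zone
)
DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_date);

-- Databricks target: Delta table in Unity Catalog with Liquid Clustering
CREATE TABLE main.sales.orders (
    order_id   BIGINT GENERATED ALWAYS AS IDENTITY,
    customer   STRING,
    amount     DECIMAL(12,2),
    order_date TIMESTAMP  -- timestamptz values are normalized to UTC
)
CLUSTER BY (order_date, order_id);
```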
#### Phase 3: SQL Dialect Conversion
- Identify target runtime (DBSQL, Spark SQL, or SDP) — see runtime matrix in reference
- Translate GP-specific functions to Databricks equivalents
- Convert `::` casting to `CAST()` syntax (or keep it — Databricks supports `::`)
- Rewrite GP array operations, regex, string functions
- Handle `ILIKE`, `SIMILAR TO`, interval arithmetic
- Watch for behavioral differences (NULL ordering, integer division, concat with NULL)
- Convert GP transactions (`BEGIN ... COMMIT`) to Databricks transaction modes — see 10-transactions.md
- Migrate temporary tables (`CREATE TEMP TABLE`) to Databricks temporary tables — see 11-temporary-tables.md
- Reference: 3-sql-dialect-conversion.md
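A small before/after query illustrates the dialect changes above (table and column names are hypothetical):

```sql
-- Greenplum
SELECT order_id::text,
       customer
FROM   orders
WHERE  customer ILIKE 'acme%'
  AND  order_date > NOW() - INTERVAL '30 days';

-- Databricks SQL (ILIKE and :: are natively supported; CAST shown for clarity)
SELECT CAST(order_id AS STRING),
       customer
FROM   orders
WHERE  customer ILIKE 'acme%'
  AND  order_date > current_timestamp() - INTERVAL 30 DAYS;
```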
#### Phase 4: Data Type Mapping
- Map all GP types to Spark/Delta types
- Handle GP-specific types (money, inet, cidr, macaddr, geometric)
- Convert array types and composite types
- Reference: 4-data-type-mapping.md
#### Phase 5: Stored Procedure Migration
- Classify procedures by complexity (simple SQL, cursor-based, dynamic SQL)
- Convert simple SQL procedures to Databricks SQL functions
- Migrate complex PL/pgSQL to SQL scripting (DBR 16.3+), stored procedures (DBR 17.0+), or Python UDFs
- Migrate triggers to SDP expectations or notebook workflows
- Handle transactions within procedures — see 10-transactions.md
- Migrate temporary tables used in procedures — see 11-temporary-tables.md
- Reference: 5-stored-procedure-migration.md
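As a sketch of the simplest case, a set-based PL/pgSQL body can map almost one-to-one onto a Databricks SQL scripting compound statement (DBR 16.3+). The `daily_summary` and `orders` names are illustrative:

```sql
-- Greenplum PL/pgSQL function
CREATE FUNCTION refresh_daily() RETURNS void AS $$
BEGIN
    DELETE FROM daily_summary WHERE summary_date = CURRENT_DATE;
    INSERT INTO daily_summary
    SELECT CURRENT_DATE, count(*) FROM orders
    WHERE order_date::date = CURRENT_DATE;
END;
$$ LANGUAGE plpgsql;

-- Databricks SQL scripting compound statement (DBR 16.3+)
BEGIN
    DELETE FROM daily_summary WHERE summary_date = current_date();
    INSERT INTO daily_summary
    SELECT current_date(), count(*) FROM orders
    WHERE CAST(order_date AS DATE) = current_date();
END;
```

On DBR 17.0+ the same body could instead be wrapped in `CREATE PROCEDURE` and invoked with `CALL`.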
#### Phase 6: External Table & Data Loading
- Map gpfdist/PXF external tables to Unity Catalog external locations
- Convert `CREATE EXTERNAL TABLE` to `read_files()` or Auto Loader
- Migrate S3/GCS/ADLS references to Volumes or external locations
- Reference: 6-external-tables-data-loading.md
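A sketch of the external-table conversion; the host, bucket path, and table names are invented for illustration:

```sql
-- Greenplum readable external table served by gpfdist
CREATE EXTERNAL TABLE ext_orders (LIKE orders)
LOCATION ('gpfdist://etl-host:8081/orders*.csv')
FORMAT 'CSV' (HEADER);

-- Databricks: query the files directly from an external location
SELECT * FROM read_files(
    's3://my-bucket/landing/orders/',
    format => 'csv',
    header => true
);
```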
#### Phase 7: ETL Pipeline Modernization
- Map GP ETL patterns to SDP medallion architecture
- Convert `INSERT INTO ... SELECT` chains to streaming tables
- Modernize scheduling from cron/GP orchestration to Databricks Jobs
- Reference: 7-etl-pipeline-modernization.md
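One possible shape for this modernization, assuming illustrative `bronze_orders`/`silver_orders` names in a Spark Declarative Pipeline:

```sql
-- Greenplum batch ETL step
INSERT INTO silver_orders
SELECT * FROM staging_orders WHERE amount IS NOT NULL;

-- Spark Declarative Pipelines: streaming table with a data-quality expectation
CREATE OR REFRESH STREAMING TABLE silver_orders (
    CONSTRAINT amount_present EXPECT (amount IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(bronze_orders);
```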
#### Phase 8: Performance Tuning
- Replace distribution keys with Liquid Clustering columns
- Convert GP indexes to Z-ORDER or clustering strategies
- Optimize joins (broadcast hints, AQE)
- Tune file sizes and compaction
- Reference: 8-performance-tuning.md
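A few representative tuning statements (table and column names are illustrative):

```sql
-- Switch an existing Delta table to Liquid Clustering on the hot filter columns
ALTER TABLE main.sales.orders CLUSTER BY (order_date, customer_id);

-- Trigger incremental clustering and file compaction
OPTIMIZE main.sales.orders;

-- Broadcast the smaller side of a join explicitly (AQE often does this automatically)
SELECT /*+ BROADCAST(d) */ o.*, d.region
FROM main.sales.orders o
JOIN main.sales.dim_customer d ON o.customer_id = d.customer_id;
```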
#### Phase 9: Validation & Testing
- Row count and checksum validation
- Query result comparison
- Performance benchmarking
- Reference: 9-validation-testing.md
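A minimal validation sketch: run the count on both systems and compare, and use an order-independent aggregate for checksums. Names are illustrative, and the Greenplum side needs an equivalent hash expression of its own:

```sql
-- Row-count parity (run on each side and compare)
SELECT count(*) FROM main.sales.orders;

-- Order-independent checksum over a column set (Databricks side)
SELECT sum(hash(order_id, customer, amount)) AS checksum
FROM main.sales.orders;
```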
### Quick Reference: Common Conversions
| Greenplum | Databricks |
|---|---|
| `DISTRIBUTED BY (col)` | `CLUSTER BY (col)` |
| `PARTITION BY RANGE (date)` | `CLUSTER BY (date)` or partitioning |
| `CREATE EXTERNAL TABLE ... LOCATION` | `read_files()` / external location |
| `serial` / `bigserial` | `BIGINT` + `GENERATED ALWAYS AS IDENTITY` |
| `text` / `varchar` | `STRING` |
| `numeric(p,s)` | `DECIMAL(p,s)` |
| `boolean` | `BOOLEAN` |
| `timestamp with time zone` | `TIMESTAMP` (UTC normalized) |
| `jsonb` / `json` | `VARIANT` (preferred), `OBJECT`, or `STRING` (parse with `:` or `from_json()`) |
| `array[]` | `ARRAY<>` |
| `col::type` | `CAST(col AS type)` |
| `ILIKE` | `ILIKE` (natively supported) |
| `IS DISTINCT FROM` | `IS DISTINCT FROM` (natively supported) |
| `NOW()` | `current_timestamp()` |
| `EXTRACT(EPOCH FROM ts)` | `unix_timestamp(ts)` |
| `string_agg()` | `string_agg()` / `listagg()` (native) |
| `generate_series()` | `explode(sequence())` |
| `hstore` | `MAP<STRING, STRING>` |
| `age(ts1, ts2)` | `FLOOR(MONTHS_BETWEEN(ts1, ts2) / 12)` |
| `ENCODE` / `DECODE` | `base64()` / `unbase64()` / `hex()` |
| `aes_encrypt` / `aes_decrypt` | `aes_encrypt()` / `aes_decrypt()` (native) |
| `ST_GeomFromText` etc. | Native `GEOMETRY`/`GEOGRAPHY` + H3 |
| `xpath()` | `xpath()` / `xpath_string()` (native) |
| `to_tsvector` / `tsquery` | `RLIKE` or Vector Search |
| `postgres_fdw` | Lakehouse Federation (`CREATE CONNECTION`) |
| `INHERITS` | Single table + type column, or `UNION ALL` view |
| PL/pgSQL function | SQL scripting (DBR 16.3+), Python UDF, or SQL function |
| `CREATE PROCEDURE` | `CREATE PROCEDURE` / `CALL` (DBR 17.0+, Unity Catalog) |
| PL/pgSQL `CURSOR` | `DECLARE CURSOR` / `OPEN` / `FETCH` / `CLOSE` (DBR 18.1+) |
| `BEGIN...END` block | SQL compound statements (`IF`, `LOOP`, `WHILE`, `FOR`, `CASE`) (DBR 16.3+) |
| `EXECUTE` query (dynamic SQL) | `EXECUTE IMMEDIATE` (DBR 16.3+) |
| `RAISE EXCEPTION` | `SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'msg'` (DBR 18.1+) |
| `EXCEPTION WHEN ... THEN` | `DECLARE HANDLER FOR ...` (DBR 18.1+) |
| `array_to_string()` | `array_join()` |
| `col::type` (unsafe) | `try_cast(col AS type)` (NULL on failure) |
| `json_object_keys()` | `json_object_keys()` (native) |
| `from_xml()` / `xmltable()` | `from_xml()` / `schema_of_xml()` (native) |
| `percentile_cont` / `percentile_disc` | `percentile_cont()` / `percentile_disc()` (native) |
| `mode()` / `median` | `mode()` / `median()` (native) |
| `gpfdist` | Auto Loader / `read_files()` |
| `BEGIN; ... COMMIT;` | `BEGIN ATOMIC ... END;` (non-interactive) or `BEGIN TRANSACTION; ... COMMIT;` (interactive) |
| `CREATE TEMP TABLE` | `CREATE TEMPORARY TABLE` (SQL warehouse only) |
| `ALTER TABLE temp_table ADD COLUMN` | Drop and recreate (schema modifications not supported) |
### Examples

**Example: Migrate a GP table**
- User says: "Convert this Greenplum CREATE TABLE to Databricks"
- Result: Converts DDL with proper type mapping, replaces `DISTRIBUTED BY` with `CLUSTER BY`, adds to Unity Catalog

**Example: Convert GP SQL query**
- User says: "Rewrite this Greenplum query for Spark SQL"
- Result: Translates GP-specific syntax (`::`, `ILIKE`, arrays, intervals) to Spark SQL equivalents

**Example: Plan a full migration**
- User says: "Help me migrate our Greenplum warehouse to Databricks"
- Result: Generates a phased migration plan with assessment, schema conversion, data loading, ETL modernization, and validation steps
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.