# greenplum-to-databricks

by dey-abhishek
# Install this skill

Install this specific skill from the multi-skill repository:

    npx skills add dey-abhishek/genie-code-migration-skills --skill "greenplum-to-databricks"

# Description

Migrate Greenplum (GPDB) databases to Databricks Lakehouse. Use when planning or executing Greenplum migrations, converting GP SQL to Spark SQL, mapping data types, migrating stored procedures, external tables, distribution strategies, or ETL pipelines from Greenplum to Databricks Unity Catalog and Spark Declarative Pipelines.

# SKILL.md

---
name: greenplum-to-databricks
description: Migrate Greenplum (GPDB) databases to Databricks Lakehouse. Use when planning or executing Greenplum migrations, converting GP SQL to Spark SQL, mapping data types, migrating stored procedures, external tables, distribution strategies, or ETL pipelines from Greenplum to Databricks Unity Catalog and Spark Declarative Pipelines.
---

## Greenplum to Databricks Migration

Expert guidance for migrating Greenplum (GPDB/Pivotal) data warehouses to Databricks Lakehouse Platform, covering schema conversion, SQL dialect translation, performance optimization, and pipeline modernization.

## Capabilities

  • Convert Greenplum DDL to Databricks SQL with Delta Lake
  • Map Greenplum data types to Databricks/Spark types
  • Translate GP-specific SQL (window functions, arrays, regex) to Spark SQL
  • Migrate distribution/partitioning strategies to Liquid Clustering
  • Convert external tables (gpfdist, PXF, S3) to Unity Catalog external locations
  • Migrate PL/pgSQL stored procedures to Databricks SQL scripting (DBR 16.3+), stored procedures (DBR 17.0+), Python UDFs, or notebooks
  • Modernize ETL pipelines from GP workflows to Spark Declarative Pipelines (SDP)
  • Migrate Greenplum transactions to Databricks transaction modes (non-interactive and interactive)
  • Convert temporary tables from GP to Databricks temporary tables (SQL warehouse)
  • Plan phased migration with validation checkpoints

## Workflow

### Phase 1: Assessment

  1. Inventory GP schemas, tables, views, functions, external tables
  2. Identify data volumes, distribution keys, partition strategies
  3. Catalog PL/pgSQL procedures and their complexity
  4. Map dependencies and execution order
  5. Reference: 1-assessment-checklist.md
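Much of steps 1–2 can be scripted against the Greenplum system catalogs. A hedged sketch (catalog column names vary between GP versions, and the schema filter list is illustrative; verify against your release):

```sql
-- Inventory user tables with their distribution policy (Greenplum 6-style catalog).
SELECT n.nspname AS schema_name,
       c.relname AS table_name,
       p.policytype,   -- 'p' = hash/randomly distributed, 'r' = replicated
       p.distkey       -- attnums of the distribution key columns
FROM   pg_class c
JOIN   pg_namespace n ON n.oid = c.relnamespace
LEFT   JOIN gp_distribution_policy p ON p.localoid = c.oid
WHERE  c.relkind = 'r'
  AND  n.nspname NOT IN ('pg_catalog', 'information_schema', 'gp_toolkit');
```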

### Phase 2: Schema & DDL Migration

  1. Convert GP DDL to Databricks SQL (Delta tables in Unity Catalog)
  2. Apply data type mappings (see reference)
  3. Replace DISTRIBUTED BY with CLUSTER BY (Liquid Clustering)
  4. Convert range/list partitions to Delta partitioning or clustering
  5. Reference: 2-schema-ddl-migration.md
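A before/after sketch of steps 1–4 (table, column, and catalog names are illustrative; GP partition clauses simplified):

```sql
-- Greenplum original
CREATE TABLE sales.orders (
    order_id  bigserial,
    customer  text,
    amount    numeric(12,2),
    order_ts  timestamp with time zone
)
DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_ts)
  (START (date '2023-01-01') END (date '2024-01-01') EVERY (INTERVAL '1 month'));

-- Databricks equivalent: Delta table in Unity Catalog.
-- The distribution key is dropped (Databricks manages data placement);
-- cluster on the columns queries actually filter on.
CREATE TABLE main.sales.orders (
    order_id  BIGINT GENERATED ALWAYS AS IDENTITY,
    customer  STRING,
    amount    DECIMAL(12,2),
    order_ts  TIMESTAMP
)
CLUSTER BY (order_ts);
```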

### Phase 3: SQL Dialect Conversion

  1. Identify target runtime (DBSQL, Spark SQL, or SDP); see the runtime matrix in the reference
  2. Translate GP-specific functions to Databricks equivalents
  3. Convert :: casting to CAST() syntax (or keep β€” Databricks supports ::)
  4. Rewrite GP array operations, regex, string functions
  5. Handle ILIKE, SIMILAR TO, interval arithmetic
  6. Watch for behavioral differences (NULL ordering, integer division, concat with NULL)
  7. Convert GP transactions (BEGIN ... COMMIT) to Databricks transaction modes (see 10-transactions.md)
  8. Migrate temporary tables (CREATE TEMP TABLE) to Databricks temporary tables (see 11-temporary-tables.md)
  9. Reference: 3-sql-dialect-conversion.md
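A small before/after sketch of typical rewrites (table and column names are illustrative; `::` and `ILIKE` also work natively on Databricks, so these rewrites are optional there):

```sql
-- Greenplum
SELECT id::text AS id, name
FROM   customers
WHERE  name ILIKE '%smith%'
  AND  created_at > now() - interval '30 days';

-- Spark SQL
SELECT CAST(id AS STRING) AS id, name
FROM   customers
WHERE  name ILIKE '%smith%'
  AND  created_at > current_timestamp() - INTERVAL 30 DAYS;

-- generate_series(1, 5)  becomes:
SELECT explode(sequence(1, 5)) AS n;
```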

### Phase 4: Data Type Mapping

  1. Map all GP types to Spark/Delta types
  2. Handle GP-specific types (money, inet, cidr, macaddr, geometric)
  3. Convert array types and composite types
  4. Reference: 4-data-type-mapping.md
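For GP-specific types with no direct Spark equivalent, the usual fallback is an exact numeric or string representation. A sketch (table and column names are hypothetical):

```sql
-- Greenplum
CREATE TABLE net_log (price money, client inet, net cidr, mac macaddr);

-- Databricks
CREATE TABLE main.default.net_log (
    price  DECIMAL(19,4),  -- money: use DECIMAL to avoid float rounding
    client STRING,         -- inet: store as text; validate with regexp functions
    net    STRING,         -- cidr
    mac    STRING          -- macaddr
);
```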

### Phase 5: Stored Procedure Migration

  1. Classify procedures by complexity (simple SQL, cursor-based, dynamic SQL)
  2. Convert simple SQL procedures to Databricks SQL functions
  3. Migrate complex PL/pgSQL to SQL scripting (DBR 16.3+), stored procedures (DBR 17.0+), or Python UDFs
  4. Migrate triggers to SDP expectations or notebook workflows
  5. Handle transactions within procedures (see 10-transactions.md)
  6. Migrate temporary tables used in procedures (see 11-temporary-tables.md)
  7. Reference: 5-stored-procedure-migration.md
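A sketch of migrating a simple PL/pgSQL procedure to a Databricks SQL procedure (DBR 17.0+). Names are illustrative, and the exact procedure syntax should be verified against your DBR release:

```sql
-- Greenplum PL/pgSQL (simplified)
CREATE FUNCTION purge_old_orders(days int) RETURNS void AS $$
BEGIN
    DELETE FROM orders WHERE order_ts < now() - (days || ' days')::interval;
END;
$$ LANGUAGE plpgsql;

-- Databricks SQL procedure sketch
CREATE OR REPLACE PROCEDURE main.sales.purge_old_orders(days INT)
LANGUAGE SQL
AS BEGIN
    DELETE FROM main.sales.orders
    WHERE order_ts < current_timestamp() - make_interval(0, 0, 0, days);
END;
```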

### Phase 6: External Table & Data Loading

  1. Map gpfdist/PXF external tables to Unity Catalog external locations
  2. Convert CREATE EXTERNAL TABLE to read_files() or Auto Loader
  3. Migrate S3/GCS/ADLS references to Volumes or external locations
  4. Reference: 6-external-tables-data-loading.md
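A sketch of steps 1–2: a gpfdist-backed external table becomes a direct read from cloud storage (bucket paths and names are illustrative, and the storage location must be registered as a Unity Catalog external location or volume):

```sql
-- Greenplum external table served by gpfdist
CREATE EXTERNAL TABLE staging.clicks_ext (user_id bigint, url text)
LOCATION ('gpfdist://etl-host:8081/clicks*.csv')
FORMAT 'CSV' (HEADER);

-- Databricks: query the files directly
SELECT *
FROM read_files(
    's3://my-bucket/landing/clicks/',
    format => 'csv',
    header => true
);
```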

### Phase 7: ETL Pipeline Modernization

  1. Map GP ETL patterns to SDP medallion architecture
  2. Convert INSERT INTO ... SELECT chains to streaming tables
  3. Modernize scheduling from cron/GP orchestration to Databricks Jobs
  4. Reference: 7-etl-pipeline-modernization.md
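Step 2 as a sketch: a GP batch insert chain becomes an incrementally refreshed streaming table in SDP (paths and names are illustrative):

```sql
-- GP nightly batch:
--   INSERT INTO silver.clicks SELECT ... FROM staging.clicks_ext;

-- SDP equivalent: a streaming table that ingests new files incrementally
CREATE OR REFRESH STREAMING TABLE silver_clicks AS
SELECT user_id, url, current_timestamp() AS ingested_at
FROM STREAM read_files('s3://my-bucket/landing/clicks/', format => 'csv');
```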

### Phase 8: Performance Tuning

  1. Replace distribution keys with Liquid Clustering columns
  2. Convert GP indexes to Z-ORDER or clustering strategies
  3. Optimize joins (broadcast hints, AQE)
  4. Tune file sizes and compaction
  5. Reference: 8-performance-tuning.md
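A sketch of steps 1–3 (table and column names are illustrative):

```sql
-- Liquid Clustering on the columns most queries filter or join on
ALTER TABLE main.sales.orders CLUSTER BY (order_ts, customer_id);

-- Trigger incremental re-clustering and file compaction
OPTIMIZE main.sales.orders;

-- A broadcast hint can stand in for GP's co-located joins on small dimensions
SELECT /*+ BROADCAST(d) */ o.*, d.region
FROM main.sales.orders o
JOIN main.sales.dim_customer d USING (customer_id);
```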

### Phase 9: Validation & Testing

  1. Row count and checksum validation
  2. Query result comparison
  3. Performance benchmarking
  4. Reference: 9-validation-testing.md
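For steps 1–2, run the same portable aggregates on both systems and diff the results. Built-in hash functions differ between engines, so prefer engine-neutral aggregates (or per-row `md5()`, which both support). Names are illustrative:

```sql
-- Run on Greenplum and on Databricks, then compare
SELECT COUNT(*)                           AS row_cnt,
       COUNT(DISTINCT order_id)           AS distinct_ids,
       SUM(CAST(amount AS DECIMAL(38,2))) AS amount_sum
FROM   sales.orders;
```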

## Quick Reference: Common Conversions

| Greenplum | Databricks |
| --- | --- |
| `DISTRIBUTED BY (col)` | `CLUSTER BY (col)` (Liquid Clustering) |
| `PARTITION BY RANGE (date)` | `CLUSTER BY (date)` or Delta partitioning |
| `CREATE EXTERNAL TABLE ... LOCATION` | `read_files()` / external location |
| `serial` / `bigserial` | `BIGINT GENERATED ALWAYS AS IDENTITY` |
| `text` / `varchar` | `STRING` |
| `numeric(p,s)` | `DECIMAL(p,s)` |
| `boolean` | `BOOLEAN` |
| `timestamp with time zone` | `TIMESTAMP` (UTC-normalized) |
| `jsonb` / `json` | `VARIANT` (preferred), or `STRING` parsed with `:` or `from_json()` |
| `array[]` | `ARRAY<>` |
| `col::type` | `CAST(col AS type)` (`::` is also supported) |
| `col::type` (unsafe) | `try_cast(col AS type)` (`NULL` on failure) |
| `ILIKE` | `ILIKE` (natively supported) |
| `IS DISTINCT FROM` | `IS DISTINCT FROM` (natively supported) |
| `NOW()` | `current_timestamp()` |
| `EXTRACT(EPOCH FROM ts)` | `unix_timestamp(ts)` |
| `string_agg()` | `string_agg()` / `listagg()` (native) |
| `generate_series()` | `explode(sequence())` |
| `hstore` | `MAP<STRING, STRING>` |
| `age(ts1, ts2)` | `FLOOR(MONTHS_BETWEEN(ts1, ts2) / 12)` |
| `ENCODE` / `DECODE` | `base64()` / `unbase64()` / `hex()` |
| `aes_encrypt` / `aes_decrypt` | `aes_encrypt()` / `aes_decrypt()` (native) |
| `ST_GeomFromText` etc. | native `GEOMETRY`/`GEOGRAPHY` types + H3 functions |
| `xpath()` | `xpath()` / `xpath_string()` (native) |
| `to_tsvector` / `to_tsquery` | `RLIKE` or Vector Search |
| `postgres_fdw` | Lakehouse Federation (`CREATE CONNECTION`) |
| `INHERITS` | single table + type column, or `UNION ALL` view |
| PL/pgSQL function | SQL scripting (DBR 16.3+), Python UDF, or SQL function |
| `CREATE PROCEDURE` | `CREATE PROCEDURE` / `CALL` (DBR 17.0+, Unity Catalog) |
| PL/pgSQL `CURSOR` | `DECLARE CURSOR` / `OPEN` / `FETCH` / `CLOSE` (DBR 18.1+) |
| `BEGIN...END` block | SQL compound statements (`IF`, `LOOP`, `WHILE`, `FOR`, `CASE`) (DBR 16.3+) |
| `EXECUTE query` (dynamic SQL) | `EXECUTE IMMEDIATE` (DBR 16.3+) |
| `RAISE EXCEPTION` | `SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'msg'` (DBR 18.1+) |
| `EXCEPTION WHEN ... THEN` | `DECLARE HANDLER FOR ...` (DBR 18.1+) |
| `array_to_string()` | `array_join()` |
| `json_object_keys()` | `json_object_keys()` (native) |
| `xmltable()` | `from_xml()` / `schema_of_xml()` (native) |
| `percentile_cont` / `percentile_disc` | `percentile_cont()` / `percentile_disc()` (native) |
| `mode()` / `median()` | `mode()` / `median()` (native) |
| `gpfdist` | Auto Loader / `read_files()` |
| `BEGIN; ... COMMIT;` | `BEGIN ATOMIC ... END;` (non-interactive) or `BEGIN TRANSACTION; ... COMMIT;` (interactive) |
| `CREATE TEMP TABLE` | `CREATE TEMPORARY TABLE` (SQL warehouse only) |
| `ALTER TABLE temp_table ADD COLUMN` | drop and recreate (temp-table schema changes not supported) |

## Examples

### Example: Migrate a GP table

User says: "Convert this Greenplum CREATE TABLE to Databricks"
Result: Converts DDL with proper type mapping, replaces DISTRIBUTED BY with CLUSTER BY, adds to Unity Catalog

### Example: Convert a GP SQL query

User says: "Rewrite this Greenplum query for Spark SQL"
Result: Translates GP-specific syntax (::, ILIKE, arrays, intervals) to Spark SQL equivalents

### Example: Plan a full migration

User says: "Help me migrate our Greenplum warehouse to Databricks"
Result: Generates phased migration plan with assessment, schema conversion, data loading, ETL modernization, and validation steps

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.