# sas-to-databricks

By dey-abhishek

# Install this skill

Install this specific skill from the multi-skill repository:

    npx skills add dey-abhishek/genie-code-migration-skills --skill "sas-to-databricks"

# Description

Migrate SAS (Base SAS, SAS Viya, Enterprise Miner, Enterprise Guide) workloads to Databricks Lakehouse. Use when planning or executing SAS migrations, converting SAS DATA steps to PySpark/Spark SQL, migrating SAS macros, translating PROC steps, mapping SAS data types, modernizing SAS ETL to SDP pipelines, migrating SAS statistical models to MLflow, or converting SAS reporting to Databricks AI/BI.

# SKILL.md

name: sas-to-databricks
description: Migrate SAS (Base SAS, SAS Viya, Enterprise Miner, Enterprise Guide) workloads to Databricks Lakehouse. Use when planning or executing SAS migrations, converting SAS DATA steps to PySpark/Spark SQL, migrating SAS macros, translating PROC steps, mapping SAS data types, modernizing SAS ETL to SDP pipelines, migrating SAS statistical models to MLflow, or converting SAS reporting to Databricks AI/BI.


SAS to Databricks Migration

Expert guidance for migrating SAS workloads (Base SAS, SAS Viya, SAS Grid, Enterprise Miner, Enterprise Guide) to Databricks Lakehouse Platform, covering code conversion, statistical model migration, ETL modernization, and reporting replacement.

Capabilities

  • Convert SAS DATA steps to PySpark DataFrames or Spark SQL
  • Translate SAS PROC steps (PROC SQL, PROC SORT, PROC MEANS, etc.) to Databricks equivalents
  • Migrate SAS macros (%macro/%mend) to Python functions, Jinja templates, or parameterized notebooks
  • Map SAS data types (character, numeric, date/datetime/time) to Spark/Delta types
  • Convert SAS formats and informats to Databricks SQL functions
  • Migrate SAS datasets (.sas7bdat) to Delta Lake tables in Unity Catalog
  • Convert SAS libraries (LIBNAME) to Unity Catalog catalogs/schemas
  • Modernize SAS ETL jobs to Spark Declarative Pipelines (SDP) with medallion architecture
  • Migrate SAS Enterprise Miner models to MLflow + Databricks Model Serving
  • Migrate SAS statistical procedures (PROC LOGISTIC, PROC GLM, etc.) to scikit-learn, statsmodels, or Spark ML
  • Convert SAS Visual Analytics / Enterprise Guide reports to Databricks AI/BI Dashboards
  • Replace SAS scheduling (LSF, Control-M) with Databricks Workflows
  • Plan phased migration with validation checkpoints

Workflow

Phase 1: Assessment & Inventory

  1. Inventory SAS programs, macros, datasets, libraries, scheduled jobs
  2. Classify workloads: ETL, reporting, statistical modeling, ML
  3. Identify SAS products in use (Base, Viya, EG, EM, VA, etc.)
  4. Assess code complexity (DATA steps, PROC SQL, macros, custom functions)
  5. Map data sources and downstream consumers
  6. Reference: 1-assessment-inventory.md
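To make the inventory step concrete, here is a minimal sketch of a code profiler for the assessment phase. `profile_sas_source` is a hypothetical helper (not part of this skill's reference files) that counts DATA steps, PROC invocations, and macro definitions in a SAS program as a coarse complexity signal:

```python
import re
from collections import Counter

def profile_sas_source(code: str) -> Counter:
    """Rough complexity profile of one SAS program: counts of DATA steps,
    each PROC by name, and %macro definitions. A coarse inventory signal only;
    it ignores comments and mid-line statements."""
    counts = Counter()
    counts["data_steps"] = len(re.findall(r"(?im)^\s*data\s+\w", code))
    for proc in re.findall(r"(?im)^\s*proc\s+(\w+)", code):
        counts[f"proc_{proc.lower()}"] += 1
    counts["macros"] = len(re.findall(r"(?i)%macro\s+\w+", code))
    return counts

sample = """
data work.out; set work.in; run;
proc sql; create table t as select * from x; quit;
%macro score(ds); proc means data=&ds; run; %mend;
"""
print(profile_sas_source(sample))
```

Running such a profiler over all programs gives the raw numbers for classifying workloads in step 2 and sizing the conversion effort in step 4.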

Phase 2: Data Migration & Type Mapping

  1. Convert SAS datasets (.sas7bdat) to Delta Lake tables
  2. Map SAS data types to Spark types (character→STRING, numeric→DOUBLE/DECIMAL, etc.)
  3. Migrate SAS formats/informats to SQL functions or Python formatters
  4. Convert SAS libraries (LIBNAME) to Unity Catalog catalog.schema
  5. Handle SAS-specific concepts: missing values, variable lengths, label/format metadata
  6. Reference: 2-data-migration-type-mapping.md
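The core type mapping from step 2 can be captured as a small lookup table. This is an illustrative sketch (the default choices here, such as DOUBLE for generic numerics, are assumptions; use DECIMAL(p,s) where exactness matters):

```python
# Illustrative SAS -> Spark type mapping (defaults are assumptions).
SAS_TO_SPARK = {
    "char": "STRING",
    "num": "DOUBLE",          # every SAS numeric is an 8-byte float
    "date": "DATE",           # SAS date = days since 1960-01-01
    "datetime": "TIMESTAMP",  # SAS datetime = seconds since 1960-01-01
    "time": "STRING",         # Spark has no TIME type; keep as string or seconds
}

def spark_column_ddl(name: str, sas_type: str) -> str:
    """Render one column of a CREATE TABLE clause from a SAS column spec."""
    return f"{name} {SAS_TO_SPARK[sas_type]}"

ddl = ", ".join(spark_column_ddl(n, t) for n, t in
                [("cust_id", "num"), ("name", "char"), ("open_dt", "date")])
print(ddl)  # cust_id DOUBLE, name STRING, open_dt DATE
```

Note that the date/datetime epochs differ (SAS counts from 1960, Unix from 1970), so raw numeric date values must be converted, not just cast.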

Phase 3: DATA Step Conversion

  1. Map SAS DATA step patterns to PySpark DataFrame operations
  2. Convert SET/MERGE/UPDATE to joins, unions, and MERGE INTO
  3. Translate BY-group processing to window functions and groupBy
  4. Handle retain variables, first./last. logic, arrays, output statements
  5. Convert SAS functions (PUT, INPUT, SUBSTR, SCAN, etc.) to Spark equivalents
  6. Reference: 3-data-step-conversion.md
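The BY-group semantics behind first./last. and RETAIN (steps 3-4) can be illustrated in plain Python before translating to PySpark, where the same flags typically come from `row_number()` over a window partitioned by the BY variables. A minimal sketch of the SAS semantics on BY-sorted rows:

```python
from itertools import groupby

# SAS first./last. and RETAIN semantics on BY-sorted rows, in plain Python.
rows = [  # already sorted BY id, date (as a SAS DATA step would require)
    {"id": 1, "date": "2024-01-01", "amt": 10},
    {"id": 1, "date": "2024-02-01", "amt": 20},
    {"id": 2, "date": "2024-01-15", "amt": 5},
]

out = []
for _, grp in groupby(rows, key=lambda r: r["id"]):
    grp = list(grp)
    running = 0  # RETAIN-style accumulator, reset at each BY group
    for i, r in enumerate(grp):
        running += r["amt"]
        out.append({**r,
                    "first": i == 0,                # first.id
                    "last": i == len(grp) - 1,      # last.id
                    "running_amt": running})

print(out[1])  # id 1's last row: first=False, last=True, running_amt=30
```

In PySpark, `first`/`last` map to `row_number()` ascending and descending over the window, and the running sum to `sum().over(window.rowsBetween(Window.unboundedPreceding, 0))`.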

Phase 4: PROC Step Migration

  1. Map SAS PROCs to Databricks equivalents (SQL, PySpark, or Python libraries)
  2. Convert PROC SQL to Databricks SQL (mostly 1:1)
  3. Replace PROC SORT with ORDER BY or DataFrame .sort()
  4. Migrate PROC MEANS/SUMMARY/FREQ to aggregate SQL or pandas
  5. Convert PROC TRANSPOSE to PIVOT/UNPIVOT
  6. Migrate PROC IMPORT/EXPORT to read_files() / write patterns
  7. Reference: 4-proc-step-migration.md
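PROC translation is often mechanical enough to template. The sketch below shows a hypothetical `proc_means_to_sql` helper covering PROC MEANS's default N/MEAN/STD/MIN/MAX statistics, with a CLASS statement becoming GROUP BY (an illustration of the approach, not a complete translator):

```python
def proc_means_to_sql(table, var, class_=None):
    """Sketch: PROC MEANS -> Databricks SQL for the default statistics.
    CLASS variables (if any) become GROUP BY columns."""
    stats = (f"COUNT({var}) AS n, AVG({var}) AS mean, STDDEV({var}) AS std, "
             f"MIN({var}) AS min, MAX({var}) AS max")
    if class_:
        cols = ", ".join(class_)
        return f"SELECT {cols}, {stats} FROM {table} GROUP BY {cols}"
    return f"SELECT {stats} FROM {table}"

# PROC MEANS DATA=sales; CLASS region; VAR revenue; RUN;
sql = proc_means_to_sql("sales", "revenue", ["region"])
print(sql)
```

One caveat worth validating in Phase 10: SAS STDDEV and Spark `STDDEV` both default to the sample standard deviation, but other statistics (e.g. percentile definitions in PROC UNIVARIATE) can differ between engines.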

Phase 5: SAS Macro Migration

  1. Convert %macro/%mend to Python functions
  2. Migrate macro variables (%LET, &var) to Python variables or widget parameters
  3. Handle %DO loops, %IF/%THEN, %INCLUDE
  4. Convert autocall macro libraries to shared Python modules in Unity Catalog Volumes
  5. Replace SAS macro-driven code generation with Jinja templates or parameterized queries
  6. Reference: 5-macro-migration.md
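Steps 1-2 in practice: a SAS macro that generates code becomes a Python function that builds a query string (typically executed with `spark.sql(...)`, omitted here). A minimal sketch; prefer parameterized queries over raw f-strings when values come from untrusted input:

```python
# SAS:
#   %macro monthly_extract(month, table);
#     proc sql; create table out_&month. as
#       select * from &table. where month = "&month."; quit;
#   %mend;
#
# Python: macro parameters become function arguments, and &var
# substitution becomes f-string interpolation.
def monthly_extract(month: str, table: str) -> str:
    return (f"CREATE TABLE out_{month} AS "
            f"SELECT * FROM {table} WHERE month = '{month}'")

print(monthly_extract("202401", "catalog.schema.sales"))
```

%DO loops become ordinary `for` loops over such calls, and %IF/%THEN becomes plain `if`, which is usually a net simplification.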

Phase 6: Statistical Model Migration

  1. Classify models by type (regression, classification, survival, time series)
  2. Map SAS procedures to Python equivalents (PROC LOGISTIC → scikit-learn/statsmodels)
  3. Migrate SAS Enterprise Miner workflows to MLflow experiments
  4. Convert model scoring code to MLflow model serving
  5. Handle SAS-specific statistical outputs (ODS, parameter estimates, fit statistics)
  6. Reference: 6-statistical-model-migration.md
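As a tiny illustration of step 2, here is a PROC LOGISTIC analogue in scikit-learn on a toy separable dataset. One mapping detail worth noting: PROC LOGISTIC fits an unpenalized maximum-likelihood model, while `LogisticRegression` applies L2 regularization by default, so a very large `C` is used here to approximate the unregularized fit (in practice, validate coefficients against the SAS parameter estimates in Phase 10):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Large C ~ no regularization, approximating PROC LOGISTIC's MLE.
model = LogisticRegression(C=1e9)
model.fit(X, y)
print(model.predict([[0.5], [4.5]]))  # [0 1]
```

For SAS-style inference output (parameter estimates, standard errors, p-values as in the ODS tables), `statsmodels` is usually the closer match; scikit-learn is the better fit for scoring pipelines behind Model Serving.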

Phase 7: ETL Pipeline Modernization

  1. Map SAS ETL patterns to SDP medallion architecture (Bronze/Silver/Gold)
  2. Convert SAS batch jobs to Spark Declarative Pipelines
  3. Modernize scheduling from SAS Grid/LSF/Control-M to Databricks Workflows
  4. Replace SAS DI Studio flows with SDP streaming tables and materialized views
  5. Reference: 7-etl-pipeline-modernization.md

Phase 8: Reporting & BI Migration

  1. Map SAS Visual Analytics dashboards to Databricks AI/BI Dashboards
  2. Convert SAS Enterprise Guide projects to notebooks
  3. Replace PROC REPORT/TABULATE with SQL + AI/BI
  4. Migrate SAS stored processes to Databricks SQL queries or Apps
  5. Reference: 8-reporting-bi-migration.md

Phase 9: Performance Tuning

  1. Replace SAS indexing with Liquid Clustering
  2. Optimize PySpark for SAS-like row-by-row processing patterns
  3. Tune parallelism (SAS threads → Spark partitions)
  4. Handle large-scale sorts and merges efficiently
  5. Reference: 9-performance-tuning.md

Phase 10: Validation & Testing

  1. Row count and value comparison between SAS and Databricks outputs
  2. Statistical result validation (coefficients, p-values, confidence intervals)
  3. Performance benchmarking (SAS runtimes vs Databricks)
  4. Parallel-run strategy for production cutover
  5. Reference: 10-validation-testing.md
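The row-count and value checks in step 1 can be sketched as a small comparison helper (a hypothetical illustration of the parallel-run check, not a full reconciliation framework; floating-point columns are compared within a tolerance because SAS and Spark arithmetic can differ in the last bits):

```python
import math

def compare_outputs(sas_rows, dbx_rows, key, tol=1e-9):
    """Parallel-run check: same row count, and values equal (within tol for
    floats) when SAS and Databricks rows are matched on a key column."""
    if len(sas_rows) != len(dbx_rows):
        return False, f"row count mismatch: {len(sas_rows)} vs {len(dbx_rows)}"
    dbx_by_key = {r[key]: r for r in dbx_rows}
    for s in sas_rows:
        d = dbx_by_key.get(s[key])
        if d is None:
            return False, f"key {s[key]} missing from Databricks output"
        for col, v in s.items():
            if isinstance(v, float):
                if not math.isclose(v, d[col], abs_tol=tol):
                    return False, f"{col} differs for key {s[key]}"
            elif v != d[col]:
                return False, f"{col} differs for key {s[key]}"
    return True, "ok"

ok, msg = compare_outputs(
    [{"id": 1, "total": 10.0}], [{"id": 1, "total": 10.0 + 1e-12}], key="id")
print(ok, msg)  # True ok
```

For statistical models, apply the same idea to coefficient tables rather than data rows, with tolerances chosen per metric.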

Quick Reference: Common Conversions

| SAS | Databricks |
| --- | --- |
| `DATA work.output; SET input;` | `df = spark.table("input")` or `SELECT * FROM input` |
| `PROC SQL; CREATE TABLE ...` | `CREATE TABLE catalog.schema.table AS SELECT ...` |
| `PROC SORT DATA=x; BY a b;` | `SELECT * FROM x ORDER BY a, b` |
| `PROC MEANS DATA=x; VAR y;` | `SELECT AVG(y), STDDEV(y), MIN(y), MAX(y) FROM x` |
| `PROC FREQ DATA=x; TABLES a*b;` | `SELECT a, b, COUNT(*) FROM x GROUP BY a, b` |
| `PROC TRANSPOSE` | `PIVOT` / `UNPIVOT` |
| `PROC LOGISTIC` | `sklearn.linear_model.LogisticRegression` or statsmodels |
| `PROC REG` / `PROC GLM` | `sklearn.linear_model.LinearRegression` or statsmodels OLS |
| `PROC IMPORT` | `read_files()` / `spark.read.csv()` |
| `PROC EXPORT` | `df.write.csv()` / `COPY INTO` |
| `%MACRO name(); ... %MEND;` | `def name(): ...` (Python function) |
| `%LET var = value;` | `var = "value"` or widget parameter |
| `&var` (macro variable) | f-string: `f"...{var}..."` |
| `LIBNAME lib '/path';` | `USE CATALOG catalog; USE SCHEMA schema;` |
| `.sas7bdat` dataset | Delta Lake table in Unity Catalog |
| missing value (`.`) | `NULL` |
| character(n) | `STRING` |
| numeric (8 bytes) | `DOUBLE` or `DECIMAL(p,s)` |
| date (SAS date value) | `DATE` |
| datetime (SAS datetime) | `TIMESTAMP` |
| `PUT(var, format.)` | `date_format()`, `format_number()`, `printf()` |
| `INPUT(var, informat.)` | `CAST()`, `to_date()`, `to_timestamp()` |
| `SUBSTR(str, pos, len)` | `SUBSTRING(str, pos, len)` |
| `SCAN(str, n, delim)` | `split(str, delim)[n-1]` |
| `COMPRESS(str)` | `regexp_replace(str, '\\s+', '')` |
| `CATX(sep, a, b, c)` | `concat_ws(sep, a, b, c)` |
| `INTCK('MONTH', d1, d2)` | `months_between(d2, d1)` |
| `INTNX('MONTH', d, n)` | `add_months(d, n)` |
| `FIRST.var` / `LAST.var` | `ROW_NUMBER()` window function |
| `RETAIN` | `LAG()` window function or stateful UDF |
| `ARRAY` (SAS array) | `array()` or multiple columns with `stack()`/unpivot |
| `OUTPUT` statement | multiple writes or conditional `CASE WHEN` |
| SAS Grid / LSF scheduling | Databricks Workflows |
| SAS Enterprise Miner | MLflow + Mosaic AI |
| SAS Visual Analytics | Databricks AI/BI Dashboards |
| SAS Enterprise Guide | Databricks Notebooks |
| SAS DI Studio | Spark Declarative Pipelines (SDP) |
| SAS/ACCESS engines | Lakehouse Federation / Lakeflow Connect |
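A few of the function mappings above hide off-by-one or semantic differences worth knowing. This plain-Python sketch illustrates them: SAS `SCAN` is 1-based while Spark's `split()[...]` index is 0-based (hence the `n-1`), and `INTCK('MONTH', ...)` counts month *boundaries crossed*, which only matches `months_between()` on aligned dates:

```python
import re
from datetime import date

def scan(s: str, n: int, delim: str) -> str:
    """SAS SCAN(s, n, delim): 1-based word extraction."""
    return s.split(delim)[n - 1]

def intck_month(d1: date, d2: date) -> int:
    """SAS INTCK('MONTH', d1, d2): number of month boundaries crossed."""
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

print(scan("a,b,c", 2, ","))                              # b
print(intck_month(date(2024, 1, 31), date(2024, 2, 1)))   # 1: a boundary crossed
print(re.sub(r"\s+", "", " a b  c "))                     # abc, like COMPRESS()
```

Note the INTCK example: Jan 31 to Feb 1 is one month boundary in SAS, whereas Spark's `months_between` would return roughly 0.03 for the same dates, so date-interval logic deserves explicit validation rather than a blind swap.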

Examples

Example: Migrate a SAS DATA step

User says: "Convert this SAS DATA step with MERGE and first./last. logic to PySpark"
Result: Converts to PySpark with proper join, window functions for BY-group processing, and RETAIN equivalent

Example: Convert a PROC step

User says: "Rewrite this PROC MEANS with CLASS and OUTPUT to Databricks SQL"
Result: Translates to GROUP BY aggregation with proper statistics mapping

Example: Plan a full SAS migration

User says: "Help me migrate our SAS environment with 500 programs to Databricks"
Result: Generates phased migration plan with assessment, code conversion, model migration, ETL modernization, and parallel-run validation

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.