# sas-to-databricks

By dey-abhishek

# Install this skill

Install this specific skill from the multi-skill repository:

    npx skills add dey-abhishek/genie-code-migration-skills --skill "sas-to-databricks"

# Description

Migrate SAS (Base SAS, SAS Viya, Enterprise Miner, Enterprise Guide) workloads to Databricks Lakehouse. Use when planning or executing SAS migrations, converting SAS DATA steps to PySpark/Spark SQL, migrating SAS macros, translating PROC steps, mapping SAS data types, modernizing SAS ETL to SDP pipelines, migrating SAS statistical models to MLflow, or converting SAS reporting to Databricks AI/BI.

# SKILL.md

name: sas-to-databricks
description: Migrate SAS (Base SAS, SAS Viya, Enterprise Miner, Enterprise Guide) workloads to Databricks Lakehouse. Use when planning or executing SAS migrations, converting SAS DATA steps to PySpark/Spark SQL, migrating SAS macros, translating PROC steps, mapping SAS data types, modernizing SAS ETL to SDP pipelines, migrating SAS statistical models to MLflow, or converting SAS reporting to Databricks AI/BI.


SAS to Databricks Migration

Expert guidance for migrating SAS workloads (Base SAS, SAS Viya, SAS Grid, Enterprise Miner, Enterprise Guide) to Databricks Lakehouse Platform, covering code conversion, statistical model migration, ETL modernization, and reporting replacement.

Capabilities

  • Convert SAS DATA steps to PySpark DataFrames or Spark SQL
  • Translate SAS PROC steps (PROC SQL, PROC SORT, PROC MEANS, etc.) to Databricks equivalents
  • Migrate SAS macros (%macro/%mend) to Python functions, Jinja templates, or parameterized notebooks
  • Map SAS data types (character, numeric, date/datetime/time) to Spark/Delta types
  • Convert SAS formats and informats to Databricks SQL functions
  • Migrate SAS datasets (.sas7bdat) to Delta Lake tables in Unity Catalog
  • Convert SAS libraries (LIBNAME) to Unity Catalog catalogs/schemas
  • Modernize SAS ETL jobs to Spark Declarative Pipelines (SDP) with medallion architecture
  • Migrate SAS Enterprise Miner models to MLflow + Databricks Model Serving
  • Migrate SAS statistical procedures (PROC LOGISTIC, PROC GLM, etc.) to scikit-learn, statsmodels, or Spark ML
  • Convert SAS Visual Analytics / Enterprise Guide reports to Databricks AI/BI Dashboards
  • Replace SAS scheduling (LSF, Control-M) with Databricks Workflows
  • Plan phased migration with validation checkpoints

Workflow

Phase 1: Assessment & Inventory

  1. Inventory SAS programs, macros, datasets, libraries, scheduled jobs
  2. Classify workloads: ETL, reporting, statistical modeling, ML
  3. Identify SAS products in use (Base, Viya, EG, EM, VA, etc.)
  4. Assess code complexity (DATA steps, PROC SQL, macros, custom functions)
  5. Map data sources and downstream consumers
  6. Reference: 1-assessment-inventory.md
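To make the inventory step concrete, here is a minimal sketch of a code profiler for the assessment phase. `profile_sas_source` is a hypothetical helper (not part of this skill's reference files) that counts DATA steps, PROC invocations, and macro definitions in a SAS program as a coarse complexity signal:

```python
import re
from collections import Counter

def profile_sas_source(code: str) -> Counter:
    """Rough complexity profile of one SAS program: counts of DATA steps,
    each PROC by name, and %macro definitions. A coarse inventory signal only;
    it ignores comments and mid-line statements."""
    counts = Counter()
    counts["data_steps"] = len(re.findall(r"(?im)^\s*data\s+\w", code))
    for proc in re.findall(r"(?im)^\s*proc\s+(\w+)", code):
        counts[f"proc_{proc.lower()}"] += 1
    counts["macros"] = len(re.findall(r"(?i)%macro\s+\w+", code))
    return counts

sample = """
data work.out; set work.in; run;
proc sql; create table t as select * from x; quit;
%macro score(ds); proc means data=&ds; run; %mend;
"""
print(profile_sas_source(sample))
```

Running such a profiler over all programs gives the raw numbers for classifying workloads in step 2 and sizing the conversion effort in step 4.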

Phase 2: Data Migration & Type Mapping

  1. Convert SAS datasets (.sas7bdat) to Delta Lake tables
  2. Map SAS data types to Spark types (character→STRING, numeric→DOUBLE/DECIMAL, etc.)
  3. Migrate SAS formats/informats to SQL functions or Python formatters
  4. Convert SAS libraries (LIBNAME) to Unity Catalog catalog.schema
  5. Handle SAS-specific concepts: missing values, variable lengths, label/format metadata
  6. Reference: 2-data-migration-type-mapping.md
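The core type mapping from step 2 can be captured as a small lookup table. This is an illustrative sketch (the default choices here, such as DOUBLE for generic numerics, are assumptions; use DECIMAL(p,s) where exactness matters):

```python
# Illustrative SAS -> Spark type mapping (defaults are assumptions).
SAS_TO_SPARK = {
    "char": "STRING",
    "num": "DOUBLE",          # every SAS numeric is an 8-byte float
    "date": "DATE",           # SAS date = days since 1960-01-01
    "datetime": "TIMESTAMP",  # SAS datetime = seconds since 1960-01-01
    "time": "STRING",         # Spark has no TIME type; keep as string or seconds
}

def spark_column_ddl(name: str, sas_type: str) -> str:
    """Render one column of a CREATE TABLE clause from a SAS column spec."""
    return f"{name} {SAS_TO_SPARK[sas_type]}"

ddl = ", ".join(spark_column_ddl(n, t) for n, t in
                [("cust_id", "num"), ("name", "char"), ("open_dt", "date")])
print(ddl)  # cust_id DOUBLE, name STRING, open_dt DATE
```

Note that the date/datetime epochs differ (SAS counts from 1960, Unix from 1970), so raw numeric date values must be converted, not just cast.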

Phase 3: DATA Step Conversion

  1. Map SAS DATA step patterns to PySpark DataFrame operations
  2. Convert SET/MERGE/UPDATE to joins, unions, and MERGE INTO
  3. Translate BY-group processing to window functions and groupBy
  4. Handle retain variables, first./last. logic, arrays, output statements
  5. Convert SAS functions (PUT, INPUT, SUBSTR, SCAN, etc.) to Spark equivalents
  6. Reference: 3-data-step-conversion.md
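The BY-group semantics behind first./last. and RETAIN (steps 3-4) can be illustrated in plain Python before translating to PySpark, where the same flags typically come from `row_number()` over a window partitioned by the BY variables. A minimal sketch of the SAS semantics on BY-sorted rows:

```python
from itertools import groupby

# SAS first./last. and RETAIN semantics on BY-sorted rows, in plain Python.
rows = [  # already sorted BY id, date (as a SAS DATA step would require)
    {"id": 1, "date": "2024-01-01", "amt": 10},
    {"id": 1, "date": "2024-02-01", "amt": 20},
    {"id": 2, "date": "2024-01-15", "amt": 5},
]

out = []
for _, grp in groupby(rows, key=lambda r: r["id"]):
    grp = list(grp)
    running = 0  # RETAIN-style accumulator, reset at each BY group
    for i, r in enumerate(grp):
        running += r["amt"]
        out.append({**r,
                    "first": i == 0,                # first.id
                    "last": i == len(grp) - 1,      # last.id
                    "running_amt": running})

print(out[1])  # id 1's last row: first=False, last=True, running_amt=30
```

In PySpark, `first`/`last` map to `row_number()` ascending and descending over the window, and the running sum to `sum().over(window.rowsBetween(Window.unboundedPreceding, 0))`.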

Phase 4: PROC Step Migration

  1. Map SAS PROCs to Databricks equivalents (SQL, PySpark, or Python libraries)
  2. Convert PROC SQL to Databricks SQL (mostly 1:1)
  3. Replace PROC SORT with ORDER BY or DataFrame .sort()
  4. Migrate PROC MEANS/SUMMARY/FREQ to aggregate SQL or pandas
  5. Convert PROC TRANSPOSE to PIVOT/UNPIVOT
  6. Migrate PROC IMPORT/EXPORT to read_files() / write patterns
  7. Reference: 4-proc-step-migration.md
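PROC translation is often mechanical enough to template. The sketch below shows a hypothetical `proc_means_to_sql` helper covering PROC MEANS's default N/MEAN/STD/MIN/MAX statistics, with a CLASS statement becoming GROUP BY (an illustration of the approach, not a complete translator):

```python
def proc_means_to_sql(table, var, class_=None):
    """Sketch: PROC MEANS -> Databricks SQL for the default statistics.
    CLASS variables (if any) become GROUP BY columns."""
    stats = (f"COUNT({var}) AS n, AVG({var}) AS mean, STDDEV({var}) AS std, "
             f"MIN({var}) AS min, MAX({var}) AS max")
    if class_:
        cols = ", ".join(class_)
        return f"SELECT {cols}, {stats} FROM {table} GROUP BY {cols}"
    return f"SELECT {stats} FROM {table}"

# PROC MEANS DATA=sales; CLASS region; VAR revenue; RUN;
sql = proc_means_to_sql("sales", "revenue", ["region"])
print(sql)
```

One caveat worth validating in Phase 10: SAS STDDEV and Spark `STDDEV` both default to the sample standard deviation, but other statistics (e.g. percentile definitions in PROC UNIVARIATE) can differ between engines.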

Phase 5: SAS Macro Migration

  1. Convert %macro/%mend to Python functions
  2. Migrate macro variables (%LET, &var) to Python variables or widget parameters
  3. Handle %DO loops, %IF/%THEN, %INCLUDE
  4. Convert autocall macro libraries to shared Python modules in Unity Catalog Volumes
  5. Replace SAS macro-driven code generation with Jinja templates or parameterized queries
  6. Reference: 5-macro-migration.md
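Steps 1-2 in practice: a SAS macro that generates code becomes a Python function that builds a query string (typically executed with `spark.sql(...)`, omitted here). A minimal sketch; prefer parameterized queries over raw f-strings when values come from untrusted input:

```python
# SAS:
#   %macro monthly_extract(month, table);
#     proc sql; create table out_&month. as
#       select * from &table. where month = "&month."; quit;
#   %mend;
#
# Python: macro parameters become function arguments, and &var
# substitution becomes f-string interpolation.
def monthly_extract(month: str, table: str) -> str:
    return (f"CREATE TABLE out_{month} AS "
            f"SELECT * FROM {table} WHERE month = '{month}'")

print(monthly_extract("202401", "catalog.schema.sales"))
```

%DO loops become ordinary `for` loops over such calls, and %IF/%THEN becomes plain `if`, which is usually a net simplification.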

Phase 6: Statistical Model Migration

  1. Classify models by type (regression, classification, survival, time series)
  2. Map SAS procedures to Python equivalents (PROC LOGISTIC → scikit-learn/statsmodels)
  3. Migrate SAS Enterprise Miner workflows to MLflow experiments
  4. Convert model scoring code to MLflow model serving
  5. Handle SAS-specific statistical outputs (ODS, parameter estimates, fit statistics)
  6. Reference: 6-statistical-model-migration.md
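As a tiny illustration of step 2, here is a PROC LOGISTIC analogue in scikit-learn on a toy separable dataset. One mapping detail worth noting: PROC LOGISTIC fits an unpenalized maximum-likelihood model, while `LogisticRegression` applies L2 regularization by default, so a very large `C` is used here to approximate the unregularized fit (in practice, validate coefficients against the SAS parameter estimates in Phase 10):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Large C ~ no regularization, approximating PROC LOGISTIC's MLE.
model = LogisticRegression(C=1e9)
model.fit(X, y)
print(model.predict([[0.5], [4.5]]))  # [0 1]
```

For SAS-style inference output (parameter estimates, standard errors, p-values as in the ODS tables), `statsmodels` is usually the closer match; scikit-learn is the better fit for scoring pipelines behind Model Serving.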

Phase 7: ETL Pipeline Modernization

  1. Map SAS ETL patterns to SDP medallion architecture (Bronze/Silver/Gold)
  2. Convert SAS batch jobs to Spark Declarative Pipelines
  3. Modernize scheduling from SAS Grid/LSF/Control-M to Databricks Workflows
  4. Replace SAS DI Studio flows with SDP streaming tables and materialized views
  5. Reference: 7-etl-pipeline-modernization.md

Phase 8: Reporting & BI Migration

  1. Map SAS Visual Analytics dashboards to Databricks AI/BI Dashboards
  2. Convert SAS Enterprise Guide projects to notebooks
  3. Replace PROC REPORT/TABULATE with SQL + AI/BI
  4. Migrate SAS stored processes to Databricks SQL queries or Apps
  5. Reference: 8-reporting-bi-migration.md

Phase 9: Performance Tuning

  1. Replace SAS indexing with Liquid Clustering
  2. Optimize PySpark for SAS-like row-by-row processing patterns
  3. Tune parallelism (SAS threads → Spark partitions)
  4. Handle large-scale sorts and merges efficiently
  5. Reference: 9-performance-tuning.md

Phase 10: Validation & Testing

  1. Row count and value comparison between SAS and Databricks outputs
  2. Statistical result validation (coefficients, p-values, confidence intervals)
  3. Performance benchmarking (SAS runtimes vs Databricks)
  4. Parallel-run strategy for production cutover
  5. Reference: 10-validation-testing.md
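The row-count and value checks in step 1 can be sketched as a small comparison helper (a hypothetical illustration of the parallel-run check, not a full reconciliation framework; floating-point columns are compared within a tolerance because SAS and Spark arithmetic can differ in the last bits):

```python
import math

def compare_outputs(sas_rows, dbx_rows, key, tol=1e-9):
    """Parallel-run check: same row count, and values equal (within tol for
    floats) when SAS and Databricks rows are matched on a key column."""
    if len(sas_rows) != len(dbx_rows):
        return False, f"row count mismatch: {len(sas_rows)} vs {len(dbx_rows)}"
    dbx_by_key = {r[key]: r for r in dbx_rows}
    for s in sas_rows:
        d = dbx_by_key.get(s[key])
        if d is None:
            return False, f"key {s[key]} missing from Databricks output"
        for col, v in s.items():
            if isinstance(v, float):
                if not math.isclose(v, d[col], abs_tol=tol):
                    return False, f"{col} differs for key {s[key]}"
            elif v != d[col]:
                return False, f"{col} differs for key {s[key]}"
    return True, "ok"

ok, msg = compare_outputs(
    [{"id": 1, "total": 10.0}], [{"id": 1, "total": 10.0 + 1e-12}], key="id")
print(ok, msg)  # True ok
```

For statistical models, apply the same idea to coefficient tables rather than data rows, with tolerances chosen per metric.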

Quick Reference: Common Conversions

| SAS | Databricks |
| --- | --- |
| `DATA work.output; SET input;` | `df = spark.table("input")` or `SELECT * FROM input` |
| `PROC SQL; CREATE TABLE ...` | `CREATE TABLE catalog.schema.table AS SELECT ...` |
| `PROC SORT DATA=x; BY a b;` | `SELECT * FROM x ORDER BY a, b` |
| `PROC MEANS DATA=x; VAR y;` | `SELECT AVG(y), STDDEV(y), MIN(y), MAX(y) FROM x` |
| `PROC FREQ DATA=x; TABLES a*b;` | `SELECT a, b, COUNT(*) FROM x GROUP BY a, b` |
| `PROC TRANSPOSE` | `PIVOT` / `UNPIVOT` |
| `PROC LOGISTIC` | `sklearn.linear_model.LogisticRegression` or statsmodels |
| `PROC REG` / `PROC GLM` | `sklearn.linear_model.LinearRegression` or statsmodels OLS |
| `PROC IMPORT` | `read_files()` / `spark.read.csv()` |
| `PROC EXPORT` | `df.write.csv()` / `COPY INTO` |
| `%MACRO name(); ... %MEND;` | `def name(): ...` (Python function) |
| `%LET var = value;` | `var = "value"` or widget parameter |
| `&var` (macro variable) | f-string: `f"...{var}..."` |
| `LIBNAME lib '/path';` | `USE CATALOG catalog; USE SCHEMA schema;` |
| `.sas7bdat` dataset | Delta Lake table in Unity Catalog |
| missing value (`.`) | `NULL` |
| character(n) | `STRING` |
| numeric (8 bytes) | `DOUBLE` or `DECIMAL(p,s)` |
| date (SAS date value) | `DATE` |
| datetime (SAS datetime) | `TIMESTAMP` |
| `PUT(var, format.)` | `date_format()`, `format_number()`, `printf()` |
| `INPUT(var, informat.)` | `CAST()`, `to_date()`, `to_timestamp()` |
| `SUBSTR(str, pos, len)` | `SUBSTRING(str, pos, len)` |
| `SCAN(str, n, delim)` | `split(str, delim)[n-1]` |
| `COMPRESS(str)` | `regexp_replace(str, '\\s+', '')` |
| `CATX(sep, a, b, c)` | `concat_ws(sep, a, b, c)` |
| `INTCK('MONTH', d1, d2)` | `months_between(d2, d1)` |
| `INTNX('MONTH', d, n)` | `add_months(d, n)` |
| `FIRST.var` / `LAST.var` | `ROW_NUMBER()` window function |
| `RETAIN` | `LAG()` window function or stateful UDF |
| `ARRAY` (SAS array) | `array()` or multiple columns with `stack()`/unpivot |
| `OUTPUT` statement | multiple writes or conditional `CASE WHEN` |
| SAS Grid / LSF scheduling | Databricks Workflows |
| SAS Enterprise Miner | MLflow + Mosaic AI |
| SAS Visual Analytics | Databricks AI/BI Dashboards |
| SAS Enterprise Guide | Databricks Notebooks |
| SAS DI Studio | Spark Declarative Pipelines (SDP) |
| SAS/ACCESS engines | Lakehouse Federation / Lakeflow Connect |
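A few of the function mappings above hide off-by-one or semantic differences worth knowing. This plain-Python sketch illustrates them: SAS `SCAN` is 1-based while Spark's `split()[...]` index is 0-based (hence the `n-1`), and `INTCK('MONTH', ...)` counts month *boundaries crossed*, which only matches `months_between()` on aligned dates:

```python
import re
from datetime import date

def scan(s: str, n: int, delim: str) -> str:
    """SAS SCAN(s, n, delim): 1-based word extraction."""
    return s.split(delim)[n - 1]

def intck_month(d1: date, d2: date) -> int:
    """SAS INTCK('MONTH', d1, d2): number of month boundaries crossed."""
    return (d2.year - d1.year) * 12 + (d2.month - d1.month)

print(scan("a,b,c", 2, ","))                              # b
print(intck_month(date(2024, 1, 31), date(2024, 2, 1)))   # 1: a boundary crossed
print(re.sub(r"\s+", "", " a b  c "))                     # abc, like COMPRESS()
```

Note the INTCK example: Jan 31 to Feb 1 is one month boundary in SAS, whereas Spark's `months_between` would return roughly 0.03 for the same dates, so date-interval logic deserves explicit validation rather than a blind swap.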

Examples

Example: Migrate a SAS DATA step

User says: "Convert this SAS DATA step with MERGE and first./last. logic to PySpark"
Result: Converts to PySpark with proper join, window functions for BY-group processing, and RETAIN equivalent

Example: Convert a PROC step

User says: "Rewrite this PROC MEANS with CLASS and OUTPUT to Databricks SQL"
Result: Translates to GROUP BY aggregation with proper statistics mapping

Example: Plan a full SAS migration

User says: "Help me migrate our SAS environment with 500 programs to Databricks"
Result: Generates phased migration plan with assessment, code conversion, model migration, ETL modernization, and parallel-run validation

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.