Migrating Complex SAS Processes to Databricks - Case Study
Type
- Session
Format
- Hybrid
Track
- Industry and Business Use Cases
Industry
- Public Sector
Difficulty
- Intermediate
Room
- Moscone South | Upper Mezzanine | 159
Duration
- 35 min
Overview
Many federal agencies use SAS software for critical operational data processes (ETL). While SAS has historically been a leader in analytics, data analysts have often used it for ETL as well. However, the demands that modern data science places on ever-increasing volumes and types of data require a shift to modern cloud architectures and to data management tools and paradigms built for ETL/ELT. In this presentation, we provide a case study from the Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology (Spark SQL & Databricks). The migrated process produces results ~75% faster, does not rely on proprietary constructs of the SAS language, scales better, and can more easily ingest old rules while better governing the inclusion of new rules and data definitions. Migrating to Databricks yielded numerous benefits, including:
- The application scales to accommodate workloads, with enhanced query plans and proper exception/error handling
- Workflows are automated, with data pipelines that are native to Databricks and can run in parallel
- Users can develop collaboratively, using an integrated workspace
- Integration of new data sources and/or the inclusion of new data definitions is easier and better governed
- The code is optimized to run in Databricks rather than simply migrated over to run sub-optimally
- Runtime performance gains of ~75% enable CMS to meet the needs of its state customers in a more timely manner
- Legacy developers are provided with a primer to follow when migrating current work processes to Databricks
- Reliance on SAS programmers, specifically on individuals proficient in SAS macro programming, is removed
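The abstract does not show any code, so the following is only a minimal sketch of one way the points about governed rules and reduced reliance on SAS macros could look in practice. It assumes business rules are kept as plain data and rendered into a single Spark SQL statement instead of being generated by macro code. The `Rule` class, the `build_rule_query` helper, the `claims_staging` table, and all column names are hypothetical and are not taken from the CMS process.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical business rule: a SQL predicate and the label it assigns."""
    name: str
    predicate: str  # Spark SQL boolean expression over the input columns
    label: str


def build_rule_query(table: str, rules: list[Rule], output_col: str = "rule_result") -> str:
    """Render the rules as one Spark SQL statement built around a CASE expression.

    Rules are evaluated in list order, so the first matching predicate wins.
    Inputs are assumed to come from a trusted, reviewed rule catalog; this
    sketch does no escaping or validation of the SQL fragments.
    """
    if not rules:
        raise ValueError("at least one rule is required")
    branches = "\n".join(
        f"    WHEN {r.predicate} THEN '{r.label}'  -- {r.name}" for r in rules
    )
    return (
        f"SELECT *,\n"
        f"  CASE\n{branches}\n"
        f"    ELSE 'UNCLASSIFIED'\n"
        f"  END AS {output_col}\n"
        f"FROM {table}"
    )


# Hypothetical rules and table name, for illustration only.
rules = [
    Rule("missing_id", "member_id IS NULL", "REJECT"),
    Rule("future_date", "service_date > current_date()", "REVIEW"),
]
query = build_rule_query("claims_staging", rules)
print(query)
```

In a Databricks notebook, the resulting string could be passed to `spark.sql(query)`. Keeping the rules in a list (or a table) means a new rule is added as a reviewed data change, with no new macro logic to write.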