Skip to main content
Ensemble Health Partners

CUSTOMER
STORY

Ensemble scales governed AI with Unity Catalog

10,000 tables

Migrated to Unity Catalog in one night

$46 billion

In net patient revenue managed

8,000+ users

Enabled by upstream data engineering work

Highlights

Problem:Ensemble needed a scalable, unified way to govern healthcare data and securely support AI across environments while maintaining strict HIPAA compliance

Solution:Ensemble unified governance with Unity Catalog, enabling consistent access control, lineage, and secure AI across data, users, and environments

Products:Unity Catalog, Agent Bricks

Ensemble provides end‑to‑end, technology‑enabled revenue cycle management (RCM) for hundreds of hospitals and physician groups nationwide, leveraging unmatched healthcare data scale to help providers navigate rising payer pressure and thin margins. As Ensemble expands its AI initiatives across the revenue cycle, from patient engagement to account resolution, governance has become foundational. The organization needed a unified approach to securely power AI while maintaining strict HIPAA compliance. To create a robust governance platform with better data discovery, cataloging, access control, lineage, and auditing capabilities, Ensemble turned to Databricks and Unity Catalog.

Managing complex environments with limited capabilities

Ensemble supports approximately 8,000+ users, enabled by upstream data engineering work, including BI, operations, and innovation teams that consume the data. The company also has a dedicated AI team designing and deploying multiple use cases across its data lake, including prior authorizations, coding and charge capture validation, accounts receivable follow-up, clinical denials and appeals, underpayments, and build-up rate optimization.

To support these expanding AI initiatives, Ensemble is building a Unified Data Hub (UDH), a platinum data layer designed to standardize enterprise data products across clients and business lines. The UDH serves as the governed foundation for AI and analytics, ensuring that trusted, well-documented datasets are consistently consumed across the organization.

Ensemble’s legacy Hive Metastore environment proved difficult to replicate across user environments. Back-end processes were required to implement data masking, creating operational complexity. “We have a production workspace and a user acceptance testing (UAT) workspace, and it was really hard to move a copy of production data down to UAT. Having fresh, recent data is important,” said Pheanouk Pel, AVP, Data Engineering.

Data discovery and cataloging capabilities were limited, and access control, lineage, and auditing were not sufficient for Ensemble’s needs. With healthcare data requiring strict governance and the growing demand for AI-driven use cases, Ensemble needed a more robust platform to consistently manage data across environments while meeting the security and compliance standards expected in healthcare.

Migrating to Unity Catalog with a custom framework

To address these challenges, Ensemble migrated from Hive Metastore to Unity Catalog. Preparation included generating a complete inventory of database objects and validating dependencies (particularly views) to ensure all referenced tables were migrated in the correct order.

“It was pretty painless,” said Brad Beck, Senior Manager, Data Engineering. “We did a lot of validating—probably over validated—in our test environment to make sure everything would work.”

After several months of preparation, the full cutover to Unity Catalog was completed in a single night. The team migrated an estimated 10,000 tables.

Ensemble updated its custom tooling platform, “Flux,” to support Unity Catalog. Because most pipelines are built on this framework, once the platform upgrade was complete, teams were able to migrate seamlessly. Backward compatibility allowed critical data assets, approximately 85% to 90% of enterprise datasets, to move first, followed by supplemental data.

While the migration was operationally significant, the larger impact was strategic: Unity Catalog became the centralized governance layer supporting Ensemble’s data products, AI models, and enterprise reporting.

Unity Catalog also simplified cross-environment replication. Instead of relying on complex workarounds, the team can now use deep clones across catalogs to move data between production and UAT, significantly reducing operational friction while maintaining governance controls.

Immediate improvements in governance and data management

With Unity Catalog in place, data discovery, cataloging, access control, lineage, and auditing capabilities improved immediately.

“Being able to look back at the lineage of data to know where it was sourced through all the layers has been very helpful,” said Beck. “We really ramped up cataloging right after the UC migration, so we have the majority of our enterprise-level data cataloged. That’s been a great search tool for our analytics teams.”

Access control also became more robust. “Access control is much stronger in Unity Catalog with how we cascade roles and manage access requests,” said Michelle McDearmon, AVP, Data Engineering.

The team has implemented a scalable HIPAA-compliant masking pattern using Azure role-based access control (RBAC), Unity Catalog policies, and tag-based masking (ABAC patterns). Fields tagged as sensitive or PHI are automatically masked for users outside authorized groups, ensuring compliance without manual intervention for each new schema or table.

Unity Catalog has also strengthened Ensemble’s AI governance posture. The company is leveraging Agent Bricks to power use cases such as a predictive mapper that helps standardize incoming healthcare data from new clients. Because Agent Bricks operates within Unity Catalog’s governed environment, AI agents inherit the same access controls and policies as the underlying data.

“When we use Agent Bricks, the agent becomes an object that we can permission just like anything else,” said Pel. “That makes secure AI deployment seamless.”

For Ensemble, this consistency ensures that AI innovation does not compromise the security of protected health information.

The team also leverages Unity Catalog system tables for cost monitoring and observability. Tags pushed down from external orchestration tools, such as Azure Data Factory, enable detailed dashboards that break down consumption by client, business line, and team. This visibility allows Ensemble to track usage and make informed resource decisions.

Building the future of governed healthcare data products

Looking ahead, the Unified Data Hub (UDH) will play an increasingly central role in Ensemble’s architecture. As enterprise data products are standardized into the platinum layer, Unity Catalog will provide the governance backbone, ensuring consistent policies, lineage, and access controls across analytics and AI workloads.

The next phase includes expanding metadata richness, strengthening documentation, and integrating Microsoft Purview to enhance discoverability across business users. By formalizing enterprise data products within the UDH, Ensemble enables faster, self-service access to governed data across the organization, reducing bottlenecks and putting trusted data directly in the hands of those who need it.

With Unity Catalog as the control plane for data and AI governance, Ensemble is positioned to accelerate AI innovation while maintaining the security and compliance standards required in healthcare.