Healthcare systems generate enormous amounts of sensitive data, but moving, sharing, and analyzing that data securely across organizations is still a major challenge. In this post, we’ll look at how we at Kythera Labs use Databricks and Delta Sharing to manage more than 300 million patient records and support collaborations across healthcare and life sciences. We’ll cover the practical problems with older data‑sharing methods, why we adopted Delta Sharing, and its impact on our storage costs, efficiency, and real‑time collaboration.
Kythera Labs is a data technology company that empowers healthcare and life sciences organizations with a unified, high-fidelity healthcare data platform for analysis. As a built-on Databricks Partner, we chose Databricks and Delta Sharing not just for internal data sharing but also to support seamless data exchange with external partners. Today, more than 80% of our customers use products built on the platform. We also support external collaborations, including organizations like Exact Sciences, using Delta Sharing across 50 active customer workspaces.
Kythera Labs chose Delta Sharing to overcome significant challenges in securely sharing healthcare data. With over 300 million patient records spanning a decade of clinical history, traditional methods required creating and moving multiple full copies of datasets, driving storage costs into the hundreds of thousands of dollars and slowing delivery.
Delta Sharing changes that by enabling secure, real‑time access to live data without creating duplicate copies. Instead of storing and maintaining separate datasets for each partner or environment, we can share a single, governed source of truth directly. This approach has allowed us to power internal teams and external collaborations with just 3.5 PB of storage, rather than the 20‑plus PB otherwise required.
Another complexity is meeting our customers where they are on the cloud. Healthcare providers often operate in Azure, while many pharmaceutical companies run on AWS or GCP. Without a technology like Delta Sharing, delivering large datasets across clouds would mean costly transfers, complex ETL work, and multiple stale copies scattered across clouds. With Delta Sharing, we can instantly provide secure access to the same live dataset — no matter the cloud — while maintaining compliance and eliminating unnecessary copies.
This not only streamlines our internal workflows (moving from development to testing to production without re‑copying data) but also makes it easy for customers to act faster, like instantly updating a cancer treatment model with the newest data.
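To make the recipient side concrete, here is a minimal sketch of reading a shared table with the open-source `delta-sharing` Python connector. It is an illustration rather than our production setup: the profile file name and the share, schema, and table names are hypothetical placeholders, and the connector is imported lazily so the URL helper can be used without it installed.

```python
def shared_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table URL the connector expects: <profile>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_sample(profile_path: str, share: str, schema: str, table: str, limit: int = 100):
    """Read up to `limit` rows of a shared table into a pandas DataFrame.

    The data is read from the provider's storage at query time, so the
    recipient does not need to maintain a separate copy of the dataset.
    """
    import delta_sharing  # pip install delta-sharing

    url = shared_table_url(profile_path, share, schema, table)
    return delta_sharing.load_as_pandas(url, limit=limit)


# Hypothetical names, for illustration only.
url = shared_table_url("config.share", "claims_share", "curated", "encounters")
print(url)
```

A recipient would call `load_sample("config.share", "claims_share", "curated", "encounters")` with the credential profile file issued by the data provider. The same call works regardless of which cloud hosts the underlying data.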
Given the exponential growth in data volume and complexity, traditional data sharing methods like SFTP servers are no longer viable for modern needs. Moving large files back and forth introduces delays, adds security risks, and requires storage of multiple redundant datasets.
APIs are an option in principle, but they are not built for the volumes of data that organizations like Kythera manage. Relying on them would be like trying to fill a swimming pool with a garden hose: technically possible, but too slow and inefficient for our needs.
Operationally, we handle 7–10 million transactions daily while ensuring compliance through our custom “Vault Architecture” built on Delta Sharing. Customers benefit from real-time updates via view sharing without manual intervention.
By adopting Delta Sharing, we’ve completely moved away from these legacy methods and gained operational efficiency while enabling seamless collaboration across clouds and organizations.
Delta Sharing has allowed us to eliminate legacy data-sharing methods, cut storage needs by over 80%, and save more than $2 million in the last 2 years. — Jeff McDonald, CEO, Kythera Labs
Delta Sharing helped Kythera cut storage needs from a projected 24 PB to just 3.5 PB. Over three years, the reduction in storage needs grew from 6 PB/month in 2022 to 12 PB/month in 2023 and 17 PB/month in 2024. Those reductions add up to millions in savings. For context, large pharmaceutical companies can spend as much as $14 million each month just on storage.
Storage is just part of the story. The compute cost of running the ETL jobs that produce those copies could be even more significant: anywhere from roughly equal to the storage savings to many times greater, depending on the use case.
Year | Reduction in storage needs | AWS S3 Standard cost (per PB/month) | Yearly savings (50% storage discount) |
---|---|---|---|
2024 | 17 PB/month | $21K | $2.1M |
2023 | 12 PB/month | $21K | $1.5M |
2022 | 6 PB/month | $21K | $0.75M |
TOTAL | | | $4.375M |
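The yearly figures can be roughly reproduced with a short calculation. The sketch below takes the $21K per PB-month price, the 50% discount, and the per-year reductions from the table, and assumes they apply uniformly across all 12 months of each year. Under those assumptions the total comes out slightly above the table's $4.375M, which appears to correspond to a rounded rate of about $125K per PB per year.

```python
# Inputs taken from the table; uniform pricing across each year is an assumption.
S3_STANDARD_USD_PER_PB_MONTH = 21_000
DISCOUNT_PERCENT = 50
AVOIDED_PB = {2022: 6, 2023: 12, 2024: 17}


def yearly_savings_usd(avoided_pb: int) -> int:
    """Discounted cost of storing `avoided_pb` petabytes for 12 months."""
    list_cost = avoided_pb * S3_STANDARD_USD_PER_PB_MONTH * 12
    return list_cost * (100 - DISCOUNT_PERCENT) // 100


savings = {year: yearly_savings_usd(pb) for year, pb in AVOIDED_PB.items()}
total = sum(savings.values())

for year, usd in sorted(savings.items()):
    print(f"{year}: ${usd / 1e6:.2f}M")
print(f"Total: ${total / 1e6:.2f}M")

# Reduction from the projected 24 PB to the 3.5 PB actually stored.
reduction_pct = (1 - 3.5 / 24) * 100
print(f"Storage reduction: {reduction_pct:.1f}%")
```

With these inputs the yearly savings are about $0.76M, $1.51M, and $2.14M, or roughly $4.41M in total, and the storage reduction works out to about 85%, in line with the "over 80%" figure quoted above.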
Delta Sharing has transformed our data-sharing capabilities by reducing costs, improving efficiency, and enabling real-time collaboration across clouds and organizations. The combination of Delta Sharing, Unity Catalog, and liquid clustering ensures scalability while maintaining compliance with healthcare data standards, exemplifying how open, modern data platforms can revolutionize healthcare analytics.