Improving data quality and speed while lowering costs
Today, Kythera Labs uses Databricks to refine, remaster and build its high-value data assets more cost-effectively than it could with its own infrastructure, and to deliver those assets to its clients through Wayfinder. Streaming and processing big data in a single lakehouse has simplified permissions and reduced ETL needs, which translates into speed. “We have gained speed in a variety of ways, primarily by being able to set up more jobs in parallel with the scalable infrastructure,” says Matt Ryan, Co-founder and Director of Engineering at Kythera Labs.
As a result, Kythera can now easily manage all 45 terabytes of its remastered healthcare data.
For Kythera and its clients, the use of DLT translates into time and cost savings. “Before DLT (on Databricks), we could design, test and run a query pipeline in 2 days. With DLT, we can do it in 2 to 4 minutes. It can take about 2 weeks for our competitors that don’t use Databricks,” says McDonald.
Finally, the lakehouse architecture unifies data so Kythera can use it for both analysis and machine learning while reducing the risk of data egress and helping customers lower costs, especially compared to Snowflake. “Customers tell us their Snowflake costs are too high, almost without exception. And we know from experience,” says McDonald. “We tried this with Snowflake; the ETL and egress costs were nearly 5x what we spend with the Databricks Data Intelligence Platform. When our customers want to deconstruct the geographic distribution of 10 million cancer patients, the cost adds up quickly if your data isn’t ready for analysis. When we start someone in a de-normalized model [on Databricks], they instantly see the data they want without having to be an engineer. The data is prepped and ready for analysis.”
Overall, simplified architecture, reduced engineering requirements and automated cluster management have enabled Kythera to reduce IT operational costs. “To develop this same infrastructure without Databricks would be extremely expensive and time-consuming, taking focus away from building our business,” says McDonald. “Databricks is the only way we make our business grow. We’ve tried Snowflake, we’ve tried others, and we’re sticking with Databricks.”
While it’s difficult to put a number on how Databricks has improved Kythera’s business, McDonald says the improvement in opportunity has been “orders of magnitude.” “We’ve won at least three new clients because of how we’re able to fast-track their analysis with data they can use immediately,” he says. “We just signed a large pharma company that told us use of our products will save them 2 years in development.”
In the future, Kythera Labs also plans to use Delta Sharing to support sales with fully accessible “preview” data sets for engaged prospects and to integrate remastered claims data with client-sourced data. This “bring your own data” paradigm will enable the analysis of more accurate, representative data, while Delta Lake’s file partitioning optimizes storage and makes its pipelines more efficient.
“No one wants to buy healthcare claims data. They want answers,” says McDonald. “Our customers are trying to determine if they should invest in developing a new treatment for a rare disease, expand their service lines, or open a new location. In the lakehouse, we accelerate their ability to access data and discover the answers they need.”