Powering limitless video experiences worldwide
Reduction in AWS fees
Reduction in latency for support engineers to get data logs
Kaltura’s mission is to power any video experience for any organization. Customers deploy the company’s wide array of video solutions to help them teach, learn, communicate, collaborate and entertain. Facing the challenge of building a near real-time event pipeline, Kaltura streamlined and enhanced its workflows by deploying Databricks Data Intelligence Platform and dbt to replace its legacy architecture. This resulted in faster processing speeds and fewer man-hours spent. The company also supports ambitious, resource-intensive new use cases in the lakehouse. As Kaltura’s computing needs continue to increase, the data team can easily scale up resources in Databricks.
Video SaaS solutions delight millions of viewers worldwide
Kaltura provides live, real-time and on-demand video SaaS solutions for over 1,000 customers who engage millions of viewers at home, work and school. Its virtual events products exploded in popularity during the COVID-19 pandemic. Seeking to support all the company’s data needs, Kaltura’s data team recently set out to build a new data lake platform that would replace legacy infrastructure.
“When I first joined our team, we were looking to scale significantly,” recalled Omer Kolodny, Data Engineer Team Lead at Kaltura. “We kept running into challenges and therefore decided to shift to a new, more scalable infrastructure.”
The turning point came when the data team was asked to create a new data product based on streaming events sent from users’ devices. This near real-time event pipeline would need to capture events and write them directly into a data lake. In the process, the pipeline would detect anomalies and notify stakeholders of spikes in the number of events.
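The spike-detection step described above can be pictured with a minimal sketch. It is not Kaltura's implementation: the function name `detect_spikes`, the per-minute counts, the window size and the threshold are all hypothetical. It flags any interval whose event count rises well above the average of the intervals just before it.

```python
from collections import deque
from statistics import mean, stdev


def detect_spikes(counts, window=5, threshold=3.0):
    """Return the indices of intervals whose count spikes above the recent baseline.

    Hypothetical helper: `window` and `threshold` are illustrative defaults,
    not values taken from the Kaltura pipeline.
    """
    history = deque(maxlen=window)  # the most recent `window` counts
    spikes = []
    for index, count in enumerate(counts):
        if len(history) == window:
            baseline = mean(history)
            # Use a floor of 1.0 so identical recent counts do not give a zero spread.
            spread = max(stdev(history), 1.0)
            if count > baseline + threshold * spread:
                spikes.append(index)
        history.append(count)
    return spikes


# Made-up events-per-minute counts with one obvious spike.
events_per_minute = [100, 102, 98, 101, 99, 100, 450, 103]
print(detect_spikes(events_per_minute))  # → [6]
```

In a real pipeline the flagged indices would trigger a notification to stakeholders rather than a print statement.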
Lakehouse architecture streamlines workflows and reduces costs
Around that time, Kaltura’s team heard the buzz about data lakes and decided to try Databricks Data Intelligence Platform. The company soon launched a proof of concept.
“Right off the bat, Databricks Data Intelligence Platform was an amazing experience,” said Kolodny. “The development environment and performance levels were outstanding, and we received an exceptional level of support from our Databricks contact.”
Building on this early success, Kolodny and his team decided to take things further by incorporating another increasingly popular solution: dbt. The team hoped dbt would help it organize and manage its fast-growing, cluttered data.
“dbt is so much more than an ETL tool,” reported Ofer Helman, Data & AI Tech Lead at Kaltura. “It’s also great for organizing data. We went live quickly and integrated it seamlessly with Databricks. And we soon discovered that it has data quality features built in, which eliminated the need for us to implement a separate solution to address any data quality issues.”
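The kind of built-in data quality check Helman mentions can be illustrated with a short sketch. dbt ships generic tests such as `not_null` and `unique`, which are declared in project configuration and run as SQL. The Python below only mimics what those two checks look for. The function names, the `event_id` and `viewer_id` columns and the sample rows are assumptions made for illustration, not part of dbt or of Kaltura's project.

```python
def not_null_failures(rows, column):
    """Return the rows where `column` is missing or None (mimics a not-null check)."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the rows whose `column` value repeats an earlier row (mimics a uniqueness check).

    Null values are skipped here, mirroring how a uniqueness check
    typically ignores nulls and leaves them to the not-null check.
    """
    seen = set()
    duplicates = []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if value in seen:
            duplicates.append(row)
        else:
            seen.add(value)
    return duplicates


# Hypothetical sample of playback events.
events = [
    {"event_id": 1, "viewer_id": "a"},
    {"event_id": 2, "viewer_id": None},
    {"event_id": 2, "viewer_id": "b"},
]

print(len(not_null_failures(events, "viewer_id")))  # → 1
print(len(unique_failures(events, "event_id")))     # → 1
```

A check passes when it returns no failing rows, which is the same convention dbt tests follow.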
With Databricks and dbt, Kaltura has replaced its legacy data architecture. Today, the company runs dbt alongside Databricks to orchestrate its most complex workflows, resulting in faster processing speeds with less need for human involvement. Delta tables make it easier for the team to build and maintain pipelines.
“In our new lakehouse architecture, we save all our Amazon S3 data in Delta Tables,” explained Helman. “All our computing happens in Databricks. We’ve decoupled storage and compute, which has eliminated bottlenecks and reduced our computing costs because we no longer need to keep a huge cluster up and running all the time.”
Moving to dbt forced Kaltura to shift to SQL. But Kolodny and his colleagues believe the transition was worth it.
“dbt offers data lineage, catalog capabilities and testing that we just couldn’t pass up,” said Kolodny. “It was worth redoing our processes to tap into all those features. Once we have data in our lakehouse, we can take it from Delta tables to wherever we need it using dbt as our transformation tool, knowing that the data quality will be exceptional. It’s a game-changer for us.”
This architecture transformation came at a time when Kaltura’s data engineering team went from supporting primarily the company’s cloud TV unit to serving the entire company as part of the platform division. With Databricks and dbt, the team is now in a much better position to be transparent about the processes it is running and to help users investigate and debug these processes themselves.
“Within our dbt site, we’ve added links to the Databricks environment,” said Kolodny. “Kalturians can go to dbt, view all our data models, and see our reports and tables in the catalog. From there, they can easily click through to Databricks and use the Databricks SQL editor. This setup offers a great user experience with excellent performance.”
Across Kaltura, dozens of employees now access the data platform regularly. As computing needs increase, Kaltura can easily scale up resources in Databricks.
“With our previous infrastructure, scaling was a more complex process. Now, if we want to process something huge, we can just click a button to choose a huge scale as a Databricks cluster. We can basically process anything, which is another game-changer for us.”
Flexible platform supports ambitious new use cases
Kaltura’s migration to a lakehouse architecture is paying dividends. The company has already reduced its infrastructure costs by 20% and uses fewer man-hours to maintain its data architecture. In addition, support engineers can now get the data logs they need with five-minute latency, which helps them diagnose technical problems for customers more quickly than before.
“With Databricks, the time from receiving an initial data request to actually bringing data into the system is significantly shorter,” said Helman. “Just as importantly, our team can manage everything ourselves in our own environment thanks to the efficiency of Databricks and dbt.”
Kaltura is also using its lakehouse architecture to support a major new product feature: deeper segmentation of the company’s users and their activities for future learning and feature development.
“We connected our new segmentation feature to our lakehouse, and it worked great,” reported Helman. “The development team got the answers they needed very quickly. They ran huge queries on Databricks and were thrilled with the results. We’ll now offer this segmentation feature to our customers as part of our platform.”
Kaltura also plans to deploy Databricks and dbt for additional efficiency and enhancement use cases. These include company-wide research projects on cost efficiency and the use of AI models to improve products and operations.