Moving to the cloud ushers in a new era of data-driven retailing
70% reduction in data pipeline creation time
48x faster ETL workloads
INDUSTRY: Retail and consumer goods
SOLUTION: Advertising effectiveness, customer segmentation, product matching, recommendation engines
PLATFORM USE CASE: Delta Lake, ETL
“More business units are using the platform in a self-service manner that was not possible before. I can’t say enough about the positive impact that Databricks has had on Columbia.”
— Lara Minor, Senior Enterprise Data Manager, Columbia Sportswear
Columbia is a data-driven enterprise that integrates data from all line-of-business systems to manage its wholesale and retail businesses across all its brands. However, its legacy ETL and analytics infrastructure could not support both batch and real-time use cases at scale, which kept it from meeting the demands of the business and data teams. After migrating to Databricks, Columbia can now process and prepare data more efficiently and reliably, driving the insights needed to make smarter business decisions.
Legacy analytics systems that were costly and slow
As the retail industry continues to digitize across all channels, Columbia has been at the forefront of leveraging data across its business lines to improve sales, purchasing, supply chain, and product optimization. For example, the company wanted to understand how to use insights related to geography, brand affinity, gross margins, and costs to improve operations and make smarter decisions. It also wanted to use customer engagement data from product reviews and comments to inform marketing campaigns and improve customer support.
Despite the troves of data at its disposal, Columbia's processing of both batch and real-time data for downstream analytics and reporting was not meeting internal service level agreements. Hampered by specialty ETL tooling and legacy data warehouses that were siloed and complex to scale, the enterprise information management (EIM) team struggled to efficiently build the data pipelines that give data teams and business stakeholders access to curated data. Furthermore, the infrastructure was rigid and costly to manage and scale, which became a problem as the number of people needing access to data kept rising.
“Our legacy systems could take weeks to ETL data for analytics and reporting,” explained Lara Minor, a senior enterprise data manager at Columbia Sportswear. “As a result, we were unable to support a variety of use cases, impacting analyst and line-of-business satisfaction.”
With teams ranging from executives to data analysts and data scientists all vying for company-wide data, the EIM team realized it needed to re-platform its analytics system in the cloud to gain agility and cost efficiency at scale. It also needed to streamline data preparation and ETL, while making it easier and safer for stakeholders to access the data they need to make smarter decisions.
Getting data to those who need it as quickly as possible
The EIM team at Columbia decided to move to Microsoft Azure, which opened the door to using Azure Databricks and Delta Lake to upgrade its data processing and analytics capabilities. "We were looking for something that was scalable, elastic, and at a lower cost," said Minor. "Azure and Databricks met those requirements."
With Databricks, the team can now build high-performance ETL pipelines that support both batch and real-time workloads. The pipelines feed into Delta Lake, which provides secure access to curated data. "Delta Lake provides ACID capabilities that simplify data pipeline operations to increase pipeline reliability and data consistency," explained Minor. "At the same time, features like caching and auto-indexing enable efficient and performant access to the data."
Once the data is ingested, it can be routed to various endpoints across the company depending on the end user and use case. For example, business analysts can connect directly with Power BI for sales reporting that requires near real-time information on demand. The EIM team can make data available in Databricks interactive notebooks for data scientists to explore and train models, or send it to the data warehousing tool for use cases with low-latency and high-concurrency requirements. Whichever team needs the data, it can be confident that the data is reliable and consistent.
Faster data pipelines, shorter time-to-insight
Shortening data processing times is key to rapidly delivering data insights to the business. Databricks has helped Columbia's EIM team accelerate ETL and data preparation, achieving a 70% reduction in ETL pipeline creation time while cutting the time to process ETL workloads from 4 hours to only 5 minutes, a 48x improvement.
With a scalable, performant platform that better supports batch and real-time workloads, data users across the company are now empowered to make smarter decisions about business operations without relying heavily on the EIM team.
“One of the benefits of this platform is how fast people can come up to speed on it. All that data is coming in, and more business units are using it across the enterprise in a self-service manner that was not possible before,” stated Minor. “I can’t say enough about the positive impact that Databricks has had on Columbia.”
With curated data at their fingertips, teams are now driving use cases with data, from forecasting consumer demand to analyzing product reviews to increase customer satisfaction. As Minor agrees, the sky's the limit in how the team at Columbia can leverage data to make smarter business decisions and drive the business into the future.