Changing the way the world buys, sells and uses vehicles
Cox Automotive Europe is part of Cox Automotive, the world’s largest automotive service organization, and is on a mission to transform the way the world buys, sells, owns and uses vehicles. They work in partnership with automotive manufacturers, fleets and retailers to improve performance and profitability throughout the vehicle lifecycle. Their businesses are organized around their customers’ core needs across vehicle solutions, remarketing, funding, retail and mobility. Their brands in Europe include Manheim, Dealer Auction, NextGear Capital, Modix and Codeweavers.
Cox’s enterprise data services team recently built a platform to consolidate the company’s data and enable their data scientists to create new data-driven products and services more quickly and easily. To help their small engineering team unify data and analytics on one platform with built-in orchestration and governance, the enterprise data services team turned to the Databricks Data Intelligence Platform, Workflows, Unity Catalog and Delta Sharing.
Easy orchestration and observability improve the team’s ability to deliver value
Cox Automotive’s enterprise data services team maintains a data platform that primarily serves internal customers across business units, though it also supplies a few data feeds to third parties. The team collects data from multiple internal sources and business units. “We use Databricks Workflows as our default orchestration tool to perform ETL and enable automation for about 300 jobs, of which approximately 120 are scheduled to run regularly,” says Robert Hamlet, Lead Data Engineer, Enterprise Data Services at Cox Automotive.
Jobs run weekly, daily or hourly, and production pipelines currently process approximately 720GB of data per day. Scheduled jobs pull from different areas both within and outside the company, and Hamlet uses Databricks Workflows to deliver data to the data science team, to the in-house data reporting team through Tableau, or directly into Power BI. “Databricks Workflows has a great user interface that allows you to quickly schedule any type of workflow, be it a notebook or JAR,” says Hamlet. “Parametrization has been especially useful. It gives us clues as to how we can move jobs across environments. Workflows has all the features you would want from an orchestrator.”
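As an illustration of that pattern, a scheduled, parameterized notebook job can be defined in a few lines with the Databricks SDK for Python. This is a minimal sketch, not Cox Automotive’s actual configuration; the job name, notebook path, parameters and cluster ID are all placeholders.

```python
# Minimal sketch: a scheduled, parameterized notebook job defined with the
# Databricks SDK for Python. All names, paths and IDs are illustrative.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up workspace auth from the environment

job = w.jobs.create(
    name="daily-feed-ingest",  # hypothetical job name
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Repos/data-eng/pipelines/ingest",  # hypothetical path
                # base_parameters make the same notebook reusable across
                # environments, the kind of parametrization Hamlet describes
                base_parameters={"env": "prod", "source": "feed_a"},
            ),
            existing_cluster_id="<cluster-id>",  # placeholder
        )
    ],
    # Quartz cron syntax: run daily at 02:00 UTC
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```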
Hamlet also likes that Workflows provides observability into every workflow run, along with failure notifications, so the team can get ahead of issues quickly and troubleshoot before the data science team is impacted. “We use the job notifications feature to send failure notifications to a webhook, which is linked to our Microsoft Teams account,” he says. “If we receive an alert, we go into Databricks to see what’s going on. It’s very useful to be able to peel into the run logs and see what errors occurred. And the Repair Run feature is nice to remove blemishes from your perfect history.”
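Failure webhooks are part of a job’s settings. The sketch below shows one way to attach a registered notification destination (such as one pointing at a Microsoft Teams channel) to an existing job via the Databricks SDK for Python; the job ID and destination ID are placeholders, not Cox Automotive’s values.

```python
# Sketch: routing failure notifications to a webhook destination.
# The destination must already be registered in the workspace; its ID
# below is a placeholder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.update(
    job_id=123,  # hypothetical job ID
    new_settings=jobs.JobSettings(
        webhook_notifications=jobs.WebhookNotifications(
            on_failure=[jobs.Webhook(id="<notification-destination-id>")],
        ),
    ),
)

# A failed run can later be repaired in place, rerunning only the failed
# tasks rather than the whole job:
# w.jobs.repair_run(run_id=<failed-run-id>, rerun_all_failed_tasks=True)
```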
Unity Catalog and Delta Sharing improve data access across teams
Hamlet’s team recently began using Unity Catalog to manage data access, improving their existing method, which lacked granularity and was difficult to manage. “With our new workspace, we’re trying to use more DevOps principles, infrastructure-as-code and groups wherever possible,” he says. “I want to easily manage access to a wide range of data to multiple different groups and entities, and I want it to be as simple as possible for my team to do so. Unity Catalog is the answer to that.”
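In practice, group-level access in Unity Catalog comes down to standard SQL GRANT statements, which also lend themselves to infrastructure-as-code. A minimal sketch, run from a Databricks notebook (where spark is predefined) and using hypothetical catalog, schema and group names:

```python
# Sketch: group-based access control with Unity Catalog GRANT statements.
# Catalog, schema and group names are hypothetical.
spark.sql("GRANT USE CATALOG ON CATALOG enterprise_data TO `data-science`")
spark.sql("GRANT USE SCHEMA ON SCHEMA enterprise_data.curated TO `data-science`")
# A schema-level SELECT grant covers every current and future table in it:
spark.sql("GRANT SELECT ON SCHEMA enterprise_data.curated TO `data-science`")
```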
The enterprise data services team also uses Delta Sharing, which natively integrates with Unity Catalog and allows Cox to centrally manage and audit shared data outside the enterprise data services team while ensuring robust security and governance. “Delta Sharing makes it easy to securely share data with business units and subsidiaries without copying or replicating it,” says Hamlet. “It enables us to share data without the recipient having an identity in our workspace.”
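A share is defined and granted with a handful of SQL statements. The sketch below uses hypothetical object names and shows the general shape: create a share, add a table to it, and grant it to a recipient that has no identity in the provider’s workspace.

```python
# Sketch: sharing a table through Delta Sharing, run from a Databricks
# notebook. All object names are hypothetical.
spark.sql("CREATE SHARE IF NOT EXISTS vehicle_remarketing")
spark.sql(
    "ALTER SHARE vehicle_remarketing "
    "ADD TABLE enterprise_data.curated.auction_results"
)
# An open (token-based) recipient needs no identity in this workspace:
spark.sql("CREATE RECIPIENT IF NOT EXISTS subsidiary_bi")
spark.sql("GRANT SELECT ON SHARE vehicle_remarketing TO RECIPIENT subsidiary_bi")
```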
Looking ahead: incorporating additional lakehouse features
Going forward, Hamlet plans to use Delta Live Tables (DLT) to simplify building and managing the batch and streaming data pipelines that deliver data on the Databricks Data Intelligence Platform, reducing the ETL development and management burden on his data engineering team. Eventually, Hamlet may also use Delta Sharing to share data securely with external suppliers and partners while meeting security and compliance needs. “DLT provides us an opportunity to make it simpler for our team. Scheduling Delta Live Tables will be another place we’ll use Workflows,” he says.
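In DLT, a pipeline declares its tables as decorated functions, and the framework manages the dependencies and data quality checks between them. The following is a minimal sketch of that pattern with hypothetical paths, table names and columns, not one of Cox Automotive’s pipelines:

```python
# Minimal Delta Live Tables sketch. Source path, table and column names
# are hypothetical.
import dlt

@dlt.table(comment="Raw feed loaded incrementally with Auto Loader")
def feed_raw():
    return (
        spark.readStream.format("cloudFiles")  # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/feed")  # hypothetical landing path
    )

@dlt.table(comment="Cleaned records for downstream consumers")
@dlt.expect_or_drop("valid_vehicle_id", "vehicle_id IS NOT NULL")  # quality rule
def feed_clean():
    return dlt.read_stream("feed_raw").select("vehicle_id", "price", "updated_at")
```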
Hamlet is also looking forward to using the data lineage capabilities within Unity Catalog to provide his team with an end-to-end view of how data flows in the lakehouse for data compliance requirements and impact analysis of data changes. “That’s a feature I’m excited about,” Hamlet says. “Eventually, I hope we get to a point where we have all our data in the lakehouse, and we get to make better use of the tight integrations with things like data lineage and advanced permissions management.”
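Unity Catalog captures lineage automatically as queries run, and it can be inspected through system tables. Assuming system tables are enabled in the workspace, an impact analysis might start with a query like the sketch below; the source table name is hypothetical.

```python
# Sketch: tracing downstream consumers of a table via Unity Catalog's
# lineage system tables. The source table name is hypothetical.
downstream = spark.sql("""
    SELECT DISTINCT target_table_full_name
    FROM system.access.table_lineage
    WHERE source_table_full_name = 'enterprise_data.curated.auction_results'
      AND target_table_full_name IS NOT NULL
""")
downstream.show(truncate=False)
```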