Today, we announced the general availability of Databricks on Google Cloud, a jointly developed service that combines an open Lakehouse platform with an open cloud. Since the announcement of Databricks on Google Cloud, we have seen tremendous momentum for this partnership as customers pursue a multi-cloud approach to their analytics and DS/ML workloads. As they move workloads to Databricks on Google Cloud, customers are demanding a simple, unified platform built on open standards with technologies like Delta Lake, MLflow and Google Kubernetes Engine. This GA release is now available in multiple regions in the US and Europe, with additional regions coming soon.
What’s new in GA
The GA release includes several new features:
- Repo and Project support to sync your work with a remote Git repository
- Table ACLs that let you programmatically grant and revoke access to data from Python and SQL
- DB Connect to connect to Databricks from your favorite IDE
- Cluster Tags for DBU Usage tracking
- Notebook-scoped libraries to create and share custom Python environments that are specific to a notebook
- Local SSD Support for caching and improved performance
- Tableau connector to Databricks on Google Cloud
- Terraform provider to easily provision and manage Databricks along with associated cloud infrastructure.
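To make the Table ACLs item more concrete, here is a minimal sketch of how grant and revoke statements might be assembled in Python before being run as SQL. The helper names, the table `marketing.media_kpis` and the principals are hypothetical and used only for illustration; the privilege set is a small subset chosen for the example. In a Databricks notebook, each resulting string could be passed to `spark.sql(...)`.

```python
import re

# A small subset of table access control privileges, chosen for this sketch.
PRIVILEGES = {"SELECT", "MODIFY", "READ_METADATA", "ALL PRIVILEGES"}

# Accept only plain `table` or `schema.table` names to keep the sketch safe.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def _quote_principal(principal: str) -> str:
    """Backtick-quote a user or group name, doubling any embedded backticks."""
    return "`" + principal.replace("`", "``") + "`"


def _check(privilege: str, table: str) -> str:
    """Validate the inputs and return the privilege in upper case."""
    normalized = privilege.strip().upper()
    if normalized not in PRIVILEGES:
        raise ValueError(f"unsupported privilege: {privilege!r}")
    if not _TABLE_NAME.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return normalized


def grant_statement(privilege: str, table: str, principal: str) -> str:
    """Build a GRANT statement for one privilege on one table."""
    checked = _check(privilege, table)
    return f"GRANT {checked} ON TABLE {table} TO {_quote_principal(principal)}"


def revoke_statement(privilege: str, table: str, principal: str) -> str:
    """Build the matching REVOKE statement."""
    checked = _check(privilege, table)
    return f"REVOKE {checked} ON TABLE {table} FROM {_quote_principal(principal)}"


if __name__ == "__main__":
    # Hypothetical table and principals, for illustration only.
    print(grant_statement("SELECT", "marketing.media_kpis", "analyst@example.com"))
    print(revoke_statement("MODIFY", "marketing.media_kpis", "interns"))
```

Validating the privilege and table name before building the string is a deliberate choice here, since the statements are assembled by string formatting rather than parameter binding.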
Reckitt was among the first to use Databricks on Google Cloud. Databricks delivers tight integrations with Google Cloud’s compute, storage, analytics and management products. These include the first Google Kubernetes Engine (GKE) based, fully containerized Databricks runtime on any cloud, as well as pre-built connectors to seamlessly and quickly integrate Databricks with BigQuery, Google Cloud Storage, Looker and Pub/Sub. In addition, customers can deploy Databricks from the Google Cloud Marketplace for simplified procurement and user provisioning, Single Sign-On and unified billing. With Databricks on Google Cloud for data and AI workloads, Reckitt unlocks competitive advantages such as cost savings, agility, increased innovation and business continuity planning. Let’s take a closer look.
Reckitt: AI-focused customer analytics platform
Reckitt, a multinational consumer goods company that serves millions of retail customers worldwide, is on a mission to improve its analytics workflows with AI-driven decisions. Reckitt’s struggle was similar to that of many other companies: the team was dealing with tons of data and disjointed pipelines, and each time it implemented a data science project, it found itself reinventing the wheel. This led Reckitt to make AI a priority at the enterprise level with a “ubiquitous AI” vision:
To infuse trusted, AI-driven decisions into daily workflows and liberate our people’s limitless potential for innovation
One of the first projects was a Customer Analytics platform aimed at improving marketing ROI across 50 brand-market units in 13 countries with metrics like audience activation and media effectiveness.
Reckitt chose Databricks on Google Cloud to enable their customer analytics efforts. By unifying media data from hundreds of sources for consumer identification, Reckitt is building a highly-modular data platform that can support key use cases such as measuring the performance of a propensity model to drive sales uplift or the impact of first-party data on their conversion funnel.
The above diagram shows Reckitt’s Databricks on Google Cloud Marketing ROI solution architecture. The main features of the diagram include:
- Data Collection: Read structured and unstructured data into BigQuery from 114 unique media datastreams from sources such as Facebook, YouTube, Pinterest and Google Analytics; unstructured data from IoT devices and SaaS applications such as Salesforce is stored in GCS.
- Transformation: Apply business rules and aggregation in Delta Lake and calculate KPIs. Delta Lake allows Reckitt to reuse existing data pipelines from other public clouds since it stores the data in the open-source Parquet format, which can be easily stored in GCS.
- Analyze: Use SQL Analytics, Cloud Natural Language and MLflow for further analysis.
- Visualize: Data scientists, business analysts and executives use Power BI and Data Studio for visualization. Insights are leveraged downstream across ad platforms, email, CRM and other systems.
DataOps and MLOps are critical drivers of Reckitt’s multi-cloud architecture. Databricks on Google Cloud makes it possible to reuse PySpark scripts and existing data pipelines on Delta Lake, greatly simplifying data engineering and data science at scale. The result is a hyper-targeted set of audiences that are engaged across multiple media channels, measured by ROI uplift across those channels. Audience activation and the resulting marketing ROI boost have yielded 44% efficiency in cost-per-view, an 11% reduction in cost per 1,000 impressions and a 10% improvement in view-through rate. Learn more about Reckitt’s data analytics journey here.
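As an illustration of the kind of KPI aggregation described in the Transformation step, the sketch below computes cost-per-view, cost per 1,000 impressions and view-through rate per channel in plain Python. It is not Reckitt's pipeline: the `MediaRow` fields, the function name and the sample figures are hypothetical, and the formulas are the common textbook definitions of these metrics (spend divided by views, spend per thousand impressions, views divided by impressions).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class MediaRow:
    """One record from a media datastream (hypothetical schema)."""
    channel: str
    spend: float
    impressions: int
    views: int


def channel_kpis(rows: Iterable[MediaRow]) -> Dict[str, Dict[str, Optional[float]]]:
    """Aggregate rows per channel, then derive the three media KPIs."""
    totals = defaultdict(lambda: {"spend": 0.0, "impressions": 0, "views": 0})
    for row in rows:
        bucket = totals[row.channel]
        bucket["spend"] += row.spend
        bucket["impressions"] += row.impressions
        bucket["views"] += row.views

    kpis: Dict[str, Dict[str, Optional[float]]] = {}
    for channel, t in totals.items():
        has_views = t["views"] > 0
        has_impressions = t["impressions"] > 0
        kpis[channel] = {
            # Spend divided by views; undefined when there are no views.
            "cost_per_view": t["spend"] / t["views"] if has_views else None,
            # Spend per 1,000 impressions.
            "cost_per_1000_impressions": (
                1000 * t["spend"] / t["impressions"] if has_impressions else None
            ),
            # Share of impressions that turned into views.
            "view_through_rate": (
                t["views"] / t["impressions"] if has_impressions else None
            ),
        }
    return kpis


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = [
        MediaRow("video", spend=100.0, impressions=10_000, views=2_000),
        MediaRow("video", spend=50.0, impressions=5_000, views=1_000),
        MediaRow("social", spend=30.0, impressions=6_000, views=0),
    ]
    for channel, values in channel_kpis(sample).items():
        print(channel, values)
```

Totals are summed before the ratios are taken, so each KPI reflects the whole channel rather than an average of per-row ratios.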
Broad partner support
Databricks on Google Cloud is supported by our broad ecosystem of partners who share our commitment to open standards, integrations and solution expertise. These partners bring deep experience in the Databricks Lakehouse architecture for building the AI and ML foundation across targeted industry solutions. We are pleased that these partners have gone above and beyond in working with us to support the GA launch of Databricks on Google Cloud:
- BI Partners: Tableau, Qlik, Looker
- Ingest Partners: Fivetran, Fishtown Analytics, Talend, Qlik, Infoworks, Trifacta, and Informatica
- Catalog Partners: Collibra
- Governance: Immuta, Privacera
- Data Sources: Confluent, MongoDB
- Consulting Partners: Accenture, Cognizant, Deloitte, Insight, SoftServe, Slalom, TCS
From our conversations with customers throughout the public preview, it is clear that multi-cloud is a growing strategy for cloud data and analytics workloads. The general availability of Databricks on Google Cloud further advances the potential of multi-cloud with the open, simple Databricks Lakehouse platform that brings analytics workloads to multiple clouds.