Today, we are proud to announce the availability of Databricks on Google Cloud. This jointly developed service provides a simple, open lakehouse platform for data engineering, data science, analytics, and machine learning. It brings together the Databricks capabilities customers love with the data analytics solutions and global scale available from Google Cloud.
Open data platform meets open cloud
Databricks and Google Cloud share a common vision for an open data platform built on open standards, open APIs, and open infrastructure. With this partnership, organizations get the choice and flexibility to manage infrastructure and access data with the tools they need across cloud and on-premises environments. By adopting open frameworks and APIs, customers get the benefits of open source combined with managed cloud analytics and AI products.
What does our new partnership mean for customers? Enterprises can now implement the Databricks Lakehouse Platform on Google Cloud, made possible by Delta Lake on Databricks. Delta Lake adds reliability to data lakes through ACID transactions and versioning, and it brings better data governance and query performance to data in Google Cloud Storage. This announcement helps enterprises unify their analytics infrastructure on Google Cloud so they can simplify management for all data applications, including real-time streaming, SQL workloads, business intelligence, data science, machine learning, and graph analytics.
The open cloud approach also improves interoperability and portability for enterprises that want to use multiple public clouds for analytics applications. A recent Gartner study concluded that at least 80% of enterprises had adopted a multi-cloud strategy across multiple geographies. The multi-cloud capability of Databricks allows customers to increase the efficiency and productivity of data processes, improve customer experiences, and create new revenue opportunities even when data is distributed across more than one cloud. For example, one leading global fast-food company (and Google Cloud customer) wants to build and deploy marketing solutions, such as churn reduction, behavioral segmentation, and lifetime value, for about a dozen global markets by year-end 2021. By architecting a global data platform with Databricks, the company will give each regional business a choice of public cloud platform.
Databricks is tightly integrated with Google Cloud compute, storage, analytics, and management products to give customers a simple, unified experience with high performance and enterprise security.
Compute and Storage: Built on Google Kubernetes Engine (GKE), Databricks on Google Cloud is the first fully container-based Databricks runtime on any cloud. It takes advantage of GKE’s managed services for the portability, security, and scalability developers know and love. Read/write access to Google Cloud Storage (GCS) from Databricks allows customers to execute workloads faster and at lower cost.
Analytics: Databricks has an optimized connector for Google BigQuery that gives easy access to data in BigQuery directly via its Storage API for high-performance queries. The connector supports additional predicate pushdown, querying named tables and views, and running SQL directly on BigQuery and loading the results into an Apache Spark™ DataFrame. Also, Looker’s integration with Databricks and its support for SQL Analytics, along with an open API environment on Google Cloud, complement the open, multi-cloud architecture. This integration lets Looker users query the data lake directly, providing an entirely new visualization experience.
Security and Administration: Experience a simplified deployment from the Google Cloud Marketplace with unified billing and one-click setup inside the Google Cloud console. Databricks’ integration with Google Cloud Identity allows customers to simply use their Google Cloud credentials for single sign-on and user provisioning on Databricks.
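The BigQuery connector described under Analytics above can be sketched as follows. This is a minimal illustration, not the documented Databricks API: the option names (`table`, `filter`, `query`, `viewsEnabled`, `materializationDataset`) are taken from the open-source Spark BigQuery connector and should be checked against the Databricks documentation, and the helper names and the table name in the usage note are hypothetical.

```python
from typing import Dict, Optional


def bigquery_read_options(
    table: Optional[str] = None,
    query: Optional[str] = None,
    row_filter: Optional[str] = None,
    materialization_dataset: Optional[str] = None,
) -> Dict[str, str]:
    """Build reader options for a named table/view or for a SQL query.

    Hypothetical helper; option names are assumed from the open-source
    Spark BigQuery connector.
    """
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")

    options: Dict[str, str] = {}
    if table is not None:
        options["table"] = table
        if row_filter:
            # A filter expression the connector can push down to BigQuery.
            options["filter"] = row_filter
    else:
        if not materialization_dataset:
            raise ValueError("query reads need a materialization dataset")
        # The query runs on BigQuery; its result is staged in the given
        # dataset and then loaded into Spark.
        options["query"] = query
        options["viewsEnabled"] = "true"
        options["materializationDataset"] = materialization_dataset
    return options


def read_bigquery(spark, **kwargs):
    """Load BigQuery data into a Spark DataFrame (needs a live SparkSession)."""
    options = bigquery_read_options(**kwargs)
    return spark.read.format("bigquery").options(**options).load()
```

In a notebook with an active `spark` session, a call such as `read_bigquery(spark, table="my-project.sales.orders", row_filter="country = 'DE'")` would return a DataFrame; the project, dataset, and table names here are placeholders.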
Put Databricks to work on Google Cloud
Some of the most innovative use cases for Databricks on Google Cloud are in retail, telco, media and entertainment, manufacturing, and financial services. Across every industry, data is driving digital transformation initiatives. With the lakehouse architecture, Databricks and Google Cloud customers are finding new ways to accelerate data-driven innovation.
Here are some of the most popular workloads customers are using Databricks for today. To learn more about industry-specific use cases, visit the Industry Solutions page.
Data lake modernization
Delta Lake on Databricks provides a modern foundation for moving from expensive, hard-to-scale on-premises systems to well-architected data lakes based on Google Cloud Storage. In fact, companies that have migrated to Databricks from a cloud-based Hadoop service realize up to a 50% performance improvement in data processing and 40% lower monthly infrastructure costs. Moving to Databricks on Google Cloud helps customers reduce administrative overhead, scale compute resources up or down quickly, and reduce operational costs with autoscaling and job termination.
Scalable data processing to prepare data for analytics
Databricks simplifies your ETL architecture and lowers costs to ingest and process data using a high-performance runtime on clusters optimized for data processing at scale. With Delta Lake, you can reliably store all data (structured, semi-structured, and unstructured) in raw format and incrementally move it through the transformation stages to an aggregated, BI-ready tier with ACID guarantees.
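One way to picture the incremental movement between tiers is a Structured Streaming job that reads one Delta table and appends to the next. The sketch below rests on several assumptions: the tier names (`raw`, `refined`, `aggregated`), the bucket layout, and the helper names are all hypothetical, and the transformation is assumed to be row-level (for example, cleaning or filtering), since streaming aggregations in append mode need additional watermarking.

```python
from typing import Callable

# Hypothetical tier names; the post only describes raw data moving through
# transformation stages to an aggregated, BI-ready tier.
TIERS = ("raw", "refined", "aggregated")


def tier_path(bucket: str, tier: str, dataset: str) -> str:
    """Return the GCS location of a dataset in a given tier (assumed layout)."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return f"gs://{bucket}/{tier}/{dataset}"


def checkpoint_path(bucket: str, tier: str, dataset: str) -> str:
    """Return where the stream writing into `tier` keeps its progress."""
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier}")
    return f"gs://{bucket}/_checkpoints/{tier}/{dataset}"


def promote(spark, bucket: str, dataset: str,
            source_tier: str, target_tier: str, transform: Callable):
    """Append rows not yet processed from one Delta tier to the next.

    Needs a live SparkSession with Delta Lake available. The checkpoint
    records which source data has already been handled, so each run only
    picks up new rows.
    """
    source = spark.readStream.format("delta").load(
        tier_path(bucket, source_tier, dataset)
    )
    return (
        transform(source)
        .writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path(bucket, target_tier, dataset))
        .trigger(once=True)  # process what is available, then stop
        .start(tier_path(bucket, target_tier, dataset))
    )
```

A scheduled job could call `promote(spark, "my-bucket", "orders", "raw", "refined", clean_orders)`, where `clean_orders` is a placeholder for a function that takes and returns a DataFrame. Each Delta write is transactional, so readers of the target tier never see a partially written batch.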
Reliable analytics on the data lake
Customers use Delta Lake on top of Google Cloud Storage–based data lakes to add reliability, performance, and lifecycle management. Delta Lake helps prevent data corruption, run faster queries, improve data freshness, and reproduce ML models, so customers can trust their data for analytical insights. In addition, Databricks provides Delta Engine to significantly accelerate query performance on data lakes, especially those enabled by Delta Lake.
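Reproducing an ML model depends on reading exactly the data it was trained on, and Delta Lake's versioning makes that possible by reading a table as of an earlier version or timestamp. The following is a minimal sketch: `versionAsOf` and `timestampAsOf` are Delta Lake reader options, while the helper names and the path in the usage note are hypothetical.

```python
from typing import Dict, Optional


def time_travel_options(
    version: Optional[int] = None,
    timestamp: Optional[str] = None,
) -> Dict[str, str]:
    """Build Delta reader options that pin a read to one table snapshot."""
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        if version < 0:
            raise ValueError("version must be non-negative")
        return {"versionAsOf": str(version)}
    return {"timestampAsOf": timestamp}


def read_snapshot(spark, path: str, **kwargs):
    """Read a Delta table as it was at a given version or timestamp.

    Needs a live SparkSession with Delta Lake available.
    """
    options = time_travel_options(**kwargs)
    return spark.read.format("delta").options(**options).load(path)
```

For example, `read_snapshot(spark, "gs://my-bucket/refined/orders", version=12)` would load the table as it stood at version 12, assuming that version is still retained; recording the version number alongside a trained model is one way to make the training set recoverable later.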
Data science and machine learning
Managed MLflow on Databricks allows data teams to track all experiments and models in one place, publish dashboards, and facilitate handoffs with peers and stakeholders across the entire workflow, from raw data to insights. Databricks’ collaborative workspace allows data teams to explore data, share insights, run experiments, and build ML models faster.
The launch of Databricks on Google Cloud is a win-win for customers. The tight integration of Databricks with Google Cloud’s analytics and AI products delivers a broad range of capabilities, with more to come. Together, we will continue to innovate and support customers in building intelligent applications that solve tough data problems.
If you are interested in Databricks on Google Cloud, request access via the product page. To learn more, join us at the launch event hosted by TechCrunch, where Ali Ghodsi and Thomas Kurian share their vision for this partnership and its benefits to customers.