Today we are excited to announce that Delta Sharing is generally available (GA) on AWS and Azure. With the GA release, you can expect the highest level of stability, support, and enterprise readiness from Databricks for mission-critical workloads on the Databricks Lakehouse Platform.
In this blog, we explore how organizations leverage Delta Sharing to maximize the business value of their data, some of the key features available in the GA release, and how to get started with Delta Sharing on the Databricks Lakehouse Platform.
Customers win with the open standard for data sharing from the lakehouse
Data sharing has become important in the digital economy as enterprises look to easily and securely exchange data with their customers, partners, suppliers, and internal lines of business (LOBs) to better collaborate and unlock value from that data. But the lack of a standards-based data sharing protocol has resulted in solutions tied to a single vendor or commercial product, introducing vendor lock-in risks. These customer challenges led us, at Databricks, to build an open data sharing solution, Delta Sharing.
Delta Sharing provides an open solution to securely share live data from your lakehouse to any computing platform. Data recipients don't have to be on the Databricks Lakehouse Platform or on the same cloud or on any cloud at all. Data providers can share existing large-scale data sets based on the Apache Parquet or Delta Lake formats, without replicating or copying data sets to another system. Data recipients benefit from always having access to the latest version of data with the ability to query, visualize, transform, ingest or enrich shared data with their tools of choice, reducing time-to-value. As governance and security are top concerns for many organizations, Delta Sharing is natively integrated with Unity Catalog, allowing you to manage, govern, audit, and track usage of the shared data on one platform.
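To make the "any computing platform" point concrete, here is a minimal sketch of how a recipient outside Databricks might read a shared table with the open-source `delta-sharing` Python connector. The profile file name and the share, schema, and table names are hypothetical placeholders. The connector import is deferred so that the small helper that builds the table address works even where the package is not installed.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile-file>#<share>.<schema>.<table>' address used by the connector."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Load a shared table into a pandas DataFrame.

    Assumes `pip install delta-sharing`, network access to the sharing server,
    and a valid profile (credential) file received from the data provider.
    """
    import delta_sharing  # imported lazily so table_url() has no dependencies

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

A recipient would call something like `load_shared_table("config.share", "first_share", "default", "first_table")`, where `config.share` stands in for the credential file downloaded from the provider.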
Since Delta Sharing launched in private preview last year, hundreds of customers have adopted it, and today petabytes of data are being shared through Delta Sharing.
Nasdaq: "Delta Sharing helped us streamline our data delivery process for large data sets. This enables our clients to bring their own compute environment to read fresh curated data with little-to-no integration work, and enables us to continue expanding our catalog of unique, high-quality data products" - William Dague, Head of Alternative Data
Shell: "We recognise that openness of data will play a key role in achieving Shell's Carbon Net Zero ambitions. Delta sharing provides Shell with a standard, controlled, and secure protocol for sharing vast amounts of data easily with our partners to work towards these goals without requiring our partners be on the same data sharing platform" - Bryce Bartmann, Chief Digital Technology Advisor
SafeGraph: "As a data company, giving our customers access to our data sets is critical. The Databricks Lakehouse Platform with Delta Sharing really streamlines that process, allowing us to securely reach a much broader user base regardless of cloud or platform" - Felix Cheung, VP of Engineering
YipitData: "With Delta Sharing, our clients can access curated data sets nearly instantly and integrate them with analytics tools of their choice. The dialogue with our clients shifts from a low-value, technical back-and-forth on ingestion to a high-value analytical discussion where we drive successful client experiences. As our client relationships evolve, we can seamlessly deliver new data sets and refresh existing ones through Delta Sharing to keep clients appraised of key trends in their industries." - Anup Segu, Data Engineering Tech Lead
Pumpjack Dataworks: "Leveraging the powerful capabilities of Delta Sharing from Databricks enables Pumpjack Dataworks to have a faster onboarding experience, removing the need for exporting, importing and remodeling of data, which brings immediate value to our clients. Faster results yield greater commercial opportunity for our clients and their partners" - Corey Zwart, Chief Technology Officer
What's new in Delta Sharing with GA?
The GA release of Delta Sharing includes many features. Some of the key ones shipping with this release are described below:
Seamless Databricks to Databricks Sharing
For Databricks customers, Delta Sharing makes data sharing on the lakehouse simple, efficient, and secure. With just a few UI clicks or SQL commands, data providers can share their existing data with recipients on Databricks without replicating it. For example, a data provider using Databricks on AWS can share existing data with a recipient using Databricks on Azure, or vice versa. You can explore the user guide for full details.
In Databricks to Databricks sharing, the data provider does not need to manage token credentials for recipients who are using Databricks; the sharing connection is established securely through the Databricks platform. All you need is a Databricks account to log in, and the platform takes care of the rest.
In addition to cross-account data sharing, another important use case is internal data sharing. If you have multiple Unity Catalog metastores under the same account in different regions, you can share data among those metastores with Delta Sharing without copying any data.
SQL workflow example from a data provider's perspective:
-- create a share and add a table to it
CREATE SHARE first_share;
ALTER SHARE first_share ADD TABLE my_table AS default.first_table;
-- create a Databricks recipient using their sharing identifier and grant them access to the share
CREATE RECIPIENT acme USING ID 'aws:us-west-2:3f9b6bf4-...-29bb621ec110';
GRANT SELECT ON SHARE first_share TO RECIPIENT acme;
SQL workflow example from a data recipient's perspective:
-- list the providers who shared data with me
SHOW PROVIDERS;
-- view the data shared by provider acme_provider
SHOW SHARES IN PROVIDER acme_provider;
-- create a catalog from the share
CREATE CATALOG my_catalog USING SHARE `acme_provider`.`first_share`;
-- query the shared data
SELECT * FROM my_catalog.default.first_table;
Sharing Change Data Feed
Delta Sharing now supports sharing Change Data Feed (CDF). In addition to sharing a table, a data provider can choose to include the table's CDF, allowing recipients to query changes between specific versions or timestamps of the table. With this feature, recipients can query just the new data or the incremental changes instead of the entire table each time. A data provider can easily share a table with CDF, and a data recipient can query table changes with a simple syntax:
-- data provider: sharing a table with CDF enabled
ALTER SHARE my_share ADD TABLE my_table AS default.cdf_table WITH CHANGE DATA FEED;
-- data recipient: query table changes from versions 5 to 10
SELECT * FROM table_changes('`default`.`cdf_table`', 5, 10);
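To illustrate why reading only the changes is cheaper than re-reading the whole table, the sketch below applies a batch of change rows to a recipient's local copy. It is plain Python with no Databricks dependency and is not the connector's implementation. The `_change_type` values (`insert`, `update_preimage`, `update_postimage`, `delete`) and the `_commit_version` and `_commit_timestamp` columns follow Delta Lake's Change Data Feed schema; the `id` key column and the sample rows are made up for illustration.

```python
from typing import Any, Dict, Iterable, Mapping

# Metadata columns that the change feed adds to every row.
CDF_COLUMNS = {"_change_type", "_commit_version", "_commit_timestamp"}


def apply_changes(
    snapshot: Mapping[Any, Dict[str, Any]],
    changes: Iterable[Mapping[str, Any]],
    key: str = "id",  # hypothetical primary-key column
) -> Dict[Any, Dict[str, Any]]:
    """Return a new snapshot with the change rows applied in commit order."""
    result = dict(snapshot)
    # sorted() is stable, so rows within one commit keep their original order.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        data = {k: v for k, v in row.items() if k not in CDF_COLUMNS}
        kind = row["_change_type"]
        if kind in ("insert", "update_postimage"):
            result[data[key]] = data       # new or updated row
        elif kind == "delete":
            result.pop(data[key], None)    # row removed upstream
        # update_preimage rows hold the old values, so there is nothing to apply
    return result


# Sample data: local copy as of version 4, plus changes from versions 5 to 7.
local_copy = {1: {"id": 1, "price": 10}, 2: {"id": 2, "price": 20}}
change_rows = [
    {"id": 2, "price": 20, "_change_type": "update_preimage", "_commit_version": 5},
    {"id": 2, "price": 25, "_change_type": "update_postimage", "_commit_version": 5},
    {"id": 3, "price": 30, "_change_type": "insert", "_commit_version": 6},
    {"id": 1, "price": 10, "_change_type": "delete", "_commit_version": 7},
]
updated_copy = apply_changes(local_copy, change_rows)
```

Here the recipient processes four change rows rather than rescanning the table: row 2 is updated, row 3 is added, and row 1 is removed.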
Enhanced security features
In the GA release of Delta Sharing, we have also added a set of security features to make sharing even more secure. One example is the IP access list: data providers can now configure an IP access list for each recipient that uses open connectors, which ensures that credential download and data access can only be initiated from the allowed IP addresses. We also added a few more Delta Sharing permissions (e.g. CREATE SHARE, CREATE RECIPIENT) and introduced the concept of an owner for Delta Sharing objects such as shares and recipients. With these primitives, Delta Sharing on Databricks offers a more flexible access control model in which non-admin users can also perform sharing operations.
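The idea behind an IP access list can be sketched with Python's standard `ipaddress` module: a request is allowed only if its source address falls inside one of the configured addresses or CIDR ranges. This illustrates the concept only and is not how Databricks implements the check. The function name and the sample ranges (taken from the reserved documentation address blocks) are hypothetical.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed: Iterable[str]) -> bool:
    """Return True if client_ip matches any single address or CIDR range in `allowed`."""
    addr = ipaddress.ip_address(client_ip)
    # A bare address such as "198.51.100.7" is parsed as a single-host network.
    return any(addr in ipaddress.ip_network(entry, strict=False) for entry in allowed)


# Hypothetical access list for one recipient: one CIDR range and one single host.
recipient_allow_list = ["203.0.113.0/24", "198.51.100.7"]
```

With this list, `is_allowed("203.0.113.42", recipient_allow_list)` is true, while a request from `198.51.100.8` would be rejected.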
Getting Started with Delta Sharing on Databricks
Watch the demo below to learn more about how Delta Sharing can help you seamlessly share live data from your lakehouse to any computing platform.