Data Sharing
What is data sharing?
Data sharing is the ability to make the same data available to one or many consumers. The ever-growing amount of data has made data a strategic asset for any company. Sharing data, whether within your organization or externally, is an enabling technology for new business opportunities. Sharing data and consuming data from external sources lets companies collaborate with partners, establish new partnerships, and generate new revenue streams through data monetization.
Traditional data sharing technologies
First, there are technologies such as SFTP (SSH File Transfer Protocol) or cloud object storage on which organizations can build home-grown solutions. However, SFTP doesn't scale well to a large number of clients, and it can only serve files that have first been offloaded to an SFTP server. Sharing data through pre-signed object store URLs scales to the bandwidth of the cloud object store, but each URL scheme only works with one particular cloud vendor.
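To show why pre-signed URLs scale, here is a minimal sketch of the general idea using only the Python standard library. It is a simplified, hypothetical signing scheme (an HMAC over the object path and an expiry time), not the actual algorithm of any cloud vendor; the secret key, host, and paths are placeholders. The key point is that the storage service can verify each request with nothing but its own secret, without keeping per-client state.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical signing key held only by the storage service, never by clients.
SECRET_KEY = b"example-secret-key"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the object path and the expiry timestamp."""
    message = f"{path}\n{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def presign(base_url: str, path: str, ttl_seconds: int, now: Optional[int] = None) -> str:
    """Return a URL granting read access to `path` until it expires."""
    issued = int(time.time()) if now is None else now
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "signature": _signature(path, expires)})
    return f"{base_url}{path}?{query}"


def verify(url: str, now: Optional[int] = None) -> bool:
    """Stateless check the storage service could run on every GET request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False  # missing or malformed signature parameters
    current = int(time.time()) if now is None else now
    if current > expires:
        return False  # link has expired
    # Constant-time comparison avoids leaking signature bytes via timing.
    return hmac.compare_digest(_signature(parsed.path, expires), signature)
```

For example, `presign("https://storage.example.com", "/sales/part-0.parquet", 3600)` yields a link that any HTTP client can download for one hour. Because real signing schemes (and the credentials behind them) differ from vendor to vendor, a recipient's tooling ends up tied to the provider's cloud, which is the limitation described above.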
Commercial/closed source data sharing offerings
Second, there are data sharing solutions built into vendor products such as Oracle, Amazon Redshift, or Snowflake. These solutions are convenient within their own product and share tables instead of files, but they aren't open and therefore don't permit sharing data with a different platform.
Open source, modern data sharing solutions
Open source-based solutions eliminate the vendor lock-in of commercial solutions and bring additional benefits, such as community-developed integrations with popular open source data processing frameworks. In addition, open protocols make it easy to integrate commercial clients such as BI tools.
Delta Sharing
Delta Sharing is the world's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of which computing platforms they use.
- Share live data directly — Easily share existing, live data in your Delta Lake without copying it to another system.
- Supports diverse clients — Data recipients can directly connect to Delta Shares from Pandas, Apache Spark™, Rust, and other systems without having to first deploy a specific compute platform. Reduce the friction to get your data to your users.
- Security and governance — Delta Sharing allows you to easily govern, track, and audit access to your shared data sets.
- Scalability — Share large scale datasets reliably and efficiently by leveraging cloud storage systems like S3, ADLS, and GCS.
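The sketch below illustrates, with the Python standard library only, how a recipient might talk to a Delta Sharing server over the open REST protocol. It reflects our reading of the protocol specification (a JSON profile file with `endpoint` and `bearerToken`, table URLs of the form `<profile-path>#<share>.<schema>.<table>`, a POST to `.../shares/{share}/schemas/{schema}/tables/{table}/query`, and a newline-delimited JSON response); field names should be checked against the spec. The endpoint, token, and share, schema, and table names are placeholders, and the code builds the request without sending it.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple
from urllib.parse import quote
from urllib.request import Request


@dataclass
class Profile:
    """Connection details a provider hands to a recipient (the profile file)."""
    endpoint: str
    bearer_token: str


def load_profile(text: str) -> Profile:
    """Parse the JSON profile file; only the fields used below are read."""
    data = json.loads(text)
    return Profile(endpoint=data["endpoint"].rstrip("/"), bearer_token=data["bearerToken"])


def parse_table_url(table_url: str) -> Tuple[str, str, str, str]:
    """Split '<profile-path>#<share>.<schema>.<table>' into its four parts."""
    profile_path, sep, coordinates = table_url.rpartition("#")
    parts = coordinates.split(".")
    if not sep or not profile_path or len(parts) != 3:
        raise ValueError(f"invalid table URL: {table_url!r}")
    share, schema, table = parts
    return profile_path, share, schema, table


def query_request(profile: Profile, share: str, schema: str, table: str,
                  limit_hint: Optional[int] = None) -> Request:
    """Build (but do not send) the POST that asks the server for a table's files."""
    segments = [quote(name, safe="") for name in (share, schema, table)]
    path = "/shares/{}/schemas/{}/tables/{}/query".format(*segments)
    body = {} if limit_hint is None else {"limitHint": limit_hint}
    return Request(
        url=profile.endpoint + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {profile.bearer_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_query_response(ndjson: str) -> List[dict]:
    """Keep the 'file' actions from the newline-delimited JSON response.

    Each file entry carries a short-lived pre-signed URL pointing at a Parquet
    file in the provider's cloud storage, so the recipient reads the data in
    place instead of receiving a copy.
    """
    files = []
    for line in ndjson.splitlines():
        if line.strip():
            action = json.loads(line)
            if "file" in action:
                files.append(action["file"])
    return files
```

Sending the request (for example with `urllib.request.urlopen`) returns the table's protocol and metadata lines followed by one line per data file. Because the heavy lifting is plain HTTPS reads from S3, ADLS, or GCS, any client that can speak HTTP and read Parquet can act as a recipient, which is what lets pandas, Spark, and other systems connect without a specific compute platform.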
Delta Sharing on Databricks
Databricks natively integrates Delta Sharing into Unity Catalog, providing a streamlined experience for sharing data both within and across organizations. Administrators can manage shares using the new CREATE SHARE SQL command or REST APIs, and audit all access centrally. Recipients can then consume the data from any platform on any cloud.
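A provider workflow typically creates a share, adds tables to it, creates a recipient, and grants the recipient access. The following sketch only builds the SQL statement strings for that sequence; the helper, its identifier check, and the names `sales_share`, `main.retail.orders`, and `partner_co` are hypothetical. The statements reflect our understanding of the Databricks syntax (`CREATE SHARE`, `ALTER SHARE ... ADD TABLE`, `CREATE RECIPIENT`, `GRANT SELECT ON SHARE ... TO RECIPIENT`) and should be checked against the Databricks SQL reference before being run in a SQL editor or notebook.

```python
import re
from typing import List

# Strict, hypothetical check: plain unquoted identifiers only.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str, parts: int = 1) -> str:
    """Reject anything that is not `parts` dot-separated simple identifiers."""
    pieces = name.split(".")
    if len(pieces) != parts or not all(_NAME.fullmatch(p) for p in pieces):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def share_setup_statements(share: str, tables: List[str], recipient: str) -> List[str]:
    """Return the SQL a provider could run to share `tables` with `recipient`."""
    share = _check(share)
    recipient = _check(recipient)
    statements = [f"CREATE SHARE IF NOT EXISTS {share}"]
    # Unity Catalog tables are addressed as catalog.schema.table.
    statements += [f"ALTER SHARE {share} ADD TABLE {_check(t, parts=3)}" for t in tables]
    statements.append(f"CREATE RECIPIENT IF NOT EXISTS {recipient}")
    statements.append(f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}")
    return statements
```

Generating the statements from validated names, rather than interpolating arbitrary user input, keeps a scripted setup from executing anything other than the intended share, recipient, and grant.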
Delta Sharing: An open ecosystem
The Delta Sharing ecosystem of open source and commercial partners is growing every day. Easily share data with anyone, no matter where.