At Databricks, we know that data is one of your most valuable assets and always has to be protected, which is why many customers want the guarantee of private networking from their users to their data and back again. That’s where Private Link comes in. On AWS it’s called PrivateLink, on Azure Private Link, and on GCP Private Service Connect. For simplicity, we’ll refer to all three as Private Link here.
Using Private Link with Databricks provides the following benefits:
End-to-end private networking – With Private Link, you can set up Databricks workspaces that route traffic privately from your users to your data and back again. Keeping traffic on private networks substantially reduces the risk of accidental firewall misconfiguration and of traffic inspection by sophisticated attackers.
Data exfiltration protection – Private Link endpoints grant access to specific resources, allowing you to tightly control network access. In the event of a security incident within your network, only the mapped resource would be accessible, greatly reducing the attack surface for data exfiltration.
Meet compliance requirements – With Private Link, you can set up a secure perimeter around your data so that it is only processed in trusted private networks. This helps you to meet compliance requirements for even your most sensitive workloads.
How it works
Databricks supports private networking via each cloud provider's service: AWS PrivateLink, Azure Private Link and GCP Private Service Connect.
You can use cloud services like AWS Direct Connect, Azure ExpressRoute and Google Cloud Interconnect to route traffic from your own private network to the cloud over a dedicated connection that never touches the public internet. Some customers instead route user traffic through a VPN running in their chosen cloud. Once traffic arrives there, you can connect via Private Link to cloud services including Databricks. DNS is what ties this together: through DNS resolution or DNS forwarding (recommended), the public hostnames of these cloud services resolve to private IP addresses in your own network.
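One way to confirm that DNS is doing its job is to resolve a workspace hostname from inside your network and check that every returned address is private. The following is a minimal sketch using only the Python standard library; the function names and the example hostname are illustrative assumptions, not part of any Databricks tooling.

```python
import ipaddress
import socket
from typing import List


def resolve_addresses(hostname: str) -> List[str]:
    """Return the unique IP addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def all_private(addresses: List[str]) -> bool:
    """True only if at least one address was returned and every one is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

Run from a machine on your private network, `all_private(resolve_addresses("my-workspace.example.com"))` (hostname hypothetical) should be true when resolution or forwarding is set up as intended. A public address in the result suggests the query is bypassing your private DNS.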
To enable future scale-out, customers often use a hub-and-spoke model to connect their private networks. Once traffic arrives at the Databricks Private Link service, you can choose from several enforcement options:
Allow access to your workspace from specific private endpoints in your cloud account (recommended)
Allow access to your workspace from any private endpoint in your cloud account
Allow access to your workspace from specific private and public networks
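The three enforcement options above can be thought of as progressively broader admission rules. The sketch below models them as a small policy check. It is an illustration of the decision logic only: the class names, fields and endpoint identifiers are hypothetical, and this is not how Databricks implements or exposes these settings.

```python
import ipaddress
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class AccessMode(Enum):
    SPECIFIC_ENDPOINTS = "specific private endpoints"
    ANY_ENDPOINT_IN_ACCOUNT = "any private endpoint in the account"
    PRIVATE_AND_PUBLIC = "specific private and public networks"


@dataclass
class WorkspacePolicy:
    mode: AccessMode
    # Every private endpoint registered in the cloud account (hypothetical IDs).
    account_endpoints: Set[str] = field(default_factory=set)
    # The subset of endpoints explicitly allowed for this workspace.
    allowed_endpoints: Set[str] = field(default_factory=set)
    # Public networks allowed in the mixed mode, as CIDR strings.
    allowed_public_cidrs: List[str] = field(default_factory=list)


def is_allowed(policy: WorkspacePolicy,
               endpoint_id: Optional[str] = None,
               source_ip: Optional[str] = None) -> bool:
    """Decide whether a connection is admitted under the chosen mode."""
    if policy.mode is AccessMode.SPECIFIC_ENDPOINTS:
        return endpoint_id in policy.allowed_endpoints
    if policy.mode is AccessMode.ANY_ENDPOINT_IN_ACCOUNT:
        return endpoint_id in policy.account_endpoints
    # PRIVATE_AND_PUBLIC: an allowed endpoint, or a source IP in an allowed network.
    if endpoint_id in policy.allowed_endpoints:
        return True
    if source_ip is None:
        return False
    addr = ipaddress.ip_address(source_ip)
    return any(addr in ipaddress.ip_network(cidr, strict=False)
               for cidr in policy.allowed_public_cidrs)
```

Under the recommended first mode, a request arriving through an endpoint that is registered in the account but not explicitly allowed is still rejected, which is what makes it the tightest of the three.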
Please see our documentation on AWS and Azure for step-by-step instructions on how to configure Private Link for your Databricks workspaces.
Security without Private Link
Some customers want similar protections but aren’t able to deploy Private Link. These customers can use hybrid environments, in which the control plane is protected by IP access lists (AWS, Azure, GCP) and the classic data plane is secured with firewall protections that meet similar goals (AWS, Azure, GCP). In these environments you can still use Private Link to connect to your data, and for your clusters and SQL warehouses to connect back to the control plane. Hybrid environments are a good fit when you want to connect tools that aren’t Private Link ready to your Databricks workspace.
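To make the idea of an IP access list concrete, here is a minimal sketch of how such a list might be evaluated for an incoming request. It assumes separate allow and block entries, with block entries taking precedence. That ordering, the function name and the sample addresses are assumptions of this example, so check your cloud's Databricks documentation for the actual semantics.

```python
import ipaddress
from typing import Iterable


def ip_permitted(source_ip: str,
                 allow: Iterable[str],
                 block: Iterable[str]) -> bool:
    """Return True if source_ip matches an allow entry and no block entry.

    Entries may be single addresses ("203.0.113.7") or CIDR ranges
    ("203.0.113.0/24").
    """
    addr = ipaddress.ip_address(source_ip)

    def matches(entries: Iterable[str]) -> bool:
        return any(addr in ipaddress.ip_network(entry, strict=False)
                   for entry in entries)

    # Block entries win over allow entries (an assumption of this sketch).
    if matches(block):
        return False
    return matches(allow)
```

For example, `ip_permitted("203.0.113.7", ["203.0.113.0/24"], [])` admits an address inside the office range, while adding `"203.0.113.7"` to the block entries rejects that one address and leaves the rest of the range allowed.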
In addition, because the Databricks Lakehouse Platform is 100% cloud native, the traffic between the different components of the platform stays on the cloud provider’s global network. Please refer to the AWS, Azure and GCP documentation for more information.